Calculus and Differential Geometry: An Introduction to Curvature
Donna Dietz Howard Iseri
Department of Mathematics and Computer Information Science, Mansfield University, Mansfield, PA 16933 E-mail address: [email protected]
Contents
Chapter 1. Angles and Curvature
  1. Rotation
  2. Angles
  3. Rotation
  4. Definition of Curvature
  5. Impulse Curvature
Chapter 2. Solid Angles and Gauss Curvature
  1. Total curvature for cone points
  2. Total curvature for smooth surfaces
  3. Gauss curvature and impulse curvature
  4. Gauss-Bonnet Theorem (Exact excerpt from Creative Visualization handout.)
  5. Defining Gauss curvature
  6. Intrinsic aspects of the Gauss curvature
Chapter 3. Intrinsic Curvature
  1. Parallel vectors
Chapter 4. Functions
  1. Introduction
  2. Piecewise-Linear Approximations for Functions of One Variable
  3. Uniform Continuity
  4. Differentiation in One Variable
  5. Derivatives and PL Approximations
  6. Parametrizations of Curves
  7. Functions of Two Variables
  8. Differentiability for Functions of Two Variables
Chapter 5. The Riemannian Curvature Tensor in Two Dimensions
  1. Parametrizations
Chapter 6. Riemannian Curvature Tensor
  1. The Riemannian Metric for a Plane
  2. The Riemannian Metric for Curved Surfaces
  3. Curvature
  4. The Inverse of the Metric
Chapter 7. Riemannian Curvature Tensor
  1. Intrinsic Interpretations
Chapter 8. Curvature of 3-Dimensional Spaces
  1. What we know
  2. What is the geometry like around a vertex of a cubed 3-manifold?
  3. A positive curvature example

CHAPTER 1
Angles and Curvature
0.1. Overview. As you walk around a closed path (along a simple closed curve on the floor), the direction you are facing will make a net rotation of 2π radians or 360°.
1. Rotation

Imagine a circle drawn on the floor (the radius might be ten feet). You are to walk around the circle once in a counter-clockwise direction. If you are initially facing north, you will soon be facing north-west and then west. We can naturally say that the direction in which you are facing has changed by 90° or $\frac{\pi}{2}$ radians. After that, you will face south, then east, and finally north again. The direction in which you are facing has experienced a rotation of 360°. We will want to think of this rotation as describing how the direction you are facing has changed, as opposed to your change in location as you make an orbit around the circle.

For a curve in the plane, we can talk about the rotation of a tangent vector in the same way that we have talked about the rotation of our body as we walk along a curve drawn on the floor. Intuitively at least, we would like to identify these two concepts. That is, what we discover about one should apply equally to the other.

Throughout this book, we will use the convention that counter-clockwise rotations are positive. For example, if you were to turn 45° to the left and then 90° to the right, the net rotation would be −45°.
Figure 1. Walk along this path marked on the floor; the points A, B, and C are marked on the path. (Exercise 1)
1.1. Exercises.
1. Suppose you are walking around the curve shown in Figure 1 in a counter-clockwise direction. Assume that the curve is smooth (the direction varies smoothly) and that the direction you are facing is the same as that of a tangent vector. How does the direction you face change as you move from the starting point A to the point B? From B to C? From A to C? What is your total (net) rotation for the entire circuit?
Figure 2. Walk along this path marked on the floor. (Exercise 2)
2. What would your total rotation be as you walked in the direction indicated around the path shown in Figure 2?
Figure 3. Walk along this path marked on the floor. (Exercise 3)
3. What would your total rotation be as you walked in the direction indicated around the path shown in Figure 3?
4. Make a conjecture about the net rotation of a tangent vector moving around a simple closed curve in the plane in a counter-clockwise direction.
5. Make a conjecture about the net rotation of a normal vector moving around a simple closed curve in the plane in a counter-clockwise direction. Does it make a difference whether the normal vector is pointing outward or inward? Are there other directions that a normal vector can point?

1.2. Overview. Angles are abrupt changes in direction. Total curvature is the net change in direction over some section of a curve or polygonal path.
2. Angles

One of the most important theorems in Euclidean geometry states that the sum of the angles of a triangle is 180°. Virtually all of the theorems that involve angle measure or parallelism can be proved with this fact. Among these would be that the angle sum of a quadrilateral is 360°, the angle sum of a pentagon is 540°, the angle sum of a hexagon is 720°, and in general,

Theorem 1. The angle sum of a (convex) n-gon is (n − 2) · 180°.
Figure 4. The turning angles for a quadrilateral. (The marked turning angles are 95°, 95°, 100°, and 70°.)
This is all very nice, but the sequence of theorems just mentioned can be restated more simply and intuitively in terms of the turning angle or angle defect. The reason for using the term turning angle should become clear, and angle defect refers to the idea that the turning angle measures how far the angle is from being a straight angle. In Figure 4, a quadrilateral is shown with the turning angles marked. You should imagine yourself walking around the quadrilateral in a counter-clockwise direction. The turning angles then measure the amount you must turn to your left as you start the next edge. In this case, the sum of the turning angles is 360°.

If you imagine yourself walking around any closed path, taking left turns, and coming back to your original position, you must have rotated a full 360°. This should agree completely with your answers to the exercises in the previous section. It seems reasonable, therefore, that the sum of the turning angles is 360° for any polygon. This is in fact true, and Theorem 1 can be restated as

Theorem 2. The turning angle sum of a (convex) n-gon is 360°.

It is not necessarily true that Theorem 2 is a better theorem than Theorem 1, but it is certainly simpler and more intuitive. The angle sum theorem is probably more convenient for analyzing geometric figures, but our aim is to understand curvature, and the turning angle sum theorem sets us off in the right direction.

2.1. Exercises.
6. Theorem 1 states that the angle sum of an n-gon is (n − 2) · 180°, or n − 2 times the angle sum of a triangle. Draw a figure illustrating that a convex pentagon has the angle sum of three triangles. Do the same for a hexagon.
7. Suppose the quadrilateral of Figure 4 is drawn on the floor with up in the picture corresponding to north, and you are to walk around it in the counter-clockwise direction. Draw a picture of the face of a compass, and for one of the sides, draw the position of the needle corresponding to the direction you are facing as you walk along it. On the same picture, draw the needle positions corresponding to the other three sides. At each vertex, you would need to pivot as you finish walking along one side of the quadrilateral and start on the next. In your picture, for each vertex, indicate which directions you sweep through as you turn to the left.
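As a quick numerical check of Theorems 1 and 2, the following is a minimal Python sketch that computes the signed turning angle at each vertex of a polygon. The function name `turning_angles` and the sample quadrilateral are our own illustrative choices and do not come from the figures above.

```python
import math

def turning_angles(vertices):
    """Signed turning angle (radians) at each vertex of a closed polygon.

    Positive values are left (counter-clockwise) turns, matching the
    sign convention used in the text.
    """
    n = len(vertices)
    angles = []
    for i in range(n):
        x0, y0 = vertices[i - 1]            # previous vertex
        x1, y1 = vertices[i]                # current vertex
        x2, y2 = vertices[(i + 1) % n]      # next vertex
        ux, uy = x1 - x0, y1 - y0           # direction entering the vertex
        vx, vy = x2 - x1, y2 - y1           # direction leaving the vertex
        cross = ux * vy - uy * vx
        dot = ux * vx + uy * vy
        angles.append(math.atan2(cross, dot))
    return angles

# A hypothetical convex quadrilateral, listed counter-clockwise.
quad = [(0.0, 0.0), (4.0, 0.0), (5.0, 3.0), (1.0, 4.0)]
turns = turning_angles(quad)
turning_sum = math.degrees(sum(turns))                      # Theorem 2
interior_sum = sum(180.0 - math.degrees(t) for t in turns)  # Theorem 1
```

Each interior angle is 180° minus the turning angle at that vertex, which is why the two theorems say the same thing in different language.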
3. Rotation

Our goal is to formulate definitions in differential geometry. Before we do that for curves in the plane, let us summarize what we have so far. Given an object moving in a counter-clockwise direction around a simple closed curve, a vector tangent to the curve and associated with the object must make a “full” rotation of 2π radians or 360°. In other words, if we were to think of this tangent vector (or, if you wish, a copy of it) as having its tail fixed at the origin, then as the object moves around the curve, the tangent vector will sweep through all possible directions. This rotation of the tangent vector will be predominantly in the counter-clockwise direction, but it may, for example, sweep clockwise for a bit, come back counter-clockwise an equal amount, and then continue on. These clockwise rotations are always countered by an extra counter-clockwise rotation, and the total net result is always 360° of counter-clockwise rotation.

If the curve is smooth (whatever that means), we can easily describe a tangent vector in terms of a derivative. There are some difficulties at non-smooth parts of a curve. At the corners of a quadrilateral, for example, a derivative will not specify a unique tangent direction. In this case at least, we will be able to find a tangent direction entering the vertex and one leaving. We can and will account for the directions swept through as we pivot from one direction to the other, and we will avoid curves that are “less smooth” than this.

In order to motivate the definitions describing rotations in terms of derivatives, we will consider the following. Looking at the unit tangent vector as we move around a vertex of a polygonal path, we see that the direction of the tangent vector stays the same, pivots through some angle θ at the vertex, and then again remains the same until another vertex is encountered. An example of this is illustrated in Figure 5, and in this picture the angle θ will be positive.
Later, we will be interested in understanding curvature in higher dimensions, and it will be more convenient to speak in terms of a unit normal vector rather than a unit tangent. For a curve in the plane (we will assume that polygonal paths are curves), a unit normal to a curve will experience the same changes in direction that a unit tangent will. The unit normal to the same curve shown in Figure 5 will also sweep through the same angle θ, as shown in Figure 6.

Figure 5. Following the unit tangent vectors around a vertex with turning angle θ.

Figure 6. Following the unit normal vectors around a vertex with turning angle θ.

As described earlier, the rotation is a measure of how the direction of the unit tangent or unit normal vector changes. If we take the unit normal at each point of the curve and put its tail at the origin, the head of the vector will stay on the unit circle and serve as a “direction-o-meter,” as shown in Figure 7. As we move along the curve, the head will stay fixed until we reach the vertex, and then it will swing over to the left as we pass through the vertex.

Figure 7. The unit normal vectors moved to the origin.

Formally, it is common to associate each point on the curve with a point on the unit circle determined by the unit normal in this way. This association is called the Gauss map, and it is something that we will be able to differentiate in a meaningful way. In order to perform this differentiation, we need to consider a situation where the direction of the normal vector changes over some interval, and not all at once. If we take the polygonal curve we have been using and “smooth” it out, the change in direction is spread out over the curve, as shown in Figure 8. In particular, note that the total change in direction is the same, the positive angle θ; the direction-o-meter simply swings to the left more gradually.

Figure 8. Following the unit normal vectors around a vertex with turning angle θ.

The total rotation, which we will call the total curvature, is a quantity that applies to both polygonal curves and smooth ones. With the smooth curves, however, we can also talk about the rate of rotation (it does make some sense to say that the rate of rotation at the vertex of the polygonal curve is infinite). There are two quantities that are natural candidates to which we will compare the rotation: time and distance. It makes sense to say that if a given amount of rotation takes place over a very short distance, then the curve must be very sharply curved, and conversely, the curve is not so sharply curved if the rotation takes place over a longer distance. Therefore, if we simply divide,

$$\text{average rate of rotation} = \frac{\text{total rotation}}{\text{distance}}, \tag{1}$$

we get a reasonable measure of how much a curve curves. We will call this average rate of rotation the average curvature. The next logical step is to take a limit as the distance approaches zero, and this suggests a definition for curvature. If ρ is a quantity measuring rotation, and s is an arclength parameter, then the curvature κ should be defined by

$$\kappa = \frac{d\rho}{ds}. \tag{2}$$

In the next section, we will express this definition more formally as a formula. In particular, we need to find a function corresponding to ρ. This formula will correspond exactly to the one given in calculus classes. Before we do that, however, we can check a simple case. The circumference of a circle of radius r is 2πr. The total rotation is 2π radians. Radians are more natural in this context than degrees, but degrees would also work. If we assume that the curvature of a circle is constant, then the curvature should be the same as the average curvature. The curvature must be, therefore,

$$\kappa = \frac{2\pi}{2\pi r} = \frac{1}{r}, \tag{3}$$

and this agrees with the calculus definition of curvature. The point of this book is to show that the definitions for the curvature of surfaces and of three-dimensional spaces can be motivated in an analogous way.
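Equation (1) can be tried out directly on polygonal paths. The sketch below is an illustration under our own assumptions: it uses a regular n-gon inscribed in a circle of radius r (a choice made here, not taken from the text), divides its total rotation of 2π by its perimeter, and records the result for increasing n. The function name is hypothetical.

```python
import math

def average_curvature_of_polygon(radius, n):
    """Total rotation divided by distance, as in equation (1), for a
    regular n-gon inscribed in a circle of the given radius."""
    total_rotation = 2 * math.pi                        # n turns of 2*pi/n each
    perimeter = 2 * n * radius * math.sin(math.pi / n)  # n chords of the circle
    return total_rotation / perimeter

r = 10.0
estimates = {n: average_curvature_of_polygon(r, n) for n in (6, 60, 600)}
```

As n grows, the perimeter approaches the circumference 2πr, so the average curvature of the polygon approaches the value 1/r found in equation (3).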
4. Definition of Curvature

We are in search of a function that measures the angle of rotation for the unit normal vector, or equivalently, the unit tangent vector. In terms of the Gauss map, the head of the unit normal vector always lies on the unit circle. Therefore, the derivative of the unit normal vector must always be tangent to the unit circle. This is a manifestation of the fact that the derivative of a vector function that has constant magnitude is always perpendicular to the original vector function.

Two notions point the way. First, over small distances, the arc of a circle near a point on the circle and the tangent line through that point are very similar. Second, the length of an arc of the unit circle is equal to the corresponding angle measured in radians. Therefore, a derivative of the unit normal vector measures change along a tangent to the unit circle (as in the Gauss map), and this change is essentially the same as the change along the unit circle, which is equal to a change in the direction of the normal vector measured in radians.

In other words, the conclusions of the last section suggest that the curvature can be defined as the derivative of the unit normal vector with respect to arclength, or more precisely as the magnitude of that derivative, with the sign fixed by our counter-clockwise convention. It can also be defined in the same way from the derivative of the unit tangent vector with respect to arclength. That is, up to sign,

$$\kappa(s) = \left\| \frac{dn}{ds} \right\| = \left\| \frac{dT}{ds} \right\|. \tag{4}$$
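The definition in terms of the unit tangent vector can be approximated numerically with finite differences. The following is only a sketch: the helper names `unit_tangent` and `curvature`, the step sizes, and the test circle of radius 2 are assumptions made for illustration, and the result is the unsigned magnitude of dT/ds.

```python
import math

def unit_tangent(curve, t, h=1e-5):
    """Unit tangent at parameter t, from a central difference."""
    x0, y0 = curve(t - h)
    x1, y1 = curve(t + h)
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    return dx / length, dy / length

def curvature(curve, t, h=1e-4):
    """Approximate the magnitude of dT/ds at parameter t."""
    tx0, ty0 = unit_tangent(curve, t - h)
    tx1, ty1 = unit_tangent(curve, t + h)
    xa, ya = curve(t - h)
    xb, yb = curve(t + h)
    ds = math.hypot(xb - xa, yb - ya)    # chord length stands in for arclength
    return math.hypot(tx1 - tx0, ty1 - ty0) / ds

def circle(t, r=2.0):
    """A circle of radius r, used here only as a test curve."""
    return r * math.cos(t), r * math.sin(t)

kappa = curvature(circle, 0.7)   # should be close to 1/r = 0.5
```

The same two functions can be applied to any smooth parametrized curve whose velocity is nonzero at t, although the accuracy then depends on the step sizes chosen.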
4.1. Exercises.
8. Show that the derivative of the unit normal vector is perpendicular to the unit normal vector. Use the fact that the unit normal vector has constant magnitude, i.e., $\|n\| = 1$, and that the “dot product rule” looks like the product rule from calculus, i.e., $\frac{d}{ds}\langle x, y \rangle = \left\langle x, \frac{dy}{ds} \right\rangle + \left\langle \frac{dx}{ds}, y \right\rangle$.
9. The derivative of the unit normal vector expresses a rate of change along a tangent to the Gauss map circle. I claimed that this rate of change could be interpreted as a rate of change of direction in terms of an angle measured in radians. Show that this interpretation is a reasonable one by showing that for the quantities shown in Figure 9, $\frac{d\tau}{d\theta}(0) = \frac{d\tau}{d\rho}(0) = \frac{d\rho}{d\theta}(0) = 1$. Hint: Don’t work too hard. Just use trigonometry to express each as a function of one of the others.
Figure 9. Comparing the quantities measured along the tangent line (τ), along the unit circle (ρ), and the angle at the center of the circle (θ). The circle has radius 1.
10. Suppose an object is moving along a curve, and at a point P on the curve, the derivative of the unit normal vector with respect to time is $\frac{dn}{dt} = [3, 2]$ (this is a velocity for the head of the vector n in terms of the Gauss map, in feet per second perhaps). We could say that the unit normal vector is rotating at a certain rate measured in radians per second. What is this rate? Suppose that the velocity of the object as it passes through P is $v = [6, 4]$. What is its speed (a.k.a. $\frac{ds}{dt}$)? What is the curvature of the curve at the point P? Hint: You can use the chain rule, if you want.
11. The vector function $x(t) = \langle t, t^2 \rangle$ is a parametrization of a parabola. Find the curvature of the curve at the points corresponding to t = 0 and t = 2.
5. Impulse Curvature

We can define curvature for smooth curves, but this definition will not work for curves with sharp corners in them. The notion of total curvature applies to both cases, however. We can develop a notion of curvature that works for corners, which we will call impulse curvature.

Let us look closely first at a total curvature function, and at how total curvature and (instantaneous) curvature are related through derivatives and integrals. Consider the smooth curve of Figure 8. As we move along the curve from right to left, the unit normal vector makes an angle with the “positive x-axis” of $\frac{\pi}{6}$ radians (in this particular example). In Figure 10, the initial point of the graph corresponds to this value of ρ. The curve in Figure 8 is straight initially, so the direction of the unit normal is constant, and this manifests itself in the horizontal section of the graph in Figure 10. As we move into the curved section of the curve, the unit normal vector begins to rotate counter-clockwise, so the angle ρ increases, and this is reflected in the graph. The angle reaches a maximum as we move into the final straight portion of the curve, and ρ is again constant, now at a value of about $\frac{5\pi}{6}$ radians. We can think of ρ(s) as being a total curvature function, and the difference between the starting and finishing values, $\theta = \frac{5\pi}{6} - \frac{\pi}{6} = \frac{4\pi}{6}$, represents the total curvature of this section of the curve.
Figure 10. Graph of the direction of the unit normal vector for the smooth curve of Figure 8 with respect to arclength. (Vertical axis ρ, with π and the total change θ marked; horizontal axis s.)
If we graph the total curvature function ρ for the polygonal curve of Figure 8, the initial and final values for ρ are the same, but the increase in the value of ρ occurs at a single point on the curve. The graph for ρ is a step function in this case, but the total curvature is still the difference between the initial and final values of ρ.

Figure 11. Graph of the direction of the unit normal vector for the polygonal curve of Figure 8 with respect to arclength. (Vertical axis ρ, with π and the total change θ marked; horizontal axis s.)

This total curvature function ρ keeps track of the direction of the unit normal vector, and total curvature is the net change in this function. As concluded before, curvature is the derivative of this function, $\frac{d\rho}{ds}$. Therefore, we can see curvature in the graphs of Figures 10 and 11 as slopes. The slopes are zero on either end of the graph of Figure 10, so the curvature of the corresponding parts of the smooth curve of Figure 8 must also be zero. This agrees with the fact that the ends of this curve are straight. The slopes are positive in the middle of the graph, and this corresponds with the fact that the middle section of the smooth curve has positive curvature (positive, since the unit normal vector is rotating counter-clockwise). On the other hand, the polygonal curve of Figure 8 is straight everywhere except at the vertex. Therefore, the slopes in the graph of Figure 11 are zero everywhere except at the point in the middle. Here the slope and the curvature at the vertex are both infinite. The curvature graphs are shown in Figures 12 and 13.
Figure 12. Graph of the derivative of ρ with respect to s for the smooth curve of Figure 8. (Vertical axis $\frac{d\rho}{ds} = \kappa$; horizontal axis s.)

Figure 13. Graph of the derivative of ρ with respect to s for the polygonal curve of Figure 8. (Vertical axis $\frac{d\rho}{ds} = \kappa$; horizontal axis s.)
Of particular interest is the fact that it is possible to assign a finite value to the infinite curvature at the vertex of the polygonal curve. More specifically, we will think of the curvature at this point as being infinite, but if we were to integrate across this infinite function value, we would obtain a definite finite value, namely the total curvature. The total curvature at the vertex of the polygonal curve of Figure 8 is $\frac{4\pi}{6}$ radians, so we will say that the curvature at this point is $\infty \ast \frac{4\pi}{6}$. We will call this impulse curvature, and the notation will simply remind us that when we integrate across such a value, the result will be $\frac{4\pi}{6}$. For example, for any interval [a, b] containing the undefined point for the function in Figure 13, we would have

$$\int_a^b \frac{d\rho}{ds}\, ds = \frac{4\pi}{6}. \tag{5}$$
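One way to make the idea of impulse curvature concrete is to round off the corner with a small circular arc and watch what happens as the arc shrinks. The sketch below is our own illustration, not a construction from the text: the corner of turning angle θ is replaced by an arc of radius `eps`, so the curvature is 1/eps on an arc of length eps·θ and zero elsewhere, and the integral is approximated with a midpoint rule. All names and parameter values are hypothetical.

```python
import math

def kappa_rounded_corner(s, theta, eps):
    """Curvature along a corner of turning angle theta that has been
    rounded off by a circular arc of radius eps centered at s = 0."""
    half_arc = eps * theta / 2            # the arc has length eps * theta
    return 1.0 / eps if abs(s) <= half_arc else 0.0

def total_curvature(theta, eps, a=-1.0, b=1.0, n=200_000):
    """Midpoint-rule approximation of the integral of kappa over [a, b]."""
    ds = (b - a) / n
    return sum(kappa_rounded_corner(a + (i + 0.5) * ds, theta, eps)
               for i in range(n)) * ds

theta = 4 * math.pi / 6                  # the turning angle used in the text
results = {eps: total_curvature(theta, eps) for eps in (0.1, 0.01, 0.001)}
```

The peak curvature 1/eps grows without bound as eps shrinks, yet each integral stays close to θ, which is the behavior the impulse notation is meant to record.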
5.1. Exercises.
12. Consider a square with sides of length 1. Choose the midpoint of one of the sides as a starting point and consider an object moving around the square in a counter-clockwise direction. Let κ(s) be the curvature function with respect to arclength from this starting point. Compute each of the following: $\kappa(0)$, $\kappa\left(\frac{1}{2}\right)$, $\int_0^1 \kappa(s)\, ds$, and $\int_0^4 \kappa(s)\, ds$.
13. Consider the graph y = | sin(x)|. What is the curvature at the point (0, 0)?

CHAPTER 2
Solid Angles and Gauss Curvature
1. Total curvature for cone points

The goal here is to generalize our notions of curvature to surfaces. This can be done in a number of ways, but our intention will be to eventually end up with an intuitive understanding of the Gauss curvature. In the previous chapter, the curvature of a curve was obtained by extending the notion of the turning angle for the vertex of a polygonal curve. This suggests, perhaps, that we first consider the vertex of a polyhedral surface.

If we imagine the vertex of a pyramid, the 3-dimensional region interior to that vertex can be compared to the 2-dimensional region inside an angle. This solid angle parallels the notion of a (plane) angle in a number of ways. It was the turning angle, however, that became the total curvature. One way of measuring this was to consider the length of the arc on the unit circle that all of the possible unit normal vectors swept out at the vertex in terms of the Gauss map. These normals were perpendicular to the tangent lines through the vertex that were outside of the angle. For the vertex of the pyramid, we could consider all tangent planes through the vertex outside of the pyramid and the unit vectors normal to these planes. Under the Gauss map (now to the unit sphere instead of the unit circle), the heads of these normals would sweep out a region on the unit sphere. The area of this region would be a natural candidate for the total curvature (and also the impulse curvature) at this vertex.

This works amazingly well, but it is a bit simpler to look at a cone. We can make a cone out of a piece of paper by removing a wedge and taping the edges together. Let us suppose that we remove a wedge with angle θ as in Figure 1. A circle of radius R is shown in Figure 1, and θ radians have been removed from the circle as well (an arc of length Rθ). After joining the edges, we get something like the cone in Figure 2.
The circle, as a circle on the cone, still has radius R, and the radius is measured to the vertex of the cone. As a curve in space, it also has a smaller radius r. The angle between the radius of length r and the surface of the cone is marked φ, and the angle between the central axis and the normals to the surface of the cone is also φ. The angle between the central axis and the surface of the cone is marked ψ.

Figure 1. Remove a θ-wedge to construct a cone. (Labels: R, θ.)

Figure 2. A cone with total curvature θ. (Labels: φ, ψ, R, r.)

We are interested in computing the area of the region on the unit sphere corresponding to the normal vectors at the vertex. The normal vectors to the surface of the cone determine a circle on the unit sphere under the Gauss map as shown in Figure 2. This circle separates the sphere into two pieces, and we are interested in the area of the upper one.

To find φ, consider the circle of radius R (shown in Figure 1). After removing the wedge and joining the edges, this circle becomes a circle on the cone. It has radius R along the surface, and it has radius r in $\mathbb{R}^3$. We can, therefore, compute its circumference in two ways. We have $C = 2\pi r$, as a circle in space, and $C = (2\pi - \theta)R$, as a circle on the cone formed by removing a θ-radian wedge. The right triangle shown in Figure 2 with hypotenuse R and base r has an angle φ, so

$$\cos\phi = \frac{r}{R} = \frac{2\pi - \theta}{2\pi}. \tag{6}$$
Equation (6) determines φ in terms of the angle θ. We can determine the area swept out by the normals at the vertex under the Gauss map in terms of φ easily using $\bar\phi$, $\bar\theta$, and $\bar\rho$ as spherical coordinates for the sphere. Here, $0 \le \bar\phi \le \phi$, $0 \le \bar\theta \le 2\pi$, and $\bar\rho = 1$. The desired area is then

$$\int_0^{2\pi}\!\!\int_0^{\phi} \bar\rho^2 \sin\bar\phi \, d\bar\phi \, d\bar\theta = \int_0^{2\pi}\!\!\int_0^{\phi} \sin\bar\phi \, d\bar\phi \, d\bar\theta = \int_0^{2\pi} (1 - \cos\phi) \, d\bar\theta = 2\pi(1 - \cos\phi). \tag{7}$$

The total curvature at the vertex is therefore $2\pi(1 - \cos\phi)$. Quite remarkable is the fact that this total curvature is precisely θ, the measure of the wedge removed to form the cone. This can be seen by solving equation (6) for θ, which gives $\theta = 2\pi(1 - \cos\phi)$.

Definition 1. The impulse curvature, or total curvature, at the vertex of a cone is the area swept out by the unit normal vectors at the vertex under the Gauss map.

Theorem 3. The total curvature at the vertex of a cone is equal to the angle of the wedge removed to construct it.

1.1. Exercises.
14. Show that the results are the same if we use a pyramid instead of a cone. Note that there are only four different unit normals obtained from the lateral surfaces of the pyramid. The rest of the boundary of the region on the unit sphere comes from those tangent planes that contain an edge leading into the vertex. The normals would be those unit vectors perpendicular to the edge between the normals for the faces. The normal vectors at the vertex are normal to planes through the vertex that lie outside of the pyramid.
15. What is the total curvature of any region of the cone not containing the vertex? (Note: a curve on the surface of the unit sphere has no area.)
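Theorem 3 can also be checked numerically. The following minimal sketch takes a wedge angle θ, finds φ from equation (6), and approximates the area integral of equation (7) with a midpoint rule rather than evaluating it in closed form. The function names and the sample wedge angles are our own choices.

```python
import math

def cap_area_on_unit_sphere(phi, n=100_000):
    """Midpoint-rule value of the double integral of sin(p) dp dt over
    0 <= p <= phi and 0 <= t <= 2*pi (the t integral contributes 2*pi)."""
    dp = phi / n
    return 2 * math.pi * sum(math.sin((i + 0.5) * dp) for i in range(n)) * dp

def cone_total_curvature(theta):
    """Area swept out by the vertex normals for a cone made by removing
    a wedge of angle theta, with 0 < theta < 2*pi."""
    phi = math.acos((2 * math.pi - theta) / (2 * math.pi))   # equation (6)
    return cap_area_on_unit_sphere(phi)

wedges = [math.pi / 6, math.pi / 2, math.pi, 1.5 * math.pi]
areas = [cone_total_curvature(theta) for theta in wedges]
```

For each wedge, the computed area agrees with θ up to the error of the numerical integration.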
2. Total curvature for smooth surfaces

If v is the vertex of a cone, then all of the area on the unit sphere under the Gauss map comes from unit normal vectors at the vertex. If we were to smooth the vertex, as in Figure 3, then these unit normals would be spread out over the smooth surface, and there would be only one unit normal at each point of the surface, but the area under the Gauss map would be the same, since we have precisely the same collection of unit normals. Therefore, smoothing the vertex should not change the total curvature, and the geometry of the surface near the circle shown is exactly the same as the geometry on the cone.

With this notion of total curvature for surfaces described intuitively, we can define an instantaneous curvature for smooth surfaces, generally known as the Gauss curvature, as we did with smooth curves.

2.1. Exercises.
16. What is the total curvature of a sphere? A cube? A tetrahedron?
17. It is possible to find a triangle on the unit sphere that has one vertex at the north pole, two vertices on the equator, and three right angles. What is the area of this triangle? What is the total curvature of the region inside of this triangle? Find a triangle with two right angles and one angle measuring $\frac{\pi}{4}$ radians. What is the total curvature of the region inside of it? Do you see a relationship between the total curvature within and the angle sums of these two triangles?
Figure 3. We have the same total curvature as the cone in Figure 2, θ, if we smooth the vertex of the cone. (Label: r.)
3. Gauss curvature and impulse curvature

The total curvature of a curve was defined as the length of an arc of the unit circle under the Gauss map. This was an extension of the idea of a turning angle to curves. The measure of the turning angle, as interpreted through the Gauss map, can be applied to the vertex of a cone by considering the area of a region of the unit sphere under the (spherical) Gauss map. This idea extends to smooth surfaces in the same way as the turning angle extends to smooth curves. We obtained an instantaneous curvature for curves by taking a limit comparing the length along the unit circle with the corresponding length along the curve. We can do the same thing here by comparing the area on the unit sphere with the corresponding area on the surface. This is the notion of curvature of surfaces used by Gauss, and it is called the Gauss curvature.

Definition 2. At a point p on a surface S, the Gauss curvature at p is the limit

$$K = \lim_{\Delta A \to 0} \frac{\Delta\Theta}{\Delta A}, \tag{8}$$

where ∆A is the area of some region on the surface containing p and ∆Θ is the total curvature of that region.

If we think of the measure of an angle in terms of the possible directions of the unit normal vector at a vertex (the turning angle), and then extend this into the curvature of the curve, then this is the most natural notion of curvature for surfaces, since it is a direct translation of the relevant notions from curves to surfaces. For computational purposes, this is not the most convenient formula, but this is probably one of the more intuitive ways to think about what Gauss curvature is.
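To see the limit in Definition 2 at work, here is a small sketch on a surface that does not appear in the text: the paraboloid z = a(x² + y²)/2 at the origin. We assume the regions are the pieces of the surface lying above disks of radius `rho0` in the xy-plane, and we use the standard closed forms for the area of that piece and for the spherical cap covered by its unit normals. The function name and the value of `a` are hypothetical choices.

```python
import math

def gauss_ratio(a, rho0):
    """Delta-Theta / Delta-A for the part of z = a*(x^2 + y^2)/2 lying
    above the disk of radius rho0 centered at the origin."""
    x = (a * rho0) ** 2
    # The unit normals tilt away from vertical by at most arctan(a*rho0),
    # so the Gauss map image is a spherical cap with that polar angle.
    delta_theta = 2 * math.pi * (1 - 1 / math.sqrt(1 + x))
    # Surface area obtained by integrating sqrt(1 + a^2 r^2) r dr dt.
    delta_area = 2 * math.pi * ((1 + x) ** 1.5 - 1) / (3 * a ** 2)
    return delta_theta / delta_area

a = 2.0
ratios = [gauss_ratio(a, rho0) for rho0 in (0.5, 0.1, 0.01, 0.001)]
```

As the region shrinks, the ratios increase toward a², which for this surface is the product of the two equal curvatures a of its vertical cross-sections at the origin.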
3.1. Exercises.
18. Find the total curvature of a sphere with radius r. What is the Gauss curvature?
4. Gauss-Bonnet Theorem (Exact excerpt from Creative Visualization handout.)

I do not address the Gauss-Bonnet theorem in any of the labs, but after the students have completed the last lab, I would look at the cone point version of the Gauss-Bonnet theorem. From here, the definition for Gauss curvature on a smooth surface should make sense intuitively.
Figure 4. The angle defect corresponds to total curvature. (Labels: r, θ, C.)
The basic idea can be seen using circles and spheres. Consider a circle of radius r centered at the cone point of a cone with angle defect θ, as in Figure 4. In the plane, this circle will have curvature $\kappa = \frac{1}{r}$. Since the local geometry on the cone is Euclidean away from the cone point, the geodesic curvature for this circle as a curve on the cone must be the same. That is, $\kappa_g = \frac{1}{r}$. What is different about this circle and a circle in the plane with the same radius is that the circle on the cone has a smaller circumference. In fact, the difference must be θr. We can now compute the total geodesic curvature:

$$\int_C \kappa_g \, ds = \int_C \frac{1}{r} \, ds = \frac{1}{r}(2\pi r - \theta r) = 2\pi - \theta. \tag{9}$$

Since curvature measures the rate of rotation of the tangent vector, it should make sense to students that the total rotation for a simple closed curve in the plane must always be 2π. Since any small deformation of the circle essentially takes place in the plane, it should also make sense that the total rotation for a simple closed curve around the cone point will always be 2π minus the angle defect. In any case, the formulation of the Gauss-Bonnet theorem should seem natural. Comparing Equation (9) to the Gauss-Bonnet theorem,

$$\int_C \kappa_g \, ds = 2\pi - \int_R K \, dA, \tag{10}$$

it’s obvious that the angle defect corresponds with the total curvature $\int_R K \, dA$. In fact, I think it makes perfect sense to motivate the definition of the Gauss curvature K in terms of this formula. I might start out by doing the following.

Consider a sphere tangent to a cone, as shown in Figure 5. The geodesic curvature for the circle of tangency will be the same on both surfaces. Therefore, the total curvature for the regions contained by the circle on both surfaces should be the same. We can then require that the Gauss curvature be an infinitesimal version of the total curvature and that it be constant on the sphere. That is,

$$\theta = \int_D K \, dA = K \int_D dA = K R^2 \theta, \tag{11}$$

and

$$K = \frac{1}{R^2}. \tag{12}$$

Figure 5. The circle of tangency will have the same geodesic curvature on both surfaces. (Labels: r, C, φ, R.)

I think the actual computation is a bit tricky, but there may be a simpler way. In any case, the area integral is

$$\int_D dA = \int_0^{2\pi}\!\!\int_0^{\phi} R^2 \sin p \, dp \, dt = R^2 (1 - \cos\phi)\, 2\pi, \tag{13}$$

where the parameters p and t are the phi and theta from spherical coordinates. To express this in terms of θ, note that the circumference of the circle C is $2\pi r - \theta r$ on the cone. If the radius of this circle in space is ρ, then this circumference is also 2πρ. Since $R \sin\phi = \rho$, we have that

$$2\pi r - \theta r = 2\pi R \sin\phi, \tag{14}$$

and

$$\theta = 2\pi\left(1 - \frac{R}{r} \sin\phi\right). \tag{15}$$

Now, $\tan\phi = \frac{r}{R}$, so

$$\theta = 2\pi\left(1 - \frac{\cos\phi}{\sin\phi} \sin\phi\right) = 2\pi(1 - \cos\phi). \tag{16}$$

Equations (13) and (16) establish equation (11).
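The chain of equations (11) through (16) can be sanity-checked with a few lines of arithmetic. This sketch picks a sphere radius R and an angle φ (both arbitrary values chosen here for illustration), recovers r from the relation tan φ = r/R, and compares the angle defect from equation (15) with K times the area from equations (12) and (13). The function name is hypothetical.

```python
import math

def check_tangent_sphere(R, phi):
    """Return (theta from equation 15, K * area from equations 12 and 13)."""
    r = R * math.tan(phi)                                  # tan(phi) = r / R
    theta = 2 * math.pi * (1 - (R / r) * math.sin(phi))    # equation (15)
    area = R ** 2 * (1 - math.cos(phi)) * 2 * math.pi      # equation (13)
    K = 1 / R ** 2                                         # equation (12)
    return theta, K * area

theta, total = check_tangent_sphere(R=3.0, phi=0.8)
```

The two returned numbers agree, and both equal 2π(1 − cos φ), as equation (16) says they should.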
5. Defining Gauss curvature

The Gauss curvature at a point on a surface is generally defined to be the product of the two principal curvatures. Very roughly, this can be described as follows. At a point on a surface in space, we can choose one of two possible unit vectors normal to the surface (one normal is as good as the other). For every plane that contains the point and the normal vector, the intersection of the plane and the surface is a curve that has a curvature within that plane. If the curve bends towards the normal vector, we will associate a positive sign with this curvature, and if the curve bends away from the normal vector, we will associate a negative sign. In other words, if the normal vector chosen points upwards and the curve is concave up, then the curvature will be positive. These signed curvatures are called normal curvatures. The maximum normal curvature (most positive) and the minimum normal curvature (most negative) are the principal curvatures. The choice of normal vector, and hence of which curvatures are positive, is quite arbitrary, but of significance is that the Gauss curvature of a bowl-shaped surface will always be positive, and the Gauss curvature of a saddle-shaped surface will always be negative, regardless of how the choices were made.

That this is as simple a definition for the curvature of a surface as could be expected is one thing, and that it works incredibly well is made very clear in any book on the subject. What is not so clear is why anyone would consider the definition in the first place and what it really represents. What we will do here is to show that this definition is a rigorous implementation of the definition we have already described and how the previous definition leads to this one. A lot of insight into what Gauss curvature is can be obtained by examining the connection between the intuitive definition given earlier and the one involving the principal curvatures.
We will start with the intuitive definition of the Gauss curvature at a point. This was expressed in Equation (8). The biggest problem with this formula is that it does not say how ∆A goes to zero. Different values for K can be obtained, if there are no restrictions. We will want to choose the most boring limit possible. Sufficient for our purposes, we can take a small sphere in space centered at the point P . Each point on the surface contained in the sphere (this region has area ∆A) has a normal vector, and thus an image under the Gauss map. These Gauss map images will determine a region on the unit sphere having a well-defined area (if the surface is sufficiently smooth, which we will always assume is the case), and this area is ∆Θ. We can then take the limit as the radius of the sphere about P goes to zero. If the surface is sufficiently smooth, this limit should exist, and we will assume that all surfaces under consideration are sufficiently smooth, unless otherwise noted. As it stands, this definition is non-trivial to apply directly, so we will formulate an alternative in terms of derivatives. For one of the small regions on the surface about P contained in the small sphere, the region should be roughly disk shaped, and we can imagine it as consisting of a bouquet of radial arcs. The normal vector at P will determine one point on the unit sphere under the Gauss map. The normal vectors from the points on the radial arcs will determine arcs on the unit sphere also under the Gauss map. Of relevance is the fact that the length of the arc on the unit sphere under the Gauss map divided by the length of the radial arc on the surface will limit on the curvature of the radial arc at P . Also of relevance is the observation that the area ∆Θ is determined by the extent of these arcs. It would seem reasonable to assume, therefore, that the limit of Equation (8) will depend only on the curvatures of arcs through P . 
The one important assumption that we will make in the formulation of the alternative definition of the Gauss curvature is that it depends only on information provided by first and second derivatives.

Suppose we have a point $P$ on a surface in space, and we will define the Gauss curvature of the surface at $P$. The curvature is independent of the surface’s position and orientation in space, so we will assume that the point $P$ is at the origin and the surface is tangent to the $xy$-plane. In a region about $P$, we will assume that the surface can be described as the graph of a function $f(x, y)$, and since the curvature depends only on first and second derivatives, we will only consider surfaces that ensure that $f$ has continuous first and second derivatives (i.e., $f$ is $C^2$). Since we will only use information from the first and second derivatives at $P$, we can also assume that $f$ is quadratic, $f(0, 0) = 0$, $f_x(0, 0) = 0$, and $f_y(0, 0) = 0$. Therefore, $f$ must take the form

\[ f(x, y) = ax^2 + bxy + cy^2. \tag{17} \]

It will be convenient to use vector notation and terminology, so we will work with the parametrization

\[ \mathbf{x}(x, y) = [\, x, \; y, \; ax^2 + bxy + cy^2 \,]. \tag{18} \]
The first (partial) derivatives, $\frac{d\mathbf{x}}{dx} = [\, 1, 0, 2ax + by \,]$ and $\frac{d\mathbf{x}}{dy} = [\, 0, 1, bx + 2cy \,]$, are vectors tangent to the surface, and at each point, these two vectors span a plane tangent to the surface at that point. All vectors tangent to the surface at this point will lie in this plane. That $\frac{d\mathbf{x}}{dx}(0, 0) = [\, 1, 0, 0 \,]$ and $\frac{d\mathbf{x}}{dy}(0, 0) = [\, 0, 1, 0 \,]$ reiterates the fact that the surface is tangent to the $xy$-plane.

The unit normal vector at each point of the surface must, essentially by definition, be perpendicular to the tangent plane. It must be perpendicular to both tangent vectors, and so can be obtained from the cross product.

\[ \mathbf{n} = \frac{[\, -2ax - by, \; -bx - 2cy, \; 1 \,]}{\sqrt{b^2x^2 + 4bcxy + 4c^2y^2 + 4a^2x^2 + 4abxy + b^2y^2 + 1}} \tag{19} \]

We are interested in how much the unit normal vector varies over a small piece of the surface about the origin, and then in how this variation behaves as the piece shrinks to zero. The unit vector $\mathbf{n}$ ranges over a region on the unit sphere, and what we want is essentially a derivative of $\mathbf{n}$ over two dimensions. The appropriate object is a linear function associated with a tangent plane, in particular the plane determined by the partial derivatives of $\mathbf{n}$. These partial derivatives are a bit messy, but we only need to know them at $(0, 0)$. The partial with respect to $x$ is
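\[ \frac{d\mathbf{n}}{dx} = \frac{\sqrt{Q} \, [\, -2a, -b, 0 \,]}{\left( \sqrt{Q} \right)^2} - \frac{[\, -2ax - by, \; -bx - 2cy, \; 1 \,] \left( \tfrac{1}{2} \right) (2b^2x + 4bcy + 8a^2x + 4aby)}{\left( \sqrt{Q} \right)^3}, \tag{20} \]

where $Q = b^2x^2 + 4bcxy + 4c^2y^2 + 4a^2x^2 + 4abxy + b^2y^2 + 1$ is the expression under the radical in Equation (19). At $(0, 0)$ this is

\[ \frac{d\mathbf{n}}{dx}(0, 0) = [\, -2a, -b, 0 \,]. \tag{21} \]

Similarly,

\[ \frac{d\mathbf{n}}{dy}(0, 0) = [\, -b, -2c, 0 \,]. \tag{22} \]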
The partial derivatives $\frac{d\mathbf{n}}{dx}$ and $\frac{d\mathbf{n}}{dy}$ describe the linear approximation to how the unit vector $\mathbf{n}$ varies near the origin. For a short distance $\varepsilon$ in the $x$-direction, therefore, the unit normal vector moves approximately $\varepsilon [\, -2a, -b, 0 \,]$, and for the same distance in the $y$-direction, it moves approximately $\varepsilon [\, -b, -2c, 0 \,]$. This completely determines the linear approximation, so an $\varepsilon$-square on the $xy$-plane corresponds to a “parallelogram” on the unit sphere under the Gauss map spanned by $\varepsilon$ times these vectors. The ratio of the area of this parallelogram to the area $\varepsilon^2$ of the square is given by the cross product

\[ \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -2a & -b & 0 \\ -b & -2c & 0 \end{vmatrix} = \begin{vmatrix} -2a & -b \\ -b & -2c \end{vmatrix}. \tag{23} \]
This determinant describes how areas under the Gauss map compare to areas in the domain near $(0, 0)$, and so this should define the Gauss curvature. Note that the matrix

\[ \begin{bmatrix} -2a & -b \\ -b & -2c \end{bmatrix} \tag{24} \]

completely describes the linear approximation to the normal vector at $(0, 0)$. As a point passes through the origin, this matrix describes how the corresponding normal vector is changing at the origin. For example, if we move in the direction of $[\, 1, 0 \,]$ from the origin, then the direction of the unit normal to the surface is changing at the following (vector) rate.

\[ \begin{bmatrix} -2a & -b \\ -b & -2c \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -2a \\ -b \end{bmatrix} \tag{25} \]

This is almost, but not quite, a curvature. Specifically, if we considered the curve on the surface above the $x$-axis, we would have a parabola, $z = ax^2$, and this curve corresponds to the direction determined by the vector $[\, 1, 0 \,]$. The curvature for this curve is $2a$ at the origin, but this comes from the rotation of the normal to the curve in the $xz$-plane. The normal to the surface may rotate in the $y$-direction as well, as indicated by the component $-b$.

5.1. Exercises.

19. Determine the magnitude of the tangent vector $\frac{d\mathbf{x}}{dx}(x, 0)$, and then differentiate with respect to $x$ to verify the claim made above.

20. Verify the claim above that the curve above or below the $x$-axis has curvature $2a$ at the origin by computing $\frac{d\mathbf{T}}{ds}$, where

\[ \mathbf{T} = \frac{\frac{d\mathbf{x}}{dx}(x, 0)}{\left\| \frac{d\mathbf{x}}{dx}(x, 0) \right\|}. \]
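The derivative of the Gauss map at the origin can also be estimated numerically. The sketch below rests on assumptions of its own: the function names, the central-difference step `h`, and the sample coefficients are illustrative only. It evaluates the unit normal of Equation (19) near the origin, differences it in the $x$- and $y$-directions, and compares the determinant of the resulting $2 \times 2$ block with $4ac - b^2$, which is what the determinant of the matrix (24) expands to.

```python
import math

def unit_normal(a, b, c, x, y):
    """Unit normal of Equation (19) for the graph of f = a x^2 + b x y + c y^2."""
    v = (-2 * a * x - b * y, -b * x - 2 * c * y, 1.0)
    s = math.sqrt(sum(t * t for t in v))
    return tuple(t / s for t in v)

def gauss_map_derivative(a, b, c, h=1e-5):
    """Central-difference estimates of dn/dx and dn/dy at the origin."""
    nx = [(p - m) / (2 * h)
          for p, m in zip(unit_normal(a, b, c, h, 0.0), unit_normal(a, b, c, -h, 0.0))]
    ny = [(p - m) / (2 * h)
          for p, m in zip(unit_normal(a, b, c, 0.0, h), unit_normal(a, b, c, 0.0, -h))]
    return nx, ny

a, b, c = 0.5, -1.0, 2.0            # arbitrary sample coefficients (assumption)
dn_dx, dn_dy = gauss_map_derivative(a, b, c)

# Determinant of the 2x2 block formed by the x- and y-components, as in (23).
det = dn_dx[0] * dn_dy[1] - dn_dx[1] * dn_dy[0]
print(dn_dx, dn_dy)                 # compare with [-2a, -b, 0] and [-b, -2c, 0]
print(det, 4 * a * c - b * b)       # the two numbers should nearly agree
```

With these coefficients the determinant is positive, which matches a bowl-shaped surface; choosing coefficients with $4ac < b^2$ gives a negative value, matching a saddle.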
6. Intrinsic aspects of the Gauss curvature

In the discussion leading to the definition of the Gauss curvature, we stumbled across a surprising relationship. If we remove a $\theta$-wedge to form a cone, then the total curvature of the vertex of that cone is also $\theta$. Imagine that you are a 2-dimensional person living on the surface of the cone, who is completely unaware of a third dimension. Without a concept of a third dimension, the sharpness of the vertex would be completely outside of your experiences, just as concepts requiring a fourth dimension lie outside of our 3-dimensional minds. You would perhaps notice that circles around the vertex have smaller circumferences than circles that do not contain the vertex. From this you might be able to see that the vertex of the cone only has radian measure $2\pi - \theta$ around it, while there are $2\pi$ radians around every other point. We will say that the sharpness of the cone is extrinsic (seen from the outside), and the fact that a $\theta$-wedge is missing is intrinsic (seen from the inside).

This illustrates an interesting difference between curves and surfaces. The total curvature of a curve is purely extrinsic, since a 1-dimensional person living in a curve would be totally unaware of it. The total curvature of a surface is both extrinsic and intrinsic. It is equally measurable from outside the surface and from within it. This idea, which originates with Descartes and Gauss, is exploited by Riemann and others, in particular Einstein, to show that while curvature is basically an extrinsic concept, it is possible to talk about the curvature of our space without there being more dimensions. We can illustrate some aspects of this by looking at the geometry of geodesics on a cone.

CHAPTER 3
Intrinsic Curvature
1. Parallel vectors

Understanding what it means for two vectors in the plane to be parallel is hardly an issue. It is even difficult to explain the concept, since the concept of parallel vectors seems so obvious. Imagine taking a vector in the plane based at the origin. If you were to move it to some other point without altering its direction, then few would argue with the claim that the result is a vector parallel to the original. At issue, however, is what it means for the direction to remain the same and how you would know.

For vectors tangent to a sphere, on the other hand, it is impossible in most cases to move the vector in a way that keeps the vector tangent to the sphere and does not change its direction. Here the concept of direction is taken from the direction of a vector in Euclidean 3-space, which most of us would think is intuitively clear. If we were to restrict our attention to the surface of the sphere, and make no reference to an ambient space, the issue is much less clear. If we were two-dimensional creatures living on the sphere with no awareness of an ambient space, we probably would have some notion of moving an object without rotating it. This notion cannot be consistent with the notion of direction in 3-space mentioned above. One possible basis for such a notion is the concept of parallel transport.

Consider the three vectors shown in Figure 1. The angle between each vector and the straight line is the same angle $\theta$. This is consistent with our intuitive notion that all three vectors are parallel. We can phrase this as a trivial axiom: If we move a vector along a straight line and keep the angle between the vector and line constant, then the resulting vector is parallel to the original.
Figure 1. Moving a vector without changing its direction.
If you were a 2-dimensional creature on the sphere, then a great circle would be the object for you corresponding to a straight line. This would be a curve that turns neither to the right nor left. In other words, it does not change direction (as far as you are concerned). If you were to move a vector along a great circle at a constant angle, then you must conclude that the vector did not rotate, and the resultant vector is therefore parallel to the original. If this sphere sat in a 3-dimensional Euclidean space (and there is no real reason to assume that it did), then a 3-dimensional Euclidean creature would see this differently. One of the fundamental notions of the study of manifolds is that the 3-dimensional Euclidean view is not necessarily the correct one. It is simply one of many.

One very important aspect of this notion of parallel vectors on the sphere is that it is dependent on path. We can see this in the following example. Figure 2 shows parts of four great circles. One is the equator, two meet the equator at right angles, and a fourth intersects one of the vertical great circles at a right angle. This forms a quadrilateral with three right angles. We know that the angle sum of this quadrilateral must be greater than $2\pi$, so the fourth angle must be larger than a right angle. In fact, if this sphere has radius 1, then the difference between this angle and a right angle must be equal to the area of the quadrilateral. The quadrilateral is shown in a flattened version to give a different view in Figure 3.

We will perform a parallel transport from vertex $A$ to vertex $C$ two ways: first from $A$ to $B$ to $C$, and then from $A$ to $D$ to $C$. Suppose the vector under question is tangent to side $AD$ at $A$ and points towards $D$. It is perpendicular to side $AB$, so as we parallel transport it to $B$, it maintains this right angle. This results in a vector tangent to $BC$ at $B$.
Parallel transport along BC entails maintaining a zero angle, and so the resultant vector at vertex C will still be tangent to BC. On the other hand, if we parallel transport to vertex D first, we get a vector tangent to AD at vertex D. This is perpendicular to side DC, so this right angle is maintained as we slide it upwards to vertex C. The result is a vector at C that is perpendicular to DC. We have, therefore, two vectors at C that have equal claim to being parallel to the original vector at A. With a Euclidean bias, we might conclude that this contradiction proves that parallel transport is a flawed concept. From a more enlightened manifold view, however, we would just say that parallel transport is independent of path in Euclidean spaces.
Figure 2. A quadrilateral on the sphere.

Figure 3. The quadrilateral flattened.
Even from a fundamentalist Euclidean point of view, the notion of parallel transport has some value. A few simple calculations show that the angle marked $\theta$ in Figure 3 is equal to the difference between the angle sum of the quadrilateral and $2\pi$. This is equal to the total curvature contained within the quadrilateral. This provides a way of computing total curvature, and is basically equivalent to using the angle sum, the turning angles, or the total rotation of a tangent vector.

We are headed towards a way of computing (actually defining) curvature using derivatives both extrinsically and intrinsically. We will be imposing coordinate systems on surfaces, which, very roughly, means imposing a grid system (à la graph paper) on the surface. In other words, we will be breaking the surface into tiny quadrilaterals, and the two parallel transported vectors just mentioned will have natural interpretations corresponding to second and third derivatives.
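The path dependence described above can be tested numerically on the unit sphere. The following sketch is built on assumptions that are not in the text: the coordinates chosen for $A$, $B$, $C$, $D$, the parameters `alpha` and `beta`, and all function names are illustrative. It uses the fact that parallel transport along a great-circle arc from $P$ to $Q$ is the rotation about the axis $P \times Q$ that carries $P$ to $Q$, transports a vector from $A$ to $C$ along both routes, and compares the angle between the two results with the area of the quadrilateral computed by a separate integral.

```python
import math

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def transport(v, P, Q):
    """Parallel transport of the tangent vector v from P to Q along the shorter
    great-circle arc: rotate about the axis P x Q by the angle from P to Q."""
    axis = cross(P, Q)
    s = math.sqrt(dot(axis, axis))
    k = tuple(p / s for p in axis)
    t = math.atan2(s, dot(P, Q))
    kxv, kv = cross(k, v), dot(k, v)
    return tuple(v[i] * math.cos(t) + kxv[i] * math.sin(t) + k[i] * kv * (1 - math.cos(t))
                 for i in range(3))

def quad_area(alpha, beta, steps=100_000):
    """Area between the equator and the great circle tan(lat) = tan(beta) cos(lon),
    for longitudes 0..alpha, by the midpoint rule on the integral of sin(lat)."""
    dl = alpha / steps
    total = 0.0
    for j in range(steps):
        t = math.tan(beta) * math.cos((j + 0.5) * dl)
        total += t / math.sqrt(1 + t * t)      # sin(lat) expressed through tan(lat)
    return total * dl

alpha, beta = 1.0, 0.5                          # sample side parameters (assumption)
A = (1.0, 0.0, 0.0)
D = (math.cos(alpha), math.sin(alpha), 0.0)     # on the equator
B = (math.cos(beta), 0.0, math.sin(beta))       # on the meridian through A
c_raw = (math.cos(beta) * math.cos(alpha), math.cos(beta) * math.sin(alpha),
         math.sin(beta) * math.cos(alpha))
norm = math.sqrt(dot(c_raw, c_raw))
C = tuple(p / norm for p in c_raw)              # where the fourth circle meets the meridian through D

v0 = (0.0, 1.0, 0.0)                            # tangent to AD at A, pointing towards D
via_B = transport(transport(v0, A, B), B, C)
via_D = transport(transport(v0, A, D), D, C)
theta = math.acos(max(-1.0, min(1.0, dot(via_B, via_D))))
print(theta, quad_area(alpha, beta))            # the two values should nearly agree
```

The rotation-based transport is a convenient device for great-circle arcs only; along a general curve, parallel transport would have to be computed differently.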
CHAPTER 4
Functions
1. Introduction

The point of this book is to gain some understanding of the geometry of space, in particular, a space that we could live in. With that in mind, we will want to assume that the functions we use to describe these spaces are basically well-behaved. This chapter will explain what we mean by that.

In order to gain an understanding of the curvature of three-dimensional space, we will first explore the curvature of lower-dimensional spaces. This approach should seem reasonable in that the lower-dimensional spaces are simpler, but they also require concepts that generalize to the three-dimensional case. A less obvious class of relevant spaces will also provide us with significant insight into the geometry of all the spaces just mentioned. These are spaces with isolated singularities. These singularities will include sharp corners in graphs and cone points on surfaces. The functions we consider, therefore, will have these kinds of characteristics.

One basic principle that we will try to exploit is that linear functions are easier to understand than functions in general, and that straight lines are easier to understand than general curves. Furthermore, finite and discrete objects are easier to understand than are infinite and continuous ones. The general aim of this book is to use what we know about the easier-to-understand objects to gain some insight into the harder-to-understand ones. This chapter takes this approach to the study of functions.

The graph of a function of two variables is shown in Figure 1. This graph was produced by a software package called Maple, and we can simplistically describe the process that Maple used as follows. A $25 \times 25$ grid is imposed on the $[\, -0.1, 0.1 \,] \times [\, -0.1, 0.1 \,]$ portion of the domain, and function values are computed at each of the lattice points. This provides Maple with the coordinates for 676 points on the surface (a $25 \times 25$ array of squares makes a $26 \times 26$ array of lattice points).
For each line segment in the grid, Maple draws a line segment between the appropriate points on the surface, essentially projecting the grid lines onto the surface. Looking at the surface in Figure 1, it appears that the Maple graph is a good representation of the surface. The fact that we are looking at a set of straight line segments is not overly obtrusive, and it is easy to accept that this is a representation of a nicely curving surface. It is not inconceivable, therefore, that the line segments themselves contain useful information about the underlying surface.
Figure 1. The graph of a function of two variables $z = f(x, y)$.

2. Piecewise-Linear Approximations for Functions of One Variable

We are not interested in studying functions in general, but in how we can use functions to describe and understand geometry. We will therefore take a very constrained view of piecewise-linear approximations. While we will use functions that are defined on the entire real line, at any one time we will generally only be interested in that function over some closed interval. If we were going to graph a function, for example, we would typically only graph part of it. So we will define a piecewise-linear approximation for real-valued functions over closed intervals.
Definition 3. Let $f : [\, a, b \,] \to \mathbb{R}$. For some positive integer $n$, we divide $[\, a, b \,]$ into $n$ equal subintervals, $[\, x_1, x_2 \,], [\, x_2, x_3 \,], \ldots, [\, x_n, x_{n+1} \,]$. This collection of subintervals will be called a partition, the interval $[\, x_i, x_{i+1} \,]$ will be called the $i$-th subinterval, and the common length of the subintervals $\Delta x = x_{i+1} - x_i = \frac{b - a}{n}$ will be called the mesh. For each subinterval, the line segment from $(x_i, f(x_i))$ to $(x_{i+1}, f(x_{i+1}))$ will be called the $i$-th segment. The piecewise-linear (PL) approximation of $f$ is the function $\hat{f} : [\, a, b \,] \to \mathbb{R}$ whose graph coincides with the collection of all $n$ segments.

Example 1. Consider the function $f(x) = x^2$. The PL approximation of $f$ with $n = 4$ has lattice $\{ -2, -1, 0, 1, 2 \}$ and mesh $\Delta x = 1$. The four segments and the graph of $f$ are shown in Figure 2. We generally will not make specific use of a formula for $\hat{f}$, but it is possible to come up with one. In this case, for example, we have

\[ \hat{f}(x) = \begin{cases} -3x - 2 & \text{for } x \in [\, -2, -1 \,], \\ -x & \text{for } x \in [\, -1, 0 \,], \\ x & \text{for } x \in [\, 0, 1 \,], \\ 3x - 2 & \text{for } x \in [\, 1, 2 \,]. \end{cases} \tag{26} \]
Figure 2. The PL approximation of $f(x) = x^2$ with a mesh of 1.

The PL approximation of $f$ with $n = 10$ gives a more accurate looking graph, as shown in Figure 3.

Figure 3. The PL approximation of $f(x) = x^2$ with a mesh of 0.4.
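The construction in Definition 3 is easy to carry out by machine. The following is a minimal sketch, assuming a hypothetical helper named `pl_approximation` and a sampling grid chosen only for illustration. It builds the PL approximation of $f(x) = x^2$ on $[\, -2, 2 \,]$, evaluates the $n = 4$ version at the midpoint of each subinterval so the values can be compared with formula (26), and estimates the largest gap between $f$ and its PL approximation for $n = 4$ and $n = 10$.

```python
def pl_approximation(f, a, b, n):
    """Return the PL approximation of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n                            # the mesh
    xs = [a + i * dx for i in range(n + 1)]     # lattice points x_1, ..., x_{n+1}
    ys = [f(x) for x in xs]

    def f_pl(x):
        if not a <= x <= b:
            raise ValueError("x lies outside [a, b]")
        i = min(int((x - a) / dx), n - 1)       # index of a subinterval containing x
        slope = (ys[i + 1] - ys[i]) / dx        # slope of the i-th segment
        return ys[i] + slope * (x - xs[i])

    return f_pl

def square(x):
    return x * x

pl4 = pl_approximation(square, -2.0, 2.0, 4)
pl10 = pl_approximation(square, -2.0, 2.0, 10)

midpoints = [-1.5, -0.5, 0.5, 1.5]
print([pl4(x) for x in midpoints])              # [2.5, 0.5, 0.5, 2.5], matching (26)

grid = [-2.0 + 4.0 * j / 1000 for j in range(1001)]
err4 = max(abs(pl4(x) - square(x)) for x in grid)
err10 = max(abs(pl10(x) - square(x)) for x in grid)
print(err4, err10)                              # the n = 10 gap is the smaller one
```

For this particular $f$, the gap on each segment is largest at the segment's midpoint, which is why the smaller mesh gives the visibly better graph in Figure 3.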
As can be seen in Example 1, increasing $n$ (or equivalently decreasing $\Delta x$) makes $\hat{f}$ more closely resemble $f$. We will want to make the assumption that the difference between the two functions can be made arbitrarily small. This is not necessarily the case, as can be seen in the next example, but we will be able to make assumptions of this type by restricting our attention to sufficiently nice functions, as will be explored throughout this chapter.

Example 2. As can be seen in Figure 4, the PL approximation experiences some difficulty in following the graph of $f(x) = x \sin\frac{1}{x}$ near $x = 0$, even with $n = 100$. There are an infinite number of oscillations in the graph in any interval about $x = 0$, so there is no way that a finite number of segments can portray this to any great degree of satisfaction. As mentioned, we will seek to avoid functions with characteristics such as these.
3. Uniform Continuity

As we go through this chapter, we will be laying out the conditions we expect our functions to satisfy. Our underlying goal is to build an understanding of smooth geometry, so at the very least, we might expect our functions to be continuous. That we will be making use of PL approximations also speaks to the need for the assumption of continuity. We will make important use of non-continuous functions, but the discontinuities will be isolated and simple in nature. We will be making occasional reference to the definition of continuity, so let us state it here.

Figure 4. The graph of $x \sin\frac{1}{x}$.

Definition 4. For a function $f : [\, a, b \,] \to \mathbb{R}^n$ and an $x \in [\, a, b \,]$, $f$ is continuous at $x$ if for every $\varepsilon > 0$, there is a $\delta > 0$ such that whenever $|\Delta x| < \delta$ (and $x + \Delta x \in [\, a, b \,]$),

\[ | f(x + \Delta x) - f(x) | < \varepsilon. \tag{27} \]

For $n > 1$, we will take $| f(x + \Delta x) - f(x) |$ to mean the magnitude of this difference as vectors. If this definition is satisfied at each $x \in [\, a, b \,]$, we will say that $f$ is continuous on $[\, a, b \,]$.

Note that if $f$ is continuous on an interval, this definition allows a different $\delta$ for each $x$. This will be more than a little inconvenient for us, so we would like a slightly stronger notion of continuity. This will be the following.

Definition 5. For a function $f : [\, a, b \,] \to \mathbb{R}^n$, $f$ is said to be uniformly continuous on $[\, a, b \,]$ if for every $\varepsilon > 0$, there is a $\delta > 0$ such that for every $x \in [\, a, b \,]$, whenever $|\Delta x| < \delta$ (and $x + \Delta x \in [\, a, b \,]$),

\[ | f(x + \Delta x) - f(x) | < \varepsilon. \tag{28} \]

The main point of this definition is that the same $\delta$ works for any $x$. From our experience in calculus, we are familiar with a wide range of continuous functions and a few non-continuous ones. We may not, however, be as comfortable with determining which functions are uniformly continuous. It turns out that this will not be a concern, since we will almost exclusively be interested in functions over some closed interval. This is a result of the following theorem, about which more details can be found in a book on topology or real analysis.

Theorem 4. Let $f : A \subset \mathbb{R}^m \to \mathbb{R}^n$ be a continuous function. If $A$ is closed and bounded, then $f$ is uniformly continuous.

Uniform continuity is actually less than we desire. The function of Example 2 is continuous everywhere, and therefore by Theorem 4, is uniformly continuous over any closed interval about $x = 0$. So while no graph can capture the oscillations present near $x = 0$, since the magnitudes of the oscillations become very small, the graph can stay close to the sampling points. Requiring uniform continuity, therefore, will not exclude all of the functions we would want to exclude. It should be emphasized that we want uniform continuity for use in proofs, not necessarily to exclude bad functions.
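As a small worked illustration of how a single $\delta$ can serve every $x$, consider the function of Example 1 on the interval $[\, -2, 2 \,]$. The choice of this function and interval is ours, made only because the estimate is short.

```latex
% Worked example (an illustration, not from the text): f(x) = x^2 on [-2, 2].
% Both x and x + \Delta x lie in [-2, 2], so |x + (x + \Delta x)| \le 4.
\[
  | f(x + \Delta x) - f(x) |
    = | \Delta x | \, | 2x + \Delta x |
    = | \Delta x | \, | x + (x + \Delta x) |
    \le 4 \, | \Delta x | .
\]
% Hence, for a given \varepsilon > 0, the single choice
\[
  \delta = \frac{\varepsilon}{4}
\]
% satisfies inequality (28) at every x in [-2, 2] simultaneously.
```

The bound $4$ comes entirely from the endpoints of the interval, which is one way to see why restricting to a closed and bounded interval matters in Theorem 4.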
4. Differentiation in One Variable

All of our work with differentiation will extend the basic notion of the derivative studied in calculus. We will begin with the definition.

Definition 6. For the function $f : [\, a, b \,] \to \mathbb{R}$, and for any $x \in (\, a, b \,)$, let $f'(x)$ be defined by

\[ f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}, \tag{29} \]

if the limit exists. For the endpoints $a$ and $b$, the derivative is defined by

\[ f'(a) = \lim_{\Delta x \to 0^+} \frac{f(a + \Delta x) - f(a)}{\Delta x}, \tag{30} \]

\[ \text{and} \quad f'(b) = \lim_{\Delta x \to 0^-} \frac{f(b + \Delta x) - f(b)}{\Delta x}, \tag{31} \]

where the first involves a limit from the right and the second a limit from the left. These will sometimes be specifically referred to as right- and left-sided derivatives. At each $x$ for which the derivative exists (including $a$ and $b$), we will say that $f$ is differentiable at $x$. If $f$ is differentiable at every point of $[\, a, b \,]$, we will say that $f$ is differentiable on $[\, a, b \,]$.

At a particular value of $x$, the number $f'(x)$ is typically associated with the slope of a tangent line. Let us look a bit at what that means. If the limit in (29) exists, then for any $\varepsilon > 0$, there is a $\delta > 0$ such that as long as $0 < | \Delta x | < \delta$,

\[ \left| f'(x) - \frac{f(x + \Delta x) - f(x)}{\Delta x} \right| < \varepsilon. \tag{32} \]

In other words, for any $\varepsilon$, there is an interval $(-\delta, \delta)$ such that

\[ | f'(x) \Delta x + f(x) - f(x + \Delta x) | < \varepsilon | \Delta x | \tag{33} \]

for any nonzero $\Delta x$ in this interval. This tells us that the linear function $t(\Delta x) = f'(x) \Delta x + f(x)$ is a reasonable approximation to the function $F(\Delta x) = f(x + \Delta x)$ for a fixed value of $x$. Since $\varepsilon$ is arbitrary, no other linear function will fit as well. If $f$ is differentiable at $x$, therefore, then there is a unique tangent line that fits the curve better than any other line. Increasing or decreasing the slope, as in Figure 5, results in a line that does not fit as well, so intuitively, we see a certain amount of symmetry. If the limit in the derivative definition does not exist, then a line cannot be singled out as fitting better than the rest. In Figure 6, we see a point of non-differentiability where a single line cannot fit the curve on both sides of the point as we are accustomed to seeing in a tangent line, and the most “symmetric” line does not fit the curve very well at all. We do see lines that look tangent on one side of the non-differentiable point or the other, so the function may be differentiable from the right or left at this point.

Figure 5. Lines with slopes different from the derivative do not fit as well.

Figure 6. At a non-differentiable point, there is no single best linear approximation, and no single line fits very well.
The tangent line, if it exists, is closely associated with a linear function. Since it is the slope of this function that is most important to us, we will often talk about this linear function in terms of a coordinate system whose origin is at the point $(x, f(x))$.

Definition 7. If $f$ is differentiable at the point $x$, then the differential of $f$ at $x$ is the linear function

\[ df(dx) = f'(x) \, dx. \tag{34} \]

Note that the variable names in this coordinate system are $dx$ and $dy$ and that the expression $f'(x)$ is a constant. Note that for the differential, the origin for the $dx\,dy$-coordinate system need not be thought of as being at the point $(x, f(x))$ as shown in Figure 7. Compare this with the notion of putting the base of a tangent vector at a relevant point on the graph. In fact, the differential (as well as the tangent line) can be identified with the collection of all possible tangent vectors. As a result, where a vector has both a direction and a magnitude, the differential has only direction. We will use the differential, therefore, as a way of generalizing slope to higher dimensions.

Figure 7. We can think of the origin of the $dx\,dy$-coordinate system as being based at the relevant point on the curve, but we don’t have to.
Example 3. Consider the function $f(x) = x^2 + 1$. Its derivative is $f'(x) = 2x$, and $f'(0) = 0$. The slope of the tangent line at the point $(0, 1)$ is therefore $0$, and the equation of the tangent line is $t(x) = 0x + 1$ (note that the differential at this point is $df = 0 \, dx$). For $\varepsilon = .1$ in (33), the graph of $f$ must lie between the lines $y = \pm .1x + 1$ over some interval about $x = 0$. This is shown in Figure 8. No matter how small we make $\varepsilon$, there will be some interval about $x = 0$ in which the parabola lies between the lines $y = \pm \varepsilon x + 1$. This is a geometric description of our concept of a tangent line.

Figure 8. For the function $f(x) = x^2 + 1$, the line $y = 1$ is tangent to the curve at $x = 0$. Here the graph of $f$ lies between the lines $y = \pm .1x + 1$ over some interval close to $x = 0$.
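For this particular function the interval in question can be found explicitly. The short computation below is offered as an illustration of inequality (33) at $x = 0$; the conclusion that $\delta = \varepsilon$ works is specific to this example.

```latex
% Worked check of (33) for f(x) = x^2 + 1 at x = 0, where f'(0) = 0 and f(0) = 1.
\[
  | f'(0) \, \Delta x + f(0) - f(0 + \Delta x) |
    = | 0 + 1 - (\Delta x^2 + 1) |
    = \Delta x^2 .
\]
% The requirement \Delta x^2 < \varepsilon |\Delta x| holds exactly when
\[
  0 < | \Delta x | < \varepsilon ,
\]
% so for this function we may take \delta = \varepsilon.
% With \varepsilon = .1, the parabola lies between y = \pm .1x + 1 for |x| < .1.
```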
Example 4. Differentiable functions allow graph behavior that lies beyond what we would like to consider. For example, note that the function $g(x) = -x^2 + 1$ has the same tangent line at $x = 0$ as the function $f$ just mentioned. It follows that any function that lies between $f$ and $g$ must also have the same tangent line. Consider the function $h$ defined below and graphed in Figure 9 with $f$, $g$, and $t$.

\[ h(x) = \begin{cases} x^2 \sin\frac{1}{x^2} + 1, & x \neq 0, \\ 1, & x = 0. \end{cases} \tag{35} \]

Clearly $h$ lies between any pair of lines $y = \pm \varepsilon x + 1$ over some interval, since both $f$ and $g$ do. It appears in the graph depicted in Figure 9, however, that tangents to $h$ near $x = 0$ can have slopes of large magnitude. This is confirmed by a computation of the derivative. The derivative of $h$ away from zero can be found using the basic techniques of calculus, and the derivative at zero is the slope of the tangent line just described, so the derivative of $h$ must be

\[ h'(x) = \begin{cases} 2x \sin\frac{1}{x^2} - \frac{2}{x} \cos\frac{1}{x^2}, & x \neq 0, \\ 0, & x = 0, \end{cases} \tag{36} \]

and it is clear that $h'(x)$ takes large values arbitrarily close to $x = 0$. The slopes of the tangent lines to $h$ vary so wildly that $h'$ is not even continuous at $x = 0$.

Figure 9. The graph of $h$ lies between the graphs of $f$ and $g$, so it has the same tangent at $x = 0$.

The point of this book is to study the relationship between curvature and the geometry of curves and surfaces and to understand what it might mean for the universe in which we live to have curvature. As a result, we are most interested in objects that curve very gently. As this last example illustrates, differentiability alone does not guarantee the gentle curving we desire. The oscillations that are seen in the graph of Figure 9 are not really the problem. The problem is that the oscillations become more wild as we approach $x = 0$, and this is sufficient to make the derivative $h'$ not continuous. This particular example can be eliminated, of course, if we only considered functions with continuous derivatives.

As we have just seen in Example 4, a function can be differentiable with a non-continuous derivative. We do not want to consider functions that are this wild, so we will require that our functions have continuous derivatives unless specifically noted otherwise. Such functions are called continuously differentiable or $C^1$.
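The claim that $h'$ takes large values arbitrarily close to $x = 0$ can be made concrete by sampling formula (36) at well-chosen points. In the sketch below, the function name `h_prime` and the two families of sample points are our own choices for illustration: at $x = 1/\sqrt{2\pi k}$ the cosine in (36) equals $1$, and at $x = 1/\sqrt{(2k+1)\pi}$ it equals $-1$, so the slopes are close to $\mp 2/x$.

```python
import math

def h_prime(x):
    """The derivative (36) of h at a point x != 0."""
    return 2 * x * math.sin(1 / x**2) - (2 / x) * math.cos(1 / x**2)

def extreme_slopes(k):
    """Sample points near 0 where cos(1/x^2) is +1 or -1, with the slopes there."""
    x_down = 1 / math.sqrt(2 * math.pi * k)        # cos(1/x^2) = 1, slope near -2/x
    x_up = 1 / math.sqrt((2 * k + 1) * math.pi)    # cos(1/x^2) = -1, slope near +2/x
    return (x_down, h_prime(x_down)), (x_up, h_prime(x_up))

for k in (1, 100, 10_000):
    (xd, sd), (xu, su) = extreme_slopes(k)
    print(f"x = {xd:.5f}: slope {sd:.2f}    x = {xu:.5f}: slope {su:.2f}")
```

As $k$ grows, the sample points approach $0$ while the slopes grow without bound in both directions, which is exactly what prevents $h'$ from being continuous at $x = 0$.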
Definition 8. For a function $f : [\, a, b \,] \to \mathbb{R}$, if $f$ is differentiable on $[\, a, b \,]$ and $f'$ is continuous on $[\, a, b \,]$, then $f$ is said to be continuously differentiable on $[\, a, b \,]$. We will also use the notation $C^1$ for continuously differentiable functions. If the second derivative is also continuous (more specifically, if $f'$ is continuously differentiable), then $f$ is $C^2$. Similarly, we may speak of functions that are $C^n$ for any positive integer $n$, or even $C^\infty$ ($f$ and all of its derivatives are continuously differentiable). The notation $C^0$ is sometimes used to describe continuous functions.

It is dangerous to place too much weight on what a differentiable or a continuously differentiable function might look like, but in general, we can think of a $C^1$ function as looking smoother than a function that is differentiable, but not $C^1$. A $C^2$ function would look smoother still, but the differences become much more subtle as we consider higher levels of continuous differentiability.

For a function $f$ to be differentiable at $x$, we consider the slopes of secant lines. We can imagine ourselves at the point $(x, f(x))$ seeing an object approaching us along the graph. The expression

\[ \frac{f(x + dx) - f(x)}{dx} \tag{37} \]

describes the observed direction we look in to see the object. For $f$ to be differentiable, we would expect this direction to have a limit, and this limit would agree with the direction for an object approaching from the opposite direction. The limit would correspond to the directions determined by the tangent to the curve. If the object were a car with its headlights on, the direction the headlights pointed in would correspond to the tangent line at the point of the graph occupied by the car. These directions correspond to the values $f'(x + dx)$. For the function $h$ described above, the direction of the headlights would swing wildly back and forth between directions perpendicular to the tangent. If $f$ is continuously differentiable, then these values must limit on $f'(x)$. In other words, the direction the headlights point must approach the direction of the tangent line, and they would always be pointing in your general direction.
5. Derivatives and PL Approximations

Given a PL approximation of a function $f$, the segments are each a portion of a secant line. At each individual lattice point, the slope of the secant through $(x_i, f(x_i))$ and $(x_i + \Delta x, f(x_i + \Delta x))$ converges to the derivative as $\Delta x \to 0$. We would expect, therefore, that for very small values of the mesh, the slopes of the segments will very closely approximate the derivatives at the lattice points. We can establish this easily with reference to the Mean Value Theorem, which we state here.

Mean Value Theorem. If $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there is a point $c \in (a, b)$ such that

\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \tag{38} \]

Let $f$ be a continuously differentiable function on $[a, b]$, and consider the PL approximation of $f$ with mesh $\Delta x$ and lattice points $\{ x_1, x_2, \ldots, x_{n+1} \}$. On any particular segment, the Mean Value Theorem states that there must be a $c_i \in (x_i, x_{i+1})$ such that

\[ f'(c_i) = \frac{f(x_{i+1}) - f(x_i)}{\Delta x}. \tag{39} \]
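Since the function $f'$ is continuous on $[a, b]$, it is uniformly continuous, so given any $\varepsilon > 0$, there is a $\delta > 0$ such that if $\Delta x < \delta$, then, because $|c_i - x_i| < \Delta x$, we have $|f'(c_i) - f'(x_i)| < \varepsilon$ for all $i$. It follows that, for all $i$,

\[ \left| f'(x_i) - \frac{f(x_{i+1}) - f(x_i)}{\Delta x} \right| < \varepsilon. \tag{40} \]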
We can conclude, therefore, that the slopes of the segments of the PL approximation are good approximations of the derivatives of $f$ at the lattice points, and that the error can be made arbitrarily small by reducing the mesh.
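The shrinking error can be observed directly. The following is a small sketch, not part of the original argument: the helper name `pl_slope_error` and the choice of $f = \sin$ on $[0, \pi]$ are assumptions made here for illustration.

```python
import math

def pl_slope_error(f, fprime, a, b, n):
    """Largest gap between f'(x_i) and the slope of the i-th PL segment."""
    dx = (b - a) / n
    worst = 0.0
    for i in range(n):
        xi = a + i * dx
        slope = (f(xi + dx) - f(xi)) / dx  # slope of the segment starting at x_i
        worst = max(worst, abs(fprime(xi) - slope))
    return worst

# Refine the mesh and record the worst-case error at the lattice points.
errors = [pl_slope_error(math.sin, math.cos, 0.0, math.pi, n) for n in (10, 100, 1000)]

for n, e in zip((10, 100, 1000), errors):
    print(f"n = {n:5d}   max |f'(x_i) - slope_i| = {e:.6f}")
```

Each refinement of the mesh reduces the worst-case gap between the segment slopes and the derivatives.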
Definition 9. We will define the PL differential of $f$ at $x_i$ (and, when it is convenient, at any point in $[x_i, x_{i+1})$) to be

\[ Df(x_i) = f(x_{i+1}) - f(x_i). \tag{41} \]

[Figure 10. The vector $Df$. Labels: $Df(x_i)$, $\Delta x$.]
Note that

\[ \lim_{\Delta x \to 0} \frac{Df(x_i)}{\Delta x} = f'(x_i), \tag{42} \]

where $\Delta x = \frac{b-a}{n}$ for positive integers $n$, and $n \to \infty$. If the function $f$ were constant, its graph would be a horizontal straight line, and the PL differential would be 0. As shown in Figure 10, the PL differential measures the increase in $f$ (or the decrease) as we move from one lattice point to the next.

We can talk about the function $f''$ in a similar way. If $f$ is $C^2$, then for each $i$, there is a $c_i' \in (x_i, x_{i+1})$ such that

\[ f''(c_i') = \frac{f'(x_{i+1}) - f'(x_i)}{\Delta x}. \tag{43} \]

Since $f''$ is continuous, there is a $\delta' > 0$ smaller than the $\delta$ mentioned above such that if $\Delta x < \delta'$,

\[ | f''(c_i') - f''(x_i) | < \varepsilon. \tag{44} \]

Since
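\[ \left| f'(x_i) - \frac{f(x_{i+1}) - f(x_i)}{\Delta x} \right| < \varepsilon, \tag{45} \]

and

\[ \left| f'(x_{i+1}) - \frac{f(x_{i+2}) - f(x_{i+1})}{\Delta x} \right| < \varepsilon, \tag{46} \]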
we can conclude that

\[ \left| f''(x_i) - \frac{\frac{f(x_{i+2}) - f(x_{i+1})}{\Delta x} - \frac{f(x_{i+1}) - f(x_i)}{\Delta x}}{\Delta x} \right| = \left| f''(x_i) - \frac{Df(x_{i+1}) - Df(x_i)}{\Delta x^2} \right| < 3\varepsilon. \tag{47} \]
With this in mind, we will make the following definition.

Definition 10. The second PL differential of $f$ is defined as

\[ D^2 f(x_i) = Df(x_{i+1}) - Df(x_i). \tag{48} \]

Note that

\[ \lim_{\Delta x \to 0} \frac{D^2 f(x_i)}{\Delta x^2} = f''(x_i). \tag{49} \]

It should be noted that $D^2 f(x_i)$ is probably a better approximation of $f''(x_{i+1})$ than it is of $f''(x_i)$, but this formulation will be more convenient for us.

[Figure 11. The distance $D^2 f(x_i)$ is equal to the sum of the distances $Df(x_{i+1})$ and $-Df(x_i)$. Labels: A, B, $Df(x_i)$, $Df(x_{i+1})$, $-Df(x_i)$.]

In Figure 13, if the graph of $f$ were a straight line, then we would expect $Df$ to be constant, and $Df(x_i)$ and $Df(x_{i+1})$ would be the same. In this case, the graph would continue to the point marked A. Instead, the graph proceeds to the point marked B. The difference between A and B is the quantity $D^2 f(x_i)$. Therefore, $D^2 f$ and $f''$ measure how much the graph is not a straight line, and so they are measures of curvature in some way. They measure the deviation from straightness in the vertical direction, however. These values will change if the graph is rotated, for example, so they are not convenient quantities to use to describe a curve's shape. Still, they are easy to compute and they contain the information necessary to describe a curve's curvature, so they will be of use to us.
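The limit in (49) can be illustrated with a short computation. This is a sketch under assumptions chosen here for illustration: the function is $f = \exp$, the base point is $x_i = 1$, and the helper name `second_pl_differential` is hypothetical.

```python
import math

def second_pl_differential(f, xi, dx):
    """D^2 f(x_i) = Df(x_{i+1}) - Df(x_i), where Df(x) = f(x + dx) - f(x)."""
    df_i = f(xi + dx) - f(xi)
    df_next = f(xi + 2 * dx) - f(xi + dx)
    return df_next - df_i

x0 = 1.0
meshes = (0.1, 0.01, 0.001)

# D^2 f(x_i) / dx^2 for successively finer meshes.
ratios = [second_pl_differential(math.exp, x0, dx) / dx ** 2 for dx in meshes]

# Distance from f''(x_i); for exp, the second derivative is exp itself.
gaps = [abs(r - math.exp(x0)) for r in ratios]

for dx, r, g in zip(meshes, ratios, gaps):
    print(f"dx = {dx:<6}  D^2 f / dx^2 = {r:.6f}   gap to f''(x_i) = {g:.6f}")
```

For this function the quotient also lies closer to $f''(x_{i+1})$ than to $f''(x_i)$ at the coarsest mesh, which is consistent with the remark following (49).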
6. Parametrizations of Curves
[Figure 12. The vector $Df$. Labels: $(x_i, f(x_i))$, $(x_{i+1}, f(x_{i+1}))$, $Df(x_i)$.]

[Figure 13. The vector $D^2 f$. Labels: $Df(x_i)$, $Df(x_{i+1})$, $D^2 f(x_1)$.]
7. Functions of Two Variables

Earlier in the chapter, we discussed briefly the graph of a function of two variables (see Figure 1). The computer representation of the graph consists of a collection of line segments. These segments will play a role in our study of functions of two variables as the segments of a PL approximation did with functions of one variable. The segments form a grid on the surface, breaking the surface into quadrilaterals that we will call grid parallelograms. In general, these grid parallelograms are not true parallelograms, which, among other things, always lie in a plane. In fact, while it would seem natural to use the segments directly to define a PL approximation to a surface, the four vertices of a grid parallelogram generally will not lie in a plane. Since any set of three points is always coplanar, we can, in some sense, fold each grid parallelogram along a diagonal to fit the segments. We can, therefore, approximate the graph of a function of two variables with a collection of flat triangular disks. From this we can naturally find a piecewise linear function.

Definition 11. For the function $f : [a, b] \times [c, d] \to \mathbb{R}$, we can define the piecewise-linear (PL) approximation as follows. Given positive integers $m$ and $n$, we can break $[a, b]$ into $m$ equal subintervals with lattice points $\{ x_1, x_2, \ldots, x_{m+1} \}$, and we can break $[c, d]$ into $n$ equal subintervals with lattice points $\{ y_1, y_2, \ldots, y_{n+1} \}$. From these, we can divide the rectangle $[a, b] \times [c, d]$ into $mn$ equal rectangles $[x_i, x_{i+1}] \times [y_j, y_{j+1}]$ with width $\Delta x$ and height $\Delta y$. We will say that the mesh is $\Delta x \times \Delta y$. The set of rectangles is called the partition, and the points $(x_i, y_j)$ are the lattice points. The $(i, j)$-th rectangle has vertices $(x_i, y_j)$, $(x_{i+1}, y_j)$, $(x_{i+1}, y_{j+1})$, and $(x_i, y_{j+1})$.
There is a flat (planar) triangular disk with vertices $(x_i, y_j, f(x_i, y_j))$, $(x_{i+1}, y_j, f(x_{i+1}, y_j))$, and $(x_i, y_{j+1}, f(x_i, y_{j+1}))$, and there is another flat triangular disk with vertices $(x_{i+1}, y_j, f(x_{i+1}, y_j))$, $(x_{i+1}, y_{j+1}, f(x_{i+1}, y_{j+1}))$, and $(x_i, y_{j+1}, f(x_i, y_{j+1}))$. Together these form the $(i, j)$-th grid parallelogram. The PL approximation of $f$ is the function on $[a, b] \times [c, d]$ whose graph consists of all of the grid parallelograms.

In practice, we will use only the grid segments from the PL approximation, and the pair of triangular disks that make up each grid parallelogram, along with the diagonal between them, will be of only secondary importance. Using the notation $| (x_1, y_1) - (x_2, y_2) |$ for the distance between the two points, we can define continuity for a function of two variables as follows.

Definition 12. For $f : [a, b] \times [c, d] \to \mathbb{R}$, $f$ is continuous at $(x, y)$ if for every $\varepsilon > 0$, there is a $\delta > 0$ such that whenever $| (x, y) - (x + \Delta x, y + \Delta y) | < \delta$, we have

\[ | f(x + \Delta x, y + \Delta y) - f(x, y) | < \varepsilon. \tag{50} \]

If $\delta$ can be chosen independently of the point $(x, y)$, then $f$ is said to be uniformly continuous. By Theorem 4 we see that $f$ is uniformly continuous if it is continuous on $[a, b] \times [c, d]$.
8. Differentiability for Functions of Two Variables

Our notion of differentiability for functions of more than one variable will be based on the concept of a partial derivative. Given a function $f$ in several variables, $f : \mathbb{R}^3 \to \mathbb{R}$ for example, we can take the derivative of $f$ with respect to one of the variables by holding the others constant.
Consider the function $f(x, y) = 3xy + x^4$. Taking $y$ to be constant, we can differentiate with respect to $x$ to obtain the expression $3y + 4x^3$. We will use the notations

\[ \frac{df}{dx} = f_x = f_x(x, y) = 3y + 4x^3 \tag{51} \]

for the partial derivative with respect to $x$. It is common to use curly ∂'s in the notation for partial derivatives, but since we will be using partial derivatives almost exclusively, there is no significant advantage to making a distinction between partial derivatives and regular ones. Of course, the partial with respect to $y$ would be written as

\[ \frac{df}{dy} = f_y = f_y(x, y) = 3x. \tag{52} \]

A small portion of the graph of this function is shown in Figure 14.

[Figure 14. Graph of $f(x, y) = 3xy + x^4$.]

As we have discussed, this depiction of the graph imposes gridlines on the surface. What we see is a collection of straight line segments, each an edge shared by two grid parallelograms. Half of the gridlines correspond to fixed values of $y$ and the other half to fixed values of $x$. For example, if we were to fix $y$ to a value of zero, this would single out those points lying on a curve corresponding to the function values $f(x, 0)$. The points $(x, 0, f(x, 0))$ all lie in the $xz$-plane, and if $y = 0$ is one of the lattice coordinates, the corresponding segments would form a PL approximation of $f(x, 0)$. The partial derivatives $f_x(x, 0)$ can be interpreted as slopes in the $xz$-plane, and these would be approximated by the slopes of the segments. Fixing $y = 1$ singles out the gridline on the closest face of the cube in Figure 14. The values of $f_x(x, 1)$ can be interpreted as slopes in this plane.

For a function $f$ of one variable, being differentiable implies the continuity of $f$. The analogous statement fails for a function of more than one variable: the partial derivatives can exist everywhere while the function itself is not continuous. A standard counterexample is as follows.

\[ f(x, y) = \begin{cases} \frac{xy}{x^2 + y^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases} \tag{53} \]

The partial derivatives of this function are

\[ f_x(x, y) = \begin{cases} \frac{y^3 - x^2 y}{(x^2 + y^2)^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0), \end{cases} \tag{54} \]

\[ f_y(x, y) = \begin{cases} \frac{x^3 - x y^2}{(x^2 + y^2)^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases} \tag{55} \]

The partial derivatives exist at all points, and in particular at $(0, 0)$. The function $f$ is not continuous at $(0, 0)$, however: since $f(t, t) = \frac{1}{2}$ for all $t \neq 0$, there are function values equal to $\frac{1}{2}$ arbitrarily close to $(0, 0)$, while $f(0, 0) = 0$. A portion of the surface is shown in Figure 15. The graph is not accurate around the discontinuity, but some sense of the surface can be obtained from the picture. In fact, discontinuities can often be seen in a graph such as this with badly distorted grid parallelograms near the discontinuity. Note that the $x$- and $y$-axes lie on the surface and that the horizontal lines with points $(t, t, \frac{1}{2})$ and with points $(t, -t, -\frac{1}{2})$ also lie on the surface everywhere except for when $t = 0$. In particular, for any $\delta > 0$ and any particular value between $-\frac{1}{2}$ and $\frac{1}{2}$, there is a point $(x, y)$ within $\delta$ of $(0, 0)$ at which $f(x, y)$ takes that value.

[Figure 15. Graph of $f(x, y) = \frac{xy}{x^2 + y^2}$.]

If we look at a few of the gridlines, we see that these are nicely smooth individually. For example, in Figure 16, graphs in the $xz$-plane corresponding to fixed values of $y = 1, .5, .1, .02$ are shown. Each is the graph of a differentiable function. In fact, they are continuously differentiable as functions of one variable. For $y = 0$, $f(x, 0) = 0$, so this gridline is also nicely smooth. It is the transition to the gridline at $y = 0$ that is not continuous. Considering the gridlines where $x$ is held constant shows a similar situation. What we see, therefore, is that the partial derivatives only address the differentiability of the individual gridlines, so the continuity of the function of two variables is not necessarily guaranteed.

[Figure 16. Gridlines with $y = 1, .5, .1, .02$.]
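The discontinuity at the origin is easy to see numerically by approaching $(0, 0)$ along different lines. The sketch below evaluates the counterexample function along the diagonal, the anti-diagonal, and the $x$-axis; the sample values of $t$ are an arbitrary choice made here.

```python
def f(x, y):
    """f(x, y) = xy / (x^2 + y^2) away from the origin, and 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

# Points approaching the origin: t = 10^-1, 10^-2, ..., 10^-7.
ts = [10.0 ** (-k) for k in range(1, 8)]

diagonal = [f(t, t) for t in ts]        # along y = x
antidiagonal = [f(t, -t) for t in ts]   # along y = -x
x_axis = [f(t, 0.0) for t in ts]        # along the gridline y = 0

for t, d, a, z in zip(ts, diagonal, antidiagonal, x_axis):
    print(f"t = {t:.0e}   f(t, t) = {d:+.3f}   f(t, -t) = {a:+.3f}   f(t, 0) = {z:+.3f}")
```

The three columns stay at three different constant values however small $t$ becomes, so no single limit exists at the origin.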
Our interests lie in the geometry of nice surfaces, and we would like to avoid discontinuities such as the one exhibited by this last example. It turns out that if the partial derivatives are continuous (as functions of two variables), then the original function is also continuous. With this in mind, we make the following definition.

Definition 13. If the partial derivatives of a function $f$ are continuous, we will call $f$ continuously differentiable or $C^1$. If all of the second partial derivatives are also continuous, $f$ is $C^2$. As before, we can also speak of $C^n$ functions in general and $C^\infty$ functions.

We can roughly say that if $n > m$, then $C^n$ functions are smoother than $C^m$ functions. For the most part, we will assume that whenever we speak of the derivative of a function, that derivative will be continuous. We will purposely encounter instances of non-differentiability and non-continuity, and these will be very important and very specific. Otherwise, if we speak of a third partial derivative of a function $f$, for example, we will implicitly assume that $f$ is at least $C^3$.

For the non-continuous function $f$ in the example above, it is difficult to imagine a plane tangent to the surface at $(0, 0)$ (except, perhaps, for a vertical one). This function was not $C^1$, however, and one important consequence of a function of two variables being $C^1$ is that it is always possible to determine a tangent plane in a reasonable way. Let us examine this in some detail, since this will be central to much of what we study.

Consider a function $f : \mathbb{R}^2 \to \mathbb{R}$. If the first partial derivatives exist, then at a point $(a, b)$, $f_x(a, b)$ and $f_y(a, b)$ can be interpreted as being the slopes of lines tangent to the gridlines at $(a, b, f(a, b))$. It should seem reasonable that if a plane were to be tangent to the surface at this point, then the two tangent lines would lie on this tangent plane.
In the example above, this would mean that the $xy$-plane would be the only possible candidate for a tangent plane at the origin, and we would not want to consider this to be the case. No reasonable tangent plane exists in this situation. Again, however, the function in the example was not $C^1$. In any case, we can assume that the plane determined by the two partial derivatives should be the only possible candidate for a tangent plane. We wish to show that if $f$ is $C^1$, then we can reasonably call this plane a tangent plane.

Let us now suppose that $f$ is indeed $C^1$ (as always, we assume that $f$ is differentiable in some region around the point under consideration). For convenience, suppose that $f_x(a, b) = m$ and $f_y(a, b) = n$. We can reasonably call these the $x$- and $y$-slopes at $(a, b)$. The plane determined by these two slopes is the graph of a linear function $t(x, y) = mx + ny + c$, where $c$ is the constant that makes $t(a, b) = f(a, b)$. Consider a nearby point $(a + dx, b + dy)$; we will attempt to estimate the difference between $t(a + dx, b + dy)$ and $f(a + dx, b + dy)$. Our strategy will be to consider $f(a + dx, b)$ first and then $f(a + dx, b + dy)$ (we could just as easily start with $f(a, b + dy)$). We can use the partial derivative $f_x(a, b)$ to estimate $f(a + dx, b)$. The fact that $f_y$ is continuous allows us to estimate $f_y(a + dx, b)$, and this in turn can be used to estimate $f(a + dx, b + dy)$.

Let $\varepsilon > 0$. There is a $\delta_1 > 0$ such that if $|dx| < \delta_1$, then

\[ | f(a, b) + f_x(a, b)\,dx - f(a + dx, b) | = | t(a, b) + m\,dx - f(a + dx, b) | \le \varepsilon |dx|. \tag{56} \]

We can make a similar approximation using $f_y(a + dx, b)$, but this would necessarily depend on $dx$, complicating matters significantly. We can, however, consider $f(a + dx, b + dy) - f(a + dx, b)$. Since $f$ is differentiable with respect to $y$, the Mean Value Theorem tells us that there is an $\eta$ between $b$ and $b + dy$ ($dy$ could be negative) such that

\[ f(a + dx, b + dy) - f(a + dx, b) = f_y(a + dx, \eta)\,dy. \tag{57} \]

The continuity of $f_y$ guarantees the existence of a $\delta_2 > 0$ such that if $(a + dx, \eta)$ is within $\delta_2$ of $(a, b)$, then

\[ | f_y(a + dx, \eta) - f_y(a, b) | = | f_y(a + dx, \eta) - n | < \varepsilon. \tag{58} \]

There is a $\delta > 0$, therefore, such that whenever $(a + dx, b + dy)$ is within $\delta$ of $(a, b)$, we have $|dx| < \delta_1$ and $(a + dx, \eta)$ within $\delta_2$ of $(a, b)$ (the smaller of $\delta_1$ and $\delta_2$ will do, since $\eta$ lies between $b$ and $b + dy$). For a point $(a + dx, b + dy)$ satisfying these conditions, we have

\begin{align*}
& | f(a + dx, b + dy) - t(a + dx, b + dy) | \tag{59} \\
&= | f(a + dx, b + dy) - f(a + dx, b) + f(a + dx, b) - t(a + dx, b + dy) | \tag{60} \\
&= | f(a + dx, b + dy) - f(a + dx, b) + f(a + dx, b) - f(a, b) - m\,dx - n\,dy | \tag{61} \\
&\le | f(a + dx, b + dy) - f(a + dx, b) - n\,dy | + | f(a + dx, b) - f(a, b) - m\,dx | \tag{62} \\
&\le \varepsilon |dy| + \varepsilon |dx| = \varepsilon (|dy| + |dx|) \le 2\varepsilon \sqrt{|dx|^2 + |dy|^2}. \tag{63}
\end{align*}
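The content of this estimate is that the gap between $f$ and the candidate plane $t$ shrinks faster than the distance from $(a, b)$. The following sketch checks this for the earlier example $f(x, y) = 3xy + x^4$; the base point $(1, 2)$ and the direction of approach $dy = dx$ are choices made here for illustration.

```python
import math

def f(x, y):
    return 3 * x * y + x ** 4

def tangent_plane(a, b):
    """Return t(x, y) = f(a, b) + m (x - a) + n (y - b) with m = f_x(a, b), n = f_y(a, b)."""
    m = 3 * b + 4 * a ** 3  # f_x(a, b) for this particular f
    n = 3 * a               # f_y(a, b) for this particular f
    return lambda x, y: f(a, b) + m * (x - a) + n * (y - b)

a, b = 1.0, 2.0
t = tangent_plane(a, b)

# Ratio |f - t| / distance for displacements shrinking toward (a, b).
ratios = []
for k in range(1, 6):
    dx = dy = 10.0 ** (-k)
    err = abs(f(a + dx, b + dy) - t(a + dx, b + dy))
    ratios.append(err / math.hypot(dx, dy))

for k, r in enumerate(ratios, start=1):
    print(f"dx = dy = 1e-{k}   |f - t| / distance = {r:.3e}")
```

The ratio tends to zero, which is what one expects of a plane that deserves to be called tangent.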
The estimate in (59) through (63) can be interpreted as showing that the directional derivative in the direction of the vector $[dx, dy]$ exists. In other words, if we were to consider the curve on the surface containing the points $(a + t\,dx, b + t\,dy, f(a + t\,dx, b + t\,dy))$, then the tangent line to this curve would lie on the plane determined by the two partial derivatives. A third way of saying this is that if $f$ is continuously differentiable, then the set of tangent lines to the surface at $(a, b)$ will form a plane, and this plane is the same as the one determined by $f_x(a, b)$ and $f_y(a, b)$.

We now can state that for a function of two variables defined in some region about $(a, b)$ and continuously differentiable in that region, it is perfectly reasonable to speak of a tangent plane to the graph of the function. Continuous differentiability is not a necessary condition in this regard (see a book in real analysis), but since we will never consider a non-continuous derivative outside of several important special cases, it is reasonable to proceed in this way.

Continuously differentiable functions have one other important property that we will exploit heavily. This is in regards to the second partial derivatives. The partial derivative of $f_x$ with respect to $y$ will be denoted by $f_{xy}$. It is important to note that in $f_{xy}$ we differentiated with respect to $x$ first and then $y$. This same function is denoted

\[ f_{xy} = \frac{d^2 f}{dy\,dx}. \tag{64} \]

With the $\frac{d}{dx}$ notation, we indicate differentiation by placing symbols in front of the function name, so the latest derivative should be furthest left. Even though it is fundamentally important that $C^2$ functions are such that $f_{xy} = f_{yx}$, as we will now investigate, the order of the differentiation is critical to understanding the relationship between geometry and differentiation.

Suppose we have a $C^2$ function $f$ in two variables. At a point $(a, b)$, $f_x(a, b)$ is the slope of the tangent line to the gridline with $y = b$, and $f_y(a, b)$ is the slope of the tangent to the gridline with $x = a$. The second partial derivative $f_{xy}(a, b)$ describes the rate of rotation of the tangent in the $x$-direction as we move it in the $y$-direction. Similarly, $f_{yx}(a, b)$ describes the rate of rotation of the tangent in the $y$-direction as we move it in the $x$-direction. Expressed this way, there appears to be no reason to expect that $f_{xy}(a, b) = f_{yx}(a, b)$. Let us try to understand why this might be the case.
A standard example of a function with unequal cross-partials is as follows.

\[ f(x, y) = \begin{cases} \frac{xy(x^2 - y^2)}{x^2 + y^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases} \tag{65} \]

The graph of this function looks unremarkable (see Figure 17), but anomalies of the second derivative will not be obvious in graphs such as this.

[Figure 17. For this function $f_{xy}(0, 0) \neq f_{yx}(0, 0)$.]

The first partial derivatives for this function are

\[ f_x(x, y) = \begin{cases} \frac{x^4 y - y^5 + 4x^2 y^3}{(x^2 + y^2)^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0), \end{cases} \tag{66} \]

\[ f_y(x, y) = \begin{cases} \frac{x^5 - x y^4 - 4x^3 y^2}{(x^2 + y^2)^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases} \tag{67} \]

Away from $(0, 0)$, the functions $f_{xy}$ and $f_{yx}$ are equal:

\[ f_{xy}(x, y) = f_{yx}(x, y) = \frac{x^6 + 9x^4 y^2 - 9x^2 y^4 - y^6}{(x^2 + y^2)^3}, \quad \text{for } (x, y) \neq (0, 0). \tag{68} \]

This should not be surprising, since all of the second partial derivatives are continuous away from $(0, 0)$. The common graph for the cross-partials is shown in Figure 18, and it is apparent that neither function can be continuous at $(0, 0)$. Both functions have values at $(0, 0)$, however. Since $f_x(0, y) = -y$, the partial derivative of this function with respect to $y$ at $(0, 0)$ must be $f_{xy}(0, 0) = -1$. Similarly, since $f_y(x, 0) = x$, we have that $f_{yx}(0, 0) = 1$.

[Figure 18. This is the graph of $f_{xy}$ or $f_{yx}$. They agree everywhere except at $(0, 0)$.]

The nature of this example indicates that for nice functions, the cross-partials should be expected to be equal. Let us look at why we might believe this to be the case. The grid lines on the graph break the surface into grid parallelograms corresponding to a PL approximation of $f$. In general, these are not actual parallelograms, since the four corners are assumed to lie on a curved surface, and so pairs of opposite sides cannot be expected to be parallel or the same length. For a grid with mesh $\Delta x \times \Delta y$, we can consider the grid parallelogram at a point $(a, b)$. In particular, the four corners are $(a, b, f(a, b))$, $(a + \Delta x, b, f(a + \Delta x, b))$, $(a + \Delta x, b + \Delta y, f(a + \Delta x, b + \Delta y))$, and $(a, b + \Delta y, f(a, b + \Delta y))$. The grid parallelogram is depicted in Figure 19 along with the true parallelogram determined by the grid segments adjacent to $(a, b, f(a, b))$. Let $\mathbf{D}_x f(a, b)$ be the vector from $(a, b, f(a, b))$ to $(a + \Delta x, b, f(a + \Delta x, b))$, as shown in Figure 19. The coordinates of $\mathbf{D}_x f(a, b)$ are
\[ \mathbf{D}_x f = [\Delta x, 0, f(a + \Delta x, b) - f(a, b)]. \tag{69} \]

Compare this vector to the slope of this side of the grid parallelogram,

\[ m = \frac{f(a + \Delta x, b) - f(a, b)}{\Delta x}. \tag{70} \]

We are looking at the slope of a secant line whose limit is $f_x(a, b)$, and the vector $\mathbf{D}_x f(a, b)$ gives us information that is equivalent to this slope. The change in the values of $f$, the third coordinate in $\mathbf{D}_x f$, will be called the PL partial differential of $f$ with respect to $x$, and will be denoted $D_x f$. The vector $\mathbf{D}_y f(a, b)$ similarly contains information about a secant line whose slope approximates $f_y(a, b)$, and we define $D_y f$ in the obvious way.

[Figure 19. A grid parallelogram determined by four points on the surface and the actual parallelogram determined by $\mathbf{D}_x f$ and $\mathbf{D}_y f$. Labels: $f(a, b)$, $f(a + \Delta x, b)$, $f(a, b + \Delta y)$, $f(a + \Delta x, b + \Delta y)$, $\mathbf{D}_x f(a, b)$, $\mathbf{D}_y f(a, b)$.]

Figure 20 shows $\mathbf{D}_y f(a, b)$ and $\mathbf{D}_y f(a + \Delta x, b)$. These reflect a change in $f_y$ as it moves in the $x$-direction. In other words, the difference between these two vectors, divided by $\Delta x \Delta y$, is an approximation of $[0, 0, f_{yx}(a, b)]$, and as long as the limits are sufficiently well-behaved,

\[ \lim_{\Delta x \to 0} \lim_{\Delta y \to 0} \frac{\mathbf{D}_y f(a + \Delta x, b) - \mathbf{D}_y f(a, b)}{\Delta x \Delta y} = [0, 0, f_{yx}(a, b)]. \tag{71} \]
If there were no difference between $\mathbf{D}_y f(a, b)$ and $\mathbf{D}_y f(a + \Delta x, b)$, then $\mathbf{D}_y f(a + \Delta x, b)$ would occupy the opposite side of the true parallelogram shown in Figure 20. The difference between the two vectors must be the vector between the corner of the true parallelogram and the corner of the grid parallelogram corresponding to $(a + \Delta x, b + \Delta y)$, as shown in Figure 21.

[Figure 20. The vectors $\mathbf{D}_y f(a, b)$ and $\mathbf{D}_y f(a + \Delta x, b)$.]

The vector $\mathbf{D}_{xy} f(a, b)$ must have precisely the same geometric interpretation, so as long as the limits are well-behaved, it should be that $f_{xy}(a, b) = f_{yx}(a, b)$.
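Both halves of this discussion can be checked with a few lines of code. The sketch below works with the scalar PL partial differentials (the third coordinates of the vectors). The helper names `Dx`, `Dy`, `Dxy`, `Dyx`, and `quotient` are introduced here for illustration, and the function is the example with unequal cross-partials, $xy(x^2 - y^2)/(x^2 + y^2)$ with value 0 at the origin. Making one increment far smaller than the other is used as a rough stand-in for taking that limit first.

```python
def f(x, y):
    """xy(x^2 - y^2)/(x^2 + y^2) away from the origin, and 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def Dx(g, a, b, dx):
    """PL partial differential of g with respect to x at (a, b)."""
    return g(a + dx, b) - g(a, b)

def Dy(g, a, b, dy):
    """PL partial differential of g with respect to y at (a, b)."""
    return g(a, b + dy) - g(a, b)

def Dyx(g, a, b, dx, dy):
    """Change in D_y g as the base point moves one step in the x-direction."""
    return Dy(g, a + dx, b, dy) - Dy(g, a, b, dy)

def Dxy(g, a, b, dx, dy):
    """Change in D_x g as the base point moves one step in the y-direction."""
    return Dx(g, a, b + dy, dx) - Dx(g, a, b, dx)

def quotient(dx, dy):
    """Mixed second PL differential of f at the origin, divided by dx * dy."""
    return Dyx(f, 0.0, 0.0, dx, dy) / (dx * dy)

# The two mixed PL differentials agree at a lattice point (up to rounding),
# because both combine the same four corner values of the grid parallelogram.
gap = abs(Dyx(f, 0.3, 0.7, 0.01, 0.02) - Dxy(f, 0.3, 0.7, 0.01, 0.02))

# At the origin, the quotient depends on which increment is much smaller.
dy_first = quotient(1e-3, 1e-8)  # dy far smaller than dx
dx_first = quotient(1e-8, 1e-3)  # dx far smaller than dy

print(gap, dy_first, dx_first)
```

The PL differentials are symmetric before any limit is taken; it is only the order of the two limits that separates $f_{yx}(0, 0)$ from $f_{xy}(0, 0)$ for this function, which is the sense in which the limits need to be well-behaved.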
[Figure 21. The vector corresponding to $f_{yx}(a, b)$. Label: $\mathbf{D}_{yx} f(a, b)$.]

CHAPTER 5
The Riemannian Curvature Tensor in Two Dimensions
For a surface parametrized by $\mathbf{x}(u^1, u^2) = \left( x^1(u^1, u^2), x^2(u^1, u^2), x^3(u^1, u^2) \right)$, the Riemannian curvature tensor is defined to be

\[ R^l_{ijk} = \frac{d\Gamma^l_{ik}}{du^j} - \frac{d\Gamma^l_{ij}}{du^k} + \Gamma^p_{ik} \Gamma^l_{pj} - \Gamma^p_{ij} \Gamma^l_{pk}, \tag{72} \]

where $i, j, k, l, p \in \{ 1, 2 \}$, and the Einstein summation convention is used (i.e., since $p$ occurs as both an upper and lower index, it is summed over). Using the notation $\mathbf{x}_i = \frac{d\mathbf{x}}{du^i}$ and $\mathbf{x}_{ij} = \frac{d^2 \mathbf{x}}{du^j\,du^i}$, we define the Christoffel symbols $\Gamma^k_{ij}$ along with the coefficients of the Second Fundamental Form by

\[ \mathbf{x}_{ij} = L_{ij}\,\mathbf{n} + \Gamma^k_{ij}\,\mathbf{x}_k, \tag{73} \]

again using the Einstein summation convention. Note that the $\Gamma^k_{ij}$ describe the tangential change in the tangent vectors $\mathbf{x}_i$ in terms of the $\mathbf{x}_i$, and they can be obtained by making measurements along the surface (i.e., they are intrinsic). The $L_{ij}$ are extrinsic, and the principal curvatures, $\kappa_1$ and $\kappa_2$, can be obtained from them, and so the Gauss curvature, $K = \kappa_1 \kappa_2$, also depends on the $L_{ij}$. The Gauss curvature can also be obtained from $R^l_{ijk}$, in particular

\[ K = \frac{R^l_{121}\, g_{l2}}{g}, \tag{74} \]

where $g_{ij} = \langle \mathbf{x}_i, \mathbf{x}_j \rangle$ and $g = | g_{ij} |$. There are several other choices for the indices that will result in $\pm K$. All quantities used here are intrinsic, so a proof of the relationships stated here also proves Gauss' Theorema Egregium, that Gauss curvature is intrinsic. The $R^l_{ijk}$ contain more curvature information than $K$, and the Riemannian curvature tensor generalizes to higher dimensions, where the Gauss curvature does not.

Motivation for the Riemannian curvature tensor comes from the following observations. What I describe is partly nonsense, but it lays out the basic idea. Suppose a simple closed curve $C$ on the surface bounds a region $S$. The total curvature of $S$ is $\theta = \int_S K\,dA$. If we were to parallel transport a vector around $C$, then the resultant vector would differ from the original by an angle $\theta$. If we had points $A$ and $B$ on the curve, then there are two ways that we could parallel transport a vector along $C$ from $A$ to $B$. The angle between the two resultant vectors would also be $\theta$.

The Riemannian curvature tensor captures this idea using differentials and derivatives. For example, if we were to follow the tangent vector $\mathbf{x}_1$ as it moved a small distance $du^1$ in the $u^1$-direction, and then a small distance $du^2$ in the $u^2$-direction, we would get some vector $\mathbf{x}_1(12)$. We could go the other way, that is, we could move first in the $u^2$-direction, and then the $u^1$-direction. We'll call this $\mathbf{x}_1(21)$. These two vectors will be the same, but if we were to keep track of the rotation of the vectors relative to the surface, we would get a discrepancy of $\theta$, where $\theta$ is the total curvature inside the little parallelogram with sides $du^1$ and $du^2$. The average curvature would be

\[ K \approx \frac{\theta}{\| du^1 \times du^2 \|}. \tag{75} \]

Note also that

\[ \theta \approx \frac{\| \mathbf{x}_1(12) - \mathbf{x}_1(21) \|}{\| \mathbf{x}_1 \|}. \tag{76} \]

Very roughly then, the Riemannian curvature tensor describes the following. Ignoring the normal component of the change in $\mathbf{x}_i$, move it in the $u^j$-direction, and then in the $u^k$-direction to obtain $\mathbf{x}_i(jk)$. Obtain $\mathbf{x}_i(kj)$ similarly and subtract. Expressed in terms of the tangent vectors $\mathbf{x}_1$ and $\mathbf{x}_2$, we have

\[ \mathbf{x}_i(jk) - \mathbf{x}_i(kj) = R^1_{ijk}\,\mathbf{x}_1 + R^2_{ijk}\,\mathbf{x}_2. \tag{77} \]
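1. Parametrizations

Let $\mathbf{x}(u^1, u^2) = \left( x^1(u^1, u^2), x^2(u^1, u^2), x^3(u^1, u^2) \right)$ be a vector function $\mathbf{x} : \mathbb{R}^2 \to \mathbb{R}^3$, and consider a piecewise linear approximation of $\mathbf{x}$ with mesh $\Delta u^1 \times \Delta u^2$. At a lattice point $(u^1, u^2) = (a, b)$, define the partial PL-differentials with respect to $u^i$ to be

\begin{align*}
\mathbf{x}_1 = D_1 \mathbf{x}(a, b) &= \mathbf{x}(a + \Delta u^1, b) - \mathbf{x}(a, b), \tag{78} \\
\text{and } \mathbf{x}_2 = D_2 \mathbf{x}(a, b) &= \mathbf{x}(a, b + \Delta u^2) - \mathbf{x}(a, b), \tag{79}
\end{align*}

and the second partial PL-differentials with respect to $u^i$ and $u^j$ to be

\begin{align*}
\mathbf{x}_{11} = D_{11} \mathbf{x}(a, b) &= D_1 \mathbf{x}(a + \Delta u^1, b) - D_1 \mathbf{x}(a, b), \tag{80} \\
\mathbf{x}_{12} = D_{12} \mathbf{x}(a, b) &= D_1 \mathbf{x}(a, b + \Delta u^2) - D_1 \mathbf{x}(a, b), \tag{81} \\
\mathbf{x}_{21} = D_{21} \mathbf{x}(a, b) &= D_2 \mathbf{x}(a + \Delta u^1, b) - D_2 \mathbf{x}(a, b), \tag{82} \\
\text{and } \mathbf{x}_{22} = D_{22} \mathbf{x}(a, b) &= D_2 \mathbf{x}(a, b + \Delta u^2) - D_2 \mathbf{x}(a, b). \tag{83}
\end{align*}

Also at each lattice point, we can define the unit PL-normal vector at $(a, b)$ to be

\[ \mathbf{n}(a, b) = \frac{\mathbf{x}_1 \times \mathbf{x}_2}{\| \mathbf{x}_1 \times \mathbf{x}_2 \|}. \tag{84} \]

Note that $\mathbf{n}$ is normal to the plane determined by $\mathbf{x}_1$ and $\mathbf{x}_2$. One important goal is to understand the curvature of space, so it is important to understand curvature intrinsically. It is possible to decompose the $\mathbf{x}_{ij}$ into tangential and normal components,

\[ \mathbf{x}_{ij} = L_{ij}\,\mathbf{n} + \Gamma^k_{ij}\,\mathbf{x}_k. \tag{85} \]

The $L_{ij}$ are the coefficients of the PL-second form (??). The $\Gamma^k_{ij}$ are the PL-Christoffel symbols, and they describe the tangential (or geodesic) change in the tangent vectors $\mathbf{x}_i$. From an intrinsic point of view, we can define quantities that correspond roughly to the PL-Christoffel symbols. We will do this by working from the fact that any two adjoining grid parallelograms can be laid flat (i.e., can be embedded in a plane).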
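The PL-differentials, the PL-normal, and the decomposition into normal and tangential parts can be computed directly from their definitions. The following is a rough sketch under assumptions made here for illustration: the surface is the paraboloid $x^3 = (u^1)^2 + (u^2)^2$, the mesh is square with side `h`, and the helper names `D`, `D2`, and `decompose` are hypothetical.

```python
import math

def sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def surface(u1, u2):
    """Hypothetical example surface: the paraboloid x^3 = (u^1)^2 + (u^2)^2."""
    return (u1, u2, u1 * u1 + u2 * u2)

def D(x, i, a, b, h):
    """Partial PL-differential D_i x at (a, b) for a mesh of size h by h."""
    if i == 1:
        return sub(x(a + h, b), x(a, b))
    return sub(x(a, b + h), x(a, b))

def D2(x, i, j, a, b, h):
    """Second partial PL-differential D_ij x: the change in D_i x along u^j."""
    shifted = D(x, i, a + h, b, h) if j == 1 else D(x, i, a, b + h, h)
    return sub(shifted, D(x, i, a, b, h))

def decompose(x, i, j, a, b, h):
    """Return (L_ij, Gamma^1_ij, Gamma^2_ij) with x_ij = L_ij n + Gamma^k_ij x_k."""
    x1, x2 = D(x, 1, a, b, h), D(x, 2, a, b, h)
    c = cross(x1, x2)
    length = math.sqrt(dot(c, c))
    n = tuple(ci / length for ci in c)
    xij = D2(x, i, j, a, b, h)
    # n is orthogonal to x1 and x2, so the normal coefficient is a dot product.
    L = dot(xij, n)
    # The tangential coefficients solve the 2x2 system with the Gram matrix of x1, x2.
    g11, g12, g22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    r1, r2 = dot(xij, x1), dot(xij, x2)
    det = g11 * g22 - g12 * g12
    return L, (g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det

a, b, h = 0.5, 0.25, 0.01
L11, G1_11, G2_11 = decompose(surface, 1, 1, a, b, h)
print(L11, G1_11, G2_11)
```

Solving through the Gram matrix avoids assuming that $\mathbf{x}_1$ and $\mathbf{x}_2$ are orthogonal or of unit length, which they generally are not.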
Let us first consider the intrinsic change corresponding to $\mathbf{x}_{11}$. At a lattice point $(a, b)$, we are looking at

\[ \mathbf{x}_{11} = \mathbf{x}_1(a + \Delta u^1, b) - \mathbf{x}_1(a, b), \tag{86} \]

so we are interested in the grid parallelograms corresponding to $(a, b)$ and $(a + \Delta u^1, b)$. Both grid parallelograms can be embedded in the vector space spanned by $\mathbf{x}_1(a, b)$ and $\mathbf{x}_2(a, b)$, and in particular, $\mathbf{x}_1(a + \Delta u^1, b)$ lies in this plane. With the following subtraction taking place in this vector space, we can therefore define the intrinsic PL-differential, $\delta_1 \mathbf{x}_1$, and the intrinsic PL-Christoffel symbols, $\gamma^k_{11}$, by

\[ \mathbf{x}_1(a + \Delta u^1, b) - \mathbf{x}_1(a, b) = \delta_1 \mathbf{x}_1(a, b) = \gamma^1_{11}\,\mathbf{x}_1(a, b) + \gamma^2_{11}\,\mathbf{x}_2(a, b). \tag{87} \]

In general, we define $\delta_i \mathbf{x}_j$ and $\gamma^k_{ij}$ the same way.
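[Figure 1. Pushing $\mathbf{x}_1$ around the grid parallelogram. Labels: $(a, b)$, $(a + \Delta u^1, b + \Delta u^2)$, $\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{x}_1(1)$, $\mathbf{x}_1(2)$, $\mathbf{x}_1(12)$, $\mathbf{x}_1(21)$, $\theta$.]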
If we consider the grid parallelograms at $(a, b)$, $(a + \Delta u^1, b)$, $(a, b + \Delta u^2)$, and $(a + \Delta u^1, b + \Delta u^2)$, it will not be possible to lay these flat if there is non-zero impulse curvature at $(a + \Delta u^1, b + \Delta u^2)$. If we cut along the vector $\mathbf{x}_1(a + \Delta u^1, b + \Delta u^2)$, however, we can pull all four grid parallelograms into the plane spanned by the vectors $\mathbf{x}_1(a, b)$ and $\mathbf{x}_2(a, b)$, as shown in Figure 1. The angle $\theta$ between the two copies of $\mathbf{x}_1(a + \Delta u^1, b + \Delta u^2)$ is precisely the impulse curvature at $(a + \Delta u^1, b + \Delta u^2)$. The Riemannian curvature tensor exploits this observation, and we will start to build up to $R^l_{ijk}$ here.

The intrinsic PL-differential $\delta_1 \mathbf{x}_1(a, b)$ measures the difference between the vectors marked $\mathbf{x}_1$ and $\mathbf{x}_1(1)$ in Figure 1. Similarly, $\delta_2 \mathbf{x}_1(a + \Delta u^1, b)$ is the difference between the vectors marked $\mathbf{x}_1(1)$ and $\mathbf{x}_1(12)$. Both of these differences are computed in the plane spanned by $\mathbf{x}_1(a, b)$ and $\mathbf{x}_2(a, b)$. Also in this plane, the difference between $\mathbf{x}_1$ and $\mathbf{x}_1(21)$ is measured by $\delta_2 \mathbf{x}_1(a, b)$ and $\delta_1 \mathbf{x}_1(a, b + \Delta u^2)$. Finding the angle between the vectors $\mathbf{x}_1(12)$ and $\mathbf{x}_1(21)$ is a bit awkward, but we can get a good approximation for small $\theta$'s with

\[ \theta \approx \frac{\| \mathbf{x}_1(12) - \mathbf{x}_1(21) \|}{\| \mathbf{x}_1(12) \|}, \tag{88} \]

since the vectors $\mathbf{x}_1(21)$ and $\mathbf{x}_1(12)$ are the same length, and so the difference between the vectors is essentially an arc of a circle with radius $\| \mathbf{x}_1(12) \|$ (or equivalently, $\| \mathbf{x}_1(21) \|$). With this in mind, we define the PL-Riemannian curvature tensor by

\[ \mathbf{x}_i(jk) - \mathbf{x}_i(kj) = R^1_{ijk}\,\mathbf{x}_1 + R^2_{ijk}\,\mathbf{x}_2. \tag{89} \]
In words, we take the vector $\mathbf{x}_i$ and move it first in the $\mathbf{x}_j$-direction, and then the $\mathbf{x}_k$-direction. From this we subtract the result of moving this vector first in the $\mathbf{x}_k$-direction, and then the $\mathbf{x}_j$-direction. This description makes it apparent that we have the following relationships.

\begin{align*}
R^l_{ijk} &= -R^l_{ikj}, \tag{90} \\
\text{and } R^l_{ijj} &= 0. \tag{91}
\end{align*}

There are eight pairs of numbers $R^1_{ijk}$ and $R^2_{ijk}$, and each of the four pairs for which $j \neq k$, along with $\mathbf{x}_1$ and $\mathbf{x}_2$, give us approximations of the impulse curvature at $(a + \Delta u^1, b + \Delta u^2)$.

We can express the quantities in (89) in terms of differentials at $(a, b)$. Specifically, we need the following:

\begin{align*}
\delta_1 \gamma^k_{ij}(a, b) &= \gamma^k_{ij}(a + \Delta u^1, b) - \gamma^k_{ij}(a, b), \tag{92} \\
\text{and } \delta_2 \gamma^k_{ij}(a, b) &= \gamma^k_{ij}(a, b + \Delta u^2) - \gamma^k_{ij}(a, b). \tag{93}
\end{align*}

For example, we have that

\[ \gamma^k_{ij}(a + \Delta u^1, b) = \gamma^k_{ij}(a, b) + \delta_1 \gamma^k_{ij}(a, b). \tag{94} \]
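We can find $\mathbf{x}_1(12)$ as follows, using numbers in parentheses to designate the lattice point. Since

\begin{align*}
\mathbf{x}_1(1) &= \mathbf{x}_1 + \gamma^1_{11}\,\mathbf{x}_1 + \gamma^2_{11}\,\mathbf{x}_2, \tag{95} \\
\mathbf{x}_2(1) &= \mathbf{x}_2 + \gamma^1_{21}\,\mathbf{x}_1 + \gamma^2_{21}\,\mathbf{x}_2, \tag{96} \\
\text{and } \gamma^i_{12}(1) &= \gamma^i_{12} + \delta_1 \gamma^i_{12}, \tag{97}
\end{align*}

we have that

\begin{align*}
\mathbf{x}_1(12) &= \mathbf{x}_1(1) + \gamma^1_{12}(1)\,\mathbf{x}_1(1) + \gamma^2_{12}(1)\,\mathbf{x}_2(1) \tag{98} \\
&= \mathbf{x}_1 + \gamma^1_{11}\,\mathbf{x}_1 + \gamma^2_{11}\,\mathbf{x}_2 \tag{99} \\
&\quad + (\gamma^1_{12} + \delta_1 \gamma^1_{12})(\mathbf{x}_1 + \gamma^1_{11}\,\mathbf{x}_1 + \gamma^2_{11}\,\mathbf{x}_2) \tag{100} \\
&\quad + (\gamma^2_{12} + \delta_1 \gamma^2_{12})(\mathbf{x}_2 + \gamma^1_{21}\,\mathbf{x}_1 + \gamma^2_{21}\,\mathbf{x}_2). \tag{101}
\end{align*}

Similarly,

\begin{align*}
\mathbf{x}_1(21) &= \mathbf{x}_1(2) + \gamma^1_{11}(2)\,\mathbf{x}_1(2) + \gamma^2_{11}(2)\,\mathbf{x}_2(2) \tag{102} \\
&= \mathbf{x}_1 + \gamma^1_{12}\,\mathbf{x}_1 + \gamma^2_{12}\,\mathbf{x}_2 \tag{103} \\
&\quad + (\gamma^1_{11} + \delta_2 \gamma^1_{11})(\mathbf{x}_1 + \gamma^1_{12}\,\mathbf{x}_1 + \gamma^2_{12}\,\mathbf{x}_2) \tag{104} \\
&\quad + (\gamma^2_{11} + \delta_2 \gamma^2_{11})(\mathbf{x}_2 + \gamma^1_{22}\,\mathbf{x}_1 + \gamma^2_{22}\,\mathbf{x}_2). \tag{105}
\end{align*}

It follows that

\begin{align*}
\mathbf{x}_1(12) - \mathbf{x}_1(21) ={}& (\gamma^1_{12}\gamma^1_{11} + \delta_1\gamma^1_{12} + \delta_1\gamma^1_{12}\,\gamma^1_{11} + \gamma^2_{12}\gamma^1_{21} + \delta_1\gamma^2_{12}\,\gamma^1_{21})\,\mathbf{x}_1 \tag{106} \\
&+ (\gamma^1_{12}\gamma^2_{11} + \delta_1\gamma^1_{12}\,\gamma^2_{11} + \gamma^2_{12}\gamma^2_{21} + \delta_1\gamma^2_{12} + \delta_1\gamma^2_{12}\,\gamma^2_{21})\,\mathbf{x}_2 \tag{107} \\
&- (\gamma^1_{11}\gamma^1_{12} + \delta_2\gamma^1_{11} + \delta_2\gamma^1_{11}\,\gamma^1_{12} + \gamma^2_{11}\gamma^1_{22} + \delta_2\gamma^2_{11}\,\gamma^1_{22})\,\mathbf{x}_1 \tag{108} \\
&- (\gamma^1_{11}\gamma^2_{12} + \delta_2\gamma^1_{11}\,\gamma^2_{12} + \gamma^2_{11}\gamma^2_{22} + \delta_2\gamma^2_{11} + \delta_2\gamma^2_{11}\,\gamma^2_{22})\,\mathbf{x}_2. \tag{109}
\end{align*}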
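The $\mathbf{x}_1$-component can be written in Einstein notation as

\[ \delta_1\gamma^1_{12} - \delta_2\gamma^1_{11} + \gamma^p_{12}\gamma^1_{p1} - \gamma^p_{11}\gamma^1_{p2} + \delta_1\gamma^p_{12}\,\gamma^1_{p1} - \delta_2\gamma^p_{11}\,\gamma^1_{p2}, \tag{110} \]

and the $\mathbf{x}_2$-component can be written as

\[ \delta_1\gamma^2_{12} - \delta_2\gamma^2_{11} + \gamma^p_{12}\gamma^2_{p1} - \gamma^p_{11}\gamma^2_{p2} + \delta_1\gamma^p_{12}\,\gamma^2_{p1} - \delta_2\gamma^p_{11}\,\gamma^2_{p2}. \tag{111} \]

In general, the $\mathbf{x}_i$-component would be

\[ \delta_1\gamma^i_{12} - \delta_2\gamma^i_{11} + \gamma^p_{12}\gamma^i_{p1} - \gamma^p_{11}\gamma^i_{p2} + \delta_1\gamma^p_{12}\,\gamma^i_{p1} - \delta_2\gamma^p_{11}\,\gamma^i_{p2}. \tag{112} \]

All cases are covered by

\[ R^l_{ijk} = \delta_j\gamma^l_{ik} - \delta_k\gamma^l_{ij} + \gamma^p_{ik}\gamma^l_{pj} - \gamma^p_{ij}\gamma^l_{pk} + \delta_j\gamma^p_{ik}\,\gamma^l_{pj} - \delta_k\gamma^p_{ij}\,\gamma^l_{pk}. \tag{113} \]

Compare this to the definition of the (non-PL) Riemannian curvature tensor.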
CHAPTER 6
Riemannian Curvature Tensor
1. The Riemannian Metric for a Plane

One thing that should always be kept in mind is that the derivative can always be interpreted as being a linear approximation. In other words, it is often helpful to understand a situation concerning a differentiable object by studying the corresponding linear algebra situation. We will be exploring something that differential geometers call a metric or a Riemannian metric. This is not to be confused with the metric a topologist would impose on a metric space, although a metric-space metric can always be constructed from a Riemannian metric.

Differential geometers study objects that don't have natural (or at least not convenient) coordinate systems. We can impose coordinate systems on these spaces by identifying a piece of the space with a piece of a Euclidean space. Right now, we will associate a piece of a surface with a piece of the Euclidean plane, $\mathbb{R}^2$. We can talk about points of the surface using a coordinate system for $\mathbb{R}^2$. The geometry, however, for the two spaces will be different, and we will impose a funny geometry on $\mathbb{R}^2$ and pretend that $\mathbb{R}^2$ actually is the surface. We will start to describe how this is done in the case when the surface is another plane, and at the same time, develop some of the notation and terminology that is used in differential geometry.

We will think of the derivatives at a point of a surface as being generalizations of the notion of a linear transformation, so let's look at a linear transformation. Due to the repetitive nature of linear algebra and differential geometry, it is convenient to talk about things such as the $u^1u^2$-plane rather than the $uv$-plane. For the most part, superscripts are used in the same way that subscripts are. This interferes with the use of exponents, but the handiness of the notation more than makes up for this.

Suppose we have a linear transformation $A$ from the $u^1u^2$-plane to the $x^1x^2$-plane.
Linear transformations can be defined in terms of matrix multiplication, so we have
\[ A(u^1, u^2) = \begin{bmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{bmatrix} \begin{bmatrix} u^1 \\ u^2 \end{bmatrix}. \tag{114} \]
Note that $a^i_j$ is the entry in the $i$-th row and $j$-th column. Part of the reason for using superscripts in this way will be apparent in a minute. More will be made clear later. Expanding the multiplication, we see that
\[ \begin{bmatrix} x^1 \\ x^2 \end{bmatrix} = A(u^1, u^2) = \begin{bmatrix} a^1_1 u^1 + a^1_2 u^2 \\ a^2_1 u^1 + a^2_2 u^2 \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{2} a^1_j u^j \\ \sum_{j=1}^{2} a^2_j u^j \end{bmatrix}. \tag{115} \]
The use of the summation notation indicates some of the repetitiveness. More can be seen in the fact that both entries look the same. Another common way of expressing this relationship is
\[ x^1 = \sum_{j=1}^{2} a^1_j u^j, \qquad x^2 = \sum_{j=1}^{2} a^2_j u^j. \tag{116} \]
Both equations look the same, except for the superscripts, which are both 1 in the first equation and both 2 in the second. Note also that the summation index $j$ appears in both summations as both a subscript and a superscript. Albert Einstein is often credited with noticing that there is a nice scheme for deciding which indices should be subscripts and which should be superscripts, and if this is followed, summations will always range over one subscript and one superscript. The terms covariant and contravariant tensors are used in this scheme, and we will discuss this later. The important thing here is that we will always sum over an index that appears both as a superscript and a subscript. This makes the summation sign redundant (the context will make it clear what the range of the index is supposed to be), and so using the Einstein summation convention, we can write the equations in (116) as one equation and without the summation sign,
\[ x^i = a^i_j u^j. \tag{117} \]
This may look odd at first, but it turns out to be a powerful use of notation that takes care of itself.
Let’s look at an example. Consider the linear transformation defined by the matrix $A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$. The vectors (the points) $[1, 0]$ and $[0, 1]$ in the $u^1u^2$-plane map to the following vectors in the $x^1x^2$-plane. We will use $A$ for the name of the function, as well.
\[ A(1, 0) = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad A(0, 1) = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \tag{118} \]
Remember that we are building up intuition and notation for more complicated situations. What we want to do here is to talk about points on the $x^1x^2$-plane using coordinates from the $u^1u^2$-plane. The game is that we will use the label $[1, 0]$ to talk about the vector $[1, 2]$ in the $x^1x^2$-plane. We can compute the magnitude of the vector $[1, 2]$ using the dot product. That is,
\[ \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = 1 + 4 = 5, \tag{119} \]
so the magnitude of the vector must be $\sqrt{5}$. Under the new rules for the game and the new geometry for the $u^1u^2$-plane, the magnitude of $[1, 0]$ is $\sqrt{5}$. Similarly, the magnitude of the vector $[0, 1]$ is the same as the magnitude of the vector $[3, 2]$, which is $\sqrt{9 + 4} = \sqrt{13}$. Under this new geometry, as we move from the point $(0, 0)$ to the point $(1, 0)$, we have traveled a distance of $\sqrt{5}$ units, and as we move from $(0, 0)$ to $(0, 1)$, we have moved a distance of $\sqrt{13}$.
Our notion of distance has changed dramatically, and the way we measure angles can be different, as well. In the old geometry of the $u^1u^2$-plane, the two vectors $[1, 0]$ and $[0, 1]$ are perpendicular, but in the new geometry, the angle is the same as the angle between $[1, 2]$ and $[3, 2]$ in the $x^1x^2$-plane. We can compute this angle using the dot product, as we did before. That is,
\[ \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix} = 3 + 4 = 7, \tag{120} \]
and so the angle $\theta$ between them must satisfy
\[ 7 = \sqrt{5}\,\sqrt{13}\,\cos\theta. \tag{121} \]
In other words,
\[ \theta = \cos^{-1}\!\left( \frac{7}{\sqrt{5}\,\sqrt{13}} \right). \tag{122} \]
In the new geometry for the $u^1u^2$-plane, therefore, this must be the angle between $[1, 0]$ and $[0, 1]$.
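The example above can be replayed numerically. The following is a minimal Python sketch, not part of the original notes: the helper names `apply_map` and `dot` are hypothetical, and the sum over the repeated index $j$ in the Einstein-notation formula $x^i = a^i_j u^j$ is simply written out as an explicit loop.

```python
import math

# Matrix of the example: A[i][j] plays the role of a^i_j (row i, column j).
A = [[1, 3],
     [2, 2]]

def apply_map(A, u):
    """Return x with x^i = a^i_j u^j, summing explicitly over the repeated index j."""
    return [sum(A[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]

def dot(x, y):
    """Ordinary dot product in the x^1 x^2-plane."""
    return sum(p * q for p, q in zip(x, y))

x_e1 = apply_map(A, [1, 0])  # image of the label [1, 0]
x_e2 = apply_map(A, [0, 1])  # image of the label [0, 1]

# Lengths and angle in the "new geometry" are measured on the images.
len_e1 = math.sqrt(dot(x_e1, x_e1))
len_e2 = math.sqrt(dot(x_e2, x_e2))
theta = math.acos(dot(x_e1, x_e2) / (len_e1 * len_e2))

print(x_e1, x_e2)
print(len_e1, len_e2, theta)
```

Nothing here depends on the matrix being 2 by 2, so the same two helpers could be reused for Exercise-style variations with a different `A`.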
All that we have done here depends on taking dot products in the $x^1x^2$-plane. The rules of the game, therefore, can be condensed into a funny dot product on the $u^1u^2$-plane. These generalizations of the dot product are called inner products, and inner products can always be expressed in terms of a matrix product as follows. Given an inner product, there is a matrix $[g_{ij}]$ such that the inner product of vectors $[a^1, a^2]$ and $[b^1, b^2]$ is
\begin{align*}
\langle [a^1, a^2], [b^1, b^2] \rangle &= \begin{bmatrix} a^1 & a^2 \end{bmatrix} \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} \begin{bmatrix} b^1 \\ b^2 \end{bmatrix} \\
&= a^1 b^1 g_{11} + a^2 b^1 g_{21} + a^1 b^2 g_{12} + a^2 b^2 g_{22} \tag{123} \\
&= a^i b^j g_{ij} \quad \text{(in Einstein notation)}
\end{align*}
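Assuming the existence of such a matrix, it is easy to find the matrix $[g_{ij}]$ for this example. Its entries are the dot products of the images $[1, 2]$ and $[3, 2]$ of $[1, 0]$ and $[0, 1]$: $g_{11} = 1 + 4 = 5$, $g_{12} = g_{21} = 3 + 4 = 7$, and $g_{22} = 9 + 4 = 13$.
\[ [g_{ij}] = \begin{bmatrix} 5 & 7 \\ 7 & 13 \end{bmatrix} \tag{124} \]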
This inner product, and equivalently the matrix $[g_{ij}]$, determines the new geometry completely. Letting $e_1 = [1, 0]$ and $e_2 = [0, 1]$, the $g_{ij}$ are determined by what the inner products between these vectors should be. That is,
\[ g_{ij} = \langle e_i, e_j \rangle. \tag{125} \]
We will define magnitudes and angles in the new geometry using the inner product in place of the dot product.
\[ \|x\| = \sqrt{\langle x, x \rangle} \tag{126} \]
\[ \langle x, y \rangle = \|x\|\,\|y\| \cos\theta \tag{127} \]
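The definitions of $g_{ij}$, magnitude, and angle given above can be sketched in code. The following Python fragment is an illustration only, assuming the example matrix with rows $[1, 3]$ and $[2, 2]$; the function names `column`, `inner`, `norm`, and `angle` are hypothetical. It builds $[g_{ij}]$ from the images of $e_1$ and $e_2$, and afterwards measures lengths and angles using nothing but $u^1u^2$-coordinates and the matrix $[g_{ij}]$.

```python
import math

A = [[1, 3],
     [2, 2]]

def column(A, j):
    """Image of the j-th standard basis vector under A, i.e. the j-th column of A."""
    return [row[j] for row in A]

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

# g_ij = <e_i, e_j>, computed as the dot product of the images A e_i and A e_j.
g = [[dot(column(A, i), column(A, j)) for j in range(2)] for i in range(2)]

def inner(a, b, g):
    """Inner product a^i b^j g_ij, with both sums written out."""
    return sum(a[i] * b[j] * g[i][j] for i in range(2) for j in range(2))

def norm(a, g):
    """Magnitude of a in the new geometry: sqrt(<a, a>)."""
    return math.sqrt(inner(a, a, g))

def angle(a, b, g):
    """Angle theta determined by <a, b> = |a| |b| cos(theta)."""
    return math.acos(inner(a, b, g) / (norm(a, g) * norm(b, g)))

e1, e2 = [1, 0], [0, 1]
print(g)
print(norm(e1, g), norm(e2, g), angle(e1, e2, g))
```

Once `g` has been computed, the matrix `A` is no longer needed, which is the point of the construction: the inner product alone carries the new geometry.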
1.1. Exercises.
21. Let $A$ be the linear transformation from the $u^1u^2$-plane to the $x^1x^2$-plane determined by the matrix $\begin{bmatrix} 2 & -3 \\ 1 & 4 \end{bmatrix}$. Find the inner product matrix $[g_{ij}]$ for the new geometry. Under the new geometry, determine the distance traveled along the path from $(0, 0)$ to $(1, 0)$, and along the path from $(0, 0)$ to $(1, 1)$ to $(0, 3)$. Find the angle between the vectors $[2, 3]$ and $[-3, 1]$.
22. Write the product of the square matrices $[a^i_j]$ and $[b^j_k]$ in Einstein notation (use $c^i_k$ for the product).
2. The Riemannian Metric for Curved Surfaces
For a function of one variable, the first derivative can be interpreted as describing an approximating tangent line, and the second derivative an approximating parabola. We can make similar comparisons with surfaces. Suppose we have a surface parametrized by the following vector function.
\[ \mathbf{x}(u^1, u^2) = \left[ x^1(u^1, u^2),\; x^2(u^1, u^2),\; x^3(u^1, u^2) \right] \tag{128} \]
Suppose also that we are interested in investigating the curvature of the surface at any particular point on the surface. We will interpret the derivatives of $\mathbf{x}$ as describing linear approximations to the surface. In other words, the first partial derivatives define a tangent plane, and they also determine a linear transformation from the $u^1u^2$-plane to that tangent plane. Let us take a look at these first partials and discuss their meaning.
\[ \frac{d\mathbf{x}}{du^1} = \left[ \frac{dx^1}{du^1},\; \frac{dx^2}{du^1},\; \frac{dx^3}{du^1} \right] \tag{129} \]
\[ \frac{d\mathbf{x}}{du^2} = \left[ \frac{dx^1}{du^2},\; \frac{dx^2}{du^2},\; \frac{dx^3}{du^2} \right] \tag{130} \]
These two vectors are tangent to the surface at their respective points. They can be used directly to describe a plane tangent to the surface, and the chain rule provides justification for doing this. Suppose we have a line in the $u^1u^2$-plane parametrized by $\gamma(t) = [a^1 t, a^2 t]$ (in other words, $u^1 = a^1 t$ and $u^2 = a^2 t$). The corresponding curve on the surface can be parametrized by $\mathbf{x}(\gamma(t)) = \left[ x^1(a^1 t, a^2 t),\, x^2(a^1 t, a^2 t),\, x^3(a^1 t, a^2 t) \right]$. We can differentiate $\mathbf{x}$ with respect to the new parameter $t$, and the chain rule illustrates the linear character of the derivative.
\begin{align*}
\frac{d\mathbf{x}}{dt} &= \left[ \frac{dx^1}{du^1}\frac{du^1}{dt} + \frac{dx^1}{du^2}\frac{du^2}{dt},\; \frac{dx^2}{du^1}\frac{du^1}{dt} + \frac{dx^2}{du^2}\frac{du^2}{dt},\; \frac{dx^3}{du^1}\frac{du^1}{dt} + \frac{dx^3}{du^2}\frac{du^2}{dt} \right] \tag{131} \\
&= \left[ \frac{dx^1}{du^1} a^1 + \frac{dx^1}{du^2} a^2,\; \frac{dx^2}{du^1} a^1 + \frac{dx^2}{du^2} a^2,\; \frac{dx^3}{du^1} a^1 + \frac{dx^3}{du^2} a^2 \right] \tag{132} \\
&= a^1 \left[ \frac{dx^1}{du^1},\, \frac{dx^2}{du^1},\, \frac{dx^3}{du^1} \right] + a^2 \left[ \frac{dx^1}{du^2},\, \frac{dx^2}{du^2},\, \frac{dx^3}{du^2} \right] \tag{133} \\
&= a^1 \frac{d\mathbf{x}}{du^1} + a^2 \frac{d\mathbf{x}}{du^2} \tag{134}
\end{align*}
This could be interpreted as follows. If a point is moving along a line according to the parametrization $\gamma$, it would pass through any particular point with velocity $[a^1, a^2]$. The image of this point on the surface will have velocity $a^1 \frac{d\mathbf{x}}{du^1} + a^2 \frac{d\mathbf{x}}{du^2}$. Velocity vectors in the $u^1u^2$-plane at a particular point and velocity vectors at the corresponding image point on the surface are related by a linear transformation. This function is known as the differential and can be defined by
\[ d\mathbf{x} = \frac{d\mathbf{x}}{du^1}\,du^1 + \frac{d\mathbf{x}}{du^2}\,du^2, \tag{135} \]
where a velocity vector $[du^1, du^2]$ at a point in the $u^1u^2$-plane corresponds to a velocity vector $d\mathbf{x}$ at the corresponding image point on the surface. This function is also described by a matrix called the Jacobian,
\[ J(\mathbf{x}) = \begin{bmatrix} \frac{dx^1}{du^1} & \frac{dx^1}{du^2} \\ \frac{dx^2}{du^1} & \frac{dx^2}{du^2} \\ \frac{dx^3}{du^1} & \frac{dx^3}{du^2} \end{bmatrix}. \tag{136} \]
The differential then becomes
\begin{align*}
d\mathbf{x}(du^1, du^2) &= J(\mathbf{x}) \begin{bmatrix} du^1 \\ du^2 \end{bmatrix} \tag{137} \\
&= \begin{bmatrix} \frac{dx^1}{du^1} & \frac{dx^1}{du^2} \\ \frac{dx^2}{du^1} & \frac{dx^2}{du^2} \\ \frac{dx^3}{du^1} & \frac{dx^3}{du^2} \end{bmatrix} \begin{bmatrix} du^1 \\ du^2 \end{bmatrix} \tag{138}
\end{align*}
Once we have this linear function, it is easy to compare areas in the $u^1u^2$-plane with areas on the tangent plane.
The unit square determined by the vectors $(1, 0)$ and $(0, 1)$ gets mapped to a parallelogram determined by the vectors $\frac{d\mathbf{x}}{du^1}$ and $\frac{d\mathbf{x}}{du^2}$. The area of this parallelogram, if you remember from calculus, is equal to the magnitude of the cross product of these two vectors. We’ll come back to this later.
We have already talked a bit about a point in the $u^1u^2$-plane and the corresponding point on the surface. The parametrization ties points together, and we can identify the pairs of points and speak of them almost as if they were one. Expressed another way, we are again playing the game of using labels from the $u^1u^2$-plane to describe points on the surface. This is an important concept when dealing with manifolds, since there may be no way, or at least no convenient way, of describing individual points on a manifold. What we will do is to talk about points in the $u^1u^2$-plane and endow them with properties from the manifold, or in this case, the surface.
For example, let us say that we move along the segment from $(0, 0)$ to $(1, 0)$ in the $u^1u^2$-plane. We have traveled a distance of 1. The image of this segment on the surface might be a curve with length 5. If we change the way that we measure distances in the $u^1u^2$-plane, as we did in the case when the surface was simply another plane, then in some new geometry, the length of the segment just mentioned would be 5, and then doing geometry in the $u^1u^2$-plane becomes more like doing geometry on the surface. The goal here is to come up with a funny way of measuring things like distances and angles so that doing geometry with these new measurement schemes is equivalent to doing geometry on the surface. This is a rough description of what differential geometry is about.
The differential tells us how velocity vectors in the $u^1u^2$-plane correspond to velocity vectors on the surface via a linear transformation that changes from point to point.
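The linear character of the differential can be checked numerically. The sketch below is only an illustration under stated assumptions: the surface $\mathbf{x}(u^1, u^2) = [u^1, u^2, (u^1)^2 + (u^2)^2]$ is a hypothetical example chosen here, not one from the text, and the names `x`, `jacobian`, `differential`, and `curve_velocity` are invented for it. The code compares the Jacobian applied to a velocity $[a^1, a^2]$ with a finite-difference velocity of the image curve through an arbitrary base point (a slight generalization of the line through the origin used above).

```python
def x(u1, u2):
    """Hypothetical example surface (a paraboloid), as a point in R^3."""
    return (u1, u2, u1 ** 2 + u2 ** 2)

def jacobian(u1, u2):
    """3-by-2 Jacobian: row i, column j holds dx^i/du^j for the paraboloid."""
    return [[1.0, 0.0],
            [0.0, 1.0],
            [2.0 * u1, 2.0 * u2]]

def differential(u1, u2, du):
    """dx(du^1, du^2) = J(x) [du^1, du^2], summed over the column index."""
    J = jacobian(u1, u2)
    return [sum(J[i][j] * du[j] for j in range(2)) for i in range(3)]

def curve_velocity(p, a, h=1e-6):
    """Central-difference velocity at t = 0 of the image of t -> p + t a."""
    fwd = x(p[0] + h * a[0], p[1] + h * a[1])
    bwd = x(p[0] - h * a[0], p[1] - h * a[1])
    return [(f - b) / (2.0 * h) for f, b in zip(fwd, bwd)]

p = (0.5, -1.0)   # base point in the u^1 u^2-plane
a = (2.0, 3.0)    # velocity [a^1, a^2] of the line through p

print(differential(p[0], p[1], a))
print(curve_velocity(p, a))
```

The two printed vectors should agree up to rounding, which is the chain-rule statement that the image velocity is $a^1 \frac{d\mathbf{x}}{du^1} + a^2 \frac{d\mathbf{x}}{du^2}$.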
These contain the necessary information to find a relationship between distances and angles. Note that both of these quantities for vectors are obtainable from the dot product, so if we know how the dot products compare, then we should be able to get what we need for distances and angles. Suppose the Jacobian at (0, 0) is
\[ J(\mathbf{x})(0, 0) = \begin{bmatrix} c^1_1 & c^1_2 \\ c^2_1 & c^2_2 \\ c^3_1 & c^3_2 \end{bmatrix}. \tag{139} \]
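Consider two velocity vectors in the $u^1u^2$-plane, $[a^1, a^2]$ and $[b^1, b^2]$. Their images are
\[ \begin{bmatrix} c^1_1 & c^1_2 \\ c^2_1 & c^2_2 \\ c^3_1 & c^3_2 \end{bmatrix} \begin{bmatrix} a^1 \\ a^2 \end{bmatrix} = \begin{bmatrix} c^1_1 a^1 + c^1_2 a^2 \\ c^2_1 a^1 + c^2_2 a^2 \\ c^3_1 a^1 + c^3_2 a^2 \end{bmatrix}, \tag{140} \]
and similarly
\[ \begin{bmatrix} c^1_1 & c^1_2 \\ c^2_1 & c^2_2 \\ c^3_1 & c^3_2 \end{bmatrix} \begin{bmatrix} b^1 \\ b^2 \end{bmatrix} = \begin{bmatrix} c^1_1 b^1 + c^1_2 b^2 \\ c^2_1 b^1 + c^2_2 b^2 \\ c^3_1 b^1 + c^3_2 b^2 \end{bmatrix}. \tag{141} \]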
The dot product of these two vectors is
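\begin{align*}
&\begin{bmatrix} c^1_1 a^1 + c^1_2 a^2 & c^2_1 a^1 + c^2_2 a^2 & c^3_1 a^1 + c^3_2 a^2 \end{bmatrix} \begin{bmatrix} c^1_1 b^1 + c^1_2 b^2 \\ c^2_1 b^1 + c^2_2 b^2 \\ c^3_1 b^1 + c^3_2 b^2 \end{bmatrix} \tag{142} \\
&\quad = (c^1_1 a^1 + c^1_2 a^2)(c^1_1 b^1 + c^1_2 b^2) + (c^2_1 a^1 + c^2_2 a^2)(c^2_1 b^1 + c^2_2 b^2) + (c^3_1 a^1 + c^3_2 a^2)(c^3_1 b^1 + c^3_2 b^2)
\end{align*}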
Of particular interest is the way the distributive property translates to a property called bilinearity for the dot product. Note how the $a^i$ factor out, and then the $b^i$. The $g_{ij}$ represent the quantities in parentheses.
\begin{align*}
&= a^1 \left[ c^1_1 (c^1_1 b^1 + c^1_2 b^2) + c^2_1 (c^2_1 b^1 + c^2_2 b^2) + c^3_1 (c^3_1 b^1 + c^3_2 b^2) \right] \\
&\quad + a^2 \left[ c^1_2 (c^1_1 b^1 + c^1_2 b^2) + c^2_2 (c^2_1 b^1 + c^2_2 b^2) + c^3_2 (c^3_1 b^1 + c^3_2 b^2) \right] \\
&= a^1 b^1 (c^1_1 c^1_1 + c^2_1 c^2_1 + c^3_1 c^3_1) + a^1 b^2 (c^1_1 c^1_2 + c^2_1 c^2_2 + c^3_1 c^3_2) \tag{143} \\
&\quad + a^2 b^1 (c^1_2 c^1_1 + c^2_2 c^2_1 + c^3_2 c^3_1) + a^2 b^2 (c^1_2 c^1_2 + c^2_2 c^2_2 + c^3_2 c^3_2) \\
&= a^1 b^1 g_{11} + a^1 b^2 g_{12} + a^2 b^1 g_{21} + a^2 b^2 g_{22}.
\end{align*}
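We could have established this with less clutter by assuming that the dot product follows a distributive law, which it does. The vectors $[a^1, a^2]$ and $[b^1, b^2]$ map to the vectors on the surface $a^1 \frac{d\mathbf{x}}{du^1} + a^2 \frac{d\mathbf{x}}{du^2}$ and $b^1 \frac{d\mathbf{x}}{du^1} + b^2 \frac{d\mathbf{x}}{du^2}$. Therefore,
\begin{align*}
&\left( a^1 \frac{d\mathbf{x}}{du^1} + a^2 \frac{d\mathbf{x}}{du^2} \right) \cdot \left( b^1 \frac{d\mathbf{x}}{du^1} + b^2 \frac{d\mathbf{x}}{du^2} \right) \tag{144} \\
&\quad = a^1 b^1 \frac{d\mathbf{x}}{du^1} \cdot \frac{d\mathbf{x}}{du^1} + a^1 b^2 \frac{d\mathbf{x}}{du^1} \cdot \frac{d\mathbf{x}}{du^2} + a^2 b^1 \frac{d\mathbf{x}}{du^2} \cdot \frac{d\mathbf{x}}{du^1} + a^2 b^2 \frac{d\mathbf{x}}{du^2} \cdot \frac{d\mathbf{x}}{du^2} \\
&\quad = a^1 b^1 g_{11} + a^1 b^2 g_{12} + a^2 b^1 g_{21} + a^2 b^2 g_{22}.
\end{align*}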
Note that the $g_{ij}$ represent the same quantities in both derivations. The dot product is a special case of a vector space product called an inner product. All inner products share the basic property of bilinearity, which is described as
\[ \langle a x + b y, z \rangle = a \langle x, z \rangle + b \langle y, z \rangle \tag{145} \]
and
\[ \langle x, b y + c z \rangle = b \langle x, y \rangle + c \langle x, z \rangle. \tag{146} \]
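This new inner product on the vectors $[a^1, a^2]$ and $[b^1, b^2]$ is defined by
\[ \langle [a^1, a^2], [b^1, b^2] \rangle = \begin{bmatrix} a^1 & a^2 \end{bmatrix} \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} \begin{bmatrix} b^1 \\ b^2 \end{bmatrix}. \tag{147} \]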
The matrix $[g_{ij}]$, or the bilinear function (i.e., the inner product) defined by it, is called the first fundamental form. The entries of the matrix, the $g_{ij}$, generally vary from point to point, and we usually want to consider surfaces and parametrizations where these vary smoothly. If we were to consider a vector at a certain point, say $[1, 0]$, we can compute its inner product with itself using the first fundamental form.
\[ \langle [1, 0], [1, 0] \rangle = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = [\, g_{11} \,] \tag{148} \]
The quantity $g_{11}$ came from the inner product with itself of the vector on the surface corresponding to $[1, 0]$, so this should not be surprising. Perhaps more importantly, note that once we have the matrix $[g_{ij}]$, we can work exclusively with vectors in the $u^1u^2$-plane, and while this vector is a unit vector under the dot product, with respect to this new inner product, it has magnitude $\sqrt{\langle [1, 0], [1, 0] \rangle} = \sqrt{g_{11}}$. The matrix $[g_{ij}]$ is also called the Riemannian metric or simply the metric.
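As a concrete illustration of the first fundamental form, the following Python sketch computes $g_{ij}$ as dot products of the two tangent vectors and checks that the inner product of two velocity vectors in the $u^1u^2$-plane matches the ordinary dot product of their images on the surface. The paraboloid $\mathbf{x}(u^1, u^2) = [u^1, u^2, (u^1)^2 + (u^2)^2]$ and all function names are assumptions made for this example only.

```python
import math

def tangent_vectors(u1, u2):
    """First partials of the hypothetical surface x = (u1, u2, u1**2 + u2**2)."""
    x_1 = [1.0, 0.0, 2.0 * u1]   # dx/du^1
    x_2 = [0.0, 1.0, 2.0 * u2]   # dx/du^2
    return x_1, x_2

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def metric(u1, u2):
    """First fundamental form: g_ij is the dot product of dx/du^i and dx/du^j."""
    t = tangent_vectors(u1, u2)
    return [[dot(t[i], t[j]) for j in range(2)] for i in range(2)]

def inner(a, b, g):
    """a^i b^j g_ij, using only coordinates from the u^1 u^2-plane."""
    return sum(a[i] * b[j] * g[i][j] for i in range(2) for j in range(2))

def push_forward(a, u1, u2):
    """Image velocity a^1 dx/du^1 + a^2 dx/du^2 on the surface."""
    x_1, x_2 = tangent_vectors(u1, u2)
    return [a[0] * p + a[1] * q for p, q in zip(x_1, x_2)]

g = metric(1.0, 0.5)
a, b = [2.0, -1.0], [1.0, 3.0]

print(g)
print(inner(a, b, g), dot(push_forward(a, 1.0, 0.5), push_forward(b, 1.0, 0.5)))
print(math.sqrt(inner([1.0, 0.0], [1.0, 0.0], g)), math.sqrt(g[0][0]))
```

Because the metric is evaluated at a single point, a different base point gives a different matrix, which reflects the remark that the $g_{ij}$ vary from point to point.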
3. Curvature
The curvature of the surface at a point depends on how the unit normal vector is rotating as it moves past the point. This depends, of course, on how the normal is moving past the point, but this relationship is based on the derivative, and so, it is linear. We need, therefore, to find the derivative of the unit normal in two directions, and this is most convenient in the directions corresponding to $u^1$ and $u^2$. In other words, the curvature of the surface is completely described by $\frac{d\mathbf{n}}{du^1}$ and $\frac{d\mathbf{n}}{du^2}$.
In practice, these can be complicated derivatives to compute. For that reason, and also to understand them better, we will look at their relationship with other derivatives. The first derivatives of $\mathbf{x}$ determine $\mathbf{n}$, so the second derivatives of $\mathbf{x}$ should also determine the derivatives of $\mathbf{n}$. These relationships are linear, so all should be expressible in terms of matrix multiplications. Note that the product rule and chain rule generalize to inner products and cross products in a natural way, and we will use these as we need them.
The unit normal vector can be written in terms of the first derivatives, $\frac{d\mathbf{x}}{du^1}$ and $\frac{d\mathbf{x}}{du^2}$, as follows, since the cross product is perpendicular to the two factors.
\[ \mathbf{n} = \frac{\dfrac{d\mathbf{x}}{du^1} \times \dfrac{d\mathbf{x}}{du^2}}{\left\| \dfrac{d\mathbf{x}}{du^1} \times \dfrac{d\mathbf{x}}{du^2} \right\|} \tag{149} \]
Differentiating this expression directly is not immediately illuminating, so we will approach this from another direction. Each of the second partial derivatives $\frac{d^2\mathbf{x}}{du^j\,du^i}$ measures the change in the first derivative $\frac{d\mathbf{x}}{du^i}$ at each point of the surface. Some of this change occurs in the form of a change in magnitude, and some of this change is a result of the vector rotating. Furthermore, some change occurs along the tangent plane, and the rest in the direction of the normal vector. In any case, the two tangent vectors and the unit normal at each point form a basis for $\mathbb{R}^3$, so each second derivative can be expressed as a linear combination of these three vectors.
\begin{align*}
\frac{d^2\mathbf{x}}{du^j\,du^i} &= L_{ij}\,\mathbf{n} + \Gamma^1_{ij} \frac{d\mathbf{x}}{du^1} + \Gamma^2_{ij} \frac{d\mathbf{x}}{du^2} \tag{150} \\
&= L_{ij}\,\mathbf{n} + \Gamma^k_{ij} \frac{d\mathbf{x}}{du^k} \quad \text{(in Einstein notation)}
\end{align*}
The four numbers $L_{ij}$ measure how quickly the first derivatives turn away from the surface. These together, therefore, can conceivably contain all of the surface’s curvature information. We are assuming that this curvature information comes from the linear approximation of the rotation of the unit normal vector at each point. We should look for the relationship between the $L_{ij}$ and the derivatives of the unit normal vector. Now, the unit normal has constant magnitude, so its derivatives are perpendicular to $\mathbf{n}$. In other words, the derivatives of $\mathbf{n}$ must be parallel to the tangent plane. They can be expressed, therefore, as a linear combination of the first derivatives.
\begin{align*}
\frac{d\mathbf{n}}{du^i} &= -L^1_i \frac{d\mathbf{x}}{du^1} - L^2_i \frac{d\mathbf{x}}{du^2} \tag{151} \\
&= -L^j_i \frac{d\mathbf{x}}{du^j}
\end{align*}
The four numbers $-L^j_i$ are different from the $L_{ij}$, and in general, the position of the indices should be understood to indicate distinct variable names. The two $L$’s are closely related, and differences in index placement will usually imply a particular relationship. Furthermore, the negative signs on the $-L^j_i$ are customary and are used to simplify the relationship between the $L$’s. We have actually seen the $L^j_i$ before. The determinant of the matrix $[L^j_i]$ is the Gauss curvature.
To establish a formula tying the $L$’s together, we will differentiate the inner product $\left\langle \frac{d\mathbf{x}}{du^i}, \mathbf{n} \right\rangle$, which is zero, since the two vectors are perpendicular.
\begin{align*}
0 = \frac{d \left\langle \frac{d\mathbf{x}}{du^i}, \mathbf{n} \right\rangle}{du^k} &= \left\langle \frac{d\mathbf{x}}{du^i}, \frac{d\mathbf{n}}{du^k} \right\rangle + \left\langle \frac{d^2\mathbf{x}}{du^k\,du^i}, \mathbf{n} \right\rangle \\
&= \left\langle \frac{d\mathbf{x}}{du^i}, -L^j_k \frac{d\mathbf{x}}{du^j} \right\rangle + L_{ik} \tag{152} \\
&= -L^j_k \left\langle \frac{d\mathbf{x}}{du^i}, \frac{d\mathbf{x}}{du^j} \right\rangle + L_{ik} \\
&= -L^j_k\, g_{ij} + L_{ik}
\end{align*}
We have, therefore, that
\[ L_{ik} = L^j_k\, g_{ij} \quad \text{(in Einstein notation).} \tag{153} \]
This is a typical arrangement. We may speak of lowering an index, which means multiplying by the matrix associated with the metric. Note that the ordering of the indices is not critical, since the matrices we will be dealing with are for the most part symmetric.
The matrix $[g_{ij}]$ has a matrix inverse, which we will denote by
\[ [g_{ij}]^{-1} = [g^{jk}]. \tag{154} \]
Again, the $g$ with two superscripts is distinct from the $g$ with two subscripts. By definition, one is the metric, the other is the metric’s inverse. In particular, let
\[ \delta^i_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} \tag{155} \]
Essentially, $[\delta^i_j]$ is the identity matrix. We can express the fact that $[g_{ij}]$ and $[g^{jk}]$ are inverse matrices in Einstein notation by
\[ g_{ij}\, g^{jk} = \delta^k_i. \tag{156} \]
Equation (153) can be reversed using Einstein notation as
\begin{align*}
L_{ik} &= L^j_k\, g_{ij} \tag{157} \\
L_{ik}\, g^{il} &= L^j_k\, g_{ij}\, g^{il} \tag{158} \\
L_{ik}\, g^{il} &= L^j_k\, \delta^l_j = L^l_k. \tag{159}
\end{align*}
The matrix $[L^i_j]$ defines a linear transformation relating the rate at which the unit normal vector rotates with a corresponding velocity vector on the surface at a particular point on the surface. This linear transformation is called the Weingarten map. At each point of the surface the Weingarten map is a linear approximation to the Gauss map. It is, therefore, a derivative of the Gauss map, at least in some sense. The ratio of areas under the Gauss map with areas on the surface is the Gauss curvature, so the Weingarten map is intimately related to the curvature of the surface.
Since the relationship is linear, the particular region we choose to use to compute the two areas is not important. The easiest choice corresponds to the square determined by the unit vectors $[1, 0]$ and $[0, 1]$ in the $u^1u^2$-plane. The image of this square under $d\mathbf{x}$ is the parallelogram determined by the two tangent vectors $\frac{d\mathbf{x}}{du^1}$ and $\frac{d\mathbf{x}}{du^2}$. The area of this parallelogram can be found using the cross product
\[ \text{Area on surface} = \left\| \frac{d\mathbf{x}}{du^1} \times \frac{d\mathbf{x}}{du^2} \right\|. \tag{160} \]
The corresponding vectors under the Weingarten map are $\frac{d\mathbf{n}}{du^1} = -L^1_1 \frac{d\mathbf{x}}{du^1} - L^2_1 \frac{d\mathbf{x}}{du^2}$ and $\frac{d\mathbf{n}}{du^2} = -L^1_2 \frac{d\mathbf{x}}{du^1} - L^2_2 \frac{d\mathbf{x}}{du^2}$. The area of the parallelogram determined by these two vectors can also be found using the cross product.
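Before continuing with the area computation, the index raising $L^l_k = L_{ik}\, g^{il}$ and the claim that the determinant of $[L^j_i]$ is the Gauss curvature can be sanity-checked on a familiar surface. The sketch below is an illustration only. It assumes a sphere of radius $r$ with the hypothetical parametrization $\mathbf{x} = r[\sin u^1 \cos u^2, \sin u^1 \sin u^2, \cos u^1]$, uses hand-computed derivatives of that parametrization, and relies on the standard fact that such a sphere has Gauss curvature $1/r^2$. All function names are invented for the example.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sphere_derivatives(r, u1, u2):
    """First and second partials of x = r (sin u1 cos u2, sin u1 sin u2, cos u1)."""
    s1, c1, s2, c2 = math.sin(u1), math.cos(u1), math.sin(u2), math.cos(u2)
    x1 = [r * c1 * c2, r * c1 * s2, -r * s1]      # dx/du^1
    x2 = [-r * s1 * s2, r * s1 * c2, 0.0]         # dx/du^2
    x11 = [-r * s1 * c2, -r * s1 * s2, -r * c1]   # d2x/du^1 du^1
    x12 = [-r * c1 * s2, r * c1 * c2, 0.0]        # d2x/du^1 du^2
    x22 = [-r * s1 * c2, -r * s1 * s2, 0.0]       # d2x/du^2 du^2
    return x1, x2, x11, x12, x22

def gauss_curvature(r, u1, u2):
    x1, x2, x11, x12, x22 = sphere_derivatives(r, u1, u2)
    nv = cross(x1, x2)
    length = math.sqrt(dot(nv, nv))
    n = [c / length for c in nv]                  # unit normal
    g = [[dot(x1, x1), dot(x1, x2)],
         [dot(x2, x1), dot(x2, x2)]]              # metric g_ij
    L = [[dot(x11, n), dot(x12, n)],
         [dot(x12, n), dot(x22, n)]]              # L_ij = <d2x/du^j du^i, n>
    det_g = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det_g, -g[0][1] / det_g],
             [-g[1][0] / det_g, g[0][0] / det_g]]  # g^{ij}
    # Raise an index: L_up[l][k] holds L^l_k = L_ik g^{il}, summed over i.
    L_up = [[sum(L[i][k] * g_inv[i][l] for i in range(2)) for k in range(2)]
            for l in range(2)]
    return L_up[0][0] * L_up[1][1] - L_up[0][1] * L_up[1][0]

print(gauss_curvature(2.0, 0.7, 1.3), 1.0 / 2.0 ** 2)
```

The determinant is insensitive to the sign convention chosen for the $L^j_i$, since changing the sign of every entry of a 2-by-2 matrix leaves its determinant unchanged.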
Using the fact that the cross product is bilinear and anti-symmetric, we see that
\begin{align*}
\frac{d\mathbf{n}}{du^1} \times \frac{d\mathbf{n}}{du^2} &= \left( -L^1_1 \frac{d\mathbf{x}}{du^1} - L^2_1 \frac{d\mathbf{x}}{du^2} \right) \times \left( -L^1_2 \frac{d\mathbf{x}}{du^1} - L^2_2 \frac{d\mathbf{x}}{du^2} \right) \tag{161} \\
&= L^1_1 L^1_2\, \frac{d\mathbf{x}}{du^1} \times \frac{d\mathbf{x}}{du^1} + L^1_1 L^2_2\, \frac{d\mathbf{x}}{du^1} \times \frac{d\mathbf{x}}{du^2} + L^2_1 L^1_2\, \frac{d\mathbf{x}}{du^2} \times \frac{d\mathbf{x}}{du^1} + L^2_1 L^2_2\, \frac{d\mathbf{x}}{du^2} \times \frac{d\mathbf{x}}{du^2} \\
&= 0 + L^1_1 L^2_2\, \frac{d\mathbf{x}}{du^1} \times \frac{d\mathbf{x}}{du^2} - L^2_1 L^1_2\, \frac{d\mathbf{x}}{du^1} \times \frac{d\mathbf{x}}{du^2} + 0 \\
&= \left( L^1_1 L^2_2 - L^2_1 L^1_2 \right) \frac{d\mathbf{x}}{du^1} \times \frac{d\mathbf{x}}{du^2}
\end{align*}
The ratio of the areas under the Gauss map and areas on the surface, therefore, is given by the determinant of the matrix $[L^i_j]$.
4. The Inverse of the Metric
It is more difficult to compute distances along a surface, since the metric changes from point to point. At a particular point, however, the vector $[1, 0]$, say, has an interpretation as a velocity vector with magnitude, or speed, $\sqrt{g_{11}}$. A small movement in this direction from this point, say $[\Delta u^1, 0]$, corresponds to a distance $\sqrt{g_{11}}\,\Delta u^1$. This would be a good approximation for a distance along the surface in this direction for small values of $\Delta u^1$. From this point, it should be conceivable that we could compute distances using integration, but that is not the concern here. We will consider differentiation first. Remember that we obtained the metric from the first partial derivatives of the parametrization, and for unit vectors $u$, $\sqrt{\langle u, u \rangle}$ is the magnitude of the corresponding tangent vector on the surface. It is, in some sense, a directional derivative.
CHAPTER 7
Riemannian Curvature Tensor
1. Intrinsic Interpretations
We have discussed the second derivatives of the vector function $\mathbf{x}$ in terms of the following.
\[ \frac{d^2\mathbf{x}}{du^j\,du^i} = L_{ij}\,\mathbf{n} + \Gamma^k_{ij} \frac{d\mathbf{x}}{du^k} \tag{162} \]
These are called Gauss’ formulas. The $L_{ij}$ are called the coefficients of the second fundamental form, and the $\Gamma^k_{ij}$ are called the Christoffel symbols (of the second kind). We have talked about the $L_{ij}$ a bit, and right now, we will focus on the $\Gamma^k_{ij}$.
We saw earlier that if we followed a tangent vector around a closed path in a plane, then the net rotation of the tangent vector would be $2\pi$ (radians). Any deviation from $2\pi$ gives us direct information about total curvature contained within the closed curve. Let us try to follow a tangent vector around the path corresponding to the unit square in the $u^1u^2$-plane ($[0, 1] \times [0, 1]$) using information only available at the point (via derivatives). That is, we will use the $\Gamma^k_{ij}$ and perhaps derivatives of these at the point. Imposing the normal looking graph paper of the $u^1u^2$-plane on the surface, we will be running around one of the “squares,” which we’ll call s-squares, to have a name.
Running from $(0, 0)$ to $(1, 0)$, the tangent vector $\frac{d\mathbf{x}}{du^1}$ gives us a velocity on the surface. We will assume that this vector is tangent to the first side of the s-square. The vector $\frac{d\mathbf{x}}{du^1}$ will move a distance $\left\| \frac{d\mathbf{x}}{du^1} \right\|$ (approximately) to the next vertex. It will turn towards $\frac{d\mathbf{x}}{du^2}$ according to $\Gamma^2_{11}$. It changes magnitude according to $\Gamma^1_{11}$. In particular, traversing the first side affects the tangent vector in the following way
\[ \frac{d\mathbf{x}}{du^1} \to \left( 1 + \Gamma^1_{11} \right) \frac{d\mathbf{x}}{du^1} + \Gamma^2_{11} \frac{d\mathbf{x}}{du^2}. \tag{163} \]
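Now we want to push this vector along the path corresponding to the segment from $(1, 0)$ to $(1, 1)$. We’re pushing in the direction of $u^2$, so we are interested in $\Gamma^1_{12}$ and $\Gamma^2_{12}$, but these have changed slightly, since we’re starting at a different point. This change can be approximated using $\frac{d\Gamma^1_{12}}{du^1}$ and $\frac{d\Gamma^2_{12}}{du^1}$. We have moved a distance corresponding to a change in $u^1$ of one unit, so these are the differences we need. We have the effect of traversing the next side as
\begin{align*}
&\left[ \left( 1 + \Gamma^1_{11} \right) + \left( 1 + \Gamma^1_{11} \right) \left( \Gamma^1_{12} + \frac{d\Gamma^1_{12}}{du^1} \right) \right] \frac{d\mathbf{x}}{du^1} + \left[ \Gamma^2_{11} + \left( 1 + \Gamma^1_{11} \right) \left( \Gamma^2_{12} + \frac{d\Gamma^2_{12}}{du^1} \right) \right] \frac{d\mathbf{x}}{du^2} \tag{164} \\
&\quad = \left( 1 + \Gamma^1_{11} + \Gamma^1_{12} + \frac{d\Gamma^1_{12}}{du^1} + \Gamma^1_{11}\Gamma^1_{12} + \Gamma^1_{11} \frac{d\Gamma^1_{12}}{du^1} \right) \frac{d\mathbf{x}}{du^1} \\
&\qquad + \left( \Gamma^2_{11} + \Gamma^2_{12} + \frac{d\Gamma^2_{12}}{du^1} + \Gamma^1_{11}\Gamma^2_{12} + \Gamma^1_{11} \frac{d\Gamma^2_{12}}{du^1} \right) \frac{d\mathbf{x}}{du^2}
\end{align*}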
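Pushing $\frac{d\mathbf{x}}{du^1}$ the other way around the square, from $(0, 0)$ to $(0, 1)$ to $(1, 1)$, yields
\begin{align*}
&\left[ \left( 1 + \Gamma^1_{12} \right) + \left( 1 + \Gamma^1_{12} \right) \left( \Gamma^1_{11} + \frac{d\Gamma^1_{11}}{du^2} \right) \right] \frac{d\mathbf{x}}{du^1} + \left[ \Gamma^2_{12} + \left( 1 + \Gamma^1_{12} \right) \left( \Gamma^2_{11} + \frac{d\Gamma^2_{11}}{du^2} \right) \right] \frac{d\mathbf{x}}{du^2} \tag{165} \\
&\quad = \left( 1 + \Gamma^1_{11} + \Gamma^1_{12} + \frac{d\Gamma^1_{11}}{du^2} + \Gamma^1_{12}\Gamma^1_{11} + \Gamma^1_{12} \frac{d\Gamma^1_{11}}{du^2} \right) \frac{d\mathbf{x}}{du^1} \\
&\qquad + \left( \Gamma^2_{11} + \Gamma^2_{12} + \frac{d\Gamma^2_{11}}{du^2} + \Gamma^1_{12}\Gamma^2_{11} + \Gamma^1_{12} \frac{d\Gamma^2_{11}}{du^2} \right) \frac{d\mathbf{x}}{du^2}
\end{align*}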
This first attempt is all wrong, so we start over again.
We want to push the vector $\mathbf{x}_1$ around the square two ways (writing $\mathbf{x}_i$ for $\frac{d\mathbf{x}}{du^i}$): in the $u^1$ direction and then the $u^2$ direction, and also in the $u^2$ direction and then the $u^1$ direction. Let’s call these $\mathbf{x}_1(12)$ and $\mathbf{x}_1(21)$. We’ll also say $\mathbf{x}_1(1)$ is the intermediate vector after pushing $\mathbf{x}_1$ in only the $u^1$ direction. Based on the information available at the original point, we can make the following “best guess.” We start with
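\[ \mathbf{x}_1(1) = \mathbf{x}_1 + \Gamma^1_{11}\mathbf{x}_1 + \Gamma^2_{11}\mathbf{x}_2 \tag{166} \]
\[ \mathbf{x}_2(1) = \mathbf{x}_2 + \Gamma^1_{21}\mathbf{x}_1 + \Gamma^2_{21}\mathbf{x}_2 \tag{167} \]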
As we push $\mathbf{x}_1(1)$ to $\mathbf{x}_1(12)$, we can use the best information we have about the current states of the various quantities. We have estimates of $\mathbf{x}_1(1)$ and $\mathbf{x}_2(1)$, and we can also estimate the new $\Gamma^k_{12}$ with
\[ \Gamma^1_{12}(1) = \Gamma^1_{12} + \frac{d}{du^1}\Gamma^1_{12} \tag{168} \]
\[ \Gamma^2_{12}(1) = \Gamma^2_{12} + \frac{d}{du^1}\Gamma^2_{12}. \tag{169} \]
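Therefore, we can make the estimate
\[ \mathbf{x}_1(12) = \mathbf{x}_1(1) + \Gamma^1_{12}(1)\,\mathbf{x}_1(1) + \Gamma^2_{12}(1)\,\mathbf{x}_2(1), \tag{170} \]
and substitution yields
\begin{align*}
\mathbf{x}_1(12) &= \mathbf{x}_1 + \Gamma^1_{11}\mathbf{x}_1 + \Gamma^2_{11}\mathbf{x}_2 \\
&\quad + \left( \Gamma^1_{12} + \frac{d}{du^1}\Gamma^1_{12} \right) \left( \mathbf{x}_1 + \Gamma^1_{11}\mathbf{x}_1 + \Gamma^2_{11}\mathbf{x}_2 \right) \\
&\quad + \left( \Gamma^2_{12} + \frac{d}{du^1}\Gamma^2_{12} \right) \left( \mathbf{x}_2 + \Gamma^1_{21}\mathbf{x}_1 + \Gamma^2_{21}\mathbf{x}_2 \right) \tag{171} \\
&= \left( 1 + \Gamma^1_{11} + \Gamma^1_{12} + \frac{d\Gamma^1_{12}}{du^1} + \Gamma^1_{12}\Gamma^1_{11} + \frac{d\Gamma^1_{12}}{du^1}\Gamma^1_{11} + \Gamma^2_{12}\Gamma^1_{21} + \frac{d\Gamma^2_{12}}{du^1}\Gamma^1_{21} \right) \mathbf{x}_1 \\
&\quad + \left( \Gamma^2_{11} + \Gamma^1_{12}\Gamma^2_{11} + \frac{d\Gamma^1_{12}}{du^1}\Gamma^2_{11} + \Gamma^2_{12} + \frac{d\Gamma^2_{12}}{du^1} + \Gamma^2_{12}\Gamma^2_{21} + \frac{d\Gamma^2_{12}}{du^1}\Gamma^2_{21} \right) \mathbf{x}_2
\end{align*}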
For $\mathbf{x}_1(21)$, we can make similar approximations.
\[ \mathbf{x}_1(21) = \mathbf{x}_1(2) + \Gamma^1_{11}(2)\,\mathbf{x}_1(2) + \Gamma^2_{11}(2)\,\mathbf{x}_2(2). \tag{172} \]
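The intermediate values are
\[ \mathbf{x}_1(2) = \mathbf{x}_1 + \Gamma^1_{12}\mathbf{x}_1 + \Gamma^2_{12}\mathbf{x}_2 \tag{173} \]
\[ \mathbf{x}_2(2) = \mathbf{x}_2 + \Gamma^1_{22}\mathbf{x}_1 + \Gamma^2_{22}\mathbf{x}_2 \tag{174} \]
\[ \Gamma^1_{11}(2) = \Gamma^1_{11} + \frac{d}{du^2}\Gamma^1_{11} \tag{175} \]
\[ \Gamma^2_{11}(2) = \Gamma^2_{11} + \frac{d}{du^2}\Gamma^2_{11}. \tag{176} \]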
Therefore,
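\begin{align*}
\mathbf{x}_1(21) &= \mathbf{x}_1 + \Gamma^1_{12}\mathbf{x}_1 + \Gamma^2_{12}\mathbf{x}_2 \\
&\quad + \left( \Gamma^1_{11} + \frac{d}{du^2}\Gamma^1_{11} \right) \left( \mathbf{x}_1 + \Gamma^1_{12}\mathbf{x}_1 + \Gamma^2_{12}\mathbf{x}_2 \right) \\
&\quad + \left( \Gamma^2_{11} + \frac{d}{du^2}\Gamma^2_{11} \right) \left( \mathbf{x}_2 + \Gamma^1_{22}\mathbf{x}_1 + \Gamma^2_{22}\mathbf{x}_2 \right) \tag{177} \\
&= \left( 1 + \Gamma^1_{12} + \Gamma^1_{11} + \frac{d\Gamma^1_{11}}{du^2} + \Gamma^1_{11}\Gamma^1_{12} + \frac{d\Gamma^1_{11}}{du^2}\Gamma^1_{12} + \Gamma^2_{11}\Gamma^1_{22} + \frac{d\Gamma^2_{11}}{du^2}\Gamma^1_{22} \right) \mathbf{x}_1 \\
&\quad + \left( \Gamma^2_{12} + \Gamma^1_{11}\Gamma^2_{12} + \frac{d\Gamma^1_{11}}{du^2}\Gamma^2_{12} + \Gamma^2_{11} + \frac{d\Gamma^2_{11}}{du^2} + \Gamma^2_{11}\Gamma^2_{22} + \frac{d\Gamma^2_{11}}{du^2}\Gamma^2_{22} \right) \mathbf{x}_2.
\end{align*}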
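Some measure of the curvature is given by how different $\mathbf{x}_1(12)$ is from $\mathbf{x}_1(21)$. If we subtract, we see that
\begin{align*}
&\mathbf{x}_1(12) - \mathbf{x}_1(21) \\
&= \Bigl( \frac{d\Gamma^1_{12}}{du^1} - \frac{d\Gamma^1_{11}}{du^2} + \Gamma^1_{12}\Gamma^1_{11} - \Gamma^1_{11}\Gamma^1_{12} + \frac{d\Gamma^1_{12}}{du^1}\Gamma^1_{11} - \frac{d\Gamma^1_{11}}{du^2}\Gamma^1_{12} \\
&\qquad + \Gamma^2_{12}\Gamma^1_{21} - \Gamma^2_{11}\Gamma^1_{22} + \frac{d\Gamma^2_{12}}{du^1}\Gamma^1_{21} - \frac{d\Gamma^2_{11}}{du^2}\Gamma^1_{22} \Bigr)\, \mathbf{x}_1 \\
&\quad + \Bigl( \Gamma^1_{12}\Gamma^2_{11} - \Gamma^1_{11}\Gamma^2_{12} + \frac{d\Gamma^1_{12}}{du^1}\Gamma^2_{11} - \frac{d\Gamma^1_{11}}{du^2}\Gamma^2_{12} + \frac{d\Gamma^2_{12}}{du^1} - \frac{d\Gamma^2_{11}}{du^2} \tag{178} \\
&\qquad + \Gamma^2_{12}\Gamma^2_{21} - \Gamma^2_{11}\Gamma^2_{22} + \frac{d\Gamma^2_{12}}{du^1}\Gamma^2_{21} - \frac{d\Gamma^2_{11}}{du^2}\Gamma^2_{22} \Bigr)\, \mathbf{x}_2 \\
&= \left( \frac{d\Gamma^1_{12}}{du^1} - \frac{d\Gamma^1_{11}}{du^2} + \Gamma^p_{12}\Gamma^1_{p1} - \Gamma^p_{11}\Gamma^1_{p2} + \frac{d\Gamma^p_{12}}{du^1}\Gamma^1_{p1} - \frac{d\Gamma^p_{11}}{du^2}\Gamma^1_{p2} \right) \mathbf{x}_1 \\
&\quad + \left( \Gamma^p_{12}\Gamma^2_{p1} - \Gamma^p_{11}\Gamma^2_{p2} + \frac{d\Gamma^p_{12}}{du^1}\Gamma^2_{p1} - \frac{d\Gamma^p_{11}}{du^2}\Gamma^2_{p2} + \frac{d\Gamma^2_{12}}{du^1} - \frac{d\Gamma^2_{11}}{du^2} \right) \mathbf{x}_2.
\end{align*}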
If we were pushing $\mathbf{x}_1$ around an $\epsilon$-square, then all terms will have a factor of $\epsilon$, except for the terms like $\frac{d\Gamma^1_{12}}{du^1}\Gamma^1_{11}$, which would have a factor of $\epsilon^2$. It is conceivable that in a comparison with the actual function, the $\epsilon^2$-terms would become irrelevant. This brings agreement with the Riemannian curvature tensor. That is, $\mathbf{x}_1(12) - \mathbf{x}_1(21) = R^k_{112}\,\mathbf{x}_k$.
In Millman and Parker, it is shown that
\[ K = \frac{R^l_{121}\, g_{l2}}{g}. \tag{179} \]
It seems that $R^l_{112}$ or $R^l_{212}$ could also be used with an appropriate $g_{ij}$. In our case, this appears to be
\[ K = \frac{R^l_{112}\, g_{l2}}{g}. \tag{180} \]
This appears to be
\[ \frac{\langle \mathbf{x}_1(12) - \mathbf{x}_1(21), \mathbf{x}_2 \rangle}{g}. \tag{181} \]
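The Millman and Parker formula $K = R^l_{121}\, g_{l2} / g$ can be tried out on a case where the answer is known. The following Python sketch is an illustration under several assumptions that are not in the text: it uses the unit sphere with metric $g_{11} = 1$, $g_{12} = g_{21} = 0$, $g_{22} = \sin^2 u^1$, the standard closed-form Christoffel symbols for that metric ($\Gamma^1_{22} = -\sin u^1 \cos u^1$, $\Gamma^2_{12} = \Gamma^2_{21} = \cot u^1$, all others zero), and the curvature tensor in the form $R^l_{ijk} = \frac{d\Gamma^l_{ik}}{du^j} - \frac{d\Gamma^l_{ij}}{du^k} + \Gamma^p_{ik}\Gamma^l_{pj} - \Gamma^p_{ij}\Gamma^l_{pk}$, which is the expression obtained above once the products of a derivative with a Christoffel symbol are dropped. Derivatives of the Christoffel symbols are estimated by central differences, and all function names are hypothetical.

```python
import math

def christoffel(u):
    """G[k][i][j] holds Gamma^{k+1}_{(i+1)(j+1)} for the unit sphere at u = (u1, u2)."""
    s, c = math.sin(u[0]), math.cos(u[0])
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -s * c               # Gamma^1_22
    G[1][0][1] = G[1][1][0] = c / s   # Gamma^2_12 = Gamma^2_21
    return G

def metric(u):
    """Metric of the unit sphere in these coordinates."""
    return [[1.0, 0.0], [0.0, math.sin(u[0]) ** 2]]

def d_christoffel(u, m, h=1e-5):
    """Central-difference estimate of the derivative of every Gamma along u^{m+1}."""
    up, dn = list(u), list(u)
    up[m] += h
    dn[m] -= h
    Gp, Gm = christoffel(up), christoffel(dn)
    return [[[(Gp[k][i][j] - Gm[k][i][j]) / (2.0 * h) for j in range(2)]
             for i in range(2)] for k in range(2)]

def riemann(u, l, i, j, k):
    """R^l_{ijk} with 0-based indices, using the formula stated in the lead-in."""
    G = christoffel(u)
    dG_j, dG_k = d_christoffel(u, j), d_christoffel(u, k)
    value = dG_j[l][i][k] - dG_k[l][i][j]
    for p in range(2):
        value += G[p][i][k] * G[l][p][j] - G[p][i][j] * G[l][p][k]
    return value

def gauss_curvature(u):
    """K = R^l_{121} g_{l2} / g; indices (1, 2, 1) become (0, 1, 0) in 0-based form."""
    g = metric(u)
    det_g = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return sum(riemann(u, l, 0, 1, 0) * g[l][1] for l in range(2)) / det_g

print(gauss_curvature((0.9, 0.3)))
```

For the unit sphere the printed value should be close to 1 at any point away from the poles, where these coordinates break down.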
To see that the difference seen in the vector $\mathbf{x}_1$ as it is pushed around the square two different ways is relevant, we get an initial confirmation from the following. In the Smarandache Manifolds book, it is shown that the relative angle between two geodesics increases or decreases depending on the total curvature between the geodesics. If the relative angle decreases by $\theta$ radians, then there must be $\theta$ total curvature between. The total curvature, therefore, must be $\pm$ the angle between $\mathbf{x}_1(12)$ and $\mathbf{x}_1(21)$. The formula given by Millman and Parker computes this angle. We can verify this as follows.
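\[ \langle \mathbf{x}_1(12), \mathbf{x}_2 \rangle = \|\mathbf{x}_1(12)\|\,\|\mathbf{x}_2\| \cos\theta_{12} \approx \|\mathbf{x}_1\|\,\|\mathbf{x}_2\| \cos\theta_{12} \tag{182} \]
\[ \langle \mathbf{x}_1(21), \mathbf{x}_2 \rangle = \|\mathbf{x}_1(21)\|\,\|\mathbf{x}_2\| \cos\theta_{21} \approx \|\mathbf{x}_1\|\,\|\mathbf{x}_2\| \cos\theta_{21}, \tag{183} \]
and we want $\theta_{12} - \theta_{21}$. We can use the trig identity
\[ \cos\alpha - \cos\beta = -2 \sin\!\left( \frac{\alpha + \beta}{2} \right) \sin\!\left( \frac{\alpha - \beta}{2} \right). \tag{184} \]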
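We also have the fact that
\[ g = \|\mathbf{x}_1 \times \mathbf{x}_2\|^2. \tag{185} \]
Therefore,
\begin{align*}
\frac{R^l_{121}\, g_{l2}}{g} &= \frac{\|\mathbf{x}_1\|\,\|\mathbf{x}_2\|\,(-2) \sin\!\left( \frac{\theta_{12} + \theta_{21}}{2} \right) \sin\!\left( \frac{\theta_{12} - \theta_{21}}{2} \right)}{\|\mathbf{x}_1\|^2\,\|\mathbf{x}_2\|^2 \sin^2\theta} \tag{186} \\
&\approx \frac{\theta_{12} - \theta_{21}}{\|\mathbf{x}_1\|\,\|\mathbf{x}_2\| \sin\theta} \\
&= K
\end{align*}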
CHAPTER 8
Curvature of 3-Dimensional Spaces
1. What we know
There has been some work done with polyhedral metrics by Gromov, and later by Aitchison and Rubinstein. The latter two worked with cubings of 3-manifolds, which consist of flat 3-cubes, and the curvature is concentrated in the 1-skeleton. The dihedral angle around each edge must be at least $2\pi$, so the geometry is Euclidean or hyperbolic around the edges. At each vertex, it is required that $\mathrm{lk}(v)$ has the property that every 1-cycle has at least three edges and that every 1-cycle with exactly three edges bounds a triangle contained in exactly one cube. I’m not sure what $\mathrm{lk}(v)$ is, but my first guess is that it is a small ball about the vertex, and I think this means that it is a simplicial complex. My second guess is that it is the surface of this ball. Each triangle corresponds to a cube. We’re looking at the triangulation of a 2-sphere. OK, I think this is it, and I believe $\mathrm{lk}(v)$ stands for the link of $v$.
2. What is the geometry like around a vertex of a cubed 3-manifold?
The simplest case might be eight cubes arranged like the octants of $\mathbb{R}^3$ about the origin. Adding two more in the most obvious way yields the geometry of stacked cones with impulse curvature $-\frac{\pi}{2}$.
Question 1. What is the nature of the curvature at a point beyond this type of dihedral curvature?
3. A positive curvature example
This simple configuration consists of the corners of four cubes meeting at one vertex. A polyhedral ball about this vertex would form a tetrahedron. Since only one vertex is being considered, we can build the space out of four of the eight octants of $\mathbb{R}^3$. We will use the $x^+y^+z^+$-octant and the three octants adjacent to it, the $x^-y^+z^+$-octant, the $x^+y^-z^+$-octant, and the $x^+y^+z^-$-octant. From this configuration, we will identify the $x^-z^+$-quarter plane of the $x^-y^+z^+$-octant with the $y^-z^+$-quarter plane of the $x^+y^-z^+$-octant; the $x^-y^+$-quarter plane of the $x^-y^+z^+$-octant with the $y^+z^-$-quarter plane of the $x^+y^+z^-$-octant; and the $x^+z^-$-quarter plane of the $x^+y^+z^-$-octant with the $x^+y^-$-quarter plane of the $x^+y^-z^+$-octant. A view of this using cubes is shown in Figure 1. Note that we are left with four semi-axes. The $x^+$-, $y^+$-, and $z^+$-axes remain, and the three negative axes have been identified. We will refer to this last axis as the negative axis.
The geometry in the interior of each of the octants is Euclidean, and there is $\frac{\pi}{2}$ radians of dihedral curvature along each of the axes. This can be seen in the 2-gon shown in Figure 2. It might be that it makes sense to say that there is $2\pi$ steradians of curvature at the central vertex. One possible effect is some torsion in the curvature of a curve near it.
Figure 1. A depiction of the identifications: $x^-z^+ \equiv y^-z^+$, $x^-y^+ \equiv y^+z^-$, and $x^+z^- \equiv x^+y^-$.
Figure 2. A 2-gon with total curvature $\frac{\pi}{2}$.