VECTOR CALCULUS

Syafiq Johar [email protected]

Contents

1 Introduction

2 Vector Spaces
   2.1 Vectors in Rⁿ
   2.2 Geometry of Vectors
   2.3 Lines and Planes in Rⁿ

3 Quadric Surfaces
   3.1 Curvilinear Coordinates

4 Vector-Valued Functions
   4.1 Derivatives of Vectors
   4.2 Length of Curves
   4.3 Parametrisation of Curves
   4.4 Integration of Vectors

5 Multivariable Functions
   5.1 Level Sets
   5.2 Partial Derivatives
   5.3 Gradient
   5.4 Critical and Extrema Points
   5.5 Lagrange Multipliers
   5.6 Integration over Multiple Variables

6 Vector Fields
   6.1 Divergence and Curl

7 Line and Surface Integrals
   7.1 Line Integral
   7.2 Green’s Theorem
   7.3 Surface Integral
   7.4 Stokes’ Theorem

1 Introduction

In previous calculus courses, you have seen the calculus of a single variable. This variable, usually denoted x, is fed through a function f to give a real value f(x), which can then be represented as a graph (x, f(x)) in R². From this graph, as long as f is sufficiently nice, we can deduce many properties and compute extra information using differentiation and integration. Differentiation here measures the rate of change of the function as we vary x; that is, df/dx is the infinitesimal rate of change of the quantity f.

In this course, we are going to extend the notion of one-dimensional calculus to higher dimensions. We are going to work over vector spaces over the real numbers R. In higher dimensional spaces, we have a distinction between two objects, called scalars and vectors. Scalars are quantities that only have a magnitude or size, which can be described by a real number. Vectors, on the other hand, are quantities that have a magnitude and a direction. Vectors can be described with an array of numbers, as we shall see later. This array of numbers is reminiscent of the pair (x, f(x)) we have seen above, but of course, in higher dimensions, we would have even more numbers.

In higher dimensions, it can be difficult to have an explicit graphical representation of vectors, so we will restrict our attention to 2 and 3 dimensions most of the time. Even so, there is a rich amount of mathematics available in these low dimensions. There are also various applications of vector calculus in physics, engineering, economics, geophysical sciences, meteorology, astronomy, and optimisation. It is also used as a tool, and generalised, in further studies of pure mathematics, such as topology and differential geometry.

2 Vector Spaces

In order to study objects in higher dimensions, we first define vector spaces. Abstractly, a real vector space is a collection of points (or can be seen as arrows) with two operations: addition and scalar multiplication. More concretely:

Definition 2.1 (Vector spaces). A vector space V over the field F is a non-empty set V together with the addition map V × V → V such that (u, v) 7→ u + v and a scalar multiplication map F × V → V such that (λ, v) 7→ λv satisfying the vector space axioms:

1. addition is commutative: u + v = v + u,

2. addition is associative: (u + v) + w = u + (v + w),

3. existence of additive identity: there exists an element 0 such that 0 + v = v + 0 = v,

4. existence of additive inverse: for every v ∈ V , there exists a u ∈ V such that v + u = u + v = 0,

5. distributivity of scalar multiplication over addition: λ(u + v) = λu + λv,

6. distributivity of scalar multiplication over field addition: (λ + µ)u = λu + µu,

7. compatibility of scalar multiplication with field multiplication: (λµ)u = λ(µu),

8. existence of scalar multiplication identity: there exists an element 1 ∈ F such that 1v = v.

2.1 Vectors in Rⁿ

For concreteness and practical applications, we are mostly interested in the vector space Rⁿ over the field R. In this vector space, every element v ∈ Rⁿ can be written as the list (v1, v2, . . . , vn) where vi ∈ R for all i = 1, 2, . . . , n. Sometimes these numbers are arranged in a column; in matrix notation, this is written as the transpose (v1, v2, . . . , vn)ᵀ. We can also write the vectors in terms of the standard basis of Rⁿ:

Definition 2.2 (Standard basis of Rⁿ). The standard basis of Rⁿ is given by the collection of vectors {e1, e2, . . . , en} such that ei = (0, . . . , 0, 1, 0, . . . , 0), where the number 1 appears in the i-th position and 0 appears everywhere else. This basis is also called the Cartesian coordinate system.

In this system, we can express a vector v = (v1, v2, . . . , vn) as the sum v = v1e1 + v2e2 + . . . + vnen. Note that this is simply a generalisation of the Cartesian plane R² we have seen in high school, where the standard basis is simply the unit vectors in the x and y axes. Thus, vectors in R² are written as the pair (x, y) = xe1 + ye2. The difference in higher dimensions is just that we have more components to specify a point.


Figure 1: A vector v and its coordinates v = v1e1 + v2e2 + v3e3 = (v1, v2, v3).

With this concrete expression of vectors, we can define the addition and scalar multiplication explicitly. Suppose that u = (u1, u2, . . . , un) and v = (v1, v2, . . . , vn). Then:

u + v = (u1, u2, . . . , un) + (v1, v2, . . . , vn) = (u1 + v1, u2 + v2, . . . , un + vn),

λu = λ(u1, u2, . . . , un) = (λu1, λu2, . . . , λun).

Of course, the zero vector 0 in this expression is given by (0, 0,..., 0). With these defined, one can easily check that all the vector space axioms are satisfied.

Remark 2.3. In engineering or physics, where one usually works in low dimensional vector spaces like R² or R³, the standard bases of R² and R³ are written as {i = (1, 0), j = (0, 1)} and {i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)} respectively.

2.2 Geometry of Vectors

Geometrically, we can view a vector as the position of a point relative to the 0 vector (called the origin). This type of vector is called a position vector. The scalars vi are called the coordinates (or components) of the vector (or point) v with respect to the standard basis. Two position vectors are called parallel if they are scalar multiples of each other. Another interpretation of a vector is that it represents a movement or direction from a specific point. For example, starting from a point w = (w1, w2, . . . , wn), if we move in the direction v = (v1, v2, . . . , vn), we end up at the point (w1 + v1, w2 + v2, . . . , wn + vn) = w + v. The vector v is called the translation vector. Therefore, if we start at a point u and wish to end up at the point w, we have to move in the direction w − u.


Figure 2: Vector addition and subtraction.

Since vectors represent geometrical objects, we can deduce some geometrical properties such as lengths and angles.

Definition 2.4 (Length). The length or magnitude of a vector v = (v1, v2, . . . , vn) ∈ Rⁿ is the non-negative real number |v| defined by:

|v| = √(v1² + v2² + . . . + vn²),

which measures the distance of the point v from the origin 0.

From the above definition, the distance between any two points u and v in Rⁿ is given by the magnitude of the vector v − u, that is:

dist(u, v) = |v − u| = √((v1 − u1)² + (v2 − u2)² + . . . + (vn − un)²).

Associated to any non-zero vector v is the unit vector v̂, which is defined as the vector parallel to v with unit length. This definition is useful for denoting the direction a vector is pointing, without regard to its magnitude. More concretely, it is given by the following:

Definition 2.5 (Unit vector). Let v ∈ Rⁿ be a non-zero vector. Then the unit vector in the direction of v is given by:

v̂ = v/|v|.

Dividing a non-zero vector by its magnitude is an operation called normalising.
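As a quick numerical illustration of Definitions 2.4 and 2.5 (a sketch in Python, not part of the notes; the function names are our own):

```python
import math

def magnitude(v):
    # |v| = sqrt(v1^2 + v2^2 + ... + vn^2)
    return math.sqrt(sum(vi * vi for vi in v))

def normalise(v):
    # v_hat = v / |v|, defined only for non-zero v
    m = magnitude(v)
    if m == 0:
        raise ValueError("cannot normalise the zero vector")
    return tuple(vi / m for vi in v)

v = (3.0, 4.0)
print(magnitude(v))   # 5.0
print(normalise(v))   # (0.6, 0.8), a unit vector parallel to v
```

Note that the result of normalising always has magnitude 1, whatever the input vector.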

Proposition 2.6 (Properties of magnitude). Suppose that u, v ∈ Rⁿ and λ ∈ R. Then:

1. |u| ≥ 0 with equality if and only if u = 0,

2. |λu| = |λ||u|,

3. triangle inequality: |u + v| ≤ |u| + |v| with equality if and only if u = λv for some λ ≥ 0.

4. reverse triangle inequality: |u − v| ≥ ||u| − |v||.

Proof. The first two assertions are clear. To prove the third assertion, suppose that u = (u1, u2, . . . , un) and v = (v1, v2, . . . , vn). The inequality is trivial if v = 0, so assume that v ≠ 0. Then for any real number x ∈ R, we have (with all sums running over i = 1, . . . , n):

0 ≤ |u + xv|² = ∑ᵢ (ui + xvi)² = |u|² + x²|v|² + 2x ∑ᵢ uivi,   (1)

which is a quadratic expression in x. Since the quadratic expression is non-negative, its discriminant must be non-positive:

(2 ∑ᵢ uivi)² − 4|u|²|v|² ≤ 0  ⇒  |∑ᵢ uivi| ≤ |u||v|.

Hence, we compute:

|u + v|² = |u|² + |v|² + 2 ∑ᵢ uivi ≤ |u|² + |v|² + 2|∑ᵢ uivi| ≤ |u|² + |v|² + 2|u||v| = (|u| + |v|)²,   (2)

which, upon taking the square root on both sides, implies the desired inequality.

Equality in (2) occurs exactly when both of the inequalities in (2) are equalities. The second inequality is an equality if and only if the discriminant of the quadratic expression (1) is exactly zero, that is, when (1) has a repeated root, say x = µ. Hence |u + µv|² = 0, which implies that u = −µv for some µ ∈ R. Furthermore, the first inequality in (2) is an equality if and only if:

∑ᵢ uivi = |∑ᵢ uivi|  ⇔  −µ|v|² = |µ||v|²,

that is, exactly when µ ≤ 0. So equality occurs if and only if u = λv for some λ = −µ ≥ 0.

Example 2.7. Using the triangle inequality or otherwise, prove the reverse triangle inequality.

The inequality is called the triangle inequality since all the magnitudes are lengths of the three sides of a triangle in Rⁿ with vertices at the points 0, u, and u + v. Geometrically, it says that the length of any side of a triangle is no bigger than the sum of the lengths of the other two sides.

We can also define a “multiplication” operation Rⁿ × Rⁿ → R, (u, v) ↦ u · v, which is called the dot product (also called the scalar product or the Euclidean inner product). This product is defined in the following way:

Definition 2.8 (Dot product). Given two vectors u = (u1, u2, . . . , un) and v = (v1, v2, . . . , vn) in Rⁿ, the dot product u · v is defined as the scalar:

u · v = u1v1 + u2v2 + ... + unvn.

If the vectors u and v are expressed as row vectors, then the dot product can be described as the matrix multiplication u · v = uvᵀ. Clearly, the length of a vector u is equal to the square root of the dot product of u with itself, that is:

|u| = √(u · u).
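A small Python sketch of the dot product and its relation to length (illustrative only; the function name is ours):

```python
import math

def dot(u, v):
    # u . v = u1*v1 + u2*v2 + ... + un*vn
    return sum(ui * vi for ui, vi in zip(u, v))

u = (1.0, 2.0, 2.0)
print(dot(u, u))             # 9.0
print(math.sqrt(dot(u, u)))  # |u| = 3.0
```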

Proposition 2.9 (Properties of dot product). Suppose that u, v, w ∈ Rⁿ and λ ∈ R. Then:

1. u · v = v · u,

2. u · 0 = 0 · u = 0.

3. (λu) · v = λ(u · v) = u · (λv),

4. (u + v) · w = u · w + v · w,

5. 2(u · v) = |u|² + |v|² − |u − v|²,

6. the Cauchy-Schwarz inequality: |u · v| ≤ |u||v| with equality if and only if u = λv for some λ ∈ R.

Proof. The first five assertions can be checked by explicit computation. The final assertion was proved in the proof of Proposition 2.6.

Apart from lengths, the dot product also measures the angle between any two vectors at their point of intersection. Concretely, we have:

u · v = |u||v| cos θ,

where θ ∈ [0, π] is the smaller angle between the lines joining the origin to u and to v. Therefore, we can see geometrically that the dot product is the length of the projection of one vector onto the other times the length of the second vector. From this expression, two vectors u and v are perpendicular if and only if u · v = 0. Thus, we emphasise that u · v = 0 does not imply that u = 0 or v = 0, as both vectors might be non-zero perpendicular vectors.


Figure 3: Dot product and its geometric interpretation.
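The relation u · v = |u||v| cos θ gives a direct way to compute the angle between two non-zero vectors; a hedged Python sketch (the clamp guarding against rounding error is our own addition):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def angle(u, v):
    # theta in [0, pi] satisfying u . v = |u||v| cos(theta); u, v non-zero
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

print(angle((1.0, 0.0), (0.0, 2.0)))  # pi/2: perpendicular vectors
print(angle((1.0, 1.0), (1.0, 0.0)))  # pi/4
```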

Exercise 2.10. Recall the cosine rule from trigonometry: given a triangle with side lengths a, b, and c, with θ denoting the angle opposite the side of length c, we have c² = a² + b² − 2ab cos θ. By using this fact and assertion 5 in Proposition 2.9, prove that u · v = |u||v| cos θ.

Exercise 2.11. Let u and v be vectors in Rⁿ with v ≠ 0. Prove that there exists a unique λ ∈ R such that u − λv is perpendicular to v.

Finally, another important “multiplication” of vectors is the cross product (also called the vector product). This is a special product as it can only be defined in R³ (and also in R⁷, but in no other dimensions). This operation is the map R³ × R³ → R³ defined by:

Definition 2.12 (Cross product). Given two vectors u = (u1, u2, u3) and v = (v1, v2, v3) in R³, the cross product u × v is defined as the vector:

u × v = (u2v3 − u3v2, −u1v3 + u3v1, u1v2 − u2v1).

However, since this expression is difficult to remember, the cross product is usually recovered by taking the formal determinant of the matrix with rows (e1, e2, e3), (u1, u2, u3), and (v1, v2, v3):

u × v = det[ e1 e2 e3 ; u1 u2 u3 ; v1 v2 v3 ] = (u2v3 − u3v2)e1 − (u1v3 − u3v1)e2 + (u1v2 − u2v1)e3.
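Definition 2.12 translates directly into code; a minimal Python sketch (the function name is ours):

```python
def cross(u, v):
    # components read off from the formal determinant with rows (e1, e2, e3), u, v
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1), i.e. e1 x e2 = e3
print(cross((0, 1, 0), (1, 0, 0)))  # (0, 0, -1), illustrating u x v = -(v x u)
```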

From this definition, the following properties can be easily verified:

Proposition 2.13 (Properties of cross product). Suppose that u, v, w ∈ R³ and λ ∈ R. Then:

1. u × v = −(v × u), so in particular u × u = 0,

2. u × 0 = 0 × u = 0,

3. (λu) × v = λ(u × v) = u × (λv),

4. (u + v) × w = u × w + v × w,

5. the Jacobi identity: u × (v × w) + v × (w × u) + w × (u × v) = 0.

Proof. The proofs of these assertions are by explicit computation and matrix identities.

By expanding both sides of the following equation, we can prove that:

|u × v|² = |u|²|v|² − (u · v)².

Therefore, we get an explicit expression for the magnitude of u × v, given by:

|u × v| = |u||v| sin θ, where θ ∈ [0, π] is the angle between the lines joining the origin to u and to v. It is easy to see that the geometric interpretation of this quantity is that |u × v| is the area of the parallelogram

spanned by the vectors u and v. Furthermore, from this expression, two vectors u and v are parallel if and only if u × v = 0. Thus, we emphasise that u × v = 0 does not imply that u = 0 or v = 0, as both vectors might be non-zero parallel vectors.


Figure 4: Magnitude of a cross product and its geometric interpretation.

By straightforward computation, we can check that u · (u × v) = v · (u × v) = 0. This implies that the vector u × v is perpendicular to both the vectors u and v in R³. Thus the vector u × v can be described as the vector

u × v = |u × v| n̂ = |u||v| sin θ n̂,

where n̂ is the unit vector perpendicular to both u and v. However, there are two possible choices for n̂ and we need to pick the right one. Upon inspection with the standard basis of R³, this vector n̂ is given by the right-hand rule.

Remark 2.14. The orientation for the basis in this course is always (unless stated otherwise) the right-handed orientation: put out your right hand and stick out your thumb, index and middle fingers such that they are perpendicular to each other. The index finger points in the e1 direction, the middle finger points in the e2 direction and the thumb points in the e3 direction.


Figure 5: Right-hand orientation.

Along with the dot product, we can deduce the following identities:

Proposition 2.15. Suppose that u, v, w, z ∈ R³. Then:

1. scalar triple product: u · (v × w) = v · (w × u) = w · (u × v),

2. vector triple product: u × (v × w) = (u · w)v − (u · v)w,

3. scalar quadruple product: (u × v) · (w × z) = (u · w)(v · z) − (u · z)(v · w).

The scalar triple product has a geometric interpretation. Upon expansion, we have:

|u · (v × w)| = (|u| cos θ)(|v||w| sin φ),

where φ is the angle between the vectors v and w, and θ is the smaller angle between the vectors u and n̂. The factor |v||w| sin φ is the area of the parallelogram spanned by the vectors v and w, whereas the factor |u| cos θ is the perpendicular height of the parallelepiped described by the vectors u, v, and w with base spanned by v and w. Thus, the whole expression gives us the volume of the parallelepiped spanned by the vectors u, v, and w.


Figure 6: Magnitude of scalar triple product |u · (v × w)| is the volume of the parallelepiped.

Explicitly, if u = (u1, u2, u3), v = (v1, v2, v3), and w = (w1, w2, w3), we can compute the scalar triple product via the determinant of the matrix whose rows are u, v, and w:

u · (v × w) = det[ u1 u2 u3 ; v1 v2 v3 ; w1 w2 w3 ].
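The determinant formula and the parallelepiped-volume interpretation can be checked numerically; a short Python sketch (illustrative, not from the notes):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def triple(u, v, w):
    # u . (v x w) = det of the matrix with rows u, v, w
    return dot(u, cross(v, w))

# volume of the unit cube spanned by e1, e2, e3
print(abs(triple((1, 0, 0), (0, 1, 0), (0, 0, 1))))  # 1
# cyclic symmetry from Proposition 2.15
print(triple((1, 2, 3), (4, 5, 6), (7, 8, 10)))  # -3, same as v . (w x u)
```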

2.3 Lines and Planes in Rⁿ

We begin this section by giving a general form of lines in the Euclidean space Rⁿ.

Definition 2.16 (Parametric form of a line). A line in Rⁿ through the point p which is parallel to the vector a is the set:

L = {v ∈ Rⁿ : v = p + λa for λ ∈ R}.

The parameter λ in the expression above uniquely describes each point on the line L.


Figure 7: The line L = {v ∈ Rⁿ : v = p + λa for λ ∈ R}.

Indeed, one can check that this is equivalent to the line equation y = mx + c in R² that we know from high school. In parametric form, we have v = (x, y), p = (p1, p2), and a = (a1, a2). For each point (x, y) on the line L, we have:

(x, y) = (p1, p2) + λ(a1, a2)  ⇒  x = p1 + λa1,  y = p2 + λa2.

By eliminating the variable λ, we obtain the equation a2(x − p1) = a1(y − p2), which can be rearranged into the form y = mx + c if a1 ≠ 0. However, in higher dimensions, say Rⁿ, lines are described by a system of n − 1 equations, so it is easier to write them down in the parametric form.

Example 2.17. Let r = (1, 0, 0) and s = (1, 2, 1) be two points in R³. We wish to find the parametric form of the line that passes through these two points. We first need to find the direction the line is pointing. This can be computed by finding the direction of the point s from the point r, that is, s − r = (1, 2, 1) − (1, 0, 0) = (0, 2, 1). Therefore on the line, starting at the point r, we can only move in the direction of (0, 2, 1); thus the parametric form of the line containing the two points r and s is given by:

{v ∈ R³ : v = (1, 0, 0) + λ(0, 2, 1) for λ ∈ R}.
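The construction in Example 2.17 can be sketched in Python (the helper name is hypothetical, not from the notes):

```python
def line_point(p, a, lam):
    # v = p + lam * a, a point on the line through p with direction a
    return tuple(pi + lam * ai for pi, ai in zip(p, a))

r, s = (1.0, 0.0, 0.0), (1.0, 2.0, 1.0)
a = tuple(si - ri for ri, si in zip(r, s))  # direction s - r = (0, 2, 1)
print(line_point(r, a, 0.0))  # (1.0, 0.0, 0.0), the point r
print(line_point(r, a, 1.0))  # (1.0, 2.0, 1.0), the point s
```

Every choice of the parameter lam picks out exactly one point of the line, matching the uniqueness remark in Definition 2.16.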

Exercise 2.18. Let L = {v ∈ Rⁿ : v = p + λa for λ ∈ R} be a line in Rⁿ and q ∈ Rⁿ a fixed point. Find the point u ∈ L that has the closest distance to q and find this distance.

Analogously, a plane in Rⁿ can also be described in a similar form. Instead of just one degree of freedom as on a line, we have two directions we can move in on a plane. However, these directions must be described by vectors which are not scalar multiples of each other (otherwise, the plane equation just describes a line).

Definition 2.19 (Parametric form of a plane). A plane in Rⁿ for n ≥ 2 through the point p which is parallel to the vectors a and b, where a ≠ νb for any ν ∈ R, is the set:

P = {v ∈ Rⁿ : v = p + λa + µb for λ, µ ∈ R}.   (3)

The parameters λ and µ in the expression above uniquely describe each point on the plane P.


Figure 8: The plane P = {v ∈ Rⁿ : v = p + λa + µb for λ, µ ∈ R}.

From now on, we are going to focus our attention on the Euclidean spaces R³ and R². Usually, the coordinates are represented by the letters x, y, and z. Therefore, any point v ∈ R³ is represented by the list (x, y, z) with respect to the standard basis. In this space, the coordinates of points on a line are described by a system of two equations and the coordinates of points on a plane are described by just one equation:

Exercise 2.20. By eliminating the variable λ as before, compute the two equations that describe the line L = {v ∈ R³ : v = p + λa for λ ∈ R} in R³. Care must be taken for the various zero components of the vector a. What is the system of equations of the line constructed in Example 2.17?

Proposition 2.21 (Equation of a plane in R³). The point v lies in the plane P = {v ∈ R³ : v = p + λa + µb for some λ, µ ∈ R} if and only if it lies in the set {v ∈ R³ : v · n = p · n}, where n is a vector perpendicular to both a and b.

Proof. We prove that these sets are equal by showing double inclusion.
(⊆): This assertion is clearly true since any v ∈ P would be of the form p + λa + µb for some λ, µ ∈ R. Then, taking the dot product with a vector n perpendicular to both a and b, we get v · n = (p + λa + µb) · n = p · n.
(⊇): Pick a point v in the set {v ∈ R³ : v · n = p · n}; then (v − p) · n = 0, which implies that (v − p) is perpendicular to n. Since a and b are both perpendicular to n and {n, a, b} spans R³, we have v − p ∈ Span(a, b). Therefore, v − p = λa + µb for some λ, µ ∈ R, and thus v ∈ P.


Figure 9: Any point v ∈ P satisfies (v − p) · n = 0 where n is a normal vector to P .

The vector n is called the normal vector to the plane P. One convenient choice of the vector n is the cross product a × b. Since any plane can be described by the set P = {v ∈ R³ : v · n = p · n} for some fixed p, n ∈ R³, the points v = (x, y, z) in this plane satisfy the equation

xn1 + yn2 + zn3 = c,

for some fixed constant c = n · p ∈ R. This is called the equation of a plane.

Example 2.22. Let q = (1, 1, 1), r = (1, 2, 0), and s = (−1, 2, 1) be three points in R³. We wish to find the equation of the plane that contains all three of these points. For the parametric form, we can choose the directions a and b in (3) to be a = r − q = (0, 1, −1) and b = s − q = (−2, 1, 0). Therefore on the plane, starting at the point q, we can only move in the directions of a and b; thus the parametric form of the plane containing the three points q, r, and s is given by:

{v ∈ R³ : v = q + λ(0, 1, −1) + µ(−2, 1, 0) for λ, µ ∈ R}.

To find the equation of the plane, we need to find a vector n normal to the plane. As mentioned before, a convenient choice of n would be n = a × b = (1, 2, 2). Therefore, the equation of the plane containing q, r, and s is:

(x, y, z) · (1, 2, 2) = q · (1, 2, 2) = (1, 1, 1) · (1, 2, 2) ⇒ x + 2y + 2z = 5.

Note that the choice of point p in the equation v · n = p · n is not unique! You can choose any point that lies on the plane P . We chose the point q = (1, 1, 1) in the computations above, but choosing the point r or s would yield the exact same equation.
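Example 2.22 can be reproduced numerically; a Python sketch that computes the plane through three points via the cross product (the function names are ours):

```python
def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_through(q, r, s):
    # normal n = (r - q) x (s - q); the plane is then v . n = q . n
    n = cross(sub(r, q), sub(s, q))
    return n, dot(n, q)

n, c = plane_through((1, 1, 1), (1, 2, 0), (-1, 2, 1))
print(n, c)  # (1, 2, 2) 5, i.e. x + 2y + 2z = 5
```

As the note above observes, replacing q by r or s in the dot product yields the same constant c.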

Exercise 2.23. Show that the shortest distance from the point q to the plane P described by the equation v · n = p · n is given by:

dist(q, P) = |(q − p) · n| / |n|.

3 Quadric Surfaces

In the previous section, we have seen lines and planes in R³, which are linear objects. They are called linear objects since the equations that describe them are linear, that is, the highest power of the variables x, y, and z in the equations is one: Exercise 2.20 shows that a line can be described by a set of two linear equations, whilst a plane can be described by the equation xn1 + yn2 + zn3 = c, another linear equation.
The next geometrical objects in line are the quadrics, which are objects (or loci) in R³ described by quadratic equations, that is, equations of the form:

ax² + by² + cz² + 2dxy + 2exz + 2fyz + gx + hy + iz + j = 0,

where a, b, c, d, e, f, g, h, i, j ∈ R. In matrix form, writing x = (x, y, z), m = (g, h, i), and M = [ a d e ; d b f ; e f c ] (rows separated by semicolons), this can be described as the matrix equation:

xMxᵀ + mxᵀ + j = 0.

By a change of coordinates (x, y, z) → (X, Y, Z) via diagonalisation of the matrix M (which can always be done since M is a symmetric matrix, and the resulting diagonal matrix is real) and/or completion of squares, one can reduce a quadric equation to one of 17 standard forms, nine of which are “true” quadric surfaces, given in the table below:

Name — Standard form

Ellipsoid: X²/a² + Y²/b² + Z²/c² = 1
Elliptic paraboloid: X²/a² + Y²/b² − Z = 0
Hyperbolic paraboloid: X²/a² − Y²/b² − Z = 0
Elliptic hyperboloid (one sheet): X²/a² + Y²/b² − Z²/c² = 1
Elliptic hyperboloid (two sheets): X²/a² + Y²/b² − Z²/c² = −1
Elliptic cone: X²/a² + Y²/b² − Z²/c² = 0
Elliptic cylinder: X²/a² + Y²/b² = 1
Hyperbolic cylinder: X²/a² − Y²/b² = 1
Parabolic cylinder: X² + 2aY = 0

Exercise 3.1. For each of the quadric surfaces in the table above, sketch its locus in R³. Hint: for each fixed value of Z, deduce the locus traced by the remaining coordinates.

The following examples show some techniques available for classifying quadric surfaces:

Example 3.2. Let us first identify the quadric surface defined by the equation x² + 2y² + z² − 4x + 4y − 2z + 3 = 0. Note that the x, y, and z coordinates are not coupled in this equation, therefore it can be simplified into one of the standard forms by completing the squares. Let us collect the terms with similar variables and complete the squares for each of them:

x² − 4x + 2y² + 4y + z² − 2z = −3,
(x² − 4x + 4) − 4 + 2(y² + 2y + 1) − 2 + (z² − 2z + 1) − 1 = −3,
(x − 2)² + 2(y + 1)² + (z − 1)² = 4,

and thus, by comparing this equation with the equations in the table above, it is an ellipsoid centred at (2, −1, 1).
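The completed-square form can be sanity-checked numerically: the identity x² + 2y² + z² − 4x + 4y − 2z + 3 = (x − 2)² + 2(y + 1)² + (z − 1)² − 4 should hold at every point (a Python sketch, not part of the notes):

```python
import random

def original(x, y, z):
    return x*x + 2*y*y + z*z - 4*x + 4*y - 2*z + 3

def completed(x, y, z):
    return (x - 2)**2 + 2*(y + 1)**2 + (z - 1)**2 - 4

# the two expressions agree at randomly sampled points
for _ in range(1000):
    x, y, z = (random.uniform(-5.0, 5.0) for _ in range(3))
    assert abs(original(x, y, z) - completed(x, y, z)) < 1e-9
print("completing the square checks out")
```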

Example 3.3. Let us now identify the quadric surface defined by the equation 5x² + 2y² − 7z² + 12yz + 5x + √5 z = 15. This is a bit more complicated since the coordinate variables are coupled. Let us put this equation in matrix form to first decouple the variables:

xMxᵀ + mxᵀ = 15, where x = (x, y, z), m = (5, 0, √5), and M = [ 5 0 0 ; 0 2 6 ; 0 6 −7 ].   (4)

In order to decouple the variables, we have to diagonalise the 3 × 3 matrix M by a change of variables. From linear algebra, we know that this matrix can be diagonalised as follows. The eigenvalues of M are −10, 5, and 5, with orthonormal eigenvectors (0, −1, 2)ᵀ/√5, (0, 2, 1)ᵀ/√5, and (1, 0, 0)ᵀ respectively. In other words, M = PDPᵀ, where:

D = [ −10 0 0 ; 0 5 0 ; 0 0 5 ],  P = [ 0 0 1 ; −1/√5 2/√5 0 ; 2/√5 1/√5 0 ].

Now we carry out a change of variables from (x, y, z) to (u, v, w) defined by (u, v, w)ᵀ = Pᵀ(x, y, z)ᵀ, that is:

u = (−y + 2z)/√5,  v = (2y + z)/√5,  w = x.

Substituting this into equation (4) and expanding, the equation now reads:

−10u² + 5v² + 5w² + 2u + v + 5w = 15.

Now that the variables are decoupled, we can complete the squares to get:

−10(u − 1/10)² + 5(v + 1/10)² + 5(w + 1/2)² = 81/5,

implying that the quadric surface is a one-sheeted hyperboloid.
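The change of variables in Example 3.3 can likewise be verified numerically: the quadratic part 5x² + 2y² − 7z² + 12yz should equal −10u² + 5v² + 5w² under the substitution above (a Python sketch, assuming the eigendecomposition stated in the example):

```python
import math
import random

s5 = math.sqrt(5.0)

for _ in range(1000):
    x, y, z = (random.uniform(-3.0, 3.0) for _ in range(3))
    u = (-y + 2*z) / s5   # coordinates in the orthonormal eigenbasis
    v = (2*y + z) / s5
    w = x
    quad_xyz = 5*x*x + 2*y*y - 7*z*z + 12*y*z
    quad_uvw = -10*u*u + 5*v*v + 5*w*w
    assert abs(quad_xyz - quad_uvw) < 1e-9
print("the eigenbasis decouples the quadratic form")
```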

3.1 Curvilinear Coordinates

Sometimes, a choice of coordinates other than the Cartesian coordinates is more efficient when we are describing surfaces. Two of the most important coordinate choices in 3-dimensional space are the spherical and the cylindrical coordinates. Recall that in 2-dimensional space, we can describe any point on the (x, y) Cartesian plane using the polar coordinates (r, θ), defined as follows:

Definition 3.4 (Polar coordinates in 2-dimensional space). Given the standard Euclidean basis {e1, e2} of R², any point x ∈ R² \ {0} can be described uniquely by the pair of real numbers (x, y), not both equal to 0, via x = xe1 + ye2. The point x can also be described uniquely by the pair of numbers (r, θ) in the new basis {er, eθ}, where r > 0 and θ ∈ (−π, π] are defined by x = r cos θ and y = r sin θ.

In the polar coordinate system, the quantity r is the distance of the point x from the origin in the original space whereas the quantity θ is the directed angle the vector x makes with the standard basis vector e1. This choice of coordinates is convenient to use whenever we deal with rotations around a fixed point or working with circles.


Figure 10: The components of the point x with respect to the polar coordinates and its new basis vectors er and eθ.
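The correspondence (x, y) ↔ (r, θ) in Definition 3.4 is exactly what the standard library's `math.atan2` computes; a hedged Python sketch:

```python
import math

def to_polar(x, y):
    # r > 0 and theta in (-pi, pi]; math.atan2 follows the same convention
    return math.hypot(x, y), math.atan2(y, x)

def from_polar(r, theta):
    return r * math.cos(theta), r * math.sin(theta)

r, theta = to_polar(1.0, 1.0)
print(r, theta)  # sqrt(2) and pi/4
x, y = from_polar(r, theta)
print(x, y)      # back to (1.0, 1.0), up to rounding
```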

The new basis vector er is the radial unit vector pointing to the point x in the (x, y) plane whereas the eθ vector is the vector perpendicular to it pointing in the anticlockwise direction.

A thing to note here is that the basis vectors change according to the point, hence the name curvilinear coordinates. The new basis vectors are given as:

er = x/|x| = (x/√(x² + y²)) e1 + (y/√(x² + y²)) e2 = cos θ e1 + sin θ e2,
eθ = − sin θ e1 + cos θ e2.

Remark 3.5. Be careful when you are doing a change of coordinates. In the old coordinates, we write the point as (x, y), whilst in the new coordinates we write (r, θ). Without specifying the basis vectors, one might get confused during lengthy calculations. For example, does the point (2, 1) mean 2e1 + 1e2 or 2er + 1eθ? Therefore, a safe practice is to write a vector down in its expansion with respect to the basis if a change of coordinates is involved! This advice also applies to the rest of this section.

The cylindrical coordinate system extends the polar coordinate system to 3-dimensional space by adding an extra z dimension perpendicular to the xy-plane:

Definition 3.6 (Cylindrical coordinates in 3-dimensional space). Given the standard Euclidean basis {e1, e2, e3} of R³, any point x ∈ R³ \ {λe3 : λ ∈ R} can be described uniquely by the triple of real numbers (x, y, z), with x and y not both equal to 0, via x = xe1 + ye2 + ze3. The point x can also be described uniquely by the triple of numbers (r, θ, h) in the new basis {er, eθ, eh}, where r > 0, θ ∈ (−π, π], and h ∈ R are defined by x = r cos θ, y = r sin θ, and z = h.


Figure 11: The components of the point x with respect to the cylindrical coordinates and its new basis vectors er, eθ, and eh.

In this coordinate system, one can view the quantity r as the perpendicular distance of a point x to the z-axis and θ is the angle between the perpendicular line joining the point x with the z-axis and the x-axis. Therefore, this coordinate system is convenient when describing cylinders (hence the name), cones, and any rotationally symmetric objects.

The new basis vector er is the radial unit vector pointing towards the projection of the point x onto the (x, y)-plane. The vector eh is the same as e3, and the vector eθ is the unit vector perpendicular to both er and eh, defined as eθ = eh × er. One can also obtain this basis by simply adding the eh basis vector to the er and eθ basis vectors as we have defined for polar coordinates. Similarly to before, the basis vectors change according to the point, hence the name curvilinear coordinates. The new basis vectors are given as:

er = (x/√(x² + y²)) e1 + (y/√(x² + y²)) e2 = cos θ e1 + sin θ e2,
eθ = − sin θ e1 + cos θ e2,
eh = e3.

In fact, objects which are rotationally symmetric about an axis have a special name:

Definition 3.7 (Surface of revolution). Let ρ : R → R. Then the surface that satisfies the equation x² + y² = ρ(z)² is called a surface of revolution about the z-axis with radius ρ(z). Analogous surfaces of revolution about the x-axis and y-axis can also be defined.

Exercise 3.8. Find the equation of the surface of revolution about the z-axis in terms of the cylindrical coordinates.

The final useful coordinate system that we are going to introduce here is the spherical coordinate system:

Definition 3.9 (Spherical coordinates in 3-dimensional space). Given the standard Euclidean basis {e1, e2, e3} of R³, any point x ∈ R³ \ {λe3 : λ ∈ R} can be described uniquely by the triple of real numbers (x, y, z), with x and y not both equal to 0, via x = xe1 + ye2 + ze3. The point x can also be described uniquely by the triple of numbers (r, θ, φ) in the new basis {er, eθ, eφ}, where r > 0, θ ∈ (−π, π], and φ ∈ (0, π) are defined by x = r sin φ cos θ, y = r sin φ sin θ, and z = r cos φ.

In this coordinate system, one can view the quantity r as distance of a point x from the origin 0 and θ is the angle between the perpendicular line joining the point x with the z-axis with the x-axis. Finally, the quantity φ is the angle between the line joining the point x with the origin 0 and the z-axis. The new basis vectors are given as: x x y z e = = e + e + e = sin φ cos θe + sin φ sin θe + cos θe , r |x| r 1 r 2 r 3 1 2 3

eθ = − sin θe1 + cos θe2,

e_φ = cos φ cos θ e₁ + cos φ sin θ e₂ − sin φ e₃.


Figure 12: The components of the point x with respect to the spherical coordinates and its new basis vectors er, eθ, and eφ.

4 Vector-Valued Functions

Now that we are familiar with the concepts and ideas of vectors, let us look at an application. An obvious application of vectors is to map the position of a point particle in 3-dimensional space, for example when we want to chart the motion of particles. If the particle moves around as time varies, we have a map between the time variable and the space variable. This can be realised as a function from I ⊂ R, which is the time interval, to R³, which is the position of the particle. In the standard basis of R³, we can write down the coordinates of the particle at time t ∈ I as v(t) = (v₁(t), v₂(t), v₃(t)) = v₁(t)e₁ + v₂(t)e₂ + v₃(t)e₃, where the vᵢ(t) are scalar functions of time. Abstractly, the path traced out by the particle in R³ is called a curve:

Definition 4.1 (Curve). A curve γ ⊂ R³ is the image of a map v : I → R³, where I ⊂ R is an interval, such that the map v is continuous, that is, the coordinates in some basis of R³ are continuous functions on I.

Curves themselves are interesting objects to study in mathematics. Geometric concepts like curvature and the embedding of curves are an active area of research in differential geometry.

Example 4.2. Suppose that a particle's position in R³ at time t ≥ 0 is given by v(t) = (cos(t), sin(t), t) with respect to the standard basis. Then the path traced out by the particle is called a helix or a coil.

4.1 Derivatives of Vectors

Analogous to motion in one dimension, once we have the position of a particle at time t, we can also determine its velocity and acceleration. Note that since we are working in 3-dimensional space, the velocity also has a direction. Similar to one-dimensional calculus, we define the derivative as follows:

Definition 4.3 (Derivative of a vector-valued function). Let v : I → R³ be a vector-valued function from an interval of R. Let the standard basis representation of this vector be v(t) = (v₁(t), v₂(t), v₃(t)). The derivative v′(t₀) of the function v(t) at the point t₀ ∈ I is given by:

v′(t₀) = lim_{h→0} (v(t₀ + h) − v(t₀))/h
       = (lim_{h→0} (v₁(t₀ + h) − v₁(t₀))/h, lim_{h→0} (v₂(t₀ + h) − v₂(t₀))/h, lim_{h→0} (v₃(t₀ + h) − v₃(t₀))/h),

if the limits are defined. If the derivative v′(t₀) is defined for all t₀ ∈ I, then the function v is called differentiable.

Remark 4.4. Sometimes, other notations are also used for the derivative of the vector v(t), such as (d/dt)v(t), Dv(t), or v̇(t). The latter, introduced by Newton, is used mainly in mechanics and kinematics.
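As a quick sanity check, the component-wise limit in Definition 4.3 can be approximated numerically with a symmetric difference quotient. This is an illustrative sketch only; the helper names below are our own, not part of the notes:

```python
import math

def helix(t):
    # Position of the particle from Example 4.2: v(t) = (cos t, sin t, t).
    return (math.cos(t), math.sin(t), t)

def derivative(v, t, h=1e-6):
    # Symmetric difference quotient applied component-wise, as in Definition 4.3.
    ahead, behind = v(t + h), v(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(ahead, behind))

# At t = 0 the exact derivative is v'(0) = (-sin 0, cos 0, 1) = (0, 1, 1).
approx = derivative(helix, 0.0)
print(approx)  # approximately (0.0, 1.0, 1.0)
```

Shrinking the step h drives the quotient towards the limit in the definition, mirroring the one-dimensional case component by component.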

We are now going to restrict our attention to a nice family of curves, called regular curves:

Definition 4.5 (Regular curve). A curve γ ⊂ R³ is a regular curve if its parametrisation v(t) has non-vanishing derivative, that is, for all t ∈ I, v′(t) ≠ 0.

In fact, by construction, the vector v′(t₀) is parallel to the tangent of the curve v(t) at the point v(t₀). Therefore, we define the tangent line:

Definition 4.6 (Tangent line to a curve). Let v : I → R³ be a vector-valued function from an interval of R and let γ ⊂ R³ be the regular curve defined by v. The tangent line L of the curve γ at the point q = v(t₀) is the line given parametrically by:

L_q = {u ∈ R³ : u = q + λv′(t₀) for λ ∈ R}.


Figure 13: The curve γ of a vector-valued function v(t) and its derivative v′(t₀) at the point v(t₀). The tangent line of the curve defined by v(t) at the point v(t₀) is denoted by L_q.

Example 4.7. Recall the particle in Example 4.2 whose position at time t ≥ 0 is v(t) = (cos(t), sin(t), t) with respect to the standard basis. Then the velocity of the particle at time t > 0 is given by the vector:

v′(t) = (− sin(t), cos(t), 1).

The length |v′(t)| of the velocity vector is called the speed. Speed, unlike velocity, is a scalar quantity. The unit vector v̂′(t) is called the unit tangent vector at the point v(t).
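For the helix, both quantities are easy to compute numerically; the sketch below normalises the velocity from Example 4.7 (the helper names are our own):

```python
import math

def helix_velocity(t):
    # v'(t) = (-sin t, cos t, 1) from Example 4.7.
    return (-math.sin(t), math.cos(t), 1.0)

def unit_tangent(velocity):
    # Normalise the velocity: the unit tangent vector is v'(t) / |v'(t)|.
    speed = math.sqrt(sum(c * c for c in velocity))
    return tuple(c / speed for c in velocity)

v = helix_velocity(1.0)
T = unit_tangent(v)
print(math.sqrt(sum(c * c for c in v)))  # the speed: sqrt(2) for every t
print(sum(c * c for c in T))             # |T|^2 is 1: T is a unit vector
```

Note the speed √2 is independent of t, so the helix is traced at constant speed even though the velocity direction changes.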

Definition 4.8 (Unit tangent vector). Let v : I → R³ be a vector-valued function from an interval of R defining a regular curve γ. The unit tangent vector v̂′(t) (also denoted by T(t)) at the point v(t) is given by:

T(t) = v′(t)/|v′(t)|.

Here are some results regarding derivatives of vector-valued functions:

Proposition 4.9. Let u, v : I → R³ be differentiable vector-valued functions, f : I → R a differentiable real-valued function, and λ ∈ R. Then:

1. (d/dt)(λv(t)) = λ (d/dt)v(t),

2. (d/dt)(u(t) ± v(t)) = (d/dt)u(t) ± (d/dt)v(t),

3. (d/dt)(f(t)v(t)) = (df/dt)v(t) + f(t)(d/dt)v(t),

4. (d/dt)(u(t) · v(t)) = ((d/dt)u(t)) · v(t) + u(t) · (d/dt)v(t),

5. (d/dt)(u(t) × v(t)) = ((d/dt)u(t)) × v(t) + u(t) × (d/dt)v(t),

6. if g : J → I is a differentiable function between intervals J and I in R, then (d/dt)(v(g(t))) = v′(g(t)) (dg/dt).

Proof. The first two assertions are clear. The proofs of the rest are done by expressing the vectors in coordinates and applying the scalar product rule or chain rule component-wise.

Exercise 4.10. By using the chain rule or otherwise, find the derivative of the function |v(t)| for v : I → R³. Hence show that the length of the non-zero vector v(t) is constant for all time if and only if v(t) and v′(t) are perpendicular for all time. Hint: Write the function |v(t)| = √(v(t) · v(t)) as a composition of two real-valued functions.

We can also take higher derivatives of vector-valued functions in an analogous way. Since the first derivative of the position with respect to time is the velocity of the particle, the second derivative of the position vector represents the acceleration of the particle:

Position: v(t),   Velocity: v′(t),   Acceleration: v″(t).

Other physical quantities of interest in the study of mechanics of objects with constant mass m > 0 are momentum and force, defined by:

Momentum: p(t) = mv′(t),
Force: F(t) = (d/dt)p(t) = mv″(t).

Example 4.11. Recall the particle in Example 4.2 whose position at time t ≥ 0 is v(t) = (cos(t), sin(t), t) with respect to the standard basis. We have shown that it moves with velocity v′(t) = (− sin(t), cos(t), 1). The acceleration is then:

v″(t) = (− cos(t), − sin(t), 0).

Finally, for a tangent vector, we want to find a vector perpendicular to it. But since we are in 3 dimensions, there are infinitely many vectors perpendicular to this tangent vector, so there is no preferred perpendicular vector. However, there is a special choice of vector which is called the principal unit normal vector:

Definition 4.12 (Principal unit normal vector). Let v : I → R³ be a vector-valued function from an interval of R. The principal unit normal vector N(t) is the vector defined by:

N(t) = T′(t)/|T′(t)|,

where T : I → R³ is the unit tangent vector.

This principal unit normal vector is perpendicular to the tangent vector at any point on the curve. Indeed, the unit tangent vector satisfies |T(t)|² = 1 for all t ∈ I. Hence, taking the derivative with respect to t on both sides, for all t ∈ I we have:

2T(t) · T′(t) = 0 ⇒ T(t) · N(t) = 0.


Figure 14: The curve γ of a vector-valued function v(t) and unit tangent vector T(t0) and principal unit normal vector N(t0) at the point v(t0).

The principal unit normal vector appears in the study of mechanics of particles. Recall that the velocity of a particle whose position in R³ is given by the function v : I → R³ lies tangential to the curve defined by v. The acceleration, however, does not necessarily lie tangential to the curve. Nonetheless, the acceleration can be decomposed into two components: the component tangential to the curve and the component in the direction of the principal unit normal. Indeed, the velocity vector can be written as v′(t) = |v′(t)|T(t), and taking the derivative on both sides of the equation yields:

v″(t) = (d/dt)|v′(t)| T(t) + |v′(t)|T′(t) = (d/dt)|v′(t)| T(t) + |v′(t)||T′(t)|N(t) = a_T T(t) + a_N N(t).

Thus, we have proven that the acceleration vector can be decomposed into two perpendicular vectors in the tangential direction and the principal normal direction:

Proposition 4.13. Let v : I → R³ be the position of a particle at time t in an interval I ⊂ R. Then the acceleration of the particle at time t ∈ I can be decomposed as:

v″(t) = a_T T(t) + a_N N(t),

where T(t) and N(t) are the unit tangent and principal unit normal vectors, a_T = (d/dt)|v′(t)|, and a_N = |v′(t)||T′(t)|.

Recall that the force acting on a body is proportional to its acceleration. Therefore, an accelerating particle travelling along a curve would feel a force tangential to its direction of motion and in the principal normal direction. For objects moving in a circle, the force felt in the principal normal direction is called the centripetal force.

Exercise 4.14. Prove that the tangential component and the principal normal component of the acceleration, a_T and a_N, are given by the following formulae:

a_T = (v′ · v″)/|v′| and a_N = |v′ × v″|/|v′|.
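These formulae can be sanity-checked numerically on the helix: since the helix is traced at constant speed √2, its acceleration should have no tangential component, and the normal component should be 1. The dot/cross/norm helpers are our own (a sketch, not part of the notes):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def norm(a):
    return math.sqrt(dot(a, a))

t = 0.7
v1 = (-math.sin(t), math.cos(t), 1.0)   # v'(t) for the helix
v2 = (-math.cos(t), -math.sin(t), 0.0)  # v''(t)

a_T = dot(v1, v2) / norm(v1)            # tangential component
a_N = norm(cross(v1, v2)) / norm(v1)    # principal normal component

print(a_T, a_N)  # a_T is approximately 0 and a_N approximately 1
```

The vanishing a_T reflects the constant speed (a_T = d|v′|/dt = 0), while a_N = 1 is the centripetal-type acceleration pulling the particle around the axis of the helix.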

4.2 Length of Curves

Another geometric quantity that we can compute from the derivatives is the length of the arc v(t) in R². Suppose that the particle moves in the plane R² and its position is given by the function v : [a, b] → R². The velocity of the curve is then given by the vector v′(t) and its speed by |v′(t)|. If we “sum” up its speed over the time interval [a, b], we would get the distance travelled by the particle.

Another, more mathematical, interpretation is the following: suppose that we have a curve γ ⊂ R². In order to make this curve explicit, we let v(t) = (x(t), y(t)) be a parametrisation of the path γ for the variable t varying in an interval I = [a, b] ⊂ R. We wish to express the infinitesimal length ds of the curve in terms of t so we can compute the integral over the variable t as in the one-dimensional case.


Figure 15: Limit of infinitesimal arclength component.

In the diagram above, between the points parametrised by t and t + δt for some small δt, the change in x is given by δx and the change in y by δy. Therefore the change in arclength can be approximated by δs = √((δx)² + (δy)²), which then implies that:

δs/δt = √((δx/δt)² + (δy/δt)²).

As δt → 0, for smooth curves we have the limits δx/δt → dx/dt, δy/δt → dy/dt, and δs/δt → ds/dt, hence:

ds = √((x′(t))² + (y′(t))²) dt.

Pulling the integral back to the domain I = [a, b], we get:

∫_γ ds = ∫_a^b √((x′(t))² + (y′(t))²) dt = ∫_a^b |v′(t)| dt,

which is an integral that we know how to evaluate. In general, for a curve in 3-dimensional space, we have:

Definition 4.15 (Arclength). Let γ ⊂ R³ be a regular curve represented by the function v : [a, b] → R³ for t ∈ [a, b], differentiable for t ∈ (a, b). By denoting the vector v(t) = (x(t), y(t), z(t)) with respect to the standard basis, the arclength L(γ) of the curve γ from t = a to t = b is given by:

L(γ) = ∫_a^b |v′(t)| dt = ∫_a^b √(x′(t)² + y′(t)² + z′(t)²) dt.

Example 4.16. Recall the particle from Example 4.2 whose position is given by the vector v(t) = (cos(t), sin(t), t) for t ≥ 0. We have computed its velocity vector to be v′(t) = (− sin(t), cos(t), 1). Hence the distance it travelled in 2 units of time, which is the length of the curve γ, can be computed as:

L(γ) = ∫_0^2 |v′(t)| dt = ∫_0^2 √(sin²(t) + cos²(t) + 1) dt = ∫_0^2 √2 dt = 2√2 units.

We can also calculate the distance travelled after an arbitrary time t via the arclength function. Given a curve v : I → R³ in 3-dimensional space, the arclength function s : I → R≥0 is the function that measures the length of the curve along the parameter t.

Definition 4.17 (Arclength function). Let I = [a, b]. The arclength function of the vector v : I → R³ is the map s : I → R≥0 defined as:

s(t) = ∫_a^t |v′(u)| du = ∫_a^t √(x′(u)² + y′(u)² + z′(u)²) du.

If we view v(t) as the position vector of a particle and t as the time, then s(t) measures the distance the particle has travelled after time t.

Example 4.18. Recall the particle from Examples 4.2 and 4.16 whose position is given by the vector v(t) = (cos(t), sin(t), t) for t ≥ 0. We have computed its velocity vector to be v′(t) = (− sin(t), cos(t), 1). Hence, after time t, it would have travelled:

s(t) = ∫_0^t |v′(u)| du = ∫_0^t √(sin²(u) + cos²(u) + 1) du = ∫_0^t √2 du = √2 t units.
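The arclength function above can be checked by numerically integrating the speed |v′(u)|; the sketch below uses a simple trapezoidal rule (the helper names and step count are our own choices):

```python
import math

def speed(u):
    # |v'(u)| = sqrt(sin^2 u + cos^2 u + 1) for the helix.
    return math.sqrt(math.sin(u)**2 + math.cos(u)**2 + 1.0)

def arclength(t, n=1000):
    # Trapezoidal approximation of s(t) = integral of |v'(u)| du from 0 to t.
    h = t / n
    total = 0.5 * (speed(0.0) + speed(t))
    for k in range(1, n):
        total += speed(k * h)
    return total * h

t = 2.0
print(arclength(t), math.sqrt(2) * t)  # both approximately 2.8284...
```

Since the integrand is the constant √2 here, the trapezoidal rule is exact up to rounding; for a general curve one would expect a small discretisation error that shrinks with n.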

4.3 Parametrisation of Curves

The length of a curve is invariant with respect to the parametrisation. What does this mean? For demonstration, consider the following two curves in R³:

v(t) = (cos(t), sin(t), t) for t ∈ [0, 2],
u(s) = (cos(2s), sin(2s), 2s) for s ∈ [0, 1].

If we plot these two curves in R³, they trace out the exact same curve. However, they are parametrised differently. We have computed the length of the curve v(t) in Example 4.16. As an exercise, show that the curve u(s) for s ∈ [0, 1] also has the same length. This is because the curve is a geometric object: it is independent of parametrisation, just like how a specific point in a vector space can be described by various arrays of numbers for different choices of basis!

Definition 4.19 (Reparametrisation of curves). Let γ ⊂ R³ be a smooth regular curve represented by the function u : [a, b] → R³. Let α : [c, d] → [a, b] be a smooth bijection between the intervals [c, d] and [a, b]. Then the curve v : [c, d] → R³ defined by v(s) = u(α(s)) is a reparametrisation of the curve γ with respect to the parameter s ∈ [c, d].

[c, d] --α→ [a, b] --u→ γ ⊂ R³, with v = u ◦ α.

The resulting function v is called the pullback of the function u by the function α. We shall now show that the length of a curve γ is independent of parametrisation:

Proposition 4.20. The length of a smooth regular curve γ is independent of parametrisation.

Proof. Let u : [a, b] → R³ and v : [c, d] → R³ be two distinct parametrisations of the curve γ. By definition, there is a smooth bijection α between the two intervals such that v(s) = u(α(s)). Without loss of generality, assume that α is an increasing function, so that dα/ds ≥ 0. The arclength of γ computed with respect to the parametrisation u is given by:

L_u(γ) = ∫_a^b |u′(t)| dt,

whereas the arclength computed with respect to the parametrisation v is:

L_v(γ) = ∫_c^d |v′(s)| ds.

Our aim is to show that they are equal. By using the chain rule on v(s) = u(α(s)), we compute:

L_v(γ) = ∫_c^d |(d/ds)v(s)| ds = ∫_c^d |(d/ds)u(α(s))| ds = ∫_c^d |u′(α(s)) (dα/ds)| ds = ∫_c^d |u′(α(s))| (dα/ds) ds.

However, note that t = α(s) and hence dt = (dα/ds) ds. Since the function α is increasing, we have a = α(c) and b = α(d). Therefore, by change of variables, the above expression becomes:

L_v(γ) = ∫_a^b |u′(t)| dt = L_u(γ),

which proves our assertion.

Since there are so many different parametrisations of a given curve, which parametrisation is the best? For motion with respect to time, an obvious parameter is, of course, time, since time

itself has a physical meaning for motions. However, for geometric objects, for example when we study curves, what is the preferred parameter? The most natural parametrisation for geometric objects is given by the arclength parametrisation. The arclength parameter is preferred because it denotes the distance one has travelled along the curve from the initial point.

Definition 4.21 (Arclength parametrisation). A regular curve γ ⊂ R³ described by the function v : [0, L] → R³ is said to be parametrised by arclength if |v′(s)| = 1 for all s ∈ [0, L].

Note that the domain of the curve is [0, L] where L is the total length of the curve. In terms of motion, the point particle is said to be moving at unit speed, since the velocity vector has unit length for all time (which is not true for all motions, obviously). Then one can easily compute the distance it has travelled after time t ∈ [0, L] since:

∫_0^t |v′(s)| ds = ∫_0^t ds = t,

which clearly shows why this is a preferred parametrisation for a curve γ: the length of the curve from the initial point to the point parametrised by s is s itself!

Now comes a problem: given any parametrisation u : I → R³ of a curve γ ⊂ R³ with I = [a, b], how do we reparametrise it in terms of the arclength parametrisation v : [0, L] → R³? Recall the arclength function s(t) in Definition 4.17. This measures the length of the curve up to the parameter t, and s varies from 0 to L, therefore it is a candidate for the arclength parameter. Since the function s is an increasing function of t, we can invert the function to get t(s).

[0, L] --t(s)→ [a, b] --u→ γ ⊂ R³, with v = u(t(s)). Hence:

(d/ds) u(t(s)) = u′(t(s)) (dt/ds),   (5)

and by applying the fundamental theorem of calculus to the arclength function s(t), we have:

s(t) = ∫_a^t |u′(x)| dx ⇒ ds/dt = |u′(t)|.

Hence, substituting this in equation (5), we get:

|(d/ds) u(t(s))| = |u′(t(s))| · (1/|u′(t)|) = 1.

So if we define v(s) = u(t(s)), we have |v′(s)| = 1 for all s ∈ [0, L], so this s is the arclength parameter. The parameter s can be obtained explicitly as a function of t (or vice versa) by solving the differential equation:

ds/dt = |u′(t)| with s(a) = 0.

From now on, we always denote the arclength parameter by the variable s.

Example 4.22. Recall from Example 4.2 that a helix is a curve γ that can be parametrised by v(t) = (cos(t), sin(t), t) for t ∈ R≥0. Since |v′(t)| = √(sin²(t) + cos²(t) + 1) = √2, this parametrisation is not an arclength parametrisation. To convert the parameter t to the arclength parameter s, we solve:

ds/dt = |v′(t)| = √2 with s(0) = 0 ⇒ s = √2 t.

Thus, if we define w(s) = v(s/√2), we get a new parametrisation of the curve γ given by w(s) = (cos(s/√2), sin(s/√2), s/√2). We check:

w′(s) = (−(1/√2) sin(s/√2), (1/√2) cos(s/√2), 1/√2) ⇒ |w′(s)| = 1,

which implies that the curve γ is indeed parametrised by arclength via w : [0, L] → R³.
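We can verify numerically that the reparametrised helix w(s) = (cos(s/√2), sin(s/√2), s/√2) has unit speed, using a difference quotient (an illustrative sketch with our own helper names):

```python
import math

def w(s):
    # The arclength reparametrisation of the helix from Example 4.22.
    r = s / math.sqrt(2)
    return (math.cos(r), math.sin(r), r)

def numeric_speed(curve, s, h=1e-6):
    # |w'(s)| approximated by a symmetric difference quotient.
    a, b = curve(s + h), curve(s - h)
    return math.sqrt(sum(((x - y) / (2 * h))**2 for x, y in zip(a, b)))

for s in (0.0, 1.0, 5.0):
    print(numeric_speed(w, s))  # approximately 1.0 at every s
```

A constant speed of 1 is exactly the defining property of an arclength parametrisation in Definition 4.21.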

We are now going to look at some applications in geometry. The important quantities of study in geometry are lengths, angles, and curvature. We have seen how lengths of arcs are calculated and how angles between two vectors are computed using dot products. Now we are going to define the quantity curvature.

Suppose that we have a curve γ in R³. The curvature measures how much the unit tangent vector to the curve moves with respect to the parameter. If the unit tangent vector is given by T(t) = v′(t)/|v′(t)|, the curvature κ(t) of the curve at the point v(t) is defined as the magnitude of the change of this vector, that is:

κ(t) = (1/|v′(t)|) |(d/dt)T(t)|.

Exercise 4.23. Using Exercise 4.10 and the vector triple product in Proposition 2.15, prove that:

κ(t) = |v′(t) × (v″(t) × v′(t))| / |v′(t)|⁴.

Hence or otherwise, show that:

κ(t) = |v″(t) × v′(t)| / |v′(t)|³.

One thing to note here is that if we have parametrised the curve γ by arclength via some function w : [0, L] → R³, then since |w′(s)| = 1, the curvature formula simplifies significantly to κ(s) = |w″(s)|. This is another reason why, when we are working with geometry, using the arclength parametrisation is the best way to go!
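As a concrete check of the formula κ(t) = |v″(t) × v′(t)|/|v′(t)|³, the helix of Example 4.2 has constant curvature 1/2. The helper functions below are our own (an illustrative sketch):

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def norm(a):
    return math.sqrt(sum(c * c for c in a))

def helix_curvature(t):
    v1 = (-math.sin(t), math.cos(t), 1.0)   # v'(t)
    v2 = (-math.cos(t), -math.sin(t), 0.0)  # v''(t)
    return norm(cross(v2, v1)) / norm(v1)**3

print(helix_curvature(0.0), helix_curvature(2.5))  # both approximately 0.5
```

The curvature is the same at every t, reflecting the rotational symmetry of the helix about its axis.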

Example 4.24. Recall that for a particle moving in R³ given by v : I → R³ as a function of time, we have decomposed its acceleration into the tangential and principal normal directions as:

v″(t) = ((v′ · v″)/|v′|) T(t) + (|v′ × v″|/|v′|) N(t).

Then the curvature also comes into the equation since:

a_N = |v′ × v″| / |v′| = κ(t)|v′|².

Thus, we see that the tighter the curve of motion of the particle, the bigger the acceleration (and hence the force) acting on the particle in the principal normal direction.

Exercise 4.25. Consider a function f : R → R. This function gives a graph y = f(x) in R². Since the cross product is only defined in R³, we need to extend the space in which the curve lies from R² to R³. By considering the curve (x, f(x), 0) in R³ using the standard basis of R³, show that the curvature of the graph y = f(x) at the point (x, f(x)) is given by:

κ(x) = |f″(x)| / |1 + (y′)²|^(3/2).

In general, show that a parametric curve v(t) = (x(t), y(t)) in R² has curvature κ(t) at the point (x(t), y(t)) given by:

κ(t) = |x′y″ − y′x″| / |(x′)² + (y′)²|^(3/2).

Exercise 4.26. Describe a parametrisation of a circle in R² with radius r centred at (x₀, y₀). Find a parametrisation by arclength of this circle. Hence, show that the curvature at any point on the circle is 1/r.
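The parametric curvature formula from Exercise 4.25 can be tested numerically on a circle of radius r: with v(t) = (x₀ + r cos t, y₀ + r sin t), it should return 1/r at every point. This is an illustrative sketch with our own helper:

```python
import math

def circle_curvature(r, t):
    # v(t) = (x0 + r cos t, y0 + r sin t); the translation (x0, y0)
    # does not affect the derivatives, so it is omitted.
    x1, y1 = -r * math.sin(t), r * math.cos(t)     # (x'(t), y'(t))
    x2, y2 = -r * math.cos(t), -r * math.sin(t)    # (x''(t), y''(t))
    return abs(x1 * y2 - y1 * x2) / (x1**2 + y1**2)**1.5

for r in (0.5, 1.0, 3.0):
    print(circle_curvature(r, 0.8), 1 / r)  # the two values agree
```

Algebraically, x′y″ − y′x″ = r² and ((x′)² + (y′)²)^(3/2) = r³ here, so the ratio is exactly 1/r, independent of t.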

Remark 4.27. The geometric interpretation of the curvature of a curve γ at some point v(t) ∈ γ in 2 dimensions is that it is equal to 1/r, where r is the radius of the best circle that approximates the curve at the point v. This circle is called the osculating circle at v. Thus, the more tightly wound the curve, the higher the curvature. As the curve straightens, the best approximating circle gets bigger, whilst the inverse of its radius gets smaller. Hence, in the limiting case, we can see why straight lines have zero curvature: the best approximating circle to the line has infinite radius!


Figure 16: The osculating circle of the curve γ ⊂ R² at v has a bigger radius than the osculating circle at u. Thus the curvature of γ at u is greater than the curvature at v.

4.4 Integration of Vectors

We can also integrate vectors. As with differentiation, integration is done component-wise. If we have a vector-valued function v : I → R³ where I ⊂ R, then the indefinite integral of this vector is given by:

∫ v(t) dt = ∫ (v₁(t), v₂(t), v₃(t)) dt = (∫ v₁(t) dt + c₁, ∫ v₂(t) dt + c₂, ∫ v₃(t) dt + c₃),

for some constants cᵢ ∈ R, and the definite integral is given by:

∫_a^b v(t) dt = ∫_a^b (v₁(t), v₂(t), v₃(t)) dt = (∫_a^b v₁(t) dt, ∫_a^b v₂(t) dt, ∫_a^b v₃(t) dt),

for a, b ∈ I. An application of this is in the study of forces acting on a particle.

Example 4.28. Suppose that an object of mass m > 0 is projected from a cliff of height h > 0 with initial speed u > 0 at an angle of θ degrees above the horizontal level of the cliff. The object is then acted upon by the gravitational force, of magnitude mg directed downwards, where g is the gravitational acceleration.


Figure 17: Projectile motion of the object.

We wish to find how long the object will remain in the air, what is the maximum height it will attain, and where it will land. Let us put the information we have in a more meaningful way. We denote the horizontal direction from the cliff as e1 and the vertical direction as e2. Let the position vector of the object at time t ≥ 0 be x(t) = (x(t), y(t)). Then the initial position and velocity vector can be written as x(0) = (0, h) and x0(0) = (u cos θ, u sin θ) respectively. The only force acting on the object is the gravitational force, so we have:

F(t) = (0, −mg) = m d²x/dt² = m(x″(t), y″(t)).

Integrating this vector, we get x′(t) = (x′(t), y′(t)) = (c₁, −gt + c₂) for some constants c₁, c₂ ∈ R. Comparing this with the given velocity at the initial time, x′(0) = (u cos θ, u sin θ), yields x′(t) = (u cos θ, −gt + u sin θ). Integrating this vector once more, we get the position vector, and upon substituting the initial position of the object, we get the position of the object as a function of time:

x(t) = (ut cos θ, −gt²/2 + ut sin θ + h).

Therefore, the maximum height occurs when y′(t) = 0, that is, at the time t₀ = (u sin θ)/g, and hence the maximum height attained by the object is y(t₀) = −gt₀²/2 + ut₀ sin θ + h. The object will land when y(t) = 0, that is, at time t = (u sin θ ± √(u² sin² θ + 2gh))/g, and since the time is positive, we choose t₁ = (u sin θ + √(u² sin² θ + 2gh))/g. Finally, since the object lands at this time t₁, it must land x(t₁) = ut₁ cos θ distance away from the cliff face.
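The landing time and range above can be sanity-checked numerically. The values of g, h, u, and θ below are arbitrary choices for illustration, not taken from the notes:

```python
import math

g = 9.81                      # gravitational acceleration
h = 20.0                      # cliff height
u = 15.0                      # initial speed
theta = math.radians(30)      # launch angle above the horizontal

def y(t):
    # Vertical position: y(t) = -g t^2 / 2 + u t sin(theta) + h.
    return -0.5 * g * t**2 + u * t * math.sin(theta) + h

# Landing time from the quadratic formula, taking the positive root.
t1 = (u * math.sin(theta) + math.sqrt(u**2 * math.sin(theta)**2 + 2 * g * h)) / g

print(y(t1))                 # approximately 0: the object is at ground level at t1
print(u * t1 * math.cos(theta))  # horizontal distance from the cliff face
```

Substituting t₁ back into y(t) and recovering (approximately) zero confirms that the chosen root of the quadratic is indeed the landing time.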

Exercise 4.29. Show that the path traced by the object in Example 4.28 is a parabola.

5 Multivariable Functions

We have seen the vector space R³ and functions mapping into it. Let us now look at functions of several variables and their graphs. We define:

Definition 5.1 (Function of vector variables). A function of vector variables is a function that maps a subset U ⊂ Rⁿ of a vector space to the real numbers R, that is, f : U → R.

Associated to a function of vector variables is its graph:

Definition 5.2 (Graph of a function). Let f : U → R be a function on a subset U ⊂ Rⁿ. Then the graph G(f) of the function f is the set of ordered (n + 1)-tuples:

G(f) = {(x, f(x)) ∈ Rⁿ⁺¹ : x ∈ U}, which is a subset of U × R.

If you plot these points in Rⁿ⁺¹, you get the “graph” that you have seen in high school. From now on, we focus our attention on R² and R³.

Example 5.3. From Example 2.22, we have an example of a function of vector variables. We note that the equation of the plane x + 2y + 2z = 5 can be rearranged as z = −x/2 − y + 5/2. Therefore, the quantity z can be thought of as a function f of the variables x and y over the whole xy-plane, defined as:

f : R² → R,
(x, y) ↦ −x/2 − y + 5/2.

Therefore, the plane defined by the equation x + 2y + 2z = 5 can also be described as the graph G(f) = {(x, y, f(x, y)) ∈ R³ : (x, y) ∈ R²}.

Example 5.4. However, not all surfaces in R³ can be described as the graph of a function of vector variables over R². For example, the unit-radius ellipsoid centred at the origin 0 (a sphere, denoted S²), which is described by the equation x² + y² + z² = 1, cannot be expressed as the graph of a single function f : R² → R. Indeed, by rearranging the equation, we get z = ±√(1 − x² − y²), which does not define a function from R² to R: every point (x, y) with x² + y² < 1 corresponds to two values of z, and the square root does not make sense for large enough vectors (x, y) ∈ R².

Instead, we can describe each hemisphere of the surface individually by restricting the function from R² to the closed ball B̄₁(0) = {(x, y) ∈ R² : |(x, y)| = √(x² + y²) ≤ 1}. Therefore, the unit sphere S² is the union of the graphs of two functions f₁ and f₂ defined on B̄₁(0) as f₁(x, y) = √(1 − x² − y²) and f₂(x, y) = −√(1 − x² − y²), that is:

G(f₁) = {(x, y, √(1 − x² − y²)) ∈ R³ : (x, y) ∈ B̄₁(0)},
G(f₂) = {(x, y, −√(1 − x² − y²)) ∈ R³ : (x, y) ∈ B̄₁(0)},
S² = G(f₁) ∪ G(f₂).

5.1 Level Sets

The next important object that we are going to look at is level sets. We have actually seen this object in Exercise 3.1. Here we define what it means:

Definition 5.5 (Level set). Let f : U → R be a real-valued function from a subset U ⊂ R² or R³. The c-level set L_c(f) of the function f for a constant c ∈ R is the set of points in U such that:

Lc(f) = {x ∈ U : f(x) = c}.

For functions from a subset U of the 2-dimensional plane R², we can view these as the contours of the graph on a map: imagine that the graph of the function represents the topography of a landscape, so that f(x, y) is the height above sea level at the point (x, y) on your map. The level sets denote the areas on your map that have the same height above sea level.

(a) f(x, y) = −x/2 − y + 5/2. (b) g(x, y) = x² − y².

Figure 18: Contour lines or c-level sets of some functions for integer values of c.

Example 5.6. Recall the function f(x, y) = −x/2 − y + 5/2 from Example 5.3. The c-level set of this function is the set of points {(x, y) ∈ R² : −x/2 − y + 5/2 = c}, which is a line in the plane R². For different values of c, the level sets correspond to distinct parallel lines of gradient −1/2.

Example 5.7. All the quadric surfaces that we have seen previously can be described as level sets. For example, the quadric in Example 3.2 described by the equation x² − 4x + 2y² + 4y + z² − 2z = −3 is the 0-level set of the function F : R³ → R defined as F(x, y, z) = x² − 4x + 2y² + 4y + z² − 2z + 3. Indeed:

L₀(F) = {(x, y, z) ∈ R³ : x² − 4x + 2y² + 4y + z² − 2z = −3}.

Exercise 5.8. For the function f₁(x, y) = √(1 − x² − y²) where (x, y) ∈ B̄₁(0), what are the c-level sets of this function when c ∈ [0, 1]? What are the c-level sets of f₁ when c > 1?

5.2 Partial Derivatives

We have seen derivatives in one dimension: given a function f : R → R, the derivative of this map at x0 ∈ R is given by the limit:

(df/dx)(x₀) = lim_{h→0} (f(x₀ + h) − f(x₀))/h,

whenever the limit is defined. This derivative gives us the slope of the function (and hence the tangent line to the curve) at the point (x₀, f(x₀)).

For functions of many variables, say f : Rⁿ → R, the function might change differently as we move in different directions. The obvious directions are given by the standard basis of Rⁿ. Here, we define the partial derivatives of multivariable functions:

Definition 5.9 (Partial derivatives). Suppose that f : U → R is a map from an open set U ⊂ R³ (or R²) to R. Let the coordinate variables of the points in U be denoted (x₁, x₂, x₃) = x₁e₁ + x₂e₂ + x₃e₃, where the eᵢ are the standard basis of R³. Then the partial derivative in the eᵢ-th direction at the point p ∈ U is given by the limit:

(∂f/∂xᵢ)(p) = lim_{h→0} (f(p + heᵢ) − f(p))/h,

whenever it is defined. Sometimes the notations f_{xᵢ}(p) or ∂_{xᵢ}f(p) are also used for this partial derivative.

By definition, the partial derivative in the direction of ei at the point p is the slope of the function f at the point p in the direction ei whilst keeping the other coordinates constant.

As the operation suggests, in order to compute partial derivatives in the ei-th direction, we simply keep the other coordinates constant and differentiate the function with respect to the xi variable:

Example 5.10. Suppose that f(x, y, z) = yeˣ + yz³ + x cos z is a function from R³ to R. Then the partial derivatives are given by:

∂f/∂x = yeˣ + cos z,   ∂f/∂y = eˣ + z³,   ∂f/∂z = 3yz² − x sin z.
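The partial derivatives in this example can be verified with finite differences, varying one coordinate at a time while keeping the others fixed (the helper names are our own):

```python
import math

def f(x, y, z):
    # The function from Example 5.10: f(x, y, z) = y e^x + y z^3 + x cos z.
    return y * math.exp(x) + y * z**3 + x * math.cos(z)

def partial(fn, point, i, h=1e-6):
    # Symmetric difference quotient in the e_i direction, other coordinates fixed.
    p_plus = list(point); p_plus[i] += h
    p_minus = list(point); p_minus[i] -= h
    return (fn(*p_plus) - fn(*p_minus)) / (2 * h)

x, y, z = 0.3, -1.2, 0.5
exact = (y * math.exp(x) + math.cos(z),     # df/dx
         math.exp(x) + z**3,                # df/dy
         3 * y * z**2 - x * math.sin(z))    # df/dz
approx = tuple(partial(f, (x, y, z), i) for i in range(3))
print(all(abs(a - e) < 1e-6 for a, e in zip(approx, exact)))  # True
```

Each difference quotient only perturbs one coordinate, which is exactly the "keep the other coordinates constant" prescription of Definition 5.9.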

We can also define higher order partial derivatives in the same way as before: we fix the other variables and differentiate with respect to the variable of interest. We use the following notation: if x = (x₁, x₂, …, xₙ) ∈ Rⁿ are the coordinate variables in our domain, then:

(∂/∂xⱼ)(∂f/∂xᵢ)(p) = (∂²f/∂xⱼ∂xᵢ)(p) = f_{xᵢxⱼ}(p) = ∂_{xᵢxⱼ}f(p).

Note the order of the variables in the notation! However, if the function is sufficiently smooth, a well-known result called Clairaut's Theorem states that the mixed partial derivatives commute:

Theorem 5.11 (Clairaut's Theorem). If f : U → R is a smooth function on an open set U ⊂ R² or R³, then the partial derivatives commute at every p ∈ U, that is:

(∂²f/∂xᵢ∂xⱼ)(p) = (∂²f/∂xⱼ∂xᵢ)(p),

for all i, j ∈ {1, 2, …, n}.

The proof of this theorem uses ideas and methods in real analysis and is beyond the scope of this course.

Exercise 5.12. Recall that we have defined f(x, y, z) = yeˣ + yz³ + x cos z and computed its first partial derivatives in Example 5.10. Compute the second partial derivatives of this function and verify that the mixed partial derivatives commute.
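Clairaut's Theorem can also be illustrated numerically on a different smooth function of our own choosing (so as not to spoil the exercise); a central finite-difference approximation of the mixed second partial matches the symbolic answer, which is the same in either order:

```python
import math

def g(x, y):
    # A smooth function of two variables (our own choice, not from the notes).
    return math.sin(x * y) + x**3 * y**2

def mixed(fn, x, y, h=1e-4):
    # Central approximation of the mixed partial d^2 g / dx dy,
    # which is symmetric in the two differencing directions.
    return (fn(x + h, y + h) - fn(x + h, y - h)
            - fn(x - h, y + h) + fn(x - h, y - h)) / (4 * h * h)

x, y = 0.7, -0.4
# Exact mixed partial: cos(xy) - xy sin(xy) + 6 x^2 y, the same in either order.
exact = math.cos(x * y) - x * y * math.sin(x * y) + 6 * x**2 * y
print(abs(mixed(g, x, y) - exact) < 1e-5)  # True
```

The agreement of the finite-difference value with the single symbolic expression reflects the fact that ∂²g/∂x∂y and ∂²g/∂y∂x coincide for smooth g.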

We are also interested in finding partial derivatives of composed functions. For functions of one variable, we have the chain rule. We have an analogous chain rule result for partial derivatives which takes into account all the variables of the function:

Proposition 5.13 (Chain rule). Let f : U → R be a function of n variables (x₁, x₂, …, xₙ) and let v : I → U be a vector-valued function with image in U given by v(t) = (v₁(t), v₂(t), …, vₙ(t)). Then we have f ∘ v : I → R and:

(d/dt)(f ∘ v)(t) = (d/dt) f(v₁(t), …, vₙ(t)) = (∂f/∂x₁)(v(t)) (dv₁/dt) + (∂f/∂x₂)(v(t)) (dv₂/dt) + ⋯ + (∂f/∂xₙ)(v(t)) (dvₙ/dt).

In general, we have:

Proposition 5.14 (Chain rule). Let f : U → R be a function of n variables (x₁, x₂, …, xₙ) and let v : W → U be a vector-valued function from W ⊂ Rᵐ with image in U. Let (t₁, t₂, …, tₘ) be the variables in W. Then we have f ∘ v : W → R and, for all i = 1, 2, …, m:

(∂/∂tᵢ)(f ∘ v) = (∂f/∂x₁)(∂v₁/∂tᵢ) + (∂f/∂x₂)(∂v₂/∂tᵢ) + ⋯ + (∂f/∂xₙ)(∂vₙ/∂tᵢ).

We can also differentiate vectors using this definition. This is done as before, that is, the differentiation is done component-wise. Let v : U → Rⁿ, where U ⊂ Rᵐ, be defined as v(x) = (v₁(x), …, vₙ(x)), where x = (x₁, x₂, …, xₘ) is the coordinate in the domain. Then:

(∂/∂xᵢ)v(x) = ((∂/∂xᵢ)v₁(x), (∂/∂xᵢ)v₂(x), …, (∂/∂xᵢ)vₙ(x)),

for i = 1, 2, . . . , m. An application of this is to find tangent vectors to a graph. Suppose that f : U → R is a sufficiently smooth function on a subset U ⊂ R². Then the graph of f, which consists of the points (x, y, f(x, y)) for (x, y) ∈ U, describes a surface Σ in R³. We now wish to find vectors which are tangent to this surface at some point q = (x0, y0, f(x0, y0)) ∈ Σ. There are two obvious curves on the surface passing through the point q, namely u1(x) = (x, y0, f(x, y0)) for x ∈ I such that x0 ∈ I, and u2(y) = (x0, y, f(x0, y)) for y ∈ J such that y0 ∈ J. We can differentiate these curves with respect to x and y respectively to get the tangent vectors:

d/dx u1(x0) = (1, 0, ∂f/∂x (x0, y0))   and   d/dy u2(y0) = (0, 1, ∂f/∂y (x0, y0)),

to these curves. Since these curves are contained in the surface Σ, the vectors ∂x u1(x0) and ∂y u2(y0) are also tangential to the surface Σ.
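Stepping back to Proposition 5.13, the chain rule underlying these derivative computations is easy to probe numerically. In this sketch (f and v are arbitrary test choices, not from the notes), the derivative d/dt f(v(t)) computed by a central difference is compared with the chain rule sum:

```python
import math

def f(x, y):
    return x * x * y             # test function f(x, y) = x^2 y

def v(t):
    return (math.cos(t), t * t)  # test curve v(t) = (cos t, t^2)

t0, h = 0.7, 1e-6

# left-hand side: d/dt (f o v)(t0) by a central difference
lhs = (f(*v(t0 + h)) - f(*v(t0 - h))) / (2 * h)

# right-hand side: (df/dx) dv1/dt + (df/dy) dv2/dt evaluated at v(t0)
x, y = v(t0)
rhs = 2 * x * y * (-math.sin(t0)) + x * x * (2 * t0)
print(lhs, rhs)
```

The two values agree to roughly the accuracy of the finite difference, as Proposition 5.13 asserts.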


Figure 19: The graph G(f) over U with two tangent vectors ∂xu1 and ∂yu2 at the point q.

Since these two vectors are linearly independent, we can define a plane through the point q that is spanned by these two vectors. This plane is called the tangent plane of the graph G(f) at the point q ∈ G(f). This tangent plane is the best linear approximation of the function f at the point q.

Definition 5.15 (Tangent plane to a graph). Let f : U → R be a real valued smooth function from a subset U ⊂ R² and let Σ be the surface defined by the graph G(f) of the function f over U. The tangent plane to the surface Σ at the point q ∈ Σ is defined as:

Tq(Σ) = {v ∈ R³ : v = q + λ ∂x u1(q) + µ ∂y u2(q) for λ, µ ∈ R}.

Example 5.16. We have seen in Example 5.4 that the hemisphere Σ in R³ can be described as the graph G(f) of the function f(x, y) = √(1 − x² − y²) over the unit ball U = B̄1(0). We know that the point q = (1/2, 1/2, 1/√2) lies on Σ. We want to find the tangent plane to the hemisphere

at the point q. We compute the derivatives:

∂/∂x (x, y, f(x, y))(q) = (1, 0, −1/√2),
∂/∂y (x, y, f(x, y))(q) = (0, 1, −1/√2).

Thus, the tangent plane to the hemisphere at the point q is given parametrically by:

Tq(Σ) = {v ∈ R³ : v = (1/2, 1/2, 1/√2) + λ(1, 0, −1/√2) + µ(0, 1, −1/√2) for λ, µ ∈ R}.
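The derivative values in this example can be double-checked numerically: central differences of f(x, y) = √(1 − x² − y²) at (1/2, 1/2) should recover −1/√2 in both slots. A minimal sketch (the step size h is an arbitrary choice):

```python
import math

def f(x, y):
    return math.sqrt(1 - x * x - y * y)  # upper hemisphere as a graph

x0, y0, h = 0.5, 0.5, 1e-6
fx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
fy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)
print(fx, fy)  # both ≈ -1/sqrt(2) ≈ -0.70711
```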

5.3 Gradient

Another concept used in the application of calculus is the differential. The partial derivatives measure the derivative of a multivariable function in one single coordinate direction whilst keeping the other variables constant. Putting the partial derivatives together gives us the differential or total derivative, which measures the change of the function with respect to all the coordinate directions.

Definition 5.17 (Differential). Suppose that f : U → R is a map from an open set U ⊂ R² (or R³) to R. Let the coordinate variables of the points in U be denoted as (x, y). Then, the differential or total derivative df of the function f at a point p ∈ U is given by:

df = ∂f/∂x (p) dx + ∂f/∂y (p) dy. (6)

In terms of applications, one can view the quantities df, dx, and dy as small changes in the quantities f, x, and y respectively. Therefore, this interpretation is used widely in approximation theory. The idea comes from approximating the function f at a fixed point p ∈ U as a linear function, which is easier to deal with. The differential in equation (6) at a fixed point p ∈ U is a linear equation in the (df, dx, dy) “variables”. In fact, the differential resembles the equation of a plane:

0 = ∂f/∂x (p) dx + ∂f/∂y (p) dy − df,

which is spanned by the vectors (1, 0, ∂xf(p)) and (0, 1, ∂yf(p)) and passes through the origin 0.

Therefore the differential describes the tangent plane T(p,f(p))(Σ) of the graph G(f) at (p, f(p)), but with the point q = (p, f(p)) shifted to the origin.

Since the tangent plane Tq(Σ) touches the graph at the point q = (p, f(p)), it remains close to the graph in a small neighbourhood of the point p. Therefore the tangent plane Tq(Σ) approximates the graph G(f) close to this point q. If the tangent plane is shifted to the origin, the new coordinates then describe the changes in the values of the various coordinates and their linear relationship. Thus, the differential describes the change df in terms of dx and dy. However, care must be taken as this only approximates the graph close to the contact point q, so the approximation is only acceptable for small values of dx and dy. If we choose them

too big, the tangent plane Tq(Σ) would probably deviate too much from the graph G(f), as the function f might not be a linear function.


Figure 20: The graph G(f) over U with the shifted tangent plane Tq(Σ) − q described by the differential ∂xf(p)dx + ∂yf(p)dy − df = 0.

Remark 5.18. Strictly speaking, the “variables” df, dx, and dy are not variables. They are called differential forms, which are dual vectors to the tangent space. You will learn more about dual vector spaces in a course in advanced linear algebra. But for our purposes here, it is enough to consider them as variables.

Example 5.19. Suppose that we have a cylinder of radius r and height h. Its volume is given by the quantity V(r, h) = πr²h. We wish to approximate the change in volume if we were to vary the radius and height of the cylinder from a fixed dimension (r0, h0). If we compute the total derivative, we get:

dV(r0, h0) = ∂V/∂r dr + ∂V/∂h dh = 2πr0h0 dr + πr0² dh.

If we keep the radius constant, that is dr = 0, then we can see that the small change in volume is proportional to the small change in height. On the other hand, if we keep the height constant, that is dh = 0, the small change in volume is proportional to the small change in radius, scaled by the factor 2πr0h0.

Example 5.20. Compute the volume of a cylinder with radius r = 4 units and height h = 3 units. By using differentials, approximate the change in volume if we increase the radius by dr = 0.125 units and height by dh = 0.125 units and approximate the volume of the resulting cylinder. Compare with the actual volume of the resulting cylinder. What happens if you approximate the change of volume with dr = 1 and dh = 1?
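One way to work through this exercise numerically (the script below is an illustrative sketch, not the required hand computation) is to compare the differential estimate dV = 2πr0h0 dr + πr0² dh with the exact change in volume for both step sizes:

```python
import math

def V(r, h):
    return math.pi * r * r * h   # volume of a cylinder

r0, h0 = 4.0, 3.0
for dr, dh in [(0.125, 0.125), (1.0, 1.0)]:
    dV = 2 * math.pi * r0 * h0 * dr + math.pi * r0 * r0 * dh  # differential
    exact = V(r0 + dr, h0 + dh) - V(r0, h0)                   # actual change
    print(dr, dV, exact)
```

For the small steps the linear estimate 5π ≈ 15.71 is close to the true change ≈ 16.25, while for dr = dh = 1 the estimate 40π ≈ 125.7 falls well short of the true change 52π ≈ 163.4, illustrating why the differential is only trusted for small dx and dy.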

We can also put the components of the differential in terms of a vector. This vector is called the gradient of the function f, which is:

Definition 5.21 (Gradient). Suppose that f : U → R is a map from an open set U ⊂ R³ (or R²) to R. Let the coordinate variables of the points in U be denoted as (x, y, z). Then, the

gradient vector ∇f of the function f at a point p ∈ U is given by:

∇f(p) = (∂f/∂x (p), ∂f/∂y (p), ∂f/∂z (p)).

The symbol ∇ is called nabla or del. One can think of the operator ∇ as the formal vector (∂x, ∂y, ∂z), so that ∇f = (∂x, ∂y, ∂z)f = (∂xf, ∂yf, ∂zf). Using the product rule, one can prove the following properties of the gradient:

Proposition 5.22 (Properties of gradient). Suppose that f, g : U → R are maps from an open set U ⊂ R³ (or R²) to R. Then:

1. ∇(f + g) = ∇f + ∇g,

2. ∇(fg) = g∇f + f∇g,

3. ∇(f/g) = (g∇f − f∇g)/g², wherever g ≠ 0.

The operator ∇ in the coordinates (x, y, z) is written as above. However, sometimes it is useful to write it in terms of other coordinate systems, for example the spherical or cylindrical coordinates as we have seen before. In order to write the operator ∇ in terms of the cylindrical coordinates, we recall the change of variables (x, y, z) ↦ (r, θ, h) where x = r cos θ, y = r sin θ, and z = h, or r = √(x² + y²), θ = arctan(y/x), and h = z, from Definition 3.6. The operator ∇ can be written as:

∇ = ∂xe1 + ∂ye2 + ∂ze3,

where ei are the standard basis vectors in R³. We now wish to express the components in terms of the basis vectors er, eθ, and eh. The vector er is the unit vector pointing radially away from the z-axis towards the point (x, y, z), therefore we have er = cos θ e1 + sin θ e2. The vector eh is simply e3. Finally, the vector eθ = eh × er = e3 × (cos θ e1 + sin θ e2) = − sin θ e1 + cos θ e2. Thus, we have:

e1 = cos θer − sin θeθ, (7)

e2 = sin θer + cos θeθ, (8)

e3 = eh. (9)

Furthermore, by the chain rule in Proposition 5.14, we have:

∂/∂x = ∂r/∂x ∂/∂r + ∂θ/∂x ∂/∂θ + ∂h/∂x ∂/∂h = cos θ ∂/∂r − (sin θ/r) ∂/∂θ, (10)
∂/∂y = sin θ ∂/∂r + (cos θ/r) ∂/∂θ, (11)
∂/∂z = ∂/∂h. (12)

Thus, putting the expressions (7)–(12) into the expression ∇ = ∂xe1 + ∂ye2 + ∂ze3, we have:

∇ = ∂/∂r er + (1/r) ∂/∂θ eθ + ∂/∂h eh.
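This formula can be sanity-checked numerically: compute the cylindrical components ∂f/∂r, (1/r)∂f/∂θ, ∂f/∂h of a test function by finite differences, reassemble the Cartesian gradient using er = (cos θ, sin θ, 0) and eθ = (−sin θ, cos θ, 0), and compare with the analytic ∇f. A sketch with an arbitrary test function:

```python
import math

def f(x, y, z):
    return x * x * y + z       # test function, grad f = (2xy, x^2, 1)

x0, y0, z0 = 1.0, 2.0, 0.5
r, theta = math.hypot(x0, y0), math.atan2(y0, x0)

def F(r, theta, h):            # f written in cylindrical coordinates
    return f(r * math.cos(theta), r * math.sin(theta), h)

eps = 1e-6
fr = (F(r + eps, theta, z0) - F(r - eps, theta, z0)) / (2 * eps)
ft = (F(r, theta + eps, z0) - F(r, theta - eps, z0)) / (2 * eps) / r
fh = (F(r, theta, z0 + eps) - F(r, theta, z0 - eps)) / (2 * eps)

# reassemble the Cartesian gradient from the cylindrical components
gx = fr * math.cos(theta) - ft * math.sin(theta)
gy = fr * math.sin(theta) + ft * math.cos(theta)
gz = fh
print(gx, gy, gz)  # ≈ (2*x0*y0, x0**2, 1) = (4, 1, 1)
```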

Exercise 5.23. Prove that in the spherical coordinates (r, θ, φ) as defined in Definition 3.9, we have:

∇ = ∂/∂r er + (1/r) ∂/∂θ eθ + (1/(r sin θ)) ∂/∂φ eφ.

Recall that the partial derivative in the xi coordinate is the change in the function f when all other coordinates are kept constant. Therefore, our motion in the domain is fixed to the direction ei from the point of interest. However, we can define a more general derivative in some other mixed direction, say v = v1e1 + v2e2 + v3e3. This derivative is called the directional derivative:

Definition 5.24 (Directional derivative). Suppose that f : U → R is a map from an open set U ⊂ R³ to R. Let v be a vector in R³. Then, the directional derivative in the direction v at the point p is defined as:

Dvf(p) = lim_{h→0} (f(p + hv) − f(p))/h,

whenever it is defined.

We note that if v = ei, then the directional derivative Dei f(p) is simply the partial derivative ∂f/∂xi in the xi variable. Since the direction we are moving in is more general, this derivative is called the directional derivative. In fact, the directional derivative is related to the gradient in the following way:

Proposition 5.25. Suppose that f : U → R is a map from an open set U ⊂ R³ to R such that p ∈ U. Let v ∈ R³ be a vector. Then:

Dvf(p) = ∇f(p) · v.

Proof. Define a function g : I → R³ by g(h) = p + hv = (p1 + hv1, p2 + hv2, p3 + hv3). Then, we can write the directional derivative Dvf(p) as:

Dvf(p) = lim_{h→0} (f(p + hv) − f(p))/h = lim_{h→0} (f(g(h)) − f(g(0)))/h = d/dh (f ◦ g)(0),

and thus, by using the chain rule in Proposition 5.13, we get:

Dvf(p) = ∂f/∂x dg1/dh + ∂f/∂y dg2/dh + ∂f/∂z dg3/dh = ∂f/∂x v1 + ∂f/∂y v2 + ∂f/∂z v3 = ∇f(p) · v,

which is what we wanted to prove.
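Proposition 5.25 is easy to probe numerically: the limit in Definition 5.24, approximated by a small forward difference, should agree with ∇f(p) · v. A sketch with an arbitrary test function and direction:

```python
import math

def f(x, y, z):
    return x * y + math.sin(z)   # test function, grad f = (y, x, cos z)

p = (1.0, 2.0, 0.0)
v = (0.5, -1.0, 2.0)
h = 1e-6

# directional derivative straight from the definition (forward difference)
dvf = (f(*(p[i] + h * v[i] for i in range(3))) - f(*p)) / h

# gradient dotted with v, analytically
grad = (p[1], p[0], math.cos(p[2]))
dot = sum(g * vi for g, vi in zip(grad, v))
print(dvf, dot)  # both ≈ 2.0
```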

Geometrically, the gradient vector points in the direction of the greatest increase of the function f from the point p. The precise statement and its proof are the following:

Proposition 5.26. Suppose that f : U → R is a map from an open set U ⊂ R³ to R. Then the gradient vector ∇f(p) points in the direction of the greatest increase of the function f from the point p.

Proof. Let v̂ ∈ R³ be an arbitrary unit vector which we vary in order to find the greatest derivative. Suppose that ∇f(p) ≠ 0. Then the directional derivative Dv̂f(p) is the derivative of f in the direction of v̂ at the point p. Thus, from Proposition 5.25 and the definition of the dot product, we have:

Dv̂f(p) = ∇f(p) · v̂ = |∇f(p)||v̂| cos θ = |∇f(p)| cos θ,

where θ ∈ [0, π] is the angle between the vectors ∇f(p) and v̂. Since cos θ ∈ [−1, 1], the derivative is the greatest when θ = 0, that is, when v̂ and ∇f(p) point in the same direction. So the derivative is maximised in the direction of ∇f(p).

As a consequence, the direction of the greatest decrease of the function f at the point p is given by the vector −∇f(p). The above definition and propositions also extend to U ⊂ Rⁿ and v ∈ Rⁿ for any n ∈ N. Another result gives us information about the level sets of a function. Since the c-level set of a function f consists of the points x satisfying f(x) = c and the gradient ∇f points in the direction in which the function f increases the most, the gradient vector must point away from the level set. As a consequence, the gradient vector points in a direction perpendicular to the level sets of f in the following sense:

Corollary 5.27. Let f : U → R be a real valued function from a subset U ⊂ R³ and let the non-empty c-level set of the function f be Lc(f). Then the gradient vector ∇f(p) of the function f at any point p ∈ Lc(f) is perpendicular to the tangent vectors of Lc(f) at p.

Proof. Let u : I → Lc(f) be a curve defined in the c-level set of f such that 0 ∈ I and u(0) = p.

Then since the curve lies in Lc(f), we must have that f(u(t)) = c is constant for all t ∈ I. Thus, taking the derivative with respect to the variable t at t = 0, by the chain rule, we get:

0 = d/dt (f ◦ u)(0) = ∇f(u(0)) · u′(0).

Hence ∇f(p) is perpendicular to the vector u′(0). However, u′(0) is tangential to the curve u(t) at p and hence tangential to the level set Lc(f). Thus, since this tangent vector is arbitrary, we get that ∇f(p) is perpendicular to the tangent vectors of Lc(f) at p.


Figure 21: The c-level set of the function f, a tangent vector to Lc(f) and the gradient vector ∇f at the point p.

Figure 22: The level sets of the function f(x, y) = x² − y² and the gradient vectors ∇f(1, 0) = (2, 0) and ∇f(−1, 0) = (−2, 0) on the level set L1(f).

This corollary implies that the vector ∇f(p) is normal to the tangent plane of the c-level set Lc(f) at p. From Proposition 2.21, since we know that a normal vector and a specific point are enough to fully define a plane, we can define the tangent plane to the level set Lc(f):

Definition 5.28 (Tangent plane of level set). Let f : U → R be a real valued function from a subset U ⊂ R³ and let the non-empty c-level set of the function f be Lc(f). Let p ∈ Lc(f) be a point in the level set such that ∇f(p) ≠ 0. Then:

1. the plane through p that is normal to the vector ∇f(p) is called the tangent plane of

Lc(f) at p. This plane is the set:

Tp(Lc(f)) = {v ∈ R³ : v · ∇f(p) = p · ∇f(p)}.

2. the line through p having the direction of ∇f(p) is called the normal line to Lc(f) at p. This line is the set:

Np(Lc(f)) = {v ∈ R³ : v = p + λ∇f(p) for λ ∈ R}.

Example 5.29. A hyperbolic paraboloid is a quadric that can be described by the equation x² − y² − z = 0. Let us call this surface Σ. We can also describe Σ as the 0-level set of the function f : R³ → R defined by f(x, y, z) = x² − y² − z. The point p = (2, 1, 3) lies on the surface Σ. We now wish to find the tangent plane and the normal line at the point p. We compute: ∇f = (2x, −2y, −1). Therefore, the line normal to Σ at the point p is the set:

{v ∈ R³ : v = p + λ∇f(p) for λ ∈ R} = {v ∈ R³ : v = (2, 1, 3) + λ(4, −2, −1) for λ ∈ R},

and the tangent plane to Σ at p is:

{v ∈ R³ : v · ∇f(p) = p · ∇f(p)} = {v ∈ R³ : v · (4, −2, −1) = 3},

hence the plane is described by the equation 4x − 2y − z = 3.
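To see that 4x − 2y − z = 3 really is tangent and not merely incident, take points of Σ near p and plug them into the left-hand side: the deviation from 3 shrinks quadratically as the point approaches p. A small sketch:

```python
# points of the surface z = x^2 - y^2 near p = (2, 1, 3),
# substituted into the candidate tangent plane 4x - 2y - z = 3
def plane_lhs(x, y):
    z = x * x - y * y
    return 4 * x - 2 * y - z

for s in (0.1, 0.01, 0.001):
    print(s, plane_lhs(2 + s, 1.0) - 3)  # deviation equals -s**2 (up to rounding)
```

First-order agreement with the surface is exactly what characterises the tangent plane.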

Recall that we have constructed the tangent plane of a graph G(f) in Definition 5.15 in the previous section as the plane spanned by ∂x(x, y, f(x, y)) and ∂y(x, y, f(x, y)) which passes through a point on the graph. The surface defined by a graph can also be described as a level surface. The graph G(f) in R³ has the z coordinate defined by z = f(x, y). Therefore, the graph G(f) is the 0-level set of the function F : R³ → R defined by F(x, y, z) = z − f(x, y), that is L0(F) = G(f). We shall show that the two definitions of tangent planes coincide. Indeed, from Definition

5.15, the tangent plane of this surface at p ∈ G(f) is spanned by the vectors (1, 0, ∂xf(x, y)) and (0, 1, ∂yf(x, y)). In Definition 5.28, the tangent plane at p ∈ L0(F) is normal to ∇F = (∂xF, ∂yF, ∂zF) = (−∂xf(x, y), −∂yf(x, y), 1). Thus, it is sufficient to check that ∇F is perpendicular to the two vectors (1, 0, ∂xf(x, y)) and (0, 1, ∂yf(x, y)). But this is easy to check:

∇F · (1, 0, ∂xf(x, y)) = (−∂xf(x, y), −∂yf(x, y), 1) · (1, 0, ∂xf(x, y)) = 0,

∇F · (0, 1, ∂yf(x, y)) = (−∂xf(x, y), −∂yf(x, y), 1) · (0, 1, ∂yf(x, y)) = 0, and since the two planes pass through the same point p ∈ L0(F ) = G(f), the two planes coincide. Hence the two definitions are equivalent.

Example 5.30. Recall in Example 5.16 that the tangent plane of the hemisphere Σ, defined as the graph of the function f(x, y) = √(1 − x² − y²), at the point p = (1/2, 1/2, 1/√2) is described parametrically as:

Tp(Σ) = {v ∈ R³ : v = (1/2, 1/2, 1/√2) + λ(1, 0, −1/√2) + µ(0, 1, −1/√2) for λ, µ ∈ R}.

The hemisphere Σ above can also be described as the 0-level set of the function F(x, y, z) = z − √(1 − x² − y²) for z ≥ 0. Therefore, we can also describe the tangent plane as the plane perpendicular to the vector ∇F(p) = (1/√2, 1/√2, 1) that passes through the point p. Thus:

Tp(Σ) = {v ∈ R³ : v · ∇F(p) = p · ∇F(p)} = {(x, y, z) ∈ R³ : x + y + √2 z = 2}.


Figure 23: The normal vector to a tangent plane of the hemisphere at p.

5.4 Critical and Extrema Points

Like ordinary derivatives in one dimension, the partial derivatives are used to locate the critical points of a multivariable function. Since there are multiple directions in which we can move in the domain of the function, at a critical point the partial derivatives must simultaneously vanish. We first define the local and global extrema of a function.

Definition 5.31 (Local extrema). Let f : U → R be a smooth function from a domain U ⊂ R² (or R³). The function f has a local minimum at the point p if f(p) ≤ f(v) for any v in a small neighbourhood of p in U. The value f(p) is called a local minimum. Conversely, the function f has a local maximum at the point q if f(q) ≥ f(v) for any v in a small neighbourhood of q in U. The value f(q) is called a local maximum.

Definition 5.32 (Global extrema). Let f : U → R be a smooth function from a domain U ⊂ R² (or R³). The function f has a global minimum at the point p if f(p) ≤ f(v) for any v ∈ U. The value f(p) is called a global minimum. Conversely, the function f has a global maximum at the point q if f(q) ≥ f(v) for any v ∈ U. The value f(q) is called a global maximum.

The following theorem assures us that the global extrema of a smooth function defined on a closed and bounded domain are attained somewhere in the domain.

Theorem 5.33 (Extreme value theorem). Let f : U → R be a smooth function defined on a closed and bounded region U of R² (or R³). Then the function attains its global extrema. Furthermore, a global extremum must either be a local extremum inside the domain U or lie on the boundary of U.

The proof of this theorem is a beautiful result in real analysis and is outside the scope of this course. Next we define critical points of a multivariable smooth function.

Definition 5.34 (Critical points). Let f : U → R be a smooth function from a domain U ⊂ R² (or R³). The critical points of the function f are the points p ∈ U such that ∇f(p) = 0, that is, all the partial derivatives of f vanish at p.

In order to find a critical point, one must solve the simultaneous equations ∂f/∂x = 0 and ∂f/∂y = 0 (and ∂f/∂z = 0 if U ⊂ R³). However, a critical point need not be an extremum point. Recall that for a one-dimensional function g : R → R, local extrema satisfy the equation dg/dt = 0, but points where dg/dt = 0 are not necessarily local extrema. An example of this phenomenon is the point (0, 0) for the function g(t) = t³: we have dg/dt = 3t², so the derivative vanishes at the origin. However, if we plot the graph of g(t), we can clearly see that the point (0, 0) is neither a local minimum nor a local maximum. Similar to the one-dimensional case, local extrema of the function f must satisfy ∇f = 0, but points where ∇f = 0 are not necessarily local extrema. For example, consider the function f(x, y) = x² − y². The graph of this function is a hyperbolic paraboloid. We compute that ∇f = (2x, −2y) and thus at the origin (0, 0) ∈ R², the gradient ∇f vanishes. However, if we

plot this graph (or the level sets of this function), we see that this point is not an extremum: if we move slightly away from the point (0, 0) in the e1 direction, we can only increase the value of f(x, y), whilst if we move in the e2 direction, we can only decrease the value of f(x, y). This type of point is called a saddle point, for obvious reasons.

Figure 24: Hyperbolic paraboloid f(x, y) = x² − y².

In order to single out some of the genuine extremum points from the critical points, we can appeal to the second derivative test. This is similar to the one-dimensional case, in which a critical point is a maximum if the second derivative is negative and a minimum if the second derivative is positive.

Theorem 5.35 (Second partial derivative test). Let f : U → R be a smooth function from a domain U ⊂ R² and p ∈ U be a critical point of f, that is, ∇f(p) = 0. We define the quantity:

D = ∂²f/∂x² (p) · ∂²f/∂y² (p) − (∂²f/∂x∂y (p))².

1. If D > 0 and ∂²f/∂x² (p) > 0, then f has a local minimum at p,

2. If D > 0 and ∂²f/∂x² (p) < 0, then f has a local maximum at p,

3. If D < 0, then p is a saddle point,

4. If D = 0, then the test is inconclusive.

Remark 5.36. The proof of this theorem is an interesting result in differential topology in which we study the Hessian matrix of the function f. The Hessian matrix of f is the matrix of second derivatives of f, which is given by:

Hess(f)(p) = ( ∂²xxf(p)  ∂²xyf(p) )
             ( ∂²yxf(p)  ∂²yyf(p) ).

Note that the determinant of this matrix is the quantity D above. As you may have seen in linear algebra, the determinant is the product of the eigenvalues of this matrix. The signs of the eigenvalues indicate whether the function is concave or convex along the corresponding eigenvectors, viewed as directions of motion in U from the point p.

Example 5.37. We wish to find all the critical points and classify them for the function f : [−3, 3] × [−3, 3] → R defined by f(x, y) = 12x² + y³ − 12xy. We compute all the partial derivatives as follows:

∂xf = 24x − 12y,   ∂yf = 3y² − 12x,
∂²xxf = 24,   ∂²xyf = −12,   ∂²yyf = 6y.

Then the critical points (x, y) satisfy the equations 24x − 12y = 0 and 3y² − 12x = 0. Hence the critical points are p = (0, 0) and q = (1, 2). Using the second partial derivative test, we have D(p) = 24 · 6(0) − (−12)² = −144 < 0, so p is a saddle point. On the other hand, D(q) = 24 · 6(2) − (−12)² = 144 > 0 and since ∂²xxf(q) > 0, it is a local minimum. Now let us find the global extrema of the function f in the domain [−3, 3] × [−3, 3]. From the Extreme Value Theorem, we note that a global maximum must be an interior local maximum or lie on the boundary. Since there are no local maxima inside the domain, the global maximum must lie on the boundary. We check the boundary segments one by one: on the top segment, the function satisfies f(x, 3) = 12x² + 27 − 36x, so the maximum value for x ∈ [−3, 3] is 243. On the bottom boundary, we have f(x, −3) = 12x² − 27 + 36x, so its maximum value is 189. Similarly, the maximum values on the left and right boundaries, which are the maxima of the functions f(−3, y) and f(3, y), are 243 and 189 respectively. Therefore, the global maximum on [−3, 3] × [−3, 3] is 243, which is attained at the point (−3, 3). Using a similar analysis, the global minimum is either the interior local minimum f(1, 2) = −4 or the minimum on the boundary, which is −54 at the point (−3/2, −3). Thus, the global minimum is −54, attained at the latter point, as its value is smaller.

Figure 25: The level sets of the function f(x, y) = 12x² + y³ − 12xy and its critical points.
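Both the classification and the global extrema in Example 5.37 can be cross-checked with a short script; the second-derivative data come straight from the computations above, while the grid search is a crude sketch whose resolution is an arbitrary choice:

```python
def f(x, y):
    return 12 * x * x + y**3 - 12 * x * y

def classify(x, y):
    # second partial derivative test with fxx = 24, fxy = -12, fyy = 6y
    fxx, fxy, fyy = 24.0, -12.0, 6.0 * y
    D = fxx * fyy - fxy * fxy
    if D > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    return "saddle point" if D < 0 else "inconclusive"

print(classify(0, 0))   # saddle point
print(classify(1, 2))   # local minimum

# brute-force search for the global extrema over [-3, 3] x [-3, 3]
grid = [-3 + 6 * i / 200 for i in range(201)]
vals = [(f(x, y), (x, y)) for x in grid for y in grid]
print(max(vals))        # (243.0, (-3.0, 3.0))
print(min(vals))        # (-54.0, (-1.5, -3.0))
```

The grid search recovers the global maximum 243 at (−3, 3) and the global minimum −54 at (−3/2, −3), matching the boundary analysis.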

Exercise 5.38. Find and classify all the critical points of the function f : R² → R defined by f(x, y) = x(x + y)(y + y²).

5.5 Lagrange Multipliers

The Lagrange multiplier is a tool used in the study of optimisation with constraints. Recall the previous section, where we optimised (that is, found the extremum points of) a function f(x, y). If we include a constraint g(x, y) = c, then the problem becomes significantly more complicated, since the extremum points must also satisfy the constraint equation. Therefore the global extrema of this constrained problem might be very different from the global extrema of the unconstrained problem! As an example, suppose that we want to find the global extremum points of the function f(x, y) = x + y subject to the constraint g(x, y) = x² + y² − 1 = 0, that is, we want to find the extrema of f(x, y) on the circle x² + y² = 1. We note that the function f(x, y) is unbounded on R², so it does not attain its extrema there. However, on the circle x² + y² = 1, the function f(x, y) attains its global maximum and global minimum somewhere, according to the Extreme Value Theorem. How do we actually find the solution to the constrained problem? There is a theorem by Lagrange that describes such extremum points:

Theorem 5.39 (Lagrange's theorem). Let f and g be smooth functions from R³ (or R²) to R. Suppose that the function f has an extremum point at p ∈ R³ (or R²) subject to the constraint g(p) = c. If ∇g(p) ≠ 0, then we must have ∇f(p) = λ∇g(p) for some λ ∈ R.

Proof. Suppose that the extremum point of f(x, y) subject to the constraint g(x, y) = c is p.

Then at this point p we must have g(p) = c. Define a curve in the c-level set of g as u : I → Lc(g) with 0 ∈ I such that u(0) = p. Then we have the function F = f ◦ u : I → R. This construction ensures that F has an extremum at t = 0. Thus, by differentiating with respect to t at t = 0 and using the chain rule, we have:

0 = dF/dt (0) = d/dt (f ◦ u)(0) = ∇f(p) · u′(0).

This implies that ∇f(p) is perpendicular to the vector u′(0). Since we have constructed an arbitrary path u(t), the vector u′(0) is an arbitrary tangent vector to Lc(g) at p. Hence

∇f(p) is perpendicular to the tangent plane/line of Lc(g) at p. Since ∇g(p) is also perpendicular to the tangent plane/line of Lc(g) at p, we must have that ∇g(p) and ∇f(p) are parallel to each other. This gives us the result.

With this extra knowledge, we want to find the extremum point p of the function f(x, y, z) subject to g(x, y, z) = c. Lagrange's theorem gives us a necessary condition for such an extremum point, namely that the point p satisfies ∇f(p) = λ∇g(p). We also require that g(p) = c. Thus, to find the candidates for the extremum points, we solve the two equations simultaneously:

∇f(p) = λ∇g(p) and g(p) = c. (13)

However, these equations can succinctly be written as an optimisation problem for a single function. We define an auxiliary function:

L : R⁴ → R,   (x, y, z, λ) ↦ f(x, y, z) − λ(g(x, y, z) − c),

and thus the two equations in (13) can be expressed as the critical point condition for the function L with respect to all four variables (x, y, z, λ). Namely, problem (13) is equivalent to the problem:

∇(x,y,z,λ)L = 0.

Indeed, we have ∇(x,y,z,λ)L = 0 if and only if the four equations ∂xf − λ∂xg = 0, ∂yf − λ∂yg = 0, ∂zf − λ∂zg = 0, and g = c (which form the equations (13)) are satisfied. The variable λ here is called the Lagrange multiplier and the function L is called the Lagrangian.

Example 5.40. We go back to the problem posed at the beginning of this section, where we want to find the global extrema of the function f(x, y) = x + y subject to the constraint x² + y² = 1. We define g(x, y) = x² + y² − 1 and thus the auxiliary function is given by:

L(x, y, λ) = f(x, y) + λg(x, y) = x + y + λ(x² + y² − 1).

By solving ∇(x,y,λ)L = 0, we find the critical points of the auxiliary function L to be (1/√2, 1/√2, −1/√2) and (−1/√2, −1/√2, 1/√2). The first two coordinates of these points give us the values f(1/√2, 1/√2) = √2 and f(−1/√2, −1/√2) = −√2 respectively. The first is the global maximum whilst the second is the global minimum of the problem.
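A brute-force check of this example: parametrise the constraint circle as (cos t, sin t) and sample f = x + y; the extreme sampled values should approach ±√2 at ±(1/√2, 1/√2). A sketch (the sample count N is an arbitrary choice):

```python
import math

# sample f(x, y) = x + y along the unit circle x = cos t, y = sin t
N = 100000
samples = [(math.cos(t) + math.sin(t), t)
           for t in (2 * math.pi * k / N for k in range(N))]
fmax, tmax = max(samples)
fmin, tmin = min(samples)
print(fmax, math.cos(tmax), math.sin(tmax))  # ≈ sqrt(2), 1/sqrt(2), 1/sqrt(2)
print(fmin)                                  # ≈ -sqrt(2)
```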

Note however that by finding the points such that ∇(x,y,λ)L = 0, we get all the critical points of the problem. Recall that not all critical points are extremum points. Therefore, using this method, we can only easily find the global extrema, by picking the critical points that give the largest and smallest values of the function f. The points which give values of f between these extremes may or may not be local extrema. In order to determine whether they are local extrema, we need to look at the other values of f in a neighbourhood of these critical points. This of course requires more work!

Exercise 5.41. Find the critical points of the function f(x, y) = x²y on the circle of radius √3. Determine which of these critical points are the global extremum points.

Exercise 5.42. Find the dimensions of the box with the largest volume if its surface area is fixed at 24 units².

5.6 Integration over Multiple Variables

Another operation to generalise to higher dimensions is of course integration. Similar to partial derivatives, when we integrate a multivariable function with respect to one of its variables, we do this by treating the other variables as constants. As a result, for indefinite integrals, the constant of integration depends on these fixed variables.

Example 5.43. As an example, recall the function f(x, y, z) = ye^x + yz³ + x cos z in Example 5.10. We have computed its first derivatives:

∂f/∂x = ye^x + cos z,   ∂f/∂y = e^x + z³,   ∂f/∂z = 3yz² − x sin z.

In order to recover the function f from its derivatives, let us integrate the partial derivative ∂xf = ye^x + cos z. By keeping the y and z variables constant, we get:

∫ ye^x + cos z dx = ye^x + x cos z + C1(y, z),

which we can see is not exactly the function f above. Therefore, we integrate the other two derivatives and compare them to deduce the function C1(y, z). We have:

∫ e^x + z³ dy = ye^x + yz³ + C2(x, z),
∫ 3yz² − x sin z dz = yz³ + x cos z + C3(x, y).

Thus, by comparing the three integrals, we deduce that C1(y, z) = yz³ + C for some constant C ∈ R. Thus, we have recovered the function f up to an additive constant.

Since the resulting integral is a function of the remaining variables, we can continue the integration process with respect to any of the remaining variables. This results in a double integral, generally used to find the volume under the graph z = f(x, y). Compare this with the integral in one dimension, which gives the area under the graph y = f(x). Integrals over two variables and three variables are called double and triple integrals respectively.

Definition 5.44 (Double integrals). Let f : R² → R be a function of two variables. Then the integral of the function f over a region U ⊂ R² is denoted as:

∬U f(x) dA,

where dA is the infinitesimal area element of the domain R². In the usual Cartesian coordinates, if we denote these variables as x and y, the area element is dA = dx dy.

Remark 5.45. The area of the region U ⊂ R² can be found by integrating the infinitesimal area element dA over U. In other words, the area of U is the same as the volume under the graph of the constant function f(x) = 1 on the domain U.

Here are some properties of the double integral:

Proposition 5.46 (Properties of double integrals). Let f, g : R² → R be functions of two variables. Suppose that U1 and U2 are regions in R² such that U = U1 ∪ U2 and U1 ∩ U2 = ∅, and let λ ∈ R. Then:

1. ∬U λf(x) dA = λ ∬U f(x) dA,

2. ∬U f(x) ± g(x) dA = ∬U f(x) dA ± ∬U g(x) dA,

3. ∬U f(x) dA ≥ 0 if f(x) ≥ 0 for all x ∈ U,

4. ∬U f(x) dA ≥ ∬U g(x) dA if f(x) ≥ g(x) for all x ∈ U,

5. ∬U f(x) dA = ∬U1 f(x) dA + ∬U2 f(x) dA.

In Definition 5.44, the area element is written as dx dy, which means that the integral is done in the x variable first, followed by the y variable. However, similar to partial derivatives, as long as the integrand is nice enough, the order of integration does not matter, provided the limits in the definite integral are properly set. This result is called the Fubini–Tonelli Theorem. The proper conditions needed to ensure this are an advanced topic in measure and integration theory. But for now, since we are dealing with nice functions, we take this as a given.

Example 5.47. Let us evaluate the integral of the function f(x, y) = xy² over the domain bounded by the curves y = x², y = 0, and x = 2.


(a) Integration with respect to y first.   (b) Integration with respect to x first.

Figure 26: Direction of integration with respect to the variables x and y.

Then, we compute:

∫_0^2 ∫_0^{x²} xy² dy dx = ∫_0^2 (x⁷/3) dx = 32/3   and   ∫_0^4 ∫_{√y}^2 xy² dx dy = ∫_0^4 (2y² − y³/2) dy = 32/3.
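Both iterated integrals can be confirmed with a crude midpoint Riemann sum over the region 0 ≤ x ≤ 2, 0 ≤ y ≤ x² (the grid sizes below are arbitrary choices):

```python
# midpoint Riemann sum for the integral of x*y^2 over
# the region bounded by y = x^2, y = 0 and x = 2
n = m = 400
total = 0.0
for i in range(n):
    x = 2 * (i + 0.5) / n
    ymax = x * x
    for j in range(m):
        y = ymax * (j + 0.5) / m
        total += x * y * y * (2 / n) * (ymax / m)
print(total)  # ≈ 32/3 ≈ 10.667
```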

Exercise 5.48. Compute the volume of the region bounded by the xy-plane, the yz-plane, the xz-plane, and the plane described by the equation 2x + y + 4z = 4.

The trick to evaluating double integrals is to choose the order in which the integrals are easier to evaluate. Other times, it is more convenient to choose the right coordinate system, either to simplify the integrand or to simplify the region. Recall from Definition 3.4 that we can derive a polar coordinate system by the substitution of variables x = r cos θ and y = r sin θ. When we change variables, the infinitesimal area element dA might look different in different coordinate systems. For example, in the Cartesian coordinate system, the area element dA is dx dy.

But in the polar coordinate system, this is different. The way to compute this is by using exterior algebra. As we have mentioned before in Remark 5.18, the quantities dx and dy are called differential forms (or one-forms) and they are dual vectors to the tangent space. They form a vector space themselves and can be multiplied together by the wedge product ∧, resulting in a two-form. The wedge product is defined as:

Definition 5.49 (Wedge product). Let V be a vector space. Let u, v, w ∈ V and λ ∈ R. Then the wedge product on V is a map V × V → V ∧ V such that:

1. (λu) ∧ v = λ(u ∧ v) = u ∧ (λv),

2. u ∧ (v + w) = u ∧ v + u ∧ w,

3. u ∧ v = −v ∧ u. In particular, u ∧ u = 0.

The infinitesimal area element dA is actually the two-form defined by the wedge product of differential forms |dx ∧ dy| written succinctly as dx dy, and it is called the area form. If we were to change to polar coordinates, by using the relations x = r cos θ and y = r sin θ, we compute the differentials:

dx = cos θ dr − r sin θ dθ   and   dy = sin θ dr + r cos θ dθ,

and therefore, substituting these in the area form and using the properties of wedge products, we get:

dA = |dx ∧ dy| = |(cos θ dr − r sin θ dθ) ∧ (sin θ dr + r cos θ dθ)|
= |cos θ sin θ dr ∧ dr + r cos² θ dr ∧ dθ − r sin² θ dθ ∧ dr − r² sin θ cos θ dθ ∧ dθ|
= r|(cos² θ + sin² θ) dr ∧ dθ| = r|dr ∧ dθ| = r dr dθ,

where the dr ∧ dr and dθ ∧ dθ terms vanish by property 3 of the wedge product.

This gives us the following proposition:

Proposition 5.50 (Change of variables from Cartesian to polar coordinates). Let f : R² → R be a function of two variables. Then the integral of the function f over a region U ⊂ R² can be written in coordinates as:

∬_U f(x) dA = ∬_U f(x, y) dx dy = ∬_U f(r cos θ, r sin θ) r dr dθ.

Example 5.51. Suppose that we want to find the volume of the hemisphere of radius 2, that is, the volume of the region bounded by the sphere x² + y² + z² = 4 and the xy-plane. We have to evaluate the integral:

∬_{B₂(0)} √(4 − x² − y²) dx dy,

where B₂(0) is the disc of radius 2 centred at the origin in the xy-plane. The computation using this formulation can be very messy. Since the region and the function f(x, y) = √(4 − x² − y²) are rotationally symmetric, it is easier to carry out a change of coordinates (x, y) → (r, θ) to polar coordinates, since both the region and the function are then independent of the variable θ. We then have:

∬_{B₂(0)} √(4 − x² − y²) dx dy = ∬_{B₂(0)} r√(4 − r²) dr dθ = ∫₀^{2π} ∫₀² r√(4 − r²) dr dθ = 2π ∫₀² r√(4 − r²) dr = 16π/3.

Exercise 5.52. Find the volume inside the paraboloid z = x² + y² for z ∈ [0, 1].
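A quick numerical check of Example 5.51 (an illustration, not part of the notes): approximating the remaining radial integral with a midpoint rule recovers 16π/3.

```python
# Numerical check of Example 5.51: the volume under z = sqrt(4 - x^2 - y^2)
# over the disc of radius 2, computed in polar coordinates with dA = r dr dtheta.
import math

def midpoint(f, a, b, n=4000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# the theta integral just contributes a factor of 2*pi
volume = 2 * math.pi * midpoint(lambda r: r * math.sqrt(4 - r**2), 0.0, 2.0)
print(volume, 16 * math.pi / 3)
```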

In fact, Proposition 5.50 can be generalised for any change of variables formula:

Proposition 5.53 (Change of variables formula). Let f : R² → R be a function of two variables. Suppose that we have a change of variables (x, y) → (u, v) in a region U ⊂ R² given by x = x(u, v) and y = y(u, v). Define the Jacobian matrix as the 2 × 2 matrix given by:

J = ∂(x, y)/∂(u, v) = ( ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v ).

Then the integral of the function f over a region U ⊂ R² can be written in coordinates as:

∬_U f(x) dA = ∬_U f(x, y) dx dy = ∬_U f(x(u, v), y(u, v)) |det(J)| du dv.

Exercise 5.54. Prove Proposition 5.53 by generalising the change of variables procedure from Cartesian coordinates to polar coordinates above.
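As a quick illustration of Proposition 5.53 (not in the original notes), we can compute the Jacobian determinant of the polar substitution by central finite differences and confirm that |det(J)| = r, in agreement with the wedge-product derivation above. The sample point (1.7, 0.6) is chosen arbitrarily.

```python
# Sketch: finite-difference Jacobian determinant of x = r cos(theta),
# y = r sin(theta), which should equal r at every point.
import math

def jacobian_det(x, y, u, v, h=1e-6):
    """Numerical determinant of d(x, y)/d(u, v) at the point (u, v)."""
    xu = (x(u + h, v) - x(u - h, v)) / (2 * h)
    xv = (x(u, v + h) - x(u, v - h)) / (2 * h)
    yu = (y(u + h, v) - y(u - h, v)) / (2 * h)
    yv = (y(u, v + h) - y(u, v - h)) / (2 * h)
    return xu * yv - xv * yu

x = lambda r, t: r * math.cos(t)
y = lambda r, t: r * math.sin(t)

r0, t0 = 1.7, 0.6
print(abs(jacobian_det(x, y, r0, t0)), r0)  # both approximately 1.7
```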

Exercise 5.55. Suppose that the region U ⊂ R² is bounded by the curves y = x, y = 1/x, and y = 3x/(1 + x²). Sketch these curves in the xy-plane and mark the region U. We carry out the change of variables (x, y) → (u, v) defined by u = y/x and v = xy. Sketch the region U is mapped to under this change of coordinates in the uv-plane. Hence or otherwise, evaluate the following integral:

∬_U (y²/x²) dx dy.

Exercise 5.56. Let A be the area of the region in the first quadrant bounded by the line y = x/2, the x-axis, and the ellipse γ described by the equation x²/9 + y² = 1. Find the positive number m such that A is equal to the area of the region in the first quadrant bounded by the line y = mx, the y-axis, and the ellipse γ. Hint: use a suitable change of coordinates to turn the ellipse into some other object that is easier to integrate over.

In Remark 5.45, we have seen how one can compute the area of a planar region U ⊂ R² using double integrals. However, one might also be interested in computing the area of a curved surface, for example the surface area of a sphere or a cylinder. Recall that we have computed the length of a curve by adding up infinitesimal lengths along the curve, which were given by the length |v′(t)| of the parametrisation v(t) of the curve. The infinitesimal area element of a surface Σ, denoted dS, can be described in a similar manner. Recall the two tangent vectors to a graph G(f) of a function f : R² → R at the point q = (x₀, y₀, f(x₀, y₀)), which were given as the tangent vectors to the curves u₁(x, y₀) and u₂(x₀, y).

Figure 27: The graph G(f) over U with two tangent vectors ∂ₓu₁ and ∂_yu₂ at the point q.

These two tangent vectors describe the infinitesimal lengths of the sides of a parallelogram on the graph at q = (p, f(p)) = (x₀, y₀, f(x₀, y₀)). Therefore, the infinitesimal area element dS at the point q is given by the magnitude of their cross product:

dS(q) = |∂ₓu₁ × ∂_yu₂|(p) dx dy = |(1, 0, ∂ₓf) × (0, 1, ∂_yf)|(p) dx dy
= |(−∂ₓf, −∂_yf, 1)|(p) dx dy
= √(∂ₓf(p)² + ∂_yf(p)² + 1) dx dy.

Thus, the surface area of the graph G(f) is obtained by adding up all the infinitesimal area elements over all the points x in the domain of the graph.

Definition 5.57 (Surface area of graph). Let f : U → R be a real valued smooth function from a subset U ⊂ R² and let Σ be the surface defined by the graph G(f) of the function f over U. The area of the surface Σ is given by the integral:

∬_Σ dS = ∬_U √(∂ₓf(x)² + ∂_yf(x)² + 1) dx dy.

Exercise 5.58. Show that the surface area of a sphere of radius r is given by 4πr².
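To illustrate Definition 5.57 numerically (the example surface is chosen here, it is not from the notes), take the paraboloid f(x, y) = x² + y² over the unit disc. In polar coordinates the integrand √(∂ₓf² + ∂_yf² + 1) becomes √(4r² + 1), and the exact area is (π/6)(5√5 − 1):

```python
# Area of the graph of f(x, y) = x^2 + y^2 over the unit disc via Definition 5.57,
# computed in polar coordinates with dA = r dr dtheta.
import math

def midpoint(f, a, b, n=4000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

area = 2 * math.pi * midpoint(lambda r: r * math.sqrt(4 * r**2 + 1), 0.0, 1.0)
exact = (math.pi / 6) * (5 * math.sqrt(5) - 1)
print(area, exact)  # the two values agree
```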

By extrapolating from the double integral, we can also define the triple integrals in the same manner:

Definition 5.59 (Triple integrals). Let f : R³ → R be a function of three variables. Then the integral of the function f over a region U ⊂ R³ is denoted as:

∭_U f(x) dV,

where dV is the infinitesimal volume element of the domain R³. In the usual Cartesian coordinates, if we denote these variables as x, y, and z, the volume element is dV = dx dy dz. One can also compute the change of variables formula using the wedge product of the differential forms dV = |dx ∧ dy ∧ dz|. This is called a volume form. Thus we have a change of variables formula analogous to the 2-dimensional case in Proposition 5.53:

Proposition 5.60 (Change of variables formula). Let f : R³ → R be a function of three variables. Suppose that we have a change of variables (x, y, z) → (u, v, w) in a region U ⊂ R³ given by x = x(u, v, w), y = y(u, v, w), and z = z(u, v, w). Define the Jacobian matrix as the 3 × 3 matrix given by:

J = ∂(x, y, z)/∂(u, v, w) = ( ∂x/∂u  ∂x/∂v  ∂x/∂w ; ∂y/∂u  ∂y/∂v  ∂y/∂w ; ∂z/∂u  ∂z/∂v  ∂z/∂w ).

Then the integral of the function f over a region U ⊂ R³ can be written in coordinates as:

∭_U f(x) dV = ∭_U f(x, y, z) dx dy dz = ∭_U f(x(u, v, w), y(u, v, w), z(u, v, w)) |det(J)| du dv dw.

Example 5.61. Finally, let us look at an application of triple integrals. We define the density of a material in R³ to be a function ρ : R³ → R. Suppose we have a ball of unit radius B₁(0) such that its density increases radially, given by ρ(x) = |x|. We can compute its mass by integrating the density over the ball, that is, we integrate the weighted volume element ρ(x) dV. Therefore its mass is:

∭_{B₁(0)} ρ(x) dV.

Finding a closed form for the limits of this integral can be messy. If we integrate this using Cartesian coordinates, we have to evaluate:

∫_{−1}^{1} ∫_{−√(1−z²)}^{√(1−z²)} ∫_{−√(1−y²−z²)}^{√(1−y²−z²)} √(x² + y² + z²) dx dy dz,

which does not look very nice. However, our domain of integration is spherically symmetric, so it is more convenient to use the spherical coordinates x = r sin φ cos θ, y = r sin φ sin θ, and z = r cos φ. We compute the volume element dV in spherical coordinates. We can either use the wedge product derivation or the Jacobian. Here we compute the Jacobian matrix as:

J = ∂(x, y, z)/∂(r, θ, φ) = ( sin φ cos θ  −r sin φ sin θ  r cos φ cos θ ; sin φ sin θ  r sin φ cos θ  r cos φ sin θ ; cos φ  0  −r sin φ ),

and thus |det(J)| = r² sin φ. Therefore:

∭_{B₁(0)} ρ(x) dV = ∫₀^π ∫₀^{2π} ∫₀¹ r³ sin φ dr dθ dφ = 2π ∫₀^π sin φ dφ ∫₀¹ r³ dr = π.

Exercise 5.62. Show that the volume of a cone with base of radius r and height h is given by V = πr²h/3.
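A numerical check of the mass computed in Example 5.61 (not part of the original notes): since the integrand r³ sin φ separates, the triple integral factors into one-dimensional integrals, each of which we approximate with a midpoint rule.

```python
# Mass of the unit ball with density rho = |x|, in spherical coordinates:
# the integral of r^3 * sin(phi) over r in [0,1], theta in [0,2*pi], phi in [0,pi].
import math

def midpoint(f, a, b, n=200):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# the theta integral contributes a factor of 2*pi
mass = 2 * math.pi * midpoint(math.sin, 0.0, math.pi) * midpoint(lambda r: r**3, 0.0, 1.0)
print(mass, math.pi)
```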

6 Vector Fields

We have seen in the previous chapters vector-valued functions and multivariable functions. Now we are going to look at vector-valued functions defined on a vector space.

Definition 6.1 (Vector field). A vector field over a region U ⊂ R³ is a map F : U → R³. If we write the coordinates in U as x = (x₁, x₂, x₃) and the coordinates in the codomain as v = (v₁, v₂, v₃), the map is described explicitly as:

F((x₁, x₂, x₃)) = (v₁(x₁, x₂, x₃), v₂(x₁, x₂, x₃), v₃(x₁, x₂, x₃)).

Essentially a vector field is constructed by attaching a vector v(x) ∈ R³ to every point x ∈ U. One can think of this vector v(x) as the direction a particle would be moving when it is at a point x ∈ U. This definition also holds if the vector space R³ is replaced with R² everywhere above. Vector fields have found many applications in science, such as physics, biology, game theory, and meteorology. For example, meteorologists study vector fields over the surface of the earth in the form of wind velocities at points on the earth; the differences in pressure between points on the map give rise to these vector fields. Physicists study physical fields like gravitational, electric, and magnetic fields.

Figure 28: Examples of vector fields: (a) F(x, y) = (y, −x); (b) G(x, y) = (−x/√(x² + y²), −y/√(x² + y²)).

Example 6.2. An example of a vector field over U ⊂ R³ is the gradient of a scalar function f : U → R. The vector field is obtained by assigning ∇f(x) to the point x ∈ U.

Conversely, we can ask when a given vector field arises as the gradient of a scalar function:

Definition 6.3 (Conservative vector field). A vector field F : U → R³ is called conservative if there exists a scalar function f : U → R such that ∇f = F. The function f is called the potential of the vector field F.

Example 6.4. The gravitational force exerted on a particle of unit mass at the point x by a particle of mass M at the origin 0 in 3-dimensional space is given by Newton's law of universal gravitation:

F(x) = −GM x/|x|³,

where G is a constant called the gravitational constant. This vector field is a conservative vector field, since there exists a scalar function f : R³ → R such that ∇f = F. To construct the scalar potential function, we first assume that there exists such a function f(x, y, z). Then we solve the three equations:

∂f/∂x = −GM x/(x² + y² + z²)^{3/2},
∂f/∂y = −GM y/(x² + y² + z²)^{3/2},
∂f/∂z = −GM z/(x² + y² + z²)^{3/2}.

Integrating the first one, we get f(x, y, z) = GM/√(x² + y² + z²) + f₁(y, z). Similarly, the second and the third give us f(x, y, z) = GM/√(x² + y² + z²) + f₂(x, z) and f(x, y, z) = GM/√(x² + y² + z²) + f₃(x, y). Comparing the three equations, we get f₁ = f₂ = f₃ = C where C is some constant, and thus the potential function of the vector field F is f(x, y, z) = GM/√(x² + y² + z²) + C.

However, how do we know that a given vector field F is conservative? An easy way to check this is via Clairaut's theorem. Recall that partial derivatives commute in R³. Therefore, we have the following test:

Proposition 6.5 (Conservative vector field test). A vector field F : U → R³, defined on a simply connected region U, is conservative if and only if:

∂F₁/∂y = ∂F₂/∂x,   ∂F₁/∂z = ∂F₃/∂x,   and   ∂F₂/∂z = ∂F₃/∂y.

Remark 6.6. In two dimensions (that is, if U ⊂ R²), the conservative vector field test is much simpler: a vector field F : U → R² is conservative if and only if:

∂F₁/∂y = ∂F₂/∂x.
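Returning to Example 6.4, we can confirm numerically (an illustration, not part of the notes) that the potential we constructed really satisfies ∇f = F, by comparing a central-difference gradient of f against F at a sample point. The value GM = 1 and the point are arbitrary choices for the check.

```python
# Finite-difference check that f = GM / |x| is a potential for F = -GM x / |x|^3.
import math

GM = 1.0  # illustrative value of the constant G*M

def f(x, y, z):
    return GM / math.sqrt(x * x + y * y + z * z)

def F(x, y, z):
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-GM * x / r3, -GM * y / r3, -GM * z / r3)

def grad(g, x, y, z, h=1e-6):
    """Central-difference gradient of the scalar function g."""
    return ((g(x + h, y, z) - g(x - h, y, z)) / (2 * h),
            (g(x, y + h, z) - g(x, y - h, z)) / (2 * h),
            (g(x, y, z + h) - g(x, y, z - h)) / (2 * h))

p = (1.0, 2.0, -0.5)
print(grad(f, *p))
print(F(*p))  # the two triples agree up to discretisation error
```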

6.1 Divergence and Curl

Recall that the gradient is a map from a scalar function to a vector field. We also have a map from a vector field to the scalars called the divergence:

Definition 6.7 (Divergence). Let F : U → R³ be a vector field in U ⊂ R³. Then the divergence of F at a point p ∈ U is the function:

∇ · F(p) = ∂F₁/∂x₁(p) + ∂F₂/∂x₂(p) + ∂F₃/∂x₃(p).

Again, if we think of the operator ∇ = (∂x, ∂y, ∂z), the divergence can be thought of as the formal dot product ∇ · F = (∂x, ∂y, ∂z) · (F₁, F₂, F₃) = ∂xF₁ + ∂yF₂ + ∂zF₃. Physically, the divergence of a vector field F at the point p ∈ U measures the volume density of the outward flux of the vector field in an infinitesimal region around the point p. If the divergence is positive, the point p is a source of the vector field F; if it is negative, the point p is called a sink of F. If ∇ · F ≡ 0, then the vector field F is called divergence-free. In fluid mechanics, a fluid is incompressible if its velocity field is divergence-free. Likewise, in electromagnetism, an electric or magnetic field is called solenoidal if it is divergence-free. Another vector operation that is used frequently is the curl of a vector field. It is defined as such:

Definition 6.8 (Curl). Let F : U → R³ be a vector field in U ⊂ R³. Then the curl of F at a point p ∈ U is the vector:

∇ × F(p) = ( ∂F₃/∂x₂ − ∂F₂/∂x₃, ∂F₁/∂x₃ − ∂F₃/∂x₁, ∂F₂/∂x₁ − ∂F₁/∂x₂ )(p).

Once more, if we think of the operator ∇ = (∂x, ∂y, ∂z), the curl can be thought of as the formal cross product ∇ × F. Thus the curl of a vector field F is the formal determinant of the following matrix:

∇ × F = det ( e₁ e₂ e₃ ; ∂x ∂y ∂z ; F₁ F₂ F₃ ) = (∂yF₃ − ∂zF₂)e₁ − (∂xF₃ − ∂zF₁)e₂ + (∂xF₂ − ∂yF₁)e₃.

Like the cross product, the curl of a vector field is only defined for vector fields in R³. The curl of a vector field F at the point p ∈ U measures the circulation density of the vector field along an infinitesimal curve enclosing p. If ∇ × F ≡ 0, then the vector field F is called irrotational.

Proposition 6.9 (Vanishing properties). Suppose that f : U → R is a map from an open set U ⊂ R³ to R and F : U → R³ is a vector field on U. Then:

1. ∇ × (∇f) = 0,

54 2. ∇ · (∇ × F) = 0.

Proof. For the first assertion, let us compute the first component of this vector:

(∇ × (∇f))₁ = ∂y(∇f)₃ − ∂z(∇f)₂ = ∂y∂zf − ∂z∂yf = 0,

since partial derivatives commute. The other components are computed in a similar manner, and hence the result. The second result is purely computational:

∇ · (∇ × F) = ∂x(∂yF3 − ∂zF2) − ∂y(∂xF3 − ∂zF1) + ∂z(∂xF2 − ∂yF1) = 0, since partial derivatives commute.
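The first vanishing identity can also be checked numerically (an illustration, not from the notes) by approximating both the gradient and the curl with central differences; the sample function and evaluation point below are arbitrary choices.

```python
# Finite-difference sanity check that curl(grad f) = 0,
# here for the sample function f(x, y, z) = x*y^2 + sin(z).
import math

h = 1e-5
f = lambda x, y, z: x * y**2 + math.sin(z)

def grad(g, x, y, z):
    """Central-difference gradient of the scalar function g."""
    return ((g(x + h, y, z) - g(x - h, y, z)) / (2 * h),
            (g(x, y + h, z) - g(x, y - h, z)) / (2 * h),
            (g(x, y, z + h) - g(x, y, z - h)) / (2 * h))

def curl(V, x, y, z):
    """Central-difference curl of the vector field V at (x, y, z)."""
    dVz_dy = (V(x, y + h, z)[2] - V(x, y - h, z)[2]) / (2 * h)
    dVy_dz = (V(x, y, z + h)[1] - V(x, y, z - h)[1]) / (2 * h)
    dVx_dz = (V(x, y, z + h)[0] - V(x, y, z - h)[0]) / (2 * h)
    dVz_dx = (V(x + h, y, z)[2] - V(x - h, y, z)[2]) / (2 * h)
    dVy_dx = (V(x + h, y, z)[1] - V(x - h, y, z)[1]) / (2 * h)
    dVx_dy = (V(x, y + h, z)[0] - V(x, y - h, z)[0]) / (2 * h)
    return (dVz_dy - dVy_dz, dVx_dz - dVz_dx, dVy_dx - dVx_dy)

G = lambda x, y, z: grad(f, x, y, z)
print(curl(G, 0.3, -0.7, 1.1))  # each component is approximately zero
```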

From the proposition above, we can simplify Proposition 6.5 to: a vector field F is conservative if and only if ∇ × F = 0. Hence irrotational vector fields are also called conservative. Here are some properties of divergence and curl:

Proposition 6.10 (Properties of divergence and curl). Suppose that f, g : U → R are maps from an open set U ⊂ R³ to R and F, G : U → R³ are vector fields on U. Then:

1. ∇ · (F + G) = ∇ · F + ∇ · G,

2. ∇ · (fF) = (∇f) · F + f∇ · F,

3. ∇ · (F × G) = (∇ × F) · G − (∇ × G) · F,

4. ∇ × (F + G) = ∇ × F + ∇ × G,

5. ∇ × (fF) = (∇f) × F + f∇ × F,

6. ∇ × (f∇g) = ∇f × ∇g,

7. ∇ × (F × G) = F(∇ · G) − G(∇ · F) + (G · ∇)F − (F · ∇)G, where F · ∇ is the operator F1∂x + F2∂y + F3∂z.

Proposition 6.11 (Gradient of dot products). Suppose that F, G : U → R³ are vector fields on U ⊂ R³. Then:

∇(F · G) = (F · ∇)G + (G · ∇)F + F × (∇ × G) + G × (∇ × F).

The proofs of the two propositions above are simply computations.

Definition 6.12 (Laplacian of a function). Suppose that f : U → R is a map from an open set U ⊂ R³ to R. Then we define the Laplacian ∆f of the function f as:

∆f = ∇ · ∇f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z².

The Laplacian is a very important object in the study of applied mathematics, mathematical modelling, and partial differential equations. Aside from giving rise to the simplest second-order partial differential equation (Laplace's equation ∆f = 0), it appears in many physical processes like the heat equation, the reaction-diffusion equation, and the wave equation. Here are some of its identities:

Proposition 6.13 (Laplacian identities). Suppose that f, g : U → R are maps from an open set U ⊂ R³ to R. Then:

1. ∇ · (f∇g) = f∆g + ∇f · ∇g,

2. f∆g − g∆f = ∇ · (f∇g − g∇f),

3. ∆(fg) = f∆g + 2∇f · ∇g + g∆f.

These identities are helpful when we are integrating later on.

Exercise 6.14. Prove all the assertions in Proposition 6.13.

7 Line and Surface Integrals

7.1 Line Integral

In one dimension, we have seen the integral of functions f : R → R over some interval I ⊂ R. The domain I = [a, b] of the integral can also be thought of as a path in the domain, parametrised by the variable x ∈ [a, b] in a trivial way. The integral of the function f over the interval I is the "sum" of the values of f(x) over this interval. This integral can also be thought of as the weighted length of the path I with weight f(x) at each x ∈ I. In this chapter we are going to define integration over a path in higher dimensions. This integral will be called the path or line integral.

Suppose now we are given a regular curve γ ⊂ R² and a function f : R² → R. We wish to integrate the function f along this curve γ, that is, we wish to find the weighted length of this path with the weight given by f(x). Therefore the line integral over the path γ is defined as:

∫_γ f(x) ds,   (14)

where ds is the infinitesimal arclength of the path γ. We now recall Definition 4.17. The arclength function of a curve γ ⊂ R², denoted s : [a, b] → [0, L], is defined as:

s(t) = ∫ₐᵗ √(x′(u)² + y′(u)²) du,

and therefore, by differentiating with respect to the variable t and using the Fundamental Theorem of Calculus, we get:

ds/dt = √(x′(t)² + y′(t)²).   (15)

Substituting this into the integral above and pulling the integral (14) back to the domain I = [a, b], we get:

∫_I f(x(t), y(t)) ds = ∫ₐᵇ f(x(t), y(t)) √(x′(t)² + y′(t)²) dt,

which is an integral that we know how to evaluate. We define this as the line (or path) integral:

Definition 7.1 (Line integral of scalar field). For a regular curve γ ⊂ U ⊂ R² which is parametrised by v(t) = (x(t), y(t)) for t ∈ [a, b] and a function f : U → R, we define the line integral of the function f over γ as the integral:

∫_γ f(x) ds = ∫ₐᵇ f(x(t), y(t)) |v′(t)| dt.

This definition is well defined, as there is only one arclength parametrisation of the curve, and if we switch to a different parametrisation of the curve, the integral gives the same value. Therefore the integral does not depend on the parametrisation of the curve, so long as it is defined properly. You can prove this in the exercise below:

Exercise 7.2. By adapting the proof of Proposition 4.20, prove that the line integral of a scalar field is independent of the parametrisation, that is, given any two different parametrisations u : [a, b] → R² and v : [c, d] → R² of the curve γ, prove that:

∫ₐᵇ f(u(t)) |u′(t)| dt = ∫_c^d f(v(t)) |v′(t)| dt.

Remark 7.3. The path integral above extends naturally to paths in R³ and f : R³ → R. However, the arclength element is modified to ds = √(x′(t)² + y′(t)² + z′(t)²) dt to take into account the third dimension.

Now suppose that F : U → R² is a vector field over U ⊂ R². We can also define the line integral of this vector field along a path in R². This is defined as follows:

Definition 7.4 (Line integral of vector field). For a regular curve γ ⊂ U ⊂ R² which is parametrised by v(t) = (x(t), y(t)) for t ∈ [a, b] and a vector field F : U → R², we define the line integral of the vector field F over γ as the integral:

∫_γ F(x) · dx = ∫ₐᵇ F(x(t), y(t)) · v′(t) dt.

Remark 7.5. Sometimes the line integral over vector fields can also be described as a line integral of a set of scalar functions. Indeed, suppose that the path γ is parametrised as v(t) = (x(t), y(t)). Then we have:

∫_γ F(x) · dx = ∫_γ (F₁(x), F₂(x)) · (x′(t), y′(t)) dt = ∫_γ F₁(x) (dx/dt) dt + F₂(x) (dy/dt) dt = ∫_γ F₁(x) dx + F₂(x) dy.

This form of the integral is called the line integral in differential form, because we are integrating the differential form F₁(x) dx + F₂(x) dy.

In the definition above, we note that the vector v′(t) is tangential to the curve at v(t) for all t ∈ [a, b]. Therefore, an interpretation of the line integral of a vector field is that we are summing up the components of the vector field tangential to the curve. This can be seen in the following proposition:

Proposition 7.6. For a regular curve γ ⊂ U ⊂ R² which is parametrised by v(t) = (x(t), y(t)) for t ∈ [a, b] and a vector field F : U → R², we have:

∫_γ F(x) · dx = ∫_γ F(x) · T(x) ds,

where T(x) is the unit tangent vector to the curve at the point x ∈ γ and ds is the infinitesimal arclength parameter.

Proof. We recall from Definition 4.8 that the unit tangent vector T(x) of a regular curve parametrised by v(t) is given by T(x) = v′(t)/|v′(t)|. Furthermore, from equation (15) we know that ds/dt = |v′(t)|. Thus:

∫_γ F(x) · dx = ∫ₐᵇ F(x) · v′(t) dt = ∫ₐᵇ F(v(t)) · T(v(t)) |v′(t)| dt = ∫_{s(a)}^{s(b)} F(v(s)) · T(v(s)) ds,

which is what we claimed.

Care must be taken for vector line integrals: they depend on the direction of the parametrisation. If we parametrise the same curve γ with parametrisations u(t) and v(t) with opposite directions, say we have a parametrisation u : [a, b] → R² and we define a reverse parametrisation v : [a, b] → R² such that v(t) = u(b + a − t), then the tangent vectors of the two parametrisations point in opposite directions and hence have opposite signs. Therefore, when we integrate F(x(t), y(t)) · v′(t), we would get the opposite sign to the integral of F(x(t), y(t)) · u′(t). However, similar to the line integral for scalar fields, the line integral for vector fields is independent of the parametrisation of the regular curve γ among parametrisations with the same direction.

Exercise 7.7. By adapting the proof of Proposition 4.20, prove that the line integral of a vector field is independent of the parametrisation with the same direction, that is, given any two different parametrisations u : [a, b] → R² and v : [c, d] → R² of the regular curve γ such that u(a) = v(c) and u(b) = v(d), prove that:

∫ₐᵇ F(u(t)) · u′(t) dt = ∫_c^d F(v(t)) · v′(t) dt.

Therefore, one must only pay attention to the orientation of the parametrisation of the integral. The usual convention is that the parametrisation is done in the anti-clockwise direction.

Vector line integrals have a physical application: the work done by a force F to move an object is given by the component of the force tangential to the path taken. Recall that in one dimension, if we are pulling an object horizontally using a string inclined at an angle θ, we are only doing work in the horizontal direction; thus the work done is W = Fx cos θ, where x is the distance the object is moved.

Figure 29: The work done by the force when moving the object x units is W = Fx cos θ.

This extends to higher dimension: if we are moving an object in a curved path γ parametrised by v(t) for t ∈ [a, b], the instantaneous work done is proportional to the component of the instantaneous force tangential to the curve, which gives us the formula dW = F · dv.

Figure 30: The force vector F at the point v(t) and the tangent vector v′(t). The infinitesimal work dW of this force is done in the direction tangential to the curve of motion.

Summing up over the path from a to b, we get the integral:

W = ∫ₐᵇ F(v(t)) · v′(t) dt = ∫_γ F(x) · dx,

which is the line integral of the vector field we have seen earlier.

Remark 7.8. The definitions and results (scalar and vector line integrals) we have seen so far all extend naturally to R³. You might want to check this by looking at the propositions, proofs, and definitions, and note that the 2-dimensional condition for the vector space is not used anywhere.

Let us now have a further look at an example we have seen earlier:

Example 7.9. Recall the gravitational force in Example 6.4 exerted by an object of mass M at the origin 0:

F(x) = −GM x/|x|³,

where G is a constant called the gravitational constant. Suppose that we are moving an object in a straight line γ joining the points (1, 1) and (3, 3). We can parametrise this straight line as the path v(t) = (t + 1, t + 1) for t ∈ [0, 2]. Therefore, the work done by the gravitational force is:

W = ∫_γ F(x) · dx = ∫_γ −GM (x/|x|³) · dx
= ∫₀² −GM ((t + 1, t + 1)/(2(t + 1)²)^{3/2}) · (d/dt)(t + 1, t + 1) dt
= ∫₀² −GM ((t + 1, t + 1)/(2(t + 1)²)^{3/2}) · (1, 1) dt
= ∫₀² −GM (2(t + 1)/(2(t + 1)²)^{3/2}) dt = −GM√2/3.

For both the scalar and vector line integrals, it is noted that the integral only makes sense for curves which are smooth enough, that is, curves which are differentiable. For piecewise differentiable curves, we would run into the problem of finding a unique tangent vector at the transition points of the curves, and therefore the line integral is not well-defined there. A way to remedy this is to define the integral piecewise. For example, for the line integral of a scalar field, suppose that we have a curve γ ⊂ R³ which is not differentiable at some finite number of points p₁, p₂, ..., pₙ ∈ γ. Then we can parametrise this curve via a piecewise differentiable set of functions, defined by the map v : [a₀, aₙ] → R³ by:

v(t) = v₁(t) for t ∈ [a₀, a₁],  ...,  v(t) = vₙ(t) for t ∈ [aₙ₋₁, aₙ],

such that all the vᵢ are differentiable and v(aᵢ) = pᵢ are the transition points. Then we define the line integral as:

∫_γ f(x) ds = ∫_{a₀}^{a₁} f(v₁(t)) |v₁′(t)| dt + ∫_{a₁}^{a₂} f(v₂(t)) |v₂′(t)| dt + ··· + ∫_{aₙ₋₁}^{aₙ} f(vₙ(t)) |vₙ′(t)| dt,

which is now well defined.

Example 7.10. Recall from Example 7.9 the gravitational force exerted by an object of mass M at the origin 0, and that the work done by it in moving a point mass from (1, 1) to (3, 3) in a straight line γ is −GM√2/3. Suppose now the point mass takes a different path ω to get from (1, 1) to (3, 3), namely it moves to the point (3, 1) in a straight line first and then moves to (3, 3) in another straight line. This path ω is not smooth, as there is a sharp turn at (3, 1). However, it can be expressed as a piecewise smooth path parametrised by the map v : [0, 4] → R² via:

v(t) = (t + 1, 1) for t ∈ [0, 2],  and  v(t) = (3, t − 1) for t ∈ [2, 4].

Figure 31: The paths γ and ω taken by the point mass object from (1, 1) to (3, 3).

The work done by the gravitational force is then given by:

W = ∫_ω F(x) · dx = ∫₀² −GM ((t + 1, 1)/((t + 1)² + 1)^{3/2}) · (1, 0) dt + ∫₂⁴ −GM ((3, t − 1)/(3² + (t − 1)²)^{3/2}) · (0, 1) dt
= −GM ∫₀² (t + 1)/((t + 1)² + 1)^{3/2} dt − GM ∫₂⁴ (t − 1)/(3² + (t − 1)²)^{3/2} dt
= −GM (1/√2 − 1/√10) − GM (1/√10 − 1/(3√2)) = −GM√2/3,

which is the same as the work done if the point mass is moved along γ. In fact, the work done in moving the point mass from (1, 1) to (3, 3) is the same regardless of the path taken. Try this with any path you like joining these two points. In general, this is not true: a path integral depends on the path. However, in our case, this phenomenon occurs because the gravitational vector field F is conservative, that is, ∇f = F for f(x, y) = GM/√(x² + y²) + C for some constant C, as we have shown for the 3-dimensional case in Example 6.4. The path independence property is true for any conservative vector field.
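The computations in Examples 7.9 and 7.10 can be checked numerically (an illustration, not part of the notes) by approximating both line integrals with a midpoint Riemann sum of F(v(t)) · v′(t); the value GM = 1 is an arbitrary choice for the check.

```python
# Work done by F(x) = -GM x / |x|^3 along the straight path gamma and the bent
# path omega from (1, 1) to (3, 3): both equal -GM*sqrt(2)/3 (path independence).
import math

GM = 1.0  # illustrative value of the constant G*M

def F(x, y):
    r3 = (x * x + y * y) ** 1.5
    return (-GM * x / r3, -GM * y / r3)

def work(v, vdot, a, b, n=20000):
    """Midpoint-rule approximation of the line integral of F along v(t), t in [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        Fx, Fy = F(*v(t))
        dx, dy = vdot(t)
        total += (Fx * dx + Fy * dy) * h
    return total

W_gamma = work(lambda t: (t + 1, t + 1), lambda t: (1.0, 1.0), 0, 2)
W_omega = (work(lambda t: (t + 1, 1.0), lambda t: (1.0, 0.0), 0, 2)
           + work(lambda t: (3.0, t - 1), lambda t: (0.0, 1.0), 2, 4))
print(W_gamma, W_omega, -GM * math.sqrt(2) / 3)
```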

Proposition 7.11. Let F : U → R³ be a vector field over a region U ⊂ R³ and γ ⊂ U be a path with endpoints p and q. Suppose that F is conservative, that is, there exists some function f : U → R such that ∇f = F. Then:

∫_γ F(x) · dx = f(q) − f(p).

Proof. We parametrise the regular curve γ via the map v : [a, b] → U such that v(a) = p and v(b) = q. Then:

∫_γ F(x) · dx = ∫ₐᵇ F(v(t)) · v′(t) dt.

However, since ∇f = F, by using the chain rule in Proposition 5.13, we have (d/dt) f(v(t)) = ∇f · v′(t) = F(v(t)) · v′(t), and thus substituting this in the above and using the Fundamental Theorem of Calculus, we have:

∫_γ F(x) · dx = ∫ₐᵇ (d/dt) f(v(t)) dt = f(v(b)) − f(v(a)) = f(q) − f(p),

which concludes the proof.

Finally, we define a loop and state a corollary of the above.

Definition 7.12 (Simple loop). A simple loop or a simple closed curve C ⊂ R³ is a curve that is closed and does not intersect itself, that is, it can be parametrised by a path v : [a, b] → R³ such that v(a) = v(b) and if v(t₁) = v(t₂) for any t₁, t₂ ∈ [a, b), then t₁ = t₂.

If we are integrating over a path that is a closed loop C, we denote the integral as:

∮_C f(x) ds   and   ∮_C F(x) · dx.

A direct corollary of Proposition 7.11 is:

Corollary 7.13. Let F : U → R³ be a vector field over a region U ⊂ R³ and let C ⊂ U be a simple loop in U. Suppose that F is conservative. Then:

∮_C F(x) · dx = 0.

Exercise 7.14. Let ∆ be a triangle in R² with vertices at (0, 0), (4, 1), and (2, 3). Write down a parametrisation for the edges of this triangle. Hence or otherwise, evaluate the following line integrals along the edges ∂∆ (going anti-clockwise) of the triangle:

∮_{∂∆} (2x − 3y + 1) dx − (3x + y − 5) dy,   and   ∮_{∂∆} x² dx + xy dy.

Exercise 7.15. Let ∆ be a triangle in R³ defined by the intersection of the plane 2x + 2y + z = 6, the xy-plane, the yz-plane, and the xz-plane. Sketch this triangle, noting where the vertices are. Write down a parametrisation for the edges of this triangle. Hence, evaluate the following vector line integral along the edges ∂∆ of the triangle:

∮_{∂∆} (−y², z, x) · dx.

7.2 Green’s Theorem

A remarkable theorem in R² that relates the line integral around a closed loop with an integral over the region enclosed by the loop is Green's Theorem. We first define simply connected regions:

Definition 7.16 (Simply connected region). Let U ⊂ R² be a subset of the plane. The region U is called simply connected if any simple loop in U encloses only points in U.

Theorem 7.17 (Green's Theorem). Let U ⊂ R² be a simply connected region in the plane bounded by a piecewise smooth, anti-clockwise oriented simple loop ∂U. Suppose further that F, G : R² → R are functions with continuous derivatives in U and on ∂U. Then:

∮_{∂U} F(x) dx + G(x) dy = ∬_U (∂G/∂x − ∂F/∂y) dx dy.   (16)

Proof. Let us begin with a simple case in which any straight line parallel to the x- or y-axis crosses the loop ∂U at most twice. Then, for each axis, we can find exactly two lines parallel to that axis which touch the loop ∂U without crossing it. Without loss of generality, let us assume the lines parallel to the y-axis have x-coordinates a and b.

Figure 32: Region of integration in Green's Theorem: the loop ∂U encloses U, with the lower part of the loop given by the graph of f(x) and the upper part by the graph of g(x) for x ∈ [a, b].

Since every point strictly between these two lines corresponds to two points on the loop ∂U, we can denote the lower part of the loop as the graph of a function f(x) and the upper part of the loop as the graph of a function g(x). We compute:

∬_U (∂F/∂y)(x) dA = ∫ₐᵇ ∫_{f(x)}^{g(x)} (∂F/∂y)(x) dy dx = ∫ₐᵇ F(x, g(x)) − F(x, f(x)) dx
= −( ∫_b^a F(x, g(x)) dx + ∫ₐᵇ F(x, f(x)) dx ) = −∮_{∂U} F(x) dx.

Similarly, if we repeat the procedure with lines parallel to the x-axis, we would get:

∬_U (∂G/∂x) dx dy = ∮_{∂U} G(x) dy.

Combining the two integrals, we get the equation (16) for these special cases. For the general case, we split the simply connected domain into smaller domains which satisfy the special case and carry out the procedure above on each piece.

Figure 33: Division of the region U into regions satisfying the special cases.

When we add the resulting integrals together, we note that the line integrals on the introduced curves not originally in $\partial U$ will cancel each other, since the integrals would have opposite orientations.
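Green's Theorem is easy to check numerically for a concrete pair of functions. The following sketch (a hypothetical check with made-up helper names, not part of the original text) takes $F = -y^3/3$ and $G = x^3/3$ on the unit disc, so that $\partial_x G - \partial_y F = x^2 + y^2$, and compares the two sides; each should come out to $\pi/2$.

```python
import math

# Numerical check of Green's Theorem on the unit disc with
# F = -y^3/3 and G = x^3/3, so that dG/dx - dF/dy = x^2 + y^2.
# Both sides should come out to pi/2.

def F(x, y):
    return -y**3 / 3

def G(x, y):
    return x**3 / 3

def boundary_integral(n=4000):
    # anti-clockwise unit circle, midpoint rule in the parameter t
    total, dt = 0.0, 2 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dt
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t), math.cos(t)  # derivative of the parametrisation
        total += (F(x, y) * dx + G(x, y) * dy) * dt
    return total

def area_integral(n=1000):
    # integrate x^2 + y^2 = r^2 over the unit disc in polar coordinates;
    # the integrand is independent of theta, giving a factor of 2*pi
    total, dr = 0.0, 1.0 / n
    for i in range(n):
        r = (i + 0.5) * dr
        total += r**2 * r * dr * (2 * math.pi)
    return total

print(boundary_integral(), area_integral())  # both approximately pi/2
```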

If we set $F(\mathbf{x}) = -\frac{y}{2}$ and $G(\mathbf{x}) = \frac{x}{2}$, then Green's Theorem gives us a formula for computing the area of the region $U$, since:
$$\frac{1}{2}\oint_{\partial U} -y\, dx + x\, dy = \iint_U \frac{1}{2} + \frac{1}{2}\, dx\, dy = \iint_U dA.$$
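As a quick numerical illustration of this area formula (a hypothetical sketch), applying it to the circle of radius 2, parametrised anti-clockwise as $(2\cos t, 2\sin t)$, recovers the disc area $4\pi$:

```python
import math

# Area of the disc of radius 2 via Green's Theorem:
# area = (1/2) * boundary integral of (-y dx + x dy).

def disc_area_by_line_integral(radius=2.0, n=4000):
    total, dt = 0.0, 2 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dt
        x, y = radius * math.cos(t), radius * math.sin(t)
        dx, dy = -radius * math.sin(t), radius * math.cos(t)
        total += 0.5 * (-y * dx + x * dy) * dt
    return total

print(disc_area_by_line_integral())  # approximately 4*pi
```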

Green's Theorem also works for non-simply connected regions $U \subset \mathbb{R}^2$. For example, suppose our domain is an annulus:

Figure 34: Cutting the region U into a simply connected domain.

We can cut along the dotted line so that the domain is now simply connected, and we can then apply Green's Theorem to the resulting region. However, since the orientations of integration along the two sides of the introduced cut are opposite to each other, the integration along this cut cancels out.

Exercise 7.18. Using Green’s Theorem, verify the answers you get in Exercise 7.14.

Exercise 7.19. Evaluate the line integral:
$$\oint_\gamma e^x \sin y\, dx + (y^3 + e^x \cos y)\, dy,$$
where $\gamma$ consists of the edges of the square in $\mathbb{R}^2$ with vertices $(0, 0)$, $(1, 0)$, $(1, 1)$, and $(0, 1)$.

Exercise 7.20. Evaluate the line integral:
$$\oint_\gamma (\arctan x + y^2)\, dx + (e^y - x^2)\, dy,$$
where $\gamma$ is the path enclosing the half-annulus $\{(x, y) : 1 \le x^2 + y^2 \le 3,\ y \ge 0\}$.

Exercise 7.21. Show that the area of the ellipse with radii $a$ and $b$, bounded by the curve given by $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$, is $\pi ab$.

7.3 Surface Integral

In the previous section, we have seen the line integral on a regular curve. The way to do this is to parametrise the curve with a parameter $t$ which runs through an interval $I$; that is, $\gamma \subset \mathbb{R}^2$ is described by the function $\mathbf{v}(t) = (x(t), y(t))$. We have seen graphs of functions as well: let $f : U \to \mathbb{R}$ be a smooth function on a subset $U \subset \mathbb{R}^2$; then the graph $G(f)$ is the surface $\Sigma = \{(\mathbf{x}, f(\mathbf{x})) : \mathbf{x} \in U\}$. If we choose the usual $x$ and $y$ coordinates in $U$, the graph is the set of triples $(x, y, f(x, y))$. This means that the surface $\Sigma$ depends on only two parameters, namely $x$ and $y$. Extending from the one-dimensional case for curves, we define:

Definition 7.22 (Parametric surface). A parametric surface $\Sigma \subset \mathbb{R}^3$ is the set of points in $\mathbb{R}^3$ described by some smooth function $S : U \to \mathbb{R}^3$ with $U \subset \mathbb{R}^2$ such that $S(s, t) = (x(s, t), y(s, t), z(s, t))$.

This definition generalises the surfaces described by a graph. For a surface described by the graph of some function $f : U \to \mathbb{R}$, the parametrisation is given by $S(s, t) = (s, t, f(s, t))$ for $(s, t) \in U$.

Example 7.23. The sphere of unit radius described by $x^2 + y^2 + z^2 = 1$ cannot be described as the graph of one function in terms of its variables $x$, $y$, and $z$, as we have seen in Example 5.4. However, it can be described parametrically by:
$$S : [0, 2\pi) \times [0, \pi] \to \mathbb{R}^3, \quad (\theta, \varphi) \mapsto (\sin\varphi\cos\theta, \sin\varphi\sin\theta, \cos\varphi).$$

Exercise 7.24. By comparing the radii of different horizontal slices of the elliptic hyperboloid defined as the $0$-level set of the function $f(x, y, z) = \frac{x^2}{a^2} + \frac{y^2}{a^2} - \frac{z^2}{b^2} - 1$, find a parametrisation for the surface from the domain $[0, 2\pi) \times \mathbb{R}$.

However, we are interested in a special parametrised surface, called the regular parametrised surface. This ensures that we are not dealing with pathological degenerate cases which would make our lives more difficult.

Definition 7.25 (Regular parametrised surface). A regular parametrised surface is a parametric surface $\Sigma$ whose parametrisation $S$ is smooth and whose Jacobian is of rank 2 for all $p = (s, t) \in U$.

The reason we require this condition is that it allows us to describe the tangent plane of the surface more easily. Indeed, let us define two vectors $\partial_s S(p) = (\partial_s x, \partial_s y, \partial_s z)(p)$ and $\partial_t S(p) = (\partial_t x, \partial_t y, \partial_t z)(p)$. We claim that these two vectors are linearly independent for all $p \in U$. Indeed, the Jacobian of the map $S$ is given by:

$$J(p) = \begin{pmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial x}{\partial t} \\[4pt] \dfrac{\partial y}{\partial s} & \dfrac{\partial y}{\partial t} \\[4pt] \dfrac{\partial z}{\partial s} & \dfrac{\partial z}{\partial t} \end{pmatrix}(p).$$

For a regular parametrised surface, this matrix has rank 2 at every $p \in U$. Rank 2 implies that it has two linearly independent columns, so the vectors $\partial_s S(p)$ and $\partial_t S(p)$ are linearly independent for all $p \in U$, which allows us to define:

Definition 7.26 (Tangent plane of regular parametrised surface). Let $\Sigma \subset \mathbb{R}^3$ be a regular parametrised surface given by the map $S(s, t) = (x(s, t), y(s, t), z(s, t))$. The tangent plane to the surface $\Sigma$ at the point $q = S(p) \in \Sigma$ is defined as:
$$T_q(\Sigma) = \left\{ \mathbf{v} \in \mathbb{R}^3 : \mathbf{v} = q + \lambda\,\partial_s S(p) + \mu\,\partial_t S(p) \text{ for } \lambda, \mu \in \mathbb{R} \right\}.$$

From this, we can define the normal vector to a parametric surface at q ∈ Σ.

Definition 7.27 (Normal vector to a regular parametric surface). Let $\Sigma \subset \mathbb{R}^3$ be a regular parametrised surface given by the map $S(s, t) = (x(s, t), y(s, t), z(s, t))$. The normal vector to the surface $\Sigma$ at the point $q = S(p) \in \Sigma$ is defined as the cross product:
$$\mathbf{n} = \partial_s S(p) \times \partial_t S(p).$$
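As an illustration (a hypothetical numerical sketch with made-up helper names), for the unit-sphere parametrisation $S(\theta, \varphi) = (\sin\varphi\cos\theta, \sin\varphi\sin\theta, \cos\varphi)$ of Example 7.23, the normal $\partial_\theta S \times \partial_\varphi S$ works out to $-\sin\varphi \cdot S(\theta, \varphi)$, a radial vector, as one expects for a sphere. This can be checked with finite differences:

```python
import math

def S(theta, phi):
    # unit sphere parametrisation from Example 7.23
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def partial(f, point, i, h=1e-6):
    # central finite difference in the i-th parameter
    lo, hi = list(point), list(point)
    lo[i] -= h
    hi[i] += h
    a, b = f(*hi), f(*lo)
    return tuple((ai - bi) / (2 * h) for ai, bi in zip(a, b))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

theta, phi = 0.7, 1.1
n = cross(partial(S, (theta, phi), 0), partial(S, (theta, phi), 1))
expected = tuple(-math.sin(phi) * c for c in S(theta, phi))
print(n, expected)  # the two triples agree to finite-difference accuracy
```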

Of course, there are two possible choices for the normal vector, namely $\partial_s S(p) \times \partial_t S(p)$ or $\partial_t S(p) \times \partial_s S(p)$. They have the same magnitude but point in opposite directions. We shall comment on this more later.

Analogous to finding the length of a regular curve, we can also define the area of a parametric surface in a similar manner. We note that we are integrating over the variables $s$ and $t$. Therefore, the infinitesimal area element is given by the area of the parallelogram spanned by the tangent vectors $\partial_s S(p)$ and $\partial_t S(p)$. Hence, similar to what has been discussed earlier for surfaces defined by graphs of a function $f : U \to \mathbb{R}$, the infinitesimal area element of the surface $\Sigma$ at a point $q = S(p) \in \Sigma$ is:

$$dS(q) = |\partial_s S(p) \times \partial_t S(p)|\, dA = |\partial_s S(p) \times \partial_t S(p)|\, ds\, dt.$$

Proposition 7.28 (Area of regular parametrised surface). Let $\Sigma \subset \mathbb{R}^3$ be a regular parametric surface described by some smooth function $S : U \to \mathbb{R}^3$ with $U \subset \mathbb{R}^2$, given by the map $S(s, t) = (x(s, t), y(s, t), z(s, t))$. Then the area of the surface $\Sigma$ is:
$$\int_\Sigma dS = \iint_U |\partial_s S \times \partial_t S|\, ds\, dt.$$

Similar to curves, surfaces are geometric objects and therefore their area is independent of parametrisation. Indeed, suppose that we have two different parametrisations of the regular surface $\Sigma \subset \mathbb{R}^3$, given by the maps $S : U \to \mathbb{R}^3$ and $R : V \to \mathbb{R}^3$, so that $\Sigma = S(U) = R(V)$. Let us denote the coordinates on $U$ by $(s, t)$ and the coordinates on $V$ by $(u, v)$. Then there is a bijection between the sets $U$ and $V$, given by a map $F : U \to V$ defined as $(s, t) \mapsto (u(s, t), v(s, t))$, such that $S = R \circ F$. The areas of the surface computed with respect to the parametrisations $S$ and $R$ are given by:
$$\iint_U |\partial_s S \times \partial_t S|\, ds\, dt \quad \text{and} \quad \iint_V |\partial_u R \times \partial_v R|\, du\, dv.$$

We aim to show that these two quantities are equal. Since $S = R \circ F$, the chain rule gives:
$$\frac{\partial S}{\partial s} = \frac{\partial R}{\partial u}\frac{\partial u}{\partial s} + \frac{\partial R}{\partial v}\frac{\partial v}{\partial s}, \qquad \frac{\partial S}{\partial t} = \frac{\partial R}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial R}{\partial v}\frac{\partial v}{\partial t}.$$
Therefore, we have:
$$\iint_U |\partial_s S \times \partial_t S|\, ds\, dt = \iint_U \left| \left( \frac{\partial R}{\partial u}\frac{\partial u}{\partial s} + \frac{\partial R}{\partial v}\frac{\partial v}{\partial s} \right) \times \left( \frac{\partial R}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial R}{\partial v}\frac{\partial v}{\partial t} \right) \right| ds\, dt$$
$$= \iint_U \left| \frac{\partial u}{\partial s}\frac{\partial v}{\partial t} - \frac{\partial u}{\partial t}\frac{\partial v}{\partial s} \right| \left| \frac{\partial R}{\partial u} \times \frac{\partial R}{\partial v} \right| ds\, dt. \tag{17}$$
However, note that the first factor in the integrand is just the absolute value of the determinant of the Jacobian matrix:
$$J = \frac{\partial(u, v)}{\partial(s, t)} = \begin{pmatrix} \dfrac{\partial u}{\partial s} & \dfrac{\partial u}{\partial t} \\[4pt] \dfrac{\partial v}{\partial s} & \dfrac{\partial v}{\partial t} \end{pmatrix}.$$
Hence, by the change of variables from $(s, t)$ to $(u, v)$, integral (17) becomes:
$$\iint_U |\partial_s S \times \partial_t S|\, ds\, dt = \iint_U \left| \frac{\partial u}{\partial s}\frac{\partial v}{\partial t} - \frac{\partial u}{\partial t}\frac{\partial v}{\partial s} \right| \left| \frac{\partial R}{\partial u} \times \frac{\partial R}{\partial v} \right| ds\, dt = \iint_V |\partial_u R \times \partial_v R|\, du\, dv,$$
which is what we claimed. Hence we have proven:

Proposition 7.29. The area of a smooth regular parametrised surface Σ is independent of parametrisation.
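To see Proposition 7.28 in action numerically (a hypothetical sketch, reusing the sphere parametrisation of Example 7.23), the code below approximates the surface-area integral by a midpoint sum. For the unit sphere, $|\partial_\theta S \times \partial_\varphi S| = \sin\varphi$, so the result should approach the known area $4\pi$.

```python
import math

def sphere_area(n=400):
    # midpoint-rule approximation of the integral of
    # |d_theta S x d_phi S| = sin(phi) over [0, 2*pi) x [0, pi]
    total = 0.0
    dph = math.pi / n
    for j in range(n):
        phi = (j + 0.5) * dph
        # the integrand is independent of theta, giving a factor of 2*pi
        total += math.sin(phi) * dph * (2 * math.pi)
    return total

print(sphere_area())  # approximately 4*pi
```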

Exercise 7.30. A torus or a doughnut is a smooth regular parametrised surface that can be parametrised as:

$$S : [0, 2\pi] \times [0, 2\pi] \to \mathbb{R}^3, \quad (s, t) \mapsto ((b + a\cos s)\cos t, (b + a\cos s)\sin t, a\sin s).$$

Find the surface area of the torus.

We can also define the surface integral of a function $f : \mathbb{R}^3 \to \mathbb{R}$ as the sum of the values of $f$ over the surface $\Sigma$.

Definition 7.31 (Surface integral of scalar field). For a regular parametrised surface $\Sigma \subset \mathbb{R}^3$ described by some smooth function $S : U \to \mathbb{R}^3$ with $U \subset \mathbb{R}^2$ given by the map $S(s, t) = (x(s, t), y(s, t), z(s, t))$, and a smooth function $f : \mathbb{R}^3 \to \mathbb{R}$, we define the surface integral of the function $f$ over $\Sigma$ as the integral:
$$\iint_\Sigma f(\mathbf{x})\, dS = \iint_U f(x(s, t), y(s, t), z(s, t))\, |\partial_s S \times \partial_t S|\, ds\, dt.$$
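As a concrete instance (a hypothetical numerical sketch), integrating the scalar field $f(x, y, z) = z^2$ over the unit sphere with the parametrisation of Example 7.23 gives $\int_0^{2\pi} \int_0^\pi \cos^2\varphi \sin\varphi\, d\varphi\, d\theta = 4\pi/3$, which a midpoint sum reproduces:

```python
import math

def surface_integral_z2(n=400):
    # integrate f = z^2 over the unit sphere: f(S(theta, phi)) = cos(phi)^2,
    # and the area element is |d_theta S x d_phi S| = sin(phi) d(theta) d(phi)
    total = 0.0
    dph = math.pi / n
    for j in range(n):
        phi = (j + 0.5) * dph
        # integrand independent of theta, so the theta integral is a factor 2*pi
        total += math.cos(phi) ** 2 * math.sin(phi) * dph * (2 * math.pi)
    return total

print(surface_integral_z2())  # approximately 4*pi/3
```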

Exercise 7.32. By adapting the proof of Proposition 7.29 above, prove that the surface integral of a scalar field is independent of the surface parametrisation. That is, given any two different parametrisations $S : U \to \mathbb{R}^3$ and $R : V \to \mathbb{R}^3$ of the smooth regular parametrised surface $\Sigma$, where $U, V \subset \mathbb{R}^2$, and a function $f : \mathbb{R}^3 \to \mathbb{R}$, prove that:
$$\iint_U f(\mathbf{x}(s, t))\, |\partial_s S \times \partial_t S|\, ds\, dt = \iint_V f(\mathbf{x}(u, v))\, |\partial_u R \times \partial_v R|\, du\, dv.$$

We have noted that the cross product $\partial_s S(p) \times \partial_t S(p)$ is perpendicular to the tangent plane of the parametrised surface at $S(p)$. As mentioned in Definition 7.27, this vector is called the normal vector to the surface at $S(p)$. There are two choices for this normal vector, depending on the order of the cross product. One of the vectors is called the outward-pointing normal and the other is called the inward-pointing normal. The outward-pointing normal is determined by looking at the induced $(s, t)$ coordinate grid on the surface: at the point $S(p) = S(s_0, t_0)$, we can fix $t_0$ and vary $s$ to give a curve $S(s, t_0)$ on $\Sigma$. Likewise, we can also define the curve $S(s_0, t)$ on $\Sigma$. These two curves intersect at the point $S(p)$. By using the right-hand rule with the directions determined by these two curves at the point $S(p)$, we can determine the direction of the outward-pointing normal.

Roughly speaking, an orientation on a surface is a continuous choice of a unit normal vector to the surface. On orientable surfaces, once we have chosen a unit normal vector at a point, by continuity this determines the choice of unit normal vector at all points of the surface. Since there is then a unique choice of outward-pointing normal to the surface, we can define a surface integral of a vector field.

Definition 7.33 (Surface integral of vector field). For a regular parametrised surface $\Sigma \subset \mathbb{R}^3$ described by some smooth function $S : U \to \mathbb{R}^3$ with $U \subset \mathbb{R}^2$ given by the map $S(s, t) = (x(s, t), y(s, t), z(s, t))$, and a smooth vector field $\mathbf{F} : \mathbb{R}^3 \to \mathbb{R}^3$, we define the surface integral of the vector field $\mathbf{F}$ over $\Sigma$ as the integral:
$$\iint_\Sigma \mathbf{F} \cdot \hat{\mathbf{n}}\, dS = \iint_U \mathbf{F} \cdot \mathbf{n}\, ds\, dt,$$
where $\mathbf{n} = \partial_s S \times \partial_t S$ is the outward-pointing normal to the surface $\Sigma$ and $\hat{\mathbf{n}}$ is the corresponding unit normal vector.
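For a quick sanity check of this definition (a hypothetical numerical sketch), take the radial field $\mathbf{F}(x, y, z) = (x, y, z)$ on the unit sphere. With the parametrisation of Example 7.23, the outward normal is $\partial_\varphi S \times \partial_\theta S = \sin\varphi \cdot S$, so the flux integrand is $\mathbf{F} \cdot \mathbf{n} = \sin\varphi$ and the total flux should be $4\pi$:

```python
import math

def S(theta, phi):
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def flux_radial_field(n=200):
    # flux of F(x, y, z) = (x, y, z) through the unit sphere, outward normal
    total = 0.0
    dth, dph = 2 * math.pi / n, math.pi / n
    for i in range(n):
        for j in range(n):
            theta, phi = (i + 0.5) * dth, (j + 0.5) * dph
            # exact partial derivatives of the parametrisation
            St = (-math.sin(phi) * math.sin(theta), math.sin(phi) * math.cos(theta), 0.0)
            Sp = (math.cos(phi) * math.cos(theta), math.cos(phi) * math.sin(theta), -math.sin(phi))
            nvec = cross(Sp, St)  # outward-pointing normal
            F = S(theta, phi)     # radial field evaluated on the sphere
            total += sum(a * b for a, b in zip(F, nvec)) * dth * dph
    return total

print(flux_radial_field())  # approximately 4*pi
```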

Remark 7.34. The definition above makes sense because, if we have a surface parametrisation $S : U \to \mathbb{R}^3$, the unit normal vector $\hat{\mathbf{n}}$ would be:
$$\hat{\mathbf{n}} = \frac{\partial_s S \times \partial_t S}{|\partial_s S \times \partial_t S|}.$$
Thus the integral on the left-hand side is:
$$\iint_\Sigma \mathbf{F} \cdot \hat{\mathbf{n}}\, dS = \iint_\Sigma \mathbf{F} \cdot \frac{\partial_s S \times \partial_t S}{|\partial_s S \times \partial_t S|}\, dS.$$

Furthermore, we have seen in Proposition 7.28 that the surface area element is given by $dS = |\partial_s S \times \partial_t S|\, ds\, dt$. So we have:
$$\iint_\Sigma \mathbf{F} \cdot \hat{\mathbf{n}}\, dS = \iint_\Sigma \mathbf{F} \cdot \frac{\partial_s S \times \partial_t S}{|\partial_s S \times \partial_t S|}\, dS = \iint_U \mathbf{F} \cdot \frac{\partial_s S \times \partial_t S}{|\partial_s S \times \partial_t S|}\, |\partial_s S \times \partial_t S|\, ds\, dt = \iint_U \mathbf{F} \cdot \mathbf{n}\, ds\, dt,$$
which gives us the definition above.

Remark 7.35. Similar to the results before, this integral is also independent of the parametrisation of the surface $\Sigma$.

We note that this definition is different from the line integral of vector fields: here, we take the dot product of the vector field with the direction normal to the surface, in contrast to the direction tangential to the curve. This quantity also has a physical motivation: the integrand $\mathbf{F} \cdot \mathbf{n}$ is called the flux of the vector field $\mathbf{F}$ through the surface $\Sigma$. If the vector field $\mathbf{F}$ represents the velocity of a fluid in $\mathbb{R}^3$, the integral represents the rate of fluid flow through $\Sigma$.

Computing this integral over an arbitrary surface $\Sigma$ can be quite tedious, as one needs to find the normal vector first. However, for closed surfaces, one can compute it quite easily. We first define:

Definition 7.36 (Closed surfaces). Let $\Sigma \subset \mathbb{R}^3$ be a regular parametrised surface in $\mathbb{R}^3$. The surface $\Sigma$ is called closed if it does not have a boundary, that is, $\partial\Sigma = \emptyset$.

Now, we state the Divergence Theorem, which is an extension of Green's Theorem to three dimensions. The proof of this theorem is very complicated, but essentially it follows the line of reasoning in the proof of Green's Theorem: first prove the identity for simple regions which can be described as graphs, then proceed to the general case by cutting the general region up into simple regions, noting that the integrals over the newly introduced internal surfaces cancel out.

Theorem 7.37 (Divergence Theorem). Let $\Sigma \subset \mathbb{R}^3$ be a closed regular parametrised surface described by some smooth function $S : U \to \mathbb{R}^3$ with $U \subset \mathbb{R}^2$ given by the map $S(s, t) = (x(s, t), y(s, t), z(s, t))$, and let $\mathbf{F} : \mathbb{R}^3 \to \mathbb{R}^3$ be a smooth vector field. Suppose that the closed surface $\Sigma$ bounds the solid $V \subset \mathbb{R}^3$. Then we have:
$$\iint_\Sigma \mathbf{F} \cdot \hat{\mathbf{n}}\, dS = \iiint_V \nabla \cdot \mathbf{F}\, dV,$$
where $\hat{\mathbf{n}}$ is the outward-pointing unit normal to the surface $\Sigma$.

In fact, this theorem also holds if the solid $V$ is bounded by piecewise smooth parametric surfaces, for example the surface of a cube or of a solid half-ball. The following is an example of an application of the Divergence Theorem:

Example 7.38. Let $\mathbf{F} = (x^3 + 3y + z^2,\ y^3,\ x^2 + y^2 + 3z^2)$ be a vector field in $\mathbb{R}^3$. Suppose that $V$ is the solid bounded by the paraboloid $1 - z = x^2 + y^2$ and the plane $z = 0$. We want to compute the total flux of the vector field over the paraboloid surface of $V$.

Figure 35: The solid $V$ and the surface components $\Sigma_1$ and $\Sigma_2$.

Suppose that the surface bounding $V$ is $\Sigma$. Then we can decompose $\Sigma = \Sigma_1 + \Sigma_2$, where $\Sigma_1$ is the paraboloid surface and $\Sigma_2$ is the bottom of the solid $V$, that is, $\Sigma_2$ is the disc $\{(x, y, z) : x^2 + y^2 \le 1,\ z = 0\}$. Mathematically, we want to compute:
$$\iint_{\Sigma_1} \mathbf{F} \cdot \hat{\mathbf{n}}\, dS.$$

You can compute this directly by finding the outward-pointing normal to the surface $\Sigma_1$ first, then using the surface integral to express the infinitesimal surface area $dS$ in terms of $dx\, dy$. Another approach is to use the Divergence Theorem. We have:
$$\iint_\Sigma \mathbf{F} \cdot \hat{\mathbf{n}}\, dS = \iint_{\Sigma_1} \mathbf{F} \cdot \hat{\mathbf{n}}\, dS + \iint_{\Sigma_2} \mathbf{F} \cdot \hat{\mathbf{n}}\, dS = \iiint_V \nabla \cdot \mathbf{F}\, dV.$$
So, we have:
$$\iint_{\Sigma_1} \mathbf{F} \cdot \hat{\mathbf{n}}\, dS = \iiint_V \nabla \cdot \mathbf{F}\, dV - \iint_{\Sigma_2} \mathbf{F} \cdot \hat{\mathbf{n}}\, dS.$$
Let us now compute the integrals on the right-hand side. Since the solid is rotationally symmetric, it is useful to convert the integral from Cartesian coordinates to cylindrical coordinates $x = r\cos\theta$, $y = r\sin\theta$, and $z = h$:
$$\iiint_V \nabla \cdot \mathbf{F}\, dV = \iiint_V 3(x^2 + y^2) + 6z\, dx\, dy\, dz = \iiint_V (3r^2 + 6h)\, r\, dr\, d\theta\, dh$$
$$= \int_0^{2\pi} \int_0^1 \int_0^{1-r^2} 3r^3 + 6hr\, dh\, dr\, d\theta = 2\pi \int_0^1 3r - 3r^3\, dr = \frac{3\pi}{2}.$$
On the other hand, the outward-pointing unit normal vector to the bottom of the solid $V$ is $(0, 0, -1)$. Therefore, the integral over the base of the solid is:
$$\iint_{\Sigma_2} \mathbf{F} \cdot \hat{\mathbf{n}}\, dS = \iint_{\Sigma_2} -(x^2 + y^2 + 3z^2)\, dA.$$

On $\Sigma_2$, we have $z = 0$. Therefore, using polar coordinates:
$$\iint_{\Sigma_2} \mathbf{F} \cdot \hat{\mathbf{n}}\, dS = \iint_{\Sigma_2} -(x^2 + y^2)\, dA = -\int_0^{2\pi} \int_0^1 r^2 \cdot r\, dr\, d\theta = -\frac{\pi}{2}.$$
Thus, we conclude that the total flux of $\mathbf{F}$ over the paraboloid surface of $V$ is $\frac{3\pi}{2} - \left(-\frac{\pi}{2}\right) = 2\pi$.

Exercise 7.39. Verify this answer by computing the flux using the direct method.
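The example above can be sanity-checked numerically (a hypothetical sketch, not a substitute for the analytic direct computation asked for in Exercise 7.39). The code approximates the triple integral of $\nabla \cdot \mathbf{F}$ over the solid and the flux of $\mathbf{F}$ through the paraboloid cap, expecting roughly $3\pi/2$ and $2\pi$ respectively:

```python
import math

def F(x, y, z):
    # vector field from Example 7.38
    return (x**3 + 3*y + z**2, y**3, x**2 + y**2 + 3*z**2)

def divergence_integral(n=100):
    # triple integral of div F = 3(x^2 + y^2) + 6z over the solid in
    # cylindrical coordinates (r, theta, h) with 0 <= h <= 1 - r^2;
    # the integrand has no theta dependence, giving a factor of 2*pi
    total, dr = 0.0, 1.0 / n
    for i in range(n):
        r = (i + 0.5) * dr
        dh = (1 - r**2) / n
        for k in range(n):
            h = (k + 0.5) * dh
            total += (3 * r**2 + 6 * h) * r * dr * dh * (2 * math.pi)
    return total

def paraboloid_flux(n=200):
    # direct flux through the cap z = 1 - x^2 - y^2, using the
    # upward normal d_r S x d_theta S = (2 r^2 cos t, 2 r^2 sin t, r)
    total = 0.0
    dr, dt = 1.0 / n, 2 * math.pi / n
    for i in range(n):
        for j in range(n):
            r, t = (i + 0.5) * dr, (j + 0.5) * dt
            x, y, z = r * math.cos(t), r * math.sin(t), 1 - r**2
            nvec = (2 * r**2 * math.cos(t), 2 * r**2 * math.sin(t), r)
            total += sum(a * b for a, b in zip(F(x, y, z), nvec)) * dr * dt
    return total

print(divergence_integral())  # approximately 3*pi/2
print(paraboloid_flux())      # approximately 2*pi
```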

By using vector identities, we can deduce the following results:

Corollary 7.40. Let $\Sigma \subset \mathbb{R}^3$ be a closed regular parametrised surface described by some smooth function $S : U \to \mathbb{R}^3$ with $U \subset \mathbb{R}^2$ given by the map $S(s, t) = (x(s, t), y(s, t), z(s, t))$, let $\mathbf{F}, \mathbf{G} : \mathbb{R}^3 \to \mathbb{R}^3$ be smooth vector fields, and let $f, g : \mathbb{R}^3 \to \mathbb{R}$ be scalar functions. Suppose that the closed surface $\Sigma$ bounds the solid $V \subset \mathbb{R}^3$. Then we have:

1. $\displaystyle\iint_\Sigma f\mathbf{G} \cdot \hat{\mathbf{n}}\, dS = \iiint_V f\,\nabla \cdot \mathbf{G} + \mathbf{G} \cdot \nabla f\, dV$,

2. Green's first identity: $\displaystyle\iint_\Sigma f(\nabla g) \cdot \hat{\mathbf{n}}\, dS = \iiint_V f\,\Delta g + \nabla f \cdot \nabla g\, dV$,

3. $\displaystyle\iint_\Sigma (\mathbf{F} \times \mathbf{G}) \cdot \hat{\mathbf{n}}\, dS = \iiint_V \mathbf{G} \cdot (\nabla \times \mathbf{F}) - \mathbf{F} \cdot (\nabla \times \mathbf{G})\, dV$.

Remark 7.41. The first and second assertions are analogues of the integration by parts formula we have seen in one variable, in which we move the derivative $\nabla$ from one function/vector to another. Recall that the integration by parts formula is given by:
$$\int_a^b f(x)g'(x)\, dx + \int_a^b f'(x)g(x)\, dx = [f(x)g(x)]_a^b = f(b)g(b) - f(a)g(a).$$
The term on the right-hand side can be thought of as the integral (which is essentially a sum) of the function $f(x)g(x)$ over the boundary of the domain, namely the points $a$ and $b$, with outward-pointing normal $1$ at $b$ and $-1$ at $a$.

Exercise 7.42. Show that Corollary 7.40 implies that:

1. Green's second identity: $\displaystyle\iint_\Sigma f(\nabla g) \cdot \hat{\mathbf{n}} - g(\nabla f) \cdot \hat{\mathbf{n}}\, dS = \iiint_V f\,\Delta g - g\,\Delta f\, dV$,

2. Gradient Theorem: $\displaystyle\iint_\Sigma f\,\hat{\mathbf{n}}\, dS = \iiint_V \nabla f\, dV$,

3. Curl Theorem: $\displaystyle\iint_\Sigma \hat{\mathbf{n}} \times \mathbf{F}\, dS = \iiint_V \nabla \times \mathbf{F}\, dV$.

Hint: For assertions 2 and 3, choose a suitable vector field $\mathbf{G}$ to use in Corollary 7.40.

Exercise 7.43. Let $V$ be a solid bounded by the cylinder defined by $x^2 + y^2 = 4$, the plane $x + y = 6$, and the plane $z = 0$. Let $\mathbf{F}$ be a vector field in $\mathbb{R}^3$ defined by:
$$\mathbf{F}(x, y, z) = (x^2 + \sin z,\ xy + \cos z,\ e^y).$$

Find the flux of the vector field $\mathbf{F}$ over the surface of the solid $V$.

Exercise 7.44. Repeat Exercise 7.15 by using the Divergence Theorem.

7.4 Stokes' Theorem

The final important theorem in the study of surface integrals is Stokes' Theorem. This is a generalisation of Green's Theorem to curved surfaces embedded in $\mathbb{R}^3$. Recall that Green's Theorem relates the area integral with a boundary integral. However, Green's Theorem only holds for planar regions $U \subset \mathbb{R}^2$. Stokes' Theorem extends this relation between area and boundary integrals to any parametrised regular surface $\Sigma \subset \mathbb{R}^3$.

Theorem 7.45 (Stokes' Theorem). Suppose that $\Sigma \subset \mathbb{R}^3$ is an orientable regular parametrised surface described by some smooth function $S : U \to \mathbb{R}^3$ with $U \subset \mathbb{R}^2$ given by the map $S(s, t) = (x(s, t), y(s, t), z(s, t))$. Let $\mathbf{F} : \mathbb{R}^3 \to \mathbb{R}^3$ be a smooth vector field. Then:
$$\oint_{\partial\Sigma} \mathbf{F}(\mathbf{x}) \cdot d\mathbf{x} = \iint_\Sigma (\nabla \times \mathbf{F}(\mathbf{x})) \cdot \hat{\mathbf{n}}\, dS, \tag{18}$$
where $\hat{\mathbf{n}}$ is the unit normal vector to the surface induced by the orientation on the boundary $\partial\Sigma$.

Figure 36: The orientation on the boundary $\partial\Sigma$ determines the orientation on the surface $\Sigma$. The choice of the unit normal to the surface is given by the right-hand rule.

Remark 7.46. For planar regions $U \subset \mathbb{R}^2$, the orientation of the integral over the boundary $\partial U$ is anti-clockwise. The anti-clockwise orientation of the curve induces a unit normal vector in the $\mathbf{e}_3$ direction if we embed the plane containing the region $U$ into $\mathbb{R}^3$. Thus Green's Theorem is consistent with Stokes' Theorem above.

There are various proofs of Stokes' Theorem; the most sophisticated uses manifold theory and differential forms. However, we are going to present the most elementary proof here.

Proof. The proof of this theorem is by pulling back the integrals to the planar domain $U \subset \mathbb{R}^2$ and comparing the two sides using Green's Theorem. We begin with the left-hand side. By expressing the differential $d\mathbf{x}$ in terms of $ds$ and $dt$, we get:
$$\oint_{\partial\Sigma} \mathbf{F}(\mathbf{x}) \cdot d\mathbf{x} = \oint_{\partial U} \mathbf{F}(\mathbf{x}(s, t)) \cdot \left( \frac{\partial S}{\partial s}\, ds + \frac{\partial S}{\partial t}\, dt \right) = \oint_{\partial U} \mathbf{F} \cdot \partial_s S\, ds + \mathbf{F} \cdot \partial_t S\, dt$$
$$= \iint_U \frac{\partial}{\partial s}(\mathbf{F} \cdot \partial_t S) - \frac{\partial}{\partial t}(\mathbf{F} \cdot \partial_s S)\, ds\, dt$$
$$= \iint_U \partial_s \mathbf{F} \cdot \partial_t S + \mathbf{F} \cdot \partial^2_{ts} S - \partial_t \mathbf{F} \cdot \partial_s S - \mathbf{F} \cdot \partial^2_{st} S\, ds\, dt$$
$$= \iint_U \partial_s \mathbf{F} \cdot \partial_t S - \partial_t \mathbf{F} \cdot \partial_s S\, ds\, dt,$$
by using Green's Theorem in the third equality and the fact that partial derivatives on $\mathbb{R}^2$ commute. By the chain rule, we can compute the integrand to be:
$$\begin{aligned}
&\frac{\partial F_1}{\partial x}\frac{\partial x}{\partial s}\frac{\partial x}{\partial t} + \frac{\partial F_1}{\partial y}\frac{\partial y}{\partial s}\frac{\partial x}{\partial t} + \frac{\partial F_1}{\partial z}\frac{\partial z}{\partial s}\frac{\partial x}{\partial t} + \frac{\partial F_2}{\partial x}\frac{\partial x}{\partial s}\frac{\partial y}{\partial t} + \frac{\partial F_2}{\partial y}\frac{\partial y}{\partial s}\frac{\partial y}{\partial t} + \frac{\partial F_2}{\partial z}\frac{\partial z}{\partial s}\frac{\partial y}{\partial t} \\
&+ \frac{\partial F_3}{\partial x}\frac{\partial x}{\partial s}\frac{\partial z}{\partial t} + \frac{\partial F_3}{\partial y}\frac{\partial y}{\partial s}\frac{\partial z}{\partial t} + \frac{\partial F_3}{\partial z}\frac{\partial z}{\partial s}\frac{\partial z}{\partial t} - \frac{\partial F_1}{\partial x}\frac{\partial x}{\partial t}\frac{\partial x}{\partial s} - \frac{\partial F_1}{\partial y}\frac{\partial y}{\partial t}\frac{\partial x}{\partial s} - \frac{\partial F_1}{\partial z}\frac{\partial z}{\partial t}\frac{\partial x}{\partial s} \\
&- \frac{\partial F_2}{\partial x}\frac{\partial x}{\partial t}\frac{\partial y}{\partial s} - \frac{\partial F_2}{\partial y}\frac{\partial y}{\partial t}\frac{\partial y}{\partial s} - \frac{\partial F_2}{\partial z}\frac{\partial z}{\partial t}\frac{\partial y}{\partial s} - \frac{\partial F_3}{\partial x}\frac{\partial x}{\partial t}\frac{\partial z}{\partial s} - \frac{\partial F_3}{\partial y}\frac{\partial y}{\partial t}\frac{\partial z}{\partial s} - \frac{\partial F_3}{\partial z}\frac{\partial z}{\partial t}\frac{\partial z}{\partial s},
\end{aligned} \tag{19}$$
in which the six terms containing a repeated coordinate derivative (such as $\frac{\partial F_1}{\partial x}\frac{\partial x}{\partial s}\frac{\partial x}{\partial t}$) cancel in pairs. The right-hand side, by definition, is:
$$\iint_\Sigma (\nabla \times \mathbf{F}(\mathbf{x})) \cdot \hat{\mathbf{n}}\, dS = \iint_U \left( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\ \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x},\ \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) \cdot (\partial_s S \times \partial_t S)\, ds\, dt.$$
We can compute the integrand to be:
$$\left( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\ \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x},\ \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) \cdot \left( \frac{\partial y}{\partial s}\frac{\partial z}{\partial t} - \frac{\partial z}{\partial s}\frac{\partial y}{\partial t},\ \frac{\partial z}{\partial s}\frac{\partial x}{\partial t} - \frac{\partial x}{\partial s}\frac{\partial z}{\partial t},\ \frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t} \right),$$
which, upon expansion, is equal to (19) after the cancellations. This concludes the proof.

A consequence of this theorem is that the integral
$$\iint_\Sigma (\nabla \times \mathbf{F}(\mathbf{x})) \cdot \hat{\mathbf{n}}\, dS$$
over a surface $\Sigma$ depends only on the vector line integral of $\mathbf{F}$ along the boundary of the surface. Therefore, as long as two surfaces $\Sigma_1$ and $\Sigma_2$ share the same boundary, that is $\partial\Sigma_1 = \partial\Sigma_2$, the above integrals are the same, regardless of how different the surfaces $\Sigma_1$ and $\Sigma_2$ are.

Remark 7.47. Green's Theorem is a special case of Stokes' Theorem. If the surface $\Sigma$ is flat (that is, it is contained in the plane $z = 0$), then the normal to this surface is simply $\mathbf{n} = (0, 0, 1)$. Hence, if we choose the vector field $\mathbf{F} = (F, G, 0)$ and apply Stokes' Theorem to the surface $\Sigma$, we get:
$$\oint_{\partial\Sigma} \mathbf{F}(\mathbf{x}) \cdot d\mathbf{x} = \iint_\Sigma (\nabla \times \mathbf{F}(\mathbf{x})) \cdot \hat{\mathbf{n}}\, dS$$
$$\Rightarrow \quad \oint_{\partial\Sigma} F\, dx + G\, dy = \iint_\Sigma \left( -\partial_z G,\ \partial_z F,\ \partial_x G - \partial_y F \right) \cdot (0, 0, 1)\, dS = \iint_\Sigma \partial_x G - \partial_y F\, dA.$$

Example 7.48. Let $\mathbf{F} = (y - z, z - x, x - y)$ be a vector field in $\mathbb{R}^3$ and let $\Sigma$ be the upper half of the ellipsoid given by $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$. We want to compute the flux of $\nabla \times \mathbf{F}$ over the surface $\Sigma$.

Figure 37: The top half of the ellipsoid $\Sigma$.

By Stokes' Theorem, the integral of the flux is given by:
$$\iint_\Sigma (\nabla \times \mathbf{F}(\mathbf{x})) \cdot \hat{\mathbf{n}}\, dS = \oint_{\partial\Sigma} \mathbf{F}(\mathbf{x}) \cdot d\mathbf{x},$$
so it is sufficient to just integrate the vector field $\mathbf{F}$ over the boundary of the surface. The boundary of the surface can be parametrised as the curve $\mathbf{v}(t) = (a\cos t, b\sin t, 0)$ for $t \in [0, 2\pi]$. Therefore:
$$\iint_\Sigma (\nabla \times \mathbf{F}(\mathbf{x})) \cdot \hat{\mathbf{n}}\, dS = \oint_{\partial\Sigma} \mathbf{F}(\mathbf{x}) \cdot d\mathbf{x} = \int_0^{2\pi} \mathbf{F}(\mathbf{v}(t)) \cdot \frac{d\mathbf{v}}{dt}\, dt$$
$$= \int_0^{2\pi} (b\sin t,\ -a\cos t,\ a\cos t - b\sin t) \cdot (-a\sin t,\ b\cos t,\ 0)\, dt = \int_0^{2\pi} -ab\sin^2 t - ab\cos^2 t\, dt = -2\pi ab.$$
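This answer is easy to confirm numerically (a hypothetical sketch): for concrete radii, say $a = 2$ and $b = 3$, the boundary line integral of $\mathbf{F} = (y - z, z - x, x - y)$ should come out to $-2\pi ab = -12\pi$:

```python
import math

def boundary_integral(a=2.0, b=3.0, n=4000):
    # line integral of F = (y - z, z - x, x - y) along the ellipse
    # v(t) = (a cos t, b sin t, 0), traversed anti-clockwise
    total, dt = 0.0, 2 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dt
        x, y, z = a * math.cos(t), b * math.sin(t), 0.0
        Fv = (y - z, z - x, x - y)
        dv = (-a * math.sin(t), b * math.cos(t), 0.0)
        total += sum(p * q for p, q in zip(Fv, dv)) * dt
    return total

print(boundary_integral())  # approximately -12*pi
```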
