
CHAPTER 5

Elements of Multivariable Calculus

1. Norms and Continuity

As we have seen, the 2-norm gives us a measure of the magnitude of a vector $v \in \mathbb{R}^n$, namely $\|v\|_2$. As such, it also gives us a measure of the distance between two vectors $u, v \in \mathbb{R}^n$, namely $\|u - v\|_2$. Such measures of magnitude and distance are very useful tools for measuring model misfit, as in the linear least squares problem. They are also essential for analyzing the behavior of sequences and functions on $\mathbb{R}^n$ as well as on the space of matrices $\mathbb{R}^{m \times n}$. For this reason, we formalize the notion of a norm to incorporate other measures of magnitude and distance.

Definition 1.1. [Vector Norm] A function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ is a vector norm on $\mathbb{R}^n$ if
(1) $\|x\| \ge 0$ for all $x \in \mathbb{R}^n$, with equality if and only if $x = 0$,
(2) $\|\alpha x\| = |\alpha| \|x\|$ for all $x \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$, and
(3) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^n$.

Example 1.1. Perhaps the most common examples of norms are the $p$-norms for $1 \le p \le \infty$. Given $1 \le p < \infty$, the $\ell_p$-norm on $\mathbb{R}^n$ is defined as
\[
\|x\|_p := \left( \sum_{j=1}^{n} |x_j|^p \right)^{1/p}.
\]
For $p = \infty$, we define

\[
\|x\|_\infty := \max \{ |x_i| \mid i = 1, 2, \dots, n \}.
\]
This choice of notation for the $\infty$-norm comes from the relation
\[
\lim_{p \uparrow \infty} \|x\|_p = \|x\|_\infty \quad \forall\, x \in \mathbb{R}^n.
\]
In applications, the most important of these norms are the $p = 1, 2, \infty$ norms as well as variations on them. In finite dimensions all norms are said to be equivalent in the sense that one can show that for any two norms $\|\cdot\|_{(a)}$ and $\|\cdot\|_{(b)}$ on $\mathbb{R}^n$ there exist positive constants $\alpha$ and $\beta$ such that
\[
\alpha \|x\|_{(a)} \le \|x\|_{(b)} \le \beta \|x\|_{(a)} \quad \forall\, x \in \mathbb{R}^n.
\]
But we caution that in practice the numerical behavior of these norms differs greatly when the dimension is large. (A short numerical illustration of these norms is given following Definition 1.2 below.)

Since norms can be used to measure the distance between vectors, they can be used to form notions of continuity for functions mapping $\mathbb{R}^n$ to $\mathbb{R}^m$ that parallel those established for mappings from $\mathbb{R}$ to $\mathbb{R}$.

Definition 1.2. [Continuous Functions] Let $F : \mathbb{R}^n \to \mathbb{R}^m$.
(1) $F$ is said to be continuous at a point $\bar{x} \in \mathbb{R}^n$ if for all $\epsilon > 0$ there is a $\delta > 0$ such that
\[
\|F(x) - F(\bar{x})\| \le \epsilon \quad \text{whenever} \quad \|x - \bar{x}\| \le \delta.
\]

(2) $F$ is said to be continuous on a set $S \subset \mathbb{R}^n$ if it is continuous at every point of $S$.
(3) The function $F$ is said to be continuous at $\bar{x} \in S$ relative to a set $S \subset \mathbb{R}^n$ if for all $\epsilon > 0$ there is a $\delta > 0$ such that
\[
\|F(x) - F(\bar{x})\| \le \epsilon \quad \text{whenever} \quad \|x - \bar{x}\| \le \delta \text{ and } x \in S.
\]

(4) The function $F$ is said to be uniformly continuous on a set $S \subset \mathbb{R}^n$ if for all $\epsilon > 0$ there is a $\delta > 0$ such that
\[
\|F(x) - F(y)\| \le \epsilon \quad \text{whenever} \quad \|x - y\| \le \delta \text{ and } x, y \in S.
\]
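To make the $p$-norms concrete, the following minimal sketch, written in Python with NumPy (our choice of tooling, not part of the text), computes $\|x\|_p$ for several $p$ and illustrates the limit relation $\lim_{p \uparrow \infty} \|x\|_p = \|x\|_\infty$.

```python
import numpy as np

def p_norm(x, p):
    """The l_p-norm of x for 1 <= p < inf; p = np.inf gives the max-norm."""
    x = np.asarray(x, dtype=float)
    if np.isinf(p):
        return np.abs(x).max()
    return (np.abs(x) ** p).sum() ** (1.0 / p)

x = np.array([3.0, -4.0, 1.0])
for p in [1, 2, 10, 100, np.inf]:
    print(f"p = {p:>5}: ||x||_p = {p_norm(x, p):.6f}")
# As p grows, ||x||_p decreases toward ||x||_inf = 4.
```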


Norms allow us to define certain topological notions that are very helpful in analyzing the behavior of sequences and functions. Since we will make frequent use of these concepts, it is helpful to have certain notational conventions associated with norms. We list a few of these below:

the closed unit ball: $B := \{ x \mid \|x\| \le 1 \}$
the unit vectors: $S := \{ x \mid \|x\| = 1 \}$
the $\epsilon$-ball about $\bar{x}$: $\bar{x} + \epsilon B := \{ \bar{x} + \epsilon u \mid u \in B \} = \{ x \mid \|x - \bar{x}\| \le \epsilon \}$

The unit ball associated with the 1, 2, and ∞ norms will be denoted by B1, B2, and B∞, respectively. A few basic topological notions are listed in the following definition. The most important of these for our purposes is compactness.

Definition 1.3. Let $S$ be a subset of $\mathbb{R}^n$, and let $\|\cdot\|$ be a norm on $\mathbb{R}^n$.
(1) The set $S$ is said to be an open set if for every $x \in S$ there is an $\epsilon > 0$ such that $x + \epsilon B \subset S$.
(2) The set $S$ is said to be a closed set if $S$ contains every point $x \in \mathbb{R}^n$ for which there is a sequence $\{x^k\} \subset S$ with $\lim_{k \to \infty} \|x^k - x\| = 0$.
(3) The set $S$ is said to be a bounded set if there is a $\beta > 0$ such that $S \subset \beta B$.
(4) The set $S$ is said to be a compact set if it is both closed and bounded.
(5) A point $x \in \mathbb{R}^n$ is a cluster point of the set $S$ if there is a sequence $\{x^k\} \subset S$ with $\lim_{k \to \infty} \|x^k - x\| = 0$.
(6) A point $x \in \mathbb{R}^n$ is said to be a boundary point of the set $S$ if for all $\epsilon > 0$, $(x + \epsilon B) \cap S \ne \emptyset$ while $(x + \epsilon B) \not\subset S$, i.e., every $\epsilon$-ball about $x$ contains points that are in $S$ and points that are not in $S$.

The importance of the notion of compactness in optimization is illustrated in the following basic theorems from analysis, which we make extensive use of but do not prove.

Theorem 1.1. [Compactness implies Uniform Continuity] Let $F : \mathbb{R}^n \to \mathbb{R}^m$ be a continuous function on an open set $S \subset \mathbb{R}^n$. Then $F$ is uniformly continuous on every compact subset of $S$.

Theorem 1.2. [Weierstrass Compactness Theorem] A set $D \subset \mathbb{R}^n$ is compact if and only if every infinite sequence in $D$ has a cluster point in $D$.

Theorem 1.3. [Weierstrass Extreme Value Theorem] Every continuous function on a compact set attains its extreme values on that set. That is, there are points in the set at which both the infimum and the supremum of the function relative to the set are attained.

We will also need a norm on the space of matrices. First note that the space of matrices $\mathbb{R}^{m \times n}$ is itself a vector space, since it is closed with respect to addition and real scalar multiplication, both operations are distributive and commutative, and $\mathbb{R}^{m \times n}$ contains the zero matrix. In addition, we can embed $\mathbb{R}^{m \times n}$ in $\mathbb{R}^{mn}$ by stacking one column on top of another to get a long vector of length $mn$. This process of stacking the columns is denoted by the vec operator (column vec): given $A \in \mathbb{R}^{m \times n}$,
\[
\mathrm{vec}(A) = \begin{bmatrix} A_{\cdot 1} \\ A_{\cdot 2} \\ \vdots \\ A_{\cdot n} \end{bmatrix} \in \mathbb{R}^{mn}.
\]
Example 1.2.
\[
\mathrm{vec}\left( \begin{bmatrix} 1 & 2 & -3 \\ 0 & -1 & 4 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ 0 \\ 2 \\ -1 \\ -3 \\ 4 \end{bmatrix}
\]

Using the vec operation, we define an inner product on $\mathbb{R}^{m \times n}$ by taking the inner product of these vectors of length $mn$. Given $A, B \in \mathbb{R}^{m \times n}$, we write this inner product as $\langle A, B \rangle$. It is easy to show that this inner product obeys the formula
\[
\langle A, B \rangle = \mathrm{vec}(A)^T \mathrm{vec}(B) = \mathrm{tr}(A^T B).
\]

This is known as the Frobenius inner product. It generates a corresponding norm, called the Frobenius norm, by setting
\[
\|A\|_F := \|\mathrm{vec}(A)\|_2 = \sqrt{\langle A, A \rangle}.
\]
Note that for a given $x \in \mathbb{R}^n$ and $A \in \mathbb{R}^{m \times n}$ we have
\[
\|Ax\|_2^2 = \sum_{i=1}^m (A_{i \cdot} \bullet x)^2 \le \sum_{i=1}^m (\|A_{i \cdot}\|_2 \|x\|_2)^2 = \|x\|_2^2 \sum_{i=1}^m \|A_{i \cdot}\|_2^2 = \|A\|_F^2 \|x\|_2^2,
\]
and so

(61) \quad $\|Ax\|_2 \le \|A\|_F \|x\|_2$.

This relationship between the Frobenius norm and the 2-norm is very important and is used extensively in our development. In particular, it implies that for any two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times k}$ we have

\[
\|AB\|_F \le \|A\|_F \|B\|_F.
\]
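The vec operator, the trace formula for the Frobenius inner product, and the bound (61) are easy to check numerically. Below is a minimal sketch under the same Python/NumPy assumption as before.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

# Column-stacking vec operator; NumPy flattens row-wise by default,
# so order='F' (column-major) is needed to stack columns.
vec = lambda M: M.flatten(order="F")

# Frobenius inner product: <A, B> = vec(A)^T vec(B) = tr(A^T B).
print(np.allclose(vec(A) @ vec(B), np.trace(A.T @ B)))    # True

# ||A||_F = ||vec(A)||_2, and the bound (61): ||Ax||_2 <= ||A||_F ||x||_2.
fro = np.linalg.norm(vec(A))
print(np.isclose(fro, np.linalg.norm(A, "fro")))          # True
print(np.linalg.norm(A @ x) <= fro * np.linalg.norm(x))   # True
```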

2. Differentiation

In this section we use our understanding of differentiability for mappings from $\mathbb{R}$ to $\mathbb{R}$ to build a theory of differentiation for mappings from $\mathbb{R}^n$ to $\mathbb{R}^m$. Let $F$ be a mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$, which we denote by $F : \mathbb{R}^n \to \mathbb{R}^m$. Let the component functions of $F$ be denoted by $F_i : \mathbb{R}^n \to \mathbb{R}$:
\[
F(x) = \begin{bmatrix} F_1(x) \\ F_2(x) \\ \vdots \\ F_m(x) \end{bmatrix}.
\]
Example 2.1.
\[
F(x) = F\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} 3x_1^2 + x_1 x_2 x_3 \\ 2\cos(x_1)\sin(x_2 x_3) \\ \ln[\exp(x_1^2 + x_2^2 + x_3^2)] \\ 1/\sqrt{1 + (x_2 x_3)^2} \end{bmatrix}.
\]
In this case, $n = 3$, $m = 4$, and

\[
F_1(x) = 3x_1^2 + x_1 x_2 x_3, \quad F_2(x) = 2\cos(x_1)\sin(x_2 x_3), \quad F_3(x) = \ln[\exp(x_1^2 + x_2^2 + x_3^2)], \quad F_4(x) = 1/\sqrt{1 + (x_2 x_3)^2}.
\]
The first step in understanding the differentiability of mappings on $\mathbb{R}^n$ is to study their one-dimensional properties. For this, consider a function $f : \mathbb{R}^n \to \mathbb{R}$ and let $x$ and $d$ be elements of $\mathbb{R}^n$. We define the directional derivative of $f$ in the direction $d$, when it exists, to be the one-sided limit
\[
f'(x; d) := \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t}.
\]

Example 2.2. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) := x_1 |x_2|$, and let $x = (1, 0)^T$ and $d = (2, 2)^T$. Then
\[
f'(x; d) = \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t} = \lim_{t \downarrow 0} \frac{(1 + 2t)|0 + 2t| - 1 \cdot |0|}{t} = \lim_{t \downarrow 0} \frac{2(1 + 2t)t}{t} = 2,
\]
while, for $d = -(2, 2)^T$,
\[
f'(x; d) = \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t} = \lim_{t \downarrow 0} \frac{(1 - 2t)|0 - 2t| - 1 \cdot |0|}{t} = \lim_{t \downarrow 0} \frac{2(1 - 2t)t}{t} = 2.
\]
In general, we have

\[
f'((1, 0); (d_1, d_2)) = \lim_{t \downarrow 0} \frac{(1 + d_1 t)|d_2 t|}{t} = |d_2|.
\]
For technical reasons, we allow this limit to take the values $\pm\infty$. For example, if $f(x) = x^{1/3}$, then
\[
f'(0; 1) = \lim_{t \downarrow 0} t^{-2/3} = +\infty \quad \text{and} \quad f'(0; -1) = \lim_{t \downarrow 0} -t^{-2/3} = -\infty.
\]
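Before proceeding, here is a small numerical sketch (Python again, by assumption) that approximates the one-sided difference quotient defining $f'(x; d)$ for the function of Example 2.2.

```python
import numpy as np

def f(x):
    # f(x1, x2) = x1 * |x2| from Example 2.2.
    return x[0] * abs(x[1])

def dir_deriv(f, x, d, t=1e-8):
    # One-sided difference quotient approximating f'(x; d).
    return (f(x + t * d) - f(x)) / t

x = np.array([1.0, 0.0])
print(dir_deriv(f, x, np.array([2.0, 2.0])))    # ~2, matching the text
print(dir_deriv(f, x, np.array([-2.0, -2.0])))  # ~2 as well
```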

The $x^{1/3}$ example, as well as Example 2.2, shows that the directional derivative $f'(x; d)$ is not necessarily either continuous or smooth in the $d$ argument even if it exists for all choices of $d$. However, the directional derivative is always positively homogeneous in the sense that, given $\lambda \ge 0$, we have
\[
f'(x; \lambda d) = \lim_{t \downarrow 0} \frac{f(x + \lambda t d) - f(x)}{t} = \lambda \lim_{t \downarrow 0} \frac{f(x + \lambda t d) - f(x)}{\lambda t} = \lambda f'(x; d).
\]
The directional derivative idea can be extended to functions $F$ mapping $\mathbb{R}^n$ into $\mathbb{R}^m$ by defining it componentwise: if the limit
\[
F'(x; d) := \lim_{t \downarrow 0} \frac{F(x + td) - F(x)}{t} = \begin{bmatrix} \lim_{t \downarrow 0} \frac{F_1(x + td) - F_1(x)}{t} \\ \lim_{t \downarrow 0} \frac{F_2(x + td) - F_2(x)}{t} \\ \vdots \\ \lim_{t \downarrow 0} \frac{F_m(x + td) - F_m(x)}{t} \end{bmatrix}
\]
exists, it is called the directional derivative of $F$ at $x$ in the direction $d$. These elementary ideas lead to the following notions of differentiability.

Definition 2.1. [Differentiable Functions] Let $f : \mathbb{R}^n \to \mathbb{R}$ and $F : \mathbb{R}^n \to \mathbb{R}^m$.
(1) If $f'(x; d) = \lim_{t \to 0} \frac{f(x + td) - f(x)}{t}$ (a two-sided limit), then we say that $f$ is differentiable in the direction $d$, in which case $f'(x; -d) = -f'(x; d)$.
(2) Let $e_j$, $j = 1, \dots, n$, denote the unit coordinate vectors. If $f$ is differentiable in the direction $e_j$, we say that the partial derivative of $f$ with respect to the component $x_j$ exists and write

\[
\frac{\partial f(x)}{\partial x_j} := f'(x; e_j).
\]
In particular, we have

\[
f(x + te_j) = f(x) + t\frac{\partial f(x)}{\partial x_j} + o(t), \quad \text{where } \lim_{t \to 0} \frac{o(t)}{t} = 0.
\]
Note that $\frac{\partial f(\cdot)}{\partial x_j} : \mathbb{R}^n \to \mathbb{R}$.
(3) We say that $f$ is (Fréchet) differentiable at $x \in \mathbb{R}^n$ if there is a vector $g \in \mathbb{R}^n$ such that
\[
\lim_{y \to x} \frac{|f(y) - f(x) - g^T(y - x)|}{\|y - x\|} = 0.
\]
If such a vector $g$ exists, we write $g = \nabla f(x)$ and call $\nabla f(x)$ the gradient of $f$ at $x$. In particular, the differentiability of $f$ at $x$ is equivalent to the following statement:
\[
f(y) = f(x) + \nabla f(x)^T (y - x) + o(\|y - x\|)
\]
for all $y$ near $x$, where $\lim_{y \to x} \frac{o(\|y - x\|)}{\|y - x\|} = 0$.
(4) We say that $F$ is (Fréchet) differentiable at $x \in \mathbb{R}^n$ if there is a matrix $J \in \mathbb{R}^{m \times n}$ such that
\[
\lim_{y \to x} \frac{\|F(y) - F(x) - J(y - x)\|}{\|y - x\|} = 0.
\]
If such a matrix $J$ exists, we write $J = \nabla F(x)$ and call $\nabla F(x)$ the Jacobian of $F$ at $x$. In particular, the differentiability of $F$ at $x$ is equivalent to the following statement:
\[
F(y) = F(x) + \nabla F(x)(y - x) + o(\|y - x\|)
\]
for all $y$ near $x$, where $\lim_{y \to x} \frac{o(\|y - x\|)}{\|y - x\|} = 0$.

Remark 2.1. Note that there is an inconsistency here in the use of the $\nabla$ notation when $F : \mathbb{R}^n \to \mathbb{R}^m$ with $m = 1$. The inconsistency arises due to the presence of $g^T$ in Part (3) of Definition 2.1 and the absence of a transpose in Part (4) of this definition. For this reason, we must take extra care in interpreting this notation in this case.

Remark 2.2. [Little-o Notation] In these notes we use the notation $o(t)$ to represent any element of a function class for which $\lim_{t \to 0} \frac{o(t)}{t} = 0$. In particular, this implies that for all $\alpha \in \mathbb{R}$
\[
\alpha\, o(t) = o(t), \quad o(t) + o(t) = o(t), \quad \text{and} \quad t^s o(t^r) = o(t^{r+s}).
\]

Several observations about these notions of differentiability are in order. First, neither the existence of the directional derivative $f'(x; d)$ nor the differentiability of $f$ at $x$ in the direction $d$ requires the continuity of the function at that point. Second, the existence of $f'(x; d)$ in all directions $d$ does not imply the continuity of the mapping $d \mapsto f'(x; d)$. Therefore, the directional derivative, although useful, is a very weak object for describing the local variational properties of a function. On the other hand, differentiability is a very powerful statement. A few consequences of differentiability are listed in the following theorem.

Theorem 2.1. Let $f : \mathbb{R}^n \to \mathbb{R}$ and $F : \mathbb{R}^n \to \mathbb{R}^m$.
(1) If $f$ is differentiable at $x$, then
\[
\nabla f(x) = \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{bmatrix},
\]
and $f'(x; d) = \nabla f(x)^T d$ for all $d \in \mathbb{R}^n$.
(2) If $F$ is differentiable at $x$, then

\[
(\nabla F(x))_{ij} = \frac{\partial F_i(x)}{\partial x_j}, \quad i = 1, \dots, m \text{ and } j = 1, 2, \dots, n.
\]
(3) If $F$ is differentiable at a point $x$, then it is necessarily continuous at $x$.

Higher order derivatives are obtained by applying these notions of differentiability to the derivatives themselves. For example, to compute the second derivative, the first derivative needs to exist at all points near the point at which the second derivative is to be computed, so that the necessary limit is well defined. From the above, we know that the partial derivative $\frac{\partial F_i(x)}{\partial x_j}$, when it exists, is a mapping from $\mathbb{R}^n$ to $\mathbb{R}$. Therefore, it is possible to consider the partial derivatives of these partial derivatives. For such partial derivatives we use the notation

\[
(62) \quad \frac{\partial^2 F_i(x)}{\partial x_j \partial x_k} := \frac{\partial}{\partial x_j}\left( \frac{\partial F_i(x)}{\partial x_k} \right).
\]
The second derivative of $f : \mathbb{R}^n \to \mathbb{R}$ is the derivative of the mapping $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$, and we write $\nabla(\nabla f(x)) =: \nabla^2 f(x)$. We call $\nabla^2 f(x)$ the Hessian of $f$ at $x$. By (62), we have

\[
\nabla^2 f(x) = \begin{bmatrix}
\frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_n \partial x_1} \\
\frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f(x)}{\partial x_1 \partial x_n} & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2}
\end{bmatrix}.
\]
We have the following key property of the Hessian.

Theorem 2.2. Let $f : \mathbb{R}^n \to \mathbb{R}$ be such that all of the second partials $\frac{\partial^2 f(x)}{\partial x_i \partial x_j}$, $i, j = 1, 2, \dots, n$, exist and are continuous near $x \in \mathbb{R}^n$. Then $\nabla^2 f(x)$ is a real symmetric matrix, i.e., $\nabla^2 f(x) = \nabla^2 f(x)^T$.

The partial derivative representations of the gradient, Hessian, and Jacobian matrices are a convenient tool for computing these objects. For example, if we have

\[
f(x) := 3x_1^2 + x_1 x_2 x_3,
\]
then
\[
\nabla f(x) = \begin{bmatrix} 6x_1 + x_2 x_3 \\ x_1 x_3 \\ x_1 x_2 \end{bmatrix} \quad \text{and} \quad \nabla^2 f(x) = \begin{bmatrix} 6 & x_3 & x_2 \\ x_3 & 0 & x_1 \\ x_2 & x_1 & 0 \end{bmatrix}.
\]
However, the partial derivatives are not the only tool for computing derivatives. In many cases, it is easier to compute the gradient, Hessian, and/or Jacobian directly from the definition using the little-o notation.
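Hand-computed gradients such as the one above can be checked against finite differences. The following sketch does so for this $f$; the step size $h$ and the test point are arbitrary choices.

```python
import numpy as np

f = lambda x: 3 * x[0]**2 + x[0] * x[1] * x[2]
grad = lambda x: np.array([6 * x[0] + x[1] * x[2], x[0] * x[2], x[0] * x[1]])

def fd_grad(f, x, h=1e-6):
    # Central differences approximating the partial derivatives of f.
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.0, 2.0, -1.0])
print(grad(x))                              # analytic: [ 4. -1.  2.]
print(np.allclose(fd_grad(f, x), grad(x)))  # True (to roundoff)
```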

3. The Delta Method for Computing Derivatives

Recall that a function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be differentiable at a point $x$ if there is a vector $g \in \mathbb{R}^n$ such that
\[
(63) \quad f(x + \Delta x) = f(x) + g^T \Delta x + o(\|\Delta x\|).
\]

Hence, if we can write $f(x + \Delta x)$ in this form, then $g = \nabla f(x)$. To see how to use this idea, consider the least squares objective function
\[
f(x) = \tfrac{1}{2}\|Ax - b\|_2^2, \quad \text{where } A \in \mathbb{R}^{m \times n},\ b \in \mathbb{R}^m.
\]
Then

\[
(64) \quad \begin{aligned}
f(x + \Delta x) &= \tfrac{1}{2}\|A(x + \Delta x) - b\|_2^2 \\
&= \tfrac{1}{2}\|(Ax - b) + A\Delta x\|_2^2 \\
&= \tfrac{1}{2}\|Ax - b\|_2^2 + (Ax - b)^T A\Delta x + \tfrac{1}{2}\|A\Delta x\|_2^2 \\
&= f(x) + (A^T(Ax - b))^T \Delta x + \tfrac{1}{2}\|A\Delta x\|_2^2.
\end{aligned}
\]

In this expression, $\tfrac{1}{2}\|A\Delta x\|_2^2 = o(\|\Delta x\|_2)$ since

\[
\frac{\tfrac{1}{2}\|A\Delta x\|_2^2}{\|\Delta x\|_2} = \frac{1}{2}\left\|A\frac{\Delta x}{\|\Delta x\|_2}\right\|_2^2 \|\Delta x\|_2 \le \tfrac{1}{2}\|A\|_F^2 \|\Delta x\|_2 \to 0 \quad \text{as } \|\Delta x\|_2 \to 0,
\]
where the inequality follows from (61). Therefore, by (63), the expression (64) tells us that

\[
\nabla f(x) = A^T(Ax - b).
\]

This approach to computing the derivative of a function is called the delta method. In a similar manner it can be used to compute the Hessian of f by applying the approach to ∇f:

\[
\nabla f(x + \Delta x) = A^T(A(x + \Delta x) - b) = A^T(Ax - b) + A^T A\,\Delta x = \nabla f(x) + A^T A\,\Delta x,
\]
and, hence, $\nabla^2 f(x) = A^T A$. Let us now apply the delta method to compute the gradient and Hessian of the quadratic function

\[
f(x) := \tfrac{1}{2}x^T H x + g^T x, \quad \text{where } H \in S^n,\ g \in \mathbb{R}^n.
\]
Then
\[
\begin{aligned}
f(x + \Delta x) &= \tfrac{1}{2}(x + \Delta x)^T H (x + \Delta x) + g^T (x + \Delta x) \\
&= \tfrac{1}{2}x^T H x + g^T x + (Hx + g)^T \Delta x + \tfrac{1}{2}\Delta x^T H \Delta x \\
&= f(x) + (Hx + g)^T \Delta x + \tfrac{1}{2}\Delta x^T H \Delta x,
\end{aligned}
\]
where $\tfrac{1}{2}\Delta x^T H \Delta x = o(\|\Delta x\|_2)$ since

\[
\frac{\tfrac{1}{2}\Delta x^T H \Delta x}{\|\Delta x\|_2} = \tfrac{1}{2}\Delta x^T H \frac{\Delta x}{\|\Delta x\|_2} \to 0 \quad \text{as } \|\Delta x\|_2 \to 0.
\]
Therefore, by (63), we must have
\[
\nabla f(x) = Hx + g.
\]
Again, we compute the Hessian by applying the delta method to the gradient:

\[
\nabla f(x + \Delta x) = H(x + \Delta x) + g = (Hx + g) + H\Delta x = \nabla f(x) + H\Delta x,
\]
and so $\nabla^2 f(x) = H$.
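Both delta-method computations are easy to sanity-check numerically. The sketch below compares the derived gradients $A^T(Ax - b)$ and $Hx + g$ against central finite differences on random data (a verification aid, not part of the development).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
H0 = rng.standard_normal((3, 3))
H = H0 + H0.T                      # symmetric H, as the text assumes
g = rng.standard_normal(3)
x = rng.standard_normal(3)

def fd_grad(f, x, h=1e-6):
    # Central-difference gradient, as in the earlier sketch.
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                     for e in np.eye(x.size)])

# Least squares: f(x) = 0.5||Ax - b||^2, gradient A^T(Ax - b).
f_ls = lambda x: 0.5 * np.linalg.norm(A @ x - b)**2
print(np.allclose(fd_grad(f_ls, x), A.T @ (A @ x - b)))  # True

# Quadratic: f(x) = 0.5 x^T H x + g^T x, gradient Hx + g.
f_q = lambda x: 0.5 * x @ H @ x + g @ x
print(np.allclose(fd_grad(f_q, x), H @ x + g))           # True
```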

4. Differential Calculus

There are many further tools for computing derivatives that do not require a direct appeal to either the partial derivatives or the delta method. These tools allow us to compute new derivatives from derivatives that are already known, based on a calculus of differentiation. We are familiar with this differential calculus for functions mapping $\mathbb{R}$ to $\mathbb{R}$. Here we show how a few of these calculus rules extend to mappings from $\mathbb{R}^n$ to $\mathbb{R}^m$. The most elementary of these are the facts that the derivative of a scalar multiple of a function equals the scalar multiple of the derivative and that the derivative of a sum is the sum of the derivatives: given $F : \mathbb{R}^n \to \mathbb{R}^m$, $G : \mathbb{R}^n \to \mathbb{R}^m$, and $\alpha \in \mathbb{R}$,
\[
\nabla(\alpha F) = \alpha \nabla F \quad \text{and} \quad \nabla(F + G) = \nabla F + \nabla G.
\]
These rules are themselves derivable from the much more powerful chain rule.

Theorem 4.1. Let $F : \mathbb{R}^n \to \mathbb{R}^m$ and $H : \mathbb{R}^m \to \mathbb{R}^k$ be such that $F$ is differentiable at $x$ and $H$ is differentiable at $F(x)$. Then $G := H \circ F$ is differentiable at $x$ with
\[
\nabla G(x) = \nabla H(F(x))\, \nabla F(x).
\]
Remark 4.1. As noted in Remark 2.1, one must take special care in the interpretation of this chain rule when $k = 1$ due to the presence of an additional transpose. In this case,
\[
\nabla G(x) = \nabla F(x)^T \nabla H(F(x)).
\]

For example, let $F : \mathbb{R}^n \to \mathbb{R}^m$ and consider the function
\[
f(x) := \tfrac{1}{2}\|F(x)\|_2^2 = \left( \tfrac{1}{2}\|\cdot\|_2^2 \right) \circ F(x),
\]
that is, we are composing half the 2-norm squared with $F$. Since $\nabla(\tfrac{1}{2}\|\cdot\|_2^2)(y) = y$, we have
\[
\nabla f(x) = \nabla F(x)^T F(x).
\]
This chain rule computation can be verified using the delta method:
\[
\begin{aligned}
f(x + \Delta x) &= \tfrac{1}{2}\|F(x + \Delta x)\|_2^2 \\
&= \tfrac{1}{2}\|F(x) + \nabla F(x)\Delta x + o(\|\Delta x\|_2)\|_2^2 \\
&= \tfrac{1}{2}\|F(x) + \nabla F(x)\Delta x\|_2^2 + (F(x) + \nabla F(x)\Delta x)^T o(\|\Delta x\|_2) + \tfrac{1}{2}\|o(\|\Delta x\|_2)\|_2^2 \\
&= \tfrac{1}{2}\|F(x) + \nabla F(x)\Delta x\|_2^2 + o(\|\Delta x\|_2) \\
&= \tfrac{1}{2}\|F(x)\|_2^2 + (\nabla F(x)^T F(x))^T \Delta x + \tfrac{1}{2}\|\nabla F(x)\Delta x\|_2^2 + o(\|\Delta x\|_2) \\
&= f(x) + (\nabla F(x)^T F(x))^T \Delta x + o(\|\Delta x\|_2),
\end{aligned}
\]
where $\lim_{t \to 0} \frac{o(t)}{t} = 0$ and we have used this notation as described in Remark 2.2. Hence, again, $\nabla f(x) = \nabla F(x)^T F(x)$.
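As a numerical check of the identity $\nabla f(x) = \nabla F(x)^T F(x)$, the sketch below uses a small made-up map $F : \mathbb{R}^2 \to \mathbb{R}^3$ (our own example, not from the text) and compares the chain-rule gradient with finite differences.

```python
import numpy as np

# A small made-up nonlinear map F : R^2 -> R^3 (illustration only).
F = lambda x: np.array([np.sin(x[0]), x[0] * x[1], np.exp(x[1])])

def jacobian(F, x, h=1e-6):
    # Central-difference approximation of the Jacobian, one column per coordinate.
    return np.column_stack([(F(x + h * e) - F(x - h * e)) / (2 * h)
                            for e in np.eye(x.size)])

f = lambda x: 0.5 * np.linalg.norm(F(x))**2
x = np.array([0.3, -0.7])

grad_chain = jacobian(F, x).T @ F(x)  # chain rule: grad f = (Jacobian)^T F
grad_fd = np.array([(f(x + 1e-6 * e) - f(x - 1e-6 * e)) / 2e-6
                    for e in np.eye(2)])
print(np.allclose(grad_chain, grad_fd, atol=1e-5))  # True
```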

5. The Mean Value Theorem

Given $f : \mathbb{R}^n \to \mathbb{R}$, the defining formula for the derivative,
\[
f(y) = f(x) + \nabla f(x)^T (y - x) + o(\|y - x\|),
\]
is a powerful tool for understanding the local behavior of the function $f$ near $x$. If we drop the little-o term from the right hand side, we obtain the first-order Taylor expansion of $f$ at $x$. This is called a first-order approximation to $f$ at $x$ due to the fact that the power of $\|y - x\|$ in the error term $o(\|y - x\|)$ is 1. Higher order approximations to $f$ can be obtained using higher order derivatives. But before turning to these approximations, we make a closer study of the first-order expansion. In particular, we wish to extend the Mean Value Theorem to functions of many variables.

Theorem 5.1. [1-Dimensional Mean Value Theorem] Let $\phi : \mathbb{R} \to \mathbb{R}$ be $k + 1$ times differentiable on an open interval $(a, b) \subset \mathbb{R}$. Then, for every $x, y \in (a, b)$ with $x \ne y$, there exists a $z \in (a, b)$ strictly between $x$ and $y$ such that
\[
\phi(y) = \phi(x) + \phi'(x)(y - x) + \cdots + \frac{1}{k!}\phi^{(k)}(x)(y - x)^k + \frac{1}{(k + 1)!}\phi^{(k+1)}(z)(y - x)^{k+1}.
\]
We use this result to easily obtain the following mean value theorem for functions mapping $\mathbb{R}^n$ to $\mathbb{R}$.

Theorem 5.2. [n-Dimensional Mean Value Theorem] Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable on an open set containing the two points $x, y \in \mathbb{R}^n$ with $x \ne y$. Define the closed and open line segments connecting $x$ and $y$ by
\[
[x, y] := \{ (1 - \lambda)x + \lambda y \mid 0 \le \lambda \le 1 \} \quad \text{and} \quad (x, y) := \{ (1 - \lambda)x + \lambda y \mid 0 < \lambda < 1 \},
\]
respectively. Then there exist $z, w \in (x, y)$ such that

\[
f(y) = f(x) + \nabla f(z)^T (y - x) \quad \text{and} \quad f(y) = f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2}(y - x)^T \nabla^2 f(w)(y - x),
\]
where the second formula additionally requires $f$ to be twice differentiable on this open set.

Proof. Define the function $\phi : \mathbb{R} \to \mathbb{R}$ by $\phi(t) := f(x + t(y - x))$. Since $f$ is differentiable, so is $\phi$, and the chain rule tells us that
\[
\phi'(t) = \nabla f(x + t(y - x))^T (y - x) \quad \text{and} \quad \phi''(t) = (y - x)^T \nabla^2 f(x + t(y - x))(y - x).
\]
By applying the Mean Value Theorem 5.1 to $\phi$ we obtain the existence of $t, s \in (0, 1)$ such that
\[
f(y) = \phi(1) = \phi(0) + \phi'(t)(1 - 0) = f(x) + \nabla f(x + t(y - x))^T (y - x)
\]
and

\[
f(y) = \phi(1) = \phi(0) + \phi'(0)(1 - 0) + \tfrac{1}{2}\phi''(s)(1 - 0)^2 = f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2}(y - x)^T \nabla^2 f(x + s(y - x))(y - x).
\]
By setting $z := x + t(y - x)$ and $w := x + s(y - x)$ we obtain the result. $\square$

In a similar manner we can apply the Fundamental Theorem of Calculus to such functions.

Theorem 5.3. Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable on an open set containing the two points $x, y \in \mathbb{R}^n$ with $x \ne y$. Then
\[
f(y) = f(x) + \int_0^1 \nabla f(x + t(y - x))^T (y - x)\, dt.
\]
Proof. Apply the Fundamental Theorem of Calculus to the function $\phi$ defined in the proof of Theorem 5.2. $\square$
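Theorem 5.3 can be illustrated numerically: for a test function of our own choosing, a simple trapezoid-rule approximation of the line integral should reproduce $f(y) - f(x)$.

```python
import numpy as np

H = np.array([[2.0, 0.5], [0.5, 1.0]])
g = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ H @ x + g @ x
grad = lambda x: H @ x + g

x = np.array([0.0, 1.0])
y = np.array([2.0, -1.0])

# Trapezoid-rule approximation of the integral of grad f(x + t(y-x))^T (y-x)
# over t in [0, 1].
ts = np.linspace(0.0, 1.0, 1001)
vals = np.array([grad(x + t * (y - x)) @ (y - x) for t in ts])
dt = ts[1] - ts[0]
integral = dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

print(np.isclose(f(y) - f(x), integral))  # True
```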

Unfortunately, the Mean Value Theorem does not extend to general differentiable functions mapping $\mathbb{R}^n$ to $\mathbb{R}^m$ for $m > 1$. Nonetheless, we have the following approximate result.

Theorem 5.4. Let $F : \mathbb{R}^n \to \mathbb{R}^m$ be differentiable on an open set containing the two points $x, y \in \mathbb{R}^n$ with $x \ne y$. Then
\[
(65) \quad \|F(y) - F(x)\|_2 \le \left( \max_{z \in [x, y]} \|\nabla F(z)\|_F \right) \|y - x\|_2.
\]
Proof. By the Fundamental Theorem of Calculus, we have
\[
F(y) - F(x) = \begin{bmatrix} \int_0^1 \nabla F_1(x + t(y - x))^T (y - x)\, dt \\ \vdots \\ \int_0^1 \nabla F_m(x + t(y - x))^T (y - x)\, dt \end{bmatrix} = \int_0^1 \nabla F(x + t(y - x))(y - x)\, dt.
\]
Therefore,

\[
\begin{aligned}
\|F(y) - F(x)\|_2 &= \left\| \int_0^1 \nabla F(x + t(y - x))(y - x)\, dt \right\|_2 \\
&\le \int_0^1 \|\nabla F(x + t(y - x))(y - x)\|_2\, dt \\
&\le \int_0^1 \|\nabla F(x + t(y - x))\|_F \|y - x\|_2\, dt \\
&\le \left( \max_{z \in [x, y]} \|\nabla F(z)\|_F \right) \|y - x\|_2. \qquad \square
\end{aligned}
\]

The bound (65) is very useful in many applications. But it can be simplified in cases where $\nabla F$ is known to be continuous, since in this case the Weierstrass Extreme Value Theorem says that, for every $\beta > 0$,
\[
\max_{z \in \beta B} \|\nabla F(z)\|_F =: K < \infty.
\]
Hence, by Theorem 5.4,

\[
\|F(x) - F(y)\|_2 \le K \|x - y\|_2 \quad \forall\, x, y \in \beta B.
\]
This kind of inequality is extremely useful and leads to the following notion of continuity.

Definition 5.1. [Lipschitz Continuity] We say that $F : \mathbb{R}^n \to \mathbb{R}^m$ is Lipschitz continuous on a set $S \subset \mathbb{R}^n$ if there exists a constant $K > 0$ such that
\[
\|F(x) - F(y)\| \le K \|x - y\| \quad \forall\, x, y \in S.
\]
The constant $K$ is called the modulus of Lipschitz continuity for $F$ over $S$, and depends on the choice of norms for $\mathbb{R}^n$ and $\mathbb{R}^m$.

As one application of Lipschitz continuity, we give the following lemma concerning the accuracy of the first-order Taylor approximation of a function.

Lemma 5.1. [Quadratic Bound Lemma] Let $F : \mathbb{R}^n \to \mathbb{R}^m$ be such that $\nabla F$ is Lipschitz continuous on the set $S \subset \mathbb{R}^n$. If $x, y \in S$ are such that $[x, y] \subset S$, then
\[
\|F(y) - (F(x) + \nabla F(x)(y - x))\|_2 \le \frac{K}{2} \|y - x\|_2^2,
\]
where $K$ is the modulus of Lipschitz continuity for $\nabla F$ on $S$.

Proof. Observe that
\[
F(y) - F(x) - \nabla F(x)(y - x) = \int_0^1 \nabla F(x + t(y - x))(y - x)\, dt - \nabla F(x)(y - x) = \int_0^1 [\nabla F(x + t(y - x)) - \nabla F(x)](y - x)\, dt.
\]
Hence

\[
\begin{aligned}
\|F(y) - (F(x) + \nabla F(x)(y - x))\|_2 &= \left\| \int_0^1 [\nabla F(x + t(y - x)) - \nabla F(x)](y - x)\, dt \right\|_2 \\
&\le \int_0^1 \|[\nabla F(x + t(y - x)) - \nabla F(x)](y - x)\|_2\, dt \\
&\le \int_0^1 \|\nabla F(x + t(y - x)) - \nabla F(x)\|_F \|y - x\|_2\, dt \\
&\le \int_0^1 K t \|y - x\|_2^2\, dt \\
&= \frac{K}{2} \|y - x\|_2^2. \qquad \square
\end{aligned}
\]
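The quadratic bound can be observed numerically: halving $\|y - x\|$ should roughly quarter the linearization error. The sketch below does this for a made-up smooth $F$ (our own example, chosen only for illustration).

```python
import numpy as np

F = lambda x: np.array([np.sin(x[0]), x[0]**2 + x[1], np.cos(x[1])])
J = lambda x: np.array([[np.cos(x[0]),  0.0],
                        [2 * x[0],      1.0],
                        [0.0,          -np.sin(x[1])]])

x = np.array([0.5, -0.2])
d = np.array([1.0, 2.0])
for t in [1.0, 0.5, 0.25, 0.125]:
    y = x + t * d
    err = np.linalg.norm(F(y) - (F(x) + J(x) @ (y - x)))
    print(f"||y - x|| = {np.linalg.norm(y - x):.4f}, error = {err:.2e}")
# Halving ||y - x|| cuts the error by roughly a factor of 4,
# consistent with the (K/2)||y - x||^2 bound of Lemma 5.1.
```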

The Mean Value Theorem also allows us to obtain the following second order approximation.

Theorem 5.5. Let $f : \mathbb{R}^n \to \mathbb{R}$ and suppose that $\nabla^2 f(x)$ exists. Then
\[
(66) \quad f(y) = f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2}(y - x)^T \nabla^2 f(x)(y - x) + o(\|y - x\|^2).
\]
Proof. The mean value theorem tells us that for every $y$ near $x$ there is a $z \in (x, y)$ such that
\[
\begin{aligned}
f(y) &= f(x) + \nabla f(z)^T (y - x) \\
&= f(x) + (\nabla f(x) + \nabla^2 f(x)(y - x) + o(\|y - x\|))^T (y - x) \\
&= f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2}(y - x)^T \nabla^2 f(x)(y - x) + o(\|y - x\|^2). \qquad \square
\end{aligned}
\]
If we drop the $o(\|y - x\|^2)$ term in equation (66), we obtain the second-order Taylor approximation to $f$ at $x$. This is a second-order approximation since the power of $\|y - x\|$ in the little-o term is 2, i.e., $o(\|y - x\|^2)$.
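Finally, a numerical illustration of (66) with a test function of our own choosing: the ratio of the second-order Taylor error to $\|y - x\|^2$ tends to zero, as the little-o term requires.

```python
import numpy as np

# f(x) = e^{x1} cos(x2), with its exact gradient and Hessian.
f = lambda x: np.exp(x[0]) * np.cos(x[1])
grad = lambda x: np.array([np.exp(x[0]) * np.cos(x[1]),
                           -np.exp(x[0]) * np.sin(x[1])])
hess = lambda x: np.array([[ np.exp(x[0]) * np.cos(x[1]), -np.exp(x[0]) * np.sin(x[1])],
                           [-np.exp(x[0]) * np.sin(x[1]), -np.exp(x[0]) * np.cos(x[1])]])

x = np.array([0.1, 0.2])
d = np.array([1.0, -1.0])
for t in [1e-1, 1e-2, 1e-3]:
    y = x + t * d
    taylor2 = f(x) + grad(x) @ (y - x) + 0.5 * (y - x) @ hess(x) @ (y - x)
    h = np.linalg.norm(y - x)
    print(f"h = {h:.1e}, |f(y) - taylor2| / h^2 = {abs(f(y) - taylor2) / h**2:.2e}")
# The ratio error / h^2 tends to 0 as h -> 0, i.e., the error is o(h^2).
```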