
Separation in General Normed Vector Spaces^1

John Nachbar
Washington University
March 12, 2016

1 Introduction

Recall the Basic Separation Theorem for convex sets in R^N.

Theorem 1. Let A ⊆ R^N be non-empty, closed, and convex. If 0 ∉ A then there is a v ∈ R^N, v ≠ 0, such that v · a ≥ v · v for all a ∈ A.

The goal here is to extend this result to general normed vector spaces over the reals. (See the notes on Vector Spaces and Norms.)

I have stated the Basic Separation Theorem for R^N using inner products. The next section discusses the fact that this approach does not generalize. In general vector spaces, it is necessary instead to work with real-valued linear functions. In R^N, such functions can always be represented as inner products. In general vector spaces, this is no longer true.
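As a concrete finite-dimensional illustration (my own example, not part of the notes), the following sketch takes A to be the closed half-plane {x ∈ R^2 : x_1 ≥ 1}, which excludes the origin, and v = (1, 0), the point of A closest to the origin, and checks the inequality v · a ≥ v · v on sampled points of A:

```python
import numpy as np

# Separating vector for A = {x in R^2 : x_1 >= 1}: the point of A
# closest to the origin is v = (1, 0), and v . v = 1.
v = np.array([1.0, 0.0])

# Sample points of A: first coordinate >= 1, second coordinate arbitrary.
rng = np.random.default_rng(0)
a1 = 1.0 + 10.0 * rng.random(1000)
a2 = 10.0 * rng.normal(size=1000)
samples = np.column_stack([a1, a2])

gap = samples @ v - v @ v          # v . a - v . v for each sampled a
assert gap.min() >= -1e-12         # v . a >= v . v throughout A
```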

2 Linear Functions

Given a vector space X, a function F : X → R is linear iff for any x, x̂ ∈ X and any θ ∈ R,

1. F(θx) = θF(x),

2. F(x + x̂) = F(x) + F(x̂).

Taking θ = 0, the first condition implies that the graph of a linear function always goes through the origin: F(0) = 0.

It is common to use the word "functional" to describe a real-valued function defined on a general vector space. To avoid excess jargon, I call a functional simply a "real-valued function."

In R^N, but not more generally, a real-valued function is linear iff it can be written as an inner product. Thus, for R^N, writing the Basic Separation Theorem in terms of the inner product v · x is equivalent to writing the theorem in terms of the linear function F(x) = v · x.

Theorem 2. F : R^N → R is linear iff there is a v ∈ R^N such that for all x ∈ R^N, F(x) = x · v.

^1 cbna. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License.

Proof.

1. ⇒. Let e^n be the n-th coordinate vector: e^n = (0, ..., 0, 1, 0, ..., 0), with a 1 in the n-th place. Define v_n = F(e^n). Then, for any x = (x_1, ..., x_N) = Σ_n x_n e^n, the linearity of F implies that F(x) = F(Σ_n x_n e^n) = Σ_n x_n F(e^n) = Σ_n x_n v_n = x · v.

2. ⇐. Almost immediate.
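The ⇒ construction can be checked numerically. In the sketch below (my own example; the vector `hidden` is an arbitrary choice used only to define a linear F), v is recovered by evaluating F at the coordinate vectors, and F(x) = x · v is then verified:

```python
import numpy as np

# Recover v from a linear F on R^N by evaluating F at the coordinate
# vectors e^n, exactly as in the proof of Theorem 2.
N = 4
hidden = np.array([2.0, -1.0, 0.5, 3.0])       # defines a linear F
def F(x):
    return float(hidden @ x)

v = np.array([F(e) for e in np.eye(N)])        # v_n = F(e^n)

x = np.array([1.0, 2.0, -3.0, 0.25])
assert abs(F(x) - x @ v) < 1e-12               # F(x) = x . v
assert np.allclose(v, hidden)                  # the recovered v is hidden
```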



For a general vector space X, let X* denote the set of real-valued linear functions defined on X. X* is called the dual of X. The above theorem says that R^N is "self-dual:" every real-valued linear function F on R^N is identified with an element v ∈ R^N, and conversely. Abusing notation, (R^N)* = R^N. One can also show that, similarly, (L_2)* = L_2.

But not all vector spaces are self-dual. Consider ℓ^∞. For many v, x ∈ ℓ^∞, v · x is not even well defined. For example, if v = (1, 1, ...) then v · v should be Σ_n (v_n)^2, but the latter is infinite. One can partially fix this problem by restricting v to ℓ^1, the vector space consisting of the points in R^∞ that are absolutely summable: ‖x‖_1 = Σ_n |x_n| < ∞.
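A small numeric sketch of this (my own example): partial sums of v · x grow without bound when v = x = (1, 1, ...), but converge when v is absolutely summable and x is bounded:

```python
# Partial sums of sum_n v_n * x_n for two choices of v paired with
# the bounded sequence x = (1, 1, ...).
def partial_sum(v_term, x_term, N):
    return sum(v_term(n) * x_term(n) for n in range(1, N + 1))

ones = lambda n: 1.0                           # in l-infinity, not l-1
geom = lambda n: 0.5 ** n                      # in l-1: sum of |0.5^n| = 1

assert partial_sum(ones, ones, 1000) == 1000.0          # diverges with N
assert abs(partial_sum(geom, ones, 60) - 1.0) < 1e-12   # converges to 1
```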

For any v ∈ ℓ^1, the function F : ℓ^∞ → R given by F(x) = v · x = Σ_n v_n x_n is well defined. Therefore ℓ^1 ⊆ (ℓ^∞)*. Unfortunately, ℓ^1 is a proper subset of (ℓ^∞)*: there exist elements of (ℓ^∞)* that cannot be represented as inner products with elements of ℓ^1, and in fact cannot be represented as inner products in any standard sense. The canonical examples are the Banach Limits, real-valued linear functions on ℓ^∞ whose existence is implied by the Hahn-Banach Extension Theorem, given below. Because they have no inner product representation, Banach Limits are hard to visualize, but they operate somewhat like limits of averages,

lim_{N→∞} (1/N) Σ_{n=1}^{N} x_n.

Unlike limits of averages, which are not defined if the limit does not exist, Banach Limits are defined for all x ∈ ℓ^∞. Banach Limits have occasionally been used in economic theory to formulate the utility functions of decision makers who are, loosely speaking, infinitely patient.

In any vector space, a plane is, by definition, the level set of a real-valued linear function. The fact that there are elements of (ℓ^∞)* that have no inner-product interpretation means that there are planes in ℓ^∞ that cannot be represented via

inner products. Since such planes may be needed for separation, the implication is that the Basic Separation Theorem for general normed vector spaces cannot rely on inner products.
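The averaging interpretation can be illustrated numerically (my own example): the alternating sequence (1, 0, 1, 0, ...) has no limit, yet its averages converge to 1/2, the value a Banach Limit assigns to it:

```python
# Averages (1/N) * sum_{n=1}^{N} x_n of the alternating sequence
# x = (1, 0, 1, 0, ...), which itself has no limit.
def average(N):
    x = [n % 2 for n in range(1, N + 1)]       # 1, 0, 1, 0, ...
    return sum(x) / N

assert abs(average(10_000) - 0.5) < 1e-3
assert abs(average(100_000) - 0.5) < 1e-4
```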

3 The Basic Separation Theorem

Theorem 3. Let X be a normed vector space and let A be a non-empty, closed, convex subset of X. If 0 ∉ A then there is a linear function F : X → R and an r > 0 such that for all a ∈ A, F(a) > r.

The Basic Separation Theorem is a consequence of the Hahn-Banach Extension Theorem, which I state and prove in the next section. The Basic Separation Theorem can be used in turn to establish the Separating Hyperplane Theorem for closed convex sets, at least one of which is compact, exactly as in R^N.

Generalizing the Supporting Hyperplane Theorem is less straightforward. In R^N, the Supporting Hyperplane Theorem says that if A is non-empty, closed, and convex and x* is not an interior point of A then there is a linear function F : R^N → R such that for all a ∈ A, F(a) ≥ F(x*). For general normed vector spaces, a condition stronger than "x* is not an interior point of A" is needed.

4 The Hahn-Banach Extension Theorem

The Hahn-Banach Theorem is a relative of the Supporting Hyperplane Theorem. It is stated for special types of convex sets, namely convex sets generated by sublinear functions. A function p : X → R is sublinear iff for any x, x̂ ∈ X and any θ ∈ R, θ ≥ 0,

1. p(θx) = θp(x),

2. p(x + x̂) ≤ p(x) + p(x̂).

By the first property, if p is sublinear then p(0) = 0. Any linear function is sublinear. An example of a sublinear function that is not linear is p : R → R, p(x) = |x|. More generally, in any normed vector space, the norm is sublinear.

Any sublinear function is convex: p(θx + (1 − θ)x̂) ≤ p(θx) + p((1 − θ)x̂) = θp(x) + (1 − θ)p(x̂). Recall that a function is convex iff its epigraph, which is the set of points lying on or above the graph, is convex. Thus, the epigraph of a sublinear function is convex. In fact, one can readily verify that the epigraph of a sublinear function is a closed, convex cone. The epigraph of p : R → R, p(x) = |x| provides an illustration. The Hahn-Banach theorem is a statement about supporting this cone at the origin, which is the cone's vertex.

Theorem 4 (Hahn-Banach Extension Theorem). Let W be a vector subspace of a vector space X. Let f : W → R be linear and p : X → R be sublinear. If for all x ∈ W, f(x) ≤ p(x), then there is a linear function F : X → R such that,

1. For all x ∈ W, f(x) = F(x),

2. For all x ∈ X, F(x) ≤ p(x).

To interpret the statement of the Hahn-Banach Theorem, focus on the finite dimensional case: X = R^N. The epigraph of p is then a closed, convex cone in R^{N+1}. If F(x) ≤ p(x) for all x ∈ R^N then the graph of F, which is a plane in R^{N+1}, lies on or below the cone determined by the epigraph of p: the graph of F supports this cone at the origin. Thus if W = {0}, in which case f is trivially defined by f(0) = 0, then Hahn-Banach says that the closed convex cone generated by the epigraph of p can be supported at the origin, something we already knew from the Supporting Hyperplane Theorem.

Hahn-Banach generalizes this in two ways. First, and most importantly, it allows X to be a general vector space rather than just X = R^N. Second, it accommodates some restrictions on the supporting plane. If one already has a real-valued linear function f that supports the epigraph of p on a subspace W, then the Hahn-Banach Theorem says that one can find a real-valued linear function F, agreeing with f on W, that supports the epigraph of p on all of X. That is, the supporting plane through 0 defined by F must contain the lower dimensional plane through 0 defined by f.
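The two defining properties of sublinearity can be spot-checked numerically. The sketch below (my own check, not from the notes) tests positive homogeneity and subadditivity for the ℓ^1 norm on R^3 at random points:

```python
import numpy as np

# Spot-check sublinearity of p(x) = ||x||_1 on R^3:
# property 1 (p(theta x) = theta p(x) for theta >= 0) and
# property 2 (p(x + xhat) <= p(x) + p(xhat)).
def p(x):
    return float(np.sum(np.abs(x)))

rng = np.random.default_rng(1)
for _ in range(1000):
    x, xhat = rng.normal(size=3), rng.normal(size=3)
    theta = 10.0 * rng.random()                        # theta >= 0
    assert abs(p(theta * x) - theta * p(x)) < 1e-9     # homogeneity
    assert p(x + xhat) <= p(x) + p(xhat) + 1e-12       # subadditivity
```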

5 Proof of the Hahn-Banach Extension Theorem

Proof of Theorem 4. The proof below is complete with one exception, which I flag. Let W_g, W_h be vector subspaces of X and let g : W_g → R, h : W_h → R be real-valued linear functions such that for all x ∈ W_g, g(x) ≤ p(x) and for all x ∈ W_h, h(x) ≤ p(x). Say that h extends g iff

1. W_g ⊆ W_h,

2. For all x ∈ W_g, h(x) = g(x).

Say that h is a proper extension of g if h is an extension of g and W_g ≠ W_h.

Consider the set of all extensions of f. This set is not empty (it contains f). If g and h are both extensions of f, write g < h iff h is a proper extension of g. This partially orders the set of extensions of f. By the Hausdorff Maximality Principle, which is equivalent to the Axiom of Choice, there exists a maximal linearly (i.e., totally or completely) ordered set of extensions of f. Take one such set and call it E. Maximality means that E cannot be enlarged by adding as an element another extension of f: any extension not already in E is non-comparable to at least some elements in E.^2

^2 By way of analogy, consider R^2 with the standard order: (x_1, x_2) ≤ (x̂_1, x̂_2) iff x_1 ≤ x̂_1 and

Define W_F = ∪_{g∈E} W_g. If x ∈ W_F, then x ∈ W_g for some g ∈ E. For any g for which x ∈ W_g, define F(x) = g(x). This definition is unambiguous because E is a linearly ordered set of extensions: any g for which x ∈ W_g gives the same value. Since g(x) ≤ p(x) for every g ∈ E and every x ∈ W_g, F(x) ≤ p(x). F is thus an extension of every g ∈ E. Since E is maximal, this implies that F ∈ E (since, otherwise, E ∪ {F} would be linearly ordered and strictly contain E). Since F is an extension of every element of E, F is the maximal element in E.

It remains to show that, in fact, W_F = X. This part of the proof looks like an induction argument. Take any g ∈ E. If W_g = X then g = F and I am done. Suppose then that W_g ≠ X. Then there is a y ∈ X, y ∉ W_g. The proof proceeds by defining an extension h of g on the vector subspace W_h spanned by W_g and y. That is, I claim that if I have an extension g defined on a subspace W_g then I can extend g to an h, not necessarily in E, that is defined on a subspace W_h that contains W_g and is one dimension larger. Because of linearity, and the fact that h must agree with g on W_g, h is entirely pinned down by the choice of h(y). I need to show that I can choose h(y) in such a way that h(x) ≤ p(x) for all x ∈ W_h.

Consider, by way of example, the case X = R, p(x) = |x|, and W_g = {0}, hence g is defined trivially by g(0) = 0. Take y = 1. I need to specify h(1) = b, in which case h(x) = bx. If x > 0 then bx ≤ p(x) = x implies b ≤ 1. If x < 0 then bx ≤ p(x) = −x implies, since x < 0, b ≥ −1. Therefore, take b ∈ [−1, 1].

Somewhat more generally, continue to take X = R, W_g = {0}, and y = 1, but now consider an arbitrary sublinear p : R → R. For x > 0, bx ≤ p(x) implies, since p is sublinear, b ≤ (1/x)p(x) = p(x/x) = p(1). Similarly, for x < 0, bx ≤ p(x) implies, since x < 0, b ≥ (1/x)p(x). Since x < 0, −1/x > 0, and hence, since p is sublinear, (−1/x)p(x) = p(−1). Therefore, I need b ≥ (1/x)p(x) = −(−1/x)p(x) = −p(−1).
Since p is sublinear, 0 = p(0) = p(1 − 1) ≤ p(1) + p(−1), hence −p(−1) ≤ p(1). Therefore, take b ∈ [−p(−1), p(1)].

The general case is similar, but the interval for b depends on g (which was trivial in the examples above) as well as p. The argument is somewhat tedious, so I omit it. Take it as established, therefore, that if W_g ≠ X then there is an h that is a proper extension of g. This implies that if g were the maximal element of E then E would not be maximal (since E ∪ {h} would be linearly ordered). By contraposition, since E is maximal, g is not maximal. In summary, if W_g ≠ X then g is not maximal. Since F is maximal, it follows by contraposition that W_F = X. ∎
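The interval [−p(−1), p(1)] from the one-dimensional extension step can be recovered numerically. The sketch below (my own check, for the concrete case p(x) = |x|) approximates sup_{x<0} (1/x)p(x) and inf_{x>0} (1/x)p(x) on a grid of sample points:

```python
# Admissible slopes b for h(x) = b*x under the constraint bx <= p(x):
#   sup_{x<0} (1/x)p(x)  <=  b  <=  inf_{x>0} (1/x)p(x),
# which for p(x) = |x| is the interval [-p(-1), p(1)] = [-1, 1].
p = abs
pos = [0.01 * k for k in range(1, 1001)]       # sample points x > 0
neg = [-x for x in pos]                        # sample points x < 0

upper = min((1 / x) * p(x) for x in pos)       # approximates p(1)  =  1
lower = max((1 / x) * p(x) for x in neg)       # approximates -p(-1) = -1
assert abs(upper - 1.0) < 1e-12
assert abs(lower + 1.0) < 1e-12
assert lower <= upper                          # an admissible b exists
```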

If X = R^N then the second half of the proof, the induction-like argument, is

x_2 ≤ x̂_2. Then the set, call it S, corresponding to the x_1 axis, S = {x ∈ R^2 : x_2 = 0}, is a maximal linearly ordered set. To see that it is maximal, consider any x = (x_1, x_2) with x_2 ≠ 0. Then S ∪ {x} is not linearly ordered. For example, if x = (0, 1) then x is not ordered with respect to (1, 0) ∈ S. There are many, many other maximal linearly ordered subsets of R^2. For example, both {x ∈ R^2 : x_1 = 0} and {x ∈ R^2 : x_1 = x_2} are maximal linearly ordered sets.

sufficient. One starts with an f defined on a subspace, and then extends f one dimension at a time until one has enlarged the domain to be all of R^N. But if X is infinite dimensional, the induction-like argument does not suffice. For example, if X = R^∞ then the induction-like argument shows that it is possible to extend a function defined on a finite dimensional subspace to functions with domains that, while strictly larger, are still finite dimensional. The induction-like argument does not, by itself, show that there is any extension to all of R^∞, an infinite dimensional space. To establish the existence of such an extension requires extra machinery. In the proof, this gap in the induction-like argument is filled by the Axiom of Choice, in the form of the Hausdorff Maximality Principle.

Hahn-Banach does not, strictly speaking, require the Axiom of Choice. But it does require something stronger than the combined remaining axioms of standard set theory. And the Hahn-Banach Theorem in and of itself implies the existence of non-measurable sets, which is one way of expressing the fact that the Hahn-Banach Theorem is a highly non-trivial result.

6 Proof of Theorem 3 (the Basic Separation Theorem)

Proof of Theorem 3. Since 0 ∉ A and A is closed, there is a δ > 0 such that N_δ(0) = {x ∈ X : ‖x‖ < δ} ⊆ A^c. Pick any element a* ∈ A and pick r̂ > 0 such that δ > ‖r̂a*‖ = r̂‖a*‖. Finally, pick any r ∈ (0, r̂).

Define C = a* − A + N_δ(0). Since A is convex, C is convex. Since a* ∈ A and 0 ∈ N_δ(0), 0 ∈ C. On the other hand, a* ∉ C: if a* ∈ C then there is an a ∈ A and a b ∈ N_δ(0) such that a* = a* − a + b, which implies a = b ∈ N_δ(0); by contraposition, since no element of A is in N_δ(0), a* ∉ C.

Define p : X → R by p(x) = inf{γ ∈ R+ : (1/γ)x ∈ C}. Informally, p(x) measures the fractional distance of x to the boundary of C. If x ∈ C then p(x) ≤ 1. If x ∉ C, then p(x) ≥ 1. If x is on the boundary of C, then p(x) = 1. If C is unbounded along the ray from the origin through x, then p(x) = 0. I prove below that p is sublinear. If C = N_1(0) then p is the standard norm on X. More generally, if C is bounded then p is the norm on X for which C is the unit ball at the origin. If C is not bounded then p does not satisfy property 1 in the definition of a norm; in this case, p is a pseudo norm.

Let W be the one-dimensional subspace spanned by a* and define the linear function f : W → R by f(x) = θ iff x = θa*. Thus f(a*) = 1. I claim that f(x) ≤ p(x) for any x ∈ W. Take θ ∈ R such that x = θa*. Hence f(x) = θ. Since a* ∉ C, p(a*) ≥ 1. If θ > 0 then, since p is sublinear, p(θa*) = θp(a*) ≥ θ. Hence if θ > 0 then p(x) ≥ f(x). If θ ≤ 0 then f(x) ≤ 0. Hence, since p(x) ≥ 0 for any x, if θ ≤ 0 then f(x) ≤ p(x).

By Hahn-Banach, there is an extension of f to a real-valued linear function F defined on all of X such that for all x ∈ X, F(x) ≤ p(x). By linearity, for any

x ∈ X,

F(a*) − F(x) + F(r̂a*) = F(a* − x + r̂a*) ≤ p(a* − x + r̂a*).

If x = a ∈ A, then a* − a + r̂a* ∈ C and hence p(a* − a + r̂a*) ≤ 1. Moreover, F(a*) = f(a*) = 1 and F(r̂a*) = f(r̂a*) = r̂. Combining all this implies

1 − F(a) + r̂ ≤ 1,

or F(a) ≥ r̂ > r, as was to be shown.

Finally, it remains to show that p is sublinear.

1. For property 1 of sublinear functions, take any x ∈ X and any θ ∈ R+. For any γ ∈ R+, (1/γ)x ∈ C iff (1/(θγ))θx ∈ C. The claim then follows from the definition of p.

2. For property 2 of sublinear functions, take any x, x̂ ∈ X. Take any γ, γ̂ ∈ R+ such that (1/γ)x, (1/γ̂)x̂ ∈ C. Since C is convex, and since

(1/(γ + γ̂))(x + x̂) = (γ/(γ + γ̂)) (1/γ)x + (γ̂/(γ + γ̂)) (1/γ̂)x̂

is a convex combination of two elements of C, (1/(γ + γ̂))(x + x̂) ∈ C. Therefore, by definition of p, for any such γ, γ̂,

p(x + x̂) ≤ γ + γ̂.

The claim follows by taking the infimum of the right-hand side over all γ, γ̂ ∈ R+ such that (1/γ)x ∈ C and (1/γ̂)x̂ ∈ C. ∎
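With the proof complete, the gauge p used in it can be illustrated numerically in a simple case (my own sketch, not part of the notes): for C = N_1(0), the open unit ball in R^2, p reduces to the Euclidean norm, and p(x) = inf{γ ∈ R+ : (1/γ)x ∈ C} can be computed by bisection:

```python
import numpy as np

# Gauge p(x) = inf{gamma >= 0 : (1/gamma) x in C}, computed by bisection.
# The admissible gammas form an upper interval, so the predicate
# "x / gamma in C" is monotone in gamma and bisection applies.
def gauge(x, in_C, lo=1e-9, hi=1e9, iters=200):
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_C(x / mid):
            hi = mid
        else:
            lo = mid
    return hi

in_ball = lambda y: float(np.linalg.norm(y)) < 1.0   # C = N_1(0) in R^2
x = np.array([3.0, 4.0])                             # ||x|| = 5
assert abs(gauge(x, in_ball) - 5.0) < 1e-6           # gauge = norm here
```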


