Statistics 210B, Spring 1998 Class Notes

P.B. Stark, [email protected], www.stat.berkeley.edu/~stark/index.html
May 8, 1998
Seventh Set of Notes

1 Optimization

For references, see D.G. Luenberger, 1969, Optimization by Vector Space Methods, John Wiley and Sons, Inc., NY; E.J. Anderson and P. Nash, 1987, Linear Programming in Infinite-Dimensional Spaces, Wiley, NY; M.S. Bazaraa and C.M. Shetty, 1979, Nonlinear Programming: Theory and Algorithms, Wiley, NY; Shor, 1985, Minimization Methods for Non-Differentiable Functions, Springer-Verlag, NY.

Many questions in statistical theory can be reduced to optimization problems, sometimes in infinite-dimensional spaces (spaces of functions or measures, for example). For instance, we have just seen in Donoho's work how the difficulty of certain minimax estimation problems can be related to the modulus of continuity, whose computation is an optimization problem over a convex subset of $\ell_2$.

Some unconstrained optimization problems with differentiable, convex objective functionals are fairly straightforward to solve analytically, e.g., using the calculus of variations. Solving differentiable, unconstrained, convex problems numerically is typically not difficult (descent algorithms can be used), unless evaluating the objective functional or its derivative is extremely computationally intensive. Constrained problems, nondifferentiable problems, and nonconvex problems are typically much harder. Even for convex functions whose derivative exists almost everywhere, the steepest descent algorithm can converge to a nonstationary point (see Shor, Ch. 2, §2.1, for an example). Nondifferentiable objective functionals arise fairly frequently in statistics.
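The remark that descent algorithms handle differentiable, unconstrained convex problems can be illustrated with the simplest statistical case: minimizing $f(m) = \sum_i (x_i - m)^2$, whose minimizer is the sample mean. The Python sketch below is only an illustration under our own choices (the function names, the starting point, and the fixed step size $1/(4n)$ are not from the notes). Because $f''(m) = 2n$, each step with this step size halves the distance to the mean.

```python
"""Fixed-step gradient descent on a smooth convex objective (illustration)."""


def sum_sq(data, m):
    """Objective f(m) = sum_i (x_i - m)^2, convex and differentiable in m."""
    return sum((x - m) ** 2 for x in data)


def grad_sum_sq(data, m):
    """Derivative f'(m) = -2 * sum_i (x_i - m)."""
    return -2.0 * sum(x - m for x in data)


def gradient_descent(data, m0=0.0, iters=200):
    """Minimize sum_sq by fixed-step descent; the minimizer is the sample mean."""
    n = len(data)
    step = 1.0 / (4 * n)  # f'' = 2n, so each step halves the distance to the mean
    m = m0
    for _ in range(iters):
        m -= step * grad_sum_sq(data, m)
    return m


if __name__ == "__main__":
    data = [2.0, 3.0, 7.0, 10.0]
    m_hat = gradient_descent(data)
    print(m_hat, sum(data) / len(data))  # the two agree to floating-point accuracy
```

Replacing $(x_i - m)^2$ by $|x_i - m|$ removes the derivative exactly at the data points, and one of them is always a minimizer, so this loop no longer applies as written.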
A standard example of a nondifferentiable objective arises in computing the median: the absolute value function is not differentiable at zero, so finding the median as the minimizer of the sum of absolute deviations (or finding a multivariate generalization of the median) is a convex, nondifferentiable optimization problem. Similarly, the objective functionals for minimum-$\ell_1$ and minimum-$\ell_\infty$ regression are nondifferentiable. Some quite interesting statistical problems have convex objective functionals but nonconvex constraints, such as signal recovery problems subject to constraints on the measure of the support of the signal (sparsity constraints). Linear equality constraints (such as $\langle x, g\rangle = 0$) are fairly straightforward to deal with: one can project the problem onto the subspace where the constraint is satisfied. Linear inequality constraints are somewhat harder. Two of the most useful tools for solving constrained infinite-dimensional optimization problems are Fenchel duality and Lagrange duality.

A cone in a real linear vector space $\mathcal{X}$ is a set $P \subset \mathcal{X}$ such that if $x \in P$, then $\alpha x \in P$ for all $\alpha > 0$. One can establish a partial order on a vector space $\mathcal{X}$ with a convex cone $P$ (then called the positive cone) by defining $x \ge y$ if $x - y \in P$. If $\mathcal{X}$ is a topological space (such as a normed space with the topology inherited from the norm), and if the interior of the positive cone $P$ is nonempty in the topology of $\mathcal{X}$, we write $x > y$ if $x - y \in P^\circ$, the interior of $P$. (Note that this differs from the definition of $<$ we used for a totally ordered set, where $<$ meant $\le$ but not $=$; here, $<$ derives from topological properties of the positive cone that defines the order.)

If $\mathcal{X}$ is a linear vector space, $\mathcal{X}^*$ denotes the linear space of all linear functionals defined on $\mathcal{X}$ and is called the algebraic dual space of $\mathcal{X}$. If $\mathcal{X}$ is a normed space, then by default $\mathcal{X}^*$ denotes the space of bounded linear functionals on $\mathcal{X}$ (called the normed dual space of $\mathcal{X}$), unless otherwise specified.
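For a concrete instance of the ordering by a positive cone, take $\mathcal{X} = \mathbb{R}^n$ with its usual topology and $P$ the nonnegative orthant (our choice of example, not one from the notes). Then $x \ge y$ means every coordinate of $x - y$ is nonnegative, and since $P^\circ$ consists of the vectors with all coordinates strictly positive, $x > y$ means every coordinate of $x - y$ is strictly positive. The Python sketch below shows the two points made above: the order is only partial, and $x \ge y$ with $x \ne y$ does not imply $x > y$.

```python
"""Order on R^n induced by the positive cone P = nonnegative orthant (illustrative choice)."""


def geq(x, y):
    """x >= y  iff  x - y lies in P, i.e. every coordinate of x - y is >= 0.
    Assumes x and y have the same length (zip would silently truncate otherwise)."""
    return all(a - b >= 0 for a, b in zip(x, y))


def gt(x, y):
    """x > y  iff  x - y lies in the interior of P, i.e. every coordinate of x - y is > 0."""
    return all(a - b > 0 for a, b in zip(x, y))


def comparable(x, y):
    """In a partial order some pairs are comparable in neither direction."""
    return geq(x, y) or geq(y, x)


if __name__ == "__main__":
    x, y, z = (1.0, 0.0), (0.0, 0.0), (0.0, 1.0)
    print(geq(x, y), x != y, gt(x, y))  # True True False: x >= y and x != y, yet not x > y
    print(comparable(x, z))             # False: (1, 0) and (0, 1) are incomparable
```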
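Denote by $\langle x^*, x\rangle$ the action of the linear functional $x^* \in \mathcal{X}^*$ on the element $x \in \mathcal{X}$. The natural mapping from a space $\mathcal{X}$ to its second dual $\mathcal{X}^{**}$ (the dual of its dual) is defined by $\langle x^{**}, x^*\rangle = \langle x^*, x\rangle$. Clearly each $x \in \mathcal{X}$ gives a linear functional $x^{**}$ on $\mathcal{X}^*$ this way. If every (bounded) linear functional on $\mathcal{X}^*$ can be obtained this way (i.e., if $\mathcal{X}^{**} = \mathcal{X}$), $\mathcal{X}$ is said to be reflexive.

The epigraph of a functional $f$ on a set $C \subset \mathcal{X}$ is
$$[f, C]_+ \equiv \{(t, x) \in \mathbb{R} \times \mathcal{X} : x \in C,\ f(x) \le t\}. \tag{1}$$
If $C$ is convex, $[f, C]_+$ is a convex set in $\mathbb{R} \times \mathcal{X}$ iff $f$ is a convex functional. Similarly, the hypograph of a functional $g$ on a set $D \subset \mathcal{X}$ is
$$[g, D]_- \equiv \{(t, x) \in \mathbb{R} \times \mathcal{X} : x \in D,\ g(x) \ge t\}, \tag{2}$$
and if $D$ is convex, $[g, D]_-$ is convex iff $g$ is concave.

Definition. A linear variety in a vector space $\mathcal{X}$ is a translation of a subspace of $\mathcal{X}$: if $S$ is a subspace, then for every $x \in \mathcal{X}$, $S + x$ is a linear variety. A hyperplane $H$ in a vector space $\mathcal{X}$ is a maximal proper linear variety; that is, a linear variety such that if $Y$ is another linear variety in $\mathcal{X}$ and $H \subset Y$, then either $Y = H$ or $Y = \mathcal{X}$. A hyperplane can be characterized by a linear functional: every hyperplane can be written as $\{x \in \mathcal{X} : \langle x^*, x\rangle + b = 0\}$ for some nonzero $x^* \in \mathcal{X}^*$ and some $b \in \mathbb{R}$, and every set of this form with $x^* \ne 0$ is a hyperplane. If $x^*$ is a nonzero linear functional on a normed vector space $\mathcal{X}$, the hyperplanes $\{x : \langle x^*, x\rangle + b = 0\}$ are closed for every $b \in \mathbb{R}$ iff $x^*$ is continuous.

Theorem 1 (Separating hyperplane theorem). Suppose that $C$ and $D$ are convex subsets of a normed vector space $\mathcal{X}$, that $C$ contains interior points, and that $D$ contains no interior point of $C$. Then there is a hyperplane separating the sets: there is a nonzero element $x^* \in \mathcal{X}^*$ such that
$$\sup_{x \in C} \langle x^*, x\rangle \le \inf_{x \in D} \langle x^*, x\rangle. \tag{3}$$
Thus there is a number $b \in \mathbb{R}$ such that
$$\langle x^*, x\rangle + b \le 0 \quad \forall x \in C, \qquad \langle x^*, x\rangle + b \ge 0 \quad \forall x \in D. \tag{4}$$

1.1 Algebraic Duality

We always take $\inf_{x \in \emptyset} f(x) = \infty$ and $\sup_{x \in \emptyset} f(x) = -\infty$.

For any functional $f$ on a real linear vector space $\mathcal{X}$ with dual $\mathcal{X}^*$ and any sets $C, D \subset \mathcal{X}$, consider the value of the primal problem
$$v(\mathcal{P}) \equiv \inf_{x \in C \cap D} f(x). \tag{5}$$
Clearly, for any $x^* \in \mathcal{X}^*$,
$$\begin{aligned} v(\mathcal{P}) &= \inf_{x \in C \cap D} \{ f(x) - \langle x^*, x\rangle + \langle x^*, x\rangle \} \\ &\ge \inf_{x \in C \cap D} \langle x^*, x\rangle + \inf_{x \in C \cap D} \{ f(x) - \langle x^*, x\rangle \} \\ &\ge \inf_{x \in D} \langle x^*, x\rangle + \inf_{x \in C} \{ f(x) - \langle x^*, x\rangle \}. \end{aligned} \tag{6}$$
(The first inequality holds because the infimum of a sum is at least the sum of the infima; the second because enlarging the set over which an infimum is taken can only decrease it.) This is true for all $x^*$, so
$$\begin{aligned} \inf_{x \in C \cap D} f(x) &\ge \sup_{x^* \in \mathcal{X}^*} \Big[ \inf_{x \in D} \langle x^*, x\rangle + \inf_{x \in C} \{ f(x) - \langle x^*, x\rangle \} \Big] \\ &= \sup_{x^* \in \mathcal{X}^*} \{ D^*[x^*] + C^*[x^*] \}, \end{aligned} \tag{7}$$
where
$$C^*[x^*] \equiv \inf_{x \in C} \{ f(x) - \langle x^*, x\rangle \} \tag{8}$$
and
$$D^*[x^*] \equiv \inf_{x \in D} \langle x^*, x\rangle. \tag{9}$$
The only functionals $x^*$ worth considering are those for which $C^*[x^*]$ and $D^*[x^*]$ are both greater than $-\infty$. Let
$$\mathcal{C}^* \equiv \Big\{ x^* \in \mathcal{X}^* : \inf_{x \in C} \{ f(x) - \langle x^*, x\rangle \} > -\infty \Big\} \tag{10}$$
and
$$\mathcal{D}^* \equiv \Big\{ x^* \in \mathcal{X}^* : \inf_{x \in D} \langle x^*, x\rangle > -\infty \Big\}. \tag{11}$$
Then without loss of generality we can restrict attention to
$$\sup_{x^* \in \mathcal{C}^* \cap \mathcal{D}^*} \{ D^*[x^*] + C^*[x^*] \}, \tag{12}$$
with the supremum defined to be $-\infty$ if $\mathcal{C}^* \cap \mathcal{D}^* = \emptyset$.

In general, the sup over $\mathcal{X}^*$ in (7) need not equal the inf over $\mathcal{X}$ in (5). When it does not, there is said to be a "duality gap."

Example. Suppose $\mathcal{X}$ is a linear vector space of functions $x = x(t)$ on some fixed domain; $\{x_j^*\}_{j=1}^n \subset \mathcal{X}^*$ with the $x_j^*$ linearly independent; $d : \mathcal{X} \to \mathbb{R}^n$, $x \mapsto (\langle x_j^*, x\rangle)_{j=1}^n$; $\Xi$ is a bounded subset of $\mathbb{R}^n$; and $D = \{x \in \mathcal{X} : d(x) \in \Xi\}$. Then one can show that $\mathcal{D}^* = \operatorname{span}\{x_j^*\}_{j=1}^n$: $D$ is a hypercylinder constrained only in the directions "aligned" with the $x_j^*$; in other directions $D$ is unconstrained, so a linear functional with a component in any direction not in $\operatorname{span}\{x_j^*\}_{j=1}^n$ is unbounded below on $D$. Thus for any $f$ and $C$, the infinite-dimensional problem
$$\inf_{x \in C \cap D} f(x) \tag{13}$$
is bounded from below by a finite-dimensional problem on $\operatorname{span}\{x_j^*\}_{j=1}^n$. For particular sets $\Xi$ and $C$, and particular functionals $f$, this can lead to an easy solution for $v(\mathcal{P})$.

Continuing the example, suppose that $C = \mathcal{X}$, that $f(x) = \langle x_0^*, x\rangle$, and that
$$\Xi \equiv \{ \gamma \in \mathbb{R}^n : \|\gamma - \delta\| \le \epsilon \} \tag{14}$$
for some fixed $\delta \in \mathbb{R}^n$ and $\epsilon > 0$ (an $\epsilon$-ball in $\mathbb{R}^n$ centered at $\delta$, with $\|\cdot\|$ the Euclidean norm). Then $\mathcal{C}^* = \{x_0^*\}$, because $\inf_{x \in \mathcal{X}} \langle x_0^* - x^*, x\rangle$ is $0$ if $x^* = x_0^*$ and $-\infty$ otherwise. We already saw that $\mathcal{D}^* = \operatorname{span}\{x_j^*\}_{j=1}^n$, so $\mathcal{C}^* \cap \mathcal{D}^* = \emptyset$ unless $x_0^* = \sum_{j=1}^n \alpha_j x_j^*$ for some sequence of constants $\alpha = (\alpha_j)_{j=1}^n$.

Given a linearly independent set $\{x_j^*\}_{j=1}^n \subset \mathcal{X}^*$, one can construct a linearly independent set $\{x_j\}_{j=1}^n \subset \mathcal{X}$ such that
$$\langle x_j^*, x_k\rangle = \mathbf{1}_{j=k}. \tag{15}$$
If indeed $x_0^* = \sum_{j=1}^n \alpha_j x_j^*$, then for $x^* \in \mathcal{C}^* \cap \mathcal{D}^*$ (that is, $x^* = x_0^*$),
$$C^*[x^*] = \inf_{x \in C} \{ \langle x_0^*, x\rangle - \langle x^*, x\rangle \} = 0. \tag{16}$$
Moreover, $\langle x_0^*, x\rangle = \sum_j \alpha_j \langle x_j^*, x\rangle = \alpha \cdot d(x)$. For any $x \in D$, $d(x) = \delta + \nu$ with $\|\nu\| \le \epsilon$, so for $x \in D$,
$$\begin{aligned} \alpha \cdot d(x) &= \alpha \cdot \delta + \alpha \cdot \nu \\ &\ge \alpha \cdot \delta - |\alpha \cdot \nu| \\ &\ge \alpha \cdot \delta - \|\alpha\| \|\nu\| \\ &\ge \alpha \cdot \delta - \epsilon \|\alpha\|. \end{aligned} \tag{17}$$
This bound is in fact attained: for $\alpha \ne 0$, set
$$\beta = \delta - \epsilon \frac{\alpha}{\|\alpha\|} \tag{18}$$
and take $x = \sum_j \beta_j x_j$; by (15), $d(x) = \beta \in \Xi$ and $\alpha \cdot d(x) = \alpha \cdot \delta - \epsilon \|\alpha\|$. (If $\alpha = 0$, any $x \in D$ attains the bound.) Thus
$$D^*[x^*] = \inf_{x \in D} \langle x^*, x\rangle = \alpha \cdot \delta - \epsilon \|\alpha\|. \tag{19}$$
This gives us
$$\inf_{x \in C \cap D} \langle x_0^*, x\rangle = \begin{cases} \alpha \cdot \delta - \epsilon \|\alpha\|, & x_0^* = \sum_{j=1}^n \alpha_j x_j^*, \\ -\infty, & \text{otherwise.} \end{cases} \tag{20}$$
There is no duality gap in this problem: when $x_0^* = \sum_j \alpha_j x_j^*$, the dual value (12) is $D^*[x_0^*] + C^*[x_0^*] = \alpha \cdot \delta - \epsilon \|\alpha\|$, and otherwise $\mathcal{C}^* \cap \mathcal{D}^* = \emptyset$, so both the primal and the dual values are $-\infty$.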
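The example reduces to a finite-dimensional problem. Since $\langle x_0^*, x\rangle = \alpha \cdot d(x)$, and by (15) the map $d$ sends $D$ onto $\Xi$ (for $\gamma \in \Xi$, $x = \sum_j \gamma_j x_j$ has $d(x) = \gamma$), the value (20) in the case $x_0^* = \sum_j \alpha_j x_j^*$ is the minimum of $\gamma \mapsto \alpha \cdot \gamma$ over the $\epsilon$-ball $\Xi$. The Python sketch below checks (18) and (19) numerically. The particular $\alpha$, $\delta$, $\epsilon$, the sample size, and the helper names are our own illustrative choices, and $\|\cdot\|$ is taken to be the Euclidean norm.

```python
"""Numerical check of (18)-(19): minimize alpha . gamma over the ball ||gamma - delta|| <= eps."""
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm, as the Cauchy-Schwarz step in (17) assumes."""
    return math.sqrt(dot(u, u))


def closed_form_value(alpha, delta, eps):
    """alpha . delta - eps * ||alpha||, the infimum in equation (19)."""
    return dot(alpha, delta) - eps * norm(alpha)


def minimizing_beta(alpha, delta, eps):
    """beta = delta - eps * alpha / ||alpha||, equation (18); requires alpha != 0."""
    a = norm(alpha)
    return [d - eps * ai / a for ai, d in zip(alpha, delta)]


def random_point_in_ball(delta, eps, rng):
    """A random gamma with ||gamma - delta|| < eps (not volume-uniform; not needed here)."""
    g = [rng.gauss(0.0, 1.0) for _ in delta]
    g_norm = norm(g) or 1.0  # guard against an (essentially impossible) all-zero draw
    r = eps * rng.random()   # radius in [0, eps)
    return [d + r * gi / g_norm for d, gi in zip(delta, g)]


def check_example(alpha, delta, eps, n_samples=10_000, seed=0):
    """Return (closed-form value, value at beta, smallest sampled value of alpha . gamma)."""
    rng = random.Random(seed)
    beta = minimizing_beta(alpha, delta, eps)
    sampled = min(dot(alpha, random_point_in_ball(delta, eps, rng)) for _ in range(n_samples))
    return closed_form_value(alpha, delta, eps), dot(alpha, beta), sampled


if __name__ == "__main__":
    alpha, delta, eps = [1.0, -2.0, 0.5], [3.0, 1.0, -1.0], 0.25
    closed, at_beta, sampled = check_example(alpha, delta, eps)
    print(closed, at_beta, sampled)  # closed and at_beta agree; sampled is never smaller
```

Increasing `n_samples` moves the sampled minimum toward the closed-form value from above but never below it, as the chain of inequalities (17) requires.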
