Lecture Notes for IAP 2005 Course Introduction to Bundle Methods

Alexandre Belloni∗ Version of February 11, 2005

1 Introduction

Minimizing a over a convex region is probably the core problem in the literature. Under the assumption of the function of a differentiable function of interest, several methods have been proposed and successively studied. For example, the Steepest Descent Method consists of performing a along a descent direction given by minus the of the function at the current iterate. One of the most remarkable applications of minimizing convex functions reveals itself within duality theory. Suppose we are interested in solving the following problem   max g(y)  y (P ) ,  hi(y) = 0 i = 1, . . . , m  y ∈ D which is known to be hard (for example, a large instance of the TSP or a large- scale ). One can define the following auxiliary function

f(x) = max g(y) + hx, h(y)i y , y ∈ D which is convex without any assumption on D, g, or h. Due to duality theory, the function f will always give an upper bound for the original maximization problem. Also, the dual information within this scheme is frequently very useful to build primal heuristics for nonconvex problems. In practice1, it is easy to obtain solutions at most 5% worse than the unknown optimum value2. There is an intrinsic cost to be paid in the previous construction. The function f is defined implicitly, and evaluating it may be a costly operation. Moreover, the differentiability is lost in general3. One important remark is that

∗Operation Research Center, M.I.T. ([email protected]) 1Of course, we cannot prove that in general, but it holds for a large variety of problems in real-world applications. 2In fact, the quality of the solution tends to be much better, i.e., smaller than 1% in the majority of the problems. 3This tends to be the rule rather than the exception in practice.

1 even though differentiability is lost, one can easily compute a substitute for the gradient called subgradient with no additional cost4 beyond the evaluation of f. This motivates an important abstract framework to work with. The frame- work consists of minimizing a convex function (possible nondifferentiable) given by an oracle. That is, given a point in the λ, the oracle returns the value f(x) and a subgradient s. Within this framework, several methods have also been proposed and many of them consists of adapting methods for differentiable functions by replacing with subgradients. Unfortunately, there are several drawbacks with such procedure. Unlike in the differentiable case, minus the subgradient may not be a descent direction. In order to guarantee that a direction is indeed a descent direction, one needs to know the complete set of subgradients5, which is a restrictive assumption in our framework6. Another drawback is the stopping criterium. In the differentiable case, one can check if ∇f(x) = 0 or k∇f(x)k < ε. However, the nondifferentiablility of f imposes new challenges as shown in the following 1-dimensional example.

Example 1.1 Let f(x) = |x| where λ ∈ IR. One possible oracle could have the following rule for computing subgradients ½ 1, x ≥ 0 ∂f(x) 3 s = . −1, x < 0 Note that the oracle returns a subgradient such that ksk ≥ 1 for all x. In this case, there is no hope of a simple stopping criteria like s = 0 would work.

The goal of these notes is to give a brief but formal introduction to a family of methods for this framework called Bundle Methods.

2 Notation

We denote the inner product by h·, ·i and k · k the norm induced by it. The ball centered at x with radius r is denoted by B(x, r) = {y ∈ IRn : ky − xk ≤ r}. i k Pk i i The vector of all ones is denoted by e. Also, span({d }i=1) = { i=1 αid : α ∈ i k IR, i = 1, . . . , k} denotes the set of linear combinations of {d }i=1. If S is a set, int S = {y ∈ IRn : ∃r > 0,B(y, r) ⊂ S} is the interior of S, cl S = {y ∈ n k k IR : ∃{y }k≥1 ⊂ S, y → y} is the closure of S, and ∂S = cl S \ int S is the boundary of S.

3 Some Convex Analysis

Definition 3.1 A set C is said to be convex if for all x, y ∈ C, and all α ∈ [0, 1], we have that αx + (1 − α)y ∈ C.

This definition can be extended for funtions on IRn as follows. 4See Appendix for this derivation and the proof that f is convex. 5This set is called subdifferential of f at x, ∂f(x). 6Not only in theory but also in practice the complete knowledge of the subdifferential is much harder to obtain.

2 Definition 3.2 A function f : IRn → IR = IR ∪ {+∞} is a convex function if for all x, y ∈ IRn, α ∈ [0, 1],

f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y).

We also define the domain of f as the points where f is finite-valued, i.e., dom(f) = {x ∈ IRn : f(x) < ∞}. f is proper if dom(f) 6= ∅.

Definition 3.3 The indicator function of a set C is defined as ½ 0, x ∈ C I (x) = . C +∞, x∈ / C

n Note that C is convex if and only if IC : IR → IR is a convex function.

In some proofs, it is sometimes convenient to work with convex sets instead of functions. So, we define the following auxiliary object.

Definition 3.4 Define the epigraph of a function f as

epi(f) = {(x, t) ∈ IRn × IR : x ∈ dom(f), t ≥ f(x)}.

Lemma 3.1 Let f : IRn → IR be a function. Then, f is convex if and only if epi(f) is a .

Proof. (⇒) Let (x, t), (y, s) ∈ epi(f). Thus, t ≥ f(x) and s ≥ f(y) by definition. Take any α ∈ [0, 1], we have that

f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y) ≤ αt + (1 − α)s which proves that (αx + (1 − α)y, αt + (1 − α)s) ∈ epi(f). (⇐) Consider x, y ∈ dom(f) and α ∈ [0, 1]. Note that (x, f(x)) and (y, f(y)) ∈ epi(f). Since epi(f) is convex, (αx+(1−α)y, αf(x)+(1−α)f(y)) ∈ epi(f). By definition of epi(f),

f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y).

Throughout these notes, we will assume that our convex functions have a closed epigraph. In fact, we have the following equivalence:

Theorem 3.1 Let f : IRn → [−∞, +∞] be an arbitrary function. Then, the following conditions are equivalent:

3 (i) f is lower semi-continuous throughout IRn;

(ii) {x ∈ IRn|f(x) ≤ α} is closed for every α ∈ IR;

(iii) The epigraph of f is a closed set in IRn+1.

We also state some basic facts from Convex Analysis Theory that can be found in [11].

Theorem 3.2 Let f : IRn → IR be a convex function. Then, f is continuous in int dom(f).

4 The Subdifferential of a Convex Function

Now, we define one of the main object of our theory, the subdifferential of a convex function. Later, we will also prove some additional results about the subdifferential that we do not use but are very insightful.

Definition 4.1 The subdifferential of a convex function f : IRn → IR at a point x is defined as

∂f(x) = {s ∈ IRn : f(y) ≥ f(x) + hs, y − xi for all y ∈ IRn}.

Remark 4.1 We point out that the subdifferential is a set of linear functionals, so it lives on the dual space of the space that contains dom(f). In the case of IRn we do have an one-to-one correspondence between the dual space and the original space.

Lemma 4.1 Let f : IRn → IR be a convex function, and x ∈ int dom(f). Then, ∂f(x) 6= ∅.

Proof. Since f is convex, epi(f) is also convex. Also, x ∈ dom(f) implies that f(x) < ∞. Using the Hahn-Banach Theorem7 there exists a linear functional ins ˜ = n (s, sn+1) ∈ IR × IR such that

hs,˜ (x, f(x))i ≤ hs,˜ (y, t)i for all (y, t) ∈ epi(f), where the “extended” scalar product is defined as h(s, sn+1), (x, t)i = hs, xi + sn+1t. We need to consider three cases for sn+1. If sn+1 < 0, we have that

hs, xi − |sn+1|f(x) ≤ hs, yi − |sn+1|t for all (y, t) ∈ epi(f).

Letting t % +∞ we obtain a contradiction since the left hand side is a constant. If sn+1 = 0, we have that

hs, xi ≤ hs, yi for all y ∈ dom(f).

7The version of the Separating Hyperplane Theorem for convex sets.

4 Noting that x ∈ int dom(f), this is also a contradiction and we can also reject this case. So, sn+1 > 0, and due to homogeneity, we can assume sn+1 = 1. By definition of the epigraph, (y, f(y)) ∈ epi(f) for all y ∈ dom(f), which implies that

hs, xi + f(x) ≤ hs, yi + f(y) ∴ f(y) ≥ f(x) + h−s, y − xi .

Thus, (−s) ∈ ∂f(x).

Example√ 4.1 We cannot relax the assumption of x ∈ int dom(f) in general. f(x) = − x is a convex function on IR+, but ∂f(0) = ∅. Note that we always have that int dom(f) ⊆ dom(∂f) ⊆ dom(f).

At this point, it is useful to stop and relate the subdifferential to the regular case where f is differentiable.

Lemma 4.2 Let f : IRn → IR be differentiable at x. Then, ∂f(x) = {∇f(x)}.

Proof. Take s ∈ ∂f(x) and an arbitrary d ∈ IRn and t > 0. By definition,

f(x + td) ≥ f(x) + hs, x + td − xi = f(x) + t hs, di f(x + td) − f(x) Thus, ≥ hs, di. Now, letting t & 0, the left hand side has a t unique limit since x is differentiable. Finally,

h∇f(x), di ≥ hs, di for all d ∈ IRn, which implies that s = ∇f(x). The proof of the last Lemma shows that the subgradients are coming from all the possible partial derivatives. In general, the subdifferential is not a function as the gradient is for differentiable functions. ∂f is a correspondence, a point-set mapping.

Example 4.2 Consider the function f(x) = |x| for x ∈ IR. The subdifferential of f in this example is   −1, x < 0 ∂f(x) = 1, x > 0 .  [−1, 1] , x = 0

5 We obeserve that at zero, where the function fails to be differentiable, we have many possible values for subgradients.

Definition 4.2 Let M : IRn → B(IRn) be a point-set mapping. M is said to be upper semi-continuous if for every convergent sequence (xk, sk) → (¯x, s¯) such that sk ∈ M(xk) for every k ∈ IN, we have that s¯ ∈ M(¯x). M is said to be lower semi-continuous if for every s¯ ∈ M(¯x) and every sequence xk → x¯, there exists a sequence {yk}k≥1 such that yk ∈ M(xk) and yk → y¯. Finally, M is said to be continuous if it is both upper and lower semi-continuous.

Remark 4.2 Another important notion of continuity for point-set mappings are inner and outter continuity. In finite dimensional spaces, this notion coin- cides with our lower and upper continuity definitions.

The study of the subdifferential through the eyes of the theory of correspon- dences clarifies some important properties and difficulties of working with that operator.

Lemma 4.3 The subdifferential of a proper convex function f is upper semi- continuous and convex valued.

Proof. Consider a sequence (xk, sk) → (¯x, s¯) where sk ∈ ∂f(xk). Thus, for every k ∈ IN and every y ∈ IRn, ­ ® f(y) ≥ f(xk) + sk, y − xk Taking limits, f(xk) → f¯ ≥ f(¯x), and we obtain

f(y) ≥ f(¯x) + hs,¯ y − x¯i .

Thus,s ¯ ∈ ∂f(¯x). For the second statement, let s1, s2 ∈ ∂f(x), α ∈ [0, 1], and the result follows since the h·, ·i is a linear operator.

Example 4.3 From our Example 4.2, it is easy to see that the subdifferential can fail to be lower semi-continuous. For instance, take (¯x, s¯) = (0, 0) and k 1 k x = k . The sequence of {s } must be chosen to be identically equal to 1 which is not converging to zero.

In order to introduce an equivalent characterization of the subdifferential of f, we need the following lemma.

6 Lemma 4.4 Let f : IRn → IR be a proper lower semicontinuous convex func- tion. Then, the directional derivatives of f are well defined for every d ∈ IRn,

f(x + td) − f(x) f 0(x; d) = lim . t&0 t

0 t0 Proof. Let 0 < t < t. Thus, let α = t so that x + t0d = α(x + td) + (1 − α)x.

Using the convexity of f,

f(x + t0d) − f(x) f(α(x + td) + (1 − α)x) − f(x) αf(x + td) + (1 − α)f(x) − f(x) = ≤ t0 αt αt α f(x + td) − f(x) = , α t so the ratio is non increasing in t. Thus, as a monotonic sequence its limit (possibly +∞) is well defined. Even though the directional derivaties are a local characteristic, it does have some global properties due to the convexity of f. This is illustraded by the following alternative definition of the subdifferential of f.

Lemma 4.5 The subdifferential of f at x can be written as

∂f(x) = {s ∈ IRn : f 0(x; d) ≥ hs, di for every d ∈ IRn} (1)

Proof. By definition, s ∈ ∂f(x) if and only if

f(y) ≥ f(x) + hs, y − xi for every y ∈ IRn f(x + td) ≥ f(x) + hs, tdi for every d ∈ IRn and t > 0 f(x+td)−f(x) n t ≥ hs, di for every d ∈ IR and t > 0. Since the left hand side is decreasing as t → 0 by Lemma 4.4,

∂f(x) = {s ∈ IRn : f 0(x; d) ≥ hs, di for every d ∈ IRn}.

Thus, the directional derivatives at x bound all the projections of subgradi- ents, in the subdifferential of f at x, on the corresponding direction. The next result establish the converse.

Theorem 4.1 Let f : IRn → IR be a lower semi-continuous convex function. If x ∈ dom(f), f 0(x; d) = max{hs, di : s ∈ ∂f(x)}.

Proof. Without loss of generality, assume that x = 0, f(x) = 0 and kdk = 1. Denote d := d1, and let {d1, d2, . . . , dn} be an orthogonal basis for IRn. Let Hk = span{d1, d2, . . . , dk}. 1 1 We start by defining a linear functional on H , λ1 : H → IR as

0 λ1(y) = f(x) + hf (x; d)d, y − xi .

7 Using Lemma 4.4, for t > 0

0 0 f(x + td) − f(x) ≥ tf (x; d) = t hf (x; d)d, di = λ1(x + td) − f(x).

Assume that there exists t¯ > 0 such that f(x − |t¯|d) < λ1(x − |t¯|d). By convexity, for any t ∈ (0, t¯ ], µ ¶ µ ¶ |t| |t| f(x − |t|d) ≤ f(x − |t¯|d) + 1 − f(x) µ|t¯|¶ µ |t¯| ¶ |t| |t| < λ (x − |t¯|d) + 1 − λ (x) |t¯| 1 |t¯| 1 = λ1(x − |t|d). So,

0 f(x−|t|d)−f(x) λ1(x−|t|d)−f(x) f (x; −d) = limt→0 t ≤ limt→0 t f(x)+hf 0(x;d)d,−|t|di−f(x) = limt→0 t = −f 0(x; d).

If f 0(x; −d) = −f 0(x; d), f : H1 → IR is differentiable at x with gradient 0 equal to f (x; d), and using Lemma 4.2, we proved that f(y) ≥ λ1(y) for all y ∈ H1. Thus, we can assume that f 0(x; −d) < −f 0(x; d). Also, due to Lemma 4.4, we can write that f(x−|t|d) = f(x)+|t|f 0(x; −d)+o(|t|) and f(x+|t|d) = f(x)+|t|f 0(x; d)+o(|t|).

To obtain that

1 1 f(x) ≤ 2 f(x − |t|d) + 2 f(x + |t|d) 1 0 0 = 2 (f(x) + |t|f (x; −d) + o(|t|) + f(x) + |t|f (x; d) + o(|t|)) |t| 0 0 = f(x) + 2 (f (x; −d) + f (x; d)) + o(|t|) < f(x), if we make t small enough since f 0(x; −d)+f 0(x; d) < 0. This is a contradiction. So, for every t ∈ IR, we have f(x + td) ≥ λ1(x + td). After the construction of the first step, we will proceed by induction. Assume k that for k < n, we have a function λk : H → IR such that

k λk(x) = f(x), and λk(y) ≥ f(y) for all y ∈ H .

Assuming f(x) = 0 for convenience, we have that λk is a linear function, and dk+1 let z = kdk+1k . Suppose w, v ∈ Hk and α > 0, β > 0.

βλk(w) + αλk(v) = λk(βw + αv³) ´ = (α + β)λ β w + α v k³ α+β α+β ´ ≤ (α + β)f β w + α v ³hα+β i α+β h i ´ β α = (α + β)f α+β (w − αz) + α+β (v + βz) ≤ βf(w − αz) + αf(v + βz).

8 Thus,

β [−f(w − αz) + λk(w)] ≤ α [f(v + βz) − λk(v)] 1 1 [−f(w − αz) + λ(w)] ≤ [f(v + βz) − λ (v)] α β k

1 1 sup [−f(w − αz) + λk(w)] ≤ a ≤ inf [f(v + βz) − λk(v)] w∈Hk,α>0 α v∈Hk,β>0 β

k+1 We extend λk to H by defining λk+1(z) = a. Thus, assuming t > 0,

λk+1(tz +x ˜) = tλ³k+1(z) + λk+1(˜x) =´ta + λk(tx) 1 λk(˜x) ≤ t t f(˜x + tz) − t + λk(˜x) ≤ f(tz +x ˜), and the case of t < 0 is similar. So, the gradient of λn will define an elements ¯ ∈ ∂f(x). Note that * + Xn i 0 max{hs, di : s ∈ ∂f(x)} ≥ hs,¯ di = αid , d = f (x; d), i=1 where the last equality follows by the construction of λ1. We finish this section with an easy but useful lemma regarding sums of functions

n n Lemma 4.6 Let f1 : IR → IR and f2 : IR → IR be two convex functions. If x ∈ int dom(f1) ∩ int dom(f2) 6= ∅, then,

∂f1(x) + ∂f2(x) = ∂(f1 + f2)(x).

i Proof. (⊆)Let x ∈ int dom(f1) ∩ int dom(f2) 6= ∅ and s ∈ ∂fi(x), i = 1, 2. Then ­ i ® fi(y) ≥ fi(x) + s , y − x and, ­ 1 2 ® f1(y) + f2(y) ≥ f1(x) + f2(x) + s + s , y − x .

(⊇) Let f := f1 + f2, and take s ∈ ∂f(x), and let H = ∂f1(x) + ∂f2(x). Since x ∈ int dom(f), we have that ∂f1(x) and ∂f2(x) are bounded, closed and convex sets. Thus, since H is a closed convex set. Using the Separating Hyperplane, there exists d, kdk = 1, such that

hs, di > hss, di for all ss ∈ H.

Since x ∈ int dom(f1) ∩ int dom(f2), there exists η > 0 such that x + ηd ∈ int dom(f1) ∩ int dom(f2). Since f is convex, we can compute the directional derivatives

0 f1(x+td)+f2(x+td)−f1(x)−f2(x) (f1 + f2) (x; d) = limt→0 t f1(x+td)−f1(x) f2(x+td)−f2(x) = limt→0 t + limt→0 t 0 0 = f1(x; d) + f2(x; d).

9 Combining this with Theorem 4.1, we have that

max{hs, di : s ∈ ∂(f1 + f2)(x)} = max{hs, di : s ∈ ∂f1(x) + ∂f2(x)}

0 0 A contradiction sinces ¯ ∈ ∂(f1 + f2)(x) such that hs,¯ di > f1(x; d) + f2(x; d).

5 The ε-Subdifferential of a Convex Function

Additional theoretical work is needed to recover the lower semi-continuity prop- erty. We will construct an approximation of the subdifferential which has better properties. Among these properties, we will be able to prove the continuity of our new point-set mapping.

Definition 5.1 For any ε > 0, the ε-subdifferential of a convex function f : IRn → IR at a point x is the point-to-set mapping

n n ∂εf(x) = {s ∈ IR : f(y) ≥ f(x) + hs, y − xi − ε, for all y ∈ IR }.

Remark 5.1 For any ε > 0, the following properties are easy to verify:

• ∂f(x) ⊆ ∂εf(x);

• ∂εf(x) is a closed convex set; \ • ∂f(x) = ∂εf(x); ε>0

Now, we will revisit Example 4.2.

Example 5.1 Let f(x) = |x| with x ∈ IR and let ε > 0 be fixed. By definition,

∂εf(x) = {s ∈ IR : |y| ≥ |x| + s(y − x) − ε, for all y ∈ IR}

First consider y − x > 0. Then, ½ |y| − |x| + ε 1, x > 0 s ≤ ⇒ s ≤ ε y − x −1 + −x , x < 0 On the other hand, if y − x < 0 we have that, ½ |y| − |x| + ε −1, x > 0 s ≥ ⇒ s ≥ ε y − x 1 − x , x < 0 obtaining  £ ¤ ε ε  −1, −1 − x , x£ < − 2¤ ε ε ∂εf(x) = £ [−1, 1] ,¤ x ∈ 2 , 2  ε ε 1, 1 − x , x > 2

10 Before addressing the continuity of the operator, we answer the question of whether a subgradient at a some point is an approximation for subgradients at another point.

Proposition 5.1 Let x, x0 ∈ dom(f), and s0 ∈ ∂f(x0). Then,

0 0 0 0 s ∈ ∂εf(x) if and only if f(x ) ≥ f(x) + hs , x − xi − ε.

0 Proof. (⇒) From the definition of ∂εf(x) using y = x . (⇐) Since s0 ∈ ∂f(x0),

f(y) ≥ f(x0) + hs0, y − x0i = f(x) + hs0, y − xi + [f(x0) − f(x) + hs0, x − x0i] ≥ f(x) + hs0, y − xi − ε 0 where the last inequality follows from the assumption. So, s ∈ ∂εf(x). This motivates the following definition, which will play an important role in the Bundle Methods to be defined later.

Definition 5.2 For a triple (x, x0, s0) ∈ dom(f)×dom(f)×IRn, the lineariza- tion error made at x, when f is linearized at x0 with slope s0, is the number

e˜(x, x0, s0) := f(x) − f(x0) − hs0, x − x0i .

0 0 0 Corollary 5.1 Let s ∈ ∂ηf(x ). Then, s ∈ ∂εf(x) if f(x0) ≥ f(x) + hs0, x0 − xi − ε + η or, equivalently, e˜(x, x0, s0) + η ≤ ε.

11 6 Continuity of ∂εf(·) This section proves the continuity of the ε-subdifferential. To do so, we will need the following lemma:

Lemma 6.1 Assume that dom(f) 6= ∅, δ > 0, and B(x, δ) ⊆ int dom(f). If f 0 0 is L-Lipschitzian on B(x, δ), then, for any δ < δ, s ∈ ∂εf(y), and y ∈ B(x, δ ), we have that ε ksk < L + δ − δ0 s Proof. Assume s 6= 0 and let z := y + (δ − δ0) . By definition of s ∈ ∂ f(y) ksk ε

f(z) ≥ f(y) + hs, z − yi − ε

Now, note that z ∈ B(x, δ) and y ∈ B(x, δ). Thus, the Lipschitz constant L holds and we obtain that

Lkz − yk ≥ f(z) − f(y) ≥ hs, z − yi − ε s Using that kz − yk = k(δ − δ0) k = (δ − δ0), ksk ¿ À s (δ − δ0)L ≥ s, (δ − δ0) − ε = ksk(δ − δ0) − ε ksk we obtain the desired inequality.

Theorem 6.1 Let f : IRn → IR be a convex Lipschitz function on IRn. Then, 0 n 0 there exists a constant K > 0 such that for all x, x ∈ IR , ε, ε > 0, s ∈ ∂εf(x), 0 0 there exists s ∈ ∂ε0 f(x ) satisfying K ks − s0k ≤ (kx − x0k + |ε − ε0|) min{ε, ε0}

8 Proof. Since ∂εf(x) and ∂ε0 f(x) are convex sets , it is sufficient to show that for every d,

0 0 0 0 K (kx − x k + |ε − ε |) max{hs, di : s ∈ ∂ f(x)} − max{hs , di : s ∈ ∂ 0 f(x)} < . ε ε min{ε, ε0}

So, we will fix a vector d, kdk = 1. Define the ε-directional derivate9 of f at x in d as,

0 f(x + td) − f(x) + ε fε(x; d) := inf , (2) t>0 t where the quotient on the right hand side will be denoted by

f(x + td) − f(x) + ε q (x, t) = . ε t

8 Otherwise we could separate a point s ∈ ∂εf(x) from ∂ε0 f(x) using the Separating Hy- perplane 9 Note that the definition does not involve the limt→0. It is the infimum of the values of the quocients qε(x, t).

12 We note that Theorem 4.1 can be adapted to prove

0 fε(x; d) = max{hs, di : s ∈ ∂εf(x)}.

For any η > 0, there exists tη > 0 such that

0 qε(x, tη) ≤ fε(x, d) + η (3)

0 Using Lemma 6.1 with δ % ∞, we have that fε(x, d) ≤ L, and so qε(x, t) < L + η. On the other hand,

f(x + tηd) − f(x) + ε ε qε(x, tη) = ≥ −L + tη tη Therefore, 1 2L + η ≤ . (4) tη ε Thus, using Equations (2) and (3)

0 0 0 0 fε0 (x ; d) − fε(x; d) − η ≤ qε0 (x , tη) − qε(x, tη) f(x0 + t d) − f(x + t d) + f(x) − f(x0) + ε0 − ε = η η tη 2Lkx − x0k + |ε − ε0| ≤ tη 2L + η ≤ (2Lkx − x0k + |ε − ε0|) ε where the last inequality is obtained using Equation (4). Also, since η > 0 is arbitrary and one can interchange x and x0, we obtain

0 0 0 2L 0 0 |f 0 (x ; d) − f (x; d)| ≤ (2Lkx − x k + |ε − ε|) ε ε min{ε, ε0} The constant of interest can be chosen to be K = max{2L, 4L2}.

n n Corollary 6.1 The correspondence M : IR × IR++ → B(IR ) defined by

(x, ε) 7→ ∂εf(x) is a continuous correspondence.

Proof. The previous Lemma establishes the lower semi-continuity for any ε > 0. The upper semi-continuity follows directly from f being lower semi-continuous. k k n Take a sequence (x , s , εk) → (¯x, s,¯ ε¯), then for all y ∈ IR ,

k ­ k k® f(y) ≥ f(x ) + s , y − x − εk for every k. Since f is lower semi-continuous, taking the limit of k → ∞,

f(y) ≥ f(¯x) + hs,¯ y − x¯i − ε¯ for every y ∈ IRn.

Thus,s ¯ ∈ ∂ε¯f(¯x) and M is upper semi-continuous.

13 Example 6.1

7 Motivation through the Moreau-Yosida Reg- ularization

Let f : IRn → IR be a proper convex function. We define the Moreau-Yosida regularization of f for a fixed λ > 0 as ½ ¾ λ F (x) = min f(y) + ky − xk2 . y∈IRn 2 ½ ¾ λ p(x) = arg min f(y) + ky − xk2 . y∈IRn 2 The proximal point p(x) is well defined since the function being minimized is strictly convex.

Proposition 7.1 If f : IRn → IR be a proper convex function, then F is a finite-valued, convex and everywhere differentiable function with gradient given by ∇F (x) = λ(x − p(x)). Moreover,

k∇F (x) − ∇F (x0)k2 ≤ λ h∇F (x) − ∇F (x0), x − x0i for all x, x0 ∈ IRn and k∇F (x) − ∇F (x0)k ≤ λkx − x0k for all x, x0 ∈ IRn.

14 Proof. (Finite-valued). If f is a proper function, there exists x ∈ IRn such that f(x) < ∞. Thus, F is finite-valued since λ F (x0) ≤ f(x) + kx − x0k2 < ∞. 2 (Convexity).Take x, x0 ∈ IRn, and α ∈ [0, 1], then ¡ ¢ 0 0 λ 2 0 0 2 αF (x) + (1 − α)F (x ) = αf(p(x)) + (1 − α)f(p(x )) + 2 αkp(x) − xk + (1 − α)kp(x ) − x k 0 λ 0 0 2 ≥ f(αp(x) + (1 − α)p(x )) + 2 kαp(x) + (1 − α)p(x ) − (αx + (1 − α)x )k ≥ F (αx + (1 − α)x0)

(Differentiability). Fix an arbitrary direction d, kdk = 1, and let t > 0. By definition

£ ¤ £ ¤ min f(y) + λ ky − x − tdk2 − min f(w) + λ kw − xk2 F (x+td)−F (x) = y 2 w 2 t £ t ¤ £ ¤ f(p(x + td)) + λ kp(x + td) − x − tdk2 − f(p(x + td)) + λ kp(x + td) − xk2 ≥ 2 2 t λ kp(x + td) − x − tdk2 − kp(x + td) − xk2 = 2 t λ kp(x + td) − p(x) + p(x) − x − tdk2 − kp(x + td) − p(x) + p(x) − xk2 = 2 t λ kp(x) − x − tdk2 − kp(x) − xk2 = − λ hp(x + td) − p(x), di 2 t

λ 2 where we used that F (x) ≤ f(p(x + td)) + 2 kp(x + td) − xk . Now, taking the limit as t → 0, p(x + td) → p(x) implies that

F (x + td) − F (x) lim ≥ hλ(x − p(x)), di . t→0 t

λ 2 On the other hand, since F (x + td) ≤ f(p(x)) + 2 kp(x) − x − tdk , £ ¤ £ ¤ λ 2 λ 2 miny f(y) + ky − x − tdk − minw f(w) + kw − xk F (x+td)−F (x) = 2 2 t £ t¤ £ ¤ f(p(x)) + λ kp(x) − x − tdk2 − f(p(x)) + λ kp(x) − xk2 ≤ 2 2 t λ kp(x) − x − tdk2 − kp(x) − xk2 = 2 t Again, taking the limit t → 0, F (x + td) − F (x) lim ≤ hλ(x − p(x)), di . t→0 t

k∇F (x) − ∇F (x0)k2 = λ h∇F (x) − ∇F (x0), x − p(x) − x0 + p(x0)i = λ h∇F (x) − ∇F (x0), x − x0i + λ h∇F (x) − ∇F (x0), p(x0) − p(x)i

We will prove that the last term is negative. To do so, we need the follow- ing (known as the monotonicity of ∂f(x)). Let s ∈ ∂f(x) and s0 ∈ ∂f(x0),

15 then hs − s0, x − x0i ≥ 0. To prove that, consider the subgradient inequalities associated with s and s0 applied to x0 and x respectively,

f(x0) ≥ f(x) + hs, x0 − xi and f(x) ≥ f(x0) + hs, x0 − xi . Adding these inequalities we obtain hs − s0, x − x0i ≥ 0. Now, note that by optimality of p(x) and p(x0), µ ¶ µ ¶ λ λ 0 ∈ ∂ f(p(x)) + kp(x) − xk2 and 0 ∈ ∂ f(p(x0)) + kp(x0) − x0k2 . 2 2 So, ∇F (x) = λ(x − p(x)) ∈ ∂f(p(x)) and ∇F (x0) = λ(x0 − p(x0)) ∈ ∂f(p(x0)). Using the monotonicity of ∂f, h∇F (x) − ∇F (x0), p(x0) − p(x)i ≤ 0. The proof of the last inequality follows directly from the previous result, k∇F (x)−∇F (x0)k2 ≤ λ h∇F (x) − ∇F (x0), x − x0i ≤ λk∇F (x)−∇F (x0)kkx−x0k Therefore, k∇F (x) − ∇F (x0)k ≤ λkx − x0k. We finish this section with a characterization of points of minimum of f and its regularization F . Proposition 7.2 The following statements are equivalent: (i) x¯ ∈ arg min{f(y): y ∈ IRn}; (ii) x¯ = p(¯x); (iii) ∇F (¯x) = 0; (iv) x¯ ∈ arg min{F (y): y ∈ IRn}; (v) f(¯x) = f(p(¯x)); (vi) f(¯x) = F (¯x); Proof. λ 2 λ 2 (i) ⇒ (ii) f(¯x) = f(¯x) + 2 kx¯ − x¯k ≤ f(y) + 2 ky − x¯k ⇒ p(¯x) =x ¯.

(ii) ⇔ (iii)x ¯ = p(¯x) ⇔ ∇F (¯x) = λ(¯x − p(¯x)) = 0.

(iii) ⇔ (iv) Since F is convex, ∇F (¯x) = 0 ⇔ x¯ ∈ arg min{F (y): y ∈ IRn}.

(iv) ⇒ (v) Using (ii),x ¯ = p(¯x) implies that f(¯x) = f(p(¯x)).

λ 2 (v) ⇒ (vi) f(¯x) = f(p(¯x)) ≤ f(p(¯x)) + 2 kp(¯x) − x¯k = F (¯x) ≤ f(¯x).

(vi) ⇒ (i) F (¯x) = f(¯x) implies that p(¯x) =x ¯. Thus, 0 ∈ ∂f(¯x) + λ(¯x − p(x)) = ∂f(¯x).

Remark 7.1 During the past ten years, several authors tried to introduce sec- ond order information via the Moreau-Yosida regularization. It is possible to show that the Hessian ∇2F (x) exists if and only if the Jacobian ∇p(x) exists obtaining ∇2F (x) = λ (I − ∇p(x)) for all x ∈ IRn.

16 8 Nondifferentiable Optimization Methods

In the next section, we will state a Bundle Method for minimizing a convex function f : IRn → IR given only by an oracle. Before proceeding to discuss the Bundle Methods in detail, we introduce two other methods which will contex- tualize the Bundle Method in the literature.

8.1 Subgradient Method This is probably the most used method for nondifferentiable convex optimiza- tion. Basically, it consists of replacing gradients with subgradients in the clas- sical steepest descent method. Subgradient Method (SM) Step 1. Start with a x0, set k = 0 and a tolerance ε > 0. Step 2. Compute f(xk) and a subgradient sk ∈ ∂f(xk). Step 3. If kskk < ε, stop. Step 4. Define a stepsize θk > 0. θksk Step 5. Define xk+1 = xk − . kskk Step 6. Set k = k + 1, and goto Step 2. We refer to [12] for a detailed study. The choice of the sequence {θk} is cru- cial in the performance of this algorithm. There are choices which convergence k P k k ∗ is guaranteed. Let θ → 0 and k≥1 θ = ∞. Then, f(x ) → f the optimal P k 2 value. Moreover, if in addition the stepsize sequence satisfies k≥1(θ ) < ∞, we have that xk → x¯ a solution of the problem10. Several other rules have been proposed to speed up convergence especially when bounds on the optimal value are available. An important difference between the differentiable case and the non-differentiable case concerns descent directions and line searches. A direction opposite to a par- ticular subgradient does not need to be a descent direction (as opposed to minus the gradient for the differentiable case). In fact, a vector d is a descent direction for f at x only if hd, si < 0 for all s ∈ ∂f(x).

10Note that the second condition automatically implies in sublinear convergence.

17 Thus, traditional line searches first proposed for differentiable functions must be adapted before being applied. Finally, the stopping criterion is too strict here. As illustrated by Example 4.2, this criterion may never be met even if we are at the solution.

8.2 Cutting Planes Method The simplicity of the previous algorithm comes at the price of ignoring past information. The information obtained by the oracle can be used to build not only descent directions, but also a model of the function f itself. This is also be used in the Bundle Methods but was first developed by the Cutting Plane Method. i i i i ` Consider the bundle of information formed by {y , f(y ), s ∈ ∂f(y )}i=1. Remembering that for all y ∈ IRn and i = 1, . . . , `, ­ ® f(y) ≥ f(yi) + si, y − yi , we can construct the following piecewise-linear approximation of f, ­ ® ˆ i i i f`(y) := max f(y ) + s , y − y . (5) i=1,...,`

ˆ n By construction, we have that f`(y) ≤ f(y) for all y ∈ IR . Also, it is clear ˆ ˆ n that f`(y) ≤ f`+1(y) for all y ∈ IR . There are at least two motivations for this cutting-plane model. First, it can be modelled by linear constraints, so optimizing the model reduces to a LP. The second motivation is the following lemma.

Lemma 8.1 Let f be any lower semicontinuous convex function. Denote by Hf the set of affine functions minoring f, formally defined as

n n Hf = {h : IR → IR : h is an affine function such that h(y) ≤ f(y) for all y ∈ IR }.

Then,

f(x) = sup{h(x): h ∈ Hf } and ∂f(x) = {∇h(x): h(x) = f(x), h ∈ Hf }.

18 Proof. The epigraph of f is closed. So, any point (x, t) ∈/ epi(f), can be strictly n separated by an affine hyperplane h(x,t) : IR × IR → IR, defined as

x h(x,t)(y, s) = hs , x − yi + (s − t), from epi(f). In fact, we can take the supremum over t ≤ f(x) and to obtain,

x hx(y, s) = hs , x − yi + s − f(x). n The semi-space Hx = {(y, s) ∈ IR × IR : hx(y, s) ≥ 0} contains epi(f). Thus, the intersection of all such hyperplanes is a closed convex set containing epi(f) and contained on epi(f) (since any point not in the epigraph can be separated). The result follows from the fact that arbitrary intersection of the epigraph of lower semicontinuous functions equals the epigraph of the maximum of these functions. For the second statement, the direction (⊇) is by definition of ∂f(x). To prove (⊆), note that any s ∈ ∂f(x) induces an affine function

hs(y) = f(x) + hs, y − xi ,

n for which holds that hs(x) = f(x) and hs(y) ≤ f(y) for all y ∈ IR . Thus, hs is included in the supremum and ∇hs(x) = s. In order for the method to be well defined, we still need to specify a compact set C and restrict our model to this set to ensure that every iterate is well defined.

Cutting Plane Method (CP)

Step 1. Let δˆ > 0 and consider a convex compact set C containing a 1 ˆ minimizing point. Let k = 1, y ∈ C, and define f0 ≡ −∞. Step 2. Compute f(yk) and a subgradient sk ∈ ∂f(yk).

k ˆ k Step 3. Define δk := f(y ) − fk−1(y ) ≥ 0. ˆ Step 4. If δk < δ, stop. ­ ® ˆ ˆ k k k Step 5. Update Model fk(y) := max{fk−1(y), f(x ) + s , y − y }.

k+1 ˆ Step 6. Compute y ∈ arg miny∈C fk(y). Step 7. Set k = k + 1, and goto Step 2.

This method has several advantages when compared with the Subgradient ˆ Method. The model fk−1 provides us with a stopping criterion based on δk which did not exist for the Subgradient Method. The value of δk is called nominal decrease, the improvement predicted by the model in the objective k+1 ˆ function for the next iterate y . Since the model satisfies fk(y) ≤ f(y), we have that

ˆ min fk(y) ≤ min f(y). y y k ˆ k ˆ Thus, if the stopping criterion is satisfied we have that f(y ) − fk−1(y ) < δ implying that

19 k ˆ ˆ k ˆ ˆ ˆ f(y ) < δ + fk(y ) = δ + min fk(y) ≤ δ + min f(y). y y This is achieved at the cost of solving a linear programming on Step 5 in order to define the next point to be evaluated by the oracle. This definition of the next iterate also allows the algorithm to perform long steps on each iteration11. It is clear that the set C plays an important role in the performance of the algorithm, for example, we are requiring for C to containt some optimal solution. In fact, the convergence might depend a lot on this set. This is illustraded by the following example proposed by Nemirovskii. Example 8.1 For 0 < ε < 1, consider the convex function f(x) := max{0, −1 + 2ε + kxk}, and its subdifferential   {0}, if kxk < 1 − 2ε; ∂f(x) = {x/kxk}, if kxk > 1 − 2ε;  conv{0, x/kxk}, if kxk = 1 − 2ε.

Any point with norm smaller or equal to 1 − 2ε is optimal. Suppose that the solver for the subproblem in Step 5 always returns a solution with maximal norm. We initialize the algorithm with the set C = B(0, 1), k = 1, y1 = 0 ∈ C, ˆ ˆ δ < 2ε, and f0 ≡ −∞. 1 1 We obtain f(y ) = 0 and s = 0. Since δ1 = +∞, we do not stop. Now, ˆ 2 we update the model to obtain f1 ≡ 0, and compute the next candidate y ∈ C = arg minC 0. The oracle will return a solution with norm one. So, we have 2 f(y ) = 2ε and δ2 = 2ε. Once again, the stopping criteria is not satisfied. The model now is updated to ­ ® ­ ® ˆ ˆ 2 2 2 f2(y) = max{f1(y), 2ε + y , y − y = max{0, −1 + 2ε + y , y }. ˆ ­ We® observe that f2 will be zero except on the sphere cap S2 = {y ∈ C : x1, y > 1 − 2ε} (this is illstrated in Figure 8.1). From this observation, the computation of the next iterate becomes 3 ˆ y ∈ arg min f2(y) = C \ S2. y∈C

11As it will become clear in the next example, this might be a drawback.

20 In the subsequent iterations, another sphere cap (exactly the same up to a ro- tation) will have positive values in the respective model. The next iterate is computed as [k k+1 ˆ y ∈ arg min fk(y) = C \ Si. y∈C i=2 Sk Consequently, as long as there exists vectors with norm one in C \ i=2 Si, a solution with norm one will be returned by the solver, obtaining f(yk+1) = 2ε and the algorithm does not stop. Thus, the Cutting Plane will require at least Voln−1(∂B(0, 1))/Voln−1(∂S2) iterations before terminating. Denote by νn = Voln−1(∂B(0, 1)) the n − 1- dimensional volume of the surface of a n-dimensional sphere,

Z 1 Z 1 2 n−1 2 n−1 2 Voln−1(∂B(0, 1)) = νn = 2νn−1 (1−t ) 2 dt ≥ νn−1 2t(1−t ) 2 dt ≥ νn−1, 0 0 n + 1

Z 1 2 n−1 2 (n−1)/2 Voln−1(∂S2) = νn−1 (1−t ) 2 dt ≤ νn−1(1−(1−2ε) ) (1−(1−2ε)) 1−2ε

¡ 2¢(n+1)/2 ≤ νn−1 4ε − 4ε . So, we are required to perform at least µ ¶ Vol (∂B(0, 1)) 2 1 (n+1)/2 n−1 ≥ . Voln−1(∂S2) n + 1 4ε This is associated with the possibility of performing long steps and possible generating a zig-zag behavior of the iterates. Another important remark to be made is that the number of constraints in the LP is growing with the number of iterations.

9 Bundle Methods

Motivated by the theorems in the previous sections, we define another method which can be seen as a stabilization of the Cutting Plane Method. We start by adding an extra point called the center,x ˆk, to the bundle of information. We will still make use of the same piecewise-linear model for our function, fˆ, but no longer solve an LP on each iteration. Instead, we will ˆ compute the next iterate by computing the Moreau-Yosida regularization for fk atx ˆk.

Bundle Algorithm (BA)

¯ 0 0 0 0 Step 1. Let δ > 0, m ∈ (0, 1),x ˆ , y =­x ˆ , and k ®= 0. Compute f(ˆx ) 0 0 ˆ 0 0 0 and s ∈ ∂f(ˆx ). Define f0(y) = f(ˆx ) + s , y − xˆ . Step 2. Compute the next iterate

k+1 ˆ µk k 2 y ∈ arg min fk(y) + ky − xˆ k (6) y∈IRn 2

21 h i k ˆ k+1 µk k+1 k 2 Step 3. Define δk := f(ˆx ) − fk(y ) + 2 ky − xˆ k ≥ 0. ¯ Step 4. If δk < δ stop. Step 5. Compute f(yk+1) and a subgradient sk+1 ∈ ∂f(yk+1).

k k+1 k+1 k+1 Step 6. If f(ˆx ) − f(y ) ≥ mδk, Serious Step (SS) xˆ = y . Else, Null Step (NS) xˆk+1 =x ˆk. Step 7. Update the Model ­ ® ˆ ˆ k+1 k+1 k+1 fk+1(y) := max{fk(y), f(y ) + s , y − y }.

Step 8. Set k = k + 1, and goto Step 2.

The quadratic term in the relation (6) is responsible for interpreting this as a stabilization of the Cutting Planes Method. It will make the next iterate closer to the current center by avoiding drastic movements as in the case of Cutting Planes. The role of the parameter µk is exactly to control the trade off between minimizing the model fˆ and staying close to a pointx ˆk which is known to be good. It will be important to highlight the subsequence of iterations where we performed a Serious Step. We denote such iterations by Ks = {k ∈ IN : k kth iteration was a SS}. We also note that the sequence {f(ˆx )}k∈Ks is strictly decreasing due to the Serious Step test in Step 6. The SS test requires that the new center must improve by at least a fixed fraction m from the improvement δk predicted by the model. This is a basic version of the method and several enhancements such as line searches, updates of the parameter µk, and other operations may also be incorporated.

9.1 A Dual View Before we move to convergence proofs, it will be more convenient in theory and practice to work with the dual of the QP in (6). ˆ Now, we will rewrite our model fk more conveniently. Assuming that our model has ` ≤ k pieces, we define the linearization errors at the centerx ˆk to be

k £ i ­ i k i®¤ e˜i := f(ˆx ) − f(y ) − s , xˆ − y for i = 1, . . . , `.

Thus, our model becomes ­ ® ˆ i i i fk(y) = max {f(y ) + s , y − y } i=1,...,` k ­ i k i® ­ i i® = max {f(ˆx ) − e˜i − s , xˆ − y + s , y − y } i=1,...,` k ­ i k® = f(ˆx ) + max {−e˜i + s , y − xˆ } i=1,...,`

In fact, our bundle of information can be kept as ³ ´ k © i iª` xˆ , s , e˜ i=1 ,

22 since it is straightforward to update the linearization errors as we change the center. Consider the problem which compute the new can- didate yk+1:

­ ® ˆ µk k 2 1 k k min fk(y) + ky − xˆ k = min r + y − xˆ , y − xˆ y∈IRn 2 y,r 2 k ­ i k® r ≥ f(ˆx ) − e˜i + s , y − xˆ for i = 1, . . . , `. (7) ` Introducing multipliers α ∈ IR+, we can write the Lagrangian as

à ! X` µ X` ­ ® L(y, r, α) = 1 − α r + k ky − xˆkk2 + α (f(ˆxk) − e˜ + si, y − xˆk ). i 2 i i i=1 i=1 Thus, by convexity, we have that

min max L(y, r, α) = max min L(y, r, α). (8) y,r α α y,r ³ ´ P` Since both sides are finite, we must have 1 − i=1 αi = 0. Also, optimality conditions on y impose that

X` X` k i k i ∇yL(y, r, α) = 0 = µk(y − xˆ ) + αis ∴ µk(y − xˆ ) = − αis (9) i=1 i=1

` ` P` ` Denoting by ∆ = {α ∈ IR+ : i=1 αi = 1} the simplex in IR , we plug equation (9) on (8). Then, our original QP is equivalent to

° P °2 Ã * P +! ° ` i ° X` ` i µk ° i=1 αis ° k i i=1 αis max °− ° + αi f(ˆx ) − e˜i + s , − α∈∆` 2 ° µ ° µ k i=1 k

° °2 °X` ° X` k 1 ° i° = f(ˆx ) + max − ° αis ° − αie˜i (10) α∈∆` 2µ ° ° k i=1 i=1 It will be convenient to work with the convex combination of the linearization errors and subgradients given by the optimal solution of (10).

Definition 9.1 Given an optimal solution α ∈ ∆` for (10) at iteration k, define the aggregated subgradient and aggregated linearization error respectively as

X` X` k i sˆ = αis and eˆk = αie˜i. i=1 i=1

Lemma 9.1 Let α ∈ ∆` be a solution for (10). Then

k ˆ k+1 (i) sˆ ∈ ∂fk(y );

23 ° ° ˆ k+1 k 1 k 2 (ii) fk(y ) = f(ˆx ) − °sˆ ° − eˆk; µk

1 k 2 (iii) δk = ksˆ k +e ˆk. 2µk

k+1 k k Proof. (i) From equation (9), µk(y − xˆ ) +s ˆ = 0. So,

k+1 k k −µk(y − xˆ ) =s ˆ and since yk+1 is optimal for (6), ³ ´ ˆ k+1 k+1 k k+1 k ˆ k+1 0 ∈ ∂fk(y ) + µk(y − xˆ ) ∴ −µk(y − xˆ ) ∈ ∂fk(y ).

(ii) Due to convexity, there is no duality gap between (7) and (10). Thus,

ˆ k+1 µk k+1 k 2 k 1 k 2 fk(y ) + ky − xˆ k = f(ˆx ) − ksˆ k − eˆk 2 2µk ° °2 ˆ k+1 k µk ° −1 k° 1 k 2 fk(y ) = f(ˆx ) − ° sˆ ° − ksˆ k − eˆk 2 µk 2µk k 1 k 2 = f(ˆx ) − ksˆ k − eˆk. µk

(iii) Using (ii) in the definition of δk,

k ˆ k+1 µk k+1 k 2 δk = f(ˆx ) − fk(y ) − 2 ky − xˆ k k µk k+1 k 2 k 1 k 2 = f(ˆx ) − ky − xˆ k − f(ˆx ) + ksˆ k +e ˆk 2 µk 1 k 2 = ksˆ k +e ˆk. 2µk

Lemma 9.2 For the aggregated subgradient and linearization error, it holds that k k sˆ ∈ ∂eˆk f(ˆx ).

k ˆ k+1 ˆ Proof. Using Lemma 9.1 (i),s ˆ ∈ ∂fk(y ), and by construction f ≥ fk. Therefore,

­ ® ˆ ˆ k+1 k k+1 f(y) ≥ fk(y) ≥ fk(y ) + sˆ , y­ − y ® k 1 k 2 k k k k+1 = f(ˆx ) − ksˆ k − eˆk + sˆ , y − xˆ +x ˆ − y ­µk ® ­ ® k k k k k k+1 1 k 2 = f(ˆx ) + sˆ , y − xˆ − eˆk + sˆ , xˆ − y − ksˆ k ­ ® µk k k k = f(ˆx ) + sˆ , y − xˆ − eˆk, where we used thatx ˆk − yk+1 = 1 sˆk. µk Lemmas 9.1 and 9.2 motivates our convergence proofs in the next section. k If we manage to prove that δk → 0, we also proved that ksˆ k → 0 ande ˆk → 0 k k (as long as the parameter µk is bounded). On the other hand,s ˆ ∈ ∂eˆk f(ˆx ). So, ifx ˆk → x¯, we obtain that 0 ∈ ∂f(¯x) since the underlying correspondence is upper semi-continuous.

24 9.2 Aggregation In the Cutting Plane Method, the number of constraints in the linear program- ming problem is growing with the number of iterations. This is associated with ˆ the model fk, so we are subject to the same problem in the context of Bundle Methods. In fact, since we need to solve a quadratic probramming problem, we have a potentially more serious problem to deal with. In order to avoid this problem, we can introduce an additional Step in (BA). Based on the optimal solution of (10), we will discard at least two linear pieces of our model, and introduce the gradient associated with the next iterate and an- other piece based on the aggregated subgradient and linearization error. Define the aggregated linear piece as

k ­ k k® fa(y) = f(ˆx ) + sˆ , y − xˆ − eˆk. (11)

Lemma 9.3 For fa defined by (11), it holds that ­ ® ˆ k+1 k k+1 (i) fa(y) = fk(y ) + sˆ , y − y , ˆ (ii) fa(y) ≤ fk(y).

Proof. (i) Using Lemma 9.1 (ii), ­ ® ­ ® f (y) = f(ˆxk) + sˆk, y − yk+1 + sˆk, yk+1 − xˆk − eˆ a ­ ® D E k k k k+1 k −sˆk ˆ k+1 k 1 k 2 = f(ˆx ) + sˆ , y − y + sˆ , + fk(y ) − f(ˆx ) + ksˆ k ­ ® µk µk ˆ k+1 k k+1 = fk(y ) + sˆ , y − y .

­ ® (ii) f (y) = f(ˆxk) + sˆk, y − xˆk − eˆ a P ¡ k­ ®¢ = f(ˆxk) + ` α −e˜ + si, y − xˆk i=1 i i ­ ® ˆ k+1 k k ˆ ≤ fk(y ) + maxi=1,...,`{−e˜i + sˆ , y − xˆ } = fk(y).

Lemma 9.4 Take an arbitrary function ψ : IRn → IR such that

n k+1 k+1 ψ(y) ≥ fa(y) for all y ∈ IR and ψ(y ) = fa(y )

Then, µ yk+1 = arg min ψ(y) + k ky − xˆkk2. y∈IRn 2 Proof. Using Lemma 9.3 (i),

ψ(y) + µk ky − xˆkk2 ≥ f (y) + µk ky − xˆkk2 2 a 2 ­ ® ˆ k+1 k k+1 µk k 2 = fk(y ) + sˆ , y − y + 2 ky − xˆ k , where equality holds if y = yk+1. The optimality conditions for minimizing the right hand side is ∗ k k µk(y − xˆ ) = −sˆ . Thus, y∗ = yk+1, implying that yk+1 minimizes the desired expression. As it will become clear in the convergence proofs, the only properties that k ˆ k ˆ are required by the model are exactly f(y ) = fk(y ) and fk(y) ≥ fa(y).

25 10 Convergence Analysis

In this section, we will assume that the algorithm (BA) is used over a lower-semi continuous function f, finite-valued on IRn. The general case proofs are similar but one needs to keep track of the Normal Cone induced by the set dom(f). Here, we will apply (BA) with the parameter δ¯ = 0, that is, (BA) loops forever. Our analysis is divided in two excludent cases:

• (BA) performs an infinite number of Serious Steps, i.e., |Ks| = +∞; • (BA) performs a finite number of Serious Steps and then only Null Steps. ¯ Lemma 10.1 Consider the (BA) and denote by f = minx f(x) > −∞. Then, X f(ˆx0) − f¯ (0 ≤) δ ≤ < +∞. k m k∈Ks

Proof. Take k ∈ Ks. Thus, since the SS test was satisfied, k k+1 k k+1 f(ˆx ) − f(y ) = f(ˆx ) − f(ˆx ) > mδk. 0 Let k be the next index in Ks which is ordered increasingly. Again, by defini- tion, k0 k0+1 k+1 k0+1 f(ˆx ) − f(ˆx ) = f(ˆx ) − f(ˆx ) ≥ mδk0 .

Summing over Ks, X X k k+1 0 ¯ mδk ≤ f(ˆx ) − f(ˆx ) ≤ f(ˆx ) − f

k∈Ks k∈Ks

∗ k Lemma 10.2 Suppose that f = limk∈Ks f(ˆx ) > −∞ and |Ks| = ∞. X 1 k k (i) If = ∞, then zero is a cluster point of {sˆ }k∈Ks , that is, lim inf ksˆ k = µk k∈Ks 0.

k (ii) If µk ≥ c > 0 and ∅ 6= arg minx f(x), then {xˆ }k∈Ks is bounded. 1 k 2 Proof. (i) Using Lemma 9.1 (iii), δk ≥ δk − eˆk − ksˆ k ≥ 0.Also, Lemma 2µk 10.1 states that

X ksˆkk2 X f(ˆx0) − f ∗ < δk ≤ . 2µk m k∈Ks k∈Ks k Thus,s ˆ → 0 over k ∈ Ks. n (ii) Letx ¯ ∈ arg miny f(y). By definition f(¯x) ≤ f(y) for all y ∈ IR . Now, for k ∈ Ks, ­ ® k+1 2 k 2 k k k+1 k k+1 2 kx¯ − xˆ k = kx¯ − xˆ k + 2 x¯­ − xˆ , xˆ −® xˆ + kxˆ − xˆ k k 2 2 k k 1 k 2 = kx¯ − xˆ k + µ x¯ − xˆ , sˆ + µ2 ksˆ k k ³­ ® k ´ = kx¯ − xˆkk2 + 2 x¯ − xˆk, sˆk + 1 ksˆkk2 µk ³ 2µk ´ k 2 2 ˆ k 1 k 2 = kx¯ − xˆ k + fk(¯x) − f(ˆx ) +e ˆk + ksˆ k µk ¡ ¢ 2µk k 2 2 k ≤ kx¯ − xˆ k + f(¯x) − f(ˆx ) + δk µk k 2 2 ≤ kx¯ − xˆ k + δk µk

26 Thus,

kX+1 X k+1 2 0 2 δi 0 2 2 kx¯ − xˆ k ≤ kx¯ − xˆ k + 2 ≤ kx¯ − xˆ k + δk µi c i=1 k∈Ks k which is bounded by Lemma 10.1. Thus, {xˆ }k∈Ks is bounded.

∗ k ¯ Theorem 10.1 Suppose that f = limk∈Ks f(ˆx ) > −∞, δ = 0, and |Ks| = ∞. k If the sequence µk is bounded from above and away from zero, then {xˆ }k∈Ks has at least one cluster point which is optimal.

Proof. By Lemma 10.1, 0 ≤ eˆk ≤ δk → 0 for k ∈ Ks. Lemma 9.2 states that

k k sˆ ∈ ∂eˆk f(ˆx ) for every k ∈ Ks.

n Lemma 10.2(i) implies that there exists a subsequence {sˆ k }k≥1 converging k nk to zero. Since {xˆ }k∈Ks is bounded by Lemma 10.2 (ii), {xˆ }k≥1 is also bounded. Taking a subsequence if necessary,x ˆnk → x¯. So we have

nk nk (ˆx , sˆ , eˆnk ) → (¯x, 0, 0).

Corollary 6.1 ensures that the correspondence (x, ε) 7→ ∂εf(x) is continuous. So, 0 ∈ ∂f(¯x). Now we move on to the second case, where (BA) performs one last Serious Step at iteration k0, and then a sequence of Null Steps for all k ≥ k0 + 1.

k0 k+1 Lemma 10.3 Let xˆ be the last Serious Step, and {y }k≥k0 the sequence of n Null Steps. Then, for all k > k0 and y ∈ IR ,

µk ­ ® µk f(ˆxk0 ) − δ + ky − yk+1k2 = fˆ (yk+1) + sˆk, y − yk+1 + ky − xˆk0 k2. k 2 k 2 Proof.

k0 2 k+1 k+1 k0 2 ky − xˆ k = ky − y + y ­ − xˆ k ® k+1 2 k+1 k+1 k0 k+1 k0 2 = ky − y k + 2 y­− y , y ® − xˆ + ky − xˆ k = ky − yk+1k2 − 2 y − yk+1, sˆk + kyk+1 − xˆk0 k2. µk

k0 ˆ k+1 µk k+1 k0 2 Using the definition of δk = f(ˆx ) − fk(y ) − 2 ky − xˆ k , ¡ ¢ f(ˆxk0 ) − δ + µk ky − yk+1k2 = fˆ (yk+1) + µk kyk+1 − xˆk0 k2 + ky − yk+1k2 k 2 k ­2 ® ˆ k+1 k+1 k µk k0 2 = fk(y ) + y − y , sˆ + 2 ky − xˆ k .

Theorem 10.2 Let xˆk0 be the iterate generated by the last Serious Step per- k+1 formed by (BA), and denote as {y }k≥k0 the sequence of iterates correspond- ing to the Null Steps.

If {µk}k>k0 is nondecreasing, it holds that δk → 0.

27 Proof. Let y = yk+2 on Lemma 10.3, ­ ® k0 µk k+2 k+1 2 ˆ k+1 k+2 k+1 k µk k+2 k0 2 f(ˆx ) − δk + 2 ky − y k = fk(y ) + y − y , sˆ + 2 ky − xˆ k k+2 µk k+2 k0 2 = fa(y ) + 2 ky − xˆ k ˆ k+2 µk k+2 k0 2 = fk+1(y ) + 2 ky − xˆ k ˆ k+2 µk+1 k+2 k0 2 ≤ fk+1(y ) + 2 ky − xˆ k k0 = f(ˆx ) − δk+1, where the last inequality follows since µk ≤ µk+1. So, rearranging the previous relation, µ δ ≥ δ + k kyk+2 − yk+1k2. (12) k k+1 2 Next, we will show that the sequence {yk} is bounded. Using one more time Lemma 10.3 with y =x ˆk0 ,

µk ­ ® f(ˆxk0 )−δ + kxˆk0 −yk+1k2 = fˆ (yk+1)+ sˆk, xk0 − yk+1 = fˆ (ˆxk0 ) ≤ f(ˆxk0 ). k 2 k k 2δ 2δ k0 k+1 2 k k0 Thus, kxˆ − y k ≤ ≤ since δk is decreasing and µk is nonde- µk µk0 creasing. So, {yk} is bounded. Finally, to show that δk & 0, note that the Serious Step test fails for every δ ˆ k0 k0 k > k0. Let C be a Lipschitz constant for f and fk in B(ˆx , )). Combining µk0

k+1 k0 k0 ˆ k+1 −mδk ≤ f(y ) − f(x ) and δk ≤ f(ˆx ) − fk(y ) we obtain that

k+1 ˆ k+1 (1 − m)δk ≤ f(y ) − fk(y ) k+1 k ˆ k ˆ k+1 = f(y ) − f(y ) + fk(y ) − fk(y ) ≤ 2Ckyk+1 − ykk k ˆ k where we used that f(y ) = fk(y ). Combining this with relation (12),

µ (1 − m)2 (1 − m)2 δ − δ ≥ k kyk+2 − yk+1k2 ≥ µ δ2 ≥ µ δ2 . k k+1 2 8C2 k k 8C2 k0 k+1

Thus, summing up in k ≥ k0,

(1 − m)2 X X µ δ2 ≤ (δ − δ ) ≤ δ , 8C2 k0 k k k+1 k0 k≥k0 k≥k0 which implies that δk → 0. Theorem 10.3 Suppose that (BA) performs one last Serious Step at iteration k0 ¯ k0 corresponding to an iterate xˆ , and we set δ = 0. If {µk}k≥k0 is nonde- creasing, xˆk0 is an optimal solution.

Proof. The assumptions allow us to invoke Theorem 10.2, so δk → 0 imply k thate ˆk → 0 and ksˆ k → 0 by Lemma 9.1 (iii). Again, Lemma 9.2 states that

k k0 sˆ ∈ ∂eˆk f(ˆx ) for all k > k0.

Corollary 6.1 ensures that the correspondence (x, ε) 7→ ∂εf(x) is continuous. So, 0 ∈ ∂f(ˆxk0 ).

28 11 Notes on the QP Implementation

One important question in practice concerns running times. Running times can be approximated by the number of calls to the oracle. This might be accurate in many applications but not all of them. Towards a more precise approximation, we need to take into account the effort necessary to define the next iterate to be evaluated by the oracle.

Method Cost Per Iteration # of Iterations Subgradient Updade a Vector Very High Cutting Planes LP (growing) Medium/High Bundle Method QP (bounded) Small/Medium

Table 11 clarifies the trade off between all the three methods mentioned so far. In practice, we will encounter applications where evaluating the oracle will be a very expensive operation and also will find applications where the oracle returns a solution instantenauslly. For the Bundle Methods to be competitive, the QP must be solved relatively fast. This is the subject of this section, i.e., to derive a specialized QP solver which takes advantage of the particular structure of our problem. We will be following [6] and refer to it for more details. Let G be the matrix n × ` formed by taking the subgradients si as columns, ande ˜ a vector formed by the linearization errors. Consider the primal version of our QP (6), ½ min r + µk ky − xˆkk2 (QP P ) r,y 2 re ≥ −e˜ + GT (y − xˆk)

Now, rewriting our dual version, ­ ®  k 1 1 T f(ˆx ) − minα α, G Gα + µk hα, e˜i  µk 2 (QP D) he, αi = 1  ` α ∈ IR+ In practice, it is common for n > 10000 while ` < 100. So, it is not surprising that all specialized QP codes within our framework focus on the dual version. Also, active set strategies seem to be more suitable in our framework since the QP called in consecutive iterations tend to be very similar, differing only by a few linear pieces. Denote B = {1, 2, . . . , `} as the complete set of indices of linear pieces in our model. For any B ⊂ B, B is a base which induces a submatrix GB of G (and a subvectore ˜B ofe ˜) containing the columns (components) whose indices are in B. It will be convenient to define for any B ⊂ B,

T QB = GBGB < 0 ande ˜B ≥ 0. We start by defining the QP restricted to the base B as ½ ¾ 1 (QP D ) min hα, Q αi + he˜ , αi : α ≥ 0, he, αi = 1 , B 2 B B

29 where we included the factor µk ine ˜B for convenience throughout this section. The basic step of our QP solver will be to solve the following relaxation of (QP DB), ½ ¾ 1 (RQP D ) min hα, Q αi + he˜ , αi : he, αi = 1 . B 2 B B

The optimality conditions for (RQP DB) can be summarized with the fol- lowing system of equations · ¸ · ¸ · ¸ Q −e α −e˜ (KT ) B B = eT 0 ρ 1 Solving this linear system will be the core of our algorithm and we defer some additional comments until the next subsection. For now, we assume that we can either find the unique solution for the system, or a direction wB such that QBwB = 0, he, wBi = 0. Now we state the algorithm. · ¸ 1 01.B = {1}, α = 0 ­ ® h 02.while (∃h ∈ B \ B such that r < s , d − e˜h) 03. B := B ∪ {h} 04. do µ · ¸¶ α¯ 05. if (KT ) has a unique solution B then ρ 06. if (¯αB ≥·0) ¸ α¯ 07. α = B 0 08. wB = 0 09. else 10. wB =α ¯B − α 11. else ( Let wB be a descent direction s.t. QBwB = 0, he, wBi = 0) 12. if (wB 6= 0) n o αh 13. η = min − : wh < 0, h ∈ B · wh ¸ w 14. α = α + η B 0 15. for all (h ∈ B : αh = 0) 16. B := B \{h} 17. while (wB 6= 0) 18.end while where we relate primal and dual variables by the following relations

d = −Gα r = −kdk2 − he,˜ αi . Now we will prove that our algorithm converges. First we need the following proposition.

Proposition 11.1 In the inner loop, we always have η > 0.

30 Proof. Consider the inner loop defined between lines 04 − 17. If the current inner iteration is not the first coming from the outer loop, by construction αB > 0 since we removed all zero components of the base B. By definition of η, it is strictly positive. So we can assume that we are in the first iteration of the inner loop, with 0 T T B := B ∪­ {h},®αB = [αB0 0] , where αB0 > 0 and is optimal for QP DB0 . h 2 Also, r < s , d − e˜h, where d = −GB0 αB0 and r = −kdk − he˜B0 , αB0 i. Optimality conditions of αB0 for (QP DB0 ) imply that

T −ρe = GB0 d − e˜B0 .

Multiplying by αB0 from the left, we obtain

­ T ® −ρ hαB0 , ei = αB0 ,GB0 d − hαB0 , e˜B0 i 2 −ρ = −kdk − he˜B0 , αB0 i = r. £ ¤ T 0T Consider any feasible direction from αB, i.e., w = w wh .Then, the improvement by moving η on this direction from αB is given by 1 1 hα + ηw, Q (α + ηw)i + he˜ , α + ηwi − hα ,Q α i + he˜ , α i = 2 B B B B B 2 B B B B B · ¸ · ¸ · ¸ h 0 QB0 ­ GB0 s® αB0 0 e˜B0 1 2 = η [w wh] h T h h +η [w wh] + η hw, QBwi . (GB0 s ) s , s 0 e˜h 2 The optimality conditions for minimizing in w are · ¸ · ¸ 2 e˜B0 ­ QB0 αB0 ® η QBw + η + η h + λe = 0 e˜h s ,GB0 αB0 Multiplying by w from the right, ¿ · ¸À 2 ­ QB0 αB0 +®e ˜B0 η hw, QBwi + η w, h + λ hw, ei = 0 s ,GB0 αB0 +e ˜h

First, assume that hw, QBwi > 0 and using that re = −QB0 αB0 − e˜B0 ,

¿ · ¸À ­ ® ­ −re ® 0 h 0 > η w, h = η(−r hw , ei + wh( s ,GB0 αB0 +e ˜h)). s ,GB0 αB0 +e ˜h

0 Since w is a feasible direction, 0 = hw, ei = hw , ei + wh, which implies 0 he, w i = −wh. Thus, using d = −GB0 αB0 ,

­ h ® ­ h ® 0 > η(rwh + wh(− s , d +e ˜h)) = ηwh(r − s , d +e ˜h), which implies that wh > 0. Assuming that the optimal direction is such that hw, QBwi = 0, we also have that QBw = 0 and GBw = 0 since QB < 0. Since w is a descent direction (αB cannot be optimal since we introduced another violated linear piece),

31 ¿ · ¸À e˜ 0 0 > η w, B e˜h 0 = η (¡­hw , e˜B0 i + whe˜®h) ¢ 0 T = η w ,GB0 d − re + whe˜h 0 0 = η (hGB0­w , di® − r hw , ei + whe˜h) h = η(−wh s­ , d +® rwh + whe˜h) h = ηwh(r − s , d +e ˜h), ­ ® 0 h h where we used that 0 = GBw = GB0 w +whs . Thus, wh > 0 since (r− s , d + e˜h) < 0. This implies that η can be chosen to be positive.

Theorem 11.1 The algorithm finishes in a finite number of steps.

Proof. First observe that on any inner iteration, at least one item is deleted from the base by definition of η. So, the inner loop cannot loop indefinetly. Moreover, it is impossible for the same base B to appear twice at the be- ginning of an outer iteration. This is true since αB is optimal for (QP DB) and at least a strict improvement has been performed (η > 0 due to the previous Lemma). Although exponential, the number of possible bases is finite. So, the algo- rithm terminates.

11.1 Solving the (KT) System As mentioned before, the details of this procedure can be found in [6]. Here, we will just discuss general ideas. The main point is to keep a lower trapezoidal factorization of the matrix QB of the current (QP DB). That is, · ¸ L 0 0 L¯ = B such that L¯ L¯T = Q B V T 0 B B B where LB0 is a lower triangular matrix with all positive diagonal entries. It is possible to show that the submatrix V T can only have up to two rows which allow us to derive specific formulae for computing the solution of the system (KT ) or an infinite direction.

12 Applications 12.1 Basic Duality Theory This section complements the material in the Introduction and it is included here for sake of completeness. Consider our problem of interest, the primal, known to be hard to solve as mentioned in the Introduction.   max g(y)  y (P ) ,  hi(y) = 0 i = 1, . . . , m  y ∈ D

We dualize the equality constraints to construct a dual function f given by

32 f(x) = max g(y) + hx, h(y)i y . y ∈ D As claimed before, given x, w ∈ IRn, α ∈ [0, 1],

f(αx + (1 − α)w) = max g(y) + hαx + (1 − α)w, h(y)i y∈D £ ­ ®¤ £ ­ ®¤ = max α g(y1) + x, h(y2) + (1 − α) g(y2) + w, h(y2) y1,y2 y1 = y2 1 2 y £∈ D, y ∈­ D ®¤ £ ­ ®¤ ≤ max α g(y1) + x, h(y1) + (1 − α) g(y2) + w, h(y2) y1,y2 y1 ∈ D, y2 ∈ D = α max [g(y) + hx, h(y)i] + (1 − α) max [g(y) + hw, h(y)i] y∈D y∈D = αf(x) + (1 − α)f(w).

Also, using that the feasible set of (P ) is always contained in D,

f(x) = max g(y) + hx, h(y)i y y ∈ D ≥ max g(y), . y hi(y) = 0 i = 1, . . . , m y ∈ D we have that f(x) ≥ (P ) for any x. Thus, ( min f(x) (D) x x ∈ IRn is a upper bound for (P ). To obtain (D) we need to minimize the convex function f. Note that f is given implicitly by a maximization problem. So, it is hard to obtain additional structure for f in general. In fact, one cannot assume that f is differentiable. In order to apply any method based on oracles, we need to be able to compute subgradients. It turns out that, within this application, subgradients are no harder to compute than evaluating the function. Let y(x) ∈ D achieve the maximum in the definition of f(x). So,

f(x) = f(x − w + w) = g(y(x)) + hh(y(x)), x − w + wi = g(y(x)) + hh(y(x)), wi + hh(y(x)), x − wi ≤ f(w) + hh(y(x)), x − wi . Thus, f(w) ≥ f(x) + hh(y(x)), w − xi and h(y(x)) ∈ ∂f(x).

12.2 Held and Karp Bound The Held and Karp Bound is one of the most celebrated applications of duality theory. It applies duality theory and combinatorial techniques to the most stud- ied problem in Combinatorial Optimization, the Traveling Salesman Problem (TSP).

33 Let G = G(Vn,En) be the undirected complete graph induced by a set of nodes Vn and let the set of edges be En. For each edge of the graph, a cost ce is given, and we associate a binary variable pe which equals one if we use the edge e in the solution and equals zero otherwise. For every set S ⊂ Vn, we denote by δ(S) = {(i, j) ∈ En : i ∈ S, j ∈ Vn \ S} the edges with exactly one endpoint in S, and γ(S)P = {(i, j) ∈ En : i ∈ S, j ∈ S}. Finally, for every set of edges A ⊂ En, p(A) = e∈A pe. The TSP formulation of Dantzig, Fulkerson and Johnson is given by  X  min cepe  p  e∈En p(δ({j})) = 2 , for every j ∈ V (13)  n  p(δ(S)) ≥ 2 ,S ⊂ V ,S 6= ∅  n pe ∈ {0, 1} , e ∈ En. In order to derive the Held and Karp bound, we first conveniently define the set of 1-trees of G. Assume that one (any) given vertex of G is set aside, say vertex v1. In association, consider one (any) spanning tree of the subgraph of G induced by the vertices Vn\{v1}. An 1-tree of G is obtained by adding any two edges incident on v1 to that spanning tree. Let X be the convex hull of the incidence vectors of all the 1-Trees of G just introduced. Then, (13) is equivalent to  X  min cepe  p e∈En (14)  p(δ({j})) = 2 , for every j ∈ V (∗)  n p ∈ X Now, the Held and Karp bound is obtained by maximizing a dual function which arises from dualizing constraints (14.*),

\[
f(x) = \min_{p \in X}\; \sum_{i=1}^{n-1} \sum_{j > i} (c_{ij} + x_i + x_j)\, p_{ij} \;-\; 2\sum_{i=1}^{n} x_i .
\]
We point out that computing a minimum spanning tree, i.e., a spanning tree of minimum cost, can be done in polynomial time. So, we can efficiently optimize over $X$ and evaluate $f(x)$ for any given $x$. In order to obtain tighter bounds, we introduce a family of facet-inducing inequalities for the polytope of (13), called the $r$-Regular $t$-Path inequalities. More precisely, we choose sets of vertices $H_1, H_2, \dots, H_{r-1}$ and $T_1, T_2, \dots, T_t$, called "handles" and "teeth" respectively, which satisfy the following relations:

\[
H_1 \subset H_2 \subset \cdots \subset H_{r-1}
\]

\[
H_1 \cap T_j \ne \emptyset \quad \text{for } j = 1, \dots, t
\]

\[
T_j \setminus H_{r-1} \ne \emptyset \quad \text{for } j = 1, \dots, t
\]

\[
(H_{i+1} \setminus H_i) \subset \bigcup_{j=1}^{t} T_j \quad \text{for } 1 \le i \le r-2.
\]

The corresponding $r$-Regular $t$-Path inequality is given by

\[
\sum_{i=1}^{r-1} p(\gamma(H_i)) + \sum_{j=1}^{t} p(\gamma(T_j))
\;\le\; \sum_{i=1}^{r-1} |H_i| + \sum_{j=1}^{t} |T_j| - \frac{t(r-1) + r - 1}{2} .
\]
As we introduce these inequalities as additional constraints in (14), we may improve on the Held and Karp bound. In order to keep the approach computable, we also need to dualize these inequalities. However, the number of such inequalities is exponential in the size of the problem, and it would be impossible to consider all of them at once. A dynamic scheme is needed here: we consider only a small subset of these inequalities at each iteration, and we introduce and remove inequalities from this subset based on primal and dual information obtained by the algorithm. We refer to [3] for a complete description and proofs of this Dynamic Bundle Method. Based on the 1-tree solution obtained in the current iteration, we try to identify violated inequalities to be added to our subset of inequalities. This is done through the use of a Separation Oracle. For this particular family, one can rely only on heuristics, since no efficient exact separation procedure is known. Fortunately, the Separation Oracle is called to separate inequalities from 1-tree structures. In this case, thanks to integrality and the existence of exactly one cycle, the search for violated inequalities is much easier than for general fractional points (which is the case for linear relaxations). Essentially, we first search for a node with three or more incident edges in the 1-tree, and then try to expand an odd number of paths from such a node.
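To make the 1-tree oracle concrete, here is a minimal sketch in plain Python (our own illustration, not the dynamic bundle implementation of [3]): vertex 0 plays the role of $v_1$, a simple Prim's algorithm builds the spanning tree on the remaining vertices under the modified costs $c_{ij} + x_i + x_j$, and, echoing the separation heuristic just described, the function also reports the nodes of degree three or more in the resulting 1-tree.

```python
import numpy as np

def min_one_tree_oracle(c, x):
    """Held-Karp dual oracle (illustrative): evaluates
    f(x) = (cost of a minimum 1-tree under the costs c[i,j] + x[i] + x[j]) - 2*sum(x).
    c is a symmetric n x n cost matrix (n >= 3); vertex 0 plays the role of v1.
    Also returns the 1-tree edges and the nodes of degree >= 3, the natural
    starting points for the separation heuristic described above."""
    n = len(x)
    w = c + x[:, None] + x[None, :]                    # modified edge costs

    # Prim's algorithm for a spanning tree on the vertices 1, ..., n-1
    best = {j: (w[1, j], 1) for j in range(2, n)}      # cheapest link into the tree
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        _, parent = best.pop(j)
        edges.append((parent, j))
        for k in best:
            if w[j, k] < best[k][0]:
                best[k] = (w[j, k], j)

    # complete the 1-tree with the two cheapest edges incident to vertex 0
    cheapest = sorted(range(1, n), key=lambda j: w[0, j])[:2]
    edges += [(0, j) for j in cheapest]

    fval = sum(w[i, j] for i, j in edges) - 2.0 * x.sum()
    degree = np.zeros(n, dtype=int)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    seeds = [v for v in range(n) if degree[v] >= 3]    # candidate handle nodes
    return fval, edges, seeds
```

For the 1-tree returned by the oracle, the vector whose $j$-th entry is (degree of node $j$ in the 1-tree) $- 2$ is a supergradient of this concave dual function at $x$; it is, for instance, the quantity a subgradient-type scheme would feed back to update the multipliers.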

12.3 LOP

The Linear Ordering Problem (LOP) is another example of an NP-hard combinatorial problem that can be successfully addressed by Lagrangian relaxation. The LOP consists of placing the elements of a finite set $N$ in sequential order. If object $i$ is placed before object $j$, we incur a cost $c_{ij}$. The objective is to find the order with minimum total cost. Applications related to the LOP include the triangularization of input-output matrices in Economics, the dating of artifacts in Archeology, and Internet search engines, among others (see [7]). The linear formulation of the LOP in [7] uses a set of binary variables $\{y_{ij} : (i,j) \in N \times N\}$: if object $i$ is placed before object $j$, then $y_{ij} = 1$ and $y_{ji} = 0$.
\[
\begin{array}{rll}
\displaystyle\min_{y} & \displaystyle\sum_{(i,j):\, i \ne j} c_{ij}\, y_{ij} & \\
& y_{ij} + y_{ji} = 1, & \text{for every pair } (i,j) \\
& y_{ij} \in \{0,1\}, & \text{for every pair } (i,j) \\
& y_{ij} + y_{jk} + y_{ki} \le 2, & \text{for every triple } (i,j,k) \quad (*)
\end{array}
\tag{15}
\]
The 3-cycle inequalities in constraint $(*)$ above have a huge cardinality, but one expects only a small subset of them to be active at the optimal solution. These inequalities are the candidates for the dynamic bundle method proposed in [3]; that is, we keep only a subset of these inequalities at each iteration, updated based on the solutions of the relaxed subproblems. We associate a multiplier $x_{ijk}$ with each one of them. After relaxing these inequalities, we obtain a dual function which is concave in $x$.

Now, observe that the formulation (15) without the 3-cycle inequalities decomposes into smaller problems. In fact, given any $x$, we can evaluate the dual function by solving $(|N|^2 - |N|)/2$ subproblems with only two variables each. More precisely,
\[
f(x) = \sum_{(i,j):\, i < j} \;\min_{\substack{y_{ij} + y_{ji} = 1 \\ y_{ij},\, y_{ji} \in \{0,1\}}} \bigl[\, \tilde c_{ij}\, y_{ij} + \tilde c_{ji}\, y_{ji} \,\bigr] \;+\; \text{(a constant depending only on } x\text{)},
\]
where the modified cost $\tilde c_{ij}$ collects the original cost $c_{ij}$ and the multipliers $x_{ijk}$ of the relaxed 3-cycle inequalities involving the ordered pair $(i,j)$, and the constant term comes from the right-hand sides of the dualized inequalities. Each subproblem is solved by a single comparison of $\tilde c_{ij}$ and $\tilde c_{ji}$.
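A minimal sketch of this evaluation, assuming the modified costs $\tilde c$ have already been assembled into a matrix (that assembly depends on the sign convention adopted for the multipliers, so it is left out) and ignoring the additive constant in $x$:

```python
import numpy as np

def lop_relaxed_value(ctilde):
    """Evaluate the decomposed part of the LOP dual function: for each pair
    i < j, set y_ij = 1 if ctilde[i, j] <= ctilde[j, i] and y_ji = 1 otherwise.
    Returns the total cost of these choices and the 0/1 variables; the additive
    constant coming from the dualized right-hand sides is left to the caller."""
    n = ctilde.shape[0]
    y = np.zeros((n, n))
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if ctilde[i, j] <= ctilde[j, i]:
                y[i, j] = 1.0
                total += ctilde[i, j]
            else:
                y[j, i] = 1.0
                total += ctilde[j, i]
    return total, y
```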

12.4 Symmetry of a Polytope

Given a closed convex set $S$ and a point $x \in S$, define the symmetry of $S$ about $x$ as follows:

\[
\operatorname{sym}(x, S) := \max\{\alpha \ge 0 : x + \alpha(x - y) \in S \ \text{for every } y \in S\} , \tag{16}
\]
which intuitively states that $\operatorname{sym}(x, S)$ is the largest scalar $\alpha$ such that every point $y \in S$ can be reflected through $x$ by the factor $\alpha$ and still lie in $S$. The symmetry value of $S$ then is:

\[
\operatorname{sym}(S) := \max_{x \in S}\, \operatorname{sym}(x, S) , \tag{17}
\]
and $x^*$ is a symmetry point of $S$ if $x^*$ achieves the above supremum. $S$ is symmetric if $\operatorname{sym}(S) = 1$. We refer to [4] for a more complete description of the properties of the symmetry function. Our interest lies in computing an $\varepsilon$-approximate symmetry point of $S$, which is a point $x \in S$ that satisfies:

\[
\operatorname{sym}(x, S) \ge (1 - \varepsilon)\, \operatorname{sym}(S) .
\]
In [4], it was established that the symmetry function is quasiconcave. Unfortunately, an oracle depends not only on the set but also on its representation, and computing the symmetry value appears to be a hard problem for general convex sets. Here, we restrict ourselves to the case in which $S$ is polyhedral. The convex set $S$ is assumed to be represented by a finite number of linear inequalities, that is, $S := \{x \in \mathbb{R}^n : Ax \le b\}$. We will show that there is still enough structure to obtain a constructive characterization which will allow us to solve the problem through the use of Bundle Methods (see [4] for a polynomial-time algorithm based on interior-point methods for this problem). Let $\bar x \in S$ and $\alpha \ge 0$ be given. Then, from the definition of $\operatorname{sym}(\cdot, S)$ in (16), $\operatorname{sym}(\bar x, S) \ge \alpha$ if and only if:

\[
A(\bar x + v) \le b \;\Rightarrow\; A(\bar x - \alpha v) \le b ,
\]
which we restate as:

\[
Av \le b - A\bar x \;\Rightarrow\; -\alpha A_{i\cdot} v \le b_i - A_{i\cdot} \bar x , \quad i = 1, \dots, m . \tag{18}
\]

Now, we can apply a theorem of the alternative to each of the $m$ implications in (18) above. Then (18) is true if and only if there exists an $m \times m$ matrix $\Lambda$ of multipliers that satisfies:

\[
\begin{array}{rl}
\Lambda A = -\alpha A & \\
\Lambda (b - A\bar x) \le b - A\bar x & \qquad (19)\\
\Lambda \ge 0 . &
\end{array}
\]
Here "$\Lambda \ge 0$" is understood componentwise12 over all $m^2$ components of $\Lambda$. This characterization implies that $\operatorname{sym}(\bar x, S) \ge \alpha$ if and only if the system (19) has a feasible solution, i.e., (19) is a complete characterization of the $\alpha$-level set of the symmetry function of $S$. So, we can state $\operatorname{sym}(S)$ as the optimal objective value of the following optimization problem

\[
\begin{array}{rl}
\displaystyle\max_{x, \Lambda, \alpha} & \alpha \\
\text{s.t.} & \Lambda A = -\alpha A \\
& \Lambda (b - A x) \le b - A x \qquad (20)\\
& \Lambda \ge 0 ,
\end{array}
\]
and any solution $(x^*, \Lambda^*, \alpha^*)$ of (20) satisfies $\operatorname{sym}(S) = \alpha^*$ and $\operatorname{sym}(x^*, S) = \operatorname{sym}(S)$. Unfortunately, this formulation is not a linear programming problem, since $x$ and $\Lambda$ multiply each other. In order to apply the Bundle Methods, we build a two-level scheme. For each fixed $x^0$, computing its symmetry reduces to solving a linear programming problem. Thus, the oracle for our problem becomes

\[
\begin{array}{rlr}
\operatorname{sym}(x^0, S) = \displaystyle\max_{\Lambda, \alpha} & \alpha & \\
\text{s.t.} & \Lambda A = -\alpha A & (i)\\
& \Lambda b + \alpha A x^0 \le b - A x^0 & (ii) \qquad (21)\\
& \Lambda \ge 0 & (iii).
\end{array}
\]
Now, from any solution of (21), we can generate a supporting hyperplane for the level set of $\operatorname{sym}(\cdot, S)$ at $x^0$. Let $\mu^*$ be an optimal Lagrange multiplier associated with constraints (21).(ii); then

\[
-(1 + \alpha^*)\, \mu^{*T} A \;\in\; \partial\, \operatorname{sym}(x^0, S). \tag{22}
\]
This hyperplane will play the role of the subgradient in our previous analysis. Also, we will use the following fact in our analysis.

Remark 12.1 If $x^0$ is not optimal, then for every symmetry point $x^*$ we have that
\[
\bigl\langle x^0 - x^*,\; (1 + \alpha^*)\, \mu^{*T} A \bigr\rangle < 0 ,
\]
since $x^*$ is in the interior of the upper level set $\{x : \operatorname{sym}(x, S) \ge \operatorname{sym}(x^0, S)\}$.

12 It is not a positive semidefinite constraint.
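As a minimal sketch of this oracle (our own illustration, using the modelling package cvxpy for the LP; the function name and the toy instance are invented for the example), the code below solves (21) for a fixed $x^0$ and assembles the vector in (22) from the multipliers of constraints (ii):

```python
import numpy as np
import cvxpy as cp

def sym_oracle(A, b, x0):
    """Solve the LP (21) for S = {x : A x <= b} at the point x0.
    Returns alpha* = sym(x0, S) and the vector -(1 + alpha*) * mu*^T A of (22),
    where mu* are the optimal multipliers of constraints (ii)."""
    m, n = A.shape
    Lam = cp.Variable((m, m), nonneg=True)               # (iii)
    alpha = cp.Variable()
    cons_i = Lam @ A == -alpha * A                        # (i)
    cons_ii = Lam @ b + alpha * (A @ x0) <= b - A @ x0    # (ii)
    prob = cp.Problem(cp.Maximize(alpha), [cons_i, cons_ii])
    prob.solve()
    mu = cons_ii.dual_value
    return alpha.value, -(1.0 + alpha.value) * (mu @ A)

# toy example: the unit box [0, 1]^2 queried at (0.25, 0.5); its symmetry there is 1/3
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([1.0, 1.0, 0.0, 0.0])
print(sym_oracle(A, b, np.array([0.25, 0.5])))
```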

We cannot use the standard cutting-plane model in this problem. Since $f(\cdot) = -\operatorname{sym}(\cdot, S)$ is not convex, its epigraph may not be convex and the cutting-plane model could cut into the epigraph of $f$. To deal with the quasiconvexity of $f$, one needs to adapt the cutting-plane model. In our case, we will ignore the value of the function given by the oracle. The model will be
\[
\hat f(y) = \max_{i = 1, \dots, \ell} \;\bigl\{ \langle s^i,\, y - y^i \rangle \bigr\}.
\]
Remark 12.1 implies that, for any symmetry point $x^*$, $\hat f(x^*) < 0$ if $\{y^i\}_{i=1}^{\ell}$ does not contain any optimal point.
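A minimal sketch of this modified model, which deliberately discards the oracle's function values and keeps only the linearizations anchored at the previous iterates:

```python
import numpy as np

def quasiconvex_model(y, centers, subgrads):
    """Modified cutting-plane model fhat(y) = max_i <s_i, y - y_i>.
    Function values from the oracle are intentionally ignored; only the
    linearizations anchored at the iterates y_i are kept."""
    return max(np.dot(s, np.asarray(y) - np.asarray(yi))
               for yi, s in zip(centers, subgrads))
```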

12.5 Semidefinite Programming (SDP)

Semidefinite Programming (SDP) was arguably the most exciting development in mathematical programming in the 1990's. We point out two among several reasons for that. First, the modelling power of the SDP framework has proved to be remarkable: combinatorial optimization, control theory, and eigenvalue optimization are a few examples of fields where SDP has had a drastic impact. Second, due to the development of interior point methods (IPMs), polynomial-time complexity was established for SDP. Although IPMs enjoy incredible success on SDPs of moderate size, they do have limitations for large-scale instances. This is exactly where Bundle Methods can be an attractive approach. We will be following [9] closely. First we introduce the notation needed for this section. Let $S^n$ denote the set of all $n \times n$ symmetric matrices. $M \in S^n$ is said to be positive semidefinite if, for every $d \in \mathbb{R}^n$,

\[
\langle d, Md \rangle \ge 0 .
\]

Denote by $S^n_+$ the set of all $n \times n$ symmetric positive semidefinite matrices. Given $A, B \in S^n$, $A \succeq B$ means that $A - B \in S^n_+$. The inner product defined on $S^n$ is the natural extension of the usual inner product for $\mathbb{R}^n$: $\langle A, B \rangle = \operatorname{tr}(B^T A) = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} B_{ij}$. A linear operator $\mathcal{A} : S^n \to \mathbb{R}^m$ and its adjoint13 $\mathcal{A}^* : \mathbb{R}^m \to S^n$ are defined as
\[
\mathcal{A}X = \begin{bmatrix} \langle A_1, X \rangle \\ \langle A_2, X \rangle \\ \vdots \\ \langle A_m, X \rangle \end{bmatrix}
\qquad \text{and} \qquad
\mathcal{A}^* y = \sum_{i=1}^{m} y_i A_i ,
\]
where $A_i \in S^n$ for $i = 1, \dots, m$. Now, we can define the standard primal (SDP) problem as
\[
(P)\qquad \begin{array}{rl} \displaystyle\max_{X} & \langle C, X \rangle \\ & \mathcal{A}X = b \\ & X \succeq 0 , \end{array}
\]

13 Defined to induce $\langle \mathcal{A}X, y \rangle = \langle X, \mathcal{A}^* y \rangle$ for all $y \in \mathbb{R}^m$ and $X \in S^n$.

and the dual problem can be constructed as
\[
(D)\qquad \begin{array}{rl} \displaystyle\min_{y, Z} & \langle b, y \rangle \\ & Z = \mathcal{A}^* y - C \\ & Z \succeq 0 . \end{array}
\]
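A small numpy sketch of the operator $\mathcal{A}$, its adjoint, and a numeric check of the defining identity $\langle \mathcal{A}X, y \rangle = \langle X, \mathcal{A}^* y \rangle$ (the random data is only for the check; these two helpers are reused in the sketches below):

```python
import numpy as np

def A_op(As, X):
    """A X = (<A_1, X>, ..., <A_m, X>) with the Frobenius inner product."""
    return np.array([np.tensordot(Ai, X) for Ai in As])

def A_adj(As, y):
    """A* y = sum_i y_i A_i."""
    return sum(yi * Ai for yi, Ai in zip(y, As))

# numeric check of <A X, y> = <X, A* y> on random symmetric data
rng = np.random.default_rng(0)
n, m = 4, 3
As = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]
X = rng.standard_normal((n, n)); X = (X + X.T) / 2
y = rng.standard_normal(m)
print(np.allclose(A_op(As, X) @ y, np.tensordot(X, A_adj(As, y))))   # True
```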

For convenience, we will assume that some constraint qualification holds, implying that there is no duality gap between the primal and dual formulations and that, for every optimal primal-dual pair of solutions $(X^*, y^*, Z^*)$, we have

\[
X^* Z^* = 0 .
\]

Moreover, we assume that

\[
\mathcal{A}X = b \ \text{ implies } \ \operatorname{tr}(X) = 1 .
\]

Thus, a redundant constraint, $\operatorname{tr}(X) = 1$, can be added to the primal problem, yielding the following dual, equivalent to (D):

\[
\begin{array}{rl}
\displaystyle\min_{y, \lambda, Z} & \lambda + \langle b, y \rangle \\
& Z = \mathcal{A}^* y + \lambda I - C \\
& Z \succeq 0 .
\end{array}
\]
Since $\operatorname{tr}(X^*) = 1$ for any primal optimal solution, we must have $X^* \ne 0$. Therefore, any optimal dual solution $(y^*, \lambda^*, Z^*)$ must have $Z^*$ singular or, equivalently, $\lambda_{\max}(-Z^*) = 0$. Rewriting our dual problem, we have

\[
\min_{y \in \mathbb{R}^m} \; \lambda_{\max}(C - \mathcal{A}^* y) + \langle b, y \rangle ,
\]
which is an eigenvalue optimization problem. For completeness, we recall some results regarding this problem. The function

\[
\lambda_{\max}(X) = \max\{ \langle W, X \rangle : \operatorname{tr}(W) = 1,\; W \succeq 0 \}
\]
is a convex function on $S^n$, differentiable only if the maximal eigenvalue of $X$ has multiplicity one. Unfortunately, when optimizing eigenvalue functions, the optimum is generally attained at matrices whose maximal eigenvalue has multiplicity greater than one. Using Lemma 8.1, the subdifferential of $\lambda_{\max}(\cdot)$ at $X$ is exactly

\[
\partial \lambda_{\max}(X) = \{ W \succeq 0 : \langle W, X \rangle = \lambda_{\max}(X),\; \operatorname{tr}(W) = 1 \}.
\]
In particular, consider any $v \in \mathbb{R}^n$ with norm one contained in the eigenspace of the maximal eigenvalue of $X$. Then, $W = vv^T$ is such that

\[
\operatorname{tr}(W) = \operatorname{tr}(vv^T) = \operatorname{tr}(v^T v) = v^T v = 1,
\]
and
\[
\langle W, X \rangle = \langle vv^T, X \rangle = \langle v, Xv \rangle = \langle v, \lambda_{\max} v \rangle = \lambda_{\max}(X).
\]
Thus, $vv^T \in \partial \lambda_{\max}(X)$.
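A short numeric check of this computation (`np.linalg.eigh` returns eigenvalues in ascending order, so the last column of the eigenvector matrix corresponds to $\lambda_{\max}$; the random matrix is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 5)); X = (X + X.T) / 2
lam, Q = np.linalg.eigh(X)        # eigenvalues in ascending order
v = Q[:, -1]                      # unit eigenvector for lambda_max
W = np.outer(v, v)
print(np.isclose(np.trace(W), 1.0), np.isclose(np.sum(W * X), lam[-1]))   # True True
```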

Returning to the function of interest,

\[
f(y) = \lambda_{\max}(C - \mathcal{A}^* y) + \langle b, y \rangle ,
\]
its subdifferential at $y$ can be computed as

\[
\partial f(y) = \{\, b - \mathcal{A}W : \langle W, C - \mathcal{A}^* y \rangle = \lambda_{\max}(C - \mathcal{A}^* y),\; \operatorname{tr}(W) = 1,\; W \succeq 0 \,\}.
\]

We also note that the set of subgradients is bounded, and thus our function $f$ is Lipschitz continuous. Given $y$, the oracle for $f$ can be implemented through standard methods such as the Lanczos method.
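A minimal sketch of such an oracle, using scipy's Lanczos routine `eigsh` and the helpers `A_op` and `A_adj` sketched above: it returns $f(y)$ together with the subgradient $b - \mathcal{A}(vv^T)$ for a unit eigenvector $v$ of the maximal eigenvalue.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def sdp_dual_oracle(C, As, b, y):
    """Oracle for f(y) = lambda_max(C - A* y) + <b, y>.
    Returns (f(y), subgradient) with subgradient = b - A(v v^T), where v is a
    unit eigenvector of the maximal eigenvalue (cf. the subdifferential above)."""
    M = C - A_adj(As, y)                      # A_adj as sketched earlier
    lam, vec = eigsh(M, k=1, which='LA')      # largest algebraic eigenvalue (Lanczos)
    v = vec[:, 0]
    W = np.outer(v, v)                        # W = v v^T, tr(W) = 1
    return lam[0] + float(np.dot(b, y)), b - A_op(As, W)
```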

12.5.1 Spectral Bundle Methods

Recently, a different variant, called the Spectral Bundle Method, was introduced for solving SDP problems. Instead of the "classical" cutting-plane model $\hat f$, a different model is used to approximate $f$, taking advantage of the particular structure of the problem. Since our proofs required only mild assumptions on the model itself (as noted in Section 9.2), the convergence proofs are not affected at all. Consider the auxiliary function

\[
L(W, y) := \langle C - \mathcal{A}^* y, W \rangle + \langle b, y \rangle .
\]

Using Lemma 8.1, we can rewrite our function of interest as

\[
f(y) = \max\{ L(W, y) : W \succeq 0,\; \operatorname{tr}(W) = 1 \}.
\]

This will be used to motivate a new lower approximation for $f$. For any set $\mathcal{W} \subset \{ W \in S^n : W \succeq 0,\ \operatorname{tr}(W) = 1 \}$, we have that

\[
f(y) \;\ge\; \hat f_{\mathcal{W}}(y) := \max\{ L(W, y) : W \in \mathcal{W} \}.
\]

The Spectral Bundle Method uses the following definition for $\mathcal{W}$. For $r < n$, consider a matrix $P \in \mathbb{R}^{n \times r}$ with orthonormal columns, i.e., $P^T P = I_r \in \mathbb{R}^{r \times r}$, and a matrix $\bar W \in S^n_+$ with $\operatorname{tr}(\bar W) = 1$. The set of semidefinite matrices that we will take the supremum over is

\[
\hat{\mathcal{W}} = \{\, \alpha \bar W + P V P^T : \alpha + \operatorname{tr}(V) = 1,\; \alpha \ge 0,\; V \succeq 0 \,\}.
\]

The new minorant model $\hat f$ is defined as

\[
\hat f(y) = \max\{ L(W, y) : W \in \hat{\mathcal{W}} \},
\]
which now depends on $P$ and $\bar W$. We note that not only do we have $\hat f(y) \le f(y)$, but also, if for some $\hat y \in \mathbb{R}^m$ we have $vv^T \in \hat{\mathcal{W}}$ for some eigenvector $v$ associated with $\lambda_{\max}(C - \mathcal{A}^* \hat y)$, then $\hat f(\hat y) = f(\hat y)$. This happens, for example, if $v$ is a column of $P$ or, more generally, if $v$ belongs to the range of $P$. As motivated in the last paragraph, the idea is for $P$ to contain subgradient information from the current center and from previous iterations. This generates a more accurate model than a cutting-plane model based on the same subgradients. The cost associated with this gain is paid in the computation of the next iterate. Instead of the "classical" quadratic problem, we need to solve
\[
\min_{y} \max_{W \in \hat{\mathcal{W}}} \; L(W, y) + \frac{\mu_k}{2}\, \| y - \hat x^k \|^2 ,
\]
which includes a semidefinite constraint in the definition of $\hat{\mathcal{W}}$. Since $y$ is unconstrained, the problem can be simplified using duality. Interchanging max and min,

\[
\min_{y} \max_{W \in \hat{\mathcal{W}}} \; L(W, y) + \frac{\mu_k}{2}\, \| y - \hat x^k \|^2
\;=\;
\max_{W \in \hat{\mathcal{W}}} \min_{y} \; L(W, y) + \frac{\mu_k}{2}\, \| y - \hat x^k \|^2 ,
\]
we can now optimize in $y$ for any fixed $W$. The first-order condition,
\[
\nabla_y \left( L(W, y) + \frac{\mu_k}{2}\, \| y - \hat x^k \|^2 \right) = b - \mathcal{A}W + \mu_k (y - \hat x^k) = 0,
\]
is used to eliminate the variable $y$. We obtain
\[
\max_{W \in \hat{\mathcal{W}}} \;\; \bigl\langle C - \mathcal{A}^* \hat x^k,\, W \bigr\rangle + \bigl\langle b, \hat x^k \bigr\rangle - \frac{1}{2\mu_k}\, \langle \mathcal{A}W - b,\, \mathcal{A}W - b \rangle , \tag{23}
\]
a semidefinite program with a quadratic cost function, which can be solved by interior-point methods. Now, the trade-off in the choice of $r$, the number of columns of $P$, is clear: we want to increase $r$ to obtain a better model, but we also need to keep it relatively small so that (23) can still be solved efficiently through interior-point methods. Aggregation is an important feature to control $r$ and still ensure convergence. When [9] was published, interior-point methods were restricted to SDP problems with semidefinite blocks of size up to 500, while the Spectral Bundle Method was capable of dealing with problems with semidefinite blocks of size up to 3000.
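A minimal sketch of subproblem (23), written with cvxpy purely for illustration ([9] solves it with a specialized interior-point code): the variable $W$ is parametrized as $\alpha \bar W + P V P^T$ with $\alpha \ge 0$, $V \succeq 0$, $\alpha + \operatorname{tr}(V) = 1$, and the constant matrix $C - \mathcal{A}^* \hat x^k$ is assembled inline. The function name and argument layout are ours.

```python
import numpy as np
import cvxpy as cp

def spectral_bundle_subproblem(C, As, b, xhat, mu, P, Wbar):
    """Illustrative version of subproblem (23): maximize over W in What of
    <C - A* xhat, W> + <b, xhat> - (1/(2 mu)) ||A W - b||^2, where
    What = {alpha * Wbar + P V P^T : alpha + tr(V) = 1, alpha >= 0, V psd}."""
    r = P.shape[1]
    V = cp.Variable((r, r), PSD=True)
    alpha = cp.Variable(nonneg=True)
    W = alpha * Wbar + P @ V @ P.T
    AW = cp.hstack([cp.trace(Ai @ W) for Ai in As])          # A W, componentwise
    G = C - sum(xi * Ai for xi, Ai in zip(xhat, As))          # C - A* xhat (constant)
    obj = cp.trace(G @ W) + float(b @ xhat) - cp.sum_squares(AW - b) / (2.0 * mu)
    prob = cp.Problem(cp.Maximize(obj), [alpha + cp.trace(V) == 1])
    prob.solve()
    return alpha.value, V.value, W.value
```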

A Appendix: Separating Hyperplane (geometric version of Hahn-Banach)

Theorem A.1 Let $S \subset \mathbb{R}^n$ be a nonempty closed convex set and let $x \in \mathbb{R}^n$ with $x \notin S$. Then there exists $d \in \mathbb{R}^n$ such that

\[
\langle d, x \rangle > \langle d, y \rangle \quad \text{for all } y \in S.
\]

Proof. Since $S$ is closed, there exists $z \in \partial S$ such that $\| z - x \| = \inf\{ \| y - x \| : y \in S \}$.

We claim that $\langle x - z, y - z \rangle \le 0$ for all $y \in S$. Suppose, to the contrary, that there exists $y \in S$ such that $\langle x - z, y - z \rangle > 0$. For $0 \le t \le 1$, let $s_t := z + t(y - z)$, so

\[
\| s_t - x \|^2 = \| z - x \|^2 - 2t\, \langle x - z, y - z \rangle + t^2 \| y - z \|^2 ,
\]
which is smaller than $\| z - x \|^2$ for $t > 0$ small enough. Since $s_t \in S$ by convexity, this contradicts $z$ being the point of minimum distance. Thus, for $d := x - z$,

\[
\langle d, y - z \rangle \le 0 \quad \text{for all } y \in S.
\]
So,

\[
\langle d, y \rangle \le \langle d, z \rangle = \langle d, x - d \rangle = \langle d, x \rangle - \langle d, d \rangle < \langle d, x \rangle ,
\]
since $d = x - z \ne 0$.

Corollary A.1 Let $S \subseteq \mathbb{R}^n$ be a nonempty closed convex set. If $x \in \partial S$, there exists $d$ such that $\langle d, x \rangle \ge \langle d, y \rangle$ for all $y \in S$.

Proof. Take a sequence $x^k \to x$ with $x^k \notin S$. Applying the Separating Hyperplane Theorem, for each $k$ there exists $d^k$, $\| d^k \| = 1$, such that
\[
\langle d^k, x^k \rangle > \langle d^k, y \rangle \quad \text{for all } y \in S.
\]

Since $\{ d^k \}$ is a bounded sequence, there exists a convergent subsequence $d^{n_k} \to d$, $\| d \| = 1$. Taking limits, $\langle d, x \rangle \ge \langle d, y \rangle$ for all $y \in S$.

B Appendix: Hahn-Banach for real vector spaces

Theorem B.1 (Hahn-Banach for real vector spaces). Let $X$ be a Banach space and $p$ a convex functional on $X$. Then there exists a linear functional $\lambda$ on $X$ such that $p(x) \ge \lambda(x)$ for all $x \in X$.

Proof. Take $0 \in X$ and define $\tilde X = \{0\}$, $\tilde\lambda(0) = p(0)$. If there exists $z \in X$ with $z \notin \tilde X$, extend $\tilde\lambda$ from $\tilde X$ to the subspace generated by $\tilde X$ and $z$,

\[
\tilde\lambda(tz + \tilde x) = t\,\tilde\lambda(z) + \tilde\lambda(\tilde x).
\]

Suppose $x_1, x_2 \in \tilde X$ and $\alpha > 0$, $\beta > 0$. Then

\[
\begin{aligned}
\beta\lambda(x_1) + \alpha\lambda(x_2) &= \lambda(\beta x_1 + \alpha x_2) \\
&= (\alpha + \beta)\, \lambda\!\left( \tfrac{\beta}{\alpha+\beta} x_1 + \tfrac{\alpha}{\alpha+\beta} x_2 \right) \\
&\le (\alpha + \beta)\, p\!\left( \tfrac{\beta}{\alpha+\beta} x_1 + \tfrac{\alpha}{\alpha+\beta} x_2 \right) \\
&= (\alpha + \beta)\, p\!\left( \tfrac{\beta}{\alpha+\beta} (x_1 - \alpha z) + \tfrac{\alpha}{\alpha+\beta} (x_2 + \beta z) \right) \\
&\le \beta\, p(x_1 - \alpha z) + \alpha\, p(x_2 + \beta z).
\end{aligned}
\]
Thus,

\[
\beta\bigl[ -p(x_1 - \alpha z) + \lambda(x_1) \bigr] \le \alpha\bigl[ p(x_2 + \beta z) - \lambda(x_2) \bigr],
\]
\[
\frac{1}{\alpha}\bigl[ -p(x_1 - \alpha z) + \lambda(x_1) \bigr] \le \frac{1}{\beta}\bigl[ p(x_2 + \beta z) - \lambda(x_2) \bigr],
\]

\[
\sup_{x_1 \in \tilde X,\ \alpha > 0} \frac{1}{\alpha}\bigl[ -p(x_1 - \alpha z) + \lambda(x_1) \bigr]
\;\le\; a \;\le\;
\inf_{x_2 \in \tilde X,\ \beta > 0} \frac{1}{\beta}\bigl[ p(x_2 + \beta z) - \lambda(x_2) \bigr] .
\]

Define $\tilde\lambda(z) = a$. Then, assuming $t > 0$,
\[
\tilde\lambda(tz + \tilde x) = t\,\tilde\lambda(z) + \tilde\lambda(\tilde x) = ta + \tilde\lambda(\tilde x)
\le t\left[ \frac{1}{t}\, p(\tilde x + tz) - \frac{\tilde\lambda(\tilde x)}{t} \right] + \tilde\lambda(\tilde x)
= p(tz + \tilde x);
\]
the case $t < 0$ is similar. To extend to all of $X$, we use Zorn's lemma. Let $E$ be the collection of all extensions $e$ of $\lambda$ with $e(x) \le p(x)$ for all $x \in X_e$, where $e_1 \prec e_2$ if $X_{e_1} \subseteq X_{e_2}$ and $e_1(x) = e_2(x)$ on $X_{e_1}$. Thus, $E$ is partially ordered and $E \ne \emptyset$.

If $\{ e_s \}_{s \in J}$ is a totally ordered subset of $E$, then $\Lambda = \bigcup_{s \in J} X_{e_s}$ is a subspace (a monotone union of subspaces) and

$e : \Lambda \to \mathbb{R}$, $e(x) = e_s(x)$ if $x \in X_{e_s}$, is well defined and $e_s \prec e$ for all $s \in J$; that is, $e$ is an upper bound for the chain. Thus, by Zorn's Lemma, $E$ must have a maximal element $\tilde\lambda$.

So, $\tilde\lambda$ must be defined on all of $X$; otherwise we could extend it as above, contradicting the fact that it is maximal.

References

[1] A. Belloni and A. Lucena, Improving on the Held and Karp Bound, Working Paper.

[2] A. Belloni and A. Lucena, Lagrangian Heuristics to Linear Ordering, in: Computer-Decision Making, Kluwer Academic Publishers, 2004 (book chapter).

[3] A. Belloni and C. Sagastizábal, Dynamic Bundle Methods: Application to Combinatorial Optimization, submitted to Mathematical Programming (July 2004).

[4] A. Belloni and R. Freund, Symmetry Points of a Convex Set: Basic Properties and Computational Complexity, submitted to Mathematical Programming (July 2004).

[5] J. F. Bonnans, J. Ch. Gilbert, C. Lemaréchal, and C. Sagastizábal, Numerical Optimization: Theoretical and Practical Aspects, Universitext, Springer-Verlag, Berlin, 2003, xiv+423 pp.

[6] A. Frangioni, Solving Semidefinite Quadratic Problems Within Nonsmooth Optimization Algorithms, Computers & Operations Research, 23(11), pp. 1099-1118, 1996.

[7] M. Grötschel, M. Jünger, and G. Reinelt, A cutting-plane algorithm for the linear ordering problem, Operations Research, 32:1195-1220, 1984.

[8] M. Held and R. M. Karp, The traveling-salesman problem and minimum spanning trees, Operations Research, 17, 1138-1162, 1970.

[9] C. Helmberg and F. Rendl, A Spectral Bundle Method for Semidefinite Programming, SIAM Journal on Optimization, Vol. 10, No. 3, pp. 673-696.

[10] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I & II, Grundlehren der mathematischen Wissenschaften, no. 305-306, Springer-Verlag, 1993.

[11] R. T. Rockafellar, Convex Analysis, Princeton University Press, 1970.

[12] N. Shor, Minimization Methods for Non-differentiable Functions, Springer-Verlag, Berlin, 1985.
