Interior-Point Theory for

Robert M. Freund

May, 2014

© 2014 Massachusetts Institute of Technology. All rights reserved.

1 Background

The material presented herein is based on the following two research texts:

Interior-Point Polynomial Algorithms in Convex Programming by Yurii Nesterov and Arkadii Nemirovskii, SIAM 1994, and

A Mathematical View of Interior-Point Methods in Convex Optimization by James Renegar, SIAM 2001.

2 Barrier Scheme for Solving Convex Optimization

Our problem of interest is

P :   minimize_x   c^T x

      s.t.   x ∈ S ,

where S is some closed convex set, and denote the optimal objective value by V*. Let f(·) be a barrier function for S, namely f(·) satisfies:

(a) f(·) is strictly convex on its domain D_f := int S , and

(b) f(x) → ∞ as x → ∂S .

The idea of the barrier method is to dissuade the algorithm from computing points too close to ∂S, effectively eliminating the complicating factors of dealing with ∂S. For every value of µ > 0 we create the barrier problem:

P_µ :   minimize_x   µ c^T x + f(x)

        s.t.   x ∈ D_f .

Note that P_µ is effectively unconstrained, since the boundary of S will never be encountered. The solution of P_µ is denoted z(µ):

z(µ) := arg min_x { µ c^T x + f(x) : x ∈ D_f } .

Intuitively, as µ → ∞, the impact of the barrier function on the solution of P_µ should become less and less, so we should have c^T z(µ) → V* as µ → ∞. Presuming this is the case, the barrier scheme tries to use Newton's method to solve for approximate solutions x^i of P_{µ_i} for an increasing sequence of values µ_i → ∞. In order to be more specific about how the barrier scheme might work, let us assume that at each iteration we have some value x ∈ D_f that is an approximate solution of P_µ for a given value µ > 0. We will, of course, need a way to define "is an approximate solution of P_µ" that will be developed later. We then will increase the barrier parameter µ by a multiplicative factor α > 1:

µ̂ ← αµ .

Then we will take a Newton step at x for the problem P_µ̂ to obtain a new point x̂ that we would like to then be an approximate solution of P_µ̂. If so, we can continue the scheme inductively. We typically use g(·) and H(·) to denote the gradient and Hessian of f(·). Note that the Newton iterate for P_µ̂ has the formula:

x̂ ← x − H(x)^{-1}(µ̂ c + g(x)) .

The general algorithmic scheme is presented in Algorithm 1.

3 Some Plain Facts

Let f(·) : R^n → R be a twice-differentiable function. We typically use g(·) and H(·) to denote the gradient and Hessian of f(·). Here are four facts about integrals and derivatives:

Fact 3.1   g(y) = g(x) + ∫₀¹ H(x + t(y − x))(y − x) dt

Algorithm 1 General Barrier Scheme

Initialize.

Initialize with µ₀ > 0 and x⁰ ∈ D_f that is "an approximate solution of P_µ₀ ." i ← 0.

Define α > 1.

At iteration i :

1. Current values.

µ ← µi

x ← xi

2. Increase µ and take Newton step.

µ̂ ← αµ

x̂ ← x − H(x)^{-1}(µ̂ c + g(x))

3. Update values.

µ_{i+1} ← µ̂

x_{i+1} ← x̂
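As a concrete illustration of the scheme (a hypothetical numerical sketch, not part of the development), consider the one-dimensional problem of minimizing c·x over S = {x : x ≥ 0} with barrier f(x) = −ln(x), so that g(x) = −1/x, H(x) = 1/x², and V* = 0:

```python
# Numerical sketch of the general barrier scheme (Algorithm 1) for the
# 1-D problem: minimize c*x s.t. x >= 0, with barrier f(x) = -ln(x).
# Then g(x) = -1/x, H(x) = 1/x^2, and the optimal value is V* = 0.
c = 1.0
alpha = 1.2          # multiplicative increase factor for mu (a modest choice)
mu = 1.0
x = 1.0 / (mu * c)   # start at the exact minimizer z(mu) of mu*c*x - ln(x)
for i in range(60):
    mu = alpha * mu                        # mu_hat <- alpha * mu
    x = x - x ** 2 * (mu * c - 1.0 / x)    # x_hat <- x - H(x)^{-1}(mu_hat*c + g(x))
print(mu, c * x)     # c*x approaches V* = 0 as mu grows
```

Here z(µ) = 1/(µc), so c·z(µ) = 1/µ → V* = 0; the modest factor α = 1.2 keeps each iterate well inside the region where the Newton step is reliable.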

Fact 3.2 Let h(t) := f(x + tv). Then

(i) h′(t) = g(x + tv)^T v , and

(ii) h″(t) = v^T H(x + tv) v .

Fact 3.3

f(y) = f(x) + g(x)^T (y − x) + (1/2)(y − x)^T H(x)(y − x)

       + ∫₀¹ ∫₀ᵗ (y − x)^T [H(x + s(y − x)) − H(x)](y − x) ds dt

Fact 3.4   ∫₀ʳ [ 1/(1 − at)² − 1 ] dt = ar²/(1 − ar)

This follows by observing that ∫ 1/(1 − at)² dt = 1/(a(1 − at)).
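Fact 3.4 is easy to confirm numerically; the snippet below (a hypothetical sketch using numpy quadrature, not part of the text) compares both sides for a few values of a and r with ar < 1:

```python
import numpy as np

# Numerical check of Fact 3.4:
#   integral_0^r [ 1/(1 - a*t)^2 - 1 ] dt = a*r^2 / (1 - a*r),  valid when a*r < 1.
for a, r in [(0.5, 0.9), (2.0, 0.4), (1.0, 0.5)]:
    t = np.linspace(0.0, r, 200001)
    f = 1.0 / (1.0 - a * t) ** 2 - 1.0
    lhs = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))  # trapezoid rule
    rhs = a * r ** 2 / (1.0 - a * r)
    assert abs(lhs - rhs) < 1e-4, (a, r, lhs, rhs)
print("Fact 3.4 verified")
```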

We also present five additional facts that we will need in our analyses.

Fact 3.5 Suppose f(·) is a convex function on R^n, S ⊂ R^n is a compact convex set, and suppose x ∈ int S satisfies f(x) ≤ f(y) for all y ∈ ∂S. Then f(·) attains its global minimum on S.

Fact 3.6 Let ‖v‖ := √(v^T v) be the Euclidean norm. Let λ₁ ≤ ... ≤ λ_n be the ordered eigenvalues of the symmetric matrix M, and define ‖M‖ := max{‖Mx‖ : ‖x‖ ≤ 1}. Then ‖M‖ = max_i{|λ_i|} = max{|λ_n|, |λ₁|}.

Fact 3.7 Suppose A, B are symmetric and A + B = θI for some θ ∈ R. Then AB = BA. Furthermore, if A ≽ 0, B ≽ 0, then A^α B^β = B^β A^α for all α, β ≥ 0.

To see why this is true, decompose A = PDP^T where P is orthonormal (P^T = P^{-1}) and D is diagonal. Then B = P(θI − D)P^T, whereby A^α B^β = PD^α P^T P(θI − D)^β P^T = PD^α (θI − D)^β P^T = P(θI − D)^β D^α P^T = P(θI − D)^β P^T PD^α P^T = B^β A^α.

Fact 3.8 Suppose λ_n ≥ ... ≥ λ₁ > 0. Then

max_i{|λ_i − 1|} ≤ max{λ_n − 1, 1/λ₁ − 1} .

Fact 3.9 Suppose a, b, c, d > 0. Then

min{ a/b , c/d } ≤ (a + c)/(b + d) ≤ max{ a/b , c/d } .

4 Self-Concordant Functions and Properties

Let f(·) be a strictly convex twice-differentiable function defined on the open set D_f := domain f(·), and let D̄_f := cl D_f. Consider x ∈ D_f. We will often abbreviate H_x := H(x) for the Hessian at x. Here we assume that H_x ≻ 0, whereby H_x can be used to define the norm

‖v‖_x := √(v^T H_x v) ,

which is the "local norm" at x. Notice that

‖v‖_x = √(v^T H_x v) = ‖H_x^{1/2} v‖ ,

where ‖w‖ = √(w^T w) is the standard Euclidean (L2) norm. Let

B_x(x, 1) := {y : ‖y − x‖_x < 1} .

This is called the open Dikin ball at x, after the Russian mathematician I. I. Dikin.

Definition 4.1 f(·) is said to be (strongly nondegenerate) self-concordant if for all x ∈ D_f we have B_x(x, 1) ⊂ D_f, and for all y ∈ B_x(x, 1) we have:

1 − ‖y − x‖_x ≤ ‖v‖_y / ‖v‖_x ≤ 1/(1 − ‖y − x‖_x)    for all v ≠ 0.

Let SC denote the class of all such functions.
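For intuition, Definition 4.1 can be checked directly for the univariate barrier f(x) = −ln(x) (a hypothetical numerical sketch; here H(x) = 1/x², so ‖v‖_x = |v|/x, ‖y − x‖_x = |y − x|/x, and the ratio ‖v‖_y/‖v‖_x = x/y for every v ≠ 0):

```python
import random

# Check of Definition 4.1 for f(x) = -ln(x):  H(x) = 1/x^2, so
#   ||v||_x = |v|/x,   ||y - x||_x = |y - x|/x,   ||v||_y / ||v||_x = x/y.
random.seed(0)
for _ in range(1000):
    x = random.uniform(0.1, 10.0)
    y = x * (1.0 + random.uniform(-0.999, 0.999))  # y in the Dikin ball B_x(x,1) = (0, 2x)
    d = abs(y - x) / x                             # ||y - x||_x < 1
    r = x / y                                      # ||v||_y / ||v||_x (independent of v)
    assert 1.0 - d - 1e-9 <= r <= 1.0 / (1.0 - d) + 1e-9
print("Definition 4.1 holds for -ln(x) on all sampled points")
```

Note that for y < x the upper bound holds with equality, which shows the bound in the definition cannot be improved in general.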

Remark 1 The following are the most-used self-concordant functions:

(i) f(x) = −ln(x) for x ∈ D_f = {x ∈ R : x > 0} ,

(ii) f(X) = −ln det(X) for X ∈ D_f = {X ∈ S^{k×k} : X ≻ 0} , and

(iii) f(x) = −ln(x₁² − Σ_{j=2}^n x_j²) for x ∈ D_f := {x : ‖(x₂, ..., x_n)‖ < x₁} .

Before showing that these functions are self-concordant, let us see how we can combine self-concordant functions to obtain other self-concordant functions.

Proposition 4.1 (self-concordance under addition/intersection) Suppose that f_i(·) ∈ SC with domain D_i := D_{f_i} for i = 1, 2, and suppose that D := D₁ ∩ D₂ ≠ ∅. Define f(·) = f₁(·) + f₂(·). Then D_f = D and f(·) ∈ SC.

Proof: Consider x ∈ D = D₁ ∩ D₂. Let B_x^i(c, r) denote the Dikin ball centered at c with radius r defined by f_i(·), and let ‖·‖_{x,i} denote the norm induced at x using the Hessian H_i(x) of f_i(·), for i = 1, 2. Then since x ∈ D_i we have B_x^i(x, 1) ⊂ D_i, and ‖v‖_x² = ‖v‖_{x,1}² + ‖v‖_{x,2}² because H(x) = H₁(x) + H₂(x). Therefore if ‖y − x‖_x < 1 it follows that ‖y − x‖_{x,1} < 1 and ‖y − x‖_{x,2} < 1, whereby y ∈ B_x^i(x, 1) ⊂ D_i for i = 1, 2, and hence y ∈ D₁ ∩ D₂ = D. Also, for any v ≠ 0, using Fact 3.9 we have

‖v‖_y² / ‖v‖_x² = (‖v‖_{y,1}² + ‖v‖_{y,2}²) / (‖v‖_{x,1}² + ‖v‖_{x,2}²) ≤ max{ ‖v‖_{y,1}²/‖v‖_{x,1}² , ‖v‖_{y,2}²/‖v‖_{x,2}² }

≤ max{ ( 1/(1 − ‖y − x‖_{x,1}) )² , ( 1/(1 − ‖y − x‖_{x,2}) )² }

≤ ( 1/(1 − ‖y − x‖_x) )² .

The virtually identical argument can also be applied to prove the "≥" inequality of the definition of self-concordance, by replacing "max" by "min" above and applying the other inequality of Fact 3.9.

Proposition 4.2 (self-concordance under affine transformation) Let A ∈ R^{m×n} satisfy rank A = n ≤ m. Suppose that f(·) ∈ SC with domain D_f ⊂ R^m, and define f̂(·) by f̂(x) = f(Ax − b). Then f̂(·) ∈ SC with domain D̂ := {x : Ax − b ∈ D_f}.

Proof: Consider x ∈ D̂ and s = Ax − b. Letting g(s) and H(s) denote the gradient and Hessian of f(·) at s, and ĝ(x) and Ĥ(x) the gradient and Hessian of f̂(·) at x, we have ĝ(x) = A^T g(s) and Ĥ(x) = A^T H(s)A. Suppose that ‖y − x‖_x < 1. Then defining t := Ay − b we have 1 > ‖y − x‖_x = √((Ay − Ax)^T H(s)(Ay − Ax)) = ‖t − s‖_s, whereby t ∈ D_f and so y ∈ D̂. Therefore B_x(x, 1) ⊂ D̂. Also, for any v ≠ 0, we have

‖v‖_y / ‖v‖_x = √(v^T A^T H(t) A v) / √(v^T A^T H(s) A v) = ‖Av‖_t / ‖Av‖_s ≤ 1/(1 − ‖s − t‖_s) = 1/(1 − ‖y − x‖_x) .

The exact same argument can also be applied to prove the "≥" inequality of the definition of self-concordance.

Proposition 4.3 The three functions defined in Remark 1 are self-concordant.

Proof: We will prove that f(X) := −ln det(X) is self-concordant on its domain {X ∈ S^{k×k} : X ≻ 0}. When k = 1, this is the logarithmic barrier function. Although it is true, we will not prove that f(x) = −ln(x₁² − Σ_{j=2}^n x_j²) is a self-concordant barrier for the interior of the second-order cone Q^n := {x : ‖(x₂, ..., x_n)‖ ≤ x₁}, as this proof is arithmetically uninspiring.

Before we get started with the proof, we have to expand and amend our notation a bit. If we consider two vectors x, y in the n-dimensional space V (which we typically identify with R^n), we use the standard inner product x^T y to define the standard norm ‖v‖ := √(v^T v). When we consider the vector space V to be S^{k×k} (the set of symmetric matrices of order k), we need to define the standard inner product of two symmetric k × k matrices X and Y and also the standard norm on V = S^{k×k}. The standard inner product in this case is the trace inner product:

X • Y := Tr(XY) = Σ_{i=1}^k (XY)_{ii} = Σ_{i=1}^k Σ_{j=1}^k X_{ij} Y_{ij} .

8 The trace inner product has the following properties which are elementary to establish:

1. A • BC = AB • C

2. Tr(AB) = Tr(BA), whereby A • B = B • A

The trace inner product is used to define the standard norm:

‖X‖ := √(X • X) .

This norm is also called the Frobenius norm of the matrix X. Notice that the Frobenius norm of X is not the operator norm of X. For the remainder of this proof we use the trace inner product and the Frobenius norm, and the reader should set aside operator norms entirely.

To prove f(X) := −ln det(X) is self-concordant, let X ≻ 0 be given, and let Y ∈ B_X(X, 1) and V ∈ S^{k×k} be given. We need to verify three statements:

1. Y ≻ 0 ,

2. ‖V‖_Y / ‖V‖_X ≤ 1/(1 − ‖Y − X‖_X) , and

3. ‖V‖_Y / ‖V‖_X ≥ 1 − ‖Y − X‖_X .

To get started, direct expansion yields the following second-order expansion of f(X):

f(X + ∆X) ≈ f(X) − X^{-1} • ∆X + (1/2) ∆X • X^{-1} ∆X X^{-1} ,

and indeed it is easy to derive:

• g(X) = −X^{-1} , and

• H(X)∆X = X^{-1} ∆X X^{-1} .

It therefore follows that

‖∆X‖_X = √( Tr(∆X X^{-1} ∆X X^{-1}) )

        = √( Tr(X^{-1/2} ∆X X^{-1/2} X^{-1/2} ∆X X^{-1/2}) )        (1)

        = √( Tr([X^{-1/2} ∆X X^{-1/2}]²) ) .
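Equation (1) states that ‖∆X‖_X is exactly the Frobenius norm of X^{-1/2} ∆X X^{-1/2}; this identity is easy to confirm numerically (a hypothetical sketch with numpy; the matrices are randomly generated, not from the text):

```python
import numpy as np

# Check of equation (1): ||dX||_X = sqrt(Tr(dX X^{-1} dX X^{-1}))
# equals the Frobenius norm of X^{-1/2} dX X^{-1/2}.
rng = np.random.default_rng(0)
k = 5
A = rng.standard_normal((k, k))
X = A @ A.T + k * np.eye(k)       # a (randomly chosen) positive definite X
B = rng.standard_normal((k, k))
dX = (B + B.T) / 2.0              # a symmetric direction Delta X

Xinv = np.linalg.inv(X)
lhs = np.sqrt(np.trace(dX @ Xinv @ dX @ Xinv))

w, Q = np.linalg.eigh(X)
Xmh = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T      # X^{-1/2} via eigendecomposition
rhs = np.linalg.norm(Xmh @ dX @ Xmh, "fro")

assert abs(lhs - rhs) < 1e-10 * (1.0 + rhs)
print("equation (1) verified")
```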

Now define two auxiliary matrices:

F := X^{-1/2} Y X^{-1/2}    and    S := X^{-1/2} V X^{-1/2} .

Note that

‖S‖ = √( Tr(X^{-1/2} V X^{-1/2} X^{-1/2} V X^{-1/2}) ) = ‖V‖_X .        (2)

Furthermore let us write F = QDQ^T, where Q is orthonormal (Q^T = Q^{-1}) and the diagonal matrix D is comprised of the eigenvalues of F; let λ denote the vector of eigenvalues, with minimum and maximum λ_min and λ_max. To prove item (1.) above, we observe:

1 > ‖Y − X‖_X² = Tr(X^{-1/2}(Y − X)X^{-1/2} X^{-1/2}(Y − X)X^{-1/2})

               = Tr((F − I)(F − I))

               = Tr(Q(D − I)Q^T Q(D − I)Q^T)        (3)

               = Tr((D − I)(D − I))

               = Σ_{j=1}^k (λ_j − 1)²

               = ‖λ − e‖₂² ,

where e = (1, ..., 1). Since the last quantity above is less than 1, it follows that λ > 0, and hence F ≻ 0 and therefore Y ≻ 0, establishing (1.). In order to establish (2.) and (3.) we will need the following:

‖F^{-1/2} S F^{-1/2}‖ ≤ (1/λ_min) ‖S‖    and    ‖F^{-1/2} S F^{-1/2}‖ ≥ (1/λ_max) ‖S‖ .        (4)

To prove (4), we proceed as follows:

‖F^{-1/2} S F^{-1/2}‖ = √( Tr(Q D^{-1/2} Q^T S Q D^{-1/2} Q^T Q D^{-1/2} Q^T S Q D^{-1/2} Q^T) )

                      = √( Tr(D^{-1} Q^T S Q D^{-1} Q^T S Q) )

                      ≤ (1/√λ_min) √( Tr(Q^T S Q D^{-1} Q^T S Q) )

                      = (1/√λ_min) √( Tr(D^{-1} Q^T S S Q) )

                      ≤ (1/λ_min) √( Tr(Q^T S S Q) )

                      = (1/λ_min) √( Tr(SS) ) = (1/λ_min) ‖S‖ .

The other inequality of (4) follows by substituting λ_max for λ_min and switching ≥ for ≤ in the above chain of equalities and inequalities. We now have:

‖V‖_Y² = Tr(V Y^{-1} V Y^{-1})

       = Tr(X^{-1/2} V X^{-1/2} X^{1/2} Y^{-1} X^{1/2} X^{-1/2} V X^{-1/2} X^{1/2} Y^{-1} X^{1/2})

       = Tr(S F^{-1} S F^{-1})

       = Tr(F^{-1/2} S F^{-1/2} F^{-1/2} S F^{-1/2})

       = ‖F^{-1/2} S F^{-1/2}‖² ≤ (1/λ_min²) ‖S‖² = (1/λ_min²) ‖V‖_X² ,

where the last inequality follows from (4) and the last equality from (2). Therefore

‖V‖_Y / ‖V‖_X ≤ 1/λ_min ≤ 1/(1 − |1 − λ_min|) ≤ 1/(1 − ‖e − λ‖₂) = 1/(1 − ‖Y − X‖_X) ,

where the last equality is from (3). This proves (2.). To prove (3.), use the same equalities as above and the second inequality of (4) to obtain:

‖V‖_Y² = ‖F^{-1/2} S F^{-1/2}‖² ≥ (1/λ_max²) ‖V‖_X² ,

and therefore ‖V‖_Y / ‖V‖_X ≥ 1/λ_max. If λ_max ≤ 1 it follows directly that ‖V‖_Y / ‖V‖_X ≥ 1 ≥ 1 − ‖Y − X‖_X, while if λ_max > 1 we have:

‖Y − X‖_X = ‖λ − e‖₂ ≥ λ_max − 1 ,

from which it follows that

λ_max ‖Y − X‖_X ≥ ‖Y − X‖_X ≥ λ_max − 1    and so    λ_max (1 − ‖Y − X‖_X) ≤ 1 .

From this it then follows that ‖V‖_Y / ‖V‖_X ≥ 1/λ_max ≥ 1 − ‖Y − X‖_X, thus completing the proof of (3.).

Our next result is rather technical, as it shows further properties of changes in Hessian matrices under self-concordance. Recall from Fact 3.6 that the operator norm of a matrix M is defined using the Euclidean norm, namely

kMk := max{kMxk : kxk ≤ 1} .

Lemma 4.1 Suppose that f(·) ∈ SC and x ∈ D_f. If ‖y − x‖_x < 1, then

(i) ‖H_x^{-1/2} H_y H_x^{-1/2}‖ ≤ ( 1/(1 − ‖y − x‖_x) )² ,

(ii) ‖H_x^{1/2} H_y^{-1} H_x^{1/2}‖ ≤ ( 1/(1 − ‖y − x‖_x) )² ,

(iii) ‖I − H_x^{-1/2} H_y H_x^{-1/2}‖ ≤ ( 1/(1 − ‖y − x‖_x) )² − 1 , and

(iv) ‖I − H_x^{1/2} H_y^{-1} H_x^{1/2}‖ ≤ ( 1/(1 − ‖y − x‖_x) )² − 1 .

Proof: Let Q := H_x^{-1/2} H_y H_x^{-1/2}, and observe that Q ≻ 0 with eigenvalues λ_n ≥ ... ≥ λ₁ > 0. From Fact 3.6 we have

√‖Q‖ = √λ_n = max_w √(w^T Q w / w^T w) = max_v √(v^T H_y v / v^T H_x v) = max_v ‖v‖_y/‖v‖_x ≤ 1/(1 − ‖y − x‖_x)

(where the third equality uses the substitution v = H_x^{-1/2} w), and squaring yields the first assertion. Similarly, we have

1/√‖Q^{-1}‖ = √λ₁ = min_w √(w^T Q w / w^T w) = min_v √(v^T H_y v / v^T H_x v) = min_v ‖v‖_y/‖v‖_x ≥ 1 − ‖y − x‖_x

(where the third equality again uses the substitution v = H_x^{-1/2} w), and squaring and rearranging yields the second assertion. Next observe

‖I − Q‖ = max_i{|λ_i − 1|} ≤ max{λ_n − 1, 1/λ₁ − 1} ≤ ( 1/(1 − ‖y − x‖_x) )² − 1 ,

where the first inequality is from Fact 3.8 and the second inequality follows from the two equation streams above, thus showing the third assertion of the lemma. Finally, we have

‖I − Q^{-1}‖ = max_i{|1/λ_i − 1|} ≤ max{1/λ₁ − 1, λ_n − 1} ≤ ( 1/(1 − ‖y − x‖_x) )² − 1 ,

where the first inequality is from Fact 3.8 and the second inequality follows from the two equation streams above, thus showing the fourth assertion of the lemma.

The next result states that we can bound the error in the quadratic approximation of f(·) inside the Dikin ball B_x(x, 1).

Proposition 4.4 Suppose that f(·) ∈ SC and x ∈ D_f. If ‖y − x‖_x < 1, then

| f(y) − [ f(x) + g(x)^T (y − x) + (1/2)(y − x)^T H_x (y − x) ] | ≤ ‖y − x‖_x³ / (3(1 − ‖y − x‖_x)) .

Proof: Let L denote the left-hand side of the inequality to be proved. From Fact 3.3 we have

L = | ∫₀¹ ∫₀ᵗ (y − x)^T [H(x + s(y − x)) − H(x)] (y − x) ds dt |

  = | ∫₀¹ ∫₀ᵗ (y − x)^T H_x^{1/2} [H_x^{-1/2} H(x + s(y − x)) H_x^{-1/2} − I] H_x^{1/2} (y − x) ds dt |

  ≤ ‖y − x‖_x² ∫₀¹ ∫₀ᵗ ‖H_x^{-1/2} H(x + s(y − x)) H_x^{-1/2} − I‖ ds dt

  ≤ ‖y − x‖_x² ∫₀¹ ∫₀ᵗ [ ( 1/(1 − s‖y − x‖_x) )² − 1 ] ds dt        (from Lemma 4.1)

  = ‖y − x‖_x² ∫₀¹ (‖y − x‖_x t²) / (1 − t‖y − x‖_x) dt        (from Fact 3.4)

  ≤ ( ‖y − x‖_x³ / (1 − ‖y − x‖_x) ) ∫₀¹ t² dt = ‖y − x‖_x³ / (3(1 − ‖y − x‖_x)) .

Recall Newton's method to minimize f(·). At x ∈ D_f we compute the Newton step:

n(x) := −H(x)^{-1} g(x)

and compute the Newton iterate:

x₊ := x + n(x) = x − H(x)^{-1} g(x) .

When f(·) ∈ SC, Newton's method has some wonderful properties, as we now show.

Theorem 4.1 Suppose that f(·) ∈ SC and x ∈ D_f. If ‖n(x)‖_x < 1, then

‖n(x₊)‖_{x₊} ≤ ( ‖n(x)‖_x / (1 − ‖n(x)‖_x) )² .

Proof: We will prove this by proving the following two results, which together establish the theorem:

(a) ‖n(x₊)‖_{x₊} ≤ ‖H_x^{-1/2} g(x₊)‖ / (1 − ‖n(x)‖_x) , and

(b) ‖H_x^{-1/2} g(x₊)‖ ≤ ‖n(x)‖_x² / (1 − ‖n(x)‖_x) .

First we prove (a):

‖n(x₊)‖_{x₊}² = g(x₊)^T H(x₊)^{-1} H(x₊) H(x₊)^{-1} g(x₊)

             = g(x₊)^T H_x^{-1/2} H_x^{1/2} H(x₊)^{-1} H_x^{1/2} H_x^{-1/2} g(x₊)

             ≤ ‖H_x^{1/2} H(x₊)^{-1} H_x^{1/2}‖ · ‖H_x^{-1/2} g(x₊)‖²

             ≤ ( 1/(1 − ‖x₊ − x‖_x) )² ‖H_x^{-1/2} g(x₊)‖²        (from Lemma 4.1)

             = ( 1/(1 − ‖n(x)‖_x) )² ‖H_x^{-1/2} g(x₊)‖² ,

which proves (a). To prove (b), observe first that

g(x₊) = g(x₊) − g(x) + g(x)

      = g(x₊) − g(x) − H_x n(x)

      = ∫₀¹ H(x + t(x₊ − x))(x₊ − x) dt − H_x n(x)        (from Fact 3.1)

      = ∫₀¹ [H(x + t n(x)) − H_x] n(x) dt

      = ∫₀¹ [H(x + t n(x)) − H_x] H_x^{-1/2} H_x^{1/2} n(x) dt .

Therefore

H_x^{-1/2} g(x₊) = ∫₀¹ [ H_x^{-1/2} H(x + t n(x)) H_x^{-1/2} − I ] H_x^{1/2} n(x) dt ,

which then implies

‖H_x^{-1/2} g(x₊)‖ ≤ ∫₀¹ ‖H_x^{-1/2} H(x + t n(x)) H_x^{-1/2} − I‖ · ‖H_x^{1/2} n(x)‖ dt

                  ≤ ‖H_x^{1/2} n(x)‖ ∫₀¹ [ ( 1/(1 − t‖n(x)‖_x) )² − 1 ] dt        (from Lemma 4.1)

                  = ‖n(x)‖_x · ‖n(x)‖_x/(1 − ‖n(x)‖_x) ,        (from Fact 3.4)

which proves (b).
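Theorem 4.1 can be observed concretely for the self-concordant function f(x) = x − ln(x), which has the same Hessian as −ln(x) and so belongs to SC (a hypothetical numerical sketch, not part of the development):

```python
# Theorem 4.1 for f(x) = x - ln(x): g(x) = 1 - 1/x, H(x) = 1/x^2,
# so n(x) = -H(x)^{-1} g(x) = x - x^2 and ||n(x)||_x = |n(x)|/x = |1 - x|.
def newton_step(x):
    return -(x ** 2) * (1.0 - 1.0 / x)   # n(x) = x - x^2

def local_norm(v, x):
    return abs(v) / x                    # ||v||_x = |v| / x

for x in [0.5, 0.8, 1.3, 1.6]:
    lam = local_norm(newton_step(x), x)            # ||n(x)||_x, here < 1
    xp = x + newton_step(x)                        # Newton iterate x_+
    lam_plus = local_norm(newton_step(xp), xp)     # ||n(x_+)||_{x_+}
    assert lam < 1.0
    assert lam_plus <= (lam / (1.0 - lam)) ** 2 + 1e-12
print("Theorem 4.1 bound verified")
```

In this example ‖n(x)‖_x = |1 − x| and ‖n(x₊)‖_{x₊} = (1 − x)², so the quadratic decrease predicted by the theorem is exact up to the factor (1 − ‖n(x)‖_x)².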

Theorem 4.2 Suppose that f(·) ∈ SC and x ∈ D_f. If ‖n(x)‖_x ≤ 1/4, then f(·) has a minimizer z, and

‖z − x₊‖_x ≤ 3‖n(x)‖_x² / (1 − ‖n(x)‖_x)³ .

Proof: First suppose that ‖n(x)‖_x ≤ 1/9, and define Q_x(y) := f(x) + g(x)^T (y − x) + (1/2)(y − x)^T H_x (y − x). Let y satisfy ‖y − x‖_x ≤ 1/3. Then from Proposition 4.4 we have

|f(y) − Q_x(y)| ≤ ‖y − x‖_x³ / (3(1 − 1/3)) ≤ ‖y − x‖_x² / (9(2/3)) = ‖y − x‖_x² / 6 ,

and therefore

f(y) ≥ f(x) + g(x)^T H_x^{-1/2} H_x^{1/2} (y − x) + (1/2)‖y − x‖_x² − (1/6)‖y − x‖_x²

     ≥ f(x) − ‖n(x)‖_x ‖y − x‖_x + (1/3)‖y − x‖_x²

     = f(x) + (1/3)‖y − x‖_x ( −3‖n(x)‖_x + ‖y − x‖_x ) .

Now if ỹ ∈ ∂S := ∂{y : ‖y − x‖_x ≤ 3‖n(x)‖_x}, it follows that f(ỹ) ≥ f(x). So, by Fact 3.5, f(·) has a global minimizer z ∈ S, and so ‖z − x‖_x ≤ 3‖n(x)‖_x.

Now suppose that ‖n(x)‖_x ≤ 1/4. From Theorem 4.1 we have

‖n(x₊)‖_{x₊} ≤ ( (1/4)/(1 − 1/4) )² = 1/9 ,

so f(·) has a global minimizer z and ‖z − x₊‖_{x₊} ≤ 3‖n(x₊)‖_{x₊}. Therefore

‖z − x₊‖_x ≤ ‖z − x₊‖_{x₊} / (1 − ‖x − x₊‖_x)        (from Definition 4.1)

           = ‖z − x₊‖_{x₊} / (1 − ‖n(x)‖_x)

           ≤ 3‖n(x₊)‖_{x₊} / (1 − ‖n(x)‖_x)

           ≤ 3‖n(x)‖_x² / (1 − ‖n(x)‖_x)³ .        (from Theorem 4.1)

Last of all in this section, we present a more traditional result about the convergence of Newton’s method for a self-concordant function. This result is not used elsewhere in our development, and is only included to relate the results herein to more traditional theory of Newton’s method. The proof of this result is left as a somewhat-challenging exercise.

Theorem 4.3 Suppose that f(·) ∈ SC and x ∈ D_f, and f(·) has a minimizer z. If ‖x − z‖_z < 1/4, then

‖x₊ − z‖_z < 4‖x − z‖_z² .
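A quick numerical illustration of Theorem 4.3 (hypothetical, again using f(x) = x − ln(x), whose minimizer is z = 1 and for which H(z) = 1):

```python
# Theorem 4.3 for f(x) = x - ln(x): the minimizer is z = 1, H(z) = 1, and the
# Newton iterate is x_+ = x - H(x)^{-1} g(x) = 2x - x^2, so x_+ - 1 = -(x - 1)^2.
z = 1.0
for x in [0.8, 0.9, 1.1, 1.2]:
    dist = abs(x - z)                        # ||x - z||_z, since H(z) = 1
    assert dist < 0.25                       # hypothesis of Theorem 4.3
    x_plus = x - x ** 2 * (1.0 - 1.0 / x)    # Newton iterate 2x - x^2
    assert abs(x_plus - z) < 4.0 * dist ** 2
print("Theorem 4.3 quadratic convergence verified")
```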

5 Self-Concordant Barriers

We begin with another definition.

Definition 5.1 f(·) is a ϑ-(strongly nondegenerate self-concordant)-barrier if f(·) ∈ SC and

ϑ = ϑ_f := max_{x ∈ D_f} ‖n(x)‖_x² < ∞ .

Note that ‖n(x)‖_x² = (−g(x))^T H(x)^{-1} H(x) H(x)^{-1} (−g(x)) = g(x)^T H(x)^{-1} g(x), so we can equivalently define

ϑ_f := max_{x ∈ D_f} g(x)^T H(x)^{-1} g(x)    or    ϑ_f := max_{x ∈ D_f} n(x)^T H(x) n(x) .

The quantity ϑf is called the complexity value of the barrier f(·). Let SCB denote the class of all such functions. The following prop- erty is very important.

Theorem 5.1 Suppose that f(·) ∈ SCB and x, y ∈ Df . Then

g(x)^T (y − x) < ϑ_f .

Proof: Define φ(t) := f(x + t(y − x)), whereby φ′(t) = g(x + t(y − x))^T (y − x) and φ″(t) = (y − x)^T H(x + t(y − x))(y − x). We want to prove that φ′(0) < ϑ_f. If φ′(0) ≤ 0 there is nothing further to prove, so we can assume that φ′(0) > 0, whereby from convexity it also follows that φ′(t) > 0 for all t ∈ [0, 1]. (Notice that φ(1) = f(y), and so t = 1 is in the domain of φ(·).) Let t ∈ [0, 1] be given and let v = x + t(y − x). Then

φ′(t) = g(v)^T (y − x)

      = g(v)^T H_v^{-1} H_v^{1/2} H_v^{1/2} (y − x)

      = −n(v)^T H_v^{1/2} H_v^{1/2} (y − x)

      ≤ ‖H_v^{1/2} n(v)‖ · ‖H_v^{1/2} (y − x)‖

      = ‖n(v)‖_v ‖y − x‖_v ≤ √ϑ_f ‖y − x‖_v .

Also φ″(t) = (y − x)^T H_v (y − x) = ‖y − x‖_v², whereby

φ″(t)/φ′(t)² ≥ ‖y − x‖_v² / (ϑ_f ‖y − x‖_v²) = 1/ϑ_f .

It follows that

1/ϑ_f ≤ ∫₀¹ φ″(t)/φ′(t)² dt = [ −1/φ′(t) ]₀¹ = 1/φ′(0) − 1/φ′(1) ,

and we have

1/φ′(0) ≥ 1/φ′(1) + 1/ϑ_f > 1/ϑ_f ,

which proves the result.

The next two results show how the complexity value ϑ behaves under addition/intersection and affine transformation.

Theorem 5.2 (self-concordant barriers under addition/intersection) Suppose that f_i(·) ∈ SCB with domain D_i := D_{f_i} and complexity value ϑ_i := ϑ_{f_i} for i = 1, 2, and suppose that D := D₁ ∩ D₂ ≠ ∅. Define f(·) = f₁(·) + f₂(·). Then f(·) ∈ SCB with domain D, and ϑ_f ≤ ϑ₁ + ϑ₂.

Proof: Fix x ∈ D and let g₁, g₂, g and H₁, H₂, H denote the gradients and Hessians of f₁(·), f₂(·), f(·) at x, whereby g = g₁ + g₂ and H = H₁ + H₂. Define A_i := H^{-1/2} H_i H^{-1/2} for i = 1, 2. Then A_i ≻ 0 and A₁ + A₂ = I, so A₁, A₂ commute and A₁^{1/2}, A₂^{1/2} commute, from Fact 3.7. Also define u_i := A_i^{-1/2} H^{-1/2} g_i for i = 1, 2. We have

g^T H^{-1} g = g₁^T H^{-1} g₁ + g₂^T H^{-1} g₂ + 2 g₁^T H^{-1} g₂

 = u₁^T A₁ u₁ + u₂^T A₂ u₂ + 2 u₁^T A₁^{1/2} A₂^{1/2} u₂

 = u₁^T [I − A₂] u₁ + u₂^T [I − A₁] u₂ + 2 u₁^T A₂^{1/2} A₁^{1/2} u₂        (from Fact 3.7)

 = u₁^T u₁ + u₂^T u₂ − [ u₁^T A₂ u₁ + u₂^T A₁ u₂ − 2 u₁^T A₂^{1/2} A₁^{1/2} u₂ ]

 = g₁^T H^{-1/2} A₁^{-1} H^{-1/2} g₁ + g₂^T H^{-1/2} A₂^{-1} H^{-1/2} g₂ − ‖A₂^{1/2} u₁ − A₁^{1/2} u₂‖²

 ≤ g₁^T H^{-1/2} H^{1/2} H₁^{-1} H^{1/2} H^{-1/2} g₁ + g₂^T H^{-1/2} H^{1/2} H₂^{-1} H^{1/2} H^{-1/2} g₂

 = g₁^T H₁^{-1} g₁ + g₂^T H₂^{-1} g₂

 ≤ ϑ₁ + ϑ₂ ,

thereby showing that ϑ_f ≤ ϑ₁ + ϑ₂.

Theorem 5.3 (self-concordant barriers under affine transformation) Let A ∈ R^{m×n} satisfy rank A = n ≤ m. Suppose that f(·) ∈ SCB with complexity value ϑ_f and domain D_f ⊂ R^m, and define f̂(·) by f̂(x) = f(Ax − b). Then f̂(·) ∈ SCB and ϑ_f̂ ≤ ϑ_f.

Proof: Fix x ∈ D̂ and define s = Ax − b. Letting g and H denote the gradient and Hessian of f(·) at s, and ĝ and Ĥ the gradient and Hessian of f̂(·) at x, we have ĝ = A^T g and Ĥ = A^T H A. Then

ĝ^T Ĥ^{-1} ĝ = g^T A (A^T H A)^{-1} A^T g = g^T H^{-1/2} H^{1/2} A (A^T H^{1/2} H^{1/2} A)^{-1} A^T H^{1/2} H^{-1/2} g

            ≤ g^T H^{-1/2} H^{-1/2} g = g^T H^{-1} g ≤ ϑ_f ,

since the matrix H^{1/2} A (A^T H^{1/2} H^{1/2} A)^{-1} A^T H^{1/2} is a projection matrix.

Remark 2 The complexity values of the three most-used barriers are as follows:

1. ϑ_f = 1 for the barrier f(x) = −ln(x) defined on D_f = {x : x > 0} ,

2. ϑ_f = k for the barrier f(X) = −ln det(X) defined on D_f = {X ∈ S^{k×k} : X ≻ 0} , and

3. ϑ_f = 2 for the barrier f(x) = −ln(x₁² − Σ_{j=2}^n x_j²) defined on D_f = {x : ‖(x₂, ..., x_n)‖ < x₁} .

Proof: Item (1.) follows from item (2.), so we first prove (2.). Recall the use of the trace inner product X • Y = Tr(XY) and the Frobenius norm ‖X‖ := √(X • X) as discussed early in the proof of Proposition 4.3. Also recall that for f(X) = −ln det(X) we have g(X) = −X^{-1} and H(X)∆X = X^{-1} ∆X X^{-1}. Therefore the Newton step at X, denoted by n(X), is the solution of the following equation:

X^{-1} [n(X)] X^{-1} = X^{-1} ,

and it follows that n(X) = X. Therefore, using (1) we have

‖n(X)‖_X² = n(X) • X^{-1} n(X) X^{-1} = X • X^{-1} X X^{-1} = X • X^{-1} = Tr(XX^{-1}) = Tr(I) = k ,

and therefore ϑ_f = max_{X≻0} ‖n(X)‖_X² = k, which proves (2.) and hence (1.).

In order to prove (3.) we amend our notation a bit, letting Q^n = {(t, x) ∈ R × R^{n−1} : ‖x‖ ≤ t}. For (t, x) ∈ int Q^n we have ‖x‖ < t, and mechanically we can derive (writing s := t² − x^T x):

g(t, x) = ( −2t/s , 2x/s )    and    H(t, x) = (2/s²) · [ t² + x^T x ,  −2t x^T ;  −2t x ,  s I + 2 x x^T ] ,

and the Hessian inverse is given by

H(t, x)^{-1} = [ (t² + x^T x)/2 ,  t x^T ;  t x ,  (s/2) I + x x^T ] .

Directly plugging in yields

g(t, x)^T H(t, x)^{-1} g(t, x) = 2
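Both of these complexity-value computations can be double-checked numerically; the sketch below (hypothetical, using numpy and a finite-difference Hessian for the second-order cone barrier) confirms ϑ_f = 1 for −ln(x) and the value 2 just computed:

```python
import numpy as np

# theta_f = 1 for f(x) = -ln(x): g(x) = -1/x, H(x) = 1/x^2, so
# g(x)^T H(x)^{-1} g(x) = (1/x^2) * x^2 = 1 at every x > 0.
for x in [0.3, 1.0, 7.5]:
    assert abs((-1.0 / x) ** 2 * x ** 2 - 1.0) < 1e-12

# g^T H^{-1} g = 2 for the second-order cone barrier f(t, x) = -ln(t^2 - x^T x),
# using the gradient formula and a central-difference Hessian.
def grad(u):
    t, x = u[0], u[1:]
    s = t * t - x @ x
    return np.concatenate(([-2.0 * t / s], 2.0 * x / s))

rng = np.random.default_rng(1)
x = rng.standard_normal(3)
u = np.concatenate(([np.linalg.norm(x) + 1.0], x))   # interior point: ||x|| < t

h, n = 1e-6, u.size
H = np.zeros((n, n))
for j in range(n):                                   # finite-difference Hessian columns
    e = np.zeros(n)
    e[j] = h
    H[:, j] = (grad(u + e) - grad(u - e)) / (2.0 * h)
g = grad(u)
val = float(g @ np.linalg.solve(H, g))
assert abs(val - 2.0) < 1e-4
print("theta values verified:", val)
```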

from which it follows that ϑ_f = max_{(t,x) ∈ int Q^n} g(t, x)^T H(t, x)^{-1} g(t, x) = 2.

Finally, we present a result that shows that "one half of the Dikin ball" approximates the shape of a relevant portion of D_f to within a factor of ϑ_f.

Proposition 5.1 Suppose that f(·) ∈ SCB with complexity value ϑ_f, let x ∈ D_f, and define the following three sets:

(a) S₁ := {y : ‖y − x‖_x < 1, g(x)^T (y − x) ≥ 0} ,

(b) S₂ := {y : y ∈ D_f , g(x)^T (y − x) ≥ 0} , and

(c) S₃ := {y : ‖y − x‖_x < 4ϑ_f + 1, g(x)^T (y − x) ≥ 0} .

Then S₁ ⊂ S₂ ⊂ S₃.

The proof of Proposition 5.1 is no more involved than the other results herein; it is omitted only because it is not needed for subsequent results. It is stated here to show the power and beauty of self-concordant barriers.

Corollary 5.1 If z is the minimizer of f(·), then

Bz(z, 1) ⊂ Df ⊂ Bz(z, 4ϑf + 1) .

6 The Barrier Method and its Analysis

Our original problem of interest is

P :   minimize_x   c^T x

      s.t.   x ∈ S ,

whose optimal objective value we denote by V*. Let f(·) be a self-concordant barrier on D_f = int S. For every µ > 0 we create the barrier problem:

P_µ :   minimize_x   µ c^T x + f(x)

        s.t.   x ∈ D_f .

The solution of this problem for each µ is denoted z(µ):

z(µ) := arg min_x { µ c^T x + f(x) : x ∈ D_f } .

Intuitively, as µ → ∞, the impact of the barrier function on the solution of P_µ should become less and less, so we should have c^T z(µ) → V* as µ → ∞. Presuming this is the case, the barrier scheme will use Newton's method to solve for approximate solutions x^i of P_{µ_i} for an increasing sequence of values µ_i → ∞. In order to be more specific about how the barrier scheme might work, let us assume that at each iteration we have some value x ∈ D_f that is an approximate solution of P_µ for a given value µ > 0. (We will define "an approximate solution of P_µ" shortly.) We then will increase the barrier parameter µ by a multiplicative factor α > 1:

µ̂ ← αµ .

Then we will take a Newton step at x for the problem P_µ̂ to obtain a new point x̂ that we would like to then be an approximate solution of P_µ̂. If so, we can continue the scheme inductively.

Let x ∈ Df be given, and let us compute the Newton step for Pµ at x. The objective function of Pµ is

h_µ(x) := µ c^T x + f(x) ,

whereby we have:

(a) ∇h_µ(x) = µc + g(x) , and

(b) ∇²h_µ(x) = H(x) = H_x .

Therefore the Newton step for h_µ(·) at x is:

n_µ(x) := −H_x^{-1}(µc + g(x)) = n(x) − µ H_x^{-1} c ,

and the new iterate is:

x̂ := x + n_µ(x) = x − H_x^{-1}(µc + g(x)) .

Remark 3 Notice that h_µ(·) ∈ SC, since membership in SC has only to do with Hessians, and h_µ(·) and f(·) have the same Hessian. However, membership in SCB depends also on gradients, and h_µ(·) and f(·) have different gradients, whereby it will typically be true that h_µ(·) ∉ SCB (unless c = 0).

We now define what we mean for y to be an “approximate solution” of Pµ.

Definition 6.1 Let γ ∈ [0, 1) be given. We say that y ∈ D_f is a γ-approximate solution of P_µ if

knµ(y)ky ≤ γ .

Notice that this definition is equivalent to

‖n(y) − µ H_y^{-1} c‖_y ≤ γ .

Essentially, the definition states that y is a γ-approximate solution of Pµ if the Newton step for Pµ at y is small (measured using the local norm at y). The following theorem gives an explicit optimality gap bound for y if y is a γ = 1/4-approximate solution of Pµ.

Theorem 6.1 Suppose γ ≤ 1/4, and y ∈ D_f is a γ-approximate solution of P_µ. Then

c^T y ≤ V* + (ϑ_f / µ) · 1/(1 − δ) ,

where δ = γ + 3γ²/(1 − γ)³ .

Proof: From Theorem 4.2 we know that z(µ) exists, and furthermore

‖y − z(µ)‖_y = ‖y + n_µ(y) − z(µ) − n_µ(y)‖_y

             ≤ ‖y₊ − z(µ)‖_y + ‖n_µ(y)‖_y

             ≤ 3γ²/(1 − γ)³ + γ = δ .

From basic first-order optimality conditions we know that z(µ) satisfies

µc + g(z(µ)) = 0 ,

and from Theorem 5.1 we have

−µ c^T (w − z(µ)) = g(z(µ))^T (w − z(µ)) < ϑ_f    for all w ∈ D_f .

Rearranging, we have

c^T w + ϑ_f/µ > c^T z(µ)    for all w ∈ D_f ,

whereby V* + ϑ_f/µ ≥ c^T z(µ). Now for notational convenience denote z := z(µ)

and observe

c^T y = c^T z + c^T (y − z)

      ≤ V* + ϑ_f/µ + c^T H_z^{-1/2} H_z^{1/2} (y − z)

      ≤ V* + ϑ_f/µ + ‖H_z^{-1/2} c‖ · ‖y − z‖_z

      ≤ V* + ϑ_f/µ + √(c^T H_z^{-1} c) · ‖y − z‖_y / (1 − ‖y − z‖_y)

      ≤ V* + ϑ_f/µ + √((g(z)/µ)^T H_z^{-1} (g(z)/µ)) · δ/(1 − δ)

      = V* + ϑ_f/µ + ( √(g(z)^T H_z^{-1} g(z)) / µ ) · δ/(1 − δ)

      ≤ V* + ϑ_f/µ + (√ϑ_f / µ) · δ/(1 − δ)

      ≤ V* + (ϑ_f/µ)(1 + δ/(1 − δ))

      = V* + ϑ_f/(µ(1 − δ)) .

The last inequality above follows from the fact (which we will not prove) that ϑ_f ≥ 1 for any f(·) ∈ SCB, whereby √ϑ_f ≤ ϑ_f. Note that with γ = 1/9 we have 1/(1 − δ) ≤ 6/5, whereby c^T y ≤ V* + 1.2 ϑ_f/µ.

Theorem 6.2 Let β := 1/4, γ := 1/9, and α := (√ϑ + β)/(√ϑ + γ). Suppose x is a γ-approximate solution of P_µ. Define µ̂ := αµ, and let x̂ be the Newton iterate for P_µ̂ at x, namely

x̂ := x − H(x)^{-1}(µ̂ c + g(x)) .

Then

Algorithm 2 Barrier Method

Initialize.

Define γ := 1/9, β := 1/4, α := (√ϑ + β)/(√ϑ + γ).

Initialize with µ₀ > 0 and x⁰ ∈ D_f that is a γ-approximate solution of P_µ₀ . i ← 0.

At iteration i :

1. Current values.

µ ← µi

x ← xi

2. Increase µ and take Newton step.

µ̂ ← αµ

x̂ ← x − H(x)^{-1}(µ̂ c + g(x))

3. Update values.

µ_{i+1} ← µ̂

x_{i+1} ← x̂

1. x is a β-approximate solution of P_µ̂ , and

2. x̂ is a γ-approximate solution of P_µ̂ .

Proof: To prove (1.) we have:

‖n_µ̂(x)‖_x = ‖n_αµ(x)‖_x

           = ‖H(x)^{-1}(αµ c + g(x))‖_x

           = ‖α [H(x)^{-1}(µ c + g(x))] + (1 − α) H(x)^{-1} g(x)‖_x

           ≤ α ‖H(x)^{-1}(µ c + g(x))‖_x + (α − 1) ‖H(x)^{-1} g(x)‖_x

           ≤ αγ + (α − 1) ‖n(x)‖_x

           ≤ αγ + (α − 1) √ϑ_f = β .

To prove (2.) we invoke Theorem 4.1:

‖n_µ̂(x̂)‖_x̂ ≤ ( ‖n_µ̂(x)‖_x / (1 − ‖n_µ̂(x)‖_x) )² ≤ ( β/(1 − β) )² = 1/9 = γ .

Applying Theorem 6.2 inductively we obtain the basic barrier method for self-concordant barriers as presented in Algorithm 2. The complexity of this scheme is presented below.

Theorem 6.3 Let ε > 0 be the desired optimality tolerance, and define

J := ⌈ 9√ϑ ln( 6ϑ/(5µ₀ε) ) ⌉ .

Then by iteration J of the barrier method the current iterate x satisfies c^T x ≤ V* + ε.

Proof: With the given values of γ, β, α, the quantity δ := γ + 3γ²/(1 − γ)³ satisfies 1/(1 − δ) ≤ 6/5, and

1 − 1/α = 1/(7.2√ϑ + 1.8) ≥ 1/(9√ϑ) .

After J iterations the current iterate x is a γ-approximate solution of P_µ where µ = α^J µ₀. Therefore

ln µ₀ − ln µ = J ln(1/α) ≤ J (1/α − 1) .

(Note that the inequality above follows from the fact that ln t ≤ t − 1 for t > 0, which itself is a consequence of the concavity of ln(·).) Therefore

ln µ ≥ ln µ₀ + J(1 − 1/α) ≥ ln µ₀ + J/(9√ϑ) ≥ ln µ₀ + ln( 6ϑ/(5µ₀ε) ) ≥ ln( ϑ/((1 − δ)ε) ) .

Therefore µ ≥ ϑ/((1 − δ)ε), and from Theorem 6.1 we have

c^T x ≤ V* + ϑ/(µ(1 − δ)) ≤ V* + ε .
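The barrier method can be watched in action on the one-dimensional problem of minimizing c·x over {x : x ≥ 0} with f(x) = −ln(x), for which ϑ = 1, V* = 0, and ‖n_µ(x)‖_x = |1 − µcx| (a hypothetical numerical sketch of Algorithm 2; x⁰ = 1/(µ₀c) is the exact center, hence a γ-approximate solution):

```python
import math

# Algorithm 2 on: minimize c*x s.t. x >= 0, with f(x) = -ln(x), theta = 1, V* = 0.
# Here g(x) = -1/x, H(x) = 1/x^2, and ||n_mu(x)||_x = |1 - mu*c*x|.
c = 1.0
theta = 1.0
gamma, beta = 1.0 / 9.0, 1.0 / 4.0
alpha = (math.sqrt(theta) + beta) / (math.sqrt(theta) + gamma)   # = 1.125
mu = 1.0
x = 1.0 / (mu * c)             # exact center: ||n_mu(x)||_x = 0 <= gamma
for i in range(200):
    mu = alpha * mu                                 # mu_hat <- alpha * mu
    x = x - x ** 2 * (mu * c - 1.0 / x)             # Newton step for P_{mu_hat}
    assert abs(1.0 - mu * c * x) <= gamma + 1e-12   # stays gamma-approximate (Theorem 6.2)
print("mu =", mu, "  c^T x =", c * x)               # the gap c^T x - V* shrinks like 1/mu
```

Each pass through the loop multiplies µ by α ≈ 1.125 while the assertion confirms the inductive invariant of Theorem 6.2, and the optimality gap c^T x − V* decays at the geometric rate 1/µ, as Theorem 6.3 predicts.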

7 Remarks and other Matters

7.1 Nesterov-Nemirovskii definition of self-concordance

The original definition of self-concordance due to Nesterov and Nemirovskii is different than that presented here in Definition 4.1. Let f(·) be a function defined on an open set D_f ⊂ R^n. For every x ∈ D_f and every vector h ∈ R^n define the univariate function

φx,h(α) := f(x + αh) .

Then Nesterov and Nemirovskii’s definition of self-concordance is as follows.

Definition 7.1 The function f(·) defined on the domain D_f ⊂ R^n is (strongly nondegenerate) self-concordant if:

(a) H(x) ≻ 0 for all x ∈ D_f ,

(b) f(x) → ∞ as x → ∂D_f , and

(c) for all x ∈ D_f and h ∈ R^n, φ_{x,h}(·) satisfies |φ‴_{x,h}(0)| ≤ 2 [ φ″_{x,h}(0) ]^{3/2} .

Definition 7.1 roughly states that the third derivative of f(·) is bounded by an appropriate function of the second derivative. This definition assumes f(·) is three-times differentiable, unlike Definition 4.1. It turns out that when f(·) is three-times differentiable, then Definition 7.1 implies the properties of Definition 4.1 and vice versa; thus the two definitions are equivalent. We prefer Definition 4.1 because proofs of later properties are far easier to establish. The key advantage of Definition 7.1 is that it leads to an easier way to validate the self-concordance of most self-concordant functions, especially −ln det(X) and −ln(t² − x^T x) for the semidefinite and second-order cones, respectively.

7.2 Universal Barrier

One natural question is: given a convex set S ⊂ R^n with nonempty interior, does there exist a self-concordant barrier f_S(·) whose domain is D_{f_S} = int S? Furthermore, if the answer is yes, does there exist a barrier whose complexity value is, say, O(n)? It turns out that the answers to these two questions are both "yes," but the proofs are incredibly difficult.

Let S ⊂ R^n be a closed convex set with nonempty interior. For each point x ∈ int S, define the "local polar of S relative to x":

P_S(x) := {w ∈ R^n : w^T (y − x) ≤ 1 for all y ∈ S} .

Basic geometry suggests that as x → ∂S, the volume of P_S(x) should approach ∞. Now define the function:

fS(x) := ln (volume (PS(x))) .

It turns out that f_S(·) is a self-concordant function, but this is quite hard to prove. Furthermore, there is a universal constant C̄ such that for any dimension n and any closed convex set S with nonempty interior, it is true that

ϑ_{f_S} ≤ C̄ · n .

This is a remarkable and very deep result. However, it is computationally useless, since not only are the function values of f_S(·) extremely difficult to compute, but gradient and Hessian information is also extremely difficult to compute.

7.3 Other Matters

(i) getting started,

(ii) other formats for convex optimization,

(iii) ϑ-logarithmic homogeneous barriers for cones,

(iv) primal-dual methods,

(v) computational practice, and

(vi) other self-concordant functions and self-concordant barriers.

7.4 Exercises

1. Prove that if ‖n(x)‖_x > 1/4, then upon setting

y = x + ( 1/(5‖n(x)‖_x) ) n(x) ,

we have f(y) ≤ f(x) − 1/37.5 .

2. Prove Theorem 4.3. To get started, observe that x₊ − z = x − z − H_x^{-1}(g(x) − g(z)) (since z is a minimizer and hence g(z) = 0), and then use Fact 3.1 to show that

x₊ − z = H_x^{-1/2} ∫₀¹ H_x^{-1/2} [H_x − H(x + t(z − x))] H_x^{-1/2} H_x^{1/2} (x − z) dt .

Next multiply each side by H_x^{1/2} and take norms, and then invoke Lemma 4.1. This should help get you started.

3. Use Theorem 4.3 to show that under the hypothesis of the theorem, the sequence of Newton iterates starting with x, namely x¹ = x₊, x² = (x¹)₊, ..., satisfies

‖x^i − z‖_z < (1/4)(4‖x − z‖_z)^{2^i} .

4. Complete the proof of Proposition 4.1 by proving the "≥" inequality in the definition of self-concordance, using the suggestions at the end of the proof earlier in the text.

5. Complete the proof of Proposition 4.2 by proving the "≥" inequality in the definition of self-concordance, using the suggestions at the end of the proof earlier in the text.

6. The proof of Proposition 4.3 uses the inequalities presented in (4), but only the left-most inequality is proved in the text herein. Prove the right-most inequality by substituting λmax for λmin and switching ≥ for ≤ in the chain of equalities and inequalities in the text.
