
Interior-Point Theory for Convex Optimization

Robert M. Freund

May, 2014

© 2014 Massachusetts Institute of Technology. All rights reserved.

1 Background

The material presented herein is based on the following two research texts: Interior-Point Polynomial Algorithms in Convex Programming by Yurii Nesterov and Arkadii Nemirovskii, SIAM 1994, and A Mathematical View of Interior-Point Methods in Convex Optimization by James Renegar, SIAM 2001.

2 Barrier Scheme for Solving Convex Optimization

Our problem of interest is

$$P: \quad \min_x \; c^T x \quad \text{s.t.} \; x \in S,$$

where $S$ is a closed convex set, and we denote the optimal objective value by $V^*$. Let $f(\cdot)$ be a barrier function for $S$, namely $f(\cdot)$ satisfies:

(a) $f(\cdot)$ is strictly convex on its domain $D_f := \operatorname{int} S$, and

(b) $f(x) \to \infty$ as $x \to \partial S$.

The idea of the barrier method is to dissuade the algorithm from computing points too close to $\partial S$, effectively eliminating the complicating factors of dealing with $\partial S$. For every value of $\mu > 0$ we create the barrier problem:

$$P_\mu: \quad \min_x \; \mu c^T x + f(x) \quad \text{s.t.} \; x \in D_f.$$

Note that $P_\mu$ is effectively unconstrained, since the boundary of the feasible region will never be encountered. The solution of $P_\mu$ is denoted $z(\mu)$:

$$z(\mu) := \arg\min_x \{\, \mu c^T x + f(x) : x \in D_f \,\}.$$

Intuitively, as $\mu \to \infty$, the impact of the barrier function on the solution of $P_\mu$ should become less and less, so we should have $c^T z(\mu) \to V^*$ as $\mu \to \infty$. Presuming this is the case, the barrier scheme uses Newton's method to compute approximate solutions $x^i$ of $P_{\mu_i}$ for an increasing sequence of values $\mu_i \to \infty$.

In order to be more specific about how the barrier scheme might work, let us assume that at each iteration we have some value $x \in D_f$ that is an approximate solution of $P_\mu$ for a given value $\mu > 0$. We will, of course, need a way to define "is an approximate solution of $P_\mu$" that will be developed later.
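As a concrete illustration of the limit $c^T z(\mu) \to V^*$, consider the toy instance $S = [0,1]$ with $c = 1$ (so $V^* = 0$) and barrier $f(x) = -\ln(x) - \ln(1-x)$. The sketch below (the instance and tolerances are our own illustrative choices, not from the text) computes $z(\mu)$ by bisection on the optimality condition $\mu c - 1/x + 1/(1-x) = 0$:

```python
# Toy instance: minimize c*x over S = [0, 1], so V* = 0 with c = 1,
# using the barrier f(x) = -ln(x) - ln(1-x) on D_f = (0, 1).
c = 1.0

def barrier_grad(x, mu):
    # derivative of mu*c*x + f(x) with respect to x
    return mu * c - 1.0 / x + 1.0 / (1.0 - x)

def z(mu, tol=1e-12):
    # the derivative is strictly increasing on (0,1), so bisection finds z(mu)
    lo, hi = 1e-15, 1.0 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if barrier_grad(mid, mu) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for mu in [1.0, 10.0, 100.0, 1000.0]:
    print(mu, c * z(mu))  # objective values approach V* = 0 as mu grows
```

The printed objective values $c\,z(\mu)$ decrease toward $V^* = 0$ roughly like $1/\mu$, consistent with the central-path intuition above.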
We then will increase the barrier parameter $\mu$ by a multiplicative factor $\alpha > 1$:

$$\hat\mu \leftarrow \alpha \mu.$$

Then we will take a Newton step at $x$ for the problem $P_{\hat\mu}$ to obtain a new point $\hat x$ that we would like to then be an approximate solution of $P_{\hat\mu}$. If so, we can continue the scheme inductively. We typically use $g(\cdot)$ and $H(\cdot)$ to denote the gradient and Hessian of $f(\cdot)$. Note that the Newton iterate for $P_{\hat\mu}$ has the formula:

$$\hat x \leftarrow x - H(x)^{-1}(\hat\mu c + g(x)).$$

The general algorithmic scheme is presented in Algorithm 1.

Algorithm 1 General Barrier Scheme

Initialize. Initialize with $\mu_0 > 0$ and $x^0 \in D_f$ that is "an approximate solution of $P_{\mu_0}$." Set $i \leftarrow 0$ and define $\alpha > 1$.

At iteration $i$:

1. Current values. $\mu \leftarrow \mu_i$, $x \leftarrow x^i$.

2. Increase $\mu$ and take Newton step. $\hat\mu \leftarrow \alpha\mu$, $\hat x \leftarrow x - H(x)^{-1}(\hat\mu c + g(x))$.

3. Update values. $\mu_{i+1} \leftarrow \hat\mu$, $x^{i+1} \leftarrow \hat x$, $i \leftarrow i+1$.

3 Some Plain Facts

Let $f(\cdot): \mathbb{R}^n \to \mathbb{R}$ be a twice-differentiable function. We typically use $g(\cdot)$ and $H(\cdot)$ to denote the gradient and Hessian of $f(\cdot)$. Here are four facts about integrals and derivatives:

Fact 3.1 $\displaystyle g(y) = g(x) + \int_0^1 H(x + t(y-x))(y-x)\, dt$.

Fact 3.2 Let $h(t) := f(x + tv)$. Then (i) $h'(t) = g(x+tv)^T v$, and (ii) $h''(t) = v^T H(x+tv)\, v$.

Fact 3.3
$$f(y) = f(x) + g(x)^T(y-x) + \tfrac{1}{2}(y-x)^T H(x)(y-x) + \int_0^1 \int_0^t (y-x)^T \left[H(x + s(y-x)) - H(x)\right](y-x)\, ds\, dt.$$

Fact 3.4 $\displaystyle \int_0^r \left[\frac{1}{(1-at)^2} - 1\right] dt = \frac{ar^2}{1-ar}$.

This follows by observing that $\int \frac{1}{(1-at)^2}\, dt = \frac{1}{a(1-at)}$.

We also present five additional facts that we will need in our analyses.

Fact 3.5 Suppose $f(\cdot)$ is a convex function on $\mathbb{R}^n$, $S \subset \mathbb{R}^n$ is a compact convex set, and suppose $x \in \operatorname{int} S$ satisfies $f(x) \le f(y)$ for all $y \in \partial S$. Then $f(\cdot)$ attains its global minimizer on $S$.

Fact 3.6 Let $\|v\| := \sqrt{v^T v}$ be the Euclidean norm. Let $\lambda_1 \le \cdots \le \lambda_n$ be the ordered eigenvalues of the symmetric matrix $M$, and define $\|M\| := \max\{\|Mx\| : \|x\| \le 1\}$. Then $\|M\| = \max_i\{|\lambda_i|\} = \max\{|\lambda_n|, |\lambda_1|\}$.

Fact 3.7 Suppose $A, B$ are symmetric and $A + B = \theta I$ for some $\theta \in \mathbb{R}$. Then $AB = BA$. Furthermore, if $A \succ 0$ and $B \succ 0$, then $A^\alpha B^\beta = B^\beta A^\alpha$ for all $\alpha, \beta \ge 0$.
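Fact 3.7 can be spot-checked numerically. The sketch below uses a hypothetical $2 \times 2$ instance (the matrices and $\theta$ are our own choices) and verifies both $AB = BA$ and, as a special case of the $\alpha, \beta \ge 0$ claim, the integer-power identity $A^2 B^3 = B^3 A^2$:

```python
# Spot-check of Fact 3.7 on a hypothetical 2x2 instance: A symmetric positive
# definite, B = theta*I - A also positive definite; then A and B commute.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def power(X, p):
    R = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 identity
    for _ in range(p):
        R = matmul(R, X)
    return R

theta = 5.0
A = [[2.0, 1.0], [1.0, 3.0]]                    # eigenvalues (5 +/- sqrt(5))/2 > 0
B = [[theta - 2.0, -1.0], [-1.0, theta - 3.0]]  # B = theta*I - A, also pos. def.

AB, BA = matmul(A, B), matmul(B, A)
assert max(abs(AB[i][j] - BA[i][j]) for i in range(2) for j in range(2)) < 1e-9

L, R = matmul(power(A, 2), power(B, 3)), matmul(power(B, 3), power(A, 2))
assert max(abs(L[i][j] - R[i][j]) for i in range(2) for j in range(2)) < 1e-9
print("A and B commute, and A^2 B^3 = B^3 A^2")
```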
To see why this is true, decompose $A = PDP^T$ where $P$ is orthonormal ($P^T = P^{-1}$) and $D$ is diagonal. Then $B = P(\theta I - D)P^T$, whereby
$$A^\alpha B^\beta = PD^\alpha P^T P(\theta I - D)^\beta P^T = PD^\alpha(\theta I - D)^\beta P^T = P(\theta I - D)^\beta D^\alpha P^T = P(\theta I - D)^\beta P^T PD^\alpha P^T = B^\beta A^\alpha.$$

Fact 3.8 Suppose $\lambda_n \ge \cdots \ge \lambda_1 > 0$. Then
$$\max_i\{|\lambda_i - 1|\} \le \max\{\lambda_n - 1,\; 1/\lambda_1 - 1\}.$$

Fact 3.9 Suppose $a, b, c, d > 0$. Then
$$\min\left\{\frac{a}{b}, \frac{c}{d}\right\} \le \frac{a+c}{b+d} \le \max\left\{\frac{a}{b}, \frac{c}{d}\right\}.$$

4 Self-Concordant Functions and Properties

Let $f(\cdot)$ be a strictly convex twice-differentiable function defined on the open set $D_f := \operatorname{domain} f(\cdot)$, and let $\bar D_f := \operatorname{cl} D_f$. Consider $x \in D_f$. We will often abbreviate $H_x := H(x)$ for the Hessian at $x$. Here we assume that $H_x \succ 0$, whereby $H_x$ can be used to define the norm
$$\|v\|_x := \sqrt{v^T H_x v},$$
which is the "local norm" at $x$. Notice that
$$\|v\|_x = \sqrt{v^T H_x v} = \|H_x^{1/2} v\|,$$
where $\|w\| = \sqrt{w^T w}$ is the standard Euclidean ($L_2$) norm. Let
$$B_x(x, 1) := \{y : \|y - x\|_x < 1\}.$$
This is called the open Dikin ball at $x$, after the Russian mathematician I.I. Dikin.

Definition 4.1 $f(\cdot)$ is said to be (strongly nondegenerate) self-concordant if for all $x \in D_f$ we have $B_x(x,1) \subset D_f$, and for all $y \in B_x(x,1)$ we have:
$$1 - \|y - x\|_x \le \frac{\|v\|_y}{\|v\|_x} \le \frac{1}{1 - \|y - x\|_x}$$
for all $v \ne 0$. Let SC denote the class of all such functions.

Remark 1 The following are the most-used self-concordant functions:

(i) $f(x) = -\ln(x)$ for $x \in D_f = \{x \in \mathbb{R} : x > 0\}$,

(ii) $f(X) = -\ln\det(X)$ for $X \in D_f = \{X \in S^{k \times k} : X \succ 0\}$, and

(iii) $f(x) = -\ln(x_1^2 - \sum_{j=2}^n x_j^2)$ for $x \in D_f := \{x : \|(x_2, \ldots, x_n)\| < x_1\}$.

Before showing that these functions are self-concordant, let us see how we can combine self-concordant functions to obtain other self-concordant functions.

Proposition 4.1 (self-concordance under addition/intersection) Suppose that $f_i(\cdot) \in$ SC with domain $D_i := D_{f_i}$ for $i = 1, 2$, and suppose that $D := D_1 \cap D_2 \ne \emptyset$. Define $f(\cdot) = f_1(\cdot) + f_2(\cdot)$. Then $D_f = D$ and $f(\cdot) \in$ SC.

Proof: Consider $x \in D = D_1 \cap D_2$.
Let $B_x^i(c, r)$ denote the Dikin ball centered at $c$ with radius $r$ defined by $f_i(\cdot)$, and let $\|\cdot\|_{x,i}$ denote the norm induced at $x$ using the Hessian $H_i(x)$ of $f_i(\cdot)$, for $i = 1, 2$. Then since $x \in D_i$ we have $B_x^i(x, 1) \subset D_i$, and $\|v\|_x^2 = \|v\|_{x,1}^2 + \|v\|_{x,2}^2$ because $H(x) = H_1(x) + H_2(x)$. Therefore if $\|y - x\|_x < 1$ it follows that $\|y - x\|_{x,1} < 1$ and $\|y - x\|_{x,2} < 1$, whereby $y \in B_x^i(x, 1) \subset D_i$ for $i = 1, 2$, and hence $y \in D_1 \cap D_2 = D$. Also, for any $v \ne 0$, using Fact 3.9 we have
$$\frac{\|v\|_y^2}{\|v\|_x^2} = \frac{\|v\|_{y,1}^2 + \|v\|_{y,2}^2}{\|v\|_{x,1}^2 + \|v\|_{x,2}^2} \le \max\left\{\frac{\|v\|_{y,1}^2}{\|v\|_{x,1}^2}, \frac{\|v\|_{y,2}^2}{\|v\|_{x,2}^2}\right\} \le \max\left\{\frac{1}{(1 - \|y - x\|_{x,1})^2}, \frac{1}{(1 - \|y - x\|_{x,2})^2}\right\} \le \frac{1}{(1 - \|y - x\|_x)^2}.$$
The virtually identical argument can also be applied to prove the "$\ge$" inequality of the definition of self-concordance, by replacing "max" with "min" above and applying the other inequality of Fact 3.9.

Proposition 4.2 (self-concordance under affine transformation) Let $A \in \mathbb{R}^{m \times n}$ satisfy $\operatorname{rank} A = n \le m$. Suppose that $f(\cdot) \in$ SC with domain $D_f \subset \mathbb{R}^m$, and define $\hat f(\cdot)$ by $\hat f(x) = f(Ax - b)$. Then $\hat f(\cdot) \in$ SC with domain $\hat D := \{x : Ax - b \in D_f\}$.

Proof: Consider $x \in \hat D$ and $s = Ax - b$. Letting $g(s)$ and $H(s)$ denote the gradient and Hessian of $f(s)$, and $\hat g(x)$ and $\hat H(x)$ the gradient and Hessian of $\hat f(x)$, we have $\hat g(x) = A^T g(s)$ and $\hat H(x) = A^T H(s) A$. Suppose that $\|y - x\|_x < 1$. Then defining $t := Ay - b$ we have
$$1 > \|y - x\|_x = \sqrt{(y - x)^T A^T H(s) A (y - x)} = \|t - s\|_s,$$
whereby $t \in D_f$ and so $y \in \hat D$. Therefore $B_x(x, 1) \subset \hat D$. Also, for any $v \ne 0$, we have
$$\frac{\|v\|_y}{\|v\|_x} = \frac{\sqrt{v^T A^T H(t) A v}}{\sqrt{v^T A^T H(s) A v}} = \frac{\|Av\|_t}{\|Av\|_s} \le \frac{1}{1 - \|s - t\|_s} = \frac{1}{1 - \|y - x\|_x}.$$
The exact same argument can also be applied to prove the "$\ge$" inequality of the definition of self-concordance.

Proposition 4.3 The three functions defined in Remark 1 are self-concordant.