
A Mathematical View
of
Interior-Point Methods
for
Convex Optimization

James Renegar

School of Operations Research
and Industrial Engineering
Cornell University

renegar@orie.cornell.edu

July

Supported by NSF Grant CCR

These lecture notes aim at developing a thorough understanding of the core theory for interior-point methods. The overall theory continues to grow at a rapid rate, but the core ideas have remained largely unchanged for several years, since Nesterov and Nemirovskii published their path-breaking, broadly-encompassing and enormously influential work. Since then, what has changed about the core ideas is our conception of them. Whereas their work is notoriously difficult reading even for specialists, we now know how to motivate and present the general theory in such a way as to make it accessible for non-specialists and PhD students. Therein lies the justification for these lecture notes.

We develop the theory in R^n, although most of the theory can be developed in arbitrary real Hilbert spaces. The restriction to finite dimensions is primarily for accessibility.

The notes were developed largely in conjunction with a PhD-level course on interior-point methods at Cornell University.

Presently the notes contain only two chapters, but those are sufficient to provide the reader with a solid introduction to the contemporary view of interior-point methods. A chapter on duality theory is nearing completion. A chapter on complexity theory is planned. If you are interested in receiving the chapters as they are completed, please send a brief message to renegar@orie.cornell.edu.

Chapter 1

Preliminaries

This chapter provides a review of material pertinent to continuous optimization theory quite generally, albeit phrased so as to be readily applicable in developing interior-point method (ipm) theory. The primary difference between our exposition and more customary approaches is that we do not rely on coordinate systems. For example, it is customary to define the gradient of a functional f: R^n -> R as the vector-valued function g: R^n -> R^n whose j-th coordinate is (d/dx_j) f(x). Instead, we consider the gradient as determined by an underlying inner product ⟨ , ⟩. For us, the gradient is the function g satisfying

    lim_{‖Δx‖→0} [f(x + Δx) − f(x) − ⟨g(x), Δx⟩] / ‖Δx‖ = 0,

where ‖x‖ := ⟨x, x⟩^{1/2}. In general, the function whose j-th coordinate is (d/dx_j) f(x) is the gradient only if ⟨ , ⟩ is the Euclidean inner product.

The natural geometry varies from point to point in the domains of optimization problems that can be solved by ipms. As the algorithms progress from one point to the next, one changes the inner product (and hence the geometry) to visualize the headway achieved by the algorithms. The relevant inner products may bear no relation to an initially-imposed coordinate system. Consequently, in aiming for the most transparent and least cumbersome proofs, one should dispense with coordinate systems.

We begin with a review of linear algebra, recalling, for example, the notion of a self-adjoint linear operator. We then define gradients and Hessians, emphasizing how they change when the underlying inner product is changed. Next is a brief review of basic results for convex functionals, followed by results akin to the Fundamental Theorem of Calculus. Although these Calculus results are elementary and rather dry, they are essential in achieving liftoff for the ipm theory. Finally, we recall Newton's method for continuous optimization, proving a standard theorem which later plays a central motivational role.

1.1 Linear Algebra

We let ⟨ , ⟩ denote an arbitrary inner product on R^n. In later sections, ⟨ , ⟩ will act as a reference inner product, an inner product from which other inner products are constructed. For the ipm theory it happens that the reference inner product ⟨ , ⟩ is irrelevant; the inner products essential to the theory are independent of the reference inner product. To a large extent, the reference inner product will serve only to fix notation.

Although the particular reference inner product will prove to be irrelevant for ipm theory, for optimization problems to be solved by ipms there typically are associated natural reference inner products. For example, in linear programming (LP), where vectors x are expressed coordinate-wise and x ≥ 0 means each coordinate is nonnegative, the natural inner product is the Euclidean inner product, which we refer to as the dot product, writing x₁ · x₂. Similarly, in semidefinite programming (SDP), where the relevant vector space is S^{n×n}, the space of symmetric n × n real matrices X, and X ⪰ 0 means X is positive semidefinite (i.e., has no negative eigenvalues), the natural inner product is the trace product,

    X • S := trace(XS).

Thus, X • S equals the sum of the eigenvalues of the matrix XS.
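This identity is easy to check numerically. The following sketch (using NumPy, which is of course not part of the original notes; the random symmetric matrices are merely hypothetical test data) compares trace(XS) with the sum of the eigenvalues of XS:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical symmetric test matrices X and S.
M = rng.standard_normal((4, 4)); X = M + M.T
N = rng.standard_normal((4, 4)); S = N + N.T

trace_product = np.trace(X @ S)                   # X • S = trace(XS)
eig_sum = np.sum(np.linalg.eigvals(X @ S)).real   # sum of eigenvalues of XS

assert np.isclose(trace_product, eig_sum)
```

(The eigenvalues of XS may occur in complex-conjugate pairs when X and S are indefinite, but their sum is always the real number trace(XS).)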

Throughout the general development, we use R^n to denote an arbitrary finite-dimensional real vector space, be it S^{n×n} or whatever.

The inner product ⟨ , ⟩ induces a norm on R^n:

    ‖x‖ := ⟨x, x⟩^{1/2}.

Perhaps the most useful relation between the inner product and the norm is the Cauchy–Schwarz inequality:

    |⟨x₁, x₂⟩| ≤ ‖x₁‖ ‖x₂‖,

with equality iff x₁ and x₂ are collinear. If neither x₁ nor x₂ is the zero vector, Cauchy–Schwarz implies the existence of θ satisfying

    cos θ = ⟨x₁, x₂⟩ / (‖x₁‖ ‖x₂‖).

The value θ is referred to as the angle between x₁ and x₂.

Whereas the dot product gives rise to the Euclidean norm, the norm arising from the trace product is known as the Frobenius norm. The Frobenius norm can be extended to the vector space of all real n × n matrices by defining ‖M‖ := (Σ_{i,j} m_{ij}²)^{1/2}, where the m_{ij} are the coefficients of M.

We remark that the Frobenius norm is submultiplicative, meaning

    ‖XS‖ ≤ ‖X‖ ‖S‖.
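Submultiplicativity is easily checked numerically; here is a minimal sketch (NumPy assumed; the matrices are arbitrary random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 5))
S = rng.standard_normal((5, 5))

# Frobenius norm ‖M‖ = (Σ m_ij²)^{1/2}
fro = lambda M: np.sqrt((M ** 2).sum())

assert fro(X @ S) <= fro(X) * fro(S) + 1e-12   # ‖XS‖ ≤ ‖X‖‖S‖
```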

Recall that vectors x₁, x₂ ∈ R^n are said to be orthogonal if ⟨x₁, x₂⟩ = 0. Recall that a basis v₁, …, v_n for R^n is said to be orthonormal if

    ⟨v_i, v_j⟩ = δ_{ij} for all i, j,

where δ_{ij} is the Kronecker delta. A linear operator (i.e., a linear transformation) Q: R^n → R^n is said to be orthogonal if

    ⟨Qx₁, Qx₂⟩ = ⟨x₁, x₂⟩ for all x₁, x₂ ∈ R^n.

If, given an inner product ⟨ , ⟩ on R^n, one adopts a coordinate system obtained by expressing vectors as linear combinations of an orthonormal basis, the inner product ⟨ , ⟩ is the dot product for that coordinate system. Consequently, one can consider the results we review below as following from the special case of the dot product, but one should keep in mind that thinking in terms of coordinates is best avoided for understanding the ipm theory.

If both R^n and R^m are endowed with inner products, and A: R^n → R^m is a linear operator, there exists a unique linear operator A*: R^m → R^n satisfying

    ⟨Ax, y⟩ = ⟨x, A*y⟩ for all x ∈ R^n, y ∈ R^m.

The operator A* is the adjoint of A. The range of A* is orthogonal to the nullspace of A.

Assuming A is surjective, the linear operator A*(AA*)⁻¹A projects R^n orthogonally onto the range space of A*; that is, the image of x is the point in the range space closest to x. Likewise, I − A*(AA*)⁻¹A projects R^n orthogonally onto the nullspace of A.
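These two projection formulas can be verified numerically. The sketch below (NumPy assumed; A is a random surjective stand-in, identified with its matrix under the dot product) checks idempotence and that I − A*(AA*)⁻¹A maps into the nullspace of A:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 5))            # random 2x5 matrix: surjective a.s.

P = A.T @ np.linalg.inv(A @ A.T) @ A       # A*(AA*)⁻¹A, projection onto range(A*)
x = rng.standard_normal(5)

assert np.allclose(P @ P, P)               # projections are idempotent
assert np.allclose(A @ ((np.eye(5) - P) @ x), 0)   # (I − P)x lies in null(A)
```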

If both R^n and R^m are endowed with the dot product, and if A and A* are written as matrices, then A* is the transpose of A. Thus it is natural in this setting to write Aᵀ rather than A*.

It is a simple but important exercise for SDP to show that if S₁, …, S_m ∈ S^{n×n}, and A: S^{n×n} → R^m is the linear operator defined by

    X ↦ (S₁ • X, …, S_m • X),

then

    A*y = Σ_{i=1}^m y_i S_i,

assuming S^{n×n} is endowed with the trace product and R^m is endowed with the dot product.
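The adjoint identity ⟨A(X), y⟩ = ⟨X, A*y⟩ can be checked directly; here is a sketch (NumPy assumed; the symmetric matrices S_i, X and the vector y are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
sym = lambda M: M + M.T
Ss = [sym(rng.standard_normal((n, n))) for _ in range(m)]   # S_1, ..., S_m

A_op  = lambda X: np.array([np.trace(Si @ X) for Si in Ss]) # A(X) = (S_i • X)_i
A_adj = lambda y: sum(yi * Si for yi, Si in zip(y, Ss))     # A*y = Σ y_i S_i

X = sym(rng.standard_normal((n, n)))
y = rng.standard_normal(m)

# dot product ⟨A(X), y⟩ equals trace product ⟨X, A*y⟩
assert np.isclose(A_op(X) @ y, np.trace(X @ A_adj(y)))
```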

Continuing to assume R^n and R^m are endowed with inner products, and hence norms, one obtains an induced operator norm on the vector space consisting of linear operators A: R^n → R^m:

    ‖A‖ := max{‖Ax‖ : ‖x‖ ≤ 1}.

Each linear operator A: R^n → R^m has a singular-value decomposition. Precisely, there exist orthonormal bases u₁, …, u_n and w₁, …, w_m, as well as real numbers 0 < γ₁ ≤ ⋯ ≤ γ_r, where r is the rank of A, such that for all x,

    Ax = Σ_{i=1}^r γ_i ⟨u_i, x⟩ w_i.

The numbers γ_i are the singular values of A; if r < n, then the number 0 is also considered to be a singular value of A. It is easily seen that ‖A‖ = γ_r. Moreover,

    A*y = Σ_{i=1}^r γ_i ⟨w_i, y⟩ u_i,

so that the values γ_i (and possibly 0) are also the singular values of A*. It immediately follows that ‖A*‖ = ‖A‖.
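Both facts (the operator norm equals the largest singular value, and ‖A*‖ = ‖A‖) are easy to confirm numerically. A sketch under the dot product (NumPy assumed; note NumPy returns singular values in descending order, so its first entry is the largest):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))

sigma = np.linalg.svd(A, compute_uv=False)    # singular values, largest first
op_norm = lambda M: np.linalg.norm(M, 2)      # induced (spectral) operator norm

assert np.isclose(op_norm(A), sigma[0])       # ‖A‖ = largest singular value
assert np.isclose(op_norm(A.T), op_norm(A))   # ‖A*‖ = ‖A‖ (adjoint = transpose here)
```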

If R^n and R^m are endowed with the dot product, the singular-value decomposition corresponds to the fact that if A is an m × n matrix, there exist orthogonal matrices Q₁ and Q₂ such that Q₁AQ₂ = Σ, where Σ is an m × n matrix with zeros everywhere except possibly for positive numbers on its main diagonal.

It is not difficult to prove that a linear operator Q: R^n → R^n is orthogonal iff Q* = Q⁻¹. For orthogonal operators, ‖Q‖ = 1.

A linear operator S: R^n → R^n is said to be self-adjoint if S* = S. If ⟨ , ⟩ is the dot product and S is written as a matrix, then S being self-adjoint is equivalent to S being symmetric.

It is instructive to show that for S ∈ S^{n×n}, the linear operator A: S^{n×n} → S^{n×n} defined by

    X ↦ SXS

is self-adjoint. Such operators are important in the ipm theory for SDP.

A linear operator S: R^n → R^n is said to be positive semidefinite (psd) if S is self-adjoint and

    ⟨x, Sx⟩ ≥ 0 for all x ∈ R^n.

If, further, S satisfies

    ⟨x, Sx⟩ > 0 for all x ≠ 0,

then S is said to be positive definite (pd).

Each self-adjoint linear operator S has a spectral decomposition. Precisely, for each self-adjoint linear operator S there exist an orthonormal basis v₁, …, v_n and real numbers λ₁, …, λ_n such that for all x,

    Sx = Σ_{i=1}^n λ_i ⟨v_i, x⟩ v_i.

It is easily seen that v_i is an eigenvector for S with eigenvalue λ_i.

If ⟨ , ⟩ is the dot product and S is a symmetric matrix, then the spectral decomposition corresponds to the fact that S can be diagonalized using an orthogonal matrix, i.e., QᵀSQ = Λ.

The following relations are easily established:

    ‖S‖ = max_i |λ_i| = max{|⟨x, Sx⟩| : ‖x‖ ≤ 1};

    S is psd iff λ_i ≥ 0 for all i;

    S is pd iff λ_i > 0 for all i.

If S⁻¹ exists, then it, too, is self-adjoint, and has eigenvalues 1/λ_i. In particular, ‖S⁻¹‖ = 1/min_i |λ_i|.

The spectral decomposition for a psd operator S allows one to easily prove the existence of a psd operator S^{1/2} satisfying (S^{1/2})² = S; simply replace λ_i by λ_i^{1/2} in the decomposition. In turn, the uniqueness of S^{1/2} can readily be proven by relying on the fact that if T is a psd operator satisfying T² = S, then the eigenvectors for T are eigenvectors for S. The operator S^{1/2} is the square root of S.
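The construction of the square root from the spectral decomposition can be carried out directly in code; a sketch (NumPy assumed; S is a random pd stand-in):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
S = M @ M.T + 4 * np.eye(4)               # pd (hence psd) test matrix

lam, V = np.linalg.eigh(S)                # spectral decomposition S = V Λ Vᵀ
S_half = V @ np.diag(np.sqrt(lam)) @ V.T  # replace each λ_i by √λ_i

assert np.allclose(S_half @ S_half, S)    # (S^{1/2})² = S
assert np.allclose(S_half, S_half.T)      # the square root is again self-adjoint
```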

Here is a crucial observation: If S is pd, then S defines a new inner product, namely,

    ⟨x₁, x₂⟩_S := ⟨x₁, Sx₂⟩.

Every inner product on R^n arises in this way; that is, regardless of the initial inner product ⟨ , ⟩, for every other inner product there exists S which is pd wrt ⟨ , ⟩ and for which ⟨ , ⟩_S is precisely the other inner product.

Let ‖ ‖_S denote the norm induced by ⟨ , ⟩_S.

Assume A* is the adjoint of A: R^n → R^m. Assuming S and T are pd wrt the respective inner products, if the inner product on R^n is replaced by ⟨ , ⟩_S and that on R^m is replaced by ⟨ , ⟩_T, then the adjoint of A becomes S⁻¹A*T, as is easily shown. In particular, if m = n and S = T, then the adjoint of A becomes S⁻¹A*S. Moreover, letting ‖A‖_{S,T} denote the resulting operator norm, it is easily proven that

    ‖A‖_{S,T} = ‖T^{1/2}AS^{−1/2}‖ and ‖A*‖_{S,T} = ‖S^{−1/2}A*T^{1/2}‖.

The notation used in stating these facts illustrates our earlier assertion that the reference inner product will be useful in fixing notation as the inner products change.
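That S⁻¹A*T is indeed the adjoint wrt the new inner products, i.e., that ⟨Ax, y⟩_T = ⟨x, S⁻¹A*Ty⟩_S, can be checked numerically. A sketch under the dot product as reference (NumPy assumed; A, S, T, x, y are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 5, 3
A = rng.standard_normal((m, n))
MS = rng.standard_normal((n, n)); S = MS @ MS.T + np.eye(n)   # pd on R^n
MT = rng.standard_normal((m, m)); T = MT @ MT.T + np.eye(m)   # pd on R^m

x = rng.standard_normal(n)
y = rng.standard_normal(m)
adj = np.linalg.inv(S) @ A.T @ T        # claimed new adjoint S⁻¹A*T

lhs = (A @ x) @ (T @ y)                 # ⟨Ax, y⟩_T
rhs = x @ (S @ (adj @ y))               # ⟨x, S⁻¹A*T y⟩_S
assert np.isclose(lhs, rhs)
```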

It is instructive to consider the shape of the unit ball wrt ‖ ‖_S, viewed in terms of the geometry of the reference inner product. The spectral decomposition of S easily implies the unit ball to be an ellipsoid with axes in the directions of the orthonormal basis vectors v₁, …, v_n, the length of the axis in the direction of v_i being 1/λ_i^{1/2}.

1.2 Gradients

Recall that a functional is a function whose range lies in R. We use D_f to denote the domain of a functional f. It will always be assumed that D_f is an open subset of R^n (in the norm topology, recalling that all norms on R^n induce the same topology). Let ⟨ , ⟩ denote an arbitrary inner product on R^n, and let ‖ ‖ denote the norm induced by ⟨ , ⟩.

The functional f is said to be (Fréchet) differentiable at x ∈ D_f if there exists a vector g(x) satisfying

    lim_{‖Δx‖→0} [f(x + Δx) − f(x) − ⟨g(x), Δx⟩] / ‖Δx‖ = 0.

The vector g(x) is the gradient of f at x wrt ⟨ , ⟩.

Of course, if one chooses the inner product ⟨ , ⟩ on R^n to be the dot product and expresses g(x) coordinate-wise, the j-th coordinate is (d/dx_j) f(x).

For an arbitrary inner product ⟨ , ⟩, the gradient has the same geometrical interpretation that is taught in Calculus for the dot-product gradient. Roughly speaking, the gradient g(x) points in the direction for which the functional output increases the fastest per unit distance travelled, and the magnitude ‖g(x)‖ equals the amount the functional will change per unit distance travelled in that direction. We give rigour to this geometrical interpretation in §1.4.

The first-order approximation of f at x is the linear functional

    y ↦ f(x) + ⟨g(x), y − x⟩.

If f is differentiable at each x ∈ D_f, then f is said to be differentiable. Henceforth, assume f is differentiable.

If the function x ↦ g(x) is continuous at each x ∈ D_f, then f is said to be continuously differentiable. One then writes f ∈ C¹.

To illustrate the definition of the gradient, consider the functional

    f(X) := −ln det(X),

with domain the set of all pd matrices in S^{n×n}. This functional plays an especially important role in SDP. We claim that, wrt the trace product,

    g(X) = −X⁻¹.

For let ΔX ∈ S^{n×n}, and denote the eigenvalues of X^{−1/2} ΔX X^{−1/2} by δ₁, …, δ_n. Since the trace of a matrix depends only on the eigenvalues (hence X⁻¹ • ΔX = trace(X^{−1/2} ΔX X^{−1/2}) = Σ_i δ_i), we have

    |f(X + ΔX) − f(X) − ⟨−X⁻¹, ΔX⟩| / ‖ΔX‖
      = |ln det(X + ΔX) − ln det(X) − trace(X^{−1/2} ΔX X^{−1/2})| / ‖ΔX‖
      = |ln det(I + X^{−1/2} ΔX X^{−1/2}) − trace(X^{−1/2} ΔX X^{−1/2})| / ‖ΔX‖
      = |Σ_i (ln(1 + δ_i) − δ_i)| / ‖ΔX‖.

Letting λ_i(X) and λ_i(ΔX) denote the eigenvalues of X and ΔX, it is easily proven that

    ‖ΔX‖ ≥ max_i |λ_i(ΔX)| ≥ min_i λ_i(X) · max_i |δ_i|,

and hence

    limsup_{‖ΔX‖→0} |f(X + ΔX) − f(X) − ⟨−X⁻¹, ΔX⟩| / ‖ΔX‖
      ≤ limsup_{‖ΔX‖→0} |Σ_i (ln(1 + δ_i) − δ_i)| / (min_i λ_i(X) · max_i |δ_i|).

Since ‖ΔX‖ → 0 implies δ_i → 0 for each i, it is now straightforward to conclude that the value of the limit supremum is 0. Thus, g(X) = −X⁻¹.
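The claim g(X) = −X⁻¹ is easily probed with a finite-difference check; a sketch (NumPy assumed; X and the small symmetric perturbation ΔX are random stand-ins), verifying that the first-order error is far below first order in ‖ΔX‖:

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
X = M @ M.T + 4 * np.eye(4)                  # pd test matrix

f = lambda Z: -np.log(np.linalg.det(Z))      # f(X) = −ln det(X)
G = -np.linalg.inv(X)                        # claimed gradient wrt trace product

D = rng.standard_normal((4, 4))
dX = 1e-6 * (D + D.T)                        # small symmetric perturbation

# error of the first-order approximation should be o(‖ΔX‖), here O(‖ΔX‖²)
err = f(X + dX) - f(X) - np.trace(G @ dX)
assert abs(err) < 1e-9
```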

Our definition of what it means for a functional f to be differentiable depends on the inner product ⟨ , ⟩. However, relying on the equivalence of all norm topologies on R^n, it is readily proven that the property of being differentiable (and of being continuously differentiable) is independent of the inner product. The gradient depends on the inner product, but differentiability does not.

The following proposition shows how the gradient changes as the inner product changes.

Proposition 1.2.1. If S is pd and f is differentiable at x, then the gradient of f at x wrt ⟨ , ⟩_S is S⁻¹g(x).

Proof. Letting λ denote the least eigenvalue of S, the proof relies on the fact that

    ‖Δx‖_S ≥ λ^{1/2} ‖Δx‖ for all Δx,

as follows easily from the spectral decomposition of S.

To prove that S⁻¹g(x) is the gradient of f at x wrt ⟨ , ⟩_S, we wish to show

    limsup_{‖Δx‖_S→0} |f(x + Δx) − f(x) − ⟨S⁻¹g(x), Δx⟩_S| / ‖Δx‖_S = 0.

However, noting that for all v,

    ⟨S⁻¹g(x), v⟩_S = ⟨S⁻¹g(x), Sv⟩ = ⟨SS⁻¹g(x), v⟩ = ⟨g(x), v⟩,

we have

    limsup_{‖Δx‖_S→0} |f(x + Δx) − f(x) − ⟨S⁻¹g(x), Δx⟩_S| / ‖Δx‖_S
      = limsup_{‖Δx‖_S→0} |f(x + Δx) − f(x) − ⟨g(x), Δx⟩| / ‖Δx‖_S
      ≤ (1/λ^{1/2}) limsup_{‖Δx‖→0} |f(x + Δx) − f(x) − ⟨g(x), Δx⟩| / ‖Δx‖
      = 0,

the next-to-last relation due to ‖Δx‖ → 0 if ‖Δx‖_S → 0 (because ‖Δx‖ ≤ ‖Δx‖_S / λ^{1/2}). □

Proposition 1.2.1 has the unsurprising consequence that the first-order approximation of f at x is independent of the inner product:

    f(x) + ⟨g(x), y − x⟩ = f(x) + ⟨S⁻¹g(x), y − x⟩_S.

Finally, we make an observation that will be important for applying ipm theory to optimization problems having linear equations among the constraints. Assume L is a subspace of R^n. Restricting ⟨ , ⟩ to L makes L into an inner product space. Thus, if f is a functional whose domain intersects L, one can speak of the gradient of f|_L, the functional obtained by restricting f to D_f ∩ L. Let g|_L denote this gradient, a function from D_f ∩ L to L. It is not difficult to prove g|_L = P_L ∘ g, where P_L is the operator projecting R^n orthogonally onto L. Summarizing, if x ∈ L, then the gradient g|_L(x) of f|_L at x is the vector P_L g(x).

1.3 Hessians

The functional f is said to be twice differentiable at x ∈ D_f if f ∈ C¹ and there exists a linear operator H(x): R^n → R^n satisfying

    lim_{‖Δx‖→0} ‖g(x + Δx) − g(x) − H(x)Δx‖ / ‖Δx‖ = 0.

If it exists, H(x) is said to be the Hessian of f at x wrt ⟨ , ⟩.

If ⟨ , ⟩ is the dot product and H(x) is written as a matrix, the (i, j)-entry of H(x) is

    (d²/dx_i dx_j) f(x).

The second-order approximation of f at x is the functional

    y ↦ f(x) + ⟨g(x), y − x⟩ + ½⟨y − x, H(x)(y − x)⟩.

If f is twice differentiable at each x ∈ D_f, then f is said to be twice differentiable. Henceforth, assume f is twice differentiable.

If the function x ↦ H(x) is continuous at x (wrt the operator-norm topology or, equivalently, any norm topology on the vector space of linear operators from R^n to R^n), then H(x) is self-adjoint. If the function x ↦ H(x) is continuous at each x ∈ D_f, then f is said to be twice continuously differentiable. One then writes f ∈ C².

The assumption of twice continuous differentiability (as opposed to mere twice differentiability) is often made in optimization primarily to ensure self-adjointness of the Hessian.

If the inner product is the dot product and the Hessian is expressed as a matrix, self-adjointness is equivalent to the matrix being symmetric, that is,

    (d²/dx_i dx_j) f = (d²/dx_j dx_i) f.

If the Hessian matrix does not vary continuously in x, the order in which the partials are taken can matter, resulting in a non-symmetric matrix.

To illustrate the definition of the Hessian, we again consider the functional

    f(X) := −ln det(X),

with domain the set of all pd matrices in S^{n×n}. We saw that g(X) = −X⁻¹. We claim that H(X) is the linear operator given by

    ΔX ↦ X⁻¹ ΔX X⁻¹.

This can be proven by relying on the fact that if ‖ΔX‖ is sufficiently small, then

    (X + ΔX)⁻¹ = X⁻¹ − X⁻¹ΔX X⁻¹ + Σ_{k=2}^∞ (−X⁻¹ΔX)^k X⁻¹,

and hence

    g(X + ΔX) − g(X) − H(X)ΔX = −Σ_{k=2}^∞ (−X⁻¹ΔX)^k X⁻¹.

For then, from the submultiplicativity of the Frobenius norm,

    limsup_{‖ΔX‖→0} ‖g(X + ΔX) − g(X) − H(X)ΔX‖ / ‖ΔX‖
      ≤ limsup_{‖ΔX‖→0} (Σ_{k=2}^∞ ‖X⁻¹‖^{k+1} ‖ΔX‖^k) / ‖ΔX‖ = 0.
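As with the gradient, the claimed Hessian action ΔX ↦ X⁻¹ΔX X⁻¹ can be probed by a finite-difference check on g; a sketch (NumPy assumed; X and the small symmetric ΔX are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(8)
M = rng.standard_normal((4, 4))
X = M @ M.T + 4 * np.eye(4)                  # pd test matrix

g = lambda Z: -np.linalg.inv(Z)              # gradient of f(X) = −ln det(X)
Xi = np.linalg.inv(X)

D = rng.standard_normal((4, 4))
dX = 1e-5 * (D + D.T)                        # small symmetric perturbation
H_dX = Xi @ dX @ Xi                          # claimed Hessian action H(X)ΔX

# residual of the linearization of g should be O(‖ΔX‖²)
err = np.linalg.norm(g(X + dX) - g(X) - H_dX)
assert err < 1e-8
```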

The property of being twice continuously differentiable does not depend on the inner product, whereas the Hessian most certainly does depend on the inner product, as is made explicit in the following proposition. The proof of the following proposition is similar to the proof of Proposition 1.2.1 and hence is left to the reader.

Proposition 1.3.1. If S is pd and f is twice differentiable at x, then the Hessian of f at x wrt ⟨ , ⟩_S is S⁻¹H(x).

Propositions 1.2.1 and 1.3.1 have the unsurprising consequence that the second-order approximation of f at x is independent of the inner product:

    f(x) + ⟨g(x), y − x⟩ + ½⟨y − x, H(x)(y − x)⟩
      = f(x) + ⟨S⁻¹g(x), y − x⟩_S + ½⟨y − x, S⁻¹H(x)(y − x)⟩_S.

Finally, we make an observation regarding Hessians and subspaces. It is straightforward to prove that if L is a subspace of R^n whose intersection with D_f is nonempty, then the Hessian of f|_L at x ∈ L ∩ D_f, an operator from L to L, is given by H|_L(x) = P_L H(x)|_L. That is, when one applies H|_L(x) to a vector v ∈ L, one obtains the vector P_L H(x)v.

1.4 Convexity

Recall that a set S ⊆ R^n is said to be convex if whenever x, y ∈ S and 0 ≤ t ≤ 1, we have x + t(y − x) ∈ S.

Recall that a functional f is said to be convex if D_f is convex and if whenever x, y ∈ D_f and 0 ≤ t ≤ 1, we have

    f(x + t(y − x)) ≤ f(x) + t(f(y) − f(x)).

If the inequality is strict whenever 0 < t < 1 and x ≠ y, then f is said to be strictly convex.

The minimizers of a convex functional form a convex set. A strictly convex functional has at most one minimizer.

Henceforth, we assume f ∈ C², and we assume D_f is an open, convex set.

If f is a univariate functional, we know from Calculus that f is convex iff f″(x) ≥ 0 for all x ∈ D_f. Similarly, if f″(x) > 0 for all x ∈ D_f, then f is strictly convex. The following standard theorem generalizes these facts.

Proposition 1.4.1. The functional f is convex iff H(x) is psd for all x ∈ D_f. If H(x) is pd for all x ∈ D_f, then f is strictly convex.

The following elementary proposition, which is relied on in the proof of Proposition 1.4.1, is fundamental throughout these lecture notes. It does not assume convexity of f.

Proposition 1.4.2. Assume x, y ∈ D_f, and define a univariate functional φ by

    φ(t) := f(x + t(y − x)).

Then

    φ′(t) = ⟨g(x + t(y − x)), y − x⟩

and

    φ″(t) = ⟨y − x, H(x + t(y − x))(y − x)⟩.

Proof. Fix t and let u := x + t(y − x). We wish to prove

    φ′(t) = ⟨g(u), y − x⟩ and φ″(t) = ⟨y − x, H(u)(y − x)⟩.

To prove φ′(t) = ⟨g(u), y − x⟩, it suffices to show

    limsup_{s→0} |φ(t + s) − φ(t) − s⟨g(u), y − x⟩| / |s| = 0.

However, noting φ(t) = f(u) and φ(t + s) = f(u + s(y − x)), we have

    limsup_{s→0} |φ(t + s) − φ(t) − s⟨g(u), y − x⟩| / |s|
      = limsup_{s→0} |f(u + s(y − x)) − f(u) − ⟨g(u), s(y − x)⟩| / |s|
      = ‖y − x‖ limsup_{‖s(y−x)‖→0} |f(u + s(y − x)) − f(u) − ⟨g(u), s(y − x)⟩| / ‖s(y − x)‖
      = 0,

the final equality by definition of g(u).

Similarly, to prove φ″(t) = ⟨y − x, H(u)(y − x)⟩, it suffices to show

    limsup_{s→0} |φ′(t + s) − φ′(t) − s⟨y − x, H(u)(y − x)⟩| / |s| = 0.

However, since we now know

    φ′(t) = ⟨g(u), y − x⟩ and φ′(t + s) = ⟨g(u + s(y − x)), y − x⟩,

we have

    limsup_{s→0} |φ′(t + s) − φ′(t) − s⟨y − x, H(u)(y − x)⟩| / |s|
      = limsup_{s→0} |⟨g(u + s(y − x)) − g(u) − H(u)s(y − x), y − x⟩| / |s|
      ≤ limsup_{s→0} ‖g(u + s(y − x)) − g(u) − H(u)s(y − x)‖ ‖y − x‖ / |s|
      = ‖y − x‖² limsup_{‖s(y−x)‖→0} ‖g(u + s(y − x)) − g(u) − H(u)s(y − x)‖ / ‖s(y − x)‖
      = 0,

where the inequality is by Cauchy–Schwarz and the final equality is by definition of H(u). □

Proof of Proposition 1.4.1. We first show that if the Hessian is psd everywhere on D_f, then f is convex. So assume the Hessian is psd everywhere on D_f.

Assume x and y are arbitrary points in D_f. We wish to show that if t satisfies 0 ≤ t ≤ 1, then

    f(x + t(y − x)) ≤ f(x) + t(f(y) − f(x)).   (*)

Consider the univariate functional φ defined by

    φ(t) := f(x + t(y − x)).

Observe (*) is equivalent to

    φ(t) ≤ φ(0) + t(φ(1) − φ(0)),

an inequality that is certainly valid if φ is convex on the interval [0, 1]. Hence, to prove (*), it suffices to prove φ″(t) ≥ 0 for all 0 ≤ t ≤ 1. However, Proposition 1.4.2 implies

    φ″(t) = ⟨y − x, H(x + t(y − x))(y − x)⟩ ≥ 0,

the inequality because H(x + t(y − x)) is psd.

The proof that f is strictly convex if the Hessian is everywhere pd on D_f is similar, and hence is left to the reader.

To conclude the proof, it suffices to show that if H(x) is not psd for some x, then f is not convex. If H(x) is not psd, then H(x) has an eigenvector v with negative eigenvalue. To show f is not convex, it suffices to show the functional ψ(t) := f(x + tv) is not convex. To show ψ is not convex, it suffices to show ψ″(0) < 0. This is straightforward, again relying on Proposition 1.4.2. □

Earlier in the notes it was asserted that, roughly speaking, the gradient g(x) points in the direction for which the functional output increases the fastest per unit distance travelled, and the magnitude ‖g(x)‖ equals the amount the functional will change per unit distance travelled in that direction. Proposition 1.4.2 provides the means to make this rigorous. Choose an arbitrary direction v of unit length. The initial rate of change in the output of f, as one moves from x to x + v in unit time, is given by φ′(0), where

    φ(t) := f(x + tv).

Note Proposition 1.4.2 implies

    φ′(0) = ⟨g(x), v⟩,

and hence, by Cauchy–Schwarz and ‖v‖ = 1, we have

    −‖g(x)‖ ≤ φ′(0) ≤ ‖g(x)‖.

So the initial rate of change cannot exceed ‖g(x)‖ in magnitude, regardless of which direction v of unit length is chosen. However, assuming g(x) ≠ 0, if one chooses the direction

    v := g(x) / ‖g(x)‖,

then φ′(0) = ⟨g(x), g(x)⟩ / ‖g(x)‖ = ‖g(x)‖.

We mention that a point z minimizes a convex functional f iff g(z) = 0. For the "if," assume g(z) = 0. For y ∈ D_f, consider the univariate functional φ(t) := f(z + t(y − z)). By Proposition 1.4.2, φ′(0) = 0. Since φ is convex, we know from univariate Calculus that 0 minimizes φ. In particular, φ(0) ≤ φ(1), that is, f(z) ≤ f(y). For the "only if," assume g(z) ≠ 0 and consider the discussion of the preceding paragraph.

As with the two preceding sections, we close this one with a discussion of subspaces.

If L is a subspace of R^n intersecting D_f, we know the gradient of f|_L to be P_L ∘ g. Thus, z ∈ L solves the constrained optimization problem

    min f(x)
    s.t. x ∈ L

iff P_L g(z) = 0, that is, iff g(z) is orthogonal to L. In particular, if L is the nullspace of a linear operator A, then z solves the optimization problem iff g(z) = A*y for some y. Likewise when L is replaced by a translate of L, that is, when L is replaced by an affine space v + L for some vector v. We record this in the following proposition.

Proposition 1.4.3. If f is convex and A is a linear operator, then z ∈ D_f solves the linearly-constrained optimization problem

    min f(x)
    s.t. Ax = b

iff Az = b and g(z) = A*y for some y.
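The optimality condition can be illustrated on the simplest convex case. The sketch below (NumPy assumed; A and b are random stand-ins) minimizes f(x) = ½‖x‖² subject to Ax = b, for which g(x) = x, so the minimizer is the least-norm solution and the multiplier y with g(z) = A*y is explicit:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((2, 5))
b = rng.standard_normal(2)

# minimize f(x) = ½‖x‖² subject to Ax = b; here g(x) = x
y = np.linalg.solve(A @ A.T, b)   # multiplier from AA* y = b
z = A.T @ y                       # least-norm solution z = A*(AA*)⁻¹ b

assert np.allclose(A @ z, b)      # feasibility: Az = b
assert np.allclose(z, A.T @ y)    # optimality:  g(z) = A*y
```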

1.5 Fundamental Theorems of Calculus

We continue to assume f ∈ C² and D_f is an open, convex set.

The following theorem generalizes the Fundamental Theorem of Calculus.

Theorem 1.5.1. If x, y ∈ D_f, then

    ∫₀¹ ⟨g(x + t(y − x)), y − x⟩ dt = f(y) − f(x).

Proof. Consider the univariate functional φ(t) := f(x + t(y − x)). The Fundamental Theorem of Calculus asserts

    ∫₀¹ φ′(t) dt = φ(1) − φ(0).

Since φ(1) − φ(0) = f(y) − f(x) and, by Proposition 1.4.2,

    φ′(t) = ⟨g(x + t(y − x)), y − x⟩,

the proof is complete. □

In a similar vein, we have the following proposition.

Proposition 1.5.2. If x, y ∈ D_f, then

    f(y) = f(x) + ⟨g(x), y − x⟩
           + ∫₀¹ ⟨g(x + t(y − x)) − g(x), y − x⟩ dt

and

    f(y) = f(x) + ⟨g(x), y − x⟩ + ½⟨y − x, H(x)(y − x)⟩
           + ∫₀¹ ∫₀ᵗ ⟨y − x, (H(x + s(y − x)) − H(x))(y − x)⟩ ds dt.

Proof. Again considering the univariate functional φ(t) := f(x + t(y − x)), the Fundamental Theorem of Calculus implies

    φ(1) = φ(0) + ∫₀¹ φ′(t) dt

and

    φ(1) = φ(0) + φ′(0) + ∫₀¹ ∫₀ᵗ φ″(s) ds dt.

Using Proposition 1.4.2 to make the obvious substitutions, the first identity yields the first assertion, whereas the second yields the second. □

Proposition 1.5.2 provides the means to bound the error in the first- and second-order approximations of f.

Corollary 1.5.3. If x, y ∈ D_f, then

    |f(y) − f(x) − ⟨g(x), y − x⟩|
      ≤ ‖y − x‖ ∫₀¹ ‖g(x + t(y − x)) − g(x)‖ dt

and

    |f(y) − f(x) − ⟨g(x), y − x⟩ − ½⟨y − x, H(x)(y − x)⟩|
      ≤ ‖y − x‖² ∫₀¹ ∫₀ᵗ ‖H(x + s(y − x)) − H(x)‖ ds dt.

Relying on continuity of g and H, observe that the error in the first-order approximation is o(‖y − x‖) (i.e., tends to zero faster than ‖y − x‖), whereas the error in the second-order approximation is o(‖y − x‖²).

Theorem 1.5.1 gives a fundamental theorem of Calculus for a functional f. It will be necessary to have an analogous theorem for g, a theorem which expresses the difference g(y) − g(x) as an integral involving the Hessian. To keep our development coordinate-free, we introduce the following definition.

The univariate function t ↦ v(t) ∈ R^n with domain [a, b] is said to be integrable if there exists a vector u such that

    ⟨u, w⟩ = ∫_a^b ⟨v(t), w⟩ dt for all w ∈ R^n.

If it exists, the vector u is uniquely determined (as is not difficult to prove) and is called the integral of the function v(t). One uses the notation ∫_a^b v(t) dt to represent this vector.

Although the definition of the integral is phrased in terms of the inner product ⟨ , ⟩, it is independent of the inner product. For if u is the integral as defined by ⟨ , ⟩, and if S is pd, then for all vectors w,

    ⟨u, w⟩_S = ⟨u, Sw⟩
             = ∫_a^b ⟨v(t), Sw⟩ dt
             = ∫_a^b ⟨v(t), w⟩_S dt.

Following are two useful, elementary propositions.

Proposition 1.5.4. If the univariate function t ↦ v(t) ∈ R^n with domain [a, b] is integrable, then

    ‖∫_a^b v(t) dt‖ ≤ ∫_a^b ‖v(t)‖ dt.

Proof. Let u := ∫_a^b v(t) dt. By definition of the integral, for all w we have

    ⟨u, w⟩ = ∫_a^b ⟨v(t), w⟩ dt.

In particular, choosing w = u gives

    ‖u‖² = ∫_a^b ⟨v(t), u⟩ dt.   (*)

However,

    ∫_a^b ⟨v(t), u⟩ dt ≤ |∫_a^b ⟨v(t), u⟩ dt|
                       ≤ ∫_a^b |⟨v(t), u⟩| dt
                       ≤ ∫_a^b ‖v(t)‖ ‖u‖ dt
                       = (∫_a^b ‖v(t)‖ dt) ‖u‖.   (**)

Combining (*) and (**) gives

    ‖u‖² ≤ ‖u‖ ∫_a^b ‖v(t)‖ dt.

Since ‖u‖ = ‖∫_a^b v(t) dt‖, the proof is complete. □

Proposition 1.5.5. If the univariate function t ↦ v(t) ∈ R^n with domain [a, b] is integrable, and if A: R^n → R^m is a linear operator, then the function t ↦ Av(t) is integrable and

    ∫_a^b Av(t) dt = A ∫_a^b v(t) dt.

Proof. Observe that for all w ∈ R^m, we have

    ⟨A ∫_a^b v(t) dt, w⟩ = ⟨∫_a^b v(t) dt, A*w⟩
                         = ∫_a^b ⟨v(t), A*w⟩ dt
                         = ∫_a^b ⟨Av(t), w⟩ dt,

where the second equality is by definition of ∫_a^b v(t) dt. □

Next is the fundamental theorem of Calculus for the gradient.

Theorem 1.5.6. If x, y ∈ D_f, then

    g(y) − g(x) = ∫₀¹ H(x + t(y − x))(y − x) dt.

Proof. By definition of the integral, we wish to prove that for all w,

    ⟨g(y) − g(x), w⟩ = ∫₀¹ ⟨H(x + t(y − x))(y − x), w⟩ dt.   (*)

Fix arbitrary w and consider the functional

    φ(t) := ⟨g(x + t(y − x)), w⟩.

The Fundamental Theorem of Calculus asserts

    ∫₀¹ φ′(t) dt = φ(1) − φ(0),

which, by definition of φ, is equivalent to

    ⟨g(y) − g(x), w⟩ = ∫₀¹ φ′(t) dt.

Comparing with (*), we see that to prove (*), it suffices to show, for arbitrary t, that

    φ′(t) = ⟨H(u)(y − x), w⟩,   (**)

where

    u := x + t(y − x).

Towards proving (**), recall that H(u) is the unique operator satisfying

    lim_{‖Δu‖→0} ‖g(u + Δu) − g(u) − H(u)Δu‖ / ‖Δu‖ = 0.

Thinking of Δu as being s(y − x), where s → 0, it follows that

    lim_{s→0} ‖g(u + s(y − x)) − g(u) − sH(u)(y − x)‖ / |s| = 0.

Since, by Cauchy–Schwarz,

    ‖g(u + s(y − x)) − g(u) − sH(u)(y − x)‖ ‖w‖
      ≥ |⟨g(u + s(y − x)) − g(u) − sH(u)(y − x), w⟩|,

it follows that

    lim_{s→0} ⟨g(u + s(y − x)) − g(u) − sH(u)(y − x), w⟩ / s = 0.

Since

    φ(t + s) = ⟨g(u + s(y − x)), w⟩ and φ(t) = ⟨g(u), w⟩,

we thus have

    lim_{s→0} [φ(t + s) − φ(t) − s⟨H(u)(y − x), w⟩] / s = 0,

from which it is immediate that ⟨H(u)(y − x), w⟩ = φ′(t). Thus (**) is established and the proof is complete. □

Proposition 1.5.7. If x, y ∈ D_f, then

    ∫₀¹ (H(x + t(y − x)) − H(x))(y − x) dt = g(y) − g(x) − H(x)(y − x).

Proof. A simple consequence of Theorem 1.5.6 and

    ∫₀¹ H(x)(y − x) dt = H(x)(y − x),

an identity which is trivially verified. □

Corollary 1.5.8. If x, y ∈ D_f, then

    ‖g(y) − g(x) − H(x)(y − x)‖ ≤ ‖y − x‖ ∫₀¹ ‖H(x + t(y − x)) − H(x)‖ dt.

1.6 Newton's Method

In optimization, Newton's method is an algorithm for minimizing functionals. The idea behind the algorithm is simple: Given a point x in the domain of a functional f, where f is to be minimized, one replaces f by the second-order approximation at x and minimizes the approximation to obtain a new point x₊. One iterates this procedure with x₊ in place of x, and so on, generating a sequence of points which, under certain conditions, converges rapidly to a minimizer of f.

For x ∈ D_f, we denote the second-order (or quadratic) approximation at x by

    q_x(y) := f(x) + ⟨g(x), y − x⟩ + ½⟨y − x, H(x)(y − x)⟩.

The domain of q_x is all of R^n.

Proposition 1.6.1. The gradient at y of q_x is g(x) + H(x)(y − x), and the Hessian is H(x), regardless of y.

Proof. Using the self-adjointness of H(x), it is easily established that

    q_x(y + Δy) − q_x(y) = ⟨g(x) + H(x)(y − x), Δy⟩ + ½⟨Δy, H(x)Δy⟩.

x x

Proving the gradient is as asserted is thus equivalenttoproving

hy H xy i

lim

ky k

ky k

an easily established identity

Having proven the gradient is as asserted it is a trivial to prove the

Hessian is as asserted

Henceforth, assume H(x) is pd. Then q_x is strictly convex, and hence is minimized by the point x₊ satisfying g(x) + H(x)(x₊ − x) = 0; that is, q_x is minimized by the point

    x₊ := x − H(x)⁻¹g(x).

The Newton step at x is defined to be the difference

    n(x) := x₊ − x = −H(x)⁻¹g(x).

Newton's method steps from x to x₊ = x + n(x).
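The iteration x₊ = x − H(x)⁻¹g(x) can be sketched in a few lines. The example below (NumPy assumed) minimizes the hypothetical strictly convex test functional f(x) = Σ_i (e^{x_i} − x_i}), whose gradient and Hessian are explicit and whose unique minimizer is x = 0; a handful of iterations already drives the gradient to machine precision:

```python
import numpy as np

# hypothetical strictly convex test functional f(x) = Σ (exp(x_i) − x_i)
f = lambda x: np.sum(np.exp(x) - x)
g = lambda x: np.exp(x) - 1          # gradient
H = lambda x: np.diag(np.exp(x))     # Hessian, pd everywhere

x = np.full(3, 0.5)                  # starting point
for _ in range(6):
    x = x - np.linalg.solve(H(x), g(x))   # Newton step x₊ = x − H(x)⁻¹g(x)

assert np.linalg.norm(g(x)) < 1e-12  # minimizer found: g ≈ 0 at x ≈ 0
```

Note the characteristic quadratic convergence: the error is roughly squared at each iteration once the iterate is near the minimizer.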

We know the second-order approximation is independent of the inner product. Consequently, so is Newton's method. More explicitly, in the inner product ⟨ , ⟩_S, the gradient of f at x is S⁻¹g(x), the Hessian is S⁻¹H(x), and so the Newton step is

    −(S⁻¹H(x))⁻¹S⁻¹g(x) = −H(x)⁻¹g(x).

The Newton step is unchanged.

The following theorem is the main tool for analyzing the progress of Newton's method.

Theorem 1.6.2. If z minimizes f and H(x) is invertible, then

    ‖x₊ − z‖ ≤ ‖x − z‖ ‖H(x)⁻¹‖ ∫₀¹ ‖H(x + t(z − x)) − H(x)‖ dt.

Proof. Noting g(z) = 0, we have

    ‖x₊ − z‖ = ‖x − z − H(x)⁻¹g(x)‖
             = ‖x − z + H(x)⁻¹(g(z) − g(x))‖
             = ‖x − z + H(x)⁻¹ ∫₀¹ H(x + t(z − x))(z − x) dt‖
             = ‖H(x)⁻¹ ∫₀¹ (H(x + t(z − x)) − H(x))(z − x) dt‖
             ≤ ‖x − z‖ ‖H(x)⁻¹‖ ∫₀¹ ‖H(x + t(z − x)) − H(x)‖ dt. □

Invoking the assumed continuity of the Hessian, the theorem is seen to imply that if H(z) is invertible and x is sufficiently close to z, then x₊ will be closer to z than is x.

Now we present a brief discussion of Newton's method and subspaces, as will be important when we consider applications of ipm theory to optimization problems having linear equations among the constraints. Assume L is a subspace of R^n and x ∈ L ∩ D_f. Let n|_L(x) denote the Newton step for f|_L at x. Since the Hessian of f|_L at x is P_L H(x) and the gradient is P_L g(x), the Newton step n|_L(x) is the vector in L solving

    P_L H(x) n|_L(x) = −P_L g(x);

that is, n|_L(x) is the vector in L for which H(x)n|_L(x) + g(x) is orthogonal to L. In particular, if L is the nullspace of a linear operator A: R^n → R^m, then n|_L(x) is the vector in R^n for which there exists y ∈ R^m satisfying

    H(x)n|_L(x) + g(x) = A*y,
    An|_L(x) = 0.

Computing n|_L(x) and y can thus be accomplished by solving a system of m + n equations in m + n variables.

If H(x)⁻¹ is readily computed, as it is for functionals f used in ipms, the size of the system of linear equations to be solved can easily be reduced to m variables. One solves the linear system

    AH(x)⁻¹A* y = AH(x)⁻¹g(x)

and then computes

    n|_L(x) = H(x)⁻¹(A*y − g(x)).
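This reduction can be sketched numerically. In the following (NumPy assumed; the pd matrix standing in for H(x) and the vector standing in for g(x) are random hypothetical data, not derived from any particular functional), the m × m reduced system is solved and the resulting step is checked against the two characterizing conditions:

```python
import numpy as np

rng = np.random.default_rng(10)
n, m = 6, 2
A = rng.standard_normal((m, n))              # constraint operator; L = null(A)
M = rng.standard_normal((n, n))
Hx = M @ M.T + np.eye(n)                     # random pd stand-in for H(x)
gx = rng.standard_normal(n)                  # random stand-in for g(x)

Hi = np.linalg.inv(Hx)
y = np.linalg.solve(A @ Hi @ A.T, A @ Hi @ gx)   # m x m reduced system
nL = Hi @ (A.T @ y - gx)                         # restricted Newton step

assert np.allclose(A @ nL, 0)                    # step lies in nullspace of A
assert np.allclose(Hx @ nL + gx, A.T @ y)        # H(x)n + g(x) is in range(A*)
```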

In closing this section, we remark that, of course, the error bound given by Theorem 1.6.2 applies to f|_L if z minimizes f|_L, x₊ := x + n|_L(x), and the Hessians for f are replaced by Hessians for f|_L. In fact, the Hessians need not be replaced by the Hessians for f|_L. To verify the replacement need not be done, one notes, for example, that because H|_L(x) = P_L H(x)|_L, we have

    ‖H|_L(x)⁻¹‖ = 1/λ_L ≤ 1/λ = ‖H(x)⁻¹‖,

where λ_L and λ are the smallest eigenvalues of the pd operators H|_L(x) and H(x), respectively.

Chapter 2

Basic Interior-Point Method Theory

Throughout this chapter, unless otherwise stated, f refers to a functional having at least the following properties: D_f is open and convex; f ∈ C²; H(x) is pd for all x ∈ D_f. In particular, f is strictly convex.

f

Intrinsic Inner Pro ducts

The functional f gives rise to a family of inner pro ducts h i an

H x

inner pro duct for each p oint x D These inner pro ducts vary continu

f

ously with x In particular given there exists a neighborhood of x

consisting of p oints y with the prop erty that for all vectors v

kv k

H y

kv k

H x

We often refer to the inner pro duct h i as the lo cal inner pro duct

H x

at x

In the inner pro duct h i the gradient at y is H x g y and

H x

the Hessian is H x H y In particular the gradientat x is nx the

negativeof the Newtonstep and the Hessian is I the identity Thus in

the lo cal inner pro duct Newtons metho d coincides with the metho d of

steep est descent ie Newtons metho d coincides with the algorithm which

attempts to minimize f bymoving in the direction given by the negativeof

the gradient Whereas Newtons metho d is indep endent of inner pro ducts

the metho d of steep est descent is not indep endent b ecause gradients are

not indep endent

It appears from our definition that the local inner product potentially depends on the reference inner product <,>, because the Hessian H(x) is w.r.t. that inner product. In fact, the local inner product is independent of the reference inner product. For if the reference inner product is changed to <,>_S (and hence the Hessian is changed to S^{-1} H(x)), the resulting local inner product satisfies

    <u, S^{-1} H(x) v>_S = <u, S S^{-1} H(x) v> = <u, H(x) v>;

that is, the local inner product is unchanged.

The independence of the local inner products from the reference inner product shows the local inner products to be intrinsic to the functional f. To highlight the independence of the local products from any reference inner product, we adopt notation which avoids a reference. We denote the local inner product at x by <,>_x. Let || ||_x denote the induced norm.

For y ∈ D_f, let g_x(y) denote the gradient at y, and let H_x(y) denote the Hessian. Thus, g_x(x) = -n(x) and H_x(x) = I. If A: R^n -> R^m is a linear operator, let A*_x denote its adjoint w.r.t. <,>_x. (Of course, the adjoint also depends on the inner product on R^m. That inner product will always be fixed, but arbitrary, unlike the intrinsic inner products, which vary with x and are not arbitrary, depending on f.)

The reader should be especially aware that we use -g_x(x) and n(x) interchangeably, depending on context.

A minuscule amount of the ipm literature is written in terms of the local inner products. Rather, in much of the literature, only a reference inner product is explicit, say, the dot product. There, the proofs are done by manipulating operators built from Hessians, operators like H(x)^{-1} H(y) and A H(x)^{-1} A^T, operators we recognize as being H_x(y) and A A*_x. An advantage to working in the local inner products is that the underlying geometry becomes evident, and consequently the operator manipulations in the proofs become less mysterious.

Observe that in the local inner product, the quadratic approximation of f at x is

    q_x(y) := f(x) - <n(x), y - x>_x + (1/2) ||y - x||_x^2,

and its error in approximating f(y) (recall the corollary of the preceding chapter) is no worse than

    ||y - x||_x^2 ∫_0^1 ∫_0^t ||I - H_x(x + s(y - x))||_x ds dt,

where the latter norm is the operator norm induced by the local norm. Similarly, the progress made by Newton's method towards approximating a minimizer z (recall the theorem of the preceding chapter) is captured by the inequality

    ||x_+ - z||_x <= ||x - z||_x ∫_0^1 ||I - H_x(x + t(z - x))||_x dt.

Assume L is a subspace of R^n and x ∈ L ∩ D_f. Let P_{L,x} denote the operator projecting R^n orthogonally onto L, "orthogonally" w.r.t. <,>_x. In the inner product obtained by restricting <,>_x to L, the Hessian of f|_L at x is

    P_{L,x} H_x(x)|_L = P_{L,x}|_L = I

(the last equality is valid on L, not R^n). Consequently, the local inner product on L induced by f|_L is precisely the restriction of <,>_x to L. Letting g|_{L,x} denote the gradient of f|_L w.r.t. the local inner product on L, we thus have

    n|_L(x) = -g|_{L,x}(x) = -P_{L,x} g_x(x) = P_{L,x} n(x).

That is, in the local inner product, the Newton step for f|_L is the orthogonal projection of the Newton step for f.

If L is the nullspace of a surjective linear operator A, the relation

    n|_L(x) = P_{L,x} n(x) = (I - A*_x (A A*_x)^{-1} A) n(x)

provides the means to compute n|_L(x) from n(x). One solves the linear system

    A A*_x y = A n(x)

and then computes

    n|_L(x) = n(x) - A*_x y.

Expressed in terms of an arbitrary inner product <,>, the equations become

    A H(x)^{-1} A* y = -A H(x)^{-1} g(x)   and   n|_L(x) = -H(x)^{-1} (g(x) + A* y),

precisely the equations we arrived at earlier (there with y in place of -y) by different reasoning.

Self-Concordant Functionals

Let B_x(y, r) denote the open ball of radius r centered at y, where radius is measured w.r.t. || ||_x. Let B_x[y, r] denote the closed ball.

A functional f is said to be (strongly nondegenerate) self-concordant if for all x ∈ D_f we have B_x(x, 1) ⊆ D_f, and if whenever y ∈ B_x(x, 1) we have, for all v != 0,

    1 - ||y - x||_x <= ||v||_y / ||v||_x <= 1/(1 - ||y - x||_x).

Let SC denote the family of functionals thus defined.
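A small numerical sanity check of the two defining inequalities, using the univariate barrier f(t) = -ln(t): there ||v||_t = |v|/t, so the ratio ||v||_y / ||v||_x equals x/y for every v.

```python
# Self-concordancy check for f(t) = -ln(t) on t > 0: local norm ||v||_t = |v|/t.
def local_norm(t, v):
    return abs(v) / t

x = 2.0
for step in [-0.8, -0.5, 0.1, 0.7]:
    y = x + step * x                      # ||y - x||_x = |step| < 1
    d = local_norm(x, y - x)
    assert d < 1 and y > 0                # B_x(x,1) stays inside the domain
    ratio = local_norm(y, 1.0) / local_norm(x, 1.0)   # = x/y, independent of v
    assert 1 - d - 1e-9 <= ratio <= 1 / (1 - d) + 1e-9
print("self-concordancy inequalities hold at the sampled points")
```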

Self-concordant functionals play a central role in the general theory of ipm's, as was first made evident in the pioneering work of Nesterov and Nemirovskii. Although our definition of strongly nondegenerate self-concordant functionals is on the surface quite different from the original definition given there, it is in fact equivalent, except in assuming f ∈ C^2 as opposed to the ever-so-slightly stronger assumption there that f is thrice differentiable. The equivalence is shown in a later section, where it is also shown that our definition can be relaxed in a few ways without altering the family of functionals so defined; for example, the leftmost inequality involving ||v||_y / ||v||_x is redundant.

The term "strongly" refers to the requirement B_x(x, 1) ⊆ D_f. The term "nondegenerate" refers to the Hessians being pd (thereby giving the local inner products). The definition of self-concordant functionals (not necessarily strongly nondegenerate) is a natural relaxation of the above definition, only requiring the Hessians to be psd. However, it is the strongly nondegenerate self-concordant functionals that play the central role in ipm theory, and so the relaxation of the definition is best postponed until the reader has in mind a general outline of the theory.

As the parentheses in our definition indicate, for brevity we typically refer to strongly nondegenerate self-concordant functionals simply as "self-concordant functionals."

If a linear functional is added to a self-concordant functional, x |-> <c, x> + f(x), the resulting functional is self-concordant (because the Hessians are unaffected). Similarly, if one restricts a self-concordant functional f to a subspace L (or to a translation of the subspace), one obtains a self-concordant functional, a simple consequence of the local norms for f|_L being the restrictions of the local norms for f.

The primordial self-concordant (barrier) functional is the logarithmic barrier function for the nonnegative orthant, having domain D_f = R^n_{++} (i.e., the strictly positive orthant). It is defined by

    f(x) := -∑_j ln(x_j).

Since the coordinates of vectors play such a prominent role in the definition of this functional, to prove self-concordancy it is natural to use the dot product as a reference inner product. Expressing the Hessian H(x) as a matrix, one sees it is diagonal, with jth diagonal entry 1/x_j^2. Consequently, y ∈ B_x(x, 1) is equivalent to

    ∑_j ((y_j - x_j)/x_j)^2 < 1,

an inequality which is easily seen to imply y ∈ D_f, as required by the definition of self-concordancy. Moreover, assuming y ∈ B_x(x, 1) and v != 0 is an arbitrary vector, we have

    ||v||_y^2 = ∑_j (v_j/y_j)^2
             = ∑_j (v_j/x_j)^2 (x_j/y_j)^2
             <= ||v||_x^2 max_j (x_j/y_j)^2.

Since

    x_j/y_j = 1/(1 - (x_j - y_j)/x_j)
            <= ∑_{k=0}^∞ |(x_j - y_j)/x_j|^k
            <= ∑_{k=0}^∞ ||y - x||_x^k
            = 1/(1 - ||y - x||_x),

the rightmost inequality on ||v||_y / ||v||_x in the definition of self-concordancy is proven. The leftmost inequality is proven similarly.

For an LP

    min  <c, x>
    s.t. Ax = b
         x >= 0,

the most important self-concordant functionals are those of the form

    x |-> eta <c, x> + f|_L(x),

where eta > 0 is a fixed constant, f is the logarithmic barrier function for the nonnegative orthant, and L := {x : Ax = b}.

Another important self-concordant functional is the logarithmic barrier function for the cone of pd matrices in S^{n×n} (the symmetric n×n matrices). This is the functional defined by f(X) := -ln(det(X)), having domain the pd matrices in S^{n×n}. To prove self-concordancy, it is natural to rely on the trace product, for which we know H(X)V = X^{-1} V X^{-1}. For arbitrary Y ∈ S^{n×n}, keeping in mind that the trace of a matrix depends only on the eigenvalues, we have

    ||Y - X||_X^2 = trace((Y - X) X^{-1} (Y - X) X^{-1})
                  = trace((X^{-1/2} (Y - X) X^{-1/2})^2)
                  = ∑_j lambda_j^2,

where lambda_1, ..., lambda_n are the eigenvalues of X^{-1/2} Y X^{-1/2} - I. Assuming ||Y - X||_X < 1, all of the values 1 + lambda_j are thus positive, and hence X^{-1/2} Y X^{-1/2} is pd, which is easily seen to be equivalent to Y being pd. Consequently, if ||Y - X||_X < 1, then Y ∈ D_f, as required by the definition of self-concordancy.

Assuming Y ∈ B_X(X, 1), and letting Q be an orthogonal matrix for which

    Lambda := Q^T (X^{-1/2} Y X^{-1/2}) Q

is diagonal (the diagonal entries being the values 1 + lambda_j), for arbitrary V ∈ S^{n×n} we have

    ||V||_Y^2 = trace(V Y^{-1} V Y^{-1})
              = trace((X^{-1/2} V X^{-1/2})(X^{1/2} Y^{-1} X^{1/2})(X^{-1/2} V X^{-1/2})(X^{1/2} Y^{-1} X^{1/2}))
              = trace((X^{-1/2} V X^{-1/2}) Q Lambda^{-1} Q^T (X^{-1/2} V X^{-1/2}) Q Lambda^{-1} Q^T)
              = trace((Q^T X^{-1/2} V X^{-1/2} Q) Lambda^{-1} (Q^T X^{-1/2} V X^{-1/2} Q) Lambda^{-1})
              <= (max_j 1/(1 + lambda_j))^2 trace((Q^T X^{-1/2} V X^{-1/2} Q)^2)
              = (max_j 1/(1 + lambda_j))^2 trace(V X^{-1} V X^{-1})
              = (max_j 1/(1 + lambda_j))^2 ||V||_X^2.

Since

    1/(1 + lambda_j) <= ∑_{k=0}^∞ |lambda_j|^k
                     <= ∑_{k=0}^∞ (∑_i lambda_i^2)^{k/2}
                     = ∑_{k=0}^∞ ||Y - X||_X^k
                     = 1/(1 - ||Y - X||_X),

the rightmost inequality on ||V||_Y / ||V||_X in the definition of self-concordancy is proven. The leftmost inequality is proven similarly.
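The norm inequality just proven can be checked numerically; a minimal sketch for 2×2 matrices with the trace product (the particular matrices below are made up for illustration):

```python
# Check ||V||_Y <= ||V||_X / (1 - ||Y - X||_X) for f(X) = -ln det X on
# 2x2 pd matrices; ||V||_X^2 = trace(V X^{-1} V X^{-1}).
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv(A):
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def norm(V, X):                          # local norm ||V||_X
    Xi = inv(X)
    M = mul(mul(V, Xi), mul(V, Xi))
    return (M[0][0] + M[1][1]) ** 0.5

X = [[2.0, 0.5], [0.5, 1.0]]
Y = [[1.8, 0.4], [0.4, 1.1]]             # close enough that ||Y - X||_X < 1
V = [[1.0, 2.0], [2.0, -1.0]]
D = [[Y[i][j] - X[i][j] for j in range(2)] for i in range(2)]

d = norm(D, X)
assert d < 1
assert norm(V, Y) <= norm(V, X) / (1 - d)
print("norm bound holds:", norm(V, Y), "<=", norm(V, X) / (1 - d))
```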

For an SDP

    min  C • X
    s.t. AX = b
         X psd,

where A: S^{n×n} -> R^m is a linear operator, the most important self-concordant functionals are those of the form

    X |-> eta (C • X) + f|_L(X),

where eta > 0 is a fixed constant, f is the logarithmic barrier function for the cone of pd matrices, and L := {X : AX = b}.

LP can be viewed as a special case of SDP by identifying R^n in the obvious manner with the subspace in S^{n×n} consisting of diagonal matrices. Then the logarithmic barrier function for the pd cone restricts to the logarithmic barrier function for the nonnegative orthant. Thus, we were redundant in giving a proof that the logarithmic barrier function for the nonnegative orthant is indeed a self-concordant functional. (The insight gained from the simplicity of the nonnegative orthant justifies the redundancy.) In a later section we show that the self-conjugacy of each of these two logarithmic barrier functions is a simple consequence of the original definition of self-concordancy due to Nesterov and Nemirovskii. The original definition is not particularly well-suited for a transparent development of the theory, but it is well-suited for establishing self-conjugacy of logarithmic barrier functions.

To apply our definition of self-concordancy in developing the theory, it is useful to rephrase it in terms of Hessians. Specifically, since

    sup_v ( ||v||_y^2 / ||v||_x^2 ) = sup_v ( <v, H_x(y) v>_x / ||v||_x^2 ) = ||H_x(y)||_x

and, similarly,

    1/||H_x(y)^{-1}||_x = inf_v ( ||v||_y^2 / ||v||_x^2 ),

the pair of inequalities in the definition is equivalent to the pair

    ||H_x(y)||_x <= 1/(1 - ||y - x||_x)^2,    ||H_x(y)^{-1}||_x <= 1/(1 - ||y - x||_x)^2.

In turn, since

    ||I - H_x(y)||_x = max{ ||H_x(y)||_x - 1, 1 - 1/||H_x(y)^{-1}||_x }

and

    1 - 1/||H_x(y)^{-1}||_x <= ||H_x(y)^{-1}||_x - 1,

the inequalities imply

    ||I - H_x(y)||_x <= 1/(1 - ||y - x||_x)^2 - 1.

Recalling that H_x(x) = I, it is evident from this last inequality that to assume self-concordancy is essentially to assume Lipschitz continuity of the Hessians w.r.t. the operator norms induced by the local norms.

(An aside for those familiar with third differentials: Dividing the quantities on the left and right of the last inequality by ||y - x||_x and taking the limits supremum as y tends to x suggests, when f is thrice differentiable, that self-concordancy implies the local norm of the third differential to be bounded by 2. In fact, the converse is also true; that is, a bound of 2 on the local norm of the third differential for all x ∈ D_f, together with the requirement that the local unit balls be contained in the functional domain, imply self-concordancy, as we shall see in a later section. Indeed, the original definition of self-concordancy is phrased as a bound on the third differential.)

The following proposition and theorem display the simplifying role the conditions of self-concordancy play in analysis. The proposition bounds the error of the quadratic approximation, and the theorem guarantees progress made by Newton's method. Note the elegance of the bounds when compared to the more general corollary and theorem of the preceding chapter.

Proposition. If f ∈ SC, x ∈ D_f, and y ∈ B_x(x, 1), then

    |f(y) - q_x(y)| <= ||y - x||_x^3 / (3 (1 - ||y - x||_x)).

Proof. Using the error bound recalled above, we have

    |f(y) - q_x(y)| <= ||y - x||_x^2 ∫_0^1 ∫_0^t ||I - H_x(x + s(y - x))||_x ds dt
                    <= ||y - x||_x^2 ∫_0^1 ∫_0^t [ 1/(1 - s||y - x||_x)^2 - 1 ] ds dt
                    = ||y - x||_x^2 ∫_0^1 [ t/(1 - t||y - x||_x) - t ] dt
                    = ||y - x||_x^3 ∫_0^1 t^2 / (1 - t||y - x||_x) dt
                    <= ( ||y - x||_x^3 / (1 - ||y - x||_x) ) ∫_0^1 t^2 dt
                    = ||y - x||_x^3 / (3 (1 - ||y - x||_x)).
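The bound is easy to test numerically; a sketch for the univariate self-concordant functional f(t) = -ln(t), where ||v||_x = |v|/x:

```python
import math

# Check |f(y) - q_x(y)| <= d^3 / (3(1 - d)),  d = ||y - x||_x,
# for f(t) = -ln(t), a self-concordant functional on t > 0.
f = lambda t: -math.log(t)
x = 3.0
g, h = -1.0 / x, 1.0 / x ** 2            # f'(x) and f''(x)
for y in [1.5, 2.5, 3.5, 5.0]:
    q = f(x) + g * (y - x) + 0.5 * h * (y - x) ** 2
    d = abs(y - x) / x
    assert d < 1
    assert abs(f(y) - q) <= d ** 3 / (3 * (1 - d)) + 1e-12
print("quadratic-approximation error bound verified")
```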

Theorem. Assume f ∈ SC and x ∈ D_f. If z minimizes f and z ∈ B_x(x, 1), then

    ||x_+ - z||_x <= ||x - z||_x^2 / (1 - ||x - z||_x).

Proof. Using the inequality recalled above for the progress of Newton's method, simply observe

    ||x_+ - z||_x <= ||x - z||_x ∫_0^1 ||I - H_x(x + t(z - x))||_x dt
                  <= ||x - z||_x ∫_0^1 [ 1/(1 - t||x - z||_x)^2 - 1 ] dt
                  = ||x - z||_x^2 / (1 - ||x - z||_x).
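A minimal sketch of the theorem in action, for f(t) = t - ln(t) (minimizer z = 1; here f''(t) = 1/t^2, so ||v||_x = |v|/x):

```python
# Newton's method on f(t) = t - ln(t); verify at each step that
# ||x+ - z||_x <= ||x - z||_x^2 / (1 - ||x - z||_x) with z = 1.
def newton_step(t):                      # n(t) = -f'(t)/f''(t)
    return -(1 - 1 / t) * t ** 2

z = 1.0
x = 1.5
while abs(x - z) > 1e-12:
    d = abs(x - z) / x                   # ||x - z||_x
    assert d < 1
    x_plus = x + newton_step(x)
    assert abs(x_plus - z) / x <= d ** 2 / (1 - d) + 1e-12
    x = x_plus
print("converged to", x)
```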

The use of the local norm || ||_x in the theorem to measure the difference x_+ - z makes for a particularly simple proof, but does not result in a theorem immediately ready for induction. At x_+, the containment z ∈ B_{x_+}(x_+, 1) is needed to apply the theorem, i.e., a bound on ||x_+ - z||_{x_+} rather than a bound on ||x_+ - z||_x. Given that the definition of self-concordancy restricts the norms to vary nicely, it is no surprise that the theorem can easily be transformed into a statement ready for induction. For example, substituting into the theorem the inequalities

    ||x - z||_x <= ||x - z||_z / (1 - ||x - z||_z)

and

    ||x_+ - z||_z <= ||x_+ - z||_x / (1 - ||x - z||_z),

as are immediate from the definition of self-concordancy when x ∈ B_z(z, 1), we find as a corollary to the theorem that if ||x - z||_z < 1/2, then

    ||x_+ - z||_z <= ||x - z||_z^2 / ( (1 - ||x - z||_z)^2 (1 - 2||x - z||_z) ).

Consequently, if one assumes ||x - z||_z <= 1/4, then

    ||x_+ - z||_z <= (8/9) ||x - z||_z

(so x_+ ∈ D_f and the corollary applies at x_+ as well), and, inductively,

    ||x_i - z||_z <= (8/9)^i ||x_0 - z||_z,

where x_0 := x, x_1, x_2, ... is the sequence generated by Newton's method. The bound makes apparent the rapid convergence of Newton's method.
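Since d^2/((1 - d)^2 (1 - 2d)) <= (8/9) d whenever d <= 1/4, each Newton iterate contracts the error toward the minimizer by at least the factor 8/9 in || ||_z. A minimal sketch for f(t) = t - ln(t) (minimizer z = 1 and f''(1) = 1, so || ||_z is the ordinary absolute value):

```python
# Track the induction bound ||x_i - z||_z <= (8/9)^i ||x_0 - z||_z
# for Newton's method on f(t) = t - ln(t), starting with error 1/4.
x = 1.25
bound = 0.25
for i in range(6):
    assert abs(x - 1.0) <= bound + 1e-15
    x = x + (-(1 - 1 / x) * x ** 2)      # Newton iteration: x + n(x)
    bound *= 8 / 9
print("error after 6 steps:", abs(x - 1.0))
```

In fact the convergence is far faster than geometric (each step roughly squares the error), so the asserted bound holds with plenty of room.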

The most elegant proofs of key results in the ipm theory are obtained by phrasing the analysis in terms of ||n(x)||_x rather than in terms of ||z - x||_x, as was done above. In this regard, the following theorem is especially useful.

Theorem. Assume f ∈ SC. If ||n(x)||_x < 1, then

    ||n(x_+)||_{x_+} <= ( ||n(x)||_x / (1 - ||n(x)||_x) )^2.

Proof. Assuming ||n(x)||_x < 1, we have x_+ ∈ B_x(x, 1) ⊆ D_f, and

    ||n(x_+)||_{x_+}^2 = <g_x(x_+), H_x(x_+)^{-1} g_x(x_+)>_x
                       <= ||H_x(x_+)^{-1}||_x ||g_x(x_+)||_x^2.

Since x_+ ∈ B_x(x, 1), by the rephrased definition of self-concordancy we have

    ||H_x(x_+)^{-1}||_x <= 1/(1 - ||n(x)||_x)^2,

and we thus have

    ||n(x_+)||_{x_+} <= ||g_x(x_+)||_x / (1 - ||n(x)||_x).

The proof is completed by observing

    ||g_x(x_+)||_x = || ∫_0^1 [ H_x(x + t n(x)) - I ] n(x) dt ||_x
                   <= ||n(x)||_x ∫_0^1 ||I - H_x(x + t n(x))||_x dt
                   <= ||n(x)||_x ∫_0^1 [ 1/(1 - t||n(x)||_x)^2 - 1 ] dt
                   = ||n(x)||_x^2 / (1 - ||n(x)||_x).

A rather unsatisfying (but unavoidable) aspect of general convergence results for Newton's method is an assumption of x being sufficiently close to a minimizer z, where "sufficiently close" depends explicitly on z. For general functionals, it is impossible to verify that x is indeed sufficiently close to z without knowing z. For self-concordant functionals, we know that the explicit dependence on z of what constitutes "sufficiently close" can take a particularly simple form (e.g., we know x is sufficiently close to z if ||x - z||_z <= 1/4), albeit a form which appears still to require knowing z. The next proposition provides means to verify proximity to a minimizer without knowing the minimizer.

Proposition. Assume f ∈ SC. If ||n(x)||_x <= 1/4, then f has a minimizer z and

    ||z - x_+||_x <= 3 ||n(x)||_x^2 / (1 - ||n(x)||_x)^3.

So ||z - x||_x <= ||n(x)||_x + 3 ||n(x)||_x^2 / (1 - ||n(x)||_x)^3.

Proof. We first prove a weaker result, namely, if ||n(x)||_x <= 1/9, then f has a minimizer z and ||x - z||_x <= 3 ||n(x)||_x.

The proposition bounding the error of the quadratic approximation implies that for all y ∈ B_x(x, 1),

    |f(y) - q_x(y)| <= ||y - x||_x^3 / (3 (1 - ||y - x||_x)),

and hence

    f(y) >= f(x) - ||n(x)||_x ||y - x||_x + (1/2)||y - x||_x^2 - ||y - x||_x^3 / (3 (1 - ||y - x||_x)).

It follows that if ||n(x)||_x <= 1/9 and ||y - x||_x = 3 ||n(x)||_x, then f(y) >= f(x). However, it is easily proven that whenever a continuous convex functional f satisfies f(y) >= f(x) for all y on the boundary of a compact convex set S, and some x in the interior of S, then f has a minimizer in S. Thus, if ||n(x)||_x <= 1/9, then f has a minimizer z and ||x - z||_x <= 3 ||n(x)||_x.

Now assume ||n(x)||_x <= 1/4. The preceding theorem implies

    ||n(x_+)||_{x_+} <= ( ||n(x)||_x / (1 - ||n(x)||_x) )^2 <= 1/9.

Applying the conclusion of the preceding paragraph to x_+ rather than to x, we find that f has a minimizer z and ||z - x_+||_{x_+} <= 3 ||n(x_+)||_{x_+}. Thus,

    ||z - x_+||_x <= ||z - x_+||_{x_+} / (1 - ||n(x)||_x)
                  <= 3 ||n(x_+)||_{x_+} / (1 - ||n(x)||_x)
                  <= (3 / (1 - ||n(x)||_x)) ( ||n(x)||_x / (1 - ||n(x)||_x) )^2
                  = 3 ||n(x)||_x^2 / (1 - ||n(x)||_x)^3.
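A quick sketch of the proximity test for f(t) = t - ln(t) (whose minimizer, z = 1, is of course not consulted by the test itself; the bound used is the one from the proposition, lam + 3 lam^2/(1 - lam)^3):

```python
# If ||n(x)||_x <= 1/4, bound the distance to the (unknown) minimizer.
for x in [0.8, 0.9, 1.1, 1.3]:
    nx = -(1 - 1 / x) * x ** 2           # Newton step n(x) for t - ln(t)
    lam = abs(nx) / x                    # ||n(x)||_x, since f''(x) = 1/x^2
    if lam <= 0.25:
        bound = lam + 3 * lam ** 2 / (1 - lam) ** 3
        assert abs(x - 1.0) / x <= bound + 1e-12   # z = 1 really is this close
print("proximity bound verified where it applies")
```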

We noted that adding a linear functional to a self-concordant functional yields a self-concordant functional. The next two propositions demonstrate other relevant ways of constructing self-concordant functionals.

Proposition. The set SC is closed under addition; that is, if f_1 and f_2 are self-concordant functionals satisfying D_{f_1} ∩ D_{f_2} != ∅, then f_1 + f_2 : D_{f_1} ∩ D_{f_2} -> R is a self-concordant functional.

Proof. Let f := f_1 + f_2, and write || ||_{x,i} for the local norms of f_i. Assume x ∈ D_f. For all v,

    <v, H(x) v> = <v, H_1(x) v> + <v, H_2(x) v>;

that is,

    ||v||_x^2 = ||v||_{x,1}^2 + ||v||_{x,2}^2.

Hence,

    B_x(x, 1) ⊆ B_{x,1}(x, 1) ∩ B_{x,2}(x, 1) ⊆ D_{f_1} ∩ D_{f_2} = D_f,

as required by the definition of self-concordancy.

Note that whenever a, b, c, d are positive numbers,

    min{ a/c, b/d } <= (a + b)/(c + d) <= max{ a/c, b/d }.

Consequently, if y ∈ D_f,

    min_i ( ||v||_{y,i}^2 / ||v||_{x,i}^2 ) <= ||v||_y^2 / ||v||_x^2 <= max_i ( ||v||_{y,i}^2 / ||v||_{x,i}^2 ).

Thus, if ||y - x||_x < 1 (and hence ||y - x||_{x,i} < 1),

    ||v||_y / ||v||_x <= max_i ( ||v||_{y,i} / ||v||_{x,i} ) <= max_i 1/(1 - ||y - x||_{x,i}) <= 1/(1 - ||y - x||_x),

establishing the upper bound on ||v||_y / ||v||_x in the definition of self-concordancy. One establishes the lower bound similarly.

Proposition. If f ∈ SC, D_f ⊆ R^m, and A: R^n -> R^m is an injective linear operator, then x |-> f(Ax + b) is a self-concordant functional, assuming the domain {x : Ax + b ∈ D_f} is nonempty.

Proof. Denote the functional x |-> f(Ax + b) by F. Assuming x ∈ D_F, one easily verifies from the identity H_F(x) = A* H(Ax + b) A that H_F(x) is pd, and that ||v||_x = ||Av||_{Ax+b} for all v. In particular,

    A B_x(x, 1) + b ⊆ B_{Ax+b}(Ax + b, 1) ⊆ D_f,

and thus

    B_x(x, 1) ⊆ {y : Ay + b ∈ D_f} = D_F,

as required by the definition of self-concordancy.

If ||y - x||_x < 1 (hence ||A(y - x)||_{Ax+b} < 1) and v != 0, then

    ||v||_y / ||v||_x = ||Av||_{Ay+b} / ||Av||_{Ax+b}
                     <= 1/(1 - ||A(y - x)||_{Ax+b})
                     = 1/(1 - ||y - x||_x),

establishing the upper bound on ||v||_y / ||v||_x in the definition of self-concordancy. One establishes the lower bound similarly.

Applying the preceding proposition with the logarithmic barrier function for the nonnegative orthant in R^m, we obtain the self-concordancy of the functional

    x |-> -∑_i ln(<a_i, x> - b_i),

whose domain consists of the points x satisfying the strict linear inequality constraints <a_i, x> > b_i. This self-concordant functional is important for LPs with constraints written in the form Ax >= b. It, too, is referred to as a logarithmic barrier function.

To provide the reader with another logarithmic barrier functional with which to apply the above propositions, we mention that x |-> -ln(1 - ||x||^2) is a self-concordant functional with domain the open unit ball. (Verification of self-concordancy is made in a later section.) Given an ellipsoid {x : ||Ax|| < r}, with A injective, it then follows from the preceding proposition that

    x |-> -ln(r^2 - ||Ax||^2)

is a self-concordant functional whose domain is the (open) ellipsoid, yet another logarithmic barrier function. For an intersection of ellipsoids, one simply adds the functionals for the individual ellipsoids, as justified by the proposition on sums.
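As emphasized below, the practical value of these barriers lies in how cheaply their gradients and Hessians are computed. A minimal sketch for the polyhedral barrier f(x) = -∑_i ln(<a_i, x> - b_i) (the constraint data is made up for illustration):

```python
# Gradient and Hessian of the logarithmic barrier for strict linear
# inequalities a_i . x > b_i, in 2-D with three made-up constraints.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # rows a_i
b = [0.0, 0.0, -4.0]
x = [1.0, 2.0]

slack = [sum(ai * xi for ai, xi in zip(a, x)) - bi for a, bi in zip(A, b)]
assert all(s > 0 for s in slack)             # x is strictly feasible

grad = [-sum(A[i][j] / slack[i] for i in range(3)) for j in range(2)]
hess = [[sum(A[i][j] * A[i][k] / slack[i] ** 2 for i in range(3))
         for k in range(2)] for j in range(2)]
print("gradient:", grad)
print("Hessian:", hess)
```

Both quantities cost only one pass over the constraints, which is what makes such barriers usable inside an ipm.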

Nesterov and Nemirovskii showed that each open convex set containing no lines is the domain of a strongly nondegenerate self-concordant functional. (We give a somewhat different proof of this in a later section.) Unfortunately, the result is only of theoretical interest. To rely on self-concordant functionals in devising ipm's, one must be able to readily compute their gradients and Hessians. For the self-concordant functionals proven to exist, one cannot say much more than that the gradients and Hessians exist. By contrast, the importance of the various logarithmic barrier functions we have described lies largely in the ease with which their gradients and Hessians can be computed.

Although the values of continuous convex functionals with bounded domains (i.e., bounded w.r.t. any reference norm) are always bounded from below, they need not have minimizers when the domain is open. Such is not the case for self-concordant functionals.

Proposition. If f ∈ SC and the values of f are bounded from below, then f has a minimizer. (In particular, if D_f is bounded, then f has a minimizer.)

Proof. Assume x satisfies

    f(x) <= 1/50 + inf_y f(y).

Letting y := x + n(x)/(1 + ||n(x)||_x), the proposition bounding the error of the quadratic approximation implies

    f(y) <= f(x) - ||n(x)||_x^2 / (2 (1 + ||n(x)||_x)^2),

and thus, by choice of x,

    ||n(x)||_x^2 / (2 (1 + ||n(x)||_x)^2) <= 1/50;

that is, ||n(x)||_x <= 1/4. The proposition on proximity to minimizers then implies f to have a minimizer.

The conclusion of the next proposition is trivially verified for important self-concordant functionals like those obtained by adding linear functionals to logarithmic barrier functions. Whereas the definition of self-concordancy plays a useful role in both simplifying and unifying the analysis of Newton's method for many functionals important to ipm's, it certainly does not simplify the proof of the property established in the next proposition for those same functionals. Nonetheless, for the theory it is important that the property is possessed by all self-concordant functionals.

Proposition. Assume f ∈ SC and x̄ ∈ ∂D_f, the boundary of D_f. If the sequence {x_i} ⊆ D_f converges to x̄, then lim_i f(x_i) = ∞.

Proof. Adding f to a functional x |-> -ln(R^2 - ||x||^2), where R > sup_i ||x_i||, one obtains a self-concordant functional F for which D_F is bounded and for which lim_i f(x_i) = ∞ iff lim_i F(x_i) = ∞ (the added term converges along the sequence). Consequently, we may assume D_f is bounded.

Assuming D_f is bounded, we construct from {x_i} a sequence {y_i} ⊆ D_f whose limit points lie in ∂D_f and for which, for all sufficiently large i,

    f(y_i) <= f(x_i) - 1/128.

Applying the same construction to the sequence {y_i}, and so on, we will thus conclude that if lim inf_i f(x_i) < ∞, then f assumes arbitrarily small values, contradicting the lower boundedness of continuous convex functionals having bounded domains.

Shortly, we prove lim inf_i ||n(x_i)||_{x_i} >= 1/6. In particular, for sufficiently large i, ||n(x_i)||_{x_i} >= 1/7, the point

    y_i := x_i + n(x_i)/(1 + ||n(x_i)||_{x_i})

is well-defined, and, as in the proof of the preceding proposition,

    f(y_i) <= f(x_i) - ||n(x_i)||_{x_i}^2 / (2 (1 + ||n(x_i)||_{x_i})^2) <= f(x_i) - 1/128.

Moreover, all limit points of {y_i} lie in ∂D_f. For otherwise, passing to a subsequence of {y_i} if necessary, there exists epsilon > 0 such that B(y_i, epsilon) ⊆ D_f for all i, where the ball is with respect to the reference norm. However, since ||y_i - x_i||_{x_i} < 1, the points x_i lie in D_f along with the balls B(y_i, epsilon); it then follows from convexity of D_f that some ball centered at x̄ lies in D_f, contradicting x̄ = lim_i x_i ∈ ∂D_f.

Finally, we show lim inf_i ||n(x_i)||_{x_i} >= 1/6. Since D_f is bounded, the preceding proposition shows f has a minimizer z. Since B_z(z, 1) ⊆ D_f and x_i -> x̄ ∈ ∂D_f, we have lim inf_i ||x_i - z||_z >= 1. Hence, from the definition of self-concordancy, lim inf_i ||x_i - z||_{x_i} >= 1/2. Because f has at most one minimizer, the proposition on proximity to minimizers then implies lim inf_i ||n(x_i)||_{x_i} >= 1/6, concluding the proof.

We close this section with a technical proposition, to be called upon later.

If g_x(x) + v is a vector sufficiently near g_x(x), there exists x + u close to x such that g_x(x + u) = g_x(x) + v, a consequence of H(x) being pd and hence invertible. It is useful to quantify "near" and "close" when the inner product is the local inner product, that is, when H_x(x) = I and hence u ≈ v.

Proposition. Assume f ∈ SC and x ∈ D_f. If ||v||_x <= r, where r <= 1/4, there exists u ∈ B_x[v, 3r^2/(1 - r)^3] such that g_x(x + u) = g_x(x) + v.

Proof. Consider the self-concordant functional

    y |-> f(y) - <g_x(x) + v, y>_x,

a functional whose local inner products agree with those of f. Note that a point z minimizes the functional iff g_x(z) = g_x(x) + v. Under the assumption of the proposition, we thus wish to show z exists and u := z - x satisfies u ∈ B_x[v, 3r^2/(1 - r)^3].

Since at x the Newton step for the functional is v, the assumption ||v||_x <= r <= 1/4 allows us to apply the proposition on proximity to minimizers, concluding that a minimizer z does indeed exist and ||z - (x + v)||_x <= 3r^2/(1 - r)^3.

Barrier Functionals

A functional f is said to be a (strongly nondegenerate self-concordant) barrier functional if f ∈ SC and

    ϑ_f := sup_{x ∈ D_f} ||g_x(x)||_x^2 < ∞.

Let SCB denote the family of functionals thus defined. We typically refer to elements of SCB as "barrier functionals."

The definition of barrier functionals is phrased in terms of ||g_x(x)||_x rather than in terms of the identical quantity ||n(x)||_x because the importance of barrier functionals for ipm's lies not in applying Newton's method to them directly, but rather in applying Newton's method to self-concordant functionals built from them. As mentioned before, for an LP

    min  <c, x>
    s.t. Ax = b
         x >= 0,

the most important self-concordant functionals are those of the form

    x |-> eta <c, x> + f|_L(x),

where eta > 0 is a fixed constant, f is the logarithmic barrier function for the nonnegative orthant, and L := {x : Ax = b}.

When they defined barrier functionals, Nesterov and Nemirovskii referred to ϑ_f as the "parameter of the barrier f." Unfortunately, this can be confused with the phrase "barrier parameter," which predates their work and refers to the constant eta above. Consequently, we prefer to call ϑ_f the "complexity value of f," especially because it is the quantity that most often represents f in the complexity analysis of ipm's relying on f.

If one restricts a barrier functional f to a subspace L (or a translation of a subspace), one obtains a barrier functional, simply because the local norms for f|_L are the restrictions of the local norms for f, and

    ||g|_{L,x}(x)||_x = ||P_{L,x} g_x(x)||_x <= ||g_x(x)||_x <= sqrt(ϑ_f).

Clearly,

    ϑ_{f|_L} <= ϑ_f.

The primordial barrier functional is the primordial self-concordant functional, i.e., the logarithmic barrier function for the nonnegative orthant, f(x) = -∑_j ln(x_j). Relying on the dot product (so that g(x) is the vector with jth entry -1/x_j, and H(x) is the diagonal matrix with jth diagonal entry 1/x_j^2), we have

    ||g_x(x)||_x^2 = <g(x), H(x)^{-1} g(x)> = n.

Thus, ϑ_f = n.
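The computation is one line in code, and holds identically at every interior point:

```python
# For the log barrier on the positive orthant, <g(x), H(x)^{-1} g(x)> = n
# at every x > 0: H(x)^{-1} is diagonal with entries x_j^2.
for x in [[1.0, 2.0, 3.0], [0.1, 5.0, 0.01, 7.0]]:
    g = [-1.0 / t for t in x]
    Hinv_g = [t * t * gt for t, gt in zip(x, g)]      # H(x)^{-1} g(x)
    value = sum(gt * v for gt, v in zip(g, Hinv_g))
    assert abs(value - len(x)) < 1e-9
print("squared local norm of the gradient equals n at the sampled points")
```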

Now let f denote the logarithmic barrier function for the cone of pd matrices in S^{n×n}, f(X) = -ln(det(X)). Relying on the trace product, we have, for all pd X ∈ S^{n×n},

    ||g_X(X)||_X^2 = <g(X), H(X)^{-1} g(X)>
                   = trace(X^{-1} X X^{-1} X)
                   = trace(I)
                   = n.

Thus, ϑ_f = n.

Finally, let f denote the logarithmic barrier function for the unit ball in R^n,

    f(x) = -ln(1 - ||x||^2),

where || || = <,>^{1/2} for some inner product <,>. It is not difficult to verify that for x in the unit ball,

    g(x) = 2x/(1 - ||x||^2)    and    H(x)x = ((1 + ||x||^2)/(1 - ||x||^2)) g(x),

and hence

    H(x)^{-1} g(x) = ((1 - ||x||^2)/(1 + ||x||^2)) x.

Consequently,

    ||g_x(x)||_x^2 = <g(x), H(x)^{-1} g(x)> = 2||x||^2/(1 + ||x||^2).

It readily follows that ϑ_f = 1, showing the complexity value need not depend on the dimension n.

Earlier we noted that if a linear functional is added to a self-concordant functional, the resulting functional is self-concordant, because the Hessians are unchanged (the definition of self-concordancy depends on the Hessians alone). By contrast, adding a linear functional to a barrier functional need not result in a barrier functional. For example, consider the univariate barrier functional x |-> -ln(x) and the functional x |-> x - ln(x); for the latter, ||g_x(x)||_x = |x - 1| is unbounded.

The set SCB, like SC, is closed under addition.

Proposition. If f_1, f_2 ∈ SCB and D_{f_1} ∩ D_{f_2} != ∅, then f := f_1 + f_2 ∈ SCB, where D_f := D_{f_1} ∩ D_{f_2}, and

    ϑ_f <= ϑ_{f_1} + ϑ_{f_2}.

Proof. Assume x ∈ D_f. Let the reference inner product <,> be the local inner product at x defined by f. Thus, I = H(x) = H_1(x) + H_2(x). In particular, H_1(x) and H_2(x) commute, i.e., H_1(x) H_2(x) = H_2(x) H_1(x). Consequently, so do H_1(x)^{1/2} and H_2(x)^{1/2}.

For brevity, let H_i := H_i(x) and g_i := g_i(x) for i = 1, 2.

To prove the inequality in the statement of the proposition, it suffices to show

    ||g_1 + g_2||^2 <= <g_1, H_1^{-1} g_1> + <g_2, H_2^{-1} g_2>.

(For, by definition, the quantity on the right is bounded by ϑ_{f_1} + ϑ_{f_2}.)

Defining v_i := H_i^{-1/2} g_i for i = 1, 2, we have

    ||g_1 + g_2||^2 = ||g_1||^2 + 2<g_1, g_2> + ||g_2||^2
                    = <v_1, H_1 v_1> + 2<H_2^{1/2} v_1, H_1^{1/2} v_2> + <v_2, H_2 v_2>
                    = <v_1, (I - H_2) v_1> + 2<H_2^{1/2} v_1, H_1^{1/2} v_2> + <v_2, (I - H_1) v_2>
                    = ||v_1||^2 - ||H_2^{1/2} v_1||^2 + 2<H_2^{1/2} v_1, H_1^{1/2} v_2> + ||v_2||^2 - ||H_1^{1/2} v_2||^2
                    <= ||v_1||^2 + ||v_2||^2 - ( ||H_2^{1/2} v_1|| - ||H_1^{1/2} v_2|| )^2
                    <= ||v_1||^2 + ||v_2||^2
                    = <g_1, H_1^{-1} g_1> + <g_2, H_2^{-1} g_2>,

the first inequality by Cauchy-Schwarz (and the cross term was rewritten using the commutativity of the square roots).

The set SCB, like SC, is closed under composition with injective linear maps.

Proposition. If f ∈ SCB, D_f ⊆ R^m, and A: R^n -> R^m is an injective linear operator, then x |-> f(Ax + b) is a barrier functional (assuming the domain {x : Ax + b ∈ D_f} is nonempty), and its complexity value does not exceed ϑ_f.

Proof. Assume Ax + b ∈ D_f. Endow R^n with an arbitrary reference inner product, and let the reference inner product on R^m be the local inner product for f at Ax + b. Denoting the functional x |-> f(Ax + b) by F, we then have g_F(x) = A* g(Ax + b), H_F(x) = A* H(Ax + b) A = A* A, and ||g(Ax + b)||^2 <= ϑ_f. Thus,

    <g_F(x), H_F(x)^{-1} g_F(x)> = <g(Ax + b), A (A* A)^{-1} A* g(Ax + b)>
                                 <= ||A (A* A)^{-1} A*|| ||g(Ax + b)||^2
                                 <= ϑ_f.

The last inequality is due to the operator A (A* A)^{-1} A* being a projection operator; hence the operator has norm equal to one. The proposition follows.

With regards to theory, the following proposition is perhaps the most useful tool in establishing properties possessed by all barrier functionals.

Proposition. Assume f ∈ SCB. If x, y ∈ D_f, then

    <g_x(x), y - x>_x <= ϑ_f.

Proof. We wish to prove φ'(0) <= ϑ_f, where φ is the univariate functional defined by φ(t) := f(x + t(y - x)). In doing so, we may assume φ'(0) > 0, and hence, by convexity, φ'(t) > 0 for all t >= 0 in the domain of φ.

Let v(t) := x + t(y - x). Assuming t is in the domain of φ, and noting φ''(t) = ||y - x||_{v(t)}^2,

    φ'(t) = <g_{v(t)}(v(t)), y - x>_{v(t)}
          <= ||g_{v(t)}(v(t))||_{v(t)} ||y - x||_{v(t)}
          <= sqrt(ϑ_f) sqrt(φ''(t)),

and hence, for all s >= 0 in the domain of φ,

    ∫_0^s ( φ''(t)/φ'(t)^2 ) dt >= s/ϑ_f.

Thus,

    1/φ'(0) - 1/φ'(s) >= s/ϑ_f,

and since φ'(s) > 0,

    s < ϑ_f/φ'(0).

Consequently, s := ϑ_f/φ'(0) is not in the domain of φ. Since s = 1 is in the domain, we have ϑ_f/φ'(0) > 1; that is, φ'(0) < ϑ_f.

The next proposition implies that, for each x in the domain of a barrier functional, the ball B_x(x, 1) is, to within a factor of 4ϑ_f + 1, the largest among all ellipsoids centered at x which are contained in the domain.

Proposition. Assume f ∈ SCB. If x, y ∈ D_f satisfy <g_x(x), y - x>_x >= 0, then

    ||y - x||_x <= 4ϑ_f + 1.

Proof. Restricting f to the line through x and y, we may assume f is univariate. Viewing the line as R, with values increasing as one travels from x to y, the assumption <g_x(x), y - x>_x >= 0 is then equivalent to g(x) >= 0, i.e., g(x) is a nonnegative number.

Let v denote the smallest nonnegative number for which ||g_x(x) + v||_x >= 1/4. Since g(x) >= 0, we have ||v||_x <= 1/4. Applying the preceding technical proposition (with r = 1/4), we find there exists u satisfying

    u ∈ B_x[v, 4/9]   and   g_x(x + u) = g_x(x) + v.

Note ||u||_x <= 1/4 + 4/9 < 1.

The preceding proposition implies

    ϑ_f >= <g_x(x + u), y - (x + u)>_x
         = <g_x(x) + v, y - x>_x - <g_x(x) + v, u>_x
         >= (1/4)||y - x||_x - <g_x(x) + v, u>_x,

where the last inequality makes use of g_x(x) + v and y - x both being nonnegative. However, since ||g_x(x) + v||_x > 1/4 only if v = 0, and hence only if u = 0, we have

    <g_x(x) + v, u>_x <= (1/4)||u||_x <= 1/4.

Thus,

    ||y - x||_x <= 4ϑ_f + 1,

from which the proposition is immediate.

Minimizers of barrier functionals are called analytic centers. The following corollary gives meaning to the term "center."

Corollary. Assume f ∈ SCB. If z is the analytic center for f, then

    B_z(z, 1) ⊆ D_f ⊆ B_z[z, 4ϑ_f + 1].

Proof. Since f ∈ SC, the leftmost containment is by assumption. The rightmost containment is immediate from the preceding proposition, since g_z(z) = 0.

The corollary suggests that if one was to choose a single inner product as being especially natural for a barrier functional with bounded domain, the local inner product at the analytic center would be an appropriate choice.

Corollary. If f ∈ SCB, then f has an analytic center iff D_f is bounded.

Proof. Immediate from the proposition on minimizers of functionals bounded from below and the preceding corollary.

In the complexity analysis of ipm's, it is desirable to have barrier functionals with small complexity values. However, there is a positive threshold below which the complexity values of no barrier functionals fall. Nesterov and Nemirovskii prove that ϑ_f >= 1 for all f ∈ SCB. To understand why there is indeed a lower bound, assume ϑ_f < 1/16 for some f ∈ SCB. Then ||n(x)||_x = ||g_x(x)||_x < 1/4 for all x ∈ D_f, so the proposition on proximity to minimizers implies f has a (unique) minimizer z, and all x ∈ D_f satisfy ||x - z||_x < 1. However, by choosing x so that, in the line L through x and z, the distance from x to the boundary of D_f ∩ L is smaller than the distance from x to z, the containment B_x(x, 1) ⊆ D_f implies ||x - z||_x >= 1, a contradiction. Hence, ϑ_f >= 1/16 for all f ∈ SCB.

Likewise, by the proposition on proximity to minimizers, if f ∈ SCB and D_f is unbounded (hence f has no minimizer), then ||g_x(x)||_x > 1/4 for all x ∈ D_f. In the unbounded case, Nesterov and Nemirovskii prove the lower bound ||g_x(x)||_x >= 1 for all x ∈ D_f.

It is worth noting that a universal lower bound, as in the preceding paragraph, implies a lower bound proportional to n for each barrier functional f whose domain is the nonnegative orthant R^n_{++}; with the Nesterov and Nemirovskii bound, ϑ_f >= n. For let e denote the vector of all ones, and let e_j denote the jth unit vector. Consider the univariate barrier functional f_j obtained by restricting f to the line through e in the direction e_j. Let g_e denote the gradient of f w.r.t. <,>_e, and let g_{j,e} denote the gradient of f_j w.r.t. the restricted inner product. Since D_{f_j} is unbounded, and hence f_j does not have an analytic center, it is readily proven (without making use of the particular inner product) that <g_{j,e}(e), -e_j>_e > 0. Since g_{j,e}(e) and e_j are colinear (because D_{f_j} is one-dimensional), it follows that

    <g_{j,e}(e), -e_j>_e = ||g_{j,e}(e)||_e ||e_j||_e.

Noting ||e_j||_e >= 1 (because e - e_j ∈ ∂D_f), we thus have

    <g_{j,e}(e), -e_j>_e = ||g_{j,e}(e)||_e ||e_j||_e >= ||g_{j,e}(e)||_e >= 1.

Hence,

    n <= ∑_j <g_{j,e}(e), -e_j>_e
       = ∑_j <g_e(e), -e_j>_e
       = <g_e(e), -e>_e
       <= ϑ_f,

the last inequality by the preceding key proposition (applied to y := epsilon e, letting epsilon -> 0).

BARRIER FUNCTIONALS

In light of the two preceding paragraphs, we see that, with regards to the complexity value, the logarithmic barrier function for the nonnegative orthant is the optimal barrier functional having domain $\mathbb{R}^n_{++}$. Likewise, viewing $\mathbb{R}^{n}$ as a subspace of $\mathbb{S}^{n\times n}$, the logarithmic barrier function for the cone of pd matrices is the optimal barrier functional having that cone as its domain.

For arbitrary inner products, the bounds $\|g_x(x)\|_x \le \sqrt{\vartheta_f}$ imply nothing about the quantities $\|g(x)\|$. However, the bounds do imply bounds on the quantities $\|g_y(x)\|_y$ for all $y \in D_f$. This is the subject of the next proposition. First, a definition.

For $x$ in an arbitrary bounded convex set $D$, a natural way of measuring the relative nearness of $x$ to the boundary of $D$, in a manner that is independent of a particular norm, is the quantity known as the symmetry of $D$ about $x$, denoted $\mathrm{sym}(x,D)$. This quantity is defined in terms of the set $\mathcal{L}(x,D)$ consisting of all lines through $x$ which intersect $D$ in an interval of positive length. (If $D$ is lower-dimensional, most lines through $x$ will not be in $\mathcal{L}(x,D)$.) If $x$ is an endpoint of $L \cap \bar D$ for some $L \in \mathcal{L}(x,D)$, define $\mathrm{sym}(x,D) := 0$. Otherwise, for each $L \in \mathcal{L}(x,D)$, letting $r(L)$ denote the ratio of the length of the smaller to the larger of the two intervals in $(L \cap \bar D)\setminus\{x\}$, define
$$\mathrm{sym}(x,D) := \inf_{L \in \mathcal{L}(x,D)} r(L).$$
Clearly, if $D$ is an ellipsoid centered at $x$, then $\mathrm{sym}(x,D) = 1$ (perfect symmetry). An earlier corollary implies that if $z$ is the analytic center for a barrier functional $f$, then $\mathrm{sym}(z, D_f)$ is bounded below by a positive quantity depending only on $\vartheta_f$.
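To make the definition concrete, here is a small sketch computing $\mathrm{sym}(x,D)$ when $D$ is an interval or an axis-aligned box; for a box, the infimum over lines is attained on an axis-parallel line, so the computation reduces to the coordinatewise ratios. (The helper names are ours, not from the text.)

```python
def sym_interval(x, a, b):
    # symmetry of the open interval (a, b) about an interior point x:
    # ratio of the shorter to the longer of the two pieces of (a,b)\{x}
    left, right = x - a, b - x
    return min(left / right, right / left)

def sym_box(x, lo, hi):
    # for an axis-aligned box, the infimum over lines through x is
    # attained coordinatewise, so take the worst coordinate symmetry
    return min(sym_interval(xi, l, h) for xi, l, h in zip(x, lo, hi))
```

For example, the point $0.25$ in $(0,1)$ splits it into pieces of lengths $0.25$ and $0.75$, so its symmetry is $1/3$.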

Proposition. Assume $f \in \mathrm{SCB}$. If $x \in D_f$, then for all $y \in D_f$,
$$\|g_y(x)\|_y \le \Bigl(1 + \frac{1}{\mathrm{sym}(x, D_f)}\Bigr)\vartheta_f.$$

Proof. For brevity, let $s := \mathrm{sym}(x, D_f)$. Assuming $x, y \in D_f$, note that $w := x + s(x - y) \in \bar D_f$ (the definition of symmetry). Since $B_y(y,1) \subseteq D_f$ and $D_f$ is convex, we thus have
$$\frac{1}{1+s}\,w + \frac{s}{1+s}\,B_y(y,1) \subseteq D_f,$$
that is, $B_y\bigl(x, \frac{s}{1+s}\bigr) \subseteq D_f$. Consequently,
$$\|g_y(x)\|_y = \frac{1+s}{s}\,\max\Bigl\{\langle g_y(x), v - x\rangle_y : v \in B_y\bigl(x, \tfrac{s}{1+s}\bigr)\Bigr\} \le \frac{1+s}{s}\,\max\bigl\{\langle g_y(x), v - x\rangle_y : v \in D_f\bigr\} \le \frac{1+s}{s}\,\vartheta_f,$$
the last inequality by an earlier proposition. $\Box$

At the end of the previous section, we saw that $f(x_i) \to \infty$ if $f$ is a self-concordant functional and $\{x_i\}$ converges to a point in the boundary of $D_f$. To close this section, we present a proposition indicating that the rate at which $f(x_i)$ goes to $\infty$ is slow if $f$ is a barrier functional.

Proposition. Assume $f \in \mathrm{SCB}$ and $x \in D_f$. If $y \in \bar D_f$, then for all $0 < t \le 1$,
$$f\bigl(y + t(x - y)\bigr) \le f(x) + \vartheta_f \ln(1/t).$$

Proof. For $s \ge 0$, let $x(s) := y + e^{-s}(x - y)$, and consider the univariate functional $\phi(s) := f(x(s))$. Relying on the chain rule, observe that
$$\phi(s) - \phi(0) = \int_0^s \phi'(t)\,dt = \int_0^s \langle g(x(t)),\, -e^{-t}(x - y)\rangle\,dt = \int_0^s \langle g(x(t)),\, y - x(t)\rangle\,dt \le \int_0^s \vartheta_f\,dt = s\,\vartheta_f,$$
the inequality due to an earlier proposition. Hence, substituting $t = e^{-s}$, for $0 < t \le 1$,
$$f\bigl(y + t(x - y)\bigr) \le f(x) + \vartheta_f \ln(1/t). \qquad \Box$$
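As a numerical sanity check of the proposition, one can test the inequality for the logarithmic barrier of the positive orthant, whose complexity value is $n$ (a sketch under that assumption; the boundary point $y$ and the names are ours):

```python
import math

def f(v):
    # logarithmic barrier for the positive orthant; complexity value = len(v)
    return -sum(math.log(vi) for vi in v)

x = [1.0, 1.0]   # interior point, f(x) = 0
y = [0.0, 2.0]   # boundary point of the domain
theta = 2
for t in [0.5, 0.1, 1e-3]:
    pt = [yi + t * (xi - yi) for xi, yi in zip(x, y)]
    # f blows up only logarithmically as t -> 0
    assert f(pt) <= f(x) + theta * math.log(1 / t) + 1e-9
```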

Primal Algorithms

The importance of a barrier functional $f$ lies not in itself, but in that it can be used to efficiently solve optimization problems of the form
$$\min\ \langle c, x\rangle \quad \text{s.t.}\ x \in \bar D_f,$$
where $\bar D_f$ denotes the closure of $D_f$. Among many other problems, linear programs are of this form. Specifically, restricting the logarithmic barrier function for the nonnegative orthant to the space $\{x : Ax = b\}$, we obtain a barrier functional $f$ for which
$$\bar D_f = \{x : Ax = b,\ x \ge 0\}.$$
Similarly for SDP.

Let $\mathrm{val}$ denote the optimal value of the optimization problem.

Path-following ipms solve the problem by following the central path, the path consisting of the minimizers $z(\eta)$ of the self-concordant functionals
$$f_\eta(x) := \eta\,\langle c, x\rangle + f(x)$$
for $\eta > 0$. It is readily proven, when $D_f$ is bounded, that the central path begins at the analytic center $z$ of $f$ and consists of the minimizers of the barrier functionals $f|_{L(v)}$ obtained by restricting $f$ to the spaces
$$L(v) := \{x : \langle c, x\rangle = v\}$$
for $\mathrm{val} < v < \langle c, z\rangle$. Similarly, when $D_f$ is unbounded, the central path consists of the minimizers of the barrier functionals $f|_{L(v)}$ for $\mathrm{val} < v$.

We note the local inner products for the self-concordant functionals $f_\eta$ are identical with those for $f$. We observe that, for each $y \in D_f$, the optimization problem is equivalent to
$$\min\ \langle c_y, x\rangle_y \quad \text{s.t.}\ x \in \bar D_f,$$
where $c_y := H(y)^{-1}c$. In other words, w.r.t. $\langle\,,\rangle_y$, the objective vector is $c_y$.

The desirability of following the central path is made evident by considering the objective values $\langle c, z(\eta)\rangle$. Since $g(z(\eta)) = -\eta c$, an earlier proposition implies, for all $y \in D_f$,
$$\eta\,\bigl(\langle c, z(\eta)\rangle - \langle c, y\rangle\bigr) = \langle g(z(\eta)),\, y - z(\eta)\rangle \le \vartheta_f,$$
and hence
$$\langle c, z(\eta)\rangle - \mathrm{val} \le \frac{\vartheta_f}{\eta}.$$
Moreover, the point $z(\eta)$ is well-centered, in the sense that all feasible points $y$ with objective value at least as good as $\langle c, z(\eta)\rangle$ lie within a distance of $z(\eta)$ (in the local norm at $z(\eta)$) bounded in terms of $\vartheta_f$ alone, a consequence of an earlier proposition.

Path-following ipms follow the central path approximately, generating points near the central path, where "near" is measured by local norms. If a point $y$ is computed for which $\|y - z(\eta)\|_{z(\eta)}$ is small, then, relatively, the objective value at $y$ will not be much worse than at $z(\eta)$, and hence the bound above implies a bound on $\langle c, y\rangle$. In fact, if $x$ is an arbitrary point in $D_f$ and $y$ is a point for which $\|y - x\|_x$ is small, then, relatively, the objective value at $y$ will not be much worse than at $x$. To make this precise, first observe that $B_x(x,1) \subseteq D_f$ implies $x - t\,c_x \in \bar D_f$ if $t \le 1/\|c_x\|_x$. Since the objective value at $x - t\,c_x$ is $\langle c, x\rangle - t\|c_x\|_x^2$, we thus have
$$\|c_x\|_x \le \langle c, x\rangle - \mathrm{val}.$$
Hence, for all $y \in \mathbb{R}^n$,
$$\langle c, y - x\rangle = \langle c_x, y - x\rangle_x \le \|c_x\|_x\,\|y - x\|_x \le \bigl(\langle c, x\rangle - \mathrm{val}\bigr)\,\|y - x\|_x,$$
and thus
$$\langle c, y\rangle - \mathrm{val} \le \bigl(1 + \|y - x\|_x\bigr)\bigl(\langle c, x\rangle - \mathrm{val}\bigr).$$
In particular, using the central-path bound above,
$$\langle c, y\rangle \le \bigl(1 + \|y - z(\eta)\|_{z(\eta)}\bigr)\,\frac{\vartheta_f}{\eta} + \mathrm{val}.$$

Before discussing algorithms, we record a piece of notation. Let $n_\eta(x)$ denote the Newton step for $f_\eta$ at $x$, that is,
$$n_\eta(x) := -H(x)^{-1}\bigl(\eta c + g(x)\bigr) = -\bigl(\eta\,c_x + g_x(x)\bigr).$$
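For the logarithmic barrier of the positive orthant, $g(x)_i = -1/x_i$ and $H(x) = \mathrm{diag}(1/x_i^2)$, so $n_\eta(x)$ has a simple closed form (a sketch; the function name is ours):

```python
def newton_step(eta, x, c):
    # n_eta(x) = -H(x)^{-1}(eta*c + g(x)) for f(x) = -sum(log x_i):
    # g(x)_i = -1/x_i and H(x) = diag(1/x_i^2), so componentwise
    # the step is x_i - eta*c_i*x_i^2
    return [xi - eta * ci * xi * xi for xi, ci in zip(x, c)]
```

On the central path, $z(\eta)_i = 1/(\eta c_i)$ and the step vanishes, consistent with $z(\eta)$ minimizing $f_\eta$.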

The Barrier Method

Short-step ipms follow the central path most closely, generating sequences of points all of which are near the path. We now present and analyze an elementary short-step ipm, the barrier method.

Assume initially we know $\eta_1$ and $x_0$ such that $x_0$ is near $z(\eta_1)$, that is, $x_0$ is near the minimizer for the functional $f_{\eta_1}$. In the barrier method, one increases $\eta_1$ by a slight amount to a value $\eta_2$, then applies Newton's method to approximate $z(\eta_2)$, thus obtaining a point $x_1$. Assuming only one iteration of Newton's method is applied,
$$x_1 := x_0 + n_{\eta_2}(x_0).$$
Continuing this procedure indefinitely (increasing $\eta$, applying Newton's method, increasing $\eta$, and so on), we have the algorithm known as the barrier method.

One would like $\eta_2$ to be much larger than $\eta_1$. However, if $\eta_2$ is too large relative to $\eta_1$, Newton's method will fail to approximate $z(\eta_2)$; in fact, it can happen that $x_1 \notin D_f$, bringing the algorithm to a halt. The key ingredient in a complexity analysis of the barrier method is proving that $\eta_2$ can be larger than $\eta_1$ by a reasonable amount without the algorithm losing sight of the central path.

In analyzing the barrier method, it is most natural to rely on the lengths of Newton steps to measure proximity to the central path. We will assume $x_0$ is near $z(\eta_1)$ in the sense that $\|n_{\eta_1}(x_0)\|_{x_0}$ is small. The Newton step taken by the algorithm is $n_{\eta_2}(x_0)$, not $n_{\eta_1}(x_0)$. The relevance of $n_{\eta_1}(x_0)$ is due to the following easily proven relation:
$$n_{\eta_2}(x) = \frac{\eta_2}{\eta_1}\,n_{\eta_1}(x) + \Bigl(\frac{\eta_2}{\eta_1} - 1\Bigr)g_x(x).$$
In particular,
$$\|n_{\eta_2}(x)\|_x \le \frac{\eta_2}{\eta_1}\,\|n_{\eta_1}(x)\|_x + \Bigl(\frac{\eta_2}{\eta_1} - 1\Bigr)\sqrt{\vartheta_f}.$$
Besides this bound, the other crucial ingredient in the analysis is a bound on $\|n_{\eta_2}(x_1)\|_{x_1}$ in terms of $\|n_{\eta_2}(x_0)\|_{x_0}$. An earlier theorem provides an appropriate bound: if $\|n_{\eta_2}(x_0)\|_{x_0} < 1$, then
$$\|n_{\eta_2}(x_1)\|_{x_1} \le \biggl(\frac{\|n_{\eta_2}(x_0)\|_{x_0}}{1 - \|n_{\eta_2}(x_0)\|_{x_0}}\biggr)^{2}.$$

Suppose we determine values $\alpha$ and $\kappa > 1$ such that if we define
$$\eta_2 := \kappa\,\eta_1,$$
then
$$\beta := \kappa\,\alpha + (\kappa - 1)\sqrt{\vartheta_f} < 1 \quad\text{and}\quad \Bigl(\frac{\beta}{1-\beta}\Bigr)^2 \le \alpha.$$
By requiring $\|n_{\eta_1}(x_0)\|_{x_0} \le \alpha$, we then find from the first relation that
$$\|n_{\eta_2}(x_0)\|_{x_0} \le \beta,$$
and thus, from the second,
$$\|n_{\eta_2}(x_1)\|_{x_1} \le \alpha.$$

Consequently, $x_1$ will be close to the central path, like $x_0$. Continuing, by requiring $\eta_3 \le \kappa\,\eta_2$, the point $x_2$ will be close to the central path too. And so on. Hence we will have determined a value $\kappa$ such that if one has an initial point appropriately close to the central path, and if one never increases the barrier parameter $\eta$ by a factor of more than $\kappa$, the barrier method will follow the central path, always generating points close to it.

The reader can verify, for example, that
$$\alpha := \frac19 \quad\text{and}\quad \kappa := 1 + \frac{1}{8\sqrt{\vartheta_f}}$$
satisfy the relations (here one relies on $\vartheta_f \ge 1$). Hence we have a safe value for $\kappa$: relying on it, the algorithm is guaranteed to stay on track. It is a remarkable aspect of ipms that safe values for quantities like $\kappa$ depend only on the complexity value $\vartheta_f$ of the underlying barrier functional $f$. Concerning LPs,
$$\min\ \langle c, x\rangle \quad \text{s.t.}\ Ax = b,\ x \ge 0,$$
if one relies on the logarithmic barrier function for the strictly nonnegative orthant $\mathbb{R}^n_{++}$, then $\kappa = 1 + \frac{1}{8\sqrt{n}}$ is safe, regardless of $A$, $b$ and $c$.
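The short-step method for the simplest instance, $\min\,\langle c,x\rangle$ over $x \ge 0$ with $c > 0$ (so $\mathrm{val} = 0$ and the log-barrier central path is $z(\eta)_i = 1/(\eta c_i)$), can be sketched as follows; the function name and the choice to start exactly on the path at $\eta = 1$ are ours:

```python
import math

def barrier_method(c, iters):
    # short-step barrier method for min <c,x> over x >= 0 (c > 0, val = 0)
    n = len(c)
    eta = 1.0
    x = [1.0 / ci for ci in c]              # start exactly at z(1)
    kappa = 1 + 1 / (8 * math.sqrt(n))      # safe increase factor
    for _ in range(iters):
        eta *= kappa
        # one Newton step for f_eta: x_i <- x_i + (x_i - eta*c_i*x_i^2)
        x = [2 * xi - eta * ci * xi * xi for xi, ci in zip(x, c)]
    return eta, x

c = [1.0, 2.0]
eta, x = barrier_method(c, 200)
gap = sum(ci * xi for ci, xi in zip(c, x))  # <c,x> - val
assert gap <= 2.5 / eta                     # objective gap of order theta/eta
```

The iterates stay within the safe neighborhood of the path, and the objective gap decreases geometrically with the iteration count, in line with the analysis above.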

Assuming at each iteration of the barrier method the parameter $\eta$ is never increased by more than the factor $1 + \frac{1}{8\sqrt{\vartheta_f}}$, we now know that for each $x$ generated by the algorithm there corresponds $z(\eta)$ which $x$ approximates, in that $\|n_\eta(x)\|_x \le \frac19$; hence, by an earlier proposition, $\|x - z(\eta)\|_x$ is at most a small constant; thus, by the definition of self-concordancy, so is $\|x - z(\eta)\|_{z(\eta)}$. All points generated by the algorithm lie within a small distance of the central path.

Assuming that at each iteration of the barrier method the parameter is increased by exactly the factor $1 + \frac{1}{8\sqrt{\vartheta_f}}$, the number of iterations $i$ required to increase the parameter from an initial value $\eta_1$ to some value $\eta$ satisfies
$$i \le 1 + \frac{\ln(\eta/\eta_1)}{\ln\bigl(1 + \frac{1}{8\sqrt{\vartheta_f}}\bigr)} \le 1 + 16\sqrt{\vartheta_f}\,\ln\frac{\eta}{\eta_1} = O\Bigl(\sqrt{\vartheta_f}\,\log\frac{\eta}{\eta_1}\Bigr),$$
where in the inequality we rely on $\ln(1+x) \ge x/2$ for $0 \le x \le 1$ (and on $\vartheta_f \ge 1$). Hence, from the central-path bound, given $\epsilon > 0$,
$$O\Bigl(\sqrt{\vartheta_f}\,\log\frac{\vartheta_f}{\eta_1\,\epsilon}\Bigr)$$
iterations suffice to produce $x$ satisfying $\langle c, x\rangle - \mathrm{val} \le \epsilon$.

We have been assuming that an initial point $x_0$ near the central path is available. What if, instead, we only know some arbitrary point $x_0 \in D_f$? How might we use the barrier method to solve the optimization problem efficiently? We now describe a simple approach, assuming $D_f$ is bounded and hence $f$ has an analytic center.

Consider the optimization problem obtained by replacing the objective vector $c$ with $-g(x_0)$. The central path for this problem consists of the minimizers $z'(\eta')$ of the self-concordant functionals
$$f'_{\eta'}(x) := -\eta'\,\langle g(x_0), x\rangle + f(x).$$
The point $x_0$ is on the central path for this optimization problem; in fact, $x_0 = z'(1)$.

Let $n'_{\eta'}(x)$ denote the Newton step for $f'_{\eta'}$ at $x$.

Rather than increasing the parameter $\eta'$, we decrease it towards zero, following the central path to the analytic center $z$ of $f$. From there we switch to following the central path $\{z(\eta)\}$, as before.

We showed $\eta$ can safely be increased by a factor of $1 + \frac{1}{8\sqrt{\vartheta_f}}$. Brief consideration of the analysis shows it is also safe to decrease $\eta'$ by that same factor. Thus, to complete our understanding of the difficulty of following the path $\{z'(\eta')\}$ and then the path $\{z(\eta)\}$, it only remains to understand the process of switching paths.

One way to know when it is safe to switch paths is to compute the lengths of the gradients $\|g_x(x)\|_x$ at the points $x$ generated in following the path $\{z'(\eta')\}$. Once one encounters a point $x$ for which, say, $\|g_x(x)\|_x \le \frac16$, one can safely switch paths. For then, by choosing $\eta := \frac{1}{12\,\|c_x\|_x}$, we find the Newton step for $f_\eta$ at $x$ satisfies
$$\|n_\eta(x)\|_x = \|\eta\,c_x + g_x(x)\|_x \le \eta\,\|c_x\|_x + \|g_x(x)\|_x \le \frac14,$$
and hence, by an earlier proposition, the Newton step takes us from $x$ to a point $x_1$ for which $\|n_\eta(x_1)\|_{x_1} \le \frac19$, putting us precisely in the setting of the earlier analysis, where $\kappa$ was determined safe.

How much will $\eta'$ have to be decreased from the initial value $\eta' = 1$ before we compute a point $x$ for which $\|g_x(x)\|_x \le \frac16$, so that paths can be switched? An answer is found from the relations
$$\|g_x(x)\|_x = \bigl\|\eta'\,g_x(x_0) - n'_{\eta'}(x)\bigr\|_x \le \|n'_{\eta'}(x)\|_x + \eta'\,\|g_x(x_0)\|_x \le \|n'_{\eta'}(x)\|_x + \eta'\Bigl(1 + \frac{1}{\mathrm{sym}(x_0, D_f)}\Bigr)\vartheta_f,$$
the last inequality by the symmetry proposition. In particular, with $\|n'_{\eta'}(x)\|_x \le \frac19$, the parameter $\eta'$ need only satisfy
$$\eta' \le \frac{1}{18\,\vartheta_f}\cdot\frac{\mathrm{sym}(x_0, D_f)}{1 + \mathrm{sym}(x_0, D_f)}$$
in order for $\|g_x(x)\|_x \le \frac16$.

The requirement on $\eta'$ specified above gives geometric interpretation to the efficiency of the algorithm in following the path $\{z'(\eta')\}$ beginning with the initial value $\eta' = 1$: if the domain $D_f$ is nearly symmetric about the initial point $x_0$, not much time will be required to follow the path to a point where we can switch to following the path $\{z(\eta)\}$.

We stipulated that the algorithm switch paths when it encounters $x$ satisfying $\|g_x(x)\|_x \le \frac16$, and we stipulated that one choose the initial value $\eta = \frac{1}{12\|c_x\|_x}$. Letting
$$V := \sup\{\langle c, x\rangle : x \in D_f\},$$
note that the earlier bound $\|c_x\|_x \le \langle c, x\rangle - \mathrm{val}$ implies
$$\|c_x\|_x \le V - \mathrm{val},$$
and hence
$$\eta \ge \frac{1}{12\,(V - \mathrm{val})}.$$

We have now essentially proven the following theorem.

Theorem. Assume $f \in \mathrm{SCB}$ and $D_f$ is bounded. Assume $x_0 \in D_f$ is a point at which to initiate the barrier method. If $\epsilon > 0$, then within
$$O\biggl(\sqrt{\vartheta_f}\,\log\frac{\vartheta_f}{\epsilon\,\mathrm{sym}(x_0, D_f)}\biggr)$$
iterations of the method, all points $x$ computed thereafter satisfy
$$\frac{\langle c, x\rangle - \mathrm{val}}{V - \mathrm{val}} \le \epsilon.$$

Consider the following modification to the algorithm. Choose $V' > \langle c, x_0\rangle$. Rather than relying on $f$, rely on the barrier functional
$$x \mapsto f(x) - \ln\bigl(V' - \langle c, x\rangle\bigr),$$
a functional whose domain is
$$D_f \cap \{x : \langle c, x\rangle < V'\}$$
and whose complexity value does not exceed $\vartheta_f + 1$. In the theorem, the quantity $V - \mathrm{val}$ is then replaced by the potentially much smaller quantity $V' - \mathrm{val}$. Of course, the quantity $\mathrm{sym}(x_0, D_f)$ must then be replaced by the symmetry of the smaller set about $x_0$.

Finally, we highlight an implicit assumption underlying our analysis: namely, that the complexity value $\vartheta_f$ is known. The value is used to safely increase the parameter $\eta$. What is actually required is an upper bound. If one relies on an upper bound $\vartheta \ge \vartheta_f$ rather than the precise complexity value, then $\vartheta_f$ in the theorem must be replaced by $\vartheta$.

Except for $\vartheta_f$, none of the quantities appearing in the theorem are assumed to be known or approximated. The quantities appear naturally in the analysis of the algorithm, but the algorithm itself does not rely on them.

No ipms have proven complexity bounds which are better than those of the barrier method, even in the restricted setting of linear programming. Nonetheless, the barrier method is not considered to be practically efficient relative to some other ipms, especially relative to primal-dual methods, discussed in a later chapter. The barrier method is an excellent algorithm with which to begin one's understanding of ipms, and it is often the perfect choice for concise complexity-theory proofs, but it is not one of the ipms that appear in widely used software.

The Long-Step Barrier Method

One of the barrier method's shortcomings is obvious, being implicit in the terminology "short-step algorithm": although it is always safe to increase $\eta$ by a factor $1 + \frac{1}{8\sqrt{\vartheta_f}}$ with each iteration, that increase is small if $\vartheta_f$ is large. No doubt, for many instances a much larger increase is safe.

There is a trivial manner in which to modify the barrier method in hopes of having a more practical algorithm: rather than increase $\eta$ by the safe amount, increase it by much more, apply perhaps several iterations of Newton's method, and check (say, using an earlier proposition) if the computed point is near the desired minimizer. If not, increase $\eta$ by a smaller amount and try again.

A more interesting, and more practical, modification of the barrier method is known as the long-step barrier method. In this version, one increases $\eta$ by an arbitrarily large amount, but does not take Newton steps. Instead, the Newton steps are used as directions for exact line searches, as we now describe.

Assume, as before, that we have an initial value $\eta_1$ and a point $x_0$ approximating $z(\eta_1)$. Choose $\eta_2$ larger than $\eta_1$, perhaps significantly larger. In search of a point $x_1$ which approximates $z(\eta_2)$, the algorithm will generate a finite sequence of points
$$y_0 := x_0,\ y_1,\ y_2,\ \ldots,\ y_K,$$
then let $x_1 := y_K$. At each point $y_k$, the algorithm will determine if the point is close to $z(\eta_2)$ by, say, checking whether $\|n_{\eta_2}(y_k)\|_{y_k} \le \frac14$. (We choose the specific value because it is the largest value for which an earlier proposition applies.) The point $y_K$ will be the first point that is determined to satisfy this inequality.

To compute $y_{k+1}$ from $y_k$, the algorithm minimizes the univariate functional
$$t \mapsto f_{\eta_2}\bigl(y_k + t\,n_{\eta_2}(y_k)\bigr).$$
This is the step in the algorithm to which the phrase "exact line search" alludes: "line" refers to the functional being univariate; "exact" refers to an assumption that the exact minimizer is computed (certainly an exaggeration, but an assumption needed to keep the complexity analysis succinct). Letting $t_k$ denote the exact minimizer, define
$$y_{k+1} := y_k + t_k\,n_{\eta_2}(y_k),$$
thus ending our description of the long-step barrier method.
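A minimal numeric sketch of one outer iteration for the log barrier on the positive orthant ($\eta$ increased tenfold from $\eta = 1$, starting exactly on the path), with a crude ternary search standing in for the exact line search; all function names and the specific instance are choices made here:

```python
import math

def f_eta(eta, x, c):
    return eta * sum(ci * xi for ci, xi in zip(c, x)) - sum(math.log(xi) for xi in x)

def newton_dir(eta, x, c):
    # Newton step of f_eta for the log barrier: x_i - eta*c_i*x_i^2
    return [xi - eta * ci * xi * xi for xi, ci in zip(x, c)]

def local_norm(v, x):
    # ||v||_x for the log barrier: H(x) = diag(1/x_i^2)
    return math.sqrt(sum((vi / xi) ** 2 for vi, xi in zip(v, x)))

def line_search(eta, y, d, c):
    # "exact" line search: ternary search over the feasible range of t
    neg = [-yi / di for yi, di in zip(y, d) if di < 0]
    lo, hi = 0.0, 0.999 * min(neg) if neg else 1.0
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        p1 = [yi + m1 * di for yi, di in zip(y, d)]
        p2 = [yi + m2 * di for yi, di in zip(y, d)]
        if f_eta(eta, p1, c) < f_eta(eta, p2, c):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

c = [1.0, 3.0]
eta, y = 10.0, [1.0, 1.0 / 3.0]   # y = z(1), then eta increased tenfold
searches = 0
while True:
    d = newton_dir(eta, y, c)
    if local_norm(d, y) <= 0.25:  # close enough to z(eta): stop correcting
        break
    t = line_search(eta, y, d, c)
    y = [yi + t * di for yi, di in zip(y, d)]
    searches += 1
```

For this separable instance, a single line search lands essentially on $z(10) = (0.1, 1/30)$, illustrating why a long step followed by line searches can progress far faster than the safe short step.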

The short-step barrier method is confined to making slow but sure progress. The long-step method is more adventurous, having the potential for much quicker progress.

Clearly, the complexity analysis of the long-step barrier method revolves around determining an upper bound on $K$ in terms of the ratio $\eta_2/\eta_1$. We now undertake the task of determining such a bound.

We begin by determining an upper bound on the difference
$$\Delta := f_{\eta_2}(x_0) - f_{\eta_2}\bigl(z(\eta_2)\bigr).$$
Then we show that $f_{\eta_2}(y_k) - f_{\eta_2}(y_{k+1})$ is bounded below by a positive amount independent of $k$; that is, each exact line search decreases the value of $f_{\eta_2}$ by at least a certain amount. Consequently, $K$ is at most $\Delta$ divided by that amount. Proofs like this, showing a certain functional decreases by at least a certain amount with each iteration, are common in the ipm literature.

In proving an upper bound on the difference $\Delta$, we make use of the fact that for any convex functional $f$ and $x, y \in D_f$ one has
$$f(x) - f(y) \le \langle g(x),\, x - y\rangle.$$
The upper bound on $\Delta$ is obtained by adding upper bounds for
$$f_{\eta_2}(x_0) - f_{\eta_2}\bigl(z(\eta_1)\bigr) \quad\text{and}\quad f_{\eta_2}\bigl(z(\eta_1)\bigr) - f_{\eta_2}\bigl(z(\eta_2)\bigr).$$

Assuming $x_0$ is close to $z(\eta_1)$ in the sense that $\|n_{\eta_1}(x_0)\|_{x_0} \le \frac19$, an earlier proposition implies $\|x_0 - z(\eta_1)\|_{x_0} \le \frac14$. Thus, applying the convexity fact to the functional $f_{\eta_2}$ (whose gradient at $x_0$ w.r.t. $\langle\,,\rangle_{x_0}$ is $-n_{\eta_2}(x_0)$),
$$f_{\eta_2}(x_0) - f_{\eta_2}\bigl(z(\eta_1)\bigr) \le \bigl\langle n_{\eta_2}(x_0),\, z(\eta_1) - x_0\bigr\rangle_{x_0} \le \|n_{\eta_2}(x_0)\|_{x_0}\,\|z(\eta_1) - x_0\|_{x_0} \le \frac14\Bigl(\frac{\eta_2}{\eta_1}\cdot\frac19 + \Bigl(\frac{\eta_2}{\eta_1} - 1\Bigr)\sqrt{\vartheta_f}\Bigr) \le \frac{\eta_2}{\eta_1}\bigl(1 + \sqrt{\vartheta_f}\bigr).$$
Similarly, for all $y \in D_f$,
$$f_{\eta_2}\bigl(z(\eta_1)\bigr) - f_{\eta_2}(y) \le \bigl\langle g_{\eta_2}\bigl(z(\eta_1)\bigr),\, z(\eta_1) - y\bigr\rangle = (\eta_2 - \eta_1)\,\bigl\langle c,\, z(\eta_1) - y\bigr\rangle,$$
the final equality because $z(\eta_1)$ minimizes $f_{\eta_1}$, hence $g_{\eta_1}(z(\eta_1)) = 0$ and $g_{\eta_2}(z(\eta_1)) = (\eta_2 - \eta_1)c$. Thus, taking $y := z(\eta_2)$ and recalling $\langle c, z(\eta_1)\rangle - \mathrm{val} \le \vartheta_f/\eta_1$,
$$f_{\eta_2}\bigl(z(\eta_1)\bigr) - f_{\eta_2}\bigl(z(\eta_2)\bigr) \le \Bigl(\frac{\eta_2}{\eta_1} - 1\Bigr)\vartheta_f,$$
the last inequality by an earlier proposition. In all,
$$\Delta \le \frac{\eta_2}{\eta_1}\bigl(1 + \sqrt{\vartheta_f}\bigr) + \Bigl(\frac{\eta_2}{\eta_1} - 1\Bigr)\vartheta_f.$$

Now we show $f_{\eta_2}(y_k) - f_{\eta_2}(y_{k+1})$ is bounded below by a positive amount independent of $k$.

If the algorithm proceeds from $y_k$ to $y_{k+1}$, it is because $y_k$ happens not to be appropriately close to $z(\eta_2)$, i.e., it happens that $\|n_{\eta_2}(y_k)\|_{y_k} > \frac14$. Letting $t := 1/\bigl(1 + \|n_{\eta_2}(y_k)\|_{y_k}\bigr)$ and $y := y_k + t\,n_{\eta_2}(y_k)$, an earlier proposition then implies
$$f_{\eta_2}(y_k) - f_{\eta_2}(y) \ge \|n_{\eta_2}(y_k)\|_{y_k} - \ln\bigl(1 + \|n_{\eta_2}(y_k)\|_{y_k}\bigr) > \frac14 - \ln\frac54 > 0.$$
Since $t_k$ minimizes the univariate functional, we thus have
$$f_{\eta_2}(y_k) - f_{\eta_2}(y_{k+1}) > \frac14 - \ln\frac54.$$
Finally,
$$K \le \frac{\Delta}{\frac14 - \ln\frac54} = O\Bigl(\frac{\eta_2}{\eta_1}\,\vartheta_f\Bigr).$$

It follows that if one fixes a positive constant $C > 1$ and always chooses successive values to satisfy $\eta_{i+1} \le C\,\eta_i$, the number of points generated by the long-step barrier method (i.e., the number of exact line searches) in increasing the parameter from an initial value $\eta_1$ to some value $\eta$ is, fixing say $C = 2$,
$$O\Bigl(\vartheta_f\,\log\frac{\eta}{\eta_1}\Bigr).$$
No better bound is known for the long-step method. This bound is worse than the analogous bound for the short-step method by a factor $\sqrt{\vartheta_f}$. It is one of the ironies of the ipm literature that algorithms which are more efficient in practice often have slightly worse complexity bounds.

A Predictor-Corrector Method

The Newton step $n_\eta(x) = -(\eta\,c_x + g_x(x))$ for the barrier method can be viewed as the sum of two steps, one of which predicts the tangential direction of the central path, and the other of which corrects for the discrepancy between the tangential direction and the actual position of the curving path.

The corrector step is the Newton step at $x$ for the barrier functional $f|_{L(v)}$, where $v := \langle c, x\rangle$ and
$$L(v) := \{y : \langle c, y\rangle = v\}.$$
Thus, the correcting step is $n|_{L(v)}(x)$, this being the orthogonal projection of the Newton step $n(x)$ for $f$ onto the subspace $L(0)$, where "orthogonal" is w.r.t. $\langle\,,\rangle_x$. In the literature, the corrector step is often referred to as the "centering direction". It aims to move from $x$ towards the point on the central path having the same objective value as $x$.

Since the multiples of $c_x = H(x)^{-1}c$ form the orthogonal complement of $L(0)$, the difference $n(x) - n|_{L(v)}(x)$ is a multiple of $c_x$, and hence so is the predictor step
$$n_\eta(x) - n|_{L(v)}(x) = -\eta\,c_x + \bigl(n(x) - n|_{L(v)}(x)\bigr).$$
The vector $-c_x$ predicts the tangential direction of the central path near $x$. If $x$ is on the central path, the vector $-c_x$ is exactly tangential to the path, pointing in the direction of decreasing objective values. In the literature, $-c_x$ is often referred to as the "affine-scaling direction". With regards to $\langle\,,\rangle_x$, it is the direction in which one would move to decrease the objective value most quickly.
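For the log barrier on the positive orthant, $c_x = H(x)^{-1}c$ has components $x_i^2 c_i$, so a step along the affine-scaling direction $-c_x$ by a fraction $\delta$ of the distance to the boundary is easily computed. A short sketch (function name ours; it assumes the ray actually hits the boundary, i.e., some $c_i > 0$):

```python
def predictor(x, c, delta):
    # affine-scaling direction -c_x for the log barrier: d_i = -x_i^2 * c_i
    d = [-xi * xi * ci for xi, ci in zip(x, c)]
    # largest feasible step: sup{ s : x + s*d >= 0 } (assumes some c_i > 0)
    s_sup = min(-xi / di for xi, di in zip(x, d) if di < 0)
    return [xi + delta * s_sup * di for xi, di in zip(x, d)]

y = predictor([1.0, 1.0], [1.0, 2.0], 0.9)
```

Here the boundary is hit in the second coordinate first, and moving 90% of the way there lowers the objective from $3$ to $0.75$.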

Whereas the barrier method combines a predictor step and a corrector step in one step, predictor-corrector methods separate the two types of steps: after a predictor step, several corrector steps might be applied. In practice, predictor-corrector methods tend to be substantially more efficient than the barrier method, but the worst-case complexity bounds that have been proven for them are worse.

Perhaps the most natural predictor-corrector method is based on moving in the predictor direction a fixed fraction of the distance towards the boundary, and then recentering via exact line searches. We now formalize and analyze such an algorithm.

Fix $\delta$ satisfying $0 < \delta < 1$. Assume $x_0$ is near the central path. Let $v_0 := \langle c, x_0\rangle$. The algorithm is assumed to first compute
$$s^* := \sup\{s : x_0 - s\,c_{x_0} \in \bar D_f\}$$
and then let $y_0 := x_0 - \delta\,s^*\,c_{x_0}$. Thus $-\delta\,s^*\,c_{x_0}$ is the predictor step. Let
$$v_1 := \langle c, y_0\rangle = v_0 - \delta\,s^*\,\|c_{x_0}\|_{x_0}^2.$$
Beginning with $y_0$, the algorithm takes corrector steps, moving towards the point $z$ on the central path with objective value $v_1$, by using the Newton steps for the functional $f|_{L(v_1)}$ as directions in performing exact line searches. Precisely, given $y_k$, the algorithm is assumed to compute exactly the minimizer $t_k$ for the univariate functional
$$t \mapsto f\bigl(y_k + t\,n|_{L(v_1)}(y_k)\bigr)$$
and then let
$$y_{k+1} := y_k + t_k\,n|_{L(v_1)}(y_k).$$
When the first point $y_K$ is encountered for which $\|n|_{L(v_1)}(y_K)\|_{y_K}$ is appropriately small, the algorithm lets $x_1 := y_K$ and takes a predictor step from $x_1$, relying on the same value $\delta$ as in the predictor step from $x_0$. The predictor step is followed by corrector steps, and so on.

Typically, $\delta$ is chosen very nearly equal to $1$. In the following analysis, we assume $\delta$ is fixed with $\frac12 \le \delta < 1$.

In analyzing the predictor-corrector method, we determine an upper bound on the number $K$ of exact line searches made in moving from $x_0$ to $x_1$, and we determine a lower bound on the progress made in decreasing the objective value by moving from $x_0$ to $x_1$.

For the analysis, we consider $\|n|_{L(v)}(x)\|_x \le \frac19$ (where $v := \langle c, x\rangle$) to be the criterion for claiming $x$ to be close to the point $z$ on the central path satisfying $\langle c, z\rangle = \langle c, x\rangle$. The specific value is chosen so that we will be in position to rely on the barrier-method analysis. To see how it puts us in position to rely on that analysis, let $z$ denote the minimizer of $f|_{L(v)}$, and let $\eta$ denote the value for which $z = z(\eta)$. We claim that $\|n|_{L(v)}(x)\|_x \le \frac19$ implies $\|n_\eta(x)\|_x \le \frac19$, precisely the criterion we assumed $x_0$ to satisfy in the barrier method. For if $\|n|_{L(v)}(x)\|_x \le \frac19$, then an earlier proposition applied to $f|_{L(v)}$ implies
$$\|z - x\|_x \le \frac14.$$
Since $z = z(\eta)$, applying an earlier theorem to $f_\eta$ then yields $\|n_\eta(x)\|_x \le \frac19$.

The barrier method moves from $x_0$ to $x_1 := x_0 + n_{\eta_2}(x_0)$, where $\eta_2 := \bigl(1 + \frac{1}{8\sqrt{\vartheta_f}}\bigr)\eta_1$. (It is not necessarily the case that $z(\eta_1) = z$, where $z$ minimizes $f|_{L(v_0)}$.) By the short-step analysis, the length of the barrier-method step satisfies
$$\|n_{\eta_2}(x_0)\|_{x_0} \le \frac14.$$
Consequently, in one step of the barrier method, the objective value decreases by at most $\frac14\,\|c_{x_0}\|_{x_0}$.

In the predictor-corrector method, the predictor step is in the direction $-c_{x_0}$; since $B_{x_0}(x_0,1) \subseteq D_f$ implies $s^* \ge 1/\|c_{x_0}\|_{x_0}$, the predictor step decreases the objective value by
$$v_0 - v_1 = \delta\,s^*\,\|c_{x_0}\|_{x_0}^2 \ge \delta\,\|c_{x_0}\|_{x_0} \ge \frac12\,\|c_{x_0}\|_{x_0}.$$
Thus, in moving from $x_0$ to $x_1$, the progress made by the predictor-corrector method in decreasing the objective value is at least as great as the progress made by the barrier method in taking one step from $x_0$. Of course, in moving from $x_0$ to $x_1$, the predictor-corrector method might require several exact line searches. We now bound the number $K$ of exact line searches.

Analogous to our analysis for the long-step barrier method, we obtain an upper bound on $K$ by dividing an upper bound on $f(y_0) - f(z)$ by a lower bound on the differences $f(y_k) - f(y_{k+1})$.

The lower bound on the differences $f(y_k) - f(y_{k+1})$ is proven exactly as was the lower bound for the differences $f_{\eta_2}(y_k) - f_{\eta_2}(y_{k+1})$ in our analysis of the long-step barrier method: assuming $\|n|_{L(v_1)}(y_k)\|_{y_k} > \frac19$ (as is the case if the algorithm proceeds to compute $y_{k+1}$), one relies on the same proposition, now applied to the functional $f|_{L(v_1)}$, to show $f(y_k) - f(y_{k+1})$ is bounded below independently of $k$.

To obtain an upper bound on $f(y_0) - f(z)$, one can use the relation
$$f(y_0) - f(z) = \bigl[f(y_0) - f(x_0)\bigr] + \bigl[f(x_0) - f(z')\bigr] + \bigl[f(z') - f(z)\bigr],$$
where $z'$ denotes the minimizer of $f|_{L(v_0)}$. The logarithmic-growth proposition and the definition of $y_0$ imply
$$f(y_0) - f(x_0) \le \vartheta_f\,\ln\frac{1}{1-\delta}.$$
The convexity relation applied to $f|_{L(v_0)}$, together with the proximity of $x_0$ to $z'$, gives
$$f(x_0) - f(z') \le \|n|_{L(v_0)}(x_0)\|_{x_0}\,\|z' - x_0\|_{x_0} \le \frac19\cdot\frac14.$$
Finally, the convexity relation applied to $f$, with $\eta'$ the value for which $z' = z(\eta')$ (so $g(z') = -\eta' c$), gives
$$f(z') - f(z) \le \langle g(z'),\, z' - z\rangle = -\eta'\,(v_0 - v_1) \le 0.$$
In all,
$$f(y_0) - f(z) \le \frac{1}{36} + \vartheta_f\,\ln\frac{1}{1-\delta}.$$

Combined with the constant lower bound on the differences $f(y_k) - f(y_{k+1})$, we thus find the number $K$ of exact line searches performed in moving from $x_0$ to $x_1$ satisfies
$$K = O\Bigl(1 + \vartheta_f\,\log\frac{1}{1-\delta}\Bigr).$$

Having shown the progress made by the predictor-corrector method in decreasing the objective value is at least as great as the progress made by the barrier method in taking one step from $x_0$, we obtain complexity bounds for the predictor-corrector method which are greater than the bounds for the barrier method by a factor $K$, that is, by a factor $\vartheta_f$ (assuming $\delta$ fixed). The bounds are greater than the bounds for the long-step barrier method by a factor $\sqrt{\vartheta_f}$. No better bounds are known for the predictor-corrector method.


Matters of Definition

There are various equivalent ways to define self-concordant functionals. Our definition is geometric and simple to employ in theory, but it is not the original definition due to Nesterov and Nemirovskii. In this section, we consider various equivalent definitions of self-concordancy, including the original definition. We close the section with a brief discussion of the term "strongly nondegenerate", a term we have suppressed thus far.

Unless otherwise stated, we assume that $f \in C^2$, $D_f$ is open and convex, and $H(x)$ is pd for all $x \in D_f$.

For ease of reference, we recall our definition of self-concordancy.

A functional $f$ is said to be (strongly nondegenerate) self-concordant if for all $x \in D_f$ we have $B_x(x,1) \subseteq D_f$, and if whenever $y \in B_x(x,1)$ we have
$$1 - \|y - x\|_x \le \frac{\|v\|_y}{\|v\|_x} \le \frac{1}{1 - \|y - x\|_x} \quad \text{for all } v \ne 0.$$
Recall that SC denotes the family of functionals thus defined.
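For the univariate barrier $f(t) = -\ln t$ on $(0,\infty)$, $\|v\|_t = |v|/t$, so the defining inequalities reduce to $1 - r \le x/y \le 1/(1-r)$ with $r = |y-x|/x$. A quick numeric check (a sketch; name ours, with a tiny tolerance for floating-point equality cases):

```python
def sc_bounds_hold(x, y):
    # f(t) = -ln t: ||v||_t = |v|/t, hence ||v||_y/||v||_x = x/y for all v != 0
    r = abs(y - x) / x          # ||y - x||_x, must be < 1 (y in B_x(x,1))
    return r < 1 and (1 - r) <= x / y <= 1 / (1 - r) + 1e-12

assert all(sc_bounds_hold(1.0, y) for y in [0.05, 0.5, 1.5, 1.95])
```

Note how the bounds become vacuous as $y$ approaches the endpoints of the unit local ball $(0, 2)$, mirroring the degeneration of the Hessian estimates near the boundary of $B_x(x,1)$.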

An important property in establishing the equivalence of various definitions of self-concordancy is the transitivity of the condition
$$1 - \|y - x\|_x \le \frac{\|v\|_y}{\|v\|_x} \le \frac{1}{1 - \|y - x\|_x} \quad \forall\, v \ne 0. \tag{$*$}$$
Specifically, if $x$, $y$ and $z$ are collinear with $y$ between $x$ and $z$, if $x$ and $y$ satisfy ($*$), if $y$ and $z$ satisfy the analogous inequalities
$$1 - \|z - y\|_y \le \frac{\|v\|_z}{\|v\|_y} \le \frac{1}{1 - \|z - y\|_y} \quad \forall\, v \ne 0,$$
and if $\|z - x\|_x < 1$, then
$$1 - \|z - x\|_x \le \frac{\|v\|_z}{\|v\|_x} \le \frac{1}{1 - \|z - x\|_x} \quad \forall\, v \ne 0.$$
To establish this transitivity, note that ($*$) implies
$$\|z - y\|_y \le \frac{\|z - y\|_x}{1 - \|y - x\|_x}.$$
Substituting into the second set of inequalities gives, for all $v \ne 0$,
$$\frac{\|v\|_z}{\|v\|_y} \le \frac{1}{1 - \frac{\|z - y\|_x}{1 - \|y - x\|_x}} = \frac{1 - \|y - x\|_x}{1 - \|y - x\|_x - \|z - y\|_x} = \frac{1 - \|y - x\|_x}{1 - \|z - x\|_x},$$
the last equality relying on the collinearity of $x$, $y$ and $z$. The rightmost inequality of the conclusion is immediate from this and ($*$); the leftmost is proven similarly. Hence the transitivity.

Our first modification to the definition of self-concordancy is to show the redundancy of the leftmost inequality in the definition.

Proposition. Assume $f$ is such that for all $x \in D_f$ we have $B_x(x,1) \subseteq D_f$, and is such that whenever $y \in B_x(x,1)$ we have
$$\frac{\|v\|_y}{\|v\|_x} \le \frac{1}{1 - \|y - x\|_x} \quad \text{for all } v \ne 0.$$
Then
$$1 - \|y - x\|_x \le \frac{\|v\|_y}{\|v\|_x} \quad \text{for all } v \ne 0.$$

Proof. Since
$$\|H_x(y)^{-1}\|_x = \sup_{v \ne 0}\Bigl(\frac{\|v\|_x}{\|v\|_y}\Bigr)^2,$$
it suffices to show, for all $x$ and $y$,
$$\|y - x\|_x < 1 \ \Longrightarrow\ \|H_x(y)^{-1}\|_x \le \frac{1}{(1 - \|y - x\|_x)^2}. \tag{$**$}$$
Towards establishing the implication, note that from the hypothesis we have, whenever points $y$ and $z$ satisfy $\|z - y\|_y < 1$,
$$\|z - y\|_z \le \frac{\|z - y\|_y}{1 - \|z - y\|_y}.$$
Letting $\lambda_{\min}$ denote the minimum eigenvalue of $H_y(z)$, also note that since $H_y(z)$ is self-adjoint w.r.t. both $\langle\,,\rangle_y$ and $\langle\,,\rangle_z$, we have
$$\|H_y(z)^{-1}\|_y = \frac{1}{\lambda_{\min}} = \|H_y(z)^{-1}\|_z = \|H_z(y)\|_z.$$
Recalling
$$\|H_z(y)\|_z = \sup_{v \ne 0}\Bigl(\frac{\|v\|_y}{\|v\|_z}\Bigr)^2,$$
we thus have, by the hypothesis,
$$\|H_y(z)^{-1}\|_y \le \frac{1}{(1 - \|y - z\|_z)^2}.$$
In other words, a relation of the form ($**$) holds, which would give the desired implication except that the norm of $y - z$ is measured by $\|\,\|_z$ rather than $\|\,\|_y$.

In light of this, to prove ($**$) it suffices to establish that for all $y$ satisfying
$$\|y - x\|_x < 1 \quad\text{and}\quad \|H_x(y)^{-1}\|_x \le \frac{1}{(1 - \|y - x\|_x)^2},$$
if we define $z := y + t(y - x)$, where $t > 0$ is chosen so that
$$\|z - y\|_x = \bigl(1 - \|y - x\|_x\bigr)\,\|y - x\|_x,$$
then
$$\|H_x(z)^{-1}\|_x \le \frac{1}{(1 - \|z - x\|_x)^2}$$
(beginning near $x$, where ($**$) certainly holds, the bound then propagates along each ray to all of $B_x(x,1)$).

Assuming $x$, $y$ and $z$ satisfy the assumptions just stated, using the hypothesis we have
$$\|z - y\|_y \le \frac{\|z - y\|_x}{1 - \|y - x\|_x} = \|y - x\|_x < 1.$$
Hence, by the relation established above,
$$\|H_y(z)^{-1}\|_y \le \frac{1}{(1 - \|z - y\|_y)^2}.$$
Since
$$\|z - y\|_y \le \frac{\|z - y\|_x}{1 - \|y - x\|_x},$$
we thus have
$$\|H_y(z)^{-1}\|_y \le \Biggl(\frac{1 - \|y - x\|_x}{1 - \|y - x\|_x - \|z - y\|_x}\Biggr)^2 = \Biggl(\frac{1 - \|y - x\|_x}{1 - \|z - x\|_x}\Biggr)^2,$$
the equality relying on the collinearity of $x$, $y$ and $z$. That is, for all $v \ne 0$,
$$\frac{\|v\|_y}{\|v\|_z} \le \frac{1 - \|y - x\|_x}{1 - \|z - x\|_x}.$$
Since
$$\frac{\|v\|_x}{\|v\|_y} \le \frac{1}{1 - \|y - x\|_x},$$
it follows that
$$\frac{\|v\|_x}{\|v\|_z} \le \frac{1}{1 - \|z - x\|_x}.$$
In other words,
$$\|H_x(z)^{-1}\|_x \le \frac{1}{(1 - \|z - x\|_x)^2},$$
completing the proof of the proposition. $\Box$

The following theorem provides various equivalent definitions of self-concordant functionals.

Theorem. The following conditions on a functional $f$ are equivalent:

(a) For all $x, y \in D_f$: if $\|y - x\|_x < 1$, then
$$\frac{\|v\|_y}{\|v\|_x} \le \frac{1}{1 - \|y - x\|_x} \quad \text{for all } v \ne 0.$$

(b) For all $x \in D_f$ and for all $y$ in some open neighborhood of $x$,
$$\frac{\|v\|_y}{\|v\|_x} \le \frac{1}{1 - \|y - x\|_x} \quad \text{for all } v \ne 0.$$

(c) For all $x \in D_f$,
$$\limsup_{y \to x}\ \frac{\|I - H_x(y)\|_x}{\|y - x\|_x} \le 2.$$

Moreover, if $f$ satisfies any one (and hence all) of the above conditions, as well as any one condition from the following list, then $f$ satisfies all conditions from the following list:

(a$'$) For all $x \in D_f$, we have $B_x(x,1) \subseteq D_f$.

(b$'$) There exists $r > 0$ such that for all $x \in D_f$, we have $B_x(x,r) \subseteq D_f$.

(c$'$) If a sequence $\{x_k\}$ converges to a point in the boundary $\partial D_f$, then $f(x_k) \to \infty$.

Hence, since SC consists precisely of those functionals satisfying conditions (a) and (a$'$), by choosing one condition from the first set and one from the second, the set SC can be defined as the set of functionals satisfying the two chosen conditions.

Proof. To prove the theorem, we first establish the equivalence of conditions (a), (b) and (c). We then prove conditions (a) and (b$'$) together imply (a$'$), as do conditions (a) and (c$'$). Trivially, (a$'$) implies (b$'$). To conclude the proof, it then suffices to recall that, by an earlier proposition, conditions (a) and (a$'$) together imply (c$'$).

Now to establish the equivalence of (a), (b) and (c). Trivially, (a) implies (b). Next, note
$$\|I - H_x(y)\|_x = \sup_{v \ne 0}\frac{|\langle v, (I - H_x(y))v\rangle_x|}{\|v\|_x^2} = \sup_{v \ne 0}\frac{\bigl|\,\|v\|_x^2 - \|v\|_y^2\,\bigr|}{\|v\|_x^2}.$$
Consequently, condition (b) implies, for $y$ near $x$,
$$\|I - H_x(y)\|_x \le \frac{1}{(1 - \|y - x\|_x)^2} - 1 = \frac{\|y - x\|_x\,\bigl(2 - \|y - x\|_x\bigr)}{(1 - \|y - x\|_x)^2}.$$
Thus (b) implies (c).

To conclude the proof of the equivalence of (a), (b) and (c), we assume (c) holds but (a) does not, then obtain a contradiction. Since
$$\|H_x(y)\|_x = \sup_{v \ne 0}\Bigl(\frac{\|v\|_y}{\|v\|_x}\Bigr)^2,$$
condition (a) not holding implies there exist $x$ and $y$ such that $\|y - x\|_x < 1$ and
$$\|H_x(y)\|_x > \frac{1}{(1 - \|y - x\|_x)^2}.$$
Considering points on the line segment between $x$ and $y$, and relying on continuity of the Hessian, it then readily follows that there exists $\bar y$ (possibly $\bar y = y$) satisfying $\|\bar y - x\|_x < 1$,
$$\|H_x(\bar y)\|_x = \frac{1}{(1 - \|\bar y - x\|_x)^2},$$
and
$$\|H_x(z)\|_x > \frac{1}{(1 - \|z - x\|_x)^2}$$
for all $z = \bar y + t(\bar y - x)$ where $t > 0$ is sufficiently small.

Condition (c) implies, for $z$ near $\bar y$,
$$\|H_{\bar y}(z)\|_{\bar y} \le \frac{1}{(1 - \|z - \bar y\|_{\bar y})^2},$$
that is, for all $v \ne 0$,
$$\frac{\|v\|_z}{\|v\|_{\bar y}} \le \frac{1}{1 - \|z - \bar y\|_{\bar y}}.$$
Likewise, from the equality satisfied by $\bar y$,
$$\frac{\|v\|_{\bar y}}{\|v\|_x} \le \frac{1}{1 - \|\bar y - x\|_x}.$$
In particular,
$$\|z - \bar y\|_{\bar y} \le \frac{\|z - \bar y\|_x}{1 - \|\bar y - x\|_x}.$$
Substituting into the relation before it, and relying on the collinearity of $x$, $\bar y$ and $z$, we have
$$\frac{\|v\|_z}{\|v\|_{\bar y}} \le \frac{1 - \|\bar y - x\|_x}{1 - \|z - x\|_x}.$$
From the last two relations bounding $\|v\|_z/\|v\|_{\bar y}$ and $\|v\|_{\bar y}/\|v\|_x$, for all $v \ne 0$,
$$\frac{\|v\|_z}{\|v\|_x} \le \frac{1}{1 - \|z - x\|_x},$$
that is,
$$\|H_x(z)\|_x \le \frac{1}{(1 - \|z - x\|_x)^2},$$
contradicting the choice of $\bar y$. We have thus proven the equivalence of conditions (a), (b) and (c).

Now we prove conditions (a) and (b$'$) together imply (a$'$). Let $r > 0$ be as in (b$'$), i.e., $B_x(x,r) \subseteq D_f$ for all $x \in D_f$. Assuming $y \in D_f$ satisfies $\|y - x\|_x < 1$, and letting
$$t := r\,\bigl(1 - \|y - x\|_x\bigr), \qquad z := y + t\,(y - x),$$
we show (a) and (b$'$) together imply $z \in D_f$; condition (a$'$) readily follows. By condition (a),
$$\|z - y\|_y = t\,\|y - x\|_y \le \frac{t\,\|y - x\|_x}{1 - \|y - x\|_x} = r\,\|y - x\|_x < r,$$
from which it follows that $z \in B_y(y, r)$. Hence, by (b$'$), we have $z \in D_f$, as desired.

To conclude the proof of the theorem, it only remains to prove that conditions (a) and (c$'$) together imply (a$'$). Assuming $y$ satisfies $\|y - x\|_x < 1$ (and the segment from $x$ to $y$ lies in $D_f$), condition (a) implies
$$f(y) = f(x) + \langle g_x(x),\, y - x\rangle_x + \int_0^1\!\!\int_0^t \bigl\langle y - x,\ H_x\bigl(x + s(y - x)\bigr)(y - x)\bigr\rangle_x\,ds\,dt$$
$$\le f(x) + \|g_x(x)\|_x\,\|y - x\|_x + \|y - x\|_x^2\int_0^1\!\!\int_0^t \frac{ds\,dt}{\bigl(1 - s\,\|y - x\|_x\bigr)^2},$$
a quantity which is finite for $\|y - x\|_x < 1$. In particular, $f(y)$ is bounded from above. By condition (c$'$), we conclude $B_x(x,1) \subseteq D_f$. $\Box$

We turn to the original definition of self-concordancy due to Nesterov and Nemirovskii. First, some motivation.

We know that if one restricts a self-concordant functional $f$ to subspaces (or translates thereof), one obtains self-concordant functionals. In particular, if $f$ is restricted to a line $t \mapsto x + td$, where $x, d \in \mathbb{R}^n$, then
$$\phi(t) := f(x + td)$$
is a univariate self-concordant functional. Since for $\phi$ we have
$$\|v\|_t = \sqrt{\phi''(t)}\,|v|,$$
the property
$$\frac{\|v\|_s}{\|v\|_t} \le \frac{1}{1 - \|s - t\|_t}$$
is identical to
$$\sqrt{\phi''(s)} \le \frac{\sqrt{\phi''(t)}}{1 - \sqrt{\phi''(t)}\,|s - t|}.$$
Squaring both sides, then subtracting $\phi''(t)$ from both sides, and finally dividing both sides by $|s - t|$ and letting $s \to t$ (from either side), we find that, if $f$ (and hence $\phi$) is thrice-differentiable, this implies
$$|\phi'''(t)| \le 2\,\phi''(t)^{3/2}.$$
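For $\phi(t) = -\ln t$, we have $\phi''(t) = 1/t^2$ and $\phi'''(t) = -2/t^3$, so the inequality holds with equality for every $t$. A one-line numeric check (sketch; the tolerance absorbs floating-point rounding):

```python
def nn_inequality_holds(t):
    # |phi'''(t)| <= 2 * phi''(t)^(3/2) for phi(t) = -ln t (equality for all t)
    return abs(-2 / t**3) <= 2 * (1 / t**2) ** 1.5 + 1e-9

assert all(nn_inequality_holds(t) for t in [0.1, 1.0, 7.0])
```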

This result has a converse. The converse, given by the following theorem, coincides with the original definition of self-concordancy due to Nesterov and Nemirovskii.

Theorem. Assume $f \in C^3$, and assume each of the univariate functionals $\phi$ obtained by restricting $f$ to lines intersecting $D_f$ satisfies
$$|\phi'''(t)| \le 2\,\phi''(t)^{3/2}$$
for all $t$ in its domain. Furthermore, assume that if a sequence $\{x_k\}$ converges to a point in the boundary $\partial D_f$, then $f(x_k) \to \infty$. Then $f \in \mathrm{SC}$.

Proof. The proof assumes the reader to be familiar with certain properties of differentials.

To prove the theorem, it suffices to prove $f$ satisfies conditions (c) and (c$'$) of the preceding theorem. Of course, (c$'$) is satisfied by assumption.

Assume the third differential $D^3 f(x)$ of $f$ at $x$ is written in terms of a basis which is orthonormal w.r.t. $\langle\,,\rangle_x$. The inequality $|\phi'''(t)| \le 2\,\phi''(t)^{3/2}$ is equivalent to requiring
$$|D^3 f(x)[u,u,u]| \le 2 \quad \text{whenever } \|u\|_x \le 1.$$
On the other hand, condition (c) is equivalent to requiring
$$|D^3 f(x)[u,v,v]| \le 2 \quad \text{whenever } \|u\|_x \le 1 \text{ and } \|v\|_x \le 1.$$
However, for any $C^3$ functional $f$ and for any inner-product norm $\|\,\|$,
$$\max\bigl\{|D^3 f(x)[u,v,w]| : \|u\|, \|v\|, \|w\| \le 1\bigr\} = \max\bigl\{|D^3 f(x)[u,u,u]| : \|u\| \le 1\bigr\}$$
(a symmetric trilinear form attains its norm on the diagonal). Hence condition (c) follows. $\Box$


The original definition of (strongly nondegenerate) self-concordancy was that f satisfy the assumptions of the preceding theorem. The theorem shows such f to be self-concordant according to the definition relied on in these lecture notes. The definition in these notes is ever-so-slightly less restrictive, requiring only f ∈ C², not f ∈ C³. For example, letting ‖·‖ denote the Euclidean norm, the functionals

  f₁(x) := ‖x‖² + ‖x‖³

and

  f₂(x) := e^{‖x‖²} + ‖x‖³

are self-concordant according to the definition in these notes, but not according to the original definition. Neither functional is thrice-differentiable at the origin.

The definition in these notes was not chosen for the slightly broader set of functionals it defines. It was chosen because it provides the reader, upfront, with some sense of the geometry underlying self-concordancy, and because it is handy in developing the theory. Nonetheless, the original definition has distinct advantages, especially in proving a functional to be self-concordant.

For example, assume D ⊆ ℝⁿ is open and convex, and assume F ∈ C³ is a functional which takes on only positive values in D and only the value 0 on the boundary ∂D. Furthermore, assume that for each line intersecting D, the univariate functional Φ(t) := F(x + td) obtained by restricting F to the line happens to be a polynomial, moreover a polynomial with only real roots. Then, relying on the original definition of self-concordancy, it is not difficult to prove the functional

  f(x) := −ln F(x)

to be self-concordant. For, letting r_1, …, r_d denote the roots of Φ, and assuming w.l.o.g. that Φ is monic, we have

  (d²/dt²)(−ln Φ(t)) = (d²/dt²)( −ln ∏_i (t − r_i) )
                     = (d²/dt²)( −∑_i ln(t − r_i) )
                     = ∑_i 1/(t − r_i)²,

and hence

  |(d³/dt³)(−ln Φ(t))| = 2 |∑_i 1/(t − r_i)³| ≤ 2 ( ∑_i 1/(t − r_i)² )^{3/2},

the inequality due to the relation ‖·‖₃ ≤ ‖·‖₂ between the 3-norm and the 2-norm on ℝ^d.
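The argument just given can be tested numerically. In the sketch below (mine, not from the notes), the roots are arbitrary illustrative choices, and the test points lie to the right of every root, where Φ is positive.

```python
# For Phi(t) = (t - r_1) ... (t - r_d) with real roots, phi(t) = -ln Phi(t) has
#   phi''(t)  =      sum_i 1/(t - r_i)^2,
#   phi'''(t) = -2 * sum_i 1/(t - r_i)^3,
# and |phi'''| <= 2 phi''^(3/2) follows from the norm relation ||u||_3 <= ||u||_2.

roots = [-3.0, -1.0, 0.0, 2.5]  # arbitrary real roots, for illustration only

def phi_2(t: float) -> float:
    """Second derivative of -ln Phi at t."""
    return sum(1.0 / (t - r) ** 2 for r in roots)

def phi_3(t: float) -> float:
    """Third derivative of -ln Phi at t."""
    return -2.0 * sum(1.0 / (t - r) ** 3 for r in roots)

for t in [2.6, 4.0, 7.0, 100.0]:  # points to the right of every root
    assert abs(phi_3(t)) <= 2.0 * phi_2(t) ** 1.5
```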

It is an insightful exercise to show that the self-concordancy of the various logarithmic barrier functions is a special case of the result described in the preceding paragraph. Incidentally, functionals F as above are known as hyperbolic polynomials.

We close this section with a discussion of the qualifying phrase "strongly nondegenerate", which we have suppressed throughout.

Nesterov and Nemirovskii define self-concordant functionals (with no qualifiers) as functionals f ∈ C³ with open and convex domains satisfying |φ'''(t)| ≤ 2 φ''(t)^{3/2} for the univariate functionals φ obtained by restricting f to lines. They define strongly self-concordant functionals as having the additional property that f(x_k) → ∞ if the sequence {x_k} converges to a point in the boundary ∂D_f. Finally, strongly nondegenerate self-concordant functionals are those which satisfy the yet further property that H(x) is pd for all x ∈ D_f.

It is readily proven that self-concordant functionals thus defined have psd Hessians.

One might ask if the Nesterov-Nemirovskii definition of, say, strong self-concordancy has a geometric analogue similar to the definition of strongly nondegenerate self-concordancy used in these lecture notes. One would not expect the analogue to be as simple as the definition in these notes, because the bilinear forms

  ⟨u, v⟩_x := ⟨u, H(x)v⟩

need not be inner products if f is not nondegenerate. However, there is indeed an analogue, obtained as a direct extension of the definition of strongly nondegenerate self-concordancy. Roughly, strongly self-concordant functionals are those obtained by extending strongly nondegenerate self-concordant functionals to larger vector spaces by having the functional be constant on parallel slices. Specifically, one can prove (as is done in [1]) that f is strongly self-concordant iff ℝⁿ can be written as a direct sum L₁ ⊕ L₂ for which there exists a strongly nondegenerate self-concordant functional h with D_h ⊆ L₁ satisfying f(x₁ + x₂) = h(x₁) for all x₁ ∈ D_h and x₂ ∈ L₂. For example, f(x) := −ln x₁ is a strongly self-concordant functional with domain an open halfspace in ℝⁿ, but it is not nondegenerate.

If self-concordant (resp. strongly self-concordant) functionals are added, the resulting functional is self-concordant (resp. strongly self-concordant). If one of the summands is strongly nondegenerate, so is the sum. This is an indication of how the theory of self-concordant functionals and strongly self-concordant functionals parallels the theory developed in these notes. To get to the heart of the theory expeditiously, these notes focus on strongly nondegenerate self-concordant functionals. Those are by far the most important functionals.
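The closure of self-concordancy under addition can be seen already in the univariate inequality: if |φ_j'''| ≤ 2 (φ_j'')^{3/2} for j = 1, 2, then φ = φ₁ + φ₂ satisfies the same bound, since a^{3/2} + b^{3/2} ≤ (a + b)^{3/2} for a, b ≥ 0. A small numeric illustration (mine, not from the notes) with the sum of barriers φ(t) = −ln t − ln(1 − t) on (0, 1):

```python
# phi(t) = -ln t - ln(1 - t) is the sum of two self-concordant barriers,
# so it should satisfy |phi'''(t)| <= 2 phi''(t)^(3/2) on (0, 1).

def phi_2(t: float) -> float:
    """Second derivative: 1/t^2 + 1/(1-t)^2."""
    return 1.0 / t**2 + 1.0 / (1.0 - t) ** 2

def phi_3(t: float) -> float:
    """Third derivative: -2/t^3 + 2/(1-t)^3."""
    return -2.0 / t**3 + 2.0 / (1.0 - t) ** 3

for t in [0.05, 0.25, 0.5, 0.75, 0.95]:
    assert abs(phi_3(t)) <= 2.0 * phi_2(t) ** 1.5
```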

Henceforth we return to our practice of referring to functionals as "self-concordant" when, strictly speaking, we mean "strongly nondegenerate self-concordant".

Bibliography

[1] Yu.E. Nesterov and A.S. Nemirovskii, Interior Point Methods in Convex Optimization: Theory and Applications, SIAM, Philadelphia.