DETERMINANT MAXIMIZATION WITH LINEAR MATRIX INEQUALITY CONSTRAINTS

LIEVEN VANDENBERGHE, STEPHEN BOYD, AND SHAO-PO WU

Abstract. The problem of maximizing the determinant of a matrix subject to linear matrix inequalities arises in many fields, including computational geometry, statistics, system identification, experiment design, and information and communication theory. It can also be considered as a generalization of the semidefinite programming problem.

We give an overview of the applications of the determinant maximization problem, pointing out simple cases where specialized algorithms or analytical solutions are known. We then describe an interior-point method, with a simplified analysis of the worst-case complexity and numerical results that indicate that the method is very efficient, both in theory and in practice. Compared to existing specialized algorithms (where they are available), the interior-point method will generally be slower; the advantage is that it handles a much wider variety of problems.

1. Introduction. We consider the optimization problem

\[
\begin{array}{ll}
\mbox{minimize} & c^Tx + \log\det G(x)^{-1} \\
\mbox{subject to} & G(x) > 0 \\
& F(x) \geq 0,
\end{array}
\tag{1.1}
\]

where the optimization variable is the vector $x \in \mathbf{R}^m$. The functions $G: \mathbf{R}^m \rightarrow \mathbf{R}^{l\times l}$ and $F: \mathbf{R}^m \rightarrow \mathbf{R}^{n\times n}$ are affine:

\[
G(x) = G_0 + x_1G_1 + \cdots + x_mG_m, \qquad
F(x) = F_0 + x_1F_1 + \cdots + x_mF_m,
\]

where $G_i = G_i^T$ and $F_i = F_i^T$. The inequality signs in (1.1) denote matrix inequalities, i.e., $G(x) > 0$ means $z^TG(x)z > 0$ for all nonzero $z$, and $F(x) \geq 0$ means $z^TF(x)z \geq 0$ for all $z$. We call $G(x) > 0$ and $F(x) \geq 0$ strict and nonstrict, respectively, linear matrix inequalities (LMIs) in the variable $x$. We will refer to problem (1.1) as a max-det problem since, in many cases, the term $c^Tx$ is absent, so the problem reduces to maximizing the determinant of $G(x)$ subject to LMI constraints.
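As a purely illustrative aside (not part of the original paper), the following sketch poses a small random instance of (1.1) with the CVXPY modeling package; the package, the random data, and the solver choice are assumptions, and any convex solver that handles log-det objectives could be used instead.

```python
# Minimal sketch of the max-det problem (1.1): affine G(x), F(x) built from
# random symmetric matrices; data and package choice are illustrative only.
import numpy as np
import cvxpy as cp

m, l, n = 2, 2, 3
rng = np.random.default_rng(0)
sym = lambda A: (A + A.T) / 2
G = [np.eye(l)] + [sym(rng.standard_normal((l, l))) for _ in range(m)]
F = [np.eye(n)] + [sym(rng.standard_normal((n, n))) for _ in range(m)]
c = rng.standard_normal(m)

x = cp.Variable(m)
Gx = G[0] + sum(x[i] * G[i + 1] for i in range(m))
Fx = F[0] + sum(x[i] * F[i + 1] for i in range(m))
# minimize c^T x + log det G(x)^{-1}  <=>  minimize c^T x - log det G(x)
prob = cp.Problem(cp.Minimize(c @ x - cp.log_det(Gx)), [Fx >> 0])
prob.solve()
print(x.value, prob.value)
```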

The max-det problem is a convex optimization problem, i.e., the objective function $c^Tx + \log\det G(x)^{-1}$ is convex (on $\{x \mid G(x) > 0\}$), and the constraint set is convex. Indeed, LMI constraints can represent many common convex constraints, including linear inequalities, convex quadratic inequalities, and matrix norm and eigenvalue constraints (see Alizadeh [1], Boyd, El Ghaoui, Feron, and Balakrishnan [13], Lewis and Overton [47], Nesterov and Nemirovsky [51, §6.4], and Vandenberghe and Boyd [69]).

In this paper we describe an interior-point method that solves the max-det problem very efficiently, both in worst-case complexity theory and in practice. The method we describe shares many features of interior-point methods for linear and semidefinite programming. In particular, our computational experience (which is limited to problems of moderate size: several hundred variables, with matrices up to 100 × 100) indicates that the method we describe solves the max-det problem (1.1) in a number of iterations that hardly varies with problem size and typically ranges between 5 and 50; each iteration involves solving a system of linear equations.

(First-page footnotes.)
† Research supported in part by AFOSR (F49620-95-1-0318), NSF (ECS-9222391 and EEC-9420565), MURI (F49620-95-1-0525), and NATO Collaborative Research Grant CRG-941269. Associated software is available at http://www-isl.stanford.edu/people/boyd and by anonymous FTP from isl.stanford.edu in pub/boyd/maxdet.
‡ Electrical Engineering Department, University of California, Los Angeles, CA 90095-1594. E-mail: [email protected].
§ Information Systems Laboratory, Electrical Engineering Department, Stanford University, Stanford, CA 94305. E-mail: [email protected], [email protected].

Max-det problems arise in many fields, including computational geometry, statistics, and information and communication theory, so the duality theory and algorithms we develop have wide application. In some of these applications, and for very simple forms of the problem, the max-det problems can be solved by specialized algorithms or, in some cases, analytically. Our interior-point algorithm will generally be slower than the specialized algorithms (when the specialized algorithms can be used). The advantage of our approach is that it is much more general; it handles a much wider variety of problems. The analytical solutions or specialized algorithms, for example, cannot handle the addition of convex constraints; our algorithm for general max-det problems does.

In the remainder of §1, we describe some interesting special cases of the max-det problem, such as semidefinite programming and analytic centering. In §2 we describe examples and applications of max-det problems, pointing out analytical solutions where they are known, and interesting extensions that can be handled as general max-det problems. In §3 we describe a duality theory for max-det problems, pointing out connections to semidefinite programming duality. Our interior-point method for solving the max-det problem (1.1) is developed in §4 through §9. We describe two variations: a simple 'short-step' method, for which we can prove polynomial worst-case complexity, and a 'long-step' or adaptive step predictor-corrector method which has the same worst-case complexity, but is much more efficient in practice. We finish with some numerical experiments. For the sake of brevity, we omit most proofs and some important numerical details, and refer the interested reader to the technical report [70]. A C implementation of the method described in this paper is also available [76].

Let us now describe some special cases of the max-det problem.

Semidefinite programming. When $G(x) = 1$, the max-det problem reduces to

\[
\begin{array}{ll}
\mbox{minimize} & c^Tx \\
\mbox{subject to} & F(x) \geq 0,
\end{array}
\tag{1.2}
\]

which is known as a semidefinite program (SDP). Semidefinite programming unifies a wide variety of convex optimization problems, e.g., linear programming,

\[
\begin{array}{ll}
\mbox{minimize} & c^Tx \\
\mbox{subject to} & Ax \leq b,
\end{array}
\]

which can be expressed as an SDP with $F(x) = \mathrm{diag}(b - Ax)$. For surveys of the theory and applications of semidefinite programming, see [1], [13], [46], [47], [51, §6.4], and [69].

Analytic centering. When $c = 0$ and $F(x) = 1$, the max-det problem (1.1) reduces to

\[
\begin{array}{ll}
\mbox{minimize} & \log\det G(x)^{-1} \\
\mbox{subject to} & G(x) > 0,
\end{array}
\tag{1.3}
\]

which we call the analytic centering problem. We will assume that the feasible set $\{x \mid G(x) > 0\}$ is nonempty and bounded, which implies that the matrices $G_i$, $i=1,\ldots,m$, are linearly independent, and that the objective $\phi(x) = \log\det G(x)^{-1}$ is strictly convex (see, e.g., [69] or [12]). Since the objective grows without bound as $x$ approaches the boundary of the feasible set, there is a unique solution $x^\star$ of (1.3). We call $x^\star$ the analytic center of the LMI $G(x) > 0$. The analytic center of an LMI generalizes the analytic center of a set of linear inequalities, introduced by Sonnevend [64, 65].

Since the constraint cannot be active at the analytic center, $x^\star$ is characterized by the optimality condition $\nabla\phi(x^\star) = 0$:

\[
\left(\nabla\phi(x^\star)\right)_i = -\mathrm{Tr}\, G_i\, G(x^\star)^{-1} = 0, \qquad i=1,\ldots,m
\tag{1.4}
\]

(see, for example, Boyd and El Ghaoui [12]).

The analytic center of an LMI is important for several reasons. We will see in §5 that the analytic center can be computed very efficiently, so it can be used as an easily computed robust solution of the LMI. Analytic centering also plays an important role in interior-point methods for solving the more general max-det problem (1.1). Roughly speaking, the interior-point methods solve the general problem by solving a sequence of analytic centering problems.

Parametrization of the LMI feasible set. Let us restore the term $c^Tx$:

\[
\begin{array}{ll}
\mbox{minimize} & c^Tx + \log\det G(x)^{-1} \\
\mbox{subject to} & G(x) > 0,
\end{array}
\tag{1.5}
\]

retaining our assumption that the feasible set $X = \{x \mid G(x) > 0\}$ is nonempty and bounded, so the matrices $G_i$ are linearly independent and the objective function is strictly convex. Thus, problem (1.5) has a unique solution $x^\star(c)$, which satisfies the optimality conditions $c + \nabla\phi(x^\star(c)) = 0$, i.e.,

\[
\mathrm{Tr}\, G_i\, G(x^\star(c))^{-1} = c_i, \qquad i=1,\ldots,m.
\]

Thus for each $c \in \mathbf{R}^m$ we have a readily computed point $x^\star(c)$ in the set $X$.

Conversely, given a point $x \in X$, define $c \in \mathbf{R}^m$ by $c_i = \mathrm{Tr}\, G(x)^{-1}G_i$, $i=1,\ldots,m$. Evidently we have $x = x^\star(c)$. In other words, there is a one-to-one correspondence between vectors $c \in \mathbf{R}^m$ and feasible vectors $x \in X$: the mapping $c \mapsto x^\star(c)$ is a parametrization of the feasible set $X$ of the strict LMI $G(x) > 0$, with parameter $c \in \mathbf{R}^m$. This parametrization of the set $X$ is related to the Legendre transform of the convex function $\log\det G(x)^{-1}$, defined by

\[
L(y) = \inf\left\{\, y^Tx + \log\det G(x)^{-1} \;\middle|\; G(x) > 0 \,\right\}.
\]

Maximal lower bounds in the positive definite cone. Here we consider a simple example of the max-det problem. Let $A_i = A_i^T$, $i=1,\ldots,L$, be positive definite matrices in $\mathbf{R}^{p\times p}$. A matrix $X$ is a lower bound of the matrices $A_i$ if $X \leq A_i$, $i=1,\ldots,L$; it is a maximal lower bound if there is no lower bound $Y$ with $Y \neq X$, $Y \geq X$.

Since the function $\log\det X^{-1}$ is monotone decreasing with respect to the positive semidefinite cone, i.e.,

\[
0 < X \leq Y \ \Longrightarrow\ \log\det Y^{-1} \leq \log\det X^{-1},
\]

we can compute a maximal lower bound $A_{\mathrm{mlb}}$ by solving

\[
\begin{array}{ll}
\mbox{minimize} & \log\det X^{-1} \\
\mbox{subject to} & X > 0 \\
& X \leq A_i, \quad i=1,\ldots,L.
\end{array}
\tag{1.6}
\]

This is a max-det problem with $p(p+1)/2$ variables (the elements of the matrix $X$) and $L$ LMI constraints $A_i - X \geq 0$, which we can also consider as diagonal blocks of one single block diagonal LMI

\[
\mathrm{diag}(A_1 - X,\ A_2 - X,\ \ldots,\ A_L - X) \geq 0.
\]

Of course there are other maximal lower bounds; replacing $\log\det X^{-1}$ by any other monotone decreasing matrix function, e.g., $-\mathrm{Tr}\,X$ or $\mathrm{Tr}\,X^{-1}$, will also yield (other) maximal lower bounds. The maximal lower bound $A_{\mathrm{mlb}}$ obtained by solving (1.6), however, has the property that it is invariant under congruence transformations, i.e., if the matrices $A_i$ are transformed to $TA_iT^T$, where $T \in \mathbf{R}^{p\times p}$ is nonsingular, then the maximal lower bound obtained from (1.6) is $TA_{\mathrm{mlb}}T^T$.
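As an illustration outside the original text, a minimal sketch of (1.6) for $L = 2$, assuming CVXPY is available and using randomly generated positive definite data:

```python
# Maximal lower bound of two positive definite matrices via (1.6); the data
# generation and the use of CVXPY are illustrative assumptions.
import numpy as np
import cvxpy as cp

p = 3
rng = np.random.default_rng(1)
rand_pd = lambda: (lambda M: M @ M.T + p * np.eye(p))(rng.standard_normal((p, p)))
A1, A2 = rand_pd(), rand_pd()

X = cp.Variable((p, p), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.log_det(X)), [A1 - X >> 0, A2 - X >> 0])
prob.solve()
A_mlb = X.value
# A_mlb <= A1 and A_mlb <= A2 in the semidefinite ordering:
print(np.linalg.eigvalsh(A1 - A_mlb).min(), np.linalg.eigvalsh(A2 - A_mlb).min())
```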

2. Examples and applications. In this section we catalog examples and applications. The reader interested only in duality theory and solution methods for the max-det problem can skip directly to §3.

2.1. Minimum volume ellipsoid containing given points. Perhaps the earliest and best known application of the max-det problem arises in the problem of determining the minimum volume ellipsoid that contains given points $x^{(1)}, \ldots, x^{(K)}$ in $\mathbf{R}^n$ (or, equivalently, their convex hull $\mathrm{Co}\{x^{(1)},\ldots,x^{(K)}\}$). This problem has applications in cluster analysis (Rosen [58], Barnes [9]) and robust statistics (in ellipsoidal peeling methods for outlier detection; see Rousseeuw and Leroy [59, §7]).

We describe the ellipsoid as $E = \{x \mid \|Ax + b\| \leq 1\}$, where $A = A^T > 0$, so the volume of $E$ is proportional to $\det A^{-1}$. Hence the minimum volume ellipsoid that contains the points $x^{(i)}$ can be computed by solving the convex problem

\[
\begin{array}{ll}
\mbox{minimize} & \log\det A^{-1} \\
\mbox{subject to} & \|Ax^{(i)} + b\| \leq 1, \quad i=1,\ldots,K \\
& A = A^T > 0,
\end{array}
\tag{2.1}
\]

where the variables are $A = A^T \in \mathbf{R}^{n\times n}$ and $b \in \mathbf{R}^n$. The norm constraints $\|Ax^{(i)} + b\| \leq 1$, which are just convex quadratic inequalities in the variables $A$ and $b$, can be expressed as LMIs

\[
\left[\begin{array}{cc} I & Ax^{(i)} + b \\ (Ax^{(i)} + b)^T & 1 \end{array}\right] \geq 0.
\]

These LMIs can in turn be expressed as one large block diagonal LMI, so (2.1) is a max-det problem in the variables $A$ and $b$.
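A minimal sketch of (2.1) (not from the paper; CVXPY and the random point cloud are assumptions):

```python
# Minimum-volume ellipsoid {x : ||Ax + b|| <= 1} containing K random points.
import numpy as np
import cvxpy as cp

n, K = 2, 30
rng = np.random.default_rng(2)
pts = rng.standard_normal((K, n))               # rows are the points x^(i)

A = cp.Variable((n, n), symmetric=True)
b = cp.Variable(n)
constraints = [cp.norm(A @ pts[i] + b) <= 1 for i in range(K)]
# minimizing log det A^{-1} is the same as maximizing log det A
prob = cp.Problem(cp.Maximize(cp.log_det(A)), constraints)
prob.solve()
print(A.value, b.value)
```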

Nesterov and Nemirovsky [51, §6.5] and Khachiyan and Todd [43] describe interior-point algorithms for computing the maximum-volume ellipsoid in a polyhedron described by linear inequalities (as well as the minimum-volume ellipsoid covering a polytope described by its vertices).

Many other geometrical problems involving ellipsoidal approximations can be formulated as max-det problems. References [13, §3.7], [16], and [68] give several examples, including the maximum volume ellipsoid contained in the intersection or in the sum of given ellipsoids, and the minimum volume ellipsoid containing the sum of given ellipsoids. For other ellipsoidal approximation problems, suboptimal solutions can be computed via max-det problems.

Ellipsoidal approximations of convex sets are used in control theory and signal processing in bounded-noise or set-membership techniques. These techniques were first introduced for state estimation (see, e.g., Schweppe [62, 63], Witsenhausen [75], Bertsekas and Rhodes [11], Chernousko [16, 17]) and later applied to system identification (Fogel and Huang [35, 36], Norton [52, 53, §8.6], Walter and Piet-Lahanier [71], Cheung, Yurkovich, and Passino [18]) and signal processing (Deller [23]). For a survey emphasizing signal processing applications, see Deller et al. [24].

Other applications include the method of inscribed ellipsoids developed by Tarasov, Khachiyan, and Erlikh [66], and design centering (Sapatnekar [60]).

2.2. Matrix completion problems.

Positive definite matrix completion. In a positive definite matrix completion problem we are given a symmetric matrix $A_f \in \mathbf{R}^{n\times n}$, some entries of which are fixed; the remaining entries are to be chosen so that the resulting matrix is positive definite.

Let the positions of the free (unspecified) entries be given by the index pairs $(i_k, j_k)$, $(j_k, i_k)$, $k=1,\ldots,m$. We can assume that the diagonal elements are fixed, i.e., $i_k \neq j_k$ for all $k$. (If a diagonal element, say the $(l,l)$th, is free, we take it to be very large, which makes the $l$th row and column of $A_f$ irrelevant.) The positive definite completion problem can be cast as an SDP feasibility problem:

\[
\mbox{find } x \in \mathbf{R}^m \quad \mbox{such that} \quad
A(x) = A_f + \sum_{k=1}^m x_k\left(E_{i_kj_k} + E_{j_ki_k}\right) > 0,
\]

where $E_{ij}$ denotes the matrix with all elements zero except the $(i,j)$ element, which is equal to one. Note that the set $\{x \mid A(x) > 0\}$ is bounded since the diagonal elements of $A(x)$ are fixed.

Maximum entropy completion. The analytic center of the LMI $A(x) > 0$ is sometimes called the maximum entropy completion of $A_f$. From the optimality conditions (1.4), we see that the maximum entropy completion $x^\star$ satisfies

\[
2\,\mathrm{Tr}\, E_{i_kj_k}A(x^\star)^{-1} = 2\left(A(x^\star)^{-1}\right)_{i_kj_k} = 0, \qquad k=1,\ldots,m,
\]

i.e., the matrix $A(x^\star)^{-1}$ has a zero entry in every location corresponding to an unspecified entry in the original matrix. This is a very useful property in many applications; see, for example, Dempster [27] or Dewilde and Ning [30].
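A small sketch of this property (the data, the free pattern, and the use of CVXPY are assumptions made for illustration only):

```python
# Maximum entropy completion of a partially specified symmetric matrix:
# maximize log det X subject to the fixed entries; the inverse of the
# completion is then zero in the free positions.
import numpy as np
import cvxpy as cp

Af = np.array([[2.0, 1.0, 0.0],
               [1.0, 3.0, 1.0],
               [0.0, 1.0, 2.0]])
free = [(0, 2)]                           # unspecified off-diagonal position

X = cp.Variable((3, 3), symmetric=True)
fixed = [X[i, j] == Af[i, j] for i in range(3) for j in range(3)
         if i <= j and (i, j) not in free]
cp.Problem(cp.Maximize(cp.log_det(X)), fixed).solve()
print(np.linalg.inv(X.value)[0, 2])       # numerically zero
```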

Parametrization of all positive definite completions. As an extension of the maximum entropy completion problem, consider

\[
\begin{array}{ll}
\mbox{minimize} & \mathrm{Tr}\, CA(x) + \log\det A(x)^{-1} \\
\mbox{subject to} & A(x) > 0,
\end{array}
\tag{2.2}
\]

where $C = C^T$ is given. This problem is of the form (1.5); the optimality conditions are

\[
A(x^\star) > 0, \qquad \left(A(x^\star)^{-1}\right)_{i_kj_k} = C_{i_kj_k}, \qquad k=1,\ldots,m,
\tag{2.3}
\]

i.e., the inverse of the optimal completion matches the given matrix $C$ in every free entry. Indeed, this gives a parametrization of all positive definite completions: a positive definite completion $A(x)$ is uniquely characterized by specifying the elements of its inverse in the free locations, i.e., $\left(A(x)^{-1}\right)_{i_kj_k}$. Problem (2.2) has been studied by Bakonyi and Woerdeman [8].

Contractive completion. A related problem is the contractive completion problem: given a (possibly nonsymmetric) matrix $A_f$ and $m$ index pairs $(i_k, j_k)$, $k=1,\ldots,m$, find a matrix

\[
A(x) = A_f + \sum_{k=1}^m x_kE_{i_kj_k}
\]

with spectral norm (maximum singular value) less than one.

This can be cast as a semidefinite programming feasibility problem [69]: find $x$ such that

\[
\left[\begin{array}{cc} I & A(x) \\ A(x)^T & I \end{array}\right] > 0.
\tag{2.4}
\]

One can define a maximum entropy solution as the solution that maximizes the determinant of (2.4), i.e., solves the max-det problem

\[
\begin{array}{ll}
\mbox{maximize} & \log\det\left(I - A(x)^TA(x)\right) \\
\mbox{subject to} & \left[\begin{array}{cc} I & A(x) \\ A(x)^T & I \end{array}\right] > 0.
\end{array}
\tag{2.5}
\]

See Nævdal and Woerdeman [50] and Helton and Woerdeman [38]. For a statistical interpretation of (2.5), see §2.3.

Specialized algorithms and references. Very efficient algorithms have been developed for certain specialized types of completion problems. A well known example is the maximum entropy completion of a positive definite banded Toeplitz matrix (Dym and Gohberg [31], Dewilde and Deprettere [29]). Davis, Kahan, and Weinberger [22] discuss an analytic solution for a contractive completion problem with a special block matrix form. The methods discussed in this paper solve the general problem efficiently, although they are slower than the specialized algorithms where they are applicable. Moreover, they have the advantage that other convex constraints, e.g., upper and lower bounds on certain entries, are readily incorporated.

Completion problems, and specialized algorithms for computing completions, have been discussed by many authors; see, e.g., Dym and Gohberg [31], Grone, Johnson, Sá, and Wolkowicz [37], Barrett, Johnson, and Lundquist [10], Lundquist and Johnson [49], Dewilde and Deprettere [29], and Dembo, Mallows, and Shepp [26]. Johnson gives a survey in [41]. An interior-point method for an approximate completion problem is discussed in Johnson, Kroschel, and Wolkowicz [42].

We refer to Boyd et al. [13, §3.5] and El Ghaoui [32] for further discussion and additional references.

2.3. Risk-averse linear estimation. Let $y = Ax + w$, with $w \sim N(0, I)$ and $A \in \mathbf{R}^{q\times p}$. Here $x$ is an unknown quantity that we wish to estimate, $y$ is the measurement, and $w$ is the measurement noise. We assume that $p \leq q$ and that $A$ has full column rank.

A linear estimator $\widehat{x} = My$, with $M \in \mathbf{R}^{p\times q}$, is unbiased if $\mathbf{E}\widehat{x} = x$ (where $\mathbf{E}$ means expected value), i.e., the estimator is unbiased if $MA = I$. The minimum-variance unbiased estimator is the unbiased estimator that minimizes the error variance

\[
\mathbf{E}\|My - x\|^2 = \mathrm{Tr}\, MM^T = \sum_{i=1}^p \sigma_i(M)^2,
\]

where $\sigma_i(M)$ is the $i$th largest singular value of $M$. It is given by $M = A^+$, where $A^+ = (A^TA)^{-1}A^T$ is the pseudo-inverse of $A$. In fact the minimum-variance estimator is optimal in a stronger sense: it not only minimizes $\sum_i \sigma_i(M)^2$, but each singular value $\sigma_i(M)$ separately:

\[
MA = I \ \Longrightarrow\ \sigma_i(A^+) \leq \sigma_i(M), \qquad i=1,\ldots,p.
\tag{2.6}
\]

In some applications estimation errors larger than the mean value are more costly, or less desirable, than errors less than the mean value. To capture this idea of risk aversion we can consider the objective (or cost) function

\[
2\gamma^2\log \mathbf{E}\exp\left(\frac{1}{2\gamma^2}\|My - x\|^2\right),
\tag{2.7}
\]

where the parameter $\gamma$ is called the risk-sensitivity parameter. This cost function was introduced by Whittle in the more sophisticated setting of stochastic optimal control; see [72, §19]. Note that as $\gamma \to \infty$, the risk-sensitive cost (2.7) converges to the cost $\mathbf{E}\|My - x\|^2$, and is always larger (by convexity of exp). We can gain further insight from the first terms of the series expansion in $1/\gamma^2$:

\[
2\gamma^2\log \mathbf{E}\exp\left(\frac{1}{2\gamma^2}\|\widehat{x} - x\|^2\right)
\;\simeq\; \mathbf{E}\|\widehat{x} - x\|^2 + \frac{1}{4\gamma^2}\left(\mathbf{E}\|\widehat{x} - x\|^4 - \left(\mathbf{E}\|\widehat{x} - x\|^2\right)^2\right)
\;=\; \mathbf{E}z + \frac{1}{4\gamma^2}\,\mathbf{var}\,z,
\]

where $z = \|\widehat{x} - x\|^2$ is the squared error. Thus for large $\gamma$, the risk-averse cost (2.7) augments the mean-square error with a term proportional to the variance of the squared error.

The unbiased, risk-averse optimal estimator can be found by solving

\[
\begin{array}{ll}
\mbox{minimize} & 2\gamma^2\log \mathbf{E}\exp\left(\frac{1}{2\gamma^2}\|My - x\|^2\right) \\
\mbox{subject to} & MA = I,
\end{array}
\]

which can be expressed as a max-det problem. The objective function can be written as

\[
\begin{array}{rl}
2\gamma^2\log \mathbf{E}\exp\left(\dfrac{1}{2\gamma^2}\|My - x\|^2\right)
& = 2\gamma^2\log \mathbf{E}\exp\left(\dfrac{1}{2\gamma^2}\, w^TM^TMw\right) \\[1ex]
& = \left\{\begin{array}{ll}
2\gamma^2\log\det\left(I - (1/\gamma^2)M^TM\right)^{-1/2} & \mbox{if } M^TM < \gamma^2I \\
+\infty & \mbox{otherwise}
\end{array}\right. \\[2ex]
& = \left\{\begin{array}{ll}
\gamma^2\log\det\left[\begin{array}{cc} I & \gamma^{-1}M^T \\ \gamma^{-1}M & I \end{array}\right]^{-1}
& \mbox{if } \left[\begin{array}{cc} I & \gamma^{-1}M^T \\ \gamma^{-1}M & I \end{array}\right] > 0 \\
+\infty & \mbox{otherwise,}
\end{array}\right.
\end{array}
\]

so the unbiased risk-averse optimal estimator solves the max-det problem

\[
\begin{array}{ll}
\mbox{minimize} & \log\det\left[\begin{array}{cc} I & \gamma^{-1}M^T \\ \gamma^{-1}M & I \end{array}\right]^{-1} \\
\mbox{subject to} & \left[\begin{array}{cc} I & \gamma^{-1}M^T \\ \gamma^{-1}M & I \end{array}\right] > 0 \\
& MA = I.
\end{array}
\tag{2.8}
\]

This is in fact an analytic centering problem, and has a simple analytic solution: the least-squares estimator $M = A^+$. To see this we express the objective in terms of the singular values of $M$:

\[
\log\det\left[\begin{array}{cc} I & \gamma^{-1}M^T \\ \gamma^{-1}M & I \end{array}\right]^{-1}
= \left\{\begin{array}{ll}
\displaystyle -\sum_{i=1}^p \log\left(1 - \sigma_i(M)^2/\gamma^2\right) & \mbox{if } \sigma_1(M) < \gamma \\
+\infty & \mbox{otherwise.}
\end{array}\right.
\]

It follows from property (2.6) that the solution is $M = A^+$ if $\|A^+\| < \gamma$, and that the problem is infeasible otherwise. (Whittle refers to the infeasible case, in which the risk-averse cost is always infinite, as 'neurotic breakdown'.)

In the simple case discussed above, the optimal risk-averse and the minimum-variance estimators coincide, so there is certainly no advantage in a max-det problem formulation. When additional convex constraints on the matrix $M$ are added, e.g., a given sparsity pattern, or triangular or Toeplitz structure, the optimal risk-averse estimator can be found by including these constraints in the max-det problem (2.8), and will not, in general, coincide with the constrained minimum-variance estimator.
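A quick numerical sanity check of the objective identity above (numpy only; the random data are made up for illustration):

```python
# For sigma_max(M) < gamma, -log det of the 2x2 block matrix built from M
# equals -sum_i log(1 - sigma_i(M)^2 / gamma^2).
import numpy as np

rng = np.random.default_rng(3)
p, q, gamma = 2, 4, 3.0
A = rng.standard_normal((q, p))
M = np.linalg.pinv(A)                       # least-squares estimator A^+
block = np.block([[np.eye(p), M / gamma],
                  [M.T / gamma, np.eye(q)]])
lhs = -np.linalg.slogdet(block)[1]
sig = np.linalg.svd(M, compute_uv=False)
rhs = -np.sum(np.log(1 - sig**2 / gamma**2))
print(np.allclose(lhs, rhs))                # True when sigma_max(M) < gamma
```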

2.4. Experiment design.

Optimal experiment design. As in the previous section, we consider the problem of estimating a vector $x$ from a measurement $y = Ax + w$, where $w \sim N(0, I)$ is measurement noise. The error covariance of the minimum-variance estimator is equal to $A^+(A^+)^T = (A^TA)^{-1}$. We suppose that the rows $a_i^T$ of the matrix $A = [a_1\ \cdots\ a_q]^T$ can be chosen among $M$ possible test vectors $v^{(i)} \in \mathbf{R}^p$, $i=1,\ldots,M$:

\[
a_i \in \{v^{(1)}, \ldots, v^{(M)}\}, \qquad i=1,\ldots,q.
\]

The goal of experiment design is to choose the vectors $a_i$ so that the error covariance $(A^TA)^{-1}$ is 'small'. We can interpret each component of $y$ as the result of an experiment or measurement that can be chosen from a fixed menu of possible experiments; our job is to find a set of measurements that together are maximally informative.

We can write $A^TA = q\sum_{i=1}^M \lambda_i v^{(i)}v^{(i)T}$, where $\lambda_i$ is the fraction of rows $a_k$ equal to the vector $v^{(i)}$. We ignore the fact that the numbers $\lambda_i$ are integer multiples of $1/q$, and instead treat them as continuous variables, which is justified in practice when $q$ is large. (Alternatively, we can imagine that we are designing a random experiment: each experiment $a_i$ has the form $v^{(k)}$ with probability $\lambda_k$.)

Many different criteria for measuring the size of the matrix $(A^TA)^{-1}$ have been proposed. For example, in $E$-optimal design, we minimize the norm of the error covariance, $\sigma_{\max}\left((A^TA)^{-1}\right)$, which is equivalent to maximizing the smallest eigenvalue

of $A^TA$. This is readily cast as the SDP

\[
\begin{array}{ll}
\mbox{maximize} & t \\
\mbox{subject to} & \displaystyle\sum_{i=1}^M \lambda_i v^{(i)}v^{(i)T} \geq tI, \\
& \displaystyle\sum_{i=1}^M \lambda_i = 1, \\
& \lambda_i \geq 0, \quad i=1,\ldots,M,
\end{array}
\]

in the variables $\lambda_1,\ldots,\lambda_M$ and $t$. Another criterion is $A$-optimality, in which we minimize $\mathrm{Tr}\,(A^TA)^{-1}$. This can be cast as an SDP:

\[
\begin{array}{ll}
\mbox{minimize} & \displaystyle\sum_{i=1}^p t_i \\
\mbox{subject to} & \left[\begin{array}{cc} \displaystyle\sum_{i=1}^M \lambda_i v^{(i)}v^{(i)T} & e_i \\ e_i^T & t_i \end{array}\right] \geq 0, \quad i=1,\ldots,p, \\
& \lambda_i \geq 0, \quad i=1,\ldots,M, \\
& \displaystyle\sum_{i=1}^M \lambda_i = 1,
\end{array}
\]

where $e_i$ is the $i$th unit vector in $\mathbf{R}^p$, and the variables are $\lambda_i$, $i=1,\ldots,M$, and $t_i$, $i=1,\ldots,p$.

In $D$-optimal design, we minimize the determinant of the error covariance $(A^TA)^{-1}$, which leads to the max-det problem

\[
\begin{array}{ll}
\mbox{minimize} & \log\det\left(\displaystyle\sum_{i=1}^M \lambda_i v^{(i)}v^{(i)T}\right)^{-1} \\
\mbox{subject to} & \lambda_i \geq 0, \quad i=1,\ldots,M, \\
& \displaystyle\sum_{i=1}^M \lambda_i = 1.
\end{array}
\tag{2.9}
\]

In §3 we will derive an interesting geometrical interpretation of the $D$-optimal matrix $A$, and show that $A^TA$ determines the minimum volume ellipsoid, centered at the origin, that contains $v^{(1)}, \ldots, v^{(M)}$.
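A minimal sketch of (2.9) for a random menu of test vectors (CVXPY and the data are assumptions; the paper itself uses the interior-point method of §4-§9):

```python
# D-optimal experiment design (2.9): maximize log det of the information
# matrix over the probability simplex.
import numpy as np
import cvxpy as cp

p, M = 2, 20
rng = np.random.default_rng(4)
V = rng.standard_normal((M, p))                 # rows are the test vectors

lam = cp.Variable(M, nonneg=True)
info = sum(lam[i] * np.outer(V[i], V[i]) for i in range(M))
cp.Problem(cp.Maximize(cp.log_det(info)), [cp.sum(lam) == 1]).solve()
print(np.round(lam.value, 3))   # typically only a few weights are nonzero
```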

Fedorov [33], Atkinson and Donev [7], Pukelsheim [55], and Cook and Fedorov [19] give surveys and additional references on optimal experiment design. Wilhelm [74, 73] discusses nondifferentiable optimization methods for experiment design. Jávorzky et al. [40] describe an application in frequency domain system identification and compare the interior-point method discussed later in this paper with conventional algorithms. Lee et al. [45, 5] discuss a non-convex experiment design problem and a relaxation solved by an interior-point method.

Extensions of D-optimal experiment design. The formulation of $D$-optimal design as a max-det problem has the advantage that one can easily incorporate additional useful convex constraints. For example, one can add linear inequalities $c_i^T\lambda \leq \beta_i$, which can reflect bounds on the total cost of, or time required to carry out, the experiments.

Fig. 2.1. A D-optimal experiment design involving 50 test vectors in $\mathbf{R}^2$, with and without the 90-10 constraint. The circle is the origin; the dots are the test vectors that are not used in the experiment (i.e., have a weight $\lambda_i = 0$); the crosses are the test vectors that are used (i.e., have a weight $\lambda_i > 0$). Without the 90-10 constraint, the optimal design allocates all measurements to only two test vectors. With the constraint, the measurements are spread over ten vectors, with no more than 90% of the measurements allocated to any group of five vectors. See also Figure 2.2.

We can also consider the case where each experiment yields several measurements, i.e., the vectors $a_i$ and $v^{(k)}$ become matrices. The max-det problem formulation (2.9) remains the same, except that the terms $v^{(k)}v^{(k)T}$ can now have rank larger than one. This extension is useful in conjunction with additional linear inequalities representing limits on cost or time: we can model discounts or time savings associated with performing groups of measurements simultaneously. Suppose, for example, that the cost of simultaneously making measurements $v^{(1)}$ and $v^{(2)}$ is less than the sum of the costs of making them separately. We can take $v^{(3)}$ to be the matrix

\[
v^{(3)} = \left[\begin{array}{cc} v^{(1)} & v^{(2)} \end{array}\right]
\]

and assign costs $c_1$, $c_2$, and $c_3$ associated with making the first measurement alone, the second measurement alone, and the two simultaneously, respectively.

Let us describe in more detail another useful additional constraint that can be imposed: that no more than a certain fraction of the total number of experiments, say 90%, is concentrated in less than a given fraction, say 10%, of the possible measurements. Thus we require

\[
\sum_{i=1}^{\lfloor M/10\rfloor} \lambda_{[i]} \leq 0.9,
\tag{2.10}
\]

where $\lambda_{[i]}$ denotes the $i$th largest component of $\lambda$. The effect on the experiment design will be to spread out the measurements over more points (at the cost of increasing the determinant of the error covariance). See Figures 2.1 and 2.2.

The constraint (2.10) is convex; it is satisfied if and only if there exist $x \in \mathbf{R}^M$ and $t$ such that

\[
\lfloor M/10\rfloor\, t + \sum_{i=1}^M x_i \leq 0.9, \qquad
t + x_i \geq \lambda_i, \quad i=1,\ldots,M, \qquad
x \geq 0
\tag{2.11}
\]

(see [14, p. 318]). One can therefore compute the $D$-optimal design subject to the 90-10 constraint (2.10) by adding the linear inequalities (2.11) to the constraints in (2.9) and solving the resulting max-det problem in the variables $\lambda$, $x$, and $t$.

Fig. 2.2. Experiment design of Figure 2.1. The curves show the sum of the largest $k$ components of $\lambda$ as a function of $k$, without the 90-10 constraint and with the constraint. The constraint specifies that the sum of the largest five components should be less than 0.9, i.e., the curve should avoid the area inside the dashed rectangle.
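A sketch of how the 90-10 constraint can be added to the design problem (CVXPY and the random data are assumptions):

```python
# D-optimal design with the 90-10 constraint, expressed through the linear
# inequalities (2.11) in the auxiliary variables x and t.
import numpy as np
import cvxpy as cp

p, M = 2, 50
rng = np.random.default_rng(5)
V = rng.standard_normal((M, p))

lam = cp.Variable(M, nonneg=True)
x = cp.Variable(M, nonneg=True)
t = cp.Variable()
info = sum(lam[i] * np.outer(V[i], V[i]) for i in range(M))
k = M // 10
constraints = [cp.sum(lam) == 1,
               k * t + cp.sum(x) <= 0.9,          # (2.11)
               t + x >= lam]
# (equivalently, cp.sum_largest(lam, k) <= 0.9 expresses (2.10) directly)
cp.Problem(cp.Maximize(cp.log_det(info)), constraints).solve()
print(np.sort(lam.value)[::-1][:k].sum())          # should be <= 0.9
```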

2.5. Maximum likelihood estimation of structured covariance matrices. The next example is the maximum likelihood (ML) estimation of structured covariance matrices of a normal distribution. This problem has a long history; see, e.g., Anderson [2, 3].

Let $y^{(1)}, \ldots, y^{(M)}$ be $M$ samples from a normal distribution $N(0, \Sigma)$. The ML estimate for $\Sigma$ is the positive definite matrix that maximizes the log-likelihood function $\sum_{i=1}^M \log p(y^{(i)})$, where

\[
p(x) = (2\pi)^{-p/2}(\det\Sigma)^{-1/2}\exp\left(-\tfrac{1}{2}x^T\Sigma^{-1}x\right).
\]

In other words, $\Sigma$ can be found by solving

\[
\begin{array}{ll}
\mbox{maximize} & \log\det\Sigma^{-1} - \dfrac{1}{M}\displaystyle\sum_{i=1}^M y^{(i)T}\Sigma^{-1}y^{(i)} \\
\mbox{subject to} & \Sigma > 0.
\end{array}
\tag{2.12}
\]

This can be expressed as a max-det problem in the inverse $R = \Sigma^{-1}$:

\[
\begin{array}{ll}
\mbox{minimize} & \mathrm{Tr}\, SR + \log\det R^{-1} \\
\mbox{subject to} & R > 0,
\end{array}
\tag{2.13}
\]

where $S = \dfrac{1}{M}\displaystyle\sum_{i=1}^M y^{(i)}y^{(i)T}$. Problem (2.13) has the straightforward analytical solution $R = S^{-1}$ (provided $S$ is nonsingular).

It is often useful to impose additional structure on the covariance matrix $\Sigma$ or its inverse $R$ (Anderson [2, 3], Burg, Luenberger, and Wenger [15], Scharf [61, §6.13], Dembo [25]). In some special cases (e.g., $\Sigma$ circulant) analytical solutions are known; in other cases where the constraints can be expressed as LMIs in $R$, the ML estimate can be obtained from a max-det problem. To give a simple illustration, bounds on the variances $\Sigma_{ii}$ can be expressed as LMIs in $R$:

\[
\Sigma_{ii} = e_i^TR^{-1}e_i \leq \alpha
\quad\Longleftrightarrow\quad
\left[\begin{array}{cc} R & e_i \\ e_i^T & \alpha \end{array}\right] \geq 0.
\]

The formulation as a max-det problem is also useful when the matrix $S$ is singular (for example, because the number of samples is too small) and, as a consequence, the max-det problem (2.13) is unbounded below. In this case we can impose constraints (i.e., prior information) on $\Sigma$, for example lower and upper bounds on the diagonal elements of $R$.
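A minimal sketch of (2.13) with variance bounds imposed through the LMIs above (CVXPY, the sample data, and the bound value are assumptions):

```python
# ML estimate of R = Sigma^{-1} with an upper bound alpha on each Sigma_ii.
import numpy as np
import cvxpy as cp

p, N, alpha = 3, 10, 2.0
rng = np.random.default_rng(6)
Y = rng.standard_normal((N, p))
S = Y.T @ Y / N

R = cp.Variable((p, p), symmetric=True)
lmis = []
for i in range(p):
    ei = np.zeros((p, 1)); ei[i] = 1.0
    lmis.append(cp.bmat([[R, ei], [ei.T, np.array([[alpha]])]]) >> 0)
cp.Problem(cp.Minimize(cp.trace(S @ R) - cp.log_det(R)), lmis).solve()
Sigma = np.linalg.inv(R.value)
print(np.diag(Sigma))                 # each diagonal entry should be <= alpha
```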

2.6. Gaussian channel capacity.

The Gaussian channel and the water-filling algorithm. The entropy of a normal distribution $N(\mu, \Sigma)$ is, up to a constant, equal to $\frac{1}{2}\log\det\Sigma$ (see Cover and Thomas [21, Chapter 9]). It is therefore not surprising that max-det problems arise naturally in information theory and communications. One example is the computation of channel capacity.

Consider a simple Gaussian communication channel: $y = x + v$, where $y$, $x$, and $v$ are random vectors in $\mathbf{R}^n$; $x \sim N(0, X)$ is the input, $y$ is the output, and $v \sim N(0, R)$ is additive noise, independent of $x$. This model can represent $n$ parallel channels, or one single channel at $n$ different time instants or $n$ different frequencies.

We assume the noise covariance $R$ is known and given; the input covariance $X$ is the variable to be determined, subject to constraints (such as power limits) that we will describe below. Our goal is to maximize the mutual information between input and output, given by

\[
\frac{1}{2}\left(\log\det(X + R) - \log\det R\right)
= \frac{1}{2}\log\det\left(I + R^{-1/2}XR^{-1/2}\right)
\]

(see [21]). The channel capacity is defined as the maximum mutual information over all input covariances $X$ that satisfy the constraints. Thus, the channel capacity depends on $R$ and the constraints.

The simplest and most common constraint is a limit on the average total power in the input, i.e.,

\[
\mathbf{E}\, x^Tx/n = \mathrm{Tr}\, X/n \leq P.
\tag{2.14}
\]

The information capacity subject to this average power constraint is the optimal value of

\[
\begin{array}{ll}
\mbox{maximize} & \frac{1}{2}\log\det\left(I + R^{-1/2}XR^{-1/2}\right) \\
\mbox{subject to} & \mathrm{Tr}\, X \leq nP \\
& X \geq 0
\end{array}
\tag{2.15}
\]

(see [21, §10]). This is a max-det problem in the variable $X = X^T$.

There is a straightforward solution to (2.15), known in information theory as the water-filling algorithm (see [21, §10], [20]). Let $R = V\Sigma V^T$ be the eigenvalue decomposition of $R$. By introducing a new variable $\widetilde{X} = V^TXV$, we can rewrite the problem as

\[
\begin{array}{ll}
\mbox{maximize} & \frac{1}{2}\log\det\left(I + \Sigma^{-1/2}\widetilde{X}\Sigma^{-1/2}\right) \\
\mbox{subject to} & \mathrm{Tr}\,\widetilde{X} \leq nP \\
& \widetilde{X} \geq 0.
\end{array}
\]

Since the off-diagonal elements of $\widetilde{X}$ do not appear in the constraints, but decrease the objective, the optimal $\widetilde{X}$ is diagonal. Using Lagrange multipliers one can show that the solution is $\widetilde{X}_{ii} = \max\{\lambda - \sigma_i,\ 0\}$, $i=1,\ldots,n$, where the Lagrange multiplier $\lambda$ is to be determined from $\sum_i \widetilde{X}_{ii} = nP$. The term 'water-filling' refers to a visual description of this procedure (see [21, §10], [20]).
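A small numpy sketch of the water-filling procedure just described (the random noise covariance is made up for illustration; the water level is found by bisection):

```python
# Water-filling for (2.15): X_ii = max(lambda - sigma_i, 0) in the noise
# eigenbasis, with lambda chosen so that trace(X) = nP.
import numpy as np

rng = np.random.default_rng(7)
n, P = 5, 1.0
B = rng.standard_normal((n, n))
R = B @ B.T + 0.1 * np.eye(n)
sigma, V = np.linalg.eigh(R)            # noise eigenvalues sigma_i

total = lambda lam: np.sum(np.maximum(lam - sigma, 0.0))
lo, hi = 0.0, sigma.max() + n * P       # bracket for the water level
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if total(mid) < n * P else (lo, mid)
lam = 0.5 * (lo + hi)
X = V @ np.diag(np.maximum(lam - sigma, 0.0)) @ V.T
print(np.trace(X), n * P)               # equal up to bisection accuracy
```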

Average power constraints on each channel. Problem (2.15) can be extended and modified in many ways. For example, we can replace the average total power constraint by an average power constraint on the individual channels, i.e., we can replace (2.14) by $\mathbf{E}\,x_k^2 = X_{kk} \leq P$, $k=1,\ldots,n$. The capacity subject to this constraint can be determined by solving the max-det problem

\[
\begin{array}{ll}
\mbox{maximize} & \frac{1}{2}\log\det\left(I + R^{-1/2}XR^{-1/2}\right) \\
\mbox{subject to} & X \geq 0 \\
& X_{kk} \leq P, \quad k=1,\ldots,n.
\end{array}
\]

The water-filling algorithm does not apply here, but the capacity is readily computed by solving this max-det problem in $X$. Moreover, we can easily add other constraints, such as power limits on subsets of individual channels, or an upper bound on the correlation coefficient between two components of $x$:

\[
\frac{|X_{ij}|}{\sqrt{X_{ii}X_{jj}}} \leq \rho_{\max}
\quad\Longleftrightarrow\quad
\left[\begin{array}{cc} \rho_{\max}X_{ii} & X_{ij} \\ X_{ij} & \rho_{\max}X_{jj} \end{array}\right] \geq 0.
\]

Gaussian channel capacity with feedback. Suppose that the $n$ components of $x$, $y$, and $v$ are consecutive values in a time series. The question whether knowledge of the past values $v_k$ helps in increasing the capacity of the channel is of great interest in information theory [21, §10.6]. In the Gaussian channel with feedback one uses, instead of $x$, the vector $\widetilde{x} = Bv + x$ as input to the channel, where $B$ is a strictly lower triangular matrix. The output of the channel is $y = \widetilde{x} + v = x + (B + I)v$. We assume there is an average total power constraint: $\mathbf{E}\,\widetilde{x}^T\widetilde{x}/n \leq P$.

The mutual information between $\widetilde{x}$ and $y$ is

\[
\frac{1}{2}\left(\log\det\left((B + I)R(B + I)^T + X\right) - \log\det R\right),
\]

so we maximize the mutual information by solving

\[
\begin{array}{ll}
\mbox{maximize} & \frac{1}{2}\left(\log\det\left((B + I)R(B + I)^T + X\right) - \log\det R\right) \\
\mbox{subject to} & \mathrm{Tr}\left(BRB^T + X\right) \leq nP \\
& X \geq 0 \\
& B \mbox{ strictly lower triangular}
\end{array}
\]

over the matrix variables $B$ and $X$. To cast this problem as a max-det problem, we introduce a new variable $Y = (B + I)R(B + I)^T + X$ (i.e., the covariance of $y$) and obtain

\[
\begin{array}{ll}
\mbox{maximize} & \log\det Y \\
\mbox{subject to} & \mathrm{Tr}\left(Y - RB^T - BR - R\right) \leq nP \\
& Y - (B + I)R(B + I)^T \geq 0 \\
& B \mbox{ strictly lower triangular.}
\end{array}
\tag{2.16}
\]

The second constraint can be expressed as an LMI in $B$ and $Y$,

\[
\left[\begin{array}{cc} Y & B + I \\ (B + I)^T & R^{-1} \end{array}\right] \geq 0,
\]

so (2.16) is a max-det problem in $B$ and $Y$.

Capacity of channel with cross-talk. Suppose the $n$ channels are independent, i.e., all covariances are diagonal, and that the noise covariance depends on $X$: $R_{ii} = r_i + a_iX_{ii}$, with $a_i > 0$. This has been used as a model of near-end cross-talk (see [6]). The capacity with the total average power constraint is the optimal value of

\[
\begin{array}{ll}
\mbox{maximize} & \displaystyle\sum_{i=1}^n \frac{1}{2}\log\left(1 + \frac{X_{ii}}{r_i + a_iX_{ii}}\right) \\
\mbox{subject to} & X_{ii} \geq 0, \quad i=1,\ldots,n \\
& \displaystyle\sum_{i=1}^n X_{ii} \leq nP,
\end{array}
\]

which can be cast as a max-det problem

\[
\begin{array}{ll}
\mbox{maximize} & \displaystyle\sum_{i=1}^n \frac{1}{2}\log(1 + t_i) \\
\mbox{subject to} & X_{ii} \geq 0, \quad t_i \geq 0, \quad i=1,\ldots,n, \\
& \left[\begin{array}{cc} 1 - a_it_i & \sqrt{r_i} \\ \sqrt{r_i} & a_iX_{ii} + r_i \end{array}\right] \geq 0, \quad i=1,\ldots,n, \\
& \displaystyle\sum_{i=1}^n X_{ii} \leq nP.
\end{array}
\]

The LMI is equivalent to $t_i \leq X_{ii}/(r_i + a_iX_{ii})$. This problem can be solved using standard methods; the advantage of a max-det problem formulation is that we can add other (LMI) constraints on $X$, e.g., individual power limits. As another interesting possibility, we could impose constraints that distribute the power across the channels more uniformly, e.g., a 90-10 type constraint (see §2.4).

3. The dual problem. We associate with (1.1) the dual problem

\[
\begin{array}{ll}
\mbox{maximize} & \log\det W - \mathrm{Tr}\, G_0W - \mathrm{Tr}\, F_0Z + l \\
\mbox{subject to} & \mathrm{Tr}\, G_iW + \mathrm{Tr}\, F_iZ = c_i, \quad i=1,\ldots,m, \\
& W = W^T > 0, \quad Z = Z^T \geq 0.
\end{array}
\tag{3.1}
\]

The variables are $W \in \mathbf{R}^{l\times l}$ and $Z \in \mathbf{R}^{n\times n}$. Problem (3.1) is also a max-det problem, and can be converted into a problem of the form (1.1) by elimination of the equality constraints.

We say $W$ and $Z$ are dual feasible if they satisfy the constraints in (3.1), and strictly dual feasible if, in addition, $Z > 0$. We also refer to the max-det problem (1.1) as the primal problem and say $x$ is primal feasible if $F(x) \geq 0$ and $G(x) > 0$, and strictly primal feasible if $F(x) > 0$ and $G(x) > 0$.

Let $p^\star$ and $d^\star$ be the optimal values of problems (1.1) and (3.1), respectively (with the convention that $p^\star = +\infty$ if the primal problem is infeasible, and $d^\star = -\infty$ if the dual problem is infeasible).

The optimization problem (3.1) is the Lagrange dual of problem (1.1), rewritten as

\[
\begin{array}{ll}
\mbox{minimize} & c^Tx + \log\det X^{-1} \\
\mbox{subject to} & X = G(x) \\
& F(x) \geq 0, \quad X > 0.
\end{array}
\]

(We introduce a new variable $X = X^T \in \mathbf{R}^{l\times l}$ and add an equality constraint.) To derive the dual problem, we associate a Lagrange multiplier $Z = Z^T \geq 0$ with the LMI $F(x) \geq 0$, and a multiplier $W = W^T$ with the equality constraint $X = G(x)$. The optimal value can then be expressed as

\[
p^\star = \inf_{x,\,X}\ \sup_{Z \geq 0,\, W}\
\left(c^Tx + \log\det X^{-1} - \mathrm{Tr}\, ZF(x) + \mathrm{Tr}\, W(X - G(x))\right).
\]

Changing the order of the supremum and the infimum, and solving the inner unconstrained minimization over $x$ and $X$ analytically, yields a lower bound on $p^\star$:

\[
\begin{array}{rl}
p^\star & \geq \displaystyle\sup_{Z \geq 0,\, W}\ \inf_{x,\,X}\
\left(c^Tx + \log\det X^{-1} - \mathrm{Tr}\, ZF(x) + \mathrm{Tr}\, W(X - G(x))\right) \\[1ex]
& = \displaystyle\sup_{Z \geq 0,\; W > 0,\; c_i = \mathrm{Tr}\, ZF_i + \mathrm{Tr}\, WG_i}\
\left(\log\det W - \mathrm{Tr}\, ZF_0 - \mathrm{Tr}\, WG_0 + l\right) \\[1ex]
& = d^\star.
\end{array}
\]

 

The inequality $p^\star \geq d^\star$ holds with equality if a constraint qualification holds, as stated in the following theorem.

Theorem 3.1. $p^\star \geq d^\star$. If (1.1) is strictly feasible, the dual optimum is achieved; if (3.1) is strictly feasible, the primal optimum is achieved. In both cases, $p^\star = d^\star$.

The theorem follows from standard results in convex optimization (Luenberger [48, Chapter 8], Rockafellar [57, §§29, 30], Hiriart-Urruty and Lemaréchal [39, Chapter XII]), so we will not prove it here. See also Lewis [46] for a more general discussion of convex analysis of functions of symmetric matrices.

The difference between the primal and dual objectives, i.e., the expression

\[
\begin{array}{l}
c^Tx + \log\det G(x)^{-1} + \log\det W^{-1} + \mathrm{Tr}\, G_0W + \mathrm{Tr}\, F_0Z - l \\
\quad = \displaystyle\sum_{i=1}^m x_i\,\mathrm{Tr}\, G_iW + \mathrm{Tr}\, G_0W + \sum_{i=1}^m x_i\,\mathrm{Tr}\, F_iZ + \mathrm{Tr}\, F_0Z - \log\det G(x)W - l \\
\quad = \mathrm{Tr}\, G(x)W - \log\det G(x)W - l + \mathrm{Tr}\, F(x)Z,
\end{array}
\tag{3.2}
\]

is called the duality gap associated with $x$, $W$, and $Z$. Theorem 3.1 states that the duality gap is always nonnegative, and zero only if $x$, $W$, and $Z$ are optimal.

Note that zero duality gap (3.2) implies $G(x)W = I$ and $F(x)Z = 0$. This gives the optimality condition for the max-det problem (1.1): a primal feasible $x$ is optimal if there exists a $Z \geq 0$ such that $F(x)Z = 0$ and

\[
\mathrm{Tr}\, G_iG(x)^{-1} + \mathrm{Tr}\, F_iZ = c_i, \qquad i=1,\ldots,m.
\]

This optimality condition is always sufficient; it is also necessary if the primal problem is strictly feasible.

In the remainder of the paper we will assume that the max-det problem is strictly primal and dual feasible. By Theorem 3.1, this assumption implies that the primal problem is bounded below and the dual problem is bounded above, with equality at the optimum, and that the primal and dual optimal sets are nonempty.

Example: semidefinite programming dual. As an illustration, we derive from (3.1) the dual problem for the SDP (1.2). Substituting $G_0 = 1$, $G_i = 0$, $l = 1$ in (3.1) yields

\[
\begin{array}{ll}
\mbox{maximize} & \log W - W - \mathrm{Tr}\, F_0Z + 1 \\
\mbox{subject to} & \mathrm{Tr}\, F_iZ = c_i, \quad i=1,\ldots,m, \\
& W > 0, \quad Z \geq 0.
\end{array}
\]

The optimal value of $W$ is one, so the dual problem reduces to

\[
\begin{array}{ll}
\mbox{maximize} & -\mathrm{Tr}\, F_0Z \\
\mbox{subject to} & \mathrm{Tr}\, F_iZ = c_i, \quad i=1,\ldots,m, \\
& Z \geq 0,
\end{array}
\]

which is the dual SDP in the notation used in [69].

Example: D-optimal experiment design. As a second example we derive the dual of the experiment design problem (2.9). After a few simplifications we obtain

\[
\begin{array}{ll}
\mbox{maximize} & \log\det W + p - z \\
\mbox{subject to} & W = W^T > 0 \\
& v^{(i)T}Wv^{(i)} \leq z, \quad i=1,\ldots,M,
\end{array}
\tag{3.3}
\]

where the variables are the matrix $W$ and the scalar variable $z$. Problem (3.3) can be further simplified. The constraints are homogeneous in $W$ and $z$, so for each dual feasible $W$, $z$ we have a ray of dual feasible solutions $tW$, $tz$, $t > 0$. It turns out that we can analytically optimize over $t$: replacing $W$ by $tW$ and $z$ by $tz$ changes the objective to $\log\det W + p\log t + p - tz$, which is maximized for $t = p/z$. After this simplification, and with a new variable $\widetilde{W} = (p/z)W$, problem (3.3) becomes

\[
\begin{array}{ll}
\mbox{maximize} & \log\det\widetilde{W} \\
\mbox{subject to} & \widetilde{W} > 0 \\
& v^{(i)T}\widetilde{W}v^{(i)} \leq p, \quad i=1,\ldots,M.
\end{array}
\tag{3.4}
\]

Problem (3.4) has an interesting geometrical meaning: the constraints state that $\widetilde{W}$ determines an ellipsoid $\{x \mid x^T\widetilde{W}x \leq p\}$, centered at the origin, that contains the points $v^{(i)}$, $i=1,\ldots,M$; the objective is to maximize $\det\widetilde{W}$, i.e., to minimize the volume of the ellipsoid.

There is an interesting connection between the optimal primal variables $\lambda_i$ and the points $v^{(i)}$ that lie on the boundary of the optimal ellipsoid $E$. First note that the duality gap associated with a primal feasible $\lambda$ and a dual feasible $\widetilde{W}$ is equal to

\[
\log\det\left(\sum_{i=1}^M \lambda_i v^{(i)}v^{(i)T}\right)^{-1} - \log\det\widetilde{W},
\]

and is zero (hence, $\lambda$ is optimal) if and only if $\widetilde{W} = \left(\sum_{i=1}^M \lambda_i v^{(i)}v^{(i)T}\right)^{-1}$. Hence, $\lambda$ is optimal if

\[
E = \left\{\, x \in \mathbf{R}^p \;\middle|\;
x^T\left(\sum_{i=1}^M \lambda_i v^{(i)}v^{(i)T}\right)^{-1}x \leq p \,\right\}
\]

is the minimum-volume ellipsoid, centered at the origin, that contains the points $v^{(j)}$, $j=1,\ldots,M$. We also have (in fact, for any feasible $\lambda$):

\[
\sum_{j=1}^M \lambda_j\left(p - v^{(j)T}\left(\sum_{i=1}^M \lambda_i v^{(i)}v^{(i)T}\right)^{-1}v^{(j)}\right)
= p - \mathrm{Tr}\left(\left(\sum_{j=1}^M \lambda_j v^{(j)}v^{(j)T}\right)\left(\sum_{i=1}^M \lambda_i v^{(i)}v^{(i)T}\right)^{-1}\right)
= 0.
\]

If $\lambda$ is optimal, then each term in the sum on the left-hand side is nonnegative (since $E$ contains all vectors $v^{(j)}$), and therefore the sum can only be zero if each term is zero:

\[
\lambda_j > 0 \ \Longrightarrow\ v^{(j)T}\left(\sum_{i=1}^M \lambda_i v^{(i)}v^{(i)T}\right)^{-1}v^{(j)} = p.
\]

Geometrically, $\lambda_j$ is nonzero only if $v^{(j)}$ lies on the boundary of the minimum volume ellipsoid. This makes more precise the intuitive idea that an optimal experiment only uses 'extreme' test vectors. Figure 3.1 shows the optimal ellipsoid for the experiment design example of Figure 2.1.

Fig. 3.1. In the dual of the D-optimal experiment design problem we compute the minimum-volume ellipsoid, centered at the origin, that contains the test vectors. The test vectors with a nonzero weight lie on the boundary of the optimal ellipsoid. Same data and notation as in Figure 2.1.
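A small numerical illustration of this complementary-slackness property (CVXPY and the random test vectors are assumptions made for this sketch):

```python
# Solve the D-optimal design (2.9) and check that every test vector with a
# nonzero weight lies on the boundary of the minimum-volume ellipsoid, i.e.,
# v^T (sum_i lambda_i v^(i) v^(i)T)^{-1} v = p.
import numpy as np
import cvxpy as cp

p, M = 2, 30
rng = np.random.default_rng(8)
V = rng.standard_normal((M, p))
lam = cp.Variable(M, nonneg=True)
info = sum(lam[i] * np.outer(V[i], V[i]) for i in range(M))
cp.Problem(cp.Maximize(cp.log_det(info)), [cp.sum(lam) == 1]).solve()

Sinv = np.linalg.inv(sum(l * np.outer(v, v) for l, v in zip(lam.value, V)))
for l, v in zip(lam.value, V):
    if l > 1e-4:
        print(l, v @ Sinv @ v)      # second column is close to p = 2
```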

The duality between D -optimal exp eriment designs and minimum-volume el-

lipsoids also extends to non- nite compacts sets Titterington[67], Pronzato and

p

Walter[54]. The D -optimal exp eriment design problem on a compact set C  R is

T

3.5 maximize log det Evv

18 L. VANDENBERGHE, S. BOYD AND S.P.WU

over all probability measures on C . This is a convex but semi-in nite optimization

problem, with dual [67]

f

maximize log det W

f

3.6

sub ject to W 0

T

f

v Wv  p; v 2 C:

Again, we see that the dual is the problem of computing the minimum volume ellip-

soid, centered at the origin, and covering the set C .

General metho ds for solving the semi-in nite optimization problems 3.5 and 3.6

fall outside the scop e of this pap er. In particular cases, however, these problems can

b e solved as max-det problems. One interesting example arises when C is the union

of a nite numb er of ellipsoids. In this case, the dual 3.6 can b e cast as a max-det

problem see [70] and hence eciently solved; by duality, we can recover from the

dual solution the probability distribution that solves 3.5.

4. The central path. In this section we describe the central path of the max-det problem (1.1) and give some of its properties. The central path plays a key role in interior-point methods for the max-det problem.

The primal central path. For strictly feasible $x$ and $t \geq 1$, we define

\[
\varphi_p(t, x) = t\left(c^Tx + \log\det G(x)^{-1}\right) + \log\det F(x)^{-1}.
\tag{4.1}
\]

This function is the sum of two convex functions: the first term is a positive multiple of the objective function in (1.1); the second term, $\log\det F(x)^{-1}$, is a barrier function for the set $\{x \mid F(x) > 0\}$. For future use, we note that the gradient and Hessian of $\varphi_p(t, x)$ with respect to $x$ are given by the expressions

\[
\left(\nabla\varphi_p(t, x)\right)_i = t\left(c_i - \mathrm{Tr}\, G(x)^{-1}G_i\right) - \mathrm{Tr}\, F(x)^{-1}F_i,
\tag{4.2}
\]
\[
\left(\nabla^2\varphi_p(t, x)\right)_{ij} = t\,\mathrm{Tr}\, G(x)^{-1}G_iG(x)^{-1}G_j + \mathrm{Tr}\, F(x)^{-1}F_iF(x)^{-1}F_j,
\tag{4.3}
\]

for $i, j = 1,\ldots,m$.

It can be shown that $\varphi_p(t, x)$ is a strictly convex function of $x$ if the $m$ matrices $\mathrm{diag}(G_i, F_i)$, $i=1,\ldots,m$, are linearly independent, and that it is bounded below (since we assume the problem is strictly dual feasible). We define $x^\star(t)$ as the unique minimizer of $\varphi_p(t, x)$:

\[
x^\star(t) = \mathop{\mathrm{argmin}}\left\{\, \varphi_p(t, x) \;\middle|\; G(x) > 0,\ F(x) > 0 \,\right\}.
\]

The curve $x^\star(t)$, parametrized by $t \geq 1$, is called the central path.

The dual central path. Points $x^\star(t)$ on the central path are characterized by the optimality conditions $\nabla\varphi_p(t, x^\star(t)) = 0$, i.e., using the expression (4.2),

\[
\mathrm{Tr}\, G(x^\star(t))^{-1}G_i + \frac{1}{t}\,\mathrm{Tr}\, F(x^\star(t))^{-1}F_i = c_i, \qquad i=1,\ldots,m.
\]

From this we see that the matrices

\[
W^\star(t) = G(x^\star(t))^{-1}, \qquad Z^\star(t) = \frac{1}{t}F(x^\star(t))^{-1}
\tag{4.4}
\]

are strictly dual feasible. The duality gap associated with $x^\star(t)$, $W^\star(t)$, and $Z^\star(t)$ is, from expression (3.2),

\[
\mathrm{Tr}\, F(x^\star(t))Z^\star(t) + \mathrm{Tr}\, G(x^\star(t))W^\star(t) - \log\det G(x^\star(t))W^\star(t) - l = \frac{n}{t},
\]

which shows that $x^\star(t)$ converges to the solution of the max-det problem as $t \to \infty$.

It can be shown that the pair $(W^\star(t), Z^\star(t))$ actually lies on the dual central path, defined as

\[
(W^\star(t), Z^\star(t)) = \mathop{\mathrm{argmin}}\left\{\, \varphi_d(t, W, Z) \;\middle|\;
\begin{array}{l}
W = W^T > 0, \quad Z = Z^T > 0, \\
\mathrm{Tr}\, G_iW + \mathrm{Tr}\, F_iZ = c_i, \quad i=1,\ldots,m
\end{array}\right\},
\]

where

\[
\varphi_d(t, W, Z) = t\left(\log\det W^{-1} + \mathrm{Tr}\, G_0W + \mathrm{Tr}\, F_0Z - l\right) + \log\det Z^{-1}.
\]

The close connections between the primal and dual central paths are summarized in the following theorem.

The close connections b etween primal and dual central path are summarized in the

following theorem.

Theorem 4.1. If x is strictly primal feasible, and W , Z are strictly dual feasible,

then

4.5 ' t; x+' t; W;Z  n1 + log t

p d

? ? ?

with equality if and only if x = x t, W = W t, Z = Z t.

pp

T

Proof. If A = A 2 R and A 0, then log det A TrA+p by convexityof

log det A on the cone of p ositive semide nite matrices. Applying this inequality,

we nd

' t; x+' t; W;Z=tTrGxW +TrFxZ log det GxW l  log det F xZ

p d

1=2 1=2 1=2 1=2

= t log det W GxW + Tr W GxW

1=2 1=2 1=2 1=2

log det tZ F xZ + Tr tZ F xZ + n log t tl

 tl + n + n log t tl = n1 + log t:

? ? ?

The equality for x = x t, W = W t, Z = Z t can b e veri ed by substitution.

Tangent to the central path. We conclude this section by describing how the tangent direction to the central path can be computed. Let $\phi_1(x) = \log\det G(x)^{-1}$ and $\phi_2(x) = \log\det F(x)^{-1}$. A point $x^\star(t)$ on the central path is characterized by

\[
t\left(c + \nabla\phi_1(x^\star(t))\right) + \nabla\phi_2(x^\star(t)) = 0.
\]

The tangent direction $\dfrac{\partial x^\star(t)}{\partial t}$ can be found by differentiating with respect to $t$:

\[
c + \nabla\phi_1(x^\star(t)) + \left(t\nabla^2\phi_1(x^\star(t)) + \nabla^2\phi_2(x^\star(t))\right)\frac{\partial x^\star(t)}{\partial t} = 0,
\]

so that

\[
\frac{\partial x^\star(t)}{\partial t}
= -\left(t\nabla^2\phi_1(x^\star(t)) + \nabla^2\phi_2(x^\star(t))\right)^{-1}\left(c + \nabla\phi_1(x^\star(t))\right).
\tag{4.6}
\]

By differentiating (4.4), we obtain the tangent to the dual central path,

\[
\frac{\partial W^\star(t)}{\partial t}
= -G(x^\star(t))^{-1}\left(\sum_{i=1}^m \frac{\partial x_i^\star(t)}{\partial t}\, G_i\right)G(x^\star(t))^{-1},
\tag{4.7}
\]
\[
\frac{\partial Z^\star(t)}{\partial t}
= -\frac{1}{t^2}F(x^\star(t))^{-1}
- \frac{1}{t}F(x^\star(t))^{-1}\left(\sum_{i=1}^m \frac{\partial x_i^\star(t)}{\partial t}\, F_i\right)F(x^\star(t))^{-1}.
\tag{4.8}
\]

5. Newton's method. In this section we consider the problem of minimizing $\varphi_p(t, x)$ for fixed $t$, i.e., computing $x^\star(t)$, given a strictly feasible initial point:

\[
\begin{array}{ll}
\mbox{minimize} & \varphi_p(t, x) \\
\mbox{subject to} & G(x) > 0 \\
& F(x) > 0.
\end{array}
\tag{5.1}
\]

This includes, as a special case, the analytic centering problem ($t = 1$ and $F(x) = 1$). Our main motivation for studying (5.1) will become clear in the next section, when we discuss an interior-point method based on minimizing $\varphi_p(t, x)$ for a sequence of values of $t$.

Newton metho d for minimizing ' t; x

p

given strictly feasible x, tolerance  0 < 0:5

rep eat



1

N 2

1. Compute the Newton direction x = r ' t; x r' t; x

p p

N 2 N 1=2

2. Compute  =x r ' t; xx 

p

N

b

3. if >0:5, compute h = argmin ' t; x + h x 

p

b

else h =1

N

b

4. Up date: x := x + h x

until   

The quantity

\[
\delta = \left(\delta x^{NT}\,\nabla^2\varphi_p(t, x)\,\delta x^N\right)^{1/2}
\tag{5.2}
\]

is called the Newton decrement at $x$. The cost of Step 3 (the line search) is very small, usually negligible compared with the cost of computing the Newton direction; see §8 for details.
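A minimal numpy sketch of this Newton iteration for the analytic centering (matrix completion) case used in the numerical experiment below; it is not the paper's implementation, and a crude damping rule stands in for the exact line search of Step 3:

```python
# Analytic center of the completion LMI A(x) > 0: minimize -log det A(x).
import numpy as np

def analytic_center(Af, pairs, tol=1e-6, max_iter=50):
    m = len(pairs)
    x = np.zeros(m)
    def A(x):
        M = Af.copy()
        for k, (i, j) in enumerate(pairs):
            M[i, j] += x[k]; M[j, i] += x[k]
        return M
    for _ in range(max_iter):
        Ainv = np.linalg.inv(A(x))
        # gradient and Hessian of -log det A(x), cf. (4.2)-(4.3)
        g = np.array([-2.0 * Ainv[i, j] for (i, j) in pairs])
        H = np.empty((m, m))
        for k, (i, j) in enumerate(pairs):
            for q, (r, s) in enumerate(pairs):
                H[k, q] = 2.0 * (Ainv[i, r] * Ainv[j, s] + Ainv[i, s] * Ainv[j, r])
        dx = -np.linalg.solve(H, g)
        delta = np.sqrt(dx @ H @ dx)                 # Newton decrement (5.2)
        h = 1.0 if delta <= 0.5 else 1.0 / (1.0 + delta)   # damping stand-in
        x = x + h * dx
        if delta <= tol:
            break
    return x
```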

It is well known that the asymptotic convergence of Newton's method is quadratic. Nesterov and Nemirovsky [51, §2.2] give a complete analysis of the global speed of convergence. The main result of their convergence analysis, applied to problem (5.1), is the following theorem.

Theorem 5.1. The algorithm terminates in fewer than

\[
11\left(\varphi_p(t, x^{(0)}) - \varphi_p(t, x^\star(t))\right) + \log_2\log_2(1/\epsilon)
\tag{5.3}
\]

iterations, and when it terminates, $\varphi_p(t, x) - \varphi_p(t, x^\star(t)) \leq \epsilon$.

A self-contained proof is given in [70].

Note that the right-hand side of (5.3) does not depend on the problem size (i.e., $m$, $n$, or $l$) at all, and only depends on the problem data through the difference between the value of the function $\varphi_p(t, \cdot)$ at the initial point $x^{(0)}$ and at the central point $x^\star(t)$.

The term $\log_2\log_2(1/\epsilon)$, which is characteristic of quadratic convergence, grows extremely slowly with the required accuracy $\epsilon$. For all practical purposes it can be considered a constant, say, five (which guarantees an accuracy of $\epsilon = 2.33\times10^{-10}$). Not quite precisely, then, the theorem says we can compute $x^\star(t)$ in at most $11\left(\varphi_p(t, x^{(0)}) - \varphi_p(t, x^\star(t))\right) + 5$ Newton steps. The precise statement is that within this number of

iterations we can compute an extremely good approximation of $x^\star(t)$. In the sequel, we will speak of 'computing the central point $x^\star(t)$' when we really mean computing an extremely good approximation. We can justify this on several grounds. It is possible to adapt our exposition to account for the extremely small approximation error incurred by terminating the Newton process after $11\left(\varphi_p(t, x^{(0)}) - \varphi_p(t, x^\star(t))\right) + 5$ steps. Indeed, the errors involved are certainly on the same scale as computer arithmetic (roundoff) errors, so if a complexity analysis is to be carried out with such precision, it should also account for roundoff error.

Theorem 5.1 holds for an 'implementable' version of the algorithm as well, in which an appropriate approximate line search is used instead of the exact line search.

Fig. 5.1. Number of Newton iterations to minimize $\log\det A(x)^{-1}$ versus $\log\det A(x^{(0)})^{-1} - \log\det A(x^\star)^{-1}$, with $\epsilon = 2.33\times10^{-10}$, i.e., $\log_2\log_2(1/\epsilon) = 5$. Random matrix completion problems of three sizes: $m=20$, $l=20$; $m=100$, $l=20$; $m=20$, $l=100$. The dotted line is a least-squares fit of the data and is given by $5 + 0.59\left(\log\det A(x^{(0)})^{-1} - \log\det A(x^\star)^{-1}\right)$. The dashed line is the upper bound of Theorem 5.1, $5 + 11\left(\log\det A(x^{(0)})^{-1} - \log\det A(x^\star)^{-1}\right)$.

Numerical experiment. The bound provided by Theorem 5.1 on the number of Newton steps required to compute $x^\star(t)$, starting from $x^{(0)}$, will play an important role in our path-following method. It is therefore useful to examine how the bound compares to the actual number of Newton steps required in practice to compute $x^\star(t)$.

Figure 5.1 shows the results of a numerical experiment that compares the actual convergence of Newton's method with the bound (5.3). The test problem is a matrix completion problem

\[
\begin{array}{ll}
\mbox{minimize} & \log\det A(x)^{-1} \\
\mbox{subject to} & A(x) = A_f + \displaystyle\sum_{k=1}^m x_k\left(E_{i_kj_k} + E_{j_ki_k}\right) > 0,
\end{array}
\]

which is a particular case of (5.1) with $c = 0$, $G(x) = A(x)$, $F(x) = 1$, and $\varphi_p(t, x) = \log\det A(x)^{-1}$. We considered problems of three different sizes: $m = 20$, $l = 20$; $m = 100$, $l = 20$; and $m = 20$, $l = 100$. Each point on the figure corresponds to a different problem instance, generated as follows.

• The matrices $A_f$ were constructed as $A_f = UU^T$, with the elements of $U$ drawn from a normal distribution $N(0, 1)$. This guarantees that $x = 0$ is strictly feasible. The $m$ index pairs $(i_k, j_k)$, $i_k \neq j_k$, were chosen randomly with a uniform distribution over the off-diagonal index pairs. For each of the three problem sizes, 50 instances were generated.

• For each problem instance, we first computed $x^\star$ using $x = 0$ as starting point. We then selected a target value uniformly in the interval $[0, 30]$, generated a random $\widehat{x} \in \mathbf{R}^m$ with distribution $N(0, I)$, and then computed $x^{(0)} = x^\star + t(\widehat{x} - x^\star)$, with $t$ chosen such that $\log\det A(x^{(0)})^{-1} - \log\det A(x^\star)^{-1}$ equals the target value. This point $x^{(0)}$ was used as starting point for the Newton algorithm.

Our experience with other problems shows that the results for this family of random problems are quite typical.

 The quantity log det Ax  log det Ax  not only provides an upp er

b ound on the numb er of Newton iterations via Theorem 5.1; it is also a very

good predictor of the number of iterations in practice. The dimensions m

and l on the other hand havemuch less in uence except of course, through

0 1 ? 1

log detAx  log detAx  .

 The average numb er of Newton iterations seems to growas

0 1 ? 1

+ log det Ax  log det Ax  ;

with ' 5, ' 0:6. This is signi cantly smaller than the upp er b ound of

Theorem 5.1  =5, = 11.

0 ?

In summary,we conclude that the di erence ' t; x ' t; x t is a go o d measure,

p p

?

in theory and in practice, of the e ort required to compute x t using Newton's

0

metho d, starting at x .

A computable upper bound on the number of Newton steps. Note that $\varphi_p(t, x^\star(t))$ is not known explicitly as a function of $t$. To evaluate the bound (5.3) one has to compute $x^\star(t)$, i.e., carry out the Newton algorithm (which, at the very least, would seem to defeat the purpose of trying to estimate or bound the number of Newton steps required to compute $x^\star(t)$). Therefore the bound of Theorem 5.1 is not directly useful in practice. From Theorem 4.1, however, it follows that every dual feasible point $W$, $Z$ provides a lower bound for $\varphi_p(t, x^\star(t))$:

\[
\varphi_p(t, x^\star(t)) \geq -\varphi_d(t, W, Z) + n(1 + \log t),
\]

and that the bound is exact if $W = W^\star(t)$ and $Z = Z^\star(t)$.

We can therefore replace the bound (5.3) by a weaker, but more easily computed, bound, provided we have a dual feasible pair $W$, $Z$:

\[
11\left(\varphi_p(t, x^{(0)}) - \varphi_p(t, x^\star(t))\right) + \log_2\log_2(1/\epsilon)
\;\leq\; 11\,\psi_{\mathrm{ub}}(t, x^{(0)}, W, Z) + \log_2\log_2(1/\epsilon),
\tag{5.4}
\]

where

\[
\psi_{\mathrm{ub}}(t, x, W, Z) = \varphi_p(t, x) + \varphi_d(t, W, Z) - n(1 + \log t).
\tag{5.5}
\]

This is the bound we will use in practice and in our complexity analysis: it gives a readily computed bound on the number of Newton steps required to compute $x^\star(t)$, starting from $x^{(0)}$, given any dual feasible $W$, $Z$.

6. Path-following algorithms. Path-following methods for convex optimization have a long history. In their 1968 book [34], Fiacco and McCormick work out many general properties, e.g., convergence to an optimal point, connections with duality, etc. No attempt was made to give a worst-case convergence analysis until Renegar [56] proved polynomial convergence of a path-following algorithm for linear programming. Nesterov and Nemirovsky [51, §3] studied the convergence for nonlinear convex problems and provided proofs of polynomial worst-case complexity. See [51, pp. 379-386] and Den Hertog [28] for a historical overview.

We will present two variants of a path-following method for the max-det problem. The short-step version of §6.2 is basically the path-following method of [34, 51], with a simplified, self-contained complexity analysis (see also Anstreicher and Fampa [4] for a very related analysis of an interior-point method for semidefinite programming). In the long-step version of §6.3 we combine the method with predictor steps to accelerate convergence. This, too, is a well known technique, originally proposed by Fiacco and McCormick; our addition is a new step selection rule.

6.1. General idea. One iteration proceeds as follows. The algorithm starts at a point $x^\star(t)$ on the central path. As we have seen above, the duality gap associated with $x^\star(t)$ is $n/t$. We then select a new value $t^+ > t$, and choose a strictly feasible starting point $\widehat{x}$ (which may or may not be equal to $x^\star(t)$). The point $\widehat{x}$ serves as an approximation of $x^\star(t^+)$ and is called the predictor of $x^\star(t^+)$. Starting at the predictor $\widehat{x}$, the algorithm computes $x^\star(t^+)$ using Newton's method. This reduces the duality gap by a factor $t^+/t$. The step from $x^\star(t)$ to $x^\star(t^+)$ is called an outer iteration.

The choice of $t^+$ and $\widehat{x}$ involves a tradeoff. A large value of $t^+/t$ means fast duality gap reduction, and hence fewer outer iterations. On the other hand, it makes it more difficult to find a good predictor $\widehat{x}$, and hence more Newton iterations may be needed to compute $x^\star(t^+)$.

In the method discussed below, we impose a bound on the maximum number of Newton iterations per outer iteration by requiring that the predictor $\widehat{x}$ and the new value of $t$ satisfy

\[
\varphi_p(t^+, \widehat{x}) - \varphi_p(t^+, x^\star(t^+)) \leq \Delta.
\tag{6.1}
\]

This implies that no more than $5 + 11\Delta$ Newton iterations are required to compute $x^\star(t^+)$ starting at $\widehat{x}$. Of course, the exact value of the left-hand side is not known unless we carry out the Newton minimization, but, as we have seen above, we can replace the condition by

\[
\psi_{\mathrm{ub}}(t^+, \widehat{x}, \widehat{W}, \widehat{Z}) = \Delta,
\tag{6.2}
\]

where $\widehat{W}$ and $\widehat{Z}$ are conveniently chosen dual feasible points.

The parameters in the algorithm are $\Delta > 0$ and the desired accuracy $\epsilon$.

Path-following algorithm

given $\Delta > 0$, $t \geq 1$, $x := x^\star(t)$

repeat
1. Select $t^+$, $\widehat{x}$, $\widehat{W}$, $\widehat{Z}$ such that $t^+ > t$ and $\psi_{\mathrm{ub}}(t^+, \widehat{x}, \widehat{W}, \widehat{Z}) = \Delta$
2. Compute $x^\star(t^+)$, starting at $\widehat{x}$, using the Newton algorithm of §5
3. $t := t^+$, $x := x^\star(t^+)$

until $n/t \leq \epsilon$

Step 1 in this outline is not completely specified. In the next sections we will discuss different choices in detail. We will show that one can always find $\widehat{x}$, $\widehat{W}$, $\widehat{Z}$, and $t^+$ that satisfy

\[
\frac{t^+}{t} \geq 1 + \sqrt{\frac{2\Delta}{n}}.
\tag{6.3}
\]

This fact allows us to estimate the total complexity of the method, i.e., to derive a bound on the total number of Newton iterations required to reduce the duality gap to ε. The algorithm starts on the central path, at x*(t⁰), with initial duality gap Δ⁰ = n/t⁰. Each iteration reduces the duality gap by t⁺/t. Therefore the total number of outer iterations required to reduce the initial gap of Δ⁰ to a final value below ε is at most

    ⌈ log(Δ⁰/ε) / log(1 + √(2γ/n)) ⌉ ≤ ⌈ √n log(Δ⁰/ε) / log(1 + √(2γ)) ⌉.

The inequality follows from the concavity of log(1 + x). The total number of Newton steps can therefore be bounded as

         Total Newton iterations ≤ ⌈5 + 11γ⌉ ⌈ √n log(Δ⁰/ε) / log(1 + √(2γ)) ⌉
(6.4)                            = O(√n log(Δ⁰/ε)).

This upper bound increases slowly with the problem dimensions: it grows as √n, and is independent of l and m. We will see later that the performance in practice is even better.
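As a small illustration of how these bounds behave numerically, the sketch below evaluates the outer-iteration bound and the resulting worst-case Newton count for given n, γ, Δ⁰ and ε; the function name and interface are ours, purely for illustration.

    import math

    # Rough sketch: evaluate the worst-case bounds of this section.
    def worst_case_counts(n, gamma, delta0, eps):
        outer = math.ceil(math.log(delta0 / eps)
                          / math.log(1 + math.sqrt(2 * gamma / n)))
        newton_per_outer = 5 + 11 * gamma          # bound quoted after (6.1)
        return outer, math.ceil(newton_per_outer) * outer

    print(worst_case_counts(n=10, gamma=10, delta0=1.0, eps=1e-3))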

Note that we assume that the minimization in Step 2 of the algorithm is exact. The justification of this assumption lies in the very fast local convergence of Newton's method: we have seen in §5 that it takes only a few iterations to improve a solution with Newton decrement λ ≤ 0.5 to one with very high accuracy.

Nevertheless, in a practical implementation, as well as in a rigorous theoretical analysis, one has to take into account the fact that x*(t) can only be computed approximately. For example, the stopping criterion n/t ≤ ε is based on the duality gap associated with exactly central points x*(t), W*(t), and Z*(t), and is therefore not quite accurate if x*(t) is only known approximately. We give a suitably modified criterion in Ref. [70], where we show that dual feasible points are easily computed during the centering step (Step 2) once the Newton decrement is less than one. Using the associated duality gap yields a completely rigorous stopping criterion. We will briefly point out some other modifications as we develop different variants of the algorithm in the next sections; full details are described in [70]. With these modifications, the algorithm works well even when x*(t) is computed approximately. We often use a tolerance of 10⁻³ in the Newton algorithm.

It is also possible to extend the simple worst-case complexity analysis to take into account incomplete centering, but we will not attempt such an analysis here. For the fixed-reduction algorithm described immediately below, such a complete analysis can be found in Nesterov and Nemirovsky [51, §3.2].

6.2. Fixed-reduction algorithm. The simplest variant uses x̂ = x*(t), Ŵ = W*(t), and Ẑ = Z*(t) in Step 1 of the algorithm. Substitution in condition (6.2) gives

         ψ_ub(t⁺, x̂, Ŵ, Ẑ)
           = t⁺ (Tr G(x*(t))W*(t) + Tr F(x*(t))Z*(t)) − log det G(x*(t))W*(t) − l
             − log det F(x*(t))Z*(t) − n(1 + log t⁺)
(6.5)      = n (t⁺/t − 1 − log(t⁺/t)) = γ,

which is a simple nonlinear equation in one variable, with a unique solution t⁺ > t. We call this variant of the algorithm the fixed-reduction algorithm because it uses the same value of t⁺/t (and hence achieves a fixed duality gap reduction factor) in each outer iteration. The outline of the fixed-reduction algorithm is as follows.

Fixed-reduction algorithm

given γ > 0, t ≥ 1, x := x*(t)
Find μ such that n(μ − 1 − log μ) = γ
repeat
  1. t⁺ := μt
  2. Compute x*(t⁺) starting at x, using the Newton algorithm of §5
  3. t := t⁺, x := x*(t⁺)
until n/t ≤ ε
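The factor μ is found once, before the loop. Since n(μ − 1 − log μ) is increasing in μ on [1, ∞), a simple bisection suffices; the sketch below (our own code, not the authors' implementation) reproduces the values quoted in §9.

    import math

    # Sketch: compute the fixed reduction factor mu = t+/t by solving
    # n*(mu - 1 - log mu) = gamma for mu > 1 with bisection.
    def reduction_factor(n, gamma, tol=1e-12):
        f = lambda mu: n * (mu - 1 - math.log(mu)) - gamma
        lo, hi = 1.0, 2.0
        while f(hi) < 0:                # bracket the root
            hi *= 2.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
        return 0.5 * (lo + hi)

    print(reduction_factor(10, 10))     # roughly 3.14, cf. Section 9
    print(reduction_factor(10, 50))     # roughly 8.0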

We can be brief in the convergence analysis of the method. Each outer iteration reduces the duality gap by a factor μ, so the number of outer iterations is exactly

    ⌈ log(Δ⁰/ε) / log μ ⌉.

The inequality (6.3), which was used in the complexity analysis of the previous section, follows from the fact that for y ≥ 1

    n(y − 1 − log y) ≤ (n/2)(y − 1)²,

and hence μ ≥ 1 + √(2γ/n).
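For completeness, a one-line derivation of this step, written out in LaTeX:

    % Both sides vanish at y = 1 and the derivatives satisfy 1 - 1/y <= y - 1,
    % so y - 1 - \log y \le \tfrac12 (y-1)^2 for y >= 1. With n(\mu-1-\log\mu)=\gamma,
    \[
    \gamma = n(\mu - 1 - \log\mu) \le \frac{n}{2}(\mu - 1)^2
    \quad\Longrightarrow\quad
    \mu = \frac{t^+}{t} \ge 1 + \sqrt{2\gamma/n}.
    \]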

This convergence analysis also reveals the limitation of the fixed-reduction method: the number of outer iterations is never better than the number predicted by the theoretical analysis. The upper bound on the total number of Newton iterations (6.4) is also a good estimate in practice, provided we replace the constant 5 + 11γ with an empirically determined estimate such as 3 + 0.7γ (see Figure 5.1). The purpose of the next section is to develop a method with the same worst-case complexity as the fixed-reduction algorithm, but a much better performance in practice.

6.3. Primal-dual long-step algorithm. It is possible to use much larger values of t⁺/t, and hence achieve a larger gap reduction per outer iteration, by using a better choice for x̂, Ŵ, and Ẑ in Step 1 of the path-following algorithm.

A natural choice for x̂ is to take a point along the tangent to the central path, i.e.,

    x̂ = x*(t) + p ∂x*(t)/∂t,

for some p > 0, where the tangent direction is given by (4.6). Substitution in (6.2) gives a nonlinear equation from which t⁺ and p can be determined. Taking the idea one step further, one can allow Ŵ and Ẑ to vary along the tangent to the dual central path, i.e., take

    Ŵ = W*(t) + q ∂W*(t)/∂t,    Ẑ = Z*(t) + q ∂Z*(t)/∂t

for some q > 0, with the tangent directions given by (4.7) and (4.8). Equation (6.2) then has three unknowns: t⁺, the primal step length p, and the dual step length q. The fixed-reduction update of the previous section uses the solution t⁺ = μt, p = q = 0; an efficient method for finding a solution with larger t⁺ is described below.

The outline of the long-step algorithm is as follows.

Primal-dual long-step algorithm

given γ > 0, t ≥ 1, x := x*(t), W := W*(t), Z := Z*(t)
Find μ such that n(μ − 1 − log μ) = γ
repeat
  1. Compute tangent to central path. δx := ∂x*(t)/∂t, δW := ∂W*(t)/∂t, δZ := ∂Z*(t)/∂t
  2. Parameter selection and predictor step.
     2a. t⁺ := μt
     repeat {
        2b. (p̂, q̂) := argmin_{p,q} ψ_ub(t⁺, x + p δx, W + q δW, Z + q δZ)
        2c. Compute t⁺ from ψ_ub(t⁺, x + p̂ δx, W + q̂ δW, Z + q̂ δZ) = γ
     }
     2d. x̂ := x + p̂ δx
  3. Centering step. Compute x*(t⁺) starting at x̂, using the Newton algorithm of §5
  4. Update. t := t⁺, x := x*(t⁺), W := W*(t⁺), Z := Z*(t⁺)
until n/t ≤ ε
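The following Python sketch shows one way Step 2 could be organized, under our own interface assumptions: psi_ub(t_plus, p, q) evaluates (6.6) along the tangent directions, plane_search(t_plus) performs the cheap minimization over (p, q) described in §8, and mu_short is the short-step factor of §6.2. It is an illustration of the alternation, not the authors' implementation.

    # Sketch of Step 2 (parameter selection and predictor step).
    def select_parameters(t, gamma, mu_short, psi_ub, plane_search, inner_iters=4):
        t_plus, p, q = mu_short * t, 0.0, 0.0          # start at the short-step value
        for _ in range(inner_iters):
            p, q = plane_search(t_plus)                # Step 2b: minimize over (p, q)
            lo, hi = t_plus, 2.0 * t_plus              # Step 2c: bracket the root of
            while psi_ub(hi, p, q) < gamma:            #   psi_ub(., p, q) = gamma ...
                hi *= 2.0
            for _ in range(60):                        # ... then bisect on t+
                mid = 0.5 * (lo + hi)
                lo, hi = (mid, hi) if psi_ub(mid, p, q) < gamma else (lo, mid)
            t_plus = lo
        return t_plus, p, q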

Again we assume exact centering in Step 3. In practice, approximate minimization works, provided one includes a small correction to the formulas of the tangent directions; see Ref. [70].

Step 2 computes a solution to (6.2), using a technique illustrated in Figure 6.1. The figure shows four iterations of the inner loop of Step 2 for an instance of the problem family described in §9. With a slight abuse of notation, we write ψ_ub(t⁺, p, q) instead of

(6.6)    ψ_ub(t⁺, x*(t) + p δx, W*(t) + q δW, Z*(t) + q δZ).

We start at the value t⁰ = t, at the left end of the horizontal axis. The first curve (marked ψ_ub(t⁺, 0, 0)) shows (6.6) as a function of t⁺, with p = q = 0, which simplifies to

    ψ_ub(t⁺, x*(t), W*(t), Z*(t)) = n(t⁺/t − 1 − log(t⁺/t))

(see §6.2). This function is equal to zero for t⁺ = t, and equal to γ for the short-step update t⁺ = μt. We then do the first iteration of the inner loop of Step 2. Keeping t⁺ fixed at its value t¹, we minimize the function (6.6) over p and q (Step 2b). This produces new values p̂ = p¹ and q̂ = q¹ with a value of ψ_ub < γ.

[Figure 6.1: curves ψ_ub(t⁺, 0, 0), ψ_ub(t⁺, p¹, q¹), ψ_ub(t⁺, p², q²), ψ_ub(t⁺, p³, q³) plotted against t⁺, together with the level γ and the successive values t⁰ = t, t¹, t², t³, t⁴ on the horizontal axis.]

Fig. 6.1. Parameter selection and predictor step in the long-step algorithm: it alternates between minimizing ψ_ub(t⁺, p, q) over the primal step length p and the dual step length q, and then increasing t⁺ until ψ_ub(t⁺, p, q) = γ.

+ + 1 1

increase t again Step 2c. The second curve in the gure lab eled  t ;p ;q 

ub

+ 1 1

shows the function 6.6 as a function of t with xed values p = p , q = q . The

+ 2

intersection with  = gives the next value t = t .

ub

These two steps 2b, 2c are rep eated either for a xed numb er of iterations or

+

until t converges which in the example of Figure 6.1 happ ens after four or ve

+

iterations. Note that in each step 2c, we increase t , so that in particular, the nal

+ +

value of t will b e at least as large as its initial short-step value, t = t. Thus,

the complexity analysis for the short-step metho d still applies.

+

In practice, the inner lo op 2b, 2c often yields a value of t considerably larger

than the short-step value t, while maintaining the same upp er b ound on the number

? +

of Newton step required to compute the next iterate x t . In the example shown

+

in the gure, the nal value of t is ab out a factor of 2:5 larger than the short-step

value; in general, a factor of 10 is not uncommon.

Using some prepro cessing we will describ e in x8, the cost of the inner lo op 2b,

2c is very small, in most cases negligible compared with the cost of computing the

tangentvectors.

Finally, note that the dual variables Z and W are not used in the xed-reduction

algorithm. In the primal-dual long-step algorithm they are used only in the predictor

step to allow a larger step size p.

?

7. Preliminary phases. The algorithm starts at a central p oint x t, for some

t  1. In this section we discuss how to select the initial t, and how to compute such

a p oint.

Feasibility. If no strictly primal feasible p oint is known, one has to precede

the algorithm with a rst phase to solve the SDP feasibility problem: Find x that

satis es Gx > 0, F x > 0. More details can b e found in [69].

Choice of initial t. We now consider the situation where a strictly primal feasible point x⁰ is known, but x⁰ is not on the central path. In that case one has to select an appropriate initial value of t and compute a central point by Newton's method starting at x⁰. In theory (and often in practice) the simple choice t = 1 works.

It is not hard, however, to imagine cases where the choice t = 1 would be inefficient in practice. Suppose, for example, that the initial x⁰ is very near x*(100), so a reasonable initial value of t is 100 (but we don't know this). If we set t = 1, we expend many Newton iterations 'going backwards' up the central path towards the point x*(1). Several outer iterations, and many Newton steps later, we find ourselves back near where we started, around x*(100).

If strictly dual feasible points W⁰, Z⁰ are known, then we start with a known duality gap η⁰ associated with x⁰, W⁰ and Z⁰. A very reasonable initial choice for t is then t = max{1, n/η⁰}, since when t = n/η⁰, the centering stage computes central points with the same duality gap as the initial primal and dual solutions. In particular, the preliminary centering stage does not increase the duality gap (as it would in the scenario sketched above).

We can also interpret and motivate the initial value t = n/η⁰ in terms of the function ψ_ub(t, x⁰, W⁰, Z⁰), which provides an upper bound on the number of Newton steps required to compute x*(t) starting at x⁰. From the definition (5.5) we have

    ψ_ub(t, x⁰, W⁰, Z⁰) = t η⁰ + log det F(x⁰)⁻¹ + log det (Z⁰)⁻¹ − n(1 + log t),

which shows that the value t = n/η⁰ minimizes ψ_ub(t, x⁰, W⁰, Z⁰). Thus, the value t = n/η⁰ is the value which minimizes the upper bound on the number of Newton steps required in the preliminary centering stage.
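The minimizing value follows from a one-line calculation, written out here in LaTeX (only the t-dependent terms of the expression above matter):

    \[
    \frac{d}{dt}\,\psi_{\mathrm{ub}}(t,x^0,W^0,Z^0)
      = \eta^0 - \frac{n}{t} = 0
    \quad\Longrightarrow\quad t = n/\eta^0 .
    \]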

A heuristic preliminary stage. When no initial dual feasible Z, W (and hence no duality gap) are known, choosing an appropriate initial value of t can be difficult. We have had practical success with a variation on Newton's method that adapts the value of t at each step based on (the square of) the Newton decrement λ(x, t),

    λ(x, t)² = ∇ψ_p(t, x)ᵀ ∇²ψ_p(t, x)⁻¹ ∇ψ_p(t, x),

which serves as a measure of proximity to the central path. It is a convex function of t, and is readily minimized in t for fixed x. Our heuristic preliminary phase is:

Preliminary centering phase

given strictly feasible x
t := 1
repeat {
  1. t := max{1, argmin_t λ(x, t)}
  2. δxᴺ := −∇²ψ_p(t, x)⁻¹ ∇ψ_p(t, x)
  3. ĥ := argmin_h ψ_p(t, x + h δxᴺ), x := x + ĥ δxᴺ
} until λ(x, t) ≤ δ

Thus, we adjust t at each iteration to make the Newton decrement for the current x as small as possible, subject to the condition that t remains at least 1.
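A minimal Python sketch of this heuristic phase is given below. The callables grad(t, x), hess(t, x) (derivatives of ψ_p), argmin_t_lambda(x) (the scalar minimization in step 1) and line_search(t, x, dx) (step 3), as well as the tolerance delta, are our own assumptions made for illustration.

    import numpy as np

    # Sketch of the heuristic preliminary centering phase (not the authors' code).
    def preliminary_centering(x, grad, hess, argmin_t_lambda, line_search, delta=1e-3):
        t = 1.0
        while True:
            t = max(1.0, argmin_t_lambda(x))            # step 1: adapt t to x
            g, H = grad(t, x), hess(t, x)
            dx = -np.linalg.solve(H, g)                 # step 2: Newton direction
            lam = np.sqrt(max(-g @ dx, 0.0))            # Newton decrement lambda(x, t)
            if lam <= delta:
                return x, t
            x = x + line_search(t, x, dx) * dx          # step 3: damped Newton step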

8. Efficient line and plane searches. In this section we describe some simple preprocessing that allows us to implement the line search in the Newton method of §5 and the plane search of §6.3 very efficiently.

Line search in Newton's method. We first consider the line search in Newton's method of §5. Let μ_k, k = 1, ..., l, be the generalized eigenvalues of the pair (Σ_{i=1}^m δx_i^N G_i, G(x)), and μ_k, k = l+1, ..., l+n, be the generalized eigenvalues of (Σ_{i=1}^m δx_i^N F_i, F(x)), where δxᴺ is the Newton direction at x. We can write ψ_p(t, x + h δxᴺ) in terms of these eigenvalues as

    f(h) = ψ_p(t, x + h δxᴺ)
         = ψ_p(t, x) + h cᵀδxᴺ + t Σ_{k=1}^{l} log(1/(1 + hμ_k)) + Σ_{k=l+1}^{l+n} log(1/(1 + hμ_k)).

Evaluating the first and second derivatives f'(h), f''(h) of this convex function of h ∈ R requires only O(n + l) operations once the generalized eigenvalues μ_k have been computed. In most cases, the cost of the preprocessing, i.e., computing the generalized eigenvalues μ_k, exceeds the cost of minimizing over h, but is small compared with the cost of computing the Newton direction. The function ψ_p(t, x + h δxᴺ) can therefore be efficiently minimized using standard line search techniques.
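The sketch below illustrates this preprocessing in Python. The matrices dG = Σ_i δx_i G_i and dF = Σ_i δx_i F_i are formed once for the Newton direction, and linear_coeff stands for the coefficient of h in the linear part of f(h); all names are ours, and the sketch only demonstrates the O(l + n) derivative evaluation.

    import numpy as np
    from scipy.linalg import eigh

    # Sketch of the line-search preprocessing via generalized eigenvalues.
    def line_search_derivatives(dG, G, dF, F, t, linear_coeff):
        # generalized eigenvalues of (dG, G) and (dF, F); G, F positive definite
        mu = np.concatenate([eigh(dG, G, eigvals_only=True),
                             eigh(dF, F, eigvals_only=True)])
        w = np.concatenate([t * np.ones(G.shape[0]), np.ones(F.shape[0])])
        def fprime(h):                  # f'(h), O(l + n) per evaluation
            return linear_coeff - np.sum(w * mu / (1.0 + h * mu))
        def fsecond(h):                 # f''(h)
            return np.sum(w * mu**2 / (1.0 + h * mu)**2)
        return fprime, fsecond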

Plane search in long-step path-following method. A similar idea applies to the plane search of §6.3. In Step 2b of the primal-dual long-step algorithm we minimize the function ψ_ub(t, x + p δx, W + q δW, Z + q δZ) over p and q, where δx, δW, δZ are tangent directions to the central path. We can again reduce the function to a convenient form

(8.1)    ψ_ub(t, x + p δx, W + q δW, Z + q δZ)
           = ψ_ub(t, x, W, Z) + p α₁ + q α₂
             + t Σ_{k=1}^{l} log(1/(1 + pμ_k)) + Σ_{k=l+1}^{l+n} log(1/(1 + pμ_k))
             + t Σ_{k=1}^{l} log(1/(1 + qν_k)) + Σ_{k=l+1}^{l+n} log(1/(1 + qν_k)),

where μ_k, k = 1, ..., l, are the generalized eigenvalues of the pair (Σ_{i=1}^m δx_i G_i, G(x)) and μ_k, k = l+1, ..., l+n, are the generalized eigenvalues of the pair (Σ_{i=1}^m δx_i F_i, F(x)); ν_k, k = 1, ..., l, are the generalized eigenvalues of the pair (δW, W), and ν_k, k = l+1, ..., l+n, are the generalized eigenvalues of the pair (δZ, Z). The coefficients α₁ and α₂ are

    α₁ = cᵀδx,    α₂ = Tr(G₀ δW) + Tr(F₀ δZ).

The first and second derivatives of the function (8.1) with respect to p and q can again be computed at a low cost of O(l + n), and therefore the minimum of ψ_ub over the plane can be determined very cheaply, once the generalized eigenvalues have been computed.
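As a small illustration, the sketch below minimizes the eigenvalue form (8.1) over (p, q) once the eigenvalues and coefficients are available. The argument names (mu, nu, the weight vector w = (t, ..., t, 1, ..., 1), alpha1, alpha2, psi0 = ψ_ub(t, x, W, Z)) and the use of a generic derivative-free minimizer are our own choices; an implementation exploiting the cheap first and second derivatives would be faster still.

    import numpy as np
    from scipy.optimize import minimize

    # Sketch of the plane search (Step 2b) in the eigenvalue form (8.1).
    def plane_search(psi0, alpha1, alpha2, mu, nu, w):
        def psi_ub(pq):
            p, q = pq
            if np.any(1.0 + p * mu <= 0) or np.any(1.0 + q * nu <= 0):
                return np.inf                      # outside the domain
            return (psi0 + p * alpha1 + q * alpha2
                    - np.sum(w * np.log1p(p * mu))
                    - np.sum(w * np.log1p(q * nu)))
        res = minimize(psi_ub, x0=np.zeros(2), method="Nelder-Mead")
        return res.x                               # (p_hat, q_hat)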

In summary, the cost of a line or plane search is basically the cost of the preprocessing (computing certain generalized eigenvalues), which is usually negligible compared to the rest of the algorithm (e.g., determining a Newton or tangent direction).

One implication of efficient line and plane searches is that the total number of Newton steps serves as a good measure of the overall computing effort.

9. Numerical examples.

[Figure 9.1: two panels plotting duality gap (logarithmic scale, 10⁰ down to 10⁻⁹) versus the number of Newton iterations (0 to 40).]

Fig. 9.1. Duality gap versus number of Newton steps for randomly generated max-det problems of dimension l = 10, n = 10, m = 10. Left: γ = 10. Right: γ = 50. The crosses are the results for the fixed-reduction method; the circles are the results for the long-step method. Every cross/circle represents the gap at the end of an outer iteration.

Typical convergence. The first experiment (Figure 9.1) compares the convergence of the fixed-reduction method and the long-step method. The left-hand plot shows the convergence of both methods for γ = 10; the right-hand plot shows the convergence for γ = 50. The duality gap is shown vertically on a logarithmic scale ranging from 10⁰ at the top to 10⁻⁹ at the bottom; the horizontal axis is the total number of Newton steps. Each outer iteration is shown as a symbol on the plot (a circle for the long-step and a cross for the short-step method). Thus, the horizontal distance between two consecutive symbols shows directly the number of Newton steps required for that particular outer iteration; the vertical distance shows directly the duality gap reduction factor t⁺/t.

Problem instances were generated as follows: G₀ ∈ R^{l×l}, F₀ ∈ R^{n×n} were chosen random positive definite (constructed as UᵀU with the elements of U drawn from a normal distribution N(0,1)); the matrices G_i, F_i, i = 1, ..., m, were random symmetric matrices, with elements drawn from N(0,1); c_i = Tr G_i + Tr F_i, i = 1, ..., m. This procedure ensures that the problem is primal and dual feasible (x = 0 is primal feasible; Z = I, W = I is dual feasible), and hence bounded below. We start on the central path, with initial duality gap one.
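Our reading of this generation procedure is sketched below; the function name and interface are ours, and the snippet is meant only to make the construction concrete.

    import numpy as np

    # Sketch of the random problem generator described above.
    def random_maxdet_instance(l, n, m, seed=0):
        rng = np.random.default_rng(seed)
        def posdef(k):                       # U^T U with U ~ N(0, 1)
            U = rng.standard_normal((k, k))
            return U.T @ U
        def sym(k):                          # random symmetric matrix
            A = rng.standard_normal((k, k))
            return 0.5 * (A + A.T)
        G = [posdef(l)] + [sym(l) for _ in range(m)]
        F = [posdef(n)] + [sym(n) for _ in range(m)]
        c = np.array([np.trace(G[i]) + np.trace(F[i]) for i in range(1, m + 1)])
        return G, F, c                       # x = 0 primal, (W, Z) = (I, I) dual feasible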

We can make the following observations.
• The convergence is very similar over all problem instances. The number of iterations required to reduce the duality gap by a factor 1000 ranges between 5 and 50. As expected, the long-step method performs much better than the fixed-reduction method, and typically converges in less than 15 iterations.
• The fixed-reduction method converges almost linearly. The duality gap reduction t⁺/t per outer iteration can be computed from equation (6.5): t⁺/t = 3.14 for γ = 10, and t⁺/t = 8.0 for γ = 50. The number of Newton iterations per outer iteration is less than five in almost all cases, which is much less than the upper bound 5 + 11γ. (Remember that this bound is a combination of two conservative estimates: Theorem 5.1 is conservative (see Figure 5.1), and in addition we have replaced (6.1) with the weaker condition (6.2).)
• The long-step method takes a few more Newton iterations per centering step, but achieves a much larger duality gap reduction. Moreover, the convergence accelerates near the optimum.
• Increasing γ has a large effect on the fixed-reduction method, but only little effect on the long-step method.

[Figure 9.2: three panels plotting Newton iterations (0 to 50) against n, l, and m, respectively.]

Fig. 9.2. Newton iterations versus problem size for a family of random problems. Fixed-reduction method (top curve) and long-step method (lower curve). γ = 10. Left: l = 10, n = 10-100, m = 10. Middle: l = 10-100, n = 10, m = 10. Right: l = 10, n = 10, m = 10-100. The curves give the average over 10 problem instances. The error bars indicate the standard deviation.

[Figure 9.3: Newton iterations (0 to 80) plotted against M (15 to 55).]

Fig. 9.3. Newton iterations versus problem size for a family of experiment design problems of §2.4, including the 90-10 rule. Fixed-reduction method (top curve) and long-step method (lower curve). γ = 10, p = 10, M = 15-50. The curves show the average over 10 problem instances. The error bars indicate the standard deviation.

Complexity versus problem size. Figure 9.2 shows the influence of the problem dimension on the convergence. For each triplet (m, n, l) we generated 10 problem instances as above. We plot the number of Newton iterations needed to reduce the duality gap by a factor 1000, starting with duality gap 1. The plot shows the average number of Newton steps and the standard deviation. The top curve shows the results for the fixed-reduction method; the lower curve is for the long-step method.
• The number of Newton iterations in the short-step method depends on n as O(√n). This is easily explained from the convergence analysis of §6.2: we have seen that the number of outer iterations grows as √n, in theory and in practice, and hence the practical behavior of the fixed-reduction method is very close to the worst-case behavior.
• We see that the number of iterations for the long-step method lies between 5 and 20, and is very weakly dependent on problem size.

Figure 9.3 shows similar results for a family of experiment design problems (2.9) in R^{10}, including a 90-10 constraint (2.10). The points v^(i), i = 1, ..., M, were generated from a normal distribution N(0, I) on R^p. Note that the dimensions of the corresponding max-det problem are m = 2M, n = 3M + 1, l = p. Figure 9.3 confirms the conclusions of the previous experiment: it shows that the complexity of the fixed-reduction method grows as √n, while the complexity of the long-step method is almost independent of problem size.

10. Conclusion. The max-det problem (1.1) is a quite specific convex extension of the semidefinite programming problem, and hence includes a wide variety of convex optimization problems as special cases. Perhaps more importantly, max-det problems arise naturally in many areas, including computational geometry, linear algebra, experiment design, linear estimation, and information and communication theory. We have described several of these applications.

Some of the applications have been studied extensively in the literature, and in some cases analytic solutions or efficient specialized algorithms have been developed. We have presented an interior-point algorithm that solves general max-det problems efficiently. The method can be applied to solve max-det problems for which no specialized algorithm is known; in cases where such a method exists, it opens the possibility of adding useful LMI constraints, which is an important advantage in practice.

We have proved a worst-case complexity of O(√n) Newton iterations. Numerical experiments indicate that the behavior is much better in practice: the method typically requires a number of iterations that lies between 5 and 50, almost independently of problem dimension. The total computational effort is therefore determined by the amount of work per iteration, i.e., the computation of the Newton directions, and therefore depends heavily on the problem structure. When no structure is exploited, the Newton directions can be computed from the least-squares formulas in [70], which requires O((n² + l²)m²) operations, but important savings are possible whenever we specialize the general method of this paper to a specific problem class.

Acknowledgments. We have many people to thank for useful comments, suggestions, and pointers to references and applications (including some applications that did not make it into the final version). In particular, we thank Paul Algoet, Kurt Anstreicher, Eric Beran, Patrick Dewilde, Valerii Fedorov, Gene Golub, Bijit Halder, Bill Helton, Istvan Kollar, Lennart Ljung, Tomas MacKelvey, Erik Ordentlich, Art Owen, Randy Tobias, Rick Wesel, and Henry Wolkowicz. We also thank the two reviewers and Michael Overton for very useful comments.

REFERENCES

[1] F. Alizadeh, Interior point methods in semidefinite programming with applications to combinatorial optimization, SIAM Journal on Optimization, 5 (1995), pp. 13-51.
[2] T. W. Anderson, Statistical inference for covariance matrices with linear structure, in Multivariate Analysis II. Proceedings of the Second International Symposium on Multivariate Analysis held at Wright State University, Dayton, Ohio, June 17-21, 1968, P. R. Krishnaiah, ed., Academic Press, New York, 1969, pp. 55-66.
[3] ---, Estimation of covariance matrices which are linear combinations or whose inverses are linear combinations of given matrices, in Essays in Probability and Statistics, R. C. Bose et al., eds., University of North Carolina Press, 1970, pp. 1-24.
[4] K. M. Anstreicher and M. Fampa, A long-step path following algorithm for semidefinite programming problems, tech. report, Department of Management Sciences, University of Iowa, Jan. 1996.
[5] K. M. Anstreicher, M. Fampa, J. Lee, and J. Williams, Continuous relaxations for constrained maximum-entropy sampling, in Integer Programming and Combinatorial Optimization, W. H. Cunningham, S. T. McCormick, and M. Queyranne, eds., no. 1084 in Lecture Notes in Computer Science, Springer Verlag, 1996, pp. 234-248.
[6] J. T. Aslanis and J. M. Cioffi, Achievable information rates on digital subscriber loops: limiting information rates with crosstalk noise, IEEE Transactions on Communications, 40 (1992), pp. 361-372.
[7] A. C. Atkinson and A. N. Donev, Optimum experiment designs, Oxford Statistical Science Series, Oxford University Press, 1992.
[8] M. Bakonyi and H. J. Woerdeman, Maximum entropy elements in the intersection of an affine space and the cone of positive definite matrices, SIAM J. on Matrix Analysis and Applications, 16 (1995), pp. 360-376.

[9] E. R. Barnes, An algorithm for separating patterns by ellipsoids, IBM Journal of Research and Development, 26 (1982), pp. 759-764.
[10] W. W. Barrett, C. R. Johnson, and M. Lundquist, Determinantal formulation for matrix completions associated with chordal graphs, Linear Algebra and Appl., 121 (1989), pp. 265-289.
[11] D. Bertsekas and I. Rhodes, Recursive state estimation for a set-membership description of uncertainty, IEEE Trans. Aut. Control, AC-16 (1971), pp. 117-128.
[12] S. Boyd and L. El Ghaoui, Method of centers for minimizing generalized eigenvalues, Linear Algebra and Applications, special issue on Numerical Linear Algebra Methods in Control, Signals and Systems, 188 (1993), pp. 63-111.
[13] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, vol. 15 of Studies in Applied Mathematics, SIAM, Philadelphia, PA, June 1994.
[14] S. Boyd and L. Vandenberghe, Introduction to convex optimization with engineering applications, Lecture Notes, Information Systems Laboratory, Stanford University, 1995. http://www-isl.stanford.edu/people/boyd/392/ee392x.html.
[15] J. P. Burg, D. G. Luenberger, and D. L. Wenger, Estimation of structured covariance matrices, Proceedings of the IEEE, 30 (1982), pp. 963-974.
[16] F. L. Chernousko, Guaranteed estimates of undetermined quantities by means of ellipsoids, Soviet Math. Doklady, 21 (1980), pp. 396-399.
[17] ---, State Estimation for Dynamic Systems, CRC Press, Boca Raton, Florida, 1994.

[18] M. Cheung, S. Yurkovich, and K. M. Passino, An optimal volume ellipsoid algorithm for parameter estimation, IEEE Trans. Aut. Control, AC-38 (1993), pp. 1292-1296.
[19] D. Cook and V. Fedorov, Constrained optimization of experimental design, Statistics, 26 (1995), pp. 129-178.
[20] T. M. Cover and S. Pombra, Gaussian feedback capacity, IEEE Transactions on Information Theory, 35 (1989), pp. 37-43.
[21] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, 1991.
[22] C. Davis, W. M. Kahan, and H. F. Weinberger, Norm-preserving dilations and their applications to optimal error bounds, SIAM J. Numerical Anal., 19 (1982), pp. 445-469.
[23] J. Deller, Set membership identification in digital signal processing, IEEE Trans. Acoust., Speech, Signal Processing, 6 (1989), pp. 4-20.
[24] J. R. Deller, M. Nayeri, and S. F. Odeh, Least-square identification with error bounds for real-time signal processing and control, Proceedings of the IEEE, 81 (1993), pp. 815-849.
[25] A. Dembo, The relation between maximum likelihood estimation of structured covariance matrices and periodograms, IEEE Transactions on Acoustics, Speech and Signal Processing, 34 (1986), pp. 1661-1662.
[26] A. Dembo, C. L. Mallows, and L. A. Shepp, Embedding nonnegative definite Toeplitz matrices in nonnegative definite circulant matrices, with application to covariance estimation, IEEE Transactions on Information Theory, 35 (1989), pp. 1206-1212.
[27] A. P. Dempster, Covariance selection, Biometrics, 28 (1972), pp. 157-175.
[28] D. den Hertog, Interior Point Approach to Linear, Quadratic and Convex Programming, Kluwer, 1993.
[29] P. Dewilde and E. F. A. Deprettere, The generalized Schur algorithm: approximations and hierarchy, in Topics in Operator Theory and Interpolation, I. Gohberg, ed., Birkhauser Verlag, 1988, pp. 97-116.
[30] P. Dewilde and Z. Q. Ning, Models for Large Integrated Circuits, Kluwer, 1990.
[31] H. Dym and I. Gohberg, Extensions of band matrices with band inverses, Linear Algebra and Appl., 36 (1981), pp. 1-24.
[32] L. El Ghaoui, Robustness analysis of semidefinite programs and applications to matrix completion problems, in Proceedings of MTNS-96, 1996.
[33] V. V. Fedorov, Theory of Optimal Experiments, Academic Press, 1971.
[34] A. Fiacco and G. McCormick, Nonlinear programming: sequential unconstrained minimization techniques, Wiley, 1968. Reprinted 1990 in the SIAM Classics in Applied Mathematics series.
[35] E. Fogel, System identification via membership set constraints with energy constrained noise, IEEE Trans. Aut. Control, AC-24 (1979), pp. 752-757.

[36] E. Fogel and Y. Huang, On the value of information in system identification - bounded noise case, Automatica, 18 (1982), pp. 229-238.
[37] R. Grone, C. R. Johnson, E. M. Sa, and H. Wolkowicz, Positive definite completions of partial Hermitian matrices, Linear Algebra and Appl., 58 (1984), pp. 109-124.
[38] J. W. Helton and H. J. Woerdeman, Symmetric Hankel operators: Minimal norm extensions and eigenstructures, Linear Algebra and Appl., 185 (1993), pp. 1-19.
[39] J.-B. Hiriart-Urruty and C. Lemarechal, Convex Analysis and Minimization Algorithms I, vol. 305 of Grundlehren der mathematischen Wissenschaften, Springer-Verlag, New York, 1993.
[40] G. B. Javorzky, I. Kollar, L. Vandenberghe, S. Boyd, and S.-P. Wu, Optimal excitation signal design for frequency domain system identification using semidefinite programming, in Proceedings of the 8th IMEKO TC4 Symposium on Recent Advances in Electrical Measurements, Budapest, Hungary, 1996.
[41] C. R. Johnson, Positive definite completions: a guide to selected literature, in Signal Processing. Part I: Signal Processing Theory, L. Auslander, T. Kailath, and S. Mitter, eds., The IMA volumes in mathematics and its applications, Springer, 1990, pp. 169-188.
[42] C. R. Johnson, B. Kroschel, and H. Wolkowicz, An interior-point method for approximate positive semidefinite completions, Tech. Report CORR 95-11, University of Waterloo, 1995. To appear in Computational Optimization and Applications.
[43] L. G. Khachiyan and M. J. Todd, On the complexity of approximating the maximal inscribed ellipsoid for a polytope, Mathematical Programming, 61 (1993), pp. 137-159.
[44] C. W. Ko, J. Lee, and K. Wayne, A spectral bound for D-optimality, tech. report, Department of Mathematics, University of Kentucky, 1996.
[45] J. Lee, Constrained maximum entropy sampling, tech. report, Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506-0027, 1995. To appear in Operations Research.
[46] A. S. Lewis, Convex analysis on the Hermitian matrices, SIAM J. on Optimization, 6 (1996), pp. 164-177.
[47] A. S. Lewis and M. L. Overton, Eigenvalue optimization, Acta Numerica, (1996), pp. 149-190.
[48] D. G. Luenberger, Optimization by Vector Space Methods, John Wiley & Sons, New York, N.Y., 1969.
[49] M. E. Lundquist and C. R. Johnson, Linearly constrained positive definite completions, Linear Algebra and Appl., 150 (1991), pp. 195-207.
[50] G. Naevdal and H. J. Woerdeman, Partial matrix contractions and intersections of matrix balls, Linear Algebra and Appl., 175 (1992), pp. 225-238.
[51] Y. Nesterov and A. Nemirovsky, Interior-point polynomial methods in convex programming, vol. 13 of Studies in Applied Mathematics, SIAM, Philadelphia, PA, 1994.
[52] J. P. Norton, An Introduction to Identification, Academic Press, 1986.
[53] ---, Identification and application of bounded-parameter models, Automatica, 23 (1987), pp. 497-507.
[54] L. Pronzato and E. Walter, Minimum-volume ellipsoids containing compact sets: Application to parameter bounding, Automatica, 30 (1994), pp. 1731-1739.
[55] F. Pukelsheim, Optimal Design of Experiments, Wiley, 1993.
[56] J. Renegar, A polynomial-time algorithm, based on Newton's method, for linear programming, Mathematical Programming, 40 (1988), pp. 59-93. North-Holland Press.
[57] R. T. Rockafellar, Convex Analysis, Princeton Univ. Press, Princeton, second ed., 1970.
[58] J. B. Rosen, Pattern separation by convex programming, Journal of Mathematical Analysis and Applications, 10 (1965), pp. 123-134.
[59] P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection, Wiley, 1987.
[60] S. S. Sapatnekar, A convex programming approach to problems in VLSI design, PhD thesis, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Aug. 1992.
[61] L. L. Scharf, Statistical Signal Processing, Addison-Wesley, 1991.
[62] F. C. Schweppe, Recursive state estimation: Unknown but bounded errors and system inputs, IEEE Trans. Aut. Control, AC-13 (1968), pp. 22-28.
[63] ---, Uncertain dynamic systems, Prentice-Hall, Englewood Cliffs, N.J., 1973.

[64] G. Sonnevend, An 'analytical centre' for polyhedrons and new classes of global algorithms for linear (smooth, convex) programming, in Lecture Notes in Control and Information Sciences, vol. 84, Springer-Verlag, 1986, pp. 866-878.
[65] ---, Applications of analytic centers, in Numerical Linear Algebra, Digital Signal Processing, and Parallel Algorithms, G. Golub and P. V. Dooren, eds., vol. F70 of NATO ASI Series, 1991, pp. 617-632.
[66] S. Tarasov, L. Khachiyan, and I. Erlikh, The method of inscribed ellipsoids, Soviet Math. Dokl., 37 (1988), pp. 226-230. 1988 American Mathematical Society.
[67] D. M. Titterington, Optimal design: some geometric aspects of D-optimality, Biometrika, 62 (1975), pp. 313-320.
[68] L. Vandenberghe and S. Boyd, Connections between semi-infinite and semidefinite programming, in Proceedings of the International Workshop on Semi-Infinite Programming, R. Reemtsen and J.-J. Rueckmann, eds., 1996. To appear.
[69] ---, Semidefinite programming, SIAM Review, 38 (1996), pp. 49-95.
[70] L. Vandenberghe, S. Boyd, and S.-P. Wu, Determinant maximization with linear matrix inequality constraints, tech. report, Information Systems Laboratory, Stanford University, Feb. 1996.
[71] E. Walter and H. Piet-Lahanier, Estimation of parameter bounds from bounded-error data: a survey, Mathematics and Computers in Simulation, 32 (1990), pp. 449-468.
[72] P. Whittle, Optimization over Time. Dynamic Programming and Stochastic Control, John Wiley & Sons, 1982.
[73] A. Wilhelm, Computing optimal designs by bundle trust methods, Tech. Report 314, Institut fur Mathematik, Universitat Augsburg, 1994.
[74] ---, Algorithms for differentiable and nondifferentiable experimental design problems, Proceedings SoftStat, 1995.
[75] H. S. Witsenhausen, Sets of possible states of linear systems given perturbed observations, IEEE Trans. Aut. Control, 13 (1968), pp. 556-558.
[76] S.-P. Wu, L. Vandenberghe, and S. Boyd, maxdet: Software for Determinant Maximization Problems. User's Guide, Alpha Version, Stanford University, Apr. 1996.