Moment Problems and Semidefinite Optimization

Dimitris Bertsimas, Ioana Popescu, Jay Sethuraman

April

Abstract

Problems involving moments of random variables arise naturally in many areas of mathematics, economics, and operations research. How do we obtain optimal bounds on the probability that a random variable belongs in a set, given some of its moments? How do we price financial derivatives without assuming any model for the underlying price dynamics, given only moments of the price of the underlying asset? How do we obtain stronger relaxations for stochastic optimization problems by exploiting the knowledge that the decision variables are moments of random variables? Can we generate near-optimal solutions for a discrete optimization problem from a semidefinite relaxation by interpreting an optimal solution of the relaxation as a covariance matrix? In this paper, we demonstrate that convex, and in particular semidefinite, optimization methods lead to interesting and often unexpected answers to these questions.

Dimitris Bertsimas, Sloan School of Management and Operations Research Center, Rm E, MIT, Cambridge, MA, dbertsim@mit.edu. Research supported by NSF grant DMI and the Singapore-MIT alliance.

Ioana Popescu, Decision Sciences, INSEAD, Fontainebleau Cedex, FRANCE, ioana.popescu@insead.fr.

Jay Sethuraman, Columbia University, Department of IEOR, New York, NY, jayc@columbia.edu.

Introduction

Problems involving moments of random variables arise naturally in many areas of mathematics, economics, and operations research. Let us give some examples that motivate the present paper.

Moment problems in probability theory

The problem of deriving bounds on the probability that a certain random variable belongs in a set, given information on some of the moments of this random variable, has a rich history, which is very much connected with the development of probability theory in the twentieth century. The inequalities due to Markov, Chebyshev, and Chernoff are some of the classical and widely used results of modern probability theory. Natural questions arise, however:

(a) Are such bounds best possible, i.e., do there exist distributions that match them? A concrete and simple question in the univariate case: is the Chebyshev inequality best possible?

(b) Can such bounds be generalized in multivariate settings?

(c) Can we develop a general theory, based on optimization methods, to address moment problems in probability theory?

Moment problems in finance

A central question in financial economics is to find the price of a derivative security, given information on the underlying asset. This is exactly the area of the Nobel prize in economics awarded to Robert Merton and Myron Scholes. Under the assumption that the price of the underlying asset follows a geometric Brownian motion, and using the no-arbitrage assumption, the Black-Scholes formula provides an explicit and insightful answer to this question. Natural questions arise, however. Making no assumptions on the underlying price dynamics, but only using the no-arbitrage assumption:

(a) What are the best possible bounds for the price of a derivative security, based on the first and second moments of the price of the underlying asset?

(b) How can we derive optimal bounds on derivative securities that are based on multiple underlying assets, given the first two moments of the asset prices and their correlations?

(c) Conversely, given observable option prices, what are the best bounds that we can derive on the moments of the underlying asset?

(d) Finally, given observable option prices, what are the best bounds that we can derive on prices of other derivatives on the same asset?

Moment problems in stochastic optimization

Scheduling a multiclass queueing network is a central problem in stochastic optimization. Queueing networks represent dynamic and stochastic generalizations of job shops and have been used in the last thirty years to model communication, computer, and manufacturing systems. The central optimization problem is to find a scheduling policy that optimizes a performance cost function $c'x + d'y$, where $x = (x_1, \ldots, x_N)$, $x_j$ is the mean number of jobs of class $j$, $y = (y_1, \ldots, y_N)$, $y_j$ is the second moment of the number of jobs of class $j$, and $c$, $d$ are $N$-vectors of nonnegative constants. The design of optimal policies is EXPTIME-hard (Papadimitriou and Tsitsiklis), i.e., it provably requires exponential time, as P $\ne$ EXPTIME. A natural question that arises:

Can we find strong lower bounds efficiently, exploiting the fact that the performance vectors represent moments of random variables?

Moment problems in discrete optimization

The development of semidefinite relaxations in recent years represents an important advance in discrete optimization. In several problems, semidefinite relaxations are provably closer to the discrete optimization solution value than linear ones (Goemans and Williamson, for the max-cut problem, for example). The proof of closeness of the semidefinite relaxation to the discrete optimization solution value involves a randomized argument that exploits the geometry of the semidefinite relaxation. A key question arises:

Is there a general method of generating near-optimal integer solutions, starting from an optimal solution of the semidefinite relaxation?

We will see that the interpretation of the solution of the semidefinite relaxation as a covariance matrix for a collection of random variables leads to such a method, and connects moment problems and discrete optimization.

The central message in this survey paper is that convex, and in particular semidefinite, optimization methods give interesting and often unexpected answers to moment problems arising in probability, economics, and operations research. We also report new computational results in the area of stochastic optimization that show the effectiveness of semidefinite relaxations.

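The key connection

The key connection between moment problems and semidefinite optimization is centered in the notion of a feasible moment sequence. Let $\mathbf{k} = (k_1, \ldots, k_n)'$ be a vector of nonnegative integers.

Definition. A sequence $\sigma : (\sigma_{\mathbf{k}})_{k_1 + \cdots + k_n \le k}$ is a feasible $(n, k, \Omega)$-moment vector (or sequence) if there is a multivariate random variable $X = (X_1, \ldots, X_n)'$ with domain $\Omega \subseteq \mathbb{R}^n$ whose moments are given by $\sigma$, that is,
\[ \sigma_{\mathbf{k}} = E\big[X_1^{k_1} \cdots X_n^{k_n}\big] \quad \text{for all } k_1 + \cdots + k_n \le k. \]
We say that any such multivariate random variable $X$ has a $\sigma$-feasible distribution, and denote this as $X \sim \sigma$.

We denote by $\mathcal{M} = \mathcal{M}(n, k, \Omega)$ the set of feasible $(n, k, \Omega)$-moment vectors.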

The univariate case

For the univariate case ($n = 1$), the problem of deciding if $\sigma = (M_1, M_2, \ldots, M_k)$ is a feasible $(1, k, \Omega)$-moment vector is the classical moment problem, which has been completely characterized by necessary and sufficient conditions (see Karlin and Shapley; Akhiezer; Siu, Sengupta and Lind; and Kemperman).

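Theorem. (a) (Nonnegative random variables) The vector $(M_1, \ldots, M_{2n+1})'$ is a feasible $(1, 2n+1, \mathbb{R}_+)$-moment sequence if and only if the following matrices are semidefinite:
\[
R_{2n} = \begin{pmatrix}
1 & M_1 & \cdots & M_n \\
M_1 & M_2 & \cdots & M_{n+1} \\
\vdots & \vdots & \ddots & \vdots \\
M_n & M_{n+1} & \cdots & M_{2n}
\end{pmatrix} \succeq 0,
\qquad
R_{2n+1} = \begin{pmatrix}
M_1 & M_2 & \cdots & M_{n+1} \\
M_2 & M_3 & \cdots & M_{n+2} \\
\vdots & \vdots & \ddots & \vdots \\
M_{n+1} & M_{n+2} & \cdots & M_{2n+1}
\end{pmatrix} \succeq 0.
\]

(b) (Arbitrary random variables) The vector $(M_1, M_2, \ldots, M_{2n})'$ is a feasible $(1, 2n, \mathbb{R})$-moment sequence if and only if $R_{2n} \succeq 0$.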

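Proof.

We will only show the necessity of part (b). If $(M_1, M_2, \ldots, M_{2n})'$ is a feasible $(1, 2n, \mathbb{R})$-moment sequence, then there exists a probability measure $f(x)$ such that
\[ \int x^k f(x)\,dx = M_k, \quad k = 0, 1, \ldots, 2n, \]
where $M_0 = 1$. Consider the vector $\mathbf{x} = (1, x, x^2, \ldots, x^n)'$ and the semidefinite matrix $\mathbf{x}\mathbf{x}'$. Since $f(x)$ is nonnegative, the matrix
\[ R_{2n} = \int \mathbf{x}\mathbf{x}' f(x)\,dx \]
should be semidefinite.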
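The semidefiniteness test in the univariate theorem can be tried out numerically. The following is a minimal sketch, not part of the original paper: the function names (`is_psd`, `hankel`, `feasible_on_real_line`, `feasible_on_nonnegative_reals`) and the tolerance `tol` are assumptions made for illustration, and the check uses plain Schur-complement elimination rather than a production eigenvalue routine.

```python
def is_psd(matrix, tol=1e-9):
    """Test positive semidefiniteness of a symmetric matrix by repeatedly
    eliminating the leading row and column (Schur complements)."""
    a = [list(map(float, row)) for row in matrix]
    while a:
        pivot = a[0][0]
        if pivot < -tol:
            return False
        if pivot <= tol:
            # A zero diagonal entry forces the rest of its row to vanish.
            if any(abs(v) > tol for v in a[0][1:]):
                return False
            a = [row[1:] for row in a[1:]]
        else:
            a = [[a[i][j] - a[i][0] * a[0][j] / pivot
                  for j in range(1, len(a))]
                 for i in range(1, len(a))]
    return True


def hankel(moments, shift, size):
    """Matrix with entry (i, j) equal to moments[i + j + shift]; moments[0] is M_0 = 1."""
    return [[moments[i + j + shift] for j in range(size)] for i in range(size)]


def feasible_on_real_line(m):
    """m = [M_1, ..., M_2n]; checks that the Hankel matrix R_2n is semidefinite."""
    full = [1.0] + list(m)
    n = len(m) // 2
    return is_psd(hankel(full, 0, n + 1))


def feasible_on_nonnegative_reals(m):
    """m = [M_1, ..., M_(2n+1)]; checks both R_2n and R_(2n+1)."""
    full = [1.0] + list(m)
    n = (len(m) - 1) // 2
    return is_psd(hankel(full, 0, n + 1)) and is_psd(hankel(full, 1, n + 1))
```

For instance, the first three moments of an exponential random variable with rate one, `[1, 2, 6]`, pass the nonnegative test, whereas `[0, 1, 0]` (mean zero, variance one) passes the real-line test on its first two entries but fails the nonnegative one, as expected for a variable that must take negative values.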

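The multivariate case

We consider the question of whether a sequence $\sigma = (M, \Sigma)$ is a feasible $(n, 2, \mathbb{R}^n)$-moment vector, i.e., whether there exists a random vector $X$ such that $E[X] = M$, $E[XX'] = \Sigma$.

Theorem. A sequence $\sigma = (M, \Sigma)$ is a feasible $(n, 2, \mathbb{R}^n)$-moment vector if and only if the following matrix is semidefinite:
\[ \bar{\Sigma} = \begin{pmatrix} 1 & M' \\ M & \Sigma \end{pmatrix} \succeq 0. \]

Proof.

Suppose $(M, \Sigma)$ is a feasible $(n, 2, \mathbb{R}^n)$-moment vector. Then there exists a random variable $X$ such that $E[X] = M$, $E[XX'] = \Sigma$. The matrix $(X - M)(X - M)'$ is semidefinite. Taking expectations, we obtain that
\[ E[(X - M)(X - M)'] = \Sigma - MM' \succeq 0, \]
which expresses the fact that a covariance matrix needs to be semidefinite. It is easy to see that $\Sigma - MM' \succeq 0$ if and only if $\bar{\Sigma} \succeq 0$.

Conversely, if $\bar{\Sigma} \succeq 0$, then $\Sigma - MM' \succeq 0$. Let $X$ be a multivariate normal distribution with mean $M$ and covariance matrix $\Sigma - MM'$. This shows that the vector $(M, \Sigma)$ is a feasible $(n, 2, \mathbb{R}^n)$-moment vector.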
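The step described as "easy to see" in the proof can be spelled out by completing the square. The derivation below is a sketch under the assumption that $M$ denotes the mean vector $E[X]$ and $\Sigma$ the second-moment matrix $E[XX']$; the scalar $t$ and the vector $v$ are auxiliary symbols introduced here for illustration.

```latex
% For any scalar t and any vector v in R^n:
\begin{aligned}
\begin{pmatrix} t & v' \end{pmatrix}
\begin{pmatrix} 1 & M' \\ M & \Sigma \end{pmatrix}
\begin{pmatrix} t \\ v \end{pmatrix}
  &= t^2 + 2t\,M'v + v'\Sigma v \\
  &= (t + M'v)^2 + v'(\Sigma - MM')\,v .
\end{aligned}
% If \Sigma - MM' \succeq 0, both terms are nonnegative for every (t, v),
% so the bordered matrix is semidefinite.
% Conversely, choosing t = -M'v leaves v'(\Sigma - MM')v \ge 0 for every v.
```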

There are known necessary conditions for a sequence to be a feasible $(n, k, \mathbb{R}^n)$-moment vector for higher-order moments ($k > 2$) that also involve the semidefiniteness of a matrix derived from the vector, but these conditions are not known to be sufficient. In general, the complexity of deciding whether a sequence is a feasible $(n, k, \mathbb{R}^n)$-moment vector has not been resolved.

Structure of the paper

The paper is structured as follows. We first outline the application of semidefinite programming to stochastic optimization problems. We then derive explicit and often surprising optimal bounds in probability theory using convex and semidefinite programming methods. Next, we apply convex and semidefinite programming methods to problems in finance. We then illustrate a connection between moment problems and semidefinite relaxations in discrete optimization. The final section contains some concluding remarks.

Semidefinite Relaxations for Stochastic Optimization Problems

The development of semidefinite relaxations represents an important advance in discrete optimization. In this section, we review a theory for deriving semidefinite relaxations for classical stochastic optimization problems. The idea of deriving semidefinite relaxations for this class of problems is due to Bertsimas. Our development in this paper follows Bertsimas and Niño-Mora; the interested reader is referred to these papers for further details. We demonstrate the central ideas for the problem of optimizing a multiclass queueing network, which represents a stochastic and dynamic generalization of a job shop.

Model description

We consider a network of queues composed of $K$ single-server stations and populated by $N$ job classes. The set of job classes $\mathcal{N} = \{1, \ldots, N\}$ is partitioned into subsets $C_1, \ldots, C_K$, so that station $m \in \mathcal{K} = \{1, \ldots, K\}$ only serves classes in its constituency $C_m$. We refer to jobs of class $i$ as $i$-jobs, and we let $s(i)$ be the station that serves $i$-jobs. The network is open, so that jobs arrive from outside, follow a Markovian route through the network ($i$-jobs wait for service at the $i$-queue), and eventually exit. External arrivals of $i$-jobs follow a Poisson process with rate $\alpha_i$ ($\alpha_i = 0$ if class $i$ does not have external arrivals). The service times of $i$-jobs are independent and identically distributed, having an exponential distribution with mean $\beta_i = 1/\mu_i$. Upon completion of service at station $s(i)$, an $i$-job becomes a $j$-job (and hence is routed to the $j$-queue) with probability $p_{ij}$, or leaves the system with probability $p_{i0} = 1 - \sum_{j \in \mathcal{N}} p_{ij}$. We assume that the routing matrix $P = (p_{ij})_{i,j \in \mathcal{N}}$ is such that a single job moving through the network eventually exits, i.e., the matrix $I - P$ is invertible. We further assume that all service times and arrival processes are mutually independent.

The network is controlled by a scheduling policy, which specifies dynamically how each server is allocated to waiting jobs. Scheduling policies can be either dynamic or static. In a dynamic policy, scheduling decisions may depend on the current or past states of all queues; in a static policy, the scheduling decisions of each server are independent of the queue lengths of the job classes. A scheduling policy is stable if the queue-length vector process has an equilibrium distribution with finite mean. We allow policies to be preemptive, i.e., a job's service may be interrupted and resumed later. Finally, a scheduling policy is nonidling if a server cannot idle whenever there is a job waiting for service at that station.

Next we define other model parameters of interest. The effective arrival rate of $j$-jobs, denoted by $\lambda_j$, is the total rate at which both external and internal jobs arrive to the $j$-queue. The $\lambda_j$'s are computed by solving the system
\[ \lambda_j = \alpha_j + \sum_{i \in \mathcal{N}} p_{ij} \lambda_i, \quad \text{for } j \in \mathcal{N}. \]
The traffic intensity of $j$-jobs, denoted by $\rho_j = \lambda_j \beta_j$, is the time-stationary probability that a $j$-job is in service. The total traffic intensity at station $m$ is $\rho(C_m) = \sum_{j \in C_m} \rho_j$, and is the time-stationary probability that server $m$ is busy. We note that the condition
\[ \rho(C_m) < 1, \quad \text{for } m \in \mathcal{K}, \]
is necessary but not sufficient for guaranteeing the stability of any nonidling policy.
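To make the traffic equations concrete, here is a minimal sketch that solves the linear system for the effective arrival rates and then computes the per-class and per-station traffic intensities. The three-class, two-station network in the usage note is hypothetical (it is not the network studied in the paper), classes are indexed from zero, and the helper names `solve_linear`, `effective_arrival_rates` and `station_loads` are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: routing matrix is not transient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def effective_arrival_rates(alpha, p):
    """Solve lambda_j - sum_i p[i][j] * lambda_i = alpha_j for all classes j."""
    n = len(alpha)
    system = [[(1.0 if i == j else 0.0) - p[i][j] for i in range(n)]
              for j in range(n)]
    return solve_linear(system, alpha)


def station_loads(lam, beta, constituencies):
    """Return per-class intensities rho_j = lambda_j * beta_j and per-station totals."""
    rho = [l * b for l, b in zip(lam, beta)]
    return rho, [sum(rho[j] for j in classes) for classes in constituencies]


# Hypothetical network: class 0 feeds class 1; class 2 feeds class 1 half the time.
alpha = [0.3, 0.0, 0.2]
p = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.5, 0.0]]
lam = effective_arrival_rates(alpha, p)
rho, loads = station_loads(lam, beta=[1.0, 0.5, 2.0], constituencies=[[0, 2], [1]])
```

In this example both station loads stay below one, so the necessary condition above is satisfied; as the text notes, that alone does not guarantee stability of every nonidling policy.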

We assume that the system operates in a steady-state regime under a stable policy, and introduce the following variables:

$L_i(t)$ = number of $i$-jobs in system at time $t$;

$B_i(t)$ = 1 if an $i$-job is in service at time $t$, 0 otherwise;

$B^m(t)$ = 1 if server $m$ is busy at time $t$, 0 otherwise; notice that $B^m(t) = \sum_{i \in C_m} B_i(t)$.

In what follows, we write, for convenience of notation, $L_i = L_i(0)$, $B_i = B_i(0)$ and $B^m = B^m(0)$.

The performance optimization problem

The performance measures we are interested in are $x = (x_j)_{j \in \mathcal{N}}$ and $y = (y_j)_{j \in \mathcal{N}}$, where
\[ x_j = E[L_j], \qquad y_j = E[L_j^2], \qquad \text{for } j \in \mathcal{N}, \]
i.e., the vectors whose components are the time-stationary mean and second moment of the number of jobs from each class in the system.

Given a performance cost function $c'x + d'y$, we investigate the following performance optimization problem: compute a lower bound $Z \le c'x + d'y$ that is valid under a given class of admissible policies, and design a policy which nearly minimizes the cost $c'x + d'y$. For our purposes, any preemptive nonidling policy is admissible. In this paper, we restrict our attention to the question of computing strong lower bounds. As we mentioned in the Introduction, the design of optimal policies is EXPTIME-hard (Papadimitriou and Tsitsiklis), i.e., it provably requires exponential time, as P $\ne$ EXPTIME. In recent years, progress has been made in designing near-optimal scheduling policies based on the idea of fluid control, in which discrete jobs are replaced by the flow of a fluid; we refer the interested reader to the papers by Avram, Bertsimas and Ricard; Weiss; Luo and Bertsimas; and the references cited therein for details.

We study the problem of computing good lower bounds via the achievable region approach. The achievable region (equivalently, performance region) $X$ is defined as the set of all performance vectors $(x, y)$ that can be achieved under admissible policies. Our goal is to derive constraints on the performance vector $(x, y)$ that define a relaxation of the performance region $X$. Since it is not obvious how to derive such constraints directly, we pursue the following plan: (a) identify system equilibrium relations and formulate them as constraints involving auxiliary performance variables; (b) formulate additional constraints, both linear and positive semidefinite, on the auxiliary performance variables; (c) formulate constraints that express the original performance vector $(x, y)$ in terms of the auxiliary variables.

Notice that this approach is fairly standard in the mathematical programming literature and has a clear geometric interpretation. It corresponds to constructing a relaxation of the performance region of the natural variables $(x, y)$ by (a) lifting this region into a higher-dimensional space by means of auxiliary variables, (b) bounding the lifted region through constraints on the auxiliary variables, and (c) projecting back into the original space. Lift-and-project techniques have proven powerful tools for constructing tight relaxations for hard discrete optimization problems (see, e.g., Lovász and Schrijver). We have summarized the performance measures considered in this paper, including auxiliary ones, in the table of network performance measures below.

The rest of this section is organized as follows. We first present linear constraints that relate the natural performance measures to auxiliary performance variables. Using the fact that our performance measures are expectations of random variables, we then describe a set of positive semidefinite constraints. Finally, we introduce a linear and a semidefinite relaxation using these constraints, and present computational results that illustrate that the semidefinite relaxation is substantially stronger than the linear programming relaxation.

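Performance variables                                                      Interpretation

$x_j$,          $x = (x_j)_{j \in N}$                                      $x_j = E[L_j]$
$x_j^i$,        $X = (x_j^i)_{i,j \in N}$                                  $x_j^i = E[L_j \mid B_i = 1]$
$x_j^{0m}$,     $X^0 = (x_j^{0m})_{m \in K,\, j \in N}$                    $x_j^{0m} = E[L_j \mid B^m = 0]$
$r_{ij}$,       $R = (r_{ij})_{i,j \in N}$                                 $r_{ij} = E[B_i B_j]$
$r_{ij}^k$,     $R^k = (r_{ij}^k)_{i,j \in N}$                             $r_{ij}^k = E[B_i B_j \mid B_k = 1]$
$r_{ij}^{0m}$,  $R^{0m} = (r_{ij}^{0m})_{i,j \in N}$                       $r_{ij}^{0m} = E[B_i B_j \mid B^m = 0]$
$y_{ij}$,       $Y = (y_{ij})_{i,j \in N}$                                 $y_{ij} = E[L_i L_j]$
$y_{ij}^k$,     $Y^k = (y_{ij}^k)_{i,j \in N}$                             $y_{ij}^k = E[L_i L_j \mid B_k = 1]$
$y_{ij}^{0m}$,  $Y^{0m} = (y_{ij}^{0m})_{i,j \in N}$                       $y_{ij}^{0m} = E[L_i L_j \mid B^m = 0]$

Table: Network performance measures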

Linear constraints

In this section, we present several sets of linear constraints that express natural performance measures in terms of auxiliary ones. The first set of constraints follows from elementary arguments.

Theorem (Elementary constraints). Under any stable policy, the following equations hold:

(a) Projection constraints:
\[ x_j = \sum_{i \in C_m} \rho_i\, x_j^i + \bigl(1 - \rho(C_m)\bigr)\, x_j^{0m}, \quad j \in N,\ m \in K, \]
\[ r_{ij} = \sum_{k \in C_m} \rho_k\, r_{ij}^k + \bigl(1 - \rho(C_m)\bigr)\, r_{ij}^{0m}, \quad i, j \in N,\ m \in K, \]
\[ y_{ij} = \sum_{k \in C_m} \rho_k\, y_{ij}^k + \bigl(1 - \rho(C_m)\bigr)\, y_{ij}^{0m}, \quad i, j \in N,\ m \in K. \]

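(b) Definitional constraints:
\[ r_{ii}^j = \frac{r_{ij}}{\rho_j}, \quad i, j \in N, \]
\[ r_{ii} = \rho_i, \quad r_{ii}^i = 1, \quad i \in N, \]
\[ r_{ij} = r_{ij}^k = 0, \quad i, j \in C_m,\ i \ne j, \]
\[ r_{ij}^k = 0, \quad i, k \in C_m \ (i \ne k) \ \text{ or } \ j, k \in C_m \ (j \ne k), \]
\[ r_{ij}^{0m} = 0, \quad i \text{ or } j \in C_m. \]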

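(c) Lower bound constraints:
\[ r_{ij} \ge \max(0,\ \rho_i + \rho_j - 1), \quad i, j \in N, \]
\[ x_j^i \ge \frac{r_{ij}}{\rho_i}, \quad i, j \in N, \]
\[ x_j^{0m} \ge \max\Bigl(0,\ \frac{\rho_j - \rho(C_m)}{1 - \rho(C_m)}\Bigr), \quad m \in K,\ j \in N, \]
\[ r_{ij}^k \ge \max\Bigl(0,\ \frac{r_{ki} + r_{kj}}{\rho_k} - 1\Bigr), \quad i, j, k \in N, \]
\[ r_{ij}^{0m} \ge \max\Bigl(0,\ \frac{\max(0, \rho_i - \rho(C_m)) + \max(0, \rho_j - \rho(C_m))}{1 - \rho(C_m)} - 1\Bigr), \quad i, j \in N,\ m \in K, \]
\[ y_{ij} \ge r_{ij}, \quad i, j \in N, \]
\[ y_{ij}^k \ge r_{ij}^k, \quad i, j, k \in N, \]
\[ y_{ij}^{0m} \ge r_{ij}^{0m}, \quad i, j \in N,\ m \in K. \]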

Proof.

The constraints in (a) follow by a simple conditioning argument, by noticing that at each time instant a server is either serving some job class in its constituency or idling. The constraints in (b) and (c) follow from elementary arguments.
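The conditioning argument behind the first projection constraint can be written out as follows. This is a sketch of the reasoning rather than a quotation from the paper; it assumes the notation $\rho_i = P(B_i = 1)$, $x_j^i = E[L_j \mid B_i = 1]$ and $x_j^{0m} = E[L_j \mid B^m = 0]$, with the superscript $0m$ marking the event that server $m$ is idle.

```latex
% At any instant, station m is either serving exactly one class i in C_m or idle,
% so the events {B_i = 1}, i \in C_m, together with {B^m = 0} partition the sample space.
\begin{aligned}
x_j = E[L_j]
  &= \sum_{i \in C_m} P(B_i = 1)\, E[L_j \mid B_i = 1]
     + P(B^m = 0)\, E[L_j \mid B^m = 0] \\
  &= \sum_{i \in C_m} \rho_i\, x_j^i + \bigl(1 - \rho(C_m)\bigr)\, x_j^{0m}.
\end{aligned}
% The same conditioning applied to B_i B_j and L_i L_j gives the constraints on r_{ij} and y_{ij}.
```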

Flow conservation constraints

We next present a set of linear constraints on performance measures using the classical flow conservation law of queueing theory, $L^- = L^+$. We first provide a brief discussion of flow conservation in stochastic systems, and then show how to use these ideas to derive linear relations between time-stationary moments of queue lengths.

The classical flow conservation law of queueing systems states that, under mild restrictions, the stationary state probabilities of the number in system at arrival epochs and that at departure epochs are equal. The key assumption is that jobs arrive to the system and depart from the system one at a time, so that the queue size can change only by unit steps.

Consider a multiclass queueing network operating in a steady-state regime, with the number-in-system process $\{L(t)\}$. We assume that the process $\{L(t)\}$ has right-continuous sample paths, and we use $L(t-)$ to denote the left limit of the process at time $t$. (The corresponding right limit $L(t+) = L(t)$, because of right-continuity of sample paths.) Let $A = \{\tau_k^a\}$ and $D = \{\tau_k^d\}$ be the sequences of arrival and departure epochs of jobs, respectively. Let $L_k^a$ be the number of jobs in the system seen by the $k$th arriving job just before its arrival; similarly, let $L_k^d$ be the number of jobs in the system seen by the $k$th departing job just after its departure. We define

\[ L^- = L_0^a \]
and
\[ L^+ = L_0^d. \]

Since we assumed the system to be in steady state, $L^-$ may be interpreted as the number of jobs in the system seen by a typical arrival, while $L^+$ may be interpreted as the number of jobs in the system seen by a typical departure. By considering any realization, we see that for every upward transition for the number in system from $i$ to $i+1$, there is a corresponding downward transition from $i+1$ to $i$; thus every $L_k^a$ is equal to a distinct $L_{k'}^d$ in a sample-path sense. In particular, we have $L^- = L^+$ in distribution, yielding the following theorem.

Theorem (Flow Conservation Law). If jobs enter and leave the system one at a time, then
\[ L^- = L^+ \]
holds in distribution.
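The sample-path pairing of upward and downward transitions can be illustrated with a small experiment. The sketch below is an assumption-laden toy, not the paper's model: it generates an arbitrary unit-step path that starts and ends empty (the function name `unit_step_path_views` and its parameters are hypothetical), and records the number in system just before each arrival and just after each departure.

```python
import random
from collections import Counter


def unit_step_path_views(num_events, up_prob, seed):
    """Simulate a path with +1 / -1 steps that starts and ends at level 0.

    Returns two Counters: levels seen just before arrivals and levels
    left behind just after departures.
    """
    rng = random.Random(seed)
    level = 0
    seen_by_arrivals = Counter()
    seen_by_departures = Counter()
    for _ in range(num_events):
        if level == 0 or rng.random() < up_prob:
            seen_by_arrivals[level] += 1      # number in system just before the arrival
            level += 1
        else:
            level -= 1
            seen_by_departures[level] += 1    # number in system just after the departure
    while level > 0:                          # drain so the path returns to level 0
        level -= 1
        seen_by_departures[level] += 1
    return seen_by_arrivals, seen_by_departures


arrivals_view, departures_view = unit_step_path_views(num_events=10_000, up_prob=0.45, seed=7)
```

Because every up-crossing from $i$ to $i+1$ is matched by a down-crossing from $i+1$ to $i$ on such a path, the two counters coincide exactly, whatever the random draws are; this is the finite-path analogue of the equality in distribution stated in the theorem.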

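In what follows, we apply the law $L^- = L^+$ to a family of queues obtained by aggregating job classes, as explained next. Let $S \subseteq N$.

Definition ($S$-queue). The $S$-queue is the queueing system obtained by aggregating job classes in $S$. The number in system at time $t$ in the $S$-queue is denoted by $L_S(t) = \sum_{j \in S} L_j(t)$.

As usual, we write $L_S = L_S(0)$, and $L_S^-$, $L_S^+$ for the number in the $S$-queue seen by a typical arrival to and a typical departure from the $S$-queue, respectively. For convenience of notation, we also write
\[ p(i, S) = \sum_{j \in S} p_{ij} \]
and
\[ \alpha(S) = \sum_{j \in S} \alpha_j. \]

The next theorem formulates the law $L^- = L^+$ as it applies to the $S$-queue.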

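Theorem (The law $L^- = L^+$ in MQNETs). Under any dynamic stable policy, and for any subset of job classes $S \subseteq N$ and nonnegative integer $l$,
\[ \sum_{i \in S} \lambda_i \bigl(1 - p(i, S)\bigr)\, P(L_S = l + 1 \mid B_i = 1) - \sum_{i \in S^c} \lambda_i\, p(i, S)\, P(L_S = l \mid B_i = 1) = \alpha(S)\, P(L_S = l). \]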

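Proof.

By applying the flow conservation law to the $S$-queue, we have that
\[ P(L_S^- = l) = P(L_S^+ = l). \]
An arrival epoch to the $S$-queue is either an arrival from the outside world (external arrival), which happens with rate $\alpha(S)$, or an internal movement from a class $i$ in $S^c$ to a class in $S$ (internal arrival), which happens only if $B_i = 1$ for $i \in S^c$, with rate
\[ \mu_i\, p(i, S)\, P(B_i = 1) = \mu_i\, p(i, S)\, \rho_i = \lambda_i\, p(i, S). \]
The total arrival rate to the $S$-queue is
\[ \lambda_S^- = \alpha(S) + \sum_{i \in S^c} \lambda_i\, p(i, S). \]
Therefore,
\[ P(L_S^- = l) = \frac{\alpha(S)}{\lambda_S^-}\, P(L_S = l) + \sum_{i \in S^c} \frac{\lambda_i\, p(i, S)}{\lambda_S^-}\, P(L_S = l \mid B_i = 1). \]
A departure epoch from the $S$-queue happens with rate
\[ \mu_i \bigl(1 - p(i, S)\bigr) P(B_i = 1) = \lambda_i \bigl(1 - p(i, S)\bigr) \]
for all $i \in S$. The total departure rate is
\[ \lambda_S^+ = \sum_{i \in S} \lambda_i \bigl(1 - p(i, S)\bigr). \]
It can be easily checked that the total arrival rate to the $S$-queue and the total departure rate from the $S$-queue are equal, i.e., $\lambda_S^- = \lambda_S^+$. Therefore,
\[ P(L_S^+ = l) = \sum_{i \in S} \frac{\lambda_i \bigl(1 - p(i, S)\bigr)}{\lambda_S^+}\, P(L_S = l + 1 \mid B_i = 1). \]
By applying $P(L_S^- = l) = P(L_S^+ = l)$, the identity of the theorem follows.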

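Taking expectations in this identity, we obtain:

Corollary. Under any stable policy, and for any subset of job classes $S \subseteq N$ and positive integer $K$ for which $E[L_1^K + \cdots + L_N^K] < \infty$,
\[ \alpha(S)\, E\bigl[(L_S + 1)^K\bigr] + \sum_{i \in S^c} \lambda_i\, p(i, S)\, E\bigl[(L_S + 1)^K \mid B_i = 1\bigr] = \sum_{i \in S} \lambda_i \bigl(1 - p(i, S)\bigr)\, E\bigl[L_S^K \mid B_i = 1\bigr]. \]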

Note that this corollary formulates a linear relation between time-stationary moments of queue lengths. The equilibrium equations of the corollary corresponding to the first moments and to the sets $S = \{i\}$, $\{i, j\}$, for $i, j \in N$, yield directly the system of linear constraints on performance variables shown next. Let $\Lambda = \mathrm{Diag}(\lambda)$.

Corollary (Flow conservation constraints). Under any dynamic stable policy, the following linear constraints hold:

a

x x I P X X I P I P I P

L L then b If E

N

X

j j

r

p y y p x p j N y

r rj j j jj j jj j jj

jj

jj j

r N

X X

r r

y y y p y p y

i jj j ii i j ij r ri r rj

jj ii

r N r N

X

j j

r i i

p p y y y y y

r ri rj i j i j

ij jj ij

ii ij

r N

j

i

p p x p p p p x

j ji jj i ij j ji i ii ij

j

i

i j N

The flow conservation constraints were first derived for multistation MQNETs by Bertsimas, Paschalidis and Tsitsiklis, and by Kumar and Kumar, using a potential function approach. The derivation we presented is from Bertsimas and Niño-Mora.

Positive semidefinite constraints

We present in this section a set of positive semidefinite constraints that strengthen the formulations obtained through equilibrium relations. Recall that the performance measures $x$, $y$ are moments of random variables. Applying the semidefiniteness characterization of feasible $(n, 2, \mathbb{R}^n)$-moment vectors to the performance variables introduced in the table of network performance measures yields directly the following result.

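Theorem. Under any dynamic stable policy, the following semidefinite constraints hold:

(a) Let $r^k = (r_{ii}^k)_{i \in N}$ and $r^{0m} = (r_{ii}^{0m})_{i \in N}$. Then
\[ \begin{pmatrix} 1 & \rho' \\ \rho & R \end{pmatrix} \succeq 0, \]
\[ \begin{pmatrix} 1 & (r^k)' \\ r^k & R^k \end{pmatrix} \succeq 0, \quad k \in N, \]
\[ \begin{pmatrix} 1 & (r^{0m})' \\ r^{0m} & R^{0m} \end{pmatrix} \succeq 0, \quad m \in K. \]

(b) If $E\bigl[\sum_{j \in N} L_j^2\bigr] < \infty$, then
\[ \begin{pmatrix} 1 & x' \\ x & Y \end{pmatrix} \succeq 0, \]
\[ \begin{pmatrix} 1 & (x^k)' \\ x^k & Y^k \end{pmatrix} \succeq 0, \quad k \in N, \]
\[ \begin{pmatrix} 1 & (x^{0m})' \\ x^{0m} & Y^{0m} \end{pmatrix} \succeq 0, \quad m \in K, \]
where $x^k = (x_j^k)_{j \in N}$ and $x^{0m} = (x_j^{0m})_{j \in N}$.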

On the power of the semidefinite relaxation

Our objective in this section is to compare computationally the linear and semidefinite relaxations of the multiclass queueing network performance optimization problem. The linear programming relaxation is defined as follows:
\[
\begin{array}{lll}
Z_{LP} = & \text{minimize} & c'x + d'y \\
 & \text{subject to} & \text{Projection constraints} \\
 & & \text{Definitional constraints} \\
 & & \text{Lower bound constraints} \\
 & & \text{Flow conservation constraints} \\
 & & x, y \ge 0.
\end{array}
\]
The semidefinite relaxation $Z_{SD}$ is obtained by adding the positive semidefinite constraints.

A multiclass network

Figure: A Multiclass Network

We consider the network of Figure. In this network external arrivals come into either class or class and so In our computations, we fix the service times as shown in the figure and vary only the arrival rates. We maintain the symmetry between classes and so we set where varies from to We select c and d i.e., we are interested in minimizing the expected number of jobs in the system in steady state.

We present below the optimal values $Z_{LP}$ and $Z_{SD}$. The SDP relaxation has variables (including slack variables) and constraints. We solve the semidefinite relaxation using the package SDPA developed by Fujisawa, Kojima and Nakata. In certain cases, SDPA was unable to solve the relaxation to the desired accuracy, but returned primal and dual feasible solutions; in such cases, we report the cost of the best primal and dual feasible solutions obtained by SDPA. This has nothing to do with the size of the SDP relaxation, but perhaps something to do with the particular values of the constraint matrices. All of these instances were solved in less than one minute by SDPA on a Pentium II workstation, as the SDP relaxation has a simple block structure.

$Z_{LP}$    $Z_{SD}$ (primal)    $Z_{SD}$ (dual)    $E[Z_{LBFS\text{-}B}]$    Best $B$

Table: Comparison of LP and SDP relaxations for the network of Figure

For comparison purposes, we also report simulation results for a particular policy that was derived from fluid optimal control (see Avram et al.). When both L(t), L(t) B the first station gives preemptive priority to class and the second station gives preemptive priority to class When L(t) B class has preemptive priority over class Similarly, when L(t) B class has preemptive priority over class We call this policy last-buffer-first-served with a threshold $B$, denoted by LBFS-$B$. We let $E[Z_{LBFS\text{-}B}]$ denote the expected number of jobs under this policy. We select the value of $B$ optimally using simulation.

In Table we report the values $Z_{LP}$, a primal feasible value for $Z_{SD}$, a dual feasible value for $Z_{SD}$, the simulation value $E[Z_{LBFS\text{-}B}]$, and the value of the threshold $B$ that gives the optimal performance. The computational results suggest the following:

(a) The semidefinite relaxation substantially improves the linear programming relaxation. In our experiments the improvement is in the range of The improvement is more substantial as the traffic intensity increases.

(b) The value of the semidefinite relaxation is close to the expected value of the policy LBFS-$B$. This shows not only that the semidefinite relaxation produces near-optimal bounds, but also that the particular policy we constructed is near optimal.

A multiclass single queue

We consider a single-station network with four classes. Our objective here is to minimize a sum over the classes $i$ of terms involving $x_i$ and $y_{ii}$. For the case that we do not include terms involving $y_{ii}$ in the objective function, the LP relaxation is exact (see Bertsimas and Niño-Mora).

We assume that the arrival rate for each class is the same, and that the mean service times for the job classes are and respectively. The results of the LP and SDP relaxations are tabulated in Table. In this experiment, the SDP relaxation has variables and constraints. All of these instances were solved in less than two minutes by SDPA on a Pentium II workstation.

For comparison purposes, we have simulated the following dynamic priority policy $P$: at every service completion time $t$, we give priority to the class that has the highest index L(t). The policy was derived from fluid optimal control (see Avram et al.).

We observe again that the semidefinite relaxation provides a substantial improvement over the value of the LP relaxation, often by an order of magnitude.

Both computational experiments demonstrate that, unlike the LP relaxation, the semidefinite relaxation provides practically useful suboptimality guarantees that can be used to assess the closeness to optimality of heuristic policies. We believe that the combination of fluid optimal control methods to generate near-optimal policies for large-scale problems (see Luo and Bertsimas) and semidefinite relaxation to provide near-optimal bounds is perhaps the most promising methodology to address the multiclass queueing network optimization problem.

$Z_{LP}$    $Z_{SD}$ (primal)    $Z_{SD}$ (dual)    $E[Z_P]$

Table: Comparison of LP and SDP relaxations for a multiclass queue

Optimal Bounds in Probability

In this section, we review the work of Bertsimas and Popescu. Suppose that $\sigma$ is a feasible moment sequence and $X$ has a $\sigma$-feasible distribution. We now define the central problem we address in this section.

The $(n, k, \Omega)$-bound problem

Given a sequence of up to $k$th order moments
\[ \sigma_{\mathbf{k}} = E\bigl[X_1^{k_1} X_2^{k_2} \cdots X_n^{k_n}\bigr], \quad k_1 + k_2 + \cdots + k_n \le k, \]
of a multivariate random variable $X = (X_1, X_2, \ldots, X_n)$ on $\Omega \subseteq \mathbb{R}^n$, find the best possible or tight upper and lower bounds on $P(X \in S)$, for arbitrary events $S \subseteq \Omega$.

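The term best possible or tight upper (and, by analogy, lower) bound above is defined as follows.

Definition. We say that $\alpha$ is a tight upper bound on $P(X \in S)$, and we will denote it by $\sup_{X \sim \sigma} P(X \in S)$, if:

(a) it is an upper bound, i.e., $P(X \in S) \le \alpha$ for all random variables $X \sim \sigma$;

(b) it cannot be improved, i.e., for any $\epsilon > 0$ there is a random variable $X_\epsilon \sim \sigma$ for which $P(X_\epsilon \in S) > \alpha - \epsilon$.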

The well-known inequalities due to Markov, Chebyshev, and Chernoff, which are widely used if we know the first moment, the first two moments, and all moments (i.e., the generating function) of a random variable, respectively, are feasible but not necessarily optimal solutions to the bound problem, i.e., they are not necessarily tight bounds.

In the univariate case, the idea that optimization methods and duality theory can be used to address these types of questions is due to Isii. Thirty years later, Smith generalized this work to the multivariate case, and proposed interesting applications in decision analysis, dynamic programming, statistics, and finance. He also introduces a computational procedure for the $(n, k, \mathbb{R}^n)$-bound problem, although he does not refer to it in this way. Unfortunately, the procedure is far from an actual algorithm, as there is no proof of convergence and no investigation (theoretical or experimental) of its efficiency. Bertsimas and Popescu, whose work we survey in this paper, have proposed convex optimization algorithms to address the $(n, k, \Omega)$-bound problem.

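We examine the existence of an algorithm that, on input $\langle n, k, \sigma, \Omega, \epsilon \rangle$, computes a value $\alpha \in [\alpha^* - \epsilon, \alpha^* + \epsilon]$, where $\alpha^* = \sup_{X \sim \sigma} P(X \in S)$, and runs in time polynomial in $n$, $k$, $\log \sigma_{\max}$ and $\log(1/\epsilon)$, where $\sigma_{\max} = \max_{\mathbf{k}} \sigma_{\mathbf{k}}$. We assume the availability of an oracle to test membership in $S$, and we allow our algorithm to make oracle queries of the type "is $x$ in $S$?".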

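Primal and dual formulations

The $(n, k, \Omega)$-upper bound problem can be formulated as the following optimization problem (P):
\[
\begin{array}{lll}
(P) \quad Z_P = & \text{maximize} & \displaystyle\int_S f(z)\,dz \\
 & \text{subject to} & \displaystyle\int_\Omega z_1^{k_1} \cdots z_n^{k_n} f(z)\,dz = \sigma_{\mathbf{k}}, \quad \forall\, k_1 + \cdots + k_n \le k, \\
 & & f(z) = f(z_1, \ldots, z_n) \ge 0, \quad \forall\, z = (z_1, \ldots, z_n) \in \Omega.
\end{array}
\]
Notice that if Problem (P) is feasible, then $\sigma$ is a feasible moment sequence, and any feasible distribution $f(z)$ is a $\sigma$-feasible distribution.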

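In the spirit of linear programming duality theory, we associate a dual variable $u_{\mathbf{k}}$ with each equality constraint of the primal. We can identify the vector of dual variables with a $k$-degree, $n$-variate dual polynomial:
\[ g(x) = g(x_1, \ldots, x_n) = \sum_{k_1 + \cdots + k_n \le k} u_{\mathbf{k}}\, x_1^{k_1} \cdots x_n^{k_n}. \]
We refer to such a polynomial as a $k$-degree, $n$-variate polynomial. The dual objective translates to finding the smallest value of
\[ \sum_{\mathbf{k}} u_{\mathbf{k}}\, \sigma_{\mathbf{k}} = \sum_{\mathbf{k}} u_{\mathbf{k}}\, E\bigl[X_1^{k_1} \cdots X_n^{k_n}\bigr] = E[g(X)], \]
where the expected value is taken over any $\sigma$-feasible distribution. In this framework, the Dual Problem (D) corresponding to Problem (P) can be written as:
\[
\begin{array}{lll}
(D) \quad Z_D = & \text{minimize} & E[g(X)] \\
 & \text{subject to} & g(x) \ \text{a } k\text{-degree, } n\text{-variate polynomial}, \\
 & & g(x) \ge \chi_S(x), \quad \forall\, x \in \Omega,
\end{array}
\]
where $\chi_S(x)$ is the indicator function of the set $S$, defined by
\[ \chi_S(x) = \begin{cases} 1, & \text{if } x \in S, \\ 0, & \text{otherwise.} \end{cases} \]
Notice that, in general, the optimum may not be achievable. Whenever the primal optimum is achieved, we call the corresponding distribution an extremal distribution. We next establish weak duality.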

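Theorem (Weak duality). $Z_P \le Z_D$.

Proof.

Let $f(z)$ be a primal optimal solution and let $g(z)$ be any dual feasible solution. Then
\[ Z_P = \int_S f(z)\,dz = \int_\Omega \chi_S(z) f(z)\,dz \le \int_\Omega g(z) f(z)\,dz = E[g(X)], \]
and hence
\[ Z_P \le \inf_{g(\cdot) \ge \chi_S(\cdot)} E[g(X)] = Z_D. \]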
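As an illustration of how a dual feasible polynomial certifies a bound, consider the simplest univariate case: a nonnegative random variable with known mean and the event $S = [a, \infty)$. The sketch below is an assumption-based example, not taken from the paper: it uses the degree-one polynomial $g(x) = x/a$ as a hypothetical dual certificate (which recovers Markov's inequality), and a two-point distribution as a candidate extremal distribution; a discrete distribution is used in place of a density for simplicity. All function names are made up for this example.

```python
from fractions import Fraction


def dual_value(mean, a):
    """Dual objective E[g(X)] for the certificate g(x) = x / a, i.e. mean / a."""
    return Fraction(mean) / Fraction(a)


def is_dual_feasible_on(points, a):
    """Check g(x) = x / a >= indicator{x >= a} on the given nonnegative points."""
    a = Fraction(a)
    return all(Fraction(x) / a >= (1 if x >= a else 0) for x in points)


def two_point_distribution(mean, a):
    """Mass mean / a at the point a, remaining mass at 0 (assumes 0 <= mean <= a)."""
    p = Fraction(mean) / Fraction(a)
    return {Fraction(a): p, Fraction(0): 1 - p}


mean, a = 2, 5
dist = two_point_distribution(mean, a)
dist_mean = sum(x * p for x, p in dist.items())
primal_value = sum(p for x, p in dist.items() if x >= a)   # P(X >= a)
```

Here the distribution matches the prescribed mean and attains probability $2/5$, while the certificate gives the dual value $2/5$; since weak duality sandwiches the optimum between the two, the bound is tight in this instance.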

Theorem indicates that by solving the Dual Problem D we obtain an upp er b ound

on the primal ob jective and hence on the probabilitywe are trying to b ound Under some

mild restrictions on the momentvector the dual b ound turns out to b e tight This strong

duality result follows from a more general theorem rst proved in one dimension by Isii

and in arbitrary dimensions by Smith The following theorem is a consequence of their

work

Theorem (Strong Duality and Complementary Slackness). If the moment vector $\sigma$ is an interior point of the set $M$ of feasible moment vectors, then the following results hold:

(a) Strong Duality: $Z_P = Z_D$.

(b) Complementary Slackness: If the dual is bounded, there exists a dual optimal solution $g_{opt}(\cdot)$ and a discrete extremal distribution concentrated on points $x$ where $g_{opt}(x) = \chi_S(x)$, that achieves the bound.

It can also be shown that if the dual is unbounded, then the primal is infeasible, i.e., the multidimensional moment problem is infeasible. Moreover, if $\sigma$ is a boundary point of $M$, then it can be shown that the $\sigma$-feasible distributions are concentrated on a subset $\Omega_0$ of $\Omega$, and strong duality holds provided we relax the dual to $\Omega_0$ (see Smith). In the univariate case, Isii proves that if $\sigma$ is a boundary point of $M$, then exactly one $\sigma$-feasible distribution exists.

If strong duality holds, then by optimizing over Problem (D) we obtain a tight bound on $P(X \in S)$. On the other hand, solving Problem (D) is equivalent to solving the corresponding separation problem, under certain technical conditions (see Grötschel, Lovász and Schrijver).

Explicit bounds for the $(n, 1, R^n_+)$, $(n, 2, R^n)$-bound problems

In this section, we present tight bounds as solutions to $n$ convex optimization problems for the $(n, 1, R^n_+)$-bound problem, and as a solution to a single convex optimization problem for the $(n, 2, R^n)$-bound problem, for the case when the event $S$ is a convex set. The proof of the theorem uses duality for convex optimization problems.

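Theorem. (a) The tight $(n, 1, R^n_+)$-upper bound for an arbitrary convex event $S$ is given by

$$\sup_{X \sim M^+} P(X \in S) = \min\left(1,\ \max_{i = 1, \ldots, n} \frac{M_i}{\inf_{x \in S_i} x_i}\right),$$

where $S_i = S \cap \left( \bigcap_{j \ne i} \{ x \in R^n_+ \mid M_i x_j - M_j x_i \le 0 \} \right)$.

(b) If the bound in (a) is achievable, then there is an extremal distribution that exactly achieves it; otherwise, there is a sequence of distributions with mean $M$ that asymptotically achieve it.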

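The preceding theorem constitutes a multivariate generalization of Markov's inequality. We denote by

$$P(X > M_{e+\delta}) = P\big(X_i \ge (1 + \delta_i) M_i,\ \forall\, i = 1, \ldots, n\big),$$

where $\delta = (\delta_1, \ldots, \delta_n)'$, $e = (1, \ldots, 1)'$, and $M_{e+\delta} = ((1 + \delta_1) M_1, \ldots, (1 + \delta_n) M_n)'$. Then, applying the theorem leads to

$$\sup_{X \sim M^+} P(X > M_{e+\delta}) = \min_{i = 1, \ldots, n} \frac{1}{1 + \delta_i}.$$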
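As a small illustration of the multivariate Markov bound $\min_i 1/(1+\delta_i)$, the following sketch evaluates it for a given vector $\delta$. The function name `multivariate_markov_bound` is hypothetical, and the sketch assumes every $\delta_i \ge 0$, which the text does not state explicitly.

```python
def multivariate_markov_bound(delta):
    """Evaluate min_i 1 / (1 + delta_i).

    Assumption (not stated in the text): every delta_i is nonnegative,
    so the returned value is a valid probability bound in (0, 1].
    """
    delta = [float(d) for d in delta]
    if not delta:
        raise ValueError("delta must contain at least one entry")
    if any(d < 0 for d in delta):
        raise ValueError("this sketch assumes delta_i >= 0")
    # The bound is driven by the largest relative excess delta_i.
    return min(1.0 / (1.0 + d) for d in delta)


# One component: the classical Markov bound P(X >= 2M) <= 1/2.
print(multivariate_markov_bound([1.0]))
# Two components: the tighter requirement (delta = 3) determines the bound.
print(multivariate_markov_bound([1.0, 3.0]))
```

With a single component the expression reduces to the scalar Markov inequality, which is a quick sanity check on the formula.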

The $(n, 2, R^n)$-bound problem for convex sets

We first rewrite the $(n, 2, R^n)$-bound problem in a more convenient form. Rather than assuming that $E[X]$ and $E[XX']$ are known, we assume equivalently that the vector $M = E[X]$ and the covariance matrix $\Gamma = E[(X - M)(X - M)']$ are known. Given a set $S \subseteq R^n$, we find tight upper bounds, denoted by $\sup_{X \sim (M, \Gamma)} P(X \in S)$, on the probability $P(X \in S)$ for all multivariate random variables $X$ defined on $R^n$ with mean $M = E[X]$ and covariance matrix $\Gamma = E[(X - M)(X - M)']$.

First, notice that a necessary and sufficient condition for the existence of such a random variable $X$ is that the covariance matrix $\Gamma$ is symmetric and positive semidefinite. Indeed, given $X$, for an arbitrary vector $a$ we have

$$0 \le E[(a'(X - M))^2] = a' E[(X - M)(X - M)'] a = a' \Gamma a,$$

so $\Gamma$ must be positive semidefinite. Conversely, given a symmetric semidefinite matrix $\Gamma$ and a mean vector $M$, we can define a multivariate normal distribution with mean $M$ and covariance $\Gamma$. Moreover, notice that $\Gamma$ is positive definite if and only if the components of $X - M$ are linearly independent. Indeed, the only way that $0 = a' \Gamma a = E[(a'(X - M))^2]$ for a nonzero vector $a$ is that $a'(X - M) = 0$.

We assume that $\Gamma$ has full rank and is positive definite. This does not reduce the generality of the problem; it just eliminates redundant constraints, and thereby ensures that the strong duality theorem holds. Indeed, the tightness of the bound is guaranteed by that theorem whenever the moment vector is interior to $M$. If the moment vector is on the boundary, it means that the covariance matrix of $X$ is not of full rank, implying that the components of $X$ are linearly dependent. By eliminating the dependent components, we reduce without loss of generality the problem to one of smaller dimension for which strong duality holds. Hence, the primal and the dual problems (P) and (D) satisfy $Z_P = Z_D$.

Theorem. (a) The tight $(n, 2, R^n)$-upper bound for an arbitrary convex event $S$ is given by

$$\sup_{X \sim (M, \Gamma)} P(X \in S) = \frac{1}{1 + d^2},$$

where $d^2 = \inf_{x \in S} (x - M)' \Gamma^{-1} (x - M)$ is the squared distance from $M$ to the set $S$, under the norm induced by the matrix $\Gamma^{-1}$.

(b) If $M \notin S$ and if $d^2 = \inf_{x \in S} (x - M)' \Gamma^{-1} (x - M)$ is achievable, then there is an extremal distribution that exactly achieves the bound in (a); otherwise, if $M \in S$ or if $d^2$ is not achievable, then there is a sequence of $(M, \Gamma)$-feasible distributions that asymptotically approach the bound in (a).

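The preceding theorem constitutes a multivariate generalization of Chebyshev's inequality. The tight multivariate one-sided Chebyshev bound is

$$\sup_{X \sim (M, \Gamma)} P(X > M_{e+\delta}) = \frac{1}{1 + d^2},$$

where, with $M_\delta = (\delta_1 M_1, \ldots, \delta_n M_n)'$, $d^2$ is given by

$$d^2 = \text{minimize } x' \Gamma^{-1} x \quad \text{subject to } x \ge M_\delta,$$

or alternatively $d^2$ is given by the gauge dual problem

$$\frac{1}{d^2} = \text{minimize } x' \Gamma x \quad \text{subject to } x' M_\delta = 1,\ x \ge 0.$$

If $\Gamma^{-1} M_\delta \ge 0$, then the tight bound is expressible in closed form:

$$\sup_{X \sim (M, \Gamma)} P(X > M_{e+\delta}) = \frac{1}{1 + M_\delta' \Gamma^{-1} M_\delta}.$$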
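The closed-form case can be evaluated directly. The sketch below, with a hypothetical function name `one_sided_chebyshev_bound`, computes $1/(1 + M_\delta' \Gamma^{-1} M_\delta)$ and refuses to answer when the condition $\Gamma^{-1} M_\delta \ge 0$ fails, since the quadratic program for $d^2$ would then be needed. It assumes $M_\delta$ has components $\delta_i M_i$ and uses NumPy for the linear algebra.

```python
import numpy as np


def one_sided_chebyshev_bound(M, Gamma, delta):
    """Closed-form bound 1 / (1 + M_delta' Gamma^{-1} M_delta).

    Valid only when Gamma^{-1} M_delta >= 0 componentwise; otherwise the
    quadratic program defining d^2 has to be solved instead.
    """
    M = np.asarray(M, dtype=float)
    Gamma = np.asarray(Gamma, dtype=float)
    delta = np.asarray(delta, dtype=float)
    M_delta = delta * M                      # components delta_i * M_i
    w = np.linalg.solve(Gamma, M_delta)      # w = Gamma^{-1} M_delta
    if np.any(w < 0):
        raise ValueError("closed form requires Gamma^{-1} M_delta >= 0")
    d2 = float(M_delta @ w)                  # squared distance d^2
    return 1.0 / (1.0 + d2)


# Uncorrelated unit-variance components with mean 1 and delta = (1, 1):
# d^2 = 2, so the bound is 1/3.
print(one_sided_chebyshev_bound([1.0, 1.0], np.eye(2), [1.0, 1.0]))
```

When the sign condition fails, the minimizer of $x' \Gamma^{-1} x$ over $x \ge M_\delta$ is no longer $M_\delta$ itself, which is why the sketch raises an error rather than returning a looser value.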





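Surprisingly, the closed-form bound above improves upon Chebyshev's inequality for scalar random variables. In order to express Chebyshev's inequality, we define the squared coefficient of variation $C_M^2 = (M_2 - M_1^2)/M_1^2$. Chebyshev's inequality is given by

$$P(X > (1 + \delta) M_1) \le \frac{C_M^2}{\delta^2},$$

whereas the closed-form bound is stronger:

$$P(X > (1 + \delta) M_1) \le \frac{C_M^2}{C_M^2 + \delta^2}.$$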
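To make the comparison concrete, the following sketch evaluates both scalar bounds for a given squared coefficient of variation and a given $\delta > 0$. The function names are hypothetical and the numbers are only illustrative.

```python
def chebyshev_bound(c2, delta):
    """Classical Chebyshev bound C^2 / delta^2 (may exceed 1)."""
    return c2 / delta ** 2


def one_sided_bound(c2, delta):
    """Tight one-sided bound C^2 / (C^2 + delta^2), always below 1."""
    return c2 / (c2 + delta ** 2)


# With C^2 = 0.25 and delta = 1 the bounds are 0.25 and 0.2.
print(chebyshev_bound(0.25, 1.0), one_sided_bound(0.25, 1.0))
# For small delta the classical bound is vacuous, the one-sided bound is not.
print(chebyshev_bound(0.25, 0.25), one_sided_bound(0.25, 0.25))
```

Since $C_M^2 + \delta^2 > \delta^2$ whenever $C_M^2 > 0$, the one-sided bound is strictly smaller for every $\delta > 0$.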

Later, in the section on efficient algorithms, we review a polynomial time algorithm for the $(n, 2, R^n)$-upper bound problem when the set $S$ is a disjoint union of convex sets.

The complexity of the $(n, 2, R^n_+)$, $(n, k, R^n)$-bound problems

In this section, we show that the separation problems associated with Problem (D) for the $(n, 2, R^n_+)$ and $(n, k, R^n)$-bound problems are NP-hard for $k \ge 3$. By the equivalence of optimization and separation (see Grötschel, Lovász and Schrijver), solving Problem (D) is NP-hard as well. Finally, because of the strong duality theorem, solving the $(n, 2, R^n_+)$ and $(n, k, R^n)$-bound problems with $k \ge 3$ is NP-hard.

The complexity of the $(n, 2, R^n_+)$-bound problem

The separation problem can be formulated as follows in this case:

Problem SEP: Given a multivariate polynomial $g(x) = x'Hx + c'x + d$, and a set $S \subseteq R^n_+$, does there exist $x \in S$ such that $g(x) < 0$?

If we consider the special case $c = 0$, $d = 0$, and $S = R^n_+$, Problem SEP reduces to the question whether a given matrix $H$ is copositive, which is NP-hard (see Murty and Kabadi).

The complexityofthe n k R b ound problem for k

For k the separation problem can b e formulated as follows

n

Problem SEP Given a multivariate p olynomial g x of degree k and a set S R

do es there exist x S such that g x

Bertsimas and Pop escu show that problem SEP is NPhard by p erforming a re

duction from SAT

Moment Problems in Finance

The idea of investigating the relation of option and stock prices just based on the no-arbitrage assumption, but without assuming any model for the underlying price dynamics, has a long history in the financial economics literature. Cox and Ross and Harrison and Kreps show that the no-arbitrage assumption is equivalent to the existence of a probability distribution (the so-called martingale measure) such that option prices become martingales under this measure. In this section, we survey some recent work of Bertsimas and Popescu that sheds new light on the relation of option and stock prices, and shows that the natural way to address this relation, without making distributional assumptions for the underlying price dynamics but only using the no-arbitrage assumption, is the use of convex optimization methods.

In order to motivate the overall approach, we formulate the problem of deriving optimal bounds on the price of a European call option given the mean and variance of the underlying stock price, solved by Lo. A call option on a certain stock with maturity $T$ and strike $k$ gives the owner of the option the right to buy the underlying stock at time $T$ at price $k$. If $X$ is the price of the stock at time $T$, then the payoff of such an option is zero if $X < k$ (the owner will not exercise the option), and $X - k$ if $X \ge k$, i.e., it is $\max(0, X - k)$. Following Cox and Ross and Harrison and Kreps, the no-arbitrage assumption is equivalent to the existence of a probability distribution of the stock price $X$, such that the price of any European call option with strike price $k$ is given by

$$q(k) = E[\max(0, X - k)],$$

where the expectation is taken over the unknown distribution. Note that we have assumed, without loss of generality, that the risk-free interest rate is zero. Moreover, given that the mean and variance of the underlying asset are observable,

$$E[X] = \mu, \quad \text{and} \quad Var[X] = \sigma^2,$$

the problem of finding the best possible upper bound on the call price, written as

$$Z^* = \sup_{X \sim (\mu, \sigma^2)^+} E[\max(0, X - k)],$$

can be formulated as follows:

$$Z^* = \sup\ E[\max(0, X - k)]$$

$$\text{subject to} \quad E[X] = \mu, \quad Var[X] = \sigma^2.$$

Conversely, the problem of finding sharp upper and lower bounds on the moments of the stock price using known option prices can be formulated as follows:

$$\sup / \inf\ E[X] \ \text{ or } \ E[X^2] \ \text{ or } \ E[\max(0, X - k)]$$

$$\text{subject to} \quad E[\max(0, X - k_i)] = q_i, \quad i = 1, \ldots, n.$$

These formulations naturally lead us to the following general optimization problem:

$$\sup / \inf\ E[\phi(X)]$$

$$\text{subject to} \quad E[f_i(X)] = q_i, \quad i = 0, 1, \ldots, n,$$

where $X = (X_1, \ldots, X_m)$ is a multivariate random variable, and $\phi : R^m \to R$ is a real-valued objective function; $f_i : R^m \to R$, $i = 1, \ldots, n$, are also real-valued, so-called moment functions, whose expectations $q_i \in R$, referred to as moments, are known and finite. We assume that $f_0(x) = 1$ and $q_0 = E[f_0(X)] = 1$, corresponding to the implied probability-mass constraint.

Note that the problem of finding optimal bounds on $P(X \in S)$ of the previous section, given moments of $X$, can be formulated as a special case of the general problem above with $\phi(x) = \chi_S(x)$.

Efficient algorithms

In this section, we propose a polynomial time algorithm for the general moment problem formulated above. We analyze the upper bound problem, since we can solve the lower bound problem by changing the sign of the objective function $\phi$:

$$(P) \quad Z_P = \sup\ E[\phi(X)]$$

$$\text{subject to} \quad E[f_i(X)] = q_i, \quad i = 0, 1, \ldots, n.$$

We define the vector of moment functions $f = (f_0, f_1, \ldots, f_n)$ and the corresponding vector of moments $q = (q_0, q_1, \ldots, q_n)$. Without loss of generality, we assume that the moment functions are linearly independent on $R^m$, meaning that there is no nonzero vector $y$ so that $y'f(x) = 0$ for all $x \in R^m$.

The dual problem can be written as

$$(D) \quad Z_D = \inf\ E[y'f(X)] = \inf\ y'q$$

$$\text{subject to} \quad y'f(x) \ge \phi(x), \quad \forall\, x \in R^m.$$

Smith shows that if the vector of moments $q$ is interior to the feasible moment set $M = \{ E[f(X)] \mid X \text{ arbitrary multivariate distribution} \}$, then strong duality holds: $Z_P = Z_D$.

Thus, by solving Problem (D) we obtain the desired sharp bounds. On the other hand, under certain technical conditions (see Grötschel, Lovász and Schrijver), solving Problem (D) is equivalent to solving the corresponding separation problem (S).

The Separation Problem (S):

Given an arbitrary $y = (y_0, y_1, \ldots, y_n)$, check whether $g(x) = y'f(x) - \phi(x) \ge 0$ for all $x \in R^m$, and if not, find a violated inequality.

The following theorem shows that for important special cases, the general moment problem is solvable in polynomial time.

Theorem. If $\phi$ and $f_i$, $i = 1, \ldots, n$, are quadratic or piecewise linear functions, or piecewise constant over $d$ convex sets, and $d$ is a polynomial in $n$, $m$, then the general moment problem can be solved in polynomial time.

Proof.

We consider first the case when all functions are quadratic or linear: $\phi(x) = x'Ax + b'x + c$ and $f_i(x) = x'A_i x + b_i'x + c_i$, $i = 1, \ldots, n$. Then

$$g(x) = y'f(x) - \phi(x) = y_0 + \sum_{i=1}^n y_i f_i(x) - \phi(x) = x' \hat{A} x + \hat{b}'x + \hat{c},$$

where $\hat{A} = \sum_{i=1}^n y_i A_i - A$, $\hat{b} = \sum_{i=1}^n y_i b_i - b$, and $\hat{c} = \sum_{i=1}^n y_i c_i + y_0 - c$. We show that solving the separation problem $\inf_{x \in R^m} g(x)$ reduces to checking whether the matrix $\hat{A} = \sum_{i=1}^n y_i A_i - A$ is positive semidefinite, and in that case solving the corresponding convex quadratic optimization problem.

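The following algorithm solves the separation problem in polynomial time.

Algorithm A:

(a) If $\hat{A}$ is not positive semidefinite, then we find a vector $x_0$ so that $g(x_0) < 0$. We decompose $\hat{A} = Q' \Lambda Q$, where $\Lambda = diag(\lambda_1, \ldots, \lambda_m)$ is the diagonal matrix of eigenvalues of $\hat{A}$. Let $\lambda_i < 0$ be a negative eigenvalue of $\hat{A}$. Let $u$ be a vector with $u_j = 0$ for all $j \ne i$, and $u_i$ large enough so that $\lambda_i u_i^2 + (Q\hat{b})_i u_i + \hat{c} < 0$. Let $x_0 = Q'u$. Then

$$g(x_0) = x_0' \hat{A} x_0 + \hat{b}' x_0 + \hat{c}$$

$$= u' Q Q' \Lambda Q Q' u + \hat{b}' Q' u + \hat{c}$$

$$= u' \Lambda u + \hat{b}' Q' u + \hat{c}$$

$$= \sum_{j} \lambda_j u_j^2 + \sum_{j} (Q\hat{b})_j u_j + \hat{c}$$

$$= \lambda_i u_i^2 + (Q\hat{b})_i u_i + \hat{c} < 0.$$

This produces a violated inequality.

(b) Otherwise, if $\hat{A}$ is positive semidefinite, then we test if $g(x) \ge 0$, $\forall\, x \in R^m$, by solving the convex quadratic optimization problem

$$\inf_{x \in R^m} g(x).$$

If the optimal value is $z_0 < 0$, we find $x_0$ such that $g(x_0) < 0$, which represents a violated inequality.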
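The two steps of Algorithm A can be sketched numerically as follows. This is an illustrative sketch, not the authors' implementation: the function name `separate_quadratic`, the tolerance `tol`, and the step-doubling search used to make $u_i$ "large enough" are all assumptions. It takes the already assembled $\hat{A}$, $\hat{b}$, $\hat{c}$ and returns either `None` (no violated inequality found) or a point $x_0$ with $g(x_0) < 0$.

```python
import numpy as np


def separate_quadratic(A_hat, b_hat, c_hat, tol=1e-9):
    """Separation sketch for g(x) = x'A x + b'x + c.

    Returns None if g(x) >= 0 for all x (up to tol), otherwise a point
    x0 with g(x0) < 0, which yields a violated inequality.
    """
    A = np.asarray(A_hat, dtype=float)
    A = 0.5 * (A + A.T)                      # x'Ax depends only on the symmetric part
    b = np.asarray(b_hat, dtype=float)

    def g(x):
        return float(x @ A @ x + b @ x + c_hat)

    def push(v):
        # Walk along direction v with doubling steps until g turns negative.
        t = 1.0
        for _ in range(200):
            if g(t * v) < 0:
                return t * v
            t *= 2.0
        return None

    lam, V = np.linalg.eigh(A)               # A = V diag(lam) V', lam ascending
    if lam[0] < -tol:
        # Step (a): g decreases without bound along this eigenvector.
        return push(V[:, 0])

    # Step (b): A is positive semidefinite, so g is convex.
    coef = V.T @ b                           # linear term in the eigenbasis
    flat = lam <= tol                        # numerically null directions of A
    for j in np.flatnonzero(flat & (np.abs(coef) > tol)):
        # g is essentially linear along V[:, j], hence unbounded below.
        x0 = push(-np.sign(coef[j]) * V[:, j])
        if x0 is not None:
            return x0
    # Minimizer of the convex quadratic on the range of A.
    x_star = -0.5 * V[:, ~flat] @ (coef[~flat] / lam[~flat])
    return x_star if g(x_star) < -tol else None


# g(x) = x1^2 - x2^2 + 1 is indefinite, so a violating point exists.
print(separate_quadratic([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0], 1.0))
```

In step (b) the sketch also handles the case where $\hat{b}$ has a component in the null space of $\hat{A}$, in which case the convex problem is unbounded below and any sufficiently long step in that direction gives a violated inequality.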

We next examine the case that some of the functions $\phi$ or $f_i$, $i = 1, \ldots, n$, are piecewise linear over $d$ convex sets, with $d$ being a polynomial in $n$, $m$. In this case, the function $g(x) = y'f(x) - \phi(x)$ can be written as $g(x) = x' \hat{A} x + \hat{b}_k' x + \hat{c}_k$ for all $x \in D_k$, $k = 1, \ldots, d$, where the sets $D_k$ form a convex partition of $R^m$ and $d = poly(n, m)$. We show that in this case, again, solving the separation problem reduces to checking whether the matrix $\hat{A}$ is positive semidefinite, and in that case solving the convex quadratic problems $\inf_{x \in D_k} g(x)$, $k = 1, \ldots, d$. This can be done in polynomial time using the ellipsoid algorithm.

The following algorithm solves the separation problem in polynomial time.

Algorithm B:

(a) If $\hat{A}$ is not positive semidefinite, we find a violated inequality in exactly the same way as in part (a) of Algorithm A.

(b) Otherwise, if $\hat{A}$ is positive semidefinite, then we test if $g(x) \ge 0$, $\forall\, x \in R^m$, by solving a polynomial number $d$ of convex quadratic optimization problems

$$\inf_{x \in D_k}\ x' \hat{A} x + \hat{b}_k' x + \hat{c}_k, \quad \text{for } k = 1, \ldots, d.$$

If the optimal value in any of these problems is $z_0 < 0$, we find $x_0$ such that $g(x_0) < 0$, which represents a violated inequality.

Remarks:

(a) The theorem above covers the case in which we would like to find optimal bounds for an option given prices for other options and the first two moments of the underlying stock price. In these cases, the functions $f_i(x)$ and $\phi(x)$ are quadratic or piecewise linear. We will see in the next section that we can derive an explicit answer in the univariate case.

(b) If $\phi(x) = \chi_S(x)$, where $S$ is a disjoint union of $d$ convex sets (and thus $\phi(x)$ is piecewise constant), $f(x) = (x, (x - M)(x - M)')$ and $q = (M, \Gamma)$, so that $n$ counts the resulting first- and second-moment constraints on the $m$ components of $X$, then the general moment problem is the $(n, 2, R^n)$-upper bound problem $\sup_{X \sim (M, \Gamma)} P(X \in S)$, where $S$ is a disjoint union of $d$ convex sets. We have seen in the previous section that when $S$ is a convex set, we can find an explicit answer for the problem (the multivariate Chebyshev theorem). The theorem above provides an algorithmic solution if the set $S$ is a disjoint union of a polynomial number of convex sets.

(c) The semidefinite property plays an important role in the proof of the theorem. Not only do we check whether the matrix $\hat{A}$ is semidefinite as a first step of both Algorithms A and B, but we also solve convex quadratic optimization problems over polyhedral spaces (assuming that the sets $D_k$ are polyhedral, which is typically the case in applications), which can be formulated as semidefinite programming problems.

Bounds on option prices given moment information

In this section, we derive an explicit upper bound for the case of a single stock and a European call option with strike $k$, $\phi(x) = \max(0, x - k)$. The solution is due to Lo. Our proof follows Bertsimas and Popescu. The proof illustrates the use of duality.

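Theorem (Tight upper bound on option prices). The tight upper bound on the price of an option with strike $k$, on a stock whose price at maturity has a known mean $\mu$ and variance $\sigma^2$, is computed by

$$\max_{X \sim (\mu, \sigma^2)^+} E[\max(0, X - k)] = \begin{cases} \dfrac{1}{2}\left[ (\mu - k) + \sqrt{\sigma^2 + (\mu - k)^2} \right], & \text{if } k \ge \dfrac{\mu^2 + \sigma^2}{2\mu}, \\[2ex] \mu - k + k\, \dfrac{\sigma^2}{\mu^2 + \sigma^2}, & \text{if } k < \dfrac{\mu^2 + \sigma^2}{2\mu}. \end{cases}$$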

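Proof.

The dual in this case can be formulated by associating dual variables $y_0$, $y_1$, $y_2$ with the probability-mass, mean, and, respectively, variance constraints. We obtain the following dual formulation:

$$Z_D = \text{minimize } (\mu^2 + \sigma^2)\, y_2 + \mu\, y_1 + y_0$$

$$\text{subject to} \quad g(x) = y_2 x^2 + y_1 x + y_0 \ge \max(0, x - k), \quad \forall\, x \ge 0.$$

A dual feasible function $g(\cdot)$ is any quadratic function that, on the positive orthant, is nonnegative and lies above the line $(x - k)$. In an optimal solution, such a quadratic should be tangent to the line $(x - k)$, so we can write $g(x) - (x - k) = a(x - b)^2$ for some $a \ge 0$. The nonnegativity constraint on $g$ can be expressed as $a(x - b)^2 + x - k \ge 0$, $\forall\, x \ge 0$.

Let $x_0 = b - \frac{1}{2a}$ be the point of minimum of this quadratic. Depending on whether $x_0$ is nonnegative or not, either the inequality at $x = x_0$ or at $x = 0$ is binding in an optimal solution. We have two cases:

(a) If $b \ge \frac{1}{2a}$, then $-\frac{1}{4a} + b - k = 0$ (binding constraint at $x_0$). Substituting $a = \frac{1}{4(b - k)}$ in the objective, we obtain

$$Z_D = \min_b \left[ \frac{(\mu - b)^2 + \sigma^2}{4(b - k)} + (\mu - k) \right] = \frac{1}{2}\left[ (\mu - k) + \sqrt{\sigma^2 + (\mu - k)^2} \right],$$

achieved at $b_0 = k + \sqrt{\sigma^2 + (\mu - k)^2}$. This bound is valid whenever $b_0 \ge \frac{1}{2a} = 2(b_0 - k)$, that is, $\frac{\mu^2 + \sigma^2}{2\mu} \le k$.

(b) If $b < \frac{1}{2a}$, then $ab^2 - k = 0$ (binding constraint at $x = 0$). Substituting $a = \frac{k}{b^2}$ in the objective, we obtain

$$Z_D = \min_b \left[ \frac{k}{b^2}\left( (\mu - b)^2 + \sigma^2 \right) + (\mu - k) \right] = \mu - k + k\, \frac{\sigma^2}{\mu^2 + \sigma^2},$$

achieved at $b_0 = \frac{\mu^2 + \sigma^2}{\mu}$. This bound is valid whenever $b_0 < \frac{1}{2a} = \frac{b_0^2}{2k}$, that is, $\frac{\mu^2 + \sigma^2}{2\mu} > k$.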
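The two-case formula for the tight upper bound is easy to evaluate. The sketch below uses a hypothetical function name, `lo_call_upper_bound`, and assumes $\mu > 0$, $\sigma^2 \ge 0$ and $k \ge 0$ with a zero risk-free rate, as in the text. The two branches agree at the threshold $k = (\mu^2 + \sigma^2)/(2\mu)$, which gives a simple consistency check.

```python
import math


def lo_call_upper_bound(mu, sigma2, k):
    """Tight upper bound on E[max(0, X - k)] for X >= 0 with mean mu
    and variance sigma2 (two-case closed form)."""
    if mu <= 0 or sigma2 < 0 or k < 0:
        raise ValueError("requires mu > 0, sigma2 >= 0 and k >= 0")
    threshold = (mu ** 2 + sigma2) / (2.0 * mu)
    if k >= threshold:
        # Case (a): the dual quadratic is tangent to zero at an interior point.
        return 0.5 * ((mu - k) + math.sqrt(sigma2 + (mu - k) ** 2))
    # Case (b): the nonnegativity constraint binds at x = 0.
    return mu - k + k * sigma2 / (mu ** 2 + sigma2)


# At the threshold k = 1 (mu = 1, sigma2 = 1) both branches give 0.5.
print(lo_call_upper_bound(1.0, 1.0, 1.0))
# A strike of zero prices the option at the mean of the stock.
print(lo_call_upper_bound(1.0, 1.0, 0.0))
```

With zero variance the bound collapses to the deterministic payoff $\max(0, \mu - k)$, as one would expect.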

Moment Problems in Discrete Optimization

In this section, we explore the connection of moment problems and discrete optimization. We consider the maximum $s$-$t$ cut problem. Goemans and Williamson showed that a natural semidefinite relaxation is within 0.878 of the value of the maximum $s$-$t$ cut. Bertsimas and Ye provide an alternative interpretation of their method that makes the connection of moment problems and discrete optimization explicit. We review this development in this section.

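Given an undirected graph on $n$ nodes, and weights $c_{ij}$ on the edges, we would like to find an $s$-$t$ cut of maximum weight. We formulate the problem as follows:

$$Z_{IP} = \text{maximize } \frac{1}{2} \sum_{i < j} c_{ij} (1 - x_i x_j)$$

$$\text{subject to} \quad x_s + x_t = 0, \qquad x_j^2 = 1, \quad j = 1, \ldots, n,$$

and consider the semidefinite relaxation

$$Z_{SD} = \text{maximize } \frac{1}{2} \sum_{i < j} c_{ij} (1 - y_{ij})$$

$$\text{subject to} \quad y_{st} = -1, \qquad y_{jj} = 1, \quad j = 1, \ldots, n, \qquad Y \succeq 0.$$

We solve the semidefinite relaxation and obtain the semidefinite matrix $Y$. In order to understand the closeness of $Z_{SD}$ and $Z_{IP}$, we create a feasible solution to the $s$-$t$ maximum cut problem by interpreting the matrix $Y$ as a covariance matrix.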

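Randomized heuristic H:

1. We generate a vector $x$ from a multivariate normal distribution with mean $0$ and covariance matrix $Y$, that is, $x \sim N(0, Y)$.

2. We create a vector $\hat{x}$ with components equal to $1$ or $-1$: $\hat{x} = sign(x)$, i.e., $\hat{x}_j = 1$ if $x_j \ge 0$, and $\hat{x}_j = -1$ if $x_j < 0$.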
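The two steps of Heuristic H can be sketched with NumPy as follows. The function name `heuristic_h`, the eigenvalue clipping threshold, and the small example matrix are assumptions made for illustration; the matrix $Y$ would normally come from a semidefinite programming solver, which is not shown here.

```python
import numpy as np


def heuristic_h(Y, rng, num_samples=1):
    """Draw x ~ N(0, Y) and round to signs; each row is one +/-1 vector."""
    Y = np.asarray(Y, dtype=float)
    w, V = np.linalg.eigh(Y)
    w = np.where(w > 1e-12, w, 0.0)          # clip round-off and null eigenvalues
    L = V * np.sqrt(w)                       # L @ L.T reproduces Y
    z = rng.standard_normal((num_samples, Y.shape[0]))
    x = z @ L.T                              # rows are draws from N(0, Y)
    return np.where(x >= 0, 1, -1)           # x_hat = sign(x), with sign(0) = 1


# Hypothetical relaxation solution on 3 nodes with s = 0, t = 1 (y_st = -1).
Y = np.array([[1.0, -1.0, 0.5],
              [-1.0, 1.0, -0.5],
              [0.5, -0.5, 1.0]])
rng = np.random.default_rng(0)
x_hat = heuristic_h(Y, rng, num_samples=5)
print(x_hat)
```

Factoring $Y$ through its eigendecomposition, rather than a Cholesky factorization, is a deliberate choice in this sketch: the constraint $y_{st} = -1$ makes $Y$ singular, and a Cholesky routine would typically reject it.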

Notice that instead of using a multivariate normal distribution for $x$, we can use any distribution that has covariance $Cov(x) = Y$. What is interesting is that we show the degree of closeness of $Z_{SD}$ and $Z_{IP}$ by considering long-known results regarding the normal distribution.

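Proposition.

$$E[\hat{x}_j] = 0, \quad E[\hat{x}_j^2] = 1, \quad j = 1, \ldots, n,$$

$$E[\hat{x}_i \hat{x}_j] = \frac{2}{\pi} \arcsin(y_{ij}), \quad i, j = 1, \ldots, n.$$

Proof.

The marginal distribution of $x_i$ is $N(0, 1)$, and thus $P(\hat{x}_i = 1) = P(\hat{x}_i = -1) = \frac{1}{2}$. Thus, $E[\hat{x}_i] = 0$ and $E[\hat{x}_i^2] = 1$. Furthermore,

$$E[\hat{x}_i \hat{x}_j] = P(\hat{x}_i = 1, \hat{x}_j = 1) + P(\hat{x}_i = -1, \hat{x}_j = -1) - P(\hat{x}_i = 1, \hat{x}_j = -1) - P(\hat{x}_i = -1, \hat{x}_j = 1)$$

$$= P(x_i \ge 0, x_j \ge 0) + P(x_i < 0, x_j < 0) - P(x_i \ge 0, x_j < 0) - P(x_i < 0, x_j \ge 0).$$

The computation of tail probabilities of a multivariate normal distribution is a long-studied problem. Sheppard shows (see Johnson and Kotz) that

$$P(x_i \ge 0, x_j \ge 0) = P(x_i < 0, x_j < 0) = \frac{1}{4} + \frac{1}{2\pi} \arcsin(y_{ij}),$$

$$P(x_i \ge 0, x_j < 0) = P(x_i < 0, x_j \ge 0) = \frac{1}{4} - \frac{1}{2\pi} \arcsin(y_{ij}).$$

This leads to

$$E[\hat{x}_i \hat{x}_j] = \frac{2}{\pi} \arcsin(y_{ij}).$$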
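The arcsine identity can be checked empirically. The following Monte Carlo sketch, with a hypothetical function name `sign_correlation_mc`, samples a bivariate normal pair with unit variances and correlation `rho`, rounds to signs, and compares the sample mean of the product with $\frac{2}{\pi}\arcsin(\rho)$. The sample size and seed are arbitrary choices.

```python
import numpy as np


def sign_correlation_mc(rho, num_samples, rng):
    """Monte Carlo estimate of E[sign(x_i) sign(x_j)] for a bivariate
    normal pair with unit variances and correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), cov, size=num_samples)
    x_hat = np.where(x >= 0, 1, -1)
    return float(np.mean(x_hat[:, 0] * x_hat[:, 1]))


rho = 0.5
rng = np.random.default_rng(0)
empirical = sign_correlation_mc(rho, 200_000, rng)
predicted = 2.0 / np.pi * np.arcsin(rho)     # equals 1/3 for rho = 0.5
print(empirical, predicted)
```

For $\rho = 0.5$ the predicted value is exactly $1/3$, since $\arcsin(0.5) = \pi/6$; the estimate should agree up to sampling error of order $1/\sqrt{N}$.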

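Theorem. Heuristic H provides a feasible solution for the $s$-$t$ maximum cut problem with objective value $Z_H$ satisfying

$$E[Z_H] \ge 0.878\, Z_{SD}.$$

Proof.

First, we notice that $E[\hat{x}_s + \hat{x}_t] = 0$ and

$$E[(\hat{x}_s + \hat{x}_t)^2] = E[\hat{x}_s^2] + E[\hat{x}_t^2] + 2 E[\hat{x}_s \hat{x}_t] = 2 + \frac{4}{\pi} \arcsin(-1) = 0,$$

i.e., $\hat{x}_s + \hat{x}_t = 0$ with probability 1, i.e., the solution is feasible.

Moreover, the value of the heuristic solution is

$$E[Z_H] = \sum_{i < j} c_{ij} \left( P(\hat{x}_i = 1, \hat{x}_j = -1) + P(\hat{x}_i = -1, \hat{x}_j = 1) \right)$$

$$= \sum_{i < j} c_{ij} \left( \frac{1}{2} - \frac{1}{\pi} \arcsin(y_{ij}) \right)$$

$$\ge 0.878 \cdot \frac{1}{2} \sum_{i < j} c_{ij} (1 - y_{ij})$$

$$= 0.878\, Z_{SD},$$

where we used the inequality $\frac{1}{2} - \frac{1}{\pi} \arcsin(y_{ij}) \ge 0.878 \cdot \frac{1}{2}(1 - y_{ij})$ from Goemans and Williamson.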
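The Goemans and Williamson inequality used in the last step can be checked numerically. The sketch below scans a grid of values $y \in [-1, 1)$ and computes the smallest ratio between the cut probability $\frac{1}{2} - \frac{1}{\pi}\arcsin(y)$ and the relaxation term $\frac{1}{2}(1 - y)$. The function name `gw_ratio_min` and the grid resolution are assumptions; the endpoint $y = 1$ is excluded because both sides vanish there.

```python
import numpy as np


def gw_ratio_min(num_points=20001):
    """Smallest ratio of the cut probability to the relaxation term
    over a uniform grid of y in [-1, 1)."""
    y = np.linspace(-1.0, 1.0, num_points)[:-1]   # drop y = 1 (0/0)
    cut_prob = 0.5 - np.arcsin(y) / np.pi         # P(x_hat_i != x_hat_j)
    relaxed = 0.5 * (1.0 - y)                     # term in the SDP objective
    return float(np.min(cut_prob / relaxed))


alpha = gw_ratio_min()
print(alpha)
```

The minimum sits slightly above 0.878, which is the constant appearing in the approximation guarantee, so the inequality holds for every admissible $y_{ij}$ on the grid.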

Bertsimas and Ye show that the interpretation of the matrix $Y$ as a covariance matrix leads to interesting bounds for several problems, such as the graph bisection problem, the $s$-$t$-$u$ maximum cut problem, and constrained quadratic maximization. If the distribution used is the multivariate normal distribution, the method is equivalent to the Goemans and Williamson method. However, it would be interesting to explore the generation of random variables $x$ using a distribution other than the multivariate normal distribution. For the max cut problem, the bound 0.878 is not known to be tight. Generating a random vector using a distribution other than the normal might lead to a sharper bound.

Concluding Remarks

We would like to leave the reader with the following closing thoughts:

(a) Convex optimization has a central role to play in moment problems arising in probability and finance. Optimization not only offers a natural way to formulate and study such problems, it also leads to unexpected improvements of classical results.

(b) Semidefinite optimization represents a promising direction for further research in the area of stochastic optimization. The two major areas in which semidefinite programming has had a very positive impact are discrete optimization and control theory. Paralleling the development in discrete optimization, we believe that semidefinite programming can play an important role in stochastic optimization by increasing our ability to find better lower bounds.

References

N. I. Akhiezer. The Classical Moment Problem. Hafner, New York, NY.

F. Avram, D. Bertsimas, and M. Ricard. Optimization of multiclass queueing networks: a linear control approach. In F. Kelly and R. Williams, eds., Stochastic Networks, Proceedings of the IMA.

D. Bertsimas. The achievable region method in the optimal control of queueing systems: formulations, bounds and policies. Queueing Systems.

D. Bertsimas and J. Niño-Mora. Conservation laws, extended polymatroids and multi-armed bandit problems: a polyhedral approach to indexable systems. Math. Oper. Res.

D. Bertsimas and J. Niño-Mora. Optimization of multiclass queueing networks with changeover times via the achievable region approach: Part I, the single-station case. Math. Oper. Res., to appear.

D. Bertsimas and J. Niño-Mora. Optimization of multiclass queueing networks with changeover times via the achievable region approach: Part II, the multi-station case. Math. Oper. Res., to appear.

D. Bertsimas, I. Paschalidis, and J. Tsitsiklis. Optimization of multiclass queueing networks: polyhedral and nonlinear characterizations of achievable performance. Ann. Appl. Probab.

D. Bertsimas and I. Popescu. On the relation between option and stock prices: an optimization approach. Submitted for publication.

D. Bertsimas and I. Popescu. Optimal inequalities in probability: a convex programming approach. Submitted for publication.

D. Bertsimas and Y. Ye. Semidefinite relaxations, multivariate normal distributions, and order statistics. In D.-Z. Du and P. M. Pardalos, eds., Handbook of Combinatorial Optimization. Kluwer Academic Publishers.

J. Cox and S. Ross. The valuation of options for alternative stochastic processes. Journal of Financial Economics.

K. Fujisawa, M. Kojima, and K. Nakata. SDPA (Semidefinite Programming Algorithm) User's Manual. Research Report on Mathematical and Computing Sciences, Tokyo Institute of Technology, Tokyo, Japan.

M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM.

M. Grötschel, L. Lovász, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica.

M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Algorithms and Combinatorics. Springer, Berlin-Heidelberg.

M. Harrison and D. Kreps. Martingales and arbitrage in multiperiod securities markets. Journal of Economic Theory.

K. Isii. On sharpness of Chebyshev-type inequalities. Ann. Inst. Stat. Math.

N. Johnson and S. Kotz. Distributions in Statistics: Continuous Multivariate Distributions. John Wiley & Sons, New York, NY.

S. Karlin and L. S. Shapley. Geometry of moment spaces. Memoirs Amer. Math. Soc.

J. H. B. Kemperman. On the role of duality in the theory of moments. In Lecture Notes in Econom. and Math. Systems. Springer, Berlin-New York.

S. Kumar and P. R. Kumar. Performance bounds for queueing networks and scheduling policies. IEEE Trans. Autom. Control.

A. Lo. Semiparametric upper bounds for option prices and expected payoffs. Journal of Financial Economics.

L. Lovász and A. Schrijver. Cones of matrices and set-functions and 0-1 optimization. SIAM J. Optim.

X. Luo and D. Bertsimas. A new algorithm for state-constrained separated continuous linear programs. SIAM J. Contr. Optim.

K. G. Murty and S. N. Kabadi. Some NP-complete problems in quadratic and nonlinear programming. Math. Progr.

C. H. Papadimitriou and J. N. Tsitsiklis. The complexity of optimal queueing network control. Math. Oper. Res.

S. Sengupta, W. Siu, and N. C. Lind. A linear programming formulation of the problem of moments. Z. Angew. Math. Mech.

W. F. Sheppard. On the calculation of the double integral expressing normal correlation. Trans. Cambr. Phil. Soc.

J. Smith. Generalized Chebyshev inequalities: theory and applications in decision analysis. Oper. Res.

G. Weiss. On the optimal draining of re-entrant fluid lines. In F. Kelly and R. Williams, eds., Stochastic Networks, Proceedings of the IMA.