
SEMIDEFINITE PROGRAMMING AND MULTIVARIATE CHEBYSHEV BOUNDS

Katherine Comanor Lieven Vandenberghe Stephen Boyd

The RAND Corporation, Santa Monica; Electrical Engineering Department, UCLA; Electrical Engineering Department, Stanford University

Abstract: Chebyshev inequalities provide bounds on the probability of a set based on known expected values of certain functions, for example, known power moments. In some important cases these bounds can be efficiently computed via convex optimization. We discuss one particular type of generalized Chebyshev bound, a lower bound on the probability of a set defined by strict quadratic inequalities, given the mean and the covariance of the distribution. We present a semidefinite programming formulation, give an interpretation of the dual problem, and describe some applications.

1. INTRODUCTION

The classical Chebyshev inequality states that

    Prob(|X| ≥ 1) ≤ σ²

for a zero-mean random variable X ∈ R with variance σ². Generalized Chebyshev bounds provide similar upper or lower bounds on the probability of a set in R^n based on known expected values of certain functions, for example, the mean and covariance. Several such multivariate generalizations of Chebyshev's inequality appeared during the 1950s and 1960s; see Isii (1959), Isii (1963), Isii (1964), Marshall and Olkin (1960), Karlin and Studden (1966). Isii (1964), for example, considers the problem of computing upper and lower bounds on E f0(X), where X is a random variable on R^n that satisfies the moment constraints

    E fi(X) = ai,  i = 1, . . . , m.

The best lower bound on E f0(X) is given by the optimal value of the linear optimization problem

    minimize    E f0(X)
    subject to  E fi(X) = ai,  i = 1, . . . , m,                         (1)

where we optimize over all probability distributions on R^n. The dual of this problem is

    maximize    z0 + Σ_{i=1}^m ai zi
    subject to  z0 + Σ_{i=1}^m zi fi(x) ≤ f0(x)  for all x,              (2)

and has a finite number of variables zi, i = 0, . . . , m, but infinitely many constraints. Isii shows that strong duality holds under appropriate constraint qualifications, so we can find a sharp lower bound on E f0(X) by solving (2). Note that the constraints in (2) can be written as a single constraint g(z0, . . . , zm) ≤ 0, where

    g(z0, . . . , zm) = sup_x ( z0 + Σ_{i=1}^m zi fi(x) − f0(x) ).

This is a convex function of z, but in general difficult to evaluate. Hence, (2) is usually an intractable optimization problem.

Bertsimas and Sethuraman (2000), Lasserre (2002), Bertsimas and Popescu (2005), and Popescu (2005) have recently shown that various types of generalized Chebyshev bounds can be computed via semidefinite programming. In this paper we discuss one important example: a lower bound on the probability of a set defined by strict quadratic inequalities, given the mean and the covariance of the distribution. The main purpose of the paper is to outline a proof of the main result from semidefinite programming duality (see Vandenberghe et al. (2006)), give a geometric interpretation, and describe some applications.
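To make the framework concrete before specializing it, note that the classical inequality at the start of this section is recovered by taking f0 equal to the 0-1 indicator function of C = (−1, 1), f1(x) = x, f2(x) = x², with a1 = 0 and a2 = σ²: the choice z0 = 1, z1 = 0, z2 = −1 is feasible in (2), since 1 − x² ≤ f0(x) for all x, and yields the lower bound Prob(|X| < 1) ≥ 1 − σ², i.e., the classical bound Prob(|X| ≥ 1) ≤ σ².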

2. PROBABILITY OF A SET DEFINED BY QUADRATIC INEQUALITIES

Let C be defined by m strict quadratic inequalities

    x^T Ai x + 2 bi^T x + ci < 0,  i = 1, . . . , m,                     (3)

with Ai ∈ S^n (not necessarily positive semidefinite), bi ∈ R^n, ci ∈ R. It is easily verified that the optimal value of the following semidefinite program (SDP) is a lower bound on Prob(X ∈ C) for distributions with E X = x̄ and E XX^T = S:

    maximize    1 − tr(SP) − 2 q^T x̄ − r
    subject to  [P  q; q^T  r] ⪰ 0
                [P − λi Ai   q − λi bi; (q − λi bi)^T   r − 1 − λi ci] ⪰ 0,  i = 1, . . . , m
                λi ≥ 0,  i = 1, . . . , m.                                (4)

The variables in this SDP are P ∈ S^n, q ∈ R^n, r ∈ R, and λi ∈ R, for i = 1, . . . , m.

To verify this, we first note that the constraints imply that

    x^T P x + 2 q^T x + r ≥ 1 + λi (x^T Ai x + 2 bi^T x + ci)

and x^T P x + 2 q^T x + r ≥ 0 for all x. This means that

    x^T P x + 2 q^T x + r ≥ 1 − 1_C(x),

where 1_C is the 0-1 indicator function of C (i.e., 1_C(x) = 1 if x ∈ C and 1_C(x) = 0 otherwise). Hence, if E X = x̄ and E XX^T = S, then

    tr(SP) + 2 q^T x̄ + r = E (X^T P X + 2 q^T X + r) ≥ 1 − E 1_C(X) = 1 − Prob(X ∈ C).

This simple argument shows that 1 − tr(SP) − 2 q^T x̄ − r is a lower bound on Prob(X ∈ C). In the SDP (4) we compute the best lower bound that can be constructed this way.

The proof does not establish that the bound is actually achieved by a distribution with the specified moments. This stronger result can be proved by combining Isii's semi-infinite linear programming bounds with the S-procedure (Boyd et al., 1994, page 23). It can also be proved directly from semidefinite programming duality. The dual problem of (4) is

    minimize    1 − Σ_{i=1}^m λi
    subject to  [Zi  zi; zi^T  λi] ⪰ 0,  i = 1, . . . , m
                Σ_{i=1}^m [Zi  zi; zi^T  λi] ⪯ [S  x̄; x̄^T  1]
                tr(Ai Zi) + 2 bi^T zi + ci λi ≥ 0,  i = 1, . . . , m,     (5)

which is an SDP with variables Zi ∈ S^n, zi ∈ R^n, and λi ∈ R. It can be shown that from every feasible solution of (5) a discrete distribution can be constructed with E X = x̄, E XX^T = S and Prob(X ∈ C) ≤ 1 − Σ_i λi (Vandenberghe et al. (2006)). Tightness of the generalized Chebyshev inequality (4) then follows from the fact that the two SDPs are duals and have the same optimal value.
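As an illustration of how (4) can be set up numerically, the following sketch (our own illustrative code, not from the paper; the function name and calling convention are placeholders) formulates the SDP in CVXPY, collecting P, q, r in a single matrix variable M = [P q; q^T r]:

# Illustrative sketch (not the authors' code) of the lower-bound SDP (4) in CVXPY.
# C is given as a list of tuples (A_i, b_i, c_i) describing x^T A_i x + 2 b_i^T x + c_i < 0;
# xbar is E X and S is the second-moment matrix E X X^T.
import numpy as np
import cvxpy as cp

def chebyshev_lower_bound(xbar, S, quads):
    n = len(xbar)
    # M collects the variables of (4) as M = [P q; q^T r].
    M = cp.Variable((n + 1, n + 1), symmetric=True)
    lam = cp.Variable(len(quads), nonneg=True)

    # Moment matrix [S xbar; xbar^T 1]; then tr(Sigma M) = tr(S P) + 2 q^T xbar + r.
    Sigma = np.block([[S, xbar.reshape(-1, 1)],
                      [xbar.reshape(1, -1), np.ones((1, 1))]])
    E = np.zeros((n + 1, n + 1))
    E[n, n] = 1.0   # accounts for the "-1" in the lower-right entry of the LMIs in (4)

    constraints = [M >> 0]
    for i, (A, b, c) in enumerate(quads):
        Q = np.block([[A, b.reshape(-1, 1)],
                      [b.reshape(1, -1), np.array([[c]])]])
        # [P - lam_i A_i, q - lam_i b_i; (q - lam_i b_i)^T, r - 1 - lam_i c_i] >= 0
        constraints.append(M - lam[i] * Q - E >> 0)

    prob = cp.Problem(cp.Maximize(1 - cp.trace(Sigma @ M)), constraints)
    prob.solve()
    return prob.value

# Example from Section 4: C = {x in R : x^2 < 1}, E X = 0.4, E X^2 = 0.2; bound is about 0.9.
# chebyshev_lower_bound(np.array([0.4]), np.array([[0.2]]),
#                       [(np.array([[1.0]]), np.array([0.0]), -1.0)])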

3. GEOMETRIC INTERPRETATION

Figure 1 shows an example in R^2. The set C is defined by three linear inequalities and two concave quadratic inequalities. The mean x̄ = E X is shown as a small circle, and the set

    {x | (x − x̄)^T (S − x̄ x̄^T)^{-1} (x − x̄) = 1}

is shown as the dashed ellipse. The optimal Chebyshev lower bound on Prob(X ∈ C) for this example is 0.3992. The six solid dots are the possible values of the discrete distribution computed from the optimal solution of the SDP (5). The numbers next to the dots give the probability of the six values. (Since C is defined as an open set, the five points on the boundary are not in C itself, so Prob(X ∈ C) = 0.3992 for this distribution.)

Fig. 1. The distribution that achieves the lower bound on Prob(X ∈ C) for a given mean and covariance.

The solid ellipse is the level curve

    {x | x^T P x + 2 q^T x + r = 1}

where P, q, and r are the optimal solution of the lower bound SDP (4). We notice that the optimal distribution allocates nonzero probability to the points where the ellipse touches the boundary of C, and to its center. This relation between the solutions of the upper and lower bound SDPs holds in general, and is a consequence of the optimality conditions of semidefinite programming.

4. EXAMPLES

In the simplest cases, the two SDPs can be solved analytically. As an example, we derive an extension of the Chebyshev inequality known as Selberg's inequality (Karlin and Studden, 1966, page 475). We take C = (−1, 1) = {x ∈ R | x² < 1}. We show that if E X = x̄ and E X² = s, then

    Prob(|X| < 1) ≥  0                                if s ≥ 1
                     1 − s                            if |x̄| ≤ s < 1
                     (1 − |x̄|)² / (s − 2|x̄| + 1)      if s < |x̄|,        (6)

and that there is a distribution that achieves the bound. We will assume that x̄ ≥ 0. The SDP (4) is

    maximize    1 − sP − 2 x̄ q − r
    subject to  [P  q; q  r − 1] ⪰ λ [1  0; 0  −1]
                [P  q; q  r] ⪰ 0,  λ ≥ 0,

with variables P, q, r, λ ∈ R. The values P = q = λ = 0, r = 1 are obviously feasible, with objective value zero. The values P = λ = 1, r = q = 0 are also feasible, with objective value 1 − s. The values

    [P  q; q  r] = (1 / (s − 2x̄ + 1)²) [1 − x̄; s − x̄] [1 − x̄; s − x̄]^T,    λ = (1 − x̄) / (s − 2x̄ + 1)

are feasible if s < x̄, with objective value

    1 − sP − 2 x̄ q − r = (1 − x̄)² / (s − 2x̄ + 1).

This proves (6). To show that the inequality is tight, we use the dual SDP (5):

    minimize    1 − λ
    subject to  Z ≥ λ
                [Z  z; z  λ] ⪯ [s  x̄; x̄  1]
                [Z  z; z  λ] ⪰ 0,

with variables λ, Z, z ∈ R. If x̄ > s, the values

    Z = z = λ = (s − x̄²) / (s − 2x̄ + 1) = (s − x̄²) / (s − x̄² + (x̄ − 1)²)

are feasible with objective value

    1 − λ = (1 − x̄)² / (s − 2x̄ + 1).

If x̄ ≤ s < 1, we can also take Z = s, z = x̄, λ = s, which provides a feasible point with a smaller objective value, 1 − s. Finally, if s ≥ 1, we can take Z = s, z = x̄, λ = 1, which has objective value zero. This shows that there are distributions with the specified mean and variance for which the right-hand side of (6) is equal to Prob(X ∈ C). For example, if x̄ = E X = 0.4 and s = E X² = 0.2 we get Prob(|X| < 1) ≥ 0.9, and the bound is achieved by the distribution

    X = 1 with probability 0.1,   X = 1/3 with probability 0.9.

This is illustrated in figure 2.

Fig. 2. Primal and dual SDP solution for C = {x | x² < 1}, E X = 0.4 and E X² = 0.2.
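The closed-form bound (6) is also easy to evaluate directly; the small helper below (our own illustrative code, not from the paper) returns its right-hand side:

# Helper (illustrative): evaluate the right-hand side of (6), the lower bound on
# Prob(|X| < 1) given xbar = E X and s = E X^2.
def selberg_bound(xbar, s):
    a = abs(xbar)
    if s >= 1:
        return 0.0
    if a <= s:                                   # |xbar| <= s < 1
        return 1.0 - s
    return (1.0 - a) ** 2 / (s - 2.0 * a + 1.0)  # s < |xbar|

# selberg_bound(0.4, 0.2) returns 0.9, matching the example above.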

Figure 3 illustrates the extension of the bound (6) to a unit ball C = {x ∈ R^n | x^T x < 1}, for

    x̄ = (0.2, 0.3),    S = [0.20  0.06; 0.06  0.11].

The optimal bound is Prob(X ∈ C) ≥ 0.73 and is achieved by the discrete distribution with three possible values shown in the figure.

Fig. 3. Geometrical interpretation of the Chebyshev bound for C = {x ∈ R^2 | x^T x < 1}.

5. BOUNDING MANUFACTURING YIELD

The yield of a manufacturing process can be expressed as

    Y(x) = Prob(x + w ∈ C),

where x denotes the nominal or target values of a set of design parameters, w ∈ R^n is a random vector that represents variations in the manufacturing process, and C ⊆ R^n denotes the set of acceptable parameter values for the product. If the set C is defined by quadratic inequalities, we can compute a lower bound on the yield Y(x), valid for all distributions of w with given mean and covariance, by solving the SDPs (4) and (5). Figure 4 shows an example for a polyhedral set C (shown with a dashed line).

Fig. 4. A few contour lines of the Chebyshev lower bound on Y(x) = Prob(x + w ∈ C) for E w = 0, E ww^T = I.
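One convenient way to evaluate this bound is to absorb the nominal point x into the quadratics: substituting y = x + w in (3) shows that x + w ∈ C exactly when

    w^T Ai w + 2 (Ai x + bi)^T w + (x^T Ai x + 2 bi^T x + ci) < 0,  i = 1, . . . , m,

so for each nominal x the yield bound is the Chebyshev bound (4) applied to w, with the given first and second moments of w and these shifted coefficients. A sketch (again our own illustrative code, reusing the chebyshev_lower_bound sketch from Section 2):

# Illustrative sketch: lower-bound the yield Y(x) = Prob(x + w in C) by applying the
# SDP (4) to w with shifted quadratic data (assumes chebyshev_lower_bound from above).
def yield_lower_bound(x, w_mean, w_second_moment, quads):
    shifted = [(A, A @ x + b, float(x @ A @ x + 2 * b @ x + c)) for (A, b, c) in quads]
    return chebyshev_lower_bound(w_mean, w_second_moment, shifted)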

6. PROBABILITY OF DETECTION ERROR

Consider a signal constellation of m possible symbols or signals s ∈ {s1, s2, . . . , sm}. One of the symbols is transmitted over a noisy channel. The received signal is x = s + v, where v is a noise vector with E v = 0 and E vv^T = σ²I. The receiver then estimates s based on the received x. The minimum distance detector chooses the symbol sk closest (in Euclidean norm) to x, so sk is detected correctly if x = sk + v is closer to sk than to any of the other symbols. The set of values of the random variable x for which symbol sk is correctly detected is therefore a polyhedron Ck (the Voronoi region of sk in the constellation). Generalized Chebyshev bounds can be used to provide lower bounds on the probability of correct detection Prob(sk + v ∈ Ck), valid for any noise distribution with zero mean and covariance σ²I. An example is given in (Boyd and Vandenberghe, 2004, page 381).

As an extension we can consider constellations with unequal noise covariances, i.e., situations in which the noise covariance is E vv^T = Σk when symbol sk is transmitted. Suppose the detector chooses the symbol sk with the minimum Mahalanobis distance

    ((x − sk)^T Σk^{-1} (x − sk))^{1/2}.

(This is also the maximum likelihood detector if the noise is Gaussian.) Then the region of correct detection of symbol sk is the set defined by the m − 1 quadratic inequalities

    (x − sk)^T Σk^{-1} (x − sk) < (x − sj)^T Σj^{-1} (x − sj),  j ≠ k.        (7)

Tight lower bounds on the probability of correct detection can be computed by solving the SDP (4).
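Each inequality in (7) is of the form (3): expanding both quadratics gives

    x^T (Σk^{-1} − Σj^{-1}) x + 2 (Σj^{-1} sj − Σk^{-1} sk)^T x + (sk^T Σk^{-1} sk − sj^T Σj^{-1} sj) < 0,

so, in the notation of (3), the inequality associated with symbol sj has Ai = Σk^{-1} − Σj^{-1}, bi = Σj^{-1} sj − Σk^{-1} sk, and ci = sk^T Σk^{-1} sk − sj^T Σj^{-1} sj. The sketch below (our own illustrative code, intended to feed the chebyshev_lower_bound sketch of Section 2) builds this data:

# Illustrative sketch: build the quadratics (3) that describe the correct-detection
# region C_k from the inequalities (7), given the symbols and per-symbol covariances.
import numpy as np

def detection_quadratics(symbols, covariances, k):
    Pk = np.linalg.inv(covariances[k])
    sk = symbols[k]
    quads = []
    for j, (sj, Sigma_j) in enumerate(zip(symbols, covariances)):
        if j == k:
            continue
        Pj = np.linalg.inv(Sigma_j)
        A = Pk - Pj
        b = Pj @ sj - Pk @ sk
        c = float(sk @ Pk @ sk - sj @ Pj @ sj)
        quads.append((A, b, c))
    return quads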

Figure 5 shows an example with m = 7 symbols in R^2, shown as circles. The dashed ellipses are defined as {x | (x − sk)^T Σk^{-1} (x − sk) = 1}. The solid lines show the boundaries of the regions of correct detection for each symbol, as defined by the quadratic inequalities (7). The figure also illustrates the optimal SDP solution for s1. The solid ellipse is defined by x^T P x + 2 q^T x + r = 1, for the optimal values P, q, r in (4). From the dual SDP we can construct the worst-case distribution (i.e., with the highest probability of error) that matches the specified noise covariance. The worst-case distribution is the discrete distribution indicated by the six solid dots.

Fig. 5. Detection example.

7. CONCLUSION

We have discussed a multivariate extension of Chebyshev's inequality that can be efficiently computed via semidefinite programming. The result follows from more general results of Isii (1964) and Bertsimas and Popescu (2005), and can also be proved directly from semidefinite programming duality. The bounds obtained are the best possible, over all distributions with given first and second order moments. From the optimal solution of the SDPs, the worst-case distribution can be established. We have also described some applications in detection theory and design centering.

REFERENCES

D. Bertsimas and I. Popescu. Optimal inequalities in probability theory: a convex optimization approach. SIAM J. on Optimization, 15(3):780–804, 2005.

D. Bertsimas and J. Sethuraman. Moment problems and semidefinite optimization. In H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors, Handbook of Semidefinite Programming, chapter 16, pages 469–510. Kluwer Academic Publishers, 2000.

S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory, volume 15 of Studies in Applied Mathematics. SIAM, Philadelphia, PA, June 1994. ISBN 0-89871-334-X.

S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. www.stanford.edu/~boyd/cvxbook.

K. Isii. On a method for generalizations of Tchebycheff's inequality. Annals of the Institute of Statistical Mathematics, 10:65–88, 1959.

K. Isii. On sharpness of Tchebycheff-type inequalities. Annals of the Institute of Statistical Mathematics, 14:185–197, 1963.

K. Isii. Inequalities of the types of Chebyshev and Cramér-Rao and mathematical programming. Annals of the Institute of Statistical Mathematics, 16:277–293, 1964.

S. Karlin and W. J. Studden. Tchebycheff Systems: With Applications in Analysis and Statistics. Wiley-Interscience, 1966.

J. B. Lasserre. Bounds on measures satisfying moment conditions. The Annals of Applied Probability, 12(3):1114–1137, 2002.

A. W. Marshall and I. Olkin. Multivariate Chebyshev inequalities. Annals of Mathematical Statistics, 31:1001–1014, 1960.

I. Popescu. A semidefinite programming approach to optimal moment bounds for convex classes of distributions. Mathematics of Operations Research, 30(3):632–657, 2005.

L. Vandenberghe, S. Boyd, and K. Comanor. Generalized Chebyshev bounds via semidefinite programming. To appear in SIAM Review, 2006.