
Fast Algorithms for Logconcave Functions: Sampling, Rounding, Integration and Optimization

László Lovász, Microsoft Research, [email protected]
Santosh Vempala∗, Georgia Tech and MIT, [email protected]

Abstract

We prove that the hit-and-run random walk is rapidly mixing for an arbitrary logconcave distribution starting from any point in the support. This extends the work of [26], where this was shown for an important special case, and settles the main conjecture formulated there. From this result, we derive asymptotically faster algorithms in the general oracle model for sampling, rounding, integration and maximization of logconcave functions, improving or generalizing the main results of [24, 25, 1] and [16] respectively. The algorithms for integration and optimization both use sampling and are surprisingly similar.

1 Introduction

Given a real-valued function f in R^n, computing the integral of f and finding a point that maximizes f are two fundamental algorithmic problems. They include as special cases the famous problems of computing the volume of a convex body (here f is the indicator function of the convex body) and minimizing a linear function over a convex set (here f is equal to the linear function over the convex set and infinity outside). In this general setting, the complexity of an algorithm can be measured by the number of function evaluations ("membership queries"). Both problems are computationally intractable in general [7, 2, 5, 17], and the question of understanding the set of functions for which they are solvable in polynomial time has received much attention over the past few decades.

In a breakthrough paper, Dyer, Frieze and Kannan [6] gave a polynomial-time algorithm (O∗(n^23)) for the special case of computing the volume of a convex body. Many improvements followed [1, 19, 21, 23, 5, 14, 25], the most recent reducing the complexity to O∗(n^4). Applegate and Kannan [1] gave an algorithm for the integration of logconcave functions (with mild smoothness assumptions), a common generalization of convex bodies and Gaussians. The driving idea behind all these results is random sampling by geometric random walks. Unlike volume computation, there has not been much improvement in the complexity of integration, and the current best for general logconcave functions is still O∗(n^10) from [1] (the O∗ notation suppresses the dependence on error parameters and logarithmic terms).

For minimizing a convex function over a convex body (equivalent to maximizing a logconcave function), the original solution was the deterministic ellipsoid method [10]. The current best algorithm is based on sampling by a random walk and its complexity is O∗(n^5) [3] (with a separation oracle, the complexity is O∗(n) [30, 3]). This has recently been improved to O∗(n^4.5) for minimizing a linear function over a convex set [16].

The class of functions for which either integration or optimization can be achieved in (randomized) polynomial time is currently determined by efficient sampling from the distribution whose density is proportional to the given function. In [24], it is shown that arbitrary logconcave functions can be sampled using the ball walk or the hit-and-run walk. However, the walks are proved to be rapidly mixing only from a warm start, i.e., a distribution that is already close to the target. This requirement leads to a considerable overhead in the algorithm (and, of course, in the analysis). For the ball walk, such a requirement is unavoidable in general: the mixing rate depends polynomially on the "distance" between the start and target distributions (see Section 1.2).

On the other hand, the hit-and-run algorithm, which seems to be the fastest sampling algorithm in practice, is conjectured to be rapidly mixing from any starting distribution (including one that is concentrated on a single point); more precisely, the dependence on the distance to the stationary distribution could be logarithmic. An important step towards proving this conjecture was taken in [26], where it was proved for the special case when the target distribution is an exponential function over a convex body (which includes the uniform density). This result plays a key role in the latest volume algorithm [25] and also in a faster algorithm for minimizing a linear function over a convex set [16]. The major hurdle in extending these algorithms to integration and optimization of logconcave functions is the lack of a provably rapid-mixing random walk with a similar mild (logarithmic) dependence on the start.

∗ Supported in part by a Guggenheim fellowship.

1.1 Computational model

First, let us have a brief discussion of the case of convex bodies. One usual definition of mixing time is to take the worst starting point, but we cannot use this here. Suppose that we want to sample from the unit n-cube and we start at a vertex or infinitesimally close to it. Then, for each of the algorithms considered so far, it takes about 2^n time before the orthant in which a nontrivial step can be made is recognized, since our only access to the body is a membership oracle.

One consequence of this is that if we want polynomial bounds in the dimension n, we have to either provide information about the starting point, say by taking into account the distance of the point to the boundary, or else start from a random point from a distribution which is sufficiently spread out. A strong version of this second approach is warm start, where we assume that the density of the starting distribution relative to the stationary distribution is at most 2.

A theorem of Aldous and Diaconis guarantees that if the distance of the distribution after m_0 steps from the stationary distribution is less than 1/4 from any starting point, then after m steps it is less than 2^{−m/m_0}; but this gives very poor bounds here because of "bad" starting points. So polynomiality in log(1/ε) (where ε is the distance we want) is an issue here.

To be more precise, a convex body sampling problem is given by:

(CS1) a convex body K ⊆ R^n, which is given by a membership oracle, together with two "guarantees" r and R that it contains a ball with radius r and is contained in a ball with radius R;

(CS2) a starting point a_0 ∈ K, together with a "guarantee" d > 0 such that B(a_0, d) ⊆ K;

(CS3) an error bound ε > 0, describing how close the sample point's distribution must be to the uniform distribution on K (in total variation distance).

Instead of (CS2), we could have

(CS2') a random starting point a_0 ∈ K, together with a "guarantee" M ≥ 1 such that the density of the distribution σ of a_0 is bounded by M (relative to the uniform density on K), or

(CS2'') a random starting point a_0 ∈ K, together with a parameter M ≥ 1 such that the density of the distribution σ of a_0 is bounded by M except in a subset S with σ(S) ≤ ε/2.

It turns out that (CS2'') is the most useful version for applications.

More generally, we consider the problem of sampling from a logconcave distribution. To state the problem formally, we consider a logconcave function f : R^n → R_+; it will be convenient to assume that f has a bounded support, so that there is a convex body K such that f = 0 outside K. We consider the distribution π_f defined by π_f(X) = ∫_X f / ∫_{R^n} f. So f is proportional to the density function of the distribution, but not necessarily equal, i.e., we don't assume that ∫ f = 1. We'll need the level sets L(a) = {x ∈ R^n : f(x) ≥ a}. The centroid of π_f is z_f = ∫ x dπ_f(x) = E(x), and the variance of π_f is Var(π_f) = ∫ ‖x − z_f‖² dπ_f = E(‖x − z_f‖²).

This sampling problem is given by:

(LS1) an oracle that evaluates f at any given point, together with "guarantees" that the distribution is neither too spread out nor too concentrated; this will be given by r, R > 0 such that if π_f(L(c)) ≥ 1/8 then L(c) contains a ball of radius r, and the variance of π_f is at most R²;

(LS2) a starting point a_0 ∈ R^n, together with a "guarantee" d > 0 such that a_0 is at distance d from the boundary of the level set L(f(a_0)/2), and another guarantee β > 0 such that f(a_0) ≥ β max f;

(LS3) an error bound ε > 0, describing how close the sample point's distribution must be to the target distribution π_f (in total variation distance).

Again, we could replace (LS2) by the assumption that we have

(LS2') a random starting point a_0 from a distribution σ, and a "guarantee" M ≥ 1 such that dσ/dπ_f ≤ M,

or even more generally,

(LS2'') a random starting point a_0 from a distribution σ, and a "guarantee" M ≥ 1 such that dσ/dπ_f ≤ M except for a set S with σ(S) ≤ ε.

The sampling algorithm we analyze is the hit-and-run walk in R^n, which we start at a_0 and stop after m steps (see Section 3 for the definition). Let σ^m denote the distribution of the last point. We say that the walk is rapidly mixing if d_tv(π_f, σ^m) ≤ ε for some m polynomial in n, log R, log(1/r), log(1/ε), log(1/d) and log(1/β) (or log M).

1.2 Results

In this paper, we first show that hit-and-run applied to any logconcave function mixes rapidly. The mixing time is O∗(n³) after appropriate normalization (to be discussed shortly). It is worth noting that this is the first random walk proven to be rapidly mixing (in the sense above) even for the special case of convex bodies. (Strictly speaking, the other well-known walks, namely the ball walk and the lattice walk ([8, 9]), are not rapidly mixing.)

Theorem 1.1 Let f be a logconcave function in R^n, given in the sense of (LS1), (LS2'') and (LS3). Then for

m > 10^30 (n²R²/r²) ln²(MnR/rε) ln³(M/ε),

the total variation distance of σ^m and π_f is less than 2ε.

Remarks:
1. If, instead of a bound on the moment E(‖x − z_f‖²), we have the stronger condition that the support of f is contained in a ball of radius R, then the bound on the mixing time is smaller by a factor of ln²(M/ε).
2. The condition (LS2'') captures the setting when the L2 distance (see Section 2) between the start and target densities is bounded.
3. This result improves on the main theorem of [24] by reducing the dependence on M and ε from (M/ε)^4 to polylogarithmic. For the ball walk, some polynomial dependence on M is unavoidable (consider, e.g., a starting distribution that is uniform in the set of points within δ/2 of the apex of a rotational cone, where δ is the radius of the ball walk).

To analyze hit-and-run starting at a single point, we take one step and then apply the theorem above to the distribution obtained.

Corollary 1.2 Let f be a logconcave function in R^n, given in the sense of (LS1), (LS2) and (LS3). Then for

m > 10^31 (n³R²/r²) ln⁵(nR²/εrdβ),

the total variation distance of σ^m and π_f is less than ε.

The bound above does not imply rapid mixing, because the number of steps is polynomial in R/r rather than in its logarithm. We call a logconcave function well-rounded if R/r = O(√n). In the rest of this introduction, we assume that our logconcave functions are well-rounded. In this case, our algorithms are polynomial, and the (perhaps most interesting) dependence on the dimension is O∗(n³). Every logconcave function can be brought to a well-rounded position by an affine transformation of the space in polynomial time (we'll return to this shortly).

With this efficient sampling algorithm at hand, we turn to the problem of integration of logconcave functions, perhaps the most important problem considered here, at least from a practical viewpoint. The idea for integration, first suggested in [22], can be viewed as a generalized version of the method called simulated annealing in a continuous setting. We consider a sequence of functions of the form f(x)^{1/T}, where T, the "temperature", starts very high (which means the corresponding distribution is nearly uniform over the support of f) and is reduced to 1. We will see shortly that the same idea is also suitable for maximization of logconcave functions.

Theorem 1.3 Let f be a well-rounded logconcave function given by a sampling oracle. Given ε, δ > 0, we can compute a number A such that with probability at least 1 − δ,

(1 − ε) ∫ f ≤ A ≤ (1 + ε) ∫ f,

and the number of oracle calls is

O((n⁴/ε²) log⁷(n/εδ)) = O∗(n⁴).

In [25], it was shown that a convex body can be put in near-isotropic position (which implies well-roundedness, see Section 2) in O∗(n⁴) steps. In Section 6, we give an algorithm to put an arbitrary logconcave function in near-isotropic position in O∗(n⁴) steps. However, this algorithm (as previous algorithms) requires the knowledge of a point where f is (approximately) maximized (a stronger version of (LS2)). In many cases, such a point is much easier to find. If one has a separation oracle for f, which, given a point x, returns a hyperplane that separates x from the points that attain the maximum (e.g., a supporting plane to the level set L(f(x))), then the maximization problem can be solved using O∗(n) oracle calls [3, 30]. This gives an integration algorithm of complexity O∗(n⁴) for arbitrary logconcave functions, improving on the O∗(n^10) algorithm of Applegate and Kannan.

It is, however, possible that the only access to f is via an oracle that evaluates f at a given point. In this setting, the maximization problem seems more difficult. Our last result is an algorithm for this problem. The idea for maximization, described in [16] and earlier for a more restricted setting in [15], is again simulated annealing: we consider a sequence of functions f(x)^{1/T}, where T starts very high (so the distribution is close to uniform) and is reduced till the distribution is concentrated sufficiently close to points that maximize f (so, unlike integration, we do not stop when T = 1 but proceed further till T is sufficiently close to zero). The resulting algorithm can be viewed as an interior-point method that requires only O∗(√n) phases. By using f(x) = e^{−g(x)}, we get an algorithm for minimizing any convex function g over a convex body.

Theorem 1.4 For any well-rounded logconcave function f, given ε, δ > 0 and a point x_0 with f(x_0) ≥ β max f, we can find a point x in O∗(n^{4.5}) oracle calls such that, with probability at least 1 − δ,

f(x) ≥ (1 − ε) max f,

and the dependence on ε, δ and β is bounded by a polynomial in ln(1/εδβ).

This improves on the previous best algorithm [3] by a factor of √n.

2 Preliminaries

We measure the distance between two distributions σ and π in two ways: by the total variation distance

d_tv(σ, π) = (1/2) ∫_{R^n} |σ(x) − π(x)| dx,

and by the L2 distance

d_2(σ, π) = ∫_{R^n} (σ(x)/π(x)) dσ(x).

A function f : R^n → R_+ is said to be logconcave if f(αx + (1 − α)y) ≥ f(x)^α f(y)^{1−α} for every x, y ∈ R^n and 0 ≤ α ≤ 1. Gaussians, exponentials and indicator functions over convex bodies are all logconcave. The product, minimum and convolution of two logconcave functions is also logconcave [4, 27, 18]. The last property implies that any marginal of a logconcave function is also logconcave. We will use several known properties of logconcave functions. The following property, proved in [16] (a variant of a lemma from [25]), will be used multiple times.

Lemma 2.1 ([16]) Let f be a logconcave function in R^n. For a > 0, let

Z(a) = ∫_{R^n} f(x)^a dx.

Then a^n Z(a) is a logconcave function of a.

The level sets of a function f are denoted by L_f(t) = {x | f(x) ≥ t}, or just L(t) when the context is clear. For a logconcave f, the level sets are all convex.

A density function f is said to be in isotropic position if E_f(x) = 0 and E_f(xx^T) = I. The second condition is equivalent to saying that

∫_{R^n} (v^T x)² f(x) dx = 1

for any unit vector v ∈ R^n. Similarly, f is C-isotropic if

1/C ≤ ∫_{R^n} (v^T (x − z_f))² f(x) dx ≤ C

for any unit vector v ∈ R^n. For any function, there is an affine transformation that puts it in isotropic position. An approximation to this can be computed from a random sample, as implied by combining a lemma from [28] with a moment bound from [24] (see, e.g., Corollary A.2 in [16]). We quote it below.

Lemma 2.2 Let f be a logconcave function in R^n that is not concentrated on a subspace, and let X^1, ..., X^k be independent random points from the distribution π_f. There is a constant C_0 such that if k > C_0 t n ln³ n, then the transformation g(x) = T^{−1/2} x, where

X̄ = (1/k) Σ_{i=1}^k X^i,  T = (1/k) Σ_{i=1}^k (X^i − X̄)(X^i − X̄)^T,

puts f in 2-isotropic position with probability at least 1 − 1/2^t.

A weaker normalization will also be useful. We say that a function f is C-round if R/r ≤ C√n, where R² = E_f(|X − z_f|²) and the level set of f of measure 1/8 contains a ball of radius r. When C is a constant, we say that f is well-rounded. For an isotropic function f, we have E_f(|X|²) = n. In [24], it is shown that if f is also logconcave, then any level set L contains a ball of radius π_f(L)/e. Thus, if a logconcave function is isotropic, then it is also (8e)-round.

3 Analysis of the hit-and-run walk

The hit-and-run walk for an integrable, nonnegative function f in R^n is defined as follows:

• Pick a uniformly distributed random line ℓ through the current point.

• Move to a random point y along the line ℓ, chosen with density proportional to f (restricted to the line).

It is well-known that the stationary distribution of this walk is π_f. Our goal is to bound the rate of convergence. We will use the conductance technique of Jerrum and Sinclair [11], as extended to the continuous setting by Lovász and Simonovits [23]. The state space of hit-and-run is the support K of f. For any measurable subset S ⊆ K and x ∈ K, we denote by P_x(S) the probability that a step from x goes to S. For 0 < π_f(S) < 1, the conductance φ(S) is defined as

φ(S) = ∫_{x∈S} P_x(K \ S) dπ_f / min{π_f(S), π_f(K \ S)}.

In words, this is the probability of going from S to its complement in one step, given that we start randomly on the side of smaller measure. Then, if φ is the minimum conductance over all measurable subsets, O(1/φ²) is a bound on the mixing time (roughly speaking, the number of steps required to halve the distance to the stationary distribution).

At a high level, the analysis has two parts: (i) "large" subsets (under the measure defined by f) have large boundaries, and (ii) points that are "near" the boundary are likely to cross from a subset to its complement in one step. The first part will be given by an isoperimetric inequality. In fact, we will use the weighted inequality developed in [26], quoted below. This geometric inequality is independent of any particular random walk. It is the second part that is quite different for different random walks. It consists of connecting appropriate notions of geometric distance and probabilistic distance by a careful analysis of single steps of the random walk.

The isoperimetric inequality below uses the cross-ratio distance, defined as follows. For two points u, v in a convex body K, let p, q be the endpoints of the line through u, v, so that they appear in the order p, u, v, q. Then the cross-ratio distance is

d_K(u, v) = |u − v||p − q| / (|p − u||v − q|).

Theorem 3.1 ([26]) Let K be a convex body in R^n. Let f : K → R_+ be a logconcave function, π_f the corresponding distribution and h : K → R_+ an arbitrary function. Let S_1, S_2, S_3 be any partition of K into measurable sets. Suppose that for any pair of points u ∈ S_1 and v ∈ S_2, and any point x on the chord of K through u and v,

h(x) ≤ (1/3) min(1, d_K(u, v)).

Then

π_f(S_3) ≥ E_f(h(x)) min{π_f(S_1), π_f(S_2)}.

For a function f in R^n, let µ_{ℓ,f} be the measure induced by f on the line ℓ. For two points u, v ∈ R^n, let ℓ be the line through them, ℓ⁻ the semi-line starting at u not containing v, and ℓ⁺ the semi-line starting at v not containing u. Then the f-distance is

d_f(u, v) = µ_{ℓ,f}([u, v]) µ_{ℓ,f}(ℓ) / (µ_{ℓ,f}(ℓ⁻) µ_{ℓ,f}(ℓ⁺)).

3.1 The smoothness of single steps

As in [20, 26], at a point x, we define the step-size F(x) by

P(|x − y| ≤ F(x)) = 1/8,

where y is a random step from x. Next, we define

λ(x, t) = vol((x + tB) ∩ L((3/4)f(x))) / vol(tB)

and

s(x) = sup{t ∈ R_+ : λ(x, t) ≥ 63/64}.

Finally, we define α(x) to be the smallest t ≥ 3 for which a hit-and-run step y from x satisfies P(f(y) ≥ t f(x)) ≤ 1/16. The following lemma was proved in [24].

Lemma 3.2 ([24])

π_f(u : α(u) ≥ t) ≤ 16/t.

The next two lemmas are from [26].

Lemma 3.3 ([26])

F(x) ≥ s(x)/64.

Lemma 3.4 ([26]) Let f be any logconcave function such that the level set of f of measure 1/8 contains a ball of radius r. Then

E_f(s(x)) ≥ r / (2^10 √n).

Next we prove a new lemma that will be crucial in extending the analysis of hit-and-run.

Lemma 3.5 Let u, v ∈ K, the support of a logconcave function f, and let x ∈ [u, v]. If s(x) ≥ 4c|u − v|√n for some c ≥ 1, then

d_f(u, v) ≤ 1/c and |u − v| ≤ max{s(u), s(v)} / (c√n).

Proof. First suppose d_f(u, v) ≥ 1/c. Let y, y′ be the points on the line ℓ(u, v) at distance 4c|u − v| from x. From the definition of d_f and the logconcavity of f, min{f(y), f(y′)} ≤ f(x)/2. Assume f(y) ≤ f(x)/2. Let H be a supporting plane at y of the level set L(f(y)). In the ball of radius 4c|u − v|√n around x, the halfspace bounded by H not containing x cuts off at least 1/16th of the volume of the ball; further, each point in the separated subset has function value at most f(y) ≤ f(x)/2. Therefore, s(x) < 4c|u − v|√n, a contradiction.

Next suppose that |u − v| ≥ max{s(u), s(v)}/(c√n). We may suppose that f(u) ≤ f(x) (one of u, v has function value at most f(x)). The level set L(3f(u)/4) misses 1/64 of the ball of radius s(u) around u. Using Lemma 4.4 in [24], it misses at least 3/4 of the ball of radius 4c|u − v|√n > 4s(u) around u. Now consider balls of radius 4c|u − v|√n around u and x. These balls overlap in at least 1/2 of their volume. Therefore, the level set L(3f(u)/4) misses at least 1/4 of the ball around x; since f(u) ≤ f(x), the level set L(3f(x)/4) ⊆ L(3f(u)/4), and so L(3f(x)/4) also misses at least 1/4 of the ball around x. This contradicts the assumption that s(x) ≥ 4c|u − v|√n. □

3.2 Conductance and mixing time

For a point u ∈ K, let P_u be the distribution obtained by taking one hit-and-run step from u. Let µ_f(u, x) be the integral of f along the line through u and x. Then,

P_u(A) = (2 / (n π_n)) ∫_A f(x) / (µ_f(u, x) |x − u|^{n−1}) dx.  (1)

The next lemma, from [26], connects geometric distance with probabilistic distance.

Lemma 3.6 ([26]) Let u, v ∈ K. Suppose that

d_f(u, v) < 1 / (128 ln(3 + α(u)))

and

|u − v| < (1/(4√n)) max{F(u), F(v)}.

Then d_tv(P_u, P_v) < 1 − 1/500.

We are now ready to prove the main theorem bounding the conductance.

Theorem 3.7 Let f be a logconcave function in R^n such that the level set of measure 1/8 contains a unit ball and the support K has diameter D. Then for any subset S with π_f(S) = p ≤ 1/2, the conductance of hit-and-run satisfies

φ(S) ≥ 1 / (10^13 nD ln(nD/p)).

Proof. Let K = S_1 ∪ S_2 be a partition into measurable sets, where S_1 = S and p = π_f(S_1) ≤ π_f(S_2). We will prove that

∫_{S_1} P_x(S_2) dx ≥ (1 / (10^13 nD ln(nD/p))) π_f(S_1).  (2)

Consider the points that are deep inside these sets:

S_1′ = {x ∈ S_1 : P_x(S_2) < 1/1000}

and

S_2′ = {x ∈ S_2 : P_x(S_1) < 1/1000}.

Let S_3′ be the rest, i.e., S_3′ = K \ S_1′ \ S_2′.

Suppose π_f(S_1′) < π_f(S_1)/2. Then

∫_{S_1} P_x(S_2) dx ≥ (1/1000) π_f(S_1 \ S_1′) ≥ π_f(S_1)/2000,

which proves (2). So we can assume that π_f(S_1′) ≥ π_f(S_1)/2 and similarly π_f(S_2′) ≥ π_f(S_2)/2.

Next, define the exceptional subset W as the set of points u for which α(u) is very large:

W = {u ∈ S : α(u) ≥ 2^27 nD/p}.

By Lemma 3.2, π_f(W) ≤ p/(2^23 nD). Now, for any u ∈ S_1′ \ W and v ∈ S_2′ \ W,

d_tv(P_u, P_v) ≥ 1 − P_u(S_2) − P_v(S_1) > 1 − 1/500.

Thus, by Lemma 3.6,

d_f(u, v) ≥ 1 / (128 ln(3 + α(u))) ≥ 1 / (2^12 ln(nD/p))

or

|u − v| ≥ (1/(4√n)) max{F(u), F(v)}.

By Lemma 3.3, the latter implies that

|u − v| ≥ (1/(2^8 √n)) max{s(u), s(v)}.

Now applying Lemma 3.5, in either case, for any point x ∈ [u, v], we have

s(x) ≤ 2^14 ln(nD/p) |u − v| √n ≤ 2^14 ln(nD/p) d_K(u, v) D √n.

To apply Theorem 3.1, we define

h(x) = s(x) / (2^16 D√n ln(nD/p))

and consider the partition S_1′ \ W, S_2′ \ W and the rest. Clearly, for any u ∈ S_1′ \ W, v ∈ S_2′ \ W and x ∈ [u, v], we have h(x) ≤ d_K(u, v)/3. Thus,

π_f(S_3′) ≥ E_f(h) min{π_f(S_1′ \ W), π_f(S_2′ \ W)} − π_f(W) ≥ (1 / (2^29 nD ln(nD/p))) π_f(S_1).

Here we have used Lemma 3.4 and the bound on π_f(W). Therefore,

∫_{S_1} P_x(S_2) dx ≥ (1/2) · (1/1000) · π_f(S_3′) ≥ (1 / (10^13 nD ln(nD/p))) π_f(S_1),

which again proves (2). □

3.3 Mixing time

When the support of f is bounded by a ball of radius R, the bound on the mixing time in Theorem 1.1 follows by applying Corollary 1.6 in [23] with p = ε/2M and D = 2R/r in the expression for the conductance. Now let E_f(‖x − z_f‖²) ≤ R². We consider the restriction of f to the ball of radius R ln(4e/p) around the centroid z_f. By Lemma 5.17 in [24], the measure of f outside this ball is at most p/4. In the proof of the conductance bound, we can consider the restriction of f to this set. In the bound on the conductance for a set of measure p, the diameter D is effectively replaced by R ln(4e/p), i.e., for any subset of measure p, the conductance is at least

φ(p) ≥ cr / (nR ln(nR/rp) ln(4e/p)),

where c is a fixed constant. The bound on mixing follows by applying Corollary 1.6 in [23] with p = ε/2M.

4 Integration

The algorithm can be viewed as a generalized version of simulated annealing. To integrate f, we start with the constant function over K, the support of f, and slowly transform it into f via a sequence of m = O∗(√n) logconcave functions. In each step, we compute the ratio of consecutive integrals by sampling. Multiplying all of these and the volume of K, we get our answer.

In the description below, we assume that f is well-rounded. We restrict f to a ball of radius O(√n ln(1/ε)) and to the larger of the level sets L(f(x_0)/2) and L(max f · e^{−2n−2 ln(1/ε)}). By Lemma 6.13 of [24], this restriction removes a negligible part of the integral of f. The algorithm uses an upper bound B on the quantity ln(max f / min f). We can set B = 2n + 2 ln(1/ε) + n ln(1/β), where the initial point x_0 satisfies f(x_0) ≥ β max f.
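The sampler used throughout is the hit-and-run walk of Section 3. The sketch below is only a simplified stand-in (the helper name `hit_and_run_step`, the fixed chord radius and the grid resolution are ours, not from the paper): it discretizes the chosen line instead of sampling the one-dimensional restriction exactly, and it assumes f decays fast enough that a fixed chord length captures essentially all of its mass along the line.

```python
import math
import random

def hit_and_run_step(f, x, chord_radius=30.0, grid=2000):
    """One (approximate) hit-and-run step for a density proportional to f.

    Simplification: instead of intersecting the line with the support K,
    we take a chord of length 2*chord_radius around x and sample a
    discretized point on it with probability proportional to f.
    """
    n = len(x)
    # Step 1: a uniformly distributed random line through x, i.e., a
    # uniform random direction (a normalized Gaussian vector).
    d = [random.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in d))
    d = [c / norm for c in d]
    # Step 2: move to a point on the line chosen with density
    # proportional to f restricted to the line (inverse-CDF sampling
    # over a grid of points along the chord).
    ts = [-chord_radius + 2.0 * chord_radius * j / grid for j in range(grid + 1)]
    ws = [f([xi + t * di for xi, di in zip(x, d)]) for t in ts]
    total = sum(ws)
    if total == 0.0:          # the chord carries no mass; stay put
        return list(x)
    u = random.random() * total
    acc = 0.0
    for t, w in zip(ts, ws):
        acc += w
        if acc >= u:
            return [xi + t * di for xi, di in zip(x, d)]
    return list(x)
```

For a well-rounded density such as the standard Gaussian, repeating this step from any starting point drives the empirical distribution toward π_f, in line with Theorem 1.1.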

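To make the annealing schedule concrete, here is a toy numerical check of the telescoping product behind the integration algorithm. The one-dimensional f, the closed-form Z, and the helper names are our illustration only; in the paper the n-dimensional ratios Z(a_i)/Z(a_{i−1}) are estimated by sampling, not computed exactly.

```python
import math

def schedule(n, B):
    # The schedule of step I0: a_i = (1/B)(1 + 1/sqrt(n))^i for
    # i = 1, ..., m-1, followed by the final exponent 1 (so f_m = f).
    m = math.ceil(math.sqrt(n) * math.log(B))
    return [(1.0 / B) * (1.0 + 1.0 / math.sqrt(n)) ** i for i in range(1, m)] + [1.0]

def Z(a, L):
    # Z(a) = integral of f(x)^a over K = [-L, L] for the toy logconcave
    # function f(x) = exp(-|x|); here Z happens to have a closed form.
    return 2.0 * (1.0 - math.exp(-a * L)) / a

n, B, L = 100, 50.0, 10.0
vol_K = 2.0 * L                 # "volume" of the support K = [-L, L]
a_seq = schedule(n, B)

# Telescoping product of step I3: W = vol(K) * prod_i Z(a_i)/Z(a_{i-1}),
# where the i-th ratio equals E[f(X)^{a_i - a_{i-1}}] for X ~ pi_{i-1}.
W, prev = vol_K, vol_K          # f_0 is the indicator of K, so its "Z" is vol(K)
for a in a_seq:
    W *= Z(a, L) / prev
    prev = Z(a, L)

print(W, Z(1.0, L))             # the product collapses to Z(1) = integral of f
```

The point of the multiplicative schedule a_i = a_{i−1}(1 + 1/√n) is that consecutive distributions are close (Lemma 4.3), so samples from one phase warm-start the next and m = O∗(√n) phases suffice.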
Integration algorithm:

I0. Set m = ⌈√n ln(B/ε)⌉ and k = ⌈512 ε^{−2} √n ln B⌉. Let K be the support of f and f_0 be the indicator function of K. For i = 1, ..., m − 1, let

a_i = (1/B)(1 + 1/√n)^i and f_i(x) = f(x)^{a_i},

and f_m(x) = f(x).

I1. Let W_0 = vol(K), or an estimate of vol(K) using the algorithm of [25]. Also, let X_0^1, ..., X_0^k be independent uniform random points from K.

I2. For i = 1, ..., m, do the following.
— Run the sampler k times with target density proportional to f_{i−1} and starting points X_{i−1}^1, ..., X_{i−1}^k to get independent random points X_i^1, ..., X_i^k.
— Using these points, compute

W_i = (1/k) Σ_{j=1}^k f(X_i^j)^{a_i − a_{i−1}}.

I3. Return W = W_0 W_1 ⋯ W_m.

To prove Theorem 1.3, we will show that (i) the estimate is accurate (Lemma 4.2) and (ii) each sample can be obtained using O(n³ ln⁵(n/ε)) steps by Theorem 1.1; the latter is implied by (a) samples from each distribution provide a good start for sampling the next distribution (Lemma 4.3) and (b) the density being sampled remains O(ln(1/ε))-round (Lemma 4.4). The next lemma is analogous to Lemma 4.1 in [25], with a similar proof. In this section, we let Z(a) denote ∫_{R^n} f(x)^a dx.

Lemma 4.1 Let X be a random point with density proportional to f_{i−1} and define Y = f_i(X)/f_{i−1}(X). Then E(Y) = Z(a_i)/Z(a_{i−1}) and E(Y²) ≤ e E(Y)².

Proof. The first part is straightforward. For the second, we have

E(Y²) = ∫ f(X)^{2(a_i − a_{i−1})} · (f(X)^{a_{i−1}} / ∫ f(x)^{a_{i−1}} dx) dX = Z(2a_i − a_{i−1}) / Z(a_{i−1}).

Now by Lemma 2.1, a^n Z(a) is logconcave, and so

(2a_i − a_{i−1})^n Z(2a_i − a_{i−1}) · a_{i−1}^n Z(a_{i−1}) ≤ a_i^{2n} Z(a_i)².

Therefore,

E(Y²)/E(Y)² = Z(2a_i − a_{i−1}) Z(a_{i−1}) / Z(a_i)² ≤ (a_i² / ((2a_i − a_{i−1}) a_{i−1}))^n = ((1 + 1/√n)² / (1 + 2/√n))^n ≤ e. □

Lemma 4.2 With probability at least 3/4, the estimate W satisfies

(1 − ε) ∫ f ≤ W ≤ (1 + ε) ∫ f.

Proof. For i = 1, ..., m, let R_i = Z(a_i)/Z(a_{i−1}) and, for j = 1, ..., k, let Y_i^j = f(X_i^j)^{a_i − a_{i−1}}. Then E(Y_i^j) = R_i and, by Lemma 4.1, E((Y_i^j)²) ≤ e R_i². Also, E(W_i) = R_i and, assuming the Y_i^j's are all independent,

E(W_i²) ≤ (1 + (e − 1)/k) R_i².

Finally, E(W) = vol(K) Π_{i=1}^m E(W_i) = ∫_{R^n} f(x) dx and

E(W²) ≤ (1 + (e − 1)/k)^m E(W)².

This implies Var(W) ≤ (ε²/32) E(W)², and from this the lemma follows. □

There are two technical issues to deal with. The first is that the distribution of each X_i^j is not exactly the target but only close to it. The second is that the X_i^j's (and therefore the Y_i^j's) are independent for a fixed i but not for different i. Both issues can be handled in a manner identical to the proof of Lemma 4.2 in [25], the first using a coupling trick and the second by bounding the dependence.

Lemma 4.3 For i = 0, ..., m, let π_i be the distribution with density proportional to f(x)^{a_i}. Then d_2(π_{i−1}, π_i) ≤ 4 and d_2(π_i, π_{i−1}) ≤ e.

Proof.

d_2(π_{i−1}, π_i) = ∫ (f(x)^{a_{i−1}} / ∫ f^{a_{i−1}})² / (f(x)^{a_i} / ∫ f^{a_i}) dx = ∫ f(x)^{2a_{i−1} − a_i} dx ∫ f(x)^{a_i} dx / (∫ f(x)^{a_{i−1}} dx)² = Z(2a_{i−1} − a_i) Z(a_i) / Z(a_{i−1})².

Now by Lemma 2.1, a^n Z(a) is logconcave, and so

(2a_{i−1} − a_i)^n Z(2a_{i−1} − a_i) · a_i^n Z(a_i) ≤ a_{i−1}^{2n} Z(a_{i−1})².

Using this, and the fact that a_i = a_{i−1}(1 + 1/√n), we get

Z(2a_{i−1} − a_i) Z(a_i) / Z(a_{i−1})² ≤ (a_{i−1}² / ((2a_{i−1} − a_i) a_i))^n ≤ 4.

On the other hand, as in the proof of Lemma 4.2,

d_2(π_i, π_{i−1}) = Z(2a_i − a_{i−1}) Z(a_{i−1}) / Z(a_i)² ≤ e. □

Lemma 4.4 For i = 0, ..., m, the distribution π_i corresponding to f_i is O(ln(1/ε))-round.

Proof. By assumption, f_m(x) = f(x) is well-rounded and contained in a ball of radius O(√n ln(1/ε)). Thus, each f_i is contained in this ball, and so it follows that E(|x|²) = O(n ln²(1/ε)) for the distribution corresponding to each f_i.

Now consider the level set L = L_f(t) of measure 1/8. By assumption, this contains a ball of radius equal to some constant c. As we move through the sequence of functions f_m, f_{m−1}, ..., f_1, the measure of L is decreasing, and so for each of them L is contained in the level set of measure 1/8, which means this level set contains a ball of radius c. □

5 Optimization

In this section, we give an algorithm for finding the maximum of a logconcave function f, given access to f by means of an oracle that evaluates f at any desired point x, and a starting point x_0 with f(x_0) ≥ β max f. The algorithm can be viewed as simulated annealing in a continuous setting; it is suggested in [16] and analyzed there for the special case when f(x) = e^{−a^T x} for some vector a. Here we consider the problem of maximizing an arbitrary logconcave function. This is equivalent to finding the minimum of a convex function g(x), by considering the logconcave function e^{−g(x)}.

We restrict the support of the function f to K = {y | f(y) ≥ f(x_0)/2}. In the description below, B is an upper bound on ln(max f / min f). It will suffice to use B = n ln(2/β), from our assumption that f(x_0) ≥ β max f.

Optimization algorithm:

O1. Set m = ⌈√n ln(2B(n + ln(1/δ))/ε)⌉ and k = ⌈C_0 n ln⁵ n⌉, and for i = 1, ..., m, let

a_i = (1/B)(1 + 1/√n)^i and f_i(x) = f(x)^{a_i}.

Also, X_0^1, ..., X_0^k are independent uniform random points from K, and T_0 is their covariance matrix.

O2. For i = 1, ..., m, do the following:
— Get k independent random samples X_i^1, ..., X_i^k from π_{f_i} using hit-and-run with covariance matrix T_{i−1} and starting points X_{i−1}^1, ..., X_{i−1}^k respectively.
— Set T_i to be the covariance matrix of X_i^1, ..., X_i^k.

O3. Output max_j f(X_m^j) and a point X_m^j that achieves the maximum.

We generate the uniform random points from K, required in the first step (O1), as in [25], by putting K in near-isotropic position.

It is clear that the number of phases of the algorithm is O∗(√n). To prove Theorem 1.4, we need to show that (i) the last distribution is concentrated near the maximum, (ii) samples from the i'th phase provide a good start for the next phase, i.e., d_2(π_i, π_{i+1}) is bounded, and (iii) the transformation T_i keeps f_i near-isotropic and thus the time per sample remains O∗(n³). Of these, (ii) is already given by Lemma 4.3. The next lemma establishes (i).

Lemma 5.1 For any ε > 0, with

m ≥ √n ln(2B(n + ln(1/δ))/ε),

for a random point X drawn with density proportional to f_m, we have

P(f(X) < (1 − ε) max f) ≤ δ (2/e)^{n−1}.

Proof. Let X be a random point from π_{f_m}. We can bound the desired probability as follows:

P(f(X) < (1 − ε) max f) = P(f(X)^{a_m} < (1 − ε)^{a_m} (max f)^{a_m}) = P(f_m(X) < (1 − ε)^{a_m} max f_m) ≤ P(f_m(X) < e^{−2(n + ln(1/δ))} max f_m).

Now we use Lemma 5.16 from [24], which says that for β ≥ 2 (since f_m is logconcave),

P(f_m(X) < e^{−β(n−1)} max f_m) ≤ (e^{1−β} β)^{n−1},

to get the bound. □

Finally, the near-isotropy is derived using the following lemma from [16].

Lemma 5.2 ([16]) Let $\mu$ and $\nu$ be logconcave densities with centroids $z_\mu$ and $z_\nu$, respectively. Then for any $c \in \mathbb{R}^n$,
$$\mathrm{E}_\mu\big((c \cdot (x - z_\mu))^2\big) \le 16\, d_2(\mu, \nu)\, \mathrm{E}_\nu\big((c \cdot (x - z_\nu))^2\big).$$

We apply this to $\mu = \mu_{i+1}$ and $\nu = \mu_i$ after the transformation $T_i$. By Lemma 2.2, the transformation $T_i$ puts $\mu_i$ in 2-isotropic position with high probability. Using this and the second inequality of Lemma 4.3, we get that for any unit vector $c \in \mathbb{R}^n$,
$$\mathrm{E}_{\mu_{i+1}}\big((c \cdot (x - z_{\mu_{i+1}}))^2\big) \le 128.$$
Reversing the roles of $\mu$ and $\nu$ in Lemma 5.2 gives a lower bound and proves that $\mu_{i+1}$ is 128-isotropic. Thus, the complexity of sampling is $O(n^3 \ln^5 n)$ (it suffices to set the desired variation distance to the target distribution to be $1/\mathrm{poly}(n)$).

6 Rounding

The optimization algorithm of the previous section can be used to achieve near-isotropic position, by starting at the uniform density and stopping with $f_m = f$. The transformation $T_{m+1}$ will put $f$ in 2-isotropic position with high probability. That algorithm uses only a function oracle but requires $O^*(n^{4.5})$ steps. Here we give an algorithm that requires only $O^*(n^4)$ steps, but assumes we are given a point $x^*$ where $f$ is maximized (we will henceforth assume that $x^* = 0$).

Theorem 6.1 Any logconcave function $f$ can be put in near-isotropic position using $O^*(n^4)$ oracle calls, given a point that maximizes $f$.

The algorithm is similar to that for a convex body given in [25]. The main difference is that instead of a pencil we use a log-pencil, which is the function $G : \mathbb{R}^{n+1} \to \mathbb{R}_+$ defined as
$$G(x_0, x) = \begin{cases} f(x) & \text{if } 0 \le x_0 \le 2B \text{ and } \ln f(x) \ge \ln \max f - x_0,\\ 0 & \text{otherwise.} \end{cases}$$
Here $B$ is an upper bound on $\ln(\max f/\min f)$. In words, the cross-section of $G$ at $x_0$ is the level set $L_f(\max f/e^{x_0})$. It is easy to see that $G$ is also logconcave. For $i = 1, \ldots, m$, define $G_i$ as the restriction of $G$ to the halfspace $\{(x, x_0) : x_0 \le 2^{i/n}\}$, with $G_m = G$. The algorithm makes the $G_i$'s isotropic successively. In most phases it needs only $O^*(1)$ new samples, and once every $n$ phases it uses $O^*(n)$ new samples. Once $G$ is near-isotropic, one more round of samples can be used to make $f$ near-isotropic in $\mathbb{R}^n$.

References

[1] D. Applegate and R. Kannan: Sampling and integration of near log-concave functions, Proc. 23rd ACM STOC (1990), 156–163.

[2] I. Bárány and Z. Füredi: Computing the volume is difficult, Discrete & Computational Geometry 2 (1987), 319–326.

[3] D. Bertsimas and S. Vempala: Solving convex programs by random walks, J. ACM 51(4) (2004), 540–556.

[4] A. Dinghas: Über eine Klasse superadditiver Mengenfunktionale von Brunn–Minkowski–Lusternikschem Typus, Math. Zeitschr. 68 (1957), 111–125.

[5] M. E. Dyer and A. M. Frieze: Computing the volume of a convex body: a case where randomness provably helps, Proc. of AMS Symposium on Probabilistic Combinatorics and Its Applications, 1991, 123–170.

[6] M. Dyer, A. Frieze and R. Kannan: A random polynomial-time algorithm for approximating the volume of convex bodies, J. Assoc. Comput. Mach. 38 (1991), 1–17.

[7] G. Elekes: A geometric inequality and the complexity of computing volume, Discrete & Computational Geometry 1 (1986), 289–292.

[8] A. Frieze and R. Kannan: Log-Sobolev inequalities and sampling from log-concave distributions, Annals of Applied Probability (1999), 14–26.

[9] A. Frieze, R. Kannan and N. Polson: Sampling from log-concave distributions, Ann. Appl. Probab. 4 (1994), 812–837; Correction: Sampling from log-concave distributions, ibid. 4 (1994), 1255.

[10] M. Grötschel, L. Lovász, and A. Schrijver: Geometric Algorithms and Combinatorial Optimization, Springer, 1988.

[11] M. Jerrum and A. Sinclair: Approximating the permanent, SIAM Journal on Computing 18 (1989), 1149–1178.

[12] R. Kannan and L. Lovász: Faster mixing through average conductance, Proc. of STOC, 1999, 282–287.

[13] R. Kannan, L. Lovász and M. Simonovits: Isoperimetric problems for convex bodies and a localization lemma, J. Discr. Comput. Geom. 13 (1995), 541–559.

[14] R. Kannan, L. Lovász and M. Simonovits: Random walks and an O∗(n^5) volume algorithm for convex bodies, Random Structures and Algorithms 11 (1997), 1–50.

[15] R. Kannan, J. Mount and S. Tayur: A randomized algorithm to optimize over certain convex sets, Math. of OR 20(3) (1995), 529–549.

[16] A. Kalai and S. Vempala: Simulated annealing for convex optimization. To appear in Math. of OR.

[17] I. Koutis: On the hardness of approximate multivariate integration, Proc. of APPROX (2003), 122–128.

[18] L. Leindler: On a certain converse of Hölder's inequality II, Acta Sci. Math. Szeged 33 (1972), 217–223.

[19] L. Lovász: How to compute the volume?, Jber. d. Dt. Math.-Verein, Jubiläumstagung 1990, B. G. Teubner, Stuttgart, 138–151.

[20] L. Lovász: Hit-and-run mixes fast, Math. Prog. 86 (1998), 443–461.

[21] L. Lovász and M. Simonovits: Mixing rate of Markov chains, an isoperimetric inequality, and computing the volume, Proc. 31st IEEE Annual Symp. on Found. of Comp. Sci. (1990), 482–491.

[22] L. Lovász and M. Simonovits: On the randomized complexity of volume and diameter, Proc. 33rd IEEE Annual Symp. on Found. of Comp. Sci. (1992), 482–491.

[23] L. Lovász and M. Simonovits: Random walks in a convex body and an improved volume algorithm, Random Structures and Alg. 4 (1993), 359–412.

[24] L. Lovász and S. Vempala: The geometry of logconcave functions and sampling algorithms, to appear in Random Structures and Algorithms. Prelim. version in FOCS 2003. Available at: http://www-math.mit.edu/~vempala/papers/logcon.pdf

[25] L. Lovász and S. Vempala: Simulated annealing in convex bodies and an O∗(n^4) volume algorithm, J. Comp. Sys. Sci. 72(2) (2006), 392–417. Prelim. version in FOCS 2003.

[26] L. Lovász and S. Vempala: Hit-and-run from a corner, SIAM J. Computing 35(4) (2006), 985–1005. Prelim. version in STOC 2004.

[27] A. Prékopa: Logarithmic concave measures and functions, Acta Sci. Math. Szeged 34 (1973), 335–343.

[28] M. Rudelson: Random vectors in the isotropic position, J. Funct. Anal. 164 (1999), 60–72.

[29] R. L. Smith: Efficient Monte-Carlo procedures for generating points uniformly distributed over bounded regions, Operations Res. 32 (1984), 1296–1308.

[30] P. M. Vaidya: A new algorithm for minimizing convex functions over convex sets, Mathematical Programming 73 (1996), 291–341.
