Streaming for Extent Problems in High Dimensions∗

Pankaj K. Agarwal R. Sharathkumar Department of Computer Science Duke University pankaj, sharath @cs.duke.edu { }

Abstract respectively, and let (1 + ε)B denote the ε-expansion of B We develop (single-pass) streaming algorithms for main- B, i.e., (c(B), (1 + ε)r(B)). Rd taining extent measures of a S of n points in Let S be a (finite) set of points in . We use Rd. We focus on designing streaming algorithms whose MEB(S) to denote the smallest ball that contains S. working space is polynomial in d (poly(d)) and sub- For a parameter α > 1, a ball B is called an α- linear in n. For the problems of computing diameter, approximation of the minimum enclosing ball of S, or α-MEB(S) for brevity, if S B and r(B) α width and minimum enclosing ball of S, we obtain lower ⊂ ≤ · bounds on the worst-case approximation ratio of any r(MEB(S)). A set C S is called an α-coreset(S) if S α MEB(C). Let⊆ diam(S) denote any farthest streaming that uses poly(d) space. On the ⊂ · positive side, we introduce the notion of blurred ball pair (diameteral pair) of points in S, and any pair of cover and use it for answering approximate farthest- points r, s S such that, for every pair p, q S, pq α rs , is∈ referred to as α-diam(S) (α-diameteral∈ || pair).|| ≤ point queries and maintaining approximate minimum ·|| || enclosing ball and diameter of S. We describe a stream- For any slab J, which is a region bounded by a pair (h ,h ) of parallel (d 1)-dimensional hyperplanes, let ing algorithm for maintaining a blurred ball cover whose 1 2 − working space is linear in d and independent of n. d(J) be the minimum distance between h1 and h2. Let width(S) be any slab J such that S J and d(J) is ⊂ 1 Introduction minimized. For α > 1, α-width(S) is a slab J ′ such that S J and d(width(S)) d(J ) α d(width(S)). In many applications, data arrives rapidly as a stream ′ ′ Finally,⊂ for a point x Rd, an≤α-farthest-neighbor≤ · of x, of points and there is limited space to store the in- α-FN(x) is a point q∈ S such that, for every p S, put. Algorithms in the streaming model have to work xp α xq . ∈ ∈ with one or few passes over the data, using small space. || ||In ≤ this· paper,|| || we study the problems of maintaining Motivated by a wide range of applications, including α-MEB(S), α-coreset(S), α-width(S), and α-diam(S), networking, databases and geographic information sys- and of answering α-FN(x) queries, in the streaming tems, there is extensive work on streaming algorithms. model. We assume that the input points arrive one by Often, it is impossible to compute exact solutions given one, and the algorithm updates the information quickly the space constraints. Hence most of the work has using little space. Our goal is to design streaming focused on developing approximation algorithms and algorithms whose working space and update time are several novel techniques have been developed. See polynomial in d and sub-linear in the size of S. books [5, 27] for a comprehensive review. In this paper, we design streaming algorithms for several geometric problems in high dimensions. Related work. There is extensive literature on fast algorithms for computing various extent measures, such Rd Problem statement. For a point x and a value as diameter, width, and smallest enclosing ball of a B ∈ r > 0, let (x, r) denote a ball of radius r centered at x. point set in low dimensions [3, 14, 15, 17, 26, 28]. For a ball B, let c(B), r(B) denote its center and radius Agarwal et al. [1] have developed approximation algo- rithms for computing many extent measures of a set d ∗Work on this paper is supported by NSF under grants CNS- of n points in R using coresets whose running time 05-40347, CCF-06-35000, CCF-09-40671 and DEB-04-25465, by is O(n + 1/εO(d)); see also [2, 4, 11, 12]. Although ARO grants W911NF-07-1-0376 and W911NF-08-1-0452, by an NIH grant 1P50-GM-08183-01, by a DOE grant OEG- these algorithms work well in small dimensions, the ex- P200A070505, and by a grant from the U.S.–Israel Binational ponential dependency on d makes them unsuitable for Science Foundation. higher dimensions. This has led to work on develop- ing polynomial-time approximation algorithms (in n, d lower bounds are obtained using known lower bounds and 1/ε) for different extent measures and other related on the one-round communication complexity of index- problems. For example, B˘adoiu and Clarkson [9] de- ing [25]. Communication complexity has been widely velop an elegant coreset-based algorithm that given a used to prove lower bounds on the size of various data point set S Rd computes a (1 + ε)-MEB(S) in time structures including in the streaming model [6, 7, 18]. O(nd/ε + 1/ε∈5). See [10, 21, 23, 24] for other similar re- Our algorithms for maintaining α-diam(S) and α- sults. Clarkson [16] presented a general framework that coreset(S) are close to optimal. Note the contrast gives coreset-based algorithms for a number of geomet- between the lower and upper bounds for MEB — a ric problems in high dimensions; see also [19]. Goel et coreset based algorithm has a lower bound of √2, al. [20] describe fast approximation algorithms for sev- but we are able to circumvent this bound by, roughly eral proximity problems. For instance, they describe a speaking, maintaining a set of O((1/ε2) log(1/ε)) balls that answers √2-FN(x) queries in O(d) and returning a ball that contains these balls. We can time after spending O(nd) time in preprocessing. regard the centers of these balls as “weighted” points There is also some work on streaming algorithms for and thus we can get better results by maintaining a extent problems in high dimensions. Chan and Zarrabi- “weighted weak coreset” — it is not a subset of input Zadeh [29] gave a streaming algorithm, which maintains points. Although it is known that weak ε-nets are more a (3/2)-MEB(S), for a stream S Rd, using O(d) space. powerful than ε-nets [13], we are unaware of similar They also proved that any deterministic∈ algorithm that results for coresets of MEB and other extent measures. stores only a single ball in its working space cannot maintain a better than α-MEB(S), for α 1+√2 . In- 2 Blurred Ball Cover ≤ 2 dyk [22] proposed a streaming algorithm for maintain- This section defines the notion of blurred ball cover and 2 1/(α 1) ing α-diam(S) for α > √2, using O(dn − log n) describes an algorithm for maintaining such a cover in space. Recently Clarkson and Woodruff [18] described the streaming model. For a parameter 0 < ε 1, an ε- streaming algorithms for several problems in numerical blurred ball cover of a set S of n points in Rd≤, denoted algebra. by K = K(S), is a sequence K1, K2,...,Ku , where each K S is a subset of O(1h/ε) points thati satisfies Our results. We prove upper and lower bounds on the i the following⊆ three properties; let B = MEB(K ) and size of data structures for maintaining various extent i i K = i u Ki. measures under the streaming model. First, we present ≤ in Section 2, a data structure in the streaming model 2 (P1) ForS all 1 i 0, that maintains ≤ ≥ 3 a subset K S of O((1/ε ) log(1/ε)) points. Roughly (P2) For all i

q (a) (b)

Figure 1: Proof of Lemma 2.2. (a) c c′ 2εr /3, (b) c c′ > 2εr /3. || i || ≤ i || i || i

2 2 1/2 with approximation ratio ε/3. Let K∗ K A be the ( c c′ + q′c ) ⊆ ∪ ≥ || i || || i|| point set and B∗ = MEB(K∗) be the ball returned by 2 2 1/2 ((2εri/3) + ri ) Approx-MEB. The Update procedure adds K∗ to K ≥ 2 (1 + ε /8)ri. and then deletes all Ki’s for which r(Bi) εr(B∗)/4. ≥ ≤ We now prove that (P1)–(P3) are maintained by Update K Algorithm 1 ( , A) Update. If for every p A, there is an i u such 1: ∈ ≤ if p A, i u, p (1 + ε)Bi then that p (1 + ε)Bi, then the set K does not change, and ∃ ∈ ∀ ≤Approx-MEB6∈ ∈ 2: K∗,B∗ := ((K A), ε/3) (P1)–(P3) continue to hold. Hence, assume that there ∪ 3: K D := Ki r(Bi) εr(B∗)/4 is a p A such that p Bi for all i u. By Lemma 2.2, K K{ K| ≤ } ∈ 6∈ ≤ 4: := ( D) K∗ property (P1) continues to hold after each update. Note \ ◦ h i 5: end if that if p (1 + ε)B for all 1 i u, then Update 6∈ i ≤ ≤ computes a (1+ε/3)-MEB(K A). Since K∗ is the only ∪ subset added to K, K (1 + ε)B∗ and K∗ is added at Correctness. The proof of correctness relies on the ⊆ the end of the sequence, K (1+ε)B∗. Thus property following well-known observation; see e.g. [10]. i ⊆ (P2) holds after each Update. Note that a prefix KD Lemma 2.1. Let P be a set of points in Rd and let is deleted from K, so (P3) may be violated. However, B = MEB(P ). Then any closed halfspace that contains the next two lemmas prove that (P3) is also satisfied. c(B) also contains at least one point of P that is on ∂B. Lemma 2.3. For i (1 + ε /8)ri. If cic′ 2εri/3 (Figure 1(a)), then ∈ || || ≤ Lemma 2.4. K r′ c′q For all Ki D, (1 + ε)Bi (1 + ε)B∗. ≥ || || ∈ ⊆ ciq c′ci ≥ || || − || || Proof. Let c∗ = c(B∗), r∗ = r(B∗), ci = c(Bi), and (1 + ε)ri 2εri/3 ri = r(Bi). Let Ki KD, then ri εr∗/4. Since ≥ 2 − ∈ ≤ (1 + ε /8)r . Ki (1 + ε/3)B∗, the proof of Lemma 2.3 implies ≥ i ⊂ c(Bi) (1 + ε/3)B∗. For any point x (1 + ε)Bi, On the other hand, if c′c > 2εr /3 (Figure 1(b)), ∈ ∈ || i|| i then let h be the hyperplane passing through ci with the xc∗ c∗ci + cix + || || ≤ || || || || direction cic′ as its normal, and let h be the halfspace, c∗ci + (1 + ε)ri bounded by h, that does not contain c′. By Lemma 2.1, ≤ || || + (1 + ε/3)r∗ + ε(1 + ε)r∗/4 there is a point q′ Ki h that is at a distance ri ≤ ∈ ∩ (1 + ε)r∗. from ci. Therefore ≤

r′ q′c′ Therefore q (1+ε)B∗ and thus (1+ε)B (1+ε)B∗. ≥ || || ∈ i ⊆ Lemma 2.4 immediately implies that for any Ki Let h be the hyperplane passing through ci and ∈ + KD, if there is a point q (1 + ε)Bi then q (1 + ε)B∗. normal to xci and let h be the halfspace, bounded Hence, for every q S∈, there is an i such∈ that q by h, that does not contain x. By Lemma 2.1, there is ∈ ∈ (1 + ε)Bi, implying (P3). a point z Ki such that zci = ri. Since ∠xciz is obtuse, ∈ || || Size and update time. Let ri = r(Bi). Update 2 2 2 1/2 ensures that ru/r1 4/ε. By (P1), ri+1 (1 + ε /8)ri. xz ( xci + ciz ) ≤ ≥ 2 Therefore u log 2 (4/ε) = O((1/ε ) log(1/ε)). || || ≥ || || || || 1+ε /8 (3.3) ( xc 2 + r2)1/2. ≤ ⌈ 3 ⌉ i i Hence K Ki = O((1/ε ) log(1/ε)). ≥ || || Note| | ≤ that| Update| spends O(u. A ) = By combining (3.2) and (3.3) and using (3.1), 2 P | | O((d A /ε ) log(1/ε)) time to test the condition in 2 2 1/2 | | xp′ √2( xci + r ) + εri line 1. The time spent in computing K∗ and B∗ in line || || ≤ || || i 2 is O(( A + K )d/ε + (1/ε5)) or (√2 + ε) xz | | | | ≤ || || √ O((d A /ε) + (d/ε4) log(1/ε) + (1/ε5)). ( 2 + ε) xq′ . | | ≤ || || We conclude the following. If we update K after the arrival of each new point, 5 then the update time is O(d/ε ). However, if we batch Theorem 3.1. For a stream S of points in Rd and 3 the updates and invoke Update only after O(1/ε ) a parameter 0 < ε 1, there is a data structure of points have arrived, the total time spent will be size O((d/ε3) log(1/ε≤)) that answers (√2 + ε)-FN(x) in 5 O((d/ε ) log(1/ε)) which is the time taken by line 1 of time O((d/ε3) log(1/ε)). The amortized update time is the update procedure. The amortized time for Update O((d/ε2) log(1/ε)). will then be O((d/ε2) log(1/ε)). We thus conclude the following. Diameter. For a point set S, we maintain a α-diam(S), for α = √2 + ε, using the above data structure for Theorem 2.1. For any 0 < ε 1, an ε-blurred ball answering α-FN(x) queries. Let p, q be the current α- cover of size O((d/ε3) log(1/ε)) of≤ a stream of points can diameteral pair maintained by the algorithm. When be maintained in O((d/ε2) log(1/ε)) amortized time. a new point z arrives, we compute the w = α-FN(z). 3 Applications of Blurred Ball Cover If wz > pq , we replace (p, q) with (w,z). We conclude|| || the|| following.|| We now describe streaming algorithms for answering d (√2 + ε)-FN(x) queries and for maintaining (√2 + ε)- Theorem 3.2. For a stream S of n points in R , 3 diam(S), (√2 + ε)-MEB(S) and ( 1+√3 + ε)-MEB(S) there is a data structure of size O((d/ε ) log(1/ε)) that 2 maintains a (√2 + ε)-diam(S). The amortized update using the blurred ball cover. We use the following simple 3 inequality repeatedly. For any x,y R, time of this structure is O((d/ε ) log(1/ε)). ∈ (3.1) (x + y) √2(x2 + y2)1/2. Coreset for minimum enclosing ball. For a stream ≤ S of points, we maintain an ε-blurred ball cover K and Farthest-neighbor queries. We maintain an ε- update it in the batched mode. Let A be the set of K points not processed by Update and let Q = K A. blurred ball cover and update it in the batched ∪ mode. Let A be the set of newly arrived points in S Let B = MEB(Q), c = c(B) and r = r(B). We argue that for every K K, (1 + ε)B (√2 + ε)B. that have not been processed yet. Set Q = K A; i ∈ i ⊆ Q = O((1/ε3) log(1/ε)). For any query point x ∪Rd, Lemma 3.1. For all i u, (1 + ε)Bi (√2 + ε)B. we| | return the point of Q that is farthest from x∈. We ≤ ⊆ claim that for any x Rd, Proof. For any ball B in the blurred ball cover, let ∈ i ci = c(Bi) and ri = r(Bi). Observe that Ki B. Let max xp (√2 + ε) max qx . ⊆ p S || || ≤ q Q || || t∗ = arg maxt (1+ε)Bi tc . t∗c = cci + (1 + ε)ri. ∈ ∈ By Lemma 2.1,∈ there|| is|| a|| point|| p || K|| such that ∈ i Let p′ = arg maxp S px and q′ = arg maxq Q qx . pc cc 2 + r2. But pc r. Hence, ∈ || || ∈ || || || || ≥ || i|| i || || ≤ If p′ A then p′ = q′ and the claim is obviously true. If ∈ Update p t∗c cci + (1 + ε)ri p′ S A, i.e., p′ has been processed by , then, || || || || √2 + ε, ∈ \ 2 2 by (P3), there is an i u such that p′ (1 + ε)B . pc ≤ cci + r ≤ ≤ ∈ i || || || || i Assuming ci = c(Bi) and ri = r(Bi), or t∗c (√2 + ε) ppc (√2 + ε)r. Thus (1 + ε)B || || ≤ || || ≤ i ⊆ (3.2) xp′ xc + c p′ xc + (1 + ε)r . (√2 + ε)B. || || ≤ || i|| || i || ≤ || i|| i Lemma 3.1 in conjunction with property (P3) of By (P3), blurred ball cover implies that S (√2 + ε)B. Thus Q ⊂ is a (√2 + ε)-coreset(S). We conclude the following. S (1 + ε/9)B (1 + ε/9)B∗. ⊂ i ⊆ i u [≤ Theorem 3.3. For a stream S of points in Rd and a parameter 0 < ε 1, there is a data structure of size By Lemma 3.2, we have O((d/ε3) log(1/ε))≤that maintains a (√2+ε)-coreset(S) 1 + √3 of minimum enclosing ball. The amortized update time r∗ + ε/3 r.ˆ of this structure is O((d/ε2) log(1/ε)). ≤ 2 !

Thus Minimum enclosing ball. For a stream S of points, we maintain a (ε/9)-blurred ball cover K of S. K 1 + √3 is updated whenever a new point arrives. Let B = (1 + ε/3)r∗ + ε/3 (1 + ε/3)ˆr ≤ 2 ! B1,...,Bu . Let B∗ = MEB(B). We return (1 + { } 5 ε/3)B∗ which can be computed in time O(1/ε ) [23]. 1 + √3 5 + ε r,ˆ Hence the total update time is O(1/ε ). Letr ˆ = ≤ 2 ! r(MEB(S)). 1+√3 implying that (1 + ε/3)B∗ is indeed a + ε - 1 + √3 2 Lemma 3.2. r(B∗) + ε/3 rˆ. MEB(S). We conclude the following.   ≤ 2 ! Theorem 3.4. Given a stream S of points in Rd, K B Proof. Let ′ = Ki Bi ∂B∗ = and ′ = there is a data structure of size O((d/ε3) log(1/ε)) that K h | ∩ 6 ∅i K Bi Ki ′ . Note that any subsequence of , and maintains a ( √3+1 + ε)-MEB(S). The (worst-case) { | K ∈ } 2 hence ′, satisfies properties (P1) and (P2) of blurred update time of this structure is O(d/ε5). ball cover. Let Bt be the largest ball in B′. Let ct = c(Bt), rt = r(Bt), c∗ = c(B∗), r∗ = r(B∗) and let Remark. A slightly better bound on r∗, the radius k = r∗/rt. Observe that ctc∗ = r∗ rt = (k 1)rt. || || − − of the ball returned by the above algorithm can be Let h be a hyperplane passing through c∗ with a nor- + obtained using a more involved argument. While, mal parallel to ctc∗. Let h be the halfspace bounded from the lower bounds on α-MEB(S) in Section 4 it by h and that does not contain ct. By Lemma 2.1, 1+√2 there is a point bi ∂B∗ Bi, for some i, such that follows that r∗ > 2 , the following example shows ∈ ∩ 1+√2 bic∗ = r∗ = krt. Since ∠ctc∗bi is obtuse, we have that one cannot hope to prove r = + ε. Let || || ∗ 2 the input be a stream S = p ,p ,p ,p ,... , where h 1 2 3 4 i (3.4) c b (k 1)2r2 + k2r2. p = ( 1 , 1 , √3 ), p = ( 1 , 1 , √3 ), p = t i t t 1 √2 2√2 2√2 2 √2 2√2 2√2 3 || || ≥ − 1 1 − q ( , , 0), p4 = ( 1, 0, 0), and p5,p6,..., are points Since the proof of Lemma 3.1 requires only (P2) of √2 − √2 − on a unit sphere in R3. When the points in S arrive blurred ball cover, it holds for any subsequence of K. in this order, the blurred ball cover will consist of In particular, it holds for K′. Hence, for all Bi B′, ∈ K = p1,p2 , p1,p2,p3 , p1,p2,p3,p4 . The three Bi (√2 + ε/9)Bt, implying that {{ } { } { }} ⊆ balls B1,B2,B3 of K are described by { } (3.5) c b (√2 + ε/9)r . || t i|| ≤ t 1 1 √3 B1 = B , , 0 , , Combining (3.4) and (3.5), we have √2 2√2 2√2   ! 1 1 B2 = B , 0, 0 , , (k 1)2r2 + k2r2 (√2 + ε/9)2r2 2(1 + ε/9)2r2. √2 √2 − t t ≤ t ≤ t    B3 = B ((0, 0, 0), 1) . Solving this quadratic equation, we can deduce that, k 1+√3 + ε/3. ≤ 2 Thus Let B′ = MEB(B B ) with r(B′) = (1 + √2)/2 and 1 ∪ 2 c(B′) = ((1/√2) (1/2), 0, 0). Let q be the farthest 1 + √3 − 1+√2 r∗ krt krˆ + ε/3 r.ˆ point of B3 from c(B′). qc(B′) = 3/2 > 2 . We ≤ ≤ ≤ 2 || || 1+√2 5 ! evaluate B∗ to be θ-MEB(S) where θ + 10− . p≥ 2 1/2 4 Lower Bounds (1 + 2/d1/3)2 + (1 4/d2/3) In this section we show that any randomized streaming ≤ − √2(1 + 2/d1/3)1/2  algorithm that maintains α-diam(S), α-width(S), α- ≤ MEB(S), or α-coreset(S) with probability at least 2/3, √2(1 + 2/d1/3). requires Ω(min( S , exp(d1/3))) space for certain values ≤ | | of α. In particular, α < √2(1 2/d1/3) for α-diam(S) Similarly one can show and α-coreset(S), α d1/3/−8 for α-width(S), and ≤ 1/3 1+√2 1/3 Sd 1 st st′ √2(1 2/d ). α < 2 (1 2/d ) for α-MEB(S). Let − denote || || ≥ || || ≥ − (d 1)-dimensional− unit sphere centered at origin. We − Sd 1 show how to sample points from − which will be Diameter. The problem Index is defined as follows. crucial for proving our lower bounds. Let Alice and Bob be two players. Suppose Alice has k d 1 d 1 Sampling in S − . Let u S − and ε (0, 1). Let H a binary string σ 0, 1 and Bob has an index ∈ ∈ i [1 : k]. For any∈j { [1} : k], let σ be the bit that be the hyperplane x,u = ε, i.e., H is the hyperplane ∈ ∈ j that is at distance hε fromi the origin and normal to the appears in the jth position of σ. Alice sends a message d 1 vector u. H divides S − into two spherical regions. We to Bob after which Bob needs to determine σi with refer to the smaller spherical region as a spherical cap probability at least 2/3. It is well-known that the length and denote it by C(u, ε). We define its measure to be of any message sent by Alice that helps Bob determine σi with probability at least 2/3 is Ω(k) [25]. SA(C(u, ε)) 1/3 µ(u, ε) = , We choose the smallest d so that k exp(d ). SA Sd 1 1/3 ≤ ( − ) Hence k = Θ(exp(d )). Let A be a streaming algorithm that maintains α-diam(S) for α √2(1 where SA(X) is the surface area of X. It is well 2/d1/3) with probability at least 2/3. We≤ choose− known [8] that 1/3 d 1 2 exp(d ) points on S − using Lemma 4.1. Let L (4.6) µ(u, ε) exp( dε2/2). be the subset of at least k of these points that lie on the ≤ − hemisphere x 0. We sort L in lexicographic order, Lemma 4.1. 1 There is a centrally symmetric point set which induces≥ a map ϕ : [1 : k] L. The map ϕ Sd 1 1/3 → K − of size Ω(exp(d )) such that for any pair of is independent of the input string σ and the index i. distinct⊆ points p, q K if p = q, then ∈ 6 − Hence we can assume that both Alice and Bob know ϕ √2(1 2/d1/3) pq √2(1 + 2/d1/3). without communicating any bits. − ≤ || || ≤ For any σ 0, 1 k, let ϕ(σ) = ϕ(x) σ = 1 . ∈ { } { | x } Proof. The set K is constructed incrementally. We Alice adds ϕ(σ) as input to A in an arbitrary order. maintain the invariant that if p, q K and q = p Then, Alice communicates the working space of A to 1/3 ∈ 6 then q C(p, 2/d ). We also maintain a centrally Bob. In order to determine the σi, Bob adds as input, 6∈ Sd 1 1/3 A symmetric region F = − q K C(q, 2/d ). Ini- ϕ(i) to . Let p, q S be the pair of points returned d 1 \ ∈ − A ∈ tially K = and F = S − . If F = , we choose an by at the end of the protocol. If p = q, Bob ∅ S 6 ∅ − antipodal pair of points p, p from F . By construc- determines σi = 1. Else, Bob determines σi = 0. The tion, p, p C(q, 2/d1/3) for− any point q K. More- following lemma shows the correctness of this protocol. over, by− symmetry,∈ no point in K lies in C∈(p, 2/d1/3) Lemma 4.2. k C( p, 2/d1/3). We add p and p to K. We set∪ For any σ 0, 1 and i [1 : k]. Let − − ∈ { } ∈ 1/3 F = F C(p, 2/d1/3) C( p, 2/d1/3) to ensure that S = ϕ(σ) ϕ(σi) and α √2(1 2/d ). Every \ ∪ − ∪ {− } ≤ − the invariant holds after adding p and p . The algorithm α-diam(S) pair is an antipodal pair if and only if σi = 1.  ′ terminates when F = . By (4.6), µ(p, 2/d1∅/3)+µ( p, 2/d1/3) exp( d1/3), Proof. Since all points in ϕ(σ) lie in one hemisphere, therefore the above algorithm− performs≤ Ω(exp(−d1/3)) there is an antipodal pair of points in S if and only if steps before it stops. Hence K = Ω(exp(d1/3)). Let ϕ(i) ϕ(σ), i.e. σi = 1. If σi = 0, then ϕ(i) ϕ(σ) A∈ 6∈ s,t K be such that s = t, | t.| By construction, s and does not return an antipodal pair of points. On ∈ 1/3 1/6 3 − 6∈ the other hand, suppose σi = 1 and suppose there is C(t, 2/d ) C( t, 2/d ). Let t′ (resp. t′′) be a point ∪ − 1/3 1/3 an α-diameteral pair p, q S such that p = q. Since on the boundary of C(s, 2/d ) (resp. C( s, 2/d )). ∈ 6 − Then st st st . A simple− trignometric ϕ(σi), ϕ(σi) S, diam(S) 2. From Lemma 4.1, ′ ′′ { − } ⊆ √ ≥ 1/3 calculation|| || shows ≤ || that|| ≤ (see || Figure|| 2(a)) it follows that pq 2(1 + 2/d ), i.e., diam(S) > α pq for α || √||2(1 ≤ 2/d1/3). Thus p, q is not an α- diameteral· || || pair≤ implying− that α-diam(S) is an antipodal st st′′ pair of points. || || ≤ || || s

t ′ ϕ i t 2ε ( ) t 2 ′′ ϕ(i) p1 ε2 − − ǫ) √ 2(1 + −s qi (a) (b)

Figure 2: (a) Proof of Lemma 1, (b) Lower bound for α-MEB(S); here ε = 2/d1/3.

Since A returns an α-diam(S) with probablity at Theorem 4.2. Any streaming algorithm that main- d least 2/3, Bob determines σi correctly with the same tains an α-MEB(S) of a set S of n points in R , for probability. Since the communication complexity of α < 1+√2 (1 2/d1/3), with probability at least 2/3 re- 2 − any randomized algorithm for the indexing problem is quires Ω(min n, exp(d1/3) ) bits of storage. Ω(k) = Ω(exp(d1/3)), it follows that the work space of A is Ω(exp(d1/3)). We thus conclude the following.  Coreset for minimum enclosing ball. The reduc- Theorem 4.1. Any streaming algorithm that main- tion to α-MEB(S) can be modified to obtain lower tains an α-diam(S) of a set S of n points in Rd, for bounds on α-coreset. Let A be an algorithm that main- α < √2(1 2/d1/3), with probability at least 2/3 re- tains a α-coreset(S) for α √2(1 2/d1/3) with prob- ≤ − quires Ω(min− n, exp(d1/3) ) bits of storage. ability 2/3. Let ϕ be a map as defined previously. Alice passes points in ϕ(σ) as input to A in an arbitrary order Minimum enclosing ball. Let A be an algorithm and communicates the working space of A to Bob. For that maintains an α-MEB(S), for α 1+√2 (1 2/d1/3) index i, qi is the point as defined above; see Figure 2(b). 2 A with probability at least 2/3. The map≤ ϕ is defined− as Let C be α-coreset maintained by . For index i, Bob A adds qi as input to A and checks whether ϕ(i) C. If above. Alice passes points in ϕ(σ) as input to in an ∈ arbitrary order and communicates the working space ϕ(i) C, Bob reports σi = 1. Otherwise, Bob reports σ =∈ 0. of A to Bob. For index i, let qi be the point that is i in the direction ϕ(i) and at distance √2(1 + 2/d1/3) The correctness of this reduction follows from the (resp. 2 + √2(1 +− 2/d1/3)) from ϕ(i)(resp. ϕ(i)); see following lemma. − A A Figure 2(b). Bob adds qi as input to . If returns Lemma 4.3. If C is an α-coreset(ϕ(σ) q ), for α < 1 i a ball of radius at least (1 + ), then Bob declares 1/3 ∪ { } √2 √2(1 2/d ) then, σi = 1. Otherwise, Bob declares σi = 0. − (i) q C, and Let S = ϕ(σ) qi . Note that ϕ(i)qi = i {1/3 ∪ { }} || || ∈ 2 + √2(1 + 2/d ). Hence if σi = 1, then ϕ(i), qi S (ii) ϕ(i) C if and only if σi = 1. and hence the radius of any ball containing S must∈ be ∈ at least Proof. Let S = ϕ(σ) q . First, we prove (i). ∪ { i} 1/3 1 Suppose, on the contrary, that qi C and hence ϕ(i)qi /2 = (2 + √2(1 + 2/d ))/2 > 1 + . Bd 6∈ || || √ C ϕ(σ). Let be the unit ball centered at origin. 2 ⊆ d 1 Let β be MEB(C). Since C S − , r(β) 1 and On the other hand, if σ = 0, then ϕ(i) S. By Bd ⊂ ≤ Bd i 6∈ c(β) . Observe that, for any point t , Lemma 4.1, all points in ϕ(σ) are within distance √2(1+ ∈ 1/3 ∈ tqi √2(1 + 2/d ). Hence, 2/d1/3) from ϕ(i). Since, q ( ϕ(i)) = √2(1 + || || ≥ − || i − || 2/d1/3), the radius of MEB(S) is at most √2(1+2/d1/3). c(β)q √2(1 + 2/d1/3) || i|| ≥ In this case, the radius of any α-MEB(S) is at most √2(1 + 2/d1/3)r(β) α r(MEB(S)) (1 + 1 ), for α < 1+√2 (1 2/d1/3). ≥ · ≤ √2 2 − > α r(β), Since A outputs a correct α-MEB(S) with probability · at least 2/3, Bob can determine σi with the same leading to a contradiction that C is an α-coreset(S). probability. We can thus conclude that the workspace Now, we prove (ii). If σi = 0, then ϕ(i) S and of A is Ω(exp(d1/3)) and obtain the following. C S implies ϕ(i) C. On the other hand,6∈ suppose ⊆ 6∈ σi = 1 but ϕ(i) C. Let β = MEB(C). Observe that other hand, if σi = 0, then ( ϕ(i), ϕ(i)) S. Let W all points in C are6∈ within a distance √2(1+2/d1/3) from be a slab bounded by the two− hyperplanes6∈ x, ϕ(i) = ϕ(i). Hence r(β) √2(1 + 2/d1/3). It follows from 2/d1/3 and x, ϕ(i) = 2/d1/3. By Lemmah 4.1 andi − ≤ h i − (i) that the center c(β) β∗, where β∗ = B(qi, √2(1 + the fact that Nϕ(i) hϕ(i) W , we have S W and 1/3 ∈ ⊂ ⊂ ⊂ 2/d )). For any point p β∗, pϕ(i) 2, therefore d(width(S)) 4/d1/3. For α < d1/3/8, ∈ || || ≥ ≤ c(β)ϕ(i) 2 d(J) α d(width(S)) α 4/d1/3 1/2. || || ≥ ≤ · ≤ · ≤ √2(1 2/d1/3)√2(1 + 2/d1/3) ≥ − Hence, we conclude the following. α r(β), ≥ · Theorem 4.4. 1/3 Any streaming algorithm that main- for α < √2(1 2/d ), leading to a contradiction that d − tains an α-width(S) of a set of n points in R , for C is an α-coreset(S). α < d1/3/8, with probability at least 2/3, requires Ω(min n, exp(d1/3) ) bits of storage. Hence, we conclude the following.  Theorem 4.3. Any streaming algorithm that main- 5 Conclusion tains an α-coreset(S) of a set S of n points in Rd, for In this paper, we design streaming algorithms for main- α < √2(1 2/d1/3), with probability at least 2/3 requires taining several extent measures on high-dimensional Ω(min n,−exp(d1/3) ) bits of storage. point sets. Our approach is based on the notion of blurred ball cover that, for a stream S of points, main-  tains a subset K S. K can be interpreted as the union Width. Let B be a ball centered at origin with of a set of K ,...,K⊆ , u = O((1/ε2) log(1/ε)), of sub- r(B) = 1/2. For any vector u Sd 1, let h be a 1 u − u sets of S each{ of size O}(1/ε) so that S lies in the union hyperplane passing through the origin∈ and normal to u, of (1 + ε)MEB(K ). We provide a simple streaming al- i.e., x,u = 0. Let N h be a set of d points such i u u gorithm for maintaing blurred ball cover. We then use thath conv(i N u,u )⊂ contains B; see Figure 3. For u ∪ {− } blurred ball cover to provide streaming algorithms for any σ 0, 1 k, let ϕ(σ) = p p ϕ(σ) . ∈ { } − {− | ∈ } various extent measures. Finally, we show lower bounds u on the worst-case approximation ratio of any stream- ing algorithm that uses poly(d) space to maintain these B extent measures. We conclude by stating related prob-

v1 lems. hu v3 v 2 Is there a fully dynamic data structure, whose size • is linear in n and poly(d, 1/ε), that maintains a (1 + ε)-MEB(S)? −u Can the concept of blurred ball cover be general- • Figure 3: Nu = v1,v2,v3 , B conv(Nu u,u ). ized for maintaining approximate smallest encosing { } ⊂ ∪ {− } convex shape of a high dimensional point set, e.g., minimum enclosing ellipsoids, under the streaming Index The problem can be reduced to α-width(S) model? as follows. Let A be a streaming algorithm that maintains α-width(S), for α < d1/3/8, with probability at least 2/3. Alice passes points in ϕ(σ) ϕ(σ) to A in an arbitrary order and then communicates{ ∪ − the} References working space of A to Bob. For index i, Bob adds Nϕ(i) to A. If A reports a slab J such that d(J) is at least 1, then Bob reports σ = 1 and otherwise Bob reports [1] P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan, i Approximating extent measures of points, J. ACM, σ = 0. i 51 (2004), 606–635. The correctness of the reduction follows from the [2] P. K. Agarwal, S. Har-Peled, and K. R. Varadara- following observation. Let S = ϕ(σ) ϕ(σ) N . { ∪ − ∪ ϕ(i)} jan, Geometric approximation via coresets, in: Combi- If σi = 1, then Nϕ(i) ϕ(i), ϕ(i) S. By natorial and Computational Geometry { ∪ {− }} ⊆ (J. Goodman, construction of N , B conv(N ϕ(i), ϕ(i) ). J. Pach, and E.Welzl, eds.), Cambridge University ϕ(i) ⊂ ϕ(i) ∪ { − } Since r(B) = 1/2, d(J) d(width(S)) 1. On the Press, 2005, pp. 1–30. ≥ ≥ [3] P. K. Agarwal, J. Matouˇsek, and S. Suri, Farthest pp. 922–931. neighbors, maximum spanning trees and related prob- [17] K. L. Clarkson and P. W. Shor, Applications of ran- lems in higher dimensions, Comput. Geom. Theory dom sampling in computational geometry, II., Discrete Appl., 1 (1992), 189–201. Comput. Geom., 2 (1987), 195–222. [4] P. K. Agarwal and H. Yu, A space-optimal data-stream [18] K. L. Clarkson and D. P. Woodruff, Numerical linear algorithm for coresets in the plane, Proc. 23rd Annual algebra in the streaming model, Proc. 41st Annual Sympos. Comput. Geom., 2007, pp. 1–10. ACM Sympos. Theory of Comput., 2009, pp. 205–214. [5] C. Aggarwal, Data Streams: Models and Algorithms, [19] B. G¨artner and M. Jaggi, Coresets for polytope dis- Springer, 2007. tance, Proc. 25th Annual Sympos. Comput. Geom., [6] N. Alon, Y. Matias, and M. Szegedy, The space 2009, pp. 33–42. complexity of approximating the frequency moments, [20] A. Goel, P. Indyk, and K. Varadarajan, Reductions J. Comput. Syst. Sci., 58 (1999), 137–147. among high dimensional proximity problems, Proc. [7] A. Andoni, D. Croitoru, and M. Patrascu, Hardness 12th Annual ACM-SIAM Sympos. Discrete Algorithms, of nearest neighbor under L-infinity, Proc. 49th An- 2001, pp. 769–778. nual IEEE Sympos. Foundations of Comp. Sc., 2008, [21] S. Har-Peled and K. R. Varadarajan, Projective clus- pp. 424–433. tering in high dimensions using core-sets, In Proc. 18th [8] K. Ball, An elementary introduction to modern convex Annual Sympos. Comput. Geom, 2003, pp. 312–318. geometry, in Flavors of Geometry, 31 (1997), 1–58. [22] P. Indyk, Better algorithms for high-dimensional prox- [9] M. B˘adoiu and K. L. Clarkson, Optimal core-sets for imity problems via asymmetric embeddings, Proc. balls, Comput. Geom. Theory Appl., 40 (2008), 14–22. 14th Annual ACM-SIAM Sympos. Discrete Algorithms, [10] M. B˘adoiu, S. Har-Peled, and P. Indyk, Approximate 2003, pp. 539–545. clustering via core-sets, Proc. 34th Annual ACM Sym- [23] P. Kumar, J. S. B. Mitchell, and E. A. Yildirim, Ap- pos. Theory Comput., 2002, pp. 250–257. proximate minimum enclosing balls in high dimensions [11] T. M. Chan, Approximating the diameter, width, using core-sets, J. Exp. Algorithmics, 8 (2003), 1.1. smallest enclosing cylinder, and minimum-width annu- [24] P. Kumar and E. A. Yildirim, Minimum-Volume En- lus, Intl. J. Comput. Geom. Appl., 12 (2002), 67–85. closing Ellipsoids and Core Sets, J. Optimization The- [12] T. M. Chan, Faster core-set constructions and data- ory and Appl., 126 (2005), 1–21. stream algorithms in fixed dimensions, Comput. Geom. [25] E. Kushilevitz and N. Nisan, Communication Complex- Theory Appl., 35 (2006), 20 – 35. ity, Cambridge University Press, 1997. [13] B. Chazelle, H. Edelsbrunner, M. Grigni, L. Guibas, [26] J. Matouˇsek, M. Sharir, and E. Welzl, A subexpo- M.Sharir, and E. Welzl, Improved bounds on weak nential bound for linear programming, Algorithmica, ε-nets for convex sets, Discrete Comput. Geom., 16 (1996), 498 – 516. 13 (1995), 1–15. [27] S. Muthukrishnan, Data Streams: Algorithms and [14] B. Chazelle and J. Matouˇsek, On linear-time deter- Applications, Now Publishers Inc., 2005. ministic algorithms for optimization problems in fixed [28] E. A. Ramos, An Optimal Deterministic Algorithm for dimension, J. Algorithms, 21 (1996), 579 – 597. Computing the Diameter of a three-dimensional point [15] K. L. Clarkson, Las Vegas algorithms for linear and set, Discrete Comput. Geom., 26 (2001), 233–244. integer programming when the dimension is small, J. [29] H. Zarrabi-Zadeh and T. Chan, A simple streaming ACM, 42 (1995), 488–499. algorithm for minimum enclosing balls, Proc. 18th An- [16] K. L. Clarkson, Coresets, sparse greedy approxima- nual Canadian Conf. Comput. Geom., 2006, pp. 139– tion, and the Frank-Wolfe algorithm, Proc. 19th An- 142. nual ACM-SIAM Sympos. Discrete Algorithms, 2008,