K-Medians, Facility Location, and the Chernoff-Wald Bound
Total Page:16
File Type:pdf, Size:1020Kb
86 K-Medians, Facility Location, and the Chernoff-Wald Bound Neal E. Young* Abstract minimizez d + k We study the general (non-metric) facility-location and weighted k-medians problems, as well as the fractional cost(x) < k facility-location and k-medians problems. We describe a dist(x) S d natural randomized rounding scheme and use it to derive approximation algorithms for all of these problems. subject to ~-~/x(f,c) = 1 (Vc) For facility location and weighted k-medians, the re- =(f,c) < x(f) (vf, c) spective algorithms are polynomial-time [HAk + d]- and [(1 + e)d; In(n + n/e)k]-approximation algorithms. These x(f,c) >_ 0 (vf, c) performance guarantees improve on the best previous per- x(f) E {0,1} (vf) formance guarantees, due respectively to Hochbaum (1982) and Lin and Vitter (1992). For fractional k-medians, the al- gorithm is a new, Lagrangian-relaxation, [(1 + e)d, (1 + e)k]- Figure 1: The facility-location IP. Above d, k, x(f) and approximation algorithm. It runs in O(kln(n/e)/e 2) linear- x(f, c) are variables, dist(x), the assignment cost of x, time iterations. is ~-~Ic x(f, c)disz(f, c), and cost(x), the facility cost of For fractional facilities-location (a generalization of frac- tional weighted set cover), the algorithm is a Lagrangian- relaxation, ~(1 + e)k]-approximation algorithm. It runs in O(nln(n)/e ~) linear-time iterations and is essentially the The weighted k-medians problem is the same same as an unpublished Lagrangian-relaxation algorithm as the facility location problem except for the following: due to Garg (1998). By recasting his analysis probabilis- a positive real number k is given as input, and the goal tically and abstracting it, we obtain an interesting (and as far as we know new) probabilistic bound that may be of is to choose a subset F of facilities minimizing dist(F) independent interest. We call it the Chernoff- Wald bound. subject to the constraint cost(F) < k. The standard integer programming formulation is the k-medians IP, 1 Problem definitions which differs from the IP in Fig. 1 only in that k is given, not a variable, and the objective function is d instead The input to the weighted set cover problem is of d + k. The (unweighted) k-medians problem is the a collection of sets, where each set s is given a cost special case when each cost(f) = 1. For the fractional cost(s) E ~+. The goal is to choose a cover (a collection k-medians problem, the input is the same; the goal is of sets containing all elements) of minimum total cost. to solve the linear program obtained by removing the The (uncapacitated) facility-location problem is constraint "x(f) E {0, 1}" from the k-medians IP. a generalization of weighted set cover in which each We take the "size" of each of the above problems to set f (called a "facility") and element c (called a be the number of pairs (f, c) such that d±st(f, c) < oo. "customer") are given a distance dist(f, c) E ~+U{oo}. We assume the size is at least the number of customers The goal is to choose a set of facilities F minimizing and facilities. cost(F) + dist(F), where cost(F), the facility cost of By an In(d); fl(k)]-approximation algorithm for F, is ~"~.EF COSt(f) and dist(F), the assignment cost k-medians, we mean an algorithm that, given a problem of F, is )-:c rainier d±st(f, c). instance for which there exists a fractional solution of Fig. 1 shows the standard integer programming assignment cost d and facility cost k, produces a solution formulation of the problem -- the facility-location IP of assignment cost at most a(d) and facility cost at most [12, p. 8]. The facility-location linear program (LP) is D(k). We use similar non-standard notations for facility the same except without the constraint "x(f) E {0, 1}". location, set cover, and k-set cover. For instance, by A fractional solution is a feasible solution to the LP. a [d + 2k]-approximation algorithm for facility location, Fractional facility location is the problem of solving the we mean an algorithm that, given an instance for which LP. there exists a fractional solution of assignment cost d -'--'¢R-~'earch partially funded by NSF CAREER award CCR- and facility cost k, produces a solution for which the 9720664. Department of Computer Science, Dartmouth College, assignment cost plus the facility cost is at most d + 2k. Hanover, NH. ney~}cs,daxtmouZh, edu 87 2 Background 3 Results In the mid 1970's Johnson and Lovasz gave a greedy Fig. 2 shows the simple randomized rounding scheme [HAk]-approximation algorithm for unweighted set at the center of all our results. With minor variations cover [9, 11]. In 1979 Chvatal generalized it to a [HAk]- (e.g. Fig. 4), this rounding scheme can be used as approximation algorithm for weighted set cover [3]. the basis for approximation algorithms for set cover, In 1982 Hochbaum gave a greedy [HA(d + k)]- weighted set cover, facility location, and k-medians, and approximation algorithm for the uncapacitated facility- as the basis for Lagrangian-relaxation algorithms for the location problem by an implicit reduction to the fractional variants of these problems. weighted set-cover problem [8].1 Above /x is at most Although essentially the same rounding scheme suf- the maximum, over all facilities f, of the number of rices for each of these problems, the respective proba- customers c such that dist(f, c) < oo. bilistic analyses require different (albeit standard) tech- In 1992 Lin and Vitter gave a polynomial-time niques in each case. For set cover, a simple direct [(1 + e)d; (1 + 1/e) (lnn + 1) k]-approximation algorithm analysis suffices [16]. For weighted set cover, facility for the k-medians problem [10]. (Here c > 0 is an location, and k-medians, a basic probabilistic lemma input parameter that tunes the tradeoff between the called Wald's inequality is necessary. For fractional set two criteria and n is the number of customers.) Their cover and k-medians, the analysis rests on the Chernoff algorithm combines a greedy algorithm with a technique bound. For fractional weighted set cover and facility lo- they call filtering. cation, the analysis is a simple application of what we In 1994 Plotkin, Shmoys, and Tardos (PST) gave call the Chernoff- Wald bound. Lagrangian-relaxation algorithms for general packing For each problem, we apply the method of condi- and covering problems [13]. As a special case, their tional probabilities to the rounding scheme in order to algorithms imply a [(1 + e)k]-approximation algorithm derive a corresponding approximation algorithm. The for fractional set cover that runs in O(k ln(n)/e 2) linear- structure of each resulting algorithm, being closely tied time iterations. Here n is the number of elements. to the underlying probabilistic analysis, ends up differ- In 1998 Garg generalized and simplified the PST ing substantially from problem to problem. set cover algorithm to obtain a [(1 + e)k]-approximation For facility location, the resulting algorithm (Fig. 2) algorithm for fractional weighted set cover [6]. Garg's is a randomized rounding, polynomial-time [d + H~xk]- algorithm runs in O(nln(n)/e 2) linear-time iterations. approximation algorithm. The performance guarantee By Hochbaum's reduction, one can use Garg's algorithm improves over Hochbaum's algorithm with respect to as a [(1 + e)k]-approximation algorithm for fractional the assignment costs. /acility location. The running time is the same, where For weighted k-medians, the resulting algorithm n is the number of customers. (Fig. 3) is a greedy [(1 + e)d; ln(n +n/e)k]-apprordmation Recent work has focused on metric k-medians and algorithm. In comparison to Lin and Vitter's algorithm, facility location problems. In the metric versions, the the performance ratio with respect to the facility costs distance function is assumed to satisfy the triangle in- is better by a factor of roughly 1/e. equality. For example, [O(d + k)]-approximation algo- For fractional k-medians, the algorithm (Fig. 5) rithms have recently been shown for the metric facility- is a [(1 + e)d, (1 + e)k]-approximation algorithm. It location problem [7, 15]. Charikar, Guha, Tardos and is a Lagrangian-relaxation algorithm and runs in Shmoys [2] recently gave an [O(d); k]-approximation al- O(k ln(n/e)/e 2) linear-time iterations. This is a factor gorithm for metric k-medians. Many of these algorithm of n faster than the best bound we can show by applying first solve the fractional problems and then round the the general algorithm of Plotkin, Shmoys and Tardos. fractional solutions. Finally, for fractional facility location, the algo- rithm (see Fig. 6) is a [(1 + e)(d + k)]-approximation, rHHochbaum's reduction is easy to adapt in order to reduce Lagrangian-relaxation algorithm. It runs in at most unweighted k-medians to a variant of set cover that we call k- O(nln(n)/e 2) iterations, where each iteration requires set cover. The reduction also extends naturally to the fractional time linear in the input size times Inn. This algorithm problems. (See appendix for details.) This shows a loose is the same as the unpublished fractional weighted set- equivalence between facility location and weighted set cover, and between k-medians and k-set cover. However, Hochbaum's cover algorithm due to Garg [6].