Approximation Algorithms for Resource Allocation Optimization
Approximation Algorithms for Resource Allocation Optimization
by
Kewen Liao
A thesis submitted for the degree of Doctor of Philosophy
School of Computer Science, The University of Adelaide.
March, 2014 Copyright c 2014 Kewen Liao All Rights Reserved To my parents and grandparents, for their endless support. Contents
Contents i
List of Figures iii
List of Tables iv
List of Acronyms v
Abstract viii
Declaration ix
Preface x
Acknowledgments xii
1 Introduction 1 1.1 Research problems ...... 2 1.2 Thesis aims and impact ...... 5 1.3 Thesis results ...... 8 1.4 Thesis structure ...... 9
2 Preliminaries 10 2.1 Computational complexity ...... 11 2.1.1 Computational problems ...... 11 2.1.2 Languages,machinemodels,andalgorithms...... 12 2.1.3 P and NP ...... 14 2.2LPandILP...... 16 2.3Approximationalgorithms...... 20 2.3.1 Greedyandlocalsearchalgorithms...... 22 2.3.2 LProundingalgorithms...... 24 2.3.3 Primal-dual algorithms ...... 26
i 3 Discrete Facility Location 30 3.1 Uncapacitated Facility Location ...... 31 3.1.1 LPformulations...... 31 3.1.2 Hardnessresults...... 33 3.1.3 Approximationalgorithms...... 36 3.2 Other facility location problems ...... 49 3.2.1 Fault-Tolerant Facility Location ...... 51 3.2.2 Capacitated Facility Location ...... 53 3.2.3 K Uncapacitated Facility Location ...... 56
4 Unconstrained Fault-Tolerant Resource Allocation 59 4.1Introduction...... 60 4.2 A greedy algorithm with ratio 1.861 ...... 63 4.3 A greedy algorithm with ratio 1.61 ...... 69 4.4 A hybrid greedy algorithm with ratio 1.52 ...... 75 4.5 A simple reduction to UFL ...... 76 4.6 Capacitated FTRA∞ ...... 78 4.7 Summary ...... 79
5 Reliable Resource Allocation 80 5.1Introduction...... 80 5.2 Primal-dual algorithms ...... 83 5.3 The inverse dual fitting analysis ...... 87 5.4 Minimum set cover: formalizing IDF ...... 92 5.5 Reduction to UFL ...... 94 5.6 Reduction to FTRA∞ ...... 96 5.7 Summary ...... 98
6 Constrained Fault-Tolerant Resource Allocation 100 6.1Introduction...... 101 6.2 A unified LP-rounding algorithm ...... 104 6.3 Reduction to FTFL ...... 109 6.4 The uniform FTRA ...... 113 6.5 The uniform k-FTRA ...... 124 6.6 Summary ...... 133
7 Conclusion 134
Bibliography 137
ii List of Figures
1.1 A unified resource allocation model with input parameters p, l, r,k,R, and u (to be explained in detail later) that capture various prac- ticalissuesinresourceallocation...... 3
2.1Astargraph...... 24
3.1 Clusters constructed from G ...... 41
5.1 An example of the RRA model...... 82
6.1 An FTRA instance with a feasible solution ...... 101 6.2Illustrationofboundingtheconnectioncosts...... 109
iii List of Tables
3.1 UFL approximationresults...... 36
iv List of Acronyms
Acronym Meaning UFL Uncapacitated Facility Location (problem) KM K-median (problem) ILP Integer linear program FTFL Fault-Tolerant Facility Location (problem) SCFL Soft Capacitated Facility Location (problem) k-UFL K-Uncapacitated Facility Location (problem) FTRA∞ Unconstrained Fault-Tolerant Resource Allocation (problem) CFTRA∞ Capacitated Unconstrained Fault-Tolerant Resource Allocation (problem) RRA Reliable Resource Allocation (problem) FTRA Constrained Fault-Tolerant Resource Allocation (problem) k-FTRA K-Constrained Fault-Tolerant Resource Allocation (problem) QoS Quality of service FPT Fixed parameter tractable CDN Content distribution/delivery network PM Physical machine VM Virtual machine OR Operations research TCS Theoretical computer science CC Computational complexity (theory) VCO Vertex cover optimization (problem) VCD Vertex cover decision (problem) TM Turing Machine FSC Finite state control RAM Random Access Machine CU Control unit PC Program counter SAT Boolean satisfiability (problem)
v Acronym Meaning IS Independent set (problem) UVC Unweighted vertex cover (problem) LP Linear program CSC Complementary slackness condition APX Approximable PT AS Polynomial time approximation scheme FPT AS Full polynomial time approximation scheme TSP Traveling salesman problem AP-reduction Approximation preserving reduction PCP Probabilistically checkable proof (theorem) VC Weighted vertex cover (problem) SC Set cover (problem) JV An approximation algorithm by Jain and Vazirani [75] for UFL MP An approximation algorithm by Mettu and Plaxton [113] for UFL MMSV An approximation algorithm by Mahdian et al. [104] for UFL JMS An approximation algorithm by Jain et al. [74] for UFL CRR Clustered randomized rounding (algorithm) in [41] CSGA Cost scaling and greedy augmentation (procedures) MYZ An approximation algorithm by Mahdian et al. [106] for UFL FLO Facility Location with Outliers (problems) MLFL Multi-level Facility Location (problems) OFL Online Facility Location (problems) DR Dependent rounding (technique) LC Laminar clustering (technique) CFL Capacitated Facility Location (problem) HCFL Hard Capacitated Facility Location (problem) IG Integrity gap (of an integer linear program) CFLS Capacitated Facility Location (problem) with splittable demands CFLU Capacitated Facility Location (problem) with unsplittable demands UniF L Universal Facility Location (problem)
vi Acronym Meaning HCFLU Hard Capacitated Facility Location (problem) with unsplittable demands GAP Generalized assignment problem LR Lagrangian relaxation (technique) LMP Lagrangian multiplier preserving (property) FTFA Fault-Tolerant Facility Allocation (problem) FTFP Fault-Tolerant Facility Placement (problem) SG-1 A star-greedy algorithm for FTRA∞ PD-1 A primal-dual algorithm for FTRA∞ SG-2 An improved star-greedy algorithm for FTRA∞ PD-2 An improved primal-dual algorithm for FTRA∞ MRR Minimum reliability requirement VLSI Very-large-scale integration PD-3 A primal-dual algorithm for RRA APD-3 An accelerated primal-dual algorithm for RRA IDF Inverse dual fitting (technique) ULPR A unified LP-rounding algorithm for FTRA PD-4 A primal-dual algorithm for FTRA APD-4 An accelerated primal-dual algorithm for FTRA SOC Sum of contributions AGA Acceleration of greedy augmentation (procedure) PK Procedures for solving k-FTRA BS Binary search (procedure) GP Greedy pairing (procedure) RR Randomized rounding (procedure) e.g. For example i.e. That is etc. And so on w.r.t. With respect to w.l.o.g. Without loss of generality s.t. Such that
vii Abstract
Nowadays, data storage, server replicas/mirrors, virtual machines, and vari- ous kinds of services can all be regarded as different types of resources. These resources play an important role in today’s computer world because of the con- tinuing advances in information technology. It is usual that similar resources are grouped together at the same site, and can then be allocated to geograph- ically distributed clients. This is the resource allocation paradigm considered in this thesis. Optimizing solutions to a variety of problems arising from this paradigm remains a key challenge, since these problems are NP-hard. For all the resource allocation problems studied in this thesis, we are given a set of sites containing facilities as resources, a set of clients to access these facilities, an opening cost for each facility, and a connection cost for each allocation of a facility to a client. The general goal is to decide the number of facilities to open at each site and allocate the open facilities to clients so that the total cost incurred is minimized. This class of the problems extends the classical NP-hard facility location problems with additional abilities to capture various practical resource allocation scenarios. To cope with the NP-hardness of the resource allocation problems, the thesis focuses on the design and analysis of approximation algorithms. The main techniques we adopt are linear programming based, such as primal-dual schema, linear program rounding, and reductions via linear programs. Our de- veloped solutions have great potential for optimizing the performances of many contemporary distributed systems such as cloud computing, content delivery networks, Web caching, and Web services provisioning.
viii Declaration
I, Kewen Liao, certify that this work contains no material which has been accepted for the award of any other degree or diploma in any university or other tertiary institution and, to the best of my knowledge and belief, con- tains no material previously published or written by another person, except where due reference has been made in the text. In addition, I certify that no part of this work will, in the future, be used in a submission for any other degree or diploma in any university or other tertiary institution without the prior approval of the University of Adelaide and where applicable, any partner institution responsible for the joint-award of this degree. I give consent to this copy of my thesis, when deposited in the Univer- sity Library, being made available for loan and photocopying, subject to the provisions of the Copyright Act 1968. The author acknowledges that copyright of published works contained within this thesis (as listed on Page x) resides with the copyright holder(s) of those works. I also give permission for the digital version of my thesis to be made avail- able on the web, via the University’s digital research repository, the Library catalogue, and also through web search engines, unless permission has been granted by the University to restrict access for a period of time.
Signature Date
ix Preface
During my PhD study at the University of Adelaide from 2009 to 2013, I have produced four conference papers and two journal articles related to this thesis (see my homepage at http://cs.adelaide.edu.au/~kewen for my bio with a complete list of publications). The thesis topic is theoretical in nature and based on the content presented in the following papers.
Conference Publications:
• Kewen Liao and Hong Shen. Unconstrained and constrained fault-tolerant resource allocation. In Proceedings of the 17th annual international con- ference on computing and combinatorics (COCOON), pages 555–566, Dallas, Texas, USA, 14-16 August 2011. Springer-Verlag, Berlin
• Kewen Liao and Hong Shen. Fast fault-tolerant resource allocation. In Proceedings of 12th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pages 231–236, Gwangju, Korea, October 20-22 2011. IEEE
• Kewen Liao and Hong Shen. Approximating the reliable resource alloca- tion problem using inverse dual fitting. In Proceedings of the Eighteenth Computing: The Australasian Theory Symposium (CATS), page Vol. 128, Melbourne, Australia, January-February 2012. ACS, Sydney
• Kewen Liao, Hong Shen, and Longkun Guo. Improved approximation algorithms for constrained fault-tolerant resource allocation. In Fun- damentals of Computation Theory (FCT) - 19th International Sympo- sium, pages 236–247, Liverpool, UK, August 19-21 2013. Springer-Verlag, Berlin (extended version invited to the special issue of Theoretical Com- puter Science)
Journal Publications:
x • Kewen Liao and Hong Shen. Lp-based approximation algorithms for reliable resource allocation. The Computer Journal, 57(1):154–164, 2014
• Kewen Liao, Hong Shen, and Longkun Guo. Constrained fault-tolerant resource allocation. Theoretical Computer Science, submitted in Decem- ber 2013, at http://arxiv.org/abs/1208.3835, currently under review
xi Acknowledgments
This PhD study was a real challenge for me, and overcoming this challenge would not have been possible without the support of many people. First and foremost, I would like to thank my principal supervisor Prof. Hong Shen, for his great supervision over the past few years. I am especially grateful for his trust in me working on tough theoretical problems. Without his constant advice and guidance, this thesis would not be even completed. I am truly indebted and thankful to my supervisor A/Prof. Michael Sheng. He was also the supervisor of my honors thesis. He is the one who introduced me to research, inspired and encouraged me to pursue a PhD during my honors year. Without him, this thesis might not even exist. I am sincerely grateful to my supervisor Emeritus Prof. Zbigniew Michalewicz, for his kind help, support, and encouragement in every aspect of my research. I owe a huge debt of gratitude to my parents and grandparents, for their unconditional love, support, and encouragement. My most heartfelt thanks go to my parents for their spiritual and financial support during my eight years of study in Australia. I would like to thank some of my colleagues and friends: Changjian, Donglai, Jeff, Lei, Li, Lina, Longkun, Mike, Scott, Shihong, Sim, Xiang, Xiaoqiang, Yi- dong, Yihong, Yong, Yongrui, Denny, Ke, Leo, Nick, Su, White, Xiaoming, Yibing, Yuguo, and Zilang, for their accompany, inspiration, and influence during my PhD journey. Finally, I thank the University of Adelaide for providing me scholarship and the School of Computer Science for financially supporting my conference travels. I would also like to express my gratitude to the anonymous examiners of this thesis.
xii Chapter 1
Introduction
In real life, many economic problems of great practical importance are about how to strategically choose the locations of sites to open facilities for meeting the demands of users, customers or clients1. Traditionally, in operations re- search, such problems are called facility location problems. The applications of these problems/models widely appear in supply chain management, where facilities can be manufacturing plants, processing units, distribution centers, and warehouses. More recently, in the fields of theoretical computer science and computer networks, facilities are often regarded as nodes, base stations, servers, databases, contents, and services. There are different classifications of facility location problems. If a facility can only serve up to a certain number of users, the problem is capacitated. Without this limitation, the problem is called uncapacitated. If the site and client locations are finite, the problem is discrete. Otherwise, one may select locations anywhere from a plane or higher dimension and the problem becomes continuous. In this thesis, we will restrict our attention to the discrete problems (see the book [116] for an extensive sur- vey). Typically, these problems are combinatorial optimization problems, in which the objectives are to find an extreme solution (usually with a maximum or minimum cost) from a large feasible solution space. Unfortunately, many of the problems are NP-hard and difficult to solve optimally (see Chapter 2 for computational complexity background). These include the following funda- mental discrete Uncapacitated Facility Location and (uncapacitated) k-median problems.
Definition 1.0.1. In the Uncapaciated Facility Location (UFL) problem, we are given a set of sites F and a set of clients C. The cost of opening a facility at site i ∈F is fi. For a client j ∈C, the cost of connecting a facility at i to
1We will use these terms interchangeably in the text.
1 Chapter 1. Introduction j is cij. Every client has to use one open facility. The objective is to select a subset of sites to open facilities and assign clients to the open facilities, such that the total facility opening and connection/service cost is minimized.
Definition 1.0.2. In the k-median (KM) problem, we are given a set of sites F and a set of clients C. For a client j ∈C, the cost of connecting a facility at i ∈F to j is cij. Every client has to use one open facility. The objective is to open at most k facilities and assign clients to the open facilities, such that the total connection cost is minimized.
Note that the KM problem differs from UFL in that an upper bound of k facilities2 need to be selected and there is no facility opening cost involved. Also, if F = C, both problems are complete3 and can be viewed as different clustering problems. Moreover, although identical, we presented the above definitions in a slightly different way4 from the conventional definitions in [116, 129], for the purpose of naturally leading to our focused research problems below. In Chapter 3, we will revisit the UFL and KM problems in more detail.
1.1 Research problems
Nowadays, data storage, server replicas/mirrors, virtual machines, and various kinds of services can all be regarded as different types of resources. It is usual that similar resources are grouped together at the same site, and can then be allocated to geographically distributed clients. In this thesis, we address various resource allocation problems built on UFL. These problems reveal additional practical issues such as fault-tolerance, reliability, facility service capacity, and constraints on the number of available facilities. For the ease of illustration, we capture the problems by a unified resource allocation model as shown in Figure 1.1. Correspondingly, the model is encoded by the following unified integer lin- ear program (ILP) formulation (see Chapter 2 for LP background).
2It is easy to see that this covers the case where exactly k facilities need to be selected. 3The complete problems are special cases of the bipartite problems (as in our definitions) in which there are two separate sets of sites and clients. They usually have weaker hardness results (see Chapter 3 for more details). 4In fact the difference is, conventionally F is denoted as a set of facilities rather than sites. However, it is identical to our definitions since every client has to use only one facility, and therefore at most one facility is needed to open at each site in both UFL and KM.
2 Chapter 1. Introduction
R k p
u l r
unopen resource/facility client site open resource/facility
Figure 1.1: A unified resource allocation model with input parameters p, l, r,k,R,andu (to be explained in detail later) that capture various practical issues in resource allocation.