Towards Reliable (and Efficient) Job Executions in a Practical Geo-distributed Data Analytics System

Xiaoda Zhang, Zhuzhong Qian, Sheng Zhang, Yize Li, Xiangbo Li, Xiaoliang Wang, Sanglu Lu
State Key Laboratory for Novel Software Technology, Nanjing University

arXiv:1802.00245v3 [cs.DC] 7 Feb 2018

Abstract

Geo-distributed data analytics is increasingly common for deriving useful information in large organizations. Naively extending existing cluster-scale data analytics systems to the scale of geo-distributed data centers faces unique challenges, including WAN bandwidth limits, regulatory constraints, a changeable/unreliable runtime environment, and high monetary costs. Our goal in this work is to develop a practical geo-distributed data analytics system that (1) employs an intelligent mechanism for jobs to efficiently utilize the resources (and adjust to the changeable environment) across data centers; (2) guarantees the reliability of jobs despite possible failures; and (3) is generic and flexible enough to run a wide range of data analytics jobs without requiring any changes.

To this end, we present a new, general geo-distributed data analytics system, HOUTU, that is composed of multiple autonomous systems, each operating in a sovereign data center. HOUTU maintains a job manager (JM) for a geo-distributed job in each data center, so that these replicated JMs can individually and cooperatively manage resources and assign tasks. Our experiments on the prototype of HOUTU running across four Alibaba Cloud regions show that HOUTU provides job performance as efficient as in the existing centralized architecture, and guarantees reliable job executions when facing failures.

1 Introduction

Nowadays, organizations are deploying their applications in multiple data centers around the world to meet latency-sensitive requirements [13, 21, 36, 12]. As a result, the raw data, including user interaction logging, compute infrastructure monitoring, and job traces, is generated at geographically distributed data centers. Analytics jobs on these geo-distributed data are emerging as a daily requirement [25, 39, 45, 48, 28, 38, 44, 46, 27, 19]. Because these analytics jobs usually support real-time decisions and online predictions, minimizing response time and maximizing throughput are important. However, these jobs face the unique challenges of wide area network (WAN) bandwidth limits, legislative and regulatory constraints, an unreliable runtime environment, and even monetary costs.

Existing approaches optimize task and/or data placement across data centers so as to improve data locality [34, 45, 28, 38, 44, 46]. However, all previous works employ a centralized architecture where a monolithic master controls the resources of the worker machines from all data centers, as shown in Fig. 1(a). We argue that regulatory constraints prevent us from doing so. More and more regions are establishing laws to restrict data movement [6, 41, 10] and to prevent IT resources from being controlled by untrusted parties in a shared environment [18] (§2.1). An alternative way is to deploy an autonomous data analytics system per data center (Fig. 1(b)), and to extend the original system functionalities to coordinate geo-distributed job executions. We explore this decentralized architecture and its potential, making it possible for a job to acquire resources from remote data centers while respecting the regulatory constraints.

[Figure 1: Centralized vs. decentralized data analytics. (a) Centralized architecture; (b) Decentralized architecture.]

In addition, most existing works assume that the WAN bandwidth is stable. This may not accurately reflect reality [26, 32], and our experiments verify that the data transmission rate across data centers varies even over a short period (§2.2). Hence, we cannot explicitly formulate WAN bandwidth as a constant.

On the other hand, for most organizations that need geo-distributed data analytics, the most convenient option is to purchase public cloud instances. Decisions must be made between reliable (Reserved and On-demand) instances and unreliable (Spot) instances, due to their different monetary costs and the jobs' reliability demands. Spot market prices are often significantly lower (by up to an order of magnitude) than fixed prices for the same instances with a reliability Service Level Agreement (SLA) (§2.3). However, is it possible for cloud users to obtain reliability from unreliable instances at a reduced cost? Positive answers exist through the design of user bidding mechanisms [47, 53]; we instead answer this question in a systematic way, by providing job-level fault tolerance.

Our goal in this new decentralized and changeable/unreliable environment is to design new resource management, task scheduling and fault tolerance strategies to achieve reliable and efficient job executions.

To achieve this goal, such a system needs to address three key challenges. First, we need to find an efficient scheduling strategy that can dynamically adapt scheduling decisions to the changeable environment. This is difficult because we neither assume job characteristics are known a priori [33] nor use offline analysis [43], due to its significant overhead. Second, we need to implement a fault tolerance mechanism for jobs running atop unreliable Spot instances. Though existing frameworks [20, 30, 50] tolerate task-level failures, job-level fault tolerance is absent, even though in the unreliable setting the two types of failures are equally likely to occur. Third, we need to design a general system that efficiently handles geo-distributed job executions without requiring any job description changes. This is challenging because data can be dispersed among sovereign domains (data centers) with regulatory constraints.

In this work, we present HOUTU¹, a new, general geo-distributed data analytics system that is designed to efficiently operate over a collection of data centers. The key idea of HOUTU is to maintain a job manager (JM) for the geo-distributed job in each data center; each JM can individually assign tasks within its own data center, and also cooperatively assign tasks across data centers. This differentiation allows HOUTU to run conventional task assignment algorithms within a data center [49, 33, 52]. At the same time, across data centers, HOUTU employs a new work stealing method that converts task steals into node update events, which respects the data locality constraints.

¹ HOUTU is the deity of deep earth in ancient Chinese mythology who controls lands from all regions.

For resource management, we classify three cases where each job manager independently either requests more resources, maintains its current resources, or proactively releases some resources. The key insight here is to use recent resource utilization as feedback, irrespective of any prediction of future job characteristics. Even without knowledge of future job characteristics, when this strategy cooperates with our new task assignment method, we can theoretically prove (under some conditions) the efficiency of job executions by extending a very recent result [52] (§4.4).
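To make the feedback idea concrete, the following is a minimal sketch in Scala of how a job manager might map the recent utilization of its allocated containers to one of the three cases. It is illustrative only: the watermark thresholds are hypothetical and this is not HOUTU's actual Af strategy.

    sealed trait ResourceDecision
    case object RequestMore extends ResourceDecision
    case object Maintain extends ResourceDecision
    case object ReleaseSome extends ResourceDecision

    object FeedbackResourceManager {
      // recentUtilization: fraction of allocated containers busy during the most
      // recent window (0.0 to 1.0). The watermarks are made-up illustrative
      // values, not the thresholds used by Af.
      def decide(recentUtilization: Double,
                 highWatermark: Double = 0.9,
                 lowWatermark: Double = 0.5): ResourceDecision =
        if (recentUtilization >= highWatermark) RequestMore      // saturated: ask for more
        else if (recentUtilization <= lowWatermark) ReleaseSome  // mostly idle: give some back
        else Maintain                                            // healthy: keep the current allocation

      def main(args: Array[String]): Unit =
        Seq(0.95, 0.70, 0.30).foreach(u => println(s"utilization $u -> ${decide(u)}"))
    }

Because the decision depends only on observed utilization in the recent window, such a policy needs no model of future job characteristics, which is the point the paragraph above makes.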
Each replicated JM keeps track of the current progress of the job execution. We carefully design what needs to be included in the intermediate information, which can be used to successfully recover from a failure, even of the primary JM.

We build HOUTU on Spark [50] over YARN [42], and leverage ZooKeeper [29] to keep the intermediate information consistent among job managers in different data centers. We deploy HOUTU across four regions on Alibaba Cloud (AliCloud). Our evaluation with typical workloads, including TPC-H and machine learning algorithms, shows that HOUTU (1) achieves job performance as efficient as in the centralized architecture; (2) guarantees reliable job executions when facing job failures; and (3) is very effective in reducing monetary costs.

We make three major contributions:

• We present a general, decentralized data analytics system that respects the possible regulatory constraints and the changeable/unreliable runtime environment. The key idea is to provide a job manager for a geo-distributed job in each data center. The system is general and flexible enough to deploy a wide range of data analytics jobs while requiring no change to the jobs themselves (§3.1).

• We propose the resource management strategy Af for each JM, which exploits resource utilization as feedback. We design the task assignment method Parades, which combines assignment within and across data centers. We prove that Af + Parades guarantees efficiency for geo-distributed jobs with respect to makespan (§4). We carefully design the mechanism for coordinating JMs, and the intermediate information needed to recover from a failure (§3.2).

• We build a prototype of our proposed system using Spark, YARN and ZooKeeper as building blocks, and demonstrate its efficiency over four geo-distributed regions with typical, diverse workloads (§5 and §6).

We show that HOUTU provides efficient and reliable job executions, and significantly reduces the costs of running these jobs.
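As a concrete illustration of the ZooKeeper-based coordination described above, the sketch below shows one way a replicated JM could publish and read a job's intermediate progress through a shared znode. The connection string, znode path, and progress fields are hypothetical; this is not HOUTU's actual recovery protocol, only a minimal example of the underlying ZooKeeper API usage.

    import java.nio.charset.StandardCharsets
    import org.apache.zookeeper.{CreateMode, WatchedEvent, Watcher, ZooDefs, ZooKeeper}

    object JobProgressStore {
      // A watcher that ignores events; a real JM would react to session changes.
      private val noOpWatcher = new Watcher {
        override def process(event: WatchedEvent): Unit = ()
      }

      def main(args: Array[String]): Unit = {
        // Placeholder ensemble address; the session timeout is in milliseconds.
        val zk = new ZooKeeper("zk-dc1:2181,zk-dc2:2181,zk-dc3:2181", 30000, noOpWatcher)
        // Flat path to keep the example self-contained (no parent znodes needed).
        val path = "/houtu-job-0001-progress"

        // Hypothetical progress record: last completed stage and per-DC task counts.
        val progress = "stage=3;dc1_done=120;dc2_done=80;dc3_done=95"
        val data = progress.getBytes(StandardCharsets.UTF_8)

        if (zk.exists(path, false) == null) {
          // Persistent znode so the record outlives the JM that wrote it.
          zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)
        } else {
          // Version -1 means "any version"; a real system would use version
          // checks to detect concurrent updates from other JMs.
          zk.setData(path, data, -1)
        }

        // A JM taking over after a failure reads the latest record and resumes.
        val recovered = new String(zk.getData(path, false, null), StandardCharsets.UTF_8)
        println(s"Recovered progress: $recovered")
        zk.close()
      }
    }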
2 Background and Motivation

This section motivates and provides background for HOUTU. §2.1 describes the existing and upcoming regulatory constraints which prevent us from employing a centralized architecture. We measure the scarce and changeable WAN bandwidth between AliCloud regions in §2.2. We investigate a way to reduce monetary cost using Spot instances in §2.3, which introduces unreliability.

2.1 Regulatory constraints

Though it is efficient to employ data analytics systems in

2.2 Changeable environment

It is well known that WAN bandwidth is a very scarce resource relative to LAN bandwidth.

          NC-3        NC-5        EC-1        SC-1
NC-3   (821, 95)   (79, 22)    (78, 24)    (79, 24)
NC-5      --       (820, 115)  (103, 28)   (71, 28)
EC-1      --          --       (848, 99)   (103, 30)
SC-1      --          --          --       (821, 107)

Figure 2: Measured network bandwidth between four different regions in AliCloud. Each entry is of the form (Average, Standard deviation) Mbps.
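Each entry in Figure 2 summarizes repeated throughput samples between a pair of regions as an (Average, Standard deviation) pair. As a minimal illustration (not the authors' measurement methodology), the sketch below computes such an entry from a list of samples; the sample values are made up.

    object BandwidthStats {
      // Returns (mean, population standard deviation) of throughput samples in Mbps.
      def meanAndStd(samplesMbps: Seq[Double]): (Double, Double) = {
        val mean = samplesMbps.sum / samplesMbps.size
        val variance = samplesMbps.map(s => math.pow(s - mean, 2)).sum / samplesMbps.size
        (mean, math.sqrt(variance))
      }

      def main(args: Array[String]): Unit = {
        // Hypothetical throughput samples between two regions over a short period.
        val samples = Seq(72.0, 81.5, 60.2, 95.4, 78.1)
        val (avg, std) = meanAndStd(samples)
        println(f"($avg%.0f, $std%.0f) Mbps")
      }
    }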