Differentiated Latency in Data Center Networks with Erasure Coded ﬁles Through Trafﬁc Engineering Yu Xiang, Vaneet Aggarwal, Senior Member, IEEE, Yih-Farn R

1 Differentiated latency in data center networks with erasure coded files through traffic engineering Yu Xiang, Vaneet Aggarwal, Senior Member, IEEE, Yih-Farn R. Chen, and Tian Lan, Member, IEEE Abstract—This paper proposes an algorithm to minimize these days. Google and Amazon have published that every weighted service latency for different classes of tenants (or 500 ms extra delay means a 1.2% user loss [1]. As latency service classes) in a data center network where erasure-coded is becoming a key performance metric in datacenter storage, files are stored on distributed disks/racks and access requests are scattered across the network. Due to the limited bandwidth taming service latency and providing elastic latency levels available at both top-of-the-rack and aggregation switches, and that fit tenants different preferences in real storage systems differentiated service requirements of the tenants, network band- is crucial for storage providers to offer differentiated services. width must be apportioned among different intra- and inter- In particular, minimizing latency in HDFS through VM place- rack data flows for different service classes in line with their ment optimization is proposed in [2], [3], and an evaluation of traffic statistics. We formulate this problem as weighted queuing and employ a class of probabilistic request scheduling policies to latency levels is provided in [4] for real applications such as derive a closed-form upper-bound of service latency for erasure- the Amazon EC2, Doom 3 game server and Apple’s Darwin coded storage with arbitrary file access patterns and service Streaming Server, which have heterogeneous requirements. time distributions. The result enables us to propose a joint There are also works on latency-aware pricing [5], and multi- weighted latency (over different service classes) optimization over tier pricing models for charging tenants with differentiated three entangled “control knobs”: the bandwidth allocation at top-of-the-rack and aggregation switches for different service service requirements [6], [7]. classes, dynamic scheduling of file requests, and the placement Modern data centers usually provide differentiated service of encoded file chunks (i.e., data locality). The joint optimization tiers and ensuring that tenant service level agreement (SLA) is shown to be a mixed-integer problem. We develop an iterative objectives are met, by tailoring storage performance to specific algorithm which decouples and solves the joint optimization needs. This is because users have different latency require- as 3 sub-problems, which are either convex or solvable via bipartite matching in polynomial time. The proposed algorithm is ments, for instance, time-sensitive jobs that need to complete prototyped in an open-source, distributed file system, Tahoe, and in a flash, and insensitive jobs that can experience longer evaluated on a cloud testbed with 16 separate physical hosts in waiting time without significant performance degradation, may an OpenStack cluster using Cisco switches. Experiments validate have different priorities and service classes, determined by our theoretical latency analysis and show significant latency the price tiers. They often adopt a hierarchical architecture reduction for diverse file access patterns. The results provide valuable insights on designing low-latency data center networks with multiple layers of network switches (Top-of-Rack (TOR), with erasure coded storage. aggregation and core switches) [8]. To ensure differentiated latency performance for each tenant in an erasure-coded storage system on such a clear architecture might seem I. INTRODUCTION straightforward, however, quantifying access latency in erasure coded storage is an open problem. Some Markov Chain based Data center storage is growing at an exponential speed and approaches have been proposed, which have large state space customers are increasingly demanding more flexible services and thus hard to get closed form expressions [9], [10]. As exploiting the tradeoffs among the reliability, performance, encoded chunks are distributed across the system, accessing and storage cost. Erasure coding (forward error correction the files in a storage system requires - in addition to reasonable coding which transforms a content of k chunks into a longer data chunk placements and dynamic request scheduling - content with n chunks such that the original content can be intelligent traffic engineering solutions to handle differentiated recovered from a subset of the n chunks) has seen itself quickly service of 1) data transfers between racks that go through emerged as a very promising technique to reduce the cost of aggregation switches and 2) data transfers within a rack that storage while providing similar system reliability as replicated go through Top-of-Rack (TOR) switches. Hence, although systems. The effect of coding on service latency to access providing optimal resource allocation to ensure latency per- files in an erasure-coded storage is drawing more attention formance would certainly be appealing to storage providers, it Y. Xiang and Y. R. Chen are with AT&T Labs-Research, Bedminster, can be quite difficult to analyze exact access latency for each NJ 07921 (email: fyxiang,[email protected]). V. Aggarwal is with tenant in such multi-tier storage systems, which is essential for the School of IE, Purdue University, West Lafayette, IN 47907 (email: accurate latency optimization. To the best of our knowledge, [email protected]). T. Lan is with Department of ECE, George Washington University, DC 20052 (email: [email protected]). This work was supported in this is the first work jointly minimizing service latency with a part by the National Science Foundation under Grant no. CNS-1618335. quantified latency model for a multi-tenant environment with This paper was presented in part at the 15th IEEE/ACM International differentiated latency performance needs. Symposium on Cluster, Cloud and Grid Computing 2015 [13] and the 35th International Conference on Distributed Computing Systems (ICDCS) 2015 In this paper, we utilize the probabilistic scheduling policy [14]. developed in [11], [12] ( where each request has a certain 2 probability to be assigned to each storage server in the system) TOR and aggregation switches, which partition tenants into and analyze service latency of erasure-coded storage with different service classes based on their delay requirement. The respect to data center topology and traffic engineering. Service file requests submitted by tenants in each class are buffered latency, also known as downlink latency (DCL) [15], is the and processed in a separate FCFS queue at storage servers in total time from when the request was served from the server each designated rack, we apportion bandwidth at TOR and to when the user gets the file. To the best of our knowledge, aggregation switches so that each flow has its own FCFS this is the first work that accounts for both erasure coding and queue for the data transfer. The service rates of these queues data center networking in multi-tenant latency analysis. Our are tunable and tuning these weights allows us to provide result provides a tight upper bound on mean service latency differentiated service rate (latency) to tenants in different of erasure coded-storage and applies to general data center classes. Section IV provides the latency bound. Section V topology. It allows us to construct quantitative models and jointly optimizes bandwidth allocation and data locality (i.e., find solutions to a novel latency optimization leveraging both placement of encoded file chunks) to achieve service latency the placement of erasure coded chunks on different racks and minimization, the latency minimization is decoupled into 3 the bandwidth reservations at different switches. sub-problems; two of which are proven to be convex. We We consider a data center storage system with a hierarchical propose an algorithm which iteratively minimizes service structure where files are encoded and spread across the system latency over the three engineering degrees of freedom with using erasure coding. Each rack has a TOR switch that is guaranteed convergence. Finally Section VI provides a proto- responsible for routing data flows between different disks and type of the proposed algorithms in an erasure coded distributed associated storage servers in the rack, while data transfers file system. The implementation results also show that our between different racks are managed by an aggregation switch proposed approach significantly improved service latency in that connects all TOR switches. File access requests may storage systems compared to native storage settings of the be submitted by tenants from any servers inside the same testbed. datacenter, which is common in todays’ datacenter storage. Tenants have differentiated latency preferences. Our goal is Symbol Meaning to minimize the differentiated average latency for each tenant N Number of racks, indexed by i = 1;:::;N in such storage systems. This paper quantifies average latency for each tenant in the system, and with such quantification m Number of storage servers in each rack of latency, a differentiated latency optimization problem with R Number of files in the system, indexed by r = 1;:::;R limited resources is formulated. Also, a system diagram is (n; k) Erasure code for storing files shown in Section III. Customizing elastic service latency for the tenants in such D Number of services classes a realistic environment (most common data center structure,

Differentiated Latency in Data Center Networks with Erasure Coded ﬁles Through Trafﬁc Engineering Yu Xiang, Vaneet Aggarwal, Senior Member, IEEE, Yih-Farn R

Globalfs: a Strongly Consistent Multi-Site File System

Big Data Storage Workload Characterization, Modeling and Synthetic Generation

A Decentralized Cloud Storage Network Framework

The Quantcast File System

Collocated Data Deduplication for Virtual Machine Backup in the Cloud

An Updated Look at the Decentralized Forest Information

Performance Evaluation of a Distributed Storage Service in Community Network Clouds

A Survey on Large Scale Metadata Server for Big Data Storage

Distributed Computing Problems from Wikipedia, the Free Encyclopedia Contents

Size Matters: Improving the Performance of Small Files in Hadoop In: (Pp

Massively Parallel Databases and Mapreduce Systems

CEPH Storage [Open Source Osijek]