OpenEC: Toward Unified and Configurable Erasure Coding Management in Distributed Storage Systems

Xiaolu Li, Runhui Li, and Patrick P. C. Lee, The Chinese University of Hong Kong; Yuchong Hu, Huazhong University of Science and Technology

https://www.usenix.org/conference/fast19/presentation/li



Abstract

Erasure coding becomes a practical redundancy technique for distributed storage systems to achieve fault tolerance with low storage overhead. Given its popularity, research studies have proposed theoretically proven erasure codes or efficient repair algorithms to make erasure coding more viable. However, integrating new erasure coding solutions into existing distributed storage systems is a challenging task and requires non-trivial re-engineering of the underlying storage workflows. We present OpenEC, a unified and configurable framework for readily deploying a variety of erasure coding solutions into existing distributed storage systems. OpenEC decouples erasure coding management from the storage workflows of distributed storage systems, and provides erasure coding designers with configurable controls of erasure coding operations through a directed-acyclic-graph-based programming abstraction. We prototype OpenEC on two versions of HDFS with limited code modifications. Experiments on a local cluster and Amazon EC2 show that OpenEC preserves both the operational performance and the properties of erasure coding solutions; OpenEC can also automatically optimize erasure coding operations to improve repair performance.

1 Introduction

Erasure coding provides a low-cost redundancy mechanism for fault-tolerant storage, and is now widely deployed in today's distributed storage systems (DSSs). Examples include enterprise-level DSSs [15, 21, 30] and many open-source DSSs [1, 3, 7, 31, 54, 55]. Unlike replication that simply creates identical data copies for redundancy protection, erasure coding introduces much less storage overhead through the coding operations of data copies, while preserving the same degree of fault tolerance [53]. Modern DSSs mostly realize erasure coding based on the classical Reed-Solomon (RS) codes [43], yet RS codes have high performance penalty, especially in repairing lost data when failures happen. Thus, research studies have proposed new erasure coding solutions with improved performance, such as erasure codes with theoretical guarantees and efficient repair algorithms that are applicable to general erasure-coding-based storage (§6).

However, deploying new erasure coding solutions in DSSs is a daunting task. Existing studies often integrate new erasure coding solutions into specific DSSs by re-engineering the DSS workflows (e.g., the read/write paths). The tight coupling between erasure coding management and the DSS workflows makes new erasure coding solutions hard to be generalized for other DSSs and further enhanced. Some DSSs with built-in erasure coding features (e.g., HDFS with erasure coding [1, 5], Ceph [54], and Swift [7]) provide certain configuration capabilities, such as interfaces for implementing various erasure codes and controlling erasure-coded data placement, yet the interfaces are rather limited and it is non-trivial to extend the DSSs with more advanced erasure codes and repair algorithms (§2.2). How to fully realize the power of erasure coding in DSSs remains a challenging issue.

We present OpenEC, a unified and configurable framework for erasure coding management in DSSs, with the primary goal of bridging the gap between designing new erasure coding solutions and enabling the feasible deployment of such new solutions in DSSs. Inspired by software-defined storage [16, 48, 51], which aims for configurable storage management without being constrained by the underlying storage architecture, we apply this concept into erasure coding management. Our main idea is to decouple erasure coding management from the DSS workflows. Specifically, OpenEC runs as a middleware system between upper-layer applications and the underlying DSS, and is responsible for performing all erasure coding operations on behalf of the DSS. Such a design relaxes the stringent dependence on the erasure coding support of DSSs. More importantly, OpenEC takes the full responsibility of erasure coding management, and hence provides flexibility for erasure coding designers to (i) incorporate a variety of erasure coding solutions, (ii) configure the workflows of erasure coding operations, and (iii) decide the placement of both erasure-coded data and erasure coding operations across storage nodes. Our contributions are summarized as follows:

• We propose a new programming model for erasure coding implementation and deployment. Our model builds on an abstraction called an ECDAG, a directed acyclic graph that defines the workflows of erasure coding operations. We show how we feasibly realize a general erasure coding solution through the ECDAG abstraction.

• We design OpenEC, which translates an ECDAG into erasure coding operations atop a DSS. OpenEC supports encoding operations on or off the write path as well as various state-of-the-art repair operations. In particular, it can automatically optimize an ECDAG for hierarchical topologies to improve repair performance.

• We implement a prototype of OpenEC on HDFS-RAID [5] and Hadoop 3.0 HDFS (HDFS-3) [1]. Its integrations into HDFS-RAID and HDFS-3 only require limited code changes (with no more than 450 LoC).

• We evaluate OpenEC on a local cluster and Amazon EC2. OpenEC incurs negligible performance overhead in DSS operations, supports various state-of-the-art erasure codes and repair algorithms, and increases the repair throughput by at least 82% through automatically customizing an ECDAG for a hierarchical topology.

The source code of our OpenEC prototype is available at: http://adslab.cse.cuhk.edu.hk/software/openec.

2 Background and Motivation

2.1 Erasure Coding Basics

Consider a DSS that comprises multiple storage nodes and organizes data in units of blocks. We construct erasure coding as an (n,k) code with two configurable parameters n and k, where k < n. For every k fixed-size original blocks (called data blocks), an (n,k) code encodes them into n−k redundant blocks of the same size (called parity blocks), such that any k out of the n erasure-coded blocks (including both data and parity blocks) can decode the k data blocks; that is, any n−k block failures can be tolerated. We call the collection of n erasure-coded blocks a coding group. A DSS encodes different sets of k data blocks independently, and distributes the n erasure-coded blocks of each coding group across n storage nodes to protect against any n−k storage node failures. In this paper, our discussion focuses on the coding operations (i.e., encoding or decoding) of a single coding group.

For performance reasons, a DSS implements coding operations in small-size units called packets, while the read/write units are in blocks; for example, our experiments set the default packet and block sizes as 128 KiB and 64 MiB, respectively. It divides a block into multiple packets, and encodes the packets at the same block offsets in a coding group together. Thus, instead of first reading the whole blocks to start coding operations, a DSS can perform packet-level coding operations, while reading the whole blocks, in a pipelined manner. To simplify our discussion, we use blocks as the units of coding operations, and only differentiate packets and blocks in our implementation (§4.5).

Given the prevalence of failures, repairs are frequent operations in DSSs [40]. We consider two types of repairs: (i) degraded reads, which decode the unavailable data blocks that are being requested, and (ii) full-node recovery, which decodes all lost blocks of a failed storage node. Since repairs trigger substantial traffic [40], achieving high repair performance is important in erasure coding deployment. RS codes [43] are the most popular erasure codes that are widely used in production [5, 7, 15, 31, 54, 55], but they incur high repair costs. Thus, many repair-friendly erasure codes have been proposed. Since single-failure repairs (i.e., repairing a single lost block of a coding group in degraded reads or a single failed node in full-node recovery) are the most common repair scenarios [21, 40], existing repair-friendly erasure codes aim to minimize the repair bandwidth or I/O in single-failure repairs. Examples are regenerating codes [14], including minimum-storage regenerating (MSR) and minimum-bandwidth regenerating (MBR) codes, as well as locally repairable codes (LRCs) [21, 23, 44, 49].

Our work focuses on practical erasure codes. In particular, we target linear codes, which include RS codes, MSR and MBR codes, as well as LRCs. Linear codes perform linear coding operations based on the Galois field arithmetic [17]. Mathematically, for an (n,k) code, let $d_0, \ldots, d_{k-1}$ be the k data blocks, and $p_0, \ldots, p_{n-k-1}$ be the n−k parity blocks. Each parity block $p_j$ ($0 \le j \le n-k-1$) can be expressed as $p_j = \sum_{i=0}^{k-1} \gamma_{ji} d_i$, where $\gamma_{ji}$ is some coding coefficient for computing $p_j$. Note that the linear operations are additive associative (i.e., independent of how additions are grouped). Also, our work addresses sub-packetization, which is used in various designs of MSR and MBR codes [14, 18, 32, 39, 42, 45, 50, 52]. Sub-packetization divides each block into smaller-size sub-blocks, so that repairs can be done by retrieving sub-blocks rather than whole blocks.

Most DSSs assume that all erasure-coded blocks are immutable and do not support in-place updates. Thus, we focus on four basic operations: writes, normal reads, degraded reads, and full-node recovery (§4.2), while we address in-place updates in future work.
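To make the linear-combination form $p_j = \sum_i \gamma_{ji} d_i$ concrete at the packet level, the following sketch computes a single parity packet over GF(2^8). It is only an illustration of the arithmetic described above: the field helper is written out for self-containment, whereas a practical system would use an optimized Galois field library (e.g., ISA-L, which the OpenEC prototype uses in §4.5), and the function names here are illustrative rather than part of any DSS interface.

  #include <cstdint>
  #include <vector>

  // Multiplication in GF(2^8) under the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d);
  // addition in GF(2^8) is bitwise XOR.
  uint8_t gf256_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
      if (b & 1) p ^= a;
      a = (a & 0x80) ? (uint8_t)((a << 1) ^ 0x1d) : (uint8_t)(a << 1);
      b >>= 1;
    }
    return p;
  }

  // Compute one parity packet p_j = sum_i gamma_ji * d_i, where data[i] holds the
  // packet of data block i at the same block offset and coefs[i] = gamma_ji.
  std::vector<uint8_t> encode_parity_packet(
      const std::vector<std::vector<uint8_t>>& data,   // k data packets
      const std::vector<uint8_t>& coefs) {             // k coding coefficients
    std::vector<uint8_t> parity(data[0].size(), 0);
    for (size_t i = 0; i < data.size(); i++)
      for (size_t off = 0; off < parity.size(); off++)
        parity[off] ^= gf256_mul(coefs[i], data[i][off]);
    return parity;
  }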

2.2 Limitations of Erasure Coding Management

Modern DSSs now support erasure coding, yet existing erasure coding management in such DSSs remains stringent and still faces practical limitations. To motivate our study, we review six state-of-the-art DSSs that currently realize erasure-coded storage: HDFS-RAID [5], HDFS-3 [1], QFS [31], Tahoe-LAFS [55], Ceph [54], and Swift [7]. HDFS-RAID is the erasure coding extension of HDFS [46] in the earlier version of Hadoop. Here, we focus on Facebook's HDFS-RAID implementation [3], which builds on Hadoop version 0.20. HDFS-3 builds on the newer Hadoop version 3.0, which includes erasure coding by design. QFS resembles HDFS and includes erasure coding by design. All HDFS-RAID, HDFS-3, and QFS organize data in fixed-size blocks. In contrast, Tahoe-LAFS, Ceph, and Swift organize data in variable-size objects and partition each object into equal-size data blocks for erasure coding.

(L1) Limited support for adding advanced erasure codes: Existing DSSs provide encoding/decoding interfaces for implementing new erasure codes. However, most DSSs do not provide interfaces for adding erasure codes with sub-packetization (e.g., MSR and MBR codes [14, 18, 32, 39, 42, 45, 50, 52]) and handling erasure-coded blocks at the granularity of sub-blocks, while only recently Ceph includes the sub-packetization feature in its master codebase [52]. Also, recent erasure codes [19, 38] address the hierarchical nature of DSSs to reduce cross-rack [19] (or cross-cluster [38]) repair traffic, yet realizing such hierarchy-aware erasure codes needs modifications to the DSS workflows.

(L2) Limited configurability for workflows of coding operations: Enabling configurable workflows of coding operations allows better resource usage within a DSS. Take repairs (degraded reads or full-node recovery) as an example. DSSs execute repairs at different entities upon the detection of failures. For a degraded read, it is executed at the client (in HDFS-RAID, HDFS-3, QFS, and Tahoe-LAFS), the proxy (in Swift), or a storage node (in Ceph); for full-node recovery, it is executed at either storage nodes (in HDFS-RAID, HDFS-3, QFS, Ceph, and Swift) or the client (in Tahoe-LAFS). Both degraded reads and full-node recovery operate in a fetch-and-compute manner, in which the entity that executes the repair will retrieve available blocks from other non-failed storage nodes and reconstruct the lost blocks. On the other hand, besides the fetch-and-compute approach, we cannot configure a DSS to adopt different repair workflows or distribute the repair loads across storage nodes. For example, recent repair algorithms [25, 29] decompose a single-block repair operation into partial sub-block repair operations that are parallelized across storage nodes for better bandwidth usage, but existing DSSs do not support this feature by design.

(L3) Limited configurability for placement of coding operations: All DSSs we consider ensure that the n erasure-coded blocks of each coding group are stored in n distinct storage nodes, and most of them additionally allow configurable block placement. For example, both HDFS-RAID and HDFS-3 provide a base class for configuring block placement policies; QFS provides an in-rack placement option to store multiple blocks in a rack; Ceph uses placement groups, while Swift uses object rings, to control how erasure-coded blocks are placed in different storage nodes.

However, existing DSSs focus on how erasure-coded blocks are placed after encoding, but do not specify where to perform the coding operations. For example, in encoding operations, we may want to co-locate the computations of parity blocks at one storage node (rather than distribute the computations across different storage nodes) to limit the I/Os of retrieving data blocks. Also, the repair algorithms in [25, 29] require some storage nodes that store available data blocks to first compute partially decoded blocks and send the results to other storage nodes for further decoding. In this case, we need to place the partial decoding operations at specific storage nodes. Such fine-grained placement of coding operations is currently not supported in existing DSSs.

2.3 Lessons Learned and Goals

The root cause of the limitations in §2.2 is that the current erasure coding management is tightly coupled with the DSS workflows. Realizing erasure coding in DSSs needs to address how coding operations are performed (i.e., the control flow) and how erasure-coded blocks are stored and accessed (i.e., the data flow). The current practice is that erasure coding designers only define an erasure code and its coding operations (e.g., the coding coefficients used in coding operations), while DSS developers require dedicated engineering efforts to integrate the coding operations into the read/write paths of DSSs without compromising the correctness of upper-layer applications. Such tight coupling makes the extensions of erasure coding features inflexible.

OpenEC decouples erasure coding management from the underlying DSS by providing a unified and configurable framework for erasure coding management, such that erasure coding designers can leverage OpenEC to realize new erasure coding solutions and configure the workflows of coding operations, without worrying how they are integrated into the DSS workflows. Specifically, OpenEC addresses the limitations in §2.2 with the following goals: (i) extensibility of new erasure codes; (ii) configurable workflows of coding operations; and (iii) configurable placement of both erasure-coded blocks and coding operations. To achieve these goals, OpenEC builds on a programming model for erasure coding management, as elaborated in the following sections.

3 Programming Model

We propose a programming model that allows erasure coding designers to not only define an erasure code structure and its coding operations, but also configure how coding operations are performed in a DSS. We present a new erasure coding abstraction called an ECDAG (§3.1), followed by three primitives for constructing an ECDAG (§3.2). We then propose a programming interface for realizing an erasure code based on the ECDAG abstraction (§3.3).

Figure 1: Example of an ECDAG for a (5,4) code. (a) Encoding; (b) Decoding; (c) PPR [29].

Figure 2: Example of an ECDAG for a (4,2) code with w = 2. (a) Layout; (b) Encoding; (c) Decoding.

void Join(int pidx, vector<int> cidxs, vector<int> coefs);
int BindX(vector<int> idxs);
void BindY(int pidx, int cidx);

Listing 1: Primitives for ECDAG construction.

3.1 ECDAG Overview

At a high level, an ECDAG is a directed acyclic graph that describes the workflows of coding operations of a coding group of an erasure code. Each vertex represents a block in the coding group, and the connections among vertices describe how vertices are related by linear combinations. To address the limitations in §2.2, we design ECDAGs to work for general linear codes (L1 addressed). Also, we can construct different ECDAGs to configure how and where coding operations are performed (L2 and L3 addressed, respectively).

Consider a coding group of an (n,k) code with n erasure-coded blocks; to simplify our discussion, we do not consider sub-packetization first. We index the blocks from 0 to n−1, and let bi denote the block with index i. Without loss of generality, we refer to b0, ..., b(k−1) as the k data blocks, and bk, ..., b(n−1) as the n−k parity blocks. In some cases (see below), the coding operations may generate some intermediately computed blocks that will not be finally stored (as opposed to blocks b0, b1, ..., b(n−1)). We call such blocks virtual blocks, and denote a virtual block by bi' for some i' ≥ n.

In an ECDAG, let vi (i ≥ 0) be a vertex that maps to block bi; we call a vertex vi' (i' ≥ n) that maps to a virtual block bi' a virtual vertex. Let e(i,j) (i, j ≥ 0) be a directed edge from vi to vj, indicating that bi is an input to the linear combination for computing bj. Each edge is associated with a coding coefficient for the linear combination. If there exists an edge e(i,j), we say that vj is the parent of vi, while vi is a child of vj. A vertex can have any number of parents and children.

Both encoding and decoding operations are each associated with an ECDAG. The ECDAG for encoding is constructed at the beginning of the encoding operation to describe how data blocks are linearly combined to form each parity block. In contrast, the ECDAG for decoding is constructed on demand depending on what blocks are currently available.

For example, consider a (5,4) code (i.e., (4+1)-RAID-5). Figure 1(a) shows the ECDAG for the encoding operation, which states that the parity block b4 is a linear combination of the four data blocks b0, b1, b2, and b3. Suppose now that block b0 is lost. Figure 1(b) shows the ECDAG for the decoding operation for b0, which can be computed from the other available blocks b1, b2, b3, and b4.

We can parallelize partial decoding operations as in PPR [29] by constructing another ECDAG for decoding b0 (see Figure 1(c)), in which we first compute in parallel the partially decoded blocks b5 and b6 (both of which are virtual blocks) from b1 and b2 and from b3 and b4, respectively, followed by computing b0 from b5 and b6. This shows that we can flexibly configure coding operations by constructing different ECDAGs. Note that PPR needs to compute b5 and b6 at the storage nodes where data blocks (e.g., b2 and b4, respectively) are stored (see [29] for details). We address this issue in §3.2.

We can also construct an ECDAG for erasure codes with sub-packetization. Let w be the number of sub-blocks per block (w = 1 means no sub-packetization). We index the sub-blocks of block b0 from 0 to w−1, those of b1 from w to 2w−1, and so on. Each vertex vi (i ≥ 0) now corresponds to the sub-block with index i, while any vertex vi' for i' ≥ nw is a virtual vertex. For example, consider the (4,2) MISER code [45] (an MSR code based on interference alignment), where w = 2. Figure 2(a) shows how the sub-blocks are indexed. Figure 2(b) shows the ECDAG for the encoding operation, in which the sub-blocks of parity blocks b2 and b3 are computed from the sub-blocks of data blocks b0 and b1. Suppose that block b0 is lost. Figure 2(c) shows the ECDAG for decoding b0 based on MISER codes [45], in which we first compute an encoded sub-block from each of the other available blocks b1, b2, and b3 (represented by the virtual vertices v8, v9, and v10, respectively), followed by using the encoded sub-blocks to decode the lost sub-blocks of b0.

3.2 ECDAG Primitives

An ECDAG can be constructed from three primitives: Join, BindX, and BindY. Join is used for constructing an ECDAG, while BindX and BindY control the placement of coding operations. Listing 1 shows their definitions in C++ format.

Join: It specifies how a parent vertex (with index pidx) is formed by the linear combinations of a list of child vertices (with indices in cidxs) and the corresponding coding coefficients (in coefs). For example, we deploy the (6,4) RS code and encode four data blocks b0, b1, b2, and b3 into two new parity blocks b4 and b5. We can construct an ECDAG with Join as follows (see Figure 3(a)):

ECDAG* ecdag = new ECDAG();
ecdag->Join(4, {0,1,2,3}, {1,1,1,1});
ecdag->Join(5, {0,1,2,3}, {1,2,4,8});

BindX: It co-locates the coding operations of multiple vertices (with indices in idxs) that reside at the same level of an ECDAG (i.e., in the x-direction), so as to reduce I/O in coding operations. For example, in Figure 3(a), suppose that the data blocks being encoded are stored in different storage nodes. Without BindX, we need to compute b4 and b5 separately and retrieve each data block twice. Instead, we can call BindX on vertices v4 and v5 to create a new virtual vertex v6 as follows (see Figure 3(b)):

int vidx = ecdag->BindX({4,5});

This indicates that blocks b4 and b5 are first computed together at the same storage node before being distributed to different storage nodes. Now we only need to retrieve each data block once. Note that the index of v6 (i.e., 6) is generated randomly and returned as vidx by BindX.

BindY: It co-locates the coding operations of a parent vertex (with index pidx) and its child vertex (with index cidx) at different levels (i.e., in the y-direction). Consider the same example in Figure 3(b) after we call BindX. We can call BindY on vertices v0 and v6 as follows (see Figure 3(c)):

ecdag->BindY(vidx, 0);

Thus, we compute parity blocks b4 and b5 at the same storage node that stores b0, thereby saving the I/Os of retrieving b0. Note that BindY enables us to implement the repair algorithms in [25, 29] (§2.2), which need to place partial coding operations at specific storage nodes.
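As an aside, the three primitives are already enough to express the PPR-style repair of Figure 1(c) in §3.1, including where the partial operations run. The calls below are a minimal sketch rather than code from OpenEC itself; the all-one coefficients reflect the XOR parity of the (4+1)-RAID-5 example.

  ECDAG* ppr = new ECDAG();
  ppr->Join(5, {1,2}, {1,1});   // partially decoded block b5 from b1 and b2
  ppr->Join(6, {3,4}, {1,1});   // partially decoded block b6 from b3 and b4
  ppr->Join(0, {5,6}, {1,1});   // combine the partial results into the lost b0
  ppr->BindY(5, 2);             // run the first partial operation at the node storing b2
  ppr->BindY(6, 4);             // run the second partial operation at the node storing b4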

Figure 3: Construction of ECDAGs for the (6,4) RS code. (a) Initial ECDAG; (b) After BindX; (c) After BindY.

class ECBase {
  int n, k, w;          // coding parameters
  vector<int> ecoefs;   // encoding coefficients
public:
  ECDAG* Encode();
  ECDAG* Decode(vector<int> from, vector<int> to);
  vector<vector<int>> Place();
};

Listing 2: Erasure coding programming interface.

ECDAG* Encode() {
  ECDAG* ecdag = new ECDAG();
  ecdag->Join(4, {0,1,2,3}, {1,1,1,1});
  ecdag->Join(5, {0,1,2,3}, {1,2,4,8});
  int vidx = ecdag->BindX({4,5});
  ecdag->BindY(vidx, 0);
  return ecdag;
}

Listing 3: Encode function.

ECDAG* Decode(vector<int> from, vector<int> to) {
  ECDAG* ecdag = new ECDAG();
  vector<int> dcoefs;   // decoding coefficients
  // compute dcoefs based on the available blocks
  ecdag->Join(to[0], from, dcoefs);
  return ecdag;
}

Listing 4: Decode function.

vector<vector<int>> Place() {
  vector<vector<int>> groups;
  for(int i=0; i<n; i++)
    groups.push_back({i});
  return groups;
}

Listing 5: Place function.
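Listings 2-5 can be assembled into a single code definition. The sketch below shows how an erasure coding designer might package the (6,4) RS code against the ECBase interface; how the coding parameters (n, k, w) and the encoding coefficients reach ECBase is not shown in the listings above, so the constructor here is an assumption rather than part of the documented interface.

  class RSCode_6_4 : public ECBase {
  public:
    RSCode_6_4() { /* assume ECBase is initialized here with n=6, k=4, w=1 */ }
    ECDAG* Encode();                                   // as in Listing 3
    ECDAG* Decode(vector<int> from, vector<int> to);   // as in Listing 4
    vector<vector<int>> Place();                       // as in Listing 5
  };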

Figure 4: OpenEC architecture based on HDFS.

4.1 Architectural Overview

OpenEC runs as a middleware system atop a DSS. As a proof of concept, we design OpenEC atop two implementations of HDFS [46]: HDFS-RAID [5] and HDFS-3 [1] (we also implement OpenEC atop QFS [31]; see [27] for details). HDFS (including both HDFS-RAID and HDFS-3) comprises a NameNode that coordinates the storage in units of blocks across multiple DataNodes (storage nodes). Figure 4 shows how OpenEC is integrated into HDFS. OpenEC comprises a centralized controller, which coordinates multiple agents. An application interacts with OpenEC via an OECClient.

Controller: The controller parses ECDAGs and instructs all agents how to perform coding operations and store erasure-coded blocks. It keeps erasure coding metadata and all ECDAGs in local disk for persistence. There are three types of metadata: (i) the information of blocks associated with each file; (ii) the information of blocks associated with each coding group; and (iii) the block locations. The controller interacts with the NameNode in two aspects. First, it accesses or updates the block locations of the NameNode to configure the placement of blocks. Second, it receives the reports of lost blocks from the NameNode and coordinates the repair operations among the agents.

We assume that the controller is reliable (i.e., no single-point-of-failure). Our measurements show that the controller can serve a request of parsing an ECDAG for coding operations in less than 0.3 ms in our local cluster (§5), and hence it incurs limited overhead to basic operations.

Agent: Each agent performs coding operations as instructed by the controller. It accesses the erasure-coded blocks in HDFS through the HDFS client interface. Note that agents can communicate among themselves to perform coding operations and exchange erasure-coded blocks. We currently deploy each agent at a DataNode, so that the agent can access the local storage of the DataNode without network transfers.

OECClient: Each OECClient is associated with an agent, and serves as an interface between an upper-layer application and the agent. It connects to the agent via Redis-based communication (§4.5). An application now accesses HDFS through an OECClient instead of an HDFS client.

4.2 Basic Operations

OpenEC supports four basic operations: (i) writes; (ii) normal reads; (iii) degraded reads; and (iv) full-node recovery.

Writes: Note that HDFS-3 supports online encoding (i.e., clients perform encoding on the write path), while HDFS-RAID supports offline encoding (i.e., clients first write the data blocks in uncoded form, and the data blocks are later encoded in the background). OpenEC is currently designed to support both online and offline encoding. An OECClient specifies which encoding mode to use in a write request. For online encoding, OpenEC encodes data on a per-file basis. When an OECClient writes a file, its agent encodes every k data blocks into n−k parity blocks and writes the n erasure-coded blocks to n DataNodes through the HDFS client. For offline encoding, an OECClient first writes file data via its agent to HDFS. When OpenEC receives an encoding request, the controller parses the specified ECDAG (§4.3) and instructs all agents to perform encoding, such that every k blocks are encoded into n erasure-coded blocks as a coding group.

Normal reads: An OECClient issues normal reads (under no failures) via its agent, which connects to the DataNodes that store the uncoded data blocks and retrieves the data blocks from the DataNodes.

Degraded reads: An OECClient issues degraded reads (under failures) via its agent, which connects to non-failed DataNodes and retrieves the available blocks for decoding the lost blocks based on the ECDAG specification.

Full-node recovery: The controller coordinates the full-node recovery operation. When it receives a report of lost blocks from the NameNode, it informs the agents to repair the lost blocks based on the ECDAG specification.
in sub-packetization). There are four types of tasks: Agent: Each agent performs coding operations as instructed • Load: It loads a block into memory from the agent’s input by the controller. It accesses the erasure-coded blocks in stream, which could be either the OECClient if the block HDFS through the HDFS client interface. Note that agents is from upper-layer applications, or the HDFS client if the can communicate among themselves to perform coding op- block is from HDFS. erations and exchange erasure-coded blocks. We currently • Fetch: It retrieves blocks from other agents. deploy each agent at a DataNode, so that the agent can access • Compute: It computes a block based on the linear combi- the local storage of the DataNode without network transfers. nation of blocks and coding coefficients. OECClient: Each OECClient is associated with an agent, • Persist: It either writes a block to HDFS via the HDFS and serves as an interface between an upper-layer applica- client, or returns the block to an OECClient. tion and the agent. It connects to the agent via Redis-based communication (§4.5). An application now accesses HDFS Parsing procedure: OpenEC performs topological sorting through an OECClient instead of an HDFS client. of an ECDAG (based on depth-first search) to identify the vertex sequence of coding operations. It then assigns tasks to 1We also implement OpenEC atop QFS [31]. See [27] for details. each vertex based on the ECDAG structure. Depending on the

OpenEC associates tasks with different types of vertices. At a high level, the Load task is associated with a vertex without any child; the Fetch task is associated with a parent vertex that has a child vertex; the Compute task is associated with a vertex with more than one child for the linear combination; the Persist task is associated with a vertex without any parent, while it is also associated with a vertex without any child in the case of online encoding (see the example below).

Example: We show the parsing procedure via an example. Suppose that we encode four data blocks (i.e., b0, b1, b2, and b3) to generate two parity blocks (i.e., b4 and b5) using the (6,4) RS code, based on the ECDAG in Figure 3(c) and the Encode function in Listing 3. Table 1 shows the vertex sequence of tasks for both online and offline encoding.

Table 1: Vertex sequence of coding operations, including the nodes that are responsible for processing the vertices as well as the tasks that are performed.

(a) Online encoding
Vertices  Nodes  Tasks
v0        C      Load b0
v1        C      Load b1
v2        C      Load b2
v3        C      Load b3
v6        C      Compute b4 from {b0, b1, b2, b3} with coding coefficients {1,1,1,1}; Compute b5 from {b0, b1, b2, b3} with coding coefficients {1,2,4,8}
v4        C      –
v5        C      –
–         C      Persist b0; Persist b1; Persist b2; Persist b3; Persist b4; Persist b5

(b) Offline encoding
Vertices  Nodes  Tasks
v0        N0     Load b0
v1        N1     Load b1
v2        N2     Load b2
v3        N3     Load b3
v6        N0     Fetch b1 from N1; Fetch b2 from N2; Fetch b3 from N3; Compute b4 from {b0, b1, b2, b3} with coding coefficients {1,1,1,1}; Compute b5 from {b0, b1, b2, b3} with coding coefficients {1,2,4,8}
v4        N4     Fetch b4 from N0; Persist b4
v5        N5     Fetch b5 from N0; Persist b5

For online encoding (see Table 1(a)), the client-side agent (denoted by C) performs all coding operations. It finally persists all data blocks and parity blocks into HDFS.

For offline encoding (see Table 1(b)), OpenEC distributes the coding operations across storage nodes. To elaborate, suppose that bi is stored in storage node Ni, for 0 ≤ i ≤ 5. First, since v0, v1, v2, and v3 have no child, OpenEC creates tasks for the agents in storage nodes N0, N1, N2, and N3 to load the blocks b0, b1, b2, and b3, respectively, from HDFS (via HDFS clients) into memory. Second, since vertex v6 is created from BindX on vertices v4 and v5, OpenEC computes both b4 and b5 from the blocks in the child vertices (i.e., b0, b1, b2, and b3). Also, since BindY is called on v6 and v0, OpenEC assigns the tasks of v6 to the agent in N0. Finally, v4 and v5 retrieve blocks b4 and b5 from v6, respectively. Since v4 and v5 have no parent and are the last vertices in the topological order, they persist the blocks to HDFS.

Note that OpenEC can parallelize the coding operations on the vertices that have no dependencies on others. For example, OpenEC can simultaneously execute the tasks for v0, v1, v2, and v3, and similarly the tasks for v4 and v5.

4.4 Automated Optimizations

In addition to letting erasure coding designers construct ECDAGs, OpenEC can automatically customize ECDAGs for performance optimizations to save manual configuration efforts. We address this in two aspects.

Automated BindX and BindY: OpenEC can automatically call BindX and BindY for some specific subgraph structures of an ECDAG. For BindX, OpenEC examines all parent vertices that have more than one child vertex in an ECDAG. If multiple parent vertices have the same set of child vertices, OpenEC calls BindX on those parent vertices (e.g., v4 and v5 in Figure 3(b)). For BindY, for any parent vertex (with one or more child vertices), OpenEC calls BindY on the parent vertex and any one of the child vertices (e.g., the parent vertex v6 and the child vertex v0 in Figure 3(c)).
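A possible rendering of these two rules is sketched below. The Parents() and Children() accessors are hypothetical stand-ins for whatever graph inspection the ECDAG offers internally; only the BindX/BindY calls follow the documented primitives.

  #include <map>
  #include <vector>

  void AutoBind(ECDAG* ecdag) {
    // Group the parent vertices that have more than one child by their child sets.
    std::map<std::vector<int>, std::vector<int>> byChildSet;
    for (int p : ecdag->Parents())
      if (ecdag->Children(p).size() > 1)
        byChildSet[ecdag->Children(p)].push_back(p);
    for (auto& kv : byChildSet) {
      const std::vector<int>& parents = kv.second;
      // Automated BindX: parents sharing the same child set are computed together.
      int v = (parents.size() > 1) ? ecdag->BindX(parents) : parents[0];
      // Automated BindY: co-locate the (possibly merged) parent with one of its
      // children, e.g., v6 with v0 in Figure 3(c).
      ecdag->BindY(v, kv.first[0]);
    }
  }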

Figure 5: Example of constructing a pipelined ECDAG; vertices of the same color mean that their blocks are in the same rack. (a) Initial ECDAG; (b) Pipelined ECDAG.

Figure 6: Block layouts for the (6,4) RS code. Suppose that each block stores four packets. We partition file data into 16 packets, ordered as p0, p1, ..., and p15. We place the packets across k = 4 blocks. Packets at the same offset (e.g., in dashed boxes) are encoded together. In sub-packetization, each packet is further divided into sub-packets for encoding. (a) Contiguous layout; (b) Striped layout.

Hierarchy awareness: OpenEC can further enhance the repair performance based on the physical DSS topology. One scenario is that a DSS hierarchically organizes storage nodes in racks [19] (or clusters [38]), such that the cross-rack bandwidth is much more constrained than the inner-rack bandwidth. OpenEC can transform an ECDAG into a pipelined ECDAG, so as to mitigate the cross-rack traffic. Our idea is based on repair pipelining [25], which pipelines partial coding operations across multiple storage nodes. We additionally perform all partial coding operations within a rack before sending the partial coding results to another rack. To illustrate, suppose that we deploy an (n,k) RS code with k = 6. We want to repair a lost block b0 from six other available blocks b1, b2, b3, b4, b5, and b6, such that blocks b1, b3, and b5 are in one rack, while blocks b2, b4, and b6 are in another rack. We also want to store the reconstructed block b0 at the same rack as b2, b4, and b6. The conventional repair approach is to retrieve all six available blocks and construct an ECDAG as in Figure 5(a). Then we need to transfer three blocks (i.e., b1, b3, and b5) across racks. Instead, OpenEC can automatically construct another ECDAG as in Figure 5(b), in which it first computes the partially decoded block b8 (corresponding to vertex v8) based on b1, b3, and b5 in the same rack, followed by combining b8 with b2, b4, and b6 in the other rack to reconstruct block b0. In this case, we only need to transfer one block (i.e., b8) across racks.
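Like the PPR repair of Figure 1(c), the two-stage repair above can be expressed directly with the primitives of Listing 1. The sketch below is illustrative: the actual pipelined ECDAG that OpenEC generates (Figure 5(b)) may use more intermediate vertices, and the coding coefficients are shown as ones for readability, whereas a real RS repair would use the proper decoding coefficients.

  ECDAG* conventional = new ECDAG();
  conventional->Join(0, {1,2,3,4,5,6}, {1,1,1,1,1,1});   // Figure 5(a): fetch all six blocks

  ECDAG* pipelined = new ECDAG();
  pipelined->Join(8, {1,3,5}, {1,1,1});      // partial result b8, computed within rack 1
  pipelined->BindY(8, 1);                    // pin the partial combine next to b1
  pipelined->Join(0, {8,2,4,6}, {1,1,1,1});  // only b8 crosses racks; finish in rack 2
  pipelined->BindY(0, 2);                    // pin the final combine next to b2, b4, and b6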

4.5 Implementation called BlockPlacementPolicyOEC, which redirects block We implement an OpenEC prototype in C++ with around 7K placement requests to the controller to manage erasure-coded LoC. We use Intel’s Intelligent Storage Acceleration Library block placement. We also modify the FSNamesystem class (ISA-L) [6] to implement erasure coding functionalities. Here, in HDFS-RAID and the BlockManager class in HDFS-3 to we highlight several implementation details of OpenEC. redirect any lost block information to the controller so that From blocks to packets: OpenEC performs coding opera- OpenEC manages repair operations. Note that our integra- tions in units of packets to improve performance, while the tions into HDFS-RAID and HDFS-3 only require limited read/write operations are still in units of blocks (§2.1). By modifications to their codebases, with around 300 LoC and default, the packet size is 128 KiB. For encoding (both online 450 LoC, respectively2. and offline), OpenEC writes n erasure-coded packets to n DataNodes; in the case of sub-packetization, each packet is 5 Evaluation divided into sub-packets. If a DataNode receives an amount We conduct testbed experiments on OpenEC. We summarize of packet data equal to the HDFS block size (64 MiB by de- our major findings on OpenEC: (i) it preserves the perfor- fault), it seals the block and stores additional packets in a mance of HDFS-RAID and HDFS-3 in erasure coding deploy- different block. The n sealed erasure-coded blocks then form ment (§5.2); (ii) it supports various state-of-the-art erasure a coding group. Note that while OpenEC is sending packets coding solutions and preserves their properties, especially in to DataNodes, it can start encoding for the next group of pack- network-bound environments (§5.3); (iii) it can automatically ets. Thus, both the sending and encoding operations can be optimize the repair performance for a hierarchical topology done in parallel. Similarly, OpenEC performs decoding (for (§5.4); and (iv) it achieves scalable performance in real cloud degraded reads and full-node recovery) at the packet level. environments (§5.5). As OpenEC performs packet-level coding operations, the block layouts differ in online and offline encoding. For online 5.1 Setup encoding, OpenEC adopts a striped layout as in HDFS-3 [4], Testbeds: We evaluate OpenEC on both a local cluster (§5.2- as it stripes file data across blocks at the granularities of §5.4) and Amazon EC2 (§5.5). Our local cluster testbed com- packets. For offline encoding, OpenEC adopts a contiguous prises 16 machines, each of which has a quad-core 3.4 GHz In- layout, as the file data is first stored in a block before encoding. tel Core i5-3570, 16 GiB RAM, and a Seagate ST1000DM003 Figure 6 depicts both block layouts. 7200 RPM 1 TiB SATA hard disk. All machines are intercon- Internal communication: OpenEC uses Redis [8] for inter- nected via a 10 Gb/s Ethernet switch. On the other hand, nal communications among the controller, agents, and OEC- our Amazon EC2 testbed comprises 30 instances of type Client. Each agent maintains a local in-memory key-value m5.xlarge, connected via a 10 Gb/s network, in the US Redis store. The controller sends the task instructions of East (North Virginia) region. Each instance has four vCPUs coding operations to an agent via the Redis client, and the with Intel AVX-512 instruction sets and 16 GiB RAM. Both task instructions are buffered at the agent for subsequent pro- testbeds support optimized coding operations based on ISA-L. 

5 Evaluation

We conduct testbed experiments on OpenEC. We summarize our major findings on OpenEC: (i) it preserves the performance of HDFS-RAID and HDFS-3 in erasure coding deployment (§5.2); (ii) it supports various state-of-the-art erasure coding solutions and preserves their properties, especially in network-bound environments (§5.3); (iii) it can automatically optimize the repair performance for a hierarchical topology (§5.4); and (iv) it achieves scalable performance in real cloud environments (§5.5).

5.1 Setup

Testbeds: We evaluate OpenEC on both a local cluster (§5.2-§5.4) and Amazon EC2 (§5.5). Our local cluster testbed comprises 16 machines, each of which has a quad-core 3.4 GHz Intel Core i5-3570, 16 GiB RAM, and a Seagate ST1000DM003 7200 RPM 1 TiB SATA hard disk. All machines are interconnected via a 10 Gb/s Ethernet switch. On the other hand, our Amazon EC2 testbed comprises 30 instances of type m5.xlarge, connected via a 10 Gb/s network, in the US East (North Virginia) region. Each instance has four vCPUs with Intel AVX-512 instruction sets and 16 GiB RAM. Both testbeds support optimized coding operations based on ISA-L.

Default setup: We set the HDFS block size as 64 MiB and the packet size for erasure coding as 128 KiB. We set HDFS-3

338 17th USENIX Conference on File and Storage Technologies USENIX Association ecompare We ernattlo v let,ec fwihwie l of file a writes which of each clients, five of total a run We ihoedt lc eee.Fgr ()sostethrough- the shows 7(a) Figure deleted. block data one with ie,sxtmsteboksz) n su omlra othe to read normal a issue and size), block the times six (i.e., USENIX Association reads. normal writes, degraded in and clients five reads, all of throughput aggregate the the under MiB 384 size and HDFS-3 between performance multi-client the encoding: online in performance Multi-client in respectively. 4.41% reads, and degraded 2.83% and by reads HDFS-3’s normal writes, than in higher 2.36% slightly by is HDFS-3’s and than less slightly is throughput Both reads. degraded and reads, OpenEC normal writes, of file results the put to read degraded a issue also MiB We 384 failures. size without of file file a write first We [31]). the QFS use in we (as Here, code data. erasure-coded generate to and HDFS-3 OpenEC between performance single-client encoding: the online compare in performance sizes. packet Single-client and block different for performance its uate cases, some compare in limited; We is any) overhead. OpenEC (if extra overhead incur such may that it show DSS, underlying the applications upper-layer and between layer As software cluster. another local adds our using operations basic of terms Operations Basic of Performance 5.2 runs. 10 the of showing minimum bars and error maximum the the the including plot runs, We 10 DataNode. over HDFS results an average and client, HDFS an an OECClient, agent, an serves machine remaining each the this evaluate we in until feature repairs hierarchy-aware disable also § without codes erasure of formance BindX ( features mization OpenEC for DSS default the as . n hnw evaluate we when and 5.3 Thpt (MiB/s) 200 400 600 800 OpenEC a nie(igecin)()Oln mlil let)()Ofln d niev.offline vs. Online (d) Offline (c) clients) (multiple Online (b) client) (single Online (a) 0 and rt Normal Write OpenEC oho hc r ofiue iholn encoding online with configured are which of both , 343 n DS3hv iia performance: similar have HDFS-3 and § vnsgicnl mrvspromne ealso We performance. improves significantly even ihHF-AD eadn h uoae opti- automated the Regarding HDFS-RAID. with DS3OpenEC HDFS-3 ..W sinaddctdmciet ev both serve to machine dedicated a assign We 5.4. BindY 334.9 OpenEC otolradteHF aeoe while NameNode, HDFS the and controller

Read 611.9 xetwe eeaut h rgnlper- original the evaluate we when except , § ihntv oigpromneadeval- and performance coding native with 629.2 .) u xeiet nbeautomated enable experiments our 4.4), Degraded Read OpenEC 573.8 ihHF-ADadHF- in HDFS-3 and HDFS-RAID with ( 9 599.1 , 6 BindX OpenEC ) iue7: Figure Scd.Fgr ()shows 7(b) Figure code. RS

xetwe ecompare we when except , Thpt (MiB/s) 1000 1500 2000 OpenEC 500 and 0 efrac fbscoeain noln n fieencoding. offline and online in operations basic of Performance a oe aggregate lower has BindY rt Normal Write 844 DS3OpenEC HDFS-3 827.5 piiainin optimization Read in ecompare We 1050.4 OpenEC OpenEC OpenEC § OpenEC ( 1185.4 ..We 5.4. efirst We 9 Degraded , 6 Read

) 1039

RS 1132.2 ’s 17th USENIX Conference onFileand StorageTechnologies 339 .

Thpt (MiB/s) 1000 1500 2000 29 n .7,rsetvl.Nvrhls,cnieigthe considering Nevertheless, respectively. 8.97%, and 12.9% 3% esuyteHF-ADsuc oeadfidthat find and code source HDFS-RAID the study We 137%. rt 8 lcs n s fieecdn ognrt erasure- generate to encoding offline use and blocks, 180 write deploy and HDFS-RAID encoding: Offline performance significant see not do between we differences figure, the in bars error by reads degraded and reads ag- normal higher in but throughput 1.95%, by gregate writes in HDFS-3 than throughput rm1MBt 4MB(suigta h l iei divisi- is size encoding, online file For the eight). that by (assuming ble MiB 64 to MiB 1 from the uses client only). currently encoding (which online HDFS-3 supports atop encoding offline and online We encoding). offline deploy (in layout contiguous online the (in and layout encoding) striped the between difference performance in encoding offline encoding: offline vs. Online bars. dif- error limited have the systems considering two ferences the yet full- 7.9%, For by HDFS-RAID regeneration. parity after recovery, file node blocks HDFS parity single all step a re-writing extra into and the reading to in due HDFS-RAID mainly of is difference performance the by HDFS-RAID of throughput encoding offline the creases recovery. full-node and ing HDFS-RAID. of evaluation that our Note in la- s) this 20 subtract around is and (which latency, empty tency its an measure start to we job evaluation, MapReduce our in overhead MapRe- startup the duce exclude To unit MapReduce. per and via encoding recovered recovery offline full-node being performs data HDFS-RAID that lost Note of time). through- amount recovery the full-node being (i.e., the data put and input time) unit of per amount encoded the offline (i.e., the throughput measure node we encoding storage Here, one recovery. of full-node blocks trigger the and delete then We groups). ing the using blocks coded 500 0 ecnie h igecin efrac,i hc a which in performance, single-client the consider We Interestingly, results. the shows 7(c) Figure Encoding Offline OpenEC OpenEC 624.9 OpenEC 1484.5 ( 12 OpenEC OpenEC HDFS-RAID Recovery Full-node tpHF-,adso hti losboth allows it that show and HDFS-3, atop

, 285.6 nHF-ADfrfi oprsn.We comparisons. fair for HDFS-RAID on 8 OpenEC OpenEC osntueMpeuei fieencod- offline in MapReduce use not does ) ecmaetepromnebetween performance the compare We OpenEC

Scd n rtsafieo ieranging size of file a writes and code RS 308.3 ( 9 a lgtyhge hogptthan throughput higher slightly has , 6 ) esstefiesz,adsuythe study and size, file the versus Scd ie,attlo 0cod- 30 of total a (i.e., code RS

nofln noig enow We encoding. offline in Thpt (MiB/s) efrhrcmaeoln and online compare further We 200 400 600 800 n HDFS-3. and 0 OpenEC 63 64 32 16 8 4 2 1 Offline (Degraded) Online (Degraded) Offline (Normal) Online (Normal) File Size(MiB) tie h file the stripes OpenEC in- ihtefiesz,snetedt rnfrpromnebecomes performance transfer data the since size, file the with w mtterslshr nteitrs fspace). of interest the in here results the omit (we and respectively) encoding, offline and online realize (which reads parallel the of one in slowdown any as 44-718%), (by oprsn ihntv oigoperations: coding native with Comparisons 340 17th USENIXConference onFile andStorage Technologies overhead. faster limited much incur are and coding 7), ECDAG-based (Figure of operations computations read/write the Neverthe- overall block. the single to a compared decoding less, is for there task as compute one decoding, only native than 0.6-3.2%) less (by slightly throughput only has one decoding ECDAG-based decoding which for in throughput block, decoding the shows 8(b) the Figure computing creating for for tasks overhead compute additional multiple is encod- there native because than mainly ing, throughput lower 29-38% has throughput encoding encoding the shows 8(a) for Figure ISA- HDFS-3. using in operations coding L native the of that coding with ECDAG-based operations the of performance computational the in as differences performance similar show they HDFS-RAID and HDFS-3 original in the implementations coding using validate experiments erasure To similar file. conduct the we recover results, to our additional file) seven original (i.e., the blocks over eight blocks retrieve to file needs larger it for as especially sizes, encoding online in is that encoding than offline less much in throughput read performance. degraded overall the the However, degrade can data online-encoded to encoding online normal in its that than encoding, higher offline much In is throughput parallel. read reads in issues blocks client eight the reads to which degraded in and performance, reads similar normal show both see encoding. encoding, also offline online We and In online larger. in differences become performance blocks the the as dominant more evaluation. read degraded our in the file delete the we stores encoding, that offline data block in data one file; (with the read to degraded deleted) block a and failures) other seven (without with read it encodes later and ( block blocks a in encoding, file offline the the than for stores less MiB); 64 is size block each block the that default after (note blocks completed the is seals write and file blocks eight across packets in Thpt (MiB/s) iue7d hw h eut.Tetruhu increases throughput The results. the shows 7(d) Figure 2000 4000 6000 8000 k 4MBbok under blocks 64-MiB iue8: Figure 0 § 64 96 (12,8)(14,10) (9,6) (6,4) .) ecmaeterpromnei normal a in performance their compare We 4.2). a noig()Decoding (b) Encoding (a) 7296.5 5176.4

oprsn ihntv oigoperations. coding native with Comparisons 5972.1 3659.6

4373.1 OpenEC Native 2746.7 4092.2

( 2617.6 n , k ) Thpt (MiB/s) Scds ECDAG-based codes. RS 1000 2000 3000 0 64 96 (12,8)(14,10) (9,6) (6,4)

n 2840.6

− 2751 k 1913.5 aiyblocks. parity ecompare We 1876.6 OpenEC OpenEC 1443.9 OpenEC Native 1424 1091.6 1085.1 0G/.Frte1G/ ae ewr rnfrbcmsthe becomes transfer network case, Gb/s 1 the For Gb/s. 10 eraiesvrlsaeo-h-r earfinl rsr cod- erasure repair-friendly state-of-the-art several realize We ihtefloigsolutions: following the with hs efcso vlaigterpromneo repairing of performance their evaluating on focus we Thus, the chooses setup default our performance, high achieve To since small too is size packet the if degrades performance The iei tlat6 i.Fgr ()sostetruhu ver- throughput the shows 9(b) Figure MiB. 64 block least the at when is network stabilizes size and and disk utilized, the better as are op- size bandwidths both block of the throughput with The KiB. increases 128 erations where as size, fixed is block size the packet versus the throughput the shows 9(a) ure the and under encoding reads online of degraded throughput single-client the on focus of mance sizes: packet and block of Impact • • fetch-and-compute a in ( block manner lost the decode to retrieves DataNodes it which in baseline, our to conforms performance gains. theoretical empirical the the that expect I/O), disk we and and computations coding to and (compared Gb/s bottleneck 1 cluster: local our in settings under bandwidth group two coding ure a in block lost repairs. one single-failure in I/O or bandwidth repair the imize from Recall abstraction. ECDAG § the on based solutions ing Designs Coding Erasure of Support KiB. 128 as 5.3 size packet the and MiB 64 parallelism. as less size is block there since large too is size packet packets, the individual if retrieving or for calls function MiB. many 64 are as there fixed is size block the where size, packet the sus

. hteitn earfinl oe r eindt min- to designed are codes repair-friendly existing that 2.1 Thpt (MiB/s) eset we mz h earbnwdh efcso w ainsof variants two on focus We bandwidth. repair min- to the sub-packetization imize leverage which [14], codes MSR 10(b)): are (Figure that codes blocks. MSR blocks data parity of six all global group from two local encoded a and from blocks, encoded data is three which of each blocks, set we codes, RS For [21]. LRC 10(a)): (Figure LRC as codes RS of approach repair conventional the use We 200 400 600 800 0 a ayn lc ie b ayn aktsizes packet Varying (b) sizes block Varying (a)

1 § ( .) ecmaetecnetoa earapproach repair conventional the compare We 2.2). Degraded Read Write OpenEC n iue9: Figure 2 Block Size(MiB) Block , k 4 ( = ) 8 16 10

mato lc n aktsizes. packet and block of Impact 32 aiswt lc n aktszs We sizes. packet and block with varies ,

6 64 )

nwihteeaetolclparity local two are there which in , 128 ecmaeR oe ihAzure’s with codes RS compare We 256 ( 9 , 6 ) ecmaeR oe with codes RS compare We

Scdsa in as codes RS Thpt (MiB/s) k 200 400 600 800 lcsfrom blocks esuyhwteperfor- the how study We ( 0 n , OpenEC k USENIX Association 8 ( = )

16 Degraded Read Write Packet Size(KiB) 32 9 64 , 6

k 128 econfig- We . ) o LRC, for ; § non-failed 256 ..Fig- 5.2. 512 1024 2048 hl hto utrycdsdost 1.35 to drops codes Butterfly of that while h esni htbt S oe ereedt from data retrieve codes MSR both that is reason The • • USENIX Association solutions. coding erasure the of Overall, properties overhead. the preserves I/O versus disk (seven higher codes incur Butterfly and than five) nodes con- storage codes more MISER to and nect repairs, for nodes storage non-failed 1.25 to drops codes MISER Gb/s of 10 gain the throughput in the codes network; Butterfly than MISER throughput example, less I/O have For codes disk significant. and more empirical computation become coding the overheads the network, since Gb/s 10 decrease the gains For theoretical gains. the with throughput degradations) are slight solutions em- only coding the (with erasure that consistent the observe of we gains network, throughput Gb/s pirical 1 the for For approach codes. erasure repair RS conventional the the of over gains solutions throughput coding theoretical the shows also Thpt (MiB/s) ecmaeR oe n R under DRC and codes RS compare We ( it ewe n w trg oe tdfeetlogical different at nodes storage two any between width logical three into cluster local our divide We setting. work the under repair conventional the with obeRgnrtn oe DC Fgr 10(d)): (Figure (DRC) Codes Regenerating Double 10(c)): (Figure algorithms Repair iue1 hw h eut;frorcmaios al 2 Table comparisons, our for results; the shows 10 Figure (with different racks across three evenly in group nodes coding each of blocks coded Gb/s. 10 remains rack two logical any same between the bandwidth within nodes the storage while [44], Gb/s 1 as racks the use net- We hierarchical racks. a in [19] DRC with codes RS compare them compare We by codes operations. RS repair pipelin- of partial performance parallelizing repair repair and the improve [29] [25], ing PPR namely algorithms, repair the require the (which consider [32] require codes (which Butterfly [45] and codes MISER codes: MSR 100 150 200 250 n 50 , 0 k ( ( = ) 8 , al 2: Table 4 bs10Gbps 1 Gbps 18.4 LRC(10, 6) RS (9,6) ) Gain 9 a R b S oe c earagrtm d DRC (d) algorithms Repair (c) codes MSR (b) LRC (a) IE code. MISER ,

6 36.5 ( ) 6 nbt ae,w itiueteerasure- the distribute we cases, both In . hoeia an fsaeo-h-r rsr oe rrpi loihsoe h ovninlrpi fR codes. RS of repair conventional the over algorithms repair or codes erasure state-of-the-art of gains Theoretical , 4 LRC s RS vs. ) 123.9 Scd,the code, RS 2 ( × 10 216.9 ( 9 , , 6 6 ) ) tc n / 3 omn olmtteband- the limit to command Butterfly s RS vs. rsr-oe lcseach). blocks erasure-coded Thpt (MiB/s) 100 200 300 400 500 ( 6 0 1.6 , 4 ( ) × 6 ( , esuyhwthe how study We utrycd,and code, Butterfly ( bs10Gbps 1 Gbps MISER (8,4) Butterfly (6,4) RS (6,4) 6 4 27.4 9 ( iue10: Figure , ) × 4 n , )

6 39.1 , n k ) Fgr 10(b)). (Figure = ( = ) 58.4 Scode. RS s RS vs. MISER k + OpenEC 2.28

6 160.6 n , 2 upr feauecdn designs. coding erasure of Support 4 ( .We ). × ≥

5.4 Improvements with Automated Optimizations

We now evaluate how OpenEC's automated optimizations improve the performance of erasure coding operations. We again configure a three-rack logical topology in our local cluster as in our DRC experiments.

We first compare the offline encoding performance under three configurations: (i) automated optimization is disabled; (ii) only automated BindX is enabled; and (iii) both automated BindX and BindY are enabled (our default setting). We measure the throughput of offline encoding, for (n,k) = (8,6), (10,8), and (12,10), by writing 30 coding groups of blocks into HDFS-3, which evenly distributes the blocks across the three racks. Figure 11(a) shows that enabling only automated BindX increases the encoding throughput by 37-42%, while enabling both automated BindX and BindY increases the throughput by 38-44%.

We also evaluate how OpenEC automatically improves the repair performance via its ECDAG construction (§4.4) for a hierarchical topology. We delete all blocks of one storage node and trigger a full-node recovery, comparing the default repair with the automatically generated pipelined repair. Figure 11(b) shows that the repair optimization increases the repair throughput of OpenEC by 82-128%.

[Figure 11: Automated optimization. (a) Automated BindX and BindY: offline encoding throughput (MiB/s) for (8,6), (10,8), and (12,10) under No Optimization, BindX, and BindX & BindY (default). (b) Repair optimization: full-node recovery throughput (MiB/s) for the same (n,k) values under the default and pipelined repairs.]
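The gains above come from removing redundant cross-node block transfers when ECDAG compute vertices are co-located with their inputs or outputs. The toy C++ model below illustrates this effect for (9,6) offline encoding; the ComputeVertex structure, the node numbering, and the transfer counting are our own illustration and are not OpenEC's ECDAG classes or its actual BindX/BindY implementation (coding coefficients are also omitted).

// Toy placement model: why co-locating ECDAG compute vertices (cf. the
// BindX/BindY-style optimization) reduces network traffic for (9,6) encoding.
#include <cstdio>
#include <set>
#include <utility>
#include <vector>

struct ComputeVertex {
    std::vector<int> inputNodes; // nodes holding the input blocks (one block each)
    int computeNode;             // node where the vertex is executed
    int outputNode;              // node where the output block is finally stored
};

// Each distinct (source node -> destination node) block movement costs one transfer.
static int transfers(const std::vector<ComputeVertex>& vs) {
    std::set<std::pair<int, int>> moved; // deduplicate repeated fetches of the same block
    int cost = 0;
    for (const auto& v : vs) {
        for (int in : v.inputNodes)
            if (in != v.computeNode && moved.insert({in, v.computeNode}).second) cost++;
        if (v.computeNode != v.outputNode) cost++; // ship the computed parity block
    }
    return cost;
}

int main() {
    // (9,6): data blocks live on nodes 0..5, parity blocks are stored on nodes 6..8.
    std::vector<int> data = {0, 1, 2, 3, 4, 5};

    // Unbound: each parity vertex runs on its own storage node, so every data
    // block is sent to three different nodes.
    std::vector<ComputeVertex> unbound = {{data, 6, 6}, {data, 7, 7}, {data, 8, 8}};
    printf("unbound placement: %d cross-node transfers\n", transfers(unbound)); // 18

    // Co-located: all three parity vertices bound to node 6, so the six data
    // blocks are fetched once and only two parity blocks are shipped out.
    std::vector<ComputeVertex> bound = {{data, 6, 6}, {data, 6, 7}, {data, 6, 8}};
    printf("co-located placement: %d cross-node transfers\n", transfers(bound)); // 8
    return 0;
}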

5.5 Performance in Amazon EC2

We finally evaluate OpenEC in Amazon EC2. We configure the cluster with N = 10, 20, and 30 instances (see §5.1 for the instance type). One instance hosts the controller and the HDFS NameNode, and each of the remaining instances hosts an OECClient, an HDFS client, and an HDFS DataNode. We consider the (9,6) RS code, and the clients issue different basic operations as in §5.2. Figure 12 shows the results when OpenEC realizes online and offline encoding atop HDFS-3. We observe consistent throughput patterns as in our local cluster experiments in §5.2 (e.g., both normal reads and degraded reads have similar throughput). Also, the performance of OpenEC scales well with the number of instances.

[Figure 12: Performance in Amazon EC2 for N = 10, 20, and 30. (a) Online encoding: normal writes, normal reads, and degraded reads. (b) Offline encoding: offline encoding and full-node recovery. y-axis: Thpt (MiB/s).]

6 Related Work

New erasure coding solutions: One direction of research is to design new erasure codes. RS codes [43] are widely deployed today (e.g., [5, 7, 15, 31, 54, 55]), mainly for two reasons. First, RS codes are maximum distance separable (MDS) codes, meaning that under the coding parameters (n, k), fault tolerance against any n − k block failures is achieved with the minimum storage redundancy (i.e., n/k times the original data). Second, RS codes support general coding parameters n and k (provided that k < n). However, RS codes have high repair costs, and hence many new erasure coding solutions have been proposed to reduce the repair bandwidth or I/O. Minimum-storage regenerating (MSR) codes [14] minimize the repair bandwidth while preserving the MDS property. Follow-up studies design new MSR codes [18, 32, 39, 42, 45, 50, 52], some of which are evaluated in open-source DSSs (e.g., PM-RBT codes [39] and Butterfly codes [32] are evaluated in HDFS, while Clay codes [52] are evaluated in Ceph). Aside from the minimum repair bandwidth point of MSR codes, some MDS codes incur slightly more repair bandwidth but can be easily constructed for any (n, k) (e.g., [24, 41]), while some non-MDS erasure codes trade more storage redundancy for less repair I/O (e.g., LRC [21, 23, 33, 44, 49]). DRC [19] minimizes the cross-rack repair bandwidth in hierarchical topologies.
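As a quick numeric illustration of the MDS trade-off (our own example, reusing the (9,6) parameters from the experiments above):

\[
(n,k)=(9,6):\quad \text{tolerates any } n-k=3 \text{ block failures at } \tfrac{n}{k}=1.5\times \text{ storage,}
\quad \text{versus } 3\times \text{ storage for 3-way replication with the same tolerance.}
\]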

rt Normal Write 976.5 86 1,)(12,10) (10,8) (8,6) 179.5 eg,[44],wiesm o-D rsr codes erasure non-MDS some while [24,41]), (e.g., N=30 N=20 N=10 BindX &BindY(default) BindX No Optimization iue12: Figure 2152.1 286.8

– 3221.1 290.5 iue11: Figure 3 3 4 9) R 1]mnmzstecross- the minimizes [19] DRC 49]). 44, 33, 23,
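To make the intuition behind PPR and repair pipelining concrete, the toy below reconstructs a lost block by passing a partial result along a chain of helpers, so every network link carries one block's worth of data instead of the requestor downloading k whole blocks. This is a conceptual sketch in our own notation, not PPR, repair-pipelining, or OpenEC code; real RS repair combines blocks with Galois-field coefficients rather than plain XOR.

// Chain-based single-block repair: helper 0 -> helper 1 -> ... -> requestor.
#include <array>
#include <cstdio>
#include <vector>

int main() {
    constexpr int k = 6, blocksize = 8;           // toy sizes
    using Block = std::array<unsigned char, blocksize>;

    // k surviving blocks; in this toy parity code the lost block is their XOR.
    std::vector<Block> surviving(k);
    Block lost{};
    for (int i = 0; i < k; i++)
        for (int b = 0; b < blocksize; b++) {
            surviving[i][b] = (unsigned char)(17 * i + 3 * b + 1);
            lost[b] ^= surviving[i][b];
        }

    // Each helper XORs its local block into the running partial result and
    // forwards it; every hop therefore transfers exactly one block.
    Block partial{};
    for (int i = 0; i < k; i++)
        for (int b = 0; b < blocksize; b++) partial[b] ^= surviving[i][b];

    printf("reconstruction %s; per-link traffic = %d bytes "
           "(vs. %d bytes into the requestor for star-shaped repair)\n",
           (partial == lost) ? "correct" : "wrong", blocksize, k * blocksize);
    return 0;
}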

Erasure coding programming: Several open-source libraries are available for erasure coding programming. Zfec [10] implements RS codes and is used by Tahoe-LAFS [55]. Jerasure [36] is a C library that supports various erasure codes, and is later extended with GF-Complete [34] to enable fast Galois Field arithmetic. ISA-L [6] is another library that supports various erasure codes and optimizes Galois Field arithmetic for Intel hardware. Both Jerasure and ISA-L are widely used in production (e.g., Ceph and Hadoop 3.0). PyEClib [9] is a Python library used by OpenStack Swift; it builds on different erasure coding libraries, including liberasurecode [2], which unifies both Jerasure and ISA-L. OpenEC emphasizes the deployment of erasure codes in DSSs, and it can leverage the above libraries to implement erasure codes via the ECDAG abstraction.
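As a concrete illustration of programming against such a library, the minimal sketch below encodes k data buffers into m parities with Jerasure's Vandermonde-based RS matrix. It assumes a Jerasure 2.x installation with GF-Complete (header paths can vary by installation); the buffer sizes and placement comments are illustrative only, and this is not OpenEC code.

// Minimal RS(k+m, k) encoding with Jerasure + GF-Complete.
#include <cstdlib>
#include <vector>

extern "C" {
#include <jerasure.h>          // jerasure_matrix_encode()
#include <jerasure/reed_sol.h> // reed_sol_vandermonde_coding_matrix(); path may differ
}

int main() {
    const int k = 6, m = 3, w = 8;   // (9,6) over GF(2^8)
    const int blocksize = 4096;      // bytes per block (toy size)

    // m x k Vandermonde-based RS generator matrix.
    int* matrix = reed_sol_vandermonde_coding_matrix(k, m, w);

    std::vector<char*> data(k), coding(m);
    for (int i = 0; i < k; i++) { data[i] = (char*)calloc(blocksize, 1); data[i][0] = (char)(i + 1); }
    for (int j = 0; j < m; j++) coding[j] = (char*)calloc(blocksize, 1);

    // Fill the m coding blocks from the k data blocks.
    jerasure_matrix_encode(k, m, w, matrix, data.data(), coding.data(), blocksize);

    // ... the coding blocks would then be written to the parity nodes ...
    for (int i = 0; i < k; i++) free(data[i]);
    for (int j = 0; j < m; j++) free(coding[j]);
    free(matrix);
    return 0;
}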

Configurable storage: There is an increasing demand of providing flexibility for storage system management and configuring different storage policies based on application requirements. Existing approaches rely on customization either on the client side [12, 13, 28, 37] or by a centralized controller under a software-defined storage (SDS) framework [16, 48, 51]. OpenEC borrows the same principle from SDS, but specifically focuses on configurable erasure coding management in distributed environments.

7 Conclusions and Future Work

This paper presents OpenEC, a new framework that provides unified and configurable erasure coding management for distributed storage. It leverages the ECDAG abstraction to define erasure codes and configure the workflows of coding operations. Our OpenEC prototype achieves effective performance atop HDFS in both local cluster and Amazon EC2 environments, while supporting a variety of state-of-the-art erasure codes and repair algorithms. Our work sheds light on how to facilitate erasure coding designers to deploy erasure coding solutions in a simple and flexible manner. This paper currently focuses on HDFS, which organizes data in fixed-size blocks; our technical report [27] also describes how we integrate OpenEC into QFS [31]. In future work, we will study how OpenEC can be deployed in other DSSs, especially object-storage-based DSSs (e.g., Ceph and Swift) that organize data in variable-size objects.

Acknowledgments: We thank our shepherd, Brent Welch, and the anonymous reviewers for their comments. This work was supported by HKRGC (GRF 14216316, CRF C7036-15G) and NSFC (61872414, 61502191). Runhui Li is now with Sangfor Technologies. Yuchong Hu is the corresponding author.
References

[1] Apache Hadoop 3.0.0. https://hadoop.apache.org/docs/r3.0.0/.
[2] Erasure code API library written in C with pluggable erasure code backends. https://github.com/openstack/liberasurecode.
[3] Facebook's realtime distributed FS based on Apache Hadoop 0.20-append. https://github.com/facebookarchive/hadoop-20.
[4] HDFS erasure coding. https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html.
[5] HDFS-RAID. https://wiki.apache.org/hadoop/HDFS-RAID.
[6] Intelligent storage acceleration library. https://github.com/01org/isa-l.
[7] OpenStack Swift. https://wiki.openstack.org/wiki/Swift.
[8] Redis. http://redis.io/.
[9] A simple Python interface for implementing erasure codes. https://github.com/openstack/pyeclib.
[10] zfec – an efficient, portable erasure coding tool. https://github.com/tahoe-lafs/zfec.
[11] R. Bhagwan, K. Tati, Y. Cheng, S. Savage, and G. M. Voelker. Total recall: System support for automated availability management. In Proc. of USENIX NSDI, 2004.
[12] F. Chen, M. P. Mesnier, and S. Hahn. Client-aware cloud storage. In Proc. of IEEE MSST, 2014.
[13] J. Y. Chung, C. Joe-Wong, S. Ha, J. W.-K. Hong, and M. Chiang. CYRUS: Towards client-defined cloud storage. In Proc. of ACM EuroSys, 2015.
[14] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. IEEE Trans. on Information Theory, 56(9):4539–4551, 2010.
[15] D. Ford, F. Labelle, F. I. Popovici, M. Stokely, V.-A. Truong, L. Barroso, C. Grimes, and S. Quinlan. Availability in globally distributed storage systems. In Proc. of USENIX OSDI, 2010.
[16] R. Gracia-Tinedo, J. Sampé, E. Zamora, M. Sánchez-Artigas, P. García-López, Y. Moatti, and E. Rom. Crystal: Software-defined storage for multi-tenant object stores. In Proc. of USENIX FAST, 2017.
[17] K. M. Greenan, E. L. Miller, and T. J. E. Schwarz. Optimizing Galois field arithmetic for diverse processor architectures and applications. In Proc. of IEEE MASCOTS, 2008.
[18] Y. Hu, H. C. Chen, P. P. Lee, and Y. Tang. NCCloud: Applying network coding for the storage repair in a cloud-of-clouds. In Proc. of USENIX FAST, 2012.
[19] Y. Hu, X. Li, M. Zhang, P. P. C. Lee, X. Zhang, P. Zhou, and D. Feng. Optimal repair layering for erasure-coded data centers: From theory to practice. ACM Trans. on Storage, 13(4):33, 2017.
[20] Y. Hu, Y. Wang, B. Liu, D. Niu, and C. Huang. Latency reduction and load balancing in coded storage systems. In Proc. of ACM SoCC, 2017.
[21] C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J. Li, S. Yekhanin, et al. Erasure coding in Windows Azure storage. In Proc. of USENIX ATC, 2012.
[22] O. Khan, R. C. Burns, J. S. Plank, W. Pierce, and C. Huang. Rethinking erasure codes for cloud file systems: Minimizing I/O for recovery and degraded reads. In Proc. of USENIX FAST, 2012.
[23] O. Kolosov, G. Yadgar, M. Liram, I. Tamo, and A. Barg. On fault tolerance, locality, and optimality in locally repairable codes. In Proc. of USENIX ATC, 2018.
[24] K. Kralevska, D. Gligoroski, R. E. Jensen, and H. Øverby. Hashtag erasure codes: From theory to practice. IEEE Trans. on Big Data, 2017.
[25] R. Li, X. Li, P. P. C. Lee, and Q. Huang. Repair pipelining for erasure-coded storage. In Proc. of USENIX ATC, 2017.
[26] R. Li, J. Lin, and P. P. C. Lee. Enabling concurrent failure recovery for regenerating-coding-based storage systems: From theory to practice. IEEE Trans. on Computers, 64(7):1898–1911, 2015.
[27] X. Li, R. Li, P. P. C. Lee, and Y. Hu. OpenEC: Toward unified and configurable erasure coding management in distributed storage systems. Technical report, CUHK, 2019. http://www.cse.cuhk.edu.hk/~pclee/www/pubs/tech_openec.pdf.
[28] M. Mesnier, F. Chen, T. Luo, and J. B. Akers. Differentiated storage services. In Proc. of ACM SOSP, 2011.
[29] S. Mitra, R. Panta, M.-R. Ra, and S. Bagchi. Partial-parallel-repair (PPR): A distributed technique for repairing erasure coded storage. In Proc. of ACM EuroSys, 2016.
[30] S. Muralidhar, W. Lloyd, S. Roy, C. Hill, E. Lin, W. Liu, S. Pan, S. Shankar, V. Sivakumar, L. Tang, et al. f4: Facebook's warm blob storage system. In Proc. of USENIX OSDI, 2014.
[31] M. Ovsiannikov, S. Rus, D. Reeves, P. Sutter, S. Rao, and J. Kelly. The Quantcast file system. In Proc. of VLDB Endowment, 2013.

[32] L. Pamies-Juarez, F. Blagojevic, R. Mateescu, C. Guyot, E. E. Gad, and Z. Bandic. Opening the chrysalis: On the real repair performance of MSR codes. In Proc. of USENIX FAST, 2016.
[33] D. S. Papailiopoulos, J. Luo, A. G. Dimakis, C. Huang, and J. Li. Simple regenerating codes: Network coding for cloud storage. In Proc. of IEEE INFOCOM, 2012.
[34] J. S. Plank, K. Greenan, E. Miller, and W. Houston. GF-Complete: A comprehensive open source library for Galois field arithmetic. Technical Report UT-CS-13-703, University of Tennessee, 2013.
[35] J. S. Plank and C. Huang. Tutorial: Erasure coding for storage applications. Slides presented at USENIX FAST 2013, Feb 2013.
[36] J. S. Plank, S. Simmerman, and C. D. Schuman. Jerasure: A library in C/C++ facilitating erasure coding for storage applications - version 1.2. Technical Report CS-08-627, University of Tennessee, 2008.
[37] R. Pontes, D. Burihabwa, F. Maia, J. Paulo, V. Schiavoni, P. Felber, H. Mercier, and R. Oliveira. SafeFS: A modular architecture for secure user-space file systems (one FUSE to rule them all). In Proc. of ACM SYSTOR, 2017.
[38] N. Prakash, V. Abdrashitov, and M. Médard. The storage versus repair-bandwidth trade-off for clustered storage systems. IEEE Trans. on Information Theory, 64(8):5783–5805, Aug 2018.
[39] K. Rashmi, P. Nakkiran, J. Wang, N. B. Shah, and K. Ramchandran. Having your cake and eating it too: Jointly optimal erasure codes for I/O, storage, and network-bandwidth. In Proc. of USENIX FAST, 2015.
[40] K. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and K. Ramchandran. A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the Facebook warehouse cluster. In Proc. of USENIX HotStorage, 2013.
[41] K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and K. Ramchandran. A "hitchhiker's" guide to fast and efficient data reconstruction in erasure-coded data centers. In Proc. of ACM SIGCOMM, 2014.
[42] K. V. Rashmi, N. B. Shah, and P. V. Kumar. Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction. IEEE Trans. on Information Theory, 57(8):5227–5239, 2011.
[43] I. S. Reed and G. Solomon. Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics, 8(2):300–304, 1960.
[44] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur. XORing elephants: Novel erasure codes for big data. In Proc. of VLDB Endowment, 2013.
[45] N. B. Shah, K. Rashmi, P. V. Kumar, and K. Ramchandran. Interference alignment in regenerating codes for distributed storage: Necessity and code constructions. IEEE Trans. on Information Theory, 58(4):2134–2158, 2012.
[46] K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The Hadoop distributed file system. In Proc. of IEEE MSST, 2010.
[47] M. Silberstein, L. Ganesh, Y. Wang, L. Alvisi, and M. Dahlin. Lazy means smart: Reducing repair bandwidth costs in erasure-coded distributed storage. In Proc. of ACM SYSTOR, 2014.
[48] I. A. Stefanovici, B. Schroeder, G. O'Shea, and E. Thereska. sRoute: Treating the storage stack like a network. In Proc. of USENIX FAST, 2016.
[49] I. Tamo and A. Barg. A family of optimal locally recoverable codes. IEEE Trans. on Information Theory, 60(8):4661–4676, 2014.
[50] I. Tamo, Z. Wang, and J. Bruck. Zigzag codes: MDS array codes with optimal rebuilding. IEEE Trans. on Information Theory, 59(3):1597–1616, 2013.
[51] E. Thereska, H. Ballani, G. O'Shea, T. Karagiannis, A. Rowstron, T. Talpey, R. Black, and T. Zhu. IOFlow: A software-defined storage architecture. In Proc. of ACM SOSP, 2013.
[52] M. Vajha, V. Ramkumar, B. Puranik, G. Kini, E. Lobo, B. Sasidharan, P. V. Kumar, A. Barg, M. Ye, S. Narayanamurthy, et al. Clay codes: Moulding MDS codes to yield an MSR code. In Proc. of USENIX FAST, 2018.
[53] H. Weatherspoon and J. D. Kubiatowicz. Erasure coding vs. replication: A quantitative comparison. In Proc. of IPTPS, 2002.
[54] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. Long, and C. Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proc. of USENIX OSDI, 2006.
[55] Z. Wilcox-O'Hearn and B. Warner. Tahoe: the least-authority filesystem. In Proc. of ACM StorageSS, 2008.
