PERSES: Data Layout for Low Impact Failures

Technical Report UCSC-SSRC-12-06
September 2012

A. Wildani ([email protected])
E. L. Miller ([email protected])
I. F. Adams ([email protected])
D. D. E. Long ([email protected])

Storage Systems Research Center
Baskin School of Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
http://www.ssrc.ucsc.edu/

Abstract

Disk failures remain common, and the speed of reconstruction has not kept up with the increasing size of disks. Thus, as drives become larger, systems are spending an increasing amount of time with one or more failed drives, potentially resulting in lower performance. However, if an application does not use data on the failed drives, the failure has negligible direct impact on that application and its users. We formalize this observation with PERSES, a data allocation scheme to reduce the performance impact of reconstruction after disk failure. PERSES reduces the length of degradation from the reference frame of the user by clustering data on disks

If a failed disk contains data for multiple working sets or projects, all of those projects could stall until the rebuild is completed. If the failure occurs in a working set that is not actively being accessed, it could potentially have zero productivity cost to the system users: the proverbial tree fallen in a forest.

To leverage this working set locality, we introduce PERSES, a data allocation model designed to decrease the impact of device failures on the productivity and perceived availability of a storage system. PERSES is named for the titan of cleansing destruction in Greek mythology. We name our system PERSES because it focuses destruction in a storage system to a small number of users so that others may thrive.
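
The intuition that a failure touching fewer working sets disrupts fewer users can be illustrated with a toy model. The sketch below is not the PERSES allocation algorithm; it only contrasts two hypothetical layouts, one that scatters blocks round-robin across disks and one that keeps each working set on a single disk, and counts how many working sets lose data when one disk fails. The function names, the working-set names, and the assumption that a whole working set fits on one disk are all invented for illustration.

```python
from collections import defaultdict


def scattered_layout(working_sets, num_disks):
    """Hypothetical baseline: place blocks round-robin, ignoring working sets."""
    layout = defaultdict(set)  # disk index -> working sets with data on that disk
    block = 0
    for name, num_blocks in working_sets.items():
        for _ in range(num_blocks):
            layout[block % num_disks].add(name)
            block += 1
    return layout


def grouped_layout(working_sets, num_disks):
    """Hypothetical grouped layout: each working set lives on one disk.

    Assumes every working set fits on a single disk, which a real
    system could not take for granted.
    """
    layout = defaultdict(set)
    for disk, name in enumerate(working_sets):
        layout[disk % num_disks].add(name)
    return layout


def affected_by_failure(layout, failed_disk):
    """Return the working sets that have data on the failed disk."""
    return layout.get(failed_disk, set())


# Four made-up projects of four blocks each, spread over four disks.
working_sets = {"projA": 4, "projB": 4, "projC": 4, "projD": 4}
scattered = scattered_layout(working_sets, num_disks=4)
grouped = grouped_layout(working_sets, num_disks=4)

print(len(affected_by_failure(scattered, 0)))  # working sets hit when scattered
print(len(affected_by_failure(grouped, 0)))    # working sets hit when grouped
```

In this toy setup, a single disk failure touches every project under the scattered layout but only one project under the grouped layout, which is the effect the text describes as focusing the damage on a small number of users.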