PASSI: A Parallel, Reliable and Scalable Storage Software Infrastructure for Active Storage System and I/O Environments (position paper)

Hsing-bung Chen, High Performance Computing Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA, [email protected]
Song Fu, Department of Computer Science and Engineering, University of North Texas, Denton, Texas 76203, USA, song.fu@unt.edu

Abstract---Storage systems are a foundational component of computational, experimental, and observational science today. The ever-growing size of computation and simulation results demands huge storage capacity, which challenges storage scalability and causes data corruption and disk failures to be commonplace in exascale storage environments. Moreover, the increasing complexity of the storage hierarchy and passive storage devices make today's storage systems inefficient, which forces the adoption of new storage technologies. In this position paper, we propose a Parallel, Reliable and Scalable Storage Software Infrastructure (PASSI) to support the design and prototyping of next-generation active storage environments. The goal is to meet the scaling and resilience needs of extreme scale science by ensuring that storage systems are pervasively intelligent, always available, never lose or damage data, and energy-efficient.

Keywords---Ethernet drives; parallel erasure coding; disk failure prediction; management; energy efficiency.

I. INTRODUCTION

Computation and simulation advance knowledge in science, energy, and national security. As these simulations have become more accurate and thus realistic, their demands for computational power have also increased. This, in turn, has caused the simulation results to grow in size, increasing the demand for huge storage systems. High performance I/O becomes increasingly critical, as storing and retrieving a large amount of data can greatly affect the overall performance of the system and applications.

The existing storage systems are mostly passive; that is, disk drives stage and drain memory/burst buffer checkpoints with little intelligence. With the increasing performance and decreasing cost of processors, storage manufacturers are adding more system intelligence at I/O peripherals. Active storage aims to expose computation elements within the storage infrastructure for general-purpose computation on data. Although this concept has been appealing for several years and was incorporated in the T10 Object-based Storage Device (OSD) standard, a limited number of storage products are available for HPC systems. Moreover, their processing capability is provided at the storage enclosure level, while the majority of disk drives are still passive.

Moreover, with exabytes of data to be stored on hundreds of thousands of disk drives, storage scalability becomes a big challenge. More importantly, at such a scale, data corruption and disk failures will be the norm in Exascale storage environments. As data generated by HPC applications, such as checkpoint/restart data sets, get bigger and so do disk drives, the time to rebuild a failed drive is extended, causing tens of hours of data disruption. For the new helium-filled hard drives, due to the larger capacity and the slower increase in performance, the rebuild time of such a disk would be extensive (almost 3 days). In large arrays, a second disk drive can fail before the first is rebuilt, resulting in data loss and significant performance degradation. Dealing with data corruption and disk failures will become the standard mode of operation for storage systems, and it should be performed by machines with little human intervention due to the overwhelming storage scale. Moreover, it should cause little data disruption or performance degradation to systems and applications.
In this position paper, we propose a Parallel, Reliable and Scalable Storage Software Infrastructure (PASSI) to support the design and prototyping of next-generation active storage system and input/output environments. The goal is to meet the scaling and resilience needs of extreme scale science by ensuring that storage systems are intelligent (at the disk drive level), always available, and never lose or damage data. PASSI (in Fig. 1) explores the new-generation Ethernet drives to provide a smart, low-power and scaling storage structure; integrates parallel erasure coding for high performance, disk failure prediction for proactive data rescue, and object storage with balanced data placement; and supports metadata technologies for high performance file systems. These storage services are offloaded to the smart Ethernet drives, which makes PASSI unique, novel and attractive.

The rest of this paper is organized as follows. We present Ethernet connected drives in Section II. In Section III, we present the design of the Parallel Erasure Coding and Parallel Data Placement software systems. In Section IV, we propose an S3 (Self-Monitoring, Self-Managing and Self-Healing) software system in PASSI to address the disk failure challenge and DRAM errors of Ethernet drives. In Section V, we describe how we plan to leverage LANL's MDHIM (a parallel key/value store) and MarFS (a near-POSIX metadata technology for Exascale storage systems) in PASSI. In Section VI, we present a policy based power management approach for archive storage systems. We conclude this paper and outline our current and future development activities in Section VII.

978-1-4673-8590-9/15/$31.00 ©2015 IEEE

[Fig. 1 appears here. Recoverable panel titles: Parallel Erasure Coding (MPI based parallel I/O, SIMD encoding, bandwidth and reliability on demand, erasure code workload offloaded to Ethernet drives); S3 (Self-Monitoring, Self-Adapting and Self-Healing) Storage Resilience Management (scalable monitoring, predictive capabilities, fault and fault domain management); Policy Based Power Management; MDHIM parallel key/value store with multi-dimensional indexing; a near-POSIX file server providing a global namespace over POSIX file systems and near-POSIX object storage; Parallel Data Placement (distributed hashing based placement, HMDA, proactive data rescue); Ethernet connected drives (Seagate Kinetic K/V drive, HGST Open Ethernet drive, Toshiba Key/Value drives) with ARM CPU, DDR3 memory, multi-TB disk space and Ethernet connectivity, running embedded Linux and supporting file system, object store, and K/V store interfaces.]


Fig. 1: PASSI software infrastructure.

TABLE 1: Technical features of three different Ethernet connected products.

Seagate: ARM CPU: 32-bit, 700 MHz, 1 core; Memory: 512 MB; OS/Software: K/V store API or using LLVM; Disk/HDD: 4 TB; SSD: planning; Ethernet: one active and one standby link; Platform: closed.
HGST: ARM CPU: 32-bit, 1 GHz, 1 core; Memory: 2 GB DDR3; OS/Software: embedded Linux OS, runs applications; Disk/HDD: 4 TB / 8 TB; SSD: planning; Ethernet: one active link; Platform: open.
Toshiba: ARM CPU: 64-bit, quad-core; Memory: N/A; OS/Software: Linux OS, K/V store (same as Kinetic drive); Disk/HDD: 4 TB (two 2.5" HDDs) for the performance drive, one 3.5" SMR drive for the capacity drive; SSD: two SSDs (performance drive), N/A (capacity drive); Ethernet: two links; Platform: N/A.

II. BUILD ACTIVE AND SCALING STORAGE SYSTEMS USING SMART ETHERNET-CONNECTED DRIVES

Ethernet Drive technology provides a new paradigm shift in storage architecture. Ethernet Drives are storage devices that can support software-defined storage (SDS). There are three different Ethernet connected disk drive technologies (Table 1). These products possess attractive features: 1) An Ethernet drive has an embedded low-power ARM CPU and memory. 2) It runs a Linux OS on the drive. 3) It can run data processing services closer to the storage media. 4) It has an Ethernet access link and IP based connectivity. 5) It has high capacity disk space and/or SSD storage. Ethernet drives enable us to eliminate high-end storage server requirements, cut down the power consumption of systems, and move application services closer to the storage media.

The new Ethernet drives do not merely use Ethernet as a new connection interface; they move the communication protocols from commands on read-and-write data blocks to a higher level of abstraction. Ethernet drives can offload traditional logical block address (LBA) management and provide new data placement schemes for dispersing object data. This new technology is intended to simplify the storage system software stack, allow drive-to-drive data movement, and reduce significant drive I/O activities from servers. In this paper, we use Ethernet drives to build a scale-out storage system and design PASSI to achieve self-management, self-monitoring, and self-healing on this active storage system.

A. Seagate's Kinetic Drives

Seagate's Kinetic drive [1, 11] employs a new access interface and APIs. It uses a key-value approach to store data at the object level on the drives. The Kinetic drive's access interface is an attempt to bypass many layers of the current OS file stack and provide an efficient software stack to accommodate current and future application demands. Seagate also provides an LLVM access interface [27], allowing us to run non-key/value storage services on Kinetic drives.
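To make the object-level key/value access model concrete, the sketch below stores and retrieves checkpoint chunks by key rather than by logical block address. The KeyValueDrive class, its put/get methods, and the key naming scheme are hypothetical stand-ins for illustration only; they are not Seagate's Kinetic API or LLVM interface.

# Illustrative sketch only: the class below is a hypothetical, in-memory stand-in
# for an object-level key/value drive interface; it is NOT Seagate's Kinetic API.

class KeyValueDrive:
    """Minimal stand-in for an Ethernet drive exposing put/get by key."""
    def __init__(self):
        self._store = {}            # key (bytes) -> value (bytes)

    def put(self, key: bytes, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: bytes) -> bytes:
        return self._store[key]

# Store a checkpoint object as keyed chunks instead of logical block addresses.
drive = KeyValueDrive()
data = b"checkpoint payload ..." * 100
chunk_size = 1024
for i in range(0, len(data), chunk_size):
    key = f"ckpt/run42/chunk/{i // chunk_size:08d}".encode()
    drive.put(key, data[i:i + chunk_size])

# Retrieval addresses chunks by key; the drive, not the host, manages block layout.
first_chunk = drive.get(b"ckpt/run42/chunk/00000000")
print(len(first_chunk))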

B. HGST's Open Ethernet Drives

HGST's Open Ethernet drive [2] (aka OED) has been designed as a micro-storage server. An OED has one single-core 32-bit 1 GHz ARM processor, 2 GB of DDR3 DRAM memory, one Gigabit Ethernet link, and 4 TB of disk space. It runs an embedded Linux OS and has extra DRAM memory space and a fast processor to run applications. Moreover, HGST's OED architecture enables us to take advantage of Ethernet drives without having to implement the proposed PASSI software following a vendor-specific API and architecture.

Fig. 2: Architecture of smart Ethernet-Drive storage system.

C. Toshiba's Key/Value Drives

Toshiba has released two Ethernet connected drive products [3, 4, 5]. The new performance based Ethernet drive technology combines two 2.5-inch high-capacity SATA disk drives, two high-performance M.2 SSDs, two Gigabit Ethernet ports, a four-core 64-bit ARM processor, and the Linux OS in a 3.5-inch hard drive enclosure. The other, capacity based Ethernet drive uses an SMR disk for capacity-centric purposes. Toshiba's Ethernet Drives adopt the Kinetic Drive's access APIs, which helps us to port our PASSI software between the two products.

D. Ethernet Drives Meet Scale-Out Requirements

Big data applications rely on being able to easily and inexpensively create, manage, administrate, and migrate unstructured data. Legacy file and storage architectures cannot meet these requirements, and thus cannot provide a high performance and cost-effective solution for Exascale computing [10]. The new Ethernet drive technologies provide a new, promising direction to address the extreme scale-out challenge for bulk storage applications. These include applications of primary storage, unstructured data, information governance, analytical data, active archival, and cold storage.

III. DESIGN PARALLEL ERASURE CODING AND BALANCED DATA PLACEMENT FOR PASSI

Erasure-coded storage systems have become increasingly popular for HPC due to their cost-effectiveness in storage space saving and their high fault resilience capability [28]. However, the existing systems use a single machine or a multi-threaded approach to perform erasure coding, which is not scalable enough to process the gigantic checkpoint/restart data sets of Exascale science applications. To address this scaling challenge, we propose a Parallel Erasure Coding (ParEC) technique which can encode and decode small chunks of a big data object in parallel on Ethernet drives, and a Parallel Data Placement (ParDP) technique that distributes erasure segments evenly to all available Ethernet drives in a system.

A. ParEC - Parallel Erasure Coding

Multi-core processor design has been implemented in today's CPU technology. Each multi-core processor has its own embedded SIMD subsystems [16]. It is a challenging task to achieve high parallelism for erasure coding and maximize CPU utilization. We notice that the System-IDLE state dominates the overall processing time and energy cost compared with the I/O and erasure coding processes. When encoding and decoding a very big file or dataset, a single-process approach is not capable of achieving sustained bandwidth for erasure code computation and data movement in an HPC campaign storage system. We need to fully utilize the processing capacity of a storage cluster and provide parallel I/O capability.
To this end, we propose a parallel erasure code technology and apply it to very large files and datasets, such as checkpoint/restart data and simulation results, for high performance scientific simulation. We extend the existing single-process or single-thread erasure coding method by adding task/function parallelism and data parallelism. We propose a hybrid parallel approach, named Parallel Erasure Coding (ParEC). It uses MPI parallel I/O for task parallelism and the processor's SIMD extensions for data parallelism [28, 29]. We apply this hybrid parallelism to perform compute-intensive erasure coding and to provide a high performance, energy-saving, and cost-effective solution for future all-disk based HPC campaign storage systems. We demonstrate that ParEC can significantly reduce power consumption, improve coding bandwidth, and speed up data placement. We evaluate ParEC on Ethernet drives equipped with 32-bit ARM CPUs and conduct performance and feasibility studies. We compare the vectorization technologies of x86/SIMD and ARM/NEON in terms of performance and energy use for erasure code based HPC campaign storage systems.

With the combination of data parallelism based on SIMD computation and task parallelism based on MPI parallel I/O, parallel erasure coding software can improve the performance and scalability of erasure coding techniques, reduce the archival overhead of extreme-scale scientific data, and reduce power and energy consumption. With ParEC, we can efficiently run more scientific simulations on extreme-scale computers and thus enhance system utilization. Moreover, the saved power and energy can be exploited by other components of the system, which further improves the system throughput in a cost-effective manner.
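The following minimal sketch illustrates the stripe-level parallelism that ParEC builds on. It is not the ParEC implementation: a single XOR parity segment stands in for a general K+M erasure code, Python's multiprocessing pool stands in for MPI ranks and per-drive workers, and numpy vectorization stands in for the SSE/AVX/NEON SIMD kernels.

# Minimal ParEC-style sketch. Assumptions (not the paper's implementation):
# - a single XOR parity segment stands in for a general K+M Reed-Solomon code,
# - Python's multiprocessing pool stands in for MPI ranks / per-drive workers,
# - numpy vectorization stands in for the SIMD (SSE/AVX/NEON) kernels.
import numpy as np
from multiprocessing import Pool

K = 4                      # data segments per stripe
SEG_BYTES = 1 << 20        # 1 MiB segments

def encode_stripe(stripe: np.ndarray) -> np.ndarray:
    """Return the parity segment of one stripe (shape: K x SEG_BYTES)."""
    parity = np.zeros(SEG_BYTES, dtype=np.uint8)
    for seg in stripe:                      # XOR-fold the K data segments
        np.bitwise_xor(parity, seg, out=parity)
    return parity

def encode_file(data: bytes, workers: int = 4) -> list:
    """Split a large object into stripes and encode the stripes in parallel."""
    stripe_bytes = K * SEG_BYTES
    padded = data + b"\0" * (-len(data) % stripe_bytes)
    arr = np.frombuffer(padded, dtype=np.uint8).reshape(-1, K, SEG_BYTES)
    with Pool(workers) as pool:             # task parallelism across stripes
        return pool.map(encode_stripe, list(arr))

if __name__ == "__main__":
    parities = encode_file(b"x" * (10 * 1024 * 1024))
    print(f"encoded {len(parities)} stripes")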
B. ParDP - Parallel Data Placement

For a file with K data segments, ParEC produces M redundant code segments. The K+M encoding scheme can tolerate the loss of up to M data segments and/or code segments. These segments are placed on K+M Ethernet drives according to a placement policy. The reconstruction strategy in case of disk failures is also addressed in the placement policy.

We leverage the Ethernet drive technology and peer-to-peer data placement protocols to build a distributed data storage environment for PASSI. We propose a parallel data placement mechanism called ParDP. ParDP is responsible for placing the data and code segments generated by ParEC onto Ethernet drives. More specifically, we explore distributed hashing methods, in which each Ethernet drive is responsible for a data region on a circular space, called a ring, as shown in Fig. 2. To address the limitation of traditional consistent hashing methods for Exascale storage systems, we enhance consistent hashing by assigning multiple points in the ring to an Ethernet drive and using the concept of virtual drives. A virtual drive is like a physical drive, but each physical drive can be responsible for multiple virtual drives, depending on its capacity. This addresses the heterogeneity issue. When an Ethernet drive is predicted to fail (see Section IV), the data stored on that drive are dispersed to other available drives proactively and smoothly, without data and computation disruption, possibly at different geo-locations. Data blocks can be recovered through parallel rebuilds on Ethernet drives using ParEC to remedy data corruptions. ParDP also records metadata, index, and data placement information for MarFS and MDHIM to manage these metadata (see Section V).
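The sketch below illustrates the placement idea behind ParDP: consistent hashing over a ring with multiple virtual drives per physical drive, so a larger drive owns a proportionally larger share of the key space. The drive identifiers, virtual-drive counts, and SHA-1 hash are illustrative assumptions, not the ParDP implementation.

# Illustrative sketch of ParDP-style placement: consistent hashing with virtual
# drives on a ring. Drive names, weights and the hash choice are assumptions
# for illustration, not the ParDP implementation.
import bisect
import hashlib

class HashRing:
    def __init__(self, drives: dict):
        """drives maps a physical drive id to its number of virtual drives
        (more virtual drives = larger capacity share)."""
        self._points = []                      # sorted (hash, drive) pairs
        for drive, vnodes in drives.items():
            for v in range(vnodes):
                self._points.append((self._hash(f"{drive}#{v}"), drive))
        self._points.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def locate(self, segment_key: str) -> str:
        """Return the drive owning the ring region that contains segment_key."""
        h = self._hash(segment_key)
        i = bisect.bisect(self._points, (h,)) % len(self._points)
        return self._points[i][1]

# An 8 TB drive gets twice the virtual drives of a 4 TB drive (heterogeneity).
ring = HashRing({"edrive-01": 64, "edrive-02": 64, "edrive-03": 128})
for k in range(6):
    print(f"ckpt/run42/seg/{k}", "->", ring.locate(f"ckpt/run42/seg/{k}"))

When a drive is predicted to fail, removing its virtual points from the ring redistributes only the segments it owned, which matches the proactive data rescue behavior described above.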
IV. S3 (SELF-MONITORING, SELF-MANAGING AND SELF-HEALING) STORAGE RESILIENCE MANAGEMENT

The design of the S3 storage resilience management software must satisfy a number of requirements: 1) It should support scalable monitoring and collection of performance, fault, and usage data, so that online analytics can provide key data to storage managers with low overhead. 2) It should provide predictive capabilities for critical events that affect storage reliability; the collected parameters should be correlated with Ethernet drive characteristics to gain a correct assessment of the drive state. 3) It should efficiently manage and report faults and fault domains, so that proactive data rescue mechanisms can be employed promptly to heal the faults. 4) It should improve the resilience of storage software systems; the vulnerability of storage software to soft errors should be characterized and mitigated.

We propose a set of novel techniques for storage resilience assurance that integrate disk health monitoring, disk degradation detection, disk failure prediction, proactive data rescue, and memory error detection and correction with distributed controls over Ethernet drives. Machine learning technologies are explored extensively to automate these management tasks. The novelty of S3 lies in that it is the first of its kind that can automatically discover different types of disk failures and accurately forecast the future occurrences of failures for each type. This is based on a novel concept that we propose, called disk degradation signatures [8], which characterize and quantify the deterioration process of drives from having disk errors to becoming failed. We explore the embedded processors and memory on Ethernet drives to perform in situ analysis of the real-time disk health data, and the on-drive Ethernet ports to exchange analysis results for higher-level aggregation and analysis. We also investigate the correlation between disk workloads and disk degradation, which guides us to develop reliability-aware workload scheduling strategies [17, 32].

The embedded DRAM on Ethernet drives may experience errors and faults at runtime [24]. We propose software based memory integrity checking and correction. We check data integrity on each memory access by using error-checking code, and we use internal data and metadata checksums to detect silent data corruptions. In addition, the PASSI software itself is vulnerable to soft errors. To ensure the resilience of the PASSI software system, we leverage fault injection tools, such as our F-SEFI [30, 31], to evaluate the vulnerability of PASSI to silent data corruptions and design effective mechanisms to mitigate these vulnerabilities.

We propose two sub-systems of S3: health monitoring and degradation analysis (HMDA) and failure prediction and proactive data rescue (FPDR), as shown in Fig. 3.

Fig. 3: Major components of S3.

A. HMDA - Health Monitoring and Degradation Analysis Sub-system

An HMDA daemon on each drive periodically retrieves disk SMART readings, memory and system logs from the attached disk drive. These runtime data are machine-learned to build disk and memory health models and to detect anomalous behaviors [8][18][19][20][21][22][23]. Aggregated disk health data are collected through Ethernet connections from these active drives to HMDA backend processes for higher-level analysis, where disk replacement and memory fault events are automatically cross-referenced with the detected anomalies and monitored health data to build and update the disk degradation signature and memory error models [8][24]. Fig. 4 shows the performance degradation of a disk drive prior to a failure.

[Fig. 4 appears here; it plots the disk health record against the index of health records.]
Fig. 4: Disk performance degradation prior to a disk failure.

B. FPDR - Failure Prediction and Proactive Data Rescue Sub-system

FPDR leverages regression learning and ensemble learning to track disks' degradation and memory health status, and applies the degradation signatures to forecast when disk failures will occur and to detect memory errors. FPDR finds a spare drive and performs proactive data rescue for a failing drive, from which the data are migrated to the spare drive in the background prior to the disk failure. FPDR also corrects the detected memory errors. Ultimately, we apply FPDR to reduce and even avoid the disk-rebuild overhead commonly seen in today's storage systems and to ensure storage systems are always available.

C. Research Issues of HMDA and FPDR Sub-systems

There are critical issues that we investigate for efficient, scalable storage resilience management. 1) How to store the collected runtime disk and memory health data? The runtime disk SMART records and system logs are saved on the attached disk of each Ethernet drive. To limit the footprint of these data, we employ a monitoring window, whose size is configurable. Only the health data inside the current monitoring window are stored locally. Those prior to the current window have already been learned and incorporated in the disk degradation and memory error models. Thus the raw data can be discarded except for those that are correlated with critical disk and memory errors, which are transferred to the HMDA backend processes for bookkeeping and aggregation. We evaluate different sizes of the monitoring window and exploit machine learning to select the best one at runtime. 2) How to handle false negative disk failure predictions? Our disk failure predictors keep tracking the degradation status of each drive. Since disks degrade gradually in a production environment, our preliminary results show that the false negative rate is very low. Some of the false negatives can be logical failures, e.g., caused by corrupted files or file system faults. In these cases, the drives have no physical damage. Diagnostic tools are used to find the software problems and correct them if possible. For missed disk hardware failures, a disk rebuild or parallel disk rebuild is performed. An alternative strategy is to explore parallel erasure coding (ParEC) to rebuild data blocks. We expect the performance degradation to be small because those failures are very rare. 3) How to detect memory errors and silent data corruption in Ethernet drives? Non-chipkill memory technology is currently used in Ethernet drives. Without chipkill and an end-to-end data protection technology, data corruption can go unnoticed, and recovery is difficult and costly, or even impossible to perform. Without end-to-end integrity checking, these silent data corruptions can lead to unexpected and unexplained problems [25]. We apply software based memory integrity checking and correction [26] in addition to DRAM ECC. We check data integrity during memory accesses by using error-checking code, and we use internal data and metadata checksums to detect silent data corruptions.
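As a minimal illustration of the monitoring window and failure forecasting described above, the sketch below keeps a bounded window of one SMART-style health attribute per drive and fits a simple linear trend to estimate the time until a failure threshold is crossed. The attribute, threshold, and linear model are illustrative assumptions; HMDA and FPDR use learned degradation signatures and ensemble models rather than this single-attribute trend.

# Minimal sketch of the per-drive monitoring idea in HMDA/FPDR: keep only a
# bounded window of SMART-style health samples and fit a simple trend to warn
# before a threshold is crossed. The attribute name, threshold and linear trend
# are illustrative assumptions, not the learned degradation-signature models.
from collections import deque

class DriveHealthMonitor:
    def __init__(self, window: int = 96, fail_threshold: float = 50.0):
        self.samples = deque(maxlen=window)   # older samples fall out of the window
        self.fail_threshold = fail_threshold  # e.g., a reallocated-sector budget

    def record(self, value: float) -> None:
        self.samples.append(value)

    def hours_until_failure(self, interval_hours: float = 1.0):
        """Least-squares slope over the window; None if no degradation trend."""
        n = len(self.samples)
        if n < 2:
            return None
        xs = range(n)
        mean_x, mean_y = (n - 1) / 2, sum(self.samples) / n
        num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, self.samples))
        den = sum((x - mean_x) ** 2 for x in xs)
        slope = num / den
        if slope <= 0:
            return None                       # not degrading
        latest = self.samples[-1]
        return max(0.0, (self.fail_threshold - latest) / slope) * interval_hours

mon = DriveHealthMonitor(window=24)
for v in [2, 2, 3, 5, 8, 12, 17, 23]:         # reallocated sectors creeping up
    mon.record(v)
eta = mon.hours_until_failure()
if eta is not None and eta < 48:
    print(f"predicted failure in ~{eta:.0f} h: trigger proactive data rescue")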

[Fig. 5 appears here. Recoverable elements: GPFS metadata servers, object stores, and POSIX file systems accessed through PDM, parallel data movers (built on PFTOOL) spanning GPFS, PFS, NFS, MDS, and object storage, with archive back-ends such as TSM, NFS, and HPSS.]

Fig. 5: MarFS system architecture with metadata management.

V. METADATA SUPPORT IN PASSI

MarFS [6] and MDHIM [7] are two key technologies invented and designed by LANL's researchers to manage metadata for HPC storage systems. We leverage both MarFS and MDHIM in PASSI to support high performance metadata services for the erasure code based Ethernet-drive storage system.

A. Integration with MarFS - A Hybrid Metadata Approach for Non-POSIX and POSIX Storage Systems

Metadata are usually divided into two broad categories: structural metadata, which describe the design and specification of data structures, and descriptive metadata, which comprise all other associated information such as the data's intended use, provenance, associations, and context [10]. MarFS is designed to integrate both structural metadata and descriptive metadata [6]; they are used in MarFS to record the embedded information of the data store, such as data location. MarFS enables the PASSI middleware to provide metadata services for both non-POSIX and POSIX HPC storage. We integrate the MarFS framework into the PASSI software. MarFS is a near-POSIX file system using scale-

B. Integration with MDHIM - A Parallel Key/Value Framework for HPC Storage

We have developed the Multi-dimensional Hashing Indexing Middleware (MDHIM) [7], a parallel key-value store framework designed for HPC systems. MDHIM differs from other key-value stores in that it supports HPC networks natively, uses dynamic servers that start and end with applications, works with pluggable database backends, and orders