LOCKSS Boxes in the Cloud

David S. H. Rosenthal and Daniel L. Vargas
LOCKSS Program, Stanford University Libraries, CA

Abstract

The LOCKSS system is a leading technology in the field of Distributed Digital Preservation. Libraries run LOCKSS boxes to collect and preserve content published on the Web in PC servers with local disk storage. They form nodes in a network that continually audits their content and repairs any damage. Libraries wondered whether they could use cloud storage for their LOCKSS boxes instead of local disks. We review the possible configurations, evaluate their technical feasibility, assess their economic feasibility, report on an experiment in which we ran a production LOCKSS box in Amazon's cloud, and describe some simulations of future costs of cloud and local storage. We conclude that current cloud storage services are not cost-competitive with local hardware for long-term storage, including LOCKSS boxes.

Copyright © 2012 David S. H. Rosenthal

1 Introduction

    "I really worry about everything going to the cloud," [Wozniak] said. "I think it's going to be horrendous. I think there are going to be a lot of horrible problems in the next five years. ... With the cloud, you don't own anything. You already signed it away. ... I want to feel that I own things, ... A lot of people feel, 'Oh, everything is really on my computer,' but I say the more we transfer everything onto the web, onto the cloud, the less we're going to have control over it." [12]

About 14 years ago the LOCKSS (Lots Of Copies Keep Stuff Safe, a trademark of Stanford University) Program at the Stanford University Libraries started building tools for libraries to collect and preserve content published on the Web. About 150 libraries currently run LOCKSS boxes preserving e-journals, e-books, databases, special collections and other content. These boxes are typically modest PC servers with substantial amounts of local disk.

Some of these libraries are members of the US National Digital Stewardship Alliance run by the Library of Congress. These and other libraries asked whether they could use "affordable cloud storage" for their LOCKSS boxes. The Library of Congress funded the LOCKSS team to investigate this question, and this is the final report of this work.

We first describe the relevant features of typical cloud services, then discuss the various possible technical architectures by which a LOCKSS box could use such a service, ruling some out for performance reasons. Applying the typical service's charging model to the remaining architectures rules others out for economic reasons. We then describe an experiment in which we implemented the most cost-effective architecture and ran a production LOCKSS box preserving GLN content in Amazon's cloud service for several months. We report on the pricing history of cloud storage, using it and the costs from the experiment to compare the projected future costs of storing content preserved by LOCKSS boxes in cloud and local storage. We conclude with a discussion of future work, some of which is already under way.

2 Features of Cloud Technology

Amazon, as a typical cloud service, provides the following components relevant here:

• A compute service in which a customer can run a number of virtual machine instances. These instances have the normal resources of a physical PC, but their (virtual) disk storage is limited in size and is evanescent, vanishing without trace when the instance stops for any reason. (This is true for instances backed by S3, not by EBS; but EBS is not reliable enough, see Section 5.) Amazon calls this service Elastic Compute Cloud (EC2).

• An object storage service in which a customer can store arbitrary-sized byte strings. These objects persist, are named by URLs, and are accessed via an API using HTTP PUT, GET and HEAD requests. Depending on the service, the object store may be more or less reliable. Amazon calls this service Simple Storage Service (S3), and offers two levels of reliability: the standard level is designed for 11 nines reliability, and Reduced Redundancy Storage (S3-RRS) is designed for 4 nines reliability.

• A block storage service, in which a customer can store a file system that can be mounted by virtual machine instances running in the compute service. These file systems persist across virtual machine restarts but are designed for performance rather than extreme reliability. Facilities are normally provided by which snapshots of the file systems can be reliably preserved as objects in the object storage service. Amazon calls this service Elastic Block Store (EBS).

2.1 Technical Aspects

It is important to understand the interfaces between these components. Their performance characteristics determine the technical viability of the various system architectures possible for LOCKSS boxes in the cloud:

• The interface between the outside world and the compute service, which is typically as slow as the corresponding interface for a local computer.

• The interface between the outside world and the storage service, which is typically as slow as the corresponding interface for a local computer.

• The interface between the compute service and the object storage service, which is typically much slower than the interface between a local computer and its disk, or a local storage area network.

• The interface between the compute service and the block storage service, which is typically about the same speed as the interface between a local computer and its disk.

• The interface between the block storage service and the object storage service, whose performance is typically not critical.

These relative speeds are easy to verify by measurement, as sketched below.
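As an illustration, the following minimal Python sketch times a streamed GET of one object and reports throughput. The URL is a hypothetical placeholder, and the sketch is our addition rather than code from the study; running the same probe from a local machine and from a compute-service instance compares the interface speeds listed above.

```python
# Illustrative throughput probe: time a streamed GET of one object.
# The URL is a hypothetical placeholder, not a real bucket.
import time
from urllib.request import urlopen

URL = "https://example-bucket.s3.amazonaws.com/test-object"  # hypothetical

def probe(url, chunk_size=1 << 20):
    start = time.time()
    total = 0
    with urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.time() - start
    print("%d bytes in %.1f s: %.2f MB/s" % (total, elapsed,
                                             total / (1e6 * elapsed)))

probe(URL)
```

The same probe pointed at a file on block storage, read through the local file system, gives the baseline against which the object-store numbers can be compared.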
2.2 Economic Aspects

The charging models for the various services determine the economic viability of the various architectures possible for LOCKSS boxes in the cloud:

• The compute service typically levies charges based on the size of the virtual machine and the time for which it is running.

• The object storage service typically levies charges based on the total size of the objects stored, the time for which they are stored, the GET, PUT and other HTTP operations invoked on them, and the reliability level at which they are stored.

• The block storage service typically levies charges based on the total amount of storage consumed by the file systems, the time for which it is stored, and the number of I/O transactions between it and the compute service.

• The service typically also levies charges based on the amounts of data transferred in each direction across the interface between the compute and storage services and the outside world.

3 Technical Architectures

A LOCKSS box consists of some storage holding the preserved content in a repository, and a daemon, software running in a computer that accesses the repository to perform the preservation functions described in the OAIS Reference Model [9], including ingest, dissemination, integrity checking, damage repair and management as part of a peer-to-peer network [13].

The following architectures are feasible using current compute and storage service capabilities:

• A LOCKSS daemon running in a local machine with storage in an object storage service. We have modified the LOCKSS daemon's repository to use the S3 object storage interface and run experiments against S3, and also against Eucalyptus and the Internet Archive's object storage service, both of which are mostly compatible with S3. The performance we observe disqualifies this architecture; extracting the content from S3 via its Web interface for integrity checking and dissemination is too slow.

• A LOCKSS daemon running in a compute service virtual machine instance with storage in the object storage service. The performance of this architecture led us to reject it; even from the compute service the I/O performance of the object storage service is much slower than local disk.

• A LOCKSS daemon running in a compute service virtual machine instance with storage in the virtual machine's disk. We implemented this architecture in EC2 with an S3-backed virtual machine. It is disqualified for two reasons: if the instance stopped for any reason the preserved content would be entirely lost, and the total space available is too small for practical use.

• A LOCKSS daemon running in a compute service virtual machine instance with storage in the block storage service. We implemented this architecture in EC2 with EBS storage, but concerns about the unknown reliability of EBS and the cost of recovering from a total failure of EBS over the network from other LOCKSS boxes led us to prefer the next architecture.

• A LOCKSS daemon running in a compute service virtual machine instance with storage in the block storage service and a snapshot preserved in the object storage service at regular intervals. We implemented this architecture in EC2 with EBS storage and snapshots in S3 and ran it for several months; see Section 5.

We have examined some other architectures that require enhanced capabilities from the object storage service. They are discussed briefly in Section 7.2.

4 Economic Considerations

We now apply the Amazon charging model to the preferred architecture:

• A LOCKSS daemon running in a compute service virtual machine instance with storage in the block storage service and a snapshot preserved in the object storage service at regular intervals.

The following costs could be incurred (a cost-model sketch follows this list):

• EC2 charges for the virtual machine running the box.

• EBS charges for storing the content.

• EBS charges for I/Os to and from the content.

• EBS charges for storing the backup snapshot.

• Charges for inbound network usage for collecting the content. In Amazon's case, inbound network usage is free.

• Charges for outbound network usage for disseminating the content.

• Charges for bidirectional network usage for the LCAP voting protocol. In Amazon's case, inbound network usage is free.

• Charges for outbound network usage for repairing content at other boxes.

• Charges for inbound network usage for repairing content at this box from other boxes. In Amazon's case, inbound network usage is free.
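To make the charging model concrete, here is a minimal sketch that totals these components for one month. All the prices are illustrative placeholders in the style of Amazon's 2012 price list, not the actual tariff; a real estimate must substitute current prices.

```python
# Minimal monthly cost model for the preferred architecture.
# All prices are illustrative placeholders, NOT Amazon's actual tariff.
PRICES = {
    "instance_per_hour": 0.32,       # virtual machine (assumed rate)
    "ebs_per_gb_month": 0.10,        # block storage (assumed rate)
    "ebs_per_million_ios": 0.10,     # block storage I/O (assumed rate)
    "snapshot_per_gb_month": 0.095,  # snapshots in object storage (assumed)
    "net_out_per_gb": 0.12,          # outbound transfer (assumed rate)
    "net_in_per_gb": 0.0,            # inbound transfer is free on AWS
}

def monthly_cost(content_gb, snapshot_gb, million_ios, gb_out, gb_in,
                 hours=720, p=PRICES):
    return (hours * p["instance_per_hour"]
            + content_gb * p["ebs_per_gb_month"]
            + million_ios * p["ebs_per_million_ios"]
            + snapshot_gb * p["snapshot_per_gb_month"]
            + gb_out * p["net_out_per_gb"]    # dissemination, voting, repairs out
            + gb_in * p["net_in_per_gb"])     # ingest and repairs in (free here)

# e.g. an 82GB box with one snapshot, modest I/O and traffic:
print("$%.2f/month" % monthly_cost(82, 82, 10, 20, 40))
```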

5 Experimental Results

We implemented an Amazon Machine Image (AMI) containing our recommended configuration for a LOCKSS box. It is backed by S3 and configured by a boot bucket which specifies the following system parameters:

• AWS Access Key.
• AWS Secret Access Key.
• S3 bucket backing the image.
• Name of the .tar.gz archive in the S3 bucket containing the LOCKSS configuration.
• A comma-delimited list of volume ids to automatically attach and mount from EBS.

It mounts file systems from EBS containing the preserved content. Because EBS is not reliable over the long term, a snapshot of each entire EBS file system is preserved in S3. This snapshot is updated at regular intervals. Note that only content changed since the previous snapshot is transferred in this process.

We configured a LOCKSS box using this AMI to preserve a sample of open access content from the LOCKSS GLN, so that it participated fully in the GLN although with less content (82GB) than the median GLN LOCKSS box (1.58TB). Snapshots of the EBS file system containing this content were preserved in S3 at 12-hourly intervals. Once a snapshot had been successfully stored, older snapshots were deleted, leaving at most two.
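The rotation policy is simple enough to sketch. The function below is our minimal illustration, assuming snapshots are represented as (id, start_time) pairs supplied by whatever EC2 client library is in use; it is not the code the experimental box actually ran.

```python
# Minimal sketch of the experiment's snapshot rotation: keep the
# newest `keep` completed snapshots of a volume, delete the rest.
# `snapshots` are (snapshot_id, start_time) pairs from any EC2 client;
# `delete` is whatever deletion call that client provides.
def rotate_snapshots(snapshots, delete, keep=2):
    ordered = sorted(snapshots, key=lambda s: s[1], reverse=True)
    for snapshot_id, _ in ordered[keep:]:
        delete(snapshot_id)
    return [sid for sid, _ in ordered[:keep]]

# Example with a stub deleter:
snaps = [("snap-1", "2012-05-01T00:00"), ("snap-2", "2012-05-01T12:00"),
         ("snap-3", "2012-05-02T00:00")]
kept = rotate_snapshots(snaps, delete=lambda sid: print("deleting", sid))
print("kept:", kept)  # the two most recent, as in the experiment
```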
We ran this box using a separate Amazon account created for the experiment and monitored the costs incurred by the account every week for 14 weeks. The results are shown in Table 1. (We eliminated week 1 because there were some start-up effects. The anomalous storage cost drop in weeks 8 and 12 of Tables 1, 2 and 3 was caused by a glitch in Amazon's accounting system that resulted in missing accounting records.) The experimental LOCKSS box was empty at the start, and was ingesting content during the experiment. During week 11 it finished ingesting, as reflected in the decrease in bandwidth charges, which were primarily for writing the new data to EBS and the snapshots to S3.

Week   Compute $   Storage $   Bandwidth $   Total $
  2      53.76        2.43         0.16        56.36
  3      53.76        2.65         0.16        56.57
  4      53.76        2.83         0.18        56.77
  5      53.76        2.98         0.19        56.93
  6      53.76        3.13         0.22        57.10
  7      53.76        3.14         0.28        57.18
  8      53.76        2.94         0.24        56.94
  9      53.76        3.58         0.25        57.59
 10      53.76        3.71         0.44        57.92
 11      53.76        3.86         0.39        58.02
 12      53.76        3.24         0.45        57.45
 13      53.76        3.99         0.33        58.08
 14      53.76        4.00         0.40        58.16

Table 1: Costs incurred by the experimental LOCKSS box in AWS.
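As a quick sanity check on Table 1, the flat weekly compute charge implies a constant hourly on-demand rate, and each row's total is the sum of its components to within a cent of rounding. The snippet below is our verification arithmetic, assuming nothing beyond the table itself:

```python
# Sanity check on Table 1: the flat weekly compute charge implies a
# constant hourly on-demand rate of $53.76 / 168h = $0.32/hr.
weekly_compute = 53.76
hours_per_week = 24 * 7
print("implied on-demand rate: $%.3f/hr" % (weekly_compute / hours_per_week))

# Totals are the row sums (to within a cent of rounding), e.g. week 11:
row = {"compute": 53.76, "storage": 3.86, "bandwidth": 0.39}
print("week 11 total: $%.2f" % sum(row.values()))
```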

This experimental box was not representative in several ways:

• It preserved less content than the median LOCKSS box in the GLN and thus consumed less storage and less bandwidth than a production box would have.

• Out of caution, the experimental box maintained two snapshots of its EBS volumes in S3; only one is necessary.

• Because we were only running the experimental box for a short period we used an on-demand instance, so our compute charges were higher than they should have been. A production box would use a reserved instance, which requires an up-front payment and a commitment for either 1 or 3 years. Since LOCKSS boxes are continually active, this would need to be a heavy-utilization reserved instance with a 3-year commitment. The current cost for a suitable instance would be $1200 plus $0.052/hr [2].

• Finally, there is the question of over-provisioning storage to cope with growth. As its collection grows, the disks of a physical LOCKSS box will fill up. Eventually, additional disks must be provided, and at that time it makes sense to buy the biggest disks available. Thus physical LOCKSS boxes add storage in large discrete increments, and are typically over-provisioned by at least half the size of their most recent disk addition. In fact, the median LOCKSS box is currently over-provisioned by about 2TB. In the cloud, storage can be added in smaller units more frequently, so the amount of over-provisioning can be less. We are not able to quantify this effect exactly.

We adjusted the results from Table 1 to model a production LOCKSS box having already ingested its content and using a reserved instance. To illustrate the range of costs implied by different amounts of over-provisioning, we modeled both minimal over-provisioning (Table 2) and over-provisioning matching that of the median LOCKSS box (Table 3). The total 3-year cost in the minimal over-provisioning case would be about $8,770 and in the matching over-provisioning case about $18,130. We would expect actual costs between these two, and probably closer to the minimum; a sketch of the shape of this projection follows Table 3.

Week   Compute $   Storage $   Bandwidth $   Total $
  2       8.74       37.63         3.18        49.54
  3       8.74       39.75         3.10        51.59
  4       8.74       41.47         3.52        53.72
  5       8.74       42.91         3.60        55.25
  6       8.74       44.35         4.17        57.25
  7       8.74       44.46         5.31        58.51
  8       8.74       42.54         4.63        55.90
  9       8.74       48.66         4.91        62.31
 10       8.74       49.99         8.53        67.26
 11       8.74       51.44         7.61        67.78
 12       8.74       45.38         8.75        62.87
 13       8.74       52.64         6.35        67.72
 14       8.74       52.72         7.81        69.26

Table 2: Costs that would have been incurred by the median LOCKSS box in AWS with minimal over-provisioning.

Week   Compute $   Storage $   Bandwidth $   Total $
  2       8.74       85.14         3.18        97.05
  3       8.74       87.26         3.10        99.10
  4       8.74       88.98         3.52       101.23
  5       8.74       90.42         3.60       102.75
  6       8.74       91.86         4.17       104.76
  7       8.74       91.97         5.31       106.01
  8       8.74       90.05         4.63       103.41
  9       8.74       96.17         4.91       109.82
 10       8.74       97.50         8.53       114.77
 11       8.74       98.95         7.61       115.29
 12       8.74       92.89         8.75       110.37
 13       8.74      100.14         6.35       115.23
 14       8.74      100.23         7.81       116.77

Table 3: Costs that would have been incurred by the median LOCKSS box in AWS with matching over-provisioning.
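The structure of such a 3-year projection is sketched below, assuming the heavy-utilization reserved-instance terms quoted above ($1200 up front plus $0.052/hr) and a steady weekly storage-plus-bandwidth charge. The input rate here is a placeholder; the $8,770 and $18,130 figures come from the paper's own adjustment of Table 1, which this sketch does not attempt to reproduce exactly.

```python
# Structure of a 3-year cost-of-ownership projection under the
# reserved-instance terms quoted above. Placeholder input rates,
# not the paper's exact adjustment procedure.
WEEKS_3YR = 156

def three_year_cost(weekly_storage_and_bandwidth,
                    upfront=1200.0, hourly=0.052):
    weekly_compute = hourly * 24 * 7           # about $8.74/week
    return upfront + WEEKS_3YR * (weekly_compute
                                  + weekly_storage_and_bandwidth)

print("$%.0f" % three_year_cost(40.0))   # placeholder weekly rate
```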

6 Projecting The Cost Of Cloud Storage

Table 4 shows the history of the prices charged by several major storage services. It shows that they drop at most 3%/yr. This is in stark contrast with the 30-year history of raw disk prices, which have dropped at least 30%/yr, as predicted by Kryder's Law [21].

Service     Launch   Launch $/GB/mo   Now $/GB/mo   % Drop per yr
Amazon S3    03/06        0.15            0.125           3
Rackspace    05/08        0.15            0.15            0
Azure        11/09        0.15            0.14            3
Google       10/11        0.13            0.13            0

Table 4: Price history of storage services (base tier).

This comparison is somewhat unfair to S3. Amazon has used the decrease in storage costs to implement a tiered pricing model; over time larger and larger tiers with lower prices have been introduced. The price of the largest tier, now 5PB, has dropped about 10% per year; prices of each tier once introduced have been stable or dropped slowly.

Nevertheless, it is clear that the benefits of the decrease in raw storage prices are not going to cloud storage customers. Backblaze provides unlimited backup for personal computers for a fixed price, currently $5/mo. Before the floods in Thailand, they documented the build cost of their custom storage hardware at under $8K for 135TB [15]; correcting for the current 60% increase in disk prices since the floods [11] would make this $11.2K. Given S3's dominance of the cloud storage market, and thus its purchasing volumes, it is very unlikely that their costs are higher than Backblaze's. Despite this, 135TB in S3-RRS costs more than $10K/mo. In the first month, an S3-RRS customer would pay almost as much as it would cost after the floods to buy the necessary hardware.

Work, discussed in Section 7.1, is under way to build economic models of long-term data storage. Based on an initial model, and using interest rates from the last 20 years, Figure 1 compares the endowment in current dollars needed to fund storing 135TB for 100 years at varying rates of annual price drop:

• In red, using Amazon's S3 and assuming storage charges only.

• In green, maintaining three copies in local storage using Backblaze's costs for the hardware and assuming that the non-media costs are 2/3 of the total [14].

[Figure 1: Projected 135TB 100yr endowment for S3 and local storage. Endowment ($K, 0-3000) against Kryder rate (%/yr, 0-40).]

It is evident that, in the long term, the rate of price drop dominates all other parameters. Note that S3 is not competitive with local storage at any plausible Kryder rate. Unless the rate at which storage service prices drop comes into line with that of raw media costs, these services cannot compete with local provision for long-term storage; the endowment needed at the historic 3% rate for S3 is more than five times that needed at the disk industry's roadmap projection of a 20% rate for local storage.
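Figure 1 comes from the prototype long-term model [16]. A drastically simplified, deterministic version is sketched below to show the mechanics: yearly storage costs decline at the Kryder rate, future costs are discounted at a constant interest rate, and the endowment is the sum. The parameter values are placeholders and the real model is stochastic, so this sketch will not reproduce the published curves.

```python
# Drastically simplified endowment model: the lump sum needed now to
# pay for storage for `years` years, when the yearly cost declines at
# the Kryder rate and future costs are discounted at a fixed interest
# rate. Placeholder parameters; the model behind Figure 1 is richer.
def endowment(first_year_cost, kryder_rate, interest_rate, years=100):
    total = 0.0
    for y in range(years):
        cost = first_year_cost * (1.0 - kryder_rate) ** y
        total += cost / (1.0 + interest_rate) ** y
    return total

s3_monthly_per_tb = 95.0          # placeholder $/TB/mo, S3-like
first_year = 135 * s3_monthly_per_tb * 12
for k in (0.03, 0.20, 0.30):
    print("Kryder %2d%%: endowment $%.0fK"
          % (100 * k, endowment(first_year, k, interest_rate=0.02) / 1000))
```

Even this toy version shows the dominant effect of the Kryder rate: the endowment at a 3% rate is several times that at a 20% rate.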
7 Future Work

This investigation led to further work in both economics and technology.

7.1 Economics

Comparing the economics of local storage, which has both purchase and running costs, with cloud storage, which has only running costs, involves comparing expenditures through time. The standard technique for doing so is Discounted Cash Flow (DCF), which allows a series of incomes or expenditures through time to be reduced to a Net Present Value subject to an assumed constant interest rate.

Recent research has thrown serious doubt upon both the practical usefulness and the theoretical basis of DCF. Its practical usefulness is suspect because it involves choosing a discount rate, an interest rate that will apply for the duration. In practice, people applying DCF choose unrealistically high interest rates, making investment in long-term projects excessively difficult [8]. Its theoretical basis is suspect because the single constant interest rate averages out the effect of periods of very high or (as now) very low interest rates. This would be correct if the outcome were linearly related to the interest rate, but it isn't [5].

It became apparent that realistic projections of the costs of future storage technologies required more sophisticated techniques than DCF. An effort to build Monte Carlo models of storage costs is now under way with participants from the LOCKSS Program, U.C. Santa Cruz, Stony Brook University and Network Appliance. The early stages of this work involved two prototype economic models, one short-term and one long-term [16]; the long-term model was used to produce the graphs of Section 6.
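The nonlinearity argument is easy to demonstrate numerically. The sketch below, our illustration with placeholder numbers, computes the Net Present Value of a fixed expenditure stream first at a constant rate and then under a varying rate path with the same average; the two values differ, which is why a single constant rate is a poor model.

```python
# NPV of a constant expenditure stream under (a) one constant interest
# rate and (b) a varying rate path with the same 4% average. The
# results differ because NPV is not linear in the rate.
def npv(cashflows, rates):
    value, discount = 0.0, 1.0
    for cost, rate in zip(cashflows, rates):
        value += cost * discount      # payment at the start of the period
        discount /= (1.0 + rate)      # then discount the next period
    return value

costs = [1000.0] * 20                     # $1000/yr for 20 years
constant = [0.04] * 20                    # constant 4%
varying = [0.01] * 10 + [0.07] * 10       # same 4% average
print("constant 4%%: %.0f" % npv(costs, constant))
print("1%% then 7%%: %.0f" % npv(costs, varying))
```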
Al- ever, is that using S3 directly is not significantly cheaper than though it is possible to use the HTTP HEAD operation to ask using EBS backed by S3 with minimal over-provisioning. The the service for the checksum of an object, this is not in fact use- compression and de-duplication capabilities of the mechanism ful as an integrity check [17]. Although one might hope that the for preserving snapshots of EBS volumes in S3 are remark- service would re-compute the hash for each HEAD request, it ably effective in reducing the cost of doing so. Thus, given need not do so. The service could respond with a correct hash 4These numbers were computed by assuming that the box started to this request by remembering the hash it computed when the in week 1 with the 1.58TB content of the median box, and ingested object was created without ever storing the object. Thus, an in- at the same rate, governed by per-publisher crawl rate limits, as our tegrity check has no option but to retrieve the entire object from experimental box. Week Compute Storage Bandwidth Total 3000 S3 $ $ $ $ Local Glacier no check 1 7.64 3.70 0.52 11.86 2500 Glacier 4 month 2 8.74 3.76 0.71 13.21 Glacier 20 month 3 8.74 3.83 0.73 13.29 2000

7.3 Amazon's Glacier

Just as we were finishing this paper, Amazon announced Glacier [3], a service specifically addressing the long-term storage market. By doing so, they vindicated the arguments above.

Glacier has some excellent features. In return for accepting a significant access latency (Amazon says typically 3.5 to 4.5 hours, but does not commit to a maximum latency), the service provides one low price of $0.01/GB/mo. 5% of the data can be accessed each month with no charge, which provides for local data integrity checks at 20-month intervals at no cost other than the compute costs in EC2 needed to perform them. Other than that, the charging model is complex [6], but each access above this costs at least as much as a month's storage. There is a $0.05 fee for each 1000 requests, making it important that archived objects are large.

Given the limited information about Glacier available, some significant assumptions were needed to generate Table 6, but the cost reduction it shows relative to S3 (Table 5) is plausible. Among these were that the LOCKSS box based on Glacier was able to operate under Glacier's limit of 5% of the data being accessed in a month, which means that content would on average be polled less frequently than once every 20 × (Q + 1) months, where Q is the poll quorum. The analogous computation to those in Section 5 based on these numbers suggests that the total 3-year cost of ownership of a hypothetical LOCKSS box based on Glacier would be about $2,730. However, the assumptions above mean that this is a considerable under-estimate. For example, even though Glacier is taking aggressive measures to protect the data underneath the LOCKSS box, if the quorum were 5, content would be polled on only once in 10 years, which is clearly inadequate; the arithmetic is made explicit in the sketch following Table 6.

Week   Compute $   Storage $   Bandwidth $   Total $
  1       7.64        3.70         0.52        11.86
  2       8.74        3.76         0.71        13.21
  3       8.74        3.83         0.73        13.29
  4       8.74        3.89         0.94        13.56
  5       8.74        3.96         1.00        13.69
  6       8.74        4.02         1.18        13.93
  7       8.74        4.09         1.52        14.35
  8       8.74        4.15         1.32        14.21
  9       8.74        4.22         1.53        14.48
 10       8.74        4.28         3.19        16.21
 11       8.74        4.35         2.92        16.00
 12       8.74        4.41         3.57        16.71
 13       8.74        4.48         2.71        15.92
 14       8.74        4.54         3.31        16.59

Table 6: Costs that would have been incurred by the median LOCKSS box with storage in Glacier.
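The poll-frequency bound is simple arithmetic, made explicit below. The only input is the poll quorum Q; the factor of 20 comes from the 5% monthly free allowance (reading all content once consumes 20 months of allowance), and the Q + 1 factor reflects the poller plus Q voters each reading the content, which is our reading of the formula in the text.

```python
# Poll-interval bound under Glacier's free allowance: 5% of the data
# may be read each month, so polls on any piece of content can average
# no more than one per 20 * (Q + 1) months, Q being the poll quorum.
def min_poll_interval_months(quorum):
    return 20 * (quorum + 1)

for q in (1, 3, 5):
    months = min_poll_interval_months(q)
    print("Q = %d: one poll per %d months (%.1f years)"
          % (q, months, months / 12.0))
# Q = 5 gives one poll per 120 months, i.e. once in 10 years,
# which the text above notes is clearly inadequate.
```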

Further, implementing a LOCKSS daemon that used Glacier as a repository would require addressing two complex issues caused by the long delay and high cost of accessing content:

• Most of the methods by which a LOCKSS box can disseminate content, including proxying, serving, URL resolvers [7] and Memento [20], are optimized for direct access by readers to small, individual files. Both the long delay and the high per-request cost make Glacier unsuitable for this purpose whatever preservation technology is employed. LOCKSS boxes can export entire AUs as a WARC [10] archive. The user interface for doing so would have to be changed to cope with the delay, and to provide the user with some indication of the cost, but this would be a viable way to disseminate content.

• LOCKSS has a sophisticated mechanism for scheduling its operations, but the current policy that drives the mechanism would not be suitable for a repository with a 3-5 hour access latency and severe cost penalties for spikes in request rates. An alternate policy would need to be developed. The current asynchronous nature of LOCKSS boxes in a network might need to be changed so that boxes operated synchronously, all on the same AU at the same time. Private LOCKSS networks, in which the quorum is close to the total number of peers, already have a de facto synchronous policy.

The complexities of Glacier's pricing for access to data [6] make it difficult to include integrity checking in a model. In order to compute Figure 2 (compare with Figure 1) we made the following assumptions (a scheduling sketch illustrating the second one follows the figure placeholder):

• No accesses to the content other than for integrity checks.

• Accesses to the content for integrity checking are generated at a precisely uniform rate. This is important because Glacier's data access charges for each month are based on the peak hourly rate during that month.

• Each request is for 1GB of content. This is important because Glacier charges for each request in addition to the charge for the amount of data requested.

• In each month 5% of the content may be accessed without an access charge, but the requests to do so are charged the normal request fee.

[Figure 2: Projected 135TB 100yr endowment for S3, Glacier (integrity checked every 4 and 20 months and never) and local storage. Endowment ($K, 0-3000) against Kryder rate (%/yr, 0-40).]
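Because Glacier bills data access on the peak hourly rate in a month, an integrity-check scheduler must smooth requests rather than batch them. The sketch below spreads one full pass over the archive evenly across a chosen check interval; it illustrates the uniform-rate assumption above and is not LOCKSS's actual scheduler.

```python
# Spreading integrity checks evenly so the peak hourly retrieval rate
# (which sets Glacier's access charge) equals the average rate.
def hourly_check_schedule(archive_gb, interval_months,
                          request_gb=1.0, hours_per_month=720):
    total_hours = interval_months * hours_per_month
    gb_per_hour = archive_gb / total_hours
    requests_per_hour = gb_per_hour / request_gb
    return gb_per_hour, requests_per_hour

gb_h, req_h = hourly_check_schedule(135000, interval_months=20)
print("%.2f GB/hour, %.2f requests/hour" % (gb_h, req_h))
# 135TB checked once per 20 months needs about 9.4 GB/hour; any
# burstier schedule raises the peak rate and hence the charge.
```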
Amazon pitches Glacier at the digital preservation market:

    Digital preservationists in organizations such as libraries, historical societies, non-profit organizations and governments are increasing their efforts to preserve valuable but aging digital content such as websites, software source code, video games, user-generated content and other digital artifacts that are no longer readily available. These archive volumes may start small, but can grow to petabytes over time. Amazon Glacier makes highly durable, cost-effective storage accessible for data volumes of any size. This contrasts with traditional data archiving solutions that require large scale and accurate capacity planning in order to be cost-effective.

They are right that Glacier's advantage in this market is that it avoids the need for capacity planning. The announcement does not, however, address the point illustrated in Section 7.1, which is that the long-term cost-competitiveness of cloud storage services such as Glacier depends not on their initial pricing, but on how closely their pricing tracks the Kryder's Law decrease in storage media costs.

Although Glacier with 4-monthly integrity checks is competitive with local storage at all Kryder rates, this assumes that they both experience the same Kryder rate. If Glacier experiences Amazon's historic 3% rate and local storage the industry's projection of 20%, Glacier is nearly 2.5 times more expensive.

It is anyone's guess how quickly Amazon will drop Glacier's prices as the underlying storage media costs drop. Based on the history of Amazon's pricing, we surmise that it will follow the pattern of S3's pricing. That is:

• The initial price will be set low, even at a loss, so that competitors are deterred and Amazon captures the vast bulk of the market.

• Subsequent media price drops will not be used to reduce prices all round, but will be used to create tiered prices, so that the part of the value of lower media prices not captured by Amazon itself goes to their largest customers.

If this surmise is correct, then the only way to make Glacier competitive with local storage is to extend the interval between integrity checks enough that all accesses to data are covered by the 5% monthly free allowance. The shortest interval that can possibly achieve this is 20 months, although in practice some margin for error would be needed and thus a more practical interval would be 24 months. The 20-month line in Figure 2 suggests that this makes Glacier at a 3% Kryder rate somewhat cheaper than local storage at a 20% rate, but even if the assumptions above were to be true, this is not an apples-to-apples comparison.
8 Conclusion

The 30-year history of raw disk costs shows a drop of at least 30% per year. The history of cloud storage costs from commercial providers shows that they drop at most 3% per year. Until there is a radical change in one or other of these cost curves, it is clear that cloud storage is not even close to cost-competitive with local disk storage for long-term preservation purposes in general, and LOCKSS boxes in particular.

This makes the possible technical architectures by which LOCKSS boxes could use cloud storage irrelevant. Nevertheless, we have implemented and tested the most cost-effective architecture for a LOCKSS box in the current Amazon environment, and have made this implementation available for use by anyone who disagrees with our conclusions.

We repeat our earlier [17] proposal for a simple extension to the current S3 API that would greatly improve the suitability of S3 and similar services, such as private cloud implementations, for digital preservation.

Amazon's introduction of Glacier, a cloud-based archival storage service, supports our conclusion that existing cloud storage services are not competitive for long-term preservation. If customers can tolerate substantial delays and costs for user access to data, and if either Glacier's prices drop in line with media costs, or customers are satisfied with infrequent integrity checks, it will be very competitive.

The Blue Ribbon Task Force [4] and other investigations of the sustainability of digital preservation emphasize that preservation cannot be justified as an end in itself, only as a way to provide access to the preserved content. The local disk case provides practical access; the Glacier case does not. The long latency between requesting access and obtaining it, and the severe economic penalties for unpredictable or large-volume accesses, mean that Glacier cannot alone be a practical digital preservation system. At least one copy of the content must be in a system that is capable of:

• Providing low-latency access for users of the content. Otherwise the preservation of the content cannot be justified.

• Being a source for bulk transfer of the content to a Glacier competitor. Getting bulk data out of Glacier quickly is expensive, equivalent to between 5 months and a year of storage, which provides a powerful lock-in effect.

As an example of the costs of a practical system using Glacier but providing access and guarding against lock-in, if we maintain one copy of our 135TB example in Glacier with 20-month integrity checks experiencing a 3% Kryder rate, and one copy in local storage experiencing a 20% Kryder rate (instead of the three in our earlier local storage examples), the endowment needed would be $517K. The endowment needed for three copies in local storage at a 20% Kryder rate would be $486K. Given the preliminary state of our economic model, this is not a significant difference. Replacing two copies in local storage with one copy in Glacier would not significantly reduce costs; instead it might increase them slightly. Its effect on robustness would be mixed, with 4 versus 3 total copies (effectively triplicated in Glacier, plus local storage) and greater system diversity, but at the cost of less frequent integrity checks.

Customers, especially those with a short planning horizon, are likely to look only at the short-term costs and choose a Glacier-only solution that will lock them in, be no cheaper in the long run, and will fail to provide adequate access to the content to justify its preservation. But it will be difficult for other vendors to compete with Amazon in this space; they are trading on the short-termism identified by Haldane and Davies [8].

Acknowledgments

This work was performed under contract GA10C0061 from the Library of Congress' NDIIPP program. Special thanks are due to Leslie Johnston and Jane Mandelbaum of the Library of Congress, to Tom Lipkis and Thib Guicherd-Callin of the LOCKSS Program, and to Ethan Miller, Ian Adams and Daniel Rosenthal of U.C. Santa Cruz.

References

[1] Ian F. Adams, Ethan L. Miller, and Mark W. Storer. Analysis of workload behavior in scientific and historical long-term data repositories. Technical Report UCSC-SSRC-11-01, University of California, Santa Cruz, March 2011.

[2] Amazon. Amazon EC2 Reserved Instances. https://aws.amazon.com/ec2/reserved-instances/#3, 2012.

[3] Amazon. Amazon Glacier. http://aws.amazon.com/glacier/, August 2012.

[4] Blue Ribbon Task Force on Sustainable Digital Preservation and Access. Sustainable Economics for a Digital Planet. http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf, April 2010.

[5] J. Doyne Farmer and John Geanakoplos. Hyperbolic Discounting is Rational: Valuing the Far Future With Uncertain Discount Rates. Technical Report 1719, Cowles Foundation, Yale University, August 2009.

[6] Klint Finley. Is There a Landmine Hidden in Amazon's Glacier? Wired, August 2012.

[7] Philip Gust. Accessing LOCKSS Content Through OPACS and Link Resolvers. Technical report, LOCKSS Program, Stanford University Libraries, November 2010.

[8] Andrew G. Haldane and Richard Davies. The Short Long. In 29th Société Universitaire Européenne de Recherches Financières Colloquium: New Paradigms in Money and Finance?, May 2011.

[9] ISO. Reference model for an open archival information system: ISO 14721:2003. http://www.iso.org/iso/catalogue_detail.htm?csnumber=24683, 2003.

[10] ISO. ISO 28500:2009 Information and documentation - WARC file format. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=44717, May 2009. Draft at http://bibnum.bnf.fr/WARC/warc_ISO_DIS_28500.pdf.

[11] Paul Kunert. Hard disk drive prices quick to rise, slow to fall. The Register, May 2012.

[12] Robert MacPherson. Apple co-founder Wozniak sees trouble in the cloud. http://phys.org/news/2012-08-apple-co-founder-wozniak-cloud.html, August 2012.

[13] Petros Maniatis, Mema Roussopoulos, T. J. Giuli, David S. H. Rosenthal, and Mary Baker. The LOCKSS peer-to-peer digital preservation system. ACM Trans. Comput. Syst., 23(1):2-50, 2005.

[14] Richard L. Moore, Jim D'Aoust, Robert H. McDonald, and David Minor. Disk and Tape Storage Cost Models. In Archiving 2007, May 2007.

[15] Tim Nufire. Petabytes on a Budget v2.0: Revealing More Secrets. http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/, July 2011.

[16] Daniel C. Rosenthal, David S. H. Rosenthal, Ethan L. Miller, Ian F. Adams, Mark W. Storer, and Erez Zadok. Toward an Economic Model of Long-Term Storage. In FAST 2012 Work-In-Progress, February 2012.

[17] David S. H. Rosenthal. LOCKSS: Lots Of Copies Keep Stuff Safe. In NIST Digital Preservation Interoperability Framework Workshop, March 2010.

[18] Mehul A. Shah, Mary Baker, Jeffrey C. Mogul, and Ram Swaminathan. Auditing to Keep Online Storage Services Honest. In HOTOS XI, 11th Workshop on Hot Topics in Operating Systems, May 2007.

[19] Sangchul Song and Joseph JaJa. Techniques to audit and certify the long-term integrity of digital archives. Int. J. Digit. Libr., 10(2-3), August 2009. DOI: 10.1007/s00799-009-0056-2.

[20] Herbert van de Sompel, Michael L. Nelson, Robert Sanderson, Lyudmila L. Balakireva, Scott Ainsworth, and Harihar Shankar. Memento: Time Travel for the Web. http://arxiv.org/abs/0911.1112, November 2009.

[21] Chip Walter. Kryder's Law. Scientific American, 293, August 2005.