ZZFS: a Hybrid Device and Cloud File System for Spontaneous Users
Michelle L. Mazurek*, Eno Thereska+, Dinan Gunawardena+, Richard Harper+, and James Scott+
*Carnegie Mellon University, Pittsburgh, PA
+Microsoft Research, Cambridge, UK

Abstract

Good execution of data placement, caching and consistency policies across a user's personal devices has always been hard. Unpredictable networks, capricious user behavior with leaving devices on or off and non-uniform energy-saving policies constantly interfere with the good intentions of a storage system's policies. This paper's contribution is to better manage these inherent uncertainties. We do so primarily by building a low-power communication channel that is available even when a device is off. This channel is mainly made possible by a novel network interface card that is carefully placed under the control of storage system protocols.

The design space can benefit existing placement policies (e.g., Cimbiosys [21], Perspective [23], Anzere [22]). It also allows for interesting new ones. We build a file system called ZZFS around a particular set of policies motivated by user studies. Its policies cater to users who interact with the file system in an ad hoc way: spontaneously and without pre-planning.

1 Introduction

Much work has been done in developing appropriate data placement, caching and consistency policies in the "home/personal/non-enterprise" space (e.g., see [8, 10, 16, 19, 20, 21, 22, 23, 24, 26, 28]). Good policies are crucial in maintaining good performance, reliability and availability. Unfortunately, there are many barriers that make the execution of such policies far from automatic. These barriers often stem from the unpredictability of everyday life, reflected in variable network resources, devices being off or dormant at inconvenient times, and users' time and priority given to data management.

Consider two mundane examples (Section 2 has more): In the first example, a busy mom desires to show a friend in the mall a photo that happens to be on the home computer. That same person might wish to access her personal medical file (that she does not trust the cloud for storing) from the beach while on holidays later in the week. In all likelihood she will find the above tasks impossible given that her home computer is most likely dormant or off, and she has not had time to specify any particular data replication policy among the computer and the smartphone, or hoarded the files beforehand.

The second example illustrates a consistency problem and is taken from Live Mesh's [14] mailing list. Many technology-savvy users experienced frequent conflicts with music files. A single user would listen to music on device A, then later listen to the same music on device B while A was turned off (the files were kept in peer-to-peer sync between A and B because the user did not have enough space on the cloud to store all files). Because the particular music player software updated song metadata (like play count and rating), it turns out that this is not a read-only workload. As a result, the syncing generated conflicts requiring manual resolution whenever the user switched devices. It is unfortunate that even in the absence of true multi-user concurrency, a single user can still get an inconsistent view of the system.

This paper's main contribution is to build a low-power, always-on communication channel that is available even when a device is off. The hypothesis is that this channel reduces the likelihood that a device is unreachable and thus helps the execution of data placement and consistency policies. We build this channel using new hardware and storage system protocols.

On the hardware front, we incorporate a novel network interface card (NIC) in the design of the overall storage system (Section 3.1). The NIC maintains device network access with negligible energy consumption even when the device is dormant. The NIC is able to rapidly turn on the main device if needed. The ability to turn on the main device can be thought of as Wake-on-LAN (WoL) [11] "on steroids," in that the NIC operates through any firewalls or NAT boxes, does not need to know the MAC address of the dormant device, and handles mobility across subnets. The NIC also exports to our storage system a small on-board flash storage. While the hardware part of the NIC is not a contribution of this paper, we build the storage system software around it.

We design the I/O communication channel on top of the NIC by leveraging several technical building blocks. These are not new individually, but, as we discovered, work well together to lead to a usable system. In particular, we use data placement protocols based on version histories for ensuring consistency (Section 3.3); I/O offloading [15, 29] is used to mask any performance latencies of turning on a device on a write request by using the NIC's flash storage as a versioned log/journal (Section 3.3); and users get a device-transparent view of the namespace with the metadata by default residing on the cloud. Metadata can also reside on any device with the always-on channel implemented (Section 3.2).

Fundamentally, our approach makes good use of any always-on resources, if available (such as the cloud or a home server), but also actively augments the number of always-on resources by turning any personal device with the new network interface card into an always-on resource. Perhaps subtly, however, it turns out that having a few extra always-on resources allows for interesting data placement policies that were not possible before. We explore these through building a file system called ZZFS. We chose to implement a unique set of data placement and consistency policies that cater mostly to spontaneous users (Section 4). These policies were partially influenced by qualitative user research. However, other policies (e.g., Cimbiosys [21], Perspective [23], Anzere [22]) would equally benefit.

2 Background on the problem

Users often have access to a set of devices with storage capabilities, such as desktops, laptops, tablets, smartphones and data center/cloud storage. Data placement policies revolve around deciding which user's data or files go onto which device. Often, a data placement policy indicates that the same file should be placed on multiple devices (e.g., for better reliability, availability and performance from caching). Consistency policies revolve around ways of keeping the multiple file replicas in sync so as to provide the abstraction of a single file to users.

We illustrate problems related to the execution of these policies through three simple examples, that reflect poli- [...]

[...] very likely that when U edits F from the work machine, the tablet is dormant so the changes do not immediately propagate to the tablet. Typical implementations of this policy make use of a transaction log L that keeps track of the changes U makes on the work machine. The log is later replayed on the tablet to maintain consistency.

When the photographer later on moves to work on the tablet, the log will still be on the now-dormant work machine. Thus, the tablet is not able to replay the log. The user has two options, neither of which leads to great satisfaction with the system: option 1 is for the user to manually turn on the work machine and wait until all the data is consistent. This option is implicitly assumed in Perspective, for example. Option 1 may be out of reach for non-tech-savvy users who just want to get on with their work and do not understand they have to wait ("for how long?") for consistency to catch up.

Option 2 is to continue working on the stale copy of F on the tablet, keep a separate transaction log L2 of the work in the tablet, and then later on, when both machines happen to be up at the same time, have a way to reconcile L and L2. In the best case, the copies can be reconciled automatically (e.g., the user is working on two different parts of the photo that can be just merged). In the worst case, manual conflict resolution is required. Option 2 is in fact the only option if there is truly no other way the devices can communicate with one another (e.g., if the user is on a plane with the tablet and with no network connectivity). However, it seems a waste of human effort that the user has to resort to this option even when the network bandwidth in many places (e.g., within the home, or work) would be perfectly adequate for automatic peer-to-peer sync, if only the devices were on.

Example 2: Device transparency: Several systems advocate device transparency, where the namespace reflects one's files and data, not the device where they reside. Eyo, for example, allows a user to list from any device the metadata (e.g., name) of all files, residing in all subscribed devices [26]. We like the idea of the metadata being always available, but want to help further by satisfying the user's data needs as well. Imagine a user U having the names of all her documents, photos and videos, displayed on her tablet. When U meets a friend in the mall, she wishes to show her a short video from a birthday party. The video happens to physically reside on her home computer (although the metadata is on the tablet).