
TierStore: A Distributed Filesystem for Challenged Networks in Developing Regions

Michael Demmer, Bowei Du, and Eric Brewer
University of California, Berkeley
{demmer,bowei,brewer}@cs.berkeley.edu

USENIX Association, FAST ’08: 6th USENIX Conference on File and Storage Technologies

Abstract

TierStore is a distributed filesystem that simplifies the development and deployment of applications in challenged network environments, such as those in developing regions. For effective support of bandwidth-constrained and intermittent connectivity, it uses the Delay Tolerant Networking store-and-forward network overlay and a publish/subscribe-based multicast replication protocol. TierStore provides a standard filesystem interface and a single-object coherence approach to conflict resolution which, when augmented with application-specific handlers, is both sufficient for many useful applications and simple to reason about for programmers. In this paper, we show how these properties enable easy adaptation and robust deployment of applications even in highly intermittent networks and demonstrate the flexibility and bandwidth savings of our prototype with initial evaluation results.

1 Introduction

The limited infrastructure in developing regions both hinders the deployment of information technology and magnifies the need for it. In spite of the challenges, a variety of simple information systems have shown real impact on health care, education, commerce and productivity [19, 34]. For example, in Tanzania, data collection related to causes of child deaths led to a reallocation of resources and a 40% reduction in child mortality (from 16% to 9%) [4, 7].

Yet in many places, the options for network connectivity are quite limited. Although cellular networks are growing rapidly, they remain a largely urban and costly phenomenon, and although satellite networks have coverage in most rural areas, they too are extremely expensive [30]. For these and other networking technologies, power problems and coverage gaps cause connectivity to vary over time and location.

To address these challenges, various groups have used novel approaches for connectivity in real-world applications. The Wizzy Digital Courier system [36] distributes educational content among schools in South Africa by delaying dialup access until night time, when rates are cheaper. DakNet [22] provides e-mail and web connectivity by copying data to a USB drive or hard disk and then physically carrying the drive, sometimes via motorcycles. Finally, Ca:sh [1] uses PDAs to gather rural health care data, also relying on physical device transport to overcome the lack of connectivity. These projects demonstrate the value of information distribution applications in developing regions, yet they all essentially started from scratch and thus use ad-hoc solutions with little leverage from previous work.

This combination of demand and obstacles reveals the need for a flexible application framework for “challenged” networks. Broadly speaking, challenged networks lack the ability to support reliable, low-latency, end-to-end communication sessions that typify both the phone network and the Internet. Yet many important applications can still work well despite low data rates and frequent or lengthy disconnections; examples include e-mail, voicemail, data collection, news distribution, e-government, and correspondence education. The challenge lies in implementing systems and protocols to adapt applications to the demands of the environment.

Thus our central goal is to provide a general purpose framework to support applications in challenged networks, with the following key properties: First, to adapt existing applications and develop new ones with minimal effort, the system should offer a familiar and easy-to-use filesystem interface. To deal with intermittent networks, applications must operate unimpeded while disconnected, and easily resolve update conflicts that may occur as a result. Finally, to address the networking challenges, replication protocols need to be able to leverage a range of network transports, as appropriate for particular environments, and efficiently distribute application data.

As we describe in the remainder of this paper, TierStore is a distributed filesystem that offers these properties. Section 2 describes the high-level design of the system, followed by a discussion of related work in Section 3. Section 4 describes the details of how the system operates. Section 5 discusses some applications we have developed to demonstrate flexibility. Section 6 presents an initial evaluation, and we conclude in Section 7.

2 TierStore Design

The goal of TierStore is to provide a distributed filesystem service for applications in bandwidth-constrained and/or intermittent network environments. To achieve these aims, we claim no fundamentally new mechanisms, however we argue that TierStore is a novel synthesis of well-known techniques and most importantly is an effective platform for application deployment.

TierStore uses the Delay Tolerant Networking (DTN) bundle protocol [11, 28] for all inter-node messaging. DTN defines an overlay network architecture for challenged environments that forwards messages among nodes using a variety of transport technologies, including traditional approaches and long-latency “sneakernet” links. Messages may also be buffered in persistent storage during connection outages and/or retransmitted due to a message loss. Using DTN allows TierStore to adapt naturally to a range of network conditions and to use solution(s) most appropriate for a particular environment.

To simplify application development, TierStore implements a standard filesystem interface that can be accessed and updated at multiple nodes in the network. Any modifications to the shared filesystem state are both applied locally and encoded as update messages that are lazily distributed to other nodes in the network. Because nodes may be disconnected for long periods of time, the design favors availability at the potential expense of consistency [12]. This decision is critical to allow applications to function unimpeded in many environments.

The filesystem layer implements traditional NFS-like semantics, including close-to-open consistency, hard and soft links, and standard UNIX group, owner, and permission semantics. As such, many interesting and useful applications can be deployed on a TierStore system without (much) modification, as they often already use the filesystem for communication of shared state between application instances. For example, several implementations of e-mail, log collection, and wiki packages are already written to use the filesystem for shared state and have simple data distribution patterns, and are therefore straightforward to deploy using TierStore. Also, these applications are either already conflict-free in the ways [...]

Based in part on these observations, TierStore implements a single-object coherence policy for conflict management, meaning that only concurrent updates to the same file are flagged as conflicts. We have found that this simple model, coupled with application-specific conflict resolution handlers, is both sufficient for many useful applications and easy to reason about for programmers. It is also a natural consequence from offering a filesystem interface, as UNIX filesystems do not naturally expose a mechanism for multiple-file atomic updates.

When conflicts do occur, TierStore exposes all information about the conflicting update through the filesystem interface, allowing either automatic resolution by application-specific scripts or manual intervention by a user. For more complex applications for which single-file coherence is insufficient, the base system is extensible to allow the addition of application-specific meta-objects (discussed in Section 4.12). These objects can be used to group a set of user-visible files that need to be updated atomically into a single TierStore object.

To distribute data efficiently over low-bandwidth network links, TierStore allows the shared data to be partitioned into fine-grained publications, currently defined as disjoint subtrees of the filesystem namespace. Nodes can then subscribe to receive updates to only their publications of interest, rather than requiring all shared state to be replicated. This model maps quite naturally to the needs of real applications (e.g. users’ mailboxes and folders, portions of web sites, or regional data collection). Finally, TierStore nodes are organized into a multicast-like distribution tree to limit redundant update transmissions over low-bandwidth links.

3 Related Work

Several existing systems offer distributed storage services with varying network assumptions; here we briefly discuss why none fully satisfies our design goals.

One general approach has been to adapt traditional network file systems such as NFS and AFS for use in constrained network environments. For example, the Low-Bandwidth File System (LBFS) [18] implements a modified NFS protocol that significantly reduces the bandwidth consumption requirements. However, LBFS maintains NFS’s focus on consistency rather than availability in the presence of partitions [12], thus even though it addresses the bandwidth problems, it is unsuitable for intermittent connectivity.

Coda [16] extends AFS to support disconnected operation. In Coda, clients register for a subset of files to be “hoarded”, i.e. to be available when offline, and modifications made while disconnected are merged