xFS: A Wide Area Mass Storage File System

Randolph Y. Wang and Thomas E. Anderson
Computer Science Division
University of California
Berkeley, CA 94720

Abstract

The current generation of file systems is inadequate in the face of the new technological challenges of wide area networks and massive storage. xFS is a prototype file system we are developing to explore the issues brought about by these technological advances. xFS adapts many of the techniques used in the field of high performance multiprocessor design. It organizes hosts into a hierarchical structure so that locality within clusters of workstations can be better exploited. By using an invalidation-based write back cache coherence protocol, xFS minimizes network usage. It exploits the file system naming structure to reduce cache coherence state. xFS also integrates different storage technologies in a uniform manner. Due to its intelligent use of local hosts and local storage, we expect xFS to achieve better performance and availability than current generation network file systems run in the wide area.

1 Introduction

Recent advances in robot technology and tertiary media have resulted in the emergence of large capacity, cost effective automated tertiary storage devices. When placed on high performance wide area networks, such devices have dramatically increased the amount of digital information accessible to a large number of geographically distributed users. The amount of near-line storage and the degree of cooperation made possible by these hardware advances are unprecedented.

The presence of wide area networks and tertiary storage, however, has brought new challenges to the design of file systems:

- Bandwidth and latency: At least in the short run, bandwidth over a WAN is much more expensive than over a LAN. Furthermore, latency over the wide area, a parameter restricted by the speed of light, will remain high.

- Scalability: The central file server model breaks down when there can be thousands of clients, terabytes of total client caches for the servers to keep track of, billions of files, and petabytes of total storage.

- Availability: As file systems are made more scalable, allowing larger groups of clients and servers to work together, it becomes more likely that at any given time some clients and servers will be unable to communicate.

Existing distributed file systems were originally designed for local area networks, with disks as the bottom layer of the storage hierarchy. They are inadequate in the face of the challenges of wide area networks and massive storage. Many suffer from the following shortcomings:

- Poor file system protocol: Frequently used techniques, such as broadcasts and write through protocols, while acceptable on a LAN, are no longer appropriate on a WAN because of their inefficient use of bandwidth.

- Dependency on centralized servers: The availability and performance of many existing file systems depend crucially on the health of a few centralized servers.

- Data structures that do not scale: For example, if a server has to keep cache coherence state for every piece of data stored on the clients, then both the amount of data and the number of clients in the system will have to be quite limited.

- Lack of tertiary support: Existing file systems do a poor job of hiding multiple layers of the storage hierarchy. Some use manual migration and/or whole file migration, which is neither convenient nor efficient. Others offer ad hoc extensions to existing systems, which usually increases code complexity.

While others have addressed some aspects of the problem [2, 5, 9, 11, 13], there is yet no system that handles both WANs and mass storage.
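The scaling problem named in the third shortcoming can be made concrete with a back-of-the-envelope calculation. The sketch below is ours, and every parameter value in it (block size, metadata size, client count, cache size) is an illustrative assumption rather than a figure from the paper or from any measured system:

```python
# Rough estimate of the cache coherence state a central server must hold
# if it tracks every client-cached block individually.
# All parameter values are illustrative assumptions.

BLOCK_SIZE = 8 * 1024          # 8 KB file system blocks
STATE_PER_BLOCK = 16           # bytes of coherence metadata per cached block
CLIENTS = 10_000               # "thousands of clients"
CACHE_PER_CLIENT = 1 << 30     # 1 GB of cache per client -> ~10 TB in total

cached_blocks = CLIENTS * CACHE_PER_CLIENT // BLOCK_SIZE
state_bytes = cached_blocks * STATE_PER_BLOCK

print(f"blocks tracked: {cached_blocks:,}")
print(f"coherence state: {state_bytes / (1 << 30):.1f} GB")
```

Even with only 16 bytes of metadata per block, the server ends up tracking over a billion blocks and holding roughly 20 GB of coherence state alone, which illustrates why per-block state kept at a single server limits both the number of clients and the amount of cached data.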
While we do not pretend that we have reached the perfect solution of providing fast, cheap, and reliable wide area access to massive amounts of storage, we present the xFS prototype as a design point from which some of the issues of mass storage in the wide area can be explored and future measurements can be taken.

[This work was supported in part by the National Science Foundation (CDA-8722788), the Digital Equipment Corporation Systems Research Center and External Research Program, and the AT&T Foundation. Anderson was also supported by a National Science Foundation Young Investigator Award.]

Our goal in xFS is to provide the performance and availability of a local disk file system when sharing is minimal and storage requirements are small. We observe that many of the problems we face have also been considered by researchers studying distributed shared memory multiprocessors, and there are many well-tested solutions that can be applied to our design. In particular, xFS has the following features:

- xFS organizes hosts into a more sophisticated hierarchical structure analogous to those found in some high performance multiprocessors such as DASH [10]. Requests are satisfied by a local cluster if possible to minimize remote communication.

- xFS uses an invalidation-based write back cache coherence protocol pioneered by research on multiprocessor caches. Clients with local stable storage can store private data indefinitely, consuming less wide area bandwidth and incurring lower latency than if all modifications were written through to the server. For data cached locally, clients can operate without server attention. The Coda [18] study illustrates that this should offer better availability in case of server or network failure.

- xFS exploits the file system naming structure to reduce the state needed to preserve cache coherence. Ownership of a directory and its descendents can be granted as a whole. By avoiding the maintenance of cache coherence information on a strictly per-file or per-block basis, xFS reduces storage requirements and improves performance.

- xFS integrates multiple levels of the storage hierarchy, employing uniform log-structured management and fast lookup across all storage technologies.

[Figure 1: Simple client-server relationship — a single server reached over the WAN by clients at UCLA, UCSB, and UCB.]

The model seen in figure 1 has a number of problems. First of all, WANs have higher latency, lower bandwidth, and higher cost. Although the situation is expected to improve as the technology matures, we believe wide area communication is likely to remain more costly and offer less bandwidth than its LAN counterpart for some time. Even if the bandwidth bottleneck is solved, latency will continue to haunt a naive system. Even under the best possible circumstances, the round trip delay over 1000 km will be comparable to a disk seek. A simple client-server model as seen in figure 1 fails to recognize the distinction between wide area links and local links. The result is overuse of wide area communication, which in turn leads to high cost and poor performance.

Secondly, locality plays a more important role in a wide area, where clusters of hosts that are geographically close to each other are likely to share information more frequently and have an easier time talking to each other. A file system that treats all clients as equals might make sense on a LAN, where all clients are virtually peers that exhibit little distinction. Such an organization becomes less acceptable in a wide area, where paying more attention to locality is likely to improve performance.

Thirdly, the organization of figure 1 has problems with scale.
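One way to read the naming-structure idea is that coherence state is recorded once per directory subtree rather than once per file or block: finding who owns a file then means walking up the path to its closest owned ancestor. The sketch below is our illustration of that lookup, not xFS's actual data structure, and the paths and cluster names in it are hypothetical:

```python
# Sketch: coherence state kept per directory subtree instead of per file.
# 'ownership' maps a directory to the host or cluster that owns the whole
# subtree beneath it; a lookup walks up the path until an owner is found.
# Paths and owner names are hypothetical.

from os.path import dirname

ownership = {
    "/proj/ucb": "ucb-cluster",   # one entry covers every file under /proj/ucb
    "/proj": "server",            # default owner for the rest of /proj
}

def owner_of(path: str) -> str:
    """Return the owner of `path` via its closest owned ancestor."""
    while True:
        if path in ownership:
            return ownership[path]
        parent = dirname(path)
        if parent == path:        # reached the root without a match
            return "server"
        path = parent

print(owner_of("/proj/ucb/src/main.c"))  # -> ucb-cluster
print(owner_of("/proj/shared/data"))     # -> server
```

The point of the example is the size of the table: two entries stand in for arbitrarily many per-file records, at the cost of a short upward walk on each lookup.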
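The "requests are satisfied by a local cluster if possible" feature can be pictured as a lookup that tries the cheapest level first and crosses the WAN only as a last resort. The following is a minimal sketch under our own assumptions (caches and the server modeled as plain dictionaries, hypothetical names throughout), not the xFS protocol itself:

```python
# Sketch: satisfy a read from the cheapest level first -- this client's own
# cache, then peers in the local cluster over the LAN, then the remote
# server over the WAN. All names and structures are hypothetical.

def read_block(block, local_cache, cluster_peers, wan_server):
    if block in local_cache:                # hit in this client's cache
        return local_cache[block], "local"
    for peer in cluster_peers:              # cheap intra-cluster traffic
        if block in peer:
            local_cache[block] = peer[block]
            return peer[block], "cluster"
    data = wan_server[block]                # expensive WAN round trip
    local_cache[block] = data
    return data, "wan"

cache = {"a": 1}
peers = [{"b": 2}]
server = {"a": 1, "b": 2, "c": 3}
print(read_block("c", cache, peers, server))  # -> (3, 'wan')
```

Each miss also populates the local cache, so repeated access to the same data stays within the cluster after the first WAN round trip.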
The server load and the amount of cache coherence state on the server increase with the number of clients. A straightforward deployment of many existing file systems in the wide area can be neither "wide" nor "massive".

In an attempt to solve these problems, xFS employs a hierarchy similar to those found in scalable shared memory multiprocessors (figure 2). To understand the motivation for such an organization, it is instructive to study the analogy between DASH [10] and xFS. The DASH system consists of a number of processor clusters. Communication among the clusters is provided by a mesh network. This is analogous to the WAN links in xFS, where bandwidth is limited and latency is high. Each DASH cluster consists of a small number of processors, analogous to clients on a LAN. Intra-cluster communication is provided by a bus. This is analogous to the LAN links in xFS, where local communication is cheap and fast. The analogy, however, does not stop here.

xFS is currently in the middle of the implementation stage and we do not yet have performance numbers to report. The remainder of this position paper is organized as follows. Section 2 presents the hierarchical client-server structure of xFS. Section 3 describes the cache coherence protocol. Section 4 describes how xFS reduces the amount of cache coherence information maintained by hosts. Section 5 discusses the integration of multiple levels of storage. Section 6 discusses some additional issues, including crash recovery. Section 7 concludes.

2 Hierarchical Client-server Organization

Andrew [6], NFS [17], and Sprite [20] all have a simple one-layer client-server hierarchy. While it is conceivable