DFS Case Studies, Part 2

The Andrew (from CMU) Case Study - the Andrew File System (AFS)

● Designed to support information sharing on a large scale by minimizing client-server communication
● Makes heavy use of caching technologies
● Adopted as the basis for the DCE/DFS file system in the Open Software Foundation's Distributed Computing Environment (DCE)

AFS Characteristics

● Provides transparent access to remote shared files for UNIX programs (using the normal UNIX file primitives)
● Programs can access remote files without modification or recompilation
● AFS is compatible with NFS, in that its files may be accessed remotely using NFS
● However, AFS differs markedly from NFS in its design and implementation

AFS Design Goals

● Scalability is the most important design goal for the AFS designers
● Designed to perform well with larger numbers of active users (when compared to NFS)
● The key strategy in achieving scalability is the caching of whole (complete) files in client nodes

AFS Design Characteristics

● Whole-file serving - the entire contents of directories and files are transmitted to client computers by AFS servers
● Whole-file caching - once a copy of a file (or a file chunk) has been transferred to a client computer, it is stored in a cache on the local disk; the cache is permanent, surviving reboots of the client computer, and it is used to satisfy clients' open requests in preference to remote copies whenever possible (see the sketch below)
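A minimal sketch of whole-file caching under stated assumptions (the cache directory, the file_id naming scheme and the fetch_from_server callable are invented for illustration): the whole file is transferred once, written to a local-disk cache that survives restarts, and reused to satisfy later opens.

import os

# Hypothetical on-disk cache for whole files; the directory name and the
# fetch_from_server callable are illustrative stand-ins, not AFS internals.
CACHE_DIR = "/var/afs-cache"   # ordinary disk storage, so it survives reboots

def cached_open(file_id, fetch_from_server):
    """Return a local path for file_id, fetching the whole file only on a miss."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    local_copy = os.path.join(CACHE_DIR, file_id)
    if not os.path.exists(local_copy):       # cache miss: transfer the entire file
        data = fetch_from_server(file_id)    # whole-file serving: one transfer
        with open(local_copy, "wb") as f:
            f.write(data)
    return local_copy                        # subsequent reads/writes are local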

AFS Observations

● Shared files that are infrequently updated (such as UNIX commands and libraries) and files accessed solely by a single user account form the overwhelming majority of file accesses
● If the local cache is allocated a sufficiently large amount of storage space, the "working set" of files in regular use on a given workstation is normally retained in the cache until the files are needed again
● AFS's design strategy is based on some assumptions about the average and maximum file size and locality of reference to files in the UNIX environment

AFS Assumptions

● Most files are small - typically less than 10 kbytes
● Read operations are six times more likely than writes
● Sequential access is common, random access is rare
● Most files are read/written by a single user; even when a file is shared, typically only one of the sharers updates it
● Files are referenced in "bursts" - there's a high probability that a recently accessed file will be used again in the near future

AFS Gotcha!

● There is one important type of file that does not fit the design goals of AFS - shared databases
● Databases are typically shared by many users and updated frequently
● The AFS designers explicitly excluded the provision of storage facilities for databases from the AFS design goals
● It is argued that the provision of facilities for distributed databases should be addressed separately

AFS Questions

● How does AFS gain control when an open or close system call referring to a file in the shared file space is issued by a client?
● How is the AFS server that's holding the required file actually located?
● What space is allocated for cached files in workstations?
● How does AFS ensure that the cached copies of files are up-to-date when files may be updated by several clients?

AFS Software Components

● AFS is implemented as two software components - Vice and Venus
● Vice is the server component; it runs as a user-level UNIX process on each server computer
● Venus is the client component; it runs as a user-level UNIX process on each workstation

Vice and Venus

[Figure: distribution of Vice and Venus processes - each workstation runs user programs and a Venus process above the UNIX kernel; each server runs a Vice process above its UNIX kernel; workstations and servers communicate over the network]

Dealing with File Accesses

● Files are either local or shared
● Local files are handled in the usual way, by UNIX itself
● Shared files are stored on servers, and copies of them are cached on the local disks of workstations as required
● The AFS namespace is a standard UNIX hierarchy, with a specific sub-tree (called cmu) containing all of the shared files (see the sketch below)
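As an illustration of the local/shared split, a small sketch that classifies a path purely by prefix; the /cmu mount point comes from the slide above, everything else is assumed.

import os

SHARED_ROOT = "/cmu"   # the shared sub-tree named on the slide

def is_shared(path):
    """True if path lies under the shared AFS sub-tree, False for local files."""
    path = os.path.abspath(path)
    return path == SHARED_ROOT or path.startswith(SHARED_ROOT + "/")

print(is_shared("/tmp/scratch"))   # False - handled by the local UNIX file system
print(is_shared("/cmu/bin/ls"))    # True  - handled by Venus and the Vice servers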

The AFS Namespace

[Figure: file name space seen by clients of AFS - the local root (/) holds tmp, bin, ..., vmunix and cmu; the shared files live under the cmu sub-tree, with symbolic links connecting local names such as /bin into the shared space]

Important Points

● The splitting of the file namespace into local and shared files leads to some loss of location transparency, but this is hardly noticeable to users other than system administrators
● Users' directories are always stored in the shared space, enabling users to access their files from any workstation on the network
● In AFS, each workstation's kernel is modified to intercept file access system calls and pass them to Venus when they refer to non-local files (a dispatch sketch follows)
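A sketch of the dispatch that the modified kernel performs, assuming a simple prefix test and hypothetical local_open/venus_open handlers; the real interception happens inside the UNIX kernel, not in user space.

SHARED_PREFIX = "/cmu/"   # shared sub-tree from the namespace slide

def intercepted_open(path, flags, local_open, venus_open):
    """Route an open() either to the normal UNIX path or to Venus.

    local_open and venus_open are injected here purely for illustration; in
    AFS the modified kernel makes this choice and forwards non-local opens.
    """
    if path.startswith(SHARED_PREFIX):    # shared file space -> hand over to Venus
        return venus_open(path, flags)
    return local_open(path, flags)        # local file -> ordinary UNIX handling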

System Call Integration

[Figure: system call interception in AFS - within a workstation, the UNIX kernel passes non-local file operations from user programs to Venus, while ordinary UNIX file operations go to the local disk]

AFS Caching

● One of the file partitions on the local disk of each workstation is used as a cache, holding the cached copies of files from the shared space
● The workstation cache is usually large enough to accommodate several hundred average-sized files
● This renders the workstation largely independent of the Vice servers once a working set of the current user's files and frequently used system files has been cached (a retention sketch follows this list)
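A sketch of how a fixed-size cache can retain a working set; the least-recently-used policy and the entry-count bound are assumptions made for illustration, since the slide only says the cache holds several hundred files.

from collections import OrderedDict

class WorkingSetCache:
    """Illustrative bounded cache keyed by file identifier (LRU replacement).

    The real AFS cache is a disk partition holding whole files; this sketch
    only models the retention behaviour described above, and the LRU policy
    is an assumption rather than a statement about AFS itself.
    """
    def __init__(self, capacity=300):           # "several hundred average-sized files"
        self.capacity = capacity
        self.entries = OrderedDict()            # file_id -> path of the local copy

    def get(self, file_id):
        if file_id in self.entries:
            self.entries.move_to_end(file_id)   # mark as recently used
            return self.entries[file_id]
        return None                             # miss: caller fetches from a Vice server

    def put(self, file_id, local_path):
        self.entries[file_id] = local_path
        self.entries.move_to_end(file_id)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used file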

AFS Volumes

● Files are grouped into volumes for ease of location and movement (see the sketch below)
● Each user's personal files are generally located in a separate volume
● Other volumes are allocated for system binaries, documentation and library code
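A sketch of why volumes ease location and movement, using a hypothetical two-level map (paths to volumes, volumes to custodian servers); all of the names below are invented for illustration.

# Hypothetical volume-location data: the volume names and server addresses are
# invented; only the grouping of files into volumes comes from the slide above.
VOLUME_OF_PATH = {
    "/cmu/users/alice": "user.alice",
    "/cmu/bin":         "sys.binaries",
}
SERVER_OF_VOLUME = {
    "user.alice":   "vice1.example.edu",
    "sys.binaries": "vice2.example.edu",
}

def locate(path):
    """Map a shared path to (volume, custodian server) by longest-prefix match."""
    for prefix in sorted(VOLUME_OF_PATH, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            volume = VOLUME_OF_PATH[prefix]
            return volume, SERVER_OF_VOLUME[volume]
    raise LookupError("no volume covers " + path)

# Moving a volume to another server changes SERVER_OF_VOLUME only - paths stay the same.
print(locate("/cmu/bin/ls"))   # ('sys.binaries', 'vice2.example.edu')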

Open/Read/Write/Close within AFS

open(FileName, mode)
  UNIX kernel: if FileName refers to a file in shared file space, pass the request to Venus.
  Venus: check the list of files in the local cache; if the file is not present or there is no valid callback promise, send a request for the file to the Vice server that is the custodian of the volume containing the file.
  Vice: transfer a copy of the file and a callback promise to the workstation; log the callback promise.
  Venus: place the copy of the file in the local file system, enter its local name in the local cache list and return the local name to UNIX.
  UNIX kernel: open the local file and return the file descriptor to the application.

read(FileDescriptor, Buffer, length)
  UNIX kernel: perform a normal UNIX read operation on the local copy.

write(FileDescriptor, Buffer, length)
  UNIX kernel: perform a normal UNIX write operation on the local copy.

close(FileDescriptor)
  UNIX kernel: close the local copy and notify Venus that the file has been closed.
  Venus: if the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
  Vice: replace the file contents and send a callback to all other clients holding callback promises on the file.
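The steps above can be read as the following client-side sketch; the class name, the cache layout and the fetch/store helpers are assumptions made for illustration, but the decision points mirror the open and close rows.

class VenusSketch:
    """Illustrative client-side open/close logic following the steps above.

    The class, its fields and the vice.fetch/vice.store helpers are assumed
    for this sketch; they are not the real Venus data structures.
    """
    def __init__(self, vice):
        self.vice = vice        # stand-in proxy for the custodian Vice server
        self.cache = {}         # file name -> [path of local copy, promise valid?]

    def open(self, name):
        entry = self.cache.get(name)
        if entry is None or not entry[1]:        # absent, or promise cancelled
            local_path = self.vice.fetch(name)   # whole file plus a callback promise
            self.cache[name] = [local_path, True]   # record the promise as valid
        return self.cache[name][0]               # reads and writes then use the local copy

    def close(self, name, modified):
        if modified:                             # ship the new contents back on close
            self.vice.store(name, self.cache[name][0])
        # Vice then breaks the callback promises held by other clients for this file.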

Cache Consistency

● When a Vice server supplies a file to a workstation it also issues a "callback promise" - a guarantee that it will notify the Venus process whenever any other client modifies that file
● When a server performs a request to update a file, it notifies all of the Venus processes to which it has issued callback promises by sending a callback to each (a server-side sketch follows)
● A callback is a remote procedure call from a server to a Venus process
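A sketch of the server's side of this arrangement, assuming an in-memory table of promises and a hypothetical break_callback() stub on each client; neither detail comes from the slides.

class ViceCallbacks:
    """Illustrative server-side record of outstanding callback promises.

    Clients are represented by stubs with a break_callback() method; both the
    table layout and the stub interface are assumptions made for this sketch.
    """
    def __init__(self):
        self.promises = {}    # file_id -> set of clients holding a promise

    def record_promise(self, file_id, client):
        self.promises.setdefault(file_id, set()).add(client)

    def file_updated(self, file_id, updating_client):
        """After storing new contents, send a callback to every other promise holder."""
        for client in self.promises.get(file_id, set()) - {updating_client}:
            client.break_callback(file_id)    # the RPC from server to Venus
        # Only the updating client still holds a valid promise for its fresh copy.
        self.promises[file_id] = {updating_client}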

More Caching

● Whenever Venus handles an open on behalf of a client, it checks the cache
● If the required file is found in the cache, then its callback-promise token is checked
● If the token's value is cancelled, then a fresh copy of the file must be fetched from the Vice server
● If the token's value is valid, then the cached copy can be opened and used without reference to Vice

Why Callbacks?

● The callback-based mechanism for maintaining cache consistency offers the most scalable approach
● It has been shown to dramatically reduce the number of client-server interactions

The Vice Service Interface

Fetch(fid) -> attr, data: returns the attributes (status) and, optionally, the contents of the file identified by fid, and records a callback promise on it
Store(fid, attr, data): updates the attributes and (optionally) the contents of a specified file
Create() -> fid: creates a new file and records a callback promise on it
Remove(fid): deletes the specified file
SetLock(fid, mode): sets a lock on the specified file or directory; the mode of the lock may be shared or exclusive; locks that are not removed expire after 30 minutes
ReleaseLock(fid): unlocks the specified file or directory
RemoveCallback(fid): informs the server that a Venus process has flushed a file from its cache
BreakCallback(fid): made by a Vice server to a Venus process; cancels the callback promise on the relevant file
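For reference, the same operations sketched as an abstract Python class; the snake_case names and the absence of concrete parameter types are choices made for this sketch, not part of AFS.

from abc import ABC, abstractmethod

class ViceService(ABC):
    """The Vice operations listed above, written as an abstract class.

    A transliteration for reference only; BreakCallback is omitted because it
    travels in the opposite direction (from a Vice server to a Venus process).
    """

    @abstractmethod
    def fetch(self, fid): ...            # -> (attr, data); records a callback promise

    @abstractmethod
    def store(self, fid, attr, data): ...

    @abstractmethod
    def create(self): ...                # -> fid; records a callback promise

    @abstractmethod
    def remove(self, fid): ...

    @abstractmethod
    def set_lock(self, fid, mode): ...   # mode is "shared" or "exclusive"; expires after 30 minutes

    @abstractmethod
    def release_lock(self, fid): ...

    @abstractmethod
    def remove_callback(self, fid): ...  # Venus tells the server it flushed the file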

Update Semantics

● The goal of the cache consistency mechanism is to achieve the best approximation to one-copy file semantics that is practicable without serious performance degradation
● It has been shown that the callback promise mechanism maintains a close approximation to one-copy semantics

AFS Updates

● AFS does not provide extra mechanisms for the control of concurrent updates
● When a file is closed, a copy of the file is returned to the server, replacing the current version
● All but the update resulting from the last "close" will be silently lost (with no error report given)
● Clients must implement concurrency control independently (one option is sketched below)
● Despite this behaviour, AFS's update semantics are sufficiently close to one-copy semantics for the vast majority of existing UNIX programs to operate correctly
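One way a client could impose its own concurrency control is to wrap an update in the SetLock/ReleaseLock operations from the Vice interface above; a hedged sketch, assuming a vice stub shaped like the abstract class shown earlier.

from contextlib import contextmanager

@contextmanager
def exclusive_update(vice, fid):
    """Illustrative client-side guard built from the SetLock/ReleaseLock calls.

    AFS itself does not serialize concurrent updates - without something like
    this, the update from the last close silently wins.
    """
    vice.set_lock(fid, "exclusive")    # lock expires after 30 minutes if never released
    try:
        yield
    finally:
        vice.release_lock(fid)

# Usage, given some client stub `vice` for the custodian server:
# with exclusive_update(vice, fid):
#     ...fetch, modify and store the file...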

AFS Performance

● When measured, whole-file caching and the callback protocol led to dramatically reduced loads on the servers
● Server loads of 40% were measured with 18 client nodes running a standard NFS benchmark, as opposed to a nearly 100% load using NFS with the same benchmark
● Transarc Corp. installed AFS on over 1000 servers at 150 sites - a survey showed cache hit ratios in the range 96-98% for accesses to a sample of 32,000 file volumes holding 200 gigabytes of data

DFS Enhancements/Future Developments

● (For full details, refer to the textbook, pages 359ff)
● WebNFS - allows access to NFS servers from the WWW, Java applets, etc.
● Spritely NFS - based on the Sprite OS, adds "open" and "close" to NFS
● NQNFS ("Not Quite NFS") - adds caching and callbacks to NFS
● NFS version 4 - in the advanced stages of development and deployment, and on the Internet standards track

DFS Summary - Key Design Issues

● The effective use of client caching
● The maintenance of consistency when files are updated
● Recovery after client or server failures
● High throughput
● Scalability

DFS - Current State

● DFSes are very heavily employed in organizational computing, and their performance has been the subject of much tuning
● NFS is still the dominant DFS technology
● However, AFS outperforms NFS in many situations

● Current state-of-the-art DFSes are highly scalable, provide good performance across both LANs and WANs, maintain one-copy semantics, and tolerate and recover from failures