DRAFT: The Story of the Write Once

Simson L. aarllnkel IRIS Brown University

@ Auguat 1, 1087

Abstract 1 Introduction

Thi. pap"r. d ... crib ... two file sy.temo which were In ~pring HIRI), I was @mployed as an designed for use with write-once media: The undergraduate researcher at the MIT Media Compact Disk File System (CDFS) and the Write Laboratory's Electronic Publishing Group.! That Once File System (WOFS). CDFS was d"siglled at spring, the laboratory received a Sony CDU·1 the MIT Media Laboratory during the summer of • prototype CDROM reader. Walter Bender, my 1985 and wa.s successfully used to master five research advisor, asked me if I would like to CDROM disk •. Althuugh CDFS WaS intended for experiment with the unit. use with write-once media, several design flaws prevented this possibility. The Write Once File For many years, the Electronic Publishing System, desiglled during the summer of 1987, Group (forma.lly called the Architecture M ..chine corrects the flaws of CDFS. Group) hrul. been interested in optical storage devices, in particular optical video disks, both This paper alBo describes an application because of the l.. rge amount of available storage and the relatively low access times when compared program written using CDFS which exports CDFS 2 file systems to remote computers via a high speed with other media, such as videotape. network and Sun Microsystem's Network File The CDU-l's minimal documentation System (NFS) protocol. indicated that the drive could read 2K blocks from a "CDROM" disk, each which was identified hy a This paper is divided into chapters. Each minute, second and block number. We tried an chapter designs a particular project involving one audio disk in the player and discovered that audio of the above file systems. For the purpose of reprinting, irrelevant chapters may be omitted. 1 An innovative program at MIT, "UROP' (undergradu­ a.te research opportunities program), alJows undergraduates to participate in research projects in the Institute'. departments The research described in this paper performed a.nd 16boro.toneo. at the MIT Media Lab was made possible by a :JOne of the Group's ea.rlier projects, "Aspen," allowed a. grant from IBM. person sitting in front of a touch sensitive monitor to "drive" a simulated car around the .treet. of Aspen, Colorado. To perform the demo. over 20,000 different views of the city had J. Spencer Love, formally of MIT Information to be stored on a video disk. This was one of the first uses of Systems, contributed significantly to the design of interactive video di.ks. Other project. by the group included CDFS. the use of write-once video disks in anima.tion.

1 -

WOFS Simson L. Garfinkel palSe 2 disks were not in the CDROM format. The test Over the next six months, under the disk which Sony had been provided contained supervision of Dr. Bender, I developed a file system approximately 10MB of test patterns on it; the rest for use with Write Once Media. At first, the effort of the disk was blank. "I wish they had asked me consisted largely of late-night design meetings with what to put on the rest of the disk," a graduate J. Spencer Love, then a system programmer at student said to me. "I would have given them MIT Information Systems, trying to devise a way pictures." to efficiently store and retrieve information from a write-once media. By the middle of the summer I We soon realized that in order to use our new had started on the file system's first CDROM player, we were going to have to make our implementation, which first became operational on own disk. The next question was how to arrange August 15, 1985. the information on the disk. In Spring 1985, neither Digital's UNIFILE CDROM format nor the Since write-once drives and media were not High Sierra standard existed. Such standards were available to use at the time, I developed the file under development at the time, but we were largely system using a write-once simulator. The simulator ignorant of such efforts. Without pre-defined presented the file system implementation with th" standards, the author decided to develop his own. appearance of a write-once device using a file resident on a magnetic disk. When we started A simple way for us to have used our CDROM mastering CDROMs later th~.t year, we discovered player would have been to master a CDROM that that the simulator system doubled as an excellent contained an exact, block-for-block of mastering system: the simulator file, copied to a an existing file system. With such a di.lc .url a nine-track magnMic tap.. in ANSI format, was all block level , we could read the disk as that was required by 3M corporation to master our a read-only file system on the same computer from CDROMs. which the data was originally "TP.at"rI. Other computers could read the disk by using a suitable conversion library or through some sort of 1.1 Design goals inrlependent such as Sun Microsystem's NFS. Our two primary goals in designing the file system were: The problem with the disk block image approach was that it would not be extensible to write-once optical devices as they became available, 1. That the file system operate with the same and the group'. experience with video disk. led us level of performance with any size of to believe that write-once devices would soon be write-once media, whether it be a 50MB available. Although applications such as optical card, a 650MB write-once COROM, or encyclopedias and maps would work fine with large a. 10GB jukebox or optkal plAtters, IJ.ml !ll .. ! read-only databases, other projects which we were this performance be comparable to interested in, such as personalized newspapers, performance obtained with magnetic media. would not. "It'. much mure interesting to think -and- about what you can do with a. 300MB database that is constantly growing," I told Dr. Bender, 2. That performance of the file system would not "than to think about all the ways to access 3UUMB degrade as a disk became filled with files and of static data.. Another application I was interested directories. by Was the possibility of using .. write-once optical disk system as a personal, portable mass storage system for coherent use with the wide variety of Since we didn't have a write-once device computers available at MIT. during the design process, we decided that we had WOFS Simson L. C",rfinkcl page 3

to make as few assumptions about the physical Micro-CDFS, to prove to Dr. Bender that it could llil.ture of write-once media as possible. The only be written in less than 10K of object code, hence assumptions that we made were that sequential the name. (The actual MCDFS implementa.tion blocks of data could be written to the write-once was less than 5K of object code and provided device, that the block size used by the device would complete emulation for all of the system be a constant from the first block to the last calls to access files in a read-only fashion.) (although no specific block size was assumed), that blocks once written could be rt'.ad in any order, and that the write-once hardware could determine In the Spring of 1987, several research groups whether a block had been written or was virginal. at MIT acquired write once devices and an initial attempt was made to use the CDFS. At this time, We did not assume that blocks could be the author discovered that a basic assumption invalidated or destructively written after the initial made by the CDFS implementation-that write write operation. We further did not assume that a opera.tions to the optical media were completely media would be consistently mounted on the same reliable-was invalid. Design of a new version of operating system or computer: that is, we wanted a CDFS (later named WOFS (Write Once File p.ll_on using our file system to be able to freely System)) with the assistance of James Anderson of move an optical platter from one computer to MIT's Project Athena was commenced at that another, even if the two computers used different time, but final implementation was delayed until operating systems. the author finished his undergraduate thesis' and graduated from MIT. This first version of the write-once file system was finished ill September 1985 and was called the Compact Disk File System (CDFS), it's name In June 1987, I was employed by the IRIS indicating that it was designed primarily for use project at Brown University to write a program to with comp ... cL di.ks (In 1911.:l, the author had allow the CDllOM mastered in the CDt·S format foolishly hoped that hardware vendors would be to be accessed via. Sun Microsystem's Network File supplying consumers with devices which would be System. Bec,,:use the project was designed to access ... hle to record in the audio CD and CDROM CUKOMs, the ptogram \CDFSD - Compact Disk format.) The implementation was written in File System Daemon) only implemented the NFS porta.ble C code and operated without modification procedures necessary for read operations. The on Digital Equipment Corporation's VAX series of read-only NFS server can be operated on any BSD computers under UNIX, Sun Microsystem's 4:2 (or later) UNIX computer which supports sun's workstations, and on IDM Personal Computers. portmapper RPC protocol and has a device driver capable of reading blocks from a CDROM player. During the following year and a half, CDFS It is currently in use with an IBM RT fPC was used to master four CDROMs at the MIT workstation. Media Laboratory and one CDROM at Brown University's IRIS project.3 In July and August 1987. the author In December 1986 and January 1987, I completed the development of WOFS. developed a read-only CDFS implementation, the

3The four MIT CDROMe consisted. of an initial test disk, • disk containing a copy of the CIA'. World Databank II and associated information, and two disks of encoded motion se.­ quences for the group'. "Movie'. of the Future" project. The IRIS disk consisted of the Thesaurus Linguae Graecae Da.ta • Radio Reswrch, McCartJiVi,m, and Paul F. Lazors/eld, Bank (aU remaining greek document. from the ancient world) Simson 1. Garfinkel, undergraduate thesis, M...... chusett. In­ and a.sociated index files. stitute of Technology, June 1987. WOFS SimsOll L. GarfinkeJ pa.ge 4

2 The Compact Disk File had been previously stored. Other file system System structures include Directories, File Headers and Lists.

2.1 How CDFS works File system structures contain pointers to other :file systems structures on the same media. These pointers are called c:dbloc:k pointers, and cnFS organizes write operations to the media. as " reference a particular block number and offset stream of sequential block writes. No blank blocks within that block. By allowing cdblocll: pointers to are left in the anticipation of future block writes. reference exact bytes, the CDFS specification allows CDFS' approach divides a disk into two discrete multiple :file system structures to be packed within regions: one in which blocks have been recorded the same logical block, although this possibility was and one in which they have not. The file system not exploited in our first implementation. implemE!'ntation kQ9ps record of where the dividing point between these two regions is by locating the The most recent EOT on the disk contains a last written block and maintaining that location pointer to the most recent Directory List on the during th .. cour." of all op"rations. disk. The Directory List contains an array in which each element of the array identifies a directory on When the implementation mounts an optical the disk and has a cdblocll: pointer to the most disk it first r~"n. t.he first block on the disk which recent version of that directory. (The full CDFS cont;Uns a special block called an End Of Directory List specification is presented in Transaction (EOT) block. The EOT contains, in a appendix A.2.) Similarly, each directory on the fixed byte,oM"r rApr ..."ntation, an 8 byte flag disk contains a cdblocll: pointer to the most recent which identifies the disk as a CDFS·format disk version of every file it contains. l:lnd. CQutn.ino other irnportAont in£v'[llla.tlvu. (Th'tl full CDFS EOT specification i. pr... .,nted in The Directory List is the key to an efficient Appendix A.1) implementation of a modifiable on a write-once disk. Since CDFS stores The End Of 'Transaction block i. so call

designed to be compact to further decrease storage 0 tha' EOT on disk (,uperblof;k) requireruent:s. "lif«.c" IlI.he.d,n and contenl...... ,1 In practice, write operations to the optical disk r'" 3 ""h ••l.e" tileh • .,der and conte.h are hatched in group. caJled Tran.actions. During i"<' , root diree'ory a Transaction, new files are written to the disk as ::;: , directory Ii,. they are created, but changes to their containing ~ i ..i EOT on di.k directorie. and th" Dir",tory List are buffered In • (bl.Db /ollow) memory. At the close of the Transaction, all modified directories, the Directory List, and a new EOT a.re writ tell and the disk can be removed from the drive. In the event of an interrupted Figure 1: Direct Read After Write disk after a sam­ Transaction (such as a power failure or removal of ple transaction the disk frum the drIve before dismounting), the File Header which is written with each file contains enough information to reconstruct the directories, o The first block of a CDFS disk contains an Directory List, and EOT which had not yet been EOT, which identifieo the di.k as a. CDFS disk. written. This block also contains the name of the disk's owner, the time of the disk's creation, the name of the .ite which created tll" CD and 2.2 Two Example Transactions other interesting information. 1 The file li/e.e is stored contiguously, preceded Although the CDFS standard does not specify the by its File Header. Since the File Header order in which Transactions should occurs, most contains a cdblock pointer to the contents of implp.mAntations will follow a prOCGSS simila.r to thP filA, th .. Fi]", H""d". could equally well b" example outlined bellow. written after the file. The file begins in the same block as the header and extends for two Tn this Py;\.mplp/ t.h~ writl:!'~ont:Q m9dia ~tarh block•. blank. In the first Transaction, the two files, life.c and wheel.c are written and placed in the root 3 The file wheel. e is the second file on the CD, directory. In the second Transaction, a new version stored as file header followed by file contents. The file and its contents fit within the single of the file life.c is written. This example Transaction is intentionally simple. block. 4 The root directory follows. The directory As defined in the standard, the first block on. contains a cdblock pointer to each file within the disk must be an EOT. During the course of the it. cdblock pointers are drawn as arrows in first transaction, the two files are written to the this example. disk. When the transaction is completed, the CDFS implementation writes a new copy of the 5 The Directory List follows the directories. It root directory to the disk, a. new Directory List, contains a cdbloc:k pointer to each directory and a new EOT. The EOT contains a pointer to on the disk. the Directory List, which contains a pointer to the 6 An EOT is the last block written to the CD. It root directory, whkh contains pointen to each of contains a cdblock pointer to the Directory the files. The state of the disk at the completion of List. the first transaction is depicted bellow:

The following blocks were written on the first The section transaction takes place transaction: independently of the first. At the start of the WOFS Simson L. Garfinkel page 6

• cdblock pointers to the most current version • "HI.. ,." 111 ...... <1 ...... " ..... , ... t. of each me within it. Note that directories can , (and often do) reference files which were , Uwheel.c" &lehlilader 6D.d con ten', written on previous Transactions. - • 1 11 The Directory List is written after the root • directory. It contains a pointer to the most J. EOT • recont vorsion of that directory. r'" 1 l- • 12 The EOT block is written last. It contains a • pointer to the most recent Dir""tory List. It roo' directory ( ••eGad vfl,ian) ~ ,. f-- also contains a pointer to the previous EOT. 11 dir.etory lilll. (.eeond venion) C 12 I-- 1••• EOT on ditk (bhutk. foUow) 2.3 Design choices made in CDFS

The design choices made in developing the CDFS Figure 2: Direct Read After Write disk after a second were based both upon the desire for the file system sample transaction. Arrows on the left hand side to efficient and for the file system to be usable in an trace pointers from the EOT to the most current environment of heterogeneous operating systems, version of each file. Arrows on the right hand side hardware vendors and administrative policies. A trace pointers to previous versions. secondary goal WiUi to provide substantial amounts of redundant information in the filesystem so that information could be easily - even automatically second transaction, the implementation reads the - recovered from damal1:ed disks. first and the l""t block. on the disk. A new vee"ion of the file life.c is written to the disk and the The design of the CnFS file header clearly Transaction is ended, which causes a new root illustrates how our concerns translated themselves directory, Directory Li.t .. ud EOT to be written. into the file system structures. CDFS file headers After the second transaction a schematic view of average over 240 bytes in length - substantially the optical media would looks like figure 2. larger than the equivalent structures in other operation systems (for example, UNIX inodes), but While no blocks on the disk have been still very small when compared to the average size changed, the last EOT commences a chain of of files on an average computer or when compared pointers which points now to the newest version of to the amount of space available on optical media. each file, because the definition of "the last EOT" CDFS uses the space to store on a per file basis has changed. information which traditional operating systems store once per magnetic disk. (The specification for The blocks that were written on the second the CnFS file header is given in appendix A.6). transaction include: For example, the CDFS file header includes 7 The first file written on the second transaction the full user name of the person who created the is the updated v"r.ion of life.c. Th" new fil" file (rather than merely storing the user's header of li/e.e contains a pointer to the "number," as the does, or storing previous version (shown on the right). The nothing at all, as the MSDOS and Macintosh file new veT.ion of life.c is thr

By explicitly aoaigning 0. version number And ~ Early in the CDFS development effort, we known length to each file system structure, the recognized clear advantages to using the same file CDFS specification allows the possibility for system standard for read-only and write-once extension to be made within the context of the optical storage devices. Beyond the ability to use existing standard. For example, files which are the write-once system as a mastering platform for actually links to other files are implemented as the read-only disks, using the same file system different version of the file_into 8tru«,;tun! in Lhe standard allows same software which was used to file header. create the read-only disk to retrieve the While data transfer rates to and from optical information. Another exciting possibility allowed disks are very high, head repositioning time is very by using the same file system for read-only media slow. CDFS is designed to minimize the number of as for write-once is that of using write-once media head rcpooitioning event. necessary to read as a publication format, to which a user can add information from a disk. new information (such as personal comments or updates) in a similar manner to the way a user can write in the margins of a book with a felt tip pen. 2.4 Advantages of CDFS Over Possibilities such as these led the Electronic Traditional File Systems Publishing Group to adopt CDFS as our standard for CDR.OMs in Fall HlRFi_

The principle advantage of using write-once optical After the Electronic Publishing Group made media over magnetic media (besides the increased its first CDROM, the author was contacted by Paul storage space), is the ability to recover any file or D. Kahn at Brown University's ffiIS group, document that was ever stored. Even if a file is requesting help in making the TLG CDROM updated by a later version or if it is "deleted" from mentioned above. The disk was pressed in spring its containing directory, it is in principle always 1986 and a variety of application programs were possible to find the original document. CDFS written to use it. The progra.ms accessed the realizes this possibility by providing specific information on the CDROM via. the CDFS interfaces for locating previous versions of files or subroutine library. During the summer of 1987, the for "undeleting" files after they have been deleted. author moved to the ffiIS project as a temporary systems programmer to write a program to allow Since blocks, once written, are never changed, the TLG CDROM to be read via the Sun the process of performing an incremental backup of Microsystems Network File System (NFS) protocol. a CDFS disk is simplified considerably over magnetic file systems: the disk's newly written The program, the Compact Disk File System blocks are merely copied, block-for-block, to the Daemon (CDFSD), allows a CDFS mastered disk WOFS Simson L. Garfinkel page 8 to be read (from any computer which uses the Sun The NFS system actually consists of two NFS protocol) as if it was a standard unix file protocol.: ami RFS (Rewute File system. Logically, CDFSD incorporates the CDFS System). The MOUNT protocol provides a means subroutine library into the operating system's for servers to identify to clients what filesystems h.et'nd, ollQwing the 3ubu:.u.t.ille'.l'lo 1,(., h~ cli'moved are candidates tor file sha.ring. MU U NT also a.lIow6 from user level programs. servers to monitor clients which are using which shared file systems, principally to allow notification. to be sent to ~li"llt. when servers are 3.1 NFS about to be withdrawn from service. The MOUNT protocol's only interface with the actual file system i. tu return upon request to the client a thandle to NFS is a layered system which allows one computer the root directory of any given file system. to access files on a remote computer M if they were mounted loc...:uy. NFS i. b""oo UpUll Sun RFS i. the protocol In which performs the Microsystem's Remote Procedure Call (RPC) actual mapping from operating system service calls library, a system which allows one computer to on the client to RPC calls which are executed on execute fun~tion. UlI

was required to writ. xoa since the computer which their early product. were ba.eed upon, the Motorola 68010, was • low-byte first byte ord.red machin... wbili. nth .. m._ ::1.2 Compact Disk File System Daemon chines owned by Sun'. targeted customers were high-byte first ordered machines. The Compact Disk File System Daemon (CDFSD) CDFS (alld WOFS) Implements Its own byte-order indepen­ is a user level program which implements the server dent external data repres.ntation standard which, while not as comprehensive as Sun MiclOSystems, is marginally faster side of both the MOUNT and RFS protocols. in execution. COFS adopted VAX byte-oMerin.. WOFS CDFSD accepts and responds to RPe requests changed the byte-ordering to 68000 byte-ordering. When using Sun Microsystem's svc.registerO, CDFS mounts a disk, il examines Ihe byte-ordering used on a VC.rull and BVcsendreplyO+ RPC library the disk and can reverse its byte-ordering on the fly. °, 11n pra.ctice, the NFS server is written into the "erver'& op­ functions. Since no provisions are made in their the erating system to provide direct access to file system structures RPC or the NFS protocols for multiple server (in particular, unix inodes) a.nd to improve performance. processes on a single host (for example, by WOFS Simson L. (iarfin k~1 page 9

assigning multiple program numbers to NFS servers contiguously, this procedure merely counts or allowing NFS to uoe arbitrary UDP port from the start of the file the requested offset numbers), CDFSD cannot coexist on a machine and then copies data directly from the which is also a unix-filesystem NFS server. CDROM to the RPC reply buffer.

Unlike a unix fhandle, which consists of a file RFS..READDIR This procedure is used by NFS system number and an inode number, the CDFSD to read the contents of directories in a fh ..ndl. comi.t. of the following information: operating system independent ma.nner. This procedure necessarily performs the translation between the CDFS and unix directory • "h~ iiI" number. structure directories, which NFS u~ea. • The containing directory number. While UNIX directories contain only file names and inode numbers, CDFS directories • The drive number. contain additional information (such as file size and last modification date). This additional • The file header location. information was intentionally placed in the directory structure to speed extended directory Although the CDFSD file number and list commands (e.g. the MSDOS "DIR" containing directory number can be derived from command or the UNIX "Is -1" command) by the file header location, incorporating this eliminating the need for the optical disk to information into the fhandla eliminates the seek to each file header when a directory list necessity of having to read the referenced file's file command is executed. Unfortunately, since header before the RPC call can be serviced. The NFS is basically a translation of the UNIX file overhead of including this information is negligible system to RPC subroutines, this additional to the server and zero to the cli~nt. information is not useful to CDFSD. RFS.,STATFS This procedure returns CDFSD implements the following RFS information about the CDFS file system. procedure calls:

RFS.NULL This procedure does nothing and is 3.3 Caching and Performance used for timing. CDROM performance (and performance of optical RFS_GETATTR This procedure returns .toragp .ystl'm. in gl'nA.al) d"p"nds on two fa~tors: attributes for a file or directory. It is the RPC how long it takes for the drive's read head to implementation of the unix statO and fstatO traverse the surface of the media to the correct calls. The CDFSD implementation necessarily blocle on the ,Ii_le (_.,,,k tim,,) and how long it tale"" performs translation between CDFS file to transfer data from the disk to the host computer headers and unix stat buffers (which are used (transfer time). CDFSD employs two forms of by NFS). caching to improve subjective performance: block RFS..LOOKUP This procedure performs the caching for blocks containing file system structures translation between file names and fhandlas. and read-ahead caching for data transfers. It j. th~ unly RFS procedure that creates fhandles. All CDFSD procedures call the function cd.raad_ to transfer data from the CDROM to the RFS.READ This procedure reads a requested computer. The third parameter of the cd..r.... d_ number of bytes in a file from a given starting function is an integer flag which is used to location. As CDFS files are stored differentiate reads of file system blocks from reads WOFS .'lim .• on L. Garfinkel page 10 of user data. cd..read_ maintains a 300 block the accumulated knowledge over the previous two associative cache for all read op"ratiuu. of file years. These changes substantially reduced the system structures. This cache dramatically complexity of the file system implementation. In improves performance of file system operations recognition of the substantial changes, the file such"" "open file," ami "read directory," by system was renamed the Write Once File System. changing them from disk· to-memory transactions to memory-to-memory transactions. This cache is invi.ible to the procedure calling cd..r8ad_ . 4.1 WOFS changes to the CDFS standard When cd..read_ is forced to read a block of datil from the disk, It reads several blocks at a time The principle difference between the WOFS and into a read ahead cache. If successive read requests are for a block in this read ahead cache, ed..read_ the CD FS is the adoption ofa new kind of file system structure called a "file system block," 0.1"". not have to perform another physical read. (fsblock for short). An fsblock is a 16-byte header, This cache dramatically improves performance of reading user data from files, since the time required containing a flag, a self-referential pointer, a type identifier and a chain count (8ee a.ppendix B.2) to read two blocks from the COROM is negligibly more than the time required to read one block. which identifies a block on the optical media. as a This cache is also invisible to the procedure calling block in which file system structures a.re stored. By cd..read_ . moving all COFS self-referential pointers to a new layer procedure layer between the hardware device In comparison between the CDFSD and the driver and the CDFS subroutine library, CDFS was • tandard NFS daemon operating from a magnetic greatly simplified. Other changes made to the file system, we find that many operations involving standard include: small files are marginally faster with the CDROM (definitely a result of the block caching) and • Giving explicit lengths to all variable length sustained read operations involving large files are file system structures. marginally slower with the CDROM (owing to the slower transfer rate from CDROM to computer, • Inclusion of the directory name into each when compared with magnetic.) directory list element of the directory list to allow resoltltion of path names without requiring each directory's file header and 4 The Write Once File System directory contents to be loaded. • Lengthening of the maximum file name size from 48 bytes to 80. When MIT received its first shipment of optical drives and media in Fall 1986, the author • The adoption of a new file type-Log discovered a serious flaw in the design of the CDFS files-which allows low-overhead a.ppend-only implementation: no provision was made for failed logs to be maintained. writos on the optical mediA. Althuugh the CDFS standard was insensitive to this oversight, the MIT • The implementation of fragmented files. Media Laboratory's implementation depended on disk write operatioJUl being perfectly reliable. 5 Related Work During the summer of 1987 the CDFS implementation was completely redesigned to allow for failed media writes. At the same time, the Other proposal for the use of write-once media (e.g. CDFS standard was substantially revised based on Easton 1985) assumed that write-once hardware WOFS Simson L. Garfinkel page 11 would have the ability to invalidate information IBM Research Laboratory, San Jose, California, previously recorded on the surface or to fill in fields July 1985. previously left empty. Time has shown these assumptions to be largely incorrect, due to the Dynamic Linking in a Small Address Space, nature of the error correction codes employed in M. L. Kazar, S. B. TheSis, Uepartment of Electrical the low level driver hard ware. Engineering and Computer Science, MIT, May 1978. Nearly all other proposals for the use of write-once media involve storing directory "A distributed file service based on optimistic information on some form of rewritable magnetic concurrency control," S. Mullender and storage system. The Amoeba file server [Mullender A. Tanenbaum, Proceedings 0/ the A eM and Tanenbaum, 1985] is an example systems of Symposium on Operating System Principles, 15-62, this type. December 1985. The paper that describes the Amoeba file server. A significant excpetion is the recent work on log files at Stanford University [Cheriton and "A Fast File System for Unix," Finlayson, 1986]. Like the write once file system, M. K. McKusick, W. N. Joy,S. J. Lerner and the Cheriton and Finlayson's log files exploit the R. S. Fabry, Berkeley Unix documentation, version characteristics of write once devices rather than 4.2, 1983. attempting to hid them from the user. The service they describes "provides efficient storage and The Multics System; An Ezamination of its retrieval of data that is written sequentially Structure, Elliott I. Organick, MIT Press, 1972, (append-only) and not subsequently modified." pag.,. 217-234. This paper was the inspiration behind the WOFS log file facility. "Compact Disc Digital Audio Systems," David Ranada, Compute" and Electronic., Augu.t 1983.

"Networking on the Sun Workstation," Sun 6 References Microsystems, Mountain View, CA 94043. 1985

"Working Paper for Information Processing: Volume and File Structure of CD-ROM for Information Interchange," prepared as a working paper of the CD"ROM Ad Hoe Advisory Committee, popularly known as the High Sierra Group, Optical Information Systems, January-February 1987.

Log Files: An Eztended File Service Exploiting Write-Once Optical Vi.", D ..vid R. Rherltoll MId Ross S. Finlayson, Stanford University, 1986.

"Thermo-Magneto-Optical Disk P ramise. High-Capacity, Low-Cost Removable Storage," Digital Design, August 1985.

key-Sequence Data Sets on Indelible Storage, M. C. Easton, Technical Report RJ 4776 (50637), WOFS Simson 1. Carfink,,] page 12

A CDFS Specification A.2 Directory List structures

A.3 Directory List Header format

.define DL_ID_STRI'G_LEI 8 A.1 End Of Transaction (EOT) format #define DL_ID_STRIIG "\237\OOlCDFS\250\OOO" #dotin. DIR_LIST_VERSIOI 1

typede! atruct { char id_.tring[ DL_ID_STRIIG LE. 1 :

typedet atruct { int1S dir_list_veraion; int32 modulo_of_value; int16 dir_liat_header_lengthj in"t16 bi'te_in... vo.lloWi cdblock d1r_llst_lOc; intlS pad; } pointerdet; intiS dir_list_checkaum; i:nt11 pAdJ ed.tine EOT_ID_STRIIG_LE' e #dotine EOT_ID_STRIIG "\237\002\CDFS\25S\OOO" cdblock prev_dir.list; edetine EOT_VERSIO' 1 .define CDFS_IMPLEMEITATIO._ID 1 int32 dir_list_entry.count; } dir_lilt; typedef struct { cher id_atringC EOT.ID.STRIIG.LEI 1; A.4 Directory List Entry format intlS eot_version; intlS eot_lonsth; typedef .truet { cdblock eot_location; int32 dir.number; cdblock header_location; int1S eQt_checkswa; int32 containing.dir; int1S COFS_implementation-id; int64 modify. time; int64 contained.bytes; cdblQck current_41r_lilt; intiS header_size; cdblock previous.eot.location; int1S pad; cdblock next_eot_location; } lilt.element;

int64 tilesyste•• creation_time; int32 trana_number; A.1i Directory Format int64 trana_"tart_tim.; int64 trana_end_Ume; int32 fil.a_writton_on-tran.; _define OIRECTORY.lIFO.vtRSrOI. 1 int32 dirl_writt.n-on_tran.; typedet atruct { lnt32 next_fr••• fl1e.number; int32 directory.info.version; int32 directory_info.length; pointndet pointerdeaC1S1; int32 directory.entri •• ; intiS number.of.used.pointerdeta; int32 d1rectory.entry_size; char encryption.standard[32]; } directory_into; char ovners_name[ variable 1; #define MAX.COMP.LEI 48 } eoLfomat; typodef atruct { char tile.name[ MAX_COMP_LEI ]; WOFS Sjmson L. Ga.rfinkel pa.ge 13

cdblock h.ader_location; char tila_group(altOUPLEI]; int64 DlQ4ify_t.J. ..,,; 1nne %11e_access; int32 file_number; } access_info;

int~'2 ""1e_vil!!ll"sion; .define DOW'_DIR_CHAR 0376 int16 file_type; 'define UP_DIR_CHAR 0375 int16 header_size; intiS ac1dname_ count; 'detine BACKUP_llFO_VERSIOI 1 int1S pad; typedef struet { } dir_contenta; intiS backup_into_version; int1S backup_info_length;

int32 containins-directory_number; A.6 File Header Format cdblock previous_version_location; cdblock preYioua_eot_1Qcatlun; adefine FILE_TYPE 1 int16 filename_off.et; #define DIRECTORY_TYPE 2 intiS previouI_verlioD_header_Biz.; adefine SOFT_LIIK_TYPE 3 char backup_pathname r variable J; #detine FRAGMElTED_TYPE 4 } backup_info: #define ADOIAME_TYPE 6

#detlne PH_ID_STRIIG_LEI 8 _define FILE_IlFO_VERSIOr 1 #detine PH_ID_STatlG "\237\001\CDFS\255\OOO" typedef struct { int16 file_info_version; .d.~in. HEADER_VERSIOI 1 int16 tile_info_lengt.lLj typed.f struet { char id_string( FH_ID_STltIIG_LEI ]; edblock tile_location; int.1A h ••d.r_v.r.ioft; int32 file letlj!;th; int16 header_length; int64 write_tille; int16 header_checksum; int64 ereatioll.. till.; int1S til.header_length; int32 tile_veroion_number; } file_info; cdblock tileheader_location;

int32 tile_number; /* If block is a soft link, ule lott_link_into intiS file_type; • to decode file_info */

int16 A~e ••• _info_ot1 ••tj 'define SOFT_LIIK_VERSIOI 1 int1S backup_info_offset; typed.f atruct { int1t1 file_into_offset; int16 ooft_link_into_veraion; int1S site_into_offset; int16 aott_link_info_length; inti6 property_list_offset; int64 creation_time; } tileheader; int32 target_dir; int32 tarset_ver.iQn; #define ACCESS_IlFO_VERSIOr 1 char target_namer variable J: adetine GltOUPLEI 32 } 8ott_link_info; adefine OWrERLEr 32 #define SITE_IlrO_VERSIOI 1 typedef struct { typedet struct { int16 access_info_version; intI8 site_into_version; inti6 access_info_length; int16 aite_info_length;

char file_owner(OWlERLEI); char opsys [16]; WOFS Simson L. Garfinkel page 14

char opsys_version[16] ; char file_namer MAX_CaMP_LEI ]; (;har site_nameC variable J; cdblock header_location; } site_intoi intS4 modify_tim.; int32 file_number; #d.fin. PROPERTY_LIST_VERSIO. 1 int32 tile_Bize; typ.def struct { int32 file_verlion; int32 property_list_version; int16 file_type; intiS property_list_length; int.tA h4!la.d.r_lIi~.; intiS addname_count; int1S pad; } dir_contonts; typede! struct { int18 prop.rty_na•• _l.n; B WOFS Specification int18 property_value_len; char property_nam.[ variable ]; char property_valuer variable ]; R.1 Time and wblock structures } property_list_record;

typedef struct { u_long high; A.7 Fragmented files file map u_long low; } int64; #d.tine STRIP_I8FO_VERSI08 1 typedef 8truCt { typedef int64 wtime; int32 strip_in1o_verlion; int32 Btrip_info_l.ngth: typedef struct { int32 strip_count; u_long block: } strip_info; u_short offset; u_short pad; typed.t struct { } wblock; cdblock loco int32 valid_(;hara; int32 ordinal; } fragmented_des; B.2 File System Block

'define FS_BLOCK_FLAG "\237\200\005\000"

A.S Directory Format .define FS_EOT_TYPE 1 'define FS_DL_TYPE 2 #define DIRECTORY_lIFO_VERSIOI_ 1 'define FS_FH_TYPE 3 typedof Itruct { 'define FS_DIR_TYPE 4 int32 directory_into_version; int32 directory_info_length; 'define FS_HEADER_SIZE (4 + sizeof(wblock) + 2 • sizeof(u_short» int32 directory_entries; int32 directory_ontry_size; 'define FS_DATA_SIZE \ } direetory_in~o: BLOCK_SIZE - FS_HEADER_SIZE #define MAX_COMP_LEI 48 typed,: struct { typedef struct { WOFS Si"J>uu L. Garfillkel page 15

char flag[4]; B.5 Directory List wblock loc; 1* self *1 u.short type; 'define DIR.LIST.VERSION 3 u_short chain.count: 'dttl!m. MAX.COMP .SIZE 80 char data[ FS.OATA.SIZE ]; } tS.block; typedef struct { u.Iong nag; u.short dir.Iist.version: u.short dir.list.header.length; B.3 Flags and other definitions

#define EDT.FLAG 1 u.long d1r.list.entry.count; #define DL.FLAG 2 } dir.list.info; /* array follows *1 #ddine OI.FLAG 3 #define FH.FLAG 4 typedef struct { u.long dir.number; wblock header.loc; B.4 End of Transaction u.long containing.dir; wtime modify. time; #define EDT. VERSION 2 int64 contained.bytes; typedef struct { u.short header.size; u.short pad; u.long flag; char name[MAX.COMP.SIZE]: u.short eot.version; } dir.list.element; u.short eot.length;

wblock current.dir.list; u.long prev.eot.loc; B.6 Directory format u.long next.eot.loc;

wtime filesyste~creation.time; /* • A dir is a normal file in which u.long trans.number; * the contents consist of a wtime trans_start. time; * dir.info header followed by a an wtime t~ana.end.time; * array of dir.contents. The * length stored in the file. info u.long files.written.on.trans; * structure of the file.header is u.long dlrs.WTltten.on.trans; * equal to sizeof(dir.info) + u.long next.tree.file.number: * dir.entries * dir.entry.size; *, cnar volume.name[32]; char encryption.standard[32]; char owners.name 'define DIR_INFO.VERSION 2 [FS.DATA.SIZE·128]: typedef struct { u.long flag; u_long dir.info.version: WOFS Simson L. Garfinkp.l p.g.. 16

u.short site.info.offset; u.short Slte.lnto.length; u.long dir.entrie.; u.short property.info.offset; u.long dir.entry.size; u.short property.info.length; } dir.info; } fileheader;

'detine UNIX.ACCESS.INFO.VERSION 1 'define NAME. LEN 32 typedef struct { typedef struct { Ch4X" file.name[KAX.COHP.SIZE]: u.10ng access.info.version: llblock header.loc: lltime modify. time: char file.ovner[ NAME. LEN ]: u.long fUe.nUJllb"r; char tile.group[ NAME. LEN J; u.long file.length; u.short unix.access; u.short file. type; } access. info ; u."hort header.size; u.short addname.count: u.short pad; /* backup info defines *1 } dir_contenta; 'define BACKUP. INFO. VERSION 2 typedef struct { u.long backup. info. version; u.long containing.dir.number; B.7 File Headers wblock prev.version.loc; u.10ng prav.eot.loc; u.short prev~version.header.size; /* Define tile types */ char filaname[MAX.COMP.SIZE]; } backup.info; Idefine NOTHING. TYPE o .define FILE. TYPE 1 'defina CONTIG.FILE.INFO.VERSION 1 .define DIR.TYPE 2 'define FRAG.FILE.INFO.VERSION 2 #define LINK. TYPE 3 t·ypedaf struct { 'define HEADER. VERSION 2 u.10ng fila.info.varsion; typedef struct { u.long flag; wblock contents; u.short header. version; u.10ng byte.count; u.short header.length: wtime write. time; wtima creation. time; u.long file.nuaber; u.long file.version.number; u.short file.type: } file. info ; typdef file.info frag.file.into; u.short access.info.offset; u.short access.info.length: 1* In a fragmanted fila: u.short backup.info.offset; * contants . 1oc of frag.des array. u.short backup.info.length; * byte.count • size of the array. u.short file.info.offset; *1 u.short file.info.length; WOFS Simson L. Garfinkel page 11 typedef struct { wblock loc; 'define SITE.INFO.VERSION 1 u.long valid.chars; typedef struct { u.long ordinal; u.long site.info.version; } frag.des; char opsys[ 16 ]; #define LOG.FILE.INFO.VERSION 3 char opsys.v[ 16 ]; typedef struct { char s1 te.name [ 64 ]; u.long file.info.version; } site.info :

"block entry; u.long entry.bytes; 'define PROPERTY.LIST_VERSION i "time entry.time; typedef struct { "block prev.header; u_long property_list_version; u.long total.bytes; u.long property.11st.length; wtime creation.time; u.long property.list.entries; } log.file.info; } prnpArty.list.info;

'define variable 1 1* If file.header is a soft link, typodef struct { * use sOft.link.info. u.short property.name.len; * to decode the file. info *1 u.short property.value.len; eh~r property.namo[ variable ]; 1* As denoted in symbolic links *1 char property.value[ variable ]; .define DOWN. OrR. CHAR 037~ } property.list.record; 'define IlP.DIR.CHAR 0375

'define SOFT.LINK.VERSION 3 typAdof stru~t { u.long soft.l1nk.info.version;

int64 creation. time; u.long target.dir; u.long target. version; char tarset.nameC 256 ]: } soft.link.info ;

'define FIRM.LINK.VERSION 4 typedef struct { u.long firm.link.info.varsion;

int64 creation.time; u.long target.dir; u.long target.version; u.long target.filenumber; } firrn.link.info ;