The LOCUS Distributed Operating System

Bruce Walker, Gerald Popek, Robert English, Charles Kline and Greg Thiel

University of California at Los Angeles 1983

Presented by: Quan (Cary) Zhang

LOCUS is not just a paper…

History of Distributed Systems

 Roughly speaking, we can divide the history of modern computing into the following eras:
 1970s: Timesharing (1 computer with many users)
 1980s: Personal computing (1 computer per user)
 1990s: Parallel computing (many computers per user)
- Andrew S. Tanenbaum (in the Amoeba project)

LOCUS Distributed Operating System

 Network transparency
 Looks like one system, though each site runs a copy of the kernel
 Distributed file system with name transparency and replication
 Remote process creation & remote IPC
 Can work across heterogeneous CPUs

Overview

 Distributed File System
 Flexible and automatic replication
 Nested transactions
 Remote Processes
 Recovery from conflicts
 Dynamic Reconfiguration

Distributed File System

 Naming similar to Unix: a single directory tree
 Transparent naming: a name does not encode the storage location
 Transparent replication
 Filegroups are glued into the directory tree

Distributed File System (Cont.)

 Replication
 Improves availability, yet complicates updates
 Directory entries are stored `nearby' the sites that store the child files
 Performance
 Higher levels of the naming tree should be highly replicated, and vice versa
 Essential for environment set-up files (Q)

Mechanism Support for Replication

 Physical containers store subsets of a filegroup's files
 All copies of a file resolve to the same logical file
 Version vectors handle the complications of update across partitioned copies (a comparison sketch follows)
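A minimal sketch of how version vectors can be compared during reconciliation; this is my own illustration (the dictionary layout, function name, and site names are assumptions, not the paper's data structures). If one vector dominates the other, the dominated copy is simply stale; if neither dominates, the file was updated in both partitions and a real conflict must be resolved.

```python
# Minimal version-vector sketch (illustrative only, not LOCUS code).
# Each storage site keeps a counter of the updates it has originated for a file.

def compare(vv_a: dict, vv_b: dict) -> str:
    """Return 'equal', 'a_newer', 'b_newer', or 'conflict'."""
    sites = set(vv_a) | set(vv_b)
    a_ge = all(vv_a.get(s, 0) >= vv_b.get(s, 0) for s in sites)
    b_ge = all(vv_b.get(s, 0) >= vv_a.get(s, 0) for s in sites)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"      # copy B is stale; propagate A to B
    if b_ge:
        return "b_newer"      # copy A is stale; propagate B to A
    return "conflict"         # updated in both partitions; needs reconciliation

if __name__ == "__main__":
    # Copy at site 1 saw two local updates; copy at site 2 saw none of its own.
    print(compare({"site1": 2, "site2": 0}, {"site1": 1, "site2": 0}))  # a_newer
    print(compare({"site1": 2, "site2": 0}, {"site1": 1, "site2": 1}))  # conflict
```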

File System: Operation

 All operations keep the Unix system-call interface intact
 Access roles:
 Using Site (US)
 Storage Site (SS)
 Current Synchronization Site (CSS)

[Figure: multiple Using Sites (US) accessing the file system]

File System: Operation (cont.)

 Open/Read

[Figure: open/read among US, SS, and CSS; read sharing]
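Below is a rough, illustrative simulation of the open/read control flow among the three roles; the class and function names are my own, not LOCUS interfaces. The US contacts the CSS of the filegroup, the CSS synchronizes the open and selects a storage site, and data pages then move directly between the US and that SS.

```python
# Illustrative simulation of the open/read control flow (names are assumptions).

class Site:
    def __init__(self, name):
        self.name = name

class StorageSite(Site):
    def __init__(self, name, files):
        super().__init__(name)
        self.files = files                      # inode -> list of pages

    def read_page(self, inode, page_no):
        return self.files[inode][page_no]       # data goes straight back to the US

class SynchronizationSite(Site):
    """CSS for one filegroup: enforces global open synchronization."""
    def __init__(self, name, storage_sites):
        super().__init__(name)
        self.storage_sites = storage_sites

    def open(self, inode):
        # Pick a storage site holding a copy (selection policy simplified here).
        return next(s for s in self.storage_sites if inode in s.files)

class UsingSite(Site):
    def open_and_read(self, css, inode, page_no):
        ss = css.open(inode)                     # US -> CSS -> chosen SS
        return ss.read_page(inode, page_no)      # US <-> SS for data pages

if __name__ == "__main__":
    ss = StorageSite("ss1", {7: ["page0-data", "page1-data"]})
    css = SynchronizationSite("css", [ss])
    us = UsingSite("us")
    print(us.open_and_read(css, inode=7, page_no=1))   # -> "page1-data"
```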

File System: Operation (cont.)

 Close
 Only the last close at a US is propagated, because the US may have the file open several times

Name resolution

 Search for the pathname iteratively, starting from the working directory or the root (a per-component sketch follows this list)
 Each step finds a directory entry matching the next pathname component
 Possible optimization: no synchronization through the CSS
 Because a directory read never sees an inconsistent picture (a directory entry is just a pointer)
 Instead of iterating through a remotely stored subtree, migration of the search to the remote site could be used
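As a concrete illustration of the iterative search, here is a small sketch under my own assumptions (an in-memory directory table standing in for possibly remote directories): each pathname component is looked up in the current directory, whose entry is just a pointer to the next inode.

```python
# Illustrative per-component pathname resolution (not the LOCUS implementation).
# Directories map a name to a child inode number; each inode lives on some site.

DIRECTORIES = {
    1: {"usr": 2},            # inode 1 = root directory
    2: {"bin": 3},            # inode 2 = /usr
}
FILES = {3: "contents of /usr/bin"}

def resolve(path: str, root_inode: int = 1) -> int:
    """Resolve an absolute path one component at a time, returning the inode."""
    inode = root_inode
    for component in filter(None, path.split("/")):
        directory = DIRECTORIES[inode]           # may require a remote directory read
        if component not in directory:
            raise FileNotFoundError(path)
        inode = directory[component]             # a directory entry is just a pointer
    return inode

if __name__ == "__main__":
    print(resolve("/usr/bin"))                   # -> 3
    print(FILES[resolve("/usr/bin")])
```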

File System: Operation (Contd…)

 Modification
 Modifications are written to new pages (old pages are retained), followed by an atomic commit to the SS, and a close
 Commit & abort (an undo log is used)
 One copy of the file is updated via shadow pages and committed (a sketch follows this list)
 The CSS is notified (the version vector is changed), as are the other SSs
 Updated-file propagation: other SSs "pull" the new version, using the same commit mechanism
 Creation
 Storage locations / number of copies for a new file are determined at create time
 Attempts to use the same storage sites as the parent directory, or the local site
 Remote sites: the inode is allocated within a physical container determined by the filegroup
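To make the shadow-page commit above concrete, here is a minimal sketch of my own (not LOCUS code): writes go to new pages while the old pages stay visible, commit atomically switches to the new pages, and abort merely discards them.

```python
# Illustrative shadow-page commit/abort (structure and names are assumptions).

class ShadowFile:
    def __init__(self, pages):
        self.committed = list(pages)     # old pages, still visible to readers
        self.shadow = {}                 # page number -> new page contents

    def write(self, page_no, data):
        self.shadow[page_no] = data      # modifications go to new pages only

    def commit(self):
        # Atomically switch to the new pages; a real system would also bump the version.
        for page_no, data in self.shadow.items():
            self.committed[page_no] = data
        self.shadow.clear()

    def abort(self):
        self.shadow.clear()              # old pages were never touched

if __name__ == "__main__":
    f = ShadowFile(["old-0", "old-1"])
    f.write(1, "new-1")
    print(f.committed)                   # ['old-0', 'old-1']  (readers still see old data)
    f.commit()
    print(f.committed)                   # ['old-0', 'new-1']
```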

File System: Operation (Contd…)

 Machine-dependent files
 Different versions of the same file (selected by process context)
 Remote devices and IPC pipes (distributed memory)

Remote Processes

 Supports remote fork and exec (via a special module)
 Copies the entire process memory to the destination machine: can be slow
 The run system call performs the equivalent of a local fork plus a remote exec
 Shared data is protected using token passing (e.g. file descriptors / handles)
 Child and parent are notified upon failures

Recovery

 "Partitioning will occur"
 Strict synchronization within a partition (partitions operate independently, using transactions)
 Merging:
 Directories and mailboxes are easy
 Issue: a file is both deleted and updated
 Solution: propagate the change or the delete, whichever happened later (a merge sketch follows this list)
 Name conflict: rename and email the owners
 Automatic reconciliation at the CSS
 Else, pass to a filetype-specific recovery program
 Else, mail the owner a message
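A sketch of the "whichever is later" merge rule for a deleted-and-updated file, under my own simplification of using timestamps in place of the paper's version vectors and CSS machinery; cases that cannot be resolved this way would fall through to a filetype-specific recovery program or to mailing the owner.

```python
# Illustrative partition-merge rule for one directory entry (my own simplification).
# Each side reports the entry's latest action and when it happened.

from typing import Optional, Tuple

Action = Tuple[str, float]   # ("update" | "delete", timestamp)

def merge_entry(a: Optional[Action], b: Optional[Action]) -> Optional[Action]:
    """Keep whichever action happened later; None means 'no change on that side'."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a[1] >= b[1] else b

if __name__ == "__main__":
    # Partition A deleted the file at t=10, partition B updated it at t=12:
    print(merge_entry(("delete", 10.0), ("update", 12.0)))   # ('update', 12.0)
    # The other order: the delete wins.
    print(merge_entry(("delete", 15.0), ("update", 12.0)))   # ('delete', 15.0)
```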

Dynamic Reconfiguration

 Principles for reconfiguration
 Delays (internal table reconstruction) should be negligible
 Consistency: the same version should be available even upon failure
 Clean up when a failure is detected
 Close affected files and sessions & issue error return values
 Partition protocol: find a fully connected component (a sketch follows this list)
 Synchronize the site tables
 Poll all suspected-good machines, then broadcast the result
 Merge protocol: forms larger partitions - centralized
 Polls all sites asynchronously, merging partitions as they are found
 Finds a new CSS for all filegroups and rebuilds the global tables
 Protocol synchronization
 A passive site periodically checks the active site
 A passive site can restart the protocol
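The partition step can be illustrated as follows; this greedy computation over a reachability table is my own sketch, not the paper's protocol. The polling site keeps only those sites that are mutually reachable with every site already accepted, and would then broadcast the resulting partition and synchronize the site tables.

```python
# Illustrative computation of a fully connected group of sites (not LOCUS code).
# reachable[a][b] is True if site a received a reply when polling site b.

def fully_connected_partition(self_site, candidates, reachable):
    """Greedily keep candidates that are mutually reachable with every kept site."""
    partition = [self_site]
    for site in candidates:
        if all(reachable[site].get(member, False) and
               reachable[member].get(site, False)
               for member in partition):
            partition.append(site)
    return partition

if __name__ == "__main__":
    reachable = {
        "A": {"A": True, "B": True, "C": False},
        "B": {"A": True, "B": True, "C": False},
        "C": {"A": False, "B": False, "C": True},
    }
    # Site A polls B and C; only B is mutually reachable, so the partition is {A, B}.
    print(fully_connected_partition("A", ["B", "C"], reachable))   # ['A', 'B']
```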

Experiment

 Bla, bla, bla, yet with no essential result

Conclusion

 Transparent Distributed System Behavior with high performance “is feasible”

 “Performance Transparency”
 Except remote fork/exec

 Not much experience with replicated storage

 The protocol for partition membership works with high performance

My perspective on LOCUS

Not done:
 Parallel computing: because there is no thread concept yet
 Process/thread migration: load balancing
 Security: SSH
 Distributed file system vs. a file-system service in a distributed environment

Discussion

 Why not simply store the kernel, shells, etc. on "local storage space"?
– I think you are right: not all resources/files should be distributed across the network of computers, e.g. FITCI.
 Can the RPC techniques we discussed earlier be implemented in this framework and help?
– Probably true: that RPC paper was from 1984, yet this paper was finished in 1983.
 The location transparency required/implemented in this work may incur an imbalance of performance cost. Is this a problem? Can a centralized solution help?
– I think it is possible to add policies to attain load balance, e.g., assigning the physical containers of a filegroup to sites accordingly, yet there is a real challenge in getting information about the topology of the network.
 Compare and contrast "The Multikernel" that we discussed at the last Systems Research group meeting and LOCUS. Could we say LOCUS is a very early stage of a Multikernel approach?
– I think the distributed system (Amoeba) itself was planning to design a multikernel, so the multikernel concept is not new; nonetheless, the multikernel we discussed last week can apply to one machine with different execution units (ISAs).
 If the node responsible for a file is very busy and unable to handle network requests, then what happens to my request?
– I don't think this can happen, because replication never stops.
 It seems that distributed filesystems don't tend to be very popular in the real world. Instead, networked organizations that need to make files available in multiple locations tend to concentrate all storage on one server (or bank of servers), using something like NFS. Why has distributed storage, like in this paper, not become more popular?
– Because of the scalability issue.
 As the number of nodes and partitioning events increases, I think that the paper's approach of manual conflict resolution won't scale!
 What do they mean by the 'guess' provided for the in-core inode number? Is this sent by the US or the CSS, and what happens if the guess is incorrect? And why should guessing be necessary at all - shouldn't the CSS know exactly which logical file it needs to access from the SS?
 Is it possible to store a single file over multiple nodes, not in a replicated form, but in a striped form? That would mean that block A of a file is on PC 1, while block B of the file is on PC 2.