AFS, a History and Survey 1980-2000
Derrick Brashear
OpenAFS Project and Your , Inc.

Timeline

• 1981: CMU “charge”
• 1984: AFS 1
• 1985: AFS 2
• 1986: forked
• 1987: AFS 3
• 1989: Transarc founded
• 1991: I entered CMU
• 1994: IBM buys out Transarc
• 2000: OpenAFS

Laying the groundwork

• 1980: CMU President Cyert creates committee to increase prominence
• 1981 Oct: Cyert convened CMU Task Force for the Future of Computing
• 1982 Feb: Preliminary report presages Andrew Project
• 1982 Oct: 5 year joint venture agreement with IBM

Hooking IBM

• Newell report used to attract interest of IBM Chief Scientist Lewis Branscomb.
• IBM would loan about a dozen employees, and fund a similar sized group of CMU employees.
• Cyert’s business background led to terms where IBM owned IP and would commercialize.

Early personnel

• IBM - Michael Conner
• CMU - Jim Morris (from Xerox PARC)
• M. Satyanarayanan (CMU - first tech hire)
• Alfred Spector (CMU faculty) - consultant
• Rick Rashid
• Jerome Saltzer, David Clark (MIT) - consultants

Early discussions

• Original charge did not specify a filesystem
• VICE: Vast Integrated Computing Environment
  • Coined by Jim Morris for a blob connecting file system clients
• File System Team: Satya, Rashid, Spector, Dave McDonald, Dave Gifford

Bootstrapping

• Architecture design through early-mid 1983.
• August 1983: ITC File System Subgroup report
  • First real “cloud computing” proposal.
• September 1983: File System Goals and Design
  • Months of debate on direction

Design Goals

• Location Transparency (same path always)
• User Mobility (use any computer)
• Security
• Performance
• Scalability
• Availability (avoid single points of failure)
• Integrity (minimal risk of data loss)
• Heterogeneity (use any kind of computer)

Learning

• Some arguing on how to proceed
• Throwaway prototype to gain experience
  • RPC, ACL mechanism, protection - Satya
  • Server - Mike West
  • Venus (client) - David Nichols

AFS 1

• Whole file caching
• Client outside the kernel (predated VFS)
• TCP transport
• () based server scaling
• Polling for changes (Consistency Check)
• File checkout allowed

Moving on

• Became clear that the throwaway needed to be tossed.
• Andrew Benchmark to profile for issues
• New RPC system, reused ACL mech - Satya
• New Cache Manager - Mike Kazar
• Volume concept - Bob Sidebotham
• New fileserver - Mike West

AFS 2 Performance
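Of the AFS 2 performance items, callbacks are the main departure from the polling used in AFS 1: the server promises to notify a client when a file it has cached changes, so repeated opens of an unchanged file need no server traffic. The following is a minimal sketch of that idea only. The `Server` and `Client` classes, their method names, and the example path are hypothetical and do not reflect the real AFS RPC interface.

```python
class Server:
    """Toy file server that remembers which clients hold a callback promise."""

    def __init__(self):
        self.files = {}      # path -> file contents (bytes)
        self.callbacks = {}  # path -> set of clients promised a notification
        self.fetches = 0     # number of whole-file fetches served

    def fetch(self, client, path):
        # Hand out the file and record a callback promise for this client.
        self.fetches += 1
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        # Accept new contents, then break the promise held by every other client.
        self.files[path] = data
        for holder in self.callbacks.get(path, set()):
            if holder is not writer:
                holder.break_callback(path)
        self.callbacks[path] = {writer}


class Client:
    """Toy cache manager: a cached entry is trusted until its callback is broken."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # path -> contents; presence implies a valid promise

    def open_read(self, path):
        # Contact the server only when no valid cached copy exists.
        if path not in self.cache:
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path]

    def write_close(self, path, data):
        # Changes are sent to the server when the file is closed.
        self.cache[path] = data
        self.server.store(self, path, data)

    def break_callback(self, path):
        # The server says the cached copy is stale: drop it.
        self.cache.pop(path, None)


server = Server()
server.files["/afs/example/file"] = b"v1"
a, b = Client(server), Client(server)
a.open_read("/afs/example/file")
a.open_read("/afs/example/file")          # served from cache, no second fetch
b.write_close("/afs/example/file", b"v2") # breaks a's callback
latest = a.open_read("/afs/example/file") # refetched after the break
```

With polling, every open would cost a round trip to the server; here the fetch counter only rises when a callback has actually been broken.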

• Cache Management (callbacks, LRU)
• Name Resolution (fids)
• Communication (RPC2)
• Server architecture (LWP vs fork)
• Server data storage (iopen)

AFS 2 Operability

• Volume, mountpoint concept for server-side location transparency
• Quotas
• Read-only replication
• Volume-based backups
• Authentication Groups

Design tradeoffs

• No read-write replication
• open()-close() rather than write() consistency
• Advisory file locking

AFS 2 Issues

• Whole file caching limited files to cache size
• Scalability beyond 700 client nodes seemed unlikely
• Self-administration of protection groups did not exist
• Only a single administration domain was possible

The “modern” era

• Different directions for research versus product:
  • Coda fork from AFS 2 late 1986
    • Focused on issues from network unreliability
  • AFS 3 development began

“Scale and Performance”

• Described AFS 2
• Most key features of modern AFS in place
• 2008 SIGOPS Hall of Fame Award

Changes for AFS 3
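Chunked file caching, one of the AFS 3 changes, lets a client fetch and cache only the pieces of a file that are actually read, which removes the AFS 2 restriction that a file had to fit in the cache. The sketch below shows the offset-to-chunk arithmetic and a tiny chunk cache. The 64 KiB chunk size, the `ChunkCache` class, and the `fetch_chunk` callback are assumptions made for illustration, not the real cache manager interface.

```python
CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen only for illustration


def chunks_for_range(offset, length, chunk_size=CHUNK_SIZE):
    """Return the chunk indices that cover bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return list(range(first, last + 1))


class ChunkCache:
    """Toy per-file cache that fetches only the chunks a read touches."""

    def __init__(self, fetch_chunk, chunk_size=CHUNK_SIZE):
        self.fetch_chunk = fetch_chunk  # hypothetical: (index, size) -> bytes
        self.chunk_size = chunk_size
        self.chunks = {}                # chunk index -> cached bytes

    def read(self, offset, length):
        buf = bytearray()
        for index in chunks_for_range(offset, length, self.chunk_size):
            if index not in self.chunks:
                self.chunks[index] = self.fetch_chunk(index, self.chunk_size)
            buf += self.chunks[index]
        # buf starts at the beginning of the first chunk, so skip to offset.
        start = offset % self.chunk_size
        return bytes(buf[start:start + length])


# A 10-byte "file" with 4-byte chunks keeps the example easy to follow.
data = bytes(range(10))
cache = ChunkCache(lambda i, size: data[i * size:(i + 1) * size], chunk_size=4)
piece = cache.read(3, 4)  # touches chunks 0 and 1 only; chunk 2 is never fetched
```

Because only the touched chunks are stored, the amount of cache a file consumes depends on how much of it is read rather than on its total size.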

• New RPC system (first R, then Rx)
• Cache manager in kernel
• Chunked file caching
• “Cellular Andrew”
• Instant database replication (Ubik)
• New authentication system (kaserver)

Rx

• R: prototype, with XDR and rpcgen
• Rx: R, extended.
• Added:
  • Once-only semantic
  • Multiple call channels
• Retained:
  • Bulk transfer
  • Parallel calls (rxmulti)

Cellular Andrew

• Allowed separate domains of control for the first time.
• Paper on the topic already called out the need for delegating permissions.

Ubik
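The idea of a floating, elected master for the Ubik database can be illustrated with a generic majority-quorum sketch. The talk does not describe the election algorithm, so this is not Ubik's actual protocol: the `has_quorum` and `elect_master` functions, the site names, and the "lowest name wins" rule are all assumptions made for illustration.

```python
def has_quorum(votes, total_sites):
    """True when votes form a strict majority of all configured sites."""
    return votes > total_sites // 2


def elect_master(reachable, all_sites):
    """Return the master chosen by the reachable sites, or None without a quorum.

    Hypothetical rule: among the reachable sites, the lowest name wins.
    """
    live = [site for site in all_sites if site in reachable]
    if not has_quorum(len(live), len(all_sites)):
        return None  # a minority partition must not elect a master
    return min(live)


sites = ["db1", "db2", "db3"]
master_all_up = elect_master(set(sites), sites)        # every site reachable
master_db1_down = elect_master({"db2", "db3"}, sites)  # mastership moves
master_minority = elect_master({"db3"}, sites)         # no quorum, no master
```

Requiring a strict majority means two disconnected groups of servers can never both elect a master, which is what makes atomic commits to the replicated database safe in this simplified model.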

• Instant database replication
• Distributed atomic commit semantics
• Floating (elected) master
• Relied on Rx semantics and security
• Used under all AFS databases

Kerberos AuthServer

• MIT Project Athena running concurrently
• AuthServer rewritten to use Kerberos
• DES derived keys available for Rx security use (FCrypt, not DES)

Other features

• Time service - predated NTP
• ACLs - mode bits not rich enough
• Directory level security carried over since AFS 1
• Self-administered protection groups

Intellectual Property

• IBM owned rights to AFS
• Willing customers at AFS 2 time
• Without documentation, IBM wouldn’t sell
• Another path needed

Transarc

• Founded 1989 by several Andrew Project members
• IBM an initial investor
• AFS 3.0 released
• 1990 DARPA grant to document and improve product.

Backups

• Added for AFS 3.0
• Online (snapshot) backups - exactly 1
• Volume based backups to tape
• Per-backup host tape coordinator (butc)
• Per-backup host command interpreter
• Shared (Ubik) database added later (3.1)

Divided focus

• Talk began of AFS 4.0 with missing features
• AFS 3 releases also followed
  • 3.1 beta in November
  • 3.1 final in February 1991
• A migration path to 4.0 was planned
• But most feature requests “will be in AFS 4.0”

“AFS 4.0”

• OSF founded 1988 to counter Sun/AT&T
• DCE to counter NIS/NFS
• Transarc saw it as “the future” and joined
• Ended up a political vessel
• Some tech from each member

Outside Research

• CITI (University of Michigan)
  • 1991
    • Multilevel caching (intermediate AFS)
    • Hijacking AFS (Rx security)
  • 1992
    • Faster AFS (larger chunks)
    • Intermediate fileservers (AFP, NFS, AFS)
    • AFS Write Performance

Outside Research

• CITI (University of Michigan)
  • 1993
    • Disconnected Operation for AFS
    • The Rx Hex (catching up to IP advances)
    • AFS Server Logging (RPC instrumentation)
  • 2001
    • Improving AFS Performance via Selective Caching and Native ATM (Cache Bypass, Split Path)

Contributed Development

• Aklog (MIT Athena) / Cklog (CMU CS)
• MIT SIPB ports to NetBSD and (1994)
• Nested PTS groups (MIT Athena, and U of Michigan)

Comparing to peers

• AT&T RFS
• NFSv2
• Coda
• NFSv3
• DCE DFS
• CIFS

AT&T RFS

• Plus
  • Looked like a local filesystem - remote system calls.
• Minus
  • Unix only
  • Requires STREAMS
  • No Cache (other than client buffer cache, with limited callbacks)

NFSv2

• Plus
  • Free
• Minus
  • No caching
  • Poor performance
  • Poor authentication
  • No management tools

Coda

• Plus
  • Supports Read-Write replication
  • Supports disconnected operation
• Minus
  • Not production quality
  • No cell support
  • Poor security integration

NFSv3

• Plus
  • Unicode support
  • GSSAPI security
  • TCP support
• Minus
  • Lacks single namespace
  • No native caching (not helped by stateless protocol)
  • Minimal Windows clients
  • No common management layer

DCE DFS

• Plus
  • Tries to mimic local filesystem semantics
  • Per-file ACLs
  • Filesets for server data management
• Minus
  • DCE adds much unnecessary complexity
  • Poor Windows support

CIFS

• Plus
  • ACLs
  • Huge installed base
  • Open Source SAMBA available
  • OpLocks offer some caching advantages
• Minus
  • recovery and migration is complex
  • Management layer is proprietary

The IBM era

• 1994: IBM purchases Transarc
• 1996: Windows NT client with 3.4a release
• 1998: Revised pricing
  • First push for open source
• 1999: 3.5 release. pthreads support, NT servers, namei fileserver, Linux port, gateway
• 2000: 3.6 release, End of Support planned.
  • OpenAFS announced at LinuxWorld

What went wrong

• Early instability
• Wrong UNIX horse: Sun beat IBM
• DFS focus when 3.0 was out and barely stable.
• DCE RPC incompatible with Rx
• Poor pricing models
  • NFS was often “free”
• Missed chance to beat “the web”

What went right

• AFS 3.0 incorporated many forward-looking ideas from the start.
• The “Design Goals” made it useful to this day.
• Ubiquity - on most non-mainframe OSes.
• Code base mostly modernized.
• Open sourced before it was too late.

Thanks

Chronologically,
• M. Satyanarayanan (and again last week)
• Craig Everhart
• Lyle Seaman
• Mike Kazar
• Jim Morris
• Al Spector