2014 Newsletter

NEWSLETTER ON PDL ACTIVITIES AND EVENTS • SPRING 2014
http://www.pdl.cmu.edu/

AN INFORMAL PUBLICATION FROM ACADEMIA’S PREMIERE STORAGE SYSTEMS RESEARCH CENTER DEVOTED TO ADVANCING THE STATE OF THE ART IN STORAGE AND INFORMATION INFRASTRUCTURES.

CONTENTS
TableFS
Director’s Letter
Year in Review
Recent Publications
PDL News & Awards
New PDL Faculty
Dissertations & Proposals

Enhancing Metadata Efficiency in the Local File System with TABLEFS
Kai Ren, Garth Gibson & Joan Digney

Even in the era of big data, most things in many file systems are small. File systems for magnetic disks have long suffered low performance when accessing huge collections of small files because of slow random disk seeks. Inevitably, scalable systems should expect the numbers of small files to achieve and exceed billions, but currently, effective scaling is not available for workloads that are dominated by metadata and tiny file access. Instead there has emerged a class of scalable small-data storage systems, commonly called key-value stores, which emphasize simple (NoSQL) interfaces and large in-memory caches.

Some of these key-value stores feature high rates of change and efficient out-of-memory Log-structured Merge (LSM) tree structures. An LSM tree can provide fast random updates, inserts and deletes without sacrificing lookup performance. We believe that file systems should adopt LSM tree techniques used by modern key-value stores to represent metadata and tiny files, because LSM trees aggressively aggregate metadata. Moreover, today’s key-value store implementations are “thin” enough to provide the performance levels required by file systems. In our experiments, we used a LevelDB key-value store to implement TableFS.
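The LSM behavior described above can be illustrated with a toy sketch. It is not LevelDB's implementation and it keeps everything in memory; the class name ToyLSM, the memtable_limit parameter, and the use of plain string keys are hypothetical choices made only for illustration. Writes and deletes go into a small buffer that is flushed as one sorted run, lookups consult the newest data first, and compaction merges runs while discarding stale entries.

```python
import bisect

TOMBSTONE = object()  # marker recorded in place of a deleted key's value


class ToyLSM:
    """Toy LSM tree: an in-memory buffer plus sorted, immutable runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # most recent writes, unsorted
        self.runs = []                        # sorted runs, newest first

    def put(self, key, value):
        # Inserts and updates only touch the buffer, so they need no seek.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write; the tombstone hides older values.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Emit the buffer as one sorted run (on disk: one sequential write).
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the buffer, then runs from newest to oldest.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))  # binary search by key
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge all runs into one, keeping the newest value for each key
        # and dropping tombstones.
        merged = {}
        for run in reversed(self.runs):  # oldest first, newer overwrite
            merged.update(run)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.runs = [live] if live else []
```

With a real store the sorted runs would live in large on-disk files, which is what lets many small metadata updates be aggregated into a few large sequential writes.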
TableFS uses modern key-value store techniques to pack small things (directory entries, inode attributes, small file data) into large on-disk files with the goal of suffering fewer seeks when seeks are unavoidable. It is a POSIX-compliant stacked file system that represents metadata and tiny files as key-value pairs using another local file system as an object store. TableFS organizes all metadata into a single sparse table backed on disk using a Log-Structured Merge (LSM) tree, LevelDB. By using stacking, TableFS asks for efficient large file allocation and access from the underlying local file system. By using an [continued on page 11]

[Figure 1 panels. (a) TableFS: in user space, the benchmark process, the FUSE lib, the metadata store (LevelDB), and the large file store; in the kernel, VFS, the FUSE kernel module, and the local file system. (b) Benchmark: in user space, the benchmark process; in the kernel, VFS and the local file system.]

Figure 1: (a) The architecture of TABLEFS. A FUSE kernel module redirects file system calls from a benchmark process to TABLEFS, and TABLEFS stores objects into either LevelDB or a large file store. (b) When we benchmark a local file system, there is no FUSE overhead to be paid.

PDL CONSORTIUM MEMBERS
Actifio
American Power Corporation
EMC Corporation
Facebook
Fusion-io
Google
Hewlett-Packard Labs
Hitachi, Ltd.
Huawei Technologies Co.
Intel Corporation
Microsoft Research
NEC Laboratories
NetApp, Inc.
Oracle Corporation
Samsung Information Systems America
Seagate Technology
Symantec Corporation
Western Digital

FROM THE DIRECTOR’S CHAIR
Greg Ganger

Hello from fabulous Pittsburgh!

It has been another great year for the Parallel Data Lab. Some highlights include the return of database research, exciting new results on big pushes in cloud computing and “Big Data” systems, several prestigious awards, and continuing growth in PDL-related Masters programs. Along the way, many students graduated and joined PDL Consortium companies, new students joined the PDL, and many cool papers have been published. Let me highlight a few things.

I’ll start with Andy Pavlo. It’s been several years since we lost Natassa to Europe, and I’m thrilled to see Andy bringing back both database systems focus and, amazingly, a similar level of energy. He’s a lot of fun and a great database systems researcher, and I (like others) am already enjoying working with him. You can see his background in the new faculty write-up about him... and, if you haven’t seen him give a talk yet, you’re in for a treat when you do.

As I noted last year, it’s exhilarating to be a researcher whose topic-space is at the core of a major growth area (and source of hype)... and PDL finds itself at the core of two of them: cloud computing and Big Data. I just wish we had coined either term, since PDL was active in both areas long before the buzzwords arose. Oh well. We continue to explore cool new systems approaches for supporting large-scale machine learning (a primary component of Big Data analytics), expand Masters program activities in both areas, and lead cloud computing research of the 6-institution Intel Science and Technology Center for Cloud Computing (ISTC-CC).

On the education front, we continue to expand our efforts to provide Masters students with excellent foundations in storage systems, cloud technologies, and Big Data systems. The storage systems class that Garth and I have taught for over 10 years had 100 students this year, and five excellent corporate guest lecturers (thank you, PDL Consortium members!). We also created a new cloud computing class, together with PDL alum Dr. Raja Sambasivan and Prof. Majd Sakr. Both classes serve several Masters programs, including the Masters program on data science systems that Garth has developed. The latter trains students with strong practical skills in the creation and exploitation of systems for Big Data analytics, including allowing 7-month internships to satisfy the program’s capstone project requirement -- something in which many PDL companies may be interested in participating.

Several of us continue to work closely with Carnegie Mellon’s excellent machine learning faculty to explore new systems for Big Data analytics. While the MapReduce approach is good for very simple data processing tasks, it is a poor tool for many of the advanced machine learning techniques that give “Big Data” its great promise. The front-page article describes one of the new approaches we’ve been exploring, and a number of others are emerging from our active brainstorming and exploration. Such cross-domain collaboration, which is a hallmark of Carnegie Mellon and PDL, is critical to the success of data sciences in practice.

Experiences like these have underscored our long-held belief that no single programming system is going to serve the breadth of data analytics styles and activities. Combining such systems with the breadth of other cloud computing activities, such as long-running services and others, leads to difficult resource scheduling challenges. For example, our Tetrisched project is developing new ways of allowing users to express their per-job resource type preferences (e.g., machine locality or hardware accelerators) and then exploring the trade-offs among them to maximize utility of the public and/or private cloud infrastructure. As another example, we are also exploring how to make storage and other stateful services more elastic and agile, so that mixes of services and frameworks can more effectively share cloud resources.

Naturally, our long-standing focus on scalable storage continues strongly. A primary challenge is metadata scaling, and PDL researchers are exploring several approaches to dealing with scale along different dimensions. For example, huge directories of files with structured names are sometimes used to organize huge numbers of related files, and novel scalable directory structures like GIGA+ offer an intriguing solution that simultaneously addresses scale in the number of directories as well. We are also exploring cool new approaches to exploiting log-based storage to accommodate high rates of metadata updates.

We continue to explore ways of exploiting the exciting new underlying storage technologies, such as NVM and Flash SSDs, to improve systems.

THE PDL PACKET
The Parallel Data Laboratory
School of Computer Science
Department of ECE
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3891
VOICE 412•268•6716
FAX 412•268•3010

PUBLISHER
Greg Ganger

EDITOR
Joan Digney

The PDL Packet is published once per year to update members of the PDL Consortium. A pdf version resides in the Publications section of the PDL Web pages and may be freely distributed. Contributions are welcome.

THE PDL LOGO
Skibo Castle and the lands that comprise its estate are located in the Kyle of Sutherland in the northeastern part of Scotland. Both ‘Skibo’ and ‘Sutherland’ are names whose roots are from Old Norse, the language spoken by the Vikings who began washing ashore regularly in the late ninth century. The word ‘Skibo’ fascinates etymologists, who are unable to agree on its original meaning. All agree that ‘bo’ is the Old Norse for ‘land’ or ‘place,’ but they argue whether ‘ski’ means ‘ships’ or ‘peace’ or ‘fairy hill.’

Although the earliest version of Skibo seems to be lost in the mists of time, it was most likely some kind of fortified building erected by the Norsemen. The present-day castle was built by a bishop of the Roman Catholic Church. Andrew Carnegie, after making his fortune, bought it in 1898 to serve as his summer home. In 1980, his daughter, Margaret, donated Skibo to a trust that later sold the estate. It is presently being run as a luxury hotel.

PARALLEL DATA LABORATORY FACULTY
Greg Ganger (pdl director)
412•268•1297
[email protected]

David Andersen
Lujo Bauer
Chuck Cranor
Lorrie Cranor
Christos Faloutsos
Eugene Fink
Rajeev Gandhi
Garth Gibson
Seth Copen Goldstein
Mor Harchol-Balter
Todd Mowry
Onur Mutlu
Priya Narasimhan
David O’Hallaron
Andy Pavlo
Majd Sakr
M. Satyanarayanan
Srinivasan Seshan
Bruno Sinopoli

