;login JUNE 2014 VOL. 39, NO. 3 :

File Systems and Storage

& as Used in the ~okeanos Cloud Filippos Giannakos, Vangelis Koukis, Constantinos Venetsanopoulos, and Nectarios Koziris & OpenStack Explained Mark Lamourine & HDFS Under HBase: Layering and Performance Tyler Harter, Dhruba Borthakur, Siying Dong, Amitanand Aiyer, Liyin Tang, Andrea . Arpaci- Dusseau, and Remzi H. Arpaci-Dusseau

Columns Practical Tools: Using 0MQ, Part 1 David N. Blank-Edelman Python: Asynchronous I/O David Beazley iVoyeur: Monitoring Design Patterns Dave Josephsen For Good Measure: Exploring with a Purpose Dan Geer and Jay Jacobs /dev/random: Clouds and the Future of Advertising Robert G. Ferrell The Saddest Moment James Mickens Conference Reports FAST ’14: 12th USENIX Conference on File and Storage Technologies 2014 USENIX Research in File and Storage Technologies Summit UPCOMING EVENTS

2014 USENIX Federated Conferences Week FOCI ’14: 4th USENIX Workshop on Free and Open June 17–20, 2014, Philadelphia, PA, USA Communications on the Internet USENIX ATC ’14: 2014 USENIX Annual Technical August 18, 2014 Conference www.usenix.org/foci14 June 19–20, 2014 HotSec ’14: 2014 USENIX Summit on Hot Topics www.usenix.org/atc14 in Security ICAC ’14: 11th International Conference on August 19, 2014 Autonomic Computing www.usenix.org/hotsec14 June 18–20, 2014 HealthTech ’14: 2014 USENIX Summit on Health www.usenix.org/icac14 Information Technologies HotCloud ’14: 6th USENIX Workshop on Safety, Security, Privacy, and Interoperability Hot Topics in Cloud Computing of Health Information Technologies June 17–18, 2014 August 19, 2014 www.usenix.org/hotcloud14 www.usenix.org/healthtech14 HotStorage ’14: 6th USENIX Workshop WOOT ’14: 8th USENIX Workshop on Offensive on Hot Topics in Storage and File Systems Technologies June 17–18, 2014 August 19, 2014 www.usenix.org/hotstorage14 www.usenix.org/woot14

9th International Workshop on Feedback Computing OSDI ’14: 11th USENIX Symposium on Operating June 17, 2014 www.usenix.org/feedbackcomputing14 Systems Design and Implementation October 6–8, 2014, Broomfield, CO, USA WiAC ’14: 2014 USENIX Women in Advanced www.usenix.org/osdi14 Computing Summit Co-located with OSDI ’14 and taking place October 5, 2014 June 18, 2014 www.usenix.org/wiac14 Diversity ’14: 2014 Workshop on Supporting Diversity UCMS ’14: 2014 USENIX Configuration Management in Systems Research www.usenix.org/diversity14 Summit June 19, 2014 HotDep ’14: 10th Workshop on Hot Topics in www.usenix.org/ucms14 Dependable Systems URES ’14: 2014 USENIX Release Engineering Summit www.usenix.org/hotdep14 Submissions due: July 10, 2014 June 20, 2014 www.usenix.org/ures14 HotPower ’14: 6th Workshop on Power-Aware Computing and Systems 23rd USENIX Security Symposium www.usenix.org/hotpower14 August 20–22, 2014, San Diego, CA, USA INFLOW ’14: 2nd Workshop on Interactions of NVM/ www.usenix.org/sec14 Flash with Operating Systems and Workloads Workshops Co-located with USENIX Security ’14 www.usenix.org/inflow14 EVT/WOTE ’14: 2014 Electronic Voting Technology Submissions due: July 1, 2014 Workshop/Workshop on Trustworthy Elections TRIOS ’14: 2014 Conference on Timely Results in August 18–19, 2014 Operating Systems www.usenix.org/evtwote14 www.usenix.org/trios14 USENIX Journal of Election Technology and Systems (JETS) LISA ’14 Published in conjunction with EVT/WOTE November 9–14, 2014, Seattle, WA, USA www.usenix.org/jets www.usenix.org/lisa14 CSET ’14: 7th Workshop on Cyber Security Co-located with LISA ’14: Experimentation and Test SESA ’14: 2014 USENIX Summit for Educators August 18, 2014 in System Administration www.usenix.org/cset14 November 11, 2014 3GSE ’14: 2014 USENIX Summit on Gaming, Games, and Gamification in Security Education August 18, 2014 www.usenix.org/3gse14

twitter.com/usenix www.usenix.org/youtube www.usenix.org/gplus Stay Connected... www.usenix.org/facebook www.usenix.org/linkedin www.usenix.org/blog

JUNE 2014 VOL. 39, NO. 3 EDITOR Rik Farrow [email protected] OPINION COPY EDITORS Steve Gilmartin 2 Musings Rik Farrow Amber Ankerholz

PRODUCTION MANAGER FILE SYSTEMS AND STORAGE Michele Nelson ~okeanos: Large-Scale Cloud Service Using Ceph 6 PRODUCTION Filippos Giannakos, Vangelis Koukis, Constantinos Venetsanopoulos, Arnold Gatilao and Nectarios Koziris Casey Henderson 10 Analysis of HDFS under HBase: A Messages TYPESETTER Star Type Case Study Tyler Harter, Dhruba Borthakur, Siying Dong, [email protected] Amitanand Aiyer, Liyin Tang, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau USENIX ASSOCIATION 2560 Ninth Street, Suite 215 Berkeley, California 94710 CLOUD Phone: (510) 528-8649 18 OpenStack Mark Lamourine FAX: (510) 548-5738 www.usenix.org

TROUBLESHOOTING ;login: is the official magazine of the USENIX 22 When the Stars Align, and When They Don’t Tim Hockin Association. ;login: (ISSN 1044-6397) is published bi-monthly by the USENIX Association, 2560 Ninth Street, Suite 215, SYSADMIN ­Berkeley, CA 94710.

24 Solving Rsyslog Performance Issues David Lang $90 of each member’s annual dues is for a /var/log/manager: Rock Stars and Shift Schedules Andy Seely subscription to ;login:. Subscriptions for non­ 30 members are $90 per year. Periodicals postage paid at Berkeley,­ CA, and additional offices. COLUMNS POSTMASTER: Send address changes to 32 Practical Perl Tools: My Hero, Zero (Part 1) ;login:, USENIX Association, 2560 Ninth Street, David N. Blank-Edelman Suite 215, Berkeley, CA 94710. 38 Python Gets an Event Loop (Again) David Beazley ©2014 USENIX Association USENIX is a registered trademark of the 42 iVoyeur: Monitoring Design Patterns Dave Josephsen USENIX Association. Many of the designations Dan Geer and Jay Jacobs used by manufacturers and sellers to 47 Exploring with a Purpose distinguish their products are claimed 50 /dev/random Robert G. Ferrell as trademarks. USENIX acknowledges all trademarks herein. Where those designations 52 The Saddest Moment James Mickens appear in this publication and USENIX is aware of a trademark claim, the designations have BOOKS been printed in caps or initial caps. 56 2014 Elizabeth Zwicky USENIX NOTES 58 Notice of Annual Meeting 58 Results of the Election for the USENIX Board of Directors, 2014–2016 CONFERENCE REPORTS 59 FAST ’14: 12th USENIX Conference on File and Storage Technologies 76 Linux FAST Summit ’14 Musings RIK FARROWEDITORIAL

Rik is the editor of ;login:. fter writing this column for more than 15 years, I’ a bit stuck with [email protected] what I should talk about. But I do have a couple of things on my mind: AKrste Asanović’s FAST ’14 keynote [1] and something Brendan Gregg wrote in his Systems Performance book [2]. Several years ago, I compared computer systems architecture to an assembly line in a fac- tory [3], where not having a part ready (some data) held up the entire assembly line. Brendan expressed this differently, in Table 2.2 of his book, where he compares the speed of a 3.3 GHz processor to other system components by using human timescales. If a single CPU cycle is represented by one second, instead of .3 nanoseconds, then fetching data from Level 1 cache takes three seconds, from Level 3 cache takes 43 seconds, and having to go to DRAM takes six minutes. Having to wait for a disk read takes months, and a fetch from a remote datacen- ter can take years. It’s amazing that anything gets done—but then you remember the scaling of 3.3 billion to one. Events occurring in less than a few tens of milliseconds apart appear simultaneous to us humans. Krste Asanović spoke about the ASPIRE Lab, where they are examining the shift from the performance increases we’ve seen in silicon to a post-scaling world, so we need to consider the entire hardware and software stack. You can read the summary of his talk in this issue of ;login: or go online and watch the video of his presentation. A lot of what Asanović talked about seemed familiar to me, because I had heard some of these ideas from the UCB Par Lab, in a paper in 2009 [4]. Some things were new, such as using photonic switching and message passing (read David Blank-Edelman’s column) instead of the typical CPU bus interconnects. Asanović also suggested that data be encrypted until it reached the core where it would be processed, an idea I had after hearing that Par Lab paper, and one that I’m glad someone else will actually do something about. Still other things remained the same, such as having many homogeneous cores for the bulk of data processing, with a handful of specialized cores for things like vector processing. Asanović, in the ques- tion and answer that followed his speech, said that only .01% of computing requires special- ized software and hardware. Having properly anticipated the need for encryption of data while at rest and even while being transferred by the photonic message passing system, I thought I’ take a shot at imagining the rest of Warehouse Scale Computing (WSC), but without borrowing from the ASPIRE FireBox design too much.

The Need for Speed First off, you need to keep those swift cores happy, which means that data must always be ready nearby. That’s a tall order, and one that hardware designers have been aware of for many years. Photonic switching at one terabit per second certainly sounds nice, and it’s hard to imagine something that beats a design that already seems like science fiction based on the name. For my design, I will simply specify a message-passing network that connects all cores and their local caches to the wider world beyond. Like the FireBox design, there will be no Level 1 cache coherency or shared L1 caches. If a module running on one core wants to share

2 JUNE 2014 VOL. 39, NO. 3 www.usenix.org EDITORIAL Musings data with another, it will have to be done via message passing, My design has the outward appearance of a cube. Inside, the not by tweaking shared bits and expecting the other core to cores will be arranged in a sphere, with I/O and support filling in notice. the corners. Also, unlike one of Cray’s designs, where you could see the refrigerant flowing around the parts, my cube will float Whatever interconnect you choose, having sufficient bisection on a fountain of water. The water will both cool and suspend the bandwidth will be key to the performance of the entire system. cube, while the I/O and power connections will prevent it from Can’t have those cores waiting many CPU cycles for the data floating off the column of water. they need to read or write. The photonic switches have lots of bandwidth, and if they can actually switch without resorting to At the end of Asanović’s talk, I asked him why they would be converting light back to electrons then to light again, they can using Linux. Asanović replied that Linux provides work with a minimal of latency. with a familiar interface. That’s certainly true, and I agree. But I also think that a minimal Linux shell, like that provided by a In my design, I imagine having special encryption hardware picoprocess [5], will satisfy most programs compiled to work that’s part of every core. That way, checking the HMAC and the with Linux, while being easy to support with a very simple mes- decryption of messages (as well as the working in the opposite sage passing system under the hood (so to speak). direction) can take advantage of hardware built for this very specific task. It should be possible to design very secure sys- Except for some dramatic flairs, I must confess that my design is tems using this approach, because all communications between not that different from the FireBox. processes and the outside world come with verification of their source—a process that knows the correct encryption key. Like Reality the FireBox, this system will be a service-oriented architecture, One problem with my floating cube design is how to deal with with each core providing a minimal service, again minimizing broken hardware. Sometimes cores or supporting subsystems but not eliminating the probability that there will be security- fail, and having to toss an entire cube because you can’t replace affecting bugs in the code. failed parts isn’t going to work. There’s a very good reason why supercomputers today appear as long rows of rackmount servers Some cores will be connected to the outside world, managing [6]. One can hope that the reason the FireBox will fit in racks is communications and storage. This is not that different from cur- that it contains modules that can be easily serviced and replaced. rent approaches, where network cards for VM hosts already do a lot of coprocessing and maintain multiple queues. But something Using water for cooling has been done before [7], but it does similar will need to be done for storage, as it will remain glacially make maintenance more difficult than just using air. Still, even slow, from the CPU’s perspective, even with advances such as low-power RISC cores dissipate “waste” heat, and having 1000 non-volatile memory (NVM) being available in copious amounts of them translates into 3.3 kilowatts of heat—a space heater that with speeds as fast or faster than current DRAM. you really don’t want in your machine room. 
Still, that beats the 12.5 kilowatts produced by the Xeons. Cores will be RISC, because they are more efficient than CISC designs. (Note to self: sell Intel, buy ARM Holdings.) With Intel Even the photonic switching network could prove problematic. server CPUs like eight-core Xeon ES having 2.7 billion transis- In the noughts, a company named SiCortex [8] built supercom- tors, that’s a lot of heat, much of which is used to translate CISC puters that used RISC cores and featured a high-speed, message- instructions into internal, RISC-like, microcode instructions. passing interconnect using a diameter-6 Kautz graph, and they The AMD A1100 that was announced in January 2014 will have failed after seven years and only selling 75 supercomputers. eight 64-bit ARM (Cortex-A57) cores and built-in SIMD, which Perhaps the market just didn’t think that having an intercon- supports AES encryption and is rated at 25 watts TDP (thermal nect designed to speed intercore and I/O communications was design power), compared to 80 for the 3.2 GHz Xeon. (Hmm, buy important enough. AMD?) The Lineup Unlike the FireBox’s fit-in-a-standard-rack design, my imagi- We start this issue with an article from the people who built nary system will look like something designed by Seymour Cray, ~okeanos, a public cloud for Greek researchers. They have writ- but without a couch. Cray’s best-known designs were circular, ten for ;login: before [9], and when I discovered that they had used because Cray was concerned about having components sepa- Ceph as the back-end store, I asked if they would write about rated by too much distance. After all, light can only travel 29.979 their experiences with Ceph. The authors describe what they cm in a nanosecond, and with CPU clock cycles measured in needed from a back-end storage system, what they tried, and how nanoseconds in Cray’s day, distance mattered. Actually, distance Ceph has worked so far. matters even more today.

www.usenix.org JUNE 2014 VOL. 39, NO. 3 3 EDITORIAL Musings

I met Tyler Harter during a poster session at FAST ’14. Tyler Dan Geer and Jay Jacobs discuss where we are today in collect- had presented what I thought was a great paper based on traces ing security metrics. At first, we just needed to start collecting collected from Facebook Messages. What he and his co-authors usable data. Today, what’s needed is the ability to better perform had discovered showed just how strongly layering affects write meta-analysis of publicly available data. performance, resulting in a huge amount of write amplification. Robert Ferrell also explores clouds and muses about the future In this article, Harter et al. explain how they uncovered the write of advertising. Like Robert, I just can’t wait until my heads-up amplification and what can be done about it, as well as exploring display is showing me advertising when what I really want are whether the use of SSDs would improve the performance of this the directions to where I needed to be five minutes ago. HBase over HDFS application. James Mickens had written a number of columns that were I knew that Mark Lamourine had been working on Red Hat’s only published online, and we decided to print his first one [10]. implementation of OpenStack and thought that I might be able James has been experiencing deep, existential angst about issues to convince him to explain OpenStack. OpenStack started as a surrounding the unreliability of untrusted computer systems. In NASA and Rackspace project for creating clouds and has become particular, papers about Byzantine Fault Tolerance. Don’t worry, a relatively mature open source project. OpenStack has lots of as James’ column will not put you to sleep. moving parts, which makes it appear complicated, but I do know that people are using it already in production. The ~okeanos proj- Elizabeth Zwicky has written three book reviews. She begins ect is OpenStack compatible, and if you read both articles, you with a thorough and readable tutorial on R, covers an excellent can learn more about the types of storage required for a cloud. book on threat modeling, and finishes with a book on storage for photographers. Tim Hockin shares an epic about debugging. What initially appeared to be a simple problem took Hockin down many false This will be Elizabeth’s final column. Elizabeth has been the paths before he finally, after going all the way down the stack to book reviews columnist for ;login: since October 2005, and I, like the kernel, found the culprit—a tiny but critical change in source many, have thoroughly enjoyed her erudite and droll reviews. Her code. work will be missed, although I hope she still finds the time to pen the occasional review. David Lang continues to share his expertise in enterprise log- ging. In this issue, Lang explains how to detect and fix perfor- If you want to contribute reviews to ;login: of relevant books that mance problems when using rsyslog, a system he has used and you have read, please email me. helps to maintain. We also have summaries of FAST ’14 and the Linux FAST Sum- Andy Seely introduces us to some rock stars. You know, those mit. I took notes there and converted them into a dialog that cov- people you worked with at the startup that didn’t make it? The ers a lot of what happened during the summit. The summary also ones willing to work long hours for a reward that remained provides insights into how the Linux kernel changes over time. elusive? 
Seely’s story is actually about how a small management Although the key ideas of the FireBox design appear sound to me, change improved the lives of the people he worked with. people have learned how to work with off-the-shelf rackmounted David Blank-Edelman delves into the world of message queues servers for massive data processing tasks, and the biggest via 0MQ. I became interested in message queues when I learned, change so far has been to move toward using SDN for network- from Mark Lamourine, that they were being used in OpenStack. ing. Perhaps there will be a move toward customized cores as David shows us how simple it is to use 0MQ, as well as demon- well. When designing WSC, the ability to keep data close to strating just how powerful 0MQ is, using some Perl examples, in where it is processed has been the key so far, whether we are the first of a two-part series. discussing MapReduce or memcached. Dave Beazley explores a feature found in the newest version of Resources Python, asynchronous I/O. You might think that async-io has [1] “FireBox: A Hardware Building Block for 2020 Warehouse- been around for a while in Python, and you’d be right. But this is Scale Computers”: https://www.usenix.org/conference/fast14/ a new implementation, which considerably simplifies how event technical-sessions/presentation/keynote. loops are used. Oh, and there’s a backport of the new module to Python 2.7. [2] Brendan Gregg, Systems Performance for Enterprise and Cloud (Prentice Hall, 2013), ISBN 978-0-13-339009-4. Dave Josephsen has us considering monitoring design patterns. I found his column very timely, as I know that sysadmins are [3] Rik Farrow, “Musings,” ;login:, vol. 36, no. 3, June 2011: questioning the design patterns they have been using for many https://www.usenix.org/login/june-2011/musings. years to collect status and information.

4 JUNE 2014 VOL. 39, NO. 3 www.usenix.org EDITORIAL Musings

[4] Rose Liu, Kevin Klues, Sarah Bird, Steven Hofmeyr, Krste [8] SiCortex supercomputer: http://en.wikipedia.org/wiki/ Asanović, and John Kubiatowicz, “Space-Time Partitioning in a SiCortex. Manycore Client OS,” Hot Topics in Parallelism, 2009: https:// [9] Vangelis Koukis, Constantinos Venetsanopoulos, and Nec- www.usenix.org/legacy/event/hotpar09/tech/full_papers/liu/ tarios Koziris, “Synnefo: A Complete Cloud Stack over Ganeti,” liu_html/. ;login:, vol. 38, no. 5, October 2013: https://www.usenix.org/ [5] Jon Howell, Bryan Parno, and John R. Douceur, “How to Run login/october-2013/koukis. POSIX Apps in a Minimal Picoprocess”: http://research.micro- [10] James Mickens, “The Saddest Moment,” ;login: logout, May, soft.com/apps/pubs/default.aspx?id=190822. 2013: https://www.usenix.org/publications/login-logout/ [6] Bluefire at UCAR, in the Proceedings of USENIX Annual may-2013/saddest-moment Technical Conference (ATC ’13), June 2013: http://www.ucar. edu/news/releases/2008/images/bluefire_backhalf_large.jpg. [7] Hot water-cooled supercomputer: http://www.zurich.ibm. com/news/12/superMUC.html.

xkcd

xkcd.com

www.usenix.org JUNE 2014 VOL. 39, NO. 3 5 ~okeanos Large-Scale Cloud Service Using Ceph

FILIPPOS GIANNAKOS, VANGELIS KOUKIS, CONSTANTINOS VENETSANOPOULOS,FILE AND NECTARIOS KOZIRISSYSTEMS AND STORAGE

Filippos Giannakos is a cloud keanos is a large-scale public cloud service powered by Syn- storage engineer at the Greek nefo and run by GRNET. Ceph is a distributed storage solu- Research and Technology tion that targets scalability over commodity hardware. This Network. His research article focuses on how we use Ceph to back the storage of ~okeanos. More interests include distributed ~o specifically, we will describe what we aimed for in our storage system, the and software-defined storage. Giannakos is a PhD candidate in the Computing Systems challenges we had to overcome, certain tips for setting up Ceph, and experi- Laboratory at the School of Electrical and ences from our current production cluster. Computer Engineering, National Technical University of Athens. [email protected] The ~okeanos Service At GRNET, we provide ~okeanos [2, 4], a complete public cloud service, similar to AWS, for Vangelis Koukis is the technical the Greek research and academic community. ~okeanos has Identity, Compute, Network, lead of the ~okeanos project Image, Volume, and services and is powered by Synnefo [3, 5]. ~okeanos cur- at the Greek Research and rently holds more than 8000 VMs. Technology Network (GRNET). Our goals related to storage are to provide users with the following functionality: His research interests include large-scale computation in the cloud, high- ◆◆ The ability to upload files and user-provided images by transferring only the missing data performance cluster interconnects, and blocks (diffs). shared block-level storage. Koukis has a PhD ◆◆ Persistent VMs for long-running computationally intensive tasks, or hosting services. in electrical and computer engineering from ◆◆ Thin VM provisioning (i.e., no copy of disk data when creating a new VM) to enable fast the National Technical University of Athens. spawning of hundreds of VMs, with zero-copy snapshot functionality for checkpointing and [email protected] cheap VM backup. Constantinos Venetsanopoulos We aim to run a large-scale infrastructure (i.e., serve thousands of users and tens of thou- is a cloud engineer at the Greek sands of VMs) over commodity hardware. Research and Technology A typical workflow on ~okeanos is that a user downloads an image, modifies it locally (e.g., Network. His research interests by adding any libraries or code needed), and reuploads it by synchronizing it with the server include distributed storage and uploading only the modified part of the image. The user then spawns a large number of in virtualized environments and large-scale persistent VMs thinly, with zero data movement. The VM performs some computations, and virtualization management. Venetsanopoulos the user can then take a snapshot of the disk as a checkpoint; the snapshot also appears as a has a diploma in electrical and computer regular file on the Object Storage service, which the user can sync to his/her local engineering from the National Technical to get the output of the calculations for further offline processing or for backup. University of Athens. [email protected] To achieve the described workflow, however, we had to overcome several challenges regard- Nectarios Koziris is an associate ing storage. professor in the Computing Systems Laboratory at the Storage Challenges National Technical University of We stumbled upon two major facts while designing the service: (1) providing persistent VMs Athens. 
His research interests conflicts with the ability to scale and (2) using the same storage entities from different ser- include parallel architectures, interaction vices requires a way to access them uniformly without copying data. between compilers, OSes and architectures, OS virtualization, large-scale computer Providing persistent VMs while being able to scale is a demanding and difficult problem to and storage systems, cloud infrastructures, solve. Persistence implies the need to live migrate VMs (change their running node while distributed systems and algorithms, and keeping the VM running) or fail them over (shut them down and restart them on another distributed data management. Koziris has a node) when a physical host experiences a problem. This implies shared storage among the PhD in electrical and computer engineering nodes because the VM, and consequently the VM’s host, needs to access the virtual disk’s from the National Technical University of data. The most common solution to provide shared storage among the nodes is a central Athens. [email protected] NAS/SAN exposed to all nodes.

6 JUNE 2014 VOL. 39, NO. 3 www.usenix.org FILE SYSTEMS AND STORAGE

However, having a central storage to rely on cannot scale and can ◆◆ From the perspective of the Compute service, images can be be a single point of failure for the storage infrastructure. DRBD cloned and snapshotted repeatedly, with zero data movement is another well-known enterprise-level solution to provide per- from service to service. sistent VMs and scale at the same time. DRBD mirrors a virtual ◆◆ And, finally, snapshots can appear as new image files, again disk between two nodes and propagates VM writes that occur with zero data movement. on one node to the replica. DRBD is essentially a RAID-1 setup over network. However, this choice imposes specific node pairs Backing Data Store where the data is replicated and where the VM can be migrated, Archipelago acts as a middle layer that presents the storage narrowing the administrator’s options when performing mainte- resources and solves the resource unification problem but does nance. Moreover, it does not allow for either thin VM provision- not actually handle permanent storage. We needed a data store ing or for sharing blocks of data among VM volumes and user to host the data. Because we were not bound by specific con- uploaded files, which takes us to the second problem. straints, we had various shared storage options. We decided to start with an existing large central NAS, exposed to all nodes as The second problem we had to overcome involved presenting an NFS mount. This approach had several disadvantages: the different storage entities to the cloud layer uniformly. For example, a snapshot should be treated as a regular file to be ◆◆ It could not scale well enough to hold the amount of users and synced and also as a disk image to be cloned. The actual data, data we aimed for. though, remain the same in both cases, and only the cloud layer ◆◆ Having a large enterprise-level NAS imposed geographical on top of the storage should distinguish between the two aspects. constraints. The data reside in only one datacenter. Therefore, we needed a way to access the same data from differ- ◆◆ It imposes a centralized architecture, which is difficult and ent cloud services uniformly, through a common storage layer. costly to extend. To solve these problems, we needed a storage layer that would Because ~okeanos had exponential growth, we needed a different abstract the cloud aspect of a resource from its actual data and storage solution. allow the ability to present this data in various ways. Addition- ally, we needed this mechanism to be independent from the Ceph actual data store. Because no suitable solution existed, we cre- Ceph is a distributed storage solution. It offers a distributed ated Archipelago. object store, called RADOS [6], block devices over RADOS called RBD, a distributed file system called CephFS, and an HTTP Archipelago gateway called RadosGW. We have been following its progress Archipelago [1] is a distributed storage layer that unifies how and experimenting with it since early 2010. storage is perceived by the services, presenting the same resources in different ways depending on how the service wants RADOS is the core of Ceph Storage. It is a distributed object to access them. It sits between the service that wants to store store comprising a number of OSDs, which are the software or retrieve data and the actual data store. Archipelago uses components (processes) that take care of the underlying storage storage back-end drivers to interact with any shared data store. of data. 
RADOS distributes the objects among all OSDs in the It also provides all the necessary logic to enable thin volume cluster. It manages object replication for redundancy, automatic provisioning, snapshot functionality, and deduplication over data recovery, and cluster rebalancing in the presence of node any shared storage. Therefore, Archipelago allows us to view the failures. cloud resources uniformly, whether these are images, volumes, RBD provides block devices from objects stored on RADOS. It snapshots, or just user files. They are just data stored in a storage splits a logical block device in a number of fixed-size objects and back end, accessed by Archipelago. stores these objects on RADOS.

Synnefo uses Archipelago to operate on the various representa- CephFS provides a shared file system with near-POSIX seman- tions of the same data: tics, which can be mounted from several nodes. CephFS splits ◆◆ From the perspective of Synnefo’s Object Storage service, im- files into objects, which are then stored on RADOS. It also con- ages are treated as common files, with full support for remote sists of one or more metadata servers to keep track of file-system syncing and sharing among users. metadata.

www.usenix.org JUNE 2014 VOL. 39, NO. 3 7 FILE SYSTEMS AND STORAGE ~okeanos: Large-Scale Cloud Service Using Ceph

Finally, RadosGW is an HTTP REST gateway to the RADOS recommended file systems, such as and XFS. Because object store. is still not production ready and showed significant instability under heavy load, we decided to proceed with ext4 and write- Ceph seemed a promising storage solution that provided scal- ahead journal mode. able distributed storage based on commodity hardware, so we decided to evaluate it. Ceph exposes the data stored in RADOS in Journal Placement and Size various forms, but it does not act on them uniformly like Archi- The RADOS journal can be placed in a file in the backing pelago does. Because we already had an HTTP gateway and VM filestore, in a separate disk partition on the same disk, or in volume representation of the data by Archipelago, we needed a completely separate block device. The first two options are RADOS, which also happens to be the most stable and mature suboptimal because they share the bandwidth and IOPS of the part of Ceph, to store and retrieve objects. We used Ceph’s libra- OSD’s disk with the filestore, essentially halving the overall dos to implement a user-space driver for Archipelago to store our disk performance. Therefore, we placed the RADOS journal in a cloud data on RADOS and integrate it with Synnefo. separate block device. Things to Consider with Ceph You might think that, because writes are confirmed when they While evaluating RADOS, we experimented with various hit the journal, an OSD can sustain improved write performance RADOS setup parameters and drew several conclusions regard- for a longer period by using a bigger journal on a fast device, ing setup methodology and RADOS performance. falling back to filestore performance when the journal gets full. However, our experiments showed that RADOS OSDs do OSD/Disks Setup not work like that. If the journal media is much faster than the Ceph’s OSDs are user-space daemons that form a RADOS clus- filestore, the OSD pauses writes when it tries to sync the journal ter. Each OSD needs permanent storage space where it will hold with the filestore. This behavior can result in abnormal and the data. This space is called “ObjectStore” in RADOS terminol- erratic patterns during write bursts. Thus, a small portion of a ogy. Ceph currently implements its ObjectStore using files, so we block device with performance close to the one of the filestore will use the term “filestore.” We had several storage nodes that should be used to hold the journal. Because we do not need large could host RADOS OSDs. Each node had multiple disks. So the journals, multiple journals can be combined in the same block question arose how to set up our RADOS cluster and where to device, as long as it provides enough bandwidth for all of them. place the filestores. We had numerous parameters to consider, including the number of OSDs per physical node, the number of RADOS Latency disks per OSD, and the exact disk setup. After extensive test- When evaluating a storage system, especially for VM virtual ing, we concluded that it is beneficial to have multiple OSDs per disks’ data, latency plays a critical role. VMs tend to perform node, and we dedicated one disk to each one. This setup ensures small (4 K to 16 K) I/Os where latency becomes apparent. Our that one OSD does not compete with any other for the same disk measurements showed that RADOS has a non-negligible latency and allows for multiple OSDs per node, resulting in improved of about 2 ms, so you cannot expect latency comparable with aggregate performance. local disks. 
This behavior can be masked by issuing multiple requests or performing larger I/O to achieve high throughput. Journal Mode and Filestore File System Also, because the requests are equally distributed among all Along with the filestore, each OSD keeps a journal to guarantee OSDs in a cluster, the overall performance remains highly data consistency while keeping write latency low. This means acceptable. that every write gets written twice in each OSD, once in the journal and once in the backing filestore. There are two modes in Data Integrity Checking which the journal can operate, write-ahead and write-parallel. Silent data corruption caused by hardware can be a big issue on a In write-ahead mode, each write operation is first committed to large data store. RADOS offers a scrubbing feature, which works the journal and then applied to the filestore. The write operation in two modes: regular scrubbing and deep scrubbing. Regular can be acknowledged as soon as it is safely written to the journal. scrubbing is lighter and checks that the object is correctly repli- In write-parallel mode, each write is written to both the journal cated among the nodes. It also checks the object’s metadata and and the filestore in parallel, and the write can be acknowledged attributes. Deep scrubbing is heavier and expands the check to when either of the two operations is completed successfully. the actual data. It ensures data integrity by reading the data and The write-parallel mode requires btrfs as the file system of the computing checksums. backing filestore, because it makes use of btrfs-specific features, such as snapshots and rollbacks, to guarantee data consistency. On the other hand, the write-ahead mode can work with all the

8 JUNE 2014 VOL. 39, NO. 3 www.usenix.org FILE SYSTEMS AND STORAGE ~okeanos: Large-Scale Cloud Service Using Ceph

Storage vs. Compute Nodes Using Ceph in a large-scale system also revealed some of its Another setup choice we had was to keep our storage nodes current weaknesses. Scrubbing and especially deep scrubbing distinct from our compute nodes, where the VMs reside. This can take a lot of time to complete. During these actions, there has two main advantages. First, OSDs do not compete with the is significant performance drop. The cluster fills with slow VMs for compute power. OSDs do not require a lot of CPU power requests and VMs are affected. This is a major drawback, and we normally, but it can be noticeable while scrubbing or rebalanc- had to completely disable this functionality. We plan to re-enable ing. Normal operations will also require more CPU power with it as soon as it can be used without significant performance the upcoming erasure coding feature, where CPU power is used regression. Performance also drops when rebuilding the cluster to reconstruct objects in order to save storage space. Second, we after an OSD failure or when the cluster rebalances itself after can use the storage container RAM as cache for hot data using the addition of a new OSD. This issue is closely related to the the Linux page-cache mechanism, which uses free RAM to performance drop during deep scrubbing, and it occurs because cache recently accessed files on a file system. Hosting VMs on RADOS does not throttle traffic related to recovery and deep the same node would leave less memory for this purpose. scrubbing according to the underlying disk utilization and the rate of the incoming client I/O. Experiences from Production On the other hand, Ceph has survived numerous hardware Ceph has been running in production since April 2013 acting failures with zero data loss and minimal service disruption. as an Archipelago storage back end. As of this writing, Ceph’s Its overall usage experience outweighs the hiccups we endured RADOS backs more than 2000 VMs and more than 16 TB of since we began using it for production purposes. user-uploaded data on the Object Storage service. In conclusion, our experience from running a large-scale Ceph Our current production setup comprises 20 physical nodes. Each cluster shows that it has obvious potential. It can run well over node has commodity hardware, scale without any visible overhead, and ◆◆ 2 × 12-Core Xeon(R) E5645 CPU helped us to deploy our service. However, in its current state, ◆◆ 96 GBs of RAM running it for production purposes has disadvantages because it suffers from performance problems when performing adminis- ◆◆ 12 × 2 TB SATA disks trative actions. Our storage nodes can provide more CPU power than Ceph cur- rently needs. We are planning to use this extra power to seam- Ceph is open source. Source code, more info, and extra material lessly enable future Ceph functionality, like erasure coding, can be found at http://www.ceph.com. and to divert computationally intensive storage-related tasks (e.g., hashing) to the storage nodes, using the RADOS “classes” References mechanism. [1] Archipelago: http://www.synnefo.org/docs/archipelago/ latest. Each physical node hosts six OSDs. Each OSD’s data resides on a RAID-1 setup between two 2 TB disks. RADOS replicates [2] The ~okeanos service: http://okeanos.grnet.gr. objects itself, but because it was under heavy development, we [3] Synnefo software: http://www.synnefo.org. wanted to be extra sure about the safety of our data. 
By using this setup along with a replication level 2, we only use one fourth [4] Vangelis Koukis, Constantinos Venetsanopoulos, and of our overall capacity, which covers our current storage needs. Nectarios Koziris, “~okeanos: Building a Cloud, Cluster by As our storage needs grow and RADOS matures, we aim to break Cluster,” IEEE Internet Computing, vol. 17, no. 13, May-June the RAID setups and double the cluster capacity. 2013, pp. 67–71. Using Ceph in production has several advantages: [5] Vangelis Koukis, Constantinos Venetsanopoulos, and Nectarios Koziris, “Synnefo: A Complete Cloud Stack over ◆◆ It allows us to use large bulk commodity hardware. Ganeti,” USENIX, ;login:, vol. 38, no. 5 (October 2013), pp. ◆◆ It provides a central shared storage that can self-replicate, self- 6–10. heal, and self-rebalance when a hardware node or a network [6] Sage A. Weil, Andrew W. Leung, Scott A. Brandt, and Car- link fails. los Maltzahn, “RADOS: A Scalable, Reliable Storage Service ◆◆ It can scale on demand by adding more storage nodes to the for Petabyte-Scale Storage Clusters,” in Proceedings of the cluster as demand increases. 2nd International Workshop on Petascale Data Storage: Held ◆◆ It enables live migration of VMs to any other node. in Conjunction with Supercomputing ’07, PDSW ‘07 (ACM, 2007), pp. 35–44. www.usenix.org JUNE 2014 VOL. 39, NO. 3 9 FILE SYSTEMS AND STORAGE

Analysis of HDFS under HBase A Facebook Messages Case Study

TYLER HARTER, DHRUBA BORTHAKUR, SIYING DONG, AMITANAND AIYER, LIYIN TANG, ANDREA C. ARPACI-DUSSEAU, AND REMZI H. ARPACI-DUSSEAU

Tyler Harter is a student at arge-scale distributed storage systems are exceedingly complex the University of Wisconsin— and time-consuming to design, implement, and operate. As a result, Madison, where he has received rather than cutting new systems from whole cloth, engineers often his bachelor’s and master’s L degrees and is currently opt for layered architectures, building new systems upon already-existing pursuing a computer science Ph.D. Professors ones to ease the burden of development and deployment. In this article, we Andrea Arpaci-Dusseau and Remzi Arpaci- examine how layering causes write amplication when HBase is run on top Dusseau advise him, and he is funded by an of HDFS and how tighter integration could result in improved write perfor- NSF Fellowship. Tyler is interested in helping mance. Finally, we take a look at whether it makes sense to include an SSD to academics understand the I/O workloads seen in industry; toward this end he has interned improve performance while keeping costs in check. at Facebook twice and studied the Messages Layering, as is well known, has many advantages [6]. For example, construction of the application as a Facebook Fellow. Tyler has also Frangipani distributed file system was greatly simplified by implementing it atop Petal, a studied desktop I/O, co-authoring an SOSP ’11 distributed and replicated block-level storage system [7]. Because Petal provides scalable, paper, “A File Is Not a File…,” which received a fault-tolerant virtual disks, Frangipani could focus solely on file-system-level issues (e.g., Best Paper award. [email protected] locking); the result of this two-layer structure, according to the authors, was that Frangipani was “relatively easy to build.” Dhruba Borthakur is an engineer in the Engineering Unfortunately, layering can also lead to problems, usually in the form of decreased perfor- Team at Facebook. He is leading mance, lowered reliability, or other related issues. For example, Denehy et al. show how naïve the design of RocksDB, a layering of journaling file systems atop software RAIDs can lead to data loss or corruption datastore optimized for storing [2]. Similarly, others have argued about the general inefficiency of the file system atop block data in fast storage. He is one of the founding devices [4]. architects of the Hadoop Distributed File In this article, we focus on one specific, and increasingly common, layered storage architec- System and has been instrumental in scaling ture: a distributed database (HBase, derived from Google’s BigTable) atop a distributed file Facebook’s Hadoop cluster to multiples of system (HDFS, derived from the ). Our goal is to study the interaction petabytes. Dhruba also is a contributor to the of these important systems with a particular focus on the lower layer, which leads to our Apache HBase project and highest-level question: Is HDFS an effective storage back end for HBase? (AFS). Dhruba has an MS in computer science from the University of Wisconsin-Madison. He To derive insight into this hierarchical system, and therefore answer this question, we trace hosts a Hadoop blog at http://hadoopblog and analyze it under a popular workload: Facebook Messages (FM). FM is a messaging .blogspot.com/ and a RocksDB blog at ht tp: // system that enables Facebook users to send chat and email-like messages to one another; it rocksdb.blogspot.com. [email protected] is quite popular, handling millions of messages each day. 
FM stores its information within HBase (and thus, HDFS) and hence serves as an excellent case study. To perform our analysis, we collected detailed HDFS traces over an eight-day period on a subset of FM machines. These traces reveal a workload very unlike traditional GFS/HDFS patterns. Whereas workloads have traditionally consisted of large, sequential I/O to very large files, we find that the FM workload represents the opposite. Files are small (750 KB median), and I/O is highly random (50% of read runs are shorter than 130 KB). We also use our traces to drive a multilayer simulator, ­allowing us to analyze I/O patterns across multiple layers beneath HDFS. From this analysis, we derive numerous insights. For

10 JUNE 2014 VOL. 39, NO. 3 www.usenix.org FILE SYSTEMS AND STORAGE Analysis of HDFS under HBase: A Facebook Messages Case Study

Siying Dong is a software example, we find that many features at different layers amplify writes, and that these features engineer working in the often combine multiplicatively. For example, HBase logs introduced a 10x overhead on writes, Database Engineering team whereas HDFS replication introduced a 3x overhead; together, these features produced a 30x at Facebook, focusing on write overhead. When other features such as compaction and caching are also considered, RocksDB. He also worked on we find writes are further amplified across layers. At the highest level, writes account for a Hive, HDFS, and some other data warehouse mere 1% of the baseline HDFS I/O, but by the time the I/O reaches disk, writes account for infrastructures. Before joining Facebook, Siying 64% of the workload. worked in the SQL Azure Team at Microsoft. This finding indicates that even though FM is an especially read-heavy workload within He received a bachelor’s degree from Tsinghua Facebook, it is important to optimize for both reads and writes. We evaluate potential University and a master’s degree from optimizations by modeling various hardware and software changes with our simulator. Brandeis University. [email protected] For reads, we observe that requests are highly random; therefore, we evaluate using flash to cache popular data. We find that adding a small SSD (e.g., 60 GB) can reduce latency by Amitanand Aiyer is a research 3.5x. For writes, we observe compaction and logging are the major causes (61% and 36%, scientist in the Core Data respectively); therefore, we evaluate HDFS changes that give HBase special support for these team at Facebook. He focuses operations. We find such HDFS specialization yields a 2.7x reduction in replication-related on building fault-tolerant network I/O and a 6x speedup for log writes. More results and analysis are discussed in our distributed systems and FAST ’14 paper [8]. has been working on HBase to improve its availability and fault tolerance. Amitanand is Background and Methodology an HBase Committer and has a Ph.D. from the The FM storage stack is based on three layers: distributed database over distributed file University of Texas at Austin. system over local storage. These three layers are illustrated in Figure 1 under “Actual Stack.” [email protected] As shown, FM uses HBase for the distributed database and HDFS for the distributed file system. HBase provides a simple API allowing FM to put and get key-value pairs. HBase Liyin Tang is a software stores these records in data files in HDFS. HDFS replicates the data across three machines engineer from the Core Data and thus can handle disk and machine failures. By handling these low-level fault toler- team at Facebook, where he ance details, HDFS frees HBase to focus on higher-level database logic. HDFS in turn stores focuses on building highly replicas of HDFS blocks as files in local file systems. This design enables HDFS to focus on available and reliable storage replication while leaving details such as disk layout to local file systems. The primary advan- services, and helps the service scale in the tage of this layered design is that each layer has only a few responsibilities, so each layer is face of exponential data growth. Liyin also simpler (and less bug prone) than a hypothetical single system that would be responsible for works as an HBase PMC member in the everything. Apache community. 
He has a master’s degree One important question about this layered design, however, is: What is the cost of simplicity in computer science from the University of (if any) in terms of performance? We explore this question in the context of the FM work- Southern California and a bachelor’s degree in load. To understand how FM uses the HBase/HDFS stack, we trace requests from HBase to software engineering from Shanghai Jiao Tong HDFS, as shown in the Figure 1. We collect traces by deploying a new HDFS tracing frame- University. [email protected] work that we built to nine FM machines for 8.3 days, recording 71 TB of HDFS I/O. The traces record the I/O of four HBase activities that use HDFS: log- ging, flushing, reading, and compacting. When HBase receives a put request, it immediately logs the record to an HDFS file for persistence. The record is also added to an HBase write buffer, which, once filled, HBase flushes to a sorted data file. Data files are never modified, so when a get request arrives, HBase reads multiple data files in order to find the latest version of the data. To limit the number of files that must be read, HBase occasionally compacts old files, which involves merge sorting multiple small data files into one large file and then deleting the small files. We do two things with our traces of these activities. First, as Figure 1 shows, we feed them to a pipeline of MapReduce analysis jobs. These Figure 1: Tracing, analysis, and simulation jobs compute statistics that characterize the workload. We discuss www.usenix.org JUNE 2014 VOL. 39, NO. 3 11 FILE SYSTEMS AND STORAGE Analysis of HDFS under HBase: A Facebook Messages Case Study

Andrea Arpaci-Dusseau is a these characteristics in the next section and suggest ways to improve both the hardware professor and associate chair of and software layers of the stack. Second, we evaluate our suggestions via a simulation computer sciences at the University of layered storage. We feed our traces to a model of HBase and HDFS that translates the of Wisconsin—Madison. Andrea HDFS traces into inferred traces of requests to local file systems. For example, our simu- co-leads a research group with lator translates an HDFS write to three local file-system writes based on a model of triple her husband, Remzi, and has advised 14 students replication. We then feed our inferred traces of local file-system I/O to a model of local through their Ph.D. dissertations; their group has storage. This model computes request latencies and other statistics based on submodels received many Best Paper awards and sorting of RAM, SSDs, and rotational disks (each with its own block scheduler). We use these world records. She is a UW—Madison Vilas models to evaluate different ways to build the software and hardware layers of the stack. Associate and received the Carolyn Rosner “Excellent Educator’’ award; she has served Workload Behavior In this section, we characterize the FM workload with four questions: What activities on the NSF CISE Advisory Committee and as cause I/O at each layer of the stack? How large is the dataset? How large are HDFS files? faculty co-director of the Women in Science And, is I/O sequential? and Engineering (WISE) Residential Learning Community. [email protected] I/O Activities We begin by considering the number of reads and writes at each layer of the stack in Remzi Arpaci-Dusseau is a full Figure 2. The first bar shows HDFS reads and writes, excluding logging and compac- professor in the Computer Sciences tion overheads. At this level, writes represent only 1% of the 47 TB of I/O. The second Department at the University of bar includes these overheads. As shown, overheads are significant and write dominated, Wisconsin—Madison. Remzi co- bringing the writes to 21%. leads a research group with his wife, Andrea; details of their research can be found at HBase tolerates failures by replicating data with HDFS. Thus, one HDFS write causes http://research.cs.wisc.edu/adsl. Remzi has won three writes to local files and two network transfers. The third bar of Figure 2 shows that the SACM Professor-of-the-Year award four times this tripling increases the writes to 45%. Not all this file-system I/O will hit disk, as OS and the Rosner “Excellent Educator” award once. caching absorbs some of the reads. The fourth bar shows that only 35 TB of disk reads Chapters from a freely available OS book he and are caused by the 56 TB of file-system reads. The bar also shows a write increase, as very his wife co-wrote, found at www.ostep.org, have small file-system writes cause 4 KB-block disk writes. Because of these factors, writes been downloaded over 1/2 million times. Remzi represent 64% of disk I/O. has served as co-chair of USENIX ATC, FAST, OSDI, and (upcoming) SOCC conferences. Remzi Dataset Size Figure 3 gives a layered overview similar to that of Figure 2, but for data rather than I/O. has been a NetApp faculty fellow, an IBM faculty The first bar shows 3.9 TB of HDFS data received some non-overhead I/O during trac- award winner, an NSF CAREER award winner, ing (data deleted during tracing is not counted). 
Nearly all this data was read and a small and currently serves on the Samsung DS CTO portion written. The second bar shows data touched by any I/O (including compaction and UW Office of Industrial Partnership advisory and logging overheads). The third bar shows how much data is touched at the local level boards. [email protected] during tracing. This bar also shows untouched data. Most of the 120 TB of data is very cold; only a third is accessed over the eight-day period.

Figure 2: I/O across layers. Black sections represent reads and gray sec- Figure 3: Data across layers. This is the same as Figure 2 but for data tions represent writes. The top two bars indicate HDFS I/O as measured instead of I/O. COMP is compaction. directly in the traces. The bottom two bars indicate local I/O at the file- system and disk layers as inferred via simulation.

12 JUNE 2014 VOL. 39, NO. 3 www.usenix.org FILE SYSTEMS AND STORAGE Analysis of HDFS under HBase: A Facebook Messages Case Study

Figure 4: File-size CDF. The distribution is over the sizes of created files Figure 5: Run-size CDF. The distribution is over sequential read runs, with with 50th and 90th percentiles marked. 50th and 80th percentiles marked.

File Size and the database layer (e.g., HBase). FM composes these in a GFS (the inspiration for HDFS) assumed “multi-GB files are the mid-replicated pattern (Figure 6a), with the database above common case, and should be handled efficiently” [5]. Previous replication and the local stores below. The merit of this design HDFS workload studies also show this; for example, MapRe- is simplicity. The database can be built with the assumption duce inputs were found to be about 23 GB at the 90th percentile that underlying storage will be available and never lose data. (Facebook in 2010) [1]. Unfortunately, this approach separates computation from data. Computation (e.g., compaction) can co-reside with, at most, one Figure 4 reports a CDF of the sizes of HDFS files created dur- replica, so all writes involve network I/O. ing tracing. We observe that created files tend to be small; the median file is 750 KB, and 90% are smaller than 6.3 MB. This Top-replication (Figure 6b) is an alternative used by Salus [9]. means that the data-to-metadata ratio will be higher for FM Salus supports the HBase API but provides additional robust- than for traditional workloads, suggesting that it may make ness and performance advantages. Salus protects against sense to distribute metadata instead of handling it all with a memory corruption by replicating database computation as well single NameNode. as the data itself. Doing replication above the database level also reduces network I/O. If the database wants to reorganize data Sequentiality on disk (e.g., via compaction), each database replica can do so on GFS is primarily built for sequential I/O and, therefore, assumes its local copy. Unfortunately, top-replicated storage is complex, “high sustained bandwidth is more important than low latency” because the database layer must handle underlying failures as [5]. All HDFS writes are sequential, because appends are the well as cooperate with other . only type of writes supported, so we now measure read sequenti- ality. Data is read with sequential runs of one or more contiguous Mid-bypass (Figure 6c) is a third option proposed by Zaharia et read requests. Highly sequential patterns consist of large runs, al. [10]. This approach (like mid-replication) places the repli- whereas random patterns consist mostly of small runs. cation layer between the database and the local store; but, to improve performance, an RDD (Resilient Distributed Dataset) Figure 5 shows a distribution of read I/O, distributed by run size. We observe that most runs are fairly small. The median run size is 130 KB, and 80% of runs are smaller than 250 KB, indicating FM reads are very random. These random reads are primarily caused by get requests; the small (but significant) portion of reads that are sequential are mostly due to compaction reads.

Layering: Pitfalls and Solutions
In this section, we discuss different ways to layer storage systems and evaluate two techniques for better integrating layers.

Figure 6: Layered architectures. The HBase architecture (mid-replicated) is shown, as well as two alternatives. Top-replication reduces network I/O by co-locating database computation with database data. The mid-bypass architecture is similar to mid-replication but provides a mechanism for bypassing the replication layer for efficiency.

Layering Background
Three important layers are the local layer (e.g., disks, local file systems, and a DataNode), the replication layer (e.g., HDFS), and the database layer (e.g., HBase). FM composes these in a mid-replicated pattern (Figure 6a), with the database above replication and the local stores below. The merit of this design is simplicity. The database can be built with the assumption that underlying storage will be available and never lose data. Unfortunately, this approach separates computation from data. Computation (e.g., compaction) can co-reside with, at most, one replica, so all writes involve network I/O.

Top-replication (Figure 6b) is an alternative used by Salus [9]. Salus supports the HBase API but provides additional robustness and performance advantages. Salus protects against memory corruption by replicating database computation as well as the data itself. Doing replication above the database level also reduces network I/O. If the database wants to reorganize data on disk (e.g., via compaction), each database replica can do so on its local copy. Unfortunately, top-replicated storage is complex, because the database layer must handle underlying failures as well as cooperate with other databases.

Mid-bypass (Figure 6c) is a third option proposed by Zaharia et al. [10]. This approach (like mid-replication) places the replication layer between the database and the local store; but, to improve performance, an RDD (Resilient Distributed Dataset) API lets the database bypass the replication layer. Network I/O is avoided by shipping computation directly to the data. HBase compaction, if built upon two RDD transformations, join and sort, could avoid much network I/O.

Local Compaction
We simulate the mid-bypass approach, shipping compaction operations directly to all the replicas of compaction inputs. Figure 7 shows how local compaction differs from traditional compaction; network I/O is traded for local I/O, to be served by local caches or disks.

Figure 7: Local-compaction architecture. The HBase architecture (left) shows how compaction currently creates a data flow with significant network I/O, represented by the two lines crossing machine boundaries. An alternative (right) shows how local reads could replace network I/O.

Figure 8 shows the result: a 62% reduction in network reads, from 3.5 TB to 1.3 TB. The figure also shows disk reads, with and without local compaction, and with either write allocate (wa) or no-write allocate (nwa) caching policies. We observe that disk I/O increases slightly more than network I/O decreases. For example, with a 100-GB cache, network I/O is decreased by 2.2 GB, but disk reads are increased by 2.6 GB for no-write allocate. This is unsurprising: HBase uses secondary replicas for fault tolerance rather than for reads, so secondary replicas are written once (by a flush or compaction) and read at most once (by compaction). Thus, local-compaction reads tend to (1) be misses and (2) pollute the cache with data that will not be read again. Even still, trading network I/O for disk I/O in this way is desirable, as network infrastructure is generally much more expensive than disks.

Combined Logging
We now consider the interaction between replication and HBase logging. Currently, a typical HDFS DataNode receives logs from three RegionServers. Because HDFS just views these logs as regular HDFS files, HDFS will generally write them to different disks. We evaluate an alternative to this design: combined logging. With this approach, HBase passes a hint to HDFS, identifying the logs as files that are unlikely to be read back. Given this hint, HDFS can choose a write-optimized layout for the logs. In particular, HDFS can interleave the multiple logs in a single write-stream on a single dedicated disk.

Figure 8: Local-compaction results. The thick gray lines represent HBase with local compaction, and the thin black lines represent HBase currently. The solid lines represent network reads, and the dashed lines represent disk reads; long-dash represents the no-write allocate cache policy and short-dash represents write allocate.

We simulate combined logging and measure performance for requests that go to disk; we consider latencies for logging, compaction, and foreground reads. Figure 9 reports the results for varying numbers of disks. The latency of log writes decreases dramatically with combined logging (e.g., by 6x with 15 disks). Foreground-read and compaction requests are also slightly faster in most cases due to less competition with logs for seeks.

Figure 9: Combined logging results. Disk latencies for various activities are shown, with (gray) and without (black) combined logging.

Currently, put requests do not block on log writes, so logging is a background activity. If, however, HBase were to give a stronger guarantee for puts (namely that data is durable before returning), combined logging would make that guarantee much cheaper.


Hardware: Adding a Flash Layer
Earlier, we saw FM has a very large, mostly cold dataset; keeping all this data in flash would be wasteful, costing upwards of $10k/machine (assuming flash costs $0.80/GB). Thus, because we should not just use flash, we now ask, should we use no flash or some flash? To answer this, we need to compare the performance improvements provided by flash to the corresponding monetary costs.

We first estimate the cost of various hardware combinations (assuming disks are $100 each, RAM costs $5/GB, and flash costs $0.80/GB). To compute performance, we run our simulator on our traces using each hardware combination. We try nine RAM/disk combinations, each with no flash or a 60 GB SSD; these represent three amounts of disk and three amounts of RAM. When the 60 GB SSD is used, the RAM and flash function as a tiered LRU cache.

Figure 10: Flash cost and performance. Black bars indicate latency without flash, and gray bars indicate latency with a 60 GB SSD. Latencies are only counted for foreground I/O (e.g., servicing a get), not background activities (e.g., compaction). Bar labels indicate cost. For the black bars, the labels indicate absolute cost, and for the gray bars, the labels indicate the cost increase relative to the black bar.

Figure 10 shows how adding flash changes both cost and performance. For example, the leftmost two bars of the leftmost plot show that adding a 60 GB SSD to a machine with 10 disks and 10 GB of RAM decreases latency by a factor of 3.5x (from 19.8 ms to 5.7 ms latency) while only increasing costs by 5%. Across all nine groups of bars, we observe that adding the SSD always increases costs by 2–5% while decreasing latencies by 17–71%. In two thirds of the cases, flash cuts latency by more than 40%.

We have shown that adding a small flash cache greatly improves performance for a marginal initial cost. Now, we consider long-term replacement caused by flash wear, as commercial SSDs often support only 10k program/erase cycles. We consider three factors that affect flash wear. First, if there is more RAM, there will be fewer evictions to the flash level of the cache and therefore fewer writes and less wear. Second, if the flash device is large, writes will be spread over more cells, so each cell will live longer. Third, using a strict LRU policy can cause excessive writes for some workloads by frequently promoting and evicting the same items back and forth between RAM and flash.

Figure 11: Flash lifetime. The relationship between flash size and flash lifetime is shown for both the keep policy (gray lines) and promote policy (black lines). There are two lines for each policy (10 or 30 GB RAM).

Figure 11 shows how these three factors affect flash lifetime. The black “Strict LRU” lines correspond to the same configuration used in Figure 10 for the 10 GB and 30 GB RAM results. The amount of RAM makes a significant difference. For example, for strict LRU, the SSD will live 58% longer with 30 GB of RAM instead of 10 GB of RAM. The figure also shows results for a wear-friendly policy with the gray lines. In this case, RAM and flash are each an LRU independently, and RAM evictions are inserted into flash, but (unlike strict LRU) flash hits are not repromoted to RAM. This alternative to strict LRU greatly reduces wear by reducing movement between RAM and flash. For example, with 30 GB of RAM, we observe that a 240 GB SSD will last 2.7x longer if the wear-friendly policy is used. Finally, the figure shows that the amount of flash is a major factor in flash lifetime. Whereas the 20 GB SSD lasts between 0.8 and 1.6 years (depending on policy and amount of RAM), the 120 GB SSD always lasts at least five years.

We conclude that adding a small SSD cache is a cost-effective way to improve performance. Adding a 60 GB SSD can often double performance for only a 5% cost increase. We find that for large SSDs, flash has a significant lifetime, and so avoiding wear is probably unnecessary (e.g., 120 GB SSDs last more than five years with a wear-heavy policy), but for smaller SSDs, it is useful to choose caching policies that avoid frequent data shuffling.

Conclusions
We have presented a detailed multilayer study of storage I/O for Facebook Messages. Our research relates to common storage ideas in several ways. First, the GFS-style architecture is based on workload assumptions, such as “high sustained bandwidth is more important than low latency” and “multi-GB files are the common case, and should be handled efficiently” [5]. We find FM represents the opposite workload, being dominated by small files and random I/O.

Second, layering storage systems is very popular; Dijkstra found layering “proved to be vital for the verification and logical soundness” of an OS [3]. We find, however, that simple layering has a cost. In particular, we show that relative to the simple layering currently used, tightly integrating layers reduces replication-related network I/O by 62% and makes log writes 6x faster. We further find that layers often amplify writes multiplicatively. For example, a 10x logging overhead (HBase level) combines with a 3x replication overhead (HDFS level), producing a 30x write overhead.

Third, flash is often extolled as a disk replacement. For example, Jim Gray has famously said that “tape is dead, disk is tape, flash is disk.” For Messages, however, flash is a poor replacement for disk, as the dataset is very large and mostly cold, and storing it all in flash would cost over $10k/machine. Although we conclude that most data should continue to be stored on disk, we find small SSDs can be quite useful for storing a small, hot subset of the data. Adding a 60 GB SSD can often double performance while only increasing costs by 5%.

In this work, we take a unique view of Facebook Messages, not as a single system but as a complex composition of layered subsystems. We believe this perspective is key to deeply understanding modern storage systems. Such understanding, we hope, will help us better integrate layers, thereby maintaining simplicity while achieving new levels of performance.

Acknowledgments
We thank the anonymous reviewers and Andrew Warfield (our FAST shepherd) for their tremendous feedback, as well as members of our research group for their thoughts and comments on this work at various stages. We also thank Pritam Damania, Adela Maznikar, and Rishit Shroff for their help in collecting HDFS traces.

This material was supported by funding from NSF grants CNS-1319405 and CNS-1218405 as well as generous donations from EMC, Facebook, Fusion-io, Google, Huawei, Microsoft, NetApp, Sony, and VMware. Tyler Harter is supported by the NSF Fellowship and Facebook Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or other institutions.

References
[1] Yanpei Chen, Sara Alspaugh, and Randy Katz, “Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads,” Proceedings of the VLDB Endowment, vol. 5, no. 12 (August 2012), pp. 1802–1813.

[2] Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, “Journal-Guided Resynchronization for Software RAID,” Proceedings of the 4th USENIX Symposium on File and Storage Technologies (FAST ’05) (December 2005), pp. 87–100.

[3] E. W. Dijkstra, “The Structure of the THE Multiprogramming System,” Communications of the ACM, vol. 11, no. 5 (May 1968), pp. 341–346.

[4] Gregory R. Ganger, “Blurring the Line Between OSes and Storage Devices,” Technical Report CMU-CS-01-166, Carnegie Mellon University, December 2001.

[5] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System,” Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP ’03) (October 2003), pp. 29–43.

[6] Jerome H. Saltzer, David P. Reed, and David D. Clark, “End-to-End Arguments in System Design,” ACM Transactions on Computer Systems, vol. 2, no. 4 (November 1984), pp. 277–288.

[7] Chandramohan A. Thekkath, Timothy Mann, and Edward K. Lee, “Frangipani: A Scalable Distributed File System,” Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP ’97) (October 1997), pp. 224–237.

[8] Tyler Harter, Dhruba Borthakur, Siying Dong, Amitanand Aiyer, Liyin Tang, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, “Analysis of HDFS under HBase: A Facebook Messages Case Study,” Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST ’14) (February 2014).

[9] Yang Wang, Manos Kapritsos, Zuocheng Ren, Prince Mahajan, Jeevitha Kirubanandam, Lorenzo Alvisi, and Mike Dahlin, “Robustness in the Salus Scalable Block Store,” Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’13) (April 2013).

[10] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica, “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,” Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI ’12) (April 2012).


CLOUD

OpenStack
Mark Lamourine

Mark Lamourine is a senior software developer at Red Hat. He’s worked for the last few years on the OpenShift project. He’s a coder by training, a sysadmin and toolsmith by trade, and an advocate for the use of Raspberry Pi style computers to teach computing and system administration in schools. Mark has been a frequent contributor to the ;login: Book Reviews column. [email protected]

The goal of OpenStack is nothing less than to virtualize and automate every aspect of a corporate computational infrastructure. The purpose is to provide self-service access to all of the resources that traditionally have required the intervention of a collection of teams to manage. In this article, I describe where OpenStack came from and what the various parts do.

OpenStack is an implementation of Infrastructure as a Service (annoyingly abbreviated as IaaS and amusingly pronounced “eye-ass”). Several commercial IaaS services are available, with the best known being Amazon Web Services, Google Compute Engine, and Rackspace. IaaS is also known as a “cloud” service, and there are also commercial products to create a “private cloud,” most notably VMware. OpenStack was developed as an alternative to the commercial cloud providers. With it, you can create private or public cloud services and move resources between them. OpenStack uses concepts and terminology that are compatible with Amazon Web Services and that are becoming de facto standards.

Both Red Hat, through the Red Hat Distribution of OpenStack (RDO) community project, and Canonical offer installers designed to make the installation of a demo or proof-of-concept service relatively easy for those who want to experiment with running their own cloud.

Origin, History, and Release Naming
In 2010 Rackspace and NASA created the OpenStack Foundation with the goal of producing an open source IaaS project. Since then, more than 150 other companies and organizations have joined the project. NASA dropped out in 2012, citing lack of internal progress implementing OpenStack services, and redirected funding to using commercial cloud providers. In the past two years, the pace of development and improvement has increased dramatically, to the point that all three major commercial Linux distribution providers (Canonical, SuSE, and Red Hat) have customized OpenStack offerings.

OpenStack development versioning employs a code-word scheme using English alphabetic ordering. The first development cycle was code named Austin (2010.1) and was released in October 2010. New versions have been released approximately every six months since the initial release. These are the most recent development code names:

◆◆ Essex (2012.1)
◆◆ Folsom (2012.2)
◆◆ Grizzly (2013.1)
◆◆ Havana (2013.2)
◆◆ Icehouse (2014.1) (release pending as of this writing)

Once released, the stable version is numbered with the four-digit year and a single-digit serial number (YYYY.N). No one expects more than nine releases in a calendar year. Most people continue to refer to them by their code names.


The development pace is so rapid that the OpenStack foundation Compute (Nova) only lists the current release and the previous one as supported The core of an IaaS system is its virtualized computers. The at any given time. I see the adoption by commercial distributions Nova service provides the compute resources for OpenStack. It as an indicator that they think the current code base is stable controls the placement and management of the virtual machines enough to allow economical long-term support. within the running service. Originally, Nova also contained the networking, which has since moved to Neutron. Any references to capabilities here will be to the Havana (2013.2) release unless otherwise noted. Storage OpenStack offers several flavors of persistent storage, each of Function and Stability which is designed for a specific set of tasks. OpenStack is designed as a set of agent services, each of which manages some aspect of what would otherwise be a physical IMAGE STORAGE (GLANCE) infrastructure. Of course, the initial focus was on virtualized Glance is the image store used for creating new running computation, but there are now subservices that manage storage instances or for storing the state of an instance that has been (three ways), network topology, authentication, as well as a uni- paused. Glance takes complete file systems and bootable disk fied Web user interface. As with the development cycles, Open- images as input and makes them available to boot or mount on Stack uses code names for the active components that make up running instances. the service. The service components are detailed below.

It’s been four years since the initial release, and it’s really only BLOCK STORAGE (CINDER) with the Folsom (2012.2) release that enough components are Cinder is the OpenStack block storage service. This is where you available and stable to attempt to create a reliable user service. allocate additional disk space to your running instances. Cinder As development progresses, usage patterns are discovered that storage is persistent across reboots. Cinder can be backed by indicate the need for creating new first-class components to a number of traditional block storage services such as NFS or manage different aspects of the whole. The Folsom release was Gluster. the first to include a networking component (Neutron), which had previously been part of the computation service (Nova/ OBJECT STORAGE (SWIFT) Quantum). Swift offers a way to store and retrieve whole blobs of data. The data are accessed via a REST protocol, which can be coded into Bare metal provisioning, which has been handled by a plugin to applications using an appropriate API library. Object storage Nova, is getting a service agent of its own (code named “Ironic,” provides a means for an application to store and share data in Icehouse 2014.1), which will be more capable and flexible. across different instances. The object store is arranged as a hier- Communication between the components is currently carried archical set of “container” objects, each of which can hold other over an AMQP bus implemented using RabbitMQ or QPID. A containers or discrete data objects. In this way, it corresponds new OpenStack messaging and RPC service is in the works (code roughly to the structure of a file system. named Oslo, also in Icehouse). Each of these new agent services will replace and enhance some aspect of the current systems, but one can hope that the existing components have become Network (Neutron) The Neutron service provides network connectivity for the Nova stable enough that most changes will be additive rather than instances. It creates a software-defined network (SDN) that transformative. allows tight control over communication between instances, as well as access from outside of the OpenStack network. Components OpenStack is enamored of code names, and for anyone inter- ested in deploying an OpenStack service, the first task is to learn Authentication/Authorization (Keystone) All of the OpenStack services require user access control, the taxonomy. There are actually some good reasons to use code and the Keystone service provides the user management and names. OpenStack is meant to be a modular system; and, at least resource control policy. once so far, an implementation of a subsystem (Quantum) has been replaced with a completely new implementation (Neutron). User Interface (Horizon) Most components are active agents. They subscribe to a messag- The Horizon user interface is one of the more recent additions. It ing service (AMQP) to accept commands and return responses. provides a single-pane Web UI to OpenStack as a whole. It layers Each service also has a CLI tool that can communicate directly a task-related view of OpenStack over the functional services, with the agent. The active components also have a backing data- which allows end users and administrators to focus on their jobs base for persistence across restarts. without being concerned with…well, this list of agents. www.usenix.org JUNE 2014 VOL. 39, NO. 3 19 CLOUD OpenStack

Monitoring (Ceiliometer) modules to create a working installation. With the Foreman, it is Another recent addition is the monitoring function provided by possible to create more complex configurations by directly using the Ceiliometer service. Ceiliometer is focused on monitoring the Puppet module inputs. and providing metrics for the OpenStack services themselves, but the documentation claims that it is designed to be extensible. Ubuntu OpenStack Installer (Debian Package) This functionality would allow implementers to add probes and Canonical and Ubuntu also have an OpenStack installer effort. metrics to reach inside instances and applications as well. Canonical has taken a different approach from Red Hat. They provide a bootable disk image that can be written to a USB stor- Orchestration (Heat) age device or to a DVD. On first boot, the installer walks the user The Heat service gives the OpenStack user a standardized through the process. means to define complex configurations of compute, storage, and networking resources and to apply them repeatably. Although Conclusion Heat deals mainly with managing the OpenStack resources, it has The idea of IaaS is too powerful to ignore in the long term. interfaces with several popular OS and application-level con- Although it is still experiencing growing pains, so many players, figuration management (CM) tools, including Puppet and Chef. large and small, are now backing and contributing resources to OpenStack that it’s a safe bet it will be around for a while and Heat uses templates to define reusable configurations. The user will improve with time. defines a configuration including compute, storage, and net- working as well as providing input to any OS configuration that If you’re eyeing a service like Amazon Web Services and think- will be applied by a CM system. ing, “I wish I could do that here,” then you should look at Open- Stack. Plan to do several iterations of installation so that you can Installation Tools get to know the component services and their interactions as As mentioned, OpenStack is a complex service. However, it does well as your real use cases. With commercial cloud service devel- follow several standard patterns. Additionally, several efforts opers and project managers starting to get used to the idea of are in progress to ease the installation process and to make on-demand resources, they’re going to be clamoring for it soon, installs consistent. and OpenStack offers you the means to provide it. Both RPM and Debian-based Linux systems have installer efforts. References Puppet Modules A lot of people are working on OpenStack, so there are a lot of Each of the OpenStack services has a corresponding Puppet references. These are just a few select ones to get started. module to aid in installation and configuration. If you’re familiar ◆◆ OpenStack: http://www.openstack.org with both Puppet and OpenStack, you could probably use these ◆◆ to create a working service, but this method isn’t recommended. OpenStack documentation repository: http://docs .openstack.org RDO—Packstack (RPM) ◆◆ OpenStack Administrator’s Guide: http://docs.openstack Packstack is intended for single host or small development or .org/admin-guide-cloud/content/ demonstration setups. It’s produced by the RDO foundation, ◆◆ Packstack: https://wiki.openstack.org/wiki/Packstack which is the community version of Red Hat’s implementation of OpenStack. 
SOFTWARE ◆◆ Packstack can run either as an interactive session, or it can GitHub OpenStack: https://github.com/openstack accept an answer file that defines all of the responses. It can ◆◆ GitHub Packstack: https://github.com/stackforge install the component services on a single host or a small set of /packstack hosts. ◆◆ GitHub OpenStack Puppet modules: https://github .com/stackforge/?query=puppet- RDO—The Foreman The Foreman is primarily a hardware-provisioning tool, but it VENDOR AND DISTRIBUTION COMMUNITIES also has features to make use of Puppet to define the OS and ◆◆ Red Hat RDO: http://openstack.redhat.com/Main_Page application configuration of managed systems. The RDO project ◆◆ Ubuntu: http://www.ubuntu.com/download/cloud has defined and packaged an installer, which makes use of the /install-ubuntu-cloud Foreman host group definitions and the OpenStack Puppet


TROUBLESHOOTING

When the Stars Align, and When They Don’t

Tim Hockin

Tim Hockin attended Illinois State University to study fine art, but emerged with a BS in computer science instead. A fan of Linux since 1994, he joined Cobalt Networks after graduating and has surfed the Linux wave ever since. He joined Google’s nascent Platforms group in 2004, where he created and led the system bring-up team. Over time he has moved up the stack, most recently working as a tech lead (and sometimes manager) in Google’s Cluster Management team, focusing mostly on node-side software. He is somewhat scared to say that his code runs on every single machine in Google’s fleet. [email protected]

I got my first real bug assignment in 1999, at the beginning of my first job out of undergrad. Over a period of weeks, I went through iterations of code-reading, instrumentation, compiling, and testing, with the bug disappearing and reappearing. In the end, the problem turned out to be a subtle misalignment, something I would never have guessed when I started out.

Here’s how I got started: A user reported that their database was being corrupted, but only on our platform. That’s entirely possible—our platform was, for reasons that pre-dated my employment, somewhat odd. It was Linux—easy enough—but on MIPS. The MIPS port of Linux was fairly new at the time and did not have a huge number of users. To make life more fun, we used MIPS in little-endian mode, which almost nobody else did. It was supported by the toolchain and the kernel, so we all assumed it was workable. To find a bug on this rather unique platform was not surprising to me.

I took to the bug with gusto. Step one, confirm. I reproduced the user’s test setup—a database and a simple HTML GUI. Sure enough, when I wrote a number, something like 3.1415, to the database through the GUI and then read it back through the GUI, the software produced a seemingly random number. Reloading the page gave me the same number. Problem reproduced.

With the arrogance of youth, I declared it a bug in the customer’s GUI. I was so convinced of this that I wanted to point out exactly where the bug was and rub their noses in it. So, I spent a few hours instrumenting their HTML and Perl CGI code to print the values to a log file at each step of the process as it was committed to the database. To my great surprise, the value was correct all the way up until it was written to the database!

Clearly, then, it was a bug in the database. I downloaded source code for the database (hooray for open source!). I spent the better part of a day learning the code and finding the commit path. I rebuilt the database with my own instrumentation, and fired it up. Lo and behold, the bug was gone! The value I wrote came back perfectly. I verified that the code I was testing was the same version as in our product—it was. I verified that we did not apply any patches to it—we didn’t.

Clearly, then, it was a bug in the toolchain. I spent the next week recompiling the database (to the tune of tens of minutes per compile-test cycle) with different compiler versions and flags, with no luck—everything I built worked fine. Someone suggested that maybe the Linux packaging tool (RPM) was invoking different compiler behavior than I was using manually. Great idea, I thought. I hacked up a version of the package that used my instrumented code and ran that. Bingo—the bug was back. I began dissecting all of the differences I could figure out from the RPM build and my manual build.
After a number of iterations over a number of days, I tracked it down to a single -D flag (a #define in the C code) to enable threading in the database. I was not setting this when I compiled manually, but the RPM build was.


Clearly, then, the bug was in the database’s threading code. grammer, to tell it what type you pushed. It dawned on me that Suddenly it got a lot more daunting but also more fun. I spent if I told printf() that I pushed a float when I really pushed an int, even more time auditing and instrumenting the database code it would interpret the data that it popped incorrectly—leading for threading bugs. I found no smoking gun, but iterative testing to exactly this type of corruption. I verified that manually doing kept pushing the problem further and further down the stack. varargs of a double precision value corrupted the value. Eventually, it seemed that the database was innocent. It was To recap: the problem only occurs with (1) floating point values, writing through to the C library correctly and getting the data (2) running in a thread, (3) with varargs. back corrupted. Clearly, then, the bug was in the varargs implementation. I Clearly, then, the bug was in the C library. Given the circum- puzzled through the varargs code—a tangle of macros to align stances of our platform, this scenario was believable. I down- the memory access properly (16 bytes required for double), and I loaded the source code for GNU libc and started instrumenting could find no flaw with the code. It followed the platform’s spec, it. Compiling glibc was a matter of hours, not minutes, so I tried but the resulting value was still wrong. I looked at the disassem- to be as surgical as I could about changes. But, every now and bled code for pushing and popping the values, and it looked cor- then, something went awry in the build and I had to “make clean” rect—the alignment was fine. In desperation, I instrumented the and start over—it was tedious. Several iterations later, libc was calling code to print the stack address as it pushed the argument absolved of any wrongdoing—the data got all the way to the and compared that with the address that varargs popped—they write() syscall correctly, but it was still incorrect when I read did not match. They were off by eight bytes! and printed it later. After many hours of staring at disassembled code and cross-ref- Clearly, then, the bug was in the kernel. If I thought iterating erencing the spec, it became clear—the logic used when pushing the C library was painful, iterating the kernel was agony. Hack, the value onto the stack was subtly different than the logic used compile, reboot, test, repeat. I instrumented all along the write() to pop it off. Specifically, the push logic was using offsets relative path all the way down to the disk buffers. Now several weeks into to the frame pointer, but the pop logic used the absolute address. this bug, the kernel was free from blame. Maybe the data wasn’t In the course of reading the ABI specification, I noticed that actually getting corrupted? Why this idea did not hit me before I almost every value on the stack is required to be double-word blame on my initial overconfidence, which was much diminished (8 byte) aligned, the stack frame itself is required to be quad- by this point. word (16 byte) aligned. When I printed the frame pointer regis- I took a new tack—could I reproduce it outside of the database ter, I found that it failed to meet this specification. test? I wrote a small program that wrote a double precision It seemed obvious: All I had to do was find the code that allo- floating point value (the same as the test case) to a file, read it cated the stack space and align it better. back, and printed it out. 
I linked the threading library, just like the database code, but the bug did not reproduce. That is, until I I proceeded to take apart all of the LinuxThreads code related to actually ran the test case in a different thread. Once I did that, stack setup. It allocated a block of memory for the thread stack, the result I printed was similarly, but not identically, corrupted. but the allocation was already aligned. Ah, but stacks grow I tried with non-float values, and the bug did not reproduce. down! The code placed an instance of a thread descriptor struc- I dredged my memory for details of the IEEE754 format and ture at the top of the stack region (base+size—sizeof(struct)) and confirmed that the file on disk was correct—the corruption was initialized the stack to start below that. On a lark, I printed the happening in the read path, not the write path! size of that structure—it was a multiple of eight in size, but not a multiple of 16. Given this fresh information, I instrumented the test program to print the raw bytes it had read from the file. The bytes matched Gotcha! the file. But when I printed the number, it was wrong. I was get- I added a compiler directive to align the structure to 16 bytes, ting close! which has the side effect of making the size a multiple of 16, and Clearly, then, the bug was in printf(). I turned my attention back compiled the C library one more time. The problem was gone, and to the C library. I quickly found that the bytes of my float value the database’s GUI now showed correct results. All of this work, were correct before I called printf() and were incorrect inside and the fix was to add eight bytes of empty space to a structure. printf(). The printf() family of functions uses C variable-length With a sigh, I dashed off a patch to the MIPS maintainers for argument lists (aka varargs)—this is implemented with compiler GNU libc, which was met with a response to the effect of, “Oh! support to push and pop variables on the call stack. Because this We’ve been hunting that one years.” It felt like I had been, too. process is effectively manual, printf() must trust you, the pro-

SYSADMIN

Solving Rsyslog Performance Issues
David Lang

David Lang is a site reliability engineer at Google. He spent more than a decade at Intuit working in the Security Department for the Banking Division. He was introduced to Linux in 1993 and has been making his living with Linux since 1996. He is an Amateur Extra Class Radio Operator and served on the communications staff of the Civil Air Patrol, California Wing, where his duties included managing the statewide digital wireless network. He was awarded the 2012 Chuck Yerkes award for his participation on various open source mailing lists. [email protected]

Rsyslog is a very fast modern logging daemon, but with its capabilities comes complexity: When things go wrong, it’s sometimes hard to tell what the problem is and how to fix it. To continue my series of logging articles [1], this time I will be talking about how to troubleshoot performance issues with rsyslog.

Common Bottlenecks and Solutions
By far, the most common solution to poor performance is to upgrade to the current version of rsyslog. Performance is considered a key feature of rsyslog, and the improvements from one major version to another can be drastic. In many common rulesets, going from 5.x to 7.x has resulted in >10x performance improvements.

The next most common performance bottleneck is name resolution. Rsyslog will try to do a reverse lookup for the IP of any system sending it log messages. With v7, it will cache the results, but if your name lookups time out, this doesn’t help much. If you cannot get a fast name server or put the names into /etc/hosts, consider disabling DNS lookups with the -x command line flag.

For people using dynamically generated file names, a very common problem is failing to increase the number of filehandles that rsyslog keeps open. Historically, rsyslog keeps only ten dynamically generated output files open per action. If you commonly have logs arriving for more than this small number, rsyslog needs to close a file (flushing pending writes), open a new file, and write to that file for each new log line that it’s processing. To fix this, set the $DynaFileCacheSize parameter to some number larger than the number of files that you expect to write to (and make sure your filehandle limits allow this).

The last of the common bottlenecks is contention on the queues. If every thread accessing the queue locks it for every message, it’s very possible for contention for the queue locks to become a significant bottleneck. In v5, rsyslog gained the ability to process batches of messages, and over time more modules have been getting updated to support this mode. The default batch size ($ActionQueueDequeueBatchSize) was initially 16 messages at a time; however, on dedicated, central servers, it may be appropriate to set this limit much higher. In v8, the default is being changed to 1024, and even more may be appropriate on a dedicated server. The benefit of increasing this number tapers off with size, but setting it to 1024 or so to see if there is a noticeable difference is a very reasonable early step. The discussion below will help you determine if you want to go further.
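To make those tunables concrete, here is a minimal sketch in the legacy config format. The directive names are the ones discussed above; the values are only illustrative and should be sized for your own workload, and DNS lookups are disabled by starting rsyslogd with the -x command line flag rather than with a config directive:

# illustrative values only -- tune for your own traffic
$DynaFileCacheSize 100
$ActionQueueDequeueBatchSize 1024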

Rsyslog and Threads
Beyond these most common causes, things get more difficult. It is necessary to track down what is actually the bottleneck and address it. This task is complex because rsyslog makes heavy use of threads to decouple pieces from each other and to take advantage of modern systems, but this structure also provides some handles to use to track down the issues.

Each input to rsyslog is through one or more threads, which gather the log messages and add them to the main queue. Worker threads then pull messages off the main queue and deliver them to their destinations and/or add the message to an action queue. If there are action queues, each one has its own set of worker threads pulling from the action queue and delivering to the destinations for that action.

If a worker is unable to deliver messages to a destination, all progress of that queue will block until that delivery is able to succeed (or it hits the retry limit and permanently fails). If you don’t want this to block all log processing, you should make an action queue for that destination (or group of destinations).

As an example, I’ll show a basic configuration that accepts logs from the kernel and from /dev/log, writes all the messages locally, and delivers them via TCP to a remote machine.

Legacy config:

$ModLoad imklog
$modLoad imuxsock
$SystemLogRateLimitInterval 0
$SystemLogSocketAnnotate on
*.* /var/log/messages
*.* @@192.168.1.1:514

Version 7 and later config:

module(load="imklog")
module(load="imuxsock" SysSock.RateLimit.Interval="0" SysSock.Annotate="on")
action(type="omfile" File="/var/log/messages")
action(type="omfwd" Target="192.168.2.11" Port="10514" Protocol="tcp")

This will create three threads in addition to the parent “housekeeping” thread. Figure 1 shows the data flow through rsyslog. The threads do not communicate directly with each other, and no one thread “owns” the queue. The housekeeping thread isn’t shown here, because it doesn’t have any role in the processing of log messages.

Figure 1: The flow of logs in a basic rsyslog configuration

With top, you can see these threads by pressing H, and with ps, you can see these as well:

# ps -eLl |grep `cat /var/run/rsyslogd.pid`
5 S 0 758 1 758 0 80 0 - 18365 poll_s ? 00:00:00 rsyslogd
1 S 0 758 1 763 0 80 0 - 18365 poll_s ? 00:00:05 in:imuxsock
1 S 0 758 1 764 0 80 0 - 18365 syslog ? 00:00:00 in:imklog
1 S 0 758 1 765 0 80 0 - 18365 futex_ ? 00:00:02 rs:main Q:Reg

argv[0] is changed to tag each thread with what it’s doing. This lets you see if any of the threads are pegging the CPU, or if the main Q worker thread is just doing nothing.

This is, of course, a “Hello World” configuration, but it shows some of the types of issues that are common for people to run into in larger setups. For example, this configuration is dependent on a remote system; if that system is down, no local logs will be processed and we will see logs queue up and eventually fill the queue, causing programs trying to write logs to block. If we want to continue logging locally, even if the remote system is down, we can create a default-sized action queue for the TCP output action. To handle longer outages, or restarts of rsyslog, the queue can be disk backed (a minimal sketch of this follows the next example). A full discussion of queue configuration and management is a topic that will require its own article.

Legacy config:

$ModLoad imklog
$modLoad imuxsock
$SystemLogRateLimitInterval 0
$SystemLogSocketAnnotate on
*.* /var/log/messages
$ActionQueueType FixedArray
*.* @@192.168.1.1:514

Version 7 config:

module(load="imklog")
module(load="imuxsock" SysSock.RateLimit.Interval="0" SysSock.Annotate="on")
action(type="omfile" File="/var/log/messages")
action(type="omfwd" Target="192.168.2.11" Port="514" Protocol="tcp" queue.type="FixedArray")

Now, you can see an additional queue worker for the action queue:

# ps -eLl |grep `cat /var/run/rsyslogd.pid`
5 S 0 458 1 458 0 80 0 - 20414 poll_s ? 00:00:00 rsyslogd
1 S 0 458 1 462 0 80 0 - 20414 poll_s ? 00:00:00 in:imuxsock
1 S 0 458 1 463 0 80 0 - 20414 syslog ? 00:00:00 in:imklog
5 S 0 458 1 464 0 80 0 - 20414 futex_ ? 00:00:00 rs:main Q:Reg
1 S 0 458 1 465 0 80 0 - 20414 futex_ ? 00:00:00 rs:action 2 que
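As a rough sketch of the disk-backed variant mentioned above (this is an illustrative example rather than a configuration from the original setup: the queue file name, the size limit, and the retry setting are assumptions, built from standard v7 queue parameters), the forwarding action could be written like this:

# minimal sketch of a disk-assisted action queue; values are illustrative
action(type="omfwd" Target="192.168.2.11" Port="514" Protocol="tcp"
       queue.type="LinkedList" queue.filename="fwd_queue"
       queue.maxdiskspace="1g" queue.saveonshutdown="on"
       action.resumeRetryCount="-1")

With a disk-assisted queue like this, messages spill to the spool file when the in-memory queue fills, so local logging keeps flowing while the remote system is down or while rsyslog is restarted.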

With v7+, you can add a parameter to name the queue:

action(name="send_remote" type="omfwd" Target="192.168.2.11" Port="514" Protocol="tcp" queue.type="FixedArray")

This changes the data flow to what you see in Figure 2, and the ps/top display shows the additional threads as well:

# ps -eLl |grep `cat /var/run/rsyslogd.pid`
5 S 0 971 2807 971 0 80 0 - 20414 poll_s ? 00:00:00 rsyslogd
1 S 0 971 2807 972 0 80 0 - 20414 poll_s ? 00:00:00 in:imuxsock
1 S 0 971 2807 973 0 80 0 - 20414 syslog ? 00:00:00 in:imklog
5 S 0 971 2807 974 0 80 0 - 20414 futex_ ? 00:00:00 rs:main Q:Reg
1 S 0 971 2807 975 0 80 0 - 20414 futex_ ? 00:00:00 rs:send_remote:

Figure 2: The flow of logs when an action queue is added to prevent blocking

Keeping track of all the pieces can be a bit difficult when you have multiple queues in use.

Through the rest of the article, I will just give the v7 format because of the increasing complexity of specifying options with the legacy config style. Not all of the features described are going to be available on older versions.

Although ps/top lets you see how much CPU is being used by the threads, it doesn’t tell you what is being done. Rsyslog includes the impstats module, which produces a lot of information about what’s going on inside rsyslog.

To load impstats, add the following line to the top of your config file (it needs to be ahead of many other configuration parameters, so it’s easiest to make it the very first line).

module(load="impstats" interval="60" resetCounters="on" format="legacy")

This statement will create a set of outputs every minute, resetting counters every time, which makes it very easy to see if a queue is backing up. A sample output looks like the following (timestamp and hostname trimmed for space):

rsyslogd-pstats: imuxsock: submitted=3 ratelimit.discarded=0 ratelimit.numratelimiters=2
rsyslogd-pstats: action 1: processed=4 failed=0 suspended=0 suspended.duration=0 resumed=0
rsyslogd-pstats: send_remote: processed=4 failed=0 suspended=0 suspended.duration=0 resumed=0
rsyslogd-pstats: resource-usage: utime=1536 stime=60107 maxrss=1280 minflt=386 majflt=0 inblock=0 oublock=0 nvcsw=22 nivcsw=32
rsyslogd-pstats: send_remote: size=0 enqueued=4 full=0 discarded.full=0 discarded.nf=0 maxqsize=1
rsyslogd-pstats: main Q: size=5 enqueued=9 full=0 discarded.full=0 discarded.nf=0 maxqsize=5

You can use an automated analyzer to find the most common types of problems. Upload your pstats logs to http://www.rsyslog.com/impstats-analyzer/, and the script will highlight several common types of problems.

If you are using JSON-formatted messages, you can change format from "legacy" to "cee" and then use the mmjsonparse module to break this down into individual variables for analysis. In this format, the logs look like:

rsyslogd-pstats: @cee: {"name":"imuxsock","submitted":7,"ratelimit.discarded":0,"ratelimit.numratelimiters":3}

rsyslogd-pstats: @cee: {"name":"action 1","processed":8,"failed":0,"suspended":0,"suspended.duration":0,"resumed":0}

rsyslogd-pstats: @cee: {"name":"action 2","processed":8,"failed":0,"suspended":0,"suspended.duration":0,"resumed":0}

rsyslogd-pstats: @cee: {"name":"send_remote","processed":8,"failed":0,"suspended":0,"suspended.duration":0,"resumed":0}


rsyslogd-pstats: @cee: {"name":"resource-usage","utime":2917,"stime":3237,"maxrss":1520,"minflt":406,"majflt":0,"inblock":0,"oublock":0,"nvcsw":30,"nivcsw":6}

rsyslogd-pstats: @cee: {"name":"send_remote","size":0,"enqueued":8,"full":0,"discarded.full":0,"discarded.nf":0,"maxqsize":1}

rsyslogd-pstats: @cee: {"name":"main Q","size":6,"enqueued":14,"full":0,"discarded.full":0,"discarded.nf":0,"maxqsize":6}
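As a rough sketch of the mmjsonparse approach (the module and the $parsesuccess/$! properties are standard rsyslog features, but this particular filter and output file are illustrative assumptions, not part of the original configuration):

module(load="mmjsonparse")
# parse the @cee JSON payload; its fields become available as $!name, $!size, and so on
action(type="mmjsonparse")
if $parsesuccess == "OK" and $!name == "main Q" then {
    action(type="omfile" File="/var/log/pstats-mainq")
}

This makes it straightforward to route or graph individual counters instead of grepping the flat legacy-format lines.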

So far, you’ve just put the pstats logs into the main queue along with all the other logs in the system. If something causes that queue to back up, it will delay the pstats logs as well.

Figure 3: The flow of logs through the threads and queues with impstats bound to a ruleset

There are two ways to address this: First, you can configure 5 S 0 827 2807 831 0 80 0 - 31181 futex_ ? 00:00:00 rsyslog to write the pstat logs to a local file in addition by adding rs:main Q:Reg the log.file=“/path/to/local/file” parameter to the module load 1 S 0 827 2807 832 0 80 0 - 31181 futex_ ? 00:00:00 line. The second approach is a bit more complicated, but it serves rs:send_remote: to show an example of another feature of rsyslog—rulesets and 5 S 0 827 2807 843 0 80 0 - 31181 futex_ ? 00:00:00 the ability to bind a ruleset to a specific input. rs:high_p:Reg Take the example config and extend it to be the following: 1 S 0 827 2807 844 0 80 0 - 31181 futex_ ? 00:00:00 rs:send_ HP:Reg module(load=”impstats” interval=”10” resetCounters=”on” format=”legacy” ruleset=”high_p”) Pstats output also shows the additional queues:

module(load=”imklog”) rsyslogd-pstats: imuxsock: submitted=0 ratelimit.discarded=0 module(load=”imuxsock” SysSock.RateLimit.Interval=”0” ratelimit.numratelimiters=0 SysSock.Annotate=”on”) rsyslogd-pstats: action 1: processed=0 failed=0 suspended=0 action(type=”omfile” File=”/var/log/messages”) suspended.duration=0 resumed=0 action(name=”send_remote” type=”omfwd” Target=”192.168.2.11”

Port=”514” Protocol=”tcp”queue.type=”FixedArray” ) rsyslogd-pstats: send_remote: processed=0 failed=0 suspended=0 suspended.duration=600 resumed=0 ruleset(name=”high_p” queue.type=”FixedArray”){

action(type=”omfile” File=”/var/log/pstats”) rsyslogd-pstats: action 3: processed=10 failed=0 suspended=0 action(name=”send_HP” type=”omfwd” Target=”192.168.2.11” suspended.duration=0 resumed=0 Port=”514” rsyslogd-pstats: send_HP: processed=10 failed=0 suspended=0 Protocol=”tcp” queue.type=”FixedArray” ) suspended.duration=600 resumed=0 }

All pstat log entries will now go into a separate “main queue” rsyslogd-pstats: resource-usage: utime=26978 stime=26416 named “high_p” with its own worker thread and its own separate maxrss=2056 minflt=857 majflt=0 inblock=0 oublock=400 queue to send the messages remotely. This is effectively the same nvcsw=412 nivcsw=11 as starting another stand-alone instance of rsyslog just to process rsyslogd-pstats: send_remote: size=0 enqueued=0 full=0 these messages. There is no interaction (other than the house- discarded.full=0 discarded.nf=0 maxqsize=2 keeping thread) between the threads processing the pstat mes- sages and the threads processing other messages (see Figure 3). rsyslogd-pstats: send_HP: size=0 enqueued=10 full=0 discarded. full=0 discarded.nf=0 maxqsize=10 # ps -eLlww |grep `cat /var/run/rsyslogd.pid` rsyslogd-pstats: high_p: size=8 enqueued=10 full=0 discarded. 5 S 0 827 2807 827 0 80 0 - 31181 poll_s ? 00:00:00 rsyslogd full=0 discarded.nf=0 maxqsize=10 5 S 0 827 2807 828 0 80 0 - 31181 poll_s ? 00:00:00

in:impstats rsyslogd-pstats: main Q: size=0 enqueued=0 full=0 discarded. 1 S 0 827 2807 829 0 80 0 - 31181 syslog ? 00:00:00 in:imklog full=0 discarded.nf=0 maxqsize=2 1 S 0 827 2807 830 0 80 0 - 31181 poll_s ? 00:00:00 in:imuxsock Once you find the actual bottleneck in your configuration, what can you do about it? It boils down to a two-pronged attack. www.usenix.org JUNE 2014 VOL. 39, NO. 3 27 SYSADMIN Solving Rsyslog Performance Issues

Use More Cores to Perform the Work if $hostname == “host1” and $programname = “apache” then { This approach is tricky. In many cases enabling more worker /var/log/apache/host1.log threads will help, but you will need to check the documentation stop for the module that is your bottleneck to find what options you } have. In many cases, the modules end up needing to serialize if $hostname == “host1” and $programname = “apache” then { (e.g., you only want one thread writing to a file at a time), and if /var/log/postfix/host2.log your bottleneck is in one of these areas, just adding more work- stop ers won’t help. Writing to a file can be split up by moving most of } the work away from the worker thread that is doing the testing of if $hostname == “host2” and $programname = “postfix” then { conditions, using a second thread to format the lines for the file, /var/log/apache/host1.log and, if needed, using a third thread to write the data to the file, stop potentially compressing it in the process. } if $hostname == “host2” and $programname = “postfix” then { Restructure the Configuration to Reduce the /var/log/postfix/host2.log Amount of Work stop This approach involves standard simplification and refactoring } work for the most part. Rsyslog has lots of flexibility in terms of /var/log/other-logs what modules can be written to do; so, in an extreme case, a cus- you can simplify it into: tom module may end up being written to address a problem. In most cases, however, it’s simplifying regex expressions, refactor- $template multi_test=‘/var/log/%programname%/%hostname%.log’ ing to reduce the number of tests needed or to make it easier for ruleset(name=”inner_test”){ the config optimizer to detect the patterns. Like most program- if $programname == “apache” then { ming, algorithmic changes usually produce gains that dwarf ?multi_test other optimization work, so it’s worth spending time looking at stop ways to restructure your configuration. } if $programname == “postfix” then { Some Examples of Restructuring ?multi_test If you have several actions that are related to one destination, stop instead of creating a separate queue for each action, you can cre- } ate a ruleset containing all the actions, and then call the ruleset } with a queue. if $hostname = “host1” then call inner_test if $hostname = “host2” then call inner_test ruleset(name=”rulesetname” queue/type=”FixedArray”){ /var/log/other-logs action(type=”omfwd” Target=”192.168.2.11” Port=”514” Protocol=”tcp”) This change defines a template to be used for the file name to } be written to (the Dynafile capability mentioned earlier), then defines a ruleset to use—similar to a subroutine that checks that Then, in the main ruleset, you can replace the existing actions the program name is one of the known ones. If so, it writes the with: log out to a file whose name is defined by the template and stops call rulesetname processing the log message. Then, finally, you go through a list of hosts. If the host is known, you call the ruleset subroutine A ruleset can contain any tests and actions that you can have in a to check the program name. If either the hostname or program normal rsyslog ruleset, including calls to other rulesets. name is not in your list of known entities, then the tests will not With the v7 config optimizer, zero overhead is incurred in using match and the stop action will never be reached, resulting in the a ruleset that doesn’t have a queue, so you can also use rulesets to log entry being put into the /var/log/other-logs file. 
clarify and simplify your rulesets. If you find that you have a lot This specific case can be simplified further by using the rsyslog of format rules: array match capability. This approach requires that the entries to be matched in the array be sorted, but it can further reduce the configuration size and speed up the processing.

28 JUNE 2014 VOL. 39, NO. 3 www.usenix.org SYSADMIN Solving Rsyslog Performance Issues

$template multi_test=‘/var/log/%programname%/%hostname%.log’ ruleset(name=”inner_test”){ Reference if $programname == [“apache”, “postfix”] then { [1] Links to previous articles by David Lang about logging: ?multi_test https://www.usenix.org/login/david-lang-series. stop } } if $hostname = [“host1”,”host2”] then call inner_test /var/log/other-logs

When troubleshooting performance problems with rsyslog, usually the biggest problem is finding where the bottleneck is; once the bottleneck is found, it’s usually not that complicated to remove it. Something can always be done. In extreme conditions, it’s even possible to use a custom module to do extensive string processing that is expensive to do in the config language. You can always ask for help on the rsyslog-users mailing list at ­[email protected]; the volunteers there are always interested in new problems to solve.

Do you know about the USENIX Open Access Policy?

USENIX is the first computing association to offer free and open access to all of our conferences proceedings and videos. We stand by our mission to foster excellence and innovation while supporting research with a practical bias. Your financial support plays a major role in making this endeavor successful.

Please help to us to sustain and grow our open access program. Donate to the USENIX Annual Fund, renew your membership, and ask your colleagues to join or renew today.

www.usenix.org/annual-fund

www.usenix.org JUNE 2014 VOL. 39, NO. 3 29 SYSADMIN

/var/log/manager Rock Stars and Shift Schedules

ANDY SEELY

Andy Seely is the manager he stress level in a dynamic work place can be very high, even higher of an IT engineering division, in a startup where every day may be the beginning or the end for the customer-site chief engineer, company. These types of businesses tend to hire the best and the and a computer science T brightest in the industry and encourage employees to give their all and more. instructor for the University of Those same employees can be made more effective, and less susceptible to Maryland University College. His wife Heather is his init process, and his sons Marek and Ivo burnout, through some simple organizational tools that are obvious in hind- are always on the run queue. sight but not always the first choice in the heat of the moment. A basic shift [email protected] schedule can help guide effort, shepherd resources, and ultimately lead to a happier and more sustainable workforce. I joined a company as the manager of a group of junior and mid-level system administrators. The tier-one operations team was responsible for around-the-clock monitoring and first-line response for the company’s infrastructure and public-facing technology. This company was a dot-com startup that prided itself on the high quality of its workforce and the dot-com-style perks offered to them. Although we didn’t actually say in our job advertisements that we only hired “rock stars,” that was how we viewed ourselves. A month into the job, I started getting a creeping feeling that the surface was a rock-star party but that, underneath, there were currents preventing us from achieving our true peak of productivity. I discovered that our organization was being held captive by our own brilliance.

Brilliant People, Great Work, Bright Future I was new to “dot-com,” so lots of things were exciting and wonderful to me. We had a “nap room” where a tired sysadmin could…take a nap. We had a fully stocked kitchen, which soon started having Dr. Pepper in the rotation after I mentioned I liked Dr. Pepper. This company didn’t have an actual game room like many others in the area, but we made up for the lack with plenty of desk- and cube-level games and toys. The people around me were some of the sharpest I’ve ever known. I’ve always joked that if I find myself being the smartest person in a room, then I need a new room. This team was that new room. They were innovators, inventors, and idea people, everyone simply brimming with self-confidence and assurance. The team also took a lot of pride in how much and how hard they worked to make the company successful. The company had solid funding and was perpetually on an upward swing in the market. The CEO had an impeccable dot-com pedigree and was always optimistic about the great places he was taking the business. Staff meetings featured talk of big names, big media, big goals, and lots of work to be done. It was intoxicating. Everything was perfect.

Burning Bright or Just Burning Out? My first hint of an underlying issue was the length of a “normal” workday. Most of the opera- tions team—both tier-one and the more senior tier-two team—would be on the job at the office until seven or eight in the evening and then be online, working from home, sometimes

30 JUNE 2014 VOL. 39, NO. 3 www.usenix.org SYSADMIN Rock Stars and Shift Schedules until midnight and later. There was a pervasive sense that the of times around the clock on the weekend. Serious technical or work never ended, and we were being very, very productive, but… public-facing problems would result in the whole team becoming the work never ended. Then, I realized that despite putting in an involved, which might be weekly. Every week, some event would average 12-hour workday, people were only really doing effec- overtax a tier-two sysadmin and leave that person less effective tive work from about noon until about seven. The rest of the time at anything but reactive break-fix for a day or two. was spent in what could be called “workplace activities,” which We talked and talked about our inefficiencies and how we included frequent and long conversations about the long hours couldn’t sustain the pace. Finally, we were able to add capac- we were working. ity to the tier-one team. The act of growing the team gave us a The long workday issue was exacerbated by a recurring phe- moment to take a step back and rethink. I took the opportunity to nomenon where the response to a system problem would include do something previously unthinkable: I published an actual shift several unnecessary people jumping in to help, and then those calendar. same people would be fatigued and less effective the next day. Rock stars don’t need shift schedules because they’re always We didn’t respond to system problems conservatively, but rather awesome, and awesome people don’t clock in for an eight-hour with enthusiasm and brilliance. After observing several such shift. A few people weren’t interested but got talked into playing events, I learned that qualities like enthusiasm and brilliance along; others were surprisingly easy to convince, and some just can be consumables when overtaxed, and that junior people can had to be told to get in line. We worked together, and everyone feel untrusted and disenfranchised when senior people always had ownership. We worked out standard 40-hour shifts, day- take over. swing mid-rotations, and came up with an incentive idea to have After I’d been on the team a few months, people started warming the tier-one team cover primary on call during the weekends. up to me, and I started gaining more insight on the undercur- This didn’t prevent tier two from being on call, but it did prevent rents of the workplace. The night-shift guy worked from home tier two from being woken up constantly for tier-one-level and got very little in-person engagement with the team, and he system issues. People started talking about—and taking— vaca- wondered if anyone was ever going to swap shifts with him so tion time when they didn’t feel obligated to constantly check in. I he could be more involved. A particular day-shift guy always knew we had made a cultural change when people started horse- jumped first to take all the hard problems, and over time started trading shifts so they could plan their lives months in advance. wondering why he always had to be the one to work all the hard problems. People wanted to take off for family events or vaca- More Reliability, Less Romance tions, but they wouldn’t plan farther ahead than a couple days The biggest impact of the tier-one shift schedule turned out to because long-term plans were always trumped by something be the increased stability of the tier-two staff. We reduced the operationally important. 
More than one team member took vaca- number of after-hours and weekend calls dramatically, which tion time to visit family and ended up spending long hours online resulted in stable and focused senior sysadmins who made fewer in a spare bedroom. When my own first child was being born, I operational mistakes and were able to contribute more effec- was able to “take the rest of the afternoon off,” which my wife tively to projects. The tier-one crew—the more junior people on and I still laugh about. the team—gained a sense of empowerment as they handled more events by themselves without the entire operations team swoop- The team made mistakes, which were always opportunities to ing in. excel at fixing hard problems. We argued and fought when it got late and we got tired and it felt like no one was really in control Yes, it’s true, those of us on “the schedule” felt less like “rock of a big problem. We used ourselves up and compensated with stars” and more like “employees,” but we felt more confident and willpower and pride, because we were awesome. We had to be we were able to do better work as a result. The real impact was awesome, just to survive. that everyone worked fewer clock hours and yet managed to be more effective. Trying Some Old-Fashioned Discipline and It wasn’t easy to change corporate culture, and it wasn’t always Structure obvious that it was going to work out. In the end, the risk paid Our “schedule” was pretty simple. Tier-one sysadmins roughly off. Understanding how work happens and aligning people in a covered Monday through Friday, about 18 hours of each day. The way that gives them the opportunity to do what they do best and gaps were filled in by tier-two staff being on call, with a tight create real efficiencies in the workplace can be a true challenge, rotation of a week spent on call every three or four weeks. This especially when that realignment is counter to established meant that a tier-two admin would work a normal week in the norms and identities. I’m the manager, and that’s my job. tier-two project space, plus be guaranteed to be paged by the system at least once a day after hours and sometimes dozens www.usenix.org JUNE 2014 VOL. 39, NO. 3 31 Practical Perl Tools My Hero, Zero (Part 1) DAVIDCOLUMNS N. BLANK-EDELMAN

David N. Blank-Edelman is the n the last column, I spent some time exploring MongoDB, a database director of technology at the that challenges in some respects what it means to be a database. For this Northeastern University College column, I’d like to take a look at a message queuing system that does the of Computer and Information I Science and the author of the O’Reilly book same for message queuing systems: ZeroMQ (written most often as 0MQ. Automating System Administration with Perl (the Even though I’m not that hip, I’ll use that representation most of the time second edition of the Otter book), available below). at purveyors of fine dead trees everywhere. If the term “message queuing system” makes you feel all stifle-a-yawn enterprise-y, boring He has spent the past 24+ years as a system/ business service bus-ish, get-off-my-lawn-you-kids, we were doing that in the ’80s-like, then network administrator in large multi-platform I would recommend taking another look at how serious systems are getting built these days. environments, including Brandeis University, If you are like me, you will notice again and again places in which tools are adopting mes- Cambridge Technology Group, and the MIT sage-bus architectures where you might not expect them. These architectures turn out to be Media Laboratory. He was the program chair an excellent way to handle the new reality of distributed systems, such as those you might of the LISA ’05 conference and one of the LISA find when you’ve launched into your favorite cloud provider. This is why message queuing ’06 Invited Talks co-chairs. David is honored systems are at the heart of packages like MCollective and Sensu. They often allow you to to have been the recipient of the 2009 SAGE build loosely coupled and dynamic systems more easily than some traditional models. Outstanding Achievement Award and to serve on the USENIX Board of Directors beginning in Our friend Wikipedia talks about message queues as “software-engineering components June of 2010. [email protected] used for interprocess communication, or for inter-thread communication within the same process…. Message queues provide an asynchronous communications protocol, meaning that the sender and receiver of the message do not need to interact with the message queue at the same time.” Message queuing systems like ActiveMQ and RabbitMQ let you set up message broker servers so that clients can receive or exchange messages. Now, back to the MongoDB comparison: Despite having MQ at the end of the name like ActiveMQ and RabbitMQ, ZeroMQ is a very different animal from the other MQs. I don’t think I can do a better job setting up how it is different than by quoting the beginning of the official 0MQ Guide: ØMQ (also known as ZeroMQ, 0MQ, or zmq) looks like an embeddable networking library but acts like a concurrency framework. It gives you sockets that carry atomic messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fan-out, pub-sub, task distribution, and request-reply. It’s fast enough to be the fabric for clustered products. Its asynchronous I/O model gives you scalable multicore applications, built as asynchronous message-processing tasks. It has a score of language APIs and runs on most operating systems. ØMQ is from iMatix and is LGPLv3 open source. Let me emphasize a key part of the paragraph above. With 0MQ, you don’t set up a distinct 0MQ message broker server like you might with a traditional MQ system. 
There’s no zeromq binary, there’s no /etc/init.d/zeromq, no /etc/zeromq for config files, no ZeroMQ Windows service to launch (or whatever else you think about when bringing up a server). Instead, you use the 0MQ libraries to add magic to your programs. This magic makes building your own message-passing architecture (whether it’s hub and spoke, mesh, pipeline, etc.) a lot easier than you might expect. It takes a lot of the pain out of writing clients, servers, peers, and so on. I’ll show exactly what this means in a moment.

32 JUNE 2014 VOL. 39, NO. 3 www.usenix.org COLUMNS Practical Perl Tools

I want to note one more thing before actually diving into the you can open up a socket that connects to another host easily code. ZeroMQ looks really basic at first, probably because the and print() to it as if it were any other file handle. Again, this is network socket model is pretty basic. But, like anyone who has just a “by the way” sort of thing since we’re about to strap on a built something with a larger-than-usual construction set, such rocket motor and use 0MQ sockets from Perl. as a ton of Tinkertoys, Legos, or maybe a huge erector set (if you To get started with 0MQ in Perl, you’ll need to pick a Perl module are as old as dirt), at a certain point you step back from a creation that provides an interface to the 0MQ libraries. At the moment, that has somehow grown taller than you are and say “whoa.” I there are two choices for modules worth considering, plus or know I had this experience reading the ZeroMQ book (disclo- minus one: ZMQ::LibZMQ3 and ZMQ::FFI. The former is the sure: published by O’Reilly, who also published my book). In this successor to what used to be called just ZeroMQ. The author book, which I highly recommend if ZeroMQ interests you at all, it of that module decided to create modules specifically targeted describes a ton of different patterns that are essentially building to specific major versions of the 0MQ libraries (v2 and v3). It blocks. At some point, you’ll have that “whoa” moment when you offers an API that is very close to the native library (as a quick realize that these building blocks offer everything you need to aside, the author also provides a ZMQ module which will call construct the most elaborate or elegant architecture your heart ZMQ::LibZMQ3 or ZMQ::LibZMQ2, but the author suggests in desires. Given the length of this column, I’ll only be able to look most cases to use the version-specific library directly, hence my at the simplest of topologies, but I hope you’ll get a hint of what’s statement of “plus or minus one,” above). The ZMQ::FFI uses possible. libffi to provide a slightly more abstract interface to 0MQ that isn’t as 0MQ-version specific. (In case you are curious about Socket to Me FFI, its docs say, “FFI stands for Foreign Function Interface. A Okay, that’s probably the single most painful heading I’ve writ- foreign function interface is the popular name for the interface ten for this column. Let’s pretend it didn’t happen. that allows code written in one language to call code written in Everything 0MQ-based starts with the basic network socket another language.") model, so I’ll attempt a three-sentence review (at least the parts In this column, I’ll use ZMQ::LibZMQ3 because it allows me that are relevant to 0MQ). Sockets are like phone calls (and to write code that looks like the C sample code provided in the indeed the combination of an IP address and a port attempts to 0MQ user guide (and in the 0MQ book). For the sample code in provide a unique phone number). To receive a connection, you this part of a two-part column, I’ll keep things dull and create a create a socket and then bind to it to listen for connections (I’m simple echo client-server pair. The server will listen for incom- leaving out accept() and a whole Stevens book, but bear with me); ing connections and echo back any messages the connected to make a connection, you connect() to a socket already set to clients sent to it. First, I’ll look at the server code, because it’s receive connections. 
Receiving data from an incoming connec- going to give me a whole bunch of things to talk about: tion is done via a recv()-like call; sending data to the other end is performed with a send()-like call. use ZMQ::LibZMQ3; use ZMQ::Constants qw(ZMQ_REP); Those three sentences provide the key info you need to know to get started with 0MQ socket programming. Like one of those my $ctxt = zmq_init; late-night commercials for medications, I feel I should tag on a my $socket = zmq_socket( $ctxt, ZMQ_REP ); whole bunch of disclaimers, modifiers, and other small print left my $rv = zmq_bind( $socket, "tcp://127.0.0.1:8888" ); out of the commercial, but I’m going to refrain except for these two things: while (1) { my $msg = zmq_recvmsg($socket); ◆◆ There are lots of nitty-gritty details to (good) socket program- print "Server received:" . zmq_msg_data($msg) . "\n"; ming I’m not even going to touch. 0MQ handles a bunch of that my $msg = zmq_msg_init_data( zmq_msg_data($msg) ); for you, but if you aren’t going to use 0MQ for some reason, be zmq_sendmsg( $socket, $msg ); sure to check out both the non-Perl references (e.g., the Stevens } books) and the Perl-related references (Lincoln Stein wrote a great book called Network Programming with Perl many eons The first thing to note about this code is that loads both the 0MQ ago). library module and a separate constants module. Depending on how you installed the library module, the constants mod- ◆◆ If you do want to do plain ol’ socket programming in Perl, you’ll ule likely was installed at the same time for you. This module want to use one of the socket-related Perl modules to make holds the definition for all of the constants and flags you’ll be the job easier. Even though Socket.pm ships with Perl, I think using with 0MQ. There are a bunch—you’ll see the first one in you’ll find IO::Socket more pleasant to use. With these modules, just a couple of lines. The next line of code initializes the 0MQ www.usenix.org JUNE 2014 VOL. 39, NO. 3 33 COLUMNS Practical Perl Tools environment, something you have to do before you use any other while (1) { 0MQ calls. 0MQ contexts usually don’t have to come into play my $msg = zmq_msg_init_data("$counter:$$"); except as a background thing unless you are doing some fairly zmq_sendmsg( $socket, $msg ); complex programming, so I’m not going to say more about them print "Client sent message " . $counter++ . "\n"; here. my $msg = zmq_recvmsg($socket); print "Client received ack:" . zmq_msg_data($msg) . "\n"; With all of that out of the way, it is time to create your first sleep 1; socket. Sockets get created in a context and are of a specific type } (in this case ZMQ_REP, or just REP). Socket types are a very important concept in 0MQ, so I’ll take a quick moment away The client code starts out almost identically (note: I said almost, from the code to discuss them. the socket type is different. I once spent a very frustrating hour trying to debug a 0MQ program because I had written ZMQ_ I don’t know the last time you played with Tinkertoys but you REP instead of ZMQ_REQ in one place in the code). It starts to might recall that they consisted of a few basic connector shapes diverge from the server code because, instead of listening for a (square, circle, etc.), which accepted rods at specific angles. If connection, it is set to initiate one by using zmq_connect(). In you wanted to build a cube, you had to use the connectors that the loop, you construct a simple message (the message number had holes at 90-degree angles and so on. 
With 0MQ, specific plus the PID of the script that is running) and send it on the socket types get used for creating certain sorts of architectures. socket. Once you’ve sent the message, you wait for a response You can mix and match to a certain , but some combina- back from the server using the same exact code the server used tions are more frequently used or more functional than others. to receive a message. REQ-REP sockets are engineered with For example, in the code I am describing, I will use REQ and the expectation that a request is made and a reply is returned, so REP sockets (REQ for synchronously sending a REQuest, REP you really do want to send a reply back. Finally, I should men- for synchronously providing a REPly). You will see a similar tion there is a sleep statement at the end of this loop just to keep pair in an example in the next issue. You might be curious what things from scrolling by too quickly (0MQ is FAST), but you can the difference is between a REQ and REP socket because as Dr. feel free to take it out. Seuss might say, “a socket’s a socket, no matter how small.” The Let’s take the code for a spin. If you start up the server, it sits and short answer is the different socket types do slightly different waits for connections: things around message handling (e.g., how message frames are constructed, etc.). See the 0MQ book for more details. $ ./zmqserv1.pl Now that you have a socket constructed, you can tell it to listen In a separate window, start the client: for connections, which is done by performing a zmq_bind(). Immediately, the client prints something like: Here you can see that I have asked to listen to unicast TCP/IP connections on port 8888 of the local host. 0MQ also knows how I am 86989 to handle multicast, inter-process, and inter-thread connec- Client sent message 1 tions. At this point, you are ready to go into a loop that will listen Client received ack:1:86989 for messages. The zmq_recvmsg() blocks until it receives an Client sent message 2 incoming message. It returns a message object, the contents of Client received ack:2:86989 which are displayed using zmq_msg_data($msg). To reply to the Client sent message 3 message you just received, you construct a message object with Client received ack:3:86989 the contents of the message you received and send it back over … the socket via zmq_sendmsg( $socket, $msg ). That’s all you need and the server also shows this: to construct a simple server; now, I’ll look at the client code: Server received:1:86989 use ZMQ::LibZMQ3; Server received:2:86989 use ZMQ::Constants qw(ZMQ_REQ); Server received:3:86989 my $ctxt = zmq_init; … my $socket = zmq_socket( $ctxt, ZMQ_REQ ); Now, you can begin to see what all of the fuss is about with 0MQ. zmq_connect( $socket, "tcp://127.0.0.1:8888" ); First, if you stop the server and then start it again (while leaving the client running): my $counter = 1;

print "I am $$\n";

34 JUNE 2014 VOL. 39, NO. 3 www.usenix.org COLUMNS Practical Perl Tools

$ ./zmqserv1.pl All you have to do is start the server and spin up as many copies Server received:1:87110 of the client as you desire. Let’s start up five clients at once: Server received:2:87110 $ for ((i=0;i<5;i++)); do ./zmqcli1.pl & done Server received:3:87110 ^C On the client side, you see a mishmash of the outputs from the $ ./zmqserv1.pl client (that eventually return to lockstep): Server received:4:87110 I am 87242 Server received:5:87110 I am 87241 Server received:6:87110 Client sent message 1 I am 87239 Server received:7:87110 Client sent message 1 I am 87243 ^C Client sent message 1 $ ./zmqserv1.pl Client received ack:1:87241 Server received:8:87110 I am 87240 Server received:9:87110 Client received ack:1:87243 Server received:10:87110 Client sent message 1 Client sent message 1 ^C Client received ack:1:87239 During all of this, the client just said: Client received ack:1:87242 Client received ack:1:87240 $ ./zmqcli1.pl Client sent message 2 I am 87110 Client sent message 1 Client sent message 2 Client received ack:1:87110 Client sent message 2 Client sent message 2 Client sent message 2 Client received ack:2:87110 Client sent message 2 Client sent message 3 Client received ack:2:87239 Client received ack:3:87110 Client received ack:2:87241 Client sent message 4 Client received ack:2:87243 Client received ack:4:87110 Client received ack:2:87240 Client sent message 5 Client received ack:2:87242 Client received ack:5:87110 Client sent message 3 Client sent message 6 Client sent message 3 Client received ack:6:87110 Client sent message 3 Client sent message 7 Client sent message 3 Client received ack:7:87110 Client sent message 3 Client sent message 8 Client received ack:3:87241 Client received ack:8:87110 Client received ack:3:87239 Client sent message 9 Client received ack:3:87240 Client received ack:9:87110 Client received ack:3:87243 Client sent message 10 Client received ack:3:87242 Client received ack:10:87110

Here you are seeing 0MQ’s sockets auto-reconnect (and do a On the server side, the output is a little more orderly: little bit of buffering). This isn’t the only magic going on behind $ ./zmqserv1.pl the scenes, but it is the easiest to demonstrate. Server received:1:87241 Now, I’ll make the topology a little more interesting. Suppose you Server received:1:87239 want to have a single server with multiple clients connected to Server received:1:87243 it at the same time. Here’s the change you’d have to make to the Server received:1:87242 server code: Server received:1:87240 Server received:2:87239 (nada) Server received:2:87241 And the change to the client code: Server received:2:87243 Server received:2:87242 (also nada) Server received:2:87240 www.usenix.org JUNE 2014 VOL. 39, NO. 3 35 COLUMNS Practical Perl Tools

Server received:3:87241 I believe it is always good to end a show after a good magic trick, Server received:3:87239 so I’ll wind up part 1 of this column right here. Join me next time Server received:3:87240 when I will look at other ZeroMQ socket types and build some Server received:3:87243 even cooler network topologies. Server received:3:87242 Take care, and I’ll see you next time. It is pretty cool that the server was able to handle multiple simul- taneous connections without any code changes, but one thing that may not be clear is that 0MQ is not only handling these con- nections, it’s also automatically load-balancing between them as well. That’s the sort of behind-the-scenes magic I think is really awesome.

Do you have a USENIX Representative on your university or college campus? If not, USENIX is interested in having one! The USENIX Campus Rep Program is a network of representatives at campuses around the world who provide Association information to students, and encourage student involvement in USENIX. This is a volunteer program, for which USENIX is always looking for academics to participate. The program is designed for faculty who directly interact with students. We fund one representative from a campus at a time. In return for service as a campus representative, we offer a complimentary membership and other benefits. A campus rep’s responsibilities include: ■ Maintaining a library (online and in print) of USENIX publications ■ Providing students who wish to join USENIX with information at your university for student use and applications ■ Distributing calls for papers and upcoming event brochures, and ■ Helping students to submit research papers to relevant re-distributing informational emails from USENIX USENIX conferences ■ Encouraging students to apply for travel grants to conferences ■ Providing USENIX with feedback and suggestions on how the organization can better serve students In return for being our “eyes and ears” on campus, the Campus Representative receives access to the members-only areas of the USENIX Web site, free conference registration once a year (after one full year of service as a Campus Representative), and electronic conference proceedings for downloading onto your campus server so that all students, staff, and faculty have access. To qualify as a campus representative, you must: ■ Be full-time faculty or staff at a four year accredited university ■ Have been a dues-paying member of USENIX for at least one full year in the past

For more information about our Student Programs, contact Julie Miller, Marketing Communications Manager, [email protected] www.usenix.org/students

36 JUNE 2014 VOL. 39, NO. 3 www.usenix.org SAVE THE DATE!

14 11th USENIX Symposium on Operating Systems Design and Implementation October 6–8, 2014 Broomfield, CO

Join us in Broomfi eld, CO, October 6–8, 2014, for the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI ‘14). The Symposium brings together professionals from academic and industrial backgrounds in what has become a premier forum for discussing the design, implementation, and implications of systems software.

Don’t miss the co-located workshops on Sunday, October 5

Diversity ’14: 2014 Workshop on Supporting INFLOW ’14: 2nd Workshop on Interactions Diversity in Systems Research of NVM/Flash with Operating Systems and Workloads HotDep ’14: 10th Workshop on Hot Topics in Dependable Systems TRIOS ’14: 2014 Conference on Timely Results in Operating Systems HotPower ’14: 6th Workshop on Power- Aware Computing and Systems

All events will take place at the Omni Interlocken Resort

www.usenix.org/osdi14 COLUMNS

Python Gets an Event Loop (Again)

DAVID BEAZLEY

David Beazley is an open source arch 2014 saw the release of Python 3.4. One of its most notable developer and author of the additions is the inclusion of the new asyncio module to the stan- Python Essential Reference (4th dard library [1]. The asyncio module is the result of about 18 Edition, Addison-Wesley, 2009) M and Python Cookbook (3rd Edition, O’Reilly months of effort, largely spearheaded by the creator of Python, Guido van Media, 2013). He is also known as the creator Rossum, who introduced it during his keynote talk at the PyCon 2013 con- of Swig (http://www.swig.org) and Python ference. However, ideas concerning asynchronous I/O have been floating Lex-Yacc (http://www.dabeaz.com/ply.html). around the Python world for much longer than that. In this article, I’ll give Beazley is based in Chicago, where he also a bit of historical perspective as well as some examples of using the new teaches a variety of Python courses. library. Be aware that this topic is pretty bleeding edge—you’ll probably need [email protected] to do a bit more reading and research to fill in some of the details.

Some Basics: Networking and Threads If you have ever needed to write a simple network server, Python has long provided modules for socket programming, processes, and threads. For example, if you wanted to write a simple TCP/IP echo server capable of handling multiple client connections, an easy way to do it is to write some code like this:

# echoserver.py

from socket import socket, AF_INET, SOCK_STREAM import threading

def echo_client(sock): while True: data = sock.recv(8192) if not data: break sock.sendall(data) print(‘Client closed’) sock.close()

def echo_server(address): sock = socket(AF_INET, SOCK_STREAM) sock.bind(address) sock.listen(5) while True: client_sock, addr = sock.accept() print(‘Connection from’, addr) t = threading.Thread(target=echo_client, args=(client_sock,)) t.start()

38 JUNE 2014 VOL. 39, NO. 3 www.usenix.org COLUMNS Python Gets an Event Loop (Again)

if __name__ == ‘__main__’: class EchoServer(asyncore.dispatcher): echo_server((‘‘, 25000)) def __init__(self, address): asyncore.dispatcher.__init__(self) Although simple, this style of programming is the foundation for self.create_socket(AF_INET, SOCK_STREAM) Python’s socketserver module (called SocketServer in Python self.bind(address) 2). socketserver, in turn, is the basis of Python’s other built-in self.listen(5) server libraries for HTTP, XML-RPC, and similar. def readable(self): Asynchronous I/O return True Even though programming with threads is a well-known and rel- def handle_accept(self): atively simple approach, it is not always appropriate in all cases. client, addr = self.accept() For example, if a server needs to manage a very large number of print(‘Connection from’, addr) open connections, running a program with 10,000 threads may EchoClient(client) not be practical or efficient. For such cases, an alternative solu- tion involves creating an asynchronous or event-driven server EchoServer((‘‘, 25000)) built around low-level system calls such as select() or poll(). asyncore.loop() The underlying approach is based on an underlying event-loop In this code, the various objects EchoServer and EchoClient are that constantly polls all of the open sockets and triggers event- really just wrappers around a traditional network socket. All of handlers (i.e., callbacks) on objects to respond as appropriate. the important logic is found in callback methods such as handle Python has long had a module, asyncore, for supporting asyn- _accept(), handle_read(), handle_write(), and so forth. Finally, chronous I/O. Here is an example of the same echo server imple- instead of running a thread or process, the server runs a central- mented using it: ized event-loop initiated by the final call to asyncore.loop().

# echoasyncore.py Wilted Async? from socket import AF_INET, SOCK_STREAM Although asyncore has been part of the standard library since import asyncore Python 1.5.2, it’s always been a bit of an abandoned child. class EchoClient(asyncore.dispatcher): Programming with it directly is difficult—involving layers def _ _init_ _(self, sock): upon layers of callbacks. Moreover, the standard library doesn’t asyncore.dispatcher.__init__(self, sock) provide any other support to make asyncore support higher- self._outbuffer = b’’ level protocols (e.g., HTTP) or to interoperate with other parts self._readable = True of Python (e.g., threads, queues, subprocesses, pipes, signals, etc.). Thus, if you’ve never actually encountered any code that def readable(self): uses asyncore in the wild, you’re not alone. Almost nobody uses return self._readable it—it’s just too painful and low-level to be a practical solution for def handle_read(self): most programmers. data = self.recv(8192) Instead, you’ll more commonly find asynchronous I/O supported self._outbuffer += data through third-party frameworks such as Twisted, , or def handle_close(self): Gevent. Each of these frameworks tends to be a large world unto self._readable = False itself. That is, they each provide their own event loop, and they if not self._outbuffer: provide asynchronous compatible versions of common library print(‘Client closed connection’) functions. Although it is possible to perform a certain amount of self.close() adaptation to make these different libraries work together, it’s all a bit messy. def writable(self): return bool(self._outbuffer) Enter asyncio def handle_write(self): The asyncio library introduced in Python 3.4 represents a modern nsent = self.send(self._outbuffer) attempt to bring asynchronous I/O back into the standard library self._outbuffer = self._outbuffer[nsent:] and to provide a common core upon which additional async- if not (self._outbuffer or self._readable): oriented libraries can be built. asyncio also aims to standardize self.handle_close() the implementation of the event-loop so that it can be adapted to support existing frameworks such as Twisted or Tornado. www.usenix.org JUNE 2014 VOL. 39, NO. 3 39 COLUMNS Python Gets an Event Loop (Again)

The definitive description of asyncio can be found in PEP-3156 required and that you’ve probably never seen it used in any previ- [2]. Rather than rehash the contents of the admittedly dense ous Python code. PEP, I’ll provide a simple example to show what it looks like to program with asyncio. Here is a new implementation of the echo Getting Away from Low-Level Sockets server: As shown, the sample echo server is directly manipulating a low-level socket. However, it’s possible to write a server that # echoasync.py abstracts the underlying protocol away. Here is a slightly modi- from socket import socket, AF_INET, SOCK_STREAM fied example that uses a higher-level transport interface: import asyncio import asyncio loop = asyncio.get_event_loop() loop = asyncio.get_event_loop()

@asyncio.coroutine @asyncio.coroutine def def echo_client(sock): echo_client(reader, writer): while True: while True: data = yield from loop.sock_recv(sock, 8192) data = yield from reader.readline() if not data: if not data: break break yield from loop.sock_sendall(sock, data) writer.write(data) print(‘Client closed’) print(‘Client closed’) sock.close() if __name__ == ‘__main__’: @asyncio.coroutine fut = asyncio.start_server(echo_client, ‘‘, 25000) def echo_server(address): loop.run_until_complete(fut) sock = socket(AF_INET, SOCK_STREAM) loop.run_forever() sock.bind(address) As shown, this runs as a TCP/IP echo server. However, you can sock.listen(5) change it to a UNIX domain server if you simply change the last sock.setblocking(False) part as follows: while True: client_sock, addr = yield from loop.sock_accept(sock) if __name__ == ‘__main__’: print(‘Connection from’, addr) fut = asyncio.start_unix_server(echo_client, ‘/tmp/spam’) asyncio.async(echo_client(client_sock)) loop.run_until_complete(fut) loop.run_forever() if __name__ == ‘__main__’: loop.run_until_complete(echo_server((‘‘, 25000))) In both cases, the underlying protocol is abstracted away. The echo_client() function simply receives reader and writer objects Carefully compare this code to the first example involving on which to read and write data—it doesn’t need to worry about threads. You will find that the code is virtually identical except the exact protocol being used to transport the bytes. for the mysterious @asyncio.coroutine decorator and use of the yield from statements. As for those, what you’re seeing is a More Than Sockets programming style based on coroutines—in essence, a form of A notable feature of asyncio is that it’s much more than a simple cooperative user-level concurrency. wrapper around sockets. For example, here’s a modified client A full discussion of coroutines is beyond the scope of this article; that feeds its data to a subprocess running the UNIX wc com- however, the general idea is that each coroutine represents a kind mand and collects the output afterwards: of user-level “task” that can be executed concurrently. The yield from asyncio import subprocess from statement indicates an operation that might potentially block or involve waiting. At these points, the coroutine can be @asyncio.coroutine suspended and then resumed at a later point by the underlying def echo_client(reader, writer): event loop. To be honest, it’s all a bit magical under the covers. I proc = yield from asyncio.create_subprocess_exec(‘wc’, previously presented a PyCon tutorial on coroutines [3]. How- stdin=subprocess.PIPE, ever, the yield from statement is an even more modern develop- stdout=subprocess.PIPE) ment that is only available in Python 3.3 and newer, described in while True: PEP-380 [4]. For now, just accept the fact that the yield from is data = yield from reader.readline()

40 JUNE 2014 VOL. 39, NO. 3 www.usenix.org COLUMNS Python Gets an Event Loop (Again)

if not data: print(“Fib(%d): %d” % (n, r)) break yield from asyncio.sleep(1) proc.stdin.write(data) n += 1 writer.write(data) if __name__ == ‘__main__’: proc.stdin.close() pool = ThreadPoolExecutor(8) stats = yield from proc.stdout.read() asyncio.async(fibonacci()) yield from proc.wait() loop.run_forever() print(“Client closed:”, stats.decode(‘ascii’)) In this example, the loop.run_in_executor() arranges to run a Here is an example of a task that simply sleeps and wakes up user-supplied function (fib) in a separate thread. The first argu- periodically on a timer: ment supplies a thread-pool or process-pool as created by the @asyncio.coroutine concurrent.futures module. def counter(): n = 0 Where to Go from Here? while True: There is much more to asyncio than presented here. However, I print(“Counting:”, n) hope the few examples here have given you a small taste of what yield from asyncio.sleep(5) it looks like. For more information, you might consult the official n += 1 documentation [1]; if you’re like me, however, you’ll find the documentation a bit dense and lacking in examples. Thus, you’re if __name__ == ‘__main__’: probably going to have to fiddle around with it as an experiment. asyncio.async(counter()) Searching the Web for “asyncio examples” can yield some addi- loop.run_forever() tional information and insight for the brave. In the references There is even support for specialized tasks such as attaching a section, I’ve listed a couple of presentations and sites that have signal handler to the event loop. For example: more examples [5, 6].

import signal As for the future, it will be interesting to see whether asyncio is adopted as a library for writing future asynchronous libraries and def handle_sigint(): applications. As with most things Python 3, only time will tell. print(‘Quitting’) loop.stop() If you’re still using Python 2.7, the Trollius project [7] is a backport of the asyncio library to earlier versions of Python. loop.add_signal_handler(signal.SIGINT, handle_sigint) The programming interface isn’t entirely the same because of Last, but not least, you can delegate non-asynchronous work to the lack of support for the “yield from” statement, but the overall threads or processes. For example, if you had a burning need for architecture and usage are almost identical. a task to print out Fibonacci numbers using a horribly inefficient implementation and you didn’t want the computation to block the event loop, you could write code like this: References [1] asyncio (official documentation): http://docs.python.org import asyncio /dev/library/asyncio.html. from concurrent.futures import ThreadPoolExecutor [2] PEP 3156: http://python.org/dev/peps/pep-3156/. loop = asyncio.get_event_loop() [3] David Beazley, “A Curious Course on Coroutines and def fib(n): ­Concurrency”: http://www.dabeaz.com/coroutines. if n <= 2: return 1 [4] PEP 380: http://python.org/dev/peps/pep-0380/. else: [5] Saul Ibarra Corretge, “A Deep Dive Into PEP-3156 and the return fib(n-1) + fib(n-2) New asyncio Module”: http://www.slideshare.net/saghul @asyncio.coroutine /asyncio. def fibonacci(): [6] Feihong Hsu, “Asynchronous I/O in Python 3”: http:// n = 1 www.slideshare.net/megafeihong/tulip-24190096. while True: r = yield from loop.run_in_executor(pool, fib, n) [7] Trollius project: https://pypi.python.org/pypi/trollius. www.usenix.org JUNE 2014 VOL. 39, NO. 3 41 COLUMNS iVoyeur Monitoring Design Patterns

DAVE JOSEPHSEN

Dave Josephsen is lutarch tells a great story about Hannibal, in the time of the reign of the sometime book- Dictator Fabius. Hannibal, a barbarian from the south, was roaming authoring developer about the countryside making trouble as barbarians do, when through evangelist at Librato. P com. His continuing either some subtlety on the part of Fabius or just seriously abominable mission: to help engineers worldwide close the direction-giving on the part of a few sheep farmers (reports differ), Hannibal feedback loop. [email protected] managed to get himself trapped and surrounded in a valley. Knowing that he wasn’t going to be able to escape by assault in any direction, Hannibal came up with an ingenious ploy. The story goes that he waited until nightfall, and in the darkness he sent his men up the hillsides, while keeping his livestock in the valley floor. Then, setting the horns of his cattle ablaze, he stampeded them towards Fabius’ lines. The Romans, seeing what they thought was a charge of torch-wielding barbarians (but which was really a mul- titude of terrified flaming cattle), reformed and dug in, and in so doing, allowed Hannibal’s men to escape past them on the slopes above. Unless you’re a cow, you have to agree that this was a pretty great bit of ingenuity. It’s also a pretty great example of the UNIX principle of unexpected composition: that we should craft and use tools that may be combined or used in ways that we never intended. Before Hanni- bal, few generals probably considered the utility of flaming cattle in the context of anti-siege technology. The monitoring systems built in the past fail pretty miserably when it comes to the principle of unexpected composition. Every one of them is born from a core set of assumptions— assumptions that ultimately impose functional limits on what you can accomplish with the tool. This is perhaps the most important thing to understand about monitoring systems before you get started designing a monitoring infrastructure of your own: Monitoring sys- tems become more functional as they become less featureful. Some systems make assumptions about how you want to collect data. Some of the very first monitoring systems, for example, assumed that everyone would always monitor everything using SNMP. Other systems make assumptions about what you want to do with the data once it’s collected—that you would never want to share it with other systems, or that you want to store it in thousands of teensy databases on the local file system. Most monitoring systems present this dilemma: They each solve part of the monitoring problem very well but wind up doing so in a way that saddles you with unwanted assumptions, like SNMP and thousands of teensy databases. Many administrators interact with their monitoring infrastructures like they might a bag of jellybeans—keeping the pieces they like and discarding the rest. In the past few years, many little tools have come along that make this functionally possible, enabling you to replace or augment your single, centralized monitoring system with a bunch of tiny, purpose-driven

42 JUNE 2014 VOL. 39, NO. 3 www.usenix.org COLUMNS iVoyeur data collectors wired up to a source-agnostic storage tier (dis- claimer: I work for a source-agnostic storage tier as a service company called Librato). This strategy lets you use whatever combination of ­collectors makes sense for you. You can weave monitoring into your source code directly, or send forth herds of flaming cattle to ­collect it, and then store the monitoring data together, where it can be visualized, analyzed, correlated, alerted on, and even multiplexed to multiple destinations regardless of its source and method of collection. Tons of open source data collectors and storage tiers are avail- able, but instead of talking about any of them specifically, I’d like to write a little bit about the monitoring “patterns” that currently The centralized poller monitoring pattern exist in the wild, because although there are now eleventybillion Figure 1: different implementation possibilities, they all combine the same Centralized pollers are easy to implement but often difficult basic five or six design patterns. In the same way I can describe to scale. They typically operate on the order of minutes, using, Hannibal’s livestock as a “diversion movement to break contact,” for example, five-minute polling intervals, and this limits the I hope that categorizing these design patterns will make it easier resolution at which they can collect performance metrics. Older to write about the specifics of data collectors and flaming cows centralized pollers are likely to use agents with root-privileged later on. shell access for scripting, communicate using insecure proto- cols, and have unwieldy (if any) failover options. Collection Patterns for External Metrics I begin by dividing the target metrics themselves into two gen- Although classic centralized pollers like Nagios, Munin, and eral categories: those that are derived from within the monitored Cacti are numerous, they generally don’t do a great job of playing process at runtime, and those that are gathered from outside the with others because they tend to make the core assumption that monitored process. Considering the latter type first, four pat- they are the ultimate solution to the monitoring problem at your terns generally are used to collect availability and performance organization. Most shops that use them in combination with data from outside the monitored process. other tools interject a metrics aggregator like statsd or other middleware between the polling system and other monitoring The Centralized Polling Pattern and storage systems. Anyone who has worked with monitoring systems for a while has used centralized pollers. They are the archetype design—the The Stand-Alone Agent Pattern one that comes to mind first when someone utters the phrase Stand-alone agents have grown in popularity as configuration- “monitoring system” (although that is beginning to change). See management engines have become more commonplace. They are Figure 1. often coupled with centralized pollers or roll-up model systems to meet the needs of the environment. See Figure 2. Like a grade-school teacher performing the morning roll call, the centralized poller is a monolithic program that is configured to Agent software is installed and configured on every host that periodically poll a number of remote systems, usually to ensure you want to monitor. Agents usually daemonize and run in the that they are available but also to collect performance metrics. 
background, waking up at timed intervals to collect various The poller is usually implemented as a single process on a single performance and availability metrics. Because agents remain server, and it usually attempts to make guarantees about the resident in memory and eschew the overhead of external connec- interval at which it polls each service. tion setup and teardown for scheduling, they can collect metrics on the order of seconds or even microseconds. Some agents push Because this design predates the notion of configuration man- status updates directly to external monitoring systems, and agement engines, centralized pollers are designed to minimize some maintain summary statistics that they present to pollers the amount of configuration required on the monitored hosts. as needed via a network socket. They may rely on external connectivity tests, or they may remotely execute agent software on the hosts they poll; in either Agent configuration is difficult to manage without a CME, case, however, their normal mode of operation is to periodically because every configuration change must be pushed to all appli- pull data directly from a population of monitored hosts. cable monitored hosts. Although they are generally designed www.usenix.org JUNE 2014 VOL. 39, NO. 3 43 COLUMNS iVoyeur

Figure 2: The stand-alone agent pattern Figure 3: The roll-up pattern to be lightweight, they can introduce a non-trivial system load of hosts with fine-grained resolution. The statsd daemon process if incorrectly configured. Be careful with closed-source agent can be used to implement roll-up systems to hand-off in-process software, which can introduce backdoors and stability prob- metrics. lems. Open source agents are generally preferred because their footprint, overhead, and security can be verified and tweaked if Logs as Event-Streams necessary. System and event logs provide a handy event stream from which to derive metric data. Many large shops have intricate central- Collectd is probably the most popular stand-alone agent out ized log processing infrastructure from which they feed many there. Sensu uses a combination of the agent and polling pattern, different types of monitoring, analytics, event correlation, and interjecting a message queue between them. security software. If you’re a Platform-as-a-Service (PaaS) customer, the log stream may be your only means to emit, collect, The Roll-Up Pattern and inspect metric data from your application. The roll-up pattern is often used to achieve scale in monitoring distributed systems and large machine clusters or to aggregate Applications and operating systems generate logs of impor- common metrics across many different sources. It can be used in tant events by default. The first step in the log-stream pattern combination with agent software or instrumentation. See Figure 3. requires the installation or configuration of software on each monitored host that forwards all the logs off that host. Event- The roll-up pattern is a strategy to scale the monitoring infra- Reporter for Windows or rsyslogd on UNIX are popular log for- structure linearly with respect to the number of monitored warders. Many programming languages also have log generation systems. This is usually accomplished by co-opting the moni- and forwarding libraries, such as the popular Log4J Java library. tored machines themselves to spread the monitoring workload PaaS systems like Heroku have likely preconfigured the logging throughout the network. Usually, small groups of machines use infrastructure for you. an election protocol to choose a proximate, regional collection host, and send all of their monitoring data to it, although some- Logs are generally forwarded to a central system for processing, times the configuration is hard-coded. indexing, and storage, but in larger environments they might be MapReduced or processed by other fan-out style parallel pro- The elected host summarizes and deduplicates the data, then cessing engines. System logs are easily multiplexed to different sends it up to another host elected from a larger region of sum- destinations, so there is a diverse collection of software available marizers. This host in turn summarizes and deduplicates it, and for processing logs for different purposes. so forth. Although many modern syslog daemons support TCP, the syslog Roll-up systems scale well but can be difficult to understand and protocol was originally designed to use UDP in the transport implement. Important stability and network-traffic consider- layer, which can be unreliable at scale. Log data is usually ations accompany the design of roll-up systems. emitted by the source in a timely fashion, but the intermediate Ganglia is a popular monitoring project that combines stand- processing systems can introduce some delivery latency. 
Stand-alone agents with the roll-up pattern to monitor massive clusters

Figure 4: The process emitter pattern
Figure 5: The process reporter pattern

Log data must be parsed, which can be a computationally expensive endeavor. Additional infrastructure may be required to process logging event streams as volume grows.

As I've already mentioned, with PaaS providers like Heroku and AppHarbor, logs are the only means by which to export and monitor performance data. Thus, many tools like Heroku's own log-shuttle and l2MET have grown out of that use-case. There are several popular tools for DIY enterprise log snarfing, like logstash, fluentd, and Graylog, as well as a few commercial offerings, like Splunk.

Collection Patterns for In-Process Metrics
Instrumentation libraries, which are a radical departure from the patterns discussed thus far, enable developers to embed monitoring into their applications, making them emit a constant stream of performance and availability data at runtime. This is not debugging code but a legitimate part of the program that is expected to remain resident in the application in production. Because the instrumentation resides within the process it's monitoring, it can gather statistics on things like thread count, memory buffer and cache sizes, and latency, which are difficult (in the absence of standard language support like JMX) for external processes to inspect.

Instrumentation libraries make it easy to record interesting measurements inside an application by including a wealth of instrumentation primitives like counters, gauges, and timers. Many also include complex primitives like histograms and percentiles, which facilitate a superb degree of performance visibility at runtime.

The applications in question are usually transaction-oriented; they process and queue requests from end users or external peer processes to form larger distributed systems. It is critically important for such applications to communicate their performance metrics without interrupting or otherwise introducing latency into their request cycle. Two patterns are normally employed to meet this need.

The Process Emitter Pattern
Process emitters attempt to immediately purge every metric via a non-blocking channel. See Figure 4.

The developer imports a language-specific metrics library and calls an instrumentation function like time() or increment(), as appropriate for each metric he wants to emit. The instrumentation library is effectively a process-level, stand-alone agent that takes the metric and flushes it to a non-blocking channel (usually a UDP socket or a log stream). From there, the metric is picked up by a system that employs one or more of the external-process patterns.

Statsd is a popular and widely used target for process emitters. The project maintains myriad language bindings to enable the developer to emit metrics from the application to a statsd daemon process listening on a UDP socket.

The Process Reporter Pattern
Process reporters use a non-blocking dedicated thread to store their metrics in an in-memory buffer. They either provide a concurrent interface for external processes to poll this buffer or periodically flush the buffer to upstream channels. See Figure 5.

The developer imports a language-specific metrics library and calls an instrumentation function like time() or increment(), as appropriate for each metric he wants to emit. Rather than purging the metric immediately, process reporters hand the metric off to a dedicated, non-blocking thread that stores and sometimes processes summary statistics for each metric within the memory space of the monitored process. Process reporters can push their metrics on a timed interval to an external monitoring system or can export them on a known interface that can be polled on demand.

Process reporters are specific to the language in which they are implemented. Most popular languages have excellent metrics libraries that implement this pattern. Coda Hale Metrics for Java, Metriks for Ruby, and go-metrics are all excellent choices.

Thanks for bearing with me once again. I hope this article will help you identify the assumptions and patterns employed by the data collectors you choose to implement in your environment, or at least get you thinking about the sorts of things you can set aflame should you find yourself cornered by Roman soldiers. Be sure to check back with me in the next issue when I bend yet another tenuously related historical or mythical subject matter to my needs in my ongoing effort to document the monitoringosphere.
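To make the emitter pattern concrete, here is a minimal, illustrative sketch in Python. It is an editorial addition rather than code from the column: it assumes a statsd-compatible daemon listening on UDP port 8125 on the local host, and the MetricEmitter class, the "myapp" prefix, and the metric names are invented for the example. A real application would more likely use one of the existing statsd client libraries mentioned above.

import socket
import time

class MetricEmitter:
    """Process-emitter sketch: every metric is flushed immediately over a
    non-blocking UDP socket, using the statsd line format."""

    def __init__(self, host="127.0.0.1", port=8125, prefix="myapp"):
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.setblocking(False)  # never stall the request path

    def _send(self, line):
        try:
            self.sock.sendto(line.encode("ascii"), self.addr)
        except (BlockingIOError, OSError):
            pass  # fire and forget: dropping a sample beats adding latency

    def increment(self, name, value=1):
        self._send(f"{self.prefix}.{name}:{value}|c")        # counter

    def timing(self, name, millis):
        self._send(f"{self.prefix}.{name}:{int(millis)}|ms")  # timer

# usage inside a request handler
metrics = MetricEmitter()
start = time.time()
# ... handle the request ...
metrics.increment("requests")
metrics.timing("request_latency", (time.time() - start) * 1000)

The deliberate tradeoff in this style is visible in the except clause: if the channel ever pushes back, the sample is simply dropped, because the one thing the emitter must never do is add latency to the request it is measuring.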
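A process reporter can be sketched just as briefly. Again, this is an illustrative editorial example and far simpler than real libraries such as Coda Hale Metrics, Metriks, or go-metrics; the MetricReporter class, the flush_cb callback, and the ten-second interval are stand-ins for whatever upstream channel or polling interface a production library would provide.

import threading
import time
from collections import defaultdict

class MetricReporter:
    """Process-reporter sketch: instrumentation calls only touch an
    in-memory buffer; a dedicated daemon thread summarizes and flushes
    it on a timed interval (it could instead expose the buffer for
    polling by an external collector)."""

    def __init__(self, flush_cb, interval=10.0):
        self.flush_cb = flush_cb      # e.g., send to a monitoring system
        self.interval = interval
        self.lock = threading.Lock()
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)
        threading.Thread(target=self._run, daemon=True).start()

    def increment(self, name, value=1):
        with self.lock:               # very short critical section
            self.counters[name] += value

    def timing(self, name, millis):
        with self.lock:
            self.timings[name].append(millis)

    def _run(self):
        while True:
            time.sleep(self.interval)
            with self.lock:           # swap buffers, then summarize outside the lock
                counters, self.counters = self.counters, defaultdict(int)
                timings, self.timings = self.timings, defaultdict(list)
            summary = dict(counters)
            for name, samples in timings.items():
                summary[name + ".avg_ms"] = sum(samples) / len(samples)
                summary[name + ".max_ms"] = max(samples)
            self.flush_cb(summary)

# usage: print the rolled-up summary every ten seconds
reporter = MetricReporter(flush_cb=print)
reporter.increment("requests")
reporter.timing("request_latency", 12.5)

The buffer swap inside the lock keeps the instrumented request path to a few dictionary operations; all summarization happens on the reporter's own thread, in the memory space of the monitored process, exactly as the pattern describes.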

Become a USENIX Supporter and Reach Your Target Audience

The USENIX Association welcomes industrial sponsorship and offers custom packages to help you promote your organization, programs, and products to our membership and conference attendees. Whether you are interested in sales, recruiting top talent, or branding to a highly targeted audience, we offer key outreach for our sponsors. To learn more about becoming a USENIX Supporter, as well as our multiple conference sponsorship packages, please contact [email protected]. Your support of the USENIX Association furthers our goal of fostering technical excellence and innovation in neutral forums. Sponsorship of USENIX keeps our conferences affordable for all and supports scholarships for students, equal representation of women and minorities in the computing research community, and the development of open source technology.

Learn more at: www.usenix.org/supporter


Exploring with a Purpose

DAN GEER AND JAY JACOBS

Dan Geer is the CISO for In-Q-Tel and a security researcher with a quantitative bent. He has a long history with the USENIX Association, including officer positions, program committees, etc. [email protected]

Think of [knowledge] as a house that magically expands with each door you open. You begin in a room with four doors, each leading to a new room that you haven't visited yet.… But once you open one of those doors and stroll into that room, three new doors appear, each leading to a brand-new room that you couldn't have reached from your original starting point. Keep opening doors and eventually you'll have built a palace.
Steven Johnson, "The Genius of the Tinkerer" [1]

Jay Jacobs is the co-author of Data-Driven Security and a data analyst at Verizon where he contributes to their Data Breach Investigations Report. Jacobs is a cofounder of the Society of Information Risk Analysts. [email protected]

Learning pays compound interest: the more a person studies a subject, the more capable they become at learning even more about it. Just as a student cannot tackle the challenges of calculus without studying the prerequisites, we must have diligence in how we discover and build the prerequisite knowledge within cybersecurity.

Before we discuss where we are heading, let's establish where we are. Until now, we (security metricians, including the present authors) could exhort people to "Just measure something for heaven's sakes!" It's safe to say that such measurement has largely begun. Therefore, we have the better, if harder, problem of the meta-analysis ("research about research") of many observations, always remembering that the purpose of security metrics is decision support.

Learning from All of Us
To understand how we are doing at processing our observations, we turn to published industry reports. It's clear that there are a lot more of them than even two years ago. Not all reports are equal; parties have various motivations to publish, which creates divergent interpretations of what represents research worth communicating. We suspect that most data included in industry reports are derived from convenience samples—data gathered because it is available to the researcher, not necessarily data that is representative enough to be generalizable. Not to make this a statistics tutorial, but for generalizability you need to understand (and account for) your sampling fraction, or you need to randomize your collection process. It is not that this or that industry report has a bias—all data has bias; the question is whether you can correct for that bias. A single vendor's data supply will be drawn from that vendor's customer base, and that's something to correct for. On the other hand, if you can find three or more vendors producing data of the same general sort, combining them in order to compare them can wash out the vendor-to-customer bias at least insofar as decision support is concerned.

Do not mistake our comments for a reason to dismiss convenience samples; research with a convenience sample is certainly better than "learning" from some mix of social media and headlines. This challenge in data collection is not unique to cybersecurity; performing research on automobile fatalities does not lend itself to selecting random volunteers. Studying the effects of a disease requires a convenience study of patients with the disease. It's too early to call it, but we think it infeasible to conduct randomized clinical trials, cohort studies, and case-control research, but the time is right for such ideas to enter the cybersecurity field (and for some of you to prove us wrong).

If we are going to struggle in the design of our research and data collection, we may be doomed never to produce a single study without flaws. Although that does not preclude learning, it means that we will have to accept and even embrace the variety of conclusions each study will bring while reserving the big lessons to be drawn after the appropriate corrections for the biases in each study are made and those results aggregated. This method of learning requires the active participation of researchers who must not only understand the sampling fraction that underlies their data but also must transparently communicate it and the methodology of their research.

Learning from Each Other
Industry reports are generally data aggregated by automated means across the whole of the vendor's installed base. The variety found in these aggregation projects is intriguing, because much of the data now being harvested is in a style that we call "voluntary surveillance," such as when all the clients of Company X beacon home any potential malware that they see so that the probability of detection is heightened and the latency of countermeasures is reduced for everyone. Of course, once the client (that's you) says "Keep an eye on me," you don't have much to say about just how closely that eyeball is looking unless you actually read the whole outbound data stream yourself.

What can you learn from industry reports? Principally two things: "What is the trendline?" and "Am I different?" A measurement method can be noisy and can even contain a consistent bias without causing the trendline it traces out to yield misleading decision support. As long as the measurement error is reasonably constant, the trendline is fine. Verizon's Data Breach Investigations Report (DBIR) [2] is not based on a random sample of the world's computing plant, but that only affects the generalizability of its measured variables, not the trendline those variables trace. By contrast, public estimates of the worldwide cost of cybercrime are almost surely affected by what it takes to get newspaper headline writers to look at you. Producers of cybercrime estimates certainly claim to be based on data, but their bias and value in meta-analysis efforts must be questioned.

That trendlines are useful decision support reminds us that an ordinal scale is generally good enough for decision support. Sure, real number scales ("What do you weigh?") are good to have, but ordinal scales ("Have you lost weight?") are good enough for decision making ("Did our awareness training hold down the number of detectable cybersecurity mistakes this year?").

Whether you are different from everybody else matters only insofar as whether that difference (1) can be demonstrated with measured data and (2) has impact on the decisions that you must make. Suppose we had the full perimeter firewall logs from the ten biggest members of the Defense Industrial Base. Each one is drawn under a different sampling regime, but if they all show the same sorts of probes from the same sorts of places, then the opponent is an opportunistic opponent, which has planning implications. If, however, they all show the same trendline except for yours, then as a matter of decision support your next step is to acquire data that helps you explain what makes you special—and whether there is anything to be done about that.

Exploratory before Explanatory
Exploratory research is all about hypothesis generation, not hypothesis testing. It is all about recognizing what the unknowns within an environment are. When that environment is complex or relatively unstudied, exploratory analysis tells you where to put the real effort. Exploratory research identifies the hypotheses for explanatory research to resolve. Exploratory research does not end with "Eureka!" It ends with "If this is where I am, then which way do I go?"

Standing on the Shoulders of Giants
Medicine has a lot to teach us about combining studies done by unrelated researchers, which is a good thing, because we don't have a decade to burn reinventing those skills. The challenges facing such meta-analysis are finding multiple research efforts
1. with comparable measurements;
2. researching the same time period (the environment may change rapidly); and
3. publishing relevant characteristics like the sample size under observation, the data collection, and the categorization scheme.

Without the combination of all three, comparison and meta-analysis (and consequently our ability to learn) becomes significantly more difficult. To illustrate, we collected 48 industry reports; 19 of them contained a reference to "android," and five of those 19 estimated the amount of android malware to be:
◆ 405,140 android malware through 2012 (257,443 with a strict definition of "malware") [3]
◆ 276,259 total mobile malware through Q1 2013 [4]
◆ 50,926 total mobile malware through Q1 2013 [5]
◆ 350,000 total number of android malware through 2012 [6]
◆ over 200,000 malware for android through 2012 [7]

That's a broad range of contrast. But do not mistake the range and contrast for an indication of errors or mistakes—their studies are exploring data that they have access to and are an example of the variety of conclusions we should expect. That exploration is exactly what should be happening in this field at this point in time. What we (the security metrics people) must now do is learn how to do meta-analysis in our domain, and if we are producers, learn how to produce research consumable by other security metrics people. We have to learn how to deal with our industry's version of publication bias, learn how adroitly to discount agenda-driven "results," and learn which indicators enable us to infer study quality.

This task will not be easy, but it is timely. It is time for a cybersecurity data science. We call on those of you who can do exploratory analysis of data to do so and to publish in styles such that the tools of meta-analysis can be used to further our understanding across the entire cybersecurity field.

Thanks in advance.

Resources
[1] Steven Johnson, "The Genius of the Tinkerer," Wall Street Journal, September 25, 2010: tinyurl.com/ka9wotx.
[2] Verizon's Data Breach Investigations Report: http://www.verizonenterprise.com/DBIR/.
[3] F-Secure Threat Report H1 2013: retrieved from tinyurl.com/ob6t3b4.
[4] Juniper Networks Third Annual Mobile Threats Report, March 2012 through March 2013: retrieved from tinyurl.com/nf654ah.
[5] McAfee Threats Report: Q1 2013: retrieved from tinyurl.com/ovtrbmg.
[6] Trend Micro TrendLabs Q1 2013 Security Roundup: retrieved from tinyurl.com/lfs2uvt.
[7] Trustwave 2013 Global Security Report: retrieved from tinyurl.com/ljt73qj.
For more resources, please visit: http://dds.ec/rsrc.
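As an editorial aside, and not part of the authors' column: even a naive exploratory summary of the five published estimates quoted above shows why the three comparability criteria matter. The short Python sketch below treats the figures as plain numbers, although they cover different time periods and different definitions of "malware," which is precisely the problem the authors describe; the variable names and the simple range/median summary are the editor's illustration, not the authors' analysis.

# The five point estimates quoted above (references [3]-[7]); the [7]
# figure is a floor ("over 200,000"), and the periods and definitions
# differ, so these numbers are not directly comparable.
estimates = [405140, 276259, 50926, 350000, 200000]

low, high = min(estimates), max(estimates)
median = sorted(estimates)[len(estimates) // 2]

print(f"range: {low:,} to {high:,} (roughly {high / low:.0f}x apart)")
print(f"median of the published figures: {median:,}")
# No amount of arithmetic turns these into one defensible number without
# the shared sample, period, and categorization details the authors ask
# publishers to report.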

Addendum: Data science tools as of this date


/dev/random

ROBERT G. FERRELL

Robert G. Ferrell is a fourth-generation Texan, literary techno-geek, and finalist for the 2011 Robert Benchley Society Humor Writing Award. [email protected]

I've looked at the cloud from both sides now:
From in and out, and still somehow,
It's the cloud's delusions I recall;
I still trust the cloud not at all.
(apologies to Joni Mitchell)

As humanity's adoption of the cloud paradigm slides ever closer to unity, detractors like yours truly grow increasingly scarce and anathematized. Rather than retreat into a bitter lepers' colony for Cloud-Luddites ("Cluddites"), I choose instead to cast my remaining stones from the unfettered, if somewhat dizzying, forefront of prophecy.

There will come a time, sayeth the prophet, when all personal computing devices will be wearable thin-clients, communicating with their data and applications via whatever is set to replace Bluetooth and/or WiFi. People will no longer buy software or storage devices but rather subscriptions to CCaaS: Cloud Computing as a (dis)Service. There will be specific application bundles available (word processing, graphics manipulation, spreadsheets, etc.), as well as omnibus cloudware plans that allow access to a broad range of data crunching techniques.

All advertising will be razor-targeted to the person, place, circumstance, and even current physiological status of its victims ^H^H^H^H^H^H^H audience. Even ad contents themselves will be written on the fly by personalized marketing microbots that have spent their entire existence studying and learning a single person's lifestyle. Because there will be no one left in the labor pool who subscribes to any consistent rules of grammar, spelling, or rhetoric, these ads will be mercifully incomprehensible to most.

Moving further afield in our technology examination, reality television will merge with products such as GoPro, Looxcie, and MeCam along with ultra-high bandwidth Internet access to create millions of around-the-clock multimedia streams starring absolutely everyone you know. For a fee, you can have writers supply you with dialogue while actors will fill in your voice to create the ultimate Shakespeare metaphor come to life: all the world really will be a stage.

Instead of news desks, anchors, and reporters, there will just be alerts sent out automatically by event-recognition algorithms embedded in the firmware of personal cams. They will recruit all nearby cameras into a newsworthy event cluster, or NEC. The combined streams from the cams with the best viewpoints will be labeled as a live news event and available with stream priority from the provider. Premium service subscribers will have voiceovers

50 JUNE 2014 VOL. 39, NO. 3 www.usenix.org COLUMNS /dev/random of commentators taking a marginally educated guess at what’s While we’re on the topic of disconnection, it won’t be long before going on, very similar to today’s “talking head” model only not so being unplugged from the grid not only is inadvisable from a well-dressed. mental health perspective, it will not even be possible without surgery. A routine implantation when an infant is about six The stars of this mediaverse will be those people who manage to months old will tie them into the worldwide grid, although a be near the most interesting occurrences. Naturally, a thriving compatible interface will be necessary to do any computing. “underground” business of causing news events will spring up, so This will encourage toddlers to learn their alphabets (or at least that news personalities can maintain their fame without all that the more popular letters and numbers) and, more importantly, tedious roaming to and fro, looking for notoriety-inducing hap- emoticons as early as possible so that they can begin texting. penstance. The likelihood of witnessing everything from thrill- Eventually I predict these Internet access modules will be ing bank robberies to tragic aircraft accidents can be artificially absorbed into the body and tacked on to our DNA, creating the boosted by these professional probability manipulators—for the prospect of future pre-natal communications, where Twitter right price. accounts are created automatically as soon as the child reaches a With individual television stations obsolete and replaced by on- certain level of development. demand media clumps, product-hawking will be forced to evolve @TheNeatestFetus: OMG I thnk im bein born!!! as well. People will be paid to carry around everything from soda cans to auto parts in the hope that someone’s stream will pick @Embryoglio: hw du u no? them up and the omnipresent optical marketing monitor algo- @TheNeatestFetus: gettin squeezed out 1 end brb rithms will recognize and plug same with a plethora of canned ads. With the average personal cam spitting out ten megapix- @Embryoglio: u still ther? els at 40 FPS, discerning one’s product from the background @TheNeatestFetus: b**** slappd me! #BreathingRox noise of other people’s inferior merchandise will not be all that difficult. @ Embryoglio: playa

Because your wearable cam will be capable of monitoring vari- It just keeps getting sillier from there, I’m afraid. ous physiological parameters—ostensibly for your health’s sake To bring this oblique essay back around to UNIX, let me just but really in order to gauge your response to various commercial reassure my readers that every device I’ve prophesied here is ploys—any positive reaction to a product or event will cause an running some stripped-down version of an embedded Linux ker- influx of ads for things judged by often puzzling equivalence nel that I’ve just now decided to call Nanix (you know, because formulae to be similar in some way. Think a dog you saw pooping it’s so small and everything). on the curb was cute? Prepare to be bombarded for a full fort- night by vaguely canine and/or poop-related product and service What role will the by-now ubiquitous cloud play in all this? advertisements. Carelessly employ a search engine to look up Every role imaginable. It won’t be a cloud any longer, but rather the best brand of adult diaper for an aged relative? Many hours of an all-encompassing noxious ground fog that instead of creeping bodily fluid-absorbent entertainment will now be yours to enjoy in on little cat feet will stomp around on enormous T-Rex talons. in virtual 3D and ultrasurround-sound, with no means of escape There will be nothing subtle or elegant about it. Data security, sans an unthinkable disconnection from the grid. I must add—albeit reluctantly—will be a quaint concept of the past, like chivalry, coupon books, and pundits with something germane to say.


The Saddest Moment

JAMES MICKENS

Reprinted from ;login: logout, May 2013

James Mickens is a researcher in the Distributed Systems group at Microsoft's Redmond lab. His current research focuses on Web applications, with an emphasis on the design of JavaScript frameworks that allow developers to diagnose and fix bugs in widely deployed Web applications. James also works on fast, scalable storage systems for datacenters. James received his PhD in computer science from the University of Michigan, and a bachelor's degree in computer science from Georgia Tech. [email protected]

Whenever I go to a conference and I discover that there will be a presentation about Byzantine fault tolerance, I always feel an immediate, unshakable sense of sadness, kind of like when you realize that bad things can happen to good people, or that Keanu Reeves will almost certainly make more money than you over arbitrary time scales.

Watching a presentation on Byzantine fault tolerance is similar to watching a foreign film from a depressing nation that used to be controlled by the Soviets—the only difference is that computers and networks are constantly failing instead of young Kapruskin being unable to reunite with the girl he fell in love with while he was working in a coal mine beneath an orphanage that was atop a prison that was inside the abstract concept of World War II.

"How can you make a reliable computer service?" the presenter will ask in an innocent voice before continuing, "It may be difficult if you can't trust anything and the entire concept of happiness is a lie designed by unseen overlords of endless deceptive power." The presenter never explicitly says that last part, but everybody understands what's happening. Making distributed systems reliable is inherently impossible; we cling to Byzantine fault tolerance like Charlton Heston clings to his guns, hoping that a series of complex

Figure 1: Typical Figure 2 from Byzantine fault paper: Our network protocol

52 JUNE 2014 VOL. 39, NO. 3 www.usenix.org COLUMNS The Saddest Moment software protocols will somehow protect us from the fault tolerance protocol you pick, Twitter will still have fewer oncoming storm of furious apes who have somehow than two nines of availability. As it turns out, Ted the Poorly learned how to wear pants and maliciously tamper Paid Datacenter Operator will not send 15 cryptographically signed messages before he accidentally spills coffee on the air with our network packets. conditioning unit and then overwrites your tape backups with Every paper on Byzantine fault tolerance contains a diagram bootleg recordings of Nickelback. Ted will just do these things that looks like Figure 1. The caption will say something like and then go home, because that’s what Ted does. His extensive “Figure 2: Our network protocol.” The caption should really home collection of “Thundercats” cartoons will not watch itself. say, “One day, a computer wanted to issue a command to an Ted is needed, and Ted will heed the call of duty. online service. This simple dream resulted in the generation of Every paper on Byzantine fault tolerance introduces a new 16 gajillion messages. An attacker may try to interfere with the kind of data consistency. This new type of consistency will reception of 1/f of these messages. Luckily, 1/f is much less than have an ostensibly straightforward yet practically inscrutable a gajillion for any reasonable value of f. Thus, at least 15 gajil- name like “leap year triple-writer dirty-mirror asynchronous lion messages will survive the attacker’s interference. These semi-­consistency.” In Section 3.2 (“An Intuitive Overview”), messages will do things that only Cthulu understands; we are at the authors will provide some plainspoken, spiritually appeal- peace with his dreadful mysteries, and we hope that you feel the ing arguments about why their system prevents triple-con- same way. Note that, with careful optimization, only 14 gajil- flicted write hazards in the presence of malicious servers and lion messages are necessary. This is still too many messages; unexpected outbreaks of the bubonic plague. “Intuitively, a however, if the system sends fewer than 14 gajillion messages, it ­malicious server cannot lie to a client because each message is will be vulnerable to accusations that it only handles reasonable an encrypted, nested, signed, mutually-attested log entry with failure cases, and not the demented ones that previous research- pointers to other encrypted and nested (but not signed) log ers spitefully introduced in earlier papers in a desperate attempt entries.” to distinguish themselves from even more prior (yet similarly demented) work. As always, we are nailed to a cross of our own Interestingly, these kinds of intuitive arguments are not intui- construction.” tive. A successful intuitive explanation must invoke experiences that I have in real life. I have never had a real-life experience In a paper about Byzantine fault tolerance, the related work that resembled a Byzantine fault tolerant protocol. For example, section will frequently say, “Compare the protocol diagram of suppose that I am at work, and I want to go to lunch with some of our system to that of the best prior work. Our protocol is clearly my co-workers. Here is what that experience would look like if it better.” The paper will present two graphs that look like Figure resembled a Byzantine fault tolerant protocol: 2. 
Trying to determine which one of these hateful diagrams is better is like gazing at two unfathomable seaweed bundles that JAMES: I announce my desire to go to lunch. washed up on the beach and trying to determine which one is BRYAN: I verify that I heard that you want to go to lunch. marginally less alienating. Listen, regardless of which Byzantine

Figure 2: Our new protocol is clearly better.

RICH: I also verify that I heard that you want to go to lunch.
CHRIS: YOU DO NOT WANT TO GO TO LUNCH.
JAMES: OH NO. LET ME TELL YOU AGAIN THAT I WANT TO GO TO LUNCH.
CHRIS: YOU DO NOT WANT TO GO TO LUNCH.
BRYAN: CHRIS IS FAULTY.
CHRIS: CHRIS IS NOT FAULTY.
RICH: I VERIFY THAT BRYAN SAYS THAT CHRIS IS FAULTY.
BRYAN: I VERIFY MY VERIFICATION OF MY CLAIM THAT RICH CLAIMS THAT I KNOW CHRIS.
JAMES: I AM SO HUNGRY.
CHRIS: YOU ARE NOT HUNGRY.
RICH: I DECLARE CHRIS TO BE FAULTY.
CHRIS: I DECLARE RICH TO BE FAULTY.
JAMES: I DECLARE JAMES TO BE SLIPPING INTO A DIABETIC COMA.
RICH: I have already left for the cafeteria.

In conclusion, I think that humanity should stop publishing papers about Byzantine fault tolerance. I do not blame my fellow researchers for trying to publish in this area, in the same limited sense that I do not blame crackheads for wanting to acquire and then consume cocaine. The desire to make systems more reliable is a powerful one; unfortunately, this addiction, if left unchecked, will inescapably lead to madness and/or tech reports that contain 167 pages of diagrams and proofs. Even if we break the will of the machines with formalism and cryptography, we will never be able to put Ted inside of an encrypted, nested log, and while the datacenter burns and we frantically call Ted's pager, we will realize that Ted has already left for the cafeteria.

USENIX Member Benefits

Members of the USENIX Association receive the following benefits:
Free subscription to ;login:, the Association's bi-monthly print magazine. Issues feature technical articles, system administration articles, tips and techniques, practical columns on such topics as security, Perl, networks, and operating systems, book reviews, and reports of sessions at USENIX conferences.
Access to new and archival issues of ;login: www.usenix.org/publications/login.
Discounts on registration fees for all USENIX conferences.
Special discounts on a variety of products, books, software, and periodicals: www.usenix.org/member-services/discounts
The right to vote on matters affecting the Association, its bylaws, and election of its directors and officers.

For more information regarding membership or benefits, please see www.usenix.org/membership-services or contact [email protected]. Phone: 510-528-8649

REGISTER TODAY
23rd USENIX Security Symposium
AUGUST 20–22, 2014 • SAN DIEGO, CA

The USENIX Security Symposium brings together researchers, practitioners, system programmers and engineers, and others interested in the latest advances in the security of computer systems and networks. The Symposium will include a 3-day technical program with refereed papers, invited talks, posters, panel discussions, and Birds-of-a-Feather sessions. Program highlights include:

Keynote Address by Phil Lapsley, author of Exploding the Phone: The Untold Story of the Teenagers and Outlaws Who Hacked Ma Bell
Invited Talk: "Battling Human Trafficking with Big Data" by Rolando R. Lopez, Orphan Secure
Panel Discussion: "The Future of Crypto: Getting from Here to Guarantees" with Daniel J. Bernstein, Matt Blaze, and Tanja Lange
Invited Talk: "Insight into the NSA's Weakening of Crypto Standards" by Joseph Menn, Reuters

The following co-located events will precede the Symposium on August 18–19, 2014:
EVT/WOTE ’14: 2014 Electronic Voting Technology Workshop/Workshop on Trustworthy Elections
USENIX Journal of Election Technology and Systems (JETS), published in conjunction with EVT/WOTE: www.usenix.org/jets
FOCI ’14: 4th USENIX Workshop on Free and Open Communications on the Internet
HotSec ’14: 2014 USENIX Summit on Hot Topics in Security

CSET ’14: 7th Workshop on Cyber Security Experimentation and Test
HealthTech ’14: 2014 USENIX Summit on Health Information Technologies: Safety, Security, Privacy, and Interoperability of Health Information Technologies
NEW! 3GSE ’14: 2014 USENIX Summit on Gaming, Games, and Gamification in Security Education
WOOT ’14: 8th USENIX Workshop on Offensive Technologies
www.usenix.org/sec14

Stay Connected... twitter.com/USENIXSecurity • www.usenix.org/facebook • www.usenix.org/youtube • www.usenix.org/linkedin • www.usenix.org/gplus • www.usenix.org/blog

Book Reviews
ELIZABETH ZWICKY

R for Everyone
Jared Lander
Pearson Education, 2014. 354 pages
ISBN 978-0-321-88803-7

R for Everyone is an introduction to using R that does not assume that you know, or want to know, a lot about programming, and concentrates on covering the range of common uses of R, from simple calculations to high-end statistics and data analysis.

In three straightforward lines of R, you can load 200,000 lines of data; calculate the minimum, maximum, mean, median, standard deviation, and quartiles for every numeric field and the most popular values with counts for the strings; and then graph them all in pairs to see how they correlate. Then you can spend 30 minutes trying to remember the arcane syntax to do the next thing you want to do, which will be easy as pie once you figure out where all the parts go. As a result, using R when manipulating numbers will not only simplify your life, it will also give you a reputation as an intellectual badass.

Of course, to get there, a guide will help. Many guides, however, are written by people who want to teach you statistics or a particular programming style, or who seem to find R intuitive, which is lucky for them but does not help the rest of us. The good and bad news is that R for Everyone does not want to teach you statistics. This is good news because it frees the book up to teach you R, and there are plenty of other places to learn statistics. It is bad news if you are going to feel sad when it casually mentions how to get R to produce a Poisson distribution, and you have no idea what that is. You shouldn't feel sad. Just move on, and it will be there when you need it.

Because I am more of a programmer than a statistician, I can't vouch for whether R for Everyone is actually sufficient for non-programmers. It is plausible that it would give a statistician enough programming background to cope, although I certainly wouldn't recommend it as a programming introduction for anyone who didn't find the prospect of easy ways to regress to the mean enticing. It is certainly gentler as a programming introduction than other R books with that aim.

It is also unusually comprehensible for an R book. I would recommend it as a first R book. You still also probably want a copy of O'Reilly's R Cookbook; the two books are mostly complementary, although their graphics recommendations are moderately incompatible. R for Everyone is more up-to-date, and the more traditional format is easier to learn from, while the R Cookbook is more aimed at specific problems, which makes it easier to skip through in panic looking for that missing clue.

My one complaint about R for Everyone is that some of the early chapters are insufficiently edited. There's some odd word usage and at least one example that is puzzlingly wrong (fortunately, it's the example of doing assignment right to left instead of left to right, which you should pretend not to know about anyway. Nobody does that; it's just confusing).

Threat Modeling: Designing for Security
Adam Shostack
John Wiley and Sons, 2014. 532 pages
ISBN 978-1-118-82269-2

This is a great book for learning to think about security in a development environment, as well as for learning to do threat modeling itself. It's a practical book, written from the point of view of an experienced practitioner, and it presents multiple approaches. (If you believe there is exactly one right way to do things, you will be annoyed. In my experience, however, people who believe in exactly one right way are people who enjoy righteous indignation, so perhaps you should read it anyway.) Also, it's written in a gently humorous style that makes it pleasant to read.

The basic problem with writing a book on threat modeling is that you have two choices. You can talk about threats, but not what you might want to do with them, which is kind of like writing an entire cookbook without ingredients lists—you get lots of techniques, sure, but you have no idea how to actually make anything. On the other hand, you can talk about threats and what to do about them, which leaves you trying to cover all of computer security in one book and still discuss threat modeling somewhere. Shostack goes for the latter approach, which is probably the better option, but it does make for a large and thinly spread book. And, because the focus is threat modeling, some great advice is buried in obscure corners.

I'd recommend this book to people new to the practice of security, or new to threat modeling, or unsatisfied with their current threat modeling practice, including people who are in other positions but are not being well served by their security people.

There are, of course, some points I disagree with or feel conflicted about. I agree that "Think like an attacker" in practice leads people down very bad roads, because they think like an attacker who also is a highly competent engineer who understands the product internals, rather than like any real attacker ever. On the other hand, there's an important kernel of truth there that needs to get through to otherwise intelligent programmers who say things like, "Oh, we don't call that interface anymore, so it doesn't matter." (What matters is not what the software does when you operate it as designed, but what is possible for it to do.)

Data Protection for Photographers
Patrick H. Corrigan
Rocky Nook, 2014. 359 pages
ISBN 978-1-937538-22-4

If you are not a photographer you may be wondering why photographers would need to think specially about data protection. If you are a digital photographer, you're used to watching your disks fill and wondering how you're going to keep your pictures safe when most people's backup solutions are sized for a hundredth the amount of data you normally think about. You spend a lot of time realizing that data storage intended for homes is not going to meet the needs of your particular home. In fact, even small business systems may not suffice.

This is not a problem if your first love is system administration and photography is just your day job, but for the rest of us, it is at best annoying and at worst incomprehensible. This book will fix the incomprehensible part, explaining enough about disk and backup systems to allow photographers to make good decisions. Sadly, there's still no easy answer. You will have to choose for yourself which annoying tradeoffs to make, but at least you will make them knowingly.

I would have liked more emphasis on testing restores, and some coverage of travel options. However, this is the book you need if your data at home is outstripping the bounds of your current backup solution in either size or importance.

SAVE THE DATE!

SESA ’14: The 2014 Summit for Educators in System Administration
NOVEMBER 11, 2014 • SEATTLE, WA

Join us at the 2014 Summit for Educators in System Administration (SESA ’14), which will be co-located with LISA ’14, the 28th Large Installation System Administration Conference.

SESA ’14 will bring together university educators to share best practices and support the creation and advancement of high-quality system administration degree programs and curricula. At SESA ’14, we will discuss topics important to educators who teach system administration-related courses at universities and colleges, develop courses, and build curricula. We hope to engage all attendees to share ideas and best practices, and together create a community of sysadmin educators. System administrators who have taught courses at their university or college are also invited to join and share their experiences. Full registration details will be available in September.

www.usenix.org/conference/sesa14

USENIX Member Benefits
Members of the USENIX Association receive the following benefits:
Free subscription to ;login:, the Association's magazine, published six times a year, featuring technical articles, system administration articles, tips and techniques, practical columns on such topics as security, Perl, networks, and operating systems, book reviews, and reports of sessions at USENIX conferences.
Access to ;login: online from December 1997 to this month: www.usenix.org/publications/login/
Access to videos from USENIX events in the first six months after the event: www.usenix.org/publications/multimedia/
Discounts on registration fees for all USENIX conferences.
Special discounts on a variety of products, books, software, and periodicals: www.usenix.org/membership/specialdisc.html
The right to vote on matters affecting the Association, its bylaws, and election of its directors and officers.
For more information regarding membership or benefits, please see www.usenix.org/membership/ or contact [email protected]. Phone: 510-528-8649

USENIX Board of Directors
Communicate directly with the USENIX Board of Directors by writing to [email protected].
PRESIDENT: Margo Seltzer, Harvard University, [email protected]
VICE PRESIDENT: John Arrasjid, VMware, [email protected]
SECRETARY: Carolyn Rowland, National Institute of Standards and Technology (NIST), [email protected]
TREASURER: Brian Noble, University of Michigan, [email protected]
DIRECTORS: David Blank-Edelman, Northeastern University; Sasha Fedorova, Simon Fraser University, [email protected]; Niels Provos, Google, [email protected]; Dan Wallach, Rice University, [email protected]
EXECUTIVE DIRECTOR: Casey Henderson, [email protected]

Notice of Annual Meeting
The USENIX Association's Annual Meeting with the membership and the Board of Directors will be held on Wednesday, June 18, 2014, in Philadelphia, PA, during USENIX Federated Conferences Week, June 17–20, 2014.

Results of the Election for the USENIX Board of Directors, 2014–2016
The newly elected Board will take office at the end of the Board meeting in June 2014.
PRESIDENT: Brian Noble, University of Michigan
VICE PRESIDENT: John Arrasjid, VMware
SECRETARY: Carolyn Rowland, National Institute of Standards and Technology (NIST)
TREASURER: Kurt Opsahl, Electronic Frontier Foundation
DIRECTORS: Cat Allman, Google; David N. Blank-Edelman, Northeastern University; Daniel V. Klein, Google; Hakim Weatherspoon, Cornell University

58 JUNE 2014 VOL. 39, NO. 3 www.usenix.org ConferenceREPORTS Reports In this issue: 59 FAST ’14: 12th USENIX Conference on File ­utilization. Existing garbage collectors, however, are expensive and Storage Technologies and scale poorly. They wait until a lot of free space is available Summarized by Matias Bjørlin, Jeremy C. W. Chan, Yue Cheng, Qian (to amortize cleaning costs), which can require up to 5x over­ Ding, Qianzhou Du, Rik Farrow, Xing Lin, Sonam Mandal, Michelle Mazurek, Dutch Meyer, Tiratat Patana-anake, Kai Ren, and Kuei Sun utilization of memory. When the garbage collector does run, it can consume up to three seconds, which is slower than just 76 Linux FAST Summit ’14 resetting the system and rebuilding the RAM store from the Summarized by Rik Farrow backup log on disk. The authors develop a new cleaning approach that avoids these FAST ’14: 12th USENIX Conference on File problems. Because pointers in a file system are well-controlled, and Storage Technologies centrally stored, and have no circularities, it is possible to February 17–20, 2014 , San Jose, CA clean and copy incrementally (which would not work for a more Opening Remarks general-purpose garbage-collection system). In the authors’ Summarized by Rik Farrow approach, the cleaner continuously finds and cleans some seg- ments with significant free space, reducing cleaning cost and Bianca Schroeder (University of Toronto) opened this year’s improving utilization. Further, the authors distinguish between USENIX Conference on File and Storage Technologies the main log, kept in expensive DRAM with high bandwidth (FAST ’14) by telling us that we represented a record number (targeted at 90% utilization), and the backup log, stored on disk of attendees for FAST. Additionally, 133 papers were submit- where capacity is cheap but bandwidth is lower (targeted at ted, with 24 accepted. That’s also near the record number of 50% utilization). They use a two-level approach in which one submissions, 137, which was set in 2012. The acceptance rate cleaner (“compaction”) incrementally cleans one segment at a was 18%, with 12 academic, three industry, and nine collabora- time in memory, while a second one (“combined cleaning”) less tions in the author lists. The 28 PC members together completed frequently cleans across segments in both memory and disk. 500 reviews, and most visited Toronto in December for the PC Both cleaners run in parallel to normal operations, with limited ­meeting. synchronization points to avoid interference with new writes. Eno Thereska (Microsoft Research), the conference co-chair, The authors’ evaluation demonstrates that their new approach then announced that “Log-Structured Memory for DRAM-Based can achieve 80–90% utilization with performance degradation Storage,” by Stephen M. Rumble, Ankita Kejriwal, and John of only 15–20% and negligible latency overhead. Ousterhout (Stanford University) had won the Best Paper award, Bill Bolosky (Microsoft Research) asked how single segments and that Jiri Schindler (Simplivity) and Erez Zadok (Stony Brook cleaned via compaction can be reused before combined cleaning University) would be the co-chairs of FAST ’15. has occurred. Ousterhout explained that compaction creates Big Memory “seglets” that can be combined into new fixed-sized chunks and Summarized by Michelle Mazurek ([email protected]) allocated. A second attendee asked when cleaning occurs. 
Oust- erhout replied that waiting as long as possible allows more space Log-Structured Memory for DRAM-Based Storage Stephen M. Rumble, Ankita Kejriwal, and John Ousterhout, Stanford to be reclaimed at each cleaning to better amortize costs. University Strata: High-Performance Scalable Storage on Virtualized Awarded Best Paper! Non-Volatile Memory The primary author, Stephen Rumble, was not available, so Brendan Cully, Jake Wires, Dutch Meyer, Kevin Jamieson, Keir Fraser, Tim John Ousterhout presented the paper, which argued that DRAM Deegan, Daniel Stodden, Geoffrey Lefebvre, Daniel Ferstay, and Andrew Warfield, Coho Data storage systems should be log-structured, as in their previous Brendan Cully discussed the authors’ work developing an work on RAMCloud. When building a log-structured DRAM enterprise storage system that can take full advantage of fast system, an important question is how memory is allocated. non-volatile memory while supporting existing storage array Because DRAM is (relatively) expensive, high memory uti- customers who want to maintain legacy protocols (in this case, lization is an important goal. Traditional operating-system NFSv3). A key constraint is that, because flash gets consistently allocators are non-copying; data cannot be moved after it is cheaper, enterprise customers want to wait until the last pos- allocated. This results in high fragmentation and, therefore, sible moment to purchase more of it, requiring the ability to add low utilization. Instead, the authors consider a model based on flash dynamically. The authors’ solution has three key pieces. garbage ­collection, which can consolidate memory and improve www.usenix.org JUNE 2014 VOL. 39, NO. 3 59 REPORTS

◆◆ Device virtualization: clients receive a virtual object name The second half of the talk described how simulation was applied for addressing while, underneath, an rsync interface manages to assess how and whether PCM is useful for enterprise storage replication. systems. First, the authors simulated a multi-tiered system that ◆◆ Data path abstractions: for load balancing and replication, writes hot data to flash or PCM and cold data to a cheap hard keep object descriptions in a shared database that governs drive, incorporating the following relative price assumptions: paths and allows additional capacity and rebalancing, even PCM = 24, flash = 6, disk = 1 per unit of storage. They evaluated a when objects are in use. variety of system combinations using x% PCM, y% flash, and z% ◆◆ Protocol virtualization targeting NFSv3: the authors use disk, based on a one-week trace from a retail store in June 2012, a software-defined switch to control connections between and measured the resulting performance in IOPS/unit cost. clients and servers. Packets are dispatched based on their Using an ideal static placement based on knowing the workload contents, allowing rebalancing without changes to the client. traces a priori, the optimal combination was 30% PCM, 67% All nodes have the same IP and MAC address, so they appear flash, and 3% disk. With reactive data movement based on I/O to the client to be one node. traffic (a more realistic option), the ideal combination was 22% PCM, 78% flash. A second simulation used flash or PCM as To evaluate the system, the authors set up a test lab that can application-server-side, write-through, and LRU caching, using scale from 2 to 24 nodes (the capacity of the switch) and rebal- a 24-hour trace from customer production systems (manufac- ance dynamically. As nodes are added, momentary drops in turing, media, and medical companies). The results measured IOPS are observed because nodes must both serve clients and average read latency. To include cost in this simulation, different support rebalancing operations. Overall, however, performance combinations of flash and PCM with the same IOPS/cost were increases, but not linearly; the deviation from linear comes simulated. For manufacturing, the best results at three cost because, as more nodes are added, the probability of nodes doing points were 64 GB of flash alone, 128 GB of flash alone, and 32 remote I/O on behalf of clients (slower than local I/O) increases. GB PCM + 128 GB flash. In summary, PCM has promise for stor- This effect can be mitigated by controlling how clients and age when used correctly, but it’s important to choose accurate objects are assigned to promote locality; with this approach, the real-world performance numbers. scaling is much closer to linear. CPU usage for this system is very high, so introduction of 10-core machines improves IOPS One attendee asked whether the measurements had considered significantly. endurance (e.g., flash wearout) concerns as well as IOPS. Kim agreed that it’s an important consideration but not one that was An attendee asked whether requests can always be mapped to a measured in this work. Someone from UCSD asked whether node that holds an object, avoiding remote I/O. Cully responded write performance for PCM is unfairly disadvantaged unless that clients can’t be moved on a per-request basis, and they will capacity-per-physical-size issues are also considered. Kim ask for objects that may live on different nodes. 
Niraj Tolia of agreed that this could be an important and difficult tradeoff. A Maginatics asked how the system deals with multiple writers. third attendee asked about including DRAM write buffers in Cully responded that currently only one client can open the file PCM, as is done with flash. Kim agreed that the distinction is an for writing at a time, which prevents write conflicts but requires important one and can be considered unfair to PCM, but that the consecutive writers to wait; multiple readers are allowed. current work attempted to measure what is currently available Evaluating Phase Change Memory for Enterprise Storage with PCM—write buffers may be available in the future. A final Systems: A Study of Caching and Tiering Approaches attendee asked for clarification about the IOPS/unit cost metric; Hyojun Kim, Sangeetha Seshadri, Clement L. Dickey, and Lawrence Chiu, IBM Kim explained that it’s a normalized relative metric alternative Almaden Research Center to capturing cost explicitly in dollars. Hyojun Kim presented a measurement study of a prototype SSD Flash and SSDs with 45 nm 1 GB phase-change memory. PCM melts and cools Summarized by Jeremy C. W. Chan ([email protected]) material to store bits: amorphous = reset, crystalline = set. Writ- ing set bits takes longer. Write latency is typically reported at Wear Unleveling: Improving NAND Flash Lifetime by 150 ns, which is more than 1000x faster than flash and only 3x Balancing Page Endurance Xavier Jimenez, David Novo, and Paolo Ienne, Ecole Polytechnique Fédérale slower than DRAM, but which does not count 50 ns of additional de Lausanne (EPFL) circuit time. Write throughput is limited (4 bits per pulse) by the Xavier Jimenez presented a technique to extend the lifetime of chip’s power budget for heating the material. Kim directly com- NAND flash. The idea is based on the observation that inside a pared PCM with flash, normalized for throughput, using Red block of a NAND flash, some pages wear out faster than others. Hat, a workload generator, and a statistics collector. They found As a result, the endurance of a block is determined by the weak- that for read latency, PCM is 16x faster on average and also has est page. much faster maximums. For write latency, however, PCM is 3.4x slower on average, with a maximum latency of 378 ms compared To improve the lifetime, Xavier and his team introduced a fourth to 17.2 ms for flash. page state, “relieved,” to indicate pages not to be programmed

60 JUNE 2014 VOL. 39, NO. 3 www.usenix.org REPORTS

during a Program/Erase (P/E) cycle. By measuring the bit error paper, the program action is writing on the paper with a pencil, rate (BER), they showed that relieved pages possess higher and the erase action is using an eraser to clear out the word for endurance than unrelieved ones. Xavier continued about how the whole page. Finally, the flash translation layer (FTL) is a relief can be performed in the case of a multi-level cell (MLC). person called Flashman. With this analogy, Jaeyong explained In MLC, each one of the two bits is mapped to a different page, why NAND endurance had decreased by 35% during the past forming an LSB and MSB page pair. A full relief means both two years despite the 100% increase in capacity. He said that pages are skipped in a P/E cycle, while a half relief means only advanced semiconductor technology is just like a thinner piece the MSB is relieved. Although a half relief produces higher rela- of paper, which wears down more easily than a thick piece of tive wear, it is more effective in terms of written bits per cycle. paper after a certain number of erasure cycles. However, low erase voltage and long erase time are the two main keys to Xavier then described two strategies on how to identify weak improving the endurance of NAND flash. pages. The simple strategy is reactive, which starts relieving a page when the BER reaches a certain threshold. However, the The fundamental tradeoff between erase voltage and program reactive approach requires the FTL to read a whole block before time is that the lower the erase voltage, the longer the program erasing it, which adds an overhead to the erasing time. There- time required. With this observation, Jaeyong and his team fore, the authors further proposed the proactive strategy, which proposed the DPES approach, which dynamically changes the predicts the number of times weak pages should be relieved to program and erase voltage/time to improve the NAND endur- match the weakest page’s extended endurance by first charac- ance while minimizing negative impact on throughput. terizing the endurance of the LSB/MSB pages at every posi- They implemented their idea on an FTL and called it AutoFTL. tion of the block. The proactive strategy is used together with It consists of a DPES manager, which selects the program time, adaptive planning, which predefines a number of lookup tables erase speed, and erase voltage according to the utilization of called plans. The plan provides the probability for each page to an internal circular buffer. For instance, a fast write mode is be relieved. selected to free up buffer space when its utilization is high. In In the evaluation, Xavier and his team implemented the proac- the evaluation, Jaeyong and his team chose six volumes with dif- tive strategy on two previously proposed FTLs, ROSE and Com- ferent inter-arrival times from the MSR Cambridge traces. On boFTL, and on two kinds of 30 nm class chips, C1 and C2. C1 is average, AutoFTL achieves a 69% gain on the endurance of the an ABL chip with less interference, whereas C2 is an interleaved NAND flash with only negligible impact (2.2%) on the overall chip, which is faster and more flexible. The evaluation on real- write throughput. world traces shows that the proactive strategy improves lifetime Geoff Kuenning (Harvey Mudd College) asked why high voltage by 3–6% on C1 and 44–48% on C2. Only a small difference is causes electrons to get trapped in the oxide layer. 
Jaeyong reem- observed in execution time because of the efficient half relief phasized that the depletion of the tunnel side has an exponential operation. relationship to the erase voltage and time. Yitzhak Birk (Tech- Alireza Haghdoost (University of Minnesota) asked whether nion) said that the approach of programming in small steps Xavier’s approach is applicable to a block-level FTL. Xavier works but may bring adverse effects to the neighboring cells. explained that block-level relieving is impractical and that the Peter Desnoyers (Northeastern University) asked how they man- evaluation is entirely based on page-level mapping. Steve Swan- age to select the appropriate reading method according to the son (UCSD) asked if page skipping would affect the performance voltage level applied. Jaeyong replied that the lookup table would of reading the neighboring pages. Xavier said the endurance be able to track the voltage level. Peter followed up that a scan of the neighboring pages will be decreased by at most 2%, and for pages is not possible because you cannot read a page before because they are the stronger pages, the impact is minimal on knowing the voltage level. the block’s lifetime. ReconFS: A Reconstructable File System on Flash Storage Lifetime Improvement of NAND Flash-Based Storage Youyou Lu, Jiwu Shu, and Wei Wang, Tsinghua University Systems Using Dynamic Program and Erase Scaling Youyou Lu began with the novelty of ReconFS in metadata Jaeyong Jeong and Sangwook Shane Hahn, Seoul National University; Sungjin management of hierarchical file systems. This work addresses Lee, MIT/CSAIL; Jihong Kim, Seoul National University a major challenge in namespace management of file systems on Jaeyong Jeong presented a system-level approach called solid-state drives (SSDs), which are the scattered small updates dynamic program and erase scaling (DPES) to improve the and intensive writeback required to maintain a hierarchical lifetime of NAND flash-based storage systems. The approach namespace with consistency and persistence. These writes exploits the fact that the erasure voltage and the erase time cause write amplification that seriously hurts the lifetime of affect the endurance of NAND . SSDs. Based on the observation that modern SSDs have high Jaeyong began the presentation with an analogy illustrated by read bandwidth and IOPS and that the page out-of-band (OOB) interesting cartoons. In the analogy, NAND flash is a sheet of area provides some extra space for page management, Youyou www.usenix.org JUNE 2014 VOL. 39, NO. 3 61 REPORTS

and his team built the ReconFS, which decouples maintenance Personal and Mobile of volatile and persistent directory trees to mitigate the overhead Summarized by Kuei Sun ([email protected]) caused by scattered metadata writes. Toward Strong, Usable Access Control for Shared Youyou presented the core design of ReconFS on metadata man- Distributed Data Michelle L. Mazurek, Yuan Liang, William Melicher, Manya Sleeper, Lujo agement. In ReconFS, a volatile directory tree exists in memory, Bauer, Gregory R. Ganger, and Nitin Gupta, Carnegie Mellon University; which provides hierarchical namespace access, and a persistent Michael K. Reiter, University of North Carolina directory tree exists on disk, which allows reconstruction after Michelle Mazurek began her presentation by showing us recent system crashes. The four triggering conditions for namespace events where improper access control led to mayhem and privacy metadata writeback are: (1) cache eviction, (2) checkpoint, (3) invasions. The main issue is that access control is difficult, espe- consistency preservation, and (4) persistence maintenance. cially for non-expert users. In their previous work, the authors Although a simple home-location update is used for cache evic- identified users’ need for flexible policy primitives, principled tion and checkpoint, ReconFS proposes an embedded inverted security, and semantic policies (e.g., tags). To this end, they index for consistency preservation and metadata persistence based the design of their system on two important concepts: logging for persistence maintenance. Inverted indexing in tags, which allow users to group contents, and logical proof, ReconFS is placed in the log record and the page OOB depending which allows for fine-grained control and flexible policy. For on the type of link. The key objective of inverted indexing is to every content access, a series of challenges and proofs needs to make the data pages self-described. Meanwhile, ReconFS writes be made before access is granted. On each device participating back changes to directory tree content to a log to allow recovery in the system, a reference monitor exists to protect the content of a directory tree after system crashes. Together with unin- that the device owns, a device agent that performs remote proofs dexed zone tracking, ReconFS is able to reconstruct the volatile for enabling content transfer across the network, as well as user directory tree in both normal and unexpected failures. agents that construct proofs on behalf of the users. Michelle To evaluate the ReconFS prototype, Youyou and his team imple- walked us through an example of how Bob could remotely access mented a prototype based on on Linux. Using filebench a photo of Alice on a remote device in this system. Michelle then to simulate a metadata-intensive workload, they showed that described the authors’ design of strong tags. Tags are first-class ReconFS achieves nearly the best throughput among all evalu- objects, such that access to them is independent of content ated file systems. In particular, ReconFS improves performance access. To prevent forging, tags are cryptographically signed. by up to 46.3% in the varmail workload. Also, the embedded In their implementation, the authors mapped system calls to inverted index and metadata persistence logging enabled challenges. They cached recently granted permissions so that ReconFS to give a write reduction of 27.1% compared to ext2. the same proof would not need to be made twice. 
In their evalu- Questions were taken offline because of session time constraints. ation, they wrote detailed policies drawn from user studies using their policy languages, all of which could be encoded in Conference Luncheon and Awards their implementation and showed that their logic had sufficient Summarized by Rik Farrow expressiveness to meet user needs. They simulated access During the conference luncheon, two awards were announced. patterns because they do not have a user study based on the The first was the FAST Test of Time award, for work that perspective of the user accessing content. They ran two sets appeared at a FAST conference and continues to have a lasting of experimental setups on their prototype system: one with a impact. Nimrod Megiddo and Dharmendra S. Modha of IBM default-share user and the other with a default-protect user. Almaden Research Center won this year’s award for “ARC: A The main objective of the experiments was to measure latency Self-Tuning, Low Overhead Replacement Cache” (https://www for system calls and see whether they were low enough for .usenix.org/conference/fast-03/arc-self-tuning-low-overhead interactive users. The results showed that with the exception of -replacement-cache). readdir(), system calls fell well below the 100 ms limit that they set. The authors also showed that access control only accounted The IEEE Reynolds and Johnson award went to John Ouster- for about 5% of the total overhead. Finally, it took approximately hout and Mendel Rosenblum, both of Stanford University, for 9 ms to show that no proof could be made (access denied!), their paper “The Design and Implementation of a Log-Struc- although variance in this case can be quite high. tured File System” (http://www.stanford.edu/~ouster/cgi-bin /papers/lfs.pdf). Rosenblum commented that he was Ouster- Tiratat Patana-anake (University of Chicago) wanted clarifica- hout’s student at UC Berkeley when the paper was written. Later, tion on the tags. Michelle explained that you only need access to Rosenblum wound up working with Ousterhout at Stanford. the tags required for access to the file. Someone from University of California, Santa Cruz, wanted to know which cryptographic algorithm was used, what its overhead was, and which dis- tributed file system was used. Michelle said the proofer uses the crypto library from Java and that it doesn’t add too much

62 JUNE 2014 VOL. 39, NO. 3 www.usenix.org REPORTS

overhead. The distributed file system was homemade. Another ViewBox: Integrating Local File Systems with Cloud person from UCSC asked whether the authors intended to Storage Services replace POSIX permission standards, and the response was yes. Yupu Zhang, University of Wisconsin–Madison; Chris Dragga, University of Wisconsin–Madison and NetApp; Andrea C. Arpaci-Dusseau, and Remzi H. He went on to ask whether they have reasonable defaults because Arpaci-Dusseau, University of Wisconsin–Madison most users are lazy. Michelle first explained that their user study Yupu Zhang started by reminding us that cloud storage services indicates a broad disagreement among users on what they want are gaining popularity because of promising benefits such as from such a system. However, she agreed that users are gener- reliability, automated synchronization, and ease of access. How- ally lazy so more research into automated tagging and a better ever, these systems can fail to keep data safe when local corrup- user interface would be helpful. Finally, someone asked about tion or a crash arises, which causes bad data to be propagated to transitivity, where one person would take restricted content and the cloud. Furthermore, because files are uploaded out of order, give it to an unauthorized user. Michelle believed that there was the cloud may sometimes have an inconsistent set of data. To no solution to the problem where the person to whom you grant provide a strong guarantee that the local file system’s state is access is not trustworthy. equal to the cloud’s state, and that both states are correct, the On the Energy Overhead of Mobile Storage Systems authors developed ViewBox, which integrates the file system Jing Li, University of California, San Diego; Anirudh Badam and Ranveer with the cloud storage service. ViewBox employs checksums Chandra, Microsoft Research; Steven Swanson, University of California, San to detect problems with local data, and utilizes cloud data to Diego; Bruce Worthington and Qi Zhang, Microsoft recover from local failures. ViewBox also keeps in-memory snap- Jing Li began the talk by arguing that in spite of the low energy shots of valid file system state to ensure consistency between overhead of storage devices, the overhead of the full storage the local file system and the cloud. stack on mobile devices is actually enormous. He presented the authors’ analysis of the energy consumption used by components Yupu presented the results of detailed experimentation, which of the storage stack on two mobile devices: an Android phone and revealed the shortcomings of the current setup. In their first a Windows tablet. After giving the details of the experimental experiment, the authors corrupted data beneath the file system setup, he showed the microbenchmark results, which revealed to see whether it was propagated to the cloud. ZFS detected that the energy overhead of storage stack is 100 to 1000x higher all corruptions because it performed data checksumming, but than the energy consumed by the storage device alone. He then services running on top of ext4 propagated all corruptions to the focused on the CPU’s busy time, which showed that 42.1% of the cloud. In the second experiment, they emulated a crash during busy time is spent in encryption APIs while another 25.8% is synchronization. Their results showed that without enabling spent in VM-related APIs. data journaling on the local file system, the synchronization ser- vices would behave mostly erratically. 
All three services that the Li and his team first investigated the true cost of data encryp- authors tested violated causal ordering that is locally enforced tion by comparing energy consumption between devices with by fsync(). and without encryption. Surprisingly, having encryption costs on average 2.5x more energy. Next, Li gave a short review of the Next, Yupu gave an overview of the architecture. They modified benefits of isolation between applications and of using managed ext4 to add checksums. Upon detecting corrupt data, ext4-cksum languages. He then showed the energy overhead for using man- can communicate with a user daemon named Cloud Helper aged languages, which is anywhere between 12.6% and 102.1% via ioctl() to fetch correct data from the cloud. After a crash, (for Dalvik on Android!). Li ended his talk with some sugges- the user is given the choice of either recovering inconsistent tions. First, storage virtualization can be moved into the storage files individually or rolling back the entire file system to the hardware. Second, some files, such as the OS library, do not need last synchronized view. The View Manager, on the other hand, to be encrypted. Therefore, a partially encrypted file system creates consistent file system views and uploads them to the would help reduce the energy overhead. Finally, hardware-based cloud. To provide consistency efficiently, they implemented solutions (e.g., DVFS or ASIC) can be used to support encryption two features: cloud journaling and incremental snapshotting. or hardware virtualization while keeping energy cost low. The basic concept of cloud journaling is to treat the cloud as an ex­ternal journal by synchronizing local changes to the cloud at Yonge (University of California, Santa Cruz) asked how the file system epochs. View Manager would continuously upload energy impact of DRAM was measured. Li said that they ran the the last file system snapshot in memory to the cloud. Upon fail- benchmark to collect the I/O trace. They then replayed the I/O ure, it would roll the file system back to the latest synchronized trace to obtain the energy overhead of the storage stack. As such, view. Incremental snapshotting allows for efficient freezing the idle power was absent from the obtained results. of the current view by logging namespace and data changes in memory. When the file system reaches the next epoch, it will update the previous frozen view without having to interrupt the next active view.

www.usenix.org JUNE 2014 VOL. 39, NO. 3 63 REPORTS

In their evaluation, the authors showed that ViewBox could cor- and ­erasure codes use multiple parity disks/parity sectors (the rectly handle all the types of error mentioned earlier. ViewBox cost of which is prohibitive) to provide either device-level or has a runtime overhead of less than 5% and a memory overhead sector-level tolerance. They proposed a general (without any of less than 20 MB, whereas the workload contains more than 1 restriction) and space-efficient family of erasure codes that can GB of data. tolerate simultaneous device and sector failures. Someone asked how much they roll back after detecting corrup- The key idea of STAIR codes is to base protection against sec- tion. Yupu responded that if you care about the whole file system, tor failure on a pattern of how sector failures occur, instead ideally, you roll back the whole file system, but generally you just of setting a limit on tolerable sector failures. The actual code roll back individual files. The same person asked what happens structure builds based on two encoding phases, each of which if an application is operating on a local copy that gets rolled back. builds on any MDS code that works as long as the parameters Yupu said that the application would have to be aware of the support the code. The interesting part of their approach is the rollback. Mark Lillibridge (HP) suggested that a correctly writ- upstairs and downstairs encoding that can reuse computed par- ten application should make a copy, change that copy, and then ity results, thus providing space efficiency and complementary rename the copy to avoid problems like this. Yupu agreed. performance advantages. Evaluation results showed that STAIR codes improve encoding by up to 100% while achieving storage RAID and Erasure Codes Summarized by Yue Cheng ([email protected]) space efficiency. Their open-sourced coding library can be found at: http://ansrlab.cse.cuhk.edu.hk/software/stair. CRAID: Online RAID Upgrades Using Dynamic Hot Data Reorganization Someone asked whether the authors managed to measure the Alberto Miranda, Barcelona Supercomputing Center (BSC-CNS); Toni Cortes, overhead (reading the full stripe) of some special (extra) sectors’ Barcelona Supercomputing Center (BSC-CNS) and Technical University of encoding every time they had to be recreated in a RAID device. Catalonia (UPC) Patrick said their solution needed to read the full stripe from the Alberto Miranda presented an economical yet effective approach RAID so as to perform the upstairs and downstairs encoding. for performing RAID upgrading. The authors’ work is moti- Umesh Maheshwari (Nimble Storage) asked whether the sce- vated by the fact that storage rebalancing caused by degraded nario where errors come in a burst was an assumption or a case uniformity is way too expensive. Miranda proposed CRAID, a that STAIR took care of; his concern was that in SSD the errors data distribution strategy that redistributes only hot data when might end up in a random distribution. Patrick said the error performing RAID upgrading by using a dedicated small partition burst case was an issue they were trying to address. in each device as a persistent disk-based cache. Parity Logging with Reserved Space: Towards Efficient Taking advantage of the I/O access pattern monitoring that is Updates and Recovery in Erasure-Coded Clustered used to keep track of data access statistics, an I/O redirector can Storage strategically redistribute/rebalance only frequently used data to Jeremy C. W. 
Chan, Qian Ding, Patrick P. C. Lee, and Helen H. W. Chan, The the appropriate partitions. The realtime monitoring provided by Chinese University of Hong Kong CRAID guarantees that a newly added disk will be used as long Patrick Lee also presented CodFS, an erasure-coded clustered as it is added. Statistics of data access patterns are also used to storage system prototype that can achieve both high update and effectively reduce the cost of data migration. The minor disad- recovery performance. To reduce storage costs and footprint, vantage of this mechanism is that the invalidation on the cache enterprise-scale storage clusters now use erasure-coded storage partition results in the loss of all the previous computation. rather than replication mechanisms that incur huge overheads. Trace-based simulations conducted on their prototype showed However, the issues faced by erasure coded storage systems are that CRAID could achieve competitive performance with only that updates are too costly and recovery is expensive as well. To 1.28% of all available storage capacity, compared to two alterna- deal with this, Patrick and his students propose a parity-logging tive approaches. scheme with reserved space that adopts a hybrid in-place and log-based update mechanism with adaptive space readjustment. Someone asked Alberto to comment on one of the workload char- acteristics that can impact the rebalancing algorithm. Miranda They studied two real-world storage traces and, based on the said that the available traces they could find were limited. observed update characteristics, propose a novel delta-based parity logging with reserved space (PLR) mechanism that STAIR Codes: A General Family of Erasure Codes for reduces disk seeks by keeping each parity chunk and its par- Tolerating Device and Sector Failures in Practical Storage ity delta next to each other with additional space, the capacity Systems Mingqiang Li and Patrick P. C. Lee, The Chinese University of Hong Kong of which can be dynamically adjusted. The challenges lie in how much reserved space is most economical and the timing of Patrick Lee presented STAIR codes, a set of novel erasure reclaiming unused reserved space. codes that can efficiently tolerate failures in both device and sector levels. What motivates STAIR is that traditional RAID

64 JUNE 2014 VOL. 39, NO. 3 www.usenix.org REPORTS

In-place updates are basically overwriting existing data and When a page fault occurs, it is always allocated in DRAM parity chunks while log-based updates are appending changes regardless of whether it was a read or write operation; and, when by converting random writes to sequential writes. Their hybrid the DRAM fills up, pages in the DRAM are migrated to the approach smartly maintains the advantages of the above two PCM. When a write operation occurs on a page in the PCM, it update schemes while mitigating the problems that might occur. attempts to migrate this page to DRAM. If there is no free space in the PCM, a page is evicted according to rules from the CLOCK Konstantin Shvachko (WANdisco) asked how restrictions on algorithm. I/O patterns (i.e., cluster storage systems that only support sequential reads/writes) affect the performance of workloads. The authors’ results show that they obtained a high hit ratio Patrick said performance purely depends on workload types, regardless of PCM size in hybrid memory architectures. They and their work looked in particular at server workload that they reduced the number of PCM writes by up to 75% compared to attempted to port into cluster storage systems. Brent Welch state of the art algorithms and by 40% compared to the CLOCK (Google) mentioned two observations: (1) real-world workloads algorithm. might consist of lots of big writes due to the fact that there are On the Fly Automated Storage Tiering many big files distributed in storage; (2) an Object Storage Satoshi Iwata, Kazuichi Oe, Takeo Honda, and Motoyuki Kawaba, Fujitsu Device layer is unaware of data types and is decoupled from the Laboratories lower layer file system. Patrick agreed with these observations The authors present issues with existing storage-tiering and said (1) people can choose a different coding scheme based techniques and propose their own technique to overcome the on the segments, and (2) the decoupling problem remains an shortcomings of existing approaches. Storage tiering combines issue left to be explored in future work. fast SSDs with slow HDDs such that hot data is kept on SSDs and Poster Session I cold data is stored on HDDs. Less frequently accessed data in Summarized by Sonam Mandal ([email protected]) SSDs is swapped out at predefined intervals to follow changes in workloads. Existing methods have difficulty following workload In-Stream Big Data Processing Platform for Enterprise changes quickly as the migration interval cannot be reduced too Storage Management Yang Song, Ramani R. Routray, and Sandeep Gopisetty, IBM Research far. Shorter intervals lead to lower I/O performance due to more disk bandwidth consumed by migration. This poster presents an approach for in-stream processing of Big Data, which is becoming increasingly important because of The authors propose an on-the-fly agreed service time (AST) the information explosion and subsequent escalating demand to follow any workload changes within minutes, instead of for storage capacity. The authors use Cassandra as their stor- the granularity of hours or a day with previous methods. Less age, Hadoop/HDFS for computation, RHadoop as the Machine frequently accessed data is filtered out from migration candi- Learning algorithm, and IBM InfoSphere Streams for their use dates, thus decreasing bandwidth consumption, even though it case example. They implemented many ensemble algorithms like results in unoccupied SSD storage space. 
They use a two-stage Weighted Linear Regression with LASSO, Weighted General- migration-filter approach. The first stage filters out hotter but ized Linear Regression (Poisson), Support Vector Regression, not very hot segments by checking access concentrations. The Neural Networks, and Random Forest. They use their tech- second filters out segments for which the hot duration is not long nique to identify outliers in 2.2 million backup jobs each day for enough. jobs having 20+ metrics. They try to identify anomalies using When 60% of the total I/O is sent to fewer than 20 segments the Contextual Local Outlier Factor algorithm before storing (approximately 10% of data size), then these segments are on HDFS/Cassandra. To do so, they leverage backup-specific marked as candidates. When a segment has been marked as a domain knowledge and have shown the results of their outlier candidate three times in a row, it is migrated to the SSD. The detection experiment in the poster. authors’ evaluation results back their claims and show that Page Replacement Algorithm with Lazy Migration for workload changes can be followed in a matter of minutes using Hybrid PCM and DRAM Memory Architecture their approach. Minho Lee, Dong Hyun Kang, Junghoon Kim, and Young Ik Eom, Sungkyunkwan University SSD-Tailor: Customization System for Enterprise SSDs The authors came up with a page replacement algorithm to Hyunchan Park, Youngpil Kim, Cheol-Ho Hong, and Chuck Yoo, Korea University; Hanchan Jo, Samsung Electronics benefit hybrid memory systems using PCM by reducing the number of write operations to them. PCM is non-volatile and This poster presents SSD-Tailor, a customization system for has in-place update and byte-addressable memory with low SSDs. With the increasing need to satisfy customers for require- read latency. PCM suffers from a low endurance problem, where ments of high performance and reliability for various workloads, only a million writes are possible before it wears out, and the it becomes difficult to design an optimal system with such a write latency of PCM is much slower than its read latency. PCM large number of potential configuration choices. SSD-Tailor reduces the overhead of migration by using lazy migration. determines a near-optimal design for a particular workload of www.usenix.org JUNE 2014 VOL. 39, NO. 3 65 REPORTS

an enterprise server. It requires three inputs: customer require- Content-Defined Chunking for CPU-GPU Heterogeneous ments, workload traces, and design options. It has three com- Environments ponents: Design Space Explorer, Trace-driven SSD Simulator, Ryo Matsumiya, The University of Electro-Communications; Kazushi Takahashi, Yoshihiro Oyama, and Osamu Tatebe, University of Tsukuba and and Fitness Analyzer, which work together iteratively in a loop JST, CREST to produce a near-optimal design for the given requirements and Chunking is an essential operation in deduplication systems, workload traces. and Content-Defined Chunking (CDC) is used to divide a file Design options consist of a type of flash chip, FTL policies, etc. into variable-sized chunks. CDC is slow as it calculates many Customer requirements may include low cost, high performance, fingerprints. The authors of this poster came up with parallel- high reliability, low energy consumption, and so on. Workload izing approaches that use both GPU and CPU to chunk a given traces are full traces rather than extracted profiles. file. Because of the difference in speed of CPU and GPU, the challenge becomes that of task scheduling. They propose two The Design Space Explorer used genetic algorithms to find near- methods, Static Task Scheduling and Dynamic Task Scheduling, optimal SSD design. Genetic algorithms mimic the process of to efficiently use both the GPU and CPU. natural selection. The Trace-driven SSD Simulator (the authors used DiskSim) can help change and display the design options Static Task Scheduling uses a user-defined parameter to deter- easily, because in spite of a reduced set of design options, too mine the ratio of dividing the file such that one part is assigned many still need to be analyzed. The Fitness Analyzer evaluates to the GPU and the other part to the CPU. Each section is further the simulation results based on scores. divided into subsections, which are each assigned to a GPU thread or CPU thread. The authors show that tailoring overhead is high but needs to be done only once. The benefits obtained are quite high according Dynamic Task Scheduling consists of an initial master thread, to their results. They compared SSD-Tailor with a brute force which divides a file into distinct, fixed-sized parts called large algorithm as their baseline. sections. Each is assigned to a GPU thread. For detection of chunk boundaries lying across more than one segment, small Multi-Modal Content Defined Chunking for Data subsections are created, including data parts across boundaries Deduplication Jiansheng Wei, Junhua Zhu, and Yong Li, Huawei of large sections. Each small subsection is assigned to the CPU. Worker threads are created for GPU and CPU to handle these The authors of this paper have come up with a deduplication sections. While the task queue is not empty, each worker will mechanism based on file sizes and compressibility informa- perform CDC; if a queue becomes empty, then a worker will steal tion. They identified that many file types such as mp3 and jpeg tasks from another worker’s queue. are large, hardly modified, and often replicated as is; such files should have large chunk sizes to reduce metadata volume with- The authors ran experiments to find the throughput of their out sacrificing too much in deduplication ratio. Other file types static and dynamic methods and compared them to CPU-only consist of highly compressible files, some of which are modified and GPU-only methods. 
They showed that static performs the frequently, and these benefit from having small chunk sizes to best and dynamic follows closely behind it. The benefit of having maximize deduplication ratio. a dynamic method is to avoid tuning parameters as is required for the static method. The authors propose two methods, both of which require a pre- processing step of creating a table for size range, compressibility Hash-Cast: A Dark Corner of Stochastic Fairness range, and the expected chunk size. The first method divides Ming Chen, Stony Brook University; Dean Hildebrand, IBM Research; Geoff data objects into fixed-sized blocks and estimates their compres- Kuenning, Harvey Mudd College; Soujanya Shankaranarayana, Stony Brook University; Vasily Tarasov, IBM Research; Arun O. Vasudevan, Stony Brook sion ratio using sampling techniques. Adjacent blocks with simi- University; Erez Zadok, Stony Brook University; Ksenia Zakirova, Harvey lar compression ratios are merged into segments. Segments are Mudd College divided into chunks using content-defined chunking techniques, This poster uncovers Hash-Cast, a networking problem that and these chunk boundaries may override segment boundaries. causes identical NFS clients to get unfair shares of the network Then the chunk fingerprints are calculated. bandwidth when reading data from an NFS server. Hash-Cast is a dark corner of stochastic fairness where data-intensive In the second approach, many candidate chunking schemes TCP flows are randomly hashed to a small number of physical using Content Defined Chunking (CDC) with different expected transmit queues of NICs and hash values collide frequently. chunk sizes are generated in a single scan. One chunking scheme Hash-Cast influences not only NFS but also any storage servers is used to calculate the compression ratio of its chunks, and hosting concurrent data-intensive TCP streams, such as file chunks with similar compression ratios are merged together. servers, video servers, and parallel file system servers. Hash-Cast These chunking results are directly used and their fingerprints is related to the bufferbloat problem, a phenomenon in which are calculated. Their experimental results show that their Multi- excessive network buffering causes unnecessary latency and Modal CDC can reduce the number of chunks by 29.1% to 92.4%. poor system performance. The poster also presents a method to

66 JUNE 2014 VOL. 39, NO. 3 www.usenix.org REPORTS

work around Hash-Cast by changing the default TCP congestion application is composed of many services. The benefits of SOA control algorithm from TCP CUBIC to TCP VEGAS, another are reusability and ease of management. Moreover, by decom- algorithm that alleviates the bufferbloat problem. posing big software into small services, the services are easier to tailor to each user subset. A statistic shows that small (less than MapReduce on a Virtual Cluster: From an I/O Virtualization Perspective $1 million) projects have higher success rates than big projects. Sewoog Kim, Seungjae Baek, and Jongmoo Choi, Dankook University; Asanović then compared old wisdom to new wisdom in many Donghee Lee, University of Seoul; Sam H. Noh, Hongik University related aspects. In the old wisdom, we cared only about building The authors of this poster analyze two main questions with fault-tolerant systems. Today, we need to care about tail-latency respect to the MapReduce framework making use of virtual- tolerance, too, which means building predictable parts from ized environments. They try to analyze whether Hadoop runs less predictable ones. Many techniques can be used to build a efficiently on virtual clusters and whether any I/O performance tail-tolerant service. We can use software to reduce component degradation is seen, then whether they can be mitigated by variation by using different queues or breaking up tasks into exploiting the characteristics of I/O access patterns observed small parts, or try to cope with variability by hedging requests in MapReduce algorithms. They ran a Terasort ­benchmark and when the first request result was slow, for example, or by trying found that the I/O is triggered in a bursty manner, requested requests in different queues. We can also try to improve the intensively for a short period, and sharply increased and hardware by reducing overhead, reducing queuing, increasing decreased. Some phases utilize a memory buffer. Virtual network bisection bandwidth, or using partitionable resources. machines share I/O devices in a virtual cluster; thus, this bursty I/O may cause I/O interference among VMs. When The second thing that Asanović compared was the memory VMs request bursty I/Os concurrently, the I/O bandwidth hierarchy. In the old days, we had DRAM, disk, and tape. Now, we ­suffers a performance drop of about 31%, going from 1 to 4 have DRAM, NVRAM, and then disk. In the future, we will have a VMs. Additionally, long seek distance and high context new kind of NVM that has DRAM read latency and endurance. In switch overheads exist among VMs. terms of memory hierarchy, there will be more levels in the mem- ory hierarchy, and we might see a merging between high-capacity To help mitigate this issue, the authors propose a new I/O sched- DRAM and flash memory into something new such as PCM. uler for Hadoop on a virtual cluster, which minimizes the I/O interference among VMs and also exploits the I/O burstiness Third, Moore’s Law is dead (for logic, SRAM, DRAM, and in MapReduce applications. Their new I/O scheduler controls likely 2D flash), said Asanović. The takeaway is that we have bandwidth of VMs using Cgroups-blkio systems and operate at a to live with this technology for a long time, and improvements higher layer than the existing scheduler. The Burstiness Monitor in system capability will come from above the transistor level. detects bursty I/O requests from each VM. 
The Coarse-grained More importantly, without Moore’s Law and scaling, the cost of Scheduler allows a bursty VM to use the I/O bandwidth exclu- custom chips will come down because of the amortized cost of sively for a time quantum in round-robin manner. This allows technology. the overhead caused by context switching among multiple VMs Fourth, Asanović discussed which ISA (Instruction Set Archi- to be reduced, along with reducing overall seek distance and tecture) was better, ARM or x86. He said the real important execution time. Their experimental results verify the perfor- difference was that we could build a custom chip with ARM, not mance gains using their approach. Intel. He added that ISA should be an open industry standard. Keynote Presentation The goal is to have an open source chip design, such as Berkeley’s FireBox: A Hardware Building Block for 2020 Warehouse- RISC-V, which is an open ISA. Scale Computers Moreover, Asanović said that security is very important. The key Krste Asanović, University of California, Berkeley Summarized by Tiratat Patana-anake ([email protected]) is to have all data encrypted at all times. He also said that rather than using shared memory to do inter-socket communication, Krste Asanović told a story about the past, present, and future of message passing has won the war, which is also a better match Warehouse Scale Computing (WSC), which has many applica- for SOA. tions but will gain popularity because of the migration to the extreme, with the cloud backing all devices. We moved from Next, Asanović presented the FireBox, which is a prototype of commercial off-the-shelf (COTS) servers to COTS parts, and 2020 WSC design. It is a custom “Supercomputer” for interactive we’ll move to custom everything, said Asanović. and batch application that can support fault and tail tolerance, will have 1000 SoC, 1 terabit/s high radix photonic switch, and Asanović stated that the programming model for WSC needs 1000 NVM modules for 100 PB total. FireBox SoC will have 100 to change. Currently, the silo model is used, but he believes that homogeneous cores per SoC with cache coherence only on-chip Service Oriented Architecture (SOA) is better. In SOA, each and acceleration module (e.g., vector processors). NVM stack component is a service that connects via network, and each will have photonic I/O built in. Photonic switch is monolithic www.usenix.org JUNE 2014 VOL. 39, NO. 3 67 REPORTS

integrated silicon photonics with wave-division multiplexing by optimizing these few VMs. In their study, they also found a (WDM). Photonics will have bandwidth of 1 Tb in each direc- diverse set of file systems were used: For each virtual machine, tion. Moreover, data will always be encrypted. Asanović said the as many as five file systems were used. Thus, about 20 virtual bigger size will reduce operation expenses, can support a huge file systems were used in each host machine on average. The in-memory database, and has low latency network to support more CPUs and memory the host machine has, the more file SOA. There are still many open questions for FireBox, such as systems. The data churn rate at the virtual machine layer is how do we use virtualization, how do we process bulk encrypted lower than at the host layer: 18% and 21%, respectively. They also memory, and which in-box network protocol should we use? looked at I/O amplification and deduplication rate. Comparing the number of I/Os at the virtual machine layer and the physical Finally, Asanović talked about another related project, DIABLO, block layer at the host , they found that amplifi- which is an FPGA that simulates WSC. By using DIABLO, they cation appears more often than deduplication. Finally, they used found that the software stack is the overhead. a k-means clustering algorithm to classify application workloads Rik Farrow wondered about the mention of Linux as the sup- and found a strong correlation between CPU usage and I/O or ported operating system. Asanović said that people expected network usage. a familiar API, but that FireBox would certainly include new One person asked for the breakdown of true deduplication operating system design. Kimberly Keeton (HP Labs) asked who and caching effect. He suggested that the deduplication rates is the “everybody” that considers using custom design chips. presented could be the combination of both. The author acknowl- Asanović answered big providers are building custom chips, and edged that he was correct: They measured the total I/Os at the if we move to an open source model, we will start seeing more. virtual machine layer and the physical host layer and were not Keeton also asked a question about code portability. Asanović able to distinguish between the caching effect and true dedupli- explained that 99.99% of code in applications doesn’t need an cation. Fred Douglis (EMC) pointed out that the comparison of accelerator. So, only the .01% that does need accelerators will the data churn rate presented in this work and his own previous need any changes. work was not appropriate because the workloads for this paper Someone from VMware asked about the role of disk in FireBox. were primary workloads, whereas Fred’s work studied backup Asanović replied that disk and DC-level network are outside workloads. Yaodong Yang (University of Nebraska-Lincoln) the scope of this project right now. Tom Spinney (Microsoft) asked about the frequency of virtual machine migration in their asked about protecting keys and cryptography engines. Asanović datacenters. The authors replied that the peak load varied over responded that they planned on using physical mechanisms, and time; the peak load in the middle of night could be correlated that this was an area of active research. with virtual machine migration activities. 
Christos Karamano- lis (VMware) asked a few fundamental clarification questions Experience from Real Systems about their measurements, such as what the definition of a Summarized by Xing Lin ([email protected]) virtual I/O was, what the side-effect was from write buffering (Big) Data in a Virtualized World: Volume, Velocity, and and read caching at the guest OS, and which storage infrastruc- Variety in Cloud Datacenters ture was used at the back end. The authors said they collected Robert Birke, Mathias Bjoerkqvist, and Lydia Y. Chen, IBM Research Zurich I/O traces at the guest OS and host OS level and suggested that Lab; Evgenia Smirni, College of William and Mary; Ton Engbersen, IBM Research Zurich Lab people come to their poster for more details. Mathias Bjoerkqvist started his presentation by noting that From Research to Practice: Experiences Engineering virtualization is widely used in datacenters to increase resource a Production Metadata Database for a Scale Out File utilization, but the understanding of how I/O behaves in virtual- System ized environments is limited, especially at large scales. To get a Charles Johnson, Kimberly Keeton, and Charles B. Morrey III, HP Labs; Craig A. N. Soules, Natero; Alistair Veitch, Google; Stephen Bacon, Oskar Batuner, better understanding, they collected I/O traces in their private Marcelo Condotta, Hamilton Coutinho, Patrick J. Doyle, Rafael Eichelberger, production datacenters over a three-year period. The total I/O Hugo Kiehl, Guilherme Magalhaes, James McEvoy, Padmanabhan Nagarajan, trace was 22 PB, including 8,000 physical host machines and Patrick Osborne, Joaquim Souza, Andy Sparkes, Mike Spitzer, Sebastien 90,000 virtual machines. Most of virtual machines were Win- Tandel, Lincoln Thomas, and Sebastian Zangaro, HP Storage dows and ran within VMware. Then they looked at how capacity Kimberly Keeton characterized her team’s work as special, and data changed and characterized read/write operations at covering their experience in transforming a research prototype the virtual machine layer and the host layer. They also studied (LazyBase) into a fully functional production (Express Query). the correlation between CPU, I/O, and network utilization for Unstructured data grows quickly, at 60% every year. To make applications. use of unstructured data, the metadata is usually used to infer the underlying structure. Standard file system search function What’s most interesting in their findings is that most contri- is not feasible for providing rich metadata services, especially in butions to the peak load came from only one third of virtual scale-out file systems. The goal of their work is to design a meta- machines, which implies that we could improve the system

68 JUNE 2014 VOL. 39, NO. 3 www.usenix.org REPORTS

data database, to allow rich metadata queries for their scale-out could have some performance overheads and that performance file systems. could potentially be improved if the system were layer-aware. Their work studied how each layer altered I/O requests and LazyBase was designed to handle high update rates, at the exploited two use cases to demonstrate that performance could expense of some amount of staleness. This matches well with the be improved by making the system layer-aware. design of Express Query. Thus, they started with LazyBase and made three main changes to transform it into Express Query. Tyler showed that at the HDFS layer, only 1% of all read/write The first change was to eliminate automatic incremental ID for requests it received were writes. However, writes became long strings and ID remap because the mapping cannot be done 21% because of compaction and logging used in HDFS. At the lazily and the assignment of ID for a string cannot be done in local file system layer, 45% of requests were writes because of parallel. Through experimentation, they found that ID-based three-way replication. At the disk layer, the percentage of writes lookup or joins was inefficient: minutes for ID-based versus was 64%. The same trend held when they considered I/O size. seconds for non-ID based. Two thirds of data is cold. Then Tyler presented the distribu- tion of file sizes. Half of files used in HBase had sizes smaller The second change was related to the transaction model in than 750 KB. For access locality, they found a high temporary LazyBase, which allows updates to be applied asynchronously at locality: the hit rate was 25% if they kept accessed data for 30 a later time. When users delete a file, there is no way to reliably minutes and 50% for two hours. Spatial locality was low, and as read and delete the up-to-date set of custom attributes. To deal much as 75% reads were random. Then they looked into what with this problem, the authors introduced timestamps to track hardware upgrade was most effective in terms of I/O latency file operation events and attribute creations. These timestamps and cost. They suggested that adding a few more SSDs gave the were then used to check for attribute validation during queries. most benefit with little increase in cost; buying more disks did To get the metadata about files, they put a hook in the journaling not help much. To demonstrate that performance is increased mechanism in the file system. Then the metadata was aggre- by making the system layer-aware, Tyler presented two opti- gated and stored into LazyBase. To support SQL-like queries, mizations: local compaction and combined logging. Instead of they used PostgreSQL on top of LazyBase. A REST API was sending compacted data, they proposed sending the compact designed to make Express Query easier and more flexible to use. command to every server to initiate compaction. This change Scott Auchmoody (EMC) asked about using hashing to get IDs. reduced the network I/O by 62%. With combined logging, a Kimberly said that would break the locality; tables are organized single disk in each server machine was used for logging while according to IDs and with hashing, records for related files could other disks could serve other requests. This change reduced disk be stored far away from each other. 
Brent Welch (Google) sug- head contention, and the evaluation showed a six-fold speedup gested that distributed file systems usually have an ID for each for log writes and performance improvement for compaction and file, which could be taken and used. Kimberly acknowledged that foreground reads. they had taken advantage of that ID to be a unique identity for Bill Bolosky (Microsoft Research) pointed out that file size each file, and the index and sorting are based on pathname. One distribution usually had a heavy tail and thus the mean file person noted that if metadata for files within a directory was size would be three times larger than the median size. He organized as a tree structure, the tree structure could become asked whether the authors had looked into the mean size. Tyler very huge and wondered how much complexity was involved in acknowledged that in their paper they did have numbers for managing large tree structures. Kimberly answered that they mean file size but he did not remember them. He also suggested stored metadata as a table, and the directory structure was that, because most files were small, what mattered for perfor- encoded in the path name. Shuqin Ren (Data Storage Institute, mance was the number of files, specifically metadata operations. A*STAR) asked what the overhead was when adding the hook in Brad Morrey (HP Labs) noted that for their local compaction the journaling mechanism to collect metadata. Kimberly replied to work, related segments had to be stored in a single server. that the journaling happened anyway, and they did not add any This would affect recovery performance if a server died. Tyler other instrumentation to the file system so the overhead was small. replied that they did not do simulation for recovery. However, Analysis of HDFS under HBase: A Facebook Messages the replication scheme was pluggable and they should be able to Case Study exploit different replication schemes. Margo Seltzer (Harvard) Tyler Harter, University of Wisconsin—Madison; Dhruba Borthakur, Siying suggested that it was probably wrong to use HDFS to store small Dong, Amitanand Aiyer, and Liyin Tang, Facebook Inc.; Andrea C. Arpaci- Dusseau and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison files; HDFS was not designed for that, and other systems should probably be considered. Tyler acknowledged that was a fair argu- Tyler Harter presented his work on analyzing I/O characteriza- ment, but Facebook uses HDFS for a lot of other projects and that tions for the layered architecture of HBase for storing and pro- argues for using a uniform architecture. cessing messages in Facebook. Reusing HBase and HDFS lowers the development cost and also reduces the complexity of each layer. However, Tyler also pointed out that layered architecture www.usenix.org JUNE 2014 VOL. 39, NO. 3 69 REPORTS

Automatic Identification of Application I/O Signatures identify common bursts across samples. For evaluation, they from Noisy Server-Side Traces used I/OR, a benchmark tool for parallel I/O to generate a few Yang Liu, North Carolina State University; Raghul Gunasekaran, Oak Ridge synthetic workloads and a real-world simulation: S3D. The sig- National Laboratory; Xiaosong Ma, Qatar Computing Research Institute and North Carolina State University; Sudharshan S. Vazhkudai, Oak Ridge nature extracted by their tool matched well with the actual I/O National Laboratory signature. They also compared the accuracy of their algorithm Yang Liu started his talk by introducing the second fastest with Dynamic Time Warping (DTW) and found I/OSI outper- supercomputer in the world: TITAN, which has 18,000 compute formed DTW. nodes. More than 400 users use this supercomputer for various Kun Tang (Virginia Commonwealth University) asked whether scientific computations, such as climate simulation. The Spider it was true that users actually ran an application multiple times. file system is used to provide file system service: In total, TITAN Yang answered yes, based on what they observed from the job can store 32 PB of data and can provide 1 TB/s bandwidth. scheduler log and several previous works. One person suggested Because the supercomputer is shared by multiple users, work- that if they already had the I/O signature, their tool would be loads from different users could compete for the shared storage able to reproduce that signature. Yang replied that based on their infrastructure and thus interfere with each other. Thus, it is observation, each application would likely exhibit the same I/O important to understand the I/O behavior of each application. pattern in future runs, and they could use this insight for better With that understanding, we can do a better job in scheduling scheduling. these jobs and thus reduce I/O contention. Their work proposed an approach to extract I/O signatures for scientific workloads Works-in-Progress running from server-side tracing. Summarized by Dutch Meyer ([email protected]) Client-side tracing is the other alternative but has a few draw- The FAST ’14 Works-in-Progress session started with Taejin backs. Client-side tracing requires a considerable development Kim (Seoul National University), who presented his work on effort, and it usually introduces some performance overhead, “Lifetime Improvement Techniques for Multiple SSDs.” He ranging from 2% to 8%. Different applications probably use dif- observed that write amplification in a cluster of SSDs may ferent trace formats and may not be compatible. What’s worse, degrade the device’s lifetime because of intermediate layers client-side tracing introduces extra I/O requests. So they decided which perform replication, striping, and maintenance tasks to use the coarse-grain logging at the RAID controller level at such as scrubbing. As an example, he showed how writes could the server-side. It has no overhead and does not require any user be amplified by 2.4x under Linux’s RAID-5 implementation. He effort. However, I/O requests are mixed at the server-side. The proposed integrating many different write prevention tech- challenge is to extract I/O signatures for a particular application niques, such as compression, deduplication, and dynamic throt- from this mixed I/O traffic. 
Fortunately, there are a few inherent tling, to lower the write workload, as well as smaller stripe units, features in scientific applications that can help to achieve that delta compression, and modifying erasure coding algorithms to goal. use trim in place of writes. These applications usually have two distinct phases: compute Dong Hyun Kang (Samsung Electronics) presented “Flash- phase and I/O phase. During the I/O phase, applications typi- Friendly Buffer Replacement Algorithm for Improving Perfor- cally request large writes of either intermediate results or check- mance and Lifetime of NAND Flash Storages.” He explained pointing, so there are periodic bursts for these applications. how traditional buffer replacement algorithms are based on Besides, users tend to run the same application multiple times magnetic drives and proposed an algorithm called TS-Clock to with the same configuration. Thus, I/O requests are repetitive perform cache eviction. TS-Clock is designed to limit write- for these applications. Given the job scheduler logs with start and backs and improve cache hit rate. His preliminary results end time for each job and the server-side throughput logs from show that on DBench the TS-Clock algorithm has up to a 22.7% Spider, they can extract the samples for a particular job. The improvement in hit rate and extends flash lifetime by up to insight from the authors is that the commonality across multiple 40.8%. samples tends to belong to the target application. Douglas G. Otstott (Florida International University) described The challenges of extracting I/O signatures from these samples his work towards developing a “holistic” approach to scheduling. include background noise and I/O drift. To deal with these In his presentation, “A Host-Side Integrate Flash Scheduler for challenges, the authors proposed three stages to extract the I/O Solid State Drives,” he listed the inefficiencies of the OS to flash signature: (1) data preprocessing that eliminates outliers, refines interface. Flash devices themselves have limited resources to the granularity of the samples, aligns durations, and reduces consider complex scheduling. However, in the relatively more noise by removing light I/O traffic; (2) use of wavelet transform powerful OS-layer, most of the potentially useful details about for each sample to make each sample smoother and to more individual request performance, such as read vs. write latencies, easily distinguish bursts from background noise; and (3) use of write locations, and logical to physical block mappings, are hid- the CLIQUE (Agrawal: SIGMOD ’08) clustering algorithm to den. In his alternate approach, device management occurs in the

70 JUNE 2014 VOL. 39, NO. 3 www.usenix.org REPORTS

OS stack but pushes commands down to the device to balance Dan Dobre (NEC Labs Europe) described his work on “Hybris: load, ensure that writes aren’t committed ahead of reads, and Robust and Consistent Hybrid Cloud Storage.” He noted that limit GC performance costs. His proposed interface includes potential users of cloud storage services are concerned about isolated per-die queues for read, write, and garbage collection. security and weak consistency guarantees. To address these two He is working to validate his Linux prototype in DiskSim and is concerns, he proposed a hybrid solution that employs a private working on an FPGA implementation. cloud backed by multiple public clouds. His approach maintains a trusted private cloud to store metadata locally, with strong Next, Eunhyeok Park (University of Pittsburgh) described his metadata consistency to mask the weak data consistency offered work: “Accelerating Graph Computation with Emerging Non- by public clouds. Volatile Memory Technologies.” Algorithms that access large graph data structures incur frequent and random storage access. In “Problems with Clouds: Improving the Performance of Multi- After describing the CPU bottlenecks in these workloads, he Tenant Storage with I/O Sheltering,” Tiratat Patana-anake described Racetrack, an approach based on an idealized memory (University of Chicago) described an approach to limit the model. Park’s simulations based on Pin show that it is faster and performance impact of random writes in otherwise sequential has lower energy consumption than alternatives. His prelimi- workloads. In his approach, called “I/O Sheltering,” random nary results suggest that some simple computations, for example writes are initially written sequentially inline with sequential PageRank, could be done by a CPU or FPGA on the storage writes. Later, these sheltered writes are moved to their in-place device itself. locations. He argued that this approach would be successful because random write applications are rare, memory is abun- In “Automatic Generation of I/O Kernels for HPC Applications,” dant, and NVM provides a better location to shelter indexes. He Babak Behzad (University of Illinois at Urbana-Champaign) concluded by showing significant performance improvements described his efforts to autonomously generate an I/O kernel, in multi-tenant Linux, when one random writer ran in conjunc- which is a replayable trace that includes dependency constraints. tion with many sequential writers. He proposed to integrate this His approach is to wrap I/O activity with an application-level technique into the file system and journal in the future. library to trace parallel I/O requests in high-performance computing environments. His work promises to provide better Michael Sevilla (University of California, Santa Cruz) argued analytics for future optimization, I/O auto-tuning, and system that load balancing should be based on a deeper understand- evaluation. ing of individual node resources and contention in his work on “Exploring Resource Migration Using the CephFS Metadata Florin Isaila (Argonne National Laboratory) described “Clar- Cluster.” Sevilla has been using Ceph as a prototyping platform isse: Cross-Layer Abstractions and Runtime for I/O Software to investigate migration and load balancing. He has observed Stack of Extreme Scale Systems.” His goal is to enable global that decisions about whether to migrate are often non-intuitive. 
optimization in the software I/O stack to avoid the inefficiencies Sometimes migration helps mitigate loads, but it may also hurt, of I/O requests that pass through multiple layers of uncoordi- because it denies access to a full metadata cache. He is attempt- nated memories and nodes, each of which provides redundant ing to develop a distributed, low-cost, multiple objective solver to function. He proposed providing cross-layer control abstrac- make more informed migration decisions. In closing, he demon- tions and mechanisms for supporting data flow optimizations strated the practicality of the problem by detailing an experi- with a unified I/O controller. Instead of the traditional layered ment where re-ordering name servers in a cluster under high model, Clarisse intercepts calls at different layers. Local control load resulted in a considerable performance improvement. modules at each stack layer talk over a backplane to a controller that coordinates buffering, aggregation, caching prefetching, With his work on “Inline Deduplication for Storage Caching,” optimization selection, and event processing. Gregory Jean-Baptiste (Florida International University) pro- posed enhancing existing client-side flash caches with inline Howie Huang (George Washington University) presented deduplication to increase their effective capacity and lifespan. “HyperNVM: Hypervisor Managed Non-Volatile Memory.” He He has a prototype system in place that shows better perfor- observed that warehouse-scale datacenters are increasingly mance with the IOZone benchmark over iSCSI. He noted that virtualized and that many VMs performing large amounts of this approach may lead to other architectural changes, such as I/O require high performance memory subsystems and need replacing LRU with an eviction algorithm that is aware that better systems support for emerging non-volatile memories. To evicting a widely shared block could be more painful that an solve these problems, he proposed that the hypervisor needs to unshared one. be more memory aware and that the OS and applications should pass more control of memory management to the hypervisor. Alireza Haghdoost (University of Minnesota) presented He proposed a system called Mortar, which is a general-purpose “hfplayer: High Fidelity Block I/O Trace Replay Tool.” hfplayer framework for exposing hypervisor-controlled memory to can reproduce previously captured SAN workload with high ­applications. fidelity for the purposes of benchmarking, performance evalu- www.usenix.org JUNE 2014 VOL. 39, NO. 3 71 REPORTS

ation, and validation with realistic workloads. His design is and fairness requirements of BAA, and then introduced their based on a pool of worker threads that issue I/O requests and a optimization algorithm of allocation. harvester thread to process completion events and keep track They performed two simulations: one to evaluate the BAA’s of in-flight I/O requests. To control replay speed, a load-aware efficiency, and the other monitoring BAA’s dynamic behavior algorithm dynamically matches the number of in-flight I/O under changing workloads. They also implemented a prototype, requests with the load profile in the original trace. When inter- interposing BAA scheduler in the I/O path and evaluated BAA’s arrival times are very short, threads bundle requests together to efficiency and fairness. avoid system call overhead. Umesh Maheshwari (Nimble Storage) asked if that kind of dra- Nagapramod Mandagere (IBM Research) has been analyzing matic ratio somehow changed the nature of this work. Hui Wang the performance of backup servers and has found some interest- said, suppose it was for a single disk compared with SSD. It is ing opportunities for optimization. In his presentation titled most likely that HD would be the bottleneck, but if you have a “Towards a Framework for Adaptive Backup Management,” large HD array, the speed gap between the two is not so dramati- he described some of his observations. He noted that backup cally large, so you would very easily have the balanced set cluster workloads show temporal skew, where most traffic arrives on both. Umesh asked again, assuming there was such a large at hourly boundaries but large bursts of heavy utilization are difference between the two tiers, would the BBA model still hold possible. Another problem is spatial skew, in which one backup the same performance using this approach? Hui Wang answered server is overloaded while others are not. Both these problems yes. Shuqin (Data Storage Institute Singapore) asked whether stem from backups that are managed by static policies while the their model would work on multiple nodes. Hui Wang said she workload they create is very dynamic. As client backup windows thought that their model could be easily extended to multiple get shorter and shorter (due to lack of idle times), server initi- nodes with coordination between schedulers. Kai Shen (Uni- ated backups are harder to schedule, but when the clients are versity of Rochester) asked what was particularly challenging instead allowed to push updates themselves, the backup servers in this project. Hui Wang said the most challenging part was to can become overloaded. Furthermore, backup servers them- accurately estimate the capacity of the system. selves need idle periods for their own maintenance. He proposes an adaptive model-based backup management framework that SpringFS: Bridging Agility and Performance in Elastic dynamically determines client/server pairings to minimize cli- Distributed Storage ent backup windows. Lianghong Xu, James Cipar, Elie Krevat, Alexey Tumanov, and Nitin Gupta, Carnegie Mellon University; Michael A. Kozuch, Intel Labs; Gregory R. Performance and Efficiency Ganger, Carnegie Mellon University Balancing Fairness and Efficiency in Tiered Storage In this presentation, Xu introduced the concept of elasticity in Systems with Bottleneck-Aware Allocation distributed storage. 
Elastic distributed storage means storage Hui Wang, Peter Varman, Rice University that is able to resize dynamically as workload varies, and its Hui Wang started her presentation by talking about the trend of advantages are the ability to reuse storage for other purposes or multi-tiered storage in modern datacenters with both solid state reduce energy usage; they can decide how many active servers to drives and traditional hard disks. This kind of combination has provide to work with a changing workload. By closely monitoring several advantages, including better performance in data access and reacting quickly to changes in workload, machine hours are and lower cost. At the same time, it also brings some challenges, saved. But most current storage, such as GFS and HDFS, is still such as providing fair resource allocation among clients and not elastic. If they deactivate the nodes, it will cause some data maintaining high system efficiency. To be more specific, there to not be available. Before Xu talked about SpringFS, he gave two is a big speed gap between SSD and HD, so scheduling proper examples of prior elastic distributed storage, Rabbit and Sierra, workloads to achieve high resource utilization becomes very discussing the differences between them and the disadvantages important. Hui Wang talked about fair resource allocation and of each. Fortunately, SpringFS provides balance and fills the gap high performance efficiency to make up for some disadvantages between them. of heterogeneous clusters. She talked about her team’s motiva- First, Xu showed a non-elastic example: in this case, almost all tion using several examples: single device type, multiple devices, servers must be “active” to be certain of 100% availability, so and dominant resource from both fairness and efficiency angles. it has no potential for elastic resizing. He then discussed the Based on these considerations, she presented a new allocation general rule of data layout in elastic storage and tradeoff space. model, Bottleneck-Aware Allocation (BAA), based on the notion Based on this, the authors proposed an elastic storage system, of per-device bottleneck sets. Clients bottlenecked on the same called SpringFS, which can change its number of active servers device receive throughputs in proportion to their fair shares, quickly, while retaining elasticity and performance goals. This whereas allocation ratios between clients in different bottleneck model borrows the ideas of write availability and performance sets are chosen to maximize system utilization. In this part, she offloading from Rabbit and Sierra, but it expands on previous discussed the fairness policy first, showed the bottleneck sets work by developing new offloading and migration schemes that

72 JUNE 2014 VOL. 39, NO. 3 www.usenix.org REPORTS

effectively eliminate the painful tradeoff between agility and were starting to look at modifying the compression algorithms. write performance in state-of-the-art elastic storage designs. Lin said that he did not think they modified compression itself This model, combined with the read offloading and passive and that MC could be used as a generic preprocessing stage that migration policies used in SpringFS, minimizes the work needed could benefit all of the compression. And, if they are improving before deactivation or activation of servers. compressions, they can get the same benefits by doing this as a separate stage. Muhammed (HGST) asked what is used to predict the perfor- mance ahead of time and whether the offload set value could Poster Session II be adjusted. Xu said they actually did not design this part of the Summarized by Kai Ren ([email protected]) workload in their paper. They just assumed they had the perfect VMOFS: Diskless and Efficient Object File System for predictor, and that it was possible to integrate some workload Virtual Machines predictor into their work. One person asked whether this model Shesha Sreenivasamurthy and Ethan Miller, UC Santa Cruz tolerates rack fails. Xu said yes, because their model was modi- This project is to design an object file system for virtual machine fied from HDFS, it could tolerate rack fails as well as HDFS. environment by deduplicating common files shared in many VM Another questioner asked what offloading in SpringFS essen- images. In their architecture, guest machines will run a VFS tially means. Xu explained that offloading means redirecting layer software called VMOFS to track the mapping between requests from a heavy loaded server to a lightly loaded server. inode and object, and a hypervisor manages the storage and Migratory Compression: Coarse-grained Data Reordering deduplicates file objects and stores them into underlying object to Improve Compressibility storage. Deduplication is achieved at a per-object level to reduce Xing Lin, Guanlin Lu, Fred Douglis, Philip Shilane and Grant Wallace, EMC the dedup table size. This work is still under development, and no Corporation-Data Protection and Availability Division, University of Utah experimental results were presented. Lin presented Migratory Compression (MC), a coarse-grained Characterizing Large Scale Workload and Its Implications data transformation, to improve the effectiveness of traditional Dongwoo Kang, Seungjae Baek, Jongmoo Choi, Donghee Lee, and Sam H. Noh, compressors in modern storage systems. In MC, similar data Dankook University, University of Seoul and Hongik University chunks are relocated together to improve compression factors. This project analyzes storage workloads from a collection of After decompression, migrated chunks return to their previous traces from various cloud clusters. The observations are (1) locations. This work is motivated by two points. The first is to workloads with large data are not sensitive to cache size; (2) exploit redundancy across a large range of data (i.e., many GB) cache allocation should be dynamically tuned due to irregular during compression by grouping similar data together before cache status changes; and (3) to achieve high cache hit ratio, one compressing. The second point is to improve the compression for might use a policy that quickly evicts blocks with large request long-term retention. However, the big challenge in doing MC is to size and hit counts in a special period. 
The next part of this proj- identify similar chunks efficiently and scalably; common prac- ect will be to apply these observations to design and implement a tice is to generate similarity features for each chunk because two cache service for virtual machine clusters. chunks are likely to be similar if they share many features. Automatic Generation of I/O Kernels for HPC There are two principal use cases of MC. The first case is mzip, Applications that is, compressing a single file by extracting resemblance Babak Behzad, Farah Hariri, and Vu Dang, University of Illinois at Urbana- information, clustering similar data, reordering data in the Champaign; Weizhe Zhang, Harbin Institute of Technology file, and compressing the reordered file using an off-the-shelf The goal of this project is to automatically extract I/O traces compressor. The second case is archival, involving data migra- from applications and regenerate these workloads to different tion from backup storage systems to archive tiers or data stored scales of storage systems for performance measurement or test- directly in an archive system, such as Amazon Glacier. The ing. To fulfill such a goal, the workload generation systems have evaluation result showed that adding MC to a compressor sig- long workflows. First, it extracts traces from multiple levels of nificantly improves the compression factor (23–105% for gzip, the I/O flow—application, MPI-I/O, and POSIX I/O levels—and 18–84% for bzip2, 15–74% for 7z, and 11–47% for rzip), and we from multiple processes. The second step is to merge these can also see the improvement of compression throughput with traces and understand the dependency between I/O operations MC. In all, MC improves both compression factor and through- presented in the traces. To construct such a dependency, it needs put by deduplication and reorganization. to understand semantics of I/O operations, which is mostly inferred from MPI-I/O library calls. For operations without clear Cornel Constantinescu (IBM Almaden Research Lab) asked dependency, timestamps are used to decide the order. The last whether this model compressed the file. Lin said that in the mzip step is to divide the traces, and generate binaries to replay the case, they un-compacted the whole file. Constantinescu asked traces or simulate the workloads. whether they did a comparison with other similar work, such as work used in Google’s BigTable. Lin admitted he did not know that. Jacob Lorch (Microsoft Research) asked whether they www.usenix.org JUNE 2014 VOL. 39, NO. 3 73 REPORTS

Poster Session II which is used by many applications. Kim gave a stunning exam- Summarized by Qian Ding ([email protected]) ple where one insert of 100 bytes resulted in nine random writes VMOFS: Breaking Monolithic VM Disks into Objects of 4 KB each. Currently, the database calls fsync() twice: once Shesha Sreenivasamurthy and Ethan Miller, UC Santa Cruz for journaling and once for insertion. Kim and his colleagues Shesha and Ethan propose a solution of efficient file sharing proposed to obviate the need for database journaling by imple- among virtual machines (VMs) through an object-based root file menting a variant of multi-version B-tree named LS-MVBT. system. They are building a file system called VMOFS, which Kim presented several optimizations made to LS-MVBT based adopts object-level deduplication to store monolithic VM images. primarily on the characteristics of the common workloads. Lazy The hypervisor layer in the system encapsulates the OSD layer split seeks to reduce I/O traffic during node split by simultane- and controls the communication between file system and the ously garbage collecting dead entries so that the existing node OSDs via iSCSI. VMOFS is still under development so there is no (aka lazy node) can be reused. However, if there are concurrent evaluation at the current stage. transactions and the dead entries are still being accessed, then a Characterizing Large Scale Workload and Its Implications workaround is needed. Their solution is to reserve space on each Dongwoo Kang, Seungjae Baek, Jongmoo Choi, Donghee Lee, and Sam H. Noh, node so that new entries can still be added to the lazy node with- Dankook University, University of Seoul and Hongik University out garbage collection. To further reduce I/O traffic, instead of Dongwoo Kang presented work about characterizing a group of periodic garbage collection, LS-MVBT does not garbage collect recently released storage traces and explained their observations unless space is needed. Next, Kim showed that by not updating and possible implications. The traces the team used were from the header pages with the most recent file change counter, only MSR-Cambridge and FIU. Their first finding was that such one dirty page needs to be flushed per insertion. He pointed out traces were not sensitive to cache size when using a simple LRU that this optimization would not increase overhead during crash cache but have much variation on the change of cache hit ratio recovery because all B-tree nodes must be scanned anyway. and I/O size. They also found long inter-reference gaps (IRG) Lastly, they disabled sibling redistribution from the original from the traces and provide that as a reason for low cache sensi- MVBT and forced a node split whenever a node became full. Kim tivity. They propose two ways to improve the hit ratio: dynamic showed that this optimization actually reduced the number of cache allocation and evicting short IRGs in the cache. They also dirty pages. propose ways for optimizing cache utility on a virtualized stor- In their evaluation, Kim showed that LS-MVBT improves age environment. performance of database insertions by 70% against the original MicroBrick: A Flexible Storage Building Block for Cloud SQLite implementation (WAL mode). LS-MVBT also reduces Storage Systems I/O traffic by 67%, which amounts to a three-fold increase in Junghi Min, Jaehong Min, Kwanghyun La, Kangho Roh, and Jihong Kim, the lifetime of NAND flash. 
LS-MVBT is also five to six times Samsung Electronics and Seoul National University faster than WAL mode during recovery. Lastly, LS-MVBT out- Jaehong Min presented the design of MicroBrick for cloud stor- performs WAL mode unless more than 93% of the workload are age. The authors tried to solve the diverse resource requirement searches. At the end of the talk, Kim’s colleague demonstrated problem, such as balancing the computing and storage resource a working version of their implementation, showing the perfor- in cloud services. Each MicroBrick node adopts flexible control mance improvement in a simulated environment. for both CPU-intensive and storage-intensive configuration through a PCIe switch. Thus, when the cloud system is com- Journaling of Journal Is (Almost) Free posed of MicroBrick nodes, they can use a software management Kai Shen, Stan Park, and Meng Zhu, University of Rochester layer for autoconfiguration for different resource requirements. Kai Shen started by arguing that journaling of journal is a viola- The preliminary evaluation of MicroBrick shows competitive tion of the end-to-end argument and showed that adding ext4 results by running cloud computation (e.g., wordcount) and sort journaling to SQLite incurs a 73% slowdown in their experi- programs. ments. Existing solutions require substantial changes to either the file system or the application. The authors proposed two OS and Storage Interactions simple techniques. Summarized by Kuei Sun ([email protected]) Single-I/O data journaling attempts to minimize the number of Resolving Journaling of Journal Anomaly in Android I/O: device writes on the journal commit’s critical path. In this case, Multi-Version B-Tree with Lazy Split Wook-Hee Kim and Beomseok Nam, Ulsan National Institute of Science and data and metadata are journaled synchronously. During their Technology; Dongil Park and Youjip Won, Hanyang University experiments, they discovered a bug with ext4_sync_file(), which Wook-Hee Kim began the talk by reminding us that although unnecessarily checkpoints data to be overwritten or deleted Android is the most popular mobile platform to date, it has soon. The problem with the data journaling is the large volume severe I/O performance bottlenecks. The bottlenecks are caused written due to the page-sized granularity of ext4. Therefore, by the “journaling of journal anomaly” between ext4 and SQLite, they proposed a second technique called file adaptive journaling,

74 JUNE 2014 VOL. 39, NO. 3 www.usenix.org REPORTS

which allows each file to have a custom journaling mode. They is required to detect unsafe overwrites to metadata blocks. To found the solution effective for the journaling of journal problem. evaluate the correctness of their implementation, they corrupted In particular, write-ahead logs would prefer ordered writes of various types to simulate bugs in the trans- due to little metadata change, and rollback logs would prefer data action mechanism and successfully caught most of them. The journaling due to heavy metadata change. In their evaluation, ones they missed did not affect file system consistency. Finally, Shen showed that their enhanced journaling incurs either little they showed that adding location invariants to Recon incurs to no cost. negligible overhead. Theodore Wong (Illumina, Inc.) asked what would happen if file Keith Smith (NetApp) asked what could be done after a violation system journaling were not used. Shen explained that protection was detected. Fryer responded that while optimistically delay- of both file system metadata and application data still would be ing a commit is plausible, what they’ve done at the moment is to necessary because an untimely crash may cause the file system return an error since their first priority is to prevent a corrupt- to become inconsistent. Because the database only protects its ing write from reaching disk. Ted Ts’o (Google) wanted clari- own data in a file, an inconsistent file system may cause the fication on why losing some writes did not affect correctness database file to become inaccessible. Kim commented on the during the corruption experiments. Fryer explained that the fact that enterprise databases usually skip over the file system particular implementation of the file system was suboptimal and and operate directly on raw devices. However, lightweight data- was checkpointing some metadata blocks unnecessarily because bases may still prefer running on top of file systems for simplic- future committed versions of those blocks exist in the journal. ity. Peter Desnoyer (Northeastern University) wanted to know Therefore, losing those writes was inconsequential. Harumi whether these changes could affect the opportunity for inconsis- Kuno (HP Labs) was baffled by the fact that checking both con- tency. Shen replied that as long as the failure model is fail-stop, sistency and location invariants resulted in better performance then all of the guarantees would still apply. than just checking consistency alone. Fryer believed that it was simply due to an insufficient number of trials. Checking the Integrity of Transactional Mechanisms Daniel Fryer, Mike Qin, Kah Wai Lee, Angela Demke Brown, Ashvin Goel, and OS and Peripherals Jack Sun, University of Toronto Summarized by Matias Bjørlin ([email protected]) Daniel Fryer began his talk by showing us that corruptions DC Express: Shortest Latency Protocol for Reading Phase caused by file system bugs are frequently catastrophic because Change Memory over PCI Express they are persistent, silent, and not mitigated by existing reli- Dejan Vučinić, Qingbo Wang, Cyril Guyot, Robert Mateescu, Filip Blagojević, ability techniques. 
The authors’ previous work, Recon, ensures Luiz Franca-Neto, and Damien Le Moal, HGST San Jose Research Center; that file system bugs do not corrupt data on disk by checking the Trevor Bunker, Jian Xu, and Steven Swanson, University of California, San consistency of every transaction at runtime to prevent corrupt Diego; Zvonimir Bandić, HGST San Jose Research Center transactions from becoming durable. Because Recon performs Dejan Vučinić began on a side note by stating that the next big its checks at commit time, it requires the underlying file system thing isn’t DRAM, because of its high energy utilization, refer- to use a transaction mechanism. Unfortunately, Recon does ring to the previous FireBox keynote. He then stated the motiva- not detect bugs in the transaction mechanism. Their solution tion for his talk by showing upcoming non-volatile memories is to extend Recon to enforce the correctness of transaction and their near DRAM access timings. He explained how they mechanisms.­ each compete with DRAM on either price or latency and showed why PCM has fast nanosecond reads but microsecond writes. Recon already checks the consistency of transactions by verify- He explained that to work with PCM, the team built a prototype, ing that metadata updates within a transaction are mutually exposing it through a PCI-e interface. consistent. To enforce atomicity and durability, Recon needs to also be able to catch unsafe writes and prevent them from reach- Dejan then showed how PCI-e communicates with the host ing disk. Fryer defined two new sets of invariants, atomicity and using submission and completion queues. When a new request durability invariants (collectively called location invariants), is added to the queue, a doorbell command is issued to the PCI-e which govern the integrity of committed transactions. These device. When received, the device sends a DMA request to which invariants need to be checked on every write to make sure that the host returns the actual data request. The request is pro- the location of every write is correct. Fryer walked us through cessed, and data with a completion command at the end is sent. two types of transaction mechanisms, journaling and shadowing The handshake requires at least a microsecond before any actual paging, as well as their respective invariants. Next, he presented data is sent. To eliminate some of the overhead, they began by a list of file system features that are required for efficient invari- removing the need for ringing the doorbell. They implemented ant checking at runtime. an FPGA that continuously polls for new requests on the mem- ory bus. Then they looked at how completion events take place, Fryer and his team implemented location invariants for and which the host could poll for when the data is finished. However, btrfs by extending Recon. They had to retrofit ext3 with a meta- instead of the host polling, they show that completion can be data bitmap to distinguish between data and metadata, which www.usenix.org JUNE 2014 VOL. 39, NO. 3 75 REPORTS

inferred from the response data by inserting a predefined bit improvements. Finally, they discussed the overhead of their solu- pattern in DRAM. Data is stored in DRAM from the device and tion and show it to be negligible in most cases. However, exces- should be different; thus, they could infer when all data packets sive block remapping did have a cost during high throughput. have been received. Kai Shen (University of Rochester) asked whether MultiLanes Dejan then showed that using their approach they achieve, using adds a large memory footprint for each of the channels they a single PCI-e lane, a 1.4 ms round-trip time for 512 bytes. Even create. Junbin said that their approach is complementary to with these optimizations, there continues to be large overhead previous approaches. Yuan (UCSC) noted that for a large number involved, and thus the fundamental overhead of the communica- of containers using XFS the performance was much better than tion protocol should be solved to allow PCM and other next- for ext4. The answer from Junbin and Theodore Ts’o, the ext4 generation memories to be used efficiently. maintainer, was that it depends on the kernel version and other work. Further discussion was taken offline. Brad Morrey (HP Labs) asked why they didn’t put it on the DRAM bus. Dejan replied that there is a need for queues within the DRAM. The queues are needed to prevent stalls while wait- ing. He wants to have it there in the future, but PCM power Linux FAST Summit ’14: 2014 USENIX limitations should be taken into consideration. Peter Desnoyers Research in Linux File and Storage (Northeastern University) asked if they had a choice on how Technologies Summit PCI-e on PCM could evolve and what would they do? Dejan San Jose, CA answered that it could be used with, for example, hybrid memory February 20, 2014 cubes. Finally, Ted Ts’o (Google) noted that the polling might be Summarized by Rik Farrow expensive in power. What is the power impact? Dejan said that it is very low, as you may only poll during an inflight I/O. The Linux FAST Summit took place shortly after FAST ’14 had finished. Red Hat offices in Mountain View provided a class- MultiLanes: Providing Virtualized Storage for OS-Level room that sat 30, but the room was full to the point that some of Virtualization on Many Cores us were sitting in folding chairs at the front or back. Ric Wheeler Junbin Kang, Benlong Zhang, Tianyu Wo, Chunming Hu, and Jinpeng Huai, Beihang University (Red Hat) moderated the workshop. Junbin Kang began by stating that manycore architectures Unlike the one Linux Kernel Summit I attended, the focus of this exhibit powerful computing capacity and that virtualization event was strictly on file systems and related topics. Another is necessary to use this capacity. He then continued to argue difference was that—instead of having mainly industry in paravirtualization versus operating system virtualization and attendance making requests of kernel developers—file system ended by comparing the many layers of a traditional virtualiza- researchers, mostly students and some professors, were there to tion stack with a stack using containers. Containers are simpler make requests for changes in how the kernel works. in their architecture, but they expose scalability issues within the Linux kernel. The workshop began with Ric Wheeler encouraging people to submit changes to the kernel. He also explained the ker- To solve the scalability issues, Junbin presented MultiLanes. 
nel update process, where a release candidate will come up The data access for containers are partitioned using a parti- and be followed by multiple RCs that, outside of the first two tioned (pVFS) abstraction. pVFS exposes RCs, should only include bug fixes. Someone asked about rude a virtualized VFS abstraction for the containers. In turn, the answers from Linus Torvalds, and James Bottomley (Paral- pVFS eliminates contention within the host VFS layer and lels) responded that sending in patches with new features late improves locality of the VFS data structures. pVFS communi- in the RC process is a common way to get an angry response cates with the host using a container-specific virtualized block from Linus. Also, the larger the patch set, the less likely it will driver (vDrive). The driver takes care of submitting the I/Os be accepted. Ted Ts’o (Google) pointed out that just getting a from the container to the host system. This allows each con- response from Linus is a big deal and suggested seeing who is tainer data partition to be split and thereby avoid contention on responding to such a response and who is being ignored. He also host VFS locks. Junbin then explained the internal structure of said that large intrusions to core infrastructure are less likely the virtualized driver. to be accepted. Christoph Hellwig (freelance) pointed out that They evaluated their solution using both micro and macro­ changes to the core of the kernel are harder to validate and less benchmarks on the file systems ext3, ext4, XFS, and btrfs, with likely to be accepted, which makes it very hard to get very high- microbenchmarks being metadata operations and sequential impact changes made. writes. For all of them, the operations scale significantly better Ethan Miller (UCSC) wondered who would maintain the code with additional containers. The macrobenchmarks consist of that a PhD contributes after they graduate, and Christoph varmail, fileserver, and MySQL, showing similar performance Hellwig responded that they want the code to be so good that the

76 JUNE 2014 VOL. 39, NO. 3 www.usenix.org REPORTS

candidate gets a job supporting it later. Ric Wheeler commented need the operating system to be aware that it is an SMR drive. on “drive-by” code drops, and said those are fine; if there is broad Ted suggested that this could be part of devmapper (if part of the community support for that area, like XFS or ext4, people will OS) and could begin as a shim layer. review and support the change. James Bottomley said that it is Erez Zadok said that a group at Stony Brook had been working important to be enthusiastic about your patch. Ted Ts’o chimed with Western Digital and had received some SMR drives with in saying that even if your patch is not perfect, it could be seen a firmware that does more and more, along with a vanilla drive. as a bug report, or some portion of it might be successful. James One of his graduate students reports to Jim Molina, CTO of added that Google recruiters look for people who can take good Western Digital. Erez wanted to share what they’ve learned so ideas and turn them into working implementations. far with the Linux file system community. Jeff Darcy (Red Hat) asked where they find tests for patches, and First, the vendors will not let us put active code into drives. They James said they could look at other file systems. xfstests go way will provide a way of knowing when garbage collection (GC) is beyond XFS, and the VM crew has lots of weird tests. about to start, and some standards are evolving. Vendors are Ethan Miller launched a short discussion about device drivers, conscious of the desire for more visibility into the 500k lines of wondering who maintains them, and James Bottomley said that code already in drives. any storage device that winds up in a laptop will be accepted into Because the drives come to Erez Zadok preformatted, Ric the kernel. Ted Ts’o said that there were lots of device drivers Wheeler wondered what percentage remains random, as with no maintainer, and James Bottomley suggested that Feng- opposed to shingled, bands. Erez said that the vendors wanted guang Wu (Intel) runs all code in VMs for regression testing. less than 1% of SMR drives as random (traditional tracks) Andreas Dilger (Google) pointed out that his is just basic testing. because of the economics involved. They have already done some James said that SCSI devices might stay there forever, or until work using NULLFS with the drives, and Jeff Darcy said he had someone wants to change the interface. experimented with using a shim in the device mapper. Ted Ts’o Ric Wheeler suggested covering how the staging tree for the ker- pointed out there was a real research opportunity here in build- nel works. James explained that there are about 300 core main- ing a basic redirection layer that makes a restricted drive appear tainer trees, and kernel releases are on a two- to three-month like a plain device. cycle. Ted Ts’o said that RC1 and RC2 are where patches go in, James Bottomley thought that anything they wrote wouldn’t with patches accepted to RC3 being rare, and that by RC7 they last long, as the vendors would move the code into the drive. Eric better be really critical patches, as that is just before a version Reidel (Seagate) objected, saying that they need to determine the release. Ted also suggested that if you are a commercial device boundary between the drive and some amount of software. That builder, you want to test your device with RC2 to see if changes boundary needs to take full advantage of the technology, cover- have broken the driver for your device. 
ing the limitations and exposing the benefits. Eric encouraged Erez Zadok (Stony Brook) asked for a brief history of the memory people to think about this envelope of some hardware and some management (MM) tree. Andrew Morton (Linux Foundation) exposed interfaces. explained that MM had become a catchall tree where stuff also Someone suggested using XFS’s block allocation mechanism. falls through. Andrew collects these patches and pushes them up Ted Ts’o said that if we could solve multiple problems at once to Linus. with Dave Chinner’s block allocation scheme, it would give us a After a short break, Ric Wheeler opened the discussion about lot of power. Both ext4 and XFS have block allocation maps that shingled drives. Ted Ts’o had suggested the ;login: article by keep track of both logical and physical block addresses, but keep- Tim Feldman (Seagate) and Garth Gibson (CMU) [1] as good ing track of physical block addresses relies on getting informa- preparation for this part of the summit. Briefly, Shingled Mag- tion back from drives. netic Recording (SMR) means that written tracks overlap, like Sage Weil (Inktank) wondered how many hints the drivers shingles on a roof. These tracks can still be read, but writing would need to send to drives; For example, block A should be must occur at the end of a band of these shingled tracks, or a close to B, or A will be short-lived. James Bottomley mentioned band can be completely rewritten, starting at the beginning. that most people assume that the allocation table is at the Random writes are not allowed. Vendors can make SMR drives beginning of the disk. Erez Zadok replied that if we can produce that appear like normal drives (managing the changes), partially a generic enough abstraction layer, lots of people will use it. It expose information about the drives (restricted), or allow the could be used with SMR, but also with raw flash. operating system to control writing the drives (host-aware). Error messages are another issue. Erez said that some drives Ted Ts’o explained that he had proposed a draft interface for produce an error if the block has not been written before being SMR on the FSDEVEL list. The goal is to make the file system read, and someone else said that the driver could return all friendly to having large erase blocks like flash, which matches zeroes for initialized blocks. Erez said that currently, writing to host-aware SMR behavior. If the device is restricted, you still www.usenix.org JUNE 2014 VOL. 39, NO. 3 77 REPORTS

the first block in a zone automatically resets all blocks in that there is a log sequence number. They could wait for the return zone, but is that the correct behavior? Bernard Metzler (IBM from disk and use the log sequence number to implement osync. Zurich) wondered whether it’s best to expose the full functional- Ted said that it’s the name that’s the problem, like a cookie to ity of this storage class, as was done with flash by Fusion-io. In identify the osync write. Jeff said that getting a cookie back that 10 years, non-volatile memory (NVM) will replace DRAM, so can be used to check if a write has been committed would be should we treat NVM like block devices or memory? enough for their purposes. The mention of NVM quickly sidetracked the discussion. Vijay stated that their main concern was knowing when data Christoph Hellwig suggested that addressing NVM should not has become durable (not that different from Darcy’s concern). be an either-or decision, while Ric Wheeler suggested that the Kai Shen (Rochester) said he liked the idea, but that it’s not easy user base could decide. James Bottomley wondered about error to use in comparison with atomicity. Kai had worked on a paper handling of NVM. Someone said they had tried using NVM as about msync, for doing an atomic write to one file. Ric Wheeler main memory, but it was still too slow, so they tried it via PCI. pointed out that Chris Mason (btrfs lead) has been pushing for And they did try mmap, and the performance was not very good. an atomic write, which has not reached upstream (submitted Christoph responded that what they have called mmap needs to for a kernel update) yet, but is based on work done by Fusion-io. be fixed. The idea is to allow writes directly to a page in memory Vijay complained that with ext4, you have metadata going to the and use DMA for backing store. journal and data going to the disk, but they need a way of doing this atomically. Sage Weil pointed out that there is no such thing James Bottomley pointed out that putting NVM on the PCI bus as a rollback in file systems. Jeff Darcy said that all distributed causes problems, because it makes it cross-domain. Ric Wheeler databases have the same problem with durability. said this was a good example of the types of problems we need to know more about in that vendors can’t talk about what they are Thanumalayan finally spoke, saying that they had discovered doing, but then kernel developers and researchers don’t know people doing fsync after every write and had been searching for how to prepare for the new technology. Ted Ts’o said that he patterns of fsync behavior themselves. When they shared the knows that Intel is paying several developers to work on using patterns they had uncovered with application developers, the NVM as memory, and not on the PCI bus. developers’ convictions about how things work were so strong they almost convinced him. The room erupted in laughter. On Erez Zadok said that this would be the first time we had a byte- a more serious note, Thanumalayan asked, if he does a number addressable persistent memory, with a different point in the of operations, such as renames, do they occur in order? Andreas device space. He hoped it would not wind up like flash, where Dilger said they don’t have to, but get bunched up. Ted Ts’o men- there are very limiting APIs controlling access. tioned that an fsync on one file implies that all metadata gets After another short break, two students from the University of sync’d in ext4 and XFS, but not or btrfs. 
However, that could Wisconsin, Vijay Chidambaram and Thanumalayan Sanka- change, and we might add a flag to control this behavior later. ranarayana Pillai, along with Jeff Darcy (Red Hat), addressed Ric Wheeler liked this idea and suggested having a flag for async the group. They have a shared interest in new ways to flush data vs. fsync behaviors and being able to poll a selector for comple- from in-memory cache (pages) to disk, although for different tion. Christoph Hellwig thought this wouldn’t buy you much with reasons. Vijay started the discussion by outlining the problem existing file systems, but Ric thought it was worth something. they had faced when needing to have ordered file system writes. Ted Ts’o said they could add a new asynchronous I/O type, and The normal way of forcing a sync is via an fsync call on a partic- Christoph said that AIO sync has new opcode, IOCB_CMD_ ular file handle, but this has side effects. Sage Weil asked if they FSYNC, exposed to user space, that is not “wired up.” Ethan were trying to achieve ordering in a single file, and Vijay said Miller said it would be nice if you could request ordered writes. yes. They had added a system call called osync to make ordered Greg Ganger (CMU) explained that there’s a good reason why writes durable and had modified ext4 and were able to do this in databases use transactions, and that they had tried doing this, a fairly performant way. Christoph Hellwig said that there really and it’s really difficult. was no easy way of doing this without a sync, and that no one actually had written a functioning fbarrier (write data before Erez Zadok commented that it’s interesting to listen to the com- metadata) call. Vijay said that they had created a new mode for a ments, as he has done 10 years of work on transactional storage, file, and Ted Ts’o asked if, when they osync, it involves a journal worked with umpteen PhDs on theses, and so on. The simplest command. Vijay responded that the osync wraps the data in a interface to expose to users is start transaction/end transaction. journal command but does not sync it. To do this, you must go through the entire storage stack, from drivers to the page cache, which must be flushed in a particular Jeff Darcy, a lead programmer for Gluster, said that he just wants order. Once he managed to do all that, he found that people didn’t to know what ext4 has done, because they need the correlation need fsync. Vijay piped up with “that’s what we’ve been trying to between user requests and journal commits. Christoph said they do,” and Sage Weil said they had tried doing this with btrfs and could easily do that in the kernel because each time they commit,

78 JUNE 2014 VOL. 39, NO. 3 www.usenix.org REPORTS

wondered if they ever got transactions to work. Erez replied that want the kernel storing the data on its own, but rather they want they had gotten transactional throughput faster than some, but a backing store in place. This would be like mmap private. Thanumalayan jumped in to say they had run into limits on how James Bottomley said that the problem is related to journal lock large a transaction could be. Vijay said that if you start writing, scaling, when you want a lock for each subtree. You want to make then use osync—it works like a transaction. the journal lock scale per subtree. Ted Ts’o pointed out that the Ric Wheeler then commented that this is where kernel people last paper at FAST [2] covered this very topic. James pointed throw up their hands and say show us how easy this is to do. Ted out that this would be a problem for containers, which are like Ts’o wondered why it couldn’t be done in user space (FUSE), and hypervisors with only one kernel (like Solaris has in some form). Ric replied that they need help from the kernel. Sage Weil had Kirill Korotaev (Parallels) then went to the front of the room to tried making every transaction checkpoint a btrfs snapshot, so introduce another topic: FUSE performance. Most people think they could roll back if they needed to. Jeff Darcy said that if you FUSE is slow, said Kirill, but they have seen up to 1 GB/s in real need multiple layers working together, you will have conflicting life. After that, bottlenecks slow more performance gains. To events. Thanumalayan just wants a file system with as much get past that, they need some interface to do kernel splicing and ordering as possible. He was experimenting with what hap- believe that would be quite a useful interface. Copying using pens to a file when a crash occurs, and said that POSIX allows pipes is very slow because pipes use mutex locks, and pipes don’t a totally unrelated file to disappear. Christoph Hellwig shouted work with UNIX sockets. What they basically need is the ability out that POSIX says nothing about crashes and power outages, to do random reads of data with these buffers and to send them and Jeff agreed that POSIX leaves this behavior undefined. to a socket. Kirill also questioned why there’s a requirement that Jeff claimed he would be happy if he could get a cookie or trans- data must be aligned in memory. Andrew Morton said that was action number, flush it with waiting, and select on that cookie a very old requirement for some devices, and James Bottomley for completion. Greg Ganger suggested he actually wanted more, said that’s why there is a bounce buffer for doing alignment to commit what he wants first. Vijay explained that the NoSQL under the hood. developers they’ve talked to don’t need durability for all things, Kirill wanted to revive IOBuff, where they could attach to pages but for some things, and that they have code, used for an SOSP in user space, then send them to sockets. James pointed out that 2013 paper, as an example. Jeff thought that NoSQL developers except for DirectIO, everything goes through the page cache, and make assumptions about durability all the time, and about how moving data from one file to another is hard. Ric Wheeler asked fsync works. Ted Ts’o thought that all they needed to do was if the interface was doable, and Kirill said that they think it is. add new flags, or perhaps a new system call. 
Christoph Hellwig Andrew Morton wondered if this was different from sendfile, pointed out that there is sync_file_range, for just synchronizing and Sage Weil thought perhaps splice would work as well, but file data within the given range, but not metadata, so it isn’t very Kirill responded that neither work as well. Sage thought the useful. problems could be fixed, but Kirill ended on the note that if it Kai Shen suggested that perhaps fsync should work more like were easier, it would have been done before. msync, which includes an invalidate flag that makes msync George Amvrosiadis (University of Toronto) introduced the appear atomic. Christoph said that older systems didn’t have a topic of maintenance and traces. For durability, storage systems unified buffer, so you had to write from the virtual memory sys- perform scrubbing (background reads) and fsck for integrity, but tem to the file system buffers. Ric Wheeler mentioned user space today those things need to be done online. The issue becomes file systems, and Sage Weil suggested that it would be nice if how to do this without disturbing the actual workload. For you could limit the amount stored in the caches based on cgroup scrubbing in btrfs, you get an upper bound on the number of membership. James Bottomley stated that the infrastructure is requests that can be processed during scrubbing. Ric Wheeler, almost all there for doing this with cgroups already. who worked at EMC before moving to Red Hat, asked how often Sage brought up another problem: that when a sync occurs, the do you want to scrub, and how much performance do you want to disk gets really busy. Because of this, they (Inktank) actually give up, and for how long. try to keep track of how many blocks might be dirty so they can George Amvrosiadis wanted to monitor traces and then use the guess how long a sync might take. Jeff Darcy concurred, saying amount of activity to decide when to begin scrubbing during that they do something similar with Gluster. They buffer up as what appeared to be idle time. Ric Wheeler said that while he much as they can in the FUSE layer before pushing it into the was at EMC, he saw disks that were busy for years, and the only kernel via a system call. James said that the Postgres people “idle time” was when the disks performed self-checks. Andreas want the ability to manage write-ahead as well. Christoph Dilger asked if there was an idle priority, and George said they clarified this, saying that they want to use mmap for reads, but wanted to do this for btrfs. The problem is that all requests look control when to commit data (write) when it changes. They don’t the same, as maintenance requests look like other requests. Ric responded that they need a maintenance hint. Ted Ts’o said you’d www.usenix.org JUNE 2014 VOL. 39, NO. 3 79 REPORTS

need to tag the request all the way down to the block device layer, ing for several years, and that a past student, Vasily Tarasov, had orthogonal to priority. George then stated that all they want to spent a week at their datacenters collecting statistics. do is optimize when to schedule scrubbing. James Bottomley Several other topics were covered during the final hour: RDMA, said they already have a mechanism for increasing the prior- dedup, and scalability, but your correspondent missed this in ity of requests, and perhaps they could also support decreasing order to catch a flight to SCaLE 12x in Los Angeles. priority. Ric replied that this sounded like the “hints” stuff from last year’s Linux Filesystem Summit. If the file system supports I did appreciate getting to watch another summit in progress, it, although we can’t guarantee it, we can at least allow it, so this and was reminded of evolution. The Linux kernel evolves, based sounds easy. on both what people want, but even more on what contributors actually do. And, as with natural evolution, most steps are small Ethan Miller wondered if you have devices with built-in intel- ones, because changing a lot at once is a risky maneuver. ligence, do you really want both the device and the kernel doing scrubbing? Ric answered that you want to scrub from the application down to the disk, checking the entire path. George Amvrosiadis then asked if scrubbing could cause more errors or References increase the probability of errors. Ethan said that scrubbing has [1] Tim Feldman and Garth Gibson, “Shingled Magnetic no effects; it’s just reading. But Eric Reidel pointed out that all Recording: Areal Density Increase Requires New Data Man- disk activity has some small probability of causing a problem, agement,” ;login: vol. 38, no. 3 (June 2013): https://www.use- like smearing a particle on a disk platter. These probabilities are nix.org/publications/login/june-2013-volume-38-number-3 small, compared to the MTBF for a drive. /shingled-magnetic-recording-areal-density-increase. George Amvrosiadis still wanted some hint from the file system [2] Junbin Kang, Benlong Zhang, Tianyu Wo, Chunming Hu, that work was about to begin. This brought up a tangent, where and Jinpeng Huai, “MultiLanes: Providing Virtualized Stor- Ric Wheeler pointed out that batching up work for a device has age for OS-level Virtualization on Many Cores”: https://www caused problems in the past. Andreas Dilger explained that if .usenix.org/conference/fast14/technical-sessions just one thread were writing, you could get 1000 transactions /presentation/kang. per second, but if the application were threaded, and two threads were writing, the rate would go down to 250 per second. The problem occurred because the file system would wait a jiffy (4 ms at the time) trying to batch writes from threaded applications. Greg Ganger said that there is a temptation to do all types of things at the SCSI driver level so it can do scheduling, and Ric Wheeler responded that the storage industry has been looking at hints from the file system for a long time. James Bottomley replied that the storage industry wanted hints about everything, a hundred hints, and Ted Ts’o added, not just hints, but 8–16 levels for each hint. George Amvrosiadis then asked about getting traces of read/ write activity, and Ethan Miller agreed, saying that they would like to have interesting traces shared. 
Ric Wheeler said that peo- ple have worked around this using filebench, and Greg added that it would be nice to have both the file and block level trace data. Erez Zadok, the co-chair of FAST ’15, said that USENIX is interested in promoting openness, so for the next FAST, when you submit a paper you can include whether you plan on sharing traces and/or code. Christoph Hellwig said just include a link to the code, and Ted Ts’o added that could be done once a paper has been accepted. Ethan Miller said that anonymization of traces is really hard, and they had experience working with NetApp on wireshark traces for a project in 2013. Erez pointed out that HP developed a standard, the Data Series format, and also developed public tools for converting to this format. He continued saying that EMC plans on releasing some traces they have been collect-

80 JUNE 2014 VOL. 39, NO. 3 www.usenix.org Shop the Shop shop.linuxnewmedia.com GIMP Handbook

Sure you know Linux... but do you know GiMP? ▪ Fix your digital photos ▪ Create animations ▪ Build posters, signs, and logos order now and become an expert in one of the most important and practical open source tools!

On newsstands now or order online! shop.linuxnewmedia.com/specials

For WindoWs, Mac os, and Linux users!

ad_Login_Special_GIMP_05_2014.indd 1 4/21/14 2:22:28 PM USENIX Association PERIODICALS POSTAGE 2560 Ninth Street, Suite 215 PAID Berkeley, CA 94710 AT BERKELEY, CALIFORNIA AND ADDITIONAL OFFICES POSTMASTER Send Address Changes to ;login: 2560 Ninth Street, Suite 215 Berkeley, CA 94710

REGISTER TODAY! 23rd USENIX Security Symposium AUGUST 20–22, 2014 • SAN DIEGO, CA

The USENIX Security Symposium brings together researchers, practitioners, system programmers and engineers, and others interested in the latest advances in the security of computer systems and networks. The Symposium will include a 3-day technical program with refereed papers, invited talks, posters, panel discussions, and Birds-of-a-Feather sessions. Program highlights include:

Keynote Address by Phil Lapsley, author of Exploding the Panel Discussion: “The Future of Crypto: Getting from Phone: The Untold Story of the Teenagers and Outlaws Who Here to Guarantees” with Daniel J. Bernstein, Matt Blaze, Hacked Ma Bell and Tanja Lange

Invited Talk: “Battling Human Trafficking with Big Data” Invited Talk: “Insight into the NSA’s Weakening of Crypto by Rolando R. Lopez, Orphan Secure Standards” by Joseph Menn, Reuters

www.usenix.org/sec14

twitter.com/USENIXSecurity www.usenix.org/youtube www.usenix.org/gplus Stay Connected... www.usenix.org/facebook www.usenix.org/linkedin www.usenix.org/blog