27th Large Installation System Administration Conference (LISA '13)
Proceedings of the 27th Large Installation System Administration Conference (LISA ’13)
Washington, D.C., USA
November 3–8, 2013

Sponsored by USENIX in cooperation with LOPSA

Thanks to Our LISA ’13 Sponsors

Media Sponsors and Industry Partners
ACM Queue, ADMIN, CiSE, Computer, Distributed Management Task Force (DMTF), Free Software Magazine, HPCwire, IEEE Pervasive, IEEE Security & Privacy, IEEE Software, InfoSec News, IT/Dev Connections, IT Professional, Linux Foundation, Linux Journal, Linux Pro Magazine, LXer, No Starch Press, O’Reilly Media, Open Source Data Center Conference (OSDC), Server Fault, The Data Center Journal, Userfriendly.org

Thanks to Our USENIX and LISA SIG Supporters

USENIX Patrons
Google, Infosys, Microsoft Research, NetApp, VMware

USENIX Benefactors
Akamai, EMC, Hewlett-Packard, Linux Journal, Linux Pro Magazine, Puppet Labs

USENIX and LISA SIG Partners
Cambridge Computer, Google

USENIX Partners
Meraki, Nutanix

© 2013 by The USENIX Association
All Rights Reserved

This volume is published as a collective work. Rights to individual papers remain with the author or the author’s employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. Permission is granted to print, primarily for one person’s exclusive use, a single copy of these Proceedings. USENIX acknowledges all trademarks herein.

ISBN 978-1-931971-05-8

Conference Organizers

Program Co-Chairs
Narayan Desai, Argonne National Laboratory
Kent Skaar, VMware, Inc.

Program Committee
Patrick Cable, MIT Lincoln Laboratory
Mike Ciavarella, Coffee Bean Software Pty Ltd
Andrew Hume, AT&T Labs—Research
Paul Krizak, Qualcomm
Adam Leff, WebMD
John Looney, Google, Inc.
Andrew Lusk, Amazon Inc.
Chris McEniry, Sony
Tim Nelson, Worcester Polytechnic Institute
Marco Nicosa, Moraga Systems LLC
Adam Oliner, University of California, Berkeley
Carolyn Rowland, Twinight Enterprises
Dan Russel, TED Talks
Adele Shakal, Metacloud
Avleen Vig, Etsy, Inc.

Invited Talks Coordinators
Nicole Forsgren Velasquez, Utah State University
Cory Lueninghoener, Los Alamos National Laboratory

Lightning Talks Coordinator
Lee Damon, University of Washington

Workshops Coordinator
Kyrre Begnum, Oslo and Akershus University College of Applied Sciences

Guru Is In Coordinator
Chris St. Pierre, Amplify

Gurus
Owen DeLong, Hurricane Electric
Stephen Frost, Resonate
Thomas A. Limoncelli, Stack Exchange
David Nalley, Apache CloudStack
Adele Shakal, Metacloud, Inc.
Daniel J Walsh, Red Hat
George Wilson, Delphix
Charles Wimmer, VertiCloud

Poster Session Coordinator
Marc Chiarini, Harvard SEAS

USENIX Board Liaisons
David N. Blank-Edelman, Northeastern University
Carolyn Rowland, Twinight Enterprises

Steering Committee
Paul Anderson, University of Edinburgh
David N. Blank-Edelman, Northeastern University
Mark Burgess, CFEngine
Alva Couch, Tufts University
Anne Dickison, USENIX Association
Æleen Frisch, Exponential Consulting
Doug Hughes, D. E. Shaw Research, LLC
William LeFebvre, CSE
Thomas A. Limoncelli, Stack Exchange
Adam Moskowitz
Mario Obejas, Raytheon
Carolyn Rowland, Twinight Enterprises
Rudi van Drunen, Xlexit Technology, The Netherlands

Education Director
Daniel V. Klein, USENIX Association

Tutorial Coordinator
Matt Simmons, Northeastern University

LISA Lab Hack Space Coordinators
Paul Krizak, Qualcomm
Chris McEniry, Sony
Adele Shakal, Metacloud, Inc.

External Reviewers
David N. Blank-Edelman and Thomas A. Limoncelli
Message from the Program Co-Chairs

Wednesday, November 6

Building Software Environments for Research Computing Clusters
Mark Howison, Aaron Shen, and Andrew Loomis, Brown University

Fixing On-call, or How to Sleep Through the Night
Matt Provost, Weta Digital

Thursday, November 7

Poncho: Enabling Smart Administration of Full Private Clouds
Scott Devoid and Narayan Desai, Argonne National Laboratory; Lorin Hochstein, Nimbis Services

Making Problem Diagnosis Work for Large-Scale, Production Storage Systems
Michael P. Kasick and Priya Narasimhan, Carnegie Mellon University; Kevin Harms, Argonne National Laboratory

dsync: Efficient Block-wise Synchronization of Multi-Gigabyte Binary Data
Thomas Knauth and Christof Fetzer, Technische Universität Dresden

HotSnap: A Hot Distributed Snapshot System For Virtual Machine Cluster
Lei Cui, Bo Li, Yangyang Zhang, and Jianxin Li, Beihang University

Supporting Undoability in Systems Operations
Ingo Weber and Hiroshi Wada, NICTA and University of New South Wales; Alan Fekete, NICTA and University of Sydney; Anna Liu and Len Bass, NICTA and University of New South Wales

Back to the Future: Fault-tolerant Live Update with Time-traveling State Transfer
Cristiano Giuffrida, Călin Iorgulescu, Anton Kuijsten, and Andrew S. Tanenbaum, Vrije Universiteit, Amsterdam

Live Upgrading Thousands of Servers from an Ancient Red Hat Distribution to 10 Year Newer Debian Based One
Marc Merlin, Google, Inc.

Managing Smartphone Testbeds with SmartLab
Georgios Larkou, Constantinos Costa, Panayiotis G. Andreou, Andreas Konstantinidis, and Demetrios Zeinalipour-Yazti, University of Cyprus

YinzCam: Experiences with In-Venue Mobile Video and Replays
Nathan D. Mickulicz, Priya Narasimhan, and Rajeev Gandhi, YinzCam, Inc. and Carnegie Mellon University

Friday, November 8

Challenges to Error Diagnosis in Hadoop Ecosystems
Jim (Zhanwen) Li, NICTA; Siyuan He, Citibank; Liming Zhu, NICTA and University of New South Wales; Xiwei Xu, NICTA; Min Fu, University of New South Wales; Len Bass and Anna Liu, NICTA and University of New South Wales; An Binh Tran, University of New South Wales

Installation of an External Lustre Filesystem Using Cray esMS Management and Lustre 1.8.6
Patrick Webb, Cray Inc.

Message from the Program Co-Chairs

Welcome to LISA ’13, the 27th Large Installation System Administration Conference. It is our pleasure as program co-chairs to present this year’s program and proceedings.

This conference is the result of the hard work of many people. We thank our authors, shepherds, external reviewers, speakers, tutorial instructors, conference organizers, attendees, and the USENIX staff. We’d particularly like to thank our program committee and the coordinators of the new LISA Labs program for being willing to experiment with new approaches.

This year we accepted 13 submissions, which add to USENIX’s considerable body of published work. Some of these papers will be directly applicable to LISA attendees, while others are more speculative and provide ideas for future experimentation. In assembling the program, we specifically chose some content to stimulate discussion and, we hope, future paper submissions. Publications in this area, the crossroads of operations practitioners and systems researchers, are difficult to recruit. Framing these complex issues in ways from which others can benefit has always been challenging.
We’ve prepared a program that we are proud of, but the largest contribution to the success of LISA is the wealth of experience and opinions of our attendees. It is this combination of technical program and deep discussion of deep technical problems that makes LISA unique in our field. We hope that you enjoy the conference.

Narayan Desai, Argonne National Laboratory
Kent Skaar, VMware, Inc.
LISA ’13 Program Co-Chairs

Building Software Environments for Research Computing Clusters

Mark Howison, Aaron Shen, and Andrew Loomis
Brown University

Abstract

Over the past two years, we have built a diverse software environment of over 200 scientific applications for our research computing platform at Brown University. In this report, we share the policies and best practices we have developed to simplify the configuration and installation of this software environment and to improve its usability and performance. In addition, we present a reference implementation of an environment modules system, called PyModules, that incorporates many of these ideas.

Tags

HPC, software installation, configuration management

1 Introduction

…agement and scheduling, network and storage configuration, physical installation, and security. In this report, we look in depth at a particular issue that they touched on only briefly: how to provide a usable software environment to a diverse user base of researchers. We describe the best practices we have used to deploy the software environment on our own cluster, as well as a new system, PyModules, we developed to make this deployment easier. Finally, we speculate on the changes to software management that will occur as more research computing moves from local, university-operated clusters to high-performance computing (HPC) resources that are provisioned in the cloud.

2 Best Practices