USENIX Association

Proceedings of the 27th Large Installation System Administration Conference

November 3–8, 2013
Washington, D.C.

Conference Organizers

Program Co-Chairs
Narayan Desai, Argonne National Laboratory
Kent Skaar, VMware, Inc.

Program Committee
Patrick Cable, MIT Lincoln Laboratory
Mike Ciavarella, Coffee Bean Software Pty Ltd
Andrew Hume, AT&T Labs—Research
Paul Krizak, Qualcomm
Adam Leff, WebMD
John Looney, Google, Inc.
Andrew Lusk, Amazon Inc.
Chris McEniry, Sony
David Nalley, Apache Cloudstack
Tim Nelson, Worcester Polytechnic Institute
Marco Nicosa, Moraga Systems LLC
Adam Oliner, University of California, Berkeley
Carolyn Rowland, Twinight Enterprises
Dan Russel, TED Talks
Adele Shakal, Metacloud, Inc.
Avleen Vig, Etsy, Inc.
Daniel J Walsh, Red Hat
George Wilson, Delphix
Charles Wimmer, VertiCloud

Poster Session Coordinator
Marc Chiarini, Harvard SEAS

USENIX Board Liaisons
David N. Blank-Edelman, Northeastern University
Carolyn Rowland, Twinight Enterprises

Steering Committee
Paul Anderson, University of Edinburgh
David N. Blank-Edelman, Northeastern University
Mark Burgess, CFEngine
Alva Couch, Tufts University
Anne Dickison, USENIX Association
Æleen Frisch, Exponential Consulting
Doug Hughes, D. E. Shaw Research, LLC
William LeFebvre, CSE
Thomas A. Limoncelli, Stack Exchange
Adam Moskowitz
Mario Obejas, Raytheon
Carolyn Rowland, Twinight Enterprises
Rudi van Drunen, Xlexit Technology, The Netherlands

Invited Talks Coordinators
Nicole Forsgren Velasquez, Utah State University
Cory Lueninghoener, Los Alamos National Laboratory

Lightning Talks Coordinator
Lee Damon, University of Washington

Workshops Coordinator
Kyrre Begnum, Oslo and Akershus University College of Applied Sciences

Guru Is In Coordinator
Chris St. Pierre, Amplify

Gurus
Owen DeLong, Hurricane Electric
Stephen Frost, Resonate
Thomas A. Limoncelli, Stack Exchange

Education Director
Daniel V. Klein, USENIX Association

Tutorial Coordinator
Matt Simmons, Northeastern University

LISA Lab Hack Space Coordinators
Paul Krizak, Qualcomm
Chris McEniry, Sony
Adele Shakal, Metacloud, Inc.

External Reviewers
David N. Blank-Edelman and Thomas A. Limoncelli

LISA ’13: 27th Large Installation System Administration Conference
November 3–8, 2013, Washington, D.C.

Message from the Program Co-Chairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

Wednesday, November 6
Building Software Environments for Research Computing Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Mark Howison, Aaron Shen, and Andrew Loomis, Brown University
Fixing On-call, or How to Sleep Through the Night . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Matt Provost, Weta Digital

Thursday, November 7
Poncho: Enabling Smart Administration of Full Private Clouds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Scott Devoid and Narayan Desai, Argonne National Laboratory; Lorin Hochstein, Nimbis Services
Making Problem Diagnosis Work for Large-Scale, Production Storage Systems . . . . . . . . . . . . . . . . . 27
Michael P. Kasick and Priya Narasimhan, Carnegie Mellon University; Kevin Harms, Argonne National Laboratory
dsync: Efficient Block-wise Synchronization of Multi-Gigabyte Binary Data . . . . . . . . . . . . . . . . . . . 45
Thomas Knauth and Christof Fetzer, Technische Universität Dresden
HotSnap: A Hot Distributed Snapshot System For Virtual Machine Cluster . . . . . . . . . . . . . . . . . . . . 59
Lei Cui, Bo Li, Yangyang Zhang, and Jianxin Li, Beihang University
Supporting Undoability in Systems Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Ingo Weber and Hiroshi Wada, NICTA and University of New South Wales; Alan Fekete, NICTA and University of Sydney; Anna Liu and Len Bass, NICTA and University of New South Wales
Back to the Future: Fault-tolerant Live Update with Time-traveling State Transfer . . . . . . . . . . . . . . . 89
Cristiano Giuffrida, Călin Iorgulescu, Anton Kuijsten, and Andrew S. Tanenbaum, Vrije Universiteit, Amsterdam
Live Upgrading Thousands of Servers from an Ancient Red Hat Distribution to 10 Year Newer Debian Based One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Marc Merlin, Google, Inc.
Managing Smartphone Testbeds with SmartLab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Georgios Larkou, Constantinos Costa, Panayiotis G. Andreou, Andreas Konstantinidis, and Demetrios Zeinalipour-Yazti, University of Cyprus
YinzCam: Experiences with In-Venue Mobile Video and Replays . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Nathan D. Mickulicz, Priya Narasimhan, and Rajeev Gandhi, YinzCam, Inc. and Carnegie Mellon University

Friday, November 8
Challenges to Error Diagnosis in Hadoop Ecosystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Jim (Zhanwen) Li, NICTA; Siyuan He, Citibank; Liming Zhu, NICTA and University of New South Wales; Xiwei Xu, NICTA; Min Fu, University of New South Wales; Len Bass and Anna Liu, NICTA and University of New South Wales; An Binh Tran, University of New South Wales
Installation of an External Lustre Filesystem Using Cray esMS Management and Lustre 1.8.6 . . . . 155
Patrick Webb, Cray Inc.

Message from the Program Co-Chairs

Welcome to LISA ’13, the 27th Large Installation System Administration Conference. It is our pleasure as program co-chairs to present this year’s program and proceedings.

This conference is the result of the hard work of many people. We thank our authors, shepherds, external reviewers, speakers, tutorial instructors, conference organizers, attendees, and the USENIX staff. We’d particularly like to thank our program committee and the coordinators of the new LISA Labs program for being willing to experiment with new approaches.

This year we accepted 13 submissions, which add to USENIX’s considerable body of published work. Some of these papers are directly applicable for LISA attendees, while others are more speculative and provide ideas for future experimentation. When choosing the program, we deliberately included some content to stimulate discussion and, we hope, future paper submissions. Publications in this area, the crossroads of operations practitioners and systems researchers, are difficult to recruit, and framing these complex issues in ways from which others can benefit has always been challenging.

We’ve prepared a program that we are proud of, but the largest contribution to the success of LISA is the wealth of experience and opinions of our attendees. It is this combination of a strong technical program and deep discussion of hard technical problems that makes LISA unique in our field. We hope that you enjoy the conference.

Narayan Desai, Argonne National Laboratory
Kent Skaar, VMware, Inc.
LISA ’13 Program Co-Chairs

Building Software Environments for Research Computing Clusters

Mark Howison, Aaron Shen, and Andrew Loomis, Brown University

Abstract

Over the past two years, we have built a diverse software environment of over 200 scientific applications for our research computing platform at Brown University. In this report, we share the policies and best practices we have developed to simplify the configuration and installation of this software environment and to improve its usability and performance. In addition, we present a reference implementation of an environment modules system, called PyModules, that incorporates many of these ideas.

Tags

HPC, software installation, configuration management

1 Introduction

Universities are increasingly centralizing their research compute resources from individual science departments to a single, comprehensive service provider. At Brown University, that provider is the Center for Computation and Visualization (CCV), which is responsible for supporting the computational needs of users from over 50 academic departments and research centers, including the life, physical, and social sciences. The move to centralized research computing has created increasing demand for applications from diverse scientific domains, each with its own requirements, installation procedures, and dependencies. While individual departments may need to provide only a handful of key applications […] management and scheduling, network and storage configuration, physical installation, and security. In this report, we look in depth at a particular issue that they touched on only briefly: how to provide a usable software environment to a diverse user base of researchers. We describe the best practices we have used to deploy the software environment on our own cluster, as well as a new system, PyModules, that we developed to make this deployment easier. Finally, we speculate on the changes to software management that will occur as more research computing moves from local, university-operated clusters to high-performance computing (HPC) resources provisioned in the cloud.

2 Best Practices

As Keen et al. noted, it is common practice to organize the available software on a research compute cluster into modules, with each module representing a specific version of a software package. In fact, this practice dates back nearly 20 years to the Environment Modules tool created by Furlani and Osel [3], which allows administrators to write “modulefiles” that define how a user’s environment is modified to access a specific application. The Environment Modules software makes it possible to install several different versions of the same software package on the same system, allowing users to reliably access a specific version. This is important for stability and for avoiding problems with backward compatibility, especially for software with major changes between versions, such as differences in APIs […]
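To make the modulefile concept concrete, the following is a minimal sketch of an Environment Modules (Tcl-style) modulefile; the package name “foo”, the version 1.2.3, and the install prefix /opt/software are illustrative assumptions, not values taken from the paper.

    #%Module1.0
    ## Hypothetical modulefile for foo/1.2.3 (illustrative sketch only).
    proc ModulesHelp { } {
        puts stderr "Sets up the environment for foo version 1.2.3"
    }
    module-whatis "foo: example scientific application, version 1.2.3"

    # Assumed install prefix for this sketch.
    set root /opt/software/foo/1.2.3

    # Expose the package's binaries, shared libraries, and man pages.
    prepend-path PATH            $root/bin
    prepend-path LD_LIBRARY_PATH $root/lib
    prepend-path MANPATH         $root/share/man

A user would then run something like “module load foo/1.2.3” to activate exactly this version, or “module avail” to list the installed versions. Because each version gets its own modulefile and install prefix, several versions of the same package can coexist on one system without interfering, which is the property the excerpt above highlights.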