27th Large Installation System Administration Conference (LISA '13)


Proceedings of the 27th Large Installation System Administration Conference (LISA '13)
Washington, D.C., USA, November 3–8, 2013

Sponsored by USENIX in cooperation with LOPSA

Thanks to Our LISA '13 Sponsors
Gold, Silver, and Bronze Sponsors
Media Sponsors and Industry Partners: ACM Queue, ADMIN, CiSE, Computer, Distributed Management Task Force (DMTF), Free Software Magazine, HPCwire, IEEE Pervasive, IEEE Security & Privacy, IEEE Software, InfoSec News, IT/Dev Connections, IT Professional, Linux Foundation, Linux Journal, Linux Pro Magazine, LXer, No Starch Press, O'Reilly Media, Open Source Data Center Conference (OSDC), Server Fault, The Data Center Journal, Userfriendly.org

Thanks to Our USENIX and LISA SIG Supporters
USENIX Patrons: Google, InfoSys, Microsoft Research, NetApp, VMware
USENIX Benefactors: Akamai, EMC, Hewlett-Packard, Linux Journal, Linux Pro Magazine, Puppet Labs
USENIX and LISA SIG Partners: Cambridge Computer, Google
USENIX Partners: Meraki, Nutanix

© 2013 by The USENIX Association. All Rights Reserved.
This volume is published as a collective work. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. Permission is granted to print, primarily for one person's exclusive use, a single copy of these Proceedings. USENIX acknowledges all trademarks herein.
ISBN 978-1-931971-05-8

Conference Organizers

Program Co-Chairs: Narayan Desai, Argonne National Laboratory; Kent Skaar, VMware, Inc.

Program Committee: Patrick Cable, MIT Lincoln Laboratory; Mike Ciavarella, Coffee Bean Software Pty Ltd; Andrew Hume, AT&T Labs—Research; Paul Krizak, Qualcomm; Adam Leff, WebMD; John Looney, Google, Inc.; Andrew Lusk, Amazon Inc.; Chris McEniry, Sony; David Nalley, Apache Cloudstack; Tim Nelson, Worcester Polytechnic Institute; Marco Nicosa, Moraga Systems LLC; Adam Oliner, University of California, Berkeley; Carolyn Rowland, Twinight Enterprises; Dan Russel, TED Talks; Adele Shakal, Metacloud, Inc.; Avleen Vig, Etsy, Inc.; Daniel J Walsh, Red Hat; George Wilson, Delphix; Charles Wimmer, VertiCloud

Invited Talks Coordinators: Nicole Forsgren Velasquez, Utah State University; Cory Lueninghoener, Los Alamos National Laboratory

Lightning Talks Coordinator: Lee Damon, University of Washington

Workshops Coordinator: Kyrre Begnum, Oslo and Akershus University College of Applied Sciences

Tutorial Coordinator: Matt Simmons, Northeastern University

LISA Lab Hack Space Coordinators: Paul Krizak, Qualcomm; Chris McEniry, Sony; Adele Shakal, Metacloud, Inc.

Poster Session Coordinator: Marc Chiarini, Harvard SEAS

USENIX Board Liaisons: David N. Blank-Edelman, Northeastern University; Carolyn Rowland, Twinight Enterprises

Steering Committee: Paul Anderson, University of Edinburgh; David N. Blank-Edelman, Northeastern University; Mark Burgess, CFEngine; Alva Couch, Tufts University; Anne Dickison, USENIX Association; Æleen Frisch, Exponential Consulting; Doug Hughes, D. E. Shaw Research, LLC; William LeFebvre, CSE; Thomas A. Limoncelli, Stack Exchange; Adam Moskowitz; Mario Obejas, Raytheon; Carolyn Rowland, Twinight Enterprises; Rudi van Drunen, Xlexit Technology, The Netherlands

Education Director: Daniel V. Klein, USENIX Association

Guru Is In Coordinator: Chris St. Pierre, Amplify

Gurus: Owen DeLong, Hurricane Electric; Stephen Frost, Resonate; Thomas A. Limoncelli, Stack Exchange

External Reviewers: David N. Blank-Edelman and Thomas A. Limoncelli

Contents

Message from the Program Co-Chairs ..... v

Wednesday, November 6
Building Software Environments for Research Computing Clusters ..... 1
  Mark Howison, Aaron Shen, and Andrew Loomis, Brown University
Fixing On-call, or How to Sleep Through the Night ..... 7
  Matt Provost, Weta Digital

Thursday, November 7
Poncho: Enabling Smart Administration of Full Private Clouds ..... 17
  Scott Devoid and Narayan Desai, Argonne National Laboratory; Lorin Hochstein, Nimbis Services
Making Problem Diagnosis Work for Large-Scale, Production Storage Systems ..... 27
  Michael P. Kasick and Priya Narasimhan, Carnegie Mellon University; Kevin Harms, Argonne National Laboratory
dsync: Efficient Block-wise Synchronization of Multi-Gigabyte Binary Data ..... 45
  Thomas Knauth and Christof Fetzer, Technische Universität Dresden
HotSnap: A Hot Distributed Snapshot System For Virtual Machine Cluster ..... 59
  Lei Cui, Bo Li, Yangyang Zhang, and Jianxin Li, Beihang University
Supporting Undoability in Systems Operations ..... 75
  Ingo Weber and Hiroshi Wada, NICTA and University of New South Wales; Alan Fekete, NICTA and University of Sydney; Anna Liu and Len Bass, NICTA and University of New South Wales
Back to the Future: Fault-tolerant Live Update with Time-traveling State Transfer ..... 89
  Cristiano Giuffrida, Călin Iorgulescu, Anton Kuijsten, and Andrew S. Tanenbaum, Vrije Universiteit, Amsterdam
Live Upgrading Thousands of Servers from an Ancient Red Hat Distribution to 10 Year Newer Debian Based One ..... 105
  Marc Merlin, Google, Inc.
Managing Smartphone Testbeds with SmartLab ..... 115
  Georgios Larkou, Constantinos Costa, Panayiotis G. Andreou, Andreas Konstantinidis, and Demetrios Zeinalipour-Yazti, University of Cyprus
YinzCam: Experiences with In-Venue Mobile Video and Replays ..... 133
  Nathan D. Mickulicz, Priya Narasimhan, and Rajeev Gandhi, YinzCam, Inc. and Carnegie Mellon University

Friday, November 8
Challenges to Error Diagnosis in Hadoop Ecosystems ..... 145
  Jim (Zhanwen) Li, NICTA; Siyuan He, Citibank; Liming Zhu, NICTA and University of New South Wales; Xiwei Xu, NICTA; Min Fu, University of New South Wales; Len Bass and Anna Liu, NICTA and University of New South Wales; An Binh Tran, University of New South Wales
Installation of an External Lustre Filesystem Using Cray esMS Management and Lustre 1.8.6 ..... 155
  Patrick Webb, Cray Inc.

Message from the Program Co-Chairs

Welcome to LISA '13, the 27th Large Installation System Administration Conference. It is our pleasure as program co-chairs to present this year's program and proceedings.

This conference is the result of the hard work of many people. We thank our authors, shepherds, external reviewers, speakers, tutorial instructors, conference organizers, attendees, and the USENIX staff. We'd particularly like to thank our program committee and the coordinators of the new LISA Labs program for being willing to experiment with new approaches.

This year we accepted 13 submissions, which add to USENIX's considerable body of published work. Some of these papers will be directly applicable for LISA attendees, while others are more speculative and provide ideas for future experimentation. When choosing the program, we specifically chose some content to stimulate discussion and, hopefully, future paper submissions. Publications in this area—the crossroads of operations practitioners and systems researchers—are difficult to recruit, and framing these complex issues in ways from which others can benefit has always been challenging.

We've prepared a program that we are proud of, but the largest contribution to the success of LISA is the wealth of experience and opinions of our attendees. It is this combination of technical program and deep discussion of hard technical problems that makes LISA unique in our field. We hope that you enjoy the conference.

Narayan Desai, Argonne National Laboratory
Kent Skaar, VMware, Inc.
LISA '13 Program Co-Chairs

Building Software Environments for Research Computing Clusters
Mark Howison, Aaron Shen, and Andrew Loomis, Brown University

Abstract
Over the past two years, we have built a diverse software environment of over 200 scientific applications for our research computing platform at Brown University. In this report, we share the policies and best practices we have developed to simplify the configuration and installation of this software environment and to improve its usability and performance. In addition, we present a reference implementation of an environment modules system, called PyModules, that incorporates many of these ideas.

Tags: HPC, software installation, configuration management

1 Introduction
…management and scheduling, network and storage configuration, physical installation, and security. In this report, we look in depth at a particular issue that they touched on only briefly: how to provide a usable software environment to a diverse user base of researchers. We describe the best practices we have used to deploy the software environment on our own cluster, as well as a new system, PyModules, we developed to make this deployment easier. Finally, we speculate on the changes to software management that will occur as more research computing moves from local, university-operated clusters to high-performance computing (HPC) resources that are provisioned in the cloud.

2 Best Practices
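The abstract above references an environment modules system; since PyModules reimplements the familiar module-command interface, a user session on such a cluster plausibly looks like the sketch below (the package name, versions, and install path are invented for illustration):

    $ module avail python          # list the builds the admins have installed
    python/2.7.3    python/3.3.0
    $ module load python/3.3.0     # put this build's bin/ and lib/ dirs in the environment
    $ which python3
    /gpfs/runtime/opt/python/3.3.0/bin/python3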
Recommended publications
  • IBM Software Support
    Version 5.0.0

    What's New: Phone Contacts. We have eliminated the phone number section from this document and direct you to IBM Planetwide for contact information. I would appreciate your feedback on what you like and what you think should be improved about this document; e-mail me at mcknight@us.ibm.com. Cover design by Rick Nelson, IBM Raleigh. 2 June, 2014

    Contents
    What's New ..... 2
    Overview of IBM Support ..... 4
    IBM Commitment ..... 4
    Software Support Organization ..... 4
    Software Support Portfolio ..... 5
    IBM Software Support ..... 5
    Support Foundation ..... 5
    Passport Advantage & IBM Software Maintenance ..... 6
    System z (S/390) ..... 7
    Support Line and SoftwareXcel ..... 7
    IBM Selected Support Offering ..... 8
    Premium Support ..... 8
    Enhanced Technical Support ..... 8
    IBM Software Accelerated Value
  • Homework #5 Answer Key
    CS312 Homework #5 Answer Key, March 9, 2016

    Questions
    1. Load balancers are designed to help with vertical scaling. (a) True (b) False
    2. Layer 7 load balancers allow you to use both connection tracking and inspection of packets, including modification of packets. (a) True (b) False
    3. Linux containers do not require hypervisors. (a) True (b) False
    4. The RUN command can be used more than once in a Dockerfile. (a) True (b) False
    5. The PID namespace allows users to see processes from other containers. (a) True (b) False
    6. Paravirtualization uses CPU-supported virtualization. (a) True (b) False
    7. Which software load balancer primarily acts as an HTTP accelerator and a static cache server? (a) HAProxy (b) Apache (c) nginx (d) Varnish
    8. Which of the following is not true about Linux containers? (a) Operating-system-level virtualization (b) Provides little overhead (c) Fully emulates an operating system (d) Allows limits to resources with cgroups
    9. Which service describes a virtual computing platform? (a) IaaS (b) PaaS (c) SaaS (d) VaaS
    10. Describe the potential problems of using Round Robin DNS.
        • Depending on the implementation of the client resolver, this can be somewhat random or not.
        • Some resolvers always use alphabetical ordering.
        • DNS caching and Time-To-Live (TTL) issues.
        • If the site goes down, you have to update DNS to reroute traffic, which can cause problems.
    11. Which scheduling algorithm is generally preferred if the web application uses sessions? Source Connection
    12. Which HAProxy configuration section allows you to define a complete proxy? listen
    13.
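    Questions 11 and 12 point at the same HAProxy features; a minimal sketch of a session-friendly proxy defined in a single listen section follows (the server names and addresses are invented, not part of the answer key):

      # Append a complete proxy to haproxy.cfg; "balance source" hashes the
      # client IP so a client keeps reaching the same backend across requests.
      cat >> /etc/haproxy/haproxy.cfg <<'EOF'
      listen webfarm
          bind *:80
          balance source
          server web1 10.0.0.11:80 check
          server web2 10.0.0.12:80 check
      EOF
      haproxy -c -f /etc/haproxy/haproxy.cfg   # validate before reloading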
  • A Large-Scale Study of File-System Contents
    A Large-Scale Study of File-System Contents
    John R. Douceur and William J. Bolosky, Microsoft Research, Redmond, WA 98052
    {johndo, bolosky}@microsoft.com

    ABSTRACT
    We collect and analyze a snapshot of data from 10,568 file systems of 4801 Windows personal computers in a commercial environment. The file systems contain 140 million files totaling 10.5 TB of data. We develop analytical approximations for distributions of file size, file age, file functional lifetime, directory size, and directory depth, and we compare them to previously derived distributions. We find that file and directory sizes are fairly consistent across file systems, but file lifetimes vary widely and are significantly affected by the job function of the user. Larger files tend to be composed of blocks sized in powers of two, which noticeably affects their size distribution.

    …sizes are fairly consistent across file systems, but file lifetimes and file-name extensions vary with the job function of the user. We also found that file-name extension is a good predictor of file size but a poor predictor of file age or lifetime, that most large files are composed of records sized in powers of two, and that file systems are only half full on average.

    File-system designers require usage data to test hypotheses [8, 10], to drive simulations [6, 15, 17, 29], to validate benchmarks [33], and to stimulate insights that inspire new features [22]. File-system access requirements have been quantified by a number of empirical studies of dynamic trace data [e.g. 1, 3, 7, 8, 10, 14, 23, 24, 26]. However, the details of applications' and users' storage…
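    The study worked from metadata snapshots; a toy version of its file-size distribution can be collected on one machine with a walk like the following sketch (the path is an example; assumes GNU find and awk):

      # Bucket file sizes by powers of two: bucket b counts files whose
      # size falls in [2^(b-1), 2^b) bytes.
      find /home -type f -printf '%s\n' 2>/dev/null |
      awk '{ b = 0; for (s = $1; s >= 1; s /= 2) b++; n[b]++ }
           END { for (b in n) print b, n[b] }' |
      sort -n |
      awk '{ printf "up to 2^%-2d bytes: %d files\n", $1, $2 }'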
  • Full Circle Magazine #63
    Full Circle, the independent magazine for the Ubuntu Linux community. Issue #63, July 2012: LAMP & Web Development. Cover: "Reduce *buntu boot time with this in-depth article. With graphs!" Photo: Sue (Flickr.com). Full Circle Magazine is neither affiliated with, nor endorsed by, Canonical Ltd.

    Contents: Command & Conquer p.05; Python - Part 34 p.07; LibreOffice - Part 15 p.11; Amateur Astronomy - Pt1 p.14; GIMP - Beanstalk Pt3 p.17; Inkscape - Part 2 p.19; Web Dev - Pt1 p.22; Ask The New Guy p.26; Linux Labs p.29; Closing Windows p.36; My Story p.39; My Opinion p.41; Review p.44; Letters p.46; Q&A p.49; Audio Flux p.52; Ubuntu Women p.53; Ubuntu Games p.54

    The articles contained in this magazine are released under the Creative Commons Attribution-Share Alike 3.0 Unported license. This means you can adapt, copy, distribute and transmit the articles, but only under the following conditions: you must attribute the work to the original author in some way (at least a name, email or URL) and to this magazine by name ('Full Circle Magazine') and the URL www.fullcirclemagazine.org (but not attribute the article(s) in any way that suggests that they endorse you or your use of the work). If you alter, transform, or build upon this work, you must distribute the resulting work under the same, similar or a compatible license.
  • What the #%!@ (heck) is wrong with my server?!?
    Josh Malone, Systems Administrator, National Radio Astronomy Observatory, Charlottesville, VA

    Agenda
    • Intro to Monitoring
    • Internet protocols 101: SMTP, IMAP, HTTP
    • Nagios: install/config, usage, custom plugins
    • Packet sniffing for dummies
    • Intro to Troubleshooting
    • Tools: telnet, openssl; grep, sed; ps, lsof, netstat

    Monitoring: Automated Monitoring Workflow

    Monitoring packages, open source: Pandora FMS, Opsview Core, Naemon, Capitalware ServerStatus, Sensu. Monitoring packages, commercial: Nagios XI, Groundwork, PRTG Network Monitor, CopperEgg, WhatsUp Gold, op5 (Naemon). (All trademarks and logos are property of their respective trademark or copyright holders and are used by permission or fair use for education. Neither the presenter nor the conference organizers are affiliated in any way with any companies mentioned here.)

    Why automatic service monitoring? Spot small problems before they become big ones; learn about outages before your users do; a checklist when restoring from a power outage; better problem reports than users give you; problems you might never spot otherwise, such as failed HDDs in RAIDs, full /var partitions, logs not rotating, and system temperature rising. Also capacity planning: performance data can generate graphs of utilization (RAM, disk, etc.). Availability reports, with CAUTION: they are easy to generate and even easier to generate wrong, so make sure your configurations actually catch problems; they will also include problems with Nagios itself :( If you're going to quote your availability numbers (SLAs, etc.), make sure you understand what you're actually monitoring.
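    The "Tools" bullet hints at the manual checks the talk walks through; a few representative one-liners in that spirit (hostnames are placeholders):

      telnet mail.example.org 25                       # does SMTP answer with a banner?
      openssl s_client -connect mail.example.org:993   # inspect the IMAPS certificate
      ps aux | grep '[h]ttpd'                          # is the web server even running?
      netstat -tlnp | grep ':80 '                      # is anything listening on port 80?
      lsof -i :80                                      # which process owns that socket?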
  • Monitoring Bareos with Icinga 2 Version: 1.0
    Monitoring Bareos with Icinga 2, Version 1.0 (NETWAYS; "We love Open Source")

    Environment: pre-installed software comprises Bareos, the Bareos database (PostgreSQL), Bareos WebUI, Icinga 2, IDO (MariaDB), Icinga Web 2, and Graphite.

    Agenda: 1 Environment; 2 Introduction; 3 Host; 4 Active Checks; 5 Passive Events; 6 Graphite

    What is Bareos? Backup Archiving Recovery Open Sourced: backup, archiving and recovery of current operating systems. An open-source fork of Bacula (http://bacula.org), forked in 2010 (http://bareos.org), under the AGPL v3 license (https://github.com/bareos/bareos). It adds a lot of new features: LTO hardware encryption, bandwidth limitation, cloud storage connection, new console commands, and many more.

    Bareos Structure

    What is Icinga 2? Icinga is a scalable and extensible monitoring system which checks the availability of your resources, notifies users of outages, and provides extensive BI data. It is an international community project, and everything developed by the Icinga Project is open source. Originally forked from Nagios in 2009, it has been an independent version, Icinga 2, since 2014. Icinga does availability monitoring: it monitors everything, gathering status, collecting performance data, notifying using any channel, considering dependencies, handling events, checking and forwarding logs, dealing with performance data, and providing SLA data. The Icinga 2 core is based on C++ and Boost and supports all major *NIX and Windows platforms, with a powerful configuration…
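    Tying the two together, an Icinga 2 active check for a Bareos director could be declared roughly like this (object names and the address are invented; 9101 is the director's customary port):

      # Sketch: a host plus a generic TCP service check, written to a conf.d file.
      cat > /etc/icinga2/conf.d/bareos.conf <<'EOF'
      object Host "bareos-server" {
        address = "192.0.2.10"            // example address
        check_command = "hostalive"
      }
      object Service "bareos-dir" {
        host_name = "bareos-server"
        check_command = "tcp"             // generic TCP check from the ITL
        vars.tcp_port = 9101              // Bareos director port
      }
      EOF
      icinga2 daemon -C   # validate the configuration before reloading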
  • File System Design Approaches
    File System Design Approaches
    Dr. Brijender Kahanwal, Department of Computer Science & Engineering, Galaxy Global Group of Institutions, Dinarpur, Ambala, Haryana, INDIA
    [email protected]

    Abstract—In this article, the file system development design approaches are discussed. The selection of the file system design approach is done according to the needs of the developers: what the needed requirements are, and what the file system specifications for the new design are. Our experience with file system development is limited, so the research served to identify the different techniques that can be used. The variety of file systems encountered shows what an active area of research file system development is, and it allowed us to identify where our proposal fitted in with relation to current and past file system development. The file systems researched fell into one of the following four categories:
    1. The file system is developed in user space and runs as a user process.
    2. The file system is developed in user space using the FUSE (File system in USEr space) kernel module and runs as a user process.
    3. The file system is developed in the kernel and runs as a privileged process.
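    Category 2 is the easiest to observe first-hand, since well-known file systems such as sshfs are built on FUSE; a quick demonstration (user, host, and paths are placeholders):

      sshfs alice@fileserver:/export/data /mnt/data   # the FS runs as a user process
      mount | grep fuse                               # shows a fuse.sshfs entry
      fusermount -u /mnt/data                         # unmount when finished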
  • Runit
    Introduction
    runit is a cross-platform UNIX init scheme with service supervision: a replacement for sysvinit and the other init schemes and supervision tools used with the traditional init. runit is compatible with djb's daemontools.

    In Unix-based computer operating systems, init (short for initialization) is the first process started during booting of the computer system. Init is a daemon process that continues running until the system is shut down. Slackware comes with its own legacy init (/sbin/init) from the sysvinit package, which used to be included in almost all other major Linux distributions. The init daemon (or its replacement) is characterised by Process ID 1 (PID 1). To read about the benefits of runit, see http://smarden.org/runit/benefits.html. Unless otherwise stated, all commands in this article are to be run by root.

    Use runit with traditional init (sysvinit)
    runit is not provided by Slackware, but a SlackBuild is maintained on https://slackbuilds.org/. It does not have any dependencies. As we do not yet want to replace init with runit, run the SlackBuild with CONFIG=no:

    CONFIG=no ./runit.SlackBuild

    Then install the resulting package and proceed as follows:

    mkdir /etc/runit/ /service/
    cp -a /usr/doc/runit-*/etc/2 /etc/runit/
    /sbin/runsvdir-start &

    Starting via rc.local
    For a typical Slackware-style service, you can edit the /etc/rc.d/rc.local file:

    if [ -x /sbin/runsvdir-start ]; then
      /sbin/runsvdir-start &
    fi

    and then write /etc/rc.d/rc.local_shutdown:

    #!/bin/sh
    RUNIT=x$( /sbin/pidof runsvdir )
    if [ "$RUNIT" != x ]; then
      kill $RUNIT
    fi

    Then give rc.local_shutdown execute permission:

    chmod +x /etc/rc.d/rc.local_shutdown

    and reboot.

    Starting via inittab (supervised)
    Remove the entries in /etc/rc.d/rc.local and /etc/rc.d/rc.local_shutdown described above.
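    The excerpt stops before showing an actual service; in the daemontools style that runit follows, a supervised service is just a directory with an executable run script. A minimal sketch (the daemon name is hypothetical):

      mkdir -p /etc/sv/mydaemon
      cat > /etc/sv/mydaemon/run <<'EOF'
      #!/bin/sh
      exec 2>&1
      exec /usr/sbin/mydaemon --foreground   # must stay in the foreground; runsv supervises it
      EOF
      chmod +x /etc/sv/mydaemon/run
      ln -s /etc/sv/mydaemon /service/       # runsvdir notices the new service and starts it
      sv status /service/mydaemon            # query the supervised service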
  • Latency-Aware, Inline Data Deduplication for Primary Storage
    iDedup: Latency-aware, inline data deduplication for primary storage
    Kiran Srinivasan, Tim Bisson, Garth Goodson, Kaladhar Voruganti
    NetApp, Inc.
    {skiran, tbisson, goodson, [email protected]

    Abstract
    Deduplication technologies are increasingly being deployed to reduce cost and increase space-efficiency in corporate data centers. However, prior research has not applied deduplication techniques inline to the request path for latency sensitive, primary workloads. This is primarily due to the extra latency these techniques introduce. Inherently, deduplicating data on disk causes fragmentation that increases seeks for subsequent sequential reads of the same data, thus increasing latency. In addition, deduplicating data requires extra disk IOs to access on-disk deduplication metadata. In this paper, we pro…

    …systems exist that deduplicate inline with client requests for latency sensitive primary workloads. All prior deduplication work focuses on either: i) throughput sensitive archival and backup systems [8, 9, 15, 21, 26, 39, 41]; ii) latency sensitive primary systems that deduplicate data offline during idle time [1, 11, 16]; or iii) file systems with inline deduplication, but agnostic to performance [3, 36]. This paper introduces two novel insights that enable latency-aware, inline, primary deduplication. Many primary storage workloads (e.g., email, user directories, databases) are currently unable to leverage the benefits of deduplication, due to the associated latency costs. Since offline deduplication systems impact la…
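    To get a feel for the block-level duplication iDedup trades against latency, you can hash fixed-size blocks of any large file and count repeats; a rough sketch with GNU coreutils (the path and block size are arbitrary):

      f=/var/mail/alice.mbox                                        # example input
      total=$(split -b 4096 --filter='sha256sum' "$f" | wc -l)      # all 4 KiB blocks
      unique=$(split -b 4096 --filter='sha256sum' "$f" | sort -u | wc -l)
      echo "blocks=$total unique=$unique"                           # dedup ratio = total/unique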
  • IBM Cognos Analytics Data Prep. and Modelling: Working with Your Data
    Michael McGeein, CPA, CA, September 2018. IBM Business Analytics :: IBM Confidential :: © 2016 IBM Corporation

    Notices and disclaimers
    Copyright © 2018 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights: use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed "as is" without any warranty, either express or implied; in no event shall IBM be liable for any damage arising from the use of this information.

    …as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute, legal or other guidance or advice to any individual participant or their specific situation.
  • Using Ganeti for Running Highly Available Virtual Services
    Using Ganeti for running highly available virtual services
    2016-04-21, HEPiX Spring 2016, Zeuthen

    Overview: What is Ganeti; what is it good for; how does it work; NDGF usage.

    What is Ganeti
    A software stack for managing virtual machines, like VMware or OpenStack or libvirt, supporting the Xen and KVM hypervisors. It handles storage (volume creation and assignment), OS installation and customization, networking, and the startup, shutdown, live migration, and failover of instances. Written in Python and Haskell, it is aimed at ease of use and fast, simple error recovery after physical failures on commodity hardware.

    Ganeti is mainly developed by Google for their own use: it handles VMs for the corporate network (office servers, remote desktops, etc.), not production services (what non-employees see). Outside Google it is used by Debian, NDGF-T1, Lufthansa, etc. It is maintained by Google with significant external contributions.

    What is Ganeti good at
    Running highly available services on a small set of hardware: DRBD or external reliable block devices (CEPH, enterprise storage); live migrations in case of impending hardware failure, or to reboot into a new kernel after a security upgrade on the hardnode; failover handled automatically in case of sudden hardware failure. There are no external dependencies beyond networking (well, if you use external storage...), and no extra servers or services are needed. A typical reasonable cluster size is 3–50 hardnodes, though multiple clusters integrate well in admin tools.

    How does Ganeti work
    gnt-cluster init ... creates a cluster of Ganeti nodes (we'll assume DRBD for storage, as at NDGF). One member is the master node; others can take over with master-failover if they can get quorum.
    gnt-instance add creates VMs, with OS install scripts (image, debootstrap, pxe). Each VM has a secondary location (DRBD mirror, sync).
    gnt-instance migrate causes no noticeable service impact from live migration, with a <1 s network pause. Unless something is broken..
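    The gnt-* commands the slides name form a small lifecycle; roughly like this sketch (node and instance names, the OS variant, and sizes are examples, not from the slides):

      gnt-cluster init --enabled-hypervisors=kvm cluster.example.org
      gnt-instance add -t drbd -o debootstrap+default \
          --disk 0:size=10G -B memory=2G \
          -n node1.example.org:node2.example.org vm1.example.org
      gnt-instance migrate vm1.example.org   # live-migrate to the DRBD secondary
      gnt-cluster master-failover            # run on another node if the master dies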
  • Configuration File Manipulation with Configuration Management Tools
    Configuration File Manipulation with Configuration Management Tools
    Student Paper for 185.307 Seminar aus Programmiersprachen, SS2016
    Bernhard Denner, 0626746. Email: [email protected]

    Abstract—Manipulating and defining the content of configuration files is one of the central tasks of configuration management tools. This is important to ensure a desired and correct behavior of computer systems. However, though often considered very simple, this task can turn out to be very challenging. Configuration management tools provide different strategies for content manipulation of files, hence an administrator should think twice to choose the best strategy for a specific task. This paper will explore different approaches of automatic configuration file manipulation with the help of configuration management tools. We will take a close look at the configuration management tool Puppet, as it offers various different approaches to tackle content manipulation. We will also explore similarities and differences of approaches by comparing it with other configuration management tools.

    …the most popular CM-Tool these days, and it offers a huge variety of configuration manipulation methods. But we will also take a look at other CM-Tools to explore their view of configuration file handling. In Section II an introduction will be given about CM-Tools, examining which ones are established in the community and how they work. Before we dive into file manipulation strategies, Section III will give some introduction in configuration stores. Section IV will describe different configuration manipulation strategies of Puppet, whereas Section V will look at approaches of other CM-Tools. Section VI will show some scientific papers about CM-Tools.
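    To make "different strategies" concrete: Puppet's bluntest approach manages the whole file, while the file_line resource from the puppetlabs-stdlib module guarantees only a single line. A sketch of both, applied ad hoc (paths and content are examples):

      # Whole-file strategy: Puppet owns every byte of /etc/motd.
      puppet apply -e 'file { "/etc/motd":
        ensure  => file,
        content => "Managed by Puppet.\n",
      }'
      # Line-level strategy: ensure one line exists, leave the rest alone
      # (requires the puppetlabs-stdlib module).
      puppet apply -e 'file_line { "ntp server":
        path => "/etc/ntp.conf",
        line => "server ntp.example.org iburst",
      }'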