Agentless Cloud-Wide Monitoring of Virtual Disk State

Total Pages: 16

File Type: pdf, Size: 1020 KB

Agentless Cloud-wide Monitoring of Virtual Disk State

Wolfgang Richter
CMU-CS-15-138
October 2015

School of Computer Science
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA

Thesis Committee:
Mahadev Satyanarayanan, Chair
David G. Andersen
Gregory R. Ganger
Vasanth Bala (Google)
Canturk Isci (IBM Research)

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2015 Wolfgang Richter

This research was sponsored by the National Science Foundation under grant numbers CNS-0614679, CNS-0833882, IIS-1065336, DGE-0750271, and DGE-1252522, the Defense Advanced Research Projects Agency under grant number FA8721-5-C-0003, Intel ISTC-CC, IBM, and Vodafone. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government, or any other entity.

Keywords: Agentless, Agentless Monitoring, Agentless Cloud Monitoring, Cloud, Cloud Computing, Cloud Monitoring, Deduplication, Deduplicated Snapshotting, Distributed Streaming Virtual Machine Introspection, File Deduplication, File Snapshotting, File Monitoring, Incremental Hashing, Introspection, Optimistic File Snapshotting, Retrospection, Searchable Backup, Snapshotting, Virtual Disk, Virtual Storage, Virtual Machine, VM, Virtual Machine Introspection, VMI

Abstract

This dissertation proposes a fundamentally different way of monitoring persistent storage. It introduces a monitoring platform based on the modern reality of software-defined storage which enables the decoupling of policy from mechanism. The proposed platform is both agentless—meaning it operates external to and independent of the entities it monitors—and scalable—meaning it is designed to address many systems at once with a mixture of operating systems and applications. Concretely, this dissertation focuses on virtualized clouds, but the proposed monitoring platform generalizes to any form of persistent storage. The core mechanism this dissertation introduces is called Distributed Streaming Virtual Machine Introspection (DS-VMI), and it leverages two properties of modern clouds: virtualized servers managed by hypervisors enabling efficient introspection, and file-level duplication of data within cloud instances. We explore a new class of agentless monitoring applications via three interfaces with two different consistency models: cloud-inotify (strong consistency), /cloud (eventual consistency), and /cloud-history (strong consistency). cloud-inotify is a publish-subscribe interface to cloud-wide file-level updates and it supports event-based monitoring applications. /cloud is designed to support batch-based and legacy monitoring applications by providing a file system interface to cloud-wide file-level state. /cloud-history is designed to support efficient search and management of historic virtual disk state. It leverages new fast-to-access archival storage systems, and achieves tractable indexing of file-level history via whole-file deduplication using a novel application of an incremental hashing construction.

I dedicate this work to those that have helped me make my dreams become reality. First, I must thank my maternal grandparents who supported me academically and my, at times, expensive explorations with computers and electronics. Gratitude alone is not nearly enough. I will never forget your legacy.
Second, I must thank my paternal grandparents for showing me how people can persevere through the worst of times the world has ever seen, and thrive. Third, for my parents, who fostered a home filled with the digital wonders of the modern world, and my siblings, for always being there for me. Fourth, to Ma and Baba, for support and guidance through the fog of stress. Finally, and most importantly, to Debjani Biswas, my companion for life, my wife. Thanks for putting up with my long hours and wacky ideas, and for going on adventures with me around the world.

Acknowledgments

The PhD process is a herculean effort, of which I am only one small part. Along the way I have been aided by the smartest people at the top of the field of Computer Science. My success is attributable to the great opportunities presented to me by the academic environment fostered at Carnegie Mellon University's School of Computer Science. I have been honored with public support through the NSF, and must thank the American people for supporting scientific efforts across the country. Of course, my interactions with many people along the way have shaped me, formed my research focus, and honed my skills. Here I thank many by name, but there are many more than I could possibly name. Thank you to everyone who interacted with me during my tenure as a graduate student. We all truly stand on the shoulders of giants.

Of course, a large part of my success is due to excellent guidance and tutelage under my advisor Mahadev Satyanarayanan. Over the many years, our group administrative assistants, including Tracy Farbacher, Angela Miller, and Chase Klingensmith, always made my day better and ensured academic life was smooth. Deborah Cavlovich is amazing, taking away every possible headache that arose during graduate school. Without her, scheduling classes, staying on top of PhD administrative overhead, and practically anything else would have been a nightmare. Catherine Copetas is a friend and helper to all students at CMU; she enriched my life by offering me many opportunities to meet new and interesting visitors. Karen Lindenfelser and Joan Digney always helped organize events via the SDI and the PDL. Their efforts made our lives less stressful, and more fun.

Of course, conversations and nights well spent with friends led to new ideas and filled my life with fun. Thank you especially Yoshihisa Abe, Leman Akoglu, Brandon Amos, Sivaraman Balakrishnan, Field Cady, Zhuo Chen, Jim Cipar, Bhavana Dalvi, Leigh-Ann DeLyser, Ruta Desai, Bin Fan, Kiryong Ha, Wenlu Hu, Shiva Kaul, Elie Krevat, Anvesh Komuravelli, Jayant Krishnamurthy, Meghana Kshirsagar, Hyeontaek Lim, Michelle Mazurek, Brendan Meeder, Iulian Moraru, Richard Peng, Amar Phanishayee, Kai Ren, Mehdi Samadi, Raja Sambasivan, Vyas Sekar, Vivek Seshadri, Jiri Simsa, Wittawat Tantisiriroj, Ekaterina Taralova, Alexey Tumanov, Vijay Vasudevan, Kevin Waugh, Gabriel Weisz, Lin Xiao, and Erik Zawadzki. For welcoming me to CMU, thank you Matthew Delaney, Jason Calaiaro, and Michael Mackin. Thank you all for the wonderful memories; they will stay with me for life.

Last, but not least, I must thank a lot of my mentors and close research peers. This includes Glenn Ammons, Hrishikesh Amur, David Andersen, Vasanth Bala, Nilton Bila, Sastry Duri, Canturk Isci, Michael Kaminsky, Todd Mummert, Padmanabhan Pillai, Darrell Reimer, and Srinivas Seshan.
The research scientists in my group, Benjamin Gilbert, Adam Goode, and Jan Harkes, have each served as mentors and an inspiration as I developed my systems research skills. Thanks for keeping me technically honest, humble, and holding me to the highest of standards. Each and every one of you has a hand in my future career and success.

Contents

List of Figures
List of Tables

1 Introduction
  1.1 Problem Statement and Scope
    1.1.1 Bridging the Semantic and Temporal Gaps
    1.1.2 Bounding Overhead
    1.1.3 Generality
    1.1.4 Storing and Indexing Historic Changes
  1.2 Thesis Statement
  1.3 Contributions
2 Distributed Streaming Virtual Machine Introspection (DS-VMI)
  2.1 Overview of the DS-VMI Prototype
  2.2 Hypervisor, Kernel, and File System Requirements
    2.2.1 Hypervisor Virtual Storage Hooks
    2.2.2 Guest Kernel Invariants
    2.2.3 File System Invariants
    2.2.4 Correctness of DS-VMI Relative to Snapshotting
  2.3 Crawling Initial Virtual Disk State
    2.3.1 Impact on Virtual Image Library Operations
    2.3.2 Crawling a Virtual Disk
    2.3.3 An Example Journaling File System: ext4
    2.3.4 An Example Closed Source File System: NTFS
    2.3.5 An Example Non-Journaling File System: FAT32
  2.4 Asynchronous Queuing of VM Writes
  2.5 Introspecting Live Virtual Disk Writes
  2.6 Live Attachment and Detachment
    2.6.1 Live Crawling and Attaching
    2.6.2 Detaching Introspection
  2.7 Integration with Existing Clouds
    2.7.1 Designing an API within OpenStack
  2.8 Evaluating Overall Overhead
    2.8.1 Experimental Setup
    2.8.2 DS-VMI Tunables
    2.8.3 Light-rate Small Writes: Modified Andrew Benchmark
    2.8.4 Clustered Large Writes: Installing Software
    2.8.5 Moderate-rate Small Writes: PostMark
    2.8.6 Reducing Memory Footprint: Lazily Loading Metadata
    2.8.7 Dropping Writes
  2.9 Limitations of DS-VMI
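The abstract's last claim, tractable indexing of file-level history via whole-file deduplication built on an incremental hashing construction, is easier to see in code. The sketch below only illustrates the general technique: per-block hashes combined with modular addition, so that one intercepted block write updates a whole-file digest without rereading untouched blocks. The hash function, block size, and all names here are assumptions of this sketch, not the dissertation's actual construction.

```c
/* Illustrative sketch of an incremental whole-file hash (AdHash-style):
 * digest(file) = sum over blocks of H(block_index || block_data) mod 2^64.
 * Overwriting one block subtracts its old per-block hash and adds the new one,
 * so the whole-file digest is maintained in time proportional to the write.
 * NOT the dissertation's construction; names and parameters are hypothetical. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096

/* FNV-1a over the block index and its contents (placeholder for a real hash). */
static uint64_t block_hash(uint64_t index, const uint8_t *data, size_t len) {
    uint64_t h = 1469598103934665603ULL;
    const uint8_t *p = (const uint8_t *)&index;
    for (size_t i = 0; i < sizeof index; i++) { h ^= p[i];    h *= 1099511628211ULL; }
    for (size_t i = 0; i < len; i++)          { h ^= data[i]; h *= 1099511628211ULL; }
    return h;
}

/* Whole-file digest: modular sum of per-block hashes (wraps mod 2^64). */
typedef struct { uint64_t digest; } file_hash_t;

static void file_hash_init(file_hash_t *fh) { fh->digest = 0; }

static void file_hash_add_block(file_hash_t *fh, uint64_t idx,
                                const uint8_t *data, size_t len) {
    fh->digest += block_hash(idx, data, len);
}

/* Incremental update when a single block is overwritten in place. */
static void file_hash_update_block(file_hash_t *fh, uint64_t idx,
                                   const uint8_t *old_data, size_t old_len,
                                   const uint8_t *new_data, size_t new_len) {
    fh->digest -= block_hash(idx, old_data, old_len);
    fh->digest += block_hash(idx, new_data, new_len);
}

int main(void) {
    uint8_t before[BLOCK_SIZE] = {0}, after[BLOCK_SIZE] = {0};
    memset(after, 0xAB, 16);                 /* pretend 16 bytes were rewritten */

    file_hash_t fh;
    file_hash_init(&fh);
    file_hash_add_block(&fh, 0, before, sizeof before);    /* initial disk crawl */
    printf("digest before write: %016llx\n", (unsigned long long)fh.digest);

    file_hash_update_block(&fh, 0, before, sizeof before,  /* one intercepted write */
                           after, sizeof after);
    printf("digest after  write: %016llx\n", (unsigned long long)fh.digest);
    return 0;
}
```

With per-block hashes cached alongside the virtual disk, a monitoring layer that sees individual block writes can keep whole-file digests current and use them as deduplication keys, which is the property the abstract relies on for indexing file-level history.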
Recommended publications
  • Huawei Announces EROFS Linux File-System, Might Eventually Be Used By Android Devices
Written by Michael Larabel in Linux Storage on 31 May 2018 at 09:00 AM EDT. 3 Comments

Huawei's Gao Xiang has announced the EROFS open-source Linux file-system intended for Android devices, but still at its very early stages of development.

EROFS is the company's new approach for a read-only file-system that would work well for Android devices. EROFS is short for the Extendable Read-Only File-System, and they began developing it after being unsatisfied with other read-only file-system alternatives.

EROFS is designed to offer better performance than other read-only alternatives while still focusing upon saving storage space. Part of EROFS is also a compression mode pursuing a different design approach than other file-systems: the compression numbers shared in today's announcement, on both server hardware and a Kirin 970, are compelling for being in the early stages of development.
  • CERIAS Tech Report 2017-5 Deceptive Memory Systems by Christopher N. Gutierrez
CERIAS Tech Report 2017-5, Deceptive Memory Systems, by Christopher N. Gutierrez. Center for Education and Research in Information Assurance and Security, Purdue University, West Lafayette, IN 47907-2086.

DECEPTIVE MEMORY SYSTEMS. A Dissertation Submitted to the Faculty of Purdue University by Christopher N. Gutierrez, In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, December 2017. Purdue University, West Lafayette, Indiana.

STATEMENT OF DISSERTATION APPROVAL: Dr. Eugene H. Spafford, Co-Chair, Department of Computer Science; Dr. Saurabh Bagchi, Co-Chair, Department of Computer Science; Dr. Dongyan Xu, Department of Computer Science; Dr. Mathias Payer, Department of Computer Science. Approved by: Dr. Voicu Popescu by Dr. William J. Gorman, Head of the Graduate Program.

This work is dedicated to my wife, Gina. Thank you for all of your love and support. The moon awaits us.

ACKNOWLEDGMENTS. I would like to thank Professors Eugene Spafford and Saurabh Bagchi for their guidance, support, and advice throughout my time at Purdue. Both have been instrumental in my development as a computer scientist, and I am forever grateful. I would also like to thank the Center for Education and Research in Information Assurance and Security (CERIAS) for fostering a multidisciplinary security culture in which I had the privilege to be part of. Special thanks to Adam Hammer and Ronald Castongia for their technical support and Thomas Yurek for his programming assistance for the experimental evaluation. I am grateful for the valuable feedback provided by the members of my thesis committee, Professor Dongyan Xu, and Professor Mathias Payer.
  • Ext4 File System and Crash Consistency
Ext4 file system and crash consistency (Changwoo Min)

Summary of last lectures:
• Tools: building, exploring, and debugging the Linux kernel
• Core kernel infrastructure
• Process management & scheduling
• Interrupts & interrupt handlers
• Kernel synchronization
• Memory management
• Virtual file system
• Page cache and page fault

Today: ext4 file system and crash consistency
• File system in the Linux kernel
• Design considerations of a file system
• History of file system design
• On-disk structure of ext4
• File operations
• Crash consistency

File system in the Linux kernel: a user-space application (e.g., cp) issues syscalls (open, read, write, etc.); in kernel space the VFS (Virtual File System) dispatches to concrete file systems (ext4, FAT32, JFFS2), which sit above the block layer and the hardware (embedded flash, hard disk, USB drive).

What is a file system fundamentally? It maps path names to inodes, maps file offsets to disk block addresses, and supports metadata and directory operations:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <dirent.h>
    #include <sys/stat.h>

    int main(int argc, char *argv[])
    {
        int fd;
        char buffer[4096];
        struct stat stat_buf;
        DIR *dir;
        struct dirent *entry;

        /* 1. Path name -> inode mapping */
        fd = open("/home/lkp/hello.c", O_RDONLY);

        /* 2. File offset -> disk block address mapping */
        pread(fd, buffer, sizeof(buffer), 0);

        /* 3. File meta data operation */
        fstat(fd, &stat_buf);
        printf("file size = %ld\n", (long)stat_buf.st_size);

        /* 4. Directory operation */
        dir = opendir("/home");
        entry = readdir(dir);
        printf("dir = %s\n", entry->d_name);
        return 0;
    }

Why do we care about the ext4 file system?
• Most widely-deployed file system
• Default file system of major Linux distributions
• File system used in Google data centers
• Default file system of the Android kernel
• Follows the traditional file system design

History of file system design. UFS (Unix File System): the original UNIX file system, designed by Dennis Ritchie and Ken Thompson (1974); the first Linux file system (ext) and the Minix FS have a similar layout. UFS (and the first Linux file system) had performance problems, especially the long seek time between an inode and its data blocks. FFS (Fast File System): the file system of BSD UNIX, designed by Marshall Kirk McKusick, et al.
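The excerpt stops before the lecture reaches its crash-consistency material. For context, the application-level idiom such discussions usually build on is "write a temporary file, fsync it, then rename it over the original." Below is a generic POSIX sketch of that pattern; it is not code from the lecture, and the file names are made up.

```c
/* Standard "write temp, fsync, rename" idiom for crash-consistent file updates
 * on ext4 and other POSIX file systems; a generic sketch, not the lecture's code. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int atomic_replace(const char *path, const char *tmp_path,
                          const char *data, size_t len) {
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    if (write(fd, data, len) != (ssize_t)len) { close(fd); return -1; }
    if (fsync(fd) < 0)                        { close(fd); return -1; }  /* data durable */
    close(fd);

    if (rename(tmp_path, path) < 0) return -1;     /* atomic swap of names */

    int dirfd = open(".", O_RDONLY | O_DIRECTORY); /* persist the new directory entry
                                                      (assumes both paths are in cwd) */
    if (dirfd >= 0) { fsync(dirfd); close(dirfd); }
    return 0;
}

int main(void) {
    const char *msg = "hello, crash consistency\n";
    if (atomic_replace("hello.txt", "hello.txt.tmp", msg, strlen(msg)) < 0) {
        perror("atomic_replace");
        return 1;
    }
    return 0;
}
```

After a crash, the file name refers either to the complete old contents or the complete new contents, never a partially written mix, which is the observable guarantee that journaling file systems like ext4 are designed to preserve cheaply.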
  • Distributing an SQL Query Over a Cluster of Containers
2019 IEEE 12th International Conference on Cloud Computing (CLOUD). Distributing an SQL Query Over a Cluster of Containers. David Holland and Weining Zhang, Department of Computer Science, University of Texas at San Antonio. Email: [email protected], [email protected]

Abstract—Emergent software container technology is now available on any cloud and opens up new opportunities to execute and scale data intensive applications wherever data is located. However, many traditional relational databases hosted on clouds have not scaled well. In this paper, a framework and deployment methodology to containerize relational SQL queries is presented, so that a single SQL query can be scaled and executed by a network of cooperating containers, achieving intra-operator parallelism and other significant performance gains. Results of container prototype experiments are reported and compared to a real-world RDBMS baseline. Preliminary results on a research cloud show up to 3 orders of magnitude performance gain for some queries when compared to running the same query on a

across a cluster of containers in a cloud. A feasibility study of this with performance analysis is reported in this paper. A containerized query (henceforth CQ) uses a deployment methodology that is unique to each query with respect to the number of containers and their networked topology effecting data flows. Furthermore a CQ can be scaled at run-time by adding more intra-operators. In contrast, traditional distributed database query deployment configurations do not change at run-time, i.e., they are static and applied to all queries. Additionally, traditional distributed databases often need to rewrite an SQL query to optimize performance.
  • iVoyeur: inotify
COLUMNS. iVoyeur: inotify. Dave Josephsen.

Dave Josephsen is the author of Building a Monitoring Infrastructure with Nagios (Prentice Hall PTR, 2007) and is Senior Systems Engineer at DBG, Inc., where he maintains a gaggle of geographically dispersed server farms. He won LISA '04's Best Paper award for his co-authored work on spam mitigation, and he donates his spare time to the SourceMage GNU Linux Project. [email protected]

The last time I changed jobs, the magnitude of the change didn't really sink in until the morning of my first day, when I took a different combination of freeways to work. The difference was accentuated by the fact that the new commute began the same as the old one, but on this morning, at a particular interchange, I would zig where before I zagged.

It was an unexpectedly emotional and profound metaphor for the change. My old place was off to the side, and down below, while my future was straight ahead, and literally under construction.

The fact that it was under construction was poetic but not surprising. Most of the roads I travel in the Dallas/Fort Worth area are under construction and have been for as long as anyone can remember. And I don't mean a lane closed here or there. Our roads drift and wander like leaves in the water—here today and tomorrow over there. The exits and entrances, neither a part of this road or that, seem unable to anticipate the movements of their brethren, and are constantly forced to react.
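The excerpt above is only the column's opening metaphor; its actual subject, Linux's inotify API, is also the namesake of the thesis's cloud-inotify interface (which, unlike inotify, works without an in-guest agent). As a reference point, a minimal single-directory watch loop using the standard inotify(7) calls looks roughly like this; it is a generic sketch, not code from the column.

```c
/* Minimal Linux inotify watch loop: print modify/create/delete events for one
 * directory. Uses only the standard inotify(7) API; paths are illustrative. */
#include <sys/inotify.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    const char *path = (argc > 1) ? argv[1] : ".";

    int fd = inotify_init1(0);
    if (fd < 0) { perror("inotify_init1"); return 1; }

    if (inotify_add_watch(fd, path, IN_MODIFY | IN_CREATE | IN_DELETE) < 0) {
        perror("inotify_add_watch");
        return 1;
    }

    /* Buffer large enough for several events; aligned as the man page suggests. */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));

    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);   /* blocks until events arrive */
        if (len <= 0) { perror("read"); return 1; }

        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            printf("event mask=0x%x name=%s\n", ev->mask,
                   ev->len ? ev->name : "(watched dir)");
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
}
```

The limitation the dissertation targets is visible here: this loop must run inside the monitored system and watch one path at a time, whereas cloud-inotify aims to deliver the same style of file-update events for an entire cloud from outside the guests.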
  • Andrew File System (AFS) Google File System February 5, 2004
Advanced Topics in Computer Systems, CS262B. Prof. Eric A. Brewer. Andrew File System (AFS), Google File System. February 5, 2004.

I. AFS

Goal: large-scale campus-wide file system (5000 nodes)
o must be scalable, limit work of core servers
o good performance
o meet FS consistency requirements (?)
o manageable system admin (despite scale)

400 users in the "prototype" -- a great reality check (makes the conclusions meaningful)
o most applications work w/o relinking or recompiling

Clients:
o user-level process, Venus, that handles local caching, + FS interposition to catch all requests
o interaction with servers only on file open/close (implies whole-file caching)
o always check cache copy on open() (in prototype)

Vice (servers):
o Server core is trusted; called "Vice"
o servers have one process per active client
o shared data among processes only via file system (!)
o lock process serializes and manages all lock/unlock requests
o read-only replication of namespace (centralized updates with slow propagation)
o prototype supported about 20 active clients per server, goal was >50

Revised client cache:
o keep data cache on disk, metadata cache in memory
o still whole-file caching, changes written back only on close
o directory updates are write-through, but cached locally for reads
o instead of check on open(), assume valid unless you get an invalidation callback (server must invalidate all copies before committing an update)
o allows name translation to be local (since you can now avoid a round-trip for each step of the path)

Revised servers:
o move
  • Hypervisors Vs. Lightweight Virtualization: a Performance Comparison
2015 IEEE International Conference on Cloud Engineering. Hypervisors vs. Lightweight Virtualization: a Performance Comparison. Roberto Morabito, Jimmy Kjällman, and Miika Komu. Ericsson Research, NomadicLab, Jorvas, Finland. [email protected], [email protected], [email protected]

Abstract — Virtualization of operating systems provides a common way to run different services in the cloud. Recently, the lightweight virtualization technologies claim to offer superior performance. In this paper, we present a detailed performance comparison of traditional hypervisor based virtualization and new lightweight solutions. In our measurements, we use several benchmark tools in order to understand the strengths, weaknesses, and anomalies introduced by these different platforms in terms of processing, storage, memory and network. Our results show that containers achieve generally better performance when compared with traditional virtual machines and other recent solutions. Albeit containers offer clearly more dense deployment of virtual machines, the performance difference with other technologies is in many cases relatively small.

container and alternative solutions. The idea is to quantify the level of overhead introduced by these platforms and the existing gap compared to a non-virtualized environment. The remainder of this paper is structured as follows: in Section II, a literature review and a brief description of all the technologies and platforms evaluated is provided. The methodology used to realize our performance comparison is introduced in Section III. The benchmark results are presented in Section IV. Finally, some concluding remarks and future work are provided in Section V.

II. BACKGROUND AND RELATED WORK

In this section, we provide an overview of the different technologies included in the performance comparison.
  • CS 152: Computer Systems Architecture Storage Technologies
CS 152: Computer Systems Architecture: Storage Technologies. Sang-Woo Jun, Winter 2019.

Storage used to be a secondary concern. Typically, storage was not a first-order citizen of a computer system:
o as alluded to by its name, "secondary storage"
o its job was to load programs and data to memory, and disappear
o most applications only worked with CPU and system memory (DRAM)
o extreme applications like DBMSs were the exception
Because conventional secondary storage was very slow:
o things are changing!

Some (pre)history: magnetic core memory (1950s-1970s, 72 KiB per cubic foot!); rope memory (ROM, 1960s; 1024 bits in photo, hand-woven to program the 1950s Apollo guidance computer); drum memory (1950s, 100s of KiB). Photos from Wikipedia.

Some (more recent) history: floppy disk drives (1970s-2000s, 100 KiBs to 1.44 MiB); hard disk drives (1950s to present, MBs to TBs). Photos from Wikipedia.

Some (current) history: solid state drives (2000s to present, GB to TBs); non-volatile memory (2010s to present, GBs).

Hard disk drives: the dominant storage medium for the longest time, and still the largest capacity share. Data is organized into multiple magnetic platters:
o a mechanical head needs to move to where data is, to read it
o good sequential access, terrible random access
  • 100s of MB/s sequential, maybe 1 MB/s for 4 KB random
o time for the head to move to the right location ("seek time") may be ms long
  • 1,000,000s of cycles!
Typically "ATA" (including IDE and EIDE), and later "SATA" interfaces:
o connected via the "South bridge" chipset

Ding Yuan, "Operating Systems ECE344 Lecture 11: File
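The sequential-versus-random figures in the slide above are worth turning into a quick calculation. The sketch below uses assumed round numbers (150 MB/s sequential bandwidth, 10 ms per random 4 KiB access) purely for illustration; they are in the ballpark of the slide's figures but are not measurements from the lecture.

```c
/* Back-of-the-envelope HDD throughput comparison. The constants are assumed
 * round figures for illustration only, not measurements from the lecture. */
#include <stdio.h>

int main(void) {
    const double total_bytes = 1024.0 * 1024 * 1024;  /* read 1 GiB in total */
    const double seq_bw      = 150e6;                 /* ~150 MB/s sequential */
    const double seek_time   = 0.010;                 /* ~10 ms per random access */
    const double block_bytes = 4096.0;                /* 4 KiB random reads */

    double seq_seconds    = total_bytes / seq_bw;
    double random_ios     = total_bytes / block_bytes;
    double random_seconds = random_ios * seek_time;

    printf("sequential read of 1 GiB   : %8.1f s\n", seq_seconds);
    printf("random 4 KiB reads of 1 GiB: %8.1f s (%.0f seeks)\n",
           random_seconds, random_ios);
    printf("slowdown: ~%.0fx\n", random_seconds / seq_seconds);
    return 0;
}
```

With these assumptions, 1 GiB streams in about 7 seconds but takes roughly 45 minutes as random 4 KiB reads, a slowdown of several hundred times, which is why the slide calls HDD random access "terrible."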
  • Z/OS Distributed File Service Zseries File System Implementation Z/OS V1R13
Front cover: z/OS Distributed File Service zSeries File System Implementation, z/OS V1R13. Defining and installing a zSeries file system; performing backup and recovery, sysplex sharing; migrating from HFS to zFS. Paul Rogers, Robert Hering. ibm.com/redbooks

International Technical Support Organization. z/OS Distributed File Service zSeries File System Implementation, z/OS V1R13. October 2012. SG24-6580-05.

Note: Before using this information and the product it supports, read the information in "Notices" on page xiii. Sixth Edition (October 2012). This edition applies to version 1 release 13 modification 0 of IBM z/OS (product number 5694-A01) and to all subsequent releases and modifications until otherwise indicated in new editions. © Copyright International Business Machines Corporation 2010, 2012. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents: Notices; Trademarks; Preface; The team who wrote this book; Now you can become a published author, too!; Comments welcome; Stay connected to IBM Redbooks.

Chapter 1. zFS file systems
1.1 zSeries File System introduction
1.2 Application programming interfaces
1.3 zFS physical file system
1.4 zFS colony address space
1.5 zFS supports z/OS UNIX ACLs
1.6 zFS file system aggregates
1.6.1 Compatibility mode aggregates
1.6.2 Multifile system aggregates
1.7 Metadata cache
1.8 zFS file system clones
1.8.1 Backup file system
1.9 zFS log files
  • A Survey of Distributed File Systems
A Survey of Distributed File Systems. M. Satyanarayanan, Department of Computer Science, Carnegie Mellon University. February 1989.

Abstract: This paper is a survey of the current state of the art in the design and implementation of distributed file systems. It consists of four major parts: an overview of background material, case studies of a number of contemporary file systems, identification of key design techniques, and an examination of current research issues. The systems surveyed are Sun NFS, Apollo Domain, Andrew, IBM AIX DS, AT&T RFS, and Sprite. The coverage of background material includes a taxonomy of file system issues, a brief history of distributed file systems, and a summary of empirical research on file properties. A comprehensive bibliography forms an important part of the paper.

Copyright (C) 1988, 1989 M. Satyanarayanan. The author was supported in the writing of this paper by the National Science Foundation (Contract No. CCR-8657907), Defense Advanced Research Projects Agency (Order No. 4976, Contract F33615-84-K-1520) and the IBM Corporation (Faculty Development Award). The views and conclusions in this document are those of the author and do not represent the official policies of the funding agencies or Carnegie Mellon University.

1. Introduction: The sharing of data in distributed systems is already common and will become pervasive as these systems grow in scale and importance. Each user in a distributed system is potentially a creator as well as a consumer of data. A user may wish to make his actions contingent upon information from a remote site, or may wish to update remote information.
  • A Study of Failure Recovery and Logging of High-Performance Parallel File Systems
A Study of Failure Recovery and Logging of High-Performance Parallel File Systems. RUNZHOU HAN, OM RAMESHWAR GATLA, MAI ZHENG, Iowa State University; JINRUI CAO, State University of New York at Plattsburgh; DI ZHANG, DONG DAI, University of North Carolina at Charlotte; YONG CHEN, Texas Tech University; JONATHAN COOK, New Mexico State University.

Large-scale parallel file systems (PFSes) play an essential role in high performance computing (HPC). However, despite the importance, their reliability is much less studied or understood compared with that of local storage systems or cloud storage systems. Recent failure incidents at real HPC centers have exposed the latent defects in PFS clusters as well as the urgent need for a systematic analysis. To address the challenge, we perform a study of the failure recovery and logging mechanisms of PFSes in this paper.

First, to trigger the failure recovery and logging operations of the target PFS, we introduce a black-box fault injection tool called PFault, which is transparent to PFSes and easy to deploy in practice. PFault emulates the failure state of individual storage nodes in the PFS based on a set of pre-defined fault models, and enables examining the PFS behavior under fault systematically.

Next, we apply PFault to study two widely used PFSes: Lustre and BeeGFS. Our analysis reveals the unique failure recovery and logging patterns of the target PFSes, and identifies multiple cases where the PFSes are imperfect in terms of failure handling. For example, Lustre includes a recovery component called LFSCK to detect and fix PFS-level inconsistencies, but we find that LFSCK itself may hang or trigger kernel panics when scanning a corrupted Lustre.
  • Erlang on Physical Machine
$ whoami — Name: Zvi Avraham, E-mail: [email protected]

/ˈkɒm.pɑː(ɹ)t.mɛntl̩.aɪˌzeɪ.ʃən/ (compartmentalization)

Physicalization:
• the opposite of virtualization: dedicated machines
• no virtualization overhead
• no noisy neighbors – nobody stealing your CPU cycles, IOPS or bandwidth – your EC2 instance may have a Netflix "roommate" ;)
• mostly used by ARM-based public clouds
• also called Bare Metal or HPC clouds

Sandbox – a virtual container in which untrusted code can be safely run. Sandbox examples: ZeroVM & AWS Lambda, based on Google Native Client: A Sandbox for Portable, Untrusted x86 Native Code.

Compartmentalization in terms of virtualization:
• Physicalization – no virtualization
• Virtualization – HW-level virtualization
• Containerization – OS-level virtualization
• Sandboxing – userspace-level virtualization*

Cloud runs on virtual HARDWARE. Does the OS on your cloud instance still support a floppy drive? $ ls /dev on an Ubuntu 14.04 AWS EC2 instance: 64 teletype devices? Sound? 32 serial ports? VGA?

"It's DUPLICATED on so many LAYERS." [Diagram of the duplicated stacks: Application + Configuration, process*, OS, Middleware (Spring/OTP), Managed Runtime (JVM/BEAM), Guest Container OS / Guest OS, Container, VM, Hypervisor, Hardware.] We run a single app per VM. We run in single-user mode.

Minimalistic Linux OSes: embedded Linux versions, DamnSmall Linux, Linux with BusyBox. Minimal Linux OSes for containers (JeOS – "Just Enough OS"): CoreOS, RancherOS, RedHat Project Atomic, VMware Photon, Intel Clear Linux, Hyper.

[Chart: number of processes and threads per OS – OSv + CLI, RancherOS, CoreOS.]