Virtual Square Users, Programmers & Developers Guide
Renzo Davoli, Michael Goldweber, Editors


Contributions by: Diego Billi, Federica Cenacchi, Renzo Davoli, Ludovico Gardenghi, Andrea Gasparini, Michael Goldweber

The Virtual Square Team

Copyright © 2008 Renzo Davoli, Michael Goldweber and the Virtual Square. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being the Introduction, with the Front-Cover Texts being the Title Page with the Logo (recto of this page) and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

Introduction

Virtual Square is a container of projects about virtuality. The word “virtual” has been overused and misused: everything related to computers and networks sounds virtual.

Computer Science defines abstractions and interfaces. These two key concepts are strictly related. An abstraction defines the semantics of operations, while an interface is the syntax required to access the operations defined by the abstraction. Programs and human users use interfaces to ask for the actions defined by an abstraction.

Virtuality means providing equivalent abstractions, providing the same interface, such that the users (programs or humans) can effectively use the virtual abstraction instead of the real one.

For example, a file system is an abstraction providing an application programming interface (API) composed of several calls like open, read, close. A virtual file system is an abstraction providing the same interface, such that the programs using the file system can use the virtual file system too.
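The file system example can be made concrete with a small sketch. It is only an illustration of the idea of "same interface, different abstraction" and is not code from any Virtual Square tool: the names RealFS, DictFS, read_all and demo are hypothetical and chosen just for this example. RealFS forwards to the operating system's own open, read and close calls, while DictFS offers the same three calls over data kept in a dictionary.

```python
import os
import tempfile


class RealFS:
    """Thin wrapper over the operating system's own open/read/close calls."""

    def open(self, path):
        return os.open(path, os.O_RDONLY)

    def read(self, fd, count):
        return os.read(fd, count)

    def close(self, fd):
        os.close(fd)


class DictFS:
    """Hypothetical virtual file system: the 'files' live in a dictionary."""

    def __init__(self, files):
        self._files = files   # path -> bytes
        self._open = {}       # descriptor -> [data, current offset]
        self._next_fd = 0

    def open(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        fd = self._next_fd
        self._next_fd += 1
        self._open[fd] = [self._files[path], 0]
        return fd

    def read(self, fd, count):
        data, offset = self._open[fd]
        chunk = data[offset:offset + count]
        self._open[fd][1] = offset + len(chunk)
        return chunk

    def close(self, fd):
        del self._open[fd]


def read_all(fs, path, chunk_size=4):
    """Client code: it knows the interface, not which abstraction is behind it."""
    fd = fs.open(path)
    try:
        parts = []
        while True:
            chunk = fs.read(fd, chunk_size)
            if not chunk:      # an empty read means end of file
                break
            parts.append(chunk)
        return b"".join(parts)
    finally:
        fs.close(fd)


def demo():
    # Real file system: write a temporary file, then read it back.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello from disk")
        real_path = tmp.name
    try:
        real = read_all(RealFS(), real_path)
    finally:
        os.unlink(real_path)
    # Virtual file system: same client code, no disk involved.
    virtual = read_all(DictFS({"/greeting": b"hello from memory"}), "/greeting")
    return real, virtual


if __name__ == "__main__":
    print(demo())
```

The function read_all is the "program" of the example: it runs unchanged against either object, which is the property the text calls virtuality.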
At the same time, a virtual file system can apply the same abstraction to different domains, not necessarily related to storing and retrieving data on magnetic disks.

The main memory is an abstraction, too. The hardware (memory cell arrays and the MMU) provides programs with an interface based on two main operations, load and store. A virtual memory provides the same interface while using a mix of main and secondary memory to store data. Programs effectively use virtual memory instead of the main memory.

The entire hardware of a computer, a “machine”, is perceived by the operating system as an abstraction. The interface is composed of the processor instruction set and of the set of bus addresses, registers and commands required to interoperate with peripheral controllers. Another abstraction, maybe a program, able to provide the same interface to the operating system is properly defined as a virtual machine. The same definition applies to virtual networks, virtual devices, virtual hard disks.

We perceive the world, the reality, through our senses. Thus it is an abstraction for us, and the interface is made of light, colors, sounds, etc. The definition of virtual reality is consistent with our definition: in what is commonly named “virtual reality”, our senses get connected to devices that are able to provide the same interface of light, colors, sounds.

In this way virtuality becomes a powerful tool for interoperability: virtual entities can act as puzzle tiles or building blocks to provide programs with suitable interfaces and services.

This is a Virtual Square, a virtual place where different abstractions can interoperate. It is also possible to read it as Virtual Squared, i.e. how to exploit existing virtualities to build up further virtual services (this is the meaning of the Virtual Square logo, V²).

Virtual Square is a set of different projects sharing the idea of exploiting virtuality by unifying concepts and by creating tools for interoperability.
Today Virtual Square is also an international laboratory on virtuality run by a research and development team. It started in 2004 at the University of Bologna, Italy. The research of Virtual Square involves several aspects of virtualization.

Virtual Distributed Ethernet is the V² virtual networking project. VDE is a virtual Ethernet whose nodes can be distributed across the real Internet. The idea of VDE sums up VPNs, tunnels, virtual machine interconnection and overlay networking, as all these different entities can be implemented by VDE.

View-OS is the V² project about operating systems. The main idea is to negate the global view assumption. An operating system should provide services to a process without forcing all processes to share the same, unique view of the execution environment. File systems, networking, device drivers, users and system id can be defined or redefined at process level.

This revolutionary view on virtuality has led to a better understanding of the limits of current operating system structure and implementation, networking stacks and interfaces, and C library support. V² extends the Linux kernel support for virtuality and inter-process communication, implements the networking stack as a library, adds support for multiple stacks to the Berkeley socket interface, and provides self-virtualization for processes and libraries by adding features to the C library. All these enhancements preserve backward compatibility with existing applications.

Describing a live research project like V² is like taking a snapshot of something which is rapidly evolving. Your V² could be different from the one explained here: maybe older, because the maintainer of the tools for your Linux distribution has been late in updating the software. Typically your V² will have more features than the one described here, and items listed here as future developments may already be included in the code by the time you read this book.
The first version of this book took about three years, and several sections have been rewritten several times because of the natural evolution of the projects. We suggest using this document to get a complete view of the project and an analysis of its ideas and tools, and we ask the reader to refer to the wiki of the project, http://wiki.virtualsquare.org, for updates.

Comments, errata, suggestions, bug reports and bug fixes are welcome. Researchers and developers can be reached on the IRC public forum irc.freenode.net#virtualsquare or using the mail addresses of the editors [email protected] and [email protected].

This book describes the entire project, including consolidated concepts like vde or umview and young, evolving ideas like ipn or kmview. For this reason, the reader will find some tools already included in major Linux distributions, while others must be downloaded as source code, compiled and installed.

Renzo Davoli, Michael Goldweber

Notation

This book uses icons to describe the intended audience of each section. Icons appear as prefixes in the title and in the table of contents.

no icon  Description of general ideas about the project.
⋆  User guide: these sections are for users of Virtual Square tools.
Programmer guide: these sections are for programmers who need to interface their programs to Virtual Square libraries or servers.
Developer guide: these sections are for programmers aiming to develop modules or plugins for Virtual Square tools and libraries.
Internals: these sections describe the design and implementation of Virtual Square libraries and tools. These sections are for developers aiming to contribute to V².
Education resource: Virtual Square provides valuable tools for education. The sections tagged with this icon provide ideas and suggestions about using V² to teach computer science.

Contents

Introduction
Contents
List of Figures

I The Big Picture

1 Virtualization and Virtual Machines
  1.1 Introduction to Virtual Machines
  1.2 Virtuality, Emulation and Simulation
  1.3 Brief history of virtuality
  1.4 Classification
  1.5 Emulators/Heterogeneous virtual machines
  1.6 Homogeneous virtual machines
  1.7 Operating System-Level Virtualization
  1.8 Process level virtual machine
  1.9 Process level partial virtualization
  1.10 Microkernel systems

2 V²: The Virtual Square Framework
  2.1 Introduction to Virtual Square
  2.2 V² goals and guidelines
  2.3 V² components

3 What’s new in Virtual Square
  3.1 ⋆ VDE: a swiss-knife for virtual networking
  3.2 msockets: Multi stack support for Berkeley Sockets
  3.3 IPv6 hybrid stacks for IPv4 backward compatibility
  3.4 What a process views
  3.5 ⋆ Partial Virtual Machines
  3.6 Microkernels and Monolithic kernels are not mutually exclusive
  3.7 Inter Process Networking: the need for multicast IPC

II Virtual Square Networking

4 VDE: Virtual Distributed Ethernet
  4.1 ⋆ VDE Main Components
  4.2 ⋆ VDE Connectivity Tools
  4.3 ⋆ VDE: A Closer Look
  4.4 ⋆ VDE Examples
  4.5 VDE API: The vdeplug Library
  4.6 ⋆ VDEtelweb
  4.7 Plugin Support for VDE Switches
  4.8 vde_switch Internals
  4.9 VDE in Education

5 LWIPv6
  5.1 LWIPv6 API
  5.2 An LWIPv6 tutorial
  5.3 LWIPv6 Internals
  5.4 LWIPv6 in education

6 Inter Process Networking
  6.1 IPN usage
  6.2 ⋆ Compile and install IPN
  6.3 IPN usage examples
  6.4 ⋆ kvde_switch, a VDE switch based on IPN
  6.5 IPN protocol submodules