UNIVERSITY of CALIFORNIA Los Angeles Practical Dependable
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Checkpoint and Restoration of Micro-Service in Docker Containers
3rd International Conference on Mechatronics and Industrial Informatics (ICMII 2015) Checkpoint and Restoration of Micro-service in Docker Containers Chen Yang School of Information Security Engineering, Shanghai Jiao Tong University, China 200240 [email protected] Keywords: Lightweight Virtualization, Checkpoint/restore, Docker. Abstract. In the present days of rapid adoption of micro-service, it is imperative to build a system to support and ensure the high performance and high availability for micro-services. Lightweight virtualization, which we also called container, has the ability to run multiple isolated sets of processes under a single kernel instance. Because of the possibility of obtaining a low overhead comparable to the near-native performance of a bare server, the container techniques, such as openvz, lxc, docker, they are widely used for micro-service [1]. In this paper, we present the high availability of micro-service in containers. We investigate capabilities provided by container (docker, openvz) to model and build the Micro-service infrastructure and compare different checkpoint and restore technologies for high availability. Finally, we present preliminary performance results of the infrastructure tuned to the micro-service. Introduction Lightweight virtualization, named the operating system level virtualization technology, partitions the physical machines resource, creating multiple isolated user-space instances. Each container acts exactly like a stand-alone server. A container can be rebooted independently and have root access, users, IP address, memory, processes, files, etc. Unlike traditional virtualization with the hypervisor layer, containerization takes place at the kernel level. Most modern operating system kernels now support the primitives necessary for containerization, including Linux with openvz, vserver and more recently lxc, Solaris with zones, and FreeBSD with Jails [2]. -
Operating System Support for Redundant Multithreading
Operating System Support for Redundant Multithreading Dissertation zur Erlangung des akademischen Grades Doktoringenieur (Dr.-Ing.) Vorgelegt an der Technischen Universität Dresden Fakultät Informatik Eingereicht von Dipl.-Inf. Björn Döbel geboren am 17. Dezember 1980 in Lauchhammer Betreuender Hochschullehrer: Prof. Dr. Hermann Härtig Technische Universität Dresden Gutachter: Prof. Frank Mueller, Ph.D. North Carolina State University Fachreferent: Prof. Dr. Christof Fetzer Technische Universität Dresden Statusvortrag: 29.02.2012 Eingereicht am: 21.08.2014 Verteidigt am: 25.11.2014 FÜR JAKOB *† 15. Februar 2013 Contents 1 Introduction 7 1.1 Hardware meets Soft Errors 8 1.2 An Operating System for Tolerating Soft Errors 9 1.3 Whom can you Rely on? 12 2 Why Do Transistors Fail And What Can Be Done About It? 15 2.1 Hardware Faults at the Transistor Level 15 2.2 Faults, Errors, and Failures – A Taxonomy 18 2.3 Manifestation of Hardware Faults 20 2.4 Existing Approaches to Tolerating Faults 25 2.5 Thesis Goals and Design Decisions 36 3 Redundant Multithreading as an Operating System Service 39 3.1 Architectural Overview 39 3.2 Process Replication 41 3.3 Tracking Externalization Events 42 3.4 Handling Replica System Calls 45 3.5 Managing Replica Memory 49 3.6 Managing Memory Shared with External Applications 57 3.7 Hardware-Induced Non-Determinism 63 3.8 Error Detection and Recovery 65 4 Can We Put the Concurrency Back Into Redundant Multithreading? 71 4.1 What is the Problem with Multithreaded Replication? 71 4.2 Can we make Multithreading -
Evaluating and Improving LXC Container Migration Between
Evaluating and Improving LXC Container Migration between Cloudlets Using Multipath TCP By Yuqing Qiu A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Master of Applied Science in Electrical and Computer Engineering Carleton University Ottawa, Ontario © 2016, Yuqing Qiu Abstract The advent of the Cloudlet concept—a “small data center” close to users at the edge is to improve the Quality of Experience (QoE) of end users by providing resources within a one-hop distance. Many researchers have proposed using virtual machines (VMs) as such service-provisioning servers. However, seeing the potentiality of containers, this thesis adopts Linux Containers (LXC) as Cloudlet platforms. To facilitate container migration between Cloudlets, Checkpoint and Restore in Userspace (CRIU) has been chosen as the migration tool. Since the migration process goes through the Wide Area Network (WAN), which may experience network failures, the Multipath TCP (MPTCP) protocol is adopted to address the challenge. The multiple subflows established within a MPTCP connection can improve the resilience of the migration process and reduce migration time. Experimental results show that LXC containers are suitable candidates for the problem and MPTCP protocol is effective in enhancing the migration process. i Acknowledgement I would like to express my sincerest gratitude to my principal supervisor Dr. Chung-Horng Lung who has provided me with valuable guidance throughout the entire research experience. His professionalism, patience, understanding and encouragement have always been my beacons of light whenever I go through difficulties. My gratitude also goes to my co-supervisor Dr. -
The Aurora Operating System
The Aurora Operating System Revisiting the Single Level Store Emil Tsalapatis Ryan Hancock Tavian Barnes RCS Lab, University of Waterloo RCS Lab, University of Waterloo RCS Lab, University of Waterloo [email protected] [email protected] [email protected] Ali José Mashtizadeh RCS Lab, University of Waterloo [email protected] ABSTRACT KEYWORDS Applications on modern operating systems manage their single level stores, transparent persistence, snapshots, check- ephemeral state in memory, and persistent state on disk. En- point/restore suring consistency between them is a source of significant developer effort, yet still a source of significant bugs inma- ACM Reference Format: ture applications. We present the Aurora single level store Emil Tsalapatis, Ryan Hancock, Tavian Barnes, and Ali José Mash- (SLS), an OS that simplifies persistence by automatically per- tizadeh. 2021. The Aurora Operating System: Revisiting the Single sisting all traditionally ephemeral application state. With Level Store. In Workshop on Hot Topics in Operating Systems (HotOS recent storage hardware like NVMe SSDs and NVDIMMs, ’21), June 1-June 3, 2021, Ann Arbor, MI, USA. ACM, New York, NY, Aurora is able to continuously checkpoint entire applications USA, 8 pages. https://doi.org/10.1145/3458336.3465285 with millisecond granularity. Aurora is the first full POSIX single level store to han- dle complex applications ranging from databases to web 1 INTRODUCTION browsers. Moreover, by providing new ways to interact with Single level storage (SLS) systems provide persistence of and manipulate application state, it enables applications to applications as an operating system service. Their advantage provide features that would otherwise be prohibitively dif- lies in removing the semantic gap between the in-memory ficult to implement. -
Checkpoint and Restore of Singularity Containers
Universitat politecnica` de catalunya (UPC) - BarcelonaTech Facultat d'informatica` de Barcelona (FIB) Checkpoint and restore of Singularity containers Grado en ingenier´ıa informatica´ Tecnolog´ıas de la Informacion´ Memoria 25/04/2019 Director: Autor: Jordi Guitart Fernandez Enrique Serrano G´omez Departament: Arquitectura de Computadors 1 Abstract Singularity es una tecnolog´ıade contenedores software creada seg´unlas necesidades de cient´ıficos para ser utilizada en entornos de computaci´onde altas prestaciones. Hace ya 2 a~nosdesde que los usuarios empezaron a pedir una integraci´onde la fun- cionalidad de Checkpoint/Restore, con CRIU, en contenedores Singularity. Esta inte- graci´onayudar´ıaen gran medida a mejorar la gesti´onde los recursos computacionales de las m´aquinas. Permite a los usuarios guardar el estado de una aplicaci´on(ejecut´andose en un contenedor Singularity) para poder restaurarla en cualquier momento, sin perder el trabajo realizado anteriormente. Por lo que la posible interrupci´onde una aplicaci´on, debido a un fallo o voluntariamente, no es una p´erdidade tiempo de computaci´on. Este proyecto muestra como es posible realizar esa integraci´on. Singularity ´esuna tecnologia de contenidors software creada segons les necessitats de cient´ıfics,per ser utilitzada a entorns de computaci´od'altes prestacions. Fa 2 anys desde que els usuaris van comen¸cara demanar una integraci´ode la funcional- itat de Checkpoint/Restore, amb CRIU, a contenidors Singularity. Aquesta integraci´o ajudaria molt a millorar la gesti´odels recursos computacionals de les m`aquines.Permet als usuaris guardar l'estat d'una aplicaci´o(executant-se a un contenidor Singularity) per poder restaurar-la en qualsevol moment, sense perdre el treball realitzat anteriorment. -
Project-Team REGAL
IN PARTNERSHIP WITH: CNRS Université Pierre et Marie Curie (Paris 6) Activity Report 2013 Project-Team REGAL Large-Scale Distributed Systems and Applications IN COLLABORATION WITH: Laboratoire d’informatique de Paris 6 (LIP6) RESEARCH CENTER Paris - Rocquencourt THEME Distributed Systems and middleware Table of contents 1. Members :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 1 2. Overall Objectives :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 2 2.1. Overall Objectives2 2.2. Highlights of the Year2 3. Research Program :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 3 3.1.1. Modern computer systems are increasingly parallel and distributed.3 3.1.2. Multicore architectures are everywhere.3 4. Software and Platforms ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 4 4.1. Coccinelle 4 4.2. SwiftCloud 4 4.3. JESSY 4 4.4. Java and .Net runtimes for LLVM5 5. New Results :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 5 5.1. Introduction 5 5.2. Distributed algorithms for dynamic networks5 5.2.1. Mutual Exclusion and Failure Detection.6 5.2.2. Self-Stabilization and Self-* Services.6 5.2.3. Dissemination and Data Finding in Large Scale Systems.6 5.2.4. Peer certification.7 5.3. Management of distributed data7 5.3.1. Long term durability7 5.3.2. Adaptative replication8 5.3.3. Strong consistency8 5.3.4. Distributed Transaction Scheduling9 5.3.5. Eventual consistency9 5.3.6. Mixing commutative and non-commutative updates: reservations9 5.4. Performance and Robustness of Systems Software in Multicore Architectures 10 5.4.1. Managed Runtime Environments 10 5.4.2. System software robustness 10 5.4.3. Domain-specific languages for systems software 11 6. Bilateral Contracts and Grants with Industry ::::::::::::::::::::::::::::::::::::::::::::: 11 6.1. Bilateral Contracts with Industry 11 6.2. Bilateral Grants with Industry 11 7. -
Instant OS Updates Via Userspace Checkpoint-And
Instant OS Updates via Userspace Checkpoint-and-Restart Sanidhya Kashyap, Changwoo Min, Byoungyoung Lee, and Taesoo Kim, Georgia Institute of Technology; Pavel Emelyanov, CRIU and Odin, Inc. https://www.usenix.org/conference/atc16/technical-sessions/presentation/kashyap This paper is included in the Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC ’16). June 22–24, 2016 • Denver, CO, USA 978-1-931971-30-0 Open access to the Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC ’16) is sponsored by USENIX. Instant OS Updates via Userspace Checkpoint-and-Restart Sanidhya Kashyap Changwoo Min Byoungyoung Lee Taesoo Kim Pavel Emelyanov† Georgia Institute of Technology †CRIU & Odin, Inc. # errors # lines Abstract 50 1000K 40 100K In recent years, operating systems have become increas- 10K 30 1K 20 ingly complex and thus more prone to security and per- 100 formance issues. Accordingly, system updates to address 10 10 these issues have become more frequently available and 0 1 increasingly important. To complete such updates, users 3.13.0-x 3.16.0-x 3.19.0-x May 2014 must reboot their systems, resulting in unavoidable down- build/diff errors #layout errors Jun 2015 time and further loss of the states of running applications. #static local errors #num lines++ We present KUP, a practical OS update mechanism that Figure 1: Limitation of dynamic kernel hot-patching using employs a userspace checkpoint-and-restart mechanism, kpatch. Only two successful updates (3.13.0.32 34 and → which uses an optimized data structure for checkpoint- 3.19.0.20 21) out of 23 Ubuntu kernel package releases. -
Task Migration at Scale Using CRIU Linux Plumbers Conference 2018
Task Migration at Scale Using CRIU Linux Plumbers Conference 2018 Victor Marmol Andy Tucker [email protected] [email protected] 2018-11-15 Confidential + Proprietary Who we are Outside of Google, we’ve worked on open source cluster management and containers lmctfy Confidential + Proprietary 2 Who we are Inside Google: we’re part of the Borg team ● Manages all compute jobs ● Runs on every server ConfidentialImages + Proprietary by Connie Zhou3 What is Borg? Google’s cluster management system ● Borgmaster: Cluster control and main API entrypoint ● Borglet: On-machine management daemon ● Suite of tools and UIs for managing jobs ● Many purpose-built platforms created on top of Borg ● Everything runs on Borg and everything runs in containers Confidential + Proprietary 4 Borg basics Allocated Ports Base compute primitive: Task ● A priority signals how quickly a task should Processes Packages schedule Processes Packages Processes Packages ● It’s appclass describes a task as either Processes Packages serving (latency sensitive) or batch ● Static content/binaries provided by Container packages Task ● A container isolates a task’s resources ● Native Linux processes Borg Machine ● Share an IP with the machine, ports are allocated for each task Confidential + Proprietary 5 Borg basics: evictions When a task is forcefully terminated by Borg ● Typically receive a notification: 1-5min ● Our SLO allows for quite a few evictions ● Applications must handle them Reasons for evictions ● Preemption: a higher priority task needs the resources ● Software upgrades (e.g.: kernel, firmware) ● Re-balancing for availability or performance Confidential + Proprietary 6 Evictions are impactful and hard to handle Technical Complexity ● Handling evictions requires state management ○ How and what state to serialize and where to store it ● Application-specific and not very reusable Lost Compute ● Batch jobs run at lower priorities and get preempted often ● Even platforms that handle them for users, don’t do a great job Task 0 Evicted!Shard 1 Task 2 Task 3 Task 4 Compute is lost.. -
Open Source Software Packages
Hitachi Ops Center 10.5.1 Open Source Software Packages Contact Information: Hitachi Ops Center Project Manager Hitachi Vantara LLC 2535 Augustine Drive Santa Clara, California 95054 Name of package Version License agreement @agoric/babel-parser 7.12.7 The MIT License @angular-builders/custom-webpack 8.0.0-RC.0 The MIT License @angular-devkit/build-angular 0.800.0-rc.2 The MIT License @angular-devkit/build-angular 0.901.12 The MIT License @angular-devkit/core 7.3.8 The MIT License @angular-devkit/schematics 7.3.8 The MIT License @angular/animations 9.1.11 The MIT License @angular/animations 9.1.12 The MIT License @angular/cdk 9.2.4 The MIT License @angular/cdk-experimental 9.2.4 The MIT License @angular/cli 8.0.0 The MIT License @angular/cli 9.1.11 The MIT License @angular/common 9.1.11 The MIT License @angular/common 9.1.12 The MIT License @angular/compiler 9.1.11 The MIT License @angular/compiler 9.1.12 The MIT License @angular/compiler-cli 9.1.12 The MIT License @angular/core 7.2.15 The MIT License @angular/core 9.1.11 The MIT License @angular/core 9.1.12 The MIT License @angular/forms 7.2.15 The MIT License @angular/forms 9.1.0-next.3 The MIT License @angular/forms 9.1.11 The MIT License @angular/forms 9.1.12 The MIT License @angular/language-service 9.1.12 The MIT License @angular/platform-browser 7.2.15 The MIT License @angular/platform-browser 9.1.11 The MIT License @angular/platform-browser 9.1.12 The MIT License @angular/platform-browser-dynamic 7.2.15 The MIT License @angular/platform-browser-dynamic 9.1.11 The MIT License @angular/platform-browser-dynamic -
Red Hat Enterprise Linux 7 7.8 Release Notes
Red Hat Enterprise Linux 7 7.8 Release Notes Release Notes for Red Hat Enterprise Linux 7.8 Last Updated: 2021-03-02 Red Hat Enterprise Linux 7 7.8 Release Notes Release Notes for Red Hat Enterprise Linux 7.8 Legal Notice Copyright © 2021 Red Hat, Inc. The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/ . In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version. Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries. Linux ® is the registered trademark of Linus Torvalds in the United States and other countries. Java ® is a registered trademark of Oracle and/or its affiliates. XFS ® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries. MySQL ® is a registered trademark of MySQL AB in the United States, the European Union and other countries. Node.js ® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the official Joyent Node.js open source or commercial project. -
Seccomp Update
seccomp update https://outflux.net/slides/2015/lss/seccomp.pdf Linux Security Summit, Seattle 2015 Kees Cook <[email protected]> (pronounced “Case”) What is seccomp? ● Programmatic kernel attack surface reduction ● Used by: – Chrome – vsftpd – OpenSSH – Systemd (“SystemCallFilter=...”) – LXC (blacklisting) – … and you too! (easiest via libseccomp) seccomp update 2/8 Linux Security Summit 2015 Seattle, Aug 21 Architecture support ● x86: v3.5 ● s390: v3.6 ● arm: v3.8 ● mips: v3.15 ● arm64: v3.19, AKASHI Takahiro ● powerpc: linux-next (v4.3), Michael Ellerman seccomp update 3/8 Linux Security Summit 2015 Seattle, Aug 21 split-phase internals ● v3.19, Andy Lutomirski ● Splits per-architecture calls to seccomp into 2 phases ● Speeds up simple (no tracing) callers ● Only used on x86 so far seccomp update 4/8 Linux Security Summit 2015 Seattle, Aug 21 Regression tests ● v4.2: moved the 48 tests from github into the kernel: tools/testing/selftests/seccomp/ ● Shows some interesting glitches with restart_syscall on arm (hidden) and arm64 (hidden, unless compat, then exposed) ● Gained big-endian support during powerpc port ● Added s390 seccomp support today seccomp update 5/8 Linux Security Summit 2015 Seattle, Aug 21 Minor changes ● v4.0: SECCOMP_RET_ERRNO capped at MAX_ERRNO – Avoid confusing userspace ● v4.1: asm-generic for seccomp.h – Easier architecture porting seccomp update 6/8 Linux Security Summit 2015 Seattle, Aug 21 Future ● Argument inspection ● CRIU (checkpoint/restore) – PTRACE_O_SUSPEND_SECCOMP with CAP_SYS_ADMIN: linux-next (v4.3), Tycho Andersen – Serialize dump/restore of filters. ● eBPF – Use maps or tail calls instead of balanced if/else trees for checking syscall numbers. seccomp update 7/8 Linux Security Summit 2015 Seattle, Aug 21 Questions? https://outflux.net/slides/2015/lss/seccomp.pdf @kees_cook [email protected] [email protected] [email protected] seccomp update 8/8 Linux Security Summit 2015 Seattle, Aug 21. -
Linux Kernel User Documentation V4.20.0
usepackagefontspec setsansfontDejaVu Sans setromanfontDejaVu Serif setmonofontDejaVu Sans Mono Linux Kernel User Documentation v4.20.0 The kernel development community 1 16, 2019 Contents 1 Linux kernel release 4.x <http://kernel.org/> 3 2 The kernel’s command-line parameters 9 3 Linux allocated devices (4.x+ version) 109 4 L1TF - L1 Terminal Fault 171 5 Reporting bugs 181 6 Security bugs 185 7 Bug hunting 187 8 Bisecting a bug 193 9 Tainted kernels 195 10 Ramoops oops/panic logger 197 11 Dynamic debug 201 12 Explaining the dreaded “No init found.” boot hang message 207 13 Rules on how to access information in sysfs 209 14 Using the initial RAM disk (initrd) 213 15 Control Group v2 219 16 Linux Serial Console 245 17 Linux Braille Console 247 18 Parport 249 19 RAID arrays 253 20 Kernel module signing facility 263 21 Linux Magic System Request Key Hacks 267 i 22 Unicode support 273 23 Software cursor for VGA 277 24 Kernel Support for miscellaneous (your favourite) Binary Formats v1.1 279 25 Mono(tm) Binary Kernel Support for Linux 283 26 Java(tm) Binary Kernel Support for Linux v1.03 285 27 Reliability, Availability and Serviceability 293 28 A block layer cache (bcache) 309 29 ext4 General Information 319 30 Power Management 327 31 Thunderbolt 349 32 Linux Security Module Usage 353 33 Memory Management 369 ii Linux Kernel User Documentation, v4.20.0 The following is a collection of user-oriented documents that have been added to the kernel over time. There is, as yet, little overall order or organization here — this material was not written to be a single, coherent document! With luck things will improve quickly over time.