A Cross-Layered Direct-Access File System
Recommended publications
Development of a Verified Flash File System
Gerhard Schellhorn, Gidon Ernst, Jörg Pfähler, Dominik Haneberg, and Wolfgang Reif
Institute for Software & Systems Engineering, University of Augsburg, Germany
{schellhorn, ernst, joerg.pfaehler, haneberg, reif}@informatik.uni-augsburg.de

Abstract. This paper gives an overview of the development of a formally verified file system for flash memory. We describe our approach, which is based on Abstract State Machines and incremental modular refinement. Some of the important intermediate levels and the features they introduce are given. We report on the verification challenges addressed so far, and point to open problems and future work. We furthermore draw preliminary conclusions on the methodology and the required tool support.

1 Introduction. Flaws in the design and implementation of file systems have already led to serious problems in mission-critical systems. A prominent example is the Mars Exploration Rover Spirit [34], which got stuck in a reset cycle. In 2013, the Mars rover Curiosity also had a bug in its file system implementation that triggered an automatic switch to safe mode. The first incident prompted a proposal to formally verify a file system for flash memory [24,18] as a pilot project for Hoare's Grand Challenge [22]. We are developing a verified flash file system (FFS). This paper reports on our progress and discusses some aspects of the project. We describe parts of the design, the formal models, and the proofs, pointing out challenges and solutions. The main characteristic of flash memory that guides the design is that data cannot be overwritten in place; space can only be reused by erasing whole blocks.
Linux Scheduler Documentation
The kernel development community, Jul 14, 2020

Chapter 1. Completions - "wait for completion" barrier APIs

1.1 Introduction: If you have one or more threads that must wait for some kernel activity to have reached a point or a specific state, completions can provide a race-free solution to this problem. Semantically they are somewhat like a pthread_barrier() and have similar use cases. Completions are a code synchronization mechanism that is preferable to any misuse of locks/semaphores and busy-loops. Any time you think of using yield() or some quirky msleep(1) loop to allow something else to proceed, you probably want to look into using one of the wait_for_completion*() calls and complete() instead.

The advantage of using completions is that they have a well-defined, focused purpose, which makes it very easy to see the intent of the code; but they also result in more efficient code, as all threads can continue execution until the result is actually needed, and both the waiting and the signalling are highly efficient, using low-level scheduler sleep/wakeup facilities.

Completions are built on top of the waitqueue and wakeup infrastructure of the Linux scheduler. The event the threads on the waitqueue are waiting for is reduced to a simple flag in 'struct completion', appropriately called "done". As completions are scheduling related, the code can be found in kernel/sched/completion.c.

1.2 Usage: There are three main parts to using completions:
• the initialization of the 'struct completion' synchronization object,
• the waiting part through a call to one of the variants of wait_for_completion(),
• the signaling side through a call to complete() or complete_all().
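To make those three parts concrete, here is a minimal kernel-style sketch (the function and variable names are illustrative, not taken from any real driver, and error handling is elided): one thread initializes the completion and blocks on it while a worker thread performs the setup and signals it.

    #include <linux/completion.h>
    #include <linux/kthread.h>

    static struct completion setup_done;

    /* Worker thread: performs the setup, then signals the waiter. */
    static int worker_fn(void *data)
    {
            /* ... perform the kernel activity the waiter depends on ... */
            complete(&setup_done);            /* part 3: the signaling side */
            return 0;
    }

    void start_and_wait(void)
    {
            init_completion(&setup_done);     /* part 1: initialization */
            kthread_run(worker_fn, NULL, "setup-worker");
            wait_for_completion(&setup_done); /* part 2: sleep until complete() */
            /* From here on, the worker's setup is guaranteed to be finished. */
    }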
Advances in Mobile Cloud Computing and Big Data in the 5G Era (Studies in Big Data)
Studies in Big Data, Volume 22. Editors: Constandinos X. Mavromoustakis, George Mastorakis, and Ciprian Dobre. Series editor: Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland (e-mail: [email protected]).

About this series: The series "Studies in Big Data" (SBD) publishes new developments and advances in the various areas of Big Data, quickly and with high quality. The intent is to cover the theory, research, development, and applications of Big Data, as embedded in the fields of engineering, computer science, physics, economics, and life sciences. The books of the series refer to the analysis and understanding of large, complex, and/or distributed data sets generated from recent digital sources, coming from sensors or other physical instruments as well as simulations, crowd sourcing, social networks, or other internet transactions such as emails or video click streams. The series contains monographs, lecture notes, and edited volumes in Big Data, spanning the areas of computational intelligence (including neural networks, evolutionary computation, soft computing, and fuzzy systems) as well as artificial intelligence, data mining, modern statistics, operations research, and self-organizing systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. More information about this series at http://www.springer.com/series/11970
On the Defectiveness of SCHED_DEADLINE w.r.t. Tardiness and Affinities, and a Partial Fix
Stephen Tang and James H. Anderson ({sytang,anderson}@cs.unc.edu), University of North Carolina at Chapel Hill, Chapel Hill, USA; Luca Abeni ([email protected]), Scuola Superiore Sant'Anna, Pisa, Italy

ABSTRACT. SCHED_DEADLINE (DL for short) is an Earliest-Deadline-First (EDF) scheduler included in the Linux kernel. A question motivated by DL is how EDF should be implemented in the presence of CPU affinities to maintain optimal bounded tardiness guarantees. Recent works have shown that under arbitrary affinities, DL does not maintain such guarantees. Such works have also shown that repairing DL to maintain these guarantees would likely require an impractical overhaul of the existing code. In this work, we show that for the special case where affinities are semi-partitioned, DL can be modified to maintain tardiness guarantees with minor changes. We also [...]

[...] create new DL tasks until existing DL tasks reduce their bandwidth consumption. Note that preventing over-utilization is only a necessary condition for guaranteeing that tasks meet deadlines. Instead of guaranteeing that deadlines will be met, AC satisfies two goals [3]. The first goal is to provide performance guarantees. Specifically, AC guarantees to each DL task that its long-run consumption of CPU bandwidth remains within a bounded margin of error from a requested rate. The second goal is to avoid the starvation of lower-priority non-DL workloads. Specifically, AC guarantees that some user-specified amount of CPU bandwidth will remain for the execution of non-DL workloads.
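The bandwidth bookkeeping behind the admission control (AC) described above can be sketched as follows; this is a simplified form, not the kernel's literal check, and in mainline Linux the utilization cap defaults to 95% via the sched_rt_runtime_us/sched_rt_period_us knobs. Each DL task i declares a runtime Q_i and period P_i, and admission onto M CPUs requires:

    \[ U_i = \frac{Q_i}{P_i}, \qquad \sum_i U_i \le M \cdot U_{\max} \]

A task whose admission would violate the inequality is rejected until existing DL tasks release bandwidth, which is exactly the behavior the excerpt describes.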
RTEMS C User Documentation, Release 4.11.3
© Copyright 2016, RTEMS Project (built 15th February 2018)

Part I: RTEMS C User's Guide, contents of the opening chapters (page numbers omitted):

1 Preface
2 Overview: Introduction; Real-time Application Systems; Real-time Executive; RTEMS Application Architecture; RTEMS Internal Architecture; User Customization and Extensibility; Portability; Memory Requirements; Audience; Conventions; Manual Organization
3 Key Concepts: Introduction; Objects (Object Names; Object IDs, with the thirty-two-bit and sixteen-bit object ID formats; Object ID Description); Communication and Synchronization; Time; Memory Management
4 RTEMS Data Types: Introduction; List of Data Types
Recursive Updates in Copy-On-Write File Systems - Modeling and Analysis
Journal of Computers, Vol. 9, No. 10, October 2014

Jie Chen*, Jun Wang†, Zhihu Tan*, Changsheng Xie*
*School of Computer Science and Technology, Huazhong University of Science and Technology; Wuhan National Laboratory for Optoelectronics, Wuhan, Hubei 430074, China; [email protected], {stan, cs_xie}@hust.edu.cn
†Dept. of Electrical Engineering and Computer Science, University of Central Florida, Orlando, Florida 32826, USA; [email protected]

Abstract: Copy-On-Write (COW) is a powerful technique for data protection in file systems. Unfortunately, it introduces a recursive-update problem, which leads to a side effect of write amplification. Studying the behavior of write amplification is important for designing, choosing, and optimizing the next generation of file systems. However, evaluation is difficult because of the complexity of file systems. To solve this problem, we propose a typical COW file system model based on BTRFS and verify its correctness through carefully designed experiments. By analyzing this model, we find that write amplification is greatly affected by the distribution of the files being accessed, varying from 1.1x to 4.2x.

[...] recursive update. Recursive updates can lead to several side effects in a storage system, such as write amplification (also referred to as additional writes) [4], I/O pattern alteration [5], and performance degradation [6]. This paper focuses on the side effects of write amplification. Studying the behavior of write amplification is important for designing, choosing, and optimizing the next generation of file systems, especially when the file system uses a flash-memory-based underlying storage system under online transaction processing (OLTP) [...]
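The recursive-update effect is easy to see with a back-of-the-envelope model (a toy sketch, not the paper's BTRFS model): because a COW file system never overwrites a node in place, updating one leaf block forces new copies of every node on the path up to the root, so a single logical write costs roughly tree-height-plus-one physical writes.

    #include <math.h>
    #include <stdio.h>

    /* Toy estimate of COW write amplification for one block update:
     * in a tree with the given fanout, a leaf update rewrites the leaf
     * plus every internal node on the path to the root. */
    static double cow_write_amplification(double nblocks, double fanout)
    {
        double height = ceil(log(nblocks) / log(fanout)); /* levels above leaf */
        return height + 1.0; /* physical writes per logical write */
    }

    int main(void)
    {
        /* e.g. one million blocks indexed with fanout 128 */
        printf("amplification: ~%.0fx\n", cow_write_amplification(1e6, 128.0));
        return 0;
    }

Real figures are usually lower than this worst case because updates batched in one transaction share the rewritten upper-level nodes, which is one reason the measured amplification depends so strongly on the access distribution.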
F2PUnifyCR: A Flash-Friendly Persistent Burst-Buffer File System
ThanOS, Department of Computer Science, Florida State University, Tallahassee, United States

I. ABSTRACT
With the increased amount of supercomputing power, it is now possible to work with large-scale data that poses a continuous opportunity for exascale computing and puts immense pressure on the underlying persistent data storage. Burst buffers, distributed arrays of node-local persistent flash storage devices deployed on most of the leadership supercomputers, are a means of efficiently handling the bursty I/O invoked by cutting-edge scientific applications. In order to manage these burst buffers, many ephemeral user-level file system solutions, like UnifyCR, are present in the research and industry arena. Because of the intrinsic nature of flash devices and their background processing overhead, like garbage collection, peak write bandwidth is hard to reach [...]

[...] manifold depending on the workloads it is handling for applications. In order to leverage the capabilities of burst buffers to the utmost level, it is very important to have a standardized software interface across systems. It has to deal with an immense amount of data during the runtime of the applications. Using node-local burst buffers can achieve scalable write bandwidth, as each process writes to its local flash drive; but when files are shared across many processes, managing the metadata and object data of those files becomes a huge challenge. In order to handle all the challenges posed by the bursty and random I/O requests of the scientific applications running on leadership supercomputing clusters [...]
Architecture and Performance of Perlmutter's 35 PB ClusterStor E1000 All-Flash File System
Lawrence Berkeley National Laboratory, Recent Work. Permalink: https://escholarship.org/uc/item/90r3s04z. Authors: Lockwood, Glenn K.; Chiusole, Alberto; Gerhardt, Lisa; et al. Publication date: 2021-06-18. Peer reviewed.

Glenn K. Lockwood, Alberto Chiusole, Lisa Gerhardt, Kirill Lozinskiy, David Paul, Nicholas J. Wright
Lawrence Berkeley National Laboratory, {glock, chiusole, lgerhardt, klozinskiy, dpaul, [email protected]

Abstract: NERSC's newest system, Perlmutter, features a 35 PB all-flash Lustre file system built on HPE Cray ClusterStor E1000. We present its architecture, early performance figures, and performance considerations unique to this architecture. We demonstrate the performance of E1000 OSSes through low-level Lustre tests that achieve over 90% of the theoretical bandwidth of the SSDs at the OST and LNet levels. We also show end-to-end performance for both traditional dimensions of I/O performance (peak bulk-synchronous bandwidth) and non-optimal workloads endemic to production computing (small, incoherent I/Os at random offsets) and compare them to NERSC's previous system, Cori, to illustrate that Perlmutter achieves the performance of a burst buffer and the resilience of a scratch file system.

[System diagram: 1,536 GPU nodes (1x AMD Epyc 7763, 4x NVIDIA A100, 4x Slingshot NICs each) and 3,072 CPU nodes (2x AMD Epyc 7763, 1x Slingshot NIC each) attached to a 200 Gb/s two-level dragonfly Slingshot network; 16 Lustre MDSes and 16 Lustre OSSes (each 1x AMD Epyc 7502, 2x Slingshot NICs, 24x 15.36 TB NVMe); 24 gateway nodes (2x Slingshot NICs, 2x 200G HCAs); 2 Arista 7804 routers (400 Gb/s per port, over 10 Tb/s routing).]
Network Store: Exploring Slicing in Future 5G Networks
Navid Nikaein*, Eryk Schiller§, Romain Favraud*†, Kostas Katsalis‡, Donatos Stavropoulos‡, Islam Alyafawi§, Zhongliang Zhao§, Torsten Braun§, and Thanasis Korakis‡
*EURECOM, fi[email protected]; †DCNS; §University of Bern, [email protected]; ‡University of Thessaly, {kkatsalis,dostavro,korakis}@uth.gr

ABSTRACT. In this paper, we present a revolutionary vision of 5G networks, in which SDN programs wireless network functions, and where Mobile Network Operators (MNO), Enterprises, and Over-The-Top (OTT) third parties are provided with an NFV-ready Network Store. The proposed Network Store serves as a digital distribution platform of programmable Virtualized Network Functions (VNFs) that enable 5G application use-cases. Currently existing application stores, such as Apple's App Store for iOS applications, Google's Play Store for Android, or Ubuntu's Software Center, deliver applications to user-specific software platforms. Our vision is to provide a digital marketplace, gathering 5G-enabling network applications and network functions, written to run on top of commodity cloud infrastructures, connected to remote radio heads (RRH).

[...] coupling between control and data planes, stateful design, and dependency on dedicated hardware. The exploitation of cloud technologies, Software-Defined Networking (SDN), and Network Function Virtualization (NFV) can provide the necessary tools to break down the vertical system organization into a set of horizontal micro-service network architectures. These can be used to combine relevant network functions together, assign the target performance parameters, and map them onto infrastructure resources. In this paper, we present the design of a 5G-ready architecture and an NFV-based Network Store that can serve as a digital distribution platform for 5G application use-cases. The goal of the Network Store is to provide programmable pieces of code that on-the-fly reserve required resources, deploy and run the necessary software components, configure [...]
Embedded Operating Systems
7 Embedded Operating Systems
Claudio Scordino (1), Errico Guidieri (1), Bruno Morelli (1), Andrea Marongiu (2,3), Giuseppe Tagliavini (3), and Paolo Gai (1)
(1) Evidence SRL, Italy; (2) Swiss Federal Institute of Technology in Zurich (ETHZ), Switzerland; (3) University of Bologna, Italy

In this chapter, we provide a description of existing open-source operating systems (OSs) that have been analyzed with the objective of porting them to the reference architecture described in Chapter 2. Among the various possibilities, the ERIKA Enterprise RTOS (Real-Time Operating System) and Linux with preemption patches have been selected. A description of the porting effort on the reference architecture is also provided.

7.1 Introduction
In the past, OSs for high-performance computing (HPC) were based on custom-tailored solutions to fully exploit all performance opportunities of supercomputers. Nowadays, instead, HPC systems have moved away from in-house OSs to more generic solutions like Linux. Such a trend can be observed in the TOP500 list [1], which ranks the 500 most powerful supercomputers in the world and which Linux dominates. In fact, in around 20 years, Linux has conquered the entire TOP500 list from scratch (for the first time in November 2017). Each manufacturer, however, still implements specific changes to the Linux OS to better exploit specific hardware features. This is especially true for computing nodes, where lightweight kernels are used to speed up computation.

[Figure 7.1: Number of Linux-based supercomputers in the TOP500 list.]

Linux is a full-featured OS, originally designed for server or desktop environments.
Red Hat Enterprise Linux for Real Time 8 Reference Guide
Core concepts and terminology for using RHEL for Real Time. Last updated: 2021-05-19.

Authors (Red Hat Customer Content Services): Sujata Kurup ([email protected]), Jaroslav Klech, Marie Doleželová, Maxim Svistunov, Radek Bíba, David Ryan, Cheryn Tan, Lana Brindley, Alison Young.

Legal notice: Copyright © 2021 Red Hat, Inc. The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution-Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version. Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries. Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
SCHED DEADLINE: a Real-Time CPU Scheduler for Linux Luca Abeni [email protected]
Luca Abeni ([email protected]), TuToR 2017

Outline: Introduction; Real-Time Scheduling in Linux; Setting the Scheduling Policy; The Constant Bandwidth Server; SCHED_DEADLINE; Using SCHED_DEADLINE

Scheduling Real-Time Tasks
[Figure: timeline of two consecutive jobs, marking each job's release time r_j, execution c_j, finishing time f_j, and deadline d_j.]
Consider a set of N real-time tasks Γ = {τ_0, ..., τ_{N-1}} scheduled on M CPUs. Real-time theory offers a lot of scheduling algorithms... but which ones are available on a commonly used OS? POSIX provides fixed priorities, which can be used to implement RM, DM, etc. (and, on multiple processors, DkC, etc.). Linux also provides SCHED_DEADLINE: resource reservations + EDF.

Definitions
A real-time task τ is a sequence of jobs J_i = (r_i, c_i, d_i), with finishing time f_i. Goal: f_i <= d_i for all J_i, or control the amount of missed deadlines. Scheduling on multiple CPUs can be partitioned or global. Scheduling in a general-purpose OS means an open system (with online admission control) and the presence of non real-time tasks (do not starve them!).

Using Fixed Priorities with POSIX
SCHED_FIFO and SCHED_RR use fixed priorities. They can be used for real-time tasks, to implement RM and DM. Real-time tasks have priority over non real-time (SCHED_OTHER) tasks. The difference between [...]
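To illustrate the SCHED_DEADLINE usage this excerpt is building toward, here is a minimal userspace sketch. The struct layout and parameters follow the sched_setattr(2) man page; glibc has historically provided no wrapper for this syscall, so the sketch calls syscall() directly, and the 10/30/100 ms numbers are purely illustrative.

    #define _GNU_SOURCE
    #include <linux/types.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <string.h>

    #ifndef SCHED_DEADLINE
    #define SCHED_DEADLINE 6
    #endif

    /* Layout from sched_setattr(2); kernel headers may not export it. */
    struct sched_attr {
            __u32 size;
            __u32 sched_policy;
            __u64 sched_flags;
            __s32 sched_nice;
            __u32 sched_priority;   /* used by SCHED_FIFO / SCHED_RR */
            __u64 sched_runtime;    /* used by SCHED_DEADLINE, in ns */
            __u64 sched_deadline;
            __u64 sched_period;
    };

    static int dl_setattr(pid_t pid, const struct sched_attr *attr,
                          unsigned int flags)
    {
            return syscall(SYS_sched_setattr, pid, attr, flags);
    }

    int main(void)
    {
            struct sched_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.size           = sizeof(attr);
            attr.sched_policy   = SCHED_DEADLINE;
            attr.sched_runtime  =  10 * 1000 * 1000;  /* 10 ms budget */
            attr.sched_deadline =  30 * 1000 * 1000;  /* 30 ms deadline */
            attr.sched_period   = 100 * 1000 * 1000;  /* 100 ms period */

            if (dl_setattr(0, &attr, 0)) {
                    perror("sched_setattr"); /* e.g. admission control said no */
                    return 1;
            }
            /* ... periodic real-time work goes here ... */
            return 0;
    }

If the summed bandwidth of existing DL tasks already exhausts the admission-control cap, the call fails with EBUSY, which ties back to the AC behavior described in the Tang/Abeni/Anderson excerpt above.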