Performance Implications of Memory Affinity on Filesystem Caches in a Non-Uniform Memory Access Environment
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Copy on Write Based File Systems Performance Analysis and Implementation
Copy On Write Based File Systems Performance Analysis And Implementation Sakis Kasampalis Kongens Lyngby 2010 IMM-MSC-2010-63 Technical University of Denmark Department Of Informatics Building 321, DK-2800 Kongens Lyngby, Denmark Phone +45 45253351, Fax +45 45882673 [email protected] www.imm.dtu.dk Abstract In this work I am focusing on Copy On Write based file systems. Copy On Write is used on modern file systems for providing (1) metadata and data consistency using transactional semantics, (2) cheap and instant backups using snapshots and clones. This thesis is divided into two main parts. The first part focuses on the design and performance of Copy On Write based file systems. Recent efforts aiming at creating a Copy On Write based file system are ZFS, Btrfs, ext3cow, Hammer, and LLFS. My work focuses only on ZFS and Btrfs, since they support the most advanced features. The main goals of ZFS and Btrfs are to offer a scalable, fault tolerant, and easy to administrate file system. I evaluate the performance and scalability of ZFS and Btrfs. The evaluation includes studying their design and testing their performance and scalability against a set of recommended file system benchmarks. Most computers are already based on multi-core and multiple processor architec- tures. Because of that, the need for using concurrent programming models has increased. Transactions can be very helpful for supporting concurrent program- ming models, which ensure that system updates are consistent. Unfortunately, the majority of operating systems and file systems either do not support trans- actions at all, or they simply do not expose them to the users. -
The Page Cache Today’S Lecture RCU File System Networking(Kernel Level Syncmem
2/21/20 COMP 790: OS Implementation COMP 790: OS Implementation Logical Diagram Binary Memory Threads Formats Allocators User System Calls Kernel The Page Cache Today’s Lecture RCU File System Networking(kernel level Syncmem. Don Porter management) Memory Device CPU Management Drivers Scheduler Hardware Interrupts Disk Net Consistency 1 2 1 2 COMP 790: OS Implementation COMP 790: OS Implementation Recap of previous lectures Background • Page tables: translate virtual addresses to physical • Lab2: Track physical pages with an array of PageInfo addresses structs • VM Areas (Linux): track what should be mapped at in – Contains reference counts the virtual address space of a process – Free list layered over this array • Hoard/Linux slab: Efficient allocation of objects from • Just like JOS, Linux represents physical memory with a superblock/slab of pages an array of page structs – Obviously, not the exact same contents, but same idea • Pages can be allocated to processes, or to cache file data in memory 3 4 3 4 COMP 790: OS Implementation COMP 790: OS Implementation Today’s Problem The address space abstraction • Given a VMA or a file’s inode, how do I figure out • Unifying abstraction: which physical pages are storing its data? – Each file inode has an address space (0—file size) • Next lecture: We will go the other way, from a – So do block devices that cache data in RAM (0---dev size) physical page back to the VMA or file inode – The (anonymous) virtual memory of a process has an address space (0—4GB on x86) • In other words, all page -
Hetfs: a Heterogeneous File System for Everyone
HetFS: A Heterogeneous File System for Everyone Georgios Koloventzos1, Ramon Nou∗1, Alberto Miranda∗1, and Toni Cortes1;2 1 Barcelona Supercomputing Center (BSC) Barcelona, Spain 2 Universitat Polit`ecnicade Catalunya Barcelona, Spain georgios.koloventzos,ramon.nou,alberto.miranda,toni.cortes}@bsc.es Abstract Storage devices have been getting more and more diverse during the last decade. The advent of SSDs made it painfully clear that rotating devices, such as HDDs or magnetic tapes, were lacking in regards to response time. However, SSDs currently have a limited number of write cycles and a significantly larger price per capacity, which has prevented rotational technologies from begin abandoned. Additionally, Non-Volatile Memories (NVMs) have been lately gaining traction, offering devices that typically outperform NAND-based SSDs but exhibit a full new set of idiosyncrasies. Therefore, in order to appropriately support this diversity, intelligent mech- anisms will be needed in the near-future to balance the benefits and drawbacks of each storage technology available to a system. In this paper, we present a first step towards such a mechanism called HetFS, an extension to the ZFS file system that is capable of choosing the storage device a file should be kept in according to preprogrammed filters. We introduce the prototype and show some preliminary results of the effects obtained when placing specific files into different devices. 1 Introduction Storage devices have shown a significant evolution in the latest decade. As the improvements in the latencies of traditional hard disk drives (HDDs) have dimin- ished due to the mechanical limitations inherent to their design, other technolo- gies have been emerging to try and take their place. -
Flexible Lustre Management
Flexible Lustre management Making less work for Admins ORNL is managed by UT-Battelle for the US Department of Energy How do we know Lustre condition today • Polling proc / sysfs files – The knocking on the door model – Parse stats, rpc info, etc for performance deviations. • Constant collection of debug logs – Heavy parsing for common problems. • The death of a node – Have to examine kdumps and /or lustre dump Origins of a new approach • Requirements for Linux kernel integration. – No more proc usage – Migration to sysfs and debugfs – Used to configure your file system. – Started in lustre 2.9 and still on going. • Two ways to configure your file system. – On MGS server run lctl conf_param … • Directly accessed proc seq_files. – On MSG server run lctl set_param –P • Originally used an upcall to lctl for configuration • Introduced in Lustre 2.4 but was broken until lustre 2.12 (LU-7004) – Configuring file system works transparently before and after sysfs migration. Changes introduced with sysfs / debugfs migration • sysfs has a one item per file rule. • Complex proc files moved to debugfs • Moving to debugfs introduced permission problems – Only debugging files should be their. – Both debugfs and procfs have scaling issues. • Moving to sysfs introduced the ability to send uevents – Item of most interest from LUG 2018 Linux Lustre client talk. – Both lctl conf_param and lctl set_param –P use this approach • lctl conf_param can set sysfs attributes without uevents. See class_modify_config() – We get life cycle events for free – udev is now involved. What do we get by using udev ? • Under the hood – uevents are collect by systemd and then processed by udev rules – /etc/udev/rules.d/99-lustre.rules – SUBSYSTEM=="lustre", ACTION=="change", ENV{PARAM}=="?*", RUN+="/usr/sbin/lctl set_param '$env{PARAM}=$env{SETTING}’” • You can create your own udev rule – http://reactivated.net/writing_udev_rules.html – /lib/udev/rules.d/* for examples – Add udev_log="debug” to /etc/udev.conf if you have problems • Using systemd for long task. -
ECE 598 – Advanced Operating Systems Lecture 19
ECE 598 { Advanced Operating Systems Lecture 19 Vince Weaver http://web.eece.maine.edu/~vweaver [email protected] 7 April 2016 Announcements • Homework #7 was due • Homework #8 will be posted 1 Why use FAT over ext2? • FAT simpler, easy to code • FAT supported on all major OSes • ext2 faster, more robust filename and permissions 2 btrfs • B-tree fs (similar to a binary tree, but with pages full of leaves) • overwrite filesystem (overwite on modify) vs CoW • Copy on write. When write to a file, old data not overwritten. Since old data not over-written, crash recovery better Eventually old data garbage collected • Data in extents 3 • Copy-on-write • Forest of trees: { sub-volumes { extent-allocation { checksum tree { chunk device { reloc • On-line defragmentation • On-line volume growth 4 • Built-in RAID • Transparent compression • Snapshots • Checksums on data and meta-data • De-duplication • Cloning { can make an exact snapshot of file, copy-on- write different than link, different inodles but same blocks 5 Embedded • Designed to be small, simple, read-only? • romfs { 32 byte header (magic, size, checksum,name) { Repeating files (pointer to next [0 if none]), info, size, checksum, file name, file data • cramfs 6 ZFS Advanced OS from Sun/Oracle. Similar in idea to btrfs indirect still, not extent based? 7 ReFS Resilient FS, Microsoft's answer to brtfs and zfs 8 Networked File Systems • Allow a centralized file server to export a filesystem to multiple clients. • Provide file level access, not just raw blocks (NBD) • Clustered filesystems also exist, where multiple servers work in conjunction. -
Dynamically Tuning the JFS Cache for Your Job Sjoerd Visser Dynamically Tuning the JFS Cache for Your Job Sjoerd Visser
Dynamically Tuning the JFS Cache for Your Job Sjoerd Visser Dynamically Tuning the JFS Cache for Your Job Sjoerd Visser The purpose of this presentation is the explanation of: IBM JFS goals: Where was Journaled File System (JFS) designed for? JFS cache design: How the JFS File System and Cache work. JFS benchmarking: How to measure JFS performance under OS/2. JFS cache tuning: How to optimize JFS performance for your job. What do these settings say to you? [E:\]cachejfs SyncTime: 8 seconds MaxAge: 30 seconds BufferIdle: 6 seconds Cache Size: 400000 kbytes Min Free buffers: 8000 ( 32000 K) Max Free buffers: 16000 ( 64000 K) Lazy Write is enabled Do you have a feeling for this? Do you understand the dynamic cache behaviour of the JFS cache? Or do you just rely on the “proven” cachejfs settings that the eCS installation presented to you? Do you realise that the JFS cache behaviour may be optimized for your jobs? November 13, 2009 / page 2 Dynamically Tuning the JFS Cache for Your Job Sjoerd Visser Where was Journaled File System (JFS) designed for? 1986 Advanced Interactive eXecutive (AIX) v.1 based on UNIX System V. for IBM's RT/PC. 1990 JFS1 on AIX was introduced with AIX version 3.1 for the RS/6000 workstations and servers using 32-bit and later 64-bit IBM POWER or PowerPC RISC CPUs. 1994 JFS1 was adapted for SMP servers (AIX 4) with more CPU power, many hard disks and plenty of RAM for cache and buffers. 1995-2000 JFS(2) (revised AIX independent version in c) was ported to OS/2 4.5 (1999) and Linux (2000) and also was the base code of the current JFS2 on AIX branch. -
[13주차] Sysfs and Procfs
1 7 Computer Core Practice1: Operating System Week13. sysfs and procfs Jhuyeong Jhin and Injung Hwang Embedded Software Lab. Embedded Software Lab. 2 sysfs 7 • A pseudo file system provided by the Linux kernel. • sysfs exports information about various kernel subsystems, HW devices, and associated device drivers to user space through virtual files. • The mount point of sysfs is usually /sys. • sysfs abstrains devices or kernel subsystems as a kobject. Embedded Software Lab. 3 How to create a file in /sys 7 1. Create and add kobject to the sysfs 2. Declare a variable and struct kobj_attribute – When you declare the kobj_attribute, you should implement the functions “show” and “store” for reading and writing from/to the variable. – One variable is one attribute 3. Create a directory in the sysfs – The directory have attributes as files • When the creation of the directory is completed, the directory and files(attributes) appear in /sys. • Reference: ${KERNEL_SRC_DIR}/include/linux/sysfs.h ${KERNEL_SRC_DIR}/fs/sysfs/* • Example : ${KERNEL_SRC_DIR}/kernel/ksysfs.c Embedded Software Lab. 4 procfs 7 • A special filesystem in Unix-like operating systems. • procfs presents information about processes and other system information in a hierarchical file-like structure. • Typically, it is mapped to a mount point named /proc at boot time. • procfs acts as an interface to internal data structures in the kernel. The process IDs of all processes in the system • Kernel provides a set of functions which are designed to make the operations for the file in /proc : “seq_file interface”. – We will create a file in procfs and print some data from data structure by using this interface. -
NOVA: a Log-Structured File System for Hybrid Volatile/Non
NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories Jian Xu and Steven Swanson, University of California, San Diego https://www.usenix.org/conference/fast16/technical-sessions/presentation/xu This paper is included in the Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST ’16). February 22–25, 2016 • Santa Clara, CA, USA ISBN 978-1-931971-28-7 Open access to the Proceedings of the 14th USENIX Conference on File and Storage Technologies is sponsored by USENIX NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories Jian Xu Steven Swanson University of California, San Diego Abstract Hybrid DRAM/NVMM storage systems present a host of opportunities and challenges for system designers. These sys- Fast non-volatile memories (NVMs) will soon appear on tems need to minimize software overhead if they are to fully the processor memory bus alongside DRAM. The result- exploit NVMM’s high performance and efficiently support ing hybrid memory systems will provide software with sub- more flexible access patterns, and at the same time they must microsecond, high-bandwidth access to persistent data, but provide the strong consistency guarantees that applications managing, accessing, and maintaining consistency for data require and respect the limitations of emerging memories stored in NVM raises a host of challenges. Existing file sys- (e.g., limited program cycles). tems built for spinning or solid-state disks introduce software Conventional file systems are not suitable for hybrid mem- overheads that would obscure the performance that NVMs ory systems because they are built for the performance char- should provide, but proposed file systems for NVMs either in- acteristics of disks (spinning or solid state) and rely on disks’ cur similar overheads or fail to provide the strong consistency consistency guarantees (e.g., that sector updates are atomic) guarantees that applications require. -
Hibachi: a Cooperative Hybrid Cache with NVRAM and DRAM for Storage Arrays
Hibachi: A Cooperative Hybrid Cache with NVRAM and DRAM for Storage Arrays Ziqi Fan, Fenggang Wu, Dongchul Parkx, Jim Diehl, Doug Voigty, and David H.C. Du University of Minnesota–Twin Cities, xIntel Corporation, yHewlett Packard Enterprise Email: [email protected], ffenggang, park, [email protected], [email protected], [email protected] Abstract—Adopting newer non-volatile memory (NVRAM) Application Application Application technologies as write buffers in slower storage arrays is a promising approach to improve write performance. However, due OS … OS … OS to DRAM’s merits, including shorter access latency and lower Buffer/Cache Buffer/Cache Buffer/Cache cost, it is still desirable to incorporate DRAM as a read cache along with NVRAM. Although numerous cache policies have been Application, web or proposed, most are either targeted at main memory buffer caches, database servers or manage NVRAM as write buffers and separately manage DRAM as read caches. To the best of our knowledge, cooperative hybrid volatile and non-volatile memory buffer cache policies specifically designed for storage systems using newer NVRAM Storage Area technologies have not been well studied. Network (SAN) This paper, based on our elaborate study of storage server Hibachi Cache block I/O traces, proposes a novel cooperative HybrId NVRAM DRAM NVRAM and DRAM Buffer cACHe polIcy for storage arrays, named Hibachi. Hibachi treats read cache hits and write cache hits differently to maximize cache hit rates and judiciously adjusts the clean and the dirty cache sizes to capture workloads’ tendencies. In addition, it converts random writes to sequential writes for high disk write throughput and further exploits storage server I/O workload characteristics to improve read performance. -
Hwloc-V1.10.1-Letter.Pdf
Hardware Locality (hwloc) 1.10.1 Generated by Doxygen 1.8.8 Mon Jan 26 2015 10:38:04 Contents 1 Hardware Locality 1 1.1 Introduction ................................................. 1 1.2 Installation.................................................. 2 1.3 CLI Examples................................................ 3 1.4 Programming Interface ........................................... 8 1.4.1 Portability.............................................. 8 1.4.2 API Example ............................................12 1.5 Questions and Bugs.............................................15 1.6 History / Credits...............................................15 1.7 Further Reading...............................................15 2 Terms and Definitions 17 3 Command-Line Tools 21 3.1 lstopo and lstopo-no-graphics........................................21 3.2 hwloc-bind..................................................21 3.3 hwloc-calc..................................................21 3.4 hwloc-info..................................................22 3.5 hwloc-distrib.................................................22 3.6 hwloc-ps...................................................22 3.7 hwloc-gather-topology............................................22 3.8 hwloc-distances...............................................22 3.9 hwloc-annotate ...............................................22 3.10 hwloc-diff and hwloc-patch .........................................22 3.11 hwloc-compress-dir.............................................23 3.12 hwloc-assembler -
12. File-System Implementation
12.12. FiFilele--SystemSystem ImpImpllemeemenntattatiionon SungyoungLee College of Engineering KyungHeeUniversity ContentsContents n File System Structure n File System Implementation n Directory Implementation n Allocation Methods n Free-Space Management n Efficiency and Performance n Recovery n Log-Structured File Systems n NFS Operating System 1 OverviewOverview n User’s view on file systems: ü How files are named? ü What operations are allowed on them? ü What the directory tree looks like? n Implementor’sview on file systems: ü How files and directories are stored? ü How disk space is managed? ü How to make everything work efficiently and reliably? Operating System 2 FileFile--SySysstemtem StrStruucturecture n File structure ü Logical storage unit ü Collection of related information n File system resides on secondary storage (disks) n File system organized into layers n File control block ü storage structure consisting of information about a file Operating System 3 LayeLayerreded FFililee SystemSystem Operating System 4 AA TypicalTypical FFileile ControlControl BlockBlock Operating System 5 FileFile--SySysstemtem ImImpplementationlementation n On-disk structure ü Boot control block § Boot block(UFS) or Boot sector(NTFS) ü Partition control block § Super block(UFS) or Master file table(NTFS) ü Directory structure ü File control block (FCB) § I-node(UFS) or In master file table(NTFS) n In-memory structure ü In-memory partition table ü In-memory directory structure ü System-wide open file table ü Per-process open file table Operating -
SUSE Linux Enterprise Server 12 SP4 System Analysis and Tuning Guide System Analysis and Tuning Guide SUSE Linux Enterprise Server 12 SP4
SUSE Linux Enterprise Server 12 SP4 System Analysis and Tuning Guide System Analysis and Tuning Guide SUSE Linux Enterprise Server 12 SP4 An administrator's guide for problem detection, resolution and optimization. Find how to inspect and optimize your system by means of monitoring tools and how to eciently manage resources. Also contains an overview of common problems and solutions and of additional help and documentation resources. Publication Date: September 24, 2021 SUSE LLC 1800 South Novell Place Provo, UT 84606 USA https://documentation.suse.com Copyright © 2006– 2021 SUSE LLC and contributors. All rights reserved. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”. For SUSE trademarks, see https://www.suse.com/company/legal/ . All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its aliates. Asterisks (*) denote third-party trademarks. All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its aliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof. Contents About This Guide xii 1 Available Documentation xiii