Parallel Programming 2021-01-26


Parallel I/O
Jun.-Prof. Dr. Michael Kuhn <[email protected]>
Parallel Computing and I/O, Institute for Intelligent Cooperating Systems
Faculty of Computer Science, Otto von Guericke University Magdeburg
https://parcio.ovgu.de

Outline
• Review
• Introduction and Motivation
• Storage Devices and Arrays
• File Systems
• Parallel Distributed File Systems
• Libraries
• Future Developments
• Summary

Review: Advanced MPI and Debugging

• How does one-sided communication work?
  1. One-sided communication works with messages
  2. Every process can access every other address space
  3. Addresses first have to be exposed via windows
  4. System looks like a shared memory system to the processes

• What is the difference between active and passive target communication?
  1. Both origin and target are involved in active target communication
  2. Both origin and target are involved in passive target communication
  3. Only target process is involved in active target communication
  4. Only target process is involved in passive target communication

• What is the purpose of MPI_Accumulate?
  1. Can be used to sum multiple values
  2. Can be used to perform specific reductions on values
  3. Can be used to merge multiple windows
  4. Can be used to collect information about processes

• How can deadlocks and race conditions be detected?
  1. The compiler warns about them
  2. Static analysis can detect some errors
  3. Errors can only be detected at runtime

Introduction and Motivation

Computation and I/O
• Parallel applications run on multiple nodes
  • Communication via MPI
• Computation is only one part of applications
  • Input data has to be read
  • Output data has to be written
  • Example: checkpoints
• Processors require data fast
  • Caches should be used optimally
  • Additional latency due to I/O and network

  Level        Latency
  L1 cache     ≈ 1 ns
  L2 cache     ≈ 5 ns
  L3 cache     ≈ 10 ns
  RAM          ≈ 100 ns
  InfiniBand   ≈ 500 ns
  Ethernet     ≈ 100,000 ns
  SSD          ≈ 100,000 ns
  HDD          ≈ 10,000,000 ns

[Bonér, 2012] [Huang et al., 2014]
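To make the checkpoint example above concrete, here is a minimal sketch (not taken from the slides) in which every MPI rank writes its local state to its own file using plain POSIX calls. The file name pattern, buffer size and error handling are illustrative assumptions; a real application would likely use MPI-IO or a high-level library, which the lecture covers later.

    /* Sketch: naive per-rank checkpointing with POSIX I/O (illustrative only). */
    #include <fcntl.h>
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Hypothetical local state of this process. */
        size_t count = 1024 * 1024;
        double* data = malloc(count * sizeof(*data));
        for (size_t i = 0; i < count; i++)
            data[i] = (double)rank;

        /* One checkpoint file per rank (file name is a made-up convention). */
        char path[64];
        snprintf(path, sizeof(path), "checkpoint.%d", rank);

        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd >= 0)
        {
            if (write(fd, data, count * sizeof(*data)) < 0)
                perror("write");
            close(fd);
        }

        free(data);
        MPI_Finalize();
        return 0;
    }

Even in this simple form, the checkpoint touches the network, the file system and the storage devices, which is why the latencies in the table above matter.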
Computation and I/O (continued)

[Figure: The I/O stack — application, libraries, parallel distributed file system, file system, storage device/array — with cross-cutting concerns such as optimizations, data reduction and performance analysis.]

• I/O is often responsible for performance problems
  • High latency causes idle processors
  • I/O is often still serial, limiting throughput
• I/O stack is layered
  • Many different components are involved in accessing data
  • One unoptimized layer can significantly decrease performance

Storage Devices and Arrays

Hard Disk Drives
• First HDD: 1956
  • IBM 350 RAMAC (3.75 MB, 8.8 KB/s, 1,200 RPM)
• HDD development
  • Capacity: 100× every 10 years
  • Throughput: 10× every 10 years
[Wikipedia, 2015]

Solid-State Drives
• Benefits
  • Read throughput: factor of 15
  • Write throughput: factor of 10
  • Latency: factor of 100
  • Energy consumption: factor of 1–10
• Drawbacks
  • Price: factor of 10
  • Write cycles: 10,000–100,000
  • Complexity
    • Different optimal access sizes for reads and writes
    • Address translation, thermal issues etc.

RAID
• Storage arrays for higher capacity, throughput and reliability
  • Proposed in 1988 at the University of California, Berkeley
  • Originally: Redundant Array of Inexpensive Disks
  • Today: Redundant Array of Independent Disks
• Capacity
  • Storage array can be addressed like a single, large device
• Throughput
  • All storage devices can contribute to the overall throughput
• Reliability
  • Data can be stored redundantly to survive hardware failures
  • Devices usually have the same age; fabrication defects within the same batch

RAID (continued)
• Five different variants initially
  • RAID 1: mirroring
  • RAID 2/3: bit/byte striping
  • RAID 4: block striping
  • RAID 5: block striping with distributed parity
• New variants have been added
  • RAID 0: striping
  • RAID 6: block striping with double parity

RAID 1
[Figure: RAID 1 layout — blocks A1–A4 mirrored on Disk 0 and Disk 1. [Cburnett, 2006a]]
• Improved reliability via mirroring
• Advantages
  • One device can fail without losing data
  • Read performance can be improved
• Disadvantages
  • Capacity requirements and costs are doubled
  • Write performance equals that of a single device

RAID 5
[Figure: RAID 5 layout — data blocks with distributed parity (A1 A2 A3 Ap, B1 B2 Bp B3, C1 Cp C2 C3, Dp D1 D2 D3) across Disks 0–3. [Cburnett, 2006b]]
• Improved reliability via parity
  • Typically simple XOR
• Advantages
  • Performance can be improved
  • Requests can be processed in parallel
  • Load is distributed across all devices

RAID 5 (continued)
• Data can be reconstructed easily due to XOR (see the sketch below)
  • ?A = A1 ⊕ A2 ⊕ Ap, ?B = B1 ⊕ B2 ⊕ B3, ...
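As a toy illustration of the XOR reconstruction above (not from the slides), the sketch below computes a parity block over two data blocks and then rebuilds a "lost" block from the surviving block and the parity. Block size and contents are arbitrary.

    /* Toy sketch of RAID-5-style XOR parity (block size and data are arbitrary). */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 16

    /* out = a XOR b, byte by byte. */
    static void xor_blocks(uint8_t* out, const uint8_t* a, const uint8_t* b)
    {
        for (size_t i = 0; i < BLOCK_SIZE; i++)
            out[i] = a[i] ^ b[i];
    }

    int main(void)
    {
        uint8_t a1[BLOCK_SIZE] = "data block one.";
        uint8_t a2[BLOCK_SIZE] = "data block two.";
        uint8_t ap[BLOCK_SIZE];
        uint8_t rebuilt[BLOCK_SIZE];

        /* Parity: Ap = A1 XOR A2. */
        xor_blocks(ap, a1, a2);

        /* Pretend A1 was lost: reconstruct it as A2 XOR Ap. */
        xor_blocks(rebuilt, a2, ap);

        printf("reconstruction %s\n",
               memcmp(rebuilt, a1, BLOCK_SIZE) == 0 ? "matches" : "differs");
        return 0;
    }

The same idea extends to any number of devices: the parity is the XOR over all data blocks in a stripe, so any single missing block equals the XOR of the remaining blocks and the parity.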
[Figure: RAID 5 with a failed disk — blocks ?A, ?B, ?C and ?D have to be reconstructed from the remaining blocks on Disks 0–3. [Cburnett, 2006b]]
• Problems
  • Read errors on other devices
  • Duration (30 min in 2004, 17–18 h in 2020 for HDDs)
• New approaches like declustered RAID

Performance Assessment
• Different performance criteria
  • Data throughput (photo/video editing, numerical applications)
  • Request throughput (databases, metadata management)
• Appropriate hardware
  • Data throughput: HDDs: 150–250 MB/s, SSDs: 0.5–3.5 GB/s
  • Request throughput: HDDs: 75–100 IOPS (7,200 RPM), SSDs: 90,000–600,000 IOPS
• Appropriate configuration
  • Small blocks for data, large blocks for requests
  • Partial block/page accesses can reduce performance

File Systems

Overview
• File systems provide structure
  • Files and directories are the most common file system objects
  • Nesting directories results in hierarchical organization
  • Other approaches: tagging
• Management of data and metadata
  • Block allocation is important for performance
  • Access permissions, timestamps etc.
• File systems use underlying storage devices or arrays

File System Objects
• User vs. system view
  • Users see files and directories
  • System manages inodes
  • Relevant for stat etc.
• Files
  • Contain data as byte arrays
  • Can be read and written (explicitly)
  • Can be mapped to memory (implicitly)
• Directories
  • Contain files and directories
  • Structure the namespace

I/O Interfaces
• Requests are realized through I/O interfaces
  • Forwarded to the file system
• Different abstraction levels
  • Low-level functionality: POSIX etc.
  • High-level functionality: NetCDF etc.
• Initial access via path
  • Afterwards access via file descriptor (few exceptions)
• Functions are located in libc
  • Library executes system calls

  fd = open("/path/to/file", ...);
  nb = write(fd, data, sizeof(data));
  rv = close(fd);
  rv = unlink("/path/to/file");

Virtual File System (Switch)
• Central file system component in the kernel
  • Sets file system structure and interface
  • Forwards applications' requests based on path
  • Enables supporting multiple different file systems
  • Applications are still portable due to POSIX
• POSIX: standardized interface for all file systems
  • Syntax defines available operations and their parameters
    • open, close, creat, read, write, lseek, chmod, chown, stat etc.
  • Semantics defines operations' behavior
    • write: "POSIX requires that a read(2) which can be proved to occur after a write() has returned returns the new data. Note that not all filesystems are POSIX conforming."
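The following small sketch (not from the slides) shows both access paths mentioned above — explicit read() via a file descriptor and implicit access by mapping the file into memory — and uses fstat() to peek at the inode that the system manages behind the file. The path is the placeholder from the listing above; error handling is minimal.

    /* Sketch: explicit vs. implicit file access and the inode behind a file. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/path/to/file", O_RDONLY);
        if (fd < 0)
            return 1;

        /* System view: the inode stores metadata such as size and permissions. */
        struct stat st;
        fstat(fd, &st);
        printf("inode %llu, size %lld bytes\n",
               (unsigned long long)st.st_ino, (long long)st.st_size);

        /* Explicit access: read the first bytes via the file descriptor. */
        char buf[16];
        ssize_t nb = read(fd, buf, sizeof(buf));

        /* Implicit access: map the file and touch the same bytes in memory. */
        void* map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED && nb > 0)
            printf("first byte: read()=%c, mmap()=%c\n", buf[0], ((char*)map)[0]);
        if (map != MAP_FAILED)
            munmap(map, st.st_size);

        close(fd);
        return 0;
    }

Both paths end up at the same file system via the VFS, which is what makes applications portable across the file systems shown in the diagram below.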
VFS (continued)
[Figure: The Linux Storage Stack Diagram — applications reach the VFS via system calls such as open(2), read(2), write(2) and stat(2) or via mmap; the VFS dispatches to block-based file systems (ext2/3/4, XFS, btrfs, ...), network file systems (NFS, smbfs, Ceph, ...), pseudo and special-purpose file systems (proc, sysfs, tmpfs, ...), stackable file systems (ecryptfs, overlayfs, unionfs) and FUSE (e.g. sshfs); below the page cache, block I/Os (struct bio) pass through stacked block devices such as LVM/device mapper, mdraid, dm-crypt, dm-cache and bcache on their way to the storage devices. Diagram by Werner Fischer and Georg Schönberger, CC-BY-SA 3.0, http://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram] [Werner Fischer and Georg Schönberger, 2017]
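To see which of the file system types from the diagram the running kernel's VFS actually knows about, one can read /proc/filesystems on Linux. A minimal, Linux-specific sketch (not from the slides):

    /* Sketch: list the file system types registered with the kernel's VFS. */
    #include <stdio.h>

    int main(void)
    {
        FILE* f = fopen("/proc/filesystems", "r");
        if (f == NULL)
            return 1;

        char line[256];
        while (fgets(line, sizeof(line), f) != NULL)
            fputs(line, stdout); /* e.g. "nodev  proc", "       ext4", ... */

        fclose(f);
        return 0;
    }

Entries marked "nodev" are pseudo or special-purpose file systems that are not backed by a block device.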