Parallel I/O

Parallel Programming 2021-01-26

Jun.-Prof. Dr. Michael Kuhn [email protected]

Parallel Computing and I/O
Institute for Intelligent Cooperating Systems
Faculty of Computer Science
Otto von Guericke University Magdeburg
https://parcio.ovgu.de

Outline Review

Parallel I/O Review Introduction and Motivation Storage Devices and Arrays File Systems Parallel Distributed File Systems Libraries Future Developments Summary

Michael Kuhn Parallel I/O 1 / 44 Advanced MPI and Debugging Review

• How does one-sided communication work? 1. One-sided communication works with messages 2. Every process can access every other address space 3. Addresses first have to be exposed via windows 4. System looks like a shared memory system to the processes

Michael Kuhn Parallel I/O 2 / 44 Advanced MPI and Debugging Review

• What is the difference between active and passive target communication? 1. Both origin and target are involved in active target communication 2. Both origin and target are involved in passive target communication 3. Only target process is involved in active target communication 4. Only target process is involved in passive target communication

Michael Kuhn Parallel I/O 2 / 44 Advanced MPI and Debugging Review

• What is the purpose of MPI_Accumulate? 1. Can be used to sum multiple values 2. Can be used to perform specific reductions on values 3. Can be used to merge multiple windows 4. Can be used to collect information about processes

Michael Kuhn Parallel I/O 2 / 44 Advanced MPI and Debugging Review

• How can deadlocks and race conditions be detected? 1. The compiler warns about them 2. Static analysis can detect some errors 3. Errors can only be detected at runtime

Michael Kuhn Parallel I/O 2 / 44 Outline Introduction and Motivation

Parallel I/O Review Introduction and Motivation Storage Devices and Arrays File Systems Parallel Distributed File Systems Libraries Future Developments Summary

Michael Kuhn Parallel I/O 3 / 44 Computation and I/O Introduction and Motivation

• Parallel applications run on multiple nodes
  • Communication via MPI
• Computation is only one part of applications
  • Input data has to be read
  • Output data has to be written
  • Example: checkpoints
• Processors require data fast
  • Caches should be used optimally
  • Additional latency due to I/O and network

  Level        Latency
  L1 cache     ≈ 1 ns
  L2 cache     ≈ 5 ns
  L3 cache     ≈ 10 ns
  RAM          ≈ 100 ns
  InfiniBand   ≈ 500 ns
  Ethernet     ≈ 100,000 ns
  SSD          ≈ 100,000 ns
  HDD          ≈ 10,000,000 ns
  [Bonér, 2012] [Huang et al., 2014]

Michael Kuhn Parallel I/O 4 / 44 Computation and I/O... Introduction and Motivation

[Figure: the I/O stack consisting of application, libraries, parallel distributed file system and storage device/array; optimizations, data reduction and performance analysis apply across the layers]

• I/O is often responsible for performance problems • High latency causes idle processors • I/O is often still serial, limiting throughput • I/O stack is layered • Many different components are involved in accessing data • One unoptimized layer can significantly decrease performance Michael Kuhn Parallel I/O 5 / 44 Outline Storage Devices and Arrays

Parallel I/O Review Introduction and Motivation Storage Devices and Arrays File Systems Parallel Distributed File Systems Libraries Future Developments Summary

Michael Kuhn Parallel I/O 6 / 44 Hard Disk Drives Storage Devices and Arrays

• First HDD: 1956 • IBM 350 RAMAC (3.75 MB, 8.8 KB/s, 1,200 RPM) • HDD development • Capacity: 100× every 10 years • Throughput: 10× every 10 years
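A rough, illustrative calculation of what these growth rates mean (the 8 TB capacity and 200 MB/s throughput are assumed values for a current drive, not from the slides): reading the RAMAC completely took about 3.75 MB / 8.8 KB/s ≈ 430 s ≈ 7 minutes, while reading a modern 8 TB HDD at 200 MB/s takes about 40,000 s ≈ 11 hours; capacity has grown much faster than throughput.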

[Wikipedia, 2015] Michael Kuhn Parallel I/O 7 / 44 Solid-State Drives Storage Devices and Arrays

• Benefits • Read throughput: factor of 15 • Write throughput: factor of 10 • Latency: factor of 100 • Energy consumption: factor of 1–10 • Drawbacks • Price: factor of 10 • Write cycles: 10,000–100,000 • Complexity • Different optimal access sizes for reads and writes • Address translation, thermal issues etc.

Michael Kuhn Parallel I/O 8 / 44 RAID Storage Devices and Arrays

• Storage arrays for higher capacity, throughput and reliability • Proposed in 1988 at the University of California, Berkeley • Originally: Redundant Array of Inexpensive Disks • Today: Redundant Array of Independent Disks

• Capacity • Storage array can be addressed like a single, large device • Throughput • All storage devices can contribute to the overall throughput • Reliability • Data can be stored redundantly to survive hardware failures • Devices usually have same age, fabrication defects within same batch

Michael Kuhn Parallel I/O 9 / 44 RAID... Storage Devices and Arrays

• Five different variants initially • RAID 1: mirroring • RAID 2/3: bit/byte striping • RAID 4: block striping • RAID 5: block striping with distributed parity • New variants have been added • RAID 0: striping • RAID 6: block striping with double parity

Michael Kuhn Parallel I/O 10 / 44 RAID 1 Storage Devices and Arrays

• Improved reliability via mirroring
• Advantages
  • One device can fail without losing data
  • Read performance can be improved
• Disadvantages
  • Capacity requirements and costs are doubled
  • Write performance equals that of a single device
[Figure: RAID 1 with blocks A1–A4 mirrored on Disk 0 and Disk 1]

[Cburnett, 2006a]

Michael Kuhn Parallel I/O 11 / 44 RAID 5 Storage Devices and Arrays

• Improved reliability via parity
  • Typically simple XOR
• Advantages
  • Performance can be improved
  • Requests can be processed in parallel
  • Load is distributed across all devices
[Figure: RAID 5 with data and parity blocks distributed across Disk 0–Disk 3: A1 A2 A3 Ap, B1 B2 Bp B3, C1 Cp C2 C3, Dp D1 D2 D3]

[Cburnett, 2006b]

Michael Kuhn Parallel I/O 12 / 44 RAID 5... Storage Devices and Arrays

• Data can be reconstructed easily due to XOR
  • ?A = A1 ⊕ A2 ⊕ Ap, ?B = B1 ⊕ B2 ⊕ B3, ...
• Problems
  • Read errors on other devices
  • Duration (30 min in 2004, 17–18 h in 2020 for HDDs)
  • New approaches like declustered RAID
[Figure: RAID 5 with a failed Disk 2; the missing blocks ?A, ?B, ?C and ?D have to be reconstructed from the remaining devices] [Cburnett, 2006b]
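A minimal sketch of the XOR reconstruction in C (block size and calling convention are assumptions, not a particular RAID implementation):

    #include <stddef.h>
    #include <stdint.h>

    enum { BLOCK_SIZE = 4096 }; /* assumed stripe unit size */

    /* Rebuild the block of the failed device by XORing the surviving
       blocks of the same stripe (data blocks and parity block). */
    static void raid5_reconstruct (uint8_t* lost,
                                   const uint8_t* const* surviving,
                                   size_t devices)
    {
        for (size_t i = 0; i < BLOCK_SIZE; i++)
        {
            uint8_t byte = 0;

            for (size_t d = 0; d < devices; d++)
            {
                byte ^= surviving[d][i];
            }

            lost[i] = byte;
        }
    }

With the layout above, the missing ?A is obtained by passing A1, A2 and Ap as the surviving blocks.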

Michael Kuhn Parallel I/O 13 / 44 Performance Assessment Storage Devices and Arrays

• Different performance criteria • Data throughput (photo/video editing, numerical applications) • Request throughput (databases, metadata management) • Appropriate hardware • Data throughput • HDDs: 150–250 MB/s, SSDs: 0.5–3.5 GB/s • Request throughput • HDDs: 75–100 IOPS (7,200 RPM), SSDs: 90,000–600,000 IOPS • Appropriate configuration • Small blocks for data, large blocks for requests • Partial block/page accesses can reduce performance
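A rough sanity check for the HDD request throughput (the average seek time of about 8 ms is an assumed value): at 7,200 RPM the average rotational latency is 60 s / 7,200 / 2 ≈ 4.2 ms, so a random request takes roughly 8 ms + 4.2 ms ≈ 12 ms, which corresponds to about 80 IOPS.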

Michael Kuhn Parallel I/O 14 / 44 Outline File Systems

Parallel I/O Review Introduction and Motivation Storage Devices and Arrays File Systems Parallel Distributed File Systems Libraries Future Developments Summary

Michael Kuhn Parallel I/O 15 / 44 Overview File Systems

• File systems provide structure • Files and directories are the most common file system objects • Nesting directories results in hierarchical organization • Other approaches: tagging • Management of data and metadata • Block allocation is important for performance • Access permissions, timestamps etc. • File systems use underlying storage devices or arrays

Michael Kuhn Parallel I/O 16 / 44 File System Objects File Systems

• User vs. system view • Users see files and directories • System manages inodes • Relevant for stat etc. • Files • Contain data as byte arrays • Can be read and written (explicitly) • Can be mapped to memory (implicitly) • Directories • Contain files and directories • Structure the namespace

Michael Kuhn Parallel I/O 17 / 44 I/O Interfaces File Systems

• Requests are realized through I/O interfaces
  • Forwarded to the file system
• Different abstraction levels
  • Low-level functionality: POSIX etc.
  • High-level functionality: NetCDF etc.
• Initial access via path
  • Afterwards access via file descriptor (few exceptions)
• Functions are located in libc
  • Library executes system calls

    fd = open("/path/to/file", ...);
    nb = write(fd, data, sizeof(data));
    rv = close(fd);
    rv = unlink("/path/to/file");
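Slightly expanded into a self-contained sketch with error handling (path and data are only examples):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main (void)
    {
        char data[] = "Hello, parallel I/O!";
        ssize_t nb;
        int fd;

        /* Initial access via path, afterwards via the file descriptor */
        fd = open("/tmp/pio-example.txt", O_CREAT | O_WRONLY | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return EXIT_FAILURE; }

        nb = write(fd, data, sizeof(data));
        if (nb < 0) { perror("write"); }

        if (close(fd) < 0) { perror("close"); }

        /* The path is needed again to remove the file */
        if (unlink("/tmp/pio-example.txt") < 0) { perror("unlink"); }

        return EXIT_SUCCESS;
    }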

Michael Kuhn Parallel I/O 18 / 44 (Switch) File Systems

• Central file system component in the kernel • Sets file system structure and interface • Forwards applications’ requests based on path • Enables supporting multiple different file systems • Applications are still portable due to POSIX • POSIX: standardized interface for all file systems • Syntax defines available operations and their parameters • open, close, creat, read, write, lseek, chmod, chown, stat etc. • Semantics defines operations’ behavior • write: “POSIX requires that a read(2) which can be proved to occur after a write() has returned returns the new data. Note that not all filesystems are POSIX conforming.”

Michael Kuhn Parallel I/O 19 / 44 VFS... [Werner Fischer and Georg Schönberger, 2017] File Systems

[Figure: The Linux Storage Stack Diagram; applications sit on top of the VFS, which dispatches to block-based file systems (ext2/3/4, xfs, ...), network file systems (NFS, smbfs, ceph, ...), pseudo file systems (proc, tmpfs, ...), special-purpose file systems and stackable file systems (ecryptfs, unionfs, FUSE, e.g. sshfs); below the VFS follow the page cache, the block layer (BIOs) and devices such as LVM, mdraid, dm-crypt, dm-cache and bcache. Created by Werner Fischer and Georg Schönberger, http://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram, License: CC-BY-SA 3.0, see http://creativecommons.org/licenses/by-sa/3.0/]
Michael Kuhn Parallel I/O 20 / 44 Modern File Systems File Systems

• File system demands are growing • Data integrity, storage management, convenience functionality • Error rate for SATA HDDs: 1 in 10^14 to 10^15 bits [Seagate, 2016] • That is, one bit error per 12.5–125 TB • Additional bit errors in RAM, controller, cable, driver etc. • Error rate can be problematic • Amount can be reached in daily use • Bit errors can occur in the superblock • File system does not have knowledge about storage array • Knowledge is important for performance • For example, special options for ext4
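A quick check of this figure: 10^14 bits / 8 = 1.25 × 10^13 bytes = 12.5 TB read per expected bit error; 10^15 bits correspondingly yield 125 TB.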

Michael Kuhn Parallel I/O 21 / 44 Outline Parallel Distributed File Systems

Parallel I/O Review Introduction and Motivation Storage Devices and Arrays File Systems Parallel Distributed File Systems Libraries Future Developments Summary

Michael Kuhn Parallel I/O 22 / 44 Overview Parallel Distributed File Systems

• Parallel file systems
  • Allow parallel access to shared resources
  • Access should be as efficient as possible
• Distributed file systems
  • Data and metadata are distributed across multiple servers
  • Single servers do not have a complete view
• Naming is inconsistent
  • Often just “parallel file system” or “cluster file system”
[Figure: clients accessing a file whose data is distributed across multiple servers]

Michael Kuhn Parallel I/O 23 / 44 Architecture Parallel Distributed File Systems

• Access via I/O interface
  • Typically standardized, frequently POSIX
• Interface consists of syntax and semantics
  • Syntax defines operations, semantics defines behavior
• Data and metadata servers
  • Different access patterns
[Figure: clients reach metadata servers (MDSs with MDTs) and object storage servers (OSSs with OSTs) via the network]

Michael Kuhn Parallel I/O 24 / 44 Semantics Parallel Distributed File Systems

• POSIX has strong consistency/coherence requirements • Changes have to be visible globally after write • I/O should be atomic to avoid inconsistencies • POSIX for local file systems • Requirements easy to support due to VFS • Contrast: Network File System (NFS) • Same syntax, different semantics • Session semantics in NFS • Changes only visible to other clients after session ends • close writes changes and returns potential errors
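Since errors may only be reported when the session ends, the return value of close has to be checked as well; a minimal sketch in C (path and checkpoint framing are only illustrative, not from the slides):

    #include <fcntl.h>
    #include <stddef.h>
    #include <unistd.h>

    /* Returns 0 on success, -1 on any error. */
    int write_checkpoint (const char* path, const void* buf, size_t length)
    {
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
        ssize_t nb;

        if (fd < 0) { return -1; }

        nb = write(fd, buf, length);

        /* Under session semantics (e.g. NFS), buffered changes are written
           back on close, so errors may only be reported here. */
        if (close(fd) < 0 || nb < 0 || (size_t)nb != length)
        {
            return -1;
        }

        return 0;
    }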

Michael Kuhn Parallel I/O 25 / 44 Data Distribution Parallel Distributed File Systems

[Figure: a file consisting of blocks B0–B7 is distributed round-robin as stripes across multiple servers]

• File is split into blocks, distributed across servers • In this case, with a round-robin distribution • Distribution does not have to start at first server • Allows data and load to be distributed evenly
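A minimal sketch of such a mapping in C (the number of servers and the starting server are hypothetical parameters, not a particular file system's distribution function):

    #include <stdint.h>

    /* Round-robin mapping of a file block to a server and a stripe index.
       first_server models that the distribution does not have to start
       at the first server. */
    static void block_to_location (uint64_t block, uint64_t servers,
                                   uint64_t first_server,
                                   uint64_t* server, uint64_t* stripe)
    {
        *server = (first_server + block) % servers;
        *stripe = block / servers;
    }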

Michael Kuhn Parallel I/O 26 / 44 Performance Parallel Distributed File Systems

• 2009: Blizzard (DKRZ, GPFS)
  • Computation: 158 TFLOPS
  • Capacity: 7 PB
  • Throughput: 30 GB/s
• 2015: Mistral (DKRZ, Lustre)
  • Computation: 3.6 PFLOPS
  • Capacity: 60 PB
  • Throughput: 450 GB/s (5.9 GB/s per node)
  • IOPS: 400,000 operations/s

• 2012: Titan (ORNL, Lustre)
  • Computation: 17.6 PFLOPS
  • Capacity: 40 PB
  • Throughput: 1.4 TB/s
• 2019: Summit (ORNL, Spectrum Scale)
  • Computation: 148.6 PFLOPS
  • Capacity: 250 PB
  • Throughput: 2.5 TB/s

Michael Kuhn Parallel I/O 27 / 44 Outline Libraries

Parallel I/O Review Introduction and Motivation Storage Devices and Arrays File Systems Parallel Distributed File Systems Libraries Future Developments Summary

Michael Kuhn Parallel I/O 28 / 44 Overview Libraries

• Low-level interfaces can be used for parallel I/O • They are typically not very convenient for developers • Parallel applications require support for efficient parallel I/O • Synchronous and serial I/O are bottlenecks • Reading input data and writing output data • Additional problems • Exchangeability of data, complex programming, performance • Libraries offer additional functionality • SIONlib (performance optimizations) • NetCDF, HDF (self-describing data and exchangeability) • ADIOS (abstract I/O)

Michael Kuhn Parallel I/O 29 / 44 MPI-IO Libraries

• MPI-IO is a part of MPI specifying a portable I/O interface • Has been introduced with MPI-2.0 in 1997 • One of the most popular implementations is called ROMIO • ROMIO is developed and distributed as part of MPICH • It is used in OpenMPI and MPICH derivatives • Supports many file systems via Abstract-Device Interface for I/O (ADIO) • MPI-IO provides element-based access to data • The interface is very similar to MPI’s communication interface • Supports collective and non-blocking operations as well as derived datatypes

Michael Kuhn Parallel I/O 30 / 44 MPI-IO... Libraries

• MPI-IO defines the syntax and semantics of I/O operations • Changes are only visible in the current process immediately • Non-overlapping or non-concurrent operations are handled correctly • POSIX I/O has strong coherence and consistency semantics • Changes have to be visible globally after a write and should be atomic • Makes it hard to cache data and often requires locks • Relaxed semantics have less overhead in distributed environments • Improved scalability and less need for locking • MPI-IO semantics is usually enough for scientific applications • For example, non-overlapping access to a shared matrix

Michael Kuhn Parallel I/O 31 / 44 MPI-IO... Libraries

Explicit offsets
  • Blocking: read_at / write_at (individual), read_at_all / write_at_all (collective)
  • Non-blocking & split collective: iread_at / iwrite_at (individual), read_at_all_begin/end / write_at_all_begin/end (collective)
Individual file pointers
  • Blocking: read / write (individual), read_all / write_all (collective)
  • Non-blocking & split collective: iread / iwrite (individual), read_all_begin/end / write_all_begin/end (collective)
Shared file pointer
  • Blocking: read_shared / write_shared (individual), read_ordered / write_ordered (collective)
  • Non-blocking & split collective: iread_shared / iwrite_shared (individual), read_ordered_begin/end / write_ordered_begin/end (collective)
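A minimal sketch using blocking, collective accesses with explicit offsets (file name and data layout are only examples):

    #include <mpi.h>

    #define COUNT 1024

    int main (int argc, char** argv)
    {
        int rank;
        int data[COUNT];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int i = 0; i < COUNT; i++)
        {
            data[i] = rank;
        }

        /* Each process writes its own non-overlapping block at an
           explicit offset; the _all variant makes the call collective. */
        MPI_File_open(MPI_COMM_WORLD, "output.bin",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at_all(fh, (MPI_Offset)rank * COUNT * sizeof(int),
                              data, COUNT, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }

Since every rank writes a disjoint region, the relaxed MPI-IO semantics described above are sufficient.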

Michael Kuhn Parallel I/O 32 / 44 NetCDF Libraries

• Developed by Unidata Program Center • University Corporation for Atmospheric Research • Mainly used for scientific applications • Especially in climate science, meteorology and oceanography • Consists of libraries and data formats 1. Classic format (CDF-1) 2. Classic format with 64 bit offsets (CDF-2) 3. Classic format with full 64 bit support (CDF-5) 4. NetCDF-4 format • Data formats are open standards • CDF-1 and CDF-2 are international standards of the Open Geospatial Consortium

Michael Kuhn Parallel I/O 33 / 44 NetCDF... Libraries

• NetCDF supports groups and variables • Groups contain variables, variables contain data • Attributes can be attached to variables • Supports multi-dimensional arrays • char, byte, short, int, float and double • NetCDF-4: ubyte, ushort, uint, int64, uint64 and string • Dimensions can be sized arbitrarily • Only one unlimited dimension with CDF-1, CDF-2 and CDF-5 • Multiple unlimited dimensions with NetCDF-4
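A minimal serial sketch using the NetCDF C interface (file, dimension and variable names are only examples):

    #include <netcdf.h>

    int main (void)
    {
        int ncid, dimid, varid;
        double values[4] = { 1.0, 2.0, 3.0, 4.0 };

        /* Create a NetCDF-4 file with one dimension and one variable */
        if (nc_create("example.nc", NC_CLOBBER | NC_NETCDF4, &ncid) != NC_NOERR)
        {
            return 1;
        }

        nc_def_dim(ncid, "x", 4, &dimid);
        nc_def_var(ncid, "temperature", NC_DOUBLE, 1, &dimid, &varid);
        nc_put_att_text(ncid, varid, "units", 7, "Celsius");
        nc_enddef(ncid);

        nc_put_var_double(ncid, varid, values);
        nc_close(ncid);

        return 0;
    }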

Michael Kuhn Parallel I/O 34 / 44 Interaction Libraries

• Data transformation
  • Transport through all layers
  • Loss of information
• Complex interaction
  • Optimizations and workarounds on all layers
  • Information about other layers
  • Analysis is complex
• Convenience vs. performance
  • Structured data in application
  • Byte stream in POSIX
[Figure: the interaction of layers; parallel application, NetCDF, HDF5 and MPI-IO in user space, Lustre in kernel space, storage device/array at the bottom]

Michael Kuhn Parallel I/O 35 / 44 Outline Future Developments

Parallel I/O Review Introduction and Motivation Storage Devices and Arrays File Systems Parallel Distributed File Systems Libraries Future Developments Summary

Michael Kuhn Parallel I/O 36 / 44 NVRAM ≈ 1,000 ns NVMe ≈ 10,000 ns


Storage Hierarchy Future Developments


• Current state
  • L1, L2, L3 cache, RAM, SSD, HDD, tape
  • Latency gap from RAM to SSD
  • Huge performance loss if data is not in RAM
• Performance gap is worse on supercomputers
  • RAM is node-local, data is in parallel distributed file system
• New technologies to close gap
  • Non-volatile RAM (NVRAM), NVM Express (NVMe) etc.

  Level        Latency
  L1 cache     ≈ 1 ns
  L2 cache     ≈ 5 ns
  L3 cache     ≈ 10 ns
  RAM          ≈ 100 ns
  NVRAM        ≈ 1,000 ns
  NVMe         ≈ 10,000 ns
  SSD          ≈ 100,000 ns
  HDD          ≈ 10,000,000 ns
  Tape         ≈ 50,000,000,000 ns
  [Bonér, 2012] [Huang et al., 2014]

Michael Kuhn Parallel I/O 37 / 44 Storage Hierarchy... [Brent Gorda, 2016] Future Developments

Michael Kuhn Parallel I/O 38 / 44 Storage Hierarchy... [Brent Gorda, 2013] Future Developments

• I/O nodes with burst buffers close to compute nodes • Slower storage network to file system servers

Michael Kuhn Parallel I/O 39 / 44 Burst Buffers [Mike Vildibill, 2015] Future Developments

Michael Kuhn Parallel I/O 40 / 44 DAOS Future Developments

• New holistic approach for I/O • Distributed Application Object Storage (DAOS) • Supports multiple storage models • Arrays and records are base objects • Objects contain arrays and records (key-array) • Containers consist of objects, storage pools consist of containers • Support for versioning • Operations are executed in transactions • Transactions are persisted as epochs • Make use of modern storage technologies

Michael Kuhn Parallel I/O 41 / 44 DAOS... [Brent Gorda, 2013] Future Developments

• I/O is typically performed synchronously • Applications have to wait for the slowest process, variations are normal • File is only consistent after all processes have finished writing

• I/O should be completely asynchronous • Eliminates waiting times, makes better use of resources • Difficult to define consistency, transactions and snapshots can be used

Michael Kuhn Parallel I/O 42 / 44 Outline Summary

Parallel I/O Review Introduction and Motivation Storage Devices and Arrays File Systems Parallel Distributed File Systems Libraries Future Developments Summary

Michael Kuhn Parallel I/O 43 / 44 Summary Summary

• Achieving high performance I/O is a complex task • Many layers: storage devices, file systems, libraries etc. • File systems organize data and metadata • Modern file systems provide additional functionality • Parallel distributed file systems allow efficient access • Data is distributed across multiple servers • I/O libraries facilitate ease of use • Exchangeability of data is an important factor • New technologies will make the I/O stack more complex • Future systems will offer novel I/O approaches

Michael Kuhn Parallel I/O 44 / 44 References

[Bonér, 2012] Bonér, J. (2012). Latency Numbers Every Programmer Should Know. https://gist.github.com/jboner/2841832.
[Brent Gorda, 2013] Brent Gorda (2013). HPC Technologies for Big Data. http://www.hpcadvisorycouncil.com/events/2013/Switzerland-Workshop/Presentations/Day_2/3_Intel.pdf.
[Brent Gorda, 2016] Brent Gorda (2016). HPC Storage Futures – A 5-Year Outlook. http://lustre.ornl.gov/ecosystem-2016/documents/keynotes/Gorda-Intel-keynote.pdf.
[Cburnett, 2006a] Cburnett (2006a). RAID 1 with two disks. https://en.wikipedia.org/wiki/File:RAID_1.svg.
References...

[Cburnett, 2006b] Cburnett (2006b). RAID 5 with four disks. https://en.wikipedia.org/wiki/File:RAID_5.svg.
[Huang et al., 2014] Huang, J., Schwan, K., and Qureshi, M. K. (2014). NVRAM-aware Logging in Transaction Systems. Proc. VLDB Endow., 8(4):389–400.
[Mike Vildibill, 2015] Mike Vildibill (2015). Advanced IO Architectures. http://storageconference.us/2015/Presentations/Vildibill.pdf.
[Seagate, 2016] Seagate (2016). Desktop HDD. http://www.seagate.com/www-content/datasheets/pdfs/desktop-hdd-8tbDS1770-9-1603DE-de_DE.pdf.
References...

[Werner Fischer and Georg Schönberger, 2017] Werner Fischer and Georg Schönberger (2017). Linux Storage Stack Diagramm. https://www.thomas-krenn.com/de/wiki/Linux_Storage_Stack_Diagramm.
[Wikipedia, 2015] Wikipedia (2015). Hard disk drive. http://en.wikipedia.org/wiki/Hard_disk_drive.