Implementation Guide

Service Provider Data Center For Disclosure under NDA Only

Scalable Cloud-based Services for Flexible, Robust, Cost-Effective Storage

Discover how to improve storage utilization, reduce cost and improve scalability with a Ceph*-based storage-as-a-service (STaaS) solution optimized for Intel® technology

Introduction

Storage-as-a-service (STaaS) uses software-defined storage (SDS) to abstract storage software from the storage hardware. By providing a shared pool of storage capacity that can be used across service offerings, SDS eliminates storage silos and helps improve utilization ratios. Intelligent, automated orchestration reduces operating costs and can speed provisioning from several weeks to a few minutes.

Ceph* is an open source STaaS solution that supports object, block and file storage. Using an open source solution can help you lower costs and avoid vendor lock-in. It also enables you to deploy new technology quickly. Ceph is well supported by the open source community, and commercial distributions are also available. Intel, the community and independent software vendors (ISVs) have worked closely to develop reference architectures and best practices for deploying a Ceph-based STaaS platform that is optimized to run on Intel® Xeon® processors and take advantage of other Intel® technologies, such as the Intel® SSD Data Center Family for NVMe* (Non-Volatile Memory Express*) solid state drives (SSDs), Intel® Ethernet products, and software optimizations such as the Intel® Intelligent Storage Acceleration Library (Intel® ISA-L) and Intel® Cache Acceleration Software (Intel® CAS).

This implementation guide provides key learnings and configuration insights to integrate these technologies with optimal business value. If you are responsible for technology decisions, you will learn how to implement a STaaS solution using Ceph, along with tips for optimizing performance with Intel technologies and best practices for deploying Ceph.



Table of Contents
• Introduction
• Solution Overview
  - Ceph Software
  - Node Types
  - Intel® Technologies
• System Requirements
  - Software Requirements
  - Minimum Hardware Requirements
• Installation and Configuration
  - Get Ceph
  - Install Ceph
  - Deploy Storage Clusters
  - Deploy Ceph Clients
  - Configuration Considerations
• Ceph Operation and Utilization
  - Ceph Topologies
  - Using Intel CAS
  - Using Intel ISA-L
  - Integrating with OpenStack
  - Orchestration
• Ceph-Based STaaS Use Cases
• Validation
• Ceph Best Practices
  - Planning
  - Nodes
  - Journaling
  - Network
• Summary
• References
• Appendix A: Ceph Tuning Details
• Solutions Proven by Your Peers

Solution Overview

A Ceph-based STaaS deployment consists of the Ceph software, several types of nodes (servers), and Intel software optimization products.

Ceph Software

Ceph's foundation is the Reliable Autonomic Distributed Object Store* (RADOS*), which provides your applications with object, block and file storage in a single unified storage cluster—making Ceph flexible, highly reliable and easy for you to manage. Each of your applications can use the object, block or file system interfaces to the same RADOS cluster simultaneously, which means your Ceph storage system serves as a flexible foundation for all of your data storage needs. You can use Ceph for free because it is open source, and deploy it on economical industry-standard hardware. Or you can opt for a commercially supported Ceph distribution if you prefer.

The various storage access modes use different components of Ceph:
• Object storage uses the Ceph Object Gateway daemon, radosgw* (RGW*).
• File storage (CephFS*) can use a Ceph filesystem kernel driver or the user-space FUSE* client.
• Block storage uses RADOS Block Devices (RBDs*).
• All storage is ultimately stored by Ceph Object Storage Daemons (Ceph OSDs).
• All data is stored as "objects," which are randomly distributed across the cluster by the CRUSH* (Controlled Replication Under Scalable Hashing*) algorithm.

To efficiently compute information about object placement and location, Ceph uses the CRUSH algorithm instead of a central lookup table. CRUSH enables Ceph performance to scale linearly by ensuring that data is always retrieved directly from the primary OSD where it is stored—avoiding bottlenecks created by centralized metadata lookups.

Node Types

As you build your Ceph cluster using the guidelines in this document, you will be working with several types of "nodes" (sometimes referred to as "hosts"). A node is simply any single machine or server in a Ceph system.
• Storage nodes (sometimes called "OSD nodes" or simply "OSDs") are where the actual data is stored.
• Monitor nodes track the health and configuration of the Ceph cluster by maintaining copies of the cluster maps.
• RGW nodes serve as HTTP proxies for object storage workloads.
• Metadata nodes map the directories and filenames from CephFS to objects stored within RADOS clusters.
• Client nodes request data.

See Figure 1 for an overview of how all these fit together into a Ceph cluster.
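The contrast between computed placement and a central lookup table can be sketched in a few lines of Python. This is an illustrative stand-in for CRUSH, not the real algorithm (CRUSH uses hierarchical buckets, weights and failure-domain rules), but it shows the key property: any client can independently compute an object's placement group (PG) and its ordered set of OSDs, with no central directory.

```python
import hashlib

def pg_for_object(name: str, pg_count: int) -> int:
    """Map an object name to a placement group by hashing (stable, no lookup table)."""
    digest = hashlib.md5(name.encode()).hexdigest()
    return int(digest, 16) % pg_count

def osds_for_pg(pg: int, osds: list, replicas: int) -> list:
    """Rank all OSDs by a per-(pg, osd) hash and take the top `replicas`.

    Mimics CRUSH's deterministic pseudo-random selection: every client that
    runs this computation gets the same primary and replica OSDs for a PG.
    """
    scored = sorted(
        osds,
        key=lambda osd: hashlib.md5(f"{pg}:{osd}".encode()).hexdigest(),
    )
    return scored[:replicas]

# Any client computes the same placement independently:
pg = pg_for_object("my-object", pg_count=128)
primary, *secondaries = osds_for_pg(pg, osds=list(range(12)), replicas=3)
```

Because placement is a pure function of the object name and the cluster map, adding a client never adds load to a metadata service, which is why this style of placement scales linearly.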

Intel® Technologies
Several Intel technologies, both hardware and software, contribute to the reliability and performance of a Ceph-based STaaS solution. See the "References" section for links.
• Intel® Xeon® processors and the Intel® Xeon® Scalable processor family provide the compute power needed to process vast amounts of data.
• Intel® SATA-based SSDs, Intel® NVMe*-based SSDs and Intel® Optane™ SSDs provide performance, stability, efficiency and low power consumption.


[Figure 1 shows the software elements of a Ceph cluster: clients (App, Host/VM) access storage through RadosGW*, RBD* or CephFS*, all built on librados and RADOS; monitors and metadata servers manage the cluster, and data lands on cluster nodes running OSDs, with objects mapped to placement groups (PGs) within pools.]

Figure 1 . Software Elements of a Ceph Cluster

• Intel® Cache Acceleration Software (Intel® CAS) improves CephFS performance—by as much as 550 percent for reads and 400 percent for writes, compared to running CephFS without Intel CAS.¹
• Intel® Intelligent Storage Acceleration Library (Intel® ISA-L) enables Ceph to more easily use features of Intel Xeon processors for tasks such as data protection, data integrity and data security. Intel ISA-L enables Ceph to offer erasure coding without negatively affecting performance.
• Intel® Ethernet Network Adapters, Controllers and Accessories give the data center the agility to deliver services efficiently and cost-effectively. Worldwide availability, exhaustive testing for compatibility, and 35 years of innovation make Intel Ethernet products an excellent choice when building your Ceph clusters.

System Requirements

Software Requirements
• Ceph software . The current version of Ceph is v12.2.4 Luminous RC. See Ceph Releases.
• Compatible OS . Intel recommends you deploy Ceph on one of the three major Linux* distributions: SUSE Enterprise Storage 4*, Ubuntu*, or CentOS*/Red Hat Enterprise Linux* (RHEL*). And you should deploy Ceph on the latest releases. The choice of a specific distribution is often determined by your existing install base or a preference for interfaces like sysvinit, upstart, or systemd. For full details, visit Ceph Dependencies.
• OpenStack* . Although optional, OpenStack integrates well with Ceph and extends its capabilities. See "Integrating with OpenStack" later in this document.

Minimum Hardware Requirements²
Ceph was designed to run on industry-standard hardware, which makes building and maintaining PB-scale data clusters economically feasible. When planning your cluster hardware, you will need to balance a number of considerations, including failure domains and potential performance issues. Hardware planning should include distributing Ceph daemons and other processes that use Ceph across many hosts. Generally, we recommend running Ceph daemons of a specific type on a host configured for that type of daemon. We recommend using other hosts for processes that use your data cluster (such as OpenStack or CloudStack*).


The hardware requirements of Ceph depend on the I/O workload. Use the following hardware recommendations as a starting point only. The recommendations in this section are on a per-process basis; if several processes are co-located on the same server, each process's CPU, RAM, disk and network requirements should be totaled.

CPU Requirements: Ceph metadata servers (MDS) dynamically redistribute their load, which is CPU-intensive. Therefore, your MDS should have significant processing power (for example, quad-core processors or better). Ceph OSDs run the RADOS service, calculate data placement with CRUSH, replicate data and maintain their own copy of the cluster map. Therefore, storage nodes should have a reasonable amount of processing power (for example, dual-core processors). Ceph monitors simply maintain a master copy of the cluster map, so they are not CPU-intensive. You must also consider whether the host machine will run CPU-intensive processes in addition to Ceph daemons. For example, if your hosts will run computing VMs (such as OpenStack Nova*), you will need to ensure that these other processes leave sufficient processing power for Ceph daemons. We recommend running additional CPU-intensive processes on separate hosts.

Memory: MDS and Ceph monitors must be capable of serving their data quickly, so they should have plenty of RAM (for example, 1 GB of RAM per daemon instance). OSDs do not require as much RAM for regular operations (500 MB of RAM per daemon instance); however, during recovery they need significantly more RAM (as much as 1 GB per 1 TB of storage per daemon). Generally, more RAM is better.

Data Storage: Plan your data storage configuration carefully. There are significant cost and performance tradeoffs to consider. Simultaneous OS operations and simultaneous requests for read and write operations from multiple daemons against a single drive can slow performance considerably. Storage nodes should have plenty of disk drive space for object data. We recommend a minimum drive size of 1 TB. Tables 1 and 2 provide some guidance on server chassis and HDD/SSD choices.

I/O Controllers: Disk controllers also have a significant impact on write throughput. Carefully consider your selection of disk controllers to ensure that they do not create a performance bottleneck.

Networks: We recommend a minimum of two dual-port NICs to account for a public (front-side) network and a private cluster (back-side) network. While a 10 GbE network may suffice for some small implementations, we recommend 25 GbE.

Other Hardware Considerations: Storage nodes should be bare metal, not virtualized, for disk performance reasons. Also, you should have a dedicated drive for the operating system (OS).
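Because co-located daemons' requirements must be totaled, a small helper can sanity-check a host design. The per-daemon RAM figures below are the guideline numbers from this section; the helper itself is only an illustrative sketch.

```python
# Guideline per-daemon RAM requirements from this section, in GB.
DAEMON_RAM_GB = {"mds": 1.0, "mon": 1.0, "osd": 0.5}

def host_ram_needed(daemons, osd_tb_per_daemon=0.0, recovery=False):
    """Total RAM (GB) for a host running the given list of daemon types.

    During recovery, each OSD may need up to 1 GB per 1 TB of storage it
    serves, on top of its regular requirement.
    """
    total = sum(DAEMON_RAM_GB[d] for d in daemons)
    if recovery:
        total += daemons.count("osd") * osd_tb_per_daemon * 1.0
    return total

# A host with 12 OSDs, each serving a 4 TB drive:
normal = host_ram_needed(["osd"] * 12)                      # 6.0 GB
worst = host_ram_needed(["osd"] * 12, 4.0, recovery=True)   # 54.0 GB
```

The gap between the normal and recovery figures is why "generally, more RAM is better": a node sized only for steady state can thrash exactly when the cluster is rebuilding.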

Table 1 . Server Chassis Guidance

Optimization Criterion | Small (250 TB-1 PB) | Medium (1-2 PB) | Large (2+ PB)
IOPS | 2 to 4 PCIe*/NVMe* slots or 8-24x 2.5" drive bays | Not typical | Not typical
Throughput | 24 to 36 3.5" bays | 12 to 16 3.5" bays | 24 to 36 3.5" bays
Capacity | — | — | 60 to 90 3.5" bays


Table 2 . Chassis Options for HDDs and SSDs

Priority | Typical Use Case | Configuration Details
IOPS | High performance | All-flash: NVMe-based Intel® SSDs with Intel® Optane™ technology journal/cache
Throughput | Performance | All-flash: U.2, SATA-based Intel® SSDs or 10K SAS HDDs with PCIe* journal/cache
Capacity | Capacity/Value | 7.2K SATA HDDs with NVMe-based Intel SSD journal/cache

Installation and Configuration
The steps to deploy your Ceph-based STaaS solution are fairly simple:
1. Get Ceph
2. Install Ceph
3. Deploy storage clusters
4. Deploy clients
Exactly how you accomplish these steps depends on the distribution of Ceph you are using (open source, Red Hat or SUSE). This implementation guide does not attempt to replicate the installation instructions for each distribution—instead, we provide several links you can use to access the materials each community has already published. If you run into problems, there is a robust user community that can provide suggestions; for example, Ceph's community channels are listed at https://ceph.com/irc/.

Get Ceph
Open Source: While you can pull the latest Ceph source code from the GitHub* repository, we believe that for the majority of users it is better to obtain a stable community production release. You can see a list of all active releases at Ceph Releases.
Red Hat Ceph Storage: You can get started downloading Red Hat's Ceph distribution here.
SUSE Enterprise Storage: SUSE's Ceph distribution is available here.

Install Ceph
Once you have the Ceph software, installing it is easy. Install packages on each Ceph node in your cluster. You may use ceph-deploy* to install Ceph for your storage cluster, or use distribution-specific package management tools (such as yum*, apt-get* and so on).
• To get complete install instructions for an open source install, see Ceph's Quick Install Guide and ceph-deploy instructions, and/or join the Ceph community.
• For Red Hat Ceph Storage, see Red Hat's Install Guide. You can also refer to the following Red Hat resources:
  - Red Hat's library of reference architectures
  - Red Hat customer portal and product documentation (general) and v1.3-specific documentation
• For SUSE Enterprise Storage, see SUSE's Ceph Install Guide. SUSE also provides an extensive library of Ceph resources.
Note: The ceph-helm* project enables you to deploy Ceph in a Kubernetes* environment. See Installation (Kubernetes + Helm) for more information.


Deploy Storage Clusters
If you are using the open source distribution, once you have completed Ceph installation, you can begin deploying a Ceph storage cluster using ceph-deploy. For details, see Storage Cluster Quick Start. You can use the Ceph librados* APIs or OpenStack APIs.

Deploy Ceph Clients
Complete client install instructions are available for Ceph Object Storage, Ceph Block Devices and CephFS. If you are using the open source distribution, see the following links for details:
• Object Storage
• Block Storage
• File Storage

Configuration Considerations
When configuring your Ceph nodes and clusters, it is important to consider your workload needs. Are they most sensitive to IOPS, throughput, latency or capacity? Are they mostly small-object or large-object? How many objects are there? Or are you using block storage instead? Tables 3 and 4 provide some scenarios and recommendations for RGW and RBD workloads, respectively. Table 5 provides some guidance on setting memory, placement group (PG) and journal values. Appendix A provides detailed tuning information. For additional guidance on architecting your Ceph cluster, see "Red Hat Ceph Storage on servers with Intel processors and SSDs."

Table 3 . Optimization Recommendations for RGW (Object) Workloads

Scenario Recommendations

Optimizing for Small Object Workloads (IOPS)
• Higher read ops can be achieved by adding more RGW hosts, until cluster disk saturation.
• Bucket index on flash media helps reduce read and write latencies and increases ops.
• Higher write ops can be achieved by adding more storage nodes, until more RGW hosts are needed.
• Use Intel® Cache Acceleration Software (Intel® CAS) to cache data for increased throughput and reduced latency.

Optimizing for Large Object Workloads (Throughput in MBps)
• Higher read throughput can be achieved by adding more RGW hosts, until cluster disk saturation.
• Higher write throughput can be achieved by adding more storage nodes, together with multiple RGWs.
• Increasing rgw_max_chunk_size to 4M helps reduce write I/O requests to disks.
• Use Intel CAS to cache data for even higher throughput and lower latency.

Optimizing for High Object Density (>100 Million Objects)
• Use Intel CAS to cache filesystem metadata (and, optionally, data) to significantly reduce seek latency and increase throughput.

Optimizing the Number of RGW Nodes (see https://www.redhat.com/cms/managed-files/st-ceph-storage-qct-object-storage-reference-architecture-f7901-201706-v2-en.pdf for more details)
• For a 100 percent large object (32 MB) write workload: 1 x RGW host with 10 GbE NIC for every 100 OSDs (HDDs with journals on flash) in erasure code (EC) pool configuration.
• For a 100 percent small object (64 KB) write workload: 1 x RGW host with 10 GbE NIC for every 50 OSDs (HDDs with journals on flash).
• EC pool and bucket index on flash.
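Several of the recommendations above correspond to ceph.conf settings. The fragment below is a sketch only: the values are the starting points discussed in this guide rather than defaults, and section names such as [client.rgw] should be adapted to your gateway instance names and verified against your Ceph release's documentation.

```ini
[global]
# Journal sizing per Table 5: 2 * expected bandwidth * filestore max sync interval
osd journal size = 10240

[osd]
filestore max sync interval = 10

[client.rgw]
# Larger chunks reduce write I/O requests to disk for large-object workloads
rgw max chunk size = 4194304
```

After editing, push the configuration to each node and restart the affected daemons; a misapplied setting here affects every OSD or gateway that reads it.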


Table 4 . All-Flash Recommendations for RBD (Block) Workloads

Scenario Recommendations

Optimizing for IOPS
• Ceph RBD (block) pools
• OSDs on SSDs: 4x Intel® SSD DC P4500 Series per server; dual-socket Intel® Xeon® Gold 6152 processor
• Two OSDs per NVMe-based Intel SSD; up to six NVMe-based Intel SSDs or 12 OSDs per server
• Data protection: replication (2x on SSD-based OSDs) with regular backups to the object storage pool
• All-NVMe cluster: Intel SSD DC P4800X Series as the journal/metadata/caching drive for highest IOPS and lowest tail latency

Optimizing for Throughput
• Ceph RBD (block) or Ceph RGW (object) pools
• OSDs on HDDs:
  - Good: Write journals on Intel SSD DC S4600 Series 480 GB drives, with a ratio of 4-5 HDDs to each SSD
  - Better: Write journals on Intel SSD DC P4600 Series 1 TB NVMe-based drives, with a ratio of 12-18 HDDs to each SSD
  - Best: Write journals and Intel Cache Acceleration Software (Intel CAS) on an Intel SSD DC P4600 Series 2 TB
• One CPU core-GHz per OSD. For example:
  - 12 HDD-based OSDs per server: dual-socket Intel Xeon Silver 4110 processor (8 cores per socket @ 2.1 GHz)
  - 36 HDD-based OSDs per server: dual-socket Intel Xeon Gold 6138 processor (20 cores per socket @ 2.0 GHz)
  - 60 HDD-based OSDs per server: dual-socket Intel Xeon Gold 6130 processor (16 cores * 2.1 GHz * 2 sockets)
• Data protection: replication (read-intensive or mixed read/write) or erasure-coded (write-intensive; more CPU cores recommended)
• High-bandwidth networking: greater than 10 GbE for servers with more than 12-16 drives

Optimizing for Capacity
• Ceph RGW (object) pools
• OSDs on HDDs, with write journals co-located on HDDs in a separate partition
• One CPU core-GHz per OSD. See the throughput-optimized section above for examples; Intel Xeon E3 processor-based servers would suffice.
• Data protection: erasure coding


Table 5 . Ceph Sizing Formulas

Setting | Formula
Memory | 16 GB minimum: (2 GB * #OSD) + 10 GB (that is, 1 GB per 1 TB of storage)
PG Sum | (#OSD * 100) / max_rep_count
PG/Pool | ((#OSD * 100) / max_rep_count) / NB_POOLS
journal_size | 2 * (expected BW-throughput * filestore_max_sync_interval)
num_journal | SSD_seq_write_speed / HDD_seq_write_speed

PG = placement group. PG calculator: https://ceph.com/pgcalc/
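The formulas in Table 5 translate directly into a small calculator. This is a transcription of the table's formulas (with the memory rule generalized to 1 GB per TB of storage), not official Ceph tooling:

```python
def memory_gb(num_osds: int, storage_tb: float) -> float:
    """16 GB minimum; otherwise 2 GB per OSD plus 1 GB per TB of storage."""
    return max(16, 2 * num_osds + storage_tb)

def pg_sum(num_osds: int, max_rep_count: int) -> int:
    """Total placement groups for the cluster."""
    return num_osds * 100 // max_rep_count

def pg_per_pool(num_osds: int, max_rep_count: int, num_pools: int) -> int:
    """Placement groups per pool."""
    return pg_sum(num_osds, max_rep_count) // num_pools

def journal_size_mb(expected_bw_mbps: float, sync_interval_s: float) -> float:
    """Journal size: 2 * expected throughput * filestore max sync interval."""
    return 2 * expected_bw_mbps * sync_interval_s

def num_journals_per_ssd(ssd_seq_write_mbps: int, hdd_seq_write_mbps: int) -> int:
    """How many HDD journals one SSD can absorb, by sequential-write ratio."""
    return ssd_seq_write_mbps // hdd_seq_write_mbps

# Example: 60 OSDs, 3x replication, 4 pools
total_pgs = pg_sum(60, 3)            # 2000
pgs_each = pg_per_pool(60, 3, 4)     # 500
```

For production clusters, cross-check the PG numbers against the PG calculator linked above, which also rounds to powers of two.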

Ceph Operation and Utilization
In this section, we discuss the different topologies for block, object and file storage, as well as provide details about adjacent technologies such as Intel CAS, Intel ISA-L, OpenStack and orchestration frameworks.

Ceph Topologies
Whether you are using Ceph's object, block or file storage capabilities, the actual Ceph cluster is based on the same standard spine/leaf Ethernet infrastructure shown in Figure 2. For high availability, the physical topology uses redundant connections to dual top-of-rack (TOR) switches.

[Figure 2 shows the network architecture: a spine/leaf LAN fabric with dual top-of-rack (TOR) switches per rack, monitor/proxy nodes, an admin/control node, racks of storage nodes connected via bonded interfaces, and the MDS on a dedicated server.]

Figure 2 . Standard physical Clos architecture with redundant switches


Figures 3 and 4 illustrate the logical topologies for CephFS/RBD (file and block) and RGW (object) storage. Note that RBD clients can be either bare metal or virtualized. For the RBD topology, the clients connect directly to the Ceph Monitor nodes and the storage cluster; for the RGW topology, load balancers serve as proxies for the S3 clients. As you can see in these diagrams, the cluster never changes—only the way in which you access the cluster.

[Figure 3 shows the CephFS and RBD cluster topology: CephFS client nodes (kernel driver, FUSE or library/librados) and RBD client nodes (QEMU*/librbd) connect over the client/monitor network to the storage nodes, each running multiple OSDs; a separate replication network carries inter-OSD traffic, with journals and cache on dedicated devices.]

Figure 3 . CephFS and block storage (RBD) topology

[Figure 4 shows the object storage cluster topology: S3 client nodes connect through RGW load balancers, which proxy requests over the client/monitor network to the storage nodes running OSDs; as in Figure 3, a separate replication network carries inter-OSD traffic, with journals and cache on dedicated devices.]

Figure 4 . Object storage (RGW) topology


Using Intel® CAS
Intel CAS (Figure 5) increases storage performance via intelligent caching and is designed to work with high-performance Intel® SSDs. Unlike inefficient conventional caching techniques that rely solely on data temperature, Intel CAS employs a classification-based solution that intelligently prioritizes I/O. This unique capability allows you to further optimize your storage performance based on I/O types (for example, data versus metadata), size and additional parameters. The advantage of this approach is that logical block-level storage volumes can be configured with multiple performance requirements in mind. For example, the filesystem journal could receive a different class of service than regular file data, allowing workloads to be better tuned for specific applications. Efficiencies can be increased incrementally as higher-performing NVMe-based Intel SSDs are used for data caching.

To use Intel CAS, first install it, then use the configuration utility or manual configuration to tune it to your needs. See the "Intel® Cache Acceleration Software for Linux* v3.5.1" administration guide for details; in particular, Chapter 14 discusses installing and using Intel CAS in a Ceph cluster. You can download a 120-day free trial of the software and the Admin Guide.

[Figure 5 shows Intel CAS in a Ceph cluster: clients reach the Ceph Object Gateway and RADOS monitors over the public network; on each storage node, Intel CAS provides journal and cache acceleration in front of the OSDs, with replication traffic on the cluster network.]

Figure 5 . Intel® Cache Acceleration Software (Intel® CAS) improves caching performance on your Ceph cluster

Using Intel® ISA-L
Intel ISA-L provides tools to maximize storage throughput, security and resilience, as well as to minimize disk space usage. Intel ISA-L is highly optimized for Intel® architecture and is natively designed for SDS applications. It can run on multiple generations of Intel® processors. To use Intel ISA-L, first create an ISA profile (see Ceph's ISA erasure code plugin page for details). Then use that profile when you create an ISA erasure-coded pool, as described at Ceph's Pools page. Another resource you may find helpful is the Ceph Erasure Coding Introduction from Intel.
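Erasure coding, which Intel ISA-L accelerates, stores k data chunks plus m coding chunks so the data survives the loss of up to m chunks while consuming far less space than full replication. The smallest case (k=2, m=1) reduces to XOR parity; the toy Python sketch below illustrates only the recovery idea, not ISA-L's SIMD-optimized Reed-Solomon implementation.

```python
def encode(a: bytes, b: bytes) -> bytes:
    """Parity chunk: byte-wise XOR of the two equal-length data chunks (k=2, m=1)."""
    return bytes(x ^ y for x, y in zip(a, b))

def recover(survivor: bytes, parity: bytes) -> bytes:
    """Rebuild the lost data chunk from the surviving chunk and the parity.

    Works because XOR is its own inverse: a ^ b ^ b == a.
    """
    return bytes(x ^ y for x, y in zip(survivor, parity))

a, b = b"hello wo", b"rld!!!!!"
p = encode(a, b)
assert recover(b, p) == a   # lose chunk a, rebuild it from b and parity
assert recover(a, p) == b   # lose chunk b, rebuild it from a and parity
```

A 2+1 scheme stores 1.5x the data instead of 3x for triple replication, which is the storage-efficiency argument for erasure-coded pools; the cost is extra CPU on every write and recovery, which is exactly what ISA-L offloads.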

Integrating with OpenStack
While Ceph does not require OpenStack (or vice versa), the two software-defined solutions complement each other because they both support all three storage types: object, block and file. And because they are both open source projects, they often benefit from integration and cross-project development. Figure 6 shows the integration points between OpenStack and Ceph. You can read more about how to integrate the two systems in this OpenStack Superuser article.


[Figure 6 shows the OpenStack integration points with Ceph: OpenStack storage services consume Ceph through its standard interfaces, with block services backed by RBD and file services backed by CephFS.]

Figure 6 . OpenStack and Ceph complement each other's object, block and file storage capabilities

Orchestration
Ceph supports most major orchestration and deployment frameworks:
• ceph-ansible*
• ceph-salt*
• ceph-chef*
• Ceph Juju*
• ceph-puppet*
• crowbar-ceph*
If you're already using a provisioning tool, such as Chef* or Puppet*, you can continue to use it with Ceph. If you don't already have such a tool, we recommend Ansible.

Ceph-Based STaaS Use Cases
As mentioned previously, Ceph supports three storage access methods:
• Block . The application or virtual machine (VM) accesses storage in blocks of a fixed size.
• Object . Typical of most new applications—data puts and gets are of any size.
• File . Individual data records of fixed size are read and written to a named file.
Whatever the storage access method, workloads can be categorized in a couple of ways. The first is by tiering: cold storage is not at all latency-sensitive, while hot storage requires very low latency. Table 6 shows typical tiering use cases. You can find out more on Ceph's Use Cases and Reference Architectures page.


Table 6 . Mapping Storage Tiers to Use Cases

Storage Tier | Example Use Cases
Cold | Backup; disaster recovery; archival and logs
Warm | Video surveillance; primary storage; media sync and video sharing
Hot | Collaboration and file sharing; databases; content delivery networks (CDNs) and Web hosting

Another way to categorize workloads is by optimization criteria—IOPS, throughput or capacity. Depending on the use case and optimization criteria, we recommend different cluster storage configurations, as described in Table 7.

Table 7 . Cluster Configuration Recommendations per Optimization Criteria

Optimization Recommended Example Use Properties Criteria Configuration Cases

• Lowest cost per IOPS • Highest IOPS • Typically block storage MySQL* on IOPS • Meets minimum fault domain • 3x replication on HDDs or OpenStack recommendation (single 2x replication on Intel® SSD clouds server is less than or equal to DC Series 10 percent of the cluster)

• Lowest cost per given unit of throughput • Highest throughput • Block or object storage • Highest throughput per BTU • 3x replication Streaming Throughput • Highest throughput per watt media • Active performance storage • Meets minimum fault domain for video, audio and images recommendation (single server is less than or equal to 10 percent of the cluster)

• Lowest cost per TB

• Lowest BTU per TB • Typically object storage Video, audio • Lowest watt per TB • Erasure coding common for and image Capacity • Meets minimum fault domain maximizing usable capacity object archive repositories recommendation (single • Object archive server is less than or equal to 15 percent of the cluster)


Validation
To minimize the risk of introducing SDS, it's a good idea to run a pilot project using a few servers so you can evaluate the architecture and see how easily you can operate and support it. The pilot project also presents an opportunity to gather data on operating costs that you can use for modeling your business case, and performance data you can use for infrastructure planning, including consolidation ratios from deduplication. During the pilot, the team will have an opportunity to acquire skills in managing SDS and will be able to assess the likely learning curve. You can start by introducing SDS for particular compute environments and gradually expand its reach in your infrastructure by scaling it up and shifting other workloads into the new infrastructure over time.

Ceph Best Practices
As Intel has worked with cloud service providers (CSPs) on their Ceph-based STaaS implementations, we have gathered a number of best practices. (See also "Minimum Hardware Requirements" earlier.) Here are our top three tips, followed by more detail on specific topics.
• Pick an architecture that someone else has implemented and benchmarked with a workload similar to yours, and/or conduct a small proof of concept with your configuration to prove your architecture.
• Either have an ISV install your Ceph cluster, or, if you do it yourself, read "Red Hat Ceph Storage on servers with Intel processors and SSDs." Be ready to seek ISV or open source community help.
• Gear up your production slowly. Most of the Ceph learning curve is in knowing how to respond to often-cryptic error messages.

Planning
As a rule of thumb, a knowledgeable person can implement a good-sized Ceph cluster in less than a week. But it may take several weeks to decide on the best configuration for your situation and needs. Plan on spending some time studying reference architectures and talking to ISVs or people in the open source community.

When planning your hardware needs, balance cost reduction against resiliency. If you place too many responsibilities into too few failure domains, you increase the risk of downtime. On the other hand, isolating every potential failure domain can drive up costs. (A failure domain is anything whose failure prevents access to one or more OSDs. Examples include a stopped daemon on a host, an HDD failure, an OS crash, a malfunctioning NIC, a failed power supply, a network outage or a power outage.)

Plan ahead for scaling—poorly managed cluster growth can result in lower client I/O performance. Follow published reference architectures, such as "Build an Open Storage-as-a-Service Platform with Ceph*."

After you have implemented Ceph, it may take up to six months for a small team to learn all the intricacies of Ceph and how to respond when performance does not meet expectations. But you don't have to do it alone—you can take advantage of a substantial amount of information available from the Ceph community, as well as recorded sessions from the OpenStack Summit and the Red Hat Ceph Days.

Other tips include:
• Bolster your staff's Linux skills and streamline operational practices.
• Be prepared to test and tune; the default values are rarely optimal for any given configuration.
• Avoid the temptation to pull the most recent Ceph code from the open source repository. It is far better to use a stable community production release or a commercial distribution from Red Hat or SUSE.

Nodes
For monitor nodes, log rotation is a good practice that can help prevent available disk space from being blindly consumed. This is especially relevant if verbose debugging output is set on the monitors, since they will generate a large amount of logging information. In most situations, monitors should be run on distinct nodes or on virtual machines (VMs) that reside on physically separate machines to prevent a single point of failure.


For storage, no node should represent more than 10 percent of the total raw storage; this dictates the size and type of storage node. In general, hosting multiple OSD processes on a single NVMe-based Intel SSD is recommended in order to fully utilize the drive's available I/O bandwidth; four OSD processes per drive is the recommended number. Run the OS, OSD data and OSD journals on separate drives.
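The 10 percent guideline above can be checked mechanically when sizing a cluster. The sketch below is illustrative only; the node names and capacities are hypothetical, not from this guide:

```python
# Sketch of the "no node holds more than 10 percent of total raw storage"
# guideline. Node names and per-node capacities are hypothetical.
def oversized_nodes(capacities_tb, limit=0.10):
    """Return the nodes whose share of total raw capacity exceeds `limit`."""
    total = sum(capacities_tb.values())
    return [node for node, tb in capacities_tb.items() if tb / total > limit]

raw_tb = {f"node{i:02d}": 32 for i in range(1, 12)}  # 11 nodes x 32 TB each
raw_tb["node12"] = 64                                # one denser node
print(oversized_nodes(raw_tb))  # ['node12'] -> 64/416, about 15% of raw capacity
```

Note that the rule implicitly favors scale-out: a cluster of fewer than ten equally sized nodes cannot satisfy it, since each node necessarily holds more than 10 percent.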

Journaling
Write journals should be located on a separate SSD. The recommended journal-to-HDD ratio is four or five HDDs per journal SSD. Use the sizing recommendations from Table 4.
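Translating that ratio into drive counts is a simple ceiling division. The helper below is an illustrative sketch (the function name and drive counts are assumptions, not from this guide):

```python
import math

def journal_ssds_needed(hdd_count, hdds_per_journal=5):
    """SSD journals needed at the guide's four-or-five-HDDs-per-journal ratio."""
    return math.ceil(hdd_count / hdds_per_journal)

print(journal_ssds_needed(20))     # 4 journal SSDs at a 5:1 ratio
print(journal_ssds_needed(20, 4))  # 5 journal SSDs at a 4:1 ratio
```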

Network
As mentioned earlier, using separate dual-port NICs for the public (client) network and the private cluster (replication) network provides optimal results. While a 10 GbE network may suffice for some small implementations, we recommend 25 GbE. Poor network configuration can nullify even the best storage configuration. We also recommend using jumbo frames, especially on the replication network.
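As an illustrative sketch of enabling and verifying jumbo frames on a Linux host (the interface name and peer address below are placeholders, not from this guide):

```shell
# Raise the MTU to 9000 on the cluster (replication) interface.
# "ens785f1" and the peer IP are placeholders for your environment.
ip link set dev ens785f1 mtu 9000

# Verify jumbo frames end to end: 8972 bytes of payload = 9000-byte MTU
# minus 28 bytes of IP + ICMP headers; -M do forbids fragmentation.
ping -M do -s 8972 -c 3 192.168.xxx.2
```

Make the MTU change persistent through your distribution's network configuration, and set it consistently on every host and switch port on the replication network; a single mismatched MTU can cause hard-to-diagnose stalls.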

Summary
You are facing spiraling data volumes, inefficient storage appliances and increasing demand from customers for cost-effective unified storage. Software-defined STaaS solutions can help solve these challenges by abstracting the storage software from the storage hardware. A STaaS solution based on Ceph supports object, block and file storage in a single package. This implementation guide shows how to combine Ceph with Intel technologies such as NVMe-based Intel SSDs, Intel Ethernet Products, Intel Xeon processors and software optimization through Intel ISA-L and Intel CAS. Using these guidelines, you can build a Ceph-based STaaS solution that provides the performance you need to successfully compete in the fast-growing STaaS market.

References

Red Hat:
• Red Hat Ceph Storage home page
• Red Hat's Install Guide
• Product Documentation for Red Hat Ceph Storage
• Red Hat Ceph Documentation Library (specifically for v1.3)
• Red Hat Architecture Guide
• Library of Ceph and * reference architectures
• Red Hat Ceph Storage: Scalable Object Storage on QCT Servers

SUSE:
• SUSE Enterprise Storage 5 home page
• SUSE's Ceph Install Guide
• Resource Library: SUSE Enterprise Storage

Ceph Open Source Community:
• Ceph Quick Install Guide
• Ceph-deploy Installation Guide
• Storage Cluster Quick Start
• Ceph Object Gateway Quick Start
• Ceph Block Device Quick Start
• CephFS Quick Start


• Ceph Community
• Ceph Email Lists
• Ceph Use Cases and Reference Architectures
• Ceph Operating System Recommendations
• Installation (Kubernetes + Helm)

Intel:
• Red Hat Ceph Storage on servers with Intel processors and SSDs configuration guide
• Intel® Cloud Insider Program
• Intel® Storage Builders Program
• Intel® Xeon® Scalable processors
• Intel® Solid State Drives
• Intel® Ethernet products
• Intel® Cache Acceleration Software (Intel® CAS)
• Intel® Intelligent Storage Acceleration Library (Intel® ISA-L)

Appendix A: Ceph Tuning Details
When we benchmarked our high-performance reference architecture for Ceph, we used the tunings shown in the following four code snippets. You can use these same best-practice tuning recommendations to achieve similar results.

Code Snippet 1: Ceph Benchmark Tool (CBT) yaml File (Cluster Section)

cluster:
  user: "root"
  head: "node01"
  clients: ['client01', 'client02', 'client03', 'client04', 'client05', 'client06']
  osds: ["node01", "node02", "node03", "node04", "node05", "node06"]
  mons:
    node01:
      a: "192.168.xxx.xxx:6789"
  mgrs:
    client01:
      a: '192.168.xxx.xxx:6789'
  osds_per_node: 16
  fs: bluefs
  mkfs_opts: '-f -i size=2048'
  mount_opts: '-o inode64,noatime,logbsize=256k'
  conf_file: '/etc/ceph/ceph.conf'
  use_existing: True
  rebuild_every_test: False
  clusterid: "ceph"
  iterations: 1
  tmp_dir: "/tmp/cbt"
  pool_profiles:
    2rep:
      pg_size: 8192
      pgp_size: 8192
      replication: 2


Code Snippet 2: Ceph Benchmark Tool (CBT) yaml File (Benchmarks Section)

benchmarks:
  librbdfio:
    time: 300
    ramp: 100
    vol_size: 307200
    mode: ['randrw']
    rwmixread: [0, 70, 100]
    op_size: [4096]
    procs_per_volume: [1]
    volumes_per_client: [10]
    use_existing_volumes: True
    iodepth: [1, 2, 4, 8, 16, 32, 64, 128]
    osd_ra: [128]
    norandommap: True
    cmd_path: '/usr/local/bin/fio'
    pool_profiles: '2rep'
    log_avg_msec: 250
    numjobs: 1
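With the cluster and benchmarks sections above combined into one yaml file, a CBT run can be launched as sketched below (the file and archive-directory names are placeholders, not from this guide):

```shell
# Fetch the Ceph Benchmark Tool and run the combined yaml file.
# "runbook.yaml" and the results directory are placeholder names.
git clone https://github.com/ceph/cbt.git
cd cbt
./cbt.py --archive=/tmp/cbt-results ../runbook.yaml
```

CBT drives fio against the cluster described in the yaml file and stores per-iteration results under the archive directory for later comparison.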

Code Snippet 3: ceph.conf file (1)

[global]
osd objectstore = bluestore
public_network = 192.168.xxx.0/24
cluster_network = 192.168.xxx.0/24
rbd readahead disable after bytes = 0
rbd readahead max bytes = 4194304
bluestore default buffered read = false
mon_allow_pool_delete = true
auth client required = none
auth cluster required = none
auth service required = none
filestore xattr use omap = true
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0


Code Snippet 4: ceph.conf file (2)

perf = true
mutex_perf_counter = true
throttler_perf_counter = false
rbd cache = false
rbd_cache_writethrough_until_flush = false
rbd_op_threads = 2
osd scrub load threshold = 0.01
osd scrub min interval = 137438953472
osd scrub max interval = 137438953472
osd deep scrub interval = 137438953472
osd max scrubs = 16
log file = /var/log/ceph/$name.log
log to syslog = false
mon compact on trim = false
osd pg bits = 8
osd pgp bits = 8

[mon]
mon_max_pool_pg_num = 166496
mon_osd_max_split_count = 10000
mon_pg_warn_max_per_osd = 10000
mon pg warn max object skew = 100000
mon pg warn min per osd = 0
mon pg warn max per osd = 32768
osd_crush_chooseleaf_type = 0

[osd]
osd_op_num_shards = 8
osd_op_num_threads_per_shard = 2
objecter_inflight_ops = 102400
ms_dispatch_throttle_bytes = 1048576000
objecter_inflight_op_bytes = 1048576000

Solutions Proven by Your Peers
A STaaS platform, powered by Intel® technology, provides high performance and easy manageability. This and other solutions are based on real-world experience gathered from customers who have successfully tested, piloted, and/or deployed the solutions in specific use cases. The solutions architects and technology experts for this solution implementation guide include:
• Daniel Ferber, Solutions Architect, Intel Sales & Marketing Group
• Tushar Gohad, Principal Engineer, Intel Data Center Group
• Orlando Moreno, Performance Engineer, Intel Data Center Group
• Karl Veitmeier, Solutions Architect, Intel Sales & Marketing Group
Intel Solutions Architects are technology experts who work with the world's largest and most successful companies to design business solutions that solve pressing business challenges.

Find the solution that is right for your organization. Contact your Intel representative or visit intel.com/CSP


Solution Provided By:

1 Red Hat, 2017, "Red Hat Ceph Storage: Scalable Object Storage on QCT Servers." https://www.redhat.com/cms/managed-files/st-ceph-storage-qct-object-storage-reference-architecture-f7901-201706-v2-en.pdf
2 http://docs.ceph.com/docs/jewel/start/hardware-recommendations/

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks

Performance results are based on testing as of August 2017 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure.

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer, or learn more at intel.com.

Copyright © 2018 Intel Corporation. All rights reserved. Intel, Xeon, Optane and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others. 0918/JOPD/CAT/PDF 337763-001EN
