E8 – Serving Data Hungry Applications
October 2018
Ziv Serlin – VP Architecture & Co-founder

Presenter: Ziv Serlin, Co-Founder & VP Architecture (IBM XIV, Primary Data, Intel)
At E8, Mr. Serlin architected the E8 product for its first two years. More recently he has focused on extensive engagement with E8 end customers, exploring and helping define their E8 deployment models; in addition, he leads the strategic engagement with E8 hardware partners. An expert in designing complex storage systems, Mr. Serlin has extensive experience as a system architect at Intel and was the HW R&D manager of the IBM/XIV product (a scale-out, high-end enterprise block storage system). He earned a BSc in Computer Engineering at the Technion and holds a storage patent from his work at IBM.

About E8 Storage
• Founded in November 2014 by storage industry veterans from IBM-XIV
• Leading NVMe over Fabrics solution in the market
• In production with customers in the U.S. and Europe, now expanding to Asia
• Awarded 10 granted patents + 4 pending for the E8 architecture
• World-wide team: R&D in Tel Aviv; sales & marketing in the U.S., Europe and Asia
• Flash Memory Summit 2016 & 2017 Most Innovative Product Award

Storage Growth is Data Growth
• 5% – business volume grows 5% a year
• 60% – the size of the database grows 60% a year
• 100% – the performance load on that database grows 100% a year
Use cases: Big Data Analytics, Image Recognition, Real-Time Security, AI/ML, 4K Video Post-Production
E8 Storage accelerates data-hungry applications.

The Problem (Part 1): Why not use local SSDs in servers?
Local SSDs today achieve latency 10x lower than all-flash arrays (~0.1 ms for a local SSD vs. ~1 ms for an AFA).
• "The DevOps problem": things that work on laptops become 10x slower on the production infrastructure
• "The islands of storage problem": local SSDs in servers mean inefficient capacity utilization and no sharing of SSD data
• Local SSDs couple storage and compute
• Server purchasing requires upfront investment in SSDs

The Problem (Part 2): Why not use SSDs in SAN/NAS?
• Not enough performance: traditional all-flash arrays (SAN/NAS) get 10%-20% of the potential performance of NVMe SSDs
• Classic "scale-up" bottleneck: dual-controller design, with all I/O gated by the controller CPUs
• Switching the SSDs from SAS to NVMe cannot alleviate the controller bottleneck
First-generation architectures cannot unlock the full performance of NVMe.

The Solution: NVMe Over Fabrics
By attaching SSDs over the network, rather than using local SSDs in each server, the customer can:
• Reduce the total acquisition cost of SSDs by 60%
  • Defer capacity purchases until needed – pay as you grow, at lower future prices
• Improve capacity utilization of SSDs to 90%
  • Shared volumes instead of local replicas; easily add capacity only as needed
(Chart: 4-year cost of SSDs in a data center under each approach.)
Moving to shared NVMe storage delivers strong TCO for large-scale customer deployments.

E8 Storage Unlocks the Performance of NVMe
(Chart: IOPS @4K read, read/write bandwidth in GB/s and read latency in µs @4K for an AFA with 24 SSDs, a single NVMe SSD, and E8 with 24 NVMe SSDs; E8 delivers up to 10M IOPS and 40 GB/s from 24 NVMe SSDs. See the demo!)
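Figures like the 4K-read IOPS and latency comparison above are typically produced with a synthetic I/O load generator. The sketch below shows one common way to take such a measurement with fio, driven from Python; fio itself, the device path /dev/nvme0n1 and the queue-depth/job-count values are assumptions for illustration, not E8's actual benchmark setup, and the JSON field names can vary slightly between fio versions.

```python
# Minimal sketch: measure 4K random-read IOPS and mean latency on an NVMe block
# device with fio. Assumes fio is installed and that /dev/nvme0n1 is a
# disposable, non-production test device -- both are assumptions, not details
# taken from the E8 presentation.
import json
import subprocess

DEVICE = "/dev/nvme0n1"  # hypothetical test device (reads only, but avoid production disks)


def run_4k_randread(device: str, iodepth: int = 32, numjobs: int = 4, seconds: int = 30) -> dict:
    """Run a 4K random-read fio job and return IOPS and mean completion latency (us)."""
    cmd = [
        "fio",
        "--name=randread-4k",
        f"--filename={device}",
        "--rw=randread",
        "--bs=4k",
        "--direct=1",              # bypass the page cache
        "--ioengine=libaio",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        "--time_based",
        f"--runtime={seconds}",
        "--group_reporting",
        "--output-format=json",
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    report = json.loads(result.stdout)
    read = report["jobs"][0]["read"]          # aggregate stats (group_reporting)
    return {
        "iops": read["iops"],
        "mean_lat_us": read["clat_ns"]["mean"] / 1000.0,  # clat_ns in fio 3.x JSON
    }


if __name__ == "__main__":
    print(run_4k_randread(DEVICE))
```

Array-level figures in the tens of millions of IOPS are normally the aggregate of many such jobs run in parallel from multiple client hosts, not a single fio invocation.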
E8 Storage Product Overview

What is NVMe™? (Non-Volatile Memory Express)
A communication protocol designed specifically for flash storage.
• High performance, low latency
  • Efficient protocol with lower stack overhead
  • Exponentially more queues and commands than SAS: SAS offers 1 I/O queue with 256 commands per queue; NVMe offers 65,535 I/O queues with 64,000 commands per queue
  • Parallel processing for SSDs vs. serial for HDDs
• Support for fabrics (NVMe-oF™)
  • Originally designed for PCIe (internal to servers)
  • Expands support to other transport media
    • RDMA-based: RoCE, iWARP, InfiniBand
    • Non-RDMA: FC, TCP
  • Maintains the NVMe protocol end to end

The E8 Storage Difference
A new architecture built specifically for high-performance NVMe:
• Unleash the parallelism of NVMe SSDs
  • Direct drive access for near line-rate performance
  • Separation of data and control paths; no controller bottleneck
  • The E8 Agent offloads up to 90% of data path operations
• Simple, centralized management
  • Intuitive management GUI for host / volume management
  • E8 Agents auto-discover assigned LUNs
• Scalable in multiple dimensions
  • Up to 126 host agents per E8 Controller
  • Up to 8 Controllers per host
  • 2PB under single management

Designed for Availability and Reliability
No single point of failure anywhere in the architecture:
• Host agents on the host servers operate independently
  • Failure of one agent (or more) does not affect other agents
  • Access to shared storage is not impacted
• RAID-6 / RAID-5 / RAID-10 data protection
• Network multi-pathing
• Enclosure high availability
  • Option 1: HA enclosure + dual-ported SSDs
  • Option 2: Cross-enclosure HA + single-ported SSDs

E8 Storage Customers and Use-Cases

Genomic Acceleration with E8 Storage
Shared NVMe as a fast tier for parallelizing genomic processing:
"We were keen to test E8 by trying to integrate it with our Univa Grid Engine cluster as a consumable resource of ultra-performance scratch space. Following some simple tuning and using a single EDR link we were able to achieve about 5GB/s from one node and 1.5M 4k IOPS from one node. Using the E8 API we were quickly able to write a simple Grid Engine prolog/epilog that allowed for a user-requestable scratch volume to be automatically created and destroyed by a job. The E8 box behaved flawlessly and the integration with InfiniBand was simpler than we could have possibly expected for such a new product."
– Dr. Robert Esnouf, Director of Research Computing, Oxford Big Data Institute + Wellcome Centre for Human Genetics
From 10 hours per genome to 1 hour for 10 genomes!
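The Oxford quote above describes wiring the E8 API into Univa Grid Engine so that a prolog creates a per-job scratch volume and an epilog destroys it. The deck does not document the E8 management API, so the sketch below is purely illustrative: the base URL, the /volumes endpoint, the payload fields and the bearer-token auth are invented placeholders, and only the overall prolog/epilog pattern follows the quote.

```python
# Hypothetical sketch of the pattern described in the Oxford quote: a Grid Engine
# prolog creates a per-job scratch volume and an epilog tears it down.
# E8_API, the /volumes endpoint, the JSON fields and the token auth are all
# invented placeholders -- only the prolog/epilog pattern follows the quote.
import os
import sys

import requests

E8_API = os.environ.get("E8_API", "https://e8-controller.example/api/v1")  # placeholder URL
TOKEN = os.environ["E8_API_TOKEN"]                                         # placeholder credential
HEADERS = {"Authorization": f"Bearer {TOKEN}"}


def prolog(job_id: str, size_gb: int) -> None:
    """Create a scratch volume for this job and expose it to the execution host."""
    resp = requests.post(
        f"{E8_API}/volumes",
        headers=HEADERS,
        json={"name": f"scratch-{job_id}", "size_gb": size_gb},
        timeout=30,
    )
    resp.raise_for_status()
    # The E8 agent on the host would then discover the new LUN; mkfs/mount of the
    # scratch filesystem is left out of this sketch.


def epilog(job_id: str) -> None:
    """Destroy the job's scratch volume once the job has finished."""
    resp = requests.delete(f"{E8_API}/volumes/scratch-{job_id}", headers=HEADERS, timeout=30)
    resp.raise_for_status()


if __name__ == "__main__":
    # Grid Engine runs prolog/epilog with the job's environment; JOB_ID is assumed
    # to be set there. Invoke as "e8_scratch.py prolog" or "e8_scratch.py epilog".
    action = sys.argv[1]
    job_id = os.environ["JOB_ID"]
    if action == "prolog":
        prolog(job_id, size_gb=int(os.environ.get("SCRATCH_GB", "500")))
    else:
        epilog(job_id)
```

In a real deployment the prolog would also wait for the agent on the execution host to surface the new LUN, create a filesystem on it and mount it at the job's scratch path before the job starts.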
E8 for AI/ML with IBM GPFS and Nvidia
Shared NVMe Accelerates Deep Learning
GPU farm: Nvidia DGX-1
• A GPU cluster requires 0.5PB-1PB of shared fast storage
• Up to 8 GPUs per node, but GPU servers have no real estate for local SSDs…
• GPFS client + E8 Agent run on the x86 processors within the GPU server
• Up to 126 GPU nodes in a cluster, connected over Mellanox 100G IB
• E8 Storage provides concurrent access for 1000 (!) GPUs per cluster
• 10x the performance of Pure Storage FlashBlade
• 4x the performance of IBM ESS SSD appliances, for half the cost
Shared NVMe storage: E8-D24 2U24-HA
• Dual-port 2.5" NVMe drives
• Up to 184TB (raw) per 2U
• Patented distributed RAID-6
(Charts: cost in $/GBu for Pure Storage FlashBlade, IBM GPFS + ESS and E8 + GPFS, and images per second per GPU node for ResNet-50 image recognition training at 1, 10 and 100 GPU nodes.)

E8 for AI/ML with IBM GPFS and Nvidia
Shared NVMe Accelerates AI/ML Workloads
• The GPU environment is perfect for E8
  • x86 processors on GPU nodes are free: run the GPFS client + E8 agent
  • GPU clusters are typically connected with 100G IB or RoCE
  • GPU nodes have no real estate for SSDs: an external shared SSD solution is required
• Highly scalable, up to 1000 (!) GPUs in a single cluster
  • 1-8 GPUs per node, up to 126 nodes per cluster
  • Scale the number of NVMe drives and NVMe enclosures easily: 20TB – 2PB
• 10x the performance of Pure Storage FlashBlade
  • AI/ML data sets are incompatible with de-dup and compression
  • Which also ruin performance…
• 4x the performance of IBM ESS SSD appliances, for half the cost
• Open hardware architecture
  • No NSDs: a single hop from GPFS clients to SSDs through E8's patented architecture
  • All network attach, no SAS cabling, no one-to-one mappings
(Charts: cost in $/GBu and epoch time in hours for Pure Storage FlashBlade, IBM GPFS + ESS, Weka.IO (NVMe) and E8 + GPFS.)

E8 Storage Accelerates GPU Data-Hungry Applications
• Better throughput: E8 Storage processed more images per second than local SSD
• Lower latency: training time was faster with E8 Storage than with local SSD

Using E8 with IBM Spectrum Scale
(Diagram: GPFS clients and NSD servers running E8 Agents, plus E8 MDS nodes, connected over IB/RoCE to an E8-D24 enclosure with dual-port SSDs and RAID-6.)
• Scalable to larger configurations
• Connectivity can be mixed depending on requirements:
  • Standalone pool
  • Shared LUNs
  • LROC
  • Non-shared LUNs (direct-connect clients only)
  • HAWC
A minimal NSD provisioning sketch follows at the end of this section.

Fastest Shared Block Storage in the World
• 0.57ms – record-breaking response time!
• 45% lower overall response time (ORT) for the same builds
• 8x lower latency on average
• The power of Intel Optane: ultra-low latency for data-intensive apps
• More performance, less hardware:
  • E8 Storage – 24 NVMe SSDs in 2U
  • WekaIO – 64 NVMe SSDs in 8U
  • Huawei 6800 F – 60 SAS SSDs in 14U
* As of SPEC SFS®2014_swbuild results published August 2018. SPEC SFS 2014 is the industry standard benchmark for file storage performance. See all published results at https://www.spec.org/sfs2014/results/
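As referenced in the Spectrum Scale slide above, one deployment option is to serve E8 LUNs as NSDs in a dedicated fast storage pool. The sketch below generates an NSD stanza file and registers it with mmcrnsd; the device paths, NSD names, server names and pool name are placeholders rather than values from this deck, so treat it as a starting point and verify the stanza options against the IBM Spectrum Scale documentation for your release.

```python
# Minimal sketch: generate a Spectrum Scale NSD stanza file that places E8 LUNs
# into a dedicated data-only storage pool, then register them with mmcrnsd.
# Device paths, NSD/server/pool names are placeholders, not E8-specific values.
import subprocess

E8_LUNS = ["/dev/e8block0", "/dev/e8block1"]       # placeholder device paths on the NSD servers
NSD_SERVERS = "nsd1.example.com,nsd2.example.com"  # placeholder NSD server list
POOL = "e8fast"                                    # placeholder pool name


def write_stanzas(path: str) -> None:
    """Write one %nsd stanza per E8 LUN into a stanza file."""
    with open(path, "w") as f:
        for i, dev in enumerate(E8_LUNS):
            f.write(
                "%nsd:\n"
                f"  device={dev}\n"
                f"  nsd=e8nsd{i}\n"
                f"  servers={NSD_SERVERS}\n"
                "  usage=dataOnly\n"          # data-only fast tier; metadata stays elsewhere
                f"  pool={POOL}\n"
                f"  failureGroup={i + 1}\n\n"
            )


if __name__ == "__main__":
    stanza_file = "e8_nsds.stanza"
    write_stanzas(stanza_file)
    # mmcrnsd registers the NSDs described in the stanza file with the cluster.
    subprocess.run(["mmcrnsd", "-F", stanza_file], check=True)
```

Once the NSDs exist they can be added to a filesystem, and a file placement policy can direct hot data into the fast pool while colder data stays on existing storage.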