SFD15 - WD Tegile

@WesternDigiDC

Narayan Venkat

Phil Bullinger (SVP DC Systems), ex-EMC (Isilon), Oracle, Engenio

Introduction to Data Centre Systems

- 13,500+ active patents worldwide
- $26B market capitalisation
- Storage systems, SSDs, HDDs, embedded and removable flash memory

Roughly half of the world’s data is stored on Western Digital products

SanDisk, WD, HGST, G-Technology, Upthere, Tegile

Strategy and Capabilities

- Ultrastar - Flash and HDD Platform and Integration Services
- ActiveScale - Object Storage, Geo-distributed
- IntelliFlash - Unified Block / File Storage, NVMe, All-Flash and Hybrid

Density, Capacity, Durability, Integrity, Performance, Manageability

*Narayan Venkat

One Flash Platform, Any Workload
- Test/Dev
- Messaging
- Collaboration
- Warehousing
- Analytics
- OLTP

Balanced Performance - Capacity Media (HDD, MLC, TLC)

Extreme Performance - Performance Media (SCM, NVMe, eMLC)

- Hybrid - 10% Flash
- Flash-Hybrid - 30% to 50% Flash
- All-Flash - 100% Flash
- NVMe - 100% NVMe Flash

One OS, Feature Set, User Experience

IntelliFlash Product Portfolio

Hybrid systems are targeted at secondary storage.

IntelliFlash Operating System

Flash Optimized Software Architecture

Management Flexibility
- Integration with vSphere, Hyper-V, OpenStack, KVM
- Automated Call-home, Web UI, RESTful APIs

Protocol Choice
- Multiple protocol access
- FC, iSCSI, NFS, SMB-3, VVOLs

Data Services
- Snapshots, Clones, Inline Compression, Inline Deduplication
- Replication, Disaster Recovery

Metadata Acceleration
- Classification, Separation, and Placement of Metadata and Application data
- Caching, Aggregation, and Scaling for high-speed storage operations

Media Optimization
- Delivers media resiliency, flash wear, and data integrity
- Media-optimized protection, data layout and storage pooling for high performance

Physical Media
- Mixture of hard disks, dense flash, performance flash, persistent memory and dynamic RAM

Shailendra Tripathi

Architecture and Design

Run every workload at the speed of persistent memory

Memory Tier, Performance Tier, Capacity Tier

Categorising Physical Devices by Performance into Different Classes

Fully Distributed Storage Architecture
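The idea of sorting physical devices into memory, performance and capacity tiers can be sketched roughly as below. This is only an illustration: the `Tier` enum, the `MEDIA_TIER` mapping and the `classify` helper are hypothetical names, and the assignment of each media type to a tier is an assumption drawn from the media lists in these notes, not the actual IntelliFlash logic.

```python
from enum import Enum


class Tier(Enum):
    MEMORY = "memory"
    PERFORMANCE = "performance"
    CAPACITY = "capacity"


# Assumed mapping of media type to tier (hypothetical, for illustration only).
MEDIA_TIER = {
    "dram": Tier.MEMORY,
    "persistent-memory": Tier.MEMORY,
    "scm": Tier.PERFORMANCE,
    "nvme": Tier.PERFORMANCE,
    "emlc": Tier.PERFORMANCE,
    "mlc": Tier.CAPACITY,
    "tlc": Tier.CAPACITY,
    "hdd": Tier.CAPACITY,
}


def classify(devices):
    """Group (name, media_type) pairs into per-tier pools of device names."""
    pools = {tier: [] for tier in Tier}
    for name, media in devices:
        pools[MEDIA_TIER[media.lower()]].append(name)
    return pools


if __name__ == "__main__":
    inventory = [("d0", "DRAM"), ("d1", "NVMe"), ("d2", "HDD"), ("d3", "TLC")]
    for tier, names in classify(inventory).items():
        print(tier.value, names)
```

A real system would classify by measured latency and endurance rather than by a static label, so treat the lookup table as a stand-in for that decision.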

Read and Write I/O Flow Architecture

IntelliFlash Flash Media Optimization

Looking Ahead

*Roger Weeks Amplidata

ActiveScale
- Archive and Backup
- Active Data for Analytics
- Data Forever

Architecture
- Versioning
- Encryption
- Replication
- Single Pane Management
- S3 Compatible APIs
- Multi-Geo Availability Zones
- Scale Up and Scale Out

ActiveScale Architecture

Durable
- BitSpread erasure coding
- BitDynamics data integrity

Flexible
- Single site scale-up and scale-out
- Two+ sites asynchronous replication
- Three site availability zones

Scale-out
- Metadata and data separate
- Distributed system nodes store metadata
- Columns of storage nodes store data

BitSpread - Dynamic Data Placement
- Local - data does not move after ingest
- Performance - predictable across workloads
- Resilient - highly durable data

ActiveScale EC - http://www.hgst.com/sites/default/files/resources/WP34-ActiveScale-Erasure-Coding-Technology.pdf

BitDynamics - Continuous Data Integrity
- Background - verification process always running
- Performance - not impacted by verification or repair
- Automatic - all repairs happen with no intervention
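A background verify-and-repair pass of the kind described above can be sketched as follows. This is a minimal illustration only: the `scrub` function, the use of SHA-256 checksums and the `repair` callback are all assumptions made for the example, not a description of how BitDynamics is implemented.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a hex digest used to detect silent corruption (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def scrub(fragments, expected, repair):
    """Verify every stored fragment and repair the ones that fail.

    fragments: dict of fragment id -> stored bytes (updated in place)
    expected:  dict of fragment id -> checksum recorded at ingest
    repair:    hypothetical callback that rebuilds a fragment from redundancy
    Returns the list of fragment ids that were repaired.
    """
    repaired = []
    for frag_id, data in list(fragments.items()):
        if checksum(data) != expected[frag_id]:
            fragments[frag_id] = repair(frag_id)
            repaired.append(frag_id)
    return repaired


if __name__ == "__main__":
    good = {"a": b"alpha", "b": b"beta"}
    expected = {k: checksum(v) for k, v in good.items()}
    stored = {"a": b"alpha", "b": b"bXta"}  # simulate bit rot in fragment "b"
    print(scrub(stored, expected, lambda fid: good[fid]))
```

In a real object store the repair step would rebuild the fragment from the remaining erasure-coded fragments; the callback here simply stands in for that.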

GeoSpread - Availability Zones
- Single - Distributed erasure coded copy
- Available - Can sustain the loss of an entire site
- Efficient - Better than 2 or 3 copy replication
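The efficiency point above is easy to check with some arithmetic. The sketch below compares raw capacity per usable byte for replication and for erasure coding, and tests whether an evenly spread layout survives losing one site. The 12+6 layout and the function names are hypothetical choices for illustration; they are not ActiveScale's published parameters.

```python
import math


def raw_per_usable_replication(copies: int) -> float:
    """Raw bytes stored for each usable byte with N full copies."""
    return float(copies)


def raw_per_usable_erasure(data_frags: int, parity_frags: int) -> float:
    """Raw bytes stored for each usable byte with a k+m erasure code."""
    return (data_frags + parity_frags) / data_frags


def survives_site_loss(data_frags: int, parity_frags: int, sites: int) -> bool:
    """True if losing the fullest site still leaves at least k fragments.

    Assumes fragments are spread as evenly as possible across the sites.
    """
    worst_site = math.ceil((data_frags + parity_frags) / sites)
    return parity_frags >= worst_site


if __name__ == "__main__":
    k, m, sites = 12, 6, 3  # hypothetical layout
    print("3-copy replication:", raw_per_usable_replication(3))
    print("12+6 erasure code: ", raw_per_usable_erasure(k, m))
    print("survives one site: ", survives_site_loss(k, m, sites))
```

Under these assumptions the erasure-coded layout stores 1.5 raw bytes per usable byte while still tolerating the loss of a whole site, against 3 raw bytes per usable byte for three full copies.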

ActiveScale Replication
- Create Regions
- Bucket - asynchronous replication base
- Any-Any - all ActiveScale systems
- Choose - the number of sites you need

ActiveScale Systems
- P100 - starts as low as 720TB, scales to 18PB; 17 nines data durability; 4.6kVA typical power consumption
- X100 - 5.4PB in a rack, 840TB to 52PB; 17 nines data durability; 6.5kVA typical power consumption
- Scale out to 9 expansion racks, 52PB scale-out per namespace

Use Cases
- M&E - Media Archive, Tape replacement and augmentation, Transcoding, Playout
- Life Sciences - Bio imaging, Genomic Sequencing
- Analytics

Mike McWhorter - Senior Technologist, DCS Field Applications Engineering

S3A - S3 Adapter for Hadoop

HDFS Triple Replication
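Pointing Hadoop's S3A adapter at an S3-compatible object store is mostly a matter of a few `core-site.xml` properties. The sketch below generates such a fragment. The property names (`fs.s3a.endpoint`, `fs.s3a.access.key`, `fs.s3a.secret.key`, `fs.s3a.path.style.access`) are standard Hadoop S3A settings, but the endpoint URL, the credentials, the `s3a_core_site` helper and the choice to enable path-style access are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def s3a_core_site(endpoint: str, access_key: str, secret_key: str) -> str:
    """Render a core-site.xml fragment with basic S3A settings."""
    props = {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Assumed: many on-premises S3-compatible stores need path-style URLs.
        "fs.s3a.path.style.access": "true",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder endpoint and credentials; replace with real values.
    print(s3a_core_site("http://objectstore.example.internal", "ACCESS", "SECRET"))
```

In practice credentials are better supplied through a credential provider than written into the file in plain text.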

Adding storage to Hadoop before S3A? Server scale-out.

Data locality - data is stored closer to the processor for better performance.

The active data is your Working Set (stored on HDFS); everything else can sit on the object store.
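The working-set split described above could be expressed as a simple placement rule: recently accessed data stays on HDFS, everything else goes to the object store through S3A. The sketch below is hypothetical throughout; the 30-day window, the `hdfs://namenode:8020` address, the `s3a://archive` bucket and the `storage_uri` helper are assumptions, not something presented in the session.

```python
from datetime import datetime, timedelta


def storage_uri(path, last_access, now,
                hot_window=timedelta(days=30),
                hdfs_root="hdfs://namenode:8020",
                object_root="s3a://archive"):
    """Pick a Hadoop filesystem URI for a dataset based on how recently it was used.

    path must start with "/". Data touched within hot_window counts as
    working set and stays on HDFS; older data is placed on the object store.
    """
    if now - last_access <= hot_window:
        return f"{hdfs_root}{path}"
    return f"{object_root}{path}"


if __name__ == "__main__":
    now = datetime(2018, 3, 1)
    print(storage_uri("/logs/recent", datetime(2018, 2, 20), now))
    print(storage_uri("/logs/old", datetime(2017, 1, 1), now))
```

Because Hadoop jobs address both locations by URI, a job can read cold data from the `s3a://` path without it first being copied back into HDFS, at the cost of losing data locality.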