SFD15 - WD Tegile
@WesternDigiDC
Narayan Venkat
Phil Bullinger (SVP, Data Centre Systems), previously at EMC (Isilon), Oracle and Engenio
Introduction to Data Centre Systems
13,500+ active patents worldwide
$26B market capitalisation
Products: storage systems, SSDs, HDDs, embedded and removable flash memory
Roughly half of the world’s data is stored on Western Digital products
Brands: SanDisk, WD, HGST, G-Technology, Upthere, Tegile
Strategy and Capabilities
Ultrastar - Flash and HDD Platform and Integration Services
ActiveScale - Object Storage, Geo-distributed
IntelliFlash - Unified Block / File Storage, NVMe, All-Flash and Hybrid
Density, Capacity, Durability, Integrity, Performance, Manageability
*Narayan Venkat
One Flash Platform, Any Workload
- Test/Dev
- Messaging
- Collaboration
- Warehousing
- Analytics
- OLTP
Balanced Performance - capacity media (HDD, MLC, TLC)
Extreme Performance - performance media (SCM, NVMe, eMLC)
Hybrid - 10% flash
Flash-Hybrid - 30% to 50% flash
All-Flash - 100% flash
NVMe - 100% NVMe flash
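The flash ratios above determine how much of a system's raw capacity sits on flash and how much on HDD. The sketch below assumes the percentages refer to raw capacity, which the notes don't specify, and uses illustrative capacities. The Flash-Hybrid range is kept as a (low, high) pair, and the function name is hypothetical.

```python
# Flash share of raw capacity per IntelliFlash configuration, from the notes.
# ASSUMPTION: percentages apply to raw capacity; flash_split_tb is hypothetical.
FLASH_SHARE = {
    "Hybrid": (0.10, 0.10),
    "Flash-Hybrid": (0.30, 0.50),
    "All-Flash": (1.00, 1.00),
    "NVMe": (1.00, 1.00),
}


def flash_split_tb(config: str, raw_tb: float) -> dict:
    """Return (min, max) flash and HDD capacity in TB for a configuration."""
    low, high = FLASH_SHARE[config]
    return {
        "flash_tb": (raw_tb * low, raw_tb * high),
        # HDD holds whatever is not flash, so its range runs the other way
        "hdd_tb": (raw_tb * (1 - high), raw_tb * (1 - low)),
    }


for name in FLASH_SHARE:
    print(name, flash_split_tb(name, 100.0))
```

For example, a 200TB Flash-Hybrid system would carry between 60TB and 100TB of flash under this assumption.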
One OS, one feature set, one user experience
IntelliFlash Product Portfolio
Hybrid systems are targeted at secondary storage.
IntelliFlash Operating System
Flash-Optimized Software Architecture
Management Flexibility
- Integration with vSphere, Hyper-V, OpenStack, KVM
- Automated call-home, Web UI, RESTful APIs
Protocol Choice
- Multiple protocol access - FC, iSCSI, NFS, SMB 3, VVols
Data Services
- Snapshots, clones, inline compression, inline deduplication
- Replication, disaster recovery
Metadata Acceleration
- Classification, separation and placement of metadata and application data
- Caching, aggregation and scaling for high-speed storage operations
Media Optimization
- Delivers media resiliency, flash wear management and data integrity
- Media-optimized protection, data layout and storage pooling for high performance
Physical Media
- Mixture of hard disks, dense flash, performance flash, persistent memory and dynamic RAM
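"Classification, separation and placement of metadata and application data" can be pictured as a routing rule. Small, frequently touched metadata goes to the fastest media available, and bulk application data goes to the cheapest. The sketch below is a hypothetical policy written only to illustrate that idea. The tier names, the `Block` type and `place` are illustrative and are not IntelliFlash internals.

```python
from dataclasses import dataclass

# ASSUMPTION: hypothetical placement policy; not IntelliFlash code.
TIER_ORDER = ["memory", "performance", "capacity"]  # fastest first


@dataclass
class Block:
    kind: str  # "metadata" or "data"
    size: int  # bytes


def place(block: Block, available: set[str]) -> str:
    """Pick a tier: metadata goes to the fastest tier present,
    application data to the slowest (cheapest) tier present."""
    tiers = [t for t in TIER_ORDER if t in available]
    if not tiers:
        raise ValueError("no storage tiers available")
    return tiers[0] if block.kind == "metadata" else tiers[-1]


print(place(Block("metadata", 4096), {"performance", "capacity"}))
print(place(Block("data", 131072), {"performance", "capacity"}))
```

Keeping metadata on fast media means lookups stay quick even when most of the data lives on slower, denser devices.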
*Shailendra Tripathi
Architecture and Design
Run every workload at the speed of persistent memory
Memory Tier, Performance Tier, Capacity Tier
Categorising Physical Devices by Performance into Different Classes
Fully Distributed Storage Architecture
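One way to picture categorising devices by performance is to bucket each device by its measured read latency into the memory, performance and capacity tiers. The latency thresholds and device names below are hypothetical and only illustrate the idea. The notes don't describe how IntelliFlash actually classifies media.

```python
# ASSUMPTION: thresholds and device names are illustrative only.
def classify_device(read_latency_us: float) -> str:
    """Bucket a device into a tier by measured read latency (microseconds)."""
    if read_latency_us < 10:  # e.g. DRAM / persistent memory
        return "memory"
    if read_latency_us < 1000:  # e.g. NVMe or SAS/SATA flash
        return "performance"
    return "capacity"  # e.g. hard disks


devices = {"pmem0": 1.0, "nvme0": 90.0, "ssd3": 250.0, "hdd7": 8000.0}
tiers: dict[str, list[str]] = {}
for name, latency in devices.items():
    tiers.setdefault(classify_device(latency), []).append(name)
print(tiers)
```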
Read and Write I/O Flow Architecture
IntelliFlash Flash Media Optimization
Looking Ahead
*Roger Weeks Amplidata
ActiveScale
- Archive and Backup
- Active Data for Analytics
- Data Forever Architecture
- Versioning
- Encryption
- Replication
- Single Pane Management
- S3 Compatible APIs
- Multi-Geo Availability Zones
- Scale Up and Scale Out
ActiveScale Architecture
Durable
- BitSpread erasure coding
- BitDynamics data integrity
Flexible
- Single-site scale-up and scale-out
- Asynchronous replication across two or more sites
- Three-site availability zones
Scale-out
- Metadata and data are kept separate
- Distributed system nodes store metadata
- Columns of storage nodes store data
BitSpread - Dynamic Data Placement
Local - data does not move after ingest
Performance - predictable across workloads
Resilient - highly durable data
ActiveScale erasure coding (EC) white paper - http://www.hgst.com/sites/default/files/resources/WP34-ActiveScale-Erasure-Coding-Technology.pdf
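BitSpread's real coding parameters are documented in the white paper and aren't reproduced here. As a stand-in, the sketch below uses the simplest erasure code, a single XOR parity fragment. It stores k data fragments plus 1 parity fragment and can rebuild any one lost fragment. That is enough to show the core idea of recovering lost data from surviving fragments, but it is not BitSpread's algorithm, which tolerates many more simultaneous losses.

```python
def split(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-length fragments, zero-padding the tail."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    return [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Return k data fragments plus one XOR parity fragment."""
    frags = split(data, k)
    parity = frags[0]
    for f in frags[1:]:
        parity = xor_bytes(parity, f)
    return frags + [parity]


def decode(frags: list, length: int) -> bytes:
    """Rebuild the original bytes when at most one fragment is None."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    if missing:
        # XOR of all k+1 fragments is zero, so the lost one is the XOR of the rest
        survivors = [f for f in frags if f is not None]
        rebuilt = survivors[0]
        for f in survivors[1:]:
            rebuilt = xor_bytes(rebuilt, f)
        frags = frags[:missing[0]] + [rebuilt] + frags[missing[0] + 1:]
    return b"".join(frags[:-1])[:length]  # drop parity and padding


msg = b"erasure coding stand-in"
fragments = encode(msg, 4)
fragments[2] = None  # simulate a failed drive
print(decode(fragments, len(msg)))
```

Production codes such as Reed-Solomon generalise this to m parity fragments, so any m losses can be repaired.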
BitDynamics - Continuous Data Integrity
Background - the verification process is always running
Performance - not impacted by verification or repair
Automatic - all repairs happen with no intervention
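Continuous background verification with automatic repair can be sketched as a scrubber. It recomputes a checksum for every stored chunk, compares it with the checksum recorded at write time, and replaces any mismatch with a freshly rebuilt copy. The scrubber below is hypothetical and uses SHA-256 checksums. `scrub` and `repair_source` are illustrative names, and the notes don't describe how BitDynamics actually works internally.

```python
import hashlib
from typing import Callable


# ASSUMPTION: hypothetical scrubber; not BitDynamics' actual mechanism.
def checksum(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


def scrub(store: dict, checksums: dict,
          repair_source: Callable[[str], bytes]) -> list:
    """Verify every stored chunk against its recorded checksum.

    Corrupt chunks are replaced with a copy from repair_source(key),
    e.g. a rebuild from erasure-coded fragments. Returns the repaired keys.
    """
    repaired = []
    for key, blob in store.items():
        if checksum(blob) != checksums[key]:
            fresh = repair_source(key)
            if checksum(fresh) != checksums[key]:
                raise RuntimeError(f"repair for {key!r} failed verification")
            store[key] = fresh  # overwriting an existing key is safe mid-loop
            repaired.append(key)
    return repaired
```

In a real system this loop would run continuously at low priority, which matches the "performance not impacted" goal in the notes.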
GeoSpread - Availability Zones
Single - one distributed erasure-coded copy
Available - can sustain the loss of an entire site
Efficient - better than 2- or 3-copy replication
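The efficiency claim follows from simple arithmetic. Suppose n fragments are spread as evenly as possible across s sites, and any k of them can rebuild an object. Losing one site removes up to ceil(n/s) fragments, so the object survives only if k ≤ n - ceil(n/s). The storage overhead is then n/k. The sketch below also allows some extra fragment losses at the surviving sites. The n and extra-loss values are hypothetical, since ActiveScale's actual layout is in the white paper and not in these notes.

```python
import math


def max_data_fragments(n: int, sites: int, extra_losses: int = 0) -> int:
    """Largest k that survives losing any one whole site plus `extra_losses`
    more fragments, with n fragments spread as evenly as possible."""
    k = n - math.ceil(n / sites) - extra_losses
    if k <= 0:
        raise ValueError("layout cannot survive that many losses")
    return k


def overhead(n: int, sites: int, extra_losses: int = 0) -> float:
    """Raw bytes stored per byte of user data (n / k)."""
    return n / max_data_fragments(n, sites, extra_losses)


# Hypothetical 18-fragment layout over 3 sites, vs 2.0x / 3.0x for replication
for extra in (0, 2):
    print(f"extra={extra}: k={max_data_fragments(18, 3, extra)}, "
          f"overhead={overhead(18, 3, extra):.2f}x")
```

Even with two spare fragments of tolerance, this layout stays below the 2x cost of two-copy replication, while still surviving a full site loss.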
ActiveScale Replication
Create regions
Bucket - bucket-based asynchronous replication
Any-to-any - between all ActiveScale systems
Choose - the number of sites you need
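Bucket-based asynchronous replication means a write is acknowledged as soon as the local site has stored it. The copy to the other sites happens later. The classes below (`Site`, `AsyncReplicator`) are hypothetical and show only that ordering. ActiveScale's real replication protocol, conflict handling and scheduling are not described in these notes.

```python
from collections import deque


# ASSUMPTION: hypothetical classes; not ActiveScale's replication protocol.
class Site:
    def __init__(self, name: str):
        self.name = name
        self.buckets: dict[str, dict[str, bytes]] = {}

    def put(self, bucket: str, key: str, value: bytes) -> None:
        self.buckets.setdefault(bucket, {})[key] = value


class AsyncReplicator:
    """Acknowledge writes at the local site; ship them to peers later."""

    def __init__(self, local: Site, peers: list):
        self.local = local
        self.peers = peers  # as many sites as you choose
        self.pending: deque = deque()

    def put(self, bucket: str, key: str, value: bytes) -> str:
        self.local.put(bucket, key, value)
        self.pending.append((bucket, key, value))
        return "ack"  # returned before any peer has the object

    def drain(self) -> int:
        """Replicate all queued writes to every peer; return how many shipped."""
        shipped = 0
        while self.pending:
            bucket, key, value = self.pending.popleft()
            for peer in self.peers:
                peer.put(bucket, key, value)
            shipped += 1
        return shipped
```

The trade-off is a replication lag: until `drain` runs, the remote sites do not yet hold the latest writes.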
ActiveScale Systems
P100 - starts at 720TB and scales to 18PB; 17 nines data durability; 4.6kVA typical power consumption
X100 - 5.4PB in a single rack, 840TB to 52PB; 17 nines data durability; 6.5kVA typical power consumption
Scale out to 9 expansion racks, up to 52PB per namespace
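Durability quoted as 17x 9s (17 nines) corresponds to a loss probability of 10^-17. The sketch below assumes the figure is per object per year, which is the usual convention, although the notes don't say so. Under that assumption it converts the figure into expected losses for a given object count. `Decimal` is used because 1 - 1e-17 rounds to exactly 1.0 in binary floating point.

```python
from decimal import Decimal


# ASSUMPTION: durability is an annual, per-object figure (not stated in the notes).
def annual_loss_probability(nines: int) -> Decimal:
    """N nines of durability -> probability 10**-N of losing an object."""
    return Decimal(10) ** -nines


def expected_objects_lost_per_year(objects: int, nines: int) -> Decimal:
    return objects * annual_loss_probability(nines)


durability = 1 - annual_loss_probability(17)
print(durability)
print(expected_objects_lost_per_year(10**12, 17))  # a trillion objects
```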
Use Cases
M&E
- Media archive
- Tape replacement and augmentation
- Transcoding
- Playout
Life Sciences
- Bio imaging
- Genomic sequencing
Analytics
Mike McWhorter - Senior Technologist, DCS Field Applications Engineering
S3A - S3 adapter for Hadoop
HDFS triple replication
Before S3A, adding storage to Hadoop meant server scale-out: HDFS capacity lives on the cluster's nodes, so more storage means more servers
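Triple replication is why scaling out servers gets expensive. HDFS keeps three copies of each block by default (`dfs.replication` = 3), so every usable terabyte costs three raw terabytes, all of it on Hadoop nodes. The sketch below works through that arithmetic. The 1,000TB dataset, the 100TB-per-server figure and the 1.5x object-store overhead are all hypothetical numbers chosen for illustration.

```python
import math

HDFS_REPLICATION = 3  # HDFS default dfs.replication


def raw_capacity_tb(usable_tb: float, overhead: float) -> float:
    """Raw capacity needed to hold usable_tb at a given storage overhead."""
    return usable_tb * overhead


def servers_needed(usable_tb: float, tb_per_server: float,
                   overhead: float = HDFS_REPLICATION) -> int:
    """Servers needed when all storage must live on Hadoop nodes."""
    return math.ceil(raw_capacity_tb(usable_tb, overhead) / tb_per_server)


# ASSUMPTION: 1,000TB of data, 100TB of disk per server,
# and a hypothetical 1.5x erasure-coding overhead on the object store.
print(servers_needed(1000, 100))  # HDFS with 3x replication
print(raw_capacity_tb(1000, 1.5))  # raw TB on an erasure-coded object store
```

With S3A, the object store carries the bulk capacity, so the Hadoop cluster can be sized for compute rather than for three copies of everything.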
Data locality - data is stored close to the processor for better performance
The active data is your working set (stored on HDFS); everything else can sit on the object store
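Splitting the working set from everything else can be sketched as a placement plan. Recently read files stay on HDFS, and older files are addressed through S3A using its `s3a://bucket/path` URI scheme. The 30-day window, the bucket name and `plan_placement` are hypothetical. A real deployment would also need the data actually copied (for example with DistCp) and S3A configured for the object store's endpoint.

```python
from datetime import datetime, timedelta


# ASSUMPTION: hypothetical policy; a file is in the working set if it was
# read within hot_window. Paths and the bucket name are illustrative.
def plan_placement(last_access: dict, now: datetime,
                   hot_window: timedelta = timedelta(days=30)) -> dict:
    """Map each file to hdfs:// (working set) or s3a:// (object store)."""
    plan = {}
    for path, accessed in last_access.items():
        if now - accessed <= hot_window:
            plan[path] = f"hdfs:///data{path}"
        else:
            plan[path] = f"s3a://archive-bucket{path}"
    return plan


files = {
    "/logs/today.csv": datetime(2018, 2, 28),
    "/logs/2016.csv": datetime(2016, 12, 31),
}
print(plan_placement(files, now=datetime(2018, 3, 1)))
```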