Key Design Issues of Massive Optical Archival Storage Systems
Jie Yao 2020.10
Huazhong University of Science and Technology
Wuhan National Laboratory for Optoelectronics

Agenda
• Background
• Key Design Issues
  – Hybrid Architecture
  – Optimized Scheduling Scheme
  – Virtualization
  – Linear Tape Library Compatibility
  – Meta-data layout strategy
  – Full-text indexing
  – Data Security and Integrity
  – High availability
  – Data de-duplication

Data Storage in the Big Data Era
• Data volume is growing exponentially, but most data quickly becomes cold and is accessed only occasionally
• There is strong demand for large-scale storage systems with long-term reliability, low cost, and low power consumption
Source: Facebook

Cold Data
• Cold data: “written once, read rarely” access pattern
• A large fraction of the data stored at Facebook is cold (> 89%)

Advantages of Optical Storage for long-term Digital Preservation
• Reliably store data for 50+ years
  – Blu-ray disc for 50 years, M-Disc for 1000 years
  – Discs are independent of drives
• Trustworthy digital preservation
  – Tamper-proof media (Write Once Read Many)
• Resilient against disasters
  – such as flood, electromagnetic pulse, etc.
• No special requirements for the storage environment
  – such as constant temperature and humidity, etc.
• Low cost
  – a disc consists only of a piece of plastic plus multiple thin coating films, each several µm thick

Drawbacks of Optical Storage
• Low capacity per disc compared with HDD/tape
  – Currently at most 128GB for a single-sided disc
  – Large datasets have to be partitioned into subsets that fit onto optical discs
  – Data also needs to be spliced back together when reading
• Slow read/write (burn) speed for a single drive
  – Officially up to 6X (27MB/s)
• Burning operations are generally opaque to ordinary users and applications
• We need a solution at the system level:
  – Optical disc library: to overcome these shortcomings

Hardware Platform of Optical Disc Library
• Collaboration with Amethystum Corporation in China
• Converged design:
  – High-density, precision electromechanical hardware architecture
  – 12,240 optical discs in a single 42U rack (1.2PB of raw storage when using 100GB discs)
• Right-provisioned for cold data workloads:
  – 1~2 controllers
  – 2x 10 Gbps network (maximum 2GB/s bandwidth)
  – 12~48 ODDs
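
The rack figures above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: it uses decimal units and assumes every drive burns at a sustained 18MB/s (the 4X rate this deck quotes for 100GB discs), ignoring robot and verification time. The function names are illustrative, not part of the platform.

```python
DISCS_PER_RACK = 12_240   # from the slide
DISC_CAPACITY_GB = 100    # from the slide
BURN_SPEED_MB_S = 18      # assumed sustained 4X burn rate per drive
GB = 10**9
MB = 10**6


def raw_capacity_pb(discs=DISCS_PER_RACK, disc_gb=DISC_CAPACITY_GB):
    """Raw rack capacity in decimal petabytes."""
    return discs * disc_gb * GB / 10**15


def aggregate_burn_mb_s(drives, per_drive=BURN_SPEED_MB_S):
    """Combined burn bandwidth when all drives write concurrently."""
    return drives * per_drive


def days_to_fill(drives):
    """Days needed to burn a full rack, ignoring disc-swap overhead."""
    total_bytes = DISCS_PER_RACK * DISC_CAPACITY_GB * GB
    seconds = total_bytes / (aggregate_burn_mb_s(drives) * MB)
    return seconds / 86_400


if __name__ == "__main__":
    print(f"raw capacity: {raw_capacity_pb():.3f} PB")
    for drives in (12, 48):
        print(f"{drives} drives: {aggregate_burn_mb_s(drives)} MB/s, "
              f"{days_to_fill(drives):.1f} days to fill")
```

Under these assumptions a fully populated rack takes about 16 days to fill with 48 drives and about 66 days with 12, and even 48 drives (864MB/s) stay below the stated 2GB/s network ceiling.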

Architecture
• Performance goal
  – High throughput, low response time
• Current limitations
  – Moderate data throughput for a single optical drive
    • 27MB/s (6X) for single/dual-layer (25GB/50GB) BD-R discs *
    • 18MB/s (4X) for triple/quadruple-layer (100GB/128GB) BD-R discs *
  – High random access delay
  – Robot fetching delay: 80 seconds on our platform
• Solution
  – A hybrid parallel architecture
* from the official website www.blu-raydisc.info

Optical Storage Architecture
[Figure: optical storage architecture. Labels: RAM, Network Interfaces, HDDs & SSDs, ODDs, Optical Discs]

Architecture
• Hybrid and parallel (RAM, SSD, HDD, ODD storage hierarchy)
  – RAM (several GB)
    • Implements loop buffers for smooth data feeding to ODDs
  – SSD disk array (hundreds of GB)
    • Temporary meta-data access
  – HDD disk array (dozens of TB)
    • Quickly stores the incoming data
    • Extracts data for burning
    • Read cache
  – ODD array (12 to 48 drives) for the whole disc library
    • Concurrently burns data to discs
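
The RAM-tier loop buffer mentioned above can be sketched as a fixed-size ring buffer: the HDD tier fills it, and a drive drains it at its own pace. This is a minimal single-threaded illustration; the class name `LoopBuffer` and its methods are assumptions made for this example, and a real burner would add locking and underrun handling.

```python
class LoopBuffer:
    """Fixed-size ring buffer that feeds data to one optical drive."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._head = 0   # next position to read from
        self._size = 0   # number of bytes currently stored

    def free(self):
        return self._capacity - self._size

    def write(self, data):
        """Store as much of data as fits; return the byte count accepted."""
        n = min(len(data), self.free())
        tail = (self._head + self._size) % self._capacity
        first = min(n, self._capacity - tail)      # bytes before wrap-around
        self._buf[tail:tail + first] = data[:first]
        self._buf[0:n - first] = data[first:n]     # wrapped remainder
        self._size += n
        return n

    def read(self, max_bytes):
        """Remove and return up to max_bytes in FIFO order."""
        n = min(max_bytes, self._size)
        first = min(n, self._capacity - self._head)
        out = bytes(self._buf[self._head:self._head + first])
        out += bytes(self._buf[0:n - first])
        self._head = (self._head + n) % self._capacity
        self._size -= n
        return out
```

A producer that sees `write` return fewer bytes than it offered knows the buffer is full and has to wait, which is the hook a flow-control layer would use.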

Scheduling Scheme
• Flow control
  – Smooths the gap between input speed and optical disc burning speed
  – In the worst case, artificially slows down the incoming data rate to match the ODDs’ write bandwidth and avoid buffer overflow
• Scheduling granularity
  – 12 discs as a group on our platform
  – File based (not block based)
    • Spreads files over discs to gain parallelism
    • Each disc can use the UDF file system (for data recovery in extreme cases)
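
Both ideas above can be illustrated in a few lines. The sketch below is a hypothetical policy, not the scheduler used on the platform: `spread_files` places whole files on the least-filled disc of a 12-disc group (largest file first), and `admitted_rate` caps the ingest rate at the drives' drain rate once the buffer passes an assumed high watermark.

```python
import heapq

GROUP_SIZE = 12  # discs per group, from the slide


def spread_files(files, disc_capacity, group_size=GROUP_SIZE):
    """Assign whole files to discs; return one list of file names per disc.

    files maps file name to size. Greedy policy (an assumption):
    largest file first, always onto the currently least-filled disc.
    """
    heap = [(0, disc) for disc in range(group_size)]  # (bytes used, disc)
    layout = [[] for _ in range(group_size)]
    by_size = sorted(files.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        used, disc = heapq.heappop(heap)
        if used + size > disc_capacity:
            # The least-filled disc cannot hold it, so no disc can.
            raise ValueError(f"{name} does not fit in this disc group")
        layout[disc].append(name)
        heapq.heappush(heap, (used + size, disc))
    return layout


def admitted_rate(incoming_rate, drain_rate, buffer_fill, high_watermark=0.9):
    """Flow control: throttle ingest to the burn rate when the buffer is nearly full.

    buffer_fill is the occupied fraction of the staging buffer (0.0 to 1.0).
    """
    if buffer_fill < high_watermark:
        return incoming_rate
    return min(incoming_rate, drain_rate)
```

Keeping files whole means each disc remains readable on its own through its file system, at the cost of some unused space at the end of each disc.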

Virtualization
• One uniform logical view
  – Standard POSIX interface for ease of use
    • A huge file volume (1.2PB in our case), just like a network drive
  – Hides all underlying medium-specific complexities
    • Medium grab
    • Identify process
    • Format process
    • Data splitting/splicing
    • Redundancy allocation
    • Writing process
    • Verify process
    • ...
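
Data splitting/splicing can be pictured as mapping a byte range of the logical volume onto per-disc extents. The sketch below assumes the simplest possible layout, fixed-capacity discs filled in order; the function name `locate` and this layout are illustrative assumptions, not the system's actual mapping.

```python
def locate(offset, length, disc_capacity):
    """Map a logical byte range to (disc_index, disc_offset, byte_count) extents."""
    extents = []
    while length > 0:
        disc, inner = divmod(offset, disc_capacity)
        count = min(length, disc_capacity - inner)  # stop at the disc boundary
        extents.append((disc, inner, count))
        offset += count
        length -= count
    return extents


if __name__ == "__main__":
    # A 30-byte read starting 10 bytes before the end of a 100-byte disc.
    print(locate(90, 30, 100))
```

A read that crosses a disc boundary yields more than one extent, and the virtualization layer splices the pieces back together before returning them to the caller.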

Tape Library Compatibility
• Based on our virtualization technology, the optical storage system can easily be extended with external interfaces
• To be compatible with existing tape library appliances:
  – A Virtual Tape Library (VTL) interface is implemented
  – The whole optical library is emulated as a tape library
  – A virtual tape is implemented as a file in the virtualized storage space
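
To make the last point concrete, here is a minimal sketch of a virtual tape backed by one ordinary file. The on-disk layout (a 4-byte big-endian length before each record) and the class `VirtualTape` are assumptions made for this example; a real VTL would also emulate filemarks, positioning commands, and the SCSI tape command set.

```python
import struct


class VirtualTape:
    """Sequential, append-only record store backed by a single file."""

    _HEADER = struct.Struct(">I")  # hypothetical framing: record length

    def __init__(self, path):
        # "a+b" creates the file if needed and forces writes to the end,
        # which mirrors how a tape is written.
        self._file = open(path, "a+b")
        self._file.seek(0)

    def write_record(self, payload):
        self._file.seek(0, 2)  # move to end of data
        self._file.write(self._HEADER.pack(len(payload)) + payload)

    def rewind(self):
        self._file.seek(0)

    def read_record(self):
        """Return the next record, or None at end of data."""
        header = self._file.read(self._HEADER.size)
        if len(header) < self._HEADER.size:
            return None
        (length,) = self._HEADER.unpack(header)
        return self._file.read(length)

    def close(self):
        self._file.close()
```

Because the tape image is just a file in the virtualized volume, it inherits the buffering, scheduling, and redundancy of the layers beneath it.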

Meta-data layout
• Small but frequently accessed
  – Buffered and periodically flushed
• Avoid a single point of failure
  – Distributed layout
• Self-describing
  – Meta-data should be adjacent to the data it describes
• Pseudo overwrite
  – Provides file-update functionality on write-once media
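
Pseudo overwrite on write-once media is commonly achieved by appending a new version and redirecting the index to it, leaving the old bytes in place. The sketch below shows that general idea with an in-memory log; the class `WriteOnceStore` and its methods are hypothetical and do not describe the system's on-disc format.

```python
class WriteOnceStore:
    """Append-only log in which the latest record for a path wins."""

    def __init__(self):
        self._log = []     # burned records, never modified or removed
        self._index = {}   # path -> position of the newest version

    def write(self, path, data):
        """An 'update' appends a new version; older versions stay on the medium."""
        self._log.append((path, data))
        self._index[path] = len(self._log) - 1

    def read(self, path):
        return self._log[self._index[path]][1]

    def versions(self, path):
        return [data for name, data in self._log if name == path]

    @classmethod
    def rebuild(cls, log):
        """Recover the index by replaying the self-describing records in order."""
        store = cls()
        for path, data in log:
            store.write(path, data)
        return store
```

Since every record carries its own path, the index can be rebuilt from the media alone, which is the benefit of keeping meta-data next to the data it describes.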

Full-text indexing, data security and integrity
• Full-text indexing
  – The time-consuming indexing operations are decoupled from the critical I/O path
• Encryption and signature functions are needed on the critical I/O path
  – Performance considerations
    • GPGPU
    • ASIC
    • CPU instruction-set extensions for acceleration
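
One common way to keep indexing off the critical I/O path is to have the write call only store the data and enqueue its name, while a background worker builds the index later. The sketch below illustrates that pattern with a toy inverted index; the class `Archive` and its methods are assumptions for this example.

```python
import queue
import threading
from collections import defaultdict


class Archive:
    """Stores documents immediately and indexes them in the background."""

    def __init__(self):
        self.store = {}
        self.index = defaultdict(set)   # word -> names of documents containing it
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._index_loop, daemon=True)
        worker.start()

    def write(self, name, text):
        """Critical path: persist the data and defer the indexing work."""
        self.store[name] = text
        self._pending.put(name)

    def _index_loop(self):
        while True:
            name = self._pending.get()
            for word in self.store[name].lower().split():
                self.index[word].add(name)
            self._pending.task_done()

    def wait_indexed(self):
        """Block until every queued document has been indexed."""
        self._pending.join()

    def search(self, word):
        return set(self.index.get(word.lower(), ()))
```

The trade-off is that a freshly written file is not searchable until the worker catches up, which is usually acceptable for archival workloads.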

High availability
• The error rate of a single disc cannot meet the enterprise-level requirement (1x10^-21)
• Two-level erasure coding
  – Intra-disc
    • Reserved physical region for redundancy
  – Inter-disc
    • Disc-group based
• Data layout
  – A distributed parity layout (as in RAID5/6) is not needed
  – Dedicated disc(s) for redundancy
  – Each disc still uses UDF
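
The inter-disc level with a dedicated redundancy disc can be illustrated with the simplest erasure code, single XOR parity. This is a stand-in chosen for brevity: the slide does not say which code the system uses, and tolerating more than one lost disc per group would need a stronger code such as Reed-Solomon. The function names are illustrative.

```python
def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data_discs = [b"disc-one", b"disc-two", b"disc-3!!"]
    parity_disc = xor_parity(data_discs)
    # Pretend the second disc is unreadable and rebuild it.
    print(recover([data_discs[0], data_discs[2]], parity_disc))
```

Because the parity lives on its own disc, every data disc keeps an ordinary file system and stays readable by itself.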

Data De-duplication
• High redundancy in archival data has been demonstrated
• Tradeoff between high de-duplication ratio and data retrieval throughput
  – Thousands of discs vs. dozens of drives
  – Technologies from magnetic storage cannot be ported directly
• Performance considerations
  – SSD layer: indexing
  – HDD layer: fingerprint

Conclusion
• We discussed the following main design issues
  – Architecture
  – Scheduling Scheme
  – Virtualization
  – Linear Tape Library Compatibility
  – Meta-data layout
  – Full-text indexing
  – Data Security and Integrity
  – High availability
  – Data de-duplication
• These issues should be considered and addressed when developing massive optical archival storage systems

Thank You!