Key Design Issues of Massive Optical Archival Storage Systems

Jie Yao 2020.10

Huazhong University of Science and Technology
Wuhan National Laboratory for Optoelectronics

Agenda

• Background
• Key Design Issues
  – Hybrid Architecture
  – Optimized Scheduling Scheme
  – Virtualization
  – Linear Tape Library Compatibility
  – Meta-data layout strategy
  – Full-text indexing
  – Data Security and Integrity
  – High availability
  – Data de-duplication

Background: the Big Data Era

• Data volume is growing exponentially, but most data quickly becomes cold and is accessed only occasionally
• There is strong demand for large-scale storage systems with long-term reliability, low cost, and low power consumption

Source: Facebook

Cold Data

• Cold data: “written once, read rarely” access pattern
• A large fraction of the data stored at Facebook (> 89%) is cold

Advantages of Optical Storage for Long-term Digital Preservation

• Reliably stores data for 50+ years
  – Blu-ray disc rated for 50 years, M-Disc for 1,000 years
  – Discs are independent of drives
• Trustworthy digital preservation
  – Tamper-proof media (Write Once Read Many)
• Resilient against disasters
  – such as floods, electromagnetic pulses, etc.
• No special requirements for the storage environment
  – such as constant temperature and humidity
• Low cost
  – a disc is just a piece of plastic plus multiple thin coating films several µm thick

Drawbacks of Optical Storage

• Low capacity per disc compared to HDD/tape
  – Currently at most 128 GB for a single-sided disc
  – Large datasets have to be partitioned into subsets that fit on individual discs
  – Data also needs to be spliced back together when reading
• Slow read/write (burn) speed for a single drive
  – Officially up to 6X (27 MB/s)
• Burning operations are generally opaque to ordinary users and applications
• We need a system-level solution:
  – A disc library to overcome these shortcomings

Hardware Platform of Optical Disc Library

• Developed in collaboration with Amethystum Corporation in China
• Converged design:
  – High-density, precision electromechanical hardware architecture
  – 12,240 optical discs in a single 42U rack (1.2 PB of raw storage at 100 GB/disc)
• Right-provisioned for cold-data workloads:
  – 1~2 controllers
  – 2x 10 Gbps network (maximum 2 GB/s bandwidth)
  – 12~48 ODDs
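
The platform figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted on the slides (disc count, disc capacity, link speed, the official 6X single-drive rate, and the 12 to 48 drive range); the variable names are illustrative, and decimal units (1 PB = 1,000,000 GB) are an assumption.

```python
# Figures quoted on the slides; names are illustrative only.
DISCS_PER_RACK = 12_240
DISC_CAPACITY_GB = 100
NETWORK_LINKS = 2
LINK_GBPS = 10
DRIVE_MB_PER_S = 27          # official 6X rate for a single drive
DRIVE_COUNTS = (12, 48)      # smallest and largest ODD configuration

# Raw rack capacity, assuming decimal units (1 PB = 1,000,000 GB).
raw_capacity_pb = DISCS_PER_RACK * DISC_CAPACITY_GB / 1_000_000

# Line rate of the two links in GB/s (8 bits per byte, no protocol overhead).
network_line_rate_gb_s = NETWORK_LINKS * LINK_GBPS / 8

# Best-case aggregate burn rate if every drive writes at 6X at the same time.
drive_aggregate_mb_s = {n: n * DRIVE_MB_PER_S for n in DRIVE_COUNTS}

print(f"raw capacity: {raw_capacity_pb:.3f} PB")
print(f"network line rate: {network_line_rate_gb_s} GB/s")
for drives, rate in drive_aggregate_mb_s.items():
    print(f"{drives} drives: {rate} MB/s aggregate")
```

The 2 GB/s quoted on the slide sits below the 2.5 GB/s line rate, which would be consistent with protocol overhead, although the slides do not say so. Even in the best case, the fully populated drive array stays below the network bandwidth, which is why the later slides add buffering and flow control.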

Architecture

• Performance goal
  – High throughput, low response time
• Current limitations
  – Moderate data throughput for a single optical drive
    • 27 MB/s (6X) for single/dual-layer (25 GB/50 GB) BD-R discs *
    • 18 MB/s (4X) for triple/quadruple-layer (100 GB/128 GB) BD-R discs *
  – High random access delay
  – Robot fetching delay: 80 seconds on our platform
• Solution
  – A hybrid parallel architecture

* from the official website www.blu-raydisc.info

Optical Storage Architecture

[Figure: network interfaces, RAM, HDDs & SSDs, ODDs, and optical discs]

Architecture

• Hybrid and parallel (RAM, SSD, HDD, ODD storage hierarchy)
  – RAM (several GB)
    • Implements loop buffers to feed data smoothly to the ODDs
  – SSD disk array (hundreds of GB)
    • Temporary meta-data access
  – HDD disk array (dozens of TB)
    • Quickly stores incoming data
    • Extracts data for burning
    • Read cache
  – ODD array (12 to 48 drives) for the whole disc library
    • Concurrently burns data to discs
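
The RAM-level loop buffer mentioned above can be illustrated with a minimal ring buffer. This is a single-threaded sketch under assumed names (`LoopBuffer`, `write`, `read`), not the system's actual implementation: the producer side stands in for data staged from the HDD array, and the consumer side stands in for a drive that must be fed continuously while burning.

```python
class LoopBuffer:
    """Fixed-size ring buffer of bytes (illustrative, single-threaded)."""

    def __init__(self, capacity: int) -> None:
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._head = 0   # position of the next byte to read
        self._size = 0   # number of bytes currently stored

    def free_space(self) -> int:
        return self._capacity - self._size

    def write(self, data: bytes) -> int:
        """Store as much of `data` as fits; return the number of bytes accepted."""
        n = min(len(data), self.free_space())
        tail = (self._head + self._size) % self._capacity
        first = min(n, self._capacity - tail)        # bytes before wrap-around
        self._buf[tail:tail + first] = data[:first]
        self._buf[0:n - first] = data[first:n]       # wrapped remainder, if any
        self._size += n
        return n

    def read(self, max_bytes: int) -> bytes:
        """Remove and return up to `max_bytes` bytes in FIFO order."""
        n = min(max_bytes, self._size)
        first = min(n, self._capacity - self._head)
        out = bytes(self._buf[self._head:self._head + first])
        out += bytes(self._buf[0:n - first])
        self._head = (self._head + n) % self._capacity
        self._size -= n
        return out


buf = LoopBuffer(8)
accepted = buf.write(b"abcdef")     # producer: data staged from HDD
burned = buf.read(4)                # consumer: bytes handed to the drive
```

A short `write` (fewer bytes accepted than offered) is the signal a producer would use to slow down, which is the hook for the flow control discussed next.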

Scheduling Scheme

• Flow control
  – Smooths the gap between the input speed and the optical disc burning speed
  – In the worst case, artificially slows the incoming data rate to match the ODDs' write bandwidth and avoid buffer overflow
• Scheduling granularity
  – 12 discs as a group on our platform
  – File-based (not block-based)
    • Spreads files over discs to gain parallelism
    • Each disc can still use UDF (for data recovery in extreme cases)
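
File-based scheduling over a 12-disc group could look like the sketch below. The slides only state that whole files are spread over the discs of a group; the greedy "largest file first, least-filled disc next" policy, the function name `spread_files`, and the error handling are assumptions made for illustration.

```python
import heapq

GROUP_SIZE = 12  # discs per group, as stated on the slide


def spread_files(file_sizes: dict, disc_capacity: int,
                 group_size: int = GROUP_SIZE) -> list:
    """Assign whole files to discs, always picking the least-filled disc.

    Returns one list of file names per disc. Files are never split here,
    which keeps each disc independently readable.
    """
    heap = [(0, disc) for disc in range(group_size)]   # (bytes used, disc index)
    heapq.heapify(heap)
    layout = [[] for _ in range(group_size)]
    # Placing large files first tends to balance the discs better.
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        used, disc = heapq.heappop(heap)
        if used + size > disc_capacity:
            # The least-filled disc cannot take it, so no disc in the group can.
            raise ValueError(f"{name} does not fit in this disc group")
        layout[disc].append(name)
        heapq.heappush(heap, (used + size, disc))
    return layout


files = {f"file{i:02d}": 10 for i in range(24)}   # sizes in arbitrary units
layout = spread_files(files, disc_capacity=25)
```

Because every disc in the group receives files, all drives loaded with that group can burn concurrently, and any single disc still holds complete files.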

Virtualization

• One uniform logical view
  – Standard POSIX interface for ease of use
    • A single huge file volume (1.2 PB in our case), just like a network drive
  – Hides all underlying medium-specific complexity
    • Medium grabbing
    • Identification
    • Formatting
    • Data splitting/splicing
    • Redundancy allocation
    • Writing
    • Verification
    • ...
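
Of the hidden steps listed above, data splitting/splicing is the easiest to sketch. The example below is a deliberately simplified model, not the system's layout: it assumes a large file is laid out end to end over consecutive discs of equal capacity and ignores redundancy and file system overhead. `Extent` and `split_range` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    disc: int      # index of the disc holding this piece
    offset: int    # byte offset within that disc
    length: int    # number of bytes in this piece


def split_range(file_offset: int, length: int, disc_capacity: int) -> list:
    """Map a byte range of a large file onto per-disc extents.

    Reading the extents in order and concatenating them splices the
    original range back together.
    """
    extents = []
    while length > 0:
        disc, offset = divmod(file_offset, disc_capacity)
        piece = min(length, disc_capacity - offset)   # stop at the disc boundary
        extents.append(Extent(disc, offset, piece))
        file_offset += piece
        length -= piece
    return extents


# A 30-byte read that straddles the boundary between disc 0 and disc 1.
pieces = split_range(file_offset=90, length=30, disc_capacity=100)
```

The caller of the POSIX interface only sees the file offset and length; the disc indices stay internal to the library.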

Tape Library Compatibility

• Based on our virtualization technology, the optical storage system can easily expose additional external interfaces.
• To be compatible with existing tape library appliances:
  – A Virtual Tape Library (VTL) interface is implemented
  – The whole optical library is emulated as a tape library
  – A virtual tape is implemented as a file in the virtualized storage space
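
The idea of a virtual tape backed by a file can be sketched as follows. This is an illustration only: the record format (a 4-byte length prefix), the class name `VirtualTape`, and the append-only behavior are assumptions, and a temporary directory stands in for the virtualized storage space. A real VTL would also speak a tape protocol such as SCSI, which is out of scope here.

```python
import os
import struct
import tempfile
from typing import Optional


class VirtualTape:
    """A tape emulated as one file holding sequential, length-prefixed records."""

    _HEADER = struct.Struct(">I")   # 4-byte big-endian record length (assumed format)

    def __init__(self, path: str) -> None:
        self._file = open(path, "a+b")   # append for writes, seekable for reads
        self.rewind()

    def rewind(self) -> None:
        """Move the read position back to the beginning of the tape."""
        self._file.seek(0)

    def write_record(self, payload: bytes) -> None:
        """Append one record at the end of the tape."""
        self._file.seek(0, os.SEEK_END)
        self._file.write(self._HEADER.pack(len(payload)) + payload)

    def read_record(self) -> Optional[bytes]:
        """Return the next record, or None at end of tape."""
        header = self._file.read(self._HEADER.size)
        if len(header) < self._HEADER.size:
            return None
        (length,) = self._HEADER.unpack(header)
        return self._file.read(length)

    def close(self) -> None:
        self._file.close()


with tempfile.TemporaryDirectory() as workdir:
    tape = VirtualTape(os.path.join(workdir, "tape0001.img"))
    tape.write_record(b"alpha")
    tape.write_record(b"beta")
    tape.rewind()
    records = [tape.read_record(), tape.read_record(), tape.read_record()]
    tape.close()
```

Sequential, append-style access matches both tape semantics and write-once optical media, which is part of why the emulation is a natural fit.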

Meta-data layout

• Small but frequently accessed
  – Buffered and periodically flushed
• Avoid a single point of failure
  – Distributed layout
• Self-describing
  – Meta-data should be stored adjacent to the data it describes
• Pseudo-overwrite
  – Provides file update functionality on write-once media
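
Pseudo-overwrite on write-once media can be modeled as an append-only log plus a mutable index that points at the newest version of each file. The sketch below is one assumed way to express that idea; `WriteOnceStore` and its methods are hypothetical names, and the in-memory list stands in for data already burned to disc.

```python
class WriteOnceStore:
    """Append-only storage with an index that makes updates look like overwrites."""

    def __init__(self) -> None:
        self._log = []       # burned records: (path, data); never modified
        self._latest = {}    # path -> position of the newest version in the log

    def write(self, path: str, data: bytes) -> None:
        """'Overwrite' a file by appending a new version and repointing the index."""
        self._log.append((path, data))
        self._latest[path] = len(self._log) - 1

    def read(self, path: str) -> bytes:
        """Return the newest version of the file."""
        return self._log[self._latest[path]][1]

    def versions(self, path: str) -> list:
        """Return every version ever written, oldest first."""
        return [data for p, data in self._log if p == path]


store = WriteOnceStore()
store.write("/reports/q1.txt", b"draft")
store.write("/reports/q1.txt", b"final")
current = store.read("/reports/q1.txt")
history = store.versions("/reports/q1.txt")
```

Old versions remain physically present, so the index (the meta-data) is the only part that changes, which ties in with keeping meta-data buffered on faster tiers and flushing it periodically.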

Full-text indexing, data security and integrity

• Full-text indexing
  – Time-consuming indexing operations are decoupled from the critical I/O path
• Encryption and signature functions are needed on the critical I/O path
  – Performance considerations
    • GPGPU
    • ASIC
    • CPU instruction-set extensions
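
The split between the critical path and deferred work can be sketched with a queue and a background thread. In this illustration an HMAC-SHA256 tag stands in for the signature function (an assumption; the slides do not name an algorithm), it is computed inline during the write, and the indexing is handed to a worker. All names (`write_file`, `index_worker`, the demo key) are hypothetical.

```python
import hashlib
import hmac
import queue
import threading

index = {}                       # word -> set of file names containing it
index_queue = queue.Queue()      # work handed off from the write path


def index_worker() -> None:
    """Background indexing, kept off the critical I/O path."""
    while True:
        item = index_queue.get()
        if item is None:         # sentinel: no more work
            break
        name, data = item
        for word in data.decode("utf-8", errors="ignore").lower().split():
            index.setdefault(word, set()).add(name)


def write_file(name: str, data: bytes, key: bytes) -> str:
    """Critical path: compute the integrity tag now, defer the indexing."""
    tag = hmac.new(key, data, hashlib.sha256).hexdigest()
    index_queue.put((name, data))
    return tag


worker = threading.Thread(target=index_worker)
worker.start()
tag = write_file("a.txt", b"cold archival data", key=b"demo-key")
index_queue.put(None)
worker.join()
```

The write returns as soon as the tag is computed; how fast that step runs is exactly where the acceleration options listed on the slide would apply.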

High availability

• The error rate of a single disc cannot meet the enterprise-level requirement (1×10^-21)
• Two-level erasure coding
  – Intra-disc
    • Reserved physical region for redundancy
  – Inter-disc
    • Disc-group based
• Data layout
  – A distributed parity layout (as in RAID 5/6) is not needed
  – Dedicated disc(s) hold the redundancy
  – Each disc still uses UDF
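
Inter-disc redundancy with a dedicated redundancy disc can be illustrated with plain XOR parity, the simplest erasure code. The slides do not say which code the system uses, so this is a stand-in: one parity disc per group, tolerating the loss of any single disc. Short byte strings stand in for whole disc images.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data discs (equal-length stand-ins for disc images) and one parity disc.
data_discs = [b"disc-0..", b"disc-1..", b"disc-2.."]
parity_disc = xor_blocks(data_discs)

# Disc 1 becomes unreadable: rebuild it from the survivors plus the parity disc.
survivors = [d for i, d in enumerate(data_discs) if i != 1]
recovered = xor_blocks(survivors + [parity_disc])
```

Because the parity lives on its own disc rather than being striped, every data disc keeps an ordinary, self-contained file system and can be read on its own.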

Data De-duplication

• High redundancy in archival data has been demonstrated.
• Tradeoff between a high de-duplication ratio and data retrieval throughput
  – Thousands of discs vs. dozens of drives
  – Technologies from magnetic storage cannot be ported directly
• Performance considerations
  – SSD layer: indexing
  – HDD layer: fingerprints

Conclusion

• We discussed the following main design issues
  – Architecture
  – Scheduling Scheme
  – Virtualization
  – Linear Tape Library Compatibility
  – Meta-data layout
  – Full-text indexing
  – Data Security and Integrity
  – High availability
  – Data de-duplication
• These issues should be considered and addressed when developing massive optical archival storage systems.

Thank You!