ViewBox: Integrating Local Systems with Cloud Storage Services Yupu Zhang, Chris Dragga, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau

Cloud-Based Our Solution - ViewBox File Synchronization Services integrated file system and cloud storage • Incredibly popular: Dropbox has 100 million+ users • Local detection + Cloud-aided recovery • Back up files to the cloud − Rely on strong local file system to detect problems • Synchronize files across multiple clients Data Pollution − Utilize cloud data to recover from local failures Is Your Data Really Safe? Crash/Causal file system state = correct state Inconsistency 2 • Orchestrated synchronization based on views • Local file system is the weakest − Views: In-mem snapshots of valid file system state − Corruption and inconsistency are exposed 2 2 1 2 − Expose file system’s views so that client sees file system state ≠ correct state 8 what file system sees • Ad-hoc synchronization is harmful 2 cloud state = file system state − Sync client sees what regular application sees, but • Built around ext4 and two services: Dropbox and Seafile not what file system sees Many copies do NOT − than 5% overhead for workloads cloud state ≠ file system state 8 always your data safe − Up to 30% reduction of sync in some cases

ViewBox Architecture ᬅ ext4-cksum ᬆ Cloud Helper

Local Cloud • User-level daemon Super Group Block Inode Inode Data block Descriptors Bitmap Bitmap Table Region Blocks − to local FS through ioctl Other Cloud Sync Client − Communicate with cloud through web API Application Helper

• Pre-allocated checksum region (~0.1% overhead) • Recover from corruption View − Each checksum maps to a data block − Fetch correct block from cloud FS’s view Manager − 32-bit CRC checksum per 4KB block • Recover from crash • Detect data corruption & inconsistency − Recover inconsistent files inotify ext4-cksum − Rollback FS to latest synced view on cloud

ᬇ View Manager

• Local detection (ext4-cksum) Tech #1 - Cloud Journaling Tech #2 - Incremental Snapshotting − Detect corruption & inconsistency using cksum Treat cloud storage as external journal Decouple namespace and data − Initiate recovery − Local FS is dedicated to the entire sync folder Synced Views 4 Synced Views 4 Frozen View 4 5 Frozen View 4

• Cloud-aided recovery (Cloud Helper) namespace Active View 4 5 Active View 4 5 − Recover from corruption and crashes using changed file

synchronized views on cloud FS Epoch E0 E1 E2 E3 FS Epoch E0 E1 E2 Create frozen views FS epochs Log namespace changes and data changes • View-based synchronization (View Manager) − Basis for consistency and correct recovery Synced Views 4 5 Synced Views 4 Re-generate inotify events − Present file system’s view to sync service namespace Frozen View Frozen View − Other applications’ view remains the same 5 6 4 changed file 5 (COW) namespace Active View 6 Active View 6 • Three types of views changed file E E E E − Active view (local) => current FS state FS Epoch E0 E1 E2 3 FS Epoch 0 1 2 − Frozen view (local) => last snapshot in memory Upload frozen views to cloud Apply namespace changes to last frozen view − Synced views (on cloud) => uploaded views Store multiple views on cloud for recovery Data is left in FS, but marked COW

Evaluation Photo Viewing Photo Viewing Photo Editing ext4 ViewBox ext4 ViewBox ext4 ViewBox • Reliability => fault injection 180 700 3000 160 − ViewBox is able to detect corruption 600 2500 140 500 and recover from it using cloud data 120 2000 − Upon a crash, ViewBox downloads 100 400 1500 consistent versions of files from cloud 80 300 60 1000 200 − Causal ordering is preserved (sec) Runtime Sync time (sec) Sync 40 time (sec) Sync 500 20 100 • Performance => trace replay 0 0 0 − Replay iphoto_view and iphoto_edit Dropbox Seafile Dropbox Seafile Dropbox Seafile traces from iBench − Measure workload runtime and • ViewBox does not affect • Huge increase in sync time with • ViewBox and Seafile improve sync synchronization time foreground workload ViewBox and Dropbox, due to time due to reduced interference • Runtime overhead <5% Dropbox’s lacking of view-API from foreground updates