DRESS Code for the Storage Cloud Distributed Replication-Based Exact Simple Storage Salim El Rouayheb EECS Department University of California, Berkeley

DRESS Code For The Storage Cloud Distributed Replication-based Exact Simple Storage Salim El Rouayheb EECS Department University of California, Berkeley Joint work with: Sameer Pawar Nima Noorshams Prof. Kannan Ramchandran Talk Roadmap I Distributed Storage Background Failures & Repairs II & Motivation Reliability & Security Low Complexity Redundancy & Codes DRESS Uncoded repair Projective planes Codes Bins & balls III Malicious attacks Network coding Separation result Security Secure capacity Data Everywhere Coffee shop Airport Office Home • Increasing popularity of data-enabled devices and Hot spots, 4G, Wimax, Wi-fi etc, => “always online” infrastructure. • Truly mobile users • Want to be able to seamlessly access and share data anytime anywhere Off To The Cloud… • Want our files to “follow” us wherever we go Put them in the cloud Get them from the cloud Data Centers P2P Systems Cloud Applications Amazon Web Services Wuala: P2P Cloud Storage • Trade idle local disk space on your computer for online storage • First 1GB is free Cloud storage = local donated storage * percentage of online time 70 GB 100 GB 70% Can We Trust the Cloud? Main Challenge: Cloud is formed of unreliable and untrusted components CloudyDistributed with a Chance Storage of Failure Data center Server Disk failure virus user peer Component failures are the norm rather then exception. • Nodes are unreliable by nature: - Hard disk failure - Network failure Solution: store data - Peer churning redundantly - Malicious nodes Multiple System Considerations Coding for cloud storage Decoding complexity C rate “Traditional” Coding Theory Escher “Impossible” Cube Different Codes for Different Folks Replication (4,2) Erasure code New node New node n n 1 a a 1 a a a b 2MB File 1MB Download a+b n2 a n2 b a b 1 MB Download n n 2MB 3 b 3 a+b n n 4 b 4 a+2b tolerates 1 failure tolerates 2 failures Reliability Bandwidth Complexity/ Disk I/O Reducing Bandwidth Overhead • Repair BW can be reduced from 2 to 1.5 MB • Helper nodes send random linear combinations: random linear network coding .5MB • New data “functionally equivalent” to lost a 1 data. a2 • Repair BW can be reduced from 1.5 MB to a1 a2 b1 b2 1.2 MB if each node is willing to store 20% more. a, b b1 b2 b +b 2MB 1 2 a1+b1 a +b +a +b a2+b2 1 1 2 2 .5MB d=3 .5MB a +2b 1 1 a1+a2 .5MB 2a2+b2 a +2b +2(a +2b ) 1 1 2 2 a1+4a2 Dimakis, Wu & Ramchandran ‘07 Replacement node Fundamental Storage/Bandwidth Tradeoff cost Internet, Wireless MBR Min. Bandwidth Regime Storage Archival applications Min. Storage Regime MSR MDS (Dimakis et al. ’07) Repair BW We Want More: Exact Repair? • We want exact repair: a1 – Maintain a systematic copy of the n 1 a data 2 – Reduce system overhead b1 n2 – Security b2 ? a1 a1+b1 n3 a2 • Exact repair makes the problem 2a2+b2 Exact much harder 2a +b repair n4 1 1 – Relation to non-multicast network a2+b2 coding • Recent important progress: Rashmi et al. (Product-Matrix Codes), Suh & Ramchandran ‘10, Cadambe et al. ‘10 Storage/Bandwidth Tradeoff This Talk cost Min. Bandwidth Regime Recent Result: (Shah et al ’10) MBR Only MBR and MSR points are achievable under exact repair Storage Interference alignment MSR techniques Min. Storage Regime Repair BW Can we also reduce Disk I/O & Energy? Bandwidth-efficiency comes with a price-tag. a 1) Excessive data reading & 1 a computations 2 2) Read/write bandwidth of a hard b1 b +b disk is the bottleneck (100 MB/s vs. b2 1 2 New node network speed ~1000 MB/s) a +2a +b +b a1 a1+b1 1 2 1 2 a2 2a2+b2 Can we have exact codes with minimum overheads of: 2a1+b1 1) Repair bandwidth a2+b2 2) Disk reads 3) Computations? Contributions: Yes We Can • DRESS codes: Exact Repair Minimum disk reads for repair No data processing for repair: uncoded repair Maximum Distance Separable MDS Code Fractional Repetition file Distributed Replication-based Exact Storage Simple Storage Cloud • Surprise: no loss of bandwidth efficiency • Bonus: well-suited for security applications • System price: repair not as flexible w.r.t. who can help Talk Roadmap I Background II & Motivation Uncoded repair DRESS Projective planes Steiner Systems Codes Bins & balls III Security DRESS Code Example d=3MB (n,k)=(7,3) 1 2 3 n Max file size= 6MB: cut-set bound 1" (Dimakis et al.) 3 4 5 n 2" K=3 parity-check File 6 MB 1 5 6 n3" 1 2 3 4 5 6 MDS 1 2 3 4 5 6 7 Code 1 4 7 n 4" User 2 5 7 Gets 6 different n New parameter 5" packets repetition factor r=3 3 6 7 n6" 2 4 6 n7" n=7 nodes Repairing DRESS Codes failure (n,k)=(7,3) parity-check New node File 6 MB 1 2 3 1 2 3 n1" 1 2 3 4 5 6 7 3 4 5 n2" ! Repetition r=3can tolerate up to 2 simultaneous 1 5 6 n3" failures ! Table based repair 1 4 7 n4" •! If 3 nodes fail? 2 5 7 –! Download complete file n5" –! Reconstruct the lost data 3 6 7 –! Rare bad even n6" ! Min repair bandwidth 2 4 6 n7" ! Min disk I/O calls ! Uncoded repair Why “Traditional” Repetition Won’t Work? Max file size= 6MB: cut-set bound 1 2 3 (Dimakis et al.) n1" 1 2 3 parity-check n2" File 6 MB 1 2 3 4 5 6 7 1 2 3 n 3" user gets ONLY 3MB 4 5 6 (n,k)=(7,3) n4" repetition factor r=3 4 5 6 n5" 4 5 6 n6" 7 7 7 n7" Smart Packet Placement How should we place the packet replicas? Goal: maximize file size k(k 1) Max file size: C = kd − (Dimakis et al.) − 2 Penalty term due # nodes contacted to intersection by user # packets/node k(k 1) k Rule: Make sure that any − = two nodes have at most one 2 2 packet in common Where Did That Solution Come From? Lines in projective planes 1 2 3 intersect in exactly one point n1" 3 4 5 n2" 1 5 6 n3" 1 4 7 n4" 2 5 7 Fano Plane n5" Projective plane of order 2 3 6 7 n6" Points Packets 2 4 6 n Lines Storage nodes 7" Higher Order Projective Planes order m=2 m=3 m=4 •! Projective planes are guaranteed to exist for any prime power order m. Codes from Projective Planes Projective plane of order m=3 1 12 2 11 3 1 1 Projective Plane Storage System 1 10 13 4 Lines Storage nodes 9 5 Points Packets 8 6 7 Can we have constructions spanning n=13, r=d=4 more system parameters? Theorem 1 (R. & Ramchandran Allerton’10) A projective plane of order m gives DRESS codes with n=m2+m+1 nodes and repetition factor r=repair degree d = m+1. Kirkman’s Schoolgirls Problem “Fifteen young ladies in a school walk out three abreast for seven days in succession; it is required to arrange them daily so that no two shall walk twice abreast.” Kirkman, 1847 •! Fifteen schoolgirls: A, B,…, O •! They always take their daily walks in groups of threes group 1 group 2 group 3 group 4 group 5 Sun ABC DEF GHI JKL MNO Mon ADH BEK MOI FLN GJC … ? ? ? ? ? Sat ? ? ? ? ? •! Arrange them such that any two schoolgirls walk together exactly once a week Why Do We Care? Storage node A solution to Kirkman’s problem Sun ABC DEF GHI JKL MNO Mon ADH BEK CIO FLN GJM Tue AEM BHN CGK DIL FJO 7 days Wed AFI BLO CHJ DKM EGN Thu AGL BDJ CFM EHO IKN Fri AJN BIM CEL DOG FHK Sat AKO BFG CDN EIJ HLM 5 groups •! Think of schoolgirls as packets and groups as storage nodes •! Kirkman’s solution gives an optimal DRESS code for a ! System with n=35 nodes ! Each node stores 3 packets •! What is the repetition factor here? answer=7 Not Your Average Puzzle… •! Kirkman’s schoolgirls problem initiated a branch of combinatorics known now as Design Theory •! Deep connections to other branches in mathematics: Algebra, Geometry, matroid theory… •! Applications: –! Coding Theory –! Cryptography –! Statistics Steiner Systems A Steiner system is a collection of points & lines such that: 1.! There are points in total 2.! Each line contains exactly points We want t=2 3.! There is exactly one line that contains any given points 1 2 3 4 5 6 7 8 9 Fano plane =S(2, 3, 7) S(2,3,9) # pts on a line Total # points Codes from Steiner Systems Theorem 2: (R. & Ramchandran Allerton’10) A Steiner system gives a capacity-achieving DRESS code with parameters using correspondence lines: nodes, points: packets. DRESS Codes Open problem I ? ? DRESS codes Projective beyond Steiner Steiner Dual planes Systems codes systems? ? ? ? Min Reads Capacity Min Bandwidth Capacity Lookup Minimum k C = k d Repair! reads! × − 2 Open problem II Capacity for minimum reads (and lookup repair)? " Example (n,k,d)=(7,3,3) 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 1 4 7 7 8 9 2 5 8 3 6 9 • Users contacting any 3 nodes will observe either 9 or 7 packets • This code guarantees a “rate” = 7 to any user • Capacity when repairing from any d nodes = 6 We Want More • Explicit constructions for Steiner systems exist for small values of r=3,4,5 • Existential guarantees for large number of nodes [Wilson 75] • Want more flexibility • Want to “grow”.

DRESS Code for the Storage Cloud Distributed Replication-Based Exact Simple Storage Salim El Rouayheb EECS Department University of California, Berkeley

Projective Geometry: a Short Introduction

Robot Vision: Projective Geometry

Projective Planarity of Matroids of 3-Nets and Biased Graphs

COMBINATORICS, Volume

Octonion Multiplication and Heawood's

Collineations in Perspective

Matroids You Have Known

The Turan Number of the Fano Plane

Finite Projective Geometry 2Nd Year Group Project

7 Combinatorial Geometry

Weak Orientability of Matroids and Polynomial Equations

Projective Planes