Data & Storage Services

CERNBox + EOS: Cloud Storage for Science

Presenter: Luca Mascetti

Thanks to: Jakub T. Mościcki, Andreas J. Peters, Hugo G. Labrador, Massimo Lamanna

CERN/IT-DSS

CERN IT Department, CH-1211 Geneva 23, Switzerland (www.cern.ch/it)

Content

• What we have done
• What we do
• What we will do

CERNBox

The origins of the CERNBox project

• Missing link? CERNBox
• 4500 distinct IPs in DNS from cern.ch to *..com (daily...)
• What we are missing
  • easy access: “cloud storage for end users”
  • files go ‘automatically’ to the cloud and are available ‘always everywhere’
    • broken laptop ≠ data lost
  • offline access to data
    • work on the plane and when back online
  • keep files in sync across devices
    • access on mobile clients
  • (easy) sharing of files with colleagues
    • still surprisingly difficult
• Can we have this?
  • for “documents” (small files, often ppts, text, ...)
  • for “science data” (integrated into data processing workflows and existing infrastructure)

Original architecture (CERNBox beta service)

[Architecture diagram: the USER's sync client () and web access (https) connect over HTTPS to a load balancer (Apache, PHP 5.4 (SCL1.0), mod_proxy_balancer) in front of redundant ownCloud application servers (64 cores, 64GB RAM); data flow and metadata flow are shown separately.]

• Setup: 100% RH6 on “standard” hardware
• Based on ownCloud
• Guaranteed failover (redundant nodes)
• SQL overheads: MySQL (48GB RAM) keeps track of the sync state for every file in the DB (Hz metadata ops)
• Filesystem (POSIX): NFS servers, async, SW RAID 1; files not exposed directly to the user; initial space: 20 TB

Usage of the beta service

CERNBox Beta (2014)   March    April   May     June    October
users                 190(*)   285     361     429     720
files                 191K     907K    1.6M    2.7M    6.4M
size                  480GB    1TB     1.5TB   1.9TB   3.4TB

(*) users inherited from the initial prototype deployment

[Charts: size per user, avg ~5GB (84% <10GB, 15% >10GB, 1% up to 100GB); files per user, avg ~10K files (94% <5K, 5% 5K-20K, 1% up to 100K).]

File access patterns

• GET/PUT ratio: 2/1
• File type distribution (a counting sketch follows below):
  • 1200 different file extensions!
  • 30% . .h .C
  • 30% .jpg .png
  • 15% no extension (UNIX world!)
  • 25% other: .pdf, .txt, .ppt, .docx, .root, .py, .eps, .tex
• ~100 URL shares, ~40 synced shares
• UNICODE filenames: greek, russian, thai(?)
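Such an extension breakdown is easy to reproduce; a minimal sketch in Python, assuming the file names are already available as a list of paths (the sample paths below are purely illustrative):

```python
from collections import Counter
from pathlib import PurePosixPath

def extension_histogram(paths):
    """Count file extensions (lower-cased); files without an extension are grouped under <none>."""
    counts = Counter(PurePosixPath(p).suffix.lower() for p in paths)
    total = sum(counts.values())
    return {ext or "<none>": n / total for ext, n in counts.most_common()}

# Illustrative sample only; in practice the list would come from the storage namespace.
sample = ["/eos/user/a/alice/talk.pptx", "/eos/user/b/bob/histo.root", "/eos/user/b/bob/Makefile"]
for ext, frac in extension_histogram(sample).items():
    print(f"{ext:8s} {frac:6.1%}")
```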

Pilot limitations

• Move
  • On the origin client a move is propagated to the server
  • On the other clients it is propagated as COPY/DELETE (suboptimal)
• Symlinks are not supported
• Ignored files: , : ? * " < > |
• We currently recommend a single sync folder setup: ~/cernbox
• High per-file overhead (see the back-of-envelope sketch after this list)
  • Expect 2-5 Hz PUT
  • Expect ~10 Hz GET
• Transfer rates
  • Expect 10-30 MB/s download
  • Expect 5-10 MB/s upload
• Larger files: 400MB file on a standard desktop
  • https upload: ~25 MB/s, https download: ~60 MB/s
• For wireless devices, laptops, phones... do we care about transfer rates?
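As a back-of-envelope illustration of why the per-file overhead dominates for many small files (the 10,000-file folder is a hypothetical example; the rates are the ones quoted above):

```python
def sync_time_estimate(n_files, rate_hz):
    """Seconds needed to push n_files at a given per-file request rate."""
    return n_files / rate_hz

n_files = 10_000  # hypothetical folder of small files
for rate in (2, 5):  # PUT rates quoted for the beta service
    t = sync_time_estimate(n_files, rate)
    print(f"{n_files} files at {rate} Hz PUT: ~{t / 3600:.1f} h")
# At 2-5 Hz the upload is dominated by per-file overhead, not bandwidth.
```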

Towards large-scale data sync and share

• Currently deployed: CERNBox beta
  • works OK so far for the classical Dropbox use-case
  • low-frequency document sync and share
• But can we bring this system to the next level?
• Our core business and large-scale workloads
  • expose PBs of existing data from day 1
  • integration into the physics data processing eco-system
    • central services: batch, interactive
    • data analysis applications
  • sync higher data volumes at higher rates
• Can we still keep the simplicity of cloud storage access?

Massive scaling at reduced cost?

• No need to keep track of all files and directories in the database
  • avoids explosive growth of your DB infrastructure
• Our file number estimate? With 10K users we have 2.5 billion files in AFS already!
• What is your number for 100K users?

• Before we start throwing hardware at the problem... consider the cost of running the service
  • Fixed: hardware purchase, service deployment, infrastructure
  • Scaling: hardware incidents; user support; integrity checks; upgrades
  • Infrastructure: space, electricity and cooling in the data center

• For massive scaling we need to keep TCO under control
  • profit from the existing large-scale operations and support of our storage services
  • exploit economies of scale

Integration

• Started in May 2014
• Functionality
  • Enable sync and share for existing data in EOS
  • Without exporting data to another storage
  • Direct access to data with efficient sync behind
• Operations
  • The NFS/async backend server is a temporary solution
  • EOS offers “virtually unlimited” cloud storage for end-users
  • Fold the operation cost into EOS...
• But:
  • Integrate as transparently as possible
    • most users don't care about the storage backend
  • Fully working solution compatible with clients
    • we don't want to end up with a half-working CERN-specific solution

EOS Integration Details

• Understanding the sync protocol and its underlying semantics
• Adding a few consistency features to EOS (e.g. atomic upload; a sketch follows below)
• Adding a few new features to EOS or lifting restrictions (e.g. UTF-8 support)
• Beefing up the webdav endpoint to allow owncloud clients to talk directly to it
• Integrating web-access and sharing functionality
  • Web frontend: develop new plugins
  • Nice integration of trashbin, versions and sharing:
    • fusion between the owncloud model and the EOS model (Hugo G. Labrador)
• Making the less-stressed parts of EOS (http/webdav) more robust

• Lots and lots of testing.
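A minimal sketch of the atomic-upload idea over WebDAV, assuming an endpoint that accepts plain PUT and MOVE: the file is uploaded under a temporary name and only renamed to its final name once the transfer has completed, so a reader never sees a half-written file. The URL, credentials and helper name are placeholders, not the actual EOS implementation.

```python
import uuid
import requests  # plain WebDAV via HTTP verbs

def atomic_put(base_url, path, local_file, auth):
    """Upload to a temporary name, then MOVE into place once complete."""
    tmp = f"{path}.upload-{uuid.uuid4().hex}"  # temporary server-side name
    with open(local_file, "rb") as f:
        r = requests.put(f"{base_url}{tmp}", data=f, auth=auth)
        r.raise_for_status()
    # WebDAV MOVE renames on the server; the final name only appears
    # once the whole file is there.
    r = requests.request(
        "MOVE",
        f"{base_url}{tmp}",
        headers={"Destination": f"{base_url}{path}", "Overwrite": "T"},
        auth=auth,
    )
    r.raise_for_status()

# Example (placeholder endpoint and credentials):
# atomic_put("https://cernbox.example.ch/remote.php/webdav", "/Documents/report.pdf",
#            "report.pdf", ("user", "password"))
```

Whether the final MOVE is truly atomic depends on the server; the point of the EOS "atomic upload" feature is to guarantee exactly that property on the storage side.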

CERNBox 2.0 Architecture

[Architecture diagram: sync clients (webdav) and web access (https) reach the service over HTTPS through load balancers; the ownCloud layer serves private data over https, internal traffic over http and public data over http. Data are directly accessible by the user; fuse metadata operations run at kHz rates. All sync state is kept as metadata in the storage (EOS); IO is redirected to the disk servers (1000s) and files are written with the USER's credentials.]

Prototype deployment on EOSPPS

• /eos/user//
  • this is the default sync and web-enabled folder
• ...as an advanced user you may add an arbitrary folder from EOS
  • very easy to implement a folder shared by an e-group
• We can also allow transparent access to different instances

First performance numbers

• User-perceived performance (client)
• Metadata operations (pycurl with SSL sessions; a pycurl sketch follows below)
  • PROPFIND with 1 entry: 90 Hz
  • PROPFIND with 1K entries: 8.5 kHz
  • PROPFIND with 10K entries: 10 kHz

• Nice speed, e.g.:
  • kernel src tree upload (50K files, 500MB): ~1h from a laptop on wifi at home; download: ~20 min
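A minimal sketch of how such a PROPFIND timing can be taken with pycurl, reusing a single curl handle so the TLS session is kept alive between requests; the endpoint URL, credentials and request count are placeholders:

```python
import time
import pycurl
from io import BytesIO

URL = "https://cernbox.example.ch/remote.php/webdav/"  # placeholder endpoint
N = 100                                                # number of requests to time

c = pycurl.Curl()                        # one handle -> connection and TLS session reuse
c.setopt(pycurl.URL, URL)
c.setopt(pycurl.CUSTOMREQUEST, "PROPFIND")
c.setopt(pycurl.HTTPHEADER, ["Depth: 1"])
c.setopt(pycurl.USERPWD, "user:password")  # placeholder credentials

start = time.time()
for _ in range(N):
    buf = BytesIO()
    c.setopt(pycurl.WRITEDATA, buf)      # collect the multistatus XML response
    c.perform()
elapsed = time.time() - start
c.close()

print(f"{N / elapsed:.1f} PROPFIND/s (Depth: 1)")
```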

[Bar chart: ops/s for small files (10KB); series: pycurl sequential, pycurl P=10, pycurl P=50, owncloud sync client; categories: Download, Upload, Delete; values range from ~11 to ~200 ops/s.]

Summary

• Working and usable beta service
• Useful for getting experience, user feedback and understanding what we want / don't want in the final production system (CERNBox based on EOS)

• Advanced integration into EOS will open up new possibilities
  • but there is no free lunch: we will have to adapt to evolving owncloud clients, etc.

• Heading towards a large sync and share layer for science research
  • all our data exposed from day 1
  • massive scalability, high performance
  • integrated into existing workflows - new capabilities!
  • small overhead on top of our existing operations and development
    • TCO control
  • ...and still as easy to use as Dropbox.com...

Integrated storage ecosystem for scientific research

[Ecosystem diagram: CERNBox 2.0 gives the USER sync / share / offline access via webdav & https://, plus online file-system access via fuse; analysis clusters get high-performance application access via xrootd://; central services get batch access via xrdcopy; all on top of the LARGE-SCALE STORAGE.]

• agenda full
• ~35 participants
• Tracks
  • Keynote: B. Pierce
  • Technology
  • Users
  • Site reports
  • Vendor talks: IBM, Powerfolder, SeaFile, Owncloud
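For the batch and application access paths, the same files can be fetched with the standard XRootD copy tool instead of the sync client; a minimal sketch that shells out to it from Python (the instance name and file path are placeholders):

```python
import subprocess

def fetch_from_eos(remote_path, local_path,
                   instance="root://eosuser.example.cern.ch"):
    """Copy a file out of EOS with the XRootD copy tool (placeholder instance name)."""
    # xrdcp <source> <destination>; check=True raises CalledProcessError on failure
    subprocess.run(["xrdcp", f"{instance}/{remote_path}", local_path], check=True)

# Example (placeholder path): copy a file from the CERNBox area to the local job directory
# fetch_from_eos("/eos/user//analysis/ntuple.root", "./ntuple.root")
```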

CERNBox 2.0 – some numbers

• Advanced prototype stage
  • Adapted the existing webdav interface in EOS to be compatible with owncloud sync clients
• Test environment (EOSPPS)
  • standard hardware
  • namespace node with Xeon 2.2GHz, 16 cores, 24GB RAM
  • 50 disk servers: cheap JBODs (1000 disks), total 800TB usable space
  • Storage layout: 2 replicas in RAIN mode
    • every file PUT = 2 copies of the file on two independent storage nodes (with adler32 checksums of the content; a sketch follows below)
  • Event-based http(s) load-balancer ()
• Underlying storage scalability (EOS Prod)
  • Max observed IO: ~40GB/s on a single instance (eosatlas => )
  • Max observed file stats: 10s of kHz
  • Thousands of connected clients
• The server should never be a bottleneck for CERNBox...
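The adler32 content checksum mentioned above can also be computed client-side, e.g. to cross-check a transferred file; a minimal sketch with Python's zlib (the file name is a placeholder):

```python
import zlib

def adler32_of_file(path, chunk_size=1024 * 1024):
    """Stream a file through zlib.adler32 and return the checksum as 8 hex digits."""
    checksum = 1  # adler32 starting value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return f"{checksum & 0xffffffff:08x}"

# Example (placeholder file name):
# print(adler32_of_file("ntuple.root"))
```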
