Data & Storage Services
CERNBox + EOS: Cloud Storage for Science
Presenter: Luca Mascetti
Thanks to: Jakub T. Mościcki, Andreas J. Peters, Hugo G. Labrador, Massimo Lamanna
CERN/IT-DSS
Content
• What we have done
• What we do
• What we will do
CERNBox
The origins of the CERNBox project
• Missing link? CERNBox
• 4500 distinct IPs in DNS from cern.ch to *.dropbox.com (daily...)
• What we are missing:
  • easy access: “cloud storage for end users”
  • files go ‘automatically’ to the cloud and are available ‘always everywhere’
    • broken laptop ≠ data lost
  • offline access to data
    • work on the plane and rsync when back online
  • keep files in sync across devices
  • access on mobile clients
  • (easy) sharing of files with colleagues
    • still surprisingly difficult
• Can we have this?
  • for “documents” (small files, often ppts, text, ...)
  • for “science data” (integrated into data processing workflows and existing infrastructure)
Original architecture (CERNBox beta service)
[Architecture diagram: sync clients (webdav) and web access (https) go through a load balancer (Apache, PHP 5.4 (SCL 1.0), mod_proxy_balancer; 64 cores, 64 GB RAM) to redundant ownCloud application servers; data and metadata both flow over HTTPS.]
• Setup: 100% RH6 on “standard” hardware
• Based on ownCloud
• Guaranteed failover (redundant nodes)
[Diagram annotations: SQL overheads (Hz metadata ops) - the MySQL server (48 GB RAM) keeps track of sync state for every file in the DB; storage on a POSIX filesystem (NFS servers, async, SW RAID 1), files not exposed directly to the user; initial space: 20 TB. Image courtesy of www.phdcomics.com]

Usage of the beta service
CERNBox Beta 2014   March     April    May      June     ...  October
users               190 (*)   285      361      429           720
files               191K      907K     1.6M     2.7M          6.4M
size                480GB     1TB      1.5TB    1.9TB         3.4TB
(*) users inherited from the initial prototype deployment
[Charts: Size per user - avg ~5GB (84% <10GB, 15% >10GB, 1% up to 100GB). Files per user - avg ~10K files (94% <5K, 5% 5K-20K, 1% up to 100K).]

File access patterns
• GET/PUT ratio: 2/1
• File type distribution:
  • 1200 different file extensions!
  • 30% .c .h .C
  • 30% .jpg .png
  • 15% no extension (UNIX world!)
  • 25% other: .pdf, .txt, .ppt, .docx, .root, .py, .eps, .tex
• ~100 URL shares, ~40 synced shares
• UNICODE filenames: greek, russian, thai(?)
Pilot limitations
• Move
  • On the origin client a move is propagated to the server
  • On the other clients it is propagated as COPY/DELETE (suboptimal)
• Symlinks are not supported
• Ignored files: , : ? * “ < > |
• We currently recommend a single sync folder setup: ~/cernbox
• High per-file overhead (see the estimate below)
  • Expect 2-5 Hz PUT
  • Expect ~10 Hz GET
• Transfer rates
  • Expect 10-30 MB/s download
  • Expect 5-10 MB/s upload
• Larger files: 400MB file on a standard desktop
  • https upload: ~25 MB/s, https download: ~60 MB/s
• For wireless devices, laptops, phones... do we care about transfer rates?
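To make the per-file overhead concrete, here is a rough back-of-the-envelope sketch (plain Python) of how long an initial sync would take at the rates quoted above; the example workload of 10,000 small files totalling ~2 GB is an assumption, not a measured case.

```python
# Rough estimate of initial sync time from the pilot's per-file overhead.
# The PUT rates and bandwidths are the figures quoted above; the workload
# (10,000 small files, ~2 GB total) is a made-up example.

def sync_time_s(n_files, total_mb, files_per_s, mb_per_s):
    # The sync is limited by whichever is slower: per-file round trips
    # or raw transfer bandwidth.
    return max(n_files / files_per_s, total_mb / mb_per_s)

worst = sync_time_s(10_000, 2_000, files_per_s=2, mb_per_s=5)    # 2 Hz PUT, 5 MB/s
best = sync_time_s(10_000, 2_000, files_per_s=5, mb_per_s=10)    # 5 Hz PUT, 10 MB/s
print(f"upload estimate: {best/60:.0f}-{worst/60:.0f} min")      # ~33-83 min
```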
Towards large-scale data sync and share
• Currently deployed: CERNBox beta
  • works OK so far for the classical Dropbox use-case
  • low-frequency document sync and share
• But can we bring this system to the next level?
• Our core-business and large-scale workloads
  • expose PBs of existing data from day 1
  • integration into the physics data processing eco-system
    • central services: batch, interactive
    • data analysis applications
  • sync higher data volumes at higher rates
• Can we still keep the simplicity of cloud storage access?
Massive scaling at reduced cost?
• No need to keep track of all files and directories in the database
  • avoids explosive growth of your DB infrastructure
• Our file number estimate? With 10K users we have 2.5 billion files in AFS already!
• What is your number for 100K users?
• Before we start throwing hardware at the problem... consider the cost of running the service
  • Fixed: hardware purchase, service deployment, infrastructure
  • Scaling: hardware incidents, user support; backup; integrity checks; upgrades
  • Infrastructure: space, electricity and cooling in the data center
• For massive scaling we need to keep TCO under control
  • profit from existing large-scale operations and support of our storage services
  • exploit economies of scale
Integration
• Started in May 2014
• Functionality
  • Enable sync and share for existing data in EOS
    • without exporting data to another storage
  • Direct access to data with efficient sync behind
• Operations
  • NFS/async backend server is a temporary solution
  • EOS offers “virtually unlimited” cloud storage for end-users
  • Fold the operation cost into EOS...
• But:
  • Integrate as transparently as possible
    • most users don't care about the storage backend
  • Fully working solution compatible with owncloud clients
    • we don't want to end up with a half-working CERN-specific solution
EOS Integration Details
• Understanding the sync protocol and its underlying semantics
• Adding a few consistency features to EOS (e.g. atomic upload; see the sketch below)
• Adding a few new features to EOS or lifting restrictions (e.g. UTF-8 support)
• Beefing up the webdav endpoint so owncloud clients can talk to it directly
• Integrating web access and sharing functionality
  • Web frontend: develop new plugins
  • Nice integration of trashbin, versions and sharing:
    • fusion between the owncloud model and the EOS model (Hugo G. Labrador)
• Making the less-stressed parts of EOS (http/webdav) more robust
• Lots and lots of testing.
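As an illustration of the atomic-upload idea mentioned above, here is a minimal client-side sketch over plain WebDAV: the file is uploaded under a temporary name and then renamed with a single MOVE, so it only ever appears under its final name once complete. The endpoint URL, credentials and paths are placeholders, and the actual mechanism implemented in EOS may well differ.

```python
# Sketch of an atomic upload over WebDAV: upload to a temporary name, then
# rename server-side in one MOVE, so readers never see a half-written file.
# BASE, AUTH and the paths are hypothetical; EOS's real atomic-upload
# support is server-side and may work differently.
import uuid
import requests

BASE = "https://cernbox.example.cern.ch/cernbox/webdav"  # hypothetical endpoint
AUTH = ("username", "password")                          # placeholder credentials

def atomic_put(local_path, remote_path):
    tmp = f"{remote_path}.upload-{uuid.uuid4().hex}"     # temporary remote name
    with open(local_path, "rb") as f:
        requests.put(f"{BASE}{tmp}", data=f, auth=AUTH).raise_for_status()
    # MOVE is a single metadata operation on the server.
    requests.request(
        "MOVE", f"{BASE}{tmp}", auth=AUTH,
        headers={"Destination": f"{BASE}{remote_path}", "Overwrite": "T"},
    ).raise_for_status()

atomic_put("results.root", "/analysis/results.root")
```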
CERNBox 2.0 Architecture
[Architecture diagram: sync clients (webdav) and web access (https) go through load balancers to the ownCloud layer and directly to the storage; data is directly accessible by the user over https (private data), http (internal) and http (public data); KHz fuse metadata ops; all sync state is kept as metadata in the storage (EOS); IO is redirected to the disk servers (1000s) and files are written with USER credentials; separate namespace node.]

Prototype deployment on EOSPPS
• /eos/user//
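Since the prototype exposes the user area through the EOS fuse mount, CERNBox data can also be used as ordinary POSIX storage. A small illustration follows, where the mount point and user directory are hypothetical examples rather than the actual layout.

```python
# With the EOS fuse mount, the CERNBox area behaves like a local directory:
# any POSIX tool or script can work on the files directly, no sync needed.
# The path below is a hypothetical example of a user area on the mount.
import os

user_area = "/eos/user/j/jdoe"   # hypothetical user directory

for name in sorted(os.listdir(user_area)):
    size = os.stat(os.path.join(user_area, name)).st_size
    print(f"{name}\t{size} bytes")
```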
First performance numbers
• User-perceived performance (client)
• Metadata operations (pycurl with SSL sessions)
  • PROPFIND with 1 entry: 90 Hz
  • PROPFIND with 1K entries: 8.5 KHz
  • PROPFIND with 10K entries: 10 KHz
• Nice speed, e.g.:
  • kernel src tree upload (50K files, 500MB): ~1h from a laptop on wifi at home; download: ~20 min
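For reference, a minimal sketch of the kind of metadata measurement quoted above: repeated PROPFIND requests through a single pycurl handle, so the TLS session and connection are reused and only the WebDAV operation itself is timed. The URL, credentials and request count are placeholders.

```python
# Measure PROPFIND rate with pycurl, reusing one handle so the SSL session
# and TCP connection stay alive between requests. URL, credentials and N
# are placeholders, not the actual test parameters.
import io
import time
import pycurl

URL = "https://cernbox.example.cern.ch/cernbox/webdav/some/folder/"  # hypothetical
N = 100

c = pycurl.Curl()
c.setopt(pycurl.URL, URL)
c.setopt(pycurl.CUSTOMREQUEST, "PROPFIND")
c.setopt(pycurl.HTTPHEADER, ["Depth: 1"])
c.setopt(pycurl.USERPWD, "username:password")

start = time.time()
for _ in range(N):
    buf = io.BytesIO()
    c.setopt(pycurl.WRITEDATA, buf)   # discard the XML body
    c.perform()
elapsed = time.time() - start
print(f"{N / elapsed:.1f} PROPFIND/s")
```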
[Bar chart: small files (10KB), ops/s (0-200 scale) for Download / Upload / Delete, measured with sequential pycurl, pycurl P=10, pycurl P=50 and the owncloud sync client.]

Summary
• Working and usable beta service
  • Useful for getting experience, user feedback and understanding what we want / don't want in the final production CERNBox system based on EOS
• Advanced integration into EOS will open up new possibilities
  • but there is no free lunch: we will have to adapt to evolving owncloud clients, etc.
• Heading towards a large sync and share layer for science research
  • all our data exposed from day 1
  • massive scalability, high performance
  • integrated into existing workflows - new capabilities!
  • small overhead on top of our existing operations and development
    • TCO control
  • ... and still as easy to use as Dropbox.com ...
Integrated storage ecosystem for scientific research
[Ecosystem diagram: CERNBox 2.0 gives the USER sync / share / offline access via webdav & https://, online file-system access via fuse, high-performance application access from the analysis cluster via xrootd://, and batch access from central services via xrdcopy - all on top of the same LARGE-SCALE STORAGE.]

• agenda full
• ~35 participants
• Tracks
  • Keynote: B. Pierce
  • Technology
  • Users
  • Site reports
  • Vendor talks: IBM, Powerfolder, SeaFile, PyDio, Owncloud
CERNBox 2.0 – some numbers
• Advanced prototype stage
  • Adapted the existing webdav interface in EOS to be compatible with owncloud sync clients
• Test environment (EOSPPS)
  • standard hardware
  • namespace node with Xeon 2.2GHz, 16 cores, 24GB RAM
  • 50 disk servers: cheap JBODs (1000 disks), total 800TB usable space
  • Storage layout: 2 replicas in RAIN mode
    • every file PUT = 2 copies of the file on two independent storage nodes (with adler32 checksums of the content)
  • Event-based http(s) load-balancer (nginx)
• Underlying storage scalability (EOS Prod)
  • Max observed IO: ~40GB/s on a single instance (eosatlas => )
  • Max observed file stats: 10s of KHz
  • Thousands of connected clients
• The server should never be the bottleneck for CERNBox...
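The per-file content checksums mentioned above are adler32; as an aside, the same checksum is cheap to compute on the client with Python's zlib, e.g. to cross-check a file after upload. The file name below is just an example.

```python
# Streamed adler32 of a local file (the checksum type EOS stores per replica),
# e.g. to cross-check a file after upload. The file name is an example.
import zlib

def adler32_of(path, chunk_size=1024 * 1024):
    value = 1                              # adler32 initial value
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF

print(f"{adler32_of('somefile.dat'):08x}")   # hex, as checksums are usually displayed
```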