Data & Storage Services
CERNBox + EOS: Cloud Storage for Science
Presenter: Luca Mascetti
Thanks to: Jakub T. Mościcki, Andreas J. Peters, Hugo G. Labrador, Massimo Lamanna
CERN/IT-DSS
Content
• What we have done
• What we do
• What we will do
CERNBox
The origins of the CERNBox project
• Missing link? CERNBox
• 4500 distinct IPs in DNS from cern.ch to *.dropbox.com (daily...)
• What we are missing:
  • easy access: “cloud storage for end users”
  • files go ‘automatically’ to the cloud and are available ‘always everywhere’
    • broken laptop ≠ data lost
  • offline access to data
    • work on the plane and rsync when back online
  • keep files in sync across devices
  • access on mobile clients
  • (easy) sharing of files with colleagues
    • still surprisingly difficult
• Can we have this?
  • for “documents” (small files, often ppts, text, ...)
  • for “science data” (integrated into data processing workflows and existing infrastructure)
Original architecture (CERNBox beta service)
[Architecture diagram: sync clients (webdav) and web access (https) go through a load balancer (Apache, PHP 5.4 (SCL 1.0), mod_proxy_balancer; 64 cores, 64 GB RAM) to redundant ownCloud application servers; data and metadata both flow over HTTPS.]
• Setup: 100% RH6 on “standard” hardware
• Based on ownCloud
• Guaranteed failover (redundant nodes)
[Diagram annotations: SQL overheads (Hz metadata ops) - the MySQL server (48 GB RAM) keeps track of sync state for every file in the DB; storage on a POSIX filesystem (NFS servers, async, SW RAID 1), files not exposed directly to the user; initial space: 20 TB. Image courtesy of www.phdcomics.com]

Usage of the beta service
CERNBox Beta 2014   March     April    May      June     ...  October
users               190 (*)   285      361      429           720
files               191K      907K     1.6M     2.7M          6.4M
size                480GB     1TB      1.5TB    1.9TB         3.4TB
(*) users inherited from the initial prototype deployment
[Charts: Size per user - avg ~5GB (84% <10GB, 15% >10GB, 1% up to 100GB). Files per user - avg ~10K files (94% <5K, 5% 5K-20K, 1% up to 100K).]

File access patterns
• GET/PUT ratio: 2/1
• File type distribution:
  • 1200 different file extensions!
  • 30% .c .h .C
  • 30% .jpg .png
  • 15% no extension (UNIX world!)
  • 25% other: .pdf, .txt, .ppt, .docx, .root, .py, .eps, .tex
• ~100 URL shares, ~40 synced shares
• UNICODE filenames: greek, russian, thai(?)
Pilot limitations
• Move
  • On the origin client a move is propagated to the server
  • On the other clients it is propagated as COPY/DELETE (suboptimal)
• Symlinks are not supported
• Ignored files: , : ? * “ < > |
• We currently recommend a single sync folder setup: ~/cernbox
• High per-file overhead (see the estimate below)
  • Expect 2-5 Hz PUT
  • Expect ~10 Hz GET
• Transfer rates
  • Expect 10-30 MB/s download
  • Expect 5-10 MB/s upload
• Larger files: 400MB file on a standard desktop
  • https upload: ~25 MB/s, https download: ~60 MB/s
• For wireless devices, laptops, phones... do we care about transfer rates?
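To make the per-file overhead concrete, here is a rough back-of-the-envelope sketch (plain Python) of how long an initial sync would take at the rates quoted above; the example workload of 10,000 small files totalling ~2 GB is an assumption, not a measured case.

```python
# Rough estimate of initial sync time from the pilot's per-file overhead.
# The PUT rates and bandwidths are the figures quoted above; the workload
# (10,000 small files, ~2 GB total) is a made-up example.

def sync_time_s(n_files, total_mb, files_per_s, mb_per_s):
    # The sync is limited by whichever is slower: per-file round trips
    # or raw transfer bandwidth.
    return max(n_files / files_per_s, total_mb / mb_per_s)

worst = sync_time_s(10_000, 2_000, files_per_s=2, mb_per_s=5)    # 2 Hz PUT, 5 MB/s
best = sync_time_s(10_000, 2_000, files_per_s=5, mb_per_s=10)    # 5 Hz PUT, 10 MB/s
print(f"upload estimate: {best/60:.0f}-{worst/60:.0f} min")      # ~33-83 min
```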
Towards large-scale data sync and share
• Currently deployed: CERNBox beta
  • works OK so far for the classical Dropbox use-case
  • low-frequency document sync and share
• But can we bring this system to the next level?
• Our core-business and large-scale workloads
  • expose PBs of existing data from day 1
  • integration into the physics data processing eco-system
    • central services: batch, interactive
    • data analysis applications
  • sync higher data volumes at higher rates
• Can we still keep the simplicity of cloud storage access?
Massive scaling at reduced cost?
• No need to keep track of all files and directories in the database
  • avoids explosive growth of your DB infrastructure
• Our file number estimate? With 10K users we have 2.5 billion files in AFS already!
• What is your number for 100K users?
• Before we start throwing hardware at the problem... consider the cost of running the service
  • Fixed: hardware purchase, service deployment, infrastructure
  • Scaling: hardware incidents, user support; backup; integrity checks; upgrades
  • Infrastructure: space, electricity and cooling in the data center
• For massive scaling we need to keep TCO under control
  • profit from existing large-scale operations and support of our storage services
  • exploit economies of scale
Integration
• Started in May 2014
• Functionality
  • Enable sync and share for existing data in EOS
    • without exporting data to another storage
  • Direct access to data with efficient sync behind
• Operations
  • NFS/async backend server is a temporary solution
  • EOS offers “virtually unlimited” cloud storage for end-users
  • Fold the operation cost into EOS...
• But:
  • Integrate as transparently as possible
    • most users don't care about the storage backend
  • Fully working solution compatible with owncloud clients
    • we don't want to end up with a half-working CERN-specific solution
EOS Integration Details
• Understanding the sync protocol and its underlying semantics
• Adding a few consistency features to EOS (e.g. atomic upload; see the sketch below)
• Adding a few new features to EOS or lifting restrictions (e.g. UTF-8 support)
• Beefing up the webdav endpoint so owncloud clients can talk to it directly
• Integrating web access and sharing functionality
  • Web frontend: develop new plugins
  • Nice integration of trashbin, versions and sharing:
    • fusion between the owncloud model and the EOS model (Hugo G. Labrador)
• Making the less-stressed parts of EOS (http/webdav) more robust
• Lots and lots of testing.
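As an illustration of the atomic-upload idea mentioned above, here is a minimal client-side sketch over plain WebDAV: the file is uploaded under a temporary name and then renamed with a single MOVE, so it only ever appears under its final name once complete. The endpoint URL, credentials and paths are placeholders, and the actual mechanism implemented in EOS may well differ.

```python
# Sketch of an atomic upload over WebDAV: upload to a temporary name, then
# rename server-side in one MOVE, so readers never see a half-written file.
# BASE, AUTH and the paths are hypothetical; EOS's real atomic-upload
# support is server-side and may work differently.
import uuid
import requests

BASE = "https://cernbox.example.cern.ch/cernbox/webdav"  # hypothetical endpoint
AUTH = ("username", "password")                          # placeholder credentials

def atomic_put(local_path, remote_path):
    tmp = f"{remote_path}.upload-{uuid.uuid4().hex}"     # temporary remote name
    with open(local_path, "rb") as f:
        requests.put(f"{BASE}{tmp}", data=f, auth=AUTH).raise_for_status()
    # MOVE is a single metadata operation on the server.
    requests.request(
        "MOVE", f"{BASE}{tmp}", auth=AUTH,
        headers={"Destination": f"{BASE}{remote_path}", "Overwrite": "T"},
    ).raise_for_status()

atomic_put("results.root", "/analysis/results.root")
```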
CERNBox 2.0 Architecture
[Architecture diagram: sync clients (webdav) and web access (https) go through load balancers to the ownCloud layer and directly to the storage; data is directly accessible by the user over https (private data), http (internal) and http (public data); KHz fuse metadata ops; all sync state is kept as metadata in the storage (EOS); IO is redirected to the disk servers (1000s) and files are written with USER credentials; separate namespace node.]

Prototype deployment on EOSPPS
• /eos/user//
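Since the prototype exposes the user area through the EOS fuse mount, CERNBox data can also be used as ordinary POSIX storage. A small illustration follows, where the mount point and user directory are hypothetical examples rather than the actual layout.

```python
# With the EOS fuse mount, the CERNBox area behaves like a local directory:
# any POSIX tool or script can work on the files directly, no sync needed.
# The path below is a hypothetical example of a user area on the mount.
import os

user_area = "/eos/user/j/jdoe"   # hypothetical user directory

for name in sorted(os.listdir(user_area)):
    size = os.stat(os.path.join(user_area, name)).st_size
    print(f"{name}\t{size} bytes")
```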
First performance numbers
• User-perceived performance (client)
• Metadata operations (pycurl with SSL sessions)
  • PROPFIND with 1 entry: 90 Hz
  • PROPFIND with 1K entries: 8.5 KHz
  • PROPFIND with 10K entries: 10 KHz
• Nice speed, e.g.:
  • kernel src tree upload (50K files, 500MB): ~1h from a laptop on wifi at home; download: ~20 min
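For reference, a minimal sketch of the kind of metadata measurement quoted above: repeated PROPFIND requests through a single pycurl handle, so the TLS session and connection are reused and only the WebDAV operation itself is timed. The URL, credentials and request count are placeholders.

```python
# Measure PROPFIND rate with pycurl, reusing one handle so the SSL session
# and TCP connection stay alive between requests. URL, credentials and N
# are placeholders, not the actual test parameters.
import io
import time
import pycurl

URL = "https://cernbox.example.cern.ch/cernbox/webdav/some/folder/"  # hypothetical
N = 100

c = pycurl.Curl()
c.setopt(pycurl.URL, URL)
c.setopt(pycurl.CUSTOMREQUEST, "PROPFIND")
c.setopt(pycurl.HTTPHEADER, ["Depth: 1"])
c.setopt(pycurl.USERPWD, "username:password")

start = time.time()
for _ in range(N):
    buf = io.BytesIO()
    c.setopt(pycurl.WRITEDATA, buf)   # discard the XML body
    c.perform()
elapsed = time.time() - start
print(f"{N / elapsed:.1f} PROPFIND/s")
```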
[Bar chart: small files (10KB), ops/s (0-200 scale) for Download / Upload / Delete, measured with sequential pycurl, pycurl P=10, pycurl P=50 and the owncloud sync client.]

Summary
• Working and usable beta service
  • Useful for getting experience, user feedback and understanding what we want / don't want in the final production CERNBox system based on EOS
• Advanced integration into EOS will open up new possibilities
  • but there is no free lunch: we will have to adapt to evolving owncloud clients, etc.
• Heading towards a large sync and share layer for science research
  • all our data exposed from day 1
  • massive scalability, high performance
  • integrated into existing workflows - new capabilities!
  • small overhead on top of our existing operations and development
    • TCO control
  • ... and still as easy to use as Dropbox.com ...
Integrated storage ecosystem for scientific research
[Ecosystem diagram: CERNBox 2.0 gives the USER sync / share / offline access via webdav & https://, online file-system access via fuse, high-performance application access from the analysis cluster via xrootd://, and batch access from central services via xrdcopy - all on top of the same LARGE-SCALE STORAGE.]

• agenda full
• ~35 participants
• Tracks
  • Keynote: B. Pierce
  • Technology
  • Users
  • Site reports
  • Vendor talks: IBM, Powerfolder, SeaFile, PyDio, Owncloud
CERNBox 2.0 – some numbers
• Advanced prototype stage
  • Adapted the existing webdav interface in EOS to be compatible with owncloud sync clients
• Test environment (EOSPPS)
  • standard hardware
  • namespace node with Xeon 2.2GHz, 16 cores, 24GB RAM
  • 50 disk servers: cheap JBODs (1000 disks), total 800TB usable space
  • Storage layout: 2 replicas in RAIN mode
    • every file PUT = 2 copies of the file on two independent storage nodes (with adler32 checksums of the content)
  • Event-based http(s) load-balancer (nginx)
• Underlying storage scalability (EOS Prod)
  • Max observed IO: ~40GB/s on a single instance (eosatlas => )
  • Max observed file stats: 10s of KHz
  • Thousands of connected clients
• The server should never be the bottleneck for CERNBox...
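The per-file content checksums mentioned above are adler32; as an aside, the same checksum is cheap to compute on the client with Python's zlib, e.g. to cross-check a file after upload. The file name below is just an example.

```python
# Streamed adler32 of a local file (the checksum type EOS stores per replica),
# e.g. to cross-check a file after upload. The file name is an example.
import zlib

def adler32_of(path, chunk_size=1024 * 1024):
    value = 1                              # adler32 initial value
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF

print(f"{adler32_of('somefile.dat'):08x}")   # hex, as checksums are usually displayed
```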