New Storage Technologies
Luca Mascetti - CERN - IT/DSS
CERN Disk Storage Overview
                AFS      CASTOR        EOS      Ceph        NFS      CERNBox
Raw Capacity    3 PB     20 PB         140 PB   4 PB        200 TB   1.1 PB
Data Stored     390 TB   86 PB (tape)  27 PB    170 TB      36 TB    35 TB
Files Stored    2.7 B    300 M         284 M    77 M (obj)  120 M    14 M
• AFS is CERN's Linux home directory service
• CASTOR & EOS are mainly used for the physics use case (data analysis and DAQ)
• Ceph is our storage backend for images and volumes in OpenStack
• NFS is mainly used by engineering applications
• CERNBox is our file synchronisation service, based on ownCloud + EOS
Ceph
• Ceph is a scalable object store with large uptake in the cloud storage & OpenStack communities
• Ceph offers:
  • object storage (RADOSGW)
  • block storage (RBD)
  • a filesystem (CephFS)
• Production setup (~50 disk servers):
  • 64 GB RAM
  • 20 data disks (XFS)
  • 4 SSDs (for journals)
  • 10 Gbit/s Ethernet

Block Storage with RBD
RBD (RADOS Block Device) allows you to create virtual drives
• Example: create a 10 TB drive:
  • composed of 2.5 million 4 MB RADOS objects (object size is configurable)
  • writing to address A writes to object #(A / 4 MB) at offset (A mod 4 MB)
• Striping and snapshots supported natively
• Export/import, incremental snapshot diffs
RBD devices cannot easily be shared between hosts
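The address arithmetic above can be sketched in a few lines of Python (illustrative only; real RBD uses a configurable object size, 4 MiB by default, while the slide's 2.5-million figure assumes decimal 4 MB objects):

```python
OBJECT_SIZE = 4 * 10**6  # 4 MB, matching the slide's arithmetic

def rbd_locate(address, object_size=OBJECT_SIZE):
    """Map a byte address on the virtual drive to (object number, offset)."""
    return address // object_size, address % object_size

# A 10 TB drive is split into 10e12 / 4e6 = 2.5 million RADOS objects:
n_objects = 10 * 10**12 // OBJECT_SIZE
print(n_objects)              # 2500000
print(rbd_locate(6_000_000))  # (1, 2000000): object 1, offset 2 MB
```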
Coupling Storage Solutions

Ceph virtual volumes (RBD) as the AFS & NFS backend
• expose storage volumes via VM gateways
• exploit ZFS features for snapshots and backups
• disaster recovery server in Wigner
[Diagram: prototype with virtualised gateway VMs (NFS servers and an AFS server) running ZFS on top of libRBD, with ZFS send replicating from the Ceph cluster in Meyrin to the Ceph cluster in Wigner]
CASTOR version 15 introduced the concept of a DataPool
• built on top of Ceph RADOS
• stripes physics files into Ceph, using "disk servers" as proxies
• enhances single tape stream performance (up to 500 MB/s; tape server speed is around 300 MB/s)
• released as part of CASTOR v15, but deployment is targeted after LHC Run 2
[Diagram: CASTOR headnode scheduling to a classic DiskPool of disk servers and to a DataPool, where disk servers act as proxies in front of the Ceph cluster]
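Striping a file into Ceph, as the DataPool does, can be illustrated with a round-robin mapping. This is a sketch with assumed stripe parameters, not CASTOR's actual layout code:

```python
STRIPE_UNIT = 4 * 10**6   # assumed stripe unit of 4 MB
STRIPE_COUNT = 8          # assumed number of parallel stripe objects

def stripe_locate(offset, unit=STRIPE_UNIT, count=STRIPE_COUNT):
    """Map a file offset to (stripe object index, offset inside that object)."""
    stripe_no = offset // unit       # which stripe unit the byte falls in
    obj = stripe_no % count          # round-robin over `count` objects
    obj_round = stripe_no // count   # full rounds already written to each object
    return obj, obj_round * unit + offset % unit

# Consecutive stripe units land on different objects, so one tape stream
# becomes `count` parallel object streams spread over many OSDs.
print(stripe_locate(5_000_000))   # (1, 1000000)
print(stripe_locate(36_000_000))  # (1, 4000000): second round on object 1
```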
EOS

• Born in 2010
• Main role: disk-only storage optimised for concurrency
• Hierarchical in-memory namespace with a focus on very low latency
• Quota system for users & groups, with secure authentication (KRB5, X509)
• Accessible via xrdcp, FUSE, HTTP/WebDAV, S3, gridFTP, SRM and ROOT applications
• 6 EOS instances: ALICE, ATLAS, CMS, LHCb, Public, User
• New functionality: location awareness & geo-scheduling, archive & backup tools

[Diagram: EOS architecture with MGM (namespace/metadata), MQ (message queue) and FST (data server) daemons]

[Chart: EOS raw disk capacity deployed per instance (PPS, User, Public, LHCb, CMS, ATLAS, ALICE), May 2011 to Mar 2015: 140 PB deployed]

Wigner Computer Centre
• 2 independent 100 Gb/s links to Meyrin
• 22 ms round-trip time
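A back-of-the-envelope bandwidth-delay product (my calculation, not from the slides) shows why the 22 ms round-trip time matters for single-stream transfers over these links:

```python
link_bps = 100 * 10**9  # one of the two 100 Gb/s Meyrin-Wigner links
rtt_ms = 22             # measured round-trip time

# Bytes that must be in flight to keep one link busy for a full RTT:
bdp_bytes = link_bps * rtt_ms // (1000 * 8)
print(bdp_bytes)  # 275000000: a single stream needs ~275 MB of TCP window
```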
EOS is now optimised for efficiently managing data in different computer centres, providing a single-site view to our users.

In the future it will be possible to specify ad-hoc scheduling policies based on the namespace location.

It is easy to add other locations to the system.

[Chart: EOS installed raw disk capacity per centre (Meyrin vs Wigner), Jun 2013 to Feb 2015: 140 PB deployed]
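The geo-scheduling idea is to match a client's location tag against replica locations when picking where to read. A minimal sketch with invented names and tags (this is not the EOS implementation):

```python
def pick_replica(client_geotag, replicas):
    """Prefer a replica in the client's own computer centre; else take any.

    `replicas` is a list of (geotag, data_server) pairs; all names here
    are illustrative, not real EOS identifiers.
    """
    local = [r for r in replicas if r[0] == client_geotag]
    return local[0] if local else replicas[0]

replicas = [("wigner", "fst-w01.example"), ("meyrin", "fst-m07.example")]
print(pick_replica("meyrin", replicas))    # local replica in Meyrin
print(pick_replica("budapest", replicas))  # no local match: first available
```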
• The latest hardware delivery (Mar 2015) balanced the capacity installed in the 2 computer centres (~50% / ~50%)
• Experiment replicas are not yet spread equally between the 2 geolocations
• Geo-balancing needs to be activated and tuned to avoid filling the network links towards Meyrin

[Charts: EOS replica distribution per experiment (PPS, ALICE, ATLAS, CMS, LHCb, Public, User), in millions and in % per centre (Meyrin vs Wigner)]

CERNBox
Cloud Storage for Science
Why CERNBox?
• A competitive alternative to Dropbox for CERN users
  • users were using Dropbox not only for sharing their pictures
• SLAs: data availability and confidentiality
• Archival and backup policies
• Offline data access and data sync across devices
• An easy way to share files and folders with colleagues
• We started an ownCloud evaluation and built a prototype service
Can we integrate sync & share functionality with our main users' workflows?

And can we directly access the underlying data?
Our User Base
• Physicists: ~10K users from >200 institutes worldwide; frequent travel, sharing and collaborations
• Engineers: ~1K on-site; sharing, often on the campus
• Services & Administration: ~1K on-site; possibly confidential data!
CERNBox and EOS
• CERNBox 2.0 architecture and EOS integration
[Diagram: CERNBox 2.0 architecture, with shares handled in EOS]
CERNBox Service Numbers
Users           1777
# files         14 million
# dirs          1 million
Quota           1 TB/user
Used space      35 TB
Deployed space  1.1 PB

[Chart: CERNBox user growth, Mar 2014 to Mar 2015]
• Migration from NFS to EOS
• EOS offers "virtually unlimited" cloud storage for our end-users
• The EOS installation at CERN is around 140 PB, with the primary role of storing physics data

[Chart: CERNBox deployed vs used space, log scale from 10 GB to 10 PB, Mar 2014 to Mar 2015]

Available soon…
• Direct access to EOSUSER (and not only…)
  • not only the ownCloud sync client
  • xroot, FUSE, HTTP/WebDAV
• Access to physics data
  • synchronise experiments' data
• Direct access from lxplus and batch
  • sync from your laptop and run!
  • sync results back
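The same file then becomes reachable through any of these protocols; a sketch with a hypothetical user path, assuming the instance is exposed as eosuser.cern.ch:

```python
# Hypothetical user file; host name and path layout are assumptions.
path = "/eos/user/j/jdoe/results.root"

urls = {
    "xroot":  "root://eosuser.cern.ch/" + path.lstrip("/"),
    "webdav": "https://eosuser.cern.ch" + path,
    "fuse":   path,  # via a local FUSE mount of /eos
}
print(urls["xroot"])
```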
Example: e-science
Thanks to Mauro Arcorace, members of UNITAR/UNOSAT and CIMA foundation for the material provided
Hydrological Simulation
Problem:
• how to get your data to CERN in an easy way?
• how to easily use CERN resources (e.g. for non-physicists)?
CERNBox was an easy way to integrate our storage resources for non-expert end-users who may use different operating systems.

Using EOS as the backend allows accessing the data from batch nodes or from other locations via HTTPS.
…and it’s very simple to share results with collaborators…
Future Directions
[Diagram: laptops, PCs & mobile devices syncing via the CERNBox sync client to EOS (CERN)]

• Batch integration
• Grid integration