New Storage Technologies

Luca Mascetti - CERN - IT/DSS

1 CERN Disk Storage Overview

                 AFS      CASTOR         EOS      Ceph         NFS      CERNBox
Raw Capacity     3 PB     20 PB          140 PB   4 PB         200 TB   1.1 PB
Data Stored      390 TB   86 PB (tape)   27 PB    170 TB       36 TB    35 TB
Files Stored     2.7 B    300 M          284 M    77 M (obj)   120 M    14 M

AFS is CERN’s home directory service

CASTOR & EOS are mainly used for the physics use case (Data Analysis and DAQ)

Ceph is our storage backend for images and volumes in OpenStack

NFS is mainly used by engineering applications

CERNBox is our file synchronisation service, based on ownCloud + EOS

2 Ceph

• Ceph is scalable and has large uptake in the cloud storage & OpenStack communities
• Ceph offers:
  • object storage (RADOSGW)
  • block storage (RBD)
  • filesystem (CephFS)

• Production setup (~50 diskservers):
  • 64 GB RAM
  • 20 data disks (XFS)
  • 4 SSDs (for journals)
  • 10 Gbit/s Ethernet

3 Block Storage with RBD

RBD (RADOS Block Device) allows you to create virtual drives
• Example: create a 10 TB drive:
  • composed of 2.5 million 4 MB RADOS objects (object size configurable)
  • writing to address A writes to object #(A / 4 MB) at offset (A mod 4 MB)
• Striping and snapshots supported natively
• Export/import, incremental snapshot diffs
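To make the address arithmetic concrete, here is a minimal Python sketch of the mapping; decimal units are assumed so the object count matches the slide’s ~2.5 million figure, and the object size is configurable in a real RBD image:

    OBJECT_SIZE = 4 * 10**6              # 4 MB RADOS objects
    IMAGE_SIZE = 10 * 10**12             # 10 TB virtual drive

    def rbd_locate(address):
        """Map a byte address in the image to (object number, offset in object)."""
        assert 0 <= address < IMAGE_SIZE
        return address // OBJECT_SIZE, address % OBJECT_SIZE

    print(IMAGE_SIZE // OBJECT_SIZE)     # 2500000 objects back the 10 TB drive
    print(rbd_locate(5 * 10**9 + 123))   # a write 5 GB into the drive -> (1250, 123)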

It is not straightforward to share RBD devices between hosts

4 Coupling Storage Solution

Ceph virtual volumes (RBD) as an AFS & NFS backend:
• Expose the storage volumes via VM gateways
• Exploit ZFS features for snapshots and backups
• Disaster-recovery server in Wigner

[Diagram: prototype with virtualised NFS and AFS gateway server VMs, each running ZFS on top of libRBD; AFS volumes replicated with ZFS send]
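As a rough sketch of this coupling (not CERN’s actual tooling; the pool, image and host names are placeholders), a gateway VM could attach an RBD volume, layer ZFS on top of it, and replicate snapshots to the disaster-recovery server along these lines:

    # Hypothetical gateway-VM setup: RBD volume -> ZFS pool -> snapshot -> zfs send.
    import subprocess

    def run(cmd):
        print("+ " + cmd)
        subprocess.run(cmd, shell=True, check=True)

    run("rbd map volumes/nfs-backend-01")            # appears as e.g. /dev/rbd0
    run("zpool create tank /dev/rbd0")               # ZFS pool on the RBD device
    run("zfs create tank/exports")                   # filesystem exported by the NFS/AFS server
    run("zfs snapshot tank/exports@2015-03-31")      # point-in-time backup
    run("zfs send tank/exports@2015-03-31 | "
        "ssh dr-wigner zfs receive tank/exports")    # disaster-recovery copy in Wigner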

5 Coupling Storage Solution

[Diagram: Ceph cluster in Meyrin and Ceph cluster in Wigner]

CASTOR version 15 introduced the concept of a DataPool:
• built on top of Ceph RADOS
• stripes physics files into Ceph, using disk servers as proxies
• enhances single tape stream performance (up to 500 MB/s); tape server speed is around 300 MB/s
• released as part of CASTOR v15, but deployment is targeted after LHC Run 2
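The striping idea can be sketched with the Python librados bindings (an illustration only, not the actual CASTOR DataPool implementation; the stripe size, pool name and object-naming scheme are assumptions):

    # Stripe a large file into fixed-size RADOS objects via librados.
    import rados

    STRIPE_SIZE = 8 * 1024 * 1024                   # assumed 8 MB stripe unit

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    ioctx = cluster.open_ioctx("castor-datapool")   # hypothetical pool name

    def stripe_into_rados(path, name):
        """Write path as objects name.0000000, name.0000001, ..."""
        with open(path, "rb") as f:
            index = 0
            while True:
                chunk = f.read(STRIPE_SIZE)
                if not chunk:
                    break
                ioctx.write_full("%s.%07d" % (name, index), chunk)
                index += 1

    stripe_into_rados("/data/run1234.root", "run1234")
    ioctx.close()
    cluster.shutdown()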

[Diagram: CASTOR headnode managing a DataPool and a DiskPool; the DataPool uses disk servers as proxies in front of the Ceph cluster]

6 EOS

• Born in 2010
• Main role: disk-only storage optimised for concurrency
• Hierarchical in-memory namespace, with a focus on very low latency
• Quota system for users & groups, with secure authentication (KRB5, X509)
• Client access via xrdcp, FUSE, HTTP/WebDAV, S3, gridFTP, SRM and ROOT applications
• 6 EOS instances: ALICE, ATLAS, CMS, LHCb, Public, User
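A hedged example of two of these access paths (the instance endpoint and paths are illustrative assumptions, and a valid KRB5 or X509 credential is presumed):

    # Illustrative only: copy a file into an EOS instance over xroot and list
    # the destination directory; hostname and paths are placeholders.
    import subprocess

    URL = "root://eosatlas.cern.ch/"                 # assumed instance endpoint
    DEST = "/eos/atlas/user/j/jdoe/histos.root"      # hypothetical destination

    subprocess.run(["xrdcp", "histos.root", URL + DEST], check=True)
    subprocess.run(["xrdfs", "eosatlas.cern.ch", "ls", "/eos/atlas/user/j/jdoe"],
                   check=True)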

New functionality:
• Location awareness & GEO scheduling
• Archive & backup tools

[Diagram: EOS architecture with MGM MD server, MQ server, FST data servers and clients]

7 EOS Raw Disk Capacity Deployed

[Chart: EOS raw disk capacity deployed per instance (ALICE, ATLAS, CMS, LHCb, Public, User, PPS), May 2011 to March 2015 — 140 PB deployed]

8 Wigner Computer Centre

2 independent 100Gb/s links

22 ms rtt

EOS is now optimised for efficiently managing data in different computer centres, providing our users with a single-site view. In the future it will be possible to specify ad hoc scheduling policies based on the namespace location, and it is easy to add other locations to the system.

[Chart: EOS installed raw disk capacity at Meyrin and Wigner, June 2013 to February 2015 — 140 PB deployed]

[Charts: EOS replica distribution in millions, and replica distribution in % per experiment (ALICE, ATLAS, CMS, LHCb, Public, User, PPS), split between Meyrin and Wigner]

The latest hardware delivery (March 2015) balanced the capacity installed in the two computer centres (~50% / ~50%). Experiment replicas are not yet spread equally between the two geolocations: geo-balancing needs to be activated and tuned to avoid filling the network links to Wigner.

9 CERNBox

Cloud Storage for Science


10 Why CERNBox?

• Competitive alternative to Dropbox for CERN users
  • users were using Dropbox not only for sharing their pictures
• SLAs: data availability and confidentiality
• Archival and backup policies
• Offline data access and data sync across devices
• Easy way to share files and folders with colleagues

• We started an ownCloud evaluation and built a prototype service

Can we integrate sync & share functionality with our main users’ workflows?

And can we directly access the underlying data?

11 Our User Base

Physicists: ~10K users, > 200 institutes worldwide, frequent travel, sharing and collaborations

Engineers: ~1K on-site, often on the campus

Services & Administration: ~1K on-site, possibly confidential data!

12 CERNBox and EOS

• CERNBox 2.0 architecture and EOS integration

[Diagram: CERNBox 2.0 architecture showing the EOS integration and shares]

13 CERNBox Service Numbers

Users           1777
# files         14 million
# dirs          1 million
Quota           1 TB/user
Used space      35 TB
Deployed space  1.1 PB

[Chart: number of CERNBox users, March 2014 to March 2015]

EOS offers “virtually unlimited” cloud storage for our end-users. The EOS installation at CERN is around 140 PB, with the primary role of storing physics data.

[Chart: CERNBox deployed space vs used space on a log scale (10 GB to 10 PB), March 2014 to March 2015, showing the migration from NFS to EOS]

14 Available soon…

• Direct access to EOSUSER (and not only…)
  • not only the ownCloud sync client
  • xroot, fuse, http/WebDAV
• Access to physics data
  • synchronise experiment data
• Direct access from lxplus and batch
  • sync from your laptop and run!
  • sync results back
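A minimal sketch of the “sync from your laptop and run” pattern, assuming a hypothetical synced folder under EOSUSER and the xroot access path listed above (endpoint and paths are illustrative, not the documented interface):

    # Stage in a file that was synced up via CERNBox, process it on the batch
    # node, and push the result back so the sync client picks it up again.
    import subprocess

    EOS_URL = "root://eosuser.cern.ch/"              # assumed EOSUSER endpoint
    REMOTE = "/eos/user/j/jdoe/analysis"             # hypothetical synced folder

    def xrdcp(src, dst):
        subprocess.run(["xrdcp", "-f", src, dst], check=True)

    xrdcp(EOS_URL + REMOTE + "/input.dat", "input.dat")    # stage in
    # ... run the actual job on the local copy here ...
    xrdcp("output.dat", EOS_URL + REMOTE + "/output.dat")  # stage out; syncs back to the laptop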

15 Example: e-science

Thanks to Mauro Arcorace, members of UNITAR/UNOSAT and CIMA foundation for the material provided

16 Hydrological Simulation

Problem:
• how to get your data to CERN in an easy way?
• how to easily use CERN resources? (e.g. for non-physicists)

CERNBox was an easy way to integrate our storage resources for non-expert end-users that may use different …

Using EOS as the backend allows the data to be accessed from batch nodes or from other locations via HTTPS

…and it’s very simple to share results with collaborators…

17 Future Directions

[Diagram: CERNBox sync clients on laptops, PCs & mobile devices connected to EOS (CERN)]

Batch Integration

Grid Integration
