
DPM and LFC in Belle II
Dr. Silvio Pardi, on behalf of the Belle II Distributed Computing Group
DPM Workshop 2015, CERN - 07/12/2015

Outline

• Introduction
• Storage systems currently used by Belle II
• Survey on the DPM usage in the Belle II collaboration
• Case studies for the user analysis strategies
• Http
• LFC usage in Belle II
• Conclusions

The Belle II Experiment
• Main site at KEK, Tsukuba, Japan
• Distributed computing system based on existing, well-proven solutions plus extensions
• VO name: belle
• DIRAC framework
• LFC for the file catalog
• AMGA for metadata
• Basf2: simulation and computing framework
• Gbasf2: Grid interface to Basf2
• FTS3 for data movement
• CVMFS for software distribution
• Grid and non-Grid resources (ssh and cloud)

Current Storage Elements

30 storage elements currently working. Backend types:
• 5 DISET
• 2 xrootd
• SRM (23 in total):
  • 10 DPM
  • 6 dCache
  • 6 StoRM
  • 1 bestman2

[Chart: capacity per storage type]

Reserved disk space for BELLE: 1.9 PB, of which 638 TB is managed with DPM (33%).

Requirements for Storage Elements

For each SE we require:

• The presence of the BELLE space token (used to track the disk capacity assigned to the VO and the current usage)

• The presence of the following directory structure and ACL settings (used to protect data from misuse of the native tools); a minimal setup sketch follows the list below

• /belle/ (Role=Null R-X, Role=production/lcgadmin RWX)
• /belle/DATA (Role=Null R-X, Role=production/lcgadmin RWX)
• /belle/TMP (Role=Null RWX, Role=production/lcgadmin RWX)
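As an illustration only, here is a minimal sketch of how such a layout could be prepared with the standard DPNS client tools. The namespace prefix, group names and exact ACL entry syntax are assumptions and may differ per site and DPM version.

```python
#!/usr/bin/env python
# Illustrative sketch only: prepare the Belle II directory layout on a DPM
# namespace and restrict write access with DPNS ACLs. The namespace prefix,
# group names and ACL entries below are assumptions, not Belle II policy.
import subprocess

BASE = "/dpm/example.org/home/belle"   # hypothetical DPM namespace prefix
PROD = "belle/Role=production"         # VOMS group that keeps write access

def run(cmd):
    print(" ".join(cmd))
    subprocess.check_call(cmd)

# /belle and /belle/DATA: read-only (r-x) for plain VO members, rwx for production
for path in (BASE, BASE + "/DATA"):
    run(["dpns-mkdir", "-p", path])
    run(["dpns-setacl", "-m", "g:%s:rwx,g:belle:r-x,m:rwx" % PROD, path])

# /belle/TMP: writable (rwx) by every VO member
run(["dpns-mkdir", "-p", BASE + "/TMP"])
run(["dpns-setacl", "-m", "g:%s:rwx,g:belle:rwx,m:rwx" % PROD, BASE + "/TMP"])
```

In practice, default ACL entries would likely be set as well, so that newly created sub-directories inherit the same protection.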

Not all the SRM technologies offer the same features: DPM, StoRM and bestman2 allow implementing all the required ACLs, while dCache-based SEs do not seem to support the full set of required ACL rules.

DPM Storage in Belle II

SITE / HEAD NODE / COUNTRY
Melbourne-SE: b2se.mel.coepp.org.au (AUSTRALIA)
Adelaide-SE: coepp-dpm-01.ersa.edu.au (AUSTRALIA)
HEPHY-SE: hephyse.oeaw.ac.at (AUSTRIA)
CESNET-SE: dpm1.egee.cesnet.cz (CZECH REPUBLIC)
KISTI-SE: belle-se-head.sdfarm.kr (SOUTH KOREA)
Frascati-SE: atlasse.lnf.infn.it (ITALY)
Napoli-SE: belle-dpm-01.na.infn.it (ITALY)
CYFRONET-SE: dpm.cyf-kr.edu.pl (POLAND)
ULAKBIM-SE: torik1.ulakbim.gov.tr (TURKEY)
NCHC-SE: se01.grid.nchc.org.tw (TAIWAN)

DPM Survey

Completed by all ten sites: Melbourne-SE, Adelaide-SE, HEPHY-SE, CESNET-SE, KISTI-SE, Frascati-SE, Napoli-SE, CYFRONET-SE, ULAKBIM-SE, NCHC-SE.

DPM hardware
1. Is it an all-in-one installation?
2. How many nodes?
3. How many disk nodes (is the head node also a disk node)?
4. Network connection (1 Gbps, 2 Gbps, 10 Gbps, etc.)?
5. How many TB in total?

DPM software configuration
6. Which RAID for the file systems?
7. DPM version (output of rpm -qa | grep dpm and rpm -qa | grep dmlite)?
8. GridFTP version?
9. How many file systems are registered (dpm-qryconf)?
10. How large is a single file system (20 TB? 10 TB)?
11. File system used (i.e. xfs, ext3, ext4, gpfs, lustre, ...)?
12. Supported VOs (dedicated to belle? other VOs, which ones? space tokens?)
13. Do you support xrootd?
14. Do you support http?
15. Do you support rfio?

DPM installation methods
16. Do you plan to update to the next version? When?
17. Installation method (Foreman? Puppet? Puppet + Foreman? Other?)

DPM issues
18. Do you have any specific requests to the DPM team?

DPM Survey analysis 1/4
• Type of deployment: installations from 2 nodes (5 sites) up to 11 nodes.
• Network connectivity: 1 Gbps, 4 Gbps, 10 Gbps and 20 Gbps.
• Total disk space from 250 GB up to 1 PB.

[Chart: total disk space per site, TotalSpace (TB), from 0 to 1000 TB; Belle II dedicated storages highlighted in red]

DPM Survey analysis 2/4

• RAID6 is used almost everywhere
• xfs is the most used file system:
  • 7 sites xfs
  • 2 sites ext4
  • 1 site nfs
• Number of file systems per SE: from 1 up to 64
• Size of each file system: from 5 TB up to 140 TB; the most common values are between 10 TB and 20 TB per FS.

DPM Survey analysis 3/4

• xrootd supported by 8 SEs
• http supported by 9 SEs
• rfio supported everywhere
• GridFTP version:
  • 7 sites use 7.x
  • 3 sites use 8.x
• DPM version:
  • 8 sites run 1.8.9-2
  • 2 sites run 1.8.8

[Chart: percentage of sites supporting xrootd, http, rfio, gridftp 8.x, gridftp 7.x, DPM 1.8.8, DPM 1.8.9-2]

DPM Survey analysis 4/4

Installation method:
• 2 sites have used Foreman + Puppet
• 2 sites have used plain Puppet
• 6 sites have used YAIM or other methods

Most of the sites (8/10) intend to update the system in the near future.

Case studies for the Belle II analysis model with remote file access

The Belle II analysis model does not follow the "jobs go to data" paradigm entirely: jobs may also access "remote data".

Why?
• Jobs may need only small parts of the input data (selected events).
• The input data for a job may be very large in number of files and in size, and could be spread over many sites.
• Copying all the input files to the SE or to the WN may not be optimal.

How?
• Producing μDST from MDST
• Intermediate skimmed data (event selection ~1%)
• Large sites host the μDST, while small sites need remote access

• Index file
  • List of selected events to analyze for each analysis mode, with references to the files containing the events.
  • The analysis job gets the list of events to process, opens the files remotely and retrieves the selected events.

Belle II User analysis strategy in testing

• Options for physics skims from bulk MDST
  • Output physics objects + MDST (μDST)
    • Direct copy to the WN using DIRAC
    • Stream data using XRootD (see the sketch below)
  • Output index file which points to skimmed events
    • Access bulk MDST directly via ROOT on a large cluster
    • Stream from bulk MDST using XRootD
    • Local cluster/workstation ROOT
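For illustration, a minimal PyROOT sketch of the "stream data using XRootD" option: the file is read remotely through the SE's xrootd door rather than copied to the worker node. The host, path and tree name are placeholders, not actual Belle II conventions.

```python
# Minimal PyROOT sketch: stream a remote MDST file over xrootd instead of
# copying it to the worker node first. URL and tree name are placeholders.
import ROOT

url = ("root://belle-dpm-01.na.infn.it//dpm/na.infn.it/home/belle/"
       "TMP/example_mdst.root")
f = ROOT.TFile.Open(url)              # ROOT uses its xrootd client plugin
if not f or f.IsZombie():
    raise RuntimeError("could not open %s" % url)

tree = f.Get("tree")                  # hypothetical tree name
n = int(tree.GetEntries())
print("remote file open, %d entries" % n)

# only the baskets of the branches actually read are transferred
for i in range(min(10, n)):
    tree.GetEntry(i)
f.Close()
```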

The set of possible options for user analysis currently under investigation includes xrootd-based strategies.

Belle II analysis model with Index file

Data is analyzed on a remote WN; only the selected events are transferred (see the sketch below).
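A sketch of what the index-driven access could look like in PyROOT: for each remote file, only the entries listed in the index are read, so only the selected events cross the network. The index format, URLs and tree name are invented for illustration.

```python
# Sketch of index-driven remote reading: open each file via xrootd and fetch
# only the listed entries. Index format, URLs and tree name are invented.
import ROOT

# hypothetical index: remote file URL -> selected entry numbers
index = {
    "root://belle-dpm-01.na.infn.it//dpm/na.infn.it/home/belle/DATA/mdst_0001.root": [12, 57, 3044],
    "root://dcache.ijs.si//data/belle/DATA/mdst_0002.root": [7, 991],
}

for url, entries in index.items():
    f = ROOT.TFile.Open(url)
    if not f or f.IsZombie():
        print("skipping unreachable file: %s" % url)
        continue
    tree = f.Get("tree")              # hypothetical tree name
    for entry in entries:
        tree.GetEntry(entry)          # transfers only the baskets needed
        # ... analyse this event ...
    f.Close()
```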

Tests are ongoing to exercise the whole chain, the gbasf2 support and the performance with xrootd.

Storage considered for the ongoing tests:
Storage / Type
b2se.mel.coepp.org.au: DPM
coepp-dpm-01.ersa.edu.au: DPM
belle-dpm-01.na.infn.it: DPM
dcache.ijs.si: dCache
KEK: StoRM

Http usage in Belle II

• The Belle II collaboration is interested in exploring the usage of http, but there are no concrete plans yet.

• The Belle II software basf2 uses the ROOT I/O library, so in principle it should be possible to use http directly, provided ROOT can use this protocol efficiently (some tests are currently ongoing; a small illustration follows below).

• Assuming that http can be used efficiently via ROOT, it could be a very nice solution for us, and an http federation could also be of interest for the experiment. Other use cases still need to be studied.
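As a rough sketch of the kind of test mentioned above: ROOT can open files over HTTP(S) through the same TFile interface (via its web/Davix file plugins), independently of basf2. The URL is a placeholder and the plugin actually used depends on the ROOT build.

```python
# Rough sketch: read a remote file over HTTPS with plain ROOT I/O.
# The URL is a placeholder; which plugin (TWebFile or Davix) is used
# depends on how ROOT was built and configured.
import ROOT

url = "https://belle-dpm-01.na.infn.it/dpm/na.infn.it/home/belle/TMP/example.root"
f = ROOT.TFile.Open(url)
if not f or f.IsZombie():
    raise RuntimeError("HTTP open failed for %s" % url)

print("opened over HTTP(S), %d bytes on the server" % f.GetSize())
f.Close()
```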

HTTP Storage Requirements exercise for Belle II
https://twiki.cern.ch/twiki/bin/view/LCG/HTTPStorageRequirements

Requirement level per experiment (Belle II / ATLAS / LHCb) and support per storage system (dCache / DPM / EOS / StoRM / xrootd):

• Create, Read, Update, Delete: Belle II A, ATLAS A, LHCb A; dCache Y, DPM Y, EOS Y (GET, HEAD, PUT, DELETE, OPTIONS), StoRM Y, xrootd Y
• WebDAV (which bits?): Belle II partially (PROPFIND, MOVE), ATLAS partially, LHCb partially; dCache Y, DPM partially (MKCOL, PROPFIND, PROPPATCH, COPY, MOVE, LOCK, UNLOCK), EOS PROPFIND plus some special header support, StoRM PROPFIND, CP, MV, xrootd Y
• X509/VOMS AA: Belle II A, ATLAS A, LHCb A; dCache Y, DPM Y, EOS RFC proxies at VO granularity, StoRM Y per storage area
• Multi-range requests: Belle II D, ATLAS D; dCache Y, DPM Y, EOS Y, StoRM Y, xrootd Y
• Directory space reporting: Belle II C, ATLAS C, LHCb C; dCache N, DPM N (planned), EOS custom REST API using subtree accounting, StoRM N (planned), xrootd Y via virtual partitions
• Calculate checksum / type: Belle II C (adler32), ATLAS C (e.g. md5), LHCb N; dCache Y (md5, adler32), DPM implicit, EOS Y (adler32), StoRM Y, xrootd internal (md5, adler32, crc32c, crc32, sha1)
• Set checksum / type: Belle II C (adler32), ATLAS C (e.g. md5), LHCb N; dCache N, DPM set XS: support for a custom header to preset the checksum on the PUT, for OC set type Y, EOS N (if this means setting the checksum after the transfer), xrootd Y internal
• Get checksum: Belle II C, ATLAS C, LHCb C; dCache Y, DPM Y, EOS Y (PROPFIND, custom REST API), StoRM Y, xrootd Y
• 3rd party copy, source: Belle II C, ATLAS C, LHCb D; dCache Y, DPM Y, EOS N, StoRM N, xrootd Y
• 3rd party copy, destination: Belle II C, ATLAS C, LHCb D; dCache N, DPM N, EOS N, StoRM N, xrootd Y
• 3rd party copy ↔ S3: Belle II D, ATLAS D, LHCb D; dCache N, DPM Y (prototype), EOS ?, StoRM N, xrootd N
• 3rd party copy ↔ gridFTP: Belle II D, ATLAS D, LHCb D; dCache Y, DPM ?, EOS N, StoRM N
• ACL management: Belle II D, ATLAS D, LHCb D; dCache N, DPM N, EOS Y (custom REST API), StoRM N, xrootd N
• Monitoring of HTTP traffic: Belle II D, ATLAS C, LHCb D; dCache N (planned), DPM Y, EOS Y (internal, not via xrootd), StoRM N, xrootd Y
• Endpoint publication to BDII: Belle II D, ATLAS C, LHCb C; dCache Y, DPM Y, EOS N, StoRM Y, xrootd Y

Legend:
A: required for the deployment of HTTP to be worth pursuing at all
B: required for SRM-free operation of disk storage
C: required for single-protocol (i.e. HTTP-only) disk storage
D: desirable feature
E: irrelevant

LFC usage in Belle II

LFC (LCG File Catalog)
• PFN (Physical File Name): the physical location of a file
• LFN (Logical File Name): a site-independent file name
• GUID (Globally Unique Identifier)

AMGA (ARDA Metadata Grid Application)
• LFN
• GUID
• EXP
• RUN
• ...

LFN and GUID are the common data between LFC and AMGA; gBasf2 uses both catalogs to map files to replicas (an illustrative sketch follows).
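As an illustrative sketch only (not the actual gBasf2 code), an LFN could be resolved to its replicas through the DIRAC catalog interface that gBasf2 builds on, assuming a configured DIRAC client environment, a valid VOMS proxy and LFC set up as the file catalog; the LFN below is hypothetical.

```python
# Illustrative only: resolve an LFN to its physical replicas (PFNs) through
# DIRAC's catalog interface, which gBasf2 builds on. Assumes a configured
# DIRAC client, a valid VOMS proxy and LFC set up as the file catalog.
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)   # initialise the DIRAC client

from DIRAC.Resources.Catalog.FileCatalog import FileCatalog

lfn = "/belle/DATA/example/mdst_0001.root"   # hypothetical LFN
result = FileCatalog().getReplicas(lfn)
if not result["OK"]:
    raise RuntimeError(result["Message"])

for name, replicas in result["Value"]["Successful"].items():
    for se, pfn in replicas.items():
        print("%s @ %s -> %s" % (name, se, pfn))
```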

The LFC service (LFC + DB) is centralized at KEK.

DPMs in action in the last 28 days
[Plot: activity of the DPM SEs over the last 28 days]

Conclusion
• DPM is largely used in the Belle II community.
• The survey has shown a substantial heterogeneity in terms of hardware deployment and software configuration, with no specific issues.
• For the current usage, DPM fits the needs of Belle II well in terms of ACLs, checksums and general features.
• DPM also offers additional features to test the user-analysis strategy, including xrootd and http support.
• Problems have been solved quickly thanks to the proactive support of the DPM team.

References
“Network requirement for the Belle II user analysis”, Belle II Network Meeting (Nov. 17, 2014, New Orleans, USA), https://kds.kek.jp/indico/event/16273/contribution/0/material/slides/0.pdf

“Belle II Grid user analysis progress and plans”, B2GM-22, https://kds.kek.jp/indico/event/19519/session/14/contribution/90/material/slides/0.pptx