
1 Computing at the Belle-II experiment

Takanori Hara (KEK), [email protected]
For the Belle II distributed computing group
13 April 2015 @ CHEP2015, Okinawa

2 From Belle to Belle II

Accelerator: KEKB
  Beam energy (GeV): 3.5 x 8 (βγ = 0.425)
  CM energy: ..., Y(4S), ...
  Luminosity (cm-2 s-1): 2.1 x 10^34
  Total data (ab-1): 1
Data: raw data ~1 PB; mDST data/MC: 0.14/0.6 PB (for one version)
Computing: one big computing center @ KEK (non-grid)

[Plot: Belle integrated luminosity (ab-1) vs. year, up to ~1 ab-1, annotated with physics
achievements at Belle (grouped under "Confirmation of SM" and "Serendipity"): CPV in the neutral
B system, CPV in B→ππ, direct CPV in B→Kπ, B→Kll decay, first tetra-quark X(3872), D0-D0bar
mixing, b→dγ transition, B→τν decay, bottomed tetra-quark Zb.]

3 From Belle to Belle II

                          KEKB                      SuperKEKB
Beam energy (GeV)         3.5 x 8 (βγ = 0.425)      4 x 7 (βγ = 0.28)
CM energy                 ..., Y(4S), ...           ..., Y(4S), ...
Luminosity (cm-2 s-1)     2.1 x 10^34               8 x 10^35   (x 40)
Total data (ab-1)         1                         50          (x 50)
Raw data                  ~1 PB                     ~100 PB (another raw data copy outside KEK)
mDST data/MC              0.14/0.6 PB (for one version)
Computing                 one big center @ KEK      world-wide distributed computing
                          (non-grid)

[Plot: expected Belle II integrated luminosity (ab-1) vs. year (2017-2025), up to ~50 ab-1
("just an image"), annotated with expected achievements: CKM angle measurements with 1 degree
precision, discovery of B→Kνν, precise measurement of D mixing, journey for new physics,
discovery of new subatomic particles, sin2θW with O(10^-4) precision, observations with Υ(5S),
Υ(3S) etc., discovery of B→µν / B→τν, discovery of B→D....]

4 Hardware Resources for Belle II

Estimates made in early 2014. Uncertainties: performance of the accelerator, beam background
conditions, and improvements of the software. The yearly profile may change, but the total in
the last year should stay at the same level.

[Plots: estimated resources per year (challenge, Year 1 - Year 7), total integrated:
 - Tape for raw data (PB), up to ~250: original copy @ KEK plus a copy outside KEK;
 - CPU (kHEPSpec), up to ~1,400: data (reprocess), MC (reproduce), analysis, challenge;
 - Disk space (PB), up to ~160: data, MC, analysis, challenge.]

5 Hardware Resources for Belle II

The same estimates compared with LHC resources based on the published pledges
(http://wlcg-rebus.cern.ch/apps/pledges/summary/); the real capacities and usages can be
different.

[Plots: the tape/CPU/disk estimates above overlaid with the ATLAS, CMS, ALICE, and LHCb pledges.]

6 SuperKEKB/Belle II Time line

KEK is the hosting institute of the Belle II experiment.

[Timeline over calendar years 2015-2018:]

Detector: VXD (= SVD+PXD), TOP, CDC, ARICH, FWD/BWD endcaps; all modules installed, global
  cosmic ray run, roll-in, QCS, commissioning detector (BEAST2).
Accelerator: Phase 1, Phase 2, Phase 3.
Computing: KEK computing system replacement; Belle II distributed computing should be ready
  (raw data comes, but small); raw data processing; raw data distribution start.
We are here!

7 Belle II Collaboration

23 countries/regions, 99 institutes, 634 colleagues (as of April 4, 2015)
c.f. ATLAS: 38 countries, 177 institutes, ~3000 members
     CMS: 42 countries, 182 institutes, 4300 members
     ALICE: 36 countries, 131 institutes, 1200 members
     LHCb: 16 countries, 67 institutes, 1060 members

Asia: ~43%, Europe: ~40%, N. America: ~17%
Japan: 139, Germany: 89, US: 78, Italy: 62, Russia: 40, Korea: 37, Taiwan: 25, India: 25,
Australia: 22, Canada: 20, China: 18, Slovenia: 17, Austria: 14, Poland: 11, Czech Rep.: 8,
Mexico: 8, others: < 8 colleagues / country

8 Distributed Computing Resources

GRID (middlewares): KIT, CNAF, CESNET, SiGNET, HEPHY, UA-ISMA, ULAKBIM, CYFRONET, .....

Clusters w/o middleware (GE, TORQUE, LSF, ...): direct submission to the batch system
  (BINP, NSU, universities in Japan and Korea).
Grid sites with CREAM CE: the Belle II resources / infrastructure overlap with WLCG.
Cloud sites (Melbourne, UVic, PNNL, ...; academic clouds and commercial clouds such as Amazon
  EC2), integrated either via a dynamic Torque / cloud scheduler + VM manager, seen as a
  traditional CREAM CE site, or via HTCondor installed in each cloud site (e.g. the Cracow
  cloud site).
HPC resources.

9 Interoperability with DIRAC

DIRAC (Distributed Infrastructure with Remote Agent Control), originally developed for LHCb and
run from the servers at KEK, ties the same grid, cluster, cloud, and HPC resources together
(a minimal job-submission sketch follows below):
  - Clusters w/o middleware (GE, TORQUE, LSF, SLURM, ...): direct submission to the batch system
    through SSH or an SSH tunnel with the DIRAC SiteDirector (BINP, NSU, many universities in
    Japan).
  - Grid sites with CREAM CE: handled by the DIRAC SiteDirector.
  - Cloud sites (Melbourne, UVic, PNNL, ...; academic and commercial clouds such as Amazon EC2):
    * VMDIRAC: provided as a DIRAC plugin (needs an additional installation); multiple cloud
      sites are allowed; each cloud is handled as a site; no modification is needed on the
      cloud site.
    * Dynamic Torque / cloud scheduler + VM manager: seen as a traditional CREAM CE site.
    * HTCondor installed in each cloud site (e.g. Cracow).
  - HPC resources.
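For illustration, a minimal sketch of how a job is handed to DIRAC through its standard Python
API, assuming a configured DIRAC client and a valid proxy; in Belle II the BelleDIRAC/gb2 client
tools wrap this layer, and the job name and executable below are placeholders.

    # Minimal DIRAC job-submission sketch (generic DIRAC API, not the Belle II gb2 wrapper).
    from DIRAC.Core.Base import Script
    Script.parseCommandLine()                      # initialize the DIRAC configuration

    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job

    job = Job()
    job.setName("belle2-example")                  # placeholder name
    job.setExecutable("run_simulation.sh")         # hypothetical wrapper script
    job.setCPUTime(3600)
    # No site is pinned: grid CEs, SSH-reached clusters, and VMDIRAC-managed clouds all
    # appear as DIRAC sites, so the broker may schedule the job on any of them.

    result = Dirac().submitJob(job)                # older DIRAC releases use Dirac().submit(job)
    print(result)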

10 Belle II Computing Model (until the end of year 3)

Raw Data Centers (KEK Data Center and PNNL Data Center): raw data storage and (re)processing,
with the raw data kept in duplex.
Regional Data Centers (Asia, Europe 1, Europe 2): mDST data storage.
GRID sites, cloud sites, and computer cluster sites: MC production and physics analysis skims
(mdst as input).
Local resources: user analysis at the ntuple level.
[Legend: tape, disk, CPU per site type; dashed lines indicate ntuple flow.]

11 Belle II Distributed Computing system

DIRAC main servers @ KEK; DIRAC servers for test/development purposes at PNNL (USA), Cracow (Poland), etc.

VOMS @ KEK

AMGA + LFC: has been working well, with recent improvements.

Studies of DFC vs. AMGA+LFC: not yet at a stage to compare their scalability.

FTS3: getting integrated.

cvmfs is used for software distribution.

12 Data flow diagram

Data, processed at the "raw data centers": the DAQ units write to the Belle II online disk
(sequential ROOT); no intermediate stage is saved. In offline storage the sequential ROOT files
are merged into raw data in ROOT format (~300 kB/event). The first processing step is the
reconstruction (~4 sec/event), producing mDST (~40 kB/event) that is distributed to each region.

MC, produced on the world-wide distributed computing: event generation → detector simulation →
digitization → reconstruction → mDST (~40 kB/event).

"Coordinated" group skimming produces the group official skims as mdst, µdst, and index files:
  mdst: reconstruction-level information;
  µdst: mdst + particle-level information;
  index: a collection of pointers to events.
(A rough numerical cross-check of these per-event figures follows below.)
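As a rough cross-check of the per-event sizes and processing time quoted above against the
~100 PB raw-data expectation from the earlier slides (purely illustrative arithmetic):

    # Back-of-the-envelope cross-check using only numbers quoted in these slides.
    raw_total_bytes = 100e15      # ~100 PB of raw data expected at Belle II
    raw_event_size = 300e3        # ~300 kB/event (raw data)
    mdst_event_size = 40e3        # ~40 kB/event (mDST)
    recon_time = 4.0              # ~4 s/event reconstruction

    n_events = raw_total_bytes / raw_event_size
    mdst_total_pb = n_events * mdst_event_size / 1e15
    cpu_core_years = n_events * recon_time / (3600 * 24 * 365)

    print(f"events         : {n_events:.1e}")         # ~3e11 events
    print(f"mDST per pass  : {mdst_total_pb:.0f} PB")  # ~13 PB for one (re)processing pass
    print(f"reconstruction : {cpu_core_years:.0f} core-years per pass (naive, single pass)")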

User distributed analysis, conceptually: a user job reads the index (and the mDST events it
points to) and writes an ntuple; this needs less disk but implies more chaotic network access.
Another possibility is a user job reading a regional mDST copy directly: more disk requirements,
but fewer network requirements.
(Ueda I., "Belle II Computing towards ...", 2015.02.07 @ B2GM, KEK)

13 Operation at larger scale

15 countries/regions: Australia, Austria, Canada, Czech R., Germany, Italy, Japan, Korea,
Poland, Russia, Slovenia, Taiwan, Turkey, Ukraine, USA.
31 sites; GRID, cloud, and local clusters are available.
18k concurrent jobs (one-day average).

MC production campaigns (Feb. 2012 - Feb. 2015): 1st ~60M events, 2nd ~560M events,
3rd ~6B events, 4th ~11B events; more than 3 ab-1 worth of MC produced in 2014.
However, this is still a factor of ~10 below the requirements for full Belle II luminosity.

c.f. ATLAS (>130 sites): ~140k jobs; LHCb (~120 sites): ~40k jobs; Belle II now: 18k jobs.

14 Monitoring system

• The DIRAC web interface provides some monitors, e.g. for job status, but it is not sufficient
  for our operation (e.g. pilot status), so we are developing our own monitoring system with a
  web interface.

Two-way monitoring:
• Active: submit test jobs, test SE access, test port access.
• Passive: keep statistics on the results of pilot jobs, analyze the pilot-job logs.

The monitoring results are visualized on a web page, so they can be checked easily from anywhere.

15 Monitoring system

Test jobs check the basic requirements at each site (libraries, free disk space, etc.), and the
pilot status is checked as well: processing time, heartbeat time, and so on (a minimal sketch of
such an active check follows below).

We can now quickly find many kinds of problems with our monitoring system, which has made the MC
production shifts easier. As a next step, a more sophisticated system is being developed:
more precise diagnosis, automated notification, etc.
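For illustration, a minimal sketch of one such active check, a TCP port test against a storage
element; the host and port below are taken from the SE example later in these slides and serve
only as placeholders (the real monitor also submits DIRAC test jobs and SE access tests).

    # Minimal "active" check: is a TCP port on a storage element reachable?
    import socket

    def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
        """Return True if a TCP connection to host:port succeeds within the timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Example (placeholder endpoint): the SRM port of a storage element.
    print(port_is_open("kek2-se01.cc.kek.jp", 8444))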

16 Distributed Computing in future

The system keeps evolving; the status of the components ranges from "prototype worked" through
"more development", "to be developed", "to be considered", and "to be integrated". Layered
architecture:
  Users: production manager, data manager, end users;
  BelleDIRAC: gb2 client tools, web portal, Fabrication System, Distribution System;
  DIRAC: WMS, VMDIRAC, RMS, DMS;
  Grid services for Belle II: CVMFS, AMGA, LFC, FTS;
  Sites: cloud interfaces / clouds, CEs / clusters, SEs, Xrootd server (local and remote I/O),
    DIRAC site slave.

17 Belle II Computing Model (from year 4)

Raw data storage and (re)processing shared among the Raw Data Centers (raw-data part, starting
in year 4): KEK Data Center (100%), PNNL Data Center (30%), Germany Data Center (20%), Italy
Data Center (20%), Canada Data Center (10%), Korea Data Center (10%), India Data Center (10%).
Regional Data Centers in Asia, North America, and Europe hold the mDST data; GRID sites, cloud
sites, and computer cluster sites do MC production and physics analysis skims; local resources
serve user analysis at the ntuple level.

18 Processed Data Distribution

mDST (data) is copied in Asia, Europe, and USA.

For the MC it seems natural to adopt a similar structure, with one set of mDST + MC in each
region. Pros: better network(?) within each region, completeness of the dataset in each region,
easier maintenance(?). Cons: imbalance of resources, data copies between the three regions.

Europe (main centers: GridKa/DESY (Germany), CNAF (Italy)): SiGNET (Slovenia), CYFRONET/CC1
(Poland), BINP (Russia), HEPHY (Austria), CESNET (Czech Rep.), ISMA (Ukraine), INFN
Napoli/Pisa/Frascati/Legnaro/Torino (Italy), ULAKBIM (Turkey), Spain, Saudi Arabia, ...
America (main center: PNNL): U.Vic. / McGill (Canada), VPI, Hawaii, many US universities,
Mexico, ...
Asia (main center: KEK (Japan)): KISTI (Korea), NTU (Taiwan), Melbourne U. (Australia),
IHEP (China), TIFR (India), many Japanese universities, Thailand, Vietnam, Malaysia, ...

19 Trans-Pacific data challenge

Network path: KEK (Tsukuba) - Tokyo - PacificWave (Seattle / Los Angeles) - PNNL, over 10 Gbps
links.

[Throughput plot with "Summer 2014 Goal" and "Summer 2015 Goal" reference lines: transfers
between PNNL and KEK reached 1000 MB/s.]

20 Trans-Atlantic data challenge

Setup (test done in May/June 2014):
• US side: a dedicated 10 Gbps link between the PNNL SE and ESnet, and a 10 Gbps best-effort
  label-switched path through the ESnet backbone to the MANLAN exchange (New York).
• Across the Atlantic, the ANA-100 100G link and the SURFnet, GÉANT, GARR, and DFN research
  networks carry a single Ethernet VLAN (3011), bridged between the providers' VRFs, to the SEs
  at KIT (GridKa dCache) and INFN (CNAF and Napoli).
• The network providers set up the VLAN; local network providers and sites coordinated the final
  configurations, and the sites had to configure their hardware interfaces to match the
  destinations.

Method:
• "traceroute" was used to confirm the routing to each SE.
• "iperf" was used for the initial network transfer rate tests.
• The FTS3 server at GridKa was used to schedule the data transfers (a minimal submission
  sketch follows below).

21 Trans-Atlantic data challenge

Results using the FTS3 server: transfers between PNNL and the European SEs reached about
1.0 GBytes/sec (= 8 Gbps).
[Throughput plots of the input and output traffic during the challenge.]

Thanks to Vincenzo Capone, Chin Guok, Aleksandr Kurbatov, Mian Usman, M. Schram (PNNL),
Thomas Schmid, Hubert Weibel, and Marco Marletta.
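For illustration, a minimal transfer-submission sketch using the FTS3 REST Python bindings
(fts3-rest "easy" interface); the endpoint URL and SURLs are placeholders, not the actual hosts
used in the challenge, and a valid grid proxy is assumed.

    # Minimal FTS3 submission sketch (placeholder endpoint and SURLs; grid proxy assumed).
    import fts3.rest.client.easy as fts3

    endpoint = "https://fts3.example.org:8446"     # e.g. a GridKa-like FTS3 instance
    context = fts3.Context(endpoint)

    transfer = fts3.new_transfer(
        "srm://source-se.example.org/belle2/raw/example.root",       # hypothetical source
        "srm://destination-se.example.org/belle2/raw/example.root",  # hypothetical destination
    )
    job = fts3.new_job([transfer])

    job_id = fts3.submit(context, job)
    print("submitted FTS3 job:", job_id)
    print(fts3.get_job_status(context, job_id)["job_state"])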

22 Belle II joined LHCONE

[Map of LHCONE connectivity across North America, Europe, and Asia, including KEK and the other
Belle II data centers alongside the WLCG Tier-1/Tier-2 sites.]

• The network requirements for Belle II are similar to those of the LHC experiments.
• Most Regional Data Centers are already part of LHCONE.
• LHCONE has been extended to include Belle II.
Jason Zurawski, Joe Metzger (Mar. 23, 2015),
http://www.es.net/assets/pubs_presos/20150323-OSG-ATLASCMS-Zurawski-v2.pdf

23 Common Software Framework

Belle II Analysis Software Framework (basf2): a path of modules (input module → Mod → Mod → ...
→ output module) that exchange data through a common Data Store; the steering file is written
in Python.

24 Common Software Framework

Multi-process feature of the Belle II Analysis Software Framework: the input and output modules
run in their own paths, connected through ring buffers to parallel worker paths; each worker has
its own Data Store and runs the event-level modules (geometry initialization, event generation,
detector simulation, digitization, reconstruction with track finding/fitting and π0/γ
clustering, particle identification). The steering file is still written in Python; the feature
is not optimized for hyper-threading yet. (A minimal steering-file sketch follows below.)
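A minimal basf2 steering-file sketch for illustration; the module list is abbreviated and the
exact module names/parameters of the 2015-era release may differ in detail.

    # Minimal basf2 steering-file sketch (module list abbreviated / illustrative).
    from basf2 import create_path, register_module, process, set_nprocesses, statistics

    main = create_path()
    main.add_module(register_module('RootInput'))    # input module feeding the Data Store
    # ... simulation / digitization / reconstruction modules would be added here ...
    main.add_module(register_module('RootOutput'))   # output module writing the processed events

    set_nprocesses(4)   # multi-process feature: event path replicated in 4 workers via ring buffers
    process(main)
    print(statistics)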

25 Software update/improvement

[Bar chart: processing cost per event, build-2014-01-17 ~100 HS06*s/event vs.
build-2015-02-03 ~40 HS06*s/event.]

26 Coordinated group skimming

From the experience of Belle:
• Many users cannot use resources effectively in the skimming process; they iterate the
  skimming with different selections.
• Not so many users pay attention to whether or not the submitted jobs finished successfully
  (consideration of exp/run dependences, log-file checks, ...).
• Users' interests are easily concentrated on certain datasets, in particular at the early
  stage of the experiment: data taken under good accelerator/detector conditions and new
  datasets (new exp#/run#), i.e. the same input but many different outputs.
• It usually takes too much time for users to carry out and finish the skimming process
  themselves.

[Plot: job failure rate.]
"Coordinated" group skimming must make the physics analysis faster.

27 Production system

MC in early 2014: no "Production System". 2.5 M jobs were submitted manually by the shifters,
leading to some mistakes, e.g. the same job submitted twice, a wrong destination SE, ...

Production system, what we need: once a production manager defines the production parameters
(software release, simulation channel, background, number of events, ...), the system takes
care of the rest and produces the output (an illustrative sketch of such a production
definition follows below).
[Components: UI, monitor, DB, fabrication, distribution, validation.]

MC in late 2014: with the "Prototype Production System", 4.7 M jobs were controlled by a single
person.
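Purely illustrative sketch of what such a production definition and its fabrication step might
look like; the field names and values are hypothetical, not the actual BelleDIRAC schema.

    # Hypothetical production definition (field names are illustrative, not the real schema).
    production_request = {
        "software_release": "build-2015-02-03",     # a release quoted earlier in these slides
        "simulation_channel": "generic BBbar",      # hypothetical channel label
        "background": "nominal",                    # beam-background configuration
        "n_events": 10_000_000,
        "destination_se": "EXAMPLE-SE",             # hypothetical storage element name
    }

    def fabricate(request, events_per_job=10_000):
        """Fabrication step (sketch): split the request into fixed-size jobs."""
        n_jobs = request["n_events"] // events_per_job
        return [{"job_index": i, "n_events": events_per_job} for i in range(n_jobs)]

    print(len(fabricate(production_request)), "jobs would be fabricated")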

“Full Production System” yet to be developed 28 Summary

• Belle II time line: SuperKEKB accelerator commissioning (Phase 1) starts in early 2016;
  Phase 2 (without the VXD) starts in 2017, by when the Belle II distributed computing should
  be ready; Phase 3 (with the full detector) starts in 2018, when raw data distribution starts.
• MC mass production & data transfer challenges: the basic concept of the Belle II computing
  model has been proven, but we still have many things to do (e.g. data distribution, user
  distributed analysis, etc.).
• Efforts to utilize the limited hardware resources as much as possible are on-going toward the
  physics run: multicore jobs (reduced memory size), tuning/optimization of the software
  (reduced memory size, shorter CPU time), coordinated group skimming (fewer human errors), and
  the production system.

29 Belle II Oral/Poster Presentations

Software:
• Oliver FROST @ Track2, April 13: Cellular Automaton based Track Finding for the Central Drift Chamber of Belle II
• Marko BRACKO @ Track3, April 13: The Belle II Conditions Database
• Marko STARIC @ Track2, April 14: Physics Analysis Software Framework for Belle II
• Thomas HAUTH @ Poster A, 233: Software Development at Belle II
• Tobias SCHLÜTER @ Poster A, 469: The GENFIT Library for Track Fitting and its Performance in Belle II
• Leo PIILONEN @ Poster B, 320: The Simulation Library of the Belle II Software System
• Tadeas BILKA @ Poster B, 469: Alignment and calibration of Belle II tracking detectors

Computing:
• Hideki MIYAKE @ Track4, April 13: Belle II production system
• Randy SOBIE @ Track7, April 14: Utilizing cloud computing resources for Belle II
• Yuji KATO @ Poster A, 337: Job monitoring on DIRAC for Belle II distributed computing
• Chia-Ling HSU @ Poster A, 468: The Belle II analysis on Grid
• Rafal Zbigniew GRZYMKOWSKI @ Poster B, 342: Belle II public and private clouds management in VMDIRAC system
• Kiyoshi HAYASAKA @ Poster B, 314: Monitoring system for the Belle II distributed computing
• Jae-Hyuck KWAK @ Poster B, 466: Improvement of AMGA Python Client Library for the Belle II Experiment
• Geun Chul PARK @ Poster B, 313: Directory Search Performance Optimization of AMGA for the Belle II Experiment

30 DAQ + physics trigger

[Diagram: detector front-ends (CDC, ..., PXD) → Event Builder 1 → HLT PC farm → Event Builder 2
(adds the PXD data) → online disk → offline storage, at a maximum of 1.8 GB/sec to storage.]

Processing chain: event building without the pixel detector (PXD) → L3 trigger → reconstruction
chain → physics trigger → PXD event building → RAW storage, with stated reductions of 1/3
(L3 trigger, beam-background rejection), 1/2-1/6 (physics trigger / physics categorisation),
and 1/3-1/6.

31 AMGA metadata catalogue

AMGA (ARDA Metadata Grid Application; ARDA = A Realisation of Distributed Analysis for LHC):
a metadata server for the GRID environment (an EMI product). Metadata is "data about data":
LFN, run range, software version, ...

LFC (LCG (LHC Computing Grid) File Catalog) maps between:
  LFN (Logical File Name): a site-independent file name, e.g. /grid/belle/MC/signal/...
  PFN (Physical File Name): a specification of the physical location of a file,
    e.g. srm://kek2-se01.cc.kek.jp:8444/grid/belle/MC/signal/...
    (c.f. a URL such as http://belle2.kek.jp/join.html)
  GUID (Globally Unique Identifier)

gBasf2 talks to the LFC through the LFC API and to AMGA through the AMGA API; the LFN is the
only datum common to the LFC (file catalog) and AMGA (file metadata: LFN, Exp, Run, ...).
API: application (programming) interface. (A conceptual sketch follows below.)
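Conceptual sketch only of how the LFN ties the two catalogues together; the dictionaries below
stand in for the real LFC/AMGA services and their APIs, and the file, GUID, and metadata values
are illustrative.

    # Conceptual sketch: the LFN links the file catalog (LFC) and the metadata catalog (AMGA).
    lfc = {  # LFN -> GUID and replica PFNs (stand-in for the LFC)
        "/grid/belle/MC/signal/example.mdst.root": {
            "guid": "0f8e-...",  # truncated, illustrative
            "pfns": ["srm://kek2-se01.cc.kek.jp:8444/grid/belle/MC/signal/example.mdst.root"],
        },
    }

    amga = {  # LFN -> metadata attributes (stand-in for AMGA)
        "/grid/belle/MC/signal/example.mdst.root": {
            "exp": 0, "run_range": (1, 100), "software_version": "build-2015-02-03",
        },
    }

    def select_files(exp):
        """Pick LFNs by metadata (AMGA side), then resolve them to PFNs (LFC side)."""
        lfns = [lfn for lfn, meta in amga.items() if meta["exp"] == exp]
        return [pfn for lfn in lfns for pfn in lfc[lfn]["pfns"]]

    print(select_files(exp=0))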