HEPiX Report

Helge Meinhard, Pawel Grzywaczewski, Romain Wartel / CERN-IT
Post-C5/Computing Seminar, 03 December 2010
CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it

Outline
• Meeting organisation, site reports, (benchmarking,) infrastructure (Helge Meinhard)
• Storage, OS and applications, miscellaneous (Pawel Grzywaczewski)
• Virtualisation, security and networking, grid and cloud (Romain Wartel)

HEPiX report – Helge.Meinhard at cern.ch – 03-Dec-2010

HEPiX
• Global organisation of service managers and support staff providing computing facilities for HEP
• Covering all platforms of interest (Unix/Linux, Windows, Grid, …)
• Aim: Present recent work and future plans, share experience, advise managers
• Meetings: about 2 per year (spring in Europe, autumn typically in North America)

HEPiX Autumn 2010 (1)
• Held 01 – 05 November at Cornell University, Ithaca NY
  – CESR: Electron-positron storage ring; CLEO: experiment doing a lot of interesting b physics
  – New player in the HEPiX field as a site… but a well-known face: Chuck Boeheim, the previous North American co-chair of HEPiX
  – Good local organisation
  – Nice auditorium in conference hotel, basically unlimited coffee supply
  – First face-to-face meeting in 2010 for most participants

HEPiX Autumn 2010 (2)
• Format: Pre-defined tracks with conveners and invited speakers per track
  – Still room for spontaneous talks: either fitted into one of the tracks, or classified as ‘miscellaneous’
  – Again proved to be the right approach; in view of the low number of participants, an extremely rich, interesting and packed agenda
  – Judging by number of submitted abstracts, no real hot spot: 8 infrastructure, 8 Grid/clouds, 7 storage, 6 virtualisation, 6 OS and apps, 5 network and security, 3 miscellaneous, 1 benchmarking
  – Some abstracts submitted late, which made planning difficult
• Full details and slides:
http://indico.cern.ch/conferenceDisplay.py?confId=92498
• Trip report by Alan Silverman available, too: http://cdsweb.cern.ch/record/1307061

HEPiX Autumn 2010 (3)
• 47 registered participants, of which 11 from CERN
  – Barring, Bell, Grzywaczewski, Janyst, Kelemen, Meinhard, Salter, Schwickerath, Silverman, T Smith, Wartel
  – Other sites: ASGC, Caspur, CEA, CNAF, Cornell, DESY Hamburg, DESY Zeuthen, FNAL, FZU, IN2P3, INFN Milano, INFN Pavia, JLAB, KISTI, LAL, NIKHEF, RAL, SFU, SLAC, TRIUMF, Umea U
  – Compare with Berkeley (autumn 2009): 61 participants, of which 9 from CERN

HEPiX Autumn 2010 (4)
• 62 talks, of which 19 from CERN
  – Compare with Berkeley: 62 talks, of which 16 from CERN
• Next meetings:
  – Spring 2011: GSI Darmstadt (May 2 - 6)
    • Possibly followed by an LCG workshop over the weekend
  – Autumn 2011: Vancouver (to be confirmed, date to be decided; 20th anniversary of HEPiX!)

Oracle/SUN Policy Concerns
• Recent observations:
  – Significantly increased HW prices for Thor-style machines
  – Very significantly increased maintenance fees for Oracle (ex-Sun) software running on non-Oracle hardware
    • Sun GridEngine, Lustre, OpenSolaris, Java, OpenOffice, VirtualBox, …
  – Very limited collaboration with non-Oracle developers
  – Most Oracle software has already been forked as open-source projects
  – (At least) two Oracle-independent consortia around Lustre
• HEP labs very concerned

Site Reports (1)
• Worker node acquisitions: HPC-style small form-factor boxes (e.g.
4 dual-CPU systems in a 2U enclosure) very popular
  – HP, Dell, Supermicro, Acer, …
  – Overheating CPUs because of missing thermal grease…
• Disk storage
  – Many sites using storage-in-a-box (à la CERN or Thumper/Thor)
  – Some dedicated storage with SAN (FC or iSCSI) or NAS uplink
    • Dell MD1000, DDN 6620 (60 drives in 4U)
  – ASGC reporting servers of 160 TB with 10GE

Site Reports (2)
• Tape storage: Large sites appear to prefer SL8500 robots and LTO5 drives
• Networking: OPN hasn’t yet reached the Norwegian and Slovenian parts of the Nordic T1
• GPUs mentioned only for non-HEP applications

Site Reports (3)
• Batch schedulers: One singularity (BQS) gets eradicated at last…
  – Replaced by (Sun|Oracle) Grid Engine
• Configuration tools: Random walk of Quattor, cfengine, Puppet, …
  – One more very positive Quattor report from RAL
• Monitoring: Many sites using Nagios, which appears to be becoming a de-facto standard
• Drupal, Jabber, Trac mentioned a number of times

Site Reports (4)
• Other interesting points
  – DESY: turning into a centre for accelerator research, particle physics and photon physics. New requirements
    • MacOS, Windows HPC
    • Large RAM machines (~500 GB!)
    • Lustre on MacOS and Windows
    • 20…100 GB from XFEL from 2012/13 on
    • NUMA, GPU computing
    • Hundreds of VOs with a handful of users each
  – SLAC: Similar conversion to that of DESY
    • Mixed experience with back-charging for scientific computing
    • Tender with fixed budget for maximal capacity

Site Reports (5)
  – FNAL
    • Return air is 23 deg C, too low
    • DOE review recommends physics capacity without UPS
  – JLAB: Using tool (Surveyer) to control power consumption of desktop PCs – large savings
  – RAL: Quibble about email addresses
  – GSI: Compute requirements similar to those for LHC
    • “Cube” computer centre prototyped – PUE 1.1
  – FZU Prague: Experience with a 100 A circuit breaker tripping…
    • … because it has been running at 96 A constantly…

Infrastructure (1)
• 8 talks, 3 from CERN
  – LSF scalability tests (Schwickerath)
  – CERN-IT procurement (Barring)
  – CERN computer centre upgrade project (Salter)
• Update on Quattor at RAL (Collier)
  – Started with new batch worker nodes.
Batch done, now covering disk servers gradually
  – Good experience, estimated saving of 0.3…0.5 FTE
  – Good experience with QWG as well
  – NIKHEF starting tests, too

Infrastructure (2)
• Assets server at LEPP, Cornell U (Pulver)
  – OCSNG: multi-platform “moment in time” hardware and software inventory
  – GLPI: full lifecycle asset management
  – ZENOSS: agent-less monitoring, extensible via Python scripts
• Scientific computing at JLAB (Philpott)
  – Extensive experience with GPUs: mixture of gaming cards and professional units
    • Mostly using CUDA, interested in OpenCL
  – Lustre: 300 TB on commodity HW – lots of fun points

Infrastructure (3)
• Batch infrastructure resource at DESY: BIRD (Finnern)
  – Collecting smallish resources, parasitic batch usage based on SGE fair-share
  – Contributing projects are granted fair-share points
• Infrastructure improvements at IN2P3 (Olivero)
  – Current room: new transformers, diesel, cooling unit
  – New building: 2 levels of 850 m², no offices
    • No raised floor, no false ceiling. All services from above
    • Minimal electrical redundancy, will perhaps use EDF offering of double dedicated connection
    • Construction started in April 2010, scheduled to finish by February 2011. First production in March 2011
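
The fair-share points mentioned for BIRD above can be illustrated with a small sketch. This is not DESY's configuration nor Grid Engine's actual scheduling algorithm. The project names, the point values and the priority formula (entitled share divided by recently used share) are assumptions, chosen only to show the general idea that a contributing project which has used less than its entitlement is served first.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str          # hypothetical project name
    points: float      # fair-share points granted for contributed resources
    used_cpu_h: float  # CPU hours consumed in the accounting window


def fair_share_priorities(projects):
    """Return {name: priority}; a higher value means 'schedule sooner'.

    Assumed formula: entitled share / used share. A small epsilon keeps
    the division defined for projects that have not run anything yet.
    """
    total_points = sum(p.points for p in projects)
    if total_points <= 0:
        raise ValueError("at least one project must hold fair-share points")
    total_used = sum(p.used_cpu_h for p in projects)
    eps = 1e-9
    priorities = {}
    for p in projects:
        entitled = p.points / total_points
        used = p.used_cpu_h / total_used if total_used > 0 else 0.0
        priorities[p.name] = entitled / (used + eps)
    return priorities


def dispatch_order(projects):
    """Project names sorted from highest to lowest priority."""
    prio = fair_share_priorities(projects)
    return sorted(prio, key=prio.get, reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the talk.
    demo = [
        Project("proj_a", points=50, used_cpu_h=800),
        Project("proj_b", points=30, used_cpu_h=100),
        Project("proj_c", points=20, used_cpu_h=100),
    ]
    print(dispatch_order(demo))
```

In this toy example proj_a holds half of the points but has consumed 80% of the recent CPU time, so it is ranked behind the two smaller contributors until its usage falls back towards its entitlement.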
