Univa Grid Engine Status at CC-IN2P3
Vanessa HAMAR for the CCIN2P3 Batch Team
Centre de Calcul de l’Institut National de Physique Nucléaire et de Physique des Particules
2014/03/11

Overview
• Team members
• History
• Configuration
• Numbers
• Grid integration
• Monitoring
• Multicore
• What we want?
• Future Plans

Batch team members
• Batch team leader:
  ◦ Suzanne Poulat
• System Administrators: 1 FTE
  ◦ Aurélien Gounon
  ◦ Mattieu Puel
  ◦ Vanessa Hamar
• User Support and UGE administrators:
  ◦ Bernard Chambon
  ◦ Nadia Lajili
  ◦ Rachid Lemrani

History
BQS 1992-2012 | OGE 2012-2013 | UGE 2013 - ...
• BQS: home-made batch system
  ◦ 1992 - 2012 -> 20 years of service!
  ◦ Stable and robust product:
    - 10.000 simultaneous jobs
    - 100.000 jobs per day (2010)
  ◦ Jobs and configuration in a MySQL DB
  ◦ The “HEP world” faced new requirements:
    - VM
    - multicores
    - interactive
    - growing needs
• OGE
  ◦ 2010: Sun Grid Engine (SGE) chosen
  ◦ Used in a test farm to customize the configuration
  ◦ March 2011: farm opened to users
  ◦ At the beginning of 2011: Sun bought by Oracle
  ◦ Since July 2011: Oracle Grid Engine (OGE)
  ◦ Oracle support was not satisfactory.
• UGE
  ◦ Fall 2012 -> evaluation of other Grid Engine options
  ◦ Early 2013 -> choice of Univa Grid Engine (UGE)
  ◦ June 2013 -> UGE replaces OGE (2013/06/05)

Configuration - Nodes
• Common
  ◦ Operating System = Scientific Linux 6
  ◦ UGE Version = 8.1.6
• Master
  ◦ Automatic restart procedure for the qmaster process
  ◦ Job flow regulation via complexes, with intensive use of Resource Quota Sets (RQS)
  ◦ PostgreSQL spooling (dedicated server)
  ◦ ARCo writing to a PostgreSQL DB + internal tools to access the data
• Shadow
  ◦ Stopped after several master outages
• WNs
  ◦ Local disk space ($TMPDIR) managed by XFS quota
  ◦ Outputs copied to the user’s HOME when the job is DONE
  ◦ AFS as filesystem for the common directory
  ◦ AFS token renewal in set_token_cmd (home-made)
  ◦ GPFS access control to allow or deny job submission according to the complex specification
    - Kernel module used by the automounter (home-made)

Configuration - Functionalities
• One shared cluster for 150 groups:
  ◦ To distribute resources fairly according to the commitments with experiments: Share Tree Policy
  ◦ For adjustments to particular user needs: Priority Override ticket

Configuration - Logs
• External tools for:
  ◦ Daily rotation of the messages files on the master
  ◦ Accounting file rotation designed to keep qacct working
    - One file per month
    - 7 days in the current file
• Housekeeping on execution hosts
  ◦ keep_active = true (but deleting files older than 7 days)

Configuration - Functionalities
• Fair Share on Projects
• Complexes, Resource Quota Sets (RQS)
• Load sensors: worker nodes are classified by load sensors for disk space and memory usage, integrated in the “Load Formula”
• Prolog, epilog, set_token_cmd hooks
• Scheduler limitations to avoid blocking the master:
  ◦ SCHEDULER_TIMEOUT
  ◦ MAX_SCHEDULING_TIME
  ◦ MAX_DISPATCHED_JOBS
• Only one Job Submission Verifier (JSV), on interactive jobs (qlogin), to force core binding

One instance for all our needs
[Chart: slots by job type]
  Sequential    89%   17264 cores
  Multicore      6%    1080 cores
  Parallel       5%    1024 cores
  Interactive    0%      48 cores

Some numbers
• 20 Complexes
• >4000 Users
• 20 queues
• RQS: 400 lines
• 150 Groups
• 200 Projects
• OpenMPI, MPICH2 and multicore jobs
• 750 Execution Hosts
• >18000 Execution Threads
• 12.000 pending jobs
• 18.000 simultaneous jobs
• >110.000 ended jobs / day
• >600.000 qstat / day

Grid Integration - JWGEN
JWGEN: GENerated Job Wrapper
User Needs:
• Set user job environment: middleware version, operating system, environment variables
• Set LRMS queue
• Set hardware resources: CPU, memory, disk
• Set logical resource: hpss, xrootd, dCache…
• Set permissions: software manager
Job Information:
• Computing Element
• Queue: long, atlas, SL5
• User info: VO, role
• JDL requirements: CPU, memory, disk
Site policy:
• Accept/deny job submission for some users, for part of the site
• Enforcement of security policy (for staging)
• Add options and information for: quota, jobs priority, accounting

Monitoring
• MRTG - Service view
• SMURF - System view
• Symod - Complete view

Multicores
• Different tests were made:
  ◦ Multicore + Sequential + Reservation active = scheduling time multiplied by 10 ✗
  ◦ Multicore + Sequential + Slot Urgency = jobs requesting fewer slots rank higher; jobs requiring more slots always stay in waiting status ✗
  ◦ Neither Urgency nor Reservation active = slots always used by sequential jobs; multicore jobs always waiting ✗
  ◦ Sequential and Multicore on separate machines ✓
    - If more machines are needed, they can be added within 24 hours

Multicores
• Atlas in production since January 2014

UNIVA
• What we want?
  ◦ To be able to mix sequential and multicore jobs in the same group
  ◦ To be able to set the number of jobs per user
  ◦ For the next version:
    - Qstat improvements
    - Reactivate the shadow
  ◦ Improve the monitoring when the system is in trouble.
• A few Requests for Enhancement (end user)
  ◦ Merge the qacct and qstat commands: it is currently difficult to get a synoptic (coherent) view.
  ◦ Improve the qalter client command: all resources requested at submission have to be specified again, otherwise they are lost.
  ◦ Job resource requirements should be checked by the system before the job is queued for running
  ◦ A way to set a minimum number of running jobs per user

UNIVA
• We are happy because:
  ◦ Our requirements are included in the product road map
  ◦ Responses from UNIVA’s support are quick and precise
    - … a dedicated developer is taking care of us.

Future plans
• Use more policies, for example:
  ◦ functional_policy
• Use job classes
• Put new tools into production, such as:
  ◦ Unicloud
  ◦ Unisight
• Use advanced resource reservation for multicore jobs
• Set the number of jobs per user

Questions
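
Backup: cross-checking the slot distribution

The percentages on the "One instance for all our needs" slide can be recomputed from the core counts shown on the same chart. The short Python sketch below does this arithmetic. The pairing of each core count with a job type is read from the flattened chart labels and is therefore an assumption, and the names `cores_by_type` and `slot_shares` are illustrative only, not part of any CC-IN2P3 tool.

```python
# Core counts per job type, as read from the slot-distribution chart.
# The pairing of counts and job types is an assumption (the chart was flattened).
cores_by_type = {
    "Sequential": 17264,
    "Multicore": 1080,
    "Parallel": 1024,
    "Interactive": 48,
}


def slot_shares(cores):
    """Return each job type's share of the total cores, rounded to whole percent."""
    total = sum(cores.values())
    return {name: round(100 * count / total) for name, count in cores.items()}


total_cores = sum(cores_by_type.values())
shares = slot_shares(cores_by_type)

# Print one row per job type, followed by the total.
for name, pct in shares.items():
    print(f"{name:<12} {cores_by_type[name]:>6} cores  {pct:>3}%")
print(f"{'Total':<12} {total_cores:>6} cores")
```

Under this reading, the rounded shares match the chart (89%, 6%, 5% and 0%), and the four categories add up to 19416 cores.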