Particle Physics Cluster Infrastructure Introduction
University of Oxford, Department of Particle Physics
October 2019

Vipul Davda, Particle Physics Systems Administrator
Room 661, Telephone: x73389
[email protected]

Particle Physics Linux Infrastructure

[Diagram: the particle physics Linux cluster. A gluster distributed file system exports /data (e.g. /data/atlas, /data/lhcb) and /home over NFS to the worker nodes, the HTCondor batch server and the interactive servers. Network printers, managed laptops and managed desktops connect over the physics_s/eduroam networks.]

Introduction to the Unix Operating System

• Unix is a multi-user, multi-tasking operating system.
• It was developed in 1969 at AT&T's Bell Labs by:
  - Ken Thompson (Unix)
  - Dennis Ritchie (C)
• Unix is written in the C programming language.
• Unix was originally a command-line OS, but now has a graphical user interface.
• It is available in many different forms: Linux, Solaris, AIX, HP-UX, FreeBSD.
• It is a well-suited environment for program development: C, C++, Java, Fortran, Python...
• Unix is mainly used on large servers for scientific applications.

Linux Distributions

Source: https://www.muylinux.com/2009/04/24/logos-de-distribuciones-gnulinux/

Particle Physics Linux Infrastructure

• Particle Physics uses CentOS Linux on the cluster.

• It is a free rebuild of Red Hat Enterprise Linux.

Basic Linux Commands

• ls - list directory contents
    ls -l     long listing
    ls -a     list all files, including hidden files beginning with a dot "."
    ls -ld *  list details about a directory itself, not its contents
    ls -lh    give human-readable file sizes
    ls -lS    sort files by file size
    ls -lt    sort files by modification time
• cd - change directory
    $ cd /data/atlas/
• pwd - print name of working directory
    $ pwd
• ~ (tilde) - shorthand for your home directory
    $ cd ~
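These options can be combined. As an illustration (the output shown is hypothetical):

$ cd /data/atlas
$ pwd
/data/atlas
$ ls -lht | head        # time-sorted, human-readable long listing, first few lines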

Basic Linux Commands

• cp - copy file
    $ cp file1.txt file2.txt
• mv - rename file/directory
    $ mv somefilename.txt file.txt
• mkdir - create directory
    $ mkdir mydata
• rm - delete file
    $ rm myfile
• cat - concatenate files and print on the standard output
    $ cat ~/somefilename.txt
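cp and rm work on single files by default; to copy or delete whole directories, add the recursive flag (use rm -r with care, as deleted files cannot be recovered). The directory name mydata_backup below is only an example:

$ cp -r mydata mydata_backup
$ rm -r mydata_backup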

Basic Linux Commands

• tar - tape archive
    $ tar cvfp mydata.tar mydata/
• gzip - compress files
    $ gzip mydata.tar
• untar and ungzip
    $ gzip -cd mydata.tar.gz | tar xvf -
    $ tar xvfz mydata.tar.gz
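The two steps above can also be combined: the z flag tells tar to gzip on the fly, so a compressed archive can be created in one command (mirroring the tar xvfz extraction example above):

$ tar cvfz mydata.tar.gz mydata/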

Basic Linux Commands

• Disk usage
    $ df -h /data/snoplus
    Filesystem                               Size  Used  Avail  Use%  Mounted on
    pplxfs30.physics.ox.ac.uk:/data/snoplus  48T   38T   11T    79%   /data/snoplus
• File space usage
    $ du -sh ~/
    15G /home/davda/
• tree - list contents of directories in a tree-like format
    $ tree -L 3 -d ~/ | less
    $ tree -L 3 ~/ | less
• find - search for files in a directory hierarchy
    $ find ~/ -name "*.py"
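find can also filter on file size or modification time, which is handy when hunting for what is filling a quota. For example (the patterns and thresholds below are illustrative):

$ find ~/ -type f -size +1G          # files larger than 1 GB
$ find ~/ -name "*.py" -mtime -7     # .py files modified in the last 7 days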

Basic Linux Commands

• which - shows the full path of (shell) commands
    $ which ls
    /usr/bin/ls
• whereis - locate the binary, source, and manual page files for a command
    $ whereis ls
    ls: /usr/bin/ls /usr/share/man/man1/ls.1.gz /usr/share/man/man1p/ls.1p.gz
• locate - find files by name
    $ locate stdio.h

Linux Command Line Training

Linux Command Line Training: https://www.linkedin.com/learning/learning-linux-command-line-2

Environment Modules

• CentOS comes with a set of core packages such as gcc, python, glibc etc. The version of each core package is locked to the version of the OS.
• For example, if a later version of gcc is required, the following options are available:
  - Download and compile it in your home area and update all paths manually, OR
  - Use Environment Modules.
• Environment Modules allow you to load different versions of gcc, python, root, etc.
• If you require any software which is not available as a module, please let us know.
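Two further module commands are often useful (these are standard commands of the environment-modules tool, shown here as a quick sketch):

$ module list      # show the modules currently loaded in this shell
$ module purge     # unload all currently loaded modules

If you always need a particular version, the corresponding module load line can be added to your ~/.bashrc so it is loaded at every login.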

How to use Environment Modules

$ module avail
--------------------------- /network/software/el7//modules ---------------------------
editors/vim/8.1        forthon/0.8.35    gcc/8.1.0        geant4/10.4.2    genie/genie
git/2.21.0             intel/2015        lhapdf5/5.9.1    pygist/2.2       root/5.34.36
root/5.34.36_pythia6   texlive/2018      texlive/2019     warp/single
--------------------------- /etc/modulefiles ---------------------------
mpi/mpich-3.2-x86_64

How to use Environment Modules

$ module show gcc/8.1.0
-------------------------------------------------------------------
/network/software/el7//modules/gcc/8.1.0:

module-whatis    adds gcc 8.1.0 package to your environment
prepend-path     PATH /network/software/el7/compilers/gcc/8.1.0/bin
prepend-path     LD_LIBRARY_PATH /network/software/el7/compilers/gcc/8.1.0/lib64
prepend-path     MANPATH /network/software/el7/compilers/gcc/8.1.0/share/man
prepend-path     LD_LIBRARY_PATH /network/software/el7/compilers/gcc/8.1.0/common/lib
setenv           CC gcc
setenv           MPICH_CC gcc
setenv           FC gfortran
setenv           MPICH_FC gfortran
setenv           F90 gfortran
setenv           MPICH_F90 gfortran
setenv           F77 gfortran
setenv           MPICH_F77 gfortran
setenv           CPP cpp
setenv           MPICH_CPP cpp
setenv           CXX g++

How to use Environment Modules

$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ module load gcc/8.1.0
$ gcc --version
gcc (GCC) 8.1.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ module unload gcc/8.1.0

PP Batch Systems

What is HTCondor?

• HTCondor is a batch management system for compute-intensive jobs.
• Like other batch systems, HTCondor provides:
  - a job queueing mechanism
  - a scheduling policy
  - resource monitoring
  - resource management
• HTCondor is very useful when you have an application that has to run a large number of times on different input data.
• HTCondor is installed on the PP CentOS 7 interactive nodes to make it possible to run a large number of computational processes on different machines.
• For a more detailed overview of HTCondor, please see https://research.cs.wisc.edu/htcondor/description.html

HTCondor Batch System

[Diagram: users log in to the interactive servers, which submit to the HTCondor batch server and its worker nodes.]

• Jobs are submitted from pplxint10 and pplxint11.
• /home and /data areas are mounted on all worker nodes.
• There are ~400 logical CPUs on the worker nodes to run jobs.

HTCondor Quick Start Guide

• Log in to either pplxint10 or pplxint11.

• These are configured to submit jobs to the HTCondor batch system.

HTCondor Simple Test Script

• Create an executable, hello.py, on one of the interactive nodes.
• Test your executable on an interactive node.

hello.py:

#!/usr/bin/python
import platform
host = platform.node()
print "Hello World - ", host
print "finished"
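Before testing, make the script executable. A quick check on an interactive node might look like this (the hostname in the output is hypothetical):

$ chmod +x hello.py
$ ./hello.py
Hello World -  pplxint10.physics.ox.ac.uk
finished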

HTCondor Creating a Submit Description File

• In order to run a job on one of the worker nodes, create a submit file which sets environment variables for the batch queue.

For example, a simple submit description file, myjob.submit, to run hello.py in the batch queue:

#######################################
# HTCondor Submit Description File.
# Author:
# Date:
# Description:
#######################################
executable = hello.py
universe   = vanilla
output     = output/results.output.$(Process)
error      = error/results.error.$(Process)
log        = log/results.log.$(Process)
queue 1

HTCondor Submit File

executable: The script or command that HTCondor runs.

output: Where the STDOUT of the command or script is written. This can be a relative or absolute path. Please note that the directory "output" will not be created automatically; the job will fail with an error if the directory does not exist.

error: Where the STDERR of the command or script is written. The same rules apply as for output.

log: Where HTCondor writes its own log for the job. It records the submission time, the execution host, the start and end times and, on termination, the job statistics. Please note it is not the log of the executable itself (hello.py in our example).
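Since HTCondor will not create the output, error and log directories used above, create them once in the submit directory before submitting, for example:

$ mkdir -p output error log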

queue: This schedules the job. It becomes more important (along with $(Process) interpolation) when it is used to schedule multiple jobs by taking an integer as a value, as sketched below.
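As a sketch of that multi-job usage (the arguments line and the count of 10 are illustrative, not from the original slides), a submit file that queues ten instances and passes the process number to the script could look like:

executable = hello.py
universe   = vanilla
arguments  = $(Process)
output     = output/results.output.$(Process)
error      = error/results.error.$(Process)
log        = log/results.log.$(Process)
queue 10

Each job then receives a different $(Process) value (0-9), which keeps the per-job output files separate.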

HTCondor Submitting a Job

• A job is added to the HTCondor queue with the condor_submit command.

• Simply run the command:

$ condor_submit myjob.submit
Submitting job(s).
1 job(s) submitted to cluster 70.

Note: Before submitting any jobs, always test them to make sure that both your submit file and executable work properly.

Please bear in mind that submitting untested files and/or jobs will waste time and resources if they fail.

HTCondor Submitting a Job

• condor_submit <submit file>: Submits jobs to the HTCondor queue, according to the information specified in the submit file.

Useful options:
• -dry-run <file>: parses the submit file and saves all the related information (names and locations of input and output files after expanding all variables, value of requirements, etc.) to the given file. Using this option is highly recommended when debugging, or before the actual submission if you have made modifications to your submit file.
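For example, to write the expanded job description to a file (the file name dry.out is just an example):

$ condor_submit -dry-run dry.out myjob.submit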

HTCondor Monitoring the Jobs

• The condor_q command prints a list of all the jobs currently in the queue.

For example, a short time after submitting myjob.submit from pplxint11, the output appears as:

$ condor_q
ID      OWNER   SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
70.0    davda   2/13 10:49   0+00:00:03  R   0    97.7  myjob.submit

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

HTCondor Monitoring the Jobs

condor_q: Shows jobs that have been submitted to the batch queue. By default you only see the ID of the job, the owner, the submission time, run time, status, priority, size and command.

Job status (ST) codes:
  I: idle (waiting in the queue for a resource)
  R: running
  H: on hold (there was an error; waiting for user action)
  S: suspended
  C: completed
  X: removed

HTCondor Monitoring the Jobs

condor_q useful options:
• -wide: Wide display. You can also use -wide:<n> to truncate lines to fit n columns.
• -analyze <job id>: Shows the reason why the job is in its current state.
• -better-analyze <job id>: Shows the reason why the job is in its current state, giving extended information.
• -long <job id>: Shows all information related to that job.
• -run: Shows your running jobs.
• -hold: Shows only jobs in the held state and the reason for that. Once you have resolved the problem, use the condor_release command to put the job back in the queue.
• -allusers: Shows the status of all users' jobs.

HTCondor Monitoring the Jobs
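For example, with the job from the earlier slides (cluster 70):

$ condor_q -better-analyze 70.0      # explain why job 70.0 is idle or held
$ condor_q -hold                     # list your held jobs and the hold reason
$ condor_release 70.0                # release job 70.0 back into the queue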

Rather than monitoring the job by repeatedly running condor_q, use the condor_wait command:

$ condor_wait -status log/results.70.log
70.0.0 submitted
70.0.0 executing on host <163.1.136.221:9618?addrs=163.1.136.221-9618+[--1]-9618&noUDP&sock=1232_a0c4_3>
70.0.0 completed
All jobs done.

HTCondor Removing a Job

• Successfully submitted jobs will occasionally need to be removed from the queue.

condor_rm: Removes a specific job from the queue.

For example, remove job number 70.0 from the queue with:

$ condor_rm 70.0
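condor_rm also accepts a cluster ID or a username, so all of your own jobs can be removed in one go (davda is used here only as an example owner):

$ condor_rm 70          # remove every job in cluster 70
$ condor_rm davda       # remove all jobs owned by user davda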

HTCondor Useful Links

Particle Physics HTCondor, how to: https://www2.physics.ox.ac.uk/it-services/particle-physics-linux-condor-batch-farm

HTCondor Home: https://research.cs.wisc.edu/htcondor/index.html

Command Reference: https://htcondor.readthedocs.io/en/v8_8_5/man-pages/index.html

FAQ: https://htcondor.readthedocs.io/en/v8_9_3/faq/index.html

Legacy Batch System - Torque

[Diagram: users log in to the interactive servers, which submit to the Torque batch server and its worker nodes.]

• Scientific Linux (SL) 6 batch server.
• Jobs are submitted from pplxint8 and pplxint9.
• Around ~500 cores, and decreasing.
• /home and /data areas are mounted on all worker nodes.

Legacy PBS/Torque Submitting a Job

• Create an executable on one of the interactive nodes.
• Test your executable on an interactive node.

jobscript:

#!/bin/bash
pwd
hostname
sleep 10s
echo Hello World

Legacy PBS/Torque Submitting a Job

To submit a job, use the qsub command:

qsub <job script>

For example:

$ qsub jobscript
390410.pplxtorque05.physics.ox.ac.uk
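Resource requests such as wall-clock time can be passed to qsub on the command line (or embedded in the script as #PBS directives); the value below is illustrative:

$ qsub -l walltime=01:00:00 jobscript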

Legacy PBS/Torque Monitoring

To get basic information about your jobs, use the qstat command.

To see only your own jobs, use the -u option:
$ qstat -u <username>

$ qstat

Job id                 Name       User   Time Use  S  Queue
---------------------  ---------  -----  --------  -  ------
390410.pplxtorque05    jobscript  davda  0         R  normal
390411.pplxtorque05    jobscript  davda  0         Q  normal
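For the full record of a single job, including resource usage and the execution node, qstat can be given the -f option:

$ qstat -f 390410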

Legacy PBS/Torque Submitting a Job

By default, job output goes to two files:
• Standard output: jobscript.o390410
• Standard error: jobscript.e390410

These are useful for debugging your jobs.

$ cat jobscript.o390410
*********************************************************************
* Job Terminated at Fri Oct 7 11:15:36 BST 2016
* Job Used: cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:10
*********************************************************************

Legacy PBS/Torque Useful Links

See Particle Physics’ PBS/Torque, How To:

https://www2.physics.ox.ac.uk/it-services/ppunix/particle-physics-linux-batch-farm


Grid Computing: ATLAS, SNO+, LHCb, T2K, etc.

• The Worldwide LHC Computing Grid (WLCG) is a global computing infrastructure whose mission is to provide computing resources to store, distribute and analyse the data generated by the Large Hadron Collider (LHC), making the data equally available to all partners, regardless of their physical location.
• WLCG is spread across 170 computing centres:
  - ~2 million tasks are run every day
  - ~800,000 computing cores
  - ~900 petabytes of data
• The LHC experiments are the major users, but other physics and non-physics groups have also used it extensively.
• Oxford is a Tier-2 site and is a small part of the WLCG grid:
  - 3,500 cores
  - 1 PB of data
• A Grid Certificate (X.509) is required to be able to use the Grid.

Oxford Tier 2 Site Part of the UK SouthGrid

SouthGrid Institutions:
• University of Oxford
• RAL PPD
• University of Cambridge
• University of Birmingham
• University of Bristol
• University of Sussex

Oxford Tier 2 Grid Cluster Begbroke

Oxford Tier 2 Grid Cluster Utilisation

Requesting a GRID Certificate

• You request a certificate from http://www.ngs.ac.uk/ukca

• Note: please ensure you use the same PC to request and retrieve a certificate.

Requesting a GRID Certificate

• http://www.ngs.ac.uk/ukca/apply.html

Requesting a GRID Certificate

• https://portal.ca.grid-support.ac.uk/caportal/

Requesting a GRID Certificate

• Email [email protected] to arrange a suitable time to meet with our Registration Authority (RA) representative.
• You must take your University card.
• The RA checks the PIN that you entered when requesting your certificate.
• The RA will check that you are part of Oxford University.
• If all criteria are validated, the RA will approve the request.
• The CA operator will review the approval and sign it.
• You will be informed that your certificate is ready via an email which will contain the serial number and instructions to get your certificate.

Example email:

  Email: [email protected]

  Dear Stuart Robeson,

  Please could you let me know when is a good time to come over to the Banbury Road IT Services office for you to approve my GRID certificate request.

  Many thanks,

When You’ve Received Your Grid Certificate

Log on to pplxint11 and run:

$ mkdir .globus
$ chmod 700 .globus
$ cd .globus
$ openssl pkcs12 -in ../mycert.p12 -clcerts -nokeys -out usercert.pem
$ openssl pkcs12 -in ../mycert.p12 -nocerts -out userkey.pem
$ chmod 400 userkey.pem
$ chmod 444 usercert.pem
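To confirm that the extraction worked and to see the certificate's subject and expiry date, you can inspect usercert.pem (a standard openssl check, not part of the original instructions):

$ openssl x509 -in usercert.pem -noout -subject -enddate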

Join a Virtual Organisation

• Your grid certificate identifies you to the grid as an individual user, but it's not enough on its own to allow you to run jobs; you also need to join a Virtual Organisation (VO).

• Joining a VO, such as ATLAS, LHCb, etc., allows you to:
  - submit jobs using the infrastructure of the experiment
  - access data for the experiment

• Every experiment has its own process. Please ask your colleagues on the experiment about this; they will guide you.

Testing Your Certificate

$ voms-proxy-init --voms lhcb.cern.ch
Enter GRID pass phrase:
Your identity: /C=UK/O=eScience/OU=Oxford/L=OeSC/CN=j bloggs
Creating temporary proxy ...... Done
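Once the proxy has been created, its contents and remaining lifetime can be checked with:

$ voms-proxy-info --all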

• Please consult the documentation provided by your experiment for their way to submit and manage grid jobs.

Questions?
