COMPUTING METHODS IN HIGH ENERGY PHYSICS

S. Lehti and V.Karim¨aki Helsinki Institute of Physics Spring 2010 Lecture 1

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 1) S. Lehti and V.Karimaki¨ Lecture 1 Outline: Practical ’hands-on’ course for comput- • Short review to ,latex,makefile,cvs ing in high energy physics. • Credits: 5op (3ov), 13 lectures + 6 ex- • ++ ercises + home exam. Exercises split in two to have 12 x 1h exercises. • ROOT Recommended prerequisites: Introduc- • Combining programming languages tion to particle physics and programming • Cross section and branching ratio calculations skills. • Event generators Literature: Recommending an advanced • Detector simulation and reconstruction C++ book for reference. • Fast simulations • Grid computing

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 2) S. Lehti and V.Karimaki¨ Lecture 1 Short review to Unix basics Unix Shells completion of each stage. The Shell is a program that runs automatically when you log in to a Unix/ system. The Shell Name History Shell forms the interface between users and the sh (Bourne) The original shell rest of the system. It reads each command you csh,,zsh The . Bourne Again Shell, from the type at your terminal and interprets what you have asked for. The Shell is very much like a GNU project More C than csh, also from GNU with features like • variables • control structures (if,while,..) • File Handling • parameter passing Most Unix commands take input from the terminal • interrupt handling keyboard and send output to the terminal monitor. These features provide you with the capability to To redirect the output use > symbol: design your own tools. Shell scripting is ideal for $ > myfiles any small utilities that perform relatively simple Likewise you can redirect the input with the < sym- task, where efficiency is less important than easy bol: configuration, maintenance and portability. You $ wc -l < myfiles can use the shell to organize process control, so The need for becomes more apparent in commands run in predetermined sequence, de- shell programming. pendent on the successful

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 3) S. Lehti and V.Karimaki¨ Lecture 1 Short review to Unix basics Shell Programming Pipes Suppose you have a file called ’test’ which contains The standard output of one process (or program) unix commands. There are different ways to get can be the standard input of another process. the system to execute those commands. One is When this is done a ”pipeline” is formed. The giving the filename as an argument to the shell pipe operator is the | symbol. Examples: command, or one can use the command source in $ ls | sort which case the current shell is used $ ls | sort | more $ csh test (or sh test) $ ls | sort | btag | more $ source test Here the standard output from left becomes the One can also give the executing rights to the file standard input to the right of |. and then run it Pipelines provide a flexible and powerful mech- $ chmod 755 test anism for doing jobs easily and quickly, without $ test the need to construct special purpose tools. Ex- Here the number coding in the chmod command isting tools can be combined. Another useful comes from rwxrwxrwx with rwx being binary num- command: ber: rwx=111=7 and -x=101=5. You can check $ find . -name "*.cpp" | xargs the rights of your files with the -l option of the ls grep btag command. This is searching every file in this directory and A shell script starts usually with a line which tells any subdirectory with ending cpp and containing which program is used to execute the file. The line the string btag. looks like this (for ) #!/bin/sh

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 4) S. Lehti and V.Karimaki¨ Lecture 1 Short review to Unix basics

Comments start with a # and continue to the • set end of a line. The shell syntax depends on which • set name shell is in use. At CERN people have been us- • set name = word ing widely csh (tcsh), but also bash as the linux • set name = (wordlist) default has gained ground. It’s up to you which • set name[index] = word shell you prefer. The first three forms are used with scalar variables, whereas the last two are used with array variables. csh The setenv command is used to create new environ- Variables are used to hold temporary values and ment variables. Environment variables are passed manage changeable information. There are two to shell scripts and invoked commands, which can types of variables: reference the variables without first defining them. • Shell variables • setenv • Environment variables • setenv name value Shell variables are defined locally in the shell, To obtain the value of a variable a reference whereas environment variables are defined for ${name} is used. Example the shell and all the child processes that are a.out analysis ${CHANNEL} $RUN.out started from it. Here the variables CHANNEL and RUN contains The set command is used to create new local some values. A reference ${#name} returns the variables and assign a value to them. Some dif- number of elements in array, and ${?name} returns ferent methods of invoking the set command are 1 if the variable is set, 0 otherwise. as follows:

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 5) S. Lehti and V.Karimaki¨ Lecture 1 Short review to Unix basics

In addition to ordinary variables, a set of special An example csh script variables is available. #!/bin/csh if( ! ${?ENVIRONMENT} ) setenv ENVIRONMENT INTERACTIVE if( $ENVIRONMENT != ”BATCH” ) then Variable Description if( ! ${?SCRATCH} ) then Setting workdir to HOME $0 Shorthand for $argv[0] setenv WORKDIR $HOME else $1,$2,...,$9 Shorthand for $argv[n] echo Setting workdir to SCRATCH setenv WORKDIR $SCRATCH $* Equivalent to $argv[*] endif setenv LS SUBCWD $PWD $$ Process number of the shell endif $< input from file #################################### setenv INPUTFILE analysis.in #setenv DATAPATH $LS SUBCWD/data setenv DATAPATH /mnt/data/data Conditional statements are created with #################################### if( expression ) then else $WORKDIR if( -f $INPUTFILE ) $INPUTFILE endif -f $LS SUBCWD/$INPUTFILE $WORKDIR setenv LD LIBRARY PATH $LS SUBCWD/../lib:${LD LIBRARY PATH}

echo and loops with echo START EXECUTION OF JOB echo foreach name (wordlist) $LS SUBCWD/../bin/${CHANNEL} analysis.exe >& analysis.out if( -f $WORKDIR/analysis.out ) mv $WORKDIR/analysis.out $LS SUBCWD end echo echo JOB FINISHED while ( expression ) echo end exit

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 6) S. Lehti and V.Karimaki¨ Lecture 1 Short review to Unix basics

bash test expression In bash the standard sh assignment to set vari- [expression] ables is used The other form of flow control is the case-esac name = value block. Bash supports several types of loops: for, The array variables can be set in two ways, either while, until and select loops. All loops in bash can by setting a single element be exit by giving the built-in break command. name[index] = value or multiple elements at once name = (value1, .., valueN) Perl has become the language of choice for many Exporting variables for use in the environment Unix-based programs, including server support for export name WWW pages. Perl is a simple yet useful program- export name = value ming language that provides the convenience of Arithmetic evaluation is performed when the fol- shell scripts and the power and flexibility of high- lowing form is encountered: $(( expression )). level programming languages. In perl the variable The if statement syntax is can be a string, integer or floating-point number. if list ; then All scalar variables start with the $. The else following assignments are all legal in perl: fi $variable = 1; Most often the list given to an if statement is one $variable = ”my string”; or more test commands, which can be invoked $variable = 3.14; by calling the test as follows:

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 7) S. Lehti and V.Karimaki¨ Lecture 1 Short review to Unix basics To do arithmetic in Perl, the following operators An example Perl script #!/usr/local/bin/perl are supported: + - * / ** %. Logical operators ############################################### are divided into two classes, numeric and string. $ibg = 2020; $eventsPerFile = 500; The numeric logical operators are <, >, ==, $allEvents = 100000; <=, >=, !=, ||, && and !. The logical operator $firstEvent = 1; $BatchQueue = ”1nw”; that work with strings are lt, gt, eq, le, ge and $Jobfile = ”offlineAnalysis.job”; ne. The most common assignment operator is $runNumber = 1; =. Autoincrement ++ and decrement - - are ############################################### $pwd = ${’PWD’}; also available. Strings can be combined with . $orca = rindex(”$pwd”,”ORCA”); $slash = index(”$pwd”,”/”,$orca); and .= operators, for example $ORCA Version = substr(”$pwd”,$orca,$slash-$orca); $famos = rindex(”$pwd”,”FAMOS”); $a = ”be” . ”witched”; $slash = index(”$pwd”,”/”,$famos); $a = ”be”; $a .= ”witched”; $FAMOS Version = substr(”$pwd”,$famos,$slash-$famos); $ORCA VERSION = $ORCA Version; The if conditional statement has the following if($famos > $orca) { $ORCA VERSION = $FAMOS Version; structure: if (expr) {..}. Repeating statement } can be made using while and until, looping can print ”Submitting $ORCA VERSION job(s)\n”; be done with the for statement. ###############################################

Shell (system) commands can be run with the for ($i = 1; $i <= $allEvents/$eventsPerFile; $i++) { $LAST = $firstEvent + $eventsPerFile - 1; command system(shell command string) system(”bsub -q $BatchQueue $Jobfile $ORCA VERSION $runNumber $eventsPerFile $firstEvent $LAST $ibg”); $firstEvent = $LAST + 1; More about csh(tcsh),bash and perl: man pages and www like } http://www.mcsr.olemiss.edu/cgi-bin/man-cgi?csh http://www.eng.hawaii.edu/Tutor/csh.html if($firstEvent < $allEvents){ http://www.faqs.org/faqs/unix-faq/shell/csh-whynot $LAST = $allEvents - 1; http://www.gnu.org/software/bash system(”bsub -q $BatchQueue $Jobfile $ORCA VERSION $runNumber $eventsPerFile http://www.perl.com $firstEvent $LAST $ibg”); }

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 8) S. Lehti and V.Karimaki¨ Lecture 1 Short review to Unix basics

Python An example Python script (truncated) #!/usr/bin/env python Python is a dynamic programming language import sys ... used in a wide variety of application domains. ConfigFile = ’TTEffAnalyzer cfg.py’ OutFileName = ’file:tteffAnalysis.root’

root re = re.compile(”(?P<file>’([ˆ’]*\.root)’)”) It is object-oriented, interpreted, and interactive cfg re = re.compile(”(\..*?)$”) job re = re.compile(”(\.py)”) programming language. i = 0 for dline in fileinput.input(): match = root re.search(dline) It has modules, classes, exceptions, very high if match: i = i + 1 level dynamic data types, and dynamic typing. jobNumber = ’ %i’ %(i) cfgFileName = cfg re.sub(”%s\g<1>”%jobNumber, ConfigFile) There are interfaces to many system calls inFileName = match.group(”file”) outFileName = cfg re.sub(”%s\g<1>”%jobNumber, OutFileName) and libraries, as well as to various windowing cfgFile = open(cfgFileName, ’w’) systems. New built-in modules are easily written inFile = open(ConfigFile,’r’) for line in inFile: in C or C++ (or other languages). if ’outputFileName’ in line: cfgFile.write(’ outputFileName = cms.string(”’ + outFileName + ’”)\n’) elif ’process.source = source’ in line: cfgFile.write(’process.source = cms.Source(”PoolSource”,\n’)... A recommended coding style is to use 4-space elif ’input = cms.untracked.int32’ in line: cfgFile.write(’ input = cms.untracked.int32(-1)\n’) indentation, and no tabs else: cfgFile.write(line) inFile.close() cfgFile.close() CMS software (CMSSW) configuration files use jobFileName = job re.sub(”.job”,cfgFileName) Python. jobFile = open(jobFileName, ’w’) jobFile.write(”#!/bin/csh\n”)... jobFile.close() More in http://www.python.org/ cmd = ”bsub -q 1nd ” + jobFileName os.system(cmd)

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 9) S. Lehti and V.Karimaki¨ Lecture 1 CVS - Concurrent Versions System

In HEP experiments the software is typically Some useful commands written by the collaboration members. Some How to login? Here is an example for CMS: part of the software can be commercial pro- Set your CVSROOT grams, but most of the code must be written $ set cvsbase = ":pserver:anonymous from scratch. @cmscvs.cern.ch:/cvs server/ repositories" CVS is a tool to manage the source code. Let’s choose CMSSW, the new detector simulation • to prevent overwriting others’ changes and reconstruction code • keeps record of the versions $ setenv CVSROOT "$cvsbase/CMSSW" • allows users and developers an easy access to releases and prereleases At lxplus.cern.ch this is done by command ’cm- For example almost all of the CMS software is scvsroot CMSSW’, which is a csh script setting available via CVS. Only old software written in the environment variable. FORTRAN is not necessarily in CVS, but as the programs are being rewritten in C++, the $ cvs login (type passwd ’98passwd’) fraction of code not in CVS is decreasing. How to access the head version of the code? Use cvs check-out Documentation for CVS can be found in WWW, $ cvs co IOMC e.g.: How to access a chosen release? http://ximbiot.com/cvs/cvshome $ cvs co -r CMSSW 1 6 7 IOMC

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 10) S. Lehti and V.Karimaki¨ Lecture 1 Git Some useful commands Git is an open source, distributed version control system designed to handle everything from small $ git clone to very large projects with speed and efficiency. $ git commit -a -m "Comment text" $ git add file Git does not use a centralized server. $ git rm file $ git remote add public $HOME/ Projects using Git: Git, Linux Kernel, Perl, public/html/HipProofAnalysis.git Ruby on Rails, Android, WINE, Fedora, X.org, $ git push public VLC, Prototype refs/heads/master:refs/heads/master $ git branch -a Useful links for documentation How to make a public repository (lxplus) http://git-scm.com/ $ cd $HOME/public/html http://jonas.nitro.dk/git/quick-reference.html $ MyGitRepo.git $ cd MyGitRepo.git First you need to introduce yourself to git with $ git --bare init your name and public email address before doing $ chmod a+x hooks/post-update any operation. $ git update-server-info $ git config --global user.name $(git config http.sslVerify false) "Your Name Comes Here" Taking code from others: $ git config --global user.email $ git remote add matti [email protected] $ git fetch matti $ git merge matti/master

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 11) S. Lehti and V.Karimaki¨ Lecture 1 Sidestep to LaTeX LaTeX is the tool to write your papers, and at ences you have not used. This way you can have this stage of your studies you should already the same custom bib file for every paper you write, know how to use it. If not, then learn it now! you just add more references when needed. You LaTeX documentation can be found in literature need two files, a bib file, and a style file. Here is and in the web, google gives you several options an example style file based on JHEP style for getting the documentation. You might try http://cmsdoc.cern.ch/∼slehti/lesHouches.bst e.g. The bib file you have to write yourself, or you can www.ctan.org/tex- use SPIRES automated bibliography generator archive/info/lshort/english/lshort.pdf www.slac.stanford.edu/spires/hep/bib/bibtex.shtml wwwinfo.cern.ch/asdoc/psdir/texacern.ps.gz to make the entry for yourself. Here is an example reference from a bib file: Here we concentrate on two tools which you @Article{L1 TDR, might need in the future: BibTeX and feynMF. title = ”The Level-1 Trigger”, The first is an easy way of managing your journal = ”CERN/LHCC 2000-038”, references, and the second one is a tool to draw volume = ”CMS TDR 6.1”, Feynman graphs. year = ”2000” } BibTeX The citation (\cite{L1 TDR}) in the text works When using BibTeX, LaTex is managing your as usual. In your tex file you need to have bibliography for you. It puts your references in \bibliographystyle{lesHouches}. BibTeX is run like the correct order, and it does not show refer- LaTeX: latex-bibtex-latex-latex-dvips.

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 12) S. Lehti and V.Karimaki¨ Lecture 1 Sidestep to LaTeX

feynMF Example feynMF is a package for easy drawing of The following code produces the figure below: \begin{fmffile}{triangle loop} professional quality Feynman diagrams with \begin{figure}{h} \centering METAFONT. The documentation for feynMF \parbox{50mm}{ \begin{fmfchar*}(100,60) can be found in \fmfleft{i1,i2} \fmfright{o1} \fmf{curly,tension=0.5}{i1,v1} \fmf{curly,tension=0.5}{i2,v2} http://tug.ctan.org/cgi- \fmf{fermion,tension=0.1}{v2,v1} \fmf{fermion,tension=0.3}{v1,v3} bin/ctanPackageInformation.py?id=feynmf \fmf{fermion,tension=0.3}{v3,v2} \fmf{dashes}{v3,o1} \put(0,60){\small g} \put(0,-5){\small g} Instructing LaTeX to use feynMF is done includ- \put(100,28){\small h, H, A} \end{fmfchar*}} ing the feynmf package: (file feynmf.sty needed) \end{figure} \end{fmffile} \usepackage{feynmf} g

Processing your document with LaTeX will gen- erate one or more METAFONT files, which you h, H, A will have to process with METAFONT. The METAFONT file name is given in the document in fmffile environment: g \begin{fmffile}{< MET AF ONT − file >} Run METAFONT (file feynmf.mf needed) ... mf ”\mode=localfont;\input triangle loop.mf;” \end{fmffile} The whole chain is latex-mf-latex-dvips.

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 13) S. Lehti and V.Karimaki¨ Lecture 1 Makefiles

• target lines The make utility automatically determines • shell command lines which pieces of a large program need to be • macro lines recompiled and issues command to recompile • make directive lines (such as include) them. However, make is not limited to pro- grams, it can be used to describe any task Rules where some files must be updated automatically from others whenever the others change. Target lines tell make what can be built. Target lines consist of a list of targets, followed by To run make, you must have a makefile. a colon (:), followed by a list of dependen- Makefile contains the rules which make exe- cies. Although the target list can contain multiple cutes. The default names for the makefiles are targets, typically only one target is listed. Example: GNUmakefile, makefile and Makefile (in this order). If some other name is preferred, it can clean: used with command make -f myMakefile rm -f *.o

A documentation about make can be found in The clean command is executed by typing www.gnu.org/software/make/manual/make.html $ make clean Notice that the shell command line after the semi- A makefile is an ASCII text file containing any colon has a tab in front of the command. If this is of the four types of lines: replaced by blanks, it won’t work! This applies to every line which the target is supposed to execute.

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 14) S. Lehti and V.Karimaki¨ Lecture 1 Makefiles .cc.o: The dependencies are used to ensure that g++ -c $< components are built before the overall exe- main: a.o .o c.o d.o cutable file. Target must never be newer than g++ a.o b.o c.o d.o any of its dependent targets. If any of the dependent targets are newer than the current Here $< is a special built-in macro, which substi- target, the dependent targets must be made, and tutes the current source file in the body of a rule. then the current target must be made. Example: When main is executed, and the timestamp of the object files is newer than that of exe file main, the foo.o: foo.cc the dependent targets a.o b.o c.o d.o are made us- g++ -c foo.cc ing the suffix rule: if the corresponding cc file is newer, new object file is made. When all object If foo.o is newer than foo.cc, nothing is done, files needing updating are made, then the exe file but if foo.cc has been changed so that the main is made. timestamp of file foo.cc is newer than foo.o, then foo.o target is remade. Macros In the above example the object files were typed in One of the powerful features of the make utility is two places. One could use a macro instead: its capability to specify generic targets. Suppose OBJECTS = a.o b.o c.o d.o you have several cc files in your program. Instead main: $(OBJECTS) of writing every one object file a separate rule, g++ $(OBJECTS) on can use a suffix rule:

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 15) S. Lehti and V.Karimaki¨ Lecture 1 Makefiles

Functions An example Makefile A function call syntax is $(ftion args) There are several functions available for various pur- Notice the usage of macros $(CXX) and $(MAKE). poses: text and file name manipulation, condi- What does the @ do? If new line is needed, one can make the break with \. tional functions etc. files = $(wildcard ../src/*.cc ../src/*.cpp *.cc *.cpp) Example: let’s assume that the program con- OBJS = $(addsuffix .o,$( $(files))) taining the source files a.cc, b.cc, c.cc and d.cc OPT = -O -Wall -fPIC -D REENTRANT are located in a directory where there are no INC = -I$(ROOTSYS)/include -I. -I../src \ other cc files. So every file ending .cc must -I../src/HiggsAnalysis/MssmA2tau2l/interface be compiled into the executable. One can use LIBS = -L$(ROOTSYS)/lib -lCore -lCint -lHist -lGraf -lGraf3d -lGpad \ -lTree -lRint -lPostscript -lMatrix -lPhysics -lpthread -lm -ldl \ functions wildcard, basename and addsuffix. -rdynamic

.cc.o: $(CXX) $(OPT) $(INC) -c $ˆ -o $@ files = $(wildcard *.cc) .cpp.o: filebasenames = $(basename $(files)) $(CXX) $(OPT) $(INC) -c $ˆ -o $@ all: OBJECTS = $(addsuffix .o,$(filebasenames)) @$(MAKE) –no-print-directory All All: @$(MAKE) compile; $(MAKE) analysis.exe

Here the function wildcard gives a space sepa- compile: $(OBJS) rated list of any files in the local directory match- analysis.exe: $(OBJS) ing the pattern *.cc. If you now modify your $(CXX) $(LIBS) -O $(OBJS) -o analysis.exe clean: program to include a fifth file e.cc, your Make- rm -f $(OBJS) file will work without modifications.

Spring 2010 COMPUTING METHODS IN HIGH ENERGY PHYSICS (page 16) S. Lehti and V.Karimaki¨ Lecture 1