Sheffield HPC Documentation Release

November 14, 2016

Contents

1 Research Computing Team

2 Research Software Engineering Team


The current High Performance Computing (HPC) system at Sheffield is the Iceberg cluster. A new system, ShARC (Sheffield Advanced Research Computer), is currently under development and is not yet ready for use.



Research Computing Team

The Research Computing Team is responsible for the Iceberg service, as well as all other aspects of research computing. If you require support with Iceberg, training, or software for your workstations, the Research Computing Team will be happy to help. Take a look at the Research Computing website or email research-it@sheffield.ac.uk.



Research Software Engineering Team

The Sheffield Research Software Engineering Team is an academically led group that collaborates closely with CiCS. They can assist with code optimisation, training and all aspects of High Performance Computing, including GPU computing and local, regional, national and cloud computing services. Take a look at the Research Software Engineering website or email rse@sheffield.ac.uk.

2.1 Using the HPC Systems

2.1.1 Getting Started

If you have never used a High Performance Computing (HPC) cluster, or even a command line, this is the place to start. This guide will get you set up to use Iceberg in the easiest way that fits your requirements.

Getting an Account

Before you can start using Iceberg you need to register for an account. Accounts are available for staff by emailing helpdesk@sheffield.ac.uk.

The following categories of students can also have an account on Iceberg with the permission of their supervisors:

• Research Postgraduates
• Taught Postgraduates - project work
• Undergraduates (3rd & 4th year) - project work

A student's supervisor can request an account for them by emailing helpdesk@sheffield.ac.uk.

Note: Once you have obtained your Iceberg username, you need to initialize your Iceberg password to be the same as your normal password by using the CiCS synchronize passwords system.

Connecting to iceberg (Terminal)

Accessing Iceberg through an SSH terminal is easy and is the most flexible way of using Iceberg, as it is the native way of interfacing with the Linux cluster. It also allows you to access Iceberg from any remote location without having to set up VPN connections.


The rest of this page summarizes the recommended methods of accessing Iceberg from commonly used platforms. A more comprehensive guide to accessing Iceberg and transferring files is located at http://www.sheffield.ac.uk/cics/research/hpc/using/access

Windows

Download and install the Installer edition of MobaXterm. After starting MobaXterm you should see something like this:

Click Start local terminal and, if you see something like the following, continue to the Connect to iceberg section.

Mac OS/X and Linux

Linux and Mac OS/X both have a terminal emulator program pre-installed. Open a terminal and then go to Connect to iceberg.

Connect to iceberg

Once you have a terminal open run the following command:


    ssh -X <username>@iceberg.shef.ac.uk

where you replace <username> with your CiCS username. This should give you a prompt resembling the one below:

    [te1st@iceberg-login2 ~]$

At this prompt type:

    qsh

like this:

    [te1st@iceberg-login2 ~]$ qsh
    Your job 135355 ("INTERACTIVE") has been submitted
    waiting for interactive job to be scheduled ....
    Your interactive job 135355 has been successfully scheduled.

This will pop up another terminal window, which supports graphical applications.

Note: Iceberg is a compute cluster. When you log in to the cluster you reach one of two login nodes. You should not run applications on the login nodes. Running qsh gives you an interactive terminal on one of the many worker nodes in the cluster. If you only need terminal-based (CLI) applications you can run the qrsh command instead, which will give you a shell on a worker node but without graphical application (X server) support.
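For example, a quick way to see whether you are on a login node or a worker node is to check the hostname before and after requesting a session (the node names shown here are illustrative; yours will differ):

    [te1st@iceberg-login2 ~]$ hostname   # still on a login node
    iceberg-login2
    [te1st@iceberg-login2 ~]$ qrsh       # request a CLI-only interactive session
    [te1st@testnode03 ~]$ hostname       # now on a worker node
    testnode03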

What Next?

Now that you have connected to Iceberg, you can look at how to submit jobs with Iceberg's Queue System or look at Software on iceberg.

2.1.2 Scheduler

qhost

qhost is a scheduler command that shows the status of Sun Grid Engine hosts.

Documentation

Documentation is available on the system using the command:

    man qhost

Examples

Get an overview of the nodes and CPUs on Iceberg:

    qhost

This shows every node in the cluster. Some of these nodes may be reserved/purchased for specific research groups and hence may not be available for general use.


qrsh

qrsh is a scheduler command that requests an interactive session on a worker node. The resulting session will not support graphical applications. You will usually run this command from the head node.

Examples

Request an interactive session that provides the default amount of memory resources:

    qrsh

Request an interactive session that provides 10 Gigabytes of real and virtual memory:

    qrsh -l rmem=10G -l mem=10G

qrshx

qrshx is a scheduler command that requests an interactive session on a worker node. The resulting session will support graphical applications. You will usually run this command from the head node.

Examples

Request an interactive X-Windows session that provides the default amount of memory resources and launch the gedit text editor:

    qrshx
    gedit

Request an interactive X-Windows session that provides 10 Gigabytes of real and virtual memory and launch the latest version of MATLAB:

    qrshx -l mem=10G -l rmem=10G
    module load apps/matlab
    matlab

Request an interactive X-Windows session that provides 10 Gigabytes of real and virtual memory and 4 CPU cores:

    qrshx -l rmem=10G -l mem=10G -pe openmp 4

Sysadmin notes

qrshx is a Sheffield-developed modification to the standard set of scheduler commands. It is at /usr/local/bin/qrshx and contains the following:

    #!/bin/sh
    exec qrsh -v DISPLAY -pty y "$@" bash

qsh

qsh is a scheduler command that requests an interactive X-windows session on a worker node. The resulting terminal is not user-friendly and we recommend that you use our qrshx command instead.


Examples

Request an interactive X-Windows session that provides the default amount of memory resources:

    qsh

Request an interactive X-Windows session that provides 10 Gigabytes of real and virtual memory:

    qsh -l rmem=10G -l mem=10G

Request an interactive X-Windows session that provides 10 Gigabytes of real and virtual memory and 4 CPU cores:

    qsh -l rmem=10G -l mem=10G -pe openmp 4

qstat

qstat is a scheduler command that displays the status of the queues.

Examples

Display all jobs queued on the system:

    qstat

Display all jobs queued by the username foo1bar:

    qstat -u foo1bar

Display all jobs in the openmp parallel environment:

    qstat -pe openmp

Display all jobs in the queue named foobar:

    qstat -q foobar.q

qsub

qsub is a scheduler command that submits a batch job to the system.

Examples

Submit a batch job called myjob.sh to the system:

    qsub myjob.sh

qtop

qtop is a scheduler command that provides a summary of all processes running on the cluster for a given user.


Examples

qtop is only available on the worker nodes. As such, you need to start an interactive session on a worker node using qrsh or qrshx in order to use it. To give a summary of all of your currently running jobs:

    qtop

Summary for job 256127

    HOST        VIRTUAL-MEM  RSS-MEM   %CPU  %MEM  CPUTIME+  COMMAND
    testnode03  106.22 MB    1.79 MB    0.0   0.0  00:00:00  bash
    testnode03  105.62 MB    1.27 MB    0.0   0.0  00:00:00  qtop
    testnode03   57.86 MB    3.30 MB    0.0   0.0  00:00:00  ssh
    ----------------------------------------------------------------
    TOTAL:        0.26 GB    0.01 GB

Iceberg’s Queue System

To manage use of the Iceberg cluster, there is a queue system (SoGE, a derivative of the Sun Grid Engine). The queue system works as follows: a user requests that some task, either a script or an interactive session, be run on the cluster, and the scheduler then takes tasks from the queue based on a set of rules and priorities.

Using Iceberg Interactively

If you wish to use the cluster for interactive work, such as running applications like MATLAB or Ansys, or compiling software, you will need to request that the scheduler gives you an interactive session. For an introduction to this see Getting Started.

There are three commands which give you an interactive shell:

• qrsh - Requests an interactive session on a worker node. No support for graphical applications.
• qrshx - Requests an interactive session on a worker node. Supports graphical applications. Superior to qsh.
• qsh - Requests an interactive session on a worker node. Supports graphical applications.

You can configure the resources available to the interactive session by specifying them as command line options to the qsh or qrsh commands. For example, to run a qrshx session with access to 16 GB of virtual RAM:

    [te1st@iceberg-login2 ~]$ qrshx -l mem=16G

or a session with access to 8 cores:

    [te1st@iceberg-login2 ~]$ qrshx -pe openmp 8

A table of Common Interactive Job Options is given below; any of these can be combined together to request more resources.

Note: Long running jobs should use the batch submission system rather than requesting an interactive session for a very long time. Doing this will lead to better cluster performance for all users.


Common Interactive Job Options

    Command             Description
    -l h_rt=hh:mm:ss    Specify the total maximum execution time for the job.
    -l mem=xxG          Specify the maximum amount (xx) of memory to be used (per process or core).
    -pe <env> <nn>      Specify a parallel environment and number of processors.
    -pe openmp <nn>     The openmp parallel environment provides multiple threads on one node. <nn> specifies the maximum number of threads.
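For example, the following request (an illustrative combination, not a recommendation) asks for a 4-core interactive session with 8 GB of memory per core and an 8-hour runtime limit:

    qrshx -l h_rt=08:00:00 -l mem=8G -l rmem=8G -pe openmp 4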

Running Batch Jobs on iceberg

The power of iceberg really comes from the 'batch job' queue submission process. Using this system, you write a script which requests various resources, initializes the computational environment and then executes your program(s). The scheduler will run your job when resources are available. As the task is running, the terminal output and any errors are captured and saved to disk, so that you can see the output and verify the execution of the task. Any task that can be executed without any user intervention while it is running can be submitted as a batch job to Iceberg. This excludes jobs that require a Graphical User Interface (GUI); however, many common GUI applications such as Ansys or MATLAB can also be used without their GUIs.

When you submit a batch job, you provide an executable file that will be run by the scheduler. This is normally a bash script file which provides commands and options to the program you are using. Once you have a script file, or other executable file, you can submit it to the queue by running:

    qsub myscript.sh

Here is an example batch submission script that runs a fictitious program called foo:

    #!/bin/bash
    # Request 5 gigabytes of real memory (rmem)
    # and 5 gigabytes of virtual memory (mem)
    #$ -l mem=5G -l rmem=5G

    # load the module for the program we want to run
    module load apps/gcc/foo

    # Run the program foo with input foo.dat
    # and output foo.res
    foo < foo.dat > foo.res

Some things to note:

• The first line always needs to be #!/bin/bash, to tell the scheduler that this is a bash batch script.
• Comments start with a #
• Scheduler options, such as the amount of memory requested, start with #$
• You will usually require one or more module commands in your submission file. These make programs and libraries available to your scripts.

Here is a more complex example that requests more resources:

    #!/bin/bash
    # Request 16 gigabytes of real memory (rmem)
    # and 16 gigabytes of virtual memory (mem)
    #$ -l mem=16G -l rmem=16G
    # Request 4 cores in an OpenMP environment
    #$ -pe openmp 4


    # Email notifications to [email protected]
    #$ -M [email protected]
    # Email notifications if the job aborts
    #$ -m a

    # load the modules required by our program
    module load compilers/gcc/5.2
    module load apps/gcc/foo

    # Set the OMP_NUM_THREADS environment variable to 4
    export OMP_NUM_THREADS=4

    # Run the program foo with input foo.dat
    # and output foo.res
    foo < foo.dat > foo.res

Scheduler Options

    Command             Description
    -l h_rt=hh:mm:ss    Specify the total maximum execution time for the job.
    -l mem=xxG          Specify the maximum amount (xx) of memory to be used.
    -l hostname=        Target a node by name. Not recommended for normal use.
    -l arch=            Target a processor architecture. Options on Iceberg include intel-e5-2650v2 and intel-x5650.
    -N                  Job name, used to name output files and in the queue list.
    -j                  Join the error and normal output into one file rather than two.
    -M                  Email address to send notifications to.
    -m bea              Type of notifications to send. Can be any combination of begin (b), end (e) or abort (a), e.g. -m ea for end and abort messages.
    -a                  Specify the earliest time for a job to start, in the format MMDDhhmm, e.g. -a 01011130 will schedule the job to begin no sooner than 11:30 on 1st January.
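To illustrate how these options fit together, here is a sketch of a submission script header (the job name, email address and program name are placeholders):

    #!/bin/bash
    # Run for at most 2 hours with 8 gigabytes of virtual and real memory
    #$ -l h_rt=02:00:00
    #$ -l mem=8G -l rmem=8G
    # Name the job and merge the error stream into the normal output file
    #$ -N my_analysis
    #$ -j y
    # Email when the job ends or aborts (placeholder address)
    #$ -M someone@sheffield.ac.uk
    #$ -m ea

    # load whatever software your job needs, then run it
    module load apps/gcc/foo
    foo < foo.dat > foo.res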

All scheduler commands

All available scheduler commands are listed in the Scheduler section.

Frequently Asked SGE Questions

How many jobs can I submit at any one time?

You can submit up to 2000 jobs to the cluster, and the scheduler will allow up to 200 of your jobs to run simultaneously (we occasionally alter this value depending on the load on the cluster).

How do I specify the processor type on Iceberg?

Add the following line to your submission script:

    #$ -l arch=intel-e5-2650v2

This specifies nodes that have the Ivybridge E5-2650 CPU. All such nodes on Iceberg have 16 cores.


To only target the older, 12-core nodes that contain X5650 CPUs, add the following line to your submission script:

    #$ -l arch=intel-x5650

How do I specify multiple email addresses for job notifications?

Specify each additional email with its own -M option:

    #$ -M [email protected]
    #$ -M [email protected]

How do you ensure that a job starts after a specified time?

Add the following line to your submission script:

    #$ -a time

but replace time with a time in the format MMDDhhmm. For example, for 22nd July at 14:10, you'd do:

    #$ -a 07221410

This won't guarantee that it will run precisely at this time, since that depends on available resources. It will, however, ensure that the job runs after this time. If your resource requirements aren't too heavy, it will be pretty soon after: when I tried it, it started about 10 seconds afterwards, but this will vary.

This is a reference section for commands that allow you to interact with the scheduler:

• qhost - Shows the status of Sun Grid Engine hosts.
• qrsh - Requests an interactive session on a worker node. No support for graphical applications.
• qrshx - Requests an interactive session on a worker node. Supports graphical applications. Superior to qsh.
• qsh - Requests an interactive session on a worker node. Supports graphical applications.
• qstat - Displays the status of jobs and queues.
• qsub - Submits a batch job to the system.
• qtop - Provides a summary of all processes running on the cluster for a given user.

2.2 Iceberg

2.2.1 Software on iceberg

These pages list the software available on iceberg. If you notice an error or an omission, or wish to request new software, please email the research computing team at research-it@sheffield.ac.uk.

Modules on Iceberg

In general, the software available on iceberg is loaded and unloaded via the modules system [1]. Modules make it easy for us to install many versions of different applications, compilers and libraries side by side, and allow users to set up the computing environment to include exactly what they need.

[1] http://modules.sourceforge.net/


Note: Modules are not available on the login node. You must move to a worker node using either qrsh or qsh (see Getting Started) before any of the following commands will work.

Available modules can be listed using the following command:

    module avail

Modules have names that look like apps/python/2.7. To load this module, you'd do:

    module load apps/python/2.7

You can unload this module with:

    module unload apps/python/2.7

It is possible to load multiple modules at once, to create your own environment with just the software you need. For example, perhaps you want to use version 4.8.2 of the gcc compiler along with MATLAB 2014a:

    module load compilers/gcc/4.8.2
    module load apps/matlab/2014a

Confirm that you have loaded these modules with:

    module list

Remove the MATLAB module with:

    module unload apps/matlab/2014a

Remove all modules to return to the base environment:

    module purge

Module Command Reference

Here is a list of the most useful module commands. For full details, type man module at an iceberg command prompt.

• module list – lists currently loaded modules
• module avail – lists all available modules
• module load modulename – loads module modulename
• module unload modulename – unloads module modulename
• module switch oldmodulename newmodulename – switches between two modules
• module initadd modulename – run this command once to automatically ensure that a module is loaded when you log in. (It creates a .modules file in your home directory which acts as your personal configuration.)
• module show modulename – shows how loading modulename will affect your environment
• module purge – unloads all modules
• module help modulename – may show a longer description of the module, if present in the modulefile
• man module – detailed explanation of the above commands and others
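As an example of module initadd (the compiler version shown is just for illustration), run the following once from a worker node to have gcc 4.8.2 loaded automatically at every login:

    # adds the module to the .modules file in your home directory
    module initadd compilers/gcc/4.8.2

    # at your next login, confirm it was loaded automatically
    module list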


Applications on Iceberg

CASTEP

Latest Version: 16.1
URL: http://www.castep.org/

Licensing

Only licensed users of CASTEP are entitled to use it; license details are available on CASTEP's website. Access to CASTEP on the system is controlled using a Unix group. That is, only members of the castep group can access and run the program. To be added to this group, email the team at research-it@sheffield.ac.uk and provide evidence of your eligibility to use CASTEP.
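You can check whether your account is already in the castep group by listing your Unix groups and looking for castep in the output:

    groups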

Interactive Usage

The serial version of CASTEP should be used for interactive usage. After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command. Make the serial version of CASTEP available using one of the commands:

    module load apps/intel/15/castep/16.1-serial
    module load apps/intel/15/castep/8.0-serial

The CASTEP executable is called castep.serial, so if you execute:

    castep.serial

You should get the following:

    Usage:
    castep <seedname>            : Run files <seedname>.cell [and <seedname>.param]
    " [-d|--dryrun] <seedname>   : Perform a dryrun calculation on files <seedname>.cell
    " [-s|--search] <text>       : print list of keywords with <text> match in description
    " [-v|--version]             : print version information
    " [-h|--help] <keyword>      : describe specific keyword in <>.cell or <>.param
    " "           all            : print list of all keywords
    " "           basic          : print list of basic-level keywords
    " "           inter          : print list of intermediate-level keywords
    " "           expert         : print list of expert-level keywords
    " "           dummy          : print list of dummy keywords

If, instead, you get:

    -bash: castep.serial: command not found

this is probably because you are not a member of the castep group. See the section on Licensing above for details on how to be added to this group.

Interactive usage is fine for small CASTEP jobs such as the Silicon example given at http://www.castep.org/Tutorials/BasicsAndBonding. To run this example, you can do:

    # Get the files, decompress them and enter the directory containing them
    wget http://www.castep.org/files/Si2.tgz
    tar -xvzf ./Si2.tgz
    cd Si2


    # Run the CASTEP job in serial
    castep.serial Si2

    # Read the output using the more command
    more Si2.castep

CASTEP has a built-in help system. To get more information on using CASTEP, use:

    castep.serial -help

Alternatively, you can search for help on a particular topic:

    castep.serial -help search keyword

or list all of the input parameters:

    castep.serial -help search all

Batch Submission - Parallel

The parallel version of CASTEP is called castep.mpi. To make the parallel environment available, use one of the following module commands:

    module load apps/intel/15/castep/16.1-parallel
    module load apps/intel/15/castep/8.0-parallel

As an example of a parallel submission, we will calculate the bandstructure of graphite following the tutorial at http://www.castep.org/Tutorials/BandStructureAndDOS. After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command. Download and decompress the example input files with the commands:

    wget http://www.castep.org/files/bandstructure.tgz
    tar -xvzf ./bandstructure.tgz

Enter the directory containing the input files for graphite:

    cd bandstructure/graphite/

Create a file called submit.sge that contains the following:

    #!/bin/bash
    #$ -pe openmpi-ib 4   # Run the calculation on 4 CPU cores
    #$ -l rmem=4G         # Request 4 Gigabytes of real memory per core
    #$ -l mem=4G          # Request 4 Gigabytes of virtual memory per core
    module load apps/intel/15/castep/16.1-parallel
    mpirun castep.mpi graphite

Submit it to the system with the command:

    qsub submit.sge

After the calculation has completed, get an overview of the calculation by looking at the file graphite.castep:

    more graphite.castep

Installation Notes

These are primarily for system administrators.

CASTEP Version 16.1


The jump in version numbers from 8 to 16.1 is a result of CASTEP's change of version numbering; there are no versions 9-15. Serial (1 CPU core) and Parallel versions of CASTEP were compiled. Both versions were compiled with version 15.0.3 of the Intel Compiler Suite, and the Intel MKL versions of BLAS and FFT were used. The parallel version made use of OpenMPI 1.8.8.

The Serial version was compiled and installed with:

    module load compilers/intel/15.0.3
    install_dir=/usr/local/packages6/apps/intel/15/castep/16.1
    mkdir -p $install_dir
    tar -xzf ./CASTEP-16.1.tar.gz
    cd CASTEP-16.1

    # Compile Serial version
    make INSTALL_DIR=$install_dir FFT=mkl MATHLIBS=mkl10
    make INSTALL_DIR=$install_dir FFT=mkl MATHLIBS=mkl10 install install-tools

The directory CASTEP-16.1 was then deleted and the parallel version was installed with:

    #!/bin/bash
    module load libs/intel/15/openmpi/1.8.8
    # The above command also loads Intel Compilers 15.0.3
    # It also places the MKL in LD_LIBRARY_PATH
    install_dir=/usr/local/packages6/apps/intel/15/castep/16.1
    tar -xzf ./CASTEP-16.1.tar.gz
    cd CASTEP-16.1

    # Workaround for bug described at http://www.cmth.ph.ic.ac.uk/computing/software/castep.html
    sed 's/-static-intel/-shared-intel/' obj/platforms/linux_x86_64_ifort15.mk -i

    # Compile parallel version
    make COMMS_ARCH=mpi FFT=mkl MATHLIBS=mkl10
    mv ./obj/linux_x86_64_ifort15/castep.mpi $install_dir

CASTEP Version 8

Serial (1 CPU core) and Parallel versions of CASTEP were compiled. Both versions were compiled with version 15.0.3 of the Intel Compiler Suite, and the Intel MKL versions of BLAS and FFT were used. The parallel version made use of OpenMPI 1.8.8.

The Serial version was compiled and installed with:

    module load compilers/intel/15.0.3
    install_dir=/usr/local/packages6/apps/intel/15/castep/8.0
    tar -xzf ./CASTEP-8.0.tar.gz
    cd CASTEP-8.0

    # Compile Serial version
    make INSTALL_DIR=$install_dir FFT=mkl MATHLIBS=mkl10
    make INSTALL_DIR=$install_dir FFT=mkl MATHLIBS=mkl10 install install-tools

The directory CASTEP-8.0 was then deleted and the parallel version was installed with:

    #!/bin/bash
    module load libs/intel/15/openmpi/1.8.8


    # The above command also loads Intel Compilers 15.0.3
    # It also places the MKL in LD_LIBRARY_PATH
    install_dir=/usr/local/packages6/apps/intel/15/castep/8.0
    mkdir -p $install_dir
    tar -xzf ./CASTEP-8.0.tar.gz
    cd CASTEP-8.0

    # Compile parallel version
    make COMMS_ARCH=mpi FFT=mkl MATHLIBS=mkl10
    mv ./obj/linux_x86_64_ifort15/castep.mpi $install_dir

Modulefiles

• CASTEP 16.1-serial
• CASTEP 16.1-parallel
• CASTEP 8.0-serial
• CASTEP 8.0-parallel

Testing

Version 16.1 Serial

The following script was submitted via qsub from inside the build directory:

    #!/bin/bash
    #$ -l mem=10G
    #$ -l rmem=10G
    module load compilers/intel/15.0.3

    cd CASTEP-16.1/Test
    ../bin/testcode.py -q --total-processors=1 -e /home/fe1mpc/CASTEP/CASTEP-16.1/obj/linux_x86_64_ifort15/castep.serial -c simple -v -v -v

All but one of the tests passed. It seems that the failed test is one that fails for everyone for this version, since there is a missing input file. The output from the test run is on the system at /usr/local/packages6/apps/intel/15/castep/16.1/CASTEP_SERIAL_tests_09022016.txt

Version 16.1 Parallel

The following script was submitted via qsub from inside the build directory:

    #!/bin/bash
    #$ -pe openmpi-ib 4
    #$ -l mem=10G
    #$ -l rmem=10G
    module load libs/intel/15/openmpi/1.8.8

    cd CASTEP-16.1/Test
    ../bin/testcode.py -q --total-processors=4 --processors=4 -e /home/fe1mpc/CASTEP/CASTEP-16.1/obj/linux_x86_64_ifort15/castep.mpi -c simple -v -v -v

All but one of the tests passed. It seems that the failed test is one that fails for everyone for this version, since there is a missing input file. The output from the test run is on the system at /usr/local/packages6/apps/intel/15/castep/16.1/CASTEP_Parallel_tests_09022016.txt

Version 8 Parallel

The following script was submitted via qsub:


    #!/bin/bash
    #$ -pe openmpi-ib 4
    module load libs/intel/15/openmpi/1.8.8
    cd CASTEP-8.0
    make check COMMS_ARCH=mpi MAX_PROCS=4 PARALLEL="--total-processors=4 --processors=4"

All tests passed.

GATK

Version: 3.4-46
URL: https://www.broadinstitute.org/gatk/

The Genome Analysis Toolkit (GATK) is a software package for analysis of high-throughput sequencing data, developed by the Data Science and Data Engineering group at the Broad Institute. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Interactive Usage

After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. The latest version of GATK (currently 3.4-46) is made available with the command:

    module load apps/binapps/GATK

Alternatively, you can load a specific version with:

    module load apps/binapps/GATK/3.4-46
    module load apps/binapps/GATK/2.6-5

Version 3.4-46 of GATK also changes the environment to use Java 1.7, since this is required by GATK 3.4-46. An environment variable called GATKHOME is created by the module command that contains the path to the requested version of GATK. Thus, you can run the program with the command:

    java -jar $GATKHOME/GenomeAnalysisTK.jar -h

which will give a large amount of help, beginning with the version information:

    ---------------------------------------------------------------------------------
    The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
    Copyright (c) 2010 The Broad Institute
    For support and documentation go to http://www.broadinstitute.org/gatk
    ---------------------------------------------------------------------------------
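GATK can also be run as a batch job. The sketch below is illustrative only: the reference, BAM and output file names are placeholders, HaplotypeCaller is just one example of a GATK tool, and the memory figures should be adjusted to your data:

    #!/bin/bash
    #$ -l mem=8G -l rmem=8G

    module load apps/binapps/GATK/3.4-46

    # Keep the Java heap (-Xmx) below the memory requested from the scheduler.
    # reference.fasta, sample.bam and sample.vcf are placeholder file names.
    java -Xmx6g -jar $GATKHOME/GenomeAnalysisTK.jar \
        -T HaplotypeCaller \
        -R reference.fasta \
        -I sample.bam \
        -o sample.vcf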

Documentation

The GATK manual is available online at https://www.broadinstitute.org/gatk/guide/

Installation notes

The entire install is just a .jar file. Put it in the install directory and you're done.


Modulefile

Version 3.4-46

• The module file is on the system at /usr/local/modulefiles/apps/binapps/GATK/3.4-46

Its contents are:

    #%Module1.0#####################################################################
    ##
    ## GATK 3.4-46 modulefile
    ##

    ## Module file logging
    source /usr/local/etc/module_logging.tcl
    ##

    # This version of GATK needs Java 1.7
    module load apps/java/1.7

    proc ModulesHelp { } {
        puts stderr "Makes GATK 3.4-46 available"
    }

    set version 3.4-46
    set GATK_DIR /usr/local/packages6/apps/binapps/GATK/$version

    module-whatis "Makes GATK 3.4-46 available"

    prepend-path GATKHOME $GATK_DIR

Version 2.6-5

• The module file is on the system at /usr/local/modulefiles/apps/binapps/GATK/2.6-5

Its contents are:

    #%Module1.0#####################################################################
    ##
    ## GATK 2.6-5 modulefile
    ##

    ## Module file logging
    source /usr/local/etc/module_logging.tcl
    ##

    proc ModulesHelp { } {
        puts stderr "Makes GATK 3.4-46 available"
    }

    set version 2.6.5
    set GATK_DIR /usr/local/packages6/apps/binapps/GATK/$version

    module-whatis "Makes GATK 2.6-5 available"

    prepend-path GATKHOME $GATK_DIR

MOMFBD


Version: 2016-04-14
URL: http://dubshen.astro.su.se/wiki/index.php?title=MOMFBD

MOMFBD, or Multi-Object Multi-Frame Blind Deconvolution, is an image processing application for removing the effects of atmospheric seeing from solar image data.

Usage

The MOMFBD binaries can be added to your path with:

    module load apps/gcc/5.2/MOMFBD

Installation notes

MOMFBD was installed using gcc 5.2 using this script, and is loaded by this modulefile.

STAR

Latest version: 2.5.0c
URL: https://github.com/alexdobin/STAR

Spliced Transcripts Alignment to a Reference.

Interactive Usage

After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrsh command. The latest version of STAR (currently 2.5.0c) is made available with the command:

    module load apps/gcc/5.2/STAR

Alternatively, you can load a specific version with:

    module load apps/gcc/5.2/STAR/2.5.0c

This command makes the STAR binary available to your session and also loads the gcc 5.2 compiler environment which was used to build STAR. Check that STAR is working correctly by displaying the version:

    STAR --version

This should give the output:

    STAR_2.5.0c
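A typical use of STAR is aligning reads in a batch job. The following is only a sketch: the genome index directory and FASTQ file names are placeholders, and the core and memory figures should be tuned to your data:

    #!/bin/bash
    #$ -pe openmp 8
    #$ -l mem=4G -l rmem=4G   # per core

    module load apps/gcc/5.2/STAR/2.5.0c

    # Align paired-end reads against a pre-built genome index (paths are placeholders)
    STAR --runThreadN 8 \
         --genomeDir /path/to/genome_index \
         --readFilesIn sample_R1.fastq sample_R2.fastq \
         --outFileNamePrefix sample_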

Documentation

The STAR manual is available online: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

Installation notes

STAR was installed using gcc 5.2:

    module load compilers/gcc/5.2
    mkdir STAR
    mv ./STAR-2.5.0c.zip ./STAR


    cd ./STAR/
    unzip STAR-2.5.0c.zip
    cd STAR-2.5.0c
    make
    cd source
    mkdir -p /usr/local/packages6/apps/gcc/5.2/STAR/2.5.0c
    mv ./STAR /usr/local/packages6/apps/gcc/5.2/STAR/2.5.0c/

Testing

No test suite was found.

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/STAR/2.5.0c

Its contents are:

    #%Module1.0#####################################################################
    ##
    ## STAR 2.5.0c module file
    ##

    ## Module file logging
    source /usr/local/etc/module_logging.tcl
    ##

    module load compilers/gcc/5.2

    module-whatis "Makes version 2.5.0c of STAR available"

    set STAR_DIR /usr/local/packages6/apps/gcc/5.2/STAR/2.5.0c/

    prepend-path PATH $STAR_DIR

Abaqus

Versions: 6.13, 6.12 and 6.11
Support Level: FULL
Dependencies: Intel Compiler
URL: http://www.3ds.com/products-services/simulia/products/abaqus/
Local URL: https://www.shef.ac.uk/wrgrid/software/abaqus

Abaqus is a software suite for Finite Element Analysis (FEA) developed by Dassault Systèmes.

Interactive usage

After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh command. Alternatively, if you require more memory, for example 16 gigabytes, use the command:

    qsh -l mem=16G

The latest version of Abaqus (currently version 6.13) is made available with the command:

    module load apps/abaqus

22 Chapter 2. Research Software Engineering Team Sheffield HPC Documentation, Release

Alternatively, you can make a specific version available with one of the following commands:

    module load apps/abaqus/613
    module load apps/abaqus/612
    module load apps/abaqus/611

After that, simply type abaqus to get the command-line interface to abaqus or type abaqus cae to get the GUI interface.

Abaqus example problems

Abaqus contains a large number of example problems which can be used to become familiar with Abaqus on the system. These example problems are described in the Abaqus documentation and can be obtained using the Abaqus fetch command. For example, after loading the Abaqus module, enter the following at the command line to extract the input file for test problem s4d:

    abaqus fetch job=s4d

This will extract the input file s4d.inp. To run the computation defined by this input file, replace input=myabaqusjob with input=s4d in the commands and scripts below.

Batch submission of a single core job

In this example, we will run the s4d.inp file on a single core using 8 Gigabytes of memory. After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh command. Load version 6.13-3 of Abaqus and fetch the s4d example by running the following commands:

    module load apps/abaqus/613
    abaqus fetch job=s4d

Now, you need to write a batch submission file. We assume you'll call this my_job.sge:

    #!/bin/bash
    #$ -S /bin/bash
    #$ -cwd
    #$ -l rmem=8G
    #$ -l mem=8G

    module load apps/abaqus

    abq6133 job=my_job input=s4d.inp scratch=/scratch memory="8gb" interactive

Submit the job with the command:

    qsub my_job.sge

Important notes:

• We have requested 8 gigabytes of memory in the above job. The memory="8gb" switch tells Abaqus to use 8 gigabytes. The #$ -l rmem=8G and #$ -l mem=8G lines tell the system to reserve 8 gigabytes of real and virtual memory respectively. It is important that these numbers match.
• Note the word interactive at the end of the abaqus command. Your job will not run without it.

Batch submission of a single core job with user subroutine

In this example, we will fetch a simulation from Abaqus' built-in set of problems that makes use of user subroutines (UMATs) and run it in batch on a single core. After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh command. Load version 6.13-3 of Abaqus and fetch the umatmst3 example by running the following commands:

    module load apps/abaqus/613
    abaqus fetch job=umatmst3*


This will produce 2 files: the input file umatmst3.inp and the Fortran user subroutine umatmst3.f.

Now, you need to write a batch submission file. We assume you'll call this my_user_job.sge:

    #!/bin/bash
    #$ -S /bin/bash
    #$ -cwd
    #$ -l rmem=8G
    #$ -l mem=8G
    module load apps/abaqus/613
    module load compilers/intel/12.1.15

    abq6133 job=my_user_job input=umatmst3.inp user=umatmst3.f scratch=/scratch memory="8gb" interactive

Submit the job with the command:

    qsub my_user_job.sge

Important notes:

• In order to use user subroutines, it is necessary to load the module for the Intel compiler.
• The user subroutine itself is passed to Abaqus with the switch user=umatmst3.f

AdmixTools

Latest version: 4.1
Dependencies: Intel compiler 15.0.3
URL: https://github.com/DReichLab/AdmixTools

AdmixTools implements five genetics methods for learning about population mixtures. Details of these methods/algorithms can be found in Patterson et al. (2012), Ancient Admixture in Human History (doi:10.1534/genetics.112.145037).

Usage

Load version 4.1 of AdmixTools with the command:

    module load apps/intel/15/AdmixTools/4.1

This module makes the following programs available by adding their containing directory to the PATH environment variable; these are documented in README files contained in the $ADMIXTOOLSDIR directory.

• convertf: See README.CONVERTF for documentation of programs for converting file formats.
• qp3Pop: See README.3PopTest for details of running the f_3 test. This test can be used as a formal test of admixture with 3 populations.
• qpBound: See README.3PopTest for details of running qpBound. This test can be used for estimating bounds on the admixture proportions, given 3 populations (2 reference and one target).
• qpDstat: See README.Dstatistics for details of running D-statistics. This is a formal test of admixture with 4 populations.
• qpF4Ratio: See README.F4RatioTest for details of running F4 ratio estimation. This program computes the admixture proportion by taking the ratio of two f4 tests.
• rolloff: See README.ROLLOFF / README.ROLLOFF_OUTPUT for details of running rolloff. This program can be used for dating admixture events.


AdmixTools also comes with several examples. The example inputs and outputs can be found in $ADMIXTOOLSDIR/examples, whilst the example data the inputs reference are contained in $ADMIXTOOLSDIR/data
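As a rough sketch of batch usage (the parameter file name is a placeholder and its contents should follow README.3PopTest), a job running the qp3Pop program might look like this:

    #!/bin/bash
    #$ -l mem=8G -l rmem=8G

    module load apps/intel/15/AdmixTools/4.1

    # parqp3Pop is a parameter file you prepare as described in README.3PopTest
    qp3Pop -p parqp3Pop > qp3Pop.log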

Installation Notes

Version 4.1

This script:

1. Checks out a specific git commit of AdmixTools (def3c5d75d1b10fd3270631f5c64adbf3af04d4d; NB the repository does not contain any tags)
2. Builds the GNU Scientific Library (GSL) v2.2 and installs it into the AdmixTools final installation directory (/usr/local/packages6/apps/intel/15/AdmixTools/4.1)
3. Builds AdmixTools using v15.0.3 of the Intel Compiler and a customised Makefile (links against Intel libifcore rather than gfortran and uses MKL for BLAS and LAPACK functions)
4. Installs AdmixTools in the aforementioned directory
5. Downloads the data needed to run the examples, then runs the examples
6. Installs this modulefile as /usr/local/modulefiles/apps/intel/15/AdmixTools/4.1

Annovar

Version: 2015DEC14
URL: http://annovar.openbioinformatics.org/en/latest/

ANNOVAR is an efficient software tool to utilize up-to-date information to functionally annotate genetic variants detected from diverse genomes (including the human genome hg18, hg19, hg38, as well as mouse, worm, fly, yeast and many others).

Interactive Usage

After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To add the annovar binaries to the system PATH, execute the following command:

    module load apps/binapps/annovar/2015DEC14
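As an illustration only (the input file and database directory are placeholders; consult the ANNOVAR manual for real workflows), a gene-based annotation run looks like this once the module is loaded:

    # ex1.avinput is a variant file and humandb/ a database directory
    # previously downloaded with annotate_variation.pl -downdb
    annotate_variation.pl -geneanno -buildver hg19 ex1.avinput humandb/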

Documentation

The annovar manual is available online at http://annovar.openbioinformatics.org/en/latest/

Installation notes

The install is a collection of executable Perl scripts. Installation involves copying the directory and adding it to the PATH. It seems that annovar uses dates to distinguish between releases rather than version numbers.

    tar -xvzf ./annovar.latest.tar.gz
    mkdir -p /usr/local/packages6/apps/binapps/annovar
    mv annovar /usr/local/packages6/apps/binapps/annovar/2015DEC14


Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/binapps/annovar/2015DEC14

Its contents are:

    #%Module1.0#####################################################################
    ##
    ## annovar modulefile
    ##

    ## Module file logging
    source /usr/local/etc/module_logging.tcl
    ##

    proc ModulesHelp { } {
        puts stderr "Makes annovar 2015DEC14 available"
    }

    set ANNOVAR_DIR /usr/local/packages6/apps/binapps/annovar/2015Dec14

    module-whatis "Makes annovar 2015DEC14 available"

    prepend-path PATH $ANNOVAR_DIR

Ansys

Versions: 14, 14.5, 15
Support Level: FULL
Dependencies: If using User Defined Functions (UDF) you will also need the following. For Ansys Mechanical, Workbench, CFX and AutoDYN: Intel 14.0 or above compiler. For Fluent: GCC 4.6.1 or above.
URL: http://www.ansys.com/en_uk
Local URL: http://www.shef.ac.uk/cics/research/software/fluent

The ANSYS suite of programs can be used to numerically simulate a large variety of structural and fluid dynamics problems found in many engineering, physics, medical, aeronautics and automotive industry applications.

Interactive usage

After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh command. Alternatively, if you require more memory, for example 16 gigabytes, use the command:

    qsh -l rmem=16G

To make the latest version of Ansys available, run the following module command:

    module load apps/ansys

Alternatively, you can make a specific version available with one of the following commands:

    module load apps/ansys/14
    module load apps/ansys/14.5
    module load apps/ansys/15.0


After loading the modules, you can issue the following commands to run an Ansys product. The teaching versions mentioned below are installed for use during ANSYS and FLUENT teaching labs and will only allow models of up to 500,000 elements.

    ansyswb              : to run Ansys Workbench
    ansys                : to run Ansys Mechanical outside the workbench
    ansystext            : to run the line-mode version of Ansys
    Ansys and Ansystext  : to run the teaching-licence versions of the above two commands
    fluent               : to run Fluent outside the workbench
    fluentext            : to run the line-mode version of Fluent
    Fluent and Fluentext : to run the teaching-licence versions of the above two commands
    icemcfx or icem      : to run icemcfd outside the workbench

Running Batch fluent and ansys jobs

The easiest way of running batch Ansys and Fluent jobs is as follows:

    module load {version_you_require}

for example:

    module load apps/ansys/15.0

followed by runfluent or runansys. The runfluent and runansys commands submit a Fluent journal or Ansys input file into the batch system and can take a number of different parameters, according to your requirements.

runfluent command

Just typing runfluent will display information on how to use it:

    Usage: runfluent [2d,2ddp,3d or 3ddp] fluent_journal_file -time hh:mm:ss [-mem=nn]
           [-rmem=nn] [-mail your_email_address] [-nq] [-parallel nprocs]
           [optional_extra_fluent_params]
    Where all but the first two parameters are optional.
    First parameter [2d, 2ddp, etc] is the dimensionality of the problem.
    Second parameter, fluent_journal_file, is the file containing the fluent commands.
    Other 'optional' parameters are:
      -time hh:mm:ss is the cpu time needed in hours:minutes:seconds
      -mem=nn is the virtual memory needed (Default=8G). Example: -mem 12G (for 12 GBytes)
      -rmem=nn is the real memory needed (Default=2G). Example: -rmem 4G (for 4 GBytes)
      -mail email_address. You will receive emails about the progress of your job.
        Example: -mail [email protected]
      -nq is an optional parameter to submit without confirming
      -parallel nprocs : Only needed for parallel jobs to specify the no. of processors.
      -project project_name : The job will use a project allocation.
      fluent_params : any parameter not recognised will be passed to fluent itself.

Example:

    runfluent 3d nozzle.jou -time 00:30:00 -mem=10G

Fluent journal files are essentially a sequence of Fluent commands you would have entered by starting Fluent in non-gui mode. Here is an example journal file:

    /file/read-case test.cas
    /file/read-data test.dat
    /solve iter 200
    /file/write-data testv5b.dat
    yes
    /exit
    yes

Note that there can be no graphics-output-related commands in the journal file, as the job will be run in batch mode. Please see the Fluent documentation for further details of journal files and how to create them.


By using the -g parameter, you can start up an interactive Fluent session in non-gui mode to experiment, for example:

    fluent 3d -g

runansys command

The runansys command has a similar set of options:

    RUNANSYS COMMAND SUBMITS ANSYS JOBS TO THE SUN GRID ENGINE
    Usage: runansys ansys_inp_file [-time hh:mm:ss] [-mem=nn] [-rmem=nn] [-parallel n]
           [-project proj_name] [-mail email_address] [-fastdata] [other qsub parameters]
    Where;
      ansys_inp_file is a file containing a series of Ansys commands.
      -time hh:mm:ss is the cpu time needed in hours:minutes:seconds,
        if not specified 1 hour will be assumed.
      -mem=nn is the virtual memory requirement.
      -rmem=nn is the real memory requirement.
      -parallel n request an n-way parallel ansys job
      -gpu use GPU. Note for GPU users: -mem= must be greater than 18G.
      -project project_name : The job will use a project's allocation.
      -mail your_email_address : Job progress report is emailed to you.
      -fastdata use /fastdata/$USER/$JOB_ID as the working directory

As well as time and memory, any other valid qsub parameter can be specified. In particular, users of UPF functions will need to specify -v ANS_USER_PATH=the_working_directory. All parameters except the ansys_inp file are optional.

Output files created by Ansys take their names from the jobname specified by the user. You will be prompted for a jobname as well as any other startup parameters you wish to pass to Ansys.

Example:

    runansys test1.dat -time 00:30:00 -mem 8G -rmem=3G -mail [email protected]

bcbio

Latest version: 0.9.6a
Dependencies: gcc 5.2, R 3.2.1, Anaconda Python 2.3
URL: http://bcbio-nextgen.readthedocs.org/en/latest/

A python toolkit providing best-practice pipelines for fully automated high throughput sequencing analysis. You write a high level configuration file specifying your inputs and analysis parameters. This input drives a parallel pipeline that handles distributed execution, idempotent processing restarts and safe transactional steps. The goal is to provide a shared community resource that handles the data processing component of sequencing analysis, providing researchers with more time to focus on the downstream biology.

Usage

Load version 0.9.6a of bcbio with the command:

    module load apps/gcc/5.2/bcbio/0.9.6a

There is also a development version of bcbio installed on iceberg. This could change without warning and should not be used for production:

    module load apps/gcc/5.2/bcbio/devel

These module commands add bcbio commands to the PATH, load any supporting environments and correctly configure the system for bcbio usage. Once the module is loaded you can, for example, check the version of bcbio


    bcbio_nextgen.py -v

    /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/anaconda/lib/python2.7/site-packages/matplotlib/__init__.py:872:
    UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
      warnings.warn(self.msg_depr % (key, alt_key))
    0.9.6a

To check how the loaded version of bcbio has been configured:

    more $BCBIO_DIR/config/install-params.yaml

At the time of writing, the output from the above command is:

    aligners:
    - bwa
    - bowtie2
    - rtg
    - hisat2
    genomes:
    - hg38
    - hg19
    - GRCh37
    isolate: true
    tooldir: /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/tools
    toolplus: []

Example batch submission TODO

Integration with SGE TODO

Installation Notes

These are primarily for system administrators.

0.9.6a

Version 0.9.6a was installed using gcc 5.2, R 3.2.1 and Anaconda Python 2.3. The install was performed in two parts. The first step was to run the SGE script below in batch mode. Note that the install often fails due to external services being flaky; see https://github.com/rcgsheffield/sheffield_hpc/issues/219 for details. Depending on the reason for the failure, it should be OK to simply restart the install. This particular install was done in one shot, with no restarts necessary.

• install_bcbio_0.96a

The output from this batch run can be found in /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/install_output/

Once the install completed, the module file (see the Modulefile section) was created and loaded, and the following upgrades were performed:

    bcbio_nextgen.py upgrade --toolplus gatk=./GenomeAnalysisTK.jar
    bcbio_nextgen.py upgrade --genomes hg38 --aligners hisat2

The GATK .jar file was obtained from https://www.broadinstitute.org/gatk/download/

A further upgrade was performed on 13th January 2016. STAR had to be run directly because the bcbio upgrade command that made use of it kept stalling (bcbio_nextgen.py upgrade --data --genomes GRCh37 --aligners bwa --aligners star). We have no idea why this made a difference, but at least the direct STAR run could make use of multiple cores, whereas the bcbio installer only uses 1:


    #!/bin/bash
    #$ -l rmem=3G -l mem=3G
    #$ -P radiant
    #$ -pe openmp 16

    module load apps/gcc/5.2/bcbio/0.9.6a

    STAR --genomeDir /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Hsapiens/GRCh37/star \
         --genomeFastaFiles /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Hsapiens/GRCh37/seq/GRCh37.fa \
         --runThreadN 16 --runMode genomeGenerate --genomeSAindexNbases 14

    bcbio_nextgen.py upgrade --data --genomes GRCh37 --aligners bwa

Another upgrade was performed on 25th February 2016:

    module load apps/gcc/5.2/bcbio/0.9.6a
    bcbio_nextgen.py upgrade -u stable --data --genomes mm10 --aligners star --aligners bwa

As is usually the case for us, this stalled on the final STAR command. The exact call to STAR was found in /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Mmusculus/mm10/star/Log.out and run manually in a 16-core OpenMP script:

    STAR --runMode genomeGenerate --runThreadN 16 \
         --genomeDir /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Mmusculus/mm10/star \
         --genomeFastaFiles /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Mmusculus/mm10/seq/mm10.fa \
         --genomeSAindexNbases 14 --genomeChrBinNbits 14

This failed (see https://github.com/rcgsheffield/sheffield_hpc/issues/272). The fix was to add the line:

    index mm10 /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Mmusculus/mm10/seq/mm10.fa

to the file /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/galaxy/tool-data/sam_fa_indices.loc

Update: 14th March 2016. Another issue required us to modify /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Mmusculus/mm10/seq/mm10-resources.yaml so that it read:

    version: 16
    aliases:
      snpeff: GRCm38.82
      ensembl: mus_musculus_vep_83_GRCm38
    variation:
      dbsnp: ../variation/mm10-dbSNP-2013-09-12.vcf.gz
      lcr: ../coverage/problem_regions/repeats/LCR.bed.gz
    rnaseq:
      transcripts: ../rnaseq/ref-transcripts.gtf
      transcripts_mask: ../rnaseq/ref-transcripts-mask.gtf
      transcriptome_index:
        tophat: ../rnaseq/tophat/mm10_transcriptome.ver
      dexseq: ../rnaseq/ref-transcripts.dexseq.gff3
      refflat: ../rnaseq/ref-transcripts.refFlat
      rRNA_fa: ../rnaseq/rRNA.fa
    srnaseq:
      srna-transcripts: ../srnaseq/srna-transcripts.gtf
      mirbase-hairpin: ../srnaseq/hairpin.fa
      mirbase-mature: ../srnaseq/hairpin.fa
      mirdeep2-fasta: ../srnaseq/Rfam_for_miRDeep.fa

Development version


The development version was installed using gcc 5.2, R 3.2.1 and Anaconda Python 2.3.

• install_bcbio_devel.sge - This is an SGE submit script. The long running time of the installer made it better suited to being run as a batch job.
• bcbio-devel modulefile - located on the system at /usr/local/modulefiles/apps/gcc/5.2/bcbio/devel

The first install attempt failed with the error:

    To debug, please try re-running the install command with verbose output:
    export CC=${CC:-`which gcc`} && export CXX=${CXX:-`which g++`} && export SHELL=${SHELL:-/bin/bash} && export PERL5LIB=/usr/local/packages6/apps/gcc/5.2/bcbio/devel/tools/lib/perl5:${PERL5LIB} && /usr/local/packages6/apps/gcc/5.2/bcbio/devel/tools/bin/brew install -v --env=inherit --ignore-dependencies git
    Traceback (most recent call last):
      File "bcbio_nextgen_install.py", line 276, in <module>
        main(parser.parse_args(), sys.argv[1:])
      File "bcbio_nextgen_install.py", line 46, in main
        subprocess.check_call([bcbio["bcbio_nextgen.py"], "upgrade"] + _clean_args(sys_argv, args, bcbio))
      File "/usr/local/packages6/apps/binapps/anacondapython/2.3/lib/python2.7/subprocess.py", line 540, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['/usr/local/packages6/apps/gcc/5.2/bcbio/devel/anaconda/bin/bcbio_nextgen.py', 'upgrade', '--tooldir=/usr/local/packages6/apps/gcc/5.2/bcbio/devel/tools', '--isolate', '--genomes', 'GRCh37', '--aligners', 'bwa', '--aligners', 'bowtie2', '--data']' returned non-zero exit status 1

I manually ran the command:

    export CC=${CC:-`which gcc`} && export CXX=${CXX:-`which g++`} && export SHELL=${SHELL:-/bin/bash} && export PERL5LIB=/usr/local/packages6/apps/gcc/5.2/bcbio/devel/tools/lib/perl5:${PERL5LIB} && /usr/local/packages6/apps/gcc/5.2/bcbio/devel/tools/bin/brew install -v --env=inherit --ignore-dependencies git

and it completed successfully. I then resubmitted the submit script, which eventually completed successfully. It took several hours! At this point, I created the module file.

bcbio was upgraded to the development version with the following interactive commands:

    module load apps/gcc/5.2/bcbio/devel
    bcbio_nextgen.py upgrade -u development

The GATK .jar file was obtained from https://www.broadinstitute.org/gatk/download/ and installed into bcbio by running the following commands interactively:

    module load apps/gcc/5.2/bcbio/devel
    bcbio_nextgen.py upgrade --tools --toolplus gatk=./cooper/GenomeAnalysisTK.jar

Module files

• 0.9.6a

Testing

Version 0.9.6a

The following test script was submitted to the system as an SGE batch script:

    #!/bin/bash
    #$ -pe openmp 12
    #$ -l mem=4G   # Per Core!
    #$ -l rmem=4G  # Per Core!

    module add apps/gcc/5.2/bcbio/0.9.6a

    git clone https://github.com/chapmanb/bcbio-nextgen.git
    cd bcbio-nextgen/tests
    ./run_tests.sh devel
    ./run_tests.sh rnaseq

The tests failed due to a lack of pandoc


    [2016-01-07T09:40Z] Error: pandoc version 1.12.3 or higher is required and was not found.
    [2016-01-07T09:40Z] Execution halted
    [2016-01-07T09:40Z] Skipping generation of coverage report: Command 'set -o pipefail;
    /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/anaconda/bin/Rscript
    /data/fe1mpc/bcbio-nextgen/tests/test_automated_output/report/qc-coverage-report-run.R
    Error: pandoc version 1.12.3 or higher is required and was not found.
    Execution halted
    ' returned non-zero exit status 1

The full output of this test run is on the system at /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/tests/7-jan-2016/

Pandoc has been added to the list of applications that need to be installed on iceberg.

Development version

The following test script was submitted to the system. All tests passed. The output is at /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/tests/tests_07_01_2016/

    #!/bin/bash
    #$ -pe openmp 12
    #$ -l mem=4G   # Per Core!
    #$ -l rmem=4G  # Per Core!

    module add apps/gcc/5.2/bcbio/0.9.6a

    git clone https://github.com/chapmanb/bcbio-nextgen.git
    cd bcbio-nextgen/tests
    ./run_tests.sh devel
    ./run_tests.sh rnaseq

bcl2fastq

Versions: 1.8.4
Support Level: Bronze
URL: http://support.illumina.com/downloads/bcl2fastq_conversion_software_184.html

Illumina sequencing instruments generate per-cycle BCL basecall files as primary sequencing output, but many downstream analysis applications use per-read FASTQ files as input. bcl2fastq combines these per-cycle BCL files from a run and translates them into FASTQ files. bcl2fastq can begin BCL conversion as soon as the first read has been completely sequenced.

Usage

To make bcl2fastq available, use the following module command in your submission scripts:

    module load apps/bcl2fastq/1.8.4
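As a rough illustration of how version 1.8.x is typically driven (the run directories are placeholders and the options you need will depend on your run; see the Illumina documentation), conversion is configured with configureBclToFastq.pl and then executed with make:

    module load apps/bcl2fastq/1.8.4

    # Point the configuration script at the BaseCalls directory of a run (placeholder paths)
    configureBclToFastq.pl --input-dir /path/to/run/Data/Intensities/BaseCalls \
                           --output-dir /path/to/run/Unaligned

    # The configuration step writes a Makefile into the output directory
    cd /path/to/run/Unaligned
    make -j 4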

Installation Notes

These notes are primarily for system administrators. Compilation was done using gcc 4.4.7. I tried it with gcc 4.8 but ended up with a lot of errors. The package is also dependent on Perl. Perl 5.10.1 was used, which was the system Perl installed at the time. The RPM Perl-XML-Simple also needed installing.

    export TMP=/tmp
    export SOURCE=${TMP}/bcl2fastq


    export BUILD=${TMP}/bcl2fastq-1.8.4-build
    mkdir -p /usr/local/packages6/apps/gcc/4.4.7/bcl2fastq/1.8.4
    export INSTALL=/usr/local/packages6/apps/gcc/4.4.7/bcl2fastq/1.8.4
    mv bcl2fastq-1.8.4.tar.bz2 ${TMP}
    cd ${TMP}
    tar xjf bcl2fastq-1.8.4.tar.bz2
    mkdir ${BUILD}
    cd ${BUILD}
    ${SOURCE}/src/configure --prefix=${INSTALL}
    make
    make install

bedtools

Versions: 2.25.0
Dependencies: compilers/gcc/5.2
URL: https://bedtools.readthedocs.org/en/latest/

Collectively, the bedtools utilities are a swiss-army knife of tools for a wide range of genomics analysis tasks.

Interactive usage

After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh command. The latest version of bedtools (currently version 2.25.0) is made available with the command:

    module load apps/gcc/5.2/bedtools

Alternatively, you can make a specific version available:

    module load apps/gcc/5.2/bedtools/2.25.0

After that any of the bedtools commands can be run from the prompt.
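For example (a.bed and b.bed are placeholder input files), reporting the intervals of one file that overlap another:

    # a.bed and b.bed are placeholder BED files
    bedtools intersect -a a.bed -b b.bed > overlaps.bed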

Installation notes

bedtools was installed using gcc 5.2 with the script install_bedtools.sh

Testing

No test suite was found.

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/bedtools/2.25.0

The contents of the module file are:

    #%Module1.0#####################################################################
    ##
    ## bedtools module file
    ##


## Module file logging source /usr/local/etc/module_logging.tcl ## proc ModulesHelp { } { global bedtools-version

puts stderr " Adds `bedtools-$bedtools-version' to your PATH environment variable and necessary libraries" } set bedtools-version 2.25.0 module load compilers/gcc/5.2 prepend-path PATH /usr/local/packages6/apps/gcc/5.2/bedtools/2.25.0/bin

bitseq

bitseq Versions 0.7.5 URL http://bitseq.github.io/

Transcript isoform level expression and differential expression estimation for RNA-seq

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrsh command. The latest version of BitSeq (currently 0.7.5) is made available with the command

module load apps/gcc/5.2/bitseq

Alternatively, you can load a specific version with module load apps/gcc/5.2/bitseq/0.7.5

This command adds the BitSeq binaries to your PATH.

Documentation Documentation is available online http://bitseq.github.io/howto/index

Installation notes BitSeq was installed using gcc 5.2

module load compilers/gcc/5.2
tar -xvzf ./BitSeq-0.7.5.tar.gz
cd BitSeq-0.7.5
make
cd ..
mkdir -p /usr/local/packages6/apps/gcc/5.2/bitseq
mv ./BitSeq-0.7.5 /usr/local/packages6/apps/gcc/5.2/bitseq/

Testing No test suite was found.


Modulefile • The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/bitseq/0.7.5. Its contents are

#%Module1.0#####################################################################
##
## bitseq 0.7.5 modulefile
##

## Module file logging source /usr/local/etc/module_logging.tcl ##

module load compilers/gcc/5.2

proc ModulesHelp { } { puts stderr "Makes bitseq 0.7.5 available" }

set version 0.7.5 set BITSEQ_DIR /usr/local/packages6/apps/gcc/5.2/bitseq/BitSeq-$version

module-whatis "Makes bitseq 0.7.5 available"

prepend-path PATH $BITSEQ_DIR

BLAST

ncbi-blast Version 2.3.0 URL https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download

BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit. The BLAST+ applications have a number of performance and feature improvements over the legacy BLAST applications. For details, please see the BLAST+ user manual.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx or qrsh command. The latest version of BLAST+ (currently 2.3.0) is made available with the command module load apps/binapps/ncbi-blast

Alternatively, you can load a specific version with module load apps/binapps/ncbi-blast/2.3.0

This command makes the BLAST+ executables available to your session by adding the install directory to your PATH variable. It also sets the BLASTDB database environment variable. You can now run commands directly, e.g.

blastn -help


Databases The following databases have been installed following a user request • nr.*tar.gz Non-redundant protein sequences from GenPept, Swissprot, PIR, PDF, PDB, and NCBI RefSeq • nt.*tar.gz Partially non-redundant nucleotide sequences from all traditional divisions of GenBank, EMBL, and DDBJ excluding GSS,STS, PAT, EST, HTG, and WGS. A full list of databases available on the NCBI FTP site is at https://ftp.ncbi.nlm.nih.gov/blast/documents/blastdb.html If you need any of these installing, please make a request on our github issues log.
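As a hedged sketch, a batch job that searches the centrally installed nt database with blastn might look like the following; my_sequences.fasta and results.txt are placeholder file names and the memory request is illustrative only:

#!/bin/bash
# Illustrative memory request - adjust to suit your data
#$ -l mem=16G -l rmem=16G

module load apps/binapps/ncbi-blast/2.3.0

# BLASTDB is set by the module, so the database can be referred to by name
blastn -query my_sequences.fasta -db nt -out results.txt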

Installation notes This was an install from binaries

#get binaries and put in the correct location
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.3.0+-x64-linux.tar.gz
tar -xzf ./ncbi-blast-2.3.0+-x64-linux.tar.gz
mkdir -p /usr/local/packages6/apps/binapps/ncbi-blast/
mv ncbi-blast-2.3.0+ /usr/local/packages6/apps/binapps/ncbi-blast/

#Create database directory
mkdir -p /usr/local/packages6/apps/binapps/ncbi-blast/ncbi-blast-2.3.0+/db

#Install the nr database
cd /usr/local/packages6/apps/binapps/ncbi-blast/ncbi-blast-2.3.0+/db
for num in `seq 0 48`; do
  paddednum=`printf "%02d" $num`
  wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.$paddednum.tar.gz
done

#Install the nt database
for num in `seq 0 36`; do
  paddednum=`printf "%02d" $num`
  wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.$paddednum.tar.gz
done
for f in *.tar.gz; do tar -xvzf $f; done

Testing No testing has been performed. If you can suggest a suitable test suite, please contact us.

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/binapps/ncbi-blast/2.3.0 The contents of the module file is #%Module1.0##################################################################### ## ## BLAST 2.3.0 modulefile ##

## Module file logging source /usr/local/etc/module_logging.tcl ## proc ModulesHelp { } {


puts stderr "Makes BLAST 2.3.0 available" } set BLAST_DIR /usr/local/packages6/apps/binapps/ncbi-blast/ncbi-blast-2.3.0+ module-whatis "Makes BLAST 2.3.0 available" prepend-path PATH $BLAST_DIR/bin prepend-path BLASTDB $BLAST_DIR/db

bless

Bless Version 1.02 URL http://sourceforge.net/p/bless-ec/wiki/Home/

BLESS: Bloom-filter-based Error Correction Solution for High-throughput Sequencing Reads

Interactive Usage Bless uses MPI and we currently have no interactive MPI environments available. As such, it is not possible to run Bless interactively on Iceberg.

Batch Usage The latest version of bless (currently 1.02) is made available with the command module load apps/gcc/4.9.2/bless

Alternatively, you can load a specific version with module load apps/gcc/4.9.2/bless/1.02

The module creates the environment variable BLESS_PATH, which points to the Bless installation directory. It also loads the dependent modules • compilers/gcc/4.9.2 • mpi/gcc/openmpi/1.8.3 Here is an example batch submission script that makes use of two example input files we have included in our installation of BLESS

#!/bin/bash
#Next line is memory per slot
#$ -l mem=5G -l rmem=5G
#Ask for 4 slots in an OpenMP/MPI Hybrid queue
#These 4 slots will all be on the same node
#$ -pe openmpi-hybrid-4 4

#load the module
module load apps/gcc/4.9.2/bless/1.02

#Output information about nodes where code is running
cat $PE_HOSTFILE > nodes

data_folder=$BLESS_PATH/test_data
file1=test_small_1.fq


file2=test_small_2.fq

#Number of cores per node we are going to use.
cores=4
export OMP_NUM_THREADS=$cores

#Run BLESS mpirun $BLESS_PATH/bless -kmerlength 21 -smpthread $cores -prefix test_bless -read1 $data_folder/$file1 -read2 $data_folder/$file2

BLESS makes use of both MPI and OpenMP parallelisation frameworks. As such, it is necessary to use the hybrid MPI/OpenMP queues. The current build of BLESS does not work on more than one node. This will limit you to the maximum number of cores available on one node. For example, to use all 16 cores on a 16 core node you would request the following parallel environment #$ -pe openmpi-hybrid-16 16

Remember that memory is allocated on a per-slot basis. You should ensure that you do not request more memory than is available on a single node or your job will be permanently stuck in a queue-waiting (qw) status.

Installation notes Various issues were encountered while attempting to install bless. See https://github.com/rcgsheffield/sheffield_hpc/issues/143 for details. It was necessary to install gcc 4.9.2 in order to build bless. No other compiler worked! Here are the install steps

tar -xvzf ./bless.v1p02.tgz
mkdir -p /usr/local/modulefiles/apps/gcc/4.9.2/bless/
cd v1p02/

Load Modules module load compilers/gcc/4.9.2 module load mpi/gcc/openmpi/1.8.3

Modify the Makefile. Change the line cd zlib; ./compile to cd zlib;

Manually compile zlib cd zlib/ ./compile

Finish the compilation cd .. make

Copy the bless folder to the central location cd .. cp -r ./v1p02/ /usr/local/packages6/apps/gcc/4.9.2/bless/


Testing No test suite was found.

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/gcc/4.9.2/bless/1.02 • The module file is on github.

Bowtie2

bowtie2 Versions 2.2.6 URL http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. The latest version of bowtie2 (currently 2.2.6) is made available with the command

module load apps/gcc/5.2/bowtie2

Alternatively, you can load a specific version with module load apps/gcc/5.2/bowtie2/2.2.6

This command makes the bowtie2 executables available to your session by adding the install directory to your PATH variable. This allows you to simply do something like the following

bowtie2 --version

which gives results that look something like

/usr/local/packages6/apps/gcc/5.2/bowtie2/2.2.6/bowtie2-align-s version 2.2.6
64-bit
Built on node063
Fri Oct 23 08:40:38 BST 2015
Compiler: gcc version 5.2.0 (GCC)
Options: -O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
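As a rough sketch of a typical workflow (reference.fa and reads.fq are placeholder file names, not files shipped with the module), you would first build an index from a reference and then align reads against it:

module load apps/gcc/5.2/bowtie2/2.2.6
# Build an index called 'my_index' from a reference sequence
bowtie2-build reference.fa my_index
# Align single-end reads against the index and write a SAM file
bowtie2 -x my_index -U reads.fq -S aligned.sam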

Installation notes bowtie2 2.2.6 was installed using gcc 5.2

#Build
module load compilers/gcc/5.2
unzip bowtie2-2.2.6-source.zip
cd bowtie2-2.2.6
make


#Install
cd ..
mkdir -p /usr/local/packages6/apps/gcc/5.2/bowtie2
mv ./bowtie2-2.2.6 /usr/local/packages6/apps/gcc/5.2/bowtie2/2.2.6

Testing No test suite was found.

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/bowtie2/2.2.6 The contents of the module file is #%Module1.0##################################################################### ## ## bowtie2 2.2.6 modulefile ##

## Module file logging source /usr/local/etc/module_logging.tcl ## module load compilers/gcc/5.2 proc ModulesHelp { } { puts stderr "Makes bowtie 2.2.6 available" } set version 2.2.6 set BOWTIE2_DIR /usr/local/packages6/apps/gcc/5.2/bowtie2/$version module-whatis "Makes bowtie2 v2.2.6 available" prepend-path PATH $BOWTIE2_DIR

bwa

bwa Versions 0.7.12 URL http://bio-bwa.sourceforge.net/

BWA (Burrows-Wheeler Aligner) is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are designed for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx command.


The latest version of bwa (currently 0.7.12) is made available with the command module load apps/gcc/5.2/bwa

Alternatively, you can load a specific version with module load apps/gcc/5.2/bwa/0.7.12 module load apps/gcc/5.2/bwa/0.7.5a

This command makes the bwa binary available to your session.
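As an illustrative sketch only (reference.fa and reads.fq are placeholder file names), a typical BWA-MEM run first indexes the reference and then aligns the reads:

module load apps/gcc/5.2/bwa/0.7.12
# Index the reference genome
bwa index reference.fa
# Align reads with BWA-MEM and write the result as a SAM file
bwa mem reference.fa reads.fq > aligned.sam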

Documentation Once you have made bwa available to the system using the module command above, you can read the man pages by typing man bwa

Installation notes

bwa 0.7.12

bwa 0.7.12 was installed using gcc 5.2

module load compilers/gcc/5.2

#build
module load compilers/gcc/5.2
tar -xvjf ./bwa-0.7.12.tar.bz2
cd bwa-0.7.12
make

#Sort out manfile
mkdir -p share/man/man1
mv bwa.1 ./share/man/man1/

#Install
mkdir -p /usr/local/packages6/apps/gcc/5.2/bwa/
cd ..
mv bwa-0.7.12 /usr/local/packages6/apps/gcc/5.2/bwa/0.7.12/

bwa 0.7.5a

bwa 0.7.5a was installed using gcc 5.2

module load compilers/gcc/5.2

#build
module load compilers/gcc/5.2
tar -xvjf bwa-0.7.5a.tar.bz2
cd bwa-0.7.5a
make

#Sort out manfile
mkdir -p share/man/man1
mv bwa.1 ./share/man/man1/

#Install
mkdir -p /usr/local/packages6/apps/gcc/5.2/bwa/
cd ..
mv bwa-0.7.5a /usr/local/packages6/apps/gcc/5.2/bwa/


Testing No test suite was found.

Module files The default version is controlled by the .version file at /usr/local/modulefiles/apps/gcc/5.2/bwa/.version

#%Module1.0#####################################################################
##
## version file for bwa
##
set ModulesVersion "0.7.12"

Version 0.7.12 • The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/bwa/0.7.12 • On github: 0.7.12. Version 0.7.5a • The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/bwa/0.7.5a • On github: 0.7.5a.

CDO

CDO Version 1.7.2 URL https://code.zmaw.de/projects/cdo

CDO is a collection of command line Operators to manipulate and analyse Climate and NWP model Data. Supported data formats are GRIB 1/2, netCDF 3/4, SERVICE, EXTRA and IEG. There are more than 600 operators available.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To add the cdo command to the system PATH, execute the following command module load apps/gcc/5.3/cdo/1.7.2
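For example, a brief hedged sketch of interactive use (input.nc is a placeholder NetCDF file name):

module load apps/gcc/5.3/cdo/1.7.2
# Summarise the contents of a NetCDF file
cdo info input.nc
# Compute the time mean of each field and write it to a new file
cdo timmean input.nc input_timmean.nc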

Documentation The CDO manual is available online at https://code.zmaw.de/projects/cdo/embedded/index.html

Installation notes Installation (1) Download the latest current version:

wget https://code.zmaw.de/attachments/download/12350/cdo-current.tar.gz

(2) Extract the files into a working directory ( I have used /data/cs1ds/2016/cdo ) :

gunzip cdo-current.tar.gz
tar -xvf cdo-current.tar

(3) Install the program ( I have used /usr/local/extras/CDO ):


module load compilers/gcc/5.3
./configure --prefix=/usr/local/extras/CDO --with-netcdf=/usr/local/extras/netcdf/4.3.2
make install

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/gcc/5.3/cdo Its contents are #%Module1.0##################################################################### ## ## CDO Climate Data Operators Module file. ##

## Module file logging source /usr/local/etc/module_logging.tcl ## proc ModulesHelp { } { puts stderr "Makes Climate Data Operator $version available" } set version 1.7.2 set BIN_DIR /usr/local/extras/CDO/$version module-whatis "Makes CDO (Climate Data Operators) $version available" prepend-path PATH $BIN_DIR/bin

Code Saturne 4.0

Code Saturne
Version 4.0
URL http://code-saturne.org/cms/
Documentation http://code-saturne.org/cms/documentation
Location /usr/local/packages6/apps/gcc/4.4.7/code_saturne/4.0

Code_Saturne solves the Navier-Stokes equations for 2D, 2D-axisymmetric and 3D flows, steady or unsteady, laminar or turbulent, incompressible or weakly dilatable, isothermal or not, with scalars transport if required.

Usage To make Code Saturne available, run the following module command after starting a qsh session:

module load apps/code_saturne/4.0.0

Troubleshooting If you run Code Saturne jobs from /fastdata they will fail since the underlying filesystem used by /fastdata does not support POSIX locks. If you need more space than is available in your home directory, use /data instead. If you use /fastdata with Code Saturne you may get an error message similar to the one below

File locking failed in ADIOI_Set_lock(fd 14,cmd F_SETLKW/7,type F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 26.
- If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
- If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.


ADIOI_Set_lock:: Function not implemented
ADIOI_Set_lock:offset 0, length 8
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on exactly when Open MPI kills them.
--------------------------------------------------------------------------
solver script exited with status 1.
Error running the calculation. Check Code_Saturne log (listing) and error* files for details.

Installation Notes These notes are primarily for system administrators. Installation notes for the version referenced by the module apps/code_saturne/4.0.0:

Pre-requisites: This version of Code Saturne was built with the following:
• gcc 4.4.7
• cgns 3.2.1
• MED 3.0.8
• OpenMPI 1.8.3
• HDF5 1.8.14 built using gcc

module load libs/gcc/4.4.7/cgns/3.2.1
tar -xvzf code_saturne-4.0.0.tar.gz
mkdir code_saturn_build
cd code_saturn_build/
./../code_saturne-4.0.0/configure --prefix=/usr/local/packages6/apps/gcc/4.4.7/code_saturne/4.0 --with-mpi=/usr/local/mpi/gcc/openmpi/1.8.3/ --with-med=/usr/local/packages6/libs/gcc/4.4.7/med/3.0.8/ --with-cgns=/usr/local/packages6/libs/gcc/4.4.7/cgnslib/3.2.1 --with-hdf5=/usr/local/packages6/hdf5/gcc-4.4.7/openmpi-1.8.3/hdf5-1.8.14/

This gave the following configuration

Configuration options:
use debugging code: no
MPI (Message Passing Interface) support: yes
OpenMP support: no

The package has been configured. Type:

make
make install

To generate and install the PLE package

Configuration options:
use debugging code: no
use malloc hooks: no
use graphical user interface: yes
use long integers: yes
Zlib (gzipped file) support: yes
MPI (Message Passing Interface) support: yes
MPI I/O support: yes
MPI2 one-sided communication support: yes
OpenMP support: no


BLAS (Basic Linear Algebra Subprograms) support: no
Libxml2 (XML Reader) support: yes
ParMETIS (Parallel Graph Partitioning) support: no
METIS (Graph Partitioning) support: no
PT-SCOTCH (Parallel Graph Partitioning) support: no
SCOTCH (Graph Partitioning) support: no
CCM support: no
HDF (Hierarchical Data Format) support: yes
CGNS (CFD General Notation System) support: yes
MED (Model for Exchange of Data) support: yes
MED MPI I/O support: yes
MEDCoupling support: no
Catalyst (ParaView co-processing) support: no
EOS support: no
freesteam support: no
SALOME GUI support: yes
SALOME Kernel support: yes
Dynamic loader support (for YACS): dlopen

I then did

make
make install

Post Install Steps To make Code Saturne aware of the SGE system: • Created /usr/local/packages6/apps/gcc/4.4.7/code_saturne/4.0/etc/code_saturne.cfg: See code_saturne.cfg 4.0 • Modified /usr/local/packages6/apps/gcc/4.4.7/code_saturne/4.0/share/code_saturne/batch/batch.SGE. See: batch.SGE 4.0

Testing This module has not yet been properly tested and so should be considered experimental. Several user jobs of up to 8 cores have been submitted and ran to completion.

Module File Module File Location: /usr/local/modulefiles/apps/code_saturne/4.0.0 #%Module1.0##################################################################### ## ## code_saturne 4.0 module file ##

## Module file logging source /usr/local/etc/module_logging.tcl ## proc ModulesHelp { } { global code-saturneversion

puts stderr " Adds `code_saturn-$codesaturneversion' to your PATH environment variable and necessary libraries" } set codesaturneversion 4.0. module load mpi/gcc/openmpi/1.8.3 module-whatis "loads the necessary `code_saturne-$codesaturneversion' library paths"


set cspath /usr/local/packages6/apps/gcc/4.4.7/code_saturne/4.0 prepend-path MANPATH $cspath/share/man prepend-path PATH $cspath/bin

ctffind

ctffind Versions 4.1.5 3.140609 URL http://grigoriefflab.janelia.org/ctf

CTFFIND4, CTFFIND3 and CTFTILT are programs for finding CTFs of electron micrographs. The programs CTFFIND3 and CTFFIND4 are updated versions of the program CTFFIND2, which was developed in 1998 by Nikolaus Grigorieff at the MRC Laboratory of Molecular Biology in Cambridge, UK with financial support from the MRC. This software is licensed under the terms of the Janelia Research Campus Software Copyright 1.1.

Making ctffind available To activate and use CTFFIND4, run: module load apps/gcc/4.9.2/ctffind/4.1.5

This makes the following programs available in your session: • ctffind • ctffind_plot_results.sh To activate and use CTFFIND3, run: module load apps/gcc/4.4.7/ctffind/3.140609

This makes the following programs available in your session: • ctffind3.exe • ctffind3_mp.exe • ctftilt.exe • ctftilt_mp.exe CTFFIND4 should run significantly faster than CTFFIND3 and may give slightly improved results when processing data from detectors other than scanned photographic film.

Installation notes These are primarily for system administrators.

Version 4.1.5

This version was built using GCC 4.9.2, the Intel Math Kernel Library (MKL) for fast Fourier transforms and the wxWidgets toolkit for its user interface. It also has run-time dependencies on all three of those software packages. It was installed as an optional dependency of Relion 2.
1. Download, patch, configure, compile and install using this script
2. Install this modulefile as /usr/local/modulefiles/apps/gcc/4.9.2/ctffind/4.1.5


Version 3.140609 This version was built using GCC 4.4.7. 1. Download, configure, compile and install using install_ctffind.sh 2. Install this modulefile as /usr/local/modulefiles/apps/gcc/4.4.7/ctffind/3.140609

Cufflinks

Cufflinks Version 2.2.1 URL http://cole-trapnell-lab.github.io/cufflinks

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. The latest version of Cufflinks (currently 2.2.1) is made available with the command module load apps/binapps/cufflinks

Alternatively, you can load a specific version with module load apps/binapps/cufflinks/2.2.1

This command makes the cufflinks binary directory available to your session by adding it to the PATH environment variable.
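A minimal usage sketch, assuming an existing coordinate-sorted BAM file of aligned reads (hits.bam is a placeholder file name):

module load apps/binapps/cufflinks/2.2.1
# Assemble transcripts from aligned reads using 4 threads,
# writing the results to the directory cufflinks_out
cufflinks -p 4 -o cufflinks_out hits.bam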

Documentation The Cufflinks manual is available online at http://cole-trapnell-lab.github.io/cufflinks/manual/

Installation notes A binary install was used

tar -xvzf cufflinks-2.2.1.Linux_x86_64.tar.gz
mkdir -p /usr/local/packages6/apps/binapps/cufflinks
mv cufflinks-2.2.1.Linux_x86_64 /usr/local/packages6/apps/binapps/cufflinks/2.2.1

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/binapps/cufflinks/2.2.1 Its contents are ## cufflinks 2.2.1 modulefile ##

## Module file logging source /usr/local/etc/module_logging.tcl ##


proc ModulesHelp { } { puts stderr "Makes cufflinks 2.2.1 available" }

set version 2.2.1 set CUFF_DIR /usr/local/packages6/apps/binapps/cufflinks/$version

module-whatis "Makes cufflinks 2.2.1 available"

prepend-path PATH $CUFF_DIR

ffmpeg

ffmpeg Version 2.8.3 URL https://www.ffmpeg.org/

FFmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created. It supports the most obscure ancient formats up to the cutting edge, no matter if they were designed by some standards committee, the community or a corporation. It is also highly portable: FFmpeg compiles, runs, and passes our testing infrastructure FATE across Linux, Mac OS X, Microsoft Windows, the BSDs, Solaris, etc. under a wide variety of build environments, machine architectures, and configurations.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. The latest version of ffmpeg (currently 2.8.3) is made available with the command module load apps/gcc/5.2/ffmpeg

Alternatively, you can load a specific version with module load apps/gcc/5.2/ffmpeg/2.8.3

This command makes the ffmpeg binaries available to your session. It also loads version 5.2 of the gcc compiler environment, since gcc 5.2 was used to compile ffmpeg 2.8.3. You can now run ffmpeg. For example, to confirm the version loaded

ffmpeg -version

and to get help

ffmpeg -h
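As a simple illustration (input.avi and output.mp4 are placeholder file names), you could transcode a video to MP4 like this:

module load apps/gcc/5.2/ffmpeg/2.8.3
# Convert a video file to MP4
ffmpeg -i input.avi output.mp4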

Documentation Once you have made ffmpeg available to the system using the module command above, you can read the man pages by typing man ffmpeg

Installation notes ffmpeg was installed using gcc 5.2


module load compilers/gcc/5.2
tar xf ./ffmpeg-2.8.3.tar.xz
cd ffmpeg-2.8.3
mkdir -p /usr/local/packages6/apps/gcc/5.2/ffmpeg/2.8.3
./configure --prefix=/usr/local/packages6/apps/gcc/5.2/ffmpeg/2.8.3
make
make install

Testing The test suite was executed with

make check

All tests passed.

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/ffmpeg/2.8.3 • The module file is on github.

Freesurfer

Freesurfer Versions 5.3.0 URL http://freesurfer.net/

An open source software suite for processing and analyzing (human) brain MRI images.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive graphical session with the qsh command. The latest version of Freesurfer is made available with the command

module load apps/binapps/freesurfer

Alternatively, you can load a specific version with module load apps/binapps/freesurfer/5.3.0
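A hedged sketch of a typical batch run is shown below; the subject name and input NIfTI file are placeholders, the resource request is illustrative, and the standard FreeSurfer recon-all pipeline is assumed:

#!/bin/bash
# Illustrative memory request - a full recon-all run is long and memory-hungry
#$ -l mem=8G -l rmem=8G

module load apps/binapps/freesurfer/5.3.0

# The module points SUBJECTS_DIR at the central install;
# redirect it to a writable directory of your own
export SUBJECTS_DIR=$HOME/freesurfer_subjects
mkdir -p $SUBJECTS_DIR

# Process one subject from a T1-weighted image (placeholder names)
recon-all -s subject01 -i t1.nii.gz -all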

Important note Freesurfer is known to produce differing results when used across platforms. See http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0038234 for details. We recommend that you run all of your analyses for any given study on the same operating system/hardware combination and record details of these as part of your results.

Testing No formal test suite was found. A set of commands to try were suggested at https://surfer.nmr.mgh.harvard.edu/fswiki/TestingFreeSurfer (Accessed November 6th 2015) The following were tried


freeview -v $SUBJECTS_DIR/bert/mri/brainmask.mgz -v $SUBJECTS_DIR/bert/mri/aseg.mgz:colormap=lut:opacity=0.2 -f $SUBJECTS_DIR/bert/surf/lh.white:edgecolor=yellow -f $SUBJECTS_DIR/bert/surf/rh.white:edgecolor=yellow -f $SUBJECTS_DIR/bert/surf/lh.pial:annot=aparc:edgecolor=red -f $SUBJECTS_DIR/bert/surf/rh.pial:annot=aparc:edgecolor=red
tkmedit bert orig.mgz
tkmedit bert norm.mgz -segmentation aseg.mgz $FREESURFER_HOME/FreeSurferColorLUT.txt
tksurfer bert rh pial
qdec

Installation notes These are primarily for administrators of the system. The github issue for the original install request is at https://github.com/rcgsheffield/sheffield_hpc/issues/164 Freesurfer was installed as follows

wget -c ftp://surfer.nmr.mgh.harvard.edu/pub/dist/freesurfer/5.3.0/freesurfer-Linux-centos6_x86_64-stable-pub-v5.3.0.tar.gz
tar -xvzf ./freesurfer-Linux-centos6_x86_64-stable-pub-v5.3.0.tar.gz
mv freesurfer 5.3.0
mkdir -p /usr/local/packages6/apps/binapps/freesurfer
mv ./5.3.0/ /usr/local/packages6/apps/binapps/freesurfer/

The license file was obtained from the vendor and placed in /usr/local/packages6/apps/binapps/freesurfer/license.txt. The vendors were asked if it was OK to use this license on a central system. The answer is 'yes' - details at http://www.mail-archive.com/[email protected]/msg43872.html

To find the information necessary to create the module file

env > base.env
source /usr/local/packages6/apps/binapps/freesurfer/5.3.0/SetUpFreeSurfer.sh
env > after.env
diff base.env after.env

The script SetUpFreeSurfer.sh additionally creates a MATLAB startup.m file in the user’s home directory if it does not already exist. This is for MATLAB support only, has not been replicated in the module file and is currently not supported in this install. The module file is at /usr/local/modulefiles/apps/binapps/freesurfer/5.3.0 #%Module10.2#####################################################################

## Module file logging source /usr/local/etc/module_logging.tcl ##

# Freesurfer version (not in the user's environment) set ver 5.3.0 proc ModulesHelp { } { global ver

puts stderr "Makes Freesurfer $ver available to the system." } module-whatis "Sets the necessary Freesurfer $ver paths" prepend-path PATH /usr/local/packages6/apps/binapps/freesurfer/$ver/bin prepend-path FREESURFER_HOME /usr/local/packages6/apps/binapps/freesurfer/$ver

# The following emulates the results of 'source $FREESURFER_HOME/SetUpFreeSurfer.csh'


setenv FS_OVERRIDE 0
setenv PERL5LIB /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/lib/perl5/5.8.5
setenv OS Linux
setenv LOCAL_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/local
setenv FSFAST_HOME /usr/local/packages6/apps/binapps/freesurfer/5.3.0/fsfast
setenv MNI_PERL5LIB /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/lib/perl5/5.8.5
setenv FMRI_ANALYSIS_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/fsfast
setenv FSF_OUTPUT_FORMAT nii.gz
setenv MINC_BIN_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/bin
setenv SUBJECTS_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/subjects
prepend-path PATH /usr/local/packages6/apps/binapps/freesurfer/5.3.0/fsfast/bin:/usr/local/packages6/apps/binapps/freesurfer/5.3.0/tktools:/usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/bin
setenv FUNCTIONALS_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/sessions
setenv MINC_LIB_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/lib
setenv MNI_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni
#setenv FIX_VERTEX_AREA #How do you set this to the empty string? This was done in the original script.
setenv MNI_DATAPATH /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/data

gemini

gemini Versions 0.18.0 URL http://gemini.readthedocs.org/en/latest/

GEMINI (GEnome MINIng) is a flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome. By placing genetic variants, sample phenotypes and genotypes, as well as genome annotations into an integrated database framework, GEMINI provides a simple, flexible, and powerful system for exploring genetic variation for disease and population genetics.

Making gemini available The following module command makes the latest version of gemini available to your session module load apps/gcc/4.4.7/gemini

Alternatively, you can make a specific version available module load apps/gcc/4.4.7/gemini/0.18

Installation notes These are primarily for system administrators. • gemini was installed using the gcc 4.4.7 compiler • install_gemini.sh

Testing The test script used was • test_gemini.sh The full output from the test run is on the system at /usr/local/packages6/apps/gcc/4.4.7/gemini/0.18/test_results/ There was one failure


effstring.t02...\c ok
effstring.t03...\c
15d14
< gene impact impact_severity biotype is_exonic is_coding is_lof
64a64
> gene impact impact_severity biotype is_exonic is_coding is_lof
fail
updated 10 variants
annotate-tool.t1 ... ok
updated 10 variants
annotate-tool.t2 ... ok

This was reported to the developers, who indicated that it was nothing to worry about.

Modulefile The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/gemini/0.18 The contents of the module file is #%Module1.0##################################################################### ## ## Gemini module file ##

## Module file logging source /usr/local/etc/module_logging.tcl ## proc ModulesHelp { } { global bedtools-version

puts stderr " Adds `gemini-$geminiversion' to your PATH environment variable" } set geminiversion 0.18 prepend-path PATH /usr/local/packages6/apps/gcc/4.4.7/gemini/0.18/bin/

GROMACS

GROMACS Latest version 5.1 Dependencies mpi/intel/openmpi/1.10.0 URL http://www.gromacs.org/

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To make GROMACS available in this session, run the following command:

source /usr/local/packages6/apps/intel/15/gromacs/5.1/bin/GMXRC
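A hedged sketch of a simple single-core batch job follows; my_simulation.tpr is a placeholder run input file and gmx_mpi is assumed to be the binary name produced by this MPI-enabled build:

#!/bin/bash
# Illustrative memory request
#$ -l mem=4G -l rmem=4G

# Make GROMACS 5.1 available
source /usr/local/packages6/apps/intel/15/gromacs/5.1/bin/GMXRC

# Run an MD simulation from a prepared run input file (my_simulation.tpr).
# For multi-core runs you would also request a suitable MPI parallel
# environment from the scheduler and launch via mpirun.
gmx_mpi mdrun -deffnm my_simulation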

Installation notes


Version 5.1 Compilation Choices:
• Use latest intel compilers
• -DGMX_MPI=on
• -DCMAKE_PREFIX_PATH=/usr/local/packages6/apps/intel/15/gromcas
• -DGMX_FFT_LIBRARY="fftw3"
• -DGMX_BUILD_OWN_FFTW=ON
The script used to build gromacs can be found here.

htop

htop Versions 2.0 URL http://hisham.hm/htop/

This is htop, an interactive process viewer for Unix systems.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command. The latest version of htop (currently 2.0) is made available with the command module load apps/gcc/4.4.7/htop

Alternatively, you can load a specific version with module load apps/gcc/4.4.7/htop/2.0

This command makes the htop binary available to your session.

Installation notes htop was installed using gcc 4.4.7

tar -xvzf ./htop-2.0.0.tar.gz
cd htop-2.0.0
mkdir -p /usr/local/packages6/apps/gcc/4.4.7/htop/2.0
./configure --prefix=/usr/local/packages6/apps/gcc/4.4.7/htop/2.0
make
make install

Testing No test suite was found.

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/htop/2.0 • The module file is on github.


IDL

IDL Version 8.5 Support Level extras Dependencies java URL http://www.exelisvis.co.uk/ProductsServices/IDL.aspx Documentation http://www.exelisvis.com/docs/using_idl_home.html

IDL is a language that first appeared in 1977.

Usage If you wish to use the IDLDE then you may need to request more memory for the interactive session using something like qsh -l mem=8G. IDL can be activated using the module file: module load apps/idl/8.5 then run using idl or idlde for the interactive development environment.

Installation notes Extract the supplied linux x86-64 tar file, run the install shell script and point it at the install directory.

Integrative Genomics Viewer (IGV)

Integrative Genomics Viewer (IGV) Version 2.3.63 URL https://www.broadinstitute.org/igv/

The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start a graphical interactive session with the qsh command. In testing, we determined that you need to request at least 7 Gigabytes of memory to launch IGV

qsh -l mem=7G

The latest version of IGV (currently 2.3.63) is made available with the command module load apps/binapps/igv

Alternatively, you can load a specific version with module load apps/binapps/igv/2.3.63

This command makes the IGV binary directory available to your session by adding it to the PATH environment variable. Launch the applications with the command


igv.sh

Documentation The IGV user guide is available online at https://www.broadinstitute.org/igv/UserGuide

Installation notes A binary install was used

unzip IGV_2.3.63.zip
mkdir -p /usr/local/packages6/apps/binapps/IGV
mv ./IGV_2.3.63 /usr/local/packages6/apps/binapps/IGV/2.3.63
rm /usr/local/packages6/apps/binapps/IGV/2.3.63/igv.bat

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/binapps/igv/2.3.63 Its contents #%Module1.0##################################################################### ## ## IGV 2.3.63 modulefile ##

## Module file logging source /usr/local/etc/module_logging.tcl ##

proc ModulesHelp { } { puts stderr "Makes IGV 2.3.63 available" }

set version 2.3.63 set IGV_DIR /usr/local/packages6/apps/binapps/IGV/$version

module-whatis "Makes IGV 2.3.63 available"

prepend-path PATH $IGV_DIR

ImageJ

ImageJ Version 1.50g URL http://imagej.nih.gov/ij/

ImageJ is a public domain, Java-based image processing program developed at the National Institutes of Health.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx command. The latest version of ImageJ (currently 1.50g) is made available with the command


module load apps/binapps/imagej

Alternatively, you can load a specific version with module load apps/binapps/imagej/1.50g

This module command also changes the environment to use Java 1.8 and Java 3D 1.5.2. An environment variable called IMAGEJ_DIR is created by the module command that contains the path to the requested version of ImageJ. You can start ImageJ, with default settings, with the command

imagej

This starts imagej with 512Mb of Java heap space. To request more, you need to run the Java .jar file directly and use the -Xmx Java flag. For example, to request 1 Gigabyte java -Xmx1G -Dplugins.dir=$IMAGEJ_DIR/plugins/ -jar $IMAGEJ_DIR/ij.jar

You will need to ensure that you’ve requested enough virtual memory from the scheduler to support the above request. For more details, please refer to the Virtual Memory section of our Java documentation.

Documentation Links to documentation can be found at http://imagej.nih.gov/ij/download.html

Installation notes Install version 1.49 using the install_imagej.sh script. Run the GUI and update to the latest version by clicking on Help > Update ImageJ. Rename the run script to imagej since run is too generic. Modify the imagej script so that it reads

java -Xmx512m -jar -Dplugins.dir=/usr/local/packages6/apps/binapps/imagej/1.50g/plugins/ /usr/local/packages6/apps/binapps/imagej/1.50g/ij.jar

ImageJ’s 3D plugin requires Java 3D which needs to be included in the JRE used by imagej. I attempted a module-controlled version of Java3D but the plugin refused to recognise it. See https://github.com/rcgsheffield/sheffield_hpc/issues/279 for details. The plug-in will install Java3D within the JRE for you if you launch it and click Plugins>3D>3D Viewer

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/binapps/imagej/1.50g Its contents are #%Module1.0##################################################################### ## ## ImageJ 1.50g modulefile ##

## Module file logging source /usr/local/etc/module_logging.tcl ##

module load apps/java/1.8u71

proc ModulesHelp { } { puts stderr "Makes ImageJ 1.50g available" }


set version 1.50g set IMAGEJ_DIR /usr/local/packages6/apps/binapps/imagej/$version module-whatis "Makes ImageJ 1.50g available" prepend-path PATH $IMAGEJ_DIR prepend-path CLASSPATH $IMAGEJ_DIR/ prepend-path IMAGEJ_DIR $IMAGEJ_DIR

R (Intel Build)

R Dependencies BLAS URL http://www.r-project.org/ Documentation http://www.r-project.org/

R is a statistical computing language. This version of R is built using the Intel compilers and the Intel Math Kernel Library. This combination can result in significantly faster runtimes in certain circumstances. Most R extensions are written and tested for the gcc suite of compilers so it is recommended that you perform testing before switching to this version of R.

CPU Architecture The Intel build of R makes use of CPU instructions that are only present on the most modern of Iceberg’s nodes. In order to use them, you should add the following to your batch submission scripts #$ -l arch=intel-e5-2650v2

If you do not do this, you will receive the following error message when you try to run R Please verify that both the operating system and the processor support Intel(R) F16C and AVX instructions.

Loading the modules After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command. There are two Intel builds of R, sequential and parallel. The sequential build makes use of one CPU core and can be used as a drop-in replacement for the standard version of R installed on Iceberg.

module load apps/intel/15/R/3.3.1_sequential

The parallel version makes use of multiple CPU cores for certain linear algebra routines since it is linked to the parallel version of the Intel MKL. Note that only linear algebra routines are automatically parallelised. module load apps/intel/15/R/3.3.1_parallel

When using the parallel module, you must also ensure that you set the bash environment variable OMP_NUM_THREADS to the number of cores you require and also use the openmp parallel environment. E.g. add the following to your submission script

#$ -pe openmp 8
export OMP_NUM_THREADS=8
module load apps/intel/15/R/3.3.1_parallel


Example batch jobs Here is how to run the R script called linear_algebra_bench.r from the HPC Examples github repository

#!/bin/bash
#This script runs the linear algebra benchmark multiple times using the intel-compiled version of R
#that's linked with the sequential MKL
#$ -l mem=8G -l rmem=8G
# Target the Ivy Bridge Processors
#$ -l arch=intel-e5-2650v2

module load apps/intel/15/R/3.3.1_sequential

echo "Intel R with sequential MKL on intel-e5-2650v2" Rscript linear_algebra_bench.r output_data.rds

Here is how to run the same code using 8 cores

#!/bin/bash
#$ -l mem=3G -l rmem=3G # Memory per core
# Target the Ivy Bridge Processors
#$ -l arch=intel-e5-2650v2
#$ -pe openmp 8
export OMP_NUM_THREADS=8

module load apps/intel/15/R/3.3.1_parallel

echo "Intel R with parallel MKL on intel-e5-2650v2" echo "8 cores" Rscript inear_algebra_bench.r 8core_output_data.rds

Installing additional packages By default, the standard version of R allows you to install packages into the location ~/R/x86_64-unknown-linux-gnu-library/3.3/, where ~ refers to your home directory. To ensure that the Intel builds do not contaminate the standard gcc builds, the Intel R module files set the environment variable R_LIBS_USER to point to ~/R/intel_R/3.3.1. As a user, you should not need to worry about this detail and can just install packages as you usually would from within R, e.g.

install.packages("dplyr")

The Intel build of R will ignore any packages installed in your home directory for the standard version of R, and vice versa.

Installation Notes These notes are primarily for administrators of the system.

Version 3.3.1

This was a scripted install. It was compiled from source with Intel Compiler 15.0.3 and with --enable-R-shlib enabled. It was run in batch mode. This build required several external modules including xz utils, curl, bzip2 and zlib.
• install_intel_r_sequential.sh Downloads, compiles, tests and installs R 3.3.1 using Intel Compilers and the sequential MKL. The install and test logs are at /usr/local/packages6/apps/intel/15/R/sequential-3.3.1/install_logs/
• install_intel_r_parallel.sh Downloads, compiles, tests and installs R 3.3.1 using Intel Compilers and the parallel MKL. The install and test logs are at /usr/local/packages6/apps/intel/15/R/sequential-3.3.1/install_logs/


• 3.3.1_parallel Parallel Module File • 3.3.1_sequential Sequential Module File

JAGS

JAGS Latest version 4.2 URL http://mcmc-jags.sourceforge.net/

JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS. JAGS was written with three aims in mind: • To have a cross-platform engine for the BUGS language • To be extensible, allowing users to write their own functions, distributions and samplers. • To be a platform for experimentation with ideas in Bayesian modeling

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx or qrsh command. To make JAGS available in this session, run one of the following module commands

module load apps/gcc/4.8.2/JAGS/4.2
module load apps/gcc/4.8.2/JAGS/3.4
module load apps/gcc/4.8.2/JAGS/3.1

You can now run the jags command

jags
Welcome to JAGS 4.2.0 on Thu Jun 30 09:21:17 2016
JAGS is free software and comes with ABSOLUTELY NO WARRANTY
Loading module: basemod: ok
Loading module: bugs: ok
.

The rjags and runjags interfaces in R rjags and runjags are CRAN packages that provide an R interface to JAGS. They are not installed in R by default. After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command. Run the following module commands

module load compilers/gcc/4.8.2
module load apps/gcc/4.8.2/JAGS/4.2
module load apps/R/3.3.0

Launch R by typing R and pressing return. Within R, execute the following commands

install.packages('rjags')
install.packages('runjags')

and follow the on-screen instructions. Answer y to any questions about the creation of a personal library should they be asked. The packages will be stored in a directory called R within your home directory.


You should only need to run the install.packages commands once. When you log into the system in future, you will only need to run the module commands above to make JAGS available to the system. You load the rjags and runjags packages the same as you would any other R package

library('rjags')
library('runjags')

If you receive an error message such as

Error : .onLoad failed in loadNamespace() for 'rjags', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared object '/home/fe1mpc/R/x86_64-unknown-linux-gnu-library/3.2/rjags/libs/rjags.so':
libjags.so.3: cannot open shared object file: No such file or directory
Error: package or namespace load failed for 'rjags'

the most likely cause is that you forgot to load the necessary modules before starting R.
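Once the packages are installed, a JAGS analysis can also be run non-interactively. This is a minimal batch sketch; my_jags_model.R is a placeholder for your own R script that uses rjags or runjags and the memory request is illustrative:

#!/bin/bash
# Illustrative memory request
#$ -l mem=4G -l rmem=4G

module load compilers/gcc/4.8.2
module load apps/gcc/4.8.2/JAGS/4.2
module load apps/R/3.3.0

# Run an R script that uses rjags/runjags
Rscript my_jags_model.R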

Installation notes

Version 4.2

JAGS 4.2 was built with gcc 4.8.2
• Install script on github - https://github.com/mikecroucher/HPC_Installers/blob/master/apps/jags/4.2.0/sheffield/iceberg/install_jags4.2.0.sh
• Module file on github - https://github.com/mikecroucher/HPC_Installers/blob/master/apps/jags/4.2.0/sheffield/iceberg/4.2

Version 3.4

JAGS 3.4 was built with gcc 4.8.2

module load compilers/gcc/4.8.2
tar -xvzf ./JAGS-3.4.0.tar.gz
cd JAGS-3.4.0
mkdir -p /usr/local/packages6/apps/gcc/4.8.2/JAGS/3.4
./configure --prefix=/usr/local/packages6/apps/gcc/4.8.2/JAGS/3.4
make
make install

Version 3.1

JAGS 3.1 was built with gcc 4.8.2

module load compilers/gcc/4.8.2
tar -xvzf ./JAGS-3.1.0.tar.gz
cd JAGS-3.1.0
mkdir -p /usr/local/packages6/apps/gcc/4.8.2/JAGS/3.1
./configure --prefix=/usr/local/packages6/apps/gcc/4.8.2/JAGS/3.1
make
make install

Java

Java Latest Version 1.8u71 URL https://www.java.com/en/download/

Java is a programming language and computing platform first released by Sun Microsystems in 1995.


Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command. The latest version of Java (currently 1.8u71) is made available with the command module load apps/java

Alternatively, you can load a specific version with one of module load apps/java/1.8u71 module load apps/java/1.7.0u55 module load apps/java/1.7 module load apps/java/1.6

Check that you have the version you expect. First, the runtime java -version

java version "1.8.0_71" Java(TM) SE Runtime Environment (build 1.8.0_71-b15) Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)

Now, the compiler javac -version

javac 1.8.0_71

Virtual Memory By default, Java requests a lot of virtual memory on startup. This is usually a given fraction of the physical memory on a node, which can be quite a lot on Iceberg. This then exceeds a user's virtual memory limit set by the scheduler and causes a job to fail. See What is rmem (real memory) and mem (virtual memory) for explanations of the difference between virtual and real/physical memory. To fix this, we have created a wrapper script to Java that uses the -Xmx1G switch to force Java to only request one Gigabyte of memory for its heap size. If this is insufficient, you are free to allocate as much memory as you require but be sure to request enough from the scheduler as well. You'll typically need to request more virtual memory from the scheduler than you specify in the Java -Xmx switch. For example, consider the following submission script. Note that it was necessary to request 9 Gigabytes of memory from the scheduler even though we only allocated 5 Gigabytes heap size in Java. The requirement for 9 gigabytes was determined empirically

#!/bin/bash
#Request 9 gigabytes of real memory from the scheduler (rmem)
#and 9 gigabytes of virtual memory from the scheduler (mem)
#$ -l mem=9G -l rmem=9G

# load the Java module
module load apps/java/1.8u71

#Run java program allocating 5 Gigabytes
java -Xmx5G HelloWorld

Installation notes These are primarily for administrators of the system. Unzip and copy the install directory to /usr/local/packages6/apps/binapps/java/jdk1.8.0_71/ To fix the virtual memory issue described above, we use a wrapper around the java install that sets Java’s Xmx parameter to a reasonable value.


Create the file /usr/local/packages6/apps/binapps/java/jdk1.8.0_71/shef/java with contents

#!/bin/bash
#
# Java version 1.8 cannot be invoked without specifying the java virtual
# machine size due to the limitations imposed by us via SGE on memory usage.
# Therefore this script intercepts the java invocations and adds a
# memory constraint parameter to java engine unless there was one already
# specified on the command parameter.
#
if test -z "`echo $* | grep -e -Xmx`"; then
  # user has not specified -Xmx memory requirement flag, so add it.
  /usr/local/packages6/apps/binapps/java/jdk1.8.0_71/bin/java -Xmx1G $*
else
  # user specified the -Xmx flag, so don't add it.
  /usr/local/packages6/apps/binapps/java/jdk1.8.0_71/bin/java $*
fi

The module file is at /usr/local/modulefiles/apps/java/1.8u71. Its contents are

#%Module10.2#####################################################################

## Module file logging
source /usr/local/etc/module_logging.tcl
##
proc ModulesHelp { } {
  global helpmsg
  puts stderr "\t$helpmsg\n"
}
set version 1.8
set javahome /usr/local/packages6/apps/binapps/java/jdk1.8.0_71/
if [ file isdirectory $javahome/bin ] {
  module-whatis "Sets JAVA to version $version"
  set helpmsg "Changes the default version of Java to Version $version"
  # bring in new version
  setenv JAVA_HOME $javahome
  prepend-path PATH $javahome/bin
  prepend-path PATH $javahome/shef
  prepend-path MANPATH $javahome/man
} else {
  module-whatis "JAVA $version not installed"
  set helpmsg "JAVA $version not installed"
  if [ expr [ module-info mode load ] || [ module-info mode display ] ] {
    # bring in new version
    puts stderr "JAVA $version not installed on [uname nodename]"
  }
}

Julia


Julia Versions 0.5.0-rc3 URL http://julialang.org/

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with either the qrsh or qrshx command. The latest version of Julia (currently 0.5.0-rc3) is made available with the commands

module load compilers/gcc/5.2
module load apps/gcc/5.2/julia/0.5.0-rc3

This adds Julia to your PATH and also loads the gcc 5.2 compiler environment with which Julia was built. Start Julia by executing the command julia

You can exit a Julia session with the quit() function.
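For non-interactive work, a batch job can load the same modules and run a script. A minimal sketch (my_script.jl is a placeholder file name and the memory request is illustrative):

#!/bin/bash
# Illustrative memory request
#$ -l mem=8G -l rmem=8G

module load compilers/gcc/5.2
module load apps/gcc/5.2/julia/0.5.0-rc3

# Run a Julia script non-interactively
julia my_script.jl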

Installation notes These are primarily for administrators of the system. Julia was installed using gcc 5.2

module load apps/gcc/5.2/git/2.5
module load compilers/gcc/5.2
module load apps/python/anaconda2-2.5.0
git clone git://github.com/JuliaLang/julia.git
cd julia
git checkout release-0.5
#The next line targets the NEHALEM CPU architecture. This is the lowest architecture available on
#Iceberg and so the resulting binary will be supported on all nodes. Performance will not be as good as it
#could be on modern nodes.
sed -i s/OPENBLAS_TARGET_ARCH:=/OPENBLAS_TARGET_ARCH:=NEHALEM/ ./Make.inc
make

Module File The module file is as below #%Module10.2#################################################################### #

## Module file logging source /usr/local/etc/module_logging.tcl ## proc ModulesHelp { } { global ver

puts stderr " Adds Julia $ver to your environment variables." }

# Mathematica version (not in the user's environment)


set ver 0.5.0-rc3 module-whatis "sets the necessary Julia $ver paths" prepend-path PATH /usr/local/packages6/apps/gcc/5.2/julia/0.5.0-rc3

Knime

Knime URL https://www.knime.org Documentation http://tech.knime.org/documentation

KNIME Analytics Platform is an open solution for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command. The latest version of Knime including all of the extensions can be loaded with module load apps/knime

Alternatively, you can load a specific version of Knime excluding all of the extensions using one of the following module load apps/knime/3.1.2

With extensions module load apps/knime/3.1.2ext

Knime can then be run with $ knime

Batch usage There is a command line option allowing the user to run KNIME in batch mode (an example batch submission script is sketched after the option list below):

knime -application org.knime.product.KNIME_BATCH_APPLICATION -nosplash -workflowDir=""

• -application org.knime.product.KNIME_BATCH_APPLICATION launches the KNIME batch application.
• -nosplash does not show the initial splash window.
• -workflowDir="" provides the path of the directory containing the workflow.
Full list of options:
• -nosave do not save the workflow after execution has finished
• -reset reset workflow prior to execution
• -failonloaderror don't execute if there are errors during workflow loading
• -updateLinks update meta node links to latest version
• -credential=name[;login[;password]] for each credential enter credential name and optional login/password


• -masterkey[=...] prompt for master password (used in e.g. database nodes); if provided with argument, use argument instead of prompting
• -preferences=... path to the file containing /knime preferences
• -workflowFile=... ZIP file with a ready-to-execute workflow in the root of the ZIP
• -workflowDir=... directory with a ready-to-execute workflow
• -destFile=... ZIP file where the executed workflow should be written to; if omitted the workflow is only saved in place
• -destDir=... directory where the executed workflow is saved to; if omitted the workflow is only saved in place
• -workflow.variable=name,value,type define or overwrite workflow variable 'name' with value 'value' (possibly enclosed by quotes). The 'type' must be one of "String", "int" or "double".
The following return codes are defined:
• 0 upon successful execution
• 2 if parameters are wrong or missing
• 3 when an error occurs during loading a workflow
• 4 if an error during execution occurred
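As mentioned above, here is a minimal batch submission sketch; the workflow directory path and memory request are placeholders:

#!/bin/bash
# Illustrative memory request
#$ -l mem=8G -l rmem=8G

module load apps/knime/3.1.2

# Execute a workflow non-interactively (placeholder workflow directory)
knime -application org.knime.product.KNIME_BATCH_APPLICATION -nosplash -reset -workflowDir="$HOME/knime-workspace/my_workflow"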

Installation Notes These notes are primarily for administrators of the system.

Version 3.1.2 without extensions
• Download with wget https://download.knime.org/analytics-platform/linux/knime_3.1.2.linux.gtk.x86_64.tar.gz
• Move to /usr/local/extras/knime_analytics/3.1.2
• Unzip: tar -xvzf knime_3.1.2.linux.gtk.x86_64.tar.gz
The modulefile is at /usr/local/extras/modulefiles/apps/knime/3.1.2 and contains

#%Module1.0

proc ModulesHelp { } { puts stderr " Adds KNIME to your PATH environment variable and necessary libraries" }

prepend-path PATH /usr/local/extras/knime_analytics/3.1.2

Version 3.1.2 with extensions

• Download with wget https://download.knime.org/analytics-platform/linux/knime-full_3.1.2.linux.gtk.x86_64.tar.gz
• Move to /usr/local/extras/knime_analytics/3.1.2ext
• Unzip: tar -xvzf knime-full_3.1.2.linux.gtk.x86_64.tar.gz

The modulefile at /usr/local/extras/modulefiles/apps/knime/3.1.2ext contains

#%Module1.0

proc ModulesHelp { } {
    puts stderr " Adds KNIME to your PATH environment variable and necessary libraries"
}

prepend-path PATH /usr/local/extras/knime_analytics/3.1.2ext

Lua

Lua URL https://www.lua.org/ Documentation https://www.lua.org/docs.html

Lua is a powerful, efficient, lightweight, embeddable scripting language. It supports procedural programming, object-oriented programming, functional programming, data-driven programming, and data description.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command. Lua can be loaded with module load apps/lua/5.3.3

Lua can then be run with $ lua
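For non-interactive work, a Lua script can be run inside a standard batch job. The following is a minimal sketch, assuming a script called myscript.lua in your working directory; the file name and the resource requests are illustrative and not taken from this documentation.

#!/bin/bash
# Request 4 gigabytes of real and virtual memory (assumed values)
#$ -l mem=4G -l rmem=4G
# Run the job from the current working directory
#$ -cwd

module load apps/lua/5.3.3

lua myscript.lua

Submit the script with qsub in the usual way.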

Installation Notes These notes are primarily for administrators of the system.

Version 5.3.3

• curl -R -O http://www.lua.org/ftp/lua-5.3.3.tar.gz
• tar zxf lua-5.3.3.tar.gz
• cd lua-5.3.3
• make linux test

The modulefile at /usr/local/extras/modulefiles/apps/lua/5.3.3 contains

#%Module1.0
proc ModulesHelp { } {
    puts stderr " Adds Lua to your PATH environment variable and necessary libraries"
}
prepend-path PATH /usr/local/extras/Lua/lua-5.3.3/src

Macaulay2


Macaulay2
Latest version 1.9.2
Dependencies mpi/intel/openmpi/1.10.0, compilers/gcc/5.3, libs/gcc/5.2/boost/1.59, libs/gcc/lapack/3.3.0, apps/gcc/4.4.7/xzutils/5.2.2
URL http://www.math.uiuc.edu/Macaulay2/

Macaulay2 is a software system devoted to supporting research in algebraic geometry and commutative algebra. Macaulay2 includes core algorithms for computing Gröbner bases and graded or multi-graded free resolutions of modules over quotient rings of graded or multi-graded polynomial rings with a monomial ordering. The core algorithms are accessible through a versatile high level interpreted user language with a powerful debugger supporting the creation of new classes of mathematical objects and the installation of methods for computing specifically with them. Macaulay2 can compute Betti numbers, Ext, cohomology of coherent sheaves on projective varieties, primary decomposition of ideals, integral closure of rings, and more.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To make Macaulay2 available in this session, run the following command:

module load apps/gcc/5.3/macaulay2/1.9.2

You should then be able to start an interactive Macaulay2 session using M2
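Macaulay2 can also be run non-interactively from a batch job. The following is a minimal sketch only: the input file name mycode.m2 and the resource requests are assumptions for illustration, and the --script flag should be checked against the Macaulay2 documentation for your version.

#!/bin/bash
# Request 8 gigabytes of real and virtual memory (assumed values)
#$ -l mem=8G -l rmem=8G
#$ -cwd

module load apps/gcc/5.3/macaulay2/1.9.2

# mycode.m2 is a hypothetical file of Macaulay2 commands; --script runs it non-interactively
M2 --script mycode.m2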

Testing Macaulay2 can perform an extensive series of self-tests at compile-time (by running make check). These tests were performed and passed when building Macaulay2 on Iceberg.

Installation notes and modulefile

Version 1.9.2

• Built using install_macaulay_1.9.2.sh
• Built using a specific git commit (aedebfc1e6326416cb01598e09e8b4dbdc76c178) rather than the tagged 1.9.2 release, as Macaulay2's self-tests failed for the latter. There are few differences between this commit and the tagged 1.9.2 release.
• Made available to users via the Macaulay2 1.9.2 modulefile located on the system at /usr/local/modulefiles/apps/gcc/5.3/macaulay2/1.9.2
• The install script and modulefile enable several other modulefiles to provide:
  – OpenMPI (mpi/intel/openmpi/1.10.0)
  – GCC's compiler and Fortran library (compilers/gcc/5.3)
  – the Boost C++ libraries (libs/gcc/5.2/boost/1.59)
  – LAPACK linear algebra routines (libs/gcc/lapack/3.3.0)
  – LZMA (xz) (de)compression (apps/gcc/4.4.7/xzutils/5.2.2)


Maple

Maple Latest Version 2016 URL http://www.maplesoft.com/products/maple/

Scientific Computing and Visualisation

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command. The latest version of Maple (currently 2016) is made available with the command module load apps/binapps/maple

Alternatively, you can load a specific version with

module load apps/binapps/maple/2016
module load apps/binapps/maple/2015

You can then run the graphical version of Maple by entering xmaple or the command line version by entering maple.

Batch usage It is not possible to run Maple worksheets in batch mode. Instead, you must convert your worksheet to a pure text file that contains a set of Maple input commands. You can do this in Maple by opening your worksheet and clicking on File->Export As->Maple Input. The result will have the file extension .mpl.

An example Sun Grid Engine submission script that makes use of a .mpl file called, for example, mycode.mpl is

#!/bin/bash
# Request 4 gigabytes of real memory (rmem)
# and 4 gigabytes of virtual memory (mem)
#$ -l mem=4G -l rmem=4G

module load apps/binapps/maple/2015

maple < mycode.mpl
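Assuming the script above is saved as run_maple.sh (an illustrative name), submit it to the queue with

qsub run_maple.sh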

For general information on how to submit batch jobs refer to Running Batch Jobs on iceberg

Tutorials • High Performance Computing with Maple A tutorial from the Sheffield Research Software Engineering group on how to use Maple in a High Performance Computing environment

Installation notes These are primarily for administrators of the system.

Maple 2016

• Run the installer Maple2016.1LinuxX64Installer.run and follow instructions.
• Choose Install folder /usr/local/packages6/apps/binapps/maple2016
• Do not configure MATLAB
• Choose a network license. Details on CiCS internal wiki.


• Uncheck 'Enable periodic checking for Maple 2016 updates'
• Check 'Check for updates now'

The module file is at /usr/local/modulefiles/apps/binapps/maple/2016

#%Module10.2####################################################################
#
## Module file logging
source /usr/local/etc/module_logging.tcl
##
proc ModulesHelp { } {
    global ver
    puts stderr " Makes Maple $ver available to the system."
}

# Maple version (not in the user's environment)
set ver 2016

module-whatis "sets the necessary Maple $ver paths"

prepend-path PATH /usr/local/packages6/apps/binapps/maple2016/bin

Maple 2015

The module file is at /usr/local/modulefiles/apps/binapps/maple/2015

#%Module10.2####################################################################
#
## Module file logging
source /usr/local/etc/module_logging.tcl
##
proc ModulesHelp { } {
    global ver
    puts stderr " Makes Maple $ver available to the system."
}

# Maple version (not in the user's environment)
set ver 2015

module-whatis "sets the necessary Maple $ver paths"

prepend-path PATH /usr/local/packages6/maple/bin/

Mathematica

Wolfram Mathematica
Dependencies None
URL http://www.wolfram.com/mathematica/
Latest version 10.3.1


Mathematica is a technical computing environment and programming language with strong symbolic and numerical abilities.

Single Core Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with qsh. The latest version of Mathematica can be loaded with module load apps/binapps/mathematica

Alternatively, you can load a specific version of Mathematica using

module load apps/binapps/mathematica/10.3.1
module load apps/binapps/mathematica/10.2

Mathematica can then be started with the mathematica command mathematica

Multicore Interactive Usage Mathematica has extensive parallel functionality. To use it, you should request a parallel interactive session. For example, to request 4 cores qsh -pe openmp 4

Load and launch Mathematica

module load apps/binapps/mathematica
mathematica

In Mathematica, let's time how long it takes to calculate the first 20 million primes on 1 CPU core

AbsoluteTiming[primelist = Table[Prime[k], {k, 1, 20000000}];]

When I tried this, I got 78 seconds. Your results may vary greatly. Now, let's launch 4 ParallelKernels and redo the calculation in parallel

LaunchKernels[4]
AbsoluteTiming[primelist = ParallelTable[Prime[k], {k, 1, 20000000}, Method -> "CoarsestGrained"];]

When I tried this, I got 29 seconds – around 2.7 times faster. This illustrates a couple of points:

• You should always request the same number of kernels as you requested in your qsh command (in this case, 4). If you request more, you will damage performance for yourself and other users of the system.
• N kernels doesn't always translate to N times faster.

Batch Submission Unfortunately, it is not possible to run Mathematica notebook .nb files directly in batch. Instead, you need to create a simple text file that contains all of the Mathematica commands you want to execute. Typically, such files are given the extension .m. Let’s run the following simple Mathematica script as a batch job.

(*Find out what version of Mathematica this machine is running*)
Print["Mathematica version is " <> $Version]

(*Find out the name of the machine we are running on*)
Print["The machine name is " <> $MachineName]

(*Do a calculation*)
Print["The result of the integral is "]
Print[Integrate[Sin[x]^2, x]]

Copy and paste the above into a text file called very_simple_mathematica.m. An example batch submission script for this file is

#!/bin/bash
# Request 4 gigabytes of real memory (rmem)
# and 4 gigabytes of virtual memory (mem)
#$ -l mem=4G -l rmem=4G

module load apps/binapps/mathematica/10.3.1
math -script very_simple_mathematica.m

Copy and paste the above into a file called run_job.sh and submit with qsub run_job.sh

Once the job has successfully completed, the output will be in a file named like run_job.sh.o396699. The number at the end refers to the job-ID given to this job by the system and will be different for you. Let’s take a look at the contents of this file more run_job.sh.o396699

Mathematica version is 10.2.0 for Linux x86 (64-bit) (July 28, 2015)
The machine name is node131
The result of the integral is
x/2 - Sin[2*x]/4

Installation notes These are primarily for administrators of the system.

For Version 10.3.1

mkdir -p /usr/local/packages6/apps/binapps/mathematica/10.3.1
chmod +x ./Mathematica_10.3.1_LINUX.sh
./Mathematica_10.3.1_LINUX.sh

The installer is interactive. Here’s the session output ------ 10.3 Installer ------

Copyright (c) 1988-2015 Wolfram Research, Inc. All rights reserved.

WARNING: Wolfram Mathematica is protected by copyright law and international treaties. Unauthorized reproduction or distribution may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under law.

Enter the installation directory, or press ENTER to select /usr/local/Wolfram/Mathematica/10.3: > /usr/local/packages6/apps/binapps/mathematica/10.3.1

Now installing...

[*****************************************************************************]


Type the directory path in which the Wolfram Mathematica script(s) will be created, or press ENTER to select /usr/local/bin: > /usr/local/packages6/apps/binapps/mathematica/10.3.1/scripts

Create directory (y/n)? > y

WARNING: No Avahi Daemon was detected so some Kernel Discovery features will not be available. You can install Avahi Daemon using your distribution's package management system.

For Red Hat based distributions, try running (as root): yum install avahi

Installation complete.

Install the University network mathpass file at /usr/local/packages6/apps/binapps/mathematica/10.3.1/Configuration/Licensing

For Version 10.2

mkdir -p /usr/local/packages6/apps/binapps/mathematica/10.2
chmod +x ./Mathematica_10.2.0_LINUX.sh
./Mathematica_10.2.0_LINUX.sh

The installer is interactive. Here’s the session output ------Wolfram Mathematica 10.2 Installer ------

Copyright (c) 1988-2015 Wolfram Research, Inc. All rights reserved.

WARNING: Wolfram Mathematica is protected by copyright law and international treaties. Unauthorized reproduction or distribution may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under law.

Enter the installation directory, or press ENTER to select /usr/local/Wolfram/Mathematica/10.2: >

Error: Cannot create directory /usr/local/Wolfram/Mathematica/10.2.

You may need to be logged in as root to continue with this installation.

Enter the installation directory, or press ENTER to select /usr/local/Wolfram/Mathematica/10.2: > /usr/local/packages6/apps/binapps/mathematica/10.2

Now installing...

[*********************************************************************************************************************************************************************************************************]

Type the directory path in which the Wolfram Mathematica script(s) will be created, or press ENTER to select /usr/local/bin: > /usr/local/packages6/apps/binapps/mathematica/10.2/scripts

Create directory (y/n)? > y

WARNING: No Avahi Daemon was detected so some Kernel Discovery features will not be available. You can install Avahi Daemon using your distribution's package management system.


For Red Hat based distributions, try running (as root):

yum install avahi

Installation complete.

Remove the playerpass file

rm /usr/local/packages6/apps/binapps/mathematica/10.2/Configuration/Licensing/playerpass

Install the University network mathpass file at /usr/local/packages6/apps/binapps/mathematica/10.2/Configuration/Licensing

Modulefiles • The 10.3.1 module file. • The 10.2 module file.

MATLAB

MATLAB
Versions 2013a, 2013b, 2014a, 2015a, 2016a
Support Level FULL
Dependencies None
URL http://uk.mathworks.com/products/matlab
Local URL http://www.shef.ac.uk/wrgrid/software/matlab
Documentation http://uk.mathworks.com/help/matlab

Scientific Computing and Visualisation

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh command. The latest version of MATLAB (currently 2016a) is made available with the command module load apps/matlab

Alternatively, you can load a specific version with one of the following commands

module load apps/matlab/2013a
module load apps/matlab/2013b
module load apps/matlab/2014a
module load apps/matlab/2015a
module load apps/matlab/2016a

You can then run MATLAB by entering matlab

Serial (one CPU) Batch usage Here, we assume that you wish to run the program hello.m on the system. First, you need to write a batch submission file. We assume you’ll call this my_job.sge.


#!/bin/bash
#$ -l rmem=4G            # Request 4 Gigabytes of real memory
#$ -l mem=16G            # Request 16 Gigabytes of virtual memory
#$ -cwd                  # Run job from current directory
module load apps/matlab  # Make latest version of MATLAB available

matlab -nodesktop -r 'hello'

Ensuring that hello.m and my_job.sge are both in your current working directory, submit your job to the batch system

qsub my_job.sge

Some notes about this example:

• We are running the script hello.m but we drop the .m in the call to MATLAB. That is, we do -r 'hello' rather than -r hello.m.
• All of the module commands introduced in the Interactive usage section will also work in batch mode. This allows you to select a specific version of MATLAB if you wish.

Easy Way of Running MATLAB Jobs on the Batch Queue Firstly prepare a MATLAB script that contains all the commands for running a MATLAB task. Let us assume that this script is called mymatlabwork.m. Next select the version of MATLAB you wish to use with the module load command, for example:

module load apps/matlab/2016a

Now submit a job that runs this MATLAB script as a batch job. runmatlab mymatlabwork.m

That is all there is to it! The runmatlab command can take a number of parameters to refine the control of your MATLAB batch job, such as the maximum time and memory needs. To get a full listing of these parameters simply type runmatlab on the iceberg command line.

MATLAB Compiler and running free-standing compiled MATLAB programs The MATLAB compiler mcc, which can be used to generate free-standing executables, is installed on iceberg. Such executables can then be run on other computers that do not have MATLAB installed. We strongly recommend you use R2016a or later versions to take advantage of this feature. To compile a MATLAB function or script, for example one called myscript.m, the following steps are required.

module load apps/matlab/2016a  # Load the matlab 2016a module
mcc -m myscript.m              # Compile your program to generate the executable myscript and
                               # also generate a shell script named run_myscript.sh
./run_myscript.sh $MCRROOT     # Finally run your program

If myscript.m is a MATLAB function that requires inputs, these can be supplied on the command line. For example, if the first line of myscript.m reads:

function out = myscript ( a , b , c )

then to run it with 1.0, 2.0, 3.0 as its parameters you will need to type: ./run_myscript.sh $MCRROOT 1.0 2.0 3.0


After a successful compilation and running you can transfer your executable and the runscript to another computer. That computer does not have to have MATLAB installed or licensed on it but it will have to have the MATLAB runtime system installed. This can be done by either downloading the MATLAB runtime environment from Mathworks web site or by copying the installer file from iceberg itself which resides in: /usr/local/packages6/matlab/R2016a/toolbox/compiler/deploy/glnxa64/MCRInstaller.zip

This file can be unzipped in a temporary area; running the setup script that unzipping yields will install the MATLAB runtime environment. Finally, the environment variable $MCRROOT can be set to the directory containing the runtime environment.

Parallel MATLAB on iceberg Currently we recommend the 2015a version of MATLAB for parallel work. The default cluster configuration named local provides a parallel working environment by using the CPUs of the worker node that is running the current MATLAB session. Each iceberg worker node can run multiple users' jobs simultaneously. Therefore, depending on who else is using that node at the time, parallel MATLAB jobs can create contention between jobs and slow them considerably. It is therefore advisable to start parallel MATLAB jobs that will use the local profile from a parallel SGE job. For example, to use the local profile with 5 workers, do the following. Start a parallel OpenMP job with 6 workers:

qsh -pe openmp 6

Run MATLAB in that session and select 5 workers:

matlab
parpool('local', 5)

The above example will use 5 MATLAB workers on a single iceberg node to run a parallel task. To take advantage of multiple iceberg nodes, you will need to make use of a parallel cluster profile named sge. This can be done by issuing a locally provided MATLAB command named iceberg that imports the parallel cluster profile named sge, which can take advantage of the SGE scheduler to run larger parallel jobs. When using the sge profile, MATLAB will be able to submit multiple MATLAB jobs to the SGE scheduler from within MATLAB itself. However, each job will have the default resource requirements unless the following trick is deployed. For example, during your MATLAB session type:

global sge_params
sge_params='-l mem=16G -l h_rt=36:00:00'

to make sure that all the MATLAB batch jobs will use up to 16GBytes of memory and will not be killed unless they exceed 36 hours of run time.

Training

• CiCS run an Introduction to Matlab course
• In November 2015, CiCS hosted a Parallel Computing in MATLAB Masterclass. The materials are available at http://rcg.group.shef.ac.uk/courses/mathworks-parallelmatlab/

Installation notes These notes are primarily for system administrators. Requires the floating license server licserv4.shef.ac.uk to serve the licenses. An install script named installer_input.txt and associated files are downloadable from Mathworks site along with all the required toolbox specific installation files. The following steps are performed to install MATLAB on iceberg.


1. If necessary, update the floating license keys on licserv4.shef.ac.uk to ensure that the licenses are served for the versions to install.
2. Log onto the Mathworks site to download the MATLAB installer package for 64-bit Linux (for R2016a this was called matlab_R2016a_glnxa64.zip).
3. Unzip the installer package in a temporary directory: unzip matlab_R2016a_glnxa64.zip (this will create a few items including files named install and installer_input.txt).
4. Run the installer: ./install
5. Select install choice of Log in to Mathworks Account.
6. Select Download only.
7. Select the offered default Download path (this will be in your home area $HOME/Downloads/MathWorks/...). Note: this is the default download location that is later used by the silent installer. Another option is to move all downloaded files to the same directory where the install script resides.
8. Finally run the installer using our customized installer_input.txt script as input (./install -inputFile installer_input.txt). Installation should finish with exit status 0 if all has worked.

Note: A template installer_input file for 2016a is available in the /usr/local/packages6/matlab directory, named 2016a_installer_input.txt. This will need minor edits to install the next versions in the same way.

MrBayes

MrBayes Version 3.2.6 URL https://sourceforge.net/projects/mrbayes/

MrBayes is a program for the Bayesian estimation of phylogeny.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx command. The latest version of MrBayes (currently 3.2.6) is made available with the command module load apps/gcc/4.4.7/mrbayes

Alternatively, you can load a specific version with module load apps/gcc/4.4.7/mrbayes/3.2.6

This command makes the MrBayes mb binary available to your session.
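MrBayes analyses are often long-running, so you may prefer to run mb from a batch job rather than interactively. The following is a minimal sketch only: analysis.nex is a hypothetical NEXUS file containing a mrbayes command block, and the resource requests are illustrative values.

#!/bin/bash
# Request 8 gigabytes of real and virtual memory (assumed values)
#$ -l mem=8G -l rmem=8G
# Request 24 hours of run time (assumed value)
#$ -l h_rt=24:00:00
#$ -cwd

module load apps/gcc/4.4.7/mrbayes/3.2.6

# analysis.nex is a hypothetical input file; mb will execute the mrbayes block it contains
mb analysis.nex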

Installation notes MrBayes was installed with gcc 4.4.7

tar -xvzf ./mrbayes-3.2.6.tar.gz
ls
cd mrbayes-3.2.6
autoconf
./configure --with-beagle=/usr/local/packages6/libs/gcc/4.4.7/beagle/2.1.2 --prefix=/usr/local/packages6/apps/gcc/4.4.7/mrbayes/3.2.6/
make
make install

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/mrbayes/3.2.6 • The module file is on github.

Netlogo

Netlogo Version 5.3.1 URL https://ccl.northwestern.edu/netlogo/index.shtml

NetLogo is a multi-agent programmable modeling environment. It is used by tens of thousands of students, teachers and researchers worldwide.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx command. The latest version of NetLogo, currently 5.3.1, is made available with the command module load apps/binapps/netlogo/

Alternatively, you can load a specific version with module load apps/binapps/netlogo/5.3.1

This command makes the NetLogo executables available to your session by adding the install directory to your PATH variable. Start the NetLogo Graphical User Interface with the command NetLogo

Batch usage Here is an example batch submission file for a Netlogo model that contains a BehaviourSpace experiment called ParallelExperiment1. Using BehaviourSpace is the easiest way to exploit parallel processors in Netlogo.

#!/bin/bash
# Request 3 Gigabytes of RAM per core
#$ -l mem=3G
#$ -l rmem=3G
# Combine stderr and stdout into one file
#$ -j y
# Make sure that the value below matches the number of threads
#$ -pe openmp 4
# Make sure this matches the number of openmp slots requested
threads=4

module load apps/binapps/netlogo/5.3.1

# Set the filename of the output csv file
# Will be called out_4.csv in this case
output_table=out_$threads.csv

# Experiment and model names
experiment=ParallelExperiment1
model=ParallelPredation.nlogo

echo "$threads threads requested"

# You have to run netlogo from its install directory in order for extensions to work
cd $NETLOGO_DIR
java -Xmx1024m -Dfile.encoding=UTF-8 -cp $NETLOGO_DIR/NetLogo.jar org.nlogo.headless.Main --model $SGE_O_WORKDIR/$model --experiment $experiment --table $SGE_O_WORKDIR/$output_table --threads $threads

Call the above file submit_netlogo.sh and submit it with qsub submit_netlogo.sh. All files required to run this example can be found in the HPC Examples repository. Netlogo uses shared memory parallelism. As such, the maximum number of OpenMP slots you can request on our current system is 16.

Troubleshooting

Problems with X Windows

When you run the NetLogo command, you get the following error

NetLogo Error invoking method.
NetLogo Failed to launch JVM

This could be because you have not requested an X session properly. Ensure that you have logged into the system using the -X flag, e.g.

ssh -X username@iceberg.shef.ac.uk

Also ensure that you have requested a qrshx session rather than a qrsh session.

Java memory issues

You get the following error message

Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

This is the 'Virtual Memory' problem described in the Java section. You need to request more memory for your session. For example, request an interactive session like this

qrshx -l mem=8G -l rmem=8G

Installation notes Download and untar the installer to /usr/local/packages6/apps/binapps/netlogo/netlogo-5.3.1-64

Post installation actions: one NetLogo user required the Dynamic Scheduler extension. It was installed by doing

wget https://github.com/colinsheppard/Dynamic-Scheduler-Extension/archive/v0.2.1.tar.gz
tar -xvzf ./v0.2.1.tar.gz
mv Dynamic-Scheduler-Extension ./dynamic-scheduler
mv ./dynamic-scheduler/ /usr/local/packages6/apps/binapps/netlogo/netlogo-5.3.1-64/app/extensions/

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/binapps/netlogo/5.3.1


Octave

Octave
Versions 4.0.0
URL https://www.gnu.org/software/octave/

GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. Octave is normally used through its interactive command line interface, but it can also be used to write non-interactive programs. The Octave language is quite similar to MATLAB so that most programs are easily portable.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with either the qsh or qrsh commands. The latest version of Octave (currently 4.0.0) is made available with the command module load apps/gcc/5.2/octave

Alternatively, you can load a specific version with module load apps/gcc/5.2/octave/4.0

This adds Octave to your PATH and also loads all of the supporting libraries and compilers required by the system. Start Octave by executing the command octave

If you are using a qsh session, the graphical user interface version will begin. If you are using a qrsh session, you will only be able to use the text-only terminal version.

Batch Usage Here is an example batch submission script that will run an Octave program called foo.m

#!/bin/bash
# Request 5 gigabytes of real memory (rmem)
# and 5 gigabytes of virtual memory (mem)
#$ -l mem=5G -l rmem=5G
# Request 64 hours of run time
#$ -l h_rt=64:00:00

module load apps/gcc/5.2/octave/4.0

octave foo.m

Using Packages (Toolboxes) Octave toolboxes are referred to as packages. To see which ones are installed, use the command ver from within Octave. Unlike MATLAB, Octave does not load all of its packages at startup. It is necessary to load the package before its commands are available to your session. For example, as with MATLAB, the pdist command is part of the statistics package. Unlike MATLAB, pdist is not immediately available to you


octave:1> pdist([1 2 3; 2 3 4; 1 1 1])
warning: the 'pdist' function belongs to the statistics package from Octave Forge which you have installed but not loaded. To load the package, run `pkg load statistics' from the Octave prompt.

Please read `http://www.octave.org/missing.html' to learn how you can contribute missing functionality.
warning: called from __unimplemented__ at line 524 column 5
error: 'pdist' undefined near line 1 column 1

As the error message suggests, you need to load the statistics package

octave:1> pkg load statistics
octave:2> pdist([1 2 3; 2 3 4; 1 1 1])
ans =

   1.7321   2.2361   3.7417

Installation notes These are primarily for administrators of the system. Octave was installed using gcc 5.2 and the following libraries:

• Java 1.8u60
• FLTK 1.3.3
• fftw 3.3.4

• Octave was installed using an SGE batch job. The install script is on github
• The make log is on the system at /usr/local/packages6/apps/gcc/5.2/octave/4.0/make_octave4.0.0.log
• The configure log is on the system at /usr/local/packages6/apps/gcc/5.2/octave/4.0/configure_octave4.0.0.log

For full functionality, Octave requires a large number of additional libraries to be installed. We have currently not installed all of these but will do so should they be required. For information, here is the relevant part of the configure log that describes how Octave was configured

Source directory: .
Installation prefix: /usr/local/packages6/apps/gcc/5.2/octave/4.0
C compiler: gcc -pthread -fopenmp -Wall -W -Wshadow -Wformat -Wpointer-arith -Wmissing-prototypes -Wstrict-prototypes -Wwrite-strings -Wcast-align -Wcast-qual -I/usr/local/packages6/compilers/gcc/5.2.0/include
C++ compiler: g++ -pthread -fopenmp -Wall -W -Wshadow -Wold-style-cast -Wformat -Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual -g -O2
Fortran compiler: gfortran -O
Fortran libraries: -L/usr/local/packages6/compilers/gcc/5.2.0/lib -L/usr/local/packages6/compilers/gcc/5.2.0/lib64 -L/usr/local/packages6/compilers/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0 -L/usr/local/packages6/compilers/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/local/packages6/libs/gcc/5.2/fftw/3.3.4/lib -L/usr/local/packages6/libs/gcc/5.2/fltk/1.3.3/lib -L/usr/local/packages6/compilers/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../.. -lgfortran -lm -lquadmath
Lex libraries:
LIBS: -lutil -lm

AMD CPPFLAGS:
AMD LDFLAGS:
AMD libraries:
ARPACK CPPFLAGS:
ARPACK LDFLAGS:
ARPACK libraries:
BLAS libraries: -lblas
CAMD CPPFLAGS:
CAMD LDFLAGS:
CAMD libraries:
CARBON libraries:
CCOLAMD CPPFLAGS:
CCOLAMD LDFLAGS:
CCOLAMD libraries:
CHOLMOD CPPFLAGS:
CHOLMOD LDFLAGS:
CHOLMOD libraries:
COLAMD CPPFLAGS:
COLAMD LDFLAGS:
COLAMD libraries:
CURL CPPFLAGS:
CURL LDFLAGS:
CURL libraries: -lcurl
CXSPARSE CPPFLAGS:
CXSPARSE LDFLAGS:
CXSPARSE libraries:
DL libraries:
FFTW3 CPPFLAGS:
FFTW3 LDFLAGS:
FFTW3 libraries: -lfftw3_threads -lfftw3
FFTW3F CPPFLAGS:
FFTW3F LDFLAGS:
FFTW3F libraries: -lfftw3f_threads -lfftw3f
FLTK CPPFLAGS: -I/usr/local/packages6/libs/gcc/5.2/fltk/1.3.3/include -I/usr/include/freetype2 -I/usr/local/packages6/compilers/gcc/5.2.0/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_THREAD_SAFE -D_REENTRANT
FLTK LDFLAGS: -L/usr/local/packages6/libs/gcc/5.2/fltk/1.3.3/lib -Wl,-rpath,/usr/local/packages6/libs/gcc/5.2/fltk/1.3.3/lib -L/usr/local/packages6/compilers/gcc/5.2.0/lib -L/usr/local/packages6/compilers/gcc/5.2.0/lib64 -lfltk_gl -lGLU -lGL -lfltk -lXcursor -lXfixes -lXext -lXft -lfontconfig -lXinerama -lpthread -ldl -lm -lX11
FLTK libraries:
fontconfig CPPFLAGS:
fontconfig libraries: -lfontconfig
FreeType2 CPPFLAGS: -I/usr/include/freetype2
FreeType2 libraries: -lfreetype
GLPK CPPFLAGS:
GLPK LDFLAGS:
GLPK libraries:
HDF5 CPPFLAGS:
HDF5 LDFLAGS:
HDF5 libraries: -lhdf5
Java home: /usr/local/packages6/apps/binapps/java/jre1.8.0_60/
Java JVM path: /usr/local/packages6/apps/binapps/java/jre1.8.0_60/lib/amd64/server
Java CPPFLAGS: -I/usr/local/packages6/apps/binapps/java/jre1.8.0_60//include -I/usr/local/packages6/apps/binapps/java/jre1.8.0_60//include/linux
Java libraries:
LAPACK libraries: -llapack
LLVM CPPFLAGS:
LLVM LDFLAGS:
LLVM libraries:
Magick++ CPPFLAGS:
Magick++ LDFLAGS:
Magick++ libraries:
OPENGL libraries: -lfontconfig -lGL -lGLU
OSMesa CPPFLAGS:
OSMesa LDFLAGS:
OSMesa libraries:
PCRE CPPFLAGS:
PCRE libraries: -lpcre
PortAudio CPPFLAGS:
PortAudio LDFLAGS:
PortAudio libraries:
PTHREAD flags: -pthread
PTHREAD libraries:
QHULL CPPFLAGS:
QHULL LDFLAGS:
QHULL libraries:
QRUPDATE CPPFLAGS:
QRUPDATE LDFLAGS:
QRUPDATE libraries:
Qt CPPFLAGS: -I/usr/include/QtCore -I/usr/include/QtGui -I/usr/include/QtNetwork -I/usr/include/QtOpenGL
Qt LDFLAGS:
Qt libraries: -lQtNetwork -lQtOpenGL -lQtGui -lQtCore
READLINE libraries: -lreadline
Sndfile CPPFLAGS:
Sndfile LDFLAGS:
Sndfile libraries:
TERM libraries: -lncurses
UMFPACK CPPFLAGS:
UMFPACK LDFLAGS:
UMFPACK libraries:
X11 include flags:
X11 libraries: -lX11
Z CPPFLAGS:
Z LDFLAGS:
Z libraries: -lz

Default pager: less
gnuplot: gnuplot

Build Octave GUI: yes
JIT compiler for loops: no
Build Java interface: no
Do internal array bounds checking: no
Build static libraries: no
Build shared libraries: yes
Dynamic Linking: yes (dlopen)
Include support for GNU readline: yes
64-bit array dims and indexing: no
OpenMP SMP multithreading: yes
Build cross tools: no

configure: WARNING: I didn't find gperf, but it's only a problem if you need to reconstruct oct-gperf.h
configure: WARNING: I didn't find icotool, but it's only a problem if you need to reconstruct octave-logo.ico, which is the case if you're building from VCS sources.
configure: WARNING: Qhull library not found. This will result in loss of functionality of some geometry functions.
configure: WARNING: GLPK library not found. The glpk function for solving linear programs will be disabled.
configure: WARNING: gl2ps library not found. OpenGL printing is disabled.
configure: WARNING: OSMesa library not found. Offscreen rendering with OpenGL will be disabled.
configure: WARNING: qrupdate not found. The QR & Cholesky updating functions will be slow.
configure: WARNING: AMD library not found. This will result in some lack of functionality for sparse matrices.
configure: WARNING: CAMD library not found. This will result in some lack of functionality for sparse matrices.
configure: WARNING: COLAMD library not found. This will result in some lack of functionality for sparse matrices.
configure: WARNING: CCOLAMD library not found. This will result in some lack of functionality for sparse matrices.
configure: WARNING: CHOLMOD library not found. This will result in some lack of functionality for sparse matrices.
configure: WARNING: CXSparse library not found. This will result in some lack of functionality for sparse matrices.
configure: WARNING: UMFPACK not found. This will result in some lack of functionality for sparse matrices.
configure: WARNING: ARPACK not found. The eigs function will be disabled.
configure: WARNING: Include file not found. Octave will not be able to call Java methods.
configure: WARNING: Qscintilla library not found -- disabling built-in GUI editor

• Some commonly-used packages were additionally installed from Octave Forge using the following commands from within Octave

pkg install -global -forge io
pkg install -global -forge statistics
pkg install -global -forge mapping
pkg install -global -forge image
pkg install -global -forge struct
pkg install -global -forge optim

Module File The module file is octave_4.0

OpenFOAM


OpenFOAM
Version 3.0.0
Dependencies gcc/5.2, mpi/gcc/openmpi/1.10.0, scotch/6.0.4
URL http://www.openfoam.org
Documentation http://www.openfoam.org/docs/

OpenFOAM is free, open source software for computational fluid dynamics (CFD).

Usage The latest version of OpenFOAM can be activated using the module file:

module load apps/gcc/5.2/openfoam

Alternatively, you can load a specific version of OpenFOAM:

module load apps/gcc/5.2/openfoam/2.4.0
module load apps/gcc/5.2/openfoam/3.0.0

This has the same effect as sourcing the openfoam bashrc file, so that should not be needed.
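As an illustration of how the module might be used in a batch job, the sketch below runs one of the standard OpenFOAM solvers on a prepared case directory. The case directory name, the choice of solver (icoFoam) and the resource requests are assumptions for illustration; substitute your own case and solver.

#!/bin/bash
# Request 4 gigabytes of real and virtual memory (assumed values)
#$ -l mem=4G -l rmem=4G
#$ -cwd

module load apps/gcc/5.2/openfoam/3.0.0

# myCase is a hypothetical OpenFOAM case directory containing 0/, constant/ and system/
cd myCase
blockMesh
icoFoam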

Installation notes OpenFOAM was compiled using the install_openfoam.sh script; the module file is 3.0.0.

OpenSim

OpenSim
Support Level bronze
Dependencies None
URL https://simtk.org/home/opensim
Version 3.3

OpenSim is a freely available, user extensible software system that lets users develop models of musculoskeletal structures and create dynamic simulations of movement.

Usage The latest version of OpenSim can be loaded with module load apps/gcc/4.8.2/opensim

Installation Notes These are primarily for administrators of the system. Built using:

cmake /home/cs1sjm/Downloads/OpenSim33-source/ -DCMAKE_INSTALL_PREFIX=/usr/local/packages6/apps/gcc/4.8.2/opensim/3.3/ -DSIMBODY_HOME=/usr/local/packages6/libs/gcc/4.8.2/simbody/3.5.3/ -DOPENSIM_STANDARD_11=ON
make -j 8
make install


orca

orca Versions 3.0.3 URL https://orcaforum.cec.mpg.de/

ORCA is a flexible, efficient and easy-to-use general purpose tool for quantum chemistry with specific emphasis on spectroscopic properties of open-shell molecules. It features a wide variety of standard quantum chemical methods ranging from semiempirical methods to DFT to single- and multireference correlated ab initio methods. It can also treat environmental and relativistic effects.

Making orca available The following module command makes the latest version of orca available to your session module load apps/binapps/orca

Alternatively, you can make a specific version available module load apps/binapps/orca/3.0.3

Example single core job Create a file called orca_serial.inp that contains the following orca commands

#
# My first ORCA calculation :-)
#
# Taken from the Orca manual
# https://orcaforum.cec.mpg.de/OrcaManual.pdf
! HF SVP
* xyz 0 1
C 0 0 0
O 0 0 1.13
*

Create a Sun Grid Engine submission file called submit_serial.sh that looks like this

#!/bin/bash
# Request 4 Gig of virtual memory per process
#$ -l mem=4G
# Request 4 Gig of real memory per process
#$ -l rmem=4G

module load apps/binapps/orca/3.0.3
$ORCAHOME/orca orca_serial.inp

Submit the job to the queue with the command qsub submit_serial.sh

Example parallel job An example Sun Grid Engine submission script is

#!/bin/bash
# Request 4 processes
# Ensure that this matches the number requested in your Orca input file
#$ -pe openmpi-ib 4
# Request 4 Gig of virtual memory per process
#$ -l mem=4G
# Request 4 Gig of real memory per process
#$ -l rmem=4G

module load mpi/gcc/openmpi/1.8.8
module load apps/binapps/orca/3.0.3

ORCAPATH=/usr/local/packages6/apps/binapps/orca/3.0.3/
$ORCAPATH/orca example2_parallel.inp

Register as a user You are encouraged to register as a user of Orca at https://orcaforum.cec.mpg.de/ in order to take advantage of updates, announcements and also of the users forum.

Documentation A comprehensive .pdf manual is available online.

Installation notes These are primarily for system administrators. Orca was a binary install

tar -xjf ./orca_3_0_3_linux_x86-64.tbz
cd orca_3_0_3_linux_x86-64
mkdir -p /usr/local/packages6/apps/binapps/orca/3.0.3
mv * /usr/local/packages6/apps/binapps/orca/3.0.3/

Modulefile The module file is on the system at /usr/local/modulefiles/apps/binapps/orca/3.0.3

The contents of the module file are

#%Module1.0#####################################################################
##
## Orca module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##
proc ModulesHelp { } {
    global orcaversion
    puts stderr " Adds `orca-$orcaversion' to your PATH environment variable"
}

set orcaversion 3.0.3
prepend-path ORCAHOME /usr/local/packages6/apps/binapps/orca/3.0.3/
prepend-path PATH /usr/local/packages6/apps/binapps/orca/3.0.3/

Paraview


Paraview
Version 4.3
Support Level extras
Dependencies openmpi (1.8)
URL http://paraview.org/
Documentation http://www.paraview.org/documentation/

Paraview is a parallel visualisation tool.

Using Paraview on iceberg This guide describes how to use Paraview from iceberg. Paraview is a parallel visualisation tool designed for use on clusters like iceberg. Because iceberg is designed primarily as a headless computational cluster, Paraview has not been installed on iceberg in such a way that you can load the GUI remotely on iceberg [1]. Paraview therefore runs on iceberg in client + server mode, with the server running on iceberg and the client on your local machine.

Configuring the Client To use Paraview on iceberg, first download and install Paraview 4.3 [2]. Once you have installed Paraview locally (the client) you need to configure the connection to iceberg. In Paraview go to File > Connect, then click Add Server, name the connection iceberg, and select 'Client / Server (reverse connection)' for the Server Type; the port should retain the default value of 11111. Then click Configure; on the next screen leave the connection as manual and click Save. Once you are back at the original Connect menu, click Connect to start listening for the connection from iceberg.

Starting the Server Once you have configured the local Paraview client, log in to iceberg from the client machine via ssh [3] and run qsub-paraview. This will submit a job to the scheduler queue for 16 processes with 4 GB of RAM each. This is designed to be used for large visualisation tasks; smaller jobs can be requested by passing standard qsub options to qsub-paraview, e.g. qsub-paraview -pe openmpi-ib 1 will only request one process. Assuming you still have the client listening for connections, once the Paraview job starts in the queue it should connect to your client and you should be able to start accessing data stored on iceberg and rendering images.

A Note on Performance When you run Paraview locally it will use the graphics hardware of your local machine for rendering. This is using hardware in your computer to create and display these images. On iceberg there is no such hardware, it is simulated via software. Therefore for small datasets you will probably find you are paying a performance penalty for using Paraview on iceberg. The advantage however, is that the renderer is on the same cluster as your data, so no data transfer is needed, and you have access to very large amounts of memory for visualising very large datasets.

Manually Starting the Server The qsub-paraview command is a wrapper that automatically detects the client IP address from the SSH connection and submits the job. It is possible to customise this behaviour by copying and modifying this script. This would, for instance, allow you to start the Paraview server via MyApps or from a different computer to the one with the client installed. The script used by qsub-paraview also serves as a good example script and can be copied into your home directory by running cp /usr/local/bin/pvserver_submit.sh ~/. This script can then be submitted as normal with qsub. The client IP address can be added manually by replacing echo $SSH_CLIENT | awk '{ print $1}' with the IP address. More information on Paraview client/server can be found here.
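As a concrete illustration of the workflow described above (the commands simply restate what the paragraph says; edit the copied script by hand before submitting):

# Copy the wrapper script into your home directory
cp /usr/local/bin/pvserver_submit.sh ~/
# Edit ~/pvserver_submit.sh and replace the automatic client detection
# (echo $SSH_CLIENT | awk '{ print $1}') with your client's IP address,
# then submit it as a normal batch job
qsub ~/pvserver_submit.sh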

Installation Custom build scripts are available in /usr/local/extras/paraview/build_scripts which can be used to recompile.

[1] It is not possible to install the latest version of the Paraview GUI on iceberg due to the Qt version shipped with Scientific Linux 5.
[2] The client and server versions have to match.
[3] Connecting to Paraview via the automatic method described here is not supported on the MyApps portal.


Perl

Perl Latest Version 5.24.0 URL https://www.perl.org/

Perl 5 is a programming language originally designed for text/report processing but now widely used by the bioinformatics community due to its ability to identify and process patterns in sequences of data. Perl 5 is also used by Linux/UNIX systems administrators as it has borrowed much from shell scripting languages and contains functionality for manipulating files and directory trees.

Usage The default version of Perl on the system is 5.10.1; no module command is required to use it $ perl --version

This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi

However, you may see warnings when trying to install Perl software that this version (or the cpan package management tool that comes with it) is too old; in this case you can activate a more recent version using

module load apps/gcc/4.4.7/perl/5.24.0

Several Perl modules have been upgraded/installed for this instance of Perl:

• CPAN, the original Perl package management tool, was upgraded to 2.14.
• BioPerl 1.007000 (aka v1.7.0) was installed. BioPerl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications. It has played an integral role in the Human Genome Project. BioPerl has pre-requisites of inc::latest (0.500) and Module::Build (0.4220).
• Term::ReadKey (2.37) and Term::Readline::Gnu (1.34) were installed; these make interactive use of CPAN more pleasant.
• local::lib was installed, which allows users to install additional Perl modules in their home directories (see Installing Perl modules).
• App::cpanminus (1.7042) was installed, which provides the cpanm program. This is considered by some to be a superior package management tool over cpan.

You can confirm that you are using this newer version of Perl (and have access to BioPerl) using

$ perl --version

Summary of my perl5 (revision 5 version 24 subversion 0) configuration:
...

$ perl -MBio::Root::Version -le 'print $Bio::Root::Version::VERSION'

1.007000

Performance: 5.24.0 can be significantly faster than earlier versions of Perl 5 for arithmetic operations.

Installing Perl modules If using a version of Perl activated using module load then you can subsequently easily install Perl modules in your home directory (within what’s called a Perl local::lib). Run the following


eval $(perlvers="$(perl -e 'print $^V')"; perl -I$HOME/.perl$perlvers/lib/perl5 -Mlocal::lib=$HOME/.perl$perlvers)

The next time you try to use cpanm to install a package it should install it into the (hidden) directory .perl5.x.y within your home directory. For example, the following installs the Bio::Tradis Perl module into ~/.perl5.24.0/

module load apps/gcc/4.4.7/perl/5.24.0
eval $(perlvers="$(perl -e 'print $^V')"; perl -I$HOME/.perl$perlvers/lib/perl5 -Mlocal::lib=$HOME/.perl$perlvers)
cpanm Bio::Tradis

If you always want to have your Perl local::lib available then you may want to add the eval ... line above to your ~/.bashrc file.

Installation instructions

Version 5.24.0

This script:

1. Downloads, unpacks, builds (using the system GCC (4.4.7)), tests then installs Perl (in /usr/local/packages6/apps/gcc/4.4.7/perl/5.24.0/)
2. Installs this modulefile as /usr/local/modulefiles/apps/gcc/4.4.7/perl/5.24.0

PerlBrew could have been used to install and manage multiple versions of Perl but it offers relatively few advantages over Environment Modules and the latter are already used system-wide for package management.

Phyluce

Phyluce
Version 1.5.0
Dependencies apps/python/conda
URL https://github.com/faircloth-lab/phyluce
Documentation http://phyluce.readthedocs.io/

Phyluce (phy-loo-chee) is a software package that was initially developed for analyzing data collected from ultraconserved elements in organismal genomes.

Usage Phyluce can be activated using the module file: module load apps/binapps/phyluce/1.5.0

Phyluce makes use of the apps/python/conda module, therefore this module will be loaded by loading Phyluce. As Phyluce is a Python package your default Python interpreter will be changed by loading Phyluce.

Installation notes As root:

$ module load apps/python/conda
$ conda create -p /usr/local/packages6/apps/binapps/conda/phyluce python=2
$ source activate /usr/local/packages6/apps/binapps/conda/phyluce
$ conda install -c https://conda.binstar.org/faircloth-lab phyluce

This installs Phyluce as a conda environment in the /usr/local/packages6/apps/binapps/conda/phyluce folder, which is then loaded by the module file phyluce/1.5.0, which is a modification of the anaconda module files.


Picard

Picard Version 1.129 URL https://github.com/broadinstitute/picard/

A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats. Picard is implemented using the HTSJDK Java library, supporting access to common file formats, such as SAM and VCF, used for high-throughput sequencing data.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx or qrsh command, preferably requesting at least 8 Gigabytes of memory

qrsh -l mem=8G -l rmem=8G

The latest version of Picard (currently 1.129) is made available with the command module load apps/binapps/picard

Alternatively, you can load a specific version with

module load apps/binapps/picard/1.129
module load apps/binapps/picard/1.101

These module commands also change the environment to use Java 1.6 since this is required by Picard 1.129. An environment variable called PICARDHOME is created by the module command that contains the path to the requested version of Picard. Thus, you can run the program with the command

java -jar $PICARDHOME/picard.jar
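A typical Picard invocation supplies a tool name followed by that tool's arguments. The sketch below uses the SortSam tool with hypothetical input and output file names; consult the Picard documentation for the tools and arguments appropriate to your data.

# Sort a BAM file by coordinate (input.bam and sorted.bam are hypothetical file names)
java -jar $PICARDHOME/picard.jar SortSam INPUT=input.bam OUTPUT=sorted.bam SORT_ORDER=coordinate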

Installation notes

Version 1.129 A binary install was used. The binary came from the releases page of the project's github repo

unzip picard-tools-1.129.zip
mkdir -p /usr/local/packages6/apps/binapps/picard
mv ./picard-tools-1.129 /usr/local/packages6/apps/binapps/picard/1.129

Version 1.101 A binary install was used. The binary came from the project's sourceforge site https://sourceforge.net/projects/picard/files/picard-tools/

unzip picard-tools-1.101.zip
mv ./picard-tools-1.101 /usr/local/packages6/apps/binapps/picard/1.101/

Modulefile Version 1.129 The module file is on the system at /usr/local/modulefiles/apps/binapps/picard/1.129 Its contents are


#%Module1.0#####################################################################
##
## Picard 1.129 modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

# This version of Picard needs Java 1.6
module load apps/java/1.6

proc ModulesHelp { } {
    puts stderr "Makes Picard 1.129 available"
}

set version 1.129
set PICARD_DIR /usr/local/packages6/apps/binapps/picard/$version

module-whatis "Makes Picard 1.129 available"

prepend-path PICARDHOME $PICARD_DIR

Version 1.101 The module file is on the system at /usr/local/modulefiles/apps/binapps/picard/1.101 Its contents are

#%Module1.0#####################################################################
##
## Picard 1.101 modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

# This version of Picard needs Java 1.6
module load apps/java/1.6

proc ModulesHelp { } {
    puts stderr "Makes Picard 1.101 available"
}

set version 1.101
set PICARD_DIR /usr/local/packages6/apps/binapps/picard/$version

module-whatis "Makes Picard 1.101 available"

prepend-path PICARDHOME $PICARD_DIR

Plink


Plink
Versions 1.90 beta 3.42
Support Level Bronze
Dependencies None
URL https://www.cog-genomics.org/plink2

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large- scale analyses in a computationally efficient manner.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. The latest version of Plink is made available with the command

module load apps/binapps/plink

Alternatively, you can load a specific version. To access the latest version (build date 20 Sep 2016) module load apps/binapps/plink/1.90b3.42

An older build of Plink 1.9 (15 Jul 2015) is also available module load apps/binapps/plink/1.90b3v

After making a version of Plink available you can then run it using plink on the command line.
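For example, a basic association analysis might look like the sketch below; mydata is a hypothetical PLINK binary fileset (mydata.bed/.bim/.fam) and the output prefix is illustrative only.

module load apps/binapps/plink/1.90b3.42

# Run a basic case/control association analysis on the 'mydata' fileset (hypothetical name)
plink --bfile mydata --assoc --out mydata_assoc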

Installation notes These are primarily for administrators of the system. Both versions of Plink were installed like so

ver=1.90b3.42   # or 1.90b3v
mkdir plink_build
cd plink_build
unzip plink_linux_x86_64.zip
rm plink_linux_x86_64.zip
mkdir -p /usr/local/packages6/apps/binapps/plink/$ver
mv * /usr/local/packages6/apps/binapps/plink/$ver

The modulefiles are at /usr/local/modulefiles/apps/binapps/plink/$ver

#%Module10.2####################################################################
#
## Module file logging
source /usr/local/etc/module_logging.tcl
##
proc ModulesHelp { } {
    global ver
    puts stderr " Makes Plink $ver available to the system."
}

# Plink version (not in the user's environment); the older modulefile sets 1.90b3v instead
set ver 1.90b3.42

module-whatis "sets the necessary Plink $ver paths"

prepend-path PATH /usr/local/packages6/apps/binapps/plink/$ver

povray

povray Version 3.7.0 URL http://www.povray.org/

The Persistence of Vision Raytracer is a high-quality, Free Software tool for creating stunning three-dimensional graphics. The source code is available for those wanting to do their own ports.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. The latest version of povray (currently 3.7) is made available with the command module load apps/gcc/5.2/povray

Alternatively, you can load a specific version with module load apps/gcc/5.2/povray/3.7

This command makes the povray binary available to your session. It also loads version 5.2 of the gcc compiler environment since gcc 5.2 was used to compile povray 3.7. You can now run povray. For example, to confirm the version loaded

povray --version

and to get help

povray --help
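As an illustration, a scene file can be rendered from the command line as in the sketch below; scene.pov is a hypothetical input file and the image size options are arbitrary.

# Render scene.pov to a 1024x768 PNG with anti-aliasing enabled
povray +Iscene.pov +Oscene.png +W1024 +H768 +A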

Documentation Once you have made povray available to the system using the module command above, you can read the man pages by typing man povray

Installation notes povray 3.7.0 was installed using gcc 5.2 using the following script • install_povray-3.7.sh

Testing The test suite was executed

make check

All tests passed.

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/povray/3.7 • The module file is on github.


Python

Python Support Level Gold Dependencies None URL https://python.org Version All

This page documents the “miniconda” installation on iceberg. This is the recommended way of using Python on iceberg, and the best way to be able to configure custom sets of packages for your use. “conda”, a Python package manager, allows you to create “environments”, which are sets of packages that you can modify. It does this by installing them in your home area. This page will guide you through loading conda and then creating and modifying environments so you can install and use whatever Python packages you need.

Using conda Python After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. Conda Python can be loaded with: module load apps/python/conda

The root conda environment (the default) provides Python 3 and no extra modules. It is automatically updated and is not recommended for general use, just as a base for your own environments. There is also a python2 environment, which is the same but with a Python 2 installation.

Quickly Loading Anaconda Environments There are a small number of environments provided for everyone to use: the default root and python2 environments as well as various versions of Anaconda for Python 3 and Python 2. The anaconda environments can be loaded through provided module files:

module load apps/python/anaconda2-2.4.0
module load apps/python/anaconda3-2.4.0
module load apps/python/anaconda3-2.5.0

Where anaconda2 represents Python 2 installations and anaconda3 represents Python 3 installations. These commands will also load the apps/python/conda module and then activate the anaconda environment specified.

Note: Anaconda 2.5.0 is compiled with Intel MKL libraries which should result in higher numerical performance.

Using conda Environments Once the conda module is loaded you have to load or create the desired conda environments. For the documentation on conda environments see the conda documentation. You can load a conda environment with:

source activate python2

where python2 is the name of the environment, and unload one with:

source deactivate


which will return you to the root environment. It is possible to list all the available environments with:

conda env list

A set of anaconda environments are provided system-wide; these are installed with the anaconda version number in the environment name and are never modified. They therefore provide a static base for derivative environments or for using directly.

Creating an Environment Every user can create their own environments. Packages shared with the system-wide environments will not be reinstalled or copied to your file store; they will be symlinked, which reduces the space you need in your /home directory to install many different Python environments. To create a clean environment with just Python 2 and numpy you can run:

conda create -n mynumpy python=2.7 numpy

This will download the latest release of Python 2.7 and numpy, and create an environment named mynumpy. Any version of Python or list of packages can be provided:

conda create -n myscience python=3.5 numpy=1.8.1 scipy

If you wish to modify an existing environment, such as one of the anaconda installations, you can clone that environment:

conda create --clone anaconda3-2.3.0 -n myexperiment

This will create an environment called myexperiment which has all the anaconda 2.3.0 packages installed with Python 3.

Installing Packages Inside an Environment Once you have created your own environment you can install additional packages or different versions of packages into it. There are two methods for doing this, conda and pip; if a package is available through conda it is strongly recommended that you use conda to install packages. You can search for packages using conda:

conda search pandas

then install the package using:

conda install pandas

If you are not in your environment you will get a permission denied error when trying to install packages; if this happens, create or activate an environment you own. If a package is not available through conda you can search for and install it using pip:

pip search colormath

pip install colormath

Previous Anaconda Installation There is a legacy anaconda installation which is accessible through the binapps/anacondapython/2.3 module. This module should be considered deprecated and should no longer be used.


Using Python with MPI There is an experimental set of packages for conda that have been compiled by the iceberg team, which allow you to use an MPI stack entirely managed by conda. This allows you to easily create complex environments and use MPI without worrying about other modules or system libraries. To get access to these packages you need to run the following command to add the repo to your conda config:

conda config --add channels file:///usr/local/packages6/conda/conda-bld/

You should then be able to install the packages with the openmpi feature, which currently include openmpi, hdf5, mpi4py and h5py:

conda create -n mpi python=3.5 openmpi mpi4py

Currently, there are Python 2.7, 3.4 and 3.5 versions of mpi4py and h5py compiled in this repository. The build scripts for these packages can be found in this GitHub repository.
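As a quick, minimal check that the conda-provided MPI stack works (assuming you created the mpi environment as above), activate it and query mpi4py from a single process:

source activate mpi
python -c "from mpi4py import MPI; print(MPI.get_vendor())"

A parallel run would additionally use the mpirun launcher supplied by the conda openmpi package.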

Installation Notes These are primarily for administrators of the system. The conda package manager is installed in /usr/share/packages6/conda; it was installed using the miniconda installer. The two “root” environments root and python2 can be updated using the update script located in /usr/local/packages6/conda/_envronments/conda-autoupdate.sh. This should be run regularly to keep these base environments up to date with Python, and more importantly with the conda package manager itself.

Installing a New Version of Anaconda Perform the following:

$ cd /usr/local/packages6/conda/_envronments/
$ cp anaconda2-2.3.0.yml anaconda2-x.y.z.yml

then edit that file, modifying the environment name and the anaconda version under requirements, then run:

$ conda env create -f anaconda2-x.y.z.yml

then repeat for the Python 3 installation. Then copy the modulefile for the previous version of anaconda to the new version and update the name of the environment. You will also need to append the new module to the conflict line in apps/python/.conda-environments.tcl.

R

R Dependencies BLAS URL http://www.r-project.org/ Documentation http://www.r-project.org/

R is a statistical computing language.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command. The latest version of R can be loaded with


module load apps/R

Alternatively, you can load a specific version of R using one of the following

module load apps/R/3.3.1
module load apps/R/3.3.0
module load apps/R/3.2.4
module load apps/R/3.2.3
module load apps/R/3.2.2
module load apps/R/3.2.1
module load apps/R/3.2.0
module load apps/R/3.1.2

R can then be run with $ R

Serial (one CPU) Batch usage Here, we assume that you wish to run the program my_code.R on the system. With batch usage it is recommended to load a specific version of R, for example module load apps/R/3.2.4, to ensure the expected output is achieved. First, you need to write a batch submission file. We assume you'll call this my_job.sge

#!/bin/bash
#$ -cwd           # Run job from current directory
#$ -l rmem=4G     # Request 4 gigabytes of memory

module load apps/R/3.3.1    # Recommended to load a specific version of R

R CMD BATCH my_code.R my_code.R.o$JOB_ID

Note that R must be called with both the CMD and BATCH options which tell it to run an R program, in this case my_code.R. If you do not do this, R will attempt to open an interactive prompt. The final argument, my_code.R.o$JOB_ID, tells R to send output to a file with this name. Since $JOB_ID will always be unique, this ensures that all of your output files are unique. Without this argument R sends all output to a file called my_code.Rout. Ensuring that my_code.R and my_job.sge are both in your current working directory, submit your job to the batch system

qsub my_job.sge

Replace my_job.sge with the name of your submission script.

Graphical output By default, graphical output from batch jobs is sent to a file called Rplots.pdf

Installing additional packages As you will not have permissions to install packages to the default folder, additional R packages can be installed to your home folder ~/. To create the appropriate folder, install your first package in R in interactive mode. Load an interactive R session as described above, and install a package with install.packages()

You will be prompted to create a personal package library. Choose yes. The package will download and install from a CRAN mirror (you may be asked to select a nearby mirror, which you can do simply by entering the number of your preferred mirror).


Once the chosen package has been installed, additional packages can be installed either in the same way, or by creating a .R script. An example script might look like

install.packages("dplyr")
install.packages("devtools")

Call this using source(). For example, if your script is called packages.R and is stored in your home folder, source this from an interactive R session with

source("~/packages.R")

These additional packages will be installed without prompting to your personal package library. To check your packages are up to date, and update them if necessary, run the following line from an R interactive session

update.packages(lib.loc="~/R/x86_64-unknown-linux-gnu-library/3.3/")

The folder name after ~/R/ will likely change, but this can be completed with tab autocompletion from the R session. Ensure the lib.loc folder is specified, or R will attempt to update the wrong library.

R Packages that require external libraries Some R packages require external libraries to be installed before you can install and use them. Since there are so many, we only install those libraries that have been explicitly requested by users of the system. The associated R packages are not included in the system install of R, so you will need to install them yourself to your home directory following the instructions linked to below. • geos This is the library required for the rgeos package. • JAGS This is the library required for the rjags and runjags packages

Using the Rmath library in C Programs The Rmath library allows you to access some of R’s functionality from a C program. For example, consider the C-program below

#include <stdio.h>
#define MATHLIB_STANDALONE
#include "Rmath.h"

int main(void){
    double shape1, shape2, prob;

    shape1 = 1.0;
    shape2 = 2.0;
    prob = 0.5;

    printf("Critical value is %lf\n", qbeta(prob, shape1, shape2, 1, 0));
    return 0;
}

This makes use of R’s qbeta function. You can compile and run this on a worker node as follows. Start a session on a worker node with qrsh or qsh and load the R module

module load apps/R/3.3.0

Assuming the program is called test_rmath.c, compile with

gcc test_rmath.c -lRmath -lm -o test_rmath
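Assuming the module also provides the necessary include and library paths for the compiler, you can then run the resulting binary. As a sanity check on the output: qbeta(0.5, 1, 2) is the median of a Beta(1, 2) distribution, whose CDF is F(x) = 1 - (1 - x)^2, so the printed value should be 1 - 1/sqrt(2), i.e. approximately 0.292893:

./test_rmath
Critical value is 0.292893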


For full details about the functions made available by the Rmath library, see section 6.7 of the document Writing R extensions

Accelerated version of R There is an experimental, accelerated version of R installed on Iceberg that makes use of the Intel compilers and the Intel MKL. See R (Intel Build) for details.

Installation Notes These notes are primarily for administrators of the system.

Version 3.3.1
• What’s new in R version 3.3.1
• This was a scripted install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled. It was run in batch mode. This build required several external modules including xz utils, curl, bzip2 and zlib.
• install_R_3.3.1.sh Downloads, compiles, tests and installs R 3.3.1 and the Rmath library.
• R 3.3.1 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.3.1
• Install log-files, including the output of the make check tests, are available on the system at /usr/local/packages6/R/3.3.1/install_logs

Version 3.3.0
• What’s new in R version 3.3.0
• This was a scripted install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled. You will need a large memory qrshx session in order to successfully run the build script. I used qrshx -l rmem=8G -l mem=8G. This build required several external modules including xz utils, curl, bzip2 and zlib.
• install_R_3.3.0.sh Downloads, compiles, tests and installs R 3.3.0 and the Rmath library.
• R 3.3.0 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.3.0
• Install log-files, including the output of the make check tests, are available on the system at /usr/local/packages6/R/3.3.0/install_logs

Version 3.2.4
• What’s new in R version 3.2.4
• This was a scripted install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled. You will need a large memory qrshx session in order to successfully run the build script. I used qrshx -l rmem=8G -l mem=8G. This build made use of new versions of xz utils and curl.
• install_R_3.2.4.sh Downloads, compiles, tests and installs R 3.2.4 and the Rmath library.
• R 3.2.4 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.2.4
• Install log-files, including the output of the make check tests, are available on the system at /usr/local/packages6/R/3.2.4/install_logs

Version 3.2.3
• What’s new in R version 3.2.3
• This was a scripted install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled. You will need a large memory qrsh session in order to successfully run the build script. I used qrsh -l rmem=8G -l mem=16G.
• install_R_3.2.3.sh Downloads, compiles, tests and installs R 3.2.3 and the Rmath library.
• R 3.2.3 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.2.3
• Install log-files, including the output of the make check tests, are available on the system at /usr/local/packages6/R/3.2.3/install_logs

Version 3.2.2
• What’s new in R version 3.2.2
• This was a scripted install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled. You will need a large memory qrsh session in order to successfully run the build script. I used qrsh -l rmem=8G -l mem=16G.
• install_R_3.2.2.sh Downloads, compiles and installs R 3.2.2 and the Rmath library.
• R 3.2.2 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.2.2
• Install log-files were manually copied to /usr/local/packages6/R/3.2.2/install_logs on the system. This step should be included in the next version of the install script.

Version 3.2.1
• This was a manual install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled.
• Install notes
• R 3.2.1 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.2.1

Older versions Install notes for older versions of R are not available.

relion

relion Versions 1.4 URL http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Main_Page

RELION is a software package that performs an empirical Bayesian approach to (cryo-EM) structure determination by single-particle analysis. Note that RELION is distributed under a GPL license.

Making RELION available The following module command makes the latest version of RELION available to your session

module load apps/gcc/4.4.7/relion

Alternatively, you can make a specific version available module load apps/gcc/4.4.7/relion/1.4

Installation notes These are primarily for system administrators. Install instructions: http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
• RELION was installed using the gcc 4.4.7 compiler and Openmpi 1.8.8
• install_relion.sh


• Note - the environment variable RELION_QSUB_TEMPLATE points to an SGE qsub template, which needs customizing to work with our environment

Modulefile The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/relion/1.4 The contents of the module file is

#%Module1.0#####################################################################
##
## relion module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##
proc ModulesHelp { } {
    global relionversion
    puts stderr " Setups `relion-$relionversion' environment variables"
}

set relionversion 1.4

module load apps/gcc/4.4.7/ctffind/3.140609
module load apps/binapps/resmap/1.1.4
module load mpi/gcc/openmpi/1.8.8

prepend-path PATH /usr/local/packages6/apps/gcc/4.4.7/relion/1.4/bin
prepend-path LD_LIBRARY_PATH /usr/local/packages6/apps/gcc/4.4.7/relion/1.4/lib
setenv RELION_QSUB_TEMPLATE /usr/local/packages6/apps/gcc/4.4.7/relion/1.4/bin/relion_qsub.csh
setenv RELION_CTFFIND_EXECUTABLE ctffind3_mp.exe
setenv RELION_RESMAP_EXECUTABLE /usr/local/packages6/apps/binapps/resmap/1.1.4

ResMap

ResMap Versions 1.1.4 URL http://resmap.sourceforge.net/

ResMap (Resolution Map) is a software package for computing the local resolution of 3D density maps studied in structural biology, primarily electron cryo-microscopy.

Making ResMap available The following module command makes the latest version of ResMap available to your session

module load apps/binapps/resmap

Alternatively, you can make a specific version available module load apps/binapps/resmap/1.1.4


Installation notes These are primarily for system administrators.
• ResMap was installed as a precompiled binary from https://sourceforge.net/projects/resmap/files/ResMap-1.1.4-linux64/download

Modulefile The module file is on the system at /usr/local/modulefiles/apps/binapps/resmap/1.1.4 The contents of the module file is

#%Module1.0#####################################################################
##
## ResMap module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##
proc ModulesHelp { } {
    global resmapversion
    puts stderr " Adds `ResMap-$resmapversion' to your PATH environment variable"
}

set resmapversion 1.1.4

prepend-path PATH /usr/local/packages6/apps/binapps/resmap/$resmapversion/

Samtools

Samtools Versions 1.3.1 1.2 URL http://samtools.sourceforge.net/

SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh command. Next, load a specific version of Samtools with one of the following:

module load apps/gcc/6.2/samtools/1.3.1
module load apps/gcc/5.2/samtools/1.2

This command makes the Samtools binary directory available to your session.
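As a brief illustration of typical usage (the filenames here are hypothetical), a SAM file can be converted to BAM, sorted and indexed with

samtools view -bS alignments.sam > alignments.bam
samtools sort -o alignments.sorted.bam alignments.bam
samtools index alignments.sorted.bam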

Documentation Once you have made Samtools available to the system using the module command above, you can read the man pages by typing man samtools


Installation notes This section is primarily for system administrators.

Version 1.3.1 This install script:
1. Built Samtools plus the bundled HTSlib, HTSlib utilities such as bgzip plus various useful plugins. Compiled using GCC 6.2.
2. Ran all tests using make tests; a summary of the results is shown below; for full results see /usr/local/packages6/apps/gcc/6.2/samtools/1.3.1/tests.log

Number of tests:
    total            .. 423
    passed           .. 403
    failed           .. 0
    expected failure .. 20
    unexpected pass  .. 0

test/merge/test_bam_translate test/merge/test_bam_translate.tmp
test/merge/test_rtrans_build
test/merge/test_trans_tbl_init
cd test/mpileup && ./regression.sh mpileup.reg

=== Testing mpileup.reg regressions ===

Expected passes:     124
Unexpected passes:   0
Expected failures:   0
Unexpected failures: 0
=> PASS
cd test/mpileup && ./regression.sh depth.reg

=== Testing depth.reg regressions ===

Expected passes:     14
Unexpected passes:   0
Expected failures:   0
Unexpected failures: 0
=> PASS

3. Installed Samtools to /usr/local/packages6/apps/gcc/6.2/samtools/1.3.1

Next, this modulefile was installed as /usr/local/modulefiles/apps/gcc/6.2/samtools/1.3.1

Version 1.2 Installed using gcc 5.2

module load compilers/gcc/5.2
tar -xvjf ./samtools-1.2.tar.bz2
cd samtools-1.2
mkdir -p /usr/local/packages6/apps/gcc/5.2/samtools/1.2
make prefix=/usr/local/packages6/apps/gcc/5.2/samtools/1.2
make prefix=/usr/local/packages6/apps/gcc/5.2/samtools/1.2 install
# tabix and bgzip are not installed by the above procedure.
# We can get them by doing the following
cd htslib-1.2.1/
make
mv ./tabix /usr/local/packages6/apps/gcc/5.2/samtools/1.2/bin/
mv ./bgzip /usr/local/packages6/apps/gcc/5.2/samtools/1.2/bin/

The test suite was run with make test 2>&1 | tee make_tests.log

The summary of the test output was

Test output:
Number of tests:
    total            .. 368
    passed           .. 336
    failed           .. 0
    expected failure .. 32
    unexpected pass  .. 0
test/merge/test_bam_translate test/merge/test_bam_translate.tmp
test/merge/test_pretty_header
test/merge/test_rtrans_build
test/merge/test_trans_tbl_init
cd test/mpileup && ./regression.sh
Samtools mpileup tests:

EXPECTED FAIL: Task failed, but expected to fail; when running
$samtools mpileup -x -d 8500 -B -f mpileup.ref.fa deep.sam|awk '{print $4}'

Expected passes:     123
Unexpected passes:   0
Expected failures:   1
Unexpected failures: 0

The full log is on the system at /usr/local/packages6/apps/gcc/5.2/samtools/1.2/make_tests.log This modulefile was installed as /usr/local/modulefiles/apps/gcc/5.2/samtools/1.2

spark

Spark Version 2.0 URL http://spark.apache.org/

Apache Spark is a fast and general engine for large-scale data processing.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrsh or qrshx command. To make Spark available, execute the following module command

module load apps/gcc/4.4.7/spark/2.0

Installation notes Spark was built using the system gcc 4.4.7


tar -xvzf ./spark-2.0.0.tgz
cd spark-2.0.0
build/mvn -DskipTests clean package
mkdir /usr/local/packages6/apps/gcc/4.4.7/spark
mv spark-2.0.0 /usr/local/packages6/apps/gcc/4.4.7/spark/

Modulefile Version 2.0
• The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/spark/2.0

Its contents are

#%Module1.0#####################################################################
##
## Spark module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

# Use only one core. User can override this if they want
setenv MASTER local\[1\]
prepend-path PATH /usr/local/packages6/apps/gcc/4.4.7/spark/spark-2.0.0/bin
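Because the modulefile sets the MASTER environment variable to local[1], Spark will use a single core by default. As a minimal sketch of using more cores (the core count here is illustrative and should match what you requested from the scheduler), you can override the master URL explicitly when starting one of the standard Spark front-ends, which the module adds to your PATH:

export MASTER='local[4]'
spark-shell --master "$MASTER"

--master is a standard option of spark-shell and spark-submit.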

SU2

SU2 Version 4.1.0 Dependencies gcc/4.4.7 mpi/gcc/openmpi/1.10.1 URL http://su2.stanford.edu/ Documentation https://github.com/su2code/SU2/wiki

The SU2 suite is an open-source collection of C++ based software tools for performing Partial Differential Equation (PDE) analysis and solving PDE constrained optimization problems.

Usage SU2 can be activated using the module file: module load apps/gcc/4.4.7/su2
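As a hedged sketch of running a case once the module is loaded (my_case.cfg is a hypothetical SU2 configuration file), the main solver is invoked as

SU2_CFD my_case.cfg

Since SU2 was built against OpenMPI, parallel runs are typically launched through mpirun, e.g. mpirun -np 4 SU2_CFD my_case.cfg, provided the matching MPI module is available in your session.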

Installation notes SU2 was compiled using the install_su2.sh script. SU2 also has its own version of CGNS which was compiled with the script install_cgns.sh. The module file is 4.1.0.

Tophat


Tophat Versions 2.1.0 URL https://ccb.jhu.edu/software/tophat/index.shtml

TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with either the qsh or qrsh command. The latest version of tophat (currently 2.1.0) is made available with the command:

module load apps/gcc/4.8.2/tophat

Alternatively, you can load a specific version with module load apps/gcc/4.8.2/tophat/2.1.0

This command makes the tophat binary available to your session.

Installation notes Tophat 2.1.0 was installed using gcc 4.8.2. Installs were attempted using gcc 5.2 and gcc 4.4.7 but both failed (see this issue on github). This install has dependencies on the following
• GNU Compiler Collection (gcc) 4.8.2
• Boost C++ Library 1.58
• Bowtie2 (not needed at install time but is needed at runtime)

Install details

module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/boost/1.58
mkdir -p /usr/local/packages6/apps/gcc/4.8.2/tophat/2.1.0
tar -xvzf ./tophat-2.1.0.tar.gz
cd tophat-2.1.0
./configure --with-boost=/usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0/ --prefix=/usr/local/packages6/apps/gcc/4.8.2/tophat/2.1.0

Configuration results

-- tophat 2.1.0 Configuration Results --
C++ compiler: g++ -Wall -Wno-strict-aliasing -g -gdwarf-2 -Wuninitialized -O3 -DNDEBUG -I./samtools-0.1.18 -pthread -I/usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0//include -I./SeqAn-1.3
Linker flags: -L./samtools-0.1.18 -L/usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0//lib
BOOST libraries: -lboost_thread -lboost_system
GCC version: gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
Host System type: x86_64-unknown-linux-gnu
Install prefix: /usr/local/packages6/apps/gcc/4.8.2/tophat/2.1.0
Install eprefix: ${prefix}

Built with


make
make install

Testing A test script was executed based on the documentation on the tophat website (retrieved 29th October 2015). It only proves that the code can run without error, not that the result is correct. • tophat_test.sh

Modulefile • The module file is on the system at /usr/local/modulefiles/apps/gcc/4.8.2/tophat/2.1.0 • The module file is on github.

VCFtools

VCFtools Version 0.1.14 Dependencies gcc/5.2 URL https://vcftools.github.io/ Documentation https://vcftools.github.io/examples.html

VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.

Usage VCFtools can be activated using the module file: module load apps/gcc/5.2/vcftools/0.1.14

This will add both the binary files to your $PATH and the Perl module to $PERL5LIB.
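As a small example (my_data.vcf is a hypothetical input file), allele frequencies can be computed from a VCF file with

vcftools --vcf my_data.vcf --freq --out my_data

--vcf, --freq and --out are standard VCFtools options; see the documentation linked above for the full set.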

Installation notes VCFtools was compiled using the install_vcftools.sh script; the module file is 0.1.14.

Velvet

Velvet Version 1.2.10 URL https://www.ebi.ac.uk/~zerbino/velvet/

Sequence assembler for very short reads.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx or qrsh command. To add the velvet binaries to the system PATH, execute the following command


module load apps/gcc/4.4.7/velvet/1.2.10

This makes two programs available:
• velvetg - de Bruijn graph construction, error removal and repeat resolution
• velveth - simple hashing program
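velvetg is normally run on a directory that has first been prepared with velveth. As a minimal sketch (reads.fastq is a hypothetical input file and 31 an illustrative hash length), the assembly directory used in the example below could be prepared with

velveth /fastdata/foo1bar/velvet/assembly_31 31 -fastq reads.fastq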

Example submission script If the command you want to run is velvetg /fastdata/foo1bar/velvet/assembly_31 -exp_cov auto -cov_cutoff auto, here is an example submission script that requests 60 gigabytes of memory

#!/bin/bash
#$ -l mem=60G
#$ -l rmem=60G

module load apps/gcc/4.4.7/velvet/1.2.10

velvetg /fastdata/foo1bar/velvet/assembly_31 -exp_cov auto -cov_cutoff auto

Put the above into a text file called submit_velvet.sh and submit it to the queue with the command qsub submit_velvet.sh

Velvet Performance Velvet has multicore capabilities but these have not been compiled in our version. This is because there is evidence that the performance is better running in serial than in parallel. See http://www.ehu.eus/ehusfera/hpc/2012/06/20/benchmarking-genetic-assemblers-abyss-vs-velvet/ for details.

Installation notes Velvet was compiled with gcc 4.4.7

tar -xvzf ./velvet_1.2.10.tgz
cd velvet_1.2.10
make
mkdir -p /usr/local/packages6/apps/gcc/4.4.7/velvet/1.2.10
mv * /usr/local/packages6/apps/gcc/4.4.7/velvet/1.2.10

Modulefile The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/velvet/1.2.10 Its contents are

#%Module1.0#####################################################################
##
## velvet 1.2.10 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes velvet 1.2.10 available"
}

set VELVET_DIR /usr/local/packages6/apps/gcc/4.4.7/velvet/1.2.10

module-whatis "Makes velvet 1.2.10 available"

prepend-path PATH $VELVET_DIR


xz utils

xz utils Latest version 5.2.2 URL http://tukaani.org/xz/

XZ Utils is free general-purpose data compression software with a high compression ratio. XZ Utils were written for POSIX-like systems, but also work on some not-so-POSIX systems. XZ Utils are the successor to LZMA Utils. The core of the XZ Utils compression code is based on LZMA SDK, but it has been modified quite a lot to be suitable for XZ Utils. The primary compression algorithm is currently LZMA2, which is used inside the .xz container format. With typical files, XZ Utils create 30 % smaller output than gzip and 15 % smaller output than bzip2. XZ Utils consist of several components:
• liblzma is a compression library with an API similar to that of zlib.
• xz is a command line tool with syntax similar to that of gzip.
• xzdec is a decompression-only tool smaller than the full-featured xz tool.
• A set of shell scripts (xzgrep, xzdiff, etc.) have been adapted from gzip to ease viewing, grepping, and comparing compressed files.
• Emulation of command line tools of LZMA Utils eases transition from LZMA Utils to XZ Utils.

While liblzma has a zlib-like API, liblzma doesn’t include any file I/O functions. A separate I/O library is planned, which would abstract handling of .gz, .bz2, and .xz files with an easy to use API.

Usage There is an old version of xz utils available on the system by default. We can see its version with

xz --version

which gives

xz (XZ Utils) 4.999.9beta
liblzma 4.999.9beta

Version 4.999.9beta of xzutils was released in 2009. To make version 5.2.2 (released in 2015) available, run the following module command

module load apps/gcc/4.4.7/xzutils/5.2.2
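Basic usage is the same as for the system version; for example, to compress a file at maximum compression and then decompress it again (mydata.tar is a hypothetical file of your own)

xz -9 mydata.tar
xz -d mydata.tar.xz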

Documentation Standard man pages are available. The documentation version you get depends on whether or not you've loaded the module.

man xz

Installation notes This section is primarily for administrators of the system. xz utils 5.2.2 was compiled with gcc 4.4.7

tar -xvzf ./xz-5.2.2.tar.gz
cd xz-5.2.2
mkdir -p /usr/local/packages6/apps/gcc/4.4.7/xzutils/5.2.2
./configure --prefix=/usr/local/packages6/apps/gcc/4.4.7/xzutils/5.2.2


make
make install

Testing was performed with make check

Final piece of output was

==================
All 9 tests passed
==================

Module file Modulefile is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/xzutils/5.2.2

#%Module1.0#####################################################################
##
## xzutils 5.2.2 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##
proc ModulesHelp { } {
    puts stderr "Makes xzutils 5.2.2 available"
}

set XZUTILS_DIR /usr/local/packages6/apps/gcc/4.4.7/xzutils/5.2.2

module-whatis "Makes xzutils 5.2.2 available"

prepend-path PATH $XZUTILS_DIR/bin
prepend-path LD_LIBRARY_PATH $XZUTILS_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $XZUTILS_DIR/include
prepend-path CPATH $XZUTILS_DIR/include
prepend-path LIBRARY_PATH $XZUTILS_DIR/lib
prepend-path MANPATH $XZUTILS_DIR/share/man/

Libraries

blcr

blcr Latest version UNKNOWN URL http://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/

Future Technologies Group researchers are developing a hybrid kernel/user implementation of checkpoint/restart. Their goal is to provide a robust, production quality implementation that checkpoints a wide range of applications, without requiring changes to be made to application code. This work focuses on checkpointing parallel applications that communicate through MPI, and on compatibility with the software suite produced by the SciDAC Scalable Systems Software ISIC. This work is broken down into 4 main areas.


Usage BLCR was installed before we started using the module system. As such, it is currently always available to worker nodes. It should be considered experimental. The checkpointing is performed at the kernel level, so any batch code should be checkpointable without modification (it may not work with our MPI environment, although it should cope with SMP codes). To run a code, use

cr_run ./executable

To checkpoint a process with process id PID

cr_checkpoint -f checkpoint.file PID

Use the --term flag if you want to checkpoint and kill the process. To restart the process from a checkpoint

cr_restart checkpoint.file

Using with SGE in batch A checkpoint environment has been set up called BLCR. An example of a checkpointing job would look something like

#!/bin/bash
#$ -l h_rt=168:00:00
#$ -c sx
#$ -ckpt blcr
cr_run ./executable >> output.file

The -c sx option tells SGE to checkpoint if the queue is suspended, or if the execution daemon is killed. You can also specify checkpoints to occur after a given time period. A checkpoint file will be produced before the job is terminated. This file will be called checkpoint.[jobid].[pid]. This file will contain the complete in-memory state of your program at the time that it terminates, so make sure that you have enough disk space to save the file. To resume a checkpointed job, submit a job file which looks like

#!/bin/bash
#$ -l h_rt=168:00:00
#$ -c sx
#$ -ckpt blcr
cr_restart [name of checkpoint file]

Installation notes Installation notes are not available

beagle

beagle Latest version 2.1.2 URL https://github.com/beagle-dev/beagle-lib Location /usr/local/packages6/libs/gcc/4.4.7/beagle/2.1.2/

General purpose library for evaluating the likelihood of sequence evolution on trees


Usage To make this library available, run the following module command

module load libs/gcc/4.4.7/beagle/2.1.2

This populates the environment variables C_INCLUDE_PATH, LD_LIBRARY_PATH and LD_RUN_PATH with the relevant directories.

Installation notes This section is primarily for administrators of the system. Beagle 2.1.2 was compiled with gcc 4.4.7.
• Install script
• Module file

Boost C++ Library

Boost C++ Library Latest version 1.59 URL www.boost.org

Boost provides free, peer-reviewed and portable C++ source libraries.

Usage On Iceberg, different versions of Boost were built using different versions of gcc. We suggest that you use the matching version of gcc to build your code. The latest version of Boost, version 1.59, was built using gcc 5.2. To make both the compiler and Boost library available to the system, execute the following module commands while in a qrsh or qsh session

module load compilers/gcc/5.2
module load libs/gcc/5.2/boost/1.59

Boost version 1.58 was built using gcc 4.8.2. To make both the compiler and Boost library available to the system, execute the following module commands while in a qrsh or qsh session

module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/boost/1.58

Version 1.41 of Boost uses version 4.4.7 of the gcc compiler. Since this is the default version of gcc on the system, you only need to load the module for the library module load libs/gcc/4.4.7/boost/1.41

Build a simple program using Boost Many boost libraries are header-only which makes them particularly simple to compile. The following program reads a sequence of integers from standard input, uses Boost.Lambda to multiply each number by three, and writes them to standard output (taken from http://www.boost.org/doc/libs/1_58_0/more/getting_started/unix-variants.html):

#include <boost/lambda/lambda.hpp>
#include <iostream>
#include <iterator>
#include <algorithm>

int main()
{
    using namespace boost::lambda;
    typedef std::istream_iterator<int> in;

    std::for_each(
        in(std::cin), in(), std::cout << (_1 * 3) << " " );
}

Copy this into a file called example1.cpp and compile with

g++ example1.cpp -o example

Provided you loaded the correct modules given above, the program should compile without error.
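You can check the program behaves as described by piping some integers into it; following the Boost getting-started guide, the input 1 2 3 should produce 3 6 9:

echo 1 2 3 | ./example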

Linking to a Boost library The following program is taken from the official Boost documentation http://www.boost.org/doc/libs/1_58_0/more/getting_started/unix-variants.html

#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main()
{
    std::string line;
    boost::regex pat( "^Subject: (Re: |Aw: )*(.*)" );

    while (std::cin)
    {
        std::getline(std::cin, line);
        boost::smatch matches;
        if (boost::regex_match(line, matches, pat))
            std::cout << matches[2] << std::endl;
    }
}

This program makes use of the Boost.Regex library, which has a separately-compiled binary component we need to link to. Assuming that the above program is called example2.cpp, compile with the following command

g++ example2.cpp -o example2 -lboost_regex

If you get an error message that looks like this:

example2.cpp:1:27: error: boost/regex.hpp: No such file or directory

the most likely cause is that you forgot to load the correct modules as detailed above.
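As a quick functional test, pipe a matching subject line into the program; given the regular expression above, the text after the Re: prefix should be echoed back:

echo "Subject: Re: Boost on iceberg" | ./example2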

Installation Notes This section is primarily for administrators of the system.

Version 1.59: Compiled with gcc 5.2 and icu version 55

module load compilers/gcc/5.2
module load libs/gcc/4.8.2/libunistring/0.9.5
module load libs/gcc/4.8.2/icu/55

mkdir -p /usr/local/packages6/libs/gcc/5.2/boost/1.59.0/
tar -xvzf ./boost_1_59_0.tar.gz
cd boost_1_59_0
./bootstrap.sh --prefix=/usr/local/packages6/libs/gcc/5.2/boost/1.59.0/


It complained that it could not find the icu library but when I ran

./b2 install --prefix=/usr/local/packages6/libs/gcc/5.2/boost/1.59.0/

It said that it had detected the icu library and was compiling it in.

Version 1.58: Compiled with gcc 4.8.2 and icu version 55

module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/libunistring/0.9.5
module load libs/gcc/4.8.2/icu/55
tar -xvzf ./boost_1_58_0.tar.gz
cd boost_1_58_0
./bootstrap.sh --prefix=/usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0/

It complained that it could not find the icu library but when I ran

./b2 install --prefix=/usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0

It said that it had detected the icu library and was compiling it in.

Version 1.41: This build of boost was built with gcc 4.4.7 and ICU version 42

module load libs/gcc/4.4.7/icu/42
tar -xvzf ./boost_1_41_0.tar.gz
cd boost_1_41_0
./bootstrap.sh --prefix=/usr/local/packages6/libs/gcc/4.4.7/boost/1.41
./bjam -sICU_PATH=/usr/local/packages6/libs/gcc/4.4.7/icu/42 install

Testing The two examples above were compiled and run.

Module Files Version 1.59 Module file location: /usr/local/modulefiles/libs/gcc/5.2/boost/1.59

#%Module1.0#####################################################################
##
## boost 1.59 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##
module load libs/gcc/4.8.2/libunistring/0.9.5
module load libs/gcc/4.8.2/icu/55

proc ModulesHelp { } {
    puts stderr "Makes the Boost 1.59 library available"
}

set BOOST_DIR /usr/local/packages6/libs/gcc/5.2/boost/1.59.0

module-whatis "Makes the Boost 1.59 library available"

prepend-path LD_LIBRARY_PATH $BOOST_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $BOOST_DIR/include
prepend-path LIBRARY_PATH $BOOST_DIR/lib


Version 1.58 Module file location: /usr/local/modulefiles/libs/gcc/4.8.2/boost/1.58

#%Module1.0#####################################################################
##
## boost 1.58 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##
module load libs/gcc/4.8.2/libunistring/0.9.5
module load libs/gcc/4.8.2/icu/55

proc ModulesHelp { } {
    puts stderr "Makes the Boost 1.58 library available"
}

set BOOST_DIR /usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0

module-whatis "Makes the Boost 1.58 library available"

prepend-path LD_LIBRARY_PATH $BOOST_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $BOOST_DIR/include
prepend-path LIBRARY_PATH $BOOST_DIR/lib

Version 1.41 The module file is on the system at /usr/local/modulefiles/libs/gcc/4.4.7/boost/1.41

#%Module1.0#####################################################################
##
## Boost 1.41 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##
module load libs/gcc/4.4.7/icu/42

proc ModulesHelp { } {
    puts stderr "Makes the Boost 1.41 library available"
}

set BOOST_DIR /usr/local/packages6/libs/gcc/4.4.7/boost/1.41

module-whatis "Makes the Boost 1.41 library available"

prepend-path LD_LIBRARY_PATH $BOOST_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $BOOST_DIR/include
prepend-path LIBRARY_PATH $BOOST_DIR/lib


bzip2 bzip2 is a freely available, patent free (see below), high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.

bzip2 Version 1.0.6 URL http://www.bzip.org/ Location /usr/local/packages6/libs/gcc/4.4.7/bzip2/1.0.6

Usage The default version of bzip2 on the system is version 1.0.5. If you need a newer version run the following module command

module load libs/gcc/4.4.7/bzip2/1.0.6

Check the version number that's available using

bzip2 --version

Documentation Standard man pages are available man bzip2

Installation notes This section is primarily for administrators of the system. It was built with gcc 4.4.7

wget http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz
mkdir -p /usr/local/packages6/libs/gcc/4.4.7/bzip2/1.0.6
tar -xvzf ./bzip2-1.0.6.tar.gz
cd bzip2-1.0.6
make -f Makefile-libbz2_so
make
make install PREFIX=/usr/local/packages6/libs/gcc/4.4.7/bzip2/1.0.6
mv *.so* /usr/local/packages6/libs/gcc/4.4.7/bzip2/1.0.6/lib/

Testing The library is automatically tested when you do a make. The results were

Doing 6 tests (3 compress, 3 uncompress) ...
If there's a problem, things might stop at this point.

./bzip2 -1 < sample1.ref > sample1.rb2
./bzip2 -2 < sample2.ref > sample2.rb2
./bzip2 -3 < sample3.ref > sample3.rb2
./bzip2 -d < sample1.bz2 > sample1.tst
./bzip2 -d < sample2.bz2 > sample2.tst
./bzip2 -ds < sample3.bz2 > sample3.tst
cmp sample1.bz2 sample1.rb2
cmp sample2.bz2 sample2.rb2
cmp sample3.bz2 sample3.rb2


cmp sample1.tst sample1.ref
cmp sample2.tst sample2.ref
cmp sample3.tst sample3.ref

If you got this far and the 'cmp's didn't complain, it looks like you're in business.

Module File Module location is /usr/local/modulefiles/libs/gcc/4.4.7/bzip2/1.0.6. Module contents

#%Module1.0#####################################################################
##
## bzip2 1.0.6 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##
proc ModulesHelp { } {
    puts stderr "Makes the bzip 1.0.6 library available"
}

module-whatis "Makes the bzip 1.0.6 library available"

set BZIP_DIR /usr/local/packages6/libs/gcc/4.4.7/bzip2/1.0.6

prepend-path LD_LIBRARY_PATH $BZIP_DIR/lib
prepend-path CPATH $BZIP_DIR/include
prepend-path MANPATH $BZIP_DIR/man
prepend-path PATH $BZIP_DIR/bin

cfitsio

cfitsio Latest version 3.380 URL http://heasarc.gsfc.nasa.gov/fitsio/fitsio.html

CFITSIO is a library of C and Fortran subroutines for reading and writing data files in FITS (Flexible Image Transport System) data format. CFITSIO provides simple high-level routines for reading and writing FITS files that insulate the programmer from the internal complexities of the FITS format.

Usage To make this library available, run the following module command

module load libs/gcc/5.2/cfitsio

The modulefile creates a variable $CFITSIO_INCLUDE_PATH which is the path to the include directory.
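As a hedged sketch of compiling against the library (my_fits_prog.c is a hypothetical source file, and this assumes the modulefile also makes the library directory visible to the linker, for example via LIBRARY_PATH):

gcc -I$CFITSIO_INCLUDE_PATH my_fits_prog.c -lcfitsio -lm -o my_fits_prog

If the link step cannot find -lcfitsio, add an explicit -L option pointing at the lib directory of the CFITSIO version you loaded.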

Installation notes This section is primarily for administrators of the system. CFITSIO 3.380 was compiled with gcc 5.2. The compilation used this script and it is loaded with this modulefile.


cgns

cgns Version 3.2.1 Support Level Bronze Dependencies libs/hdf5/gcc/openmpi/1.8.14 URL http://cgns.github.io/WhatIsCGNS.html Location /usr/local/packages6/libs/gcc/4.4.7/cgnslib

The CFD General Notation System (CGNS) provides a general, portable, and extensible standard for the storage and retrieval of computational fluid dynamics (CFD) analysis data.

Usage To make this library available, run the following module command

module load libs/gcc/4.4.7/cgns/3.2.1

This will also load the module files for the prerequisite libraries, Open MPI 1.8.3 and HDF5 1.8.14 with parallel support.

Installing This section is primarily for administrators of the system.
• This is a prerequisite for Code Saturne version 4.0.
• It was built with gcc 4.4.7, openmpi 1.8.3 and hdf 1.8.14

module load libs/hdf5/gcc/openmpi/1.8.14
tar -xvzf cgnslib_3.2.1.tar.gz
mkdir /usr/local/packages6/libs/gcc/4.4.7/cgnslib
cd /usr/local/packages6/libs/gcc/4.4.7/cgnslib
mkdir 3.2.1
cd 3.2.1
cmake ~/cgnslib_3.2.1/
ccmake .

Configured the following using ccmake

CGNS_ENABLE_PARALLEL  ON
MPIEXEC               /usr/local/mpi/gcc/openmpi/1.8.3/bin/mpiexec
MPI_COMPILER          /usr/local/mpi/gcc/openmpi/1.8.3/bin/mpic++
MPI_EXTRA_LIBRARY     /usr/local/mpi/gcc/openmpi/1.8.3/lib/libmpi.s
MPI_INCLUDE_PATH      /usr/local/mpi/gcc/openmpi/1.8.3/include
MPI_LIBRARY           /usr/local/mpi/gcc/openmpi/1.8.3/lib/libmpi_c
ZLIB_LIBRARY          /usr/lib64/libz.so
FORTRAN_NAMING        LOWERCASE_
HDF5_INCLUDE_PATH     /usr/local/packages6/hdf5/gcc-4.4.7/openmpi-1.8.3/hdf5-1.8.14/include/
HDF5_LIBRARY          /usr/local/packages6/hdf5/gcc-4.4.7/openmpi-1.8.3/hdf5-1.8.14/lib/libhdf5.so
HDF5_NEED_MPI         ON
HDF5_NEED_SZIP        OFF
HDF5_NEED_ZLIB        ON
CGNS_BUILD_CGNSTOOLS  OFF
CGNS_BUILD_SHARED     ON
CGNS_ENABLE_64BIT     ON
CGNS_ENABLE_FORTRAN   ON
CGNS_ENABLE_HDF5      ON
CGNS_ENABLE_SCOPING   OFF
CGNS_ENABLE_TESTS     ON
CGNS_USE_SHARED       ON
CMAKE_BUILD_TYPE      Release
CMAKE_INSTALL_PREFIX  /usr/local/packages6/libs/gcc/4.4.7/cgnslib/3.2.1

Once the configuration was complete, I did

make
make install

Module File Module File Location: /usr/local/modulefiles/libs/gcc/4.4.7/cgns/3.2.1

#%Module1.0#####################################################################
##
## cgns 3.2.1 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##
proc ModulesHelp { } {
    puts stderr "Makes the cgns 3.2.1 library available"
}

module-whatis "Makes the cgns 3.2.1 library available"

module load libs/hdf5/gcc/openmpi/1.8.14

set CGNS_DIR /usr/local/packages6/libs/gcc/4.4.7/cgnslib/3.2.1

prepend-path LD_LIBRARY_PATH $CGNS_DIR/lib
prepend-path CPATH $CGNS_DIR/include

CUDA

CUDA, which stands for Compute Unified Device Architecture, is a parallel computing platform and application programming interface (API) model created by NVIDIA. It allows software developers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing - an approach known as General Purpose GPU (GPGPU).

Usage There are several versions of the CUDA library available. As with many libraries installed on the system, CUDA libraries are made available via module commands which are only available once you have started a qrsh or qsh session. The latest version of CUDA is loaded with the command

module load libs/cuda

Alternatively, you can load a specific version with one of the following

module load libs/cuda/8.0.44
module load libs/cuda/7.5.18
module load libs/cuda/6.5.14
module load libs/cuda/4.0.17
module load libs/cuda/3.2.16


To check which version of CUDA you are using

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

Important To compile CUDA programs you also need a compatible version of the GCC compiler suite. As of version 8.0.44, CUDA is compatible with GCC versions:
• greater than or equal to 4.7.0 (to allow for the use of c++11 features) and
• less than 5.0.0

It is therefore recommended that you load the most recent 4.x version of GCC when building CUDA programs on Iceberg:

module load compilers/gcc/4.9.2

Compiling the sample programs You do not need to be using a GPU-enabled node to compile the sample programs but you do need a GPU to run them. In a qrsh session

# Load modules
module load libs/cuda/7.5.18
module load compilers/gcc/4.9.2

# Copy CUDA samples to a local directory
# It will create a directory called NVIDIA_CUDA-7.5_Samples/
mkdir cuda_samples
cd cuda_samples
cp -r $CUDA_SDK .

# Compile (this will take a while)
cd NVIDIA_CUDA-7.5_Samples/
make

A basic test is to run one of the resulting binaries, deviceQuery.

Documentation • CUDA Toolkit Documentation • The power of C++11 in CUDA 7

Determining the NVIDIA Driver version Run the command

cat /proc/driver/nvidia/version

Example output is

NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.44 Wed Aug 17 22:24:07 PDT 2016
GCC version: gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC)


Installation notes These are primarily for system administrators.

CUDA 8.0.44
1. The NVIDIA device driver was updated from v352.93 to v367.44, which also updated the OpenGL libraries. This was all done by the /etc/init.d/uos-nvidia script at boot time on all GPU-equipped nodes.
2. The CUDA toolkit binaries and samples were installed using

mkdir -m 2775 -p /usr/local/packages6/libs/binlibs/CUDA/8.0.44
chown ${USER}:app-admins /usr/local/packages6/libs/binlibs/CUDA/8.0.44
cd /usr/local/media/cuda/
chmod +x cuda_8.0.44_linux-run
./cuda_8.0.44_linux.run --toolkit --toolkitpath=/usr/local/packages6/libs/binlibs/CUDA/8.0.44/cuda \
    --samples --samplespath=/usr/local/packages6/libs/binlibs/CUDA/8.0.44/samples \
    --no-opengl-libs -silent

3. This modulefile was installed as /usr/local/modulefiles/libs/cuda/8.0.44

CUDA 7.5.18
1. The device drivers were updated separately by one of the sysadmins.
2. A binary install was performed using a .run file

mkdir -p /usr/local/packages6/libs/binlibs/CUDA/7.5.18/
chmod +x ./cuda_7.5.18_linux.run
./cuda_7.5.18_linux.run --toolkit --toolkitpath=/usr/local/packages6/libs/binlibs/CUDA/7.5.18/cuda \
    --samples --samplespath=/usr/local/packages6/libs/binlibs/CUDA/7.5.18/samples \
    --no-opengl-libs -silent

3. This modulefile was installed as /usr/local/modulefiles/libs/cuda/7.5.18

Previous versions No install notes are available.

curl

curl Latest version 7.47.1 URL https://curl.haxx.se/

curl is an open source command line tool and library for transferring data with URL syntax, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMTP, SMTPS, Telnet and TFTP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, HTTP/2, cookies, user+password authentication (Basic, Plain, Digest, CRAM-MD5, NTLM, Negotiate and Kerberos), file transfer resume, proxy tunneling and more.

Usage There is a default version of curl available on the system but it is rather old

curl-config --version

gives the result

libcurl 7.19.7


Version 7.19.7 was released in November 2009! A newer version of the library is available via the module system. To make it available

module load libs/gcc/4.4.7/curl/7.47.1

The curl-config command will now report the newer version

curl-config --version

Should result in libcurl 7.47.1
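A minimal example of using the newer curl to download a file, following any redirects and saving it under its remote name (the URL is purely illustrative):

curl -L -O https://www.example.com/data.tar.gz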

Documentation Standard man pages are available man curl

Installation notes This section is primarily for administrators of the system. Curl 7.47.1 was compiled with gcc 4.4.7
• Install script: install_curl_7.47.1.sh
• Module file: 7.47.1

fftw

fftw Latest version 3.3.5 URL http://www.fftw.org/

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).

Usage To make this library available, run one of the following module commands:

module load libs/gcc/4.9.2/fftw/3.3.5
module load libs/gcc/5.2/fftw/3.3.4

If you are using CUDA then you will want to use version 3.3.5 as the build of version 3.3.4 is dependent on GCC 5.2, a version of GCC not supported by CUDA at this time.
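As a minimal sketch of building your own code against FFTW once a module is loaded (fft_example.cpp is a hypothetical source file; the 3.3.4 modulefile shown later in this section adds the include directory to CPLUS_INCLUDE_PATH and the library directory to LIBRARY_PATH, so a C++ source compiles without extra -I/-L flags):

g++ fft_example.cpp -lfftw3 -lm -o fft_example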

Installation notes This section is primarily for administrators of the system. version 3.3.5 This was compiled with GCC 4.9.2 (for compatibility with CUDA, which doesn’t support GCC >= 5.0.0). Threading (inc. OpenMP) and shared-library support were enabled at build-time. Vectorisation support was not explicitly enabled to allow the library to be used on nodes that do not support AVX/AVX2. First, download, configure, build, test and install using this script. During the testing stage you should see lots of numerical output plus:


--------------------------------------------------------------
         FFTW transforms passed basic tests!
--------------------------------------------------------------

--------------------------------------------------------------
      FFTW threaded transforms passed basic tests!
--------------------------------------------------------------

Next, this modulefile was installed as /usr/local/modulefiles/libs/gcc/4.9.2/fftw/3.3.5

Version 3.3.4 This was compiled with GCC 5.2:

module load compilers/gcc/5.2
mkdir -p /usr/local/packages6/libs/gcc/5.2/fftw/3.3.4
tar -xvzf fftw-3.3.4.tar.gz
cd fftw-3.3.4
./configure --prefix=/usr/local/packages6/libs/gcc/5.2/fftw/3.3.4 --enable-threads --enable-openmp --enable-shared
make
make check

The result was lots of numerical output and:

--------------------------------------------------------------
         FFTW transforms passed basic tests!
--------------------------------------------------------------

--------------------------------------------------------------
      FFTW threaded transforms passed basic tests!
--------------------------------------------------------------

Installed with: make install

The modulefile is on the system at /usr/local/modulefiles/libs/gcc/5.2/fftw/3.3.4

#%Module1.0#####################################################################
##
## fftw 3.3.4 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl

proc ModulesHelp { } {
    puts stderr "Makes the FFTW 3.3.4 library available"
}

module-whatis "Makes the FFTW 3.3.4 library available"

set FFTW_DIR /usr/local/packages6/libs/gcc/5.2/fftw/3.3.4

prepend-path CPLUS_INCLUDE_PATH $FFTW_DIR/include
prepend-path LD_LIBRARY_PATH $FFTW_DIR/lib
prepend-path LIBRARY_PATH $FFTW_DIR/lib
prepend-path MANPATH $FFTW_DIR/share/man
prepend-path PATH $FFTW_DIR/bin


FLTK

FLTK Version 1.3.3 URL http://www.fltk.org/index.php

FLTK (pronounced “fulltick”) is a cross-platform C++ GUI toolkit for UNIX/Linux (X11), Windows, and OS X. FLTK provides modern GUI functionality without the bloat and supports 3D graphics via OpenGL and its built-in GLUT emulation.

Usage Run one of the following:

module load libs/gcc/5.2/fltk/1.3.3
module load libs/gcc/4.9.2/fltk/1.3.3

If you are using CUDA then you will want to use the version that was built with (and has a runtime dependency on) GCC 4.9.2 as GCC 5.0 and later are not supported by CUDA at this time.
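As a small, hedged example of building a program against FLTK once a module is loaded (hello.cxx is a hypothetical FLTK source file), the fltk-config helper script that FLTK installs on your PATH can generate the right compiler and linker flags for you:

fltk-config --compile hello.cxx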

Installation notes This section is primarily for administrators of the system.

Version 1.3.3 (GCC 4.9.2)
• Compiled with (and has a runtime dependency on) GCC 4.9.2 (for compatibility with CUDA, which doesn't support GCC >= 5.0.0).
• Shared-library support and X Font Library (Xft) support were enabled at build-time.
• A pre-requisite for Relion 2-beta.

First, download, configure, build, test and install using this script. Next, this modulefile was installed as /usr/local/modulefiles/libs/gcc/4.9.2/fltk/1.3.3

Version 1.3.3 (GCC 5.2)
• This is a pre-requisite for GNU Octave version 4.0
• Compiled with (and has a runtime dependency on) GCC 5.2
• Installed using:

module load compilers/gcc/5.2
mkdir -p /usr/local/packages6/libs/gcc/5.2/fltk/1.3.3
#tar -xvzf ./fltk-1.3.3-source.tar.gz
cd fltk-1.3.3

# Fixes error relating to undefined _ZN18Fl_XFont_On_Demand5valueEv
# Source https://groups.google.com/forum/#!topic/fltkgeneral/GT6i2KGCb3A
sed -i 's/class Fl_XFont_On_Demand/class FL_EXPORT Fl_XFont_On_Demand/' FL/x.H

./configure --prefix=/usr/local/packages6/libs/gcc/5.2/fltk/1.3.3 --enable-shared --enable-xft
make
make install

Modulefile at /usr/local/modulefiles/libs/gcc/5.2/fltk/1.3.3:


#%Module1.0#####################################################################
##
## fltk 1.3.3 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##
module load compilers/gcc/5.2

proc ModulesHelp { } {
    puts stderr "Makes the FLTK 1.3.3 library available"
}

set FLTK_DIR /usr/local/packages6/libs/gcc/5.2/fltk/1.3.3

module-whatis "Makes the FLTK 1.3.3 library available"

prepend-path LD_LIBRARY_PATH $FLTK_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $FLTK_DIR/include
prepend-path LIBRARY_PATH $FLTK_DIR/lib
prepend-path PATH $FLTK_DIR/bin

Geospatial Data Abstraction Library (GDAL)

GDAL Latest version 2.1.1 URL http://www.gdal.org/

GDAL is a library used by many Geographic Information Systems (GIS) packages for converting between many different raster and vector GIS data formats. It also includes command-line utilities for data translation and processing. It is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation (http://www.osgeo.org/).

Usage By running

$ module load libs/gcc/6.2/gdal/2.1.1

you
• add several GDAL programs to your PATH environment variable
• allow other programs to make use of (dynamically link against) the GDAL library

You can run gdal-config --version to test that you are running the required version

$ gdal-config --version
2.1.1

The command-line programs that GDAL provides are:

gdaladdo, gdalbuildvrt, gdal-config, gdal_contour, gdaldem, gdalenhance, gdal_grid, gdalinfo, gdallocationinfo, gdalmanage, gdal_rasterize, gdalserver, gdalsrsinfo, gdaltindex, gdaltransform, gdal_translate, gdalwarp, nearblack, ogr2ogr, ogrinfo, ogrlineref, ogrtindex, testepsg


The gdal... utilities are mostly for processing raster GIS data formats, whilst the ogr... utilities are for processing vector GIS data formats.
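As a quick, hedged illustration of the raster utilities, the sketch below inspects a GeoTIFF and converts it to PNG; input.tif is a placeholder for one of your own files, and whether a given output format (such as PNG) is available depends on the limited set of formats this build supports (see Supported file formats below).

module load libs/gcc/6.2/gdal/2.1.1
gdalinfo input.tif                         # print metadata, projection and extent
gdal_translate -of PNG input.tif out.png   # convert the GeoTIFF to PNG (if the PNG driver is built in)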

Documentation Standard man pages are available for the provided commands/functions. These can be viewed using e.g. $ man gdal_contour

Much more information is available on the project site.

Supported file formats GDAL has been compiled on this system with support for only a limited set of GIS data formats. See Installation notes below for a list of those provided by each available version of GDAL.

Installation notes This section is primarily for administrators of the system.

Version 2.1.1 GDAL 2.1.1 was compiled with v6.2.0 of the GCC compiler suite.
1. cd to a scratch directory.
2. Download, build and install GDAL using install_gdal_2.1.1.sh, ensuring that all output is redirected into a log file. GDAL files are installed into /usr/local/packages6/libs/gcc/6.2/gdal/2.1.1/
3. Install this modulefile as /usr/local/modulefiles/libs/gcc/6.2/gdal/2.1.1

The file formats supported by the build generated using install_gdal_2.1.1.sh are listed here.

geos

geos
Version 3.4.2
Support Level Bronze
Dependencies compilers/gcc/4.8.2
URL http://trac.osgeo.org/geos/
Location /usr/local/packages6/libs/gcc/4.8.2/geos/3.4.2

GEOS - Geometry Engine, Open Source

Usage To make this library available, run the following module commands

module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/geos/3.4.2

We load version 4.8.2 of gcc since gcc 4.8.2 was used to build this version of geos.

The rgeos interface in R rgeos is a CRAN package that provides an R interface to geos. It is not installed in R by default so you need to install a version in your home directory. After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command. Run the following module commands


module load apps/R/3.2.0
module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/geos/3.4.2

Launch R and run the command install.packages('rgeos')

If you’ve never installed an R package before on the system, it will ask you if you want to install to a personal library. Answer y to any questions you are asked. The library will be installed to a sub-directory called R in your home directory and you should only need to perform the above procedure once. Once you have performed the installation, you will only need to run the module commands above to make the geos library available to the system. Then, you use rgeos as you would any other library in R

library('rgeos')

Installation notes This section is primarily for administrators of the system.

qrsh
tar -xvjf ./geos-3.4.2.tar.bz2
cd geos-3.4.2
mkdir -p /usr/local/packages6/libs/gcc/4.8.2/geos/3.4.2
module load compilers/gcc/4.8.2
./configure prefix=/usr/local/packages6/libs/gcc/4.8.2/geos/3.4.2

Potentially useful output at the end of the configure run

Swig: false
Python bindings: false
Ruby bindings: false
PHP bindings: false

Once the configuration was complete, I did

make
make install

Testing Compile and run the test-suite with make check

All tests passed.

Module File Module File Location: /usr/local/modulefiles/libs/gcc/4.8.2/geos/3.4.2

more /usr/local/modulefiles/libs/gcc/4.8.2/geos/3.4.2

#%Module1.0#####################################################################
##
## geos 3.4.2 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##


proc ModulesHelp { } {
        puts stderr "Makes the geos 3.4.2 library available"
}

set GEOS_DIR /usr/local/packages6/libs/gcc/4.8.2/geos/3.4.2

module-whatis "Makes the geos 3.4.2 library available"

prepend-path LD_LIBRARY_PATH $GEOS_DIR/lib
prepend-path PATH $GEOS_DIR/bin

HDF5

HDF5
Version 1.8.16, 1.8.15-patch1, 1.8.14 and 1.8.13
Dependencies gcc or pgi compiler, openmpi (optional)
URL http://www.hdfgroup.org/HDF5/
Documentation http://www.hdfgroup.org/HDF5/doc/

HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format. Two primary versions of this library are provided, MPI parallel enabled versions and serial versions.

Usage - Serial The serial versions were built with gcc version 4.8.2. As such, if you are going to build anything against these versions of HDF5, we recommend that you use gcc 4.8.2 which can be enabled with the following module command

module load compilers/gcc/4.8.2

To enable the serial version of HDF5, use one of the following module commands depending on which version of the library you require:

module load libs/hdf5/gcc/1.8.14
module load libs/hdf5/gcc/1.8.13
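As a rough sketch of building a serial HDF5 program: hdf5_demo.c is a hypothetical source file of your own, and the example assumes the h5cc compiler wrapper shipped with HDF5 is on your PATH once the module is loaded; if it is not, compile with gcc and pass the HDF5 include and library paths by hand.

module load compilers/gcc/4.8.2
module load libs/hdf5/gcc/1.8.14
# hdf5_demo.c is a placeholder for your own source; h5cc (assumed available) adds the HDF5 include and link flags
h5cc hdf5_demo.c -o hdf5_demo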

Usage – Parallel There are multiple versions of parallel HDF5 installed with different openmpi and compiler versions. Two versions of HDF5 were built using gcc version 4.4.7 and OpenMPI version 1.8.3. Version 4.4.7 of gcc is the default compiler on the system so no module command is required for this

module load libs/hdf5/gcc/openmpi/1.8.14
module load libs/hdf5/gcc/openmpi/1.8.13

One version was built with the PGI compiler version 15.7 and openmpi version 1.8.8

module load libs/hdf5/pgi/1.8.15-patch1


The above module also loads the relevant modules for OpenMPI and PGI Compiler. To see which modules have been loaded, use the command module list

Finally, another version was built with GCC 4.4.7 and openmpi 1.10.1; this version is also linked against ZLIB and SZIP:

module load libs/gcc/4.4.7/openmpi/1.10.1/hdf5/1.8.16

Installation notes This section is primarily for administrators of the system.

Version 1.8.16 built using GCC Compiler, with separate ZLIB and SZIP
• install_hdf5.sh Install script
• 1.8.16

Version 1.8.15-patch1 built using PGI Compiler Here are the build details for the module libs/hdf5/pgi/1.8.15-patch1. Compiled using PGI 15.7 and OpenMPI 1.8.8
• install_pgi_hdf5_1.8.15-patch1.sh Install script
• Modulefile located on the system at /usr/local/modulefiles/libs/hdf5/pgi/1.8.15-patch1

gcc versions This package is built from the source code distribution from the HDF Group website. Two primary versions of this library are provided, a MPI parallel enabled version and a serial version. The serial version has the following configuration flags enabled:

--enable-fortran --enable-fortran2003 --enable-cxx --enable-shared

The parallel version has the following flags:

--enable-fortran --enable-fortran2003 --enable-shared --enable-parallel

The parallel library does not support C++, hence it being disabled for the parallel build.

icu - International Components for Unicode

icu Latest Version 55 URL http://site.icu-project.org/

ICU is the premier library for software internationalization, used by a wide array of companies and organizations.

Usage Version 55 of the icu library requires gcc version 4.8.2. To make the compiler and library available, run the following module commands

module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/icu/55

Version 42 of the icu library uses gcc version 4.4.7 which is the default on Iceberg. To make the library available, run the following command


module load libs/gcc/4.4.7/icu/42
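As a rough sketch of compiling against ICU: unicode_demo.cpp is a hypothetical source file of your own; the version 55 module sets LIBRARY_PATH and CPLUS_INCLUDE_PATH, so g++ can find the headers and libraries, and -licuuc / -licui18n link the ICU common and internationalisation libraries.

module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/icu/55
# unicode_demo.cpp is a placeholder for your own source file
g++ unicode_demo.cpp -licuuc -licui18n -o unicode_demo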

Installation Notes This section is primarily for administrators of the system.

Version 55 Icu 55 is a pre-requisite for the version of boost required for an experimental R module used by one of our users.

module load compilers/gcc/4.8.2
tar -xvzf icu4c-55_1-src.tgz
cd icu/source
./runConfigureICU Linux/gcc --prefix=/usr/local/packages6/libs/gcc/4.8.2/icu/55/
make

Version 42 Icu version 42 was originally installed as a system RPM. This install moved icu to a module-based install

tar -xvzf icu4c-4_2_1-src.tgz
cd icu/source/
./runConfigureICU Linux/gcc --prefix=/usr/local/packages6/libs/gcc/4.4.7/icu/42
make
make install

Testing Version 55

make check

Last few lines of output were

[All tests passed successfully...]
Elapsed Time: 00:00:00.086
make[2]: Leaving directory `/home/fe1mpc/icu/icu/source/test/letest'
------
ALL TESTS SUMMARY:
All tests OK: testdata intltest iotest cintltst letest
make[1]: Leaving directory `/home/fe1mpc/icu/icu/source/test'
make[1]: Entering directory `/home/fe1mpc/icu/icu/source'
verifying that icu-config --selfcheck can operate
verifying that make -f Makefile.inc selfcheck can operate
PASS: config selfcheck OK
rm -rf test-local.xml

Version 42

make check

Last few lines of output were

All tests passed successfully...]
Elapsed Time: 00:00:12.000
make[2]: Leaving directory `/home/fe1mpc/icu/source/test/cintltst'
------
ALL TESTS SUMMARY:
ok: testdata iotest cintltst
===== ERRS: intltest
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory `/home/fe1mpc/icu/source/test'
make: *** [check-recursive] Error 2


The error can be ignored since it is a bug in the test itself: • http://sourceforge.net/p/icu/mailman/message/32443311/

Module Files Version 55 Module File Location: /usr/local/modulefiles/libs/gcc/4.8.2/icu/55

#%Module1.0#####################################################################
##
## icu 55 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        puts stderr "Makes the icu library available"
}

set ICU_DIR /usr/local/packages6/libs/gcc/4.8.2/icu/55

module-whatis "Makes the icu library available"

prepend-path LD_LIBRARY_PATH $ICU_DIR/lib
prepend-path LIBRARY_PATH $ICU_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $ICU_DIR/include

Version 42 Module File Location: /usr/local/modulefiles/libs/gcc/4.4.7/icu/42

#%Module1.0#####################################################################
##
## icu 42 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        puts stderr "Makes the icu library available"
}

set ICU_DIR /usr/local/packages6/libs/gcc/4.4.7/icu/42

module-whatis "Makes the icu library available"

prepend-path LD_LIBRARY_PATH $ICU_DIR/lib
prepend-path LIBRARY_PATH $ICU_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $ICU_DIR/include

Intel Data Analytics Acceleration Library

Intel’s Data Analytics Acceleration Library (DAAL) provides functions for data analysis (characterization, summarization, and transformation) and machine learning (regression, classification, and more).


Parallel Studio Composer Edition version DAAL can be used with and without other Parallel Studio packages. To access it: module load libs/binlibs/intel-daal/2017.0

Licensing and availability See the information on Parallel Studio licensing.

Installation Notes The following notes are primarily for system administrators. Intel DAAL 2017.0 Installed as part of Parallel Studio Composer Edition 2017. This modulefile was installed as /usr/local/modulefiles/libs/binlibs/intel-daal/2017.0.

Intel Integrated Performance Primitives

Integrated Performance Primitives (IPP) are “high-quality, production-ready, low-level building blocks for image processing, signal processing, and data processing (data compression/decompression and cryptography) applications.”

Parallel Studio Composer Edition version IPP can be used with and without other Parallel Studio packages. To access it: module load libs/binlibs/intel-ipp/2017.0

Licensing and availability See the information on Parallel Studio licensing.

Installation Notes The following notes are primarily for system administrators. Intel IPP 2017.0 Installed as part of Parallel Studio Composer Edition 2017. This modulefile was installed as /usr/local/modulefiles/libs/binlibs/intel-ipp/2017.0/binary.

Intel Math Kernel Library

Intel’s Math Kernel Library (MKL) provides highly optimized, threaded and vectorized functions to maximize performance on each processor family. It utilises de facto standard C and Fortran APIs for compatibility with BLAS, LAPACK and FFTW functions from other math libraries.

Interactive usage The MKL was installed along with the Intel compilers as part of Intel Parallel Studio. The MKL can be used with or without other Parallel Studio packages (such as the Intel compilers). To activate just the MKL, use one of:

module load libs/binlibs/intel-mkl/2017.0

module load libs/binlibs/intel-mkl/11.2.3


Note that 2017.0 is newer than 11.2.3. Sample C and Fortran programs demonstrating matrix multiplication are available for version 2017.0 in the directory $MKL_SAMPLES:

$ ls $MKL_SAMPLES/
mkl_c_samples mkl_fortran_samples

To compile one of these you need to copy the samples to a directory you can write to, ensure the MKL and the Intel compilers are both loaded, change into the directory containing the relevant sample program (C or Fortran) then run make to compile:

$ qrsh
$ cp -r $MKL_SAMPLES/ ~/mkl_samples
$ module load libs/binlibs/intel-mkl/2017.0
$ module load compilers/intel/17.0.0
$ cd ~/mkl_samples/mkl_fortran_samples/matrix_multiplication
$ make
ifort -c src/dgemm_example.f -o release/dgemm_example.o
ifort release/dgemm_example.o -mkl -static-intel -o release/dgemm_example
ifort -c src/dgemm_with_timing.f -o release/dgemm_with_timing.o
ifort release/dgemm_with_timing.o -mkl -static-intel -o release/dgemm_with_timing
ifort -c src/matrix_multiplication.f -o release/matrix_multiplication.o
ifort release/matrix_multiplication.o -mkl -static-intel -o release/matrix_multiplication
ifort -c src/dgemm_threading_effect_example.f -o release/dgemm_threading_effect_example.o
ifort release/dgemm_threading_effect_example.o -mkl -static-intel -o release/dgemm_threading_effect_example

You should then find several compiled programs in the release directory:

$ ./release/matrix_multiplication
This example measures performance of computing the real matrix product C=alpha*A*B+beta*C using a triple nested loop, where A, B, and C are matrices and alpha and beta are double precision scalars

Initializing data for matrix multiplication C=A*B for matrix A( 2000 x 200) and matrix B( 200 x 1000)

Intializing matrix data

Making the first run of matrix product using triple nested loop to get stable run time measurements

Measuring performance of matrix product using triple nested loop

== Matrix multiplication using triple nested loop == == completed at 184.42246 milliseconds ==

Example completed.

For documentation and a tutorial, start an interactive session using qsh, load MKL, then run: $ firefox $MKL_SAMPLES/mkl_fortran_samples/matrix_multiplication/tutorial/en/index.htm

Installation Notes The following notes are primarily for system administrators.


Intel MKL 2017.0 Installed as part of Parallel Studio Composer Edition 2017. This modulefile was installed as /usr/local/modulefiles/libs/binlibs/intel-mkl/2017.0.

Intel MKL 11.2.3 Installed as part of Intel Parallel Studio Composer Edition 2015 Update 3. This modulefile was installed as /usr/local/modulefiles/libs/binlibs/intel-mkl/11.2.3

Intel Threading Building Blocks

Threading Building Blocks (TBB) lets you write “parallel C++ programs that take full advantage of multicore performance, that are portable and composable, and that have future-proof scalability.”

Interactive usage TBB can be used with and without other Parallel Studio packages. To make use of it you first need to start an interactive session that has access to multiple cores. Here we request six cores from the cluster’s scheduler:

$ qrsh -pe openmp 6

Next, we need to activate a specific version of TBB:

$ module load libs/binlibs/intel-tbb/2017.0

You can find sample TBB programs in the directory $TBB_SAMPLES: $ ls $TBB_SAMPLES

common concurrent_hash_map concurrent_priority_queue GettingStarted graph index.html license.txt parallel_do parallel_for parallel_reduce pipeline task task_arena task_group test_all

We can compile and run a sample program by copying the samples to our (writable) home directory, moving in to the directory containing the sample program (which should have a Makefile) then running make:

$ cp -r $TBB_SAMPLES ~/tbb_samples
$ cd ~/tbb_samples/GettingStarted/sub_string_finder/
$ ls

license.txt Makefile readme.html sub_string_finder.cpp sub_string_finder_extended.cpp sub_string_finder_pretty.cpp

$ make

g++ -O2 -DNDEBUG -o sub_string_finder sub_string_finder.cpp -ltbb -lrt
g++ -O2 -DNDEBUG -o sub_string_finder_pretty sub_string_finder_pretty.cpp -ltbb -lrt
g++ -O2 -DNDEBUG -o sub_string_finder_extended sub_string_finder_extended.cpp -ltbb -lrt
./sub_string_finder_extended
Done building string.
Done with serial version.
Done with parallel version.
Done validating results.
Serial version ran in 8.51745 seconds
Parallel version ran in 1.51716 seconds

Many of the sample directories contain HTML documentation. To read this you need to start an interactive graphical session (using qsh or qrshx) then run:


$ firefox ~/tbb_samples/index.html

Tutorials See Chrys Woods’ excellent tutorials for C++ programmers and Python programmers.

Licensing and availability See the information on Parallel Studio licensing.

Installation Notes The following notes are primarily for system administrators. Intel TBB 2017.0 Installed as part of Parallel Studio Composer Edition 2017. This modulefile was installed as /usr/local/modulefiles/libs/binlibs/intel-tbb/2017.0.

libunistring

libunistring Latest version 0.9.5 URL http://www.gnu.org/software/libunistring/#TOCdownloading/

Text files are nowadays usually encoded in Unicode, and may consist of very different scripts – from Latin letters to Chinese Hanzi –, with many kinds of special characters – accents, right-to-left writing marks, hyphens, Roman numbers, and much more. But the POSIX platform APIs for text do not contain adequate functions for dealing with particular properties of many Unicode characters. In fact, the POSIX APIs for text have several assumptions at their base which don’t hold for Unicode text. This library provides functions for manipulating Unicode strings and for manipulating C strings according to the Unicode standard.

Usage To make this library available, run the following module command

module load libs/gcc/4.8.2/libunistring/0.9.5

This correctly populates the environment variables LD_LIBRARY_PATH, LIBRARY_PATH and CPLUS_INCLUDE_PATH.
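As a rough sketch, assuming a hypothetical C source file unistr_demo.c of your own: CPLUS_INCLUDE_PATH only affects C++ compilations, so for C code the include directory is passed explicitly below, while LIBRARY_PATH lets the linker find -lunistring.

module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/libunistring/0.9.5
# unistr_demo.c is a placeholder for your own source file
gcc unistr_demo.c -I/usr/local/packages6/libs/gcc/4.8.2/libunistring/0.9.5/include -lunistring -o unistr_demo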

Installation notes This section is primarily for administrators of the system. libunistring 0.9.5 was compiled with gcc 4.8.2

module load compilers/gcc/4.8.2
tar -xvzf ./libunistring-0.9.5.tar.gz
cd ./libunistring-0.9.5
./configure --prefix=/usr/local/packages6/libs/gcc/4.8.2/libunistring/0.9.5

#build
make
#Testing
make check
#Install
make install


Testing Run make check after make and before make install. This runs the test suite. Results were

======
Testsuite summary for
======
# TOTAL: 492
# PASS: 482
# SKIP: 10
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
======

Module file The module file is on the system at /usr/local/modulefiles/libs/gcc/4.8.2/libunistring/0.9.5

#%Module1.0#####################################################################
##
## libunistring 0.9.5 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        puts stderr "Makes the libunistring library available"
}

set LIBUNISTRING_DIR /usr/local/packages6/libs/gcc/4.8.2/libunistring/0.9.5

module-whatis "Makes the libunistring library available"

prepend-path LD_LIBRARY_PATH $LIBUNISTRING_DIR/lib/
prepend-path LIBRARY_PATH $LIBUNISTRING_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $LIBUNISTRING_DIR/include

MED

MED
Version 3.0.8
Support Level Bronze
URL http://www.salome-platform.org/downloads/current-version
Location /usr/local/packages6/libs/gcc/4.4.7/med/3.0.8

The purpose of the MED module is to provide a standard for storing and recovering computer data associated to numerical meshes and fields, and to facilitate the exchange between codes and solvers.

Usage To make this library available, run the following module command


module load libs/gcc/4.4.7/med/3.0.8

Installation notes This section is primarily for administrators of the system.
• This is a pre-requisite for Code Saturne version 4.0.
• It was built with gcc 4.4.7, openmpi 1.8.3 and hdf5 1.8.14

module load mpi/gcc/openmpi/1.8.3
tar -xvzf med-3.0.8.tar.gz
cd med-3.0.8
mkdir -p /usr/local/packages6/libs/gcc/4.4.7/med/3.0.8
./configure --prefix=/usr/local/packages6/libs/gcc/4.4.7/med/3.0.8 --disable-fortran --with-hdf5=/usr/local/packages6/hdf5/gcc-4.4.7/openmpi-1.8.3/hdf5-1.8.14/ --disable-python
make
make install

Fortran was disabled because otherwise the build failed with compilation errors. It’s not needed for Code Saturne 4.0. Python was disabled because it didn’t have MPI support.

Testing The following was submitted as an SGE job from the med-3.0.8 build directory

#!/bin/bash

#$ -pe openmpi-ib 8
#$ -l mem=6G

module load mpi/gcc/openmpi/1.8.3

make check

All tests passed

Module File

#%Module1.0#####################################################################
##
## MED 3.0.8 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        puts stderr "Makes the MED 3.0.8 library available"
}

module-whatis "Makes the MED 3.0.8 library available"

set MED_DIR /usr/local/packages6/libs/gcc/4.4.7/med/3.0.8

prepend-path LD_LIBRARY_PATH $MED_DIR/lib64
prepend-path CPATH $MED_DIR/include


NAG Fortran Library (Serial)

Produced by experts for use in a variety of applications, the NAG Fortran Library has a global reputation for its excellence and, with hundreds of fully documented and tested routines, is the largest collection of mathematical and statistical algorithms available. This is the serial (1 CPU core) version of the NAG Fortran Library. For many routines, you may find it beneficial to use the parallel version of the library.

Usage There are several versions of the NAG Fortran Library available. The one you choose depends on which compiler you are using. As with many libraries installed on the system, NAG libraries are made available via module commands which are only available once you have started a qrsh or qsh session. In addition to loading a module for the library, you will usually need to load a module for the compiler you are using.

NAG for Intel Fortran Use the following command to make Mark 25 of the serial (1 CPU core) version of the NAG Fortran Library for Intel compilers available

module load libs/intel/15/NAG/fll6i25dcl

Once you have ensured that you have loaded the module for the Intel compilers you can compile your NAG program using

ifort your_code.f90 -lnag_mkl -o your_code.exe

which links to a version of the NAG library that’s linked against the high performance Intel MKL (which provides high-performance versions of the BLAS and LAPACK libraries). Alternatively, you can compile using

ifort your_code.f90 -lnag_nag -o your_code.exe

which is linked against a reference version of BLAS and LAPACK. If you are in any doubt as to which to choose, we suggest that you use -lnag_mkl

NAG for PGI Fortran Use the following command to make Mark 24 of the serial (1 CPU core) version of the NAG Fortran Library for PGI compilers available

module load libs/pgi/15/NAG/fll6a24dpl

Once you have ensured that you have loaded the module for the PGI Compilers you can compile your NAG program using

pgf90 your_code.f90 -lnag_mkl -I$NAGINC -o your_code.exe

which links to a version of the NAG library that’s linked against the high performance Intel MKL (which provides high-performance versions of the BLAS and LAPACK libraries). Alternatively, you can compile using

pgf90 your_code.f90 -lnag_nag -I$NAGINC -o your_code.exe

which is linked against a reference version of BLAS and LAPACK. If you are in any doubt as to which to choose, we suggest that you use -lnag_mkl

Running NAG’s example programs Most of NAG’s routines come with example programs that show how to use them. When you use the module command to load a version of the NAG library, the script nag_example for that version becomes available. Providing this script with the name of the NAG routine you are interested in will copy the example program for that routine into your current working directory, then compile and run it.


For example, here is an example output for the NAG routine a00aaf which identifies the version of the NAG library you are using. If you try this yourself, the output you get will vary according to which version of the NAG library you are using

nag_example a00aaf

If you have loaded the module for fll6i25dcl this will give the following output

Copying a00aafe.f90 to current directory
cp /usr/local/packages6/libs/intel/15/NAG/fll6i25dcl/examples/source/a00aafe.f90 .

Compiling and linking a00aafe.f90 to produce executable a00aafe.exe
ifort -I/usr/local/packages6/libs/intel/15/NAG/fll6i25dcl/nag_interface_blocks a00aafe.f90 /usr/local/packages6/libs/intel/15/NAG/fll6i25dcl/lib/libnag_nag.a -o a00aafe.exe

Running a00aafe.exe
./a00aafe.exe > a00aafe.r

A00AAF Example Program Results

*** Start of NAG Library implementation details ***

Implementation title: Linux 64 (Intel 64 / AMD64), Intel Fortran, Double Precision (32-bit integers)
Precision: FORTRAN double precision
Product Code: FLL6I25DCL
Mark: 25.1.20150610 (self-contained)

*** End of NAG Library implementation details ***

Functionality The key numerical and statistical capabilities of the Fortran Library are shown below.
• Click here for a complete list of the contents of the Library.
• Click here to see what’s new in Mark 25 of the library.

Numerical Facilities
• Optimization, both Local and Global
• Linear, quadratic, integer and nonlinear programming and least squares problems
• Ordinary and partial differential equations, and mesh generation
• Solution of dense, banded and sparse linear equations and eigenvalue problems
• Solution of linear and nonlinear least squares problems
• Curve and surface fitting and interpolation
• Special functions
• Numerical integration and integral equations
• Roots of nonlinear equations (including polynomials)
• Option Pricing Formulae
• Wavelet Transforms

Statistical Facilities
• Random number generation
• Simple calculations on statistical data
• Correlation and regression analysis


• Multivariate methods
• Analysis of variance and contingency table analysis
• Time series analysis
• Nonparametric statistics

Documentation
• The NAG Fortran MK25 Library Manual (Link to NAG’s website)
• The NAG Fortran MK24 Library Manual (Link to NAG’s website)

Installation notes fll6i25dcl These are primarily for system administrators

tar -xvzf ./fll6i25dcl.tgz
./install.sh

The installer is interactive. Answer the installer questions as follows Do you wish to install NAG Mark 25 Library? (yes/no): yes

License file gets shown [accept/decline]? : accept

Where do you want to install the NAG Fortran Library Mark 25? Press return for default location (/opt/NAG) or enter an alternative path. The directory will be created if it does not already exist. > /usr/local/packages6/libs/intel/15/NAG/

Module Files fll6i25dcl
• The module file is on the system at /usr/local/modulefiles/libs/intel/15/NAG/fll6i25dcl
• The module file is on github.

fll6a24dpl
• The module file is on the system at /usr/local/modulefiles/libs/pgi/15/NAG/fll6a24dpl

NetCDF (fortran)

NetCDF
Latest version 4.4.3
Dependencies mpi/gcc/openmpi/1.10.1 libs/gcc/4.4.7/openmpi/1.10.1/hdf5/1.8.16
URL http://www.unidata.ucar.edu/software/netcdf/
Documentation http://www.unidata.ucar.edu/software/netcdf/docs/


NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. NetCDF was developed and is maintained at Unidata. Unidata provides data and software tools for use in geoscience education and research. Unidata is part of the University Corporation for Atmospheric Research (UCAR) Community Programs (UCP). Unidata is funded primarily by the National Science Foundation.

Usage This version of NetCDF was compiled using version 4.4.7 of the gcc compiler, openmpi 1.10.1 and HDF5 1.8.16. To make it available, run the following module command after starting a qsh or qrsh session

module load libs/gcc/4.4.7/openmpi/1.10.1/netcdf-fortran/4.4.3

This module also loads the relevant modules for OpenMPI and HDF5. To see which modules have been loaded, use the command module list
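As a rough sketch, assuming a hypothetical Fortran source file write_demo.f90 of your own, and assuming that the nf-config helper shipped with netCDF-Fortran is on your PATH once the module is loaded (if it is not, pass the include and library paths to the compiler by hand):

module load libs/gcc/4.4.7/openmpi/1.10.1/netcdf-fortran/4.4.3
# write_demo.f90 is a placeholder for your own source; nf-config is assumed to be available
mpif90 write_demo.f90 $(nf-config --fflags) $(nf-config --flibs) -o write_demo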

Installation notes This section is primarily for administrators of the system.

Version 4.4.3
• install_gcc_netcdf-fortran_4.4.3.sh Install script
• Modulefile.

NetCDF (PGI build)

NetCDF
Latest version
Dependencies PGI Compiler 15.7, PGI OpenMPI 1.8.8, PGI HDF5 1.8.15-patch1
URL http://www.unidata.ucar.edu/software/netcdf/
Documentation http://www.unidata.ucar.edu/software/netcdf/docs/

NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. NetCDF was developed and is maintained at Unidata. Unidata provides data and software tools for use in geoscience education and research. Unidata is part of the University Corporation for Atmospheric Research (UCAR) Community Programs (UCP). Unidata is funded primarily by the National Science Foundation.

Usage This version of NetCDF was compiled using version 15.7 of the PGI Compiler, OpenMPI 1.8.8 and HDF5 1.8.15-patch1. To make it available, run the following module command after starting a qsh or qrsh session

module load libs/pgi/netcdf/4.3.3.1

This module also loads the relevant modules for OpenMPI, PGI Compiler and HDF5. To see which modules have been loaded, use the command module list

Installation notes This section is primarily for administrators of the system.

Version 4.3.3.1 Compiled using PGI 15.7, OpenMPI 1.8.8 and HDF5 1.8.15-patch1
• install_pgi_netcdf_4.3.3.1.sh Install script
• Install logs are on the system at /usr/local/packages6/libs/pgi/netcdf/4.3.3.1/install_logs


• Modulefile located on the system at /usr/local/modulefiles/libs/pgi/netcdf/4.3.3.1

pcre

pcre
Version 8.37
Support Level Bronze
Dependencies None
URL http://www.pcre.org/
Location /usr/local/packages6/libs/gcc/4.4.7/pcre/8.37

The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. PCRE has its own native API, as well as a set of wrapper functions that correspond to the POSIX regular expression API. The PCRE library is free, even for building proprietary software.

Usage To make this library available, run the following module command after starting a qsh or qrsh session.

module load libs/gcc/4.4.7/pcre/8.37

This also makes the updated pcregrep command available and will replace the system version. Check the version you are using with pcregrep -V

pcregrep -V
pcregrep version 8.37 2015-04-28
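As a rough sketch of both uses: the pcregrep line below prints every run of digits in a hypothetical file results.txt, and regex_demo.c is a placeholder for your own C source, compiled with the flags reported by the pcre-config helper that the module adds to your PATH.

module load libs/gcc/4.4.7/pcre/8.37
pcregrep -o '[0-9]+' results.txt                                # results.txt is a placeholder file
gcc regex_demo.c $(pcre-config --cflags --libs) -o regex_demo   # regex_demo.c is a placeholder source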

Installation notes This section is primarily for administrators of the system.

qrsh
tar -xvzf pcre-8.37.tar.gz
cd pcre-8.37
mkdir -p /usr/local/packages6/libs/gcc/4.4.7/pcre/8.37
./configure --prefix=/usr/local/packages6/libs/gcc/4.4.7/pcre/8.37

The configuration details were

pcre-8.37 configuration summary:

Install prefix ...... : /usr/local/packages6/libs/gcc/4.4.7/pcre/8.37
C preprocessor ...... : gcc -E
C compiler ...... : gcc
C++ preprocessor ...... : g++ -E
C++ compiler ...... : g++
Linker ...... : /usr/bin/ld -m elf_x86_64
C preprocessor flags ...... :
C compiler flags ...... : -g -O2 -fvisibility=hidden
C++ compiler flags ...... : -O2 -fvisibility=hidden -fvisibility-inlines-hidden
Linker flags ...... :
Extra libraries ...... :

Build 8 bit pcre library ...... : yes
Build 16 bit pcre library ...... : no
Build 32 bit pcre library ...... : no


Build C++ library ...... : yes
Enable JIT compiling support .... : no
Enable UTF-8/16/32 support ...... : no
Unicode properties ...... : no
Newline char/sequence ...... : lf
\R matches only ANYCRLF ...... : no
EBCDIC coding ...... : no
EBCDIC code for NL ...... : n/a
Rebuild char tables ...... : no
Use stack recursion ...... : yes
POSIX mem threshold ...... : 10
Internal link size ...... : 2
Nested parentheses limit ...... : 250
Match limit ...... : 10000000
Match limit recursion ...... : MATCH_LIMIT
Build shared libs ...... : yes
Build static libs ...... : yes
Use JIT in pcregrep ...... : no
Buffer size for pcregrep ...... : 20480
Link pcregrep with libz ...... : no
Link pcregrep with libbz2 ...... : no
Link pcretest with libedit ...... : no
Link pcretest with libreadline .. : no
Valgrind support ...... : no
Code coverage ...... : no

Once the configuration was complete, I did

make
make install

Testing was performed and all tests passed

make check

======
Testsuite summary for PCRE 8.37
======
# TOTAL: 5
# PASS: 5
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
======

Module File Module File Location: /usr/local/modulefiles/libs/gcc/4.4.7/pcre/8.37

#%Module1.0#####################################################################
##
## pcre 8.37 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##


proc ModulesHelp { } { puts stderr "Makes the pcre 8.37 library available" }

module-whatis "Makes the pcre 8.37 library available"

set PCRE_DIR /usr/local/packages6/libs/gcc/4.4.7/pcre/8.37

prepend-path LD_LIBRARY_PATH $PCRE_DIR/lib
prepend-path CPATH $PCRE_DIR/include
prepend-path PATH $PCRE_DIR/bin

PROJ.4

PROJ.4 Latest version 4.9.3 URL https://github.com/OSGeo/proj.4

PROJ.4 consists of programs and a library for managing cartographic projections.

Usage By running

$ module load libs/gcc/6.2/proj/4.9.3

you
• add several PROJ.4 programs to your PATH environment variable
• allow other programs to make use of (dynamically link against) the PROJ.4 library

You can run proj to test that you are running the required version

$ proj
Rel. 4.9.3, 15 August 2016
usage: proj [ -bCeEfiIlormsStTvVwW [args] ] [ +opts[=arg] ] [ files ]
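As a quick sketch of the cs2cs utility mentioned below, the following converts a single longitude/latitude pair (example values only) from WGS84 to UTM zone 30; the options used are standard PROJ.4 usage rather than anything specific to this installation.

module load libs/gcc/6.2/proj/4.9.3
# convert an example lon/lat pair to UTM zone 30 eastings/northings
echo "-1.48 53.38" | cs2cs +proj=longlat +datum=WGS84 +to +proj=utm +zone=30 +datum=WGS84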

Documentation Standard man pages are available for the provided commands/functions cs2cs, geod, proj, geodesic and pj_init. These can be viewed using e.g. $ man proj

Installation notes This section is primarily for administrators of the system.

Version 4.9.3 PROJ.4 4.9.3 was compiled with v6.2.0 of the GCC compiler suite.
1. cd to a scratch directory.
2. Download, build and install PROJ.4 using install_proj_4.9.3.sh. Files are installed into /usr/local/packages6/libs/gcc/6.2/proj/4.9.3/
3. Install this modulefile as /usr/local/modulefiles/libs/gcc/6.2/proj/4.9.3


SCOTCH

SCOTCH
Latest version 6.0.4
URL http://www.labri.fr/perso/pelegrin/scotch/
Location /usr/local/packages6/libs/gcc/5.2/scotch/6.0.4

Software package and libraries for sequential and parallel graph partitioning, static mapping and clustering, sequential mesh and hypergraph partitioning, and sequential and parallel sparse matrix block ordering.

Usage To make this library available, run the following module command

module load libs/gcc/5.2/scotch/6.0.4

Installation notes This section is primarily for administrators of the system. SCOTCH 6.0.4 was compiled with gcc 5.2 using the bash file install_scotch.sh

SimBody

SimBody
Version 3.5.3
Support Level Bronze
Dependencies compilers/gcc/4.8.2
URL https://simtk.org/home/simbody
Location /usr/local/packages6/libs/gcc/4.8.2/simbody/3.5.3

Usage To make this library available, run the following module command

module load libs/gcc/4.8.2/simbody

Installing This section is primarily for administrators of the system. Installed using the following procedure:

module load compilers/gcc/4.8.2
cmake ../simbody-Simbody-3.5.3/ -DCMAKE_INSTALL_PREFIX=/usr/local/packages6/libs/gcc/4.8.2/simbody/3.5.3
make -j 8

wxWidgets


wxWidgets Version 3.1.0 URL http://www.wxwidgets.org/

“wxWidgets is a C++ library that lets developers create applications for Windows, Mac OS X, Linux and other plat- forms with a single code base. It has popular language bindings for Python, Perl, Ruby and many other languages, and unlike other cross-platform toolkits, wxWidgets gives applications a truly native look and feel because it uses the platform’s native API rather than emulating the GUI. It’s also extensive, free, open-source and mature.”

Usage Activate wxWidgets using:

module load libs/gcc/4.9.2/wxWidgets/3.1.0

You can then check that wxWidgets is available using:

$ wx-config --version
3.1.0
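As a rough sketch, assuming a hypothetical source file hello_wx.cpp of your own, a wxWidgets program can typically be built with the flags reported by wx-config, which the module puts on your PATH:

module load libs/gcc/4.9.2/wxWidgets/3.1.0
# hello_wx.cpp is a placeholder for your own source file
g++ hello_wx.cpp $(wx-config --cxxflags) $(wx-config --libs) -o hello_wx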

Installation notes This section is primarily for system administrators. Version 3.1.0 This is a pre-requisite for ctffind 4.x. It was built using gcc 4.9.2. Unicode support was enabled.
1. Download, configure, compile and install using this script
2. Install this modulefile as /usr/local/modulefiles/libs/gcc/4.9.2/wxWidgets/3.1.0

zlib

zlib Version 1.2.8 URL http://zlib.net/ Location /usr/local/packages6/libs/gcc/4.4.7/zlib/1.2.8

A Massively Spiffy Yet Delicately Unobtrusive Compression Library.

Usage To make this library available, run the following module command

module load libs/gcc/4.4.7/zlib/1.2.8
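As a rough sketch, assuming a hypothetical source file compress_demo.c of your own: the module sets CPATH and LD_LIBRARY_PATH but not LIBRARY_PATH, so the library directory is passed to the linker explicitly below.

module load libs/gcc/4.4.7/zlib/1.2.8
# compress_demo.c is a placeholder for your own source file
gcc compress_demo.c -L/usr/local/packages6/libs/gcc/4.4.7/zlib/1.2.8/lib -lz -o compress_demo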

Installation notes This section is primarily for administrators of the system. It was built with gcc 4.4.7

#!/bin/bash
install_dir=/usr/local/packages6/libs/gcc/4.4.7/zlib/1.2.8
wget http://zlib.net/zlib-1.2.8.tar.gz


tar -xvzf ./zlib-1.2.8.tar.gz
cd zlib-1.2.8
mkdir -p $install_dir

./configure --prefix=$install_dir
make
make install

Testing The library was tested with make check. The results were

make check
hello world
zlib version 1.2.8 = 0x1280, compile flags = 0xa9
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek: hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
*** zlib test OK ***
hello world
zlib version 1.2.8 = 0x1280, compile flags = 0xa9
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek: hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
*** zlib shared test OK ***
hello world
zlib version 1.2.8 = 0x1280, compile flags = 0xa9
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek: hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
*** zlib 64-bit test OK ***

Module File Module location is /usr/local/modulefiles/libs/gcc/4.4.7/zlib/1.2.8. Module contents

#%Module1.0#####################################################################
##
## zlib 1.2.8 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        puts stderr "Makes the zlib 1.2.8 library available"


}

module-whatis "Makes the zlib 1.2.8 library available"

set ZLIB_DIR /usr/local/packages6/libs/gcc/4.4.7/zlib/1.2.8

prepend-path LD_LIBRARY_PATH $ZLIB_DIR/lib
prepend-path CPATH $ZLIB_DIR/include
prepend-path MANPATH $ZLIB_DIR/share/man

Development Tools on Iceberg

CMake

CMake is a build tool commonly used when compiling other libraries. CMake is installed in /usr/local/packages6/cmake.

Usage CMake can be loaded with:

module load compilers/cmake
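As a rough sketch of a typical out-of-source CMake build (my_project and the install prefix are placeholders, not anything installed on the system):

module load compilers/cmake
cd my_project                                  # your own source tree containing a CMakeLists.txt
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/local    # configure, installing under your home directory
make -j 4
make install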

Installation Run the following commands:

module load apps/python/2.7

./bootstrap --prefix=/usr/local/packages6/cmake/3.3.0/ --mandir=/usr/local/packages6/cmake/3.3.0/man --sphinx-man
gmake -j8
gmake install

GNU Compiler Collection (gcc)

The GNU Compiler Collection (gcc) is a widely used, free collection of compilers including C (gcc), C++ (g++) and Fortran (gfortran). The default version of gcc on the system is 4.4.7

gcc -v

Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)

It is possible to switch to other versions of the gcc compiler suite using modules. After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command. Choose the version of the compiler you wish to use using one of the following commands

module load compilers/gcc/6.2
module load compilers/gcc/5.4
module load compilers/gcc/5.3
module load compilers/gcc/5.2


module load compilers/gcc/4.9.2
module load compilers/gcc/4.8.2
module load compilers/gcc/4.5.3

Alternatively load the most recent available version using module load compilers/gcc

Confirm that you’ve loaded the version of gcc you wanted using gcc -v.

Documentation man pages are available on the system. Once you have loaded the required version of gcc, type man gcc

• What’s new in the gcc version 6 series? • What’s new in the gcc version 5 series?

Installation Notes These notes are primarily for system administrators
• gcc version 6.2 was installed using:
  – install_gcc_6.2.sh
  – gcc 6.2 modulefile located on the system at /usr/local/modulefiles/compilers/gcc/6.2
• gcc version 5.4 was installed using:
  – install_gcc_5.4.sh
  – gcc 5.4 modulefile located on the system at /usr/local/modulefiles/compilers/gcc/5.4
• Installation notes for version 5.3 are not available.
• gcc version 5.2 was installed using:
  – install_gcc_5.2.sh
  – gcc 5.2 modulefile located on the system at /usr/local/modulefiles/compilers/gcc/5.2
• gcc version 4.9.2 was installed using:
  – install_gcc_4.9.2.sh
  – gcc 4.9.2 modulefile located on the system at /usr/local/modulefiles/compilers/gcc/4.9.2
• Installation notes for versions 4.8.2 and below are not available.

git

git
Latest version 2.5
Dependencies gcc 5.2
URL https://git-scm.com/

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.


Usage An old version of git is installed as part of the system’s operating system. As such, it is available everywhere, including on the log-in nodes

$ git --version
git version 1.7.1

This was released in April 2010. We recommend that you load the most up to date version using modules - something that can only be done after starting an interactive qrsh or qsh session

module load apps/gcc/5.2/git/2.5

Installation notes Version 2.5 of git was installed using gcc 5.2 using the following install script and module file:
• install_git_2.5.sh
• git 2.5 modulefile located on the system at /usr/local/modulefiles/compilers/git/2.5

Intel Compilers

The Intel compilers help create C, C++ and Fortran applications that can take full advantage of the advanced hardware capabilities available in Intel processors and co-processors. They also simplify that development by providing high level parallel models and built-in features like explicit vectorization and optimization reports. Several versions of the Intel compilers were installed on this cluster as part of Intel Parallel Studio but can be used with or without the other Parallel Studio components.

Usage and compilation examples After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To make one of the versions of the Intel compilers available, run one of the following module commands

module load compilers/intel/17.0.0
module load compilers/intel/15.0.3
module load compilers/intel/14.0
module load compilers/intel/12.1.15
module load compilers/intel/11.0

C To compile the C hello world example into an executable called hello using the Intel C compiler: icc hello.c -o hello

C++ To compile the C++ hello world example into an executable called hello using the Intel C++ compiler: icpc hello.cpp -o hello

Fortran To compile the Fortran hello world example into an executable called hello using the Intel Fortran compiler: ifort hello.f90 -o hello


Detailed Documentation Once you have loaded the module on Iceberg, man pages are available for Intel compiler products

man ifort
man icc

The following links are to Intel’s website: • User and Reference Guide for the Intel® C++ Compiler 17.0 • User and Reference Guide for the Intel® Fortran Compiler 17.0 • Step by Step optimizing with Intel C++ Compiler

Licensing and availability See the information on Parallel Studio licensing.

Related Software on the system TODO: add link to NAG Fortran Library (serial)

Installation Notes The following notes are primarily for system administrators.

Version 17.0.0 Installed as part of Parallel Studio Composer Edition 2017. This modulefile was installed as /usr/local/modulefiles/compilers/intel/17.0.0.

Version 15.0.3
• The install is located on the system at /usr/local/packages6/compilers/intel/2015/
• The license file is at /usr/local/packages6/compilers/intel/license.lic
• The environment variable INTEL_LICENSE_FILE is set by the environment module and points to the license file location
• Download the files l_ccompxe_2015.3.187.tgz (C/C++) and l_fcompxe_2015.3.187.tgz (Fortran) from the Intel Portal.
• Put the above .tgz files in the same directory as install_intel15.sh and silent_master.cfg
• Run install_intel15.sh
• To find what was required in the module file:

env > base.env
source /usr/local/packages6/compilers/intel/2015/composer_xe_2015.3.187/bin/compilervars.sh intel64
env > after_intel.env
diff base.env after_intel.env

• The module file is on iceberg at /usr/local/modulefiles/compilers/intel/15.0.3

Versions 14 and below Installation notes are not available for these older versions of the Intel compilers.


Intel Parallel Studio

Intel Parallel Studio XE is a software development suite that helps boost application performance by taking advantage of the ever-increasing processor core count and vector register width available in Intel Xeon processors and other compatible processors. Its core components are compilers and libraries for fast linear algebra, data transformation and parallelisation.

Composer Edition 2017 This includes:
• Intel C++ and Fortran compilers: these help create C, C++ and Fortran applications that “can take full advantage of the advanced hardware capabilities available in Intel processors and co-processors. They also simplify that development by providing high level parallel models and built-in features like explicit vectorization and optimization reports”.
• Data Analytics Acceleration Library (DAAL): functions for data analysis (characterization, summarization, and transformation) and Machine Learning (regression, classification, and more).
• Math Kernel Library (MKL): This library provides highly optimized, threaded and vectorized functions to maximize performance on each processor family. Utilizes de facto standard C and Fortran APIs for compatibility with BLAS, LAPACK and FFTW functions from other math libraries.
• Integrated Performance Primitives (IPP): “high-quality, production-ready, low-level building blocks for image processing, signal processing, and data processing (data compression/decompression and cryptography) applications.”
• Threading Building Blocks (TBB): lets you write “parallel C++ programs that take full advantage of multicore performance, that are portable and composable, and that have future-proof scalability.”

It does not include Intel’s MPI implementation. See Intel’s site for further details of what Parallel Studio Composer Edition includes.

Licensing and availability Some components of Parallel Studio are freely available for those only wanting commu- nity support; other components, such as the compilers are commercial software. The University has purchased a number of Intel-supported licenses for the Parallel Studio Composer Edition products listed above. These can be accessed by (and are shared between) the University’s HPC clusters and researchers’ own machines.

Installation Notes The following notes are primarily for system administrators.

Parallel Studio XE Composer Edition 2017.0
1. Download parallel_studio_xe_2017_composer_edition.tgz from the Intel Portal, save in /usr/local/media/protected/intel/2017.0/ then make it only readable by the app-admins group.
2. Ensure details of the Intel license server are in the file /usr/local/packages6/compilers/intel/license.lic
3. Run this script. This installs Parallel Studio into /usr/local/packages6/compilers/intel-pe-xe-ce/2017.0/binary/. Products are activated using the aforementioned license file during the installation process.
4. Install several modulefiles to locations under /usr/local/modulefiles. These modulefiles set the INTEL_LICENSE_FILE environment variable to the location of the aforementioned license file and set other environment variables required by the different Parallel Studio products. There is one modulefile for all Parallel Studio software and other modulefiles for specific products.
   • The Compilers modulefile should be installed as /usr/local/modulefiles/compilers/intel/17.0.0.
   • The DAAL modulefile should be installed as /usr/local/modulefiles/libs/binlibs/intel-daal/2017.0.


   • The IPP modulefile should be installed as /usr/local/modulefiles/libs/binlibs/intel-ipp/2017.0.
   • The MKL modulefile should be installed as /usr/local/modulefiles/libs/binlibs/intel-mkl/2017.0.
   • The TBB modulefile should be installed as /usr/local/modulefiles/libs/binlibs/intel-tbb/2017.0.
   • See the (TCL) modulefiles for details of how they were derived from Intel-supplied environment-manipulating shell scripts.
5. Check that licensing is working by activating the Intel Compilers modulefile then try compiling a trivial C program using the icc compiler.

NAG Fortran Compiler

The NAG Fortran Compiler is robust, highly tested, and valued by developers all over the globe for its checking capa- bilities and detailed error reporting. The Compiler is available on Unix platforms as well as for Microsoft Windows and Apple Mac platforms. Release 6.0 has extensive support for both legacy and modern Fortran features, and also supports parallel programming with OpenMP.

Making the NAG Compiler available After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To make a version of the NAG Fortran Compiler available, run one of the following module commands:

module load compilers/NAG/6.1
module load compilers/NAG/6.0

Compilation examples To compile the Fortran hello world example into an executable called hello using the NAG compiler nagfor hello.f90 -o hello

Detailed Documentation Once you’ve run one of the NAG Compiler module commands, man documentation is available:

man nagfor

Online documentation: • PDF version of the NAG Fortran Compiler Manual • NAG Fortran Compiler Documentation Index (NAG’s Website) • Differences between different versions can be found in the Release Notes.

Installation Notes The following notes are primarily for system administrators: Licensing Add the license key(s) to /usr/local/packages5/nag/license.lic. The license key needs to be updated annually before 31st July. The NAG compiler environment modules (see below) set the environment variable $NAG_KUSARI_FILE to the path to this file. Version 6.1


1. Perform an unattended install using this script. The software will be installed into /usr/local/packages6/compilers/NAG/6.1.
2. Install this modulefile as /usr/local/modulefiles/compilers/NAG/6.1
3. Test the installation by compiling and building a sample Fortran 90 program

module load compilers/NAG/6.1
nagfor -o /tmp/f90_util /usr/local/packages6/compilers/NAG/6.1/lib/NAG_Fortran/f90_util.f90
/tmp/f90_util

Version 6.0 Run the following interactively:

mkdir -p /usr/local/packages6/compilers/NAG/6.0
tar -xvzf ./npl6a60na_amd64.tgz
cd NAG_Fortran-amd64/

Run the interactive install script ./INSTALL.sh

Accept the license and answer the questions as follows: Install compiler binaries to where [/usr/local/bin]? /usr/local/packages6/compilers/NAG/6.0

Install compiler library files to where [/usr/local/lib/NAG_Fortran]? /usr/local/packages6/compilers/NAG/6.0/lib/NAG_Fortran

Install compiler man page to which directory [/usr/local/man/man1]? /usr/local/packages6/compilers/NAG/6.0/man/man1

Suffix for compiler man page [1] (i.e. nagfor.1)? Press enter

Install module man pages to which directory [/usr/local/man/man3]? /usr/local/packages6/compilers/NAG/6.0/man/man3

Suffix for module man pages [3] (i.e. f90_gc.3)? Press Enter

Install this modulefile as /usr/local/modulefiles/compilers/NAG/6.0

Finally, test the installation by compiling and building a sample Fortran 90 program

module load compilers/NAG/6.0
nagfor -o /tmp/f90_util /usr/local/packages6/compilers/NAG/6.0/lib/NAG_Fortran/f90_util.f90
/tmp/f90_util

PGI Compilers

The PGI Compiler suite offers C, C++ and Fortran compilers. For full details of the features of this compiler suite, see PGI’s website.

Making the PGI Compilers available After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To make one of the versions of the PGI Compiler Suite available, run one of the following module commands


module load compilers/pgi/15.10
module load compilers/pgi/15.7
module load compilers/pgi/14.4
module load compilers/pgi/13.1

Once you’ve loaded the module, you can check the version with

pgcc --version

Compilation examples C To compile a C hello world example into an executable called hello using the PGI C compiler

pgcc hello.c -o hello

C++ To compile a C++ hello world example into an executable called hello using the PGI C++ compiler

pgc++ hello.cpp -o hello

Fortran To compile a Fortran hello world example into an executable called hello using the PGI Fortran compiler

pgf90 hello.f90 -o hello

Additional resources • Using the PGI Compiler with GPUs on Iceberg

Installation Notes Version 15.10 The interactive installer was slightly different to that of 15.7 (below) but the questions and answers were essentially the same.

Version 15.7 The installer is interactive. Most of the questions are obvious. Here is how I answered the rest

Installation type A network installation will save disk space by having only one copy of the compilers and most of the libraries for all compilers on the network, and the main installation needs to be done once for all systems on the network.

1 Single system install 2 Network install

Please choose install option: 1

Path Please specify the directory path under which the software will be installed. The default directory is /opt/pgi, but you may install anywhere you wish, assuming you have permission to do so.

Installation directory? [/opt/pgi] /usr/local/packages6/compilers/pgi


CUDA and AMD components

Install CUDA Toolkit Components? (y/n) y
Install AMD software components? (y/n) y

ACML version

This PGI version links with ACML 5.3.0 by default. Also available:
(1) ACML 5.3.0
(2) ACML 5.3.0 using FMA4
Enter another value to override the default (1) 1

Other questions

Install JAVA JRE [yes] yes
Install OpenACC Unified Memory Evaluation package? (y/n) n
Do you wish to update/create links in the 2015 directory? (y/n) y
Do you wish to install MPICH? (y/n) y
Do you wish to generate license keys? (y/n) n
Do you want the files in the install directory to be read-only? (y/n) n

The license file is on the system at /usr/local/packages6/compilers/pgi/license.dat and is a 5 seat network license. Licenses are only used at compile time.

Extra install steps

Unlike gcc, the PGI Compilers do not recognise the environment variable LIBRARY_PATH, which is used by a lot of installers to specify the locations of libraries at compile time. This is fixed by creating a siterc file at /usr/local/packages6/compilers/pgi/linux86-64/VER/bin/siterc with the following contents:

# get the value of the environment variable LIBRARY_PATH
variable LIBRARY_PATH is environment(LD_LIBRARY_PATH);
variable inc_path is environment(CPATH);

# split this value at colons, separate by -L, prepend 1st one by -L
variable library_path is default($if($LIBRARY_PATH,-L$replace($LIBRARY_PATH,":", -L)));

# add the -L arguments to the link line
append LDLIBARGS=$library_path;
append SITEINC=$inc_path;

where VER is the version number in question: 15.7, 15.10, etc. At the time of writing (August 2015), this is documented on PGI’s website.

Modulefile

Version 15.10 The PGI compiler installer creates a suitable modulefile that’s configured to our system. It puts it at /usr/local/packages6/compilers/pgi/modulefiles/pgi64/15.10, so all that is required is to copy this to where we keep modules, at /usr/local/modulefiles/compilers/pgi/15.10.

Version 15.7 The PGI compiler installer creates a suitable modulefile that’s configured to our system. It puts it at /usr/local/packages6/compilers/pgi/modulefiles/pgi64/15.7, so all that is required is to copy this to where we keep modules, at /usr/local/modulefiles/compilers/pgi/15.7.


MPI

OpenMPI (gcc version)

OpenMPI (gcc version)
Latest Version: 1.10.1
Dependencies: gcc
URL: http://www.open-mpi.org/

The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.

Module files

The latest version is made available using:

module load mpi/gcc/openmpi

Alternatively, you can load a specific version using one of:

module load mpi/gcc/openmpi/1.10.1
module load mpi/gcc/openmpi/1.10.0
module load mpi/gcc/openmpi/1.8.8
module load mpi/gcc/openmpi/1.8.3
module load mpi/gcc/openmpi/1.6.4
module load mpi/gcc/openmpi/1.4.4
module load mpi/gcc/openmpi/1.4.3
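Once one of these modules is loaded, the OpenMPI compiler wrapper and launcher become available. As a minimal sketch (the source file name hello.c and the process count are illustrative assumptions; run this from an interactive or batch job rather than a login node):

# Compile an MPI program with the gcc-based wrapper
mpicc hello.c -o hello

# Run it on 4 processes
mpirun -np 4 ./hello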

Installation notes

These are primarily for administrators of the system.

Version 1.10.1 Compiled using gcc 4.4.7.

• install_openMPI_1.10.1.sh Downloads, compiles and installs OpenMPI 1.10.1 using the system gcc.
• Modulefile 1.10.1 located on the system at /usr/local/modulefiles/mpi/gcc/openmpi/1.10.1

Version 1.10.0 Compiled using gcc 4.4.7.

• install_openMPI_1.10.0.sh Downloads, compiles and installs OpenMPI 1.10.0 using the system gcc.
• Modulefile 1.10.0 located on the system at /usr/local/modulefiles/mpi/gcc/openmpi/1.10.0

Version 1.8.8 Compiled using gcc 4.4.7.

• install_openMPI.sh Downloads, compiles and installs OpenMPI 1.8.8 using the system gcc.
• Modulefile located on the system at /usr/local/modulefiles/mpi/gcc/openmpi/1.8.8


OpenMPI (PGI version)

OpenMPI (PGI version)
Latest Version: 1.8.8
Support Level: FULL
Dependencies: PGI
URL: http://www.open-mpi.org/

The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers. These versions of OpenMPI make use of the PGI compiler suite.

Module files

The latest version is made available using:

module load mpi/pgi/openmpi

Alternatively, you can load a specific version using one of:

module load mpi/pgi/openmpi/1.8.8
module load mpi/pgi/openmpi/1.6.4

Installation notes

These are primarily for administrators of the system.

Version 1.8.8 Compiled using PGI 15.7.

• install_openMPI.sh Downloads, compiles and installs OpenMPI 1.8.8 using v15.7 of the PGI Compiler.
• Modulefile located on the system at /usr/local/modulefiles/mpi/pgi/openmpi/1.8.8

Installation notes for older versions are not available.

There are many versions of applications, libraries and compilers installed on the iceberg cluster. In order to avoid conflicts between these software items we employ a system called modules. Having logged onto a worker node, users are advised to select the software (and the version of the software) they intend to use with the module command; a basic example is sketched after the list below.

• Modules on Iceberg
• Applications on Iceberg
• Development Tools on Iceberg (including compilers)
• Libraries
• Scheduler Commands for interacting with the scheduler
• MPI Parallel Programming
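As a brief illustration of the modules system (the module name used here is just one of those documented above; use module avail to see everything installed):

# List the modules available on the cluster
module avail

# Load a particular module, then confirm what is loaded
module load mpi/gcc/openmpi
module list

# Unload it when you no longer need it
module unload mpi/gcc/openmpi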

2.2.2 Accessing Iceberg


Note: If you have not connected to iceberg before the Getting Started documentation will guide you through the process.

Below is a list of the various ways to connect to or use iceberg:

Terminal Access

Warning: This page is a stub and needs writing. Please send a PR.

JupyterHub

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. It is an excellent interface to explore your data and share your research. The JupyterHub is a web interface for running notebooks on the iceberg cluster. It allows you to log in and have a notebook server started up on the cluster, with access to both your data stores and the resources of iceberg.

Warning: This service is currently experimental. If you use this service and encounter a problem, please provide feedback to research-it@sheffield.ac.uk.

Logging into the JupyterHub

To get started, visit https://jupyter.shef.ac.uk and log in with your university account. A notebook session will be submitted to the iceberg queue once you have logged in; this can take a minute or two, so the page may seem to be loading for some time.

Note: There is currently no way of specifying any options to the queue when submitting the server job, so you cannot increase the amount of memory assigned or change the queue the job runs in. There is an ongoing project to add this functionality.

Using the Notebook on iceberg

To create a new notebook session, select the “New” menu on the right hand side of the top menu.


You will be presented with a menu showing all available conda environments that have Jupyter available. To learn more about installing custom Python environments with conda see Python. To learn how to use the Jupyter Notebook see the official documentation.

Using the Notebook Terminal

In the “New” menu it is possible to start a terminal. This is a fully featured web terminal running on a worker node. You can use it to perform any command-line-only operation on iceberg, including configuring new Python environments and installing packages.

Troubleshooting

Below is a list of common problems:

1. If you modify the PYTHONPATH variable in your .bashrc file your Jupyter server may not start correctly, and you may encounter a 503 error after logging into the hub. The solution is to remove these lines from your .bashrc file.
2. If you have previously tried installing and running Jupyter yourself (i.e. not using this JupyterHub interface) then you may get 503 errors when connecting to JupyterHub due to the old .jupyter profile in your home directory; if you then find a jupyter log file in your home directory containing SSL WRONG_VERSION_NUMBER then try deleting (or renaming) the .jupyter directory in your home directory (see the sketch below).
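For the second problem, a minimal way to rename the old profile directory out of the way from a terminal on iceberg is shown below; the .bak suffix is just an illustrative choice.

# Rename the old Jupyter profile directory so a fresh one is created
mv ~/.jupyter ~/.jupyter.bak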

Remote Visualisation

Warning: This page is a stub and needs writing. Please send a PR.

2.2.3 Filestore on Iceberg

Every user on the system has access to three different types of filestore. They differ in terms of the amount of space available, the speed of the underlying storage system, frequency of backup and the time that the data can be left there. Here are the current details of filestore available to each user.

Home directory

All users have a home directory in the location /home/yourusername. The filestore quota is 10 GB per user. Backup policy: /home has backup snapshots taken every 4 hours and we keep the 10 most recent. /home also has daily snapshots taken each night, and we keep 28 days worth, mirrored onto a separate storage system.


The filesystem is NFS.

Data directory

Every user has access to a much larger data-storage area provided at the location /data/yourusername. The quota for this area is 100 GB per user.

Backup policy: /data has snapshots taken every 4 hours and we keep the 10 most recent. /data also has daily snapshots taken each night, and we keep 7 days worth, but this is not mirrored.

The filesystem is NFS.

Note: the directory /data/yourusername is made available to you (mounted) on demand: if you list the contents of /data after first logging on then this subdirectory might not be shown. However, if you list the contents of /data/yourusername itself or change into that directory then its contents will appear. Later on, if you list the contents of /data again, you may find that /data/yourusername has disappeared again, as it is automatically unmounted following a period of inactivity.

Fastdata directory

All users also have access to a large fast-access data storage area under /fastdata. In order to avoid interference from other users’ files it is vitally important that you store your files in a directory created and named the same as your username, e.g.

mkdir /fastdata/yourusername

By default the directory you create will have world-read access. If you want to restrict read access to just your account then run:

chmod 700 /fastdata/yourusername

after creating the directory. A more sophisticated sharing scheme would have private and public directories:

mkdir /fastdata/yourusername
mkdir /fastdata/yourusername/public
mkdir /fastdata/yourusername/private

chmod 755 /fastdata/yourusername
chmod 755 /fastdata/yourusername/public
chmod 700 /fastdata/yourusername/private

The fastdata area provides 260 Terabytes of storage in total and takes advantage of the internal infiniband network for fast access to data. Although /fastdata is available on all the worker nodes, only by accessing it from the Intel-based nodes can you benefit from these speed improvements. There are no quota controls on the /fastdata area, but files older than 3 months will be automatically deleted without warning. We reserve the right to change this policy without warning in order to ensure efficient running of the service. You can use the lfs command to find out which files under /fastdata are older than a certain number of days and hence approaching the time of deletion. For example, to find files 50 or more days old:

lfs find -ctime +50 /fastdata/yourusername
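Building on that command, a minimal sketch for checking which of your files are at risk and copying anything important somewhere backed up before it is removed (the directory names are illustrative assumptions):

# List files under your fastdata directory that are 50 or more days old
lfs find -ctime +50 /fastdata/yourusername -type f

# Copy anything you want to keep to your /data area before it is deleted
cp -a /fastdata/yourusername/important_results /data/yourusername/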


/fastdata uses the Lustre filesystem. This does not support POSIX locking, which can cause issues for some applications. /fastdata is optimised for large file operations and does not handle lots of small files very well. An example of how slow it can be for large numbers of small files is detailed at http://www.walkingrandomly.com/?p=6167

Shared directories

When you purchase an extra filestore from CiCS you should be informed of its name. Once you know this you can access it:

• as a Windows-style (SMB) file share on machines other than Iceberg using \\uosfstore.shef.ac.uk\shared\
• as a subdirectory of /shared on Iceberg. Note that this subdirectory will be mounted on demand on Iceberg: it will not be visible if you simply list the contents of the /shared directory but will be accessible if you cd (change directory) into it, e.g. cd /shared/my_group_file_share1

A note regarding permissions: behind the scenes, the file server that provides this shared storage manages permissions using Windows-style ACLs (which can be set by area owners via the Group Management web interface). However, the filesystem is mounted on a Linux cluster using NFSv4, so the file server requires a means for mapping Windows-style permissions to Linux ones. An effect of this is that the Linux mode bits as seen on Iceberg are not always to be believed for files under /shared: the output of ls -l somefile.sh may indicate that a file is readable/writable/executable when the ACLs are what really determine access permissions. Most applications have robust ways of checking for properties such as executability, but some applications can cause problems when accessing files/directories on /shared by naively checking permissions using only the Linux mode bits:

• which: a directory under /shared may be on your path and you may be able to run a contained executable without prefixing it with an absolute/relative directory, but which may fail to find that executable.
• Perl: scripts that check for executability of files on /shared using -x may fail unless Perl is explicitly told to test for file permissions in a more thorough way (see the mention of use filetest ’access’ here).

Determining your current filestore allocation

To find out your current filestore quota allocation and usage type quota.

If you exceed your file storage allocation

As soon as the quota is exceeded your account becomes frozen. In order to avoid this situation it is strongly recommended that you:

• Use the quota command to check your usage regularly (see the sketch below).
• Copy files that do not need to be backed up to the /data/username area, or remove them from iceberg completely.
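One illustrative way to see what is taking up space in your home directory before you hit the limit, using standard tools:

# Check your current quota and usage
quota

# Show the size of each item in your home directory, largest last
du -sh ~/* | sort -h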

Efficiency considerations - The /scratch areas

For jobs requiring a lot of Input and Output (I/O), it may sometimes be necessary to store copies of the data on the actual compute node on which your job is running. For this, you can create temporary areas of storage under the directory /scratch. The /scratch area is local to each worker node and is not visible to the other worker nodes or to the head-nodes. Therefore any data created by jobs should be transferred to either your /data or /home area before the job finishes if you wish to keep them.


The next best I/O performance that requires the minimum amount of work is achieved by keeping your data in the /fastdata area and running your jobs on the new intel nodes by specifying -l arch=intel in your job submission script. These methods provide much faster access to data than the network attached storage on either the /home or /data areas, but you must remember to copy important data back onto your /home area. If you decide to use the /scratch area we recommend that under /scratch you create a directory with the same name as your username and work under that directory to avoid the possibility of clashing with other users; a sketch of this pattern is given below. Anything under /scratch is deleted periodically when the worker-node is idle, whereas files on the /fastdata area will be deleted only when they are 3 months old. /scratch uses the ext4 filesystem.
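The following job-script sketch illustrates the /scratch pattern described above. The program name, input file and memory request are illustrative assumptions only:

#!/bin/bash
#$ -l rmem=4G

# Create a per-user, per-job working area on the node-local /scratch disk
SCRATCH_DIR=/scratch/$USER/$JOB_ID
mkdir -p $SCRATCH_DIR

# Copy input data from network storage to the local disk
cp /data/$USER/input.dat $SCRATCH_DIR/

# Run the (assumed) program, reading and writing on the local disk
cd $SCRATCH_DIR
$HOME/myprog.exe input.dat > results.out

# Copy results back to backed-up storage before the job ends
cp results.out /data/$USER/

# Tidy up the scratch area
rm -rf $SCRATCH_DIR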

Recovering snapshots

We take regular back-ups of your /home and /data directories and it is possible to directly access a limited subset of them. There are 7 days worth of snapshots available in your /home and /data directories in a hidden directory called .snapshot. You need to explicitly cd into this directory to get at the files:

cd /home/YOURUSERNAME/.snapshot

The files are read-only. This allows you to attempt to recover any files you might have accidentally deleted recently. This does not apply to /fastdata, for which we take no back-ups.
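A minimal sketch of recovering a file from a snapshot; the snapshot directory name and the file name are illustrative, so list the .snapshot directory first to see what is actually available:

# See which snapshots are available
ls /home/$USER/.snapshot/

# Copy a deleted file back from one of the snapshots (names are examples)
cp /home/$USER/.snapshot/some_snapshot_name/myfile.txt ~/myfile.txt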

2.2.4 Iceberg specifications

• Total CPUs: 3440 cores
• Total GPUs: 16 units
• Total Memory: 31.8 TBytes
• Permanent Filestore: 45 TBytes
• Temporary Filestore: 260 TBytes
• Physical size: 8 Racks
• Maximum Power Consumption: 83.7 KWatts
• All nodes are connected via fast infiniband.

For reliability, there are two iceberg head-nodes ‘for logging in’ configured to take over from each other in case of hardware failure.

Worker Nodes CPU Specifications

Intel Ivybridge based nodes

• 92 nodes, each with 16 cores and 64 GB of total memory (i.e. 4 GB per core).
• 4 nodes, each with 16 cores and 256 GB of total memory (i.e. 16 GB per core).
• Each node uses 2 Intel E5 2650V2 8-core processors (hence 2*8=16 cores).
• Scratch space on local disk on each node is 400 GB.


Intel Westmere based nodes

• 103 nodes, each with 12 cores and 24 GB of total memory (i.e. 2 GB per core)
• 4 nodes with 12 cores and 48 GB of total memory (i.e. 4 GB per core)
• Each node uses 2 Intel X5650 6-core processors (hence 2*6=12 cores)

GPU Units Specifications

8 Nvidia Tesla Kepler K40Ms GPU units

• Each GPU unit contains 2880 thread processor cores
• Each GPU unit has 12GB of GDR memory. Hence total GPU memory is 8*12=96 GB
• Each GPU unit is capable of about 4.3TFlops of single precision floating point performance, or 1.4TFlops at double precision.

8 Nvidia Tesla Fermi M2070s GPU units

• Each GPU unit contains 448 thread processor cores
• Each GPU unit contains 6GB of GDR memory. Hence total GPU memory is 8*6=48 GB
• Each GPU unit is capable of about 1TFlop of single precision floating point performance, or 0.5TFlops at double precision.

Software and Operating System

Users normally log into a head node and then use one (or more) of the worker nodes to run their jobs on. Scheduling of users’ jobs on the worker nodes is managed by the ‘Sun Grid Engine’ software. Jobs can be run interactively (qsh) or submitted as batch jobs (qsub).

• The operating system is 64-bit Scientific Linux (which is Redhat based) on all nodes
• The Sun Grid Engine for batch and interactive job scheduling
• Many Applications, Compilers, Libraries and Parallel Processing Tools. See Software on iceberg

2.3 ShARC

The University of Sheffield’s new HPC service, ShARC, is currently under development and is not yet ready for users to access. Please use our current system, Iceberg, for now.

2.3.1 Software on ShARC

These pages list the software available on ShARC. If you notice an error or an omission, or wish to request new software, please create an issue on our GitHub page.

Applications on Sharc

R


R
URL: http://www.r-project.org/
Documentation: http://www.r-project.org/

R is a statistical computing language.

Interactive Usage

After connecting to ShARC, start an interactive session with the qrshx command. The latest version of R can be loaded with:

module load apps/R

Alternatively, you can load a specific version of R using one of the following:

module load apps/R/3.3.2/gcc-4.8.5

R can then be run with $ R

Serial (one CPU) Batch usage

Here, we assume that you wish to run the program my_code.r on the system. With batch usage it is recommended to load a specific version of R, for example module load apps/R/3.3.2, to ensure the expected output is achieved. First, you need to write a batch submission file. We assume you’ll call this my_job.sge:

#!/bin/bash
#$ -cwd                            # Run job from current directory
#$ -l rmem=4G                      # Request 4 gigabytes of memory

module load apps/R/3.3.2/gcc-4.8.5 # Recommended to load a specific version of R

R CMD BATCH my_code.r my_code.r.o$JOB_ID

Note that R must be called with both the CMD and BATCH options, which tell it to run an R program, in this case my_code.r. If you do not do this, R will attempt to open an interactive prompt. The final argument, my_code.r.o$JOB_ID, tells R to send output to a file with this name. Since $JOB_ID will always be unique, this ensures that all of your output files are unique. Without this argument R sends all output to a file called my_code.Rout. Ensuring that my_code.r and my_job.sge are both in your current working directory, submit your job to the batch system:

qsub my_job.sge

Replace my_job.sge with the name of your submission script.

Graphical output By default, graphical output from batch jobs is sent to a file called Rplots.pdf

Installing additional packages

As you will not have permissions to install packages to the default folder, additional R packages can be installed to your home folder ~/. To create the appropriate folder, install your first package in R in interactive mode. Load an interactive R session as described above, and install a package with:


install.packages()

You will be prompted to create a personal package library. Choose yes. The package will download and install from a CRAN mirror (you may be asked to select a nearby mirror, which you can do simply by entering the number of your preferred mirror). Once the chosen package has been installed, additional packages can be installed either in the same way, or by creating a .R script. An example script might look like:

install.packages("dplyr")
install.packages("devtools")

Call this using source(). For example, if your script is called packages.R and is stored in your home folder, source this from an interactive R session with:

source("~/packages.R")

These additional packages will be installed without prompting to your personal package library. To check your packages are up to date, and update them if necessary, run the following line from an R interactive session:

update.packages(lib.loc="~/R/x86_64-unknown-linux-gnu-library/3.3/")

The folder name after ~/R/ will likely change, but this can be completed with tab autocompletion from the R session. Ensure the lib.loc folder is specified, or R will attempt to update the wrong library.

Using the Rmath library in C Programs

The Rmath library allows you to access some of R’s functionality from a C program. For example, consider the C program below:

#include <stdio.h>
#define MATHLIB_STANDALONE
#include "Rmath.h"

int main(){
    double shape1, shape2, prob;

    shape1 = 1.0;
    shape2 = 2.0;
    prob = 0.5;

    printf("Critical value is %lf\n", qbeta(prob, shape1, shape2, 1, 0));
    return 0;
}

This makes use of R’s qbeta function. You can compile and run this on a worker node as follows. Start a session on a worker node with qrsh or qsh and load the R module:

module load apps/R/3.3.2/gcc-4.8.5

Assuming the program is called test_rmath.c, compile with:

gcc test_rmath.c -lRmath -lm -o test_rmath

For full details about the functions made available by the Rmath library, see section 6.7 of the document Writing R extensions


Installation Notes

These notes are primarily for administrators of the system.

Version 3.3.2

• What’s new in R version 3.3.2

This was a scripted install. It was compiled from source with gcc 4.8.5 and with --enable-R-shlib enabled. It was run in batch mode.

• install_r_3.3.2_gcc4.8.5.sh Downloads, compiles, tests and installs R 3.3.2 and the Rmath library.
• R 3.3.2 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.3.2/
• Install log-files, including the output of the make check tests, are available on the system at /usr/local/packages/apps/R/3.3.2/gcc-4.8.5/install_logs/

Java

Java
Latest Version: 1.8.0_102
URL: https://www.java.com/en/download/

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Interactive Usage

After connecting to ShARC (see Connect to iceberg), start an interactive session with the qrsh or qsh command. The default version of Java (which is also the most recent version; currently 1.8.0_102) is made available with the command:

module load apps/java

Alternatively, you can explicitly load this version using:

module load apps/java/1.8.0_102/binary

Explicitly loading a version will become more useful once multiple versions of Java are installed on this system. Check that you have the version you expect. First, the runtime:

$ java -version

java version "1.8.0_102" Java(TM) SE Runtime Environment (build 1.8.0_102-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)

Now, the compiler:

$ javac -version

javac 1.8.0_102

Virtual Memory

Those who have used Java on iceberg will be aware that Java’s default settings regarding memory usage were changed there to prevent it from requesting lots of virtual memory at startup (which often exceeded the user’s virtual memory limit set by the scheduler, causing their job to fail).


On ShARC, Java’s memory usage settings do not need to be changed from their defaults as ShARC’s scheduler only monitors and manages real memory and not virtual memory, and Java’s initial real memory requirements are more modest. See What is rmem (real_memory) and mem (virtual_memory) for explanations of real and virtual memory.
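As an illustration of running Java under the scheduler, the following job-script sketch requests real memory with -l rmem and caps Java’s heap with -Xmx; the jar name and the exact memory figures are assumptions for illustration only:

#!/bin/bash
# Request 8 gigabytes of real memory from the scheduler (illustrative figure)
#$ -l rmem=8G

module load apps/java/1.8.0_102/binary

# Cap the Java heap below the scheduler request to leave headroom for the JVM itself
java -Xmx6G -jar my_application.jar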

Installation notes

These are primarily for administrators of the system.

Java 1.8.0_102

1. Download Java SE Development Kit 8u102 from Oracle. Select the tarball (jdk-8u102-linux-x64.tar.gz) for Linux and the x64 CPU architecture family.
2. Save this file to /usr/local/media/java/1.8.0_102/.
3. Install Java using this script.
4. Install this modulefile as /usr/local/modulefiles/apps/java/jdk1.8.0_102/binary

Python

Python
Support Level: Core
URL: https://python.org

This page documents the “miniconda” installation on ShARC. This is the recommended way of using Python, and the best way to be able to configure custom sets of packages for your use. “conda”, a Python package manager, allows you to create “environments”, which are sets of packages that you can modify. It does this by installing them in your home area. This page will guide you through loading conda and then creating and modifying environments so you can install and use whatever Python packages you need.

Using conda Python

After connecting to ShARC (see Connect to iceberg), start an interactive session with the qsh or qrsh command. Conda Python can be loaded with:

module load apps/python/conda

The root conda environment (the default) provides Python 3 and no extra modules. It is automatically updated and is not recommended for general use, only as a base for your own environments. There is also a python2 environment, which is the same but with a Python 2 installation.

Quickly Loading Anaconda Environments

There are a small number of environments provided for everyone to use: the default root and python2 environments, as well as various versions of Anaconda for Python 3 and Python 2. The anaconda environments can be loaded through provided module files:

module load apps/python/anaconda2-4.2.0
module load apps/python/anaconda3-4.2.0

Here anaconda2 represents Python 2 installations and anaconda3 represents Python 3 installations. These commands will also load the apps/python/conda module and then activate the anaconda environment specified.


Note: Anaconda 2.5.0 and higher are compiled with Intel MKL libraries which should result in higher numerical performance.

Using conda Environments

Once the conda module is loaded you have to load or create the desired conda environments. For the documentation on conda environments see the conda documentation. You can load a conda environment with:

source activate python2

where python2 is the name of the environment, and unload one with:

source deactivate

which will return you to the root environment. It is possible to list all the available environments with:

conda env list

Provided system-wide is a set of anaconda environments; these are installed with the anaconda version number in the environment name and are never modified. They therefore provide a static base for derivative environments or for using directly.

Creating an Environment

Every user can create their own environments. Packages shared with the system-wide environments will not be reinstalled or copied to your file store; they will be symlinked. This reduces the space you need in your /home directory to install many different Python environments. To create a clean environment with just Python 2 and numpy you can run:

conda create -n mynumpy python=2.7 numpy

This will download the latest release of Python 2.7 and numpy, and create an environment named mynumpy. Any version of Python or list of packages can be provided:

conda create -n myscience python=3.5 numpy=1.8.1 scipy

If you wish to modify an existing environment, such as one of the anaconda installations, you can clone that environment:

conda create --clone anaconda3-4.2.0 -n myexperiment

This will create an environment called myexperiment which has all the anaconda 4.2.0 packages installed with Python 3.

Installing Packages Inside an Environment

Once you have created your own environment you can install additional packages or different versions of packages into it. There are two methods for doing this: conda and pip. If a package is available through conda it is strongly recommended that you use conda to install packages. You can search for packages using conda:

conda search pandas

then install the package using:


conda install pandas

If you are not in your own environment you will get a permission denied error when trying to install packages; if this happens, create or activate an environment you own. If a package is not available through conda you can search for and install it using pip, i.e.:

pip search colormath

pip install colormath
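Putting the above together, a minimal end-to-end sketch of creating and using a personal environment (the environment and package names here are just examples):

# Load the conda module and create a new environment
module load apps/python/conda
conda create -n myproject python=3.5 numpy

# Activate the environment and add a further package
source activate myproject
conda install pandas

# Use the environment's Python, then deactivate when finished
python --version
source deactivate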

Using Python with MPI

There is an experimental set of packages for conda that have been compiled by the RCG team, which allow you to use an MPI stack entirely managed by conda. This allows you to easily create complex environments and use MPI without worrying about other modules or system libraries. To get access to these packages you need to run the following command to add the repo to your conda config:

conda config --add channels file:///usr/local/packages/apps/conda/conda-bld/

You should then be able to install the packages with the openmpi feature, which currently include openmpi, hdf5, mpi4py and h5py:

conda create -n mpi python=3.5 openmpi mpi4py

Currently, there are Python 2.7 and 3.5 versions of mpi4py and h5py compiled in this repository. The build scripts for these packages can be found in this GitHub repository.
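Once such an environment is active, a quick, illustrative way to check that mpi4py works is to launch a trivial command across a couple of processes (this assumes the mpi environment created above provides mpirun on your path; the process count is arbitrary):

# Activate the MPI-enabled environment created above
source activate mpi

# Each MPI process prints its rank and the total number of processes
mpirun -np 2 python -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.Get_rank(), c.Get_size())"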

Installation Notes

These are primarily for administrators of the system. The conda package manager is installed in /usr/share/packages/apps/conda; it was installed using the miniconda installer. It is important to regularly update the root environment to keep the conda package manager up to date. To do this, login as a sa_ account (with write permissions to /usr/local/packages/apps/conda) and run:

$ conda update --all
$ conda update conda

Installing a New Version of Anaconda

Run the following as a sa_ user (with write permissions to /usr/local/packages/apps/conda):

$ conda create -n anaconda3-<version> python=3 anaconda=<version>
$ conda create -n anaconda2-<version> python=2 anaconda=<version>

Then copy the modulefile for the previous version of anaconda to the new version and update the name of the environment. You will also need to append the new module to the conflict line in apps/python/.conda-environments.tcl.

Libraries on Sharc

Intel Data Analytics Acceleration Library

Intel’s Data Analytics Acceleration Library (DAAL) provides functions for data analysis (characterization, summarization, and transformation) and machine learning (regression, classification, and more).


Parallel Studio Composer Edition version

DAAL can be used with and without other Parallel Studio packages. To access it:

module load libs/intel-daal/2017.0/binary

Licensing and availability See the information on Parallel Studio licensing.

Installation Notes

The following notes are primarily for system administrators.

Intel DAAL 2017.0 Installed as part of Parallel Studio Composer Edition 2017. This modulefile was installed as /usr/local/modulefiles/libs/intel-daal/2017.0/binary.

Intel Integrated Performance Primitives

Integrated Performance Primitives (IPP) are “high-quality, production-ready, low-level building blocks for image processing, signal processing, and data processing (data compression/decompression and cryptography) applications.”

Parallel Studio Composer Edition version

IPP can be used with and without other Parallel Studio packages. To access it:

module load libs/intel-ipp/2017.0/binary

Licensing and availability See the information on Parallel Studio licensing.

Installation Notes

The following notes are primarily for system administrators.

Intel IPP 2017.0 Installed as part of Parallel Studio Composer Edition 2017. This modulefile was installed as /usr/local/modulefiles/libs/intel-ipp/2017.0/binary.

Intel Math Kernel Library

Intel’s Math Kernel Library (MKL) provides highly optimized, threaded and vectorized functions to maximize performance on each processor family. It utilises de-facto standard C and Fortran APIs for compatibility with BLAS, LAPACK and FFTW functions from other math libraries.

Parallel Studio Composer Edition version

MKL can be used with and without other Parallel Studio packages. To access it:

module load libs/intel-mkl/2017.0/binary

Sample C and Fortran programs demonstrating matrix multiplication are available for version 2017.0 in the directory $MKL_SAMPLES:

$ ls $MKL_SAMPLES/
mkl_c_samples  mkl_fortran_samples


To compile one of these you need to copy the samples to a directory you can write to, ensure the MKL and the Intel compilers are both loaded, change into the directory containing the relevant sample program (C or Fortran) then run make to compile:

$ qrsh
$ cp -r $MKL_SAMPLES/ ~/mkl_samples
$ module load dev/intel-compilers/17.0.0
$ module load libs/intel-mkl/2017.0/binary
$ cd ~/mkl_samples/mkl_fortran_samples/matrix_multiplication
$ make
ifort -c src/dgemm_example.f -o release/dgemm_example.o
ifort release/dgemm_example.o -mkl -static-intel -o release/dgemm_example
ifort -c src/dgemm_with_timing.f -o release/dgemm_with_timing.o
ifort release/dgemm_with_timing.o -mkl -static-intel -o release/dgemm_with_timing
ifort -c src/matrix_multiplication.f -o release/matrix_multiplication.o
ifort release/matrix_multiplication.o -mkl -static-intel -o release/matrix_multiplication
ifort -c src/dgemm_threading_effect_example.f -o release/dgemm_threading_effect_example.o
ifort release/dgemm_threading_effect_example.o -mkl -static-intel -o release/dgemm_threading_effect_example

You should then find several compiled programs in the release directory:

$ ./release/matrix_multiplication
This example measures performance of computing the real matrix product
C=alpha*A*B+beta*C using a triple nested loop, where A, B, and C are
matrices and alpha and beta are double precision scalars

Initializing data for matrix multiplication C=A*B for matrix A( 2000 x 200) and matrix B( 200 x 1000)

Intializing matrix data

Making the first run of matrix product using triple nested loop to get stable run time measurements

Measuring performance of matrix product using triple nested loop

== Matrix multiplication using triple nested loop == == completed at 184.42246 milliseconds ==

Example completed.

For documentation and a tutorial, start an interactive session using qsh, load MKL, then run:

$ firefox $MKL_SAMPLES/mkl_fortran_samples/matrix_multiplication/tutorial/en/index.htm

Licensing and availability See the information on Parallel Studio licensing.

Installation Notes

The following notes are primarily for system administrators.

Intel MKL 2017.0 Installed as part of Parallel Studio Composer Edition 2017. This modulefile was installed as /usr/local/modulefiles/libs/intel-mkl/2017.0/binary.


Intel Threading Building Blocks

Threading Building Blocks (TBB) lets you write “parallel C++ programs that take full advantage of multicore perfor- mance, that are portable and composable, and that have future-proof scalability.”

Interactive usage

TBB can be used with and without other Parallel Studio packages. To make use of it you first need to start an interactive session that has access to multiple cores. Here we request six cores from the cluster’s scheduler:

$ qrsh -pe openmp 6

Next, we need to activate a specific version of TBB:

$ module load libs/intel-tbb/2017.0/binary

You can find sample TBB programs in the directory $TBB_SAMPLES:

$ ls $TBB_SAMPLES

common concurrent_hash_map concurrent_priority_queue GettingStarted graph index.html license.txt parallel_do parallel_for parallel_reduce pipeline task task_arena task_group test_all

We can compile and run a sample program by copying the samples to our (writable) home directory, moving into the directory containing the sample program (which should have a Makefile) then running make:

$ cp -r $TBB_SAMPLES ~/tbb_samples
$ cd ~/tbb_samples/GettingStarted/sub_string_finder/
$ ls

license.txt Makefile readme.html sub_string_finder.cpp sub_string_finder_extended.cpp sub_string_finder_pretty.cpp

$ make

g++ -O2 -DNDEBUG -o sub_string_finder sub_string_finder.cpp -ltbb -lrt
g++ -O2 -DNDEBUG -o sub_string_finder_pretty sub_string_finder_pretty.cpp -ltbb -lrt
g++ -O2 -DNDEBUG -o sub_string_finder_extended sub_string_finder_extended.cpp -ltbb -lrt

./sub_string_finder_extended
Done building string.
Done with serial version.
Done with parallel version.
Done validating results.
Serial version ran in 3.79692 seconds
Parallel version ran in 0.794959 seconds
Resulting in a speedup of 4.77625

Many of the sample directories contain HTML documentation. To read this you need to start an interactive graphical session (using qsh or qrshx) then run:

$ firefox ~/tbb_samples/index.html

Tutorials See Chrys Woods’ excellent tutorials for C++ programmers and Python programmers.

Licensing and availability See the information on Parallel Studio licensing.


Installation Notes

The following notes are primarily for system administrators.

Intel TBB 2017.0 Installed as part of Parallel Studio Composer Edition 2017. This modulefile was installed as /usr/local/modulefiles/libs/intel-tbb/2017.0/binary.

Development Tools on ShARC

GNU Compiler Collection (gcc)

The GNU Compiler Collection (gcc) is a widely used, free collection of compilers including C (gcc), C++ (g++) and Fortran (gfortran). The default version of gcc on the system is 4.8.5:

$ gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)

It is possible to switch to other versions of the gcc compiler suite using modules. After connecting to ShARC (see Connect to iceberg), start an interactive session with the qrsh or qsh command. Choose the version of the compiler you wish to use with one of the following commands:

module load dev/gcc/6.2
module load dev/gcc/5.4

Confirm that you’ve loaded the version of gcc you wanted using gcc -v.
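As a quick sketch of using the loaded compiler (the source file name hello.c is an illustrative assumption):

# Compile a simple C program with optimisation and run it
gcc -O2 hello.c -o hello
./hello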

Documentation

man pages are available on the system. Once you have loaded the required version of gcc, type:

man gcc

• What’s new in the gcc version 6 series? • What’s new in the gcc version 5 series?

Installation Notes

These notes are primarily for system administrators.

• gcc version 6.2 was installed using:
  • this script
  • gcc 6.2 modulefile saved as /usr/local/modulefiles/dev/gcc/6.2
• gcc version 5.4 was installed using:
  • this script
  • this modulefile, saved as /usr/local/modulefiles/dev/gcc/5.4


Intel Compilers

Intel Compilers help create C, C++ and Fortran applications that can take full advantage of the advanced hardware capabilities available in Intel processors and co-processors. They also simplify that development by providing high level parallel models and built-in features like explicit vectorization and optimization reports.

Parallel Studio Composer Edition 2017 versions

These versions of the Intel compilers were installed as part of Intel Parallel Studio but can be used with or without the other Parallel Studio components. After connecting to the ShARC cluster (see Connect to iceberg), start an interactive session with the qsh or qrsh command, then run:

module load dev/intel-compilers/17.0.0/binary

Compilation examples

C To compile the C hello world example into an executable called hello using the Intel C compiler:

icc hello.c -o hello

C++ To compile the C++ hello world example into an executable called hello using the Intel C++ compiler:

icpc hello.cpp -o hello

Fortran To compile the Fortran hello world example into an executable called hello using the Intel Fortran compiler:

ifort hello.f90 -o hello

Detailed Documentation

Once you have loaded the module, man pages are available for the Intel compiler products:

man ifort
man icc

The following links are to Intel’s website: • User and Reference Guide for the Intel® C++ Compiler 17.0 • User and Reference Guide for the Intel® Fortran Compiler 17.0 • Step by Step optimizing with Intel C++ Compiler

Licensing and availability See the information on Parallel Studio licensing.

Related Software on the system TODO: add link to NAG Fortran Library (serial)

Installation Notes The following notes are primarily for system administrators. Intel Compilers 2017 Installed as part of Parallel Studio Composer Edition 2017. This modulefile was installed as /usr/local/modulefiles/dev/intel-compilers/17.0.0.


Intel Parallel Studio

Intel Parallel Studio XE is a software development suite that helps boost application performance by taking advantage of the ever-increasing processor core count and vector register width available in Intel Xeon processors and other compatible processors. Its core components are compilers and libraries for fast linear algebra, data transformation and parallelisation.

Composer Edition 2017

This includes:

• Intel C++ and Fortran compilers: these help create C, C++ and Fortran applications that “can take full advantage of the advanced hardware capabilities available in Intel processors and co-processors. They also simplify that development by providing high level parallel models and built-in features like explicit vectorization and optimization reports”.
• Data Analytics Acceleration Library (DAAL): functions for data analysis (characterization, summarization, and transformation) and Machine Learning (regression, classification, and more).
• Math Kernel Library (MKL): this library provides highly optimized, threaded and vectorized functions to maximize performance on each processor family. Utilizes de facto standard C and Fortran APIs for compatibility with BLAS, LAPACK and FFTW functions from other math libraries.
• Integrated Performance Primitives (IPP): “high-quality, production-ready, low-level building blocks for image processing, signal processing, and data processing (data compression/decompression and cryptography) applications.”
• Threading Building Blocks (TBB): lets you write “parallel C++ programs that take full advantage of multicore performance, that are portable and composable, and that have future-proof scalability.”

It does not include Intel’s MPI implementation. See Intel’s site for further details of what Parallel Studio Composer Edition includes.

Licensing and availability

Some components of Parallel Studio are freely available for those only wanting community support; other components, such as the compilers, are commercial software. The University has purchased a number of Intel-supported licenses for the Parallel Studio Composer Edition products listed above. These can be accessed by (and are shared between) the University’s HPC clusters and researchers’ own machines.

Installation Notes

The following notes are primarily for system administrators.

Parallel Studio XE Composer Edition 2017.0

1. Download parallel_studio_xe_2017_composer_edition.tgz from the Intel Portal, save in /usr/local/media/protected/intel/2017.0/ then make it only readable by the app-admins group.
2. Ensure details of the Intel license server are in the file /usr/local/packages/dev/intel-pe-xe-ce/license.lic
3. Run this script. This installs Parallel Studio into /usr/local/packages/dev/intel-pe-xe-ce/2017.0/binary/. Products are activated using the aforementioned license file during the installation process.
4. Install several modulefiles to locations under /usr/local/modulefiles. These modulefiles set the INTEL_LICENSE_FILE environment variable to the location of the aforementioned license file and set other environment variables required by the different Parallel Studio products. There is one modulefile for all Parallel Studio software and other modulefiles for specific products.
• The Compilers modulefile should be installed as /usr/local/modulefiles/dev/intel-compilers/17.0.0.
• The DAAL modulefile should be installed as /usr/local/modulefiles/libs/intel-daal/2017.0/binary.


• The IPP modulefile should be installed as /usr/local/modulefiles/libs/intel-ipp/2017.0/binary.
• The MKL modulefile should be installed as /usr/local/modulefiles/libs/intel-mkl/2017.0/binary.
• The TBB modulefile should be installed as /usr/local/modulefiles/libs/intel-tbb/2017.0/binary.
• See the (TCL) modulefiles for details of how they were derived from Intel-supplied environment-manipulating shell scripts.

5. Check that licensing is working by activating the Intel Compilers modulefile, then try compiling a trivial C program using the icc compiler.

NAG Fortran Compiler

The NAG Fortran Compiler is robust, highly tested, and valued by developers all over the globe for its checking capabilities and detailed error reporting. The Compiler is available on Unix platforms as well as for Microsoft Windows and Apple Mac platforms. Release 6.0 has extensive support for both legacy and modern Fortran features, and also supports parallel programming with OpenMP.

Making the NAG Compiler available

After connecting to ShARC (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To make a version of the NAG Fortran Compiler available, run one of the following module commands:

module load dev/NAG/6.1
module load dev/NAG/6.0

Compilation examples

To compile the Fortran hello world example into an executable called hello using the NAG compiler:

nagfor hello.f90 -o hello

Detailed Documentation

Once you’ve run one of the NAG Compiler module commands, man documentation is available:

man nagfor

Online documentation:

• PDF version of the NAG Fortran Compiler Manual
• NAG Fortran Compiler Documentation Index (NAG’s Website)
• Differences between different versions can be found in the Release Notes.

Installation Notes

The following notes are primarily for system administrators.

Licensing Add the license key(s) to /usr/local/packages/dev/NAG/license.lic. The license key needs to be updated annually before 31st July. The NAG compiler environment modules (see below) set the environment variable $NAG_KUSARI_FILE to the path to this file.

Version 6.1


1. Perform an unattended install using this script. The software will be installed into /usr/local/packages/dev/NAG/6.1.
2. Install this modulefile as /usr/local/modulefiles/dev/NAG/6.1
3. Test the installation by compiling and building a sample Fortran 90 program:

module load dev/NAG/6.1
nagfor -o /tmp/f90_util /usr/local/packages/dev/NAG/6.1/lib/NAG_Fortran/f90_util.f90
/tmp/f90_util

Version 6.0

First, run the following (note that $prefix and $tmp_dir must already be set to the installation prefix and a temporary build directory respectively; their exact values are not recorded here):

nag_vers=6.0
media_dir="/usr/local/media/NAG/$nag_vers"
mkdir -p $media_dir
tarball=npl6a60na_amd64.tgz
tarball_url=https://www.nag.co.uk/downloads/impl/$tarball
wget -c $tarball_url -P $media_dir
mkdir -m 2775 -p $prefix
chown -R $USER:app-admins $prefix
mkdir -p $tmp_dir
pushd $tmp_dir
tar -zxf $media_dir/$tarball
pushd NAG_Fortran-amd64/

Next, run the interactive install script:

./INSTALL.sh

Accept the license and answer the questions as follows:

• Install compiler binaries to where [/usr/local/bin]? /usr/local/packages/dev/NAG/6.0/bin/
• Install compiler library files to where? /usr/local/packages/dev/NAG/6.0/lib/NAG_Fortran
• Install compiler man page to which directory? /usr/local/packages/dev/NAG/6.0/man/man1
• Suffix for compiler man page [1]? leave as default
• Install module man pages to which directory? /usr/local/packages/dev/NAG/6.0/man/man3
• Suffix for module man pages [3]? leave as default

Install this modulefile as /usr/local/modulefiles/dev/NAG/6.0

Finally, test the installation by compiling and building a sample Fortran 90 program:

module load dev/NAG/6.0
nagfor -o /tmp/f90_util /usr/local/packages/dev/NAG/6.0/lib/NAG_Fortran/f90_util.f90
/tmp/f90_util

Parallel Systems

OpenMPI (gcc version)


OpenMPI (gcc version)
Latest Version: 2.0.1
Dependencies: gcc
URL: http://www.open-mpi.org/

The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.

Versions

You can load a specific version using:

module load mpi/openmpi/2.0.1/gcc-6.2
module load mpi/openmpi/1.10.4/gcc-6.2

See here for a brief guide to the new features in OpenMPI 2.x and here for a detailed view of the changes between OpenMPI versions.

Examples

Example programs are available in the /usr/local/packages/mpi/openmpi/XXX/gcc-6.2/examples/ directory (where XXX is the version of OpenMPI you are using, e.g. 2.0.1). To compile and run these programs: copy that directory to your home directory, start an interactive MPI-aware session on a worker node, activate the version of OpenMPI you wish to use, compile the examples then run them. In more detail:

# Connect to iceberg
ssh user@iceberg

# Copy the examples to your home directory
cp -r /usr/local/packages/mpi/openmpi/2.0.1/gcc-6.2/examples/ ~/openmpi_2.0.1_examples

# Start an interactive session from which we can run MPI processes using a core on each of four nodes
qrsh -pe mpi 4

# Compile all programs in the examples directory
cd ~/openmpi_2.0.1_examples
make

# Once compiled, run an example program on all (or a subset) of your MPI nodes using the mpirun utility
$ mpirun -np 4 hello_c
Hello, world, I am 0 of 4, (Open MPI v2.0.1, package: Open MPI [email protected] Distribution, ident: 2.0.1, repo rev: v2.0.0-257-gee86e07, Sep 02, 2016, 141)
Hello, world, I am 1 of 4, (Open MPI v2.0.1, package: Open MPI [email protected] Distribution, ident: 2.0.1, repo rev: v2.0.0-257-gee86e07, Sep 02, 2016, 141)
Hello, world, I am 2 of 4, (Open MPI v2.0.1, package: Open MPI [email protected] Distribution, ident: 2.0.1, repo rev: v2.0.0-257-gee86e07, Sep 02, 2016, 141)
Hello, world, I am 3 of 4, (Open MPI v2.0.1, package: Open MPI [email protected] Distribution, ident: 2.0.1, repo rev: v2.0.0-257-gee86e07, Sep 02, 2016, 141)

Installation notes

These are primarily for administrators of the system.

Version 2.0.1

1. Enable GCC 6.2.0.
2. Download, compile and install OpenMPI 2.0.1 using this script.


3. Install this modulefile as /usr/local/modulefiles/mpi/openmpi/2.0.1/gcc-6.2

Version 1.10.4

1. Enable GCC 6.2.0.
2. Download, compile and install OpenMPI 1.10.4 using this script.
3. Install this modulefile as /usr/local/modulefiles/mpi/openmpi/1.10.4/gcc-6.2

OpenMPI (Intel version)

OpenMPI (intel version)
Latest Version: 2.0.1
Dependencies: Intel compilers
URL: http://www.open-mpi.org/

The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.

Versions

You can load a specific version using:

module load mpi/openmpi/2.0.1/intel-17.0.0

See here for a brief guide to the new features in OpenMPI 2.x and here for a detailed view of the changes between OpenMPI versions.

Examples

Example programs are available in the /usr/local/packages/mpi/openmpi/XXX/intel-17.0.0/examples/ directory (where XXX is the version of OpenMPI you are using, e.g. 2.0.1). To compile and run these programs: copy that directory to your home directory, start an interactive MPI-aware session on a worker node, activate the version of OpenMPI you wish to use, compile the examples then run them. In more detail:

# Connect to iceberg
ssh user@iceberg

# Copy the examples to your home directory
cp -r /usr/local/packages/mpi/openmpi/2.0.1/intel-17.0.0/examples/ ~/openmpi_2.0.1_examples

# Start an interactive session from which we can run MPI processes using a core on each of four nodes
qrsh -pe mpi 4

# Compile all programs in the examples directory
cd ~/openmpi_2.0.1_examples
make

# Once compiled, run an example program on all (or a subset) of your MPI nodes using the mpirun utility
$ mpirun -np 4 hello_c
Hello, world, I am 0 of 4, (Open MPI v2.0.1, package: Open MPI [email protected] Distribution, ident: 2.0.1, repo rev: v2.0.0-257-gee86e07, Sep 02, 2016, 141)
Hello, world, I am 1 of 4, (Open MPI v2.0.1, package: Open MPI [email protected] Distribution, ident: 2.0.1, repo rev: v2.0.0-257-gee86e07, Sep 02, 2016, 141)


Hello, world, I am 2 of 4, (Open MPI v2.0.1, package: Open MPI [email protected] Distribution, ident: 2.0.1, repo rev: v2.0.0-257-gee86e07, Sep 02, 2016, 141)
Hello, world, I am 3 of 4, (Open MPI v2.0.1, package: Open MPI [email protected] Distribution, ident: 2.0.1, repo rev: v2.0.0-257-gee86e07, Sep 02, 2016, 141)

Installation notes

These are primarily for administrators of the system.

Version 2.0.1
1. Ensure Intel compilers 17.0.0 are installed and licensed.
2. Download, compile and install OpenMPI 2.0.1 using this script.
3. Install this modulefile as /usr/local/modulefiles/mpi/openmpi/2.0.1/intel-17.0.0
4. Test by running some of the example programs.

• Applications on Sharc
• Development Tools on ShARC
• Libraries on Sharc
• Parallel Systems

2.4 Parallel Computing

Modern computers contain more than one processor, with a typical laptop usually containing either 2 or 4 (if you think your laptop contains 8, it is because you have been fooled by Hyperthreading). Systems such as Iceberg or ShARC contain many hundreds of processors and the key to making your research faster is to learn how to distribute your work across them. If your program is designed to run on only one processor, running it on our systems without modification will not make it any faster (it may even be slower!), so learning how to use parallelisation technologies is vital. This section explains how to use the most common parallelisation technologies on our systems. If you need advice on how to parallelise your workflow, please contact the Research Software Engineering Group.

2.4.1 SGE Job Arrays

The simplest way of exploiting parallelism on the clusters is to use Job Arrays. A Job Array is a set of batch jobs run from a single job script. For example:

#!/bin/bash
#
#$ -t 1-100
#
echo "Task id is $SGE_TASK_ID"

./myprog.exe $SGE_TASK_ID > output.$SGE_TASK_ID

The above script will submit 100 tasks to the system at once. The difference between each of these 100 jobs is the value of the environment variable $SGE_TASK_ID, which will range from 1 to 100, as determined by the line #$ -t 1-100. In this example, the program myprog.exe will be run 100 times with differing input values of $SGE_TASK_ID, and 100 output files will be created with filenames output.1, output.2 and so on. Job arrays are particularly useful for embarrassingly parallel problems such as Monte Carlo simulations (where $SGE_TASK_ID might correspond to a random number seed) or batch file processing (where $SGE_TASK_ID might refer to a file in a list of files to be processed), as illustrated by the sketch below.
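A minimal sketch of the batch-file-processing pattern (the file list file_list.txt and the program myprog.exe are illustrative placeholders, not something provided on the system):

#!/bin/bash
#$ -t 1-100
# Pick out line number $SGE_TASK_ID from a (hypothetical) list of input files
INPUT_FILE=$(sed -n "${SGE_TASK_ID}p" file_list.txt)
# Process that file and write a per-task output file
./myprog.exe "$INPUT_FILE" > output.$SGE_TASK_ID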


Examples

• MATLAB SGE Job Array example

2.4.2 Shared Memory Parallelism (SMP)

Shared Memory Parallelism (SMP) is when multiple processors can operate independently but they have access to the same memory space. On our systems, programs that use SMP can make use of only the resources available on a single node. This limits you to 16 processor cores on our current system. If you wish to scale to more processors, you should consider more complex parallelisation schemes such as MPI or Hybrid OpenMP/MPI.

OpenMP

OpenMP (Open Multi-Processing) is one of the most common programming frameworks used to implement shared memory parallelism. To submit an OpenMP job to our system, we make use of the OpenMP parallel environment, which is specified by adding the line #$ -pe openmp N to your job submission script, changing the value N to the number of cores you wish to use. Here's an example for 4 cores:

#!/bin/bash
# Request 4 cores from the scheduler
#$ -pe openmp 4
# Request 4 GB of memory per CPU core
#$ -l rmem=4G
#$ -l mem=4G

# Tell the OpenMP executable to use 4 cores
export OMP_NUM_THREADS=4
./myprogram

Note that you have to specify the number of cores twice: once to the scheduler (#$ -pe openmp 4) and once to the OpenMP runtime environment (export OMP_NUM_THREADS=4).
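Rather than keeping the two numbers in sync by hand, one common approach (a sketch that relies on Grid Engine setting the $NSLOTS environment variable to the number of slots granted to the job) is:

# Use however many cores the scheduler actually allocated
export OMP_NUM_THREADS=$NSLOTS
./myprogram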

Running other SMP schemes

Any system that uses shared memory parallelism, such as pthreads, Python's multiprocessing module, R's mclapply and so on, can be submitted using the OpenMP parallel environment.
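For example, a Python script that uses the multiprocessing module could be submitted with a job script along the following lines (a sketch: my_multiprocessing_script.py is a hypothetical script name, and the Python module shown is just one of those available on the system):

#!/bin/bash
# Request 4 cores via the OpenMP parallel environment (suitable for any SMP scheme)
#$ -pe openmp 4
#$ -l rmem=4G
#$ -l mem=4G

# Load a Python installation
module load apps/python/anaconda3-2.5.0

# The script itself decides how many worker processes to start, e.g. multiprocessing.Pool(4)
python my_multiprocessing_script.py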

2.4.3 Message Passing Interface (MPI)

The Message Passing Interface (MPI) Standard is a specification for a message passing library. MPI was originally designed for distributed memory architectures and is used on systems ranging from a few interconnected Raspberry Pis through to the UK's national supercomputer, Archer.

MPI Implementations

Our systems have several MPI implementations installed. See the MPI section in software for details.


Example MPI jobs

Some example MPI jobs are available in the HPC Examples repository of Sheffield's Research Software Engineering group.

Batch MPI

MPI jobs are submitted using the openmpi-ib parallel environment. Here is an example that requests 4 slots with 8GB per slot using the gcc implementation of OpenMPI:

#!/bin/bash
#$ -l h_rt=1:00:00
# Change 4 to the number of slots you want
#$ -pe openmpi-ib 4
# 8GB per slot
#$ -l rmem=8G
#$ -l mem=8G
module load mpi/gcc/openmpi
mpirun ./executable
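Assuming the script above is saved as, say, mpi_job.sh (a hypothetical filename), it is submitted in the usual way:

qsub mpi_job.sh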

Interactive MPI

Our general-access interactive queues currently don’t have any MPI-compatible parallel environments enabled. Thus, it is not possible to run MPI jobs interactively.

MPI Training

Course notes from the national supercomputing centre are available here

2.4.4 Hybrid OpenMP / MPI

Support for hybrid OpenMP/MPI is in the preliminary stages. Here is an example job submission for an executable that combines OpenMP with MPI:

#$ -pe openmpi-hybrid-4 8
#$ -l rmem=2G
#$ -l mem=6G
module load mpi/intel/openmpi
export OMP_NUM_THREADS=4
mpirun -bynode -np 2 -cpus-per-proc 4 [executable + options]

There would be 2 MPI processes running, each with 4x (6GB mem, 2GB rmem) shared between the 4 threads per MPI process, for a total of 8x (6GB mem, 2GB rmem) for the entire job. When we tried this, we got warnings saying that the -cpus-per-proc option was being deprecated. A quick search suggests that mpirun -np 2 --map-by node:pe=1 [executable + options] would be the appropriate replacement.


2.4.5 GPU Computing

Graphics Processing Units (GPUs) were, as the name suggests, originally designed for the efficient processing of graphics. Over time, they were developed into systems that were capable of performing general purpose computing, which is why some people refer to modern GPUs as GP-GPUs (General Purpose Graphical Processing Units). Graphics processing typically involves relatively simple computations that need to be applied to millions of on-screen pixels in parallel. As such, GPUs tend to be very quick and efficient at computing certain types of parallel workloads. The GPUComputing@Sheffield website aims to facilitate the use of GPU computing within University of Sheffield research by providing resources for training and purchasing of equipment as well as providing a network of GPU users and research projects within the University.

FLAMEGPU

Installing FLAMEGPU on Iceberg/ShARC

FLAMEGPU is a multi-agent simulation framework developed at the University of Sheffield. It uses GPUs to accelerate the simulation of multi-agent systems but abstracts the GPU architecture away from users so that they can write models using a high level of abstraction (without having to write GPU code). To install FLAME GPU you should check out the latest master branch of the code, which has Linux support:

git clone https://github.com/FLAMEGPU/FLAMEGPU.git

To compile the FLAME GPU examples you will need to be on a GPU node. You can start an interactive session using:

qsh -l gpu=1 -P gpu -l gpu_arch=nvidia-k40m -l mem=13G

You will then need to load the relevant modules:

module load libs/cuda/7.5.18
module load compilers/gcc/4.9.2

FLAMEGPU allows you to specify a CUDA_PATH environment variable so that you can change the CUDA version easily. To set this for the CUDA 7.5 module use:

export CUDA_PATH=/usr/local/packages6/libs/binlibs/CUDA/7.5.18/cuda

You can now navigate to the FLAME GPU examples folder and build the examples, e.g.:

cd FLAMEGPU/examples
make

You can now run the console versions of the example models by navigating to FLAMEGPU/bin/x64 and calling the appropriate shell script, e.g.:

cd ../..
cd bin/x64
./Boids_BruteForce_console.sh

FLAME GPU will run the example for one iteration and output the model state to a file 1.xml in the model’s iterations directory. Visualisation currently does not work with X forwarding as FLAME GPU uses complex rendering techniques which are not supported. A solution using VirtualGL is in progress.


Requesting access to GPU facilities

In order to ensure that the nodes hosting the GPUs run only GPU-related tasks, we have defined a special project group for accessing these nodes. If you wish to take advantage of the GPU processors, please contact research-it@sheffield.ac.uk asking to join the GPU project group. Any iceberg user can apply to join this group. However, because our GPU resources are limited, we will need to discuss the needs of the user and obtain consent from the project leader before allowing usage.

GPU Community and NVIDIA Research Centre

The University of Sheffield has been officially affiliated with NVIDIA since 2011 as an NVIDIA CUDA Research Centre. As such NVIDIA offer us some benefits as a research institution including discounts on hardware, technical liaisons, online training and priority seed hardware for new GPU architectures. For first access to hardware, training and details of upcoming events, discussions and help please join the GPUComputing google group.

GPU Hardware on iceberg

Iceberg currently contains 16 GPU units:
• 8 Nvidia Tesla Kepler K40M GPU units. Each unit contains 2880 CUDA cores, 12GB of memory and is capable of up to 1.43 Teraflops of double precision compute power.
• 8 Nvidia Tesla Fermi M2070 GPU units. Each unit contains 448 CUDA cores, 6GB of memory and is capable of up to 515 Gigaflops of double precision compute power.

Compiling on the GPU using the PGI Compiler

The PGI Compilers are a set of commercial Fortran, C and C++ compilers from the Portland Group. To make use of them, first start an interactive GPU session and run one of the following module commands, depending on which version of the compilers you wish to use:

module load compilers/pgi/13.1
module load compilers/pgi/14.4

The PGI compilers have several features that make them interesting to users of GPU hardware:

OpenACC Directives

OpenACC is a relatively new way of programming GPUs that can be significantly simpler to use than low-level language extensions such as CUDA or OpenCL. From the OpenACC website:

The OpenACC Application Program Interface describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator. OpenACC is designed for portability across operating systems, host CPUs, and a wide range of accelerators, including APUs, GPUs, and many-core coprocessors.

The directives and programming model defined in the OpenACC API document allow programmers to create high-level host+accelerator programs without the need to explicitly initialize the accelerator, manage data or program transfers between the host and accelerator, or initiate accelerator startup and shutdown.

For more details concerning OpenACC using the PGI compilers, see the PGI OpenACC website.
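As a rough sketch of the compile workflow under these compilers (saxpy.c is a hypothetical source file containing OpenACC directives; -acc and -Minfo=accel are standard PGI options, but check the documentation for the version you load):

# From an interactive GPU session, load a PGI compiler module
module load compilers/pgi/14.4

# Compile a C source file containing OpenACC directives and report what is offloaded
pgcc -acc -Minfo=accel saxpy.c -o saxpy

# Run the resulting executable on the GPU node
./saxpy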


CUDA Fortran

In mid 2009, PGI and NVIDIA cooperated to develop CUDA Fortran. CUDA Fortran includes a Fortran 2003 compiler and tool chain for programming NVIDIA GPUs using Fortran. • CUDA Fortran Programming Guide.

CUDA-x86

NVIDIA CUDA was developed to enable offloading computationally intensive kernels to massively parallel GPUs. Through API function calls and language extensions, CUDA gives developers explicit control over the mapping of general-purpose computational kernels to GPUs, as well as the placement and movement of data between an x86 processor and the GPU.

The PGI CUDA C/C++ compiler for x86 platforms allows developers using CUDA to compile and optimize their CUDA applications to run on x86-based workstations, servers and clusters with or without an NVIDIA GPU accelerator. When run on x86-based systems without a GPU, PGI CUDA C applications use multiple cores and the streaming SIMD (Single Instruction Multiple Data) capabilities of Intel and AMD CPUs for parallel execution.

• PGI CUDA-x86 guide.

Deep Learning with Theano on GPU Nodes

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano is most commonly used to perform deep learning and has excellent GPU support and integration through PyCUDA. The following steps can be used to set up and configure Theano on your own profile.

Request an interactive session with a sufficient amount of memory:

qsh -l gpu=1 -P gpu -l gpu_arch=nvidia-k40m -l mem=13G

Load the relevant modules:

module load apps/python/anaconda3-2.5.0
module load libs/cuda/7.5.18

Create a conda environment to load relevant modules on your local user account:

conda create -n theano python=3.5 anaconda3-2.5.0
source activate theano

Install the other Python module dependencies which are required using pip (alternatively these could be installed with conda if you prefer):

pip install theano
pip install nose
pip install nose-parameterized
pip install pycuda

For optimal Theano performance, enable the CUDA memory manager CNMeM. To do this, create the .theanorc file in your HOME directory and set the fraction of GPU memory reserved by Theano. The exact amount of memory may have to be hand-picked: if Theano asks for more memory than is currently available on the GPU, an error will be thrown during import of the theano module. Create or edit the .theanorc file with nano:

nano ~/.theanorc

Add the following lines and, if necessary, change the 0.8 number to whatever works for you:

[lib]
cnmem=0.8

Run python and verify that Theano is working correctly:

python -c "import theano;theano.test()"


Interactive use of the GPUs

Once you are included in the GPU project group you may start using the GPU-enabled nodes interactively by typing:

qsh -l gpu=1 -P gpu

The gpu= parameter determines how many GPUs you are requesting. Currently, the maximum number of GPUs allowed per job is set to 4, i.e. you cannot exceed gpu=4. Most jobs will only make use of one GPU. If your job requires selecting the type of GPU hardware, one of the following two optional parameters can be used to make that choice:

-l gpu_arch=nvidia-m2070
-l gpu_arch=nvidia-k40m

Interactive sessions provide you with 2 Gigabytes of CPU RAM by default, which is significantly less than the amount of GPU RAM available. This can lead to issues where your session has insufficient CPU RAM to transfer data to and from the GPU. As such, it is recommended that you request enough CPU memory to communicate properly with the GPU:

-l gpu_arch=nvidia-m2070 -l mem=7G
-l gpu_arch=nvidia-k40m -l mem=13G

The above will give you 1GB more CPU RAM than GPU RAM for each of the respective GPU architectures.

Submitting GPU jobs

Interactive use of the GPUs

You can access GPU-enabled nodes interactively by typing:

qsh -l gpu=1

The gpu= parameter determines how many GPUs you are requesting. Currently, the maximum number of GPUs allowed per job is set to 4, i.e. you cannot exceed gpu=4. Most jobs will only make use of one GPU. If your job requires selecting the type of GPU hardware, one of the following two optional parameters can be used to make that choice:

-l gpu_arch=nvidia-m2070
-l gpu_arch=nvidia-k40m

Interactive sessions provide you with 2GB of CPU RAM by default, which is significantly less than the amount of GPU RAM available. This can lead to issues where your session has insufficient CPU RAM to transfer data to and from the GPU. As such, it is recommended that you request enough CPU memory to communicate properly with the GPU:

-l gpu_arch=nvidia-m2070 -l mem=7G
-l gpu_arch=nvidia-k40m -l mem=13G

The above will give you 1GB more CPU RAM than GPU RAM for each of the respective GPU architectures.

Submitting batch GPU jobs

To run batch jobs on GPU nodes, edit your job file to include a request for GPUs, e.g. for a single GPU:

#$ -l gpu=1


You can also use the gpu_arch option discussed above to target a specific GPU model:

#$ -l gpu_arch=nvidia-m2070
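Putting these options together, a complete GPU batch script might look something like the following sketch (the executable name my_gpu_program and the memory request are placeholders to adjust for your own job):

#!/bin/bash
# Request one GPU of a specific architecture, plus enough CPU RAM to feed it
#$ -l gpu=1
#$ -l gpu_arch=nvidia-k40m
#$ -l mem=13G

# Load a CUDA module (see the NVIDIA Compiler section for available versions)
module load libs/cuda/7.5.18

# Run the (placeholder) GPU-accelerated executable
./my_gpu_program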

GPU enabled Software

Ansys

See Issue #25 on github

Maple

TODO

MATLAB

TODO

NVIDIA Compiler

TODO - Link to relevant section

PGI Compiler

TODO - Link to relevant section

Compiling on the GPU using the NVIDIA Compiler

To compile GPU code using the NVIDIA compiler, nvcc, first start an interactive GPU session. Next, you need to set up the compiler environment via one of the following module statements, depending on the version of CUDA you intend to use:

module load libs/cuda/8.0.44
module load libs/cuda/7.5.18
module load libs/cuda/6.5.14
module load libs/cuda/4.0.17
module load libs/cuda/3.2.16

This makes the nvcc CUDA compiler available.

Important: to compile CUDA programs you also need a compatible version of the GCC compiler suite. As of version 8.0.44, CUDA is compatible with GCC versions:
• greater than or equal to 4.7.0 (to allow for the use of C++11 features) and
• less than 5.0.0

It is therefore recommended that you load the most recent 4.x version of GCC when building CUDA programs on Iceberg:

module load compilers/gcc/4.9.2

An example of the use of nvcc:


nvcc filename.cu -arch sm_20

will compile the CUDA program contained in the file filename.cu. The -arch flag above signifies the compute capability of the intended GPU hardware, which is 2.0 for our current GPU modules. If you are intending to generate code for older architectures you may have to specify sm_10 or sm_13 for example.
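As a sketch of the end-to-end workflow (filename.cu and my_cuda_program are placeholders; adjust the -arch value to match the hardware you request):

# Start an interactive GPU session (requires membership of the GPU project group)
qsh -l gpu=1 -P gpu

# Load a CUDA toolkit and a compatible GCC
module load libs/cuda/8.0.44
module load compilers/gcc/4.9.2

# Compile the CUDA source and run it on the GPU node
nvcc filename.cu -arch sm_20 -o my_cuda_program
./my_cuda_program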

2.5 News

2.5.1 Iceberg: Perl updates

Published by Will Furnass on 2016-10-28

The bioinformaticians amongst us may appreciate that there is now a much newer version of Perl on Iceberg. In addition, there are nicer ways of managing Perl packages and BioPerl is installed by default. Perl 5.24.0 is purportedly quite a bit faster than our older version of Perl when it comes to numerical operations. This version can be activated using the standard 'module load' mechanism. Several popular Perl packages have been included with this newer Perl version for convenience, including:
• BioPerl (many useful modules for bioinformatics)
• cpanm (the de-facto standard Perl package manager)
• local::lib (for managing your own Perl modules in your home directory; see here for more info, including how to use local::lib to install modules yourself)

Feedback on this documentation is very welcome.
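As a rough sketch of how the newer Perl and these tools might be used together (the module name shown is hypothetical, so check module avail for the real one, and Some::Module is a placeholder):

# Load the newer Perl (hypothetical module name; check 'module avail perl' for the real one)
module load apps/perl/5.24.0

# Install a Perl module into ~/perl5 for your own use, via cpanm and local::lib
cpanm --local-lib ~/perl5 Some::Module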

2.5.2 Iceberg: CUDA; NVIDIA device driver

Published by Will Furnass on 2016-10-16 Version 8.0 of CUDA is now available on Iceberg and the NVIDIA device driver of all nodes containing GPUs has been upgraded to version 367.44. Some users had reported that the previous NVIDIA device driver had caused problems when accessing GPUs from MATLAB; initial tests indicate that this is no longer a problem. Please let us know if you notice any undesirable changes in behaviour on GPU compute nodes or GPU-accelerated visualisation nodes.


2.6 Troubleshooting

In this section, we’ll discuss some tips for solving problems with iceberg. It is suggested that you work through some of the ideas here before contacting the service desk for assistance.

2.6.1 Frequently Asked Questions

I’m a new user and my password is not recognised

When you get a username on the system, the first thing you need to do is to synchronise your passwords, which will set your password to be the same as your network password.

Strange things are happening with my terminal

Symptoms include many of the commands not working and just bash-4.1$ being displayed instead of your username at the bash prompt. This may be because you’ve deleted your .bashrc and .bash_profile files - these are ‘hidden’ files which live in your home directory and are used to correctly set up your environment. If you hit this problem you can run the command resetenv which will restore the default files.

I can no longer log onto iceberg

If you are confident that you have no password or remote-access-client related issues but you still cannot log onto iceberg, you may be having problems due to exceeding your iceberg filestore quota. If you exceed your filestore quota in your /home area it is sometimes possible that crucial system files in your home directory get truncated, which affects the subsequent login process. If this happens, you should immediately email research-it@sheffield.ac.uk and ask to be unfrozen.

I cannot log into iceberg via the applications portal

Most of the time such problems arise due to Java version issues. As Java updates are released regularly, these problems are usually caused by changes to the Java plug-in for the browser. Follow the troubleshooting link from the iceberg browser-access page to resolve these problems. There is also a link on that page to test the functionality of your Java plug-in. It can also help to try a different browser to see if it makes any difference. Failing all of these, you may have to fall back to one of the non-browser access methods.

My batch job terminates without any messages or warnings

When a batch job is initiated using the qsub command (or the runfluent, runansys or runabaqus commands), it gets allocated a specific amount of virtual memory and real time. If a job exceeds either its memory or time limit it gets terminated immediately, usually without any warning messages. It is therefore important to estimate the amount of memory and time that is needed to run your job to completion and to specify these at the time of submitting the job to the batch queue. Please refer to the sections on hitting limits and estimating resources for information on how to avoid these problems.


Exceeding your disk space quota

Each user of the system has a fixed amount of disk space available in their home directory. If you exceed this quota, various problems can emerge such as an inability to launch applications and run jobs. To see if you have exceeded your disk space quota, run the quota command:

quota

Size   Used  Avail Use%  Mounted on
10.1G  5.1G      0 100%  /home/foo11b
100G      0   100G   0%  /data/foo11b

In the above, you can see that the quota was set to 10.1 gigabytes and all of this is in use. Any jobs submitted by this user will likely result in an Eqw status. The recommended action is for the user to delete enough files to allow normal work to continue. Sometimes it is not possible to log in to the system because of a full quota, in which case you need to contact research-it@sheffield.ac.uk and ask to be unfrozen.

I am getting warning messages and warning emails from my batch jobs about insufficient memory

There are two types of memory resources that can be requested when submitting batch jobs using the qsub command: virtual memory (-l mem=nnn) and real memory (-l rmem=nnn). The virtual memory limit specified should always be greater than or equal to the real memory limit. If a job exceeds its virtual memory resource it gets terminated. However, if a job exceeds its real memory resource it does not get terminated; instead an email message is sent to the user asking them to specify a larger rmem= parameter the next time, so that the job can run more efficiently.

What is rmem (real_memory) and mem (virtual_memory)?

Running a program always involves loading the program instructions and also its data (i.e. all variables and arrays that it uses) into the computer's RAM. A program's entire instructions and its entire data, along with any dynamic link libraries it may use, define the VIRTUAL STORAGE requirements of that program. If we did not have clever operating systems we would need as much physical memory (RAM) as the virtual storage requirements of that program. However, operating systems are clever enough to deal with situations where we have insufficient REAL MEMORY (i.e. RAM) to load all the program instructions and data. This technique works because hardly any program needs to access all of its instructions and data simultaneously. The operating system therefore loads into RAM only those bits of the instructions and data that are needed by the program at a given instant. This is called PAGING and it involves copying bits of the program's instructions and data to and from hard disk as they are needed.

If the REAL MEMORY (i.e. RAM) allocated to a job is much smaller than the entire memory requirements of that job (i.e. its VIRTUAL MEMORY), then there will be excessive paging that will slow the execution of the program considerably due to the relatively slow speed of transferring information between disk and RAM. On the other hand, if the real memory (RAM) allocated to a job is larger than the virtual memory requirement of that job, RAM resources will be wasted because they sit idle for the duration of that job. It is therefore crucial to strike a fine balance between the VIRTUAL MEMORY (i.e. mem) and the PHYSICAL MEMORY (i.e. rmem) allocated to a job.

The virtual memory limit defined by the -l mem parameter is the maximum amount of virtual memory your job will be allowed to use. If your job's virtual memory requirements exceed this limit during its execution your job will be killed immediately. The real memory limit defined by the -l rmem parameter is the amount of RAM that will be allocated to your job. The way we have configured SGE, if your job starts paging excessively it is not killed, but you receive warning messages to increase the RAM allocated to your job next time by means of the rmem parameter.


It is important to make sure that your -l mem value is always greater than your -l rmem value so as not to waste the valuable RAM resources as mentioned earlier.
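For example, a job that is expected to need around 8GB of RAM might request something like the following (the values are purely illustrative and should be tuned to your own job):

# Virtual memory limit set above the real memory allocation
#$ -l mem=16G
#$ -l rmem=8G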

Insufficient memory in an interactive session

By default, an interactive session provides you with 2 Gigabytes of RAM (sometimes called real memory) and 6 Gigabytes of virtual memory. You can request more than this when running your qsh or qrsh command:

qsh -l mem=64G -l rmem=8G

This asks for 64 Gigabytes of virtual memory and 8 Gigabytes of RAM (real memory). Note that you should:
• not specify more than 768 Gigabytes of virtual memory (mem)
• not specify more than 256 Gigabytes of RAM (real memory) (rmem)

Windows-style line endings

If you prepare text files such as your job submission script on a Windows machine, you may find that they do not work as intended on the system. A very common example is when a job immediately goes into Eqw status after you have submitted it. The reason for this behaviour is that Windows and Unix machines have different conventions for specifying 'end of line' in text files. Windows uses the control characters for 'carriage return' followed by 'linefeed', \r\n, whereas Unix uses just 'linefeed', \n. The practical upshot of this is that a script prepared in Windows using Notepad, looking like this:

#!/bin/bash
echo 'hello world'

will look like the following to programs on a Unix system:

#!/bin/bash\r
echo 'hello world'\r

If you suspect that this is affecting your jobs, run the following command on the system:

dos2unix your_files_filename

error: no DISPLAY variable found with interactive job

If you receive the error message:

error: no DISPLAY variable found with interactive job

the most likely cause is that you forgot the -X switch when you logged into iceberg. That is, you might have typed:

ssh username@iceberg.shef.ac.uk

instead of:

ssh -X username@iceberg.shef.ac.uk


Problems connecting with WinSCP

Some users have reported issues while connecting to the system using WinSCP, usually when working from home with a poor connection and when accessing folders with large numbers of files. In these instances, turning off Optimize Connection Buffer Size in WinSCP can help:
• In WinSCP, go to the settings for the site (i.e. from the menu Session -> Sites -> Site Manager)
• From the Site Manager dialog, click on the selected session and click the Edit button
• Click the Advanced button; the Advanced Site Settings dialog opens
• Click on Connection
• Untick the box which says Optimize Connection Buffer Size

Login Nodes RSA Fingerprint

The RSA key fingerprint for Iceberg’s login nodes is “de:72:72:e5:5b:fa:0f:96:03:d8:72:9f:02:d6:1d:fd”.

2.7 Glossary of Terms

The worlds of scientific and high performance computing are full of technical jargon that can seem daunting to newcomers. This glossary attempts to explain some of the most common terms as quickly as possible.

HPC 'High Performance Computing'. The exact definition of this term is sometimes a subject of great debate but for the purposes of our services you can think of it as 'anything that requires more resources than a typical high-spec desktop or laptop PC'.

GPU Acronym for Graphics Processing Unit.

SGE Acronym for Sun Grid Engine.

Sun Grid Engine The Sun Grid Engine is the software that controls and allocates resources on the system. There are many variants of 'Sun Grid Engine' in use worldwide and the variant in use at Sheffield is the Son of Grid Engine.

Wallclock time The actual time taken to complete a job as measured by a clock on the wall or your wristwatch.
