TP - Using CESGA’s Facilities
ING3 – IMSI – Use of Supercomputing Resources
Academic Year 2017–2018

EISTI – Juan Angel Lorenzo – [email protected]

Connecting to Finis Terrae II

As explained in the course slides, you can connect to Finis Terrae II at CESGA using an SSH client on the command line (only from the EISTI) or using Gate ONE in a browser from anywhere (https://portalusuarios.cesga.es/tools/web_ssh). Use the credentials given by your instructor to connect to: [email protected]

You can transfer files between your laptop and the supercomputer login nodes using the scp command. If you have any questions about its usage, check man scp. To edit a file on a login node, several editors are available, such as nano, emacs and vim. If you want to keep several terminals open over the same SSH connection, use the screen command (quick reference: http://aperiodic.net/screen/quick_reference).

Note that the CESGA account you have been given will be reused in the future and the accounts will be emptied at the end of the semester, so back up your code as often as possible.

1 Running programs in Interactive Mode on FTII

In order to get used to the environment at CESGA, let us start by compiling and running some programs in interactive mode.

Exercise 1. Download the Sysbench benchmark from https://github.com/akopytov/sysbench
- Send it to your account on the FTII.
- Log on to FTII and go to a compute node.
- Load the gcc module.
- Unzip the sysbench file, enter the newly created directory and run the following commands to compile the benchmark:
    ./autogen.sh
    ./configure --without-mysql --prefix=$PWD
    make
    make install
- By now, you should have a ./bin directory containing the sysbench executable. Let us run a sequential CPU stress test:
    ./bin/sysbench cpu --events=10000 --threads=1 run
  Take note of the "execution time (avg/stddev)".
- Now run the benchmark for 2, 4, 8 and 16 threads. What do you notice? What is the cause? The --help option of the compute command may give you a hint.
- Solve the issue and run the benchmark again for 2, 4, 8 and 16 threads. Write down the execution time for each run. What speedup do you get? What is the cause?

Exercise 2. OpenMP: Download the exampleCodes.zip file from https://arel.eisti.fr/rels/54498. Send it to your FTII account and unzip it.
- Request a compute node with 8 cores. Compile the OpenMP pi_loop.c code under the omp directory. Use the gcc compiler. Remember to use the -fopenmp option.
- Run the program. Check whether the execution times are consistent.
- Now load icc, the Intel compiler. The module name is intel/2017. Recompile pi_loop.c and execute it again. What do you notice?

Exercise 3. MPI: Go to the mpi directory under your example codes directory.
- If you have not done so yet, load the Intel compiler module. Then, load the OpenMPI compiler module.
- Compile the code idsmpi.cpp using the following syntax:
    mpicc -std=c++11 -O2 -Wall -Wextra -Wno-long-long -o idsmpi idsmpi.cpp
- Execute the program with 1 node. Then, execute it with 2 nodes. What happens?

Exercise 4. OpenCL: Go to the opencl directory under your example codes directory.
- Request a compute node.
- Look for the opencl module.
- Load the opencl module.
- Use the module help option to find out how to use the opencl module.
- Compile the OpenCL code hello_world.c and execute it.

Exercise 5. CUDA:
- Go to the cuda directory under your example codes directory.
- Request a compute node with --gpu.
- Inspect the compileCUDA.sh script.
- Execute it and make sure that the deviceQuery.cpp program has been compiled.
- Run the deviceQuery program.

2 Running programs in Batch Mode on FTII

We saw in Section 1 how to run programs using a node in interactive mode.
However, this is only useful for testing your code during development or for small, short tests that do not require a large amount of resources. An additional drawback of the interactive mode is that, unless you are using a tool like screen, you cannot disconnect from your SSH session while your program is running; if you do, the program will be killed.

When running a simulation, users will typically use batch mode. As seen in the course slides, a script specifies the parameters of a job to be sent to the workload manager (Slurm at CESGA), as well as the job itself.

Exercise 6. Let us revisit the Sysbench benchmark.
- Write a Slurm script to send Sysbench to a queue and run it sequentially (parameter --threads=1).
- Re-send the job but change the --threads=1 parameter to --threads=4. Has the execution time changed?
- Re-send the job but now adapt the Slurm script to take into account that we are running Sysbench in parallel, with 4 threads.
- Check the state of your jobs in the queues with sacct.

Exercise 7. OpenMP: Write a Slurm script to:
- Compile the pi_loop.c OpenMP code with the Intel compiler.
- Run it with 8 cores in 2 nodes.

Exercise 8. MPI: Write a Slurm script to:
- Compile the idsmpi.cpp code.
- Execute the MPI program with 16 processes within 1 node.
- Execute the MPI program with 16 processes, 2 nodes and 8 processes per node. Try without specifying any partition in the script and then using the thinnodes partition.
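
As a possible starting point for the Sysbench batch exercise above, the parallel case might be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution: the job name, time limit and output file name are hypothetical, ./bin/sysbench assumes the install location produced by the interactive build, and the gcc module name is assumed to match the one loaded interactively. No partition is requested.

```shell
#!/bin/bash
#SBATCH --job-name=sysbench-4t        # hypothetical job name
#SBATCH --nodes=1                     # a multithreaded run stays on one node
#SBATCH --ntasks=1                    # a single process...
#SBATCH --cpus-per-task=4             # ...that is given 4 cores for its threads
#SBATCH --time=00:05:00               # hypothetical time limit
#SBATCH --output=sysbench-%j.out      # %j expands to the job ID

# Slurm exports SLURM_CPUS_PER_TASK inside the job; fall back to 4
# when the script is run by hand outside Slurm.
THREADS="${SLURM_CPUS_PER_TASK:-4}"

# Assumed location of the executable built interactively (override via SYSBENCH).
SYSBENCH="${SYSBENCH:-./bin/sysbench}"

# Assumption: the benchmark needs the same gcc module that was used to build it.
command -v module >/dev/null 2>&1 && module load gcc

echo "Requested threads: $THREADS"
if [ -x "$SYSBENCH" ]; then
    # Thread count matches the number of cores requested above.
    "$SYSBENCH" cpu --events=10000 --threads="$THREADS" run
else
    echo "sysbench not found at $SYSBENCH; nothing was run" >&2
fi
```

Saved under a hypothetical name such as sysbench_4t.sh, the script would be submitted with sbatch sysbench_4t.sh and its state checked with sacct. Tying --threads to --cpus-per-task keeps the number of threads and the number of reserved cores in step, which is the point of the third step of that exercise.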
