
Introduction to Linux for HPCC users

Xiaoge Wang, ICER [email protected] Jan. 17, 2019

Goal

• Prerequisite for “Introduction to HPCC”
• Learn the basic Linux commands commonly used by HPCC users
• Understand simple Linux shell scripts

How does this class work

Task-driven learning method:
– Description of a task
– Demonstration of a solution by the instructor
– Hands-on exercises

Use the sticky notes provided to help me help you.
– No sticky = I am working
– Green = I am done and ready to move on (yea!)
– Red = I am stuck and need some help

Agenda

• Introduction
• Linux
  – Part one:
    • Connect to HPCC
    • Set up your working environment
    • Run your program
  – Part two:
    • Scripts
    • Other useful stuff
• Summary

Introduction

[Figure: Shell]

Task 1: Connect to HPCC

• Task: connect from your PC to the HPCC development nodes using your account.
• Solution: three steps
  – Step one: get/open your client
    • Terminal session
      » Windows users: MobaXterm
      » Mac users: Terminal
      » Linux users: Terminal
    • Web-based remote desktop: http://webrdp.hpcc.msu.edu
  – Step two:
    • $ ssh -X [email protected] (rsync.hpcc.msu.edu)
  – Step three:
    • $ ssh dev-intel18
• Note:
  – The first command, “ssh”, is often referred to as “login” or “connect”.
  – Read the messages on the login page.
  – How to report a problem: include client, gateway, node, account, error message, time, and location.

Exercise 0: connect to HPCC

• Task:
  – Connect to one of the dev nodes; make sure you read the login messages.
  – Run
    $ cp -R /mnt/home/class0/Intro2Linux_Jan_2019 .
    You should see “Intro2Linux_Jan_2019” in your directory.

Task 2: Set up working environment

• GET the system information
  – System
  – Software
  – Storage
• SET up your working place
  – Obtain software packages
  – Prepare data
  – Set environment variables

Get system info

• System and kernel: $ uname -a
• Number of cores: $ nproc --all
• Size of RAM: $ free -h
• CPU info: $ lscpu
• PCI devices info: $ lspci
• Workload: $ , $
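The system queries on this slide can be collected into one small function, so the same report is easy to repeat on another node. This is only a sketch: the name `show_sys_info` is made up for illustration, and `free` and `lscpu` are checked first because they may not be installed on every machine.

```shell
#!/bin/bash
# show_sys_info is a hypothetical helper name, not an HPCC command.
show_sys_info() {
    echo "== System and kernel =="
    uname -a
    echo "== Number of cores =="
    nproc --all
    # free and lscpu are skipped quietly when they are not installed.
    if command -v free >/dev/null 2>&1; then
        echo "== Memory =="
        free -h
    fi
    if command -v lscpu >/dev/null 2>&1; then
        echo "== CPU info =="
        lscpu
    fi
}

show_sys_info
```

Run it once on the gateway and once on a dev node to see how the two machines differ.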

Get software info

• View the system-wide available software
  $ module list      # currently loaded
  $ module avail     # all available
  $ module spider    # search for a module
• Where are they installed?
  $ ls /opt/software
• Which version am I using?
  $ which python
  $ python --version
• Get examples of some software packages
  $ getexample
  $ getexample example_name
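The "which version am I using?" check works for any command, not just python. A minimal sketch, assuming the tool accepts `--version`; the helper name `tool_info` is hypothetical, and `bash` is used as the example only because it is present on practically every Linux system.

```shell
#!/bin/bash
# tool_info is a hypothetical helper: it reports where a command is found
# on PATH and the first line of its --version output.
tool_info() {
    tool_path="$(command -v "$1")" || { echo "$1: not found on PATH"; return 1; }
    echo "$1 -> $tool_path"
    "$1" --version 2>&1 | head -n 1
}

tool_info bash
```

After a `module load`, running the same check again should show a different path if the module provides its own copy of the tool.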

Storage on HPCC

• Structure of storage
  – Spaces (disk partitions)
    • Home: /mnt/home/NetID
    • Scratch: /mnt/gs18/scratch/users/NetID
    • Research: /mnt/research/
  – Directories (folders)
  – Files
    • Filename: absolute or relative
  – Shortcuts
    • ~, $HOME, $SCRATCH, .., ./
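The difference between absolute and relative names, and what the shortcuts stand for, is easiest to see by trying them. The sketch below works in a throwaway directory created with `mktemp`; the directory names `project` and `data` are invented for the example.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing in your real home is touched.
workdir="$(mktemp -d)"
mkdir -p "$workdir/project/data"

cd "$workdir/project/data"      # absolute path: starts with /
cd ..                           # relative: .. is the parent directory
in_project="$(pwd)"
cd ./data                       # relative: ./ is the current directory
cd ~                            # ~ is a shortcut for $HOME
in_home="$(pwd)"

echo "project directory: $in_project"
echo "home directory:    $in_home"
rm -rf "$workdir"
```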

Storage info

• Location, size, usage and content of the space?
  – $ quota
  – $ df -h /mnt/research/helpdesk
  – $ du -h --max-depth=1 /mnt/research/icerdesign
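The size and usage queries can be tried on a small directory of your own before pointing them at a large research space. A sketch assuming GNU `du` (for `--max-depth`); the directory and file names are invented, and `quota` is left out because it depends on the site setup.

```shell
#!/bin/bash
# Build a small directory tree to measure (a stand-in for a research space).
space="$(mktemp -d)"
mkdir -p "$space/raw" "$space/results"
head -c 200000 /dev/zero > "$space/raw/big.dat"     # about 200 kB of zeros
echo "done" > "$space/results/summary.txt"

# Free and used space of the file system that holds the directory.
df -h "$space"

# Size in kB of each first-level subdirectory, smallest first.
du -k --max-depth=1 "$space" | sort -n
```

The last line of the `du` output is the total for the whole directory.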

Navigation

• Where am I? (print working directory)
  – $ pwd
• Change directory
  – $ cd
• Using shortcuts
  • $ cd
  • $ cd ..
  • $ cd $HOME
  • $ cd $SCRATCH

Exercise 1:

• Use commands to get the system info of csn-001 and css-002, including system and kernel, number of cores, size of RAM (memory), CPU info, and GPU info.
• Find the size and contents of your home directory and your scratch space.
• (optional) What is the version number of the compiler “c++”? Could you replace it with a higher version? (Look into module GNU.)
• (optional) Is the software “Allinea” available on our system?

Set working place

1. Obtain software packages (concept of PATH)
   – Software installation (not covered)
   – Module load: $ module load
2. Prepare data
   – About a file
   – Make a directory for files: $ mkdir
   – Transfer files from other locations (will be covered in “Intro to HPCC”)
   – Get or create files
   – Search files
3. Set environment variables
   – $ export PATH=/mnt/home/wangx147/bin:$PATH
   – $ export OMP_NUM_THREADS=4
   – $ printenv
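What `export` does to PATH and to a program setting such as OMP_NUM_THREADS can be seen with a tiny program of your own. This is a sketch under assumptions: the program `show_threads` and its temporary directory are invented for the example, and the temporary directory must allow running programs.

```shell
#!/bin/bash
# A private bin directory with one tiny program in it (names are made up).
mybin="$(mktemp -d)"
printf '#!/bin/sh\necho "threads: ${OMP_NUM_THREADS:-unset}"\n' > "$mybin/show_threads"
chmod +x "$mybin/show_threads"

# Put the directory at the front of PATH so the shell finds the program by name.
export PATH="$mybin:$PATH"

# Exported variables are inherited by every program started from this shell.
export OMP_NUM_THREADS=4
show_threads

# printenv shows the value as a child process sees it.
printenv OMP_NUM_THREADS
```

Without the `export` keyword the variable would stay private to the current shell and `show_threads` would report it as unset.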

All about a file

• Attributes:
  – Access permission
  – Owner
  – Group
  – Size
  – Modification time
  – Name
• Example:
  $ ls -l undelivered.err
  -rw-rw-r-- 1 wangx147 staff-np 0 Aug 2 16:33 undelivered.err

File Name

• File name:
  – Full name, base name, directory name
    $ readlink -f, $ basename, $ dirname
  – Case sensitive
  – Avoid special characters / | \ < > # ! $ % & * ( ) [ ] { } ` “ ‘ / ; , ~ ./
• Shortcuts: Example: .., ., tab, $HOME, $SCRATCH
• Change file name: $ mv old_name new_name
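The three parts of a file name, and a rename, can be tried safely in a temporary directory. A minimal sketch; the directory `results` and the file names are invented for the example, and `readlink -f` is assumed to be the GNU version found on Linux.

```shell
#!/bin/bash
workdir="$(mktemp -d)"
cd "$workdir"
mkdir results
echo "sample" > results/old_name.txt

# Full (absolute) name, directory part, and base name of the same file.
full="$(readlink -f results/old_name.txt)"
dir="$(dirname "$full")"
base="$(basename "$full")"
echo "full name: $full"
echo "directory: $dir"
echo "base name: $base"

# Rename with mv. Names are case sensitive, so Old_Name.txt would be a different file.
mv results/old_name.txt results/new_name.txt
ls results
```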

File Access Permission

• Example:
  [class0@dev-intel14-k20 ~]$ ls -l
  total 66
  -rw-r--r-- 1 class0 temporary 48672 Sep 5 2012 cheatsheet.pdf
  -rwxr-xr-x 1 class0 temporary 660 Jan 1 2015 Colorfull.sh
  drwxr-xr-x 2 class0 temporary 3 Oct 12 17:06 Documents
• Fields, from left to right:
  – File type: - = file, d = directory, l = symbolic link
  – Permission for the owner of the file
  – Permission for the members of the group
  – Permission for all users
  – Number of links to directory content
  – Owner
  – Group
  – Size
  – Time
  – Name
• Change permission: $ chmod

Change the access permission

Change the access permission

$ chmod
– Octal number: $ chmod 775 filename
– Symbols: (u|g|o|a) (+|-|=) (r|w|x) filename
  $ chmod g+w filename

Get or Create Files

• Get files from other places
  – $ cp source_file dest_file (cp -r directory)
  – $ wget web_address
  – $ curl web_address
  – More methods will be covered by “Intro to HPCC”
• Create files locally
  – Create directory: mkdir
  – Editor: nano
  – Redirect standard output to file: >, >>, >&, 1>, 2>
  – Null device: /dev/null
• Note:

          || visible in terminal ||   visible in file   || existing
  Syntax  ||  StdOut  |  StdErr  ||  StdOut  |  StdErr  ||   file
==========++==========+==========++==========+==========++===========
    >     ||    no    |   yes    ||   yes    |    no    || overwrite
    >>    ||    no    |   yes    ||   yes    |    no    ||  append
   2>     ||   yes    |    no    ||    no    |   yes    || overwrite
   2>>    ||   yes    |    no    ||    no    |   yes    ||  append
   &>     ||    no    |    no    ||   yes    |   yes    || overwrite
   &>>    ||    no    |    no    ||   yes    |   yes    ||  append
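The redirection table can be checked with a command that writes to both streams. In this sketch the function `talk` and the file names are invented; `> file 2>&1` is used as the portable spelling of bash's `&> file`.

```shell
#!/bin/bash
workdir="$(mktemp -d)"
cd "$workdir"

# A command that writes one line to standard output and one to standard error.
talk() {
    echo "this is stdout"
    echo "this is stderr" >&2
}

talk >  out.txt 2>  err.txt    # stdout and stderr to separate files (overwrite)
talk >> out.txt 2>> err.txt    # same, but append
talk >  both.txt 2>&1          # both streams into one file
talk >  /dev/null 2>&1         # discard everything

wc -l out.txt err.txt both.txt
```

Each of the three files ends up with two lines: `out.txt` and `err.txt` each received one line from the first call and one from the second, while `both.txt` received both streams of a single call.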

Look into Files

• Open a text file without clicking the mouse
  – cat, head, head -n, tail, tail -n, more, less, gview
• Search for a particular pattern in files
  – grep, grep -r, grep -i, grep -v
  – Inside a “more”, “less” or “man” display: use /
• Other properties of a file
  – wc, wc -l

Search for Files

• Search for a file by
  – Name
  – Size
  – Time
  – Type
• Example: find all versions of perl
  $ find /opt/software/Perl -name perl
  $ find ~/Document -executable

Set/get environment variables

• Environment variables
  – System: HOME, SCRATCH, PATH
  – Program: OMP_NUM_THREADS
• Example: openmp_exercise
  – getexample
  – See the results of running the program
  – Set PATH

Exercise 2

(1) Get a cheatsheet from the Internet address http://steve-parker.org/sh/cheatsheet.pdf;
(2) Create a directory “workshop” that contains it;
(3) Create a file “List_commands” that records all the files in the directory /usr/bin;
(4) View the contents of the data file “polls.csv”. Find out how many polls were recorded from Michigan.
  – Go to the Data directory and view the file
  – Use “MI” to find the data recorded from Michigan.

Task 3: run your program

• Launch a program locally on a development node without clicking the mouse
  – “cd” to the working location
  – Set environment variables if needed
  – Specify input files
  – Specify output files
• Launch batch jobs to run on compute nodes
  – sbatch (not covered here)
• Example
  $ matlab -nodisplay -r test_control_system

Monitor your program

• Monitoring your running processes
  – $ ps -u username
  – $ ps -u username -o pid,%cpu,%mem,cmd
  – $ top

Get examples

• Load module
  $ module load hpc_examples
• Get list of available examples
  $ getexample
• Get an example
  $ getexample example_name

Exercise 3:

• Run example: openmp_exercise
  – getexample openmp_exercise
  – Read “README”
  – See the results of running the program with different numbers of threads.
• Run example: helloworld
  – getexample helloworld
  – Read “README”
  – Follow the instructions to run the example

LET'S TAKE A BREAK

From Command to Script

Reference: https://www.gnu.org/software/bash/manual/bash.html

Shell script

Combine several commands for more sophisticated or complicated tasks.
• Variables
  – Environment: $HOME, $HOSTNAME, $PATH, $SCRATCH, $PWD
  – Internally defined
  – Defined from command-line arguments
• Expressions, expansion
• Control flow (pipeline, branch, loops)
• Execution (source vs. direct run)

Shell Expansion

• Brace expansion { }
  – https://www.gnu.org/software/bash/manual/bash.html#Brace-Expansion
• Arithmetic expansion $(( ))
• Filename expansion
  – Pattern match: ?, *, […]
• Command substitution
  – $(command), `command`

Control Flow: sequence of commands

• Execution of a sequence of commands
  – Create a script file
  – See get_sys_info.sh

Control Flow: Pipeline

• Connect several commands in sequence, using the standard output of one command as the standard input of the next command
• Example: count the number of files under /bin.
  – Solution 1:
    $ ls /bin > bin_list
    $ wc bin_list
  – Solution 2:
    $ ls /bin | wc
• Learned:
  – If the standard output of a command can be used as the standard input of the next command, a pipeline can be built
  – Commonly used commands for pipelining:
    • grep, wc, less, more, head, tail, …
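Both solutions can be run side by side to confirm that they give the same count. A small sketch: `wc -l` is used instead of plain `wc` so that only the line count is printed, and the temporary file made with `mktemp` stands in for `bin_list`.

```shell
#!/bin/bash
# Solution 1: go through a temporary file.
list_file="$(mktemp)"
ls /bin > "$list_file"
count_via_file="$(wc -l < "$list_file")"
rm -f "$list_file"

# Solution 2: connect ls to wc directly with a pipe.
count_via_pipe="$(ls /bin | wc -l)"

echo "via file: $count_via_file"
echo "via pipe: $count_via_pipe"
```

The pipe version needs no intermediate file and nothing to clean up afterwards, which is why it is usually preferred.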

Examples of pipeline

• Filtering output
  Ex: $ ps aux | grep wangx147
• Sorting the output
  Ex: $ ls -l | sort
• Counting
  Ex: $ ls | wc -l
• Better viewing
  Ex: $ ps | more
• Longer pipeline
  Ex: $ | cut -d" " -f 1 | sort | uniq | wc -l

Exercise 4: Pipeline

• Task: find out how many jobs are in the job queue and, among them, how many are running.
  1. Count the total jobs in the job queue. Use “squeue” to get the list of jobs in the queue.
  2. Count only the running jobs. Use the keyword “ R ” to filter the jobs.
  3. *Find the total number of jobs of a user in the queue.
  4. *How many users currently have jobs in the queue? (hint: use “squeue -o %u” to get the user list)

Control Flow: branch

• Control the workflow according to a certain condition
• Format
  if [ condition ]; then
      … do something ...
  else
      … do something else here ...
  fi

Example 5: Branch

• Task: write a script “scratch2home” that takes a filename as input and backs up the file from scratch space to home space. The copy is needed only if the file is newer on scratch.
• Condition: file is newer (search “file based condition in shell script” or https://linuxacademy.com/blog/linux/conditions-in-bash-scripting-if-statements/)
• Learn:
  – Expression of condition
  – Branch

Exercise 5: Branch

• Task: follow Example 5 to write a script “home2scratch” that takes a filename as input and copies the file from home space to scratch space if the file is newer on home space.

Control Flow: Loop

• Run something repeatedly
• Format
  – For loop:
    #!/bin/bash
    for i in $( ls ); do
        echo item: $i
    done
  – While loop:
    #!/bin/bash
    COUNTER=0
    while [ $COUNTER -lt 10 ]; do
        echo The counter is $COUNTER
        let COUNTER=COUNTER+1
    done
  – Until loop:
    #!/bin/bash
    COUNTER=20
    until [ $COUNTER -lt 10 ]; do
        echo COUNTER $COUNTER
        let COUNTER-=1
    done

Example 6: Loop

• Task: similar to Example 5, except that it takes a directory name as input and backs up a whole directory. Write a script “scratch2home_dir”.
• Loop count
  – Static/dynamic
• Learn:
  – Loop

Exercise 6: Loop

• Task: similar to Exercise 5, except that it takes a directory name as input and copies the files in the directory from home space to scratch space if any files are newer. Write a script “home2scratch_dir” to do it.
• Follow Example 6

Make Your Own Command

• Make the shell script “executable”
  $ chmod +x script_name
• Be aware of the “environment”
• Execution
  – source script_name    # run in the current shell
  – ./script_name         # run in a new shell
  – . script_name         # same as source

Self Learning

• Task: find out which command shuffles the lines of a data file “pet_store.csv”, then create a file “pet_shuffled.csv”.
• Commands
  – man, --help, man -k
  – Google search “how to shuffle lines in a file linux shell”, “random permutation in linux shell”
• Learn:
  – How to find a command
  – How to get details of a command
  – Command “shuf”

Exercise 8: Self Learning

• Task: find out how to sort the lines of a data file, then sort the file “polls.csv” into “polls_sort.csv”.
  1. Find out if there is a command that can sort the file.
  2. Try to sort the data file “polls.csv”.
  Note: Do not sort the first line!
  Note: use “sort -d polls.csv”

Summary

• Linux learnt
  – Commands
    • Navigation
    • Get or create files
    • Organizing files
    • Look into files
    • Search files
    • File attributes
    • Online help or manual
  – Scripts
    • Pipeline
    • Make your own command
    • Environment of a shell

Q & A

Thank You!

Please turn in the survey sheet.