Quick viewing(Text Mode)

LINUX) Computer in Three Parts

LINUX) Computer in Three Parts

Science on a () computer in three parts

Introductory short couse, Part I

Thorsten Becker University of Southern California, Los Angeles September 2007 Contents

● Part one: and computers

● Part two: Scientific and data analysis

● Part three: Matlab practical Purpose of this short course

● Introduce UNIX-based (e.g. LINUX, Mac OSX) computing and scientific work flow environments by providing pointers for further information ● Describe what I think are best practices in moderate to high-performance computing – I will judgments and provide specific recommendations – I cannot possibly provide a comprehensive, fair, or entirely up to date overview Typographic conventions

important:

– links to web based information are blue

● UNIX (or shell) commands and program names you might the line are written in bold Contents part I

● UNIX (or LINUX, used synonymously here): what and why ● The system and managers ● Shell environment ● Editing files ● Command line tools ● Scripts and GUIs ● Type setting, publishing, layout UNIX: What is UNIX?

● an that originated in the 70ies ● build for multi-user, multi-tasking, scalable (that was new way back then) ● runs on all computing hardware, including iPOD ● many flavours, free: LINUX, BSD (a version made it into OSX), Solaris (SUN) ● they are all kind of the same thing, your mileage may vary (e.g. directory structure) ● there is convergence between LINUX, Mac OS, and Windows look and feel UNIX: Why use UNIX?

● can use same tools and programs on laptop, workstation, and supercomputer ( important if virtualization is available) ● flexible, modular, powerful ● seamless integration of and F90 programs, shell commands, and post-processing (UNIX is written in C) ● all important numerical tools and libraries are available ● LINUX is open (security!), and ubiquitous UNIX: Why (not) use UNIX?

UNIX UNIX

r WINDOWS t e r o f o f p e

WINDOWS UNIX: UNIX/LINUX references

● UNIX reference card ● Robbins: UNIX in a nutshell ● Siever at al: LINUX in a nutshell ● Online list of UNIX commands ● Local computing center documentation and lecture notes ● google on UNIX: commands, shell scripts, , , ... UNIX: This overview

● describes typical, ca. anno 2005, scientific workplace set-up in natural sciences ● fairly low-level, close to machine ● tries to not – spend a lot of time on point-and-click GUIs – discuss vapor ware – discuss cutting-edge programs (with unclear support situation and user base) ● will be out of date tonight : Gnome File system: Graphical Window managers and tools

● GNOME ● KDE ● Provide support and interface with other apps (search, web, access files on other servers, etc.) File system: Hardcore: The actual file system

● user versus super-user (administrator) setup ● structure of files within directories: – /usr/local has – /dev has devices – /home/$USER has all the user's files, might be subdivided into folders like – – – – – – – /mnt/data/ might hold shared data/storage File system: But where is it? The shell

● open a shell to get a command line ● type commands, such as to list the contents of a directory

Even if you regularly use Mac OS-X or GNOME, some knowledge of the background can save the day! File system: Naming conventions

● suffixes indicate type of file: file.dat, file.c, file.f, file.f90, file.awk, file.txt, file.tex, file. (and determines helper applications) ● UNIX is case sensitive ● normally, use lower case for files and directories ● some symbols (e.g.: *, %, ?) are special, if you want those literally you got to quote (\*, \%, \?) ● different quotes (”, ', `) have different meanings File system: ls: list contents of directories

becker@jackie:~ > ls calendar data dokumente idl_gmt plates public_html RCS subduct TEX unison.log CITCOM Desktop evolution ioffice mylibs progs quakes Screenshot.png teaching tmp becker@jackie:~ > ls ­F ­l total 6500 ­rw­r­­r­­ 1 becker users 1638 Jun 17 07:39 calendar drwxrwxr­x 4 becker users 4096 Jun 17 07:39 CITCOM/ drwxr­xr­x 35 becker users 4096 Jul 12 15:22 data/ drwx­­­­­­ 2 becker users 4096 Jul 26 17:20 Desktop/ drwxr­xr­x 25 becker users 4096 Jul 12 07:48 dokumente/ drwx­­­­­­ 7 becker users 4096 Jun 17 11:53 evolution/ drwxr­xr­x 3 becker users 20480 Jul 27 15:00 idl_gmt/ drwxr­xr­x 3 becker users 4096 Jun 17 07:39 ioffice/ drwx­­­­­­ 2 becker users 4096 Jul 7 12:21 mail/ drwxr­xr­x 15 becker users 4096 Jun 17 07:46 mylibs/ drwxr­xr­x 12 becker users 4096 Jun 17 07:46 plates/ drwxr­xr­x 12 becker users 4096 Jun 17 07:46 progs/ drwxr­xr­x 27 becker users 12288 Jul 18 19:20 public_html/ drwxr­xr­x 4 becker users 4096 Jun 17 07:47 quakes/ drwxrwxr­x 2 becker users 4096 Jun 17 07:39 RCS/ ­rw­r­­r­­ 1 becker users 35775 Jul 27 16:15 Screenshot.png drwxrwxr­x 5 becker users 4096 Jun 17 07:47 subduct/ lrwxrwxrwx 1 becker users 19 Jun 17 07:39 teaching ­> dokumente/teaching// drwxr­xr­x 29 becker users 4096 Jul 26 17:39 TEX/ lrwxrwxrwx 1 becker users 12 Jun 16 17:28 tmp ­> /mnt//tmp/ ­rw­­­­­­­ 1 becker users 6508582 Jul 27 15:01 unison.log File system: Commands have options

● command output and workings can be modified by adding -x (or x for ) ● ls: – ls -F – ls -la ● usually, you can do “command --” to learn ● often, there are long version: ls --all --full ● man (RTFM): “man command” File system: File system commands I

: copy files (will normally overwrite!) – cp filea fileb ● : remove files (for real!) – rm goneforever.dat – rm -i goneforever.dat ● : make directories – mkdir new_dir/ ● : change directories (cd ..; cd -; cd ~) ● : current directory File system: File system commands II

● scp: copy files across machines – scp filea [email protected]:~/directory/fileb ● more: display files – more filea.dat ● : create (symbolic) links (shortcuts in Windows) – cd new_dir – ln -s ../old_dir/ . – soft vs. hard: deletion of hard deletes file File system: Using regular expressions

● * (all): cp *.dat new_dir/

● [pat] (pattern): cp file[1-5].dat new_dir

● ? (single letter/number): cp file??.dat new_dir

● rm -rf * (DON'T TRY IT, IT WORKS) File system: Permissions

­rw­r­­r­­ 1 becker users 1638 Jun 17 07:39 calendar { {

u { user group size ctime filename g a

● first character: - (file), d (directory), l (link) ● r: read w: x: execute or list ● u: user g: group a: all o: other – u+x file – chmod a+r *.dat – chmod -R o-rwx my_stuff ● , id: output of user and group Shells: The environment

● shells: interpret your commands when logged in and using a terminal session ● csh, tcsh: for interactive stuff, syntax close to C, command completion, auto correction ● , ksh: nice for programming ● shells use mostly same commands, but there are differences in the script languages, e.g. – export var=100 (bash) – setenv var 100 (csh) Shells: Can use variables, and many are predefined (csh example)

becker@jackie:~ > setenv region 0/360/­90/90

becker@jackie:~ > $region 0/360/­90/90

becker@jackie:~ > echo $HOME /home/becker

becker@jackie:~ > echo $USER becker

becker@jackie:~ > BIBINPUTS=.:/home/becker/TEX//: CFLAGS_DEBUG=­g ­DDEBUG ­DDEBUG ­DLINUX_SUBROUTINE_CONVENTION LDFLAGS=­posixlib ­nofor_main ­Vaxlib ­L/usr/lib/gcc/i386­redhat­linux/3.4.3/ ­lg2c ­lm MANPATH=/home/becker/progs/man/:/home/becker/progs/man/: DVIPSHEADERS=/home/becker/TEX//: SUPPORTED=en_US.UTF­8:en_US:en SSH_AGENT_PID=31881 HOSTNAME=jackie.usc.edu DXROOT= DXMEMORY=128 CONFC=ifort =jackie.usc.edu SHELL=/usr/bin/tcsh FFLAGS_DEBUG=­g ­DDEBUG ­fpp ­nofor_main ­DDEBUG

.... Shells: Source startup scripts

● ~/.login (= $HOME/.login) at startup ● ~/.cshrc every time you start a shell ● those scripts are where you define environment variables and aliases you want to use in all sessions – rm 'rm -i' – setenv F77 ifort; setenv FFLAGS “-O3 -ipo” ● see references on UNIX and dotfiles.com Shells: A few lines from my .tcshrc

# set architecture flag, e.g., ip27 for IRIX and i686 for Pentium # setenv ARCH ` ­m | gawk '{print(tolower($1))}'` # # hostname without domain # setenv myhostname `hostname | gawk '{($1,a,".");print(a[1])}'` # if ( $ARCH == "i686" ) then # # Pentium/Xeon Linux system # # GMT etc setenv GMT_VERSION GMT3.4.5 #setenv GMT_VERSION GMT4.0 set local_gmt_path = /usr/local/src/${GMT_VERSION}/ set local_netcdf_dir = /usr/local/src/netcdf­3.5.0/

... set rmstar set corect=cmd set autocorrect set nobeep set prompt=”%B%n@%m:%b%~\n> “

.... Shells: Command and other feature that save typing

● use up, down, left, right arrows to navigate and edit commands on the command line ● use a bunch of tricks to access and modify last commands, e.g. – !n: execute last command that starts with “n” ● auto-completion (TAB key) ● auto-correction ● many more tricks Shells:

● ps: list currently running processes ● jobs: list current jobs (processes started from shell in background) ● running commands in background – emacs & (or: emacs; CTRL-Z; bg) – echo mybigjob.exe | (don't quit with shell) – %2 (kill the second job running, % are job IDs) – kill -9 12344 (kill with PID 12344) ● : show machine load Shells: Job control example

becker@jackie:~ > ps PID TIME CMD 1758 pts/5 00:00:00 tcsh 2500 pts/5 00:00:00 ps becker@jackie:~ > ps aux | becker 1413 0.0 0.0 4348 1004 ? S 15:10 0:00 /bin/sh /usr/bin/realplay /tmp/youfm_cms.ram becker 1418 3.2 1.1 79664 12000 ? Sl 15:10 3:24 /usr/local/RealPlayer/realplay.bin /tmp/youfm_cms.ram becker 1420 0.0 0.5 25720 5260 ? S 15:10 0:00 /usr/local/RealPlayer/realplay.bin /tmp/youfm_cms.ram becker 1421 0.0 0.5 25720 5260 ? S 15:10 0:00 /usr/local/RealPlayer/realplay.bin /tmp/youfm_cms.ram becker 1642 0.0 0.1 5200 1776 pts/3 Ss+ 16:03 0:00 ­csh becker 1703 0.0 0.1 6624 1764 pts/4 Ss+ 16:08 0:00 ­csh becker 1758 0.0 0.1 5328 1984 pts/5 Ss 16:14 0:00 ­csh becker 1888 0.0 1.6 27332 17228 ? S 16:26 0:01 /usr/lib/acroread/Reader/intellinux/bin/acroread ­display :0.0 ­name main ­visual default +useFrontEndProgram ­xrm *useNullDoc:false ­progressPipe 3 ­xrm *noPrivateColormap:true ­xrm *exitPipe:4 becker 2501 0.0 0.0 3032 772 pts/5 R+ 16:54 0:00 ps aux becker 2502 0.0 0.0 4212 532 pts/5 R+ 16:54 0:00 tail becker@jackie:~ > kill 1413 ... Shells: Cluster job control

● on large, parallel machines one typically runs batch schedulers or queing systems ● this allows distributing jobs and utilizing resources efficiently ● PBS – qsub myjob.exe -tricky_options -q large – qstat | $USER – pbstop – qdel job-ID Editors: Editing text or ASCII data files

: old school: fast, efficient, bizarre – controlled by typing commands like !w, /text – good for minor editing tasks, required for admins ● emacs: best overall tool – GUI, menus – flexible, expandable – bizarre ● tons of others, but don't use Word or such, since UNIX expects pure ASCII characters Editors: What (x)EMACS looks like UNIX tools: Command line tools for file management

● more, less: display files page by page interactively ● : display file ● : display first few lines of file ● tail: guess ● : align files with columns row by row ● paste file1.dat file2.dat ● : count words, lines, and bytes of file UNIX tools: Pipes and redirection (ksh example)

● >: redirect stdout, <: stdin, 2>: stderr ● >>: append, |: pipe – cat file1.dat > combined.dat – cat file2.dat >> combined.dat – cat file1.dat | wc ● myconvectioncode.exe < input.dat ● echo Whatever! > /dev/null ● mycode > log.dat 2> error.dat UNIX tools: grep and

● grep: patterns in file – grep my_function *.c | more – grep -ni my_function.*c (disregard case and list line ) ● sort: sort row data – sort -n +2 file.dat ● : only print unique lines – sort -n splitting.dat | uniq > stations.dat UNIX tools: awk and

● awk: (or gawk) powerful language for ASCII data and text manipulations – like C, interpreted at run time – the best thing since sliced bread ● cat file.dat | gawk '{print($2,cos($5))}' or ● gawk '{print($2,cos($5))}' file.dat ● sed: streaming text editor – sed 's/Bush/Kerry/g' file.dat > new_file.dat ● : more powerful, more complex UNIX tools: compression and dealing with big files

: ASCII files, which can be huge, to binary – compress: gzip file – uncompress: gunzip file.gz ● can write gzipped from within C, can use gunzip on the fly (zcat): this allows using nice ASCII tools such as awk while storing things compactly ● : smaller files, takes longer UNIX tools: storage and backup

● tar: package multiple files into one file or tape drive (e.g. to backup or send across internet) – pack: tar cvf package.tar file_dir/* – display contents: tar tf package.tar – : tar xvf package.tar ● Unison file synchronizer (to laptop and workstation) ● backup all the time (it's easy to screw up big time), or have your admin backup for you Please don't forget to backup UNIX tools: Getting smart: a few tricks

● unpacking on the fly: – gunzip -c newsoftware.tgz | tar xv ● interpreting commands on the fly: – echo $variable_a `cat file.dat | gawk -f mean.awk` ● tcsh interactive functionality: – foreach f ( *.ps) ● convert $f $f:r.gif – end Scripts: Scripts and GUIs

● scripts are the opposite of point-and-click ● need to work hard once to generate template ● benefit forever if you want to produce more products (e.g. plots) using different parameters, or if the data has changed ● automate research galore (needed to explore parameter space) ● scripts can serve as documentation of steps taking to analyse data and produce results Scripts: An example script

#!/bin/bash # # run run_fstrack for different models # models=${1­"pmDsmean_nt pmDnngrand_nt saf1 saf2 saf3 "} strains=${2­"2 1 0.5"}

# PBS queues to use q1="becker64";q2="scec" c=0 for m in $models;do cd $m for s in $strains ;do if [ $c ­eq 1 ];then queue=$q1;c=0 else queue=$q2;c=1 fi # regional ../run_fstrack 1 0 0 0 $s 1 2 1 ­14 0 60 "" 1 $queue done cd ­ done Scripts: scripting languages

● csh, tcsh ● bash ● perl ● python ● Tk ● Scripted programs – gnuplot – Matlab ● Script whenever you can (because you'll want to reproduce things exactly) Scripts: Visual scripting languages: Tcl/TK (e.g. iGMT), GTK, Tkinter, Qt Scripting

● Can be the way to go if individual processing steps are not time-sensitive ● If speed is an issue, need to compile from higher level language such as script ● For basic LINUX automatization tasks, bash and awk are very useful ● Python seems to be a nice middle ground for more advanced projects Virtualization: NX export your workstation to, like, wherever Virtualization: run any OS (e.g. Windows) in, like, whatever USC geosys Wiki I USC geosys Wiki II The next lecture will be on

● Programming – common languages – philosophy – compiling, debugging, make, version control – C and F77 interfacing – libraries and packages ● Number crunching ● Scientific visualization ● Scientific typesetting (why not to use Word, yet, if you need equations)