Linux for scientists or: What can I do at the black screen?

Lennart C. Karssen
PolyΩmica, The Netherlands
[email protected]

October 2018*

* Git information: Hash: 30e1cf0 Date: 2018-10-15

Contents

List of Tables x

1 Preface  3
  1.1 About this book  4
  1.2 Acknowledgements  5

2 What is Linux? 7

3 The basics  11
  3.1 Logging in and out  12
    3.1.1 X11 forwarding: allowing application windows to 'travel' from the server to your PC  13
  3.2 Editors  14
  3.3 The structure of Linux commands  16
    3.3.1 Exercises  18
  3.4 Managing your account  20
  3.5 Getting help  20
  3.6 Working with files and directories  22
    3.6.1 Directories  25
    3.6.2 Copying, moving, removing  26
    3.6.3 Wildcards  28
    3.6.4 Exercises  29
  3.7 Transferring files from one Linux machine to another  32
  3.8 Pagers, or how to look at the contents of a file  33
    3.8.1 Exercises  34
  3.9 Using compressed archives like .zip and tar.gz files  35
    3.9.1 zip  36
    3.9.2 gzip  36
    3.9.3 tar  37
    3.9.4 Exercises  37
  3.10 File ownership and permissions  38
    3.10.1 Ownership  38
    3.10.2 Permissions  39
  3.11 Process management  40
    3.11.1 Exercises  41
  3.12 Miscellaneous commands  44
    3.12.1 wget: downloading files to the server  44
    3.12.2 sort  45
    3.12.3 uniq  47
    3.12.4 wc: counting words and lines  48
    3.12.5 date  48
    3.12.6 du: disk space usage  50
    3.12.7 Differences between files  50
  3.13 Input and output redirection  53
    3.13.1 Redirecting to and from files  53
    3.13.2 Redirecting output of one command to another  55
  3.14 Aliases and creating your own commands  57

4 Working with text files  63
  4.1 Converting between Windows and Linux  64
    4.1.1 Exercises  65
  4.2 grep: finding text  66
    4.2.1 Exercises  69
  4.3 sed, the Stream EDitor  70
    4.3.1 Exercises  71
  4.4 cut: selecting columns  72
  4.5 GAWK: more fun with columns  73
    4.5.1 Exercises  76
  4.6 Putting it all together  78
    4.6.1 Exercises  78

5 Writing Bash scripts  83
  5.1 A simple script  84
  5.2 Using variables  87
  5.3 Using shell variables in GAWK  91
  5.4 Loops, for and while  92
  5.5 if-clauses and tests  97
  5.6 Arrays in Bash  99
  5.7 Dealing with errors in your script  102

6 Working with the SGE queue system  107
  6.1 Submitting jobs to the SGE queues  108
    6.1.1 Quick and dirty  109
    6.1.2 Using a submission script  109
    6.1.3 Refinements to the submission script  110
  6.2 Monitoring progress  111
  6.3 Deleting jobs from a queue  112
  6.4 Getting info on a finished job  113
  6.5 Interactive jobs  115
  6.6 Exercises  115

7 Good scripting practices, structured programming and data management  121
  7.1 Code layout  123
    7.1.1 Indentation  123
    7.1.2 Line length  124
    7.1.3 Spaces  125
  7.2 Comments  127
  7.3 Variable names  128
  7.4 File and directory names  128
  7.5 Summary  129

8 Where to go from here?  133
  8.1 More advanced topics  134
  8.2 Further reading  135

A Answers to the exercises 139

B Reference Card of Basic Linux Commands 161

C List of acronyms 169

Bibliography 174

Index 178


List of Tables

3.1 Basic Emacs keyboard shortcuts  15

5.1 Operators for comparison in Bash  100


Chapter 1

Preface

1.1 About this book

This book was initially a collection of lecture notes written for the GE14 "Linux for Scientists" course given at the Erasmus University Medical Centre (ErasmusMC), Rotterdam as part of the NIHES MSc programmes. It focuses on using the Command Line Interface (CLI); if you are using Linux on your desktop you will have noticed that tons of other programs with a Graphical User Interface (GUI) (e.g. the Firefox web browser) are available as well, so this course does not discuss the complete ecosystem of applications for Linux.

This book is split up into several chapters, in roughly the same order as presented in the lectures, ranging from basic file and directory management to Bash scripting and working with the SGE batch queue system. Some exercises are labelled with two or more stars indicating their difficulty level. Some use files and/or tools that are pre-installed on the epib-genstat.erasmusmc.nl servers on which this course was initially taught and are therefore less easy to do on other Linux (or Unix-like) systems, but in general these exercises should be quite portable.

Several exercises build on results of previous exercises (e.g. a previously created directory structure, previously copied files, etc.), so it is advisable to work through them in order. Answers to the questions are given at the end of each section or chapter. Remember that there usually are more ways to arrive at the same result a) and that the solution presented in the answers is not necessarily the best one. Also note that several of the answers contain additional information and tips. Be sure to read those!

A $ at the beginning of a line in the output indicates the command line prompt. You don't have to type it; it should already be visible on the command line after logging in. The symbol ←- indicates that the command (or the output) is continued on the next line but (in the case of input) should be entered on one line.

a) People familiar with the Perl scripting language may know the abbreviation TIMTOWTDI (pronounced Tim Toady), which stands for: “There is more than one way to do it”.


This document is licenced under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please contact the author if you would like to obtain a different license.

Please note that this document is under active development. Feel free to contact the author for the latest edition, to report mistakes and typos, or to offer other suggestions.

1.2 Acknowledgements

First and foremost: Aaron Isaacs, back in 2010 you asked me if I could teach some Linux/command line tricks to the members of the Genetic Epidemiology group at the ErasmusMC, where I was working at the time. This gave me the final push to set up a practical, hands-on Linux course for scientists.

Cornelia van Duijn, as head of the Genetic Epidemiology Group at the ErasmusMC you allowed me to spend time working on this book. In the course of the next three years the foundations of this book were laid. Thank you for giving me this opportunity.

Other GenEpi group members, Najaf Amin, Maarten Kooyman, Elisa van Leeuwen, Sara Willems, Ayşe Demirkan, Carla Ibrahim-Verbaas, thank you for being part of my first audience, my test users, and for the examples of real-world use cases you provided.

Finally, Yurii Aulchenko, colleague and business partner, your suggestion to make this the first book published under the PolyΩmica umbrella was what made me convert and expand the lecture notes to the book you are now reading.

Chapter 2

What is Linux?

Linux, or more precisely GNU/Linux, is an operating system (OS), usually combined with some additional programs. Other operating systems you might know are Microsoft Windows, Apple macOS and Google's Android.

GNU/Linux basically consists of two parts: the Linux kernel and the GNU tools. A kernel is the piece of software at the heart of each OS. It gives each program the memory it requires while making sure that two programs don't use the same piece of memory at the same time. It allots each of the programs that are running on the computer their time slot on the processor and also takes care of low-level communication between the different peripherals of a computer. For example, if you press a key on the keyboard, it is the kernel that stops the current program, takes time to find out which key was pressed and sends that information to the program you were running. Then the kernel allows the program to continue and process the key you pressed. The Linux kernel was started in 1991 a) as a hobby project of Linus Torvalds, a Finnish computer science student [2, 3]. He published his code under the GNU General Public License (GPL), a copyleft licence developed by the GNU project.

In the 1970s the PC (Personal Computer) did not exist yet. Computers were large machines occupying a complete room. Most of them used Unix as OS, augmented by many programs written by the scientists using the computers. Many of these programs were distributed by their authors, but the core Unix programs were owned by companies like AT&T.

The GNU project b) was started in 1984 by Richard Stallman. The project's goal was to create an OS that was freely c) available. The GNU tools were written to be compatible with (but not exact copies of) the Unix tools known at the time, hence the meaning of the (recursive) GNU acronym:

a) A short video that commemorates the 20th anniversary of Linux can be found in Ref. [1].
b) http://www.gnu.org
c) Note that there are two definitions of "free": free as in freedom and free as in "a free beer". The free Stallman referred to was the former one. As a user of the software you are free to do with it what you want. You can modify it, copy it and use it on as many devices as you like. Basically the only restriction is that if you make a change to the source code of the program and distribute that, you'll need to make those changes public as well.

GNU's Not Unix. However, the GNU project never succeeded in writing its own kernel. The rest should not come as a surprise. People using and developing the Linux kernel needed tools to make their OS do something useful, the GNU project needed a kernel, and, well, the rest is history. Nowadays Linux powers more than 50% of the web servers [4] and more than 90% of the world's supercomputers [5], and it has a small market share on desktop computers. However, most people get in contact with Linux through their mobile phones: Google's Android OS is also based on the Linux kernel (but without the GNU tools).

Because Linux and the GNU tools both use the GPL, anybody can download them, put them together, add their own customisations and redistribute the new set of packages. This is exactly what happened. These sets of packages are called distributions and there are many. Many companies use Red Hat Enterprise Linux d), while Ubuntu Linux e) and Linux Mint f) are used on many home computers and laptops.

Incidentally, Mac OS X is also based on a Unix derivative (called BSD) and as a result many of the things you will learn in this course can also be applied to Apple’s OS. You normally don’t notice the CLI because it is hidden by the GUI, but it is definitely there. Try looking for the Terminal application.

d) http://www.redhat.com/rhel
e) http://www.ubuntu.com
f) http://linuxmint.com

Chapter 3

The basics

To make an OS usable you do not only need a kernel but, as mentioned in § 2, also a set of tools. The most important one is the so-called shell. The shell is a program that allows you to interact with the computer. In fact, it is the part you are looking at most of the time: it is the black screen in which you enter the commands. Most Linux distributions use the Bash shell by default. Examples of other shells are the Z-shell (zsh), the C-shell (csh) and the Korn shell (ksh). Most shells share the same set of basic commands but differ in more advanced functionality. This course assumes you use Bash.

3.1 Logging in and out

To log on to the server we make use of the Secure Shell (SSH) protocol a). On a Linux computer or an Apple Mac you would type

ssh [email protected]

to connect to a server. On Windows the MobaXterm program b) does the same thing. PuTTY is a popular alternative, but it lacks certain convenient features, like tabs for multiple connections. Whichever way you choose to connect to a Linux machine, you will end up with a screen (also referred to as the terminal) with the following text:

username@servername:~$

This is known as the prompt. It tells you who you are (remember that Linux is a multi-user system), on which computer or server you are (handy when you have multiple connections to different servers) and in which directory you are. The ~ is a shortcut for your home directory. The final $ denotes the end of the prompt. From here you can start typing commands.

a) SSH encrypts the communication between your PC and the server. This way a malicious hacker cannot intercept the commands you are typing. This is similar to the https connection to your online banking account in your web browser.
b) You can download it from http://mobaxterm.mobatek.net/; see § 2.4 on https://epib-genstat.erasmusmc.nl/ for instructions on how to install and configure MobaXterm.


And when you are tired of it all, the commands exit and logout will close the SSH connection to the server.

3.1.1 X11 forwarding: allowing application windows to 'travel' from the server to your PC

For some applications it may be handy to enable so-called X11 forwarding, which allows windows of applications that run on the server to be shown on your local screen. Think, for example, of the plot window in R. Most, if not all, tools and utilities discussed in this course do not use X11 forwarding, so feel free to skip this section.

Whether or not X11 forwarding is enabled by default when you log in depends on how you connect to the Linux server. If you use MobaXterm from Windows this is done automatically; with PuTTY this is not the case and additional software is required. When connecting from another Linux PC using the ssh command you need to add the -X option in order to enable X11 forwarding:

ssh -X [email protected]

Those who connect from macOS using the ssh command need the -X option as well, but most often also need to install additional software.

To test whether X11 forwarding works, try the following command:

xclock

This should show a window with a clock on your local computer. Depending on the speed of the connection between your computer and the server it may take some time (from a few seconds to a minute) for the clock to show up. Note that the window may be hidden below other windows. As long as the xclock application is running, you will not be able to type new commands; in fact, the prompt with the $ symbol is not shown until you close the xclock window. If, for some reason, the $ doesn't show up and neither does the xclock window, you can type Ctrl-c to kill the xclock command and return to the prompt.


3.2 Editors

When working on a Linux system two programs are the most important: the shell we just mentioned and your text editor. The better you get to know them both, the more you will gain in terms of efficiency and power over your data. This course focuses on working with the Bash shell, but an editor is needed when writing scripts, README files, etc.

Most Linux systems have a range of editors installed. The most common ones are:

• emacs: A powerful editor, my personal choice. Runs R interactively, similar to Tinn-R on Windows. An Emacs reference card can be found at http://www.ic.unicamp.br/~helio/disciplinas/MC102/Emacs_Reference_Card.pdf, but one is also installed along with Emacs c).

• vim (or its family members vi and gvim): Another powerful editor. It has two modes, one in which you type your text, the other for commands like open file, save, search/replace, etc. A vi reference card can be found at http://web.mit.edu/merolish/Public/vi-ref.pdf.

• nano: A very simple editor, easy to use. However, I strongly recommend learning how to use one of the above.

You might wonder why I encourage you to learn how to use a text editor. You could say "I can stick to Windows' Notepad for editing my scripts, input files and the like." And indeed you could. However, that means that you need to transfer every file you want to edit from the server to your Windows PC, and why go through that extra effort? Also, Notepad is very simple, whereas Emacs and Vim have a lot of additional features like syntax highlighting for various scripting languages, advanced search and replace options, etc. Moreover, by creating files on two different systems you have to keep an eye on your bookkeeping, "Am I running this script on the latest file, or is that still on my Windows PC?", as well as make sure you won't run into problems with differences in the way Windows and Linux store files (as will be explained in § 4.1).

c) You can find it by typing locate refcard.pdf on the command line. Look for the one simply called refcard.pdf in a directory that has emacs in its name. On my computer it is located in /usr/share/emacs/24.2/etc/refcards/refcard.pdf

Both editors have a GUI with buttons and menus as well. If you use MobaXterm you will be able to use these. This is especially handy in the beginning, when you haven't mastered all the keyboard shortcuts yet (although I urge you to learn those as well). Emacs checks to see if it can open a GUI at start-up; if you want to run the non-GUI version explicitly, start Emacs with the -nw option (which stands for 'no windows'): emacs -nw filename. The GUI version of Vim is called gvim. Incidentally, Vim and Emacs can run on Windows as well (in case the keyboard shortcuts of your editor have become ingrained in your spinal cord and you never want to see Notepad again).

Table 3.1 lists some of the basic keyboard shortcuts for Emacs that will get you through this course.

Action                                          Shortcut
"find" file, i.e. open/create a file            C-x C-f
save the file                                   C-x C-s
write the file with alternate name (Save as)    C-x C-w
exit Emacs                                      C-x C-c
cancel a command                                C-g
undo                                            C-/
go to beginning of line (Home)                  C-a
go to end of line (End)                         C-e
search forward                                  C-s
search backward                                 C-r
use the menus at the top (File, Edit, etc.)     F10 or M-`

Table 3.1: Basic Emacs keyboard shortcuts. Note that C stands for the Ctrl key, and M is the Meta key (on modern keyboards you can use either the Alt key or the Esc key). The combination C-x means pressing both the Ctrl and x key at the same time and M-` means pressing the Esc key and the ` key (that is the backtick character, usually found on the same key as the ~).


3.3 The structure of Linux commands

In general Linux commands have the following structure:

command option(s) arguments

Options are keywords that modify the way the command works. Arguments usually indicate what (e.g. which file or directory) the command has to operate upon. Long options are preceded by two dashes. For example, to list all files (i.e. including the hidden ones) in the present directory you can use the following command

$ ls --all

For those of you that have used the plink tool before, the following command will be familiar. It shows three options, two of which have an argument:

$ plink --file inputfile --freq --out newfile

Here --file, --freq and --out are options; they modify the default behaviour of plink. inputfile is the argument for the --file option and newfile is the argument for the --out option. In most cases the order in which the options are listed does not matter, as long as you keep the arguments together with their respective options. The command

$ plink --freq --out newfile --file inputfile

is OK and identical to the previous one, whereas

$ plink --out --freq --file newfile inputfile

will result in error messages. Some programs allow short forms of their options as well (this can save you a lot of typing!). Short options consist of a single dash followed by a single letter. For example, the --all option to the ls command can be abbreviated as -a. Consequently, the following two commands are identical:


$ ls --all
$ ls -a

In most cases the order of the options is not important and multiple single-letter options can be strung together. Again, the following two commands result in the exact same output:

$ ls -a -h -l
$ ls -ahl

Arguments usually tell the command to operate on certain files or direc- tories:

$ ls *.pdf
$ ls /var/log/

The first line lists only the files ending in .pdf in the current directory (cf. § 3.6.3 for an explanation of the *) and the second line lists all files and directories in the directory /var/log. Since a space is used to separate commands, options and arguments, it needs to be 'escaped' by using the \ character when used in a file or directory name, or by enclosing the whole name in double quotes. This means that if you want to make a directory with the name My ←- Scripts are in here it has to be done in either of the following two ways:

$ mkdir My\ Scripts\ are\ in\ here
$ mkdir "My Scripts are in here"

Furthermore, it is important to remember that commands and file and directory names are case sensitive in Linux (unlike in Windows, where C:\Program Files is equal to C:\program files). One very important feature of the shell is Tab-completion. The concept is very simple: you can use the [TAB] key to complete file names and several commands while working on the command line. This saves an enormous amount of typing! For example, if you have a directory called MyWork and you want to use ls to list all the files in that directory, simply type


ls M

and hit the [TAB] key. The shell will try to complete the directory name as far as possible. If MyWork is the only directory that starts with an M it will complete it fully. If, however, you also have a directory called MyPapers, then the result of hitting the [TAB] key will be

ls My

Hitting [TAB] again will show all possible completions (in this example MyWork and MyPapers). Add a W to what you have typed and hit [TAB] again to complete the directory name. As we move further through this course you will find that the commands will grow longer and longer. To keep an overview while typing a long command you can insert a \ followed by [ENTER] at any point between a command and its options to continue typing on the next line:

$ some_command --option1 \
> --another-option some_argument \
> --yet-another-option another_argument

Note that the > should not be typed. It is inserted by the shell to inform you that you haven't finished the command yet.

3.3.1 Exercises

Exercise E3.1 Long and short options
In this section we saw that the ls command accepts options in both long and short form.

a) What is the difference between running

$ ls --all

and

$ ls -all


Exercise E3.2 Working with the command history

With so many commands to remember, working on the command line can seem daunting. Luckily there is a sort of memory: a history of your past commands is saved and ready for you to use.

a) Use the history command to list the recent commands you typed. Use the up and down arrow keys to cycle through some of your recent commands. Once you have found an interesting command you can try to edit it and run it again. Note that the history does not remember in which directory you were when you executed a command.

b) Using the up and down arrow keys is fine if you are looking for a recently used command. For commands that are higher up in the history list this becomes too cumbersome. For those situations you can search through the history with the Ctrl-r (reverse search) shortcut. Simply hit Ctrl-r (i.e. press and hold the Ctrl key, then press the r key and release both) and start typing some part of the command that you remember. For example, if you are looking for the command

$ ls /storage/imputations/ERF3_HM2_Mach_2010.10.01/Results/

that you typed some time ago, you can hit Ctrl-r followed by typing ERF3. However, if after typing the above command you had typed mkdir ERF3, then this entry would show up first. Hit Ctrl-r again to cycle through all entries that contain ERF3 until you reach the one you are looking for, or continue typing your search criterion until it becomes unique. Once you have found the command you want to reuse you can either press the Enter key to rerun the command or change the command according to your wishes. To cancel your search simply press Ctrl-c.
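While searching, the prompt changes to show the best match so far. As a sketch (the matched command will of course depend on your own history):

(reverse-i-search)`ERF3': ls /storage/imputations/ERF3_HM2_Mach_2010.10.01/Results/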


3.4 Managing your account

By now you have hopefully logged into a Linux system. Since many Linux servers can be accessed directly from the Internet it is important to keep your account information (user name and password) secret. Changing your password regularly (at least once a year) is a good security measure. To change your password use the passwd command.
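A passwd session looks roughly like this (the exact wording of the prompts differs from system to system). You type your current password once and the new one twice; none of them are echoed to the screen:

$ passwd
Changing password for lennart.
(current) UNIX password:
New password:
Retype new password:
passwd: password updated successfully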

In order to make sure that no single user can use up all disk space (accidentally or not), every user on a server has a certain disk quota allotted for his/her files. You can check your quota status with the quota command:

$ quota -s
Disk quotas for user lennart (uid 1305):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
       /dev/md0    206G    250G    255G            4800       0       0

The interesting columns are columns 2, 3, 4 and 5 (you can forget about the last four columns). The blocks column shows your current usage (206 GB in this case) and the quota column shows your maximum. You will be notified by e-mail if you exceed it.

After exceeding this maximum disk space you will have 7 days of ‘grace’ (the number of days left is noted in the grace column), in which you can still use some more space (up to the value in the limit column, 255 GB in this example). Once the grace period has expired you will not be able to create any more files. This can even prevent you from logging in!

3.5 Getting help

Most human beings won't be able to remember all the options for all the commands they use on a regular basis. And what about the commands you only use once in a while? How do you find out which options are available, what the actual use of a certain command is, what it expects as input, etc.?


To remind you of the most commonly used options, as well as the expected input and output, most commands have a --help option. This is the (truncated) output for ls, for example:

$ ls --help
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort.

Mandatory arguments to long options are mandatory for short options too.
  -a, --all                do not ignore entries starting with .
  -A, --almost-all         do not list implied . and ..
      --author             with -l, print the author of each file
  -b, --escape             print C-style escapes for nongraphic characters
      --block-size=SIZE    use SIZE-byte blocks. See SIZE format below
  -B, --ignore-backups     do not list implied entries ending with ~

The first line explains the usage of the command, i.e. what to type on the command line, followed by a short two-line description. After that comes a list of most of the options, mentioning both the long and the short form (if available).

For more detailed information most commands also have a manual page, which is shown on screen by the man command. This is the start of the man page for ls:

LS(1) User Commands LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List information about the FILEs (the current directory by ←-
       default). Sort entries alphabetically if none of -cftuvSUX
       nor --sort.

       Mandatory arguments to long options are mandatory for short
       options too.

       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..

       --author
              with -l, print the author of each file

       -b, --escape
              print C-style escapes for nongraphic characters

To browse through the man pages use the arrow keys, PgUp, PgDn, etc. To quit the man page browser use the q key d).
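If you don't know the name of a command, you can also search the short descriptions of all man pages by keyword with man -k (equivalent to the apropos command). For example (the exact list of hits depends on what is installed on your system):

$ man -k compress
gzip (1)  - compress or expand files
zip (1)   - package and compress (archive) files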

And, of course, the Internet is a great resource. If you are struggling with a command Google Is Your Friend (GIYF).

3.6 Working with files and directories

On Windows and GNU/Linux systems files are ordered in directories. In contrast to Windows with its drive letters (C:, D:, etc.), directories on a Linux system are ordered in a single tree starting at the root, indicated by /:

d) In fact, the program used to view these man pages is less, a so-called pager that will be discussed in more detail in § 3.8


/
├── bin
├── etc
├── home
│   ├── user1
│   ├── user2
│   └── user3
│       ├── Project1
│       └── MyPapers
├── tmp
└── usr
    ├── bin
    └── local

As mentioned in § 3.3, the ls command is used to list the contents of directories. If entered without arguments it simply lists the contents of the present working directory (pwd). The following options will get you by most of the time:

-a  list all files, i.e. including hidden files e)
-l  long list: this listing shows the time and date a file was modified, its size, permissions and the user and group of the owner of the file or directory
-h  usually used in combination with -l; it shows the file size in 'human readable' format, i.e. in MB, GB, etc. instead of in bytes
-d  use when the arguments are only directories, not files; list information on the directory itself, not its contents (see below)
-t  sorts the entries by modification time

e) On a Unix system all files and directories whose name starts with a dot are hidden.


If the argument of ls consists of only (one or more) directories, ls shows only the information of the files in that directory, not of the directory itself. The -d option changes this behaviour:

$ ls dir1 dir2
dir1:
dir1_2 file1 file2 file3

dir2:
file11 file12 file13
$ ls -l dir1 dir2
dir1:
total 16
drwxr-x--- 2 lennart genepi 4096 2011-10-12 11:09 dir1_2
-rw-r----- 1 lennart genepi    4 2011-10-12 11:12 file1
-rw-r----- 1 lennart genepi    7 2011-10-12 11:12 file2
-rw-r----- 1 lennart genepi  614 2011-10-12 11:13 file3

dir2:
total 12
-rw-r----- 1 lennart genepi  201 2011-10-12 11:13 file11
-rw-r----- 1 lennart genepi  603 2011-10-12 11:14 file12
-rw-r----- 1 lennart genepi  340 2011-10-12 11:14 file13
$ ls -d dir1 dir2
dir1 dir2
$ ls -ld dir1 dir2
drwxr-x--- 3 lennart genepi 4096 2011-10-12 11:09 dir1
drwxr-x--- 2 lennart genepi 4096 2011-10-12 10:59 dir2

The output of the -l option also warrants some more explanation. Consider the following:

1  lennart@server:~/ErasmusMC$ ls -l
2  total 712
3  drwxr-xr-x  6 lennart genepi  4096 2011-09-29 18:57 Articles
4  drwxr-x---  6 lennart genepi  4096 2011-07-06 09:07 Conferences
5  drwxr-x---  5 lennart genepi  4096 2011-08-29 18:18 Courses
6  drwxr-x---  2 lennart genepi  4096 2010-11-29 19:26 Graphics
7  drwxr-x---  7 lennart genepi  4096 2011-09-27 18:26 MeetingNotes
8  -rw-r-----  1 lennart genepi 50231 2011-09-02 14:57 notes.org
9  drwxr-xr-x  3 lennart genepi  4096 2011-05-23 16:21 PaperReviews
10 drwxr-x--- 18 lennart genepi  4096 2011-09-27 18:14 Projecten
11 drwxr-x---  4 lennart genepi  4096 2011-06-21 01:21 R-dev
12 drwxr-x--x 14 lennart genepi  4096 2011-09-02 14:58 ServerBeheer
13 drwxr-x---  2 lennart genepi  4096 2010-07-28 13:44 snipextract
14 -rw-r--r--  1 lennart genepi   753 2011-05-12 18:07 todo.txt

The output consists of eight columns, the last one being the file or directory name. Column one shows the permissions (which will be discussed in § 3.10.2), columns three and four show the owner and group of the file (§ 3.10.1). Column five shows the size of the file in bytes (unless the -h option is also used). Finally, columns six and seven show the modification date and time, respectively.

3.6.1 Directories

Each user has a home directory for his/her own files. All home directories are subdirectories of /home and are named after the user's user name, i.e. /home/your_username. After logging in you will find yourself in your home directory. You can check this by running the pwd command, which shows your present working directory (pwd). The tilde (~) is used as a shortcut for the path to your home directory.

To move into another directory use the cd command, which stands for 'change directory'. The cd command can be followed by either a relative path or an absolute path. An absolute path starts from the root (/) directory, e.g. /home/lennart/Programming or /tmp. A relative path uses the pwd as starting point to go up or down the directory tree and consequently doesn't start with a /. Take a look at the following examples and notice that here the whole shell prompt is shown (instead of just $) to indicate how the shell prompt changes to show your pwd:

1 lennart@server:~$ cd ErasmusMC/
2 lennart@server:~/ErasmusMC$ cd Courses
3 lennart@server:~/ErasmusMC/Courses$ cd /tmp
4 lennart@server:/tmp$ cd /home
5 lennart@server:/home$ cd /
6 lennart@server:/$ cd ~
7 lennart@server:~$ cd ErasmusMC/Conferences/2011
8 lennart@server:~/ErasmusMC/Conferences/2011$ cd ..
9 lennart@server:~/ErasmusMC/Conferences/$

The paths used in lines 3, 4, 5 and 6 are absolute paths, the others are relative paths. Line 8 shows a special kind of relative path, the .., which means the parent directory (i.e. one level up in the tree). It can be used just like any other relative path, so cd ../.. means "go up two levels in the directory hierarchy". The pwd also has a shortcut: . (i.e. a single period), so the command cd . does nothing, since it just means "go to the directory you are already in". A more interesting use of the . shortcut is to explicitly indicate the pwd, for example when copying:

$ cp data_chr*.dat ./Project1/data/

which means “copy the files starting with data_chr and ending in .dat to the directory Project1/data, which is a subdirectory of the pwd”. Note that the following statement is equivalent:

$ cp data_chr*.dat Project1/data/

Another use case, running scripts from the pwd, will be explained in Chapter 5, page 85. Directories can be created with the mkdir command and removed with the rmdir command. Both accept the names of one or more directories as arguments. Note that the rmdir command only removes empty directories, and will give a warning if the directory is not empty. Use the -r option of the rm command to recursively remove a directory and its contents (the rm command is explained in more detail in § 3.6.2).
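For example (a small sketch, assuming a directory dir1 does not exist yet):

$ mkdir dir1           # create an empty directory
$ mkdir dir1/subdir1   # create a directory inside it
$ rmdir dir1           # refuses: dir1 is not empty
rmdir: failed to remove `dir1': Directory not empty
$ rm -r dir1           # removes dir1 and everything in it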

3.6.2 Copying, moving, removing

Managing files and directories revolves around the concepts of copying, moving and removing. Files and directories can be copied using the cp command, which needs (at least) two arguments. The last one is always the destination, the ones before that are the source files (or directories).


For example, to copy file1 to a new file with name file2 type:

$ cp file1 file2

This copies files file1 and file2 to the directory dir1 (which must already exist):

$ cp file1 file2 dir1

To copy file1 and file2 to the directory dir1, which is located at the same level in the directory tree as our pwd, type:

$ cp file1 file2 ../dir1

Be careful: by default the cp command doesn't warn you when a file with the same name exists in the destination; it simply overwrites it! If you add the -i option when using the copy command, you will be asked for confirmation if the destination already exists.

The most used option for the cp command is the -r option, which allows one to copy directories and files recursively. If you have tried to use a directory as the first argument you have already noticed the following error message:

$ cp dir1 dir2
cp: omitting directory `dir1/'

The -r option fixes this, as we can see with the ls command.

$ cp -r dir1 dir2
$ ls -ld dir1 dir2
drwxr-x--- 2 lennart genepi 4096 2011-10-12 10:57 dir1
drwxr-x--- 2 lennart genepi 4096 2011-10-12 10:59 dir2

Moving a file (or directory) is very similar to copying it: the mv command is followed by at least two arguments, the source and the destination.

$ mv file1 dir1        # Move file1 into directory dir1
$ mv file1 file2 dir2  # Move file1 and file2 into dir2

Unlike cp, mv works recursively by default:

$ mv dir1 dir2


moves dir1 (and its contents) into dir2, assuming dir2 exists. If dir2 does not exist, dir1 will effectively be renamed to dir2. The same holds for files: if you move one file and the destination file does not exist, the effect of the move is that the file is renamed. Like the cp command, mv also overwrites an already existing destination file without warning. As in the case of cp, use mv -i to get a prompt before overwriting. And now we come to the more dangerous part: removing files and directories. It is very important to realise that Linux expects people to know what they are doing. Consequently, there is no such thing as the Trash folder on Windows: once a file is deleted, it is gone!! The command for removing files is rm, which accepts one or more file names as arguments. Like cp, rm does not work recursively by default, and as with cp, the -r option enables it. Be careful, as -r will also remove directories. Similarly, the -i option asks for confirmation.

$ rm file1 file2   # file1 and file2 are deleted
$ rm dir1          # Won't work, whether dir1 is empty or not
rm: cannot remove `dir1': Is a directory
$ rm -r dir1

The last line removes dir1 and all the files and subdirectories within it!

3.6.3 Wildcards

Wildcards make it easier to manage multiple files at a time. These are the two wildcards and their use:

*: matches any number of characters (including none)
?: matches exactly one character

The following examples show how to use wildcards.

• Show all files in the pwd:

$ ls *

• Show all files ending in pdf in the pwd:


$ ls *.pdf

• Show all files that start with chr and end with .dat:

$ ls chr*.dat

• Show all files that start with chr, have two characters in between and end with .dat:

$ ls chr??.dat

• Copy all files ending in .dat to a Backups directory (which is located in the pwd):

$ cp *.dat Backups/

Note that when using cp or mv, wildcards can only be used in the first argument (the source), not in the second (the destination). This makes sense, of course, since most of the time you'd like to copy (or move) a set of files to one directory and not one file to a set of directories.

Using wildcards, especially in combination with the rm command, can be dangerous. Before you know it you type

$ rm * old.pdf # DANGEROUS, DON'T DO THIS!

instead of

$ rm *old.pdf

In the second case you delete all the files ending in old.pdf, whereas in the first case all files in the pwd are deleted first and then rm tries to delete the file old.pdf (which it probably can't find).
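A safe habit is therefore to first check what a wildcard pattern expands to, for example with echo or ls, before handing the same pattern to rm. The file names below are of course only an illustration:

$ echo *old.pdf
report_old.pdf slides_old.pdf
$ rm *old.pdf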

3.6.4 Exercises

Exercise E3.3 Some file and directory basics
In this exercise you will learn to work with directories: how to create them, how to remove them and how to move from one to the other. Don't forget to use Tab-completion as much as possible.


a) Go to your home directory.

b) Create a directory called LinuxCourse.

c) Go into that directory and create two other directories called tmp2 and tmp3.

d) Go to the tmp2 directory.

e) Go to the tmp3 directory in one step.

f) Go back to your home directory (again in one step).

g) What is the shortcut to go to the previous directory? Use it.

h) What is the absolute path of your present directory? Which command will print it on the screen for you?

i) Go to the root of the file system (/) and list the files and directories there.

j) Go into the /tmp directory and list its contents (although the contents will not be very interesting).

k) Use a relative path to go back to the LinuxCourse directory (in one step of course).

l) Check if the two directories tmp2 and tmp3 are still there and then remove them.

Exercise E3.4 Copying files
What is wrong with the following command? (Try to think about it before typing in the command and trying it.)

$ cp file1 file2 file3

Exercise E3.5** Creating a directory tree


a) Find out how to create a "branch" of directories in one command. For example, make the following directory structure (starting in the LinuxCourse directory):

dira
└── dirb
    └── dirc

b) Standing in the parent directory of dira how would you remove these three directories?

Exercise E3.6 Getting information on files and directories
The ls command lists files and directories (and their properties).

a) What is the size of the file with the name vmlinuz followed by some numbers in the directory /boot f)?

b) Go to the directory /var and list the names of the user and group that own the directories

mail
crash
local

Exercise E3.7** The dangers of wildcards
Wildcards in combination with rm can have devastating results!

a) Without entering the following commands (watch out, doing so is EXTREMELY DANGEROUS!!), can you tell the difference in effect that they have?

1 rm -r *~
2 rm -r * ~

f) A vmlinuz file is the Linux kernel (see § 2).


3.7 Transferring files from one Linux machine to another

Files can be copied to and from the server with the scp command g). A copy action looks just like a regular cp command:

$ scp source destination

but with scp either the source or the destination consists of a username@server part followed by a colon and the file(s) to copy from or to, respectively. For example, the command

$ scp local_file ←- [email protected]:dir/on/the/server/

copies a local file to the server and puts it in the specified directory. Copying a file from the server to your own computer works similarly:

$ scp [email protected]:~/path/to/the/file ←- ~/a/local/dir/

When copying directories, including the files they contain, the -r option must be used:

$ scp -r some_dir/ ←- [email protected]:the_target_directory/

For transfers of many and/or large files the rsync command is better suited, because if something goes wrong and the transfer is aborted h), rerunning the scp command will start from scratch again. rsync is much smarter about these cases and will transfer only (parts of) files that have not been sent yet. An rsync command looks just like an scp command:

$ rsync -azP source destination

g) This works also when copying to and from an Apple computer with Mac OS X.
h) As mentioned in Chapter 6, tasks taking more than 10 minutes of processor time will be killed unless the batch queue system is used.


The -a option preserves file properties like dates, owner (if possible), etc. and sets the recursive option for directories. The -z option turns on compression of the data (which can decrease transfer times for some files) and the -P option adds progress information and keeps partially transferred files. As with scp, either the source or the destination can be a remote location specified as username@servername:path/to/files.
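Putting this together, a transfer of a directory of results to the server used in this course could look as follows (a sketch; adjust the user name and paths to your own situation). Note the trailing slash after Results/: for rsync this means 'the contents of the directory' rather than the directory itself:

$ rsync -azP Results/ ←- [email protected]:~/Project1/Results/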

3.8 Pagers, or how to look at the contents of a file

In § 3.2 we discussed how to use an editor on the server to edit your files. Sometimes, however, you are only interested in seeing the contents of a file, without the need to edit it. In such a case loading the file into an editor is of course possible, but there are faster ways.

The simplest way to display the contents of a file is using the cat command:

$ cat timings.dat
nids t_1 t_2 t_3
50 0.5 1.4 4.63
100 0.62 0.21 18.19
200 0.9 0.4 10
500 2.4 5.19 171.09
1000 9.08 54.93 1653.25
1500 19.9 320.12 7201.39
2000 36.25 1086 18500.61
2715 74.95 2181.3 31432.52

Although cat is simple and straightforward, it has some serious limitations. If the contents are larger than the size of the screen you'll need to scroll back, and for files that are larger than the terminal's buffer size you won't be able to scroll back all the way to the top.

To remedy this, pagers come to the rescue: more and less are the pagers that can be found on any Linux system. These programs are called pagers because they allow the user to view the contents of a file screen by screen, i.e. page by page. To view a file called longfile using either of these pagers simply give it as an argument:

$ more longfile

Use the space bar to scroll screen by screen and the q key to exit.

Usually less is to be preferred, because it allows one to easily browse up and down, as well as to search in the contents i). Again, use the q key to exit less. In § 3.9 you will learn how to deal with compressed (zipped) files. To read the contents of a gzipped file it is not necessary to unzip it first; simply use the zless program.

Two other commands also greatly help with getting a quick glimpse of the contents of a file: head and tail j). Without additional options these commands show the first or the last ten lines of a file, respectively. A different number of lines can be given as argument to the -n option:

$ head -n 1 timings.dat
nids t_1 t_2 t_3
$ tail -n2 timings.dat
2000 36.25 1086 18500.61
2715 74.95 2181.3 31432.52

3.8.1 Exercises

Exercise E3.8 Working with less

a) Open the man page of less.

You can browse through the man page with the up and down arrows, [PgUp], [PgDn], etc., but if you have an idea of what you are looking for it is easier to search for it. Searching in the forward direction is done by pressing the / key; then type your search term and hit [ENTER]. Hitting the / and [ENTER] keys repeatedly will go to the next hit, etc.

i) The man command uses less to display the man page on screen.
j) The R language has similar functions: head() and tail().

b) Which key is used to search backward?

Exercise E3.9 "Seeing a file grow"
Imagine that you have an analysis that takes a long time to finish but writes some output to a file at every step (e.g. a ProbABEL [6] run, which writes the outcome for each SNP on a new line). To follow progress you could open the file every once in a while to see where it is, but once the file is loaded (in less, more or even your editor), the output would continue to grow and you would need to reload the file constantly for the updates to arrive. Another way would be to run tail every few minutes to only show the last few lines k).

k) Incidentally, the way to do this automatically is via the watch program.

To save you time, tail has an option that follows the contents of a file and shows you each line that is appended. Which option is that?

3.9 Using compressed archives like .zip and tar.gz files

On a Linux system the following file formats are commonly used for compressed archives:

.zip     A regular zip file, also recognised on Windows without the need for extra software.
.gz      A compression format that only contains a single file.
.tar.gz  A gzipped tar archive, which can contain multiple files. This format is the most common.


3.9.1 zip

Zip files are created and extracted using the zip and unzip commands, respectively. In the following example the directory MyProjectDir (including every file and subdirectory, because of the -r option) is first zipped into a file called my_archive.zip. Subsequently, the original directory is removed, followed by the unzipping of the archive to restore the directory and its contents.

$ zip -r my_archive.zip MyProjectDir
updating: MyProjectDir/ (stored 0%)
updating: MyProjectDir/testfile1 (stored 0%)
updating: MyProjectDir/testfile3 (deflated 54%)
updating: MyProjectDir/testdir1/ (stored 0%)
updating: MyProjectDir/testdir1/testdir2/ (stored 0%)
updating: MyProjectDir/testfile2 (stored 0%)
updating: MyProjectDir/testdir3/ (stored 0%)
$ rm -r MyProjectDir
$ unzip my_archive.zip
Archive: ../my_archive.zip
   creating: MyProjectDir/
 extracting: MyProjectDir/testfile1
  inflating: MyProjectDir/testfile3
   creating: MyProjectDir/testdir1/
   creating: MyProjectDir/testdir1/testdir2/
 extracting: MyProjectDir/testfile2
   creating: MyProjectDir/testdir3/

3.9.2 gzip

To create a gzipped .gz file use the gzip command, and to unzip a .gz file use gunzip. The gzip command compresses each file you specify as an argument separately, unlike zip, which stores all files in one .zip file. Furthermore, it removes the original uncompressed file. Similarly, gunzip decompresses the file(s) you give as arguments, but removes the compressed ones. To read gzipped text files use zless (see also § 3.8).
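A short sketch of the gzip/gunzip round trip, assuming a file called results.txt exists in the pwd:

$ gzip results.txt        # creates results.txt.gz and removes results.txt
$ zless results.txt.gz    # view the compressed file without unzipping it
$ gunzip results.txt.gz   # restores results.txt and removes results.txt.gz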


3.9.3 tar

To circumvent gzip's limitation of compressing each file individually, most people use the tar command. With tar multiple files and directories can be stored in a single file, and it accepts additional options for compression. The following command creates a compressed archive of files, similar to the zip example above.

$ tar -czf my_archive.tar.gz MyProjectDir/

Here, the -c option stands for ”create”, the -z option gzips the file and -f specifies the filename of the archive. Extracting a tar.gz archive is very similar:

$ tar -xzf my_archive.tar.gz

The -x option stands for 'extract'.
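To see what is inside an archive without extracting it, the -t ('list') option can be used instead of -x. Continuing the example above:

$ tar -tzf my_archive.tar.gz
MyProjectDir/
MyProjectDir/testfile1
MyProjectDir/testfile3
...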

3.9.4 Exercises

Exercise E3.10 Untar-ing an archive
For some of the exercises some pre-made files will be needed. A compressed archive of the files can be found in /tmp/exercises_linux_course.tar.gz.

a) Copy the file to your LinuxCourse directory and extract it.

It is considered good practice that a tar.gz file always contains a directory with the same name as the tar.gz file itself. This way, when someone extracts it, (s)he doesn't end up with files lying around in the directory where the extraction was done.

b) Check that the extraction has completed successfully by listing the contents of the present directory.


3.10 File ownership and permissions

Linux is a multi-user OS, which means that several people can access the system at the same time. As a result it is important for the OS to be able to tell who is the owner of a certain file or directory, as well as who else might have access to that file or directory.

3.10.1 Ownership

Every user has his or her own user account with a user name given by the system administrator. To find out your user name type the whoami command. Furthermore, each user is also assigned to one or more groups. The first group to which a user is assigned is his/her default group. The command id shows your user ID (uid), group ID (gid) and all other groups you are a member of:

$ id lennart
uid=1305(lennart) gid=10001(genepi) ←- groups=10001(genepi),100(users),10009(wikiusers),10010(gvnl),10011(svn)

Here you see that my user name is "lennart" and my default group is "genepi". The names that follow are other groups that I am a member of. The numbers are not too important at this moment; suffice it to say that each user and group has a unique number associated with it.

To see the user and group ownership of a file use ls -l:

$ ls -l
total 3529936
drwxr-xr-x  4 lennart genepi     4096 Nov 19  2010 BigGrid
drwxr-xr-x  4 lennart genepi     4096 Aug 22 18:01 bin
-rw-r-----  1 lennart genepi     1612 Oct 10 01:48 course_desc.txt
drwxr-x---  6 lennart genepi     4096 Jul 12 17:38 Courses
drwxr-x---  3 lennart genepi     4096 Nov 23  2010 CUDA
-rw-rw----  1 lennart gvnl    2047780 May  4 15:59 M34d.zip
drwxr-x--- 34 lennart genepi     4096 Aug 24 17:41 Packages
drwxr-x--- 11 lennart genepi     4096 Oct  6 10:56 Projecten
drwxr-xr-x  4 lennart genepi     4096 Aug 30 15:56 public_html
drwxr-x---  7 lennart genepi     4096 Mar 17  2011 R
drwxr-x--- 16 lennart genepi     4096 Oct  8 01:40 Rlibs
drwxr-xr-x  5 lennart genepi     4096 Mar  2  2011 Scripts
drwxr-x---  5 lennart genepi     4096 Mar 28  2011 ServerMaintenance
drwxr-x---  2 lennart genepi     4096 Aug 29 13:26 SGE-test
drwxr-xr-x  8 lennart genepi     4096 Dec 13  2010 TeX
drwxr-xr-x 17 lennart genepi     4096 Sep 30 16:24 tmp
drwxr-x---  3 lennart lennart    4096 Jun 24  2010 X-chromosome

Here you see that each file or directory is owned by me (column three) and that the associated group for each of them is "genepi" (column four). Only the file M34d.zip has a different group: "gvnl". Why this is important will be explained in the next section.

3.10.2 Permissions

Closely related to the concept of ownership is the concept of permissions. Each file or directory has a set of permissions associated with it that indicate who can read, write and/or execute the file. These are indicated by the letters r, w and x, respectively. In its simplest form these three permissions can be set for the "user", the "group" and "others" l). The meaning of the read permission is obvious. Write permission means that the file or directory can be changed, moved and deleted. The execute permission has different meanings for files and directories. For files it means that Linux will try to run the file like a program or script. This will be discussed in more detail in Chapter 5 on writing Bash scripts. For directories the execute permission means that the user, group or others can access that directory (via cd for example).

Let's take a look again at some ls -l output, which shows the permissions in the first column:

1 $ ls -l
2 total 3529936
3 drwxr-x---  5 lennart genepi     4096 Mar 28  2011 ServerMaintenance
4 drwxr-x---  2 lennart genepi     4096 Aug 29 13:26 SGE-test
5 drwxr-xr-x  8 lennart genepi     4096 Dec 13  2010 TeX
6 drwxr-xr-x 17 lennart genepi     4096 Sep 30 16:24 tmp
7 drwxr-x---  3 lennart genepi     4096 Aug 24 17:41 Packages
8 -rw-rw----  1 lennart gvnl    2047780 May  4 15:59 M34d.zip
9 drwxr-x--- 11 lennart lennart    4096 Jun 24  2010 X-chromosome

l) For more fine-grained control of permissions the setfacl and getfacl commands can be used to read and set ACLs (Access Control Lists) on files or directories. The usage of these tools falls beyond the scope of this course.

The first column consists of four blocks. The first block is one character wide and contains a d if the entry is a directory. Then follow three blocks of three permissions, r, w and x: one for the owner, one for the group and one for all other users. If one of the permissions is not set, it is indicated by a -.

So in this listing you see mostly directories, except for the entry for M34d.zip on line 8. This file has the permissions read and write for the owner as well as for the group (gvnl), but other users have no access to the file. In contrast, the TeX directory can be accessed and read by everybody, but only the owner can write to it.

By default the permissions on a file are rw (read and write) for the owner and r (read-only) for the group and others. This means that everyone can read the file. Permissions can be changed with the chmod command. Use of this command falls beyond the scope of this course (although it will briefly show up in Chapter 5).

When copying files from a Windows machine it is possible that the permissions are set to rwx for all three groups. This is because the Linux/Unix permission scheme is different from the way Windows handles file permissions. You can safely change the permissions back to the default values (i.e. rw for the user, r for the group and none for others) using the following command:

$ chmod 640 my_file_from_windows
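The numbers in this command encode the permissions: r counts as 4, w as 2 and x as 1, and the three digits are the sums for the user, the group and others, respectively. So 640 means rw- for the user (4+2), r-- for the group (4) and --- for others (0). Two more examples, on a hypothetical file myscript.sh:

$ chmod 600 myscript.sh   # rw- for the user, nothing for anyone else
$ chmod 755 myscript.sh   # rwx for the user, r-x for group and others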

3.11 Process management

Every program or command that you run on a Linux machine starts one or more processes. In this section you will learn some basic commands

to manage them. Processes can be killed, sent to the background and brought back to the foreground.

3.11.1 Exercises

Exercise E3.11 Listing and killing processes
To find out whether the server is busy, you can use the programs top and htop, which show dynamically updated lists of processes (usually sorted by processor (also known as Central Processing Unit (CPU)) usage).

a) Start the htop program (top basically does the same, although less colourfully). On the screen you can see that the server has 12 CPUs. The amount of memory used is listed as well. In the lines below, a list of processes is shown, sorted by CPU usage. By pressing the u key you can select a user and only see his/her processes. Other options are available as well; use the h key for help. To return to the command line, press q.

While htop is fun to watch, sometimes you simply want to see which processes you have running. For example, you have just started a large analysis, but discovered you forgot to add a certain piece of information. Instead of letting the current analysis program run, using a lot of processing power while you know you will throw away the results anyway, you can use the ps command to find the process and the kill command to kill it.

b) Run ps -fu username (with your own user name) to find out which processes you have running. Here is some example output:

1 $ ps -fu lennart 2 UID PID PPID C STIME TTY TIME CMD 3 lennart 4633 4631 0 17:49 ? 00:00:00 sshd: lennart@pts/8 4 lennart 4634 4633 0 17:49 pts/8 00:00:00 -bash 5 lennart 5841 31983 0 18:50 pts/6 00:00:00 sh jobscript.sh 1 100000 6 lennart 5842 5841 63 18:50 pts/6 00:00:00 /usr/lib64/R/bin/exec/R --slave 7 lennart 5844 4634 0 18:50 pts/8 00:00:00 ps -fu lennart 8 lennart 29273 29271 0 10:51 ? 00:00:00 sshd: lennart@pts/2 9 lennart 29274 29273 0 10:51 pts/2 00:00:00 -bash 10 lennart 31982 31980 0 13:25 ? 00:00:01 sshd: lennart@pts/6


11 lennart 31983 31982 0 13:25 pts/6 00:00:00 -bash

In this example I have nine processes running. The ones in lines 3, 8 and 10 are my connections to the server (i.e. three in total): for each connection an SSH process is started, which in turn starts a Bash shell (lines 4, 9 and 11) so I can type my commands. In line 5 you see a Bash script that is running, and in line 6 you can see I am running R. Line 7 shows the process of the ps command itself.

In this output, the column PID is an important one. Each process gets a unique process ID, or PID. It is this PID that can be used to kill a process. If, for example, I want to kill the R process, I simply type

$ kill 5842

and R stops. You probably will not do this very often, but every once in a while it can be handym).
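By default kill sends the TERM signal, a polite request to quit that allows the program to clean up after itself. A process that ignores this request can be terminated forcefully with the KILL signal; as a sketch of both (using the PID of the R process from the example above):

$ kill 5842
$ kill -9 5842

Use kill -9 only as a last resort, because the process then gets no chance to clean up (e.g. to remove its temporary files).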

Exercise E3.12** Sending processes to the background (and getting them back)
By default, a program that you start on the command line is run in the foreground. This means that the prompt will not reappear until the program has finished. So you will have to wait before you can type a new command. If you don’t want to wait, you can type Ctrl-z to suspend the program. A suspended program does nothing. It simply waits for you to bring it back to life (i.e. to the foreground or the background) again.

a) Start R and type one or two simple commands like

> a <- "hello"
> ls()

Now suspend R by hitting Ctrl-z. You will see a message that R has been stopped. Now you can type any other command, for example look for a certain data file, edit an R script, etc. When

m) If you want to kill a job that is running in the batch queue (cf. Chapter 6), you can use the qdel command.


you are ready to work in R again, type fg, which stands for “foreground”. Nothing seems to happen, but you are back in R. Hit the Enter key for example, or use the up arrow to redo one of your previous commands.

b) It is possible to have multiple programs suspended at the same time. Suspend R again and start htop. Suspend htop as well. Again a message appears telling you that htop has stopped. Notice that the number in between square brackets is now 2. This is the job ID (which is not related to the job ID of a job that runs in the SGE queue system that will be discussed in Chapter 6). You can list the jobs in your current shell with the jobs command:

$ jobs
[1]-  Stopped                 /usr/bin/R --no-save
[2]+  Stopped                 htop

You can check that both processes still exist by typing ps -fu username, like we did before.

Typing fg now would bring the last job to the foreground: htop. To bring R to the foreground instead type

$ fg 1

and exit R (type q()). Type fg to bring htop to the foreground and exit it (or, instead of bringing htop to the foreground, kill it).

Nowadays the possibility of suspending jobs and bringing them back is not widely used anymore. You can simply open another connection to the server. In the old days, however, this was a completely different matter. Before the availability of Personal Computers (PCs) on each desktop, people connected to a server using a so-called terminal. Such a terminal consisted of a screen and a keyboard. Both were connected to a big mainframe computer somewhere down the hall. There were no mice or windows (in the computer sense of those words), only one command line. In those days being able to suspend and run jobs in the background was vital.


Exercise E3.13** Starting processes directly in the background
If you use MobaXterm it is possible to run certain GUI applications on the server. You could start Emacs for example. However, starting Emacs in the normal way:

$ emacs some_kind_of_file

will keep the shell occupied until you have finished editing your file. In this case you most likely want to start the process in the background so that you can do both: use the editor and use the command line. Starting a process in the background immediately can be achieved by ending the command with an ampersand symbol (&):

$ emacs some_kind_of_file &
$

Notice how the shell prompt (the $) immediately returns to the screen, allowing you to run another command (while Emacs is still running in the background)n).
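If you forgot to add the ampersand, there is no need to quit the program and start over: suspend it with Ctrl-z as in Exercise E3.12 and then let it continue in the background with the bg command. A minimal sketch (after hitting Ctrl-z; the job number may differ on your system):

$ emacs some_kind_of_file
[1]+  Stopped                 emacs some_kind_of_file
$ bg
[1]+ emacs some_kind_of_file &

After the bg command the editor continues running as if it had been started with & from the beginning.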

3.12 Miscellaneous commands

In this section several small but useful commands will be discussed briefly. Use the man pages or the --help option to obtain more infor- mation on how to use these commands.

3.12.1 wget: downloading files to the server

Downloading files is a frequently occurring task. Of course you could download the file to your PC first and then use MobaXterm or WinSCP to transfer the file to the server. As you can easily imagine this is inefficient,

n) This example assumes that you have X11 forwarding enabled when connecting to the server, see §3.1.1 for more information.

especially for large files. It’s a waste of time and bandwidth. The solution is easy: look up the link of the file you want to download on your PC (right-click on the link in the web page and click ’Copy link location’). Next, use wget to download the file on the server. For example, to download the current version of ProbABELo) into your current directory run:

$ wget http://www.genabel.org/sites/default/files/software/probabel-0.4.1.tar.gz

3.12.2 sort

The sort command is used to sort the lines in a file:

$ more sort1.txt
1200
34
465
2340
Hello
Hello
1200
982
archive
1
Once upon a time
0001
$ sort sort1.txt
0001
1
1200
1200
2340
34
465
982
archive
Hello
Hello
Once upon a time

o) Shameless plug: ProbABEL [6] is a tool for doing genome-wide association studies, and I am the current maintainer. See http://www.genabel.org/packages/probabel.

Notice how sort treats numbers as if they are words (2340 comes before 34 because a 2 is less than a 3). The -n option does a numerical sort:

$ sort -n sort1.txt
archive
Hello
Hello
Once upon a time
0001
1
34
465
982
1200
1200
2340

The numerical sort is not perfect, however, because it doesn’t recognise numbers in scientific notation, like 1.4e-5. The -g option solves that (see below for an example). In case you don’t want to sort on the first column, use the -k option followed by the column number:

$ more sort3.txt
rs5 23 c
rs150 10 a
rs23 1e2 d
rs10 0.2 b
rs3 4e-3 e
$ sort -k3 sort3.txt
rs150 10 a
rs10 0.2 b
rs5 23 c
rs23 1e2 d
rs3 4e-3 e
$ sort -gk2 sort3.txt
rs3 4e-3 e
rs10 0.2 b
rs150 10 a
rs5 23 c
rs23 1e2 d

Note that we used the -g option in the last command in order to sort the second column using scientific notation. Here, the two options were combined, but the above is equivalent to sort -g -k2 sort3.txt.

To specify a different column separator than a space, use the -t optionp):

$ sort -g -k2 -t"," sort4.txt
rs3,4e-3,e
rs10,0.2,b
rs150,10,a
rs5,23,c
rs23,1e2,d

3.12.3 uniq

The uniq command removes repeated lines from a file.

$ uniq sort1.txt
1200
34
465
2340
Hello
1200
982
archive
1
Once upon a time
0001

p) When thinking about grouping multiple short options, remember that if you have more than one option that needs an argument, these cannot be combined. So, sort -gk2t"," sort4.txt is incorrect, but sort -gk2 -t"," sort4.txt or sort -k2 -gt"," sort4.txt are correct. However, in a case like this specifying all options separately is the most readable solution.

Notice how only the repeated entry of “Hello” has been removed, but the duplicate “1200” is still there because the two entries are not on consecutive lines: uniq only compares adjacent lines. In § 3.13.2 you will see how the sort and uniq commands can be chained together to also get rid of one of the “1200” entries.

3.12.4 wc: counting words and lines

Counting words and lines are common tasks. For example, counting the number of lines in a phenotype file will tell you if the number of individuals in the file corresponds to what you know to be the total number of people in your study cohort. The wc (WordCount) command counts the number of lines, words and characters for each file that you give it as input. Here you see that the file sort1.txt has 12 lines, 15 words and 70 characters.

$ wc sort1.txt
 12  15  70 sort1.txt

In our field we are mostly interested in the total number of lines, in which case you can use the -l option.
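For example, for the sort1.txt file used above this gives:

$ wc -l sort1.txt
12 sort1.txt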

3.12.5 date

To get today’s date and time use the date command. To change the way the date and/or time are printed, use + followed by a so-called format specifier. To use a different date than today, use the -d option. These are a few typical examples:


$ date
Thu Jun 4 12:36:45 CEST 2015
$ date +%F
2015-06-04
$ date +%H:%M
12:37
$ date +%H:%M-%Y-%m-%d
12:37-2015-06-04
$ date -d "15 November 2015 - 30 days"
Fri Oct 16 00:00:00 CEST 2015
$ date -d "15 November 2015 - 30 days" "+%d %B"
16 October

Notice the quotes around the + format specifier. They are necessary because of the space between %d and %B: without the quotes, only %d would be seen as part of the format specifier and %B would be considered a separate option (which doesn’t exist, so an error would be returned). The output of the date command may be localised to your country. For example, on a Dutch system the results of the first and last examples are:

$ date
do jun 4 12:41:33 CEST 2015
$ date -d "15 November 2015 - 30 days" "+%d %B"
16 oktober

To override the language environment for a specific date command, add LANG=CC in front of it. For example, on the Dutch system the result will then be English as expected:

$ LANG=CC date
Thu Jun 4 12:43:02 CEST 2015
$ LANG=CC date -d "15 November 2015 - 30 days" "+%d %B"
16 October
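Format specifiers are also handy for time-stamping file names, for example when making a quick backup of a results file. A small sketch (the file name results.txt is just an example; the $( ) construct used here to capture the output of date will be explained in § 5.2):

$ cp results.txt results_$(date +%F).txt
$ ls results*
results.txt  results_2015-06-04.txt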


3.12.6 du: disk space usage

If you are getting close to your disk quota limit, the du command (which stands for “disk usage”) will help you find files and directories that take up a lot of space:

$ du -sh Projecten/
4.4G Projecten/
$ du -sh Projecten/*
1000M Projecten/GWAS
18M  Projecten/GvNL
2.7G Projecten/ARA
104K Projecten/ergo
16K  Projecten/RS3_2082_ids
300K Projecten/Comparison_polygenic_hglm
716M Projecten/Lipid_GxE_Prediction
2.0M Projecten/Suman_Prediction_Comparison
8.9M Projecten/Epiblaster

The -s option summarises the disk usage for each of the directories; without it you will get a report for each subdirectory. The -h option works like the -h option of ls: it prints the size in human-readable format (MB, GB, etc.) instead of in bytes.

3.12.7 Differences between files

Finding the differences between two text files is easy with the diff command. diff shows each line where a difference occursq). Consider the following two files, file1 and file2:

$ more file1 file2
::::::::::::::
file1
::::::::::::::
Hello world,
I am a text file
living in a Linux world.

Bye bye
::::::::::::::
file2
::::::::::::::
Hello world,
I am a simple file
living in a Linux world.

Bye bye

q) Note that sometimes this may only be a difference in white space.

This is the output of diff:

$ diff file1 file2
2c2
< I am a text file
---
> I am a simple file

which tells us that there is a difference on line 2 in the first as well as in the second file, and then prints the differing lines. The -u option shows the differences within their context by adding a few lines before and after the differing lines:

$ diff -u file1 file2
--- file1 2011-01-27 18:08:19.000000000 +0100
+++ file2 2011-01-27 18:08:19.000000000 +0100
@@ -1,5 +1,5 @@
 Hello world,
-I am a text file
+I am a simple file
 living in a Linux world.

 Bye bye

For binary files it is more difficult to display the differences (do you understand why?) and diff will only tell you that the files differ. To check whether two files are equal bit by bit, use the md5sum command, which calculates a unique checksum based on the contents of the file. If the checksums are equal you can be assured that the files are exactly equal.

$ md5sum file1 file2
24adf0b0daed9b2b310c4e2117fcbdda file1
4b25f33b80a2514d524c3f5b60e13bd6 file2
$ cp file1 file4
$ md5sum file4
24adf0b0daed9b2b310c4e2117fcbdda file4

The checksums are different in the first run, where file1 and file2 are compared. Making a copy of file1 and checking that file’s md5sum shows that they are exactly equal. The md5sum command is very useful to check files before and after transfer to or from another server (cf. § 3.7) to make sure that all files are exactly equal. Checking a lot of files by hand in this manner is not efficient. The -c option helps here. It compares the md5sums of a set of files to those listed in a file. If the md5sums of a set of files are given (e.g. by saving the output of the md5sum command on the first machine) then after transfer the files can be checked using this file:

$ more checksums
24adf0b0daed9b2b310c4e2117fcbdda file1
4b25f33b80a2514d524c3f5b60e13bd6 file2
$ md5sum -c checksums
file1: OK
file2: OK
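Creating such a checksums file on the first machine is simply a matter of redirecting the output of md5sum to a file (output redirection with > is the topic of § 3.13):

$ md5sum file1 file2 > checksums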

Exercise E3.14 Disk space usage
Find out which subdirectory of your home directory takes up most space.

Exercise E3.15 Downloading files to the server
When downloading large files, the transfer sometimes gets interrupted.


If this happens, you could start downloading from scratch again, of course. But it’s more efficient to continue downloading where you left off.

a) Start downloading the CD image of the most recent Ubuntu Linux Long Term Support version from this URL: http://releases.ubuntu.com/16.04.3/ubuntu-16.04.3-desktop-amd64.iso. Once the download starts, hit Ctrl-c to abort the transfer. Check the size of the .iso file with ls.

b) Which wget option allows you to continue downloading a partially-downloaded file? Try it.

3.13 Input and output redirection

3.13.1 Redirecting to and from files

When working on analyses that involve some repetitiveness (e.g. performing the same action for 22 chromosomes) it is easy to end up with several files that you would like to have in one big output file. Or maybe you want to store the screen output of a certain command because it might be useful later. The > and >> symbols are used to send the screen output of a command to a file; this is called output redirection.

For example, the output of the sort command that sorted some input file infile normally goes to the screen. Sending it to an output file called outfile goes like this:

$ sort infile > outfile

The > and >> behave differently in one important way: > always overwrites the output file, whereas >> appends the new output to the output file if it exists. For example, you can send the listings of several directories to an output file dirlist (the cat commands simply print the output on the screen so you can follow what happens):


$ ls -l dir1 > dirlist
$ cat dirlist
total 8
-rw-r----- 1 lennart genepi 492 2011-10-16 15:42 file1
-rw-r----- 1 lennart genepi 150 2011-10-16 15:42 file2
$ ls -l dir2 >> dirlist
$ cat dirlist
total 8
-rw-r----- 1 lennart genepi 492 2011-10-16 15:42 file1
-rw-r----- 1 lennart genepi 150 2011-10-16 15:42 file2
total 8
-rw-r----- 1 lennart genepi 129 2011-10-16 15:42 file21
-rw-r----- 1 lennart genepi 14 2011-10-16 15:42 file22

Input redirection is complementary to output redirection; it is, however, not used as much as its brother. With input redirection the contents of a file are sent to a program, which then processes them. The symbol used for input redirection is <. As an example, consider the following (useless) R script, saved in the file rinput.R:

print("Hello, you are now in R")
getwd()
1+1
10:1

One way to execute these lines in R is to use input redirection:

$ R --quiet < rinput.R
> print("Hello, you are now in R")
[1] "Hello, you are now in R"
> getwd()
[1] "/tmp"
> 1+1
[1] 2
> 10:1
[1] 10 9 8 7 6 5 4 3 2 1
>
$


Notice that we have ended at the bash prompt again, even though we didn’t end the script with q().

3.13.2 Redirecting output of one command to another

So far we have sent the output of a command to a file, or taken the input from a file. Things start to get interesting when the output of one command is used as input to the next. For this the pipe symbol | is used. It is usually on the same key as \ and is indicated by a vertical bar or two vertical bars stacked like a colon.

You can use this, for example, when looking at the listing of a directory with many files and subdirectories. Instead of having the output scroll off the screen you can pipe it to e.g. more:

$ ls -l | more

Or if you want to look at all processes that are running on the system, screen by screen:

$ ps -ef | less

Technically using a pipe like

$ command1 | command2

is equal to

$ command1 > tmpfile
$ command2 < tmpfile
$ rm tmpfile
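As a first example, a pipe delivers on the promise made in § 3.12.3: sorting sort1.txt first puts the duplicate lines next to each other, after which uniq can remove all of them:

$ sort sort1.txt | uniq
0001
1
1200
2340
34
465
982
archive
Hello
Once upon a time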

In later chapters, after more commands have been introduced, more ex- amples of the usage of pipes will be shown.

Exercise E3.16 Combining files
In this exercise you will use output redirection to concatenate (i.e. stitch together) several files.


a) Create a new directory and cd into it. Use your editor to create three files with a few lines of text. Save them as file1, file2 and file3.

b) Use the cat command and the > sign to send the contents of file1 to a new file called output.total.

c) How would you add the contents of the other two files to the output.total file?

d) How can you quickly check whether the files were merged correctly?

If the input files are large, don’t forget to delete them after you are satisfied with the result of the concatenation. There is no reason to let this duplicate information eat up disk space from your quota.

Exercise E3.17** Combining input and output redirection
How would you send the output of the R commands from the input redirection example (R --quiet < rinput.R) to a file called Routput?

Exercise E3.18*** Using the output of one command as input for another
The pipe symbol (|) is used to send output from one command as input to another. In this way many short commands can be chained into a powerful ”supercommand”.

a) Chain two commands together to show how many processes you are running at the moment.

b) Go to your home directory and find out how much disk space (in human readable format) each of the files and subdirectories uses. Sort the result in order of increasing size.

c) Use w to list all users that are currently logged in. How many users are currently logged in? Some people may be logged in more than once. How many unique users are logged in?


3.14 Aliases and creating your own commands

We are almost at the end of this chapter and already you have gone through many commands. How many times have you typed ls -l or ls -lhp? Probably quite a few times. And more will follow! Wouldn’t it be nice if you could save a command under a shorter name? Well, you can! The alias command lets you assign a command to a new name. For example:

$ alias lsl='ls -lhp'

Once you have run this command you can simply type lsl, three characters instead of seven. That saves time! In Chapter 6 the qstat command will be discussed. This command accepts several options, and typing qstat -f -u \* many times is a pain, so why not write an alias and abbreviate it to qs?

$ alias qs='qstat -f -u \*'

You may have noticed, if you have more than one terminal window open, that aliases defined in one terminal session are not known in another. Moreover, if you close a terminal, all aliases will be forgotten the next time you log in. Not a very nice thing, but, as in most cases, someone already ran into that problem and has solved it for you. The solution is to save your aliases in one of the two hidden files ~/.bashrc or ~/.bash_aliases. Notice the tilde (~) in the path: these two files reside in your home directory. The .bashrc file in your home directory is the default configuration file for Bash; it is read every time you start Bash (i.e. when you open a terminal). This file is already present on most Linux machines and you can simply add your aliases at the end of the filer). Many default ~/.bashrc files also have a section that looks like

# Alias definitions.
# You may want to put all your additions into a separate file like
# ~/.bash_aliases, instead of adding them here directly.
# See /usr/share/doc/bash-doc/examples in the bash-doc package.

if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi

r) The .bashrc file is actually a Bash script, so everything you will learn in Chapter 5 can be applied to this file as well.

which basically means that every time you log in, ~/.bashrc checks whether the file ~/.bash_aliases exists and, if that is the case, reads its contentss). So, as the comment says, “You may want to put all your additions into a separate file”.

After you have added your aliases to either of these files, the current Bash session doesn’t know about them yet. You could log out and back in, of course, but a more elegant way is to read the contents of the file directly:

$ source ~/.bashrc

Do you remember from § 3.6.2 that the -i option of cp, mv and rm makes these commands safer by asking for confirmation before overwriting or deleting? If you’d like to be on the safe side, this is the time to create aliases for these commands:

alias cp='cp -i'
alias mv='mv -i'
alias rm='rm -i'

By giving the alias the name of the original command you “overwrite” the original command with your aliast).
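If you occasionally need the original behaviour back, you don’t have to remove the alias permanently: prefixing a command with a backslash bypasses the alias for that single invocation, and unalias removes an alias for the rest of the session. For example:

$ \rm some_file
$ unalias rm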

In exercise E3.18 of this chapter we saw how we could use du and sort to show the files and directories in the pwd sorted by file size, with the largest files and directories at the bottom. The full command was

s) Feel free to add these lines to your .bashrc file if they don’t exist and you want to store your aliases in a separate file.
t) Of course the real commands are not overwritten, but when you give a command Bash first looks in its list of aliases before looking in its set of default directories for executable files.


$ du -sh * | sort -h

If you are wondering which files and directories eat most of your precious disk space quota you are probably interested in the top 5 largest ones in a given directory. Let’s reverse the sort so the largest files and directories are at the top and then select only the top five of them:

$ du -sh * | sort -rh | head -5

It makes sense to save this nice string of commands in an alias so we don’t have to remember it exactly:

alias bigfiles='du -sh * | sort -rh | head -5'

You can run the alias command all by itself to see which aliases have been defined. Here is a selection of my aliases:

$ alias
alias R='/usr/bin/R --no-save'
alias grep='grep --color=auto'
alias lo='exit'
alias ls='ls --color=auto'
alias lsa='ls -lhA'
alias lsl='ls -lh'
alias qs='qstat -f -u \*'

Exercise E3.19 Creating aliases
After reading/hearing all the warnings about cp, mv and rm overwriting an existing destination file, you may decide to stop living on the edge and use safer alternatives for these commands.

a) If you haven’t already read about them in the course reader, find out which option to add to the cp, mv and rm commands that will force those commands to ask you a question before performing a (potentially) dangerous operation

b) Use the alias command to “overwrite” each of these three existing commands with its safer alternative and test the aliases.


c) Open the file ~/.bashrc in an editor and add the three aliases you created earlier.


Chapter 4

Working with text files

Data files fall into two categories: files that store information in plain text and those that store information in a binary format. Binary formats are in general more space-efficient than text files, but have the drawback that they can’t be opened with a regular text editor (cf. § 3.2) and can usually only be read and written by the programs that created them. Examples of binary files are .Rdata files, zip files and GenomeStudio’s project files, but also the .doc and .xls files from MS Office. Plain text files, sometimes also called ASCII files, are much easier to handle. As long as they are small they can be edited easily, without the need of special programs. However, as they grow in size, up to several GB for a file with imputed data for chromosome 1 in ERF, things become problematic. For example, to transfer a file of 10 GB from the server to your office PC via the 100 Mb/s office network would take at least

(10 × 1024 MB × 8 bits/byte) / (100 Mbit/s) = 819.2 s ≈ 14 min

and then you would still need to do the editing, and you could run into trouble there too, because the file is larger than the roughly 4 GB of memory (RAM) installed in your PC. And finally, you would need to upload the file again after finishing the edit. Even if you would try to edit the same file directly on the server using one of the editors mentioned in § 3.2 (which would, of course, be the smarter thing to do), the loading time would be considerable. That is why in this chapter you will learn to use several utilities for processing (large) text files. Most of these utilities parse the files line by line, which solves the memory problem described earlier.

4.1 Converting between Windows and Linux format

Some Linux programs may have difficulty reading plain text files created with a text editor on Windows (e.g. Notepad). This is caused by the fact that historically both operating systems use a different way to encode the end-of-line. For example, opening a text file that was created in Linux on a Windows PC (with Notepad, for example) will show all text

on one long line. Conversely, a Windows file opened with e.g. less will show the Windows end-of-lines as ^M. Two programs exist to solve this problem: dos2unix and, for the reverse conversion, unix2dosa).

4.1.1 Exercises

Exercise E4.1 Converting files from Windows format to Linux format

a) By default dos2unix does the conversion in place, i.e. the original file is overwritten. Which command line option should be given to write the converted output to a new file?

It is not possible to see the difference between a file created in the Windows format and one in the Linux format when using paging tools like more or less. Most Linux systems will display and use the Windows files without problems.

The Linux command file tries to classify each file you give as an argument. For a .jpg image, for example, it will give the following results:

$ file linux-penguin.jpg
linux-penguin.jpg: JPEG image data, JFIF standard 1.0

This is useful, because if I were to make a simple text file and name it text.jpg, having the .jpg extension alone does not automatically make it an image. And the file command tells me so:

$ cat text.jpg
Hello there, this is a text file
$ file text.jpg
text.jpg: ASCII text

a) For those of you who are young: MS-DOS was the first OS created by Microsoft, back in 1981, way before they created the MS Windows family.
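The file command is also a convenient way to spot the Windows line endings that the pagers hide. On a file in Windows/DOS format it will report something like this (the file name is just an example):

$ file file_from_windows.txt
file_from_windows.txt: ASCII text, with CRLF line terminators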


b) Create a text file with your favourite editor. Use the file command to check its type. Convert it to Windows/DOS format without overwriting the old one and check the type of the new file.

4.2 grep: finding text

The grep command is used to quickly find lines containing a given text pattern in one or more files. The basic syntax is

$ grep searchpattern files

If searchpattern consists of more than one word it has to be enclosed in quotes.

The most-used options for grep are:

-F   treat the search text as fixed, literal text, not as a pattern. By default the search text is a so-called basic regular expression, not just literal text (cf. § 8.1). This speeds up the grep process enormously, something very useful given the enormous size of many of the files used in the life sciences. So, unless you know what regular expressions are and how to use them I strongly encourage you to always add this option!

-w   assume that the search text is a word (and thus should be surrounded by white space (spaces, tabs)). Can be very important! For example, if you are looking for a person with ID id10, using grep id10 myfile.txt will also print lines containing id100, id10a, fid10212, etc. The -w option will only return the lines you are actually looking for.

-i   makes the search case-insensitive

-n   adds the line number on which the match occurred to the output

-c   (count) don’t print the lines matching the search text, but only print the number of lines with a match

-v   print all lines not matching the search text

-An  print n lines after the match

-Bn  print n lines before the match

-Cn  print n lines before and after the match (the C stands for “context”)

Let’s have a look at some examples with a slightly modified version of the file used for sorting in § 3.12. First, print the contents of the file on the screen:

$ cat grepexample.txt
1200 1234 4567
34 some text here
here is a number: 465
2340
Hello Hello there

1200
982 id120 exm12312
the archive is lost
1
Once upon a time
0001

Now, print only the lines containing the text 34.

$ grep 34 grepexample.txt
1200 1234 4567
34 some text here
2340

Print the lines containing a 34 “on its own”, i.e. surrounded by white space:

$ grep -w 34 grepexample.txt
34 some text here

Notice the difference between the last two outputs.

The -n option adds the line number on which the match was found:

$ grep -n 0 grepexample.txt
1:1200 1234 4567
4:2340
7:1200
8:982 id120 exm12312
12:0001

If you just want to count on how many lines a certain text occurs, use the -c option:

$ grep -c 0 grepexample.txt
5

The following examples show how to use the after, before and context options:

$ grep -A2 archive grepexample.txt
the archive is lost
1
Once upon a time
$ grep -B2 archive grepexample.txt
1200
982 id120 exm12312
the archive is lost
$ grep -C1 archive grepexample.txt
982 id120 exm12312
the archive is lost
1
$ grep -A2 1 grepexample.txt
1200 1234 4567
34 some text here
here is a number: 465
--
1200
982 id120 exm12312
the archive is lost
1
Once upon a time
0001

In the last example the -- is inserted to separate the output for the first match (the line with 1200 1234 4567) from that for the second match (the line with only 1200). There is no -- between the second and the third, fourth and fifth matches (the lines with id120, 1 and 0001, respectively), because these overlap or are directly connected.

If you have more than one search term it is easiest to save them in a file, one term per line, and feed that file to grep with the -f option. For example, if you have a list of SNPs (e.g. their rs IDs) in a file called snplist you can look up the information about these SNPs from the genotype imputation files with

$ grep -Fwf snplist /path/to/your/imputation/*.info

I’ve added the options -F and -w as well to speed up the search and only look for matches of the full word.

4.2.1 Exercises

Exercise E4.2 Searching for a given text in a file

a) Given a phenotype file, how would you check whether a certain individual ID is present?

b) Go to the directory with the extracted exercise files (cf. Exercise E3.10 on page 37).

c) Which grep option allows you to recursively search in directories? Use it to find all occurrences of the term “trait” in the subdirectories.

d) In the output of the previous command there is also a line that has the word “traits” (plural) in it. If you are only looking for the single word “trait”, which grep option would you use?

e) By default, grep is case sensitive. Find out how to do a case insensitive search and see whether there are occurrences of the word “trait” with capital letters in them.


f) Spotting the exact location of the keyword in the output of grep is sometimes difficult. Try the --color=auto option (or --colour=auto if you are British) for a few of the previous grep commands and see the difference.

4.3 sed, the Stream EDitor

An operation that occurs frequently is replacing a certain text in a file with other text. The command sed, for Stream EDitor, can do this (and a lot of other things) very well. By default sed sends its output to the screen, so you have to use output redirection (cf. § 3.13) to send it to a file. The structure of a sed search-replace operation looks like this:

$ sed 's/old text/new text/g' some_file

Here the s stands for “substitute” and the g for “global”; without g only the first occurrence of old text in each line is replaced. Sometimes you need to substitute the / character itself (if you are changing a directory path, for example). This can be awkward with the normal sed substitution command. Luckily the / symbol in the sed command can be replaced with other symbols, for example a colon or semi-colon. The following command is equivalent to the one above.

sed 's;old text;new text;g' some_file

By default sed sends its output to the screen. For this small file that is not a problem, but for a large file your screen will soon be too small. It would be much nicer to have this output in a separate file. There are two ways to do this. The first one is obvious after the exercises in § 3.13: use > to send the output to a new file. The second method will be discussed in the exercises.

Another common use of sed is to remove a given line, say line 3, from a file:

$ more some_file
This is a header
Line 2
Line 3
Line 4
Line 5
Line 6
This is the footer
$ sed '3d' some_file
This is a header
Line 2
Line 4
Line 5
Line 6
This is the footer

In the next two examples we first delete lines 2 to 4 and then show how to delete all lines containing the text “This is”.

$ sed '2,4d' some_file
This is a header
Line 5
Line 6
This is the footer
$ sed '/This is/d' some_file
Line 2
Line 3
Line 4
Line 5
Line 6

To print only line 5 of a file use

$ sed -n '5p' some_file
Line 5

Without the -n option sed prints all lines in the file.
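The same addresses that were used for deleting also work for printing, so a range of lines can be printed as well:

$ sed -n '2,4p' some_file
Line 2
Line 3
Line 4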

4.3.1 Exercises


Exercise E4.3** Using sed for search-replace operations

a) Go to the directory that contains the exercise files you extracted from the tar.gz file in Exercise E3.10. Go to the directory called Exercise_sed. List the contents of the directory.

b) There should be one file called file.csv, a file with comma- separated values. Show the contents of the file.

c) Some programs only accept tab-delimited files as input, so we have to replace all commas with tabs. Tabs are indicated by \t in many programs. Write a sed command to replace the commas with tabs.

d) As mentioned earlier, sed sends its output to the screen by default. Send the output to a file called file.tsv using output redirection.

The second method is to edit the file “in place” instead of creating a new file. This is especially handy for large files. The sed option -i is used to specify in-place editing. Be careful, there is no way back!

e) Write a sed one-liner that replaces all tabs in the .tsv file with semi-colons. Use the -i option.

4.4 cut: selecting columns

A task you will come across often is selecting columns from a data file for further processing. For example, one of the output files of genotype imputation programs is the so-called info file. This file contains information on the imputed SNPs (one SNP per row), with the first column being the rs ID or chromosome:position combination, the second and third columns being the major and the minor allele, the fourth column being the MAF, etc. The 7th column usually contains the R2 values, which are a measure for the imputation quality.


To select, for example, only the SNP ID and the R2 value you can use the cut command. A typical cut command looks like this:

$ cut options file

The most important command line option for cut is -f. The f stands for “field” (a.k.a. column). This option is followed by one or more numbers that tell cut which columns to print, for example

1 $ cut -f 1 file
2 $ cut -f 1,2,6 file
3 $ cut -f 6-10 file
4 $ cut -f 4- file

The first line prints the first field from file, the second example prints columns one, two and six. The third example prints columns six to ten, and the last example prints columns four and higher.

By default cut assumes that the columns are separated by a TAB. If that is not the case in your file, you have to specify the delimiter on the command line using the -d option followed by the delimiter in double quotes. For example, to tell cut that columns are separated by a space use cut -d " "; if the values are separated by commas (a .csv file), use cut -d ",", etc.
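Combining the two options, selecting the first and third column of a comma-separated file could look like this (the file name is hypothetical):

$ cut -d "," -f 1,3 phenotypes.csv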

4.5 GAWK: more fun with columns

GAWK is like a big brother of cut. It is a scripting language that is mostly used with data that can be divided into records and fields. This may sound a bit abstract but if you replace “records” with “lines” and “fields” with “columns” (just like we did with cut) you’ll get the idea. In fact, you can tell GAWK what to consider as a record and what as a field by telling it which characters to use as record separators and which to use as field separators. By default fields are separated by white space (spaces, tabs) and records are separated by a newline character.

The name GAWK stands for GNU AWK. AWK is the original program from before the 1980s and is named after its original authors: Al Aho, Peter


J. Weinberger and Brian Kernighan. GAWK extends the AWK language, but for most common tasks the languages are the same. On modern Linux systems the awk program is usually a link to the gawk program. The basic structure of GAWK commands is as follows:

$ gawk 'condition {action}' file

For each line (record) in the file GAWK will test the condition and if it is true it will perform the action, usually modifying or printing a column (field). Like so many other tools, gawk sends its output to the screen, so use output redirection (§ 3.13) to send the output to a file.

Fields in GAWK are denoted by $1, $2, etc. (for the first and second field). The last field is denoted by $NF and $0 denotes the whole record (line). Let’s illustrate this with a couple of examples. In the sed exercises you created a file called file.tsv with tab-separated columns; let’s print columns 2 and 4:

$ gawk '{print $2, $4}' file.tsv
field1 field3
12 14
22 24

There are a few things to note about this example. First, the “condition” is missing in the gawk command. This is no problem, it simply means that it should perform the “action” for every line. Second, the “action” is print $2, $4, which means print columns two and four. Notice the comma: it is necessary (check what happens if you forget it)! The third point to note is that the first line of the output seems to be wrong, you would expect field2 field4, wouldn’t you? However, check the first line of the input file:

$ head -n 1 file.tsv
# field1 field2 field3 field4

By default gawk uses all forms of whitespace as field separator (and not just tabs as you might have wanted), so $1 is the # and $2 is field1 and therefore line 1 has a total of five fields. If you want gawk to use a different field separator you can specify it with the -F option:


$ gawk -F "\t" '{print $2, $4}' file.tsv
field2 field4
12 14
22 24

In the following example the “condition” will be used to only print lines where the second column contains a 2:

$ gawk -F "\t" '$2 ~ "2" {print $0}' file.tsv
11 12 13 14
21 22 23 24

If you want to be more strict and only print lines that have exactly a 2 in the second column use == instead of ~:

$ gawk -F "\t" '$2 == "2" {print $0}' file.tsv

You see: no result. If it doesn’t matter in which column the 2 appears use

$ gawk -F "\t" '/2/ {print $0}' file.tsv
# field1 field2 field3 field4
11 12 13 14
21 22 23 24

Of course this shows all lines in this case. Note that since we print the whole line it would have been easier to use grep here.

If more than one condition needs to be satisfied before we want a line to be printed, this can be done as well, for example by using the logical operators “and” (&&) or “or” (||). For example, the following prints only the lines where the value in the second column is larger than 10 and the last column is less than 20:

$ gawk -F "\t" '$2 > 10 && $4 < 20 {print $0}' file.tsv
11 12 13 14
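The “or” operator works in the same way. As a sketch using the same file.tsv: print the lines whose second column is exactly 12 or 22, which selects both data rows but not the header:

$ gawk -F "\t" '$2 == 12 || $2 == 22 {print $0}' file.tsv
11 12 13 14
21 22 23 24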

GAWK can not only be used to extract text, but also to change it. The first example below prints all columns except the second and third. It does so by changing the value of these fields to an empty string ("") before


printing the whole line. The example after that uses the same technique to change the third column to NA:

$ gawk '{$2=$3=""; print $0}' file.tsv # field3 field4 11 14 21 24 $ gawk '{$3="NA"; print $0}' file.tsv # field1 NA field3 field4 11 12 NA 14 21 22 NA 24

Notice that in both cases the header is also changed, which is probably OK in the first example, but not in the second. To make GAWK ignore the header we have to make use of the fact that GAWK partitions its input into records and fields (by default lines and columns as discussed before). To make GAWK leave the header untouched we check the value of NR, which always contains the current record number (NR is somewhat similar to NF which always contains the total number of fields in the current record):

$ gawk '{if (NR!=1) {$3="NA"}; print $0}' file.tsv
# field1 field2 field3 field4
11 12 NA 14
21 22 NA 24

The explanation of this command is: “For each line in file.tsv do the following: If the record number (the line number in this case) is not equal to one, set the third column to NA. Then (for all records) print the whole line”. In a similar way a combination of record number and field number can be used to print one element of the file, say the second column of the third line:

$ gawk '{if (NR==3) print $2}' file.tsv
22

4.5.1 Exercises


Exercise E4.4*** Creating a phenotype file from .ped data
In this exercise you will learn how to extract data from a file and reformat it for use by another program. Tasks like this are very common because most programs restrict themselves to one particular task (the UNIX philosophy for programs is “Write programs that do one thing and do it well” [7, Chapter 1.6], which is why you have to learn about so many small programs in this course).

a) Go to the directory that contains the exercise files you extracted from the tar.gz file in Exercise E3.10 in § 3.9.4. Go to the directory called Exercise_ped2phe.

b) Look for the file chr.ped.

The first six columns of this file in Merlinb) .ped format are FAMILY_ID, PERSON_ID, FATHER_ID, MOTHER_ID, SEX and AFFECTED; the other columns contain genotypes.

c) Show the first six columns of the first ten lines of the .ped file.

d) In your answer to the previous question you have probably made use of the head command and the | symbol. Did you put head in front of the pipe symbol or after (as in some command | head vs. head chr.ped | some command)? What’s the difference?

A simple .phe file as used by e.g. ProbABEL [6] consists of the following header and the corresponding column data: id (the PERSON_ID from the .ped file), sex, bt1; the last column is for a binary trait, in our case the AFFECTED status.

e) Create a .phe file (without the headerc)) from the complete .ped file (not only the first lines). Hint: use output redirection to create the new file.

b) See http://www.sph.umich.edu/csg/abecasis/merlin/tour/input_files.html for more info on the file format.
c) It is possible to add the header in the same one-liner as well, cf. the examples in the GAWK manual at http://www.gnu.org/software/gawk/manual/gawk.html#Print-Examples.


One problem remains, however. In .ped files women are coded as 2, men as 1. ProbABEL expects 0 for women. GAWK can be used for more than just printing columns; it is a scripting language of its own. In GAWK several commands can be written on a single line by separating them with a semicolon. Variables exist as well, as do if-clauses and for-loops. The one-liner that fixes both the header and the sex is shown in the answers.

4.6 Putting it all together

4.6.1 Exercises

Exercise E4.5**** Filtering output using gawk (thanks to Najaf Amin)
In this exercise GAWK will be used to filter output of another program in such a way that we end up with only the parts we are interested in.

So far GAWK has been used in so-called one-liners, relatively simple commands that only do one or two things. For the task at hand we will take a look at a GAWK script to get a glimpse of the full power of the GAWK language.

a) If you haven’t downloaded and extracted the tar.gz file with ex- ercise data yet (Exercise E3.10, page 37), do so now. The files for this exercise can be found in the directory Exercise_gawk_snps. Go to that directory and see which files are there. What is the file size of the files?

The .awk file is the GAWK script. The file screen.1.out contains the output of some quantitative trait analysis.

b) Use a pager to browse through the contents of the output file. After a header of several lines the data is quite regular and consists of tables for different SNPs for a number of traits.


GAWK is great when working with fields as has been shown in several of the previous exercises. Fields are denoted by $1, $2, $3, etc.d) The last field is indicated by $NF, where NF stands for “Number of Fields”. Normally, fields are separated by white space but this can be changed. In order to check whether a certain field contains a certain word use the ~ operator. For example:

$2 ~ /word/ {print $0}

c) Compare the output of the following GAWK one-liners and explain the difference.

$ gawk '/trait/ {print $0}' screen.1.out
$ gawk '$2 ~ /trait/ {print $0}' screen.1.out

Back to the task at hand: writing a filter script for the output. Najaf wanted to extract from this output only those SNPs that had a p-value listed for allele 1. For each of those SNPs she wanted both the F and the p columns printed. The names of the traits should be listed as well. A typical part of the screen.1.out file is reproduced here:

1 Testing trait: NRUWE
2 ====================
3
4 Testing marker: rs884080
5 ------------------------
6
7 Allele df(0) Rsq(0) df(I) Rsq(I) F    p
8 1      366   0.05   365   0.06  3.13 0.0776 ( 241/372 probands)
9 2      366   0.05   365   0.06  3.13 0.0776 ( 241/372 probands)
10
11 Testing marker: rs2017143
12 -------------------------
13
14 Allele df(0) Rsq(0) df(I) Rsq(I) F    p
15 1      408   0.04   407   0.04  0.62 ( 278/414 probands)
16 2      408   0.04   407   0.04  0.62 ( 278/414 probands)

The desired filtered output for this part should be

d) Note the difference with Bash (Chapter 5) where these would refer to the command line arguments.


trait: NRUWE
rs884080 3.13 0.0776

d) Before starting to write a script it is always good to describe the steps that need to be taken in a schematic way. Without thinking in any programming or scripting language code, can you write down which steps should be taken to generate such output?

This is the contents of the get_sign_snps.awk script (lines starting with # are comments):

1 # This is a GAWK script that summarises
2 # the output of QT analysis
3 $2 ~ /trait/ {print $2, $NF};
4 $2 ~ /marker/ {snp = $NF};
5 $1 ~ "1" { if ($7 != "("){
6              print snp, $6, $7;
7            }
8 }

Let’s walk through it line by line. Lines one and two contain comments. Like in Re) any line that starts with a # is treated as a comment and is ignored by GAWK. On line three the first serious GAWK command is listed: “If field 2 contains the word trait, print both the second and the last field”. In the next line we look for the word marker in the second field. If that is the case, we store the contents of the last field (the name of the marker) in the variable snp so that it can be used later on. Lines 5 – 8 are a bit more complicated because the command that is executed if the pattern is matched is not a simple print statement, but an if-clause. What these lines say in plain English is this: “If field one contains the value 1, take a look at field 7. If field 7 is not equal to the opening bracket ( (that means the p-value column is not empty), print the variable snp we have saved in line 4, followed by fields 6 and 7 (containing the F and p-value).” Compare this with the schema you have made previously.

e) Which command line option is needed to run a GAWK script instead of the “normal” way where the GAWK commands are read from the command line? Run the GAWK script and send the output to a file.

f) How many lines does your output file have?

g) How many traits were used for this analysis? And what were their names?

e) As well as several other scripting languages like Bash (Chapter 5) and Perl.

Chapter 5

Writing Bash scripts

On most GNU/Linux systems the default shell is Bash. Besides its ’normal’ tasks as a shell (e.g. presenting the command line prompt to you, keeping track of your foreground and background jobs as well as your command history) it also allows you to write scripts. Scripts are an easy way to automate repetitive or complex tasks. Any command you type on the command line can appear in a script and vice versa: any command you see in a script should work on the command line (although you have to take care when using variables, making sure they have been defined, etc.).

5.1 A simple script

The standard first ingredient of every Bash script is the following line:

#! /bin/bash

The line helps the shell to understand in what kind of language the script is writtena). A second ingredient in any script is the use of comments that explain what the script does. In Bash comments are lines starting with a hash (#). Writing output to the screen is a basic requirement of every script in any language. Bash uses the echo command for that:

echo "This is my super script"

Let’s make a small script with the information we have so far and call it first_script.sh. It is customary to give shell scripts the .sh extension, but unlike in Windows that does not mean that every file that has a name ending in .sh will be run by Bash. Here is the contents of a simple script:

$ cat first_script.sh
#! /bin/bash

# This is my first script that prints text on the screen

a) Other languages like Perl have a similar first line.


echo "This is my super script"

You can run a Bash script in several ways. The first one is perhaps the most straightforward:

$ bash first_script.sh
This is my super script

The second way is used more often; it involves making the script executable and then running it:

1 $ ls -l first_script.sh
2 -rw-r----- 1 lennart lennart 101 2011-10-25 09:24 first_script.sh
3 $ chmod a+x first_script.sh
4 $ ls -l first_script.sh
5 -rwxr-x--x 1 lennart lennart 101 2011-10-25 09:24 first_script.sh
6 $ ./first_script.sh
7 This is my super script

In line 2 the permissions of the script are shown. The execute permission (cf. § 3.10) is not set, so in line 3 we set it. As a result we can run the script as shown in line 6. Note the ./ in front of the file name of the script. As you remember from § 3.6.1, the . shortcut (when talking about directories) means the pwd, so here we explicitly state that we want to run the script first_script.sh located in the present directory. Without an explicit statement of the location of the script it won’t run. You can try to run

$ first_script.sh
first_script.sh: command not found

but as you can see, that doesn’t workb). The reason for this is that (unless you specify the path to an executable script or program explicitly) the shell only looks for executables in a few pre-configured directories and…the pwd (i.e. .) is not one of those directories. As a result you get the command not found messagec).

b) As a matter of fact, you might have noticed that Tab-completion also didn’t work and you had to type the full name of the script by hand.
c) The reason for not looking for executable programs in the pwd is a safety measure. Imagine that you have an empty file called ls in the pwd. Had the pwd been in the search path for executable files, this empty ls would be found and running ls would not give any results at all in that directory, while ls would have worked as expected in all other directories. To find out the default search paths type: echo $PATH.


As your script grows the probability that you make a typo or some other error also increases. To check where in the script the error occurs, use the -x option to bash. Take a look at the following (useless) script and try to spot the mistake.

1 #! /bin/bash
2 # A script to show how 'bash -x' can help finding errors.
3
4 echo "Welcome to the test script"
5
6 time=5
7 echo "This script will wait two times for ${time} seconds"
8 sleep ${time}
9 sleep ${tine}
10 echo "Done"

Running this script in the normal way gives an error:

$ ./errortest.sh
Welcome to the test script
This script will wait two times for 5 seconds
sleep: missing operand
Try `sleep --help' for more information.
Done

From this it is already easy to see that the problem lies with one of the sleep commands. Running it with bash -x will point out which one:

$ bash -x ./errortest.sh
+ echo 'Welcome to the test script'
Welcome to the test script
+ time=5
+ echo 'This script will wait two times for 5 seconds'
This script will wait two times for 5 seconds
+ sleep 5
+ sleep


sleep: missing operand
Try `sleep --help' for more information.
+ echo Done
Done

The lines starting with + are the lines as they appear in the script, but with all variables filled in. This output shows that the problem lies with the second sleep command. Like the first one, there should have been a 5 following it. Closer inspection shows that there is a typo in the variable name. This process of finding a bug in a script is called “debugging”.

Exercise E5.1 A simple script
In this exercise you are going to write a script similar to the one in the text to get used to the Bash language.

a) Go to ~/LinuxCourse/ and open a new file in your favourite editor.

b) Now use the echo command to print the text Hello World on the screen. Save the file (in order to point out that the file is a shell script, it is customary to use the extension .sh for a script file name).

c) In order to be able to run the script it should be made executable by setting the executable bit. Find out which permissions (read, write and/or execute) are set on your script file.

d) Set the executable bit with the chmod command and run the script by typing its name preceded by ./, e.g.

$ ./myscript.sh

5.2 Using variables

With the knowledge of the previous section simple scripts can be written. For example, you can write a Bash script that sets up a directory for a new project by copying files and subdirectories from a template set and then


run a series of pre-tested commands, filter the output and finally remove any intermediate results. One of the major things missing, however, is how to use variables. That will be the topic of this section. The use of variables in a script is very important. Variables allow you to make general, versatile scripts as well as saving you a lot of typing. Moreover, the use of clear variable names can greatly help someone else (or yourself, six months from now) to understand what the script does (or is supposed to do). In Bash variables are defined in the following way:

var="Some text"

It is important to note that spaces are not allowed around the = sign. To use a variable add a $ and (as a safety precaution) enclose the variable name in curly braces ({}), like this:

echo "The contents of variable var is: ${var}."

Putting these steps in a complete script would look like this:

#! /bin/bash

var="I am a variable"

echo "The contents of variable var is: ${var}."

To put the output of a command into a variable use the $() construction. This is how to put the number of lines of a certain file into a variable, for example:

linecount=$(wc -l myfile.txt | gawk '{print $1}')

The pipe to GAWK is necessary because the output of wc -l not only prints the number of lines, but also the file name, in which we are not interested. Here is a complete script:

1 $ cat countMyLines.sh
2 #! /bin/bash
3 # This script counts the number of lines it contains
4


5 linecount=$(wc -l countMyLines.sh | gawk '{print $1}')
6
7 echo "This script contained ${linecount} lines."
8 $ ./countMyLines.sh
9 This script contained 6 lines.

This script can be enhanced a bit, because as it is now the file name of the script is hard coded, i.e. if the script is renamed, we need to change the code in the script as well to make it work as expected. In Bash scripts the variables ${1}, ${2}, etc. contain the arguments given on the command line when calling the script and ${0} contains the name of the script itself. Note that these variables have nothing to do with the variables for fields/columns in GAWK, even though they follow a similar naming scheme. Using this information in the script we get:

1 #! /bin/bash
2 # This script counts the number of lines it contains
3
4 echo "The name of this script is ${0}."
5 echo "The first two command line arguments are ${1} and ${2}."
6
7 linecount=$(wc -l ${0} | gawk '{print $1}')
8
9 echo "This script contained ${linecount} lines."

Running the script we get

$ ./countMyLines2.sh
The name of this script is ./countMyLines2.sh.
The first two command line arguments are and .
This script contained 9 lines.
$ ./countMyLines2.sh arg1 arg2
The name of this script is ./countMyLines2.sh.
The first two command line arguments are arg1 and arg2.
This script contained 9 lines.

The first time the script is called without arguments; as a result nothing is filled in in the printed line. The second time the script is run with two arguments and they are printed as expected.


As shown above, when printing text with echo, characters like $ and {}, but also !, get interpreted by the shell and as a result are not printed on the screen. To print text literally use single quotes instead of double quotes:

$ echo "Hello, ${}, #, !" bash: Hello, ${}, #, !: bad substitution $ echo 'Hello, ${}, #, !' Hello, ${}, #, !

Exercise E5.2** Using variables
Add a variable to the script of the previous exercise. The variable name should be greeting, because we would like the script to print Good morning World if the variable has the value Good morning.

Exercise E5.3** Using command line arguments in your script
Scripts become much more useful when they accept command line arguments. You could, for example, make a script that runs a certain analysis for one chromosome. But then you would have to modify the script each time you want to run it for another chromosome. By using command line arguments the script can be run for any chromosome you want; for example, for chromosome 16 you would run:

$ ./myscript.sh 16

a) In a shell script the command line arguments are automatically saved in the variables ${1}, ${2}, etc. Write a script that prints the first three arguments in reverse order.

b) Modify the script from exercise E5.2 in such a way that instead of the “Good morning” greeting it uses the first argument as greeting.

c) Arguments are normally separated by spaces; you may have come across this in the previous question. Can you think of a way to work around this problem?


5.3 Using shell variables in GAWK

Since a Bash script is nothing more than a set of commands that can also be run on the CLI, it is quite common to use other scripting languages within a Bash script.

This section shows how to send the contents of a Bash variable to GAWK. One reason to do this would be to make use of GAWK's column filtering capabilities; another would be the fact that doing arithmetic in Bash is difficult, whereas it is easy in GAWK. Consider for example a script that parses a set of files for a given SNP rs name, where in one of the steps we would like to print only a given set of columns from a ProbABEL GWAS output. A schematic script might look like this:

 1 #! /bin/bash
 2 # This script does all kinds of stuff while looking for SNPs
 3
 4 # The first argument for this script is a SNP rs name
 5 rsname=${1}
 6 # The second argument for this script is a ProbABEL file name
 7 filename=${2}
 8
 9 # Do other things here...
10
11 # Print only columns MAF, Rsq and loglik for the given SNP
12 gawk '$1==snpname {print $5, $7, $NF}' ${filename}

The question here is how to get the contents of the Bash variable rsname into the GAWK variable named snpname in the last line. Because the dollar signs mean different things in Bash and in GAWK, some form of “translation” is necessary. This can be done using the -v option of GAWK. A simple example is to print the value of a variable on each of the lines of the output:

$ gawk -v var="some text" '{print var, $1, $3}' file.tsv some text # field2 some text 11 13 some text 21 23


Here we used the tsv file created in one of the exercises on sed in § 4.3. Alternatively, to print the third column of a file we could use the following:

$ gawk -v col=3 '{print $col}' file.tsv
field2
13
23

Going back to the problem in line 12 of the example script above we can now see that this line should read:

gawk -v snpname=${rsname} '$1==snpname {print $5, $7, $NF}' ${filename}

What is done here is that the contents of the shell variable ${rsname} is copied to the GAWK variable snpname. (You could have used the same name for both variables, but that is not necessary; I chose two different ones for clarity.) In the “regular” part of the GAWK command we check whether column 1 is equal to snpname and, if so, we print columns 5 and 7 and the last column.

5.4 Loops, for and while

A very powerful part of any scripting language is the use of loops. Loops allow one to easily program repetitive tasks. One of the loops commonly used in Bash scripts is the for-loop. A basic for-loop looks like this:

1 for var in range; do
2     commands
3 done

Here, range lists the items that var steps through. For each item in range the commands are executed. The easiest way to generate a sequence of numbers is to use the notation {start..stop}. For example

for i in {1..22}; do
    echo "This is chromosome ${i}"
done


prints 22 messages on the screen. In this case the variable i is increased by one each time the loop is run. In order to use a different step size use {start..stop..step}. As an example, let's print the odd numbers between one and ten on a single line:

$ echo {1..10..2}
1 3 5 7 9

As you may have noticed already, by default files are listed in alphabetical order when using ls. As a result, files with numbers in their names are not always sorted in the order you expect:

$ ls -lh file*
-rw-r----- 1 lennart lennart 64 2011-01-27 18:08 file1
-rw-r----- 1 lennart lennart 12 2011-08-18 09:30 file10
-rw-r----- 1 lennart lennart 66 2011-01-27 18:08 file2
-rw-r----- 1 lennart lennart  7 2011-08-18 09:29 file20
-rw-r----- 1 lennart lennart 64 2011-01-27 18:08 file3

This is easily fixed by using one or more leading zeroes in your file names:

$ ls -lh file*
-rw-r----- 1 lennart lennart 64 2011-08-18 09:35 file01
-rw-r----- 1 lennart lennart 66 2011-08-18 09:35 file02
-rw-r----- 1 lennart lennart 64 2011-08-18 09:34 file03
-rw-r----- 1 lennart lennart 12 2011-08-18 09:30 file10
-rw-r----- 1 lennart lennart  7 2011-08-18 09:29 file20

In order to use numbers formatted with leading zeroes in for-loops, simply add them to the curly braces:

for number in {01..22}; do
    echo "This is number ${number}"
done

While the {start..stop..step} notation is very handy, you unfortunately cannot use variables in it. For example, the following doesn't work:


#! /bin/bash

linecount=$(wc -l ${0} | gawk '{print $1}')

for ln in {1..$linecount}; do
    echo "${ln}"
done

For situations like this the command seq can be used. Its syntax is similar to, but not exactly the same as, the {start..stop..step} notation:

$ seq 1 10
1
2
3
4
5
6
7
8
9
10
$ seq 1 2 10
1
3
5
7
9

So in contrast to the curly braces notation the step increment has to be put in the middle.
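With the $() construction from § 5.2 the broken loop shown above can now be repaired, because seq does accept variables. A minimal sketch (the wc/gawk line is the same one used earlier in this chapter):

#! /bin/bash

linecount=$(wc -l ${0} | gawk '{print $1}')

# seq accepts variables, so this loop does work as intended
for ln in $(seq 1 ${linecount}); do
    echo "${ln}"
done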

There is also a simple way to loop directly over the files in a given directory, using wildcards. The following example shows how to loop over all csv files in the current directory:

for fl in *.csv; do


echo "File: ${fl}" done

Although using for-loops for this simple kind of problem, where the same analysis has to be repeated a given number of times, is very handy, it is not always the best way to do it on modern computers. The essence of the for-loops above is that they run the analyses one after the other. Modern computers (both servers and desktops) have multiple CPUs, allowing them to run multiple programs at the same time. Consequently, with the serial for-loops above the computer would be seriously underused. There are several ways to make things more efficient. The preferred method on a compute cluster will be discussed in Chapter 6, which discusses the SGE batch queue system.

Most programming languages not only have for-loops, but also so-called while-loops. Where a Bash for-loop usually runs over a set of values, a while-loop runs as long as a given condition is true:

1 while condition; do
2     commands
3 done

The most useful form of the while command is its use in reading input data in a script. Consider the following example: you have a file called genelist that lists the chromosome, start and stop position (in basepairs) of a set of genes, one line per gene. The three values on each line are separated by spaces. The problem at hand is that you would like to extract these regions from a VCF file e) for further analysis. VCF files can be queried, modified, etc. using the VCFtools programs f). The command to extract the region starting at base position 123 400 and ending at position 223 400 on chromosome 1 is:

$ vcftools --vcf myVCFfile.vcf --chr 1 --from-bp 123400 \
    --to-bp 223400

To automate this extraction for each of the lines in our genelist file we could write a for-loop that extracts each line, uses e.g. cut to put the various columns in variables and proceeds to run the vcftools command.

e) VCF files are often used to store next-generation sequencing data.
f) http://vcftools.sourceforge.net


However, using while in combination with the read command this becomes much easier. Let's create the following Bash script and save it as extract_genes_vcf.sh:

1 #! /bin/bash
2 while read chr start stop; do
3     vcftools --vcf myVCFfile.vcf --chr ${chr} \
4         --from-bp ${start} --to-bp ${stop}
5 done

Line 2 is where the “magic” happens. The read command waits for the user to type text and then fills the variables following read with whatever the user typed (spaces separate the values). For example

$ read var1 var2
$ echo "${var2} ${var1}"

waits until the user has typed two words, puts the first word in var1 and the second one in var2 and then uses echo to print them in reverse order.

So in our extract_genes_vcf.sh script we use while to keep reading three values from the command line and then run vcftools to extract these regions. Of course it would be stupid to type each line from the genelist file by hand on the command line. This is where input redirection comes to the rescue. Remember from § 3.13 that we can use < to send the contents of a file to the input of a command. That is exactly what we want here. Instead of running

$ ./extract_genes_vcf.sh

and having to type all the chromosomes and start and stop positions by hand, we run

$ ./extract_genes_vcf.sh < genelist

and everything goes as smooth as butter.
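To make this concrete, a genelist file might look like the following (the chromosomes and positions are made up for illustration):

$ cat genelist
1 123400 223400
2 5000 98000
7 1500000 1620000

Each line feeds one round of the while read loop, filling the variables chr, start and stop.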

Exercise E5.4** For-loops


a) Write a script that uses a for-loop to run the fictitious command analyse for a set of files called chr1.dat through chr22.dat. However, we want to start at 22 and end with 1, because chromosome 22 is the smallest, so we'll quickly have some results to work with.

b) If you haven’t done so already, modify the script in such a way that the numbers in the sequence have a leading zero where nec- essary. Having files like chr01.dat, chr02.dat etc. make sure that files are ordered correctly when listing them (chr02.dat follows chr01.dat, whereas chr2.dat follows chr22.dat). If you want to see the difference, use the touch command instead of our ficti- touch tious analyse command. touch simply creates an empty file.

Exercise E5.5*** Parallel for-loops
As mentioned in the text, the for-loops created so far are serial in nature, i.e. the tasks within the loop are started one after another and each task waits until the previous one has finished.

a) Using the information on background processes, as described in §3.11, can you think of a way to parallelise the tasks of a for-loop? Write a small piece of example code.

b) Assuming you haven’t taken the number of CPUs into account in your previous answer, How take into account that you probably have more tasks than CPUs? You don’t need to write code here. Just think how you might achieve this.

5.5 if-clauses and tests

In order to make decisions in a Bash script the if-clause can be used. If-structures are important when writing programs and scripts because they allow your program to do different things depending on e.g. input or the output of a certain command.


Here is a basic if-clause in which the value of a variable is tested:

1 #! /bin/bash
2 # A simple if-test
3 var="no"
4 if [ "${var}" = "yes" ]; then
5     echo "The value of variable var was yes"
6 fi

In line 4 the test is written between square brackets. The value of the string "${var}" is compared to the string "yes" to test whether they are equal. Note the spaces after the opening bracket and before the closing bracket; they are mandatory. Also note the semicolon before the then keyword: like the semicolon in front of the do keyword in a for-loop, it is mandatory.

When writing scripts that accept arguments on the command line, it is good practice to check the arguments before starting the real work. The variable ${*} is the list of all arguments and the variable ${#} contains the number of arguments given on the command line. The following script checks whether the number of arguments is correct; if not, it exits with an error message. If the number of arguments is correct it lists all the arguments.

 1 #! /bin/bash
 2 # This script demonstrates the use of if tests for
 3 # command line arguments of the script.
 4
 5 n_args=3
 6
 7 # Check if enough arguments are given, else exit the script.
 8 if [ ${#} -ne ${n_args} ]; then
 9     echo "${n_args} arguments need to be specified, you gave ${#}."
10     echo "Exiting..."
11     exit
12 fi
13
14 # Print all arguments:
15 for arg in ${*}; do
16     echo "Argument: ${arg}"
17 done


This is what happens if you execute the script with different numbers of arguments:

$ ./check_args.sh a b c d e
3 arguments need to be specified, you gave 5.
Exiting...
$ ./check_args.sh a b
3 arguments need to be specified, you gave 2.
Exiting...
$ ./check_args.sh a b c
Argument: a
Argument: b
Argument: c

You may have noticed that for the comparison in line 8 -ne was used as the “not equal to” operator instead of the != from the first example. In Bash, comparison of strings (of text) is handled differently from comparison of numbers. Instead of comparing two strings or two numbers it is also possible to test whether a certain file or directory exists, is writable, is executable, etc. Table 5.1 on page 100 lists all comparison operators for strings, numbers and files/directories.
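To see the difference between string and numeric comparison in practice, consider this minimal sketch (the values are purely illustrative):

#! /bin/bash
a=5
b=05

# Numeric comparison: 5 and 05 are the same number
if [ ${a} -eq ${b} ]; then
    echo "Numerically equal"
fi

# String comparison: "5" and "05" are different strings
if [ "${a}" = "${b}" ]; then
    echo "Equal as strings"
else
    echo "Not equal as strings"
fi

Running this prints "Numerically equal" followed by "Not equal as strings".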

Exercise E5.6*** if-clauses and tests

a) Write a script that tests if the script was run with a command line argument. If that was the case, print the argument. Otherwise print a goodbye message.

b) Write a script that checks if a directory with the name of today’s date exists. If not, it should create it.

5.6 Arrays in Bash

Like most, if not all, scripting and programming languages, Bash has the concept of arrays. Arrays are ordered lists, and for those versed in R,


Strings:
  equal                                                 =
  not equal                                             !=
  string s1 is not empty                                -n s1
  string s1 is empty                                    -z s1

Numbers:
  equal                                                 -eq
  not equal                                             -ne
  less than (<)                                         -lt
  greater than (>)                                      -gt
  less than or equal to (≤)                             -le
  greater than or equal to (≥)                          -ge

Files and directories:
  Check for directory existence                         -d directory
  Check for file existence                              -e file
  Check for regular file existence (not a directory)    -f file
  Check if a file is readable                           -r file
  Check if a file is writable                           -w file
  Check if a file is executable                         -x file

Table 5.1: Operators for comparison in Bash. To check for the opposite of the file and directory tests, add a ! before the test, e.g. to check if a file does not exist use: if [ ! -e some_file ].

arrays are similar to R's vectors. Arrays may contain either numeric or text (string) data.

Initialisation of an array, i.e. the creation of an array, is done in a way that is very similar to regular assignment of variables:

arr[index]="value"

For example, the following creates an array with student names:

students[0]="Alicia"
students[1]="Tim"
students[2]="Pauline"
students[3]="Xue"

Note that Bash arrays are zero-based: the index starts at zero g).

Alternatively, arrays can be initialised in one command like this:

students=(Alicia Tim Pauline Xue)

Using elements from an array is similar to using regular variables: you simply surround the element with ${...}, like this: ${arr[index]}. The following example prints the name of the third student:

$ echo "Name: ${students[2]}" Pauline

To use the full array use either @ or * as index: ${arr[@]} or ${arr[*]}, for example:

$ echo "All students: ${students[*]}" Alicia Tim Pauline Xue

The # symbol is used to get the length or size of an array: ${#arr[@]}, for example:

g) This is something to keep in mind whenever you are programming. Some languages are zero-based (e.g. C, C++, numpy), others (e.g. R, FORTRAN) are one-based. This difference can lead to so-called off-by-one errors.


$ echo "the nr of students is: ${#students[@]}" the nr of students is: 4

The size of a single array element can be found like this: ${#arr[index]}. So if we want to know how many characters are in the third element of the students array, we can do something along the following lines:

$ i=2
$ echo "the length of ${students[${i}]} is ${#students[${i}]}"
the length of Pauline is 7

The simplest way to extend an array is very similar to the way we initialised an array in one command (see above): arr=(${arr[@]} new1 new2). So the following adds two new names to the students array:

$ students=(${students[@]} Ivet Lauren)
$ echo "The students now are: ${students[@]}"
The students now are: Alicia Tim Pauline Xue Ivet Lauren
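As an aside, recent versions of Bash also provide the += operator to append to an array, which avoids repeating the array name (the new name is of course just an example):

$ students+=(Piet)
$ echo "All students: ${students[@]}"
All students: Alicia Tim Pauline Xue Ivet Lauren Piet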

The most useful place to use arrays is in for-loops (see also §5.4 on page 99):

$ for st in ${students[*]}; do
>     echo "Student ${st}"
> done
Student Alicia
Student Tim
Student Pauline
Student Xue
Student Ivet
Student Lauren

5.7 Dealing with errors in your script

Writing bug-free scripts is everybody’s goal, but in real life this usually doesn’t happen automatically. The following can help to detect bugs.


One of the ways to improve Bash scripts in order to be warned about errors early on is by adding the following line at the top of your script (below the #!/bin/bash line and the introductory comments) as the first command to be executed:

set -e

By adding this command the script will stop with an error as soon as a command in the script finishes with an error. Without this command the script will continue until the end, even when one (or more) of the commands in the script gives an error.
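A minimal sketch of the effect (the deliberately failing ls call and the directory name are only for illustration):

#! /bin/bash
set -e

echo "Before the error"
ls /nonexistent_directory   # this command fails
echo "After the error"      # with set -e this line is never reached

With the set -e line the script aborts right after the failing ls; without it, "After the error" would still be printed.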

Another helpful option to set early in your script is:

set -u

This option aborts the script when an uninitialised variable is used. Without setting this option a script will continue and simply fill in an empty value when an uninitialised variable is found. For example, take the following script:

#!/bin/bash
# A test script for set -u
set -u
echo "The value of the variable var is: ${var}."

In the script the variable ${var} is not initialised and without the set -u option the script would simply print

The value of the variable var is: .

With the set -u option set, the following error will be shown:

bash: var: unbound variable

Of course, this example isn't a huge bug, but imagine a line where the command would be a copy command:

cp files.* ~/${destdir}


If the ${destdir} variable is not set beforehand, this will copy all files to your home directory instead of into a subdirectory, leaving quite a mess. Or, even worse, consider a similar line where you would remove files from ~/${destdir}. If the variable isn't set, you would remove all files in your home directory! If you need to debug your script, running it in the following way will be helpful:

$ bash -x my_script.sh

This will print every command that is executed, including the contents of variables, etc. Testing your script like this and setting the two options above will be a big help in achieving the goal of writing bug-free scripts.
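Both options can be given in a single line as set -eu. Some script authors additionally use set -o pipefail, so that a failing command in the middle of a pipe also counts as an error; this is a convention, not a requirement:

#! /bin/bash
# Abort on errors and on the use of uninitialised variables,
# also when the failure happens inside a pipe
set -euo pipefail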


Chapter 6

Working with the SGE queue system

Many, if not all, scientific compute clusters use some system to distribute compute jobs across their nodes. Even if your research group only has a single server it is very worthwhile to have such a system installed. This kind of functionality is not used by default on Linux servers, and consequently this chapter is not generally applicable to other Linux servers. However, most servers or clusters in use in bioinformatics (and other fields with computationally intensive tasks) have such a system.

One of the most commonly used job queuing systems is the Sun Grid Engine (SGE) system, which will be discussed in this chapter. The commands for the PBS system, also very commonly used, are very similar.

On the epib-genstat cluster at the ErasmusMC, regular tasks (i.e. programs not using the SGE queues) are killed after 10 minutes. So it is usually of little use to start an R session or run ProbABEL [6] directly from the command line.

After submitting a job to SGE it will be processed in a so-called queue. Each queue has a certain number of slots. If there are more jobs than slots in a queue the excess jobs will have to wait for a slot to become available (cf. § 6.2 to find out the number of slots per queue or to see whether your job is already running).

Each queue has specific properties and SGE uses these properties to decide which queue to send your job to. In this cluster the queue named all.q will be used most of the time.

6.1 Submitting jobs to the SGE queues

Suppose you want to run a certain R script myscript.R. Normally you would either start R and then

> source("myscript.R")

or from the Linux command line you would run:

$ R --vanilla -q -f myscript.R


6.1.1 Quick and dirty

The quickest way to submit such a task to an SGE queue is the following:

$ qsub -cwd -b y R --vanilla -q -f myscript.R

Here qsub is the command to submit a job to the queue system. It is followed by zero or more options (two in this case, -cwd and -b y) and finally the actual command you want to run. The first option, -cwd, stands for “use the current working directory” (this is the same as the present working directory (pwd) mentioned in § 3.6). It tells SGE to look in the pwd for the files you specify (myscript.R in this case) and to write its output files there as well. The option -b y tells SGE that the command you want to execute is not a script but a binary program (R in this case). Each job that is sent to the queues receives a unique ID, the job ID. This is useful for distinguishing between several jobs you might have waiting in the queue, but is also necessary when you want to delete a job from the queue (cf. § 6.3).

By default SGE will create two files for each job in the queue, one that contains the output that would normally appear on the screen and one that contains the errors that would normally be sent to the screen. These files will have a name that starts with the name of the command you sent to the queue followed by a period, the letter o or e for output and error, respectively, and finally the job ID. For the R command that we submitted earlier the files would be called

R.o2823
R.e2823

(where of course the number at the end is the job ID, which will be different in your case).

6.1.2 Using a submission script

The preferred way to send a job to the queue system is by using a submission script. Using a script has several advantages:


• you don’t have to remember all the command line options for the qsub command, you simply copy your submission script from the previous time you used it (or this website) and only change the program that you want to run. • All the standard Linux shell scripting tricks are at your disposal. A simple example of a submission script for the example of the R script myscript.R used earlier would be:

#! /bin/bash
# This is a sample submission script. Lines starting with # are
# comments. The first line (with #!) should be in every script.

# Let's set some variables for SGE. Lines starting with #$ are # interpreted by SGE as if they were options to the qsub command # (don't remove the # from the lines starting with #$). #$ -S /bin/bash #$ -cwd

# This is the command we would like to run in the queue
R --vanilla -q -f myscript.R

Save this submission script in the same directory where myscript.R is located and call it, for example, job.sh. Make sure the script is executable by running

$ chmod a+rx job.sh

Now it can be submitted to the queues like this:

$ qsub job.sh

6.1.3 Refinements to the submission script

The script presented in the previous section is simple but sufficient for basic tasks. Here we present some additions to the script that can make life with SGE easier. A script file with all the suggested options can be downloaded from http://epib-genstat.erasmusmc.nl/qscript.sh. The only things that need to be changed are the e-mail address and the last line where you fill in the command(s) you want to run.


• Start/stop e-mails
Since most jobs will take more than 10 minutes to complete (otherwise you could have run them without using the queue system, right?!) it would be nice to get an e-mail when the job is finished, so that you don't have to run qstat all the time (cf. § 6.2). To get an e-mail when a job begins and when it ends, simply add the following two lines after the #$ -cwd line in the aforementioned simple script:

#$ -M your_address@your_domain.com #$ -m be

If you only want an e-mail after the job has finished change the #$ -m be option to #$ -m e.

• Output to a single file
As discussed at the end of section 6.1.1, the output and error messages of a job are recorded in separate files. When running a large number of jobs this can lead to a proliferation of these files. By adding

#$ -j y

to your submission script the output and error files will be joined into one file of the form script.o2345.
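Putting these refinements together, a submission script might look like the sketch below (the e-mail address and the R command are placeholders; the downloadable qscript.sh mentioned above may differ in its details):

#! /bin/bash
# Example submission script with the refinements applied

#$ -S /bin/bash
#$ -cwd
# Join the output and error streams into one file
#$ -j y
# Send an e-mail when the job begins and when it ends
#$ -M your_address@your_domain.com
#$ -m be

R --vanilla -q -f myscript.R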

6.2 Monitoring progress

The qstat command allows you to see whether your job has been accepted by one of the queues, which jobs you have submitted so far, how many jobs are waiting in the queues, etc. Simply running

$ qstat

will show your own running and waiting jobs. Running the command

$ qstat -f


will give an overview of all queues, even the ones in which you don't have any jobs running. To show all jobs of all users in all queues use

$ qstat -f -u \*

This gives you an idea how busy the cluster is. A sample output on a quiet day looks like this:

queuename                          qtype resv/used/tot. load_avg arch       states
[email protected]          BP    0/2/7          2.03     lx24-amd64
   2843 0.56000 R      user1    r     08/06/2010 10:16:42     1
   2844 0.56000 R      user1    r     08/06/2010 10:16:42     1
[email protected]    BP    0/0/4          2.03     lx24-amd64
[email protected]          IP    0/0/2          2.03     lx24-amd64

It shows that the user user1 has two jobs in the queue called all.q, with job IDs 2843 and 2844. Both jobs are running some R script. The queue called high_prio_q is empty, as is the queue called int.q, which is used for interactive jobs only (its q-type contains the letter I; queues that accept batch jobs, i.e. regular command line programs or shell scripts, have type B). The resv/used/tot. column shows the number of reserved and used slots in each queue, as well as the total number of slots. So in this example all.q has no reserved slots and two slots are in use out of a total of seven. If there are more jobs than slots in a queue the excess jobs will have to wait until slots become available again.

6.3 Deleting jobs from a queue

At some point you will find that you want to delete a job from the queue. This may happen because you submitted five R jobs, for example, and the first one finished early because you made a typing mistake in the R code. Since the other jobs use the same R code they will finish with an error as well, so you decide to remove them from the queue. For this you use the qdel command followed by the job ID. Use qstat (cf. § 6.2) to find the job IDs of your jobs. Running


$ qdel 2844

would kill the second job in the list shown in § 6.2. Of course a user can only delete her/his own jobs.

6.4 Getting info on a finished job

To get information on a job that has finished, use the qacct command in combination with the job ID:

$ qacct -j 8765
==============================================================
qname        all.q
hostname     node01.polyomica.com
group        genepi
owner        some_user
project      NONE
department   genepi
jobname      probabel.pl
jobnumber    8765
taskid       undefined
account      sge
priority     8
qsub_time    Mon Mar  7 22:56:01 2011
start_time   Mon Mar  7 22:56:15 2011
end_time     Tue Mar  8 03:59:44 2011
granted_pe   NONE
slots        1
failed       0
exit_status  0
ru_wallclock 18209
ru_utime     17362.282
ru_stime     816.388
ru_maxrss    0
ru_ixrss     0


ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    328576642
ru_majflt    2
ru_nswap     0
ru_inblock   0
ru_oublock   0
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     12597
ru_nivcsw    443538
cpu          18178.669
mem          21867.259
io           170.594
iow          0.000
maxvmem      2.037G
arid         undefined

The most interesting elements of the output are start_time and end_time, the times at which the job started and finished, respectively. Note that the submission time has a separate entry. The value of cpu shows the number of seconds of CPU time the job used. The value of ru_wallclock shows the total time (in seconds) that the job took (i.e. CPU time, but also time used for reading/writing files, etc.). The values of qname and hostname tell you in which queue and on which server the job was run.

In order to find out how much time your jobs have spent in the queue over the last 15 days run (with your own username, of course):

$ qacct -o lennart -d 15
OWNER    WALLCLOCK     UTIME     STIME       CPU    MEMORY        IO    IOW
===========================================================================
lennart      96279 10798.155  1903.326 12701.481  3529.165  4165.959  0.000

The WALLCLOCK time is the total time in seconds that your jobs have spent

running in the queue (in the last 15 days). The CPU column shows the time (in seconds) the job was actively using a CPU, i.e. not waiting for other things like reading or writing a file (which is listed in the IO column).

6.5 Interactive jobs

Although it is not the preferred way to run programs, sometimes it may not be possible or efficient to write a complete script to submit to the queue. For example, you'd like to start an R session and enter the commands on the R command line because you are not sure whether a certain construction works on the cluster.

In such a case the interactive queue (int.q) can be used. Starting a session in the interactive queue is done like this:

$ qrsh -pty y command

where command is the program you'd like to run (R or solar are likely examples). You will then be asked for your password and an interactive session is started on one of the servers in the cluster (if there are slots available in the interactive queue, of course). Note that jobs running in an interactive queue will be killed after 48 hours of CPU usage and they will run with a very low priority. If, for some reason, you can't access your interactive queue session anymore, you can kill the old one using the qdel command.

6.6 Exercises

Exercise E6.1** Working with Sun Grid Engine

In this exercise you will create a small script and submit it to the SGE batch system. You will monitor its progress and then use a Bash script to automate submission to SGE.


a) Create a new directory in the directory ~/LinuxCourse/ you created in Exercise E3.3.

b) The sleep command accepts one argument: the time to wait in seconds. The whoami command prints your user name. Write a Bash script that prints your user name and the current time (including seconds), then goes to sleep for 10 seconds and repeats this cycle 5 times. End the script with a final print of the current date and time, followed by a line of dashes. Run the script to test whether it works.

c) Now submit this script to the queue. Don’t forget to add the -cwd option, otherwise the files generated by SGE will end up in your home directory. Check to see that the script is either running or waiting to be run. If your script is waiting, what command would you use to find out how busy the batch system is?

d) After the script has run, inspect the two resulting output files.

e) Write a Bash for-loopa) to submit five instances of this script to the queue. Monitor them and see how jobs of different users are scheduled.

Exercise E6.2** SGE, R and command line arguments
Exercise E5.3 on page 90 explained how to use command line arguments in Bash scripts. Wouldn't it be fun if the same could be done in R scripts as well? This exercise will show you how. To run an R script from the command line (so not in interactive mode), use the -f option to specify which script to run. Usually the options -q (for 'quiet', to suppress the startup message), --slave (to be really quiet) and --vanilla (see R --help for more info on this option) are specified as well. For example:

$ R --vanilla --slave -q -f myscript.R

a) You don’t necessarily need to do that in a script. Simply writing a for-loop on the command line is also OK.


To pass arguments to the R script use the --args option as the last option, followed by any arguments you want to send to the script. The number 1234 and the string "Hello there" can be passed like this:

$ R --vanilla --slave -q -f myscript.R --args 1234 "Hello there"

In order to use these arguments in an R script, use the commandArgs() function like this:

args <- commandArgs(TRUE)

This stores all the arguments following the --args option in the array args. The following R script gives you an idea how to use all this.

1 cat("Welcome to this simple R script.\n") 2 cat("These were the command line arguments:\n") 3 args <- commandArgs(TRUE) 4 args 5 6 cat("Now we will print the arguments one by one:\n") 7 for (i in 1:length(args)) { 8 cat(i, ": ", args[i], "\n", sep="") 9 10 11 # All arguments are treated as strings (text), so in 12 # order to use them in calculations they have for be 13 # converted to numbers: 14 number = as.integer(args[1]) + 2 15 cat("number = ", number, "\n") 16 17 cat("The end\n")

a) Write an R script that accepts command line arguments. The first argument is a start value, the second one is a stop value (both are integer numbers). The R script should use these values as start and stop values in a for-loop. Since this example is purely didactic, you can do something simple inside the loop, like printing the square of the loop value. Verify that it works as expected.

The quick and dirty way to submit this script to SGE would be


$ qsub -cwd -b y R --slave --vanilla -q -f R_loop_script.R --args 1 5

And if you want to join the output of SGE (the .e and .o files) into one as well as receive an e-mail when the job starts and when it ends, you will have an even longer command:

$ qsub -cwd -b y -j y -m be \
    -M your_email@your_domain.com \
    R --slave --vanilla -q -f R_loop_script.R --args 1 5

This is too much to remember or type without making errors. The solution is to write job scripts for tasks like this. In a job script you specify all the SGE options you need as well as the command that is to be run in the batch queue. Then, the next time you run a similar task you only need to change the line that runs R (or whatever else you would like to run in the queue).

To specify one of the SGE options in a job script (which is a normal Bash script) you have to start the line with #$. The following script is an example job script for the R script of this exercise. The -S option in line 6 makes sure that SGE understands this is a Bash script.

 1 #! /bin/bash
 2 # This is an example of a job submission script
 3
 4 # The following are options for qsub. Don't
 5 # remove the # in front of them.
 6 #$ -S /bin/bash
 7 #$ -j y
 8 #$ -cwd
 9 #$ -M your_email@your_domain.com
10 #$ -m be
11
12 # Here comes the rest of your script that
13 # actually does something. In this case it runs R.
14 R --slave --vanilla -q -f R_loop_script.R --args 1 5

With this script submitting a job to SGE is simple:


$ qsub jobscript.sh

b) Rewrite this example job script in such a way that it accepts two command line arguments and uses those in the R command, instead of the 1 and 5. Add some checks to see if the user really did specify two command line arguments. If that is not the case, the script should exit (use the exit command for that). Test the script by submitting it to the queue.

Chapter 7

Good scripting practices, structured programming and data management

By now you should know your way around basic Bash scripts. Some of you will have done some scripting in R as well. Here I would like to discuss some general principles that relate to good programming and scripting a) practices and how those relate to reproducible research.

In most sciences it has become impossible to work without writing some form of computer code. Scripts help you analyse your data, run simulations, etc., and form an integral part of your research. Just as a person in a wet lab has his/her lab journal in which each step of the protocol is meticulously written down, you have your directory structure, your data sets and your scripts. As a scientist it is your responsibility to make sure that you can always reproduce your results. This is one of the fundamentals of the scientific method.

Some of the following tips may seem obvious, most will take a bit of extra time to implement. But rest assured, they are all worth it and have proven themselves in practice. Most of them boil down to using descriptive names and comments in your code and directory structure. Remember that some projects take years to complete, and some get handed over to colleagues.

Resist the temptation to skip the tips in this chapter because “this script will only be used once”. Many scripts survive longer than that! And good scripts will continue to be useful to you and maybe other people. They will be used again and again, sometimes with some slight modifications. What is the point of writing similar scripts from scratch all the time? If you run into a situation similar to one you have encountered before it is better to extend and generalise an existing script than to try and

a) A quick note about the words scripting and programming. Although they may often be used interchangeably, at least in our field, there is a difference. Examples of scripting languages are Bash (cf. § 5), R, sed (§ 4.3), gawk (§ 4.5) and Perl. Java, C, C++ and FORTRAN are examples of programming languages. The difference between the two groups is that scripting languages are interpreted on the fly. This means that they are converted from human-readable language (your code) to machine language the moment you run them. Code written in a programming language is converted to machine language once (this step is called compilation) and can then be run many times. As a result, programs written in a scripting language are usually slower to run, but easier to write and debug. Apart from that, there is not much difference between the two in terms of writing code.


invent the wheel again. Furthermore, well written scripts with a good code layout make finding mistakes (so-called debugging) much easier, saving you a lot of time in the process.

7.1 Code layout

As Bradnam and Korf write in their chapter on code beautification: “Appearance matters” [8]. This is definitely true! Scripts in which the code is laid out well are easier to understand and consequently easier to debug. Consider the following examples of a piece of code shown earlier:

1 #! /bin/bash
2
3 if [ -n "${1}" ]; then
4     echo "The command line argument was ${1}"
5 else
6     echo "Goodbye!"
7 fi

and

1 #! /bin/bash
2 if [ -n "${1}" ]; then echo "The command line argument was ${1}"; else echo "Goodbye!"; fi

Both are valid Bash code that produce the same output, but in terms of understanding and maintaining the code, the first version definitely has the upper hand.

7.1.1 Indentation

Indenting the lines of code in the if and else part of the previous example clearly shows what the code is supposed to do. A good editor, like Emacs or (g)vim, helps you with the indentation. In Emacs, for example, pressing the TAB key in a Bash or R script will automatically indent that line correctly. Whether you choose to indent with a TAB, four spaces


(most common) or only two is not very important, as long as you are consistent. A good editor also helps you by colouring the various parts of the code, as shown here b):

#! /bin/bash
# This is a comment.

if [ -n "${1}" ]; then echo "The command line argument was ${1}" else echo "Goodbye!" fi

Note that in this syntax highlighting scheme comments are shown in green, Bash keywords in blue and strings in purple. If, for example you forget a quote, syntax highlighting will immediately show you something is wrong:

#! /bin/bash
# This is a comment.

if [ -n "${1}" ]; then echo "The command line argument was ${1} else echo "Goodbye!" fi

7.1.2 Line length

In the early days of computing the maximum length of a line was 80 charactersc). Many programmers still stick to this limit, and with good reason. Consider the following VCFtools command:

vcftools --vcf myVCFfile.vcf --chr ${chr} --from-bp ${start} --to-bp ${stop} --plink-tped --recode --out geneselection

b) For those of you reading the PDF version of this document on screen or a colour print.
c) That was the width of the punch cards used to program a computer in those days.


I hope you are convinced the following is easier to read (remember that Bash needs a backslash (\) at the end of a line that hasn't finished yet):

vcftools --vcf myVCFfile.vcf --chr ${chr} \
    --from-bp ${start} --to-bp ${stop} \
    --plink-tped --recode --out geneselection

and some of you would even like the following better:

vcftools --vcf myVCFfile.vcf \
    --chr ${chr} \
    --from-bp ${start} \
    --to-bp ${stop} \
    --plink-tped \
    --recode \
    --out geneselection

The same holds for R code, of course. Compare

data <- read.table("/path/to/file", header=TRUE, sep=";", na.strings="0", stringsAsFactors=TRUE)

to

data <- read.table("/path/to/file",
                   header=TRUE,
                   sep=";",
                   na.strings="0",
                   stringsAsFactors=TRUE)

7.1.3 Spaces

Clean code makes good use of white space. Although Bash doesn’t allow for a space between the variable name, the = sign and the value, other languages do and it is good practice to surround your equal signs with a space. The same holds for commas, semi-colons and pipes (|). Compare the following lines of Bash code:


linecount=$(wc -l ${0} | gawk '{print $1}')
linecount=$(wc -l ${0}|gawk '{print $1}')

or the following GAWK one-liners:

$ gawk 'BEGIN{print "id sex bt1"}{if($5==2)sex=0;if($5==1)sex=1;print ←- $2,sex,$6}' chr.ped > chr.phe $ gawk 'BEGIN {print "id sex bt1"}{if($5==2) sex=0; if($5==1) sex=1; print ←- $2, sex, $6}' chr.ped > chr.phe

or this piece of R code from which I intentionally removed the colours used for syntax highlighting:

f2dna<-read.csv(paste(dir,"dna_f.csv",sep="")) ## Read data from Excel sheet library(gdata) dbFile<-paste(dir,"db.xls",sep="") db<-read.xls(dbFile,sheet=1) ## Create a new data frame that combines the old data with ## selected columns from the DB cols<-cbind("Library","Sample.ID","Cov.X") selectedData<-olddata[,cols] cols<-cbind("nummer","trait","finaldiag") selectedDataDB<-db[which(db$nummer %in% ←- selectedData$Sample.ID),cols]

versus

f2dna <- read.csv(paste(dir, "dna_f.csv", sep=""))

## Read data from Excel sheet
library(gdata)
dbFile <- paste(dir, "db.xls", sep="")
db <- read.xls(dbFile, sheet=1)

## Create a new data frame that combines the old data with
## selected columns from the DB
cols <- cbind("Library", "Sample.ID", "Cov.X")
selectedData <- olddata[, cols]

cols <- cbind("nummer", "trait", "finaldiag")
selectedDataDB <- db[which(db$nummer %in% selectedData$Sample.ID), cols]

and decide for yourself. Space in the form of empty lines can also be very helpful in understanding a script. Use empty lines to separate parts of your script that do different things, as shown in the R example above.

7.2 Comments

All scripts should have a header that explains the intended use. Right now it is obvious what the script does (or is supposed to do), but in six months' time you will look at it again, and guessing what a script does simply from its file name or the directory it is in takes much more time than writing a brief description.

It is good practice to also add your name and a date to the header. Good scripts provide added value, also to other people. They tend to get distributed. It's good to be able to tell who wrote the original script and, if you run into two versions of a script, which one is newest d).

If your script accepts command line arguments, describe them in the header. This helps you if you need to run the script at a later time.

Write comments that describe the more intricate parts of your script. They will help you (or your successor) to understand what happens (or is supposed to happen). Especially if it took you quite some time to come up with an elegant solution to a problem, having a comment that describes it works miracles when troubleshooting later on.

You can also use comments to separate parts of your code. Especially with syntax highlighting they will stick out from the rest of the code.

d) If you are interested in a more professional and efficient way of storing different versions than simply adding a version number to the header or the file name, see the item on revision control in Chapter 8.


#--------------------------------
# Initialise all variables here
#--------------------------------

##
# Read data from files
##

##########################
# This is the main part
##########################

7.3 Variable names

Some people are tempted to use variable names like a, b, aa, thingy, df, etc. These will make your life miserable! Looking back at a script in a year's time, names like ERFgenotypes, IDs_no_meds or HM2_snpnames are much more descriptive. Try to find the balance between short, non-descriptive variable names and ones that are too long and only lead to typos. Use underscores (_), CamelCase, or periods to make variable names descriptive and easy to read.
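For example (the file and variable names below are hypothetical; it is the pattern that matters):

# Hard to decipher in a year's time:
a=$(wc -l < f1.txt)

# Self-documenting:
n_genotyped_samples=$(wc -l < genotyped_samples.txt)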

7.4 File and directory names

A well-organised directory tree is like a well-kept house: you immediately know where to find things. Try to think of a logical structure for your directories, e.g. along projects, data sets, etc. Give the directories informative names. Don't be worried about having to type long names. You haven't forgotten TAB completion, have you? Also allow yourself some time to clean your directories, just like most people do with their desks before going on holiday.


It is also good practice to add a file called README to each directory, in which you briefly describe what you do, did, or plan to do there. Give your output files sensible names. Having a directory with the following files is not very helpful (even one week from now):

output.1
output.2
output.data1.2
output.try2.txt

If you argue that the file with the latest date/time is the one you should use, how can you be sure? What if you actually didn't completely finish the analysis? What if the output with the latest date was actually a run that was discarded because you were trying an option that turned out not to be correct?

7.5 Summary

• Begin your scripts with a header that documents their behaviour.
  – Start your script with a comment that summarises the use of the script.
  – Add your name and the date you last edited the script to the header.
  – If your script needs (or accepts) command line arguments, give a brief description of each of them in the header.
• Comments are good!
• Give variables a descriptive name.
• Split long lines into shorter ones.
• Use white space wisely.
• Create a new directory for each project.
• Give your output files a descriptive name.


• Be wary of massive scripts. As soon as scripts grow beyond, say, the size of your screen, consider splitting parts off into separate scripts. Or learn about the use of functions e). Again, this will help you structure your code.
• Don't ignore errors and warnings. Most of them are there for a reason. You might consider them a nuisance, but make sure you understand what they mean and where they come from, or your research results may be invalid.
• Don't start tomorrow. Start following these guidelines today. Tomorrow there will be other high priority things.

e) In other languages like Perl functions are known as subroutines.


Chapter 8

Where to go from here?

The course for which this document was written was a two-day course. If you have come this far within these two days then either you had quite some Linux experience already or you picked it up really quickly, on which I have to congratulate you!

8.1 More advanced topics

If you are interested to learn more, you can consider looking into the following subjects (GIYF, but I'll be happy to give you some hints):

• Using byobu or screen to access the same connection from multiple locations; With byobu or screen it is possible to continue working with the same shell session from multiple locations.

• Shell expansions; In § 3.6.3 the use of wildcards was explained, where characters like * were used to select multiple files. Using so-called shell expansion allows you to work even more efficiently on the CLI.

• Finding files and directories with the find utility; The syntax of the find utility is not the easiest to understand, but using the basics together with the xargs command makes for a powerful combination if you want to apply the same action to many files (see the sketch at the end of this list).

• Regular expressions; The search patterns of sed, gawk, grep and other utilities are not just literal searches. In fact they are so-called regular expressions. With regular expressions it is possible to do a search like “Find every occurrence of word, but only if it is located at the end of a line”, or “Find every occurrence of p, but only if it is followed by several numbers, followed by an e, then a + or - and a maximum of three digits”.

• More advanced Bash scripting; The use of case structures, how to let a script accept options in short and long form, making functions, etc.


• More advanced GAWK scripting; Making functions, associative arrays, using other values than the newline character as a record separator, etc.

• Revision control; If you write scripts on a regular basis and re-use them regularly, you will find that at some point you have quite a library, and maybe you want to reduce their number by reorganising them or making them more generalisable. A great idea, but wouldn't you want to be able to go back a couple of versions every once in a while to find out why something used to work, but doesn't anymore? Of course you could save every version with a different file name, but that would make a big stack of files. With revision control files (or complete directories) can be saved, easily shared with others, and you will always be able to find out what changed between revisions. Several programs provide version control. Subversion (svn) is very commonly used, but slowly getting old and superseded by Git (git) [9]. Bazaar (bzr) is easier to start with. All of these provide good integration with MS Windows through a GUI as well. GitHub (https://github.com) and BitBucket (https://bitbucket.org) are popular web services where you can collaborate with other people on the same project using Git. More and more open source tools developed by scientists can be found there.

• For those of you who would like to learn another scripting language I can suggest both Python and Perl. Python is the most “general purpose” of the two, meaning that you can not only use it for automating tasks, but also to do scientific analyses, or even write GUI programs. Perl is traditionally more focused on text processing and is used a lot in bioinformatics.
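As a taste of the find/xargs combination mentioned in the list above, the following sketch counts the lines of every csv file below the current directory (the file names and counts are made up for illustration):

$ find . -name "*.csv" | xargs wc -l
     12 ./data/chr01.csv
      7 ./data/chr02.csv
     19 total

For file names that contain spaces, the variant find . -name "*.csv" -print0 | xargs -0 wc -l is the safe choice.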

8.2 Further reading

One of the most appropriate books on this subject is UNIX and Perl to the Rescue by Bradnam and Korf [8], two bioinformaticians at UC Davis. This book also teaches the basics of the Perl scripting language which


is often used in bioinformatics. If you would like to learn more on Bash scripting, take a look at Refs. [10, 11]. To learn more about general programming concepts, try Ref. [12] (it does require some knowledge of the C programming language to really understand the text). Ref. [13] is both a good reference and a good course book on Linux system administration (in case you want to set up your own Linux server), since it trains you for the LPIC-1 exam of the Linux Professional Institute. Those of you who want to learn how to use their Emacs in a more efficient manner can read Ref. [14], which, although a bit outdated (it discusses Emacs 22, whereas Emacs 25 is the latest version), is still a good introduction to the core concepts. More recently, Mickey Petersen wrote a good e-book on Emacs [15].


Appendix A

Answers to the exercises

Answer to Exercise E3.1. Long and short options
The --all (long form) option lists all files and directories (both normal ones and hidden ones). The second command will also show a list of files and directories. The reason is that -all is interpreted as the contraction of the short options -a -l -l.

Therefore, the second form, with only one dash, is probably a typing mistake where the user either forgot the second dash, or the last l should have been an h (which means: show file sizes in human readable format).

Answer to Exercise E3.3. Some file and directory basics
In the following answers I did not include all the output of each command in order to save some space.

Running cd without arguments brings you to your home directory. Of course cd ~ and cd /home/your_username are also correct.

$ cd
$ mkdir LinuxCourse
$ cd LinuxCourse
$ mkdir tmp2
$ mkdir tmp3

Note that the last two lines can be written as one: mkdir tmp2 tmp3.

$ cd tmp2
$ cd ../tmp3    # or: cd ~/LinuxCourse/tmp3
$ cd

The shortcut to go to the previous directory you visited is cd -. The pwd command will print your present working directory on the screen.

$ cd /
$ ls
$ cd /tmp
$ ls
$ cd ../home/lennart/LinuxCourse

Of course, you have to replace my user name with yours in the last line. In the following I did include all the output of the commands.

$ ls
tmp2 tmp3
$ rmdir tmp2 tmp3
$ ls
$

Answer to Exercise E3.4. Copying files
The last argument of the cp command is always the destination, which means in this case that you are trying to copy file1 and file2 to another file, file3, which does not make sense. So either you forgot to add a destination directory to the cp command, or you made a typo and the last argument should not have been file3 but the name of the destination directory.

Answer to Exercise E3.5. Creating a directory tree

Both mkdir --help and man mkdir will tell you that the -p option allows you to create missing parent directories:

$ mkdir -p dira/dirb/dirc

Removing directories can be done in two ways. If the directories are empty, the rmdir command will do the job. Since dira is not empty (it contains dirb, which in turn contains dirc), rmdir dira will not work. One solution is then to first go to directory dirb and start from there:

$ cd dira/dirb
$ rmdir dirc
$ cd ..
$ rmdir dirb
$ cd ..
$ rmdir dira

This takes way too much typing. In this case the rm command will help: the option -r will remove files and directories recursively, reducing the above to a single (but more dangerous) command (run from the parent directory of dira):

$ rm -r dira

Answer to Exercise E3.6. Getting information on files and directories

On my Linux system I found the following files in /boot:

$ ls -lh /boot/
-rw-r--r-- 1 root root 983K Oct  9 18:49 abi-3.11.0-12-generic
-rw-r--r-- 1 root root 899K Sep 10 22:29 abi-3.8.0-31-generic
-rw-r--r-- 1 root root 160K Oct  9 18:49 config-3.11.0-12-generic
-rw-r--r-- 1 root root 152K Sep 10 22:29 config-3.8.0-31-generic
drwxr-xr-x 5 root root 4.0K Oct 20 14:51 grub
-rw-r--r-- 1 root root  18M Oct 20 14:50 initrd.img-3.11.0-12-generic
-rw-r--r-- 1 root root  17M Oct 20 14:31 initrd.img-3.8.0-31-generic
drwx------ 2 root root  16K Sep 15 10:47 lost+found
-rw-r--r-- 1 root root 173K Jun 17 11:52 memtest86+.bin
-rw-r--r-- 1 root root 175K Jun 17 11:52 memtest86+_multiboot.bin
-rw------- 1 root root 3.2M Oct  9 18:49 System.map-3.11.0-12-generic
-rw------- 1 root root 3.0M Sep 10 22:29 System.map-3.8.0-31-generic
-rw------- 1 root root 5.4M Oct  9 18:49 vmlinuz-3.11.0-12-generic
-rw------- 1 root root 5.2M Sep 10 22:29 vmlinuz-3.8.0-31-generic

So there are two files with vmlinuz in their name, one is 5.4 MB, the other is 5.2 MB. Note that on your system files with different version numbers may be used.

The users and groups of the three directories can also be found using the ls -l command. The following lines show user, group and directory name:

root root  crash
root staff local
root mail  mail

Again, this may be slightly different on your machine. These are the steps I took to get that information (check the 3rd and 4th columns):

$ cd /var
$ ls -lh
total 60K
lrwxrwxrwx  1 root root     9 Aug  5  2011 adm -> /var/log/
drwxr-xr-x  2 root root  4.0K Aug 18 06:35 agentx
drwxr-xr-x  2 root root  4.0K Oct 17 06:35 backups
drwxr-xr-x 20 root root  4.0K Feb 21  2013 cache
drwxrwxrwt  2 root root  4.0K Oct 15 06:25 crash
drwxr-xr-x 75 root root  4.0K Jul 31 16:17 lib
drwxrwsr-x  2 root staff 4.0K Apr 21  2011 local
lrwxrwxrwx  1 root root     9 Jun 27  2012 lock -> /run/lock
drwxr-xr-x 19 root root  4.0K Oct 18 06:32 log
drwx------  2 root root   16K Aug  3  2011 lost+found
drwxrwsr-x  2 root mail  4.0K Oct 18 01:34 mail
drwxr-xr-x  2 root root  4.0K Aug  3  2011 opt
lrwxrwxrwx  1 root root     4 Jun 27  2012 run -> /run
drwxr-xr-x 10 root root  4.0K Feb 12  2013 spool
drwxrwxrwt  4 root root  4.0K Sep 21 14:43 tmp

A more 'focused' approach would have been to ask ls to only show the information for the directories we are interested in (note the -d option):

$ ls -lhd mail crash local
drwxrwxrwt 2 root root  4.0K Oct 15 06:25 crash
drwxrwsr-x 2 root staff 4.0K Apr 21  2011 local
drwxrwsr-x 2 root mail  4.0K Oct 18 01:34 mail

Answer to Exercise E3.7. The dangers of wildcards

The first rm command removes all files and directories ending in ~ (usually files ending with a tilde (~) are backup files), the second rm removes all files and directories in the present directory and all (!) files and directories in your home directory. You most likely don't want that.
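The exercise commands are not repeated in this appendix; they were presumably of this form (my reconstruction, for illustration only):

$ rm -r *~     # removes only names ending in a tilde
$ rm -r * ~    # the stray space makes this remove everything here AND in your home directory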

Answer to Exercise E3.8. Working with less

Searching for "search" or "backward" will show that the ? key, followed by a search pattern, will search backwards. Note that on most keyboards ? is conveniently located on the same key as /.

Answer to Exercise E3.9. "Seeing a file grow"

Either use the man page or the --help option of tail to find that the option is -f (or --follow in the long form), i.e. tail -f myfile.

Answer to Exercise E3.10. Untar-ing an archive

Copying and extracting the file goes like this (assuming you are in the LinuxCourse directory):

$ cp /tmp/exercises_linux_course.tar.gz .
$ tar -xzf exercises_linux_course.tar.gz
$ ls -p exercises_linux_course/

In the last step the -p option was added to explicitly show that exercise_data is a directory. Likewise, ls -l will show the same because of the d in the first permission column:

$ ls -l
drwxr-x--- 3 lennart genepi 4096 2010-11-07 23:09 exercises_linux_course

Answer to Exercise E3.14. Disk space usage

Follow the example:

$ du -sh ~/*

Answer to Exercise E3.15. Downloading files to the server

When checking the size of the file, don't forget to add the -l option (or both -l and -h: ls -lh) to your ls command, otherwise the size information won't be printed.

Adding the -c option to wget continues a download, as you can see from reading the output of either of the following commands:

$ wget --help
$ man wget
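A short sketch of resuming an interrupted download (the URL is just a placeholder):

$ wget http://www.example.com/huge_file.tar.gz
  ... download is interrupted ...
$ wget -c http://www.example.com/huge_file.tar.gz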

Answer to Exercise E3.16. Combining files

The cat command simply prints the contents of a file on the screen. Therefore,

$ cat file1 > output.total

will send that output to the output.total file. To add the contents of the other files you need to use >>, because > will automatically overwrite the output file if it exists, whereas >> appends to the end of the output file.

$ cat file2 >> output.total
$ cat file3 >> output.total

To check whether the three files were merged correctly you can try several things. You could quickly browse through output.total with less or more (although that isn't efficient for large files). Or you can count the number of lines for each file and see if they add up (wc -l).
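For example, using the line counts (wc -l prints a total line for the three input files, which should equal the count for output.total):

$ wc -l file1 file2 file3
$ wc -l output.total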

Answer to Exercise E3.17. Combining input and output redirection

Input redirection can simply be combined with output redirection:

$ R --quiet < rinput.R > Routput
$ cat Routput
> print("Hello, you are now in R")
[1] "Hello, you are now in R"
> getwd()
[1] "/tmp"
> 1+1
[1] 2
> 10:1
 [1] 10  9  8  7  6  5  4  3  2  1
>


The cat command is only used to check the contents of the Routput file.

Answer to Exercise E3.18. Using the output of one command as input for another

The command ps -u your_username lists all your processes, one on each line. To count the number of lines in the output use wc -l. Combining these we get: ps -u your_username | wc -l. Don't forget that the output of ps has a header, so you have to subtract 1 from the number you got to get the number of processes!

Use du to find the disk usage of each of the files and directories in your home directory. The -h option of the sort command allows you to sort the output of the du -h command. Without the -s (summarise) option du returns the disk usage of each individual file in each of the subdirectories.

$ cd $ du -sh * | sort -h

Notice that the output of w has a two-line header. So

$ w | wc -l

gives you the number of logged-in users plus two. In order to get the number of unique users the uniq command can be used. Note that uniq only removes duplicate entries if they are on adjacent lines, so it is wise to use the sort command first. However, since every line is slightly different (for example, each connection gets a unique entry in the TTY column), uniq will not find any duplicate lines. So, we must select only the first column (the one with the user names) and sort that before applying uniq:

$ w | cut -f1 -d" " | sort | uniq | wc -l

Note that we still have to subtract 2 from the output of the line count.

Answer to Exercise E4.1. Converting files from Windows format to Linux format

The command line option to write to a new file is -n. You could have found this out by either of the following commands:

$ dos2unix --help
$ man dos2unix

Note the remark about the order of the option and the two file names! First you specify -n, then the input file and then the output file.

The following commands run the file command on a text file, convert it from Linux to DOS format and test the new file:

$ file file_in_linux_format
file_in_linux_format: ASCII text
$ unix2dos -n file_in_linux_format file_in_dos_format
unix2dos: converting file file_in_linux_format to file file_in_dos_format in DOS format ...
$ file file_in_dos_format
file_in_dos_format: ASCII text, with CRLF line terminators

Answer to Exercise E4.2. Searching for a given text in a file

If the phenotype file is called file.phe and the individual ID we are looking for is 1234, the command

$ grep 1234 file.phe

will show us the line if it is present. Be careful! This command will also show lines containing 12341, 12345, a1234, etc. To make sure you only get the person with ID 1234 add the -w option.
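For example, to match the ID only as a whole word:

$ grep -w 1234 file.phe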

$ cd ~/LinuxCourse/exercises_linux_course

The option for recursive searching is -r (use man grep or grep --help to find this). Therefore, the command

$ grep -r trait *


will look in all files and directories (starting in your present working directory) for the text "trait".

To look for single words only, use the -w option:

$ grep -rw trait *

The options could also have been written separately: -r -w, but -rw is shorter.

A case-insensitive search is done with -i. The man-page will tell you this, as will grep --help. A nice way to quickly find this is using grep on grep itself:

$ grep --help | grep case
  -i, --ignore-case         ignore case distinctions

Here, we looked for the word “case” in the output of grep --help.

This is how to add colour to your life with grep:

$ grep -ri --color=auto trait *

Answer to Exercise E4.3. Using sed for search-replace operations

If you followed Exercise E3.10 to the letter, the files should be here:

$ cd ~/LinuxCourse/exercises_linux_course/Exercise_sed/
$ ls
file.csv
$ more file.csv
# field1,field2,field3,field4
11,12,13,14
21,22,23,24
31,32,33,34
41,42,43,44
51,52,53,54
61,62,63,64

This replaces all commas with tabs:

$ sed 's/,/\t/g' file.csv
# field1  field2  field3  field4
11  12  13  14
21  22  23  24
31  32  33  34
41  42  43  44
51  52  53  54
61  62  63  64

To send the output to a file use output redirection:

$ sed 's/,/\t/g' file.csv > file.tsv
$ more file.tsv
# field1  field2  field3  field4
11  12  13  14
21  22  23  24
31  32  33  34
41  42  43  44
51  52  53  54
61  62  63  64

Editing the original file directly is easy with sed’s -i option:

$ sed -i 's/\t/;/g' file.tsv
$ more file.tsv
# field1;field2;field3;field4
11;12;13;14
21;22;23;24
31;32;33;34
41;42;43;44
51;52;53;54
61;62;63;64

Answer to Exercise E4.4. Creating a phenotype file from .ped data

The files are located in the directory Exercise_ped2phe:

$ cd ~/LinuxCourse/exercises_linux_course/Exercise_ped2phe


Different ways to show the first six columns of the first ten lines:

1 $ cut -f 1-6 -d " " chr.ped | head
2 $ head chr.ped | cut -f 1-6 -d " "
3 $ gawk '{print $1, $2, $3, $4, $5, $6}' chr.ped | head
4 $ head chr.ped | gawk '{print $1, $2, $3, $4, $5, $6}'

The commands that start with head are more efficient when working with files that have many lines: the first ten lines are selected from the file and subsequently piped to the other command. In the commands on the odd lines of the previous output, the requested columns are first selected for all lines and only then are the first ten of those printed on the screen.

For the phenotype file we only need columns two, five and six. The first two lines in the output below show different ways of creating the .phe file. The last line shows how to add the header as well.

$ cut -f 2,5,6 -d " " chr.ped > chr.phe
$ gawk '{print $2, $5, $6}' chr.ped > chr.phe
$ gawk 'BEGIN {print "id sex bt1"}{print $2, $5, $6}' chr.ped > chr.phe

And, finally, the one-liner for the complete .phe file:

$ gawk 'BEGIN {print "id sex bt1"}{if($5==2) sex=0; if($5==1) sex=1; print $2, sex, $6}' chr.ped > chr.phe

For better legibility the command can be split into several lines. The > signs at the beginning of the lines are added by the shell and should not be typed by you. Here they don’t indicate output redirection to a file (except, of course, the second > on line 8, which we do have to type ourselves).

1 $ gawk '
2 > BEGIN {print "id sex bt1"}
3 > {
4 >   if($5==2) sex=0;
5 >   if($5==1) sex=1;
6 >   print $2, sex, $6
7 > }
8 > ' chr.ped > chr.phe

The BEGIN{} section is executed once, before any other GAWK commands. It prints the header. Then the main body of the GAWK command starts (lines 3–7); it is repeated for each line in the .ped file. The body consists of two if-clauses that set the variable sex depending on the value of column five. Subsequently, in line 6 a print statement prints the data we want.

If this is a regularly recurring task it is of course better to save these lines in a GAWK script, say ped2phe.awk, and run it repeatedly like this:

$ gawk -f ped2phe.awk chr.ped > chr.phe
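The file ped2phe.awk would then simply contain the GAWK program from the one-liner above, shown here for completeness:

BEGIN {print "id sex bt1"}
{
    if($5==2) sex=0;
    if($5==1) sex=1;
    print $2, sex, $6
}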

Answer to Exercise E4.5. Filtering output using gawk

The directory contains two files, one of 163 bytes and one of 710 KB:

$ ls -lh
total 716K
-rw-r----- 1 lennart lennart  163 2010-10-13 18:35 get_sign_snps.awk
-rw-r----- 1 lennart lennart 710K 2010-10-13 18:35 screen.1.out

Comparing the two GAWK one-liners, we see that in the first case the word "trait" can occur anywhere on the line; in the second we specify that the line should only be printed if the second field contains "trait".

These are the steps that schematically describe the filter. Because of the format of the data I assumed that a tool that allows working with fields would be used:

• The traits are listed on lines that begin with Testing trait:.
• SNP markers appear on lines that begin with Testing marker:.
• If white space (i.e. one or more spaces or tabs) is used to separate the fields, then the lines that contain the information on p and F values that we are interested in have a 1 in the first field (i.e. the field named "Allele").
• The F values are in field 6 and the p-values are found in field 7 (still assuming white space as field separator).
• However, if there is no p-value (as in line 15 of the output given in the exercise), then field 7 is not the p-value but a (.

A sketch of a GAWK filter along these lines is shown below.
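Based on these observations, a filter could look like the following sketch. Note that this is only my illustrative reconstruction, not necessarily the contents of the real get_sign_snps.awk: the field positions follow the list above, and the 0.05 significance threshold and output format are assumptions.

# Sketch of a filter in the spirit of get_sign_snps.awk
# (hypothetical reconstruction; the real script may differ).
/^Testing trait:/  { print "trait:", $3 }   # report the current trait
/^Testing marker:/ { marker = $3 }          # remember the current SNP
$1 == 1 && $7 != "(" {                      # only rows that do have a p-value
    if ($7 < 0.05)                          # assumed significance threshold
        print marker, "F =", $6, "p =", $7
}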


The command man gawk or gawk --help shows that the -f option in combination with the script name is used to run a GAWK script:

$ gawk -f get_sign_snps.awk screen.1.out > qt_output

To find the number of lines in the output file use the word count program again:

$ wc -l qt_output
243 qt_output

Extracting the names of the traits becomes easy once you realise that they appear on a line with the word “trait” in it. So grep comes to the rescue:

$ grep trait qt_output
trait: NRUWE
trait: ERUWE
trait: ORUWE
trait: ARUWE
trait: CRUWE

Of course you don’t need to count the number of traits by hand! We’ve got (at least) two ways to do that:

$ grep trait qt_output | wc -l
5
$ grep -c trait qt_output
5

Answer to Exercise E5.1. A simple script

The script should look like this:

#! /bin/bash

echo "Hello World"

The executable bit is set like this: chmod +x myscript.sh. To check if the command did its job, run ls -l and look for the x in the permissions.

It should be there for (at least) the user. In the example below the x permission is set for both the user and the group.

-rwxr-xr-- 1 lennart genepi 106 2010-11-02 10:28 myscript.sh

Answer to Exercise E5.2. Using variables

Variables are created like this:

variable_name="Some text"

Note that there are no spaces around the equal sign. To use a variable use the construction:

${variable_name}

So our script will look like this:

#! /bin/bash

# Create a variable
greeting="Good Morning"

echo "${greeting} World!"

Answer to Exercise E5.3. Using command line arguments in your script

The following script prints the first three arguments in reverse order.

#! /bin/bash

# This script prints the first three arguments
# in reverse order.
echo "${3} ${2} ${1}"

The modified script that takes the first argument as greeting looks like this:


1 #! /bin/bash
2
3 greeting=${1}
4
5 echo "${greeting} World!"

Of course the step in line 3 can be omitted, in which case ${1} goes directly into line 5.

The simplest way to use text that contains a space as one variable is to enclose it in quotes. If the above script were called args.sh, then this is the expected output:

$ ./args.sh Good evening
Good World!
$ ./args.sh "Good evening"
Good evening World!

Answer to Exercise E5.4. For-loops

There are two points to note here. First, the semi-colon before do: it's easily forgotten. Second, it's easiest to construct the file name in a variable and use that together with the fictitious analyse command.

1 #! /bin/bash
2 for number in {22..1..-1}; do
3     filename="chr${number}.dat"
4     analyse ${filename}
5 done

Of course lines 3 and 4 can be written as one as well:

analyse "chr${number}.dat"

Adding leading zeroes is easy:

for number in {22..01..-1}; do

Answer to Exercise E5.5. Parallel for-loops

In order to start a process in the background you need to add an & after the command. So an example script that would start each job in parallel would look like this:

#! /bin/bash
# This script runs all tasks at the same time

for i in {01..22}; do
    my_analysis_script.sh ${i} &
done

As the question already hinted, this is not an ideal way to run things in parallel if you have more tasks than CPUs. In the ideal case you want each CPU to be busy with only one task. So if you have 10 CPUs you need to split the tasks up in such a way that first the first 10 are run, then the second 10, and finally the remaining two. If you know that you always have 22 tasks you could simply write three for-loops with a wait command after each loop, as shown in the sketch below. The wait command was not discussed previously. It waits for all jobs that were started earlier in the script to end before continuing with the next command.
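A sketch of such a script for 10 CPUs, using the batch sizes mentioned above:

#! /bin/bash
# Run the 22 tasks in batches of at most 10 parallel jobs.
for i in {01..10}; do
    my_analysis_script.sh ${i} &
done
wait    # block until the first batch has finished

for i in {11..20}; do
    my_analysis_script.sh ${i} &
done
wait

for i in {21..22}; do
    my_analysis_script.sh ${i} &
done
wait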

Answer to Exercise E5.6. if-clauses and tests

First we have to check if a command line argument was given. Command line arguments are stored in ${1} etc. (cf. Exercise E5.3). From Table 5.1 we see that we can use -n to test whether this variable is a non-empty string.

#! /bin/bash
if [ -n "${1}" ]; then
    echo "The command line argument was ${1}"
else
    echo "Goodbye!"
fi

Checking whether a directory exists can be done with the -d test. To negate that test (because if it is not there we must take action) use !.


#! /bin/bash

# First create a variable that contains today's date in
# an acceptable form for a directory name.
today=$(date +%F)

if [ ! -d ${today} ]; then
    echo "A directory with today's date does not exist."
    echo "Creating it..."
    mkdir ${today}
fi

Answer to Exercise E6.1. Working with Sun Grid Engine

The test script that will be submitted to the queue will look like this:

1 #! /bin/bash
2 for i in $(seq 1 5); do
3     whoami
4     date +%T
5     sleep 10
6 done
7 date
8 echo "------"

If you want the user name and the time to appear on a single line, lines 3 and 4 can be replaced with

echo "$(whoami): $(date +%T)"

Note that $() needs to be used, otherwise it would print the literal commands instead of their output.

If the script is called simplejob.sh, then it can be submitted with this command:

$ qsub -cwd simplejob.sh

To check the status of your jobs use the qstat command. The usual reason for a job not being run immediately is that the maximum number of active slots has filled up (at present a maximum of 7 jobs is allowed to run at the same time). Use

$ qstat -f -u \*

to show all jobs in all queues for all users. This will also give you an indication of the priority of your job compared to others. The job at the top of the waiting list will be run the next time a slot becomes available. SGE is configured for fair scheduling; this means that it tries to allot each user the same amount of computation time. If, for example, you have several jobs running and several more waiting to be run, then someone else's job will be scheduled with a higher priority (assuming this person didn't overuse their fair share of computation time). This makes sure that one person with 100 jobs will not block others from running their own analyses.

When the script has finished, check the files that end in .e5678 and .o5678, where 5678 is to be replaced with the job ID of your own job.

Answer to Exercise E6.2. SGE, R and command line arguments

The R script looks a lot like the example. Be sure to convert the arguments to integers before using them. The if-clause in lines 4–7 gives an error message if the user forgets one or more arguments. It is good practice to build this kind of sanity check into your code. It helps you if you run the same script next year and have forgotten how exactly it works.

1 cat("Start of the program\n")
2
3 arguments <- commandArgs(TRUE)
4 if (length(arguments) < 2) {
5     cat("Error: please give two command line arguments\n")
6     q()
7 }
8
9 start <- as.integer(arguments[1])


10 end <- as.integer(arguments[2])
11
12 for (i in start:end){
13     cat(i, "^2 = ", i^2, "\n", sep="")
14 }
15
16 cat("End of program\n")

Running this program gives the following results:

$ R --slave --vanilla -q -f R_loop_script.R --args 1 5
Start of the program
1^2 = 1
2^2 = 4
3^2 = 9
4^2 = 16
5^2 = 25
End of program

This is the modified job script:

1 #! /bin/bash
2 # This is an example of a job submission script
3
4 # The following are options for qsub. Don't
5 # remove the # in front of them.
6 #$ -S /bin/bash
7 #$ -j y
8 #$ -cwd
9
10 # Here comes the rest of your script that
11 # actually does something. In this case it runs R.
12 message="This script needs two arguments"
13
14 if [ -z ${1} ]; then
15     echo $message
16     exit
17 fi
18 if [ -z ${2} ]; then

19     echo $message
20     exit
21 fi
22
23 R --slave --vanilla -q -f R_loop_script.R --args ${1} ${2}

In line 12 a variable is created with the error message. Lines 14–21 test the presence of the arguments, which are subsequently sent to the R script in line 23. Submission to the queue goes like this:

$ qsub jobscript.sh 1 10

Appendix B

Reference Card of Basic Linux Commands

The following reference card of basic Linux commands can be printed for easy reference.


Reference Card of Linux Commands

©2010-2016 Lennart C. Karssen, [email protected]. Based on the vi Reference Card by Donald J. Bindner.

Logging out
log out of the shell: exit, or: logout, or: Ctrl-d

Getting help
manual page on command: man command
show basic help and options for command: command --help

Keyboard shortcuts
The most important key is TAB. It is used to complete commands and file names. If pressed twice it will list all available completions.
cancel current command: Ctrl-c
go to beginning of line: Ctrl-a
go to end of line: Ctrl-e
search history for a recent command: Ctrl-r
undo your last change on the command line (1): Ctrl-/
delete to end of line and copy to buffer (cut): Ctrl-k
delete to beginning of line and copy to buffer (cut): Ctrl-u
paste (or: yank) from buffer: Ctrl-y
suspend current job: Ctrl-z
insert last word from previous command: Alt-.

Wildcards and shell expansion
Wildcards are used to select more than one file (for example for copying or deletion). Be careful with them!
select one character: ?
select one or more characters: *
select one character from a range (4): []
select from a list of characters: {}
For example:
list all files ending in .pdf in this directory: ls *.pdf
copy all .R files to another directory: cp *.R mydir
find all files with chr2 in their name in ~: find ~ -iname "*chr2*"
move file_1.txt – file_9.txt to mydir (but also file_a.txt, file_B.txt, etc.): mv file_?.txt mydir
move file_1.txt – file_5.txt to mydir (but not file_a.txt, file_6.txt, etc.): mv file_[1-5].txt mydir
show info of all files with .dose or .prob extensions: ls -lh *.{dose,prob}

File and directory management
shortcut for user's home directory (/home/user/): ~
the current directory: .
the parent directory: ..
print present working directory: pwd
change directory to mydir: cd mydir
change to your home directory: cd

Listing files
list files and directories: ls
list files and directories with a / after each directory: ls -p
detailed (long) list with file sizes in human readable format: ls -lh
list hidden files and directories as well: ls -a
list files in reverse order of creation time (i.e. newest last): ls -rt

Copy, (re-)move, rename
create a directory mydir: mkdir mydir
remove an empty directory mydir: rmdir mydir
copy file1 to file2 (2): cp file1 file2
copy a directory recursively to dir2: cp -r dir1 dir2
copy a file to a directory on a remote host: scp myfile user@host:~/dir/
copy a directory recursively from a remote host to the current directory: scp -r user@host:~/mydir .
a smarter way of copying a directory to a remote host: rsync -azP dir1 user@host:~/dir2/
remove a file (3): rm myfile
remove a directory recursively (including all its files): rm -r mydir
move a file to another directory: mv myfile mydir
rename a file: mv file1 file2

Display contents of a file
show full contents of myfile: cat myfile
show contents screen by screen: more myfile
idem, but with search and scrolling: less myfile
(Use q to exit more and less.)
show first 10 lines of myfile: head myfile
show first 4 lines of myfile: head -n 4 myfile
show last 10 lines of myfile: tail myfile
show last 15 lines of myfile: tail -n 15 myfile
show last 10 lines of myfile and keep following it as it grows: tail -f myfile

Finding files
quick way of finding myfile (not always up to date): locate myfile
idem, but file name case insensitive: locate -i myfile
find a file in a certain directory startdir: find startdir -name "myfile"
idem, but case insensitive: find startdir -iname "myfile"

File and directory permissions
File and directory permissions can be set to read (r), write (w) and executable (x) for the following: owner, group and others. For a directory x means it is accessible. Use ls -l to show these permissions.
safe permissions for a file: rw-r-----
safe permissions for a directory: rwxr-x---
set executable bit on myscript.sh (owner and group): chmod +x myscript.sh
make a file readable to all users: chmod a+r myfile
recursively remove read permission from mydir for 'others': chmod -R o-r mydir
remove write permission for 'group' from myfile: chmod g-w myfile

System info
display users that are logged in: w
get information about user: finger user
show how long the server has been up and running: uptime
show your total disk usage (and how much space is left): quota -s
show disk usage in a certain directory dir: du -s dir
idem, but in human readable format: du -sh dir

Compression and archiving
compress a single file myfile (original file is removed): gzip myfile
decompress a single file (.gz file is removed): gunzip myfile.gz
create a compressed archive of mydir: tar -czf myfile.tar.gz mydir
extract all files from a .tar.gz file to .: tar -xzf myfile.tar.gz
idem, but show the file names that are extracted: tar -xzvf myfile.tar.gz
create a bzip2-compressed archive of mydir: tar -cjf myfile.tar.bz2 mydir
extract all files from a .tar.bz2 file to .: tar -xjf myfile.tar.bz2

Text manipulation
replace all occurrences of old with new in myfile: sed -i "s/old/new/g" myfile
print line 4 from myfile: sed -n "4p" myfile
delete line 25 from myfile: sed -i '25d' myfile
delete lines 2 to 5 from myfile: sed -i '2,5d' myfile
print column 3 from myfile using commas as column delimiter: cut -f 3 -d "," myfile
print columns 1 and 3 to 7 from myfile using tab as column delimiter: cut -f 1,3-7 myfile
print 1st, 2nd and last column from myfile (columns separated by white space): gawk '{print $1, $2, $NF}' myfile
idem, but columns are separated by a comma: gawk -F"," '{print $1, $2, $NF}' myfile
print all columns except the 3rd and 5th from myfile: gawk '{$3=$5=""; print $0}' myfile
print a line from myfile if the first column contains text: gawk '$1 ~ /text/ {print $0}' myfile
change column 3 to NA in myfile: gawk '{$3="NA"; print $0}' myfile
idem, but don't change the header line: gawk '{if (NR!=1){$3="NA"}; print $0}' myfile
select the field on row 2, column 3 from myfile: gawk '{if (NR==2) print $3}' myfile
sort lines in a file (by default uses column 1, separated by space): sort myfile
idem, but assume data is numeric, not character: sort -n myfile
idem, but assume human readable data (1k, 2M, etc.): sort -h myfile
idem, but assume numbers are in scientific notation (e.g. 1.2e-3): sort -g myfile
sort myfile based on column 2: sort -k2 myfile
idem, and use a comma to separate the columns: sort -t"," -k2 myfile
remove adjacent (!) duplicate lines from myfile: uniq myfile

Process management
display processes in your current terminal: ps
display all processes by all users: ps -ef
display all processes by user user: ps -fu user
display all running processes dynamically: top
idem: htop
kill process with id pid: kill pid
kill all processes named proc: killall proc
suspend current job: Ctrl-z
resume a suspended job in the background: bg
list jobs: jobs
bring the most recent job to the foreground: fg
bring job n to the foreground: fg n
immediately start command in the background: command &

Output redirection
send screen output of command to myfile (which will be overwritten): command > myfile
idem, but append to instead of overwrite myfile: command >> myfile
use output of command1 as input for command2: command1 | command2

Differences between files
show differences between two text files: diff file1 file2
idem, but with more context: diff -u file1 file2
for binary files: compare md5sums: md5sum file1 file2

Searching for text
print lines from myfile containing pattern: grep pattern myfile
idem, but faster (assumes search text is fixed): grep -F text myfile
idem, but pattern is treated as a word (e.g. with surrounding white space): grep -w pattern myfile
idem, but pattern is treated case insensitively: grep -i pattern myfile
idem, and prefix each line with its line number: grep -n pattern myfile
idem, but get search patterns from file search.txt: grep -f search.txt myfile
print lines containing pattern and 3 lines after: grep -A 3 pattern myfile
print lines containing pattern and 4 lines before: grep -B 4 pattern myfile
print lines containing pattern and 5 lines of context: grep -C 5 pattern myfile
print lines without pattern in myfile: grep -v pattern myfile
search recursively for pattern in mydir: grep -r pattern mydir
count the number of lines on which pattern occurs: grep -c pattern myfile

Miscellaneous
get more information about the contents of a file: file myfile
generate a sequence of numbers from 1 to 22: {1..22}, or: seq 1 22
idem, but with steps of 4: {1..22..4}, or: seq 1 4 22
idem, but counting down with a step of -1: {22..1..-1}, or: seq 22 -1 1
show current date and time: date
show only today's date: date +%F
show only current time: date +%T
convert a date to another format: date -d "01 March 96" +%F
convert a time in another time zone to the server's time zone: date -d "11:00 pm PDT" +%T
change your password: passwd
create an alias for a command (5): alias short='long command', for example: alias lsl='ls -lhart'

Bash scripting
Bash scripts should be made executable: chmod +x myscript.sh
All Bash scripts begin with: #!/bin/bash
fill variable myvar with some text: myvar="some text"
fill variable myvar with the output of command: myvar=$(command)
use the variable myvar: ${myvar}

Checking the number of command line arguments in a script:

if [ ${#} -lt 3 ]; then
    echo "The number of arguments should be at least 3"
    exit
fi

Cycling through all command line arguments in a script:

for arg in ${*}; do
    echo "Argument was: ${arg}"
done

Example of a for-loop: run myscript.sh for each chromosome:

for chr in {1..22}; do
    myscript.sh ${chr}
done

Since file names are space separated, looping over files is easy as well:

for fl in *.csv; do
    do_something
done

Example of a Bash if-clause that compares two strings:

if [ "${var}" = "yes" ]; then
    echo "The variable var is equal to yes"
else
    echo "var wasn't yes"
fi

Other comparison operators:
strings: equal: =
strings: not equal: !=
string s1 is not empty: -n s1
string s1 is empty: -z s1
numbers: equal: -eq
numbers: not equal: -ne
numbers: less than: -lt
numbers: greater than: -gt

Testing files and directories in an if-clause:
check for directory existence: -d directory
check for file existence: -e file
check for regular file existence (not a directory): -f file
check if a file is readable: -r file
check if a file is writable: -w file
check if a file is executable: -x file

For example (the ! in line 2 is a logical NOT, so the test is whether some_script.sh is not executable):

1 filename=some_script.sh
2 if [ ! -x ${filename} ]; then
3     echo "${filename} is not executable. Fixing..."
4     chmod +x ${filename}
5 fi

Arrays:
initialise an array one element at a time: arr[index]=value
initialise an array: arr=(elm0 elm1 elm2)
access the array element at index (first element has index 0!): ${arr[index]}
access all elements: ${arr[@]}, or: ${arr[*]}
get array size/length (nr. of elements): ${#arr[@]}
get length of array element 2: ${#arr[2]}
append elements to an array: arr=(${arr[@]} new1 new2)

Using an array in a loop:

for i in ${!arr[@]}; do
    echo "Element ${i} = ${arr[${i}]}"
done

The SGE batch queue system
list your jobs in the queue(s): qstat
list all jobs by all users in all queues: qstat -f -u \*
get info on a waiting or running job with jobID 1234: qstat -j 1234
get info on a finished job with jobID 1234: qacct -j 1234
submit a job script to the queues: qsub myscript.sh
submit a binary job to the queue system: qsub -cwd -b y myprogram
submit a job script and give it a name: qsub -N jobname myscript.sh
submit a job that should wait until a job with name myjob has finished: qsub -hold_jid myjob myscript.sh
delete job with jobID 1234 from the queues: qdel 1234

The qsub command is used to send jobs to the queue. Without further options it will look for your script in your home directory and also put its output there. These are the most important options for the qsub command:
use current working directory for input and output files: -cwd
join the normal output and the errors in one file: -j y
the submitted command is binary, not a script: -b y
send e-mail at beginning and end of a job: -m b e
set e-mail address for the -m option: -M [email protected]
array job for 22 subjobs (use variable SGE_TASK_ID in your script): -t 1-22
array job with a different step size, e.g. 5: -t 100-200:5
array job, but start max 2 jobs at the same time: -t 1-300 -tc 2

When submitting a shell script the aforementioned options can be set in the script so you don't have to retype them. Simply put them in the script on a line that starts with #$. For example:

#!/bin/bash
# This is an example of a job submission script

# The following are options for qsub.
#$ -S /bin/bash
#$ -j y
#$ -cwd

# Here comes the rest of your script that
# actually does something.

Version control with Git
set the user name Git will use in its log messages (6): git config user.name "First Lastname"
set the e-mail address Git will use in its log messages (6): git config user.email "[email protected]"
show the Git settings for the current repository (6): git config --list
initialise a repository in the current directory: git init
initialise a repository/directory with name mydir: git init mydir
show a list of staged and untracked files: git status
tell Git to track myfile (i.e. stage myfile): git add myfile
commit the staged files to the repository: git commit
show all log messages: git log
show the last 3 log messages: git log -3
show existing branches (the current one is indicated by a *): git branch
create a new branch called mybranch: git branch mybranch
do a checkout of branch mybranch: git checkout mybranch
delete a branch called mybranch: git branch -d mybranch

Footnotes:
(1) This only undoes changes on the command line itself (before you hit the Enter key), not the results of the commands that you type.
(2) Note: cp overwrites the destination file (if it exists) without notice. Use the -i option to get notified.
(3) rm does not ask for confirmation. Use the -i option if you want to be asked for each file.
(4) Note, only one character is substituted here, so [0-9] will work, but [1-22] will not.
(5) Save aliases in your ~/.bash_profile file so they will be loaded next time you log in.
(6) This applies to your current repository. For the default of new repositories add the --global option.

Appendix C

List of acronyms

BSD Berkeley Software Distribution, originally a Unix-like OS, currently used to denote the whole family including its descendants.

CLI Command Line Interface

CPU Central Processing Unit, a.k.a. the processor of your computer.

ErasmusMC Erasmus University Medical Centre, the place where the foundations for this book were laid.

GIYF Google Is Your Friend

GNU GNU’s Not Unix, a recursive acronym used by the GNU project (http://www.gnu.org).

GPL GNU General Public License, a so-called copyleft licence that gives the user liberal rights with respect to the use and distribution of software.

GUI Graphical User Interface

NIHES Netherlands Institute for Health Sciences, http://www.nihes.nl.

OS Operating System, the main pieces of software that make a computer run.

pwd present working directory

SGE Sun Grid Engine, a batch queue system used to schedule compute-intensive jobs on one or more servers.

TIMTOWTDI There Is More Than One Way To Do It


Bibliography

[1] The Linux Foundation. The Story of Linux: Commemorating 20 Years of the Linux Operating System. Aug. 2011. url: http://youtu.be/5ocq6_3-nEw.
[2] L. Torvalds and D. Diamond. Gewoon Voor De Fun: Het genie achter Linux. Karakter Uitgevers BV, 2001. isbn: 90-6112-831-5.
[3] L. Torvalds and D. Diamond. Just for Fun: The Story of an Accidental Revolutionary. HarperBusiness, 2002. isbn: 978-0-066-62073-2. url: http://books.google.com/books?id=6zSWd8Ou8BAC.
[4] W3Techs. Usage statistics and market share of Unix for websites. 2012. url: http://w3techs.com/technologies/details/os-unix/all/all.
[5] Top 500 Supercomputer Sites. Top 500 supercomputing operating system share. June 2012. url: http://i.top500.org/stats/list/38/os.
[6] Yurii S. Aulchenko, Maksim V. Struchalin, and Cornelia M. van Duijn. "ProbABEL package for genome-wide association analysis of imputed data." In: BMC Bioinformatics 11 (2010), p. 134. doi: 10.1186/1471-2105-11-134. url: http://dx.doi.org/10.1186/1471-2105-11-134.
[7] Eric Steven Raymond. The Art of Unix Programming. 2003. url: http://www.faqs.org/docs/artu/index.html.
[8] Keith Bradnam and Ian Korf. UNIX and Perl to the Rescue!: A Field Guide for the Life Sciences (and Other Data-rich Pursuits). Cambridge University Press, 2012. isbn: 9781107000681.
[9] John Loeliger and Matthew McCullough. Version Control with Git. Ed. by Andy Oram. 2nd ed. O'Reilly, 2012. isbn: 978-1-449-31638-9. url: http://shop.oreilly.com/product/0636920022862.do.
[10] Arnold Robbins and Nelson H.F. Beebe. Classic Shell Scripting. O'Reilly, 2005. isbn: 978-0-596-00595-5. url: http://shop.oreilly.com/product/9780596005955.do.


[11] Carl Albing, JP Vossen, and Cameron Newham. Bash Cookbook: Solutions and Examples for Bash Users. 1st ed. O'Reilly, 2007. isbn: 0596526784.
[12] V. Anton Spraul. Think Like a Programmer: An Introduction to Creative Problem Solving. No Starch Press, 2012. isbn: 9781593274245.
[13] Roderick W. Smith. LPIC-1: Linux Professional Institute Certification Study Guide (Exams 101 and 102). Sybex, 2009. isbn: 978-0-470-40483.
[14] Debra Cameron et al. Learning GNU Emacs. 3rd ed. O'Reilly, 2004. isbn: 978-0-596-00648-8.
[15] Mickey Petersen. Mastering Emacs. Sept. 26, 2016. url: https://masteringemacs.org/about.



Colophon

This book was typeset with XeLaTeX using the Fira Sans Book font v4.1 by Mozilla (http://www.carrois.com/fira-4-1/). The Fira Mono font v3.2 was used for monospaced text. BibLaTeX was used to generate the bibliography.