GNU Parallel for Bioinformatics

Total Page:16

File Type:pdf, Size:1020Kb

GNU Parallel for Bioinformatics GNU Parallel for Bioinformatics. Pierre Lindenbaum @yokofakun http://plindenbaum.blogspot.com October 15, 2013 Abstract This document follows Ole Tange's parallel tutorial http://www.gnu.org/software/parallel/parallel_ tutorial.html. The sources of this document are available at https://github.com/lindenb/courses/tree/master/ about.parallel Contents 1 Input Source 3 1.1 A single input source . .3 1.1.1 Input can be read from the command line. .3 1.1.2 The input source can be a file . .3 1.1.3 STDIN (standard input) can be the input source . .3 1.2 Multiple input source . .4 1.2.1 If one of the input sources is too short, its values will wrap . .4 1.2.2 The input sources can be files . .5 1 1.2.3 STDIN can be one of the input sources using '-' . .5 1.2.4 Instead of -a files can be given after '::::' ........................5 1.2.5 ':::' and '::::' canbemixed:..............................6 1.3 Matching arguments from all input sources . .6 1.3.1 With --xapply you can get one argument from each input source . .6 1.3.2 If one of the input sources is too short, its values will wrap . .7 1.4 Changing the argument separator . .7 1.4.1 Changing the argument file separator . .7 1.4.2 Changing the argument delimiter . .7 1.4.3 NULL can be given as 'n0' .................................8 1.4.4 A shorthand for '-d n0' is '-0' ..............................8 2 Building the command line 8 2.0.5 No command means arguments are commands . .8 3 Replacement strings 9 3.1 The 5 replacement strings . .9 3.2 Positional replacement strings . 12 3.3 Input from columns . 12 3.4 Header defined replacement strings . 13 3.5 More than one argument . 13 3.6 Quoting . 13 3.7 Trimming space . 14 4 Controling the output 15 4.1 To see what commands will be run without running them . 15 4.2 To print the command before running them use {verbose . 15 4.3 GNU parallel will postpone the output until the command completes . 17 4.4 To get the output immediately use '--ungroup' ......................... 18 4.4.1 Saving output into files . 20 5 Control the execution. 22 5.1 Number of simultaneous jobs. 22 6 Interactiveness. 22 6.1 Timing . 22 6.2 Progress . 23 6.3 Termination . 25 6.4 Limiting the ressources . 25 7 Remote execution 25 7.1 Sshlogin . 25 7.2 Transferring files . 26 7.3 Working dir . 27 7.4 Avoid overloading sshd . 27 7.5 Ignore hosts that are down . 28 2 7.6 Transfer environment variables and functions . 28 8 {pipe 29 8.1 Chunk size . 29 8.2 Records . 29 8.3 Header . 30 8.4 Shebang . 30 9 References 31 1 Input Source 1.1 A single input source 1.1.1 Input can be read from the command line. Example: determine the file type of a list of bam . 1 $ parallel file ::: samtools − 0.1.18/examples/ ∗ . bam output: 1 s a m t o o l s − 0.1.18/examples/ex1a.bam: gzip compressed data, extra field 2 s a m t o o l s − 0.1.18/examples/ex1.bam: gzip compressed data, extra field 3 s a m t o o l s − 0.1.18/examples/ex1b.bam: gzip compressed data, extra field 4 s a m t o o l s − 0.1.18/examples/ex1f.bam: gzip compressed data, extra field 5 s a m t o o l s − 0.1.18/examples/ex1f −rmduppe.bam: gzip compressed data, extra field 6 s a m t o o l s − 0.1.18/examples/ex1f −rmdupse.bam: gzip compressed data, extra field 7 s a m t o o l s − 0.1.18/examples/ex1 sorted.bam: gzip compressed data, extra field 8 s a m t o o l s − 0.1.18/examples/toy.bam: gzip compressed data, extra field 1.1.2 The input source can be a file Example: determine the file type of a list of bam . 1 $ find samtools − 0.1.18/examples/ −name " ∗ . bam" −type f > listbams .txt 2 $ p a r a l l e l −a listbams.txt file output: 1 s a m t o o l s − 0.1.18/examples/ex1a.bam: gzip compressed data, extra field 2 s a m t o o l s − 0.1.18/examples/ex1.bam: gzip compressed data, extra field 3 s a m t o o l s − 0.1.18/examples/ex1b.bam: gzip compressed data, extra field 4 s a m t o o l s − 0.1.18/examples/ex1f.bam: gzip compressed data, extra field 5 s a m t o o l s − 0.1.18/examples/ex1f −rmduppe.bam: gzip compressed data, extra field 6 s a m t o o l s − 0.1.18/examples/ex1f −rmdupse.bam: gzip compressed data, extra field 7 s a m t o o l s − 0.1.18/examples/ex1 sorted.bam: gzip compressed data, extra field 8 s a m t o o l s − 0.1.18/examples/toy.bam: gzip compressed data, extra field 1.1.3 STDIN (standard input) can be the input source Example: determine the file type of a list of bam . 1 $ find samtools − 0.1.18/examples/ −name " ∗ . bam" −type f j parallel file output: 3 1 s a m t o o l s − 0.1.18/examples/ex1a.bam: gzip compressed data, extra field 2 s a m t o o l s − 0.1.18/examples/ex1.bam: gzip compressed data, extra field 3 s a m t o o l s − 0.1.18/examples/ex1b.bam: gzip compressed data, extra field 4 s a m t o o l s − 0.1.18/examples/ex1f.bam: gzip compressed data, extra field 5 s a m t o o l s − 0.1.18/examples/ex1f −rmduppe.bam: gzip compressed data, extra field 6 s a m t o o l s − 0.1.18/examples/ex1f −rmdupse.bam: gzip compressed data, extra field 7 s a m t o o l s − 0.1.18/examples/ex1 sorted.bam: gzip compressed data, extra field 8 s a m t o o l s − 0.1.18/examples/toy.bam: gzip compressed data, extra field Example: indexing sorted sorted bam files with samtools .: 1 $ f i n d d i r 1 −name " ∗ . bam" j g r e p s o r t e d j n 2 p a r a l l e l −a − 'samtools index ' or , without '-a -' 1 $ f i n d d i r 1 −name " ∗ . bam" j g r e p s o r t e d j n 2 parallel 'samtools index ' Ole Tange: "The '-a -' construct is unnatural to me. It makes sense when you have multiple '-a' but if it is the only one, leave it out. That will also make it easier for people who are used to xargs." 1.2 Multiple input source 1 n example f Print the combinations of two lists of nucleotides g 2 $ p a r a l l e l echo ::: ATGC ::: a t g c output: 1 A a 2 A t 3 A g 4 A c 5 T a 6 T t 7 T g 8 T c 9 G a 10 G t 11 G g 12 G c 13 C a 14 C t 15 C g 16 C c 1.2.1 If one of the input sources is too short, its values will wrap Example: Print the combinations of two lists of nucleotides. The second list is shorter. 1 $ p a r a l l e l echo ::: ATGCN ::: a t output: 1 A a 2 A t 3 T a 4 T t 5 G a 6 G t 7 C a 8 C t 9 N a 10 N t 4 1.2.2 The input sources can be files Example: Print the combinations of two files of nucleotides.. 1 $ echo −e "AnnTnnGnnC" > ATGC. t x t 2 $ echo −e " an nt nngn nc " > a t g c . t x t 3 $ p a r a l l e l −a ATGC. t x t −a a t g c . t x t echo output: 1 A a 2 A t 3 A g 4 A c 5 T a 6 T t 7 T g 8 T c 9 G a 10 G t 11 G g 12 G c 13 C a 14 C t 15 C g 16 C c 1.2.3 STDIN can be one of the input sources using '-' Example: Print the combinations of one file of nucleotides and stdin. 1 $ echo −e " an nt nngn nc " > a t g c . t x t 2 $ echo −e "AnnTnnGnnC" j n 3 p a r a l l e l −a − −a a t g c . t x t echo output: 1 A a 2 A t 3 A g 4 A c 5 T a 6 T t 7 T g 8 T c 9 G a 10 G t 11 G g 12 G c 13 C a 14 C t 15 C g 16 C c 1.2.4 Instead of -a files can be given after '::::' Example: Print the combinations of two files of nucleotides.. 1 $ echo −e "AnnTnnGnnC" > ATGC.
Recommended publications
  • Beginning Portable Shell Scripting from Novice to Professional
    Beginning Portable Shell Scripting From Novice to Professional Peter Seebach 10436fmfinal 1 10/23/08 10:40:24 PM Beginning Portable Shell Scripting: From Novice to Professional Copyright © 2008 by Peter Seebach All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher. ISBN-13 (pbk): 978-1-4302-1043-6 ISBN-10 (pbk): 1-4302-1043-5 ISBN-13 (electronic): 978-1-4302-1044-3 ISBN-10 (electronic): 1-4302-1044-3 Printed and bound in the United States of America 9 8 7 6 5 4 3 2 1 Trademarked names may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, we use the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. Lead Editor: Frank Pohlmann Technical Reviewer: Gary V. Vaughan Editorial Board: Clay Andres, Steve Anglin, Ewan Buckingham, Tony Campbell, Gary Cornell, Jonathan Gennick, Michelle Lowman, Matthew Moodie, Jeffrey Pepper, Frank Pohlmann, Ben Renow-Clarke, Dominic Shakeshaft, Matt Wade, Tom Welsh Project Manager: Richard Dal Porto Copy Editor: Kim Benbow Associate Production Director: Kari Brooks-Copony Production Editor: Katie Stence Compositor: Linda Weidemann, Wolf Creek Press Proofreader: Dan Shaw Indexer: Broccoli Information Management Cover Designer: Kurt Krames Manufacturing Director: Tom Debolski Distributed to the book trade worldwide by Springer-Verlag New York, Inc., 233 Spring Street, 6th Floor, New York, NY 10013.
    [Show full text]
  • A First Course to Openfoam
    Basic Shell Scripting Slides from Wei Feinstein HPC User Services LSU HPC & LON [email protected] September 2018 Outline • Introduction to Linux Shell • Shell Scripting Basics • Variables/Special Characters • Arithmetic Operations • Arrays • Beyond Basic Shell Scripting – Flow Control – Functions • Advanced Text Processing Commands (grep, sed, awk) Basic Shell Scripting 2 Linux System Architecture Basic Shell Scripting 3 Linux Shell What is a Shell ▪ An application running on top of the kernel and provides a command line interface to the system ▪ Process user’s commands, gather input from user and execute programs ▪ Types of shell with varied features o sh o csh o ksh o bash o tcsh Basic Shell Scripting 4 Shell Comparison Software sh csh ksh bash tcsh Programming language y y y y y Shell variables y y y y y Command alias n y y y y Command history n y y y y Filename autocompletion n y* y* y y Command line editing n n y* y y Job control n y y y y *: not by default http://www.cis.rit.edu/class/simg211/unixintro/Shell.html Basic Shell Scripting 5 What can you do with a shell? ▪ Check the current shell ▪ echo $SHELL ▪ List available shells on the system ▪ cat /etc/shells ▪ Change to another shell ▪ csh ▪ Date ▪ date ▪ wget: get online files ▪ wget https://ftp.gnu.org/gnu/gcc/gcc-7.1.0/gcc-7.1.0.tar.gz ▪ Compile and run applications ▪ gcc hello.c –o hello ▪ ./hello ▪ What we need to learn today? o Automation of an entire script of commands! o Use the shell script to run jobs – Write job scripts Basic Shell Scripting 6 Shell Scripting ▪ Script: a program written for a software environment to automate execution of tasks ▪ A series of shell commands put together in a file ▪ When the script is executed, those commands will be executed one line at a time automatically ▪ Shell script is interpreted, not compiled.
    [Show full text]
  • Ten Steps to Linux Survival
    Ten Steps to Linux Survival Bash for Windows People Jim Lehmer 2015 Steps List of Figures 5 -1 Introduction 13 Batteries Not Included .............................. 14 Please, Give (Suggestions) Generously ...................... 15 Why? ........................................ 15 Caveat Administrator ............................... 17 Conventions .................................... 17 How to Get There from Here ........................... 19 Acknowledgments ................................. 20 0 Some History 21 Why Does This Matter? .............................. 23 Panic at the Distro ................................. 25 Get Embed With Me ................................ 26 Cygwin ....................................... 26 1 Come Out of Your Shell 29 bash Built-Ins .................................... 30 Everything You Know is (Almost) Wrong ..................... 32 You’re a Product of Your Environment (Variables) . 35 Who Am I? .................................. 36 Paths (a Part of Any Balanced Shrubbery) .................... 37 Open Your Shell and Interact ........................... 38 Getting Lazy .................................... 39 2 File Under ”Directories” 43 Looking at Files .................................. 44 A Brief Detour Around Parameters ........................ 46 More Poking at Files ............................... 47 Sorting Things Out ................................ 51 Rearranging Deck Chairs ............................. 55 Making Files Disappear .............................. 56 2 touch Me .....................................
    [Show full text]
  • Automating Admin Tasks Using Shell Scripts and Cron Vijay Kumar Adhikari
    AutomatingAutomating adminadmin taskstasks usingusing shellshell scriptsscripts andand croncron VijayVijay KumarKumar AdhikariAdhikari vijayvijay@@kcmkcm..eduedu..npnp HowHow dodo wewe go?go? IntroductionIntroduction toto shellshell scriptsscripts ExampleExample scriptsscripts IntroduceIntroduce conceptsconcepts atat wewe encounterencounter themthem inin examplesexamples IntroductionIntroduction toto croncron tooltool ExamplesExamples ShellShell The “Shell” is a program which provides a basic human-OS interface. Two main ‘flavors’ of Shells: – sh, or bourne shell. It’s derivatives include ksh (korn shell) and now, the most widely used, bash (bourne again shell). – csh or C-shell. Widely used form is the very popular tcsh. – We will be talking about bash today. shsh scriptscript syntaxsyntax The first line of a sh script must (should?) start as follows: #!/bin/sh (shebang, http://en.wikipedia.org/wiki/Shebang ) Simple unix commands and other structures follow. Any unquoted # is treated as the beginning of a comment until end-of-line Environment variables are $EXPANDED “Back-tick” subshells are executed and `expanded` HelloHello WorldWorld scriptscript #!/bin/bash #Prints “Hello World” and exists echo “Hello World” echo “$USER, your current directory is $PWD” echo `ls` exit #Clean way to exit a shell script ---------------------------------------- To run i. sh hello.sh ii. chmod +x hello.sh ./hello.sh VariablesVariables MESSAGE="Hello World“ #no $ SHORT_MESSAGE=hi NUMBER=1 PI=3.142 OTHER_PI="3.142“ MIXED=123abc new_var=$PI echo $OTHER_PI # $ precedes when using the var Notice that there is no space before and after the ‘=‘. VariablesVariables contcont…… #!/bin/bash echo "What is your name?" read USER_NAME # Input from user echo "Hello $USER_NAME" echo "I will create you a file called ${USER_NAME}_file" touch "${USER_NAME}_file" -------------------------------------- Exercise: Write a script that upon invocation shows the time and date and lists all logged-in users.
    [Show full text]
  • Java 11 Single-File Applications
    Java 11 Single-File Applications Replacing bash with a syntax I actually know Steve Corwin Corwin Technology LLC Why use Java instead of shell script? Frustration. # ...working shell script... VALUES=($RESULTS) V500S=${VALUES[0]} V504S=${VALUES[1]} COUNT500S=$(echo "${V500S//[$'\t\r\n ']}") COUNT504S=$(echo "${V504S//[$'\t\r\n ']}") echo "\"$SERVER\", \"$COUNT500S\", \"$COUNT504S\"" Basics of single-file applications JEP 330: https://openjdk.java.net/jeps/330 • Java 11 • Java app where everything is within a single file, including public static void main(String[] args) { • invoke using java or shebang Begin with HelloWorld (where else?) public class HelloWorld { public static void main(String[] args) { System.out.println("Hello, world! I am a plain .java file!"); } } $ /home/scorwin/apps/java/jdk- 11.0.1/bin/java HelloWorld.java • standard main() method • my default is Java 8 Shebang HelloWorldShebang: #!/home/scorwin/apps/java/jdk-11.0.1/bin/java -- source 11 public class HelloWorldShebang { public static void main(String[] args) { System.out.println("Hello, world! I have a shebang but no file extension."); } } $ ./HelloWorldShebang • shebang must include --source option • file must be marked executable • Eclipse doesn’t recognize as Java code Shebang with .java HelloWorldShebang.java: //#!/home/scorwin/apps/java/jdk-11.0.1/bin/java -- source 11 public class HelloWorldShebang { public static void main(String[] args) { System.out.println("Hello, world! I have a shebang and a file extension."); } } • with the shebang line commented out:
    [Show full text]
  • Tour of the Terminal: Using Unix Or Mac OS X Command-Line
    Tour of the Terminal: Using Unix or Mac OS X Command-Line hostabc.princeton.edu% date Mon May 5 09:30:00 EDT 2014 hostabc.princeton.edu% who | wc –l 12 hostabc.princeton.edu% Dawn Koffman Office of Population Research Princeton University May 2014 Tour of the Terminal: Using Unix or Mac OS X Command Line • Introduction • Files • Directories • Commands • Shell Programs • Stream Editor: sed 2 Introduction • Operating Systems • Command-Line Interface • Shell • Unix Philosophy • Command Execution Cycle • Command History 3 Command-Line Interface user operating system computer (human ) (software) (hardware) command- programs kernel line (text (manages interface editors, computing compilers, resources: commands - memory for working - hard-drive cpu with file - time) memory system, point-and- hard-drive many other click (gui) utilites) interface 4 Comparison command-line interface point-and-click interface - may have steeper learning curve, - may be more intuitive, BUT provides constructs that can BUT can also be much more make many tasks very easy human-manual-labor intensive - scales up very well when - often does not scale up well when have lots of: have lots of: data data programs programs tasks to accomplish tasks to accomplish 5 Shell Command-line interface provided by Unix and Mac OS X is called a shell a shell: - prompts user for commands - interprets user commands - passes them onto the rest of the operating system which is hidden from the user How do you access a shell ? - if you have an account on a machine running Unix or Linux , just log in. A default shell will be running. - if you are using a Mac, run the Terminal app.
    [Show full text]
  • Parallel Processing.Pptx
    Compute Grid: Parallel Processing RCS Lunch & Learn Training Series Bob Freeman, PhD Director, Research Technology Operations HBS 8 November, 2017 Overview • Q&A • Introduction • Serial vs parallel • Approaches to Parallelization • Submitting parallel jobs on the compute grid • Parallel tasks • Parallel Code Serial vs Parallel work Serial vs Multicore Approaches Traditionally, software has been written for serial computers • To be run on a single computer having a single Central Processing Unit (CPU) • Problem is broken into a discrete set of instructions • Instructions are executed one after the other • One one instruction can be executed at any moment in time Serial vs Multicore Approaches In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem: • To be run using multiple CPUs • A problem is broken into discrete parts (either by you or the application itself) that can be solved concurrently • Each part is further broken down to a series of instructions • Instructions from each part execute simultaneously on different CPUs or different machines Serial vs Multicore Approaches Many different parallelization approaches, which we won't discuss: Shared memory Distributed memory 6 Hybrid Distributed-Shared memory Parallel Processing… So, we are going to briefly touch on two approaches: • Parallel tasks • Tasks in the background • gnu_parallel • Pleasantly parallelizing • Parallel code • Considerations for parallelizing • Parallel frameworks & examples We will not discuss parallelized frameworks such as Hadoop, Apache Spark, MongoDB, ElasticSearch, etc Parallel Jobs on the Compute Grid… Nota Bene!! • In order to run in parallel, programs (code) must be explicitly programmed to do so. • And you must ask the scheduler to reserve those cores for your program/work to use.
    [Show full text]
  • NRPE: Unable to Read Output
    NRPE - NRPE: Unable To Read Output Article Number: 620 | Rating: 3.3/5 from 7 votes | Last Updated: Mon, Jul 17, 2017 at 12:35 AM Pr o ble m De s c r ipt io n This KB article addresses the following NRPE error: NRPE: Unable To Read Output As s ume d Kno wle dge The following KB article contains an explanation of how NRPE works and may need to be referenced to completely understand the problem and solution that is provided here: NRPE - Agent and Plugin Explained Tr o uble s ho o t ing The Er r o r This error implies that NRPE did not return any character output. Common causes are incorrect plugin paths in the nrpe.cfg file or that the remote host does not have NRPE installed. There are also cases where the wrong interpreter is invoked when running the remote plugin. Rarely, it is caused by trying to run a plugin that requires root privileges. Inc orre c t Plugin Pat hs Log onto the remote host as root and check the plugin paths in /usr/local/nagios/etc/nrpe.cfg. Try to browse to the plugin folder and make sure the plugins are listed. Sometimes when installing from a package repo, the commands in nrpe.cfg will have a path to a distribution specific location. If the nagios-plugins package was installed from source or moved over from another remote host, they me be located in a different directory. The default location for the nagios-plugins can be found at /usr/local/nagios/libexec/.
    [Show full text]
  • Linux Software Installation Session 2
    Linux Software Installation Session 2 Qi Sun Bioinformatics Facility Installation as non-root user • Change installation directory; o Default procedure normally gives “permission denied” error. • Sometimes not possible; o Installation path is hard coded; • Ask system admin; o Service daemons, e.g. mysql; • Docker Python Two versions of Python Python2 is the legacy Python3 is the but still very popular current version. They are not compatible, like two different software sharing the same name. Running software Installing software python 2 python 2 python myscript.py pip install myPackage python 3 python 3 python3 myscript.py pip3 install myPackage Check shebang line, to make sure it points to right python A typical Python software directory ls pyGenomeTracks-2.0 bin lib lib64 Main code Libraries ls pyGenomeTracks-2.0 bin lib lib64 To run the software: export PATH=/programs/pyGenomeTracks-2.0/bin:$PATH export PYTHONPATH=/programs/pyGenomeTracks- 2.0/lib64/python2.7/site-packages:/programs/pyGenomeTracks- 2.0/lib/python2.7/site-packages/ If there is a problem, modify Other library files PYTHONPATH or files in $HOME/.local/lib 1. $PYTHONPATH directories Defined by you; Defined by Python, 2. $HOME/.local/lib but you can change files in directory; 3. Python sys.path directories Defined by Python; Check which python module is being used For example: >>> import numpy $PYTHONPATH frequently causes problem. E.g. python2 and python3 share the same $PYTHONPATH >>> print numpy.__file__ /usr/lib64/python2.7/site- packages/numpy/__init__.pyc >>> print numpy.__version__ 1.14.3 * run these commands in “python” prompt PIP – A tool for installing/managing Python packages • PIP default to use PYPI repository; • PIP vs PIP3 PIP -> python2 PIP3 -> python3 Two ways to run “pip install” as non-root users pip install deepTools \ pip install deepTools --user --install-option="--prefix=mydir" \ --ignore-installed Installed in $HOME/.local/bin Installed in $HOME/.local/lib & lib64 mydir/bin mydir/lib & lib64 * Suitable for personal installation * Suitable for installation for a group e.g.
    [Show full text]
  • Functional Package Management with Guix
    Functional Package Management with Guix Ludovic Courtès Bordeaux, France [email protected] ABSTRACT 1. INTRODUCTION We describe the design and implementation of GNU Guix, a GNU Guix1 is a purely functional package manager for the purely functional package manager designed to support a com- GNU system [20], and in particular GNU/Linux. Pack- plete GNU/Linux distribution. Guix supports transactional age management consists in all the activities that relate upgrades and roll-backs, unprivileged package management, to building packages from source, honoring the build-time per-user profiles, and garbage collection. It builds upon the and run-time dependencies on packages, installing, removing, low-level build and deployment layer of the Nix package man- and upgrading packages in user environments. In addition ager. Guix uses Scheme as its programming interface. In to these standard features, Guix supports transactional up- particular, we devise an embedded domain-specific language grades and roll-backs, unprivileged package management, (EDSL) to describe and compose packages. We demonstrate per-user profiles, and garbage collection. Guix comes with a how it allows us to benefit from the host general-purpose distribution of user-land free software packages. programming language while not compromising on expres- siveness. Second, we show the use of Scheme to write build Guix seeks to empower users in several ways: by offering the programs, leading to a \two-tier" programming system. uncommon features listed above, by providing the tools that allow users to formally correlate a binary package and the Categories and Subject Descriptors \recipes" and source code that led to it|furthering the spirit D.4.5 [Operating Systems]: Reliability; D.4.5 [Operating of the GNU General Public License|, by allowing them to Systems]: System Programs and Utilities; D.1.1 [Software]: customize the distribution, and by lowering the barrier to Applicative (Functional) Programming entry in distribution development.
    [Show full text]
  • Pash: Light-Touch Data-Parallel Shell Processing
    PaSh: Light-touch Data-Parallel Shell Processing Nikos Vasilakis∗ Konstantinos Kallas∗ Konstantinos Mamouras MIT University of Pennsylvania Rice University [email protected] [email protected] [email protected] Achilles Benetopoulos Lazar Cvetković Unaffiliated University of Belgrade [email protected] [email protected] Abstract Parallelizability Parallelizing Runtime Classes §3 Transformations §4.3 Primitives §5 Dataflow This paper presents PaSh, a system for parallelizing POSIX POSIX, GNU §3.1 Regions shell scripts. Given a script, PaSh converts it to a dataflow Annotations §3.2 § 4.1 DFG § 4.4 graph, performs a series of semantics-preserving program §4.2 transformations that expose parallelism, and then converts Seq. Script Par. Script the dataflow graph back into a script—one that adds POSIX constructs to explicitly guide parallelism coupled with PaSh- Fig. 1. PaSh overview. PaSh identifies dataflow regions (§4.1), converts provided Unix-aware runtime primitives for addressing per- them to dataflow graphs (§4.2), applies transformations (§4.3) based onthe parallelizability properties of the commands in these regions (§3.1, §3.2), formance- and correctness-related issues. A lightweight an- and emits a parallel script (§4.4) that uses custom primitives (§5). notation language allows command developers to express key parallelizability properties about their commands. An accompanying parallelizability study of POSIX and GNU • Command developers, responsible for implementing indi- commands—two large and commonly used groups—guides vidual commands such as sort, uniq, and jq. These de- the annotation language and optimized aggregator library velopers usually work in a single programming language, that PaSh uses. PaSh’s extensive evaluation over 44 unmod- leveraging its abstractions to provide parallelism when- ified Unix scripts shows significant speedups (0.89–61.1×, ever possible.
    [Show full text]
  • Mac OS X Server Introduction to Command-Line Administration Version 10.6 Snow Leopard Kkapple Inc
    Mac OS X Server Introduction to Command-Line Administration Version 10.6 Snow Leopard K Apple Inc. Apple Remote Desktop, Finder, and Snow Leopard are © 2009 Apple Inc. All rights reserved. trademarks of Apple Inc. Under the copyright laws, this manual may not AIX is a trademark of IBM Corp., registered in the U.S. be copied, in whole or in part, without the written and other countries, and is being used under license. consent of Apple. The Bluetooth® word mark and logos are registered The Apple logo is a trademark of Apple Inc., registered trademarks owned by Bluetooth SIG, Inc. and any use in the U.S. and other countries. Use of the “keyboard” of such marks by Apple is under license. Apple logo (Option-Shift-K) for commercial purposes without the prior written consent of Apple may This product includes software developed by the constitute trademark infringement and unfair University of California, Berkeley, FreeBSD, Inc., competition in violation of federal and state laws. The NetBSD Foundation, Inc., and their respective contributors. Every effort has been made to ensure that the information in this manual is accurate. Apple is not Java™ and all Java-based trademarks and logos responsible for printing or clerical errors. are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. Apple 1 Infinite Loop PowerPC™ and the PowerPC logo™ are trademarks Cupertino, CA 95014 of International Business Machines Corporation, used 408-996-1010 under license therefrom. www.apple.com UNIX® is a registered trademark of The Open Group. Apple, the Apple logo, AppleScript, FireWire, Keychain, Other company and product names mentioned herein Leopard, Mac, Mac OS, Quartz, Safari, Xcode, Xgrid, and are trademarks of their respective companies.
    [Show full text]