GNU Parallel alternatives NAME parallel_alternatives - Alternatives to GNU parallel DIFFERENCES BETWEEN GNU Parallel AND ALTERNATIVES There are a lot of programs that share functionality with GNU parallel. Some of these are specialized tools, and while GNU parallel can emulate many of them, a specialized tool can be better at a given task. GNU parallel strives to include the best of the general functionality without sacrificing ease of use. parallel has existed since 2002-01-06 and as GNU parallel since 2010. A lot of the alternatives have not had the vitality to survive that long, but have come and gone during that time. GNU parallel is actively maintained with a new release every month since 2010. Most other alternatives are fleeting interests of the developers with irregular releases and only maintained for a few years. SUMMARY LEGEND The following features are in some of the comparable tools: Inputs I1. Arguments can be read from stdin I2. Arguments can be read from a file I3. Arguments can be read from multiple files I4. Arguments can be read from command line I5. Arguments can be read from a table I6. Arguments can be read from the same file using #! (shebang) I7. Line oriented input as default (Quoting of special chars not needed) Manipulation of input M1. Composed command M2. Multiple arguments can fill up an execution line M3. Arguments can be put anywhere in the execution line M4. Multiple arguments can be put anywhere in the execution line M5. Arguments can be replaced with context M6. Input can be treated as the complete command line Outputs O1. Grouping output so output from different jobs do not mix O2. Send stderr (standard error) to stderr (standard error) O3. Send stdout (standard output) to stdout (standard output) O4. Order of output can be same as order of input O5. Stdout only contains stdout (standard output) from the command O6. Stderr only contains stderr (standard error) from the command O7. Buffering on disk O8.
Cleanup of temporary files if killed O9. Test if disk runs full during run O10. Output of a line bigger than 4 GB Execution E1. Running jobs in parallel
E2. List running jobs E3. Finish running jobs, but do not start new jobs E4. Number of running jobs can depend on number of cpus E5. Finish running jobs, but do not start new jobs after first failure E6. Number of running jobs can be adjusted while running E7. Only spawn new jobs if load is less than a limit Remote execution R1. Jobs can be run on remote computers R2. Basefiles can be transferred R3. Argument files can be transferred R4. Result files can be transferred R5. Cleanup of transferred files R6. No config files needed R7. Do not run more than SSHD's MaxStartups can handle R8. Configurable SSH command R9. Retry if connection breaks occasionally Semaphore S1. Possibility to work as a mutex S2. Possibility to work as a counting semaphore Legend - = no x = not applicable ID = yes As not every new version of the programs is tested, the table may be outdated. Please file a bug report if you find errors (See REPORTING BUGS). parallel: I1 I2 I3 I4 I5 I6 I7 M1 M2 M3 M4 M5 M6 O1 O2 O3 O4 O5 O6 O7 O8 O9 O10 E1 E2 E3 E4 E5 E6 E7 R1 R2 R3 R4 R5 R6 R7 R8 R9 S1 S2 DIFFERENCES BETWEEN xargs AND GNU Parallel Summary (see legend above): I1 I2 - - - - - - M2 M3 - - - - O2 O3 - O5 O6 E1 - - - - - - x - - - - - xargs offers some of the same possibilities as GNU parallel.
xargs deals badly with special characters (such as space, \, ' and "). To see the problem try this: touch important_file touch 'not important_file' ls not* | xargs rm mkdir -p "My brother's 12\" records" ls | xargs rmdir touch 'c:\windows\system32\clfs.sys' echo 'c:\windows\system32\clfs.sys' | xargs ls -l
You can specify -0, but many input generators are not optimized for using NUL as separator but are optimized for newline as separator. E.g. awk, ls, echo, tar -v, head (requires using -z), tail (requires using -z), sed (requires using -z), perl (-0 and \0 instead of \n), locate (requires using -0), find (requires using -print0), grep (requires using -z or -Z), sort (requires using -z). GNU parallel's newline separation can be emulated with: cat | xargs -d "\n" -n1 command xargs can run a given number of jobs in parallel, but has no support for running number-of-cpu-cores jobs in parallel. xargs has no support for grouping the output, therefore output may run together, e.g. the first half of a line is from one process and the last half of the line is from another process. The example Parallel grep cannot be done reliably with xargs because of this. To see this in action try: parallel perl -e '\$a=\"1\".\"{}\"x10000000\;print\ \$a,\"\\n\"' \ '>' {} ::: a b c d e f g h # Serial = no mixing = the wanted result # 'tr -s a-z' squeezes repeating letters into a single letter echo a b c d e f g h | xargs -P1 -n1 grep 1 | tr -s a-z # Compare to 8 jobs in parallel parallel -kP8 -n1 grep 1 ::: a b c d e f g h | tr -s a-z echo a b c d e f g h | xargs -P8 -n1 grep 1 | tr -s a-z echo a b c d e f g h | xargs -P8 -n1 grep --line-buffered 1 | \ tr -s a-z
Or try this: slow_seq() { echo Count to "$@" seq "$@" | perl -ne '$|=1; for(split//){ print; select($a,$a,$a,0.100);}' } export -f slow_seq # Serial = no mixing = the wanted result seq 8 | xargs -n1 -P1 -I {} bash -c 'slow_seq {}' # Compare to 8 jobs in parallel seq 8 | parallel -P8 slow_seq {} seq 8 | xargs -n1 -P8 -I {} bash -c 'slow_seq {}' xargs has no support for keeping the order of the output, therefore if running jobs in parallel using xargs the output of the second job cannot be postponed till the first job is done. xargs has no support for running jobs on remote computers. xargs has no support for context replace, so you will have to create the arguments. If you use a replace string in xargs (-I) you can not force xargs to use more than one argument.
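The -I limitation can be seen directly: the replacement string always stands for one whole input line, so two different arguments can never land in the same command. A small sketch, assuming GNU xargs; the commented line shows the corresponding GNU parallel call (assumes GNU parallel is installed):

```shell
# Each {} is replaced by the SAME single input line;
# there is no way to make two {}'s refer to different arguments:
printf 'a\nb\nc\n' | xargs -I{} echo 'pair: {} {}'
# pair: a a
# pair: b b
# pair: c c

# GNU parallel can pair two different arguments:
# printf 'a\nb\nc\n' | parallel -N2 echo 'pair: {1} {2}'
```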
Quoting in xargs works like -q in GNU parallel. This means composed commands and redirection require using bash -c. ls | parallel "wc {} >{}.wc" ls | parallel "echo {}; ls {}|wc"
becomes (assuming you have 8 cores and that none of the filenames contain space, " or '). ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >{}.wc" ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc"
A more extreme example can be found on: https://unix.stackexchange.com/q/405552/ https://www.gnu.org/software/findutils/ DIFFERENCES BETWEEN find -exec AND GNU Parallel Summary (see legend above): - - - x - x - - M2 M3 - - - - - O2 O3 O4 O5 O6 - - - - - - x x find -exec offers some of the same possibilities as GNU parallel. find -exec only works on files. Processing other input (such as hosts or URLs) will require creating these inputs as files. find -exec has no support for running commands in parallel. https://www.gnu.org/software/findutils/ (Last checked: 2019-01) DIFFERENCES BETWEEN make -j AND GNU Parallel Summary (see legend above): - - - - - - O1 O2 O3 - x O6 E1 - - - E5 - - - - - - make -j can run jobs in parallel, but requires a crafted Makefile to do this. That results in extra quoting to get filenames containing newlines to work correctly. make -j computes a dependency graph before running jobs. Jobs run by GNU parallel do not depend on each other. (Very early versions of GNU parallel were coincidentally implemented using make -j). https://www.gnu.org/software/make/ (Last checked: 2019-01) DIFFERENCES BETWEEN ppss AND GNU Parallel Summary (see legend above): I1 I2 - - - - I7 M1 - M3 - - M6
O1 - - x - - E1 E2 ?E3 E4 - - - R1 R2 R3 R4 - - ?R7 ? ? - - ppss is also a tool for running jobs in parallel. The output of ppss is status information and thus not useful for using as input for another command. The output from the jobs is put into files. The argument replace string ($ITEM) cannot be changed. Arguments must be quoted - thus arguments containing special characters (space '"&!*) may cause problems. More than one argument is not supported. Filenames containing newlines are not processed correctly. When reading input from a file null cannot be used as a terminator. ppss needs to read the whole input file before starting any jobs. Output and status information is stored in ppss_dir and thus requires cleanup when completed. If the dir is not removed before running ppss again it may cause nothing to happen as ppss thinks the task is already done. GNU parallel will normally not need cleaning up if running locally and will only need cleaning up if stopped abnormally and running remote (--cleanup may not complete if stopped abnormally). The example Parallel grep would require extra postprocessing if written using ppss. For remote systems PPSS requires 3 steps: config, deploy, and start. GNU parallel only requires one step. EXAMPLES FROM ppss MANUAL Here are the examples from ppss's manual page with the equivalent using GNU parallel: 1$ ./ppss.sh standalone -d /path/to/files -c 'gzip '
1$ find /path/to/files -type f | parallel gzip
2$ ./ppss.sh standalone -d /path/to/files -c 'cp "$ITEM" /destination/dir '
2$ find /path/to/files -type f | parallel cp {} /destination/dir
3$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '
3$ parallel -a list-of-urls.txt wget -q
4$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'
4$ parallel -a list-of-urls.txt wget -q {}
5$ ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir \ -m 192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh \ -n nodes.txt -o /some/output/dir --upload --download; ./ppss deploy -C config.cfg ./ppss start -C config
5$ # parallel does not use configs. If you want a different username put it in nodes.txt: user@hostname find source/dir -type f | parallel --sshloginfile nodes.txt --trc {.}.mp3 lame -a {} -o
{.}.mp3 --preset standard --quiet
6$ ./ppss stop -C config.cfg
6$ killall -TERM parallel
7$ ./ppss pause -C config.cfg
7$ Press: CTRL-Z or killall -SIGTSTP parallel
8$ ./ppss continue -C config.cfg
8$ Enter: fg or killall -SIGCONT parallel
9$ ./ppss.sh status -C config.cfg
9$ killall -SIGUSR2 parallel
https://github.com/louwrentius/PPSS DIFFERENCES BETWEEN pexec AND GNU Parallel Summary (see legend above): I1 I2 - I4 I5 - - M1 - M3 - - M6 O1 O2 O3 - O5 O6 E1 - - E4 - E6 - R1 - - - - R6 - - - S1 - pexec is also a tool for running jobs in parallel. EXAMPLES FROM pexec MANUAL Here are the examples from pexec's info page with the equivalent using GNU parallel: 1$ pexec -o sqrt-%s.dat -p "$(seq 10)" -e NUM -n 4 -c -- \ 'echo "scale=10000;sqrt($NUM)" | bc'
1$ seq 10 | parallel -j4 'echo "scale=10000;sqrt({})" | \ bc > sqrt-{}.dat'
2$ pexec -p "$(ls myfiles*.ext)" -i %s -o %s.sort -- sort
2$ ls myfiles*.ext | parallel sort {} ">{}.sort"
3$ pexec -f image.list -n auto -e B -u star.log -c -- \ 'fistar $B.fits -f 100 -F id,x,y,flux -o $B.star'
3$ parallel -a image.list \ 'fistar {}.fits -f 100 -F id,x,y,flux -o {}.star' 2>star.log
4$ pexec -r *.png -e IMG -c -o - -- \ 'convert $IMG ${IMG%.png}.jpeg ; "echo $IMG: done"'
4$ ls *.png | parallel 'convert {} {.}.jpeg; echo {}: done'
5$ pexec -r *.png -i %s -o %s.jpg -c 'pngtopnm | pnmtojpeg'
5$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {}.jpg'
6$ for p in *.png ; do echo ${p%.png} ; done | \ pexec -f - -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
6$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
7$ LIST=$(for p in *.png ; do echo ${p%.png} ; done) pexec -r $LIST -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
7$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
8$ pexec -n 8 -r *.jpg -y unix -e IMG -c \ 'pexec -j -m blockread -d $IMG | \ jpegtopnm | pnmscale 0.5 | pnmtojpeg | \ pexec -j -m blockwrite -s th_$IMG'
8$ # Combining GNU parallel and GNU sem.
# If reading and writing is done to the same disk, this may be # faster as only one process will be either reading or writing: ls *jpg | parallel -j8 'sem --id diskio cat {} | jpegtopnm |' \ 'pnmscale 0.5 | pnmtojpeg | sem --id diskio cat > th_{}'
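The sem --id diskio trick above serializes disk access while the CPU-bound stages still overlap. Where GNU sem is not available, the same mutual exclusion can be sketched with util-linux flock; the file names here are made-up examples, not from the pexec manual:

```shell
# Serialize the disk-read step across concurrent jobs by taking an
# exclusive lock on fd 9, backed by a shared lock file:
lock=$(mktemp)
printf 'some data\n' > demo.dat        # stand-in for the big input file
( flock 9; cat demo.dat ) 9>"$lock"    # only one job reads at a time
rm -f demo.dat "$lock"
```

Each concurrent job would open the same lock file on fd 9, so the cat sections run one at a time even when the jobs themselves overlap.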
https://www.gnu.org/software/pexec/ DIFFERENCES BETWEEN xjobs AND GNU Parallel xjobs is also a tool for running jobs in parallel. It only supports running jobs on your local computer. xjobs deals badly with special characters just like xargs. See the section DIFFERENCES BETWEEN xargs AND GNU Parallel. EXAMPLES FROM xjobs MANUAL Here are the examples from xjobs's man page with the equivalent using GNU parallel: 1$ ls -1 *.zip | xjobs unzip
1$ ls *.zip | parallel unzip
2$ ls -1 *.zip | xjobs -n unzip
2$ ls *.zip | parallel unzip >/dev/null
3$ find . -name '*.bak' | xjobs gzip 3$ find . -name '*.bak' | parallel gzip
4$ ls -1 *.jar | sed 's/\(.*\)/\1 > \1.idx/' | xjobs jar tf
4$ ls *.jar | parallel jar tf {} '>' {}.idx
5$ xjobs -s script
5$ cat script | parallel
6$ mkfifo /var/run/my_named_pipe; xjobs -s /var/run/my_named_pipe & echo unzip 1.zip >> /var/run/my_named_pipe; echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
6$ mkfifo /var/run/my_named_pipe; cat /var/run/my_named_pipe | parallel & echo unzip 1.zip >> /var/run/my_named_pipe; echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
https://www.maier-komor.de/xjobs.html (Last checked: 2019-01) DIFFERENCES BETWEEN prll AND GNU Parallel prll is also a tool for running jobs in parallel. It does not support running jobs on remote computers. prll encourages using BASH aliases and BASH functions instead of scripts. GNU parallel supports scripts directly, functions if they are exported using export -f, and aliases if using env_parallel. prll generates a lot of status information on stderr (standard error) which makes it harder to use the stderr (standard error) output of the job directly as input for another program. EXAMPLES FROM prll's MANUAL Here is the example from prll's man page with the equivalent using GNU parallel: 1$ prll -s 'mogrify -flip $1' *.jpg
1$ parallel mogrify -flip ::: *.jpg
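As noted above, GNU parallel runs shell functions if they are exported with export -f. The mechanism is plain bash: an exported function is inherited by child bash shells, which is how parallel can call it. A minimal sketch; mogrify_flip is a made-up stand-in for the mogrify call:

```shell
# Exported functions are visible in child bash shells -- the same
# environment mechanism GNU parallel relies on for export -f.
bash -c '
  mogrify_flip() { echo "would flip: $1"; }  # stand-in for: mogrify -flip "$1"
  export -f mogrify_flip
  bash -c "mogrify_flip photo.jpg"           # child shell sees the function
'
# With GNU parallel (assumed installed), after export -f mogrify_flip:
# parallel mogrify_flip ::: *.jpg
```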
https://github.com/exzombie/prll (Last checked: 2019-01) DIFFERENCES BETWEEN dxargs AND GNU Parallel dxargs is also a tool for running jobs in parallel. dxargs does not deal well with more simultaneous jobs than SSHD's MaxStartups. dxargs is only built for remote run jobs, but does not support transferring of files. https://web.archive.org/web/20120518070250/http://www.semicomplete.com/blog/geekery/distributed-xargs.html (Last checked: 2019-01) DIFFERENCES BETWEEN mdm/middleman AND GNU Parallel middleman (mdm) is also a tool for running jobs in parallel. EXAMPLES FROM middleman's WEBSITE Here are the shellscripts of https://web.archive.org/web/20110728064735/http://mdm.berlios.de/usage.html ported to GNU parallel:
1$ seq 19 | parallel buffon -o - | sort -n > result
cat files | parallel cmd
find dir -execdir sem cmd {} \;
https://github.com/cklin/mdm (Last checked: 2019-01) DIFFERENCES BETWEEN xapply AND GNU Parallel xapply can run jobs in parallel on the local computer. EXAMPLES FROM xapply's MANUAL Here are the examples from xapply's man page with the equivalent using GNU parallel: 1$ xapply '(cd %1 && make all)' */
1$ parallel 'cd {} && make all' ::: */
2$ xapply -f 'diff %1 ../version5/%1' manifest | more
2$ parallel diff {} ../version5/{} < manifest | more
3$ xapply -p/dev/null -f 'diff %1 %2' manifest1 checklist1
3$ parallel --link diff {1} {2} :::: manifest1 checklist1
4$ xapply 'indent' *.c
4$ parallel indent ::: *.c
5$ find ~ksb/bin -type f ! -perm -111 -print | \ xapply -f -v 'chmod a+x' -
5$ find ~ksb/bin -type f ! -perm -111 -print | \ parallel -v chmod a+x
6$ find */ -... | fmt 960 1024 | xapply -f -i /dev/tty 'vi' -
6$ sh <(find */ -... | parallel -s 1024 echo vi)
6$ find */ -... | parallel -s 1024 -Xuj1 vi
7$ find ... | xapply -f -5 -i /dev/tty 'vi' - - - - -
7$ sh <(find ... | parallel -n5 echo vi)
7$ find ... | parallel -n5 -uj1 vi
8$ xapply -fn "" /etc/passwd
8$ parallel -k echo < /etc/passwd
9$ tr ':' '\012' < /etc/passwd | \
xapply -7 -nf 'chown %1 %6' - - - - - - -
9$ tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}
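Both the xapply -7 and the parallel -N7 versions above rely on tr turning each colon-separated passwd line into seven input lines, which -N7 then regroups into one record per job. A quick self-contained check; the sample line is made up rather than read from /etc/passwd:

```shell
# A 7-field passwd-style line becomes 7 lines after tr,
# which is exactly what -N7 (or xapply -7) consumes per job:
line='root:x:0:0:root:/root:/bin/sh'
printf '%s\n' "$line" | tr ':' '\012' | wc -l
```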
10$ xapply '[ -d %1/RCS ] || echo %1' */
10$ parallel '[ -d {}/RCS ] || echo {}' ::: */
11$ xapply -f '[ -f %1 ] && echo %1' List | ...
11$ parallel '[ -f {} ] && echo {}' < List | ...
https://www.databits.net/~ksb/msrc/local/bin/xapply/xapply.html DIFFERENCES BETWEEN AIX apply AND GNU Parallel apply can build command lines based on a template and arguments - very much like GNU parallel. apply does not run jobs in parallel. apply does not use an argument separator (like :::); instead the template must be the first argument. EXAMPLES FROM IBM's KNOWLEDGE CENTER Here are the examples from IBM's Knowledge Center and the corresponding command using GNU parallel: To obtain results similar to those of the ls command, enter: 1$ apply echo * 1$ parallel echo ::: *
To compare the file named a1 to the file named b1, and the file named a2 to the file named b2, enter: 2$ apply -2 cmp a1 b1 a2 b2 2$ parallel -N2 cmp ::: a1 b1 a2 b2
To run the who command five times, enter: 3$ apply -0 who 1 2 3 4 5 3$ parallel -N0 who ::: 1 2 3 4 5
To link all files in the current directory to the directory /usr/joe, enter: 4$ apply 'ln %1 /usr/joe' * 4$ parallel ln {} /usr/joe ::: *
https://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.cmds1/apply.htm (Last checked: 2019-01) DIFFERENCES BETWEEN paexec AND GNU Parallel paexec can run jobs in parallel on both the local and remote computers. paexec requires commands to print a blank line as the last output. This means you will have to write a wrapper for most programs. paexec has a job dependency facility so a job can depend on another job to be executed successfully. Sort of a poor-man's make.
EXAMPLES FROM paexec's EXAMPLE CATALOG Here are the examples from paexec's example catalog with the equivalent using GNU parallel: 1_div_X_run 1$ ../../paexec -s -l -c "`pwd`/1_div_X_cmd" -n +1 < 1$ parallel echo {} '|' `pwd`/1_div_X_cmd < 2$ parallel echo {} '|' `pwd`/all_substr_cmd < 3$ parallel echo {} '|' "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \ -S host1,host2 < # This is not exactly the same, but avoids the wrapper parallel gcc -O2 -c -o {.}.o {} \ -S host1,host2 < 4$ parallel echo {} '|' ./toupper_cmd < # Without the wrapper: parallel echo {} '| awk {print\ toupper\(\$0\)}' < https://github.com/cheusov/paexec DIFFERENCES BETWEEN map(sitaramc) AND GNU Parallel Summary (see legend above): I1 - - I4 - - (I7) M1 (M2) M3 (M4) M5 M6 - O2 O3 - O5 - - N/A N/A O10 E1 - - - - - - (I7): Only under special circumstances. See below. (M2+M4): Only if there is a single replacement string. map rejects input with special characters: echo "The Cure" > My\ brother\'s\ 12\"\ records ls | map 'echo %; wc %' It works with GNU parallel: ls | parallel 'echo {}; wc {}' Under some circumstances it also works with map: ls | map 'echo % works %' But tiny changes make it reject the input with special characters: ls | map 'echo % does not work "%"' This means that many UTF-8 characters will be rejected. This is by design. From the web page: "As such, programs that quietly handle them, with no warnings at all, are doing their users a disservice." map delays each job by 0.01 s. This can be emulated by using parallel --delay 0.01. map prints '+' on stderr when a job starts, and '-' when a job finishes. This cannot be disabled. parallel has --bar if you need to see progress.
map's replacement strings (% %D %B %E) can be simulated in GNU parallel by putting this in ~/.parallel/config: --rpl '%' --rpl '%D $_=Q(::dirname($_));' --rpl '%B s:.*/::;s:\.[^/.]+$::;' --rpl '%E s:.*\.::' map does not have an argument separator on the command line, but uses the first argument as command. This makes quoting harder which again may affect readability. Compare: map -p 2 'perl -ne '"'"'/^\S+\s+\S+$/ and print $ARGV,"\n"'"'" * parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' ::: * map can do multiple arguments with context replace, but not without context replace: parallel --xargs echo 'BEGIN{'{}'}END' ::: 1 2 3 map "echo 'BEGIN{'%'}END'" 1 2 3 map has no support for grouping. So this gives the wrong results: parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} \ ::: a b c d e f ls -l a b c d e f parallel -kP4 -n1 grep 1 ::: a b c d e f > out.par map -n1 -p 4 'grep 1' a b c d e f > out.map-unbuf map -n1 -p 4 'grep --line-buffered 1' a b c d e f > out.map-linebuf map -n1 -p 1 'grep --line-buffered 1' a b c d e f > out.map-serial ls -l out* md5sum out* EXAMPLES FROM map's WEBSITE Here are the examples from map's web page with the equivalent using GNU parallel: 1$ ls *.gif | map convert % %B.png # default max-args: 1 1$ ls *.gif | parallel convert {} {.}.png 2$ map "mkdir %B; tar -C %B -xf %" *.tgz # default max-args: 1 2$ parallel 'mkdir {.}; tar -C {.} -xf {}' ::: *.tgz 3$ ls *.gif | map cp % /tmp # default max-args: 100 3$ ls *.gif | parallel -X cp {} /tmp 4$ ls *.tar | map -n 1 tar -xf % 4$ ls *.tar | parallel tar -xf 5$ map "cp % /tmp" *.tgz 5$ parallel cp {} /tmp ::: *.tgz 6$ map "du -sm /home/%/mail" alice bob carol 6$ parallel "du -sm /home/{}/mail" ::: alice bob carol or if you prefer running a single job with multiple args: 6$ parallel -Xj1 "du -sm /home/{}/mail" ::: alice bob carol 7$ cat /etc/passwd | map -d: 'echo user %1 has shell %7' 7$ cat /etc/passwd | parallel --colsep : 'echo user
{1} has shell {7}' 8$ export MAP_MAX_PROCS=$(( `nproc` / 2 )) 8$ export PARALLEL=-j50% https://github.com/sitaramc/map (Last checked: 2020-05) DIFFERENCES BETWEEN ladon AND GNU Parallel ladon can run multiple jobs on files in parallel. ladon only works on files and the only way to specify files is using a quoted glob string (such as \*.jpg). It is not possible to list the files manually. As replacement strings it uses FULLPATH DIRNAME BASENAME EXT RELDIR RELPATH These can be simulated using GNU parallel by putting this in ~/.parallel/config: --rpl 'FULLPATH $_=Q($_);chomp($_=qx{readlink -f $_});' --rpl 'DIRNAME $_=Q(::dirname($_));chomp($_=qx{readlink -f $_});' --rpl 'BASENAME s:.*/::;s:\.[^/.]+$::;' --rpl 'EXT s:.*\.::' --rpl 'RELDIR $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd}); s:\Q$c/\E::;$_=::dirname($_);' --rpl 'RELPATH $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd}); s:\Q$c/\E::;' ladon deals badly with filenames containing " and newline, and it fails for output larger than 200k: ladon '*' -- seq 36000 | wc EXAMPLES FROM ladon MANUAL It is assumed that the '--rpl's above are put in ~/.parallel/config and that it is run under a shell that supports '**' globbing (such as zsh): 1$ ladon "**/*.txt" -- echo RELPATH 1$ parallel echo RELPATH ::: **/*.txt 2$ ladon "~/Documents/**/*.pdf" -- shasum FULLPATH >hashes.txt 2$ parallel shasum FULLPATH ::: ~/Documents/**/*.pdf >hashes.txt 3$ ladon -m thumbs/RELDIR "**/*.jpg" -- convert FULLPATH \ -thumbnail 100x100^ -gravity center -extent 100x100 \ thumbs/RELPATH 3$ parallel mkdir -p thumbs/RELDIR\; convert FULLPATH -thumbnail 100x100^ -gravity center -extent 100x100 \ thumbs/RELPATH ::: **/*.jpg 4$ ladon "~/Music/*.wav" -- lame -V 2 FULLPATH DIRNAME/BASENAME.mp3 4$ parallel lame -V 2 FULLPATH DIRNAME/BASENAME.mp3 ::: ~/Music/*.wav https://github.com/danielgtaylor/ladon (Last checked: 2019-01) DIFFERENCES BETWEEN jobflow AND GNU Parallel jobflow can run multiple jobs in parallel.
Just like with xargs, output from jobflow jobs running in parallel mixes together by default. jobflow can buffer into files (placed in /run/shm), but these are not cleaned up if jobflow dies unexpectedly (e.g. by Ctrl-C). If the total output is big (in the order of RAM+swap) it can cause the system to slow to a crawl and eventually run out of memory. jobflow gives no error if the command is unknown, and like xargs redirection and composed commands require wrapping with bash -c. Input lines can at most be 4096 bytes. You can at most have 16 {}'s in the command template. More than that either crashes the program or simply does not execute the command. jobflow has no equivalent for --pipe or --sshlogin. jobflow makes it possible to set resource limits on the running jobs. This can be emulated by GNU parallel using bash's ulimit: jobflow -limits=mem=100M,cpu=3,fsize=20M,nofiles=300 myjob parallel 'ulimit -v 102400 -t 3 -f 204800 -n 300 myjob' EXAMPLES FROM jobflow README 1$ cat things.list | jobflow -threads=8 -exec ./mytask {} 1$ cat things.list | parallel -j8 ./mytask {} 2$ seq 100 | jobflow -threads=100 -exec echo {} 2$ seq 100 | parallel -j100 echo {} 3$ cat urls.txt | jobflow -threads=32 -exec wget {} 3$ cat urls.txt | parallel -j32 wget {} 4$ find . -name '*.bmp' | \ jobflow -threads=8 -exec bmp2jpeg {.}.bmp {.}.jpg 4$ find . -name '*.bmp' | \ parallel -j8 bmp2jpeg {.}.bmp {.}.jpg https://github.com/rofl0r/jobflow DIFFERENCES BETWEEN gargs AND GNU Parallel gargs can run multiple jobs in parallel. Older versions cache output in memory. This causes it to be extremely slow when the output is larger than the physical RAM, and can cause the system to run out of memory. See more details on this in man parallel_design. Newer versions cache output in files, but leave files in $TMPDIR if it is killed. Output to stderr (standard error) is changed if the command fails.
EXAMPLES FROM gargs WEBSITE 1$ seq 12 -1 1 | gargs -p 4 -n 3 "sleep {0}; echo {1} {2}" 1$ seq 12 -1 1 | parallel -P 4 -n 3 "sleep {1}; echo {2} {3}" 2$ cat t.txt | gargs --sep "\s+" \ -p 2 "echo '{0}:{1}-{2}' full-line: \'{}\'" 2$ cat t.txt | parallel --colsep "\\s+" \ -P 2 "echo '{1}:{2}-{3}' full-line: \'{}\'" https://github.com/brentp/gargs DIFFERENCES BETWEEN orgalorg AND GNU Parallel orgalorg can run the same job on multiple machines. This is related to --onall and --nonall. orgalorg supports entering the SSH password - provided it is the same for all servers. GNU parallel advocates using ssh-agent instead, but it is possible to emulate orgalorg's behavior by setting SSHPASS and by using --ssh "sshpass ssh". To make the emulation easier, make a simple alias: alias par_emul="parallel -j0 --ssh 'sshpass ssh' --nonall --tag --lb" If you want to supply a password run: SSHPASS=`ssh-askpass` or set the password directly: SSHPASS=P4$$w0rd! If the above is set up you can then do: orgalorg -o frontend1 -o frontend2 -p -C uptime par_emul -S frontend1 -S frontend2 uptime orgalorg -o frontend1 -o frontend2 -p -C top -bid 1 par_emul -S frontend1 -S frontend2 top -bid 1 orgalorg -o frontend1 -o frontend2 -p -er /tmp -n \ 'md5sum /tmp/bigfile' -S bigfile par_emul -S frontend1 -S frontend2 --basefile bigfile \ --workdir /tmp md5sum /tmp/bigfile orgalorg has a progress indicator for the transferring of a file. GNU parallel does not. https://github.com/reconquest/orgalorg DIFFERENCES BETWEEN Rust parallel AND GNU Parallel Rust parallel focuses on speed. It is almost as fast as xargs, but not as fast as parallel-bash. It implements a few features from GNU parallel, but lacks many functions.
All these fail: # Read arguments from file parallel -a file echo # Changing the delimiter parallel -d _ echo ::: a_b_c_ These do something different from GNU parallel # -q to protect quoted $ and space parallel -q perl -e '$a=shift; print "$a"x10000000' ::: a b c # Generation of combination of inputs parallel echo {1} {2} ::: red green blue ::: S M L XL XXL # {= perl expression =} replacement string parallel echo '{= s/new/old/ =}' ::: my.new your.new # --pipe seq 100000 | parallel --pipe wc # linked arguments parallel echo ::: S M L :::+ sml med lrg ::: R G B :::+ red grn blu # Run different shell dialects zsh -c 'parallel echo \={} ::: zsh && true' csh -c 'parallel echo \$\{\} ::: shell && true' bash -c 'parallel echo \$\({}\) ::: pwd && true' # Rust parallel does not start before the last argument is read (seq 10; sleep 5; echo 2) | time parallel -j2 'sleep 2; echo' tail -f /var/log/syslog | parallel echo Most of the examples from the book GNU Parallel 2018 do not work, thus Rust parallel is not close to being a compatible replacement. Rust parallel has no remote facilities. It uses /tmp/parallel for tmp files and does not clean up if terminated abruptly. If another user on the system uses Rust parallel, then /tmp/parallel will have the wrong permissions and Rust parallel will fail. A malicious user can set up the right permissions and symlink the output file to one of the user's files and next time the user uses Rust parallel it will overwrite this file. attacker$ mkdir /tmp/parallel attacker$ chmod a+rwX /tmp/parallel # Symlink to the file the attacker wants to zero out attacker$ ln -s ~victim/.important-file /tmp/parallel/stderr_1 victim$ seq 1000 | parallel echo # This file is now overwritten with stderr from 'echo' victim$ cat ~victim/.important-file If /tmp/parallel runs full during the run, Rust parallel does not report this, but finishes with success - thereby risking data loss.
https://github.com/mmstick/parallel DIFFERENCES BETWEEN Rush AND GNU Parallel rush (https://github.com/shenwei356/rush) is written in Go and based on gargs. Just like GNU parallel rush buffers in temporary files. But unlike GNU parallel, rush does not clean up if the process dies abnormally. rush has some string manipulations that can be emulated by putting this into ~/.parallel/config (/ is used instead of %, and % is used instead of ^ as that is closer to bash's ${var%postfix}): --rpl '{:} s:(\.[^/]+)*$::' --rpl '{:%([^}]+?)} s:$$1(\.[^/]+)*$::' --rpl '{/:%([^}]*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:' --rpl '{/:} s:(.*/)?([^/.]+)(\.[^/]+)*$:$2:' --rpl '{@(.*?)} /$$1/ and $_=$1;' EXAMPLES FROM rush's WEBSITE Here are the examples from rush's website with the equivalent command in GNU parallel. 1. Simple run, quoting is not necessary 1$ seq 1 3 | rush echo {} 1$ seq 1 3 | parallel echo {} 2. Read data from file (`-i`) 2$ rush echo {} -i data1.txt -i data2.txt 2$ cat data1.txt data2.txt | parallel echo {} 3. Keep output order (`-k`) 3$ seq 1 3 | rush 'echo {}' -k 3$ seq 1 3 | parallel -k echo {} 4. Timeout (`-t`) 4$ time seq 1 | rush 'sleep 2; echo {}' -t 1 4$ time seq 1 | parallel --timeout 1 'sleep 2; echo {}' 5. Retry (`-r`) 5$ seq 1 | rush 'python unexisted_script.py' -r 1 5$ seq 1 | parallel --retries 2 'python unexisted_script.py' Use -u to see it is really run twice: 5$ seq 1 | parallel -u --retries 2 'python unexisted_script.py' 6. Dirname (`{/}`) and basename (`{%}`) and remove custom suffix (`{^suffix}`) 6$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}' 6$ echo dir/file_1.txt.gz | parallel --plus echo {//} {/} {%_1.txt.gz} 7. Get basename, and remove last (`{.}`) or any (`{:}`) extension 7$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}' 7$ echo dir.d/file.txt.gz | parallel 'echo {.} {:} {/.} {/:}' 8.
Job ID, combine fields index and other replacement strings 8$ echo 12 file.txt dir/s_1.fq.gz | rush 'echo job {#}: {2} {2.} {3%:^_1}' 8$ echo 12 file.txt dir/s_1.fq.gz | parallel --colsep ' ' 'echo job {#}: {2} {2.} {3/:%_1}' 9. Capture submatch using regular expression (`{@regexp}`) 9$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}' 9$ echo read_1.fq.gz | parallel 'echo {@(.+)_\d}' 10. Custom field delimiter (`-d`) 10$ echo a=b=c | rush 'echo {1} {2} {3}' -d = 10$ echo a=b=c | parallel --colsep = echo {1} {2} {3} 11. Send multi-lines to every command (`-n`) 11$ seq 5 | rush -n 2 -k 'echo "{}"; echo' 11$ seq 5 | parallel -n 2 -k \ 'echo {=-1 $_=join"\n",@arg[1..$#arg] =}; echo' 11$ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' ' 11$ seq 5 | parallel -n 2 -k 'echo {}; echo' 12. Custom record delimiter (`-D`), note that empty records are not used. 12$ echo a b c d | rush -D " " -k 'echo {}' 12$ echo a b c d | parallel -d " " -k 'echo {}' 12$ echo abcd | rush -D "" -k 'echo {}' Cannot be done by GNU Parallel 12$ cat fasta.fa
>seq1
tag
>seq2
cat gat
>seq3
attac
a
cat
12$ cat fasta.fa | rush -D ">" \ 'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n" # rush fails to join the multiline sequences 12$ cat fasta.fa | (read -n1 ignore_first_char; parallel -d '>' --colsep '\n' echo FASTA record {#}: \ name: {1} sequence: '{=2 $_=join"",@arg[2..$#arg]=}' ) 13. Assign value to variable, like `awk -v` (`-v`) 13$ seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen 13$ seq 1 | parallel -N0 \ 'fname=Wei; lname=Shen; echo Hello, ${fname} ${lname}!'
  13$ for var in a b; do \
        seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
      done

In GNU parallel you would typically do:

  13$ seq 1 3 | parallel -k echo var: {1}, data: {2} ::: a b :::: -

If you really want the var:

  13$ seq 1 3 |
        parallel -k var={1} ';echo var: $var, data: {}' ::: a b :::: -

If you really want the for-loop:

  13$ for var in a b; do
        export var;
        seq 1 3 | parallel -k 'echo var: $var, data: {}';
      done

Contrary to rush this also works if the value is complex like:

  My brother's 12" records

14. Preset variable (`-v`), avoid repeatedly writing verbose replacement strings

  # naive way
  14$ echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'

  14$ echo read_1.fq.gz | parallel 'echo {:%_1} {:%_1}_2.fq.gz'

  # macro + removing suffix
  14$ echo read_1.fq.gz | rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'

  14$ echo read_1.fq.gz | parallel 'p={:%_1}; echo $p ${p}_2.fq.gz'

  # macro + regular expression
  14$ echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'

  14$ echo read_1.fq.gz | parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'

Contrary to rush GNU parallel works with complex values:

  14$ echo "My brother's 12\"read_1.fq.gz" |
        parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'

15. Interrupt jobs by `Ctrl-C`, rush will stop unfinished commands and exit.

  15$ seq 1 20 | rush 'sleep 1; echo {}'
      ^C

  15$ seq 1 20 | parallel 'sleep 1; echo {}'
      ^C

16. Continue/resume jobs (`-c`). When some jobs failed (by execution failure, timeout, or canceling by user with `Ctrl + C`), please switch flag `-c/--continue` on and run again, so that `rush` can save successful commands and ignore them in NEXT run.
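The resume idea behind rush's `-c` and GNU parallel's `--joblog`/`--retry-failed` is simply: record which inputs finished, and skip them on the next run. A plain-shell sketch of that pattern (`process` and the temporary done-list are illustrative names, not part of either tool):

```shell
#!/bin/bash
# Sketch of resume-style processing: inputs listed in the done-file
# are skipped; newly processed inputs are appended to it.
donefile=$(mktemp)
process() {
  if grep -qxF "$1" "$donefile"; then
    return 0                       # already done: skip
  fi
  echo "processing $1"
  echo "$1" >> "$donefile"         # record success
}
for x in a b c; do process "$x"; done
for x in a b c d; do process "$x"; done   # "second run": only d is new
rm -f "$donefile"
```

The first loop prints `processing a` through `processing c`; the second loop prints only `processing d`, mirroring how a joblog-based rerun only executes the jobs that have not yet succeeded.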
  16$ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
      cat successful_cmds.rush
      seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c

  16$ seq 1 3 | parallel --joblog mylog --timeout 2 \
        'sleep {}; echo {}'
      cat mylog
      seq 1 3 | parallel --joblog mylog --retry-failed \
        'sleep {}; echo {}'

Multi-line jobs:

  16$ seq 1 3 | rush 'sleep {}; echo {}; \
        echo finish {}' -t 3 -c -C finished.rush
      cat finished.rush
      seq 1 3 | rush 'sleep {}; echo {}; \
        echo finish {}' -t 3 -c -C finished.rush

  16$ seq 1 3 | parallel --joblog mylog --timeout 2 'sleep {}; echo {}; \
        echo finish {}'
      cat mylog
      seq 1 3 | parallel --joblog mylog --retry-failed 'sleep {}; echo {}; \
        echo finish {}'

17. A comprehensive example: downloading 1K+ pages given by three URL list files using `phantomjs save_page.js` (some page contents are dynamically generated by Javascript, so `wget` does not work). Here I set the max number of jobs (`-j`) to `20`; each job has a max running time (`-t`) of `60` seconds and `3` retry chances (`-r`). The continue flag `-c` is also switched on, so we can continue unfinished jobs. Luckily, it's accomplished in one run :)

  17$ for f in $(seq 2014 2016); do \
        /bin/rm -rf $f; mkdir -p $f; \
        cat $f.html.txt | rush -v d=$f -d = \
          'phantomjs save_page.js "{}" > {d}/{3}.html' \
          -j 20 -t 60 -r 3 -c; \
      done

GNU parallel can append to an existing joblog with '+':

  17$ rm mylog
      for f in $(seq 2014 2016); do
        /bin/rm -rf $f; mkdir -p $f;
        cat $f.html.txt |
          parallel -j20 --timeout 60 --retries 4 --joblog +mylog \
            --colsep = \
            phantomjs save_page.js {1}={2}={3} '>' $f/{3}.html
      done

18.
A bioinformatics example: mapping with `bwa`, and processing result with `samtools`:

  18$ ref=ref/xxx.fa
      threads=25
      ls -d raw.cluster.clean.mapping/* \
        | rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
          'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz > {p}.sam; \
           samtools view -bS {p}.sam > {p}.bam; \
           samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
           samtools index {p}.sorted.bam; \
           samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
           /bin/rm {p}.bam {p}.sam;' \
          -j 2 --verbose -c -C mapping.rush

GNU parallel would use a function:

  18$ ref=ref/xxx.fa
      export ref
      thr=25
      export thr
      bwa_sam() {
        p="$1"
        bam="$p".bam
        sam="$p".sam
        sortbam="$p".sorted.bam
        bwa mem -t $thr -M -a $ref ${p}_1.fq.gz ${p}_2.fq.gz > "$sam"
        samtools view -bS "$sam" > "$bam"
        samtools sort -T ${p}.tmp -@ $thr "$bam" -o "$sortbam"
        samtools index "$sortbam"
        samtools flagstat "$sortbam" > "$sortbam".flagstat
        /bin/rm "$bam" "$sam"
      }
      export -f bwa_sam
      ls -d raw.cluster.clean.mapping/* |
        parallel -j 2 --verbose --joblog mylog bwa_sam

Other rush features

rush has:

* awk -v like custom defined variables (-v)

  With GNU parallel you would simply set a shell variable:

    parallel 'v={}; echo "$v"' ::: foo
    echo foo | rush -v v={} 'echo {v}'

  Also rush does not like special chars. So these do not work:

    echo does not work | rush -v v=\" 'echo {v}'
    echo "My brother's 12\" records" | rush -v v={} 'echo {v}'

  Whereas the corresponding GNU parallel versions work:

    parallel 'v=\"; echo "$v"' ::: works
    parallel 'v={}; echo "$v"' ::: "My brother's 12\" records"

* Exit on first error(s) (-e)

  This is called --halt now,fail=1 (or shorter: --halt 2) when used with GNU parallel.

* Settable records sending to every command (-n, default 1)

  This is also called -n in GNU parallel.
* Practical replacement strings

  {:} remove any extension

  With GNU parallel this can be emulated by:

    parallel --plus echo '{/\..*/}' ::: foo.ext.bar.gz

  {^suffix}, remove suffix

  With GNU parallel this can be emulated by:

    parallel --plus echo '{%.bar.gz}' ::: foo.ext.bar.gz

  {@regexp}, capture submatch using regular expression

  With GNU parallel this can be emulated by:

    parallel --rpl '{@(.*?)} /$$1/ and $_=$1;' \
      echo '{@\d_(.*).gz}' ::: 1_foo.gz

  {%.}, {%:}, basename without extension

  With GNU parallel this can be emulated by:

    parallel echo '{= s:.*/::;s/\..*// =}' ::: dir/foo.bar.gz

  And if you need it often, you define a --rpl in $HOME/.parallel/config:

    --rpl '{%.} s:.*/::;s/\..*//'
    --rpl '{%:} s:.*/::;s/\..*//'

  Then you can use them as:

    parallel echo {%.} {%:} ::: dir/foo.bar.gz

* Preset variable (macro)

  E.g.

    echo foosuffix | rush -v p={^suffix} 'echo {p}_new_suffix'

  With GNU parallel this can be emulated by:

    echo foosuffix | parallel --plus 'p={%suffix}; echo ${p}_new_suffix'

  Unlike rush, GNU parallel works fine if the input contains spaces, ' and ":

    echo "1'6\" foosuffix" |
      parallel --plus 'p={%suffix}; echo "${p}"_new_suffix'

* Commands of multi-lines

  While you can use multi-lined commands in GNU parallel, to improve readability GNU parallel discourages the use of multi-line commands. In most cases they can be written as a function:

    seq 1 3 |
      parallel --timeout 2 --joblog my.log 'sleep {}; echo {}; \
        echo finish {}'

  Could be written as:

    doit() {
      sleep "$1"
      echo "$1"
      echo finish "$1"
    }
    export -f doit
    seq 1 3 | parallel --timeout 2 --joblog my.log doit

  The failed commands can be resumed with:

    seq 1 3 |
      parallel --resume-failed --joblog my.log 'sleep {}; echo {}; \
        echo finish {}'

https://github.com/shenwei356/rush

DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel

ClusterSSH solves a different problem than GNU parallel.
ClusterSSH opens a terminal window for each computer, and by using a master window you can run the same command on all the computers. This is typically used for administrating several computers that are almost identical.

GNU parallel runs the same (or different) commands with different arguments in parallel, possibly using remote computers to help