GNU Parallel alternatives

NAME
    parallel_alternatives - Alternatives to GNU parallel

DIFFERENCES BETWEEN GNU Parallel AND ALTERNATIVES
    There are a lot of programs that share functionality with GNU
    parallel. Some of these are specialized tools, and while GNU
    parallel can emulate many of them, a specialized tool can be better
    at a given task. GNU parallel strives to include the best of the
    general functionality without sacrificing ease of use.

    parallel has existed since 2002-01-06 and as GNU parallel since
    2010. A lot of the alternatives have not had the vitality to survive
    that long, but have come and gone during that time.

    GNU parallel is actively maintained with a new release every month
    since 2010. Most other alternatives are fleeting interests of the
    developers with irregular releases and only maintained for a few
    years.

SUMMARY LEGEND
    The following features are in some of the comparable tools:

    Inputs
        I1. Arguments can be read from stdin
        I2. Arguments can be read from a file
        I3. Arguments can be read from multiple files
        I4. Arguments can be read from command line
        I5. Arguments can be read from a table
        I6. Arguments can be read from the same file using #! (shebang)
        I7. Line oriented input as default (Quoting of special chars
            not needed)

    Manipulation of input
        M1. Composed command
        M2. Multiple arguments can fill up an execution line
        M3. Arguments can be put anywhere in the execution line
        M4. Multiple arguments can be put anywhere in the execution line
        M5. Arguments can be replaced with context
        M6. Input can be treated as the complete command line

    Outputs
        O1. Grouping output so output from different jobs does not mix
        O2. Send stderr (standard error) to stderr (standard error)
        O3. Send stdout (standard output) to stdout (standard output)
        O4. Order of output can be same as order of input
        O5. Stdout only contains stdout (standard output) from the
            command
        O6. Stderr only contains stderr (standard error) from the
            command
        O7. Buffering on disk
        O8. Cleanup of temporary files if killed
        O9. Test if disk runs full during run
        O10. Output of a line bigger than 4 GB

    Execution
        E1. Running jobs in parallel
        E2. List running jobs
        E3. Finish running jobs, but do not start new jobs
        E4. Number of running jobs can depend on number of cpus
        E5. Finish running jobs, but do not start new jobs after first
            failure
        E6. Number of running jobs can be adjusted while running
        E7. Only spawn new jobs if load is less than a limit

    Remote execution
        R1. Jobs can be run on remote computers
        R2. Basefiles can be transferred
        R3. Argument files can be transferred
        R4. Result files can be transferred
        R5. Cleanup of transferred files
        R6. No config files needed
        R7. Do not run more than SSHD's MaxStartups can handle
        R8. Configurable SSH command
        R9. Retry if connection breaks occasionally

    Semaphore
        S1. Possibility to work as a mutex
        S2. Possibility to work as a counting semaphore

    Legend
        -  = no
        x  = not applicable
        ID = yes

    As not every new version of the programs is tested, the table may
    be outdated. Please file a bug report if you find errors (see
    REPORTING BUGS).

    parallel:
        I1 I2 I3 I4 I5 I6 I7
        M1 M2 M3 M4 M5 M6
        O1 O2 O3 O4 O5 O6 O7 O8 O9 O10
        E1 E2 E3 E4 E5 E6 E7
        R1 R2 R3 R4 R5 R6 R7 R8 R9
        S1 S2

DIFFERENCES BETWEEN xargs AND GNU Parallel
    Summary (see legend above):
        I1 I2 -  -  -  -  -
        -  M2 M3 -  -  -
        -  O2 O3 -  O5 O6
        E1 -  -  -  -  -  -
        -  -  -  -  -  x  -  -  -
        -  -

    xargs offers some of the same possibilities as GNU parallel.

    xargs deals badly with special characters (such as space, \, ' and
    "). To see the problem try this:

        touch important_file
        touch 'not important_file'
        ls not* | xargs rm
        mkdir -p "My brother's 12\" records"
        ls | xargs rmdir
        touch 'c:\windows\system32\clfs.sys'
        echo 'c:\windows\system32\clfs.sys' | xargs ls -l

    You can specify -0, but many input generators are not optimized for
    using NUL as separator but are optimized for newline as
    separator. E.g.
    awk, ls, echo, tar -v, head (requires using -z), tail (requires
    using -z), sed (requires using -z), perl (-0 and \0 instead of \n),
    locate (requires using -0), find (requires using -print0), grep
    (requires using -z or -Z), sort (requires using -z).

    GNU parallel's newline separation can be emulated with:

        cat | xargs -d "\n" -n1 command

    xargs can run a given number of jobs in parallel, but has no
    support for running number-of-cpu-cores jobs in parallel.

    xargs has no support for grouping the output, therefore output may
    run together, e.g. the first half of a line is from one process and
    the last half of the line is from another process. The example
    Parallel grep cannot be done reliably with xargs because of
    this. To see this in action try:

        parallel perl -e '\$a=\"1\".\"{}\"x10000000\;print\ \$a,\"\\n\"' \
          '>' {} ::: a b c d e f g h
        # Serial = no mixing = the wanted result
        # 'tr -s a-z' squeezes repeating letters into a single letter
        echo a b c d e f g h | xargs -P1 -n1 grep 1 | tr -s a-z
        # Compare to 8 jobs in parallel
        parallel -kP8 -n1 grep 1 ::: a b c d e f g h | tr -s a-z
        echo a b c d e f g h | xargs -P8 -n1 grep 1 | tr -s a-z
        echo a b c d e f g h | xargs -P8 -n1 grep --line-buffered 1 | \
          tr -s a-z

    Or try this:

        slow_seq() {
          echo Count to "$@"
          seq "$@" |
            perl -ne '$|=1; for(split//){ print; select($a,$a,$a,0.100);}'
        }
        export -f slow_seq
        # Serial = no mixing = the wanted result
        seq 8 | xargs -n1 -P1 -I {} bash -c 'slow_seq {}'
        # Compare to 8 jobs in parallel
        seq 8 | parallel -P8 slow_seq {}
        seq 8 | xargs -n1 -P8 -I {} bash -c 'slow_seq {}'

    xargs has no support for keeping the order of the output. If you
    run jobs in parallel using xargs, the output of the second job
    therefore cannot be postponed until the first job is done.

    xargs has no support for running jobs on remote computers.

    xargs has no support for context replace, so you will have to
    create the arguments yourself.

    If you use a replace string in xargs (-I) you cannot force xargs to
    use more than one argument.
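
    The last two limitations can be seen with xargs alone. The
    following is a minimal sketch, assuming a POSIX shell with xargs,
    printf and sed available; the variable names and the pre-/-post
    strings are made up for illustration. With -I, xargs runs one
    command per input line, so putting several arguments with context
    on one command line means creating the arguments beforehand (here
    with sed):

```shell
#!/bin/sh
# With -I, xargs starts one command per input line.
one_per_line=$(printf '%s\n' a b c | xargs -I {} echo 'pre-{}-post')
printf '%s\n' "$one_per_line"

# Without -I, all arguments go on one command line, but there is no
# replacement string to wrap context around each argument.
packed=$(printf '%s\n' a b c | xargs echo)
printf '%s\n' "$packed"

# Workaround: create the arguments first (sed adds the context),
# then let xargs pack them onto a single command line.
prebuilt=$(printf '%s\n' a b c | sed 's/.*/pre-&-post/' | xargs echo)
printf '%s\n' "$prebuilt"
```

    The first command prints three lines, the second prints "a b c",
    and the third prints "pre-a-post pre-b-post pre-c-post" on one
    line. The sed step stands in for what the legend calls M5
    (arguments can be replaced with context).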

    Quoting in xargs works like -q in GNU parallel. This means
    composed commands and redirection require using bash -c.

        ls | parallel "wc {} >{}.wc"
        ls | parallel "echo {}; ls {}|wc"

    becomes (assuming you have 8 cores and that none of the filenames
    contain space, " or ').

        ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >{}.wc"
        ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc"

    A more extreme example can be found on:
    https://unix.stackexchange.com/q/405552/

    https://www.gnu.org/software/findutils/

DIFFERENCES BETWEEN find -exec AND GNU Parallel
    Summary (see legend above):
        - - - x - x - - M2 M3 - - - - - O2 O3 O4 O5 O6 - - - - - - - - - - - - - - - - x x

    find -exec offers some of the same possibilities as GNU parallel.

    find -exec only works on files. Processing other input (such as
    hosts or URLs) will require creating these inputs as files. find
    -exec has no support for running commands in parallel.

    https://www.gnu.org/software/findutils/ (Last checked: 2019-01)

DIFFERENCES BETWEEN make -j AND GNU Parallel
    Summary (see legend above):
        - - - - - - - - - - - - - O1 O2 O3 - x O6 E1 - - - E5 - - - - - - - - - - - -

    make -j can run jobs in parallel, but requires a crafted Makefile
    to do this. That results in extra quoting to get filenames
    containing newlines to work correctly.

    make -j computes a dependency graph before running jobs. Jobs run
    by GNU parallel do not depend on each other.

    (Very early versions of GNU parallel were coincidentally
    implemented using make -j).

    https://www.gnu.org/software/make/ (Last checked: 2019-01)

DIFFERENCES BETWEEN ppss AND GNU Parallel
    Summary (see legend above):
        I1 I2 -  -  -  -  I7
        M1 -  M3 -  -  M6
        O1 -  -  x  -  -
        E1 E2 ?E3 E4 -  -  -
        R1 R2 R3 R4 -  -  ?R7 ?  ?
        -  -

    ppss is also a tool for running jobs in parallel.

    The output of ppss is status information and thus not useful as
    input for another command. The output from the jobs is put into
    files.

    The argument replace string ($ITEM) cannot be changed. Arguments
    must be quoted, so arguments containing special characters (space
    '"&!*) may cause problems. More than one argument is not supported.
    Filenames containing newlines are not processed correctly. When
    reading input from a file, NUL cannot be used as a terminator. ppss
    needs to read the whole input file before starting any jobs.

    Output and status information is stored in ppss_dir and thus
    requires cleanup when completed. If the dir is not removed before
    running ppss again it may cause nothing to happen as ppss thinks
    the task is already done. GNU parallel will normally not need
    cleaning up if running locally and will only need cleaning up if
    stopped abnormally and running remote (--cleanup may not complete
    if stopped abnormally). The example Parallel grep would require
    extra postprocessing if written using ppss.

    For remote systems ppss requires 3 steps: config, deploy, and
    start. GNU parallel only requires one step.

  EXAMPLES FROM ppss MANUAL
    Here are the examples from ppss's manual page with the equivalent
    using GNU parallel:

        1$ ./ppss.sh standalone -d /path/to/files -c 'gzip '

        1$ find /path/to/files -type f | parallel gzip

        2$ ./ppss.sh standalone -d /path/to/files \
             -c 'cp "$ITEM" /destination/dir '

        2$ find /path/to/files -type f | parallel cp {} /destination/dir

        3$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '

        3$ parallel -a list-of-urls.txt wget -q

        4$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'

        4$ parallel -a list-of-urls.txt wget -q {}

        5$ ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir \
             -m 192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh \
             -n nodes.txt -o /some/output/dir --upload --download;
           ./ppss deploy -C config.cfg
           ./ppss start -C config

        5$ # parallel does not use configs.