Using the Parallel Execution Perl Program on Multiple BioHPC Lab Machines

This Perl program is intended to run on multiple machines, and it can also run multiple programs on each machine. It takes 2 arguments:

    /programs/bin/perlscripts/perl_fork_univ.pl JobListFile MachineFile

JobListFile is a file containing all the commands to execute, one per line. MachineFile is a file listing the machines to use, one per line, with each machine name followed by the number of processes to run in parallel on that machine:

    cbsumm14 24
    cbsumm15 24
    cbsumm16 24

The parallel Perl driver is typically used when the number of tasks exceeds the number of cores. For example, when the number of libraries in an RNA-seq project is large (say 500), you can prepare a file with all the tophat tasks needed (50 lines, no '&' at the end of lines, each task using 7 cores) and then run 9 of them at a time on each of 2 64-core machines (9 instances using 7 cores each, for a total of 63 cores per machine).

Using multiple machines is more complicated than running the parallel Perl driver on a single machine. The local directory /workdir is not visible across machines, so the processes need to use the home directory for file storage and communication. You also need to enable program execution between BioHPC Lab machines without entering your password. This can be done with the following commands:

    cd ~
    ssh-keygen -t rsa          (use an empty password when prompted)
    cat .ssh/id_rsa.pub >> .ssh/authorized_keys
    chmod 640 .ssh/authorized_keys
    chmod 700 .ssh

A good example is a PAML simulation on 110 genes. The example input data for multiple machines is in /programs/paml_mn.example.tar. To try it, unpack the data into a subdirectory of your home directory. In this example it will be /home/jarekp/tmp, or ~/tmp (~/ denotes the home directory).
    cd ~
    mkdir tmp
    cd tmp
    tar -xf /programs/paml_mn.example.tar

The task list is stored in the file 'tasklist', which begins as follows (110 lines in total). Note that there is no '&' at the line ends!

    ~/tmp/runpaml.sh 961
    ~/tmp/runpaml.sh 914
    ~/tmp/runpaml.sh 971
    ~/tmp/runpaml.sh 974
    ~/tmp/runpaml.sh 970
    ~/tmp/runpaml.sh 948

Each PAML execution is done via the runpaml.sh script. It copies files from the home directory to a subdirectory of /workdir, executes PAML there, and then copies the results back to the home directory (removing leftover files from /workdir):

    #!/bin/bash
    # Work in a per-user subdirectory of the local /workdir
    cd /workdir
    if [ ! -e "$LOGNAME" ]
    then
        mkdir "$LOGNAME"
    fi
    cd "$LOGNAME"
    # Copy the input directory for gene $1 from the home directory
    cp -ar ~/tmp/results/"$1" .
    cd "$1"
    # Run PAML (codeml), sending stdout and stderr to the file 'log'
    ~/tmp/paml4.7/bin/codeml my.control >& log
    cd ..
    # Copy the results back and remove the local copy
    cp -arf "$1" ~/tmp/results/
    rm -rf "$1"

$LOGNAME is an environment variable always set to the user's login name. $1 represents the first argument of the script command line (the only one in this example).

To run the PAML simulations in parallel, execute the Perl parallel driver:

    /programs/bin/perlscripts/perl_fork_univ_mn.pl tasklist machines

I used 3 medium-memory machines in this example, with 24 cores (tasks) on each. Before running the driver, make sure you can connect to each machine using ssh from the "master" machine (the one on which you will run perl_fork_univ_mn.pl). The first time you ssh into a machine from the master, you will need to allow it to be added to "known hosts":

    [jarekp@cbsumm14 tmp]$ ssh cbsumm16
    The authenticity of host 'cbsumm16 (128.84.3.236)' can't be established.
    RSA key fingerprint is 4d:5d:d1:a0:b1:c4:cb:e0:5d:b5:03:e0:8b:e8:b4:5f.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'cbsumm16,128.84.3.236' (RSA) to the list of known hosts.
    [jarekp@cbsumm16 ~]$
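Before launching the driver, it may be worth sanity-checking the two input files and the ssh setup. The following is a minimal sketch and is not part of the BioHPC tools: the function names (total_slots, count_backgrounded, check_ssh) are hypothetical, and it assumes the machine file and task list formats described above. The ssh options BatchMode and ConnectTimeout are standard OpenSSH options; with BatchMode=yes, ssh fails instead of prompting, so a machine that still needs a password or a "known hosts" confirmation is reported as FAILED.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers; not part of perl_fork_univ_mn.pl.

# Sum the second column of the machine file (processes per machine)
# to get the total number of tasks that can run at the same time.
total_slots() {
    awk 'NF >= 2 { sum += $2 } END { print sum + 0 }' "$1"
}

# Count task lines that end with '&'. The driver expects none,
# so anything other than 0 means the task list needs fixing.
count_backgrounded() {
    awk '/&[[:space:]]*$/ { n++ } END { print n + 0 }' "$1"
}

# Try a non-interactive ssh login to every machine in the machine file.
# -n keeps ssh from consuming the loop's standard input.
check_ssh() {
    while read -r host rest; do
        [ -z "$host" ] && continue
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done < "$1"
}

# Example usage (run from ~/tmp on the master machine):
#   total_slots machines
#   count_backgrounded tasklist
#   check_ssh machines
```

For the three-machine file used in this example, total_slots machines prints 72, so up to 72 of the 110 tasks can run at the same time.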
