Package 'slurmR'
September 3, 2021

Title A Lightweight Wrapper for 'Slurm'

Version 0.5-1

Description 'Slurm', Simple Linux Utility for Resource Management
     <https://slurm.schedmd.com/>, is a popular 'Linux'-based software used to
     schedule jobs in 'HPC' (High Performance Computing) clusters. This R
     package provides a specialized lightweight wrapper of 'Slurm' with a
     syntax similar to that found in the 'parallel' R package. The package
     also includes a method for creating socket cluster objects spanning
     multiple nodes that can be used with the 'parallel' package.

Depends R (>= 3.3.0), parallel

License MIT + file LICENSE

BugReports https://github.com/USCbiostats/slurmR/issues

URL https://github.com/USCbiostats/slurmR, https://slurm.schedmd.com/

Encoding UTF-8

RoxygenNote 7.1.1

Suggests knitr, rmarkdown, covr, tinytest

Imports utils

VignetteBuilder knitr

Language en-US

NeedsCompilation no

Author George Vega Yon [aut, cre] (<https://orcid.org/0000-0002-3171-0844>),
     Paul Marjoram [ctb, ths] (<https://orcid.org/0000-0003-0824-7449>),
     National Cancer Institute (NCI) [fnd] (Grant Number 5P01CA196569-02),
     Michael Schubert [rev] (JOSS reviewer, <https://orcid.org/0000-0002-6862-5221>),
     Michel Lang [rev] (JOSS reviewer, <https://orcid.org/0000-0001-9754-0393>)

Maintainer George Vega Yon <[email protected]>

Repository CRAN

Date/Publication 2021-09-03 04:20:02 UTC

R topics documented:

     expand_array_indexes
     JOB_STATE_CODES
     makeSlurmCluster
     new_rscript
     opts_slurmR
     parse_flags
     random_job_name
     read_sbatch
     slurmR
     slurmr_docker
     slurm_available
     Slurm_clean
     Slurm_collect
     Slurm_env
     Slurm_EvalQ
     slurm_job
     Slurm_log
     Slurm_Map
     snames
     sourceSlurm
     status
     the_plan
     wait_slurm
     WhoAmI

expand_array_indexes          Expand Array Indexes

Description

     When submitting array jobs using sbatch, users can specify indices in
     several ways. These can be specified, for example, as ranges ("1-9"), as
     lists ("1,2,5"), or as intervals with a step size ("1-7:3"), which
     translates into "1, 4, 7". This function expands those cases.

Usage

     expand_array_indexes(x)

Arguments

     x    A character vector. Array indexes (see Details).

Details

     x is assumed to be of the form [jobid](_[array expression]), where the
     expression after the underscore is optional. The function returns an
     expanded version of this, e.g., if x = "8123_[1,3-6]" the resulting
     expression is the vector "8123_1", "8123_3", "8123_4", "8123_5", and
     "8123_6".

     This function was developed mainly to be used internally.

Value

     A character vector with the expanded indices.

Examples

     expand_array_indexes(c("512", "123_1", "55_[1-5]", "122_[1, 5-6]", "44_[1-3:2]"))
     # [1] "512"   "123_1" "55_1"  "55_2"  "55_3"  "55_4"  "55_5"
     #     "122_1" "122_5" "122_6" "44_1"  "44_3"

JOB_STATE_CODES               Slurm Job state codes

Description

     This data frame contains information regarding the job state codes that
     Slurm returns when querying the status of a given job. The last column,
     type, describes how the corresponding state is treated in the package's
     various operations. It is used by the function status.

Usage

     JOB_STATE_CODES

Format

     A data frame with 24 rows and 4 columns.

References

     Slurm's website: https://slurm.schedmd.com/squeue.html
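     For instance, assuming the dataset is available directly after attaching
     the package (as the Usage entry suggests), it can be inspected like any
     other data frame; only the type column documented above is referenced:

     library(slurmR)

     # Peek at the first few Slurm job state codes
     head(JOB_STATE_CODES)

     # Tally the state codes by the type column used by status()
     table(JOB_STATE_CODES$type)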
makeSlurmCluster              Create a Parallel Socket Cluster using Slurm

Description

     This function is essentially a wrapper of parallel::makePSOCKcluster.
     makeSlurmCluster's main feature is adding node addresses.

Usage

     makeSlurmCluster(
       n,
       job_name = random_job_name(),
       tmp_path = opts_slurmR$get_tmp_path(),
       cluster_opt = list(),
       max_wait = 300L,
       verb = TRUE,
       ...
     )

     ## S3 method for class 'slurm_cluster'
     stopCluster(cl)

Arguments

     n            Integer scalar. Size of the cluster object (see Details).

     job_name     Character. Name of the job to be passed to Slurm.

     tmp_path     Character. Path to the directory where all the data
                  (including scripts) will be stored. Notice that this path
                  must be accessible by all the nodes in the network (see
                  opts_slurmR).

     cluster_opt  A list of arguments passed to parallel::makePSOCKcluster.

     max_wait     Integer scalar. Wait time before exiting with an error
                  while trying to read the nodes' information.

     verb         Logical scalar. If TRUE, the function will print messages
                  on screen reporting on the status of the job submission.

     ...          Further arguments passed to Slurm_EvalQ via sbatch_opt.

     cl           An object of class slurm_cluster.

Details

     By default, if the time option is not specified via ..., then it is set
     to the value 01:00:00, that is, 1 hour.

     Once a job is submitted via Slurm, the user gets access to the nodes
     associated with it, which allows users to start new processes within
     those nodes. By this means, we can create Socket, also known as "PSOCK",
     clusters across nodes in a Slurm environment. The names of the hosts are
     retrieved and passed later on to parallel::makePSOCKcluster.

     It has been the case that R fails to create the cluster with the
     following message in the Slurm log file:

         srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and
         SLURM_MEM_PER_NODE are mutually exclusive

     In such cases, setting the memory upfront, for example, can solve the
     problem:

         cl <- makeSlurmCluster(20, mem = 20)

     If the problem persists, i.e., the cluster cannot be created, make sure
     that your Slurm cluster allows Socket connections between nodes.

     The stopCluster method for slurm_cluster stops the cluster by doing the
     following:

     1. It closes the connection by calling the stopCluster method for PSOCK
        objects.

     2. It cancels the Slurm job using scancel.

Value

     An object of class c("slurm_cluster", "SOCKcluster", "cluster"). It is
     the same as what is returned by parallel::makePSOCKcluster, with the
     main difference that it has two extra attributes:

     • SLURM_JOBID: the id of the job that initialized that cluster.

Maximum number of connections

     By default, R limits the number of simultaneous connections (see this
     thread in R-sig-hpc:
     https://stat.ethz.ch/pipermail/r-sig-hpc/2012-May/001373.html). The
     current maximum is 128 (R version 3.6.1). To modify that limit, you
     would need to reinstall R, updating the macro NCONNECTIONS in the file
     src/main/connections.c.

     For now, if the user sets n above 128, they will get an immediate
     warning pointing to this issue, in particular, noting that the cluster
     object may not be created.

Examples

     ## Not run:
     # Creating a cluster with 100 workers/offspring/child R sessions
     cl <- makeSlurmCluster(100)

     # Computing the mean of 100 random uniforms within each worker.
     # For this we can use any of the functions available in the parallel
     # package.
     ans <- parSapply(cl, 1:200, function(x) mean(runif(100)))

     # We simply call stopCluster as we would do with any other cluster
     # object
     stopCluster(cl)

     # We can also specify SBATCH options directly (...)
     cl <- makeSlurmCluster(200, partition = "thomas", time = "02:00:00")
     stopCluster(cl)

     ## End(Not run)

new_rscript                   General purpose function to write R scripts

Description

     This function creates an object of class slurmR_rscript that can be used
     to write the R component of a batch job.

Usage

     new_rscript(
       njobs,
       tmp_path,
       job_name,
       pkgs = list_loaded_pkgs(),
       libPaths = .libPaths()
     )

Arguments

     njobs        Integer. Number of jobs to use in the job array. This
                  specifies the number of R sessions to initialize, not the
                  number of cores to be used.

     tmp_path     Character. Path to the directory where all the data
                  (including scripts) will be stored. Notice that this path
                  must be accessible by all the nodes in the network (see
                  opts_slurmR).

     job_name     Character. Name of the job to be passed to Slurm.

     pkgs         A named list with packages to be included. Each element of
                  the list must be a path to the R library, while the names
                  of the list are the names of the R packages to be loaded.

     libPaths     A character vector. See .libPaths.

Value

     An environment of class slurmR_rscript. This has the following
     accessible components (a usage sketch follows the list):

     • add_rds: Adds rds files to be loaded in each job. x is a named list
       with the objects that should be loaded in the jobs. If index = TRUE,
       the function assumes that the user will be accessing a particular
       subset of x during the job, which is accessed according to
       INDICES[[ARRAY_ID]]. The option compress is passed to saveRDS. One
       important side effect is that when this function is called, the object
       is saved in the current job directory, that is,
       opts_slurmR$get_tmp_path().

     • append: Adds a line to the R script. Its only argument, x, is a
       character vector that will be added to the R script.

     • rscript: A character vector. This is the actual R script that will be
       written at the end.

     • finalize: Adds the final line of the R script. This function receives
       a character scalar x which is used as the name of the object to be
       saved. If missing, the function will save a NULL object. The compress
       argument is passed to saveRDS.

     • set_seed: Adds a vector of seeds to be used across the jobs. This
       vector of seeds should be of length njobs. The other two parameters of
       the function are passed to set.seed. By default the seed is picked as
       follows:

           seeds <- sample.int(.Machine$integer.max, njobs, replace = FALSE)

     • write: Finalizes the process by writing the R script in the
       corresponding folder to be used with Slurm.
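     To make the list above concrete, here is a hypothetical sketch that uses
     only the components documented above (append, finalize, rscript). The
     appended line, the temporary path, and the job name are illustrative
     placeholders; in practice tmp_path must be visible to every node.

     library(slurmR)

     # Placeholder path and job name for a 4-job array
     rs <- new_rscript(
       njobs    = 4L,
       tmp_path = tempdir(),
       job_name = "example-job"
     )

     # Append arbitrary lines of R code to the script
     rs$append("ans <- mean(runif(1e4))")

     # Close the script, saving an object named 'ans' at the end of each job
     rs$finalize("ans")

     # Inspect the script built so far
     rs$rscript

     # rs$write() would then write the script into the job folder under
     # tmp_path so it can be used with Slurm (folder creation is normally
     # handled by higher-level functions such as Slurm_EvalQ).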
opts_slurmR                   Get and set default options for sbatch and slurmR internals

Description

     Most of the functions in the slurmR package use tmp_path and job-name
     options to write and submit jobs to Slurm. These options have global
     defaults that are set and retrieved using opts_slurmR. The options also
     include SBATCH options and things to do before calling Rscript, e.g.,
     loading modules on an HPC cluster.

Usage

     opts_slurmR

Format

     An object of class opts_slurmR of length 17.

Details

     Whatever the path specified in tmp_path, all nodes should have access to
     it. Moreover, it is recommended to use a path located on a
     high-performing drive. See, for example, disk staging.
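     As a sketch, the defaults can be queried as follows. get_tmp_path() is
     the accessor that appears in makeSlurmCluster's Usage above; the setter
     calls are assumptions about companion components and are therefore left
     commented out.

     library(slurmR)

     # Current temporary path used when writing job files
     opts_slurmR$get_tmp_path()

     # List every component exposed by the opts_slurmR object
     names(opts_slurmR)

     # Assumed companion setters (names not documented in this excerpt);
     # uncomment only after checking names(opts_slurmR):
     # opts_slurmR$set_tmp_path("/staging/myuser/slurmR-jobs")
     # opts_slurmR$set_opts(partition = "standard", time = "01:00:00")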