
DRMAA2 – An Open Standard for Job Submission and Cluster Monitoring
DANIEL GRUBER [email protected]

INTRODUCTION

DRMAA in a Nutshell

Why a Standard for Cluster Scheduler Access?
(Diagram: Generic and Simple Interface. Labels: No Vendor Lock-in; Stable Interface; Well Documented; Clearly defined Semantic; Simplify; Protect Investments; Portable Code; Future; Simple Migration; Save; Community)

Command Line versus Standardized API
• CLI offers most flexibility but is tied to:
  • DRM vendor
  • DRM version
• CLI cons
  • CLI is slow:
    • Creates process per request
    • Establishes communication channel, authentication, shutdown
  • No syntax checking / problems with outdated scripts
  • CLI output is hard to parse (error code / different formats / requires parser)
• API:
  • Well defined (simple) functions and output
  • Efficient: Usually same connection used during runtime

DESIGN OF DRMAA2

...a little Bit of History
• 2006 first implementation of DRMAA API (DRMAA1) in Sun Grid Engine and Condor
• Implementations available (not only for):
  • PBS, Torque, LoadLeveler, Moab, Apple Xgrid, ...
• 2009 Working on DRMAA2 started (ISC) based on public survey, customer feedback, experiences, ...
• 2012 Initial version of DRMAA2 finalized (IDL): GFD 194
• 2012 C language binding finalized: GFD 198

DRMAA versus DRMAA2
DRMAA Version 1                                  | DRMAA Version 2
Simple API (~40 functions)                       | Rich API (~100 functions)
Job submission / job workflow support            | Job submission / job workflow support
One job session (volatile) per application       | Multiple, concurrent, persistent job sessions per application
(only native specification for job submission)   | Extensible objects
-                                                | Advance reservations
-                                                | Cluster monitoring (machines, queues, non-DRMAA jobs)
-                                                | Notion of queues, slots, machines, job classes...
Several language bindings                        | ANSI C API standardized, Go available – others in progress
Widely adopted                                   | New interface

Basic Structure of DRMAA2

Design Goals
• Minimum set of functions which are supported by all major cluster schedulers:
  • Functionality which is not available everywhere or has different semantic is optional
  • Example: Deadline time
  • Semantic of queues not defined, but queues are available, etc.
• Relationship to other OGF standards: OCCI-DRMAA2 mapping, SAGA, GLUE 2.0
• Definition in abstract interface definition language (IDL)
  • Scope of functions / grouping of functions
  • Return values / error conditions
  • Clear semantic of functions

Job Session – Working with Jobs
• Create named job session ← persistent
• Destroy named job session ← does not affect jobs
• Open existing job session / close job session ← connection setup
• Get all jobs of session (filter as argument)
• Job state:
  • Job object: Get State
  • Job info object: Get State
• Waiting for job state:
  • WaitAnyStarted
  • WaitAnyTerminated

Job Session – Working with Jobs
• Job submission like in DRMAA1:
  • Allocate job template
  • Run jobs / run bulk jobs using job template
• With a job template you can define (incomplete):
  • Job: remoteCommand, args, and slots
  • Submission options: rerunnable, submitAsHold, workingDirectory, jobEnvironment, jobName, queueName, startTime, accountingId
  • Host selection: candidateMachines, minPhysMemory, machineOS, machineArch
  • Email options: email, emailOnStarted, emailOnTerminated
  • Limits: resourceLimits
  • Extensible!
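
The "Get all jobs of session (filter as argument)" item can be pictured with a small, binding-independent sketch. It assumes that the filter is a job-info-like record and that unset fields match any job. The names JobInfo, matches, and get_jobs are hypothetical and belong to no DRMAA2 binding; Python is used only to keep the illustration short.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class JobInfo:
    """Hypothetical stand-in for a job info record; None means 'unset'."""
    job_id: Optional[str] = None
    job_state: Optional[str] = None
    job_owner: Optional[str] = None
    queue_name: Optional[str] = None
    exit_status: Optional[int] = None


def matches(job: JobInfo, job_filter: Optional[JobInfo]) -> bool:
    """Return True if every field set in the filter equals the job's value."""
    if job_filter is None:  # assumption: no filter means "all jobs"
        return True
    for field in fields(JobInfo):
        wanted = getattr(job_filter, field.name)
        if wanted is not None and getattr(job, field.name) != wanted:
            return False
    return True


def get_jobs(jobs: List[JobInfo],
             job_filter: Optional[JobInfo] = None) -> List[JobInfo]:
    """Return the jobs that match the filter, keeping their original order."""
    return [job for job in jobs if matches(job, job_filter)]


if __name__ == "__main__":
    session_jobs = [
        JobInfo("1", "RUNNING", "alice", "all.q"),
        JobInfo("2", "DONE", "alice", "all.q", exit_status=0),
        JobInfo("3", "RUNNING", "bob", "all.q"),
    ]
    # Only job_state and job_owner are set, so only those two fields are compared.
    running_by_alice = get_jobs(
        session_jobs, JobInfo(job_state="RUNNING", job_owner="alice"))
    print([job.job_id for job in running_by_alice])
```

Reusing the record type as the filter means any attribute that can be reported about a job can also be used to select jobs, without a separate query language.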

Job Session – Working with Jobs
• Information about jobs:
  • Job object
  • Jobinfo object
• Job information object as filter:
  • Values of job info struct used as filter:
    • Job ID
    • Exit status
    • Termination signal
    • Job state
    • Job owner
    • Slots
    • Queue
    • Resource usage, …
Methods defined on jobs:
• drmaa2_j_get_id()
• drmaa2_j_get_jt()
• drmaa2_j_suspend() / resume() / hold() / release() / terminate()
• drmaa2_j_get_state()
• drmaa2_j_get_info()
• drmaa2_j_wait_started()
• drmaa2_j_wait_terminated()

Job Session – Working with Jobs
• Example: JobInfo as filter

Monitoring Session
• Open / close (no create / destroy)
• Get all jobs (if allowed by DRM security settings) with filter
  • No job manipulation
  • Job information / job state
• Get all machines
  • Machine object contains sockets, cores, hw. threads, load, memory
  • Extensible!
• Get all queues
  • Queue object contains name
  • Extensible!

Error Handling
• Defined as exceptions in IDL
• In C: Functions return error code or NULL in case of an error. Errors are stored in thread local storage to avoid issues in multi-threaded applications.
  • int drmaa2_lasterror(void)
  • drmaa2_string drmaa2_lasterror_text(void)

Dealing with Enhancements - Optional
• Optional functionality: Using DrmaaCapability interface (drmaa2_supports())

Dealing with Enhancements - Extensions
• Check and use data structure enhancements with the DrmaaReflective interface
  • Set instance value
  • Get instance value

LANGUAGE BINDINGS

C Language Binding
• OGF GFD-198
• Short (4 pages + C header) ← all semantic in IDL
• Adds high-level data structures:
  • Lists and Dictionaries
• Errata finalized at ISC:
  • Issue tracking: http://redmine.ogf.org
  • Naming inconsistency (jtemplate vs jt vs job_template)
  • Dict keys are strings…
  • Unset values are part of enum
  • Finalize function

Go Language Binding
• Go (#golang): Easy, compiled, fast, garbage collector, coroutines, closures, …
• Uses cgo to access DRMAA2 C binding
• Not yet a finalized standard – feedback welcome:
  • https://redmine.ogf.org/projects/drmaav2-go-binding/repository/revisions/master/raw/drmaa2-go.pdf
• Open source implementation (Apache license):
  • https://github.com/dgruber/drmaa2
• Example applications:
  • Simple multi-clustering tool (implements minimalistic web service API): http://github.com/dgruber/ubercluster

OUTLOOK

Give it a try ...
• Example implementation (wrapping OS calls) of DRMAA2:
  • https://github.com/troeger/drmaav2-mock
• DRMAA2 is included in the free downloadable version of Univa Grid Engine, limited to 48 cores (www.univa.com)
  • Contains all man pages and some examples
  • For compatibility and feature checks
• Vagrant installation recipe available: https://github.com/dgruber/vagrantGridEngine

DRMAA Working Group – Your Input is Required
• Hard work is done (syntax and semantic)
• Now we need your DRMAA2 implementation!
  • or create other language bindings based on C bindings!
• OGF: http://www.ogf.org
• Working group: http://www.drmaa.org
• Join (low traffic) mailing list: https://www.ogf.org/mailman/listinfo/drmaa-wg

Thank you very much for your attention!
Next event: Meet us at ISC in Frankfurt
Feel free to contact me here at HEPiX or at [email protected].