DRMAA2 – an Open Standard for Job Submission and Cluster Monitoring

DANIEL GRUBER [email protected]

INTRODUCTION

DRMAA in a Nutshell

Why a Standard for Cluster Scheduler Access?
• Generic and simple interface
• No vendor lock-in
• Stable interface
• Well documented
• Clearly defined semantics
• Simplify
• Protect investments
• Portable code
• Future-safe
• Simple migration
• Community

Command Line versus Standardized API
• CLI offers the most flexibility but is tied to:
  • DRM vendor
  • DRM version
• CLI cons:
  • CLI is slow:
    • Creates a process per request
    • Establishes communication channel, authentication, shutdown
  • No syntax checking / problems with outdated scripts
  • CLI output is hard to parse (error code / different formats / requires parser)
• API:
  • Well defined (simple) functions and output
  • Efficient: usually the same connection is used during runtime

DESIGN OF DRMAA2

...a little Bit of History
• 2006: first implementation of DRMAA API (DRMAA1) in Sun Grid Engine and Condor
• Implementations available (not only for):
  • PBS, Torque, LoadLeveler, Moab, Apple Xgrid, ...
• 2009: Work on DRMAA2 started (ISC) based on public survey, customer feedback, experiences, ...
• 2012: Initial version of DRMAA2 finalized (IDL): GFD 194
• 2012: C language binding finalized: GFD 198

DRMAA versus DRMAA2

DRMAA Version 1                                  | DRMAA Version 2
Simple API (~40 functions)                       | Rich API (~100 functions)
Job submission / job workflow support            | Job submission / job workflow support
One job session (volatile) per application       | Multiple, concurrent, persistent job sessions per application
(only native specification for job submission)   | Extensible objects
-                                                | Advance reservations
-                                                | Cluster monitoring (machines, queues, non-DRMAA jobs)
-                                                | Notion of queues, slots, machines, job classes...
Several language bindings                        | ANSI C API standardized, Go available – others in progress
Widely adopted                                   | New interface

Basic Structure of DRMAA2

Design Goals
• Minimum set of functions which are supported by all major cluster schedulers:
  • Functionality which is not available everywhere or has different semantics is optional
  • Example: deadline time
  • Semantics of queues not defined, but queues are available, etc.
• Relationship to other OGF standards: OCCI-DRMAA2 mapping, SAGA, GLUE 2.0
• Definition in abstract interface definition language (IDL):
  • Scope of functions / grouping of functions
  • Return values / error conditions
  • Clear semantics of functions

Job Session – Working with Jobs
• Create named job session ← persistent
• Destroy named job session ← does not affect jobs
• Open existing job session / close job session ← connection setup
• Get all jobs of session (filter as argument)
• Job state:
  • Job object: Get State
  • Job info object: Get State
• Waiting for job state:
  • WaitAnyStarted
  • WaitAnyTerminated

Job Session – Working with Jobs
• Job submission like in DRMAA1:
  • Allocate job template
  • Run jobs / run bulk jobs using job template
• With a job template you can define (incomplete):
  • Job: remoteCommand, args, and slots
  • Submission options: rerunnable, submitAsHold, workingDirectory, jobEnvironment, jobName, queueName, startTime, accountingId
  • Host selection: candidateMachines, minPhysMemory, machineOS, machineArch
  • Email options: email, emailOnStarted, emailOnTerminated
  • Limits: resourceLimits
  • Extensible!
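
The submission flow listed above (allocate a job template, fill in a few attributes, run the job in a session, then look for a job in a given state) can be sketched with a toy in-memory model. This is not the DRMAA2 C binding: every toy_ type and function below is hypothetical and exists only for illustration, and the "scheduler" is a stand-in that simply advances job states. A real program would use the functions declared in the GFD 198 C header instead.

```c
#include <stddef.h>

/* Toy model only: all toy_* names are hypothetical, not DRMAA2 API. */

#define TOY_MAX_JOBS 16

typedef enum { TOY_QUEUED, TOY_QUEUED_HELD, TOY_RUNNING, TOY_DONE } toy_state;

/* A few of the job template attributes named on the slide.
 * A NULL pointer means "unset". */
typedef struct {
    const char *remote_command; /* remoteCommand */
    const char *job_name;       /* jobName */
    const char *queue_name;     /* queueName */
    int submit_as_hold;         /* submitAsHold */
} toy_jtemplate;

typedef struct {
    int id;
    toy_state state;
    toy_jtemplate jt; /* copy of the template used at submission */
} toy_job;

/* A named job session that owns the jobs submitted through it. */
typedef struct {
    const char *name;
    toy_job jobs[TOY_MAX_JOBS];
    size_t njobs;
} toy_jsession;

/* Submit one job described by the template.
 * Returns NULL on error (bad arguments, missing command, session full). */
toy_job *toy_run_job(toy_jsession *js, const toy_jtemplate *jt)
{
    if (js == NULL || jt == NULL || jt->remote_command == NULL)
        return NULL;
    if (js->njobs == TOY_MAX_JOBS)
        return NULL;

    toy_job *j = &js->jobs[js->njobs];
    j->id = (int)js->njobs + 1;
    j->state = jt->submit_as_hold ? TOY_QUEUED_HELD : TOY_QUEUED;
    j->jt = *jt;
    js->njobs++;
    return j;
}

/* Stand-in for the scheduler: queued jobs start, running jobs finish,
 * held jobs stay where they are. */
void toy_scheduler_tick(toy_jsession *js)
{
    for (size_t i = 0; i < js->njobs; i++) {
        toy_job *j = &js->jobs[i];
        if (j->state == TOY_RUNNING)
            j->state = TOY_DONE;
        else if (j->state == TOY_QUEUED)
            j->state = TOY_RUNNING;
    }
}

/* Return the first job of the session in the wanted state, or NULL. */
toy_job *toy_find_any(toy_jsession *js, toy_state wanted)
{
    for (size_t i = 0; i < js->njobs; i++)
        if (js->jobs[i].state == wanted)
            return &js->jobs[i];
    return NULL;
}
```

A caller would zero-initialize a toy_jsession, fill a toy_jtemplate with at least remote_command, call toy_run_job(), and then use toy_find_any() with TOY_RUNNING or TOY_DONE, loosely mirroring the WaitAnyStarted / WaitAnyTerminated idea without any blocking.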

Job Session – Working with Jobs
• Information about jobs:
  • Job object
  • JobInfo object
• Job information object as filter:
  • Values of job info struct used as filter:
    • Job ID
    • Exit status
    • Termination signal
    • Job state
    • Job owner
    • Slots
    • Queue
    • Resource usage, …
• Methods defined on jobs:
  • drmaa2_j_get_id()
  • drmaa2_j_get_jt()
  • drmaa2_j_suspend() / resume() / hold() / release() / terminate()
  • drmaa2_j_get_state()
  • drmaa2_j_get_info()
  • drmaa2_j_wait_started()
  • drmaa2_j_wait_terminated()

Job Session – Working with Jobs
• Example: JobInfo as filter

Monitoring Session
• Open / close (no create / destroy)
• Get all jobs (if allowed by DRM security settings) with filter
  • No job manipulation
  • Job information / job state
• Get all machines
  • Machine object contains sockets, cores, hw. threads, load, memory
  • Extensible!
• Get all queues
  • Queue object contains name
  • Extensible!

Error Handling
• Defined as exceptions in IDL
• In C: functions return an error code or NULL in case of an error. Errors are stored in thread-local storage to avoid issues in multi-threaded applications.
  • int drmaa2_lasterror(void)
  • drmaa2_string drmaa2_lasterror_text(void)

Dealing with Enhancements - Optional
• Optional functionality: using the DrmaaCapability interface (drmaa2_supports())

Dealing with Enhancements - Extensions
• Check and use data structure enhancements with the DrmaaReflective interface
  • Set instance value
  • Get instance value

LANGUAGE BINDINGS

C Language Binding
• OGF GFD-198
• Short (4 pages + C header) ← all semantics in IDL
• Adds high-level data structures:
  • Lists and dictionaries
• Errata finalized at ISC:
  • Issue tracking: http://redmine.ogf.org
  • Naming inconsistency (jtemplate vs jt vs job_template)
  • Dict keys are strings…
  • Unset values are part of enum
  • Finalize function

Go Language Binding
• Go (#golang): easy, compiled, fast, garbage collector, coroutines, closures, …
• Uses cgo to access DRMAA2 C binding
• Not yet a finalized standard – feedback welcome:
  • https://redmine.ogf.org/projects/drmaav2-go-binding/repository/revisions/master/raw/drmaa2-go.pdf
• Open source implementation (Apache license):
  • https://github.com/dgruber/drmaa2
• Example applications:
  • Simple multi-clustering tool (implements minimalistic web service API): http://github.com/dgruber/ubercluster

OUTLOOK

Give it a try ...
• Example implementation (wrapping OS calls) of DRMAA2:
  • https://github.com/troeger/drmaav2-mock
• DRMAA2 is included in the free, downloadable, 48-core-limited version of Univa Grid Engine (www.univa.com)
  • Contains all man pages and some examples
  • For compatibility and feature checks
• Vagrant installation recipe available: https://github.com/dgruber/vagrantGridEngine

DRMAA Working Group – Your Input is Required
• Hard work is done (syntax and semantics)
• Now we need your DRMAA2 implementation!
  • or create other language bindings based on the C binding!
• OGF: http://www.ogf.org
• Working group: http://www.drmaa.org
• Join the (low traffic) mailing list: https://www.ogf.org/mailman/listinfo/drmaa-wg

Thank you very much for your attention!
Next event: Meet us at ISC in Frankfurt.
Feel free to contact me here at HEPiX or at [email protected].
