Condor Via Developer Apis/Plugins
Extend/alter Condor via developer APIs/plugins
CERN, Feb 14 2011
Todd Tannenbaum
Condor Project, Computer Sciences Department, University of Wisconsin-Madison

Some classifications
Application Program Interfaces (APIs)
› Job Control
› Operational Monitoring
Extensions

www.cs.wisc.edu/Condor

Job Control APIs
The biggies:
› Command Line Tools
› DRMAA
› Condor DBQ
› Web Service Interface (SOAP)
    http://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=SoapWisdom

Command Line Tools
› Don’t underestimate them!
› Your program can create a submit file on disk and simply invoke condor_submit:
    system("echo universe=VANILLA > /tmp/condor.sub");
    system("echo executable=myprog >> /tmp/condor.sub");
    . . .
    system("echo queue >> /tmp/condor.sub");
    system("condor_submit /tmp/condor.sub");

Command Line Tools
› Your program can create a submit file and give it to condor_submit through stdin:
    PERL:
      # open() with a leading "|" starts the command and pipes our output to it
      open(SUBMIT, "|condor_submit");
      print SUBMIT "universe=VANILLA\n";
      . . .
    C/C++:
      /* POSIX popen() returns a FILE* and takes mode "r" or "w"; use "w" to feed stdin */
      FILE *s = popen("condor_submit", "w");
      fputs("universe=VANILLA\n", s);
      . . .
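The same stdin approach can be sketched in Python, one of the languages the talk lists as able to drive the command line tools. This is a minimal sketch, not code from the talk: the helper names build_submit_description and submit_via_stdin are hypothetical, and it assumes condor_submit is on the PATH and reads the submit description from standard input when no file is given, as the slide above describes.

```python
import subprocess


def build_submit_description(executable, log="job.log"):
    """Return submit-file text for a single vanilla-universe job (hypothetical helper)."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"log = {log}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def submit_via_stdin(description, submit_cmd=("condor_submit",)):
    """Pipe the description to the submit command's stdin.

    Returns (exit_code, stdout). submit_cmd is a parameter so the pipe
    can be exercised without a Condor installation; by default it assumes
    condor_submit is on the PATH.
    """
    proc = subprocess.run(
        list(submit_cmd),
        input=description,      # sent to the child's stdin
        capture_output=True,
        text=True,
        check=False,            # let the caller inspect the exit code
    )
    return proc.returncode, proc.stdout
```

A caller would run submit_via_stdin(build_submit_description("/bin/hostname")) and check the returned exit code before trusting the output. No temporary file is needed, which avoids the cleanup and naming problems of the /tmp/condor.sub approach.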

Command Line Tools
› Using the +Attribute with condor_submit:
    universe = VANILLA
    executable = /bin/hostname
    output = job.out
    log = job.log
    +webuser = "zmiller"
    queue

Command Line Tools
› Use -constraint and -format with condor_q:
    % condor_q -constraint 'webuser=="zmiller"'
    -- Submitter: bio.cs.wisc.edu : <128.105.147.96:37866> : bio.cs.wisc.edu
     ID        OWNER    SUBMITTED    RUN_TIME   ST PRI SIZE CMD
     213503.0  zmiller  10/11 06:00  0+00:00:00 I  0   0.0  hostname

    % condor_q -constraint 'webuser=="zmiller"' -format "%i\t" ClusterId -format "%s\n" Cmd
    213503  /bin/hostname

Command Line Tools
› condor_wait will watch a job log file and wait for a certain job (or all jobs) to complete:
    system("condor_wait job.log");
› can specify a timeout

Command Line Tools
› condor_q and condor_status have an -xml option
› So it is relatively simple to build on top of Condor’s command line tools alone, and they can be accessed from many different languages (C, Perl, Python, PHP, etc.).
› However…

DRMAA
› DRMAA is an OGF-standardized job-submission API
› Has C (and now Java) bindings
› Is not Condor-specific
› SourceForge Project
    http://sourceforge.net/projects/condor-ext

DRMAA
› Unfortunately, the DRMAA 1.x API does not support some very important features, such as:
    Fault tolerance
    Transactions

Condor Database Queue (Condor DBQ)
› Layer on top of Condor
› Relational database interface to
    Submit work to Condor
    Monitor status of submission
    Monitor status of individual jobs
› Perfect for applications that
    Submit jobs to Condor
    Already use a database

Web App Before Condor DBQ
[Diagram: the web application submits jobs to the schedd (SOAP or cmd line interface), checks status (job log file, SOAP, or cmd line interface), and reads/writes app data in the DBMS app tables; the schedd feeds the Condor pool and the user log. Callouts: "Crash!!!", "Non-Trivial Code", and "You did implement two phase commit and recovery, to get run once semantics, right?"]

Web App After Condor DBQ
[Diagram: the web application submits jobs and checks status with single, transactional SQL statements against the DBMS (app tables, work table, job table) and reads/writes its app data there. condor_dbq checks the work table for new work, submits jobs to the schedd (cmd line), gets job updates from the user log, and updates status in the job table; the schedd feeds the Condor pool.]

Benefits of Condor DBQ
› Natural simple SQL API
    Submit work
        insert into work values(condor-submit-file)
    Check status
        select * from jobs where work_id = id
› Transactions/Consistency come for free
› DBMS performs crash recovery

Condor DBQ Limitations
› Overrides log file location
› All jobs submitted as same user
› Dagman not supported
› Only Vanilla and Standard universe jobs supported (others are unknown)
› Currently only supports PostgreSQL

Web Service Interface
› Simple Object Access Protocol
    Mechanism for doing RPC using XML (typically over HTTP or HTTPS)
    A World Wide Web Consortium (W3C) standard
› SOAP Toolkit: Transform a WSDL to a client library

Benefits of a Condor SOAP API
› Can be accessed with standard web service tools
› Condor accessible from platforms where its command-line tools are not supported
› Talk to Condor with your favorite language and SOAP toolkit

Condor SOAP API functionality
› Get basic daemon info (version, platform)
› Submit jobs
› Retrieve job output
› Remove/hold/release jobs
› Query machine status
› Advertise resources
› Query job status

Getting machine status via SOAP
[Diagram: your program, through a SOAP library, calls queryStartdAds() on the condor_collector using SOAP over HTTP and receives the machine list.]

Some classifications
Application Program Interfaces (APIs)
› Job Control
› Operational Monitoring
Extensions

Operational Monitoring APIs
› Via Web Services (SOAP)
› Via Relational Database: Quill
    Job, Machine, and Matchmaking data echoed into PostgreSQL RDBMS
› Via a file: the Event Log
    Just like the job log, but has events for all jobs submitted to a schedd.
    Structured journal of job events
    Sample code in C++ to read/parse these events
› Via Enterprise messaging: Condor AMQP
    Event Log events echoed, in a highly reliable manner, into Qpid, an AMQP message broker
    https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=CondorPigeon

Some classifications
Application Program Interfaces (APIs)
› Job Control
› Operational Monitoring
Extensions

Extending Condor
› APIs: How to interface w/ Condor
› Extensions: Changing Condor’s behavior
    Hooks
    Plugins

Job Wrapper Hook
› Allows an administrator to specify a “wrapper” script to handle the execution of all user jobs
› Set via condor_config “USER_JOB_WRAPPER”
› Wrapper runs as the user, command-line args are passed, and the machine & job ads are available.
› Errors can be propagated to the user.
› Example: condor_limits_wrapper.sh

Job Fetch & Prepare Hooks
› Job Fetch hooks
    Call outs from the condor_startd
    Extend claiming
    Normally jobs are pushed from schedd to startd; now jobs can be “pulled” from anywhere
› Job Running Hooks
    Call outs from the condor_starter
    Transform the job classad
    Perform any other pre/post logic

Sidebar: “Toppings”
If work arrived via fetch hook “foo”, then prepare hooks “foo” will be used.
What if an individual job could specify a job prepare hook to use???
The prepare hook to use can alternatively be specified in the job classad via the attribute “HookKeyword”.
How cool is that???

Toppings: Simple Example
› In condor_config:
    ANSYS_HOOK_PREPARE_JOB = \
        $(LIBEXEC)/ansys_prepare_hook.sh
› Contents of ansys_prepare_hook.sh:
    #!/bin/sh
    # Read and discard the job classad
    cat >/dev/null
    # Attributes written to stdout update the job classad
    echo 'Cmd="/usr/local/bin/ansys"'

Topping Example, cont.
› In job submit file:
    universe=vanilla
    executable=whatever
    arguments=…
    +HookKeyword="ANSYS"
    queue

Configuration Hook
› Instead of reading from a file, run a program to generate Condor config settings
› Append “|” to CONDOR_CONFIG or LOCAL_CONFIG_FILE. Example:
    LOCAL_CONFIG_FILE = \
        /opt/condor/sbin/make_config|

File Transfer Hooks
› Allows the administrator to configure hooks for handling URLs during Condor's file transfer
› Enables transfer from third party directly to execute machine, which can offload traffic from the submit point
› Can be used in a number of clever ways

File Transfer Hooks
› API is extremely simple
› Must support being invoked with the “-classad” option to advertise its abilities:
    #!/usr/bin/env perl
    # Called with -classad: describe this plugin and exit
    if ($ARGV[0] eq "-classad") {
        print "PluginType = \"FileTransfer\"\n";
        print "SupportedMethods = \"http,ftp,file\"\n";
        exit 0;
    }

File Transfer Hooks
› When invoked normally, a plugin simply transfers the URL (first argument) into filename (second argument)
    # list-form system() bypasses the shell, so quoting is not an issue
    system("curl", $ARGV[0], "-o", $ARGV[1]);
    # $? is the raw wait status; the command's exit code is in the high byte
    $retval = $? >> 8;
    exit $retval;

File Transfer Hooks
› In the condor_config file, the administrator lists the transfer hooks that can be used
› Condor invokes each one to find out its abilities
› If something that looks like a URL is added to the list of input files, the plugin is invoked on the execute machine

File Transfer Hooks
› condor_config:
    FILETRANSFER_PLUGINS = curl_plugin, hdfs_plugin, gdotorg_plugin, rand_plugin
› Submit file:
    transfer_input_files = normal_file, http://cs.wisc.edu/~zkm/data_file, rand://1024/random_kilobyte

File Transfer Hooks
› As you can see, the format of the URL is relatively arbitrary and is interpreted by the hook
› This allows for tricks like rand://, blastdb://, data://, etc.

Plugins

Plugins
› Shared Library Plugins
    Get mapped right into the process space of the Condor Services!
    May not block!
    Must be thread safe!
    General and ClassAd Functions
› Condor ClassAd Function Plugin
    Add custom built-in functions to the ClassAd Language.
    Via condor_config “CLASSAD_LIB_PATH”
    Cleverly used by SAMGrid

General Plugins
› In condor_config, use “PLUGINS” or “PLUGIN_DIR”.
› Very good idea to do: SUBSYSTEM.PLUGIN or SUBSYSTEM.PLUGIN_DIR
› Implement C++ child class, and Condor will call methods at the appropriate times.
› Some