Workflow Engines: Why So Many?

Hook Hua (NASA/JPL) ESIP Informaon Technology and Interoperability Rants and Raves Webinar Series

Wednesday, April 7, 2010 Overview

1. So many workflows 2. Workflow management 3. Example workflow engines 4. Earth Science processing – Workflow paerns – Useful features 5. So why so many?

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 2 Workflow Engines

Packet Data • Facilitates the flow of Telemetry data informaon, tasks, and Level 0 PGE Instrument science data events Level 1A PGE • Calibrated data Provides method of Level 1B PGE orchestrang individual Resampled data execuon units Level 2 PGE • Management of control Derived geophysical parameters flow and data flow Level 3 PGE • Connects distributed Gridded data models • Codify producon rules / policies

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 3 Increasingly being used in Earth science processing ARE THERE ANY CONSISTENTLY POPULAR WORKFLOW ENGINES IN USE?

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 4 Duopolies and Oligopolies? • “a market form in which a market or industry is dominated by a small number of sellers.”* • The four‐firm concentraon rao – Verizon, AT&T, Sprint Nextel, and T‐Mobile – Sony Music Entertainment, Universal Music Group, Warner Music Group, and EMI – JDeveloper, Eclipse, NetBeans, and IntelliJ IDEA • Duopolies – Visa and Mastercard – Java and C# – Airbus and Boeing – Python and Ruby – ATI and Nvidia – Matlab and IDL – Intel and AMD – HDF and NetCDF – Oracle and MySQL

* hp://en.wikipedia.org/wiki/Oligopoly

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 5 What About Workflow Engines? • Freefluo • AcveBPEL • PXE • Galaxia • Anlow • ruote (ruby) • Imixs IX Workflow • Apache Agila • RUNA WFE • jawflow • Apache ODE • Sarasva • JBoss jBPM • Beexee • SciFlo • JFlower • Bonita • Swish • JFolder • Bossa • Syrup • kbee.workflow • BpmScript • Taverna • MATIS (jBPM) • Carnot • Triana • Micro‐Flow • con:cern • Tobflow • Microso Windows Workflow • • Web and Flo / Konnuum Dalma Foundaon • • Werkflow Eclipse Java Workflow • ObjectWeb Bonita Tooling • WfMOpen • Open Business Engine • Modeling Workflow • Wilos • OpenSymphony OSWorkflow Engine (MWE) • Workpoint • OpenWFE • Enhydra Shark • XFlow • Pegasus • FlowMind • YAWL • Phoenix Integraon PHX • Flux • ModelCenter Zebra

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 6 Why So Many? • No general dominang workflow engine • Most can exec processes • Many support invoking web services • Many wrien in Java • Many target business processes • Others target scienfic processes • Many support control logic • Many are derivaves of other implementaons

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 7 Workflow Management

Design and Implementaon of a Workflow Engine, Sebasan Bergmann, IAI‐TR‐2007‐5, September 2007.

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 8 Workflow Definion Language Models • Dataflow model / Enty‐based – The workflow is constructed from data processing and data transport (processors and data links). – Directed graphs – Natural for scienfic workflows – E.g. Simple Conceptual Unified Flow Language (Scufl) • Process‐centric model / Acvity‐based – The nodes in the workflows are acvies and the “data” passed between them form a control system rather than being a genuine flow of messages. – “State transions” – Natural for business processes – E.g. Business Process Execuon Language (BPEL)

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 9 Workflow Engines SOME EXAMPLES

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 10 Example BPEL‐based Workflow Engines

• Apache ODE (Orchestraon Director Engine) • OASIS WS‐BPEL 2.0 standard /compability for BPEL4WS 1.1

• Enterprise business process orchestraon and BPM. • Coordinate people, applicaon and services

• Automate and streamline the intricate processes in the enterprise. • Torque open source resource manager • Job scheduling, File Transfer, Workflow and business process management (BPM) engine

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 11 GMU’s GeoBrain

• BPEL‐based chaining from web BPELPower applicaon servers

GeoBrain Online • Automated data access, management, visualizaon, Analysis System analysis, and workflow composion (GeOnAS) • Demoed automated service workflow composion

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 12 Multi-mission Automated Task Invocation System (MATIS)

• A distributed workflow manager used for automated product generaon. • Built from jBPM (jBoss Business Process Management) – Based on BPEL • Used in JPL producon missions – Phoenix and Diviner – Future: MCS and MSL • Consists of – a mul‐mission core workflow component (JBoss jBPM)

– a project‐specific adaptaon MATIS Monitor

BPEL Editor

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 13 Taverna Workbench

• An open source tool for designing and execung workflows created by the myGrid project and funded through the OMII‐UK. • Supports nesng of workflows and parallel execuon • Vectorizaon/iteraon – Dot product and cross product enumeraons • SCUFL2 • Mature • “fault tolerant” • myExperiment Collaboraon • GUI workflow editor and visualizaon • API built with soware design paerns – E.g. enables easy adding of provenance observers/listeners

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 14 VisTrails • An open-source scientific workflow and provenance management system developed at the University of Utah that provides support for data exploration and visualization. • Emphasis on visualization and provenance • Workflow nesting • Workflow versioning • Python-centric • Academia adaptations

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 15 SciFlo

• Scienfic Dataflow • Python • Web‐based – AJAX editor • Employs a Peer‐to‐Peer (P2P) Network of Grid workflow nodes • Data & operator movement – Somemes beer to migrate processing, not data

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 16 Phoenix Integraon PHX ModelCenter • Commercial (~$30K?) • Windows‐centric • Design‐Of‐Experiments • Trade studies • Plugins connect to Excel, Matlab, Mathemaca, JMP, Pspice, etc.

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 17 Workflow Design Paerns SOME PATTERNS FOR EARTH SCIENCE PROCESSING

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 18 Usage in Science Data Systems Generic Soware Architecture View

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 19 Handling Large Data Transfers • Keep interface of workflow connecons light – Orchestraon engine passes data locaon, and not the data itself • Each service endpoint pulls in its own large input data

primive Enactor primive data types binary data types data pull WS WS binary WS binary data pull data pull

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 20 Configuraon Not in Flow • Configuraon for each workflow component should not be in workflow pipes • “lazy loading” of configuraon – Each workflow component reads configuraon sengs from file • Enables modificaons to configuraon for long running workflow instances Control

Configuraon Workflow Data out Data in Component

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 21 Outdated Input Sengs

• Long runmes of PGE • Need to check configuraon inputs once PGE completed in case of change. • Rerun PGE workflow component if input configuraon has changed

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 22 Vectorizing Runs • Apply workflow on a sequence of data • Example: Hyperspectral retrieval iterang through each pixel of image

geometry, atmospheric Chl a+b, H20, dry matter, canopy makeup, geometry, geometry, SNR, spectral and conditions leaf structure atmospheric clarity atmospheric conditions radiometric properties

MODTRAN PROSPECT SAIL-2 MODTRAN HyspIRI Model Retrieval

noisy spectrac H 0 solar irradiance at reflectance at top radiance at top of 2 atmosphere top of canopy leaf reflectance and of canopy transmittance Retrieval matches One workflow iteraon for each pixel retrieval ground truth?

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 23 Scienfic Workflows USEFUL FEATURES FOR SCIENTIFIC WORKFLOWS

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 24 Desirements for Scienfic Workflows • Hierarchical (nested) workflows – Layered abstracons, modular • Vectorizaon / iteraons – Processing sequence of data flow – Analogous to vector operaons in Matlab and IDL • Orchestrang distributed services – SOAP, REST, OGC services, etc. • Runme WSDL and WADL introspecon • Integrated service registries discovery – UDDI, ServiceCasng, etc.

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 25 Desirements for Scienfic Workflows • Bean shell components – “Shim” services • Collaboraon – e‐Science • Semancs • Provenance – Traceability • Reproducibility of results – “Climate‐gate” • Workflow instance callable as API

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 26 CONCLUSION

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 27 So Why So Many? • Domain‐specific workflow features – Data flow for Bioinformacs and Earth science – Acvity flow for business process management • Fragmented “market” – Many derivaves of BPEL engines – Many custom adaptaons • Popular workflow engines in each domain‐ specific field. Examples: – Kepler (ecology, Ptolemy II) – Taverna (biology) – VisTrails (visualizaon) – ModelCenter (DOE)

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 28 Where We Are At / Heading To? • Mixed results with workflow‐based visual programming • Asynchronous services – WS‐Evenng and WS‐Messaging – “Jobificaon” of SOAP/REST service interfaces • Integrang with other services – ServiceCasng, DataCasng, Federated OpenSearch, etc. • Collaborave workflows – myExperiment (Taverna) – Drupal‐based Talkoot collaboraon workflow (Rahul and Chris) • Semanc service and datatype ontology – ESIP and ESDSWG acvity • Automated workflow discovery, execuon, composion and interoperaon – OWL‐Services, WS‐BPEL (legacy OASIS BPEL4WS) • Provenance, semanc web services, and Proof Markup Language (PML)

2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 29