Workflow Engines: Why So Many?
Hook Hua (NASA/JPL) ESIP Informa on Technology and Interoperability Rants and Raves Webinar Series
Wednesday, April 7, 2010 Overview
1. So many workflows 2. Workflow management 3. Example workflow engines 4. Earth Science processing – Workflow pa erns – Useful features 5. So why so many?
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 2 Workflow Engines
Packet Data • Facilitates the flow of Telemetry data informa on, tasks, and Level 0 PGE Instrument science data events Level 1A PGE • Calibrated data Provides method of Level 1B PGE orchestra ng individual Resampled data execu on units Level 2 PGE • Management of control Derived geophysical parameters flow and data flow Level 3 PGE • Connects distributed Gridded data models • Codify produc on rules / policies
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 3 Increasingly being used in Earth science processing ARE THERE ANY CONSISTENTLY POPULAR WORKFLOW ENGINES IN USE?
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 4 Duopolies and Oligopolies? • “a market form in which a market or industry is dominated by a small number of sellers.”* • The four‐firm concentra on ra o – Verizon, AT&T, Sprint Nextel, and T‐Mobile – Sony Music Entertainment, Universal Music Group, Warner Music Group, and EMI – JDeveloper, Eclipse, NetBeans, and IntelliJ IDEA • Duopolies – Visa and Mastercard – Java and C# – Airbus and Boeing – Python and Ruby – ATI and Nvidia – Matlab and IDL – Intel and AMD – HDF and NetCDF – Oracle and MySQL
* h p://en.wikipedia.org/wiki/Oligopoly
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 5 What About Workflow Engines? • Freefluo • Ac veBPEL • PXE • Galaxia • An low • ruote (ruby) • Imixs IX Workflow • Apache Agila • RUNA WFE • jawflow • Apache ODE • Sarasva • JBoss jBPM • Beexee • SciFlo • JFlower • Bonita • Swish • JFolder • Bossa • Syrup • kbee.workflow • BpmScript • Taverna • MATIS (jBPM) • Carnot • Triana • Micro‐Flow • con:cern • Tobflow • Microso Windows Workflow • • Web and Flo / Kon nuum Dalma Founda on • • Werkflow Eclipse Java Workflow • ObjectWeb Bonita Tooling • WfMOpen • Open Business Engine • Modeling Workflow • Wilos • OpenSymphony OSWorkflow Engine (MWE) • Workpoint • OpenWFE • Enhydra Shark • XFlow • Pegasus • FlowMind • YAWL • Phoenix Integra on PHX • Flux • ModelCenter Zebra
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 6 Why So Many? • No general domina ng workflow engine • Most can exec processes • Many support invoking web services • Many wri en in Java • Many target business processes • Others target scien fic processes • Many support control logic • Many are deriva ves of other implementa ons
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 7 Workflow Management
Design and Implementa on of a Workflow Engine, Sebas an Bergmann, IAI‐TR‐2007‐5, September 2007.
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 8 Workflow Defini on Language Models • Dataflow model / En ty‐based – The workflow is constructed from data processing and data transport (processors and data links). – Directed graphs – Natural for scien fic workflows – E.g. Simple Conceptual Unified Flow Language (Scufl) • Process‐centric model / Ac vity‐based – The nodes in the workflows are ac vi es and the “data” passed between them form a control system rather than being a genuine flow of messages. – “State transi ons” – Natural for business processes – E.g. Business Process Execu on Language (BPEL)
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 9 Workflow Engines SOME EXAMPLES
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 10 Example BPEL‐based Workflow Engines
• Apache ODE (Orchestra on Director Engine) • OASIS WS‐BPEL 2.0 standard /compa bility for BPEL4WS 1.1
• Enterprise business process orchestra on and BPM. • Coordinate people, applica on and services
• Automate and streamline the intricate processes in the enterprise. • Torque open source resource manager • Job scheduling, File Transfer, Workflow and business process management (BPM) engine
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 11 GMU’s GeoBrain
• BPEL‐based web service chaining from web BPELPower applica on servers
GeoBrain Online • Automated data access, management, visualiza on, Analysis System analysis, and workflow composi on (GeOnAS) • Demoed automated service workflow composi on
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 12 Multi-mission Automated Task Invocation System (MATIS)
• A distributed workflow manager used for automated product genera on. • Built from jBPM (jBoss Business Process Management) – Based on BPEL • Used in JPL produc on missions – Phoenix and Diviner – Future: MCS and MSL • Consists of – a mul ‐mission core workflow component (JBoss jBPM)
– a project‐specific adapta on MATIS Monitor
BPEL Editor
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 13 Taverna Workbench
• An open source tool for designing and execu ng workflows created by the myGrid project and funded through the OMII‐UK. • Supports nes ng of workflows and parallel execu on • Vectoriza on/itera on – Dot product and cross product enumera ons • SCUFL2 • Mature • “fault tolerant” • myExperiment Collabora on • GUI workflow editor and visualiza on • API built with so ware design pa erns – E.g. enables easy adding of provenance observers/listeners
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 14 VisTrails • An open-source scientific workflow and provenance management system developed at the University of Utah that provides support for data exploration and visualization. • Emphasis on visualization and provenance • Workflow nesting • Workflow versioning • Python-centric • Academia adaptations
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 15 SciFlo
• Scien fic Dataflow • Python • Web‐based – AJAX editor • Employs a Peer‐to‐Peer (P2P) Network of Grid workflow nodes • Data & operator movement – Some mes be er to migrate processing, not data
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 16 Phoenix Integra on PHX ModelCenter • Commercial (~$30K?) • Windows‐centric • Design‐Of‐Experiments • Trade studies • Plugins connect to Excel, Matlab, Mathema ca, JMP, Pspice, etc.
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 17 Workflow Design Pa erns SOME PATTERNS FOR EARTH SCIENCE PROCESSING
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 18 Usage in Science Data Systems Generic So ware Architecture View
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 19 Handling Large Data Transfers • Keep interface of workflow connec ons light – Orchestra on engine passes data loca on, and not the data itself • Each service endpoint pulls in its own large input data
primi ve Enactor primi ve data types binary data types data pull WS WS binary WS binary data pull data pull
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 20 Configura on Not in Flow • Configura on for each workflow component should not be in workflow pipes • “lazy loading” of configura on – Each workflow component reads configura on se ngs from file • Enables modifica ons to configura on for long running workflow instances Control
Configura on Workflow Data out Data in Component
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 21 Outdated Input Se ngs
• Long run mes of PGE • Need to check configura on inputs once PGE completed in case of change. • Rerun PGE workflow component if input configura on has changed
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 22 Vectorizing Runs • Apply workflow on a sequence of data • Example: Hyperspectral retrieval itera ng through each pixel of image
geometry, atmospheric Chl a+b, H20, dry matter, canopy makeup, geometry, geometry, SNR, spectral and conditions leaf structure atmospheric clarity atmospheric conditions radiometric properties
MODTRAN PROSPECT SAIL-2 MODTRAN HyspIRI Model Retrieval
noisy spectrac H 0 solar irradiance at reflectance at top radiance at top of 2 atmosphere top of canopy leaf reflectance and of canopy transmittance Retrieval matches One workflow itera on for each pixel retrieval ground truth?
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 23 Scien fic Workflows USEFUL FEATURES FOR SCIENTIFIC WORKFLOWS
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 24 Desirements for Scien fic Workflows • Hierarchical (nested) workflows – Layered abstrac ons, modular • Vectoriza on / itera ons – Processing sequence of data flow – Analogous to vector opera ons in Matlab and IDL • Orchestra ng distributed services – SOAP, REST, OGC services, etc. • Run me WSDL and WADL introspec on • Integrated service registries discovery – UDDI, ServiceCas ng, etc.
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 25 Desirements for Scien fic Workflows • Bean shell components – “Shim” services • Collabora on – e‐Science • Seman cs • Provenance – Traceability • Reproducibility of results – “Climate‐gate” • Workflow instance callable as API
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 26 CONCLUSION
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 27 So Why So Many? • Domain‐specific workflow features – Data flow for Bioinforma cs and Earth science – Ac vity flow for business process management • Fragmented “market” – Many deriva ves of BPEL engines – Many custom adapta ons • Popular workflow engines in each domain‐ specific field. Examples: – Kepler (ecology, Ptolemy II) – Taverna (biology) – VisTrails (visualiza on) – ModelCenter (DOE)
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 28 Where We Are At / Heading To? • Mixed results with workflow‐based visual programming • Asynchronous services – WS‐Even ng and WS‐Messaging – “Jobifica on” of SOAP/REST service interfaces • Integra ng with other services – ServiceCas ng, DataCas ng, Federated OpenSearch, etc. • Collabora ve workflows – myExperiment (Taverna) – Drupal‐based Talkoot collabora on workflow (Rahul and Chris) • Seman c service and datatype ontology – ESIP and ESDSWG ac vity • Automated workflow discovery, execu on, composi on and interopera on – OWL‐Services, WS‐BPEL (legacy OASIS BPEL4WS) • Provenance, seman c web services, and Proof Markup Language (PML)
2010‐04‐07T11:00:00‐07:00 ESIP IT&I: Rants and Raves Webinar 29