Apache Taverna

http://taverna.incubator.apache.org/

Donal Fellows, @donalfellows, http://orcid.org/0000-0002-9091-5938
Stian Soiland-Reyes, @soilandreyes, http://orcid.org/0000-0001-9842-9718
Alan Williams, @alanrw, http://orcid.org/0000-0003-3156-2105

Collaborations Workshop 2015-03-26
This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Taverna Workflow Ecosystem

• Workflow Language: SCUFL2 (and t2flow)
• Workflow Engine: Taverna
• Used in…
  – Taverna Command Line Tool
  – Taverna Server
  – Taverna Workbench
• Allied services
  – myExperiment, workflow repository
  – Service Catalographer, service catalog software
    • Instantiated as BioCatalogue, BiodiversityCatalogue, …

NERSC Workflow Day

Map of the Taverna Ecosystem

[Diagram. Labels: Taverna Ruby client, Taverna Online, Application-Specific Portals, Player, Taverna Lite, Taverna Workbench, Taverna Command Line Tool, UI Plugins, REST API, SOAP API, Components, Other APIs, Taverna Server, Other Servers, Taverna Core Engine, Activity Plugins, Workflow Repository, Service Catalogs, many services…]

Users, Scientific Areas, Projects
Taverna In Use

Taverna Users Worldwide

Taverna Uses: Scientific Areas

• Biodiversity: BioVeL project
• Digital Preservation: SCAPE project
• Astronomy: AstroTaverna product
• Solar Wind Physics: HELIO project
• In silico Medicine: VPH-Share project

Biodiversity: BioVeL

• Virtual e-Laboratory for Biodiversity
  – Service and knowledge commons
  – Supporting biodiversity research
  – Integrating with third-party applications
    • For example, IPython Notebook
• Portal for running production-grade workflows on users’ data
  – Powered by Taverna Server
  – Integration with major biodiversity databases
  – Interaction support made to support

Digital Preservation: SCAPE

• Automated petabyte-scale digital collection maintenance
  – Century of scanned newspapers
  – Whole national radio/TV output
  – Major Web archives
• Processing engine powered by Taverna
  – Lift simple workflows to work at collection level
  – Metadata management
  – Semantic annotations and components for guided workflow construction

Astronomy: AstroTaverna

• Taverna plugin: IVOA (Virtual Observatory)
  – Astronomy data services and tools
• Example workflow:
  – List of galaxy names → Look up VO properties → Find similar/near galaxies → Add bibliography
• VOTable support (select/merge/split/…)
  – Later adapted by community
• Projects: CANUBE, Wf4Ever, VAMDC, ER-Flow
• Taverna Workbench used on the desktop:
  – IVOA service registry user interface
  – Integrated with standalone astronomy tools (SAMP protocol): Aladin, TOPCAT

Astrophysics: HELIO

• Virtual laboratory for Solar Wind Science
  – Observation catalogs
  – Processing
  – Data integration platform
• Taverna is workflow glue
  – Taverna Server created to support
  – Workflows manage catalog access
  – Workflows manage data processing

Medicine and Physiology: VPH-Share

• Platform for computer-aided medicine
  – Support for diagnosis and treatment prognosis
    • Osteoarthritis, Dementia, Liver disease, Cardiovascular disease
  – Driven by specially-configured cloud instances
• Taverna is control and data management layer
  – Coordinates processing within cloud instances
  – User communication with cloud instances via Taverna interactions
    • Including complex 3D tasks

Introduction to the Taverna Workflow Language and its Executors
Inside the Taverna Ecosystem

The Basics of a Taverna Workflow

Input Ports (data in)

SOAP processor (service call)

XML handling processors

Data Links (connect processors)

Output Ports (data out)

Get concept suggestions from term
Eelke van der Horst
http://www.myexperiment.org/workflows/4590.html

Taverna Workflows

• Describe how data flows between processing nodes
  – Control dependencies also supported
• Processing service nodes of various kinds
  – Invoke programs/scripts (local or on cluster or grid or …)
  – Call remote services (SOAP or REST)
  – Call R scripts (local or remote)
  – Read from and write to databases
    • BioMart support
  – Transfer data
  – Interact with the user in a browser
• Built-in parallelism and iteration
  – Processes lists of data in parallel
• Large data usually handled by reference
  – Avoids having to transfer it where not necessary
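
The "processes lists of data in parallel" idea can be illustrated with a small Python sketch. This is not Taverna code: `run_processor`, `to_upper` and `add_suffix` are hypothetical names, and the sketch only handles one level of list nesting. It assumes a processor is a plain function that accepts a single value, and shows how handing it a list could fan the work out across threads while keeping the results in input order.

```python
from concurrent.futures import ThreadPoolExecutor


def run_processor(func, value, max_workers=4):
    """Apply func to one value, or to every item of a list in parallel.

    Hypothetical helper: it mimics the idea of implicit iteration,
    not the behaviour of any real Taverna API.
    """
    if isinstance(value, list):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map returns results in the same order as the inputs
            return list(pool.map(func, value))
    return func(value)


def to_upper(term):
    """Stand-in for a processing node."""
    return term.upper()


def add_suffix(term):
    """Stand-in for a second node fed by the first one."""
    return term + "_checked"


def run_workflow(terms):
    """Two nodes joined by one data link: to_upper -> add_suffix."""
    upper = run_processor(to_upper, terms)
    return run_processor(add_suffix, upper)


if __name__ == "__main__":
    print(run_workflow(["alpha", "beta"]))
```

The same `run_workflow` accepts either a single string or a list of strings, which is the point of the sketch: the node functions never need to know whether they are being iterated.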

Taverna Workbench

• IDE for Taverna Workflows
• Design workflows
• Run workflows
• Analyze workflows
• Access workflow repository

Taverna Workflows can get complex…

BioVeL Population Model Construction and Analysis
Maria Paula Balcázar-Vargas, Jonathan Giddy and Gerard Oostermeijer
http://www.myexperiment.org/workflows/3684.html

Managing Workflow Complexity

• Subworkflows
  – Put smaller workflows within larger ones
  – Like using a user-defined function in a programming language
  – Can hide contents of subworkflow
• Components
  – “Black box” (but implemented with subworkflow)
  – Semantically-annotated; described behaviour
  – Like using a library in a programming language
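
The programming-language analogy for subworkflows can be made concrete with a minimal Python sketch. All names here (`clean_names`, `count_names`, `main_workflow`) are hypothetical and chosen only for illustration. The outer workflow treats the nested one as a single node and does not need to see its internal steps.

```python
def clean_names(names):
    """Hypothetical subworkflow: two small steps wrapped as one unit."""
    stripped = [name.strip() for name in names]   # step 1: trim whitespace
    return [name for name in stripped if name]    # step 2: drop empty entries


def count_names(names):
    """A second, independent processing step."""
    return len(names)


def main_workflow(raw_names):
    """Outer workflow: uses clean_names as if it were a single node."""
    cleaned = clean_names(raw_names)
    return {"names": cleaned, "count": count_names(cleaned)}


if __name__ == "__main__":
    print(main_workflow(["  Quercus robur ", "", "Fagus sylvatica"]))
```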

Taverna Engine

• Executes (“enacts”) Taverna Workflows
• Pushes data through system in parallel
  – Subject to limits described in workflow
• Processor nodes invoked when their data becomes available
  – Turn inputs into outputs
• Captures detailed trace of what happened (“provenance”)
  – Follows W3C PROV specification
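
The data-driven firing rule described above (a node runs once all of its inputs are available) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Taverna Engine: `enact` and the processor table format are hypothetical, nodes run one after another rather than in parallel, and the trace records only borrow the words "used" and "generated" loosely from PROV vocabulary without being a W3C PROV serialization.

```python
from datetime import datetime, timezone


def enact(processors, inputs):
    """Run processors as soon as the data they need is available.

    processors: dict mapping a node name to
                (function, list of input port names, output port name).
    inputs:     dict mapping workflow input port names to values.
    Returns the final data dict and a trace of what ran.
    """
    data = dict(inputs)
    trace = []
    pending = dict(processors)
    while pending:
        # A node is ready when every port it reads already holds a value.
        ready = [name for name, (_func, needs, _out) in pending.items()
                 if all(port in data for port in needs)]
        if not ready:
            raise ValueError("unsatisfied inputs for: "
                             + ", ".join(sorted(pending)))
        for name in ready:
            func, needs, out = pending.pop(name)
            data[out] = func(*[data[port] for port in needs])
            trace.append({
                "processor": name,
                "used": list(needs),
                "generated": out,
                "ended": datetime.now(timezone.utc).isoformat(),
            })
    return data, trace


if __name__ == "__main__":
    nodes = {
        "add": (lambda x, y: x + y, ["b", "c"], "d"),
        "double": (lambda x: x * 2, ["a"], "b"),
    }
    results, log = enact(nodes, {"a": 2, "c": 1})
    print(results["d"], [entry["processor"] for entry in log])
```

Note that the order in which nodes are declared does not matter: `add` is listed first but cannot fire until `double` has produced port `b`.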

Taverna Command Line Tool

• Simple wrapper around Taverna Workflow Engine
• Inputs as simple files
• Outputs as directory structure
• Provenance packaged in Research Object
  – ZIP Archive
  – Inputs, Outputs, Intermediate values
  – Workflow, Provenance, Overall metadata
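
Because the provenance bundle is a ZIP archive, its contents can be listed with standard tooling. The Python sketch below groups archive entries by top-level folder using only the standard `zipfile` module. The helper name `summarize_bundle` and the file name in the usage example are hypothetical, and the sketch makes no assumption about the actual folder layout inside a Taverna Research Object.

```python
import zipfile
from collections import defaultdict


def summarize_bundle(path_or_file):
    """Group the entries of a ZIP archive by their top-level folder.

    Files stored at the root of the archive are grouped under ".".
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(path_or_file) as bundle:
        for name in bundle.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            top, sep, _rest = name.partition("/")
            groups[top if sep else "."].append(name)
    return {folder: sorted(names) for folder, names in groups.items()}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize.py run.bundle.zip
    for folder, names in sorted(summarize_bundle(sys.argv[1]).items()):
        print(folder, len(names))
```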

Taverna Server

• Extends Workflow Engine to work for multiple simultaneous users
  – Isolates workflows from each other
  – Allows asynchronous usage
  – Manages resources
  – Clients can be in any language, not just Java
• Designed to sit behind a Portal
  – User interfaces are domain-specific

Taverna Server Architecture

[Diagram. Labels: Deployment Host, Tomcat Container + CXF Framework, Taverna Server Webapp, Web Portal, Taverna Workbench, Ruby Client, Per-User File Manager, Per-Run Taverna Workflow Engine, Storage Services (forthcoming), Selected Management Interface, Notification Endpoints (separate auth)]

Taverna Online

Web IDE for Taverna

Apache Taverna and Future Releases
The Future of Taverna

Apache Software Foundation

• Non-profit organization, forming a community of open-source software projects.
• Strong emphasis on openness, collaboration and a consensus-based development process.
• Examples:
  – Apache HTTP Server, Tomcat, Maven, Hadoop, OpenOffice, Subversion

Why Apache Taverna?

• Open development: Everything on mailing list
• Engagement: Encourage developer involvement
  – not just making plugins
• Independence: Apache Taverna is an independent project
  – Not a “Manchester thing”
• Shared ownership: equal participation
• Sustainability: self-managed community

Gradually becoming an Apache project

• Intellectual Property assigned to ASF
  – License changed to Apache License 2.0
• Infrastructure change
  – everything at *.apache.org
• Community building
  – growing developer base
• Mentoring on the “Apache Way” by volunteers from other Apache projects

Taverna Releases

• Current stable release: Taverna 2.5
  – Command Line (2.5.1), Server (2.5.4), Workbench (2.5.1)
• http://www.taverna.org.uk/download/
• Taverna 3 Release plan:
  – Apache Taverna Language
    • API for workflow definitions
  – Apache Taverna Engine & Command Line
    • Can also run workflows from Taverna 2 Workbench
  – Apache Taverna Server
  – Apache Taverna Workbench

Try Taverna!

• Get Taverna:
  – http://taverna.org.uk/download/
• Documentation:
  – http://www.taverna.org.uk/documentation/taverna-2-x/
• Code:
  – http://taverna.incubator.apache.org/code/
• Getting involved:
  – http://taverna.incubator.apache.org/community/
