http://www.mygrid.org.uk/

http://www.taverna.org.uk/ Taverna

Robert Haines, Stian Soiland-Reyes myGrid, [email protected] http://orcid.org/0000-0002-9538-7919

IS-ENES2 workshop on workflows, Hamburg, 2014-06-03

This work is licensed under a Creative Commons Attribution 3.0 Unported License

Taverna in Context

• Comprehensive Scientific Workflow Management System + auxiliary tools/repositories
• Based at Manchester with multiple contributions and collaborations
• Releases: three major; numerous rolling intermediate. First release 2004.
• Downloads: 90,000+ cumulative; 1000 in first month per intermediate release; user audit for May 2013 had 900+ unique addresses use Taverna
• Users: ~380 sites and institutions have used or use Taverna
• Support: mailing list, community list and Jira

Taverna workflows

• Sophisticated analysis pipelines
• A set of services to analyze or manage data (local or remote)
• Workflows run through the workbench or via a server
• Automation of data flow through services
• Control of service invocation
• Iteration over data sets
• Provenance collection
• Extensible and open source
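The automation, iteration and provenance bullets above can be illustrated with a minimal Python sketch. This is not Taverna code: the two "services" (`fetch_record`, `summarise`), the `invoke` and `run_pipeline` helpers, and the second sample input are hypothetical stand-ins. The sketch assumes that a service handed a list is called once per item, and that every call is logged as a simple provenance record.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for remote or local services (not Taverna APIs).
def fetch_record(species: str) -> Dict[str, Any]:
    """Pretend to look up a record for one species name."""
    return {"species": species, "name_length": len(species)}

def summarise(record: Dict[str, Any]) -> str:
    """Pretend to analyse one record."""
    return f"{record['species']}:{record['name_length']}"

def invoke(service: Callable[[Any], Any], value: Any,
           provenance: List[Dict[str, Any]]) -> Any:
    """Call the service once per item if given a list, logging every call."""
    if isinstance(value, list):
        return [invoke(service, item, provenance) for item in value]
    result = service(value)
    provenance.append(
        {"service": service.__name__, "input": value, "output": result}
    )
    return result

def run_pipeline(services: List[Callable[[Any], Any]], data: Any,
                 provenance: List[Dict[str, Any]]) -> Any:
    """Pass the output of each service to the next one (the data flow)."""
    for service in services:
        data = invoke(service, data, provenance)
    return data

provenance: List[Dict[str, Any]] = []
# "Example species" is a made-up placeholder input.
results = run_pipeline(
    [fetch_record, summarise],
    ["Pilumnus hirtellus", "Example species"],
    provenance,
)
print(results)
print(len(provenance), "service invocations recorded")
```

The pipeline author never writes a loop over the data set; the iteration happens in `invoke`, and the provenance list records which service produced which value.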

• Dataflow
  – Graphically connect data between drag-and-dropped services
• Service types
  – REST, SOAP, Command Line, web interactions, scripts (R, Python, Beanshell)
  – Domain-specific plugins
  – Your tool?

• Nested workflows

• Components
  – Reusable and inter-compatible workflow fragments
  – Grouped into families
  – Semantically annotated
  – Curated

[Figure: Taverna architecture stack. Labels: Portals and Applications (Taverna Desktop Workbench, Taverna Online, Web Tool); Application Repositories (workflows & workflow components); Runtime Middleware (Player, Engine, Server, Cmd line; Provenance: PROV, OPM); Execution Activity Plug-ins; Third Party Servers; Data; Resources/Codes/Services; Infrastructures; Platforms; Registries; BioSTIF]

Workflow Clients for People

[Figure: Taverna clients placed by type of user (from concept/knowledge domain scientist to technical/computational scientist) and by workflow visibility (high to low). Clients shown: Workbench, Components, Java library, Ruby Gem, CmdLine, Player]

Simulation characteristics

• Platform
• Data
• Incorporating codes/services
• Scale
• Parameter and data sweeps
• Interacting
• Reporting

Biodiversity
marine monitoring and health assessment
ecological niche modelling

Enclosed sea problem
Pilumnus hirtellus (Ready et al., 2010)

Analytical cycle (www.biovel.eu)

Data Intensive Science, Collaborative Science

• Data collection
• Data discovery
• Data assembly, cleaning, and refinement
• Ecological Niche Modeling
• Statistical analysis
• Insights
• Scholarly Communication & Reporting

Sarah Bourlat

Ecological Niche Modeling Workflow (ENM)

[Figure: ENM workflow diagram. Labels: data, configuration parameters, steps]
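A data and parameter sweep over a workflow like this one can be sketched in Python as a cross product of inputs. This is only an illustration under stated assumptions: `run_model` is a hypothetical placeholder for one workflow run, and the data set names, algorithm names and threshold values are invented for the example. None of them come from the ENM workflow itself.

```python
from itertools import product
from typing import Dict, List, Union

Run = Dict[str, Union[str, float]]

def run_model(dataset: str, algorithm: str, threshold: float) -> Run:
    """Hypothetical placeholder for a single workflow run."""
    return {
        "dataset": dataset,
        "algorithm": algorithm,
        "threshold": threshold,
        "label": f"{dataset}/{algorithm}/{threshold}",
    }

def sweep(datasets: List[str], algorithms: List[str],
          thresholds: List[float]) -> List[Run]:
    """Run the model once for every combination of data and parameters."""
    return [
        run_model(d, a, t)
        for d, a, t in product(datasets, algorithms, thresholds)
    ]

# All values below are invented sample inputs.
runs = sweep(
    ["occurrences_a", "occurrences_b"],
    ["algo_x", "algo_y"],
    [0.1, 0.5, 0.9],
)
print(len(runs), "runs")
print(runs[0]["label"])
```

Two data sets, two algorithms and three thresholds give twelve independent runs, which is why sweeps of this kind are natural candidates for server or batch execution.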

Data and Parameter Sweeps

VPH-Share @neurIST Aneurysm Morphology Workflow

[Figure: patient avatar pipeline. Labels: Patients; Patient Avatar; Disease Simulation Workflow; Patient Avatar updated; Systemic Factors; RISK; Gene Expression Profile]

http://www.vph-share.eu/

Implementation in VPH-Share

The @neurIST morphological workflow specification in Taverna

[Figure: workflow life cycle. Labels: Make; Deploy; Taverna Workbench; Taverna Player; Taverna Server; Integrate and run; Save and share; Workflow repository; VPH-Share; Analysis portals; Commandline Taverna (batch)]

Taverna Workbench

• Desktop application
• GUI
• Plug-in Framework
• Themed editions
• Intermediate results views
• Search for Web Services in catalogues
• Search and publish to myExperiment

Web apps to create and run workflows

Taverna Online
• Dr Vadim Surpin and Vitaly Sharanutsa – Institute for Information Transmission Problems of Russian Academy of Sciences (IITP RAS)
• An online, in-browser application for assembling and running Taverna workflows over an HPC platform

Service Chaining Editor
• Pete Walker et al., Plymouth Marine Laboratory
• For chaining OGC Web Processing Service geospatial Web Services

Desktop Client: http://www.xworx.org/

Data Centric Interface
• BIFI (Beautiful Interfaces for Inputs): Taverna Workbench Plug-in, GUI definition language

Taverna Server family

• Taverna Server
  – Multiple clients, Multi-user
  – Local and large scale infrastructures
  – Site Replication
• Taverna Server Amazon Image
  – Can have local tools and services (e.g. R)
  – Multiple instances in Amazon Cloud and as required, for multiple users/uses and different security scenarios
• Taverna Virtual Machine
• Taverna Command Line
• Bundled Servers, Services and Tools

Interacting with a workflow

• Many tasks need user interaction

• A workflow on a server does not need to be “press a button and wait”
  – VPH-Share opens a VNC connection to the spawned instance.
• Taverna Interaction Service
  – Users interact with a workflow (wherever it is running) in a Web browser.
  – Interaction Service Plug-in in workbench

iPython integration

http://goo.gl/hm0qCN
https://www.youtube.com/watch?v=QVQwSOX5S08

Analysis Portals

https://www.youtube.com/watch?v=s3D8JXc-tSM

– Find and share workflows (Taverna, RapidMiner, Kepler, Galaxy, Trident, Vistrails, etc.)
– Track updates of workflows
– Social curation: Comments, tags, stars
– Themed groups of users (projects, domain)
– Organize packs (research objects) of related data (source files, results)

Over 7500 members, 300 groups, 2500 workflows, 600 files and 300 packs

• Data cleaning
• Data movement
• Data retrieval and annotation
• Data analysis
• Data mining
• Knowledge management
• Data curation and data warehouse population
• Data visualisation
• Parameter sweeps over simulations

[Figure: projects and application domains. Labels: OpenTox Project; Chemistry Development Kit; Eagle; VPH-Share; Next Generation Sequencing based Patient Diagnostics; Drug Toxicity; Models of Human Physiology; Ecological Niche Modelling; Population Modelling; Metagenomics; Phylogenetics; Drug discovery, small molecules, targets, compounds; Astronomy & HelioPhysics; Document Preservation, Digitisation; OpenPHACTS]

Open source development

• Taverna is open source software (LGPL)
  – https://github.com/taverna/
  – License allows integration in closed-source products
• Open development
  – Developer documentation and tutorials
  – Public mailing lists, issue trackers, wiki
  – Contributors from around the world
• Taverna Plugins
  – APIs and plugin system
• Applying to join the Apache Foundation

Summary

• Taverna Suite for interactive and batch workflows
• Flexible Plug-ins and Flexibly Plugged-in
• Themed Taverna
• Moving to the Apache Foundation
• We welcome collaboration/contribution

• http://www.taverna.org.uk

Integrating with Taverna

• “anything” can be extended by plugins:
  – Service types
  – Service discovery
  – Menus and toolbars
  – File types (e.g. SVG, PDF, CSV)
  – Complete views/perspectives
• Documentation and tutorials for plugin developers
• Installable/updatable from plugin sites
• Many plugins get included in Taverna Core or domain-specific editions (e.g. AstroTaverna++ became Taverna Astronomy edition)

http://dev.mygrid.org.uk/wiki/display/developer/Creating+plugins+for+Taverna+2

Plugins:
• AstroTaverna plugin
• OAuth plugin
• Taverna PROV plugin
• VAMDC plugin
• BIFI plugin
• VPH-Share plugin
• Interaction plugin
• XPath plugin
• REST plugin
• BioCatalogue plugin
• PBS plugin
• SADI plugin
• External Tools plugin
• UNICORE plugin
• CDK plugin
• caGrid plugin
• XWS plugin
• gLite plugin
• WPS plugin
• ...