Taverna Workflow Management System
Taverna
Robert Haines, Stian Soiland-Reyes
myGrid, University of Manchester
http://orcid.org/0000-0002-9538-7919
IS-ENES2 workshop on workflows, Hamburg, 2014-06-03
http://www.mygrid.org.uk/ http://www.taverna.org.uk/
This work is licensed under a Creative Commons Attribution 3.0 Unported License

Taverna in Context
• Comprehensive Scientific Workflow Management System plus auxiliary tools and repositories
• Based at Manchester with multiple contributions and collaborations
• Releases: three major; numerous rolling intermediate releases. First release 2004.
• Downloads: 90,000+ cumulative; 1000 in the first month per intermediate release; a user audit for May 2013 found 900+ unique addresses using Taverna
• Users: ~380 sites and institutions have used or use Taverna
• Support: mailing list, community list and Jira

Taverna workflows
• Sophisticated analysis pipelines
• A set of services to analyze or manage data (local or remote)
• Workflows run through the workbench or via a server
• Automation of data flow through services
• Control of service invocation
• Iteration over data sets
• Provenance collection
• Extensible and open source

Taverna workflows
• Dataflow
– Graphically connect data between drag-and-dropped services
• Service types
– REST, SOAP, Command Line, web interactions, scripts (R, Python, Beanshell)
– Domain-specific plugins
– Your tool?
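"Iteration over data sets" refers to implicit iteration: when a service that expects a single value receives a list, the engine invokes it once per item, and several input lists can be combined either as a cross product (every combination, which is what a parameter sweep needs) or as a dot product (items paired by position). The Python below is a minimal sketch of those two strategies under simplifying assumptions, not Taverna's engine; `run_processor`, `cross_product`, `dot_product` and the toy `label` service are hypothetical names.

```python
from itertools import product
from typing import Any, Callable, Dict, List

Inputs = Dict[str, List[Any]]  # port name -> list of values
Job = Dict[str, Any]           # port name -> single value for one invocation


def cross_product(inputs: Inputs) -> List[Job]:
    """One job per combination of values across all input lists."""
    names = list(inputs)
    return [dict(zip(names, combo)) for combo in product(*(inputs[n] for n in names))]


def dot_product(inputs: Inputs) -> List[Job]:
    """One job per position; like zip(), this sketch stops at the shortest list (assumption)."""
    names = list(inputs)
    return [dict(zip(names, combo)) for combo in zip(*(inputs[n] for n in names))]


def run_processor(service: Callable[..., Any], inputs: Inputs,
                  strategy: Callable[[Inputs], List[Job]] = cross_product) -> List[Any]:
    """Invoke a single-valued service once per job produced by the iteration strategy."""
    return [service(**job) for job in strategy(inputs)]


if __name__ == "__main__":
    # Hypothetical service: tag a species with a model threshold (a toy parameter sweep).
    def label(species: str, threshold: float) -> str:
        return f"{species}:{threshold}"

    sweep = {"species": ["a", "b"], "threshold": [0.1, 0.5]}
    print(run_processor(label, sweep, cross_product))  # four invocations
    print(run_processor(label, sweep, dot_product))    # two invocations
```

The cross product grows multiplicatively with the number of input lists, which is why sweeps over several parameters quickly become large; the dot product keeps the job count equal to the (shortest) list length.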
Taverna workflows
• Nested workflows
• Components
– Reusable and inter-compatible workflow fragments
– Grouped into families
– Semantically annotated
– Curated

[Figure: architecture layers. Generic: application portals and applications, repositories, runtime middleware, execution activity plug-ins, resources/codes/services, infrastructures, platforms, registries. Taverna: Taverna Desktop Workbench, Taverna Online, Web tools, Taverna Player; workflows and workflow components; Taverna Engine, Server and command line, third-party servers, runtime provenance middleware (PROV, OPM); data; BioSTIF.]

Workflow clients for people
[Figure: Taverna clients (Workbench, Components, Java library, Ruby Gem, command line, Player) positioned by the user's concept knowledge (domain scientist vs. technical/computational scientist) and by workflow visibility (high to low).]

Simulation characteristics
• Platform
• Data
• Incorporating codes/services
• Scale
• Parameter and data sweeps
• Interacting
• Reporting

Biodiversity
• Marine monitoring and health assessment
• Ecological niche modelling
• Enclosed sea problem: Pilumnus hirtellus (Ready et al., 2010)
• Data-intensive science, collaborative science (Sarah Bourlat)

Analytical cycle (www.biovel.eu)
Data collection, data discovery, data assembly, cleaning and refinement, ecological niche modeling, statistical analysis, insights, scholarly communication & reporting

Ecological Niche Modeling Workflow (ENM)
[Figure: ENM workflow annotated with its data, configuration parameters and steps.]
Data and Parameter Sweeps

VPH-Share: @neurIST
[Figure: Aneurysm Morphology Workflow and Disease Simulation Workflow linking patients, patient avatar, systemic factors and gene expression profile to an updated risk.]
http://www.vph-share.eu/

Implementation in VPH-Share
The @neurIST morphological workflow specification in Taverna
http://www.vph-share.eu/

[Figure: make, save and share, deploy, integrate and run: Taverna Workbench, workflow repository, Taverna Server, Taverna Player, VPH-Share analysis portals, command-line Taverna (batch).]

Taverna Workbench
• Desktop application
• GUI
• Plug-in framework
• Themed editions
• Intermediate results views
• Search for Web Services in catalogues
• Search and publish to myExperiment

Web apps to create and run workflows
Taverna Online
• Dr Vadim Surpin and Vitaly Sharanutsa
– Institute for Information Transmission Problems of the Russian Academy of Sciences (IITP RAS)
• An online, in-browser application for assembling and running Taverna workflows over an HPC platform

Web apps to create and run workflows
Service Chaining Editor
Pete Walker et al., Plymouth Marine Laboratory
For chaining OGC Web Processing Service (WPS) geospatial Web Services

Desktop Client
http://www.xworx.org/
Data Centric Interface

BIFI (Beautiful Interfaces for Inputs)
Taverna Workbench plug-in, GUI definition language

Taverna Server family
• Taverna Server
– Multiple clients, multi-user
– Local and large-scale infrastructures
– Site replication
• Taverna Server Amazon Image
– Can have local tools and services (e.g. R)
– Multiple instances in the Amazon Cloud as required, for multiple users/uses and different security scenarios
• Taverna Virtual Machine
• Taverna Command Line
• Bundled servers, services and tools

Interacting with a workflow
• Many tasks need user interaction
• A workflow on a server does not need to be “press a button and wait”
– VPH-Share opens a VNC connection to the spawned instance.
• Taverna Interaction Service
– Users interact with a workflow (wherever it is running) in a Web browser.
– Interaction Service plug-in in the workbench

iPython integration
http://goo.gl/hm0qCN
https://www.youtube.com/watch?v=QVQwSOX5S08

Analysis Portals
https://www.youtube.com/watch?v=s3D8JXc-tSM

– Find and share workflows (Taverna, RapidMiner, Kepler, Galaxy, Trident, VisTrails, etc.)
– Track updates of workflows
– Social curation: comments, tags, stars
– Themed groups of users (projects, domain)
– Organize packs of related data (source files, results)
Over 7500 members, 300 groups, 2500 workflows, 600 files and 300 packs (research objects)

• Data cleaning
• Data movement
• Data retrieval and annotation
• Data analysis
• Data mining
• Knowledge management
• Data curation and data warehouse
• Data visualisation
• Parameter sweeps over simulations
[Figure: application domains and projects, including OpenTox Project, Chemistry Development Kit, Eagle Genomics, VPH-Share, OpenPHACTS, next generation sequencing, drug toxicity, models of human physiology, patient diagnostics, ecological niche modelling, population modelling, metagenomics, phylogenetics, drug discovery (small molecules, targets, compounds), astronomy & heliophysics, systems biology, document preservation and digitisation.]

Open source development
• Taverna is open source software (LGPL)
– https://github.com/taverna/
– License allows integration in closed-source products
• Open development
– Developer documentation and tutorials
– Public mailing lists, issue trackers, wiki
– Contributors from around the world
• Taverna Plugins
– APIs and plugin system
• Applying to join the Apache Foundation

Summary
• Taverna suite for interactive and batch workflows
• Flexible plug-ins and flexibly plugged-in
• Themed Taverna
• Moving to the Apache Foundation
• We welcome collaboration/contribution
• http://www.taverna.org.uk

Integrating with Taverna
• “Anything” can be extended by plugins:
– Service types
– Service discovery
– Menus and toolbars
– File types (e.g. SVG, PDF, CSV)
– Complete views/perspectives
• Documentation and tutorials for plugin developers
• Installable/updatable from plugin sites
• Many plugins get included in Taverna Core or domain-specific editions (e.g. AstroTaverna++ became the Taverna Astronomy edition)
http://dev.mygrid.org.uk/wiki/display/developer/Creating+plugins+for+Taverna+2

• AstroTaverna plugin
• OAuth plugin
• Taverna PROV plugin
• VAMDC plugin
• BIFI plugin
• VPH-Share plugin
• Interaction plugin
• XPath plugin
• REST plugin
• BioCatalogue plugin
• PBS plugin
• SADI plugin
• External Tools plugin
• UNICORE plugin
• CDK plugin
• caGrid plugin
• XWS plugin
• gLite plugin
• WPS plugin
• ...
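The plugin slide lists extension points such as service types. Taverna's real plugin system is Java-based and is described in the developer wiki linked above; the Python below is only a language-neutral sketch of the general idea of a service-type extension point, where plugins register new kinds of service and the workflow engine looks them up by name. `ServiceType`, `SERVICE_TYPES`, `service_type`, `run` and the toy `UpperCaseService` are all hypothetical and do not correspond to Taverna's API.

```python
from typing import Dict, Type


class ServiceType:
    """Hypothetical base class that a service-type plugin would subclass."""
    kind: str = "abstract"

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        raise NotImplementedError


# Registry filled in as plugins are loaded; the engine looks service kinds up here.
SERVICE_TYPES: Dict[str, Type[ServiceType]] = {}


def service_type(cls: Type[ServiceType]) -> Type[ServiceType]:
    """Class decorator registering a service type under its `kind`; rejects duplicates."""
    if cls.kind in SERVICE_TYPES:
        raise ValueError(f"duplicate service type: {cls.kind}")
    SERVICE_TYPES[cls.kind] = cls
    return cls


@service_type
class UpperCaseService(ServiceType):
    """Toy service type standing in for a real activity such as REST or XPath."""
    kind = "uppercase"

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        return {"out": inputs["in"].upper()}


def run(kind: str, inputs: Dict[str, str]) -> Dict[str, str]:
    """Instantiate the registered service type for `kind` and invoke it."""
    return SERVICE_TYPES[kind]().invoke(inputs)


if __name__ == "__main__":
    print(sorted(SERVICE_TYPES))
    print(run("uppercase", {"in": "taverna"}))
```

Keeping the engine dependent only on the registry, not on concrete classes, is what lets new service types arrive from plugin sites without changing the core.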