Apache Taverna
http://taverna.incubator.apache.org/

Donal Fellows, @donalfellows, http://orcid.org/0000-0002-9091-5938
Stian Soiland-Reyes, @soilandreyes, http://orcid.org/0000-0001-9842-9718
Alan R Williams, @alanrw, http://orcid.org/0000-0003-3156-2105

Collaborations Workshop 2015-03-26
This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Taverna Workflow Ecosystem
• Workflow Language: SCUFL2 (and t2flow)
• Workflow Engine: Taverna
• Used in…
  – Taverna Command Line Tool
  – Taverna Server
  – Taverna Workbench
• Allied services
  – myExperiment, workflow repository
  – Service Catalographer, service catalog software
    • Instantiated as BioCatalogue, BiodiversityCatalogue, …

NERSC Workflow Day

Map of the Taverna Ecosystem
[Diagram. Labels: Taverna Ruby client; Taverna Online; Application-Specific Portals; Taverna Player; Taverna Lite; Taverna Workbench; Taverna Command Line Tool; Components; REST API; SOAP API; Taverna Server; Other Servers; Taverna Core Engine; Activity Plugins; UI Plugins; Workflow Repository; Service Catalogs; many services…]

Users, Scientific Areas, Projects
Taverna In Use

Taverna Users Worldwide

Taverna Uses: Scientific Areas
• Biodiversity: BioVeL project
• Digital Preservation: SCAPE project
• Astronomy: AstroTaverna product
• Solar Wind Physics: HELIO project
• In silico Medicine: VPH-Share project

Biodiversity: BioVeL
• Virtual e-Laboratory for Biodiversity
  – Service and knowledge commons
  – Supporting biodiversity research
  – Integrating with third-party applications
    • For example, IPython Notebook
• Portal for running production-grade workflows on users’ data
  – Powered by Taverna Server
  – Integration with major biodiversity databases
  – Interaction support made to support

Digital Preservation: SCAPE
• Automated petabyte-scale digital
collection maintenance
  – Century of scanned newspapers
  – Whole national radio/TV output
  – Major Web archives
• Processing engine powered by Taverna
  – Lift simple workflows to work at collection level
  – Metadata management
  – Semantic annotations and components for guided workflow construction

Astronomy: AstroTaverna
• Taverna plugin: IVOA (Virtual Observatory)
  – Astronomy data services and tools
• Example workflow:
  – List of galaxy names → Look up VO properties → Find similar/near galaxies → Add bibliography
• VOTable support (select/merge/split/..)
  – Later adapted by bioinformatics community
• Projects: CANUBE, Wf4Ever, VAMDC, ER-Flow
• Taverna Workbench used on the desktop:
  – IVOA service registry user interface
  – Integrated with standalone astronomy tools (SAMP protocol): Aladin, TOPCAT

Astrophysics: HELIO
• Virtual laboratory for Solar Wind Science
  – Observation catalogs
  – Processing
  – Data integration platform
• Taverna is workflow glue
  – Taverna Server created to support
  – Workflows manage catalog access
  – Workflows manage data processing

Medicine and Physiology: VPH-Share
• Platform for computer-aided medicine
  – Support for diagnosis and treatment prognosis
    • Osteoarthritis, Dementia, Liver disease, Cardiovascular disease
  – Driven by specially-configured cloud instances
• Taverna is control and data management layer
  – Coordinates processing within cloud instances
  – User communication with cloud instances via Taverna interactions
    • Including complex 3D tasks

Introduction to the Taverna Workflow Language and its Executors
Inside the Taverna Ecosystem

The Basics of a Taverna Workflow
[Annotated workflow diagram. Labels:]
• Input Ports (data in)
• SOAP processor (web service call)
• XML handling processors
• Data Links (connect processors)
• Output Ports (data out)
Workflow shown: Get concept suggestions from term, by Eelke van der Horst
http://www.myexperiment.org/workflows/4590.html

Taverna Workflows
• Describe how data flows between processing nodes
  – Control dependencies also supported
• Processing service nodes of various kinds
  – Invoke programs/scripts (local or on cluster or grid or …)
  – Call remote services (SOAP or REST)
  – Call R scripts (local or remote)
  – Read from and write to databases
    • BioMart support
  – Transfer data
  – Interact with the user in a browser
• Built-in parallelism and iteration
  – Processes lists of data in parallel
• Large data usually handled by reference
  – Avoids having to transfer it where not necessary

Taverna Workbench
• IDE for Taverna Workflows
• Design workflows
• Run workflows
• Analyze workflows
• Access workflow repository

Taverna Workflows can get complex…
Workflow shown: BioVeL Population Model Construction and Analysis, by Maria Paula Balcázar-Vargas, Jonathan Giddy and Gerard Oostermeijer
http://www.myexperiment.org/workflows/3684.html

Managing Workflow Complexity
• Subworkflows
  – Put smaller workflows within larger ones
  – Like using a user-defined function in a programming language
  – Can hide contents of subworkflow
• Components
  – “Black box” (but implemented with subworkflow)
  – Semantically-annotated; described behaviour
  – Like using a library in a programming language

Taverna Engine
• Executes (“enacts”) Taverna Workflows
• Pushes data through system in parallel
  – Subject to limits described in workflow
• Processor nodes invoked when their data becomes available
  – Turn inputs into outputs
• Captures detailed trace of what happened (“provenance”)
  – Follows W3C PROV specification

Taverna Command Line Tool
• Simple wrapper round Taverna Workflow Engine
• Inputs as simple files
• Outputs as directory structure
• Provenance packaged in Research Object
  – ZIP Archive
  – Inputs, Outputs, Intermediate values
  – Workflow, Provenance, Overall metadata

Taverna Server
• Extends Workflow Engine to work for multiple simultaneous
users
  – Isolates workflows from each other
  – Allows asynchronous usage
  – Manages resources
  – Clients can be in any language, not just Java
• Designed to sit behind a Portal
  – User interfaces are domain-specific

Taverna Server Architecture
[Architecture diagram. Labels include: Tomcat Container + CXF Framework; Taverna Server Webapp; Web Portal; Service Catalog; File Manager; Storage; Per-Run Taverna Workflow Engine; Taverna Workbench; Ruby Client; Management Interface; Notification Endpoints]

Taverna Online
• Web IDE for Taverna

Apache Taverna and Future Releases
The Future of Taverna

Apache Software Foundation (ASF)
• Non-profit organization, forming a community of open-source software projects.
• Strong emphasis on openness, collaboration and a consensus-based development process.
• Examples:
  – Apache HTTP Server, Tomcat, Maven, Hadoop, OpenOffice, Subversion

Why Apache Taverna?
• Open development: Everything on mailing list
• Engagement: Encourage developer involvement
  – not just making plugins
• Independence: Apache Taverna is an independent project
  – Not a “Manchester thing”
• Shared ownership: equal participation
• Sustainability: self-managed community

Apache Incubator
Gradually becoming an Apache project
• Intellectual Property assigned to ASF
  – License changed to Apache License 2.0
• Infrastructure change
  – everything at *.apache.org
• Community building
  – growing developer base
• Mentoring on the “Apache Way” by volunteers from other Apache projects

Taverna Releases
• Current stable release: Taverna 2.5
  – Command Line (2.5.1), Server (2.5.4), Workbench (2.5.1)
    • http://www.taverna.org.uk/download/
• Taverna 3 Release plan:
  – Apache Taverna Language
    • API for workflow definitions
  – Apache Taverna Engine & Command Line
    • Can also run workflows from Taverna 2 Workbench
  – Apache Taverna Server
  – Apache Taverna Workbench

Try Taverna!
• Get Taverna:
  – http://taverna.org.uk/download/
• Documentation:
  – http://www.taverna.org.uk/documentation/taverna-2-x/
• Code:
  – http://taverna.incubator.apache.org/code/
• Getting involved:
  – http://taverna.incubator.apache.org/community/
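
Illustration: dataflow with implicit iteration
The behaviour described on the Taverna Workflows and Taverna Engine slides (processors turn inputs into outputs once their data is available, and lists of data are processed in parallel) can be illustrated with a minimal Python sketch. This is not Taverna code and uses no Taverna API. The names run_processor and run_pipeline are hypothetical, and a thread pool merely stands in for the engine's parallel iteration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_processor(func, data, max_workers=4):
    """Invoke one processing node (hypothetical helper, not a Taverna API).

    A single value is passed straight to func. A list is iterated over
    implicitly: each item is processed in parallel and the results are
    returned in the original order.
    """
    if isinstance(data, list):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(lambda item: run_processor(func, item, max_workers), data))
    return func(data)


def run_pipeline(processors, data):
    """Push data through a linear chain of processors.

    Each processor's output feeds the next one, playing the role of a
    data link between two nodes.
    """
    for func in processors:
        data = run_processor(func, data)
    return data


if __name__ == "__main__":
    # Two toy processors: normalise a name, then measure its length.
    print(run_pipeline([str.strip, len], [" lion ", "tiger", " oak"]))
```

The same pipeline accepts either a single value or a list, so the workflow author does not write the loop. That is the design point the slides attribute to built-in parallelism and iteration.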