Thomas Steinke: CCWF Objectives
COST D37 GridChem
Computational Chemistry Workflow (CCWF) Group
Aims and Objectives

Thomas Steinke, ZIB
December 11th 2006, Cambridge
Zuse Institute Berlin (ZIB) <www.zib.de>

Workflows in Computational Chemistry
• Workflows have a long tradition in the computational chemistry domain.
  Example of a typical tool chain: DB search → structure editor → molecular simulation(s) → properties → visualization → data mining → DB storage/archive → insights → publication …
• The orchestration of complex workflow scenarios is on today’s agenda:
  linking in-house and (commercial) legacy codes
  → transformation of scientific ventures into a scientifically validated protocol allowing highly (semi-)automated data generation (pre-processing) and data processing steps.

Major GRIDCHEM Goals
• further proposal, implementation and promotion of standards in Computational Chemistry (CC)
• easy integration of CC applications into workflows being part of collaborative environments

Goals of the Workflow and Data Management Group
The objective of the COST D37 “Workflow and Data Management Group” is to implement a workflow environment for illustration scenarios from the computational chemistry domain.

CCWF Goals (Proposal)
• implementation of workflow environments for QC by adapting standard (Grid) technologies
• fostering standard techniques (interfaces) for handling quantum chemical data in a flexible and extensible format to ensure application program interoperability and support of an efficient access to chemical information based on a CC ontology.
• implementation of computational chemistry illustrator scenarios to demonstrate the applicability of our approach

CCWF Partners
• Thomas Steinke, Tim Clark (DE)
• Hans-Peter Lüthi, Martin Brändle, Peter Kunszt, Manuel Peitsch (CH)
• Peter Murray-Rust, Henry Rzepa, Jon Essex, Dave Ritchie (UK)
• Antonio Márquez, Javier Sanz (ES)
• Luuk Visscher (NL)
• Kurt Mikkelsen (DK)
• Maurizio Bruschi, Roberto Todeschini (IT)
• some Grid IT expertise sites
  - CSCS (Manno, CH)
  - ZIB (Berlin, DE)
  - …
Partner sites: Aberdeen, København, Amsterdam, Cambridge, Berlin, London, Southampton, Erlangen, Zürich, Manno, Milano, Sevilla

Goals for Today
• What do we have? expertise, software, hardware
• What do we want to do? science, today’s limitations
• How do we want to do it? approaches, implementation
• Who does what, and when? working plan, milestones

CCWF Chemical Illustrator Applications
• Molecular Design of Functionalised Enzymes
  Hans-Peter Lüthi, Martin Brändle, Zürich
  Peter Murray-Rust, Cambridge; Henry Rzepa, London
• Computational Heterogeneous Catalysis
  Antonio M. Márquez Cruz, Javier Fdez. Sanz, Sevilla
• Quantum Chemical Based QSAR/QSPR
  Tim Clark, Erlangen; Jon Essex, Southampton; Dave Ritchie, Aberdeen
→ PRESENTATIONS

2 x 3 Questions to Application Presenters
Chemistry:
1. Which chemistry packages?
2. Which data formats and import/export protocols?
3. Are you using a database system?
IT infrastructure:
1. Which IT infrastructure can you provide?
2. Do you have experience with Grid/distributed/collaborative environments?
3. If yes, which middleware (e.g. workflow engine, batch system) are you familiar with?

→ PRESENTATIONS

IT Infrastructure Issues

Generic (?) Workflow
1. Automatic generation + validation of input data
2. Submission, monitoring, and gathering of output data of QC simulation jobs
3. Export of resulting raw data into project database system(s), and application of data mining and visualisation techniques to reduce complexity.
4. Knowledge generation by applying methods of statistical analysis and pattern recognition.
5. On-line publication and archiving of valuable scientific data.

Topics
• chemical problem + application suite
• data formats, data objects, ontology, DBs
• workflows (WF)
• formalise workflow: WF engine, WF description
• interface to applications/database(s)
• list of chemistry packages, dependencies
• middleware: dependencies, WF engine requirements
• resource management: local + global
• access control, authentication/authorisation
• local hardware resources + management

“Implementation Roadmap”
• KISS (keep it simple, stupid)
• implementation in phases?
  - phase I: WF runs on local (distributed) resources; chem application codes with standard interfaces
  - phase II: small distributed “Grids”, illustrator application
  - phase III: GRIDCHEM + UK + DE resources? Can we afford this?

Technical Issues
Implementation of the workflow environment …
• based on (de-facto) standard technologies used in Grid environments
• support for a geographically distributed workflow environment (collaborative environment)
Middleware …
• Condor, Condor/G, …
• evaluation of Grid middleware regarding the easy-to-use support for job submission and monitoring
Workflow engine …
• Taverna, Triana
• interface to applications mediated through Web or Grid services
Simulation and data analysis
Data storage and representation
• CML/XML, large-volume QC data and temporary QC workflow data in Q5 format
• Pros and cons of centralised versus distributed storage systems (e.g. SRB) must be evaluated.
Human-machine interface: tool box idea (known from AVS)

Expertise
• Cambridge: CML/XML, comp chem ontology, comp chem workflows, high-throughput computing
• ETHZ: comp chem simulation techniques, comp chem ontology, DB application, XML/XSLT
• Erlangen: comp chem workflows for QSAR/QSPR
• Sevilla: comp chem workflows in material sciences
• ZIB: Grid tech, distributed data management, HPC, workflows in bioinformatics
• CSCS: Grid tech, HPC
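
Illustration (not part of the original slides): the five steps of the generic workflow (input generation and validation, job submission and gathering, export to a database, statistical analysis, publication) could be chained roughly as in the minimal Python sketch below. All names (`Job`, `generate_inputs`, `run_jobs`, `fake_simulation`, and so on) are hypothetical placeholders. The simulation step is a stub returning made-up numbers and does not call any real middleware, workflow engine, or chemistry package.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Job:
    """One QC simulation job; all fields are illustrative placeholders."""
    molecule: str
    method: str
    energy: Optional[float] = None  # filled in by the stubbed simulation


def generate_inputs(molecules, method):
    """Step 1: build input data and validate it (reject empty names)."""
    jobs = []
    for name in molecules:
        if not name.strip():
            raise ValueError("molecule name must not be empty")
        jobs.append(Job(molecule=name, method=method))
    return jobs


def run_jobs(jobs, simulate):
    """Step 2: 'submit' each job and gather its output.

    `simulate` stands in for whatever middleware call would run the job.
    """
    for job in jobs:
        job.energy = simulate(job)
    return jobs


def export_results(jobs, database):
    """Step 3: export raw results into a project database (here a dict)."""
    for job in jobs:
        database[(job.molecule, job.method)] = job.energy
    return database


def analyse(database):
    """Step 4: a trivial statistical summary in place of real data mining."""
    energies = list(database.values())
    return {"n": len(energies), "mean_energy": mean(energies)}


def publish(summary):
    """Step 5: format the summary as a short report for archiving."""
    return f"{summary['n']} jobs, mean energy {summary['mean_energy']:.3f}"


def fake_simulation(job):
    """Stub: returns a made-up value derived from the name length."""
    return -1.0 * len(job.molecule)


def run_workflow(molecules, method="HF", simulate=fake_simulation):
    """Chain steps 1 to 5 and return the final report string."""
    database = {}
    jobs = run_jobs(generate_inputs(molecules, method), simulate)
    export_results(jobs, database)
    return publish(analyse(database))


if __name__ == "__main__":
    print(run_workflow(["H2O", "CH4"]))  # prints "2 jobs, mean energy -3.000"
```

Passing the simulation step in as a function keeps the chain independent of how jobs are actually executed, so a local stub could later be swapped for a call into a batch system or Grid service.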