Technical Note on e-Research Applications Services, Intermediate Version

Workpackage 5 VRE Infrastructure and Services Design and Development

Task 5.4 E-Research Application Services

Author (s) Raul Palma PSNC

Marcin Krystek PSNC

Pedro Gonçalves T2

Andres Garcia ESI

Ugo Di Giammatteo ACS

Reviewer (s) Jose Manuel Gomez Perez ESI

Cristiano Silvagni ESA

Approver (s) Pedro Gonçalves T2

Cristiano Silvagni ESA

Authorizer Mirko Albani ESA

Document Identifier EVER-EST DEL WP5-D5.4

Dissemination Level Public

Status Draft to be approved by the EC

Version 1.0

Date of issue 05/12/2016

H2020 – EINFRA – 2015 – 1 Page 1 of 76

Document Log

Date Author Changes Version Status

10/10/2016 Raul Palma TOC 0.1 Draft

11/10/2016 Raul Palma TOC improvements 0.2 Draft

20/10/2016 Raul Palma Sections 1,2,3 0.3 Draft

31/10/2016 Raul Palma Sections 1,2,3 0.4 Draft

02/11/2016 Marcin Krystek Section 3 (now 4) 0.5 Draft

03/11/2016 Pedro Gonçalves Section 3 (now 4) 0.6 Draft

04/11/2016 Andres Garcia New Section 3, Section 4 0.7 Draft

07/11/2016 Raul Palma Sections 1,2,3,4 0.8 Draft

13/11/2016 Ugo Di Giammatteo Input Section 5,6 0.9 Draft

14/11/2016 Raul Palma Compilation 0.10 Draft

21/11/2016 Raul Palma Implementation of changes from reviewers 0.11 Draft

22/11/2016 Raul Palma Reorganisation and polishing 1.12 Draft

05/12/2016 Raul Palma Final version released 1.0 Draft to be approved by the EC


Table of Contents

1 Introduction ...... 11
1.1 Purpose and scope ...... 11
1.2 Relation to other workpackages ...... 11
1.3 Who shall read this document ...... 12
1.4 System context ...... 12
1.5 Services list and their relation to D5.1 ...... 13
1.6 Document structure ...... 14
2 Research Object Management ...... 15
2.1 ROHUB portal overview ...... 15
2.2 ROHUB portal developments ...... 16
2.2.1 ROHUB Portal v1 (production prototype) ...... 16
2.2.2 ROHUB portal v2 (beta prototype) ...... 17
2.3 Comparison of technologies and capabilities between ROHUB portal v1 and portal v2 ...... 22
3 Workflow Management ...... 23
3.1 Workflow modelling ...... 24
3.2 Workflow execution ...... 25
4 Management of Execution Environments ...... 26
5 Preservation Aspects ...... 26
5.1 Application of the OAIS model ...... 26
5.2 Digital Objects Identifiers (DOIs) for Research Objects ...... 29
5.3 Research objects assessment ...... 29
5.3.1 Research objects checklists ...... 29
5.3.2 Research objects stability ...... 30
5.4 Research objects notifications ...... 30
5.5 Research objects fixity checking ...... 30
6 E-Research Services ...... 30
6.1 Services overview ...... 30
6.2 Services visual components ...... 31
6.2.1 ROHUB ...... 31
6.2.1.1 Visual components for discovering and visualising research objects, and for interacting with Portal (for non-authenticated users) ...... 32
6.2.1.2 Visual components for discovering and visualising research objects, and for interacting with Portal (for non-authenticated users) ...... 39
6.2.1.3 Visual components for creating and managing research objects, and for interacting with Portal (for authenticated users) ...... 41
6.2.2 VRE Portal RO Manager ...... 44
6.2.3 Workflow Management ...... 44
6.2.4 Cloud Platform ...... 45
6.2.5 Preservation components ...... 46


6.3 Using the service – quick overview ...... 47
6.3.1 Getting started ...... 47
6.3.2 ROHUB ...... 47
6.3.2.1 Search & discover research objects in ROHUB ...... 47
6.3.2.2 Visualising research objects in ROHUB ...... 49
6.3.2.3 Signing-in ROHUB ...... 49
6.3.2.4 Creating research objects in ROHUB ...... 50
6.3.2.5 Managing research objects in ROHUB ...... 51
6.3.2.6 Managing research objects evolution in ROHUB ...... 55
6.3.3 VRE portal RO manager ...... 55
6.3.4 Workflow management ...... 55
6.3.4.1 Transforming and extracting metadata of a Workflow ...... 55
6.3.5 Cloud Platform ...... 56
6.3.5.1 Application deployment ...... 56
6.3.5.2 Continuous Integration ...... 57
6.3.6 Preservation components ...... 59
6.4 Service public API’s ...... 60
6.4.1 ROHUB ...... 60
6.4.2 VRE portal RO manager ...... 60
6.4.3 Workflow management ...... 60
6.4.4 Cloud Platform ...... 61
6.4.5 Preservation components ...... 61
6.5 Command-line commands ...... 62
6.5.1 RO manager tool ...... 62
6.5.1.1 Configuration and miscellaneous ...... 62
6.5.1.2 Creating and populating an RO ...... 62
6.5.1.3 Annotating an RO or component ...... 63
6.5.1.4 Exchange RO with repository ...... 63
6.5.1.5 Analyse RO contents ...... 63
7 Future Work ...... 63
Annex A Installation Guides ...... 65
A.1. ROHUB ...... 65
A.1.1 Frontend: Portal ...... 65
A.1.1.1 Software prerequisites ...... 65
A.1.1.2 Hardware prerequisites ...... 66
A.1.1.3 Service installation ...... 66
A.1.1.4 Uninstallation Procedure ...... 67
A.1.2 Backend: Research Object Digital Library (RODL) ...... 68
A.1.2.1 Software prerequisites ...... 68
A.1.2.2 Hardware prerequisites ...... 69
A.1.2.3 Service installation ...... 69
A.1.2.4 Uninstallation procedure ...... 71


A.2 VRE portal RO manager ...... 72
A.3 Workflow management ...... 72
A.3.1 WF-RO transformation service ...... 72
A.3.1.1 Software prerequisites ...... 72
A.3.1.2 Hardware prerequisites ...... 72
A.3.1.3 Service installation ...... 72
A.3.1.4 Uninstallation procedure ...... 72
A.3.2 Workflow runner service ...... 73
A.3.2.1. Software prerequisites ...... 73
A.3.2.2. Hardware prerequisites ...... 73
A.3.2.3. Service installation ...... 73
A.3.2.4. Uninstallation procedure ...... 73
A.4 Cloud Platform ...... 73
A.4.1 Software prerequisites ...... 73
A.4.2 Hardware prerequisites ...... 74
A.4.3 Service installation ...... 74
A.5 Preservation components ...... 75
A.5.1 Preservation front-End ...... 75
A.5.2 Preservation middleware ...... 75
A.5.3 Preservation DB of user profiles ...... 75
A.5.4 RO assessment ...... 76
A.5.4.1 Checklist component ...... 76
A.5.4.2 Stability component ...... 76
A.5.5 RO notification and preservation component in ROHUB ...... 76


List of Figures

Figure 1-1 Work package dependencies ...... 12
Figure 1-2 EVER-EST VRE architecture (updated version from D5.1) ...... 14
Figure 2-1 ROHUB portal v1 main page ...... 18
Figure 2-2 ROHUB portal v1 RO main page ...... 19
Figure 2-3 ROHUB portal version 2 main page (1/2) ...... 20
Figure 2-4 ROHUB portal version 2 main page (2/2) ...... 21
Figure 2-5 ROHUB portal version 2 RO main page ...... 22
Figure 3-1 Workflow excerpt associated with a Land Monitoring research object ...... 24
Figure 3-2 Taverna components and related tools (source: https://taverna.incubator.apache.org/introduction/) ...... 25
Figure 5-1 OAIS Archive environment ...... 27
Figure 5-2 Overall preservation flow ...... 28
Figure 6-1 Keyword search component ...... 32
Figure 6-3 Faceted Filters List ...... 33
Figure 6-4 Featured research objects component ...... 33
Figure 6-5 Recent activity component ...... 33
Figure 6-6 SPARQL search component ...... 34
Figure 6-7 RO overview component ...... 35
Figure 6-8 RO advanced view component ...... 35
Figure 6-9 Resource navigation component ...... 36
Figure 6-11 Resource details - basic view component ...... 36
Figure 6-12 RO relations component ...... 36
Figure 6-13 RO quality component ...... 37
Figure 6-14 RO notifications component ...... 38
Figure 6-15 RO history component ...... 38
Figure 6-16 ROHUB sign-in component ...... 39
Figure 6-17 ROHUB sign-up component ...... 39
Figure 6-18 ROHUB my ROs view component ...... 40
Figure 6-19 ROHUB access control component ...... 40
Figure 6-20 ROHUB comments component ...... 41
Figure 6-21 RO create/delete component ...... 41
Figure 6-23 RO overview manager ...... 42
Figure 6-24 RO relations manager component ...... 42
Figure 6-25 RO resource manager (basic annotations view) ...... 43
Figure 6-26 RO resource manager (advanced annotations view) ...... 43
Figure 6-27 RO evolution manager ...... 44
Figure 6-28 WF-RO component in ROHUB ...... 45
Figure 6-29 Cloud Platform Dashboard with Resource Usage ...... 46
Figure 6-30 Cloud Platform Dashboard Listing VM Templates ...... 46


Figure 6-31 RO monitoring tool ...... 47
Figure 6-32 RO keyword-based search ...... 48
Figure 6-33 RO faceted-based search ...... 48
Figure 6-34 Visualise research objects in ROHUB ...... 49
Figure 6-35 Sign-in in ROHUB ...... 50
Figure 6-37 Create RO from scratch ...... 51
Figure 6-38 Upload a resource in a RO ...... 51
Figure 6-39 Upload an RO bundle ...... 51
Figure 6-40 Add a folder in a RO ...... 52
Figure 6-41 Import annotations file ...... 52
Figure 6-42 Add resource type annotation ...... 53
Figure 6-43 Add comment ...... 53
Figure 6-44 Change RO access control settings ...... 55
Figure 6-45 Transforming and extracting metadata of a Workflow in ROHUB ...... 56
Figure 6-46 Cluster processing self-service - Master node ...... 58
Figure 6-47 Cluster processing self-service – Slave node(s) ...... 59

List of Tables

Table 1 Comparison of technologies between ROHUB portal v1 and portal v2 ...... 22
Table 2 Comparison of capabilities between ROHUB portal v1 and portal v2 ...... 22


Definitions and Acronyms

Acronym Description
ABAC Attribute Based Access Control
ACL Access Control List
AJAX Asynchronous JavaScript and XML
AMQP Advanced Message Queuing Protocol
API Application Programming Interface
BAPI Business Application Programming Interface
CMS Content Management System
CRL Certificate Revocation List
CRUD Create Read Update Delete
CSV Comma-Separated Values
DOI Digital Object Identifier
E-PDSC Extended-Preservation Dataset Content
EAI Enterprise Application Integration
EDA Event Driven Architecture
EDI Electronic Data Interchange
EO Earth Observation
ERP Enterprise Resource Planning
ES Earth Science
ESA European Space Agency
EVAT Expert-user Visual Analysis Tool
EVER-EST European Virtual Environment for Research - Earth Science Themes
FTP File Transfer Protocol
FTPS FTP over SSL
GIF Graphics Interchange Format
GUI Graphical User Interface
HTML Hypertext Mark-up Language
HTTP Hypertext Transfer Protocol
HTTPS HTTP over TLS, HTTP over SSL, and HTTP Secure
IDE Integrated Development Environment
IIOP Internet Inter-ORB Protocol
IMAP Internet Message Access Protocol
IO Input Output
IPR Intellectual Property Rights
IS Identity Server


ISO International Organization for Standardization
IT Information Technology
IWA Integrated Windows Authentication
JIT Just In Time
JMX Java Management Extension
JPEG Joint Photographic Experts Group
JSON JavaScript Object Notation
LDAP Lightweight Directory Access Protocol
LGPL Lesser General Public License
MEA Multi-sensor Evolution Analysis
MLLP Minimum Lower Layer Protocol
MOM Message Oriented Middleware
MQTT MQ Telemetry Transport
OAGIS Open Applications Group Integration Specification
OASIS Organization for the Advancement of Structured Information Standards
OCSP Online Certificate Status Protocol
OGC Open Geospatial Consortium
OPI Operator Panel Interface
PBAC Policy Based Access Control
PDSC Preservation Dataset Content
PNG Portable Network Graphics
POP Point of presence
RBAC Role Based Access Control
RDF Resource Description Framework
REST Representational State Transfer
RO Research Object
RODL Research Object Digital Library
RSS Rich Site Summary
SFTP Secure File Transfer Protocol
SLA Service Level Agreement
SMTP Simple Mail Transfer Protocol
SPARQL SPARQL Protocol and RDF Query Language
SQL Structured Query Language
SSO Single Sign-On
SVG Scalable Vector Graphics
TCP Transmission Control Protocol
UDP User Datagram Protocol


URI Uniform Resource Identifier
URL Uniform Resource Locator
VAC Visual Analysis Client
VRC Virtual Research Community
VRE Virtual Research Environment
WCF Windows Communication Foundation
WCPS Web Coverage Processing Service
WCS Web Coverage Service
WebCGM Web Computer Graphics Metafile
WFMS Workflow Management Systems
WFS Web Feature Service
WMS Web Map Service
XML Extensible Mark-up Language

Applicable Documents

Document ID Document Title EVER-EST DEL WP5-D5.1 VRE Architecture and Interfaces Definition

Reference Documents

Document ID Document Title
[1] VRE Architecture and Interfaces Definition. EVER-EST deliverable D5.1. March, 2016
[2] Virtual Research Environment detailed definition of use cases. EVER-EST deliverable D3.1. March, 2016
[3] Technical note on digital information and e-collaboration services. EVER-EST deliverable D5.3. November, 2016
[4] Design, implementation and deployment of Research Objects components for Earth Science Phase 1. EVER-EST deliverable D4.3. November, 2016
[5] Design, implementation and deployment of Workflow Integrity and Authenticity Maintenance components – Phase II. Wf4Ever deliverable D4.2v2. July, 2013
[6] B. Hales and P. Pronovost. The checklist - a tool for error management and performance improvement. Journal of Critical Care, vol. 21, no. 3, pp. 231-235, 2006.
[7] S. Perez, O. Corcho, R. Palma, P. Holubowicz. Knowledge Hub: Best Practices for Archival Processing of Research Objects (a librarian view). Technical Document.


1 Introduction

1.1 Purpose and scope

The purpose of this document is to provide technical information on the current implementation and usage of the e-research services that contribute to the integrated EVER-EST Virtual Research Environment. The document includes:
● The context of the main features addressed by the e-research services (RO management, workflow management, execution management, and preservation).
● Technical information on the visual components provided by the services.
● Technical information on how to use the services.
● Technical description of the services' APIs.
● Associated command-line tools (if available).

Additionally, the document includes an annex with the installation guides of the services, for technical users who would like to replicate the environment. The information provided in this document is technical and is not intended to provide a general description of each component, as this was already covered by D5.1. The components described in this document are those already deployed and in use, which are currently being integrated with the rest of the components. Other components of the e-research services (see Section 1.4) are still under implementation or testing, and will therefore be described in the next version of this document. Note that although this document provides short guides on how to use the services, these are not intended to be comprehensive user guides, but rather technical information on how to interact with the services.

1.2 Relation to other workpackages

This deliverable is directly related to deliverables D5.1 (VRE Architecture and Interfaces Definition) and D4.3 (Design, implementation and deployment of Research Objects components for Earth Science Phase 1) of work packages 5 and 4, respectively. D5.1 provides a high-level description of the EVER-EST Virtual Research Environment (VRE) architecture along with its components, and establishes a common understanding of the technical approach and available technologies. D4.3, in turn, presents the main mechanism for research object retrieval in the VRE, the search engines and the recommender system based on the collaboration spheres metaphor, the checklist design for Earth Observation research objects and its implementation in ROHUB to assess research object quality, the efforts toward the generation of DOIs for research objects, the new research object life cycle, and the programs developed to harvest research object citation information. Thus, the relation between this document and D4.3 is very strong: D4.3 presents the research challenges associated with the management of research objects, the approach taken to solve these challenges, and the implementation of the solutions, whereas this document provides a more technical view of the resulting components (APIs, visual components, usage, installation) and of how these services are integrated in a more general framework to provide the e-research capabilities of the EVER-EST virtual research environment. Note that D5.1 and D4.3 are, in turn, directly related to D3.1 (Virtual Research Environment detailed definition of use cases), which describes the use cases representing the different needs of the research communities that are part of the project. Figure 1-1 shows the links between the corresponding work packages in EVER-EST.


Figure 1-1 Work package dependencies

1.3 Who shall read this document

This document is intended for technical readers who are interested either in the technical aspects of the EVER-EST components regarding the e-research capabilities (e.g., to integrate or use them from other services), or in deploying the e-research components in other environments. For a general description of the components and their characteristics, please refer to D5.1.

1.4 System context

With reference to the general architecture of the EVER-EST virtual research environment described in D5.1, and depicted with some improvements in Figure 1-2, this document deals with the components encapsulated in the e-research box at the top of the diagram, along with the middleware services connected with RO management, workflow management and the execution environment (cloud platform). In particular, these include:
● ROHUB portal (in the bottom left corner of the diagram), implementing the yellow functionalities of the RO access & usage box through interaction with the ROHUB backend service, and providing a holistic solution for the management of research objects.
● EVER-EST portal (above ROHUB in the diagram), implementing a subset of the yellow functionalities of the RO access & usage box, tailored for users in the Earth Science communities, through interaction with the ROHUB backend service.
● Collaboration spheres (between the yellow/orange boxes), which interacts with the ROHUB backend service and with other new RO-related services under development, including the recommendation service, to provide an advanced exploratory interface for the discovery of research objects.
● RO assessment (between the yellow/orange boxes), which interacts with the checklist and stability evaluation services, providing measures of the quality of a research object.


● RO impact analytics (between the yellow/orange boxes), which interacts with the impact metrics service to provide measures of the impact of a research object in a community, its citations and applications.
● RO resources execution and the execution workbench components (yellow/orange boxes), which interact with the workflow runner service for the remote execution of scientific workflows and/or with the cloud platform to instantiate, deploy and manage the required execution environments and execution engines.
● Other RO-related services used by or integrated in ROHUB, including the workflow transformation (WF-RO) service, which converts workflows into research objects and extracts the workflow metadata into explicit annotations, and the notification and preservation components inside ROHUB, which, together with the RO assessment services, support the long-term preservation of research objects.

Of these components, this document describes ROHUB (portal and backend), RO assessment (checklist and stability), WF-RO transformation, the workflow runner, the cloud platform, and the notification and preservation components. In addition to the RO-related preservation components, this document also covers more specific data preservation challenges and the initial design of the corresponding components. Regarding the components not covered in this document, the RO components in the EVER-EST portal are described in D5.3 together with the rest of the portal, while the collaboration spheres, semantic search, recommendation and scholarly communication services (explained in D4.3) will appear in the next version of this document, as they are still under development or testing. Similarly, the data preservation components, currently under development, will be described in more detail in the next version of the document.

1.5 Services list and their relation to D5.1

To facilitate the reading of this document after the release of D5.1, this section clarifies how the current list of services presented in this document maps to the structure and categorisation used in D5.1.
● In D5.1, ROHUB was introduced as a major component of the VRE portal. The reasoning was that one of the three main objectives of EVER-EST is to implement and validate the use of “research objects” in Earth Science. This requires functionalities to discover, create, manage, share, (re)use and preserve the research artefacts related to the VRC use cases/studies through the research object (RO) concept and tools. However, as described in Section 3.1 of D5.1, ROHUB plus other RO services, along with the cloud platform, workflow management components and preservation components, address the e-research functionalities to be provided by the EVER-EST VRE. Hence, this technical note on e-research services covers ROHUB (portal and backend) plus all other RO-related services.
● In D5.1, the VRE portal mainly covered the ROHUB portal. This has been revised and clarified in the current set of deliverables, particularly D5.3 and D5.4, where the EVER-EST portal was introduced (D5.3) in place of the VRE portal, in addition to the ROHUB portal (D5.4). Note, however, that both portals enable end-users to use, access and manage research objects, to different degrees and levels of detail (as discussed in Section 2).
● The technical documentation of the cloud platform component in this document corresponds directly to the cloud platform description in D5.1 within the e-research services.
● The technical documentation of the workflow management components in this document corresponds directly to the same components in D5.1 within the e-research services. This section, however, covers not only the workflow runner service, as in D5.1, but also the workflow transformation service (WF-RO).
● The technical documentation of the preservation components in this document also corresponds to the component descriptions in D5.1, with the following changes:
  o The data preservation packager and preservation assistant described in D5.1 have been removed and replaced by the Preservation Front-End, Preservation Middleware and Preservation DB of User Profiles components. The reason for this change is that an analysis of the originally proposed components, which resulted from the SCIDIP-ES project, concluded that they were not mature or stable enough. In addition, the Enterprise Service Bus used to support other services in the VRE (see D5.1) can be leveraged to support the required capabilities with the new components.
  o The checklist and stability/reliability services are grouped in this document under the RO assessment subsection of the preservation components, but correspond directly to the component descriptions in D5.1 within the preservation e-research services.
  o The notification and preservation components in ROHUB also correspond directly to the component descriptions in D5.1 within the preservation e-research services.

Figure 1-2 EVER-EST VRE architecture (updated version from D5.1)

1.6 Document structure

This document is structured as follows:
● Section 2 describes the ROHUB portal (both the current portal and the new one under development) and its capabilities as the main user interface for research object management.
● Section 3 introduces the workflow management capabilities and associated components.
● Section 4 introduces the cloud platform for the management of execution environments.
● Section 5 discusses the preservation challenges, the application of the OAIS model in EVER-EST, and the different capabilities to be provided by the associated components.
● Section 6 provides the technical documentation of the individual components supporting the e-research capabilities, including the visual components provided by the services, a technical guide on how to use and interact with the services, information about the APIs implemented by the services, and any related command-line tools available.
● Finally, Section 7 briefly discusses future work, and Annex A provides installation guides for each of the services.

2 Research Object Management

One of the key goals of the EVER-EST project is the adoption and implementation of the research object concept and technologies to support the research work and collaboration of scientists in Earth Science. To this end, EVER-EST is leveraging the existing ROHUB system, which supports the complete management and preservation lifecycle of research objects, and is extending it to cover the specific needs of scientists in the Earth Science domain. ROHUB comprises both a backend service (RODL), implementing and exposing a set of RESTful APIs, and a reference web client application (ROHUB portal), exposing the research object functionalities to end-users (scientists and researchers). In the context of the EVER-EST VRE, RODL is one of the central middleware components that supports (and interacts with) the other components in the architecture (as depicted in Figure 1-2). For end-users, however, the VRE provides two frontends to use, access and manage research objects: the EVER-EST portal (plus the specific VRC portals) and the ROHUB portal. The EVER-EST portal enables scientists of the four VRCs and interested users (e.g., citizens) to discover, produce and manage their research work in Earth Observation (EO) through Research Object (RO) and EO tools. This portal is the main entry point to the VRC portals, each of which provides a dedicated working environment interface customised to fit the VRC needs. Internally, these portals use research objects as the underlying mechanism to manage and preserve their work (interacting with RODL). However, they abstract the research object terminology and details away from the user interface in order to provide tailored access to (some of) the research object management capabilities. The ROHUB portal, on the other hand, provides an alternative, management-oriented interface exposing the full set of research object management capabilities to end-users.
It is intended for more technical or advanced users who are already familiar with research objects, or who would like to analyse or manage research objects in more detail. So, while in the ROHUB portal the user may need to perform multiple individual operations to build a research object (create, annotate, add resources, etc.), the EVER-EST portal may encapsulate all these operations in a single control. Note, however, that the ROHUB portal can also support inter-VRC collaboration, for instance by providing richer interfaces to discover research objects and resources based on cross-domain characteristics, or by enabling open discussions around them. In the remainder of this section the ROHUB portal is described in more detail, while the EVER-EST portal (plus the specific VRC portals) is described in D5.3 (as noted in Section 1.4). The ROHUB backend APIs are discussed in Section 6.4.1.
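As an illustration of how a client might talk to the RODL RESTful API (detailed in Section 6.4.1), the following Python sketch builds, without sending, two HTTP requests: one asking RODL to create an empty research object, and one retrieving RO metadata in a chosen RDF serialisation through content negotiation. The base URL, the use of the `Slug` header to propose the RO name and bearer-token authentication are assumptions made for illustration only; the actual endpoints and headers should be taken from the RODL API documentation.

```python
from urllib.request import Request

# Hypothetical base URL of a RODL deployment (replace with the real service address).
RODL_BASE = "https://rodl.example.org/ROs/"

# Media types used to negotiate the RDF serialisation of RO metadata (assumed supported).
RDF_FORMATS = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
}

def create_ro_request(ro_name: str, token: str) -> Request:
    """Build, but do not send, a POST request asking RODL to create an empty RO."""
    return Request(
        RODL_BASE,
        data=b"",
        method="POST",
        headers={
            "Slug": ro_name,                     # assumed: proposed name of the new RO
            "Authorization": f"Bearer {token}",  # assumed: bearer-token authentication
            "Accept": RDF_FORMATS["turtle"],
        },
    )

def metadata_request(ro_uri: str, fmt: str = "turtle") -> Request:
    """Build a GET request for an RO's metadata in the requested serialisation."""
    return Request(ro_uri, method="GET", headers={"Accept": RDF_FORMATS[fmt]})
```

Passing either request to `urllib.request.urlopen` would send it; keeping request construction separate makes the requests easy to inspect before they reach the service.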

2.1 ROHUB portal overview

The ROHUB portal is a web client application providing a comprehensive user interface for the management and preservation of research objects (ROs). The ROHUB portal integrates and provides access to different research object services, including:
● The core RO backend services, provided by the RODL component, enabling the creation, storage, maintenance and access to research objects, the management of their lifecycle, and their preservation.
● RO notification service, enabling subscription to events related to a particular research object (e.g., changes in content or quality), or to the portal itself (e.g., when new ROs are created).
● WF-RO transformation service, enabling the transformation of workflows into research objects, and the exposure of the annotations and resources embedded in the workflow according to the RO model.
● RO checklists service, providing remote access to the minim-based evaluation of research objects, used to test for completeness, runnability or repeatability.


● RO stability service, enabling the evaluation of the RO over time by capturing concrete values provided by the checklist service at different moments of its evolution. It allows testing the ability of a research object to achieve its original purpose after its resources have been subject to changes.
● RO monitoring tool, providing an interface to visually monitor and keep track of the status of external datasets and web services required for workflow execution. It is based on the stability service.
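To make the checklist and stability notions more concrete, the sketch below uses a simplified, hypothetical model: each requirement has a MUST/SHOULD/MAY level (as in minim checklists) and a test over a dictionary of RO facts, the score is the fraction of requirements passed, and stability is illustrated as one minus the population standard deviation of the scores recorded for successive snapshots. None of these names, weights or formulas are taken from the RO checklist or stability services; they only show the general shape of the computation.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Callable, Dict, List

@dataclass
class Requirement:
    """One checklist item: a level (MUST/SHOULD/MAY) and a test over RO facts."""
    description: str
    level: str
    test: Callable[[Dict[str, object]], bool]

def evaluate(ro: Dict[str, object], checklist: List[Requirement]) -> Dict[str, object]:
    """Report whether all MUST requirements hold and the fraction of requirements passed."""
    passed = [r for r in checklist if r.test(ro)]
    must_ok = all(r.test(ro) for r in checklist if r.level == "MUST")
    # Illustrative scoring only: every requirement has equal weight.
    score = len(passed) / len(checklist) if checklist else 1.0
    return {"must_satisfied": must_ok, "score": score}

def stability(scores: List[float]) -> float:
    """Illustrative stability: 1 minus the spread of checklist scores across snapshots."""
    if len(scores) < 2:
        return 1.0
    return 1.0 - pstdev(scores)

# Hypothetical checklist for an Earth Science RO.
CHECKLIST = [
    Requirement("RO has a description", "MUST", lambda ro: bool(ro.get("description"))),
    Requirement("RO includes a workflow", "SHOULD", lambda ro: bool(ro.get("workflows"))),
    Requirement("RO declares its input datasets", "SHOULD", lambda ro: bool(ro.get("datasets"))),
    Requirement("RO links to a publication", "MAY", lambda ro: bool(ro.get("publications"))),
]

example_ro = {"description": "Flood mapping study", "workflows": ["flood.t2flow"], "datasets": []}
result = evaluate(example_ro, CHECKLIST)
```

For the example RO, the MUST requirement holds and two of the four requirements pass, so the score is 0.5; a series of identical scores across snapshots yields a stability of 1.0.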

Additionally, the portal provides interfaces for access control and user management, which are also part of the core services implemented by RODL. There are currently two implementations of the ROHUB portal: the old implementation (ROHUB portal v1), which is a result of the Wf4Ever project, and a completely new portal (ROHUB portal v2), which is being implemented as part of the EVER-EST project activities. Note, however, that the old implementation has been improved and extended as part of EVER-EST, as it is still the main interface used by the VRCs for the creation and management of their research objects. For instance, it has been integrated with the EVER-EST identity management component, and the set of RO templates has been extended to cover the needs of Earth Science users. Portal v1 enabled the research object team to introduce and demonstrate research objects in practice, and also enabled VRC members to understand the concepts and create their first research objects from an early stage of the project. However, it has been decided to implement a completely new portal following a more modular approach in order to:
● Facilitate the integration of components in other portals, particularly the EVER-EST portal.
● Improve usability and performance.
● Update the technologies used in the implementation.
The following section introduces the two portals and their level of maturity.
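The integration with the EVER-EST identity management component is expected to rely on OpenID Connect, the protocol agreed for the EVER-EST identity provider (see Section 2.2.1). The sketch below shows the first step of the standard OpenID Connect authorization code flow from the portal's side: building the authorization request URL with the `openid` scope, a `state` value and a `nonce`, and checking the `state` echoed back on the callback. The endpoint URL, client identifier and redirect URI are hypothetical placeholders, and the token exchange and ID token validation steps are deliberately omitted.

```python
import secrets
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical settings: real values come from the EVER-EST identity provider configuration.
AUTHORIZATION_ENDPOINT = "https://idp.example.org/oidc/authorize"
CLIENT_ID = "rohub-portal"
REDIRECT_URI = "https://rohub.example.org/oidc/callback"

def build_authorization_url(scope: str = "openid profile email") -> Tuple[str, str, str]:
    """Return (url, state, nonce) for an OpenID Connect authorization code request."""
    state = secrets.token_urlsafe(16)  # lets the client match the callback to this request
    nonce = secrets.token_urlsafe(16)  # to be compared later with the nonce claim in the ID token
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": scope,
        "state": state,
        "nonce": nonce,
    }
    return f"{AUTHORIZATION_ENDPOINT}?{urlencode(params)}", state, nonce

def callback_is_valid(callback_url: str, expected_state: str) -> bool:
    """Accept the callback only if it carries a code and the state we generated."""
    query = parse_qs(urlparse(callback_url).query)
    return query.get("state") == [expected_state] and "code" in query
```

The portal would store `state` and `nonce` in the user's session before redirecting the browser to the returned URL, and reject any callback whose `state` does not match.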

2.2 ROHUB portal developments

The ROHUB portal v1 has been the main user interface for ROHUB since the product was released in 2012, and it will continue to be used until the official release of ROHUB portal v2 (first half of 2017). The use of v1 enabled the collection of substantial feedback from different user communities regarding their expectations of such an interface for RO management. It also provided the means to run hackathons in EVER-EST from a very early stage, to familiarise the VRC users with the RO concepts as soon as possible. The results and feedback received from the hackathons are a key input driving the development of v2 of the portal.

2.2.1 ROHUB Portal v1 (production prototype)

ROHUB Portal v1 is based on (6.9.0) from mid 2013. Since its release in 2012, however, many new web development frameworks have matured, while other technologies used in the portal have been, or are becoming, deprecated. For instance, the portal implemented the OpenID specification from 2008, and since then many providers have switched to the newer OpenID Connect standard, which is also the protocol agreed for the EVER-EST identity provider. This required an update of the implementation. Similarly, it was necessary to update some libraries and the associated code to be able to process RO bundles generated by the latest version of the Taverna workflow management framework, since the underlying specifications were recently updated. Moreover, the technology used for v1 does not allow the individual components to be integrated in other portals. Since one of the envisioned capabilities in the project is to support the integration of some ROHUB features in the EVER-EST portal, it was decided to start the implementation of v2 of the portal. Section 2.3 presents two tables comparing the technologies and capabilities of portal v1 and portal v2.
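RO bundles, as produced by Taverna, are ZIP archives that follow the RO Bundle conventions: a leading, uncompressed `mimetype` entry containing `application/vnd.wf4ever.robundle+zip`, and a JSON manifest at `.ro/manifest.json` whose `aggregates` list names the bundled resources. The following sketch checks those two conventions with Python's standard library and returns the aggregated resource URIs. It is a minimal reader for illustration, not the parsing code used in ROHUB, and it ignores annotations and the other manifest fields.

```python
import io
import json
import zipfile
from typing import List

ROBUNDLE_MEDIATYPE = "application/vnd.wf4ever.robundle+zip"

def read_bundle_aggregates(data: bytes) -> List[str]:
    """Check the basic RO Bundle layout and return the URIs of aggregated resources."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        # The first entry must be an uncompressed 'mimetype' file naming the bundle type.
        if not infos or infos[0].filename != "mimetype" or infos[0].compress_type != zipfile.ZIP_STORED:
            raise ValueError("missing leading, uncompressed 'mimetype' entry")
        if zf.read("mimetype").decode("ascii").strip() != ROBUNDLE_MEDIATYPE:
            raise ValueError("mimetype does not declare an RO bundle")
        # The manifest lists the bundled resources under 'aggregates'.
        manifest = json.loads(zf.read(".ro/manifest.json").decode("utf-8"))
    return [item["uri"] for item in manifest.get("aggregates", [])]
```

An archive that lacks the leading `mimetype` entry, or declares a different media type, is rejected with `ValueError` before the manifest is read.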


Nevertheless, portal v1 has been a very useful resource in EVER-EST because it allowed users to become familiar with RO concepts, to provide feedback on their expectations, and to attract the attention of other users in the Earth Science domain. The portal main page provides the following features (as depicted in Figure 2-1):
● Basic information about research objects and the portal itself.
● A view listing the three latest research objects created or uploaded to ROHUB.
● A view listing three featured research objects, representing golden exemplars.
● A search text box for finding research objects using keywords.
● Direct links to (i) the login page to authenticate the current user, (ii) the explore page for finding and browsing research objects, (iii) the my research objects page to visualise the list of research objects owned by the authenticated user and to create new research objects, (iv) the SPARQL endpoint user interface, and (v) the about page, with information and links to documentation and publications about the product.

Moreover, after a research object is selected, the portal shows the related RO main page (depicted in Figure 2-2), providing the following high-level features:
● Overview tab, which displays the basic research object metadata, including a graphical representation (if available), information about its quality based on a basic checklist, links for downloading the research object as a zip file or just its metadata (in different formats), links to manage the research object evolution (e.g., create a snapshot or archive), information about the notifications associated with the research object with a link to the Atom feed, and a section for adding and visualising comments at the research object level.
● Content tab, to navigate the RO content, including controls for adding, updating, moving, deleting and downloading resources, and controls for annotating and commenting on resources.
● Relations tab, to add and delete relations between resources in the research object.
● Quality tab, to evaluate the quality of the research object based on different criteria, and to visualise the stability of the research object over time with the RO monitoring tool.
● Notification tab, to visualise detailed information on the notifications associated with the research object.
● History tab, to visualise the evolution of the research object in terms of the related live RO, snapshots and archive.
● Access control tab, to visualise and modify the permissions associated with the research object.

The instructions for performing particular operations are provided in Section 6.3.2.

2.2.2 ROHUB portal v2 (beta prototype)
The new portal is based on the Play framework, which provides a lightweight, stateless, web-friendly architecture, and on AngularJS, a structural framework for dynamic web apps. The combined use of these technologies enables the creation of a modular web application with a set of visual components that can easily be reused in other applications. Section 2.3 presents two tables comparing the technologies and capabilities of portal v1 and portal v2. The development of the portal is being conducted in two parallel threads. The first thread deals with the implementation of the visual components required to provide the functionalities already available in portal v1, plus the extensions identified from the specific needs of users in the Earth Science domain. The second thread has focussed on creating a set of mock-ups for the new portal design based on a usability analysis. In the next period, work will therefore focus on implementing these mock-ups through the visual components. The list of components under development is discussed in Section 6.2.1, and the mock-ups of the main page and RO page are depicted in Figure 2-3, Figure 2-4 and Figure 2-5.


Figure 2-1 ROHUB portal v1 main page


Figure 2-2 ROHUB portal v1 RO main page


Figure 2-3 ROHUB portal version 2 main page (1/2)


Figure 2-4 ROHUB portal version 2 main page (2/2)


Figure 2-5 ROHUB portal version 2 RO main page

2.3 Comparison of technologies and capabilities between ROHUB portal v1 and portal v2

Table 1 Comparison of technologies between ROHUB portal v1 and portal v2

ROHUB Portal v1 | ROHUB Portal v2
Apache Wicket 6.9.0 | Play Framework 2.5 + AngularJS v2
OpenID Authentication 2.0, updated to support OpenID Connect 1.0 | OpenID Connect 1.0
Research Object Bundle 2013, updated to Research Object Bundle 1.0 | Research Object Bundle 1.0
Solr 4.2, updated to support OpenSearch 1.1 | Solr 6 + OpenSearch 1.1
Jena 2.11 | RDF4J 2.0

Table 2 Comparison of capabilities between ROHUB portal v1 and portal v2

ROHUB Portal v1 | ROHUB Portal v2
No modular design | Modular design
No visual components | Visual components
No UX design | UX design
No API | REST API

3 Workflow Management
The use of workflows in scientific disciplines goes beyond the automation of the research process and extends to other important aspects of research outputs, such as scholarly communication. Workflows help to reproduce and reuse research results; they can be repurposed for other goals or used as training material about the research at hand. Workflows are first-class citizens of the research object model and are the mechanism for specifying the computational process that supports the scientific research.

Workflow systems are used intensively in different scientific disciplines, including cheminformatics, medicine, astronomy and social science, among others. The experimental science community is a remarkable case, since it has adopted workflows as an integral part of the research process. Nevertheless, one of the conclusions drawn from D4.1 was that in Earth Observation the use of workflows is still incipient, and it is characterised by a long tail of executable resources, including code in programming languages such as R, Matlab, Fortran and Java, some web services, and cloud virtualization. The adoption of workflows by Earth Science communities represents a challenge, and it has the potential to positively influence their research processes.

The e-Research services ease the use of workflows for Earth scientists. This section provides a short introduction to executable workflows, given their growing use in research disciplines and the existing support provided by research objects, both in terms of models and technologies.

The workflow description ontology (wfdesc), which is part of the research object vocabularies, allows the description of abstract workflows, which can either be hand-crafted by users ("ideal workflow description") or extracted from the workflow definitions of existing workflow systems. The Workflow Execution Ontology (wfprov) supports the description of the provenance information generated by the execution of a scientific workflow. Note that this approach keeps workflow templates separate from their execution results.
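As an illustration of this separation, the following Python sketch writes a wfdesc description of a workflow template and a wfprov description of one of its runs as two independent Turtle documents. It is a minimal, standard-library-only sketch: the namespace IRIs and the terms wfdesc:Workflow, wfdesc:Process, wfdesc:hasSubProcess, wfprov:WorkflowRun, wfprov:describedByWorkflow, wfprov:ProcessRun, wfprov:describedByProcess and wfprov:wasPartOfWorkflowRun come from the Wf4Ever vocabularies, while the function names and example.org IRIs are hypothetical.

```python
# Namespace IRIs of the Wf4Ever workflow vocabularies.
WFDESC = "http://purl.org/wf4ever/wfdesc#"
WFPROV = "http://purl.org/wf4ever/wfprov#"
PREFIXES = f"@prefix wfdesc: <{WFDESC}> .\n@prefix wfprov: <{WFPROV}> .\n"


def describe_workflow(wf_iri: str, steps: list[str]) -> str:
    """Workflow template (wfdesc): the abstract steps, with no execution data."""
    lines = [PREFIXES, f"<{wf_iri}> a wfdesc:Workflow ."]
    for step in steps:
        lines.append(f"<{wf_iri}> wfdesc:hasSubProcess <{step}> .")
        lines.append(f"<{step}> a wfdesc:Process .")
    return "\n".join(lines) + "\n"


def describe_run(run_iri: str, wf_iri: str, step_runs: dict[str, str]) -> str:
    """Execution provenance (wfprov): one run, linked back to the template."""
    lines = [
        PREFIXES,
        f"<{run_iri}> a wfprov:WorkflowRun ;",
        f"    wfprov:describedByWorkflow <{wf_iri}> .",
    ]
    for step_run_iri, step_iri in step_runs.items():
        lines.append(f"<{step_run_iri}> a wfprov:ProcessRun ;")
        lines.append(f"    wfprov:describedByProcess <{step_iri}> ;")
        lines.append(f"    wfprov:wasPartOfWorkflowRun <{run_iri}> .")
    return "\n".join(lines) + "\n"


# Hypothetical IRIs for a two-step workflow and one of its runs.
wf = "http://example.org/wf/flood-mapping"
template = describe_workflow(wf, [wf + "#download", wf + "#classify"])
run = describe_run("http://example.org/run/1", wf,
                   {"http://example.org/run/1/classify": wf + "#classify"})
print(template)
print(run)
```

The template can be stored and reused unchanged, while each execution adds a new wfprov document that only refers to it.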

This separation of workflow templates and executions is a key feature of the research object model: conceptual workflows may be run in different workflow systems through simple end-to-end mappings, so that abstract workflows can be enacted on a particular scientific workflow system such as Taverna1, Wings2, Kepler3 or Galaxy4. Deliverable D4.3 identifies the research object types of interest for the EVER-EST research communities, and workflow-centric research objects were regarded as very prominent for their research process. ROHUB provides the technological support for the lifecycle of workflow-centric research objects as well as for the other research object types.

1 https://taverna.incubator.apache.org/
2 http://www.wings-workflows.org/
3 https://kepler-project.org/
4 https://galaxyproject.org/

Figure 3-1 Workflow excerpt associated with a Land Monitoring research object

3.1 Workflow modelling
The research object model is not tied to any particular workflow management system. However, the Apache Taverna workflow management system is worth highlighting for two reasons: (i) it has a plugin that exports the provenance of a workflow execution as a research object, which can then be imported, for example, into ROHUB; and (ii) the WF-RO transformation service, integrated in ROHUB, extracts annotations and resources from a Taverna workflow specification in order to generate a research object encapsulating the workflow itself along with all the extracted resources and annotations (see Section 6.3.4.1 for information on how to use it, and Section 6.4.3 for the implemented API).

Apache Taverna is a suite of tools for designing and executing scientific workflows. Taverna includes a desktop application (Workbench) and a web application (Taverna Online) for workflow design, a command line tool for quick workflow execution, a server that supports the remote execution of workflows, and a web interface plugin for submitting workflows for remote execution. Taverna is an open source project supported by the Apache Software Foundation, and all the documentation regarding prerequisites, installation, uninstallation and user guides is available at the project web page https://taverna.incubator.apache.org/.


Figure 3-2 Taverna components and related tools (source: https://taverna.incubator.apache.org/introduction/)

With Taverna, users can easily design new workflows, load existing workflows (from disk, myExperiment or a URL), view workflow layout and logic, modify existing workflows, load workflows in off-line mode (when disconnected from the Internet), include nested workflows (sub-workflows), validate workflows at design time for debugging while composing them, and detect when a service’s interface changes or a service goes off-line during design time. In addition, the platform can be used to run workflows and keep track of the execution results. A description of Taverna installation and functionalities is beyond the scope of this document, for two reasons: Taverna covers a broad set of tools and scenarios (see Figure 3-2), in which workflows can include web services and code in a variety of languages (e.g., R and Java), and input data can come from spreadsheets or from user interaction via the web browser; moreover, it is a third-party project with a strong developer and user community that keeps both the software and its documentation up to date.

3.2 Workflow execution
Workflow management systems typically provide workflow execution capabilities within the application workbench, from the command line and from code; in some cases, they also support the remote execution of workflows using a server instance of the system. The workflow runner service leverages these capabilities. It provides an abstraction layer on top of the workflow management systems that supports the remote or programmatic execution of workflows, in order to generate research objects encapsulating the execution provenance. The service implements a lightweight REST API (see Section 6.4.3), and the current implementation is based on Taverna; however, the service specification is independent of the underlying workflow management system.
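The abstraction described above can be sketched in Python as an engine-independent front end with pluggable engine adapters. This is a hypothetical illustration, not the service's actual REST API: the class and method names are assumptions, execution is synchronous for brevity, and a trivial stand-in engine replaces Taverna so that the sketch runs on its own.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ExecutionResult:
    """Outputs and provenance of one run, to be packaged as a research object."""
    run_id: str
    outputs: dict
    provenance: list = field(default_factory=list)


class WorkflowEngine(ABC):
    """Engine-specific adapter (e.g., one wrapping a Taverna server)."""

    @abstractmethod
    def execute(self, workflow: bytes, inputs: dict) -> tuple[dict, list]:
        """Run the workflow and return (outputs, provenance records)."""


class WorkflowRunner:
    """Engine-independent front end: callers never depend on the engine type."""

    def __init__(self, engine: WorkflowEngine) -> None:
        self._engine = engine
        self._results: dict[str, ExecutionResult] = {}

    def submit(self, workflow: bytes, inputs: dict) -> str:
        outputs, provenance = self._engine.execute(workflow, inputs)
        run_id = str(uuid.uuid4())
        self._results[run_id] = ExecutionResult(run_id, outputs, provenance)
        return run_id

    def result(self, run_id: str) -> ExecutionResult:
        return self._results[run_id]


class EchoEngine(WorkflowEngine):
    """Stand-in engine for testing: copies inputs to outputs."""

    def execute(self, workflow: bytes, inputs: dict) -> tuple[dict, list]:
        return dict(inputs), [f"executed {len(workflow)} bytes of workflow"]


runner = WorkflowRunner(EchoEngine())
run_id = runner.submit(b"<workflow/>", {"region": "Etna"})
print(runner.result(run_id).outputs)  # {'region': 'Etna'}
```

Supporting another workflow management system would only require a new WorkflowEngine subclass; the runner's interface stays the same.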


4 Management of Execution Environments
The management of execution environments refers to the capabilities for specifying, launching and terminating the Virtual Machines (VMs) where scientists will execute their processes. The VMs are appliances running on an infrastructure managed by a Cloud Controller, and they are accessible as an independent domain once instantiated. These capabilities are provided by the Terradue Cloud Platform, an application integration framework for the EVER-EST communities. The Platform architecture is a Hybrid Cloud infrastructure that associates PSNC resources with a cloud controller managing scalable data management frameworks, which are exploited by a Developer’s application to deliver a Cloud Appliance. The Platform architecture also features distributed data repositories (Cloud storage and web-based resources) and a Portal application component, which interacts with the Cloud Appliance and provides user management and user interface functionalities. The primary purpose of the Hybrid Cloud Platform is to facilitate the management of elastic compute resources provided by PSNC. As the project evolves, the EVER-EST communities will be presented with more complex architectures in which these components and other types of application components can be combined and deployed to deliver operational services. A detailed description of the Cloud Platform and its internal architecture is provided in D5.1.

5 Preservation Aspects
The preservation components are the part of the EVER-EST infrastructure that ensures that the data, processes and algorithms of the scientific communities related to their experiments/investigations remain accessible and usable over time. To address preservation in EVER-EST, the project relies on applying the concept of research objects as the mechanism to encapsulate the resources used and/or produced, including processes and the provenance of their executions, with links to the relevant associated resources and metadata. The methodological approach to ensuring digital preservation in EVER-EST is based on a paradigm widely used in the Earth Science community, the OAIS model. This reference model, standardised as ISO 14721, defines the guidelines to design and implement methods and systems for long-term data/information/knowledge preservation and access. The OAIS model was extensively applied to the Earth Science domain in the SCIDIP-ES project, which served as a preliminary implementation environment for some of the key components of a preservation system. The complete implementation of the preservation components is foreseen in the final version of the VRE. This version of the technical note on e-Research application services describes the overall technological approach to preservation and the main components, and introduces the existing ones. The final version (D5.8) will contain the full design information and the installation procedures of all components.

5.1 Application of the OAIS model
An organization that has to preserve a small, large or huge amount of digital information for the long term can take advantage of the reference model for an Open Archival Information System (OAIS) provided by the Consultative Committee for Space Data Systems (CCSDS) (see https://public.ccsds.org/pubs/650x0m2.pdf). The OAIS model aims to facilitate an understanding of what is required to preserve and access information in the long term. Traditional data archives are commonly understood as facilities that preserve records, originally generated by an organization, for access by specific communities. As per the OAIS concept, “the archive accomplishes this task by taking ownership of the records, ensuring that they are understandable to the accessing community, and managing them so as to preserve their information content and authenticity”. In this context, an OAIS archive intends to preserve the data and the information needed for access and use by a Designated Community, in the environment depicted in Figure 5-1:


Figure 5-1 OAIS Archive environment
● Producer is the role played by those persons, or client systems, who provide the information to be preserved (represented by the VRCs in EVER-EST).
● Management is the role played by those who set overall OAIS policy. Note that Management is not involved in day-to-day Archive operations; the responsibility for managing preservation on a day-to-day basis lies with an administrative functional entity within the Archive.
● Consumer is the role played by those persons, or client systems, that interact with Archive services to find and acquire preserved information of interest. A special class of Consumers is the Designated Community, i.e., the set of Consumers who should be able to understand the preserved information. A given individual or system may act in the role of both a Consumer and a Producer.

Another basic concept of the OAIS model is the Information Package. An Information Package contains both the actual data to be preserved (content information) and the information needed to describe it and to allow its retrieval. In EVER-EST, an Information Package will thus be made up of the data about an experiment plus a set of auxiliary data and metadata necessary for its preservation.

This set of auxiliary data and metadata will be created by the user with the aid of a Preservation Assistant and inserted in the research object to be stored in ROHUB.
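The structure of an Information Package can be sketched as a Python data model. This is a simplified, hypothetical sketch: it follows the OAIS split between Content Information and Preservation Description Information (reference, provenance, context, fixity and access rights), but the class and field names are illustrative and Packaging Information is omitted. In EVER-EST terms, the Content Information would hold the experiment data, and the PDI the auxiliary metadata gathered with the Preservation Assistant.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ContentInformation:
    data_objects: list[str]                 # files or URIs of the experiment
    representation_info: dict[str, str]     # how to interpret each data object


@dataclass
class PreservationDescription:
    reference: str = ""                     # persistent identifier (e.g., a DOI)
    provenance: str = ""                    # origin and processing history
    context: str = ""                       # relation to other information
    fixity: dict[str, str] = field(default_factory=dict)  # object -> checksum
    access_rights: str = ""


@dataclass
class InformationPackage:
    content: ContentInformation
    pdi: PreservationDescription

    def missing_pdi(self) -> list[str]:
        """Names of PDI fields that are still empty, in declaration order."""
        return [f.name for f in fields(self.pdi) if not getattr(self.pdi, f.name)]


package = InformationPackage(
    ContentInformation(["flood_map.tif"], {"flood_map.tif": "GeoTIFF raster"}),
    PreservationDescription(reference="hypothetical-pid-0001",
                            provenance="Generated by the flood mapping workflow"),
)
print(package.missing_pdi())  # ['context', 'fixity', 'access_rights']
```

A Preservation Assistant could use a check like missing_pdi to tell the user which preservation metadata still has to be supplied.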

Note that a detailed analysis and mapping of the processing and preservation of research objects involving scientific workflows with respect to the OAIS model is available in [7], which also provides the recommended activities to comply with the OAIS requirements.

The foreseen preservation flow is depicted in Figure 5-2, which provides a detailed view of the preservation-related components shown in Figure 1-2.


Figure 5-2 Overall preservation flow

Once the user has finished, or reached a milestone in, her/his experiment and saved all the relevant data, processes and workflows, she/he can access the preservation functionality. The user clicks on the “Preserve” button in the graphical user interface (GUI) and is guided through the process of completing all the information needed to preserve the experiment. The user inserts information according to a checklist specific to her/his community and/or research object type. The relevant user profiles are stored in the database. Possible user classes are:
● Generic Earth Science User (accessing via the generic EVER-EST portal).
● User of a specific VRC.
● Data Providers.

Similarly, relevant research object types have been identified and described in D4.3, including:
● Basic research objects.
● Workflow-centric research objects.
● Data-centric research objects.
● Research products research objects.

The set of information (mandatory and optional) for each class of user must be defined with the aim of keeping the preservation preparation simple, while remaining compliant with the OAIS guidelines.

This will result in the definition of templates for the different user communities, i.e., sets of standard resources to be inserted in the research object and used to ensure the retrievability, understandability and usability of the content even after a long time.

During this preparation phase the user can add files and links to external resources provided with a persistent identifier.
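The template mechanism can be sketched as a mapping from user class to mandatory and optional fields, with a check run before preservation. The field names below are hypothetical placeholders, since the actual templates for each community are still to be defined.

```python
# Hypothetical preservation templates per user class (real fields still to be defined).
TEMPLATES = {
    "generic": {
        "mandatory": {"title", "description", "creator"},
        "optional": {"keywords"},
    },
    "vrc": {
        "mandatory": {"title", "description", "creator", "dataset_pid"},
        "optional": {"keywords", "processing_software"},
    },
    "data_provider": {
        "mandatory": {"title", "description", "creator", "license"},
        "optional": {"keywords"},
    },
}


def check_submission(user_class: str, metadata: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (missing mandatory fields, provided fields not defined in the template)."""
    template = TEMPLATES[user_class]
    provided = {name for name, value in metadata.items() if value}
    missing = sorted(template["mandatory"] - provided)
    unknown = sorted(provided - template["mandatory"] - template["optional"])
    return missing, unknown


missing, unknown = check_submission(
    "vrc", {"title": "Flood extent", "creator": "A. User", "notes": "draft"})
print(missing)  # ['dataset_pid', 'description']
print(unknown)  # ['notes']
```

Keeping the templates as data rather than code would let each community adjust its mandatory and optional fields without changing the checking logic.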


5.2 Digital Object Identifiers (DOIs) for Research Objects
One of the basic requirements for the management and preservation of digital artefacts is that these resources have a unique, resolvable and persistent identifier. This requirement can be addressed with the adoption of DOIs, which are persistent identifiers used to uniquely identify an object, either physical or digital. They are widely used by scientists in all disciplines, in particular in Earth Science, to share and cite research outcomes. Currently, however, DOIs are mainly used for scientific publications, and in some cases for datasets or other digital artefacts. In EVER-EST, it has been decided to apply this practice to research objects by generating and assigning DOIs to them. DOIs are assigned by registration agencies, which implement the DOI system by providing domain-specific identifiers for various applications using the DOI framework. For instance, DataCite generates DOIs for referencing and sharing scientific datasets. Application providers then contact these registration agencies, or become one of them, in order to generate and maintain DOIs for the artefacts managed by their applications. In EVER-EST, the application that should support DOIs is ROHUB, as the research object management platform. Hence, PSNC, as the application provider, is currently negotiating with DataCite to be able to assign DOIs. A more detailed description of DOIs and of the ongoing process is provided in D4.3. Note that, in relation to the different stages of the research object lifecycle (see D4.3), research object snapshots and archives are the only ones suitable to receive a DOI, as they are immutable. A live research object undergoes continuous modifications and is therefore not eligible for a DOI.
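The lifecycle rule above can be expressed as a small guard that a DOI registration step could apply. This is a hypothetical Python sketch: the stage names follow the lifecycle in D4.3, but the function names are assumptions, and the syntax check is a simplified form of the DOI name structure ("10." prefix, registrant code, "/", suffix), not DataCite's own validation.

```python
import re
from enum import Enum


class Stage(Enum):
    """Research object lifecycle stages (see D4.3)."""
    LIVE = "live"
    SNAPSHOT = "snapshot"
    ARCHIVE = "archive"


def doi_eligible(stage: Stage) -> bool:
    """Only immutable stages (snapshots and archives) may receive a DOI."""
    return stage in (Stage.SNAPSHOT, Stage.ARCHIVE)


# Simplified DOI name syntax: "10." + numeric registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(name: str) -> bool:
    return DOI_PATTERN.match(name) is not None


print(doi_eligible(Stage.LIVE), doi_eligible(Stage.SNAPSHOT))  # False True
print(looks_like_doi("10.1234/ro.snapshot.1"))                 # True
```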

5.3 Research objects assessment
Deliverable D4.1 introduced the metrics of completeness and stability as quantitative measures related to research object Integrity and Authenticity, two desirable features when it comes to research object preservation. In particular, completeness measures how complete the research object information is with respect to the minimum information it is expected to contain, while stability measures the degree to which a research object remains functionally unchanged with respect to its specification and properties in the presence of changes in its resources. The completeness measure is implemented through checklists that specify the desirable features of a research object, and it provides the basis for the stability measure, which assesses the research object completeness over time.

5.3.1 Research objects checklists
Checklists are a widely used tool for controlling and managing quality assurance processes [6], and they provide a measure of fitness for purpose rather than some overall measure of quality. In the EVER-EST project, checklists are used as the mechanism to evaluate research objects regarding their completeness, i.e., the existence of specific resources and metadata in the research object, and other quality features (e.g., availability of related resources, runnability). The suitability of a research object for different purposes may be evaluated using different checklists: there is no single set of criteria that meaningfully applies in all situations, which leads to a need to describe different quality requirements for different purposes. Accordingly, checklists are described using a rich model, called the Minim model5. Detailed information on the model can be found in [5]. Detailed information about the different checklists extracted from the VRC requirements for the different types of research objects can be found in EVER-EST D4.3.

5 http://purl.org/minim/
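The checklist idea can be sketched in Python as a list of requirements, each with a level and a test over the research object description. This is a simplified, hypothetical stand-in for the Minim model, not its actual vocabulary: the rules, the dictionary layout of the research object and the function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Requirement:
    level: str                     # "Must", "Should" or "May"
    description: str
    test: Callable[[dict], bool]   # predicate over the research object description


def evaluate(checklist: list[Requirement], ro: dict) -> list[tuple[str, str, bool]]:
    """Apply every requirement and report (level, description, satisfied)."""
    return [(req.level, req.description, req.test(ro)) for req in checklist]


# Hypothetical checklist for a workflow-centric research object.
WORKFLOW_CHECKLIST = [
    Requirement("Must", "aggregates a workflow",
                lambda ro: "workflow" in ro["resources"].values()),
    Requirement("Must", "aggregates a dataset",
                lambda ro: "dataset" in ro["resources"].values()),
    Requirement("Should", "has a title", lambda ro: "title" in ro["annotations"]),
    Requirement("Should", "has a description",
                lambda ro: "description" in ro["annotations"]),
    Requirement("May", "has a sketch", lambda ro: "sketch" in ro["annotations"]),
]

ro = {
    "resources": {"flood.wfbundle": "workflow", "dem.tif": "dataset"},
    "annotations": {"title": "Flood extent mapping"},
}
report = evaluate(WORKFLOW_CHECKLIST, ro)
failed = [desc for level, desc, ok in report if not ok]
print(failed)  # ['has a description', 'has a sketch']
```

Evaluating the same research object against a different checklist (e.g., one for data-centric research objects) only requires another list of requirements, which mirrors the fitness-for-purpose idea described above.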

5.3.2 Research objects stability
The stability (and reliability) service offers a time-based trace of quality assessments of research objects. It includes the metrics of completeness, stability and reliability. These metrics are used as indicators of research object integrity, so that research object decay can be identified [5]. Completeness is calculated by applying a weighting scheme to the satisfied and unsatisfied checklist rules at different levels (Must, Should and May). Stability is the standard deviation of the differences between completeness results at different points in time. The reliability value combines completeness and stability, and reflects the confidence that a user may have when reusing the evaluated RO.
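The three metrics can be sketched in Python as follows. The weights per level and the formula combining completeness and stability into reliability are hypothetical placeholders (the actual scheme is described in [5]); stability is computed, as stated above, as the standard deviation of the differences between successive completeness values, so lower values indicate a more stable research object.

```python
from statistics import pstdev

# Hypothetical weights per checklist level; the actual scheme is defined in [5].
WEIGHTS = {"Must": 3, "Should": 2, "May": 1}


def completeness(results: list[tuple[str, bool]]) -> float:
    """Weighted share of satisfied rules, given (level, satisfied) pairs."""
    total = sum(WEIGHTS[level] for level, _ in results)
    met = sum(WEIGHTS[level] for level, ok in results if ok)
    return met / total if total else 1.0


def stability(history: list[float]) -> float:
    """Standard deviation of the differences between successive completeness values."""
    diffs = [later - earlier for earlier, later in zip(history, history[1:])]
    return pstdev(diffs) if diffs else 0.0


def reliability(history: list[float]) -> float:
    """Hypothetical combination: latest completeness, penalised by instability."""
    return history[-1] * (1.0 - min(stability(history), 1.0))


checks = [("Must", True), ("Must", True), ("Should", True), ("Should", False), ("May", False)]
print(round(completeness(checks), 3))   # 0.727 (8 of 11 weighted points)
print(stability([0.5, 1.0, 0.5]))       # 0.5
print(reliability([1.0, 1.0, 1.0]))     # 1.0
```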

5.4 Research objects notifications
Another capability related to the preservation of research objects is the generation of notifications whenever something changes in the research object content or its related resources that affects the quality of the research object. For instance, if a dataset referenced by the research object is no longer available, hampering its completeness or the ability to execute the related processes, then the quality of the research object drops. Similarly, if a service used by some of the processes in the research object changes its interface, the quality will also drop. Notifications can be provided through many different channels, such as email, SMS or web feeds. ROHUB provides web feeds for research objects, so users are able to subscribe to a research object and get notifications whenever it or its related resources change. Note that, in relation to the different stages of the research object lifecycle (see D4.3), the content of a research object in the Live state is continuously changing, whereas a research object in the Snapshot or Archive state is immutable. Thus, notifications on a live research object may refer to changes in its content and related resources, while notifications on a snapshot/archive research object refer mainly to changes in the related resources.
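A client consuming these feeds only needs a standard Atom parser. The following Python sketch extracts the entries of a feed with the standard library; the sample feed is hypothetical and abridged (it omits elements a fully valid Atom feed requires), and in practice the XML would be fetched over HTTP from the research object's feed link, which is not shown here.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace in ElementTree notation


def parse_notifications(feed_xml: str) -> list[dict[str, str]]:
    """Return title, update time and summary of every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": entry.findtext(f"{ATOM}title", default=""),
            "updated": entry.findtext(f"{ATOM}updated", default=""),
            "summary": entry.findtext(f"{ATOM}summary", default=""),
        }
        for entry in root.findall(f"{ATOM}entry")
    ]


# Hypothetical, abridged feed for one research object.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Notifications for flood-mapping RO</title>
  <entry>
    <title>Quality drop</title>
    <updated>2016-11-20T10:00:00Z</updated>
    <summary>Referenced dataset dem.tif is no longer available.</summary>
  </entry>
  <entry>
    <title>Resource added</title>
    <updated>2016-11-18T09:30:00Z</updated>
    <summary>results.csv was aggregated.</summary>
  </entry>
</feed>"""

notifications = parse_notifications(SAMPLE_FEED)
print([n["title"] for n in notifications])  # ['Quality drop', 'Resource added']
```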

5.5 Research objects fixity checking
Finally, a basic capability required by any long-term preservation platform is fixity checking, which refers to the process of verifying that a digital object has not been altered or corrupted6. In EVER-EST, this translates to verifying that a research object has not been altered or corrupted. Note that, in relation to the different stages of the research object lifecycle (see D4.3), this capability is only applicable to snapshots/archives of the research object, which are supposed to be immutable objects.
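Fixity checking is commonly implemented by recording a cryptographic checksum of each resource when a snapshot or archive is created and recomputing it later. The following Python sketch shows this with SHA-256 over a local folder; it is an assumption-based illustration (a research object serialised to a directory), not the ROHUB implementation, and it only reports resources that were modified or removed, not newly added ones.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so that large datasets do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_manifest(root: Path) -> dict[str, str]:
    """Checksums of all files under root, keyed by relative path (at snapshot time)."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def check_fixity(root: Path, manifest: dict[str, str]) -> list[str]:
    """Resources whose checksum changed, or which disappeared, since recording."""
    current = record_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp)
    (snapshot / "data.csv").write_text("station,level\nA,1.2\n")
    manifest = record_manifest(snapshot)
    before = check_fixity(snapshot, manifest)        # []
    (snapshot / "data.csv").write_text("tampered\n")
    after = check_fixity(snapshot, manifest)         # ['data.csv']
print(before, after)
```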

6 http://www.loc.gov/standards/premis/pif-presentations/rebecca-SKOS/preservationEvents-FixityCheck.html

6 E-Research Services
This section provides the technical documentation of the individual components supporting the e-research capabilities in EVER-EST described in Sections 2, 3, 4 and 5.

6.1 Services overview
As described above, the e-research services include the components providing the functionalities to discover, create, manage, share, (re-)use and preserve the research artefacts related to users' use cases or studies through the research object (RO) concept, as well as tools that support users in the use of data. They also include the functionalities to manage the executable artefacts (applications, jobs, workflows) along with their execution environment, and to preserve these resources. These components include:
● ROHUB, which provides a holistic solution for the management and preservation of research objects.
● The VRE portal RO Manager component, which provides a user-friendly interface to the main research object operations, tailored for users in the Earth Science domain.
● Cloud Platform components, which provide the capabilities for the allocation, launching and management of execution environments. They include two main sub-components:
o Cloud controller.
o User interface.
● Workflow Management components, which provide the capabilities for the modelling, design and execution of executable workflows. They include three main components:
o Workflow modelling, provided by a workflow management system. In EVER-EST, the Taverna system is highlighted given the partners' experience and the availability of RO services tailored for this system; however, it should be noted that research objects are not tied to any particular system.
o WF-RO transformation Service, supporting the transformation of workflow resources into research objects, and the exposure of annotations and resources embedded in the workflow.
o Workflow Runner, enabling the remote execution of workflows and the generation of research objects encapsulating the results.
● Preservation components, which provide the capabilities for preserving research objects over time, including the generation/collection of the metadata required for the correct preservation of the research object and related artefacts (e.g., datasets):
o Preservation Front-End.
o Preservation Middleware.
o Preservation DB of User Profiles.
o RO assessment, including the checklist and stability evaluation services, which provide measurements of research object quality in terms of completeness, stability and reliability.
o RO Notification Service, which provides notifications in the form of Atom feeds about events related to a research object, such as changes in its content and quality, and about the overall RO collection, such as when new ROs are created.
o RO Preservation components in ROHUB, which provide long-term preservation functionalities such as fixity checking, and monitoring of RO quality over time (in collaboration with the stability service).

6.2 Services visual components
In the context of EVER-EST, a visual component is an independent web element written using open web technologies (HTML, CSS, and JavaScript/jQuery) that can be added to any web page with a few lines of code. Visual components make it possible to centralize a feature and add it to multiple web pages with the same behaviour while, at the same time, customizing it with page-related options. The visual components for each e-research service described in Section 6.1 are described in the remainder of this section.

6.2.1 ROHUB
The ROHUB portal v2 is being implemented as a set of visual components. This section describes each of these components and provides a graphical depiction of its implementation in the portal. Note that portal v2 is currently under development; therefore, where a component is not yet available, the counterpart interface in portal v1 is used as an example for the description.

6.2.1.1 Visual components for discovering and visualising research objects, and for interacting with Portal (for non-authenticated users)

Keyword search
This component enables users to find research objects by keywords (see Figure 6-1), looking through all the available metadata annotations (e.g., title, description, creator) and, optionally, filtering by the research area of the encapsulated investigation.

Figure 6-1 Keyword search component
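The search semantics can be sketched in Python as follows. This is a hypothetical in-memory illustration (ROHUB relies on a Solr index, per Table 1): every keyword must appear, case-insensitively, in one of the searched metadata fields, and the research area acts as an optional exact-match filter. The field names and sample records are assumptions.

```python
from typing import Optional

SEARCH_FIELDS = ("title", "description", "creator")  # assumed metadata fields


def keyword_search(ros: list[dict], query: str,
                   research_area: Optional[str] = None) -> list[str]:
    """Identifiers of ROs whose metadata contains every keyword in the query."""
    terms = query.lower().split()
    hits = []
    for ro in ros:
        if research_area is not None and ro.get("research_area") != research_area:
            continue
        text = " ".join(str(ro.get(name, "")) for name in SEARCH_FIELDS).lower()
        if all(term in text for term in terms):
            hits.append(ro["id"])
    return hits


SAMPLE_ROS = [
    {"id": "ro-1", "title": "Flood extent mapping", "description": "Sentinel-1 flood maps",
     "creator": "Alice", "research_area": "Natural hazards"},
    {"id": "ro-2", "title": "Sea level trends", "description": "Altimetry time series",
     "creator": "Bob", "research_area": "Oceanography"},
    {"id": "ro-3", "title": "Flood risk model", "description": "Hydrological simulation",
     "creator": "Carol", "research_area": "Hydrology"},
]

print(keyword_search(SAMPLE_ROS, "flood"))               # ['ro-1', 'ro-3']
print(keyword_search(SAMPLE_ROS, "flood", "Hydrology"))  # ['ro-3']
```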

Faceted search
This component enables users to browse and navigate the research object collection by organising the objects according to their associated semantic properties, allowing users to discover objects by applying multiple filters. This component comprises two sub-components:
● Faceted Filters List (Figure 6-3)
○ Contains all filters for searching through the RO collection, including the research area, type, creators, status and other properties of the research object.
● Faceted Search Results List (Figure 6-2)
○ Contains the results list view, which enables the visualisation of the results in different formats.

Figure 6-2 Faceted Search Results List

Figure 6-3 Faceted Filters List
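Faceted navigation combines two operations: filtering the collection, and counting the values of each facet among the remaining results so that the filter list can show how many objects each further selection would keep. In the following hypothetical Python sketch, values within one facet are combined with OR and different facets with AND; this is a common convention assumed here, not taken from the ROHUB implementation, and the facet names and sample records are illustrative.

```python
from collections import Counter

FACETS = ("research_area", "type", "creator", "status")  # assumed facet names


def faceted_search(ros: list[dict],
                   filters: dict[str, set]) -> tuple[list[dict], dict[str, Counter]]:
    """Keep ROs matching every facet filter, then count facet values in the results."""
    results = [ro for ro in ros
               if all(ro.get(facet) in accepted for facet, accepted in filters.items())]
    counts = {facet: Counter(ro.get(facet) for ro in results) for facet in FACETS}
    return results, counts


SAMPLE_ROS = [
    {"id": "ro-1", "research_area": "Hydrology", "type": "workflow-centric",
     "creator": "Alice", "status": "live"},
    {"id": "ro-2", "research_area": "Hydrology", "type": "data-centric",
     "creator": "Bob", "status": "snapshot"},
    {"id": "ro-3", "research_area": "Oceanography", "type": "workflow-centric",
     "creator": "Alice", "status": "archive"},
]

results, counts = faceted_search(SAMPLE_ROS, {"type": {"workflow-centric"}})
print([ro["id"] for ro in results])  # ['ro-1', 'ro-3']
print(counts["creator"])             # Counter({'Alice': 2})
```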

Featured research objects (shows featured ROs)
Presents the list of ROs that are promoted because of their quality (measured by the RO compliance with specified criteria, such as completeness) or because of their popularity (e.g., golden exemplars).

Figure 6-4 Featured research objects component

Recent activity
Presents the latest ROs created or modified by users, as well as other relevant activities happening in ROHUB, e.g., new authors, comments, citations, etc.

Figure 6-5 Recent activity component

Geolocation search (for searching ROs by current / given geo location)
The Geolocation component is under development. The purpose of this component is to allow users to find ROs that are relevant to a selected geographical area. Typically, this will require a map component.

SPARQL endpoint (for writing custom SPARQL queries)
Allows users to query the semantic repository using custom SPARQL queries, enabling them to find non-trivial and user-specific connections between ROs.


Figure 6-6 SPARQL search component
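Queries can also be sent to a SPARQL endpoint programmatically. Under the SPARQL 1.1 Protocol, a query can be submitted with an HTTP GET request carrying the URL-encoded query in the query parameter. In the following Python sketch, the endpoint URL is hypothetical, and the use of dct:title for research object titles is an assumption about how research objects are annotated; ro:ResearchObject is the class from the research object vocabulary. The network call in run_query is defined but not executed.

```python
import json
import urllib.request
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://rohub.example.org/sparql"  # hypothetical endpoint URL

QUERY = """PREFIX ro: <http://purl.org/wf4ever/ro#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?ro ?title WHERE {
  ?ro a ro:ResearchObject ;
      dct:title ?title .
  FILTER(CONTAINS(LCASE(STR(?title)), "flood"))
}
LIMIT 10"""


def build_query_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol query via GET: the query goes in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


def run_query(endpoint: str, query: str) -> list[dict]:
    """Send the query and return the result bindings (SPARQL JSON results format)."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


url = build_query_url(ENDPOINT, QUERY)
# The encoded query decodes back to the original text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```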

RO overview
Displays the RO information/metadata and enables users to download the RO as a zip file, or just its metadata. It comprises two sections, the research object overview section and the (optional) research object advanced view section:
● RO overview section provides a summary of the RO information, including its title, description, research area, creator and credits, creation date, status, sketch, a summary of its quality, the number of resources and annotations, comments on the RO and download links (see Figure 6-7).
● RO advanced view section provides the complete list of metadata annotations related to the RO, in a view targeted at advanced users (see Figure 6-8).


Figure 6-7 RO overview component

Figure 6-8 RO advanced view component

RO content
Displays the resources aggregated by the RO, enabling browsing and navigation. When a resource is selected, its metadata is displayed and a download link is available. A new version of this component is under development; however, its functionality will be similar to the one implemented in portal v1. It comprises two sections: the navigation section for traversing the RO content (Figure 6-9), and the resource details section, which provides two different views, targeted at regular and advanced users respectively:
● Basic view (Figure 6-11): a component showing basic resource metadata, such as title, URI, creation date and author. It also allows the selected resource to be downloaded.
● Advanced view (Figure 6-10): a component showing the complete list of metadata annotations related to the resource, in a view targeted at advanced users.


Figure 6-9 Resource navigation component

Figure 6-10 Resource details - advanced view

Figure 6-11 Resource details - basic view component

RO relations (shows relations between RO resources)
The component enables users to visualise relations between resources in the research object, or between the research object and other resources. The source resource should be the research object or a resource aggregated by it, while the target resource may be either of those or an external resource. This component is under development in Portal v2; however, a simple interface is available in Portal v1.

Figure 6-12 RO relations component
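The rule stated above for RO relations (the source must be the RO or one of its aggregated resources, while the target may additionally be external) can be expressed as a small validation check. The sketch below is purely illustrative: the `Relation` type, the `validate_relation` function and the example URIs are hypothetical and do not reflect ROHUB's internal API; only `prov:wasDerivedFrom` is a real PROV-O term, used here as a stand-in for one of the predefined relations.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Relation:
    source: str     # URI of the RO or of one of its aggregated resources
    predicate: str  # URI of one of the predefined relation types
    target: str     # URI of the RO, an aggregated resource, or an external resource


def validate_relation(ro_uri: str, aggregated: Iterable[str],
                      predefined: Set[str], rel: Relation) -> List[str]:
    """Return the list of rule violations; an empty list means the relation is valid."""
    internal = {ro_uri, *aggregated}
    problems = []
    if rel.source not in internal:
        problems.append("source must be the RO or one of its aggregated resources")
    if rel.predicate not in predefined:
        problems.append("relation is not in the predefined list")
    # Targets may be internal, or external resources identified by an HTTP(S) URI.
    if rel.target not in internal and not rel.target.startswith(("http://", "https://")):
        problems.append("target must be an RO resource or an external URI")
    return problems


if __name__ == "__main__":
    ro = "https://rohub.example.org/ROs/demo/"  # hypothetical RO URI
    resources = {ro + "data/input.csv", ro + "workflow.t2flow"}
    relations = {"http://www.w3.org/ns/prov#wasDerivedFrom"}
    rel = Relation(ro + "data/input.csv", "http://www.w3.org/ns/prov#wasDerivedFrom",
                   "https://example.org/external/source.nc")
    print(validate_relation(ro, resources, relations, rel))
```

Returning a list of problems, rather than a single boolean, lets a user interface report every violated rule at once.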

RO quality
Enables users to assess the RO quality against predefined quality criteria (e.g., completeness). This component interacts with the quality checklist service to obtain the results. Additionally, it enables users to open the RO monitoring tool to assess the RO quality over time; this tool is currently outside the ROHUB portal. A new version of this component is under development; however, its functionality will be similar to the one implemented in portal v1.

Figure 6-13 RO quality component
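To illustrate what a completeness assessment against a checklist involves, the following hypothetical sketch evaluates an RO's metadata against a list of requirements and reports the fraction satisfied. The requirement names, the mandatory flag and the simple ratio used as the score are assumptions for illustration only; the actual ROHUB checklist service defines its own criteria and scoring.

```python
from typing import Callable, Dict, List, NamedTuple

Metadata = Dict[str, object]


class Requirement(NamedTuple):
    name: str
    mandatory: bool
    check: Callable[[Metadata], bool]


def assess(ro: Metadata, checklist: List[Requirement]) -> dict:
    """Evaluate an RO's metadata against a checklist (hypothetical scoring)."""
    satisfied = {r.name for r in checklist if r.check(ro)}
    return {
        # Simple ratio of satisfied requirements; the real service may weight them.
        "completeness": len(satisfied) / len(checklist) if checklist else 1.0,
        "mandatory_ok": all(r.name in satisfied for r in checklist if r.mandatory),
        "missing": [r.name for r in checklist if r.name not in satisfied],
    }


# Hypothetical checklist for illustration only.
CHECKLIST = [
    Requirement("has title", True, lambda ro: bool(ro.get("title"))),
    Requirement("has description", True, lambda ro: bool(ro.get("description"))),
    Requirement("aggregates a workflow", False,
                lambda ro: "workflow" in ro.get("resource_types", [])),
    Requirement("has sketch", False, lambda ro: bool(ro.get("sketch"))),
]

if __name__ == "__main__":
    ro = {"title": "Flood mapping", "description": "",
          "resource_types": ["workflow", "dataset"]}
    print(assess(ro, CHECKLIST))
```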

RO notifications
Presents a list of notifications generated by the system regarding RO activity (e.g., changes in the RO content, and modification of resources and their associated metadata, including comments) and quality (e.g., changes in completeness, stability and reliability). A new version of this component is under development; however, its functionality will be similar to the one implemented in portal v1.


Figure 6-14 RO notifications component

RO history
Displays the RO evolution. In particular, it shows the snapshots and, if completed, the archive (release) generated for the RO over time. This component is under development in Portal v2; however, a simple interface is available in Portal v1. Note that the new component will provide a richer interface.

Figure 6-15 RO history component

Sign-in and sign-up
Sign-in allows users to enter their username and password or to choose an OpenID provider for authentication. The ROHUB portal is integrated with the EVER-EST single sign-on infrastructure. Sign-up allows users to create a ROHUB account if they do not use an OpenID provider.


Figure 6-16 ROHUB sign-in component

Figure 6-17 ROHUB sign-up component

6.2.1.2 Visual components for discovering and visualising research objects, and for interacting with Portal (for authenticated users)

My ROs
Provides a clear view of all research objects created by the logged-in user.


Figure 6-18 ROHUB my ROs view component

Access Control
The Access Control component gives users the ability to manage the access rights of their research objects. The owner of an RO may change its visibility status to private, public or open. The owner of a private RO may also share it, i.e., grant specific read/write rights to any other user of the ROHUB platform. This component is under development in Portal v2; however, a simple interface is available in Portal v1. Note that the new component will provide a richer interface.

Figure 6-19 ROHUB access control component

Comments
Comments is a component that allows users of the ROHUB platform to interact with each other by exchanging short messages related to the whole RO or to individual aggregated resources. A discussion in comments may become a valuable source of information about RO quality and future RO development directions. The same component is available at the RO level and at the resource level.

Figure 6-20 ROHUB comments component

6.2.1.3 Visual components for creating and managing research objects, and for interacting with Portal (for authenticated users)

Create RO
Supporting users in creating research objects is one of the main purposes of the ROHUB portal, which provides different methods for performing this operation (Figure 6-21). The three most common scenarios for creating an RO are:
● From a wizard, which provides the user with a step-by-step approach to creating an RO. The user fills in a set of simple forms, providing metadata about the RO, its initial content and a structure for organising resources. This component will be developed in the next phase.
● From scratch, where the user creates an empty RO, providing basic metadata and optionally selecting an initial skeleton for organising resources. Further RO metadata and resources are then added using other UI components. This component is under development in Portal v2.
● From a zip upload: if the user already has a set of resources as the basis for the RO, they can be collected in a zip file and uploaded to ROHUB to create an RO aggregating these resources. This component is under development in Portal v2.

Figure 6-21 RO create/delete component

After finishing the creation process, the user may use other UI components to manage the RO resources.

Delete RO
A simple UI component that allows users to remove an existing RO from ROHUB. See Figure 6-21.

Import/export RO bundle
An RO bundle is a special file format designed for distributing research objects. An existing RO may be exported from ROHUB, or an RO bundle imported into it, using the Import/Export RO Bundle component. This component is under development in Portal v2. Currently, Portal v1 only supports importing an RO bundle inside another research object as a resource (see the RO Resources Manager), automatically extracting its resources and creating a nested research object.
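As we understand the Research Object Bundle specification, an RO bundle is a ZIP archive whose first entry is an uncompressed `mimetype` file, with a JSON-LD manifest at `.ro/manifest.json`. The sketch below packs a set of files into such an archive using only the Python standard library. It writes a deliberately minimal manifest; a bundle intended for import into ROHUB would likely need richer metadata (authors, annotations, provenance), so the specification should be consulted before relying on this layout.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def make_ro_bundle(resources: dict) -> bytes:
    """Pack {path: bytes} into a minimal RO bundle and return the ZIP bytes."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(resources)],
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry comes first and is stored without compression.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                    compress_type=zipfile.ZIP_DEFLATED)
        for path, data in resources.items():
            zf.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


if __name__ == "__main__":
    blob = make_ro_bundle({"data/input.csv": b"a,b\n1,2\n"})
    print(zipfile.ZipFile(io.BytesIO(blob)).namelist())
```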

RO Overview Manager


The RO Overview Manager should be considered a collection of other components grouped together in one place to give users access to the most important functionality, such as adding, removing and editing RO metadata (annotations). These functionalities are available from the RO overview tab and the RO advanced view section.

Figure 6-22 RO advanced view manager

Figure 6-23 RO overview manager

Import annotations file (bulk update)
To simplify the process of annotating an RO, users may upload a set of RO annotations at once. The annotations are read from the uploaded file and added to the selected RO or to the selected resource in the RO. The same component is available at the RO level (“Annotate” button in Figure 6-22) and at the resource level (“Annotate” button in Figure 6-26).
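The bulk-update file is described here only as an RDF file. As an assumption, the sketch below produces one in the N-Triples serialisation, one of the standard RDF formats, pairing each property URI with a literal value and escaping quotes, backslashes and line breaks as the N-Triples grammar requires. Whether ROHUB accepts N-Triples (as opposed to, e.g., RDF/XML or Turtle) is not stated in this note; the RO URI is hypothetical, while the Dublin Core property URIs are real.

```python
def nt_literal(value: str) -> str:
    """Quote a plain string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def to_ntriples(subject: str, annotations: list) -> str:
    """Serialise (property URI, literal value) pairs about one subject as N-Triples."""
    lines = [f"<{subject}> <{prop}> {nt_literal(value)} ." for prop, value in annotations]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ro_uri = "https://rohub.example.org/ROs/demo/"  # hypothetical RO URI
    annotations = [
        ("http://purl.org/dc/terms/title", "Flood mapping of the Po valley"),
        ("http://purl.org/dc/terms/description", 'Derived from "Sentinel-1" scenes'),
    ]
    print(to_ntriples(ro_uri, annotations), end="")
```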

RO Relations Manager
The RO Relations Manager will allow users to define and manage relations between RO resources. It provides a set of predefined relations that users can assign to associate two resources. At least the source resource of the relation should be a resource of the selected research object (or the research object itself), while the target resource can be either another resource of the selected research object or an external resource. This component will be developed in the next phase in Portal v2; however, a simple interface is available in Portal v1. Note that the new component will provide a richer interface, e.g., a drag & drop editor.

Figure 6-24 RO relations manager component

RO Resources Manager


The RO Resources Manager is a collection of other components grouped together in one place that provides users with all functionality related to resource management. Using this component, users may:
● Add/remove/edit/move resources and folders (Figure 6-25 and Figure 6-26).
● Add/remove/edit resource metadata, both in the basic view (Figure 6-25) and advanced view (Figure 6-26) components.

Figure 6-25 RO resource manager (basic annotations view)

Figure 6-26 RO resource manager (advanced annotations view)

RO Evolution Manager
The RO Evolution Manager enables users to manage the RO evolution, i.e., to generate snapshots or forks, or to archive the RO, including a wizard that guides users through this process. This component will be developed in the next phase in Portal v2; however, a simple interface is available in Portal v1. Note that the new component will provide a richer interface, e.g., a wizard.


Figure 6-27 RO evolution manager

Geolocation annotation wizard/editor
In order to work effectively with geolocation data, an additional, specialised component needs to be designed and developed. This component will allow users to annotate ROs with geolocation data in a simple drag-and-drop way or by directly accepting a list of coordinates.
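One plausible way to support both the planned geolocation annotation (from a list of coordinates) and the geolocation search is to reduce each RO's coordinates to a bounding box and test it against the area selected on the map. The sketch below is a hypothetical illustration of that approach, not the design of the component under development; the RO identifiers are invented, and boxes crossing the antimeridian are not handled.

```python
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[float, float]               # (latitude, longitude) in decimal degrees
BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def bbox_of(coords: Iterable[Coord]) -> BBox:
    """Smallest box enclosing a non-empty list of coordinates."""
    lats, lons = zip(*coords)
    return (min(lats), min(lons), max(lats), max(lons))


def intersects(a: BBox, b: BBox) -> bool:
    """True if two boxes overlap (boxes crossing the antimeridian are not handled)."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


def ros_in_area(ro_boxes: Dict[str, BBox], area: BBox) -> List[str]:
    """Identifiers of ROs whose annotated extent overlaps the selected area."""
    return sorted(ro for ro, box in ro_boxes.items() if intersects(box, area))


if __name__ == "__main__":
    boxes = {
        "flood-po-valley": bbox_of([(44.9, 10.0), (45.1, 12.3)]),
        "etna-deformation": bbox_of([(37.75, 14.99)]),
    }
    northern_italy = (44.0, 7.0, 47.0, 14.0)
    print(ros_in_area(boxes, northern_italy))  # ['flood-po-valley']
```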

6.2.2 VRE Portal RO Manager
As these components are part of the VRE portal, their visual components are described in D5.3.

6.2.3 Workflow Management
The workflow modelling visual components are outside the scope of this document, as EVER-EST does not prescribe any particular workflow management system. Hence, the reader should refer to the website of the selected workflow management system, e.g., the Taverna website 7. Remote workflow execution is currently supported only via the API described in Section 6.4.3. Regarding workflow transformation, there is currently one related visual component, provided by the ROHUB portal:
● The workflow transformation and metadata extraction component is enabled in ROHUB for workflow resources (if the current user is the owner of the research object or has write permissions on it). It appears next to the resource type as the “Annotate & Transform” control. This component enables users to specify which resources embedded in the workflow (if any) should be extracted and aggregated directly into the research object, and where these resources and the generated workflow bundle (the new Taverna workflow format) should be placed inside the research object (the default is the root). Note that this component does not generate a new research object, as the workflow already exists inside one.

7 https://taverna.incubator.apache.org/

Figure 6-28 WF-RO component in ROHUB

6.2.4 Cloud Platform
The Terradue Cloud Controller uses the OpenNebula Sunstone GUI, intended for both end users and administrators, which simplifies the typical management operations in private and hybrid cloud infrastructures. This GUI allows users to easily manage all resources and perform typical operations on them. Using OpenNebula Sunstone Views, it is possible to provide simplified UIs aimed at the end users of a cloud. The views are fully customizable and can easily enable or disable specific information tabs or action buttons, with multiple views for different user groups. Each view defines a set of UI components, so that each user only accesses and views the parts of the cloud relevant to their role. The views can be grouped into two different layouts. On the one hand, the classic layout exposes a complete view of the cloud, allowing administrators and advanced users to have full control of any physical or virtual resource of the cloud. On the other hand, the Cloud layout exposes a simplified version of the Cloud, where end users are able to manage any virtual resource without taking care of physical resource management. The simplified views focus on giving users the status and usage of their resources (Figure 6-29) or listing the availability of specific VMs ready to be deployed (Figure 6-30).


Figure 6-29 Cloud Platform Dashboard with Resource Usage

Figure 6-30 Cloud Platform Dashboard Listing VM Templates

6.2.5 Preservation components
The preservation front-end, middleware and DB of user profiles will be described in the next version of this technical note. The visual component of the RO notification service displays all the notifications associated with a research object, and provides detailed information for the selected one. This interface is implemented as the ROHUB notification component, as depicted in Figure 6-14. There is no visual component for fixity checking in ROHUB. Regarding the RO assessment components:
● The checklist visual component displays the results of the completeness assessment against a selected checklist. This component is integrated as part of the ROHUB RO quality component, as depicted in Figure 6-13.
● The stability visual component displays the trace reported by the stability service through the RO Monitoring tool. This component was conceived as a web application where users can explore the evolution of research object quality over time, so that they can analyse its decay. This web application combines information from the stability service, the checklist evaluation and the research object evolution data to show results in a meaningful way to the user. The monitoring tool (see Figure 6-31) provides a user-friendly visual interface to explore daily evaluations of completeness alongside the corresponding stability and reliability scores, thus allowing more comprehensive access to the quality information of an RO.

Figure 6-31 RO monitoring tool

6.3 Using the service – quick overview

This is a technical note, not a user manual. For those services that provide visual components, it gives an overview of how to interact with the service from the user's point of view.

6.3.1 Getting started
The description of each individual component is provided in a separate section below.

6.3.2 ROHUB
This section focuses on the functionalities that are fully operational and in production, i.e., in Portal v1.

6.3.2.1 Search & discover research objects in ROHUB
Currently, users can search for and discover ROs using keywords and the faceted search interface.

Keyword-based search
● In the keyword search visual component (see Figure 6-32), type the keyword(s).
● The input keywords are searched within all the available metadata annotations associated with the RO, e.g., title, description, creators, etc.

Figure 6-32 RO keyword-based search

Faceted search
● In the faceted filters list (left of Figure 6-33), select the criteria for searching and discovering ROs, e.g., RO status, creator, creation date, etc. The results are displayed in the Faceted Search Results List (right of Figure 6-33) and can be sorted according to different properties.

Figure 6-33 RO faceted-based search
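The behaviour of keyword and faceted search described above can be sketched in a few lines: keywords are matched against all metadata values of an RO, the selected facet values act as filters, and facet counts are recomputed over the remaining results. This is an illustrative in-memory model with hypothetical field names; ROHUB's actual search backend and its matching rules (e.g., how multiple keywords are combined) are not described here, and AND semantics are assumed.

```python
from collections import Counter
from typing import Dict, List, Tuple

RO = Dict[str, str]  # hypothetical flat metadata: property -> value


def keyword_match(ro: RO, keywords: List[str]) -> bool:
    """All keywords must occur (case-insensitively) in the RO's combined metadata text."""
    text = " ".join(ro.values()).lower()
    return all(k.lower() in text for k in keywords)


def faceted_search(ros: List[RO], keywords: List[str], filters: Dict[str, str],
                   facets: List[str]) -> Tuple[List[RO], Dict[str, Counter]]:
    """Apply keyword and facet filters, then count facet values over the hits."""
    hits = [ro for ro in ros
            if keyword_match(ro, keywords)
            and all(ro.get(prop) == value for prop, value in filters.items())]
    counts = {f: Counter(ro[f] for ro in hits if f in ro) for f in facets}
    return hits, counts


if __name__ == "__main__":
    ros = [
        {"title": "Flood mapping Po valley", "creator": "alice", "status": "live"},
        {"title": "Etna deformation", "creator": "bob", "status": "snapshot"},
        {"title": "Flood risk model", "creator": "bob", "status": "live"},
    ]
    hits, counts = faceted_search(ros, ["flood"], {"status": "live"}, ["creator"])
    print([ro["title"] for ro in hits], dict(counts["creator"]))
```

Recomputing the counts over the filtered hits is what lets the filter list show, next to each value, how many results would remain if it were selected.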


6.3.2.2 Visualising research objects in ROHUB
Once a research object is selected, e.g., from the Faceted Search Results List or by entering the research object URI in a Web browser, the RO overview page opens (Figure 6-34). From there, the user can visualise or download the RO metadata (from the overview tab), navigate the RO content (from the content tab), visualise relations in the RO (from the relations tab), assess the RO quality (from the quality tab), visualise the notifications related to the RO (from the notifications tab) and visualise the RO history (from the history tab).

Figure 6-34 Visualise research objects in ROHUB

6.3.2.3 Signing in to ROHUB
To sign in, users should click on the sign-in area at the top right of ROHUB. The sign-in page appears (Figure 6-35) and users should select an identity provider (such as EVER-EST) or provide their OpenID account in order to authenticate with the identity provider.


Figure 6-35 Sign-in in ROHUB

6.3.2.4 Creating research objects in ROHUB

Once logged in, users can create research objects from the “My ROs” page (see Figure 6-18) using any of the available methods described in Section 6.2.1.

Create RO from scratch
● From the “My ROs” page, the user clicks “+Create” and the “create RO” dialog appears (Figure 6-37).
● The user provides the RO identifier (compulsory) and, optionally, a title and description. Additionally, the user can optionally select a template to automatically generate a folder structure. These templates enable resources to be organised according to the predefined requirements of a community/application.

Create RO from ZIP file
● From the “My ROs” page, the user clicks “Create from ZIP” and the “create an RO from ZIP” dialog appears (Figure 6-36).
● The user selects a local or remote ZIP file, and a research object is automatically created with the contents of the ZIP file, including its folder structure, if present. By default, the RO identifier is the name of the ZIP file.


Figure 6-36 Create RO from ZIP file

Figure 6-37 Create RO from scratch
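Creating an RO from a ZIP file involves deriving the default identifier from the file name and recreating the folder structure from the archive entries. The hypothetical sketch below shows that mapping with Python's `zipfile` module; `ro_from_zip` is an illustrative name, not a ROHUB function. Stripping the `.zip` extension from the identifier is an assumption, since the text only says the identifier defaults to the name of the ZIP file.

```python
import io
import posixpath
import zipfile
from typing import List, Tuple


def ro_from_zip(zip_name: str, data: bytes) -> Tuple[str, List[str], List[str]]:
    """Derive a default RO identifier plus the folders and resources found in a ZIP."""
    # Assumption: the identifier is the file name without its extension.
    ro_id = posixpath.splitext(posixpath.basename(zip_name))[0]
    folders, resources = set(), []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):  # explicit directory entry
                folders.add(name.rstrip("/"))
                continue
            resources.append(name)
            # Recreate every ancestor folder, even without explicit directory entries.
            parent = posixpath.dirname(name)
            while parent:
                folders.add(parent)
                parent = posixpath.dirname(parent)
    return ro_id, sorted(folders), sorted(resources)


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data/input.csv", "a,b\n1,2\n")
        zf.writestr("README.txt", "Flood study\n")
    print(ro_from_zip("flood-study.zip", buf.getvalue()))
```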

6.3.2.5 Managing research objects in ROHUB

Adding/updating/removing resources in the RO
● To add a resource, the user clicks “+Resource” from the Resources manager component (Figure 6-25 and Figure 6-26).
● The user selects a local file or specifies a web resource URI in the “Upload a resource” dialog, and optionally specifies the resource type from a set of predefined types (see Figure 6-38).

Figure 6-38 Upload a resource in a RO

Note: if the selected resource is an RO bundle, ROHUB asks the user to confirm importing it as a bundle (Figure 6-39), i.e., extracting its resources automatically in order to generate a nested research object.

Figure 6-39 Upload an RO bundle

● To update a resource, the user selects the resource and clicks “Update” from the Resources manager component – basic annotations view (Figure 6-25).
● To remove a resource, the user selects the resource and clicks “Delete” from the Resources manager component – basic annotations view (Figure 6-25).


Creating/removing folders in the RO
● To create a folder, the user navigates to the parent folder (or root) and clicks “+Folder” from the Resources manager component (Figure 6-25 and Figure 6-26).
● The user specifies the name of the folder in the dialog (Figure 6-40) and clicks “ok”.

Figure 6-40 Add a folder in a RO

● To remove a folder, the user selects the folder and clicks “Delete” from the Resources manager component – basic annotations view (Figure 6-25).

Adding/editing/removing/importing annotations to the RO
● To add an annotation to the RO, the user opens the RO overview manager component, and
○ From the basic view, clicks the edit control next to the title or description, enters the value of the annotation and clicks the apply control (see Figure 6-23).
○ From the advanced view, clicks the annotate control, enters the annotation property and the value of the annotation, and clicks the apply control (see Figure 6-22). The annotation property box provides automatic hints as the user starts typing.
● To edit an annotation in the RO, the user opens the RO overview manager component, and
○ From the basic view, clicks the edit control next to the title or description, enters the updated value of the annotation and clicks the apply control (see Figure 6-23).
○ From the advanced view, clicks the edit control next to the existing annotation, enters the updated annotation property and value, and clicks the apply control (see Figure 6-22). The annotation property box provides automatic hints as the user starts typing.
● To remove an annotation from the RO, the user opens the RO overview manager component, and
○ From the advanced view, clicks the delete control next to the existing annotation (see Figure 6-22).
○ Note: this functionality is not available from the basic view.
● To import a set of annotations to the RO at once, the user opens the RO overview manager component and, from the advanced view,
○ Clicks the import control (Figure 6-22) to open the Import annotations dialog (Figure 6-41).
○ Selects the RDF file with the annotations and clicks ok.

Figure 6-41 Import annotations file

Adding/editing/removing/importing annotations to the resources
● To add an annotation to a selected resource, the user opens the Resources manager component, selects the resource and
○ From the basic view, clicks the edit control next to the title, type or description, enters the value of the annotation (or, for the type, selects one of the possible values, as depicted in Figure 6-42) and clicks the apply control (see Figure 6-42 and Figure 6-25).

Figure 6-42 Add resource type annotation

○ From the advanced view, clicks the annotate control, enters the annotation property and the value of the annotation, and clicks the apply control (see Figure 6-26). The annotation property box provides automatic hints as the user starts typing.
● To edit an annotation in a selected resource, the user opens the Resources manager component, selects the resource and
○ From the basic view, clicks the edit control next to the title, type or description, enters the updated value of the annotation and clicks the apply control (see Figure 6-25).
○ From the advanced view, clicks the edit control next to the existing annotation, enters the updated annotation property and value, and clicks the apply control (see Figure 6-26). The annotation property box provides automatic hints as the user starts typing.
● To remove an annotation from a selected resource, the user opens the Resources manager component, selects the resource and
○ From the advanced view, clicks the delete control next to the existing annotation (see Figure 6-26).
○ Note: this functionality is not available from the basic view.
● To import a set of annotations to the selected resource at once, the user opens the Resources manager component, selects the resource and, from the advanced view,
○ Clicks the import control (Figure 6-26) to open the Import annotations dialog (same as for the RO).
○ Selects the RDF file with the annotations and clicks ok.

Adding/editing/removing comments on the RO/resources
● To add a comment to the RO, the user opens the RO overview manager component, and
○ From the basic view, clicks the “+comment” control (see Figure 6-23), enters the text of the comment and clicks the apply control (Figure 6-43).

Figure 6-43 Add comment

○ From the advanced view, clicks the annotate control, selects the comment property by starting to type the word and selecting the provided hint, enters the value of the annotation, and clicks the apply control (see Figure 6-22).
● To add a comment to a selected resource, the user opens the Resources manager component, selects the resource and
○ From the basic view, clicks the “+comment” control (see Figure 6-25), enters the text of the comment and clicks the apply control (as with the RO).
○ From the advanced view, clicks the annotate control, selects the comment property by starting to type the word and selecting the provided hint, enters the value of the annotation, and clicks the apply control (see Figure 6-26).
● To edit an owned comment on the RO, the user opens the RO overview manager component, and
○ From the basic view, clicks the edit control next to the comment, enters the updated comment and clicks the apply control (see Figure 6-20).
○ From the advanced view, clicks the edit control next to the existing comment, enters the updated comment, and clicks the apply control (see Figure 6-22).
● To edit an owned comment on a selected resource, the user opens the Resources manager component, and
○ From the basic view, clicks the edit control next to the comment, enters the updated comment and clicks the apply control (see Figure 6-20). Note that this is the same component as for the RO.
○ From the advanced view, clicks the edit control next to the existing comment, enters the updated comment, and clicks the apply control (see Figure 6-26).
● To remove an owned comment from the RO, the user opens the RO overview manager component, and
○ From the basic view, clicks the delete control next to the comment (see Figure 6-20).
○ From the advanced view, clicks the delete control next to the comment (see Figure 6-22).
● To remove an owned comment from a selected resource, the user opens the Resources manager component, and
○ From the basic view, clicks the delete control next to the comment (see Figure 6-20). Note that this is the same component as for the RO.
○ From the advanced view, clicks the delete control next to the comment (see Figure 6-26).

Adding/removing relations to the RO
● To add a relation to the RO, the user opens the Relations manager component, clicks the “Add Relation” control (see Figure 6-24), and
○ Selects the source resource. This can be either one of the resources aggregated by the RO, or the RO itself.
○ Selects a relation from the predefined list.
○ Selects the target resource (also either one of the resources aggregated by the RO, or the RO itself), or enters a URI for an external resource.

Changing the RO access control settings
● To set the RO privacy mode, the user opens the access control tab (Figure 6-19) and selects one of the three available modes: open, public or private. Open allows read/write access to anybody; public allows read access to anybody and write access only to the owner or specified users; private allows read/write access only to the owner or specified users.
● To grant read/write access to specified users, the user clicks the “Grant privilege” control and enters the user's URI and role (reader/editor), as depicted in Figure 6-44.


Figure 6-44 Change RO access control settings
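The three privacy modes and per-user grants described above translate into simple read/write rules. The sketch below encodes them as a hypothetical illustration; the function names and the representation of grants as a user-to-role mapping (`reader`/`editor`) are assumptions, not ROHUB's implementation. It also assumes that any granted user, whether reader or editor, may read a private RO, and that only editors may write.

```python
from enum import Enum
from typing import Dict, Optional


class Mode(Enum):
    OPEN = "open"
    PUBLIC = "public"
    PRIVATE = "private"


def can_read(mode: Mode, user: Optional[str], owner: str, grants: Dict[str, str]) -> bool:
    """Open and public ROs are readable by anybody; private ones by owner and grantees."""
    if mode in (Mode.OPEN, Mode.PUBLIC):
        return True
    return user == owner or user in grants


def can_write(mode: Mode, user: Optional[str], owner: str, grants: Dict[str, str]) -> bool:
    """Open ROs are writable by anybody; otherwise only by the owner and editors."""
    if mode is Mode.OPEN:
        return True
    return user == owner or grants.get(user) == "editor"


if __name__ == "__main__":
    grants = {"carol": "reader", "dave": "editor"}  # hypothetical users
    print(can_write(Mode.PUBLIC, "dave", "alice", grants))
    print(can_read(Mode.PRIVATE, "eve", "alice", grants))
```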

6.3.2.6 Managing research objects evolution in ROHUB

Creating RO snapshots
● To create a snapshot of the RO, the user opens the RO Overview Manager, clicks the “evolution” control and selects “snapshot” (Figure 6-27).
● This starts a background process that generates the snapshot (i.e., a copy of the RO in its current state is created). When the process is finished, ROHUB notifies the user with a message at the top of the RO Overview Manager.

Creating an RO archive
● To create an archive of the RO (e.g., when the research has concluded), the user opens the RO Overview Manager, clicks the “evolution” control and selects “release” (Figure 6-27).
● This starts a background process that generates the archive (i.e., a copy of the RO in its current state is created). When the process is finished, ROHUB notifies the user with a message at the top of the RO Overview Manager.
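Both snapshots and archives are copies of the RO in its current state, and the RO history component lists them over time. The minimal sketch below models that behaviour with a deep copy, so that later edits to the live RO do not alter earlier copies. The class, the status strings and the identifier scheme (`<id>-<status>-<n>`) are hypothetical, and the background processing and user notification are omitted.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResearchObject:
    ro_id: str
    resources: Dict[str, str] = field(default_factory=dict)
    status: str = "live"
    history: List["ResearchObject"] = field(default_factory=list)

    def _freeze(self, status: str) -> "ResearchObject":
        # Deep-copy the current state so later edits to the live RO do not leak in.
        frozen = ResearchObject(
            ro_id=f"{self.ro_id}-{status}-{len(self.history) + 1}",
            resources=copy.deepcopy(self.resources),
            status=status,
        )
        self.history.append(frozen)
        return frozen

    def snapshot(self) -> "ResearchObject":
        return self._freeze("snapshot")

    def release(self) -> "ResearchObject":
        """Create the archive (release) copy, e.g. when the research has concluded."""
        return self._freeze("archive")


if __name__ == "__main__":
    ro = ResearchObject("demo", {"results.csv": "v1"})
    snap = ro.snapshot()
    ro.resources["results.csv"] = "v2"
    arch = ro.release()
    print(snap.resources, arch.resources, [h.ro_id for h in ro.history])
```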

6.3.3 VRE portal RO manager
As these components are part of the VRE portal, the guide for using the service is provided in D5.3.

6.3.4 Workflow management
For the workflow modelling components, please refer to the website of the selected workflow management system, e.g., the Taverna website 8. Remote workflow execution is currently supported only via the API described in Section 6.4.3.

6.3.4.1 Transforming and extracting metadata of a Workflow
● To transform a Taverna t2flow workflow (into the new Taverna format, wfbundle) and extract the embedded annotations and resources, the user clicks the “Annotate & Transform” control that appears next to a resource of type workflow (Figure 6-28).
● The user selects the resources that should be extracted and aggregated into the research object (Figure 6-45), and the location inside the research object where these resources and the workflow bundle should be placed (the default is the root).

8 https://taverna.incubator.apache.org/

Figure 6-45 Transforming and extracting metadata of a Workflow in ROHUB

6.3.5 Cloud Platform
The Terradue Cloud Platform provides EVER-EST communities with well-defined operational processes and procedures that allow the automatic packaging of DCS Applications (i.e., within the Developer Cloud Sandbox). The Developer Cloud Sandbox is a Virtual Machine (VM) that provides scientific developers with an Exploitation Platform-as-a-Service (PaaS). It consists of a development environment for processor integration and testing, and a framework for Cloud provisioning. The Developer Cloud Sandbox PaaS allows users to plug in scientific applications written in a variety of languages (e.g., Java, C++, IDL, Python, R), and then deploy, automate, manage and scale them in a very modular way. Algorithm integration is performed from within a dedicated Virtual Machine, running initially as a simulation environment (sandbox mode) that can readily scale to production (cluster mode). Accessed from a harmonized Shell environment, support tools also facilitate data access and workflow management tasks. Application packaging happens during a development & integration phase, and its output is a file in RPM format that, once validated, can easily be installed on a production host or cluster exposing the same software baseline as the original DCS environment. The package provides both the Application and the ‘delta’-environment required for executing the Application on the target Cluster. The ‘delta’-environment can also be defined as the explicit dependencies of the Application.

6.3.5.1 Application deployment
After packaging, the application can be deployed. To do so, the developer is guided through a number of steps that ensure (1) the quality of the package in terms of its dependencies and (2) the application scalability. These steps are:
● RPM quality check, where the User copies the application package onto a newly created Developer Cloud Sandbox and runs a series of test cases to validate the nominal working of the application package.
● Pre-production cluster, where the Support team prepares a small cluster (typically 1 master and 4 slaves) with the application package already installed. The User then executes a series of test cases in order to validate the scalability of the DCS application.
● Production cluster, where the Support team prepares a production cluster template based on the User feedback, which installs the application package on an adequate number of slave nodes. The User then executes a series of production cases using the Cluster resources. Typically, this step is performed using external Cloud providers' resources (such as Amazon or Interoute) by leveraging the Cloud bursting capabilities of the Platform.
The current approach still has a number of limitations that can be addressed. They are summarised in the following list, ordered by when they occur during the deployment process:


1. Before the RPM quality check, every mvn clean install command produces an RPM package. Depending on the Application, packaging may require a lot of time.
2. During the RPM quality check step, the User has to manually copy the RPM package from the development environment to the testing environment. This constraint is not an issue in itself, but it does not convey a sense of ‘automation’ in the process.
3. After the RPM quality check step, the Support team has to manually release the RPM package and upload it to a YUM repository.
4. Before the Pre-production cluster step, the Support team has to prepare the OpenNebula cluster template in advance.
5. During the Pre-production cluster step, many iterations can occur between the User and the Support team, because the User may need to repeat points 2 and 3 above.

To automate the Application deployment process, three main elements need to be considered:
● A Continuous Integration (CI) server, specifically Jenkins, leveraging the following technologies:
  ○ Entire build/test/deploy ‘Pipeline as Code’.
  ○ Docker containers, instantiated with the application code to execute tests.
● Dedicated pairs (testing, stable) of YUM repositories on the Platform’s Artifactory, for central management of remote repositories, binary artefacts and their dependencies. These pairs can be defined:
  ○ For each community active on the Platform.
  ○ For each partner active on the Platform.
● A dedicated Maven parent POM, into which as much of the automation logic as possible is moved, so that it becomes transparent to users.

6.3.5.2 Continuous Integration

Within this application deployment flow, the User continues working as usual during the application development & integration phase, that is:
1. The user creates the first structure of the application:
  ○ $ mvn archetype:generate
2. The user develops his Application and uses Git/GitHub:
  ○ $ git commit -m "My commit"
  ○ $ git push
3. The user installs the Application:
  ○ $ mvn clean install
4. The user executes and validates the Application:
  ○ $ ciop-run
5. The user repeats steps 2, 3 and 4 as needed.

The automation elements come into play transparently during:
● The creation of the DCS Application skeleton (step 1 above), where a new version of the Maven Archetypes is used. In particular, this new version contains a pom.xml file with an external dependency on a new parent POM (proposed name dcs-parent), which in turn (1) reduces complexity for the Users and (2) increases control over the versioning of the POMs over time.
● The push of the DCS Application to GitHub (step 2 above), where a specific Jenkins job performs the build, using a pre-defined Maven profile and an isolated Docker image, to package the Application and push the output file to the proper YUM repository. At this stage, the application package is automatically stored by the Continuous Integration (CI) server on a YUM repository intended for testing.

The following schematic view describes the interactions between the automation components during step 2:

User --(pushes to)--> GitHub <--(pulls from)-- CI --(stores on)--> YUM testing

After the Development phase, the User has to perform the RPM quality check step. To do so, the user starts a new Developer Cloud Sandbox instance, using a dedicated parameterised VM template and providing as parameter his and , and executes the test cases, as shown in Figure 6-46 and Figure 6-47, featuring some “Custom attributes” that hold the Application’s references for its deployment at scale on a selected Cloud processing cluster.

Figure 6-46 Cluster processing self-service - Master node

Then the user performs the Pre-production cluster step. This step also consists of starting a new Cluster instance, using a dedicated parameterized Cluster template and providing as parameter the and . The User autonomously performs the Application Release by using the provided Git-flow instructions and pushes the released code to GitHub. The CI server builds the release code and produces a stable released package, storing it in a stable YUM repository. Finally, the user performs the Production cluster step by starting a new Cluster instance, using a dedicated parameterized Cluster template and providing as parameter the and .


Figure 6-47 Cluster processing self-service – Slave node(s)
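The CI flow described above routes packages in two ways. Every push produces a package in the testing YUM repository, and a Git-flow release produces a package in the stable one. The following minimal sketch shows that routing decision. It is not the Platform's Jenkins pipeline. The Git ref conventions (release tags such as v1.2.0) and the "<community>-testing" / "<community>-stable" repository names are hypothetical.

```python
# Hypothetical sketch: deciding which YUM repository receives an RPM built by
# the CI server. Ref conventions and repository names are assumptions.
import re

# Assumed Git-flow convention: releases are tagged as vMAJOR.MINOR.PATCH.
RELEASE_TAG = re.compile(r"^refs/tags/v?\d+\.\d+\.\d+$")


def target_repository(git_ref: str, community: str) -> str:
    """Return the (hypothetical) YUM repository name for a pushed Git ref."""
    if RELEASE_TAG.match(git_ref):
        channel = "stable"   # released code -> stable package
    elif git_ref.startswith("refs/heads/"):
        channel = "testing"  # any branch push -> testing package
    else:
        raise ValueError(f"unsupported Git ref: {git_ref}")
    return f"{community}-{channel}"


if __name__ == "__main__":
    for ref in ("refs/heads/develop", "refs/tags/v1.2.0"):
        print(ref, "->", target_repository(ref, "example-community"))
```

Keeping one testing/stable pair per community (or partner) means this decision only needs the ref and the owning community.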

6.3.6 Preservation components

The preservation front-end, middleware and DB of user profiles will be described in the next version of this technical note. To use the RO notification component, the user has to open a research object and then select the notifications tab, as depicted in Figure 6-14. From there, on the left side of the window, the user can view all the notifications associated with the research object, including those related to changes in its content and quality. When the user selects a notification entry, the details of the notification are displayed on the right side of the window (as depicted in Figure 6-14). Regarding the RO assessment components:
● To use the checklist visual component in ROHUB, the user has to open a research object, select the quality tab, as depicted in Figure 6-13, and select the desired checklist for the evaluation of the research object. If the user wants to re-execute the evaluation, he should click the reload button.


● To use the RO monitoring tool (stability visual component), the user has to open a research object and then select the quality tab, which provides a link to the RO monitoring tool (“See quality history with RO Monitoring Tool”), as depicted in Figure 6-13. After the user clicks this link, the quality evolution of the selected research object over time is visualised in the RO monitoring tool (as depicted in Figure 6-31).

6.4 Service public APIs

6.4.1 ROHUB

ROHUB implements a set of open and well-documented APIs. The complete documentation for each of them is publicly available9, so this section provides only a brief introduction to each API. The two main APIs are:
● RO API enables the storage and retrieval of ROs and their aggregated resources, as well as annotating them. It defines the formats and links used to create and maintain ROs in accordance with the RO model, hence recognising concepts such as aggregations, annotations and folders. The RO model is also used to specify relations between different resources.
● RO evolution API enables managing the transformation of ROs along their lifecycle, and retrieving their evolution history. It defines the formats and links used to change the RO lifecycle stage, most importantly to create an immutable snapshot or archive from a mutable live RO, as well as to retrieve the RO evolution provenance. The API follows the RO evolution model.

Given that semantic metadata is an important component of a research object, RODL supports content negotiation for the metadata resources, including formats such as RDF/XML, Turtle and TriG.
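With content negotiation, a client selects the metadata serialisation through the HTTP Accept header. The following minimal sketch builds such requests with the Python standard library, using the standard media types for RDF/XML, Turtle and TriG. The RO URI is hypothetical. The use of the OAuth 2.0 Bearer scheme for access tokens is an assumption, not taken from the RODL documentation.

```python
# Sketch: requesting RO metadata in a chosen RDF serialisation via HTTP
# content negotiation. URIs and the token scheme are assumptions.
from typing import Optional
import urllib.request

# Standard media types of the serialisations named in the text.
RDF_MEDIA_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "trig": "application/trig",
}


def metadata_request(resource_uri: str, fmt: str = "turtle",
                     token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request whose Accept header selects the RDF format."""
    request = urllib.request.Request(
        resource_uri, headers={"Accept": RDF_MEDIA_TYPES[fmt]})
    if token is not None:
        # Assumption: access tokens are sent with the OAuth 2.0 Bearer scheme.
        request.add_header("Authorization", f"Bearer {token}")
    return request


if __name__ == "__main__":
    # Hypothetical RO URI; requires a reachable RODL instance.
    req = metadata_request("http://example.org/rodl/ROs/my-ro/", "trig")
    with urllib.request.urlopen(req) as response:
        print(response.headers.get("Content-Type"))
        print(response.read().decode("utf-8")[:500])
```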

Besides the two core APIs, the system provides the following APIs:
● Search API provides methods for searching research objects programmatically. At the moment this API is provided by the underlying Solr index; however, as part of the ongoing development activities, an OpenSearch API is being deployed on top of Solr.
● Notification API provides methods to retrieve Atom feeds (similar to RSS) with notifications of events about any RO, such as changes in its content and quality, and about the overall RO collection, such as when new ROs are created.
● User Management API provides methods for identity management to create/retrieve/update/delete users (based on SCIM), and methods for authorisation management (based on OAuth 2.0) to create/retrieve/delete access tokens and to register/retrieve/update/delete client applications.
● Access control API enables granting permissions on research objects based on user roles and RO modes. User roles can be set as owner, editor and reader, while RO modes can be set as private, public and open.
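Because the Notification API returns standard Atom feeds, any Atom-aware client can consume them. The following minimal sketch lists the entries of such a feed, newest first, using only the Python standard library. The feed content is made-up example data. The sketch does not show how the feed URL for a given RO is obtained.

```python
# Sketch: listing notifications from an Atom feed such as those returned by
# the Notification API. SAMPLE_FEED is made-up example content.
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Notifications for an example research object</title>
  <entry>
    <title>Research object content has changed</title>
    <updated>2016-11-02T10:00:00Z</updated>
  </entry>
  <entry>
    <title>Research object quality has changed</title>
    <updated>2016-11-03T09:30:00Z</updated>
  </entry>
</feed>"""


def list_notifications(feed_xml: str) -> list:
    """Return (updated, title) pairs for each Atom entry, newest first."""
    root = ET.fromstring(feed_xml)
    entries = [
        (entry.findtext("atom:updated", default="", namespaces=ATOM),
         entry.findtext("atom:title", default="", namespaces=ATOM))
        for entry in root.findall("atom:entry", ATOM)
    ]
    # Sorting on the timestamp string works because the example uses a
    # uniform UTC ISO 8601 form; mixed offsets would need real parsing.
    return sorted(entries, reverse=True)


if __name__ == "__main__":
    for updated, title in list_notifications(SAMPLE_FEED):
        print(updated, title)
```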

RODL also provides a SPARQL endpoint that allows running SPARQL queries over HTTP against the metadata of all stored research objects.
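The SPARQL 1.1 Protocol lets a client send a query with HTTP GET by URL-encoding it into the query parameter. The following minimal sketch lists research objects this way. The endpoint URL is hypothetical; in a deployment it is the value configured as sparqlEndpointURL in the portal. The query uses the ro:ResearchObject class of the RO ontology (http://purl.org/wf4ever/ro#).

```python
# Sketch: querying a SPARQL endpoint with HTTP GET (SPARQL 1.1 Protocol).
# The endpoint URL is hypothetical.
import json
import urllib.parse
import urllib.request

QUERY = """PREFIX ro: <http://purl.org/wf4ever/ro#>
SELECT ?ro WHERE { ?ro a ro:ResearchObject } LIMIT 10"""


def sparql_get_request(endpoint: str, query: str) -> urllib.request.Request:
    """URL-encode the query and ask for JSON-formatted results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def list_research_objects(endpoint: str) -> list:
    """Run QUERY and return the URIs bound to ?ro."""
    with urllib.request.urlopen(sparql_get_request(endpoint, QUERY)) as response:
        results = json.load(response)
    return [binding["ro"]["value"] for binding in results["results"]["bindings"]]


if __name__ == "__main__":
    for uri in list_research_objects("http://example.org/rodl/sparql"):
        print(uri)
```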

6.4.2 VRE portal RO manager

This component uses the two main APIs implemented by ROHUB, the RO API and the RO Evolution API, described above.

6.4.3 Workflow management

There are two main APIs associated with these components:

9 https://github.com/wf4ever/apis/wiki/Wf4Ever-Services-and-APIs

● WF-RO transformation API exposes a service that transforms workflows into research objects. Workflows are often complex data structures that embed data such as their sub-resources, annotations and provenance. The service described by this API creates a research object that exposes these data according to the RO model. ● Workflow Runner API provides a common lightweight interface for the remote execution of workflows on a workflow management system, and to expose the data from the workflow run as a research object. At its heart, this API mirrors the RODL API, but the ROs exposed by this service each represent a particular workflow run, structured to show inputs, outputs, console logs, provenance and annotations containing wfprov and wfdesc mappings. The complete documentation for each of them is publicly available10.

6.4.4 Cloud Platform

During the application testing and integration in the Cloud Platform, the command ciop-run is used to test the correct execution of the application. The application can also be invoked through the Open Geospatial Consortium (OGC) Web Processing Service (WPS) interface.

The Hadoop Sandbox exposes the OGC WPS GetCapabilities document at the address:
http://localhost:8080/wps/WebProcessingService?service=wps&version=1.0.0&request=getCapabilities

The Sandbox WPS service exposes a single process, whose WPS description document is accessed at the URL:
http://localhost:8080/wps/WebProcessingService?service=wps&version=1.0.0&request=DescribeProcess&identifier=com.terradue.wps_oozie.process.OozieAbstractAlgorithm

The Execute WPS request can be submitted using the URL:
http://localhost:8080/wps/WebProcessingService?service=wps&version=1.0.0&request=Execute&identifier=com.terradue.wps_oozie.process.OozieAbstractAlgorithm&dataInputs=startdate=2012-04-05;enddate=2012-04-05T23%3A59%3A59;format=GeoTIFF;&ResponseDocument=result_distribution&storeExecuteResponse=true&status=true

The Sandbox dashboard provides a simple WPS client that makes it easy to enter the process parameter values, submit the processing task, monitor its progress, and finally access the generated products. The Sandbox dashboard is accessed from http:///dashboard (and then the ‘invoke’ tab).

Once the application development is validated and the application behaves as expected, the Sandbox can be converted into an Appliance and become part of a Cloud marketplace. As an appliance, it can be deployed on a Cloud infrastructure as a computing cluster with several computing nodes, and can thus process larger amounts of data in parallel.
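The three WPS requests above differ only in their key-value parameters, so they can be assembled programmatically. The following minimal sketch reproduces the GetCapabilities, DescribeProcess and Execute URLs given in the text. The endpoint and process identifier are taken from the text, and the helper names are hypothetical. The percent-encoding of ':' inside input values, and the trailing ';' after the last input, mirror the Execute example rather than a full reading of the WPS 1.0.0 specification.

```python
# Sketch: building the Sandbox WPS 1.0.0 KVP request URLs shown above.
from urllib.parse import quote

WPS_ENDPOINT = "http://localhost:8080/wps/WebProcessingService"
PROCESS_ID = "com.terradue.wps_oozie.process.OozieAbstractAlgorithm"


def encode_data_inputs(inputs: dict) -> str:
    """Encode process inputs as 'id=value;' pairs, as in the Execute example."""
    # quote(..., safe="") percent-encodes reserved characters such as ':'
    # inside values, so they cannot be confused with the KVP separators.
    return "".join(f"{key}={quote(value, safe='')};" for key, value in inputs.items())


def wps_url(request: str, **params: str) -> str:
    """Build a WPS 1.0.0 KVP URL; extra values are assumed to be URL-safe."""
    pairs = {"service": "wps", "version": "1.0.0", "request": request, **params}
    return WPS_ENDPOINT + "?" + "&".join(f"{k}={v}" for k, v in pairs.items())


capabilities_url = wps_url("getCapabilities")
describe_url = wps_url("DescribeProcess", identifier=PROCESS_ID)
execute_url = wps_url(
    "Execute",
    identifier=PROCESS_ID,
    dataInputs=encode_data_inputs({
        "startdate": "2012-04-05",
        "enddate": "2012-04-05T23:59:59",
        "format": "GeoTIFF",
    }),
    ResponseDocument="result_distribution",
    storeExecuteResponse="true",
    status="true",
)

if __name__ == "__main__":
    print(capabilities_url)
    print(describe_url)
    print(execute_url)
```

Changing the dictionary passed to encode_data_inputs is enough to submit the same process over a different time window or output format.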

6.4.5 Preservation components

The preservation front-end, middleware and DB of user profiles will be described in the next version of this technical note. The notification API is explained in Section 6.4.1. Regarding the RO assessment components (both APIs are publicly available together with the other RO-related APIs11):
● The checklist API is intended to provide access to the minim-based evaluation of research objects, used to test their completeness, availability and other desired features.

10 https://github.com/wf4ever/apis/wiki/Wf4Ever-Services-and-APIs
11 https://github.com/wf4ever/apis/wiki/Wf4Ever-Services-and-APIs

● The stability/reliability API provides access to the stability evaluation of a research object, i.e., a measure of how its checklist evaluation has changed over time.

6.5 Command-line commands

6.5.1 RO manager tool

RO Manager is a command-line tool for creating, displaying and manipulating local research objects. It enables scientists to create research objects in the environment most familiar to them, i.e. their local file system, before publishing and sharing them with the wider world. It also allows exchanging ROs with an RO repository implementing the RO API (e.g., ROHUB), and creating an immutable snapshot or release of a research object in an RO repository implementing the RO Evolution API (i.e. ROHUB). A local research object must first be pushed to such a repository before it can be snapshotted. RO Manager is independent of ROHUB; however, it can be used to interact with it. A brief summary of the supported commands is provided below. A complete and detailed description of the tool, the commands, and installation instructions is available on the project website12.

6.5.1.1 Configuration and miscellaneous

ro config initializes configuration parameters for the current user, including a base directory under which all ROs are created, and details of the RO SRS service which RO-manager uses to save and retrieve ROs in an RO repository.
ro help lists commands and options recognized by RO-manager.
ro --help displays a list of command options (the exact effect and applicability of these generally depends on the particular command used).
ro --version displays the current program version, and exits.

6.5.1.2 Creating and populating an RO

ro create creates a new RO in a designated directory.
ro status displays status information and metadata about an RO (local or remote) or a snapshot/archive operation.
ro add adds a file or files to an RO manifest.
ro list lists files in an RO directory tree.

12 http://wf4ever.github.io/ro-manager/doc/RO-manager.html

ro snapshot prepares a snapshot of an existing RO.
ro archive prepares an archive of an existing RO.
ro freeze freezes a previously prepared snapshot or archive, making it immutable.

6.5.1.3 Annotating an RO or component

ro annotate creates an annotation for an RO or a component.
ro annotations lists annotations on an RO or component.

6.5.1.4 Exchange RO with repository

ro checkout retrieves or updates an RO from a repository into the local file system.
ro push creates or updates an RO in a repository from the local file system.

6.5.1.5 Analyse RO contents

ro evaluate completeness evaluates the content of an RO for completeness with respect to an indicated minimum information model.

7 Future Work

For the next period, the roadmap for the e-research services includes the following tasks:
● Regarding ROHUB
  ○ Continue the implementation of ROHUB portal v2.
  ○ Release and deploy the first version of portal v2.
  ○ Continue the implementation of ROHUB backend extensions and adaptations to the Earth Science domain, e.g., extended RO model, implementation of an OpenSearch API with geospatial extensions, extended lifecycle and automatic citations, implementation of DOIs, etc.
  ○ Release and deploy the extended ROHUB backend.
  ○ Migrate research objects from the current backend to the extended backend.
● Regarding other RO components
  ○ Continue the implementation of new RO components, including semantic search, recommendation, collaboration spheres, and the scholarly communication related components.
  ○ Integrate these components with ROHUB.
  ○ Continue the implementation of the RO Manager in the EVER-EST portal, providing components tailored for Earth scientists.
● Regarding the workflow management components


  ○ Integrate the workflow runner component.
● Regarding the cloud platform
  ○ Finalize the integration of the cloud platform with the PSNC infrastructure.
  ○ Integrate the cloud platform with the rest of the components.
  ○ Integrate the cloud platform visual components in the portals.
● Regarding the preservation components
  ○ Implement and integrate the preservation front-end, middleware and DB of user profiles.


Annex A Installation Guides

A.1. ROHUB

A.1.1 Frontend: Portal

The following description applies to ROHUB v1, which is the production prototype.

A.1.1.1 Software prerequisites

Direct dependencies are:
● base framework: Java 8, Javax Servlet 2.5, Apache Wicket 6.9.0
● external service communication: Hibernate 3.6.7, Apache Httpclient 4.5.2, RODL Client 2.9.1
● logging: 1.2.16, Slf4j 1.6.2
● help tools: 2.11.0, Google Guava 14.0.1, -io 2.4, Apache Commons-configuration 1.9, OpenRDF Sesame-rio 2.7.0-beta2, Joda-time 2.0, Javassist 3.12.1.GA, Google Guice 3.0, Hamcrest 1.3, Gson 2.7, Jackson 1.8.5
● OpenID: Openid4java 0.9.7, Nimbus JOSE + JWT 4.26.1
● compile and build framework: maven 3.3

Beyond its direct software dependencies, the ROHUB portal also requires a set of services that need to be deployed and configured separately.


● RO Digital Library (RODL) is the main portal backend service. It provides functionality for storing, searching and manipulating research objects. The current implementation of RODL also provides an API for user authentication, authorisation and registration.
● Relational database, required for storing basic application data. During the development process, MySQL and PostgreSQL databases were successfully used. The ROHUB portal uses the Hibernate middleware to interact with the database, so any other database management system compatible with the Hibernate framework may also be used.
● Data index, required for the full-text search functionality. The current implementation of the ROHUB portal uses Solr as data index.
● Servlet container, a web server that hosts and manages the ROHUB portal application. The ROHUB portal application was developed and tested with 7 servlet container. It is recommended to use Apache Tomcat 7 or higher for the deployment.

A.1.1.2 Hardware prerequisites

The current installation runs on a virtual machine with the following parameters:
Processors - 2 x 2.1 GHz
RAM - 8GB
Disk space - 65GB

A.1.1.3 Service installation

To install and run a fully functional ROHUB Portal, the following steps need to be undertaken:
1. Ensure that all backend services are up and running
  ○ For the basic ROHUB portal functionality the following services are required: RODL, database (MySQL/PostgreSQL), Solr server.
  ○ The full ROHUB functionality is available when the following services are installed and configured properly: Workflow Runner Service, Workflow Transformation Service, Checklist Evaluation Service, Quality Evaluation Service, Recommender Service, Workflow Abstraction Service.
2. Build the ROHUB portal application
To build the ROHUB portal application, open a terminal window, change to the directory where the ROHUB portal source code is located and run the following command:
$ mvn package
If the command finishes successfully, the war archive with the application is created in the target folder: target/portal.war
3. Install the ROHUB portal application in the web application server
Let’s assume that $CATALINA_HOME represents the Tomcat web server home directory. To install the ROHUB portal application, copy the target/portal.war archive to the $CATALINA_HOME/webapps directory. The Tomcat Application Manager UI or API can also be used for this purpose. If the Tomcat server works properly, the war archive is extracted automatically.
4. Configure backend service locations and access credentials
The ROHUB portal configuration files are located in the $CATALINA_HOME/webapps/portal/WEB-INF/classes/ directory. Update the following configuration files according to your installation environment:
● admintoken.properties


adminToken - RODL administrator username and password in the format username:password, encoded using Base64
● checklist.properties
service.uri - RO evaluation service location
● portal.properties
rodlURL - RODL service location
sparqlEndpointURL - RODL SPARQL endpoint
searchEndpointURL - RODL Solr index location
recommenderEndpointURL - RO recommendation service location
stabilityEndpointURL - RO stability service location
userAccessTokenEndpointURL - RODL user access token endpoint
userAuthorizationEndpointURL - RODL user authorization endpoint
wf2ROService - workflow transformation service location

● oidc.ev.properties
oidc.ev.redirectUri - callback endpoint location; this endpoint is called by the Identity Provider (IDP) in the callback step of the OpenID Connect protocol
oidc.ev.clientId - application client id issued by the IDP
oidc.ev.clientAuthToken - application authorization token issued by the IDP
oidc.ev.issuer - URL identifying the issuer of user tokens
oidc.ev.authorizeEndpoint - IDP authorization endpoint
oidc.ev.tokenEndpoint - IDP token endpoint
oidc.ev.jwksEndpoint - JSON Web Key Set (JWKS) endpoint, which provides the IDP public key

● tokens.properties
myExpConsumerKey - application key that allows the portal to access the myExperiment infrastructure
myExpConsumerSecret - application secret

In $CATALINA_HOME/conf/server.xml, define the database resource, using jdbc/rosrs as the value of its name property. See $CATALINA_HOME/webapps/portal/WEB-INF/classes/hibernate.cfg.xml for more details.
5. Start the application server
To start and stop the Tomcat server, use the standard procedure. From the $CATALINA_HOME/bin directory run:
$ ./startup.sh - to start the server
$ ./shutdown.sh - to stop the server
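Two of the values configured in step 4 are derived rather than copied. The first is adminToken, which is the Base64 encoding of username:password, as stated above. The second set is the oidc.ev.* properties, which the portal uses to send users to the Identity Provider. The following minimal sketch shows both. The OpenID Connect part assumes the authorization code flow with the minimal "openid" scope; the portal's actual request parameters may differ. All endpoint URLs and client ids in the usage section are hypothetical.

```python
# Sketch: deriving the portal adminToken and an OpenID Connect authorization
# URL from oidc.ev.* properties. Flow, scope and example values are assumptions.
import base64
import secrets
from urllib.parse import urlencode


def portal_admin_token(username: str, password: str) -> str:
    """adminToken value: 'username:password' encoded using Base64."""
    return base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")


def authorization_url(props: dict, state: str) -> str:
    """Build an authorization request (assumed code flow, 'openid' scope)."""
    query = urlencode({
        "response_type": "code",
        "client_id": props["oidc.ev.clientId"],
        "redirect_uri": props["oidc.ev.redirectUri"],
        "scope": "openid",
        "state": state,  # random value echoed back to the callback endpoint
    })
    return f"{props['oidc.ev.authorizeEndpoint']}?{query}"


if __name__ == "__main__":
    print("adminToken=" + portal_admin_token("admin", "change-me"))
    example_props = {  # hypothetical values
        "oidc.ev.authorizeEndpoint": "https://idp.example.org/authorize",
        "oidc.ev.clientId": "rohub-portal",
        "oidc.ev.redirectUri": "https://rohub.example.org/oidc/callback",
    }
    print(authorization_url(example_props, secrets.token_urlsafe(16)))
```

Base64 is only an encoding, not encryption, so admintoken.properties should be protected like any other credentials file.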

A.1.1.4 Uninstallation Procedure

To remove the ROHUB portal from the Tomcat server, use the Tomcat Application Manager UI to undeploy the application. If the Application Manager is not available, the portal application can be uninstalled manually by removing the portal directory and the portal.war archive from the $CATALINA_HOME/webapps directory:
$ cd $CATALINA_HOME/bin
$ ./shutdown.sh


$ cd $CATALINA_HOME/webapps
$ rm -r portal portal.war
$ cd $CATALINA_HOME/bin
$ ./startup.sh

A.1.2 Backend: Research Object Digital Library (RODL)

RODL exposes a set of REST APIs, as described in Section 6.4.1. Additionally, RODL provides a SPARQL endpoint for running SPARQL queries over HTTP against the metadata of all stored research objects.

RODL has been used as a demonstration service since 2012. During this time, the number of users and the number of created ROs have increased significantly. To serve the increasing network traffic and effectively manage the growing volume of data generated by EVER-EST users, the current development of RODL focuses on migrating the semantic data storage from a local Jena repository to a Virtuoso server. This also requires an upgrade of the internal data model used by RODL to represent and manage semantic data. As a result of this task, the overall performance and scalability of RODL will increase. Other recent or ongoing changes in RODL include:
● OpenID single sign-on, which was based on a specification from 2008. Since then, new specifications and standards have been introduced. Hence, additional effort was needed to support the latest OpenID Connect standard, which is also the protocol agreed for the EVER-EST Identity Provider. The goal was to integrate RODL with the EVER-EST infrastructure and make it fully OpenID Connect compliant.
● EVER-EST users require the RO model to be extended to better support geolocation and to introduce new evolution states for ROs. To support these requirements, the RODL data model needs to be upgraded to reflect the changes in the RO model itself.

A.1.2.1 Software prerequisites

● base framework: Java 8, Javax Servlet 3.1.0, Jersey 1.19.2
● external service communication: Hibernate 4.1.7.Final, Apache Httpclient 4.4, Solr 4.1.0
● logging: Slf4j 1.7.12, Log4j 1.2
● help tools: Rdf4j 2.0M3, Jsonld 0.8.3, Gson 2.7, XercesImpl 2.11.0, Xml-apis 1.4.01, Rome atom/rss 1.0,


Apache Commons Io 2.1, Apache Commons Lang 3.0.1, Apache Commons Codec 1.5, Jackson Annotations 2.7.3, Javassist 3.12.1.GA, Quartz Scheduler 2.1.7
● compile and build framework: maven 3.3

Beyond its direct software dependencies, the RO Digital Library uses a set of services that need to be deployed and configured separately. The required services are:
● Relational database, used to store basic application data and registered user information. During the development process, MySQL and PostgreSQL databases were successfully used. RODL uses the Hibernate middleware to interact with the relational database, so any other database management system compatible with the Hibernate framework may also be used.
● Semantic data store, used to store semantic data. The current implementation of RODL uses the Virtuoso semantic store. RODL uses the RDF4J middleware to interact with the semantic store and uses standard SPARQL queries. If required, Virtuoso can be replaced by any other semantic store implementation supported by the RDF4J middleware.
● Data index, used to provide fast and simple access to the most important information about publicly available ROs. This index is used mostly for searching and for providing basic RO information to the ROHUB portal. The current RODL uses Solr as its data index implementation.
● External identity provider, which allows users to use a single identity across many services. RODL accepts user access tokens issued by the EVER-EST Identity Provider and allows EVER-EST users to interact with the RODL service. RODL provides its own user management and authorisation API; therefore, the external identity provider should be considered an optional service.
● RO stability monitor, an optional service that can be used to monitor RO quality.
● Servlet container, a web server that hosts and manages the RODL application. The RODL application was developed and tested with the Apache Tomcat 7 servlet container. It is recommended to use Apache Tomcat 7 or higher for the deployment.

A.1.2.2 Hardware prerequisites

The current deployment of the RODL application runs on a virtual machine with the following parameters:
CPU - 4
RAM - 8GB
Disk space - 80GB

A.1.2.3 Service installation

1. Ensure that all backend services are up and running
To run the RODL application successfully, the following services must be available:
● database - MySQL/PostgreSQL for relational data management
● Virtuoso server - for semantic data management
● Solr server - to index and expose publicly available data


To provide the RO stability monitoring functionality, an additional RO stability monitoring service must also be available.

2. Configure backend service locations and access credentials
Use any file manager to navigate to the directory where the RODL source code is located. Two configuration profiles are defined:
● dev, for development deployment, located in the profiles/dev directory
● prod, for production deployment, located in the profiles/prod directory
Choose the appropriate configuration profile, navigate to the profiles/[profile_name] directory and rename the config.properties.template file to config.properties. Open the config.properties file in any text editor and provide values for the following configuration parameters:

admin.token - MD5 hash of the RODL administrator username and password, given in the format username:password and encoded in Base64

db.username - database user name used by RODL to connect to the database
db.passwd - database user password
db.jdbc.url - database location URL in the JDBC format specific to the chosen database implementation
db.jdbc.driver - JDBC driver class name
db.dialect - database dialect class name used by the Hibernate framework to interact with the database

virtuoso.host - host name where the Virtuoso server is located
virtuoso.port - port number on which the Virtuoso server listens for connections (the standard port is 1111)
virtuoso.username - user name used by RODL to connect to the Virtuoso server
virtuoso.passwd - user password
virtuoso.defGraph - default graph URI, i.e. the graph where triples are stored if no context is provided
virtuoso.useLazyAdd - puts the Virtuoso connection in lazyAdd mode. Possible values are: true/false. See the Virtuoso documentation for more details.

virtuoso.conn.poolsize.init - initial size of the connection pool
virtuoso.conn.poolsize.min - minimum size of the connection pool
virtuoso.conn.poolsize.max - maximum size of the connection pool
virtuoso.conn.idletime.max - maximum number of seconds that a connection may remain unused in the pool before it is closed

filesystemBase - directory in the filesystem where ROs and RO resources will be stored

solrServer - SOLR server location url


oauth.everest.authorizationToken - token issued by the EVER-EST Identity Provider to identify the RODL application
oauth.everest.validationEndpoint - EVER-EST Identity Provider OAuth2 token validation endpoint
oauth.everest.userinfoEndpoint - EVER-EST Identity Provider OAuth2 userinfo endpoint
oauth.everest.validationRequestBodyTemplate - SOAP Envelope request body template used by RODL to pass the user token to the Identity Provider's OAuth2 token validation endpoint

checklist_service_url - optional parameter, RO stability monitoring notification endpoint location
checklist_author_source - optional parameter, RO stability monitoring service location

3. Build the RODL application
In a terminal window, navigate to the directory where the RODL source code is located and run the following command to build the RODL application archive. For the -P argument, provide the name of the configuration profile chosen in step 2.
$ mvn package -P [profile_name]
If the command finishes successfully, the war archive with the application is created in the target folder: target/rodl.war

4. Install the RODL application in the web application server
Let’s assume that $CATALINA_HOME represents the Tomcat web server home directory. To install the RODL application, copy the target/rodl.war archive to the $CATALINA_HOME/webapps directory. The Tomcat Application Manager UI or API can also be used for this purpose. If the Tomcat server works properly, the war archive is extracted automatically.

5. Start the application server
To start and stop the Tomcat server, use the standard procedure. From the $CATALINA_HOME/bin directory run:
$ ./startup.sh - to start the server
$ ./shutdown.sh - to stop the server
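Before building with mvn package in step 3, it can help to check that config.properties from step 2 is complete and to compute admin.token. The following minimal sketch does both. The admin.token description is ambiguous, so the sketch assumes that the token is the hexadecimal MD5 digest of the Base64-encoded "username:password" string, the same string the portal uses as adminToken. The set of required keys is also an assumption. The properties reader is deliberately simplified.

```python
# Sketch: checking a RODL config.properties and deriving admin.token.
# The hashing order and the REQUIRED_KEYS set are assumptions.
import base64
import hashlib
import sys

# Assumed minimal set of keys that must not be empty; optional keys
# (checklist_*, OAuth settings for the external identity provider) omitted.
REQUIRED_KEYS = (
    "admin.token", "db.username", "db.passwd", "db.jdbc.url",
    "db.jdbc.driver", "db.dialect", "virtuoso.host", "virtuoso.port",
    "virtuoso.username", "virtuoso.passwd", "filesystemBase", "solrServer",
)


def admin_token_hash(username: str, password: str) -> str:
    """ASSUMPTION: hex MD5 digest of the Base64-encoded 'username:password'."""
    encoded = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return hashlib.md5(encoded).hexdigest()


def parse_properties(text: str) -> dict:
    """Simplified reader: 'key=value' lines and '#'/'!' comments only.

    Line continuations, escapes and ':' separators are not handled.
    """
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_keys(props: dict) -> list:
    """Return the required keys that are absent or left empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "profiles/prod/config.properties"
    with open(path, encoding="utf-8") as handle:
        problems = missing_keys(parse_properties(handle.read()))
    print("missing or empty:", ", ".join(problems) or "none")
```

As an illustration only, a PostgreSQL setup would typically use a db.jdbc.url of the form jdbc:postgresql://localhost:5432/rodl (database name hypothetical), with db.jdbc.driver set to org.postgresql.Driver and db.dialect set to org.hibernate.dialect.PostgreSQLDialect.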

A.1.2.4 Uninstallation procedure

To remove RODL from the Tomcat server, use the Tomcat Application Manager UI to undeploy the application. If the Application Manager is not available, the application can be uninstalled manually by removing the rodl directory and the rodl.war archive from the $CATALINA_HOME/webapps directory:
$ cd $CATALINA_HOME/bin
$ ./shutdown.sh
$ cd $CATALINA_HOME/webapps
$ rm -r rodl rodl.war
$ cd $CATALINA_HOME/bin
$ ./startup.sh


A.2 VRE portal RO manager

As these components are part of the VRE portal, their software and hardware prerequisites and installation procedures are described in D5.3.

A.3 Workflow management

This section focuses on the WF-RO transformation service and the workflow runner. Please refer to the Taverna website13 for instructions regarding the installation of this particular workflow management system.

A.3.1 WF-RO transformation service

The WF-RO transformation service API is implemented as a Java web application that has to be compiled first and then deployed in an application server that supports the Java Enterprise Edition platform. The project is released with an instance of Jetty, where the service is deployed by default.

A.3.1.1 Software prerequisites

Java Development Kit (JDK) 8 is needed to compile the Java project and generate the web application archive (war) that is later deployed on the web server. 3 to build the project. If the user wants to use an application server other than Jetty, it has to be installed in the system.

A.3.1.2 Hardware prerequisites

The hardware requirements are driven by the application server selected for the deployment of the service and by the Java Development Kit used. In general, a minimum of 2GB of RAM and at least 1GB of available disk space are required. Regarding CPU, the processor should run at 1GHz or faster, and two or more cores are recommended.

A.3.1.3 Service installation

First, download the source code from the GitHub repository https://github.com/wf4ever/wf-ro as a zip file and unzip it in your file system. Next, in a terminal, go to the unzipped project folder and generate the war file with the following command:
$ mvn install
Finally, run the server from the terminal with the following command:
$ mvn jetty:run
If this last command runs successfully, the service should be available at: http://localhost:8086/jobs/
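After mvn jetty:run, or after deploying the workflow runner war described in A.3.2, a quick programmatic check can confirm that the expected URL answers. The following minimal sketch uses only the Python standard library. Treating any non-5xx HTTP status as "up" is an assumption, because the responses of these endpoints to a plain GET are not documented here.

```python
# Sketch: checking that an installed service answers at its expected URL.
# Considering any status below 500 as "up" is an assumption.
import http.client
from urllib.parse import urlsplit


def service_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP server at url answers with a non-5xx status."""
    parts = urlsplit(url)
    connection_class = (http.client.HTTPSConnection if parts.scheme == "https"
                        else http.client.HTTPConnection)
    connection = connection_class(parts.hostname, parts.port, timeout=timeout)
    try:
        # Only the path is sent; query strings are ignored in this sketch.
        connection.request("GET", parts.path or "/")
        return connection.getresponse().status < 500
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout or malformed response: not available.
        return False
    finally:
        connection.close()


if __name__ == "__main__":
    for url in ("http://localhost:8086/jobs/", "http://localhost/runner/"):
        print(url, "up" if service_responds(url) else "down")
```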

A.3.1.4 Uninstallation procedure

If the service is running, stop it first by running the following command in a different terminal:
$ mvn jetty:stop
Once the service has stopped, manually delete the source folder.

13 https://taverna.incubator.apache.org/

A.3.2 Workflow runner service

The workflow runner service is a RESTful web service written in Clojure. The application must first be compiled and then deployed in an application server supporting the Java Enterprise Edition platform. The service is distributed under the Eclipse Public License, the same license as Clojure.

A.3.2.1 Software prerequisites

The service has the following dependencies:
● clojure 1.3
● compojure 1.1
● ring-middleware-format 0.2
● clj-http 0.4.3
● clj-time 0.4.3
● net.kronkltd 0.1.0-SNAPSHOT

Additionally, Leiningen is required to compile the project and generate the web application archive (WAR) that is later deployed on the application server. Furthermore, as the current implementation of the workflow runner service is based on Taverna, it requires a Taverna Server instance to be up and running.

A.3.2.2 Hardware prerequisites

The hardware requirements depend on the application server selected to deploy the service. In general, a minimum of 2 GB of RAM and at least 1 GB of free disk space are required. The processor should run at 1 GHz or faster, and two or more cores are recommended.

A.3.2.3 Service installation

First, download the source code as a zip file from the GitHub repository https://github.com/wf4ever/workflow-runner and unzip it in your file system. Next, in a terminal, go to the unzipped project folder and generate the WAR file with the command lein ring uberwar. The resulting WAR file is written to the target directory. Copy this WAR file into the deployment directory of the application server, then start the application server. The service will then be available at http://localhost/runner/.
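The "drop the WAR into the application server" step can be scripted. The sketch below is hypothetical: it assumes a Tomcat-style server whose deployment directory (for example a `webapps` folder) auto-deploys WAR files and derives the context path from the file name, so that copying the archive as `runner.war` would expose it under `/runner/`. The exact WAR file name produced by Leiningen is not assumed; the sketch simply requires exactly one WAR under `target`.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def find_uberwar(project_dir: Path) -> Path:
    """Return the single WAR that `lein ring uberwar` wrote under target/."""
    wars = sorted((project_dir / "target").rglob("*.war"))
    if len(wars) != 1:
        raise FileNotFoundError(
            f"expected one WAR under {project_dir / 'target'}, found {len(wars)}"
        )
    return wars[0]


def deploy_war(war: Path, deploy_dir: Path, context: str = "runner") -> Path:
    """Copy the WAR into the server's deployment directory as <context>.war."""
    deploy_dir.mkdir(parents=True, exist_ok=True)
    destination = deploy_dir / f"{context}.war"
    shutil.copy2(war, destination)  # copy2 also preserves file timestamps
    return destination
```

A typical call might be `deploy_war(find_uberwar(Path("workflow-runner")), Path("/opt/tomcat/webapps"))`, where both paths are illustrative placeholders for the local setup.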

A.3.2.4 Uninstallation procedure

If the service is running, it must first be stopped by stopping the application server. Once the service has stopped, manually delete the source folder, as well as the WAR file copied into the application server.

A.4 Cloud Platform

A.4.1 Software prerequisites

The Terradue Platform relies on the ‘RPM’ package management technology to give access to standard software installation operations, using packages from well-known repositories (e.g. ELGIS, EPEL). For most applications, these repositories provide the specialised software packages required, such as GDAL or R. Platform users access these RPMs with ‘sudo’ (substitute superuser do) rights, which are granted for programs that need the elevated privileges of the ‘superuser’, such as the ‘yum’ package update and management utility. Some operations requiring ‘root’ user privileges must be performed by the Operations Support team (Terradue), and have to be requested by the partners via the User Support site.

To check the availability of a software package for installation on the user Virtual Machine (VM):
> sudo yum info <package-name>
To find the correct package name based on a query term:
> sudo yum search <query-term>
To install a software package on the user VM:
> sudo yum install <package-name>
To update a software package on the user VM, if an update is available in a repository:
> sudo yum update <package-name>

Additionally, the Platform supports utilities to manage Hadoop Sandbox applications. Such applications must follow a specific file structure¹⁴, so that users can save and retrieve multiple Hadoop-based applications (such as different versions of their own developments) using the GitHub repositories that the project provides to the partners under the EVER-EST organization.

A.4.2 Hardware prerequisites

The Cloud infrastructure supporting the EVER-EST services is operated from a virtualized data center, enabled by Terradue's OpenNebula Cloud Controller, which provides orchestration and auto-scaling of multi-tiered Cloud applications. An administrator of Hosted Processing services can therefore define, execute and manage “services composed of interconnected Virtual Machines”, with deployment dependencies between them. The Platform equips VRE application developers with a virtualized computing environment in which to develop, test and validate their data processing applications. Developers benefit from a cost-effective PaaS environment in which to develop their applications and simulate their behaviour on computing clusters. Once deployed, the resulting application is a highly scalable data processing service. In this setting, services are composed of roles, each corresponding to one VM template and each with a cardinality, i.e. the number of instances of that VM template.
The role cardinality can be adjusted manually, or automatically through auto-scaling policies driven either by metrics or by a schedule.
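The role and cardinality model, with metric-driven or scheduled adjustment, can be illustrated with a small Python sketch. OpenNebula's OneFlow component, which manages such multi-VM services, expresses these rules as elasticity and scheduled policies; this sketch does not use its API, and the classes, thresholds and field names are hypothetical. Every result is clamped to the role's minimum and maximum number of VMs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Role:
    """A service role: one VM template instantiated `cardinality` times."""
    name: str
    vm_template: str
    cardinality: int
    min_vms: int
    max_vms: int


@dataclass
class MetricPolicy:
    """Hypothetical metric rule: scale out above `high`, scale in below `low`."""
    high: float
    low: float
    step: int = 1


def clamp(role: Role, target: int) -> int:
    """Keep a requested cardinality within the role's allowed range."""
    return max(role.min_vms, min(role.max_vms, target))


def next_cardinality(role: Role, metric: float, policy: MetricPolicy) -> int:
    """Metric-based policy: adjust by `step` when the metric crosses a threshold."""
    target = role.cardinality
    if metric > policy.high:
        target += policy.step
    elif metric < policy.low:
        target -= policy.step
    return clamp(role, target)


def scheduled_cardinality(role: Role, hour: int, schedule: dict[int, int]) -> int:
    """Schedule-based policy: apply the latest change scheduled at or before `hour`."""
    past = [h for h in schedule if h <= hour]
    if not past:
        return clamp(role, role.cardinality)
    return clamp(role, schedule[max(past)])
```

For instance, with a worker role of 4 VMs and thresholds of 80 and 20 (e.g. percent CPU load), a reading of 95 would grow the role to 5 VMs, while a schedule `{8: 6, 20: 2}` would run 6 VMs from 08:00 and 2 VMs from 20:00.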

A.4.3 Service installation

The Platform integrates services that support EVER-EST communities in using automation tools to move a Developer Cloud Sandbox application to the production phase. This is addressed in four main steps:
1. Initial pre-production testing;
2. Application packaging and GitHub usage (with additional guidance for Python developers);
3. Application package quality check (generally, using the RPM format);
4. Production cluster on a Public Cloud, using the bursting capabilities of the Terradue Cloud Platform.
The following paragraphs provide further details for each step.

Initial pre-production testing: Pre-production testing is performed on a mini-cluster, in order to validate the application's behaviour in a cluster environment. Platform users are provided with a mini-cluster of 4 nodes for their application tests. A Hadoop Sandbox VM is started from the Cloud Controller environment and becomes the master node of this 4-node Hadoop cluster. Partners can launch their application as usual, but it will then use 8 (or more) slots for mapper jobs, instead of the 2 slots available in the Sandbox mode used for cluster simulation.

14 http://docs.terradue.com/developer-sandbox/start/github/index.html#organizing-the-repository

Python and Anaconda support: For Python applications, the Platform supports the Anaconda software to overcome issues related to Python dependencies and versions. The support team provides specific guidance for moving from a user's current Python dependencies to those provided through Anaconda (including cioppy, the CIOP (Cloud Computing Operational Pilots) Python bindings¹⁵).

Application packaging and GitHub usage: This step aims at building a single package (typically in RPM format) containing the application resources and the specification of its dependencies. The goal is to build an application compatible with the DCS (Developer Cloud Sandbox) applications family (such as the one used for the EVER-EST training¹⁶). Such applications have the advantage of being self-contained. For this step, users are provided with a private GitHub repository under the EVER-EST organization¹⁷ to host the application wrappers (application workflow and Hadoop streaming executables) that make use of the RPM resources. Application archetypes are also provided to guarantee the correct Cloud deployment of such applications.

Production cluster on a Public Cloud: The final phase covers the execution of the application on a PSNC cloud, using the bursting capabilities of the Terradue Cloud Platform. The target production cluster can be dimensioned, and the application can be deployed from the Platform's Cloud Controller on each computing node of the cluster. For this phase, partners will have to establish a testing plan (e.g. number of computing nodes, metrics to measure) in order to agree with the Operations Support team at Terradue on the best deployment settings.
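The Hadoop streaming executables mentioned above follow a simple contract: Hadoop feeds input records on standard input, one per line, and collects tab-separated key/value lines from standard output. The following Python mapper is a generic, hedged sketch of that contract only; it does not use the cioppy bindings or the DCS archetypes, and `process_reference` is a hypothetical placeholder for the application's actual processing.

```python
from __future__ import annotations

import sys
from typing import Iterable, Iterator


def process_reference(reference: str) -> str:
    """Hypothetical placeholder for the application's real per-input processing."""
    return "processed"


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hadoop streaming contract: one input record per line in,
    one 'key<TAB>value' line out."""
    for line in lines:
        reference = line.strip()
        if not reference:
            continue  # ignore blank records
        yield f"{reference}\t{process_reference(reference)}"


def main() -> None:
    """Entry point for the streaming executable: stdin -> stdout."""
    for record in map_lines(sys.stdin):
        print(record)
```

Keeping the record handling in `map_lines`, separate from the I/O in `main`, lets the same logic be tested locally in the Sandbox before it runs with the additional mapper slots of the mini-cluster.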

A.5 Preservation components

A.5.1 Preservation Front-End

The preservation Front-End will be integrated into the GUI and implemented in JavaScript. Further information will be provided in the next version of this technical note.

A.5.2 Preservation middleware

The preservation middleware bridges the Front-End and the ROHUB Back-End. This involves invoking ROHUB APIs in order to package and store the appropriate auxiliary information in the ROs. The preservation middleware will also access the user profiles DB to retrieve the information related to each specific class of users. The preservation middleware will be based on WSO2 ESB. Further information will be provided in the next version of this technical note.

A.5.3 Preservation DB of user profiles

The user profiles DB contains the profiles of the various classes of users and is accessed during the creation of the ROs. It makes it possible to follow the checklist of information needed to ensure that the ROs can be preserved. This DB will be based on SQLite and configured according to OAIS recommendations and the advice of the VRCs. Further information will be provided in the next version of this technical note.
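Since the DB design is still to be specified, the following Python sketch only illustrates one possible shape for it, using the standard `sqlite3` module: each class of users has a list of checklist items, and an RO being created can be compared against the mandatory ones. The table names, columns and example items are all hypothetical and do not reflect the final OAIS-based configuration.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema: the note only states that the DB is SQLite-based.
SCHEMA = """
CREATE TABLE IF NOT EXISTS user_class (
    name TEXT PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS checklist_item (
    user_class TEXT NOT NULL REFERENCES user_class(name),
    item       TEXT NOT NULL,              -- information an RO should carry
    mandatory  INTEGER NOT NULL DEFAULT 1, -- 1 = required, 0 = recommended
    PRIMARY KEY (user_class, item)
);
"""


def open_profiles_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the profiles DB and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_profile(conn: sqlite3.Connection, user_class: str, items: dict[str, bool]) -> None:
    """Register a class of users with its checklist items (item -> mandatory?)."""
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT OR IGNORE INTO user_class(name) VALUES (?)", (user_class,))
        conn.executemany(
            "INSERT OR REPLACE INTO checklist_item(user_class, item, mandatory) VALUES (?, ?, ?)",
            [(user_class, item, int(mandatory)) for item, mandatory in items.items()],
        )


def missing_mandatory_items(
    conn: sqlite3.Connection, user_class: str, ro_metadata: set[str]
) -> list[str]:
    """List the mandatory checklist items that an RO's metadata does not yet cover."""
    rows = conn.execute(
        "SELECT item FROM checklist_item"
        " WHERE user_class = ? AND mandatory = 1 ORDER BY item",
        (user_class,),
    )
    return [item for (item,) in rows if item not in ro_metadata]
```

In this design, the middleware would call `missing_mandatory_items` during RO creation and prompt the user for whatever is still missing.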

15 https://anaconda.org/terradue/cioppy
16 https://github.com/Terradue/dcs-beam-algalbloom/
17 https://github.com/ec-everest

A.5.4 RO assessment

A.5.4.1 Checklist component

A detailed installation guide for the checklist service is available at https://github.com/wf4ever/ro-manager/tree/master/src/roweb.

A.5.4.2 Stability component

The stability service and the RO monitoring (visual component) are implemented in Java as a web service and a web application respectively. Their installation therefore requires a JDK to compile the source code, and a web application server implementing the Java Servlet specification to deploy both components. The source code of the stability service and the RO monitoring is available at: https://github.com/wf4ever/reliability. After the code has been downloaded and compiled, it must be packaged as a WAR file. To do so, run the standard jar command within the project home directory: jar -cvf my_web_app.war * This WAR file is ready to be deployed in the web application server of your choice. Note that each application server has its own means of deploying web applications, so the reader should refer to the server's documentation.
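Because `jar -cvf my_web_app.war *` archives whatever the directory contains, a quick check of the resulting archive before deployment can save a failed deployment. A WAR is a ZIP file; a servlet container expects compiled code under `WEB-INF/classes` (or as JARs under `WEB-INF/lib`), and applications written before Servlet 3.0 also need a `WEB-INF/web.xml` deployment descriptor. The Python sketch below, with a hypothetical function name, only inspects the archive layout and makes no claim about the reliability project's own structure.

```python
from __future__ import annotations

import zipfile


def war_summary(war_path: str) -> dict[str, bool]:
    """Report whether a WAR file has the layout a servlet container expects."""
    with zipfile.ZipFile(war_path) as war:
        names = war.namelist()
    return {
        "has_web_inf": any(name.startswith("WEB-INF/") for name in names),
        # Mandatory before Servlet 3.0; optional afterwards (annotations).
        "has_web_xml": "WEB-INF/web.xml" in names,
        # Compiled code lives in WEB-INF/classes or in JARs under WEB-INF/lib.
        "has_code": any(
            (name.startswith("WEB-INF/classes/") and name.endswith(".class"))
            or (name.startswith("WEB-INF/lib/") and name.endswith(".jar"))
            for name in names
        ),
    }
```

An archive reporting `has_code: False` most likely contains only sources, meaning the compilation step was skipped or the jar command was run from the wrong directory.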

A.5.5 RO notification and preservation components in ROHUB

The notification service and the other RO preservation components are installed as part of ROHUB.
