ARCHITECTURE SPECIFICATION CI

Version 3.00
Document Control Number 2130-00003
2011-04-20

Consortium for Ocean Leadership
1201 New York Ave NW, 4th Floor, Washington DC 20005
www.OceanLeadership.org

in cooperation with

University of California, San Diego
University of Washington
Woods Hole Oceanographic Institution
Oregon State University
Scripps Institution of Oceanography

Architecture Specification CI Document Control Sheet

Version  Date        Description                                        Originator
1.00     2010-02-23  Release 1 Life Cycle Objectives Review Baseline    M. Meisinger
2.00     2010-08-18  Release 1 Life Cycle Architecture Review Baseline  M. Meisinger
3.00     2011-04-20  Annual Review Year 2 Baseline                      M. Meisinger

1. Architecture and Design
1.1 CIAD Executive Summary
1.2 CIAD AV Architecture Project
1.3 CIAD AV OOI Context
1.4 CIAD AV OOI Use Cases
1.5 CIAD AV OOI Operations Overview
1.5.1 CIAD AV OOI User Persona Overview
1.6 CIAD AV ION Integration and Deployment
1.7 CIAD AV Transition to Operations
1.7.1 CIAD AV Scope Release 1
1.7.2 CIAD AV Scope Release 2
1.8 CIAD AV Glossary
1.9 CIAD OV System Decomposition
1.10 CIAD OV Subsystem and Service Dependencies
1.11 CIAD OV Resource and Object Model
1.12 CIAD OV Data Flows
1.13 CIAD OV User and Application Interfaces
1.14 CIAD OV Instrument Integration
1.15 CIAD OV External Interfaces
1.16 CIAD OV External Observatory Integration
1.16.1 CIAD EOI Dataset Agent Overview
1.16.1.1 CIAD EOI Dataset Agent Details
1.16.2 CIAD EOI Dispatcher
1.17 CIAD OV Operations Support Systems
1.18 CIAD SV Integration Strategy
1.19 CIAD SV Deployment Strategy
1.20 CIAD SV CyberPoPs
1.20.1 CIAD SV CyberPoP General Development Strategy
1.20.2 CIAD SV Incremental CyberPoP Rollout
1.20.3 CIAD SV CyberPoP Internal Connectivity
1.20.4 CIAD SV CyberPoP Physical Layout
1.20.5 CIAD SV CyberPoP Management
1.21 CIAD SV Network Architecture
1.22 CIAD SV Technology List
1.23 CIAD TV Technical Standards
1.23.1 CIAD TV FIPA Specifications
1.24 CIAD COI Common Operating Infrastructure
1.24.1 CIAD COI OV
1.24.2 CIAD COI OV Capability Container
1.24.2.1 CIAD COI OV Capability Container Exchange Interface
1.24.2.2 CIAD COI OV Capability Container Internal Processes
1.24.2.3 CIAD COI OV Process Management
1.24.2.4 CIAD COI SV Container Messaging
1.24.2.5 CIAD COI SV Java Capability Container
1.24.2.6 CIAD COI SV Python Capability Container
1.24.2.6.1 Python CC Startup
1.24.2.7 CIAD COI TV ESB
1.24.2.8 CIAD COI TV Open Telecom Platform
1.24.3 CIAD COI OV Distributed State Management
1.24.3.1 CIAD COI OV Attribute Store Design
1.24.3.2 CIAD COI OV Common Object Model
1.24.3.3 CIAD COI OV Data Store Service
1.24.3.4 CIAD COI OV Service State Repository
1.24.3.5 CIAD COI SV GPB Object Encoding
1.24.4 CIAD COI OV Exchange
1.24.4.1 CIAD COI OV Exchange Management Service
1.24.4.2 CIAD COI OV Messaging
1.24.4.3 CIAD COI SV Common Message Format
1.24.4.4 CIAD COI SV Distributed IPC Facility
1.24.4.5 CIAD COI SV Interaction Levels
1.24.4.6 CIAD COI SV RabbitMQ Exchange
1.24.4.7 CIAD COI TV AMQP
1.24.4.7.1 CIAD COI TV AMQP 1.0PR1 & 1.0PR2 Models
1.24.4.7.2 CIAD COI TV AMQP 1.0PR3 Models
1.24.4.8 CIAD COI TV Distributed IPC Facility Models
1.24.4.9 CIAD COI TV RabbitMQ
1.24.4.10 CIAD COI TV Rich Service Architecture
1.24.5 CIAD COI OV Governance Framework
1.24.5.1 CIAD COI OV Agents and Monitoring
1.24.5.2 CIAD COI OV Federated Facility
1.24.5.3 CIAD COI OV Governance Activities
1.24.5.4 CIAD COI OV Governance Concepts
1.24.5.5 CIAD COI OV Governance Domain Models
1.24.5.6 CIAD COI OV Governance Interactions
1.24.5.7 CIAD COI OV Governance Use Cases
1.24.5.8 CIAD COI OV Interaction Management
1.24.5.9 CIAD COI TV Governance
1.24.6 CIAD COI OV Identity and Policy Management
1.24.6.1 CIAD COI OV Identity Management Activities
1.24.6.2 CIAD COI OV IdM Relevant Nomenclature
1.24.6.3 CIAD COI OV Policy Management
1.24.6.4 CIAD COI OV Secure Messaging
1.24.6.5 CIAD COI SV End to End Identity Management
1.24.6.6 CIAD COI SV IdM Technology Mapping
1.24.6.7 CIAD COI SV Roles and Permissions
1.24.6.8 CIAD COI TV CIlogon
1.24.6.9 CIAD COI TV SAML
1.24.6.10 CIAD COI TV WS-Security
1.24.6.11 CIAD COI TV X509
1.24.6.12 CIAD COI TV XACML
1.24.7 CIAD COI OV Presentation Framework
1.24.7.1 Available Presentation Frameworks
1.24.7.2 CIAD COI SV Web
1.24.7.3 CIAD COI TV Grails
1.24.7.4 Presentation Use Cases
1.24.7.5 Scientific Process
1.24.8 CIAD COI OV Resource Management
1.24.8.1 CIAD COI OV Resource Lifecycle
1.24.8.1.1 CIAD COI OV Resource Lifecycle States by Resource Type
1.24.8.1.2 CIAD COI OV Implications of Policy over Resource Lifecycle
1.24.8.2 CIAD COI OV Resource Registry
1.24.8.3 CIAD COI SV Resource Tutorial
1.24.8.3.1 Address Book
1.24.8.3.2 Coupled Mooring Composite
1.24.8.3.3 Mooring Composite
1.24.9 CIAD COI OV Service Framework
1.24.9.1 CIAD COI OV Service Agent
1.24.9.2 CIAD COI OV Service Integration
1.24.9.2.1 CIAD COI OV Service Discovery
1.24.10 CIAD COI OV User Interfaces
1.25 CIAD CEI Common Execution Infrastructure
1.25.1 CIAD CEI OV
1.25.2 CIAD CEI OV Elastic Computing
1.25.2.1 CIAD CEI OV Elastic Computing Terminology
1.25.2.2 CIAD CEI OV Elastic Processing Unit
1.25.2.3 CIAD CEI TV Eucalyptus
1.25.2.4 CIAD CEI TV Nimbus
1.25.3 CIAD CEI OV Execution Engines
1.25.4 CIAD CEI OV Process Execution Management
1.25.5 CIAD CEI OV Registries and Repositories
1.25.6 CIAD CEI OV Taskable Resource Management
1.25.6.1 CIAD CEI OV Planner
1.25.6.2 CIAD CEI OV Resource Agent
1.25.6.3 CIAD CEI OV Resource Agent Interactions
1.25.6.4 CIAD CEI SV Resource Agent State Machine
1.25.7 CIAD CEI OV User Interfaces
1.25.8 CIAD CEI SV System Bootstrapping
1.26 CIAD DM Data Management
1.26.1 CIAD DM OV
1.26.2 CIAD DM OV Distribution
1.26.2.1 CIAD DM OV Information Model
1.26.2.2 CIAD DM OV Topic Exchange
1.26.2.3 CIAD DM SV Common Data Model
1.26.2.4 CIAD DM SV Notifications and Events
1.26.2.5 CIAD DM SV R1 Data Distribution Specification
1.26.2.6 CIAD DM TV DAP
1.26.2.7 CIAD DM TV Google Protocol Buffers
1.26.2.8 PubSub Resource Model
1.26.3 CIAD DM OV Ingestion
1.26.3.1 CIAD DM SV R1 Ingestion Service Specification
1.26.4 CIAD DM OV Inventory
1.26.4.1 CIAD DM OV Data Model
1.26.4.2 CIAD DM OV Data Set Registry
1.26.4.3 CIAD DM OV Information Resource Management
1.26.4.4 CIAD DM SV Associations
1.26.5 CIAD DM OV Presentation
1.26.6 CIAD DM OV Preservation
1.26.6.1 CIAD DM SV Cassandra Schema Specification
1.26.6.2 CIAD DM SV Content Addressable Store
1.26.6.3 CIAD DM SV Persistence Architecture
1.26.6.4 CIAD DM SV R1 Persistent Archive Service
1.26.6.5 CIAD DM SV Virtual File Store
1.26.6.6 CIAD DM TV Cassandra
1.26.6.7 CIAD DM TV GIT
1.26.6.8 CIAD DM TV iRODS
1.26.7 CIAD DM OV Transformation
1.26.8 CIAD DM OV User Interfaces
1.27 CIAD SA Sensing and Acquisition
1.27.1 CIAD SA OV
1.27.2 CIAD SA OV Data Acquisition
1.27.3 CIAD SA OV Data Calibration Services
1.27.4 CIAD SA OV Data Processing
1.27.5 CIAD SA OV Data Product Activation
1.27.6 CIAD SA OV Data Product Generation
1.27.7 CIAD SA OV Data Validation Services
1.27.8 CIAD SA OV Direct Access
1.27.9 CIAD SA OV Domain Models
1.27.10 CIAD SA OV Instrument Activation
1.27.11 CIAD SA OV Instrument and Platform Agents
1.27.11.1 CIAD SA SV Instrument Agent Interface
1.27.11.2 CIAD SA SV Instrument Driver Design
1.27.11.3 CIAD SA SV Instrument Driver Interface
1.27.11.3.1 Test Instrument Interface
1.27.11.4 Instrument Drivers
1.27.12 CIAD SA OV Instrument Life Cycle
1.27.13 CIAD SA OV Instrument Management
1.27.14 CIAD SA OV Marine Facility
1.27.15 CIAD SA OV Marine Platform Services
1.27.16 CIAD SA OV Marine Resource Scheduling
1.27.17 CIAD SA OV Observatory Management
1.27.18 CIAD SA OV User Interfaces
1.27.19 CIAD SA SV Instrument Development Kit
1.27.20 CIAD SA SV Instrument Driver Framework
1.27.21 CIAD SA SV Technology Mapping
1.27.22 CIAD SV Instrument Agent and Driver Integration Interfaces
1.28 CIAD AS Analysis and Synthesis
1.28.1 CIAD AS OV
1.28.2 CIAD AS OV Data Analysis
1.28.3 CIAD AS OV Interactive Analysis
1.28.4 CIAD AS OV Model Integration
1.28.5 CIAD AS OV User Interfaces
1.28.6 CIAD AS OV Workflows
1.28.7 CIAD AS SV Technology Mapping
1.29 CIAD PP Planning & Prosecution
1.29.1 CIAD PP OV
1.29.2 CIAD PP OV Autonomous System Control
1.29.3 CIAD PP OV Instrument Interactivity
1.29.4 CIAD PP OV Mission Execution
1.29.5 CIAD PP OV Resource Planning
1.29.6 CIAD PP OV User Interfaces
1.29.7 CIAD PP SV Deployment Scenarios
1.29.8 CIAD PP SV Technology Mapping
1.29.9 CIAD PP TV ASPEN-CASPER
1.30 CIAD APP DoDAF Reference
1.31 CIAD APP UML Reference
1.32 CIAD APP References

Architecture and Design

This is the entry point to the OOI Integrated Observatory Architecture Specification. Please follow the links below for details on specific thematic pages. Note that many pages have child pages. This specification is structured according to the DoDAF framework. Released exports of this architecture specification are under OOI configuration control as document 2130-00003.

See the end of this page for information about maintaining this Architecture.

Executive Summary

CI System Level

Most content is in the Subsystem Level pages, linked under Subsystem Level below.

AV All Views (Introduction)

About this Architecture Specification
About the Ocean Observatories Initiative (OOI)
OOI Use Cases for Science and Education
OOI Integrated Observatory Operations Overview
Integrated Observatory Network Integration and Deployment
Transition to Operations: CI Project Structure, Subsystems and Releases (see Scope Release 1, Scope Release 2)
Glossary of Terms, Abbreviations and Acronyms

OV Operational Views (Logical Architecture)

Operational Overviews
System Decomposition
Subsystem and Service Dependencies
Integrated Observatory Resource and Object Model
Data and Control Flows
External Interfaces and Integration
User Interfaces
Instrument Integration
Interfaces to External Systems
External Observatory Integration
Operations Support Systems

SV System Views (Technical and Deployment Architecture)

Integration Strategy
Deployment Strategy
CyberPoP Design and Rollout
Network Architecture
Technology List

TV Technical Standards Views

System Level Technical Specifications
Technology Catalog: reference materials for technologies (non-authoritative)

Subsystem Level

Common Operating Infrastructure (COI) subsystem: basic infrastructure components and services providing service integration, reliable communication, security, and persistence (Overview)
Common Execution Infrastructure (CEI) subsystem: services for managing processes and services, enabling flexible deployments in heterogeneous environments (Overview)
Data Management (DM) subsystem: managing, storing, distributing and presenting data and other information (Overview)
Sensing and Acquisition (SA) subsystem: supports instrument control, data acquisition and data processing applications (Overview)

Analysis and Synthesis (AS) subsystem (starts in Release 2): analyzing data, synthesizing derived products, integrating numerical models and providing interactive user workspaces (Overview)
Planning and Prosecution (PP) subsystem (starts in Release 3): advanced observatory command and control with resource scheduling and instrument autonomy (Overview)

Appendices

DoDAF Reference
UML Notation Reference: Class Diagrams and Message Sequence Charts
References

Maintenance and Open Issues

For instructions on maintaining and commenting on these pages, please visit Procedures to Edit the Architecture and Design Document.

The following pages summarize topics needing attention in the Architecture and Design pages.

Open Issues in Architecture Pages
Completed Architecture Pages Awaiting Review
Architecture Pages with Comments
Generate an Architecture Baseline Export

Material Covered

After reading this page, you should be able to answer the following questions:

Where in the Architecture document is most of the design documentation?
What documentation framework have we used to organize the documentation?
How many subsystems and implementations are documented?

Quick Links

Subsystems: COI CEI DM SA AS PP

External Integration: IPAA EOI

CIAD Executive Summary

This OOI Integrated Observatory Architecture Specification (CIAD) specifies the system architecture and design for the science- and education-driven applications of the OOI Integrated Observatory, together with its infrastructure services. This system is developed, and will be operated, by the OOI Cyberinfrastructure (OOI CI) implementing organization.

OOI Integrated Observatory applications include:

Interfacing with environmental sensors, instrument platforms, and observatory infrastructure, enabling data and command flow
Acquisition of observational sensor data and external data and their ingestion into the OOI Integrated Observatory
Generation and distribution of qualified science data products in (near) real time
Synthesis of derived data products such as QA/QC'ed data products
Integration of numerical ocean models and their output as derived data products
Access to, syntactic transformation, semantic mediation, analysis, and visualization of science data, data products, and derived data products
Interactive analysis and visualization of OOI Integrated Observatory data products in a social networking environment
Planning and control of complex, long-running ocean observations, and of event-triggered adaptive observations
Interactive control of observatory infrastructure

The Integrated Observatory infrastructure services provide the foundation for the application capabilities above, supporting a broad range of integrated observatory user applications. Integrated Observatory infrastructure services include:

Management of distributed information repositories, including data stores for science data and derived data products, and general-purpose repositories
Management of observatory resources of various types, keeping track of their life cycle and internal state
A common operating infrastructure, comprising capabilities for message-based communication and service-oriented application integration, with consistent cross-cutting identity management, interaction governance, and policy enforcement
A common execution infrastructure that provides location-independent management of heterogeneous executable resources, including provisioning and control of compute and storage cloud resources
Operation and management of the integrated observatory system

The contents of this document have been developed and structured in accordance with the Department of Defense Architecture Framework (DoDAF), which provides guidelines for developing architectures for large-scale systems and for presenting relevant views on the architecture data in a number of products. The target audience includes decision makers, subsystem implementers, and end users. This documentation describes architectural principles, terms, and design intent of system elements, as well as integrated technologies. It contains detailed specification drawings and blueprints for construction, externally available under configuration control in the CI specification repository (see References, CI SPECS).

The system's core functional capabilities are structured into six subsystems. These subsystems provide extensive services that support the user application capabilities and the infrastructure services listed above. The application-supporting subsystems are Sensing and Acquisition, Data Management, Analysis and Synthesis, and Planning and Prosecution. The infrastructure subsystems are Data Management, the Common Execution Infrastructure and the Common Operating Infrastructure. Data Management provides both infrastructure and application services.

As with other components in this architecture, the subsystems are not all implemented at the same time; in particular, Analysis and Synthesis starts in Release 2, and Planning and Prosecution starts in Release 3. Where practical, we have highlighted items not targeted for Release 1 with the phrase '(not in Release 1)' or similar language. Details of implementation schedules are provided in the Overview for each subsystem, and for the whole system in the Transition to Operations.

The core subsystem services are deployed and enabled within multiple target environments through dedicated "implementation projects". Implementation projects exist for deployment into the OOI marine observatories, namely the satellite-connected Coastal and Global Scale Nodes (CGSN) and the cabled Regional Scale Nodes (RSN). Other implementation efforts integrate with external observatories, such as the Integrated Ocean Observing System (IOOS), Neptune Canada, and the WMO, and provide an interface to the national compute infrastructure (TeraGrid/XD). The implementation projects apply tailored user experience strategies for the various target deployment environments and user groups, in order to optimize the support for user processes and needs. Note: some design elements of these implementation projects are not covered in this architecture documentation.

Subsystems and their Services

These descriptions represent the overall architectural intent for each subsystem. For information on which features are available in a given release, please view the Overview page for the corresponding subsystem.

The Common Operating Infrastructure (COI) provides the integration substrate for all functional capabilities, as well as user and application interfaces, thereby realizing an integrated system of systems. COI's integration strategy is based on asynchronous reliable message exchange and service orientation. COI provides consistent identity management, as well as governance and policy enforcement, across the distributed system and its many domains of authority. COI also provides uniform life-cycle management and description for the various kinds of resources governed by the integrated observatory. Resources range from physical (instruments, devices) to computational (source code, services, instrument agents, executing processes) to information products of any kind, including science data products and other system-internal information. The COI services enable the management of the resources, their supported activities and their representation of state in a uniform way. All of the COI capabilities and services are provided as part of capability containers that will be implemented in different technologies. Technologies applied include Python and Java for the two core capability container implementations, AMQP messaging with a RabbitMQ message broker infrastructure for the Exchange, Google Protocol Buffers for the message encoding, and Internet2 security technologies including CIlogon and Shibboleth for the identity management and governance components.
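The COI principle that services never call each other directly, but only exchange messages through a named routing substrate, can be sketched in a few lines of plain Python. This is an illustrative in-memory toy, not the OOI wire protocol: the `Exchange` class, the `resource_registry` service, and the message fields (`op`, `type`) are all invented for the example.

```python
# Minimal in-memory sketch of COI-style message-based service invocation.
# The Exchange class, service name, and message fields are illustrative,
# not the actual OOI wire protocol.

class Exchange:
    """Routes request messages to services registered under a name."""
    def __init__(self):
        self.services = {}

    def register(self, name, handler):
        self.services[name] = handler

    def send(self, name, message):
        # Every interaction is a message; services never call each other directly.
        return self.services[name](message)

exchange = Exchange()

def resource_registry(message):
    # A toy service operation dispatched on the 'op' field of the message.
    if message["op"] == "find":
        return {"status": "OK", "resources": ["CTD-01", "CTD-02"]}
    return {"status": "ERROR"}

exchange.register("resource_registry", resource_registry)
reply = exchange.send("resource_registry", {"op": "find", "type": "InstrumentDevice"})
print(reply["status"])  # OK
```

In the real system the routing step would traverse a RabbitMQ broker over AMQP and the message body would be a Google Protocol Buffers encoding, but the decoupling shown here, a client that knows only a service name and a message schema, is the same.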

The Common Execution Infrastructure (CEI) provides the means to schedule, provision and manage any kind of computation in the observatory network at any location, independent of the characteristics of the executing environment. CEI provides a framework with general-purpose abstractions to manage taskable (i.e., controllable) resources, with adapters for specific execution environments and specific types of resources. CEI manages the provisioning and remote monitoring and control of taskable resources. Taskable resources range from Infrastructure-as-a-Service (IaaS) style deployable virtual machine images with core OOI CI services and capability containers to user-defined processes such as algorithms, workflows and numerical models. CEI supports the execution of OOI computation provisioned on demand in elastic computing environments such as the TeraGrid and its logical successor XD, the Open Science Grid (OSG), and commercial facilities such as Amazon's Elastic Compute Cloud (EC2) and Microsoft's Azure. CEI also applies the cloud computing paradigm to local deployments on OOI-operated hardware. The technologies applied include Nimbus Virtual Workspaces for contextualization in cloud environments, and underlying virtualization technologies including libvirt, KVM and VMware.
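The elastic, on-demand provisioning idea behind CEI's Elastic Processing Unit can be illustrated with a toy controller that scales a worker pool to match queue depth. The class name, thresholds, and per-worker capacity below are invented for illustration; a real controller would issue IaaS launch/terminate calls where this sketch only adjusts a counter.

```python
# Toy sketch of CEI-style elastic provisioning: an Elastic Processing Unit
# (EPU) controller scales worker instances to match demand. The class,
# its limits, and the capacity figure are illustrative placeholders.

class EpuController:
    def __init__(self, min_workers=1, max_workers=10, per_worker=5):
        self.min_workers = min_workers
        self.max_workers = max_workers
        self.per_worker = per_worker  # messages one worker can handle
        self.workers = min_workers

    def reconcile(self, queue_depth):
        # Workers needed = ceil(queue_depth / per_worker), clamped to limits.
        needed = max(self.min_workers,
                     min(self.max_workers, -(-queue_depth // self.per_worker)))
        while self.workers < needed:
            self.workers += 1  # stand-in for an IaaS "launch instance" call
        while self.workers > needed:
            self.workers -= 1  # stand-in for an IaaS "terminate instance" call
        return self.workers

epu = EpuController()
print(epu.reconcile(23))  # 5 workers for 23 queued messages
```

The reconcile-to-target loop, rather than imperative "add one worker" commands, is the design choice that lets such a controller recover from lost or failed instances on the next pass.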

The Data Management (DM) subsystem enables information distribution, persistence and access, making information accessible across the observatory network over extended periods of time. Information includes observational data and derived data products, as well as descriptive information (metadata) and system-internal information required for the operation of the integrated observatory. All information artifacts are managed together with their metadata. Data distribution is based on the COI Exchange messaging infrastructure and provides a topic-based data publish/subscribe metaphor. Technologies applied include iRODS (Integrated Rule-Oriented Data System) and Apache Cassandra for data preservation and replication, the Unidata Common Data Model as the basis for the internal OOI canonical data format, NetCDF as the data import/export format, and OPeNDAP/DAP/THREDDS as data externalization servers and catalogs. The Data Management subsystem also provides user-facing science and application services based on the underlying data distribution and preservation infrastructure. These include ingestion, transformation and presentation services. Ingestion targets bringing observational and external data and metadata into integrated observatory repositories in the canonical OOI data and metadata format. Transformation services support syntactic data format transformations as well as ontology-supported semantic mediation. A data access and "presentation" strategy supports multiple communities of use, providing flexible search and navigation of data products based on metadata and other search criteria. Standard models that apply include the VSTO (Virtual Solar-Terrestrial Observatory) ontology model and the ESG (Earth System Grid) Faceted Search for data access based on vocabularies from the Marine Metadata Interoperability (MMI) project.
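The topic-based publish/subscribe metaphor can be sketched using AMQP topic-matching rules, where `*` matches exactly one dot-separated word and `#` matches zero or more. The topic names below are invented examples, not OOI's actual topic scheme.

```python
# Sketch of DM-style topic publish/subscribe using AMQP topic-matching
# semantics: '*' matches exactly one dot-separated word, '#' matches zero
# or more. Topic names are invented examples, not OOI's topic scheme.

def topic_matches(pattern, topic):
    return _match(pattern.split("."), topic.split("."))

def _match(pat, top):
    if not pat:
        return not top
    if pat[0] == "#":
        # '#' absorbs zero or more words, then the rest must match.
        return any(_match(pat[1:], top[i:]) for i in range(len(top) + 1))
    if not top:
        return False
    if pat[0] in ("*", top[0]):
        return _match(pat[1:], top[1:])
    return False

# Each subscription is a pattern with an inbox of delivered samples.
subscriptions = {"ooi.data.ctd.*": [], "ooi.data.#": []}

def publish(topic, sample):
    for pattern, inbox in subscriptions.items():
        if topic_matches(pattern, topic):
            inbox.append(sample)

publish("ooi.data.ctd.temperature", 11.3)
publish("ooi.data.adcp.velocity", 0.42)
print(len(subscriptions["ooi.data.ctd.*"]))  # 1
print(len(subscriptions["ooi.data.#"]))      # 2
```

In production the matching is done by the RabbitMQ broker's topic exchange rather than in application code; the sketch only shows the routing semantics subscribers can rely on.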

The Sensing and Acquisition (SA) subsystem provides services for instrument and instrument platform access and data recovery, as well as for observatory management and data product generation. SA services are relevant for the integration of the CGSN (Coastal and Global Scale Nodes) and RSN (Regional Scale Nodes) observatories and platforms, and of PI-provided instruments deployed on the OOI infrastructure. SA also supports the acquisition of data from external data sources, such as the IOOS (Integrated Ocean Observing System) and Neptune Canada external observatories and their users. All instrument and platform resources are managed through agents (i.e., device drivers) that provide a consistent observatory interface for heterogeneous resources of specialized capability, in order to support data retrieval and device control. These agents/drivers directly interface with vendor-provided software and hardware. SA capabilities include sensor data acquisition, instrument control, and observatory operations management. Technologies applied include various existing instrument driver platforms, such as MBARI's SIAM (Software Infrastructure and Application for MBARI MOOS), the MIT MOOS architecture (Mission Oriented Operating Suite), and the Antelope platform. Device control is based on the IEEE 1451 and SensorML standards. Instrument identification and activation is supported in the form of the MBARI PUCK instrument interface.
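The "consistent observatory interface over heterogeneous devices" idea can be sketched as a common driver base class with instrument-specific subclasses. The class names, state labels, and command vocabulary below are hypothetical, not the OOI instrument agent API.

```python
# Sketch of the uniform agent/driver interface idea: heterogeneous
# instruments are wrapped in drivers exposing the same small command set.
# Class names, states, and commands are illustrative, not the OOI API.

class InstrumentDriver:
    """Common observatory-facing interface for a specific instrument."""
    def initialize(self):
        raise NotImplementedError

    def execute(self, command, *args):
        raise NotImplementedError

class FakeCtdDriver(InstrumentDriver):
    # Stands in for a vendor-specific serial/IP protocol implementation.
    def __init__(self):
        self.state = "UNINITIALIZED"

    def initialize(self):
        self.state = "IDLE"

    def execute(self, command, *args):
        if command == "acquire_sample":
            # A real driver would parse the instrument's native output here.
            return {"temperature": 11.3, "conductivity": 3.5, "pressure": 12.1}
        raise ValueError("unknown command: %s" % command)

driver = FakeCtdDriver()
driver.initialize()
sample = driver.execute("acquire_sample")
print(sorted(sample))  # ['conductivity', 'pressure', 'temperature']
```

Observatory services can then treat every instrument through the base-class contract, leaving vendor protocol details to the subclass, which is the role the SA agents/drivers play for real hardware.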

The Analysis and Synthesis (AS) subsystem (not in Release 1) supports a wide variety of data product analysis, manipulation, generation and presentation capabilities, including advanced visualizations. A flexible workspace component provides access to standard sets of analysis and visualization tools for interactive analysis and visualization applications directly driven by the user. AS provides a virtual collaboration platform leveraging social networking paradigms that will be applied to realize virtual observatories, classrooms and laboratories. AS provides the interfaces to integrate tools and applications provided by OOI's science and education users. Important AS capabilities include event and pattern detection, data assimilation, and numerical model integration and execution. Technologies applied include Kepler and Pegasus for distributed workflow execution and resource mapping. AS will provide a framework for the integration and execution of scientist-provided numerical ocean models such as the Regional Ocean Modeling System (ROMS) and the Harvard Ocean Prediction System (HOPS). A suite of integrated applications, including a standard Web portal interface and Matlab, Kepler, and WS-BPEL workflow editors, will support process and model specification, simulation, analysis, and visualization.

The Planning and Prosecution (PP) subsystem (not in Releases 1 and 2) provides situational awareness and advanced operational command and control at the level of the entire OOI Integrated Observatory. PP leverages and integrates the capabilities of all resources and services provided by the other subsystems, supporting closed-loop, optimized observe-analyze-act workflows. PP supports the definition of long-term and adaptive observational missions. It provides generalized resource planning and control activities that will be applied to plan, schedule, and prosecute multi-objective observational programs. PP provides an event-response framework and also embedded control of autonomous sensor systems, such as intermittently connected, low-bandwidth global mooring controllers, AUVs and, to the extent possible, gliders and profilers. Technologies applied include ASPEN and CASPER from NASA JPL (Jet Propulsion Laboratory) for resource planning and control, and MIT MOOS (Mission Oriented Operating Suite) for interfacing with autonomous instrument platforms such as gliders and AUVs. MOOS is an open-source middleware for connecting software components on an autonomous platform, enabling event capture, characterization and response. In addition, the behavior-based autonomous control software MOOS-IvP for autonomous vehicle motion control will be used and further developed. MOOS-IvP extends MOOS via interval programming (IvP), a mathematical model for representing and solving the multi-objective optimization problems that reconcile vehicle behaviors during deployments.
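The flavor of multi-objective behavior reconciliation can be shown with a toy in the IvP spirit: each behavior scores every candidate action, and the vehicle takes the action maximizing the weighted sum. The behaviors, weights, and heading grid below are invented; real IvP solves over piecewise-defined functions rather than a simple grid scan.

```python
# Toy sketch of multi-objective behavior reconciliation in the spirit of
# IvP: each behavior scores candidate headings, and the weighted-sum
# maximum is chosen. Behaviors, weights, and the grid are invented.

def waypoint_behavior(heading):
    # Prefers heading 90 degrees (toward the next waypoint).
    return max(0.0, 100.0 - abs(heading - 90))

def avoid_behavior(heading):
    # Penalizes headings near an obstacle bearing 80 degrees.
    return min(100.0, abs(heading - 80))

def reconcile(behaviors, candidates):
    # Pick the action with the highest weighted total score.
    return max(candidates,
               key=lambda h: sum(w * f(h) for f, w in behaviors))

behaviors = [(waypoint_behavior, 1.0), (avoid_behavior, 2.0)]
best = reconcile(behaviors, range(0, 360, 10))
print(best)
```

Because avoidance is weighted more heavily than waypoint attraction here, the chosen heading swings well away from the obstacle bearing rather than splitting the difference, which is the kind of trade-off reconciliation is meant to surface.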

Implementation Projects

The Instrument and Platform Agent Architecture (IPAA) project, associated with the Sensing and Acquisition subsystem, provides the core architecture and capabilities to interface with individual sensors, sensor platforms, platform controllers, and observatory infrastructure such as power and communication bandwidth controllers, as well as with observatory infrastructure management systems. The production of multiple sensor-specific drivers is based on this architecture.

The External Observatory Integration (EOI) project realizes the integration of the OOI Integrated Observatory with external observatories such as IOOS, Neptune Canada and the WMO. This integration will bring external data products into the OOI Integrated Observatory as part of the joint set of OOI resources. It will also support the delivery of data products, to be assimilated by users of these external observatories, in community-specific formats, such as the formats used by IOOS Regional Association observatories and numerical models. (Neptune Canada and WMO support is not part of Release 1.)

The Terrestrial CyberPoP and Network project is responsible for the design, deployment and operation of the network and network management infrastructure, as well as for the physical plants (Terrestrial CyberPoPs) that will provide core computation and storage.

The Marine CyberPoP and Network element (in the project requirements module) captures the requirements for integrating with the Marine Observatories. Implementation of these requirements is performed by the IPAA project above, so we do not refer to this element as an implementation project.

The User Experience (UX) team provides a uniform experience to the integrated observatory end users through web-based and mobile user interfaces. It will provide designs, strategies and user workflows targeted to specific user groups. It leverages the COI presentation framework as the platform for UI development.

Fundamental Strategies

The integration strategy is based on two core principles: asynchronous reliable messaging and service orientation. A high-performance message Exchange provides the communication conduit, with dynamic routing and interception capabilities, for all interacting elements of the system. The message interface is defined independently of any implementation technology. The messaging infrastructure provides scalability, reliability and failure tolerance. Service orientation is the key to managing and maintaining numerous and complex applications within a heterogeneous, distributed system of systems. All functional capabilities and resources are represented exclusively through services with precisely defined service interfaces. Services can be accessed independently of location throughout the integrated observatory network through secure messaging. Services are defined independently of implementation technologies.
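One way to make a service interface independent of implementation technology is to declare it as data and validate messages against it before dispatch. The interface contents below (service name, operations, required fields) are invented for illustration; they are not the OOI interface definition format.

```python
# Sketch of a technology-independent service interface: the contract is
# declared as data, and messages are checked against it before dispatch.
# The service name, operations, and fields are invented examples.

INTERFACE = {
    "service": "data_store",
    "operations": {
        "put": {"required": ["key", "value"]},
        "get": {"required": ["key"]},
    },
}

def validate(interface, message):
    # Reject messages naming unknown operations or missing required fields.
    op = interface["operations"].get(message.get("op"))
    if op is None:
        return False, "unknown operation"
    missing = [f for f in op["required"] if f not in message]
    if missing:
        return False, "missing fields: %s" % ", ".join(missing)
    return True, "ok"

ok, reason = validate(INTERFACE, {"op": "put", "key": "ds1"})
print(ok, reason)  # False missing fields: value
```

Because the contract is plain data, the same declaration can drive validation in a Python container, a Java container, or any other implementation, which is the point of defining service interfaces independently of technology.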

The deployment strategy uses the concept of the COI capability container, which provides all essential infrastructure capabilities and selected deployment-specific application support. Capability containers can be deployed wherever CI-integrated computation is required across the observatory network, and capability containers will adapt themselves to available resources and their environment. This includes platform controllers on remote, intermittently connected global moorings, compute units placed in the payload bays of AUVs and gliders, and the full range of terrestrial CI deployments (CyberPoPs). (Note that the deployment of OOI assets into the marine environment is scheduled during the CI's Release 1.)

The OOI multi-facility strategy supports the participation of multiple independent organizations and communities in the OOI integrated observatory. Each facility represents its own domain of authority with its own rules and policy decisions, bringing its users and resources to the integrated observatory. Consistent governance is applied throughout the system of systems, determined by electronically represented agreements and contracts between the participating facilities. This model neither requires nor enforces a central authority or central policy rules. Instead, the participation of the facilities and their principals is fully subject to the agreements between the facilities, with policy enforced consistently by the integration infrastructure. This integrated observatory network is open and can be joined by user facilities. (The multi-facility strategy will be minimally applied in Release 1.)

The network strategy supports the scalable, low-latency, high-bandwidth and secure distribution of science data in real time to end users and affiliated organizations across the country and worldwide. It applies global load balancing to route traffic and data access to the CyberPoP deployments most proximate to, or most suitable for, satisfying requests at any given time. The network strategy is essential in providing a robust, geographically redundant, highly available Integrated Observatory Network presence.
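
A minimal sketch of this routing decision: pick the nearest healthy CyberPoP for a request, failing over when a site is down. The distance metric below is a plain Euclidean stand-in for the latency or geographic metrics a real global load balancer would use; all names and coordinates are illustrative.

```python
def pick_cyberpop(pops, client, healthy):
    """Route a request to the most proximate healthy CyberPoP.
    pops: {name: (x, y)}; client: (x, y); healthy: {name: bool}."""
    candidates = [(name, xy) for name, xy in pops.items() if healthy.get(name)]
    if not candidates:
        raise RuntimeError("no healthy CyberPoP available")
    # Squared Euclidean distance as a toy proximity metric.
    return min(candidates,
               key=lambda item: (item[1][0] - client[0]) ** 2
                              + (item[1][1] - client[1]) ** 2)[0]

pops = {"west": (0.0, 0.0), "east": (10.0, 0.0)}
# With all PoPs healthy, a nearby client is routed to "west" ...
print(pick_cyberpop(pops, client=(1.0, 1.0), healthy={"west": True, "east": True}))
# ... and if "west" fails, traffic fails over to "east" (geographic redundancy).
print(pick_cyberpop(pops, client=(1.0, 1.0), healthy={"west": False, "east": True}))
```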

Project Organization and Transition to Operations

The capabilities of the integrated observatory are designed modularly and support incremental development and transition to operations. The OOI CI implementing project will deliver five incremental releases that increasingly support user applications and processes, beginning with automated data preservation and distribution and ending at advanced concepts of interactive ocean science, including instrument and observatory interactivity that exploits knowledge gained through observations and analyses.

The five releases will support user applications as follows:

Release 1 provides a fully capable automated end-to-end data distribution and preservation infrastructure, supporting the needs of data consumers such as numerical modelers and the immediate needs of instrument providers.
Release 2 adds end-to-end control of how data are collected, supporting more advanced processes of instrument providers with managed instrument control.
Release 3 adds end-to-end control of how data are processed, supporting more advanced processes of instrument providers and data product consumers, as well as on-demand measurements supporting event-driven opportunistic observations.
Release 4 adds control of integrated ocean models driven by the data collection process, supporting data product developers and the numerical modeling community.
Release 5 adds control of data, processes, and models to drive the collection process, supporting observatory interactivity and transformative ocean observatory science for all users.

After its final release 5, the integrated observatory will support real-time modeling and data assimilation, adaptive sensing and platform control, rapid response and event capture, and closed loop, integrated sensing, modeling, and distributed network control.

This schedule is presented in more detail in the Transition to Operations. Release 1 scope is defined in Release 1 Scoping, and Release 2 scope in Release 2 Scoping.

CIAD AV Architecture Project

Architecture Project

Overview

This architecture and design specification, the Integrated Observatory Network Architecture Specification (IOCI-AD, OOI document 2130-00003), describes all the user-supporting applications and the enabling software infrastructure of the Cyberinfrastructure (CI) part of the Ocean Observatories Initiative (OOI) program. It replaces the previously released documents: the Cyberinfrastructure conceptual architecture [CI-CARCH], the CI preliminary design [CI-PAD] and the final design (FDR) [IOA-AD, IOI-AD].

This document was developed by the OOI CI Architecture Team, a team within the CI Implementing Organization that is led by the OOI CI Senior System Architect. Document approval follows the OOI Configuration Control process. Changes affecting the CI system scope only are approved by the CI Level Change Control Board. Changes that affect the entire OOI program are approved by an OOI level Change Control Board.

Purpose

This architecture specification:

Defines the OOI Integrated Observatory Network (ION), developed and operated by the CI Implementing Organization as a part of the NSF Ocean Observatories Initiative (OOI).
Defines the relationships of the OOI Integrated Observatory Network to non-OOI observatory programs, such as IOOS and NEPTUNE Canada.
Defines the OOI Integrated Observatory Network as an OOI-wide integration system for the observatory elements (i.e., regional, coastal, and global elements) of the OOI, and a basis for the development of OOI's educational and public engagement component (EPE IO).
Establishes a common terminology and architectural principles for the OOI CI organization and all its integrated product development teams.
Identifies the interfaces of the CI with both Marine IOs (CGSN IO and RSN IO) and the EPE IO.
Provides approved design baselines for subsequent construction in product development teams, in the context of a spiral development process. Design baselines are subject to objectives and architectural reviews.
Provides implementation specifications for the CI subsystems that comply with stakeholder requirements, as expressed by the OOI requirements in DOORS and by OOI-level and CI-level Concepts of Operation, by designing and documenting a set of information system capabilities and concrete system components and interfaces.

This architecture specification serves the goal of providing a consistent, structured, up-to-date representation of the OOI CI design and architectural principles, providing specialized views for diverse groups of stakeholders. Such views include operational views, required by users and decision makers, and deployment and process views, required by CI implementers and subsystem architecture and design teams. This architecture also serves the goal of entraining potential participants in, and advocates for, the OOI Integrated Observatory Network and its connected observatories.

The Operational Views capture the viewpoints of the potential users of application and infrastructure services, while the System Views capture the technology integrator and subsystem implementer viewpoints.

Scope

The IOCI-AD describes the architecture and design of the application and infrastructure services of the OOI Integrated Observatory Network in its full extent, over a planned 5-year construction period and a planned 25-year operations period. The scope of the incremental releases is defined in the Transition to Operations.

Applications address user concerns. They include scientific, educational, and operational applications that leverage the CI infrastructure capabilities. The CI infrastructure provides a secure and reliable communication, computation and resource management framework. The IOCI-AD does not cover the other OOI components such as the regional, coastal and global observatories, their infrastructure components, or their education and public engagement components.

The operational activities enabled by the CI span the entire OOI system. They make the OOI appear to its users and the environment as one integrated system. The CI provides a "face" to the OOI system: the "OOI Integrated Observatory". As the central integration infrastructure element, the CI defines interfaces to OOI components outside of the CI, as well as to other external systems. The scope of each of these components (the CI infrastructure, its interfaces, and the user-facing applications) is delineated in this document.

Guidelines

The final CI architecture and design has been developed according to the guidelines in the DoD Architecture Framework (DoDAF).

The IOCI-AD complies with the following guidelines:

Assess the use of off-the-shelf components from commercial sources or other science-oriented distributed network systems, and include them where feasible.
Provide a clear distinction between the responsibilities of the CI, representing the "over-arching" Integrated Observatory, and those of the individual observatories and education functions.
Support the definition and management of interfaces between the CI, other OOI entities, and entities external to OOI.
Incorporate science, education and technology community knowledge and concerns into the design specifications, as prominently brought forward in the OOI CI requirements and design workshops (see Related Documents below).

Funding for the development and operation of the CI follows NSF regulations. All government standards applicable to the CI (for example, metadata standards) therefore apply.

Certain described data streams, data products and processes may fall under the jurisdiction of national security organizations. No architectural changes to the Cyberinfrastructure have been required to meet the requirements of such entities.

Assumptions and Constraints

The OOI CI architecture document is a living document that depends on input from various stakeholders and in particular the OOI implementing organizations. Individuals and organizations in the following stakeholder communities guide the specific requirements and features to be implemented in the OOI CI.

OOI community:

Ocean scientists directly involved with observatory experiments.
Ocean scientists not directly involved with observatory experiments.
Ocean scientists who develop data products based on observatory data.
Ocean technologists and engineers who develop instruments and platforms for use on ocean observatories.
Education application developers who develop outreach products for the OOI.
Educators who use outreach products developed by OOI.
Policy makers who use the Data Products of the OOI.

External community:

Environmental scientists directly involved with other observatory efforts (e.g., NEON, WATERS, LTER).
Environmental scientists not directly involved with other observatory efforts.
Informatics technologists who develop software for analysis and integration of environmental science data.
The public at large.
The operational oceanography community, and especially the Integrated and Sustained Ocean Observing System (IOOS).
United States national security authorities.

The following constraints must be considered for the entire architecture and design:

The OOI CI is in service to scientific investigation, discovery and innovation, and its development must be science-driven.
System engineering and integration activities for the OOI CI should comply with the procedures and standards supported by the International Council on Systems Engineering (INCOSE) (http://www.incose.org) and EIA 632 "Processes for Engineering a System", to the greatest extent practicable.
The national security concerns of the United States government must be accommodated.
Legal requirements must be accommodated (for example, Section 508 of the US Rehabilitation Act, and the International Traffic in Arms Regulations).
The hardware/software interface between the wet and dry elements of the global, coastal, and regional cabled observatories and the OOI CI must be managed throughout the design cycle.
The Operational Concepts (OpCon) should comply with existing stakeholder policies and procedures.
OOI CI system engineering and integration activities should not disrupt existing architectural frameworks identified in each stakeholder area of interest.

In addition, the architecture and its documentation are constrained by the availability of information for subsystems that may be influenced by commercial confidentiality or other factors.

Related Documents

The following documents provided input for this document (see the List of References for pointers to these documents).

Reference Document Name

CI-CARCH Concept Architecture

CI-PAD Preliminary Architecture

CI-IRD Integrated Requirements Document

CI-RWS1 First CI Requirements Workshop Report

CI-RWS2 Second CI Requirements Workshop Report

CI-ROOP CI Requirements Workshop Report: Ocean Observing Programs

CI-RDPG CI Requirements Workshop Report: Data Product Generation

CI-RIOM CI Requirements Workshop Report: Integrated Observatory Management

CI-REPE CI Requirements Workshop Report: Education and Public Engagement

CI-RUA CI Requirements Workshop Report: User Applications

IOA-AD Integrated Observatory Applications Architecture Document

IOI-AD Integrated Observatory Infrastructure Architecture Document

OPCON Operational Concepts

Architecture Concepts and Artifacts

Figure 1 shows the central entities of system design, implementation and integration in an abstract notation. The entities (rectangles) are shown with their dependencies (lines and arrows) in the notation of class diagrams. See the UML Notation Reference for a brief tutorial and introduction to the kind of notation that will be used in the remainder of this document. One of the benefits of the UML notation is that it shows the core entities of a topic (here, the design, implementation and integration artifacts) and their relationships in a concise, graphical way, much better than text alone could. Additional text provides further explanations and detail.

Figure 1. 2650-00016 Architecture and Artifacts, Domain Model (OV-7)

The designed and to-be-constructed System is the OOI Integrated Observatory. More specifically, it stands for the software system developed, assembled and integrated by the CI system development team (SDT) as a distributed system of systems. The system provides a number of applications to its Users. On the back end, it interfaces with a number of External Systems, for instance marine observatory infrastructure management systems and other observatories, such as IOOS.

A foundation for the development and integration of the system is the Integration Strategy. This strategy explains how heterogeneous technologies and tools are assembled into an integrated system of systems that remains maintainable over an extended period of time.

The deployment concerns during system implementation are addressed by the Deployment Strategy. This is the strategy for distributing the components of the system across the various locations on the network and considering the heterogeneous hardware and software environments at these locations.

The Operational Architecture describes how the system behaves in its operational environment and what the functions and needs of its constituent elements are. The architecture description is conformant to the DoDAF standard.

On the logical level, when decomposing the system into operational (i.e., functional) concerns, the system is structured into Services Networks. These are groups of services fulfilling a specific purpose. Each of the six services networks of the CI is specified as an Operational Node. Operational nodes are part of the operational architecture. Further operational nodes exist representing external systems, interfaces and entities, as well as finer-grained services on the next level of decomposition of services networks.

Operational Activities define how operational nodes interact and what information they exchange. This can be seen as an orchestration of the services that are part of the services networks.

Operational Data Models, mostly in the form of domain models, describe and characterize operational activities. In particular, they specify the information elements exchanged as part of operational activities.

The software implementation and integration of the CI system occurs through the construction of six Subsystems. Subsystems comprise Services and User & Application Interfaces. Subsystems implement services that can depend on other services (from other subsystems). By convention, one subsystem implements all services of the services network of the subsystem's name.

Services are representations of capability available to the system via the network by a defined name. Each service has a Service Specification that in particular defines the interaction pattern and message formats necessary to access the service. Services can depend on other services. Dependency must not be cyclic and does not have to be a tree hierarchy. This creates a directed, acyclic service dependency graph. The arrangement of subsystem service construction and integration needs to reflect these dependencies.
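
Because the dependency graph is directed and acyclic, a topological sort yields a valid construction and integration order: every service is built only after the services it depends on. The sketch below illustrates this with a hypothetical dependency graph (the service names are invented, not the CI's actual services).

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical service dependency graph: each service maps to the set
# of services it depends on.
dependencies = {
    "resource_registry": set(),
    "exchange": set(),
    "data_acquisition": {"exchange", "resource_registry"},
    "data_processing": {"data_acquisition", "resource_registry"},
}

# A topological sort of the acyclic graph gives an integration order
# in which every dependency precedes its dependents.
order = list(TopologicalSorter(dependencies).static_order())
print(order.index("exchange") < order.index("data_acquisition"))  # True

# A cyclic dependency violates the constraint and is detected:
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError:
    print("cycle rejected")
```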

Subsystem Product Teams construct the subsystems as part of the CI construction. The construction process includes the specification and development of services and user/application interfaces. All subsystem product teams have a Subsystem Lead, and several Designers and Developers. Specific designers and developers can be technology representatives, which are the resident experts for Technology Implementations that should be integrated as a service or interfaced to.

The Operational Architecture, comprising Services Networks, Services, Operational Nodes, Activities and Data Models, is specified in this section of the architecture documentation.

Integration and Deployment Strategies, Subsystems and Technology Mappings to Subsystems and Services are specified in the deployment architecture section.

Technologies are cataloged and briefly described in the standards and technology section.

Structure of Design Elements

There are elements on three levels:

System
Subsystem
Subsystem service components

Each element is structured in full detail as follows (not all content has to be present):

Element overview page:

OV pages
SV pages
TV pages

Child elements are listed in the overview page of the parent, with links to all detail pages.

CIAD AV OOI Context

Context

This section provides context for the subject of this architecture and design specification document, namely the OOI Integrated Observatory Network, and in particular the OOI Cyberinfrastructure system.

Mission

In order to provide the U.S. ocean-sciences research community with access to the basic infrastructure required to make sustained, long-term and adaptive measurements in the oceans, the National Science Foundation (NSF) Ocean Sciences Division has initiated the Ocean Observatories Initiative (OOI). The OOI is the outgrowth of many years of national and international scientific planning efforts. The OOI builds upon recent technological advances and experience with existing observatories, and is underpinned by several successful pilot and testbed projects. As these efforts mature, the research-focused observatories enabled by the OOI will be networked, becoming an integral part of the proposed Integrated and Sustained Ocean Observing System (IOOS; http://ioos.noaa.gov). IOOS is an operationally-focused national system, and in turn will be the enabling U.S. contribution to the international Global Ocean Observing System (GOOS; http://www.iocgoos.org) and the Global Earth Observing System of Systems (GEOSS; www.earthobservations.org).

Goals and Vision

The OOI (see Figure 1) comprises three types of interconnected observatories spanning global, regional and coastal scales. The global component addresses planetary-scale problems via a network of moored buoys linked to shore via satellite. A regional cabled observatory will 'wire' a single region in the Northeast Pacific Ocean with a high-speed optical and power grid. The coastal component of the OOI will expand existing coastal observing assets, providing extended opportunities to characterize the effects of high frequency forcing on the coastal environment. The OOI Cyberinfrastructure (CI) constitutes the integrating element that links and binds the three types of marine observatories and associated sensors into a coherent system-of-systems. An Education component will provide an infrastructure for education and public engagement applications. Indeed, it is most appropriate to view the OOI as a whole, the Integrated Observatory, that will allow scientists and citizens to view particular phenomena irrespective of the observing elements (e.g., coastal, global, regional, ships, satellites, IOOS) to which the observations belong.

The objective of the OOI CI is to provide a comprehensive federated system of Observatories, Laboratories, Classrooms, and Facilities that realizes the OOI Mission. The infrastructure provided to research scientists through the OOI will include the cables, buoys, deployment platforms, moorings and junction boxes required for power and two-way data communication with a wide variety of sensors at the sea surface, in the water column, and at or beneath the seafloor. The initiative also includes components such as unified project management, data dissemination and archiving, and education and outreach activities essential to the long-term success of ocean observatory science. A fully operational research observatory system would be expected to meet the following goals:

Continuous observations at time scales of seconds to decades
Spatial measurements from millimeters to kilometers
Sustained operation during storms and other severe conditions
Real-time or near-real-time data as appropriate
Two-way transmission of data and remote instrument control
Power delivery to sensors between the sea surface and the seafloor
Standard plug-and-play sensor interface protocol
Autonomous underwater vehicle (AUV) dock for data download/battery recharge
Access to deployment and maintenance vehicles that satisfy the needs of specific observatories
Facilities for instrument maintenance and calibration
A management system that makes data publicly available
An effective education and outreach program

Figure 1. OOI Components (AV-1)

The vision of the OOI CI is to provide the OOI user base, beginning with the science community, access to a system that enables simple and direct use of OOI resources to accomplish their scientific objectives. This vision includes direct access to instrument data, control, and operational activities described above, and the opportunity to seamlessly collaborate with other scientists, institutions, projects, and disciplines.

The core capabilities and the principal objectives of ocean observatories are collecting real-time data, analyzing data and modeling the ocean on multiple scales, and enabling adaptive experimentation within the ocean. A traditional data-centric CI, in which a central data management system ingests data and serves them to users on a query basis, is not sufficient to accomplish the range of tasks ocean scientists will engage in when the OOI is implemented. Instead, a highly distributed set of capabilities is required that facilitates:

End-to-end data preservation and access
End-to-end, human-to-machine and machine-to-machine control of how data are collected and analyzed
Direct, closed-loop interaction of models with the data acquisition process
Virtual collaborations created on demand to drive data-model coupling and share ocean observatory resources (e.g., instruments, networks, computing, storage and workflows)
End-to-end preservation of the ocean observatory process and its outcomes
Automation of the planning and prosecution of observational programs

Figure 2 shows the closed-loop scientific investigation activities enabled by the OOI integrated observatory, based on Cyberinfrastructure capabilities. Observations from various sources are assimilated and fed into models of the ocean such as ROMS, the Regional Ocean Modeling System. Their output feeds into analyses that are subsequently exploited for refining future observations and sensor configurations, for instance by providing specific taskings of observing programs using gliders.

Figure 2. 2650-00017 Closed Loop Scientific Investigation Activities supported by the CI (OV-1)

In addition to these features, the CI must provide the background messaging, governance and service frameworks that facilitate interaction in a shared environment, similar to the role of the operating system on a computer.

The OOI system construction will occur during the confluence of several significant technology innovations in web and distributed processing: semantic webs, social networks, Grid computing, sensor networks, service-oriented architectures (SOA), event-driven architectures, policy-based security and machine virtualization. Each offers different capabilities, and each may increase the scope and reliability of the OOI system while lowering its complexity and cost. The challenge to building the CI at this time of convergence is finding an appropriate integration architecture and roadmap to deliver a functioning system as early as possible, while maintaining the ability to refine and extend operating characteristics as technology evolves.

Science Drivers for the OOI Integrated Observatory

The community over the last decade has identified high priority science needs, and the OOI has been designed to quantitatively address these questions. This is especially critical as the oceans are changing in our lifetimes, and developing a quantitative understanding of relevant processes is crucial to understanding the possible trajectories of these changes and potential impacts on human society. The OOI will provide scientists a sustained presence in extreme ocean environments, enabling fundamental discoveries. Given the need to develop a quantitative picture of the ocean, scientists require spatial time series spanning many scales across a range of marine biomes. Most importantly, scientists require the ability to measure the interactions at the boundaries between the ocean, atmosphere, sea floor, and coasts. Fully sampling the ocean is not possible, so the OOI has focused on deploying instruments capable of resolving a range of scales on the boundaries of the oceanic gyres, which represent regions that play a disproportionately large role influencing the cycling of energy, elements, and biota on Earth. The OOI will accomplish this by deploying a distributed but linked infrastructure in regions that enable the collection of data that will allow fundamental processes to be characterized across a range of marine systems. The spatially distributed full OOI network will be required to quantitatively test our understanding of the high priority science questions. The infrastructure will allow scientists to quantify the interactions between the sea floor and the overlying water column using regional scale cabled networks, the interaction between the atmosphere and the ocean with novel robust networks capable of withstanding extreme weather, and nested robotic grids to resolve the interaction between the deep sea and coastal arrays.

Given this, there is a need to develop a robust Cyberinfrastructure to allow all of the distributed assets to be coordinated in an integrated manner. These assets will be used to address many scientific questions reflecting the scientific diversity of the earth system science community.

The science motivating the OOI network is based on research community input. The numerous community reports emphasized the need for simultaneous, interdisciplinary measurements to investigate a spectrum of phenomena, from episodic, short-lived events (tectonic, volcanic, biological, severe storms), to more subtle, longer-term changes in ocean systems (circulation patterns, climate change, ecosystem trends). The introduction of high power and bandwidth will allow the transition from ship-based data collection to the management of interactive, adaptive sampling in response to remote recognition of an "event" taking place. Sophisticated CI tools will enable individual and communities of researchers to tackle their specific research questions. The following are integrative examples of some of the broad science questions that the OOI network will be able to address (see [SCIPROSP]).

What is the ocean's role in the global cycle?
How important are extremes of surface forcing in the exchange of momentum, heat, water and gases between the ocean and atmosphere?
How do severe storms and other episodic mixing processes affect the physical, chemical, and biological water column processes?
How does plate-scale deformation mediate fluid flow, chemical and heat fluxes, and microbial productivity?
What are the forces acting on plates and plate boundaries that give rise to local and regional deformation, and what is the relation between the localization of deformation and the physical structure of the coupled asthenosphere-lithosphere system?
How do tectonic, oceanographic and biologic processes modulate the flux of methane into and out of the submarine gas hydrate "capacitor," and are there dynamic feedbacks between the gas hydrate methane reservoir and other benthic, oceanic and atmospheric processes?
How do cyclical climate signals at the ENSO, NAO and PDO timescales structure the water column, and what are the corresponding impacts on the chemistry and biology in the ocean?
What are the dynamics of hypoxia on continental shelves?

Researching answers to these science questions involves the combination of several science and engineering activities that need to be supported by the different components of the OOI program, including sensors, 24/7 marine observatory infrastructure, and cyberinfrastructure components. Specific user requirements from the community were elicited successfully in the seven user requirements workshops held in 2007 and 2008; see [CI-RWS1, CI-RWS2, CI-ROOP, CI-RDPG, CI-RIOM, CI-REPE, CI-RUA].

Education and Public Engagement (EPE)

The education goals and the key science questions that frame the OOI infrastructure are tightly coupled. The science questions provide the interdisciplinary context for effective marine education that, in turn, develops the intellectual capital needed to build research capacity and an ocean literate and engaged public. The goal of elevating ocean literacy recognizes the vital relationship between society and the ocean, and will require sustained educational efforts targeted at multiple audience levels. The OOI cyber-infrastructure will provide the technological platform to make unique educational contributions to both "free-choice" audiences and post-secondary learners.

The ability to engage and serve a range of education providers and communities and to encourage partnerships between researchers and educators will be a critical contribution of OOI infrastructure. These efforts will help ensure that national and international policy and science priorities are simultaneously addressed at a variety of scales (global to local) and tailored to account for differences in geographic regions, cultural diversity, digital capabilities, as well as different ocean uses, interactions, and phenomena within these areas. The OOI will participate in a nationally "coordinated effort to develop and promote a comprehensive education message about the ocean and its role in the Earth System, and to enable the use of ocean-observing data for management and educational purposes" (NSTC-JSOST, 2007).

The nine EPE drivers listed below define the overall focus of the OOI education effort and the purpose of creating the EPE Implementing Organization (IO) as part of the OOI. These drivers (i.e., high-level requirements) were developed by the OOI EPE Planning Group using expert input and discussion at the EPE Drivers and Requirements Workshop. OOI Education and Public Engagement Drivers include:

The OOI will enable communication, education, and public engagement efforts that tightly interweave the key OOI science themes with the essential principles of ocean literacy.
The OOI will support online post-secondary training programs with a focus on increasing participation and diversity in ocean science and technical careers. It will also support "free choice" learning in a variety of both physical and virtual settings with a focus on increasing public engagement with ocean science and technology.
The OOI will enable multiple forms of access to and engagement with the development path and construction history of the OOI enterprise in order to support innovative engineering and technology education.
The OOI will have the capacity to engage and respond to audiences of diverse cultural or economic backgrounds, or who may traditionally have been underserved in ocean education.
The OOI will enable multiple forms of interaction and collaboration that assist in the formation of ocean policy at both national and international levels.
The OOI will support enhanced field experiences for students engaged in OOI activities including construction, operation, maintenance, and research.
The OOI will enable multiple forms of interaction and collaboration that facilitate networked community access among scientists, engineers, and educators.
The OOI will enable open access to EPE data products, visualizations, and other educational materials developed as part of the OOI effort for a wide range of users.
The OOI will be developed in collaboration with, and support of, the national community of marine education providers, in order to leverage the unique contributions of the OOI and to more effectively reach a broad audience.

Organizational Context

Figure 3 shows the OOI organizational structure. The central element is the OOI Program Office, run by the Consortium for Ocean Leadership (COL), which provides overall project management and oversight functions for the implementing organizations and sponsors an extensive advisory committee structure.


Figure 3. 2650-00014 OOI and CI Organization Chart (OV-4)

The Program Office is headed by a Program Director and reports both to the JOI Board of Governors and to the National Science Foundation. The three implementing organizations for the regional scale node (RSN), coastal-global scale node (CSGN), and cyberinfrastructure (CI) elements report to the project office through the OOI Director of Engineering. A fourth implementing organization for education and public engagement (EPE) is planned.

The OOI Advisory Committees advise the Program Office on policies and procedures for observatory operations, usage, and data management, approve annual OOI Science and Operations Plans, and carry out program planning and development functions.

The Cyberinfrastructure implementing organization defines several teams:

The architecture team (ADT) is responsible for maintaining a consistent architecture of the CI system, designing the CI system and its interfaces to the other OOI components, reflecting the stakeholder concerns and providing the specifications for construction.
The system development effort comprises six subsystem integrated product teams (IPT) and an integration test and validation (ITV) team, and works together with the ADT to ensure compliance with the design and standards.
The ITV team carries out system integration and system-level testing activities.
The operations and maintenance (O&M) team is responsible for the actual operation of the constructed system from the time the first release transitions to operation.
The quality management (QM) team carries out ongoing quality assurance and control activities during the design/build cycle.

CIAD AV OOI Use Cases

Science and Education Application Scenarios
Overarching Scenario: Scientific Investigation
Use Scenario 1: Large-scale ocean observatory with access to external data sources
Use Scenario 2: Using numerical models to coordinate multi-resource observations
Use Scenario 3: Interactive Control of a Remote Laboratory
Use Scenario 4: Autonomous Control of Mobile Instrument Platforms
Variation: Deploy new Instruments
Scenario Extension: Education Application Development
Science Workflow Use Cases
Configure OOI system to accept data products
System performs publication when software runs
Publishing data as a stream
Data Stream Archival
Publishing as publicly available data
Publishing notifications
Release Specific Use Cases
Material Covered

Science and Education Application Scenarios

This section provides a representative set of user application scenarios, based on direct user input received at the requirements and design workshops, and on the system requirements. Some of the scenarios are grounded in the Concepts of Operation (see References CI-COP1, CI-COP2, CI-COP3).

In particular, the following scenarios are brought forward to illustrate the application design specified in this document:

Numerical Modeling [CI-RWS2] Scenario I (4.5): "Test the shelf productivity hypothesis" (Numerical model analysis)
Ocean Observing Programs [CI-ROOP] Scenario 2 (4.4.2): Objective driven observations with gliders
Data Product Generation [CI-RDPG] Scenario 2 (4.6): Instrument Lifecycle
Data Product Generation [CI-RDPG] Scenario 4 (4.9): Virtual Observatory
Integrated Observatory Management [CI-RIOM] Scenario 1 (4.4.1): "A day in the life of a Test Pier Operator"
Education and Public Engagement [CI-REPE] Scenario 1 (4.2.1): "What is the role of the ocean in the CO2 problem?"

The following four use scenarios are examples identified as mandatory requirements for the CI. However, they are not meant to be exhaustive or all-inclusive.

Overarching Scenario: Scientific Investigation

The central use case scenario supports the scientific investigation activities of an environmental scientist or researcher. The OOI Integrated Observatory, specifically its CI component, provides the capabilities and user interfaces to perform this core application. In the most general case, the OOI Integrated Observatory is a federation of organizations.

As a prerequisite to using the CI capabilities, any CI science user needs an electronic identity established with their home organization, such as a university account and login. The general public is not required to have any credential, but has restricted access to dedicated educational spaces. Using existing trust relationships between the OOI and user organizations or a specific user registration process, the OOI can verify the user's identity when the user accesses a central OOI user interface (there can be many interfaces tailored to specific audiences). With valid credentials, the user can then access a project-specific workspace to interact with observatory resources from data products to instrumentation. This workspace, previously defined by a project administrator, represents a virtual observatory and provides the users with tools to access and manipulate the OOI resources of interest for a specific project setting.
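The identity-based access rule described above can be sketched as a trivial check, assuming a whitelist of trusted home organizations. The names here (TRUSTED_ORGS, access_level) are purely illustrative and are not part of any OOI interface.

```python
# Illustrative sketch: map a home-organization identity to an access level.
# TRUSTED_ORGS stands in for the OOI's trust relationships with user
# organizations; real federated identity would use certificates or tokens.

TRUSTED_ORGS = {"example-university.edu", "example-lab.org"}

def access_level(identity: str) -> str:
    """Return the access granted to a given electronic identity."""
    org = identity.split("@")[-1] if "@" in identity else ""
    if org in TRUSTED_ORGS:
        return "project-workspace"   # verified user: project resources
    return "public-education"        # general public: educational spaces only

print(access_level("alice@example-university.edu"))  # project-workspace
```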

The following scenarios derive from this basic understanding of the OOI CI mode of operation. There are several assumptions as prerequisites to any of these scenarios:

Observational data is available in OOI archives with appropriate detail for the region and variables of interest.
Large-scale model outputs are available as data products on OOI (either archived or automatically recomputable) for the region and time frame of interest.
Ocean model algorithms are available for the environmental processes of interest, taking (historic) observational data as well as larger-scale resolution model output as input. Such algorithms are configurable and tunable according to the scientists' needs.
Data processing and transformation tools are available through the OOI to the scientist.

Analysis and presentation tools are available to analyze model output and data series.

All scenarios imply the following basic steps, with variants being discussed to illustrate CI capabilities and highlight different architectural constraints detailed in this document:

1. Within a project workspace on OOI, research and import all data, model output, model algorithms and configurations;
2. See Virtual Observatory Scenario (RDPG, Scenario 4);
3. Make the necessary processing, transformation and configuration steps;
4. Run the model and compare with expected observational data;
5. Analyze and present results with respect to the hypothesis.

Use Scenario 1: Large-scale ocean observatory with access to external data sources

Figure 1. Large-scale ocean observatory scenario

Figure 1 shows a large coastal observatory comprised of long- and short-range coastal radar (CODAR) nodes and a mix of buoys and glider tracks covering most of offshore southern California. This constitutes a regional framework for coastal science processes and events composed of semi-autonomous resource nexuses (e.g., discrete buoys). At the node level, data gathering and resource allocation (e.g., power or bandwidth) is comparatively simple and can be implemented in local hardware or autonomous software. However, coordinating large numbers of nodes into a coherent scientific whole that is larger than the sum of the individual parts is a significant challenge.

Regarding just data access, we assume that the data provider is not part of the OOI. Thus, we enhance the basic scenario with the variation that observational data is not available within OOI archives, but it is accessible through the Internet (external database, published on a website, FTP, etc.). In this case, this external data source needs integration with the CI infrastructure. The following are additional steps required for the integration:

1. Configure external data source;
2. Define or develop data transformation and processing required;
3. Use OOI internal and external data within OOI workspace as input for an existing numerical model on the CI.
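The three integration steps above might be sketched as follows. ExternalSource, Workspace, and the placeholder fetch are invented for illustration and do not reflect actual CI interfaces; a real integration would fetch over FTP/HTTP and register the result as an OOI data product.

```python
# Hypothetical sketch of integrating an external data source into a workspace.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ExternalSource:
    url: str    # step 1: endpoint outside OOI, e.g. an FTP site or web service
    fmt: str    # declared source format, e.g. "csv"

@dataclass
class Workspace:
    datasets: dict = field(default_factory=dict)

    def ingest(self, name: str, source: ExternalSource,
               transform: Callable[[str], list]) -> None:
        raw = f"fetched:{source.url}"          # placeholder for a real fetch
        self.datasets[name] = transform(raw)   # step 2: transform/process

ws = Workspace()
ws.ingest("sst_external", ExternalSource("ftp://example.org/sst", "csv"),
          transform=lambda raw: [raw.upper()])
# step 3: ws.datasets now feeds an existing numerical model alongside OOI data
```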

We can also extrapolate this scenario to other usage patterns. For example, linking the functionality of CODARs up and down the coast without human intervention is a major science requirement. Management of diverse types of data and their associated metadata is another. CI is needed to provide a consistent and automatic control of these and other aspects of the overall observatory. Hence, in a very real way, the concept of a regional framework is important at the operational as well as the scientific level. One of the major operations and maintenance challenges for a distributed ocean observatory is tracking and coordinating the state of observatory resources. Thus, through CI the science use case is also the operations use case.

Use Scenario 2: Using numerical models to coordinate multi-resource observations

Traditional data assimilation models operate in open loop form, incorporating retrospective or real-time data into the model run without altering the measurement protocols. Dynamic data-driven application systems (DDDAS; Darema, 2005) close the loop by allowing modification of sampling by the assimilation model. The assimilation model may change sample rates for selected instruments in response to an event. It could also steer instruments on a mobile platform (such as a ship) to locations where property gradients are largest in the simulation. Complexity builds up when we incorporate the addition or removal of fixed or mobile instruments from the domain of interest in response to model output.

Figure 2. Observatory comprised of ships, aircraft and autonomous vehicles linked to assimilation modeling capabilities on shore

Based on Figure 2, in this scenario, we assume that the ocean model algorithm is not available on the OOI; hence, we need an extension of an existing model or a new model. The following steps are necessary:

1. Develop and add new/enhanced model to the CI;
2. Run new model with existing observational data and nested model output for initial and boundary conditions.

Accomplishing a DDDAS scenario with fixed instruments further increases the complexity by requiring a wide range of resource allocation, instrument control, and instrument communication services to coordinate the functionality of the assimilation model, the instrument suite, and the ocean observatory infrastructure. If some of the instruments are mobile or the sensor mix changes with time, then additional services for discovery and localization or tracking may be needed. Crosscutting requirements for time synchronization and security services also exist. Hence, a CI with such capabilities is of paramount importance to support this scenario.

Use Scenario 3: Interactive Control of a Remote Laboratory

Figure 3. Site on regional cable observatory containing power-intensive interactive instruments

Consider a more elaborate use case, which encompasses many heavily instrumented sites distributed around a regional cabled observatory (e.g., ten or more multidisciplinary moorings extending through the water column). This adds additional complexity through shared use of instruments and resources by multiple users and the difficulty of remote coordination of resources over large distances.

Figure 3 depicts a single science site where a diverse suite of sensors and actuators are deployed over a small area (for example, on the scale of a hydrothermal vent field) to accomplish multidisciplinary science. The sensor suite may include physical, chemical, and biological types, and the science mission may require frequent changes in their location or mix. Heavy use of stereo HDTV and high resolution acoustic imaging are anticipated, with concomitant demands on bandwidth and power resources.

Acquisition and storage of physical samples for later retrieval and onshore analysis may be needed. Accurate repeat positioning of actuators for sampling may also be required, imposing closed loop control constraints on the hardware and software infrastructure. This use case involves stringent demands on the shared use of instruments and other resources by many users. Quality of service, latency, and jitter requirements implied by real-time stereo HDTV and closed loop control of sampling actuators are strict.

We consider the following variation of the basic scenario: the required sensors and observational infrastructure exist within OOI, but they need to be reconfigured or interactively controlled to provide the desired resolution and frequency. This variation translates into a need for reconfiguration, tasking and interactive control of existing instrumentation. Thus, we have to perform the following steps:

1. Simulation of ocean observing and modeling steps
2. Develop a plan for reconfiguring or tasking instruments
3. Await instrument reconfiguration approval
4. Interactively control instruments during availability
5. Use additional collected observational data to run model and do analysis
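Steps 2 through 4 above form a small approval workflow, which can be sketched as a state machine. The states and transitions here are assumptions for illustration, not an OOI specification.

```python
# Sketch of the plan/approve/control sequence as a tiny state machine.
# An instrument reconfiguration plan must be approved before the scientist
# gains interactive control, mirroring steps 2-4 above.

TRANSITIONS = {
    ("planned", "submit"): "awaiting_approval",     # step 2 -> step 3
    ("awaiting_approval", "approve"): "controllable",  # step 3 -> step 4
    ("controllable", "session_end"): "released",
}

def advance(state: str, event: str) -> str:
    """Apply an event; invalid events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = "planned"
for event in ("submit", "approve"):
    state = advance(state, event)
# state is now "controllable": the scientist may interactively task the instrument
```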

From the CI perspective, a diverse set of services for resource allocation, time synchronization, instrument monitoring and control, bi-directional instrument communication, cross-calibration, coordination of sensing regimes (e.g., optical or acoustic), localization, tracking, and security are required. Closed loop control may not be feasible in the presence of high seafloor-to-shore latency without CI assistance, such as that used in remote surgery applications.

Use Scenario 4: Autonomous Control of Mobile Instrument Platforms

Looking a decade into the future, the sensor suite at ocean observatory sites of interest may consist of a mix of large numbers of low capability, low cost fixed sensors (e.g., for the measurement of temperature over an area) and small numbers of high capability, high cost sensors (e.g., in situ spectrometers) in mobile platforms.

Figure 4. A coordinated set of autonomous underwater vehicles

This combination simultaneously accomplishes a continuous areal-scale overview and directed, high-resolution local-scale measurements in an economical fashion. The enabling technology that makes this approach feasible is a network of high bandwidth optical modems that provide a wireless extension of the observatory infrastructure, both making it possible to accommodate large numbers of sensors without physically attaching them to the observatory and allowing real-time access to fixed sensors and mobile platforms.

The mobile platforms (illustrated in Figure 4) may operate continuously to accomplish pre-programmed sampling missions or under human control for exploratory sampling. Arrays of sensors that fuse into coherent sensor networks are a rapidly evolving application in terrestrial monitoring. This can be accomplished by either linking all sensors to an optical modem network or through pervasive, direct peer-to-peer interconnection. Since the characteristics of the terrestrial wireless and seafloor optical environments are similar, it is reasonable to expect both methods to be widely utilized on the seafloor in the future.

This use case aggregates all of the requirements of the previous three scenarios, involving both resource-intensive applications and an ever-changing mix of mobile sensors that are complex in their own right, and whose operation must be coordinated in real-time. Additional services to provide for discovery of topology and location-aware routing in a time-varying network may be necessary. Sensor networks may also require group management and collaborative information processing applications. A cross-cutting requirement is one of simplicity; for example, low cost sensors with wireless links may not have the capability to process complex time services.

Variation: Deploy new Instruments

Consider the case that a new instrument or sensor is deployed, with the requirement that its observational data should be accessible throughout the integrated observatory.

Additional Steps:

1. Deploy instrument on OOI infrastructure
2. Sub-scenario: Develop instrument drivers and data processing
3. Sub-scenario: Test new instrument before/after on OOI
4. Develop data processing steps
5. Run models and analyses
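Steps 1 and 2 above, deploying an instrument and its driver so that its data become visible observatory-wide, might be sketched as a registration that yields a resource identifier. The registry, the driver shape, and the identifier scheme are all hypothetical.

```python
# Hypothetical sketch: register a new instrument and its driver so the
# integrated observatory can discover it and decode its telemetry.
from dataclasses import dataclass

@dataclass
class InstrumentDriver:
    instrument_id: str
    parse: callable            # raw telemetry -> engineering units

REGISTRY: dict[str, InstrumentDriver] = {}

def deploy(driver: InstrumentDriver) -> str:
    """Steps 1-2: deploy the instrument and make its driver discoverable."""
    REGISTRY[driver.instrument_id] = driver
    return f"ooi:instrument:{driver.instrument_id}"   # invented id scheme

rid = deploy(InstrumentDriver("ctd-042", parse=lambda raw: float(raw) / 100))
print(rid)  # resource id other observatory services can now resolve
```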

Scenario Extension: Education Application Development

Consider the following extension of Use Scenario 4:

The environmental process (e.g., carbon flux) should be presented to an educational audience, for instance in the form of an interactive museum display.
An education application needs to be developed and operationally hardened on OOI hardware and software.

Additional Steps

1. Develop educational application
2. Use analysis and visualization widgets
3. Idealize and simplify model output
4. Provide interactive access to historic observational time series
5. Install operational procedures for automatic data computation at regular intervals on OOI

Science Workflow Use Cases

These use cases have been adopted from the longer OOI Science User Concept of Operations, initially developed as part of the OOI Cyberinfrastructure Conceptual Design.

Dr. Adrian Chu of the University of Nebraska Oceanography Center has been working on an analysis tool for some time. The tool integrates several sources of oceanographic data into a small model, and produces a prediction when certain conditions are met. Data from multiple OOI observatories are blended together in the model. Dr. Chu will work with several international researchers from Canada (Dr. Nicole Jones), France (Mlle. Jeanne Fleuris) and Russia (Dr. Dmitri Istantov). He has previously used the OOI cyberinfrastructure to set up his collaborative work group and construct a virtual workspace for them to use. The group has then interactively modified and updated Dr. Chu's model and added new features to it. The model has been tested by subscribing to data streams from the OOI observatories. This resulted in further model changes, and Dr. Chu and his team are now ready for an operational run. As he starts running the software, although he doesn't fully appreciate it, the OOI infrastructure is performing a lot of steps to make sure the products show up where they are expected.

Configure OOI system to accept data products

Just as Dr. Chu received pointers to subscribe to specific resources, his colleagues Dr. Jones and Dr. Istantov received resource descriptors when configuring the code to publish its observational events and summary data. To obtain this publication resource, the collaborators had to enter information into a publication metadata form that describes the source and nature of their publications. These metadata descriptions help users learn more about data products, assist administrators in troubleshooting any problems, and allow the CI to create a processing history for each of the data products. They are also critical to supporting search functions for the products created by OOI. Because the forms use dropdown menus with controlled vocabularies to fill out most of the fields, and auto-population of subfields based on user selections, all of the members of the team fill out the metadata form consistently and relatively quickly.
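The two form features mentioned above, controlled vocabularies and auto-population of subfields, can be sketched as follows. The vocabulary terms and subfield mappings are invented examples, not OOI-defined vocabularies.

```python
# Sketch of a publication metadata form: a controlled vocabulary constrains
# the source-type field, and a selection auto-populates dependent subfields.

VOCAB = {"source_type": {"model", "instrument", "process"}}
AUTO_FILL = {"model": {"processing_level": "derived"}}   # selection -> subfields

def fill_form(source_type: str, title: str) -> dict:
    if source_type not in VOCAB["source_type"]:
        raise ValueError(f"not in controlled vocabulary: {source_type}")
    form = {"source_type": source_type, "title": title}
    form.update(AUTO_FILL.get(source_type, {}))          # auto-population
    return form

form = fill_form("model", "POIM summary output")
```

Rejecting out-of-vocabulary terms at entry time is what keeps the metadata, and therefore the search functions built on it, consistent across the team.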

When the software runs, it uses the publication resources to announce to OOI that it is the source of this particular observational event, data stream, or data set. OOI can then connect the people or systems who have sought out and requested these observational events or data.

System performs publication when software runs

As modified by Dr. Jones, the Predictive Ocean Integration Model (POIM) publishes a prediction whenever it detects an observational event. Although she marked this output as an 'observational event', it also has the characteristics of a data stream: it arrives repeatedly over time (not necessarily at a consistent interval), the same type of information is in every record, and it is associated with a single data source, in this case a software process. The additional identification of this record as an observational event serves several purposes: it lets people find the item by searching within a list of publishable observational events, it helps describe the nature of the item (specifically, that arrival of the publication constitutes a message of significance), and it enables general-purpose event-oriented tools (event counters and summarizers, news bulletin generators) to be developed by OOI or other organizations.

Now that the software is executing, observational events will be published on an occasional basis. Each publication is logged by the OOI infrastructure, so that it can be reviewed later in the context of other activities. As described earlier, each publication can be obtained by OOI members in one of several forms: as a subscription, as an email or other notification, upon request ("show me the last observational event of this type"), or in archived form. People who have not registered with OOI can see data products (e.g., the archived logs of observational events), but not the more complicated services.

Publishing data as a stream

Just as the observational events are published (and accessed) as a resource, so too can the data summaries from the model. In fact, this same publication technique can publish any OOI data stream, including those generated by OOI instruments. The key characteristics necessary to publish data as a stream are that the data be described in advance, that the data creator (the software or instrument which generates them) use the OOI APIs to submit the data to OOI, and that the resource identifier for the data stream be associated with every data record that is output as part of that data stream. If developers writing software that generates data want to take full advantage of OOI's capabilities to integrate, display, and process data - and most developers on OOI are either strongly urged, or required, to do so - they must describe their data in a consistent format, and output it in a way that matches that description. If a data source like a GPS (or modeling software) actually generates multiple types of data records (for example, one data record, one summary record, and one error record), then the developer must create a separate description for each record, get a separate resource identifier for each record, and publish each record type along with the appropriate resource identification. While this seems like a lot of work up front, it usually is fairly straightforward and saves a lot of time in postprocessing the data streams.
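The rule above, one description and one resource identifier per record type, with the identifier attached to every emitted record, can be sketched as follows. The function names and the "ooi:stream:" identifier scheme are illustrative and do not represent the actual OOI APIs.

```python
# Sketch of stream publication: describe each record type in advance,
# obtain a resource identifier for it, and stamp that identifier onto
# every record published on that stream.

COUNTER = 0

def register_stream(description: dict) -> str:
    """Describe a record type up front and obtain its resource identifier."""
    global COUNTER
    COUNTER += 1
    return f"ooi:stream:{COUNTER}"    # invented identifier scheme

def publish(resource_id: str, record: dict) -> dict:
    """The identifier travels with every record of the stream."""
    return {"resource_id": resource_id, **record}

# A GPS-like source with two record types gets two separate registrations:
data_id = register_stream({"type": "gps_fix", "fields": ["lat", "lon"]})
err_id = register_stream({"type": "gps_error", "fields": ["code"]})
msg = publish(data_id, {"lat": 32.7, "lon": -117.2})
```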

In this case, Dr. Chu's colleagues have used these features well, and Mlle. Fleuris in particular quickly understood the process of describing her outputs from the model. She created a metadata description for the model summaries she produced, defining the meaning of each item in the summary and the data source used to present it. Unfortunately, Dr. Chu's model output, which is the data source for her summary, is itself unpublished, since he is keeping it private for now. She plans to suggest to Dr. Chu that the model itself be published as an OOI resource, so that users can trace the sources for these summaries and predictions back through the entire chain of operations in the OOI workflow system. For now, she has referenced the unpublished data by description, as well as pointing back to the observational data streams that Dr. Chu's model uses.

Data Stream Archival

Mlle. Fleuris set up the publication of the model to occur once every hundred times the model runs, as well as every time the model generates an observational event prediction. This allows the team to review the operation of the system over time and contrast its operation in predictive and non-predictive cases. Since the model runs hundreds or even thousands of times a day, this technique should limit the output to only a few publications each day. This output volume is not very large, and the OOI infrastructure will respond accordingly by archiving them for an extended period. The holder of any reference to an OOI data stream can ask to view the data's historical records, as Dr. Chu did for the other data he wanted to review. If the reference holder has permission to view the data, they can be obtained from the OOI operational data archive. At this point, the events and model summaries can be viewed on-line or from the archive by the collaborators on the team. When the verification period (a period set by OOI policy, during which only proprietary access is allowed, so that the data can be evaluated and tested) expires, the data will be available to the public. At first Dr. Chu found this idea to be disturbing, but he has gotten used to it since he wants to use the full capabilities of the OOI.
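The archival policy Mlle. Fleuris configures can be restated as a simple predicate: publish on every event prediction, and otherwise on every hundredth run. This is an illustrative restatement, not OOI configuration syntax.

```python
# Sketch of the publication policy: every event prediction publishes, and
# so does every hundredth model run, keeping archive volume small while
# preserving a baseline of non-predictive cases for comparison.

def should_publish(run_number: int, predicted_event: bool,
                   every_n: int = 100) -> bool:
    return predicted_event or run_number % every_n == 0

# 300 uneventful runs yield only three archived publications:
published = [n for n in range(1, 301) if should_publish(n, False)]
print(published)  # [100, 200, 300]
```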

Publishing as publicly available data

In fact, Dr. Chu expects he will make these data products - the events and the summaries, at least - publicly accessible well before the validation period expires. This takes a minor effort on his part, and he knows a lot of colleagues will want to take advantage of the resulting predictions for their own studies. As an enlightened act of self-promotion, he intends to make the results available with a request to acknowledge him on any papers that ensue. While he knows he may only be acknowledged on half of the papers that use the work, his name will still become widely known as the originator of the information.

From his previous experience in publishing data from an instrument, Dr. Chu knows there are several steps required to make his results publicly available, including certification and verification. First, he must certify that the data source meets the standards described in the OOI service agreements. For software, this is little more than what has already been specified in the metadata, along with running the software on an OOI test bed system. Obviously, standards for instruments to be deployed at ocean depths are somewhat more demanding. The observatory on which the data source is deployed will confirm that the interface specifications have been met. This is done automatically for software, and with some manual confirmation for hardware interfaces. A further step required before releasing software is the verification step. This consists of evaluating the results from the data source to confirm that it is operating as expected. As Dr. Chu has already accomplished this step to his own satisfaction, reviewing the OOI products from his system should be simple. He is prepared to go to the trouble of quickly releasing his data to a wider audience and establishing its verified status on OOI.
For core instruments on OOI observatories, more detailed criteria must be met, including verification that the metadata describing the data source are correct and QA and QC procedures are in place.

There are several advantages to publishing the data - in this case, events, notification of an event, and model summaries - to a wider audience. First, it makes the data immediately available to the public through the data products that OOI produces. It also makes the services easier to reference and use within OOI - while this could also be achieved by changing the access permissions on his data sets, making the data public automatically changes those access permissions. Further, it makes it clear that Dr. Chu has reviewed the data sources and believes they are functional. Finally, making his data public advertises his products to a wider audience, since the OOI data product registries will only replicate complete metadata descriptions for a data product if that product is in fact publicly accessible. Once the OOI data product registries announce their availability, Dr. Chu's results will become visible in four other data publication registries (three of which are internationally well known), and he will get extra credit and attention for his work.

Dr. Chu has been following some interesting developments related to the publication of metadata in external registries. Some scientists have been quoting as part of their "publication rate" the number of entries they have in data product registries, and some search tools have begun indexing the registries as a way to provide more contextual information about data sources, data owners, and data systems. As a result, the "free registration" that OOI provides will likely have benefits for Dr. Chu's work.

Publishing notifications

Dr. Istantov wants to email a notification to each member of the team whenever the software detects an event, and he has used an almost identical mechanism to that of the others. Some of the metadata for his "data stream" are different, but much of the cyberinfrastructure used for publishing the notifications is the same as for events and other data streams. In fact, although he didn't realize it, Dr. Istantov's metadata form was made easier to fill out because Dr. Jones and Mlle. Fleuris had filled out almost identical ones earlier, which were used to pre-populate some of the fields on Dr. Istantov's form. While he was testing his code, he sent the notifications to himself, but after completing his code changes, he updated the distribution list. Because the notification message is published via email, Dr. Istantov can select the destinations from a number of email address lists, including a list of aliases, of actual users, and of virtual laboratories of which he is a member. He configures the email destination for this published message to be Dr. Chu's newly created virtual laboratory, and awaits further word.
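The notification path described above reuses the same publish mechanism, with the destination resolved from an alias, user, or virtual-laboratory list. The sketch below is hypothetical: the list name and addresses are invented, and a real system would hand the resulting messages to a mail service rather than build them in memory.

```python
# Sketch of email-notification publication: resolve a distribution list
# (here, a virtual laboratory) and produce one message per recipient.

ADDRESS_LISTS = {
    "chu-virtual-lab": ["chu@example.edu", "jones@example.ca",
                        "fleuris@example.fr", "istantov@example.ru"],
}

def notify(event: dict, destination: str) -> list[dict]:
    """Fan an event notification out to a list, or to a single address."""
    recipients = ADDRESS_LISTS.get(destination, [destination])
    return [{"to": r, "subject": f"OOI event: {event['type']}"}
            for r in recipients]

outbox = notify({"type": "upwelling-prediction"}, "chu-virtual-lab")
```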

Release Specific Use Cases

Each Release of the OOI Integrated Observatory Network has a different focus area and a selected focus user group, as described in Transition to Operations.

The Release specific use cases are covered in the Product Description for each Release:

Release 1 Product Description: Data Distribution Network
Release 2 Product Description: Instrument Measurement Network

Material Covered

After reading this page, you should be able to answer the following questions:

Where did the OOI CI use cases come from?
What is a dynamic data-driven application system, and is the OOI CI an example?
What will any CI science user need in order to use the CI capabilities, and where can he or she get it?

CIAD AV OOI Operations Overview

OOI Integrated Observatory

The OOI Integrated Observatory is a system of systems operated under multiple domains of authority. The core elements are the two marine observatories, RSN (Regional Scale Nodes) and CGO (Coastal-Global Observatories), integrated by the OOI Integrated Observatory Network under the governance of NSF and the Consortium for Ocean Leadership (COL). The OOI Integrated Observatory Network is developed and operated by the CI (Cyberinfrastructure) organization.

Network Overview

Figure 1 shows schematically the physical elements and network links of the OOI Integrated Observatory as well as the operational responsibilities for the various system elements.

The shapes represent physical elements (i.e., nodes) in the network. Nodes are deployed at physical sites with computation and storage resources, and can range from embedded computing platforms on moored buoys and on mobile assets to server clusters in traditional compute infrastructures and data centers. Lines and clouds represent communication networks that connect these nodes over different physical media, from satellite telemetry to fiber-optics cable.

Figure 1. 2650-00004 OOI Integrated Observatory Operational Domains (OV-4)

The color indicates a domain of authority (i.e., a facility) for sensors, network infrastructure and nodes. The two marine observatories RSN and CGO (blue color) each operate their own facility independently. Both marine observatory facilities are connected to the Integrated Observatory facility (green color), which is operated by the CI IO. The OOI Integrated Observatory presents all its resources uniformly to all users. End users who interact with the Integrated Observatory, such as scientists and educators, represent their own facilities (yellow color).

Each facility is a domain of authority. As such, it typically comes with its own administrators, operators and policy makers. The presence of the Integrated Observatory facility enables resources from different domains of authority to be shared within the Integrated Observatory. The core OOI instruments provided by CGO and RSN that are placed on the marine observatory infrastructure are examples. Such instruments can be accessed uniformly and consistently throughout the Integrated Observatory, subject to OOI and marine observatory policy.

External users can join the OOI Integrated Observatory and participate by consuming instrument data streams and by requesting exclusive access to the instruments for experiments. Beyond that, users (PIs) can also provide their own instruments to be placed on OOI marine infrastructure, leveraging all the physical, network and software infrastructure that the OOI provides. Consistent governance across the system-of-systems guarantees that the policies of the observatory and the resource provider's domain of authority are enforced.

RSN and CGO are the two marine observatory facilities that are part of the initial deployment of the OOI Integrated Observatory. In the future provision will be needed for existing and new sites to join the OOI observatory network, such that they can remain independently operated but with the ability to collaborate with the other OOI and non-OOI observatories.

All users connect to the OOI Integrated Observatory to access its resources. The Integrated Observatory provides collaboration applications in the form of virtual OOI laboratories and classrooms that allow groups of people to work together within a standard on-line framework. Many existing on-line environments now provide similar services; OOI will apply the same concepts to a scientific and educational collaborative environment.

OOI core sensors, marine infrastructure, and Cyberinfrastructure form the core OOI Integrated Observatory. OOI users can contribute their own resources into the OOI network, thereby extending the resources available to the public. The OOI provides a virtual observatory concept to all of its users. Users can select and operate their resources virtually, from any location in the network, subject to policy and existing agreements for resource utilization.

For instance, a researcher may want to access an instrument that belongs to a marine observatory. Even when the researcher accesses their own instrument (indicated in yellow in Figure 2), they still have to obey the set of policies regarding power usage, allowed research activities, and timing of activities at the observatory node. Because each facility has its own set of policies, the permissible actions are constrained by all policies. The Cyberinfrastructure provides for the management and governance of resource access across authority domains. This requires contract agreements, access policies, identity federation, and resource usage tracking. The architecture supports the deployment, operation, and distributed management of resources across a Cyberinfrastructure operated by independent stakeholders. It facilitates seamless communication across the levels of the hierarchy, without the Researcher or Instrument being aware that they are communicating with entities outside their own authority domain.
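The constraint described above, that a request is permissible only if it satisfies the policies of every domain of authority it crosses, can be sketched as follows. This is an illustrative sketch only; the names (Policy, is_permitted) and the example policy attributes are hypothetical and do not represent the actual ION governance API.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    """One facility's policy: a hypothetical, simplified stand-in."""
    facility: str
    allowed_actions: set = field(default_factory=set)
    max_power_watts: float = float("inf")

def is_permitted(action: str, power_watts: float, policies: list) -> bool:
    """A request must satisfy the policies of ALL domains of authority it crosses."""
    return all(
        action in p.allowed_actions and power_watts <= p.max_power_watts
        for p in policies
    )

# A researcher's request crosses both the OOI-level and the marine observatory policy:
chain = [
    Policy("OOI", {"sample", "configure"}, max_power_watts=50),
    Policy("RSN", {"sample"}, max_power_watts=20),
]
print(is_permitted("sample", 10, chain))     # True: allowed by every policy
print(is_permitted("configure", 10, chain))  # False: the RSN policy forbids it
```

The intersection semantics (all policies must agree) is what makes the constraint composable across independently operated facilities.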

Integrated Observatory Users

The OOI Integrated Observatory users can be characterized by the activities they perform. Users fall into one or several of the following categories:

Table 1. Integrated Observatory Users

User Category: User Roles

Scientist: Principal Investigator, Project Scientist, Support Scientist, Graduate Student

Engineer: Project Engineer, Instrument Provider, Instrument Technician, Application Developer, Mission Safety and Security

Data Professionals: Data Analyst, Data Modeler, Archivist/Curator

Operations Professionals: Program Manager, Observatory Operator, Logistic Coordinator, Mission Planner, Scheduler

Educational Professionals: Developer of Educational Materials, Teacher, Digital Librarian

General Public: Outreach material providers, Science Public, Everyone else

Details of the User Persona Model have been developed in several white papers and project activities, and are further discussed in the OOI User Persona Overview.

Relationships to Major Observing Systems

Figure 2 shows the relationship of the proposed Integrated and Sustained Ocean Observing System (IOOS) management and reporting structure and of other terrestrial environmental observatories to that of OOI. The solid lines with single arrows depict a reporting relationship, while a dotted line with double arrows shows a coordination relationship. IOOS is an operationally-oriented national system managed by NOAA ( http://ioos.noaa.gov) that receives federal sponsorship and oversight from many agencies, with NOAA taking the lead role. IOOS is the key US contribution to the international Global Ocean Observing System (GOOS; http://www.iocgoos.org) and the Global Earth Observing System of Systems (GEOSS; http://www.earthobservations.org).

Figure 2. 2650-00013 Reporting and coordination structures of IOOS and OOI (OV-4)

IOOS has an advisory committee structure whose most pertinent component is the Data Management and Communication (DMAC) activity, and a number of regional associations that integrate ocean observing activities in their geographic areas. Related terrestrial observatory efforts include the National Ecological Observatory Network (NEON; http://www.neoninc.org), Linked Environments for Atmospheric Discovery (LEAD; http://lead.ou.edu), the Long Term Ecological Research Network (LTER; www.lternet.edu/) and the WATer Environmental Research Systems Network (WATERS; http://www.watersnet.org).

OOI is the research-oriented component of IOOS. It is managed separately, as indicated in Figure 2. OOI has an advisory committee structure reporting to the OOI Program Office, and it has formulated four Implementing Organizations for the regional scale observatory, coastal-global scale observatory, cyberinfrastructure, and education and public outreach components of the Ocean Observatories Initiative. OOI is managed by the Consortium for Ocean Leadership (COL), receiving federal sponsorship and oversight from the National Science Foundation (NSF). Further details on the internal OOI relationships are given in the Integration and Deployment Overview.

OOI and its advisory committees have informal relationships with IOOS and various IOOS committees, and notably between the OOI CI IO and the DMAC Steering Committee and expert teams.

NEON, LEAD, and WATERS all have Cyberinfrastructure definition components at differing states of maturity, with LEAD being in the implementation phase and NEON and WATERS being at the concept level. As with IOOS, the primary focus is on data resources and their archiving and dissemination.

A major element of system coordination lies in the inter-operation of data resources, so that publicly-available data products from any project will be accessible to all communities. It is anticipated that, at a minimum, registration, discovery, and access services to OOI data resources will be supported according to IOOS standards and best practices. Data resource coordination with NEON, LEAD, and WATERS needs to be established.

OOI Membership Model

Figure 3 shows the external provisioning, consumption, and policy relationships for OOI. The central element is the OOI Integrated Observatory (OOI) node that includes a human management component. It has two types of relationships with external entities.

Figure 3. 2650-00011 OOI External Operational Relationships (OV-4)

In the first instance (at the top of the diagram), OOI publishes data and data products and provides them to external users. OOI also consumes externally produced data and data products via a mediation layer that transforms them into OOI-compatible formats. These data and data products are in turn distributed to Laboratories, Observatories, and other collaborations that are internal to OOI, and those collaborations also provide their own data and data products to the OOI system (see Figure 3).

In the second instance, OOI has analogous relationships with two classes of actors: Identified Members and Resource Providers. OOI defines membership agreements to which members must agree and after which they may access OOI resources and services. OOI also defines service level agreements to which external resource providers must agree, as well as service templates to which the provided external resources must adhere. Upon accepting the agreements and following the templates, the resource provider's resource(s) may be integrated into the OOI system, and thereby become available to OOI members.
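The admission rule above can be sketched minimally as a template check: a provided external resource is integrated only if it supplies everything the service template requires and the provider has accepted the agreement. The field names below are hypothetical illustrations, not the actual OOI agreement machinery.

```python
# Hypothetical service template: the fields an external resource must supply.
TEMPLATE = {"required": {"resource_type", "interface_version", "sla_accepted"}}

def admissible(resource: dict) -> bool:
    """True only if the resource matches the template AND the SLA was accepted."""
    return TEMPLATE["required"] <= resource.keys() and resource["sla_accepted"]

print(admissible({"resource_type": "storage",
                  "interface_version": "1.0",
                  "sla_accepted": True}))       # True: template satisfied
print(admissible({"resource_type": "storage"})) # False: missing required fields
```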

Agreements are intended to assure effective operation of the OOI system for all of its users and providers. They are analogous to negotiated interface protocols that allow the effective exchange of data among different components of the system.

Possibilities for Users and Organizations to Interact with the OOI Integrated Observatory

The OOI Integrated Observatory offers a variety of different mechanisms for users to integrate with the CI:

Accessing the web user interface
Receiving notifications and data subscriptions via email or messaging
Downloading data from DAP data servers
Executing user-provided processes (algorithms, software) on OOI CyberPoPs
Executing user-provided processes on external cloud providers through the ION
Accessing ION services and application interfaces via messaging over the Internet from external applications
Operating one's own ION instance that connects via the messaging Exchange over the public Internet
Operating one's own ION instance that connects via the messaging Exchange and taps directly into the high-bandwidth distribution network
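The notification and data-subscription mechanism named above can be sketched in a few lines. This is a deliberately simplified in-process stand-in; the real ION Exchange is a federation of messaging brokers, and the class and topic names here are hypothetical.

```python
from collections import defaultdict

class Exchange:
    """Toy publish/subscribe exchange: topic -> list of subscriber callbacks."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, message):
        for cb in self._subs[topic]:
            cb(message)

ex = Exchange()
received = []
ex.subscribe("ctd.parsed", received.append)  # e.g. an email/notification handler
ex.publish("ctd.parsed", {"temp_c": 8.31, "depth_m": 120.0})
print(received)  # [{'temp_c': 8.31, 'depth_m': 120.0}]
```

Each of the listed mechanisms (email notification, DAP download, external application access) is, at this level of abstraction, a different kind of subscriber callback attached to the same distribution fabric.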

CIAD AV OOI User Persona Overview

As described in the Operational Overview, there are multiple categories of Users of the OOI, and many roles within each category. Early analyses (see References CI-RUA, CI-PERS) developed initial user roles, and these can be decomposed further to reflect the specific activities performed across the OOI.

The main categories of user persona identified through analyses are:

Scientists
Engineers
Data Professionals
Operations Professionals
Educational Professionals
General Public

Further decomposition of these categories (below) yields significant information about their roles, relationships, and shared activities.

While this analysis has considerable detail, it is subject to further verification through User Experience interviews and requirements analysis.

Analysis by Category

Each of the above categories is analyzed in more detail in the sections below. The categories are:

Dev: Responsible for OOI development activities
Ops: Responsible for OOI operations activities
Use: Active user of OOI resources

A large X indicates a strong role, and a small x a less significant role.

Roles that are in italics were not included in the original analyses of User Persona (those cited in the references).

Scientists

User Role | Dev | Ops | Use | Description
Principal Investigator (PI) | X | X | X | Most senior scientist on the project, responsible for grant and project decisions
Project Scientist | X | X | X | A senior scientist, with a high degree of operational responsibility for science execution
Support Scientist | x | x | X | Junior scientist, with responsibility for day-to-day science activities
Graduate Student | | | X | Scientifically trained student, typically participating as part of a science team

Engineers

User Role | Dev | Ops | Use | Description
Project Engineer | X | X | X | Architect of project solutions with senior oversight and coordination responsibilities
Instrument Provider | X | X | x | Entity that provides an instrument for use on the observatory
-> Instrument Vendor | x | x | | Commercial entity that sells an instrument to OOI for installation on the observatory
Instrument Technician | X | X | X | Acts on behalf of OOI to perform most instrument life cycle activities
Application Developer | X | X | X | Creates software to implement OOI functionality
Mission Safety and Security | x | X | x | Ensures operations are not compromised by untoward events

Data Professionals

User Role | Dev | Ops | Use | Description
Data Analyst | x | x | X | Looks at scientific (or other) data, possibly transforming it, to gain knowledge
Data Modeler | X | x | | Formulates frameworks for representing data in the Integrated Observatory Network
Archivist/Curator | X | X | X | Determines and provides assurance for policies and procedures for ION data assets
Data Integrator | x | X | X | Brings data from outside sources into OOI, transforming its form as appropriate

Operations Professionals

User Role | Dev | Ops | Use | Description
Program Manager | X | X | x | Directs the work of a major component of the program (i.e., an Implementing Organization)
Observatory Operator | x | X | X | Directs the activities of a major component of the OOI
-> Marine Operator | x | X | X | Directs the activities of a major marine component of the OOI (a marine IO or observing system)
--> Platform Operator | | X | X | Operates one or more platforms; usually a Marine Operator sub-role
---> Mobile Platform Operator | | X | X | Operates one or more mobile platforms
-> Cyber Operator | x | X | X | Directs the activities of the Integrated Observatory Network (also known as Integrated Observatory Operator)
-> Org-Facility Administrator | | X | X | Administers the settings of an OOI Facility, e.g., a PI-Classroom Team
Logistic Coordinator | | X | x | Provides logistical support and oversight for observatory and mission activities
Mission Planner | x | X | X | Creates, approves, and executes plans for observing missions (any scale)
Scheduler | | X | X | Creates and executes detailed schedules and mission plans to accomplish larger mission plans

Educational Professionals

User Role | Dev | Ops | Use | Description
Developer of Educational Materials | | | X | Creates lesson plans, curricula, or material to teach a specific topic
Teacher | x | | X | Educates students in a classroom setting (or equivalent)
Digital Librarian | x | | X | Recommends best practices from the digital library community; references OOI data
Docent | | | X | Presents material on environmental (ocean) science in a museum, aquarium, or similar

General Public and Policy Makers

User Role | Dev | Ops | Use | Description
Scientific Public | | | X | Members of the public who are not scientists, but have a strong science interest and perspective
Outreach material providers | x | x | X | Developers of materials explaining and educating about the OOI, its features, and its use
Policy Makers | | | X | Decision-makers (including staff) in the oceanographic and government agencies
Students (K-16) | | | x | Classroom participants using the system at the direction of Teachers
Lay Public (Everyone else) | | | X | Members of the public who are not scientifically oriented

References

References CI-PERS - link on that page points to Alfresco document
References CI-RUA - link on that page points to Alfresco document
Persona-Actitivty-Model-9-1.xls - Persona Activity Model spreadsheet (Excel)
OOI-CI-Persona-Model-Roles.pdf - Persona Model Roles, discussion and analysis
CIAD AS Analysis and Synthesis - Activities specifically related to workflow and data analysis and synthesis

CIAD AV ION Integration and Deployment

Integration Overview
Deployment Overview
Deployment Scenario
Cyberinfrastructure Points of Presence (CyberPoP)
Multi Facility Concept
Physical and Virtual Observatories
Physical Deployment
OOI-Internal Operational Relationships

Integration Overview

Figure 1. CI Capability Container activities, resources and infrastructure (OV-1)

Figure 1 shows the CI Capability Container as the platform providing the capabilities required for the OOI Integrated Observatory applications and infrastructure. The main activities include:

Scientific Investigation: supporting scientists and researchers in studying environmental processes through observations, simulation models, and expressive analyses and visualizations, with results directly feeding back into improving future observations.
Education and Participation: supporting education application developers, educators, and the general public in accessing and understanding OOI resources in ways suitable for specific target audiences.
Community Collaboration: enabling OOI users to share knowledge and resources and to work together in project settings or in ad-hoc communities.

To support these activities, the OOI Integrated Observatory needs to manage a variety of resources of different type and purpose throughout the network:

Observation Plans, providing activity sequences, service agreements, and resource allocations for specific observational campaigns or as templates for event-response behaviors
Data Sets, representing observational and derived data and data products
Processes, representing data collection and processing workflows comprised of multiple steps involving multiple actors and resources
Instruments, as the virtual representations of physical sensors and observatory infrastructure
Models, such as numerical models and their configurations
Knowledge, representing the entire wealth of metadata, ancillary data, analysis results, reference and correspondence links between resources, and knowledge captured in ontologies for mediation purposes

The CI Capability Container supports these goals by hosting capabilities, i.e., services that support the main activities and that provide access to required resources. In addition, the CI Capability Container provides access to a variety of other infrastructure services for resource management, interaction, communication, process execution and presentation. These infrastructure capabilities may be deployed locally with the Capability Container or accessible through the network by the Capability Container.

A CI Capability Container is deployed on one node in the OOI Integrated Observatory network. The entirety of CI Capability Containers, the capabilities they are hosting and the resources they provide access to all integrate into the OOI Integrated Observatory, potentially spanning multiple domains of authority.

Figure 2. CI Capability Container with External Interfaces (OV-1)

Figure 2 shows an illustrative depiction of a Capability Container, indicated by the octagon shape, connected to its environment. The local environment consists of physical resources such as locally connected instruments and other observational infrastructure, storage resources such as disks and network drives, and execution resources such as grid nodes, cloud computing instances, and CPUs on mobile assets such as AUVs.

The capability container provides its capabilities and resources via the observatory network to user interfaces, user tools and applications, user provided resources of various types and to capability containers in different facilities with their own domains of authority.

No matter where it is deployed, a Capability Container provides access to all the infrastructure and application capabilities of the OOI Integrated Observatory, subject to network connectivity. The figure above indicates this by showing all six CI subsystems in the octagon. In the extreme case, a Capability Container without any capabilities deployed on it provides access to all OOI capabilities via the network only, acting as a user client adapter and gateway. It always ensures authentication and authorization and provides access to the internal OOI communication networks when permissible.

Resources can be connected locally to a Capability Container. The figure above indicates this by showing locally connected resources and the resource interfaces (drivers) needed to access these resources.

A Capability Container can host the capabilities required locally at a deployment site. The footprint of a Capability Container can vary depending on the resource constraints of its hosting environment. The selection of capabilities hosted in a specific Capability Container depends on the needs and resource availability at this specific location in the network. For instance, on an intermittently connected instrument platform, instrument access, data acquisition and data buffering capabilities are required (provided by the Sensing and Acquisition and Data Management subsystems), while at the core installation sites, data processing, numerical model integration and event response capabilities need to be present. Many COI and CEI capabilities are provided by each Capability container transparently, for instance identity management, process scheduling, and message based communication.
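The site-dependent selection of capabilities described above can be sketched as a set of named deployment profiles. The profile contents below paraphrase the examples in the text (instrument platform vs. core site); the data structure itself and the capability identifiers are hypothetical, not actual CI configuration syntax.

```python
# Hypothetical deployment profiles: which capabilities a container hosts
# depends on its location in the network and its resource constraints.
PROFILES = {
    # Intermittently connected instrument platform: acquire and buffer only
    # (Sensing and Acquisition plus Data Management capabilities).
    "instrument_platform": ["instrument_access", "data_acquisition", "data_buffering"],
    # Core installation site: processing, model integration, event response.
    "core_site": ["data_processing", "model_integration", "event_response"],
}

# COI/CEI infrastructure capabilities are provided by every container transparently.
COMMON = ["identity_management", "process_scheduling", "messaging"]

def container_capabilities(profile_name: str) -> list:
    """The capabilities a container with this profile exposes."""
    return PROFILES[profile_name] + COMMON

print(container_capabilities("instrument_platform"))
```

Note that the common infrastructure capabilities appear in every profile; only the application-level selection varies by site.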

Deployment Overview

Deployment Scenario

Figure 3 provides an illustrative depiction of an OOI deployment scenario. The primary depicted elements are the facilities of the marine observatories, the integrated observatory (CI), and user organizations. The marine observatory facilities for RSN and CGSN are shown with wet-side and shore-side deployment locations, respectively. The integrated observatory facility is the integrating element, operated by the CI IO. Further facilities are connected on behalf of user organizations joining the OOI integrated observatory network based on contractual agreements.

Figure 3. OOI Deployment Strategy Scenario (OV-1)

The rectangular shapes show physical deployment nodes within the integrated observatory hosting a CI Capability Container, potentially in addition to other software processes. The octagon shape within the deployment nodes indicates a deployed instance of a CI Capability Container. Each such instance hosts a specific set of selected infrastructure and application capabilities, driven by many factors such as local resource availability, network connectivity, domain of authority, redundancy requirements and performance. Specific profiles of such configured Capability Container deployments are defined by name, such as the "Marine Execution Point", the "Acquisition Point" etc.

Cyberinfrastructure Points of Presence (CyberPoP)

CyberPoPs are physical sites operated by the CI IO, with substantial local hardware and network resources and professional operations and maintenance support. These sites are highly available due to server, storage and network redundancy and host a variety of equipment as well as specific profiles of Capability Containers. For instance, the main data acquisition site connected to the regional cable, the Portland, OR Acquisition Point, has hundreds of terabytes of redundant storage, redundant switches, and servers with tens of high-performance CPUs. The main software operating redundantly on these servers is the Acquisition Point Capability Container profile, with potentially tens of instances of such Capability Containers. All these instances register themselves as part of the OOI Integrated Observatory.

Acquisition Point CyberPoPs are deployed at the shore stations where the connections to the marine observatories are established. Distribution Point CyberPoPs are deployed at network management centers across the country to provide network management, web presence and load balancing functionality and to act as peering points into commercial and academic networks. Execution Point CyberPoPs are executing user processes in various target execution environments.

The CGSN wet-side deployment location shows an exemplar deployment of a Capability Container hosting an instrument adapter specific to the instrument platform and its sensors. The user facilities run instances of capability containers hosting execution engines for Kepler workflows and for executing the ROMS/HOPS numerical models, respectively.

All Capability Containers are connected through the OOI Integrated Observatory network and in their entirety comprise the OOI Integrated Observatory. All resources are operated within their respective domains of authority, and policy specific to these facilities applies independent of on whose behalf the resource is used.

To show the full extent of the OOI network, Figure 4 depicts the OOI network deployment. CI capability containers are deployed as Acquisition Points, Distribution Points, Execution Points and Observatory and Marine Management nodes. In addition they can be hosted by Marine observatory array platforms (e.g., buoys and AUVs) and their instruments.

Figure 4. 2660-00010 OOI Network Deployment (SV-2)

Multi Facility Concept

There are six CI services networks, implemented by six subsystems, providing infrastructure and user application capabilities: Common Operating Infrastructure (COI), Common Execution Infrastructure (CEI), Data Management (DM), Sensing and Acquisition, Analysis and Synthesis, and Planning and Prosecution. Physical hardware operated by the OOI IOs is designed to only host a core subset of these services directly or in limited capacity; some may be contracted out to other organizations, according to a set of interface standards and agreements defined by OOI. The ability to precisely specify the operating characteristics and interfaces of an OOI Integrated Observatory Facility is a capability provided by the OOI CI through the Capability Container.

The OOI Cyberinfrastructure facilitates the creation of specialized user facilities that do not correspond to any particular physical entity. Figure 5 illustrates this by showing the OOI marine observatory and integrated observatory facilities as well as external virtual observatory, laboratory and classroom facilities that can join the OOI network. In that sense, the Classrooms and Laboratories in the diagram can be virtual environments; there do not have to be corresponding physical classrooms or laboratories for the creation of such collaboration environments.

Figure 5. OOI Facilities and their Operational Domains (OV-1)

Physical and Virtual Observatories

One obvious and fundamental application for a virtual facility is an "Observatory" representing all of OOI's observatory systems. By making the OOI's observed data, resources, instruments, and collaborations from all the OOI observatories accessible within a single virtual observatory, OOI can create a distributed, interoperable view of its diverse observing assets. This is called the OOI Integrated Observatory.

Other virtual observatories may be oriented to address specific observing themes, geospatial domains, or campaigns. The OOI Cyberinfrastructure does not predefine these views; they may be constructed at will during OOI system operations. Each observatory may select the OOI resources that it makes accessible to provide a coherent, focused environment for its users.

Once registered to participate in OOI activities and services, users will be able to take part in all of the open services, associations, and capabilities offered by the OOI Integrated Observatory. By policy and design, open access to resources will be encouraged. Typically the first interaction of users will be through participation in a particular association, such as a Laboratory (see Figure 3.2.1.1-1) run by another scientist. These associations will provide a window on the entire OOI system and a framework within which the user can operate. Of course, users are also free to create their own associations or personal workspaces, consistent with resource limitations enforced by the system. The CI Operational Concept [OPCON] lays out such a scenario in more detail.

Physical Deployment

Components of the OOI Cyberinfrastructure will be deployed on oceanographic platforms, including some moorings, benthic instrument platforms, and mobile platforms like autonomous underwater vehicles (AUVs). A collection of OOI Cyberinfrastructure components can be deployed together within a Capability Container, enabling a broad range of capabilities on that platform. As in the preceding section, a capability container or capability may correspond to a physical entity like a platform or instrument. Some OOI resources, such as software applications, do not have any corresponding physical presence.

Each Capability Container deployed on an OOI platform may provide access to a combination of instruments (with sensors and actuators) and capabilities known as services. Services are software resources that provide capabilities such as data storage, data transformation, and communications. The number and type of services deployed on each Capability Container will vary according to the platform's needs and capabilities, but will always include the fundamental communication infrastructure needed to interoperate with other OOI components.

Communication between remote CI Capability Containers and the shore station may be via satellite, radio signals, benthic seafloor-to-ocean-surface cables, or other regular or intermittent connection. AUVs and gliders, for example, may have no communication for extended periods, followed by bursts of rapid communication when docked or surfaced. The OOI Cyberinfrastructure provides services that facilitate interaction under a wide range of communication channel models.
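The burst-communication pattern above implies a store-and-forward behavior on intermittently connected platforms: observations are buffered locally while the link is down and flushed when the platform surfaces or docks. The class and method names below are an illustrative sketch, not the actual CI service interface.

```python
class BufferedUplink:
    """Store-and-forward sketch for an intermittently connected platform."""
    def __init__(self):
        self.buffer = []       # samples awaiting transmission
        self.sent = []         # samples delivered to shore
        self.connected = False

    def record(self, sample):
        self.buffer.append(sample)
        self.flush()           # opportunistic: transmit immediately if link is up

    def flush(self):
        if self.connected:
            self.sent.extend(self.buffer)  # burst transmission of the backlog
            self.buffer.clear()

link = BufferedUplink()
link.record({"t": 0, "salinity": 34.5})  # underwater: buffered locally
link.record({"t": 1, "salinity": 34.6})
link.connected = True                    # glider surfaces, link established
link.flush()                             # backlog delivered in one burst
print(len(link.sent), len(link.buffer))  # 2 0
```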

The OOI Integrated Observatory facility aggregates data and communications from data sources and presents them to users and other observatory components. Nominally, there is an Observatory facility for each physical observatory in OOI, but as noted above, other Observatories can be created to fulfill analogous needs of OOI users. There are several functions that demand special attention in an Observatory. The ability to monitor science and operational parameters enables maintenance and control of observatory assets. Access to observatory Capability Containers for both monitoring and control is needed. Observatory resources such as power and bandwidth must be allocated, and physical components must be managed. Although these functions might be useful in any OOI environment, they will receive particular visibility in the OOI observatories.

A facility such as an OOI Laboratory or Facility will emphasize other resources from the OOI collection according to their own particular needs. Laboratories may value collaborative resources and tools, while Facilities may consider auditing capabilities very important. In each case, there is a collection of common OOI services by which the desired resources can be aggregated and, if necessary, enhanced.

The OOI Cyberinfrastructure binds diverse physical assets such as oceanographic platforms and instruments together, and must reflect the state of those assets as closely as possible. A significant interaction will be the interplay of deployed physical assets, the operators performing those deployments, and the OOI resources that correspond to those assets.

A typical deployment operation will leverage three architectural features: the capabilities of a plug-and-work instrument that presents a standardized description of itself to the OOI system, easily used deployment logging and metadata definition interfaces to capture deployment results as effortlessly and accurately as possible, and the Cyberinfrastructure's ability to associate this information with the appropriate resource.

Once initially captured, instrument metadata descriptions will also be used to characterize the data that flows through the system to the end users. As those data are used and analyzed, and their metadata are examined, end users can create their own associations by commenting on and correlating the data that are presented to them. By capturing these associations, the OOI Cyberinfrastructure provides a rich fabric of information about each resource, and can present that information to the other users of the resource. In this way, users in an OOI Classroom can see last month's assessments of the researchers in a Laboratory across the country, and add their own observations for other Classrooms to discover.

OOI-Internal Operational Relationships

Figure 6 shows the internal organizational relationships among the classes of actors and nodes, as well as the services that mediate these relationships.

Figure 6. 2650-00012 OOI internal operational relationships (OV-4)

Two important elements in the figure above are the Marine Observatory Facility and the Laboratory Facility. The Marine Observatory Facility is governed by the Marine Operator, who uses Policy defined by the OOI Program Office Oversight actor and contracts with the Cyberinfrastructure Operator for the Cyberinfrastructure services used to operate a network of instruments. The Marine Observatory Facility applies control, processing, and resource management services to operate and govern a suite of Instruments, and provides real-time or near real-time Data Products, Event Detection and Instrument services for use in Laboratories.

Laboratories are a second principal organizing node, and allow participants to collaborate in a dedicated virtual space. Instrument Providers, Collaborating Investigators, and other invited participants share infrastructure resources under the supervision of the Principal Investigator who governs the collection of resources. The Laboratory makes use of a wide range of Cyberinfrastructure services and produces Data Products and some modeling services that can then be made available by OOI.

The Cyberinfrastructure Operator governs the Cyberinfrastructure and contracts with Resource Providers for Resources such as computing, storage, networking and their associated services. The Cyberinfrastructure both uses resource services and provides a wide range of core services to Observatories and Laboratories, including identity management, governance, security, policy enforcement, logging, messaging and routing, service orchestration, interaction monitoring, resource cataloging and persistence, and accurate synoptic time.

The Instrument Provider develops Instruments for use in the Observatory, and may participate in Laboratories in an expert role. The Instrument node receives instructions through services provided by the Marine Node, and produces data products for use by Laboratories and other members of OOI. The availability of Instruments and the Data Products they produce to the community is governed by the policies of OOI, as expressed via the Marine Observatory and Laboratory.

CIAD AV Transition to Operations

The OOI program underwent replanning in May 2011 to reflect current progress and future scope. During this replanning, the number of OOI CI releases was reduced from five to four, and the content of Release 5 was pulled forward one release. This page reflects post-replanning content pending approval. See here for a pre-replanning version.

Transition to Operations Strategy

Releases are the increments of the CI system in the operational environment. The CI will be rolled out in four releases.

Configuration Items are the primary deliverables of the OOI Cyberinfrastructure Implementing Organization to the system integrator (COL) representing NSF. Configuration items undergo acceptance testing before commissioning.

Configuration Items

Table 1 lists the CI deliverable configuration items.

Table 1. Configuration Items

Marine Interface Agents: Software/Hardware
    Instrument Agents (see Instrument Agent Catalog): Software
    CGSN and RSN Platform Agents: Software

CI Software Releases:
    R1 - Data Distribution Network: Software
    R2 - Managed Instrument Network: Software
    R3 - On Demand Measurement Processing: Software
    R4 - Interactive Ocean Observatory: Software

Cyber Points of Presence (CyberPoPs):
    Instrument Development Kit (IDK): Hardware
    Marine Execution Point (MEP): Hardware
    Observatory Acquisition Point (OAP): Hardware
    Observatory Distribution Point (ODP): Hardware
    Observatory Execution Point (OEP): Hardware
    Operations Management Point (OMP): Hardware
    National Internet Infrastructure (NII): Hardware

System Engineering Environments:
    System Development Environment (SDE): Hardware and software
    System Test Environment (STE): Hardware and software

The Commissioning Plan [CI-COMM] provides details about configuration items and deployment locations. See also the CI Network Design.

Master Schedule

Figure 1 shows the four incremental releases of CI as a master schedule.

Figure 1. 2660-00003 CI Master Schedule

Work Breakdown: Subsystems

The work breakdown and design of the OOI CI support incremental construction in four releases by six subsystem integrated product teams (IPTs). Each subsystem IPT is responsible for delivering its subsystem's services for each release. Once integrated, the subsystems operate services networks within the Integrated Observatory system of systems.

The six subsystem IPTs are:

Sensing and Acquisition (S&A) Subsystem Integrated Product Team
Data Management (DM) Subsystem Integrated Product Team
Analysis and Synthesis (A&S) Subsystem Integrated Product Team (will be formed in Release 2)
Planning and Prosecution (P&P) Subsystem Integrated Product Team (will be formed in Release 3)
Common Execution Infrastructure (CEI) Subsystem Integrated Product Team
Common Operating Infrastructure (COI) Subsystem Integrated Product Team

The integration of tools and applications, as well as the deployment of system entities across the observatory network, is based on the framework that the CI infrastructure provides, in particular the service-oriented integration architecture based on governed message exchange. The following sections provide further details about the subsystems and the releases to the degree relevant for the CI architecture and design specification. Further details are covered in the project plans, in particular the System Engineering Master Plan [OOI-SEMP], Transition-to-Operations Plan [CI-TROP] and the Commissioning Plan [CI-CP]. Figure 2 shows the planned construction and integration of subsystems into the four CI releases.

Figure 2. 2660-00004 Subsystem Development Schedule

Subsystem dependencies are presented in CIAD OV Subsystem and Service Dependencies.

Work Breakdown: Implementation Projects

There are several implementation projects:

IPA: Development of the Instrument and Platform Agent Architecture (IPAA) and construction of Sensor Set Packages
EOI: External Observatory Integration
Terrestrial CyberPoPs and Network

Releases and Target User Audiences

The architecture of the OOI Integrated Observatory is structured to support an incremental transition to operations. In particular, the design supports four incremental releases of the CI that deliver increasing capability and thereby support user-relevant applications and processes, beginning with data distribution and storage in Release 1 and culminating in advanced concepts of interactive ocean science, including instrument and observatory interactivity that exploits knowledge gained through observations and analyses. The goal is to deliver capabilities to the user communities at the earliest possible time and to facilitate incremental change as requested by the users, in addition to the planned increase in capability. Each release has a specific theme and is targeted at providing value to a specific group of stakeholders, building on and potentially revising capabilities previously delivered.

The four releases have the following themes and primary target user audiences:

Release 1 provides a Data Distribution Network, a fully capable automated end-to-end data distribution and preservation infrastructure. It targets observational data consumers, such as data analysts and numerical modelers. In addition, it will support the needs of the OOI Marine observatories as instrument providers for instrument integration.

Release 2 provides a Managed Instrument Network, adding end-to-end control of how data are collected and supporting more advanced processes of instrument providers with managed instrument control. The primary target user community is OOI Marine Observatory instrument providers and operators.

Release 3 delivers On Demand Measurement Processing. It adds end-to-end control of how data are processed, supporting more advanced workflows of instrument providers and data product consumers, as well as on-demand measurements supporting event-driven opportunistic observations.

Release 4 delivers the Interactive Ocean Observatory, which adds control of integrated ocean models driven by the data collection process, supporting data product developers and the numerical modeling community. It also adds control of data, processes, and models to drive the collection process, supporting observatory interactivity and transformative ocean observatory science for all users.

In its final deliverable after Release 4, the CI will provide real-time modeling and data assimilation, adaptive sensing and platform control, rapid response and event capture, and closed-loop integrated sensing, modeling, and distributed network control.

Breakdown By Subsystem

The six subsystem IPTs construct deliverables that are integrated by the integration team and are subject to QA by the test and validation team. Deliverables fall into several categories:

Table 2. Deliverables for Each Subsystem

Services: Functional capabilities exposed as named services via the COI subsystem, deployed on CI Capability Containers. Such services are available to service consumers within the CI (i.e., other subsystem services and processes) and externally (i.e., functions presented to users via web UIs or application interfaces). Services include core infrastructure services and application-level capabilities. Services never expose the technologies they are constructed in; these technologies may be replaced over time while maintaining the same service interface.

Components and Frameworks: Software ready to be extended and configured for deployment by other subsystems. An example is the instrument agent framework, the software that can be specialized into specific types of instrument agents.

Data Models and Configuration: Data models and resource-specific configuration specializing general underlying services; for instance, resource, message type and object definitions, data models for a specific information repository instance, or policy applicable to types of resources.

User and Application Interfaces: Presentation forms such as graphical web user interfaces and Web Service externalization interfaces for services and resources of the integrated observatory network. Includes the actual user and application interfaces as well as any service and resource integration software necessary to make the interfaces work.
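The service-interface principle above (services never expose their implementation technology) can be illustrated with a short sketch: the consumer is bound only to a named service interface, so the backing technology can be replaced without changing any consumer code. All class and method names are invented for illustration and are not part of the actual COI service framework:

```python
# Sketch of the principle that services hide their implementation
# technology: consumers depend only on a named service interface.
# Names are illustrative, not the actual COI services.
from abc import ABC, abstractmethod

class DataCatalogService(ABC):
    """Named service interface as seen by service consumers."""
    @abstractmethod
    def find(self, query: str) -> list: ...

class InMemoryCatalog(DataCatalogService):
    """One possible backing technology; replaceable over time."""
    def __init__(self):
        self._entries = {"CTD-042": "conductivity-temperature-depth record"}

    def find(self, query: str) -> list:
        # Return the identifiers of matching catalog entries.
        return [rid for rid in self._entries if query in rid]

def consumer(catalog: DataCatalogService) -> list:
    # Written against the interface only; unaware of the implementation.
    return catalog.find("CTD")

print(consumer(InMemoryCatalog()))  # prints: ['CTD-042']
```

Swapping InMemoryCatalog for, say, a database-backed implementation would leave the consumer unchanged, which is exactly the replaceability the table describes.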

The following sections provide detail for the subsystem specific work packages that are part of each release.

Release 1: Data Distribution Network

Provides end-to-end automated data preservation and distribution. See also Release 1 Scoping and Figure 3 for an illustration of service components.

Figure 3. Release 1 service components (illustrative)

Table 3 below lists the work packages of this release.

Table 3. Release 1: End-to-end Automated Data Preservation and Distribution

Subsystem Work Packages (High-Level Services)

Common Operating Infrastructure
    Federated Facility (Virtual Organization) Services. This service provides governance of resources in the scope of facilities.
    Enterprise Service Bus & Capability Container. This includes the Messaging Service (Exchange), the Distributed State Management, the Presentation Framework and the Service Framework.
    Identity & Policy Management Services
    Resource Catalog & Repository Services

Common Execution Infrastructure
    Elastic Computing Services
    Execution Engine Catalog & Repository Services
    Resource Management Services

Data Management
    OOI Common Data and Metadata Model
    Dynamic Data Distribution Services
    Data Catalog & Repository Services
    Persistent Archive Services

Sensing and Acquisition
    Instrument Direct Access
    Instrument Management Services
    Instrument and Data Process Repository
    Data Acquisition Services

Release 2: Managed Instrument Network

Provides end-to-end control of how data are collected. See Figure 4 for an illustration of service components.

Figure 4. Release 2 service components (illustrative)

Table 4 below lists the work packages of this release.

Table 4. Release 2: End-to-end Control of How Data are Collected

Subsystem Work Packages (High-Level Services)

Common Operating Infrastructure
    Federated Facility (Virtual Organization) Services (Part 2)
    Enterprise Service Bus & Capability Container (Part 2). Adds Java language support, multi-facility support, and federated exchange.
    Resource Lifecycle Services
    Resource Activation Services
    Resource Collaboration Services

Common Execution Infrastructure
    Process Management Services
    Process Catalog & Repository Services
    Integration w/ National Computing Infrastructure

Data Management
    OOI Common Data and Metadata Model (Part 2). Adds concepts of versioning, provenance, and semantics.
    Persistent Archive Services (Part 2). Adds capabilities to manage persistent archives, define their policy, and add new storage resources.
    Search and Navigation Services
    External Data Access Services

Sensing and Acquisition
    Marine Facility Services
    Instrument Activation Services
    Data Processing Services
    Data Product Catalog and Repository Services

Analysis and Synthesis
    Data Analysis and Visualization Services and Repository

Release 3: On Demand Measurement Processing

Provides end-to-end control of how data are processed. See Figure 5 for an illustration of service components.

Figure 5. Release 3 service components (illustrative)

Table 5 below lists the work packages of this release.

Table 5. Release 3: End-to-end Control of How Data are Processed

Subsystem Work Packages (High-Level Services)

Common Operating Infrastructure
    Continuation (TBD)

Common Execution Infrastructure
    Continuation (TBD)

Data Management
    Aggregation Service
    Attribution and Association Services

Sensing and Acquisition
    Data Calibration and Validation Services
    Marine Resource Scheduling Services
    Data Product Activation Services

Analysis and Synthesis
    Data Analysis and Visualization Services and Repository (Part 2). Adds workflows, external Laboratory and Classroom Facility Services.
    Event Detection Services
    Model Catalog and Repository
    Modeling Services

Planning and Prosecution
    Interactive Observatory Facility Planning Services
    Event Response Services
    Mission Catalog and Repository Services
    Portable Control Software

Release 4: Interactive Ocean Observatory

Provides control of models driven by the data collection process, and control of data, processes, and models to drive the collection process. Figure 6 shows the related service components.

Figure 6. Release 4 service components (illustrative)

Table 6 below lists the work packages of this release.

Table 6. Release 4: Control of Models Driven by the Collection Process and Control of Data, Processes, and Models to Drive the Collection Process

Subsystem Work Packages (High-Level Services)

Analysis and Synthesis
    Modeling Services (Part 2)
    Model Activation Services

Planning and Prosecution
    Mission Coordination Services
    Mission Simulator
    Portable Control Software (Part 2)

CIAD AV Scope Release 1

This page explains the scope of the architecture for Release 1.

Deliverables

Release 1 delivers capabilities from the Common Operating Infrastructure (COI), Common Execution Infrastructure (CEI), Data Management (DM) and Sensing & Acquisition (S&A) subsystems. For each of these subsystems, the extent of delivered functionality is limited by the work package assignment (see below), and further by the scope defined in the Product Description (see below). The Analysis & Synthesis (A&S) and Planning & Prosecution (P&P) subsystems are not delivered in Release 1.

For all four delivered subsystems, the architectural structures will be mostly built out in software and software interfaces. This means that most of the services will actually be in place, but implemented functionally only to the extent required by the Product Description. For the two remaining subsystems, no software is produced.

Work Package Assignment

Prior to OOI CI construction, the subsystems were structured into work packages and the work packages were assigned to releases in the order of dependency. Work packages and their assignment to releases are listed in the overview pages for all subsystems (see below). Work packages that are not in Release 1 will not be implemented. However, some of the architecture needed to support the addition of the work package in a subsequent release may be built out. Work packages that are in Release 1 are planned for implementation. The product description, however, limits the extent of functionality actually being produced.

Product Description

The Release 1 Product Description is the most authoritative definition of Release 1 extent. It is a document that lists a number of use case scenarios for the capabilities (functions) relevant to the target user audience in Release 1. In addition, it contains use case scenarios for system-internal infrastructure capabilities required to enable the user-visible capabilities. The Product Description use cases do not prescribe how the capabilities are implemented; that is covered in this architecture documentation.

The Product Description is the agreed-upon authoritative definition of Release 1 scope. Project stakeholders, management, user representatives, the user experience team, designers and implementers agree on this scope during and after the LCO review. The Product Description is subject to (moderate) modification with required approval, for instance as needed to perform descoping. Lost scope may be added to subsequent releases in a similar way. Overall architectural scope is not reduced through this process.

See Also:

Transition to Operations: Releases and assignment to subsystems
COI Overview
CEI Overview
DM Overview
S&A Overview

CIAD AV Scope Release 2

This page explains the scope of the Release 2 Integrated Observatory Network, the Managed Instrument Network.

This release adds end-to-end control of how data are collected and supports more advanced processes of instrument providers, including support for managed instrument control. The primary target user community is OOI Marine Observatory instrument providers and operators. The limited user engagement in Release 1 will be expanded to include more scientists and science roles, allowing comprehensive evaluation of prototypical user interfaces.

Themes, Users, and Feature Categories

Primary Themes and Users: End-to-end control of data collection, and advanced control of managed instruments, supporting OOI Marine Observatory instrument providers and operators.

Secondary Themes and Users: Expanded data manipulation, analysis, and visualization tools for observational data consumers. Prototyping advanced features including ION interfaces for handheld devices (smartphones, tablet devices) for select users.

Feature Categories: The release includes features in the following major categories:

Operate Marine Observatories
Operate Platforms and Instruments
Manage Instrument Lifecycle
Test and Troubleshoot Instruments
Acquire Data and Generate Data Products
Search Data
Visualize Data
Manage the Integrated Observatory Network
Mobile Delivery Prototyping

Deliverables

Release 2 delivers capabilities from the Common Operating Infrastructure (COI), Common Execution Infrastructure (CEI), Data Management (DM), Sensing & Acquisition (S&A), Analysis and Synthesis (A&S), and User Experience (UX) subsystems. For each of these subsystems, the extent of delivered functionality is limited by the work package assignment (see below), and further by the scope defined in the Product Description (see below). The Planning & Prosecution (P&P) subsystem is addressed beginning in Release 3.

For all delivered subsystems, the architectural structures will be mostly built out in software and software interfaces. This means that most of the services will actually be in place, but implemented functionally only to the extent required by the Product Description. For the remaining subsystem, no software is produced.

Work Package Assignment

Prior to OOI CI construction, the subsystems were structured into work packages and the work packages were assigned to releases in the order of dependency. Work packages and their assignment to releases are listed in the overview pages for all subsystems (see below).

Work packages that are not in Release 2 will not be implemented; however, some of the architecture needed to support the addition of the work package in a subsequent release may be built out.

Product Description

The Release 2 Product Description provides an overarching description of Release 2 extent. It is a document that lists a number of use case scenarios for the capabilities (functions) relevant to the target user audience in Release 2. In addition, it contains use case scenarios for system-internal infrastructure capabilities required to enable the user-visible capabilities. The Product Description use cases do not prescribe how the capabilities are implemented; this is covered in this architecture documentation.

The Product Description is an agreed-upon and normative definition of Release 2 scope. Project stakeholders, management, user representatives, user experience team, designers and implementers agree on the specifics of this scope during and after the LCO review. The Product Description is subject to (moderate) modification with required approval, for instance as needed in order to provide clarification or adjust scoping. Lost scope may be added to subsequent releases, so long as overall architectural scope is not reduced through this process.

Overview Pages for Subsystems

Transition to Operations: Releases and assignment to subsystems
COI Overview
CEI Overview
DM Overview
S&A Overview
A&S Overview

CIAD AV Glossary

Definitions

This section defines the most essential terms that will be used throughout the remainder of this document. See Table 1.

Table 1. Definitions for selected essential terms

Agent: An interacting system entity, realized as a capability container process, representing a principal (user or organization) and acting on its behalf.

Capability Container: Extendable, deployable base unit of CI software providing the core CI software infrastructure platform ("container") to host system and user-provided services. Enables application and resource integration. Provides secure and reliable access to all CI infrastructure and application services, resources and interfaces, either locally or via the Exchange. Provides a platform to present services externally through user and application interfaces. The physical deployment environment can range from embedded mobile platforms to high-availability data centers with replication.

Cyberinfrastructure Point of Presence (CyberPoP): Physical deployment location, part of the Integrated Observatory, with network, storage, processing and operations. Hosts a purpose-specific configuration of one or multiple CI capability containers in one physical hosting environment with access to local resources. Includes hardware and operating system, and may have additional software deployed.

Exchange: The Integrated Observatory communication network that is accessible via the CI capability container. Realized using AMQP message broker infrastructure. Provides a defined message format, explicit control of participants, secure and reliable message-based information exchange, and policy enforcement. Enables flexible routing and interception for extensibility.

Facility: An independent domain of authority owning and operating resources. As part of the OOI Integrated Observatory, a facility may choose to take part in collaborations with other OOI facilities to share and use resources. A facility is the basis for realizing the "virtual observatory" metaphor. Marine observatories, classrooms, laboratories and user facilities are specialized instances of facilities.

Integrated Observatory Network: The computer systems, networks, and software that together present the OOI's assets as a single system.

Model (environmental): An algorithm for hindcasting, nowcasting or forecasting the state of the environment over a specific spatial and temporal domain. Requires initial and boundary conditions and a forcing field.

Observation Plan: Following an observation request, as part of a service agreement proposal and the subsequent service agreement, the observation plan meets the request and the agreement by defining a sequence of activities leading to the intended result.

(Capability Container) Process: Software hosted and managed by a Capability Container. Processes control local/external resources, compute results and interact via messages.

(User) Process: A user-defined algorithm or scientific workflow scheduled to execute in the Integrated Observatory Network.

Process Definition: Description of an algorithm in source or executable form that can be scheduled and instantiated into the actual executing process.

Registry (service): A registry service enables the registration of resources with their descriptions (metadata).

Repository (service): A repository service enables the registration of information resources with their descriptions (metadata) as well as the storage of the information content of the resource itself.

Resource: Any entity associated with the Integrated Observatory that provides capability and has a life cycle. Special classes of resources include information resources (e.g. science data, annotations, derived data products, workflows, user identities) and taskable resources (e.g. instruments, compute nodes, workflow processes, scripts).

Resource, Information: A resource under CI governance that exists as an artifact represented in electronic form, without behavior.

Resource, Taskable: A resource under CI governance that has behavior.

Service: Capability available within the OOI Integrated Observatory through the observatory network. Accessible by name via the network by following a specified service access protocol (interaction pattern and message format). Services are provided by deployed software component packages within a CI capability container.

Service Agreement Proposal: Request to use a resource together with a proposal of conditions, constraints and parameter ranges; for instance, the request to use a specific sensor in a certain time interval once an hour for 1 minute, with its associated bandwidth and impact on the environment.

Services Network: A group of interdependent services covering a specific topic (such as Sensing and Acquisition). Implemented by a subsystem product team.

Subsystem: Implementation and integration product development team for all services, resource data models and user interfaces associated with this subsystem in the WBS and System Architecture.
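The Exchange defined in the glossary routes messages between named participants with explicit control over who receives what. The following toy, in-process sketch illustrates only that routing idea; it stands in for the actual AMQP broker infrastructure, and all names are invented:

```python
# Toy in-process sketch of the Exchange routing concept: messages are
# published to routing keys, and only explicitly registered subscribers
# receive them. This is illustrative only and is not the actual
# AMQP-based Exchange implementation.
from collections import defaultdict

class Exchange:
    def __init__(self):
        self._subscribers = defaultdict(list)   # routing key -> callbacks

    def subscribe(self, routing_key, callback):
        """Register a participant for messages on a routing key."""
        self._subscribers[routing_key].append(callback)

    def publish(self, routing_key, message):
        # Explicit control of participants: only subscribers registered
        # for this routing key are delivered the message.
        for callback in self._subscribers[routing_key]:
            callback(message)

received = []
bus = Exchange()
bus.subscribe("data.ctd", received.append)
bus.publish("data.ctd", {"salinity": 35.1})
bus.publish("data.adcp", {"velocity": 0.4})   # no subscriber; not delivered
print(received)  # prints: [{'salinity': 35.1}]
```

In the real system this routing is performed by AMQP message brokers, which additionally provide the security, reliability, policy enforcement and interception capabilities listed in the definition.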

See Also

OOI Reference Module (DOORS), OOI document 1125-00000.

Acronyms and Abbreviations

ANF: Array Network Facility (seismic observatory)
AS: Analysis and Synthesis (subsystem)
CEI: Common Execution Infrastructure (subsystem)
CG: Coastal Global (observatory)
CGSN: Coastal and Global Scale Nodes (observatory)
CI: Cyber Infrastructure, or cyberinfrastructure
CIAD: CI Architecture Document
COI: Common Operating Infrastructure (subsystem)
COL: Consortium for Ocean Leadership
CyberPoP: Cyberinfrastructure point of presence (see above)
DCL: Data Concentrator and Logger (see CGSN system)
DM: Data Management (subsystem)
DoDAF: Department of Defense Architecture Framework
EOI: External Observatory Integration
EPE: Education and Public Engagement (organization)
FDR: OOI Final Design Review
GEOSS: Global Earth Observing System of Systems
GOOS: Global Ocean Observing System
IO: Implementing Organization
IOC: Initial Operating Capability (milestone and review)
IOOS: Integrated and Sustained Ocean Observing System
ION: Integrated Observatory Network (system the OOI CI project is building)
IPA: Instrument Platform Agent
IPAA: Instrument Platform Agent Architecture
JIRA: (Not an acronym) The Atlassian task-tracking software used by OOI.
LCA: Life Cycle Architecture Review
LCO: Life Cycle Objectives Review
LOC: Lines of code
MOOS (MBARI): Monterey Ocean Observing System (originally MBARI Ocean Observing System)
MOOS (MIT): Mission Oriented Operating Suite - an AUV control system, also referred to as MOOS-DB
OOI: Ocean Observatories Initiative
PDR: Preliminary Design Review
PP: Planning and Prosecution (subsystem)
PUCK: Pluggable Underwater Connector with Knowledge (MBARI)
RSN: Regional Scale Node
SA: Sensing and Acquisition
SIAM: Software Infrastructure and Application for MOOS (MBARI's MOOS), an infrastructure or middleware system
SSDS: Shore Side Data System, software developed by MBARI (a candidate design reference for the DM subsystem)
TRL: Technology Readiness Level

47 Implementation Related Terms

Carrot: A third-party Python package providing an abstract API for the use of specific backend AMQP clients.
LCAArch: Prior name of ioncore-python, the OOI CI-developed Python package with the capability container and core services for Release 1 LCA.
Magnet: Name of a legacy Python package developed by OOI CI to interface with the AMQP message broker. Has been absorbed by the ioncore-python package.

CIAD OV System Decomposition

This page and other pages in this section provide an operational view on the OOI Integrated Observatory Network. They specify how the system will operate within its environment as a networked system of distributed operational nodes. These operational views realize the logical architecture; they define the operational nodes according to the DoDAF model. The operational views define the responsibilities of all operational nodes in the system and their dependencies on the environment and on other operational nodes. Most operational nodes provide functional capabilities in the form of services. Activities and detailed behavior specifications define how the operational nodes interact and what information they exchange. The dependencies between operational nodes are captured as needlines. The categorization of operational nodes, the information products they exchange and the activities they are part of are represented in domain models. Altogether, these operational views are independent of any implementation and integration strategy and any specific technologies used.

System Decomposition

This section explains the highest level operational nodes and the needlines between these nodes. These operational nodes represent the high-level subsystem services, and the integration elements of the OOI Integrated Observatory Network.

Operational nodes are entities of responsibility in the operational system that depend on other such entities as indicated by a needline. Specifications of operational nodes and needlines are DoDAF OV-2 products. A needline arrow is to be read as "operational node A depends on operational node B for X", where X is the label of the arrow that goes from A to B. Domain models containing logical data models are DoDAF OV-7 products. The DoDAF entities and notations are explained in the DoDAF Reference.
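The needline reading convention described above can be captured in a small sketch; the node and label names below are invented examples, not nodes from the actual OV-2 diagrams:

```python
# Minimal sketch of the OV-2 needline convention: an arrow labeled X
# from operational node A to node B reads "A depends on B for X".
# Node and label names are illustrative examples only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Needline:
    source: str   # dependent operational node (A)
    target: str   # depended-upon operational node (B)
    label: str    # what A needs from B (X)

    def read(self) -> str:
        """Render the needline in the standard reading form."""
        return f"{self.source} depends on {self.target} for {self.label}"

nl = Needline("Laboratory", "Cyberinfrastructure", "identity management")
print(nl.read())
# prints: Laboratory depends on Cyberinfrastructure for identity management
```

A collection of such needlines forms a dependency graph over the operational nodes, which is exactly what the OV-2 diagrams depict graphically.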

Overview

Figure 1 shows the OOI Integrated Observatory Network and its application and infrastructure subsystems (in the center) in the context of external integration software components (such as instrument agents and resource adapters) and interfaces to external systems (such as external observatories and the National Compute Infrastructure).

See also

User and application interfaces External interfaces

Figure 1. 2650-00007 Subsystems, Implementation Elements and External Interfaces

Figure 2 (illustrative) shows the six subsystem services networks. Together they provide and support the functional and infrastructure capabilities of the OOI Integrated Observatory. The infrastructure services networks provide the foundation for the application services networks. Application services thus depend on infrastructure services, as indicated by the needline arrows.

Figure 2. OOI Subsystems (OV-1)

Application services support end user applications and are among the highest-level and most complex services in the system. Examples of application services are sensor data acquisition, interactive analysis and observation mission planning. Application services are based on enabling infrastructure services. Examples of infrastructure services are secure messaging, data distribution and storage, process execution and resource management. The Data Management services can be distinguished as either application or infrastructure services.

Table 1 highlights the core responsibilities of the subsystem services networks.

Table 1. Core Responsibilities of Service Networks

ID Services Network Explanation

SA Sensing and Acquisition — Provides services that support interfacing with sensors (instruments), instrument platforms (with power ports and telemetry) and other physical observatory infrastructure. All physical resources are represented as stateful, controllable resources within the integrated observatory network. Provides additional services that interface with external observatory infrastructure management systems, such as for power and bandwidth allocation and state-of-health monitoring, and performs observatory management and oversight in order to realize interactive and adaptive coordinated observation. Provides services to define and execute processes to acquire and process sensor data and derived data products as data streams, such as qualified data products resulting from calibration and validation and real-time segmentation processes. Manages all the repositories to register instruments and data processes.

AS Analysis and Synthesis — Provides a comprehensive framework to define data analysis and manipulation processes through scientific workflow descriptions. Workflow steps indicate computation definitions (scripts, source code, tool integration) and their scheduling and linking. Provides workflow frameworks used for event detection, model integration, data manipulation, synthesis of data products and visualizations. Provides interactive analysis and visualization based on a user workspace. Provides the basis for virtual collaborations, such as virtual observatories and classrooms.

PP Planning and Prosecution — Overall observatory resource planning and observation plan execution, including the development, refinement, configuration and enactment of observation plans for observational campaigns and event-response behaviors. In addition, provides autonomous vehicle control and services to interface with autonomous vehicle systems and sensors.

DM Data Management — Provides services to access information products of any kind together with their associated metadata and ancillary data. Makes use of ontology-based mediation to transform information between different syntactic and semantic representations. Handles information and data ingestion, query and access, processing and manipulation. Infrastructure services provide pervasive information preservation and distribution, including buffering, caching and archiving.

CEI Common Execution Infrastructure — Manages the definition, scheduling and execution of any kind of computation required for the operation of the OOI Integrated Observatory. Manages and provisions all compute and storage resources needed to support the computations. This includes the provisioning of CI core and user-provided processes, and the operation of execution engines that execute jobs.

COI Common Operating Infrastructure — Provides the underlying integration infrastructure for all other CI and user services. Provides a reliable messaging system, ensures pervasive governance and policy enforcement, and provides identity management. Provides consistent resource cataloging and life-cycle management, distributed state management, and a presentation framework that supports user and application interfaces. Supports the federation of the multiple facilities comprising the OOI Integrated Observatory, with each facility being administrated and operated independently.

Application and Infrastructure Services

Figure 3 shows the three application-oriented subsystems as (DoDAF OV-2) operational nodes together with needlines (arrows) that indicate dependencies. The figure also shows the three infrastructure subsystems and their dependencies. All user and tool interfaces are represented as another operational node.

Figure 3. 2650-00008 Application and Infrastructure Services Networks and Needlines (OV-2)

The subsystem sections (see Table 1 above) provide detail for all subsystem services networks. All subsystem service specification sections are structured alike: first defining the full functional extent through a list of high-level services, second providing decomposition into finer-grained operational nodes, and third providing domain models capturing significant data model and other concerns relative to the services network.

Operational dependencies between services are explained in Service Dependencies.

CIAD OV Subsystem and Service Dependencies

This page describes the dependencies between subsystems based on services and components in an operational configuration of the system.

Subsystem Dependencies by Service

Figure 1 shows the dependencies of the subsystems as development projects based on the services they implement, extend and use.

Figure 1. Subsystem dependencies based on services implemented, extended, used.

Example Service Dependencies

Figure 2 shows an exemplar deployment scenario of inter-dependent services from the S&A, DM, CEI and COI subsystems. The scenario targets Release 1 and Release 2 use cases, involving sensor data acquisition and data processing. Services are dependent on others as indicated by the labeled arrows. Note that this example is neither comprehensive nor authoritative.

Figure 2. Exemplar service dependencies based on services implemented, extended, used.

Release 1 System Component and Service Dependencies

Figure 3 sketches end-to-end system component and service dependencies in the Release 1 deployment. We illustrate the dependencies by describing the sequence of relationships when a user accesses the system via the web.

Users access the system in one of three ways in Release 1:

1. Either anonymously (with read-only permissions) or authenticated, via the Web User Interface (see Web User Interface using Grails)
2. As an early adopter Data Provider, via a standalone data/event publisher client (see EOI data agent)
3. As an early adopter Data Consumer, via a standalone dispatcher client (see EOI dispatcher)

Authentication occurs on the front end, integrated within the Web User Interface through the CILogon service. On the backend, external user identities are linked to OOI user identities through the COI Identity Management services.

The Web User Interface is built on the extensible COI Presentation Framework. This Java-based technology accesses the Python-based services through the message-based communication middleware, the COI Exchange (implemented on top of a RabbitMQ message broker). Cross-language interoperability (Java-Python in Release 1) is achieved through the Common Message Format. Completely different technology implementations interoperate by complying with this specification. Message content, such as data objects, is encoded and later persisted in the Common Object Model.
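The interoperability principle can be sketched as follows: producer and consumer agree only on the wire encoding of a message envelope, while the broker transports opaque bytes. This is an illustrative stand-in assuming a JSON encoding and hypothetical field names; the actual Common Message Format is defined by the COI specification.

```python
import json

# Hypothetical envelope fields -- the real Common Message Format is defined
# by the COI specification; this only illustrates that any language can
# interoperate by agreeing on the wire representation.
def encode_message(sender: str, receiver: str, op: str, content: dict) -> bytes:
    envelope = {"sender": sender, "receiver": receiver, "op": op, "content": content}
    return json.dumps(envelope).encode("utf-8")  # body handed to the Exchange broker

def decode_message(body: bytes) -> dict:
    return json.loads(body.decode("utf-8"))

# A Java producer and a Python consumer agree only on this encoding;
# the RabbitMQ broker in between never interprets the payload.
body = encode_message("webui", "resource_registry", "find", {"type": "Dataset"})
message = decode_message(body)
```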

The service layer directly accessed by the Web UI contains the Application Integration Services. These are purpose-built to support the Release 1 user interfaces and use cases, and orchestrate calls to the backend services in the DM, COI, and CEI subsystems.

All services of the Integrated Observatory Network are deployed within COI Capability Containers. The backend services are deployed within Python Capability Containers, as processes in the container's Process Framework. Services are access-protected through the COI Policy Enforcement and Management services.
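A minimal sketch of the capability container idea, under the assumption that a container simply hosts named service processes and consults a policy check before dispatching each invocation. All names and the read-only policy rule below are hypothetical illustrations, not the actual COI interfaces.

```python
class PolicyError(Exception):
    pass

class CapabilityContainer:
    """Toy stand-in for a COI Capability Container: hosts service processes
    and enforces policy on each incoming service invocation."""
    def __init__(self, policy):
        self.processes = {}   # service name -> handler process
        self.policy = policy  # callable(user, service) -> bool

    def spawn(self, name, handler):
        self.processes[name] = handler

    def invoke(self, user, service, request):
        if not self.policy(user, service):
            raise PolicyError(f"{user} may not call {service}")
        return self.processes[service](request)

# Anonymous users get read-only access, echoing the Release 1 description.
def policy(user, service):
    return user != "anonymous" or service.startswith("read_")

container = CapabilityContainer(policy)
container.spawn("read_datasets", lambda request: ["dataset-1"])
```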

Capability containers, as well as other applications, are provisioned and deployed through the CEI services and infrastructure. Services are made elastic through the concept of a CEI Elastic Processing Unit (EPU), developed as part of CEI Elastic Computing.

Information is persisted in distributed DM Persistent Archives as objects in the Common Object Model. Science data is persisted and transported in the DM Common Science Data Model. All resources in the system are governed, and their metadata persisted, through the COI Resource Registry services, and accessed through the Resource Framework. The backend technology for persisting information in the system is Cassandra.

Figure 3. End-to-End System Dependencies in Release 1

CIAD OV Resource and Object Model

This page provides an overview of the Integrated Observatory Network's Resource Model, which is used to persistently describe information about physical and virtual resources in the system. It is also used to manipulate resources and their life cycle. The resource model is based on a flexible distributed object model. The page shows how information is structured in the system for representation, transport and persistence.

See the Glossary for the definition of terms Resource, Information Resource and Taskable Resource.

Integrated Observatory Resources

Categories of Resources

Resources are objects within the OOI Integrated Observatory that can be owned, described and controlled. We speak of resources under OOI governance. OOI resources are grouped into the following high level categories:

Information Resources: exist as artifacts represented in electronic form and do not exhibit active behavior
Taskable Resources: have behavior and internal state; may be physical or virtual

Resources are managed by the COI Resource Management Services; in particular they are registered with the COI Resource Registry. Resources are described by metadata attributes. Resource descriptions are objects following definitions in the common OOI object model. Resources are broadly distinguished into information resources (the resource is an actual information artifact) and taskable resources (the resource has internal state and observable behavior).
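The two resource categories can be sketched as a small class hierarchy. The attribute names below are assumptions made for illustration, not the normative ION object model.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    resource_id: str
    owner: str
    metadata: dict = field(default_factory=dict)  # descriptive attributes
    life_cycle_state: str = "NEW"                 # managed via the registry

@dataclass
class InformationResource(Resource):
    """An electronic artifact; exhibits no active behavior."""
    content_uri: str = ""

@dataclass
class TaskableResource(Resource):
    """Has internal state and observable behavior; physical or virtual."""
    state: str = "IDLE"

    def execute(self, command: str) -> str:
        self.state = "BUSY"
        return f"{self.resource_id} handling {command}"

instrument = TaskableResource("instrument-42", owner="facility-A")
```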

The following figures show an overview of the ION Object Model for resources and their descriptions. Figure 1 shows how Structured Objects are used to describe Resources in the system, with their specializations Information Resources and Taskable Resources.

Figure 1. CI Structured Objects (OV-7)

List of Resources

The following are non-comprehensive lists of the types of resources under OOI governance. List items in italics are not first-class resources in Release 1, although they may be represented as implementation concepts.

Information Resources

(COI) User Identity and Profile
(DM) Dataset: a changing collection of science data of interest (see the Science Data Model)
(DM) Topic: an identifier for an ordered stream of messages with a common property
(COI) Conversation Type: an arrangement of conversation roles, message types and message sequences
(COI) Message Type
(COI) Object Type
(COI) Conversation: an instance of an interaction pattern, and an assignment of roles to communicating entities
(COI) Policy Definition
(CEI) Process Definition
(CEI) Deployable Type
(CEI) Deployable Unit

Figure 2 shows exemplar Information Resources as Structured Objects.

Figure 2. Information Resource Structured Objects (OV-7)

Taskable Resources

(S&A) Instrument
(S&A) Data Source: an adapter to an external location of data for data acquisition
(S&A) Data Product: OOI-acquired data from a defined origin with a characterized content, such as the processing level
(COI) Resource Agent
(COI) Service
(CEI) Operational Unit
(CEI) Elastic Processing Unit (EPU)
(CEI) HA Service
(S&A) Instrument Agent (resource agent specialization)

Some Taskable Resources may be instantiated from Information Resources, such as an Operational Unit, which is instantiated from a Deployable Unit (here we sometimes speak of "executable resources"). Taskable resources also include physical devices (such as sensors) and external systems.

Figure 3 shows exemplar Taskable Resources as Structured Objects.

Figure 3. Taskable Resource Structured Objects (OV-7)

Integrated Observatory Object Model

The OOI's Common Object Model is operationally managed in the system by the COI Resource Registry, based on the COI Datastore Service. A substantial part of the object model is the description of Integrated Observatory resources, with all their specializations. Science data is also represented using the Common Object Model; see the Science Data Model.

See Also

Common Object Model
COI Data Store Service
Interaction Management (transport of objects internal and external to the system)
Common Message Format
Science Data Model

Registries and Repositories

See the Glossary for the definition of the terms Registry (service) and Repository (service).

Various registry and repository services within the Integrated Observatory are specific instances of a general-purpose resource registry and of a general-purpose information repository. Specific services are developed by the subsystems based on the general services. Specific services provide subsystem-specific Structured Object definitions (i.e., data models), access and availability parameters, and quality of service configuration, together with specific information management services.
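The layering described above might be sketched as a specialized registry adding a subsystem-specific data model check on top of a generic resource registry. Class names and required fields below are hypothetical.

```python
class ResourceRegistry:
    """Generic registry: stores a description per resource identifier."""
    def __init__(self):
        self._store = {}

    def register(self, resource_id, description):
        self._store[resource_id] = description
        return resource_id

    def lookup(self, resource_id):
        return self._store[resource_id]

class DataSetRegistry(ResourceRegistry):
    """Specialized registry layered on the generic one: enforces a
    subsystem-specific data model before delegating registration."""
    REQUIRED = {"title", "source"}

    def register(self, resource_id, description):
        missing = self.REQUIRED - description.keys()
        if missing:
            raise ValueError(f"dataset description missing {missing}")
        return super().register(resource_id, description)

registry = DataSetRegistry()
registry.register("ds-1", {"title": "CTD profile", "source": "instrument-42"})
```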

Resource Registry Services

Table 1. List of some central Integrated Observatory Resource Registry services

Subsystem Registry Service Explanation

COI Resource Registry — Descriptions of all resource types managed and governed by the CI. Descriptions of all resource instances managed and governed by the CI. All other resource registry services (for specialized resources) provide an added service layer based on this registry.

COI Service Registry — Describes services and their instances within the OOI Integrated Observatory. This includes service names, descriptions, service interfaces, service instances, characteristics etc.

DM Data Set Registry — All data and data products, including raw and derived observational data from OOI infrastructure instruments and external data sources.

S&A Data Source Registry — Describes all data sources defined within the OOI Integrated Observatory. Data sources are similar to instruments in that observation data can be acquired from them.

DM Data Stream Registry — Describes all data streams defined within the OOI Integrated Observatory.

S&A Instrument/Platform Registry — Describes all instruments and platforms.

S&A Data Product Registry — Describes managed data products that are created based on observations from instruments and external data sources. Often such data products provide a qualified (e.g., QA/QC'ed) version of some observational data.

Information Resource Repository Services

Table 2. List of some central Integrated Observatory Resource Repository services

Subsystem Repository Service Explanation

DM Information Resource Repository — A service on top of the COI Resource Registry. Provides a basic service for describing and storing information resources of any kind. All other resource repository services (for specialized resources) provide an added service layer based on this repository.

DM Persistent Archive — Persists information and data files, based on potentially distributed, replicating storage infrastructure.

S&A Data Process Repository — Repository for instrument agents and related information such as vendor-specific drivers, calibration, output data products, static sensor metadata, instrument configuration, etc. Ancillary instrument resource information such as manuals will be managed by the resource catalog.

A&S Model Definition Repository — Repository for numerical model definitions and other data product synthesis processes consuming data and producing data products. Includes parametrization and configuration information. (Not in R1)

CEI Process Definition Repository — CI core and user-provided process definitions for scheduling and instantiation when and where needed. Process definitions can be in various formats understood by the respective execution engines, such as source code, scripts, binaries etc.

CEI Deployable Type Repository — Source code and binary executable components for execution as CI processes. Can be CI core capabilities and user-provided processes.

CEI Deployable Unit Repository — Deployable units that can be instantiated at any time as execution resources throughout the OOI network by a provisioner. Such units are fully packaged and adapted to the specific execution environments they are supposed to operate in.

COI User Identity Repository — User identities and ancillary information related to the identities of users and resources governed by the CI.

COI Policy and Governance Repository — Contains resource and user attributes related to governance, policy enforcement and access control.

P&P Plan Repository — Resource use plans such as observation plans. These can be stored as templates or as specific plan instances for scheduling and later reference. Plan templates can be customized later and executed in reaction to environmental events. (Not in R1)

DM Ontology Repository — Repository for ontologies used for semantic query and mediation of science data and data products. (Not in R1)

CIAD OV Data Flows

Information Exchange

Table 1 defines information products that are exchanged between operational nodes during the execution of operational activities.

Table 1. Information Exchange

Information Product — Definition

Data — 'Data' is a term the CI tries not to use in its unqualified form.

Raw Data — Any set of information, as it is provided to the CI by an information source, that is associated with the metadata required of CI information resources. Raw data can include observed data, model outputs, and human-entered information. The CI can persist any raw data it receives from data providers in the raw data's unmodified form. The content that the CI receives and considers "raw data" might have already undergone transformation, filtering and correction by the provider before it arrives at the CI (for example, observed values transformed inside the sensor), but any such modifications are outside of the control of the CI.

Observation Data — Any set of information that has been produced by sensing a property or phenomenon in a way that produces an estimate of the corresponding physical quantity(ies), and is associated with the metadata required of CI information resources. Observation data is typically generated by an instrument capable of sensing the real world.

Qualified Data — Data that has passed an evaluation or test, and is associated with the metadata required of CI information resources. (Quality control is one example of an evaluation or test that can lead to qualified data.)

Derived Data — Any set of information that has been produced by processing other information resources, and is associated with the metadata required of CI information resources.

Data Set — A self-contained CI information resource with a specified set of characteristics, associated with the metadata required of CI information resources. Data Sets may change over time as new incremental updates arrive (for instance, recent observation data is added to the Data Set).

Data Stream — A CI information resource representing a flow of data messages from producers to consumers, with a set of characteristics agreed upon by producing and consuming parties, where the individual packets and the overall flow are associated with the metadata required of CI information resources.

Data Message — A coherent unit of information, with specific characteristics, that can be passed around the system and, when combined with 0 or more units with the same characteristics, forms a data stream.

Data Product — Any specific information resource that is associated with the metadata required of CI information resources, and can be presented externally to the OOI CI.

Command — A unit of information that contains instructions, and can be interpreted by an agent that can apply policy and translate it into a device-specific representation.

Command Packet — A Command that has been translated into a device-specific representation.

Raw Data Packet — A coherent unit of information that does not present higher-level data (e.g., science units), and is still in a device-specific representation, not yet brought into the OOI CI framework.

Engineering Data Packet — A coherent unit of information that presents higher-level data (e.g., science units), and is still in a device-specific representation, not yet brought into the OOI CI framework.
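The Data Message / Data Stream relationship from Table 1 can be sketched as follows: messages sharing the same agreed characteristics compose a stream. The field names below are illustrative, not drawn from the Common Science Data Model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Characteristics:
    """The properties agreed upon by producing and consuming parties."""
    topic: str
    encoding: str

@dataclass
class DataMessage:
    characteristics: Characteristics
    payload: bytes

class DataStream:
    """A flow of data messages; all messages must share the stream's
    characteristics, per the Table 1 definitions."""
    def __init__(self, characteristics: Characteristics):
        self.characteristics = characteristics
        self.messages = []

    def append(self, msg: DataMessage):
        if msg.characteristics != self.characteristics:
            raise ValueError("message does not match stream characteristics")
        self.messages.append(msg)

chars = Characteristics(topic="ctd.raw", encoding="cdm")
stream = DataStream(chars)
stream.append(DataMessage(chars, b"sample-1"))
```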

Figure 1 shows the hierarchical relationship of some of the Structured Objects as concerns of the relevant subsystems.

Figure 1. CI Data Elements (OV-7)

Data Collection Flow

Overview

Figure 2 shows the basic activities of data collection and refinement supported by the infrastructure elements. The figure shows the logical sequence related to these activities and not the actual activity being executed at any point in the network. The activities are shown in sequence. However, exceptional cases and feedback loops might affect earlier activities in the sequence.

Figure 2. 2650-00015 Data Collection Activity (OV-5)

Deployment View

Figure 3 below shows how these data collection activities are located in defined logical deployment packages ("points"):

The Instrument Point hosts the direct instrument/sensor access; it can for instance be deployed on a buoy hosting a sensor, or on a mobile platform such as an AUV.
The Acquisition Point hosts the logic that transforms instrument-specific command and data packets into the CI observatory services, for instance to perform data processing.
The Ingest Point hosts the services that take external data (from sensors or other observatories) and add it to CI registries and storage repositories.
The Storage Point hosts the services to store and retrieve data from disk.
The Application Point hosts the services to manipulate data and derive advanced data products, such as visualizations and numerical models.
The Access Portal hosts the presentation elements, such as a portal server.

Figure 3. 2650-00021 Data Collection Flow, logical deployment points (OV-2)

Data Product Generation

Figure 4 depicts the flexible data processing mechanism. Sensor- and application-specific data processing pipelines and workflows can be developed by sequencing individual processing, distribution and transformation steps that receive data and data products from the Data Distribution Network and make new products available as resources to the Data Distribution Network. This way, any interested party can tap into the process at any point. In addition, buffering, caching and archiving can be managed transparently by the CI infrastructure. A typical sequence involves raw data acquisition from instruments, automated calibration and unit transformation steps, human-in-the-loop metadata association and QA/QC steps, and subsequent interactive analysis and visualization.
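This tap-in-anywhere pattern can be sketched with a toy publish/subscribe network, where each processing step consumes one topic and republishes its product to another. The topic names and the calibration step are invented for illustration.

```python
from collections import defaultdict

class DistributionNetwork:
    """Toy data distribution network: steps subscribe to topics and
    publish derived products, so any party can tap in at any point."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, data):
        for callback in self.subscribers[topic]:
            callback(data)

net = DistributionNetwork()
results = []

# Step 1: a calibration step taps raw data and republishes a qualified product.
net.subscribe("raw", lambda sample: net.publish("qualified", sample * 2.5))
# Step 2: a visualization consumer taps the qualified product downstream.
net.subscribe("qualified", results.append)

net.publish("raw", 4.0)
```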

Figure 4. Data Processing and Availability (OV-5)

CIAD OV User and Application Interfaces

Overview

Figure 1 below shows an illustration of multiple OOI domains of authority (facilities). The Integrated Observatory (CI Operational Domain) is the primary point of access for end users of the system. Users can interact with the Integrated Observatory via portals (i.e., web UIs) and specialized applications such as DAP data servers. Advanced users can participate by affiliating their own facility with the OOI.

Figure 1. OOI presentation overview (OV-1)

External interfaces include:

HTML (web user interfaces)
Direct data access: DAP, such as through ERDDAP and THREDDS servers (in future releases if required)
Web Services (REST/SOAP over HTTP)

It is also possible for external applications to directly interact with ION services through Exchange messaging, provided they have appropriate authorization.

Figure 2. Integration of Web UIs and External applications

Figure 2 shows schematically how the web user interface and external applications are interfaced to the system. This figure shows only Exchange messaging interactions. In particular the Web UI, but also other external applications, such as dataset agents deployed at participating users' systems, access the ION system's services. The services to be accessed fall into two categories:

Application integration services: specialized services for the purpose of providing a simplified, specialized or aggregated interface to external applications
Infrastructure services: subsystem services as specified in the OOI CI architecture

Application integration services are purpose-built, in particular in early releases of the system before the infrastructure services are fully generalized and built out. Often, such services have a simple call interface, receiving key-value pairs coming from web user interfaces and orchestrating calls to the underlying generic infrastructure services.
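A sketch of such a purpose-built integration service follows: a thin facade accepts key-value parameters from the web UI and orchestrates backend calls. The backend service stubs and parameter names are hypothetical.

```python
# Hypothetical stand-ins for infrastructure services in DM / COI:
def find_datasets_backend(query_type):
    return [{"id": "ds-1", "type": query_type}]

def load_metadata_backend(dataset_id):
    return {"id": dataset_id, "title": "CTD profile"}

def find_data_resources(params: dict) -> list:
    """Application integration service: a simple key-value call interface
    for the UI that aggregates results from several backend services."""
    datasets = find_datasets_backend(params.get("type", "Dataset"))
    # Orchestrate a follow-up call per result and return UI-ready records.
    return [load_metadata_backend(d["id"]) for d in datasets]
```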

The OOI Integrated Observatory Network user interfaces are based on the COI Presentation Framework.

The design of the user interfaces and their user interaction workflows is developed as part of the OOI User Experience effort.

See Also

Subsystem user and application interface pages:

COI User and Application Interfaces
CEI User and Application Interfaces
DM User and Application Interfaces
SA User and Application Interfaces
AS User and Application Interfaces
PP User and Application Interfaces

Classes of OOI Users and Applications

The information contained below is outdated or incomplete. The OOI CI User Experience team is currently working on the ION user interfaces and user interaction strategies.

OOI Users information has been consolidated at the CIAD AV OOI User Persona Overview.

Exemplars and Classes of User Applications

Application class — Examples of specific applications

Statistical: R, SAS, SPSS, Matlab, Excel, Kepler (through R), ARCGIS/GRASS

Algorithmical: custom code, Matlab, Kepler, SCIRun, community codes, ARCGIS/GRASS

Visualization: VisIt, IDL, ENVI, SCIRun, Graphviz, Maya, Matlab, Covise (collaborative visualization), iView3D, Google Earth, ARCView, Excel

Transformational: XPath, XQuery, DAP/OPeNDAP, ARCGIS/GRASS

Data browsing: WebDAV, THREDDS, Google Earth

Data mining: Weka, Adam, D2K

Collaboration tools: wikis (MediaWiki), Plone, Drupal

Resource planning: Pegasus, Condor

User applications are implemented as:

Workflow
Batch processing
Scripting (Matlab, Python)
GUI app
Web portals

Mapping Users to Applications

Applications — Priority capabilities for integrating in the CI — Data Analyst / Data Producer / Operator / Observer

Statistical — Standard techniques (e.g., R, Matlab) — Yes / Partly / - / Partly

Algorithmical — Higher-level language support (e.g., Kepler, Matlab, custom codes) — Yes / Yes / - / -

Visualization — Basic 2D, 3D geospatial, Google Earth, scientific/business graphics, high-end PR graphics — Yes / Partly / - / Partly

Transformational — OPeNDAP, image transformations, geographic transformation — Yes / Yes / - / Partly

Data mining — Clustering, classification, machine learning, input/output processing, semantic data mining, regression (e.g., Weka, D2K) — Yes / - / - / Partly

Data browsing — Subsetting, search, preview, sampling (e.g., OPeNDAP, THREDDS) — Partly / - / - / Yes

Interacting with semantic data — In addition to the above: graph model with provenance and network field, tagging, annotations, sorting — Yes / Yes / Partly / Yes

Monitoring — Monitor jobs, processing, and resource status; produce a state model of instruments; track data streams — Partly / Yes / Yes / Partly

Resource planning — Reservation of resources, estimating, creating objectives, selecting resources, predicting data flow — Partly / - / Yes / -

CIAD OV Instrument Integration

Instrument and Platform Agents

Figure 1 below shows an illustration of the interfaces of the OOI Integrated Observatory Network to the regional cabled observatory network (RSN) and the coastal/global observatory network (CGSN).

Figure 1. Instrument Integration (OV-1)

This diagram shows specific instances of instrument and platform agents as operational realizations of the Instrument Agent and Platform Agent specifications. The specifications and implementation frameworks are developed by the S&A subsystem team. The specific agents are assembled, configured, tested and/or developed by the CI Implementation team. The CI Implementation team also develops a new generic framework for developing new instrument and platform drivers.

Figure 2. Instrument and Platform Agents (OV-2)

Deprecated

Figures:

OV-7 — Logical Data Model: Instrument Agent Model

SV-1 — Systems and Services Interface Description: Instrument Agent Components, IaaS Architecture, IaaS Engine Architecture

TV-1 — Technical Standards Profile: IaaS Framework Example 1, IaaS Framework Example 2, IaaS-GRIM Implementation Stack, SENSORS and OMF Proxy Architecture Model

CIAD OV External Interfaces

Overview
External Systems
CG Systems
RSN Systems

Overview

The CI system integrates physical marine observatories into a system of systems, the OOI Integrated Observatory Network (ION). Each observatory as well as the CI as integrator are realized as a Facility and can exert their own authority and define policy for their resources. Each facility can share resources with other facilities within the OOI or external to the OOI.

Figure 1 shows the application and infrastructure subsystems of the CI system, as well as implementation level components, such as instrument agents and external observatory adapters, in the context of interfaces to users and external systems.

Figure 1. CI Subsystems, Implementation Components and External Interfaces (OV-2)

See also

Integrated Observatory Network User Interfaces

External Systems

Figure 2. External Systems

CG Systems

Figure 3. CGSN Systems and Infrastructure

RSN Systems

Figure 4. RSN Systems and Infrastructure

CIAD OV External Observatory Integration

Overview

The External Observatory Integration elements externalize the Integrated Observatory Network (ION), making it interoperable with select target communities. The initial target community is NOAA IOOS, represented by specific members of the modeling community. (Later communities include NEPTUNE Canada and the World Meteorological Organization.)

"Externalization" includes the development and active targeting of community-specific user interfaces, data input and output adapters, integration tools, and scripts.

The externalization projects for early adopter users and communities provide a keener understanding of stakeholder functional, performance and quality needs, driving the incremental development of the ION releases. The Product Manager and User Experience team manage the engagement with the target user communities, with EOI primarily attended to by the Product Manager.

Coincident with and based on ION Release 1, the first integration of IOOS numerical modeler teams will be pursued. This includes exemplar data feeds related to the IOOS Data Management and Communications subsystem and used by the targeted modeling teams.

Figure 1. EOI Operational Node Connectivity Diagram

Implementation

Functionally, EOI has two major roles:

1. Facilitating publication of external data to the OOI-CI: Dataset Agent Overview ("System" view), Dataset Agent Details (specifics)
2. Facilitating subscription-based retrieval of OOI-CI resources: Dispatcher

CIAD EOI Dataset Agent Overview

EOI Dataset Agents

In order to leverage the capabilities of the Java NetCDF library for the transformation and translation of data sources into the CDM form required for ingestion into ION, the Dataset Agents are developed in Java. At this time, the Java Capability Container (ioncore-java) has matured to the point where it can facilitate AMQP messaging; however, lifecycle state is not yet implemented. For this reason, the Java dataset agents are launched and monitored by "control" agents that reside within one or more Python Capability Containers. Through this mechanism, dataset agents can have their lifecycle state maintained by the ION system, while still leveraging the power of the Java NetCDF library.
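The control-agent pattern can be sketched as a Python process that starts, monitors and terminates an external worker on behalf of the system. A real JavaAgentWrapper would launch a JVM process; a Python one-liner stands in here so the sketch is self-contained, and the lifecycle states are illustrative.

```python
import subprocess
import sys

class ControlAgent:
    """Toy control agent: maintains lifecycle state for an external
    worker process that cannot report it itself."""
    def __init__(self, command):
        self.command = command
        self.proc = None
        self.lifecycle_state = "NEW"

    def start(self):
        self.proc = subprocess.Popen(self.command)
        self.lifecycle_state = "ACTIVE"

    def stop(self):
        if self.proc and self.proc.poll() is None:
            self.proc.terminate()
        self.lifecycle_state = "TERMINATED"

    def monitor(self):
        """Return the worker's exit code, or None while it is running."""
        return self.proc.poll()

# Stand-in for launching a Java dataset agent process.
agent = ControlAgent([sys.executable, "-c", "print('dataset agent up')"])
agent.start()
agent.proc.wait()
```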

Dataset Agents - System Level View

Details: Refer to CIAD EOI Dataset Agent Details for the particulars of the Dataset Agent architecture.

Overview

Dataset Agents consist of a tightly coupled "JavaAgentWrapper" + "DatasetAgentController" pairing. Each of these "units" has the following characteristics:

- is bootstrapped and monitored as an Elastic Processing Unit (EPU) by CEI,
- is generic: a dataset agent can be used to obtain data from any of the supported data sources,
- is reusable: once a dataset agent performs its update action, it is free to perform an update for any other data source, and
- responds to only one (1) update request at a time.

Figure 1. Dataset Agent Overview - (google doc)

Dataset Agent Ingestion Message Sequence Chart

The following MSC describes the communication patterns between the Dataset Agent (a pairing of the Java Agent Wrapper, or "JAW Instance" in the diagram, and Dataset Agent Controller, or "Dataset Agent Instance" in the diagram), and the various ION services (e.g. Resource Registry, Ingest Service) with which the dataset agent interacts.

Figure 2. Dataset Agent Message Sequence Chart - (google doc)

CIAD EOI Dataset Agent Details

EOI Dataset Agent Detail

This page provides fine grained information about the internal architecture of the Dataset Agent, including details about the internal communication/workflow, lifecycle-state (activation and termination), and data/metadata processing pathways.

Overview: Refer to CIAD EOI Dataset Agent Overview for an overview of the Dataset Agent architecture.

Dataset Agent Diagram

A Dataset Agent is responsible for the acquisition and transformation of a specific dataset from an External Observatory (such as Sensor Observation Service data from NOAA's National Data Buoy Center). A Dataset Agent comprises two main parts: the DatasetAgentController and the JavaAgentWrapper (aka "Control Agent"). The DatasetAgentController is an external process (Java) that is tightly coupled to a Control Agent (Python), which can start, stop and monitor the DatasetAgentController process. A given Dataset Agent instance can communicate with the Control Agent (for status and life-cycle messages) as well as directly with other ION services (for instance, transmitting data to the Ingest Service instance).

Figure 1. Dataset Agent Diagram - (google doc)

State Diagram - Activation

The activation sequence is initiated by standard Service Life Cycle events, as in other ION services. However, since the JavaAgentWrapper encapsulates an external process, it must postpone transitioning to the "ACTIVE" state until its underlying parts have successfully activated. This diagram highlights the actions and state changes necessary to accomplish delayed activation, wherein the JavaAgentWrapper remains in the "READY" state until its DatasetAgentController sends notification that it too is active.
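The delayed-activation rule can be captured as a tiny state machine. A minimal sketch, assuming illustrative state and message names (they are not the actual ION identifiers):

```python
class JavaAgentWrapperSketch:
    """Delayed activation: the wrapper follows the standard life cycle but
    holds in READY until the external DatasetAgentController reports that
    it, too, is active."""

    def __init__(self):
        self.state = "NEW"
        self.controller_started = False

    def initialize(self):
        self.state = "READY"

    def activate(self):
        # Standard activation request: start the external process, but do
        # NOT transition to ACTIVE yet; wait for the child's notification.
        self.controller_started = True   # placeholder for spawning the Java side

    def on_controller_notification(self, msg):
        # Only the controller's own "active" message completes activation.
        if msg == "CONTROLLER_ACTIVE" and self.state == "READY":
            self.state = "ACTIVE"
```

The point of the sketch is the gap between `activate()` and the transition to "ACTIVE": the wrapper's state is driven by its child, not by the caller.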

Figure 2. Dataset Agent State Diagram - Activation - (google doc)

State Diagram - Termination

The parent/child relationship between the JavaAgentWrapper and DatasetAgentController is not only highly coupled but is designed to guarantee that the Life Cycle State of the whole (the "Dataset Agent") fully reflects that of its parts. This termination diagram highlights two use cases in which a shutdown may occur in either component; as a resolution, both units are taken down to ensure consistent state throughout.

Figure 3. Dataset Agent State Diagram - Termination - (google doc)

Data/Metadata Acquisition Pathways

This section describes the pathways by which data is brought into the system and how it is transformed/augmented during the process. Changes to both the form of the data, and its associated metadata, are covered.

Data Acquisition Overview

Initial work has focused on two main groups of datasets: NetCDF and "ASCII". These two source types have different initial processing, but follow a single processing chain once the data has been transformed to the Unidata CDM. This separates the processing unique to each source type from the processing that can be performed generically once a common form is achieved (such as calculation of geospatiotemporal ranges and conversion to the OOI Common Data Model).

Figure 4. Data Acquisition Overview - (google doc)

"ASCII" Data Acquisition

ASCII datasets (such as those coming from SOS, USGS and AOML) are parsed from their "raw" form into an intermediate working data structure. This structure allows generic determination of the 'feature type' of the data (i.e. station, station profile, trajectory, etc.). Once the feature type has been determined, the appropriate NcML "template" is selected and used to generate a NetcdfDataset (Unidata CDM) object, which is filled with the data from the external observatory. Metadata from the source is applied to the NetcdfDataset object (as global attributes) on an agent-by-agent basis. This "intermediate" NetcdfDataset then continues through the processing chain as described in the "Overview" diagram.
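The feature-type-to-template selection step can be illustrated with a small sketch. The template file names and the inference heuristic below are assumptions for illustration; the real agents work on parsed observation records rather than plain dicts.

```python
# Hypothetical NcML template registry, keyed by CDM feature type.
NCML_TEMPLATES = {
    "station": "station.ncml",
    "stationProfile": "station_profile.ncml",
    "trajectory": "trajectory.ncml",
}

def infer_feature_type(records):
    """Rough heuristic: a fixed location is a station; varying depth at a
    fixed location is a station profile; a moving platform is a trajectory."""
    positions = {(r["lat"], r["lon"]) for r in records}
    depths = {r.get("depth", 0.0) for r in records}
    if len(positions) > 1:
        return "trajectory"
    return "stationProfile" if len(depths) > 1 else "station"

def select_template(records):
    # The selected template is then used to build the intermediate
    # NetcdfDataset that enters the common processing chain.
    return NCML_TEMPLATES[infer_feature_type(records)]
```

A station feed with one fixed position selects the station template; records whose positions vary select the trajectory template.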

Figure 5. Data Acquisition - ASCII - (google doc)

NetCDF Data Acquisition

The 'data_source' (ooici internal object) of each NetCDF dataset contains an 'ncml_mask', which is a string describing a Unidata NcML. This NcML "mask" is used to add/remove/modify variables and attributes in the dataset to ensure it is CF and CDM conformant. At this time, the mask is generated by hand for each dataset that EOI registers with the system. User-registered datasets must already be CF compliant to be registered. The mask is opened by the NetCDF-Java library and directly yields the "intermediate" NetcdfDataset, which then continues through the processing chain as described in the "Overview" diagram.

Figure 6. Data Acquisition Pathway - NetCDF - (google doc)

CIAD EOI Dispatcher

Dispatcher

The Dispatcher is a client-side deployment of an ION Capability Container that hosts an instance of the Dispatcher Service. Each deployment of a Dispatcher is registered with ION and given a unique identifier (UUID). The UUID is the unique ID of the "DispatcherResource" object. The DispatcherResource that represents a dispatcher deployment is used during the subscription process to link the installation with ION resources. When notifications about a resource are generated, this linkage allows the appropriate dispatcher to react.

In the scope of ION Release 1, the dispatcher's primary responsibility is to start scientific workflow scripts that must retrieve new data and then perform processing actions on the data in preparation for use in scientific models.

Dispatcher Workflows

The diagram below shows four color-coded processes: how the dispatcher is connected into the OOI system, how a user pairs data resource events with a dispatcher processing script, how notifications are delivered, and how to disconnect from notifications that are no longer of interest.

Dispatcher Initialization

During initialization (startup) of a dispatcher deployment, the following steps are performed:

1. Check for the existence of a local "configuration" file.
   a. If the configuration file does NOT exist (only on the very first startup of the dispatcher deployment), the dispatcher uses the Resource Registry to generate a DispatcherResource. This DispatcherResource has a UUID, which is written to a local configuration file.
   b. If the configuration file DOES exist, the UUID is read from the file and the ResourceRegistry is called to retrieve the DispatcherResource.
2. Subscribers are generated for receiving "New" and "Delete" Modification Notifications.
3. The dispatcher continues its initialization by retrieving all DispatcherWorkflowResource(s) that are associated with the DispatcherResource and generating a DatasetChanged Subscriber for each DispatcherWorkflowResource.
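Step 1 above (reuse a persisted UUID or register a new DispatcherResource) can be sketched as follows. The registry class, config format and method names here are illustrative stand-ins, not the ION Resource Registry interface.

```python
import json
import os
import uuid

class InMemoryRegistry:
    """Stand-in for the ION Resource Registry, for illustration only."""
    def __init__(self):
        self.store = {}

    def create(self, kind):
        resource = {"id": str(uuid.uuid4()), "kind": kind}
        self.store[resource["id"]] = resource
        return resource

    def get(self, resource_id):
        return self.store[resource_id]

def load_or_register_dispatcher(config_path, registry):
    # First startup: no config file, so register a new DispatcherResource
    # and persist its UUID locally for subsequent startups.
    if not os.path.exists(config_path):
        resource = registry.create("DispatcherResource")
        with open(config_path, "w") as f:
            json.dump({"dispatcher_id": resource["id"]}, f)
        return resource
    # Later startups: read the UUID back and retrieve the same resource.
    with open(config_path) as f:
        dispatcher_id = json.load(f)["dispatcher_id"]
    return registry.get(dispatcher_id)
```

Running the function twice against the same config path yields the same DispatcherResource, which is what ties one physical deployment to one registered identity.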

See the PubSub and Notification architecture pages for details on Publishers, Subscribers and Notifications.

Subscription Creation and Deletion

A user can create a subscription for their dispatcher via the WebUI in Release 1. The user selects the data resource of interest from a list of registered data resources and enters the path to a local script to process the notifications. The Application Integration Services (AIS) retrieve the DatasetResource, DatasourceResource and DispatcherResource information for this user from the Resource Registry. A new dispatcher workflow is created and associated with both the dispatcher and the user to support internal bookkeeping. A new subscription event is then published, which contains the DispatcherResource_ID as the origin.

The user will also be able to remove this subscription when the processing of this data resource is no longer necessary. First the association for the dispatcher workflow resource is removed from both the user and the dispatcher resource. Then a deleted subscription event is published which contains the dispatcher id as the origin.

To see how these two notification events are processed, refer to the Subscription New or Delete Notification section below.

Dataset Change Event Notification

When new data is published for a subscribed DatasetResource, the ingest service will receive and process the data update. When the ingestion process has completed, a dataset change event notification is sent to the topic for this data set. Dispatchers that have subscribed to that topic will receive that notification and may launch the workflow script associated with that data set.
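The notification-to-workflow step can be sketched as a small subscriber that maps dataset topics to local workflow scripts. Class and topic names are assumptions; a real dispatcher would launch the script as a subprocess where this sketch only builds the command.

```python
class DatasetChangeSubscriber:
    """Dispatcher-side sketch: pair dataset-change topics with the user's
    local workflow scripts and build the launch command on notification."""

    def __init__(self):
        self.workflows = {}   # dataset topic -> workflow script path
        self.launched = []    # record of launch commands, for visibility

    def subscribe(self, topic, script_path):
        self.workflows[topic] = script_path

    def on_notification(self, topic, dataset_id):
        script = self.workflows.get(topic)
        if script is None:
            return None       # not a dataset this dispatcher subscribed to
        # Hand the updated dataset's id to the user's processing script;
        # a real dispatcher would subprocess.Popen(cmd) here.
        cmd = [script, dataset_id]
        self.launched.append(cmd)
        return cmd
```

Notifications for topics without a registered workflow are ignored, matching the behavior that only subscribed dispatchers react.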

Subscription New or Delete Notification

When a user either subscribes to a new dataset or deletes an existing subscription, AIS generates a "New" or "Delete" notification event, respectively. Each of these events contains a reference to a particular DispatcherWorkflowResource. The dispatcher contains Subscribers for these two event types and will either add or delete the appropriate DatasetChanged subscriber as indicated by the DispatcherWorkflowResource that is provided within the notification.

Dispatcher Diagram

Figure 1. Dispatcher Diagram Original (Google Doc)

CIAD OV Operations Support Systems

This diagram shows various support (enabling) systems needed for the operations of the OOI Integrated Observatory (CI) system.

CIAD SV Integration Strategy

The software Integration Strategy for the OOI Integrated Observatory Network rests on the following concepts, provided by the Common Operating Infrastructure (COI) subsystem.

Figure 1. Integrated Observatory Network Software Integration through COI Exchange and Infrastructure Services

Integration Strategy Elements

In particular, the integration strategy rests on:

- Capability Container: Capability Container OV, Python Capability Container, Java Capability Container
- Distributed State Framework: Distributed State Management, Resource Registry
- Secure reliable messaging via the "Exchange": Messaging Concepts in the Exchange, Interaction Management, The Exchange (OV)
- Service-oriented architecture: Rich Service Architecture, Service Framework, Service Integration
- Non-central governance: Governance Framework (OV), Interacting Agents (see below)
- COI Infrastructure Services: Common Operating Infrastructure (COI)

Interacting Agents

(this is work in progress)

Agents play a central role in the management of the Integrated Observatory Network (ION) as a federated system of systems with no central domain of authority. Observatories and organizations can join the ION and autonomously choose their degree of collaboration. The ION as a system realizes a federation of such "facilities".

However, for OOI core assets (sensors, physical marine infrastructure, computation and storage resources), there will be some OOI-wide governance in place, with the possibility of independent operation of the Marine Observatories. Other facilities position themselves in their relationship to OOI (as data consumers, user platforms, science gateways or resource contributors).

Agents realize the domains of authority within the system in software and operational processes. Agents exist for resources such as instruments, compute clusters and software services. Thereby, the agents inject the authority of the owner and of the operating facility of the resources. Functionally, agents provide a command and control interface for a resource, represent the resource state, send out event notifications in case of failures and state changes, and provide a list of the resource's capabilities.
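The four functional duties named above (command and control, state, event notification, capability listing) suggest a common agent interface. A minimal sketch, with method and event names invented for illustration:

```python
class ResourceAgent:
    """Sketch of the generic resource-agent interface: command/control,
    resource state, event notification, and capability listing."""

    def __init__(self, resource_name):
        self.resource_name = resource_name
        self.state = "INACTIVE"
        self.listeners = []          # notification callbacks

    def get_capabilities(self):
        # Capability listing: the commands this agent's resource supports.
        return sorted(m[4:] for m in dir(self) if m.startswith("cmd_"))

    def execute(self, command, *args):
        # Command and control entry point.
        return getattr(self, "cmd_" + command)(*args)

    def _notify(self, event):
        # Event notification on failures and state changes.
        for listener in self.listeners:
            listener(self.resource_name, event)

class InstrumentAgent(ResourceAgent):
    """Hypothetical concrete agent for an instrument resource."""

    def cmd_start_sampling(self):
        self.state = "SAMPLING"
        self._notify("STATE_CHANGE:SAMPLING")
        return self.state

    def cmd_stop_sampling(self):
        self.state = "IDLE"
        self._notify("STATE_CHANGE:IDLE")
        return self.state
```

A concrete agent only adds `cmd_*` handlers; the capability list and notification plumbing come from the base class, which mirrors how subsystems contribute specialized agents over a common framework.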

Dependencies

The concept of agent based resource management is developed across several subsystem teams:

COI subsystem:

- Defines the generic services and processes related to resource management within the OOI ION
- Provides the services to define and manage facilities of independent domains of authority
- Develops the governance services that manage and interact with agents
- Develops the identity management and policy enforcement capabilities needed when working with agents and resources
- Contributes the governance functions of a resource agent
- Advances the conceptual background for contracts, commitments, and facility affiliation
- May lead the resource agent definition and integration effort

CEI subsystem:

- Manages resources with behavior and state (taskable resources) through flexible management services
- Defines the resource agent to management services interface
- Applies resource agents for computational resources (physical and software)
- May lead the resource agent definition and integration effort

DM subsystem:

Limited involvement in the area of information resources and their governance

S&A subsystem, with instrument and platform agents (IPA):

- Applies resource agents for instrument (sensor) and platform resources
- Drives the concepts of proxy agents representing remote

Timeline

In Release 1 there will be only one facility, the OOI ION facility. This means one domain of authority but multi-site deployment. Therefore, any issues related to multiple domains of authority will not be addressed in this release.

The agent concept will be evolved first with a focus on resource management (control, monitoring, failure detection, access policy enforcement and capability listing).

In Release 2, Marine observatory facilities will be supported. Two marine facilities will be operated within OOI for CG and for RSN.

In Release 3, laboratory and classroom facilities will be supported. Some of these will be operated by OOI. Others may be operated by other organizations that have an interest in interacting with OOI.

In Release 4 and 5, interactive observatory facilities will be supported. They are a logical extension of Marine facilities with the purpose of

Background

See Also

- System AVs: OOI operations overview and membership model; OOI deployment and multi-facility concept
- System OVs: Resource Agents; Instrument Integration
- Subsystem Level: COI Governance Framework; COI Resource Management Services; SA Instrument and Platform Agents

CIAD SV Deployment Strategy

Integrated Observatory Domains of Authority vs. Deployment of CI software

As introduced in the Operations and Deployment Overview, the OOI Integrated Observatory consists of a number of operational domains of authority, with responsibility for physical resources and infrastructure resources. Operational domains include the two marine observatories (CGSN and RSN), the Integrated Observatory operations (CI), and the end user domains of authority, which include the EPE IO infrastructure and other user domains. Figure 1 depicts the OOI operational domains and their network connections.

Figure 1. OOI Integrated Observatory Operational Domains (OV-1)

CI software elements in the form of CI Capability Containers are present throughout the entire OOI integrated observatory network, from deployment locations aboard mobile instrument platforms such as AUVs, to marine infrastructure (for instance, supporting global mooring platform controllers), to observatory network operations CyberPoPs.

The CI deployment strategy specified in this section determines how CI software elements are packaged for deployment, deployed and

provisioned such that they can become part of the OOI integrated observatory as a highly distributed, partially intermittently connected system of systems. Figure 2 provides an illustrative figure depicting a scenario with CI capability containers, depicted as octagon-shaped elements, deployed throughout the OOI network, governed by different domains of authority.

Figure 2. OOI Deployment Scenario (OV-1)

Another helpful perspective on the deployed elements of the system is given by the end-to-end illustration of the subsystem and service dependencies.

Virtualization and Cloud Computing Strategy

All CI deployed computation on the terrestrial side of the OOI Integrated Observatory leverages virtualization. The CEI provides the infrastructure that enables Integrated Observatory operations to operate a system-of-systems deployed on virtualized components, thereby abstracting the applications and system software completely from the underlying hardware and network.

The CEI introduces a specific type of operational unit (i.e. a virtual server instance), the Execution Engine. The basic type of execution engine provides one or multiple Capability Containers, hosting system services, processes and agents. Specialized, user-application-specific execution engines, for instance Kepler scientific workflow engines or an engine that can run a numerical model, can be provisioned on demand to meet OOI end user demand.

The virtualization strategy is a cornerstone of the OOI Integrated Observatory operations. Core system services and infrastructure, such as instrument management, data acquisition, data product generation, resource management, data persistence and identity management are deployed in virtualized environments in one of the OOI operated Terrestrial CyberPoPs across the country. This means that although cloud computing and virtualization concepts are applied, the computation is fully controlled under the authority and system administration of OOI Integrated Observatory operations, not of any external commercial or academic cloud provider.

To support sporadic, intermittent or longitudinal user requested computation, external commercial and academic cloud provider infrastructure is accessible seamlessly from the Integrated Observatory network. Commercial cloud providers that will be supported include Amazon EC2 and Microsoft Azure, and potentially the Google AppEngine cloud. Academic cloud computation providers include the TeraGrid and the OpenScienceGrid and various science clouds. This means that users can define their algorithms as "processes" that can be executed by execution engines, deployed on external cloud computing resources, seamlessly accessing the core services and data distribution network of the Integrated Observatory. Costs for such user demanded computation may either be covered by research grants, or by the requesting user organizations.

It is possible to operate non-production ION system instances, such as for development, testing and experimentation completely deployed within external cloud computing environments. OOI users and external organizations will be able to operate ION system instances, for instance for experimentation, standalone operation, or OOI affiliated operation, completely outside of OOI computational resources, or on cloud-based systems on user organization's infrastructure, such as a compute cluster with CEI infrastructure deployed.

The OOI Integrated Observatory Cloud Computing and Virtualization Strategy is based on the following additional assumptions:

- The ION will integrate TeraGrid resources and the Open Science Grid (OSG) as cloud execution providers, under the umbrella of the Science Cloud. This will eliminate the dependencies on the currently provided execution environments (e.g. TeraGrid gateways) that many scientists have problems interfacing with.
- This strategy is based on strong relationships with major commercial cloud computing and storage providers under the umbrella of larger academic organizations such as CENIC. The two commercial providers currently targeted are Amazon, for their Elastic Compute Cloud (EC2), and the Microsoft Azure cloud services. Based on these relationships, special rates for execution and transit cost may be negotiated to the advantage of OOI users and operational budgets.
- External cloud execution resources will be used as overflow capacity. All core marine observatory interfaces, core data product generation including QA/QC, infrastructure monitoring and data management will be provided on secure, reliable OOI CI hardware (CyberPoPs) with appropriate capacity and contingency for growth.
- OOI users will be able to use ION services within their own processes (such as numerical model executions), on CI Execution Points deployed within the cloud environments. Initially, the CI may cover the cost of providing the requested cloud execution resources. When computational demand increases among the user base, specific researcher-provided grants can cover the execution cost.
- Execution cost billing is based on direct cost agreements between the users and the cloud providers. Users can consume execution time and storage as needed from the cloud providers and will be billed directly. The CI IO provides data caching and access within the different cloud environments, as well as the high bandwidth communication infrastructure and all CI services.
- The CI will operate designated communication infrastructure and peering points to the cloud providers and operate extensive observational data caches within these environments to eliminate the substantially high transit cost to and from the cloud networks. Transit cost is about $0.15 per GB.
- The cost of cloud storage (under full RAID conditions) is in the same order of magnitude as acquiring, deploying, operating and maintaining designated hardware resources, when considering a 3-year hardware refresh/replacement cycle and the cost of 1 FTE to operate 1 PB of storage and related execution resources, and when transit cost is eliminated or fixed. CI performed a gross storage cost calculation in 2006; cost per year and TB ranged between $700 (net) and $1800 (Amazon cloud, gross). Institutional infrastructure storage cost (SDSC) was in between.
- The OOI operations strategy is based on fixed cost for storage. All data will reside redundantly in the OOI operated physical infrastructure as long-term storage, with an approximate capacity of a redundant 100 TB in the main Acquisition Point. Cloud providers are used as a cache.
- Data transits once, after acquisition from the marine observatories. A fixed amount of data is cached in the cloud (e.g. in Amazon S3), close to computation in CI execution points. For excess storage and archiving, deep storage is provided by NSF infrastructure (e.g. NCAR storage allocation).
- The main storage location is the Portland Acquisition Point, providing at least 100 TB redundantly within the physical location, with sufficient computational resources to receive and process the data volumes coming from the RSN cabled network and the 10 GigE link. Geographic redundancy in the primary storage for all data is not paid for by CI infrastructure but is realized indirectly by the replication and caching strategy.
- The yearly estimated data volume is initially ~40 TB, with most volume coming from hydrophones and seismometers. HD video footage coming from the RSN infrastructure is stored in lossless compression for time spans on the order of weeks, in lossless compression for scientist-selected sections of interest for time spans on the order of months and years, and in lossy compression without initial limit. Any other instrument-produced and derived data, including raw HD video footage, is available through the OOI network in real-time streaming without accessing a datastore.
- The applied CyberPoP deployment pattern is based on reliability and scalability. Depending on the type and function of a CyberPoP, there will or will not exist redundant execution, storage and networking resources. The primary Acquisition Point (Portland) provides absolute RAID6 data redundancy.
- OOI accounts for about 40 CPUs total of designated and operated hardware for the acquisition, distribution and management points. The biggest installation is the Portland Acquisition Point, with about 20 CPUs total.
- The CI operates a designated Layer-2 distribution network across the country based on switched 10 GigE LambdaRail infrastructure. It is a full ring, not just a set of point-to-point connections.
- 3 distribution points are accounted for as 10 GigE peering points to major infrastructure and cloud providers: in McLean, VA (Amazon East), Seattle (Microsoft), and San Diego (TeraGrid, Amazon West). Partnering organizations can directly connect to the high bandwidth distribution network for immediate and direct OOI data access.
- Example scenarios for user access to data are (1) using GridFTP and Layer-3 routing, and (2) an application accesses a CI Distribution Point using GridFTP, which then uses the Layer-2 network and efficient data transfer protocols to reach the Acquisition Point. All of the external OOI CI services, including web portals, are deployed on the Distribution Points.
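A back-of-the-envelope check of the transit and storage figures quoted above (about $0.15 per GB transit, $700 to $1800 per TB-year storage, ~40 TB initial yearly volume) shows why the strategy insists that data transit only once:

```python
# Figures as quoted in the assumptions above.
TRANSIT_PER_GB = 0.15        # USD per GB of cloud transit
YEARLY_VOLUME_TB = 40        # initial yearly data volume
GB_PER_TB = 1024

# Cost of moving one full year's data into (or out of) a cloud, once.
transit_cost = YEARLY_VOLUME_TB * GB_PER_TB * TRANSIT_PER_GB
print(f"One full transit of a year's data: ${transit_cost:,.0f}")

# Storage at the quoted $700-$1800 per TB-year band.
low, high = 700 * YEARLY_VOLUME_TB, 1800 * YEARLY_VOLUME_TB
print(f"Yearly storage for {YEARLY_VOLUME_TB} TB: ${low:,}-${high:,}")
```

One full pass of a year's data costs on the order of $6,000 in transit alone, comparable to months of storage at the low end of the band, so repeated transit to and from cloud networks would dominate; caching close to computation keeps it a one-time cost.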

Deployment Mechanisms and Activities

The Common Execution Environment (CEI) provides the underlying mechanisms and services for the deployment, execution and operation of any CI core and user provided functional capabilities at various locations throughout the OOI observatory network with heterogeneous execution environments.

The Common Execution Infrastructure Overview provides detailed definitions and domain models for the relevant concepts of the ION Deployment Strategy. For instance, domain models show the dependencies of implementation, deployment and operational artifacts, and of activities related to these artifacts, such as provisioning.

The basic deployment steps, as depicted in Figure 3, include:

- Within the CI Software Development Environment, software components get packaged and versioned. A (Software) Component Repository exists to manage the components, any source code, binaries, make files and version information.
- Bundles (configurations) of software components represent Deployable Types, which are consistent executable software packages independent of any requirements of specific execution environments.
- The CEI provides services to automatically create Deployable Units from Deployable Types by adapting them to a target operational environment. This adaptation step might include binding contextualization code and configuration properties specific to a target environment, and packaging in a specific binary format, such as a virtual machine image. These Deployable Units still have no identity, can be stored as blueprints in caches and repositories, and can be automatically provisioned and instantiated as needed in the remote target execution environments.
- The CEI provides the services to provision Deployable Units within these environments. The instantiation of a deployable unit requires the execution of the contextualization activity by the newly instantiated unit within the target operational environment. Contextualization assigns an identity to the unit and registers its resources and services within the network. A fully instantiated and initialized unit realizes an Operational Unit.
- The CEI provides services to transparently cluster a number of operational units of the same type into Ever-Present Units (EPUs). Such EPUs have one virtual identity and address to the environment and perform the designated services provided by the type of operational units. The CEI provisioner ensures that enough instances of Operational Units are available to guarantee basic availability, and provisions additional instances to meet dynamic demand, realizing scalability.
- The CEI provisioner has the capability to provision new instances of operational units within seconds where required within the network. All steps occur automatically.
- The CI provides efficient access to cloud computing and storage environments, such as the Amazon Elastic Compute Cloud and the TeraGrid, such that true elastic computing can be realized.
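The artifact chain described above (Deployable Type to Deployable Unit to Operational Unit, clustered into an EPU) can be modeled with a few dataclasses. This is a domain-model sketch only; the class and field names are assumptions, not the CEI implementation.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class DeployableType:
    """Bundle of software components, independent of any target environment."""
    name: str
    components: list

    def adapt(self, environment):
        # Bind environment-specific contextualization/config (e.g. a VM image).
        return DeployableUnit(type_name=self.name, environment=environment)

@dataclass
class DeployableUnit:
    """Identity-less blueprint, cacheable and provisionable on demand."""
    type_name: str
    environment: str

    def instantiate(self):
        # Contextualization assigns an identity to the newly running unit.
        return OperationalUnit(type_name=self.type_name,
                               identity=str(uuid.uuid4()))

@dataclass
class OperationalUnit:
    """Fully instantiated, initialized and registered unit."""
    type_name: str
    identity: str

@dataclass
class EPU:
    """Cluster of same-type operational units behind one virtual identity."""
    virtual_name: str
    units: list = field(default_factory=list)

    def scale_to(self, blueprint: DeployableUnit, n: int):
        # The provisioner keeps n instances alive to meet demand.
        while len(self.units) < n:
            self.units.append(blueprint.instantiate())
```

Scaling an EPU simply instantiates more operational units from the same identity-less blueprint, which is the crux of the elastic model: identity is assigned at contextualization time, not at packaging time.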

Figure 3. Deployment Workflow (OV-5)

System Bootstrapping and Startup

Both the COI and CEI subsystems are symbiotically responsible for bringing the OOI Integrated Observatory Network, or any similar or test installation, into being. Such a system start occurs on behalf of responsible operators, who have provided deployment and configuration information beforehand.

For details, see the System Bootstrapping and Startup page.

Capability Container Deployment

Figure 4. 2650-00021 Data Collection Flow (OV-2)

This is partly outdated and will be updated shortly

Figure 5 illustrates the deployment of tailored CI Capability Container services for the data collection activity scenario throughout the observatory network. It makes use of the various CyberPoP configurations introduced in ION Integration and Deployment. This specific scenario assumes a low bandwidth, intermittent connection between the physical instrument interface of the CI, with its acquisition capabilities, and the remainder of the CI network. Because of the resource-constrained nature of this scenario, the Instrument Point Capability Container provides only general CI infrastructure and instrument proxy services. The Acquisition Point Capability Container illustrates the use of infrastructure services to implement the processes in the data collection activity scenario. Both Instrument Point and Acquisition Point are deployed as Marine Execution Point (MEP) CyberPoPs.

Figure 5. 2660-00002 Capability Container Deployment Model (Global Mooring Scenario)

On the shore side, the Ingest Point (deployed in an Observatory Acquisition Point CyberPoP, OAP) is responsible for accepting data from a low bandwidth satellite-based network and for providing data repository ingestion, cataloging, and metadata association services.

The process execution infrastructure service provides the filtering and triggering processes in this Capability Container. An additional function supported by the Capability Container is the presentation capability implemented in the Access Portal Capability Container. It contains portlets for session management and Google Earth presentation, drawing upon design references from standards-based portal frameworks, as well as an HTTP container. The infrastructure elements that support the Data Collection Activity of a global-scale, buoyed observatory are one deployment scenario. This spans the entire range of deployed systems and networks, from ocean-based instruments to CyberPoPs and user applications.

The Instrument Point represents the interface between the physical world and the CI; it comprises proxies that provide a programming interface to the instruments. The Acquisition Point provides instrument control and data acquisition and transmission functions. It comprises a process and instrument controller and a data acquisition subsystem. Researcher-supplied triggers initiate data acquisition processes that the process controller translates into commands for the instrument controller.

Data from the instruments transit from the instrument controller to the acquisition component, where researcher-supplied filters result in either new data acquisition or transmission of the data to the Ingest Point. Data are sectioned appropriately by a segmentation component for transmission and handed over to the transport broker, which internally may use an Object Ring Buffer (ORB) and the communications controller to locally store and transmit the data, respectively, based on network availability and Quality-of-Service (QoS) constraints. At the Ingest Point, data arrive via the local communications controller and transport broker. The latter feeds data correction and ingestion components. The ingested data, along with their metadata, are buffered via the local storage broker.

The storage broker interacts with the Storage Point that offers repositories for data and services, as well as data and metadata cataloging. The researcher has multiple ways to view and further process data. First, an Access Portal supports data access and presentation via web and other browsers. It interfaces with the Storage Point via a local storage broker and offers components for search, navigation, and presentation. Second, an Application Integration Point supports data access, transformation and analysis via a set of analysis tools. It also connects to the Storage Point via a local storage broker, and offers programming interfaces so that researcher-supplied analysis and transformation processes can access and manipulate the data.

In this deployment scenario, the Sensing & Acquisition subsystem provides the services deployed on the Instrument Point, namely the platform and instrument agents; the acquisition and segmentation components, transport broker, ORB, and communications controller of the Acquisition Point; and the communications controller and transport broker of the Ingest Point. These draw upon BRTT Antelope® components for the integration of existing instrument adapters and data processing components. The Data Management subsystem, comprising the storage provider with its repositories and catalogs at the Storage Point and the federated storage brokers of the Ingest Point, Access Portal, and Application Integration Point, is built from UCSD iRODS components. Presently, v1.0 of the ROADNet PoP (RPoP) integrates Antelope and SRB, the predecessor of iRODS, in a small, low-power, low-cost Linux box using Intel XScale processors. The next operational release will include web services support.

The Common Execution Infrastructure is implemented using researcher-provided filter and trigger processes in the Acquisition Point, a data correction process in the Ingest Point, a presentation process in the Access Portal, and the transformation and analysis processes in the Application Integration Point, where the visualization and modeling capabilities of, for example, MATLAB are also provisioned.

The science data management functionality of the Data Management subsystem, comprising the ingest and metadata cataloger components at the Ingest Point, the metadata-based search and navigate components of the Access Portal, and the navigate component of the Application Integration Point, is implemented based on design references from the MBARI Shore Side Data System (SSDS). The repositories and catalogs at the Storage Point are implemented using OOI-specific adaptations of the iRODS repositories and catalogs currently deployed for BIRN/Telescience, with the necessary extensions to house service repositories. All infrastructure elements are implemented using a Rich Services Deployment Pattern and are shown with their particular plug-ins.

CIAD SV CyberPoPs

Cyberinfrastructure Point of Presence (CyberPoP)

The primary deployment element of the OOI Integrated Observatory is the Cyberinfrastructure Point of Presence (CyberPoP).

A Terrestrial CyberPoP is the primary physical hardware and software deployment location of the OOI Integrated Observatory. It is a geographical deployment site of the OOI Integrated Observatory Network with physical plant operations providing computation, storage, and network resources. Terrestrial CyberPoPs are OOI Configuration Items and are commissioned according to the Transition to Operations plan; they include two Observatory Acquisition Points (OAP), three Observatory Distribution Points (ODP), and one Operations Management Point (OMP). A special form of Terrestrial CyberPoP is the Observatory Execution Point (OEP), which is operated on the infrastructure of external computation and cloud providers.

Marine CyberPoPs are OOI deployment sites in the Marine Networks, operated by the CGSN and RSN Marine Observatories (see External Interfaces). They host CI-developed Integrated Observatory Network software deployments, such as Instrument and Platform Agents.

The Integrated Observatory application and infrastructure services hosted by a CyberPoP are deployed within CI Capability Containers, as described in the Integration Strategy and Deployment Strategy.

CyberPoPs provide the central connection points to the OOI National Internet Infrastructure (NII), as specified in the network architecture.

OOI-operated Terrestrial CyberPoPs

There are three functions of Terrestrial CyberPoPs:

Observatory Acquisition Point (OAP)
Observatory Distribution Point (ODP)
Operations Management Point (OMP)

The Observatory Acquisition Point (OAP) is a hardware environment to be deployed within a protected data center facility, comprising a CI capability container configuration that provides the primary point of access to the CI for Marine observatories, together with all the necessary computational, storage, and network resources in a redundant layout. It provides a highly reliable, scalable, and secure environment for data acquisition, initial data processing such as segmentation and QA/QC, and data preservation.

The Observatory Distribution Point (ODP) is a hardware environment comprising a CI capability container configuration for OOI data distribution across the distribution network and for peering with external network providers and cloud execution and storage environments, such as the Amazon Elastic Compute Cloud and the TeraGrid.

The Operations Management Point (OMP) is an environment comprising hardware and a CI capability container configuration deployed at various physical locations close to marine observatory and CI control centers, providing observatory network and resource operations and state of health monitoring capabilities.

Non OOI-operated Terrestrial CyberPoPs

In addition, there is one kind of Terrestrial CyberPoP that runs on the infrastructure of computation and cloud providers and is not operated by OOI:

Observatory Execution Point (OEP)

The Observatory Execution Point (OEP) is a CI capability container configuration to be deployed either on OOI-operated hardware or in cloud execution environments. OEPs can be provisioned on demand when required for the execution of user processes, such as numerical models and data visualizations. The CI provides Common Execution Infrastructure services for the elastic provisioning of such CyberPoPs in cloud environments, such as Amazon EC2 and Scientific Clouds based on the TeraGrid.

Types of Marine CyberPoPs

There is one function of Marine CyberPoPs:

Marine Execution Point (MEP)

The Marine Execution Point (MEP) is a hardware environment and a CI capability container configuration to be deployed in science payload hardware environments of marine observatory infrastructure, such as aboard global buoys and AUVs. MEPs interface with proprietary instrument and platform controller software and represent their resources and capabilities to the OOI network. MEPs do not modify or replace existing software and hardware installations; instead, they provide a layer on top of them with direct connectivity to the OOI Integrated Observatory Network. The hardware configuration in an MEP deployment is limited in terms of available computational, storage, power, and bandwidth resources. The MEP is designed to be independent of the computational and storage hardware environments embedded in off-the-shelf marine infrastructure and instrumentation components. However, the software environment around the CI capability container supports direct deployment on available hardware, provided sufficient power, computational, and storage resources are available.

In addition, there is a CI configuration item with CI software that supports the development of Instrument Agents and Drivers:

Instrument Development Kit (IDK)

The Instrument Development Kit (IDK) is an environment comprising hardware and a CI capability container configuration that will be used for dry and wet system testing of sensors and instrument platforms and their driver software, before their actual deployment on marine observatory OOI infrastructure in the field.

The National Internet Infrastructure (NII) provides the communication network environment for the OOI Integrated Observatory. For its high-bandwidth (data) distribution network, it is based on a CI IO-operated, dedicated Layer-2 10 Gigabit Ethernet network loop around the US using National LambdaRail infrastructure. Furthermore, it makes use of routed Internet2 IP network infrastructure to provide access to the public Internet and to serve as a redundant, lower-bandwidth management network for the distributed OOI installation sites. The different CyberPoP configurations are clients of these networks.

CyberPoP Dependencies

Figure 1 shows the dependencies of the various types of CyberPoPs.

Figure 1. 2660-00013 Types of physical CI deployments, CyberPoPs (SV-1)

CyberPoP Deployment

Figure 2 shows a high level overview of the deployment sites of Integrated Observatory CyberPoPs, their CyberPoP function and National Internet Infrastructure connectivity.

Figure 2. 2660-00014 CyberPoP Deployments (SV-1)

CyberPoP Internal Decomposition

The CyberPoP is a collection of capabilities dispersed across multiple networks. These networks are implemented via physical isolation (e.g., dedicated switches) or virtualization (e.g., tagged VLANs).

Public Network

From an external, third-party perspective, the CI provides a number of distribution points (CyberPoPs in San Diego, Seattle, and Chicago) that interface with the Internet and provide classic services, such as web-based access and VPN, as well as more advanced messaging capabilities via the AMQP protocol (subject to deep message inspection and filtering). These CyberPoPs provide a firewalled, public-facing network, as depicted in Figure 3; they are the only entry points into the OOI network from the outside world.

Peering Network

All CyberPoPs feature the so-called peering network, which operates over Layer-2 circuits at the physical level and messages at the application level (via a private message router). The peering network spans the entire OOI-CI for data distribution. The Marine IO capabilities are integrated with the CI at this level through appropriate interfaces and processing. This strategy is essential for addressing the security concerns presented by the US Navy: the acquisition points are isolated from all Internet connectivity, and all CI traffic flows through well-formed messages subject to deep inspection, policy, and filtering.
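The deep-inspection step can be illustrated with a minimal filter that admits only well-formed messages. This is a Python sketch only; the required fields, the allowed operations, and the size policy are hypothetical stand-ins for the actual OOI message schema and policy rules.

```python
# Hypothetical policy: required envelope fields and permitted operations.
REQUIRED_FIELDS = {"sender", "receiver", "op", "payload"}
ALLOWED_OPS = {"get", "put", "subscribe"}
MAX_PAYLOAD_REPR = 65536  # crude size bound on the serialized payload

def inspect(message):
    """Deep-inspect one decoded message; return True only if it may pass."""
    if not isinstance(message, dict):
        return False                      # not a well-formed envelope
    if not REQUIRED_FIELDS <= message.keys():
        return False                      # missing mandatory fields
    if message["op"] not in ALLOWED_OPS:
        return False                      # operation not permitted by policy
    if len(repr(message["payload"])) > MAX_PAYLOAD_REPR:
        return False                      # payload exceeds the size policy
    return True
```

In a deployment, a filter of this shape would sit behind the message router, so that raw IP traffic never crosses the boundary and only messages passing policy are forwarded.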

Services Network and Underlying Technologies

The Services Network accommodates traffic for ION services, which may interface with other implementation technologies via the implementation network. This layered approach provides separation of concerns and traffic isolation, which improves overall system security and performance.

Figure 3. 2660-00015 SV1 CI CyberPoP Components

See also:

CyberPoP General Development Strategy
Incremental CyberPoP Rollout
CyberPoP Internal Connectivity
CyberPoP Physical Layout
CyberPoP Management

CIAD SV CyberPoP General Development Strategy

Secure Scalable Service Platform Deployment Pattern

This section specifies a logical deployment pattern for a secure, high-availability, scalable service installation. This pattern is customized and applied in the deployment of each terrestrial CI CyberPoP, as defined above. The pattern is described independently of the OOI CI, as it was first developed in the context of business information and exchange systems and later adapted to fit the needs of the OOI CI deployment.

The service installation focuses on satisfying the following requirements, in priority order: Security, Performance, High Availability, Scalability, and Offsite Management. Figure 1 illustrates the logical deployment pattern for the CI CyberPoP in relation to the different users that need to interact with the Installation, with the Internet as the intervening communication infrastructure between them.

Figure 1. 2660-00006 Logical Deployment Model (SV-2)

Security is achieved by the isolation of access and the separation of functionality. Figure 1 shows the complete separation of the production environment from the management environment. It also shows the isolation of the external "end user" logic from the internal service components. Isolation provides a layered defense, just as a corporation employs with a DMZ model. The principle is that nothing of intrinsic value is deployed on the External Production servers placed on the Public network. The Internal Production servers are placed on the Service network, which is an isolated and secure network. The Service network has no direct connectivity in or out to the Internet. All production assets of value are stored on servers attached to this network or its sister network, the Data network. The Management network is completely inaccessible from the production environment; all connections between the Management and Production environments must be established from the Management network. Services from one network are made available to other networks (e.g., DNS or ODBC) by presenting them as Virtual IP addresses. One of the fundamental design principles, from both a security and a performance point of view, is not to use IP packet routing ("Layer 3" routers) in the network infrastructure: it is too easy to open up holes by accident, and based on the flow of production traffic, routing does not serve production value.

Performance is obtained by removing as much in-line packet inspection from the outbound traffic as possible and through scalable concurrent execution. The first aspect is implemented by using an exclusively switched network infrastructure. The second is achieved by decomposing the logic of the system into independent functional concerns that can be linearly scaled through the addition of incremental resources. The lines of decomposition for the Service are ordered: first by user-session-independent processing, then by shared read-only processing, and finally by shared read/write (transactional) processing.

High Availability (HA) of the physical plant network is the result of implementing a fully redundant system, such that no single failure will bring down the whole system, and of conducting regular testing of failure scenarios. There are multiple scales of concern when addressing this aspect of the system. The scales range from ensuring that every hardware system receives power from two independent sources and pathways to those sources, through the duplication of every hardware and software system component in the installation, to the duplication of the installation in geographically diverse locations. The implementation of an HA system is not an all-or-nothing strategy. With regard to the scope of redundancy to be addressed, there are risk vs. cost trade-offs. Lower-level redundancy such as power and Internet connectivity can be delegated to the hosting (a.k.a. collocation, "colo") facility. Geographical diversity can be addressed at a later phase in the project's maturity. This pattern provides system redundancy within the Installation and operates on the premise that the colo facility will provide uninterrupted power and continuous bandwidth within an environmentally resilient facility.

The core network infrastructure components (i.e., firewalls, Application Routers, and Ethernet switches) are all deployed in pairs. Based on final cost constraints, a decision has to be made at each site, and also at the CI level, on whether to run these pairs in Active/Active or Active/Passive mode. Active/Passive mode is typically less expensive because the passive system in a pair runs only a backup copy of the active system's operating system, but it has approximately half the performance of an Active/Active configuration. The Ethernet switches are deployed in an Active/Active configuration with VLANs trunked across them. All network infrastructure components should be run with redundant power supplies, the appropriate cross-connects between HA pairs, and multi-homed uplink connections between the HA network infrastructure layers (i.e., Application Routers, Ethernet switches, and firewalls; see Figure 2).

Different options for server redundancy can also be considered. Both the power and the network connectivity can be duplicated for a full HA solution. Duplicating the power is only a matter of cost. Duplicating network connections to both switches (multi-homing) and configuring the network interface cards (NICs) to fail over the connection when one of the switches fails is feasible, but incurs additional complexity to implement and test. Where servers are homed to only one switch and one power source, when a switch fails, the portion of the computing capacity connected to that switch is lost. At the smallest deployment scale, this can represent half of the capacity of the installation. Similarly, if a power supply or the NIC fails, the server is lost. However, as the installation scales, the loss of an individual server becomes less significant and is in fact an event expected to happen.

A sensible recommendation is to run the initial production installation with separate physical server pairs for the Service, Data, and Management VLANs. This places six servers in operation within the installation. If any of the four production servers fails, a portion of one of the Management servers can be configured into the production network until the failed server can be replaced. The Management servers have separate duties within the Management VLAN, but should be resynced to ensure that the full capabilities of both servers are available in case one of the physical Management servers fails.

Scalability of the computing and storage infrastructure is achieved by adding more units. In most cases this can be done dynamically by the Application Router, through its mapping of VIPs to pools of servers, without requiring any portion of the Service to shut down.
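The VIP-to-pool mapping that makes this live scaling possible can be sketched as follows. This is a minimal Python illustration of the mechanism; the class and method names are ours, and a real Application Router would of course do this in hardware or firmware.

```python
class ApplicationRouter:
    """Maps each VIP to a mutable pool of real servers; round-robin dispatch."""
    def __init__(self):
        self.pools = {}   # vip -> list of real server addresses
        self.next = {}    # vip -> rotating dispatch index

    def add_server(self, vip, server):
        """Scale up: add capacity to a VIP's pool without any downtime."""
        self.pools.setdefault(vip, []).append(server)

    def remove_server(self, vip, server):
        """Scale down / drain: new traffic stops going to this server."""
        self.pools[vip].remove(server)

    def route(self, vip):
        """Pick the next real server for traffic arriving at the VIP."""
        pool = self.pools[vip]
        i = self.next.get(vip, 0) % len(pool)
        self.next[vip] = i + 1
        return pool[i]
```

The key property is that `add_server` and `remove_server` mutate the pool while `route` keeps serving, which is exactly what lets capacity be added without shutting down any portion of the Service.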

Management covers a number of independent concerns and roles involved with the operations and maintenance of the Installation. The main decomposition of concerns is between content-, application-, and system-level management responsibilities. The Management VLAN is designed with an independent access mechanism to ensure the separation between users and staff. The VPN ensures the confidentiality of the communication between the Installation and any Management Point, as well as providing the convenience of being on the Management VLAN at the remote Management Point after establishing the connection. The VPN system gives the Installation administrator the ability to govern from where (i.e., which IP addresses) the Installation can be managed, independent of who (i.e., which usernames) can manage it. The recommendation is that a VPN be used with a hardware token generator (RSA SecurID) for two-factor authenticated access.
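The hardware-token scheme itself is proprietary, but its open analogue, the RFC 6238 time-based one-time password (TOTP), illustrates the mechanism: token and server derive the same short-lived code from a shared secret and the current time window. A standard-library-only Python sketch:

```python
import hashlib
import hmac
import struct

def totp(secret: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 time-based one-time password (HMAC-SHA1 variant)."""
    counter = int(at // step)                  # 30-second time window
    msg = struct.pack(">Q", counter)
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                 # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def verify(secret: bytes, code: str, at: float, step: int = 30) -> bool:
    """Accept the current window and its neighbors to tolerate clock drift."""
    return any(totp(secret, at + d * step) == code for d in (-1, 0, 1))
```

This is an illustration of two-factor token verification in general, not of the specific RSA product named above.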

Virtual Technologies

Figure 2 illustrates the recommended deployment of service components within the virtual network and server environment. These components (IP addresses, LANs, Servers) all have physical and virtual representations.

Figure 2. 2660-00011 Component Deployment Model (SV-2)

Figure 3 provides a simplified view of the core physical assets and their interconnectivity. To understand how the Installation can economize by using few physical assets to implement the more complex Network Deployment Model, it is necessary to have a basic understanding of the mapping of virtual to real LANs, IP addresses, private networks, and servers.

Figure 3. 2660-00005 CyberPoP Hardware Deployment Model (SV-2)

The Virtual LAN (VLAN) is used to segment an Ethernet switch into multiple, completely isolated networks. Assigning physical ports on the switch to one or more VLANs accomplishes this. An important addition to the VLAN concept is the "tagged" VLAN, which allows a single connection between two devices (a physical network segment) to carry traffic for multiple VLANs. The implication of these two capabilities is that one connection between the Application Router and the switch can carry the traffic of all the isolated VLANs that need to present their services to another VLAN. This means one Application Router can be used to support the remote introduction of VLANs and alterations to the placement of servers on multiple VLANs without having to make physical wiring modifications to the Installation.
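The tagging itself is the IEEE 802.1Q header: the frame's EtherType field carries the TPID value 0x8100, followed by a 16-bit tag whose low 12 bits are the VLAN ID. A short Python sketch of extracting the VLAN ID from a raw Ethernet frame:

```python
import struct

TPID_8021Q = 0x8100  # EtherType value indicating an 802.1Q tag

def vlan_id(frame: bytes):
    """Return the 12-bit VLAN ID of a tagged Ethernet frame, or None if untagged."""
    # Bytes 0-11: destination + source MAC; bytes 12-13: EtherType / TPID.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return None
    # Tag Control Information at bytes 14-15: PCP (3 bits) | DEI (1) | VID (12).
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF
```

A switch port configured as a trunk simply preserves this tag on egress, which is what lets one physical link carry several isolated VLANs.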

The Virtual Private Network (VPN) allows a device in one network segment to join another network segment by making an IPsec tunnel over the intervening inter-network. Typically, once a device is incorporated into a VPN, it may not communicate with devices on its network of origin directly. This prevents the incorporated device from becoming a router between the two networks, operating outside the control of the network environment of the VPN. The capability of tunneling is a significant productivity enhancement for remote management of the production installation. VPNs are usually combined with strong authentication, such as certificates and/or single-use token mechanisms. It is recommended that the VPN be protected with the use of single-use tokens (i.e., SecurID token generators in combination with RADIUS and LDAP servers). This access mechanism only applies to the content, application, and system management personnel.

The use of a Virtual IP address (VIP) allows a cluster of IP addresses to handle the traffic directed at a single IP address. The class of networking equipment originally called "load balancers" and more recently "Application Routers" manages VIPs and their associated pools of IPs. The names arise from their capability of providing a wide range of rules for how the IP traffic arriving at the VIP is directed to its pool of IP addresses. If all the connections to the VIP are stateless, such as HTTP requests, then a simple round-robin or server-load-based routing rule works. If the connections are stateful, such as an ODBC connection, then the traffic can be partitioned based on the source IP address of the connection; in this case, all traffic for an established connection is routed to the same IP address in the pool. With newer technologies, routing can also be based on content within the packet, commonly referred to as content-based routing. It is recommended that this feature be employed at the point when it becomes advantageous to support the notion of a stateful user session cached on a server.
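The two routing rules can be sketched in a few lines of Python. The hash-based affinity function stands in for the source-IP partitioning described above; the function names and pool representation are purely illustrative.

```python
import hashlib

def round_robin(pool, state):
    """Stateless traffic (e.g., HTTP): rotate evenly through the pool."""
    i = state["i"] % len(pool)
    state["i"] += 1
    return pool[i]

def source_ip_affinity(pool, src_ip):
    """Stateful traffic (e.g., ODBC): the same client address always lands
    on the same pool member, preserving the established connection's state."""
    h = int(hashlib.sha256(src_ip.encode()).hexdigest(), 16)
    return pool[h % len(pool)]
```

Content-based routing extends the second idea by hashing or matching on fields inside the packet payload rather than on the source address.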

The most recent addition to the family of virtualized components ready for production use is the Virtual Machine (VM). Advances in the past five years in VM efficiencies, the continual improvements in CPU performance, and the emergence of multi-core processors with tens of gigabytes of memory make this technology an extremely effective deployment choice. A single physical server with two quad-core processors and 16 gigabytes of RAM can be segmented into one real and seven Virtual Machines, each securely isolated with a single processor and 2 gigabytes of RAM. Each virtual machine can host an independent set of functionality based on a separate operating system, communicating on different VLANs. This capability could be used for a low-cost initial deployment of all the Production and Management Services on a single redundant pair of physical servers (not our recommendation). One open issue that needs testing is whether tagged VLANs and VMs are compatible and secure. If not, it will only mean that each physical server will need a physical network interface for each VLAN of which it is a member.
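The partitioning arithmetic in the example above (two quad-core processors and 16 GB of RAM yielding one real machine plus seven VMs of one core and 2 GB each) generalizes as a trivial sketch; the function and parameter names are ours:

```python
def partition_host(cores, ram_gb, per_vm_cores=1, per_vm_ram_gb=2, host_reserve=1):
    """Number of isolated VMs that fit on one physical server, reserving
    one VM-sized share of cores and RAM for the real (host) instance."""
    by_cpu = (cores - host_reserve * per_vm_cores) // per_vm_cores
    by_ram = (ram_gb - host_reserve * per_vm_ram_gb) // per_vm_ram_gb
    return min(by_cpu, by_ram)   # the scarcer resource bounds the VM count
```

For the server described in the text, 8 cores and 16 GB of RAM, both resources bound the count at seven VMs, matching the one-real-plus-seven layout above.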

Physical Site Deployment Strategy

The decomposition of the CI Services into deployment packages evolves with the understanding of the nature of user demand and the behavior of the code base. That said, the lines of decomposition chosen for a distributed system at the outset of a development effort tend to persist for a long time, given the difference between local and remote execution methodologies for invoking functionality. It is important to identify the major lines of decomposition and validate them early in a product's lifecycle in a principled way:

Separate the business logic from the data management,
Separate the End User and the Content Management business logic,
Separate the Content Files (i.e., images, binaries, etc.) from the Database,
Using the End User perspective, separate the read-only Database tables from the transactional tables.

As per Figure 3, this results in five functional groups: two business logic, two database, and one file management, with the business logic and database groups incorporated into four separately deployable virtual packages:

User Session package containing WWW and Rights Management logic,
Content Management package containing Content and Catalog update logic,
Database package containing the top-level database logic and all the catalog-specific logic,
Data Cluster package containing the transactional logic of the database.

The File Management functionality amounts to a distributed file replication mechanism relying on rsync and driven by the Content Management business logic, so it does not need a separate VM package. In addition to the VM packages directly associated with the Service, the recommendation is to assemble a set of Installation Support packages:

A Base Server package, from which all other packages are derived, containing the standard security and management functionality expected of systems in the Installation,
A System Services package containing the system-level services that support the business service (i.e., HTTP proxy, DNS, NTP, sendmail),
A Management Services package containing all of the monitoring and service validation functionality,
An AAA Services package containing the management environment authentication, authorization, and auditing (logging) functionality.

It is recommended that all VM packages be maintained in a central repository and scripts developed to parameterize their deployment into the different Installations and specific host environments on which they will be run.
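Parameterized deployment from a central repository can be sketched with a simple template. The template format, field names, and site catalogue below are purely illustrative; a real script would drive a hypervisor or provisioning tool rather than render a string.

```python
import string

# Illustrative package descriptor; a real one would carry many more fields.
TEMPLATE = string.Template(
    "host=$host vlan=$vlan package=$package version=$version"
)

def render(site, package, version):
    """Instantiate one VM package configuration for a specific Installation."""
    return TEMPLATE.substitute(
        host=site["host"],       # target physical/virtual host at this site
        vlan=site["vlan"],       # VLAN the package's services attach to
        package=package,         # package name from the central repository
        version=version,         # release version being deployed
    )
```

The point of the pattern is that the package body stays identical across Installations; only the substituted parameters differ per site, which is what keeps Channel deployments reproducible.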

Based on the deployment model of supporting multiple independent distribution Channels with the same Service platform, it is recommended that these packages be made available to the distribution Channels as a shared resource, giving the Channels the responsibility to localize, deploy, and manage their own Installations. This may cause some differences in quality of delivery across the Channels, but that can be mitigated by:

Developing local packaging mechanisms that can incorporate localized functionality on top of the base platform without altering the base, and
Providing a validation/certification program for Channel installations.

To ensure efficient and reliable evolution of the Service, it is recommended that a formal environment and set of procedures be established for the promotion of new versions from development to production through the controlled exposure of the new release to different testing regimes. The recommended pipeline of testing regimes is:

Development Installation for development-driven unit and regression testing as well as performance analysis,
QA Installation for the formal in-house regression, performance, and user testing,
Staging Installation for formal external user and integration testing as well as the product promotion of the next release.

The installation of the development test environment needs to reflect the service decomposition only; it can reside on a different hardware configuration, as minimal as a single server supporting multiple VMs. The QA installation should represent the service and system-level configuration; it need not address the High Availability aspect of the Production environment. The Staging installation should mimic the production environment exactly, to ensure that all installation scripts and network configurations have been thoroughly vetted prior to deployment in production. With some care, using the deployment package repository and the VM configurations, the server and networking infrastructure for Development, QA, and Staging can be shared. This will limit the total cost of the complete testing environment to 1x the minimum Production installation.

The Service version upgrade strategy in Production is a non-trivial concern, especially if upgrades are to be accomplished while the system remains running. It is important to institutionalize the fact that the QA process continues into the initial days of a Release in Production. To support this premise and mitigate any serious disruption to the service, it is strongly recommended that, as a minimum strategy, immediate fallback to the previous Service Release be put in place during the initial period of any upgrade. More sophisticated schemes can be devised, based on concurrent support for two versions in a phased exposure of a new Release to the user base. Both models can be supported with the proposed Installation architecture, and the recommendation is to start with the simpler switch-over model until demand (scale) and experience with the Service are gained. The fundamental tool for managing the Release in both cases is the use of the Application Router's VIPs to direct traffic to the particular sets of VMs that represent the two releases. Scaling most of the Service Installation can be accomplished while the service is in operation. That said, all alterations to the service should be done during quiet periods in the Service's weekly usage cycle until a thorough understanding of the upgrade procedures has been developed and practiced. The network infrastructure can for the most part be incrementally scaled through equipment upgrades and the incremental addition of duplicate equipment. Scaling the services and servers is also only a matter of adding duplicate equipment. With one exception, these can be added while the service is running. The exception is scaling the Data Cluster, as mentioned above: it will require a restart of the Data Cluster to incorporate additional data nodes. To do this while the Service is running will require a Service software strategy for suspending interactions with the transactional tables (i.e., User, Phone, Subscription, Transaction) for a period of time, on the order of less than 2 minutes.

CIAD SV Incremental CyberPop Rollout

Example: San Diego CyberPoP

Phase 1

Phase 2

Phase 3

CIAD SV CyberPoP Internal Connectivity

Core CI Network

The core CI network (National Internet Infrastructure) consists of a "ring" of CyberPoPs interconnected via high-performance 10Gbit links. CyberPoPs may include one or more collections of capabilities according to their purpose (see CyberPoPs):

Operations Management Point (San Diego)
Observatory Acquisition Point (Portland, Woods Hole)
Observatory Distribution Point (San Diego, Seattle, Chicago)

As such, their internal connectivity, physical layout, and management capabilities may vary.

San Diego (SAN)

Figure 1. 2990-00010 SV2 CI CyberPoP San Diego Connectivity

Portland (PDX)

Figure 2. 2990-00020 SV2 CI CyberPoP Portland Connectivity

Seattle (SEA)

Figure 3. 2990-00030 SV2 CI CyberPoP Seattle Connectivity

Chicago (ORD)

Figure 4. 2990-00040 SV2 CI CyberPoP Chicago Connectivity

Woods Hole (WH)

Figure 5. 2990-00050 SV2 CyberPoP Woods Hole

Optional CyberPoPs

The following sites are candidates for future extensions of the OOI-CI for peering/collaboration purposes with other academic institutions. They are not essential for the operations of the OOI-CI, but may improve external connectivity (i.e., available bandwidth) for certain users.

McLean (IAD)

Figure 6. 2990-00060 SV2 CI CyberPoP McLean Connectivity

Los Angeles (LAX)

Figure 7. 2990-00070 SV2 CI CyberPoP Los Angeles Connectivity

CIAD SV CyberPoP Physical Layout

The physical equipment necessary to support the operation of each CyberPoP follows industry standards for collocation. Hence, each site contains 2-4 standard 19" 40U telecom racks (1U = 1.75in height) for computation, networking, and storage equipment with side-mounted PDUs.

San Diego

The San Diego CyberPoP already contains equipment intended for program operations and CI development purposes. In Figure 1, this equipment is marked in yellow, whereas the equipment planned for purchase is left unmarked. The installation consists of four racks: storage; networking and monitoring; virtualization and spare capacity (overflow); and engineering. Each rack has a dedicated UPS to account for transient power failures. Additional equipment intended for the development environment (e.g., switches) is housed in a different location at the hosting institution (Calit2).

Figure 1. 2990-00011 SV2 CI CyberPoP San Diego Physical Layout

Portland

The Portland CyberPoP will contain equipment intended for data acquisition, processing, and local storage prior to distribution to the other CyberPoPs. The installation depicted in Figure 2 consists of four racks: storage, network, computation, and overflow (spare capacity for further expansion of computation/storage resources). Each rack has a dedicated UPS to account for transient power failures.

Figure 2. 2990-00021 SV2 CI CyberPoP Portland Physical Layout

Seattle

The Seattle CyberPoP will contain equipment intended for data storage/mirroring and distribution. The installation depicted in Figure 3 consists of two racks: storage and network. Each rack has a dedicated UPS to account for transient power failures.

Figure 3. 2990-00031 SV2 CI CyberPoP Seattle Physical Layout

Chicago

The Chicago CyberPoP is identical to the Seattle CyberPoP and will contain equipment intended for data storage/mirroring and distribution. The installation depicted in Figure 4 consists of two racks: storage and network. Each rack has a dedicated UPS to account for transient power failures.

Figure 4. 2990-00041 SV2 CI CyberPoP Chicago Physical Layout

CIAD SV CyberPoP Management

Overview

The management of each CyberPoP is performed via a dedicated network isolated from the scientific data flows. Each site contains at minimum a pair of load balancers, a message router, core IP networking devices (e.g., for DNS, DHCP), network monitors, a proxy/cache, a testing unit, remote access servers and the related security token manager, KVM, and environmental monitors. All this equipment is interconnected via a pair of switches: one for low-level protocols, such as SNMP, DRAC, and ILOM, and the other for more complex protocols, such as VPN/IPsec, VNC/remote desktop, etc.

San Diego

The San Diego CyberPoP contains additional equipment for overall CI operation and management, and also for staging of new configurations.

Figure 1. 2990-00012 SV2 CI CyberPoP San Diego Management

Portland

Figure 2. 2990-00022 SV2 CI CyberPoP Portland Management

Seattle

Figure 3. 2990-00032 SV2 CI CyberPoP Seattle Management

Chicago

Figure 4. 2990-00042 SV2 CI CyberPoP Chicago Management

CIAD SV Network Architecture

High Level Network Architecture

Figure 1 specifies the OOI network architecture, enabled by CI hardware and software infrastructure. The main elements of the network architecture are explained below; the CI operated elements are directly related to the CyberPoP configuration items as introduced in Transition to Operations.

Figure 1. OOI Network Architecture (SV-2)

The Marine observatory networks operated by CGSN and RSN consist of the physical infrastructure, deployed instrumentation such as sensors and mobile assets, and of shore stations with infrastructure management systems. Marine observatory assets can contain a CI software deployment, the Marine Execution Point (MEP) CyberPoP, which will interface directly with asset resources and provide CI services and CI connectivity directly at the local execution point.

The primary interface to the CI is realized in the Acquisition Point. The Acquisition Point is the Observatory Acquisition Point (OAP) CyberPoP, based on a CI Capability Container implementation deployed in a physical location, such as a rented data center facility, with a local deployment following the secure, reliable deployment patterns (see below). The Acquisition Point provides all the capabilities of the Sensing & Acquisition and Data Management subsystems, based on the COI and CEI infrastructure services. If needed, further services can be deployed. Thereby, observational data from the Marine observatories is accepted, stored reliably, and made available to the remaining network and the public as needed. The Acquisition Point also interfaces with the infrastructure management systems of the Marine observatories and enables management and control of the Marine infrastructure, in particular the deployed instrumentation.

The Distribution Network (part of the National Internet Infrastructure (NII) configuration item) is a 10 Gigabit Ethernet switched Layer 2 network forming a loop throughout the US. It is based exclusively on rented-bandwidth agreements using the National LambdaRail infrastructure and will be operated by the CI IO. The distribution network is accessed by the various Acquisition Points and further CyberPoP installations of the OOI network, such as Distribution Points and user-operated Integrated Research Facilities.

The Management Network (part of the National Internet Infrastructure (NII) configuration item) is a Layer 3 IP network based on Internet-2 infrastructure, with much lower bandwidth guarantees but redundantly available, connecting the different CyberPoPs and in particular providing management and operations access to the various network sites. The CI also enables direct access to observatory infrastructure and instrumentation through the management network, as required for instance by instrument providers for low-level access and failure recovery.

The Marine Management and Observatory Management (realizing the Marine Operations Point MOP) nodes are based on a CI Capability Container implementation and provide network and resource monitoring, management and control capabilities. They also provide means and interfaces for operations control room consoles.

The Distribution Point refers to the Observatory Distribution Point (ODP) CyberPoP that acts primarily as a peering point into the various execution sites that provide data access, manipulation, analysis, and visualization. It is also based on a CI Capability Container.

The Execution Point is a deployment of a CI Capability Container, listed as the Observatory Execution Point (OEP) CyberPoP, providing user-targeted data access, manipulation, and visualization capabilities. Execution Points are instantiated at the various execution sites that the CI supports and has contractual agreements with. The CEI provides services to provision and manage such execution points, realizing elastic computing services. User-integrated numerical models, graphical visualizations of observational data, and other processes can be scheduled and executed. Web portals providing access to the OOI will be deployed on execution points on CI-operated hardware, while user-requested executions will be deployed in Cloud execution environments.

The Internet-2 IP network provides access to execution points from anywhere in the Internet, for web browsers, user software tools such as Matlab and other applications.

If users choose to join the OOI network, they can operate their own Integrated Research Facility with a software installation based on the CI Capability Container. This facility can integrate directly with the OOI network and provides access to all OOI resources and services, subject to policy. Users can tap directly into the OOI distribution network over a 10 GigE high-bandwidth communication link by providing (renting) a communication line to an OOI peering point.

Network Layout

Figure 2 shows the full physical layout of the OOI network, indicating the configuration of deployed CyberPoP installations, network connectivity, and physical sites.

Figure 2. OOI Network Deployment (SV-2)

IP Network Connectivity

At the IPv4 level, the OOI-CI operates AS-446985 (autonomous system) for peering agreements with academic and tier 1 and 2 providers (e.g., Amazon, Microsoft) in select locations. Figure 3 shows the IPv4 allocation at each location and the planned routing mechanisms. The Chicago and Seattle installations are similar in terms of hardware and performance characteristics, whereas the Portland and San Diego installations are geared for large-scale storage and for operations and development, respectively. The Los Angeles and McLean sites are optional installations for collaboration/peering with other providers and academic institutions, subject to available funding.

Figure 3. OOI CI IPv4 Connectivity (SV-2)

The planned CyberPoP installations operate within the same OOI-CI IPv4 address space, with dedicated allocations at each site. Global load balancing is employed for external access to CI resources via the distribution points (San Diego, Seattle and Chicago) under corresponding protocols (e.g., HTTP). Management of IPv4 core assets is centralized (San Diego) and performed via the CI backbone (dual paths to any site within the proposed "ring") or VPN in case of catastrophic backbone failure.

CIAD SV Technology List

This is the authoritative list of technologies for the CI with all its subsystems. The list is authored as a living document on Google Docs.

A recent export of the list is attached: 2130-00004_Technology_List_CI_2011-04-12_ver_1.04.pdf

The technology list demonstrates the CI's understanding of the existing technologies and the plans for integrating them. In doing so, it communicates both developer competence and the CI's direction.

CIAD TV Technical Standards

This page and child pages list technologies, standards and specifications relevant to the system level of the Integrated Observatory Network.

Architecture Level Specifications

FIPA Specifications
Rich Services Architecture Pattern

Implementation Level Specifications

COI Subsystem

OTP Platform design principles
AMQP Messaging

CIAD TV FIPA Specifications

This page describes the FIPA Specifications and their relevance for the OOI Integrated Observatory Network architecture and system implementation.

FIPA Overview

FIPA is the IEEE Foundation for Intelligent Physical Agents. "FIPA is an IEEE Computer Society standards organization that promotes agent-based technology and the interoperability of its standards with other technologies." (FIPA Web Page)

See the FIPA Abstract Architecture spec (HTML version) for an overview of FIPA.

Relevant Links

http://www.fipa.org/
http://www.fipa.org/repository/index.html
http://www.fipa.org/repository/standardspecs.html
http://www.fipa.org/repository/experimentalspecs.html

FIPA to OOI CI Correspondence

Relevance for OOI

The architecture of the OOI Cyberinfrastructure is significantly informed by and aligned with the FIPA specifications. The OOI Integrated Observatory Network represents an implementation of the FIPA abstract architecture. The FIPA specifications together with the OTP specs are the core material to understand the abstract and the concrete parts of the OOI CI architecture and implementation.

Relevant FIPA Specifications

FIPA Specification | Relevant Content | Comment

SC00001 FIPA Abstract Architecture Specification | Abstract architecture overview and definition of terms | Most abstract overview of the FIPA architecture, applicable to OOI CI. See the correspondence of concepts and terms below.

SC00023 FIPA Agent Management Specification | |

SC00026 FIPA Request Interaction Protocol Specification | Protocol specification | Basic interaction protocol to request service/agent actions

Other interaction pattern specs | Protocol specifications |

SC00061 FIPA ACL Message Structure Specification | |

XC00079 FIPA Agent Software Integration Specification | | How services advertise their capabilities to the network

XI00082 FIPA Network Management and Provisioning Specification | | How a network can be managed to provide desired capabilities

XI00083 FIPA Personal Assistant Specification | | The representation of users within the ION system as built by the COI and UX teams

All specs can be downloaded as a tar-gz archive through this link

Correspondence of Terms and Concepts

FIPA Concept | OOI CI Concept | Comment

Agent Platform | Capability Container, ION Facility | The CC (the Python implementation in R1) represents the Agent Platform. All CCs belonging to the ION facility can be considered part of the same distributed Agent Platform.

Agent Management Service | Capability Container, COI core services | The CC provides the core services, or access to the core services, that make up the Agent Management Service. These services may be present as part of each CC, or are accessible through each CC in the network.

Agent | CC Process | A specific type of CC process that supports basic notions of negotiation, interaction protocol support, etc.

Service | Service | A capability in the network accessible by name. An agent process can provide a service (with many service worker processes managed as part of an EPU). Services may also exist externally.

Directory Facilitator (DF) | Resource Registry, Directory Service | Through the CC, each process can access the Resource Registry using a ResourceClient. Resources are the entries to look up. A directory service provides lookup services based on a defined directory tree structure for any service requester.

Message Transport Service | Exchange | The CC abstracts the binding and use of the Message Transport Service from the perspective of agents. Currently, the only supported MTS is the default binding to the AMQP message broker.

Frame | DataObject Type Definition | Definition of a DataObject type for use as a message type or as part of the content of messages. Currently, DataObject types are defined as Google ProtoBuf proto files.

Slot | |

Communicative Act | "Performative" in Common Message Format headers | The current implementation supports one communicative act.

Action | "Op" in Common Message Format headers | The current implementation assumes a "request" message (as part of the request interaction pattern) for one action (identified by the "op" header) if not otherwise specified.

ACL Message Structure | Common Message Format |
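The "performative" and "op" header conventions named in the table above can be illustrated with a short sketch. The Python fragment below builds a request message in the spirit of the Common Message Format; the "performative" and "op" header names come from the table, while all other field names (sender, receiver, content) are hypothetical and do not reflect the actual ION wire format.

```python
# Illustrative sketch only, not the ION Common Message Format itself.
# "performative" and "op" are the headers described in the table above;
# the remaining field names are hypothetical.

def make_request(sender, receiver, op, content):
    """Build a request-style message dict with FIPA-like headers."""
    return {
        "sender": sender,            # hypothetical addressing field
        "receiver": receiver,        # hypothetical addressing field
        "performative": "request",   # the communicative act
        "op": op,                    # the requested service action
        "content": content,          # payload, e.g. an encoded DataObject
    }

msg = make_request("user_proc_1", "resource_registry", "find_resources",
                   {"type": "InstrumentResource"})
assert msg["performative"] == "request"
assert msg["op"] == "find_resources"
```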

Release 1 Correspondence

Intent

The OOI CI architecture as approved by the Release 1 LCO milestone review, as documented here, is mostly aligned with the FIPA specs but lacks a few concepts.

The focus in Release 1 is to implement the basic architectural elements for a FIPA-compliant Agent Platform (AP) implementation in the Python programming language. This implementation, the Python Capability Container, shall support the management of agents and their interactions for the primary purpose of enabling service provisioning and access.

Release 1 Architecture Status and Differences

There are no critical divergences between the OOI CI architecture and the FIPA specs. However, there are parts of the FIPA specs that the OOI CI architecture does not yet address; these need to be refined.

Things to change or add in OOI CI architecture:

Harmonization of AP and capability container concepts and terms
Introduction of a Directory (Facilitator) Service (subsuming the Agent, Service, and Process and configuration registry of our architecture)
Harmonization of the Agent controller with the FIPA AMS
Introduction of an ontology with the Common Object Model
Harmonization of the Message Transport Service with the Exchange
Enable multiple different transports, not just AMQP (not now)
Separation of messages (communicative acts) from individual frames
Subscription to the directory (resource registry) by providing a search expression; whenever the result set of the search changes, a notification is sent
Use transport locators to address services and agents on a transport
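The directory subscription behavior described above (a subscriber provides a search expression and is notified whenever the result set of that search changes) can be sketched as a small in-memory model. This is illustrative only; the Directory class, its method names, and the entry attributes are hypothetical, not the ION Resource Registry interface.

```python
# Illustrative sketch of subscription-by-search-expression: a notification
# fires only when the result set of the stored search actually changes.

class Directory:
    def __init__(self):
        self.entries = {}          # name -> attribute dict
        self.subscriptions = []    # [predicate, callback, last result set]

    def subscribe(self, predicate, callback):
        self.subscriptions.append([predicate, callback, self._search(predicate)])

    def register(self, name, attrs):
        self.entries[name] = attrs
        self._notify()

    def _search(self, predicate):
        return {n for n, a in self.entries.items() if predicate(a)}

    def _notify(self):
        for sub in self.subscriptions:
            result = self._search(sub[0])
            if result != sub[2]:   # result set changed -> send notification
                sub[2] = result
                sub[1](result)

events = []
d = Directory()
d.subscribe(lambda a: a.get("type") == "service", events.append)
d.register("resource_registry", {"type": "service"})
d.register("instrument_42", {"type": "agent"})   # result set unchanged
assert events == [{"resource_registry"}]
```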

Release 1 Implementation Status and Differences

The current Release 1 implementation is building out the defined architecture incrementally.

Elements existing in the OOI CI Release 1 architecture that remain to be added in the implementation:

Interaction protocol support
The notion of a managed agent beyond a CC process
Separation of performatives (communicative acts) and service action requests
Resource requestors bind/"allocate" to a resource before use
Agents/processes bind to a transport before use
Add a service root (i.e., core services available at any given time)
Relate the concepts of agent and service

Also to be added are all concepts that will be introduced to the OOI CI architecture.

Future Release Correspondence

Intent

In future releases the focus will be extended to enabling federations of Agent Platforms, where agreements between independent parties (users, organizations) can be electronically negotiated, enacted and enforced by agents.

In addition, more implementation languages (in particular Java) will be added.

CIAD COI Common Operating Infrastructure

Common Operating Infrastructure (COI) Subsystem Architecture and Design

This is the central page for the COI subsystem architecture and design, a part of the OOI Integrated Observatory Network. Both COI and CEI subsystems symbiotically form the "Operating System" of the Integrated Observatory Network, with COI providing more of the "user level" components. This page is structured into operational views (OV), system views (SV) and technical standards views (TV).

COI Overview

Exchange (Messaging Service) - How to interact with services, from processes to processes
    Exchange Management Service
    Messaging Interaction Levels
    Common Message Format (SV)
    RabbitMQ Exchange Infrastructure (SV)
    Distributed IPC Facility (DIF) Integration (SV)
    Standards and Technologies (TVs): Rich Services Architecture, AMQP, AMQP 1.0, RabbitMQ, DIF Models, Data Type Representations (from DM)

Capability Container - How to define and run a process in the context of a local infrastructure environment
    Process Management
    Internal Capability Container Processes and Interceptors
    Exchange Interface
    Python Capability Container Implementation (SV)
    Java Capability Container Implementation (SV)
    Based on CEI: System Bootstrapping and Startup (SV)
    Standards and Technologies (TVs): OTP Open Telecom Platform, Enterprise Service Bus

Distributed State Management - How processes keep state over time, even in case of failure and across multiple instances
    Common Object Model (see also DM: Science Data Model)
    Data Store Service
    Attribute Store Design Release 1
    Data object encoding, Google Protocol Buffers (SV)

Resource Management Services - How any and all observatory resources are registered and their life cycle is managed
    Resource Registry: Resource Developer Tutorial, Using the Resource Client
    Information Resources: see DM Information Resource Management, Inventory Services
    Taskable Resources: see CEI Taskable Resource Management, Resource Agents, Resource Agent Interactions
    Resource Lifecycle Activities
    Implications of Policy over Resource Lifecycle

Service Framework - How services are managed and accessible by client processes
    Service Agent (a sub-type of Resource Agent)
    Service Integration

Presentation Framework - How UI developers can build new user interfaces, and how users can access the system
    Grails Presentation Platform (SV)
    See also: COI User Interfaces

Identity and Policy Management - How user and resource identities are managed and applied to ensure a secure distributed environment; how policy is enforced
    Identity Management
    Relevant Nomenclature
    Identity Management Activities
    Secure Messaging
    Policy Management
    Roles and Permissions (SV)
    Technologies (TVs): CIlogon, XACML, SAML, X.509, WS-Security

Governance Framework - How users and resources are governed in a distributed system with no central control
    Interaction Management
    Federated Facility
    Agents and Monitoring
    Theory and Technology (TVs): Governance Concepts, Governance Activities, Governance Domain Models, Governance Interactions, Governance Use Cases, Governance References

Cross-Cutting Concerns
    User Interfaces

Quick Links

Subsystems: COI CEI DM SA AS PP

CIAD COI OV

Both Common Operating Infrastructure (COI) and Common Execution Infrastructure (CEI) subsystems symbiotically form the "Operating System" of the Integrated Observatory Network, with COI providing more of the "user level" components.

The Common Operating Infrastructure provides the broad range of common services required to bind the OOI CI into a coherent whole. The Common Operating Infrastructure (COI) services integrate the other service networks. The COI enables data distribution among CI services and allows subsystem services to be composed to handle complex interactions. It also implements crosscutting aspects such as governance and security. This section provides a specification of the core elements of the COI logical architecture.

Capabilities Overview Service Decomposition Work Products

Capabilities

The COI provides the technologies and services to play the role of (1) a unifying information conduit, enabling data and control streams to be published and consumed by all of the subsystems; (2) a platform to execute the core elements of the activity model by allowing the subsystem services to be combined as workflows; and (3) the implementation location for cross-cutting aspects of the cyberinfrastructure.

The COI provides the following capabilities:

Collaboration provisioning and agreement management; facility provisioning and rights management; identity validation and verification; service provisioning and interchange management; federation and delegation of service presentation and fulfillment; resource collection management, navigation and search; resource life cycle management; policy enactment and enforcement management; communication provisioning and interchange management.

Overview

Figure 1 illustrates the crosscutting aspects of the CI that apply to the communication between its subsystems. The COI subsystem groups these aspects into a collection of services meant to both decouple the subsystems and allow for an efficient implementation that hides their complexity.

Figure 1. COI Providing the Communication Conduit and Integration Infrastructure (OV-1)

Service Decomposition

Figure 2 depicts the COI architecture and services. The Exchange decouples the services of the COI and manages their interplay. The COI-provided infrastructure services include Identity Management, Governance Framework, Resource Management, Service Framework, Distributed State Management, and Presentation Framework. Governance defines the policy management framework that is implemented throughout the cyberinfrastructure. The Service Registry stores resources and associates them with their descriptions and relations with other resources. A Policy Validator is responsible for processing new policies upon submission by human operators, prior to storage and enforcement.

Figure 2. Common Operating Infrastructure Services (OV-2)

The COI architecture identifies a number of important infrastructure services. The Exchange messaging layer decouples the services of the COI and manages their interplay. The provided infrastructure services include Identity Management, Governance, Resource Management, State Management, a Service Framework and a Presentation Framework.

The Governance Framework contains the policy management framework that is effective throughout the cyberinfrastructure.

The Identity Management service provides authentication and supports Policy Management and Governance, implementing authorization. It also participates in establishing a Federated Chain of Trust between OOI Facilities as well as components of the CI.

The Resource Management Services establish a base for every Resource Management Network in the CI. The State Management stores and manages all temporary state information about Identity Management, Policy Enforcement and Ongoing Conversations.
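The idea of managed distributed state (state that survives the failure of an individual process instance) can be illustrated with a small sketch. The AttributeStore class and its put/get interface below are hypothetical placeholders for the State Management capability, not the actual ION design: a process checkpoints its state under a key so that a successor instance can resume after a failure.

```python
# Illustrative sketch of externalized process state. The AttributeStore
# API shown here is invented for illustration; in the real system the
# state would live in the distributed Data Store, not a local dict.

class AttributeStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

store = AttributeStore()

def run_worker(store):
    # Resume from the last checkpoint (if any), make progress, checkpoint.
    count = store.get("worker.count", 0)
    count += 1
    store.put("worker.count", count)
    return count

assert run_worker(store) == 1
assert run_worker(store) == 2   # a "restarted" instance resumes prior state
```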

The Service Framework stores resources and associates them with their descriptions and relations with other resources. It allows their discovery and subscription.

The Exchange service is a fundamental capability of the COI, with wide implications for the overall operations of the OOI CI. It implements the message exchange mechanism between the CI services, both within and between service networks. Following the Rich Service pattern, a message-based communications infrastructure manages the service orchestration via two main layers: Messenger and Router/Interceptor. Infrastructure services can modify interactions by re-routing, filtering, or modifying the messages exchanged.
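The routing role of the Exchange can be illustrated with a minimal in-memory model of an AMQP-style topic exchange, in the spirit of the RabbitMQ infrastructure referenced elsewhere in this document. This sketch only implements the single-segment '*' wildcard (real AMQP also supports '#' for multiple segments); the routing keys and bindings are invented, and this is not the ION Exchange implementation.

```python
# In-memory illustration of topic-based routing: a published message is
# delivered to every queue whose binding pattern matches the routing key,
# where '*' matches exactly one dot-separated segment.

class TopicExchange:
    def __init__(self):
        self.bindings = []          # list of (pattern, queue)

    def bind(self, pattern, queue):
        self.bindings.append((pattern, queue))

    def publish(self, routing_key, message):
        for pattern, queue in self.bindings:
            if self._match(pattern, routing_key):
                queue.append(message)

    @staticmethod
    def _match(pattern, key):
        p, k = pattern.split("."), key.split(".")
        return len(p) == len(k) and all(s == "*" or s == t
                                        for s, t in zip(p, k))

sa_queue, all_queue = [], []
ex = TopicExchange()
ex.bind("sa.*", sa_queue)   # e.g. Sensing & Acquisition services
ex.bind("*.*", all_queue)   # e.g. a monitoring tap on two-segment keys
ex.publish("sa.acquire", {"op": "start"})
assert sa_queue == [{"op": "start"}] and all_queue == [{"op": "start"}]
```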

Work Products

Table 1 summarizes the work products delivered by the COI subsystem team and explains their purpose.

Table 1 COI Subsystem Work Products

ID | Service | Explanation | Release

1.2.3.5 | (summary) | The Common Operating Infrastructure provides the services and distributed infrastructure to build a secure, scalable, fault-tolerant federated system of independently operated observatory components. | R1, R2

1.2.3.5.1 | Federated Facility (Virtual Organization) Services | Provides the management and governance services for a collection of resources on behalf of a group or individual. It represents the domain of authority for the set of resources managed by the facility. The governance services provide for the following set of collaboration agreements: membership, partnership, federation, and delegation. Delegation, for example, is used to give a marine observatory the rights to operate/manage a research team's instrument on their behalf. | R1, R2

1.2.3.5.2 | Enterprise Service Bus & Capability Container | Provides the distributed service infrastructure for the secure, scalable and fault-tolerant operation and federation of the Facilities (operational domains of authority) that comprise the deployed system of systems: Presentation Framework - the web services and browser presentation containers as well as the web user interface "portlet" building blocks; Governance Framework - identity and policy management to govern the use of resources by participants through policy enforcement and decision services; Service Framework - provisioning, federating, delegating, and binding service interactions between resources; Resource Framework - provisioning, managing, and tracking the use of resources; Distributed State Management - managing active and persisted distributed state; Federated Message Exchange - messaging, bulk data transfer, guaranteed data transfer and provisioning streaming media channels. | R1, R2

1.2.3.5.3 | Identity & Policy Management Services | Services that provision and securely manage information about participants used in the governance (i.e., authentication, authorization) of their activities across the network. The services ensure that personal information is owned and its exposure to other participants is controlled by the participant. | R1

1.2.3.5.4 | Resource Catalog & Repository Services | Services that provide for the persistence, preservation, and retrieval of information elements associated with resources registered with the system. | R1

1.2.3.5.5 | Resource Lifecycle Services | Resource management services to transition a resource from cradle to grave. | R2

1.2.3.5.6 | Resource Activation Services | Testing and validation services to ensure conformity with the different operational requirements in the network. | R2

1.2.3.5.7 | Resource Collaboration Services | Services that facilitate the negotiations between participants and facilities for sharing resources (e.g., instruments, processes, and models). Agreements are captured and associated with all parties materially involved. | R2

The focus in Release 1 is to provide the basic mechanisms for service integration, identity management and policy enforcement, secure reliable message-based communication, and governance in the context of one federated facility. Release 1 will provide capability containers in Python (primary) and Java for service deployment, as well as a resource registration service and an initial presentation framework for user interface creation by all subsystems.

The focus in Release 2 is to provide advanced resource management and governance services.

CIAD COI OV Capability Container

The CI Capability Container provides a rich infrastructure environment for applications and services hosted within the container. More generally, it provides the infrastructure services that are required for ION services and processes. Each capability container can be thought of as a package, which contains the particular services or processes to be executed, and the infrastructure services that support them.

See also:

Python Capability Container Implementation Java Capability Container Implementation

Overview

Figure 1 shows an illustrative depiction of a Capability Container. This figure shows several layers of dependency as well. The Capability Container (in short CapCont or just CC) is depicted as the gray octagon. It is a software application developed by the COI subsystem team.

The Capability Container provides a rich infrastructure environment to arbitrary hosted processes. Processes include subsystem services, processes and agents, external interface processes, and other arbitrarily deployed processes, such as a user's science data event detection routine. A Capability Container itself runs in a virtualization and contextualization environment provided by the CEI subsystem. This environment ensures that the same container can be executed in multiple different physical or software execution environments.

Figure 1. Capability Container Illustration (OV-1)

See Also

Process Framework - for the deployment, configuration and management of Capability Container hosted application processes
CC Internal Processes - for the definition of interceptor and low-level management processes inside containers
CC Exchange Interface - for the interface to the Exchange
Service Framework - for the registration and management of Capability Container hosted services

Capability Container Architecture

Infrastructure Services

Figure 1 shows the rich set of infrastructure services of a Capability Container; see details in the following table. Infrastructure services originate from the Common Operating Infrastructure (COI) and Common Execution Infrastructure (CEI) subsystems. Some of the services are local to a container, others are available through the container in the Integrated Observatory Network.

Figure 1. Infrastructure capabilities provided by the Capability Container (OV-1)

The Capability Container provides the following infrastructure capabilities:

Capability Description

Agreements Negotiating agreements and contracts between federated parties; keeping track of commitments and obligations; enforcing their enactment

Catalog Keeping track of all the resources and enabling discovery or query

Execution Providing dynamic computation capability to various users of the system in a defined and controlled way.

Governance Keeping track of the life cycle of resources and multi-party interactions accessing them

Interfaces Defining and enacting templates for interactions and protocols through sequences of information/message exchange as means for interface definition

Management Governing resources throughout their life-cycle

Mediation Changing message format and content while in transit to flexibly mediate between message sender and receiver

Messaging Receiving and delivering messages via queues

Monitoring Observing the flow of messages and providing statistics

Orchestration Coordination of interactions among distributed parties in an organized way towards reaching a defined goal

Policies Defining how resources can be accessed depending on the owner, operator and accessing party. Enforcing application of the policy and providing a means to compose policies from different sources and update them dynamically throughout the system

Provisioning Providing, instantiating and executing resources and processes on behalf of other users in the system

Recovery Robust handling of exceptional system states and the ability to re-create a recent known consistent state throughout a conversation or the system

Repository Storing and retrieving data and information elements and providing access across the network

Routing Storing and forwarding messages to their destinations according to configured routes

State Keeping track of the stateful information or session related to a resource or an interaction conversation.

Not all capability containers need to provide all infrastructure services, depending on the deployment needs. Because of their pervasive nature, Capability Containers are ideally suited to addressing cross-cutting infrastructure concerns, including security, reliability, governance, and scalability. Capability Containers enable the easy deployment of the OOI collaboration and policy framework.

Decomposition

Figure 2. Capability Container operational nodes (OV-2)

Service Integration

CIAD COI OV Capability Container Exchange Interface

Behavior

The following diagram shows how a service in a capability container enrolls in the Exchange Space, which represents the rest of the network.

Figure 3. Enroll in an Exchange Space (OV-6)

The following diagram shows how a service on a capability container sends a message:

Figure 4. Send message from process in capability container (OV-6)

The following diagram shows how a service on a capability container receives a message:

Figure 5. Receive message by process in capability container (OV-6)

CIAD COI OV Capability Container Internal Processes

Figure 1 shows the Capability Container Interceptor stack for sending and receiving messages.

Figure 1. Capability Container Interceptor Stack (OV-1)

CIAD COI OV Process Management

A strong design reference for the process model in the COI Capability Container is the OTP (Open Telecom Platform) Architecture. A Capability Container has access to deployed code modules. On startup or on request, it can spawn these code modules into processes. Processes have a unique identifier that enables every other process in the system to send messages to the process. Each process has a parent process, the supervisor, that is responsible for restarting the process in case of failure.

See:

TV: OTP - Open Telecom Platform

Container Process API

The following are the operations that a Capability Container supports in order to manage processes:

deploy process code module
spawn process - instantiate a local source code module into a process that can receive and send messages
link process - change the supervisor (parent) of a process
shutdown process - terminate a process and all its child processes
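As a hedged sketch, the operations above could look like the following minimal container API. The class and attribute names (Container, Process, children) are illustrative only, not the actual ion.core interfaces.

```python
import itertools

class Process:
    _ids = itertools.count(1)

    def __init__(self, module, supervisor=None):
        self.id = next(Process._ids)      # unique identifier used for messaging
        self.module = module
        self.supervisor = supervisor      # parent responsible for restarts
        self.children = []
        self.alive = True

class Container:
    def __init__(self):
        self.modules = {}                 # deployed code modules by name
        self.processes = {}               # running processes by id

    def deploy(self, name, module):
        """Deploy a process code module into the container."""
        self.modules[name] = module

    def spawn(self, name, supervisor=None):
        """Instantiate a deployed code module into a process."""
        proc = Process(self.modules[name], supervisor)
        if supervisor:
            supervisor.children.append(proc)
        self.processes[proc.id] = proc
        return proc

    def link(self, proc, new_supervisor):
        """Change the supervisor (parent) of a process."""
        if proc.supervisor:
            proc.supervisor.children.remove(proc)
        proc.supervisor = new_supervisor
        new_supervisor.children.append(proc)

    def shutdown(self, proc):
        """Terminate a process and all its child processes."""
        for child in list(proc.children):
            self.shutdown(child)
        proc.alive = False
        del self.processes[proc.id]
```

The supervisor relationship mirrors the OTP-style parent/child model described above: shutting down a supervisor recursively terminates its children.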

Process Life Cycle

Each process goes through the following life cycle states:

NEW - Initial state, from the instant the container starts the process from source code
INIT - Process has correctly initialized
ACTIVE - Process has been activated
INACTIVE - Process has been deactivated
TERMINATED - Terminal state; the process has shut down and will not respond to further requests
ERROR - Terminal state; an error occurred during process life cycle management
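The states above can be sketched as an explicit transition table. Which non-terminal transitions are legal (for instance, whether INACTIVE can return to ACTIVE) is an assumption made for illustration; only the state names come from the text.

```python
# Assumed legal transitions; TERMINATED and ERROR are terminal per the text.
TRANSITIONS = {
    "NEW":        {"INIT", "ERROR"},
    "INIT":       {"ACTIVE", "TERMINATED", "ERROR"},
    "ACTIVE":     {"INACTIVE", "TERMINATED", "ERROR"},
    "INACTIVE":   {"ACTIVE", "TERMINATED", "ERROR"},
    "TERMINATED": set(),   # terminal state
    "ERROR":      set(),   # terminal state
}

class LifeCycle:
    def __init__(self):
        self.state = "NEW"

    def transition(self, target):
        """Move to the target state, rejecting illegal transitions."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        self.state = target
```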

Code deployment

The Capability Container knows where to find the source code of the processes to spawn and their dependencies. The source code may exist locally on the host node of the capability container, or a package repository (service) such as PyPI may be needed to download the required packages and their dependencies.

releases
applications
modules

Process State

Although processes are executed in a container, and there may be multiple processes in one container, they do not share any state. Each process maintains its own completely separate local state.

If a shared state service is needed (e.g., a local cache in a container), it must be a separate process that is accessed via a messaging interface.
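A minimal sketch of this rule, with a thread and queue standing in for the container's messaging: the cache's dictionary is private to its own process, and other code reaches it only by sending messages. All names here are illustrative.

```python
import queue
import threading

class CacheProcess:
    """A local shared-state service accessed only via a messaging interface."""

    def __init__(self):
        self.inbox = queue.Queue()
        self._state = {}   # private to this process; never shared directly
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Message loop: each request carries an op, key, value, reply channel.
        while True:
            op, key, value, reply = self.inbox.get()
            if op == "put":
                self._state[key] = value
                reply.put(None)
            elif op == "get":
                reply.put(self._state.get(key))

def call(proc, op, key, value=None):
    """Synchronous request/reply to the cache process."""
    reply = queue.Queue()
    proc.inbox.put((op, key, value, reply))
    return reply.get()
```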

Capability Container Infrastructure Processes

These are light-weight processes that are defined in addition to application level processes. They are used for the following purposes:

As interceptors in the receive and send message path:
  Adding identity management and security properties to a message
  Checking identity management and security properties in a message
  Adding policy and governance attributes to a message
  Extracting policy and governance attributes from a message
  Encrypting and decrypting message content
  Transforming message content
For local container services: TBD
For local shared state: The local DataObject cache of a container is modeled as a light-weight service

Process Interaction

Figure 1. Process communication (OV-1)

CIAD COI SV Container Messaging

This page describes the sequences of function calls within the Python Capability Container when a message is received and sent, centered around the CC Process and its Receivers. This is Release 1 specific information.

Overview

Figure 1 shows a flow diagram for receipt of an RPC-style request message and its subsequent reply. The diagram illustrates both flow and logical call stacks.

Figure 1. Call sequences (OV-1)

Interactions

CIAD COI SV Java Capability Container

This page describes the current implementation of the Java Capability Container, a continuously evolving implementation of the Capability Container architecture. This page only covers Release 1 scope.

Note: In Release 1, the Java Capability Container is a light-weight port of the Python Capability Container reference implementation.

Startup

Arguments

Dependencies

Misc

Developer Interface

Logging

CIAD COI SV Python Capability Container

This page describes the current implementation of the Python Capability Container, a continuously evolving implementation of the Capability Container architecture. This page only covers Release 1 scope.

Startup

See Startup and Life-Cycle.

Dependencies

Before starting the Python CC, dependencies must be installed through

This uses pip or easy_install to download and install any dependencies.

If additional packages are to be run in the container, they must also be installed into the virtualenv of the container environment.

Misc

Developer Interface

Processes

The main function that the capability container provides is to run processes.

Specific processes extend the base class ion.core.process.Process.

See ion.play.hello_service and ion.play.hello_process for examples.

Configuration

The Python CC provides a configuration environment.

Config files are placed by default in res/config.

The main container configuration file is res/config/ion.config. Entries in this main config file can be overridden locally by providing a file res/config/ionlocal.config.

Processes and library code can access the container configuration the following way:

The variable CF_entry then contains the value of the configuration key in ion.config belonging to the module name where the above code is run.
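The actual ion call is not shown above; the following hypothetical sketch only illustrates the described pattern of configuration entries keyed by source code module name. The config() helper and the values are made up for illustration.

```python
# Hypothetical container configuration, keyed by module name as described.
ion_config = {
    "ion.play.hello_service": {"greeting": "Hello"},
}

def config(module_name):
    """Return the config section belonging to the given module name."""
    return ion_config.get(module_name, {})

# Inside a module, __name__ would select that module's own entries:
CF_entry = config("ion.play.hello_service").get("greeting")
```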

Logging

The Python CC provides a logging environment. See Logging description.

Log level configuration file

Log levels are set per Python module (i.e. per python xxx.py file). A prerequisite is the proper declaration of the logger, as in:

Log levels are configured in the file 'res/logging/loglevels.cfg'. The default log level for all modules is WARN.

The format of this file is a Python list of tuples ('ion.modulename', LOG_LEVEL); see the comments in the file for examples.

The file 'res/logging/loglevels.cfg' is under revision control. In order to override log level changes locally, create a copy of the file as 'res/logging/loglevelslocal.cfg' and make whatever changes necessary.
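A sketch of how such a levels list might be consulted; the entries and the lookup helper are illustrative, while the list-of-tuples format and the WARN default follow the text.

```python
import logging

# Entries in the format of res/logging/loglevels.cfg: a Python list of
# tuples ('ion.modulename', LOG_LEVEL). These values are made up.
LOGLEVELS = [
    ('ion.core.process.process', logging.DEBUG),
    ('ion.play.hello_service', logging.INFO),
]

def level_for(module, entries=LOGLEVELS, default=logging.WARN):
    """Return the configured level for a module; the documented default
    for all modules is WARN."""
    return dict(entries).get(module, default)
```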

Logging mechanism configuration file

The ION logging system in ion.util.ionlog uses Python logging. The standard Python logging configuration is done in 'res/logging/ionlogging.conf'

Integrator/Deployer Interface

Applications

Applications are groupings of source code modules and processes started in the container. Applications are defined in app-files and are started and stopped by the container. See Startup and Life-Cycle for details.

Operator Interface

Container Shell

The Capability Container provides an interactive shell that can be used for diagnostics and debugging. This shell is essentially an extended Python interpreter: every line is evaluated via eval() in a Python interpreter.

Some specific variables and functions are defined in the locals() namespace:

help()
send()
ps()
ms()

Shell History

A simple history file is kept for commands entered in the REPL. It essentially persists the last 25 lines of the existing history in Twisted's shell to ~/.cchistory. This file is loaded into the Twisted shell's history buffer on startup.
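The described behavior can be sketched as follows. The helper names are illustrative; the 25-line limit, the ~/.cchistory path, and the line-per-command format follow the text.

```python
import os
import tempfile   # used in the usage example below, not in ~ by default

HISTORY_SIZE = 25  # the documented last-25-lines limit

def save_history(lines, path):
    """Persist the last HISTORY_SIZE commands, one per line."""
    with open(path, "w") as f:
        f.write("\n".join(lines[-HISTORY_SIZE:]))

def load_history(path):
    """Reload saved commands into the shell's history buffer on startup."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return f.read().splitlines()
```

In the real container the path would be os.path.expanduser("~/.cchistory").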

Python CC Startup

Implementation details

The Python Capability Container is written as a Twisted application plugin so that it can be started as "twistd cc". See Twisted's:

Twisted Plugin System
Twisted Application Framework
Twisted Options parsing

Startup

Start as a Twisted plugin

In order for the Twisted plugin "cc" to be known to Twisted, the local dependencies must be installed successfully through the official installation mechanism (such as buildout install).

Arguments


Argument Default Explanation

-n absent Do not start as a daemon; run interactively

--pidfile= "./twistd.pid" File name for process id file


Argument Default Explanation

-h localhost Message broker hostname

-p 5672 Message broker port

-v / Message broker vhost name

-s absent Sysname (can also be set through -a sysname=xxx)

-a absent Extra arguments in the form key=value,other=more

-n absent Do not start CC shell

-i absent Do not read/write history file to ~/.cchistory

Each remaining argument is the filename of a Python start script or a container application.

Standard Extra Arguments

Argument Default Explanation

sysname=<> absent The namespace that identifies multiple independent CCs as part of one system.

Capability Container Life Cycle

State Model

During interactive runs (with twistd) and test runs (with trial), the container transitions through the following life cycle states. Subordinate resources, e.g. message broker connections, processes, applications etc., are forced to follow this life cycle as well.

State Transitions Explanation

init The initial state

ready The container is initialized

active The container is operational

terminated Container has terminated. Cannot leave this state.

error Error catch state. Cannot recover from this state.

Life Cycle of subordinate resources

Subordinate resources of a capability container include

Message broker (AMQP) connections
Applications managed by the container. Subordinate resources of an application include:
  Application processes, including the app supervisor and other app processes
  (Application code modules)
Processes spawned by the container. Subordinate resources of a process include:
  Messaging queue consumers (Receivers)
  External TCP/IP connections
  Distributed state process workbench (local state cache)

Container Startup Steps

Variant 1: Started interactively from the command line through twistd cc

1. User/script executes twistd cc args appname.app
2. twistd finds the plugin "cc" as installed in the local dependencies
3. The CC plugin start code in twisted/plugins/cc.py executes
4. The Twisted service in ion.core.cc.service.py executes: a CapabilityContainer instance is created and startService() is called
5. startService() creates a container, then initializes and activates it using the command line provided arguments
6. ... continues with "common container start"

Variant 2: Start through trial

1. User/script executes trial xxx
2. trial gives control to an individual test module's setUp()
3. A test case inheriting from ion.test.IonTestCase can start the container
4. The test case calls self._start_container() in its setUp() function
5. _start_container() instantiates a new container, then initializes and activates it using arguments from the ion.config file
6. ... continues with "common container start"

Common container start

ion.core.cc.container.Container is a BasicLifeCycle object that transitions through the life cycle states. The Container oversees subordinate managers:

ExchangeManager: BasicLifeCycle object that manages the message broker connection
AppManager: BasicLifeCycle object that manages the startup and stop of core and user-requested apps
ProcessManager: BasicLifeCycle object that manages container processes and their message broker attachment
InterceptorSystem: BasicLifeCycle object that manages the stack of interceptors applied to incoming and outgoing messages

initialize: initializes all subordinate managers
activate: activates all subordinate managers; the last action in activate is the AppManager starting all apps in the order of dependency

Application Startup Mechanism

General

In strong design compliance with the Erlang/OTP platform, the capability container enables the definition and startup of applications.

The following list defines the main properties of applications:

Defined in an app definition file (typically in res/apps/xxx.app)
Defines a list of code modules that belong exclusively to this application
Can have configuration entries inside the app file overriding/extending the standard container config
Points to one application module that has start/stop functions defined
Applications are started and stopped by the container's AppManager
When started, an app has one application supervisor process that oversees all other application processes
Defines any number of additional application processes to be started/stopped with the application
Can have dependencies on other apps that are started beforehand by the AppManager

System Apps

The following apps are standard capability container apps:

ioncore: The "root" app that is always started. Provides the core classes and core mechanics of the container. Runs no process but initializes the core mechanics of the container.
ccagent: The process that represents a container to the rest of the system through a messaging interface.

Each container running user-specific applications involving processes starts ioncore first, then ccagent, and then any specified applications and their dependencies.

App file format

A text file that Python evals to a dict.

where the entries mean the following:

type (str): type of config file; in this case always set to "application"
name (str): the unique legible name of the app in the capability container
description (str): free-form description
version (str): released app version. Increase for every new release, whenever a code dependency has changed
mod (tuple of str, list of args, dict of kwargs): qualified name of the module that contains the start/stop functions for this application. The tuple consists of the qualified module name (not file name) of the Python module with app start/stop functions, args (list, optional) for start(), and kwargs (dict, optional) for start() (not implemented so far)
modules (list of str): names of Python modules exclusively belonging to this application (not implemented so far)
registered (list of str): public names of CC processes that are registered within the application on start
applications (list of str): dependencies; names of applications that have to be started before this application can start. The CC will perform the startup
config (dict): entries to override the ion.config values. Note that at the time of app load, some config values may already have been read and applied during container startup

The entries type, name, version, and mod are mandatory.

Note: It is mandatory that the filename is consistent with the value of the name entry in the app file. E.g. the app res/apps/myapps.app must have a "name": "myapps" entry.
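A hypothetical app file assembled from the entries described above; the module paths and config values are made up, and in the file itself only the dict literal would appear (the container evals the file text).

```python
APP = {
    "type": "application",                      # mandatory
    "name": "myapps",                           # mandatory; must match filename myapps.app
    "description": "Example application",
    "version": "0.1",                           # mandatory
    "mod": ("ion.play.myapps_app", [], {}),     # mandatory; module with start()/stop()
    "modules": ["ion.play.myapps_service"],
    "registered": ["myapps"],
    "applications": ["ioncore", "ccagent"],     # dependencies, started before this app
    "config": {"ion.play.myapps_service": {"greeting": "Hello"}},
}
```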

Configuration vs. App Arguments

Configuration (available via ioninit.ion_config):

Container-level global configuration attributes, in a multi-level key-value (dict) structure
Accessible to any class, module, or app in the container
The configuration entries are separated by keys named after source code modules (e.g. ion.core.process.process)
Contains public configuration values of general interest (albeit possibly very detailed)
There is a defined override order, see above. However, after the overrides have been applied in order, the configuration should be stable and not change
If there are two instances of the same thing (e.g. process, app), then app arguments should be preferred

Arguments:

Provided to the start() function of the app
Contains private and instance-specific values
(Future) Could be sent over the network to convey global configuration
Should be interpreted with higher priority than configuration, if both exist

Arguments can be overridden by providing container command line args (with -a) using an apparg_APPNAME key whose value is strictly eval'd to a dict of kwargs.

Release Startup Mechanism

General

In strong design compliance with the Erlang/OTP platform, the capability container enables the definition of releases, which are aggregates of applications.

The following list defines the main properties of releases:

Defined in a rel definition file (typically in res/deploy/xxx.rel)
Defines a list of applications that belong exclusively to this release
Releases are started by the container's AppManager, indirectly starting and stopping applications
Releases can override the configuration for applications and can provide app args

Release (rel) file format

A text file that Python evals to a dict.

where the entries mean the following:

type (str): type of config file; in this case always set to "release"
name (str): the unique legible name of the release in the capability container
description (str): free-form description
version (str): release version
ioncore (str): minimum version of ioncore-python compliance
apps (list of dict): one entry for each app to be started, with keys:
  name (str): app name
  version (str): app minimum version (not yet evaluated)
  config (dict): configuration override
  args (dict): kwargs for the start() function of the app
  mult (bool, default False): if True, will start this app multiple times

The entries type, name, version, and apps are mandatory.

Note: The container configuration is extended/overridden for every app started, before the time of startup. Subsequent apps may change the same config entries, especially if the same app is started multiple times with different arguments. Make sure the config values are read at the time of app startup and not later in this case.

Note: Each app in the "apps" entry is referenced by name. As mentioned above, it is expected that the app name in the app file and the app filename are consistent. App files are looked up by default in res/apps.
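A hypothetical rel file following the entries described above; the names, versions, and config values are made up, and in the file itself only the dict literal would appear.

```python
REL = {
    "type": "release",                 # mandatory
    "name": "myrel",                   # mandatory
    "description": "Example release",
    "version": "0.1",                  # mandatory
    "ioncore": "0.4.0",                # minimum ioncore-python version (made up)
    "apps": [                          # mandatory; one entry per app to start
        {"name": "ioncore"},
        {"name": "myapps",
         "version": "0.1",
         "config": {"ion.play.myapps_service": {"greeting": "Hi"}},
         "args": {"count": 1},
         "mult": False},
    ],
}
```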

Order of config entry override (higher levels extend/override lower levels)

1. ion.config
2. ionlocal.config
3. app file config (note: for each app in the order of startup)
4. rel file app config override (note: for each app in the order of startup)

Order of argument override (higher levels extend/override lower levels)

1. app file arguments (in the "mod" entry)
2. rel file app args override (note: for each app)
3. command line args override (using the apparg_APPNAME key whose value is strictly eval'd to a dict of kwargs)
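Both override orders amount to merging layers in sequence, later layers extending or overriding earlier ones. A minimal sketch, with flat dicts standing in for the real nested per-module configuration; the entry names are made up.

```python
def apply_overrides(*layers):
    """Merge configuration layers in order; later layers extend/override
    earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

# Config override order: ion.config, then ionlocal.config, then app file.
effective = apply_overrides(
    {"timeout": 5, "broker": "localhost"},   # ion.config
    {"broker": "amqp.example.org"},          # ionlocal.config override
    {"timeout": 10},                         # app file config override
)
```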

CIAD COI TV ESB

Enterprise Service Bus

Enterprise Service Bus (ESB) technologies are rapidly emerging as the standard approach to system-of-systems integration. ESBs provide a highly scalable integration platform built on web and other open standards that combine access to heterogeneous data sources with messaging and web-service technologies to produce coherent, event-driven, service-oriented architectures that can rest on practically any transport and network access protocol. In essence, an ESB consists of four major components: a service/data interface, a messaging component, a router/interceptor component, and a set of infrastructure plug-ins.

The service/data interface acts as a gateway and adapter, connecting the ESB with other services and heterogeneous data and message sources connected directly to it. The service/data interface also serves as a gateway to other ESBs, enabling a wide range of integration topologies. The messaging component provides reliable messaging via freely configurable point-to-point and publish-subscribe communication channels.

The router/interceptor component captures messages intended for the messaging component and subjects them to a suite of infrastructure plug-ins according to a freely configurable routing scheme. Infrastructure plug-ins process messages they receive via the router/interceptor. Examples of plug-ins are data transformers, encryption engines, authentication, policy enactment, and failure management components. This combination of a router/interceptor mechanism and infrastructure plug-ins is known as a dependency injection mechanism or aspect-oriented infrastructure. The message resulting from the routing/processing combination is then made available for consumption via the appropriate channel of the messaging component.

Further core ESB capabilities are:

Message Transformation Message Enhancement Protocol Transformation Process Choreography Service Orchestration Transaction Management Security

Different ESB implementations provide different subsets of these core services plus further extended product specific services.

The main benefit of an ESB solution is that it enables the enactment of domain processes based on domain service descriptions that are isolated from technical concerns such as service implementation technologies, service orchestration, messaging technologies and message formats. An ESB provides the middleware to isolate the domain layer from the technical layer through configurable mapping.

CIAD COI TV Open Telecom Platform

See

OTP Design Principles
Erlang/OTP - Erlang is one implementation of the OTP architecture
Erlang.org

CIAD COI OV Distributed State Management

Distributed State Management

The Distributed State Management services store and manage objects within the OOI Integrated Observatory Network. The data store provides the infrastructure to store structured objects. The Distributed Information service, with its service state repository, provides a means to distribute and persist temporary local service state information (i.e., conversation sessions) with eventual consistency.

See Also

Common Object Model Release 1 Data object encoding: Google Protocol Buffers

Figure 1 illustrates the use of the distributed state framework for messaging and resource registration. Services are invoked through the Service Framework; messages of defined message types are exchanged, with their content encoded as data objects of the Distributed State Framework. The types of these data objects are defined. Service implementations can manipulate resource objects through the Resource Framework. Resource objects have defined types and are likewise encoded using the Distributed State Framework.

Figure 1. Use of the Distributed State Framework for Messaging and Resource Registry (OV-1)

Figure 2 depicts the internal structure of the Distributed State Management service.

Figure 2. Distributed State Management services (OV-2)

The Distributed State Management Services enable the COI to keep track of and enforce the desired behavior of identity management, policy/governance, and conversations (via exchange) through adequate state models. A user may also define state models for processes or other applications, such that invalid/erroneous behavior can be detected and corrected. The State Manager identifies the state model corresponding to a message and delegates it to the corresponding infrastructure service. For instance, a message containing new login information is redirected to the identity management state model service, pointing to the login capability in the state 'not logged in yet'. A follow-up message might go to both the identity management and policy governance state models to identify the user and the policies that apply to that user in the initial state of this conversation (e.g., the user may use only the identity provider assigned to his facility, and no other third party).

Realization

Attribute Store Specification Data Type Representations Release 1 Data object encoding: Google Protocol Buffers

CIAD COI OV Attribute Store Design

The attribute store is a generic repository of information organized around key + value pairs. The semantics of the keys (including any form of addressing), the semantics of the values (including any form of state-related information), and the hashing of keys to support load balancing (e.g., placing half of the keys on one server and the other half on another) are outside the scope of the attribute store. Its main purpose is fast, reliable data storage and retrieval for lightweight data elements (it is not intended to offer the flexibility of a full-blown SQL engine). Examples include identity management, dataset metadata, etc.

Figure 1. Attribute Store Domain Model

The Attribute Store has three main constituents: (a) the Repository, which stores the actual information; (b) the Command Processor, which receives, interprets, and then executes commands from its environment onto the information stored in the Repository; and (c) the Specification, which describes the capabilities of the Repository and how to match stored entities. The Command Processor operates with a Command Set composed of a set of Commands, including Read, Write, Update, Query, and optionally Search (content based). The way of executing the commands depends on the Specification and the capabilities provided by the underlying Repository. At minimum, the Lookup Specification describes the way to match entities in the Repository, such as a string-based match (Atom) or a regular expression (a Composite of "special" Atoms such as wildcards, patterns, etc.).

Figure 2. Core interaction pattern

The core interaction pattern between the Attribute Store and an Application (any kind) is via a lightweight request / response pattern, where the request contains a Command with optional arguments, whereas the response contains the outcome of executing that Command.

Commands and Arguments

Command | Arguments (Input) | Response (Output) | Semantics

WRITE | Key, OldValue, NewValue | OldValue, FAILURE | Locate pair (key, *); if it exists, assign the current value to OldValue and set it to NewValue, otherwise set/create pair (key, NewValue). Return OldValue, or failure when creation failed.

READ | Key | Value, INVALID KEY, FAILURE | Locate pair (key, *) and return the value associated with the key (if found). Return invalid key when there is no pair with that key, or failure when the read could not be performed.

DELETE | Key | SUCCESS, INVALID KEY, FAILURE | Locate pair (key, *) and delete it. Return invalid key when the pair could not be found, failure when the pair could not be deleted.

QUERY | Regexp | [Keylist] | Searches for keys matching the regexp pattern and returns a list of them. The list is void when there are no matching keys.

SEARCH [optional] | Regexp | [Keylist] | Searches for values matching the regexp pattern and returns a list of their keys. The list is void when there are no matching keys.

STATUS [optional] | (none) | Set (map) of [key,value] pairs | Outputs status information using key, value pairs (such as CPU Time: 0.02s).

NOOP [optional] | (none) | SUCCESS, FAILURE | Heartbeat.
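A minimal in-memory sketch of this command set. WRITE here is the unconditional set/create variant (the OldValue comparison step is omitted), and the return conventions are simplified to strings and Python values; this is illustrative, not a wire-level implementation.

```python
import re

class AttributeStore:
    """Toy key-value repository implementing the command set above."""

    def __init__(self):
        self._repo = {}

    def write(self, key, new_value):
        """Set/create pair (key, new_value); return the previous value."""
        old = self._repo.get(key)
        self._repo[key] = new_value
        return old                      # OldValue, or None on creation

    def read(self, key):
        """Return the value for key, or INVALID KEY when absent."""
        if key not in self._repo:
            return "INVALID KEY"
        return self._repo[key]

    def delete(self, key):
        """Delete the pair; INVALID KEY when it could not be found."""
        if key not in self._repo:
            return "INVALID KEY"
        del self._repo[key]
        return "SUCCESS"

    def query(self, pattern):
        """Keys matching the regexp pattern (void list when none match)."""
        return [k for k in self._repo if re.search(pattern, k)]

    def search(self, pattern):
        """Keys whose values match the regexp pattern."""
        return [k for k, v in self._repo.items() if re.search(pattern, str(v))]
```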

Argument | Type | Encoding | Semantics

Key | String | str32-utf8 | AMQP 1.0 str32-utf8, up to 4M chars

Value | Variable sized binary | vbin32 | AMQP 1.0 vbin32 (Book III, Section 2)

Value List | List | list32 | AMQP 1.0 vbin32 (Book III, Section 2); limited to results as integers, could be used to return key hashes

Regexp | List/String | list32/str32-utf8 | AMQP 1.0 str32-utf8, up to 4M chars; simple (*/?) or PCRE matching

Response | Type | Encoding | Semantics

SUCCESS | Byte = 0x00 | Uint8 | Command executed successfully

FAILURE | Byte = 0x01 | Uint8 | Command execution failed

INVALID KEY | Byte = 0x02 | Uint8 | Command argument specified an invalid/non-existing key

[Keylist] | List | list32 | List of keys (may be null), each key as defined in the Arguments table

Map[key,value] | Map | map32 | A set of pairs [key,value] matching the search/status criteria

Interaction Patterns

Figure 3. WRITE (substitute for CREATE, UPDATE, REPLACE)

Figure 4. READ (equivalent with GET)

Figure 5. DELETE

See the Data Type Representations page for an investigation of various type representation systems, for instance to be used for the Attribute Store within message contents.

CIAD COI OV Common Object Model

This section describes the common object model that underlies the description of all resources, metadata attributes, and associations in the system. It is also used to specify message formats and is leveraged by the Capability Container infrastructure to encode and decode messages and data objects for transport between two interacting processes.

Applications in the ION

Data Store Service - Service to persist immutable objects
Resource Registry Service - Resources (descriptions) are defined as object types
Common Science Data Model - Science data sets are resources with complex object structures
Common Message Format - Every message contains one object (and referenced objects) in the message content
Service State Repository - Transient process state is saved as objects
Associations between Objects - First-class references between objects

Realizations

Release 1 Google Protocol Buffer Object Encoding - A binary encoding of object types specified in GPB proto file format

Object Model Basics

The ION Common Object Model is a specification describing the definition of the format of mutable and immutable objects, the association between object instances and a specific format definition, and the encoding of objects for transport over the wire or for persistence. All objects following the common object model share common properties. The root of the object model is the Structured Object (sometimes also called Data Object).

Structured Object use includes the following purposes:

Structured Object instances provide a container for attributes of different types, including nested objects, and for references to other objects. Each object instance identifies the object type it is based on.

Structured Object references are references to another object. References are either to the most recent state of an object or to a specific version of an object.

Structured Object instances may be coupled with the persistence layer. When an object's attributes are changed, the new state can be updated in the persistence layer and is associated with the preceding state.

Structured Object instances contain metadata about an object, such as information to define query and filter expressions for an object of a given format.

Objects in the ION system can be persisted by the COI Data Store Service.

Higher level services, such as resource registries (see COI Resource Management) may use the Data Store Service as a backend to persist and manage Structured Objects. Such registries may also directly persist Structured Objects to a backend storage implementation. The backend data storage implementations may internally leverage eventual consistency models.

Figure 1 shows the basic concepts of the ION Common Object Model and their dependencies.

Figure 1. Common Object Model (OV-7)

Entity Objects: Stateful objects with an identity, a name, and a set of attributes. Each Entity Object has one or multiple States that represent the immutable values of the object. Whether the object knows of a "current state" or manages multiple concurrent states is up to the Entity Object implementation.

State: Identifies an immutable Value for a specific Entity Object. States belong to Entity Objects, while Values may be shared among multiple States and Entity Objects. There might be a dependency graph between the states of an Entity Object, for instance identifying the "most recent" State of the Entity Object.

Value: The actual immutable value. Has an identity. Can be referenced by multiple States in multiple Entity Objects. In its leaf expression, Value consists only of a set of attributes of one of the base types (e.g. String, int, Array, Record). In its composite expression, it may also have references to dependent values. Values are immutable. A reference to a dependent value is thereby guaranteed to be immutable, recursively. Values may have a hash code that "uniquely" represents the contents of the Value, namely its attributes and the references.

Structured Object: A graph of Entity Objects and Associations between Entity Objects. Structured Objects are used to represent the data structures of the system and carry the values in persistence (storage) and transport (messages) representations.

Common Object Model Definition

Object Structure

Objects in the Common Object Model are built as directed acyclic graphs (DAGs) of object nodes, connected by directed links. Each object node has a defined object type specification.

A composite object is represented by the root node object of an object graph. There can be multiple "root" objects sharing the same node objects. This makes the common object model a heterarchical ("multi rooted") structure.

Figure 2 illustrates object structures as DAGs of nodes and links.

Figure 2. Objects as Directed Acyclic Graphs of Object Nodes and Links

Object Links

Object Links are second-class directional references from one object node to another object. See Associations for first-class bidirectional references between objects.

Links are attributes of an object containing the identifier of another object, together with an additional selector of the target object state.

Figure 3. Object Links (directed) realized by reference attributes containing identities of other object nodes

Figure 4 shows different types of links between objects:

- CASRef: Reference to an immutable object through its content addressable identifier, e.g. the SHA-1 hash of the root node in its basic encoding
- IDRef: Reference to a mutable object (such as a Resource) by identity
- Branch Head ID Ref: Reference to a branch head, such as the default branch head (i.e. to the most current state of the mutable object)
- Object State ID Ref: Reference to a specific commit (i.e. state) of a mutable object by commit ID

Figure 4. Types of Object Links
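As a sketch, the four link types can be modeled as small immutable record types. The names and fields below are illustrative, not the actual ION proto definitions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CASRef:
    """Reference to an immutable object by content hash (e.g. SHA-1)."""
    sha1: str

@dataclass(frozen=True)
class IDRef:
    """Reference to a mutable object (such as a Resource) by identity."""
    object_id: str

@dataclass(frozen=True)
class BranchHeadIDRef:
    """Reference to a branch head, i.e. the most current state of a mutable object."""
    object_id: str
    branch: str = "master"

@dataclass(frozen=True)
class ObjectStateIDRef:
    """Reference to one specific commit (state) of a mutable object."""
    object_id: str
    commit_id: str

# A CASRef always resolves to the same bytes; the target of an IDRef or
# branch head reference may change as new states are committed.
raw = CASRef(sha1="da39a3ee5e6b4b0d3255bfef95601890afd80709")
head = BranchHeadIDRef(object_id="resource-42")
```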

Immutable Objects (Content Objects)

Immutable Objects are node-link structures of immutable node objects of defined object type. Links between immutable object nodes are CAS references. The entire graph of objects is immutable. Holding on to a CAS reference to the root object will forever reference the immutable state of the entire object structure.

Figure 5. Immutable Object structures (DAGs)

Mutable Objects

Mutable Objects are first-class objects in the system that have their identity registered with a governing service. For instance, all resources governed by the Integrated Observatory Network are registered with the Resource Registry Service. The governing service for a resource exclusively generates mutable object identities and creates the initial (empty) states of the objects.

Mutable Objects can be looked up through a registry service. This registry service maintains references to the most recent "commits" for the object. Commits reference ancestor commits. Thereby, the current state of the mutable object as well as any predecessor states can be accessed.

Figure 6 shows Mutable Objects, as Mutable Nodes pointing to commits of immutable object structures.

Figure 6. Mutable Objects keeping references to Immutable Objects to represent Object State
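A minimal sketch of this commit model, assuming a git-like chain of commits (illustrative, not ION code): the registry maps each mutable object identity to its head commit, and each commit references its ancestor, so every predecessor state stays reachable.

```python
import hashlib
import json

def _key(payload: dict) -> str:
    """Content-derived key for a commit record."""
    return hashlib.sha1(json.dumps(payload, sort_keys=True).encode()).hexdigest()

class Registry:
    def __init__(self):
        self.commits = {}   # commit id -> commit record
        self.heads = {}     # mutable object id -> head commit id

    def create(self, object_id):
        # The governing service creates the identity with an initial empty state.
        self.heads[object_id] = None

    def commit(self, object_id, state, parent=None):
        # A commit captures one immutable state and points at its ancestor.
        record = {"state": state, "parent": parent or self.heads[object_id]}
        cid = _key(record)
        self.commits[cid] = record
        self.heads[object_id] = cid
        return cid

    def history(self, object_id):
        # Walk the ancestor chain: current state first, predecessors after.
        cid, states = self.heads[object_id], []
        while cid is not None:
            record = self.commits[cid]
            states.append(record["state"])
            cid = record["parent"]
        return states

reg = Registry()
reg.create("resource-42")
reg.commit("resource-42", {"name": "CTD", "version": 1})
reg.commit("resource-42", {"name": "CTD", "version": 2})
```

Looking up "resource-42" through the registry yields the most recent state first, with all predecessor states still accessible through the commit chain.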

CIAD COI OV Data Store Service

Data Store Service

The Data Store Service manages the persistence of Structured Objects (i.e. the "Business Objects") in the ION system, and the definition of Structured Object types. It makes use of DM Preservation services and components to persist the actual information on disk or in the network.

Service Interface

Operations

(Manipulation of content addressable storage)

- CRUD Blob Object in namespace (the key is the content's unique and secure hash)
- CR Value
- CR State
- CR Association
- Query Objects (scoping object, filter expression)

(Use of Structured Objects)

CRUD Structured Object (instance) by key or reference in namespace

(Definition of Structured Objects)

CRUD Structured Object Type

Objects

- Blob Objects: Commits, Trees, Blob values
- Structured Object
- Structured Object Type

Content Addressable Storage

The ION data store is internally based on a content addressable store. Content addressable storage (CAS) refers to a storage architecture in which a unique identifier (aka hash, digest) is derived from the content of the stored objects. The same content always produces the same unique identifier, no matter where and when. Different objects are guaranteed to have different identifiers. Content addressable objects are inherently immutable; otherwise a modification would result in a different identifier.

To generate the identifier, the object content and its type are encoded using a defined serialization and encoding into a Blob (binary large object, a sequence of bytes). A unique, secure hashing algorithm is used to derive the identifier from this blob. A good candidate for hash generation is the SHA-1 algorithm. SHA-1 provides a cryptographically strong hash, resulting in a 160-bit digest. It is secure in the sense that it is practically impossible to infer the object content from the hash.

Content objects can be referenced from higher level composite structures, which themselves may again be realized as content addressable objects. This way, complex composites of objects, such as a filesystem of files, can be represented as an immutable tree of (trees of) file blob objects. Modifications of individual files result in different file blob identifiers, which propagate up the tree, resulting in an entirely different complex composite object. If links to previous versions of these composites are retained, earlier states of the entire structure remain accessible.

The location of content addressable objects in a distributed system becomes irrelevant, and objects can be retrieved from wherever the object has been replicated to. Weak consistency storage strategies (such as eventually consistent data stores) can be applied based on a content addressable storage. Distributed data stores contain replicas of the content addressable storage objects. Git is a good example of and design reference for a content addressable storage system, used for version control of file system directories.
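The CAS behavior described above can be sketched in a few lines, assuming SHA-1 keys and a tree encoding similar in spirit to git (a sketch, not ION code): the key of every object is the hash of its encoded content, a tree stores the keys of its children, and changing a leaf changes every identifier up to the root.

```python
import hashlib

class CAStore:
    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        # Same content always yields the same key, so put is idempotent.
        key = hashlib.sha1(content).hexdigest()
        self._blobs[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def put_tree(self, children: dict) -> str:
        # Encode a tree as sorted "name:key" lines, then store it as a blob.
        encoded = "\n".join(f"{n}:{k}" for n, k in sorted(children.items()))
        return self.put(encoded.encode())

store = CAStore()
a = store.put(b"raw data v1")
root1 = store.put_tree({"data.bin": a})
b = store.put(b"raw data v2")          # modified leaf
root2 = store.put_tree({"data.bin": b})
# The modification propagated: the roots differ, but root1 still names
# the old state of the entire structure.
```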

Data Store Backend

Higher level services can short-circuit the Data Store Service and directly communicate with a data store backend technology, such as an RDBMS or a NoSQL key-value store. Data Store Backend Clients run in the process space of the higher level service, hiding the complexity of directly interacting with the backend technology.

The responsibility of the backend is the persistence of value blobs in the system, identified and retrieved by key strings.

Data Store Backend Clients

CIAD COI OV Service State Repository

The Service State Repository is based on the ION Common Object Model. Its purpose is to provide persistent storage for transient process state at the end of the processing of messages.

The Service State Repository will be realized in Release 2. In Release 1, processes requiring persistence can use the Attribute Store Service.

CIAD COI SV GPB Object Encoding

Google Protocol Buffers are used as Release 1 encoding for data objects in the Distributed State Framework.

The Capability Container provides a tool chain to define types of these objects and to instantiate these types during run-time.

Develop/compile time

Data object type identification

Data object types have a unique identifier and a version designator. The combination of Data object type identifier and version designator uniquely identifies a specific data object type specification.

Data object type identifiers are assigned manually during development time (see ION Object Definitions).

Data object type specification

Data types are specified as messages within GPB proto files.

Links from one data object to another data object are represented through a specific type: CASRef (see proto definition). Links to other data objects are by CAS reference, such as the SHA1 secure hash of the binary encoding of the immutable content of a data object.

Links from a data object to a mutable object identity are represented through a specific type: IDRef (see proto definition). IDs for resource instances are provided during run-time by the Resource Registry Service.

An exemplar data object type specification is the data object that is associated with every single message transported through the Exchange. Each data object type carries its data object type identifier and version as an enum. The enum is part of the object definition, but not part of the binary encoding of the object values.

Object attributes are declared as either optional or repeated message fields. Required fields are forbidden.

Links to data objects are expressed through a CASRef field.

Run time

Data object representation

Data Objects are locally managed by infrastructure in the Capability Containers, accessible by CC processes. A Workbench is defined for each CC process. Workbenches can host object repositories. Consistent states ("commits") of repositories and commit histories can be pushed to a central repository service ("resource registry") and pulled towards the local process. Processes can hold on to object repositories as long as they want, for instance for caching purposes.

Figure 1. Object management model (OV-1)

Python:

Java:

Data object type reference

Data object instantiation

See Also

CIAD COI OV Exchange

The Exchange service is a fundamental capability of the COI with wide implications on the overall operations of the OOI CI (Figure 1). It provides the message-based communication mechanism used between the CI services, other CI processes such as instrument agents, the presentation platform and potentially external service clients. The Exchange and its associated resources are managed by the Exchange Management Service.

Background: The Exchange follows the Rich Service architectural pattern and implements the concepts described in the Messaging section.

Overview

A message-based communications infrastructure manages the service orchestration via two main layers. The Messenger layer is responsible for transmitting messages between services. The Router/Interceptor is responsible for intercepting messages placed on the Messenger and then routing them among all services involved in providing a particular capability. This allows the injection of policies governing the integration of a set of services. Thereby, lower level infrastructure services can modify interactions by rerouting, filtering, or modifying the messages exchanged.

Figure 1 shows the highest level of decomposition of the Exchange service into operational nodes.

Figure 1. Exchange (OV-2)

Exchange Spaces and Exchange Points

Exchange Spaces and Exchange Points are the resources that the Exchange service manages and provides; see the section on Messaging. In short, Exchange Points are managed resources where messages can be sent to and received from, without publishers and consumers having to know directly of one another's existence. Exchange Spaces are the organizational entities that group a number of Exchange Points and their permitted users.

An example is an Exchange Space for unprocessed science data coming from all instruments of an observatory, such as the Coastal-Global (CG) scale observatory of the OOI. Publishers and consumers need to register with the Exchange Space before being permitted to publish to or consume from an Exchange Point.

Figure 2 shows the operational nodes related to managing Exchange Spaces as part of the Exchange service.

Figure 2. Exchange Space services (OV-2)

Figure 3 shows the services related to the management of Exchange Points, as part of the Exchange service.

Figure 3. Exchange Point services (OV-2)

CIAD COI OV Exchange Management Service

The Exchange Management Service is the service that manages the Exchange and its associated resources. Responsibilities include:

- Managing Exchange Spaces
- Managing the namespace of Exchange Names within one Exchange Space
- Managing Exchange Names of different types, such as process, service, exchange point and queue
- Managing message brokers and their access

Overview

Figure 1. Exchange Management Overview (OV-1)

Exchange Resources

Overview

Figure 2. Exchange Resources (OV-1)

Exchange Space

An Exchange Space is an independently operated part of the Exchange. Its main functions are:

1. To be independently operable and manageable
2. To be a namespace for Exchange Names

In Release 1, there will be exactly one Exchange Space, the default ION exchange space.

Exchange Name

An Exchange Name is a name in the namespace defined by an Exchange Space.

Every communicating entity using the Exchange service must

1. Be a member of one or more Exchange Spaces
2. Have one or more Exchange Names

The following types of Exchange Names exist:

- Process: Exclusive name and inbox for a process
- Service: Exclusive name and shared inbox for a service with many worker processes
- Exchange Point: Router for messages based on content/headers
- Queue: Anonymous inbox owned by a process or other entity of the system

Process Exchange Name

Exclusive name and inbox for a process

AMQP Mapping

Exchange:

exchange_type : 'topic'
durable : False
auto_delete : True

Binding:

queue :
binding_key :

Queue:

'durable' : False
'queue' :
'exclusive' : True
'mandatory' : True
'warn_if_exists' : True
'no_ack' : False
'auto_delete' : True
'immediate' : False

Service Exchange Name

Exclusive name and shared inbox for a service with many worker processes

AMQP Mapping

Exchange:

exchange_type : 'topic'
durable : False
auto_delete : True

Binding:

'queue' :
'binding_key' :
'routing_key' :

Queue:

'durable' : False
'queue' :
'exclusive' : False
'mandatory' : True
'warn_if_exists' : True
'no_ack' : False
'auto_delete' : True
'immediate' : False

Exchange Point Exchange Name

Named router for messages based on content/headers. In Release 1, routing is based on the AMQP topic exchange mechanisms, where the subject header of a message is used to determine where a message is routed to. Bindings exist to queues that are recipients of this routing.

In later releases routing will occur based on an XML content description of the message content.
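The Release 1 topic routing can be illustrated with a small sketch of AMQP-style binding-key matching, where a binding key is a dot-separated pattern, '*' matches exactly one word and '#' matches zero or more words. This is an illustrative re-implementation of the matching rule, not the broker's actual code:

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Return True if the routing key of a message matches a binding key."""
    def match(pat, words):
        if not pat:
            return not words
        head, rest = pat[0], pat[1:]
        if head == "#":
            # '#' matches zero or more words: try every possible split.
            return any(match(rest, words[i:]) for i in range(len(words) + 1))
        if not words:
            return False
        # '*' matches exactly one word; anything else must match literally.
        return (head == "*" or head == words[0]) and match(rest, words[1:])
    return match(binding_key.split("."), routing_key.split("."))

# A binding "sci.#" receives every message whose subject starts with "sci";
# "sci.*.raw" receives raw data from exactly one level below "sci".
```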

AMQP Mapping

Exchange:

exchange_type : 'topic'
durable : False
auto_delete : True

Binding, Queue

Bindings and queues are owned by the consumers.

Queue:


'durable' : False
'queue' : ''
'binding_key' :
'exclusive' : True
'mandatory' : True
'warn_if_exists' : True
'no_ack' : False
'auto_delete' : True
'routing_key' :
'immediate' : False

Queue Exchange Name

AMQP Mapping

Queue:


'durable' : False
'queue' : ''
'binding_key' : name
'exclusive' : True
'mandatory' : True
'warn_if_exists' : True
'no_ack' : False
'auto_delete' : True
'routing_key' : name
'immediate' : False
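The mappings above can be consolidated into one table. The sketch below is an illustrative helper (not ION code) that captures the declaration parameters per Exchange Name type; all types share a non-durable, auto-deleted topic exchange and differ mainly in queue exclusivity:

```python
# Consolidated AMQP declaration parameters per Exchange Name type.
AMQP_MAPPINGS = {
    "process": {         # exclusive inbox for one process
        "exchange": {"exchange_type": "topic", "durable": False, "auto_delete": True},
        "queue": {"durable": False, "exclusive": True, "auto_delete": True, "no_ack": False},
    },
    "service": {         # shared inbox drained by many worker processes
        "exchange": {"exchange_type": "topic", "durable": False, "auto_delete": True},
        "queue": {"durable": False, "exclusive": False, "auto_delete": True, "no_ack": False},
    },
    "exchange_point": {  # router; consumers own their bindings and queues
        "exchange": {"exchange_type": "topic", "durable": False, "auto_delete": True},
        "queue": {"durable": False, "exclusive": True, "auto_delete": True, "no_ack": False},
    },
    "queue": {           # anonymous inbox bound by name
        "queue": {"durable": False, "exclusive": True, "auto_delete": True, "no_ack": False},
    },
}

def declaration_for(name_type: str) -> dict:
    """Look up the AMQP declaration parameters for an Exchange Name type."""
    return AMQP_MAPPINGS[name_type]
```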

Supported txAMQP Values:

Out of all of these values, txAMQP only explicitly defines named python parameters for an Exchange. Unfortunately, there isn't any documentation for this library. We can feed it keyword arguments and hope for the best (?).

Here are the named parameters for define_exchange(): channel=None , ticket=0 , exchange='' , type='' , passive=False , durable=False , auto_delete=False , internal=False , nowait=False

See also

PubSub Controller Service (DM): This service uses the Exchange Management Service to setup publish/subscribe for data flows.

CIAD COI OV Messaging

Background: Message-Oriented Architectures and Message-Based Integration

The communication system of the OOI Integrated Observatory Network uses messaging as the central paradigm of inter-application information exchange. Message-oriented systems and message-oriented middleware (MOM) technology realize a system architecture pattern that enables the flexible integration of systems of systems with loose coupling. Loose coupling is an important architectural property that has beneficial influences on maintainability, extensibility, robustness, scalability and other quality properties of the system and its individual software components.

Messaging infrastructures are based on the concept of a message as the exclusive means of information exchange between the distributed components of the system. All information that is passed between two components is contained in the exchanged messages. Message exchange is asynchronous in that the sender of a message does not wait for the message to be delivered or returned; it only waits for the MOM to acknowledge the hand-over of the message. Delivering messages between participants uses the concept of queues. A client in a message-oriented architecture only knows the incoming queues it receives messages from as well as the outgoing queues it delivers messages to, and has a notion of the message formats that pertain to these queues. The messaging infrastructure provides the capability for system integrators to connect these queues to known endpoints in the network; it thereby handles routing, storing and delivery of messages to the intended recipients across the network. The concepts of message exchange as the only means of inter-component communication, of logical, configurable queues, and of message storing and routing are the main enablers for loose coupling of distributed components, such as in wide-area-network-connected, federated systems with intermittent network connections.

Messaging infrastructures typically provide two styles of addressing messages: point-to-point and publish-subscribe. The first case connects one sender with one receiver through a queue and thus decouples both components while still providing a peer-to-peer communication link. The second model enables one or multiple components to publish messages to queues that are then submitted to all components that subscribe to the queue.

The messaging infrastructure provides a number of quality of service guarantees that enable robust application design. Once the MOM acknowledges the hand-over of a message from the sending component, it guarantees delivery of the message to the intended recipient or alternatively notification of delivery failure to the sender. Persistent, transactional storage (e.g., a database) provides this basic capability. Similarly, for message delivery the MOM guarantees delivery of a message exactly once (and re-delivery in case of a failure) until the receiving component acknowledges its correct handling. Certain messaging infrastructures enable configuration of further quality-of-service parameters such as priority delivery level.

The MOM paradigm is strongly supported by modern development environments. For instance, Java supports messaging through the JMS (Java Message Service) API, which enables access to any compliant messaging service provider. JMS is part of the Java EE (Enterprise Edition) capabilities. A recent movement to standardize the underlying transport has resulted in the Advanced Message Queuing Protocol (AMQP), which defines the behavior of a messaging server and its clients so that implementations are truly interoperable. The protocol provides point-to-point and publish/subscribe messaging, or combinations of the two.

OOI Integrated Observatory Messaging

Figure 1 depicts the general architecture of the CI messaging system based on Message Brokers as central infrastructure elements represented as Exchange Points, responsible for the routing and delivery of messages. Message Clients provide the interfaces to the application logic.

Figure 1. CI Message-Broker Architecture (SV-2)

Figure 2 specifies the roles and responsibilities of the different actors that are part of the CI messaging architecture.

Figure 2. CI Messaging Actor Model (SV-2)

An Exchange Point receives and manages messages, manages and fulfills subscriptions, has an identity, has a message persistence strategy, is reified across multiple brokers and is a finite state machine (FSM).

An Exchange Space provides administrative scope to the set of exchange points, has an identity, has a persistence strategy and is reified across multiple brokers.

A Resource publishes, queries and subscribes to Exchange Points, binds Exchange Points through subscriptions, owns and manages the life cycle of Exchange Points & Exchange Spaces, communicates with Brokers using Sessions, manages the life cycle of sessions, has an identity (an address, name, and owner) and is a Finite State Machine.

A Broker reifies, persists and federates Exchange Points & Exchange Spaces, communicates with Brokers & Clients using Sessions, manages the life cycle of Sessions, has an identity and is a Finite State Machine.

A Session manages message transfer, manages the life cycle of network connections, provides connection authentication and security, has an identity, has end point addresses, has a network protocol strategy and is a Finite State Machine. An example for message transfer routing is

C1-1 transfers a message through the Exchange Point P1 on Exchange X to C1-2: C1-1 -> B1.X.P1 -> C1-2
C2-1 subscribes to X.P1, reified on B2
C1-1 publishes a message to X.P1, reified on B1
X.P1 fulfills C2-1's subscription: C1-1 -> B1.X.P1 -> B2.X.P1 -> C2-1
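The routing example above can be sketched as follows. The classes are hypothetical stand-ins (not ION code) showing how an Exchange Point reified on several brokers fulfills a subscription registered on a different broker than the publisher's:

```python
from collections import defaultdict

class ExchangeSpace:
    """One logical exchange space federated across several brokers."""
    def __init__(self):
        self.subscriptions = defaultdict(list)  # exchange point -> consumers

class Broker:
    def __init__(self, space):
        self.space = space  # each broker reifies the same shared space

    def subscribe(self, point, consumer):
        self.space.subscriptions[point].append(consumer)

    def publish(self, point, message):
        # Fulfill subscriptions regardless of which broker registered them.
        for consumer in self.space.subscriptions[point]:
            consumer.inbox.append(message)

class Client:
    def __init__(self):
        self.inbox = []

space = ExchangeSpace()
b1, b2 = Broker(space), Broker(space)
c2_1 = Client()
b2.subscribe("X.P1", c2_1)      # C2-1 subscribes to X.P1 via broker B2
b1.publish("X.P1", "raw-data")  # C1-1 publishes via broker B1
```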

Figure 3 shows a data product generation scenario using the entities defined above. The scenario makes use of the physical deployment sites of the designed OOI observatory network infrastructure. Observational data from a sensor is accepted by the Portland CyberPoP Acquisition Point via the instrument agent. The messaging infrastructure services provided by the CI Capability Container wrap the acquired data in the form of self-contained, self-descriptive messages of type "raw data" and make them available to the Exchange. The sequence of data messages containing the raw data and descriptive metadata realizes a data stream that can be consumed by any interested party connected to the Exchange. The Exchange transparently provides routing and distribution, for instance to data processing (in the Amazon cloud), repository (in San Diego) and event detection (at a research team's institution) consumers connected to the Exchange. All consumers are based on the CI Capability Container and its messaging services. All of the data stream targets can be located at different points in the network. The Exchange realizes the data stream distribution network. It internally comprises message brokers that route and relay messages, ensure security and enforce policy.

Figure 3. CI Data Product Generation Scenario (SV-2)

Exchange Spaces and Exchange Points

This section describes the fundamental architecture of the COI Messaging Service architecture, as investigated and refined in three prototypes, described below. The three prototypes include the Messaging Service Broker Infrastructure (Rabbit), the Messaging Service Client Application Adapter (Magnet), and the underlying Distributed IPC Facility communication framework (DIF). The following describes the concepts and how they inter-depend.

Figure 4 shows two applications interacting. Application here stands for any software client intending to communicate via the COI Messaging Service. This is the case for all internal service-to-service interactions, as well as for some external interfaces to the CI. We explain below what qualifications applications have to meet to be able to access the Messaging Service.

The Messaging Service, i.e., the "Exchange", represents itself to any application as a set of Exchange Spaces. Exchange Spaces are communities in the sense of the Agent Control Network. As such, they provide a community specification, which encodes terms of use. The community's member entities are the applications using the Exchange Space. Applications have to enroll (register) with the Exchange Space before using its resources, Exchange Points, for message-based communication. Being a member of an Exchange Space enables the applications to interact via message exchange, by producing and consuming messages.

Figure 4. Application to application communication scenario

Figure 5 shows a more detailed view of the same scenario. The Exchange Space is represented as a set of Brokers (Message Brokers, for instance AMQP servers [20]) in a distributed networked communication environment. Applications maintain a point of attachment with their Broker, i.e., they maintain a connection to this broker. Different applications may be connected to different Brokers, which all represent the same Exchange Space. Applications play different roles when interacting with their Broker. They play the roles of Producers or Consumers of messages. In some cases, applications can play both roles. The resources over which Producers and Consumers exchange messages are Exchange Points. Applications can query the Exchange Space for a list of Exchange Points they are interested in, can then request (allocate) the use of an Exchange Point and subsequently produce messages and receive messages from such an Exchange Point. Receipt of messages is a consequence of a preceding subscription of a Consumer role to an Exchange Point. Note that the concept of an Exchange Point is a logical entity in the Message Service architecture. It is represented everywhere across the distributed Messaging Service.

Figure 5. Inside view of application communication using an Exchange Space

Internally, the Exchange Space and Exchange Points manage the registration of applications, the allocation of producers (publishers) and consumers (subscribers) of messages, the efficient routing of messages across the distributed network to consumers, and the pre-allocation of message broker resources (exchanges and queues) based on subscriptions, transparently to the applications.

The Distributed IPC Facility provides the underlying communication architecture and mechanism for secure interaction with the Messaging Service and within the Messaging Service.

Messaging Service Client Adapter

The Exchange (i.e., the COI Messaging Service or the Messenger and Router/Interceptor in the Rich Services architecture) is the central integrating element of the COI. It provides access to the communication mechanisms of Exchange Spaces and Exchange Points throughout the system-of-systems, abstracting from the physical communication infrastructure across multiple domains of authority. Client applications may publish messages on Exchange Points within Exchange Spaces. An Exchange Space represents a "community of interest" that collects and controls all of the Exchange Points in its scope and enforces policy of use for a registered set of users and applications. An Exchange Point is represented through a set of named exchanges on one or multiple AMQP [2] message brokers. Thereby, the Exchange provides a comprehensive, uniform view of a federation of message brokers: from the point of view of a publish/subscribe client (i.e., producers and consumers of messages), the fact that the messaging system is built as a federation of independent message brokers and not as a single broker is hidden. The CI integration strategy determines how individual software components integrate into the system-of-systems through a message-broker integration infrastructure. The communication system of the OOI CI applies messaging as the central paradigm of inter-application information exchange, realizing the Messaging Service, the integrating element of all services.

Message-oriented middleware (MOM) (see [6], [9]) is based on the concept of a message as the exclusive means of information exchange between the distributed components of a system. All information that is passed between two components or services is contained in messages exchanged asynchronously (i.e., non-blocking) over a communication infrastructure. The sender of a message does not wait for the message to be delivered or returned; it only waits for the MOM to acknowledge receipt of the message. Delivering messages to recipients utilizes the concept of queues. An application component in a message-oriented architecture only knows the incoming queues that it receives messages from as well as the outgoing queues it delivers messages to, plus the message formats that pertain to these queues. The MOM provides the capability for system integrators to connect these queues to known endpoints (i.e., addresses) in the network; consequently it manages routing, reliable storage and delivery of messages to intended recipients across the network. Standardization is under way for the underlying message wire transport protocol: the Advanced Message Queuing Protocol (AMQP) [2] defines the interactions of a message broker with its clients, promising interoperability between message brokers of different provenance.

Figure 6 provides an exemplar application scenario within the OOI CI. Capability containers host the application logic that interconnects using the Messaging Service. This is exemplified through an Instrument Agent publishing a raw data stream on an Exchange Point (a queue) via messaging. Any number of consumers may choose to subscribe to such an exchange point. In the example, the data processing application as well as the data repository will receive the published messages. A data stream is a continuous series of related self-contained messages on a given exchange point. There is a second exchange point for another data product containing processed data that is consumed by an event detector process. The physical deployment of all applications is irrelevant. The Exchange realizes all connectivity.

Figure 6. OOI CI Exchange exemplar messaging scenario

CIAD COI SV Common Message Format

Overview

The OOI common message format applies to all messages within the OOI system. The common message format specification defines message headers and message content structure definitions. It allows for arbitrary message content, as long as it satisfies the message content structure definition (i.e. the "schema" of the message content). The COI Capability Container is the component implementing this message format.

Message types are defined as specific objects of the OOI Common Object Model. As such, their structure is described by a unique object model (i.e. message) format specification. Encoding/decoding as well as type checking and stub generation can be handled by the infrastructure.

Message Format and Interaction Levels

To reduce the complexity of the OOI CI interactions, we distinguish between several levels of abstraction:

- At the application level, two applications engage in a conversation based on a previously agreed upon interaction pattern.
- At the integration level, each application instance resides in a capability container, and each message is routed through the governance agent and identity manager. Communication is performed via exchange points on the Exchange.
- At the messaging level, we see all capability containers through their messaging clients. The conversation from the integration level becomes a conversation between a messaging client and a previously known broker that contains a representation of the Exchange Point.

The following figure shows how each level for communication adds its own header to the message. All headers are always piped through the Identity Management system (IdM) corresponding to the current layer. The existence of the IdM presupposes a domain of authority.

Figure 1. Different levels of abstraction in message interactions

The diagrams in the behavior section of the COI Capability Container page show the interaction pattern within the capability container when an application sends a message.

When the message is sent, it goes to the capability container (integration level), which sets some headers and passes the message to the agent. The agent updates its knowledge base and acts: the sender agent might block the message from going out, in which case the denial propagates back to the application. If the agent does not block the message, it goes to the signer, which adds the signature and other IdM headers. Then, the Messaging Abstraction (messaging level) component converts it to AMQP format and sends it to the broker.
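The outbound path just described (container sets headers, agent applies policy, signer adds IdM headers, messaging abstraction encodes for the broker) can be sketched as a small pipeline. All names here (set_container_headers, GovernanceAgent, etc.) are illustrative, not the actual ION API:

```python
def set_container_headers(msg):
    """Integration level: the capability container adds its headers."""
    msg["headers"]["sender-container"] = "cc-1"
    return msg

class GovernanceAgent:
    """The agent may deny the send based on its knowledge base (policy)."""
    def __init__(self, blocked_receivers=()):
        self.blocked = set(blocked_receivers)

    def check(self, msg):
        return msg["headers"].get("receiver") not in self.blocked

def sign(msg):
    """IdM level: add signature headers (placeholder signature)."""
    msg["headers"]["signature"] = "sig(" + msg["content"] + ")"
    return msg

def to_amqp(msg):
    """Messaging level: encode for the broker (placeholder encoding)."""
    return ("AMQP", msg)

def send(msg, agent):
    msg = set_container_headers(msg)
    if not agent.check(msg):
        return ("denied", msg)  # denial propagates back to the application
    return ("sent", to_amqp(sign(msg)))

agent = GovernanceAgent(blocked_receivers={"forbidden-topic"})
ok = send({"headers": {"receiver": "xp.data"}, "content": "hello"}, agent)
blocked = send({"headers": {"receiver": "forbidden-topic"}, "content": "x"}, agent)
```

Note how the denial short-circuits the pipeline before signing or encoding occurs, matching the flow described above.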

Common Message Headers

Overview

A message transported via a message-oriented middleware infrastructure (such as an AMQP message broker) is a unit of communication that can be transported, understood and processed atomically. The underlying network layer ensures that messages pass through the network unmodified, with exactly-once delivery semantics.

The bytes comprising a message can be divided into the following parts:

Messaging level transport header and footer: For AMQP-based systems, these are the AMQP headers and footers, also called the "message envelope". Certain headers can be modified while a message is in transit and routed across multiple hops.

Integration level headers (immutable): Headers specific to the OOI Exchange system, describing invariant properties of the message content, such as sender and receiver name, the specification of the message content in the form of encoding, structure, language and ontology, and the association of the message with a conversation and an interaction specification.

Application level content (immutable): The actual payload of a message.

Messaging Level Header

For the OOI, the messaging level header consists of the AMQP message headers, with mandatory and optional fields.

Parameters: durable, priority, transmit-time, ttl, former-acquirers, delivery-failures, format-code, message-attrs, delivery-attrs.

AMQP Message properties (immutable).

Parameters: message-id (unique AMQP level identifier for the message), user-id, to, subject, reply-to, correlation-id, content-length, content-type, and security headers (digital signature, certificates, attributes).

Integration Level Header

Headers are based on the FIPA ACL Message Structure

Parameter (category): Description

op (Communicative act): The operation (also known as performative or method) expressed by the message.

sender (Participant in communication): The exchange name of the sender. Do not use this name to reply; use reply-to for that purpose.

receiver (Participant in communication): The exchange name of the receiver, which can be the name of a topic.

reply-to (Participant in communication): The exchange name for any replies, such as error and undeliverable messages, or protocol replies.

language (Description of content)

encoding (Description of content)

ontology (Description of content)

protocol (Control of conversation)

conv-id (Control of conversation): Conversation ID. The initiator of the protocol must assign a non-null value to the conversation-id parameter. All responses to the message, within the scope of the same interaction protocol, should contain the same value for the conversation-id parameter.

reply-with (Control of conversation): The reply-with parameter is designed to be used to follow a conversation thread in a situation where multiple dialogues occur simultaneously.

in-reply-to (Control of conversation)

reply-by (Control of conversation): The timeout value in the reply-by parameter must denote the latest time by which the sending agent would like to have received the next message in the protocol flow (not to be confused with the latest time by which the interaction protocol should terminate).

ts: Timestamp.
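A minimal sketch of building these integration-level headers, including the conv-id rule (the initiator assigns a non-null conversation ID; all replies reuse it and go to the requester's reply-to). The functions make_message and make_reply are hypothetical helpers, not part of the specification:

```python
import time
import uuid

def make_message(op, sender, receiver, content, reply_to=None,
                 protocol="rpc", conv_id=None):
    """Build a header dict using the parameter names from the table above."""
    return {
        "op": op,
        "sender": sender,
        "receiver": receiver,
        "reply-to": reply_to or sender,
        "protocol": protocol,
        # The initiator must assign a non-null conv-id; responses reuse it.
        "conv-id": conv_id or uuid.uuid4().hex,
        "ts": time.time(),
        "content": content,
    }

def make_reply(request, op, sender, content):
    """Replies are sent to the requester's reply-to and keep the conv-id."""
    return make_message(op, sender, receiver=request["reply-to"],
                        content=content, protocol=request["protocol"],
                        conv_id=request["conv-id"])

req = make_message("request", "svc.client", "svc.worker", {"task": 1})
rep = make_reply(req, "inform", "svc.worker", {"result": 2})
```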

Application Level Content

Message types are defined as specific objects of the OOI Common Object Model

Reference Standards

Requirements for OOI message format

Should support conversations that instantiate an interaction pattern => protocol, conversation-id
Interaction pattern
Language (e.g. ASN.1)
Encoding (e.g. PR, XML)
Ontology
Not relevant: performative (part of application content)

Message Content Encodings

Data Type Representations (TV-1)

DM Common Data Model (OV-7)

CIAD COI SV Distributed IPC Facility

Distributed IPC Facility Concepts

We are currently investigating a special case of community called the Distributed Inter-Process Communication Facility (DIF) [5]. A prototype implementation is currently being developed. Entities, representing processes that require inter-process communication (IPC), enroll in this community and are assigned a name valid throughout the community as well as an address that the community uses internally to direct communication. The resources of the community are local endpoints of the DIF, which provide resource allocation (open/close a connection to another named endpoint) and read/write capabilities.

DIF Model

See DIF Models

DIF and OOI Messaging Service

This DIF facility is intended to be the underlying distributed system primitive within the OOI system-of-systems. In conceptual terms, DIFs relate naturally to the notion of communities motivated above. Other communities will be defined applying similar patterns for purposes other than communication, such as scalable, elastic computing environments, with entities including the requestors of a service and the responding nodes.

The power of the DIF model is that it can be stacked in order to increase scope. One DIF can leverage a lower-level DIF for communication purposes and present a DIF facility of larger scope to its member entities. The design of how to architect the communities thereby becomes the driving element in the architecture of a distributed system. Any topology and architecture is possible here, going beyond pure layered architectures.

We are applying the DIF model to the COI Messaging Service. Figure 19 shows the logical distributed concept of the Exchange Space, represented by multiple distributed Brokers across the network, applied to local resources of AMQP message broker instances. Applications in the roles of Producers and Consumers of messages communicate with Brokers on the logical level. At the networking level, this comes down to applications using a connection to a message broker instance to publish messages on AMQP Exchanges and to subscribe to AMQP queues in order to consume messages after they arrive. Realizing a distributed Messaging Service, where applications can be connected to different brokers, requires the AMQP broker instances to federate. We are currently prototyping an extension of the RabbitMQ [21] message broker that provides such a federation. Oversimplified, it comes down to relaying messages that are produced on one broker's local AMQP Exchange to any remote queue that has subscribers for the same topic of messages, as represented by an Exchange Point.

Figure 1 Distributed Exchange Space/Point concepts mapped to AMQP message broker instances

The broker-to-broker communication in a federation needs to be tightly controlled, secure, based on mutual trust and, at the same time, efficient and resource aware. In the first step of our prototype, we are applying the concepts of the DIF, implemented as local file system calls, to broker-to-broker communication. Brokers that intend to become part of a federation need to enroll in the Wide Area DIF. Consequently, they receive a unique name within this DIF that is then made known to the other members of the DIF. Any broker can now request a communication flow (i.e., a connection) to any other broker. Through this flow, brokers can distribute messages that arrive on local AMQP Exchanges across the network to wherever there are subscribers to these messages. The DIF makes this exchange secure and exclusive to registered, trusted brokers.

Through its inherent abstraction of names from addresses, the DIF provides the basic mechanisms for efficient relaying of messages across the network. A name can be a representative (indirection) for a number of addresses of actual brokers. By defining unique names not only for broker instances but additionally for each Exchange Point in the system, brokers can send messages to such a name whenever they receive a message from a local Producer application on such an Exchange Point. The DIF hides the complex algorithms that resolve the DIF name for an Exchange Point to the set of addresses of registered brokers that require updates to the Exchange Point because they have subscribed Consumer clients.
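The name-to-address indirection can be illustrated with a toy registry: an Exchange Point name resolves to the set of addresses of enrolled brokers that have subscribed Consumer clients for it. The class and method names are hypothetical, and the real resolution algorithms are far more involved:

```python
class WideAreaDIF:
    """Toy model of DIF name-to-address indirection."""
    def __init__(self):
        self.enrolled = {}     # broker name -> network address
        self.subscribers = {}  # exchange point name -> set of broker names

    def enroll(self, broker, address):
        self.enrolled[broker] = address

    def subscribe(self, xp_name, broker):
        # Only enrolled (trusted) brokers may participate.
        if broker not in self.enrolled:
            raise PermissionError("broker not enrolled in the DIF")
        self.subscribers.setdefault(xp_name, set()).add(broker)

    def resolve(self, xp_name):
        """Resolve an Exchange Point name to broker addresses."""
        return {self.enrolled[b] for b in self.subscribers.get(xp_name, ())}

dif = WideAreaDIF()
dif.enroll("broker-east", "10.0.0.1")
dif.enroll("broker-west", "10.0.0.2")
dif.subscribe("xp.science-data", "broker-east")
dif.subscribe("xp.science-data", "broker-west")
```

Publishing on "xp.science-data" would then relay to every resolved address, while unenrolled brokers are excluded by construction.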

The DIF abstraction is very powerful in describing this interaction pattern and provides a fully secure and scalable communication environment. How routing and network resource management are performed depends solely on the policy within the Wide Area DIF.

Figure 2 shows our vision of applying the same DIF implementation not only between brokers (as the Federated DIF), but also between application clients (DAF stands for Distributed Application Facility) and their local broker instance. We assume that organizations will operate their separate AMQP broker (cluster) instances, and all application clients within the domain of authority of an organization will connect to this local broker. Initially, the client-to-broker connection occurs over the local LAN using TCP. The protocol between clients and brokers is defined by the AMQP Transport specification. We envision the same AMQP Transport protocol running over an Organizational DIF. As such, it requires the explicit enrollment of all clients and the broker process, and then provides a secure local communication environment that additionally hides any network complexity of organization-wide routing and performance.

Figure 2 Organizational client-broker and inter-broker federated DIFs

The same figure also shows the distinction of client applications into Worker and Supervisor roles, in addition to them being Producers and Consumers of messages. The notion, strongly based on the Open Telecom Platform (OTP) [17] design principles, is that any worker needs to have a supervisor that controls the worker's life cycle and restarts it in case of failure. Supervisors themselves can have supervisors, cascading as a tree of processes that all communicate in a distributed environment exclusively by messages.
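The supervisor-tree notion can be sketched in a few lines: each dead worker is restarted by its supervisor, and supervisors themselves can be supervised. This is a minimal one-for-one restart sketch in the spirit of OTP, not OTP's actual semantics (no restart limits, strategies, or message passing):

```python
class Worker:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.restarts = 0

class Supervisor:
    """Restart each failed child individually; supervisors can nest."""
    def __init__(self, children):
        self.children = children

    def supervise(self):
        for child in self.children:
            if isinstance(child, Supervisor):
                child.supervise()          # cascade down the tree
            elif not child.alive:
                child.alive = True         # restart the failed worker
                child.restarts += 1

w1, w2 = Worker("ingest"), Worker("transform")
tree = Supervisor([Supervisor([w1]), w2])  # a two-level supervision tree
w1.alive = False                           # simulate a worker failure
tree.supervise()
```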

DIF Services

DIF and Message Brokers

This diagram shows an exemplar deployment of application capability containers and message brokers communicating in a federated system. The left-hand side of the figure constitutes Org1, the right-hand side Org2. Each organization runs a separate message broker. The message brokers establish their communication and federation through the "Inter-Broker DIF". The application capability containers and the organization's message broker communicate through an "Organizational DIF 1". The same occurs in Org2 with a separate "Organizational DIF 2". The message brokers realize the application that bridges the different DIFs.


DIF and Exchange Spaces

This diagram shows the exemplar deployment of two message brokers that each have realizations of an exchange space XS. The exchange point XP1 is realized on both brokers; the exchange points XP2 and XP3 are only realized on one broker each. Exchange space realizations and exchange point realizations are first-class entities interacting with the local DIF clients. As such, they have location-independent names within the DIF. All realizations of one exchange point share the same name in the DIF. The DIF takes on the responsibility for reliable message multicast. If a message gets published for name1 in the DIF, for instance, all exchange point realizations with name1 receive this message.

This diagram shows the exemplar deployment of three message brokers that each have realizations of exchange spaces and exchange points. There are different deployments of exchange spaces and exchange points within the distributed system.


CIAD COI SV Interaction Levels

OOI Abstraction Stack

To address the complexity of the OOI-CI, we distinguish between several levels of abstraction that can logically be thought of as stacking on top of each other. Note that this is not a layered architecture, although a layered architecture would be a simple approximation in this context. Many details regarding the interactions between various CI constituents are left out of this discussion.

OOI Application Level

At the application level, two applications (i.e., resources of a particular type, such as producer and consumer) engage in a conversation (marked with (1) on the diagram below) based on a previously agreed upon interaction pattern. The interaction pattern is already known to both applications and recorded in the Service Repository (we omit at this point how this is done). The interaction pattern consists of a number of application-level messages (e.g., msg1, msg2, msg3) that serve the business logic of these applications. The conversation may be subject to a particular policy (e.g., A must receive msg2 at most 3 minutes after msg1, otherwise the conversation becomes irrelevant). The verification and enforcement of such a policy has to be performed on both sides of the conversation. The conversation happens transparently with respect to how the underlying layers implement it.

Note that at this point we do not specify which instance of each application is actually engaged into the conversation. This allows flexible routing between multiple instances of an application to implement load balancing, multiple workers, fault-tolerance at lower levels of abstraction, without complicating the logic of each application.

Integration Level

At the integration level, we have to deal with particular application instances. Here, a capability container wraps each application instance, subject to a binding policy. Such a policy may prevent the application instance from violating an agreed-upon interaction protocol with the capability container (e.g., APIs, sequence of commands, etc.). An agent supervises and controls the application and the respective policies.

The application level communication (1) is translated at this level into two distinct conversations: (2) with the functionality provided by the capability container, and (3) with the OOI Exchange. Here, the OOI Exchange provides a number of Exchange Spaces (e.g., ES.z) according to some previously agreed upon rules. Each Exchange Space is represented by an Exchange Point (e.g., XP.z1) that is a "tangible" peer for any conversation. The transparent conversation between apps of (1) becomes (3) with an Exchange Point as explicit "middle-man". Wrapped messages from the application layer are now delegated to the Exchange Point with a label indicating their target destination. The Exchange Point takes the responsibility to deliver those messages to the destination. Each interaction is subject to an integration policy which may define who is allowed to use an Exchange Point and for what purpose. Other policies may also be implemented at the capability container and XP level.

Note that we do not include here the logic of how capability containers and application instances are provided, nor the capabilities of the agents. We also omit the setup of the exchange spaces and points and their discovery and use.

OOI Messaging Level

At the messaging level, we see all capability containers through their messaging clients. The conversation (3) from the integration level becomes (4) between a messaging client and a previously known broker that contains a representation of the Exchange Point used in (3). Brokers of different facilities are previously engaged in an inter-broker DIF (i.e., some form of connection that allows them to exchange low-level messages, such as AMQP), subject to a specific federation policy. Both the capability container and broker have agents that supervise the implementation of the integration policy and other security/safety features.

Policies and Contracts

Conversation Elements

At each level of abstraction, the conversation (1) through (5) gets enriched with additional information pertaining to the underlying levels.

(1): resource identifier (i.e., name, such as application name), resource type (used to identify which interaction patterns apply and are valid)
(2): binding protocol (i.e., API, sequence of commands, etc.)
(3): two types of elements. Explicit (related to each message): interaction pattern identifier, conversation identifier, source resource name, target resource name. Implicit (implemented by the integration layer, related to a conversation): exchange space, exchange point
(4): interaction pattern identifier, conversation identifier, source resource name, target resource name, exchange space, exchange point
(5): (4) + DIF routing information
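The progressive enrichment of a conversation across levels can be sketched as nested wrapping, where each level adds its own headers around the payload of the level above. The enrich helper and all header values are illustrative:

```python
def enrich(content, level_headers):
    """Wrap a message with one level's headers, keeping inner levels intact."""
    return {"headers": level_headers, "body": content}

# Level (1): application identity and payload.
app_msg = {"name": "dataProducer", "type": "producer", "payload": "msg1"}

# Levels (3)/(4): interaction pattern, conversation id, endpoints, exchange.
integ = enrich(app_msg, {"pattern": "request-reply", "conv-id": "c42",
                         "source": "appA", "target": "appB",
                         "xs": "ES.z", "xp": "XP.z1"})

# Level (5): adds DIF routing information on top of (4).
msgng = enrich(integ, {"dif-route": ["broker1", "broker2"]})
```

Unwrapping at the receiver peels the levels in the reverse order, so the application only ever sees the level (1) content.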

CIAD COI SV RabbitMQ Exchange

TBD.

This page describes the use of RabbitMQ to realize the message broker infrastructure for the COI Exchange.

CIAD COI TV AMQP

AMQP Protocol

AMQP [4] defines a standards-based messaging infrastructure for queue-based messaging. AMQP specifies both a binary wire protocol and a model that messaging brokers need to implement. The protocol can be implemented on top of network transports such as TCP and can be used in different programming environments and operating systems. AMQP targets application domains with high demands on reliability, performance, and publish-subscribe capabilities. AMQP encompasses and goes beyond JMS semantics.

AMQP is split into three layers: transport, session, and model. The model layer specifies the routing and queuing services. The session layer provides reliable transport, synchronization, and error handling. The transport layer is a binary protocol that provides framing, channel multiplexing, failure detection, and data representation.

The AMQP Model (Figure 1) decouples producer and consumer applications via three main concepts: exchanges, message queues, and bindings. A middleware server can provide several virtual hosts that have to implement the components of the AMQP Model. Producers publish messages to exchanges, and consumers get messages from message queues. Exchanges accept messages, examine them, and route them to the appropriate queues. The binding between exchanges and message queues defines the routing criteria; thus, exchanges abstract different middleware delivery models. AMQP supports (among others) direct point-to-point, store-and-forward, and publish-subscribe message delivery. On the one hand, producers choose which exchange should route their messages. On the other hand, consumers subscribe to message queues. The bindings are the arguments passed to the exchanges for instructing the exchange about which messages to route into the queues, but it is not always clear which entity should provide the binding information.

Figure 1. AMQP Model Layer Domain Model (TV-1)

AMQP messages are self-contained and long-lived, and AMQP does not impose restrictions on their size. The binding key specifies the matching criteria, and the routing can be done based on message headers (routing key or other properties) or message content. Exchanges do not store messages and can duplicate messages to several queues. Queues store messages in memory or on disk as requested by the consumer, and can also search and reorder messages. AMQP aims at providing reliable, pervasive, fast, and secure shared access to a distributed network of message queues [3].
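The exchange/queue/binding routing described above can be shown in miniature with an in-memory model: producers publish to an exchange, bindings tell the exchange which queues match a routing key, and consumers read from queues. This is a toy direct-exchange sketch, not a broker implementation (real AMQP also defines fanout, topic, and header-based exchanges):

```python
class Queue:
    def __init__(self):
        self.messages = []

class DirectExchange:
    """Route each published message to all queues bound to its routing key."""
    def __init__(self):
        self.bindings = {}  # routing key -> list of bound queues

    def bind(self, routing_key, queue):
        self.bindings.setdefault(routing_key, []).append(queue)

    def publish(self, routing_key, message):
        # The exchange does not store messages; it may copy to many queues.
        for queue in self.bindings.get(routing_key, []):
            queue.messages.append(message)

ex = DirectExchange()
q1, q2 = Queue(), Queue()
ex.bind("temperature", q1)
ex.bind("temperature", q2)  # two queues on one key -> message duplication
ex.bind("salinity", q2)
ex.publish("temperature", "21.5 C")
ex.publish("salinity", "35 psu")
```

The duplication of the temperature message to both queues mirrors the statement above that exchanges can duplicate messages to several queues.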

Sessions (see Figure 2) are interactions between AMQP peers that provide reliability: guaranteed command execution, recovery from network failure, and reconciliation of state when peers fail. After network failures, the session layer is responsible for replaying the appropriate commands without duplicating delivery; this involves a negotiation between peers. To achieve reliability, the session maintains its state while it is detached.

Figure 2. AMQP Session Layer (TV-1)

AMQP supports multiple virtual networks within the same physical network (see Figure 3). All virtual hosts in a server share the same authentication scheme, but the authorization can be different. All channels within a connection work with the same virtual host.

Figure 3. AMQP Virtual Hosts (TV-1)

AMQP assumes an underlying stream-based protocol such as TCP. The transport layer transmits sequential frames over channels, and supports multiple channels on one connection. The framing is depicted in Figure 4. AMQP has a three-level structure, with an Assembly encoding commands and data content, and Frames being the unit sent on the network. Assemblies and Segments have no size limit, but Frames are limited by the underlying transport mechanism.

Figure 4. AMQP Transport layer (TV-1)

References:

[1] D. Box et al. Web Services Eventing. Available from: http://ftpna2.bea.com/pub/downloads/WS-Eventing.pdf
[2] OASIS. WS-Notification (v1.2). Available from: http://docs.oasis-open.org/wsn/2004/06/
[3] P. Hintjens. AMQ Background - Background to the AMQ Project. Available from: http://www.openamq.org/doc_background.txt_flat.html
[4] AMQP. Advanced Message Queuing Protocol specification, version 0-10.

CIAD COI TV AMQP 1.0PR1 & 1.0PR2 Models

AMQP 1.0PR1 Models Summary

The AMQP 1.0 specification (currently PR1) provides a different (simplified) set of abstractions compared with the 0.8/0.9 versions. In AMQP 1.0, Broker and Client are just two kinds of Applications related to the classical view of message-oriented middleware (MOM). This specification is more generic and allows for having producers/consumers inside brokers; applications can also act as brokers for proxying, etc. The following models capture the essence of the AMQP 1.0 Specifications as described at http://jira.amqp.org/confluence/display/AMQP/AMQP1.0+SIG.

Layers model
Link models (two parts)
Session model
Connection model
Session endpoint (two parts)
Frame
Commands and controls
Transfer command
Command stream
Conversation snapshot

AMQP 1.0 Layers

AMQP 1.0 defines the concept of nodes as peers in a conversation. There are various types of nodes, the most common ones being producer, consumer, queue, and service. Nodes are aggregated into Containers, which are typically implemented as processes in a regular OS. The standard defines the behavior of two types of containers, namely Brokers and Client applications. Nodes communicate through messages carried over unidirectional links. Links operate on top of sessions carrying commands between containers. Messages exchanged along a session may be fragmented into fixed-size units depending on their size. Commands and their body are encapsulated into communication frames. For two nodes from different containers to communicate, the containers must first establish a connection that carries the frames containing the information exchanged by the two nodes. A connection may multiplex multiple sessions with multiple links.

AMQP 1.0 Link model (part 1)

A link is a unidirectional communication channel between two endpoints (nodes) and carries any number of messages. The two parties are represented as link sender and receiver that exchange transfer units, with the flow controlled by a transfer limit set by the receiving end. Each message is split into transfer units before being sent. A message has a body and some properties that can be inspected by filters that constrain the message flow.
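The split-and-credit behavior can be sketched as follows: the sender fragments the message body into transfer units and may only emit as many units as the receiver's transfer limit allows. The class names and unit size are illustrative, not taken from the specification:

```python
def split_into_transfer_units(body, unit_size):
    """Split a message body into fixed-size transfer units."""
    return [body[i:i + unit_size] for i in range(0, len(body), unit_size)]

class LinkReceiver:
    def __init__(self, transfer_limit):
        self.transfer_limit = transfer_limit  # credit granted to the sender
        self.received = []

class LinkSender:
    """Emit transfer units only while the receiver-set limit allows it."""
    def send(self, body, receiver, unit_size=4):
        sent = 0
        for unit in split_into_transfer_units(body, unit_size):
            if receiver.transfer_limit <= 0:
                break  # out of credit; must wait for the receiver to grant more
            receiver.received.append(unit)
            receiver.transfer_limit -= 1
            sent += 1
        return sent

rx = LinkReceiver(transfer_limit=2)
n = LinkSender().send(b"hello world!", rx, unit_size=4)
```

With a 12-byte body, 4-byte units and a limit of 2, only the first two units are transferred until the receiver extends more credit.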

AMQP 1.0 Link model (part 2)

A session is established between two containers and carries a number of links. Each session endpoint contains a number of link endpoints for the nodes from the respective container associated with the session endpoint. There are routing tables associated with each session endpoint, which map the links to the handles of the session endpoint. The incoming routing table provides the mapping from handle to link, whereas the outgoing routing table provides the reverse mapping from link to handle. The routing tables are used to identify each link's context on both sides of the session.
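The handle/link bookkeeping can be sketched with a pair of dictionaries, one per direction, kept consistent on attach. The SessionEndpoint class and its methods are hypothetical names for illustration:

```python
class SessionEndpoint:
    """Maintain the two routing tables described above."""
    def __init__(self):
        self.incoming = {}  # handle -> link name
        self.outgoing = {}  # link name -> handle
        self._next_handle = 0

    def attach(self, link_name):
        """Attach a link and record it in both routing tables."""
        handle = self._next_handle
        self._next_handle += 1
        self.incoming[handle] = link_name
        self.outgoing[link_name] = handle
        return handle

    def link_for(self, handle):
        """Identify a link's context from an incoming handle."""
        return self.incoming[handle]

ep = SessionEndpoint()
h = ep.attach("sensor-data-link")
```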

AMQP 1.0 Session model

The session and connection operate through a similar pattern as the link and session. A session carries conversational state through commands between two session endpoints. Commands may be grouped into transactions, and the information flow is guarded by timeouts. Each session endpoint retains the session state and has a command sender and a command receiver, which process the outgoing and incoming command streams, respectively. The connection endpoints associated with the respective session endpoints maintain two routing tables, outgoing and incoming, to map between sessions and channels. Channels are the mechanism used to multiplex sessions over a connection.

AMQP 1.0 Connection model

At the lowest level, the AMQP 1.0 transport can be implemented on top of TCP/IP connections. Each connection carries frames guarded by a heartbeat timeout. The two connection endpoints map to TCP/IP connection endpoints, which, similarly to the session model, have two routing tables, incoming and outgoing. These routing tables are used to identify the connection between the two containers engaged in the TCP/IP connection.

AMQP 1.0 Session Endpoint State Model

As explained in the session model, there are one or more sessions established on top of a connection between containers. Sessions may carry multiple links between the nodes of those containers. Each session follows the lifecycle depicted below. In this model, we assume that one peer initiated the session by attaching it to an opened connection.

Focusing on the attaching and detaching states, there are internal intermediate states that differ from the perspective of the two peers of the session. In the following diagram, we assume one peer is the sender "S", whereas the other is the receiver "R". Note that these names have no real association with the purpose of the two peers. In contrast with links, sessions are bidirectional and both peers can play the sender/receiver roles.

AMQP 1.0 Frame model

The frames implemented by the AMQP 1.0 transport layer contain two parts: header and body. The header carries information regarding the state of the transport and its characteristics -- the size of the frame, its type, flags, channel number for an active session, and a state block. The body may carry either controls or commands, but not both simultaneously within the same frame. Controls operate at the connection level between containers, whereas commands operate at the session level. The type field in the frame header indicates whether the body contains control information or a command. When the frame carries a command, the state block of its header contains two confirmation fields -- acknowledged and executed, a flow control field -- capacity, and the ID of the command encapsulated within the body. The executed field refers to the incoming stream of commands and contains the remote ID of the last command successfully received and executed by the peer sending the frame. The acknowledgment field refers to the outgoing stream of commands and contains the ID of the last command issued by the peer sending the frame that was acknowledged by the remote peer through a corresponding executed field in another command. More details of the command exchange are presented under the command stream (see below).
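A header of this shape (size, type, flags, channel, then body) can be packed and unpacked with a fixed binary layout. The field layout and widths below are illustrative, not the normative wire format from the specification, and the state block is omitted:

```python
import struct

# Illustrative layout: size (uint32), type (uint8: 0 = control, 1 = command),
# flags (uint8), channel (uint16), network byte order, then the body bytes.
HEADER = struct.Struct("!IBBH")

def pack_frame(frame_type, flags, channel, body):
    """Prefix the body with a header carrying the total frame size."""
    return HEADER.pack(HEADER.size + len(body), frame_type, flags, channel) + body

def unpack_frame(data):
    """Parse the header and slice out the body using the size field."""
    size, frame_type, flags, channel = HEADER.unpack_from(data)
    return {"size": size, "type": frame_type, "flags": flags,
            "channel": channel, "body": data[HEADER.size:size]}

raw = pack_frame(frame_type=1, flags=0, channel=7, body=b"transfer")
frame = unpack_frame(raw)
```

The type field distinguishing controls from commands mirrors the description above; a real implementation would also parse the state block when the frame carries a command.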

AMQP 1.0 Controls and Commands

The AMQP 1.0 standard defines a set of 8 controls and 13 commands. The controls are used to establish AMQP 1.0 connections between containers. The commands are used to establish sessions, maintain the proper command flow, re-establish links after intermittent communication failures, transfer messages, group commands into transactions, etc. The high-level application data is encapsulated as messages carried with the transfer command.

The transfer command has a number of fields for flow control, fragmentation, and delimitation of the actual application data. The handle field refers to the link layer.

AMQP 1.0 Command stream

The following diagram illustrates a possible conversation between two nodes X and Y carried through a common session. There are two unidirectional links transporting the commands from X to Y and from Y to X, respectively. The commands from X to Y have a numerical designation (C1 to C10), whereas those from Y to X use a literal designation (Ca to Ci). The upper part of the diagram presents a spatial view of the two command streams, with C1 being the first message sent by X and C9 the last (C10 is not sent at this time but would probably be sent sometime in the future). In the reverse direction, Ca is the first message sent by Y, Ch the last, whereas Ci is not yet sent. The spacing between the messages illustrates some time difference between the messages (i.e., the nodes require some time to process and send them), and also points to the fact that messages do not follow a regular/timely pattern.

The temporal view presents the state of the two nodes following the exchange of commands C1 through C3 & C4, respectively Ca and Cb. We assume that X initiates the conversation by sending C1, which has no Ack or Exec flags set. While C1 is being received, node Y sends Ca (we could say that the two messages are simultaneously on the wire but in different directions). Ca carries no Ack/Exec, as C1 was not yet executed. After receiving and executing Ca, node X responds with the command C2, which carries the Ca identifier within its Exec field. Upon receipt of C2, node Y updates its state with the knowledge that Ca was executed (lastAcknowledged contains the id of Ca). While node X executed Ca, node Y executed C1, and upon receiving and executing C2, it responds with the command Cb. The command Cb contains the id of C2 as last executed and Ca as last acknowledged. When node X receives Cb, it has the confirmation that node Y successfully received X's notification of the execution of Ca, and also the notification that C2 was successfully received and executed. Node X proceeds by sending C3 and C4 without waiting for another command from Y. Both commands carry the same set of ids in their Ack and Exec fields. Upon receipt of C3, node Y updates its state and gets another confirmation of this state through C4.


The conversation between X and Y continues following this pattern. A snapshot of their internal states at a moment in time after the command C8 was received by Y is depicted below. As recipients, each node has a notion of commands that were executed, some that are just received but not yet executed, and an upper limit of how many commands can be received within an input buffer before they are executed. As senders, each node has to keep track of which commands have been successfully received, executed and then confirmed by the remote party. Similarly to the input buffer, there is a replay buffer for sending commands. Based on a previously established limit on the size of these buffers, some commands remain unsent or even blocked from the higher layers at any particular moment in time.
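The per-peer bookkeeping (last executed, last acknowledged, replay buffer trimmed on confirmation) can be sketched as follows. The CommandPeer class and its fields are hypothetical names; command ids are assumed to be monotonically increasing integers, and the input buffer limit is omitted for brevity:

```python
class CommandPeer:
    """Track the Ack/Exec state each peer keeps for its command streams."""
    def __init__(self):
        self.last_executed = None      # last remote command executed here
        self.last_acknowledged = None  # last local command confirmed by the peer
        self.replay_buffer = []        # sent commands not yet confirmed

    def send(self, command_id):
        self.replay_buffer.append(command_id)
        # Each outgoing command piggybacks this peer's Exec field.
        return {"id": command_id, "exec": self.last_executed}

    def receive(self, frame):
        self.last_executed = frame["id"]  # execute the incoming command
        if frame["exec"] is not None:
            # The peer confirmed execution up to frame["exec"]:
            # record it and trim the replay buffer accordingly.
            self.last_acknowledged = frame["exec"]
            while self.replay_buffer and self.replay_buffer[0] <= frame["exec"]:
                self.replay_buffer.pop(0)

x, y = CommandPeer(), CommandPeer()
frame_c1 = x.send(1)   # X's first command carries no Exec yet
y.receive(frame_c1)    # Y executes command 1
frame_r = y.send(101)  # Y's command piggybacks Exec = 1
x.receive(frame_r)     # X learns command 1 was executed and trims its buffer
```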


AMQP 1.0 PR2 Type system


CIAD COI TV AMQP 1.0PR3 models

AMQP 1.0PR3 draft brings several changes with regard to the structure and semantics of the transport layer. The commands/control fields of previous versions are now replaced by different frame types used by the connection, session, or link endpoints. The following MSCs describe their use for the most common operations of each transport element.

Low level transport

Note: the designations of "low level transport", "logical level transport", and "application level transport" are not officially used in AMQP1.0.

Protocol negotiation

Before any communication can take place between any AMQP1.0PR3 peers, they have to establish a common "dialect" for their communication. Although this is a logical requirement, it is described at the level of the physical connection (a mechanism for carrying over information blocks between two containers over a reliable connection transport such as TCP/IP).

Opening / Closing a connection

After the protocol format headers are exchanged between two containers, a connection may be established either in a client/server or request/reply fashion, or in a pipelined mode where workload is passed along with the opening sequence. Closing a connection can be initiated by one party or by both at the same time.

The pipelined mode is described only from the perspective of one container, without any details on the behavior of the pair when both peers simultaneously initiate a pipelined open and send workload to each other.
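The open/close handshake above can be sketched as a small endpoint state machine. This is a simplified illustration: the state names loosely follow the draft's connection states, but several intermediate and error states are omitted, and the pipelined path is shown only for one initiating peer.

```python
# Simplified connection-endpoint state machine for the open/close
# handshake described above. State names loosely follow the AMQP 1.0
# draft; several intermediate states are deliberately omitted.

class ConnectionEndpoint:
    def __init__(self):
        self.state = "START"

    def send_header(self):
        assert self.state == "START"
        self.state = "HDR_SENT"

    def send_open(self):
        # pipelined mode: open may be sent right after the header,
        # before the peer's header/open have been received
        assert self.state in ("HDR_EXCH", "HDR_SENT")
        self.state = "OPEN_PIPE" if self.state == "HDR_SENT" else "OPEN_SENT"

    def recv_header(self):
        assert self.state in ("HDR_SENT", "OPEN_PIPE")
        self.state = "HDR_EXCH" if self.state == "HDR_SENT" else "OPEN_SENT"

    def recv_open(self):
        assert self.state in ("OPEN_SENT", "HDR_EXCH")
        self.state = "OPENED" if self.state == "OPEN_SENT" else "OPEN_RCVD"

    def send_close(self):
        assert self.state == "OPENED"
        self.state = "CLOSE_SENT"

    def recv_close(self):
        # peer closing first, or peer answering our close
        assert self.state in ("OPENED", "CLOSE_SENT")
        self.state = "CLOSE_RCVD" if self.state == "OPENED" else "END"
```

In the pipelined path the initiator moves through HDR_SENT and OPEN_PIPE before any peer frames arrive, then converges to OPENED once the peer's header and open frames are received.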

Logical level transport

Note: the designations of "low level transport", "logical level transport", and "application level transport" are not officially used in AMQP 1.0.

Beginning a session

The session is the logical transport of information between containers. For deployment it requires a physical transport medium, which in PR3 is provided by the connection. Sessions are tracked through a so-called "channel number" (similar to a TCP/IP port number).

Closing a session

Upon encountering an error on one of the links carried by a session, or when a session is no longer needed, a party may close the session and clear the corresponding channel. Each party keeps separate track of its incoming and outgoing channels (similar to local and remote port numbers in TCP/IP).

Link attaching and resuming

Once a session is established between two containers, applications (possibly running on the same machine) may initiate a number of unidirectional links between them. Links may have arbitrary names, and receive a numeric handle for faster operation after they are established.

When the nodes responsible for peering the applications already have an entry for a particular link, establishing the link amounts to resuming that link.

Link closing

When a link is no longer needed, the link endpoints (i.e. nodes) may terminate the link by sending a detach frame.

Transport flow control

When two nodes are engaged in a conversation over a link, the rate of message flow can be regulated by exchanging flow credits and keeping track of the number of transfer units (i.e., elementary/atomic pieces of information, most often message fragments) exchanged between them.
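The credit mechanism can be sketched as follows; the class and field names are illustrative (they do not correspond to normative AMQP frame fields), and credit replenishment is reduced to a single call standing in for a flow frame.

```python
# Sketch of credit-based flow control on a link: the receiver grants
# "link credit" and the sender may emit one transfer per unit of
# credit. Names are illustrative, not normative AMQP field names.

class LinkSender:
    def __init__(self):
        self.credit = 0   # transfers we are currently allowed to send
        self.sent = []    # transfer units emitted so far

    def on_flow(self, credit):
        """Receiver granted (replenished) credit via a flow frame."""
        self.credit = credit

    def transfer(self, fragment):
        """Emit one transfer unit if credit allows; otherwise block."""
        if self.credit <= 0:
            return False  # blocked until more credit arrives
        self.credit -= 1
        self.sent.append(fragment)
        return True
```

A sender granted two credits can emit exactly two transfer units; a third attempt is blocked until the receiver sends another flow update.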

Application level transport

Note: the designations of "low level transport", "logical level transport", and "application level transport" are not officially used in AMQP 1.0.

Initial message transfer

When two containers have already established a physical connection, and at least one session with a pair of links over it, two applications may communicate via messages using the existing link endpoints. To enable proper application-level hand-off of responsibility for messages, the PR3 draft introduces the concept of "message settlement", i.e., an application-level acknowledgment that a message was not only received properly but also processed/acted upon. When the conversation can be summed up in a single message, the message may be marked as terminal, i.e., there is no follow-up message.

Message settlement

When an application has successfully processed an unsettled message, i.e., a message not yet confirmed to have been processed, it may inform the other peer that the message can be settled and clear the corresponding entry in its unsettled table.

At-most-once and at-least-once message delivery

The settling capabilities of PR3 can be used to enrich the semantics of the message transfer between two applications to ensure delivery of a message "at-most-once" or "at-least-once".

The at-least-once semantics addresses the case in which the disposition frame used to inform the other peer about the state of the transfer is lost: the message remains unsettled at the sender and is eventually retransmitted, so the receiver may see it more than once.
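The sender side of this behavior can be sketched with an unsettled map. This is an illustrative model only: the class, map, and method names are assumptions, and the wire is abstracted to a callable so that retransmission after a lost disposition can be shown directly.

```python
# Sketch of the sender side of at-least-once delivery: a message stays
# in the unsettled map until a disposition frame settles it; if the
# disposition is lost, the message is resent (possibly duplicating it
# at the receiver). All names here are illustrative.

class AtLeastOnceSender:
    def __init__(self, wire):
        self.wire = wire      # callable that "transmits" (tag, message)
        self.unsettled = {}   # delivery tag -> message awaiting settlement

    def send(self, tag, message):
        self.unsettled[tag] = message
        self.wire(tag, message)

    def on_disposition(self, tag):
        # receiver confirmed processing: hand-off complete, settle
        self.unsettled.pop(tag, None)

    def resend_unsettled(self):
        # e.g. after reconnecting, when dispositions may have been lost
        for tag, message in self.unsettled.items():
            self.wire(tag, message)
```

A settled message is never resent; a message whose disposition was lost goes out again, which is exactly the duplicate that at-least-once delivery tolerates (and at-most-once delivery would instead avoid by settling eagerly at the sender).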

CIAD COI TV Distributed IPC Facility Models

This page contains models that represent the Distributed IPC facility as described in John Day's book "Patterns in Network Architecture" (2008).

Models

During Iteration 3, we modeled the concepts from John Day's book "Patterns in Network Architecture" and consolidated them during telecons and a face-to-face meeting with John Day and Chris William on June 15th in Boston. The domain models cover the relationships between elements such as application processes, IPC (inter-process communication) services, DIFs (Distributed IPC Facilities), protocol machines, primitives and operations, naming and routing, etc.

The Processing System model shows that disks and printers can be seen as processing systems, not just peripherals. A computing system therefore consists of a number of processing systems, each executing application processes (APs) that communicate via IPC (inter-process communication).

The Layer model shows John Day's view of networking as IPC, where each layer implements the same mechanisms but with policies tuned to operate over different ranges of performance. A layer is a distributed IPC facility (DIF). Application processes communicate via a DIF. The IPC processes that make up this facility provide protocols that implement an IPC mechanism and management tasks. Since the IPC layers repeat, the IPC processes within an IPC facility are in turn application processes requesting service from the IPC layer below.

The DIF model shows how application processes use the DIF's API primitives and send data as Service Data Units. Application entities (also called application protocol machines) execute the application protocol, which has a number of commands that operate on application objects (application protocol commands change the state of the application process, and therefore change state external to the protocol).

The Protocol Machine model shows the format of the protocol data units.

The Naming model explains which names/identifiers are internal or external to DIFs. Note that since the communicating elements are application processes, they also have application names. To become a member of a DIF, an IPC process needs to explicitly enroll, i.e., be authenticated and assigned an address.


CIAD COI TV RabbitMQ

This page describes RabbitMQ, an implementation of an AMQP broker.

Use in Release 1

For Release 1, ION uses AMQP 0.9.1 and RabbitMQ Server v2.3.1 on CentOS 5.5.

Use of VHOST:

The system uses the default VHOST '/'.

Use of Exchanges:

Exchange            Type   Description
magnet.topic        topic  ION service traffic. Routing keys are equivalent to the global sysname-qualified identifiers for service and process names.
events.topic        topic  ION Exchange Point for events.
science_data.topic  topic  ION Exchange Point for science data (managed by the PubsubControllerService).
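Routing on these topic exchanges follows the standard AMQP topic-matching rules, which can be sketched as below. RabbitMQ implements this natively; this sketch only illustrates the semantics, and the example routing keys are hypothetical (the table above defines only the general key scheme).

```python
# Sketch of AMQP topic-exchange matching as used by the exchanges in
# the table above: routing keys are dot-separated words, '*' in a
# binding pattern matches exactly one word, and '#' matches zero or
# more words. RabbitMQ implements this natively; this only
# illustrates the routing semantics.

def topic_match(pattern, routing_key):
    def match(p, k):
        if not p:
            return not k          # both exhausted -> match
        if p[0] == "#":
            # '#' may swallow zero or more words
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False          # pattern has words left, key does not
        if p[0] in ("*", k[0]):   # '*' or literal word match
            return match(p[1:], k[1:])
        return False
    return match(pattern.split("."), routing_key.split("."))
```

A binding such as `sysname.services.*` would thus capture exactly one trailing name segment, while `sysname.#` captures an entire subtree of identifiers.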

Authentication:

TBD

Description

RabbitMQ is an Open Source implementation of an AMQP broker (currently supporting AMQP 0.8, 0.9, and 0.9.1).

RabbitMQ is implemented in Erlang/OTP and requires an Erlang VM installation.

CIAD COI TV Rich Service Architecture

Rich Service Architecture

The COI architecture is based on the Rich Services pattern, a type of Service-Oriented Architecture (SOA) that provides decoupling between services and allows for hierarchical service composition. As depicted in Figure 1, a Rich Service comprises several entities: (a) the Service/Data Connector, which serves as the sole mechanism for interaction between the Rich Service and its environment, (b) the Messenger and the Router/Interceptor, which together form the communication infrastructure, and (c) the constituent Rich Services connected to the Messenger and Router/Interceptor that encapsulate various application and infrastructure functions.

Figure 1. Rich Services pattern

To address service integration, this architecture is organized around a message-based communication infrastructure (see Messaging). The Messenger is responsible for message transmission between communication endpoints. By providing a means for asynchronous messaging, the Messenger supports the decoupling of Rich Services. The Router/Interceptor manages the interception of messages placed on the Messenger and their routing. This is useful for the injection of policies governing the integration of a set of services. The Service/Data Connector encapsulates and hides the internal structure of the connected Rich Service, and exports only the description and interfaces that the connected Rich Service needs to be visible externally. The communication infrastructure is only aware of the Service/Data Connector, and does not need to know any other information about the internal structure of the Rich Service.
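The Router/Interceptor idea can be sketched as a chain of interceptors applied to every message before delivery. This is an illustrative model only: the class names, the interceptor signature, and the example signing/filtering hooks are assumptions, not the actual COI implementation.

```python
# Sketch of the Router/Interceptor pattern: infrastructure services
# register interceptors that may modify, re-route, or drop each message
# before it reaches the destination service's connector. All names are
# illustrative, not the actual COI implementation.

class Messenger:
    def __init__(self):
        self.interceptors = []   # infrastructure policy hooks, in order
        self.services = {}       # connector name -> message handler

    def register(self, name, handler):
        self.services[name] = handler

    def add_interceptor(self, fn):
        # fn(destination, message) -> possibly modified message, or None
        self.interceptors.append(fn)

    def send(self, destination, message):
        for intercept in self.interceptors:
            message = intercept(destination, message)
            if message is None:
                return None      # message filtered out by policy
        return self.services[destination](message)
```

For example, a signing interceptor can stamp every message and a policy interceptor can drop any unsigned message, without either the sender or the destination service being aware of these infrastructure concerns.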

Figure 2. Common Operating Infrastructure, integrating services from subsystems

Figure 2 shows the Rich Services pattern applied to the COI architecture; the other five subsystems' services are encapsulated as Rich Services connected to the COI messaging infrastructure (i.e., the Exchange). This shows the central and integrative role of the COI for the entire Integrated Observatory system-of-systems. The top of the figure depicts the infrastructure services that the COI provides to all subsystems. The COI ensures identity management, pervasive governance and policy enforcement, distributed state management, and resource management. It also enables subsystem services to be composed to handle complex interactions, manages the overall service orchestration, and enables the presentation of services to the environment. The Router/Interceptor allows for flexible composition between the infrastructure and application services. In this way, there is a clear separation between the business logic and its external constraints. At all abstraction levels, infrastructure services plugged into the Exchange can modify the interaction patterns by re-routing, filtering, or modifying exchanged messages. This feature enables the validation and signing of messages, and the injection of policies governing the integration of a set of services.

The Rich Services integration strategy enables constituent subsystems to evolve independently from the composite system. Subsystem functionality is exposed to the OOI Integrated Observatory network as services with defined service interfaces, and the only way of interacting within the OOI Integrated Observatory network is through messages. Service-orientation and messaging realize loose coupling of components, resulting in flexibility and scalability. The complexity of such a large-scale system becomes manageable by concentrating on each concern separately. Each subsystem focuses on the services that it enables and assumes that all of the infrastructure services are in place. For example, when designing the Sensing and Acquisition subsystem, the architecture team emphasizes concerns related to instrument control and data acquisition. Instruments can belong to individuals or to marine operators, while all of the deployment platforms are under the marine operator's authority domain. However, since governance is managed pervasively by infrastructure services, and can be abstracted when designing the Sensing and Acquisition services, these issues are not of concern to the Sensing and Acquisition service developers.

Each service of Figure 2 is further decomposed according to the Rich Services pattern. See Service Integration.

The Rich Services architecture provides resource location independence: user applications are shielded from the complexity of the system and the location of resources. The COI subsystem provides the Service Framework and Resource Management services that enable pervasive use of resources across the entire Cyberinfrastructure, across multiple domains of authority. Via seamless integration of identity and governance services, the COI architecture supports the deployment, operation, and distributed management of thousands of independently-owned resources of various types (e.g., instruments, processes, numerical models and simulations) across a core infrastructure operated by independent stakeholders, where each stakeholder has different policies.

Domain Models

Rich Service Architectural Pattern

Figure 3 presents the details of the Rich Services architectural pattern in terms of the entities that form the CI domain model. The main element is the Rich Service, which is composed of a Routing Interface, a Communication Interface, and other Rich Services. There are two types of Rich Services: Rich Infrastructure Services (RISs) and Rich Application Services (RASs).

Figure 3. COI Rich Services Domain Model (OV-7)

In the OOI, the Messenger/Communicator block of the Rich Service pattern is realized by the Communication Interface exposing a set of Local Queues that deliver Messages. These queues are used by Routers in the Router/Interceptor block to receive messages from and deliver messages to Rich Application Services.

The Router/Interceptor block exposes the Routing Interface to Rich Infrastructure Services. This interface is made of three entities: Communication Setup Strategies, Communication Facilities, and the Communication Infrastructure. Communication Setup Strategies are used to create different types of Communication Facilities. To perform this task, they read Communication Channels that specify the communication needs based on Science Domain Properties. The Communication Infrastructure is the entity that provides and governs all Communication Facilities in the system. It is the ultimate policy enforcer for the creation and management of communication.

A Communication Facility is the active entity that transmits messages between Routers in the OOI. To provide its functionality, the Router uses Remote Queues bound to Communication Facilities. The Remote Queue allows communication between routers using the Communication Infrastructure. The communication of each RAS is mediated by a Router. RISs can leverage the Routing Interface to apply policies to all communications of RASs. In fact, the Router component delegates policy decisions to RISs acting as Policy Enforcers.

Rich Application Services are realized by Process Instances that perform the computational tasks required by scientists for their observations. As mentioned before, a Rich Service is a hierarchical pattern where one Rich Service can be composed of other Rich Services and a Communications Infrastructure. Therefore, Policies can be defined at each level of the hierarchy inside a Rich Service and are enforced by the internal RISs acting as Policy Enforcers.

CIAD COI OV Governance Framework

Interaction Management

The COI Governance Framework is responsible for defining, monitoring, and supporting interactions between distributed entities in the system, spread across multiple domains of authority.

See COI Interaction Management

Policy Management and Governance

Based on the Interaction Management services, the Governance Framework enables the implementation of guarded message flows according to specific policies defined by the OOI and its stakeholders. Policies are bound to resources of various kinds. The Policy/Governance and the Identity Management services are tightly coupled through the Exchange service to provide an efficient implementation of the policies.

Decomposition

The governance of OOI-CI is performed by applying Policies bound to Resources. To better illustrate the interaction between the Policy/Governance services, consider the following scenario: an event comes in as an explicit request from a web browser or a facility, or as an observation request from an appropriate component (e.g., AUV A docking at a mooring point). The Identity Management System identifies the attributes of this event. The Governance Policy Enforcement Point (PEP) consults the Governance Policy Decision Point (PDP) about how to respond to the event (see Figure 1). The response typically involves exercising a capability (interacting with a resource), in which case the decision is to permit, deny, enable, or oblige the corresponding domain capability.

Figure 1. Policy Management and Governance Model (OV-2)

For example, we may set up a data stream to upload data from AUV A. The response typically also initiates or advances a Conversation. The policies read by the PDP may reference the status of the relevant conversations. For example, the network may currently be allocated to AUV B, causing the PDP to put AUV A on hold. All attributes necessary for enforcing a policy are created and managed by the Attribute Authority (see Figure 2). The requests to the PEP are usually expressed in domain-dependent form, providing for domain-meaningful specification of Policies. The PDP, however, is domain-independent and therefore expects the decision request in a canonical, domain-independent form. Translation from domain-dependent to domain-independent form, as well as collection and transformation of all necessary attributes, is provided by the Context Handler.

Figure 2. Policy/Governance Attribute Authority Model (OV-2)

Figure 2 depicts the structure of the Attribute Authority, which creates and manages all identity management and policy enforcement attributes. Besides the RIS Attribute Authority Manager, it logically consists of three repositories: Facilities and Agreements, Ongoing Conversations, and Principals. These are implemented in the Data Management subsystem; the Attribute Authority contains only their interfaces.

Domain Models

Figure 3 shows the internal dependencies of identity management, policy and governance mechanisms within the COI Governance Framework.

Figure 3. Architecture of an Agent Combining Identity and Governance (SV-1)

Governance Domain Models (OV-7)

Behavior Models

Governance Activities (OV-5) Governance Interactions (OV-6) Implications of the Policy/Governance Framework for the Resource Lifecycle

CIAD COI OV Agents and Monitoring

This page describes a simple way in which an agent (applying a principal's policies) can be reconciled with the constraints (liabilities and privileges) imposed on it by an Org Role. We can imagine two separate rule sets and one set of facts on the basis of which an agent acts. One set of rules captures the policies of the agent and produces actions that we can think of as the agent potentially attempting. The second set of rules corresponds to the normative constraints to which the agent is subject, in light of its having adopted one or more Org Roles. The latter could be partitioned on a per-Role basis, if necessary. Keeping the rule sets separate is simply a matter of modularity: they could be realized within the same instance of the logic or rule engine that the agent uses for its decision making, and that engine may even be hosted as a service on the execution environment.

An Agent represents a principal in an Org as a locus of autonomy and identity. An Org Role is an abstraction of a participant in an Org and serves as a locus of normative constraints.

The figure above illustrates a simple approach for implementing an agent, which we term pessimistic. Here the agent representing an application carries out its internal reasoning in whatever manner its designers deem appropriate. Examples of such methods include applying a rule or logic engine, applying a conventional procedural approach, or asking a user. Regardless, the application agent attempts to perform some action, either because it is proactive or because it is reactive and has received an external stimulus. All actions of the agent are mediated by a monitor, which we conceptualize as part of the middleware on top of which the agent executes. The monitor is aware of the current state of the conversation as well as any other commitments to which the agent is party. The monitor is derived from the given Org Role and includes specifications of the privileges and liabilities that define the key components of an Org Role. The monitor applies its reasoning to determine whether the attempted action may proceed. If it may, it does, thereby leading to a change in the state of the ongoing conversation. If the action may not proceed, it does not. In either case, the agent that attempted the action is notified accordingly. Notice that in the pessimistic approach, the agent may be aware of all key elements of the state of the ongoing conversations. However, the agent need not be aware of such facts and may not wish to reason about them. Here we can think of the monitor as residing at the level of the agent in conceptual terms and potentially below the agent in implementation terms.

In an alternative implementation, the agent is automatically allowed to perform any action it attempts (assuming it has acquired any and all capabilities needed for that action). The monitor merely captures the changing state of the interactions and applies its specifications of the privileges and liabilities to determine whether the agent is compliant. The monitor derives these specifications from the Org Role it represents. However, the monitor reports its results to the Org agent for the relevant Org, which might pursue sanctions against the noncompliant agent. Here we can think of the monitor as residing above the level of the agent in conceptual terms and potentially below the agent in implementation terms, because it needs to observe the actions of the agent, i.e., watch the message traffic to and from the agent.
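The two monitor placements described above can be contrasted in a small sketch. This is illustrative only: the class name, the `pessimistic` flag, and the representation of privileges as a simple action set are assumptions, not the actual agent framework.

```python
# Sketch contrasting the two monitor placements described above: the
# pessimistic monitor blocks a noncompliant action before it happens,
# while the alternative monitor lets the action proceed and reports a
# violation to the Org agent afterwards. All names are illustrative.

class Monitor:
    def __init__(self, privileges, pessimistic=True, org_agent=None):
        self.privileges = privileges   # actions the Org Role permits
        self.pessimistic = pessimistic
        self.org_agent = org_agent     # collects reported violations

    def attempt(self, agent, action):
        compliant = action in self.privileges
        if self.pessimistic and not compliant:
            return ("denied", action)           # action never happens
        result = ("performed", action)          # action proceeds
        if not compliant and self.org_agent is not None:
            # report-after mode: the Org agent may pursue sanctions
            self.org_agent.append((agent, action))
        return result
```

The same noncompliant attempt is thus stopped outright under the pessimistic placement, but performed and reported under the alternative placement.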

CIAD COI OV Federated Facility

The Federated Facility is an implementation of the Org concept to represent a facility in the Integrated Observatory Network. A facility is an independently operated domain of authority. The ION system is a federation of facilities.

Facilities in the Integrated Observatory Network

Release 1

The basic framework of a Federated Facility is introduced.

One instance of Federated Facility exists: the Integrated Observatory Network (ION) root facility.

Release 2

The concept of Marine Facility will be developed, extending the Federated Facility. A Marine Facility represents a Marine Observatory (such as CGSN, RSN) and all its resources.

Two instances of Marine Facility will be added; the ION federated facility is the integrating facility.

Release 3

The concept of Laboratory Facility will be developed, extending the Federated Facility. A Laboratory Facility represents a community-targeted facility for the purposes of Data Analysis and Synthesis.

Release 4

The concept of Classroom Facility will be developed, extending the Federated Facility. A Classroom Facility represents a facility targeted at educational communities.

Release 5

The concept of Interactive Observatory Facility will be developed, extending the Federated Facility and subsuming and integrating elements from the Marine, Laboratory, and Classroom Facilities. An Interactive Observatory Facility enables closed-loop observation, analysis, and control of all Integrated Observatory resources.

Federated Facility Management

TBD

CIAD COI OV Governance Activities

Policy/Governance Activities (OV-5)

1. The Initialize PDP Activity

The Policy Decision Point (PDP) is the heart of the policy and governance subsystem. The PDP makes all the important decisions in that subsystem. These decisions are based on the events under consideration (usually requests for actions). The PDP applies the policies given to it to the attributes of the given event, i.e., the attributes of the requester (Principal), the capability (the resource and the action to be performed on it), and some attributes of the context (such as the ongoing interaction). The attributes are supplied to the PDP at run time when a decision is needed. The policies must be known to the PDP for it to apply them. In principle, the policies could be obtained on the fly, but it is often appropriate to fix the policies ahead of usage. These policies need not be hardcoded, however. The present activity deals with loading the policies into the PDP.

This activity occurs when the system is first initialized with a bulk load of the policies that will initially determine how the system behaves. The PDP attempts to load policies and requests the repository to serve them. The repository serves the policies, which are loaded by the PDP, after which the activity terminates successfully. In case of errors (such as a hard disk failure or any other cause of an inability to load the policies), the activity terminates unsuccessfully.

Figure 1: Initialize PDP Activity (OV-5)

2. The Apply Policy Activity

This activity lays out how policies are applied when someone requests an action. The requester first obtains a token from the identity manager and makes a request to the PEP (policy enforcement point). The PEP then requests a decision from the PDP (policy decision point) to determine whether the request should be performed. The PDP then selects the appropriate policy that applies to the request; in doing so, it requests the necessary attributes from the attribute mediator if it finds attributes in the policy that are not supplied in the request.

Figure 2: Apply Policy Activity (OV-5)

For example, if a user ID is supplied in the request but the fact that the user is from a particular organization is not supplied in the request and if the policy associated with the resource considers which institution the user belongs to, then the PDP can ask the attribute authority about the institution attribute of the user. Once an appropriate policy is found that applies to the request, the PDP decides whether the policy allows the user to perform the action and conveys its decision to the PEP. The PEP performs the requested action if and only if the PDP permits it, and sends an appropriate response to the user.
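The PEP/PDP exchange with on-demand attribute resolution can be sketched as follows. This is a deliberately minimal model: the class names, the one-attribute-per-policy shape, and the institution example are illustrative assumptions, not the COI service interfaces.

```python
# Sketch of the Apply Policy flow described above: the PEP asks the
# PDP; the PDP selects the policy bound to the requested capability,
# fetches any attribute the request did not supply from the attribute
# authority (e.g. the user's institution), and returns permit/deny.
# All class, key, and attribute names are illustrative.

class PDP:
    def __init__(self, policies, attribute_authority):
        self.policies = policies  # capability -> (attribute, allowed values)
        self.attrs = attribute_authority  # (principal, attribute) -> value

    def decide(self, principal, capability, supplied):
        policy = self.policies.get(capability)
        if policy is None:
            return "deny"                     # no applicable policy
        attr, allowed = policy
        # fetch the attribute on demand if the request did not supply it
        value = supplied.get(attr) or self.attrs.get((principal, attr))
        return "permit" if value in allowed else "deny"

class PEP:
    def __init__(self, pdp, resource):
        self.pdp = pdp
        self.resource = resource  # callable performing the guarded action

    def request(self, principal, capability, supplied=None):
        decision = self.pdp.decide(principal, capability, supplied or {})
        return self.resource(capability) if decision == "permit" else "denied"
```

The PEP performs the action only on a permit, mirroring the "if and only if the PDP permits it" rule above; an unknown principal or an unknown capability both fall through to a deny.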

Resource Life Cycle Activities

Governance is applied to the full life cycle of all OOI resources. Resource activities are documented here:

Resource Life Cycle

CIAD COI OV Governance Concepts

Conversation Management

Interaction Interfaces

OOI activities are enabled by the precise specification of collaboration patterns, including the required subsystems, their interaction protocols, and a description of the information exchanged over time with each observatory activity through an interaction interface. Furthermore, cross-cutting authentication, security, governance, and policy requirements will be associated with each interaction interface.

The interaction interfaces are provisioned via the COI Governance Framework; they are bound to actual resources either at the time of deployment or at runtime to provide the required degree of flexibility in system configuration. In effect, the OOI activity model is mapped to a service-oriented process model that is supported by appropriate configuration of the orchestration plug-in of each Capability Container.

Collaboration and Policy Framework

The CI Capability Container provides collaboration, agreement support, and policy enforcement capabilities. Figure 4.1.5.2-1 illustrates this pattern for the base case of a single service provider and consumer. The pattern generalizes to arbitrary numbers of participants in a service orchestration. Conceptually, the example captures the establishment of a service agreement between two parties; for example, this could unfold between a regional cabled observatory (service provider) and a buoy-based global observatory (service consumer). Each of the parties has established contractual commitments with its respective user community, including membership agreements. Upon establishing mutual commitments, a contract between the two parties is in place. Furthermore, each party operates under a set of policies. The negotiation and contracting process, as well as the actual service usage, leads to an interaction pattern between the two parties that is constrained by the contractual commitments and policy subscriptions of both parties.

Figure 1. Collaboration and Policy Framework

Because each Capability Container is equipped with plug-ins for orchestration, governance, policy enforcement, and monitoring/audit, the deployment mapping for the collaboration and policy framework is straightforward: the corresponding interaction interface is stored and accessed CI-wide. Each party's Capability Container orchestration component executes the projection of the interaction pattern on the respective role to participate in the overall collaboration. The governance and policy constraints are extracted from the interaction interface and provided to the corresponding Capability Container plug-ins for monitoring and enforcement.

The Data Acquisition example provided above shows how the COI facilitates the interoperability and on demand mobility of the capabilities of the Sensing & Acquisition, Data Management and Common Execution Infrastructure subsystems. The core software abstraction illustrates how the COI, through the use of the CI capability container, factors out the common aspects of communication, state management, execution, governance, and service presentation to provide a very scalable, secure and extensible model for managing user-defined collections of information and taskable resources. This ability to integrate resources of different types implemented by different technologies is the central proposition of the architecture. It provides the basis for an integrated observatory network that will remain viable and pertinent over multiple decades.

Domain Models

Policy/Governance Domain Models

Figure 3.3.4.3.3-1 shows the governance communication model that describes how the main modules of the governance subsystem of the COI communicate with each other. A subsequent section elaborates the data models for these communications.

The event source or requester sends an identified event (or request) to the PEP. This consists of the specification of the principal and the requested capability (or occurred event). The principal, event, and capability are specified via suitable identifiers. This picture presumes that the Identity Management subsystem determines valid identifiers for all of the parameters of the event being communicated. Subsequently, the PEP sends the event to the PDP, which responds with a decision. The PDP requests the applicable policies from the policy repository by specifying the capability in question. The policy repository responds with zero or more applicable policies. Subsequently, the PDP requests values for certain attributes of the principal, participations, and conversations from the Attribute Authority. (Figure 3.3.4.3.3-1 illustrates this only for principals to reduce clutter.) The Attribute Authority responds with values for the specified attributes.


Figure 2. Governance Communication Model (OV-7)

Interaction

The models of this section capture a key contribution of the COI: the notion of Interaction and its Interaction Specification data structure (see Figure 3.3.4.3.3.1-1 and Figure 3.3.4.3.3.1-2). Resources have well defined Interactions according to their Capabilities. An Interaction is an instance of an Interaction Specification governed by a Policy that can be enforced by a Policy Enforcer.

Figure 3. Policy/Governance Interaction Model (OV-7)


Figure 4. Interaction Specification Model (OV-7)

A Policy captures the Interactions between technical entities of the OOI system to manage resources. Policies are captured in the Rich Service pattern by RISs. In particular, we identify two types of Policies: Local Policies that mandate the behavior of a single entity of the system and can be enforced by a single Policy Enforcer, and OOI Policies that are global and define the behavior of different entities according to the rules defined by OOI. Service Agreement Proposals are exchanged between the various entities until they reach an agreement.

Local Policies are associated with single resources and applied locally. In particular, they are associated with Process Instances and determine the concrete behavior of the Interaction Roles such instances are playing. In OOI, all behaviors are defined by Interaction Specifications. Interaction Roles and the Communication Channels they exchange Messages on, along with the corresponding Message Specifications, are the structural and behavioral interfaces of each service provided by OOI. Interaction Specifications constrain the communication behavior of all Interaction Roles that appear in them; for this reason, great care must be taken in ensuring that, if Local Policies conflict with the communication pattern required by the Interaction Specification, the failure of some component to perform as expected is managed consistently.

Policy and Governance Interactions

Figure 5 shows the interaction aspects of policy and governance. A Policy may be one of three types: Authentication, Authorization, or Obligation. A Principal applies a policy. We think of this Principal as "Local" or "Self" and the Principals with whom it interacts as "External" or "Other". A Policy considers an ongoing Interaction and controls the actions of the (Local) Principal. An Interaction Specification defines the structure of the Interactions. Each Role determines the associated Interaction Specifications.

Policy and domain capabilities are treated on a par: they just happen to deal with different kinds of COI resources. Policy resources are things such as credentials and authorizations (the is-a links to Resource are not shown, to reduce clutter). Like other resources, policy resources may be virtual in the sense that another part of the COI may have been assigned the responsibility of maintaining them. For example, a department may let the university ID service maintain the credentials. Also, an employee would be subject to those policies of their employer that apply to their project.


Figure 5. Policy and Governance Interactions Model (OV-7)

Behavior Models

The Govern activity diagram in Figure 6 incorporates two linked activities. One of these is from the perspective of the Governor and the other from the perspective of the governed, termed here the Consumer.


Figure 6. Govern Activity (OV-5)

The Governor sets policies, and presents them in a form that the Infrastructure can work with. The Consumer seeks to take an action that is constrained by those same policies. The Governor determines whether the Consumer has the rights to take the appropriate action by granting general or specific access rights and allocations to the Consumer.

These actions have set up the necessary conditions for the Infrastructure to actually enforce the policies. This enforcement occurs at each stage of activity, involving various actions taken by the infrastructure. Governance continues until the Consumer is no longer taking actions that are governed. In theory, governance applies to all resource types, although in many cases a default policy may be the only one that needs to be enforced.

An advantage of applying governance as an ongoing activity, enforced by configurable rules, is that changes to policy should not require changes to the infrastructure. This assumes that the rules are computable, i.e., expressible in a form the computer can evaluate. In this sense, the Govern activity is one of several cross-cutting activities, alongside failure management, logging, and encryption/decryption:

The Infrastructure enforces the received Policies, rights, and Allocations. The Infrastructure audits the enforcement of the received Policies, rights, and Allocations.

Communities and Agents

Our approach in distributed computing is based on the premise that independent entities interact in order to pursue shared goals. Entities can represent users, processes, resources and communities.

Entities in the system are represented by their agents. Each entity (or its agent on its behalf) can form any number of relationships with other entities. Relationships are based on mutual (bilateral) agreements between two entities, each the result of a successful negotiation. Each entity tracks the consequences (i.e., commitments [9], [16]) of such agreements (i.e., contracts) with other entities. Each observable atomic action of an entity that causes a side effect, such as sending a message, leads to a change and reevaluation of the aggregate set of commitments of the entity towards other entities.
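
The per-entity bookkeeping described above can be sketched as a commitment store that is reevaluated on every observable message. This is a minimal sketch under simplifying assumptions; the class, message kinds, and tuple representation are illustrative, not the OOI commitment language.

```python
class CommitmentStore:
    """Tracks one entity's aggregate commitments towards other entities."""

    def __init__(self):
        self.commitments = set()  # active (debtor, creditor, consequent) triples

    def on_message(self, message):
        """Reevaluate commitments after one observable atomic action."""
        kind, debtor, creditor, content = message
        if kind == "promise":      # a new commitment arises from the agreement
            self.commitments.add((debtor, creditor, content))
        elif kind == "discharge":  # the debtor brought about the consequent
            self.commitments.discard((debtor, creditor, content))
        return set(self.commitments)
```

Each sent or received message passes through `on_message`, so the effective set of commitments is defined at any given instant, as the text requires.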

Entities communicate and collaborate within communities. A community is a specific type of entity in itself. Communities serve multiple purposes in our architecture: providing a backstop for contracts, providing a locus for naming, and providing a venue to share resources, including infrastructure. A community is represented by a specification that defines the rules for joining the community. Joining a community requires accepting the rules of the community, and the community will provide the registrant entity with a local name and address.

Entities may request to enroll (i.e., participate) in communities or can be invited by other member entities into the community. Enrollment is a symmetric process of negotiation. Entities negotiate the conditions under which they participate in the community and vice versa. If agreement is reached, the resulting contract builds the basis for relations with other community members.
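
The enrollment process above can be sketched as follows. This is an illustrative sketch only: the `Community` class, its rule representation, and the contract dictionary are assumed names, and the negotiation is reduced to a single accept/reject round.

```python
class Community:
    """A community entity with joining rules and enrolled members."""

    def __init__(self, name, rules):
        self.name = name
        self.rules = set(rules)  # conditions every member must accept
        self.members = {}        # entity -> community-local name

    def negotiate_enrollment(self, entity, accepted_terms):
        # Enrollment succeeds only if the entity accepts all community rules;
        # a real negotiation would iterate over counter-proposals.
        if not self.rules.issubset(accepted_terms):
            return None
        local_name = f"{self.name}/{len(self.members) + 1}"
        self.members[entity] = local_name
        # The resulting contract is the basis for relations with other members.
        return {"member": entity, "local_name": local_name,
                "terms": sorted(self.rules)}
```

On success the registrant receives a community-local name, as described in the text; on failure no contract (and no membership) results.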

Communities can form relationships with other communities, enabling the members of one community to interact with the members of another community under the specifications of both communities. By contract, the community members are bound to the community specification with its rules, so there is no need for explicit compliance checking (i.e., policy enforcement) and members can interact directly. There might be an imposed requirement for members to leave behind audit trails for later evaluation, much as a tax rule is not directly enforced with every transaction but may later be audited for compliance for each member taxpayer.

We call the set of rules that communities (or other entities) impose policy. The policy to access a resource entity, for instance, might be an aggregate of many rules, such as the resource owner's rules, the community's rules, and any underlying obligations that follow from membership.

Figure 3 describes the key ideas of the Agent Contract Network effort in conceptual terms. Each ellipse delineates a community of participants. Communities may be nested within, be disjoint from, or partially overlap with other communities. The figure shows three communities: one called Community A, one called Community B, and one called OOI. The OOI community is the "root" community in that it defines the identities for the parties involved and provides the basic rules of encounter within OOI.

Each community specifies one or more roles. For example, Community A is a resource sharing community. It defines two roles: owner (of a resource) and user (of a resource). The community admits principals who may become a user or an owner (or both). Each owner can contribute its resources to the community, so they can be discovered by any user. A user and owner may negotiate usage terms resulting in appropriate contracts being created between each pair. These contracts govern their interactions regarding the resources they share.

Similarly, Community B models a messaging service realized as an exchange space (inspired by the emerging AMQP standard). This community describes two roles, communicator and distributor. A distributor maps to an exchange point and a communicator to either a publisher or consumer. Each party who adopts a role in this community enters into a contract with the community itself (viewed as a principal in its own right). Each communicator can discover a suitable distributor to publish information to or receive information from.

The OOI acts as an overarching authority. It provides a home for the various application-specific communities that exist within it, and supports the interactions of the principals not only by asserting their identities but potentially by helping monitor and enforce their contracts.

3.2 Conversation Management

Communication between two entities occurs as part of a conversation. A conversation presumes a contract is in place between the two entities intending to converse. This contract must include the common knowledge of an interaction pattern that provides a template for the conversation, with the conversation being an instantiation of the pattern. The actual interaction as part of the conversation must comply with the template of the interaction pattern. Each interaction (sending and receipt of a message) potentially causes a change in the set of commitments related to the conversation and, thus, indirectly in the commitments between the two entities. Interaction patterns thereby serve as distributed Assumption/Commitment specifications, in particular for policy. Each entity can independently monitor the fulfillment of the interaction pattern and contract for the other entity and for itself (and initiate protective or compensating action otherwise). Each party would thus update its commitment store based on each message it sends or receives. Each entity can engage in as many different conversations with different (or the same) entities concurrently as it likes. At any given instant, the effective set of commitments from the point of view of the entity is defined; each interaction can be traced back to a conversation.

We specify interaction patterns using Message Sequence Charts (MSCs, see [7], [9], [12]). We also define a language for commitments that are made and released for each interaction in an interaction pattern. We provide a logical framework to reason over the aggregate set of commitments over time and deduce any implications. Currently, we use a rules engine to implement such a mechanism.

The COI provides collaboration, agreement support, and policy enforcement capabilities. Figure 4 illustrates this pattern for the base case of a single service provider (instrument owner) and consumer (researcher). The pattern generalizes to arbitrary numbers of participants in a service orchestration. Conceptually, the example captures the establishment of a service agreement between two parties; for example, this could unfold between a regional cabled observatory (service provider) and a buoy-based global observatory (service consumer). Each one of the parties has established contractual commitments with their respective user communities, including membership agreements. Upon the establishment of mutual commitments, a contract between the two parties is in place. Further, each party operates under its own set of policies. The negotiation and contracting process, as well as the actual service usage, leads to an interaction pattern between the two parties that is constrained by the contractual commitments and policy declarations of both parties.

Because each Capability Container is equipped with plug-ins for orchestration, governance, policy enforcement, and monitoring/audit, the deployment mapping for the collaboration and policy framework is straightforward: the corresponding interaction interface is stored and accessed CI-wide. Each party's Capability Container orchestration component executes the projection of the interaction pattern on their respective roles to participate in the overall collaboration. The governance and policy constraints are extracted from the interaction interface and provided to the corresponding Capability Container plug-ins for monitoring and enforcement.

The COI, through the use of the CI capability container, factors out the common aspects of communication, state management, execution, governance, and service presentation to provide a highly scalable, secure and extensible model for managing user-defined collections of information and taskable resources. This ability to integrate resources of different types implemented by different technologies is the central value proposition of the architecture. It provides the basis for an integrated observatory network that will remain viable and pertinent over multiple decades.

Protocols are defined through interaction patterns. The interaction pattern (or projection thereof) represents the interaction interfaces of entities (i.e., components). The projection of a protocol on one party can be represented as a Finite State Machine (FSM). We use FSMs as protocol machines that bind the communication endpoint on an asynchronous reliable message-based system to the application logic. Figure 15 shows the use of FSMs as protocol adapters for service applications involved in a conversation as defined by an interaction pattern.
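
A protocol machine of this kind can be sketched as a transition table over states and message events. The states, events, and request/response pattern below are illustrative assumptions, not an actual OOI protocol; the sketch shows only the projection onto the requester role.

```python
class ProtocolFsm:
    """FSM protocol adapter: the requester-role projection of a
    request/response interaction pattern."""

    TRANSITIONS = {
        ("idle", "send_request"): "awaiting_reply",
        ("awaiting_reply", "recv_reply"): "done",
        ("awaiting_reply", "recv_error"): "failed",
    }

    def __init__(self):
        self.state = "idle"

    def handle(self, event):
        """Advance the protocol machine; reject out-of-pattern events."""
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = self.TRANSITIONS[key]
        return self.state
```

Because out-of-pattern events raise an error instead of silently advancing, the adapter keeps the communication endpoint and the application logic in a consistent distributed state, which is the role FSMs play in the text.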

Figure 5 shows an exemplar scenario for the application of agents for the management of physical resources such as sensors, and of services in a distributed environment. Agents interact via the Messaging Service (see Section 4 for details on the Messaging Service). Services themselves use the Messaging Service for inter-service conversations as explained above. In this case, the services' agents provide the management and control for the service, such as starting/stopping the service and granting access. Finite State Machines as protocol adapters ensure that the agents and service protocols are always in a consistent distributed state, ensuring robustness of the entire system. Service protocol adapters provide access to the service; Managed Resource Agent protocol adapters provide access to the respective resource agents. Resource agents provide monitoring and control of resources, advertise and grant access to resource capabilities, and manage the contractual relations and commitments of the resource to its environment on behalf of the resource. All these agent interactions occur in the form of conversations based on defined interaction patterns. Proxy Resource Agents provide similar capabilities and interaction patterns but act as proxies or supervisors of Managed Resource Agents. Thereby, policy can be applied at various levels within the system through a chain of responsibility.

Domain Model for Governance

Overview Governance Model

Figure 6 summarizes the key representational and operational concepts of the Agent Contract Network. A Principal is an active OOI entity. A Principal may be an Individual or an Organization (Org). An Organization realizes an Org Specification that states the Contract Templates applying to its various Org Roles. A Principal may play an Org Role, which specifies a Contract Facade consisting of the Qualifications the Principal must meet, the Liabilities it takes on in playing the Org Role, and the Privileges the Org Role grants it. In operational terms, a Principal is represented computationally via a Rule-Based Communicating Agent, which carries out Conversations with other Agents. The Conversations instantiate Interaction Specifications, which aggregate Interaction Patterns specified in terms of Interaction Roles. Each Interaction Role maps to an Org Role and supports its Contract Facade.

This model relates an Org Specification with a Contract. A Contract is specified in Figure 7 to consist of a number of clauses. Each clause of a Contract involves two or more Org Roles. In effect, each Org Role partitions its view of the relevant parts of the Contract. We model the role-relevant parts of each Contract as consisting of three components: qualifications, privileges, and liabilities. In enacting an Org, each Principal that is the actor of an Org Role Participation aggregated within that Org is affected by each of the Roles it adopts. The Principal must be suitably qualified in order to adopt the given Role. By adopting the Org Role, the Principal acquires Privileges (such as powers and authorizations), and becomes subject to various Liabilities (using the term generically to include all manner of commitments where the Principal is a debtor). These requirements on a Principal that are based on the Org Roles it plays are assembled into a Contract Façade.

The Principal applies its (autonomous) Policies, ideally to satisfy its liabilities and take advantage of its privileges. The Principal normally realizes its Contract Façade; not realizing the Contract Façade would be a violation. In general, however, we cannot guarantee compliance. There are two main ways to address the question of compliance.

One approach is to be pessimistic and ensure that the actions taken by a Principal are compliant. This is not possible in general since the Principals are autonomous and heterogeneous. However, in cases where we determine the implementation of a Principal, we can place a monitor between the Principal and the rest of the system such that the monitor would allow only the policy-compliant actions of the Principal to proceed.

An alternative approach is to be optimistic: we assume the Principals proceed without any low-level intervention, but detect and handle noncompliant behavior. This we can accomplish in two ways: either by introducing architectural constructs for monitoring or through the Principals monitoring each other, potentially escalating matters when there is a problem. Such escalation would be to the Principal that is the Org in whose scope the given Org and its contract exist.

OOI is an Org that serves as the highest scope for the Orgs that we define here. The OOI Org also provides identity management within this effort.

Contract Model

Figure 7 presents in detail the model for contracts. We model a contract recursively as a set of contracts with the recursion bottoming out as a set of clauses. The recursion is unnecessary in a way but offers a more intuitive representation when compared with real-life contracts where the clauses are structured and the contract thus exhibits a repeating structure.

The clauses in real-life contracts fall into several major categories.

The Main Clauses deal with what the contract is about and the main "business" reason for having a contract in the first place. Naively, one can treat a contract as applying between parties that can be viewed as black boxes. However, this is usually not the case in contracts of any importance or complexity.

The Normative Clauses deal with matters that are important to the regulations and policies that apply to the interactions among the parties to the contract. The Normative Clauses are thus of special importance to our proposed use of contracts for governance.

The Visibility Clauses deal with how much access the parties to the contract have to the internal implementations of each other. In general, each party would rely upon such clauses to make sure that the work product is of adequate quality, that the effort is robust, and that it does not violate any laws or regulations to which one of the parties might be subject.

The Scoping Clauses specify the purpose and scope of a contract. These are crucial in typical business contracts because of their potential effect on the legal rights of the parties involved. We expect these to be rather straightforward in most OOI governance settings, although the main OOI membership EULA would describe the scoping requirements for when users sign up for an OOI account.

The Resolution Clauses deal with how to respond to failures in a contract, including the possibility of sanctions (of violators) and compensations (by violators). The most likely forms of sanctioning will be through the somewhat amorphous means of reputation and via escalation of complaints to the Org that provides the scope for a contract. The Org may sanction a Principal that it judges to be malfeasant by ejecting such a Principal from the Org and by escalating a complaint further. A malfeasant Principal may be ejected from OOI and declared persona non grata.

CIAD COI OV Governance Domain Models

Domain Models

Figure 1 shows the governance communication model. This describes how the main modules of the governance subsystem of the COI communicate with each other. A subsequent section elaborates the data models of these communications.

The event source or requester sends an identified event (or request) to the PEP. This consists of the specification of the principal and the requested capability (or occurred event). The principal, event, and capability are specified via suitable identifiers. This picture presumes that the Identity Management subsystem determines valid identifiers for all the parameters of the event being communicated. The PEP then sends the event to the PDP, which responds with a decision. The PDP requests the applicable policies from the policy repository by specifying the capability in question. The policy repository responds with zero or more applicable policies. The PDP then requests values for certain attributes of the principal from the Attribute Authority. The Attribute Authority responds with values for the specified attributes.

Figure 1: Governance Communication Model (OV-7)

1. Interaction

The models of this section capture a key contribution of the COI: the notion of Interaction and its Interaction Specification data structure (see Figure 2 and Figure 3). Resources have well defined Interactions according to their Capabilities. An Interaction is an instance of an Interaction Specification, governed by a Policy that can be enforced by a Policy Enforcer.

Figure 2: Interaction Model (OV-7)

A Policy captures the Interactions between technical entities of the OOI system to manage resources. Policies are captured in the Rich Service pattern by RISs. In particular, we identify two types of Policies: Local Policies, which mandate the behavior of a single entity of the system and can be enforced by a single Policy Enforcer, and OOI Policies, which are global and define the behavior of different entities according to the rules defined by OOI. A Service Agreement Proposal is exchanged between the various entities until they reach an agreement.

Local Policies are associated with single resources and applied locally. In particular, they are associated with Process Instances and determine the concrete behavior of the Interaction Roles such instances are playing. In OOI, all behaviors are defined by Interaction Specifications; therefore, Interaction Roles and the Communication Channels they exchange Messages on, along with the corresponding Message Specifications, are the structural and behavioral interfaces of each service provided by OOI. Interaction Specifications constrain the communication behavior of all Interaction Roles that appear in them; for this reason, great care must be taken in ensuring that, if Local Policies conflict with the communication pattern required by the Interaction Specification, the failure of some component to perform as expected is managed consistently.

Figure 3: Interaction Specification Model (OV-7)

Figure 4 captures the main elements of a generic language to describe Interaction Specifications. An Interaction Specification is itself an Interaction Element; therefore, it can be as simple as a Local Action (for example, switching off a sensor in an instrument) or a complex Composite Interaction with Operators, multiple Roles, Communication Channel Specifications, and Messages.

This generic language for Interaction Specifications allows us, in particular, to use widely used notations such as Message Sequence Charts (MSCs) or Unified Modeling Language (UML) Sequence Diagrams for writing down Interaction Specifications. Both of these languages support Operators for sequential and parallel composition, choice, and repetition; in addition, the generic language also supports a powerful operator (called join) for overlapping Interaction Specifications, i.e., Interaction Specifications that share at least one role and at least one Message among the shared roles.

In addition, each Interaction Element can have Science Domain Properties associated with it. Such properties are expressed in a Science Ontology and allow Interaction Specifications to be described in terms of properties of relevance to the scientists using the system. The Science Domain Properties are used to bind the abstract concept of interaction to the concrete problems of the scientist. As an example, consider a Message Specification, which describes information sent by a particular Interaction Role. A science domain property could allow the scientist to state that the message contains a temperature expressed in degrees Celsius. The architecture allows science properties to be expanded by plugging in new science ontologies; therefore, the interaction specification language does not contain those concepts natively but as properties expressed in a generic Science Ontology.
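
The shape of this language can be sketched as a small abstract syntax: an Interaction Element is either a Local Action or a Composite Interaction built from an operator and sub-elements, with join restricted to overlapping specifications. This is an illustrative sketch; the class names, operator strings, and the `join` helper are assumptions, not the actual interaction language.

```python
class LocalAction:
    """Atomic Interaction Element performed by a single Interaction Role."""

    def __init__(self, role, action):
        self.role, self.action = role, action

    def roles(self):
        return {self.role}


class Composite:
    """Composite Interaction: an operator applied to sub-elements."""

    def __init__(self, operator, *elements):
        self.operator, self.elements = operator, elements  # e.g. seq, par, choice, loop

    def roles(self):
        result = set()
        for e in self.elements:
            result |= e.roles()
        return result


def join(a, b):
    """Join is only defined for overlapping specifications (a shared role)."""
    if not (a.roles() & b.roles()):
        raise ValueError("join requires at least one shared role")
    return Composite("join", a, b)
```

A Science Domain Property could be attached to any such element as extra metadata without changing this core structure, which mirrors how the text keeps science concepts out of the language itself.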

Figure 4: Interaction Language Model (OV-7)

2. Policy and Governance Interactions

The figure below illustrates the main idea in our approach: we express all interactions, especially those corresponding to governance, as arising among autonomous principals. The principals adopt organizational roles so as to participate in one or more Orgs. Each Org helps structure the interactions among the principals that feature in it. Each such participation is specified via the contracts that each Org imposes. The aggregate effect of the contracts from an Org on a single principal is termed its Contract Façade. The operational interactions among the principals are captured in Interaction Specifications, each such specification supporting the contracts that come with the participation. The figure below presents a combined view of governance with the underlying operational interactions.

A specific implementation type that we are considering is a rule-based communicating agent, which stores the applicable rules and information about the state of the world and of ongoing interactions in a knowledge base. We are prototyping such an agent using the Magnet Framework and the Java Expert System Shell (Jess) with plans to migrate to a pure Python-based solution using the PyKE logic engine.
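
A toy version of such an agent can be sketched as a forward-chaining loop over a knowledge base of facts. This stands in for an engine such as Jess or PyKE and is purely illustrative; the `RuleAgent` class, string-encoded facts, and rule representation are assumptions for the sketch.

```python
class RuleAgent:
    """Rule-based communicating agent: facts about the world and ongoing
    interactions live in a knowledge base; rules derive new facts."""

    def __init__(self):
        self.kb = set()    # known facts (here: plain strings)
        self.rules = []    # (antecedents, consequent) pairs

    def add_rule(self, antecedents, consequent):
        self.rules.append((frozenset(antecedents), consequent))

    def assert_fact(self, fact):
        """Add a fact (e.g. from a received message) and rederive."""
        self.kb.add(fact)
        self._forward_chain()

    def _forward_chain(self):
        changed = True
        while changed:
            changed = False
            for antecedents, consequent in self.rules:
                if antecedents <= self.kb and consequent not in self.kb:
                    self.kb.add(consequent)
                    changed = True
```

A real engine would add pattern matching over structured facts and retraction; the sketch only shows how incoming interaction events update the agent's state of the world.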

3. Governance Model

This model relates an Org Specification with a Contract. (A Contract is specified in another model to consist of a number of clauses.) Each clause of a Contract involves two or more Org Roles. In effect, each Org Role partitions its view of the relevant parts of the Contract. We model the role-relevant parts of each Contract as consisting of three components: qualifications, privileges, and liabilities.

The figure below presents the governance model.

A Principal (a subclass of Identity) is an entity with an OOI identity and possibly with authorization to engage in various interactions. Some principals are people; others might be subsystems, which are modeled as Orgs. A Principal may serve as an Attribute Authority. A Principal may function as a policy enforcer. An Individual is a subclass of Principal that refers to a user, a sensor, or a software application -- anything conceptualized as being a locus of policy decisions and enforcement but without internal structure. The term Org denotes an organization, such as a virtual lab or a facility, which provides the context within which a policy applies. Orgs can play a monitoring or logging function, among others. Each Org is based on a template called an Org Specification that defines a default set of roles. An Org Role describes a participant in an Org and specifies in which interactions it takes part and in what capacity. The atomic roles are: Registrar, Searcher, Registrant, Provider, Announcer, Operator, User, Requester, and Negotiator. An Org Role is distinguished from an Interaction Role, introduced below. The concept of Org Role Participation refers to a combination of Principal, Org, and Org Role. Participation is used to define who (an entity with an OOI identity) plays what role (for instance, Searcher) in which organization (also an entity with an OOI identity).

Example: Dr. Chu might define his laboratory as an Org. He might do so by instantiating the Org Specification Template-#3. Template-#3 might define three Roles: Researcher, Guest, and Administrator, each specifying what qualifications are needed for a participating principal, what privileges (such as capabilities authorized) they accord participating principals, and what liabilities (such as terms and conditions on behavior) they impose upon participating principals. For example, a Researcher needs to have a prior invitation from Dr. Chu and a Guest needs to be affiliated as a student or faculty member with a sister institution. Only a Researcher may curate a dataset. A Guest may read any dataset that has been curated. However, the Guest's publications that refer to the dataset are embargoed for six months.
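
The Dr. Chu example can be sketched as follows: an Org defines roles with qualifications, privileges, and liabilities, and the aggregate effect of a Principal's adopted roles is its Contract Facade. This is a minimal illustrative sketch; the classes and the credential strings are assumptions, not the OOI governance model implementation.

```python
class OrgRole:
    """A role in an Org Specification: qualifications to adopt it,
    privileges it accords, liabilities it imposes."""

    def __init__(self, name, qualifications, privileges, liabilities):
        self.name = name
        self.qualifications = set(qualifications)
        self.privileges = set(privileges)
        self.liabilities = set(liabilities)


class Org:
    def __init__(self, name, roles):
        self.name = name
        self.roles = {r.name: r for r in roles}
        self.participations = {}  # principal -> set of adopted role names

    def adopt(self, principal, role_name, credentials):
        """Org Role Participation: allowed only if the principal qualifies."""
        role = self.roles[role_name]
        if not role.qualifications <= set(credentials):
            raise ValueError(f"{principal} not qualified for {role_name}")
        self.participations.setdefault(principal, set()).add(role_name)

    def contract_facade(self, principal):
        """Aggregate effect of all adopted roles on one principal."""
        facade = {"privileges": set(), "liabilities": set()}
        for role_name in self.participations.get(principal, ()):
            role = self.roles[role_name]
            facade["privileges"] |= role.privileges
            facade["liabilities"] |= role.liabilities
        return facade
```

Here adopting Researcher requires a prior invitation and grants the curate privilege, while Guest grants read access to curated datasets subject to a publication embargo liability, matching the example in the text.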

In enacting an Org, each Principal that is the actor of an Org Role Participation aggregated within that Org is affected by each of the Roles it adopts. The Principal must be suitably qualified in order to adopt the given Role. By adopting the Org Role, the Principal acquires Privileges (such as powers and authorizations), and becomes subject to various Liabilities (using the term generically to include all manner of commitments where the Principal is a debtor). These requirements on a Principal that are based on the Org Roles it plays are assembled into a Contract Façade.

The Principal applies its (autonomous) Policies, ideally to satisfy its liabilities and take advantage of its privileges. The Principal normally realizes its Contract Façade; not realizing the Contract Façade would be a violation. In general, however, we cannot guarantee compliance. There are two main ways to address the question of compliance:

One approach is to be pessimistic and ensure that the actions taken by a Principal are compliant. This is not possible in general, since the Principals are autonomous and heterogeneous. However, in cases where we determine the implementation of a Principal, we can place a monitor between the Principal and the rest of the system such that the monitor would allow only the policy-compliant actions of the Principal to proceed.

An alternative approach is to be optimistic: we assume the Principals proceed without any low-level intervention, but detect and handle noncompliant behavior. This we can accomplish in two ways: either by introducing architectural constructs for monitoring or through the Principals monitoring each other, potentially escalating matters when there is a problem. Such escalation would be to the Principal that is the Org in whose scope the given Org and its contract exist.

OOI is an Org that serves as the top scope for the Orgs that we define here. The OOI Org also provides identity management within this effort.

Contracts

The notion of an Org in the above sense is therefore intimately tied to the notion of contracts. An Org specification expresses such contracts in terms of the roles defined in the Org (called Org Roles above). When principals take on different Org roles, they must satisfy the qualification components of the corresponding contracts and adopt the liability and privilege components of the contracts that go with such roles.

We model a contract recursively as a set of contracts with the recursion bottoming out as a set of clauses. The recursion is unnecessary in a way but offers a more intuitive representation when compared with real-life contracts where the clauses are structured and the contract thus exhibits a repeating structure. The clauses in real-life contracts fall into several major categories.
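
The recursive structure can be sketched directly: a Contract is a set of sub-contracts and clauses, with the recursion bottoming out in clauses. The class names and category strings below are illustrative assumptions.

```python
class Clause:
    """A leaf of the contract structure, tagged with its category."""

    def __init__(self, category, text):
        self.category = category  # e.g. "main", "normative", "resolution"
        self.text = text

    def clauses(self):
        return [self]


class Contract:
    """A contract is recursively a set of sub-contracts and clauses."""

    def __init__(self, parts):
        self.parts = parts

    def clauses(self):
        """Flatten the recursive structure into its constituent clauses."""
        result = []
        for part in self.parts:
            result.extend(part.clauses())
        return result
```

As the text notes, the recursion is not strictly necessary (any contract flattens to its clauses), but it mirrors the repeating structure of real-life contracts.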

The Main Clauses deal with what the contract is about and the main "business" reason for having a contract in the first place. Naively, one can treat a contract as applying between parties that can be viewed as black boxes. However, this is usually not the case in contracts of any importance or complexity.

The Normative Clauses deal with matters that are important to the regulations and policies that apply to the interactions among the parties to the contract. The Normative Clauses are thus of special importance to our proposed use of contracts for governance.

The Visibility Clauses deal with how much access the parties to the contract have to the internal implementations of each other. In general, each party would rely upon such clauses to make sure that the work product is of adequate quality, that the effort is robust, and that it does not violate any laws or regulations to which one of the parties might be subject.

The Scoping Clauses specify the purpose and scope of a contract. These are crucial in typical business contracts because of their potential effect on the legal rights of the parties involved. We expect these to be rather straightforward in most OOI governance settings, although the main OOI membership EULA would describe the scoping requirements for when users sign up for an OOI account.

The Resolution Clauses deal with how to respond to failures in a contract, including the possibility of sanctions (of violators) and compensations (by violators). The most likely forms of sanctioning will be through the somewhat amorphous means of reputation and via escalation of complaints to the Org that provides the scope for a contract. The Org may sanction a Principal that it judges to be malfeasant by ejecting such a Principal from the Org and by escalating a complaint further. A malfeasant Principal may be ejected from OOI and declared persona non grata.

Each clause in a contract maps to the commitments among the principals who participate in that clause. A commitment here is an element of a normative relationship. A commitment is defined in terms of a debtor (a principal or role), a creditor (a principal or role), a context (a principal or role that represents an Org within which the debtor and creditor function), an antecedent, and a consequent. When a commitment is active, it behaves analogously to a directed obligation from the debtor to the creditor. When the antecedent comes to hold (usually because the creditor did its part), the commitment changes into a stronger commitment whereby the debtor must bring about the specified consequent. Notice that because of the autonomy of the principals, we do not require that each commitment be discharged. However, the failure to discharge a commitment can be treated as noncompliance on the part of the debtor, who can -- depending on the contracts defined in the given Org -- be penalized for such noncompliance. As described elsewhere, in purely IT Orgs, the penalty may take the form of the revocation of credentials, in essence ejecting a (persistently) noncompliant principal from the given Org. We consider two broad types of commitments. Practical commitments are commitments to act, that is, to bring about a specified consequent. Dialectical commitments are commitments that pertain to the assertion of a fact, or of something treated as a fact within the scope of an interaction.

Specifically, each clause is captured via one or more commitments, which are either Practical or Dialectical.

An expression _C(debtor, creditor, antecedent, consequent)_ denotes a Practical Commitment. It means that the debtor commits to the creditor that, if the antecedent holds, the debtor will bring about the consequent. This corresponds to an offer. Example: if NCSU requests to read instrument I-99, MBARI will supply the data from instrument I-99.

An expression _D(debtor, creditor, antecedent, consequent)_ denotes a Dialectical Commitment. It means that the debtor commits to the creditor that, if the antecedent holds, the debtor stakes a claim about the veracity of the consequent. Example: UCSD certifies that Graybeal-99 is a valid user.
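The C(...) and D(...) expressions admit a minimal sketch along the following lines. The helper functions are hypothetical, and the creditor chosen for the certification example (here, "OOI", which the text leaves unspecified) is an illustrative assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Commitment:
    kind: str        # "C" (practical) or "D" (dialectical)
    debtor: str
    creditor: str
    antecedent: str
    consequent: str

def C(debtor: str, creditor: str, antecedent: str, consequent: str) -> Commitment:
    """Practical commitment: if the antecedent holds, the debtor will bring about the consequent."""
    return Commitment("C", debtor, creditor, antecedent, consequent)

def D(debtor: str, creditor: str, antecedent: str, consequent: str) -> Commitment:
    """Dialectical commitment: if the antecedent holds, the debtor stakes a claim that the consequent is true."""
    return Commitment("D", debtor, creditor, antecedent, consequent)

# The two examples from the text:
offer = C("MBARI", "NCSU", "NCSU requests to read instrument I-99",
          "MBARI supplies the data from instrument I-99")
claim = D("UCSD", "OOI", "true", "Graybeal-99 is a valid user")
```

Using "true" as the antecedent of the certification captures an unconditional claim: the dialectical commitment holds regardless of any prior act by the creditor.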

Contract Lifecycle Model

The contract lifecycle model describes how contracts come into being (through negotiation) and how they are fulfilled or otherwise terminated.

The model captures the entire lifecycle for contracts: authoring/negotiation, execution, and termination. OOI needs different lifecycle models for different kinds of entities (instruments, processes, etc.); this model focuses on contracts. The model takes a peer-to-peer stance on contracts: the main states depend upon the actions of the participants, of which there may be two or more. The model includes the following key phases:

Negotiation. This phase is when the contract comes into being. The contract is created through a series of communicative acts. A negotiation is initiated when one party proposes to another party. The parties may make zero or more counterproposals to each other. The negotiation ends when one of the parties rejects the last proposal or all the parties accept it. It is presumed that the proposer accepts its own proposal, so in two-party settings only one party (the recipient of the last propose or counterpropose) needs to accept.

Execution. A negotiated contract enters the execution phase. It is initially inactive. The idea is that contracts are often for service enactments and therefore include standing commitments. Example: MBARI agrees to provide a dataset whenever NCSU requests it, but there is nothing for anyone to do until the first request comes in. An inactive contract becomes active when one or more of the services it covers are requested. A service request may require actions by more than one party in order to be fulfilled -- in other words, fulfilling a service request may call for a long-lived conversation among the participants to the contract and thus cannot be accurately modeled in client-server terms.

Monitoring. This phase occurs during execution. In almost all cases, the parties to the contract have to agree that the desired service was performed. Thus they must assess the outcomes, if only to declare success. Therefore, some level of monitoring is essential. However, we also envision more elaborate forms of monitoring of the contract execution. Such monitoring would be specified by the monitoring clauses in the contract (see the contract domain model).

Resolution. If the monitoring uncovers problems with the contract execution, i.e., the violation of one or more contract clauses, the contract enters the resolution phase. Here, we apply the resolution clauses from the contract specification to determine a way to correct the situation, including imposing penalties or offering compensations to the aggrieved parties. In some cases, these might take the form of additional contract clauses being instantiated and thrown into the mix for execution. Some of the clauses may require additional negotiation (in a recursive manner). For example, if MBARI fails to deliver the contracted dataset, it may offer to waive the fees for a subsequent data request whose parameters might be negotiated among the concerned parties. When the resolution is unsuccessful, the contract is considered failed.

Termination. When the termination clauses of the contract are satisfied, it enters the termination phase. This is when the contract may be archived or analyzed.
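The lifecycle phases can be sketched as a small state machine. The phase names and the exact transition set below are our illustrative reading of the lifecycle description, not a normative part of the model.

```python
from enum import Enum, auto

class Phase(Enum):
    NEGOTIATION = auto()
    INACTIVE = auto()     # negotiated, but no service yet requested
    ACTIVE = auto()       # execution (with monitoring) in progress
    RESOLUTION = auto()   # a clause violation is being resolved
    TERMINATED = auto()   # termination clauses satisfied
    FAILED = auto()       # negotiation rejected or resolution unsuccessful

# Allowed transitions, following the phases described above.
TRANSITIONS = {
    Phase.NEGOTIATION: {Phase.INACTIVE, Phase.FAILED},       # accepted / rejected
    Phase.INACTIVE:    {Phase.ACTIVE, Phase.TERMINATED},     # first request / termination
    Phase.ACTIVE:      {Phase.RESOLUTION, Phase.TERMINATED}, # violation found / termination
    Phase.RESOLUTION:  {Phase.ACTIVE, Phase.FAILED},         # resolved / unresolved
}

class ContractLifecycle:
    def __init__(self):
        self.phase = Phase.NEGOTIATION

    def advance(self, to: Phase) -> None:
        """Move to a new phase, rejecting transitions the model does not permit."""
        if to not in TRANSITIONS.get(self.phase, set()):
            raise ValueError(f"illegal transition {self.phase.name} -> {to.name}")
        self.phase = to
```

Note that RESOLUTION can loop back to ACTIVE, reflecting the recursive negotiation of resolution clauses described above.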

A contract that terminates with some base-level commitments pending is considered violated. One or more of the parties may escalate such a contract, for example, through litigation or equivalent means to complain to a higher authority or by communicating with the community of practice in a bid to lower the reputation of the party they claim violated its contract.

3. Policy Types

The figure below shows the interaction aspects of policy and governance.

A Principal applies a policy. We think of this Principal as "Local" or "Self", and the Principals with whom it interacts as "External" or "Other". A Policy considers an ongoing Interaction and controls the actions of the (Local) Principal. An Interaction Specification defines the structure of Interactions. Each Role determines the associated Interaction Specifications.

An Interaction is a conversation among two or more principals, or a conversation between a principal and a resource. Examples of the former are a Request to Register a data stream or to decommission an instrument (the request is from a principal to a principal, who can respond appropriately). Examples of the latter are Read, Create, Turn On, Delete, Publish, and Turn Off, where a principal tells a resource what to do. An Interaction aggregates one or more Signed Messages. Each Principal's view of an Interaction constitutes an Interaction View. A Communicative Act is an element of an Interaction View, and the level at which policy applies. In practical or operational terms, a Communicative Act is a Message.

The Communicative Acts, when viewed in the peer-to-peer perspective, lead to interactions as conversations. These are constrained by Interaction Specifications, usually constructed from a number of Interaction Patterns, each yielding two or more Interaction Roles. A Principal thus adopts a number of Interaction Roles concurrently. The Interaction Roles adopted by a Principal help it behave according to the Org Roles in which it participates, and thus to realize the Contract Façade to which the Principal is subject. The Interaction Roles of a Principal do not guarantee its compliance with the applicable Contract Façade, simply because the Principal's autonomous Policy determines how it actually behaves. However, the Interaction Roles provide an essential structure to the Principal's interactions that would normally help it satisfy its applicable Contract Façade.
The Communicative Acts can also be viewed in a command-and-control sense, wherein a Principal executes the Capabilities supported by its Resources. These Capabilities are modeled as an aggregation of Organizational and Domain Capabilities. Organizational Capabilities relate to organizational relationships and their concomitant normative and contractual structures. Domain Capabilities relate to performing the tasks that a Resource is designed to support, without any inherent alteration of the organizational relationships. Examples of such Domain tasks are reading a database or causing an underwater autonomous vehicle to navigate a specified trajectory in the ocean.

A Policy controls this action and represents the reasoning behind the decision made by a Principal. Policy and domain capabilities are treated on a par: they just happen to deal with different kinds of COI resources. Policy resources are things such as credentials and authorizations (the is-a links to Resource are not shown, to reduce clutter). Like other resources, policy resources may be virtual, in the sense that another part of COI may have been assigned the responsibility of maintaining them. For example, a department may let the university ID service maintain the credentials. Also, an employee would be subject to those policies of their employer that apply to their project.

The operation model shown below specifies how the Principals (realized computationally via agents) implement their operations in order to satisfy the governance requirements imposed upon them by the Org Role Participations in which they feature. This model is centered on communications among Principals, captured via Communicative Acts. Each Principal applies its perspective to a communication. Under the Self perspective, it applies its policies to determine how to communicate (in proactive and reactive terms, as necessary).

Figure 5: Policy and Governance Interactions Model (OV-7)

The model in Figure 6 presents a general way to think of policies. It is based on the types of policies for governance required for the resource activities identified in the section on Resource Management. The same cause-and-effect representation is used in the tabular representation of the use cases in Section 2.1.1.2.

A Cause is what causes a Policy to be applied. Causes can be Reactive or Proactive. Reactive causes occur in reactive policies, which respond to some event, such as a communication. Proactive causes occur in proactive policies, which involve the Principal demonstrating initiative in making an observation or otherwise taking some action not solicited by an external event. An Effect is an outcome that a Policy has, that is, the decision to which it leads. A Policy may authorize another Principal to exercise some Capability, oblige the local Principal to exercise some Capability, or merely enable the local Principal to exercise some Capability.
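A minimal sketch of the cause/effect distinction, assuming a simple event-driven setting; the class and field names below are illustrative, not drawn from the specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class CauseKind(Enum):
    REACTIVE = "reactive"    # triggered by an event, e.g. a communication
    PROACTIVE = "proactive"  # the principal takes initiative unprompted

class EffectKind(Enum):
    AUTHORIZE = "authorize"  # permit another principal to exercise a capability
    OBLIGE = "oblige"        # require the local principal to exercise a capability
    ENABLE = "enable"        # merely enable the local principal to exercise it

@dataclass
class Policy:
    cause: CauseKind
    effect: EffectKind
    capability: str

    def applies_to(self, event: Optional[str]) -> bool:
        """A reactive policy needs a triggering event; a proactive one does not."""
        if self.cause is CauseKind.REACTIVE:
            return event is not None
        return True
```

For example, a reactive authorize policy on a read capability would only fire upon receiving a request, whereas a proactive oblige policy could fire on the principal's own initiative.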

Figure 6: Policy Types Model (OV-7)

5. Org Specification

We address the formalization of Org specifications by introducing a vocabulary motivated by the above models, as well as a general representation of the formal structure of specifications, based on our study of the logic- and rule-based formalisms commonly used, and envisaged to be used, in the realization of the governance framework. Based on these, we provide a specification language for Orgs, which we have employed in our prototyping efforts.

In our rule-based framework, the vocabulary consists of the properties that we can use to express the policies, commitments, and other elements of an Org Specification. The figure below introduces our key vocabulary.

We use the term Property to distinguish it from Predicate, which for historical reasons we take to include even the Procedural Attachments (external actions) supported by the Then clauses of our target rule languages. We split the properties into three main categories. The Statives are properties that describe the state of a Resource or an organization (through Participation and Normative relationships). The Manner properties describe, loosely, the adverbials, such as those of time and place, by which to modify the other properties. The Action properties are the main part of the taxonomy and enumerate the action verbs.

The Communicative Acts are the actions that can be performed autonomously by the Principals, and which we can assume the underlying architecture will convey to the other Principals whom they (the senders) designate. The Communicative Acts map to Domain and Organizational Capabilities, as explained in the Governance (Operation) Model. The Domain Capabilities in this model are simplified to the Evaluate (evaluation) of an expression or the Apply (application) of a resource capability.

The Organizational Capability properties include the Participation, Resource Registration, and Normative Capabilities. Participation deals with which Principal plays which Org Role. Resource Registration deals with Contributing and Withdrawing Resources from an Org. The Normative Capabilities provide a taxonomy of the various commitment operations, as well as additional operations dealing with the other Normative relationships. Note that these include some terms for which other synonyms are sometimes used: specifically, Propose, Accept (proposal), Reject (proposal), and Revoke map to Create, Create, Release, and Cancel, respectively. The Resource Registration Capabilities are special in that they treat Resources from an organizational perspective. Likewise, the Evaluate Stative Capability, though it is placed under Domain Capability, enables us to evaluate the statives corresponding not just to Resources but also to Participation and Normative relationships.
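The synonym mapping for the commitment operations can be captured directly; `canonical` is a hypothetical helper name for illustration.

```python
# Mapping from the vocabulary's commitment-operation terms to the
# synonyms sometimes used in the commitment literature, as stated above.
SYNONYMS = {
    "Propose": "Create",
    "Accept": "Create",   # accepting a proposal also creates the commitment
    "Reject": "Release",
    "Revoke": "Cancel",
}

def canonical(term: str) -> str:
    """Return the canonical commitment operation for a vocabulary term,
    leaving unmapped terms unchanged."""
    return SYNONYMS.get(term, term)
```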


The figure below introduces our representation of rules, such as those underlying our specifications of the normative and other rules, based on the foregoing vocabulary.


Material Covered

After reading this page, you should be able to answer the following questions:

(to be provided)


CIAD COI OV Governance Interactions

Core Governance Interaction Patterns (OV-6)

In this section, we detail core interaction patterns for policy/governance and identity management. We refer to the logical decomposition of these services and to the underlying message-based communication mechanism.

1. Interactions in Initializing the PDP

Before the governance module can start deciding on requests for action, it needs to be initialized with policies. When the PDP (Policy Decision Point) module bootstraps, it is configured with the location of the repository, to which it sends a request for policies. The repository serves the policies, which are then loaded into the PDP.

Figure 1: Initializing PDP Interaction Pattern (OV-6)

2. Interactions in a Request for Action

Figure 2 shows how a requester (for example, the registrant in the subsequent interaction sequence diagram) requests an action via the governance module. The requester first registers with the identity provider and receives a token. Registration (not to be confused with the OV5 register activity) is detailed in the subsequent section on Identity Management and Authentication. The requester then presents the token to the governance module and makes a request to execute a function. The governance module receives the request in its PEP (Policy Enforcement Point) module, which delegates the decision to its PDP module. The PDP first verifies the token (either with the identity provider or by using a public key), then selects a policy that applies to the request. If the PDP needs additional attributes, it issues a request to the attribute authority, which responds with attribute-value pairs. The PDP then evaluates the policy and decides whether the requested capability should be granted. The PEP executes the request and responds with a success/failure message.
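The PEP/PDP interaction described above can be sketched as follows. The callback-based wiring (token verifier, per-action policy table, attribute authority) is an illustrative simplification of the actual services, not their interfaces.

```python
from typing import Callable, Dict

class PDP:
    """Policy Decision Point: verifies tokens, selects and evaluates policies."""
    def __init__(self,
                 verify_token: Callable[[str], bool],
                 policies: Dict[str, Callable[[Dict[str, str]], bool]],
                 attribute_authority: Callable[[str], Dict[str, str]]):
        self.verify_token = verify_token
        self.policies = policies                  # action -> predicate over attributes
        self.attribute_authority = attribute_authority

    def decide(self, token: str, requester: str, action: str) -> bool:
        if not self.verify_token(token):          # e.g. via the IdP or a public key
            return False
        policy = self.policies.get(action)        # select the applicable policy
        if policy is None:                        # no applicable policy: deny
            return False
        attrs = self.attribute_authority(requester)  # fetch attribute-value pairs
        return policy(attrs)                      # evaluate the policy

class PEP:
    """Policy Enforcement Point: receives requests and delegates decisions to the PDP."""
    def __init__(self, pdp: PDP, execute: Callable[[str], str]):
        self.pdp = pdp
        self.execute = execute

    def request(self, token: str, requester: str, action: str) -> str:
        if self.pdp.decide(token, requester, action):
            return self.execute(action)
        return "failure: denied"
```

The deny-by-default behavior when no policy applies is our assumption; the specification does not state how an unmatched request is handled.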

Figure 2: Request for Action Interaction Pattern (OV-6)

3. Interactions in the (OV5) Register Activity

Figure 3 shows two policy use cases associated with the OV5 register activity. This diagram (like all subsequent ones) omits the detailed authentication/authorization interaction patterns that the registrant (researcher) goes through before it receives the response from the registrar.

Figure 3: Register Activity Interaction Pattern (OV-6)

4. Interactions in the (OV5) Commission Activity

A deployment engineer completes verification of the deployment of a resource (for example, a sensor that had been requested for commissioning) and sends a "deployed" message to the provider. The provider validates the deployment according to its policy. This diagram omits the detailed authentication/authorization interaction patterns that the provider goes through before it receives the response from the deployment engineer.

Figure 4: Commission Activity Interaction Pattern (OV-6)

CIAD COI OV Governance Use Cases

This page introduces the main use cases that help understand some of the features of the governance effort. We include some conceptual background but largely avoid providing a detailed analysis for the use cases that we include.

1 Introduction

We define governance as the administration of independent parties by themselves. Further, our interest lies in large-scale distributed systems where we seek computational means to accomplish governance. In other words, the parties in question have computational surrogates and their administrative decision-making is based on computational representations. This contrasts with much existing work on IT or SOA governance, which is primarily concerned with human-to-human interactions leading to decisions pertaining to enterprise resource planning, including the development, commissioning, and decommissioning of IT resources and other services.

Our primary motivation for an automated treatment of governance is simply to improve the quality and scale of resource administration. Further, we are motivated by governance not only in human-to-human interactions but for all resources up and down the stack. In the OOI, we foresee a system with thousands of stakeholders, tens of thousands of physical resources such as ocean gliders, and potentially millions of virtual resources such as datasets and conversations viewed at a sufficiently fine granularity. At those scales, automation is essential for administering resources according to the policies of the stakeholders.

We describe important use cases for governance. Although we deemphasize our specific solution, these use cases are prejudiced by our thinking. Specifically, these use cases highlight the autonomy of the participants and the high-level relationships among them that are central to governance. Our approach to distributed computing is based on the premise that independent entities or principals interact in order to pursue shared goals. Our conceptual model is centered on the concept of principal, each instance of which possesses a unique identity within OOI. Principals include users, resources, and organizations (termed Orgs in our model). The main idea behind our approach is that we express all interactions, especially those corresponding to governance, as arising among autonomous principals. The principals adopt organizational roles to participate in one or more Orgs. Each Org helps structure the interactions among the principals that feature in it. Each such participation is specified via the contracts that each Org imposes upon its roles.

2 User Stories

Let us consider some simple user stories that convey a sense of how we conceptualize the OOI being put to use.

2.1 Collaboration

The stakeholders of OOI include research scientists or investigators as well as educators from middle and high schools. Consider a situation where a teacher in a school in the Chesapeake Bay area would like to present some information about the students' local environment. This data could be as simple as acidity levels in the Bay, but the teacher may wish to make the data more compelling by presenting this data, not in generic terms but by associating it with what the students may have directly observed. Let us say the teacher decides to present this data to his students for the Memorial Day weekend (which occurs late in May, and just before the school year ends) and the Labor Day weekend (which occurs early in September, and just after the school year begins). A possible educational purpose would be to have the students contemplate the effect of the intervening summer break; another would be to have students make measurements of acidity themselves in their local niche and to compare them with the data from OOI.

Clearly, the teacher would need to access data that a researcher with the appropriate sensors would have gathered. The researcher may have entirely different interests from the teacher. She may be interested in multiyear trends rather than in considering changes over a three-month period. To this end, the researcher would participate in a resource sharing community where she would have shared the data streams being generated by her sensors. The teacher would also authenticate with OOI, discover the appropriate community, and enroll in it. Therein the teacher would discover the desirable data stream and, possibly with the help of other OOI functionalities, extract the information he needs.

2.2 Affiliation

The stakeholders of OOI include not only investigators but also research institutions and laboratories. Two institutions may decide to share their resources on a reciprocal basis, and thus enter into a suitable contract. A researcher at one of those institutions would be able to discover with which institutions his institution is affiliated. He would then be able to access an affiliate institution and further discover a research laboratory based at the second institution. Lastly, the researcher would be able to take advantage of resources belonging to the research laboratory.

2.3 Communication

Probing the above ideas further, we identify a need for defining conceptual spaces that facilitate communication among stakeholders. To this end, these conceptual spaces would provide a general-purpose approach for naming and monitoring that might be used by a variety of affinity groups. Consider two researchers who need to communicate regarding a scientific topic that interests them both. An example of such a topic could be acidity observations in the Chesapeake Bay. The communicating parties agree to be bound to some rules of encounter such as that the communications would be logged and could be replayed on demand, that the communications would be of recent observations, and that the communications would be delivered in order. In other words, the rules deal with the communication itself. Instead of implementing such a communication platform in an ad hoc manner, we propose to provide a reusable set of patterns that characterize such a family of rules of encounter. The formalization of some Advanced Message Queuing Protocol (AMQP) constructions, in particular the exchange space, illustrates how we might approach this scenario.

2.4 Substratum

In more general terms, since we are interested in achieving governance at all levels of the OOI system, it is worth considering a case that involves governance of the elements of the infrastructure. We can think of the stakeholders as being not the end users but the system administrators who exist in the system mainly to support the other stakeholders. Alternatively, we can imagine that the principals are computational entities who derive their authority and motivations from the stakeholders but would not ordinarily be visible to or known to most stakeholders.

2.5 Sanction

Further, the researchers could contract that they would not reveal the information they receive in a resource-sharing arrangement. There is no natural way to enforce such a contract. Therefore, we would like to support a general approach wherein if a contract is detected (through some external monitoring or serendipitous discovery) to have been violated, sanctions would be imposed by the Org OOI on the noncompliant party. An example of a severe sanction would be to declare the noncompliant researcher persona non grata in OOI, in essence ejecting them from all Orgs within OOI.

3 Basic Concepts

In the computational system (that we seek to architect), an agent represents each principal. Sometimes the distinction between agent and principal is not significant, but we refer to real-world entities as principals and to their computational surrogates as agents. Each principal (or its agent acting on its behalf) can form any number of relationships with other principals. Such a relationship is based on a contract between two principals. The contract may arise as the result of a successful negotiation, or may be implicitly imposed because the parties adopt complementary roles in the same Org. In our model, a contract may involve two or more principals as the contracted parties; it further references an Org that serves as the context for the contract.

Each principal's agent can help with the bookkeeping of the contracts in which it participates. The agent can help determine if the principal itself is complying with its contracts and if others with whom it deals are complying as well. The agent continually tracks the state of each contract by updating the state for each observable action, such as sending or receiving a message or an observation made of the environment (which we can also treat as a message reception). Principals communicate and collaborate within the scope of an Org. As we remarked above, an Org is a principal as well. Orgs serve multiple purposes in our architecture. They provide the following:
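The agent's commitment bookkeeping can be sketched as a state update over observable actions. The state names below follow the commitment lifecycle described earlier ("detached" being the standard term for the stronger commitment that arises once the antecedent holds), and `track` is a hypothetical helper.

```python
from enum import Enum, auto

class CommitmentState(Enum):
    ACTIVE = auto()      # conditional: the antecedent does not yet hold
    DETACHED = auto()    # antecedent holds; the debtor must bring about the consequent
    DISCHARGED = auto()  # the consequent was brought about
    VIOLATED = auto()    # the commitment terminated with the consequent still pending

def track(state: CommitmentState, observation: str,
          antecedent: str, consequent: str) -> CommitmentState:
    """Update one commitment's state for an observable action, such as a
    sent or received message or an observation of the environment."""
    if observation == consequent:
        return CommitmentState.DISCHARGED
    if state is CommitmentState.ACTIVE and observation == antecedent:
        return CommitmentState.DETACHED
    return state
```

Matching observations against antecedents and consequents by string equality is of course a simplification; a real agent would evaluate them as conditions over its local view of the interaction.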

- A backstop for contracts.
- A locus for naming.
- A venue for sharing resources, including infrastructure.

Each Org is specified in a manner that defines the rules for adopting each of its roles. Joining an Org means adopting at least one role in that Org. Adopting a role means accepting the rules of the Org for that role. Thus we understand enrollment as involving the creation of a contract. What is key from our perspective is that enrollment can potentially be operationalized in multiple ways, including the following ways to open the negotiation. Each of these ways would ordinarily lead to further communications and would be subject to the local policies of the principals involved.

- The prospective enrollee may request membership.
- The prospective enroller may invite the enrollee.
- A third party may introduce the enrollee and the enroller.
- A third party may require the enrollee and the enroller to carry out the enrollment.

Importantly, we treat enrollment as a governance relationship normally achieved via a symmetric process of negotiation. Principals negotiate the conditions under which they participate in an Org, and vice versa. If a contract is signed, it forms the basis for relationships with the other members of the given Org.

An Org can form a relationship with another Org, enabling the members of the first Org to interact with members of the second Org. For example, a small college may obtain library privileges for its faculty at a major university in the same city. Often, such a relationship would be symmetric. We use the term affiliation for such symmetric relationships. In our understanding, each principal applies its own policies to determine what actions to take. Thus a principal can decide whether to adopt a role in an Org and conversely the Org can decide whether to admit that principal to that role. The policies of each principal are subject to various constraints such as the requirements imposed by the roles that it has adopted.

4 Examples of Governance

Although we use the term Org formally, we have been using the term community informally. Figure 1 describes the key ideas of our approach. Each ellipse delineates an Org, corresponding informally to a community of participants. Orgs may be nested within, be disjoint from, or partially overlap with other Orgs. Figure 1 shows three Orgs: one called Org A, one called Org B, and one called Org OOI. Org OOI is the OOI viewed as a principal that acts as an overarching authority for all interactions within its scope. Org OOI provides a home for the various application-specific Orgs that exist within it. Org OOI is the root Org in that it defines the identities for the principals involved and provides the basic rules of encounter within the OOI. Further, it can help monitor and enforce contracts among its principals.

Each Org specifies one or more roles. For example, Org A in Figure 1 is an instance of a resource-sharing Org, which is a type we see repeated in many settings within OOI. Figure 2 describes the governance interactions of this Org in terms of its two main roles: owner (of a resource) and user (of a resource); the Org itself is a principal and is called community here. The notation is based on message sequence charts. The horizontal lines show governance actions that create or modify relationships among the parties whose life-lines they connect. Any temporal order requirements are captured via the dashed arrows that connect some pairs of the horizontal lines.


Figure 1: A simplified schematic representation of some Orgs in action

The resource-sharing Org, RS Community, admits principals who may adopt one or both of the roles user or owner. Each principal who adopts owner can contribute its resources to the Org, so those resources can be discovered by any principal who adopts the role user. A user and owner may negotiate usage terms resulting in appropriate contracts being created between each pair. The contracts govern the interactions of the contracting parties regarding the resources they share. Figure 3 shows some of the many possible operational interactions that constitute an enrollment negotiation. The idea is that the governance interactions would be mapped to some selected operational interactions depending upon the needs of the stakeholders who adopt the specified roles and any properties of the environment. Section 3 lists some of the choices.

Figure 2: Governance in a resource-sharing Org.

Figure 3: An example message sequence chart showing a possible implementation of the governance-level interaction to create the business relationship of enrollment.

Figure 4: Governance of the AMQP Exchange Space: Highlighting the business relationships.

Figure 5: Governance of resource sharing across affiliated Orgs.

Org B in Figure 1 models a messaging service realized as an exchange space (inspired by the emerging AMQP standard). Figure 4 identifies the principals involved in an exchange space, and the governance interactions relating to the publish-subscribe scenario. This Org describes two roles, communicator and distributor. A distributor maps to an exchange point (in AMQP terminology), which corresponds to a topic. A communicator maps to either a publisher or consumer. Each party who adopts a role in this Org enters into a contract with the Org itself (viewed as a principal in its own right). Each principal who adopts the role of communicator can discover a suitable principal who adopts the role of distributor to publish information to or receive information from.

Figure 5 illustrates another important example. Here two communities form an affiliation relationship with each other. The affiliation in effect propagates to their respective members. As a result, the member of one of the communities can discover services offered by members of the other. Once it has discovered such services, it may negotiate using them and engage them as appropriate. The affiliation relationship between the two communities takes place within the scope of the Org OOI, which however is omitted from the figure for brevity.

5 Some Speculations

This section is purely speculative and seeks mainly to engender deeper discussions with our pi calculus colleagues. A natural application for the pi calculus would be to characterize in operational terms the governance interactions that are needed in the above use cases. For the present purpose, it would help to compare the pi calculus with other more conventional operational approaches, including message sequence charts. Instead of static checking, which is often highly complex, dynamic monitoring at runtime would be appropriate for our intended applications. A possible theme worth exploring is to capture the antecedents and consequents of commitments through the "predicate" mechanism for protocol analysis and monitoring in Bocchi et al.'s work.

A deeper challenge for us to explore together would be to map the high-level declarative representations of governance centered on contracts based on normative constructs to operational constructs centered on messaging. We might approach this challenge as follows. First, at the simpler level, we can formalize protocols to capture governance meanings and develop families of meaning-preserving transformations of session types to accommodate flexibility. For example, we might capture enrollment (or more generally negotiation) via a canonical protocol that involves an enrollment request followed by an admit. On this canonical representation, we might apply transformations such as introducing an invitation message prior to a propose message; introducing iterations of propose and counterpropose to negotiate the terms; introducing an intermediary that matchmakes the participants, and so on. Second, we can attempt to formalize canonical protocols and transformations at a more elementary level so as to support their composition. If we can map composition at the governance level to composition at the protocol level, that will be highly interesting theoretically and practically.

CIAD COI OV Interaction Management

A main capability of the COI Governance Framework is defining, monitoring and supporting interactions between distributed entities in the system, spread across multiple domains of authority.

Background and Motivation

OOI CI is designed as a 30-year program. The expectation, based on historical evidence, is that everything but the most basic architectural fundamentals will change during the lifetime of the program. The most invariant parts of the concrete designs and implementations are the interaction protocols, i.e. the message sequences and formats between distributed (service) processes and other processes. These interactions are much more invariant than their various implementations. Even these interaction interfaces will evolve over time, but much more slowly and only by adding more comprehensive newer protocols, not by modifying or replacing existing protocols.

Capabilities and Benefits

The benefits of this approach span multiple dimensions:

Design Time

- Architects can design services of the system precisely and formally test them for important properties and for compliance with existing specifications. This generalizes to any form of acting entity besides services, such as agents, users and resources.
- Architects can simplify interactions, group them systematically, and reduce cyclic dependencies and tight coupling.
- Architects can document interactions unambiguously.
- Software implementers can base their fine designs on interaction specifications, for instance to develop two different implementations of the same service (e.g. Java and Python, or version 1 and version 1.1).

Implementation Time

- Software implementers can take well documented interaction specifications, understood as blueprints, and code against them with confidence.
- Code generators can generate interaction monitors from the specifications.
- Code generators can generate client interaction stubs from the specifications.
- Code generators can generate object model classes from the specifications.
- Code generators can generate object encoders and decoders (for transport or persistence) from the specifications.
- Testers can derive test cases from interaction specifications, including unit and integration test cases.
- Testers can run regression and compliance tests of software components against interaction specifications.

Run Time

- The governance framework can monitor and intercept ongoing interactions between entities, i.e. the sending and receipt of atomic messages, and act based on this knowledge.
- Entities in their own domain of authority can apply governance framework mechanisms (part of a capability container) to protect their integrity and policy when interacting with entities from a different domain of authority.
- The governance framework can log interactions for later historic analysis.
- The governance framework can perform compliance checking of existing implementations against specifications.
- The governance framework can enforce policy and longer-term, complex commitments across multiple domains of authority (in an OLTP way).

Historic Analysis

- The system can create audit traces of interactions in the system.
- Developers can analyze interaction logs in order to find and resolve complex bugs.
- More complex commitments and violations of contracts can be detected by analyzing past interaction histories (in an OLAP way).

Specification Technologies

Interaction specifications include various complementary parts, such as:

Interaction Protocol Specification:

- Specify the valid sequences of messages for success and failure cases, and identify the different types of messages as well as the roles of the different participants in an interaction.
- A candidate currently under consideration is Scribble. Other candidates are MSC (Message Sequence Chart) based graphical and textual languages and formalisms.
- Also known as interaction pattern, conversation type, session type, protocol, etc.
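One simple way to picture such a protocol specification is as a state machine over message types. The following Python sketch is illustrative only: the protocol name, message types and transitions are invented, and a real specification would use a language such as Scribble rather than a hand-written table.

```python
# Hypothetical negotiation protocol expressed as state transitions.
# Each entry maps (current state, message type) -> next state.
NEGOTIATE_PROTOCOL = {
    ("start", "request"): "requested",
    ("requested", "accept"): "done",       # success case
    ("requested", "reject"): "done",       # failure case
    ("requested", "counter"): "requested", # iterate on the terms
}

def is_valid_sequence(protocol, messages, start="start", final="done"):
    """Check whether a message sequence complies with the protocol."""
    state = start
    for msg in messages:
        key = (state, msg)
        if key not in protocol:
            return False  # out-of-protocol message
        state = protocol[key]
    return state == final  # conversation must end in a final state

print(is_valid_sequence(NEGOTIATE_PROTOCOL, ["request", "counter", "accept"]))  # True
print(is_valid_sequence(NEGOTIATE_PROTOCOL, ["accept"]))  # False
```

The same table could drive both a runtime interaction monitor and generated client stubs, which is why a single machine-readable specification is valuable.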

Governance and Commitment Specification:

- Expressed as annotations to interaction protocol specifications.
- Governance acts (contracts, commitments, policy decisions) are performed at various points of an interaction.

Object Specification:

- In complement to protocol specifications.
- Candidates include Google Protocol Buffers, UML MOF (or XML Schema), and RDF/semantics.
- Message formats (realizing message types) are composite objects.

Object Encoding:

In realization of object specifications. Encode and decode object content for transport and persistence.

Implementation Strategy

- Common interaction mechanism, COI Exchange: All entities in the system (i.e. processes such as services and agents) communicate exclusively via the COI Exchange interaction mechanisms. This ensures that all communication is comparable and traceable in terms of governance. Note: external interfaces of the ION system can take any form, such as Web Services, HTTP, DAP etc.
- Common Message Format: Enables consistent, automated interaction monitoring and historic analysis. Every message in the system is sufficiently self-describing to identify the conversation it belongs to, the exact interaction protocol specification it complies with, the sender, recipient, encoding, format and semantic interpretation.
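The self-describing message requirement can be sketched as a header check. The header field names below are assumptions for illustration, not the actual ION wire format:

```python
# Headers a self-describing message envelope would need, per the text above.
# Field names are hypothetical; the real format is defined elsewhere.
REQUIRED_HEADERS = {
    "conversation-id",  # which conversation the message belongs to
    "protocol-id",      # the interaction protocol it complies with
    "sender",
    "receiver",
    "encoding",         # e.g. protobuf binary
    "format",           # message/object type for semantic interpretation
}

def check_self_describing(message: dict) -> list:
    """Return the required header fields missing from a message envelope."""
    return sorted(REQUIRED_HEADERS - set(message.get("headers", {})))

msg = {
    "headers": {
        "conversation-id": "conv-42",
        "protocol-id": "rpc-request-v1",
        "sender": "svc.instrument_agent",
        "receiver": "svc.data_store",
        "encoding": "protobuf",
        "format": "DataRequest",
    },
    "body": b"...",
}
print(check_self_describing(msg))  # [] -> all required headers present
```

A governance monitor can perform exactly this check on every intercepted message before logging it or matching it against a protocol specification.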

- Common Object Model: Enables consistent management of the system's "business objects" and their transport, persistence and use in the code.
- Capability Container: The software infrastructure developed by OOI CI COI that enables all interaction management and governance mechanisms (among other things). Service developers have limited exposure to these complexities. The capability container exists in various technological environments (such as Python, Java, and more in the future).

Draft implementation strategy

Figure 1 shows an exemplar implementation strategy for bringing a common internal object model with object encodings into the system, with the purpose of leveraging it for governance. Note: some of the technologies shown are exemplars only.

Figure 1. Draft implementation strategy

CIAD COI TV Governance

Governance References

The development of the Governance framework follows the theoretical concepts specified in the references below.

References:

Yathiraj B. Udupi and Munindar P. Singh. Governance of Cross-Organizational Service Agreements: A Policy-Based Approach. Proceedings of the 4th IEEE International Conference on Services Computing (SCC), July 2007. http://people.engr.ncsu.edu/mpsingh/papers/mas/scc-07-governance.pdf

Yathiraj B. Udupi and Munindar P. Singh. Multiagent Policy Architecture for Virtual Business Organizations. Proceedings of the IEEE International Conference on Services Computing (SCC), September 2006. http://people.engr.ncsu.edu/mpsingh/papers/mas/scc-06-VO.pdf

Yathiraj B. Udupi and Munindar P. Singh. Design Patterns for Policy-Based Service Engagements. NCSU Computer Science Technical Report 2008-3, January 2008. http://people.engr.ncsu.edu/mpsingh/papers/drafts/Policy-patterns-service-engagements-TR-2008-3.pdf

CIAD COI OV Identity and Policy Management

Identity Management

Identity Management (IdM) activities are about establishing identities, managing attributes of an identity, creating assertions about one's identity, establishing relations using identities and assertions, and establishing trust between the providers of identities and the providers of services. The primary tenet of the OOI identity management architecture design is to leverage existing user identities from external identity providers (presumably the users' home institutions) rather than establishing a siloed OOI identity management system. In the context of today's world, for example, this means leveraging identities asserted through the InCommon federation.

Basic Concepts

The basic concepts of the IdM architecture are taken from the Liberty Alliance specifications. The architecture is instantiated as a set of activities, as shown in Figure 1, which are hierarchical in their relationship: completion of an activity allows subsequent activities to be completed that rely on the initial activity, and those activities may in turn enable further activities. Each activity has a corresponding activity that tears down the state created by it; these corresponding activities are not explicitly described, as their semantics are fairly straightforward.

The IdM Relevant Nomenclature may be useful for the first-time reader.

Figure 1: Relationship among nested Identity Management Conversations

The primary reason for this hierarchy of activities is to optimize ease of use and minimize repeated processing. For example, when you check into a hotel, there is an upfront check-in process and you are given access to a particular room. You are then given a room key, allowing you to access that room without having to perform check-in again each time you want to get access to your room. Eventually you check out and the room key is returned to the hotel. This follows a basic pattern in IdM activities where a token (e.g. a room key) is provided after an authorization process (e.g. check-in); this token allows for expedited action (e.g. room access) until the relationship ends (e.g. check-out).
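The token-after-authorization pattern described by the hotel analogy can be sketched in a few lines of Python. The class and method names here are invented for illustration; real OOI credentials are cryptographic tokens, not random strings:

```python
import secrets
import time

class TokenIssuer:
    """Minimal sketch of the authorize -> token -> repeated access pattern."""

    def __init__(self, lifetime_seconds):
        self.lifetime = lifetime_seconds
        self.tokens = {}  # token -> (subject, expiry timestamp)

    def authorize(self, subject):
        """Expensive up-front step ("check-in"); returns a reusable token."""
        token = secrets.token_hex(16)
        self.tokens[token] = (subject, time.time() + self.lifetime)
        return token

    def access(self, token):
        """Cheap repeated step ("room access") until the token expires."""
        entry = self.tokens.get(token)
        return entry is not None and entry[1] > time.time()

    def revoke(self, token):
        """End of the relationship ("check-out")."""
        self.tokens.pop(token, None)

issuer = TokenIssuer(lifetime_seconds=3600)
key = issuer.authorize("alice")
print(issuer.access(key))   # True: token still valid
issuer.revoke(key)
print(issuer.access(key))   # False: token has been returned
```

The lifetime parameter is exactly the usability-versus-security trade-off discussed below.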

A second reason for the hierarchy is that it allows for different types of credentials to be used internally and externally to OOI. Since OOI will be leveraging external identities, it is dependent on the technology used to assert those identities. Without some form of abstraction, this would require the OOI internal infrastructure to stay in lock-step with that technology, which would be problematic given the size of OOI's infrastructure and its lifetime. Specifically, the Authentication activity allows for the translation of external identity technology to internal technology, concentrating the need for understanding the external technology in a single point and giving the rest of the OOI infrastructure the independence to use whatever technology OOI sees fit.

A third reason for the hierarchy in IdM activities is that asserting an identity in modern computer infrastructure involves the use of cryptographic material (or perhaps a password, but for the sake of this discussion we will consider that equivalent to a cryptographic key). The hierarchy of activities allows for long-lived keys that represent entities and are used infrequently, and the generation of shorter-lived keys for frequent-use activities (e.g. secure messaging). The period for which these shorter-lived keys remain valid is a trade-off between competing factors. Users would typically rather go through manual processes as little as possible and thus want longer lifetimes, meaning usability pushes towards longer periods. Security, in preventing reuse of cryptographic keys and in mitigating the consequences of key theft, pushes towards shorter lifetimes and smaller scopes. In practice this trade-off is made based on experience and the sensitivity of the resources being protected.

Trust Establishment/Federation

The Trust Establishment activity is initiated by a client and is directed towards an Authentication Service of an identity provider (IdP). The result of this activity is to allow the client to trust assertions issued by the IdP's Authentication Service and Attribute Authority components. When two parties have established trust with the same IdP, this allows them to subsequently perform the Authentication activity. An example of Trust Establishment is a PKI client installing the X.509 certificate for a Certification Authority in its local trusted certificates repository.

Trust Establishment is often done in tandem with Registration/Enrollment, though it is independent.

Federation is typically used to describe an act of Trust Establishment done across organizational boundaries. For example, a service provider in one organization may decide to federate with another organization, that is, trust the identity provider in that other organization and clients asserting identities from that identity provider. While the two can be treated as synonyms, it is often useful to use the term Trust Establishment in an intra-organizational context and Federation in an inter-organizational context to clarify that context.

Enrollment/Registration

The Enrollment activity is initiated by a client and is directed at a Registration Authority. The result of this activity is to add the client to the IdP's state such that the client can undertake a subsequent Credentialing activity. An identifier for the client will also be generated, which uniquely identifies the client within the scope of the IdP (and, assuming the IdP has a globally unique identifier, can be combined with that identifier to create a globally-unique identifier for the client). An example of this activity is the registration of a user and their being entered into a user database.
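The composition of a client identifier with a globally unique IdP identifier, as described above, can be sketched as follows. The separator and naming scheme are assumptions for illustration:

```python
def global_identifier(idp_id: str, client_id: str) -> str:
    """Qualify a client identifier with its issuing IdP's identifier.

    The client_id is unique only within the IdP's scope; prefixing it with
    a globally unique IdP identifier makes the result globally unique.
    """
    return f"{idp_id}/{client_id}"

# Two IdPs may issue the same local identifier without a global collision.
print(global_identifier("idp.example.edu", "user-1017"))
print(global_identifier("idp.other.edu", "user-1017"))
```

This is the same scoping idea used by, for example, Kerberos principal names qualified by realm.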

Enrollment is often done in tandem with Trust Establishment, though it is independent.

In the case of the client being a resource, this activity may be initiated by an agent acting on behalf of the resource, for example, a resource owner or administrator, or another resource which spawned the resource (e.g. a host system acting on behalf of a VM image).

We will use the terms Enrollment and Registration as synonyms.

Authentication/Single Sign-on

Single Sign-on occurs between a client and an authentication service. This activity uses the result of the Credentialing activity and generates a typically short-lived (on the order of a day) or scoped token that can be used by the client to assert its identity to another party (e.g. an SP). Examples of Single Sign-on include the issuance of a Kerberos ticket-granting ticket (and subsequent service tickets) and the issuance of a proxy certificate in a Grid Security Infrastructure. (Note that in the case of GSI, if the user is generating a proxy certificate themselves, they are in effect acting as their own authentication service. Alternately, MyProxy could be acting in that role in an on-line CA scenario.)

In the case of OOI, the token resulting from Single Sign-On is the OOI Identity Credential which is used for Secure Messaging.

The Authentication Service is a form of Token Service for those familiar with that term.

Secure Messaging

Secure Messaging occurs between a client and a service provider and provides the service provider with the ability to verify the client's identity as asserted by the client's IdP. This activity requires Single Sign-on to have previously occurred between the client and the IdP, and Trust Establishment to have previously occurred between the SP and the client's IdP.

See details on the Secure Messaging page.

Other Concepts

"Thick" versus "Thin" Clients

There are two fundamental use cases to be supported by OOI identity management: command line or "thick" clients, and web portals used in conjunction with web browsers or "thin" clients. In the case of thick clients, the user's application runs directly on the user's local system and is either part of the OOI infrastructure or acts as the interface between the user and the edge of the OOI infrastructure.

In the second scenario, there are three entities involved: the user's application, the OOI infrastructure and a web portal. The client on the user's local system (typically a web browser) is limited by either its capabilities or the limitations of the HTTP protocol in what interactions it can undergo with the portal. The portal is either part of the OOI infrastructure or interfaces directly with that infrastructure. This case is more challenging, since the result of Authentication needs to be a credential stored in the web portal for subsequent use by the portal on the user's behalf.

Interaction with the InCommon Federation through the CILogon Service

As described previously, the OOI identity management architecture will use external user identities as asserted through InCommon, rather than managing users directly. This creates a challenge, however, since InCommon today does not support thick (command line) clients, only thin (web browser) clients. We address this challenge in our design by leveraging the CILogon service, which is designed to solve exactly this problem. If/when InCommon evolves to serve thick clients directly, the CILogon service may be removed from the design, allowing OOI to interface directly with InCommon.

Domain Models

Identity Management Domain Models

The Identity Management Domain Model in Figure 2 depicts how message senders are identified by the CI. Federation establishes trust relationships among COI components. Each Identity Provider registers with the Federation to facilitate the exchange of trust roots and other security metadata (see Figure 3). Each Message Authenticator is configured with Federation metadata to facilitate Signed Message verification.

Figure 2. Identity Management Domain Model (OV-7)

Figure 3. Facility Model for Identity Management (OV-7)

Each Principal registers with an Identity Provider that acquires identifying information about the entity with which to create an Identity for it. The Identity Provider can be decomposed into four individual elements: an Authentication Service, a Registration Authority, a Credential Authority, and an Attribute Authority, although it is common to think of them as a single element. An Identity consists of an Identifier that uniquely identifies the entity along with zero or more Attributes about the entity (e.g., group membership) that can be used for governance. Attributes may be defined by the Identity Provider and other system components with the role of Attribute Authority. The Identity Provider provides a protocol by which it issues and/or binds an Authenticator to the entity's Identity. The Principal controls the Authenticator in the sense that no other entity should be able to assert the entity's Identity.

A Principal may have more than one Identity. Identities may be linked to enable the creation of a Federated Identity that aggregates multiple identities corresponding to a single Principal. A Federated Identity allows a Principal to use different credentials as becomes convenient, with those credentials being mapped to that entity's authorizations in the system. Furthermore, the Entity may have more than one Federated Identity for interacting with different services or acting in different roles.

The Authenticator is used by the Principal to assert its identity to the Identity Provider to obtain a Credential as needed. The Principal presents the Authenticator to the Authentication Service, which returns an Authentication Assertion that the Credential Authority converts into a Credential (this separation of concerns allows for the bridging of different legacy authentication systems and flexibility in deploying trusted Authentication Services to best meet deployment concerns).

The Credential enables the Entity to send a Message with identifying information. The Message Signer acts on behalf of the Principal to construct a Signed Message that identifies the signer. Taking the original Message as input, along with the principal's Credential, the Message Signer generates a Token that is cryptographically bound to the Message to create a Signed Message. The Signed Message is verified by the Message Authenticator acting on behalf of the recipient to identify the Identity of the sender.
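The Message Signer and Message Authenticator roles above can be sketched with an HMAC as a stand-in token mechanism. This is an illustrative assumption: the actual design binds the token via the principal's credential (e.g. X.509 material), not a pre-shared key.

```python
import hashlib
import hmac

def sign_message(message: bytes, credential_key: bytes) -> dict:
    """Message Signer: bind a Token to the Message to form a Signed Message."""
    token = hmac.new(credential_key, message, hashlib.sha256).hexdigest()
    return {"message": message, "token": token}

def verify_message(signed: dict, credential_key: bytes) -> bool:
    """Message Authenticator: verify the Token to identify the sender."""
    expected = hmac.new(credential_key, signed["message"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["token"])

key = b"illustrative-credential-key"  # stand-in for real credential material
signed = sign_message(b"start instrument 7", key)
print(verify_message(signed, key))    # True: token matches the message

signed["message"] = b"start instrument 8"
print(verify_message(signed, key))    # False: tampering is detected
```

The essential property shown is that the token is cryptographically bound to the message content, so any modification invalidates the Signed Message.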

Service Decomposition

Behavior

See Identity Management Activities

CIAD COI OV Identity Management Activities

Figure 1 shows the relationship between Identity Management activities. Initially, a principal starts without any standing with the identity management system. The initial activity that must be undertaken is Registration. On completion of Registration, which is typically a one-time event at the start of the relationship with the CI, the principal can perform Authentication, an infrequent event (typically on the order of daily) that establishes the principal's credentials and allows them to undertake Secure Messaging, an activity that can be repeated until the credentials from Authentication expire. In the following subsections, each of these activities is described in more detail.

Figure 1. Overview of Identity Management Activities (OV-5)

Figure 2 shows the initial trust relationships that are in place to support identity management. Specifically:

- The OOI Root Org has delegated authority to issue identities to the OOI-CI Facility.
- The user has previously enrolled with an external (InCommon/CILogon) IdP and has credentials based on that enrollment.
- The OOI-CI Facility has a federation contract with the external IdP (created by Facility administration).

Figure 2. Initial Trust Relationships to Support IdM

Registration

The Registration activity is initiated by a principal and is directed at a Registration Authority. The result of this activity is to add the client to the identity management state such that the client can undertake a subsequent Authentication activity. An identifier for the client will be generated, which uniquely identifies the client within the domain (and can be combined with a domain identifier to create a globally-unique identifier for the client). An example of this activity is the registration of a user and their being entered into a user database.

The Registration Service must allow for registration of identities as asserted through the InCommon federation. This design covers how OOI can interoperate with the CILogon Service to achieve use of InCommon IdPs, and how thick clients will register via CILogon to OOI. The CILogon Service provides users with X.509 credentials usable for national-scale cyberinfrastructure, based on identities asserted via their home institutions through the InCommon federation. The goal of NCSA's work during Q1-2 of 2010 is to provide a design and implementation of the integration between the CILogon Service and the OOI COI Secure Messaging, Registration and Authentication services (in addition to a design of those services).

Use Cases

New OOI User: A new OOI user who has not previously registered with OOI visits an OOI Web Portal. The Web Portal requests a credential from the CI-Logon service, which provides that credential based on authentication of the user via the user's home IdP in InCommon. The Web Portal presents that credential to the Registration Service. The Registration Service validates the credential and generates an OOI Identity for the user. The binding between the External Identity from the CI-Logon credential and the OOI Identity is stored as state in the Principal Registry for later use by the Authentication Service.

Existing OOI User with Additional Identity: An existing OOI user who has previously registered an external identity to an OOI Identity now presents an Identity Credential representing a different external identity (e.g. from a second institution due to an organizational change on the user's part), along with a request to bind this new external identity to their existing OOI Identity to allow them to continue to maintain their previously established relationships within OOI. The process occurs identically to the case of a New OOI User. On successful authentication of both the new External Identity and the existing OOI Identity, a new binding between the two is placed into the Principal Registry.

Requirements

In no particular order:

1. The Registration Service must be able to validate External Identity Credentials.
2. The Registration Service must be able to map a user's External Identity to an OOI Identity based on state previously created by the Registration Service.
3. The Registration Service must not accept External Identity Credentials from Identity Providers that it does not trust.
4. The Registration Service must be able to issue OOI Identity Credentials that are trusted by Service Providers. This implies a well-managed service with private cryptographic keys.
5. The Registration Service must be able to generate new OOI Identities that are guaranteed to be unique.
6. The Registration Service must have the ability to create entries in the Principal Registry.
7. The Principal Registry must store state in a secure manner such that the bindings cannot be tampered with by unauthorized entities.
8. There must exist an administrative interface to the bindings to allow authorized operators to manipulate them to handle exceptional situations (e.g. manually establishing a binding after out-of-band vetting of a user).
9. A user must be able to undo a binding on the user's request.
10. There must exist a mechanism for authorized operators or services to temporarily disable a binding in the event the user's External Identity is suspected or known to have been compromised.
11. For binding an External Identity to an existing OOI Identity, the Registration Service must require proof that the client is authoritative for the existing OOI identity (e.g. they must authenticate in such a manner to prove that identity represents them or demonstrate sufficient administrative permissions).
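Several of the requirements above (unique OOI Identities, creating, undoing and temporarily disabling bindings) can be sketched as a toy Principal Registry. Class and method names are invented for this sketch; a real registry would also enforce access control and tamper-proof storage:

```python
import uuid

class PrincipalRegistry:
    """Toy registry mapping External Identities to OOI Identities."""

    def __init__(self):
        self.bindings = {}  # external identity -> (ooi_identity, enabled)

    def register(self, external_identity):
        """Generate a unique OOI Identity and bind the external identity."""
        ooi_identity = f"ooi:{uuid.uuid4()}"  # uniqueness via random UUID
        self.bindings[external_identity] = (ooi_identity, True)
        return ooi_identity

    def bind(self, external_identity, ooi_identity):
        """Bind an additional external identity to an existing OOI Identity."""
        self.bindings[external_identity] = (ooi_identity, True)

    def resolve(self, external_identity):
        """Map an external identity to its OOI Identity; None if absent/disabled."""
        entry = self.bindings.get(external_identity)
        if entry is None or not entry[1]:
            return None
        return entry[0]

    def disable(self, external_identity):
        """Temporarily disable a binding (e.g. suspected compromise)."""
        ooi_identity, _ = self.bindings[external_identity]
        self.bindings[external_identity] = (ooi_identity, False)

    def unbind(self, external_identity):
        """Undo a binding on the user's request."""
        self.bindings.pop(external_identity, None)

registry = PrincipalRegistry()
ooi_id = registry.register("alice@uni-a.edu")
registry.bind("alice@uni-b.edu", ooi_id)  # second institution, same OOI Identity
print(registry.resolve("alice@uni-b.edu") == ooi_id)  # True
registry.disable("alice@uni-a.edu")
print(registry.resolve("alice@uni-a.edu"))            # None while disabled
```

Note how disabling a binding (requirement 10) is distinct from unbinding (requirement 9): the former is reversible by an operator, the latter removes the state entirely.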

Prerequisites

In no particular order:

1. There must exist an administrative method for managing the External IdPs that the Registration Service trusts (perhaps shared with the Authentication Service).
2. The Registration Service must be able to receive messages from users who are not yet registered with the rest of the COI infrastructure.

Assumptions

In no particular order:

1. While there is no reason to make bindings of External to OOI Identities public, they are not particularly private either.

Registration Activity

Figure 3: Registration Activity (OV-5)

An agent acting on behalf of the principal may initiate this activity. For example, the initiator could be a resource owner or administrator, or another resource that spawned the principal (e.g. a host system acting on behalf of a VM image). The Trust Establishment activity creates bi-directional state between the principal and the identity management system. This allows for both subsequent authentication of the principal and subsequent trust by the principal in authentication of other entities performed by the identity management system. An example of Trust Establishment is a PKI client installing the X.509 certificate for a Certification Authority in its local trusted certificates repository.

A De-registration activity also exists, but is not shown as its semantics are a straightforward reversal of Registration, and serves to remove any state generated by Registration.

Registration Interactions

Figure 4 depicts a principal requesting registration in order to perform subsequent authentication. A request is sent to a Registration Authority, who is responsible for vetting the request (e.g. ensuring that the principal is an appropriate member or ensuring that any requirements for contact information are met). Assuming success, the principal is entered into the Principal Registry. A response is returned to the principal.

Figure 4. Registration Interaction Pattern (OV-6)

Steps shown in Figure 4:

1. User initiates registration by directing their web browser to an OOI Web Portal. The OOI Web Portal directs the user's browser to the CI-Logon service with a request to receive a delegated credential.
2. The CI-Logon service, acting in the role of a Shibboleth service provider, redirects the user's browser to their identity provider (IdP).
3. The IdP authenticates the user and redirects the user's browser back to the CILogon service, passing a SAML authentication assertion in the process.
4. The CI-Logon service consumes the SAML assertion and issues an X.509 credential for the user to the OOI Web Portal.
5. The CI-Logon service redirects the user's browser back to the OOI Web Portal.
6. The OOI Web Portal presents the X.509 credential to the OOI Registration Service, which validates the credential, generates an OOI identity for the user and records that mapping in the Principal Registry. The Registration Service returns an indication of success to the Web Portal.
7. The Web Portal indicates successful registration to the user.

Technology Choices

- The CI-Logon service will use Shibboleth to authenticate the user through InCommon.
- The CI-Logon service will issue X.509 end-entity credentials that conform to the IGTF Short-lived Certificate Service format.
- The CI-Logon service will utilize OAuth to delegate the X.509 credential to the OOI Web Portal.

Authentication

The Authentication Service provides single sign-on functionality. That is, it allows a user to present a credential from an external identity provider (IdP) and in exchange receive an OOI Identity Credential suitable for use in secure messaging. The user must have previously registered the identity associated with the external identity provider, the result of which is a binding between the OOI Identity and the External Identity maintained in the Principal Registry.

The Authentication Service must allow for single sign-on via the InCommon federation. Part of this design will be to document how OOI can interoperate with the CILogon Service to achieve use of InCommon IdPs. This includes interactions both with users using "thick clients" (i.e. native applications running directly in the user's operating system) and "thin clients" (i.e. web browsers operating through intermediary web portals).

The goal of the Authentication Service is to provide an abstraction layer between internal OOI Secure Messaging and External IdPs. It bridges both credential formats and trust domains, meaning that if the technology in use for External IdPs changes, only the Authentication Services need adapt, rather than all the Service Providers. Similarly, only the Authentication Services need to have trust relationships with all the External IdPs serving the OOI user community, and Service Providers need only have trust relationships with the Authentication Services.
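The credential-exchange step at the heart of this abstraction layer can be sketched as follows. All names here are illustrative assumptions; real credentials are signed X.509 or similar artifacts, not dictionaries:

```python
# External IdPs trusted by the Authentication Service, managed administratively.
TRUSTED_EXTERNAL_IDPS = {"cilogon.org"}

def authenticate(external_credential, registry):
    """Exchange a validated external credential for an OOI Identity Credential.

    Only the Authentication Service needs to know about external IdPs;
    service providers see only the internal credentials it issues.
    """
    if external_credential["issuer"] not in TRUSTED_EXTERNAL_IDPS:
        raise ValueError("untrusted identity provider")
    ooi_identity = registry.get(external_credential["subject"])
    if ooi_identity is None:
        raise ValueError("external identity not registered")  # lack of binding is an error
    # A real service would cryptographically sign this credential.
    return {"issuer": "ooi-authn", "subject": ooi_identity}

registry = {"alice@uni-a.edu": "ooi:1017"}  # stand-in for the Principal Registry
cred = {"issuer": "cilogon.org", "subject": "alice@uni-a.edu"}
print(authenticate(cred, registry))  # {'issuer': 'ooi-authn', 'subject': 'ooi:1017'}
```

If the external IdP technology changes, only the validation step inside `authenticate` changes; the issued internal credential, and therefore every Service Provider, is unaffected.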

For the purposes of the OOI design, the Authentication Service includes the functionality of the Credential Authority.

Use Cases

Thick Client: In this use case a user, who has previously registered their identity with OOI, starts by authenticating with an external (relative to OOI; presumably at their home institution) IdP via the CILogon service. This results in an External Identity Credential, which they then present to an OOI Authentication Service. The Authentication Service validates the External Identity Credential. It then queries the Principal Registry, determines the OOI Identity to which the External Identity is bound (a lack of a binding is an error), and generates and returns an appropriate OOI Identity Credential to the user. The OOI Identity Credential is then used by the user to join message exchanges and perform secure messaging with OOI Service Providers.

Thin Client: In this use case a user, who has previously registered their identity with OOI, starts by visiting an OOI Web Portal. Through the OOI Web Portal the user requests and obtains an External Identity Credential from the CILogon service, which is accomplished by having the user authenticate with an external (relative to OOI; presumably at their home institution) IdP. This results in an External Identity Credential held at the OOI Web Portal, which can then be presented to an OOI Authentication Service. The Authentication Service validates the External Identity Credential. It then queries the Principal Registry, determines the OOI Identity to which the External Identity is bound (a lack of a binding is an error), and generates and returns an appropriate OOI Identity Credential to the OOI Web Portal. The OOI Identity Credential is then used by the OOI Web Portal on the user's behalf to join message exchanges and perform secure messaging with OOI Service Providers.

Requirements

In no particular order:

1. The Authentication Service must be able to validate External Identity Credentials.
2. The Authentication Service must have the ability to read bindings in the Principal Registry.
3. The Authentication Service must be able to map a user's External Identity to an OOI Identity based on state previously created by the Registration Service.
4. The Authentication Service must not accept External Identity Credentials from Identity Providers that it does not trust.
5. The Authentication Service must be able to issue OOI Identity Credentials that are trusted by Service Providers. This implies a well-managed service with private cryptographic keys.
6. The Authentication Service must issue OOI Identity Credentials with Identities only as specified by the state managed by the Registration Service.

Prerequisites

In no particular order:

1. There must exist an administrative method for managing the External IdPs that the Authentication Service trusts (perhaps shared with the Registration Service).
2. The Authentication Service must be able to receive messages from users who are not yet registered with the rest of the COI infrastructure.
3. In order to support Thin Clients, OOI Web Portals must be able to support additional code to accept credentials from the CI-Logon service.

Authentication Activity

Figure 5: Authentication Activity (OV-5)

Authentication Interactions (Thick Client)

Figure 6. Authentication Interaction Pattern for Thick Clients (OV-6)

1. The user initiates credential acquisition by directing their web browser to the CI-Logon service.
2. The CI-Logon service, acting in the role of a Shibboleth service provider, redirects the user's browser to their identity provider (IdP).
3. The IdP authenticates the user and redirects the user's browser back to the CI-Logon service, passing a SAML authentication assertion in the process. The CI-Logon service consumes the SAML assertion and issues an X.509 credential to the user.
4. The user presents the X.509 credential to the OOI Authentication Service, which validates the credential, uses the Principal Registry to map the user's external identity to their OOI identity, and then returns an OOI Identity Credential to the user.
5. The user uses the OOI Identity Credential to join an OOI Exchange Space and subsequently send secure messages.
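Step 4 above can be sketched as a minimal Authentication Service, assuming a hypothetical in-memory Principal Registry and a random token standing in for a real cryptographically signed OOI Identity Credential:

```python
import hashlib
import secrets
import time

# Hypothetical in-memory Principal Registry: maps an external identity
# (as asserted by a trusted IdP) to a previously registered OOI identity.
PRINCIPAL_REGISTRY = {"jdoe@university.edu": "ooi:user:jdoe"}

# IdPs the Authentication Service trusts (Requirement 4).
TRUSTED_IDPS = {"urn:idp:university.edu"}

def authenticate(external_credential: dict) -> dict:
    """Validate an External Identity Credential and issue an
    OOI Identity Credential (step 4 of the interaction pattern)."""
    if external_credential["issuer"] not in TRUSTED_IDPS:
        raise PermissionError("credential issued by untrusted IdP")
    ooi_identity = PRINCIPAL_REGISTRY.get(external_credential["subject"])
    if ooi_identity is None:
        # A missing binding is an error per the use case.
        raise LookupError("external identity not bound to an OOI identity")
    return {
        "identity": ooi_identity,
        "issued_at": time.time(),
        # Stand-in for a cryptographically signed token.
        "token": hashlib.sha256(secrets.token_bytes(32)).hexdigest(),
    }

cred = authenticate({"issuer": "urn:idp:university.edu",
                     "subject": "jdoe@university.edu"})
print(cred["identity"])  # ooi:user:jdoe
```

Note how the two error paths (untrusted IdP, unbound identity) correspond directly to Requirements 3 and 4.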

Authentication Interactions (Thin Client)

Figure 7. Authentication Interaction Pattern for Thin Clients (OV-6)

1. The user initiates credential acquisition by directing their web browser to an OOI Web Portal. The OOI Web Portal directs the user's browser to the CI-Logon service with a request to receive a delegated credential.
2. The CI-Logon service, acting in the role of a Shibboleth service provider, redirects the user's browser to their identity provider (IdP).
3. The IdP authenticates the user and redirects the user's browser back to the CI-Logon service, passing a SAML authentication assertion in the process.
4. The CI-Logon service consumes the SAML assertion and issues an X.509 credential for the user to the OOI Web Portal.
5. The CI-Logon service redirects the user's browser back to the OOI Web Portal.
6. The OOI Web Portal presents the X.509 credential to the OOI Authentication Service, which validates the credential, uses the Principal Registry to map the user's external identity to their OOI identity, and then returns an OOI Identity Credential to the Web Portal.
7. The Web Portal, acting on behalf of the user, uses the OOI Identity Credential to join an OOI Exchange Space and subsequently send secure messages.

Technology Choices

The CI-Logon service will use Shibboleth to authenticate the user through InCommon. The CI-Logon service will issue X.509 end-entity credentials that conform to the IGTF Short-Lived Certificate Service format. The CI-Logon service will utilize OAuth to delegate the X.509 credential to the OOI Web Portal. The selection of technology for the OOI internal credential is pending development of the OOI Message Format.

Figure 8 shows a principal performing Authentication subsequent to Registration. The principal contacts an Authentication Service and provides the proof of identity that results from Registration. The Authentication Service validates this proof and, assuming success, provides proof of authentication to the Credential Authority. The Credential Authority gathers information regarding the principal and generates a credential that is returned to the principal.

Figure 8. Authentication Interaction Pattern (OV-6)

CIAD COI OV Idm Relevant Nomenclature

Attribute: A piece of information bound to an Identifier.

Authentication Service: Entity that validates a user credential and generates for the user a second security token usable internally to OOI for message authentication. While it is possible for Communities to host their own Authentication Service, they may also delegate this functionality to the OOI-CI Facility.

Capability Container: OOI CI software environment on a single machine that can host application services (capabilities) and provides infrastructure capabilities, such as connectivity to the OOI AMQP messaging service, identity management, governance and remote manageability (comparable to an ESB-style container). A capability container (CC) is deployed on a compute node and hosts infrastructure and application capabilities.

Credential: See Identity Credential.

Delegation: The process of one entity granting another entity some of its privileges. Delegation may be complete (often called Impersonation) or constrained, either in scope or in duration. Delegations in OOI are represented with Contracts.

Enrollment: A conversation in which an entity joins a community. Registration is a synonym of Enrollment.

Exchange Point (XP): A virtual message passing channel in an Exchange Space where messages are distributed from publishers to consumers.

Exchange Space (XS): A community with Exchange Points (virtual message passing channels) as resources. There exists a Root OOI Exchange Space for inter-facility communication. This community describes two roles, communicator and distributor. A distributor maps to an exchange point, and a communicator to either a publisher or a consumer. The Exchange Spaces use IdM technologies to decorate and verify messages with identities and attributes.

Facility: An OOI community representing a domain of authority. Current facilities are the OOI-CI Facility, Coastal-Global Facility and Regional Facility. Facilities are members of the root OOI Exchange Space. The OOI-CI Facility provides core services, such as an OOI-wide resource directory and IdM services. Users can enroll with facilities by providing an external identity. Facilities use IdM mechanisms to verify external identities based on InCommon/CILogon. Once entities are members of a facility, they can choose to enroll in further communities, such as topic-specific Exchange Spaces. Every Capability Container belongs to exactly one facility.

Federation: A contract between two Facilities allowing for policy-controlled conversations between entities associated with each Facility. This is a manifestation of inter-organizational Trust Establishment. Note this is different from [IdM-Conversations], which depicts Federation as an entity in conversations.

Identifier: A string uniquely representing an entity. In OOI, Identifiers are assumed to be global in scope and not re-used. An identifier is used by the IPC to route messages. An identifier is asserted by an entity using a credential.

Identity: The collection of Identifiers and Attributes associated with an entity. An OOI Identity is an identity created and managed by OOI. An External Identity is an identity created and managed by an entity outside of OOI, e.g. InCommon/CILogon. An External Identity (or identities) will be bound to an OOI Identity for use in OOI (see Identity Binding).

Identity Binding: The administrative practice of marking two Identifiers as equivalent. These can be Identifiers at two different facilities, or an identifier at a facility and an identifier external to OOI. Conceptually, Identity Binding is a federation of Identities. Delegation is accomplished as a restricted form of Identity Binding. Account Linking is a synonym of Identity Binding.

Identity Credential: A verifiable token from an Identity Provider that allows its holder to authenticate itself or create messages that can be authenticated by those who have established trust in the Identity Provider. An External IdP Identity Credential is an Identity Credential issued by an Identity Provider external to OOI and used by a user to undergo Single Sign-On with OOI. An OOI Identity Credential is an Identity Credential issued as a result of Single Sign-On with an OOI Authentication Service.

Identity Providers (IdPs): Entities composed of one or more of the following services. In practice these services often share state (e.g. a user database) and are commonly lumped together into an IdP.
  Attribute Authorities: A service with authority over attributes associated with entities.
  Registration Authorities: A service responsible for managing entities for which the IdP is authoritative and assigning identities to those entities.
  Credential Authorities: A service responsible for providing credentials to registered entities.
  Authentication Services: A service responsible for credentialing and authentication of entities for which the IdP is authoritative, allowing those entities to establish trust with other entities trusting the IdP.

Identity Management (IdM) Service: A service responsible for providing an interface for binding attributes to an identity, including other identities. Attributes managed by an IdM service are typically expressed by an Attribute Authority.

OOI Community: The "root" community, in that it defines the identities for the parties involved and provides the basic rules of encounter within OOI.

Org: A collection of interacting objects whose purpose is to fulfill an objective (RM-ODP definition). "Org" is a special term, not to be confused with Organization. In the context of OOI, an organization is represented by a coherent policy. Virtual Organization, Virtual Laboratory and Community are synonyms for Org. Joining an Org requires accepting the rules of the Org, and the Org will provide the registrant entity with a local name and address. Orgs can form relationships, known as federations, with other Orgs, enabling the members of one Org to interact with the members of another Org, instituting the specifications of both Orgs. An Org may be nested within, disjoint from, or partially overlapping with other Orgs.

Registration: See Enrollment.

Root Org (aka "OOI Root Org", "Root OOI Org"): Root of authority for the OOI project.

Single Sign-On: The process by which an entity obtains an identity credential for message authentication.

Service Provider (SP): An entity handling a request. While all entities can be thought of as equivalent, it is useful to distinguish between Clients and Service Providers (SPs) as initiators and handlers of service requests, respectively.

Trust Establishment: The process by which one entity accepts assertions from another entity, typically an Identity Provider in this context.

User: A human being, typically acting as a client.

CIAD COI OV Policy Management

Managing policy in the OOI Integrated Observatory Network has several layers:

Governance layer:
  Independently operated facilities (domains of authority) federate to realize the ION.
  Facilities are networks of trust; once inside a trusted facility, users can access resources more freely.
  Agents represent the intent of the users in a facility (operator, end user) and of the resources.
  Resource owners and facility operators define policy.
Interaction layer:
  Conversations are interactions between agents.
  Conversations are also interactions between users and resources.
  Monitoring and intercepting interactions makes it possible to inject policy.
Policy Decision Point:
  A service or agent that can make a policy decision.
Policy Enforcement Points:
  When entering a trusted facility.
  When accessing a resource (such as an instrument or a service).

See also

TBD

CIAD COI OV Secure Messaging

Secure Messaging is part of the COI Identity Management Services. Most secure messaging functionality is applied to application-level message communication by the COI Capability Container.

Domain Models


Figure 1. Secure messaging domain model (OV-7)

The above figure shows the portion of the identity management data model that is specific to secure messaging.

Secure Messaging Behavior Models

Secure Messaging involves a principal using the result of the Authentication activity to create signed messages that can be validated by a service receiving those messages. This in turn allows the service to identify the principal, which is typically used to make access control decisions (it may also be used in other processes such as auditing).

The service must trust the identity management system as described in the Registration activity. This typically means that the service has registered with the same system, or that the system the service has registered with has established trust with the system the principal has registered with, i.e. the two systems have federated.

Use Case

The basic use case is one entity (referred to as the Sender) sending a message to a second entity (referred to as the Receiver). The goal of Secure Messaging is to provide certain guarantees to both entities regarding the message. The specific guarantees, which are selected by the Sender at the time of sending, may include:

Authentication: The Receiver can verify the identity of the sender of the message. Identity includes an identifier of the entity and/or attributes that may be included with the message by the entity.
Confidentiality: The Sender and Receiver have assurance that a third party cannot read any of the message contents. (Some metadata must be readable by intermediaries for delivery purposes.)
Integrity: The Receiver can verify that the contents of the message were not altered in transit. This includes addition of content, removal of content, or modification of content or other metadata.
Non-repudiation: The Receiver can prove at a later date that the Sender sent the message.
Replay Protection: The Receiver can detect that a message was previously received. This requires local state at the Receiver. Limitations on the size of the local state limit the size of the history, in terms of number of messages, in which the Receiver can detect replays. Messages may have time stamps to allow for the detection of old messages.

The guarantees may be combined. Some guarantees require that other guarantees be present (e.g. Authentication is of little use without Integrity; Non-repudiation requires Authentication and Integrity).
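The Authentication and Integrity guarantees can be illustrated with a keyed MAC. This is a sketch only: a hypothetical shared secret stands in where the real system would use public-key OOI Identity Credentials, so the Receiver here would need the same key.

```python
import hashlib
import hmac

# Hypothetical shared-key stand-in for the Sender's OOI Identity Credential.
SENDER_KEY = b"hypothetical-shared-secret"

def sign(message: bytes, key: bytes) -> bytes:
    """Authentication + Integrity: compute a MAC over the message body."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes, key: bytes) -> bool:
    """Receiver recomputes the MAC; any altered bit fails verification."""
    return hmac.compare_digest(tag, sign(message, key))

body = b"temperature=3.1C site=RSN"
tag = sign(body, SENDER_KEY)
assert verify(body, tag, SENDER_KEY)             # untampered: accepted
assert not verify(body + b"!", tag, SENDER_KEY)  # altered: rejected
```

Note that a shared-key MAC cannot provide Non-repudiation, since the Receiver could also have produced the tag; that guarantee requires asymmetric signatures.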

Requirements

In no particular order:

1. The Sender must be able to specify what message security guarantees (including none) it wishes for a message.
2. Secure Messaging must allow for use of different cryptographic algorithms over time as those algorithms weaken.
3. Secure Messaging must allow for trade-offs between security and performance. Initially this may be binary, i.e. a "Security/No Security" option. Ideally it should allow for a range of security options.
4. Secure Messaging transformations must be lossless, i.e. it must be possible to recover all message contents as they were prior to transformation.
5. Secure Messaging must allow the messaging system to deliver messages as intended.
6. The Receiver must be able to extract identity information from received messages that is suitable for Governance and Auditing.
7. The Receiver, if wishing to have non-repudiation, must be able to record messages with their authentication information intact.
8. Messages must have timestamps so that replayed messages outside the window of protection provided by the replay prevention can be detected and discarded.
9. The Receiver must be able to detect what protections were applied by the Sender and apply appropriate mechanisms to remove protections.
10. In order to encrypt messages to a Receiver, the Sender must have access to appropriate trust configuration regarding the Receiver's OOI Identity Credentials.
11. Secure Messaging functionality potentially (depending on requested functionality) requires access to the OOI Identity Credentials of the local entity.

Prerequisites

In no particular order:

1. Secure Messaging must be able to canonicalize messages in order to achieve consistent signatures and verification of those signatures.
2. Secure Messaging must be able to attach additional data (digital signatures and trust metadata) to messages.
3. The Sender must have previously undergone Authentication and have OOI Identity Credentials.
4. Receivers of Confidential Messages must have previously undergone Registration and have OOI Identity Credentials.
5. Sender and Receiver must have previously established trust, either in a common Org or in separate Orgs that have federated.
6. It must be possible to deploy the secure messaging functionality such that the messages between the functionality and the local entity (i.e. Sender or Receiver) are secure in their own right, i.e. they must exist in the same process space or have some form of private messaging.
7. Entities need local storage for the configuration needed to validate messages (e.g. trusted keys). This local storage must be secure against tampering by unauthorized individuals.
8. The Receiver must have local storage to record a history of received messages. The size of the storage determines the period of time for which replay prevention is provided.
9. Messages must have an accurate timestamp so that the Receiver can detect replay of old messages.
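Prerequisite 1 (canonicalization) can be illustrated with deterministic serialization: two messages with the same content must serialize to the same bytes before signing, or signatures will not verify. The JSON encoding here is a hypothetical stand-in for the actual OOI message format.

```python
import hashlib
import json

def canonicalize(message: dict) -> bytes:
    """Serialize a message deterministically (sorted keys, fixed
    separators) so semantically equal messages hash identically."""
    return json.dumps(message, sort_keys=True, separators=(",", ":")).encode()

# Two dicts with different key insertion orders are the same message.
a = {"to": "svc", "body": "ping"}
b = {"body": "ping", "to": "svc"}
assert hashlib.sha256(canonicalize(a)).digest() == \
       hashlib.sha256(canonicalize(b)).digest()
```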

Secure Messaging Activity

The following figure shows the conceptual model of how secure messaging works. The model shows the symmetry between elements on the Sender and Receiver side.

Figure 2. Secure messaging concept model (OV-1)

The elements involved are as follows. For any given message on the Sender's side, not all elements may be active (e.g. the Sender may choose not to encrypt a message). On the Receiver's side, the Receiver determines which elements need to be active based on the message and what protections were applied by the Sender. A Receiver may choose to reject a message if it determines the message was not appropriately protected.

Elements in the above figure are:

Sender Infrastructure, Receiver Infrastructure: Spaces in which communications can happen without risk of eavesdropping or modification by other entities.
Message Infrastructure: Infrastructure responsible for transporting messages reliably from the Sender to the Receiver. By itself, it provides no security guarantees.
Message Signer: This element takes a message and, using the Sender's OOI Identity Credentials, creates a digital signature for the message. It then attaches the digital signature to the message and passes it to the next element.
Audit: The audit element on both the Sender and Receiver records messages that it observes, for various purposes including debugging, replay protection, and non-repudiation.
Message Encrypter: This element takes a message and encrypts it using trust configuration regarding the Receiver's OOI Identity Credentials.
Message Decrypter: This element takes an encrypted message and decrypts it using the Receiver's OOI Identity Credentials.
Message Validator: This element takes a signed message and trust information regarding the Sender's OOI Identity Credentials and validates the signature.
Replay Protection: This element verifies that (1) a message is not older than some time window and (2) has not been previously recorded by the Audit element.
Trust Configuration: Information regarding OOI Identity Credentials required by parties other than the bearer to validate signatures from the credentials or to encrypt messages to the bearer. In the case of X.509 credentials, this is the X.509 certificate chain.
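The Replay Protection element described above can be sketched as follows, assuming a fixed time window and an in-memory digest set standing in for the audit database; the window/size parameters are hypothetical.

```python
import hashlib
import time

class ReplayProtector:
    """Rejects messages that are too old or already seen. The size of the
    digest set bounds the replay-detection window (Prerequisite 8)."""

    def __init__(self, max_age_seconds: float = 300.0):
        self.max_age = max_age_seconds
        self.seen: set[bytes] = set()  # stand-in for the audit database

    def accept(self, payload: bytes, timestamp: float, now: float) -> bool:
        if now - timestamp > self.max_age:
            return False  # older than the protection window
        digest = hashlib.sha256(payload).digest()
        if digest in self.seen:
            return False  # replay of a previously recorded message
        self.seen.add(digest)
        return True

rp = ReplayProtector()
t = time.time()
assert rp.accept(b"msg-1", t, t)            # first delivery accepted
assert not rp.accept(b"msg-1", t, t + 1)    # replay rejected
assert not rp.accept(b"msg-2", t - 600, t)  # stale message rejected
```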

Conceptual Message Transformation

In this section we discuss conceptually the message transformations that are performed by the secure messaging elements discussed in the previous section. This section assumes a message format that is analogous to SOAP, in that messages have a body, containing the content meaningful to the application logic, and a header, containing information meaningful to other elements in the system (e.g. routing, security). The transformations here are based on SOAP Security [WSS-SOAP].

Element: Signing
Inputs: Unsigned message; Sender's OOI Identity Credentials
Message Transformation: Signature element added to message header
Other Outputs: None

Element: Validation
Inputs: Signed message; Trust configuration
Message Transformation: Signature element removed from message header
Other Outputs: Identity of message signer as validated by trust configuration

Element: Encryption
Inputs: Message; Trust configuration; Identity of Message Recipient
Message Transformation: Message body is replaced with encrypted version; Encryption element added to message header indicating the intended recipient. This information allows the decrypter to select appropriate credentials for decryption.
Other Outputs: None

Element: Decryption
Inputs: Encrypted message; Credentials of message recipient
Message Transformation: Message body is replaced with decrypted version; Encryption element removed from message header
Other Outputs: None

Element: Auditing
Inputs: Message
Message Transformation: None
Other Outputs: Copy of message placed into audit database

Element: Replay Protection
Inputs: Message; Audit database; Current time
Message Transformation: None
Other Outputs: Error indication if message was previously recorded in audit database or if message is too old
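The Signing and Validation transformations can be sketched on a SOAP-like message represented as a dict with a header and a body. The HMAC key is a hypothetical stand-in for OOI Identity Credentials, and the field names are illustrative only.

```python
import hashlib
import hmac
import json

KEY = b"hypothetical-sender-key"  # stand-in for OOI Identity Credentials

def sign_transform(message: dict, signer: str) -> dict:
    """Signing row: add a Signature element to the header; body untouched."""
    body = json.dumps(message["body"], sort_keys=True).encode()
    message["header"]["signature"] = {
        "signer": signer,
        "value": hmac.new(KEY, body, hashlib.sha256).hexdigest(),
    }
    return message

def validate_transform(message: dict) -> str:
    """Validation row: remove the Signature element from the header and
    return the validated identity of the message signer."""
    sig = message["header"].pop("signature")
    body = json.dumps(message["body"], sort_keys=True).encode()
    expected = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig["value"], expected):
        raise ValueError("signature invalid")
    return sig["signer"]

msg = {"header": {"to": "ooi:svc:data"}, "body": {"op": "get"}}
signed = sign_transform(msg, "ooi:user:jdoe")
assert validate_transform(signed) == "ooi:user:jdoe"
assert "signature" not in signed["header"]  # element removed, as in the table
```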

CIAD COI SV End to End Identity Management

TBD: Fill in the details once the identity context exists

CIAD COI SV IdM Technology Mapping

Some of this information is outdated

Technology Mapping

The COI will be realized as a composite capability block providing 1) an integration platform for data and control channels (streams), block data transfer and streaming media, 2) a rich set of options for integrating heterogeneous data sources and applications using a variety of data transports, and 3) an interface for injecting infrastructure services, such as policy monitoring and enforcement services, as plug-ins to effect federated authentication and security policies. This is the basis for provisioning the service (registry, brokering, binding and execution of services), facility and resource networks within the COI.

The services and their corresponding data models provide a uniform mechanism to detect and exploit the capabilities of OOI entities. Governance will be supported through templates for collaboration agreements that define partnership, and delegation and federation policies for participants and their assets. In particular, we will establish an integrated policy/security framework that is directly coupled to the interaction interfaces capturing the activity models. The design partners NCSA and NCSU have developed a comprehensive architectural framework for policy and security management that ties in directly with the notion of interaction interface. Policy and security properties are stored together with the interaction protocol that defines a service using the zone federation architecture of the SRB. This framework will be injected into the capability block pattern such that all interactions with the COI directly fall under the governance of this policy framework. Specifically, the ESB dependency injection mechanism and GridShib, GridGrouper, and myVocs integrated with SRB's Zones will be used as the core technology for implementing and enforcing the COI data structures and models for governance, security and other policies.

In the initial deployment, the capability blocks will be populated as follows: the messaging component will be instantiated to ActiveMQ/AMQP and the router/interceptor and service/data interfaces will be instantiated to Mule. Mule's dependency injection mechanism provides immediate access to persistence, application/transaction, workflow, configuration and monitoring frameworks that will be instantiated to Hibernate, Spring, Groovy and JMX, respectively. Furthermore, the COI will leverage the successful CI software stacks of BIRN/TeleScience with their web-service-based ATOMIC interfaces to the national Grid computing and security infrastructure. All of these web services will be configured as capability block plug-ins. The flexibility of this ESB-based approach allows the development team to rapidly integrate new capabilities as they become available.

The transport-transparent messaging component of the ESB will be exploited to implement data and control streams among CI subsystems, and provision and broker any service, data source, data transport and delivery mechanism, as well as any policy that is injected into the routing/interceptor facility of the ESB. Such a messaging system supports secure, durable, fault tolerant and high availability connections.

Service: Message Signer, Message Validator
Technology: WS-Security (X.509, SAML)
Implementation: Implementations from Globus Toolkit and GridShib4GT
Notes: WS-Security defines mechanisms for attaching security tokens to XML messages, including X.509 and SAML credentials.

Service: Authentication Service
Technology: SAML
Implementation: Shibboleth IdP
Notes: Domain-specific choice. Typically based on legacy deployment.

Service: Attribute Authority
Technology: SAML
Implementation: Shibboleth IdP
Notes: SAML Attribute Query Protocol

Service: Credential Authority
Technology: SAML; X.509
Implementation: GridShib-CA; MyProxy; GridShib SAML Tools
Notes: Human principals will use interactive SAML protocols/GridShib-CA. Automated principals will use X.509/MyProxy. The GridShib SAML Tools serve to bind SAML and X.509.

Service: Registration Authority
Technology: ProtectNetwork
Implementation: ProtectNetwork
Notes: Complemented by a human-vetting operation to the Principal database.

Service: Policy Repository
Technology: XACML
Implementation: Implementations by Sun; Delegent
Notes: Edited via any XML editor; later authored through the Policy Authoring Tool.

Service: Policy Decision Point
Technology: XACML
Implementation: Implementations by Sun; Delegent
Notes: Will need enhancements.

Service: Policy Enforcement Point
Technology: Rich Service Endpoint
Implementation: COI prototype in progress
Notes: The PEP could be anything that can receive (or intercept) events and initiate capabilities (actions on resources).

Interface Products and Dependencies

Identity Management

Figure 1 shows a high-level view of the Identity Management architecture. The Authentication Service, Registration Authority, Attribute Authority and Credential Authority are implemented as Rich Application Services, providing capabilities directly to other entities over the COI. The Message Signer and Message Validator are Rich Infrastructure Services, enabling security on a variety of messages.

Figure 1 COI Identity Management/Authentication (SV-1)

The Registration Authority accepts requests from principals seeking the ability to participate in interactions on the COI. Acceptance and processing of the request allows for other interactions with other identity management services. The Authentication Service is subsequently able to authenticate the principal, leading in turn to the issuance of a credential by the Credential Authority. The Attribute Authority is then capable of issuing attributes regarding the principal, drawing on a variety of data registries. Typically, attribute issuance will be targeted at the governance system in order to fulfill its needs for policy decision making.

Input Data Model

Service Input Data

Message Signer Unsigned message; Message target; Authentication and attribute credentials

Message Validator Signed message; Metadata from Authentication Service

Authentication Service Authenticator.

Attribute Authority Principal identifier

Credential Authority Authentication assertion.

Registration Authority Requested identifier; Contact information

Output Data Model

Service Output Data

Message Signer Signed message.

Message Validator Unsigned message; Message sender identity.

Authentication Service Authentication Assertion.

Attribute Authority Principal attributes.

Credential Authority Authentication and attribute credentials

Registration Authority Principal identifier; Authenticator.

Metadata Model

Technology Metadata

Authentication Service Credential for validation of proof of authentication (e.g. X.509 certificate); SAML Metadata

Attribute Authority SAML Metadata

Dependencies

Technology Dependencies

Message Signer

Message Validator Uses credentials from Credential Authority. Needs message model that allows for attachment of signature and credentials

Authentication Service Assumes legacy, domain authentication infrastructure.

Attribute Authority Registries for Principals, Facilities, Participations, Ongoing Conversations

Credential Authority Authentication Service

Registration Authority Principal Registry.

Policy/Governance

The governance decisions are based on policies that consider the attributes of principals, facilities, the principals' participations in the facilities, and their ongoing conversations. Domain capabilities affect the resources in question, in this case, a data stream. Governance capabilities affect the attributes of the principals and facilities and the participations of principals in facilities. The following are the proposed modules:

Identity Provider: Determines the OOI identities that are the parameters of an event (e.g., originator, target)
Governance PEP: The policy enactment tool able to carry out conversations
Governance PDP: Interpreter for XACML policies
Domain Capability: Way to access a resource
Governance Capability: Way to modify the relationships that affect governance
Attribute Authority: Way to access attributes relevant for governance
Identities:
  Principals: Attributes of principals, that is, users and facilities
  Facilities and Participations: Who is associated with what; that is, which Principal participates in what Facility
  Conversations: Ongoing interactions involving this Principal and other Principals
Policy Authoring Tool (PAT): Creates policies in the OOI XACML Profile

Figure 2 COI Policy/Governance (SV-1)

An event such as a message carrying an observation or a request arrives in the system. The Identity Management subsystem carries out the identification and attaches identities to the relevant parameters of the event. The main acting module, the PEP, receives the event and passes it to the PDP to arrive at a decision. The PDP may obtain additional attributes from the Attribute Authority in order to determine what policies apply, and arrives at a decision. The PDP conveys the decision to the PEP. The PEP carries out the decision by permitting or denying the request, or by enabling or obliging this Principal. The capability in question would be a combination of a domain capability (about a resource) and a governance capability (about producing attributes that have subsequent impact on policy decisions). In general, each domain capability exercises an associated governance capability, if only to record the current event as part of the ongoing conversation, which can affect decisions on subsequent events.

Figure 3 COI Policy/Governance XACML Communication Diagram (SV-1)

The model operates by the following steps:

1. The Policy Authoring Tool (PAT) writes policies and policy sets and makes them available to the Governance PDP. These policies or policy sets represent the complete policy for a specified target.
2. The access requester sends a request for access to the Governance PEP.
3. The Governance PEP sends the request for access to the Context Handler in its native request format, optionally including attributes of the subjects, resource, action and environment.
4. The Context Handler constructs an XACML request context and sends it to the Governance PDP.
5. The Governance PDP requests any additional subject, resource, action and environment attributes from the Context Handler.
6. The Context Handler requests the attributes from an Attribute Authority (PIP).
7. The Attribute Authority (PIP) obtains the requested attributes and returns them to the Context Handler. Optionally, the Context Handler includes the resource in the context.
8. The Context Handler sends the requested attributes and (optionally) the resource to the Governance PDP. The Governance PDP evaluates the policy.
9. The Governance PDP returns the response context (including the authorization decision) to the Context Handler.
10. The Context Handler translates the response context to the native response format of the Governance PEP and returns the response to the PEP.
11. The Governance PEP fulfills the obligations.
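The steps above can be sketched as a minimal PEP / Context Handler / PDP / PIP chain. The policy content, attribute values and identifiers are hypothetical; a real deployment would use XACML requests and SAML attribute queries rather than dicts.

```python
# Hypothetical policy store: (resource, action) -> roles permitted.
POLICY = {("dataset:42", "read"): {"researcher", "operator"}}

def attribute_authority(subject: str) -> str:
    """Stand-in PIP lookup; a real deployment queries SAML registries."""
    return {"ooi:user:jdoe": "researcher"}.get(subject, "anonymous")

def context_handler(native_request: dict) -> dict:
    """Translate the PEP's native request into a PDP request context,
    pulling a missing subject attribute from the Attribute Authority."""
    attrs = dict(native_request.get("attributes", {}))
    if "role" not in attrs:
        attrs["role"] = attribute_authority(native_request["subject"])
    return {"subject": native_request["subject"],
            "resource": native_request["resource"],
            "action": native_request["action"],
            "attributes": attrs}

def pdp(context: dict) -> str:
    """Evaluate the policy and return a Permit/Deny decision."""
    allowed = POLICY.get((context["resource"], context["action"]), set())
    return "Permit" if context["attributes"]["role"] in allowed else "Deny"

def pep(native_request: dict) -> bool:
    """Enforce the PDP's decision on the incoming event."""
    return pdp(context_handler(native_request)) == "Permit"

assert pep({"subject": "ooi:user:jdoe", "resource": "dataset:42",
            "action": "read"})
assert not pep({"subject": "ooi:user:mallory", "resource": "dataset:42",
                "action": "read"})
```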

Note: If access is permitted, then the Governance PEP permits access to the resource; otherwise, it denies access (omitted for simplicity).

Input Data Model

Service Input Data

Attribute Authority Identifiers of Principal, Resource, or Action; additional set of attribute-value pairs for the same entity

Policy Repository Identifiers of Principal, Resource, and Action; additional set of attribute-value pairs for any of the identified entities

Policy Decision Point Identifiers of Principal, Resource, Action; additional set of attribute-value pairs

Policy Enforcement Point Selected messaging format

Output Data Model

Service Output Data

Attribute Authority Attribute values for entity identified in input

Policy Repository XACML policy set

Policy Decision Point XML encoding of policy decision

Policy Enforcement Point Selected messaging format

Metadata Model

Technology Metadata

Attribute Authority XACML OOI Profile (available as draft) to be placed into a SAML representation

Dependencies

Technology Dependencies

Attribute Authority Registries for Principals, Facilities, Participations, Ongoing Conversations

CIAD COI SV Roles and Permissions

This page describes the Release 1 implementation of roles and permissions.

Roles specification

A roles file exists mapping user identifiers to roles, such that these roles can subsequently be used for user authorization.

Additional user attributes may be associated with user identities similar to roles, such that system elements can make authorization and resource decisions.

Permissions specification

A permissions file exists defining resource access permissions. The primary resources to protect are services; other resources (such as datasets and instruments) may be protected either indirectly through a protected service or by providing resource-level access control.

The permissions file defines tuples to control access to resources. The format for tuples is as follows:

['<service name>', '<operation name>', '<required role>']

Authorization rules:

If no tuple has been defined for the service:operation, the operation is assumed to have the 'ANONYMOUS' level authority requirement. Otherwise, a check is made to ensure the user role is equal to or greater than the role specified in the tuple. Role precedence, from lower to higher, is: ANONYMOUS, AUTHORIZED, OWNER.
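As an illustration, the authorization rule above can be sketched in a few lines. The role names come from the text; the function names and the dictionary form of the tuple store are assumptions.

```python
# Sketch of the Release 1 authorization rule described above.

ROLE_PRECEDENCE = ["ANONYMOUS", "AUTHORIZED", "OWNER"]  # lower to higher

def required_role(permissions, service, operation):
    # No tuple defined for service:operation => ANONYMOUS authority suffices.
    return permissions.get((service, operation), "ANONYMOUS")

def is_authorized(permissions, service, operation, user_role):
    needed = required_role(permissions, service, operation)
    return ROLE_PRECEDENCE.index(user_role) >= ROLE_PRECEDENCE.index(needed)

# Hypothetical tuple store: (service, operation) -> required role.
permissions = {("dataset_service", "update"): "OWNER",
               ("dataset_service", "read"): "AUTHORIZED"}
```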

Example permissions file:

CIAD COI TV CIlogon

TBD

In Release 1.0, the user identity and authentication framework will exist outside of OOI, and the infrastructure will utilize the certificate-based services of the CILogon service.

CILogon is used to authenticate users for OOI via the Web with their existing organizations' identity providers. CILogon is a service that acts as a broker for multiple authenticating authorities, such as research organizations and universities. It also supports OpenID, a lower level of authentication used, for example, by Google and Yahoo. When CILogon receives an authentication request, it delegates that request to the correct authentication provider. OOI will prompt a user who is requesting access to enter login information for an account at one of these organizations and then use the CILogon service to verify the account.

The process for interacting with CILogon follows this flow:

1. Users with a valid CILogon account enter their credentials at the OOI login screen.
2. These credentials are forwarded to CILogon for multi-factor authentication. If the account is valid, CILogon returns a key and a certificate.
   a. The certificate will contain, if available, the user name, the authenticating institution, and an email address.

   b. In the case of OpenID authentication (Google, Yahoo!, Verisign) to CILogon, only the OpenID URI goes into the certificate subject (no email address, user name, or institution), so in that case OOI will need to prompt the user for additional information as required.
3. Internally, the identity registry extracts attributes from the certificate subject, where available, such as the user name, email, and institution, to check whether this user has already registered.
   a. If this is a new user, the identity registry forms an internal identifier for the user based on the attributes in the certificate subject.
      i. The user registry also assigns an internal user id, in the form of a UUID, that is the identifier returned for the user.
   b. In subsequent logins from a registered user, the information in the certificate is matched against information in the identity registry, and the existing OOI user identifier is returned.
4. Based on the results of this process, the user will be assigned a role of authenticated or unauthenticated.
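Step 3 of the flow, mapping certificate-subject attributes to an internal OOI user id, might look roughly like the following sketch. The class and method names are hypothetical, not the actual identity registry API.

```python
import uuid

# Illustrative sketch of mapping certificate-subject attributes to an
# internal OOI user id backed by a UUID.

class IdentityRegistry:
    def __init__(self):
        self._users = {}   # subject key -> internal UUID string

    def register_or_lookup(self, subject_attrs):
        # Form the internal identifier from whatever the certificate carried;
        # OpenID logins may provide only a URI (no name, email, institution).
        key = subject_attrs.get("uri") or (
            subject_attrs.get("name", ""),
            subject_attrs.get("email", ""),
            subject_attrs.get("institution", ""))
        if key not in self._users:            # new user: assign a UUID
            self._users[key] = str(uuid.uuid4())
        return self._users[key]               # known user: same id returned
```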

By utilizing the CILogon services, users do not need to create an additional account to access the OOI system.

In future releases, users with multiple accounts at CILogon-affiliated organizations will be consolidated within the OOI system. For example, if a user has one account at a research site and a separate account at a university, the user will be able to log in to the OOI system via CILogon with either account and be recognized as the same individual.

CIAD COI TV SAML

5.2.2.3 Security Assertion (SAML)

SAML is a specification from the OASIS Security Services Technical Committee (http://www.oasis-open.org/committees/security/) that allows for the exchange of authentication and authorization information between entities. The entity providing such information is referred to as an identity provider (IdP), and the entity receiving the information as a service provider (SP).

SAML is primarily used for clients utilizing web browsers in products such as Shibboleth http://shibboleth.internet2.edu/. However, it has no fundamental restrictions as such, and has been applied to other use cases, such as to X.509 certificate-protected messages in the computational grid space by the GridShib project http://gridshib.globus.org/.

SAML messages are XML based and may carry a variety of information. However, the most typical forms of information are entity identifiers that express the results of an authentication process, and attributes that convey information about an entity. The common use case is an IdP that authenticates an entity and then provides, using SAML, the entity's identifier and attributes to an SP. The SAML specification describes both the format of the SAML messages and the message exchange patterns (called "profiles") between the IdP, SP, and client entity.
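A minimal, simplified SAML 2.0 assertion fragment can be assembled with the Python standard library, as below. Real assertions also carry an ID, issue instant, validity conditions, and usually a digital signature, all omitted here; the issuer and subject values are placeholders.

```python
import xml.etree.ElementTree as ET

# Simplified SAML 2.0 assertion: Issuer, Subject/NameID, and one
# AttributeStatement. Not a complete or signed assertion.
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
ET.register_namespace("saml", SAML)

def build_assertion(issuer, subject_id, attributes):
    assertion = ET.Element(f"{{{SAML}}}Assertion")
    ET.SubElement(assertion, f"{{{SAML}}}Issuer").text = issuer
    subject = ET.SubElement(assertion, f"{{{SAML}}}Subject")
    ET.SubElement(subject, f"{{{SAML}}}NameID").text = subject_id
    stmt = ET.SubElement(assertion, f"{{{SAML}}}AttributeStatement")
    for name, value in attributes.items():
        attr = ET.SubElement(stmt, f"{{{SAML}}}Attribute", Name=name)
        ET.SubElement(attr, f"{{{SAML}}}AttributeValue").text = value
    return assertion

# Placeholder IdP and subject values for illustration.
idp_assertion = build_assertion("https://idp.example.edu",
                                "user@example.edu", {"role": "researcher"})
```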

CIAD COI TV WS-Security

WS-Security is a set of specifications from the OASIS Web Services Security technical committee for securing SOAP messages. This includes signing (authenticating who sent a message as well as protecting it from tampering), encryption (providing confidentiality in transit) and attaching of other security messages, such as SAML or X.509 certificates or a combined form of those two as generated by GridShib, to SOAP messages. The attachment of other messages can be used to provide information necessary for the recipient to validate the signature or it can be used to provide additional information for authorization.

References:

http://www.oasis-open.org/committees/wss/

264 CIAD COI TV X509

X.509 End Entity and Proxy Certificates

X.509 End Entity and Proxy Certificates are a means for allowing authentication of entities to third parties based on mutual trust of a third party referred to as a certification authority. As opposed to SAML, X.509 certificates were designed to be used repeatedly over a longer period of time (days-to-years as opposed to a single session for SAML), and as such use heavier weight cryptography. Web servers commonly use X.509 certificates to authenticate themselves using the HTTPS protocol.

Computational grids adopted X.509 certificates for client authentication in SOAP messages in conjunction with WS-Security (web browsers and protocols allow for their use in the same manner with HTTPS, but it is rarely done). The X.509 Proxy Certificate is an extension to the End Entity Certificate allowing an entity to create relatively short-lived (typically hours) certificates that could be given to third parties to temporarily delegate capability, or that could be decorated with other security information, such as SAML messages.

References:

http://www.ietf.org/rfc/rfc5280.txt http://www.ietf.org/rfc/rfc3820.txt

CIAD COI TV XACML

XACML

The OASIS consortium developed the eXtensible Access Control Markup Language XACML to address the need for a standard language and architecture for policy management. XACML specifies schemas for authorization policies, decision requests, and responses. It also specifies how to evaluate requests against policies to compute a response.

The non-normative usage model of the XACML specification assumes that a Policy Enforcement Point (PEP) module is responsible for enforcing access decisions on one or more resources. When a user requests access to a resource, the PEP sends a decision request to the Policy Decision Point (PDP). The PDP evaluates this request against the available policies and attributes, and responds with an authorization decision. The PEP enforces the decision according to the PDP's response. There may be many PEPs in a system, each responsible for different resources. Other modules include the Policy Information Point (PIP) that acts as a source of attributes and the Policy Administration Point (PAP) that creates a policy and the Repository that stores attributes and policies. We describe these modules below.

Motivation and Approach

Proprietary and application-specific access control policy languages exist but they cannot be readily shared across different applications. Further, tool support for such languages is weak. XACML builds on well-established ideas in access control policies. The following are the key requirements motivating XACML.

- Combining individual rules and policies into a single policy set that applies to a particular decision request.
- Flexible definition of the procedure by which rules and policies are combined.
- Dealing with multiple subjects acting in different capacities.
- Basing an authorization decision on attributes of the subject and resource.
- Dealing with multivalued attributes.
- Basing an authorization decision on the contents of an information resource.
- Logical and mathematical operators on attributes of the subject, resource, and environment.
- Handling a distributed set of policy components, while abstracting the method for locating, retrieving, and authenticating them.
- Rapidly identifying the policy that applies to a given action, based upon the values of attributes of the subjects, resource, and action.
- An abstraction layer that insulates the policy writer from the details of the application environment.
- Specifying a set of actions that must be performed in conjunction with policy enforcement.

Interoperability

A key requirement of the XACML specification is providing an abstraction layer between the person who writes the policy and the application environment. For this, the PDP expects decision requests to be in a canonical form as described by the XACML schema published by OASIS. This schema is called the XACML context. A PEP may issue decision requests in its native format. In such cases, intermediate steps are required to convert the requests issued by the PEP to the canonical form understood by the PDP and to convert the PDP's response to the native format of the PEP. These conversions support interoperability. In cases where the native format of the PEP is XML, the conversion may be specified via a syntactic transformation such as is described with XSLT (eXtensible Style Sheet Transformation). For similar reasons, a resource that is itself an XML document can be included or referenced in the request, and can be queried by the PDP using XPath expressions specified in the policy.

XACML does not define protocols or transport mechanisms. Instead, the usage model depends on other standards for specifying assertions, protocols, and transport mechanisms. XACML also does not specify how to implement a Policy Enforcement Point, Policy Administration Point, Policy Information Point (i.e., Attribute Authority in OOI lingo), or Context Handler. XACML artifacts can serve as standard formats for exchanging information between these entities.

The OASIS Security Assertion Markup Language (SAML), Version 2.0, provides a standard representation needed for packaging XACML assertions and policies. SAML defines the schema intended for use in requesting and responding with various types of security assertions. The SAML schema includes information needed to identify, validate, and authenticate the contents of the assertions, such as the identity of the assertion issuer, the validity period of the assertion, and the digital signature of the assertion. The SAML specification describes how these

elements are to be used. In addition, SAML has associated specifications that define bindings to other standards, such as for transport mechanisms, and the creation and verification of digital signatures.

Scalability

A PDP may consult multiple independently authored policies. This distributed nature scales well for authoring, storing, and editing policies. Several PEPs, each responsible for one or more resources, enforce the policy decisions. This again is scalable due to the possibility of distributed deployment of PEPs.

To improve the efficiency of evaluation and ease of management, the overall policies in force across an enterprise may be expressed in multiple independent policy components. It is thus necessary to identify and retrieve the applicable policy statement and verify that it is the correct one. The Target element in XACML helps decide the applicability of a policy to a request.

Two approaches can be taken:

1. Use a database to store policies indexed by target (this may be better and easier to scale, as it outsources efficiency concerns to the database).
2. Load the PDP with all policies.

Notes: XACML 2.0 includes a number of so-called profiles, namely Digital Signature, Multiple Resource, Hierarchical Resource, Role Based Access Control (RBAC), and Security Assertion Markup Language (SAML), which simplify the development of solutions for known problems.
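The first approach, indexing policies by their target so that applicable policies are found without evaluating every one, can be sketched as follows; the index keys and the wildcard convention are assumptions for illustration.

```python
# Hypothetical target-indexed policy store; a production system would
# delegate this lookup to a database, as the text suggests.

class PolicyStore:
    def __init__(self):
        self._by_target = {}   # (resource, action) -> list of policy ids

    def add(self, resource, action, policy_id):
        self._by_target.setdefault((resource, action), []).append(policy_id)

    def applicable(self, resource, action):
        # Exact target match plus any wildcard policies for the resource.
        return (self._by_target.get((resource, action), []) +
                self._by_target.get((resource, "*"), []))
```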

Figure 1 Policy Language Model (TV-1)

The main components of the XACML model are Rule, Policy, and Policy set.

Rule: The rule is the elementary unit of policy. A rule must be encapsulated in a policy. The main components of a rule are target, condition, and effect.

Target: The target element in a rule defines the set to which the rule applies. The target element is a set of resource, subject, action, and environment elements. In turn, each of the sub-elements of the target is represented as a set of attribute-value pairs. XACML supports customizable match functions that we omit for reasons of simplicity.

Attribute-Value Pair: Attributes are the properties that help characterize and describe a subject, resource, action, or environment via the different values associated with them. For example, subject attributes may be Age=32 and Sex=M, and an environment attribute may be Month=April.

Condition: A condition may further refine the applicability established by the target. The condition element consists of attributes as described above.

Effect: Effect indicates whether the given rule evaluates to a permit or a deny decision.

Policy: A policy consists of four main components:

Target: The target element has a similar function as in the rule.

Rule-Combining Algorithm: This element specifies the algorithm by which the component rules of a policy are combined.

Rules: Rules are as described earlier.

Obligations: One or more obligations may be returned to the PEP for fulfillment under the accept or deny condition depending on which condition is specified by the FulfillOn attribute. The PEP allows access only if it can fulfill the obligation.

Policy Set: A policy set consists of target, policy combining algorithm (same as rule combining algorithm), policy, and obligation elements. These concepts are explained above.
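The Rule/Policy structure above can be modeled in a few lines. This is a sketch, not the OASIS schema: the field names follow the text, and deny-overrides is used as an example rule-combining algorithm.

```python
from dataclasses import dataclass, field

# Illustrative model of the XACML Rule and Policy elements described above.

@dataclass
class Rule:
    target: dict               # attribute-value pairs the rule applies to
    effect: str                # "Permit" or "Deny"
    condition: callable = field(default=lambda request: True)

    def matches(self, request):
        return all(request.get(k) == v for k, v in self.target.items())

@dataclass
class Policy:
    rules: list

    def evaluate(self, request):
        # Deny-overrides: any applicable Deny rule wins over Permits.
        decisions = [r.effect for r in self.rules
                     if r.matches(request) and r.condition(request)]
        if "Deny" in decisions:
            return "Deny"
        return "Permit" if "Permit" in decisions else "NotApplicable"
```

A PolicySet would wrap several Policy objects the same way, with a policy-combining algorithm in place of the rule-combining one.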

Data Flow (Non-Normative)

The policy system is initialized by the PAP, which writes policies and policy sets and makes them available to the PDP. When a user makes a request, it is first sent to the PEP. The PEP presents a decision request to the Context Handler in its native format, optionally including additional attributes for the subject, resource, action, and environment. The Context Handler is an abstraction layer that constructs a canonical request from the PEP's request and sends it to the PDP. The PDP may request additional attributes from the Context Handler, which the Context Handler obtains from the PIP and passes back to the PDP. The PDP evaluates the policy and returns the response context, which includes the authorization decision. The Context Handler translates the response to the native response format of the PEP and sends it back. The PEP fulfills the obligations and, if the PDP permits access, performs the requested actions.

References:

http://docs.oasis-open.org/xacml/2.0/access_control-xacml-2.0-core-spec-os.pdf
http://xml.coverpages.org/xacml.html
http://www.oasis-open.org/committees/download.php/2713/Brief_Introduction_to_XACML.html

Appendices

CIAD COI OV Presentation Framework

Overview

The presentation framework provides the basic infrastructure for the externalization of the ION system to its environment. This includes user interfaces and application interfaces of various kinds.

For user interfaces, the COI Presentation Framework provides a Web UI platform, integrated with a COI Capability Container and with COI Identity Management and Governance services.

Service Decomposition

Figure 1 shows the decomposition of the COI Presentation Framework. The Web UI Server is a platform that is integrated with a Capability Container and provides the basic infrastructure to host application-specific Web UI Components. The infrastructure includes a Web UI Security module, which provides the integration with the COI Identity Management services. The Web UI Service Access module provides connectivity with a Capability Container and the Exchange for messaging and service access.

A WebService Server and Interface provides externalization via SOAP and REST web services and similar technologies.


Figure 1. Presentation Framework services (OV-2)

See Also

Release 1 Web User Interface Framework

Available Presentation Frameworks

1. Content Management Systems (CMS)

CMSs are designed to quickly build and update web portals of varying complexity. They provide basic building blocks such as menus, articles, categories, and galleries, and generally support look & feel customization via themes/skins.

Joomla
Drupal
Typo3
Wiki-based: Confluence, MediaWiki

2. Content Delivery

For most web portals there are two basic mechanisms for delivering content to the browser rendering the portal:

- standard (implies page refresh, full DOM reconstruction): any standard GET/POST document response
- asynchronous/AJAX (in background, via XML or plain text, may update DOM dynamically): jQuery

3. Widgets

The layout and control elements of a web portal may be simplified by using a widget library. Depending on their backend support, the libraries may be (a) simple (based on plain HTML/CSS/JS) or (b) advanced (providing their own backend tools, classes, APIs, etc.).

a) HTML + CSS + JavaScript coding

extJS - commercial but with plenty of widgets/tools http://www.extjs.com/products/extjs/

DoJo - open source, very powerful http://demos.dojotoolkit.org/demos/

QooXdoo - powerful and free, great alternative to DoJo http://demo.qooxdoo.org/current/demobrowser/

Yahoo UI - quite complex but with good documentation http://developer.yahoo.com/yui/

Rialto http://rialto.improve-technologies.com/rialtoDemo/demo2/demoRialto.html

Bindows http://www.bindows.net/demos/

b) server-side programming (Java/Python) and automatic UI generation (HTML+JS)

ZK - uses XUL to build the UI, impressive capabilities http://zkoss.org/zkdemo/userguide/

Backbase http://bdn.backbase.com/client/overview

CIAD COI SV Web User Interface

This page describes the implementation of the COI Presentation Framework in Release 1: an extensible Web User Interface Framework.

The Web UI Framework is based on the Grails server. Grails is a Groovy/Java-based web application framework with a Model-View-Controller architecture, typically deployed on a Tomcat server.

See Also

Grails (TV): the Web UI platform (contains a Java Tomcat)
CILogon (TV): for authentication
Java Capability Container (SV)

CIAD COI TV Grails

This page will describe the Grails framework, which is the Web User Interface platform in Release 1.

See also:

Grails Web Page

Presentation Use Cases

In the following, we present several use cases for integrating user portals and applications.

1) OOI Portal

Users access the OOI portal as the entry point for OOI services.

2) Web Services

User has his own application that can interface with OOI via Web Services:

3) DAP

User has an application that can interface with OOI via DAP:

Workflow Integration

Workflows can be integrated in several ways, depending on whether the data, execution, or orchestration are handled inside or outside OOI. In the following we give some examples of possible scenarios:

1. Integration through a data interface: a user has his/her own application, gets the data stream from OOI, performs computational tasks on the local platform, and sends the results back to OOI.
2. The user gets the script and data from OOI and runs it on a local machine.
3. The user gets the script and data from OOI and requests OOI to run it.
4. A user provides an application and another user executes it.
5. The user has data to integrate with OOI data, but still runs the application outside OOI.
6. The user has data to integrate with OOI data, and runs it as an OOI application.
7. The user goes to OOI and obtains a specialized module to run in selected applications/environments such as Matlab or Kepler.

Integrating user applications in OOI requires the following capabilities:

Data access services
Process execution services: execution and monitoring, parameter configuration, process analysis, and wrapper generation/UI
Resource virtualization services
Matlab engine integration:
- Wrapper (one instance per execution)
- JNI (keep instance running)
- Java adapters to Matlab
- Can be used as a tool in a workflow
Kepler engine integration:
- Hydrant
- Kflex
- Kepler extension points for web execution environments
- Portals that run Kepler
- Batch command-line execution for Kepler
- Simple Java application to execute Kepler
- Kepler SOAP services
- iGoogle gadget for Kepler

An example of workflow integration is presented in Analysis and Synthesis, demonstrating the use case of a science user working with Matlab to manipulate oceanographic data. We refer to scenario 2 of the previous list: the user gets the script and the data from OOI and runs it on the local machine. The steps of the process are: the user selects the region of interest, downloads a Matlab script, loads the script on the local machine, and runs it.

Scientific Process

Figure 1 depicts the steps of the scientific process:

Figure 1. Scientific activity model (OV-5)

Figure 2 shows the numerical ocean modeling process. Instruments produce observations that undergo several processing steps (such as QC) and are then assimilated into models. Any dataset, from any level in the process, can feed into a model and be analyzed. Metadata requirements also apply to archives of model output.

An example is the ROMS model (a collection of algorithms) that can be easily downloaded as a software package. As each run of the model requires many configuration parameters that are stored in settings files full of coefficients, researchers share their settings files to replicate experiments. Most adjustments to the model come from adjusting its parameters, whereas the algorithms are changed only from time to time. The boundary conditions limit the range over which the model operates. Through transformations, QC'd data are used to obtain the boundary conditions, which may also come from other models. The initial conditions - the state the model begins with - come from observations. The models focus on a particular spatial-temporal domain bordered by the boundary conditions, but they rarely offer predictions on the boundaries themselves. However, instead of downloading and executing the model, most people are more interested in working directly with the output of the model. The parameters of the model can be attached to the output as metadata. This is important, as the models could feed into other models. Metadata describes all aspects of the workflow, e.g., sampling rate, position of the instrument, or meaning of the data output.

Figure 2. Data flow from instrument to visualizer

Figure 3 shows the instrument-based data ingest pipeline with semantic provenance, based on work at the Virtual Solar Terrestrial Observatory (VSTO). The ingest system comes from the solar and solar-terrestrial communities, but the provenance work includes domain-independent portions geared for any data ingest system. The figure shows that data pass through a number of stages and are subject to processing (depicted by the circles/ellipses), the addition of metadata, and influences by various human roles. End users typically access data after Level 3. Therefore, metadata associated with the processing and provenance at each level is very valuable in answering the science questions.

Figure 3. VSTO/SPCDIS - Semantic Provenance Capture in Data Ingest Systems

CIAD COI OV Resource Management

Resource Management

The COI Resource Management services establish a base for the management of all resources in the CI. Capabilities include

- Identifying resources uniquely in the system
- Defining types of resources flexibly
- Relating types of resources, such as to create inheritance trees or versions of type definitions
- Defining the structure of resource descriptions
- Storing resource descriptions flexibly and efficiently
- Cross-referencing resources
- Providing the basis for specialized resource behavior and management
- Enabling flexible query of resources of all kinds
- Managing the life cycle of resources uniformly, with extensibility for specific types of resources

List of Resources

For a list of OOI Integrated Observatory resources managed by COI Resource Management services, see: OOI Resources

Decomposition

Specializations (i.e., sub-types) of resources may be managed by other subsystems based on the services defined here. Figure 1 depicts the decomposition of the Resource Management services. The central service is the COI Resource Registry service. It provides the mechanisms to describe any resource in the system with its properties (or property sets) and associations. The Resource Registry is tightly related to the DM Inventory services. Through DM Inventory services, it is possible for an application or a user to annotate resources with specific attributes and to find resources by their characterization.

A Resource Agent represents and manipulates resources that have behavior and state. The Resource Lifecycle Services provide the means to manage resources throughout their entire lifecycle, from development to decommissioning. The Information Resource Repository service can store the content of information resources, besides describing their attributes and associations. The Resource Catalog service projects resources to the environment.
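A registry along the lines described, holding resource descriptions and cross-referencing associations, might be sketched as below. The (subject, predicate, object) form of associations and all names here are illustrative assumptions, not the actual service interface.

```python
# Hypothetical sketch of a resource registry with descriptions and
# cross-referencing associations.

class ResourceRegistry:
    def __init__(self):
        self._resources = {}      # resource id -> description dict
        self._associations = []   # (subject id, predicate, object id)

    def register(self, res_id, description):
        self._resources[res_id] = description

    def associate(self, subject, predicate, obj):
        self._associations.append((subject, predicate, obj))

    def find_by(self, **criteria):
        # Find resources whose descriptions match all given attributes.
        return [rid for rid, desc in self._resources.items()
                if all(desc.get(k) == v for k, v in criteria.items())]

    def associated(self, subject, predicate):
        return [o for s, p, o in self._associations
                if s == subject and p == predicate]
```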

Figure 1. Resource Management services (OV-2)

Domain Models

Resource Types

Figure 2 shows a resource type hierarchy with selected resource types.

Figure 2. Resource type hierarchy (OV-7)

Resource Descriptions and Registry

A Repository encapsulates an object life cycle transition to and from persistent store. It represents all objects of a certain type as a conceptual set (usually simulated). It acts like a collection, except with more elaborate querying capability. Objects of the appropriate type are added and removed, and the machinery behind the Repository inserts them or deletes them from the persistent store. It provides the illusion of an in-memory collection of all objects of that type.

A subset of persistent objects has to be globally accessible through a search based on object attributes, mostly represented by metadata and/or data. A repository provides methods to select objects based on some criteria, and returns fully instantiated objects or collections of objects whose attribute values meet the criteria, thereby encapsulating the actual storage and query technology. The goal is to hide all this from the client objects so that client code will be the same whether the data are stored in an object database, a relational database, or simply held in memory. The Repository will delegate to the appropriate infrastructure services to get the job done. Encapsulating the mechanisms of storage, retrieval and query is the most basic feature of a Repository implementation.

A Repository framework can be built that allows more flexible queries. An example of this is the specification-based query (Figure 4). A Specification mapped into Metadata is a way of allowing a client to describe (specify) what it wants without concern for how it will be obtained. The repository delegates reconstitution (creation of a new object instance from stored data) of existing objects from persistent store to the Factory.
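The Repository and Specification roles described above can be sketched as follows. The class names are generic illustrations; a real implementation would delegate storage and query to infrastructure services rather than an in-memory dict.

```python
# Sketch of the Repository pattern with a specification-based query.

class Specification:
    """Describes what the client wants without saying how to obtain it."""
    def __init__(self, predicate):
        self._predicate = predicate
    def is_satisfied_by(self, obj):
        return self._predicate(obj)

class InMemoryRepository:
    """Presents the illusion of an in-memory collection of all objects of
    one type; storage and query mechanisms stay hidden from the client."""
    def __init__(self):
        self._store = {}
    def add(self, obj_id, obj):
        self._store[obj_id] = obj
    def remove(self, obj_id):
        del self._store[obj_id]
    def get(self, obj_id):
        return self._store[obj_id]
    def select(self, spec):
        return [o for o in self._store.values() if spec.is_satisfied_by(o)]
```

Client code written against this interface stays the same whether the data live in an object database, a relational database, or memory, which is the point of the pattern.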

Figure 4. Repository Model (OV-7)

See more

Resource Life Cycle

All ION resources follow the same basic life cycle. Behavior is conditional to the current life cycle state. Specific types of resources can refine the common resource life cycle and define resource specific behavior. For instance, instruments go through the same life cycle as data products generated from an instrument, but the behavior is quite different. For details see CIAD COI OV Resource Lifecycle

Taskable Resources: Resource Agents

Resources that are external to the ION system and that have internal state and potentially behavior are classified as Taskable Resources. The CEI subsystem manages this class of resources. The software processes that actively represent resources in the system are resource agents. They act on behalf of a governing facility and the owner of the resource. Resource agents are aware of the life cycle of the resources; they monitor the internal state of the resource and present it to the ION system uniformly. Agents also accept commands to manipulate the resource. The governance part of resource agents keeps track of contractual relations of the resource with the governing facility and any stakeholder individuals. Such contracts may influence resource access policy.

CEI Taskable Resource Management
Resource Agents
Resource Agent Interactions

CIAD COI OV Resource Lifecycle

The Resource Lifecycle defines a set of states for a Resource, exposed as Services provided by the Resource. A Resource changes its state based on a Strategy governed by an Actor. Strategies affect the state of a Resource and may have dependencies; for instance, a resource must be commissioned before being ready for decommissioning. Depending on the identity of the Actor and its rights under the policies established by the Owner, Operator/Maintainer or other authorized Actor, the set of Strategies available for that Actor is a subset of the overall set of Strategies. Consequently, the available set of state transitions that may be triggered by the Actor is a subset of the available states of the Resource Lifecycle.
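The actor-dependent subset of strategies can be illustrated with a small sketch. The role names, strategy names, and the single dependency rule here are hypothetical, chosen to mirror the commissioning example in the text.

```python
# Illustrative sketch: the strategies available to an Actor are the subset
# its role is permitted to trigger, further limited by dependencies.

STRATEGY_POLICY = {
    "owner":    {"commission", "decommission", "activate", "deactivate"},
    "operator": {"activate", "deactivate"},
    "user":     {"acquire", "release"},
}

def available_strategies(actor_role, resource_state):
    allowed = STRATEGY_POLICY.get(actor_role, set())
    # Dependency from the text: a resource must be commissioned before
    # it is ready for decommissioning.
    if resource_state != "commissioned":
        allowed = allowed - {"decommission"}
    return allowed
```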

Resource Life-Cycle Model (OV-7)

Resource Lifecycle Activities

The Develop Activity
The Register Activity
The Document Activity
The Commission Activity
The Activate Activity
The Announce Activity
The Discover Activity
The Acquire Activity
The Associate Activity
The Govern Activity
The Manage Activity
The Use Activity
The Release Activity
The Deactivate Activity
The Decommission Activity

The lifecycle for an OOI resource is shown in Figure 1 and can apply to any resource: the Resource Lifecycle drawing can be reinterpreted, or reused, for any OOI resource, and each OOI resource follows its own instance of the lifecycle.

Figure 1. 2940-00027 OOI Resource Lifecycle Model (OV-5)

Figure 1. 2940-00026 OOI Resource Lifecycle Activities (OV-5)

The lifecycle may be expressed in UML statechart notation that facilitates describing the abstract states of the resource and the transitions among them. The states are shown at a high level of abstraction, and ignore the low level details of a resource. Several of the activities described in this document cause a state change in the underlying resource. Other activities do not cause state changes at the specified level of abstraction.

A resource begins in the initial state, where notionally it does not exist as far as the CI is concerned. Through the develop activity, the resource is developed but is initially not commissioned. At this point a resource may be registered; the Register activity does not alter the state of the resource. The commission activity takes the resource to a state where it is commissioned - in essence, the commissioned state is a sub-state of the developed state. Much of a resource's useful lifetime is spent in the developed and commissioned state. This is a composite state, meaning that it has two concurrent sub-state regions. A resource enters this composite state in the inactive and not acquired sub-states.

A resource may be associated or announced by a user when the resource is developed and commissioned; these activities don't alter the sub-state of the resource.

The activate and deactivate functions cycle a resource between the inactive and active states. We can imagine these activities as being in the purview of the operator of the resource.

The acquire and release activities cycle a resource between the acquired and not acquired states. We can imagine these activities as being in the purview of a user of the resource. There is meant to be one copy of the not acquired - acquired cycle for each user of the resource.

A user may utilize a resource when he or she has acquired it and the resource is active. Using a resource has no effect on these states. When a resource is inactive and not acquired, the operator may decommission it. The condition for the decommission transition is that the resource has moved to the not acquired state for all users.

Note that all transitions are made autonomously by the parties involved; for example, the lifecycle doesn't require that a resource be decommissioned merely because it happens to be inactive and not acquired by any user.
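The statechart described above can be sketched in code. This is a minimal illustration, not the normative OV-7 model: the state names, the guard on decommissioning, and the per-user acquire cycle are simplifications assumed for the example.

```python
# Minimal sketch of the Resource Lifecycle statechart (illustrative only).
class Resource:
    def __init__(self):
        self.state = "initial"       # notionally does not yet exist for the CI
        self.commissioned = False    # sub-state of "developed"
        self.active = False          # inactive/active concurrent region
        self.acquired_by = set()     # one not-acquired/acquired cycle per user

    def develop(self):
        assert self.state == "initial"
        self.state = "developed"

    def register(self):
        # Register does not alter the state of the resource.
        assert self.state != "initial"

    def commission(self):
        # Enters the composite state in the inactive / not-acquired sub-states.
        assert self.state == "developed"
        self.commissioned = True

    def activate(self):
        assert self.commissioned and not self.active
        self.active = True

    def deactivate(self):
        assert self.active
        self.active = False

    def acquire(self, user):
        assert self.commissioned
        self.acquired_by.add(user)

    def release(self, user):
        self.acquired_by.discard(user)

    def use(self, user):
        # A user may utilize a resource only when it is active and acquired.
        assert self.active and user in self.acquired_by

    def decommission(self):
        # Guard: inactive and not acquired by any user.
        assert not self.active and not self.acquired_by
        self.state = "decommissioned"
        self.commissioned = False
```

A full pass through the lifecycle then reads develop, register, commission, activate, acquire, use, release, deactivate, decommission.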

The following are example policies that the participants might apply in order to carry out the various activities specified elsewhere. Whatever policies the operators of a resource select and advertise would be handled through the Policy and Governance mechanism.

Deactivate whenever. This policy maximizes autonomy for the provider but would potentially disrupt the users of the resource.
Deactivate only if not currently acquired by a user. This policy would be friendlier to users, but might prevent a provider from deactivating a resource if a user decides not to release it.
Deactivate based on fulfillment of previous agreements. This policy would generally be the most equitable for the parties involved.

The following activity diagrams show the activities involving the logical entities for Policy and Governance followed by a description of the activities included as transitions in Figure 1. In each activity diagram, the names on the left are the Actors abstracted into roles.

The sequence of actions within the diagrams follows the arrows, and is largely from left to right. Each diagram begins at the initial node, represented by a large dot, and finishes at a final node, an encircled large dot. Rectangles represent the actions needed to perform each activity, while diamonds represent decisions. A diamond may be preceded by an action node to indicate the action needed to perform the reasoning for the decision. The actions to perform each activity (boxes) are usually presented within a row, or 'swim lane'. The swim lanes represent actor responsibilities: when an action (box) is within a swim lane, the actor named on the left side of that swim lane has some responsibility for the action.

To enable sound composition of the activities, we distinguish between two main modes of termination of an activity. A final node shown with a solid (and green) circle indicates successful termination, whereas a final node shown with a dashed (and red) circle indicates failure. When activities are composed, the normal flow of control results in the successful termination of the composed activities. Following common programming language practice, we conceptually treat failures as exceptions. An activity can include decision nodes that "throw" an exception; an activity may handle or "catch" an exception thrown by one of its component activities, and by default, an activity forwards any exception thrown by its component activities up to the activities of which it is itself a component.
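The two termination modes and the default forwarding of failures can be illustrated with a short sketch. The activity names and the context dictionary are hypothetical; failure is modeled directly as an exception class, as the text suggests.

```python
# Sketch of activity composition: success is a normal return, failure is an
# exception that propagates upward unless a composing activity catches it.
class ActivityFailure(Exception):
    """The failed (dashed, red) final node of an activity."""

def compose(*activities):
    # Normal flow of control: run each component; any ActivityFailure thrown
    # by a component is forwarded up to the composing activity's caller.
    def composed(context):
        for act in activities:
            act(context)
        return context
    return composed

# Hypothetical component activities operating on a shared context.
def register(ctx):
    if not ctx.get("characterization"):
        raise ActivityFailure("certification error")
    ctx["registered"] = True

def announce(ctx):
    ctx["announced"] = True

lifecycle_fragment = compose(register, announce)
```

Calling `lifecycle_fragment` with a valid characterization terminates successfully; with an empty context, the certification failure thrown inside `register` surfaces to the caller.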

The responsibilities in these diagrams are not rigorously allocated. Any action that changes a resource is expected to Document the change. However, Document is treated as cross-cutting in the sense of Aspect-Oriented Programming. Thus, Document is not shown explicitly, but a suitable implementation of Document can be assumed to apply concurrently to any desired activity.

As mentioned above, the activity diagrams collected here are meant to inform the design of the OOI CI; in particular, the CI sub-projects are expected to take these diagrams as the starting point for the designs devised to implement the corresponding activities. This will lead to further refinement and additions to this document throughout the duration of the OOI CI development project, and possibly beyond.

All activities from Figure 1 are analyzed from the perspective of the COI use case "is the actor allowed to do the activity the actor attempts" or "must the actor perform some activity" or "may the actor perform some activity." To be allowed in this context means to be authenticated and to be authorized to do a particular activity. This use case is a generic one applying to all activities. Further analyses of the activities can bring additional COI use cases specific to a particular activity. For example, to be allowed to do an activity in a chain of activities, the previous activities have to be accomplished.

The Develop Activity

Any resource requires some action to come into existence. Broadly considered, the action can be called development. The figure below shows the Develop activity. The initiator of the activity is considered the Visionary, who conceives of the resource and takes responsibility for bringing it into existence. (These roles, of conception and instantiation, could be separated of course.) The Engineer represents any collaborator who helps create the resource; often this is a specialist who understands the tools necessary to build the resource, but it could also include operators, potential users, and other team members.

Together the Visionary and Engineer create the requirements for the resource, incorporating interface and policy requirements from the Operator of the resource. In our case, the resource Operator role is typically delegated by the OOI project to some organization or entity. Possibly influenced by an Administrator (who oversees budgets, responsibilities, and so on), the Visionary and Engineer decide how to proceed on the project. Funding may come from the Operator (possibly a different OOI representative) or an Administrative presence, or be obtained in some other way.

Given funding, the developers (Providers) design, implement, and test their resource. For some resources, the test action may be skipped. In a well-organized system, the Operator will have a test harness available, against which all resources can be tested to verify appropriate interactions with the system. Using such a test harness and their own testing processes, the developers determine whether the resource is ready for deployment. If it is not, they must return to the Design/Implement step, but if it is ready, they have completed this activity.

Figure 2: Develop Activity (OV-5)

Note that the Develop activity does not include the steps to make the resource an active part of the OOI system (Commission and Activate); these steps require more active involvement with the Operator to confirm the performance of the system. The step to Register a new resource is shown separately and is usually performed in conjunction with Develop. Specific examples of the Develop activity include developing an instrument, developing a laboratory (a virtual space where collaborators can share information and run applications), or developing a software application like a model or catalog.

The flavor of testing may be different in each of these cases, but all have interactions with the OOI system (defined by the Operator) and all are similar in their progression through the activity.

Provider (Visionary, Engineer) specifies OOI Interface and Policy Requirements to Operator
Operator presents Interface and Policy Requirements

The Register Activity

A resource may be registered (Figure 3) any time after it has been developed. The entity that registers the resource is typically the resource provider, but it is possible for that activity to be delegated; we call the actor that registers the resource the Registrant.

Figure 3. Register Activity (OV-5)

The first step of the registration process is for the Registrant to characterize the resource. The characterization is submitted as a part of the registration. To be accepted, the characterization must include the information required by OOI's registration interface. When the registration information is submitted to the Registrar, that component must certify that the information meets the minimum criteria. If the criteria are not met, an error is returned.

Given satisfactory information, the infrastructure will save the registration in the catalog (a specific case of an Association Catalog) and perform the necessary actions to make it routinely accessible by Catalog users. If certain conditions apply, the registration may also be replicated to one or more external catalogs that will undertake their own Register action.

Under any circumstances, the registration information will be distributed as a published output of the catalog, so that interested parties may be automatically notified of the addition to the catalog. A common resource to be registered will be data, whether they are presented as constantly updated streams or individual data products. The Publish mechanism will allow users to see new data of interest to them.

Other important registered resources are data sources, including instruments and software processes. Once a data source (or any other OOI resource) is registered, its description can be made available to anything that relates to it, making a rich web of associations accessible in appropriate contexts. In fact, the rich web of associations can be applied to any kind of OOI resource, including people, organizations, system processes, virtual collaboration environments, and research information.

The process of registering goes as follows:

Registrar certifies the submitted Resource registration or raises an error
Registrar requests that the Resource registration be saved in the Registration catalog
Registrar checks policy to decide whether the resource needs to be advertised to an external catalog
Registrar Catalog publishes the Resource Registration
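The registration steps above could be sketched as follows. The minimum criteria, catalog representation, policy callback, and publish hook are illustrative assumptions, not the actual Registrar interface.

```python
# Sketch of the Register flow: certify, save, check advertise policy, publish.
REQUIRED_FIELDS = {"name", "type", "provider"}   # assumed minimum criteria

def register_resource(registration, catalog, subscribers,
                      external_catalog=None, advertise_policy=lambda r: False):
    # 1. Certify the submitted registration or raise an error.
    missing = REQUIRED_FIELDS - registration.keys()
    if missing:
        raise ValueError("registration rejected, missing: %s" % sorted(missing))
    # 2. Save the registration in the Registration catalog.
    catalog.append(registration)
    # 3. Check policy: advertise to an external catalog if required.
    if advertise_policy(registration) and external_catalog is not None:
        external_catalog.append(registration)
    # 4. Publish the registration so interested parties are notified.
    for notify in subscribers:
        notify(registration)
    return registration
```

A rejected submission raises before anything is saved, matching the certify-first ordering of the steps.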

The Document Activity

Documenting a piece of information is very much like registering information; many of the same actions must take place. This activity is used throughout the system to keep track of the various events that take place. The comprehensive list of events can then be used as yet another resource in the system, and associated with other events or data that have been collected:

Infrastructure certifies the received documentation request or raises a certification error
Infrastructure saves the information to the catalog
Infrastructure publishes the information

When an information source submits something for documentation, the infrastructure must perform the basic tests done for any input, making sure that the provided information follows the defined interface. If it does not, an error is logged. Once valid information has been accepted, the information is represented in the catalog of associations maintained by the infrastructure, and presented to interested parties.

The use of Document within this activity is effectively a recursive reference to the most basic version of the activity. When the Associations Catalog documents a piece of information, it simply stores it in its database in the appropriate form. Document is the simplest type of association made by the system, and generally refers to capturing a fact or piece of information. In many cases it is equivalent to logging of errors, status, and other information performed by any other software system. On OOI, Document also provides a means to store routine transactions of the OOI system, like deployments, status messages, and reports.

Figure 4: Document Activity (OV-5)

The Commission Activity

Commissioning includes the steps necessary to confirm that a resource is ready to use for the first time in the OOI system. It does not include the final steps involved in using a resource, which may have to be repeated each time the resource is used. Commission is expected to be performed only once for a resource during its lifecycle, unless the resource has been marked unfit for use (decommissioned) and then refurbished or otherwise made fit again. The Commission activity, therefore, provides final confirmation of fitness from the standpoint of the observing systems.

The commission activity starts off with a check on the registration status of the resource; if the resource has not yet been registered, registration is carried out here. Once registered, an OOI Test Facility may inspect the resource and/or connect it (and any corresponding physical entity) to a test OOI network, to confirm that the resource complies with all of the interface requirements. Some of these requirements may be physical, while many others will be centered on the cyber-infrastructure. The latter include interface protocols (networking and application), responsiveness to standard OOI interactions, and consistency of the resource outputs with the metadata description provided during the Register activity. For many Ethernet-centric resources, the connection may be made at the Provider's site directly to an Ethernet network. For other physical entities, an appropriate OOI test harness may be shipped to the Provider's location. Like most OOI facilities, OOI Test Facilities can be located throughout the country, as well as on mobile platforms.

Once certified as OOI-compliant, the resource (and any corresponding physical entity) may need to be deployed. Most physical entities on the OOI system will incorporate their cyber-representation, either within the physical chassis, or in an attached adapter. An associated resource deployment may also need to be coordinated. As part of deployment, the Deployment Engineer will verify that the deployed system is interfacing with the OOI system as designed, and then notify the resource provider that the system is ready for validation. After the provider Validates that the system is working as intended, OOI can assert that the resource has been commissioned.

Three examples - software, an instrument or AUV, and a virtual laboratory - demonstrate the Commission activity. First, given a software application that will publish data through OOI's cyber infrastructure, an automated test harness could receive data from the software, validate it against the registered metadata, and issue a certificate of compliance. The software can be deployed using a web-based tool, and again automated systems detect that it is interfacing correctly to OOI. Finally, the software developer checks a box indicating correct operation to complete the commissioning process.

An instrument or AUV may require testing with a particular physical harness, and possibly a visual inspection before deployment, but its virtual presence might be tested against OOI requirements in much the same way (possibly with more sophisticated tests for the AUV). On the other hand, the instrument or AUV's physical deployment must be conducted at sea by a Deployment Engineer, and possibly coordinated with operational staff in a control room.

Finally, a virtual laboratory is created using OOI tools and processes - with interfacing possibilities beyond the boundaries of the OOI. The registration, certification, deployment, and verification steps will be supported by the CI implementation; the Validate step lets the originator of the laboratory confirm that it appears to be working normally. The assertion that the laboratory has been commissioned is then but a single logic statement. In each of these cases, the resource has been warranted for use, but has not been enabled on the OOI system. This occurs during the Activate activity.

Figure 5: Commission Activity (OV-5)

The Activate Activity

Activate (Figure 6) enables the resource for use within OOI. It is a routine operation associated with the resource becoming available for use, and then possibly unavailable at other times. Activation of a resource might be requested by any OOI actor; some resources can become active upon request, while others require manual intervention (e.g., for physical deployment), and others can only become active under certain conditions. Hence, an activation request is first checked for proper authorization and against any such policies.

During enabling, the resource is checked out and made ready to operate. Other entities are now in a position to use the resource, assuming they have the necessary authorizations:

Operator receives a request to activate, which is checked against the proper authorization policy
Operator enables the resource

Figure 6: Activate Activity (OV-5)

Returning to our earlier examples, software applications may use activate in different models. Some software will be active and available at all times (either because it is running all the time, or because it is started instantly when needed), while other software may only be active when it is actually executing, or when an appropriate condition is met. For example, instant messaging, chat, or telephony software may be running, but not make the node available for interaction until the status is set to "Available."

Similarly, instruments may be turned off and on, in which case Activate/Deactivate could be useful activities, or they may be turned on and left on for their entire operational life, which implies activation only once. Most AUVs are activated for each mission/deployment and deactivated when the mission ends. A mixed mode of deployment may occur, with an instrument deployed multiple times, and turned on and off during each deployment; the proper application of Activate will have to be determined appropriately for the given scenario.

Finally, a virtual laboratory is essentially activated as soon as the principal user (the "Provider" in these diagrams) acknowledges the functionality of the laboratory. Given user expectations for response times and usability, computational entities can and should be configured to become active as quickly as possible once the necessary conditions are met. Whether the resource is active or not, most people will not know of its existence until the knowledge of its availability is promoted. This occurs as a result of the Announce activity (which does not depend on the results of Activate).

The Announce Activity

The process of announcing that a resource is available for operations is similar to the process of registering the resource in the first place (see Figure 7). The principal difference is that the resource has already been defined, and so the announcement amounts to a message with the properly formatted resource identification, together with the indication that this resource is now operationally available.

Figure 7: Announce Activity (OV-5)

While the process is similar to Register, and is not very complex (and so is not reviewed here), the outcome is typically very different:

Infrastructure certifies the submitted Announcement or raises an error
Infrastructure saves the Announcement in the Association catalog
Infrastructure decides whether to advertise the Resource Registration in an External Catalog
Infrastructure publishes the Announcement

For almost all OOI resources, the compelling stage of the resource's lifecycle is when the resource is available for use. Until then, the resource is not routinely discoverable or usable. Once the resource is announced, it becomes visible to all the potential users, including other processes, and forms part of the resource fabric of OOI. Since resources include data sources, services, and all of the data streams and data products of OOI, the announcement of availability is a critical milestone for the resource within the OOI.

As with registration, the announcement process may include a provision to advertise the announcement, meaning to notify external entities of the announcement. Whether or not this occurs, the announcement is published to OOI subscribers, making it widely known to interested parties in the OOI community. Announce, and the companion activity Discover, are particularly interesting to consider for the range of resources.

The resources most typically announced in existing systems are Informational-Data-Representations, in particular observations, measurements, and models. Reports and Documentation are also potentially of interest to all OOI users. For example, scientific papers relating to a particular data set would be of great interest to the provider of that data set. Definitions (software, schema, profiles, and so on) are less broadly announced in practice today (with the exception of technical definitions, such as Web service interface specifications using the Web Services Description Language - WSDL), but are important to a certain class of users.

The associations established by the Announce activity include not just routinely collected metadata about data sets, but also user commentary and automated evaluations. Users of data sets will find it quite valuable to monitor announcements of associations about data sets of interest to see if anyone else has commented upon or cited them. Here again, the structural associations information (such as how the associations are stored or transported over a connection) may be only of interest to a more technical subset of users.

Functional resources have two major categories that are very compelling in this context: Environmental resources and certain Structural resources. The value in monitoring announcements of new platforms, nodes, or instruments is obvious, especially to the degree the announcements include key properties such as the variables measured by the entity. Also interesting will be the ability to monitor an environmental resource such as an acoustic range, so as to reserve it for monitoring or for experimentation (e.g., emitting an acoustic signal in this range within a given time window). By representing these physical entities as OOI resources, the CI can perform many functions - planning and scheduling, observing, logging, and so on - that relate the physical entity to other OOI data sets and data relationships.

Although the resources listed in the Appendix are not comprehensive, there is an obvious range of applications for Announcing their availability for use, and in other ways making them a working part of the OOI CyberInfrastructure that can be accessed by all of the users of the system. Of course, to make use of the available resources, OOI users will require robust discovery mechanisms.

The Discover Activity

Discovery has two modes, intentional and accidental. The former is aided by effective structural aids to searching and browsing; the latter is aided by an organization of information resources that makes discoveries of interest likely (for example, by creating "you may also be interested in" links in the viewed environment, as online stores frequently do).

Discovery can be directed by a human searcher, by a computational searcher, acting on behalf of a human searcher, or by an event-driven sequence of processing that provides information to a user who has registered an interest. The last scenario, which can be called Discovery by Subscription, is not outlined explicitly in this diagram. The capability derives from the services described elsewhere in the Conceptual Architecture. The Discovery by Subscription scenario can be viewed as either an intentional search, set up by the user in advance, but with a long timeline for the action "Format Query, Receive Results, Present Results"; or as a serendipitous event, in which information of interest just appears to the user (e.g., in email).
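The Discovery by Subscription scenario can be illustrated as a simple publish/subscribe matcher: a user registers an interest up front, and matching resources appear to the user when they are announced. The predicate-based interface is an assumed simplification of the services described in the Conceptual Architecture.

```python
# Sketch of Discovery by Subscription: interests registered in advance,
# matching announcements delivered to the subscriber as they arrive.
class DiscoveryBySubscription:
    def __init__(self):
        self.subscriptions = []   # (predicate, deliver) pairs

    def register_interest(self, predicate, deliver):
        # The user sets up the "query" in advance with a long timeline.
        self.subscriptions.append((predicate, deliver))

    def announce(self, resource):
        for predicate, deliver in self.subscriptions:
            if predicate(resource):
                deliver(resource)   # information "just appears" to the user
```

A subscriber interested in temperature data, for instance, would see only announcements whose description matches that interest.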

The complexity of the Discover activity derives from its attempt to capture complex approaches to the problem of finding something. In general, there are two parts to finding something: choosing whether to navigate (typically via the web) or search (entering strings into a search engine of some sort), and choosing the tool or tools that will yield the best result. As an example, search engines usually combine both approaches, first allowing a crude search specification, then allowing the user to navigate from many choices.

A portal or tool may, in its user interface, present multiple options for discovery. These may leverage interfaces provided by OOI or other catalogs, each of which may have its own interface, and some of which may present that interface automatically as a computer-friendly description of the available services. The selected User Interface must present the necessary capabilities to the Searcher, and then execute the chosen function and return the results.

Whether the approach is browsing (navigating web sites) or searching with a query, the Searcher will evaluate the results, and may iterate on the process in a number of different ways until acceptable results are returned.

Just as the preceding section described implications given a wide variety of resource types (see Appendix), discovery leverages the same variety while providing common approaches. All of the different resource types used in OOI can be maintained in a single repository and found using a single catalog, thereby leveraging the investment in those components. Different user interfaces can be built that present only resources of particular types, allowing the same level of customization which is possible with a more restricted catalog.

The process from the point of view of the COI is the following:

Registrar certifies the received Discovery request or raises a certification error
Registrar requests Discovery by the Associations catalog
In case of error, the Registrar requests publishing of the raised certification error by the System Error Logger
System Error Logger requests documentation of the raised certification error by the Associations catalog
Associations Catalog documents the result of the submitted Discovery or the raised certification error
Associations Catalog publishes the result of the submitted Discovery

The Acquire Activity

Having found a resource that is interesting, an OOI member may want to use it. The OOI infrastructure must mediate all uses of resources, making sure that any conflicts are resolved and only authorized actors are allowed to perform a given function on a given resource.

The first step in utilizing a resource is acquiring permission to use it. To do so, there are three key criteria that must be met: the Requester must be authorized to use the resource, the usage must not create a conflict for that resource, and any terms of usage must be met. The cyberinfrastructure must act as an honest broker, negotiating an agreement between a resource provider (which may be the OOI itself, or an individual with an asset) and the resource requester. The acquisition is successfully completed when the requester has obtained access to the resource, and performed the associated action(s) called for in the acquisition agreement.
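The three acquisition criteria can be sketched as a broker check. The authorization set, holder map, and terms callback are hypothetical placeholders for the Policy and Governance mechanisms described elsewhere.

```python
# Sketch of the Acquire check: authorization, conflict freedom, terms of use.
def acquire(requester, resource, authorized, holders, terms_met):
    """Grant exclusive acquisition of `resource` to `requester` if the
    requester is authorized, no conflicting holder exists, and the terms
    of usage are met; otherwise report why the acquisition failed."""
    if requester not in authorized:
        return False, "not authorized"
    if holders.get(resource):          # e.g., an instrument: one user at a time
        return False, "conflict: held by %s" % holders[resource]
    if not terms_met(requester, resource):
        return False, "terms not met"
    holders[resource] = requester
    return True, "acquired"
```

The exclusive-holder map reflects resources like instruments that can only be acquired by one requester at a time; shareable resources would need a different conflict rule.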

The meaning of Acquire will differ significantly for different resource types. In the case of data streams and data sets, acquiring the data means getting permission to use them. Sometimes the data may already be accessible, and the Acquire activity is a formality that connects the requester to the data provider. If the resource is an instrument, Acquire means that the requester wants to use the instrument, which implies making changes to it to achieve an end. If the requester only wants to use data from the instrument, subscribing to its data stream is sufficient (Figure 8).

Figure 8. Acquire Activity (OV-5)

Obviously, many resources (one example being instruments) can only be acquired by one requester at a time, and so mediating requests will be an important part of OOI's management responsibilities. Diverse resources have the same characteristic: only one benthic rover can occupy a particular small patch of seafloor, and only one software instance can own the right to produce a particular data stream at a given time.

The Acquire activity includes the Negotiate and Fulfill Agreement activities (Figure 9 and Figure 10). The Negotiate activity can be described abstractly as an exchange of offers and counter-offers between participants. A participant may initiate an offer. It may reject an incoming offer, in which case the negotiation fails. It may accept an incoming offer, in which case the negotiation succeeds. Or, it may counter, in which case the negotiation goes on.

Figure 9. Negotiate Activity (OV-5)
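The offer/counter-offer exchange can be sketched abstractly: a party's response to an incoming offer is accept (success), reject (failure), or a counter-offer (the negotiation goes on). The buyer/seller strategies below are purely illustrative.

```python
# Abstract sketch of the Negotiate activity as an alternating exchange.
def negotiate(initial_offer, party_a, party_b, max_rounds=10):
    """party_a initiates with initial_offer; each party responds in turn
    with "accept", "reject", or ("counter", new_offer)."""
    offer, parties = initial_offer, [party_b, party_a]
    for round_no in range(max_rounds):
        decision = parties[round_no % 2](offer)
        if decision == "accept":
            return True, offer        # negotiation succeeds
        if decision == "reject":
            return False, offer       # negotiation fails
        _, offer = decision           # counter-offer: negotiation goes on
    return False, offer               # no agreement within the round limit

# Illustrative strategies: a seller with a floor price, a buyer with a ceiling.
def seller(offer):
    return "accept" if offer >= 80 else ("counter", offer + 10)

def buyer(offer):
    return "accept" if offer <= 90 else "reject"
```

With these strategies, a buyer opening at 50 is countered upward until one side's threshold is reached.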

Figure 10. Fulfill Activity (OV-5)

In abstract terms, there are two ways to fulfill an agreement. Each party may carry out the terms of the (presumably negotiated) agreement. If each party accepts the performance of the other, the agreement is considered fulfilled (successfully). If either party rejects the other's performance, the fulfill activity fails.

Another way in which fulfill may proceed involves the use of a trusted third party as an escrow agency (Figure 11). There are many ways to conceptualize escrow interactions. We describe one that is perfectly symmetric. Each party submits its results to the escrow agency. The escrow agency in turn forwards each result to the other party for evaluation. If both parties accept the results, the escrow agency declares success and completes the business transaction. If one or more of the parties reject the results provided by the other, the escrow agency declares failure and rolls back or unwinds the business transaction.

Figure 11. Escrow Activity (OV-5)
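The symmetric escrow variant can be sketched as follows; the evaluation callbacks stand in for each party's acceptance decision and are illustrative.

```python
# Sketch of escrow-mediated fulfillment: each party submits its result to
# the escrow agency, which forwards it to the other party for evaluation.
def escrow_fulfill(result_a, result_b, accept_a, accept_b):
    """accept_a evaluates party B's result; accept_b evaluates party A's.
    The escrow agency declares success only if both parties accept;
    otherwise it declares failure and the transaction is rolled back."""
    a_ok = accept_a(result_b)   # escrow forwards B's result to A
    b_ok = accept_b(result_a)   # escrow forwards A's result to B
    return a_ok and b_ok
```

The symmetry means neither party's result is released unilaterally; a single rejection unwinds the whole business transaction.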

Other resources can be shared or allocated to many requesters. Any number of participants may belong to a laboratory once permission is received from the laboratory 'provider.' Bandwidth for transmitting commands to a deployed system may be shared among multiple users, although the total bandwidth used must not exceed the amount available. Defining the policies and procedures for resource acquisition will be an important part of the creation of the cyberinfrastructure.

The Associate Activity

The Associate activity is very similar in appearance to Register and Announce, but it is in fact a superset of those activities, and is key to operation of the OOI system. This activity performs the operation of relating two resources or characterizing a resource. The significance of this

activity is deferred to the end of this Activity description; for now we describe the process of making an association. Assuming the necessary resource(s) and descriptive information are in hand, the entity describing the association (the 'Definer') must formulate the relationship appropriately for the infrastructure. The Definer submits this formulation, which is accepted in the same way a registration or other announcement is made. If the service that receives such submissions deems the association correctly expressed, it is entered into the Associations Catalog (of which the Registrations Catalog is but one view), and the same progression occurs as for a registration or announcement.

The significance of this activity for OOI and its resources is more apparent when one considers the fact that data, and many other resource types, becomes enriched over time by information that relates to it. Over the course of a resource's life, many references are made to it. In fact there are many types of associations that can be made to resources, as seen in the following table that illustrates just a few. The combination of all of the associations in OOI makes up an Association Fabric that consists of all of the relationship assertions defined in OOI. The OOI capabilities will be driven in large part by the infrastructure's ability to organize and derive value from the associations in the Association Fabric.

Descriptor       Association       Resource
lat/lon/depth    locates           observation
quality          characterizes     observation
comment          characterizes     observation

Resource         Association       Resource
process          cites use of      observation
event            corresponds to    observation
user             is a member of    facility
facility         provides          service
service          is governed by    policy
observation      routes to         facility
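The assertions in the table can be pictured as (subject, predicate, object) triples; the sketch below shows how an Association Fabric built from such triples can be queried from either end (the triple representation and names are assumptions for illustration, not the COI implementation):

```python
# A tiny Association Fabric: the set of all relationship assertions,
# each a (subject, predicate, object) triple drawn from the table above.
associations = {
    ('lat/lon/depth', 'locates', 'observation'),
    ('quality', 'characterizes', 'observation'),
    ('process', 'cites use of', 'observation'),
    ('user', 'is a member of', 'facility'),
    ('service', 'is governed by', 'policy'),
}

def related_to(resource):
    """All assertions that mention a resource on either side."""
    return {a for a in associations if resource in (a[0], a[2])}
```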

Figure 12: Associate Activity (OV-5)

The process of Association proceeds as follows:

1. The Association Service certifies the submitted Association or raises an error.
2. The Association Service requests documentation of the Association in the Association Catalog.
3. The Association Catalog records the Resource Association in the External Catalog.
4. The Association Catalog publishes the Association.
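A minimal sketch of this submission flow, with in-memory lists standing in for the Association Catalog and External Catalog (all names are illustrative, not the real service interfaces):

```python
# Certify the submitted association, document it in the Association
# Catalog, record it in the External Catalog, then publish it; an
# ill-formed submission raises an error instead.
association_catalog, external_catalog, published = [], [], []

def submit_association(assoc, well_formed):
    if not well_formed(assoc):
        raise ValueError('association not correctly expressed')
    association_catalog.append(assoc)   # document in Association Catalog
    external_catalog.append(assoc)      # record in External Catalog
    published.append(assoc)             # publish the Association
    return 'OK'
```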

Figure 38: Observation Lifecycle Associations (OV-5)

The Govern Activity

The Govern activity diagram in Figure 13 incorporates two linked activities. One of these is from the perspective of the Governor, and the other from the perspective of the governed, termed here the Consumer.

The Governor sets policies, and presents them in a form that the Infrastructure can work with. The Consumer seeks to take an action that is constrained by those same policies. The Governor determines whether the Consumer has the rights to take the appropriate action by granting general or specific access rights and allocations to the Consumer.

These actions set up the necessary conditions for the Infrastructure to actually enforce the policies. This enforcement occurs at each stage of activity involving various actions taken by the infrastructure. Governance continues until the Consumer is no longer taking actions which are governed. In theory, governance applies to all resource types, although in many cases a default policy may be the only one that needs to be enforced.

An advantage of applying governance as an ongoing activity, enforced by configurable rules, is that changes to policy should not require changes to the infrastructure. This assumes that the rules are computable, i.e., expressible in a form the computer can evaluate. In this sense, the Govern activity is one example of a cross-cutting activity, alongside failure management, logging, and encryption/decryption:

The Infrastructure enforces the received Policies, Rights, and Allocations. The Infrastructure audits the enforcement of the received Policies, Rights, and Allocations.

Figure 13: Govern Activity (OV-5)
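Treating policies as computable rules can be sketched as below; the rule representation is an assumption made for illustration (real COI policy enforcement is service-based):

```python
# Policies are configurable rules; changing policy means changing rules,
# not infrastructure. Each rule maps (consumer, action) to allow/deny.
policies = []

def set_policy(rule):                  # Governor sets a policy
    policies.append(rule)

def enforce(consumer, action):         # Infrastructure enforces at each stage
    return all(rule(consumer, action) for rule in policies)

# Governor grants command rights only to specific consumers:
set_policy(lambda c, a: a != 'command' or c in {'owner', 'operator'})
```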

The Manage Activity

Just as governance continues throughout the period that a resource is available, management of the resource must also take place. The Manage activity consolidates the various functions associated with resource management (Figure 14).

Figure 14: Manage Activity (OV-5)

Unlike most of the diagrams, this one flows from top to bottom. Actions are placed in the swim lane(s) that have the most responsibility for them, but can occur in any order and combination. The actions are grouped according to their nature, but no flow is implied by this grouping.

A few of the actions in this diagram are fairly hardware-specific, in particular Refurbish and Clean. However, the same capabilities can be applied to most of the different resources, whether they are science observation-oriented or part of the infrastructure itself.

The Use Activity

Figure 15 shows the Use activity in which an actor finally utilizes a resource. Using a resource can consist of commanding it, getting information from it, or both.

Since the agreements were already fulfilled in the Acquire activity, the user proceeds to inject commands directly to the resource he/she intends to use. The Provider, however, checks for permission and other prerequisites. Depending on the agreement made when the user first acquired the resource, the Provider then executes the commands. Results are returned to the user, who can process the information and proceed to fulfill (see Figure 10 for details of the Fulfill Agreement activity) any incurred charges per the agreement. The user then checks whether he/she is done and accordingly resumes use or proceeds to release the resource.
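The command/check/execute loop described here can be outlined as follows (an invented sketch; the real interaction is mediated by resource agents and services):

```python
# The user injects commands; the provider checks permission and other
# prerequisites per the acquisition agreement, executes, and returns
# results for the user to process.
def use_resource(commands, permitted, execute):
    results = []
    for cmd in commands:
        if not permitted(cmd):            # provider-side permission check
            raise PermissionError(cmd)    # outside the agreement's terms
        results.append(execute(cmd))      # provider executes the command
    return results                        # user processes the results
```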

Resources on which command may be executed include many instruments, data sources, facilities with their components, and many components of the infrastructure itself.

Data are generated by most resources, and will probably be the most commonly used type of resource (aside from metadata associations, which will be extremely numerous). Note that most of the OOI data will be made available as a product set for non-OOI users; this diagram does not address that resource use.

Many of the OOI service resources will simply provide observation data and other information, and most will be accessible to any OOI member. These will typically be shared resources that can be used by as many members as desired.

Figure 15: Use Activity (OV-5)

The Release Activity

When a resource is no longer being used, the User is expected to release it. Optionally, the Provider may decide to withdraw the resource. The Release operation (Figure 16) serves several purposes: it makes consumable resources available to other requesters; it updates the list of users of the resource so that the system can determine who is currently using it (as when the system needs to Deactivate the resource); and it provides a record of the operational steps of the resource lifecycle.

Once the Requester and Provider have reached a fulfilled agreement, either of them may proceed to release the resource. The Negotiator's role here is comparable to that of the Title Company in a major property transaction.

Once the Negotiator confirms the Release transaction is successfully completed, the Infrastructure updates the list of users of the resource. In this way, a resource provider can learn who is currently making use of the provided resource, and the Provider or Infrastructure can notify all users of possible changes to the resource's status.

Timely execution of the Release activity is particularly critical for resources with limited access, such as high-value controllable instruments in an observatory. Monitoring agents may be written to detect when high-value resources do not appear to be in use even though the User has not released them. Upon detection of such a case, the monitoring agent could notify an operator, send a message to the User and Provider, or take unilateral action, depending on circumstances and configuration.
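In outline, such a monitoring agent might scan acquisition records like this (the record fields and threshold are invented for the sketch):

```python
# Flag high-value resources that are still acquired (not released) but
# have been idle longer than a configured limit.
def idle_unreleased(records, now, idle_limit):
    """records: name -> {'released': bool, 'last_used': timestamp}."""
    return [name for name, r in records.items()
            if not r['released'] and now - r['last_used'] > idle_limit]
```

The agent could then notify an operator, or message the User and Provider, for each flagged name.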

Figure 16: Release Activity (OV-5)

The Deactivate Activity

Deactivation makes a resource unavailable for use, although the resource may still be physically capable of performing a task. A deactivation may be requested by almost any entity (Figure 17).

Before deactivation takes place, it is the responsibility of the Operator of the system to determine whether deactivation is appropriate. In particular, this entails ensuring that resource users are minimally affected. This may just mean timely notification of the deactivation, or potentially postponement until user needs can be addressed. The Operator should also weigh the costs of deactivation (and likely future activation) when considering whether the deactivation should take place.

Once deactivation is the decided course of action, users are notified and use of the resource is disabled. If the resource corresponds to a physical entity, that entity can now be removed from the system (such as when an instrument is removed from a mooring) or from the operating environment (such as when a vehicle is brought back on to a ship).

Figure 17: Deactivate Activity (OV-5)

The Decommission Activity

The Decommission activity (Figure 18) is the most complete removal of a resource from the system. This activity is performed when a resource is determined to be unusable in its current state, requiring significant modification or maintenance before it can (or should) be used again.

As a general rule, Decommissioning means the resource is unavailable for use, and is also not discoverable as a potentially usable resource. It is the closest activity to "Unregister"; while the registration information is still maintained, it is only made available via specialized searches.

The most common use of Decommission is to indicate that an instrument is no longer available for use on the system, as for example when it has been destroyed or damaged. However, it can also be useful to indicate a data set that has been found to be invalid, a facility that has been decertified for OOI operations, or a security certificate that may no longer be used.

Figure 18: Decommission Activity (OV-5)

CIAD COI OV Resource Lifecycle States by Resource Type

Resource Types Planned for R1

1. Default
2. Identity
3. Topic
4. Data Set
5. Data Source

Default

State Description

UNREGISTERED Not known to the system. This is the default state for all resources prior to type casting or further specialization.

Identity

State Description

REGISTERED The identity resource is known to the system, but not yet verified/validated with 3rd parties/two factor auth.

ACTIVE The identity is validated and "trusted" by the system.

DISABLED The identity is temporarily suspended (e.g., for security purposes).

RETIRED The identity is no longer valid in the system; yet, it remains recorded for auditing/archival purposes.

Topic

State Description

REGISTERED The topic is registered with the messaging system, but it is not in use, yet.

ACTIVE The topic is in active use, and available for subscription.

Note: topic removal/deactivation is handled internally. There is no need for additional states.

Data set

State Description

REGISTERED The data set is registered with the system, but no data/meta-data is being exchanged/available.

ACTIVE The data set contains data/meta-data and may receive updates. However, the data set is not yet published to 3rd parties (only available to its owner).

PUBLISHED The data set is active and available for subscription.

UNPUBLISHED This is a pseudo state. Upon unpublishing, the data set returns to the active state.

INACTIVE This is a pseudo state. The data set is still registered but the system stops collecting. The data set returns to the registered state.

RETIRED The data set is no longer registered with the system. The system still has the option to archive the data/meta data for historical queries.
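The table above can be read as a transition table; a minimal sketch follows (event names are invented, and the pseudo states UNPUBLISHED and INACTIVE are collapsed into the real state each returns to):

```python
# Data Set lifecycle as a transition table.
TRANSITIONS = {
    ('REGISTERED', 'activate'): 'ACTIVE',
    ('ACTIVE', 'publish'): 'PUBLISHED',
    ('PUBLISHED', 'unpublish'): 'ACTIVE',     # via pseudo state UNPUBLISHED
    ('ACTIVE', 'deactivate'): 'REGISTERED',   # via pseudo state INACTIVE
    ('REGISTERED', 'retire'): 'RETIRED',
}

def advance(state, event):
    """Apply a lifecycle event; unknown transitions raise KeyError."""
    return TRANSITIONS[(state, event)]
```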

Data source

State Description

REGISTERED The data source is registered with the system.

ACTIVE The data source is available for data collection.

INUSE The data source is actively being used at that moment.

NOT_IN_USE This is a pseudo state. Upon end of use, the data source returns to the active state.

INACTIVE This is a pseudo state. The data source is still registered but the system stops any future use (e.g., down for maintenance). The data source returns to the registered state.

RETIRED The data source is no longer registered with the system. The system still has the option to archive the meta-data for historical queries.

The following resource types and associated lifecycle states are in preparation for R2.

Instruments

State Description

REGISTERED The instrument is registered with the system.

ACTIVE The instrument is available for use. The default policy implies that it is only available to the owner, before being announced.

ACQUIRED The instrument is in use by a 3rd party and not available for others.

NOT_ACQUIRED This is a pseudo state. Upon end of use, the instrument returns to the active state.

INACTIVE This is a pseudo state. The instrument is still registered but the system prevents any further use (e.g., the instrument is off-line for diagnostics/maintenance). The instrument returns to the registered state.

RETIRED The instrument is no longer registered with the system. The system still has the option to archive the data/meta data for historical queries.

CIAD COI OV Implications of Policy over Resource Lifecycle

Implications of Policy/Governance Framework over Resource Lifecycle

The following table summarizes the effect of the policy/governance services on the resource lifecycle activities from the Behavior section. The Manage, Govern, and Develop activities are not included because the policies involved have an abstract structure. Terms used in this table and their meaning:

self - the entity that owns the policy
other - an additional entity (if one exists) on which the policy may apply, or which may interact with the policy owner
cause - a message from other, or an observation by self
effect - authorize other; enable self; oblige self
capability - the intended outcome

Each entry below gives the activity description with its scenario, followed by the Self, Other, Cause, Effect, and Capability values.

Register Activity

Register resource
  Scenario: A researcher wants to register a new data stream to make it available.
  Self: Registrar. Other: Registrant.
  Cause: Reactive (Registrant makes a request). Effect: Authorize Other.
  Capability: Write (Use) to Registration Catalog (Resource).

Advertise registration
  Scenario: The Registrar decides whether the resource it has certified needs to be advertised as well.
  Self: Registrar. Other: Registrant.
  Cause: Proactive. Effect: Oblige Self.
  Capability: Write (Use) to External Catalog (Resource).

Document

Deployment registration
  Scenario: A deployment engineer deploys an instrument, so a request is sent to the Registrar to certify and document this deployment.
  Self: Registrar. Other: Information Provider.
  Cause: Reactive (Information Provider makes a request). Effect: Authorize Other.
  Capability: Write (Use) to Association Catalog (Resource).

Commission

Certify resource
  Scenario: A researcher requests the Operator to certify a virtual laboratory.
  Self: Operator. Other: Provider.
  Cause: Reactive (Provider makes a request). Effect: Authorize Other; Oblige Self to Deploy and Verify.
  Capability: Verify virtual lab.

Validate deployment
  Scenario: A deployment engineer working for a test facility completes verification of the deployment of a sensor that had been requested for commissioning.
  Self: Provider. Other: Operator.
  Cause: Reactive (Provider, as Deployment Engineer, observes that the deployment engineer completes verification of a deployment). Effect: Enable Self.
  Capability: Validate virtual lab.

Activate

Request activation
  Scenario: A researcher requests the Operator for software to be activated.
  Self: Operator. Other: User.
  Cause: Reactive (User requests activation). Effect: Oblige Self.
  Capability: Operate.

Announce

Announce
  Scenario: A researcher wants to announce a new data stream that he has made available.
  Self: Registrar. Other: Announcer.
  Cause: Reactive (a researcher makes a request to announce). Effect: Authorize Other.
  Capability: Write (Use) to Registration Catalog.

Advertise announcement externally
  Scenario: The Registrar decides whether the announcement it has certified needs to be advertised as well.
  Self: Registrar. Other: Announcer.
  Cause: Proactive. Effect: Oblige Self.
  Capability: Write (Use) to Repository (External Catalog).

Discover

Submit query
  Scenario: A searcher submits a query to discover a resource.
  Self: Registrar. Other: Searcher.
  Cause: Reactive (Searcher makes a request). Effect: Authorize Other.
  Capability: Use (Read) Repository (some catalogs).

Acquire

Negotiate
  Scenario: The Requester (Provider) negotiates terms of acquiring a data stream (resource) with the Provider (Requester).
  Self: Negotiator. Other: Negotiator.
  Cause: Reactive (Negotiator makes a request). Effect: Enable Self.
  Capability: Accept, Reject, Counter-propose.

Fulfill terms of acquisition
  Scenario: A researcher requests a resource according to an agreement (compensate the Provider if the resource is received).
  Self: Requester.
  Cause: Proactive (Researcher decides to request the resource). Effect: Oblige Self.
  Capability: Fulfill acquisition agreement.

Exercise access per acquisition terms
  Scenario: A researcher requests a resource according to an agreement.
  Self: Provider. Other: Requester.
  Cause: Reactive (Researcher requests the resource). Effect: Authorize Other.
  Capability: Access resource.

Associate

Associate entity
  Scenario: A researcher wants to associate a new user with a laboratory that he has made available.
  Self: Registrar. Other: Registrant.
  Cause: Reactive (Researcher makes a request to associate). Effect: Authorize Other.
  Capability: Write (Use) to Association Catalog (Repository).

Advertise association
  Scenario: The Registrar decides whether the association it has certified needs to be advertised as well.
  Self: Registrar. Other: Registrant.
  Cause: Proactive. Effect: Oblige Self.
  Capability: Write (Use) to External Catalog (Resource).

Use

Execute commands
  Scenario: A researcher injects commands to an AUV, e.g., for navigation.
  Self: Provider. Other: User.
  Cause: Reactive (User requests command execution). Effect: Oblige Self to carry out the command.
  Capability: Use (Command Execution).

Release

Release resource
  Scenario: A researcher is finished using an AUV and informs the Operator.
  Self: Operator. Other: User.
  Cause: Reactive (User releases the resource). Effect: Enable Self.
  Capability: Operator decides whether to proceed with the process of releasing the AUV (resource) from the user.

Restrict access
  Scenario: The agreement of acquirement expires.
  Self: Operator.
  Cause: Proactive. Effect: Enable Self.
  Capability: Operator decides to take back the resource by releasing it from the user.

Deactivate

Coordinate deactivation
  Scenario: The Operator receives an answer from all stakeholders (users, providers, infrastructure) indicating the cost associated with deactivating a data stream (resource).
  Self: Operator. Other: User.
  Cause: Reactive (users notify the Operator of their assessments). Effect: Enable Self.
  Capability: Operator decides whether to proceed with the process of deactivating the resource.

Decommission

Coordinate decommission
  Scenario: The Operator receives an answer from all stakeholders (users, providers, infrastructure) indicating the cost associated with decommissioning a data stream (resource).
  Self: Operator. Other: User.
  Cause: Reactive (users notify the Operator of their assessments). Effect: Enable Self.
  Capability: Operator decides whether to proceed with the process of decommissioning the resource.

CIAD COI OV Resource Registry

The COI resource registry services support the registration of resources and resource types in the system. Two capabilities can be distinguished:

The COI Resource Registry, which is the service to define new resource types and perform uniform resource management for the entire system
A generic Base Resource Registry implementation, which can be specialized and deployed for specific types of resources

All resource registries rely on the COI Data Store Services as backend for the distributed, persistent storage, retrieval and querying of resource data.

COI Resource Registry Service

Service Interface

Operations

CRUD Resource Type
CRUD Resource (of any type) for management purposes

Structured Objects

Resource Type Object

Resource Object

Generic (Base) Resource Registry

Service Interface

Operations

CRUD Specific Type Resource
Find (query) Specific Type Resource by criteria (such as ID or resource-specific criteria)

Structured Objects

Specific Resource Object
Specific Resource Type Object

Detailed Behavior

The interaction with a resource registry follows a basic pattern of command/response.

Figure 1. Generic resource registry service basic interaction (OV-6)

Within a conversation with the resource repository, the other peer may be an arbitrary application (different resource type) or another core service such as the resource life-cycle services. The figure below is just for illustrative purposes and does not cover an entire conversation.

Figure 2. Resource registry and resource lifecycle services interaction (OV-6)

The resource repository has the following command set:

(Arguments are the command input; Response is the output.)

LIST_RESOURCE_TYPE
  Arguments: R. Semantics: provides a list of available resource types.

REGISTER_RESOURCE_TYPE
  Arguments: R. Response: OK / Already Registered / FAILURE. Semantics: register a new resource type.

UNREGISTER_RESOURCE_TYPE
  Arguments: R. Response: OK / FAILURE. Semantics: unregister a resource type.

LIST_RESOURCE_INSTANCES
  Semantics: list instances for a given resource type.

REGISTER_RESOURCE_INSTANCE
  Semantics: register a new instance of a resource type.

UNREGISTER_RESOURCE_INSTANCE
  Semantics: unregister an instance of a resource type.

LOOKUP_RESOURCE_TYPE
  Arguments: R. Response: I / FAILURE. Semantics: look up a specific resource type.

LOOKUP_RESOURCE_INSTANCE
  Arguments: R, I. Response: OK / FAILURE. Semantics: look up a specific instance of a resource type.
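A toy in-memory stand-in illustrating the command/response pattern for a few of the commands above (not the real COI service, which exchanges messages over the Exchange):

```python
class ResourceRegistry:
    """Dispatches registry commands against an in-memory type table."""
    def __init__(self):
        self.types = {}                        # resource type -> instances

    def handle(self, command, *args):
        if command == 'REGISTER_RESOURCE_TYPE':
            if args[0] in self.types:
                return 'Already Registered'
            self.types[args[0]] = []
            return 'OK'
        if command == 'UNREGISTER_RESOURCE_TYPE':
            return 'OK' if self.types.pop(args[0], None) is not None else 'FAILURE'
        if command == 'LIST_RESOURCE_TYPE':
            return sorted(self.types)
        return 'FAILURE'
```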

Figure 3. Resource registry domain model (OV-7)

Interaction Patterns

Register resource type

(COI Resource Registry only)

Figure 4. Register resource type with resource registry interaction (OV-6)

Register resource instance

Figure 5. Register resource instance with resource registry interaction (OV-6)

CIAD COI SV Resource Tutorial

This page is an introduction to the use of the ION resource management objects. The methods and functionality are provided by ion.services.coi.resource_registry_beta via the resource client and the resource registry. In this tutorial you will learn how to use the resource client to create resources via the registry and store them in the datastore. The setup.py installer will provide all the dependencies which you need to use existing resource objects.

Table of Contents:

Installing the Resource Client
    To test the resource client and its dependencies follow these steps:
The object data model that backs ION resources
    Here are three example object structures: Address Book, Mooring Object, Mooring Composite
    Key Features of the Data Model
Using Resources: a capability container shell tutorial!
    Start the resource application and create a resource client
    Play with the resource fields a little bit...
    Let's set some fields of the resource instance (addressbook)!
    Now let's try it with the addresslink object structure
    Retrieving a resource by reference or ID
Using the default Datasets in the Resource App
    Some simple examples working with the dataset resource:
Creating new resource objects in Google Protocol Buffers
    Setup and install:
    For more details see the GPB docs: Protocol Buffers Wiki
    Let's try an example! Create a new protofile in the ion-object-definitions repository
    Now we can start using these objects.
    What else can we do with our custom object?

Installing the Resource Client

This tutorial will guide you through setting up the resource client, a brief introduction to the OOICI object model, and then a detailed tutorial using resources in the capability container shell.

To test the resource client and its dependencies follow these steps:

Before starting this tutorial, make sure you've pulled the latest changes from the ioncore-python repository's develop branch. Since changes may be recent, follow the most current build guide for ioncore-python on GitHub.

First, run trial on the following:

You should see the following output from the last test:

The object data model that backs ION resources

A resource in ION is a structured object: a composite made out of one or more Google Protocol Buffer (GPB) objects. Each structure forms a partially ordered set (poset) within a Directed Acyclic Graph (DAG).

The figure below shows a DAG; the content stored in Cassandra is a DAG. Each object is a partial order, i.e., the set of vertices reachable from a particular starting point, or root, within the DAG.

Updating an object adds a new root object (vertex) which points to the previous root and references the new content. This is based heavily on the GIT model. The semantics of the data store itself are the same as GIT: push, pull, and fetch are the distributed, message-based commands; checkout will lazily fetch objects as needed, and commit is a local operation. Because we have a different use case than GIT, in the future a lazy fetch mechanism for individual content-addressable object parts will be provided. In that framework, push and pull are still the mechanism by which the state of an object changes in the distributed system, but each read or write command may result in network traffic to get the pieces which are being used.
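The content-addressed storage idea can be illustrated in a few lines (a sketch only; the actual store keeps serialized GPB objects in Cassandra):

```python
import hashlib
import json

store = {}  # SHA1 of content -> object

def put(obj):
    """Store an object once, keyed by the SHA1 hash of its content."""
    key = hashlib.sha1(json.dumps(obj, sort_keys=True).encode()).hexdigest()
    store[key] = obj
    return key

leaf = put({'name': 'Jane'})                       # a vertex
root1 = put({'children': [leaf]})                  # root referencing the vertex
root2 = put({'children': [leaf], 'prev': root1})   # update: new root points at old
```

Identical content hashes to the same key, so each version is stored only once; an update adds just a new root vertex.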

Unlike git which stores only file system objects, we are free to define our own vertices (Git Blobs) using GPB messages. Each GPB message is made up of named fields of a particular type: integer, float, string, boolean, or message. GPB messages may be nested, one within the named field of another. By convention these objects are referred to as Nested Composites whereas objects containing only GPB elementary types for its fields are called Natural Composites. The AddressBook structure is an example of a nested composite.

The ION object framework defines a convention for expressing a link from one GPB object to another using a specialized type of message field. Objects structured in this manner are conventionally known as Linked Composites. Links must be set explicitly, but once set, the object management tools make the interface seamless. See the addresslink example for more information.

NOTE: "vertex" and "GPB object" are used interchangeably in this text to represent our implementation of GPB Messages; small logical records of information arranged in name-value pairs. ION "resource" and "structured object" are used interchangeably to represent the partially ordered set of related objects described in this section.

Here are three example object structures:

Address Book

Mooring Object

Mooring Composite

Key Features of the Data Model

The ION object structure provides three important advantages over using GPB directly:

1. GPB messages can only express a tree structure. ION objects stored as a DAG allow the same object to have more than one parent reference. When this structure is used, changes in the child are visible to both parents.
2. The ION link convention does not specify the type of object linked. The recursive structure of GPB objects is very powerful, but the type of the nested object (GPB message) must be specified in a protofile (a file representing the structure of protocol buffer data). Using the ION link convention it is possible to make data structures which are defined at run time rather than in the protofile.
3. There is efficiency in storing structures in a DAG. Each vertex (GPB message) of the DAG is versioned, and each version is stored only once using the SHA1 hash of the content as the reference. In a complex data structure, small changes result in only a small addition to the stored content.

Using Resources: a capability container shell tutorial!

We can start an app which runs the resource registry and the data store. With that in place we can use a resource client in the shell to make and store resource objects.

Start the resource application and create a resource client

To start the resource app with some default resources (currently an example CDM dataset), run the app with the command line argument register=dataset:

bin/twistd -n cc -h amoeba.ucsd.edu -a register=demodata res/apps/resource.app

In the container shell create a resource client

Create some object type identifiers - these objects are created using the object type identifier assigned to each proto object definition. These identifiers are documented in the object definition Google Doc

Call the Resource Registry to make a new instance of a resource object

The Resource Client's create_instance() method returns a deferred. With a deferred, you must wait for the result before using it; therefore, you cannot copy and paste the previous code together with this code as a single sequence of lines. However, you should never notice lag while waiting for this result unless the broker is far away.
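A stripped-down illustration of why the deferred forces you to wait (this stand-in only mimics the Twisted Deferred pattern; it is not Twisted itself):

```python
class MiniDeferred:
    """The result arrives later; work continues in registered callbacks."""
    def __init__(self):
        self._callbacks, self.result, self._fired = [], None, False

    def addCallback(self, fn):
        if self._fired:                  # result already here: run now
            self.result = fn(self.result)
        else:                            # otherwise run when it arrives
            self._callbacks.append(fn)
        return self

    def callback(self, result):          # fired when the reply arrives
        self._fired, self.result = True, result
        for fn in self._callbacks:
            self.result = fn(self.result)

d = MiniDeferred()
d.addCallback(str.upper)                 # registered before the result exists
d.callback('addressbook')                # e.g., the broker's reply arrives
```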

Play with the resource fields a little bit...

ResourceName property:

ResourceLifeCycleState property:

There are currently 7 possible resource states. These are class attributes of ResourceInstance

Constant String Value

ResourceInstance.NEW 'New'

ResourceInstance.ACTIVE 'Active'

ResourceInstance.INACTIVE 'Inactive'

ResourceInstance.COMMISSIONED 'Commissioned'

ResourceInstance.DECOMMISSIONED 'Decommissioned'

ResourceInstance.RETIRED 'Retired'

ResourceInstance.DEVELOPED 'Developed'

ResourceDescription property:

ResourceIdentity property (read-only): You can get, but can not set the Resource Identity!

ResourceType property (read-only): You can get, but can not set the Resource Type!

Let's set some fields of the resource instance (addressbook)!

The address book has a title (String), an owner (Person), and a repeated* person (Person). Let's set the title.

* "repeated" is a keyword used in .proto files to signify one or more items of the same type. See the example object structures for more.

Let's add a person to the addressbook

Put the changes to the resource in the datastore

Create a reference to the addressbook resource in the datastore The first reference represents the resource's ID (key) and location (branch) in the datastore. The second reference is linked to the current state, or HEAD, of the resource.

Now let's try it with the addresslink object structure

The addresslink object is structurally similar to the addressbook object. However, the addresslink implements its composite fields (person and owner) as typeless references (see the Address Book structure definition to review this difference.) As previously discussed, this convention allows one GPB object to link to another dynamically without being bound strictly to the protofile which defines it.

In the example below, we will define an addresslink object, create a person object by explicitly specifying the person_type definition, and attach that person to the addresslink as a reference.

Again, since create_instance() returns a deferred, we must execute these lines separately

The address link has a title (String), an owner (Person), and a repeated* person (Person). Let's set the title.

* "repeated" is a keyword used in .proto files to signify one or more items of the same type. See the example object structures for more.

Let's create a person object. In the address link, we have to explicitly create the object from a type (person_type).

First let's set a non-repeated link field for "owner"

Add another person...

Now we can add that person to the address book in two different ways: First, we add Jane by creating a new "person" reference, and explicitly link that reference to "jane"

Next, we add another person reference and store Jane using standard assignment with the equals operator. This process implicitly creates the link between the reference object and Jane (under the hood, SetLink() is still called!)

Now Jane is in the addresslink twice - if you change her properties from one reference they change in all references!

Retrieving a resource by reference or ID

Create a reference to the addressbook resource in three ways.

The ResourceIdentity is a simple string - the UUID of the resource. The Resource Client reference_instance method returns a reference object which specifies the UUID and the version or branch of the resource. The Resource Client reference_instance can also return a specific reference to the current state of the resource. To edit a particular state you must create a new version from it - just like a detached head in git!

Make a new version

Retrieve a particular version

Wait for the deferred

Note that you can only have one version of a resource active at a time. Old references in your local namespace to that object will be broken and raise an exception if you try to access them.

Using the default Datasets in the Resource App

By adding a command line argument when starting the resource app, the application will automatically create some resource objects which you can use in the application.

The application creates a resource identifier and provides it to you as a variable when it returns control to you in the shell.

Import the tools and the resource client and get the data set resource.

Get the result...

Some simple examples working with the dataset resource:

The example creates a Common Data Model dataset. See DM CDM for details.

Print information for a known global attribute (global attributes are contained in the dataset's root group)

Print information for a known variable

Print the values of all global attributes in the root group of the dataset

Print the names of all dimensions in the root group of the dataset

Print the names of all variables in the root group of the dataset

Print all attributes and dimensions for all variables
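The walk-throughs listed above can be sketched against a plain dictionary standing in for a CDM dataset's root group. The real resource is a GPB-backed object; the field names and sample values here are illustrative:

```python
# Hypothetical stand-in for a CDM dataset root group, used to illustrate the
# attribute/dimension/variable walk-throughs listed above.
dataset = {
    "attributes": {"title": "Sample CDM dataset", "institution": "OOI"},
    "dimensions": {"time": 10, "depth": 5},
    "variables": {
        "temperature": {
            "attributes": {"units": "degC"},
            "dimensions": ["time", "depth"],
        },
        "salinity": {
            "attributes": {"units": "psu"},
            "dimensions": ["time", "depth"],
        },
    },
}

root = dataset  # the dataset's root group

# Print information for a known global attribute
print("title =", root["attributes"]["title"])

# Print the values of all global attributes in the root group
for name, value in root["attributes"].items():
    print(name, "=", value)

# Print the names of all dimensions in the root group
print(sorted(root["dimensions"]))

# Print all attributes and dimensions for all variables
for var_name, var in sorted(root["variables"].items()):
    print(var_name, var["attributes"], var["dimensions"])
```
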

Creating new resource objects in Google Protocol Buffers

Setup and install:

See the process outlined in the README.txt file in ion-object-definitions (also available on GitHub). This file will explain how to:

Download the Google Protocol Buffers source and related tools
Build the executables
Obtain and build the ion-object-definitions project
Create your own GPB definitions and add them to the repository

Please follow the instructions carefully.

Now you can run the demo to add an address or read the address book file in any of the three different languages:

See the readme in the examples directory for details...

ION Object Identifiers

ION defines a convention for identifying GPB message object types. The identifier is created with the object structure by defining a _MessageTypeIdentifier enum containing _ID and _VERSION. See the following snippet for an example:
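As a hedged illustration of the convention - the message name, field, and identifier values below are invented for this example (a real _ID must be drawn from your subsystem's assigned range):

```protobuf
message ExampleObject {
    enum _MessageTypeIdentifier {
        _ID = 20001;       // illustrative: use an unused ID from your subsystem's range
        _VERSION = 1;
    }
    optional string name = 1;
}
```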

In a nested composite structure, the root object must define an ID in this manner or it will not be regarded as an identifiable structure in the system. In linked composites, each vertex is dynamically linked by its SHA1 hash and its type, which must also be specified using this convention.
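The content-addressed linking can be sketched with Python's hashlib. This is an illustrative stand-in, not the ION implementation; the serialization shown is invented:

```python
# Sketch of the linking convention described above: in a linked composite,
# each vertex is referenced by the SHA1 hash of its serialized content plus
# its type identifier and version.
import hashlib

def make_link(type_id, version, payload):
    """Return a (sha1_hex, type_id, version) link tuple for a serialized object."""
    digest = hashlib.sha1(payload).hexdigest()
    return (digest, type_id, version)

person_bytes = b"name:Jane Doe;id:42"   # stands in for a GPB serialization
link = make_link(type_id=20001, version=1, payload=person_bytes)

print(link[0][:8], link[1], link[2])

# The same content always yields the same link, so identical objects can be
# stored once and shared by every structure that links to them.
assert make_link(20001, 1, person_bytes) == link
```
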

Subsystem Identifier Ranges

To prevent conflicts between identifiers, each subsystem is assigned a range of valid identifiers. In-use IDs are tracked in the ION Object Definitions Document.

Subsystem               ID Range

Core System Objects     1-256
COI                     1001-1999
DM                      2001-2999
CEI                     3001-3999
SA                      4001-4999
AS                      5001-5999
PP                      6001-6999
EOI                     7001-7999
IPAA                    8001-8999
AIS                     9001-9999
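As a quick sanity check, the ranges above can be encoded and queried. This helper is illustrative and not part of the ION code base:

```python
# Check a proposed _ID against the subsystem identifier ranges listed above.
SUBSYSTEM_RANGES = {
    "Core System Objects": (1, 256),
    "COI": (1001, 1999),
    "DM": (2001, 2999),
    "CEI": (3001, 3999),
    "SA": (4001, 4999),
    "AS": (5001, 5999),
    "PP": (6001, 6999),
    "EOI": (7001, 7999),
    "IPAA": (8001, 8999),
    "AIS": (9001, 9999),
}

def subsystem_for_id(object_id):
    """Return the subsystem whose range contains object_id, or None."""
    for subsystem, (low, high) in SUBSYSTEM_RANGES.items():
        if low <= object_id <= high:
            return subsystem
    return None

print(subsystem_for_id(2045))    # -> DM
print(subsystem_for_id(20001))   # -> None (outside every assigned range)
```
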

The version number in the identifier is not used by the current implementation, but future backward-compatibility requirements will require us to maintain all previous versions of each object definition.

Rules:

1. Message names are CamelCase and should not include a leading underscore ("_").
2. Message names cannot be the same as the module name - this is a conflict in Java.
3. Enums use C++ scoping rules. This means that the fields of an enum are siblings of their enum container rather than children. If a named field of a message object is the same as a named field of that message's enum, a namespace conflict occurs.
4. Ensure any identifiers selected for your object structures are appropriate for their subsystem and are not currently in use. Keep track of used identifiers in the aforementioned document.
5. Do not use the "required" field rule. It is too restrictive; such constraints should be enforced by the service or process that uses the object.
6. When defining resource objects, note that the Resource Framework already provides the Object Identity, Object Name, Object Description, Object Type, and Object Life Cycle State. None of these will conflict with your namespace, but you should not duplicate their purpose.

Do not define duplicate identifiers for resource objects. These are provided by the resource framework.

For more details see the GPB docs: Protocol Buffers Wiki

Proto files that you define in this repository and compile will be available to import in your local environment. You should not need to actually import the compiled classes though. You should create type identifiers using the object utilities create_type_identifier() method. For example:

However, these classes can still be imported explicitly if the need arises:

Let's try an example! Create a new proto file in the ion-object-definitions repository

Now, copy the following structure definition into cactus.proto:

Finally, generate the appropriate Python classes from the proto file structure:

Now we can start using these objects.

Start the resource application:

In the container shell create a resource client:

Create the type identifiers so we can instantiate our objects:

Next, let's bring this cactus to life...

Then, grab the result of the deferred:

Note: If you receive the error "AttributeError: Deferred instance has no attribute 'result'", try waiting a second for the result to resolve, and try again.

What else can we do with our custom object?

Let's try giving this object some personality:

And we can't have a cactus all alone in the desert without a defense mechanism:

Now he's well equipped! Finally, let's remove a couple of spines:

TODO:

Section "How to write proto files" provides an explanation but may benefit from examples.
Section "The object data model that backs ION resources" could benefit from linking to additional diagrams provided by David.
Section "Creating new resource objects in Google Protocol Buffers" could provide a link to a separate Confluence page which includes the GPB runtime READMEs and discussion as necessary.
Section "Creating new resource objects in Google Protocol Buffers" would benefit from expansion. It should include a section on running the GPB demos with explanations regarding the process flow.
Section "Now lets try it with the addresslink object structure" would benefit from a discussion of the differences between the addressbook resource and the addresslink resource, as well as the impetus for and benefits of facilitating arbitrarily linked object structures by implementing untyped message fields.
Section "Retrieving a resource by reference or ID": rework the structure of this section so that it does not cut off the code follow-through.
Section "Creating new resource objects in Google Protocol Buffers" must mention the required package "pkg-config". This can be acquired via MacPorts or Xcode.
Repositories which reference the amoeba server should be switched over to GitHub.
Section "The object data model that backs ION resources" notes that resource and structured object are used interchangeably. This reads oddly. Could it simply state that the entire DAG represents a single composite structure?

Address Book

Addressbook example

This is a simple example of a composite data object structure: an address book. The address book contains multiple entries, called people. Each person has a name and an id, an optional email, and one or more phone numbers. The phone number is itself a composite object with an enum for the phone type and a string for the number.

There are two object models for the same data structure, one called AddressBook and one called AddressLink. They store the exact same content. One uses a single GPB composite message object to store the entire address book as one object. The AddressLink uses the OOICI data model to break the address book into a coupled structure of objects using a convention which defines a link. The object implementation makes the difference nearly seamless.

A diagram of the two address book data models
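Based on the description above, the AddressBook model might be defined along these lines. This is a sketch only: the field numbers, defaults, and _MessageTypeIdentifier values are invented, not the repository's actual definitions:

```protobuf
message Person {
    enum _MessageTypeIdentifier {
        _ID = 20002;       // illustrative identifier
        _VERSION = 1;
    }
    optional string name = 1;
    optional int32 id = 2;
    optional string email = 3;   // optional email

    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }

    // Composite phone number: an enum for the type, a string for the number.
    message PhoneNumber {
        optional string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }

    repeated PhoneNumber phone = 4;   // one or more phone numbers
}

message AddressBook {
    enum _MessageTypeIdentifier {
        _ID = 20003;       // illustrative identifier
        _VERSION = 1;
    }
    optional string title = 1;
    optional Person owner = 2;
    repeated Person person = 3;
}
```

In the AddressLink variant, the Person-typed fields would instead be link fields following the ION link convention, so each person is stored as a separate object in the coupled structure.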

Coupled Mooring Composite

In this example, the single mooring object has been decomposed into explicitly decoupled objects using the ION link convention. Now there is some overhead in storing the reference, but the data structure is much more powerful.

Mooring Composite

This module contains a scientifically relevant example - a simplified data model for a mooring and its instruments. In this case it is expressed purely as a GPB composite: a mooring object would be one message containing potentially hundreds or thousands of objects.

CIAD COI OV Service Framework

Service Framework

The Service Framework manages services and information about services. It associates services with their descriptions and relations with other services. It enables their discovery and subscription.

Any service in the system is integrated according to the standard service integration pattern.

A service is brought into the system as a process (or set of processes) deployed in a capability container; see Process Management.

Figure 1 depicts the internal structure of the Service Framework.

Figure 1. Service Framework services (OV-2)

The Service Registry service keeps track of the services and their life-cycle. The registry provides instantiation information regarding the number of services available, how to access a particular service, their signature (interfaces or interaction patterns), computational resource requirements, and information pertaining to ownership, availability, or references to policies under which it can be invoked.

Domain Models

The information model of services, i.e., their description, is based on the OWL-S information model for services. Note that the specifications and tools related to OWL-S do not necessarily apply.

Figure 2 depicts the Service Registry Domain Model. A service registry stores service definitions, whereas an associated catalog retains metadata describing them. Parts of a service definition are its name, human-readable description, provider, signature and location. This abstract model can be mapped to various implementation technologies. For instance, the signature may contain an interface description (e.g., using WSDL for Web Services) and the location could be a URI.


Figure 2. Service Registry Domain Model (OV-7)

CIAD COI OV Service Agent

A Service Agent is a specific sub-type of a Resource Agent.

TBD

CIAD COI OV Service Integration

This page describes how any service within the CI system works and how it is integrated.

What is a Service?

Figure 1: Service Constituents

Constituent Element: Description

Service: An entity in the system that can be discovered and addressed by name, realizing a specific purpose. Services provide one or multiple operations. Nothing is known about the location of the service or its internal structure/implementation.

Requesting Service: A service that has an operational dependency on a Providing Service to perform its purpose. It requests a providing service by sending a request message to it.

Providing Service: A service that can be requested (by a requesting service). Both requesting and providing services are expected to comply with known shared conventions.

Service Interface: A specification of the operations (aka methods, verbs, commands, performatives) of a service and their effect on the environment (i.e., their purpose in the system). Each operation has a specification of the messages that the service understands and emits. Message specifications identify (and specify) objects that are contained in the messages; messages are arranged (i.e., sequenced) as interaction patterns. For instance, a requesting service sends a request message, and the providing service responds with either a success-result message or an error message.

Service Implementation: The internal implementation of the service interface and any required internal context, state, structures and code. A part that is completely hidden from the service requester. A service can have multiple different implementations that are indistinguishable by the requesters.

Technology Integration Interface: The code that provides a front-end to an external technology, if a service cannot provide the service standalone or can act as a requesting service.

Technology: An external system or component that is interfaced through a service. For instance, a storage system (Cassandra, RDBMS, Redis, iRODS) or an HTTP server (Django, Apache, Drupal).

Technology Configuration and Packaging: Content of configuration files or external registry entries that are accessible by the technology; packaging as binary packages for distribution and versioning (APT package, etc.).

Technology Installation and Deployment: The steps needed to install a technology "package" on an OS, a virtual machine instance, a container, etc., and the decisions made by a deployment engineer of where to execute which package with which resources.

Virtual Machine Instance (Operational Unit) Contextualization: A definition or implementation of the process of starting and configuring (i.e., contextualizing) a virtual machine instance with the technology on it. This includes installation steps, configuration steps, registration steps, and exchange of secrets.

Service Supervisor: A higher-order process that is responsible for the existence of a service. A service supervisor typically defines the deployment configuration of a service and all its service processes. A supervisor spawns the service and reacts to failures, for instance by taking down and restarting a service.

Service Agent: A separate process that controls a service's life cycle (i.e., start, stop, reconfigure), monitors the service for correct operations and emits failure events in case of failure, advertises the capabilities (i.e., the operations and their definitions) of the service, and keeps track of who can access the service.

Service Deployment Configuration: The configuration in a registry.

Capability Container: A service is hosted within a capability container. The capability container loads Service Process Packages from a repository whenever the service supervisor spawns a new service (i.e., a service's agent), or a service agent spawns a new service process.

Service Process: A service can be realized as a distributed system, made out of multiple service processes. A service requester cannot distinguish different service processes. Service processes can be distributed, and of the same or different types (e.g. coordinator and worker).

Service Process Package: A packaged version of a process that can provide the service (interface, implementation, technology integration) and can be hosted by a capability container. The capability container can retrieve the package from a package repository (e.g. a PyPI server).

Service Process Package Repository: A package repository, such as a PyPI server.

Service Registration

Services are registered (described) in the service registry. Service interfaces are registered in the service registry, cross-referencing interaction patterns and object definitions in the interaction pattern repository. Service instances are registered in the service registry when they are created.

Service Configuration and Deployment

Service Access

1. Requesting service looks for a Providing Service in the service registry.
2. Requesting service gets the name of a qualifying service instance.
3. Requesting service initiates a valid service conversation with the providing service (i.e., sends a request message).
4. Providing service joins the service conversation and acts accordingly (i.e., sends a response/error message).
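The access sequence above can be sketched as follows; all classes are plain-Python stand-ins, not COI services:

```python
# Toy sketch of service access: the requesting side resolves a providing
# service through a registry, then exchanges request/response messages.

class ServiceRegistry:
    def __init__(self):
        self._instances = {}

    def register(self, service_type, instance_name):
        self._instances.setdefault(service_type, []).append(instance_name)

    def lookup(self, service_type):
        """Return the name of a qualifying service instance."""
        return self._instances[service_type][0]

class EchoService:
    def handle(self, request):
        # Providing service joins the conversation and responds.
        return {"status": "OK", "result": request["payload"].upper()}

registry = ServiceRegistry()
instances = {"echo-0": EchoService()}
registry.register("echo", "echo-0")

# Requesting service: look up a provider, then send a request message.
name = registry.lookup("echo")
response = instances[name].handle({"op": "echo", "payload": "hello"})
print(response)   # -> {'status': 'OK', 'result': 'HELLO'}
```

In the real system the "send" step is an asynchronous message over the Exchange rather than a direct method call, but the registry-then-converse pattern is the same.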

Service Management and Access Control

1. The Service Supervisor knows which service instances need to be created (and supervised).
2. The Service Supervisor "spawns" a new service instance, i.e., a service agent for this service instance.
3. The Service Agent knows how many service processes need to be created (and where).
4. The Service Agent "spawns" new service processes and monitors their correct operations.
5. The Service Agent configures the Capability Container's PEP (Policy Enforcement Point) that hosts the service processes: (a) statically, e.g. with ACLs, or (b) with a call-out to a PDP.
6. On request, a PEP performs a local policy decision (case a) or performs a call-out to a central PDP (Policy Decision Point) for the service instance (case b).
7. The PDP can (a) be an attribute authority or (b) perform a call-out to the service agent.
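The two policy-decision paths in the last steps (a local PEP decision vs. a call-out to a central PDP) can be sketched as follows; this is illustrative plain Python, not the ION policy framework:

```python
# Sketch of policy enforcement: a PEP either answers from a static local ACL
# (case a) or calls out to a central Policy Decision Point (case b).

class PDP:
    """Central Policy Decision Point."""
    def __init__(self, allowed):
        self.allowed = allowed

    def decide(self, subject, operation):
        return (subject, operation) in self.allowed

class PEP:
    """Policy Enforcement Point hosted by the capability container."""
    def __init__(self, local_acl=None, pdp=None):
        self.local_acl = local_acl   # case (a): static ACLs
        self.pdp = pdp               # case (b): call-out to a PDP

    def enforce(self, subject, operation):
        if self.local_acl is not None:                 # local policy decision
            return (subject, operation) in self.local_acl
        return self.pdp.decide(subject, operation)     # central call-out

local_pep = PEP(local_acl={("alice", "read")})
central_pep = PEP(pdp=PDP(allowed={("bob", "write")}))

print(local_pep.enforce("alice", "read"))    # -> True
print(central_pep.enforce("bob", "read"))    # -> False
```
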

CIAD COI OV Service Discovery

Overview

Service Discovery is an important part of the Service Integration framework. It includes the set of technologies, protocols, configurations, and repositories that allow an entity to detect the existence of a particular service, its interfaces, interaction patterns, governing contracts, entry points, etc.

Technology space

Universal Plug and Play (UPnP)
Zeroconf/Avahi/Bonjour

CIAD COI TV UPnP

Overview

Universal Plug and Play (UPnP)

CIAD COI TV Zeroconf

Overview

Zeroconf is a technology for automatically configuring a device through the use of the mDNS and DNS-SD protocols. It has multiple implementations, the most popular being Apple's Bonjour and Linux's Avahi.

CIAD COI OV User Interfaces

System Wide User Interface Support

Presentation Framework

The COI subsystem provides the framework for any form of production and provisional web user interfaces of the OOI Integrated Observatory Network. This is provided as part of the COI Presentation Framework. Subsystem and implementation development teams, and later user communities, can leverage this framework to develop any ION user interface necessary.

User Interface Templates and Components

In addition, the COI development team provides templatized user interfaces and user interface widgets, such as for manipulating resource registries. These components can be used in specific production or provisional user interfaces.

These include

Generic Resource Registry Interface

COI User Interfaces

UI (UI Type, User Role): Description

System Monitoring UI (Web UI; Operator; in collaboration with CEI): Monitoring metrics (rates, latencies, lengths) - messaging queue lengths, average queue latency, consumption rate; service request throughput per second; request roundtrip latency. Viewing logs, warnings and errors.

Capability Container Management UI (Configuration, Console): Configuration exists in the configuration registry and in basic per-container config files. This UI serves to interactively activate a different configuration or perform a certain command for a capability container. The container has a shell/console-style interface to activate certain commands.

General Resource Management UI (Web UI): The most generic UI to view and manage any kind of resource in the system. Basic actions include: view resources of a type as a list/table; view resources further restricted by a filter; view details of one resource; (optional) create a new resource of a type; (optional) edit details of one resource; change one resource's life cycle state (active, inactive, decommissioned); (optional) delete one resource. Extensions: apply an action (=verb) to a resource, specific by type (see below).

Deployment and Configuration UI (Web UI, Resource Mgmt Extension): Manage configuration entries for resources, components, services, processes and agents in the system. This includes viewing and editing the configuration.

Agent Management UI (Web UI, Resource Mgmt Extension): Manage the agents/processes in the system. Non-standard actions include TBD.

Service Management UI (Web UI, Resource Mgmt Extension): Manage the services in the system. Non-standard actions include TBD.

Exchange Management UI (Web UI, Resource Mgmt Extension): Manage the Messaging Exchange (AMQP, Hardware Routing) resources in the system. Non-standard actions include TBD.

Identity Management UI (Web UI, Resource Mgmt Extension): View user identities and their associated profiles. Non-standard actions include TBD.

Policy Management UI (Web UI, Resource Mgmt Extension): Define access and execution policy for facilities, Orgs and resources in the system. Manage roles and groups.

CIAD CEI Common Execution Infrastructure

Common Execution Infrastructure (CEI) Subsystem Architecture and Design

This is the central page for the CEI subsystem architecture and design, a part of the OOI Integrated Observatory Network. Both COI and CEI subsystems symbiotically form the "Operating System" of the Integrated Observatory Network, with CEI providing more of the "system level" components. This page is structured into operational views (OV), system views (SV) and technical standards views (TV).

CEI Overview

Elastic Computing Infrastructure and Services
  Elastic Computing Terminology
  Elastic Processing Unit (EPU)
  Technologies: Nimbus, Eucalyptus/Ubuntu Enterprise Cloud, virtualization (libvirt, etc.)

Execution Engine Management
  Technologies: CohesiveFT, rBuilder

(Taskable) Resource Management
  Planner and Hierarchical Control
  Taskable Resource Agent
  Resource Agent Interactions
  Resource Agent State Machine

Process Execution Management (not in scope of Release 1)
  Technologies: cluster execution environments (Condor)

System Operations
  System Bootstrapping and Startup

Cross-Cutting Concerns
  Registries and Repositories
  User Interfaces

Quick Links

Subsystems: COI CEI DM SA AS PP

CIAD CEI OV

Both Common Operating Infrastructure (COI) and Common Execution Infrastructure (CEI) subsystems symbiotically form the "Operating System" of the Integrated Observatory Network, with CEI providing more of the "system level" components.

The Common Execution Infrastructure (CEI) provides an infrastructure for the virtualization of computing across the OOI, including taskable resource provisioning, remote operational management and process execution. It supports software package functional decomposition, deployment, implementation and integration, as well as execution engines and an environment for specific user-requested purposes. The following sections introduce the main entities and their relationships in the form of domain models and explanations.

The CEI will provision the services required to implement an elastic compute network together with a corresponding management UI module. This constitutes the computation and execution substrate for the entire CI. The CEI provides the following capabilities:

Virtualized computing resource provisioning, operations and maintenance;
provisioning parameterized configurations of service and application modules into compute node deployment packages;
monitoring and provisioning compute nodes based on compute resource utilization and latency of provisioning;
on-demand scheduling of processes, optimized scheduling of stream process subscriptions;
extendable process execution environment that supports multiple execution formats;
federation of process execution service providers, and process control interface;
immediate-mode scheduling of processes at specified locations;
coupling of processes to the streaming environment of the Data Management subsystem;
coordinated and/or chained scheduling of processes;
an extendable set of process execution engines;
standard process execution planning and control;
standard provenance capture and reporting.

Process authoring and monitoring applications that will be integrated as user interfaces to the CEI include MATLAB and Kepler.

Overview

Figure 1 shows a high level illustration of the purpose and dependencies of the CEI. The illustration shows a number of COI Capability Containers, which are software applications designed to host software processes, in a virtual execution environment called the "Operational Unit". In today's technologies, the Operational Unit is realized as a Virtual Machine Instance that hosts the Capability Container software package, which in turn hosts process software packages.

The processes attach to the OOI Integrated Observatory Network Exchange. Subsequently they communicate by sending and receiving messages. The CEI is responsible for bringing the Operational Units into being, for contextualizing them to their execution environment, for starting and managing the hosted resources (here Capability Containers), and for continuous management of the Operational Unit via agents.

Figure 1. Capability Container Deployment (OV-1)

Work Products

The Work Products provided by this subsystem are shown in Table 1.

Table 1. Work Products for the CEI Subsystem

1.2.3.6 Common Execution Infrastructure (R1, R2): Provides the services to manage the distributed, immediate-mode execution of processes.

1.2.3.6.1 Elastic Computing Services (R1): Scheduling, provisioning, and monitoring services to maintain a balanced deployment of Virtual Compute Nodes to the computational engines.

1.2.3.6.2 Execution Engine Catalog & Repository Services (R1): Maintains references to registered execution sites and Virtual Compute Node configuration packages.

1.2.3.6.3 Resource Management Services (R1): Establishes standard models for the operational management (monitor & control) of stateful and taskable resources.

1.2.3.6.4 Process Management Services (R2): Provides the validation, scheduling, and management services for policy-based process execution at specified execution sites. The service supports the coupling of the dynamic data distribution service with the process and its triggering. Provenance and citation annotations are registered, associating the input and output products with the execution process and its operating context.

1.2.3.6.5 Process Catalog & Repository Services (R2): Maintains process itineraries and references to registered process engine configurations and execution sites.

1.2.3.6.6 Integration w/ National Computing Infrastructure (R2): Provides the capability to deploy OOI processing, both data streams and ocean models, onto the national computing infrastructure; in particular the focus is on the Open Science Grid and the TeraGrid (and/or its logical successor).

Release 1 focuses on providing management of stateful and taskable resources, such as operational units (VM images) and Elastic Processing Units (highly available distributed services). The management activities include planning, provisioning, controlling and fault monitoring. Release 1 focuses in detail on elastic computing, and also provides abstracted services for generic taskable resource management.

Release 2 focuses on supporting process (job) scheduling and execution through execution engines, in particular user provided processes. It will also provide an integration with the National Compute Infrastructure (Teragrid/XD).

After release 2, the Analysis and Synthesis subsystem will extend CEI services and mechanisms by providing workflow execution engines that can execute user provided scientific workflows, or visualization synthesis workflows and perform automatic mapping to available execution resources.

CIAD CEI OV Elastic Computing

This page describes the design of CEI system Elastic Computing infrastructure and services as demonstrated at the Release 1 LCA (August 2010) and several months following it. Some things that may be required in Release 1 may be called out inline with other design elements or they may be listed in the final "Possible Deficiencies" section of the document.

Elastic Computing Design
1. Introduction
2. Terms
3. Assumptions
  3.1 Messaging Service
  3.2 Capability Container
  3.3 Exchange Space
  3.4 Exchange Point
  3.5 Reliable Data Storage
  3.6 Stateful vs. Stateless Services
4. Design Overview
5. CEI Components
  5.1 Provisioner
  5.2 Deployable Type Registry Service
  5.3 Sensor Aggregator
  5.4 EPU Controller
    5.4.1 EPU Controller Decision Engine
  5.5 EPU Worker
  5.6 OOI bootstrap commandline
6. Bootstrapping
7. High Availability
  7.1 Definition of HA
  7.2 Failure Matrix
8. Possible Deficiencies
  Scale-down
  Context broker HA
  Restart speed / dual instances

Elastic Computing Design

1. Introduction

CEI's Elastic Computing Services are responsible for making services highly available.

An exchange point is a resource (provided by COI as part of the Exchange) that allows an entity to address messages to just one endpoint in the greater distributed messaging system for any service type that it interacts with. When a service is made highly available, entities will never address a service instance directly, they will use an exchange point instead.

The CEI software design is centered around a set of services and components collectively called an EPU (Elastic Processing Unit) or "EPU infrastructure". This EPU infrastructure will make sure the actual service instances that end up processing the messages directed to the exchange point are always available, never fail, and scale elastically to meet demand.

The main instruments used to make this happen are virtual machine instances launched via "infrastructure-on-demand" services like Nimbus and EC2, typically referred to as "IaaS" (Infrastructure-as-a-Service).

Figure 1 shows an illustration of the observe-decide-act feedback loop provided by the CEI and its EPU. Based on observations of the system and its environment, decisions are made according to policy, and those decisions are automatically enacted. The results of these actions feed back into the system observation process. This enables the realization of an automatically scaling or self-healing system.

Figure 1. Feedback loop Observe-Decide-Act (OV-1)
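A minimal sketch of such an observe-decide-act loop follows, with invented policy thresholds; this is not the EPU Controller's actual decision engine:

```python
# Observe a load metric (queue length), decide against a policy, act by
# adjusting the number of worker instances. Thresholds are illustrative.

def decide(queue_length, workers, high=10, low=2):
    """Policy: scale up when the queue backs up, down when it drains."""
    if queue_length > high * workers:
        return "scale_up"
    if workers > 1 and queue_length < low * workers:
        return "scale_down"
    return "hold"

def act(decision, workers):
    if decision == "scale_up":
        return workers + 1
    if decision == "scale_down":
        return workers - 1
    return workers

workers = 1
for observed_queue_length in [25, 25, 4, 1]:   # simulated observations
    decision = decide(observed_queue_length, workers)   # decide
    workers = act(decision, workers)                    # act
    print(observed_queue_length, decision, workers)     # feeds the next observation
```
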

2. Terms

See Elastic Computing Terminology.

3. Assumptions

The CEI software assumes that the following components and functionality already exist.

3.1 Messaging Service

A Messaging Service provides a flexible, asynchronous way to deliver messages from an entity in the OOI system to any other entity (subject to policy). The current implementation relies heavily on RabbitMQ AMQP brokers.

3.2 Capability Container

A Capability Container is a container that runs service code and provides it (directly or via proxy) with any infrastructure service it needs. It is responsible for initializing service code and keeping it alive with the local system resources necessary. It allows application/service code to easily adapt to the Messaging Service. It subjects all service activity to the configured policies.

3.3 Exchange Space

An Exchange Space is realized by a collection of Messaging Service instances that have a mutual security/namespace agreement that allows entities to address one another (subject to configured policies). A client or service is "enrolled" in the exchange space and from then on is a member of the system (this works much like a VPN that realizes an overlay network). The mechanics and details of this are outside the scope of this document.

3.4 Exchange Point

An Exchange Point receives and manages messages, manages and fulfills subscriptions, has an identity, has a message persistence strategy, is reified across multiple brokers and is a finite state machine (FSM).

See:

http://oceanobservatories.org/spaces/display/syseng/CIAD+COI+SV+Messaging http://oceanobservatories.org/spaces/download/attachments/20513320/2660-00008_SV2_CI_Messaging_Actor_Model.png

3.5 Reliable Data Storage

A data store will exist that allows a service to store information that will persist beyond crashes.

It must be transactional: the CEI service writing information must know that a set of writes has completed (so that it can, e.g., correctly move on with an internal state change).

It must be consistent: the moment a CEI service writes something, a subsequent read by the service (e.g. if it is in recovery mode after a crash) should return that written data, not a previous value.

3.6 Stateful vs. Stateless Services

There are two types of CEI services that will be written.

A "stateless" service is configured by an external entity with information at boot time. The information is used during runtime and is not changed - or, if it were changed, it is not of consequence to the high-availability scheme: i.e., the service can die and be restarted by an external entity without any participation of an up-to-date data storage read.

A "stateful" service may only be minimally configured by an external entity with information at boot time, but it reads the information it needs from a reliable data storage service (see the assumption "Reliable Data Storage" above) to recover from crashes. During its runtime, it constantly updates this data storage system with any information it would need stored in order to recover gracefully.

4. Design Overview

Some OOI context is presumed.

Consult the following diagram:

and the more detailed instance diagram below:

The diagram only contains components in one exchange space that can address each other by name. The "Transform-v2 service" is required to be highly available.

An instance of a component called the EPU Controller is started in order to make the Transform service highly available. It declares that an "HA-Transform" exchange point be created in the messaging fabric. This instance will be called "the EPU Controller for HA-Transform-v2."

Assume a highly available Provisioner service has been brought online in an exchange space; it is named "HA-Provisioner," and how it itself is made highly available is explained later in the document. Right now we are only discussing how non-CEI services are made highly available.

The provisioner is responsible for adapting to IaaS sites, enabling other CEI components to request contextualized VM instances and track their status (both in terms of VM lifecycle and contextualization status).

The EPU Controller for HA-Transform makes a request to the HA-Provisioner endpoint that an instance of a specific deployable type be started. This deployable type is known via configuration to start Transform service instances.

The provisioner launches a VM that, through contextualization, runs a capability container that runs an instance of the Transform service which we will call "Transform-0".

An onboard agent in the same capability container, called an EPU worker, will be configured to retrieve the next work message from the HA-Transform exchange point.

Now when a client (requesting service) of the HA-Transform service sends a message there will be an instance of the Transform service to accept the message.

A CEI component called the Sensor Aggregator was started at the same time this particular instance of the EPU Controller was started. It is specific to this service and we will call it "the Sensor Aggregator for HA-Transform."

It subscribes to information about the exchange point, the specific instances launched via the provisioner, and obtains any other data relevant to the EPU Controller's decisions.

Using information obtained from the Sensor Aggregator for HA-Transform, the EPU controller for HA-Transform is constantly evaluating the current state of the HA-Transform exchange point and the service instances. In response to certain situations, it can start/stop the appropriate numbers of compensatory instances to handle the current Transform service load. This decision is informed by the sensor aggregator's data as well as policies specific to the sites, situation, money, clients, etc. that are relevant.

5. CEI Components

5.1 Provisioner

Contains adapter logic needed for any context broker and IaaS implementation. Keeps track of the state of any VM instance or context that has been launched.

There is one provisioner in each exchange space; it is itself run as a high-availability, EPU-ified service. It is written to be stateless: an instance of it can be instantiated and use the data store to know what internal tasks it should launch to recover.

Operations:

Launch and contextualize a specific number of a specific deployable type at a specific site, identifying the launched entities with unique client-provided identifier(s) (implementation choice is UUID(s)).

Destroy given operational unit(s) (client gives UUID(s)) that were launched.

Subscribe to the state of given operational unit UUID(s) that were launched.

Entities can subscribe to receive status updates about anything launched.
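The three operations above can be sketched as a minimal interface. All names (`Provisioner`, `launch`, `destroy`, `subscribe`) and the state strings are illustrative assumptions, not the actual CEI interface.

```python
import uuid

class Provisioner:
    """Hypothetical sketch of the Provisioner operations listed above."""
    def __init__(self):
        self.units = {}        # unit UUID -> last known state
        self.subscribers = {}  # unit UUID -> status callbacks

    def launch(self, deployable_type, site, count):
        """Launch `count` instances of a deployable type at a site;
        return the unique identifiers for the launched entities."""
        ids = [str(uuid.uuid4()) for _ in range(count)]
        for unit_id in ids:
            self.units[unit_id] = "REQUESTED"
            self._notify(unit_id)
        return ids

    def destroy(self, unit_ids):
        """Destroy previously launched operational units by UUID."""
        for unit_id in unit_ids:
            self.units[unit_id] = "TERMINATED"
            self._notify(unit_id)

    def subscribe(self, unit_id, callback):
        """Receive status updates about anything launched."""
        self.subscribers.setdefault(unit_id, []).append(callback)

    def _notify(self, unit_id):
        for cb in self.subscribers.get(unit_id, []):
            cb(unit_id, self.units[unit_id])

prov = Provisioner()
ids = prov.launch("transform-service-v2", site="ec2-east", count=2)
events = []
prov.subscribe(ids[0], lambda u, s: events.append(s))
prov.destroy(ids)
assert prov.units[ids[0]] == "TERMINATED"
```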

In order to work as an IaaS and context broker client, it must be brought online with the necessary credentials. A root-owned file stores these secrets; the secrets are written to it during contextualization, and the file only ever lives on operational units, never on deployable units or seed deployable units.

Relies on the Deployable Type Registry Service to look up "how to get from the needed type to a running instance"; see the subsection "Deployable Type Registry Service" below.

5.2 Deployable Type Registry Service

The Provisioner needs to look up in this registry service what a deployable type actually "means" in terms of what it needs to launch.

The Deployable Type Registry Service is essentially a key/value store that maps needed types ("Transformer Service v3") into most of the needed information for a launch.

The deployable unit that, along with contextualization, will allow the deployable type to be realized as an operational unit is assumed to have been deployed to the site in question.

Essential inputs:

Deployable type
Site to run it

Essential outputs:

Contextualization document template. This is described in detail later, but it contains any necessary information to bring about the desired instance of the deployable unit (including the EC2 AMI identifier or Nimbus image name/location). This will also include whatever contextualization data it takes to do on-the-fly conversion of a seed deployable unit into the desired type of deployable unit as the operational unit is instantiated.
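Since the registry is essentially a key/value store, the lookup can be sketched directly. The image names, template fields, and site identifiers below are invented for illustration.

```python
# Hypothetical registry content: (deployable type, site) maps to the
# information needed for a launch, including the image identifier and
# a contextualization document template.
REGISTRY = {
    ("transformer-service-v3", "ec2-east"): {
        "image": "ami-00example",          # EC2 AMI identifier (fake)
        "ctx_template": "run capability container; start ${service}",
    },
    ("transformer-service-v3", "nimbus-uc"): {
        "image": "seed-base.img",          # seed deployable unit (fake)
        "ctx_template": "install ${service}; run capability container",
    },
}

def lookup(deployable_type, site):
    """Essential inputs: deployable type and site. Essential output:
    the launch information for that combination."""
    try:
        return REGISTRY[(deployable_type, site)]
    except KeyError:
        raise LookupError(f"unknown type/site: {deployable_type}@{site}")

entry = lookup("transformer-service-v3", "ec2-east")
assert entry["image"] == "ami-00example"
```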

5.3 Sensor Aggregator

The sensor aggregator obtains data about exchange points and EPU workers, among other metrics like operational unit data/statuses. It uses various mechanisms to obtain this data and presents it all via subscription to the EPU controller.

Examples of things it could monitor:

Queue draining rate
Available disk space of operational units
CPU load of operational units
Network load?
Operational unit status [1]

[1] - The sensor aggregator is what subscribes to the Provisioner for state changes. Technically the EPU-Controller will make "create" calls to the Provisioner and cause the Sensor Aggregator to be subscribed (TODO: how can that happen exactly with the available mechanisms, the alternative is to have the EPU Controller call the SA and tell it to subscribe to a specific UUID which is not as compact/atomic of a procedure. It would be great to simply have a way to get a particular Sensor Aggregator to subscribe to any single instance that was started by a particular EPU Controller).

5.4 EPU Controller

Each unique reliable service has a unique EPU controller instance.

The main responsibility of the controller is to evaluate data from the sensor aggregator (see Sensor Aggregator section below) against policies and cause the correct compensation actions to occur if necessary.

All compensation actions will be attempted via messages to the HA-Provisioner.

Basic examples of actions: create one instance of deployable type XYZ, destroy one instance of deployable type XYZ

Each controller instance must itself tolerate failure: it can die unexpectedly, be brought back up by a fault compensation supervisor instance, and continue its work where it left off.

Each controller instance is itself running in an operational unit.

It is bootstrapped (during the instantiation of the operational unit) with information:

one exchange point to create and monitor (i.e., one HA service to provide)
one deployable type it can launch as compensation
policies/heuristics about the particular interaction patterns (metrics need context in order to make the right compensation decisions)
policies/heuristics about deployable type sizing

Each unique reliable service has a unique Sensor Aggregator instance and the controller subscribes to updates from this in order to get information about the running system.

5.4.1 EPU Controller Decision Engine

The EPU Controller instance contains a stateless decision engine that is constantly evaluating the following inputs:

sensor data
given policies/heuristics [1]

[1] - Live reconfiguration of policies/heuristics to use is out of scope of this document, the policies are presumed to be configured just once when the service is instantiated.

This decision engine makes the decision about what compensatory units to deploy, terminate or cleanup for the current situation given a set of constraints (in various dimensions).
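The evaluation described above can be sketched as a pure function: sensor data in, compensation actions out. The policy fields and sensor keys here are illustrative assumptions, not the actual CEI representation.

```python
def decide(sensor_data, policy):
    """Stateless evaluation pass: return compensation actions that the
    EPU Controller would hand to the Provisioner."""
    actions = []
    healthy = sensor_data["instance_count"] - sensor_data["failed_count"]
    # Replace failed instances up to the policy minimum.
    if healthy < policy["min_instances"]:
        actions += [("create", policy["deployable_type"])] * (
            policy["min_instances"] - healthy
        )
    # Scale up when the work queue is backing up, within the maximum.
    elif sensor_data["queue_depth"] > policy["queue_high_water"]:
        if healthy < policy["max_instances"]:
            actions.append(("create", policy["deployable_type"]))
    # Scale down when idle, but never below the minimum.
    elif sensor_data["queue_depth"] == 0 and healthy > policy["min_instances"]:
        actions.append(("destroy", policy["deployable_type"]))
    return actions

policy = {"deployable_type": "transform-v2", "min_instances": 2,
          "max_instances": 8, "queue_high_water": 100}
# One failed instance out of two: the engine compensates.
assert decide({"instance_count": 2, "failed_count": 1, "queue_depth": 5},
              policy) == [("create", "transform-v2")]
# Deep queue: scale up.
assert decide({"instance_count": 2, "failed_count": 0, "queue_depth": 500},
              policy) == [("create", "transform-v2")]
```

A provisioning error would surface as new sensor data (e.g. a higher failed count) and be handled on the next pass, matching the feedback-loop design above.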

The EPU Controller then is able to task the Provisioner with accomplishing its goals.

An error in provisioning (e.g. the IaaS site simply rejects the request) will result in new sensor data that needs to be evaluated during the next "pass" of the decision engine (see below).

How resources are represented will need to be elaborated in the future. It is not just an internal representation, since policy writers need to be able to express their heuristics about how many resources, and what kinds of resources, will (likely) cause certain desired compensating behavior. This will either be hardcoded or very simple for the current scope.

TODO: define different states in which compensatory units can find themselves as viewed from the EPU controller instance.

5.5 EPU Worker

Any message directed to the HA service address will be enqueued at one specific Exchange Point. The EPU controller monitors this Exchange Point but does not draw messages from it. Instead, via the Provisioner, it launches EPU workers that are contextualized to subscribe to this one Exchange Point.

Each message addressed to the reliable service is handled by a service instance in a particular operational unit. The applicable operational unit that can handle the messages in question runs an EPU worker agent.

An EPU worker agent requests the next work message from the exchange point. It is configured to either draw "any" next message or a message with a specific conversation ID (session). Conversation IDs are out of scope of this document.

The message is delivered and passed to the service instance running in the same COI capability container.

The consumption rate for each worker will be based on an "on-board" policy about how many messages in this interaction pattern it can be processing at once. The EPU worker agent is configured during the instantiation of the operational unit where it is running.
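The on-board consumption policy can be sketched as follows; the class name and the use of a local queue as a stand-in for the Exchange Point are assumptions for illustration.

```python
import queue

class EpuWorkerAgent:
    """Hypothetical worker agent drawing work from its configured
    exchange point, bounded by an on-board concurrency policy."""
    def __init__(self, work_queue, max_in_flight):
        self.work_queue = work_queue        # stand-in for the Exchange Point
        self.max_in_flight = max_in_flight  # on-board consumption policy
        self.in_flight = []

    def maybe_take_work(self):
        """Draw the next message(s) only while under the policy limit;
        messages beyond the limit stay queued for other workers."""
        while len(self.in_flight) < self.max_in_flight:
            try:
                msg = self.work_queue.get_nowait()
            except queue.Empty:
                break
            self.in_flight.append(msg)  # hand off to the service instance
        return list(self.in_flight)

xp = queue.Queue()
for i in range(5):
    xp.put(f"work-{i}")
agent = EpuWorkerAgent(xp, max_in_flight=2)
assert agent.maybe_take_work() == ["work-0", "work-1"]
assert xp.qsize() == 3  # remaining messages stay on the exchange point
```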

TODO: clarification of consumption rates

NOTICE: This component cannot be developed without an Exchange Point implementation. In lieu of this, a service is "just a service" and it can draw messages from a named queue as normal.

5.6 OOI bootstrap commandline

Described in the next section, "Bootstrapping".

6. Bootstrapping

See System Bootstrapping

7. High Availability

7.1 Definition of HA

Observable Latency

One of the primary requirements of a reliable service (an "EPU-ified service") is that it never goes down; it must "always" be available.

Strawman definition of "always": 0.001% (five nines) of unanticipated downtime for user-observable services (for an entire month's deployment this is 26 seconds!).

Strawman definition of "user observable": a module's service interaction messages are the user-observable services that the EPU "fulfills."

Strawman definition of "downtime": all EPU workers need to pick up messages within a certain time period. The current idea of what that time period is: one or two seconds.
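The arithmetic behind the strawman downtime budget checks out (a 30-day month is assumed):

```python
# Five nines of availability (99.999%) leaves a 0.001% downtime budget.
SECONDS_PER_MONTH = 30 * 24 * 3600   # 2,592,000 s in a 30-day month
downtime_fraction = 0.00001          # 0.001% unanticipated downtime

budget = SECONDS_PER_MONTH * downtime_fraction
assert round(budget) == 26           # roughly 26 seconds per month
```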

7.2 Failure Matrix

This table explains what happens when any operational unit in the system is corrupted. Each entry gives the operational unit, the services on it, and the failure notes.

Your laptop - epucontrol program: This is the root supervisor, serving as the "last resort" supervision code for the time being. If it itself goes down, there is nothing left but a human to notice. This code will expand in the future: it will register supervisors with the operations team and exit.

#1 - Context Broker #0: epucontrol monitors and restarts this instance.

#2 - MessagingService #0 (includes all Exchange Point instances): EPU Controller & Sensor Aggregator for HA-MessagingService (and HA-Provisioner) ensure there are always at least two MessagingService instances (or N if policy/situation requires it, or one if in development mode).

#3-N - MessagingService #1-N (includes all Exchange Point instances): EPU Controller & Sensor Aggregator for HA-MessagingService (and HA-Provisioner) ensure there are always at least two MessagingService instances (or N if policy/situation requires it, or one if in development mode).

#4 - CoreServices+DTRS #0: EPU Controller & Sensor Aggregator for HA-CoreServices (and HA-Provisioner) ensure there are always at least two CoreServices+DTRS instances (or N if policy/situation requires it, or one if in development mode).

#5-N - CoreServices+DTRS #1-N: EPU Controller & Sensor Aggregator for HA-CoreServices (and HA-Provisioner) ensure there are always at least two CoreServices+DTRS instances (or N if policy/situation requires it, or one if in development mode).

#6 - Provisioner #0: EPU Controller & Sensor Aggregator & Provisioner-Provisioner for HA-Provisioner monitor and restart this instance.

#7-N - Provisioner #1-N: EPU Controller & Sensor Aggregator & Provisioner-Provisioner for HA-Provisioner ensure it exists (or N if policy/situation requires it).

#8 - Base CEI Instance (EPU Controller & Sensor Aggregator & Provisioner-Provisioner for HA-Provisioner): epucontrol monitors and restarts this instance; there is only a need for one of them.

#9 - EPU Controller & Sensor Aggregator for HA-MessagingService: epucontrol monitors for failure and restarts.

#10 - An EPU Controller & Sensor Aggregator instance for each HA core service: epucontrol monitors for failure and restarts.

#N - ServiceX #0: EPU Controller & Sensor Aggregator for HA-ServiceX (and HA-Provisioner) monitors and restarts this instance if necessary.

#N - ServiceX #1: EPU Controller & Sensor Aggregator for HA-ServiceX (and HA-Provisioner) monitors and restarts this instance if necessary.

#N - ServiceY #0: EPU Controller & Sensor Aggregator for HA-ServiceY (and HA-Provisioner) monitors and restarts this instance if necessary.

#N - ServiceZ #0: EPU Controller & Sensor Aggregator for HA-ServiceZ (and HA-Provisioner) monitors and restarts this instance if necessary.

8. Possible Deficiencies

Scale-down

Without two-level scheduling (direct management of scheduling/compensating service instances inside the first layer of scheduling/compensating operational units), there seems to be too much opportunity for underused VM instances in a scale-down situation.

A compromise strategy might be to have an EPU Controller that declares sets of Exchange Points, say a bundle of 5-10 very common services. One deployable type is provisioned that causes one new operational unit to exist which hosts these 5-10 very common services.

Context broker HA

The context broker does not currently receive AMQP messages, so the HA strategy for it will need to be discussed. We think it can currently handle thousands of concurrent instances, so keeping one instance alive should be "ok" for now if it can recover from failures gracefully. The contextualization process includes a "keep trying" semantic in the VM ctx-agents, so if the broker is not around for a minute the launching instances will simply retry.

Restart speed / dual instances

If "one or two seconds" is all that a service can be absent, the strategy in this document will not work for HA services that only have one operational unit servicing messages from an exchange point.

Considering it could take upwards of a few minutes to launch a VM instance to replace a corrupted operational unit, there would have to be two always running, probably in different "availability zones" (essentially: different data centers with different networks and power).

CIAD CEI OV Elastic Computing Terminology

Elastic Computing Terminology

2.1 Deployable type

An abstract description/template/recipe of an environment in terms of software packages, OS, etc. When instantiated, it will perform a specific task in the OOI network. Deployable types are registered and made available for automatic instantiation or modification by users based on a configuration of software deployment packages. Deployable types are independent of any specific execution site format. Each particular type will have its own unique identifier in the Deployable Type Registry Service (a CEI component defined later).

Example: VM template image that contains a particular OS (say, Ubuntu 9.04) with particular libraries (say, Python 2.6.4) that all run a specific version of the COI capability container and a specific set of services ("Transformer service v.0.4" etc.). A new permutation of software (even at the version level) necessitates a new unique identifier in the Deployable Type Registry Service.

2.2 Deployable unit

A specific rendition of a deployable type, e.g., a VM image registered for use at the Amazon EC2 service or Nimbus repository. There could be simple and complex deployable units. Complex deployable units represent virtual clusters (a collection of VMs that share a security and configuration context) or e.g. a set of units representing a workflow platform. In practice, a "seed" deployable unit will be the actual image "bits" in the repository (see below).

Example: In practice this will not be a particular AMI instance or Nimbus repository image; it will be a seed deployable unit coupled at boot time with whatever we need to contextualize on it to make an operational unit that represents the desired deployable type.

2.3 Operational unit

An instantiated (i.e., deployed) deployable unit, which by inheritance also means an instantiated deployable type. An operational unit is created at deploy time through the process of contextualization.

Example: a contextualized, running instance of the desired deployable type. For example, an Ubuntu 9.04 instance with Python 2.6.4 installed, running a specific version of the COI capability container and a specific set of services ("Transformer service v.0.4" etc.) that were brought up during the VM's instantiation and contextualization process.

2.4 Contextualization

The process executed immediately after instantiation of a deployable unit before it becomes an operational unit. In practice this will be used in different phases of bringing an operational unit into existence.

Security/enrollment bootstrap into the OOI network (bootstrapping the Capability Container)
Other higher-level registrations
Turning a Seed Deployable Unit into the required Deployable Unit

A seed deployable unit is an optimization for launching operational units that the Provisioner (defined later in this document) will take. It saves a lot of human time to have a slim virtual machine registered for use at specific sites which is transformed into the desired instance of the deployable type that was requested by the system. It is deployment-type-specific whether or not this is a good strategy. Any time the strategy is used, it is an encapsulation behind "deployable type --> operational unit", which is the mapping that really matters to anything using the Provisioner service. Entities ask for deployable types and operational units are brought into being.

Example: There are many ways we will use contextualization (bootstrapping, etc.); some of the higher-level scenarios are called out here.

2.5 Service

An entity in the system that can be found and addressed by name and realizes a specific purpose. Nothing is known about the location of the service or its internal structure/implementation. It is registered in the COI service registry. Provided by a deployed software component package, integrated through a capability container.

Consult this page for more about services. In the context of that explanation of services: an important idea for the EPU architecture is that there is an out-of-band inspection of the messages between the Requesting Service and the Providing Service. Picture: Service-Integration-Invisible-Hand.png (TODO)

Example: "Transform service" "Process Registry Service" "Data Stream Registry"

2.6 Reliable Service / HA Service

A service that is backed by EPU infrastructure. It is addressable by a unique name that an entity can direct a request to. The messages are queued up at this exchange point but actually processed by unique service instances (the mechanics of this are explained in detail later in this document).

Example: "HA-Transform service" (this document uses this fake example)

2.7 Processes

A service is a process that runs for the entire life of an operational unit. There is also a notion of a "task-process": a process started (and potentially cancelled) independently of operational units, but that is out of scope of this document.

Terminology, Domain Models

(the following may reflect slightly outdated information)

Figure 1 depicts the main concepts related to the operational system, its deployment and its implementation architecture.

Figure 1. CEI Operational, Deployment, and Implementation Artifacts (OV-7)

Table 1 defines these terms and others relevant for the CEI.

Enterprise: The distributed system of systems. For OOI, this is the OOI Integrated Observatory, a federation of facilities sharing resources and providing services based on inter-facility agreements. Composed of a number of applications.

Application: A distributed system that provides resources and services to satisfy a number of user concerns. Composed of modules.

Service: Functional capability available to the integrated observatory through the observatory network. Provided by a deployed software component package, integrated through a CI capability container. A service hides the actual implementation technology and details. It is registered in the COI service registry.

Module: One distinct functional capability in a distributed system and the unit of granularity in an operational distributed system architecture. Modules can be hierarchically decomposed if necessary. Used synonymously with HA Service.

HA Service: High availability service. A service that is hosted by EPU infrastructure or otherwise realizes high availability. Used synonymously with module.

EPU: An EPU (Elastic Processing Unit) provides the view of an ideal node in a distributed system providing services that are always available, never fail, and that scale internal resources elastically to demand. An EPU is a distributed system made of multiple operational units to provide EPU management services, and hosts additional services in order to make them highly available. An EPU can be geographically disparate and load-balanced transparently to the service requester; it hides all deployment, resource allocation and network management aspects from the requester. An EPU requests the instantiation and termination of operational units from a provisioner, and contains planner and controller components to manage its operational units. Different EPUs have different availability and scaling strategies. An EPU is itself a realization of an HA Service with the purpose of providing other HA Services, i.e. it is the deployment infrastructure for a module. A separate section discusses the EPU.

Software Component: Binary software unit for integration. Once operational, will provide services to the application through the network, most likely by using an integration infrastructure such as a set of CI services, user-provided functionality and applications. A software component should be available in its source code representation together with an automatic build process (makefile).

Software Component Package: Wraps and describes a software component. Packages can depend on other packages. A package has a unique name and a version. Packages can be stored in a repository.

Exchange (COI): The messaging integration infrastructure of the OOI, providing a message publish-subscribe model for reliable asynchronous communication.

Configuration: A set of software component packages, consistently identifying dependencies and versions.

Deployable Type: An abstract description of an environment in terms of software packages, OS, etc. A configuration, consisting of all packages consistent with the dependencies required to perform a specific task in the OOI network. Deployable types are registered and made available for automatic instantiation or modification by users based on a configuration of software deployment packages. Deployable types are independent of any specific execution environment.

Technical Environment: Technologies, interfaces and frameworks required to deploy and instantiate a node in the network. Part of the technical environment is the specification of how to interact with operational units of nodes for monitoring and management purposes.

Operational Environment: The dynamic environment within a network or application as specified and determined by the network, resource and application providers. This may include network address settings, application cluster node configurations, registration in service registries, or specialized messaging end-points.

Execution Environment: Comprises both the technical and the operational environment.

Deployable Unit: Adaptation of a deployable type to a specific technical environment in the environment's binary packaging form, for instance as a virtual machine image. Can be made available through a repository. Deployable units can be deployed and instantiated automatically on any compute node that provides a compatible execution environment. A self-contained package in a format specific to an execution environment; can be deployed, instantiated and managed in an execution site, such as by a virtual machine image. There could be simple and complex deployable units. Complex deployable units represent virtual clusters (a collection of VMs that share a security and configuration context) or, e.g., a set of units representing a workflow platform.

Adaptation: The automatic process of creating a deployable unit specific to an execution environment and execution site out of a deployable type, by adding software components required for instantiation, contextualization, monitoring, and management in the specific execution environment; and adapting to a specific deployment environment, such as by packaging it as a virtual machine image in a specified format.

Operational Unit: Instantiation of a deployable unit in an execution environment. A deployed VM/environment/appliance. Will perform initialization and contextualization activities specific to the execution environment. Can host multiple processes.

Contextualization: The process executed after instantiation of a deployable unit before it becomes an operational unit: to adapt to available resources of the specific execution resource; to determine and set basic network parameters; to register itself in the execution environment; and to register itself as an operational node within the application environment (herd) with its capabilities.

Execution Engine: A service for executing process definitions in a specific process definition language, such as Kepler workflows or Matlab scripts. The execution engine manages process scheduling and dispatching. An execution engine can be provided by an EPU, in which case it can elastically scale operational units to demand.

Definition Language: The source format for process definitions, such as the Matlab script language, as understood by an execution engine.

Process Definition: User-provided description of a process, such as source code, as interpreted and executed by a specific execution engine.

Process Configuration: Additional parameters, configuration values, or dataset bindings required for the instantiation of a process from a process definition and for the execution of the process in a specific execution environment.

Process: Startable software specific to a supported execution engine.

Task-process: A task-process is started and stopped independently. In particular, some of the processes CEI will manage are core CI services in embedded environments, such as on sensor buoys.

Service-process: A service-process comes into being when the operational unit hosting that process is deployed. It remains alive for as long as the hosting operational unit is deployed.

Process Definition Repository: Repository for registering and storing process definitions. Specialization of the taskable resource definition repository.

Facility (from COI): Collection of resources and services provided within one domain of authority. The OOI is formed as a federation of multiple facilities. Sharing of resources requires inter-facility agreements.

Operating Organization: Stakeholder operating one facility and domain of authority, authorized to form agreements with other facilities and external organizations, such as resource providers.

Agreement: Agreements are subject to policy and instill policy themselves. Agreements represent individual commitments of the respective parties. Commitments lead to obligations.

Execution Provider: Organization that provides execution and storage resources based on agreements with an OOI facility. This can be an organization such as the Amazon Elastic Cloud or the TeraGrid.

Execution Site: The collection of execution and storage resources provided by an execution provider that are accessible over the network and provide a homogeneous execution environment for operational units, along with management services for remotely managing resources within a site.

Execution Resource: One physical or virtual node within an execution site, such as a virtual server in the Amazon cloud or on an OOI server. A deployable unit for the correct execution environment can be deployed on one such execution resource, thus forming an operational unit.

Storage Resource: Physical or virtual unit for arbitrary data storage and retrieval. Can be a network file system, a distributed database, or a read-only data-warehouse interface. Operational units can make use of many storage resources.

Provisioner: A service responsible for instantiating deployable units on request.

Computation Planner: A service responsible for making decisions about the need for taskable resources in the system over time, based on demand and the current state of the execution environment.

Deployable Unit Repository: Repository containing deployable units for access by the EPU provisioner and other applications across the Enterprise. Specialization of the taskable resource definition repository.

Execution Environment Adapter: Service that automatically performs the adaptation of a deployable type to a deployable unit for a specific execution environment. This includes packaging into the required binary representation as well as adding functional components specific to management, contextualization and control for a specific execution environment.

Component Repository: Repository that maintains deployable types as well as their building blocks in the form of software component packages. Specialization of the taskable resource definition repository.

Taskable Resource: A special category of resource that can be scheduled, executed, monitored and controlled. Each taskable resource has an agent that can control the resource. There are special realizations of taskable resources that all share the common elements.

Figure 2 provides a domain model specifying the dependencies between the different implementation, integration, deployment, and operational concepts in the OOI. It shows the services that manage these concepts and the organizational entities responsible for them.

Figure 2. CEI Artifacts and Activities Domain Model (OV-7)

CIAD CEI OV Elastic Processing Unit

Overview

Figure 1 shows an overview illustration of an EPU. A user (in this case an OOI subsystem developer) intends to request an OOI service. Per OOI requirements and architecture, all OOI services are guaranteed to be highly available. The user requests the HA service. Internally this service is composed of a worker queue (i.e. an inbox for work request messages) and a number of worker processes. The worker processes know how to independently perform the task requested by the work message. Workers are deployed on Operational Units, which are Virtual Machine Instances. These VM instances are created from VM images that are loaded with the software packages required to perform the work.

Figure 1. Elastic Processing Unit User View (OV-1)

Terminology

An EPU (Elastic Processing Unit) is defined as providing the view of an ideal node in a distributed system providing services that are always available, never fail, and that scale internal resources elastically to demand. An EPU is a distributed system made of multiple operational units to provide EPU management services, and hosts additional services in order to make them highly available. An EPU can be geographically disparate and load-balanced transparently to the service requester; it hides all deployment, resource allocation and network management aspects from the requester. An EPU requests the instantiation and termination of operational units from a provisioner, and contains planner and controller components to manage its operational units. Different EPUs have different availability and scaling strategies. An EPU is itself a realization of an HA Service with the purpose of providing other HA Services, i.e. it is the deployment infrastructure for a module.

EPU Architecture

A service is made highly available (HA) by being hosted with EPU infrastructure.

An HA service is addressable by a unique name to which an entity can direct a request; messages are first processed by the EPU infrastructure.

If the name corresponds to existing applicable operational units (i.e., units hosting processes that can act in response to a particular request), a suitable operational unit (or process) is selected and the request is forwarded there. If no such operational units exist, a name resolution procedure triggers the deployment of suitable operational units. The population of available operational units is regulated based on need by the EPU infrastructure.
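As a sketch, this name-resolution and forwarding behavior might look like the following (the dispatcher class, the in-memory unit registry, and the provisioning callback are illustrative assumptions, not the actual EPU implementation):

```python
import queue

class EpuDispatcher:
    """Illustrative EPU front end: resolve an HA service name to an
    applicable operational unit, deploying a unit on demand."""

    def __init__(self, provision):
        self.provision = provision   # callable: service name -> new unit
        self.units = {}              # service name -> list of operational units

    def dispatch(self, service_name, message):
        # Name resolution: if no applicable unit exists, trigger deployment.
        if not self.units.get(service_name):
            self.units.setdefault(service_name, []).append(
                self.provision(service_name))
        # Select a suitable unit (here: the least loaded) and forward.
        unit = min(self.units[service_name], key=lambda u: u.qsize())
        unit.put(message)
        return unit

def provision(service_name):
    # Stand-in for provisioning a VM-backed worker; here just a work queue.
    return queue.Queue()

dispatcher = EpuDispatcher(provision)
unit = dispatcher.dispatch("dataset_ingest", {"op": "ingest", "id": 42})
```

In the real system, the worker queues would be AMQP queues on the Exchange rather than in-process queues, but the resolution-then-forward control flow is the same.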

Figure 2. Inner structure of an EPU (OV-2)

i. EPU as dispatcher: controller and workers

Each unique reliable service has a unique EPU controller. An EPU controller is a unique entity that is itself a high-availability set of running processes. To ensure high availability in a virtual machine implementation of EPU infrastructure, the controller is spread across *at least* two VM instances, in a scheme such that any instance can be destroyed or corrupted unexpectedly without loss of service. The EPU controller provides the backbone of the entire availability scheme; its implementation ideas are therefore given special consideration at the end of this document.

Each unique reliable service's messages are handled by EPU workers. An applicable operational unit that can handle the messages in question also has an *aspect* that makes it an EPU worker.

The EPU controller forwards messages to a worker by using an Exchange Point, an entity provided by COI.

The controller adds messages to a specific worker queue and an EPU worker retrieves messages from a specific worker queue (the operational unit is configured via contextualization at deploy time).

See the following sequence diagram:

Figure 3. Creating an EPU and worker instances (OV-6)

The "busy" state of the EPU worker is where actual work is done.

While the EPU is guaranteed to be present and operational, that does not mean that all loads can be handled. There may be situations where it is impossible to fulfill a request; in such a case, a message that is part of the interaction pattern will be fired off. That is, receiving back something like an "overloaded" fault is a necessary possibility in every service's formal interaction pattern. Thus, there is never a "meta" context that the service developer would also need to be aware of during development.
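To make this concrete, here is a hedged sketch of a service-side handler whose formal interaction pattern includes the "overloaded" fault (the threshold, message shapes, and field names are illustrative, not prescribed by the architecture):

```python
OVERLOAD_THRESHOLD = 100   # illustrative queue-depth limit

def handle_request(pending, request):
    """Accept a work request, or answer with the 'overloaded' fault that is
    part of the service's formal interaction pattern."""
    if len(pending) >= OVERLOAD_THRESHOLD:
        return {"type": "fault", "code": "overloaded",
                "detail": "request rejected; retry later"}
    pending.append(request)
    return {"type": "ack", "queued": len(pending)}

full = ["job"] * OVERLOAD_THRESHOLD
fault = handle_request(full, {"op": "work"})    # fault: the queue is full
ok = handle_request([], {"op": "work"})         # normal acknowledgement
```

Because the fault is an ordinary reply message, a requester handles overload with the same pattern-matching logic it uses for any other response.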

The EPU is not aware of the contents of the message, only its destination.

ii. EPU as monitor

The EPU will monitor currently deployed operational units. Are they in a usable state? Are they idle? Are they at capacity or over capacity?

It will also subscribe to other sensor inputs. These will be named services or queue subscriptions on the messaging bus.

iii. EPU as provisioner

The EPU may increase or decrease the number of operational units that are deployed for a particular service. The decision is made using that monitoring data, knowledge of pending requests, history-based "guess work" and other sensor inputs.

iv. EPU is itself a deployable type and operational unit

The EPU infrastructure is itself a deployable type and eventually realizable as an operational unit. It is deployed and shut down each time a new service version is brought online or decommissioned. That includes in the context of test systems or at any point in the life of the production system.

It is packageable and deployable: the same packaging and deployment customs, software registries and technologies that are used for any COI service apply to the EPU software.

In a virtual machine implementation scenario (such as is planned), an EPU deployment will consist of at least two VM instances of an EPU controller (see next section). Further, a reliable service (an "EPU-ified" service) may end up being made reliable by a set of VM instances that are also "backing" another service. These are cost/resource optimizations (more implementation notes follow at the end of this document).

Because the EPU is itself written in the context of the COI Magnet (capability container), it is directly addressable and monitorable.

Most of the following use cases will be triggered by human intent but that is not a necessity (especially as the platform matures and certain best practices are deemed automatable):

Policy configuration. The deployer may want to change the policy of a running system, or delegate the ability to change the running system's policies to a central mechanism (i.e., adjust the authorization rules of the management interface itself). The EPU controller will subscribe to sensors for monitoring data; the configuration of what to monitor and which AMQP queues to subscribe to could be set through this management interface, not just through the initial deployment configuration.

Influencing state. The provisioning decisions may be influenced by manual interventions; e.g., the EPU controller may be instructed to never acquire any more resources under any circumstances. The dispatching decisions may likewise be influenced by manual interventions; e.g., the EPU controller may be told to respond to all new requests with an "overloaded" message because it is being drained. This may be the typical way of upgrading an EPU controller's code, for example.

Availability

One of the primary requirements of a reliable service (an "EPU-ified" service) is that it never goes down; it must "always" be available.

Strawman definition of "always": at most 0.001% unanticipated downtime ("five nines" availability) for user-observable services.
Strawman definition of "user observable": a module's service interaction messages are the user-observable services that the EPU "fulfills."
Strawman definition of "downtime": all processes need to pick up messages within a certain time period; i.e., in the EPU's case, it needs to have "handled" each message enqueued to it in the messaging bus within a certain time period. The current idea of what that time period is: one or two seconds.
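The strawman numbers translate into a concrete downtime budget; a quick check (assuming a 365-day year):

```python
# "Five nines" availability = at most 0.001% unanticipated downtime.
downtime_fraction = 0.001 / 100          # 1e-5

minutes_per_year = 365 * 24 * 60         # 525600
budget_minutes_per_year = downtime_fraction * minutes_per_year  # about 5.3 min

seconds_per_day = 24 * 60 * 60           # 86400
budget_seconds_per_day = downtime_fraction * seconds_per_day    # under 1 s
```

So a five-nines EPU service has roughly five minutes of allowable unanticipated downtime per year, which is why the controller itself must survive the loss of any single VM instance.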

Part of the provisioning work of the EPU infrastructure is to respond to monitoring data that indicates an EPU worker is overloaded or not responding at all. In such cases, it will divert messages away from that worker and, in the worst case, destroy it altogether upon evidence of corruption.
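A provisioning decision of this kind can be sketched as a pure function of the monitoring data (the per-worker capacity figure and the policy bounds below are illustrative assumptions, not values from this specification):

```python
def plan_worker_count(queue_depth, busy_workers,
                      per_worker_capacity=10, min_workers=1, max_workers=20):
    """Decide how many operational units should be deployed, based on
    monitoring data (queue depth, busy workers) and simple policy bounds."""
    backlog_workers = -(-queue_depth // per_worker_capacity)  # ceiling division
    needed = busy_workers + backlog_workers
    return max(min_workers, min(max_workers, needed))
```

For example, `plan_worker_count(95, 2)` asks for 12 workers (2 busy plus 10 to drain the backlog), while an idle system stays at the configured minimum. A real EPU decision engine would also weigh pending requests and history, as described above.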

EPU Controller implementation notes

Because an EPU controller forms the backbone of the high availability plan for the entire CEI system, its implementation receives special consideration, both to illuminate the architecture and to analyze any design constraints dictated by implementations.

See candidate #1: EPU Controller Implementation 1

CIAD CEI TV Eucalyptus

Overview

The Eucalyptus technology forms the basis of the Ubuntu Enterprise Cloud (UEC), a community-supported alternative to Amazon's EC2 (i.e., you can build your own private cloud, manage it the same way as your Amazon machine images (AMIs), and expand into Amazon when scaling needs exceed the current computational budget). The reason for exploring UEC is that it holds the promise of being a mature, well-supported, and widely adopted technology in the future. With the release of Ubuntu Server 10.04 (April 2010), Eucalyptus is built into the distribution, and thus available to hundreds of thousands of users.

Domain Models

Eucalyptus offers a number of capabilities exposed through its euca2ools command line tools. There are four classes of capabilities: image management (AMIs), block storage (EBS), networking and security (IP addressing, VLANs), and managing virtual machines at run time.

Virtualization

The default configuration for Eucalyptus under UEC relies on libvirt and KVM for building and managing virtual machines. However, Xen is also supported with proper kernels (starting with version 9.04, Ubuntu no longer provides prebuilt Xen kernels, i.e. you have to compile them yourself).

CIAD CEI TV Nimbus

This page describes Nimbus as relevant for the OOI Integrated Observatory Network.

TBD

CIAD CEI OV Execution Engines

Overview

Execution engines are a specialized kind of operational unit capable of executing user-provided process definitions of specific types, or performing other daemonized operational services.

Release 1 will have the following types of Execution Engine:

Name | Technologies | Description
PythonCC | Python, Twisted | Capability Container (Python CC) for Python processes and services. Currently based on the LCAarch code base; soon ioncore-python. Provides access to all ION services.
JavaCC | Java 6 | Capability Container (Java CC) for Java processes and services. Includes the ioncore-java library. Container server TBD. Provides access to all ION services.
WebUI | JavaCC, Grails | Contains a Java CC plus the web user interface framework. Must have a DNS alias to the Internet.
MessageBroker | RabbitMQ | Message broker infrastructure of the Exchange (RabbitMQ). Prerequisite for any Capability Container to function.
Cassandra | Cassandra | The primary data storage technology for the system in Release 1. This is the persistence layer, the system's "transaction database".
iRODS | iRODS | Will come later during this release. Similar to Cassandra, this is a data storage technology.

Table 1. Release 1 Execution Engines

Note: Execution engines for specific types of user jobs (e.g. Matlab, Kepler, SQL stream processing) and the related process (job) scheduling management service will be targeted in Release 2.

Domain Models

Figure 1 shows a domain model with the main concepts related to execution engines. The terms are defined in the CEI Overview.

Figure 1. Execution Engine Domain Model (OV-7)

Planned future execution engines for the OOI are:

Execution Engine | Process Definition Language | Description
Matlab Engine | Matlab script | Embeds the Matlab application and provides the capability to execute Matlab scripts.
Workflow Engine (Kepler/Pegasus) | Workflow | Executes scientific workflows. Workflow execution can include automatic and interactive steps. Workflow engines include Kepler, or Pegasus mapping to Kepler.
Antelope / UNIX | UNIX process | An execution engine for embedded, low-resource environments. Provides a UNIX-based environment with the Antelope RTExec real-time scheduler, ORB ring buffer and Datascope database for generic UNIX processes and Antelope ORB processes. Communication with the CI-COI occurs via an ORB-to-Exchange bridge.
MOOS with MOOS-IvP | MOOS process, IvP behavior module | An execution engine for embedded, low-resource environments. Provides a MOOS environment with MOOSDB for the execution of and interfacing to MOOS processes. This environment provides a MOOSDB-to-Exchange bridge that connects the processes with the OOI message exchange for bi-directional communication and control. Optionally, provides a MOOS-IvP-Solver environment for the deployment and configuration of MOOS-IvP behavior modules; configuration of behavior modules is provided through the MOOSDB-to-Exchange bridge.

CIAD CEI OV Process Execution Management

The Process Execution Management service exists only in rudimentary form in Release 1. Other subsystem teams may implement it provisionally to suit their needs. A framework service implementation, and refactoring of existing implementations, is targeted for Release 2.

Such processes are used for:

Data Management/EOI data publisher processes
Data Management data consumer processes
Instrument agent processes

Intent

Processes are assigned for deployment on designated capability containers. Processes must be suitable for the specific type of CC (such as Python vs. Java processes). After Release 1, other process types will exist and any Execution Engine may be selected.

The available capability containers are only specially designated Process Execution Containers. Such capability containers host no long-running service processes. A specific type of EPU for these containers exists. This EPU may be N-preserving or elastic.

Based on operator, user or system need, processes are scheduled for execution on selected existing or freshly provisioned capability containers. Processes may be short lived or long-running.

The Process Execution Management services need to maintain a registry of processes and their assignment to capability containers. If capability containers (or their hosting VMs) fail, these processes need to be respawned on other suitable capability containers.

No multi-level scheduling is intended for Release 1. Either an operator action schedules processes to capability containers, or containers are selected randomly from the available designated "empty" containers. Containers that qualify are empty containers that have no purpose other than hosting spawned processes.
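The random placement rule could look like the following sketch (the container and registry data structures and the "process-execution" designation string are illustrative assumptions):

```python
import random

def schedule_process(process_id, containers, registry, rng=random):
    """Place a process on a randomly selected designated 'empty' container,
    i.e. one whose only purpose is hosting spawned processes."""
    candidates = [cid for cid, purpose in containers.items()
                  if purpose == "process-execution"]
    if not candidates:
        raise RuntimeError("no designated process execution containers")
    chosen = rng.choice(candidates)
    registry[process_id] = chosen    # registry: process -> container assignment
    return chosen

containers = {"cc1": "process-execution", "cc2": "service-hosting",
              "cc3": "process-execution"}
registry = {}
placed = schedule_process("instrument_agent_7", containers, registry)
```

The registry maintained here corresponds to the process-to-container assignment registry described above, which is what allows failed processes to be respawned elsewhere.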

CIAD CEI OV Registries and Repositories

Figure 1 shows an overview of various CEI registries and repositories:

Figure 1. CEI Registries and Repositories (OV-2)

CIAD CEI OV Taskable Resource Management

Taskable Resources

Resource categories:

Information Resources (managed by the Data Management subsystem)
Taskable Resources (managed by the CEI subsystem)

A Taskable Resource is defined as a special category of resource that can be scheduled, executed, monitored and controlled. Each taskable resource has an agent that can control the resource. There are special realizations of taskable resources that all share the common elements.

Properties:

Can be started and stopped. Started/stopped may have specific connotations, such as deployed/terminated or registered/unregistered.
Are monitorable.
Can be sent a control message.

Realizations/Subtypes:

Process (with subtypes task-process and service-process) Operational Unit HA Service

The following categorization of taskable resources exist:

Process
  Task process
  Service process
Operational Unit
  Execution Engine
  Capability Container
  Complex Operational Unit
HA Service
  A set of operational units that together provide a higher-level service (complex operational unit)
  A set of operational units that form an EPU (a complex operational unit that provides an infrastructure component but is managed nonetheless)
Application
Service
Agent

Figure 1 shows these categories of taskable resources and their dependencies. All taskable resources are managed by the CEI.

Figure 1. Taskable Resource Types and Dependencies (OV-2)

Management Activities

The following list defines the high-level management activities for taskable resources, as performed by the CEI:

Planning
Provisioning
Controlling
Fault Monitoring and Compensation
Registration

Figure 2 shows a decomposition of the Common Execution Infrastructure services. The core element is the Computation Planner, which receives Service Agreement Proposals from other services in the OOI that require computation. These Service Agreement Proposals contain the computation processing request as well as the exact conditions under which the plan should be executed. The Computation Planner enters a negotiation and agreement process with the requester. Upon agreement, the Computation Planner determines a Processing Plan and initiates the processing by triggering the Provisioner. The Provisioner brings actual Operational Units into being by interfacing with the specific host execution environment. The Computation Controller is the entity responsible for enacting the processing plan within the boundaries of the plan and the service agreement. It provides status about scheduled and ongoing computations to the Exchange for routing to the service requester. A Fault Monitor and Compensator is an independent entity monitoring any ongoing process and providing Fault Analysis information to the service requester.

Figure 2. Common Execution Infrastructure Services for Taskable Resource Management (OV-2)

Figure 3 shows a sequence diagram of how an operational unit gets provisioned through an interplay of Computation Planner, Provisioner and Controller and the newly provisioned Operational Unit.


Figure 3. Provisioning an Operational Unit (OV-6)

CIAD CEI OV Planner

The Planner service accepts requests for taskable resources and tries to meet them based on the resources available and existing policy. Note: this service is implemented only at a rudimentary level in Release 1.

Specific Planners exist for the following types of Taskable Resources in Release 1:

Elastic Processing Unit (EPU)

Service Interface

Operations

request_resource()

Message Types

TaskableResourceRequest ServiceAgreementProposal
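A minimal sketch of the request_resource() operation and its two message types (the field names, the capacity model, and the policy limit are illustrative assumptions, not the actual message schemas):

```python
def make_request(resource_type, count):
    """Build an illustrative TaskableResourceRequest message."""
    return {"type": "TaskableResourceRequest",
            "resource_type": resource_type, "count": count}

def request_resource(request, available, policy_max=10):
    """Answer a TaskableResourceRequest with a ServiceAgreementProposal,
    bounded by available resources and a simple policy limit."""
    granted = min(request["count"], available.get(request["resource_type"], 0),
                  policy_max)
    return {"type": "ServiceAgreementProposal",
            "resource_type": request["resource_type"],
            "accepted": granted > 0, "count": granted}

proposal = request_resource(make_request("EPU", 2), {"EPU": 3})
declined = request_resource(make_request("GPU", 1), {"EPU": 3})
```

In the full design the proposal is the opening move of the negotiation and agreement process with the Computation Planner described under Taskable Resource Management, not a one-shot answer.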

Hierarchical Planning and Control

Figure 1. Hierarchical Planning and Control

CIAD CEI OV Resource Agent

Resource Agents are a central class of agents with the purpose of representing and managing (taskable) resources in the ION system. Taskable resources are resources with internal state and potential behavior.

In Release 1, the following types of Resource Agent exist:

Instrument Agent
Operational Unit Agent: represents one Virtual Machine instance
Application Resource Agent: represents one application (e.g. a UNIX process on one VM)
Capability Container Agent
RabbitMQ Broker Agent
Cassandra Node Agent
Service Agent

Resource Agent Model

Figure 1 depicts a domain model describing resource agents for managed resources. The COI subsystem is responsible for defining and implementing a framework for resource agents, along with mechanisms to interact with these agents and to manage their associated resources. A Managed Resource may control and represent one physical resource. Alternatively, it could be a virtual resource, such as a proxy for a Managed Resource, a Function Block (service), or an executable process. A Resource Agent represents the Managed Resource. The agent monitors and controls the resource by maintaining an FSM representation of the resource's internal state. The agent also keeps track of the resource's relations to the environment, such as to an owner entity, an operating community, users and external communities. Contracts and commitments are defined by the COI Governance Framework. The agent also advertises the resource's capabilities to the environment.

Figure 1. Resource Agent Model for Managed Resources (OV-7)

Figure 2 shows an illustration of Resource Agents representing resources. The figure shows a physical resource (such as a sensor) as well as a service resource. Resource agents themselves can be represented by proxies in another domain of authority.

Figure 2. Resource Agents (OV-1)

Service Dependencies

Figure 3 depicts the decomposition and model for resource agents.

Figure 3. Resource Agent services and model (OV-2)

See also

Resource Agent Interactions Taskable Resource Management

CIAD CEI OV Resource Agent Interactions

Resource Agent Model

The Resource Agent Model has been defined with the COI Resource Management services.

Resource Agent Interactions

A resource can be a physical resource, a service, an application, etc. The Resource Agent is responsible for monitoring the resource, controlling it, managing contracts, and advertising its capabilities.

Generic Interaction Patterns

Generic patterns include Simple-Request, Request-Response, and Subscribe. In the following, we first show these generic patterns and then describe how they are used for Resource Agent interactions.

Figure 1. Simple Request interaction pattern (OV-6)

Figure 2. Request-Response interaction pattern (OV-6)

Figure 3. Subscribe interaction pattern (OV-6)

Monitoring Interactions

A Resource Agent can:

monitor resource events or states:
  pull strategy: the Resource Agent asks explicitly for state (getState()). Could be a full Request-Response pattern, where the Resource Agent plays the Requester role and the Resource plays the Participant role.
  push strategy: the Resource Agent subscribes to changes, and the resource fires events or notifies when state changes. Uses the Subscribe pattern, where the Resource Agent plays the Requester role and the Resource plays the Participant role.

monitor the messages exchanged by the resource: it can observe messages and build an internal state model for the resource.
get information from the host environment (JVM, host container, platform, cluster environment) regarding the resource. Could be a full Request-Response pattern, where the Resource Agent plays the Requester role and the Resource plays the Participant role.
project resource state to the environment. Uses the Subscribe pattern, where the environment plays the Requester role and the Resource Agent plays the Participant role.
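The pull and push monitoring strategies can be sketched side by side (the Resource class and the callback wiring are illustrative, not the ION agent framework):

```python
class Resource:
    """Illustrative monitored resource supporting both strategies."""
    def __init__(self):
        self.state = "idle"
        self._subscribers = []

    def get_state(self):
        # Pull strategy: the agent asks explicitly (Request-Response).
        return self.state

    def subscribe(self, callback):
        # Push strategy: the agent subscribes (Subscribe pattern) ...
        self._subscribers.append(callback)

    def set_state(self, new_state):
        # ... and the resource notifies subscribers on every state change.
        self.state = new_state
        for notify in self._subscribers:
            notify(new_state)

events = []
res = Resource()
res.subscribe(events.append)   # push: the agent receives change events
res.set_state("busy")
snapshot = res.get_state()     # pull: the agent polls the current state
```

In the push case the agent sees every transition; in the pull case it sees only the state at polling time, which is why the architecture offers both.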

Control Interactions

A Resource Agent can request the resource to execute a command (e.g., start up, shut down, etc.). This could be a Simple-Request pattern (for resources that cannot be relied upon to answer commands) or a full Request-Response pattern (where resources will reject the command, or execute it and send a result back). The Resource Agent plays the Requester role and the Resource plays the Participant role.

Advertise Capability

At startup of the resource, the agent will update the Resource registry (or Service Registry) with the description of the resource (capability, interaction patterns supported, etc).

Life cycle:

Deploy phase: a deployer states the existence of an instance of a resource agent and an instance of a resource, and the binding between the two.
Install/configure phase: beforehand, or the first time the resource agent runs, an installation is performed and the resource capabilities are registered.
Use phase: the resource agent starts the resource.

Contract Management Interactions

For the general concepts of sending and receiving messages in the context of governance (contracts), see Capability Container interactions.

When a resource sends a message, it goes to the capability container, which sets some headers and sends the message to the Resource Agent. The agent will update its knowledge base and act; the sender agent might block the message from going out, in which case the denial should propagate back to the application. If the agent does not block the message, it goes to the signer, which adds the signature. Then, the Messaging Abstraction component converts it to AMQP format and sends it to the Broker.

When a resource receives a message, it will have additional steps via the Message Validator and Policy Enforcement Point.

CIAD CEI SV Resource Agent State Machine

State machines are used within the Magnet framework, in agents and in services. All state machines follow the listed principles.

State Machine (FSM) Concepts

Event: A signal that can be communicated. An event is instantaneous (it has no duration or outcome) but can trigger (longer-running) behavior. Events can have simple parameters.
Event producer: An entity that produces events, for instance caused by trigger conditions or conditions within the context.
Event consumer (event handler): An entity that consumes events and causes changes to the environment.
FSM trigger events: The set of all trigger events defined for an FSM. If any of these events occurs, the FSM evaluates its rules. Events can be registered as trigger events with the FSM. Not all events must be trigger events.
State machine (FSM): Acts as both event consumer and event producer. Starts in a defined initial state. Represents a current state and the decision logic of how to change that state. Whenever a trigger event occurs, the decision logic evaluates all transition rules defined for the current state. For one trigger event, exactly one transition will execute. Executed transitions may produce result events.
State: Characterizes currently active behaviors or an operational mode. Available behaviors change the instant the state changes.
Transition: A rule within a state machine that is defined by [0..n] transition guards, [0..n] trigger events, [0..n] result events, one source state and one target state. Source and target state can be the same. If more than one transition can execute for a trigger event, the FSM will choose one.
Catch transition: A transition that is executed in case no other transition can be executed when a trigger event occurs. Used to realize total, input-enabled state machines (FSMs that always react to a trigger event, no matter what state and which event). Can realize error condition handling.
Guard: A conditional expression (a boolean statement) that can be evaluated based on available local information.
Transition guard: A guard associated with a transition that determines whether the transition executes ("fires") when a trigger event occurs.
Result event: Produced the instant the transition fires. Event consumers listen to events.
Action: Some behavior (e.g. code) that occurs within one state, triggered by an event. Actions take time, while events are instantaneous.

State Machine Behavior

An FSM follows a basic life cycle:

Define: The state machine can be defined; new states, transitions, guards, events, actions etc. can be added.
Ready: The state machine is fully defined and now immutable for the remainder of its lifetime. The state machine is in its initial state.
Active: The state machine reactor is enabled and reacts to trigger events.
Suspended: The state machine reactor is temporarily disabled and does not react to trigger events. It remains in its current state.

Once the state machine object is ready, no new transitions or other elements can be defined (it is immutable). The state machine starts in its initial state. Once switched to active, the FSM immediately evaluates any executable rules for the initial state; subsequently it reacts to trigger events. An active state machine can be suspended. From the suspended state it can be reactivated; it can also be reset to its initial state. Any defined trigger event will cause the FSM to evaluate all transition rules for the current state. If multiple transitions can execute, the FSM picks one. Catch transitions exist that execute in case no other defined transition can execute. By definition of the transition rules, not all transitions apply to all events; a transition rule only applies to a defined subset of the trigger events and additionally must satisfy all guard conditions. The evaluation of guard conditions must not have any side effects. The FSM produces [0..n] events as the result of an executing transition. The execution of the transition occurs mechanically by the FSM and is instantaneous. Actions can take time. Actions can be defined in reaction to events. Actions can have results that can lead to newly raised events. If multiple actions are defined for one event, the FSM will execute them according to a defined scheme (sequential, parallel, by priority, as an event handling queue with the possibility to consume the event, etc.).

State Machine Implementation

Types of event producers:
  Message on queue with method: e.g. event_message(methodname, params)
  Time-scheduled event (e.g. timeout): event_timer(timerid, params)
  Action completed: event_action_completed(actionid, result)
Types of event consumers:
  Execute action
  Log event
Types of actions:
  Evaluate FSM (this makes it a trigger event)
  Publish event on exchange point
  Publish state change on exchange point
  Execute method (synchronously)
  Execute method (asynchronously)
  Etc.
State machine implementation, basic methods:
  Add state
  Register trigger event
  Register result event
  Register guard evaluator
  Add transition, referencing states, trigger events, guards, result events
  Bind action to event
  Finalize state machine (set internal state to ready)
The FSM embedding framework provides:
  Add event producer type
  Add event consumer type
  Instantiate event producer, event consumer with parameters
  Add action
Use of the state machine:
  The embedding framework (Magnet, base resource agent) typically defines a basic structure of states, transitions between these, and catch transitions (e.g. the IEEE 1451.1 FSM).
  Specializations of the FSM can then add further states, transitions etc., typically within the constraints imposed by the framework.
  FSMs can be chained, e.g. using Magnet's Role concept: if a role is in a certain state (e.g. active), another role can be activated that has its own embedded FSM.
  Instantiate message handling event producers (bind them to queues).
  Instantiate other event producers and event consumers as needed.
  Register trigger events with the state machine (bind the FSM evaluate action to the event).
  Bind result events to actions.
  Bind result events to publish on a queue.
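A minimal sketch of this FSM API (define/finalize life cycle, transitions with trigger events and optional guards, and a catch transition; names and semantics are simplified from the description above, not the Magnet implementation):

```python
class FSM:
    """Minimal finite state machine: transitions with trigger events and
    guards, a catch transition, and immutability once finalized."""

    def __init__(self, initial_state):
        self.state = initial_state
        self._transitions = []      # (source, event, guard, target)
        self._catch_target = None
        self._ready = False

    def add_transition(self, source, event, target, guard=None):
        assert not self._ready, "FSM is immutable once finalized"
        self._transitions.append((source, event, guard, target))

    def set_catch(self, target):
        self._catch_target = target

    def finalize(self):
        self._ready = True          # Define -> Ready: no further changes

    def trigger(self, event):
        """Evaluate all transition rules defined for the current state."""
        for source, ev, guard, target in self._transitions:
            if source == self.state and ev == event and (guard is None or guard()):
                self.state = target
                return self.state
        if self._catch_target is not None:  # catch transition fires
            self.state = self._catch_target
        return self.state

fsm = FSM("inactive")
fsm.add_transition("inactive", "start", "active")
fsm.add_transition("active", "stop", "inactive")
fsm.set_catch("error")
fsm.finalize()
```

A real implementation would additionally emit result events from each fired transition and bind actions to those events, as described above; this sketch keeps only the transition-selection core.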

Use of state machines

Connected to messaging queues (through event producers), publishing messages (as actions)
As Magnet role FSM (for any messaging end point)
As agent life cycle FSM (within a resource agent as a Magnet Role)
As resource life cycle FSM (within a resource agent as a Magnet Role)
As part of a resource controller: the controller subscribes to state change and event notifications from a resource (agent). It follows the state transitions of the resource and adds additional behavior.
To represent an application's state outside of agents, e.g. in a service process such as the provisioner

CIAD CEI OV User Interfaces

CIAD CEI SV System Bootstrapping

Together, the CEI and COI form the "Operating System" of the OOI Integrated Observatory Network. COI represents the "user" part of this OS, CEI the "system" part. Both COI and CEI have "kernel level" elements and "library/API" level elements. This page describes how the OOI ION comes into being. Both the COI and CEI subsystems share responsibility for bootstrapping the system.

Overview

Bringing up the full system and its capabilities is a multi-step process.

1. Bringing up the system to its initial operating state. This includes recovering any persistent state from previous system runs. At this point the system's core infrastructure and services operate nominally, but not at scale and not robust to failure.
2. Bringing up services and processes as defined by the deployment configuration. This includes bringing up the core infrastructure at scale, and bringing up non-core services and processes as defined.
3. Maintaining steady-state system operations, while ensuring reliability and responsiveness to user demand. This includes automatic monitoring of the system and compensation for failures and user demand; interactive operator redefinition of the deployment configuration and scaling policy; and interactive operator actions to bring down or replace parts of the system.
4. Controlled shutdown of the system, or parts thereof, on demand.

Bootstrapping to initial operating state

During bootstrapping, the system goes through following init levels:

Level 0: Pre-bootstrapping requirements met
Level 1: Context Broker operational
Level 2: Message Broker operational (not reliably or at scale)
Level 3: Core services operational (not reliably or at scale)
Level 4: Elastic computing services operational (not reliably or at scale)

In level 4, the system is in initial operational state.
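The lockstep launch-then-validate progression through these init levels can be sketched as follows (the step functions and service names are illustrative stand-ins for the real launch and sanity-check logic):

```python
def bootstrap(steps):
    """Run bootstrap steps in lockstep: launch each level, validate it
    (fail fast), and only then proceed to the next level."""
    reached = []
    for level, launch, validate in steps:
        launch()
        if not validate():
            raise RuntimeError("bootstrap failed at init level %d" % level)
        reached.append(level)
    return reached

started = []
steps = [
    (1, lambda: started.append("context-broker"),   lambda: True),
    (2, lambda: started.append("message-broker"),   lambda: True),
    (3, lambda: started.append("core-services"),    lambda: True),
    (4, lambda: started.append("elastic-services"), lambda: True),
]
levels = bootstrap(steps)   # system is now in initial operational state
```

Raising on the first failed validation embodies the fail-fast principle described in the bootstrapping process below: a broken level is detected immediately rather than surfacing as obscure failures later.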

Figure 1 shows an illustrative overview of the bootstrapping of the system to initial operational state.

Figure 1. System Bootstrapping (OV-6)

Bootstrapping Process

This section describes how the working system comes into being. Understanding the bootstrap also makes clear what starts and monitors the CEI software. The previous sections of this document explained how other software is made available using the CEI software, but that assumed certain CEI software was already running in the first place (e.g. the HA-Provisioner).

The operator launches a program called epucontrol that will carry out each of the following steps.

Consult the following diagram; it has a time axis from top to bottom. The circled numbers correspond to section numbers in the subsequent content of this document. Each of the following steps happens in lockstep: there are no parallel steps; each service/VM is launched, then verified, and only then does the process move forward.

6.0 Assumptions

Before it starts, epucontrol has access to the following information:

Security information required for each launch
All current information for the Deployable Type Registry Service
TODO: enumerate everything else needed

6.1 One context broker instance

It brings just one context broker online in bootstrap mode. This launch can only use the IaaS provided contextualization mechanism (i.e., Amazon EC2's user-data or equivalent) because there is no context broker available yet but the context broker itself needs to be contextualized.

6.2 Validate context broker

It waits for the context broker to come online and makes a test call (sanity check). We want it to fail fast; there should be no "debugging" later to discover that the root problem was a failed context broker. This is a general principle for the whole bootstrap process.
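The fail-fast principle can be sketched as a bounded polling check; the timings and the check callable are illustrative assumptions, not the actual validation code.

```python
# Poll a sanity check until it passes or a hard deadline expires; raise
# immediately on timeout so the root cause is visible at bootstrap time
# rather than during later "debugging".
import time

def wait_until_healthy(check, timeout=60.0, interval=1.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():                   # e.g. a test call to the context broker
            return True
        time.sleep(interval)
    raise RuntimeError("validation failed: service never became healthy")
```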

6.3 One Messaging Service instance

It brings one Messaging Service operational unit online (we will move in mid-term to starting two as a virtual cluster using the context broker).

6.4 Validate Messaging Service instance

It waits for the Messaging Service to come online and runs a COI-provided suite of sanity checks and initial configurations.

6.5 COI core services

It brings several core COI services (Data Store, Registries of various kinds) online in bootstrap mode. In the same "batch" of services is the Deployable Type Registry Service (DTRS).

Knowledge of the DTRS data already allowed the program to complete the previous steps (no component needed to consult a DTRS in order to know exactly what to launch). This part of the bootstrap needs to be coordinated tightly with the COI subsystem.

6.6 Validate COI core services

It waits for the services to come online and runs a COI-provided suite of sanity checks and initial configurations.

This includes recovering any existing persistent state or transaction log from a preceding system run, unless a clean start is required.

It also seeds DTRS with data (and runs sanity checks).

6.7 Base CEI instance

It launches one "base instance" that contains an EPU-Controller, a Sensor Aggregator, and a Provisioner-Provisioner for the HA-Provisioner service, seeding it with IaaS credentials.

The EPU-Controller for the HA-Provisioner contains a Provisioner-Provisioner that holds IaaS credentials and the deployable type needed for Provisioner instances. That Provisioner-Provisioner is always used to start Provisioner instances, not the HA-Provisioner that the EPU-Controller is making highly available.

The HA-Provisioner has to be bootstrapped and built differently than other EPU-ified services because the EPU-Controller for the HA-Provisioner cannot rely on the HA-Provisioner service to do work that it needs. Instead it relies on an on-board Provisioner.

Consult the following diagram:

6.8 Validate Base CEI instance

It waits for the base CEI instance to come online and runs tests.

6.9 Other services

Any other high availability services are now brought up with their specific EPU Controller and Sensor Aggregator instances (see below). The HA-Provisioner is used normally as discussed in this document. Any EPU controller and sensor aggregator can be run and they will interact with the HA-Provisioner that has been brought online.

Now is when redundant Messaging Service and COI Core Service instances would also be brought online.

6.10 Bootstrap program serves as master supervisor

Finally, the epucontrol program daemonizes itself and serves as a supervisor for all of the nodes that it launched. This is called the bootstrap supervisor and is a catch-all fault monitor. In the future, supervisors will be operational units themselves, and the "root" responsibility of watching the entire system will belong to a staffed operations team; the epucontrol program will then register those supervisors with the appropriate mechanisms and exit instead of daemonizing.
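A catch-all fault monitor of the kind described can be sketched as a periodic health pass over the launched nodes; the node records and the restart hook below are hypothetical.

```python
# One supervision pass: check every launched node and hand any failed node
# to a restart/compensation hook. A real supervisor would run this in a
# loop after daemonizing.

def supervise_once(nodes, restart):
    restarted = []
    for node in nodes:
        if not node["healthy"]():     # per-node health probe
            restart(node)             # compensation action (e.g. relaunch)
            restarted.append(node["name"])
    return restarted
```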

Bootstrapping System Services

Assumptions

System in initial operational state
Messaging Service operational nominally in single-instance configuration
Core services operational nominally with single instances
CEI elastic computing services operational

Bootstrapping Services Process

This includes additional instances for core services and CEI elastic computing services, as defined by policy.

Ring 1: Messaging Broker at scale
Ring 2: Data storage infrastructure at scale
Ring 3: Central COI registries
Ring 4: Central COI controllers
Ring 5: Subsystem registries and repositories
Ring 6: Subsystem controllers
Ring 7: External interface agents
Ring 8: Web user interface
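The ring ordering above amounts to a strictly sequential startup plan; the sketch below encodes the rings from the list (the start hook is a placeholder, not real launch code).

```python
# Bring up the system ring by ring: a ring's services start only after all
# earlier rings are up. The ring contents mirror the list in the text.

RINGS = [
    (1, "Messaging Broker at scale"),
    (2, "Data storage infrastructure at scale"),
    (3, "Central COI registries"),
    (4, "Central COI controllers"),
    (5, "Subsystem registries and repositories"),
    (6, "Subsystem controllers"),
    (7, "External interface agents"),
    (8, "Web user interface"),
]

def start_rings(rings, start):
    order = []
    for ring, service in rings:
        start(service)                # placeholder for the real launch action
        order.append((ring, service))
    return order
```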

Core Services

Data Store Service (COI): Provides access to storage backends
Resource Registry Service (COI): Register and find information about any resource type and instances in the system
Deployment and Configuration Repository: Provides access to existing deployment policy and configuration specified by the operator or test instance
Exchange Space Service (COI): Register and find names in Exchange Spaces
Service Registry (COI): Register and find service (types) and service instances in the system
User Identity Registry Service (COI)

Elastic Computing Services

Planner Service (CEI)
Provisioner Service (CEI)
EPU Controller Service (CEI)
EPU Registry

Additional Services

Authentication Service (COI)
Interaction Pattern Repository Service (COI)
Conversation Repository Service (COI)
Web User Interface

CIAD DM Data Management

Data Management (DM) Subsystem Architecture & Design

This is the central page for the DM subsystem architecture and design, part of the OOI Integrated Observatory Network. This page references Data Management services and information models, which are structured into operational views (OV), system views (SV) and technical standards views (TV).

DM Overview

Distribution Services: Moving information and science data in canonical form across the network Distribution Services Overview Topic Exchange Notification Framework and Events (SV) Information Models: Information Model Technologies (TVs): DAP Data Type Representations (Encodings)

Inventory Services: Keeping track of information and science data in the system. Strongly interconnected with the Resource Registry. Inventory Services Overview Information Resource Management Associations between Resources (SV) Information Model: Science Data Model See Also: Common Object Model (COI), Data Store Service (COI)

Preservation Services: Storing, retrieving persistent information in the system and outside of the system Preservation Services Overview Persistence Implementation Architecture (SV) Content Addressable Store (SV) Cassandra Schema Specification (SV) Virtual File Store (SV) Technologies (TVs): Cassandra, iRODS, GIT

Ingestion Services: Adding new information and metadata to the system as a whole or in increments Ingestion Services Overview Ingestion Service R1 (SV)

Transformation Services: Transforming between data representations Transformation Services Overview out of scope for Release 1

Presentation Services: Extracting information from the system for consumption by end users Presentation Services Overview Catalogs out of scope for Release 1

Cross-Cutting Concerns: User Interfaces

Quick Links

Subsystems: COI CEI DM SA AS PP

CIAD DM OV

Introduction

The Data Management subsystem provides the dynamic data distribution network for data products and metadata based on the OOI-CI common data model. The subsystem architecture is hierarchical, organized into logical services which provide the functional capabilities of the system. The infrastructure services of data management provide for storage and inventory of information in the system and the mechanisms to distribute it to the application-level services. The application services ingest raw data from sensors and models into the OOI-CI data model and represent it to the scientist as requested. The interaction between services is accomplished by passing messages. Data flows through the system of services in messages, as a stream to which one service publishes and other services subscribe. This pub/sub model is the basis of the architecture, allowing governance and federation in a distributed, scalable system.

Capabilities

The Data Management subsystem will provide the following capabilities:

Provision, manage and present data and metadata supporting the OOI domain model and data model
Provide for syndication of data through publication and subscription of data
Policy-governed data access
User-defined data presentation
Provision, manage and present data repositories, collections and streams
Negotiate and manage federation of data repositories, data collections and data streams
Negotiate and manage delegation of data preservation and presentation responsibilities
Maintain and ensure the integrity of data in perpetuity
Complex querying across, and integration of, geospatial, temporal, spatiotemporal, relational and ontological (tree and graph structure) resources
Present, find, exploit and annotate data based on a semantic frame of reference
Provision and exploit sharable semantic frames of reference
Provision and exploit sharable mappings between different semantic frames of reference (i.e., crosswalks between multiple ontologies)

Decomposition

Figure 1 shows the decomposition of the Data Management subsystem into logical services: ingestion, transformation, presentation, distribution, inventory and preservation. All services depend on distribution to define data streams, i.e., to subscribe to and publish data streams and to register for notifications. Once these data stream definitions are made, the actual behavior occurs by exchanging messages on the defined data streams.

Figure 1 Data Management logical service decomposition

From the point of view of science applications, the critical services are ingestion and transformation. The Ingestion service acts as a bridge to the Sensing and Acquisition subsystem and the Analysis and Synthesis subsystem. Its main responsibilities are initial data parsing, initial metadata extraction, registration, and versioning of received data products.

The Data Management subsystem presents its capabilities through an interface to the Sensing and Acquisition subsystem, the Planning and Prosecution subsystem, and the Analysis and Synthesis subsystem. There are also direct user interfaces for Data Management Capabilities. The Data Management subsystem relies in turn on the services of the Common Operating Infrastructure and the Common Execution Infrastructure.

The Transformation service handles the data content format conversion/transformation, mediation between syntax and semantics of data (based on ontologies), basic data calibration and QA/QC, additional metadata extraction, qualification, verification and validation.

The Presentation service enables data discovery, access, reporting, and branding of data products. For data discovery, it provides the mechanisms to both browse/navigate specific data products and search/query of them based on specific metadata or data content. To access the data, a client may specify various subsetting or aggregation constraints to tailor the received data products to its specific needs. Any transformation between the internal representation of the data and that requested by the client is handled by the Transformation service.

The distribution, preservation, and inventory services are infrastructure-related services.

The Distribution service is a projection of the COI capabilities, with the main purpose of establishing a publish-subscribe model of communication that ensures distributed data delivery to all end-points. Hence, its primary responsibilities are enabling registration of published datasets, subscription-based data access, data transportation with adequate routing, notification services based on subscription, and real-time (as defined by OOI data policies) data streaming.

The Preservation service is responsible for data replication, preservation, and archival/backup as defined by OOI policies. At its core, the preservation service provides a federated file system in which it is possible to meet these needs. The separation between the Preservation and the Inventory services allows optimization of actual information storage vs. information access and metadata handling.

The Inventory service provides the cataloging, indexing, and metadata handling capabilities required for data ingestion and retrieval. Indexing can be multidimensional, spatial, temporal, or based on content. From the point of view of the metadata, it provides annotation, provenance, derivation, and lineage capabilities. As a registry of resources, this service has strong dependencies on both COI and DM services.

Dependency

The architectural model of the DM subsystem is a hierarchy of services. The hierarchy is intended to limit dependencies (Figure 2) between the Data Management subsystem's logical services. The high level science services logically depend on the infrastructure services. The figure does not express the flow of information in the system. The messaging infrastructure is an assumed dependency for all communication which is not shown here.

Figure 2 Data Management Needlines (OV-2)

Work Products

The Work Products provided by this subsystem are shown in Table 1.

1.2.3.4 Data Management (R1, R2, R3)
The subsystem responsible for providing life cycle management, federation, preservation and presentation of OOI data holdings and associated metadata via data streams, repositories and catalogs.

1.2.3.4.1 OOI Common Data and Metadata Model (R1, R2)
Provides the common data and metadata model for the Integrated Observatory into which all integrated data products must translate, if required, for shared syntactic and semantic access. The scope of syntactic representation of observed data shall be extendable and comprehensive. The scope of the observatory metadata model and semantic representation shall be extendable yet constrained in implementation to at least meet all data requirements imposed by the set of "Core" OOI sensors and their associated QA/QC processing.

1.2.3.4.2 Dynamic Data Distribution Services (R1)
Provides publication, subscription, and query services associated with variant and dynamic data resources. Used in combination with the Processing Service to drive the policy decision to execute a process.

1.2.3.4.3 Data Catalog & Repository Services (R1)
Provides registration, indexing, and presentation services to collect and organize data holdings with their associated metadata for an individual, group and/or community.

1.2.3.4.4 Persistent Archive Services (R1, R2)
Provides cataloging, preservation, validation & curation services to organize, persist and maintain data holdings with their associated metadata for an individual, group and/or community.

1.2.3.4.5 Search and Navigation Services (R2)
Provides query and browsing services by context based on the content, metadata and semantics of the data holdings.

1.2.3.4.6 External Data Access Services (R2)
Provides an extensible suite of access interfaces and data formats for interoperability with external communities and applications.

1.2.3.4.7 Aggregation Service (R3)
Provides for the classification, categorization, and general grouping of data into collections.

1.2.3.4.8 Attribution and Association Services (R3)
Associates and retrieves attributes to resources. The attributes can be associated within a semantic context (ontology). The service facilitates the characterization, qualification, and general commentary about the elements with which the participants interact.

Table 1 Work Products

The focus of release 1 is to provide an initial common data and metadata model, to establish the core architecture of DM services and to enable data ingestion from external data sources, dataset registration and distribution as data streams, and initial persistence of datasets on disk.

The focus of release 2 is on completing the data/metadata model, advanced persistence and replication, data access, and the use of semantic technologies for discovery and mediation.

The focus of release 3 is on advanced metadata association services.

CIAD DM OV Distribution

The Distribution service provides the pub/sub/notify mechanism to manage data streams. Such data streams are not required for general service to service exchanges, where the pure COI Exchange mechanisms are used.

Figure 1 DM Distribution Service

The UI provides a mechanism to control the registration of publications and subscriptions.

The Publish Registration service enables the definition of data streams for publishing data. This service is not used for publishing actual data packets on an already-defined stream.

The Subscription Registration service enables the registration to data streams in order to receive data (subscription). This service is not used for receiving actual data packets on an already defined stream.

The Notification Registration service enables the registration for events (notifications) about the availability of new data, the change of data or metadata, and similar events. This service is not used for receiving actual events on an already defined stream.

The Data Stream registry is a specific realization of a Resource Registry which keeps track of data stream registrations by reference to their name in the inventory.

The Data Stream Routing service establishes a connection between publishers of messages on a data stream and subscribers, and potential notification targets. This service does not do actual routing of messages across the system, which is a concern managed by COI.

Figure 2 demonstrates the activity sequence for the three key processes of the data management subsystem: Register, Subscribe, and Publish.

Figure 2 DM Registration, Subscription, Publication Activities

CIAD DM OV Information Model

Domain Models

Datasets in transit through the OOI-CI are represented as messages carrying information containers. The structure of an information container is depicted in Figure 1. There are several levels of abstraction that allow different services to operate efficiently only on the information pertaining to their functionality.

Figure 1 Information Container Overview (OV-7)

The science related services Ingestion and Transformation operate on the content of the container. Within the Ingestion service, the Data Format Detector operates at the level of the information container to identify what kind of information is being transported. Subsequently, the Ingestion Data parser inspects the content of the information container and analyzes the information block. The Metadata Extractor provides relevant information regarding the format of the information block, and supports services such as Versioning.

The Transformation service provides further inspection of the information within the information container during transit, and enables scientific data transformation that involves parsing, metadata extraction, syntactical format conversion, mediation based on data semantics using ontologies, and information verification and validation. Within the Transformation service, the Data Parser operates at the successive level of the metadata that describes the content of the information block (see Figure 2). Similarly, the Metadata Extractor handles the semantics of the content of the information block.

Figure 2 Information Container Model (OV-7)

The format conversion service applies simple syntactic transformation rules for datasets in formats different from OOI canonical formats. It operates on the third level metadata (header) that describe the body of the information content block. The body may well contain process specifications or scientific data.

From the point of view of the infrastructure, the information container is the main data entity. Hence, services such as Preservation operate at the level of the entire information container, regardless of its actual content (see Figure 3). The L0 metadata associated with the information container are the only designator of the purpose of a container and of its content (e.g., process definition, resource reference, data product). Such metadata are highly infrastructure-specific and clearly differentiated from the scientific metadata. The Operations and Maintenance services are the only ones that write such metadata as they pertain to deployment and distribution concerns and overall CI operation.

Figure 3 Preservation Model (OV-7)

All information entities are recorded in repositories of various kinds, depending on the type and purpose of the information. Repositories contain representations of information types that have well-defined (standard) syntax and semantics. The History service operates on these representations to identify the order of information in the repository, and provides a means to seek and access the information in the requested order. The Archival service is the default service invoked for any in-transit dataset; it creates a record of the dataset in a repository corresponding to its information type. The Backup service may be invoked at times defined by the Operations and Maintenance services to duplicate repositories, or parts of them, to off-site backups. The Replication service relies on a Distribution Strategy to create and maintain synchronized copies of all parts of each repository in various locations throughout the CI to prevent information loss. The Distribution Strategy may take into account L0 metadata with regard to information type, access and availability policies, or quality-of-service requirements.


Figure 4 specifies the ingestion of observational data and other information products into the Integrated Observatory system together with all metadata, ancillary data and cross-referencing.

The Data Format Detector operates at the level of the information container to identify the kind of information that is being transported. For instance, it may determine that the container has scientific data, although at this level of abstraction, the exact type or its representation is not relevant.

Figure 4 Science Data and Information Ingestion Model (OV-7)

The Data Parser inspects the content of the information container by analyzing the first level metadata with the help of the Metadata Extractor and then deciphering the information block associated with the container. The Metadata Extractor provides information about the format of the information block and how to decompose the content of the information container into relevant parts.

The Versioning service operates on versioning information in metadata. It keeps track of the version of any particular dataset and adds versioning information on new datasets. The Registrar service operates on the information model for ownership, authorship, and policies that are associated with a dataset in transit. For instance, upon encountering a new dataset, it registers the associated information into the corresponding repositories.

The Transformation service provides further inspection of the in-transit information container and enables scientific data transformation. Figure 5 specifies the transformation and mediation of science data based on observational data and derived data products, but also covers other information products. Transformation involves parsing, metadata extraction, syntactic format conversion, or mediation based on data semantics using ontologies, together with information verification and validation. When the Data Parser is used at the level of the Transformation service, the metadata describes the content of the information block.

Format Conversion applies simple syntactic transformation rules to datasets in formats different from OOI canonical formats. For presentation purposes, it also enables the reverse transformation from an OOI canonical format into a well defined set of supported output formats. For conversion, it operates on the third-level metadata that describe the body of the information content block.

The Mediation service relies on a set of standards and ontologies to perform semantic transformation of datasets into other data products that can serve as input to other services. For instance, it may transform a temperature measurement called "temp" expressed in degrees Fahrenheit into a canonical representation called "surface temperature" expressed in degrees Celsius or Kelvin. Through the exchange mechanism, various services may augment the Mediation service to perform complex transformations such as projections, mappings, coordinate transformations, translations, etc.

Figure 5 Science Data and Information Transformation Model (OV-7)
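The temperature example above can be sketched as a tiny mediation rule table; the rule names, units, and structure below are invented for illustration only.

```python
# Map a non-canonical field name ("temp", in degrees Fahrenheit) onto the
# canonical "surface temperature" in degrees Celsius via a rule lookup.

MEDIATION_RULES = {
    # source name -> (canonical name, conversion to canonical units)
    "temp": ("surface temperature", lambda f: (f - 32.0) * 5.0 / 9.0),
}

def mediate(name, value):
    canonical, convert = MEDIATION_RULES[name]
    return canonical, convert(value)
```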

CIAD DM OV Topic Exchange

The Topic Exchange is a logical extension of the COI Exchange service. It leverages the Exchange's resources, such as Exchange Spaces and Exchange Points, as well as the underlying messaging infrastructure (AMQP or hardware messaging). It adds the concept of Topics, which are trees or structures of information that interested parties can publish to and subscribe to. The Topic Exchange service then performs the mechanics of matching subscribers with publishers and of distributing and routing the information through the system.

Resources

Topic

Topic tree Association with Science Data Model

Subscription

Query Expression Delivery Modality

Topic Exchange Roles

Topic Owner

Publisher

Subscriber

Infrastructure

Service Decomposition

PubSub Controller

PubSub Registry

Behavior

TBD

CIAD DM SV Common Data Model

The OOI CI Common Data Model is inspired by Unidata's Common Data Model and represented using Google Protocol Buffers (GPB). This page describes the core protocol buffer objects and associated models. Note the use of the "optional" keyword in the code snippets below; in these GPB specifications it actually denotes "required".

The OOI Common Data Model is implemented based on the OOI CI Common Object Model, for representing complex structures of data objects and their representation in memory, on the wire for transport and on disk for persistence.

Datasets

The CI CDM supports datasets of various sizes and complexities. For uniformity, datasets are identified through their root group, the head of the tree that represents the actual dataset. A dataset may contain any collection of data structures and associated metadata (attributes).

Basic data types

The CI CDM defines a set of basic data types which can be used to exchange information between any CI services regardless of their implementation language or platform (i.e. no need to worry about little vs big endian encoding).

Basic Arrays

The CI CDM provides signed and unsigned integer arrays both 32 and 64 bit wide, floating point arrays both in 32 and 64 bit precision (i.e. regular float and double), string arrays, and opaque arrays (i.e. the internal data representation of the opaque arrays is not exposed at the element level). Complex array types such as ArrayStructure and BoundedArrays are also defined (see below under CDM Variable).

Attributes

The CI CDM defines a set of standard attributes which can be associated with a data structure. The attributes represent meta-data whereas variables contain the actual data.

Dimensions

The CI CDM allows for multi-dimensional data structures to enable rich representations of ocean data (e.g., 3D and 4D datasets).

Groups

The CI CDM supports collections of objects through groups. A group is a container for attributes, dimensions, variables, and even other groups. Any dataset contains at least one group.

Variables

The CI CDM supports complex data structures such as Sequences, Structures, ArrayStructures, BoundedArrays, which are implemented through data references. For instance, a Structure contains a number of members as variables, which can point to other structures as their content. Also, the ArrayStructure contains a list of BoundedArray objects, which can point to any basic or complex data structure.


An Example Dataset

A CDM Dataset is composed of Dimensions, Variables and Attributes. Attributes are Metadata which describe variables or datasets. Variables are arrays of values of a particular type which have a specified dimensionality. Dimensions are integers that describe the length of an array in a particular dimension.

A Sensor Observation Service Trajectory Profile is a dataset which includes variables as a function of time and space.

Time and Depth are independent variables. These independent variables also have dimensions of the same name.
Latitude and Longitude are dependent variables, functions of Time.
Salinity is a dependent variable which is a function of Time and Depth.
Because both Salinity and Lat/Lon are functions of Time, there is an implicit relationship through the 'shared dimension'.

The dataset, as well as each variable in the dataset, has a number of standard metadata attributes. Each attribute is a value or list of values of a basic type.
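As a rough illustration of this composition (the real model is a set of GPB messages, not Python classes), the trajectory-profile structure above could be mirrored like this; all class and field names are simplifications.

```python
# Dataclass sketch of the CDM composition: a dataset's root group holds
# dimensions, variables and attributes; variables reference dimensions,
# so two variables sharing "time" share the same Dimension object.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Dimension:
    name: str
    length: int                       # array extent along this dimension

@dataclass
class Attribute:
    name: str
    values: list                      # a value or list of values of a basic type

@dataclass
class Variable:
    name: str
    dimensions: List[Dimension]
    attributes: List[Attribute] = field(default_factory=list)

@dataclass
class Group:                          # every dataset has at least one group
    name: str
    dimensions: List[Dimension] = field(default_factory=list)
    variables: List[Variable] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)

time = Dimension("time", 100)
depth = Dimension("depth", 10)
root = Group(
    "trajectory_profile",
    dimensions=[time, depth],
    variables=[
        Variable("latitude", [time]),          # dependent on Time only
        Variable("salinity", [time, depth]),   # dependent on Time and Depth
    ],
)
```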

Unidata defines the CDL as a language to encode a data structure in a human readable form.

Unidata also defines NcML as an alternative which is both human- and computer-parseable.


The example above translates into the following representation using the OOI-CI CDM. The red arrows highlight references (Link.CASRef) to other message types. Each message is an independent blob that is addressable in the system. Note that the diagram below is incomplete, as it gets too large if the ArrayStructures and their content are included. The way in which these arrays are blocked is arbitrary, based on the block size and access strategy.

CIAD DM SV Notifications and Events

Overview

The event framework is basic infrastructure provided by the DM team to publish events through the data distribution network, for any interested parties to consume. Events have a defined type and type specific structure and identify the origin of the event. The event framework is based on the PubSub Framework.

Event Framework

Event Publisher

Events are published by various items in the ION system with no knowledge or requirement that anything be listening to them. The EventPublisher base class defines a specially derived Publisher class that can be used to create and publish event notification messages. By default, events are published to the exchange point 'events.topic'.

You should not use an instance of EventPublisher directly, rather, use one of its derived implementations, such as ResourceLifecycleEventPublisher.

Call create_event and then publish_event (or the combination method, create_and_publish_event) to send an event notification into the system. The create_event method takes kwargs which set the message fields. This is meant as a convenience; you may still alter the message create_event returns using normal message semantics.

The message sent by EventPublisher is the basic EventMessage (id 2322) which contains common information about an event, and the additional_data field is defined as another message, specific to the type of event being published.

Message Construction

When using an EventPublisher derived class, you do not need to create instances of Messages yourself; the EventPublisher class wraps this for you with create_event. You pass keyword arguments (kwargs) to this method, and it uses them to fill in the fields on both the base EventMessage and the message in the additional_data field. You do not have to use create_event to set these params, nor do you have to use create_event at all to generate the message sent; it is provided as a common-use method. Some samples:
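The original samples are not reproduced here; the following runnable sketch only mimics the documented kwargs behavior with a minimal stand-in class (field names such as origin and description are invented).

```python
# Stand-in for the ION EventPublisher kwargs convention: create_event(**kw)
# fills message fields, and the returned message can still be altered with
# normal message semantics before publishing.

class EventPublisher:                 # minimal stand-in, not the real class
    event_id = None

    def create_event(self, **kwargs):
        msg = {"event_id": self.event_id}   # base EventMessage field
        msg.update(kwargs)                  # kwargs fill in message fields
        return msg

class ResourceLifecycleEventPublisher(EventPublisher):
    event_id = 1001                   # from the event table below

pub = ResourceLifecycleEventPublisher()
msg = pub.create_event(origin="resource-123", description="state change")
msg["description"] = "updated by hand"      # altering the returned message
```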

Enums

GPB lets you define enumerations and use them as the type of a field in a message. Typically, you need an instance of the message in order to get the enum value to set the field with, but in the case of create_event, you do not have a message at this point. EventPublisher and derived classes should provide convenience classes to allow you to pass in enum value names (as strings).

See below in implementation notes for more information about enums.

Example Publisher Usage

Here, we will use a ResourceLifecycleEventPublisher to update the status of a resource.
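The original example is not preserved; below is a hedged, runnable approximation using the combination method named earlier (the publisher is stubbed, and the state field is an invented illustration).

```python
# Stubbed ResourceLifecycleEventPublisher showing create_and_publish_event:
# build the event from keyword arguments and record it as "published".

class ResourceLifecycleEventPublisher:       # stand-in, not the ION class
    def __init__(self):
        self.sent = []

    def create_and_publish_event(self, **fields):
        event = {"event_id": 1001, **fields}
        self.sent.append(event)              # real code sends to 'events.topic'
        return event

pub = ResourceLifecycleEventPublisher()
event = pub.create_and_publish_event(origin="resource-123", state="ACTIVE")
```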

Implementing a new EventPublisher

If you want to define a new event notification, follow these steps:

1. Define a new event_id and add it to the table below.
2. Define a new event message in ion-object-definitions, net/ooici/ion/services/dm/event.proto.
3. Define a derived EventPublisher and EventSubscriber that fill in the event_id and msg_type class fields.
4. If your message contains any enum fields, define convenience classes that allow the user to set the value of those enum fields in create_event without needing an instance of the message ahead of time. These classes should follow this model:
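One way such a convenience class could look is sketched below. The enum name (LifecycleStates) and its values are made up for illustration; the point is that a class attribute per enum value, plus a name-to-value lookup, lets callers set enum fields by string name without a message instance.

```python
class LifecycleStates:
    """Hypothetical convenience class mirroring an enum defined in event.proto."""
    NEW = 1
    ACTIVE = 2
    RETIRED = 3

    @classmethod
    def from_name(cls, name):
        """Resolve an enum value name (string) to its numeric value."""
        return getattr(cls, name)


# Callers can use either the attribute or the string name:
value = LifecycleStates.ACTIVE
same_value = LifecycleStates.from_name("ACTIVE")
```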


Then, when calling create_event, you can use:

Alternately, you may set the field in two steps:
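Both calling styles can be sketched together. Everything here is illustrative: the `state` field, the STATES mapping, and the Publisher internals are assumptions standing in for the generated GPB classes.

```python
class _Extra:
    """Stand-in for the event-specific additional_data message."""
    def __init__(self):
        self.state = None


class _Event:
    def __init__(self):
        self.additional_data = _Extra()


class Publisher:
    # Hypothetical enum-name-to-value mapping (the "convenience class" idea)
    STATES = {"NEW": 1, "ACTIVE": 2, "RETIRED": 3}

    def create_event(self, **kwargs):
        ev = _Event()
        if "state" in kwargs:
            # Accept the enum value name as a string
            ev.additional_data.state = self.STATES[kwargs["state"]]
        return ev


pub = Publisher()

# Single step: pass the enum value name directly to create_event
ev1 = pub.create_event(state="ACTIVE")

# Two steps: create the message first, then set the field explicitly
ev2 = pub.create_event()
ev2.additional_data.state = Publisher.STATES["ACTIVE"]
```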

Event Subscriber

The EventSubscriber is a custom Subscriber-derived class for handling event notification subscriptions. If your process is interested in acting on an event that another process may send at some point, create an instance of EventSubscriber, or of a derived EventSubscriber that fills in the event_id type you are interested in.

Unlike the EventPublisher, the EventSubscriber is a usable class on its own, but derived versions that match 1:1 with Publishers are the typical use case.

An EventSubscriber is capable of listening to all events belonging to its event-id, or can be tuned to specifically listen to one event-id and origin pairing. See "Mapping to Topic Exchange" below for more information about the binding key setup.

Example Subscription
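A sketch of the subscription pattern described above. The real EventSubscriber binds to the AMQP topic exchange; here an in-memory matcher stands in for the broker, and the constructor arguments and handler callback are illustrative assumptions.

```python
class EventSubscriber:
    """Illustrative subscriber: None for event_id/origin acts as a wildcard."""

    def __init__(self, event_id=None, origin=None, handler=None):
        self.event_id = event_id
        self.origin = origin
        self.handler = handler

    def matches(self, routing_key):
        """Match a '<event-id>.<origin>' routing key against this subscription."""
        event_id, origin = routing_key.split(".", 1)
        return ((self.event_id is None or str(self.event_id) == event_id)
                and (self.origin is None or self.origin == origin))

    def deliver(self, routing_key, message):
        if self.matches(routing_key):
            self.handler(message)


seen = []
# Subscribe to all events with event-id 1001, any origin
sub = EventSubscriber(event_id=1001, handler=seen.append)
sub.deliver("1001.resource-42", {"new_state": "ACTIVE"})   # delivered
sub.deliver("1051.container-7", {"new_state": "RUNNING"})  # filtered out
```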

Mapping to Topic Exchange

Exchange Point: events.topic

Routing Key: &lt;event-id&gt;.&lt;origin&gt;

An event-id is defined on the left side of the table below. The origin varies depending on the message type.
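The routing and binding keys implied by this scheme can be sketched as two small helpers. The helper names are made up; the key shape ('<event-id>.<origin>', with the AMQP topic wildcard '*' matching one dot-separated word) follows the description above.

```python
def routing_key(event_id, origin):
    """Key used when publishing: '<event-id>.<origin>'."""
    return "%s.%s" % (event_id, origin)


def binding_key(event_id, origin=None):
    """Key used when subscribing; omit origin to listen to all origins."""
    # In AMQP topic exchanges, '*' matches exactly one dot-separated word
    return "%s.%s" % (event_id, origin if origin is not None else "*")


key_all = binding_key(1001)                  # all resource life cycle events
key_one = binding_key(1001, "resource-42")   # one event-id/origin pairing
```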

Event Hierarchy

The following table lists all types of events and their typical origins and sinks.

Release1:

Category | Event Type | Event ID | Event msg object | EventPublisher base subclass | Origin in topic (eventtype.origin) | Source | Observers
Lifecycle related | Resource life cycle | 1001 | ResourceLifeCycleEvent | ResourceLifecycleEventPublisher | resource id (UUID) | Resource registry service, Datastore service (resource instance) | -
- | Container life cycle | 1051 | ResourceLifeCycleEvent | ResourceLifecycleEventPublisher | container name | CC Agent | ?
- | Process life cycle | 1052 | ResourceLifeCycleEvent | ResourceLifecycleEventPublisher | process exchange name | CC Agent | ?
Resource related | Data source update | 1101 | e.g. TriggerEvent | TriggerEventPublisher | data source resource id (UUID) | External dataset agent (script) | -
- | Dataset modification | 1111 | ResourceModificationEvent | ResourceModifiedEventPublisher | dataset resource id (UUID) | Ingest service | ?
- | Subscription new/modification | 1201 | SubscriptionEvent | ResourceModifiedEventPublisher | dispatcher_id | AIS | Dispatcher, email service
Service related | Schedule event | 2001 | e.g. TriggerEvent | TriggerEventPublisher | TBD | Scheduler service | -
Logging related | Error log entries | 3002 | LoggingEvent | LoggingEventPublisher | - | Logging framework within container | Logging Service
- | Critical log entries | 3001 | LoggingEvent | LoggingEventPublisher | - | - | -

Future:

Category | Event | Source | Observers | Notes
Presence | Login * | SSI | Audit subsystem | -
- | User profile CMDs | SSI, user registry service | Auditing, potentially other users | Social networking for science - might be a sales point
- | User data subscription CMDs | SSI, pubsub registry | Auditing | -
Datasets | Registration ** | DM controller service | Many! | User probably registers new source via website or API (MATLAB)
- | Creation * | DM controller service, Server-side processing system (Ferret, GridFields) | Many, ingester for sure! | Ingester must grab and parse the metadata for semantic classification
- | Dataset start - | DM Controller, or perhaps syndication service | Ingestion service | -
- | Dataset block - | Fetcher | Preservation service, ingestion service | -
- | Dataset error - | Fetcher, ingestion svc, preservation | Logger | How do errors back-propagate to the user?
- | Dataset end - | Fetcher | Ditto | Possibly creates 'new dataset' event?
- | Dataset preservation completed - | Preservation service | Dataset registry | -
- | Dataset polling - | Syndication | System monitor | DDN1 use-case, periodic polling and look for changes via mdsum(metadata)
- | Dataset change detected ** | Fetcher, syndication | Controller | Who takes receipt and generates a request to pull the new data?
- | Data read | Proxy, user, SSC service | Metrics (cache hit ratio) | -
Metrics | Cache hit | Controller, presentation service | Service monitor, operator interface | -
- | Cache miss | Controller | Ditto | -
- | Cache fill | Controller | Ditto | In the DX model, as a matter of policy a miss could trigger an automatic fill. May not be applicable here.
Data stream | Register * | Datastream controller or registry | Ingestion svc | Many more possible consumers once dataset is classified
- | Start of data - | Fetcher, instrument service | Preservation, pub-sub (& subscribers) | -
- | Block of data - | Ditto | Ditto | -
- | End of data - | Ditto | Ditto | May be impossible (infinite/unbounded data stream) or never happen
- | Delisting/removal * | Ditto | Ditto | Gotta have a way to remove a source
- | Transformations | Transformation service | Unknown | Speculative - possible use cases include WMS/KML generation, other ERDDAP-type transformations.
Ingestion | Dataset tagged/classified | - | Data listeners | Is this how the XML data header is created?
Audit | Login OK * | SSI | Auditor, monitor | -
- | Login failure * | SSI | Ditto | -
- | Access OK | SSI | Ditto | -
- | Access denied * | SSI | Ditto, more urgent though | -
- | Role CMD | SSI | Ditto | E.g. 'admin adds abilities to user X'
Pub-sub & syndication | Subscribe * | Pub-Sub controller service | P-S registry | -
- | Unsubscribe * | P-S controller svc | P-S registry | -
- | Query datasets/datastreams | P-S controller or registry | Metrics and monitoring | -
Errors | Security related | SSI | Audit log, sysadmins | More on this in the Audit and Presence sections.
- | Filesystem | Preservation service, SRM agent, PaaS agent, OS | - | E.g. 'no space on device', 'permission denied', etc.
- | Network | Any and all, primarily first seen by COI code | Audit & logging | AMQP errors, TCP errors, DNS.
Logging | LOG.{DEBUG,ERROR,INFO...} | All | P-S to developer consoles | Mr Hyde explains.

CIAD DM SV R1 Data Distribution Specification

PubSub Resource Data Model

Diagram relationships between: ExchangeSpace, ExchangePoint, Topic, Subscription, Publisher, Owner

PubSub Controller Service

CODE BLOCK WITH OP INTERFACE AND MESSAGE OBJECTS

Publisher Client

Subscriber Client

CIAD DM TV DAP

DAP Protocol

Figure 1 provides a domain model showing DAP protocol entities and their dependencies, grouped into topical areas. DAP allows for simple exchange of scientific data using a common format (syntax). The protocol operates on top of another transport protocol, such as HTTP, that establishes the actual conversation between peers. The conversation partners take the roles of client and server, with the client requesting access to data or metadata and the server providing an answer to the request. Such an answer can be either positive - returning the requested data/metadata - or negative - the request cannot be fulfilled.

A DAP request specifies one method, such as DDS, Server, Help, etc., that triggers a specific behavior in the server answering the request. DDS, DAS and DataDDS requests imply operations on the data or metadata accessible to the server (also known as the dataset).

Figure 1 DAP Protocol Model (TV-1)

DAP provides a clear separation between the protocol-specific aspects, data representation, metadata representation and the type domain for these entities. Figure 2 shows a domain model with the data entities and relationships of the DAP protocol. Variables represent the actual data containers provided by the protocol. The allowable values for the variables are restricted by the type domain of their individual types (e.g., integers cannot be 128 bits wide). Attributes represent the mechanism for storing metadata. As with variables, they have names, types and values, with the difference that attributes may only have atomic or structure types.

Figure 2 DAP Data Model (TV-1)

The type system implemented by DAP is powerful in that it not only provides complex data types such as arrays and grids, but also enables the definition of arbitrarily complex types such as nested arrays, ordered structures, multidimensional grids, etc. A domain model of the type system is depicted in Figure 3. The dataset projected by a DAP server is in fact a special kind of structure that encompasses all stored data and metadata.

Figure 3 DAP Data Types

CIAD DM TV Google Protocol Buffers

The Google Protocol Buffers (GPB) specification is currently used as the formal way of defining the representation of all messages exchanged through the CI. GPB provides not only an encoding scheme but also a structured way of defining message formats (nested or plain) and services, plus a compiler suite (for Java, C++, and Python) that converts these specifications into service stubs and the corresponding encoding/decoding classes. The descriptions are extensible, with support for future extensions, optional parameters, comments, etc. The CI Architecture Team analyzed a number of candidate technologies (see Data Type Representations) before choosing GPB for the implementation of CI R1. The following is a domain model for GPB capturing its main characteristics.

PubSub resource model

Introduction

This page documents the resource model in the DM PubSub Controller.

Resources in the PSC

ExchangeSpaceRes
ExchangePointRes
TopicRes
PublisherRes
SubscriberRes
BindingRes
QueueRes

Namespace hierarchy

The R1 hierarchy in the broker is exchange space -> exchange point -> topic, where '->' means 'has zero or more'.

Associations

Up the tree from the leaves:

Bindings are associated with a queue
Queues are associated with a topic, XP and XS
Topics are associated with an XP and XS (up the tree)
XPs are associated with an XS

Top down:

XSs have zero or more XPs
XPs have zero or more topics
Queues have one or more bindings
Publishers are associated with an XS, XP and topic
Subscribers are associated with an XS, XP and topic
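The containment side of this hierarchy can be sketched as three small classes. This is only an illustration of the exchange space -> exchange point -> topic nesting described above; the class names are simplified stand-ins for the *Res resources listed earlier.

```python
class Topic:
    def __init__(self, name):
        self.name = name


class ExchangePoint:
    """Has zero or more topics."""
    def __init__(self, name):
        self.name = name
        self.topics = []


class ExchangeSpace:
    """Has zero or more exchange points."""
    def __init__(self, name):
        self.name = name
        self.points = []


xs = ExchangeSpace("ooi")
xp = ExchangePoint("events.topic")
xs.points.append(xp)
xp.topics.append(Topic("1001.resource-42"))
```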

CIAD DM OV Ingestion

Ingestion Service

The Ingestion Service (depicted in Figure 1) provides the basic mechanisms for identifying data streams and formats, parsing their content, identifying their associated metadata, adding version information, and registering the streams in the inventory. The distribution service acts as an orchestrator of the message flows between the internal services of the Ingestion.

The activity of the ingestion service is orchestrated by the ingestion coordinator service. The coordinator controls the flow of information through the service. The UI provides access to and control of the ingestion service. This would be the primary interface through which users would add or remove external data sources from the CI.

The Data Format Detector service identifies the protocol/format (e.g., DAP, SOS) of data entering the Ingestion service. Specific signatures (e.g., MIME types) associated with types of data stream are retrieved from the inventory. The identification of the exact data type for each data product enables proper parsing of the information content and transformation to the canonical form recognized by other OOI services.

The Metadata Extractor service provides mechanisms for identifying the metadata associated with each data product, parsing and correlating them with additional information (e.g., OOI global timestamps, location services, tagging, etc.).

The Versioning service allows the addition and updating of versioning information associated with each data stream. It also provides the means to automatically reflect changes in metadata as dictated by OOI policies (e.g., location of instrument was changed).

Figure 1 Ingestion Service (OV-2)

The Data Parser service performs basic parsing of a data stream, sanity checking, and automated QA/QC (e.g., checking type and range of variables). For data streams using OOI canonical formats, the service could simply pass through the information to other services.

The Registrar service records information about new streams, changes in stream metadata, and versioning information. It acts as an information provider to the Inventory service.

The Fetcher retrieves external datasets which are not available by subscription.

Ingestion message Sequence Diagram

The message sequence diagram is the first step to laying out the exact interface through which the services of ingestion interact.

Figure 2 Ingestion Service (OV-2)

CIAD DM SV R1 Ingestion Service Specification

Overview

The DM Ingestion service in Release 1 provides operations to set up data ingestion, as well as to perform the actual ingestion of data and metadata increments.

The Ingestion service depends on the DM PubsubService for setting up ingestion exchange points and topic trees.

Service Operations

Operation: setup_ingestion (Are we using DataSet Controller for this?)

Request Message Type: Name, GPB ID Response Message Type: Name, GPB ID

Creates a dataset resource, defines a topic in the ingest topic tree, and subscribes the ingestion service to the ingest topic tree for this topic.

Error conditions: TBD

Message Handler: ingest

Message Type: Name, GPB ID Response Message Type: None

The Ingestion service registers as a subscriber to the data ingestion exchange point. Data messages to this exchange point are processed in this message handler as updates for a Dataset Resource.

Ingestion expects the message in OOI Canonical Data Format.

Ingestion stores data from the data message using the resource registry for a targeted Dataset Resource. The targeted dataset resource is determined from the Routing Key (topic/subject) of the message. Metadata for the dataset resource is updated to reflect the newly ingested increment, such as adding a new time step to a dataset, or adding a variable or spatial extent.

Independent data messages can be ingested independently in parallel. There is no assumed dependency on the order of ingested data messages.
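The handler flow described above - deriving the target Dataset Resource from the routing key and applying order-independent increments - can be sketched as follows. The function name, message fields, and in-memory registry are assumptions; only the flow comes from this specification.

```python
datasets = {}  # stand-in for the resource registry


def ingest(routing_key, data_message):
    """Apply one data/metadata increment to the dataset named by the topic."""
    # The targeted dataset resource is determined from the routing key
    dataset_id = routing_key.rsplit(".", 1)[-1]
    ds = datasets.setdefault(dataset_id, {"time_steps": [], "variables": set()})
    # Update dataset metadata to reflect the newly ingested increment
    ds["time_steps"].extend(data_message.get("time_steps", []))
    ds["variables"].update(data_message.get("variables", []))


# Independent increments commute, so arrival order does not matter:
ingest("ingest.ds1", {"time_steps": [2], "variables": ["temp"]})
ingest("ingest.ds1", {"time_steps": [1], "variables": ["salinity"]})
```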

Data Objects

Object XYZ

GPB ID: xxx Purpose: Link:

Message Types

Context and Background

See Also:

CIAD DM OV Inventory

Inventory Service

The Inventory service (depicted in Figure 1) provides the index of information resources in the system and the tools to manipulate their associations through annotation and semantic reasoning.

Figure 1 Inventory Service (OV-2)

The Information Resource Registry registers information resources in the system. The primary information resources managed by the inventory are datasets and data streams. Other information resources, such as user identities and virtual machine images, can be managed with the same service. This service is closely related to, and mutually dependent with, the COI Resource Management services and their Resource Registry.

Resource Registry Federation is the basis for observatory integration. It provides federation of the OOI resource registry with external resource registries and incorporates external resource registries into the OOI resource registry.

The Index service maintains an access strategy for information or a dataset. An index can be created as separate information resource and subsequently referenced.

Figure 2 describes the Annotation service, which is used to enhance descriptions of resources with metadata.

Figure 2 Annotation Service (OV-2)

The Discovery service supports finding resources by metadata attributes, potentially applying semantic reasoning.

The Semantic Reasoning service provides inference based on semantic annotation of information elements in repositories. It is built on existing technology such as OWLIM or another inference engine.

Figure 3 shows the details of the semantic representation service.

Figure 3 Semantic Representation service (OV-2)

CIAD DM OV Data Model

The OOI-CI data model is built on abstraction. The key is to separate the concerns of scientific feature types from the underlying structure of the data. The structure of the data should in turn be separate from its representation, and finally the data representation should be isolated from the encoding used to serialize the data. Figure 1 shows these layers of abstraction in the OOI-CI data model.

Figure 1 Data Model Abstraction

Starting from the highest abstraction, the domain feature types are the objects which are relevant to the scientist, such as points, curves and grids. Data values in these objects have operators relevant to their type, such as re-grid (kriging), interpolate and slice. All such objects have well-defined coordinate systems.

The underlying structure of the domain specialization model is abstracted from the scientist. This allows the model to implement methods for the various cases of each type. For instance, the slice method will be different for a structured grid and an unstructured grid.

The attribute information model specifies the building blocks from which the types are built - the conceptual objects needed to express the data. In the Unidata Common Data Model, these are the Variables and the Attributes.

These objects are in turn made of a simpler Information Abstract Model (Figure 2). This model of composite types (lists, sets, enums, arrays, ...) is nested all the way down to the primitive types (int, string, float, ...).

The Serialization Model expresses how these fundamental components are serialized using the encoding model.

The Encoding model is an abstraction of the technology choice that implements the message encoding (Figure 3). This could be Google Protocol Buffers, or it could be XML. The point is that, at each level, the choice never matters to the abstraction above.

CIAD DM OV Data Set Registry

The Data Set Registry Service provides the services to register and manipulate changeable data sets within the Integrated Observatory. This service is strongly dependent on the COI Resource Registry Service.

Service Interface

Operations

CRUD Data Set

Structured Objects

Data Set

CIAD DM OV Information Resource Management

The DM Inventory services are responsible for managing Information Resources. These are resources in the system that represent information, i.e. electronic artifacts. To this extent, DM extends the COI Resource Registry with capabilities for describing various kinds of information resources and their metadata. This includes science data. Additional capabilities provided include discovery of resources, indexing of resources and metadata, and semantic reasoning.

Resources

Information Resources

Service Decomposition

TBD

CIAD DM SV Associations

This page describes the current design plans for associations objects for R1.

Associations

Intent

Associations are used to create relationships between resources and to make statements about a resource. Associations are based on the ideas around the RDF (Resource Description Framework) data model where statements are formed as subject-predicate-object expressions, also known as triples. The subject is the resource about which the statement is being made, the predicate is the verb which characterizes the relationship with the object resource. Associations provide a flexible way to add context to resources and augment information without changing the design or content of the resource itself.

Associations form a graph

Associations Design

The diagram below shows how associations are structured in the resource registry. The association resource at the top is stored in a repository similar to any other resource. The subject, predicate and object resources which form the association are referenced with IDRef pointers. An IDRef points to a specific commit on a branch, which identifies the state of the resource when the association was defined. However, this is for bookkeeping purposes; only the most recent version of the subject or object is used in the relationship. When a query requests the subject or object of a relationship, the most recent version of that resource will be returned unless the query is version specific.


The three resources at the bottom are the resources which define the relationship: Data Set s has_a Data Source r.

The predicate resource is a special case: unlike the subject and object, which may change over time (attributes about the data set or data source may be modified but the relationship is still valid), the predicate is static. If the statement about the relationship between data set s and data source r changes, then a new association must be constructed. For example, perhaps it is decided in the data model that a data set will in the future own a data source, so the current association is deleted and an owned_by association is constructed. To maximize the value of the predicates used to define relationships, a vocabulary will be created and will evolve to clarify the types of relationships between system entities. This will provide users with the ability to query and reason on these relationships.

Predicate names in R1:
has_a_name = 'has_a'
is_a_name = 'is_a'
type_of_name = 'type_of'
owned_by_name = 'owned_by'
has_life_cycle_state_name = 'has_life_cycle_state'

Creating an Association

Associations will be created in a manner similar to other resources, via the resource client. There will be a standard Association message type that contains the three references to the subject, predicate and object.

The association, subject and object resources will be stored in a transactional fashion to avoid partial commits.
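The triple-of-IDRefs structure described above can be sketched as follows. The IDRef fields (resource id, branch, commit) follow the description on this page, but the classes and values are illustrative, not the actual ION message types.

```python
from collections import namedtuple

# An IDRef points to a specific commit on a branch of a resource
IDRef = namedtuple("IDRef", "resource_id branch commit")

# An association is a subject-predicate-object triple of IDRefs
Association = namedtuple("Association", "subject predicate object")

# Predicate resources are static (see above), so one shared IDRef suffices
HAS_A = IDRef("has_a", "master", "c0")

assoc = Association(
    subject=IDRef("dataset-s", "master", "c12"),
    predicate=HAS_A,
    object=IDRef("datasource-r", "master", "c7"),
)
```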

Deleting an Association

When a relationship is no longer valid, the association that represents it is not removed; instead, the new version of the subject or object (to which the association no longer applies) is committed without updating the association.

Locating Resources via Associations

The initial search capability in R1 can only be applied to object instances which are part of an association.

For simple find operations, such as finding all resources which are identity resources, the get_subjects operation can be used. In this operation the predicate is 'type_of' and the object is an identity resource as defined in the ION_RESOURCE_TYPES.

With the predicate and object specified, the subject list is returned. The subject and predicate can also be specified in the get_objects method. Additionally, it is possible to find all the associations for a specific resource instance with get_subject_associations and get_object_associations, depending on where in the association the instance resides.
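A sketch of the get_subjects/get_objects query pattern. The in-memory triple list and the exact function signatures are illustrative stand-ins for the Association Service; only the operation names and the subject/predicate/object semantics come from this page.

```python
# Stand-in triple store: (subject, predicate, object)
triples = [
    ("user-1", "type_of", "identity_resource"),
    ("dataset-9", "type_of", "dataset_resource"),
    ("user-2", "type_of", "identity_resource"),
]


def get_subjects(predicate, obj):
    """Given predicate and object, return all matching subjects."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]


def get_objects(subject, predicate):
    """Given subject and predicate, return all matching objects."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


# Find all resources that are identity resources:
identities = get_subjects("type_of", "identity_resource")
```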

How are Associations Stored?

Associations are stored in a Cassandra column family in which Type and LifeCycleState are denormalized for performance.

Associations in R2

A review of alternative designs, vocabulary management based on lessons learned is here.

Material Covered

After reading this page, you should be able to answer the following questions:

(to be provided)


CIAD DM OV Presentation

The Presentation service provides a user interface for the data catalog, for data access, and for data representation.

It is specifically responsible for maintaining the catalog of information resources in the inventory.

CIAD DM OV Preservation

The Preservation service manages storing (persisting) and retrieving any kind of information in the ION system. This includes science data preservation, replication, and archival/backup.

These DM-provided infrastructure capabilities are an integral part of the ION infrastructure, e.g. as the persistence layer for all resource registries in the ION system.

Decomposition

Figure 1 shows the decomposition of the Preservation service into its constituent services and components.

Figure 1 Preservation Service (OV-2)

The History service acts as a data broker for preservation service. It identifies and delegates the data access (read/write) request to the right sub-service based on the requested data product, its direction (to or from the preservation service), and the origin and destination of data (e.g., in between facilities or inside a single facility). For instance, a request for storage of a particular data stream may go to the replication and local archival services at the same time; similarly, a "not-found" message from a local archive may trigger a "locate" message to the replication service through the history service to identify other replicas of the requested data.

The Replication service performs data replication through the federated system of facilities. This service provides the foundation for the distributed storage mechanisms offered by the preservation service.

From the perspective of the Cache service, the purpose of replication is to provide high speed availability of data in the physical installations of the CI subject to OOI policy.

From the perspective of the Backup service, the purpose of replication is to provide a mechanism to safeguard against information loss due to failure in a component or an installation of the system.

The Long Term Archive service provides a backup medium for indefinite offline storage as dictated by the OOI data retention policies. It provides the means for restoring original content in case of critical failures in the CI or any of its subsystems, at both the level of data/metadata and algorithms/applications.

The Persistent Archive service provides the access to the file system. In a deployed facility, it will reflect the local storage capabilities directly accessible to any subsystem service running on that physical node (e.g., local disk, NAS, SAS, or a federated file system). For deployment in a federated system the Persistent Archive service realizes access to and policy management of the virtual file system.

Figure 2 Persistent Archive (OV-2)

Below in Figure 3 is the view of the Cache Architecture. The cache handles the online storage of the data in the system. The Cache, PreservationArchive, CassandraPreservationArchive, and IRODSPreservationArchive are entities responsible for metadata needed to use the Cache system. To do this, they will be represented as resources in the OOI system, with a schema that is used to contain the metadata to access the backend storage.

A name of a resource within a namespace is guaranteed to be unique. Namespaces in the system are used by the preservation controller to access the cache. Within each preservation archive, the namespace can be used to indicate that the archive will be stored in another part of the backend storage. For Cassandra this could be a keyspace or column family; for iRODS this would be a zone.

Cache
1. Persistent Archive: The name of the Persistent Archive.
2. Partition: An internal partition in the archive, for example the name of the Column Family.
3. Partition Type: The type of the Archive, for example Cassandra Mutable Cache, IRODS Immutable Cache.

Persistent Archive
1. DataType: The kind of archive, for example Cassandra or IRODS.

Cassandra Preservation Archive
1. Hosts: The list of hosts in the Cassandra cluster.
2. Keyspace: The keyspace to use in the Cassandra cluster.
3. Replication Factor: The number of times data is replicated throughout the Cassandra cluster.
4. Column Family: The Column Family to store archives.

IRODS Preservation Archive
1. iRODS Zone: A virtual data community in iRODS, with a one-to-one correspondence to an iCAT catalog.
2. iRODS Hosts: The host on which an iRODS server is running. This is usually the iRODS iCAT server.
3. iRODS Port: The TCP port on which the iRODS server listens and accepts connections from clients.
4. iRODS Default Resource: The default storage resource in which OOICI stores data.
5. iRODS OOI Collection: The collection in which OOICI stores key-value data.
6. iRODS User Name: The user account used for storing OOICI data.
7. iRODS User Password: The password of the user account used for storing OOICI data.

Below is a complete schema when multiple iRODS zones are deployed.

Figure 3 Cache Architecture

Behavior

The Preservation Management Controller (PMC) uses IStore instances to archive datasets into Cassandra and iRODS. The PMC is responsible for tracking where the data is stored in the Cache, and uses a registry to keep track of this metadata.

The PMC will be implemented as an ION service which has its own registry and an IStore instance for each of the caches that it uses. Currently we have ideas for 3 caches:

1. Cassandra for immutable types
2. Cassandra for mutable types
3. IRODS for large immutable types, where "large" is roughly anything 512 MB or greater, or anything too large to put into Cassandra
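The cache-selection policy implied by this three-way split can be sketched as a small routing function. The 512 MB threshold and the three backends come from the text above; the function and cache names are illustrative assumptions.

```python
LARGE_BLOB_BYTES = 512 * 1024 * 1024  # "large" threshold from the text


def choose_cache(mutable, size_bytes):
    """Route a blob to one of the three hypothetical caches."""
    if mutable:
        return "cassandra-mutable"
    if size_bytes >= LARGE_BLOB_BYTES:
        return "irods-immutable"
    return "cassandra-immutable"


# Small immutable blobs stay in Cassandra; large ones go to iRODS
small = choose_cache(False, 4 * 1024)
large = choose_cache(False, 600 * 1024 * 1024)
```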

Technology Integration

Cassandra, iRODS

CIAD DM SV Cassandra Schema Specification

In Release 1, the Cassandra (see here for Cassandra TV) cluster which provides persistent storage to the ION Network must be preconfigured with the core schema used for the Data Store Service and the Association Service.

The required configuration is as follows:

Initial Diagram of deployment plan: Google Draw Diagram

Cassandra Details & Data Model (Needs update to include Secondary Indexing in Cassandra 0.7)

Cassandra Cluster:

The operations team manages the Cassandra cluster. The configuration still needs to be discussed; we recommend starting with a 9-machine cluster with a replication factor of 3.

We should use the RackAwareStrategy for replication.

Cassandra Cluster Properties


Cassandra Cluster Nodes

host | port | rpc_address (Thrift) | Seeds
Host's IP Address | 9160 | Host's IP Address | Another host's IP Address

The seeds can be one or more hosts in the cluster; this is how Cassandra discovers which nodes are in the cluster. The rpc_address and the listen_address should be set to the eth0 interface address.

Cassandra KeySpace:

The keyspace provided for the deployment should use the sysname provided for the messaging.

KeySpace Properties

Name | Partitioner | Replication Factor | Replication Strategy
(sysname) | Random* | 2 | Rack Aware

Cassandra Column Families within Keyspace:

Two column families must be provided in the keyspace when the system is started - one for blobs and one for commits.

Column Family Properties

ColumnFamilyName (Key) | key_cache | row_cache | key_cache_save_period | row_cache_save_period | type
blobs | cache all keys | default 0 rows (Default) | 3600 seconds (Default) | 0 | Standard
commits | cache all keys | cache all rows | 3600 seconds (Default) | 0 | Standard

Other properties are comparator, subcomparator_type, read_repair_chance, gc_grace_seconds, min_compaction_threshold, max_compaction_threshold, memtable_flush_after_mins, memtable_throughput_in_mb, and memtable_operations_in_millions. We will use the default values for these properties.

Blobs Column Family Columns

The key for this column family is the sha1 of the blob.

Column Name | Indexed | Description
value | No | element that contains value, sha1, isLeaf and content (blob)

Commits Column Family Indexed Columns:

The key for this column family is the sha1 in the value column.

Column Name | Indexed | Description
value | No | element that contains value, sha1, isLeaf and content (blob)
repository_key | yes | uuid
repository_branch | yes | uuid
subject_key | yes | uuid
subject_branch | yes | uuid
subject_commit | yes | Binary sha1
predicate_key | yes | uuid
predicate_branch | yes | uuid
predicate_commit | yes | Binary sha1
object_key | yes | uuid
object_branch | yes | uuid
object_commit | yes | Binary sha1
keyword | yes | Text representation of the predicate

A commit is a data structure with type, sha1, isLeaf, and the binary content. The type denotes the registry to which the commit belongs. The sha1 is a SHA1 hash of the binary content. The isLeaf flag denotes whether the commit has any children in its tree structure. The content is the content of the commit; it represents the change or diff of the versioned object.

The repository is the mechanism used to store versioned objects. A repository is identified by a uuid. A repository can have branches which are also identified by a uuid. A branch provides the functionality to create another version control history of the object.

Questions:

When should a service create a new repository? When should a service use an existing repository? How do we remove repositories when they are no longer needed?

The Cassandra schema combines our version control system with an RDF-like data structure. We refer to the RDF-like structure as associations, which form an RDF triple of subject, predicate, and object. Each commit can have a versioned RDF triple associated with it. The RDF triple allows us to perform faceted search, for example, to find all of the datasets owned by David Stuebe that are model runs.
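The faceted search over associations can be illustrated with plain triples (all identifiers below are hypothetical):

```python
# RDF-like association triples: (subject, predicate, object)
associations = [
    ("dataset:42", "ownedBy", "user:dstuebe"),
    ("dataset:42", "hasType", "model_run"),
    ("dataset:77", "ownedBy", "user:dstuebe"),
    ("dataset:77", "hasType", "observation"),
]

def find(subject=None, predicate=None, obj=None):
    """Return all triples matching the given facets (None = wildcard)."""
    return [t for t in associations
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Faceted query: datasets owned by dstuebe that are model runs.
owned = {s for s, _, _ in find(predicate="ownedBy", obj="user:dstuebe")}
model_runs = {s for s, _, _ in find(predicate="hasType", obj="model_run")}
assert owned & model_runs == {"dataset:42"}
```

In the real schema each facet column (subject_key, predicate_key, object_key, etc.) is indexed, so such intersections become indexed lookups rather than scans.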

Questions: Does a service need to create a new commit every time it wants to add or change an RDF triple? How does an object like a dataset have multiple associations?

How do I map a resource-id to a repository or repositories?

CIAD DM SV Content Addressable Store

A Content Addressable Store (CAS) is one specific implementation of a Key-Value-Store (KVS). It is an implementation of a DM Preservation service backend.

CAS Overview

TBD

Cassandra CAS Reference Implementation

Cassandra Storage Model Configuration

- Keyspace is the name of the OOI instance, e.g. production1, staging5 or michael3.
- We use a (standard) ColumnFamily for the KVS of that OOI instance "data store". Note, there might be other uses of Cassandra for the same instance, such as for application indices, science data store, OLAP, etc.
- The key for each row in the CF is the SHA1 hash of the content (i.e. we have a CAS store).
- The column name is the "namespace" of the data store, e.g. different for different types of resources ("services", "instruments", etc.). TBD what namespaces we want to have.
- The column value is the value of the blob, or a "reference name blob" into another data store (e.g. iRODS).
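This row layout can be sketched with nested dictionaries standing in for keyspace, column family, rows, and namespaced columns (the instance and namespace names are illustrative):

```python
import hashlib

# keyspace -> column family -> row key (sha1) -> {namespace: value}
keyspaces = {"michael3": {"datastore": {}}}

def cas_put(keyspace, cf, namespace, content):
    """Store content under its SHA1 row key, in the given namespace column."""
    row_key = hashlib.sha1(content).hexdigest()
    row = keyspaces[keyspace][cf].setdefault(row_key, {})
    # The value may also be a reference blob into another store, e.g. iRODS.
    row[namespace] = content
    return row_key

k = cas_put("michael3", "datastore", "services", b"service description blob")
assert keyspaces["michael3"]["datastore"][k]["services"] == b"service description blob"
```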

Reasoning:

- Rows are distributed evenly across all nodes in the cluster (for the RandomPartitioner).
- Rows are automatically kept sorted by criteria (binary/string sort). Access to any row is O(1) no matter how many rows there are.
- Columns per row are kept sorted by criteria (binary/string sort).
- For a key-value store, we need neither key-range nor column-range queries, but we do need even row distribution over nodes.
- Having a 3-layer structure (2 chars of hash, next 2 chars of hash, rest) does not buy us anything, because Cassandra keeps the entries in the CF sorted anyway and a lookup is an O(1) function.

Statements:

For a CAS store, the "timestamp" feature of Cassandra columns is unused. Because entries are immutable, the timestamp has no effect in replacing columns with different values.

iRODS CAS Reference Implementation

TBD

CIAD DM SV Persistence Architecture

Summary

Participants

Michael Meisinger, Claudiu Farcas, David Stuebe, Matt Rodriguez, Paul Hubbard, Maurice Manning

Architecture

Level 3:

The Resource Registry is implemented by using three or more Level 2 clients to store and retrieve (Cassandra objects, index, iRODS objects).

Level 2:

Specialized services such as the Cache (read: online store), the Archive (read: partially offline store) and others use a configured persistent archive as a backend to implement their specific behavior.

Level 1:

The Persistent Archive service is an interface with different implementations for different persistence technologies. The Persistence Manager service manages all persistent archives defined in the system.

Level 0:

Persistence Technologies (Cassandra cluster, iRODS zone) are standalone systems. Specific services provide management: Cassandra Management service, etc.

R1 Implementation Design

Level 3:

The data store service uses git semantics (push, pull, fetch) to expose mutable objects to higher level services. The data store uses a storage client interface to provide these operations on different storage technologies. The data store will use managed connections to three separate Cache (Online Store) instances: one for blobs, one for commits and one for mutable references.

The resource registry services support the registration of resources and resource types in the system using both the Association service and the data store service. The Association Service creates and finds associations between resources. The Data Store Inventory Service is called by the Data Store Service (DSS) and the Association Service for internal bookkeeping.
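The Level 3 design above, with immutable CAS stores for blobs and commits plus a mutable reference store, can be sketched as follows (method names such as push and fetch_head are hypothetical simplifications of the git semantics):

```python
import hashlib

class DataStore:
    """Sketch: two immutable CAS stores, one mutable reference store."""
    def __init__(self):
        self.blobs = {}    # sha1 -> blob content (immutable)
        self.commits = {}  # sha1 -> commit record (immutable)
        self.refs = {}     # (repo_uuid, branch_uuid) -> head commit sha1 (mutable)

    def push(self, repo, branch, content, parent=None):
        blob_key = hashlib.sha1(content).hexdigest()
        self.blobs[blob_key] = content
        commit = b"commit:" + blob_key.encode() + b":" + (parent or "").encode()
        commit_key = hashlib.sha1(commit).hexdigest()
        self.commits[commit_key] = commit
        self.refs[(repo, branch)] = commit_key  # only the branch ref mutates
        return commit_key

    def fetch_head(self, repo, branch):
        return self.commits[self.refs[(repo, branch)]]

ds = DataStore()
c1 = ds.push("repo-uuid-1", "branch-uuid-1", b"v1")
c2 = ds.push("repo-uuid-1", "branch-uuid-1", b"v2", parent=c1)
assert ds.refs[("repo-uuid-1", "branch-uuid-1")] == c2
```

Separating the mutable references from the immutable CAS content is what lets two of the three stores remain append-only.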

Level 2:

- A Memory Storage Resource is mapped to (realized as) a particular partition of a Persistent Archive Resource.
- A Persistent Archive is realized by a completely independent partition of a Persistence Technology which has credentials (a domain of authority) associated with it. There may be many Persistent Archives for different systems within one Cassandra cluster.
- The Persistence Manager Service manages all persistent archives defined in the system.

Level 1:

A Persistent Archive Service provides a common interface to operations such as defining a repository, startup and data modeling. Different implementations are created for supported persistence technologies.

Level 0:

The Cassandra Cluster Management Service provides day-to-day operational capabilities: policy definition, error handling, and monitoring.

CIAD DM SV R1 Persistent Archive Service

Detail Persistent Archive Resource Data Model

Persistent Archive Service

Detail OP interface and specify message objects

CIAD DM SV Virtual File Store

A Virtual File Store is one implementation of the DM Preservation service.

Its main purpose is to efficiently store and retrieve large chunks of binary data (blobs, files) that are referenced by name (path, etc.). The name reference is always location independent.

See Also

iRODS Storage Technology.

CIAD DM TV Cassandra

See Also

Apache Cassandra

Domain Models

Data Model

- Keyspace: analogous to a schema or a database (set of tables) in an RDBMS.
- ColumnFamily: analogous to a table in an RDBMS. Operations are only atomic within a ColumnFamily.
- Column: a name, value, timestamp triplet. Columns are ordered by name according to type.
- SuperColumn: a name and a list of subcolumns (each having a name, value, timestamp).
- Row: an entry in a ColumnFamily, identified by a key, that has Columns or SuperColumns. Rows are ordered according to key type.

Cassandra Architecture

Client, Data Access

- Clients can connect to any node in the Cassandra cluster. All nodes in the cluster are equal.
- Nodes proxy to other nodes if necessary; the proxy node (the one the client sent the request to) is known as the "coordinator node".
- Clients can specify read and write consistency levels (ONE, QUORUM, ALL) for each operation. Note: this does not affect the number of replicas written, only the number of replicas accessed.
- For less than ALL consistency level, reads may return weakly consistent data. This means that the first read for one column may be outdated, but subsequent reads will not be.
- The Cassandra cluster achieves consistency on read. When data rows are read, the replicas are made consistent (immediately for QUORUM or ALL level, asynchronously later for lower levels).
- Data read applies read repair. Data is taken from one replica but timestamp information is retrieved from all replicas. If multiple "versions" of column entries exist, the most recent timestamp wins, and all less recent columns are updated.
- If data rows are never read, an infrequent (~weekly) anti-entropy service makes data rows consistent.
- Clients must synchronize their time.
- Reads are more expensive than writes.
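The timestamp-wins read repair described above can be sketched with a toy model (three replica dictionaries stand in for replica nodes; the real mechanism operates per column across the cluster):

```python
# Each replica holds column -> (value, timestamp).
replicas = [
    {"temp": ("20.1", 100)},  # stale
    {"temp": ("20.4", 250)},  # most recent write wins
    {"temp": ("20.1", 100)},  # stale
]

def read_with_repair(column):
    """Return the most recent value and repair less recent replicas."""
    value, ts = max((r[column] for r in replicas), key=lambda vt: vt[1])
    for r in replicas:
        if r[column][1] < ts:          # read repair: overwrite stale versions
            r[column] = (value, ts)
    return value

assert read_with_repair("temp") == "20.4"
assert all(r["temp"] == ("20.4", 250) for r in replicas)
```

This also illustrates why clients must synchronize their time: the repair decision is driven entirely by the client-supplied timestamps.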

Server/Cluster Configuration

- One Cassandra server is a node.
- Selected nodes are seeds: they start a named cluster. Additional nodes "bootstrap" into a cluster.
- All successfully bootstrapped nodes in a cluster share a 128-bit token space. This is a namespace into which all data keys are mapped, similar to a SHA1 hash for arbitrary keys.
- The token space is organized as a ring. One node has one exact place (token) on the token space ring. The node owns all entries in the token space from (exclusive) the token of the node immediately before it in the ring up to (inclusive) the current node's token.
- The primary copy (replica) of a data row is placed on the node that owns the segment of the token space ring for the row's key.
- The cluster has a configurable replication factor: the number of copies (replicas) every data row has.
- The partition strategy mapping row keys to the token space is configurable. It can be random or ordered (with different properties).
- The replica placement strategy is configurable, i.e. the placement of replicas additional to the primary replica. Typical options are placement on subsequent nodes on the ring, or placement on subsequent nodes of the ring in different data centers and racks. It is smart to alternate nodes that are subsequent in the ring between data centers.
- A node that automatically bootstraps into a cluster gets an assigned position in the ring (by load) and then gets the data for this segment transferred.
- If a node becomes temporarily unavailable, other nodes can store information in its stead (hinted handoff).
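The token-ring ownership rule can be sketched as follows; a tiny 16-bit token space stands in for Cassandra's 128-bit one, and the node names and tokens are illustrative:

```python
import bisect
import hashlib

SPACE = 2 ** 16  # toy token space (Cassandra's is 128-bit)
nodes = {1000: "node-a", 30000: "node-b", 55000: "node-c"}
tokens = sorted(nodes)

def owner(key, replication_factor=2):
    """Primary replica on the token owner, extra replicas on subsequent ring nodes."""
    token = int(hashlib.sha1(key.encode()).hexdigest(), 16) % SPACE
    # A node owns (previous node's token, its own token]; wrap around the ring.
    i = bisect.bisect_left(tokens, token) % len(tokens)
    return [nodes[tokens[(i + n) % len(tokens)]] for n in range(replication_factor)]

primary, secondary = owner("some-row-key")
assert primary != secondary and {primary, secondary} <= set(nodes.values())
```

The `% len(tokens)` wrap is what makes the first node own everything above the last token, closing the ring.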

CIAD DM TV GIT

"Git is a free & open source, distributed version control system designed to handle everything from small to very large projects with speed and efficiency." IT is used by many open source software development projects, such as the Linux Kernel and by the OOI CI development team itself.

GIT is based on a powerful Content-Addressable-Store (CAS) repository model to keep track of file contents and their structuring into directory trees of files, in their sequence of commits over time.
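For comparison with the data store's plain SHA1-of-content keying, git's CAS hashes a small type-and-size header together with the content; this is git's actual blob object id scheme:

```python
import hashlib

def git_blob_id(content):
    """Compute a git blob object id: SHA1 of 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

oid = git_blob_id(b"hello")
assert len(oid) == 40
assert oid == git_blob_id(b"hello")   # deterministic: same content, same id
assert oid != git_blob_id(b"hello!")  # any content change yields a new id
```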

See Also

GIT
Pro Git: Explains the data model and use of GIT

CIAD DM TV iRODS

- Technology Overview, White Papers, History, Roadmap:
  - History (iRODS_SRB_history.pdf)
  - Technology Overview presentation (IRODS_Reagan_ILM-08-04-30.ppt)
  - White Papers (DICE_iRODS_White_Paper-08.pdf)
- Introduction to iRODS
- User Manual
- Developer Manuals:
  - https://www.irods.org/index.php/iRODS_Browser
  - https://www.irods.org/index.php/icommands
  - https://www.irods.org/index.php/Web_Client
  - https://www.irods.org/index.php/Jargon
  - https://www.irods.org/prods_doc/
- OOI Integration Strategy (iRODS_OOI_integration.pdf)

CIAD DM OV Transformation

Transformation Service

The Transformation service (depicted in Figure 1) handles content format transformation, mediation, qualification, verification and validation.

Figure 1 Transformation Service (OV-2)

The Representation Conversion service implements the syntactic transformations of data streams from foreign/unknown/non-standard formats into OOI canonical formats. There could be multiple canonical formats to accommodate various scientific ontologies. It operates in close collaboration with the Mediation service.

The Mediation service handles semantic mediation between the formats of the incoming data and the OOI canonical formats. Along with the Format Conversion service, it provides the mechanisms to support data management services under a variety of science ontologies and transport/storage formats.

The V&V service allows for verification and validation of all transformation steps involving the incoming data. It is also a centerpiece of the information assurance mechanisms established at the level of the Data Management subsystem.

For scientific data, the Exchange bus facilitates the integration of services from other subsystems. For instance, the QA/QC service provides quality assurance and compliance checking on incoming data streams according to specific OOI policies regarding data quality. The Calibration service provides mechanisms to perform data calibration according to specific scientific needs and established standards (e.g., regarding acceptable ranges, deviations, etc). These are not part of the Transformation service, but can be chained to it via the Exchange bus.

In addition, the message flow can be redirected through the Data Parser service to perform a deeper inspection of the messages carrying scientific data than is done at the level of the Ingestion Service. The purpose is the decomposition of the data stream into individual pieces of information, such as variables and specific data values. The syntax and semantics of the data products being analyzed in this service are crucial for a correct decomposition. When the data come in a format different from the OOI canonical one, the Data Parser service may selectively invoke the Format Conversion and/or Mediation services. Also, additional Metadata Extraction is possible by analyzing (L2) metadata of the datastream regarding the transformations applied to the incoming data product.
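The chaining of transformation steps described above can be sketched as a pipeline of message transforms. This is illustrative only: the real services communicate over the Exchange bus rather than by direct calls, and all function and field names here are hypothetical.

```python
def representation_conversion(msg):
    msg["format"] = "ooi_canonical"   # syntactic transformation into a canonical format
    return msg

def mediation(msg):
    msg["ontology"] = "ooi_core"      # semantic mediation against a mediating ontology
    return msg

def verification_and_validation(msg):
    msg["validated"] = msg["format"] == "ooi_canonical"
    return msg

def transform(msg, steps):
    """Apply each service in order, as if chained over the Exchange bus."""
    for step in steps:
        msg = step(msg)
    return msg

out = transform({"format": "netcdf3", "payload": "..."},
                [representation_conversion, mediation, verification_and_validation])
assert out["validated"] and out["ontology"] == "ooi_core"
```

Because each step takes and returns a message, additional services such as QA/QC or Calibration can be spliced into the chain without changing the others.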

CIAD DM OV User Interfaces

The user interfaces to each of the logical services in Data Management are based on the COI Presentation Framework.

Figure 1. DM User Interfaces (OV-2)

Table 1 User Interfaces and User Application Support

ID    | User Interface                                | Supported User Applications and Purpose
DMUI1 | Data Access and Query Interface               | Interface to query and browse data, data products, and other information products together with any associated metadata and cross-referencing information. Enables iterative search and refinement of queries.
DMUI2 | Metadata Association and Attribution Interface | Interface to associate and modify metadata and attribution to any kind of information product at any time in the life cycle of such products. This also includes establishing correspondence and cross-referencing links.
DMUI3 | Data Transformation and Mediation Interface   | Interface to define syntactic data transformation and semantic mediation based on available and recognized data formats and content standards, metadata standards and ontologies, and mediating ontology mappings.
DMUI4 | Data Processing Interface                     | Interface to define data transformation workflows and associated processes by selecting them from the list of available workflows and processes. This includes access to existing standard tools, processes and capabilities, third-party provided tools and user-defined specific processes.
DMUI5 | External Dataset Interface                    | Provides an interface to integrate an external time series or derived data product into the catalog of the OOI for ingestion and/or use by OOI users.

For Release 1, DM user interface applications are centered on data access and query functions. The table below outlines some of the user activities that are supported and provides links to screens that provide an example visualization. The functional specification for the Release 1.0 User Interface is here.

User Task                      | Description                                                                                                                  | Screen
Find Data Resource of Interest | Browse the available data resources using search-by-navigation and query by spatial and temporal boundaries, and locate a particular data resource of interest. | Data Resource List Workspace
Examine Simple Data Resource   | Examine the basic properties of a given data resource to understand whether it is of interest.                               | Data Resource Detail Workspace
Download Data                  | Download data from a specified data resource.                                                                                | Data Resource Download
Subscribe to a Data Resource   | Sign up to receive notifications regarding events related to a particular data resource.                                     | Notifications Setting List Workspace
Monitor Subscriptions          | Check notifications received, and review active subscriptions.                                                               | Notification Setting List Workspace
Publish Data Resource into the OOI | Register a new NCF-compliant data resource in the OOI system.                                                            | External Data Resource Workspace
Monitor Publications           | Review the status of data resources registered in the OOI system by the user.                                                | Data Resource List Workspace

CIAD SA Sensing and Acquisition

Sensing & Acquisition (SA) Subsystem Architecture and Design

This is the central page for the SA subsystem architecture and design, a part of the OOI Integrated Observatory. It is structured into operational views (OV), system views (SV) and technical standards views (TV). The Instrument and Platform Agent Architecture (IPAA) is part of this design.

The Sensing and Acquisition Subsystem will implement the services, specifications and user interfaces for individual instrument and physical device management and control, for OOI-wide sensor network (observatory) management and control, and for advanced coordination of entire observation missions and the entire observatory. The Sensing and Acquisition subsystem will also provide capabilities for data logging, data acquisition, data processing and data product generation and access. All physical and information resources managed by the SA services are fully described by metadata and accessible.

SA OV Overview
SA OV Domain Models

Sensing & Acquisition Subsystem Services

Observatory and Platform Management (focus of R2, R3)
- Observatory Management
- Marine Facility
- Marine Platform Services
- Marine Resource Scheduling

Instrument and Physical Device Management
- Instrument Management Services
- Instrument Activation
- Direct Access to Instruments and Platforms

Data Logging and Data Acquisition
- Data Acquisition

Data Processing and Data Products (focus of R2, R3)
- Data Processing
- Data Product Registration and Generation
- Data Product Activation
- Data Calibration Services
- Data Validation Services

Cross-Cutting Concerns
- User Interfaces
- Technology Mapping

Instrument and Platform Agent Architecture (IPAA)

- Instrument and Platform Agent Architecture
- Instrument Agent and Driver Integration Interfaces (SV)
- See also: Resource Agent (CEI)
- See also: Instrument Management Service for commanding the instrument
- Instrument Driver Framework
- Instrument/Platform Device Life Cycle
- Instrument Agent Interface (SV)
- Instrument Driver Design (SV)
- Instrument Driver Interface (SV): this is the agent service provider interface

Instrument Development Kit

Quick Links

Subsystems: COI CEI DM SA AS PP

CIAD SA OV

The Sensing and Acquisition (SA) subsystem is responsible for providing the life cycle and operational management of sensor network environments as well as observing activities (i.e., scheduling, collecting, processing, calibration) associated with sensor data acquisition. It provides the higher level coordination of these assets through observatories.

Capabilities

The Sensing and Acquisition Subsystem will provide the following capabilities:

- Data acquisition, buffering, and transport mechanisms to get data to the shore into DM archives
- Data processing and initial data QA/QC on platform and on shore
- Maximize total data return from all instruments
- Provide data & event processing capability at the platform
- Prioritize data delivery
- Command and control systems for instruments and instrument platforms
- Instrument test and certification
- Instrument registration with associated metadata
- Tracking of instrument ancillary data
- Manage and allocate resources to instruments
- Manage and allocate resources from instruments
- Provision storage and processing at the platform

Exemplar End-to-End Scenario

Registration with the CI - Once the intent to purchase an instrument has been declared, the instrument instance, its make and all known metadata to date are registered with the CI's Instrument/Platform Registry by the instrument-providing PI through an OOI-CI portal. The initial resource life cycle state is tracked as "planned". From this point on, the instrument, although not yet physically existing and deployed, is registered with the CI and will be tracked through its life cycle from "planned" to "developed" to "bench test completed" to "system test completed" to "ready for deployment" to "operational" to "decommissioned".
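The life-cycle progression named above can be sketched as a simple state machine; the forward-only constraint is an assumption for illustration, and the class name is hypothetical:

```python
# Life-cycle states exactly as named in the registration narrative.
LIFECYCLE = ["planned", "developed", "bench test completed",
             "system test completed", "ready for deployment",
             "operational", "decommissioned"]

class InstrumentRegistration:
    def __init__(self):
        self.state = "planned"  # initial resource life cycle state

    def advance(self, new_state):
        # Assumption: states may only advance one step forward at a time.
        if LIFECYCLE.index(new_state) != LIFECYCLE.index(self.state) + 1:
            raise ValueError("invalid transition: %s -> %s" % (self.state, new_state))
        self.state = new_state

reg = InstrumentRegistration()
reg.advance("developed")
reg.advance("bench test completed")
assert reg.state == "bench test completed"
```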

PI's development and local testing of the instrument - Once the instrument has been purchased, the owner of the instrument (PI) connects physically to the instrument (plugging a computer directly into the instrument via RS232, Ethernet, USB, etc.) to make sure that it is up and running. The user may use the manufacturer's software to calibrate and configure the instrument initially. During this process the owner will record metadata and advance the life-cycle state of the instrument registration to "developed".

CI Operator bench testing of the instrument - Once the owner is content with the performance of the instrument, it is ready to be tested with the CI system. The instrument will be sent to the CI operator. The CI Operator, in close collaboration with the instrument owner, uses an instrument test kit to check the interoperability of the instrument with the CI system. The CI operator will set up a (primary) instrument agent with suitable drivers and a second proxy instrument agent, acting as instrument supervisor. The primary instrument agent translates observatory commands to physical instrument commands via the embedded instrument driver and can be deployed very proximate to the physical instrument, for instance on the hosting buoy platform. The supervisor agent acts as command clearinghouse and gateway and is deployed within the OOI-CI system on the shore. It checks all control requests for compliance with policy and passes valid requests on to the primary agent when resources are available. The CI operator will update the instrument listing in the instrument registry and update the metadata in its repository. The CI operator also will register the sample of the data stream provided by the owner that was captured during the local testing phase (see below, data acquisition). Based on this information the CI operator will initially turn on the instrument and send test commands via the system to the instrument supervisor, which relays them to the instrument agent. The instrument agent maintains the control state of the instrument. During this process the owner will record further metadata and advance the life-cycle state of the instrument registration to "bench test completed".

Activation: System Test - This step is not realized in release 1. Once the instrument is fully bench tested, it is ready to be connected to one of the OOI-CI network's Instrument Test Facilities, system test installation sites at one of the Marine Observatories. This testing includes wet testing at the Marine Observatory facility. By monitoring the control state of the instrument, the instrument supervisor agent will validate its operation, set the instrument to the requested initial configuration through the same pipeline as before, and activate the instrument in the OOI network. The instrument will be re-calibrated in the new location (deployed location on the network). During this process the owner and Marine operator will record further metadata and advance the life-cycle state of the instrument registration to "system test completed".

Activation: Deployment - This step is limited in release 1 (no wet deployment). Once the instrument is system tested, it is ready to be deployed at the target location. The instrument is already fully connected to the OOI-CI network, the pipeline of instrument agent, supervisor agent and observatory is set up and tested, and all registrations have occurred. The instrument will be deployed and re-calibrated in the new location (deployed location on the network). During this process the Marine operator, with the owner, will record further metadata and advance the life-cycle state of the instrument registration to "operational".

Direct access to the instrument - Once activated and connected to the OOI-CI system the instrument owner can log on via the network to the OOI CI portal and request a direct access channel session to the instrument and the platform it is physically connected to. The system will provide the owner with direct and uninterfered serial and IP channels to the instrument and with terminal access to the hosting platform, e.g. a buoy controller. During this session, the instrument owner can make required changes and close the session when done. If the session is idle too long, it will be closed by the observatory automatically.

Observatory mode access to the instrument - Once all necessary changes, fine-tuning and checks have been finished with the instrument pre-deployment in direct access mode, the instrument agent can be switched to observatory mode. In this case, all instrument control flows through the instrument supervisor agent, which relays to the primary instrument agent, which translates control commands to instrument commands. In the initial development stages of the OOI-CI system, not all command, control and monitoring activities might be possible via the instrument agents in observatory mode. The fallback to direct access mode is then possible for the instrument operator. When entering and leaving observatory mode, the instrument agent loads and saves the current instrument configuration.

Data Acquisition - The instrument owner with support from the CI operator defines data products with the Data Product Registry for the instrument for science data acquisition, instrument state/configuration and instrument state-of-health engineering data. All these data products are defined with metadata about content and structure of the data products, which are realized as data streams with continuous updates. New science data and engineering data packets are sent to the CI messaging system on channels for pre-defined data products.

Instrument Control Scheduling - Once the instrument is installed in the network there will be many users who will be interested in collecting data using this instrument or controlling it directly for reconfiguration. A resource scheduler, maintained by observatory management services and by the instrument supervisor agent, will schedule time for each interested user with a time allocated for their deployments. After each operation by a different user the instrument should be re-configured to the default state by the operator. During the allocated time the user will have access to the activity history of the instrument and its current state. The data processing services will have tools to capture data as well as process them. Data can be generated through polling or in a continuous manner on pre-defined data streams. In all these cases the state, metadata and the data will be time stamped and stored in the network as updates to engineering data streams. Interested users can register to updates and notifications for any of these data streams.

Failure Detection and Repair - Failures can be detected by different parties connecting to the instrument (owner, operator, users, and the system itself). Failures can be detected from the communication channels, from the quality of data, and through processed data. The owner could also detect a failure when connecting directly to the instrument. The user interface for each party will have a feature to report any failures observed and to register for any class of failures and expected state changes. Each user who has registered for notifications or was scheduled to work on the instrument will be notified of the state change. Once the instrument is repaired, it will go through all the steps again; the registries and instrument-specific services will then be activated as before.

De-Activation - Similar to the activation process, there are many steps involved in decommissioning the instrument. The instrument will be turned off, and all the registries as well as the instrument-specific processing services will be updated to reflect the de-activation of the instrument.

Data Processing - The instrument owner with support from the CI operator defines data processes with the Data Process Repository. A data process is a script or piece of code that can be repeatedly applied to data sample packets in order to produce a more qualified data product. Examples include automated QA/QC filters and annotations, engineering unit transformations, calibrations, and real-time stream segmentation. The result of a data process is another data product (i.e. data stream) or an augmented data product. The actual processing, i.e. which script is invoked how often on which input data producing which output data is configured with the Data Processing service. All data products need to be defined in the Data Product Registry before Data Processing can be defined.

Services Decomposition

The Sensing and Acquisition Services support instrument control and data acquisition activities. The Marine Observatory may include diverse sensor, actuator and mobile platform entities at multiple, dispersed physical locations. Figure 1 illustrates a high-level view of the Sensing and Acquisition subsystem services and the data flow between them.

Figure 1: Sensing and Acquisition Overview (OV-1)

Figure 2 shows a decomposition of the S&A subsystem into its main constituent services, and their interfaces to external systems and other subsystems.

Figure 2: Sensing and Acquisition Services (OV-2)

The Instrument device model consists of one or more physical sensors or actuators, and is represented in the Cyberinfrastructure by a logical device, the Instrument Agent. The physical device provides sensor data and status information to the Instrument Agent, and receives configuration information and commands from the Instrument Agent. The Instrument Agent communicates with the physical device via a Device Port, which is controlled by the Cyberinfrastructure via a Port Agent. The Port Agent supervises the port communications and power. The physical devices, as well as the device ports themselves, are powered on or off in a graceful manner, according to the requirements for power management. Each Instrument Agent has an Instrument Supervisor that is responsible for monitoring the state of health of the Instrument. Each platform has a Platform Agent that is responsible for controlling all devices on that platform (Instruments and Ports altogether). The Observatory Management is responsible for coordinating all observatory resources to ensure their safe operation.

Information about instruments is kept in an Instrument Repository; instruments have associated metadata that can be invariant (e.g., model, manufacturer, specs) or variant (e.g., location). The Observatory Management operational node is responsible for registering the invariant instrument metadata, whereas the Instrument Agent itself is responsible for publishing the variant instrument metadata.

The Platform Agent is responsible for registering and validating instruments and detecting resource conflicts at runtime or at registration. It schedules the data acquisition as specified in the observation plan produced by the Planning and Prosecution Services or according to the commands received directly from the operators via the Observatory Management node. The Planning and Prosecution Services provide activity scheduling for available resources under specified constraints for an upcoming mission, such as scientific ocean observation involving fixed and mobile resources. The Platform Agent negotiates with the Planning and Prosecution Services Network through a service agreement proposal protocol in order to agree on the resource allocations and partial plans. In addition, the Platform Agent can override the observation plans according to the new state of an instrument or in response to events from other instruments. It also sends instrument status information to the Planning and Prosecution Services Network for higher-level decision making and planning.

A different section covers details about Instrument and Platform Agents.

Observatory Management coordinates and protects the assets of the observatory, and is therefore responsible for fulfillment, assurance, and reconciliation. Fulfillment services include resource activation, configuration, simulation, and testing. Assurance services include state-of-health monitoring, fault detection and recovery, and quality assurance. Assurance is related to checking that all resources that are online meet their Service Level Agreements. Reconciliation services include billing and resource conflict mediation. To manage resource conflicts, Observatory Management provides services to project resource constraints and requirements, the consumption of resources, and the impact on the environment. Besides the actual instruments, the observatory resources include all of the resources that support the instrumentation: communications, power, deployment platforms, instrumentation environment, etc. Management of the instrument environment includes the acoustic and optical spectrum, the electromagnetic and chemical environment, the positioning of instruments on deployment platforms, as well as the placement of platforms relative to each other. The complexity of managing observatory resources varies from one type of observatory to another (e.g., cabled observatory with static structure, moored buoy observatory with mobile profilers, or observatories with AUVs and gliders). Nevertheless, Observatory Management provides a uniform presentation framework for all resources that need to be managed.

Maintenance and calibration requests are submitted by users to Observatory Management. The infrastructure provides users with the capability to configure the calibration of their instruments and the processing steps to produce data products.

The Instrument Agent uses Data Acquisition to obtain the observed data from instruments; these data are stored in the repository. The observed data then go through several processing steps (such as calibration, event detection, and QA/QC) that ultimately produce data products. Data processing operations include publishing information in the repository about the processes and calibrations used to produce data products, as well as maintaining all of the relationships between data products. Furthermore, state estimation and health monitoring at different levels may require functionality from the data processing component. For example, the Instrument Supervisor monitors observed state changes at the Instrument Agent, but also estimates the next state of the instrument by analyzing the observed data acquired by the instrument.

Event detection capabilities allow instruments to signal events to users and/or other instruments. Combined with the planning services of the Platform Agent, these events enable the coordinated control of instruments in response to detected conditions. Such a feature supports integrated multidisciplinary science experiments. This capability is particularly important for global or coastal observatories with limited-bandwidth communication to the shore.

Domain Models

The domain model depicted in Figure 3 shows the relationship between physical instruments, their hosting instrument platform, and their representation as virtual resources (instrument agents, platform agents) within the CI integrated observatory environment.

The core concept expressed in the model is that the communication between the Instrument Agent and the physical Instrument is mediated by an Instrument Strategy that implements the appropriate Instrument Dependent Communication Protocol. In the same way, physical ports are represented in the Cyberinfrastructure as Port Agents, and the communication between them is mediated by a Port Strategy. The Platform Agent is the entity controlling all Port Agents, Instrument Agents, and Instrument Supervisors associated with physical devices deployed on that particular platform.

A significant difference between instruments and ports is that instruments fire events, whereas ports do not notify or send events, mostly for safety reasons involving high-power instruments. Instead, the Platform Agent scans the ports for conditions.
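Because ports never push notifications, the Platform Agent's port supervision is necessarily pull-based. The following sketch illustrates that scanning loop; all class and method names here are hypothetical, not the actual OOI interfaces.

```python
# Illustrative sketch: ports do not emit events, so the Platform Agent
# periodically polls each Port Agent for its condition (pull-based check,
# e.g., driven by a timer rather than by notifications).

class PortAgent:
    def __init__(self, name, current_amps, limit_amps):
        self.name = name
        self.current_amps = current_amps
        self.limit_amps = limit_amps

    def condition(self):
        """Ports answer queries but never push notifications."""
        return "overcurrent" if self.current_amps > self.limit_amps else "ok"

class PlatformAgent:
    def __init__(self, ports):
        self.ports = ports

    def scan_ports(self):
        # Query every port and collect its current condition.
        return {p.name: p.condition() for p in self.ports}

platform = PlatformAgent([PortAgent("J1", 0.4, 2.0), PortAgent("J2", 2.5, 2.0)])
print(platform.scan_ports())   # {'J1': 'ok', 'J2': 'overcurrent'}
```

A real implementation would trigger safety actions (e.g., power-cycling a port) on a bad condition; here the scan simply reports the state of each port.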

Figure 3 Instrument Platform Representation Domain Model (OV-7)

The domain model from Figure 4 shows the internal entities of an Instrument Agent. The behavior of the instrument is described as a finite state machine (FSM) model; during its lifecycle, an instrument switches states depending on the events received from the environment or on the commands received from the Platform Agent (which receives instructions from Observatory users, as well as from the Mission Planning and Prosecution subsystem). The set of commands is not uniform across the entire lifecycle of the instrument, as an instrument can accept different commands in different states. In other words, a resource can have different capabilities during its life cycle (e.g., before activation it may only be able to self-diagnose). Moreover, the state model of an instrument can be changed during the lifecycle, as long as the new state model satisfies all the event dependencies.
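The state-dependent command set described above can be sketched as a small finite state machine. The state and command names below are invented for illustration; the actual OOI instrument state model is defined elsewhere in this document.

```python
# Hypothetical sketch of an instrument FSM whose accepted commands depend on
# the current lifecycle state; e.g., before activation only self-diagnosis
# and initialization are available.

class UnknownCommand(Exception):
    pass

class InstrumentFSM:
    # Map each state to the commands it accepts and the resulting next state.
    TRANSITIONS = {
        "uninitialized": {"self_diagnose": "uninitialized",
                          "initialize": "inactive"},
        "inactive":      {"activate": "active"},
        "active":        {"sample": "active",
                          "deactivate": "inactive"},
    }

    def __init__(self):
        self.state = "uninitialized"

    def capabilities(self):
        """Commands available in the current state only."""
        return sorted(self.TRANSITIONS[self.state])

    def execute(self, command):
        allowed = self.TRANSITIONS[self.state]
        if command not in allowed:
            raise UnknownCommand(f"{command!r} not valid in state {self.state!r}")
        self.state = allowed[command]
        return self.state

fsm = InstrumentFSM()
print(fsm.capabilities())   # ['initialize', 'self_diagnose']
fsm.execute("initialize")
fsm.execute("activate")
print(fsm.state)            # active
```

Changing the state model during the lifecycle, as the text allows, would amount to swapping the transition table while preserving the event dependencies of the current state.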

Both events and commands (as well as data sets, which are not shown in this model) are sent within the OOI network via messages. The interactions between the Instrument Agent and the other entities of the Cyberinfrastructure are defined by Interaction Specifications, which constrain the sequence of messages that form a valid conversation between entities.

Figure 4 Instrument Representation Domain Model (OV-7)

Work Products

The work products provided by this subsystem are:

Table 1 Sensing & Acquisition Work Products

Each entry below lists the service ID, service name, target release(s), and explanation.

1.2.3.8 Sensing and Acquisition (R1, R2, R3): The subsystem responsible for providing the life cycle and operational management of sensor network environments as well as observing activities (i.e., scheduling, collecting, processing) associated with sensor data acquisition.

1.2.3.8.1.1 Instrument Direct Access (R1): This service provides direct IP connectivity between the research team and their instrumentation from anywhere within the integrated network. The service is designed to support instrument connections using telnet, ssh, and/or proprietary instrument software. Such a channel has a higher-level security requirement, and initiation will require a separate and more stringent authentication process.

1.2.3.8.1.2 Instrument Management Services (R1): Provides the command, control, and monitoring services to operate and manage an instrument. Operating an instrument has a higher-level security requirement, and engagement will require a separate and more stringent authentication process. This service also supports instrument development and deployment through test and validation services.

1.2.3.8.1.3 Instrument and Data Process Repository (R1): Maintains informational representations of instruments and their configuration and calibration, along with references to their acquired data. It also maintains records of all processes applied to data from acquisition through product delivery. All are associated with their respective metadata.

1.2.3.8.1.4 Data Acquisition Services (R1): Provides services to configure the acquisition, dissemination, and persistence of observed data originating from an instrument platform to the Integrated Observatory Network.

1.2.3.8.2.1 Marine Facility Services (R2): Building on the capabilities of the Facility Service, provides services to task, coordinate, and manage the "marine" observatory resources and their interdependencies. The management services provide oversight to ensure safe and secure operations and to maximize the total data return from all instruments.

1.2.3.8.2.2 Instrument Activation Services (R2): Provides registration, testing, and validation services for instruments and instrument platforms to ensure conformity with different operational requirements in the network.

1.2.3.8.2.3 Data Processing Services (R2): Provides services to configure the application of specific data processing steps at the acquisition site and/or at the ingest site.

1.2.3.8.2.4 Data Product Catalog and Repository Services (R2): Maintains informational representations of measurements and data products with associated metadata, configuration, and calibration by maintaining a history of all processes applied to data from acquisition through product delivery.

1.2.3.8.3.1 Data Calibration and Validation Services (R3): Enables configuration of the data calibration and validation processes and the application of custom automated data processing steps. The service supports the flagging and sequestering of derived data until reviewed by responsible participants. Derived data are automatically associated with their data source. The service supports automated revisions of the derived data on a partial or complete basis.

1.2.3.8.3.2 Marine Resource Scheduling Services (R3): The coordination services are the primary means for allocating and scheduling instrument use of communications and power, but will extend to the coordination of environmental interactions (i.e., sound, chemical, light).

1.2.3.8.3.3 Data Product Activation Services (R3): Provides services to produce and publish data products and apply processes for generating products from data and/or derived data. Data products are automatically persisted and published based on the configuration set for individual product development.

The focus of Release 1 is to provide the basic mechanisms of sensor integration, instrument control and data acquisition and the mechanisms to register instruments, data processes and their metadata.

The focus of Release 2 is to provide the integration of multiple instruments on platforms and into observatories, and the coordination of shared resources between instruments and within an observatory. A major focus is also to support the first deployment of OOI marine observatory equipment. In addition, Release 2 will provide initial mechanisms to execute simple data processes to generate qualified data products, for instance for QA/QC purposes as defined by instrument owners.

The focus of Release 3 is to provide advanced observatory management mechanisms, resource coordination and scheduling, and processes that enable users to add their own instruments and data products.

Beyond Release 3, the Planning and Prosecution subsystem services extend S&A services by providing instrument and platform autonomy support.

CIAD SA OV Data Acquisition

Data Acquisition is an activity in the Integrated Observatory data collection process.


Figure 1. 2820-00031 Data Acquisition from instrument (OV-6)

Data acquisition is the process of collecting measurement data from physical instruments and converting them to a format understandable by the Cyberinfrastructure (e.g., from A/D volts to physical units). According to the schedule in the observation plan, the Platform Agent coordinates the data acquisition by sending a start command to the Instrument Agent that is the software representation of the Instrument in the Cyberinfrastructure. After an initialization phase, the Instrument Agent interacts with the physical Instrument to receive the observed data; then, the Agent adds relevant metadata and sends them to the Data Processing activity. As a main goal of the observatory is supporting scientific discovery by providing the necessary measurements to users, it is very important to record and preserve all relevant metadata.

There are two modes of data acquisition: poll mode or push mode. Polling (see Figure 2) means that the Instrument Agent dictates when the measurement is collected; the observed data are explicitly requested by the software agent. The physical device waits for the command to make a measurement, executes the measurement, and sends the data to the agent. Push mode (see Figure 3) can be used for Instruments that can be scheduled to acquire data autonomously; the physical device does the measurement by itself and the software agent passively accepts the data. Therefore, the Instrument Agent sends a schedule to the Instrument (for example, configuring the sampling rate, the acquisition interval, and the output formats), and then the physical device knows when to take measurements and to send the data to the Agent.
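The two acquisition modes above can be sketched as follows. The Instrument and agent classes here are hypothetical stand-ins, not the actual OOI interfaces; the metadata fields are likewise invented for illustration.

```python
# Illustrative sketch of poll vs. push data acquisition: in poll mode the
# agent dictates the timing of each measurement; in push mode the device
# samples autonomously and the agent passively accepts the data.
import time

class Instrument:
    """Minimal fake device supporting both modes."""
    def measure(self):
        return {"value": 21.7, "timestamp": time.time()}

class InstrumentAgent:
    def __init__(self, device):
        self.device = device
        self.samples = []

    # Poll mode: the agent explicitly requests each measurement.
    def poll(self, count):
        for _ in range(count):
            data = self.device.measure()       # agent dictates the timing
            self._annotate_and_forward(data)

    # Push mode: the device samples on its own schedule and calls back in.
    def on_push(self, data):
        self._annotate_and_forward(data)       # agent passively accepts data

    def _annotate_and_forward(self, data):
        data["metadata"] = {"units": "degC", "source": "CTD-01"}  # add metadata
        self.samples.append(data)              # then hand off to Data Processing

agent = InstrumentAgent(Instrument())
agent.poll(3)                                  # poll mode: 3 explicit requests
agent.on_push({"value": 21.9, "timestamp": time.time()})  # push mode
print(len(agent.samples))                      # → 4
```

In a real push-mode setup the agent would first send the device a schedule (sampling rate, acquisition interval, output format), as the text describes; that configuration step is omitted here for brevity.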

Figure 2. 2820-00008 Data Acquisition from polled instrument (OV-6)

Figure 3. 2820-00010 Data Acquisition from push instrument (OV-6)

CIAD SA OV Data Calibration Services

The Data Calibration services are elaborated in Release 3.

Contents on this architecture page are preliminary until refined during Inception of the respective release and approved by an LCO review.

General

Data Calibration services are applied to derive higher level data products from unprocessed instrument measurements, for instance to transform unprocessed voltage counts provided by an instrument into engineering units.
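The counts-to-engineering-units transformation mentioned above is commonly a polynomial with per-instrument coefficients. The following is a minimal sketch under that assumption; the coefficient values and sensor are invented for illustration.

```python
# Hedged sketch of a calibration step: convert raw instrument counts to
# engineering units via a per-instrument calibration polynomial.

def calibrate(raw_counts, coefficients):
    """Apply a polynomial calibration: value = c0 + c1*x + c2*x^2 + ..."""
    return sum(c * raw_counts ** i for i, c in enumerate(coefficients))

# Hypothetical temperature sensor: offset 0.5 degC, scale 0.01 degC per count.
coeffs = [0.5, 0.01]
print(calibrate(1200, coeffs))   # → 12.5 (degC)
```

The actual calibration form for any given OOI instrument is defined by its configuration and calibration records in the Instrument and Data Process Repository.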

The Data Calibration services are based on a number of enabling infrastructure services, including:

Data product registration (SA R2)
Data processing services (SA R2)
Data distribution services (DM)
Process management services (CEI)

See Also:

Data Validation Services

CIAD SA OV Data Processing

The Data Processing services are elaborated in Release 2.

General

Data processing services enable the derivation of information from lower level information, on a continuous data streaming basis, for instance for the generation of derived data products.

CIAD SA OV Data Product Activation

The Data Product Activation services are elaborated in Release 3. They are based on the Data Product registration and Data Processing services.

CIAD SA OV Data Product Generation

The Data Product Generation services are elaborated in Release 2. They are based on the Data Processing services.

CIAD SA OV Data Validation Services

The Data Validation services are elaborated in Release 3.

Contents on this architecture page are preliminary until refined during Inception of the respective release and approved by an LCO review.

General

Data Validation services are applied to derive higher level qualified data products from unqualified data products, for instance to apply automated or interactive data quality control (QC).
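A typical automated QC step of the kind described above is a range check that flags suspect values without deleting them and keeps each qualified product linked to its source. The thresholds, flag vocabulary, and field names below are invented for illustration.

```python
# Hedged sketch of automated QC: values outside a configured range are
# flagged (not removed), and the derived product keeps a reference to its
# data source, as the Data Calibration and Validation services require.

def qc_range_check(samples, low, high, source_id):
    qualified = []
    for s in samples:
        qualified.append({
            "value": s,
            "flag": "good" if low <= s <= high else "suspect",
            "source": source_id,   # derived data auto-associated with source
        })
    return qualified

out = qc_range_check([4.1, 98.0, 5.2], low=0.0, high=40.0, source_id="ctd-raw-7")
print([q["flag"] for q in out])   # ['good', 'suspect', 'good']
```

In the interactive QC process described below, flagged values like the second sample would be sequestered until reviewed by a responsible participant.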

The Data Validation services are based on a number of enabling infrastructure services, including:

Data product registration (SA R2)
Data processing services (SA R2)
Data distribution services (DM)
Process management services (CEI)

Automated QC

For automated QC, the data validation services are applied to less processed data.

In addition to the above listed dependencies, other dependencies include:

Data inventory (DM R1)
Data versioning (DM R2)
Common data and metadata model (DM R1+2)

Interactive QC

Interactive QC is a process combining the automatic generation of derived, qualified data products (and updates to these data products) with interactive (human-in-the-loop) annotation, association, and approval processes. To provide such a strongly user-interface-driven capability, data visualization and interactive workflow support must exist.

For interactive QC, a number of capabilities need to be present, including:

Automated QC data product generation
Interactive workflow support (AS R3)
Visualization (AS R2)
Interactive workspaces (AS R3)
Data association and annotation services (DM R3)

CIAD SA OV Direct Access

Direct Access is a mode of an instrument resource, managed by the S&A Instrument Management service. An instrument can be either in observatory mode or in direct access mode (see the Instrument/Platform device state model). In observatory mode, the instrument agent and driver perform an observatory command translation and data acquisition from and to the observatory.

In direct access mode, an operator has a direct access channel to the instrument, e.g., a direct serial connection tunneled through the network, or a VPN connection. During this time no data are acquired and no other user can access or command the instrument. Direct access mode is mostly required for development and diagnostics, for instance in situations where the agent and driver are not capable of performing certain very instrument-specific commands and configurations.
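The exclusivity rule above — one mode at a time, acquisition suspended while an operator holds the direct channel — can be sketched as a simple mode guard. The class and method names are hypothetical, not the actual S&A Instrument Management service interface.

```python
# Illustrative sketch of instrument mode exclusivity: while in direct access
# mode, data acquisition stops and no other user can take the instrument.

class InstrumentResource:
    def __init__(self):
        self.mode = "observatory"
        self.direct_user = None

    def request_direct_access(self, user):
        if self.mode == "direct":
            raise RuntimeError(f"instrument held by {self.direct_user}")
        self.mode = "direct"           # acquisition stops in this mode
        self.direct_user = user

    def release_direct_access(self, user):
        if self.mode == "direct" and self.direct_user == user:
            self.mode = "observatory"  # resume normal observatory operation
            self.direct_user = None

    def acquire_data(self):
        if self.mode != "observatory":
            raise RuntimeError("no acquisition during direct access")
        return {"value": 42}

inst = InstrumentResource()
inst.request_direct_access("operator-a")
try:
    inst.acquire_data()
except RuntimeError as err:
    print(err)                          # no acquisition during direct access
inst.release_direct_access("operator-a")
print(inst.acquire_data())              # {'value': 42}
```

The real service additionally enforces the stricter authentication mentioned for direct access initiation, which this sketch omits.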

Figure 1. Operational node involved in the management of the instrument direct access mode (OV-2)

Figure 2. Interaction sequence for direct access mode (OV-6)


CIAD SA OV Domain Models

The figures below provide domain models in the context of the Sensing and Acquisition subsystem.

Parts of these figures may be outdated

Figure 1. 2820-00016 Instrument Platform Domain Model (OV-7)

Figure 2. 2820-00017 Message Translation Domain Model (OV-7)

Figure 3. 2820-00018 SA Overview Domain Model (OV-7)

CIAD SA OV Instrument Activation

Instrument registration and activation

Figure 1 sketches the main activities in instrument registration and activation.


Figure 1. Instrument activation sequence (sketch) (OV-6)

Instrument switch on

Figure 2 shows the interactions between the Instrument, Instrument Agent, and Instrument Supervisor during instrument activation. The Instrument Supervisor controls the activation by sending OOI commands to the Instrument Agent, which translates each command into the format understood by the physical device. After executing the command, the physical device sends back status information, and the Supervisor proceeds to the next step. The activation process starts with powering on the device, after which the device is on but not yet initialized. After a conversation that validates the device and configures its operation, it enters an initialized but still inactive state. The device becomes active when it receives an explicit activate command from the Supervisor.
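The switch-on conversation above can be sketched as a supervisor loop that issues OOI-level commands, lets the agent translate each into device syntax, and only proceeds on a good status reply. The command vocabulary and device syntax here are hypothetical.

```python
# Sketch of the activation handshake: Supervisor -> Agent (translation) ->
# Device, with the device's status gating each next step.

class Device:
    def send(self, raw):
        return "OK"                     # device acknowledges each raw command

class InstrumentAgent:
    TRANSLATION = {                     # OOI command -> device-specific syntax
        "power_on": "PWR 1",
        "initialize": "INIT",
        "activate": "RUN",
    }
    def __init__(self, device):
        self.device = device

    def execute(self, ooi_command):
        return self.device.send(self.TRANSLATION[ooi_command])

def supervisor_activate(agent):
    # Walk the device through: on (uninitialized) -> initialized -> active.
    for step in ("power_on", "initialize", "activate"):
        status = agent.execute(step)
        if status != "OK":              # proceed only after a good status
            return f"failed at {step}"
    return "active"

print(supervisor_activate(InstrumentAgent(Device())))   # active
```

The validation and configuration conversation between power-on and initialization is collapsed into the single "initialize" step here for brevity.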

Figure 2. Instrument Switch On (OV-6)

CIAD SA OV Instrument and Platform Agents

Instrument and Platform Agents are specific sub-types of Resource Agent.

Instrument Agent

Each Instrument Agent manages one physical instrument, which can be a simple sensor or a sophisticated composite instrument. An instrument agent encapsulates all software necessary for interacting with a physical instrument in order to perform these basic functions:

command & control of an instrument through a common observatory message interface, including providing status updates on request and continuously,
represent an instrument's capabilities uniformly to the observatory,
perform data acquisition,
manage and perform direct access to the instrument.

Instrument agents have the following detailed responsibilities and capabilities:

take commands from the observatory and end users and perform policy checks and access mediation,
translate commands into something understandable by the Instrument,
perform efficient state synchronization and communication between terrestrial and platform representations over intermittent, slow channels,
perform data acquisition, time-stamping, and raw data logging for measurements coming directly from the instrument,
emit a context-metadata-annotated, formatted data stream,
emit instrument state, state changes, and state of health as an engineering data stream,
manage the configuration of the instrument,
provide direct access (e.g., bidirectional direct serial) to the instrument,
update the clock of the instrument from a platform timing source,
detect resource conflicts (such as version or driver conflicts, or conflicting use requests),
accept management from a platform agent (see below),
potentially manage subordinate instrument agents.

Each physical instrument has one Instrument Agent that represents it to the observatory (i.e., acts as its "agent"). This agent has up to three types of representations that differ in their deployment location (see Figure 1):

(exactly one) Terrestrial representation: The main part of the agent (informally "the agent") is deployed within the terrestrial Integrated Observatory Network and is always accessible. May communicate directly with the physical instrument or via a Platform representation of the agent.
(zero or one) Platform representation: The part of the agent that can be embedded in remote, temporarily disconnected, resource-limited environments, such as CG buoys. It does not exist in the case of instruments connected to the RSN cable. Interacts through an opaque command & data protocol with the terrestrial representation over an intermittent, low-bandwidth communication channel.
(zero or any number) Proxy: Projections of instrument capabilities into another domain of authority. Acts similarly to the terrestrial representation but presents a different policy and a potentially limited set of capabilities. Does not interact directly with the physical instrument but through the COI Exchange with another instrument agent (either the terrestrial representation or another proxy agent).

Figure 1. Instrument Agent Representations Domain Model (OV-7)

The Instrument Agent Architecture comprises a number of constituent components. Figure 2 depicts the functional components of an Instrument Agent as operational nodes with their needlines.

Instrument Control is the centerpiece and represents the state of the instrument that can be controlled via commands. It provides instrument state and configuration on request and emits state change events. It performs diagnostics, self-test, and failure management as necessary. It is also responsible for instrument fault detection and recovery, as well as for overriding warnings and protection alarms. The Instrument Driver is the actual software that translates commands into binary sequences understandable by the instrument, and knows how to get measurements from the instrument. The Instrument Supervisor is the interface to the observatory and end users and accepts their commands. Within the agent, it receives status updates and events from the Instrument Control. Instrument Direct Access enables direct access by end users to the instrument, for instance through a bidirectional tunneled serial connection. This component is managed by Instrument Control.

A Platform Representation of the instrument agent may contain parts of the Instrument Driver and parts of the Instrument Control. The extent of platform side capabilities is dependent on the platform resource and communication regime constraints. The interaction interface between Platform representation and terrestrial representation is fully opaque and can be optimized to achieve maximum communication channel efficiency (such as in the case of very low bandwidth, intermittent satellite links).

A Proxy of the instrument agent contains an Instrument Supervisor component only or parts thereof. It communicates via the ION Exchange to another instrument agent (proxy or terrestrial representation). This realizes a chain of agents.
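The chain of agents described above can be sketched as proxies that forward commands to the next agent (another proxy or the terrestrial representation), each applying its own policy and possibly narrowing the capability set. All names here are hypothetical; the real interaction goes over the COI/ION Exchange rather than direct method calls.

```python
# Sketch of a chained-proxy instrument agent: each proxy enforces its own
# (possibly narrower) policy, then forwards the command upstream.

class TerrestrialAgent:
    def execute(self, command):
        return f"device executed {command}"

class ProxyAgent:
    def __init__(self, upstream, allowed):
        self.upstream = upstream       # next agent in the chain
        self.allowed = set(allowed)    # possibly reduced capability set

    def execute(self, command):
        if command not in self.allowed:
            raise PermissionError(f"{command!r} not permitted by this proxy")
        return self.upstream.execute(command)  # forward to the next agent

root = TerrestrialAgent()
# Inner proxy allows sampling and status; outer proxy narrows to status only.
chain = ProxyAgent(ProxyAgent(root, {"sample", "status"}), {"status"})
print(chain.execute("status"))   # device executed status
```

This mirrors how a proxy "presents a different policy and potentially limited set of capabilities" while never touching the physical instrument directly.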

See Instrument Agent and Driver Integration Interfaces for an illustration of important implementation level interfaces.

Figure 2. Instrument Agent Operational Nodes (OV-2)

See also: Internal setup of the Instrument Driver

The Instrument Management Services (S&A) are responsible for managing command and control of the instrument and platform via the Instrument Agent, and for the registration of the Instrument Agent.

Figure 3 shows the internal setup of the Instrument Control node of the Instrument Agent. The Instrument Event Detector detects events related to the normal behavior of the instrument and handles low-level failure management. These are not science events; its role is to make sure the instrument life cycle is preserved. The Instrument Fault Detector and Recovery has a (local) mitigation strategy, which might reside in the configuration. For example, in a modem driver, a dropped line is an event; in a connection, an unexpected line drop is a fault. Redial is a recovery strategy, which leads to commands to the Execution Engine to dial again.

The Command Execution Engine is essentially a state machine that bounds the set of commands available at any given time. The initial state comes from the configuration.

Figure 3. Instrument Control Nodes (OV-2)

Platform Agent

Platform agents manage the resources associated with a Marine Observatory platform. Platforms include remote buoy platforms with multiple embedded single-board computers (SBCs) that are intermittently connected and resource limited, as well as cable infrastructure. Platform ("hotel") resources include power systems, telemetry, networking, clocks, CPU cycles, storage, and the multiple instruments deployed on that platform.

Similar to instrument agents, platform agents have a primary terrestrial representation and optional platform representations and proxies. The terrestrial representation of a platform agent is always available and is the primary point of contact; it provides opaque, delayed state synchronization with the actual platform and its resources.

Platform agents have the following responsibilities and capabilities:

accept commands from the observatory and end users and perform policy checks and access mediation,
translate commands into something understandable by the platform resources,
perform efficient state synchronization and communication between terrestrial and platform representations over intermittent, slow channels,
perform platform resource state change notification,
manage the configuration of the platform resources,
emit platform and platform resource state, state changes, and state of health as an engineering data stream,
provide direct access (e.g., terminal sessions) to platform SBCs,
manage platform clocks and precision timing,
mediate platform resources (e.g., the communication outbound queue on a low-bandwidth satellite link, and power consumption),
detect resource conflicts,
manage zero or any number of subordinate platform agents and instrument agents.

Profilers, AUVs, and buoys are examples of platforms, as is the RSN cable infrastructure. A port is a buoy hardware platform resource that connects an instrument physically and electrically to a DCL (Data Concentrator and Logger) SBC module. A port can power cycle an instrument and configure communication parameters, such as serial settings. Telemetry typically includes Iridium satellite, FleetBroadband satellite, and short-range wireless; each of these has a modem-type interface.

Figure 4 shows the internal setup of the Platform Agent. There are interface nodes (components) that represent the specific bidirectional interfaces to lower-level physical or software components.

Figure 4. Platform Agent Nodes (OV-2)

The Platform Control is the central control behavior of a platform. Figure 5 shows the constituent elements of the control part of the Platform Agent. A platform agent can host (aggregate) other platforms and instruments. The Instrument Management operational node manages the hosting of instruments and their resource capabilities and needs on the platform. The Platform Management operational node manages the hosting of subordinate platforms and their resource capabilities and needs on the platform.

Figure 5. Platform Control (OV-2)

The Resource Scheduling node manages the coordination of multiple heterogeneous resources on a platform, including allocation and interference avoidance. The Time Management node controls time on the platform; it represents the underlying GPS module and clocks and any associated time synchronization and time drift models. The Storage Management node controls disk/flash use on the platform; it represents the underlying storage resources and the resource allocation and consumption model, decides on resource requests, and issues scoped tickets for resource use. The Power Management node controls power use on the platform; it represents the underlying power module and the resource allocation and consumption model, decides on resource requests, and issues scoped tickets for resource use.

The Comm Management node controls communication system use on the platform. It is the decision maker for all external communication (from the point of view of the local platform), including communication to the shore and within the platform. It represents the underlying networking and telemetry modules and the resource allocation and consumption model. It decides on resource requests and issues scoped tickets for resource use.
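The "scoped ticket" pattern shared by the storage, power, and communication management nodes can be sketched as a manager that checks each request against the remaining allocation and answers with a bounded-use ticket or a refusal. The capacity units and ticket shape below are invented for illustration.

```python
# Hypothetical sketch of ticket-based resource management: a node decides on
# resource requests and issues scoped tickets for resource use.
import itertools

class ResourceManager:
    def __init__(self, capacity):
        self.capacity = capacity
        self.allocated = 0
        self._ids = itertools.count(1)

    def request(self, amount):
        """Decide on a request; issue a scoped ticket or refuse (None)."""
        if self.allocated + amount > self.capacity:
            return None                     # request refused
        self.allocated += amount
        return {"ticket": next(self._ids), "amount": amount}

    def release(self, ticket):
        self.allocated -= ticket["amount"]

power = ResourceManager(capacity=100)      # e.g., a 100 W power budget
t1 = power.request(60)
t2 = power.request(60)                     # exceeds the remaining budget
print(t1, t2)
power.release(t1)
print(power.request(60) is not None)       # True: budget freed by the release
```

The same shape applies to the outbound communication queue on a low-bandwidth satellite link, where "amount" would be queued bytes or airtime rather than watts.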

Behavior

Figure 6. Communications Management on Platform (OV-6)

See also:

Instrument Management Services, for command and control of the instrument and platform via the Instrument Agent, and the registration of the Instrument Agent

CIAD SA SV Instrument Agent Interface


Intro

This is a more detailed description of the instrument agent interfaces. There are three parts to the CI-facing interface:

1. The interface that is intended to address the observatory infrastructure of the instrument agent
2. The interface that is intended to address the device that the instrument agent represents
3. The interface that is common across all agents that allows manipulation of the observatory resource

The observatory-related calls are the ones that the instrument itself is not aware of. This interface is relatively static and applies to an instrument agent that might represent any type of instrument. The device-related calls are the ones that are intended to directly affect the device itself. Calls to the device side of the observatory-facing interface are to be executed by entities in the observatory that are generally aware of the type (either specifically or as a general class) of instrument that is being represented and what it might be able to do (possibly as a result of queries to the observatory-related calls that offer that knowledge).

This observatory-facing interface is separate from the interface between the instrument agent and the instrument itself (via the driver that is part of the instrument agent).

Pending Notes

There are still some pending notes as this interface is fleshed out.

Comments

Timeouts should be handled by the agent. If the driver times out connecting to or interacting with a device, it can return a timeout error. The user of the interface should have no need to worry about the individual times required for different interactions with different devices.
The interface should support a collection of work done over multiple messages. While a single message may be acted on atomically by the agent, a message by itself does not imply a transaction, nor should changes be committed at the end of every message. The model is to get and set values in a working space in the agent, then commit those changes with an "update" message.
The interface is not compliant with any one specification, but uses concepts from many to implement capabilities within the OOICI framework (including registries, messaging, governance, etc.).
Instrument agents continually post their state changes, user sessions, and events to a "State Topic" where clients can keep track of what the agent is doing (should they so desire).
Phrasing purely indicates atomicity across multiple messages. It does not imply a transaction to a device or the agent interface. Sessions, state monitoring, and the like may provide transaction behavior in the future.
Sets of either sort (observatory or device) apply to an internal, per-conversation/session "buffer" of configuration that can then be applied to a device or agent. Gets obtain their results directly from the working/active device or agent configuration.
Only one outstanding phrase can be started at a time for each conversation/session.
Errors must have unique IDs so as to make them accessible to automated understanding and/or processing. They should be used appropriately throughout the code.
Enumerations are not used, but IDs are associated with a controlled vocabulary.
Metadata for a device can be stored and looked up in the metadata registry based on the resource ID.
Parameter metadata for this agent are returned via the *param_metadata messages. If there is a metadata tracking component on the device (like PUCK or TEDS), the raw block can be get/set to/from the device via the metadata block parameter.
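The buffered-set / commit-on-update model described in these comments can be sketched as follows. The class and method names are hypothetical simplifications of the begin_phrase/set/update message flow, not the actual agent interface.

```python
# Sketch of the working-space model: sets stage changes in a per-session
# buffer, gets read the active configuration, and "update" commits the buffer.

class AgentConfig:
    def __init__(self):
        self.active = {"sample_rate": 1}
        self.buffer = None              # per-conversation/session working space

    def begin_phrase(self):
        if self.buffer is not None:
            raise RuntimeError("only one outstanding phrase per session")
        self.buffer = {}

    def set(self, key, value):
        self.buffer[key] = value        # staged, not yet applied

    def get(self, key):
        return self.active[key]         # gets read the active configuration

    def update(self):
        self.active.update(self.buffer) # commit all staged changes at once
        self.buffer = None

cfg = AgentConfig()
cfg.begin_phrase()
cfg.set("sample_rate", 10)
print(cfg.get("sample_rate"))           # 1: change not committed yet
cfg.update()
print(cfg.get("sample_rate"))           # 10
```

This illustrates why a single message implies no transaction: atomicity comes only from the explicit commit, and a second begin_phrase before the commit is rejected.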

Assumptions

The agent only returns the resource ID; the rest of the details can be looked up in the registry.

Governance (access, scheduling, permissions, etc.) is already handled by the CI before the op_* calls are executed in the agent.

Actuators are treated as sensors that take more commands and have less data published from them.

All strings are Unicode. String display and internationalization is handled at a higher layer in the CI.

Pending Issues

Future plans/ideas/issues

These ideas/concepts/plans may be implemented in the future:

Commands and settings may be spooled to a queue for asynchronous action. The phrases they belong to may identify a context to which the commands and settings apply.

Transactions may support rollback at some point.

How to handle and specify data corrections or additional layers of data streams needs to be determined down the road.

Connecting multiple instruments for real-time interaction may be an issue for the future. With an asynchronous message bus, timing of interactions in the CI is difficult. If possible at all, interactions may need to be handled on the sensor side of a messaging link, possibly at the platform level. This may involve updates with conditionals, more triggering control (possibly via IEEE 1451.0 sections 5.10.4, 5.10.7, 5.11, and 5.12), etc.

Quality of service concepts may apply to the network of data in some way, most likely as additional parameters to modify.

More work on calibration/correction/transfer functions may be needed (where not handled in other services).

More work may be needed on the metadata for the physical connection to a device.

Common agent interface

The instrument agent is an observatory agent and thus inherits the interface that manages the lifecycle state, resource registration/de-registration, governance, and resource manipulation that is common to all agents. The interface described below is specific to instrument agents.

Observatory-related interface

These are the detailed calls/formats/arguments/headers/etc. of the interface. Since it is largely (completely?) static and supposed to cover all CI-related operations for all instrument agents, this should be defined clearly and in detail here.

execute_observatory Message

Description:

This call adds to the pending phrase a command specified by a command string and followed by a list of arguments to that command. A phrase must be open for a command to be added to it. Commands will be added to a local-to-the-conversation buffer after each execute. Upon ending of the phrase, the commands that were indicated will be tagged as being part of an atomic action. When they are applied, they will be executed in the order that they were executed in the phrase.

Command List:

The executable commands that are available to a user are largely static for this call. Commands are passed as strings in the messaging protocol. The commands that are common to all instrument agents are:

Command Arguments Description Errors Return

StateTransition Name of transition to make. Transitions include: Initialize, Reset, GoActive, GoInactive, Resume, Pause, Clear, Run, ObservatoryMode, DirectAccessMode Transition must be a valid state transition in the instrument state model Invalid transitions will result in an "Invalid Transition" error and the original state will be retained; this may be a result of not being in the right state before the transition is issued The current state

TransmitData None Transmits any buffered data immediately None None

ClearUpdateBuffer None Clears the update buffer so that all pending get/set/execute operations are cancelled None None

ListUpdateBuffer None Lists the phrase IDs of pending phrases for the device None None

Arguments

The messaging format for argument entry is

['commandname', 'arg1', 'arg2', ... 'argN']
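As a sketch only, a client could build this flat-list payload with a small helper (the helper name is hypothetical; the command names come from the table above):

```python
# Hypothetical helper: build the flat ['commandname', 'arg1', ..., 'argN']
# payload described above. Arguments are carried as strings in the protocol.
def make_execute_observatory(command, *args):
    return [command] + [str(a) for a in args]

transition_msg = make_execute_observatory('StateTransition', 'GoActive')
flush_msg = make_execute_observatory('TransmitData')
```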

Returns:

Success and failure messages as appropriate. See the command list for returns and errors for specific commands. If no phrase has been opened, the operation will fail.

get_observatory Message

Description:

Obtains a list of instrument agent configuration parameters that configure the instrument agent. The general principle is that any parameter that can be obtained through a get_observatory() command can then be set with a set_observatory() command under the correct conditions (permission, state, etc.). A phrase need not be established in order to fetch parameters. However, if a phrase is not established, the parameters may change immediately before or after the call. It is recommended to issue a begin_phrase before a series of get_observatory messages is sent if a consistent state is required. If no phrase has been started, the individual message is treated atomically and results are immediate. The values that are returned are garnered immediately from the active, running observatory configuration, not the buffered, un-applied configuration.

Arguments

The argument is a list of instrument agent configuration parameters whose current settings are desired. An empty list requests all parameters. Example:

['param1', ..., 'paramN']

Returns:

Success/failure where a successful response yields an argument with a dictionary of instrument agent configuration parameter names and their values for the parameters that were requested. If a parameter was not found, it is not in the list. Example of the success argument:

['OK', {'param1':'123', ..., 'paramN':'xyz'}]
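A minimal sketch of unpacking this reply, assuming the ['OK', {...}] success shape above and an ['ERROR', ...] shape on failure (the helper name is hypothetical):

```python
# Hypothetical helper: unpack a get_observatory reply of the form
# ['OK', {param: value, ...}] on success, raising on any other status.
def parse_get_observatory_reply(reply):
    status = reply[0]
    if status != 'OK':
        raise RuntimeError('get_observatory failed: %r' % reply[1:])
    return reply[1]  # dict of parameter name -> value

params = parse_get_observatory_reply(['OK', {'param1': '123', 'paramN': 'xyz'}])
```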

If a phrase has been started, the message has been added to a list of queries that will be atomically executed as a phrase. The return code is a success/fail indicating that the query was or was not added to the phrase successfully. A value will not be returned with this message, but rather with the apply message if one is issued later.

set_observatory Message

Description:

Defines the values for a list of instrument agent configuration parameters. The general principle is that, provided correct permissions, parameters that can be obtained through a get_observatory() command can then be set with a set_observatory() command under the correct conditions (permission, state, etc.). Setting a value will apply that value to a buffered, local-to-the-conversation data block of configuration. If a phrase has been started, the sets will all be made to the configuration block atomically at the end of the phrase. Set messages not in a phrase will be applied as if they were in a singleton phrase.

Arguments:

The argument is a dictionary that includes valid parameter names from the instrument agent configuration parameters list and the values that are intended to be set. Example:

{'param1':'123', ..., 'paramN':'xyz'}

Returns:

A dictionary of success or failure and optionally a returned reason for each attempted value set. A success indicates that the parameter could be set in the internal buffered configuration block. It can then be applied later to make the configuration active. The name of the parameter is the key to the dictionary. If no phrase has been opened, the operation will fail. Example:

{'param1':['OK'], 'param2':['ERROR','Not in valid state'], ..., 'paramN':['OK','somesortofkey']}
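A caller could split such a per-parameter reply into accepted and rejected sets with a sketch like this (helper name hypothetical; the reply shape is the one shown above):

```python
# Hypothetical helper: split a set_observatory reply, which maps each
# parameter to ['OK', ...] or ['ERROR', reason], into accepted names
# and a dict of failures with their reasons.
def split_set_results(reply):
    ok = {name for name, result in reply.items() if result[0] == 'OK'}
    failed = {name: result[1:] for name, result in reply.items()
              if result[0] != 'OK'}
    return ok, failed

ok, failed = split_set_results(
    {'param1': ['OK'],
     'param2': ['ERROR', 'Not in valid state'],
     'paramN': ['OK', 'somesortofkey']})
```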

Observatory Get/Set Parameter List:

Parameter Name Type Description

OutputTopics Dictionary of strings A dictionary of the encoded data output PubSub queue objects that are being generated by this instrument. Indexes are either the transducer name, or "Device" if applicable to the device as a whole. Sampling period and other data stream metadata are to be stored in the registry.

EventTopics Dictionary of strings A dictionary of the encoded event PubSub queue objects that are being generated by this instrument. Indexes are either "Agent" for agent values, the transducer name, or "Device" if applicable to the device as a whole.

StateTopics Dictionary of strings A dictionary of the encoded state change PubSub queue objects that are used to output state changes for this instrument. Indexes are either "Agent" for agent values, the transducer name, or "Device" if applicable to the device as a whole.

ResourceID String The resource registry ID of the instrument being fronted

DataCorrectionMode Dictionary of strings A string indicating what corrections, if any, are currently being generated in an additional (or interleaved) stream of data. Indexes are the transducer name or "Device" if applicable to the entire device. Moving between uncorrected and any other state may only happen when the instrument is inactive. May be:

Uncorrected - Only Uncorrected is implemented in Release 1

TimeSource String A string indicating the source of time being used for the instrument. May be:

PTPDirect - IEEE 1588 PTP connection directly supported by the instrument
NTPUnicast - NTP unicast to the instrument
NTPBroadcast - NTP broadcast to the instrument
LocalOscillator - The device has its own clock
DriverSetInterval - Driver sets the clock at an interval

PhraseTimeout Integer The number of seconds to wait for an end_phrase message after a begin_phrase message is issued. If an end_phrase message is not returned within the timeout period, the pending phrase is cancelled. Value must be greater than 0.

ConnectionMethod String How the device is actually connected to the observatory. May be:

Offline - The device is offline
CabledObservatory - The device is accessible through a cabled observatory, available full-time
ShoreNetwork - The device is connected to a full-time shore connection
PartTimeScheduled - The device is not always accessible, but comes online on a scheduled basis; outages are normal
PartTimeRandom - The device is not always accessible, but comes online as needed; outages are normal

Parameters applying to the data sets (i.e. sampling rate) are handled by the data set objects in the observatory infrastructure.

get_observatory_param_metadata Message

Description:

This call gathers metadata regarding the observatory configuration parameters that can be manipulated with the get_observatory() and set_observatory() calls. The metadata is per-parameter and can be used to determine more detail about the capabilities of the device.

Arguments:

Arguments should include a dictionary of configuration parameters to be queried with a list of metadata parameter names to be gathered for the specified configuration parameters. Enter '*' to gather all possible parameters or metadata values. Example:

{'ResourceID':['LastChangeTimestamp', 'DataType'], 'DataCorrectionMode':['*']}

Returns:

Success/failure with a success argument of a dictionary of dictionaries, with each configuration parameter having a dictionary of metadata names and values. This mimics the argument list, but with name/value pairs instead of a metadata parameter list. Example:

{'ResourceID':{'LastChangeTimestamp':1286459918.397, 'DataType':'scalar'}, 'DataCorrectionMode':{'LastChangeTimestamp':1286457918.397, ...all additional metadata values..., 'DataType':'scalar'}}
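A sketch of walking this nested reply to pull one metadata field per parameter (the helper name is hypothetical; 'DataType' is a metadata name taken from the example above):

```python
# Hypothetical helper: from a get_observatory_param_metadata reply
# (parameter -> {metadata name: value}), collect one metadata field
# for every parameter that reports it.
def collect_metadata_field(reply, field):
    return {param: meta[field]
            for param, meta in reply.items() if field in meta}

reply = {'ResourceID': {'LastChangeTimestamp': 1286459918.397,
                        'DataType': 'scalar'},
         'DataCorrectionMode': {'LastChangeTimestamp': 1286457918.397,
                                'DataType': 'scalar'}}
types = collect_metadata_field(reply, 'DataType')
```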

get_observatory_status Message

Description:

Instrument agents have dynamic status beyond their lifecycle state. This is mostly related to the agent state, but may contain other CI-specific, dynamic values. Note that since the values are dynamic, their value may change while this call is completing, or shortly after this call is returned. It is recommended to issue a begin_phrase before a series of get_observatory_status messages is sent if a consistent set of status values is required. If no phrase has been started, the individual message is treated atomically and results are immediate. The values that are returned are garnered immediately from the active, running observatory status.

Arguments:

A list may be added so as to specify which status keys should be returned. An empty list indicates all values should be returned.

Returns:

Success/failure where a successful response yields an argument with a dictionary of instrument agent status keys and their current values for the keys that were requested. If a key was not found, it is not in the list. Example of the success argument:

['OK', {'key1':'123', ..., 'keyN':'xyz'}]

If a phrase has been started, the message has been added to a list of queries that will be atomically executed as a phrase. The return code is a success/fail indicating that the query was or was not added to the phrase successfully. A value will not be returned with this message, but rather with the apply message if one is issued later.

Observatory Status Keys

These are the common status keys for the instrument agent.

Status Key Name Type Description

AgentState String The state that the instrument agent state machine is in. One of:

PoweredDown
Uninitialized
Inactive
Stopped
Idle
ObservatoryMode
DirectAccessMode

ChannelName List of strings A list of the names of the channels that are supported by this agent. The number of channels can be determined from the size of this list.

InstrumentConnectionState String The state of the connection to the instrument that is being fronted. One of:

Connected Disconnected

Alarms List of tuples (alarm_id, description) Current alarm conditions. Alarm IDs include:

CannotPublishData - attempting to publish data but cannot
InstrumentUnreachable - instrument cannot be contacted when it should be
MessagingError - error when attempting to send messages to some destination
HardwareError - hardware problem detected
UnknownError

TimeStatus Dictionary of time status values Includes:

Uncertainty - Current error range
Peers - Peer address list

DataBufferSizeUsed Integer Number of bytes of data used by buffered data

InstrumentAgentVersion String The identifier of the software version of instrument agent code that is running

InstrumentDriverVersion String The identifier of the software version of instrument driver code that is running inside the instrument agent

get_capabilities Message

Description:

Each agent has certain capabilities that need to be advertised to the observatory when queried. These capabilities represent the device-specific and any uncommon observatory parameters and actions that can be carried out by the agent.

Arguments/Returns:

One of the following names:

Argument Return

'*' Returns a dictionary of all lists, keyed by the name

ObservatoryCommands A list of all supported observatory command names, including common ones

ObservatoryParameters A list of all supported observatory parameters names, including common ones

ObservatoryStatuses A list of all supported observatory status names, including common ones

DeviceCommands A list of all supported device command names, including common ones

DeviceParameters A list of all supported device parameters names, including common ones

DeviceStatuses A list of all supported device status names, including common ones

ParameterMetadataNames A list of all supported parameter metadata names

publish Message

Description:

Publishes a chunk of data from the instrument. Messages to this call will only be accepted from the instrument agent's child processes.

Arguments

Argument Type Description

Transducer String The transducer name that is publishing the data, "Device" if relating to the whole device

Type String The type of message being sent. One of:

StateChange - The state has changed on the device
ConfigChange - The configuration has changed on the device
Error - An error has been encountered
Data - A chunk of data to be published

Value String or data The block of data or string that is to be published

Returns

Success/Fail. A "CannotPublish" error is returned if the topic is not available for publishing or the message is sent from a process that is not a child of the agent.

Instrument-related interface

Channel addresses

Each device may support a number of channels of data, be they separate sensors or separate streams of differently processed data. When issuing a command or a get/set, the address of the channel needs to be included. Addresses will vary by type of instrument supported, as some channels are better specified with names or numbers, largely based on how the instrument handles channels. An empty address field causes the command or get/set to apply to the device itself, while '*' applies to all channels. Examples:

[]

['*']

['chan1', 'chan2', 'chan5']
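The addressing rules above can be sketched as a small resolver (the function name and the 'Device' sentinel are hypothetical illustrations, not part of the specification):

```python
# Hypothetical sketch of address-list semantics: an empty list targets the
# device itself, ['*'] expands to every channel, and an explicit list is
# used as given.
def resolve_channels(address_list, all_channels):
    if not address_list:
        return ['Device']          # empty list addresses the device itself
    if address_list == ['*']:
        return list(all_channels)  # wildcard addresses every channel
    return list(address_list)
```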

execute_device Message

Description:

This call executes a command on the instrument that is fronted by the instrument agent. There are some standard commands that all instruments support, plus additional commands that specific instruments deal with. In order to work with the instrument-specific commands, the calling entity needs to know what sort of instrument is in place and what commands it can handle. This knowledge can be obtained by checking the type of instrument fronted by this agent or requesting the capabilities first. Commands will be added to a local-to-the-conversation buffer after each execute. Upon ending of the phrase, the commands that were indicated will be tagged as being part of an atomic action. When they are applied, they will be executed in the order that they were executed in the phrase.

Arguments:

The messaging format for argument entry is

[_addresslist_, ['command1', 'arg1', 'arg2', ..., 'argN']]

Example:

[['*'], ['Reset']] should issue a "Reset" to all channels and

[[], ['StartAcquisition']] should issue a "StartAcquisition" to the device itself.

Returns:

A success/fail message will be returned. If more than one channel destination was specified for a command, each channel will have a response tuple keyed by the channel and the order in which the command was issued (if multiples of the same command are issued to the same channel). See the command list for returns and errors for specific commands. For example:

[{('chan1', 1):['OK']}, {('chan1', 2):['ERROR','Connect Failed. Already established.']}] where two of the same command were issued to channel 1. The first was successful, the second was not.
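A sketch of collapsing this list of per-channel response dicts into one lookup table (the helper name is hypothetical; the reply layout is the one shown above):

```python
# Hypothetical helper: merge an execute_device reply -- a list of dicts
# keyed by (channel, occurrence) tuples -- into one dict for easy lookup.
def merge_device_results(reply):
    merged = {}
    for entry in reply:
        merged.update(entry)
    return merged

results = merge_device_results(
    [{('chan1', 1): ['OK']},
     {('chan1', 2): ['ERROR', 'Connect Failed. Already established.']}])
```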

Command list:

The command list for the instrument varies based on what the instrument supports. The standard commands include:

Command Arguments Description Return

Reset none Reset the destination to the OOI defaults Success/failure

FactoryReset none Reset the destination to factory defaults if available Success/failure

StartAcquisition none Start automated acquisition of data using the current settings Success/failure

StopAcquisition none Stop automated acquisition of data using the current settings Success/failure

RunSelfTest String Run a self test on the specified subsystem (device specific); a missing argument or empty string tests the whole device Success/failure

RunSelfCalibration String Run a self calibration procedure (device specific) on the specified subsystem; a missing argument or empty string calibrates the whole device Success/failure

A success/fail return is a dictionary with a "status" field that is either "OK" or "ERROR" and a value field that has a description of the problem or the result.

get_device Message

Description:

Get a list of instrument configuration parameters from the instrument. Configuration parameters are largely static, as they reflect configuration rather than status. While there are some common instrument configuration parameters, most parameters are very dependent on the type of instrument installed. Knowledge of the instruments can be obtained by checking the type of the instrument fronted by the instrument agent and/or querying the capabilities. The general principle is that any parameter that can be obtained through a get_device() command can then be set with a set_device() command under the correct conditions (permission, state, etc.).

A phrase must be established and an update called in order to fetch parameters in a consistent state.

Arguments:

A tuple list may be added so as to specify which instrument parameters for which channels should be returned. To request a parameter from all channels, use '*' for the channel name. To request a parameter of the device itself, use an empty string ('') for the channel name. To return all parameter values, use '*' for the parameter name. Example:

[('', 'param1'), ('*', 'paramN')]

Returns:

Success/fail where the success argument is a dictionary with the names being a tuple containing the channel and the instrument configuration parameter names, and the values being the values of the parameters. The values that are returned are garnered immediately from the active, running device configuration, not the buffered, un-applied configuration. If a parameter name was not found, it is not included in the return list. Example:

{('chan1', 'param1'):'123', ..., ('chanM','paramN'):'xyz'}
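Because the reply is keyed by (channel, parameter) tuples, pulling out one channel's parameters is a simple filter; a sketch (the helper name is hypothetical):

```python
# Hypothetical helper: extract all parameters for one channel from a
# get_device reply keyed by (channel, parameter) tuples.
def params_for_channel(reply, channel):
    return {param: value for (chan, param), value in reply.items()
            if chan == channel}

reply = {('chan1', 'param1'): '123', ('chanM', 'paramN'): 'xyz'}
chan1_params = params_for_channel(reply, 'chan1')
```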

If a phrase has been started, the message has been added to a list of queries that will be atomically executed as a phrase. The return code is a success/fail indicating that the query was or was not added to the phrase successfully. A value will not be returned with this message, but rather with the apply message if one is issued later.

set_device Message

Description:

Set a list of configuration parameters for the instrument. Setting a value will apply that value to a buffered, local-to-the-conversation data block of configuration. Configuration parameters are largely static, as they reflect configuration rather than status. While there are some common instrument configuration parameters, most parameters are very dependent on the type of instrument installed. Knowledge of the instruments can be obtained by checking the type of the instrument fronted by the instrument agent and/or querying the capabilities. The general principle is that, provided correct permissions and device support, parameters that can be obtained through a get_device() command can then be set with a set_device() command under the correct conditions (permission, state, etc.). If a phrase has been started, the sets will all be made to the configuration block atomically at the end of the phrase. Set messages not in a phrase will be applied as if they were in a singleton phrase.

Arguments:

The argument is a dictionary with names being a tuple of the channel name and valid parameter names from the instrument configuration parameters list, and the values that are intended to be set. Use '*' as the channel name to address all channels, or an empty string to address the device itself. Example:

{('chan1','param1'):'123', ..., ('chan2','paramN'):'xyz'}

Returns:

Returns a dictionary of success/fail for each attempted value set. The name of the parameter and the channel are the keys to the dictionary. An error is generated if no phrase has been started. Example:

{('chan1','param1'):['OK'], ('chan1','param2'):['ERROR'], ..., ('chanM','paramN'):['OK']}

Instrument Get/Set Parameter List:

The parameters depend on the specific device being manipulated, but the common ones are:

Parameter Name Type Description

ChannelList List A list of channel names that are supported by the instrument

MetadataBlock Binary A block of data representing any explicit metadata chunk that may be part of the system. This may be a PUCK or TEDS block

SamplingMode List First item is the mode name, second item is the pre-trigger condition (empty string if N/A), third item is post-trigger condition (empty string if N/A). See IEEE 1451.0 Sec 5.10.1 for descriptions. Sampling mode names are:

TriggerInitiated FreeRunningNoPreTrigger FreeRunningPreTriggerNoBuffers FreeRunningPretriggerBuffers ContinuousSampling Immediate

TriggerCondition String Trigger conditions are strings that refer to channel names and basic comparison operators for data values (Example: 'chanA < 5.2').

TransmitMode String The mode for transmitting data, regardless of data collection mode. May be:

OnCommand - Only transmit data when commanded
BufferFull - Only transmit data when any internal buffer is full
Interval - Transmit data at a given interval

TransmitInterval Integer Time difference between transmit of data (only applies if TransmitMode is in interval mode)

EdgeReportMode String Indicates which edges to report on when detecting thresholds and building series. Values can be:

RisingEdges - Report on rising edges
FallingEdges - Report on falling edges
BothEdges - Report on both edges

Location Tuple Latitude and longitude (decimal degrees, North and East positive), settable only when location is static

LocationType String 'Static' (Location unchanging) or 'Dynamic' (Location changes)

CalibrationType String Information regarding the type of calibration for a channel. May be one of the following values:

None - No calibration is needed or available
Supplied - Calibration information supplied by the agent
Self - Calibration can be done via self calibration command

CalibrationInformation Binary data The block of data that is considered the calibration data that is on-board the device.

DataUnits String The units of the data represented by a channel

DataLowerLimit Float The lowest data value supplied by a channel, post correction

DataUpperLimit Float The highest data value supplied by a channel, post correction

DataUncertainty Float The worst-case uncertainty that exists in the data value supplied by a channel

TriggerSeriesInfo Tuple The definition of a series that is to be described after a trigger. Tuple is (max_data_measurements, series_origin, series_increment, series_units, max_pretrigger_samples) where:

max_data_measurements - The number of data measurements to take in the series
series_origin - The gap in data value or time (depending on the unit) between the trigger and the first series data point
series_increment - The gap in data value or time between measurements
series_units - The units used to measure the increment and origin values
max_pretrigger_samples - The number of pre-trigger samples to include in the series when running in pre-trigger mode

FrequencyResponseHeader Tuple of Float The frequency response parameters for a channel as expressed in a tuple of (ref_frequency, ref_amplitude, ref_phase) where:

ref_frequency - The frequency in Hz where the amplitude is defined as being unity
ref_amplitude - The input amplitude used for the response; units are the same as the channel
ref_phase - The phase shift of the output at the reference frequency

FrequencyResponsePoints List of Tuples A list of N points for frequency response values where points are described by tuples of floats in the format [(pt_frequency, pt_amplitude, pt_phase), ...] with:

pt_frequency - Frequency where the amplitude and phase are applicable (in Hz)
pt_amplitude - The amplitude of the output at the given frequency
pt_phase - The phase shift of the output at the given frequency
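A TriggerCondition string of the form shown in the parameter list above ('chanA < 5.2') could be evaluated against current channel values with a sketch like this. The operator set and function name are assumptions; the document only promises "basic comparison operators":

```python
import operator

# Hypothetical sketch: evaluate a TriggerCondition string such as
# 'chanA < 5.2' against a dict of current channel values. The set of
# supported operators is an assumption, not specified by the document.
_OPS = {'<': operator.lt, '>': operator.gt,
        '<=': operator.le, '>=': operator.ge, '==': operator.eq}

def trigger_fires(condition, values):
    channel, op, threshold = condition.split()
    return _OPS[op](values[channel], float(threshold))
```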

get_device_param_metadata Message

Description:

This call gathers metadata regarding the device configuration parameters that can be manipulated with the get_device() and set_device() calls. The metadata is per-parameter and can be used to determine more detail about the capabilities of the device.

Arguments:

A tuple-indexed dictionary may be used so as to specify which parameter metadata values for which channels should be returned. The indexing tuple contains the channel and parameter names for which metadata parameters are being requested. To request metadata from all channels, use '*' for the channel name. To request a parameter relating to the device itself, use '' for the channel name. To return all metadata values, use '*' for the parameter name. Values of the dictionary should be lists of metadata parameter names that are desired. Example:

[{('chan1','InstrumentParamA'):['LastChangeTimestamp', 'DataType'], ('*', 'InstParamB'):['*']}]

Returns:

A dictionary of dictionaries is returned with each configuration parameter having a dictionary of metadata names and values. The outer dictionary is indexed by the channel and configuration parameter name as a tuple. Example:

{('chan1', 'InstrumentParamA'):{'LastChangeTimestamp':1286459918.397, 'DataType':'scalar'}, ('chan1', 'InstParamB'):{'LastChangeTimestamp':1286457918.397, ...all additional metadata values..., 'DataType':'scalar'}, ('chan2', 'InstParamB'):{'LastChangeTimestamp':1286457918.397, ...all additional metadata values..., 'DataType':'scalar'}}

get_device_status Message

Description:

Instruments have dynamic status. This is mostly related to the instrument's state and/or error status, but may contain other instrument-specific, dynamic values. Note that since the values are dynamic, their value may change while this call is completing, or shortly after this call is returned. It is recommended to issue a begin_phrase before a series of get_device_status messages is sent if a consistent set of status values is required. If no phrase has been started, the individual message is treated atomically and results are immediate. The values that are returned are garnered immediately from the active, running device status.

Arguments:

A tuple list may be added so as to specify which status keys for which channels should be returned. To request status of all channels, use '*' for the channel name. To request status of the device itself, use an empty string ('') for the channel name. To return all status values, use '*' for the status key. Example:

[('', 'InstrumentState'), ('*', 'Alarms')]

Returns:

Success/fail where the success argument is a dictionary of the status keys and their current values. A tuple of channel name and key indexes the current value. Example:

{('', 'InstrumentState'):'Standby', ('chan1', 'Alarms'):('UnknownError','Something is very wrong'), ('chan2', 'Alarms'):('UnknownError','Something is only sort of wrong')}

If a phrase has been started, the message has been added to a list of queries that will be atomically executed as a phrase. The return code is a success/fail indicating that the query was or was not added to the phrase successfully. A value will not be returned with this message, but rather with the apply message if one is issued later.

Instrument Status Keys

These are the common status keys that can be queried across all instruments. The instrument-specific keys are not listed here.

Status Key Name Type Description

InstrumentState String The state that the instrument or channel is in. One of:

Standby Testing Streaming Data

Alarms List of tuples (alarm_id, description) Current alarm conditions. Alarm IDs include:

UnknownError

SelfTestCapable Boolean The channel or device has the ability to self test

MultiRangeCapable Boolean The channel or device is able to offer data over different ranges

execute_direct Message

Description

Direct access to an instrument will sometimes be required. To facilitate calls that are destined directly to an instrument, this call is defined. It is only operational when the instrument agent is in DirectAccess state. The call handles all necessary state checking and governance checking internally before the arguments to this call are passed directly to the instrument.

Arguments

This call accepts a key and a (possibly binary) block of data that is to be passed directly to the instrument. The key is used to identify the user, authentication credential, session identifier, or other necessary session description material that would allow the execute_direct() call to be accepted. It is expected that, during the transition to direct access mode, once validated, the instrument agent returns a key to the caller that will then be accepted here.

Returns

Failure if the key does not match the current approved session. Failure if the instrument is not in direct access mode. Success if the key matches and the block of data has been delivered to the instrument.

Both interfaces

Messages, commands, parameters, etc. apply to both interfaces.

begin_phrase Message

Description

This message indicates the beginning of a collection of operations (get/set/execute) on the instrument agent that are to be wrapped together as a collection of actions. While not necessarily a transaction or session, this message designates the beginning of a series of operations that are to be

treated together, atomically, when operated on by the instrument agent. Phrasing helps indicate exclusivity of the resource, atomicity, and a logical block of work. If no phrase has been begun, each message is treated as both atomic and immediate: the single message is initiated right away.

A phrase can only contain one type of operation (get, set, or execute). The first operation added to a phrase sets the acceptable contents of that phrase; only additional operations of that type will be accepted into the phrase. For example, if a phrase has been started and a get message has been added, an attempt to add a set message will fail.

Only one phrase can be open at a time. If the phrase is incorrect, it can be cancelled and a new one started.

Phrases must be completed with either an end_phrase message or an apply message. The end_phrase indicates that the atomic block has been defined and is now ready to be applied at the next apply message. Using an apply without the end_phrase message ends and applies the phrase immediately, then begins another phrase.

Arguments

Optional timeout value (as a time duration) for the phrase. If a client knows this will be a long phrase, it can extend the default timeout on a per-phrase basis.

Returns

If the resource is currently able to accept a begin_phrase message, it will return a CIDev:success message. If the agent cannot begin a phrase, it will indicate a failure with an error message. If the resource is unavailable (perhaps a phrase has already been opened), this message will fail. A phrase may also fail if an operation of the wrong type is added after the first element has been added. For example:

['OK', "phraseXYZ123"]

['ERROR', "Phrase already pending"]

['ERROR', "Wrong type for phrase. Must be GET"]

['ERROR', "Phrase already ended. Apply or cancel first"]
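
The phrase rules above (one open phrase at a time, one operation type per phrase) can be sketched as a small buffer class. This is an illustrative sketch only; the class and method names are not part of the specified interface.

```python
# Illustrative sketch of the phrase-buffering rules; not the actual agent code.
import uuid

class PhraseBuffer:
    def __init__(self):
        self.phrase_id = None   # ID of the currently open phrase, if any
        self.op_type = None     # 'get', 'set', or 'execute' once fixed
        self.ops = []           # operations queued for atomic execution

    def begin_phrase(self):
        # Only one phrase may be open at a time.
        if self.phrase_id is not None:
            return ['ERROR', 'Phrase already pending']
        self.phrase_id = 'phrase' + uuid.uuid4().hex[:8]
        return ['OK', self.phrase_id]

    def add_op(self, op_type, payload):
        if self.phrase_id is None:
            return ['ERROR', 'No phrase started.']
        # The first operation added fixes the acceptable type for the phrase.
        if self.op_type is None:
            self.op_type = op_type
        elif op_type != self.op_type:
            return ['ERROR',
                    'Wrong type for phrase. Must be %s' % self.op_type.upper()]
        self.ops.append(payload)
        return ['OK']

    def cancel_phrase(self):
        # Discard all queued operations and make the agent ready again.
        self.phrase_id, self.op_type, self.ops = None, None, []
        return ['OK']
```

Under this sketch, adding a set to an open get phrase yields the "Wrong type" error shown above, and a second begin_phrase yields "Phrase already pending".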

end_phrase Message

Description

This message indicates the end of a collection of operations (get/set/execute) that have already been started using the begin_phrase message. Ending a phrase merely indicates that a phrase is ready to be executed and that the included operations will be executed atomically. Simply ending a set or command phrase does not imply that the phrase gets applied. Get phrases will obtain the data immediately upon the end_phrase message being sent.

Arguments

None

Returns

For set and execute phrases, end_phrase returns CIDev:Success or failure with description of failure reason. For get phrases, end_phrase returns a list of the results from the included get commands in the phrase. For example:

[{('chan1', 'param1'):'123', ..., ('chanM','paramN'):'xyz'}, {('chan2', 'param2'):'456', ..., ('chanM','paramK'):'pdq'}] for a phrase with 2 device gets.

Ending a phrase may also fail if a phrase has not been started or has already been ended.

['ERROR', "No phrase started."]

['ERROR', "Phrase already ended. Apply or cancel first"]

cancel_phrase Message

Description

This message cancels a pending phrase. The operations that have been added to the phrase are removed and the agent is again ready to accept a begin_phrase message.

Arguments

None

Returns

CIDev:Success or Fail. If it is a failure, a reason is given.

apply Message

Description

Apply an ended set or execute phrase to the buffer of pending/desired configuration for the device and observatory. This operation atomically applies the pending set or execute phrase to both the observatory and the device. Upon successful application, any buffered commands are cleared from the list and the instrument agent's pending/desired configuration reflects the changes that were phrased.

Arguments

None

Returns

CIDev:Success, or fail if the commands or configuration could not be sent to the device or observatory. When commands are applied, the response returns a list of CIDev:success/fail results from the commands in order of their listing.

update Message

Description

This message updates the device and observatory atomically so that the pending/desired configuration and list of execute operations are made active, either through the driver to the device, or to the instrument agent's active configuration. This may ultimately include multiple phrases worth of commands and get/set operations that have been applied toward the working configuration buffer via the CIDev:apply message.

Arguments

Boolean indicating whether the commands will be executed before the configuration changes are applied (true) or after the configuration is applied (false)

Returns

Returns a list containing one collective set_device return and multiple execute_device returns. For example:

[{('chan1','param1'):['OK'], ('chan1','param2'):['ERROR'], ..., ('chanM','paramN'):['OK']}, [{('chan1', 1):['OK']}, {('chan1', 2):['ERROR', 'Connect Failed. Already established.']}]]

Configuration parameter metadata values

Metadata parameter name Type Description

DataType One of: Integer, Float, Boolean, Range, RegistryID, Timestamp, TimeDuration, String Describes the intended type of data

PhysicalParameterType One of: ScalarAnalog, ScalarDiscrete, ScalarDigital, ScalarAnalogSeries, ScalarDiscreteSeries, ScalarDigitalSeries, VectorAnalog, VectorDiscrete, VectorDigital, VectorAnalogSeries, VectorDiscreteSeries, VectorDigitalSeries The type associated with the physical parameter. More info in IEEE 1451.1 Sec. 10.4.2.2.

MinimumValue Integer The minimum value this parameter can have

MaximumValue Integer The maximum value this parameter can have

Units String The units that this value represents

Uncertainty Range The uncertainty of the value. If none applies, empty range

LastChangeTimestamp Timestamp The timestamp of the last change to the parameter

Writeable Boolean True if the parameter can be set, false otherwise

Format types

Timestamp and TimeDuration formats

Timestamps and time durations will take the format used in IEEE1588. A timestamp is indicated by a tuple of (seconds, nanoseconds) from the epoch. A time duration is indicated by a tuple of (seconds, nanoseconds) indicating an amount of time.
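
As a sketch of the format, a float time value can be split into the (seconds, nanoseconds) tuple described above. The helper names here are illustrative, not part of the interface.

```python
# Illustrative conversion between float seconds and the (seconds, nanoseconds)
# tuple format used for timestamps and time durations.
def to_ieee1588(t):
    """Split a float time (seconds since the epoch) into (seconds, nanoseconds)."""
    seconds = int(t)
    nanoseconds = int(round((t - seconds) * 1e9))
    return (seconds, nanoseconds)

def from_ieee1588(ts):
    """Recombine a (seconds, nanoseconds) tuple into a float time value."""
    return ts[0] + ts[1] / 1e9
```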

Success/Fail messages

Success message contents are a list with the string "OK" and an optional series of explanation/argument values that are applicable to the operation being reported upon. For example:

['OK', 'ExampleID123', (lat, lon)]

Failure messages contents are a list with the string "ERROR", an error code, and a string describing the type of failure. For example:

['ERROR', 'UnknownError','This is a bad error.']
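
The success/failure list conventions above can be captured in small helpers. These function names are illustrative, not part of the specified interface.

```python
# Illustrative helpers for building and checking success/fail message lists.
def success(*args):
    # ['OK', optional explanation/argument values...]
    return ['OK'] + list(args)

def failure(code, description):
    # ['ERROR', error code, human-readable description of the failure]
    return ['ERROR', code, description]

def is_success(msg):
    # A success message is a list whose first element is the string "OK".
    return isinstance(msg, list) and len(msg) >= 1 and msg[0] == 'OK'
```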

Errors

The following errors are common and may be returned from any message. Messages may also return specific errors as indicated above.

Error ID Description

InvalidDestination Intended destination for a message or operation is not valid

Timeout The message or operation timed out

NetworkFailure A network failure has been detected

NetworkCorruption A message passing through the network has been determined to be corrupt

OutofMemory There is no more free memory to complete the operation

LockedResource The resource being accessed is in use by another exclusive operation

ResourceUnavailable The resource being accessed is unavailable

UnknownError An unknown error has been encountered

PermissionError The user does not have the correct permission to access the resource in the desired way

Specific errors may be returned only by certain operations. Their codes are listed here:

Error ID Description

InvalidTransition The transition being requested does not apply for the current state

IncorrectState The operation being requested does not apply to the current state

CannotPublish An attempt to publish has failed

InstrumentUnreachable The agent cannot communicate with the device

MessagingError An error has been encountered during a messaging operation

HardwareError An error has been encountered with a hardware element

PhrasePending A phrase is currently pending

WrongType The type of operation is not valid in the current state

InvalidCommand The command is not valid in the given context

CIAD SA SV Instrument Driver Design

This page describes the design of the instrument driver. The intent is to describe the design in sufficient detail to enable developers of new instrument drivers to "hit the ground running."

See also:

Instrument Driver Framework

Introduction

The Instrument Driver is the OOI-CI component that provides the "last mile" access to the instrument. That is, the Instrument Driver is the component that communicates, on behalf of the OOI-CI, directly with an instrument using the instrument's physical interface and native protocol. In order to provide this access, the Instrument Driver employs two software interfaces: one to communicate to the OOI-CI, and one to communicate with the instrument.

Further, in order to reliably provide the OOI-CI with access to the instrument, the Instrument Driver must be aware of and manage the behavioral characteristics of the instrument, including whether the instrument is intelligent, whether it is connection-oriented, whether it is full-duplex or half-duplex, whether it sleeps periodically, and if so, what sequence of actions is required to wake it up. This is not intended to be a complete list, but rather just to demonstrate a major function that an instrument driver must perform. To accomplish this, the Instrument Driver uses a hierarchical state machine, which is defined later in this document.

The following sections provide the goals and details of the Instrument Driver design.

Instrument Driver Design Goals

Generic Driver Specialization

A major goal of the Instrument Driver design is to provide a generic base driver that can then be elaborated upon and specialized by Instrument Driver developers to apply to specific instruments. This base Instrument Driver design, therefore, must provide the infrastructure upon which code that handles the behavioral peculiarities of all instruments can be built.

This is accomplished by using base classes that encapsulate the common characteristics, and using subclasses that handle the specific characteristics.

OOI-CI Agnostic

Ideally, the Instrument Driver should be completely independent of the OOI-CI; that is, the Instrument Driver should be operable in a non-OOI-CI environment. This would allow, for example, a non-OOI CI laptop or mobile device to interface to the instrument. Because the driver provides the "last mile" access, the need might arise to access the instrument when the OOI CI is not available.

Instrument Native Protocol Abstraction

Another goal of the Instrument Driver design is to provide an abstraction - to the OOI-CI as well as the Instrument Driver developer - of the details of the instrument's native protocol. That is, the OOI CI should be completely unaware of the instrument's native protocol, but further, the person that is specializing the generic Instrument Driver to access a specific device should be able to instantiate a communications object that handles the details of the instrument's native protocol.

Native Protocol Abstraction to OOI-CI

This abstraction is handled by the interface between the Instrument Agent and the Instrument Driver. Because that interface is already defined, it is not addressed by this document.

Native Protocol Abstraction to Instrument Driver Developer

The generic Instrument Driver design aims to abstract from the Instrument Driver developer the details of the instrument's native protocol by providing an InstrumentCommunications object. The InstrumentCommunications object will provide specialization subclasses for the particular protocol employed by the instrument.

General Instrument Driver Design

This section describes the general design of the Instrument Driver.

The Instrument Driver is event driven, fielding events from the OOI-CI and from the Instrument. That is, the Instrument Driver is at rest unless an event is received from either the OOI-CI or the Instrument. These events are serialized and scheduled into a single state machine which acts as the processing core of the Instrument Driver. All events, whether from the OOI-CI or the Instrument, are processed serially by the state machine; it is the state in which the machine resides that determines how the event is to be processed. The following sections describe the two interfaces of the instrument driver and the state machine.

Instrument Driver to OOI-CI Interface

The Instrument Driver interfaces with the OOI-CI through the Instrument Agent. The Instrument Agent can be thought of as the Instrument Driver's adapter to the ION. The Instrument Agent Interface is described here: CIAD SA SV Instrument Agent Interface.

Instrument Driver to Instrument Interface

The Instrument Driver and the Instrument Agent are separate processes, and the interface between the two processes is message based. The Agent sends commands to the Driver using messages, and the command is replied to either positively or negatively, depending upon whether the command is valid or not. The Driver sends unsolicited messages to the Agent for the purpose of publishing data to the ION.

Instrument Driver Hierarchical State Machine

This section describes the Hierarchical State Machine employed by the Instrument Driver.

Hierarchical State Machine Introduction

A hierarchical state machine (HSM) is a state machine derived from a UML State Diagram. An HSM improves upon traditional Finite State Machines by utilizing state inheritance similar to how objects use class inheritance. Super states provide general state behavior, while sub-states provide behavior specialization (just as sub-classes specialize super-classes). Using this model, behavior that is common to many states can be defined in super states and therefore reused by state inheritance.
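
The key mechanism, state inheritance, can be sketched as follows: a state that does not handle an event defers to its parent state. This is an illustrative sketch; the state names and handlers here are hypothetical, not the driver's actual state set.

```python
# Minimal sketch of event delegation in a hierarchical state machine.
class State:
    def __init__(self, name, parent=None, handlers=None):
        self.name = name
        self.parent = parent          # super state, or None for a root state
        self.handlers = handlers or {}  # event name -> handler function

    def handle(self, event):
        # Walk up the state hierarchy until some state handles the event.
        state = self
        while state is not None:
            if event in state.handlers:
                return state.handlers[event](event)
            state = state.parent
        return None  # event unhandled anywhere in the hierarchy

# Hypothetical states: a sub-state inherits the super state's behavior.
connected = State('Connected',
                  handlers={'connection lost': lambda e: 'Disconnected'})
prompted = State('Prompted', parent=connected,
                 handlers={'data received': lambda e: 'Prompted'})
```

Here Prompted handles 'data received' itself, but a 'connection lost' event falls through to the Connected super state, which handles it for all of its sub-states.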

States

This section specifies the states that a generic Instrument Driver can be in.

Idle Configured Disconnected Connecting Connected WakingUp Prompted

Events

This section specifies the events that can be received by a generic Instrument Driver. Events can arrive from either the OOI CI or the instrument.

Events from OOI CI

op_disconnect op_fetch_params op_set_params op_execute op_get_status op_configure_driver

Events from Instrument

connection made connection lost data received

Transitions

HSM Specification

This section describes the means by which the HSM can be specialized using an input document. This is described using a pseudocode type of language rather than specifying the precise syntax. To specialize a generic Instrument Driver, we need to specify the following.

State Specification

To specify a state we need to specify the following:

name parent events event inputs (inputs that can accompany the event) transition-to state (state to transition to upon the event) entry actions exit actions

Event Specification

CIAD SA SV Instrument Driver Interface

The instrument driver needs an interface defined between it and the instrument agent. Since the instrument agents are the event handlers in the system and the drivers are the data/protocol handlers, there needs to be a well defined way for them to communicate with each other.

Overview

The instrument driver consists of 4 parts:

1. A brief set of messages for generically commanding any instrument. 2. A syntax for addressing commands and parameter key-value pairs to instrument channels, including multiple channels, all channels, and the instrument itself. 3. A client response message format containing command success and failure information. 4. A publishing message format containing message type information used by the instrument agent to route messages for ION distribution.

Assumptions

1. Proper policy and access issues are resolved by the instrument agent before these messages are exchanged.
2. Command phrasing and exclusive access are resolved by the instrument agent.
3. Drivers are agent-owned subprocesses accessed through a client object. This may simplify the driver logic and give it increased flexibility over embedding it in the agent process.
4. Data is published once in a raw or canonical format as configured by the driver.
5. Instrument metadata, including channel types and status strings, will be collated in a Confluence page and mapped to a naming scheme. This will be pushed into relevant CI object definitions as needed.
6. Command messages are queued for the instrument upon receipt with an immediate reply (no blocking for the instrument reply), ensuring timely command-response and simplifying driver logic. Success indicates the message contained valid commands and they were successfully queued for execution. Command return values (data, parameter values, other results) are sent as publish messages to the agent. Clients (e.g. instrument agents) must monitor the publish stream to retrieve command results.
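
Assumption 6, queue with an immediate acknowledgment and deliver results later on the publish stream, can be sketched as follows. The class and callback names are illustrative, not part of the specified interface.

```python
# Illustrative sketch of assumption 6: queue-and-ack command handling.
from collections import deque

class DriverCommandQueue:
    def __init__(self, publish):
        self.pending = deque()
        self.publish = publish  # callback that delivers publish-stream messages

    def submit(self, command):
        # Validate, queue, and acknowledge immediately -- no blocking for the
        # instrument's reply. Success means "queued", not "executed".
        if not command:
            return ['ERROR', 'InvalidCommand', 'Empty command']
        self.pending.append(command)
        return ['OK']

    def run_next(self):
        # Later, the driver executes the command and publishes the result.
        command = self.pending.popleft()
        result = 'result-of-%s' % command[0]  # stand-in for real device I/O
        self.publish({'Type': 'GetResult', 'Transducers': 'device',
                      'Value': result})
```

A client that submits a get therefore sees an immediate ['OK'] and must watch the publish stream for the actual parameter values.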

Pending issues (remove and list in assumptions once they are agreed)

1. Specify a mechanism for associating a published response containing a command result with the original command message.
2. Do we want to package publish stream responses resulting from commands containing multiple instrument commands? Example: a set contains 20 parameter values to set. These are broken up and issued as 20 instrument commands. Should the results be repackaged for the client into a single dictionary with 20 values?

To Do

1. Define "data" member format precisely for each publish message type.
2. Create a Confluence page for instrument metadata, channel types and status strings.
3. Provide an analysis of the overlap between this interface and the MBARI SIAM sensor interface.
4. Create design diagrams to illustrate the specific message flow concepts used here.

Command Syntax

Channel-parameter-value arguments are organized as tuples, lists, and dictionaries as follows:

syntax description example use

(chan_arg,param_arg) tuple of channel-parameter strings get message

[(chan_arg,param_arg),...,(chan_arg,param_arg)] list of channel-parameter string tuples get message

{(chan_arg,param_arg):value,...,(chan_arg,param_arg):value} dictionary of channel/parameter-value pairs set message

The tuple arguments have the following semantics:

tuple argument option description

chan_arg integer 0 the instrument itself

chan_arg integer 1,2,...,N channel 1,2,...,N

chan_arg 'instrument' the instrument itself

chan_arg 'channel_x' a named channel

chan_arg 'all_channels' all channels

chan_arg 'instrument_and_all_channels' all channels and the instrument

chan_arg 'all' all channels and the instrument

param_arg integer 1,2,...,N parameter 1,2,...,N

param_arg 'param_x' named parameter x for specified channel

param_arg 'all_parameters' all parameters for specified channel

param_arg 'all' all parameters for specified channel

Instrument commands address channels or the instrument as a whole. Channel specifiers are chan_arg strings or lists of chan_arg strings. Commands are specified as command-and-argument lists. They are combined in the following ways

syntax description

(chan_arg,['command','arg1','arg2',...,'argn']) send command with arguments to channel specifier

([chan_arg,...,chan_arg],['command','arg1','arg2',...,'argn']) send command to list of channel specifiers
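
The chan_arg forms in the table above can be resolved into a concrete list of targets. This is an illustrative sketch; the function name and example channel names are hypothetical.

```python
# Illustrative resolver for the chan_arg addressing forms described above.
def resolve_chan_arg(chan_arg, channels):
    """Map a chan_arg to the channel names (and/or 'instrument') it addresses."""
    if chan_arg in (0, 'instrument'):
        # 0 and 'instrument' both address the instrument itself.
        return ['instrument']
    if chan_arg == 'all_channels':
        return list(channels)
    if chan_arg in ('all', 'instrument_and_all_channels'):
        # Both forms address all channels and the instrument.
        return ['instrument'] + list(channels)
    if isinstance(chan_arg, int):
        # Integer 1..N addresses channel N.
        return [channels[chan_arg - 1]]
    # Otherwise a named channel.
    return [chan_arg]
```

For example, with channels ['chan1', 'chan2'], the argument 'all' resolves to the instrument plus both channels, while the integer 2 resolves to 'chan2'.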

Message List

The following messages define the driver interface

Command Description

CIDev:get Retrieve named parameters from instrument.

CIDev:set Set named parameters on instrument.

CIDev:execute Execute generic command on instrument.

CIDev:execute_direct Send message payload directly to device without coded translation.

CIDev:get_status Retrieve instrument status.

CIDev:get_metadata Retrieve instrument, channel or parameter metadata.

CIDev:get_channels Retrieve list of named channels.

CIDev:initialize Initialize instrument driver.

CIDev:configure Configure instrument driver.

CIDev:connect Establish connection to instrument.

CIDev:disconnect Close connection to instrument.

get Message

Description

Retrieve named parameters from instrument. These parameters are instrument specific.

Arguments

A tuple or tuple list following the standard syntax specifying channel-parameter pairs to retrieve.

(chan_arg,param_arg)

[(chan_arg,param_arg),...,(chan_arg,param_arg)]

Example:

[('instrument', 'param1'), ('all_channels', 'paramN')]

Returns

CIDev:Success/fail where the argument is a dictionary of CIDev:Success/fail for each value requested. The name of the parameter and the channel are the keys to the dictionary.

Example:

['OK',{('chan1', 'param1'):'OK', ..., ('chanM','paramN'):'OK'}]

Publishes

CIDev:GetResult events (one for each parameter specified) containing parameter values.

set Message

Description

Set named parameters on instrument. These parameters are instrument specific.

Arguments

A dictionary of channel-parameter-value items in the standard syntax

{(chan_arg,param_arg):value,...,(chan_arg,param_arg):value}

Example:

{('chan1','param1'):'123', ..., ('chan2','paramN'):'xyz'}

Returns

CIDev:Success/fail where the argument is a dictionary of CIDev:Success/fail for each attempted value set. The name of the parameter and the channel are the keys to the dictionary.

Example:

['OK',{('chan1', 'param1'):'OK', ..., ('chanM','paramN'):'OK'}]

Publishes

CIDev:ConfigChange events for each parameter successfully set.

execute Message

Description

Instructs the driver to execute a command.

Arguments

Tuples of chan_arg, chan_arg lists and command lists in the standard syntax

(chan_arg,['command','arg1','arg2',...,'argn'])

([chan_arg,...,chan_arg],['command','arg1','arg2',...,'argn'])

Example:

[['all_channels'], ['initialize']] should issue "initialize" to all channels and

[['instrument'], ['disconnect']] should issue "disconnect" to the device itself.

Commands include:

command description messages published

FactoryReset Restore instrument configuration to factory defaults. zero or more ConfigChange messages

Sample Acquire a sample. one Data message

StartAcquisition Begin continuous sampling. one StateChange message and ongoing Data messages

StopAcquisition Stop continuous sampling. one StateChange message

Test Run self test routine. one or more TestResult messages

Calibrate Run calibration routine. zero or more ConfigChange messages and zero or more CalibrationResult messages

Returns

CIDev:Success/fail where the argument is a dictionary of CIDev:Success/fail for each command attempted.

Example:

['OK',{'chan1':'OK', ..., 'chanM':'OK'}]

Publishes

Command-specific messages as described in the above table. See the CIDev:publish messages section for message definitions.

execute_direct Message

Description

This message directly passes the payload of the message to the device. It is the agent's responsibility to determine what overall state the device is in and when these should be sent. The driver is to execute these whenever they are received.

Arguments

The block of data to pass to the instrument.

Returns

CIDev:Success/fail with a failure string.

Publishes

CIDev:RawData messages containing unmodified or translated data blocks from the instrument.

get_status Message

Description

Gets status information from the instrument. Status values may be common or instrument specific.

Arguments

A chan_arg string or list following the standard syntax specifying channels.

chan_arg

[chan_arg,...,chan_arg]

Example:

['instrument', 'all_channels'] retrieves the status of all channels and the instrument as a whole.

Returns

CIDev:Success/fail where the argument is a dictionary of CIDev:Success/fail for each status requested.

Example:

['OK',{'chan1':'OK', ..., 'chanM':'OK'}]

Publishes

A CIDev:StatusResult message for each channel queried.

get_metadata Message

Description

Retrieve metadata for instrument, channel or parameters.

Arguments

A channel string, channel/parameter tuple, or list of strings and tuples specifying a combination of channels and parameters in the standard syntax.

chan_arg

(chan_arg,param_arg)

[(chan_arg,param_arg),chan_arg...,(chan_arg,param_arg),chan_arg]

Example:

['chan1',('chan1', 'caldate'),'instrument']

Returns

CIDev:Success/fail where the argument is a dictionary of CIDev:Success/fail for each block of metadata requested.

Example:

['OK',{'chan1':'OK',('chan1','caldate'):'OK','instrument':'OK'}]

Publishes

A CIDev:MetadataResult message for each block of metadata requested.

get_channels Message

Description

Retrieve list of named channels from instrument.

Arguments

None

Returns

CIDev:Success/fail with failure string.

Publishes

A CIDev:MetadataResult message containing a list of channel names and type string tuples.

initialize Message

Description

Initialize the driver to its initial creation state: not connected, no protocol objects, empty command and data queues, initialized state machine.

Arguments

None.

Returns

CIDev:Success/fail where the fail argument is a string describing the error on initialize.

Publishes

None.

configure Message

Description

Configures the driver object to establish a connection to the instrument.

Arguments

A dictionary of key-value driver parameter pairs sufficient for the driver to communicate with the instrument in a supported mode.

Example:

{'ipaddr':'137.110.112.119','port':9000,'baudrate':9600}

Returns

CIDev:Success/fail where the argument is a dictionary of CIDev:Success/fail strings for each attempted configuration value set. The name of the parameter and the value are the keys to the dictionary.

Example:

{('ipaddr','137.110.112.119'):['OK'], ('port',9000):['OK'], ('baudrate',9600):['OK']}

Publishes

None.

connect Message

Description

Establish driver-instrument connection using current driver configuration. Assumes the driver is initialized and configured to a supported communication mode.

Arguments

None.

Returns

CIDev:Success/fail where the fail argument is a string describing the error on connect.

Publishes

CIDev:Event message with connection made if successful.

disconnect Message

Description

Close driver-instrument connection. Returns driver to initialized and configured state.

Arguments

None.

Returns

CIDev:Success/fail where the fail argument is a string describing the error on disconnect.

Publishes

CIDev:Event message with connection lost if successful.

Client Response Messages

Success/Fail

Success message contents are a list with the string "OK" and an optional series of explanation/argument values that are applicable to the operation being reported upon. For example:

["OK", "ExampleID123", (lat, lon)]

Failure messages contents are a list with the string "ERROR" and a string describing the type of failure. For example:

["ERROR", "This is a bad error."]

Often the optional response value will contain a dictionary with more specific success/failure information. For example, in the case of a set:

["ERROR", {('chan1','param1'):'OK', ..., ('instrument','paramx'):'ERROR'}]

This allows the setter to easily identify which elements of the set caused the trouble. Other messages return variations on this theme as described above.

Publish Messages

Publish messages sent to the agent with operation "publish" have the following contents

{"Type":topic, "Transducers":transducer, "Value":data}

Here, transducer is always "device," while topic is the type of publish message and data is topic-specific, given in the following table:

topic description data

Error an error occurred error string

StateChange change in instrument state occurred new instrument state

ConfigChange change in instrument parameter occurred new parameter-value pair

Data a data sample was retrieved data sample

RawData data was received in response to execute_direct unmodified data block from instrument

GetResult a parameter value was requested parameter-value pair

CalibrationResult a calibration routine was run calibration routine result

TestResult a test routine was run test routine result

MetadataResult instrument metadata was retrieved instrument metadata dictionary

StatusResult instrument or channel status was retrieved instrument or channel status

Event an instrument event occurred event string
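
Composing a publish message in the format above is straightforward; this sketch uses an illustrative helper name, with topic strings drawn from the table.

```python
# Illustrative builder for driver publish messages.
def make_publish(topic, data):
    # The transducer is always "device" per the interface description;
    # topic is one of the types from the table (StateChange, Data, etc.)
    # and data is topic-specific.
    return {'Type': topic, 'Transducers': 'device', 'Value': data}
```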

Test Instrument Interface

The instrument driver needs an interface defined between it and the instrument agent. Since the instrument agents are the event handlers in the system and the drivers are the data/protocol handlers there needs to be a well defined way for them to communicate with each other.

Notes

Assumptions

Proper policy and access issues are resolved by the instrument agent before these messages are exchanged.

Pending issues

Do we need to split execute into driver and instrument? Can we collapse set_data_destination into the generic set (if there is one)? Do we need phrasing here as well, or can we assume that the instrument agent has some internal capacity to lock the resource before actually making driver calls during its get/update operations? Or maybe the agent is smart enough to stuff a complete list into a large message and make it atomic? Should set_state really be something like execute_state_transition? ie. would it make any difference to define what transition is being made instead of the next desired state? Possibly easier to code? What sort of mechanism need to exist to tie an agent instance to a driver instance? Shared key? destination pairs when handling messages? Should be mostly built into the infrastructure, but may need to be checked here for each message.

Messages

execute Message

Description

Instructs the driver to execute a command.

Arguments

The messaging format for argument entry is

[_addresslist_, ['command1', 'arg1', 'arg2', ..., 'argN']]

Example:

[['*'], ['Initialize']] should issue an "Initialize" to all channels and

[[], ['Disconnect']] should issue a "Disconnect" to the device itself.

Commands include:

Command Applies to Description Arguments Returns

Initialize Instrument Resets the instrument or channel to a known, initial state None CIDev:Success/Fail

Connect Driver Explicitly establishes a connection to the instrument or verifies connection exists None CIDev:Success/Fail

Disconnect Driver Explicitly ends a connection to the instrument None CIDev:Success/Fail

get Message

Description

Gets parameters from the instrument. These parameters are instrument specific.

Arguments

A tuple list may be added to specify which instrument parameters for which channels should be returned. To request a parameter of all channels, use '*' for the channel name. To request a parameter of the device itself, use an empty string ('') for the channel name. To return all parameter values, use '*' for the parameter name. Example:

[('', 'param1'), ('*', 'paramN')]

Returns

CIDev:Success/fail where the success argument is a dictionary whose keys are tuples of the channel name and the instrument configuration parameter name, and whose values are the values of those parameters. The values that are returned are garnered immediately from the active, running device configuration, not the buffered, un-applied configuration. If a parameter name was not found, it is not included in the return list. Example:

{('chan1', 'param1'):'123', ..., ('chanM','paramN'):'xyz'}

set Message

Description

Sets configuration parameters on the instrument. These parameters are instrument specific.

Arguments

The argument is a dictionary whose keys are tuples of the channel name and valid parameter names from the instrument configuration parameters list, and whose values are the values that are intended to be set. Use '*' as the channel name to address all channels, or exclude it to address the device itself. Example:

{('chan1','param1'):'123', ..., ('chan2','paramN'):'xyz'}

Returns

Returns a dictionary of success/fail results for each attempted value set, keyed by the tuple of channel name and parameter name. Example:

{('chan1','param1'):['OK'], ('chan1','param2'):['ERROR'], ..., ('chanM','paramN'):['OK']}

get_status Message

Description

Gets status information from the instrument. The status values are instrument specific.

Arguments

A list of tuples may be supplied to specify which status keys should be returned for which channels. To request status of all channels, use '*' for the channel name. To request status of the device itself, use an empty string ('') for the channel name. To return all status values, use '*' for the status key. Example:

[('', 'InstrumentState'), ('*', 'Alarms')]

Returns

CIDev:Success/fail, where the success argument is a dictionary of the current status values, indexed by tuples of channel name and status key. Example:

{('', 'InstrumentState'):'Standby', ('chan1', 'Alarms'):('UnknownError','Something is very wrong'), ('chan2', 'Alarms'):('UnknownError','Something is only sort of wrong')}

set_data_destination Message

Description

When the instrument is in Observatory mode and streaming non-polled data, that data needs to be collected and sent to the instrument agent for appropriate publishing. This message accepts a messaging destination argument (for example, a process ID within the instrument agent) that indicates where this data is to be sent.

Arguments

The messaging destination to use (a String)

Returns

CIDev:Success/Fail indicating that the value was accepted. This does not indicate whether the destination actually accepts messages, as no test of the destination is made.

execute_direct Message

Description

This message directly passes the payload of the message to the device. It is the agent's responsibility to determine what overall state the device is in and when these should be sent. The driver is to execute these whenever they are received.

Arguments

The block of data to pass to the instrument

Returns

CIDev:Success/fail

Common material

Success/Fail

Success message contents are a list with the string "OK" and an optional series of explanation/argument values that are applicable to the operation being reported upon. For example:

["OK", "ExampleID123", (lat, lon)]

Failure message contents are a list with the string "ERROR" and a string describing the type of failure. For example:

["ERROR", "This is a bad error."]
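These two message shapes can be produced and checked with simple helpers (a sketch of the convention; the helper names are not part of an existing API):

```python
def ok(*details):
    """Build a success message: ["OK", optional explanation/argument values]."""
    return ["OK"] + list(details)

def error(reason):
    """Build a failure message: ["ERROR", description string]."""
    return ["ERROR", reason]

def succeeded(msg):
    """True if a Success/Fail message reports success."""
    return bool(msg) and msg[0] == "OK"

assert ok("ExampleID123", (32.7, -117.2)) == ["OK", "ExampleID123", (32.7, -117.2)]
assert not succeeded(error("This is a bad error."))
```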

Instrument Drivers

TODO: Fill in text about this page

Current versions of existing drivers are documented as children of this page.

Generic Instrument Driver

Replace with link to generic driver definition

Observatory Interfaces and Drivers

SIAM Interface

Replace with link to SIAM driver definitions

Instrument Drivers

Add links to drivers

CIAD SA OV Instrument Life Cycle

Figure 1 below shows the common life cycle of any device managed by an instrument or platform agent. This is a specialization of the general resource life cycle.

Figure 1. Instrument/platform device life cycle

The Direct Access Mode and the Observatory Mode are mutually exclusive for a device. The direct access service manages direct access sessions.

CIAD SA OV Instrument Management

The Instrument Management services provide the uniform management and control of instruments. This enables the integration of instruments within observatories. In particular, instrument management services provide the following capabilities:

Activate an instrument
Register a new instrument in the system
Register all data products for an instrument, from unprocessed to processed and qualified
Control instruments
Manage the mode switch between direct access mode and observatory mode
Get the internal state of an instrument
Change the internal state and configuration of an instrument
Execute a command with an instrument
Start and stop an instrument agent

Behavior

Figure 1 shows how to command an Instrument through its Instrument Agent.

Figure 1. Commanding an Instrument through its Instrument Agent (OV-6)

Figure 2 shows how an Instrument Agent registers itself and the instrument with the Agent Registry Service (COI).

Figure 2. Registration of the Instrument Agent in the Agent Registry (OV-6)


Material Covered

After reading this page, you should be able to answer the following questions:

What are the principal activities of instrument management?
Is an agent created first, or registered first? Why?
Which is registered first, an agent or the resource it serves? Why?

CIAD SA OV Marine Facility

The Marine facility is based on a COI facility. It represents an observatory and its physical and information resources and can form agreements, for instance with other marine facilities. CG and RSN are two primary marine facilities. The marine facility will be implemented in release 2.

The Marine Facility is represented in the External Interfaces diagram.

CIAD SA OV Marine Platform Services

The Marine Platform services are elaborated in Release 2. They are based on the Instrument Management services and the Instrument Agent Architecture.

CIAD SA OV Marine Resource Scheduling

The Marine Resource Scheduling services are elaborated in Release 3. They are based on the Observatory Management, Marine Facility and Instrument Management services.

CIAD SA OV Observatory Management

Note: Most of the observatory management services are not implemented in release 1 of the ION.

Decomposition

Figure 1 shows the high level decomposition of observatory management.

Figure 1. Observatory management decomposition (OV-1)

Figure 2 shows observatory management services. These include Direct Access Management, the CIAD SA OV Instrument Management, Marine Resource Scheduling Services, Data QA/QC services and Data Calibration and Validation services.

Figure 2. Observatory Management Services (OV-2)

Material Covered

After reading this page, you should be able to answer the following questions:

(to be provided)


CIAD SA OV User Interfaces

Figure 1 shows all the user interfaces (views) in the scope of the Sensing & Acquisition subsystem. Note that most of these interfaces will be implemented on a provisional basis only in release 1.

Figure 1. S&A User Interfaces (OV-2)

Table 1 User Interface and User Application Support (incomplete)

ID User Interface Supported User Applications and Purpose

SAUI1 | Observatory Operations Console | Shows the state of health and related information for the observatory with its infrastructure and instrumentation resources. Observatory operations include management of power, bandwidth and processing for observatory resources and aggregated representations in single-screen operation views.

SAUI2 | Instrument Management and Activation | Interface for instrument providers to bring their instruments online by going through all activation, calibration, dry and wet testing, and interfacing concerns.

SAUI3 | Data Acquisition Definition Interface | Configures the steps that are applied to all data coming from the instrument until hand-off to the dynamic data distribution service. This includes automatic transformation, filtering, and grouping processes, as well as temporary buffering and segmentation strategies for real-time and locally stored data streams.

SAUI4 | Data Acquisition Interface | Interface to define adapters to perform data acquisition for external data sources.

SAUI5 | Observatory Management System Interface | Interface to the operations and management system operated by marine observatories for managing their physical infrastructure.

CIAD SA SV Instrument Development Kit

Instrument Test and Certification Facility

(not in the scope of release 1)

Figure 1 depicts the Instrument Test and Certification Facility as one specific installation site of the OOI integrated observatory network, including a specific CI capability container configuration that provides access to all CI services and resources, designated for system testing of instrumentation and of supporting infrastructure. The Instrument Test and Certification Facility includes Marine Specific System Test Facilities for wet testing of sensors and marine observatory infrastructure, with special configurations for the RSN and CGSN observatories. Furthermore it includes a Logical Test Facility Workbench for dry testing of instrument hardware in an integrated network setting together with their instrument agents, drivers and software integration. Access and management portals provide interactive access to the OOI operators and instrument providers.

Figure 1 Instrument Test Kit (SV-1)

CIAD SA SV Instrument Driver Framework

Figure 1 shows the internal structure of the generic instrument driver framework. It contains abstract representations of ports, protocols, information state, commands, events and a finite state machine (FSM) representation of the instrument. All six concerns are separated and independent of one another. Specific implementations for each category, such as different drivers for communication ports (RS-232, Ethernet, etc.) and communication protocols (vendor-specific communication command sets), exist and can be added, providing a flexible, extensible instrument adapter implementation framework and a rich library of components to draw upon.

Any specific instrument agent instance will be able to extend a template for the applicable instrument class and provide the integration and configuration needed to support the specific instrument type. This framework supports significant reuse and results in high synergy when creating instrument adapters for each of the approx. 49 core types of instruments of the OOI. A similar strategy applies to platform agents.
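As an illustration of the FSM concern named above, a minimal driver state machine might look as follows (the states and events are illustrative only, not the framework's actual set):

```python
class DriverFSM:
    """Minimal finite state machine for an instrument driver.

    The transition table maps (state, event) -> next state; unknown
    combinations are rejected, which is how a driver can guard against
    commands that are invalid in the current state.
    """
    TRANSITIONS = {
        ('unconfigured', 'configure'): 'disconnected',
        ('disconnected', 'connect'): 'connected',
        ('connected', 'disconnect'): 'disconnected',
        ('connected', 'start_sampling'): 'sampling',
        ('sampling', 'stop_sampling'): 'connected',
    }

    def __init__(self):
        self.state = 'unconfigured'

    def on_event(self, event):
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            raise ValueError(f"event '{event}' invalid in state '{self.state}'")
        self.state = self.TRANSITIONS[key]
        return self.state

fsm = DriverFSM()
fsm.on_event('configure')
fsm.on_event('connect')
assert fsm.on_event('start_sampling') == 'sampling'
```

Keeping the transition table declarative makes the state/command separation explicit and lets each concrete driver supply its own table without changing the dispatch logic.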

Figure 1. Instrument Driver Implementation Framework (SV-1)

Decomposition

Figure 2 shows the internal setup of an instrument driver. There are several variants of instrument driver implementations, one of them being the instrument driver framework developed by OOI CI (which does not have to be a fully new development but can be based on an existing middleware/framework). Specific driver instances are based on this framework and detail the composition of the various modules that manage communication ports, command protocols, internal state, etc.

The Command Protocol provides an abstraction of a specific command language for the interaction with a physical instrument. For serial instruments, this is a specific dialect of command/response codes. The Communication Port provides an abstraction of a specific communication port with a physical instrument, such as RS-232, Ethernet etc. Realizations of the Instrument Driver Middleware role include, but are not limited to: Antelope, MOOSDB, and MBARI SIAM.
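The separation between the Communication Port and Command Protocol roles could be sketched as abstract interfaces (all class and method names here are hypothetical, for illustration only):

```python
from abc import ABC, abstractmethod

class CommunicationPort(ABC):
    """Abstracts the physical channel (RS-232, Ethernet, ...)."""
    @abstractmethod
    def write(self, data: bytes) -> None: ...
    @abstractmethod
    def read(self) -> bytes: ...

class CommandProtocol(ABC):
    """Abstracts a vendor-specific command/response dialect."""
    @abstractmethod
    def encode(self, command: str, *args) -> bytes: ...
    @abstractmethod
    def decode(self, raw: bytes): ...

class SimpleAsciiProtocol(CommandProtocol):
    """Example dialect: space-separated ASCII commands, CR-terminated."""
    def encode(self, command, *args):
        return ' '.join([command, *map(str, args)]).encode() + b'\r'
    def decode(self, raw):
        return raw.rstrip(b'\r').decode()

p = SimpleAsciiProtocol()
assert p.encode('TS') == b'TS\r'
assert p.decode(b'25.1,35.2\r') == '25.1,35.2'
```

Because the two roles are independent, the same protocol implementation can be paired with any port implementation, which is the reuse the framework aims for.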

Figure 2. Instrument Driver Nodes (OV-2)

CIAD SA SV Technology Mapping

Technology Mapping

See the Technology List

Data Structures and Models

The Sensing and Acquisition Subsystem will implement and contribute domain models via the COI for instruments, observations, plans, schedules, marine resources, allocation, and transducers, leveraging standards such as OGC Sensor Web Enablement (SWE) and IEEE 1451. The plug-and-play instrument support and remote control capabilities of the Software Infrastructure and Applications for Monterey Ocean Observing System (SIAM) will provide design references.

Interface Points

The Sensing and Acquisition Subsystem provides its services via the COI to all subsystems. However, major interaction points exist with the COI, the Planning and Prosecution subsystem and the Data Management subsystem. While the COI should largely mitigate end-chain communication technology limitations, it is clear that the pathways used for data acquisition and state-of-health monitoring for instruments must often share features with network-wide resource allocation, network state-of-health monitoring, and sometimes data Grid transmission capabilities. Thus, the Sensing and Acquisition Subsystem must at the very least peacefully coexist with features of network-wide communication from central observatory acquisition and control sites out to those of sensors. The Sensing and Acquisition Subsystem interacts heavily with the front-end of the Data Management Subsystem, and hence must partially drive the interface engineering.

CIAD SV Instrument Agent and Driver Integration Interfaces

Figure 1 illustrates implementation-level interfaces related to Instrument Agents, their implementation and strategies to integrate various instrument drivers.

Figure 1. Instrument Agent related interfaces (OV-1)

CIAD AS Analysis and Synthesis

Analysis & Synthesis (A&S) Subsystem Architecture and Design

This is the central page for the AS subsystem architecture and design, a part of the OOI CI System. It is structured into operational views (OV), system views (SV) and technical standards views (TV).

AS OV Overview

Services and Service Components

Workflows Model Integration Data Analysis Interactive Analysis and Visualization

Crosscutting Topics

User Interfaces Technology Mapping

CIAD AS OV

The Analysis and Synthesis Subsystem Services provide capabilities and user/application interfaces to support advanced and systematic data analysis and output synthesis applications. This includes the life cycle and operational management of community numerical ocean models, ensembles of models and the virtual ocean simulator framework, as well as modeling activities (i.e., assimilation, analysis, evaluation) using observed and derived data products. Analysis and Synthesis provides a flexible, stream-based scientific workflow execution capability. Analysis and Synthesis services support event detection, data analysis and visualization by utilizing the workflow mechanism and providing specialized advanced support services for these activities. A&S also provides the virtual collaboration services used to create virtual observatories and classrooms that may provide interactive collaboration, analysis and synthesis workspaces.

Capabilities

The Analysis and Synthesis Subsystem will provide the following capabilities:

A scientific workflow definition, execution and control framework based on data streams
A data stream processing capability, including stream subscription, stream process scheduling, and stream process execution
Support for and execution of advanced measurement processing services, including a measurement calculus and measurement semantic model
Support for and execution of data assimilation processes
Support for and execution of data analysis processes, such as event detection
Support for and execution of output synthesis processes, such as data visualization and transformation
Generation of derived data products from analysis and synthesis stream workflows
Numerical ocean model integration services, including on-demand modeling, data assimilation, assimilative modeling, and ensemble model execution
Virtual collaboration services, enabling virtual observatories, laboratories and classrooms
User and application interfaces for interactive analysis and synthesis workspaces

Figure 1. Analysis and Synthesis Services Overview (OV-1)

Decomposition

Figure 2 shows the core Analysis and Synthesis Services Network with its operational nodes and needlines. Because of the integration architecture surrounding the Exchange as data and message distribution network, all nodes are connected to it to receive and produce information. Operational nodes that are frameworks enable the plug-in of user-provided processes, workflows, applications, and tools to perform the designated functions.

Figure 2. 2850-00001 Analysis and Synthesis Operational Nodes and Needlines (OV-2)

The Analysis and Synthesis operational node provides the core services needed for the analysis of observations and their synthesis into derived data products and graphical visualizations. It interfaces to science and education users, such as investigators and experts that provide input, experience, and oversight for analysis and synthesis activities. Users access the Analysis and Synthesis services through the Interactive Analysis and Visualization services, and interact with the Event Detection Framework and the Data Assimilation and Model Integration Framework. Users accessing the service network may be distinct for different nodes. For example, an investigator user who is central to Analysis and Synthesis may be a scientist or educator, while for the Event Detection Framework it may be an engineer or statistician. In each case, the user provides specifications and rules, process definitions, and key decisions to and receives refined datasets, analysis and visualizations from the relevant node.

The Data Assimilation and Model Integration Framework and the Event Detection Framework represent archetypical activities carried out in an analysis and synthesis effort. In this context, the nodes are activity centers that may be either ongoing or one-time in form. The Data Assimilation and Model Integration Framework node hosts prognostic and retrospective numerical models of observed processes and events, usually involving assimilation of real-time or retrospective data from the Instrument operational node or data repositories, respectively. It receives specifications to define numerical experiments along with qualified data from repositories. It may also receive real-time data that has not been subjected to QA/QC directly from the Instrument node through the Exchange. The Data Assimilation and Model Integration Framework node publishes model products and their descriptors to which the interested users can subscribe via the Exchange.

The Event Detection Framework node is operating as a filter on real-time or retrospective data to provide detected and classified events as a product. The Event Detection Framework node receives process specifications to establish trigger conditions, definitions and patterns for events and qualified data or model products from the Data Management services via the Exchange. It may also receive real-time data directly from the Instrument node via the Exchange. It provides topic-based identified events and patterns to the Data Management node.

The Workflow Management node provides support for the definition, integration and enactment of user-defined workflows. Such workflows can be defined for various purposes where CI processing is required, ranging from human-in-the-loop data QA/QC to event detection and model integration. The Workflow Management node provides a workflow engine based on the process execution services defined in the Common Execution Infrastructure (CEI).

The Virtual Collaboration Management node provides the foundation for the definition of virtual observatories, laboratories and classrooms. This enables a virtual collaboration of multiple individuals in distributed locations in the setting of a defined project using the same set of physical and virtual resources. A virtual collaboration environment can be provided through the Workspace and Presentation Platform node. Capabilities include definition of a project by a project lead role, invitation of individuals to collaboration, selection of the resources available to all project members, management of project membership and resource use policy, use of data processing, analysis and visualization tools, the capability to define workflows and the publication of results to the public. The Virtual Collaboration Management node is based on the facility framework provided by the Governance Framework in the Common Operating Infrastructure (COI).

Work Products

The Work Products provided by this subsystem are:

Table 1 Work Products

ID Service Explanation

1.2.3.2 | Analysis and Synthesis | The subsystem responsible for providing the life cycle and operational management of community models, ensembles of models and the virtual ocean simulator, as well as modeling activities (i.e., assimilation, analysis, evaluation) using observed and derived data products.

1.2.3.2.1 | Laboratory and Classroom Facility Services | Building on the capabilities of the Observatory Facility, provides services to organize, manage, and control research and educational activities, the resources they use, and the participants involved. It is the virtual home where research teams gather their resources, carry out their objectives, and collect their results. It belongs to an individual or a group. It provides the group management tools to facilitate membership and collaborations and to assign roles and responsibilities.

1.2.3.2.2 | Data Analysis and Visualization Services | Provides a generalized analysis and synthesis framework for transforming, analyzing, and visualizing data through the application of user and community developed processes.

1.2.3.2.3 | Event Detection Services | Provides services to register processes to detect and publish events from data streams. Events are automatically persisted and distributed based on the configuration set for the detector.

1.2.3.2.4 | Model Catalog and Repository Services | Maintains a hierarchy of evolving interdisciplinary models (e.g. from 'reduced' process-oriented models to operational forecast systems). Supports the registration and dissemination of model data sets. An initial set of community-based numerical ocean models, such as the Regional Ocean Modeling System (ROMS) and the Harvard Ocean Prediction System (HOPS), will be introduced.

1.2.3.2.5 | Modeling Services | Provides ocean modeling network services for access to multiple community-based numerical ocean models for parameter estimation/optimization and data assimilation. Provides the services to construct, modify, and execute numerical ocean models with command and control services for their operation and management. It provides a Virtual Measurement Sampling service to drive virtual instruments and/or virtual data acquisition processes. Services support: multiple models used in ensemble techniques, uncertainty and error estimation, and adaptive multi-domain 2-way nested configurations for generating dynamical interpolation of data sets, data assimilation, reanalyses (hindcasts), nowcasts, and forecasts.

1.2.3.2.6 | Model Activation Services | Provides testing and validation services to ensure conformity with the different operational requirements in the network.

1.2.3.2.7 | Virtual Ocean Simulator Framework | Provides services to interact with the ocean through a simulator producing virtual ocean fields updated on a daily basis covering all three observatory types. The simulator involves on the order of twenty tracers, including four physical variables (temperature, salinity, zonal and meridional current), a dozen biogeochemical variables (silicate, nitrate, ammonium, two phytoplankton groups, two zooplankton grazers, two detrital pools, DIC, and oxygen), and four more tracers of interest (e.g., tracers from hydrothermal event plumes).

CIAD AS OV Data Analysis

CIAD AS OV Interactive Analysis

Interactive Analysis and Visualization

The Interactive Analysis and Visualization node defines the approach for analysis tasks, and strongly emphasizes the visualization interface to users such as investigators and experts. It may have multiple instantiations within the Analysis and Synthesis nodes. Figure 1 refines the operational node for interactive analysis and visualization as part of the Analysis and Synthesis services and shows its decomposition.

The main activities carried out within this node are analysis and visualization as depicted in the Analysis and Visualization Application Framework nodes. These connect to the Workspace and Presentation Platform and have flexible and powerful interfaces to users who provide expertise for both algorithm development and investigational purposes. The Analysis Application Framework harnesses computational resources to reduce, assimilate, and model data and produce data products for further analysis and interpretation, usually by one or more investigator users inside a virtual laboratory environment. It receives analysis algorithms and data from the Data Management Services Network and analysis instructions from investigator users via the Workspace and Presentation Platform node. It provides data products (e.g., process characterizations and correlations) to the Data Management Services Network and algorithm information and data products to the Visualization Application Framework for representation to the Workspace and Presentation Platform. The analysis application framework can draw upon workflows defined by the workflow framework node.

The Visualization Application Framework node provides the services to define, generate and manage visual representations of data and data products. It receives visualization paradigms from the Science Data Management Services Network, algorithm information and data products from the Analysis Application Framework node, and presentation context information for the specific Workspace and Presentation Platform node in use. The Visualization Application Framework provides visualization context information to the Data Management Services Network to discover an appropriate set of representation paradigms, and renders data product representations based on one of these paradigms to the Workspace and Presentation Platform node.

Figure 1. Interactive Analysis and Visualization Overview (OV-1)

The Workspace and Presentation Platform is the interface between the Cyberinfrastructure and investigator users, and contains functional capabilities as well as user interface elements to support interactive visualization of data and data products. It provides instructions to the Analysis Application Framework node to manage and control modeling and analysis processes, contextual information (e.g., geospatial bounds, view classification such as textual, 2D, 3D and display paradigm) to the Visualization Application Framework node, and navigation and selection information to the Information Distribution Services Network to obtain resources and their descriptors. It receives data and model product representations from the Visualization Application Framework for display.

Analysis expert users author or register scientific, numerical and statistical algorithms that are evaluated by the Process Definition Validator node. Such users publish analysis algorithms scoped to their domain of applicability. The Visualization expert users author or register visualization algorithms in the same manner.

The Workspace and Presentation platform is also the primary interface for investigator users to author and register observation requests. Such observation requests are subject to evaluation by the Observation Request Validator and are then made available to an Observation Plan Repository via the Exchange. From there, detected environmental events can trigger the execution of Observation Plans via the event response services of the Planning and Prosecution Services Network.

Figure 2. 2850-00002 Interactive Analysis and Visualization Operational Nodes and Needlines (OV-2)

Behavior Model

Interactive Analysis

Workflows are integrated in OOI in several ways, depending on whether the data, execution, or orchestration are done inside or outside OOI. The following are some example scenarios:

1. Integration through a data interface: a user has their own application, gets the data stream from OOI, performs the computing on the local platform, and sends the results back to OOI
2. The user gets the script and the data from OOI and runs it on the local machine
3. The user gets the script and data from OOI and requests OOI to run it
4. A user provides an application and another user executes it
5. The user has data to integrate with OOI data, but still runs the application outside OOI
6. The user has data to integrate with OOI data, and runs it as an OOI application
7. The user goes to OOI and obtains a specialized module to run in selected applications/environments such as Matlab or Kepler

This section focuses on the use case of a science user who works with Matlab to manipulate oceanographic data. We refer to scenario 2 of the previous list: the user gets the script and the data from OOI and runs it on the local machine. The steps of the process are: the user selects the region of interest, downloads a Matlab script, loads the script on the local machine, and runs it.

Figure 3 depicts the interactions between the user and the CI Data Portal and Data Management Services to select the region of interest for the data. The user performs several data queries and visualizes the obtained data until finding the data of interest.

Figure 3. User selects the region of interest (OV-6)

Figure 4 shows the interactions between the user and the CI Application Selection Portal to find and download an available Matlab script. The user queries for applications and their associated metadata to identify the script that performs the desired task. After downloading the script, the user loads it on the local machine and configures it (see Figure 5).

Figure 4. User downloads the script (OV-6)

Figure 5. User loads the script (OV-6)

Figure 6 shows the interactions between the user, Matlab, and the CI Data Management Services when running the script. The script obtains the data sources from OOI, prepares them to fit the input of the script, processes them, and ingests the results into OOI, also updating the provenance information. At the end, the user provides feedback for post-processing the results.

Figure 6. User runs the analysis script (OV-6)

Figure 7. Workflow (OV-6)

CIAD AS OV Model Integration

CIAD AS OV User Interfaces

Table 1 User Interfaces and User Application Support

ID User Interface Supported User Applications and Purpose

ASUI1 | Workspace and Presentation Platform | Enables interactivity with users such as investigators, data analysts and engineers for data and data product analysis and synthesis purposes. Enables the generation of data set presentation formats, such as visualizations and dataset exports.

ASUI2 | Analysis Application Integration Framework | Application tool interface for user-provided analysis tools and applications, such that they can be used consistently with OOI datasets, data distribution and visualization capabilities.

ASUI3 | Visualization Application Integration Framework | Application tool interface for user-provided visualization tools and applications, such that they can be used consistently with OOI datasets, analysis tools and data processing capabilities.

ASUI4 | Automated Processing Interfaces | Enables the definition and modification of automated and human-in-the-loop data processing workflows and processes, such as data QA/QC, model integration, and event response behavior.

ASUI5 | Observation Planning Interface | Enables the definition of observation plans and their refinement based on validation and evaluation provided by the Integrated Observatory.

ASUI6 | Process Definition Interface | Enables the authoring and registration of user-provided processes in one of the formats supported by the CI. Execution engines exist for such formats. Users can provide source code and executables, and bind them to OOI data sources.

CIAD AS OV Workflows

CIAD AS SV Technology Mapping

Table 1 lists the technologies and standards used for the implementation of this subsystem. An integration strategy is provided for each technology and standard. For further details, refer to Section 5.1.

Table 1 Technology Mapping

Service | Technology | Integration Strategy

Analysis Application/Language | Matlab | Will be a delivered, supported process execution environment

Workflow Execution Engine | Kepler | The CI will embed the GUI and execution environment; treatment similar to Matlab

Meta-Workflow Resource Mapper | Pegasus | Used as workflow engine and resource mapper

Visualization Toolkit | VTK | Will be a delivered, supported execution environment of the CI for visualization code. User-provided VTK code can be embedded within an execution environment on the CI

Graphics Engine | OpenSceneGraph | Will be a delivered, supported execution environment of the CI for visualization, integrated with VTK. Expectation: interchange formats are DAP for data and Collada for graphics

Visualization Application | IDV | Can be interfaced with the CI through a DAP interface. Expected to work out of the box. Supported presentation tool.

Spatial Display Application | OSSIM Planet | Will be interfaced with the CI through a Collada interface for the viewing geometry; control interface to a representation engine based on OpenSceneGraph and VTK

Spatial Display Application | GoogleEarth and Maps | Will be interfaced with the CI through a KML and OGC interface (geometry for viewing); control interface to a representation engine based on OpenSceneGraph and VTK

The development effort for the Analysis and Synthesis Subsystem will provide the framework to integrate community-based numerical ocean models such as the Regional Ocean Modeling System (ROMS) and the Harvard Ocean Prediction System (HOPS). Its modeling and simulation capabilities will leverage existing and emerging data assimilation modules based on the variational method (e.g., 3DVAR or 4DVAR) or Kalman Filter (KF).
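As an illustration of the data assimilation idea behind such modules, a single scalar Kalman analysis step blends a model forecast with an observation, weighting each by its uncertainty. This is a minimal sketch only; ROMS/HOPS assimilation operates on full model state vectors, and 3DVAR/4DVAR use variational formulations instead.

```python
def kalman_update(x_prior, p_prior, z, r):
    """One scalar Kalman analysis step: blend a model forecast
    (x_prior, variance p_prior) with an observation (z, variance r)."""
    k = p_prior / (p_prior + r)           # Kalman gain
    x_post = x_prior + k * (z - x_prior)  # analysis state
    p_post = (1.0 - k) * p_prior          # analysis variance
    return x_post, p_post

# Example: the model forecasts 14.0 C SST with variance 4.0;
# a sensor reports 15.0 C with variance 1.0.
x, p = kalman_update(14.0, 4.0, 15.0, 1.0)  # -> (14.8, 0.8)
```

Note how the analysis variance (0.8) is smaller than either input variance: assimilation always reduces uncertainty relative to the forecast.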

OpenSceneGraph is an open source high performance 3D graphics toolkit, used by application developers in fields such as visual simulation, computer games, virtual reality, scientific visualization and modeling.

OSSIMPlanet is built on top of OSSIM (Open Source Software Image Map) with OpenSceneGraph capabilities and is a rapidly evolving project for accurate, high performance 3D geospatial visualization. It automatically intersects with DTED (Digital Terrain Elevation Data) and/or SRTM (Shuttle Radar Topography Mission) elevation data for topography and natively handles a wide range of commercial and government file formats.

The Visualization Toolkit is an open source graphics toolkit. It is a platform independent graphics engine with parallel rendering support. VTK has an active development community that includes laboratories, institutions and universities from around the world.

COLLADA establishes an interchange file format for interactive 3D applications. COLLADA defines an open standard extensible markup language (XML) schema for exchanging digital assets among various graphics software applications that might otherwise store their assets in incompatible formats. COLLADA documents that describe digital assets are XML files, usually identified with a .dae (digital asset exchange) filename extension.

CIAD PP Planning & Prosecution

Planning & Prosecution (PP) Subsystem Architecture and Design

This is the central page for the PP subsystem architecture and design, a part of the OOI CI System. It is structured into operational views (OV), system views (SV) and technical standards views (TV).

PP OV Overview

Services and Service Components

Resource Planning Technologies: ASPEN-CASPER

Mission Planning and Execution

Instrument Interactivity

Autonomous System Control Technologies: MOOS , MOOS-IvP

Crosscutting Topics

User Interfaces Technology Mapping

Deployment Scenarios (OSSE)

CIAD PP OV

The Planning and Prosecution (PP) Services Network will provide the services, together with the standard models, for the management of stateful and taskable resources. It provides controller processes with the semantics to monitor and control the operating state of an active resource, as well as to initiate, monitor and amend tasks being carried out by a taskable resource. The managed resource is required to present its operational state and declare its governance context. Central applications of the Planning and Prosecution services network are observatory and observation mission (campaign) planning and prosecution. Such activities include carrying out simultaneous coordinated multi-objective observations across the resources of the observatory, as well as event-response behaviors and interfacing with autonomous vehicle resources.

Capabilities

The Planning and Prosecution subsystem provides the following capabilities:

- Command, control and monitor semantics to operate and manage a (stateful and taskable) resource
- Time-structured, concurrent coordination and prioritization of shared resources that are distributed and constrained
- Provisioning of a behavior-based architecture for rapidly reconfigurable autonomous task execution
- Unique multi-objective optimization of behavior coordination, allowing an effective compromise to be attained between periodically competing task objectives for a collection of resources
- Provisioning of a behavior calculus, allowing sequences of task states to be structured into long-term, persistent plans while remaining highly reactive to events and in situ control requests
- Autonomous robust execution of observation plans on fixed and mobile intermittently connected instrument platforms
- Defining, storing and managing observation plans and event response behaviors

Decomposition

The planning and prosecution services are multi-purpose, ranging from development and execution of observational plans to the control of internal CI resources, including computational resources. See Figure 1.

Figure 1. Planning and Prosecution Illustration (OV-1)

Figure 2. 2880-00001 Planning and Prosecution Operational Nodes and Needlines (OV-2)

The Interactive Observatory Facility provides the services to design, assemble, and operate configurations of resources from across the OOI into unique systems for planning, testing and prosecuting observation requests, leveraging the nested and autonomous capabilities of the fully integrated network of sensing, modeling, and control resources. It provides experimentalists with services to define, compose, and schedule multi-instrument observations that can execute across the observatory. It is based on the services provided by the Analysis and Synthesis SN for virtual collaboration management and the interactive workspace.

The Event Response Framework node provides automated and expert (i.e., usually involving user intervention) review of events and processes that may result in responsive tasking or retasking of instrument and mobile resources and thus provides observation requests to the Resource Planner. It subscribes to rules governing resource usage and information on processes and events from the Data Management SN.

The Resource Planner is a solver that requires a resource constraint model as its basis. The resource constraint model represents resources abstractly through their state condition, as well as through activities that can be performed on the resources. For instance, an AUV's state could consist of the battery charge condition, position, depth, and speed/energy profile. Activities for an AUV could include changing position and depth, or changing to a different behavior, such as switching from "loitering" to "return-to-base". The resource constraint model is entirely at the discretion of the resource provider or of a resource integrator overseeing multiple resources. The resource planner acts as a constraint solver that takes a resource request as input and produces a resource use plan as output. In addition, the resource planner is responsible for negotiating resource use with individual resources and their respective stakeholders. The resulting service agreements and the resource use plan form the envelope for the operation and control exercised by the resource use controller.
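The planner-as-constraint-solver role described above can be sketched in miniature. All names, the dictionary encoding, and the per-activity battery costs below are illustrative assumptions, not CI interfaces:

```python
# Hypothetical resource constraint model for an AUV, following the
# battery example in the text (illustrative encoding only).
AUV_MODEL = {
    "state": {"battery_pct": 80.0, "position": (0.0, 0.0), "depth": 0.0},
    "activities": {
        # activity name -> assumed battery cost in percent
        "survey_leg": 30.0,
        "return_to_base": 20.0,
    },
    "constraints": {"min_battery_pct": 10.0},
}

def plan_resource_use(model, requested_activities):
    """Trivial 'solver': accept requested activities in order while the
    battery constraint holds, always reserving charge for return-to-base."""
    plan, battery = [], model["state"]["battery_pct"]
    floor = model["constraints"]["min_battery_pct"]
    reserve = model["activities"]["return_to_base"]
    for act in requested_activities:
        cost = model["activities"][act]
        if battery - cost - reserve >= floor:  # constraint check
            plan.append(act)
            battery -= cost
    plan.append("return_to_base")
    return plan

# A request for three survey legs only fits one leg within the budget.
plan = plan_resource_use(AUV_MODEL, ["survey_leg"] * 3)
```

A real planner (e.g., ASPEN) searches and backtracks over such models rather than greedily filtering, but the input/output contract (resource request in, resource use plan out) is the same.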

The Plan Repository is a repository instance based on Data Management services for storing and managing observation plans and related information, such as parameterization possibilities, resource (re)configuration, and autonomous behaviors. Observation plans in the repository can serve as templates that are modified when event response behavior is executed.

The Resource Use Controller operates in the framework that the resource planner has determined. The resource use controller takes as input a resource use plan and a service agreement, and creates as output resource use plans on its local level of control, or specific resource commands to trigger state changes and activities in resources. In addition, the resource controller can defer the execution of resource use plans within a service agreement to a nested resource planner on a lower level, which itself can break down the plan to the next level of resources and make the respective resource agreements.

The Fault Monitor is a distinct component analyzing and overseeing resource status, providing fault analysis input to the resource use controller, which in turn might revise the plan within its local service agreement or return to the resource planner for replanning.

Work Products

The Work Products provided by this subsystem are:

Table 1 Work Products

ID Service (Release): Explanation

1.2.3.3 Planning and Prosecution (R3, R4): The subsystem responsible for providing the mission and campaign planning and prosecution (execution through completion) activities associated with carrying out simultaneous coordinated multi-objective observations across the resources of the observatory.

1.2.3.3.1 Interactive Observatory Facility Services (R4): Building on the capabilities of the Observatory Facility, provides the services to design, assemble, and operate configurations of resources from across the OOI into unique systems for planning, testing and prosecuting observation requests, leveraging the nested and autonomous capabilities of the fully integrated network of sensing, modeling, and control resources. Provides experimentalists with services to define, compose, and schedule multi-instrument observations that can execute across the observatory. As an example of a simple observation statement: on event "X" provide a CTD and a current profile of region "Y" using gliders "A, B, C" in configuration "Z" using behavior scenario "W".

1.2.3.3.2 Event Response Services (R3): Provides services for policy- and behavior-based reconfiguration of tasks and observational programs. Provides a nested communication, command, and control architecture that enables and supports the deployment and prosecution, fully autonomously or under operator control, of new missions, processes and behaviors, in parallel to and without interruption of prior platform objectives.

1.2.3.3.3 Portable Control Software (R3, R4): Provides a portable, platform-generic higher-level control software package based on the public-domain MOOS mission control software that can run natively on fixed observatory assets, and is available for download and implementation into platforms such as gliders and AUVs operated in the observatory. The software provides standard communication, command, and control connectivity with the overall OOI CI, and a standard NMEA interface to native control software on the platforms.

1.2.3.3.4 Mission Catalog and Repository Services (R3): Provides and maintains platform specifications, planning elements, and plan and behavior modules for a variety of multi-objective ocean observation missions, such as the capture of a coastal upwelling event. A representative set of plan and behavior modules that adhere to a full Boolean logic precondition language for generically-conditioned autonomy actions will be introduced.

1.2.3.3.5 Planning Services (R3, R4): Provides software tools and user interfaces for the scientist to define a set of states for each fixed or mobile node involved in a planned experiment or observation campaign, and to design the associated conditional state transitions, forming the basis for defining the behavior algebra (language) necessary to complete a predetermined, as well as autonomously adaptive, sensing task.

1.2.3.3.6 Mission Coordination Services (R4): Provides standard safety procedures protecting the fixed or mobile assets that could be damaged through improper use by inexperienced operators, such as collision control for multiple AUVs and assurance of depth limits for sensor packages.

1.2.3.3.7 Mission Simulator (R4): Provides a complete mission simulation capability for pre-deployment planning and testing of specific measurement campaigns. Seamlessly linked to the OOI Virtual Ocean Simulator, this enables comprehensive testing of predetermined as well as adaptive missions, such as the capture and measurement of a rapidly developing coastal front or a subsea volcanic eruption.

CIAD PP OV Autonomous System Control


Autonomous systems, such as AUVs, provide resource environments that operate under significant constraints and, to varying degrees, need to operate autonomously. In the case of intermittent, high-latency and low-bandwidth communications, a local smart executive providing local resource planning and behavior control is essential. The CI provides the respective services. See Figure 1.

Figure 1. Autonomous Control Illustration (OV-1)

Figure 2. 2880-00002 Autonomous Control Operational Nodes and Needlines (OV-2)

The Autonomous System Interface is a connection point to local resources, controllers and processes aboard an autonomous resource platform such as an AUV or a satellite-connected global mooring. It is a gateway to local processes that are not under direct control of the CI.

An Autonomous Controlled Process is any process representing a capability that can be interfaced with. In particular, this targets the MOOS communication middleware for interfacing with embedded autonomous vehicle control processes, including sensor adapters, navigation and behavior modules.

The Autonomous Behavior Controller represents a specific autonomous vehicle behavior module that can be controlled through the autonomous system interface. In particular, this targets MOOS-IvP Helm with its IvP-Solver that provides vehicle motion control.

Figure 3. 2880-00010 Autonomous Control Pattern (OV-2)

CIAD PP OV Instrument Interactivity

3.4.1.4 Instrument Interactivity

3.4.1.4.1 Instrument Control for Executing an Observation Plan

The Platform Agent receives observation plans from the Planning and Control Services and commands the Instrument Agents (for instruments associated with that particular deployment platform) to perform the actions (e.g., data acquisition) according to the plan. OOI has two classes of controlled resources: one class consists of autonomous, indirectly controlled resources (such as gliders and AUVs), and the other class consists of directly controlled and commanded online resources (such as sensors on the cabled network).

Figure 3.4.1.4.1-1 shows the execution of a plan for the case of a glider platform, whereas Figure 3.4.1.4.1-2 shows the much simpler case of a moored buoy platform. A glider is autonomous, and receives a plan to move to a particular location (e.g., 10ft along 080) and then to start the data acquisition process. The Mission Planner provides a plan fragment, the Platform Agent issues commands to Instrument Agents, and the Instrument Supervisor provides events (e.g., when it reached 10ft).

Figure 3.4.1.4.1-1 Executing an observation plan for an autonomous resource (OV-6)

For a controllable resource (Figure 3.4.1.4.1-2), the Platform Agent has different modes, and the Mission Planning and Control triggers mode changes at appropriate times (as part of the plan). For example, moored buoys are static and accept commands to change the acquisition mode.

Figure 3.4.1.4.1-2 Executing an observation plan for a direct-controlled resource (OV-6)

The previous two figures show the case where execution of the plan works well. However, there are instances where execution could fail and require rescheduling. Figure 3.4.1.4.1-3 shows the case of a glider that is approaching a collision with another unit. The Instrument Supervisor monitors the state of the glider and alerts the Platform Agent that it is about to collide. The Platform Agent re-plans the activities and sends a command to move the glider. When the Instrument Supervisor detects a safe state of operation, the Platform Agent resumes the normal observation plan.
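The alert-and-resume behavior in this scenario can be sketched as a small event loop. The event names and commands below are hypothetical, not actual Platform Agent or Instrument Supervisor interfaces:

```python
# Hypothetical Platform Agent loop: on a collision alert, suspend the
# observation plan and issue an avoidance command; resume once safe.
def run_platform_agent(events, plan):
    issued = []
    step = iter(plan)
    suspended = False
    for event in events:
        if event == "collision_alert":      # from the Instrument Supervisor
            suspended = True
            issued.append("cmd:evasive_maneuver")
        elif event == "safe_state":         # safe again: resume the plan
            suspended = False
        elif event == "tick" and not suspended:
            nxt = next(step, None)          # advance the normal plan
            if nxt:
                issued.append(f"cmd:{nxt}")
    return issued

commands = run_platform_agent(
    ["tick", "collision_alert", "tick", "safe_state", "tick"],
    ["move_to_station", "acquire_ctd"],
)
```

The plan step scheduled during the suspension is not lost; it is issued on the first tick after the safe state is restored, matching the "resumes the normal observation plan" behavior in the figure.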

Figure 3.4.1.4.1-3 Executing an observation plan for a glider and detecting a collision (OV-6)

CIAD PP OV Mission Execution

Figure 1. 2880-00006 Define Mission Plan (OV-6)

Figure 2. 2880-00007 Execute Mission Plan with Direct Commands (OV-6)

Figure 3. 2880-00008 Execute Mission Plan with Smart Executive (OV-6)

Figure 4. 2880-00009 Failure to Execute Mission Plan Autonomously (OV-6)

CIAD PP OV Resource Planning

Domain Models

Figure 3.3.5.3-1 shows the domain model specifying the dependencies between service requestors and service providers when negotiating a service agreement through a proposal process.

Both Service Requestor and Service Provider are specializations of a Negotiating Party. The goal of the negotiation involving exactly one representative of each party is a Contract defining a Service Agreement. This agreement covers Commitments for each party and defines Policy that is the result of the agreement.

The negotiation process involves submitting a series of Service Agreement Proposals, first issued by the service requestor and responded to by the potential future service provider. The two kinds of proposal differ, and both reference the Requested Resource and Activity together with Requested Parameters for the resource use. The Offered Constraints (Bid) define the range of conditions that apply to the proposal, for instance time windows for a resource use and the associated cost. The negotiation continues iteratively with refined proposals until an overlap in requested resources, parameters and offered constraints is found, or one of the parties decides to abort the negotiation (for example, if the conditions are not favorable to it). When an agreement is reached, a contract results.
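A toy version of this converging proposal exchange, under the simplifying assumption that each party's position is just a time window and that both parties concede slightly each round (the actual proposal schema covers resources, parameters and cost as well):

```python
# Illustrative negotiation sketch (hypothetical types, not CI interfaces).
def negotiate(requested, offered, max_rounds=10):
    """Each side holds a (start, end) time window; every round each party
    widens its window until the windows overlap (contract) or the round
    budget is exhausted (negotiation aborted)."""
    req_start, req_end = requested
    off_start, off_end = offered
    for _ in range(max_rounds):
        overlap = (max(req_start, off_start), min(req_end, off_end))
        if overlap[0] <= overlap[1]:
            return {"contract": overlap}   # agreement reached
        req_end += 1.0     # requestor concedes: accepts a later end
        off_start -= 1.0   # provider concedes: offers an earlier start
    return None            # either party walks away

# Requestor wants hours 0-4; provider initially offers hours 8-12.
contract = negotiate(requested=(0.0, 4.0), offered=(8.0, 12.0))
```

After three rounds of mutual concession the windows meet at hour 6 and a contract results; with `max_rounds=2` the same call would return `None`, modeling an aborted negotiation.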

Figure 3.3.5.3-1 Service Agreement Proposal Domain Model (OV-7)

Figure 3.3.5.3-2 shows the domain model for resource planning and control. The Resource Planner requires a Resource Constraint Model to satisfy specific resource planning requests. It acts as a constraint solver based on the resource constraint model for the input provided in a Resource Request.

Figure 3.3.5.3-2 Resource Planning and Control Domain Model (OV-7)

The Resource Constraint Model defines representations for State, Constraints, Resources, and Activities. Resources are representations of physical capability. Resources have a state that can be manipulated through activities. Constraints encode the rules and conditions that apply to manipulating a resource and to performing an activity. For instance, the battery state of an instrument can be represented as a resource, where the state represents the actual charge value. Possible activities are the recharge of the battery or switching off a consumer device. Constraints indicate that all non-essential consumer devices need to be switched off when the battery charge state falls below 10%. The battery charge activity needs to be stopped when the battery charge state reaches 100%. The flexibility enabled by the resource planner is directly dependent on the encoding of the real world environment and the resources within the model.
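The two battery rules in this paragraph could be encoded, for example, as simple state-triggered constraints. This is an illustrative encoding only; the CI's actual model representation language is not specified here:

```python
# Hypothetical constraint evaluation: given a resource state, return the
# activities that the constraints force (names are illustrative).
def apply_constraints(state):
    forced = []
    if state["battery_pct"] < 10.0:
        # Rule 1: below 10% charge, shed non-essential loads.
        forced.append("switch_off_nonessential_consumers")
    if state["charging"] and state["battery_pct"] >= 100.0:
        # Rule 2: stop charging once the battery is full.
        forced.append("stop_battery_charge")
    return forced

actions = apply_constraints({"battery_pct": 7.5, "charging": False})
```

As the paragraph notes, the planner's flexibility depends entirely on how richly such rules encode the real-world environment; this two-rule model is the simplest possible case.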

The resource planner develops a solution to a resource request subject to constraints. This process operates within existing Service Agreements specifying terms and conditions for resource use. The resource planner can negotiate further service agreements with Resource Providers for resource use in order to satisfy and optimize the resource request. Most of the time, these negotiations are refinements covered by higher-level envelope service agreements.

The resource planner develops Resource Use Plans that specify an orchestration (arrangement) of Activities applying to resources. Through the Resource Use Plan and the Service Agreement, the Resource Planner defines the frame of operation for the Resource Use Controller, which in turn issues Commands to Resources that can modify resource state and lead to changing environmental and resource conditions. Such changes in conditions might be covered by the resource use plan and the related service agreements. If they are not covered or a failure condition occurs, a re-evaluation of the plan, a subsequent resource negotiation or re-planning might be required.

The Resource Use Controller can choose to delegate a part of the plan to a nested resource planner that acts within the envelope set by the service agreement. Thereby, a resource use plan can break the problem solving into sub-problems. The same holds true for the execution of the plan and the handling of any changes or error conditions related to the plan.

Observation requests and observation plans are specific, prominent instances of resource requests and resource use plans.

Behavior Models

3.4.1.3 Resource planning

3.4.1.3.1 Defining an Observation Plan

Figure 3.4.1.3.1-1 shows the sequence diagram for a scientist defining an observation plan with the help of the Mission Planning and Control services from the Planning and Prosecution subsystem. The Cyberinfrastructure offers the capability to query for plans so that the scientist doesn't have to start from scratch: he/she can obtain an existing plan and adapt it to his/her own needs. The Planning and Control services are responsible for checking all the constraints and automatically suggesting what is possible. The feedback loop with the users continues as long as there are violations in the plan.

Figure 3.4.1.3.1-1 Scientist defining an observation plan (OV-6)

CIAD PP OV User Interfaces

Table 3.3.5.4-1 User Interfaces and User Application Support

ID User Interface: Supported User Applications and Purpose

PPUI1 Resource Constraint Model Definition Interface: Interface for the definition and modification of resource constraint models that are the basis for specific resource planners.

PPUI2 Interactive Resource Planning Interface: Enables the interactive planning of resource use and the optimization of activity plans through manual intervention. Accounts for the high degree of human knowledge and expertise required in planning and controlling observation missions and other resource use.

PPUI3 Resource Planner Integration Interface: Enables the integration of user-provided resource planning and constraint solving tools. The OOI provides a general-purpose resource planner that in particular can handle observation planning and process execution planning. For other resource use and planning purposes, specific tools can be adapted and integrated so that they can be used seamlessly in the Integrated Observatory environment.

CIAD PP SV Deployment Scenarios

4.2.4.3 Deployment Scenarios

A deployment scenario that will be used in the Observing System Simulation Experiment (OSSE) is shown as an overview in Figure 4.2.4.3-1 and in more detail regarding driver deployment in Figure 4.2.4.3-2. Coordination between the individual observatory elements will be done using the ASPEN/CASPER mission-planning tool. ASPEN is intended to run at a shore-based control station in batch mode. CASPER is the embedded version of the mission-planning tool that can be placed on mobile assets such as AUVs and gliders in order to support autonomous operations.

The ASPEN system will enable shore-based mission planning, producing an orchestration of activities that involve the resources to be controlled. The CI will also provide services that encapsulate CASPER for continuous mission evaluation and replanning on remote assets, based on input from the shore-side ASPEN mission plans that are provided whenever a shore connection is established. CASPER runs continuously to keep track of temporary objective inconsistencies and risks, or changing environmental and control conditions.
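Continuous replanning of this kind can be pictured as incremental plan repair: rather than replanning from scratch, the current plan is adjusted as state updates arrive. The sketch below, with an assumed per-activity battery cost, sheds trailing activities until the remaining plan fits the available budget (illustrative logic only, not the CASPER algorithm):

```python
# Hypothetical plan-repair step: each activity is (name, battery_cost_pct).
def repair_plan(plan, battery_pct):
    """Drop activities from the tail (assumed lowest priority) until the
    remaining plan fits the current battery budget."""
    total = sum(cost for _, cost in plan)
    while plan and total > battery_pct:
        name, cost = plan.pop()   # shed the last activity
        total -= cost
    return plan

# The shore plan assumed a full battery; in situ only 65% remains.
plan = [("transit", 20.0), ("survey", 40.0), ("extra_leg", 30.0)]
repaired = repair_plan(plan, battery_pct=65.0)
```

In an embedded setting this repair step would run on every significant state update, which is what lets the asset keep operating between intermittent shore connections.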

CASPER can complement autonomous vehicle navigation and decision making using the MOOS-IvP Helm system. MOOS-IvP is an existing MOOS application for autonomous behavior control of autonomous assets, with many configurable behaviors available. It can be controlled by setting state variables through the MOOSDB that result from ASPEN/CASPER mission plans. The MOOSDB provides the communication conduit between shore-based, external-vehicle and on-vehicle mission planning and control processes. The CI will provide an adapter to the MOOSDB.
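Conceptually, the MOOSDB is a publish/subscribe store of named state variables. The stand-in below is pure Python for illustration only; the real MOOSDB is accessed through the C++ MOOS client library (Python bindings are available via the pymoos project):

```python
# Pure-Python stand-in for the MOOSDB conduit (NOT the real MOOS API).
class MockMOOSDB:
    def __init__(self):
        self.vars, self.subs = {}, {}

    def subscribe(self, name, callback):
        """Register interest in a variable (helm-side in this sketch)."""
        self.subs.setdefault(name, []).append(callback)

    def notify(self, name, value):
        """Post a state variable and deliver it to subscribers, e.g. a
        planner posting a variable that IvP Helm behaviors react to."""
        self.vars[name] = value
        for cb in self.subs.get(name, []):
            cb(value)

db = MockMOOSDB()
received = []
db.subscribe("DEPLOY", received.append)   # helm subscribes to DEPLOY
db.notify("DEPLOY", "true")               # planner-side adapter posts it
```

The decoupling shown here is the point of the conduit: the planner never calls the helm directly, so shore-side, external-vehicle and on-vehicle processes can all drive the same variables.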

On gliders and AUVs, a CyberPoP will be deployed to perform sensor data acquisition, buffering, processing for environmental condition detection and local (embedded) mission planning and control. The CI directs these MOOS-enabled systems to adaptively sample within their regional network and feed the data back into the OOI mission planning system.

The back-seat driver paradigm will be used to implement the CASPER MOOS-IvP high-level mission planning and autonomy system. All platform-specific control, such as safety, actuator control, and navigation, will be kept separate from high-level mission planning. The CASPER/MOOS-IvP combination will generate the desired depth, speed, and heading for a mobile asset based on the mission at hand (sent by CASPER) and navigation feedback. However, there can be platform-specific behaviors that will be active in the MOOS-IvP Helm. For example, when an AUV is towing an acoustic array, it is necessary to include vehicle behaviors such as a MemoryTurnLimit behavior that will avoid any maneuvers that might damage the array. In other words, MOOS-IvP will take these vehicle-specific behaviors into account when generating the desired waypoints. Furthermore, the IvP Helm behaviors can be dynamically configured via a mission controller, human communication, or inter-vehicle communication, or the behaviors themselves can adjust their parameters based on sensor input.

Figure 4.2.4.3-2 shows the deployment of software drivers for the mission planning and control at the shore station and on the autonomous vehicle. In addition, it shows the integration with oceanographic models in ROMS. The modeling component will have the ability to post data to the mission planning and control as well as to post event notices. This capability enables the autonomous event-driven response.

Figure 4.2.4.3-1 OSSE Deployment Abstract (SV-1)

Figure 4.2.4.3-2 OSSE Deployment Scenario (SV-1)

CIAD PP SV Technology Mapping

Technology Mapping

Table 4.2.4.2-1 lists the technologies and standards used for the implementation of this subsystem. An integration strategy is provided for each technology and standard. For further details, refer to Section 5.1.

Table 4.2.4.2-1 Technology Mapping

Service (Technology): Integration Strategy

Resource Planner (ASPEN): Integral part of the CI design as resource planner; an interface will be provided, however, so that other resource planners can be integrated similarly for different user purposes.

Controller (CASPER)

Autonomous Comm Middleware (MOOS): The CI will deliver this as functionality. It is included in the design as one layer of several for autonomous control.

Autonomous Helm (MOOS-IvP): The CI will deliver this as functionality. It is included in the design as one layer of several for autonomous control.

The deployment of the technical components of the Planning and Prosecution subsystem is predicated on the ESB implementation that is the basis of the capability container concept of the COI. This allows the Planning and Prosecution subsystem to have a federated presence across the CI. In particular, the management of state, execution scheduling and orchestration of taskable resources will be provisioned as state management and orchestration/process execution plug-ins of the ESB. Further techniques and implementation technologies on which the Planning and Prosecution subsystem is predicated include:

- Interval programming (IvP), a unique, new mathematical model for representing and solving multi-objective optimization problems for reconciling vehicle behaviors active during a mission;
- MOOS (Mission Oriented Operating Suite), an open source middleware for connecting software components on an autonomous platform; and
- IvP Helm, a behavior-based autonomy package using multi-objective optimization for behavior reconciliation, with a full Boolean logic behavior calculus and an interface to the MOOS middleware.
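The multi-objective reconciliation idea behind IvP can be sketched in miniature: each active behavior contributes a utility function over the candidate decisions, and the solver picks the decision maximizing the priority-weighted sum. This is a toy stand-in; the real IvP Helm solves over piecewise-defined objective functions on a discretized heading/speed/depth space:

```python
# Toy IvP-style behavior reconciliation over discrete candidate headings.
def reconcile(candidates, behaviors):
    """behaviors: list of (priority_weight, utility_fn) pairs.
    Return the candidate maximizing the weighted sum of utilities."""
    def total(c):
        return sum(w * fn(c) for w, fn in behaviors)
    return max(candidates, key=total)

headings = [0, 90, 180, 270]
# A waypoint behavior prefers heading east; a higher-priority collision
# avoidance behavior penalizes exactly that heading.
waypoint  = (2.0, lambda h: 1.0 if h == 90 else 0.0)
avoidance = (5.0, lambda h: 0.0 if h == 90 else 1.0)
best = reconcile(headings, [waypoint, avoidance])  # avoidance wins
```

The example shows the "effective compromise" property claimed above: the waypoint behavior is not vetoed outright; it simply loses the weighted vote for this one decision cycle and regains influence once the avoidance behavior relaxes.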

CIAD PP TV ASPEN-CASPER

5.1.2 ASPEN and CASPER

Typically, sensor planning draws more from measurement selection or adaptive sampling, where one tries to select measurements from a set of possible measurements to maximize information gain or to minimize entropy/uncertainty in a given model. For an oceanographic application, this might be to minimize uncertainty in some physical or biological phenomenon, such as minimizing the squared uncertainty of the boundary of a thermal front, a current, an algal bloom, etc. Because the power set of potential measurements grows exponentially in the (large) number of candidate measurements, various greedy or approximation methods are used. In the applications that we typically target, mission planning constraints are also central. This means that not only do we have to pick a "good" set of observations, we need to be able to generate an acquisition plan that makes those acquisitions and respects operations constraints (does not exceed power, data volume, etc.) and the laws of physics (e.g., travel times). These problems are generally solved iteratively in parallel by refining the observations (to improve expected knowledge gain) and the supporting activities (to make a plan executable/legal). For our first-pass implementation we will cover observation plans that utilize campaign metaphors such as transects, grid-based mapping of regions, etc. This approach will give us an adequate solution and enable us to get feedback on the planning capabilities that will be best for the end user group. We will also be defining generic APIs for path planners, resource modelers, and other specialized reasoning modules, as these will vary from adaptation to adaptation.
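The greedy approach mentioned above can be sketched as repeatedly taking the measurement with the best uncertainty reduction per unit cost until a budget is exhausted. The numbers and names below are an illustrative toy model, not the planned OOI implementation:

```python
# Greedy measurement selection: measurements maps a name to
# (uncertainty_reduction, cost); budget caps total cost.
def greedy_select(measurements, budget):
    chosen, spent = [], 0.0
    remaining = dict(measurements)
    while remaining:
        # Pick the best gain-per-cost candidate still unconsidered.
        name = max(remaining, key=lambda m: remaining[m][0] / remaining[m][1])
        gain, cost = remaining.pop(name)
        if spent + cost <= budget:   # respect the operations constraint
            chosen.append(name)
            spent += cost
    return chosen

picked = greedy_select(
    {"transect_A": (9.0, 3.0), "transect_B": (4.0, 1.0), "grid_C": (10.0, 6.0)},
    budget=4.0,
)
```

Note that the highest-gain option (grid_C) is never chosen: its gain-per-cost ratio is poor and it would blow the budget, which is exactly the kind of trade-off that makes exhaustive power-set evaluation unnecessary in practice.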

ASPEN and CASPER are both at TRL 9 from the standpoint of operational usage for space autonomy and ground-space sensorweb applications. ASPEN has been used operationally in ground-based settings on a number of space missions, including MAMM (Smith et al. 2002), EO-1 (Chien et al. 2005a), Three Corner Sat, and Orbital Express (Chouinard et al. 2008). CASPER has been used operationally onboard the EO-1 (Chien et al. 2005b) and Three Corner Sat missions. ASPEN is being used operationally for space-ground sensorweb applications, including linkages to numerous in situ sensing networks (Chien et al. 2008).

From the standpoint of application to oceanographic sensorwebs, scenarios have been worked out and modeled but not demonstrated in software; the current TRL is 4, with the planned OSSEs intended to take the shore planning component to TRL 7-8.

References

S. Chien, B. Cichy, A. Davies, D. Tran, G. Rabideau, R. Castano, R. Sherwood, D. Mandl, S. Frye, S. Shulman, J. Jones, S. Grosvenor, "An Autonomous Earth Observing Sensorweb," IEEE Intelligent Systems, May-June 2005, pp. 16-24.

S. Chien, R. Sherwood, D. Tran, B. Cichy, G. Rabideau, R. Castano, A. Davies, D. Mandl, S. Frye, B. Trout, S. Shulman, D. Boyer, "Using Autonomy Flight Software to Improve Science Return on Earth Observing One," Journal of Aerospace Computing, Information, and Communication, April 2005, AIAA.

S. Chien, D. Tran, M. Johnston, A. Davies, R. Castano, G. Rabideau, B. Cichy, J. Doubleday, D. Pieri, L. Scarenbroich, S. Kedar, Y. Chao, D. Mandl, S. Frye, W. Song, P. Kyle, R. LaHusen, P. Cappelaere, "Lights Out Operations of a Space-Ground Sensorweb," Space Operations 2008, Heidelberg, Germany, 2008, AIAA Press.

C. Chouinard, R. Knight, G. Jones, D. Tran, D. Koblick, "Automated and Adaptive Mission Planning for Orbital Express," Space Operations 2008, Heidelberg, Germany, 2008.

B.D. Smith, B.E. Engelhardt, and D.H. Mutz, "The RADARSAT-MAMM Automated Mission Planner," AI Magazine, vol. 23, no. 2, 2002, pp. 25-36.

CIAD APP DoDAF Reference

DoDAF v1.5 Architecture Views and Products

The Department of Defense (DoD) Architecture Framework (DoDAF), Version 1.5 (References: DoDAF), defines a common approach for DoD architecture description development, presentation, and integration for both operations and processes.

Views and Products Developed

Figure 1. DoDAF views, comprising All Views (AV), Operational Views (OV), Systems and Services Views (SV), and Technical Standards Views (TV)

The following tables present a coarse overview of the content of each of the views defined by DoDAF:

Table 1. Definition of DoDAF Architecture Framework Views

View Definition

All-Views (AV): There are some overarching aspects of an architecture that relate to all three views. These overarching aspects are captured in the All-Views (AV) products. The AV products give information pertinent to the entire architecture but do not represent a distinct view of it. AV products set the scope and context of the architecture. The scope includes the subject area and time frame. The setting in which the architecture exists comprises the interrelated conditions that compose the context for the architecture. These conditions include doctrine; tactics, techniques, and procedures; relevant goals and vision statements; concepts of operations (ConOps); scenarios; and environmental conditions.

Operational View (OV): The OV is a description of the tasks and activities, operational elements, and information exchanges required to accomplish DoD business processes. The OV contains graphical and textual products that comprise an identification of the operational nodes and elements, assigned tasks and activities, and information flows required between nodes. It defines the types of information exchanged, the frequency of exchange, which tasks and activities are supported by the information exchanges, and the nature of information exchanges.

Systems and Services View (SV): The SV is a set of graphical and textual products that describes systems and interconnections providing for, or supporting, DoD functions, including business functions. The SV associates system resources with the OV. These system resources support the operational activities and facilitate the exchange of information among operational nodes.

Technical Standards View (TV): The TV is the minimal set of rules governing the arrangement, interaction, and interdependence of system parts or elements. Its purpose is to ensure that a system satisfies a specified set of operational requirements. The TV provides the technical systems implementation guidelines upon which engineering specifications are based, common building blocks are established, and product lines are developed. The TV includes a collection of the technical standards, implementation conventions, standards options, rules, and criteria organized into profile(s) that govern systems and system elements for a given architecture.

Architecture Products

Table 2 lists the products defined by DoDAF version 1.5 together with an expected content summary.

Table 2. Architecture Products defined in the DoD Architecture Framework

Product Name Summary

AV-1 Overview and Summary Information: The Overview and Summary Information provides executive-level summary information in a consistent form that allows quick reference and comparison among architectures. AV-1 includes assumptions, constraints, and limitations that may affect high-level decision processes involving the architecture.

AV-2 Integrated Dictionary: The Integrated Dictionary contains definitions of terms used in the given architecture. It consists of textual definitions in the form of a glossary, a repository of architecture data, their taxonomies, and their metadata (i.e., data about architecture data), including metadata for tailored products, associated with the architecture products developed. Metadata are the architecture data types, possibly expressed in the form of a physical schema. In this document, architecture data types are referred to as architecture data elements.

OV-1 High-Level Operational Concept Graphic: The High-Level Operational Concept Graphic describes a mission and highlights main operational nodes (see OV-2 definition) and interesting or unique aspects of operations. It provides a description of the interactions between the subject architecture and its environment, and between the architecture and external systems. A textual description accompanying the graphic is crucial. Graphics alone are not sufficient for capturing the necessary architecture data.

OV-2 Operational Node Connectivity Description: The Operational Node Connectivity Description graphically depicts the operational nodes (or organizations) with needlines between those nodes that indicate a need to exchange information. The graphic includes internal operational nodes (internal to the architecture) as well as external nodes.

OV-3 Operational Information Exchange Matrix: The Operational Information Exchange Matrix details information exchanges and identifies "who exchanges what information, with whom, why the information is necessary, and how the information exchange must occur". There is not a one-to-one mapping of OV-3 information exchanges to OV-2 needlines; rather, many individual information exchanges may be associated with one needline.

OV-4 Organizational Relationships Chart: The Organizational Relationships Chart illustrates the command structure or relationships (as opposed to relationships with respect to a business process flow) among human roles, organizations, or organization types that are the key players in an architecture.

OV-5 Operational Activity Model: The Operational Activity Model describes the operations that are normally conducted in the course of achieving a mission or a business goal. It describes capabilities, operational activities (or tasks), input and output (I/O) flows between activities, and I/O flows to/from activities that are outside the scope of the architecture. High-level operational activities should trace to (are decompositions of) a Business Area, an Internal Line of Business, and/or a Business Sub-Function as published in OMB's Business Reference Model.

OV-6a Operational Rules Model: The Operational Rules Model specifies operational or business rules that are constraints on an enterprise, a mission, an operation, a business, or an architecture.

OV-6b Operational State Transition Description: The Operational State Transition Description is a graphical method of describing how an operational node or activity responds to various events by changing its state. The diagram represents the sets of events to which the architecture will respond (by taking an action to move to a new state) as a function of its current state. Each transition specifies an event and an action.

OV-6c Operational Event-Trace Description: The Operational Event-Trace Description provides a time-ordered examination of the information exchanges between participating operational nodes as a result of a particular scenario. Each event-trace diagram should have an accompanying description that defines the particular scenario or situation.

OV-7 Logical Data Model: The Logical Data Model describes the structure of an architecture domain's system data types and the structural business process rules (defined in the architecture's Operational View) that govern the system data. It provides a definition of architecture domain data types, their attributes or characteristics, and their interrelationships.

SV-1 Systems Interface Description: The Systems Interface Description depicts systems nodes and the systems resident at these nodes to support organization/human roles represented by operational nodes of the Operational Node Connectivity Description (OV-2). SV-1 also identifies the interfaces between systems and system nodes.

SV-2 Systems Communications Description: The Systems Communications Description depicts pertinent information about communications systems, communications links, and communications networks. SV-2 documents the kinds of communications media that support the systems and implement their interfaces as described in SV-1. Thus, SV-2 shows the communications details of SV-1 interfaces that automate aspects of the needlines represented in OV-2.

SV-3 Systems-Systems Matrix: The Systems-Systems Matrix provides detail on the interface characteristics described in SV-1 for the architecture, arranged in a matrix form.

SV-4 Systems Functionality Description: The Systems Functionality Description documents system functional hierarchies and system functions, and the system data flows between them. Although there is a correlation between the Operational Activity Model (OV-5) or business-process hierarchies and the system functional hierarchy of SV-4, it need not be a one-to-one mapping; hence the need for the Operational Activity to Systems Function Traceability Matrix (SV-5), which provides that mapping.

SV-5 Operational Activities to System Functionalities Traceability Matrix: The Operational Activity to Systems Function Traceability Matrix is a specification of the relationships between the set of operational activities applicable to an architecture and the set of system functions applicable to that architecture.

SV-6 Systems Data Exchange Matrix: The Systems Data Exchange Matrix specifies the characteristics of the system data exchanged between systems. This product focuses on automated information exchanges (from OV-3) that are implemented in systems. Non-automated information exchanges, such as verbal orders, are captured in the OV products only.

SV-7 Systems Performance Parameters Matrix: The Systems Performance Parameters Matrix specifies the quantitative characteristics of systems and system hardware/software items, their interfaces (system data carried by the interface as well as communications link details that implement the interface), and their functions. It specifies the current performance parameters of each system, interface, or system function, and the expected or required performance parameters at specified times in the future.

SV-8 Systems Evolution Description: The Systems Evolution Description captures evolution plans that describe how the system, or the architecture in which the system is embedded, will evolve over a lengthy period of time. Generally, the timeline milestones are critical for a successful understanding of the evolution timeline.

SV-9 Systems Technology Forecast: The Systems Technology Forecast defines the underlying current and expected supporting technologies. It is not expected to include predictions of technologies as with a crystal ball. Expected supporting technologies are those that can be reasonably forecast given the current state of technology and expected improvements. New technologies should be tied to specific time periods, which can correlate against the time periods used in SV-8 milestones.

SV-10a Systems Rules Model: Systems rules are constraints on an architecture, on a system(s) or system hardware/software item(s), and/or on a system function(s). While other SV products (e.g., SV-1, SV-2, SV-4, SV-11) describe the static structure of the Systems View (i.e., what the systems can do), they do not describe, for the most part, what the systems must do, or what they cannot do.

SV-10b Systems State Transition Description: The Systems State Transition Description is a graphical method of describing a system (or system function) response to various events by changing its state. The diagram basically represents the sets of events to which the systems in the architecture will respond (by taking an action to move to a new state) as a function of its current state. Each transition specifies an event and an action.

SV-10c Systems Event-Trace Description: The Systems Event-Trace Description provides a time-ordered examination of the system data elements exchanged between participating systems (external and internal), system functions, or human roles as a result of a particular scenario. Each event-trace diagram should have an accompanying description that defines the particular scenario or situation. SV-10c in the Systems View may reflect system-specific aspects or refinements of critical sequences of events described in the Operational View.

SV-11 Physical Schema: The Physical Schema product is one of the architecture products closest to actual system design in the Framework. The product defines the structure of the various kinds of system data that are utilized by the systems in the architecture.

TV-1 Technical Standards Profile: The Technical Standards Profile collects the various systems standards rules that implement and sometimes constrain the choices that can be made in the design and implementation of an architecture.

TV-2 Technical Standards Forecast: The Technical Standards Forecast contains expected changes in technology-related standards and conventions, which are documented in the TV-1 product. The forecast for evolutionary changes in the standards should be correlated against the time periods as mentioned in the SV-8 and SV-9 products.

CIAD APP UML Reference

Introduction to Class Diagrams

This architecture document uses class diagrams to describe logical data models and domain models of the OOI CI system. This section explains the meaning of the graphical notation that will be used here. This notation is based on UML class diagram notation.

Class

A class picture (Figure 1) represents a type (class) of a system entity; it can be a logical or a physical entity. In the real system there might be multiple instances of one class, and all of these instances share the characteristics of the class. An example of a class is "Sensor". Instances are then all the different sensors in the system: "CTD_Pioneer_1", "Seismometer_3", "Hydrophone_2".

Figure 1. UML Class

Package

A package is a container that organizes elements in the domain model. One package can contain other packages (sub-packages) and entities (Classes, Associations, and Generalizations). Typically, packages are created to group entities by topic and structure them hierarchically.

Figure 2. UML Package

Block

Blocks (or colored boxes) group entities with similar characteristics. They include a textual description and stress a component view of the system. An entity can be part of various blocks in different views of the system, but an entity is part of only one package.

Figure 3. UML Block

Association

An association is represented by a solid line between two classes. It establishes a relationship between these two entities. It can be directed, indicated by an arrow, or undirected (Figure 4).

Figure 4. UML Undirected Association (Top), Directed Association (Middle), Named, directed association with multiplicities (Bottom)

Associations can have names, indicated by labels along the solid line, and a multiplicity on both ends of the line (Figure 4, bottom). In the case of a directed association, the name is interpreted in the direction of the arrow. The multiplicity of an association determines the number of instances of the given entity at each end of the line. In Figure 4, for example, exactly one instance of Class1 refers to 0 or more instances of Class2. Possible values of multiplicity are:

1 Exactly one instance is required.
0..1 At most one instance is allowed.
1..* One or more instances are required.
* An arbitrary number of instances is allowed.

Aggregation

One particular type of association is the aggregation association (Figure 5). It is a directed association with a white diamond on one side and an arrow on the other. The arrow can be omitted. It denotes that the class next to the arrow is part of the class next to the diamond. In most cases, this expresses a "has-a" relation.

Figure 5. UML Aggregation association

Generalization

Generalization is a graphical notation used to describe hierarchical types. The meaning of Figure 6 is that Class1 is a specialization of Class2. Therefore Class1 inherits all the relationships and attributes of Class2, and an instance of Class1 can be used where one of Class2 is required. In words, this expresses an "is-a" relation: Class1 is a special case of Class2.

Figure 6. UML Generalization
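To make the association, aggregation, and generalization notations concrete, the following sketch shows how they typically map to code. It reuses the "Sensor" example from above, but the Platform class, the method names, and the idea that a platform aggregates sensors are illustrative assumptions, not part of the OOI domain model.

```python
# Illustrative mapping of UML relations to code; class and method
# names other than Sensor are hypothetical.

class Sensor:
    def describe(self):
        return "generic sensor"

class CTDSensor(Sensor):
    """Generalization ("is-a"): CTDSensor specializes Sensor and can be
    used wherever a Sensor is required (substitutability)."""
    def describe(self):
        return "CTD sensor"

class Platform:
    """Aggregation ("has-a"): a Platform aggregates Sensors."""
    def __init__(self):
        self.sensors = []          # multiplicity 0..* at the Sensor end

    def attach(self, sensor):      # directed association Platform -> Sensor
        self.sensors.append(sensor)

p = Platform()
p.attach(CTDSensor())              # a CTDSensor is-a Sensor, so this is legal
```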

Introduction to Message Sequence Charts

Message Sequence Charts (MSCs) provide a rich graphical notation for capturing interaction patterns. MSCs have emerged as a means for specifying communication protocols in telecommunication systems. They have also found their way into the new UML 2.0 standard, which significantly improves the role of interaction models within the UML.

We use the MSC notation to describe the interaction patterns defining services. MSCs come in two flavors (basic and High-Level MSCs) and have a number of operators. In this appendix, we briefly introduce the notations of MSCs used in the diagrams presented in this document, namely the basic MSCs and the LOOP, PAR, ALT operators.

Basic MSCs consist of a set of axes, each labeled with the name of a system entity. An axis represents a certain segment of the behavior displayed by the entity it references. Arrows in MSCs denote communication. An arrow starts at the axis of the sender; the axis at which the head of the arrow ends designates the recipient. Intuitively, the order in which the arrows occur (from top to bottom) within an MSC defines possible sequences of interactions among the depicted entities.

We exemplify the MSC constructs by showing different interaction patterns between a client and a server. In the MSC from Figure 7, the client enters the start state, sends a request message to the server, receives a response message from the server, and enters the done state, in the order specified from top to bottom. States are represented in MSCs as labeled hexagons.

Figure 7. MSC basic construct

The MSC from Figure 8 uses a LOOP construct to represent the repetition of an interaction pattern: the client repeatedly sends request to and receives response from the server in that order. The asterisk at the top left corner of the LOOP box indicates there can be any finite number of repetitions.

Figure 8. MSC with LOOP operator

The MSC in Figure 9 shows the use of an ALT construct to indicate alternative paths in an interaction pattern: the client performs the same sequence of operations as in Figure 7, except that the client may receive an alternative message, reject, from the server rather than response, depending on the server's decision.

The MSC from Figure 10 shows the PAR operator. After the server receives request, it sends messages response_1 and response_2 in parallel: the order in which the messages response_1 and response_2 occur is left unspecified; both must arrive before the client enters the done state.

Figure 9. MSC with ALT operator

Figure 10. MSC with PAR operator
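The LOOP, ALT, and PAR behaviors described above can be rendered in executable form. The following sketch simulates them with asyncio coroutines; the client/server pair, message strings, and the "bad" request trigger for the reject branch are hypothetical stand-ins, not OOI service code.

```python
# Illustrative simulation of the MSC LOOP, ALT, and PAR operators
# using asyncio coroutines; all names and messages are hypothetical.
import asyncio

async def server(request):
    # ALT: the server answers either "response" or "reject",
    # depending on its own decision about the request.
    return "reject" if request == "bad" else "response"

async def client_loop(n):
    # LOOP: repeat the request/response exchange a finite number of times.
    replies = []
    for _ in range(n):
        replies.append(await server("ok"))
    return replies

async def client_par():
    # PAR: response_1 and response_2 may arrive in either order, but
    # both must be in before the client enters the "done" state.
    r1, r2 = await asyncio.gather(server("ok"), server("ok"))
    return "done", (r1, r2)

print(asyncio.run(client_loop(3)))   # LOOP: three request/response exchanges
print(asyncio.run(client_par()))     # PAR: both responses collected, then done
```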

CIAD APP References

Appendix B. References

Management Plans

Reference Citation Location

CIMP-PEP OOI CI Project Execution Plan. Most recent released, on Alfresco. 2010-00001_PEP_CI.pdf

CIMP-QA OOI CI QA-QC Plan. Most recent released, on Alfresco. 2010-00002_QA_QC_Plan_CI.pdf

CIMP-RM OOI CI Risk Management Plan. Most recent released, on Alfresco. 2010-00003_Risk_Management_Plan_CI.pdf

CIMP-AQS OOI CI Acquisition Strategy. Most recent released, on Alfresco. 2010-00004_Acquisition_Strategy_CI.pdf

CIMP-PM OOI CI Property Management Plan. Most recent released, on Alfresco. 2010-00005_Property_Management_Plan_CI.pdf

CIMP-OM OOI CI Operations and Maintenance Plan. Most recent released, on Alfresco. 2010-00006_Operations_and_Maintenance_Plan_CI.pdf

CIMP-MAINT OOI CI Maintenance Strategy. Most recent released, on Alfresco. 2010-00007_Maintenance_Strategy_CI.pdf

CIMP-EHS OOI CI Environmental Health and Safety Plan. Most recent released, on Alfresco. 2010-00008_Environmental_Health_and_Safety_Plan_CI.pdf

CIMP-CMP OOI CI Configuration Management Plan. Most recent released, on Alfresco. 2110-00001_CMP_CI.pdf

CIMP-TTO OOI CI Transition to Operations Plan. Most recent released, on Alfresco. 2110-00002_Transition_to_Operations_Plan_CI.pdf

CIMP-COMM OOI CI Commissioning Plan. Most recent released, on Alfresco. 2110-00003_Commissioning_Plan_CI.pdf

CIMP-ITS OOI CI Integration Test Strategy. Most recent released, on Alfresco. 2110-00004_Int_Test_Strategy_CI.pdf

CIMP-ITV OOI CI Integration, Test and Verification (ITV) Plan. Most recent released, on Alfresco. 2110-00005_ITV_Plan_CI.pdf

Specifications

Reference Citation Location

SCIPROSP OOI Science Prospectus, OOI Program, 2007.

CI-IRD OOI CI User, System and Subsystem Requirements. DOORS Requirements Database Export. OOI CI.

CI-SPECS OOI CI System Architecture Drawings. Enterprise Architect Export (HTML). OOI CI, available on Alfresco.

CI-COP1 OOI CI Concepts of Operations. Science User Operational Concepts ("Dr. Chu"). 2115-00002_Science_User_OpCon_CI.pdf

CI-COP2 OOI CI Concepts of Operations. A Day In the Life of an Instrument. 2115-00001_Instrument_Life_Cycle_OpCon_CI.pdf

CI-COP3 OOI CI Concepts of Operations. A Day In the Life of Data. 2115-00017_Data_LifeCycle_ConOps_CI.pdf

IOCI-AD Integrated Observatory Cyberinfrastructure Architecture and Design Document (this document). OOI CI ADT, current version, 2011. Most recent approved; R1 LCO Version; R1 LCA Version.

Workshop Reports and Whitepapers

Reference Citation Location

CI-DCOI OOI CI Design Workshop, Common Operating Infrastructure. OOI CI ADT. Workshop page: COI FDR Kickoff Workshop

CI-DDM OOI CI Design Workshop, Data Management. OOI CI ADT. Workshop page: ASWS-DM

CI-DSA OOI CI Design Workshop, Sensing and Acquisition and Instrument Integration. OOI CI ADT. Workshop page: ASWS-SA

CI-PAD OOI CI Architecture Document, PDR Final version, 16-Nov-2007.

CI-PERS OOI CI User Persona Model, OOI CI ADT, Final version 1-00, 28-Oct-2008. 2115-00010_User_Persona_Model_Whitepaper_CI.pdf

CI-PROPOSAL Network for Ocean Research, Interaction and Application (NORIA) Proposal, 22-Dec-2006.

CI-RDPG OOI CI Requirements Elicitation Workshop Report, Data Product Generation, OOI CI ADT, Final version 1.0, 16-Oct-2008. 2115-00007_ReqWS4_DPG_CI.pdf; workshop page: RWS-DPG

CI-REPE OOI CI Requirements Elicitation Workshop Report, Education and Public Engagement, OOI CI ADT, Final version 1.0, 16-Oct-2008. 2115-00009_ReqWS6_EPE_CI.pdf; workshop page: RWS-EPE

CI-RIOM OOI CI Requirements Elicitation Workshop Report, Integrated Observatory Management, OOI CI ADT, Final version 1.0, 16-Oct-2008. 2115-00008_ReqWS5_IOM_CI.pdf; workshop page: RWS-IOM

CI-ROOP OOI CI Requirements Elicitation Workshop Report, Ocean Observing Programs, OOI CI ADT, Final version 1.0, 16-Oct-2008. 2115-00006_ReqWS3_OOP_CI.pdf; workshop page: RWS-OOP

CI-RUA OOI CI Requirements Elicitation Workshop Report, User Applications, OOI CI ADT, Final version 1.0, 28-Oct-2008. 2115-00003_DesignWS_User_Applications_CI.pdf; workshop page: ASWS-UA

CI-RWS1 OOI CI First Science User Requirements Elicitation Workshop Report, OOI CI, Final version 1.0, 08-Nov-2007. 2115-00004_ReqWS1_OceanModeling_CI.pdf; workshop page: First Numerical Modeling Requirements Workshop

CI-RWS2 OOI CI Second Science User Requirements Elicitation Workshop Report, OOI CI, Final version 1.0, 09-May-2008. 2115-00005_ReqWS2_OceanModeling_CI.pdf; workshop page: Second Numerical Modeling Requirements Workshop

Historic References (Outdated)

Reference Citation Location

CI-CARCH CI Conceptual Architecture with initial requirements. http://www.orionprogram.org/organization/committees/ciarch

IOA-AD Integrated Observatory Applications Architecture Document, OOI CI ADT, FDR Final Version 1-00, 28-Oct-2008. 2130-00001_Integrated_Observatory_Applications_AD_CI.pdf

IOI-AD Integrated Observatory Infrastructure Architecture Document, OOI CI ADT, FDR Final Version 1-00, 28-Oct-2008. 2130-00002_Integrated_Observatory_Infrastructure_AD_CI.pdf

External References

Reference Citation Location

DoDAF DoD Architecture Framework v1.5, Washington, D.C.: Department of Defense, April 2007. http://www.defenselink.mil/cio-nii/docs/DoDAF_Volume_I.pdf

TRL Mankins, J.C., Technology Readiness Levels. White Paper, April 1996. http://ipao.larc.nasa.gov/Toolkit/TRL.pdf

See also OOI CI Public Document Repository
