SODI Implementation and support of standardised data

THE USE OF SDMX STANDARDS FOR SUPPORTING REFERENCE METADATA INTERCHANGE WITHIN THE SODI PROJECT

Last updated on 20 December 2006

SDMX Reference Metadata Interchange*

Contents

1. INTRODUCTION
2. PROJECT OBJECTIVES
3. BACKGROUND
4. A MODEL FOR METADATA EXCHANGE
5. CONTENT-ORIENTED GUIDELINES
6. REFERENCE METADATA REPORTING
7. SYSTEM ARCHITECTURE
8. STANDARDISATION OF METADATA CONCEPTS
9. SDMX METADATA INTERCHANGE

* Document prepared by Marco Pellegrino, Eurostat, Directorate B: Statistical Methods and Tools, Dissemination Unit B4, Reference Databases. Contact: [email protected]

The views expressed are those of the writer and may not in any circumstances be considered as stating an official position of the European Commission (Eurostat).

1. INTRODUCTION

This paper provides an overview of the project “SDMX-compliant Metadata for use within Eurostat's Web Site”, executed under the framework contract “Implementation and support for standardised data formats for statistical data”. The project is embedded within the SODI project1 (SDMX Open Data Interchange) sponsored by Eurostat and was discussed during the latest meeting of the SODI task force, held in Luxembourg on 13-14 November 2006. The main purpose of the project is to enable end users and metadata producers to access, analyse and reuse statistical metadata originating from multiple websites. To achieve this objective, we intend to: a) standardise the set of concepts used in collecting, processing and disseminating statistical metadata; b) use SDMX protocols to ensure the interoperability of messages, independently of the respective IT platforms, and to deliver timely information to users on the web. The data and metadata involved in this project refer to a comparatively small number of standardised data sets in the domain of Principal European Economic Indicators (PEEI, see Table 1), but the project tasks, dealing with formats and registry technologies, illustrate how a similar approach could be used to support the collection and management of standardised metadata covering more subject-matter domains.

2. PROJECT OBJECTIVES

The project intends to demonstrate how SDMX technical standards and content guidelines can support the exchange and web dissemination of an important set of reference metadata2. To realise this goal, it is necessary:
• to develop tools and standard formats needed for the creation and management of Metadata Structure Definitions (MSD) compliant with SDMX version 2.0 standards;
• to design and develop a set of tools allowing creation, transfer and management of reference metadata in SDMX-ML format, using as much as possible information already available in the existing metadata repositories;
• to design and develop a registry-based architecture for transferring reference metadata to external users and to the web-site.

3. BACKGROUND

In 2005, Eurostat launched "SDMX Open Data Interchange" (SODI) as a data sharing and exchange project within the European Statistical System. The project started with a pilot exercise involving the National Statistical Institutes of Germany, France, the Netherlands, Sweden and the United Kingdom. The statistical institutes of Denmark, Italy, Norway and Slovenia joined the pilot in 2006, while Finland and Ireland are due to join the exercise in 2007. The long-term view is to extend the results of the SODI pilots to cover any suitable statistical domain and to explore the feasibility of using SDMX as the preferred standard format for the harmonisation of statistical production systems along Eurostat's statistical life cycle. Eurostat and the EU Member States have been gradually increasing their use of standardised messages for the transmission of statistical data, and this work will continue over the next five years.

1 The SODI project focuses on the interoperability of statistics for collecting and disseminating short-term statistics, especially in the domains of the Principal European Economic Indicators (PEEI), with the overall objective of increasing timeliness and accessibility. SODI is an SDMX implementation project.

2 According to the SDMX Metadata Common Vocabulary, "reference" metadata are metadata describing the contents and the quality of the statistical data, normally including "conceptual" metadata, describing the concepts used and their practical implementation; "methodological" metadata, describing methods used for the generation of the data (e.g. sampling, collection methods, editing processes); and "quality" metadata, describing the different quality dimensions of the resulting statistics (e.g. timeliness, accuracy). These metadata are often stored in a separate metadata repository and they are referenced from the related data element.

Statistical institutes are now confronted with the challenge of providing, at the same time, clear, timely and accurate information on the data (metadata) that they disseminate through public channels. For this information to be consistent, comparable and reusable by third parties, further efforts are needed to harmonise its content and presentation, reducing redundancies and duplication of work.

4. A MODEL FOR METADATA EXCHANGE

The Open Metadata Interchange project is based on the SDMX information model and makes use of the SDMX 2.0 set of standards to facilitate the exchange of statistical information through the use of web services and mark-up languages. The SDMX information model encourages the specification of formal rules for formatting metadata, so that these can be exchanged, read and processed by computers without manual intervention. A web service, using information about the web locations of data and metadata, can then navigate, find and automatically process the information for analytical and dissemination purposes. In particular, version 2.0 of the technical standards represents a major advance over version 1, as it supports richer and more complex data/metadata structures and allows metadata to be queried across various sites in order to retrieve customised reports in a standard XML format.
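As an illustration of this query-based exchange, the sketch below shows how a client might send an SDMX-ML-style query to a registry web service and parse the XML reply. The endpoint URL, the payload structure and the dataflow identifier are illustrative assumptions, not the actual SDMX 2.0 QueryMessage schema or the project's real interface.

```python
# Minimal sketch: posting an XML query to a hypothetical metadata web service
# and reading the XML response. URL, namespaces and element names are assumed.
import urllib.request
import xml.etree.ElementTree as ET

QUERY_ENDPOINT = "https://example.org/sdmx/registry/query"  # hypothetical endpoint

query_message = """<?xml version="1.0" encoding="UTF-8"?>
<QueryMessage>
  <!-- Illustrative payload only; a real SDMX-ML 2.0 QueryMessage uses the
       official message namespaces and a richer structure. -->
  <MetadataQuery dataflow="STS_IND_PROD" provider="EUROSTAT"/>
</QueryMessage>
"""

def fetch_metadata(endpoint: str, query_xml: str) -> ET.Element:
    """Send the query and return the parsed XML root of the response."""
    request = urllib.request.Request(
        endpoint,
        data=query_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(request) as response:
        return ET.fromstring(response.read())

if __name__ == "__main__":
    root = fetch_metadata(QUERY_ENDPOINT, query_message)
    # Walk the reply and print element names and text, whatever the exact schema is.
    for element in root.iter():
        print(element.tag, (element.text or "").strip())
```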

Chart 1 - SDMX Data and Metadata Reporting3

Chart 1 depicts the essential characteristics supported in the SDMX model for data and metadata reporting. The pivot of this diagram is the Data or Metadata Flow, maintained by the organisation that collects data or metadata. A Data Flow is linked to a “Data Structure Definition” (DSD, also known as “key family” in Gesmes), while a Metadata Flow is linked to a “Metadata Structure Definition”. A "Metadata Structure Definition" (MSD) defines the allowable content of metadata and identifies the data structures to which metadata can be attached. Data or metadata may be provided by many Data Providers, and any Data Provider may report or publish data or metadata for many Data or Metadata Flows. The Provision Agreement is a way of applying constraints on the scope of the data or metadata that can be supplied: for instance, a Data Provider might supply data or metadata for a limited subset of values. The Data or Metadata Flow may also be linked to one or more topics (categories) in a subject-matter scheme (category scheme). A category scheme provides a way of classifying data for collection, reporting, or publication.
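To make the relationships in Chart 1 concrete, the following sketch models them as plain Python dataclasses. The class and field names are our own simplification of the SDMX information model, not its formal artefacts.

```python
# A rough dataclass sketch of the relationships described above (flows,
# structure definitions, providers, provision agreements, categories).
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataStructureDefinition:      # "key family" in Gesmes terms
    id: str
    dimensions: List[str]
    attributes: List[str]

@dataclass
class MetadataStructureDefinition:  # allowable content of a metadata set
    id: str
    concepts: List[str]
    attachable_to: List[str]        # e.g. ["DataSet", "TimeSeries"]

@dataclass
class Dataflow:
    id: str
    structure: DataStructureDefinition
    categories: List[str] = field(default_factory=list)  # subject-matter scheme topics

@dataclass
class Metadataflow:
    id: str
    structure: MetadataStructureDefinition
    categories: List[str] = field(default_factory=list)

@dataclass
class ProvisionAgreement:
    provider: str                   # a Data Provider
    flow_id: str                    # a data flow or metadata flow
    constraints: List[str] = field(default_factory=list)  # allowed subset of values

# Example wiring: one provider reports reference metadata for one metadata flow.
msd = MetadataStructureDefinition("EUROSTAT_REF_META", ["CONTACT", "TIMELINESS"], ["DataSet"])
mdf = Metadataflow("PEEI_METADATA", msd, categories=["Short-term statistics"])
pa = ProvisionAgreement(provider="DE_NSI", flow_id=mdf.id)
print(pa)
```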

3 The chart is taken from the SDMX Version 2 package, namely from the "SDMX Implementors Guide, Version 2.0", November 2005, figure 30.

5. CONTENT-ORIENTED GUIDELINES

Technical standards are complemented with "content-oriented guidelines" aimed at establishing good practices in the use of a common terminology and in structuring data and metadata sets, so as to support exchange and encourage re-use across domains. Although content-oriented guidelines are not strictly required for conformance with the technical ISO standard, SDMX partners intend to promote the use of concepts that are common to as many statistical domains as possible. In March 2006, SDMX delivered a draft set of content-oriented guidelines4 consisting of:
• Cross-Domain Concepts: a list of metadata concepts relevant to several statistical domains, recommended for use in data and metadata exchange to promote re-usability of statistical information between organisations;
• Statistical Subject-Matter Domains: a standard scheme against which the similar domain lists of various organisations can be mapped to facilitate the exchange of data and metadata;
• Metadata Common Vocabulary (MCV5): a repository containing concepts and related definitions to which the metadata terminology used in international and national data-producing agencies may be mapped.
The SDMX group is currently reviewing the guidelines in the light of the comments received and completing the package, before proceeding to a further consultation of the respective constituencies.

Each data set or metadata set should use standard concepts and structure definitions, so that systems which exchange data and metadata can understand what the data or metadata mean. Chart 2 provides a schematic view of this multiple use of cross-domain concepts for assisting data and metadata exchange.

Chart 2 – The use of standardised cross-domain concepts for data and metadata6

4 See http://www.sdmx.org/news/document.aspx?id=146&nid=67.

5 The MCV, which includes definitions consistent with international standards and guidelines, is focused on a system of definitions for metadata concepts which can be used for any statistical domain and independently from any general model. The list of terms and associated definitions is a building block applicable across domains, playing an important role for the availability of exchangeable data descriptions.

In SDMX, cross-domain concepts can be used in three basic ways: a) as "dimensions" in the description of a data structure (e.g. reference area), with values typically taken from code lists; b) as "attributes" in the description of a data structure (e.g. "unit of measure"); c) as "attributes" in the description of a metadata structure, for example using concepts such as "contacts", "timeliness", "dissemination formats", "classification system", or "compilation practices".
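A toy sketch of these three roles follows, using plain dictionaries; the concept identifiers and code lists shown are assumptions chosen for illustration.

```python
# Toy illustration of the three roles of cross-domain concepts listed above.

# a) cross-domain concept as a coded dimension of a data structure
data_structure = {
    "dimensions": {
        "REF_AREA": ["DE", "FR", "NL", "SE", "UK"],   # reference area code list
        "FREQ": ["M", "Q"],
    },
    # b) cross-domain concept as an attribute of the data structure
    "attributes": {
        "UNIT_MEASURE": "Index, 2000=100",
    },
}

# c) cross-domain concepts as attributes of a metadata structure
metadata_structure = {
    "attributes": ["CONTACT", "TIMELINESS", "DISSEMINATION_FORMATS",
                   "CLASSIFICATION_SYSTEM", "COMPILATION_PRACTICES"],
}

print(sorted(data_structure["dimensions"]))
print(metadata_structure["attributes"])
```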

6. REFERENCE METADATA REPORTING

The core of the SDMX model of reference metadata is the concept of “Metadata Structure Definition”. The MSD defines the allowable content of a metadata set and identifies the data structures to which metadata can be attached. A Metadata Structure Definition defines:
• which metadata concepts are to be reported;
• the identity of each metadata concept (for example a code, which may simply be derived from the metadata concept scheme);
• its format and representation (textual or coded);
• its usage role (e.g. mandatory or conditional).
Reference metadata may be attached to different object types (for instance a data set, a time series, or an observation). For several reasons, this kind of metadata is often attached at a high level (data set, or even agency level), because it often refers to several or even all of the data sets. A Metadata Structure Definition identifies both the concepts for which metadata have to be reported and the object type the metadata are attached to. More than one metadata report may be attached to the objects identified in an MSD. For instance, a Release Calendar can be structured and posted separately from the main report on reference metadata, which includes all the usual elements of data content and data quality, together with a link to the external release calendar.
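The sketch below mirrors the list above in code: a simplified MSD records, for each concept, its identity, representation and usage role, and a small check flags mandatory concepts missing from a metadata report. Names and fields are assumptions, not the SDMX-ML schema.

```python
# Simplified sketch of what an MSD captures per reported concept, plus a
# minimal check of a metadata report against it (not the official schema).
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ConceptSpec:
    concept_id: str        # identity, e.g. taken from the concept scheme
    representation: str    # "Text", "Date", "Code", ...
    usage: str             # "Mandatory" or "Conditional"

@dataclass
class Msd:
    msd_id: str
    attached_to: List[str]           # object types, e.g. ["DataSet"]
    concepts: List[ConceptSpec]

def missing_mandatory(msd: Msd, report: Dict[str, str]) -> List[str]:
    """Return the mandatory concepts absent from a metadata report."""
    return [c.concept_id for c in msd.concepts
            if c.usage == "Mandatory" and c.concept_id not in report]

msd = Msd(
    "PEEI_REF_META", ["DataSet"],
    [ConceptSpec("CONTACT", "Text", "Mandatory"),
     ConceptSpec("TIMELINESS", "Text", "Mandatory"),
     ConceptSpec("QUALITY_REPORTS", "Text", "Conditional")],
)
report = {"CONTACT": "Eurostat Unit B4", "QUALITY_REPORTS": "(link to quality report)"}
print(missing_mandatory(msd, report))   # -> ['TIMELINESS']
```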

7. SYSTEM ARCHITECTURE

The project "SDMX-compliant Metadata for use within Eurostat's Web Site" is based on the SODI standard architecture, which makes use of a metadata registry aimed at allowing the discovery of statistical data and metadata by interested third parties, so that the information can be interpreted and retrieved in the shortest possible time. The registry will comprise a repository of data structure definitions, metadata structure definitions and subject-matter domains, together with information on provision agreements, i.e. how data and metadata are made available by data providers. The project has been organised into four main tasks, as follows: • Task 1: Definition of the overall architecture analysis of the environment that will handle the SDMX-ML metadata; design, development and testing of the MSD tool; support for the creation of SDMX-ML schemas for selected domains. • Task 2: Analysis of metadata transfer from national and international sources and development of tools for conversion and transfer of reference metadata. • Task 3: Activities related to loading metadata into Eurostat's database and formulating queries for extracting the information. • Task 4: Design and development of a module for dispatching SDMX-compliant metadata to Eurostat’s website. The first task concerns the design, development and testing of a web service module based on the SDMX 2.0 standard format, which will be responsible for the definition and management of Metadata

6 The illustration is taken from "SDMX Content-oriented Guidelines: Cross-domain Concepts", draft March 2006, page 6.

The first task concerns the design, development and testing of a web service module, based on the SDMX 2.0 standard format, which will be responsible for the definition and management of Metadata Structure Definitions. An MSD defines the valid content of a metadata set in terms of the concepts comprising its structure, how those concepts are related through their roles in the metadata set, and the valid content of the concepts when used in a metadata set. A prototype of the MSD tool, aligned with the techniques and methods used for the development of the SDMX registry web service, will be ready for testing by the first quarter of 2007. This tool will be used to define and create XML schemas of MSDs for selected domains. The tool will be populated with the corresponding MSDs, and the corresponding XML schemas will be validated and exported using the tool’s functionalities. The web-based application able to store and manage MSDs will utilise information that resides in Eurostat's SDMX registry. This task also aims at presenting the overall architecture of the Eurostat environment that will handle SDMX reference metadata, and will serve as the primary vehicle for communication between all parties involved in the implementation of the project. An initial context diagram depicting the different software components to be implemented in the framework of the project is provided in Chart 3.

The development and testing of a web service module for editing/converting metadata into SDMX-compliant format will be the subject of Task 2. This includes a study of the possibility of transferring these metadata from national systems to Eurostat using the SDMX technical specifications. A web-based questionnaire could help countries which are not yet in an advanced state of readiness for generating SDMX-ML metadata sets.

Task 3 consists of loading SDMX-ML metadata, coming either from the SDMX metadata converter module or supplied directly in that format, into Eurostat's reference metadata base EMIS, using an SDMX-ML metadata loader web service. The tools developed under Tasks 2 and 3 will be mainly used by potential metadata providers such as National Statistical Institutes. In addition, a web service extracting metadata from EMIS via dynamic formulation of SDMX queries (Query Formulator and Extractor) will be implemented. This tool will be able to respond to well-structured SDMX-ML queries forwarded by either web services (metadata consumers) or human operators. In this way, it will also serve the exchange with other SDMX partners interested in re-using Eurostat's metadata.

Finally, an SDMX metadata publisher web service will be the interface with Eurostat’s website. This tool will be responsible for transforming the received SDMX-ML metadata sets into publishing formats (via XSLT). It will enable interaction with ordinary Internet users and will be implemented in the framework of Task 4.
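As a hint of what the publisher step (Task 4) involves, the sketch below applies an XSLT stylesheet to an SDMX-ML metadata set using the third-party lxml package. The function, file names and stylesheet are placeholders rather than the project's actual components.

```python
# Sketch of the final publishing step: transforming an SDMX-ML metadata set
# into an HTML page via XSLT. Requires the lxml package; paths are placeholders.
from lxml import etree

def publish(metadata_xml_path: str, stylesheet_path: str, output_path: str) -> None:
    """Apply an XSLT stylesheet to an SDMX-ML metadata set and save the result."""
    document = etree.parse(metadata_xml_path)
    transform = etree.XSLT(etree.parse(stylesheet_path))
    html = transform(document)
    with open(output_path, "wb") as out:
        out.write(etree.tostring(html, pretty_print=True))

# Example usage (illustrative file names):
# publish("peei_metadata_set.xml", "reference_metadata_to_html.xsl", "peei_metadata.html")
```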

Chart 3: Architectural Overview of the Software Components


8. STANDARDISATION OF METADATA CONCEPTS

The set of metadata concepts used for the main Reference Metadata Scheme, on which the project is based, includes both data content and quality assessment elements and is reported in Table 2. A distinct MSD may refer to the Advance Release Calendar and is based on the concepts of reference period, release date and time, date tolerance and release status. Table 2 makes use, as much as possible, of the set of cross-domain concepts released in the draft content-oriented guidelines published by SDMX in March 2006. In doing so, it provides a mapping of the SDMX cross-domain concepts to the reference metadata elements already produced and disseminated by Eurostat for all of its domains. In a few limited cases, it suggests additions or further detail. Although the specific concepts are presented as a flat list, some of them can be further broken down along a "presentation hierarchy". For instance, "contact" can be further specified according to the elements highlighted in the concept description (organisation name, contact name, mail address, electronic mail address, phone number, fax number). The same is true for most methodological and quality concepts. The general model of reference metadata presents a breakdown of sub-elements which is linked to the indicators retained by the European Statistics Code of Practice. This will be specified in the SDMX-ML representation.
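The sketch below illustrates the presentation-hierarchy idea for the "contact" concept: the nested sub-elements follow the list above, but the exact identifiers and the flattening convention are assumptions for illustration.

```python
# Illustration of a "presentation hierarchy": a flat concept such as CONTACT
# can carry structured sub-elements. Identifiers and values are assumed.
contact = {
    "CONTACT": {
        "ORGANISATION_NAME": "Eurostat, Unit B4",
        "CONTACT_NAME": "Reference metadata team",
        "MAIL_ADDRESS": "Luxembourg",
        "ELECTRONIC_MAIL_ADDRESS": "(e-mail address)",  # placeholder value
        "PHONE_NUMBER": None,
        "FAX_NUMBER": None,
    }
}

def flatten(tree: dict, prefix: str = "") -> dict:
    """Flatten the hierarchy back to dotted concept IDs for flat reporting."""
    flat = {}
    for key, value in tree.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

print(flatten(contact))
```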

9. SDMX METADATA INTERCHANGE

The objectives of the project will only be achieved by providing users and producers with a set of standard metadata concepts and definitions, together with an open architecture and common tools. The availability of technical specifications and tools, and of the SDMX registry with its repository of data and metadata structure definitions, will assist the creation of a consistent set of standard metadata structures and help to ensure close coordination and interaction with other SDMX projects and partners, and with the European Union's Member States.

The key idea behind the open metadata interchange is that, through alignment to a common set of concepts linked to a standard terminology, there is a concrete possibility of setting up, for the first time, an exchange of reference metadata among countries and international organisations which reduces redundancies and increases comparability. Alignment does not necessarily entail the direct adoption of precisely the same concepts by each agency in its internal workflow. Although such adoption would facilitate the exchange of metadata between agencies, it is sufficient for organisations to be able to map their own granular concepts (developed to meet their own needs) to the cross-domain concepts specified in the SDMX list (a schematic sketch of such a mapping is given at the end of this section). This would also facilitate direct access to metadata on the web, instead of the current transmission by national agencies of different metadata to different international organisations, and would help achieve a reduction of effort at the national level. Using a common XML format and standardised web tools, an organisation should be able to identify and retrieve those metadata which are relevant for its own framework, avoiding duplication.

The system can also be used for the transfer of metadata files between the platforms run by Eurostat and the International Monetary Fund for the new "Euro area" page of the Dissemination Standards Bulletin Board. Eurostat, as a central statistical organisation for the Euro area, in cooperation with the European Central Bank, can coordinate Euro area requirements and metadata flows, while interconnecting national metadata systems. EU countries, for their part, will be able to provide metadata to several organisations at the same time, for the same SDMX concepts, using as much as possible information extracted from their original metadata systems, thereby reducing manual interventions, double work and inconsistencies.
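The following sketch illustrates the mapping idea: an agency keeps its own granular concept names and re-keys its reports onto SDMX cross-domain concept identifiers before exchange. The national concept names on the left-hand side are invented for illustration.

```python
# Sketch of mapping an agency's own (invented) concept names onto SDMX
# cross-domain concept identifiers before exchanging a metadata report.
NATIONAL_TO_SDMX = {
    "press_contact":     "CONTACT",
    "publication_delay": "TIMELINESS",
    "revision_analysis": "ACCURACY",
    "output_channels":   "DISSEMINATION_FORMATS",
    "nace_version":      "CLASSIFICATION",
}

def to_sdmx(national_report: dict) -> dict:
    """Re-key a national metadata report with SDMX cross-domain concept IDs."""
    return {NATIONAL_TO_SDMX[k]: v for k, v in national_report.items()
            if k in NATIONAL_TO_SDMX}

print(to_sdmx({"publication_delay": "t+30 days", "nace_version": "NACE Rev. 1.1"}))
```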


Table 1 Principal European Economic Indicators List

Set 1: Price Indicators
1.1 Harmonised Consumer Price Index, MUICP flash estimate: release at the end of the reference month
1.2 Harmonised Consumer Price Index, actual indices: release 2.5 weeks after the reference month

Set 2: National Accounts Indicators
2.1 Quarterly National Accounts, flash GDP: release t+45
2.2 Quarterly National Accounts, first GDP release with breakdowns: release t+60
2.3 Quarterly National Accounts, Sector Accounts: release t+90
2.4 Quarterly Government Finance Statistics: release t+90

Set 3: Business Indicators
3.1 Industrial production index: release t+30
3.2 Industrial output price index for domestic markets: release t+35
3.3 Industrial new orders index: release t+50
3.4 Industrial import price index: release t+30
3.5 Production in construction: quarterly release t+45; monthly release t+30
3.6 Turnover index for retail trade and repair: release t+30
3.7 Turnover index for other services: release t+60
3.8 Corporate output price index for services: release t+60

Set 4: Labour Market Indicators
4.1 Unemployment rate: release t+30
4.2 Job vacancy rate: quarterly; monthly release t+30
4.3 Employment: monthly release t+30; quarterly release t+45
4.4 Labour cost index (US: Employment cost index): release t+60

Set 5: Foreign Trade Indicators
5.1 External trade balance, intra- and extra-MU, intra- and extra-EU: release t+46
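The "t+NN" notation above denotes a release target of NN days after the reference period. The small helper below computes such target dates, assuming "t" is the last day of the reference period.

```python
# Helper for the "t+NN" release targets in Table 1 (assumes "t" is the last
# day of the reference period; this interpretation is an assumption).
from datetime import date, timedelta

def release_target(period_end: date, offset_days: int) -> date:
    """Return the target release date for a given reference period end."""
    return period_end + timedelta(days=offset_days)

# Flash GDP for Q1 2006 (t+45) and industrial production for March 2006 (t+30):
print(release_target(date(2006, 3, 31), 45))   # 2006-05-15
print(release_target(date(2006, 3, 31), 30))   # 2006-04-30
```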


Table 2

Elements for a generic Metadata Structure Definition (Maintenance agency: Eurostat)

Each entry gives: Concept ID (Name of Concept) [Representation; Usage Status; Typical attachment], followed by the description derived from the Metadata Common Vocabulary.

1. CONTACT (Contact) [Text; Mandatory; Agency / Data set]
It describes contact points for the data or metadata, including how to reach the contact points: organisation name, contact name, mail address, electronic mail address, phone number, fax number.

2. METADATA_UPDATE (Metadata update *) [Date; Mandatory; Data set]
Date on which the metadata element was inserted or modified. It can be further detailed in: a) last update of content; b) last certified without update; c) last posted on web site.

3. STAT_PRESENTATION (Statistical presentation *) [Text; Mandatory; Data set]
Description of the table contents, with their data breakdowns. It should also include summary information on units of measurement, time span covered, adjustments to data (e.g. seasonal adjustments for time series) and availability of textual analysis of current-period development with the dissemination of the data.

4. FREQUENCY_PERIODICITY (Frequency and Periodicity) [Code, Text; Mandatory; Data set]
Frequency refers to the time interval between the observations of a time series. Periodicity refers to the frequency of compilation of the data (e.g. a time series could be available at annual frequency but the underlying data are compiled monthly, thus have a monthly periodicity).

5. REFERENCE_PERIOD (Reference period **) [Code, Text; Mandatory; Data set]
The time period to which a variable refers. Statistical variables refer to specific times, which may be limited to a reference time point (e.g. a specific day) or a period (e.g. a month, calendar year or fiscal year).

6. RELEASE_CALENDAR_POLICY (Release calendar policy *) [Text; Conditional; Data set]
Describes the policy regarding the release of statistics according to a preannounced schedule (if available). It may also contain a link to the release calendar information.

7. SIMULTANEOUS_RELEASE (Simultaneous release) [Text; Mandatory; Data set]
Describes the policy for release of the data to the public, how the public is informed that the data are being released, and whether the policy provides for the dissemination of statistical data to all interested parties at the same time. It also describes the policy for briefing the press in advance of the release of the data.

8. INST_FRAMEWORK (Institutional framework) [Text; Mandatory; Agency / Data set]
Refers to a law or other formal provision that assigns primary responsibility as well as the authority to an agency for the collection, processing, and dissemination of the statistics; it also includes arrangements or procedures to facilitate data sharing and coordination between data-producing agencies ("reporting requirements").

9. TRANSPARENCY (Transparency) [Text; Mandatory; Data set]
Information on the terms and conditions under which statistics are collected, processed, and disseminated. It also describes the policy of providing advance notice of major changes in methodology, the policy on internal governmental access to statistics prior to their release, and the policy on the identification of statistical products.

10. DISSEMINATION_FORMATS (Dissemination formats) [Text; Mandatory; Data set]
References to news releases, publications, on-line databases and other dissemination media.

11. QUALITY_REPORTS (Related quality reports **) [Text; Conditional; Data set]
References and link to available external quality reports for the data.


12. TIMELINESS (Timeliness and punctuality) [Text; Mandatory; Data set]
Timeliness refers to the speed of dissemination of the data, i.e. the lapse of time between the end of a reference period (or a reference date) and dissemination of the data. It reflects many factors, including some that are related to institutional arrangements, such as the preparation of accompanying commentary and printing. Punctuality refers to the possible time lag existing between the actual delivery date of data and the target date when it should have been delivered, for instance with reference to dates announced in some official release calendar or previously agreed among partners.

13. COMPARABILITY (Comparability and coherence) [Text; Mandatory; Data set]
The extent to which differences between statistics from different geographical areas, non-geographical domains, or over time, can be attributed to differences between the true values of the statistics. Comparability is closely associated with "coherence", which is the adequacy of statistics to be reliably combined in different ways and for various uses: the use of standard concepts, classifications and target populations promotes coherence, as does the use of common methodology across surveys.

14. ACCURACY (Accuracy and reliability *) [Text; Mandatory; Data set]
The accuracy of statistical information is the degree to which the information correctly describes the phenomena it was designed to measure. Reliability is the closeness of the initial estimated value(s) to the subsequent estimated value(s). Accuracy refers to the provision of either measures of accuracy or precision (numerical results of the methods/processes for assessing the accuracy or precision of data) or qualitative assessment indicators. It may also be described in terms of the major sources of error that potentially cause inaccuracy (e.g. coverage, sampling, non-response, response). It includes providing the results of the assessment of source data for coverage, sampling error, response error and non-sampling error.

15. STAT_CONCEPTS (Statistical concepts) [Text; Mandatory; Data set]
"Concepts and definitions" refer to the internationally accepted statistical standards, guidelines, or good practices on which the concepts and definitions used for compiling the statistics are based. It also refers to the description of deviations of the concepts and definitions from accepted statistical standards, guidelines, or good practices, when relevant. This should define the statistical concept under measurement and the organisation of the data, i.e. the type of variables included in the domain of study.

16. CLASSIFICATION (Classification systems) [Text; Mandatory; Data set]
Classification systems refer to a description of the classification systems being used and how they conform with internationally accepted standards, guidelines, or good practices. It also refers to the description of deviations of classification systems compared to accepted statistical standards, guidelines, or good practices, when relevant. The structure of a classification can be either hierarchical or flat. Hierarchical classifications range from the broadest level (e.g. division) to the detailed level (e.g. class). Flat classifications (e.g. sex classification) are not hierarchical.

17. SCOPE_COVERAGE (Scope and coverage) [Text; Mandatory; Data set]
Scope/coverage describes the coverage of the statistics and how consistent this is with internationally accepted standards, guidelines, or good practices. The scope/coverage includes a description of the target population, and geographic, sector, institutional, item, population, product, and other coverage.


18. ACCOUNTING_CONVENTIONS (Accounting conventions *) [Text; Mandatory; Data set]
The practical aspects and conventions used when compiling data from diverse sources under a common methodological framework. It may refer to descriptions of the types of prices used to value flows and stocks, or of other units of measurement used for recording the phenomena being observed; the time of recording of the flows and stocks, or the time of recording of other phenomena that are measured; and the grossing/netting procedures that are used.

19. SOURCE_DATA (Source data) [Text; Mandatory; Data set]
Description of the data collection programs and their adequacy for the production of statistics, including meeting the requirements for methodological frameworks, scope, classification systems, and basis for recording.

20. STAT_PROCESSING (Statistical processing) [Text; Mandatory; Data set]
The processes for manipulating or classifying statistical data into various categories with the object of producing statistics. It refers to a description of the data compilation and other statistical procedures to deal with intermediate data and statistical outputs (e.g. data adjustments and transformation, and statistical analysis). The items covered include, inter alia, weighting schemes, methods for imputing missing values or source data, statistical adjustments, balancing/cross-checking techniques and relevant characteristics of the specific approach applied.

21. DATA_VALIDATION (Data validation *) [Text; Mandatory; Data set]
Validation describes methods and processes for routinely assessing source data (including censuses, sample surveys, and administrative records) and how the results of the assessments are monitored and made available to guide statistical processes. It also describes how intermediate results are validated against other information where applicable, how statistical discrepancies in intermediate data are assessed and investigated, and how statistical discrepancies and other potential indicators of problems in statistical outputs are investigated. All the controls made on the quality of the data to be published or already published are included in the validation process. Validation also includes the results of studies and analyses of revisions and how they are used to inform the statistical processes. In this process, two dimensions can be distinguished: (i) validation before publication of the figures and (ii) validation after publication.

22. ANNOTATIONS (Annotations **) [Text; Conditional; Data set]
Special warnings and footnotes (e.g. temporary warnings or re-use precautions).

23. RELEVANCE (Relevance) [Text; Conditional; Agency]
Refers to the processes for monitoring the relevance and practical utility of existing statistics in meeting users’ needs and how these processes inform the development of statistical programs. Relevance is concerned with whether the available information sheds light on the issues that are important to users.

24. QUALITY_MANAGEMENT (Quality management) [Text; Conditional; Agency]
Refers to the processes in place to focus on quality, to monitor the quality of the statistical programs, and to deal with quality considerations in planning the statistical programs. It also includes how well the resources meet the requirements of the program, and measures to ensure the efficient use of resources (staffing, facilities, computing resources, and financing of statistical programs).

* The concept has been either renamed or further specified, if compared with the March 2006 draft of content-oriented guidelines.
** The concept is not included in the March 2006 draft of content-oriented guidelines.