SDMX GLOBAL CONFERENCE, PARIS, 19-21 January 2009

THE IMPLEMENTATION OF THE SDMX CONTENT-ORIENTED GUIDELINES WITH REGARD TO METADATA EXCHANGE IN THE EUROPEAN STATISTICAL SYSTEM

(August Götzfried and Marco Pellegrino, Eurostat)

This paper describes the implementation of a more advanced framework for metadata production, exchange and sharing that makes use of the new SDMX content-oriented guidelines published in January 2009. The combined use of technical standards and content guidelines can greatly support the standardisation, exchange, web dissemination and re-use of structural and reference metadata¹ within the European Statistical System (ESS). With regard to structural metadata, Eurostat is not only implementing the content-oriented guidelines but is also undertaking further harmonisation efforts for additional sets of metadata, which are being offered for use at SDMX level in a second phase. With regard to reference metadata, the use of the Cross-domain Concepts (version 2009) is leading to the implementation of a Europe-wide standard for reference metadata called ESMS, the Euro-SDMX Metadata Structure. The ESMS, currently being implemented within Eurostat, will subsequently be used for reporting metadata to Eurostat and for exchanging metadata between international organisations, for instance between Eurostat, the European Central Bank, the IMF and the OECD. SDMX work progressed considerably in 2008, but more still needs to be done to take full advantage of the potential that SDMX standards and content guidelines provide.

1. Background: Using SDMX standards for improving metadata exchange

The dissemination of reference metadata on Eurostat's web site has increased considerably since the free dissemination policy was implemented in 2004. As more and more data were disseminated free of charge, the metadata files also needed to keep pace.
These metadata files, based on the IMF's Special Data Dissemination Standard (SDDS), explained data content, methodology, re-use precautions and overall quality, providing users with a standard template across all statistical domains. These metadata have been highly visible from the start: web usage statistics show an average of more than 3000 consultations of reference metadata files per day on Eurostat's web site. A policy for central quality monitoring of the contents of the metadata files was put in place, together with training initiatives and specific actions to increase metadata coverage.

Despite this success, users still face a series of problems when trying to retrieve comparable methodological information per statistical domain, both for Eurostat and for individual countries. The European Statistical System as a whole still does not provide users with a common set of standardised, comparable and re-usable reference metadata describing both European statistics (produced by Eurostat) and national statistics produced by EU Member States or other associated countries. Reference metadata collected for countries are in most cases non-standardised and follow different structures, normally determined by the managers of the respective statistical domains. Even the SDDS, implemented by the IMF, covers hardly 5% of a statistical institute's total dissemination. What is more, according to our assessment, many SDDS-based reference metadata are quite weak with regard to information on national statistical processes and on data quality.

¹ In SDMX, "structural metadata" are those metadata acting as identifiers and descriptors of the data, such as the names of variables or the dimensions of statistical cubes. Structural metadata must be associated with the data; otherwise it becomes impossible to identify, retrieve and browse the data. "Reference metadata" are metadata that describe the contents and the quality of the statistical data: the concepts used, the methods used to generate the data, and the different quality dimensions of the resulting statistics (e.g. timeliness, accuracy). While these metadata exist and may be exchanged independently of the data and their structural metadata, they are often linked ("referenced") to the data.

For these reasons, the standardisation process needed to be accelerated, so that comparable data and metadata could be made available more easily, reducing redundancy, minimising the reporting burden and strengthening the coordination of metadata requirements. These tasks involved the development of more advanced metadata standards both in the IT area (system architecture and tools) and in metadata content. And this is where SDMX fits into the picture.

While version 1.0 of the SDMX standards was mainly concerned with data sets and their structural definitions, version 2.0 introduced full metadata support, allowing reference metadata to be attached to any part of the data tree and to be reported and exchanged using XML formats. These functionalities can support data quality initiatives by enabling a better exchange of quality-related metadata. One of the most important features of the SDMX information model is the specification of formal rules for formatting data and metadata, so that these can be exchanged, read and processed without manual intervention. A web service, using information about the web locations of data and metadata, can navigate, find and automatically process the information for analytical and dissemination purposes, and can even query metadata across various sites to retrieve a customised report in XML format. Chart 1 (taken from the version 2 package) depicts the essential characteristics supported in the SDMX model for data and metadata reporting.
The pivot of this diagram is the Data or Metadata Flow, maintained by the organisation that collects the data or metadata. A Data Flow is linked to a “Data Structure Definition” (DSD), while a Metadata Flow is linked to a “Metadata Structure Definition” (MSD), which defines the structure of the metadata and identifies the data elements to which metadata can be attached. Data or metadata may be made available by many providers, and any provider may report or publish data or metadata for several data or metadata flows, according to a Provision Agreement. The Data or Metadata Flow may also be linked to one or more topics (Categories) in a subject-matter scheme (Category Scheme). A category scheme provides a way of classifying data for collection, reporting or publication.

Chart 1 - SDMX Data and Metadata Reporting

The core of the SDMX model for reference metadata is the “Metadata Structure Definition”, which defines:
• which metadata concepts are to be reported;
• the identity of each metadata concept (for example, a code which may simply be derived from the cross-domain concepts scheme);
• the format and representation (textual or coded);
• the role in its usage, e.g. mandatory or conditional.

Reference metadata may be attached to different object types (for instance a data set, a time series, or an observation). In practice, however, reference metadata files are often attached at a high level (i.e. at data set or statistical domain level, or even at agency level), because their contents often refer to several or even all of the data tables produced on the basis of the respective data set. A Metadata Structure Definition also needs to identify the object to which the metadata are attached.
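The four things an MSD defines, the need to identify the target object, and the flow and provision-agreement links shown in Chart 1 can be made concrete with a small validator that reads an XML metadata report and checks it without manual intervention. This is a hedged sketch, not SDMX-ML and not Eurostat's tooling: the XML layout (`MetadataReport`, `Target`, `Concept`), the class names, the provider and flow identifiers and the concept codes (`CONTACT`, `TIMELINESS`, `DISS_FORMAT`, `COMPARABILITY`) are all hypothetical, chosen only to mirror the structure described above.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class ConceptSpec:
    """One metadata concept in an MSD: identity, representation and usage role."""
    concept_id: str
    coded: bool = False              # False = free text, True = value from a code list
    codes: frozenset = frozenset()   # allowed codes, used only when coded is True
    mandatory: bool = True           # False = conditional


@dataclass
class MetadataStructureDefinition:
    msd_id: str
    target_types: set                # object types the metadata may be attached to
    concepts: list                   # list of ConceptSpec


@dataclass
class MetadataFlow:
    flow_id: str
    msd: MetadataStructureDefinition
    categories: list = field(default_factory=list)  # topics in a category scheme


@dataclass
class ProvisionAgreement:
    provider: str
    flow: MetadataFlow


def parse_report(xml_text):
    """Read a simplified, hypothetical XML metadata report into a dict."""
    root = ET.fromstring(xml_text)
    target = root.find("Target")
    return {
        "provider": root.get("provider"),
        "flow": root.get("flow"),
        "target_type": target.get("type"),
        "target_id": target.get("id"),
        "values": {c.get("id"): (c.text or "").strip() for c in root.iter("Concept")},
    }


def validate(report, agreement):
    """Return a list of problems; an empty list means the report conforms."""
    errors = []
    msd = agreement.flow.msd
    if (report["provider"], report["flow"]) != (agreement.provider, agreement.flow.flow_id):
        errors.append("report is not covered by this provision agreement")
    if report["target_type"] not in msd.target_types:
        errors.append(f"metadata cannot be attached to object type {report['target_type']}")
    known = {spec.concept_id for spec in msd.concepts}
    for concept_id in report["values"]:
        if concept_id not in known:
            errors.append(f"unknown concept {concept_id}")
    for spec in msd.concepts:
        value = report["values"].get(spec.concept_id, "")
        if not value:
            if spec.mandatory:
                errors.append(f"missing mandatory concept {spec.concept_id}")
        elif spec.coded and value not in spec.codes:
            errors.append(f"{spec.concept_id}: '{value}' is not an allowed code")
    return errors


# Illustrative MSD, flow and agreement (all identifiers are hypothetical).
msd = MetadataStructureDefinition(
    msd_id="MSD_EXAMPLE",
    target_types={"DataSet", "TimeSeries", "Observation"},
    concepts=[
        ConceptSpec("CONTACT"),
        ConceptSpec("TIMELINESS"),
        ConceptSpec("DISS_FORMAT", coded=True, codes=frozenset({"ONLINE", "PAPER"})),
        ConceptSpec("COMPARABILITY", mandatory=False),
    ],
)
agreement = ProvisionAgreement("NSI_X", MetadataFlow("MF_EXAMPLE", msd, ["Economy"]))

REPORT = """<MetadataReport provider="NSI_X" flow="MF_EXAMPLE">
  <Target type="DataSet" id="DS_1"/>
  <Concept id="CONTACT">Unit B.1</Concept>
  <Concept id="TIMELINESS">t+45 days</Concept>
  <Concept id="DISS_FORMAT">CDROM</Concept>
</MetadataReport>"""

if __name__ == "__main__":
    for problem in validate(parse_report(REPORT), agreement):
        print(problem)
```

Run as a script, this prints a single problem, because `CDROM` is not in the illustrative code list for `DISS_FORMAT`; the conditional `COMPARABILITY` concept may be omitted without error. A production system would work with the XML formats defined by the SDMX standard rather than this ad hoc layout.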
To use this model, it was necessary:
• to develop and manage standard Metadata Structure Definitions (MSDs) compliant with SDMX version 2.0, using a standardised list of metadata concepts;
• to design and develop a set of IT tools for creating and managing reference metadata, re-using as much as possible of the information already stored in existing metadata files;
• to design and develop a system architecture (based on a registry) for transferring reference metadata to external users and to the web site, independently of the respective IT platforms.

During 2007 and 2008, Eurostat dedicated significant resources to developing a system architecture and IT tools to support data and metadata exchange within the European Statistical System. At the same time, intensive consultations were conducted within several ESS working groups and task forces, while the SPC (Statistical Programme Committee), the IT Directors' Group and the Directors' meeting expressed their favourable opinion of, and encouragement for, the SDMX implementation.

2. Content-oriented guidelines: a step forward towards statistical standardisation

The Content-Oriented Guidelines (COG) now complement the SDMX Technical Standards with a set of recommended practices for creating interoperable data and metadata across statistical domains. The release of the new package is therefore a major achievement for the international harmonisation of metadata messages. From now on, metadata reports can be structured using the SDMX list of standard concepts, such as "contact", "timeliness", "dissemination format", "classification system" or "comparability", each with a common description and identification, so that IT systems exchanging data and metadata understand what the data or metadata refer to, without difficulty in determining the semantic equivalence between concepts.
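The practical gain of a shared concept list can be illustrated by re-keying two differently structured national reports under common concept identifiers, after which their contents can be paired concept by concept. This is a minimal sketch under stated assumptions: the local headings, the sample report texts, the mapping table and the identifiers are all hypothetical and are not the official COG cross-domain concept codes; a real mapping would presumably be maintained centrally rather than hard-coded.

```python
# Hypothetical local headings used by two domain managers, mapped to
# illustrative concept identifiers (not the official COG codes).
LOCAL_TO_CROSS_DOMAIN = {
    "contact": "CONTACT",
    "contact person": "CONTACT",
    "release delay": "TIMELINESS",
    "punctuality and timeliness": "TIMELINESS",
    "dissemination format": "DISS_FORMAT",
    "format of dissemination": "DISS_FORMAT",
    "comparability": "COMPARABILITY",
    "comparability over time": "COMPARABILITY",
}


def to_cross_domain(report: dict[str, str]) -> dict[str, str]:
    """Re-key a locally structured report by standard concept ID.

    An unmapped heading raises KeyError, so gaps in the mapping surface
    instead of being silently dropped.
    """
    standard: dict[str, str] = {}
    for heading, text in report.items():
        concept_id = LOCAL_TO_CROSS_DOMAIN[heading.strip().lower()]
        if concept_id in standard:
            raise ValueError(f"two headings map to {concept_id}")
        standard[concept_id] = text
    return standard


def compare(report_a: dict[str, str], report_b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Pair the texts two reports give for each concept they both cover."""
    a, b = to_cross_domain(report_a), to_cross_domain(report_b)
    return {cid: (a[cid], b[cid]) for cid in sorted(a.keys() & b.keys())}


# Invented sample reports with differently worded headings.
country_a = {"Contact person": "Unit B.1", "Release delay": "45 days"}
country_b = {
    "Contact": "Methods unit",
    "Punctuality and timeliness": "60 days",
    "Comparability over time": "Break in series in 2005",
}

if __name__ == "__main__":
    for concept_id, (text_a, text_b) in compare(country_a, country_b).items():
        print(f"{concept_id}: {text_a!r} vs {text_b!r}")
```

Only the concepts both reports cover (here `CONTACT` and `TIMELINESS`) are paired; `COMPARABILITY`, reported by one country only, drops out, a gap that agreeing on mandatory concepts in a common MSD would help to close.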
The SDMX COG package comprises the following main elements:
• SDMX Cross-Domain Concepts: a list of 66 metadata concepts plus a series of sub-concepts, relevant to several statistical domains and recommended for use in data and metadata