Statistical Data and Metadata Exchange: Exchange Problems & Opportunities
SDMX
Statistical Data and Metadata eXchange

Exchange Problems & Opportunities

Statistics agencies/providers want to:
• Avoid sending the same data to multiple agencies
• Avoid sending data packages, full stop
• Not adopt new formats/standards for a few, specific data flows
• Make datasets user-friendly and comparable

International Organisations/receivers want to:
• Avoid the time lost and errors introduced when processing different file formats from providers
• Avoid the technical and manual headache of processing large datasets
• Have comparable data
• Avoid the time delays caused by manual file processing
• Avoid round trips of validation with member agencies, and avoid creating proprietary validation rules

Exchange Problems & Opportunities

Everybody would like:
• Automatic validation of the exchange before processing
• Automated workflows for exchange of statistics
• Lower cost, higher quality, and more guidelines for exchange and implementation
• To document datasets' structural metadata and reference metadata
• To store the documentation, have a standard way of querying it, and make it discoverable
• To benefit from a large community offering free tools and sharing expertise around a standard

What is SDMX?

• Statistical Data and Metadata eXchange
• Released in 2002
  “SDMX is an initiative to foster standards for the exchange of statistical information.”
• Sponsor organisations:
  – BIS, ECB, EUROSTAT, IMF, OECD, UN, World Bank
• SDMX.org web site

What is SDMX?

• A set of technical standards
  – Information Model
  – Web service standard, e.g. for creating data queries
  – Registry standards so that data catalogs can be queried and data can be discovered
• Guidelines for
  – Coding
  – Best practices when using the standard
  – Technical implementation
• SDMX governance:
  – SDMX Sponsors are the steering group
  – Technical Working Group, Statistical Working Group
• Existing, reusable tools
• Main exchange format is XML, with standard schemas

Why use SDMX?: The Business Case

Designed to improve machine-to-machine data and metadata exchange

Saves resources:
• Reuse of exchange systems across domains and agencies
• Reuse of statistical metadata and methodology

Improves quality:
• Promotes standard classifications -> reduces mapping and transformation errors
• Automated exchange -> reduces manual intervention errors
• Validation is a first-class part of SDMX

Improves timeliness:
• Automated workflows, fewer “wait states”
• Reduces delays from manual intervention
  E.g. copy/paste, click, and repeat many times. Automation allows unattended workflow execution.

For exchange, why not use…?
Standard                  Issues
Simple CSV                Not structured, hard to validate; no metadata
Excel                     Metadata tied to presentation; proprietary format; licensing; hard to process and automate
FAME, SAS, STATA files    Proprietary format; licensing
GESMES                    No information model; proprietary format; few tools or international support
XBRL, DDI                 Not focused on modelling the exchange
XML (only)                No context to tags; SDMX adds context to XML

The SDMX Information Model

• Information Model Examples:

Information Model      Objects                    Used by
Excel                  Sheets, Cells, Rows        Formulae, VBA
Relational database    Database, Table, Column    SQL, Interface
OECD metadata          42 categories              OECD.Stat, Metastore

• SDMX IM is designed for statistical data and metadata exchange, and cataloguing that metadata
• SDMX IM was designed for aggregated data, but can be used for microdata

The SDMX Information Model (High Level)

Clickable SDMX: https://statswiki.unece.org/display/ClickSDMX

SDMX Main Tools

• SDMX Registry
  – Structural Metadata catalog
  – Data Discovery
  – Demo of SDMX Global Registry
• SDMX Converter
  – Converts between formats (Excel, GESMES, CSV, etc.)
• SDMX Reference Infrastructure
  – SDMX Export and mapping for an existing database
  – Used by many agencies for reporting
• Plug-ins/Libraries for Econometrics Tools:
  – R, Stata, SAS, etc.
• Mapping
• Java and .Net software libraries (SDMX Source)
• Full Tools list
• OECD.Stat data warehouse platform is partly SDMX now, will be fully SDMX based in next two years
  – Is used by iStat, ABS, Tunisia, ILO, being considered by others
  – Has an active community

Querying Data

<Query a web service: OECD KEI dataset>
http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/KEI/PS+PR+PRINTO01+SL+SLRTTO01+SLRTCR03+OD+ODCNPI03+CI+LO+LOLITOAA+LORSGPRT+LI+LF.AUS+AUT+BEL+CAN+CHL+CZE+DNK+EST+FIN+USA.GP+GY+ST.A+Q+M/all?startTime=2015&endTime=2017&format=compact_v2

SDMX Guidelines

• <Checklist for SDMX Design>
• <SDMX Glossary>
• Guidelines
  – Versioning
  – Data vintage representation
  – Confidentiality/Embargo representation
  – Non-calendar time ranges
  – Cross-domain Code lists (Observation Status, Seasonal Adjustment, etc.)
  – How to model a statistical domain/reporting framework
  – Creating Data Structure Definitions
  – Reference metadata concepts
  – A global framework/set of concepts for exchanging reference metadata

Relationship to other standards

The Generic Statistical Business Process Model (GSBPM)

SDMX Global Data Structure Definitions (DSDs)

• Since 2011, SDMX has brought together the technical and statistical worlds in several domains to work on “Global DSDs”
• Global DSDs improve on the heterogeneous reporting methods that we have today
• They improve many aspects of data exchange, including:
  – Better timeliness by allowing data queries (rather than sending data many times)
  – Avoiding the burden of maintaining many different reporting systems and exchange agreements
  – Saving money by reusing IT systems, standards, and methodology
• Find them in the Global Registry

Status of Global DSDs

Domains/Reporting frameworks                             PUBLICATION DATE
IN PRODUCTION
National Accounts (including Gov. Finance Statistics)    2013 Q3
Balance of Payments                                      2014 Q1
Foreign Direct Investment                                2014 Q1
IN PROGRESS
International Merchandise Trade Statistics               2017 Q4
Price statistics                                         2017 Q4
Labour statistics                                        2017 Q4
Education                                                2017 Q4
Sustainable Development Goals                            2018 Q3
R&D Statistics                                           To be decided
Environmental-Economic Accounts                          To be decided
Energy Statistics                                        Envisaged

In Conclusion

• SDMX is a set of technical, content and methodological standards
• Many free tools, more coming online
• Not just a file format (though it includes that)
• Goal of SDMX: Help organise statistical metadata to make the exchange of information easier and more efficient
• Has a very active community, new features such as SDMX-CSV
• Saves resources, improves quality and timeliness of data exchange
• For transparency:
  – SDMX helps to manage, catalog and surface metadata through registries (such as the global registry)
  – Standard exchange mechanisms and structures help with comparability and linking between datasets
  – SDMX initiative is aligned with other standards through the HLG and international organisation communities

Thank you

David Barraclough – OECD
[email protected]
http://sdmx.org
SDMX LinkedIn Group
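
Appendix: building a data query

The OECD KEI query shown under Querying Data follows a visible pattern: the codes within one key position are joined with "+", the key positions are joined with ".", and the time range and format are passed as query parameters. The sketch below assembles such a URL. It is a minimal illustration, not an official client: the function name build_query_url and its parameters are hypothetical, the base URL is copied from the slide (whether that endpoint is still live is not assumed), and the meaning attached to each key position in the comments is a guess from the codes themselves.

```python
from urllib.parse import urlencode

# Base URL copied from the query on the "Querying Data" slide.
# Assumption: no claim is made that this endpoint is still available.
BASE = "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData"


def build_query_url(dataset, key_positions, start_time, end_time, fmt="compact_v2"):
    """Assemble a query URL in the style shown on the slide.

    key_positions is a list of lists: one inner list of codes per key position.
    Codes in a position are joined with '+', positions are joined with '.'.
    This sketch requires at least one code per position (a simplification).
    """
    if not key_positions or any(not codes for codes in key_positions):
        raise ValueError("every key position needs at least one code")
    key = ".".join("+".join(codes) for codes in key_positions)
    params = urlencode({"startTime": start_time, "endTime": end_time, "format": fmt})
    return f"{BASE}/{dataset}/{key}/all?{params}"


# A shortened version of the slide's query (fewer codes per position).
url = build_query_url(
    "KEI",
    [
        ["PS", "PR"],        # first position: indicator codes (assumed meaning)
        ["AUS", "USA"],      # second position: country codes (assumed meaning)
        ["GP", "GY", "ST"],  # third position: measure codes (assumed meaning)
        ["A", "Q", "M"],     # fourth position: frequency codes (assumed meaning)
    ],
    2015,
    2017,
)
print(url)
```

Nothing is sent over the network here; the resulting string can be pasted into a browser or handed to any HTTP client. Keeping the key as a list of lists makes it easy to add or drop codes without editing a long URL by hand.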