3rd Meeting of the Microdata Access Network Group (MANG)

EUROPEAN COMMISSION
EUROSTAT
Directorate B: Methodology; Dissemination; Cooperation in the European Statistical System
Unit B-1: Methodology: Innovation in Official Statistics

ESTAT/B1/MANG(19)3
Available in EN only

3rd meeting of the Microdata Access Network Group (MANG)
Luxembourg, 13 June 2019
Venue: Luxembourg Foyer Europeen, 10, rue Heinrich Heine, L-1720 Luxembourg – Gare
9:30-16:00

Item 3
Metadata for microdata

Metadata standard for microdata

1. INTRODUCTION

Eurostat and the European Statistical System (ESS) have a long tradition of providing metadata to the user. The Eurostat webpage on metadata is available at: https://ec.europa.eu/eurostat/data/metadata

To improve services for users of microdata, Eurostat is considering a metadata standard for its microdata, both for research purposes and to accompany its public use files.

To inform the discussion on a metadata standard for microdata, Eurostat has asked some experts in national statistical institutes to describe what metadata they provide with the microdata. The results are summarised in paragraph 3. In the discussion on these outcomes it was stressed that user requirements should be considered.

In this agenda item the Members of the MANG are invited to reflect on the user requirements on metadata. Paragraph 2 offers a brief introduction from Eurostat's perspective, and paragraph 3 summarises the situation in some national statistical institutes. This may be used as a trigger for formulating requirements, highlighting both good and bad practices.

2. METADATA FOR EUROPEAN MICRODATA

The added value of European microdata lies in the standardisation across Member States, which allows research across several countries in the EU.
In the ideal situation the metadata should allow several views:
• Across countries, to assess the comparability over countries;
• Across time, to assess the comparability over time;
• Over different versions of the same data set: the full data set as used in dissemination of official statistics, the partially anonymised scientific use files and the public use files; along this line the protection process could be followed.

Some metadata are at the level of the survey as a whole (per country and per year), for instance sample size, sample design, response rate and confidentiality treatment. Other metadata are at the level of variables: definition, relation with other variables, format.

All this also requires a kind of demographic description of the variables: which variables are completely new, which are continuations of previous variables, which variables appear with a certain pattern (special topics/modules), etc.

All this information is available in principle in the national statistical institutes and in Eurostat, but it is scattered over separate documents and usually stored along with the data according to the annual production rhythm.

Another challenge is the long list of exceptions:
• countries that implemented the new version of a classification before it was required;
• countries that implemented it after it was required;
• countries that made important changes to the data collection or processing methods;
• countries that requested additional protection of the microdata because of the size of the country, etc.

Bringing all of this together would be a considerable task. The MANG is invited to reflect on user requirements and on priorities.

3. METADATA IN SOME NATIONAL STATISTICAL INSTITUTES

Eurostat has asked some experts in national statistical institutes to describe what metadata they provide with the microdata. The results can be summarised as follows:

Q1. What types of metadata should be made available to researchers?
For example: variable definitions, survey design, estimation method, protection approach.

The generally held view is that as much metadata as possible should be provided to enable the user to correctly analyse the data and draw accurate and reliable conclusions. This should include general methodological information about the data source (be it administrative or survey data), its processing and its potential uses. Wherever possible and applicable, the following points should be captured:
• How and why the data has been collected, its scope and the timescale covered;
• Details about the production / extraction process, including any imputation procedures. In the case of survey data, this should include survey design, target population, sample size and response rate. State whether it is a cross-sectional or longitudinal survey, and detail any stratification or weighting criteria applied;
• Time-series information if changes have been made to the data collection practices over time;
• A list of variable codes complete with their classification, nomenclature, description / definition, type, format and length;
• A list of potential values for each variable with a translation of what each code means, including how missing data and “not applicable” cells are processed and annotated;
• Details of any protection methods used;
• Reliability thresholds, the quality of the variables and their potential uses or caveats. Advice on how variables could be applied as proxies.

Q2. Is the metadata that goes with public use files different from the metadata for scientific use of confidential microdata? If yes, please describe the differences.

For some national experts this question was not directly applicable to their work experience, since their organisation does not publish PUFs.
Generally it was felt that, since PUFs are a test / teaching aid whose users include researchers working to prepare syntax prior to gaining access to the SUFs, the metadata for PUFs and SUFs should not differ, except to reflect the differences in the level of detail provided by each file. However, PUF users also include the general public, and a requirement for more descriptive (less technical) metadata is likely. The PUF metadata should also include details about the SDC methods applied (including the methods of synthetisation, if applicable), and how these affect the research utility of the data.

Q3. What formats do you use for the data and the metadata?

Responses were mixed, from standard MS packages to a range of specialist tools.

Data is often provided in txt, csv, html, xls, xlsx, dbf or pdf format. Direct outputs from SAS, SPSS, Stata, R, and Oracle Discoverer for OLAP (on-line analytical processing) are also available. [Question – does the user get to choose?]

Metadata is generally provided in pdf, doc, docx, xls or xlsx format. Additionally, the following specialist tools / formats were cited by EG SDC members:
• Insee – Beyond 20/20, DDI[1] and RDF[2];
• ECB – SDMX[3], XBRL / DPM and SDD;
• BG NSI – SDMX, JSON-stat, RDF N-Triples and INFOSTAT;
• Statistics Finland – JSON-stat;
• Statbel – RDF, Turtle.

Q4. What software do you use to produce, control and disseminate the metadata?

A diverse range of responses was received, with only 2 national experts citing MS Office software. Instead, organisations have either procured specialist “commercial off the shelf” (COTS) packages, or developed their own bespoke solutions. Since such diversity is difficult to summarise, the detail of individual responses is set out below:

Statistics Sweden – The MONA (Microdata Online Access) system provides secure access to microdata via the internet.
Data is processed and analysed via a suite of applications[4] and aggregated results are e-mailed to the user (the microdata remains at Statistics Sweden, which supplies both the hardware and software). The metadata can be accessed via MONA and MetaPlus; the latter is a tool designed to centrally co-ordinate and harmonise Statistics Sweden’s metadata repository. The metadata is also presented on each survey’s internet home page.

Hungarian Central Statistics Office – Investigating software capable of handling the DDI format; currently testing the data publishing and online analysis tool NESSTAR.

ECB – Manages a data inventory using the Informatica business glossary tool, which operates on the ISO (International Organization for Standardization) model to ensure global interoperability. A separate Single Data Dictionary (SDD) is also maintained, and work is underway to integrate the two systems in an Oracle database to create a browser application that allows collaborative management of the metadata, and has a sound approvals

[1] Data Documentation Initiative (DDI) is an international alliance aimed at creating and maintaining a technical documentation standard for describing and preserving statistical metadata, particularly surveys and questionnaires. Standardising this documentation involves modelling the various statistical concepts (questions, variables, code lists, etc.) and their relationships in XML documents.

[2] Recommended by the World Wide Web Consortium (W3C), the Resource Description Framework (RDF) aims to create a global information network by facilitating the dissemination of data and metadata according to “linked data” principles. This promotes the publication of common structured and connected data on the internet rather than isolated sets of independent data and metadata.

[3] Statistical Data and Metadata eXchange (SDMX) is an international initiative designed to standardise data and metadata exchange, including data (DSD) and metadata (MSD) structure definitions, concepts, code lists, data flow and IT architecture. It is sponsored by 7 international institutions, namely BIS, ECB, IMF, OECD, UN, World Bank and Eurostat.

[4] FreeMat, GeoDa, QGIS, LibreOffice, MPlus, Management Studio, Python, R, R-Studio, SAS, SPSS, STATA, StatTransfer, SuperCross and Tinn-R.