S-DWH Manual Chapter 5: “Meta Data”



Title: S-DWH Manual Chapter 5 “Meta data”
Version: 1.1
Author: CoE DWH
Date: Feb 2017

Handbook to set up a S-DWH: Roadmap for a Design phase

Content

1 Metadata
  1.1 Fundamental principles
    1.1.1 Metadata and data basic definitions
    1.1.2 Categories
      1.1.2.1 Passive or Active category dimension
      1.1.2.2 Formalized or Free-form category dimension
      1.1.2.3 Reference or Structural category dimension
    1.1.3 Metadata subsets
      1.1.3.1 Statistical metadata
      1.1.3.2 Process metadata
      1.1.3.3 Quality metadata
      1.1.3.4 Technical metadata
      1.1.3.5 Authorization metadata
      1.1.3.6 Data models
    1.1.4 Metadata architecture
  1.2 Business Architecture: metadata
    1.2.1 Preparatory work – Specify needs (Phase 1 of GSBPM)
    1.2.2 Preparatory work – Design (Phase 2 of GSBPM)
    1.2.3 Preparatory work – Build (Phase 3 of GSBPM)
    1.2.4 Critical area
    1.2.5 Metadata of the S-DWH layers
      1.2.5.1 Source layer metadata
      1.2.5.2 Integration layer metadata
      1.2.5.3 Interpretation and data analysis layer metadata
      1.2.5.4 Data access layer metadata
      1.2.5.5 Summary of S-DWH layers and metadata categories
  1.3 Metadata System
    1.3.1 Metadata model
      1.3.1.1 Metadata model, general references
      1.3.1.2 Metadata models guidelines
    1.3.2 Metadata functionality groups
      1.3.2.1 Metadata creation
      1.3.2.2 Metadata usage
      1.3.2.3 Metadata maintenance
      1.3.2.4 Metadata evaluation
    1.3.3 Metadata functionalities by layers: Source layer
    1.3.4 Metadata functionalities by layers: Integration layer
    1.3.5 Metadata functionalities by layers: Interpretation and data analysis layer
    1.3.6 Metadata functionalities by layers: Data access layer
  1.4 Metadata and SDMX
    1.4.1 The SDMX standard
    1.4.2 Structural metadata
    1.4.3 Reference metadata
    1.4.4 Content Oriented Guidelines
    1.4.5 SDMX metadata within the S-DWH layers
1 Metadata

Metadata are data which describe other data. When building and maintaining a S-DWH, the following types of metadata play significant roles:

• active metadata – the number of objects stored (variables, value domains, etc.) makes it necessary to give users (both persons and software) active assistance in finding and processing the data;
• formalized metadata – the number of metadata items will be large, and the requirement for metadata to be active makes it necessary to structure the metadata very well;
• structural metadata – active metadata must be structural, at least in part;
• process metadata – since the data warehouse supports many concurrent users, it is very important to keep track of usage, performance, etc. In a data warehouse that is less than perfectly designed, one user’s choice of tool or operation could impair performance for other users. An analysis of process metadata can be an input to correcting this anomaly.

The table below shows the possible combinations of metadata categories and subsets. The cells indicate which combinations are of general interest for statistics production (“gen”) and which are of particular interest for a S-DWH (“sdw”). Most of the remaining combinations are possible, but less common or less likely to be useful.

Metadata categories and subsets

Columns: metadata category, split into Formalized and Free-form; each of these is split into Reference and Structural, and each of those into Active (Act) and Passive (Pas). Rows: metadata subset, with cell entries listed left to right.

Statistical: sdw, gen
Process: sdw, sdw, sdw, gen, gen
Quality: sdw, gen
Technical: sdw
Authorization: gen
Data model: sdw, sdw

Consistency within the metadata layer is an example of an attribute that is desirable in any statistics production environment but essential in a S-DWH environment. In a S-DWH, all metadata items must be uniquely identified, and there must be one-to-one relationships between identity and definition, and between identity and name.
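The identity rule can be illustrated with a small registry. This is a minimal Python sketch under our own assumptions, not part of the S-DWH manual: the names `MetadataItem`, `MetadataRegistry`, the identifier format "C-001" and the placeholder definition texts are all hypothetical. The category dimensions (Active/Passive, Formalized/Free-form, Reference/Structural) and the subsets are taken from the table of metadata categories and subsets.

```python
from dataclasses import dataclass
from enum import Enum


class Subset(Enum):
    """Metadata subsets listed in the table of categories and subsets."""
    STATISTICAL = "statistical"
    PROCESS = "process"
    QUALITY = "quality"
    TECHNICAL = "technical"
    AUTHORIZATION = "authorization"
    DATA_MODEL = "data model"


@dataclass(frozen=True)
class MetadataItem:
    identity: str      # hypothetical unique identifier, e.g. "C-001"
    name: str          # must map one-to-one to the identity
    definition: str    # must map one-to-one to the identity
    subset: Subset
    active: bool       # True = Active, False = Passive
    formalized: bool   # True = Formalized, False = Free-form
    structural: bool   # True = Structural, False = Reference


class MetadataRegistry:
    """Hypothetical registry enforcing identity<->name and identity<->definition."""

    def __init__(self) -> None:
        self._items: dict[str, MetadataItem] = {}
        self._identity_by_name: dict[str, str] = {}
        self._identity_by_definition: dict[str, str] = {}

    def register(self, item: MetadataItem) -> None:
        # All checks run before any mutation, so a rejected item leaves no trace.
        if item.identity in self._items:
            raise ValueError(f"identity {item.identity!r} is already registered")
        if item.name in self._identity_by_name:
            owner = self._identity_by_name[item.name]
            raise ValueError(f"name {item.name!r} already belongs to {owner!r}")
        if item.definition in self._identity_by_definition:
            owner = self._identity_by_definition[item.definition]
            raise ValueError(f"definition already belongs to {owner!r}")
        self._items[item.identity] = item
        self._identity_by_name[item.name] = item.identity
        self._identity_by_definition[item.definition] = item.identity

    def get(self, identity: str) -> MetadataItem:
        return self._items[identity]

    def find_by_name(self, name: str) -> MetadataItem:
        return self._items[self._identity_by_name[name]]


if __name__ == "__main__":
    registry = MetadataRegistry()
    registry.register(MetadataItem("C-001", "statistical unit", "Definition text A",
                                   Subset.STATISTICAL, active=True, formalized=True, structural=True))
    # Reusing the name with a slightly different definition is rejected...
    try:
        registry.register(MetadataItem("C-002", "statistical unit", "Definition text A'",
                                       Subset.STATISTICAL, active=True, formalized=True, structural=True))
    except ValueError as err:
        print("rejected:", err)
    # ...so the variant needs both a new identity and a new name.
    registry.register(MetadataItem("C-002", "statistical unit (variant)", "Definition text A'",
                                   Subset.STATISTICAL, active=True, formalized=True, structural=True))
    print(registry.find_by_name("statistical unit (variant)").identity)
```

Comparing names and definitions by exact string match is a simplification; a production registry would likely normalize text (case, whitespace) before checking uniqueness.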
For instance, under this rule the concept “statistical unit” must be given an identity and a definition, and these must be used consistently throughout the S-DWH regardless of source, context, etc. If a slightly different definition is needed, it must be given a new identity and a new name.

In the S-DWH it is desirable to be able to analyze data as time series at a low level of aggregation, or even to perform longitudinal analysis at unit level. To support these functions, metadata items should carry validity information, e.g. “valid from 01-01-2001”, “valid until 31-12-2015”.

To be metadata driven, the S-DWH places higher demands on process metadata, and it is more likely to have a built-in ability to produce process metadata. The S-DWH is not only a data store but also a system of processes that refine its data from input to output. These processes need active metadata: automated processes need formalized process metadata, such as programs and parameters, while manual processes need process metadata such as instructions and scripts.

1.1 Fundamental principles

In order to use metadata in a S-DWH, basic definitions and common terminology need to be agreed upon. This section covers:

• basic definitions
• categories
• subsets
• architecture

1.1.1 Metadata and data basic definitions

General definitions of metadata can be found in many manuals. Most of them are very short and simple. The most commonly used generic definition states that “Metadata are data about data”, but a more precise definition states:

[Def 1.1] Metadata is data that defines and describes other data.¹

This definition clearly covers all kinds of documentation that refer to any type of data in a data store. In the context of a S-DWH, we use the term statistical metadata for metadata that refer to data stored in a S-DWH.

[Def 1.2]