Title: S-DWH Manual Chapter: 5 “Metadata”
Version: 1.1
Author / NSI: CoE DWH
Date: Feb 2017

Handbook to set up a S-DWH: Roadmap for a Design phase

Content

1 Metadata
  1.1 Fundamental principles
    1.1.1 Metadata and data basic definitions
    1.1.2 Categories
      1.1.2.1 Passive or Active category dimension
      1.1.2.2 Formalized or Free-form category dimension
      1.1.2.3 Reference or Structural category dimension
    1.1.3 Metadata subsets
      1.1.3.1 Statistical metadata
      1.1.3.2 Process metadata
      1.1.3.3 Quality metadata
      1.1.3.4 Technical metadata
      1.1.3.5 Authorization metadata
      1.1.3.6 Data models
    1.1.4 Metadata architecture
  1.2 Business Architecture: metadata
    1.2.1 Preparatory work - Specify needs (Phase I of GSBPM)
    1.2.2 Preparatory work - Design (Phase 2 of GSBPM)
    1.2.3 Preparatory work - Build (Phase 3 of GSBPM)
    1.2.4 Critical area
    1.2.5 Metadata of the S-DWH layers
      1.2.5.1 Source layer metadata
      1.2.5.2 Integration layer metadata
      1.2.5.3 Interpretation and data analysis layer metadata
      1.2.5.4 Data access layer metadata
      1.2.5.5 Summary of S-DWH layers and metadata categories
  1.3 Metadata System
    1.3.1 Metadata model
      1.3.1.1 Metadata model, general references
      1.3.1.2 Metadata models guidelines
    1.3.2 Metadata functionality groups
      1.3.2.1 Metadata creation
      1.3.2.2 Metadata usage
      1.3.2.3 Metadata maintenance
      1.3.2.4 Metadata evaluation
    1.3.3 Metadata functionalities by layers: Source layer
    1.3.4 Metadata functionalities by layers: Integration layer
    1.3.5 Metadata functionalities by layers: Interpretation and data analysis layer
    1.3.6 Metadata functionalities by layers: Data access layer
  1.4 Metadata and SDMX
    1.4.1 The SDMX standard
    1.4.2 Structural metadata
    1.4.3 Reference Metadata
    1.4.4 Content Oriented Guidelines
    1.4.5 SDMX metadata within the S-DWH layers

1 Metadata

Metadata are data which describe other data.
When building and maintaining a S-DWH, the following types of metadata play significant roles:

- active metadata: the number of objects (variables, value domains, etc.) stored makes it necessary to give the users (persons and software) active assistance in finding and processing the data;
- formalized metadata: the number of metadata items will be large, and the requirement for metadata to be active makes it necessary to structure the metadata very well;
- structural metadata: active metadata must be structural, at least in part;
- process metadata: since the data warehouse supports many concurrent users, it is very important to keep track of usage, performance, etc.

In a data warehouse that has been less than perfectly designed, one user’s choice of tool or operation could impair performance for other users. An analysis of process metadata can provide input for correcting this anomaly.

The table below shows the possible combinations of metadata categories and subsets. The cells indicate which combinations are of general interest for statistics production (“gen”) and which are of particular interest for a S-DWH (“sdw”). Most of the remaining combinations are possible, but less common or less likely to be useful.

Table: Metadata categories and subsets
Columns: Metadata subset | Metadata category, split into Formalized and Free-form; each of these is split into Reference and Structural; each of those is split into Act (active) and Pas (passive)
Marks per row, in left-to-right order:
  Statistical: sdw, gen
  Process: sdw, sdw, sdw, gen, gen
  Quality: sdw, gen
  Technical: sdw
  Authorization: gen
  Data model: sdw, sdw

Consistency within the metadata layer is an example of an attribute that is regarded as desirable in any statistics production environment, but is considered essential in a S-DWH environment. In a S-DWH, all metadata items must be uniquely identified, and there must be one-to-one relationships between identity and definition, and between identity and name.
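The identification rule above can be sketched in code. What follows is a minimal Python sketch under stated assumptions, not a prescribed S-DWH design: the class names `MetadataItem` and `MetadataRegistry`, the identifier `U001` and the placeholder definition texts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataItem:
    """One metadata item; all three fields are hypothetical illustrations."""
    identity: str     # unique identifier of the item
    name: str         # exactly one name per identity
    definition: str   # exactly one definition per identity


class MetadataRegistry:
    """Accepts an item only if identity, name and definition are all unused."""

    def __init__(self):
        self._items = {}        # identity -> MetadataItem
        self._names = {}        # name -> identity
        self._definitions = {}  # definition -> identity

    def register(self, item):
        # Each check protects one side of the one-to-one relationships.
        if item.identity in self._items:
            raise ValueError(f"identity already in use: {item.identity}")
        if item.name in self._names:
            raise ValueError(f"name already in use: {item.name}")
        if item.definition in self._definitions:
            raise ValueError("definition already registered under another identity")
        self._items[item.identity] = item
        self._names[item.name] = item.identity
        self._definitions[item.definition] = item.identity

    def get(self, identity):
        return self._items[identity]


registry = MetadataRegistry()
registry.register(
    MetadataItem("U001", "statistical unit", "placeholder definition A")
)

# Reusing an identity for a different definition is rejected.
try:
    registry.register(
        MetadataItem("U001", "statistical unit", "placeholder definition B")
    )
except ValueError as err:
    print(err)
```

Keeping three separate lookups means that an identity, a name and a definition can each appear only once, which is one simple way to enforce the one-to-one relationships between identity and definition and between identity and name.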
The concept “statistical unit”, for example, must be given an identity and a definition, and these must be used consistently in the S-DWH regardless of source, context, etc. If a slightly different definition is needed, it must be given a new identity and a new name.

In the S-DWH it is desirable to be able to analyze data as time series at a low level of aggregation, or even to perform longitudinal analysis at unit level. To support these functions, metadata items should carry validity information: “valid from 01-01-2001”, “valid until 31-12-2015”.

In order to be metadata driven, the S-DWH places higher demands on process metadata, and it is more likely to have a built-in ability to produce process metadata. The S-DWH is not only a data store; it is also a system of processes that refine its data from input to output. These processes need active metadata: automated processes need formalized process metadata, such as programs, parameters, etc., and manual processes need process metadata such as instructions, scripts, etc.

1.1 Fundamental principles

In order to use metadata in a S-DWH, basic definitions and common terminology need to be agreed. This section covers:

- basic definitions
- categories
- subsets
- architecture

1.1.1 Metadata and data basic definitions

General definitions of metadata can be found in many manuals. Most of them are very short and simple. The most commonly used generic definition states that “Metadata are data about data”, but a more precise definition states:

[Def 1.1] Metadata is data that defines and describes other data.¹

This definition obviously covers all kinds of documentation that refer to any type of data in a data store. In the context of a S-DWH we use the term statistical metadata, which applies to metadata that refer to data stored in a S-DWH.

[Def 1.2]