A Generic Data Model for Statistical Indicators and Measurement Units to Enable User-Specific Representation Formats


Michaela Denk¹, Wilfried Grossmann²
¹International Monetary Fund, Statistics Department*, e-mail: [email protected]
²University of Vienna, Institute for Scientific Computing, e-mail: [email protected]

Abstract

Based on a review of existing standards and guidelines as well as the current international practice of modeling measurement units and related concepts in representation of economic and statistical data, a generic data model for statistical indicators and measurement units is introduced that may contribute to the further development of the SDMX content oriented guidelines in terms of harmonizing cross-domain concepts such as measurement units. Examples from databases of SDMX sponsor organizations are used to demonstrate the applicability of the proposed data model to a broad range of statistical indicators and its ability to serve as a basis for the creation of user-specific data representation formats. This is achieved through customizable combinations of the structural elements of the generic data model according to user requirements and illustrates the practical relevance of the presented ideas.

Keywords: Metadata, SDMX, semantic decomposition.

1. Introduction

In data exchange the concept of indicator plays a crucial role. Proper understanding of the indicator depends on knowledge of how the indicator was calculated and what kinds of units are used for measurement. The SDMX Content Oriented Guidelines (COG) (2009) can be regarded as the most prominent current effort focusing on the harmonization of cross-domain concepts for data exchange. The guidelines recommend practices for creating interoperable data and metadata sets using the SDMX technical standards with the intent of generic applicability across subject-matter domains.
In the Metadata Common Vocabulary (Annex 4 of SDMX COG) five cross-domain concepts are described for the exchange of indicators and units of measure. The (statistical) indicator itself is defined as item 331 by “A data element that represents statistical data for a specified time, place, and other characteristics, and is corrected for at least one dimension (usually size) to allow for meaningful comparisons”. Four items in SDMX COG are related to the measurement unit, viz. unit of measure (item 384), adjustment (5; included in unit or indicator in many statistical databases as illustrated in Denk and Grossmann (2010)), base period (19; relevant for the interpretation of index data, series at real terms, or changes with respect to a certain period), and unit multiplier (382; specifies the power of 10 by which observation values were divided, usually for presentation purposes).

* The views expressed herein are those of the authors and should not be attributed to the IMF, its Executive Board, or its management.

Of the 14 datasets investigated by Denk, Grossmann and Froeschl (2010), four do not separate the economic indicator from the measurement unit or do not provide the unit information at all, whereas four other datasets even split other concepts such as unit multiplier or adjustment method from the unit. The other six databases separate unit of measure from economic indicator. A broad variety of unit types is used, such as index, count, ratio, rate, percentage, or changes. The cases with a single, mixed dimension combine at least information on the measured (economic) indicator, type of unit, unit of measure, adjustment method, and frequency. Several examples (e.g. "Personal computers" or "Youth unemployment rate, aged 15-24, men") omit the unit information completely, assuming that it is obvious from the indicator used. On the other hand, these indicators do give information about the underlying population to which the measured concept refers.
The analysis thus shows that, contrary to the first impression one may have (viz. that this problem is a rather easy one that was resolved a long time ago, e.g. by the International System of Units), current international practice is very diverse and neither the recommendations provided in SDMX COG (2009) nor more general measurement unit codification systems have yet been adopted by the statistical organizations investigated. One reason may be that the units for indicators are in some sense simple: besides monetary units, dimensionless units such as percent or counts dominate in applications. What makes the usage of such units difficult is the fact that the measurement instruments are rather complicated (consider for example change in GDP over years), and often the computation of the indicator provides little information about the unit used. Hence, it is not surprising that the main issue in the analyzed databases is that the measurement unit dimension in the data models used does not represent a "pure" unit of measure. Even SDMX recommendations do not treat all of these components separately.

The first step in harmonizing the structure and content of measurement units as currently used by statistical organizations is the identification of their basic building blocks and of the relations between these building blocks. Denk and Grossmann (2010) proposed a generic model for the semantic decomposition into four components, viz. Indicator, Measurement, Adjustment, and Reference.
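The four-component decomposition named above can be pictured as a small record type. The following is only a minimal sketch: the class name, the field names beyond the four component names, and the example values are hypothetical illustrations and are not taken from the paper or from any SDMX artefact. It shows how a single "mixed" label, of the kind found in the analyzed databases, could be regenerated from separately stored components.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecomposedLabel:
    """Hypothetical container for the four components named in the text."""

    indicator: str                     # what is measured
    measurement: str                   # type of unit, e.g. index or percent
    adjustment: Optional[str] = None   # e.g. a seasonal adjustment, if any
    reference: Optional[str] = None    # e.g. a base period, if any

    def mixed_label(self) -> str:
        """Join the non-empty components into one unstructured label."""
        parts = [self.indicator, self.measurement, self.adjustment, self.reference]
        return ", ".join(p for p in parts if p)


# Illustrative values only (assumed, not drawn from a real dataset).
example = DecomposedLabel(
    indicator="Gross domestic product",
    measurement="index",
    adjustment="seasonally adjusted",
    reference="2005 = 100",
)
print(example.mixed_label())
```

Keeping the components in separate fields is what would allow a data consumer to recombine them into a different representation, whereas the joined label alone cannot be reliably split again.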
The present paper further develops the proposed model by refining it with respect to three features: (i) introduction of a "Family" concept for indicator and unit of measurement that allows grouping of indicators and units into families that share the phenomena they are intended to measure and, if applicable, the derivation method by which they were obtained; (ii) generalization of the concepts of unit multiplier, adjustment, and reference, which seems necessary for covering complex indicators; (iii) inclusion of additional standard dimensions required to define the meaning of any statistical figure, such as geographical and temporal reference and measurement conditions.

The paper is structured as follows. Following some introductory remarks and basic definitions, the extended generic data model is presented in section 2. Section 3 illustrates the application of the model by means of examples, primarily from economic statistics. The derivation of customized data representation formats based on the needs of data consumers is described in section 4. Finally, the paper provides some concluding remarks and an outlook on future work.

2. A Generic Metadata Model

The starting point of our development is a look at other institutions aiming for standardization of measurement units. Two prominent examples are the International System of Units (SI) and the Unified Code for Units of Measurement (UCUM), which mainly cover measurement in the physical sciences. These systems show a number of features that are of interest for the standardization of measurement in international statistics as well. The most important feature is a unit typology with the fundamental distinction between base units and units derived from base units by mathematical formulas. Another important feature is the use of a prefix notation, which corresponds to the idea of the unit multiplier in SDMX.
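As a small illustration of the correspondence between prefix notation and the unit multiplier, the sketch below treats the multiplier as an exponent of 10, as described in the introduction. The function names and the prefix table are assumptions made for this example; only the SI prefix exponents themselves are standard.

```python
# SI prefixes expressed as exponents of 10 (standard SI values).
SI_PREFIX_EXPONENT = {"k": 3, "M": 6, "G": 9}


def to_base_units(observed: float, unit_mult: int) -> float:
    """Undo the presentation scaling: the stored value was divided by 10**unit_mult."""
    return observed * 10 ** unit_mult


def to_presentation(value: float, unit_mult: int) -> float:
    """Apply the presentation scaling: divide the full value by 10**unit_mult."""
    return value / 10 ** unit_mult


# A figure published as 1.5 "in millions" has unit multiplier 6,
# which plays the same role as the SI prefix "M".
full_value = to_base_units(1.5, SI_PREFIX_EXPONENT["M"])
print(full_value)
print(to_presentation(full_value, 3))  # the same figure shown "in thousands"
```

Storing the exponent separately from the unit of measure means the same observation can be presented in thousands or in millions without touching the unit itself.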
Besides these two major types of units, UCUM also considers in section 3.2 (§24 - §26) so-called arbitrary or procedure defined units, defined as “units whose meaning entirely depends on the measurement procedure that are not related to any UCUM or SI unit but completely depend on the measurement procedure”, and in section 4.2 (§34 - §42) customary units that correspond roughly to the idea of local and traditional usage of alternative measurement systems for quantities that can be measured by UCUM or SI (base) units. Customary units are grouped into some common families defined according to the corresponding standard unit (for example units for length such as inch, foot, or yard).

Looking at the statistical applications in the examples, we can conclude that in the sense of UCUM there are two important types of application: arbitrary units, which in a strict sense use a dimensionless unit such as percent or count with proper prefixes or multipliers (millions, thousands, etc.), and monetary units, which can be interpreted as customary units in a local setting of a fictitious or virtual universal currency unit. Moreover, for the dimensionless unit percent many different customary units are in use, for example ratio or rate. The indicator itself roughly corresponds to a combination of the UCUM name of the unit and the type of quantity measured, and requires additional specification of the measurement procedure in the sense of UCUM.

The following metadata model is an attempt to put these ideas into a more formal framework as outlined in Figure 1. It is based on the semantic decomposition of a statistical observation into four basic components: Indicator, Measurement, Unit Family, and Unit. Each component has a label attribute, a simple textual descriptor that may contain, in an unstructured way, a combination of information that is also included in other attributes.

Figure 1. Semantic Model for Indicators and Corresponding Measurement

2.1. Indicator

Strictly speaking, the Indicator itself is not part of the measurement but, as mentioned above, a description of the quantity measured. In that sense it seems important to include the indicator in the model, in order to avoid confusion between indicator and measurement unit, as is the case in several analyzed examples. The Indicator class consists of the following