3.2 S-DWH Information Systems Architecture

The Information Systems connect the business to the infrastructure. In our context this is represented by a conceptual organization of the effective S-DWH which is able to support tactical demands.

In the layered architecture, in terms of data systems, we identify:

- staging data, which are usually of a temporary nature; their contents can be erased, or archived, after the DW has been loaded successfully;
- operational data, a database designed to integrate data from multiple sources for additional operations on the data. The data are then passed back to operational systems for further operations and to the data warehouse for reporting;
- the Data Warehouse, the central repository of data, which is created by integrating data from one or more disparate sources and stores current as well as historical data;
- data marts, which are kept in the access layer and are used to get data out to the users. Data marts are derived from the primary information of a data warehouse and are usually oriented to specific business lines.

Figure 3 - Information Systems Architecture (column headings: Source Layer, Integration Layer, Interpretation and Analysis Layer, Access Layer; data systems: Staging Data, Operational Data, Data Warehouse, Data Mart; example sources: ICT survey, SBS survey, ETrade survey, ADMIN)

The management of the metadata used and produced in all the different layers of the warehouse is specifically defined in the Metadata framework [1] and the Micro data linking [2] deliverables.
Metadata are used for the description, identification and retrieval of information, and they link the various layers of the S-DWH through the mapping of different metadata description schemes. The metadata repository contains all statistical actions, all classifiers that are in use, input and output variables, selected data sources, descriptions of output tables, questionnaires and so on. All these meta-objects are collected during the design phase into one metadata repository. This configures a metadata-driven system that is also well suited to supporting the management of actions, or IT modules, in generic workflows.

In order to suggest a possible path towards process optimization and cost reduction, in this chapter we introduce a data model and a possible simple description of a generic workflow, which links the business model with the information system in the S-DWH.

[1] Lundell L.G. (2012) Metadata Framework for Statistical Data Warehousing, ver. 1.0. Deliverable 1.1
[2] Ennok M. et al. (2013) On Micro data linking and data warehousing in production of business statistics, ver. 1.1. Deliverable 1.4

3.2.1 S-DWH is a metadata-driven system

The over-arching Metadata Management of an S-DWH as a metadata-driven system supports Data Management within the statistical program of an NSI, and it is therefore vital to manage the metadata thoroughly. To address this we refer to the metadata chapter, where metadata are organized in six main categories.
The six main categories are:

- active metadata: metadata stored and organized in a way that enables operational use, manual or automated;
- passive metadata: any metadata that are not active;
- formalised metadata: metadata stored and organised according to standardised codes, lists and hierarchies;
- free-form metadata: metadata that contain descriptive information using formats ranging from completely free-form to partly formalised;
- reference metadata: metadata that describe the content and quality of the data in order to help the user understand and evaluate them (conceptually);
- structural metadata: metadata that help the user find, identify, access and utilise the data (physically).

Metadata in each of these categories belong to a specific type, or subset, of metadata. The five subsets are:

- statistical metadata: data about statistical data, e.g. variable definition, register description, code list;
- process metadata: metadata that describe the expected or actual outcome of one or more processes using evaluable and operational metrics;
- quality metadata: any kind of metadata that contribute to the description or interpretation of the quality of data;
- technical metadata: metadata that describe or define the physical storage or location of data;
- authorization metadata: administrative data that are used by programmes, systems or subsystems to manage users' access to data.

In the S-DWH, one of the key factors is the consolidation of multiple databases into a single database and the identification of redundant columns of data for consolidation or elimination. This requires coherence of the statistical metadata, in particular of the managed variables. Statistical actions should collect unique input variables, not just rows and columns of tables in a questionnaire. Each input variable should be collected and processed once in each period of time, so that the outcome, the input variable in the warehouse, can be used to produce various different outputs.
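
The rule that each input variable is collected once per period can be checked mechanically once every statistical action is described by the standardized variables it plans to collect. The following is a minimal sketch under that assumption, not part of the S-DWH specification: the source names, the variable names, the data layout and the function find_redundant_variables are all hypothetical.

```python
from collections import defaultdict

# Hypothetical design-phase metadata: the standardized input variables
# that each statistical action plans to collect in one period.
planned_collection = {
    "SBS survey": ["turnover", "employees", "investment"],
    "ICT survey": ["employees", "web_sales"],
    "Admin source": ["turnover", "vat_paid"],
}


def find_redundant_variables(plan):
    """Return {variable: sorted sources} for variables planned more than once."""
    sources_by_variable = defaultdict(list)
    for source, variables in plan.items():
        for variable in variables:
            sources_by_variable[variable].append(source)
    return {
        variable: sorted(sources)
        for variable, sources in sources_by_variable.items()
        if len(sources) > 1
    }


redundant = find_redundant_variables(planned_collection)
for variable, sources in sorted(redundant.items()):
    print(f"{variable}: planned in {', '.join(sources)}")
```

A report of this kind would only flag candidates for consolidation; deciding which source should supply each variable remains a design decision for the statisticians.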
A variable-based approach triggers changes in almost all phases of the statistical production process. Samples, questionnaires, processing rules, imputation methods, data sources, etc. must therefore be designed and built in compliance with standardized input variables, not according to the needs of one specific statistical action.

A variable-based statistical production system reduces the administrative burden, lowers the cost of data collection and processing, and makes it possible to produce richer statistical output faster. Of course, this is true only within the boundaries of a standardized design. This means that a coherent approach can be used if statisticians plan their actions following a logical hierarchy of variable estimation in a common frame. What IT must then support is an adequate environment for designing this strategy.

As an example of such a common strategy, we consider Surveys 1 and 2, which collect data with questionnaires, and one administrative data source. This time, the decisions taken in the design phase (design of the questionnaire, sample selection, imputation method, etc.) are made "globally", taking into consideration all three data sources. In this way, the integration of processes gives us reusable data in the warehouse. Our warehouse now contains each variable only once, making it much easier to reuse and manage our valuable data.

Figure 4 - Integration to achieve each variable only once - Information Re-use

Another way of reusing data that are already in the warehouse is to calculate new variables. The following figure illustrates the scenario where a new variable E is calculated from variables C* and D, which are already loaded into the warehouse. This means that data can be moved back from the warehouse to the integration layer. Warehouse data can be used in the integration layer for multiple purposes; calculating new variables is only one example.
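
The derivation of a new variable from variables already in the warehouse can be sketched as follows. This is an illustration only: the text does not state how E is computed from C* and D, so the rule E = C* - D, the record layout, the unit identifiers and the function derive_variable are assumptions.

```python
# Hypothetical warehouse content: one record per (unit, period), each
# variable stored once. C_star stands for the variable C* in the text.
warehouse = {
    ("unit_001", "2013"): {"C_star": 120.0, "D": 30.0},
    ("unit_002", "2013"): {"C_star": 80.0, "D": 20.0},
    ("unit_003", "2013"): {"C_star": 50.0},  # D is missing for this unit
}


def derive_variable(records, name, inputs, rule):
    """Add variable `name` to every record that holds all `inputs`.

    Returns the keys of the records for which the variable could not
    be derived because at least one input is missing.
    """
    skipped = []
    for key, variables in records.items():
        if all(i in variables for i in inputs):
            variables[name] = rule(*(variables[i] for i in inputs))
        else:
            skipped.append(key)
    return skipped


# Assumed rule, for illustration only: E = C* - D.
not_derived = derive_variable(
    warehouse, "E", ["C_star", "D"], lambda c, d: c - d
)
```

Returning the units that could not be derived, rather than silently skipping them, keeps the gap visible so that it can be handled in the integration layer, for example by imputation.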
Integrated variables based on warehouse data open the way to new subsequent statistical actions that do not have to collect and process data and can produce statistics directly from the warehouse. By skipping the collection and processing phases, new statistics and analyses can be produced much faster and more cheaply than with a classical survey.

Figure 5 - Building a new variable - Information Re-Use

Designing and building a statistical production system according to the integrated warehouse model initially takes more time and effort than building the stovepipe model. However, the maintenance costs of an integrated warehouse system should be lower, and new products, which can be produced faster and more cheaply to meet changing needs, should soon compensate for the initial investment.

The challenge in a data warehouse environment is to integrate, rearrange and consolidate large volumes of data from different sources to provide a new unified information base for business intelligence. To meet this challenge, we propose that the processes defined in GSBPM are distributed into four groups of specialized functionalities, each represented as a layer in the S-DWH.

3.2.2 Layered approach of a full active S-DWH

The layered architecture reflects a conceptual organization in which we consider the first two levels as pure statistical operational infrastructures, functional for acquiring, storing, editing and validating data, and the last two layers as the effective data warehouse, i.e. levels in which data are accessible for analysis. These reflect two different IT environments: an operational one (where we support semi-automatic computer interaction systems) and an analytical one (the warehouse, where we maximize free human interaction).
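
The split of the four layers into two IT environments can be written down in a few lines. The sketch below only restates the classification given in the text; the names Layer and environment are hypothetical and carry no meaning beyond this example.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The four S-DWH layers, ordered from data entry to data delivery."""
    SOURCE = 1
    INTEGRATION = 2
    INTERPRETATION_AND_ANALYSIS = 3
    ACCESS = 4


def environment(layer):
    """Classify a layer as 'operational' (first two) or 'analytical' (last two)."""
    return "operational" if layer <= Layer.INTEGRATION else "analytical"


for layer in Layer:
    print(f"{layer.name}: {environment(layer)}")
```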
Figure 6 - S-DWH Layered Architecture (from top to bottom: Access Layer and Interpretation and Analysis Layer, grouped as Data Warehouse; Integration Layer and Sources Layer, grouped as Operational Data)

3.2.3 Source layer

The Source layer is the gathering point for all data that are going to be stored in the Data Warehouse. Input to the Source layer is data from both internal and external sources. Internal data are mainly data from surveys carried out by the NSI, but they can also be data from maintenance programs used for manipulating data in the Data Warehouse. External data are administrative data, i.e. data collected by someone else, originally for some other purpose.

The structure of data in the Source layer depends on how the data are collected and on the design of the various NSI data collection processes. The specifications of the collection processes and of their output, the data stored in the Source layer, have to be thoroughly described. Vital information includes the name, meaning, definition and description of every collected variable. The collection process itself must also be described, for example the source of a collected item, when it was collected and how.

When data enter the Source layer from an external source or administrative archive, the data and the related metadata must be checked for completeness and coherence. From a data-structure point of view, external data are stored in the same structure in which they arrive. Integration into the integration layer should then be implemented by mapping each source variable to its target variable, i.e. the variable internal to the S-DWH.
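
The completeness check and the source-to-target mapping described above might look as follows in practice. This is a minimal sketch under stated assumptions: the variable names, the mapping table VARIABLE_MAP and both helper functions are hypothetical, and a real check would also cover the coherence of the delivered metadata.

```python
# Hypothetical mapping from the variable names of an external source to
# the internal (target) variable names of the S-DWH.
VARIABLE_MAP = {
    "ent_id": "unit_id",
    "tot_turnover": "turnover",
    "n_empl": "employees",
}


def check_completeness(record, mapping):
    """Return the mapped source variables that are absent or empty."""
    return [src for src in mapping if record.get(src) in (None, "")]


def map_to_target(record, mapping):
    """Build a new record with target names; unmapped variables are dropped."""
    return {target: record[src] for src, target in mapping.items() if src in record}


# Records kept in the Source layer exactly as they were delivered.
source_records = [
    {"ent_id": "A1", "tot_turnover": 1500, "n_empl": 12, "note": "as delivered"},
    {"ent_id": "B2", "tot_turnover": None, "n_empl": 3},
]

accepted, rejected = [], []
for record in source_records:
    missing = check_completeness(record, VARIABLE_MAP)
    if missing:
        rejected.append((record.get("ent_id"), missing))
    else:
        accepted.append(map_to_target(record, VARIABLE_MAP))
```

The source records are never modified: the mapping produces new records for the integration layer, which is consistent with storing external data in the structure in which they arrive.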