Data Warehouse Engineering Process (DWEP) with U.M.L
Draft V 0.9

Data Warehouse Engineering Process (DWEP) with UML 2.1.1

Edwar Javier Herrera Osorio, [email protected]
Universidad Nacional de Colombia

Abstract: This paper presents an update of DWEP to version 2.0. DWEP uses use case diagrams, class diagrams, package diagrams, and deployment diagrams. The update keeps these diagrams in their revised form and also proposes the use of state diagrams, activity diagrams, composite structure diagrams, interaction diagrams, and interaction overview diagrams.

Index Terms: Data warehouse, UML, Unified Process, data models.

I. INTRODUCTION

The data warehouse (DW) is one of the components of business intelligence. Bill Inmon defines it as follows: “... A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decisions...” [1]. Ralph Kimball defines it as: “… the Data Warehouse is a collection of data in the form of a database that stores and organizes information that is extracted directly from operational systems (sales, production, finance, marketing, etc.) and external data…” [2]. Building a DW is a challenging and complex task because a DW concerns many organizational units and can often involve many people. In 2004, Lujan proposed [3, 4] the Data Warehouse Engineering Process (DWEP), a methodology for building the data warehouse based on the Unified Modeling Language (UML) [5] and the Unified Process (UP) [6]. It allows the user to tackle all DW design stages, from the operational data sources to the final implementation, including the definition of the ETL (Extraction, Transformation, and Loading) processes and the end users' requirements.

The rest of the paper is structured as follows. In Section 2, we briefly present some of the most important related work and point out its main shortcomings. In Section 3, we summarize DWEP: first the phases, then the workflows and the diagrams they use, based on UML version 2.1.1 (the results achieved so far), and we show how these artifacts are used in the workflows that make up the process. Finally, we present the main contributions and the future work in Section 4.

II. RELATED WORK

In recent years, several methodologies have been developed for building data warehouses. They define the following levels of abstraction [7]: conceptual, logical, and physical.

Conceptual data model: Represents the entities and the relationships between them. This model is the closest to the real-world problem to be solved. Notable conceptual models for data warehouses include the Multidimensional/ER model (Sapia) [8], the Star/ER model (Tryfona) [9], the GOLD model (Trujillo) [5, 10], the Husemann model [11], and the YAM2 model [12].

Logical data model: The objective of the logical data model is to describe the data in as much detail as possible, without considering how it will be physically stored in the database. This model includes the entities, their relationships and interactions, the data types of all attributes of each entity, the definition of primary and foreign keys, and the definition of the extraction, transformation, and loading (ETL) processes, among other things.

Physical data model: The physical data model includes the full specification of all tables and columns, following the business rules that determine the design of the data warehouse. In this model, the code is written to create tables, views, integrity rules, and multidimensional queries.

On the other hand, the existing methodologies for developing data warehouses [3, 5, 13, 14, 15, 16] share several shortcomings: most do not include a visual modeling language, do not propose a series of steps or phases, or are tied to a specific implementation (for example, the star schema of relational databases). In 2005, Lujan proposed a methodology based on the Unified Process, the Data Warehouse Engineering Process (DWEP), which uses UML version 1.4. DWEP proposes a collection of artifacts for standardization.

In conclusion, DWEP calls for an upgrade to UML version 2.1.1, which provides more artifacts for implementing the data warehouse.

III. DATA WAREHOUSE ENGINEERING PROCESS

In his doctoral thesis [5], Lujan presents a Data Warehouse Engineering Process (DWEP) based on the Unified Process. The UP is a methodology for software development proposed by OMG [17]. Its main features are: it is iterative, it is driven by use cases, it is organized into development phases, and it uses UML as its graphical modeling language [18, 19].

The UP and DWEP are composed of four phases [5, 20]: inception, elaboration, construction, and transition (see Figure 1).

Figure 1. DWEP [5]

Phases UP and DWEP

Inception phase: This phase develops the project analysis needed to justify its implementation. To that end, it produces a general description of the project, a plan based on the iterations of the phases, an identification of the critical risks, and the basic functionality of a candidate software architecture.

Elaboration phase: Once the inception phase is complete, the goal is to build a robust architecture for building the software. This phase seeks to establish the foundations for implementing the use cases and artifacts of the final system components, and to mitigate the technological risk of exploring the programming language as far as the user interface is concerned. The first iteration ends with a functional prototype for testing the software and with the definition of the model for implementing the user interface.

Construction phase: The construction phase starts from the baseline architecture specified in the elaboration phase. Its purpose is to develop a product ready for initial operation in the end-user environment.

Transition phase: Once the project enters the transition phase, the system has reached initial operating capability. This phase seeks to introduce the product into its operating environment.

Workflows DWEP

In general terms, a UP workflow is a set of activities in a given area that results in the construction of artifacts (a text, a diagram, a web page, code in a programming language, etc.). DWEP in version 2.1.1 presents 20 diagrams, organized into 5 processes and 3 levels (see Table 1). These diagrams are used in the different workflows.

Table 1. DWEP 2.1.1 diagrams (UML diagram type in parentheses)

Level | Source (S) | Integration | Data Warehouse (DW) | Customization | Client (C)
Conceptual | SCS (Class); SCOS (Object) | DM (Class) | DWCS (Class); DWSS (Sequence); DWSMS (State Machine); DWAS (Activity) | DM (Class) | CCS (Class)
Logical | SLS (Class); SLCS (Communication) | ETL (Class) | DWLS (Class) | Exporting Process (Class) | CLS (Class)
Physical | SPS (Component & Deployment) | Transportation Diagram (Deployment) | DWPS (Component & Deployment) | Transportation Diagram (Deployment) | CPS (Component & Deployment)

Requirements: During this workflow, end users specify the most interesting measures and aggregations, the analysis dimensions, the queries used to generate periodic reports, and the update frequency of the data. For this workflow, the UP proposes the use of use cases (see Figure 2). This helps to understand the system, its requirements, and the functions of the solution. It also shows what the interactions with the system must look like.

Figure 2. Use case diagrams [5]

Analysis: The purpose of this workflow is to refine the structure and the requirements obtained in the requirements workflow. This step documents the existing operational systems that feed the data warehouse. The UP proposes the use of class, object, communication, and deployment diagrams. DWEP proposes the Source Conceptual Schema (SCS, see Figure 3), the Source Conceptual Object Schema (SCOS, see Figure 4), the Source Logical Schema (SLS, see Figure 5), the Source Logical Communications Schema (SLCS, see Figure 6), and the Source Physical Schema (SPS, see Figure 7).

Figure 3. Source Conceptual Schema [5]

Figure 4. Source Conceptual Objects Schema (object instances shown include TV:Products, Radio:Products, TV2:Products, Radio2:Products, Miami:Cities, Sony:Customer, 001:Orders, and 002:Orders)

Figure 5. Source Logical Schema

Figure 6. Source Logical Communications Schema (elements shown: Job System; the objects :Cities, :Customer, :Orders, and :Products; messages 1 to 4, Read_table)

Figure 7. Source Physical Schema

Design: At the end of this workflow, the structure of the data warehouse is defined. The main result of this workflow is the conceptual model of the data warehouse. The UP proposes the use of classes structured into packages, the design of subsystems with defined interfaces (components), and the form of collaboration between the classes. DWEP proposes the Data Warehouse Conceptual Schema (DWCS, see Figure 8), the Client Conceptual Schema (CCS), the Data Mapping (DM, see Figure 9), the Data Warehouse State Machine Schema (DWSMS, see Figure 10), and the Data Warehouse Activity Schema (DWAS, see Figure 11).

Figure 8. Data Warehouse Conceptual Schema [5]

Figure 9. Data mapping [5]

Figure 10. Data Warehouse State Machine Schema (diagram label: DWSD Open Source Customer; states shown: read and extract data to the relational database; transform and load into the temporary space in the DW; load from the temporary space of the DW to the DW)

Figure 11. Data Warehouse Activity Schema [21]

Implementation: During this workflow, the data warehouse is built: its physical structure is created, it starts to receive data from the operational systems, and it is tuned for optimized performance, among other tasks. The UP proposes component diagrams for this workflow (see Figure 7). DWEP proposes the Data Warehouse Physical Schema (DWPS, see Figure 12), the Data Warehouse Logical Schema (DWLS, see Figure 13), the Client Logical Schema (CLS), the Client Physical Schema (CPS), the Data Warehouse Sequence Schema (DWSS, see Figure 14), and the ETL Process (see Figure 15).

Figure 12. Physical diagram of the data warehouse [5]

The maintenance and post-development workflows are not part of the Unified Process; they belong only to the data warehouse engineering process.

Maintenance: Unlike most systems, the data warehouse is fed continuously. The purpose of this workflow is to define the loading and updating processes necessary to maintain the data warehouse. This workflow starts when the data warehouse has been built and delivered to the end users, but it does not have an end date. During this workflow, end users may have new needs, such as new downloads, which trigger the beginning of a new iteration starting with the requirements workflow.
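The loading and updating processes that the maintenance workflow defines can be illustrated with a minimal Python sketch using the standard sqlite3 module. It is only an illustration under stated assumptions, not part of DWEP: the table names staging_orders and fact_orders, the column layout, and the function refresh_warehouse are hypothetical, and the sample values simply echo the object names visible in Figure 4. The three steps loosely follow the states listed for Figure 10: extract from the source, transform and load into a temporary space, then load from the temporary space into the data warehouse.

```python
import sqlite3


def refresh_warehouse(conn, source_rows):
    """Incrementally load source rows into a (hypothetical) fact table.

    Returns the number of rows newly added to the warehouse.
    """
    cur = conn.cursor()
    # Hypothetical temporary space and warehouse table, created on first use.
    cur.execute("""CREATE TABLE IF NOT EXISTS staging_orders (
        order_id TEXT PRIMARY KEY, product TEXT, city TEXT, amount REAL)""")
    cur.execute("""CREATE TABLE IF NOT EXISTS fact_orders (
        order_id TEXT PRIMARY KEY, product TEXT, city TEXT, amount REAL)""")
    before = cur.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]

    # Step 1: start each refresh with an empty temporary space.
    cur.execute("DELETE FROM staging_orders")

    # Step 2: transform (trim text, cast amounts) and load into staging.
    cleaned = [
        (str(order_id).strip(), product.strip(), city.strip(), float(amount))
        for order_id, product, city, amount in source_rows
    ]
    cur.executemany(
        "INSERT OR REPLACE INTO staging_orders VALUES (?, ?, ?, ?)", cleaned
    )

    # Step 3: move only the rows the warehouse does not hold yet.
    cur.execute("""INSERT INTO fact_orders
        SELECT order_id, product, city, amount FROM staging_orders
        WHERE order_id NOT IN (SELECT order_id FROM fact_orders)""")
    conn.commit()

    after = cur.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
# First load: two new orders.
first = refresh_warehouse(
    conn, [("001", " TV ", "Miami", "300"), ("002", "Radio", "Miami", 50)]
)
# Later refresh: order 002 is already stored, only 003 is new.
second = refresh_warehouse(
    conn, [("002", "Radio", "Miami", 50), ("003", "TV", "Miami", 310)]
)
total = conn.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
print(first, second, total)  # 2 1 3
```

Because the refresh can be run repeatedly without duplicating rows, it matches the idea of a workflow that has no end date: each new batch from the operational systems is another call to the same function.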