Lecture 7 Data Warehousing

ITM-761 Business Intelligence
Dr. สลิล บุญพราหมณ์

"...Those who are steadfast in truthfulness and do as they say will achieve success, along with the faith, trust, and praise of all sides. To speak and then act, that is, to speak truly and act truly, is therefore an important factor in making a person's honor stand out clearly..."
Excerpt from the royal address of His Majesty the King at the graduation ceremony of Chulalongkorn University, 10 July 2540 B.E. (1997).

Topics
• Data Warehousing Concepts
• Data mart
• Typical Architecture of a DW
• Kimball vs. Inmon in DW building approach
• Dimensionality modeling

Data Warehousing Concepts
• What is a data warehouse? A data warehouse is a collection of integrated databases designed to support a DSS.
• According to Inmon's definition (Inmon, 1992), it is a collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is non-volatile and relevant to some moment in time.

Data Warehousing Concepts (cont)
• subject-oriented,
• integrated,
• time-variant, and
• non-volatile

1. Subject-oriented Data
• Organized around major subjects, such as customer, product, sales.
• Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
• Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

2. Integrated Data
• The data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent.
• Data formats do not match because the data comes from different sources.
• The integrated data source must be made consistent to present a unified view of the data to the users.

Data on a given subject is defined and stored once.
[Figure: OLTP applications (Savings, Current Accounts, Loans) feed a single Customer subject in the data warehouse.]

3. Time-variant Data
• Data in the warehouse is only accurate and valid at some point in time or over some time interval.
• Time-variance is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.

Data Warehouse

Operational database                        Data warehouse
Current value data:                         Snapshot data:
• time horizon 60-90 days                   • time horizon 5-10 years
• key may or may not have time element      • key contains an element of time
• data can be updated                       • once snapshot is made, record cannot be updated

4. Non-volatile Data
• Data in the warehouse is not updated in real-time but is refreshed from operational systems on a regular basis.
• New data is always added as a supplement to the database, rather than a replacement.

Typically data in the data warehouse is not updated or deleted.
[Figure: operational systems support Insert, Update, Delete, or Read; the warehouse supports Load and Read.]

Data Warehouse vs. OLTP

Property             OLTP                       Data Warehouse
Response Time        Sub seconds to seconds     Seconds to hours
Operations           DML                        Primarily read only
Nature of Data       30 – 60 days               Snapshots over time
Data Organization    Application                Subject, time
Size                 Small to large             Large to very large
Data Sources         Operational, Internal      Operational, Internal, External
Activities           Processes                  Analysis

Warehouse Environment
• The warehouse environment can contain:
  • Enterprise data warehouse
  • Departmental data warehouses or business unit-specific data marts
  • Personal data marts
  • Application-specific extracts
  • Operational data stores
  • Information catalogues
  • Publish and subscribe systems
  • Metadata repositories

Problems of DW
• Underestimation of resources for data loading
• Hidden problems with source systems
• Required data not captured
• Increased end-user demands
• Data homogenization
• High demand for resources
• Data ownership
• High maintenance
• Long duration projects
• Complexity of integration

Data Mart
• A subset of a data warehouse that supports the requirements of a particular department or business function.
• Characteristics include
  • Focuses on only the requirements of one department or business function.
  • Does not normally contain detailed operational data, unlike a data warehouse.
  • More easily understood and navigated.

Dependent Data Mart
[Figure: data sources (operational systems, flat files, legacy data, operations data, external data) feed the data warehouse, which in turn feeds the data marts (Marketing, Sales, Finance, HR).]

Independent Data Mart
[Figure: data sources (operational systems, flat files, legacy data, operations data, external data) feed a Sales or Marketing data mart directly.]

Reasons for Creating a Data Mart
• To give users access to the data they need to analyze most often.
• To provide data in a form that matches the collective view of the data by a group of users in a department or business function area.
• To improve end-user response time due to the reduction in the volume of data to be accessed.
• To provide appropriately structured data as dictated by the requirements of the end-user access tools.

Reasons for Creating a Data Mart (cont)
• Building a data mart is simpler compared with establishing a corporate data warehouse.
• The cost of implementing data marts is normally less than that required to establish a data warehouse.
• The potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data mart project rather than a corporate data warehouse project.

Typical Architecture of a DW

1. Operational Data Sources
• Mainframe first generation hierarchical and network databases.
• Departmental proprietary file systems (e.g. VSAM, RMS) and relational DBMSs (e.g. Informix, Oracle).
• Private workstations and servers.
• External systems such as the internet, commercially available databases, or databases associated with an organization’s suppliers or customers.

2. Operational Data Store (ODS)
• A repository of current and integrated operational data used for analysis.
• Often structured and supplied with data in the same way as the data warehouse.
• May act simply as a staging area for data to be moved into the warehouse.
• Often created when legacy operational systems are found to be incapable of achieving reporting requirements.
• Provides users with the ease-of-use of a relational database while remaining distant from the decision support functions of the data warehouse.

3. Load Manager
• Performs all the operations associated with the extraction and loading of data into the warehouse.
• Size and complexity will vary between data warehouses and may be constructed using a combination of vendor data loading tools and custom-built programs.

4. Warehouse Manager
• Performs all the operations associated with the management of the data in the warehouse.
• Constructed using vendor data management tools and custom-built programs.
• Operations performed include
  • Analysis of data to ensure consistency.
  • Transformation and merging of source data from temporary storage into data warehouse tables.
  • Creation of indexes and views on base tables.
  • Generation of denormalizations (if necessary).
  • Generation of aggregations (if necessary).
  • Backing up and archiving data.
• In some cases, also generates query profiles to determine which indexes and aggregations are appropriate.
• A query profile can be generated for each user, group of users, or the data warehouse and is based on information that describes the characteristics of the queries such as frequency, target table(s), and size of results set.

5. Query Manager
• Performs all the operations associated with the management of user queries.
• Typically constructed using vendor end-user data access tools, data warehouse monitoring tools, database facilities, and custom-built programs.
• Complexity determined by the facilities provided by the end-user access tools and the database.
• In some cases, the query manager also generates query profiles to allow the warehouse manager to determine which indexes and aggregations are appropriate.

6. Detailed Data
• Stores all the detailed data in the database schema.
• In most cases, the detailed data is not stored online but aggregated to the next level of detail.
• On a regular basis, detailed data is added to the warehouse to supplement the aggregated data.

7. Lightly and Highly Summarized Data
• Stores all the pre-defined lightly and highly aggregated data generated by the warehouse manager.
• The purpose of summary information is to speed up the performance of queries.
• Removes the requirement to continually perform summary operations (such as sort or group by) in answering user queries.
• The summary data is updated continuously as new data is loaded into the warehouse.

8. Archive / Backup Data
• Stores detailed and summarized data for the purposes of archiving and backup.
• May be necessary to back up online summary data if this data is kept beyond the retention period for detailed data.
• The data is transferred to storage archives such as magnetic tape or optical disk.

9. Metadata
• The management of metadata within the data warehouse is a very complex task that should not be underestimated.
• Used for a variety of purposes
  • Extraction and loading processes: metadata is used to map data sources to a common view of information within the warehouse.
  • Warehouse management process: metadata is used to automate the production of summary tables.
  • Query management process: metadata is used to direct a query to the most appropriate data source.

General Metadata Issues
General metadata issues associated with Data Warehouse use:
• What tables, attributes and keys does the DW contain?
• Where did each set of data come from?
• What transformations
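
The use of metadata in extraction and loading (mapping data sources to a common view), combined with the non-volatile, time-variant way a warehouse is loaded, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the source names (`savings`, `loans`), the field names, the `FIELD_MAP` dictionary, and the `to_common_view` and `load_snapshot` helpers are all hypothetical and do not come from the lecture.

```python
from datetime import date

# Hypothetical metadata: for each source system, map its field names
# to the warehouse's common field names.
FIELD_MAP = {
    "savings": {"cust_no": "customer_id", "bal": "balance"},
    "loans": {"customer": "customer_id", "outstanding": "balance"},
}


def to_common_view(source, record):
    """Rename a source record's fields using the metadata mapping."""
    mapping = FIELD_MAP[source]
    return {mapping[field]: value for field, value in record.items()}


def load_snapshot(warehouse, source, records, snapshot_date):
    """Append records to the warehouse, tagged with source and snapshot date.

    Existing rows are never updated or deleted (non-volatile), and every
    row carries a time element (time-variant).
    """
    for record in records:
        row = to_common_view(source, record)
        row["source"] = source
        row["snapshot_date"] = snapshot_date
        warehouse.append(row)
    return warehouse


warehouse = []
load_snapshot(warehouse, "savings", [{"cust_no": 1, "bal": 500}], date(2024, 1, 31))
load_snapshot(warehouse, "loans", [{"customer": 1, "outstanding": 1200}], date(2024, 1, 31))
load_snapshot(warehouse, "savings", [{"cust_no": 1, "bal": 650}], date(2024, 2, 29))

for row in warehouse:
    print(row)
```

In this sketch the January savings row stays in place after the February load: the new balance is added as a further snapshot rather than as a replacement, and rows from both sources share the same field names.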
