Introduction: Operational Data Store (ODS)
An Operational Data Store (ODS) is an integrated database of operational data. Its sources include legacy systems, and it contains current or near-term data, typically 30 to 60 days' worth. The goal of an ODS is to integrate data from heterogeneous and homogeneous sources so that it supports operational processing within a stipulated time period, or close to it. Data in an ODS is normally structured like the source systems; during integration it is cleansed and de-normalized, and business rules are applied to ensure data integrity. These integration processes run at the lowest granular level and occur frequently throughout the day.

An ODS is usually not suitable for the historical and trend analysis that decision makers need for future planning. It often serves as a temporary staging area for a data warehouse, and it is intended to handle data of a limited size. An ODS does not contain static data; its contents are continuously updated by business operations. It is designed to run relatively simple queries on small amounts of data, in contrast to a data warehouse, which runs complex queries on large amounts of data. The ODS is short-term memory: it stores only recent, time-bounded data.

Data Warehouse

A data warehouse is a collection of integrated databases designed for query and analysis. It consolidates data from multiple heterogeneous and homogeneous sources, allowing data from several external and internal sources to be integrated while separating analysis from the transaction workload. Data is transformed into high-quality information that meets enterprise reporting requirements for business decision makers at all levels. The data warehouse is long-term memory: its data is non-volatile and relatively permanent.

History of Data Warehouse

Data warehouse development began in the late 1980s to support data analysis and information management that operational systems could not accomplish, since operational systems were designed and optimized only for transactions. Across departments inside an organization, the number of transaction systems grew rapidly, which made data integration more difficult and created problems of data redundancy, hard-to-integrate and hard-to-analyze data, and poor reporting performance for high-level decision making.

Characteristics of Data Warehouse

Subject oriented: Data is arranged by subject (department) according to how users refer to it.
Integrated: Inconsistent value representations are removed, and business rules enforce the needed formats and naming conventions.
Non-volatile: Data is stored in read-only form and is never changed, modified, or updated over time.
Time variant: Data represents a period of time.
Summarized: Operational data is mapped into a decision-usable format.
Large volume: Time-series data sets amount to large volumes of data.
Not normalized: Data warehouse data can be, and often is, redundant.
Metadata: Data about the data is stored.
Data sources: Data is integrated from one or more internal and external sources.

Schema

A schema is a collection of database objects; it contains tables, views, indexes, and synonyms. Schema objects can be arranged in different ways:

1. Star schema: A physical database structure that stores the factual data in the center, surrounded by the reference (dimension) data. It uses de-normalization to provide fast response times, and its simple design allows database optimizers to produce better execution plans. It exploits the fact that the measures recorded for real transactions are unlikely to change after the fact. (A minimal sketch follows this list.)
2. Snowflake schema: Some dimension tables are normalized; the normalization splits the data into additional tables.
3. Fact constellation: Similar to the star schema, but with multiple fact tables.
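To make the star schema concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table and column names (sales_fact, date_dim, product_dim, and so on) are illustrative assumptions, not taken from the text above: a central fact table holds the measures and foreign keys into the surrounding de-normalized dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: de-normalized reference data surrounding the facts.
cur.execute("""CREATE TABLE date_dim (
    date_key   INTEGER PRIMARY KEY,
    full_date  TEXT, month TEXT, quarter TEXT, year INTEGER)""")
cur.execute("""CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT, category TEXT, brand TEXT)""")

# Fact table: lowest-grain measures in the center, keyed to each dimension.
cur.execute("""CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim(date_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    units_sold   INTEGER,
    sales_amount REAL)""")
conn.commit()
```

A snowflake variant would further normalize product_dim (for example, moving category and brand into tables of their own), and a fact constellation would add a second fact table, such as an inventory fact, sharing these same dimensions.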
On-line Transaction Processing (OLTP)

An OLTP system contains detailed, current data and handles a large number of transactions such as INSERT, UPDATE, and DELETE. Its important features are fast query processing, maintained data integrity, and effective use of processing time. OLTP schemas are maintained as entity-relationship models. (A sketch of this pattern follows the OLAP description below.)

On-Line Analytical Processing (OLAP)

OLAP allows users to perform fast and effective analysis on large volumes of data. The data is stored in a multi-dimensional form, as facts and dimensions that map closely to real business data. This gives high-level decision makers faster and easier access to summary data, from which they can drill down into the summary figures for more detailed information.
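As a sketch of the OLTP pattern, the following Python snippet (standard-library sqlite3 again, with a hypothetical accounts table) runs a short transaction: a few row-level writes that either commit together or roll back together, preserving data integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 500.0), (2, 200.0)")
conn.commit()

# A typical OLTP transaction: small, fast, row-level writes that must
# all succeed or all be undone.
try:
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")
    conn.commit()
except sqlite3.Error:
    conn.rollback()  # keep the data consistent if any statement fails
```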
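By contrast, an OLAP-style query aggregates many rows at once. The self-contained sketch below builds a tiny version of the star schema from earlier and shows the summary-then-drill-down pattern: yearly totals first, then the same measure at month grain within one year. The sample values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, month TEXT, year INTEGER)")
cur.execute("CREATE TABLE sales_fact (date_key INTEGER, sales_amount REAL)")
cur.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                [(1, "Jan", 2024), (2, "Feb", 2024), (3, "Jan", 2025)])
cur.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                [(1, 100.0), (1, 40.0), (2, 75.0), (3, 90.0)])

# Summary level: the yearly figures a decision maker sees first.
for row in cur.execute("""SELECT d.year, SUM(f.sales_amount)
                          FROM sales_fact f JOIN date_dim d USING (date_key)
                          GROUP BY d.year"""):
    print(row)

# Drill-down: the same measure at month grain within one year.
for row in cur.execute("""SELECT d.month, SUM(f.sales_amount)
                          FROM sales_fact f JOIN date_dim d USING (date_key)
                          WHERE d.year = 2024
                          GROUP BY d.month"""):
    print(row)
```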
ETL (Extraction, Transformation and Loading)

ETL is responsible for extracting data from heterogeneous and homogeneous sources; cleansing, customizing, reformatting, and integrating it; and then inserting it into a data warehouse (DW). Building the ETL process is one of the biggest tasks in building a warehouse: ETL processes are complex and time consuming, and they consume most of a DW project's implementation effort, cost, and resources.

Construction of a DW involves three main areas:

Source area
Mapping area
Destination area

Within this framework, data is extracted from various external and internal data sources and transmitted to the data staging area (DSA), where it is transformed and cleansed before being loaded into the DW. Data sources, the DSA, and destination environments can have different data structure formats: non-relational sources, flat files, relational tables, XML, web log sources, raw files, and spreadsheets. In a typical ETL process, data is extracted from an online transaction processing (OLTP) database, transformed to match the DW schema, and then loaded into the DW database. Data warehouses also integrate data from non-OLTP systems, such as text files and spreadsheets.

ETL is a combination of process and technology that consumes a significant portion of the DW development effort, and it requires the skills of business analysts, application developers, and database designers. It is not a one-time process: when data sources change, the destination data in the target database must be updated periodically, and when business rules change, the data warehouse must change with them to keep its value as a tool for decision makers; as a result, the ETL also changes and evolves. ETL tools are built to make such modifications as easy as possible, and a well-designed, well-documented ETL tool is necessary for sound business data analysis and high-level decision making. An ETL tool consists of three operations, Extraction, Transformation, and Loading, described in turn below; a minimal end-to-end sketch follows the descriptions.

Extraction

Extraction pulls data from the heterogeneous and homogeneous source systems. Every data source has its own set of characteristics and business rules and needs to be managed effectively to extract the source data for the ETL process. This requires effectively incorporating systems that have different platforms, different database management systems, and different communication protocols.

Transformation

This operation consists of data cleaning, transformation, and integration in an allocated cleaning area. It cleans and modifies the extracted data to achieve accurate data that is consistent, correct, complete, and unambiguous.

Loading

Loading is the final ETL operation. The extracted and transformed data is loaded into the destination (target) database structures that are actually accessed by end users and application systems. The loading operation covers loading both dimensional databases and relational databases.
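As a minimal end-to-end sketch of the three operations, the Python script below extracts rows from a flat-file source (a hypothetical sales.csv, named here only for illustration), transforms them (dropping incomplete records, normalizing naming and types), and loads them into a target SQLite table standing in for the warehouse.

```python
import csv
import sqlite3

def extract(path):
    """Extraction: read raw records from a flat-file source."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transformation: cleanse and reformat into a consistent shape."""
    clean = []
    for r in rows:
        if not r.get("product") or not r.get("amount"):
            continue                                 # drop incomplete records
        clean.append((r["product"].strip().upper(),  # uniform naming convention
                      float(r["amount"])))           # uniform type
    return clean

def load(rows, conn):
    """Loading: insert the transformed data into the target structures."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")  # stand-in for the DW target
    load(transform(extract("sales.csv")), conn)
```

In a real pipeline each stage would be scheduled and re-run as sources and business rules change, which is why the text stresses that ETL is not a one-time process.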