Data Warehousing and OLAP Technologies for Decision-Making Process
Total Page:16
File Type:pdf, Size:1020Kb
International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 Vol. 1 Issue 6, August - 2012 Data Warehousing and OLAP Technologies for Decision-Making Process Hiren H Darji Asst. Prof in Anand Institute of Information Science,Anand decision making2.‖Typically, the data warehouse Abstract is maintained separately from the organization‘s Data warehousing and on-line analytical operational databases. There are many reasons processing (OLAP) are essential elements of for doing this. The data warehouse supports on- decision support, which has increasingly become line analytical processing (OLAP), the functional a focus of the database industry. Many and performance requirements. Data warehouses commercial products and services are now are targeted for decision support. Historical, available, and all of the principal database summarized and consolidated data is more management system vendors now have offerings important than detailed, individual records. Since in these areas. Decision support places some data warehouses contain consolidated data, rather different requirements on database perhaps from several operational databases, over technology compared to traditional on-line potentially long periods of time, they tend to be transaction processing applications. This paper orders of magnitude larger than operational provides an overview of data warehousing and databases; enterprise data warehouses are OLAP technologies. Data warehousesprovide projected to be hundreds of gigabytes to on-line analytical processing (OLAP) tools for terabytes in size. The workloads are query theinteractive analysis of multidimensional data intensive with mostly ad hoc, complex queries of varied granularities, which facilitates effective that can access millions of records and perform a data mining. Data warehousing and on-line lot of scans, joins, and aggregates. Query analytical processing (OLAP) are essential throughput and response times are more elements of decision support, which has important than transaction throughput. increasingly become a focus of the database industry. s. An OLAP system is market-oriented To facilitate complex analyses and visualization, and is used for data analysis by knowledge the data in a warehouse is typically modeled workers, including managers, executives and multidimensionally. For example, in a sales data analysts. Data warehousing and OLAP have warehouse, time of sale, sales district, emerged as leading technologies that facilitate salesperson, and product might be some of the data storage, organization and then, significant dimensions of interest. Often, these dimensions retrieval. Decision support places some rather are hierarchical; time of sale may be organized different requirements on database technology as a day-month-quarter-year hierarchy, product compared to traditional on-line transaction as a product category-industry hierarchy. Typical processing applications. OLAP operations include rollup (increasing the level of aggregation) and drill-down (decreasing Keywords: Data Warehousing, OLAP, and the level of aggregation or increasing detail) Decision Making. along one or more dimension hierarchies, slice_and_dice (selection and projection), and 1.Introduction: pivot (re-orienting the multidimensional view of data). A data warehouse is a ―subject-oriented, integrated,timevarying, non-volatile collection of Data warehouses might be implemented on Data that is used primarily in organizational standard or extended relational DBMSs, called www.ijert.org 1 International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 Vol. 1 Issue 6, August - 2012 Relational OLAP (ROLAP) servers. These decision making process . So, data warehouse servers assume that data is stored in relational can be said to be a semantically consistent data databases, and they support extensions to SQL store that serves as physical implementation of and special access and implementation methods a decision support data model and stores the to efficiently implement the multidimensional information on which an enterprise needs to data model and operations. In contrast, make strategic decisions. Data warehouses multidimensional OLAP (MOLAP) servers are provide on-line analytical processing (OLAP) servers that directly store multidimensional data tools for the interactive analysis of in special data structures (e.g., arrays) and multidimensional data of varied implement the OLAP operations over these granularities, which facilitates effective data special data structures. mining. The functional and performance requirements of OLAP are quite different from There is more to building and maintaining a data those of the on-line transaction processing warehouse than selecting an OLAP server and applications traditionally supported by the defining a schema and some complex queries for operational databases. the warehouse. Different architectural alternatives exist. Many organizations want to A data warehouse is defined as a ―subject- implement an integrated enterprise warehouse oriented, integrated, time variant, non-volatile that collects information about all subjects (e.g., collection of data that serves as a physical customers, products, sales, assets, personnel) implementation of a decision support data model spanning the whole organization. However, and stores the information on which an building an enterprise warehouse is a long and enterprise needs to make strategic decisions. In complex process, requiring extensive business data warehouses historical, summarized and modeling, and may take many years to succeed. consolidated data is more important than Some organizations are settling for data marts detailed, individual records. Since data instead, which are departmental subsets focused warehouses contain consolidated data, perhaps on selected subjects (e.g., a marketing data mart from several operational databases, over may include customer, product, and sales potentially long periods of time, they tend to be information). These data marts enable faster roll much larger than operational databases. Most out, since they do not require enterprise-wide queries on data warehouses are ad hoc and are consensus, but they may lead to complex complex queries that can access millions of integration problems in the long run, if a records and perform a lot of scans, joins, and complete business model is not developed. aggregates. Due to the complexity query throughput and response times are more 2. Data Warehousing important than transaction throughput. 2.1 Definition of data warehousing Data warehousing is a collection of decision support technologies, aimed at enabling the 2A data warehouse is a subject-oriented, knowledge worker (executive, manager, analyst) integrated, time-variant and nonvolatile to make better and faster decisions. Data collection of data in support of management‘s warehousing technologies have been www.ijert.org 2 International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 Vol. 1 Issue 6, August - 2012 successfully deployed in many industries: In addition to the main warehouse, there may be manufacturing (for order shipment and customer several departmental data marts. Data in the support), retail (for user profiling and inventory warehouse and data marts is stored and managed management), financial services (for claims by one or more warehouse servers, which present analysis, risk analysis, credit card analysis, and multidimensional views of data to a variety of fraud detection), transportation (for fleet front end tools: query tools, report writers, management), telecommunications (for call analysis tools, and data mining tools. Finally, analysis and fraud detection), utilities (for power there is a repository for storing and managing usage analysis), and healthcare (for outcomes metadata, and tools for monitoring and analysis). This paper presents a roadmap of data administering the warehousing system. warehousing technologies, focusing on the Designing and rolling out a data warehouse is a special requirements that data warehouses place complex process, consisting of the following on database management systems (DBMSs). activities 3 2.2 Architecture and End-to-End Process •Define the architecture, do capacity planning, and select the storage servers, database and OLAP servers, and tools. • Integrate the servers, storage, and client tools. • Design the warehouse schema and views. • Define the physical warehouse organization, data placement, partitioning, and access methods. •Connect the sources using gateways, ODBC drivers, or other wrappers. • Design and implement scripts for data extraction, cleaning, transformation, load, and refresh. •Populate the repository with the schema and view definitions, scripts, and other metadata. • Design and implement end-user applications. • Roll out the warehouse and applications. 2.3 DATA WAREHOUSING FUNDAMENTALS It includes tools for extracting data from multiple A data warehouse (or smaller-scale data mart) is a operational databases and external sources; for specially prepared repository of data designed to cleaning, transforming and integrating this data; support decision making. The data comes from for loading data into the data warehouse; and for operational systems and external sources. To create periodically refreshing the warehouse to reflect the data warehouse, data are extracted from source updates at the sources and to purge data from the systems, cleaned (e.g., to detect and correct errors), warehouse, perhaps