Yet Another Automated OLAP Workload Analyzer: Principles, and Experiences

Yet Another Automated OLAP Workload Analyzer: Principles, and Experiences Alfredo Cuzzocrea1;2, Rim Moussa3 and Enzo Mumolo1 1DIA Department, University of Trieste, Italy 2ICAR-CNR, Italy 3LaTICE and University of Carthage, Tunisia Keywords: Data Warehouse Tuning, OLAP Intelligence, Data Warehouse Workloads, OLAP Workloads. Abstract: In order to tune a data warehouse workload, we need automated recommenders on when and how (i) to partition data and (ii) to deploy summary structures such as derived attributes, aggregate tables, and (iii) to build OLAP indexes. In this paper, we share our experience of implementation of an OLAP workload analyzer, which exhaustively enumerates all materialized views, indexes and fragmentation schemas candidates. As a case of study, we consider TPC-DS benchmark -the de-facto industry standard benchmark for measuring the performance of decision support solutions including. 1 INTRODUCTION This model constitutes a decision support system framework which affords the ability to calculate, conso- Decision Support Systems (DSS) are designed to em- lidate, view, and analyze data according to multiple power the user with the ability to make effective de- dimensions. OLAP relies heavily upon a data mo- cisions regarding both the current and future activi- del known as the multidimensional databases (MDB) ties of an organization. One of the most prominent (Kimball and Ross, 2013; Kimball et al., 1998; Mo- technologies for knowledge discovery in DSS envi- lina, 2013; Imhoff et al., 2003; Inmon, 2005; DeWitt ronments are On-line Analytical Processing (OLAP) et al., 2005; Surajit and Umeshwar, 1997; Codd et al., technologies. OLAP relies heavily upon a data model 1993; Agarwal et al., 1996; Gyssens and Lakshma- known as the multidimensional database and the Data nan, 1997; Agrawal et al., 1997; Gray et al., 1997; cube. The latter has been playing an essential role Vassiliadis, 1998a). An MDB schema contains a lo- in the implementation of OLAP (Gray et al., 1997; gical model consisting of OLAP cubes. Each OLAP Vassiliadis, 1998a). However, challenges related to Cube is described by a fact table (facts), a set of di- Performance Tuning are to be addressed. OLAP wor- mensions and a set of measures. Multiple MDB de- kload Performance Tuning is usually based on (i) in- sign methods were proposed in the litterature and are dexes, (ii) summary data, i.e. derived attributes and described in (Vassiliadis, 1998b; Cabibbo and Tor- aggregate tables, and (iii) data fragmentation. lone, 1998; Niemi et al., 2001; Hung et al., 2004; The paper outline is the following, in Section II, Nair et al., 2007; Malinowski and Zimanyi,´ 2008; we overview Performance Tuning Strategies, from de- Romero and Abello,´ 2009; Thanisch et al., 2011). velopper perspective. In Section III, we present our In (Cuzzocrea and Moussa, 2013; Cuzzocrea et al., workload analyzer and our first experience with TPC- 2013a), we detail a framework for MDB schemas de- DS Benchmark. Finally we conclude the paper. sign, successfully applied to turn TPC-H benchmark into a multi-dimensional benchmark TPC-H*d. In order to tune a data warehouse workload, we need automated recommenders on when and on how (i) to par- 2 OLAP WORKLOAD tition data and (ii) to deploy summary structures (e.g. PERFORMANCE TUNING derived attributes, aggregate tables, sketches synopsis, histograms synopsis), and (iii) to build OLAP in- The term On-line Analytical Processing (OLAP) is dexes. introduced in 1993 by E. Codd (Codd et al., 1993). 293 Cuzzocrea, A., Moussa, R. and Mumolo, E. Yet Another Automated OLAP Workload Analyzer: Principles, and Experiences. DOI: 10.5220/0006812202930298 In Proceedings of the 20th International Conference on Enterprise Information Systems (ICEIS 2018), pages 293-298 ISBN: 978-989-758-298-1 Copyright c 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved ICEIS 2018 - 20th International Conference on Enterprise Information Systems Many research work investigated distributed rela- Since two decades, TPC-H benchmark is the most tional data warehouses and an adjunct mid-tier for pa- used benchmark in the research community. The rallel cube calculus, namely OLAP* (Cuzzocrea et al., TPC-H benchmark (Transaction Processing Coun- 2013b). Other are investigating new systems SQL-on- cil, 2013b) exploits a classical product-order-supplier Hadoop Systems (e.g. Apache Hive, Apache Spark model. It consists of a suite of business oriented ad- SQL, Apache Drill, Cloudera Impala, IBM BigInsig- hoc queries and concurrent data modifications. The hts). Partitioning schemes are very important, good workload is composed of twenty-two parameterized data fragmentation schemes allows parallel IO and decision-support SQL queries with a high degree of parallel processing. Automated distributed database complexity and two refresh functions: RF-1 new sa- design was investigated in many research papers and les (new inserts) and RF-2 old sales (deletes). The by DBMS vendor leaders AutoPart (Papadomanolakis TPC-DS benchmark is launched for next generation and Ailamaki, 2004),DB2 Design Advisor(Zilio et al., of decision support system benchmarking to replace 2004), Database Tuning Advisor for MS SQL Server the TPC-H benchmark. It is described in next Section. (Agrawal et al., 2004a; Agrawal et al., 2004b), and DDB-Expert (Moussa, 2011). 3.1 TPC-DS Benchmark Indexes and Materialized Views are physical structures which aim at accelerating performance, TPC-DS (Transaction Processing Council, 2013a) like similarly OLAP query approximation approaches was designed to examine large volumes of data, exe- (e.g., (Cuzzocrea et al., 2009; Cuzzocrea and Ma- cute complex queries of various operational require- trangolo, 2004)). Many research papers cover auto- ments and complexities (e.g., ad-hoc, reporting, itera- mated selection of materialized views and indexes for tive OLAP, data mining) within large number of user OLAP workloads AutoAdmin (Agrawal et al., 2006), sessions. The benchmark stresses hardware system Alerter Approach (Hose et al., 2008), Semi-Automatic performance in the areas of CPU utilization, memory Index Tuning (Schnaitter and Polyzotis, 2012), Au- utilization, I/O subsystem utilization, and the ability toMDB (Cuzzocrea and Moussa, 2013; Cuzzocrea of the operating system and database software to per- et al., 2013a). Related work report experiences with form TPC-DS workload. The TPC-DS schema mo- TPC-H benchmark (Transaction Processing Council, dels seven data marts the sales and sales returns pro- 2013b). The latter is obsolete now. Its successor TPC- cess for an organization that employs three primary DS (Transaction Processing Council, 2013a) is the de- sales channels: store, catalogs, and the Internet, as facto industry standard benchmark for measuring the well as the Inventory. All data is periodically syn- performance of decision support solutions. In this pa- chronized with source OLTP databases through data- per, we turn TPC-DS into a multidimensional bench- base maintenance functions. The schema includes 7 mark and we analyze TPC-DS benchmark. fact tables and 17 dimension tables. • Fact tables: store sales, store returns, catalog sales, catalog returns, web sales, 3 A MULTI-DIMENSIONAL web returns, inventory. DATABASE TPC-DS • Dimension tables: store, call center, catalog page, web site, web page, warehouse, custo- There are few decision-support benchmarks out of the mer, customer address, customer demographics, TPC benchmarks. Next, we overview most known date dim, household demographics, item, in- DSS benchmarks, APB-1 (OLAP Council, ) has been come band, promotion, reason, ship mode, released in 1998 by the OLAP council. APB-1 wa- time dim. rehouse dimensional schema is structured around five TPC-DS workload contains 99 SQL queries, cove- fixed size dimensions and its workload is composed ring SQL99, SQL-2003 (Eisenberg et al., 2004) (i.e., of 10 queries. APB-1 is proved limited (Erik, 1998) window functions) and OLAP capabilities. TPC-DS to evaluate the specificities of various activities. It benchmark reports two main metrics (i) the Query- proposes a single performance metric termed AQM per-Hour Performance Metric (Qph@Size and (ii) (Analytical Queries per Minute). The metric AQM The Price-Performance Metric ($/Qph) which reflects denotes the number of analytical queries processed the ratio of costs to performance. per minute including data loading and computation time. The most prominent benchmarks for evaluating decision support systems are the various benchmarks issued by the Transaction Processing Council (TPC). 294 Yet Another Automated OLAP Workload Analyzer: Principles, and Experiences Figure 1: Data View of TPC-DS Cube 91 -a sub-view of Catalog Returns Datamart. 3.2 Turning TPC-DS Benchmark into a 4 OLAP WORKLOAD ANALYZER Multi-dimensional Benchmark Tuning a database is a process that includes selection In order to turn the TPC-DS benchmark into a mulm- of indexes, materialized views, derived attributes, and tidimensional benchmark, an initial schema is for- fragmentation schemas.There are a number of tools med. The initial schema consists of all the cubes that have been designed to take the responsibility required to efficiently answer the TPC-DS queries. from the database designer to advise the designer on Each query is mapped to a minimal number of OLAP good choices: SAP, Oracle, Vertica, PoWA of post- cubes. We design each OLAP cube with the relevant gres, Teradata. fact table, dimensions and measures. This leads to the definition of multiple cubes. Hereafter, we detail 4.1 TPC-DS Numbers the process leading to the definition of each cube. We used the framework for automating multidimensional We parse cubes (XML files), detect common dimensi- database schema design detailed in (Cuzzocrea and ons and measures as well as different dimensions and Moussa, 2013; Cuzzocrea et al., 2013a). measures for each pair of cubes. OLAP hypercube Cube 91 shown in Figure 1 is defined as a transform of Q91 (illustrated in Figure 4.2 Candidates Enumeration 2) into an OLAP hypercube. In the example, Cube 91 is an OLAP cube for Q91 of TPC-DS Benchmark The tuning advisor generates candidate indexes, ma- (Transaction Processing Council, 2013a).

Load more