OLAP Expressions Are an Extremely Powerful Tool in SQL That Enable
OLAP expressions are an extremely powerful tool in SQL that enable advanced reporting features such as ranking, counting, averaging, adding, and more within a set of data processed in an SQL statement. This feature allows data to be aggregated based upon values in a query, in a manner very similar to coding control breaks in a program. Entire programs, or even applications, can therefore be replaced by much more flexible and portable SQL statements. Reduce programming time and complexity, and improve flexibility and performance, by deploying OLAP expressions. This session will show you how!

If you’ve kept up with IT news in recent times you’ll easily agree that analytics is a hot topic. The amount of data stored in our operational systems is increasing on a daily basis, and management is learning that this information can and should be harnessed so that the business can make quick decisions about things such as sales directions, talent acquisition, cost containment, and more! One of the biggest challenges is to produce answers to these questions that use the most current information, are inexpensive and easy to create, and arrive quickly. Great expense is often incurred in moving data, creating data warehouses, and using specialized software to produce various reports. In addition, these reporting tools often issue complex and redundant SQL to the data server, which can result in excessive reporting costs. Having OLAP functionality built into the DB2 engine can help reduce some of the operational and software costs associated with getting answers to complex questions. This functionality can be used in data warehouses, but also against OLTP databases with equal results. It is one more tool in the IT department’s tool box for answering complex business questions.

Analytics is a fast-growing segment of database (and non-database) processing.
DB2 has the ability to perform analytics via built-in expressions. Once again, this means that instead of purchasing an expensive product, or writing thousands of lines of code, you can simply write an SQL statement that does the processing for you and creates output that is report ready! This type of processing is called Online Analytical Processing (OLAP). The constructs within the DB2 engine can be referred to as:
• OLAP expressions
• OLAP specification
• OLAP functions
• Window functions

DB2 provides several OLAP-specific functions, as well as a host of aggregate functions in support of OLAP expressions. Each of these functions returns a scalar result to the row being processed. The operations supporting OLAP processing can process a single row, multiple rows, or an entire result set in calculating the scalar value returned. A feature of this type of processing is the window. A window is a logical grouping of data within the result set, and the default window is the entire result set. Within a window, OLAP processing can number or rank rows based upon an ordering. In addition, values can be aggregated over an entire window or over a grouping within a window. Multiple OLAP functions can be specified in a SELECT clause, mixing numbering, ranking, and aggregation. This results in some extremely powerful and flexible data analytics within the SQL language.

The key aspects of OLAP processing are the concepts of windowing and ordering. As stated before, a window is a portion or grouping of the data in the result set. If no window is specified then the default window is the entire result set, and any ordering is applied to the entire result. If a window is specified then any ordering is within that window, and thus any calculations are based only upon the data in that window. You can specify many OLAP expressions in a single query, each of which can have its own independent windowing and ordering.
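To make the idea of several OLAP expressions with independent windows concrete, here is a minimal sketch. It is not taken from the session slides: the four EMP rows are hypothetical, and SQLite (through Python's standard sqlite3 module, which needs SQLite 3.25 or later for window functions) stands in for DB2 so that the query can be run anywhere. The OVER(...) syntax shown is the same form these notes use for DB2.

```python
import sqlite3

# Hypothetical sample data (not from the presentation).
ROWS = [
    ("ADAMS", "A00", 50000),
    ("BAKER", "A00", 40000),
    ("CLARK", "B01", 60000),
    ("DAVIS", "B01", 35000),
]

# Four OLAP expressions in one SELECT, each with its own window and ordering.
QUERY = """
SELECT LASTNAME,
       WORKDEPT,
       SALARY,
       ROW_NUMBER() OVER (ORDER BY SALARY DESC)   AS OVERALL_NUM,
       ROW_NUMBER() OVER (PARTITION BY WORKDEPT
                          ORDER BY SALARY DESC)   AS DEPT_NUM,
       SUM(SALARY)  OVER (PARTITION BY WORKDEPT)  AS DEPT_TOTAL,
       SUM(SALARY)  OVER ()                       AS GRAND_TOTAL
FROM EMP
ORDER BY LASTNAME
"""


def run_query():
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")  # SQLite is an assumption, standing in for DB2
    conn.execute(
        "CREATE TABLE EMP (LASTNAME TEXT, WORKDEPT TEXT, SALARY INTEGER)"
    )
    conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = run_query()
for row in result:
    print(row)
```

Each row carries its position in the whole result, its position inside its department, the department total, and the grand total, all without a GROUP BY and without collapsing any rows.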
The first OLAP expression to explore is the numbering specification. Row numbering is the easiest concept to understand, as it does exactly what its name implies: it numbers rows in the output. Since windowing and ordering can be applied to row numbering, it is the perfect function to use to learn about these features. Numbering is enabled via the ROW_NUMBER() function. There are no parameters to this function. One extremely important thing to remember is that row numbering is independent of the final ordering of the result. You can number within windows and you can also apply an order to the numbering, but unless an order is specified for the numbering itself, the numbers are assigned arbitrarily. Despite the limited functionality, this function can be extremely useful for things such as determining the minimum and maximum row according to an order, data sampling, and pagination (although there are some performance implications).

OLAP specification is best taught by example. Let’s start with a simple process and add to it as we go along. OLAP specification allows for numbering of the result set. This numbering can be according to a specified order, or not. It can also be applied to something called a “partition” or “window” of the result table. The entire result set can be a window, and that’s what is happening in this example. Here we are selecting data from the employee table, returning the lastname and salary of our employees. We’ve specified that the result will be ordered by the lastname column. We’ve also specified the ROW_NUMBER() window function in the final SELECT of the statement. The ROW_NUMBER() function tells DB2 that the output rows are to be numbered according to the ordering applied to the function, starting with the number 1 and adding 1 for each additional row returned. If no ORDER BY is specified in the window then the numbering is arbitrary with respect to the order of the result table.
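The slide queries themselves are not reproduced in these notes, so the following is only a sketch of a query of the shape described: lastname and salary from an employee table, a final ORDER BY on LASTNAME, and ROW_NUMBER() both with and without an ordering of its own. The table contents are hypothetical, and SQLite (via Python's standard sqlite3 module, SQLite 3.25 or later) is assumed as a stand-in for DB2.

```python
import sqlite3

# Hypothetical sample data (not from the presentation).
ROWS = [
    ("ADAMS", 40000),
    ("BAKER", 60000),
    ("CLARK", 30000),
    ("DAVIS", 50000),
]

QUERY = """
SELECT LASTNAME,
       SALARY,
       ROW_NUMBER() OVER ()                      AS ARBITRARY_NUM,
       ROW_NUMBER() OVER (ORDER BY SALARY DESC)  AS SALARY_NUM
FROM EMP
ORDER BY LASTNAME
"""


def run_query():
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")  # SQLite is an assumption, standing in for DB2
    conn.execute("CREATE TABLE EMP (LASTNAME TEXT, SALARY INTEGER)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = run_query()
for row in result:
    print(row)
```

SALARY_NUM follows the salary ordering regardless of the final ORDER BY LASTNAME. ARBITRARY_NUM is guaranteed only to use each of the numbers 1 through 4 exactly once; which row receives which number is up to the database engine, so any apparent agreement with the output order is a coincidence.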
Here, specifically, we said:

    ROW_NUMBER() OVER()

We have specified no window and no ordering, so the rows are numbered arbitrarily in the result set. The ORDER BY clause of the final SELECT (the only SELECT in this example) has no meaning for the numbering. So don’t be fooled by a coincidental numbering in the order of the result.

In this example we have specified:

    ROW_NUMBER() OVER(ORDER BY SALARY DESC)

There is no window specified, so the numbering is over the entire result set. However, we have specified the order in which the rows are to be numbered. So the rows are numbered across the entire result set in descending order of the SALARY column. Each row gets a number one greater than the row before it in that order. Also notice that the ORDER BY clause of the final result table dictates an order by LASTNAME. So the numbering is in a different sequence (SALARY DESC) than the result set (LASTNAME ASC). Already it’s becoming clear that we can create some outstanding reports simply from SQL. Cool!

In this example we have numbered the result over the entire result set, so our window is the entire result table. We have numbered according to the SALARY column descending, and also ordered the result by the SALARY column descending. So our result table is in the same order as the numbers.

This example demonstrates numbering the entire result set in one order (SALARY DESC) while ordering that result set in a different order (WORKDEPT ASC, SALARY DESC).

It’s critical to the understanding of OLAP processing to understand the idea of windows, keeping in mind that windows can also be called partitions or groups. Basically, a window is a logical grouping of data based upon a key value. That key value is determined by specifying one or more expressions derived from the columns of the table or tables referenced in the FROM clause.
For example:

    PARTITION BY WORKDEPT

will create one window for each department in the employee table. The window function is then applied inside each window defined by each key value. Any ordering specified within the expression is applied within the scope of each window. In the following example, employees within a department are ordered by the date they were hired:

    PARTITION BY WORKDEPT ORDER BY HIREDATE

In this example partitioning, also called windowing, has been introduced. The specification of what the numbering is over is:

    OVER(PARTITION BY EMP.WORKDEPT ORDER BY EMP.SALARY DESC)

This tells DB2 that the result table is to be divided up by the values of the WORKDEPT column, and that within each of those “windows” the rows are numbered by the SALARY column in descending sequence. So the numbering is no longer over the entire result set; instead it starts afresh inside each partition or window. The result table is also ordered by the same two columns in the same sequence, as specified by the ORDER BY clause of the final SELECT (the only SELECT in this case). So the numbering of the rows appears consistent with the ordering of the output. The numbering is simply that: a numbering. It takes no account of the data values in the result table, and each number is simply 1 more than that of the previous row within the window. So, even though Nicholls and Natz have the same salary, they do not receive the same row number.

Ranking differs from numbering in that if two or more rows within the window are not distinct they will receive the same rank.
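The partitioned numbering described above, and the way it contrasts with ranking when salaries tie, can be sketched as follows. This is an illustration under stated assumptions rather than the query from the slides: the rows are hypothetical (CLARK and DAVIS are given the same salary to play the part of the tied employees), RANK() is used as the ranking function, and SQLite (via Python's standard sqlite3 module, SQLite 3.25 or later) stands in for DB2.

```python
import sqlite3

# Hypothetical sample data; CLARK and DAVIS deliberately share a salary.
ROWS = [
    ("ADAMS", "A00", 50000),
    ("BAKER", "A00", 40000),
    ("CLARK", "D11", 30000),
    ("DAVIS", "D11", 30000),
    ("EVANS", "D11", 25000),
]

QUERY = """
SELECT WORKDEPT,
       LASTNAME,
       SALARY,
       ROW_NUMBER() OVER (PARTITION BY WORKDEPT
                          ORDER BY SALARY DESC)  AS ROW_NUM,
       RANK()       OVER (PARTITION BY WORKDEPT
                          ORDER BY SALARY DESC)  AS SALARY_RANK
FROM EMP
ORDER BY WORKDEPT, SALARY DESC, LASTNAME
"""


def run_query():
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")  # SQLite is an assumption, standing in for DB2
    conn.execute(
        "CREATE TABLE EMP (LASTNAME TEXT, WORKDEPT TEXT, SALARY INTEGER)"
    )
    conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = run_query()
for row in result:
    print(row)
```

Numbering restarts at 1 in each department. Within D11 the two tied rows receive row numbers 1 and 2 in an order the engine chooses, while both receive rank 1, and the next distinct salary receives rank 3.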