<<

Using the Create Base View From Query Option

Revision 20200810

NOTE This document is confidential and proprietary of Denodo Technologies. No part of this document may be reproduced in any form by any means without prior written authorization of Denodo Technologies.

Copyright © 2021 Denodo Technologies Proprietary and Confidential Using the Create Base View From Query Option 20200810 2 of 15

CONTENTS

1 GOAL...... 3

2 INTRODUCTION...... 4 2.1 INTERPOLATION VARIABLES...... 4 2.2 WHEREEXPRESSION...... 5

3 BEST PRACTICES...... 7 3.1 PROS...... 7 3.2 CONS AND SOFTWARE LIMITATIONS...... 7

4 EXAMPLES OF USE...... 9 4.1 ANALYTICAL FUNCTION (RANK)...... 9 4.2 ANALYTICAL FUNCTION (NTILE)...... 12 Using the Create Base View From Query Option 20200810 3 of 15

1 GOAL

Denodo allows to create base views on top of JDBC and ODBC data sources that, when queried, will execute the SQL query provided in the definition of the base view. This document explains when and how to use this option. The limitations of this functionality will also be explained. Using the Create Base View From Query Option 20200810 4 of 15

2 INTRODUCTION

Developers have two options to create base views in Denodo from JDBC and ODBC sources: ● Graphically: this is the usual way of creating JDBC/ODBC views. After creating the data source, the user has to the tables to create base views. ● SQL Sentence: by using the option ‘Create From Query’, the user specifies a SQL query to be executed when querying the base view. When writing this SQL query, consider the following: ○ This SQL query has to use the syntax of the queried and not the syntax of Denodo. ○ The query may have interpolation variables, which allow the SQL query sent to the database to be parameterized according to the conditions of the query executed over the base view. ○ The SQL query should specify an alias to all the columns of the query that are the result of an expression that has one or more input parameters and at least one is an interpolation variable. For instance: SELECT ABS( ROUND( @value ) ) AS value_abs_round… If an alias is not defined, Denodo will automatically generate a name for the corresponding in the created base view, which can be counter-intuitive and change from one data source to another. ○ If the SQL query invokes a PL/SQL function using an interpolation variable, you also need to assign an alias to the return value. For instance: SELECT function(@INPUTVAR) AS value FROM sales

This document is focused on the ‘SQL Sentence’ option. The following sections explain pros and cons of using this option and some advice on when and how to use it.

2.1 INTERPOLATION VARIABLES Interpolation variables allow the SQL query sent to the database to be parameterized according to the conditions of the query executed on the base view. By using these variables, you do not have to provide the value of a field in an expression at design time (when creating the base view). Instead, you will provide the values of these variables at runtime, when you query the view.

For instance, suppose you have a SALES in a database and you want to create a base view with a SQL Sentence. You could create a base view SALES_VDB by writing a query like this one:

SELECT Function(amount) as AVERAGE FROM SALES WHERE product_id = @product_id and then execute it with different values for the product_id field. Note that the query uses a function in the SELECT clause, this is a typical example of the use of the ‘Create Base View From Query’ option when using a database-proprietary function.

This way, if we execute the following query on the base view in Denodo: Using the Create Base View From Query Option 20200810 5 of 15

SELECT * FROM SALES_VDB WHERE product_id = 2

The Server will substitute the variable @product_id with the value from the WHERE clause of the query and execute:

SELECT Function(amount) AS average FROM SALES WHERE product_id = 2

Each interpolation variable must be related with an attribute belonging to the generated base view, so that the variable can obtain its value at run time. If the SQL query used to access the data source returns any field with the same name as the variable, then the variable will be associated to the corresponding base view attribute. If the SQL query does not return any field with the same name as the variable, Denodo will add a new attribute to the base view, with the same name.

2.2 WHEREEXPRESSION Denodo provides a predefined interpolation variable called WHEREEXPRESSION that simplifies the process of creating a base view from an SQL query. At runtime, the Server will replace WHEREEXPRESSION with the condition sent to the base view.

If we create a base view named SALES_VDB with the following SQL query:

SELECT sales_id, product_id, customer_id, profit_margin as margin, StoredProcedure(amount), sales_date FROM sales WHERE @WHEREEXPRESSION we will be able to execute queries like the following on the base view in Denodo:

SELECT * FROM SALES_VDB WHERE margin > 0 AND getyear(sales_date) > 2010

Note that the user has to specify the actual name of the fields for the aliases used in the SELECT clause of the query. For instance, the SQL query defines the margin alias for profit_margin. Therefore, the user must specify the profit_margin value for the attribute named margin in the VQL query.

The Server will substitute the variable WHEREEXPRESSION with the condition of the WHERE clause of the query:

SELECT sales_id, product_id, customer_id, profit_margin as margin, StoredProcedure(amount), sales_date FROM sales WHERE profit_margin > 0 AND extract (year from sales_date) > 2010

Note that the alias name is translated to its original name when the query is finally executed.

Sometimes, you may need to filter the results of the SQL query using fields that are not projected and, therefore, will not appear in the base view schema. Denodo provides a way to do this. A typical example happens when the SQL query used to Using the Create Base View From Query Option 20200810 6 of 15

create the base view uses GROUP BY. For instance, suppose the SALES table has a field store_country that is not included in the schema of a new virtual view SALES_VDB2 defined as:

SELECT product_id, customer_id, sum(profit_margin) as total_margin FROM sales WHERE @WHEREEXPRESSION GROUP BY product_id, customer_id but you want to filter the results of the SALES table by store_country, you can execute queries like this over the Denodo base view:

SELECT * FROM SALES_VDB2 WHERE store_country = ‘US’ and they will be translated to queries like this executed on the data source:

SELECT product_id, customer_id, sum(profit_margin) as total_margin FROM sales WHERE store_country = ‘US’ GROUP BY product_id, customer_id

In order to make this work, you have to specify STORE_COUNTRY in the option ‘List of field names not included in the above list that can appear in the delegated conditions’ when you create a base view with WHEREEXPRESSION.

To avoid forcing the queries to always have the WHERE clause, Denodo provides the interpolation function ExecuteIfIsNotNull. For more information about this interpolation function and the WHEREEXPRESSION variable, you can check the section Using the WHEREEXPRESSION Variable of the Virtual DataPort Administration Guide.

Please note that the interpolation variable WHEREEXPRESSION cannot be used with ODBC data sources. Only JDBC data sources are allowed. Using the Create Base View From Query Option 20200810 7 of 15

3 BEST PRACTICES

This section will explain the main pros and cons of using the ‘Create from query’ option and some recommendations about when and how to use it.

3.1 PROS You should use this option when: 1. You have complex queries already built. Development process can be faster if you want to import the results of a complex query you have already written (for instance, because you use it for another application, as some BI tool for reporting). You will only have to copy & paste it instead of replicating the hierarchy of views in Denodo. 2. You have to use database-proprietary functions or constructions. ○ If the reason to create a view with a SQL Sentence is because you want to use a function not included in Denodo, and you are going to use it in several views, it is better to create a delegable custom function instead (when possible). Read the section Developing Custom Functions of the Developer Guide to learn how to develop custom functions. 3. You have an SQL query that has been tuned to perform better than the query that Denodo will execute. For instance, you may want to provide a certain optimization hint to the database using its proprietary syntax. 4. You have to execute a of the database or a query that uses a stored procedure, for where this is not graphically supported (currently, only Oracle, SQL Server and DB2 are supported).

Keep also in mind that, when possible you should enable the option ‘Delegate SQL sentence as subquery’(enabled by default starting from Denodo 6.0), located in Options > Search methods > wrapper source configuration in VDP Admin Tool, for the base views created using this option in order to let Denodo delegate more queries to the data source (as shown in Figure 4 and Figure 5). One limitation on this option is explained in the following section.

3.2 CONS AND SOFTWARE LIMITATIONS 1. Using a SQL sentence may be harder to maintain and reuse: ○ For instance, suppose that you have many views that use the same subquery in their definitions; if the subquery changes, you will need to manually modify it in every view. Conversely, if the subquery were a Denodo view, you would only have to do the modification in one place. A better alternative in this case could be using an hybrid approach where the common subquery is encapsulated in a different view (see Figure 1). Using the Create Base View From Query Option 20200810 8 of 15

SELECT SELECT SELECT Modify 3 a, b f1, f2 a, b views FROM FROM FROM (SELEC (SELECT (SELEC T ...) t ...) t T ...) t SELECTWHERE . JOIN ... a, b .. FROM t SELECT Modify 1 f1, f2 SELEC view FROM t T ... WHERE . SELECT .. a, b FROM t FigureJOIN 1 .... Lack of encapsulation vs. hybrid approach 2. You will not benefit from the tree view, used by, data lineage and propagation of changes functionalities among the tables used in the SQL Sentence. The query is a ‘black box’ for Denodo. 3. As stated before, in order to delegate more queries over the base view, it is recommended to enable the option ‘Delegate SQL sentence as subquery’. This option has some limitations: ○ Limitations in delegation: if the SQL Sentence includes mandatory fields (i.e, it has interpolation variables) joins with other views from the same data source will not be delegated. ○ With this option enabled, you should not use interpolation variables or WHEREEXPRESSION. If you do, please, make sure you need them because of one of the following two reasons (otherwise, them): ■ When you want to parameterize expressions not supported by Denodo. Eg: parameterize the number of divisions in the percentile functions, as shown in Figure 6. ■ When you want to parameterize queries with conditions whose result is different if executed above or under the query (e.g: group by, outer joins). This way you have two options: ● Use interpolation variables: has the inconvenience that the condition is wired up. ● Use a WHEREEXPRESSION: the condition is not wired up, but you cannot use the ‘Delegate SQL sentence as subquery’ option. Using the Create Base View From Query Option 20200810 9 of 15

4 EXAMPLES OF USE

4.1 ANALYTICAL FUNCTION (RANK) In this example we want to retrieve the sales grouped by customer, and for each group, list the results in a ranking ordered by each sale’s profit. This way, the first tuples of each group will show the best customer’s sales. Our data source is an and we want to make use of the rank analytical function. Figure 2 shows how to create a base view in Denodo for this:

Figure 2. Example of SQL Sentence with rank function

Now, we can execute queries like:

SELECT * FROM rank_of_sales_by_customer WHERE ranking <= 5 Using the Create Base View From Query Option 20200810 10 of 15

in order to get the top #5 sales grouped by customer.

Figure 3. Result of query: top #5 sales by customer

Figure 4 and Figure 5 show the difference in delegation depending on whether the option ‘Delegate SQL Sentence as Sub Query’ is enabled or not: Using the Create Base View From Query Option 20200810 11 of 15

Figure 4. Execution trace with ‘Delegate SQL Sentence as Sub Query’ set to false: filtering executed by Denodo

Figure 5. Tree view with ‘Delegate SQL Sentence as Sub Query’ set to true: filtering executed by Oracle Using the Create Base View From Query Option 20200810 12 of 15

When the ‘Delegate SQL Sentence as Sub Query’ option is enabled, the WHERE clause is delegated to Oracle, so the filter is done directly in the source, with the corresponding positive impact on performance, because less tuples are moved through the network from the sources to Denodo.

4.2 ANALYTICAL FUNCTION (NTILE) In this example we use the function ntile to divide the results of the previous example in a number of tiles (this number is specified by the user). So the query will be:

Figure 6. NTILE function with an interpolation variable Using the Create Base View From Query Option 20200810 13 of 15

Figure 7. Results of query with 4 tiles

4.2.1 Difference filtering above or under the SQL Sentence view Results from queries may differ if the query conditions are directly defined in the base view created by means of a SQL Sentence or if they are applied over that query.

For instance, consider the base view ntile_of_sales_by_customer_parameterized (where the filtering by ss_net_profit is directly defined in the SQL query) defined as:

SELECT ss_item_sk, ss_ticket_number, c_customer_sk customer_id, c_first_name customer_first_name, c_last_name customer_last_name, ss_net_profit, rank() OVER( by c_customer_sk ss_net_profit DESC) AS ranking, ntile(@num_of_tiles) OVER(partition by c_customer_sk ORDER BY ss_net_profit DESC) AS ntile FROM EDW_TPCDS.CUSTOMER INNER JOIN ADMIN.STORE_SALES ON (c_customer_sk = ss_customer_sk) WHERE ss_net_profit > @net_profit GROUP BY c_customer_sk, c_first_name, c_last_name, ss_item_sk, ss_ticket_number, ss_net_profit Using the Create Base View From Query Option 20200810 14 of 15

If we execute the query:

SELECT * FROM ntile_of_sales_by_customer_parameterized WHERE num_of_tiles = 4 AND ss_net_profit > 1000

We get the following results:

Figure 8. Filtering before creating groups

The result is different if we execute the same query over the previously created view ntile_of_sales_by_customer:

SELECT * FROM ntile_of_sales_by_customer WHERE num_of_tiles = 4 AND ss_net_profit > 1000 Using the Create Base View From Query Option 20200810 15 of 15

Figure 9. Filtering after creating groups

This is because in the first case, the tiles are calculated without the tuples with ss_net_profit <= 1000 and in the second case with them (although they are filtered later from the final results).