How to Connect to Apache Drill from Denodo


Revision 20200610

NOTE: This document is confidential and proprietary to Denodo Technologies. No part of this document may be reproduced in any form by any means without prior written authorization of Denodo Technologies.

Copyright © 2021 Denodo Technologies. Proprietary and Confidential.

CONTENTS

1 GOAL
2 OVERVIEW
2.1 DRILL SETUP
2.2 CREATING A JDBC DATA SOURCE
2.3 CREATING BASE VIEWS

1 GOAL

This document explains how to integrate Apache Drill into the Denodo Platform using its JDBC driver.

2 OVERVIEW

Apache Drill is a distributed, schema-free SQL query engine that can connect to data stored in multiple formats (Avro, Parquet, CSV, JSON, ...) and in multiple datastores, and expose all of them through a common SQL interface.

Because Drill was started by MapR (many of Drill's committers and contributors have come from MapR) and is included in the MapR Data Platform, this document takes MapR as the reference environment.

2.1 DRILL SETUP

Check which storage plugins are configured by visiting the Drill Web Console at https://<HOST>:8047/storage. For this guide we have enabled the storage plugins that allow the MapR Filesystem, MapR Database and Hive to serve as data sources for Drill:

[Figure: Drill Storage Plugins configuration]

The data sources and formats supported by Drill on MapR are listed at:
https://mapr.com/docs/61/Drill/drill_storage_and_format_plugin_support_matrix.html

2.2 CREATING A JDBC DATA SOURCE

Download the JDBC driver

Before you can connect to your Apache Drill instance, you need to download a JDBC driver. There are two options:

1. The Drill JDBC driver provided by MapR.
   a. This driver supports the MapR-SASL and Plain authentication mechanisms.
   b. Note: the current version of this driver, MapRDrill_jdbc_v1.6.0.1001, does not work with Java 9 or newer due to a bug in the driver. Although the bug was fixed in Drill 1.15, the MapR driver has not yet been updated to include the fix. If you are on Denodo 8.0 (which uses Java 11), use the second option until an updated MapR driver is available. See https://issues.apache.org/jira/browse/DRILL-6349.

2. The Apache Drill JDBC driver (the drill-jdbc-all.jar file) included with the Drill distribution, available from http://apache.osuosl.org/drill.
   a. This driver supports the Kerberos and Plain authentication mechanisms, but does not support MapR-SASL.

There is no need to install both of them, but if you face issues with one, you can try the other.

Modify the JDBC driver

Before installing the JDBC driver, you need to modify it, because both drivers bundle dependencies that conflict with the ones included in the Denodo Platform. If you do not follow these steps, you will receive different kinds of error messages when testing the connection and will not be able to connect. Depending on the driver you use, apply one of these changes:

1. Drill JDBC driver provided by MapR

   ! Important: the slf4j dependency must be deleted from the MapRDrillJDBC41-1.<version> directory, otherwise Denodo will fail to load the driver. The jar file to delete from that directory is slf4j-api-<version>.jar.

2. Apache Drill driver

   ! Important: the org.slf4j, javax.xml, org.w3c and org.xml dependencies must be deleted from the drill-jdbc-all.jar file, otherwise Denodo will fail with class-loading errors. Open the drill-jdbc-all.jar file as an archive and delete the following folders from it (or script the cleanup as sketched below):

   ● org/slf4j
   ● org/w3c
   ● org/xml
   ● javax/xml
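For repeatability, the removal of those four packages can be scripted instead of done by hand in an archive tool. Below is a minimal sketch using the JDK's built-in zip file system; the jar path argument is an example, and running it against a copy of the jar is recommended:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

public class StripDrillJdbcJar {

    // Packages that conflict with the Denodo Platform's bundled dependencies.
    private static final String[] CONFLICTING_PACKAGES = {
            "/org/slf4j", "/org/w3c", "/org/xml", "/javax/xml"
    };

    public static void main(String[] args) throws IOException {
        // Pass the path to a copy of the driver, e.g. "drill-jdbc-all.jar".
        Path jar = Paths.get(args[0]);

        // Mount the jar as a writable zip file system.
        try (FileSystem zip = FileSystems.newFileSystem(jar, (ClassLoader) null)) {
            for (String pkg : CONFLICTING_PACKAGES) {
                Path dir = zip.getPath(pkg);
                if (Files.notExists(dir)) {
                    continue;
                }
                // Delete children before parents: a zip directory entry must be
                // empty before it can be removed.
                try (Stream<Path> entries = Files.walk(dir)) {
                    entries.sorted(Comparator.reverseOrder()).forEach(p -> {
                        try {
                            Files.delete(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
                }
            }
        }
    }
}
```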
Install the JDBC driver

Starting with Denodo 8.0, the preferred way of installing the JDBC driver is to upload it through the Administration Tool, under File > Extension Management > Libraries > Import. Select jdbc_other, provide a custom name so the driver can be referenced later, and import all the jars needed to connect to Drill.

In previous versions, the recommended way is to copy the driver jars to the folder <DENODO_HOME>/lib-external/jdbc-drivers/<database name - version>; later, you provide this '<database name - version>' value in the driver class path.

Create the JDBC data source

From the Web Design Studio or the Virtual DataPort Administration Tool, create a new JDBC data source by selecting File > New > Data source > JDBC. This opens a wizard to create a connection with a JDBC driver. Fill in all the required fields:

1. Database adapter: Generic.

2. Driver class: depending on the chosen driver, specify either com.mapr.drill.jdbc41.Driver for the MapR driver or org.apache.drill.jdbc.Driver for the Apache Drill driver.

3. Driver class path: select the driver class path uploaded previously or, in previous versions, copy the driver jars to the folder <DENODO_HOME>/lib-external/jdbc-drivers/<database name - version> and fill in the wizard field with the value '<database name - version>'.

   ! Important: the MapR native library is included in the MapR Drill driver and can be loaded only once. Therefore, if you plan to access other MapR sources from Denodo, such as:

   ● MapR FileSystem with the HDFS Custom Wrapper
   ● MapR Database with the HBase Custom Wrapper
   ● MapR Event Store with the Kafka Custom Wrapper

   you have to use the same class path to configure all the custom wrappers and the Drill JDBC driver. With this configuration Denodo can reuse the same classloader and load the native library only once.

4. Database URI. There are two possibilities:

   i. Recommended: connect through ZooKeeper. ZooKeeper returns to the client the available Drillbits in the cluster to which the query can be submitted. The URI has the form:

      jdbc:drill:zk=<zk.connect>/<drill_directory_in_zookeeper>/<cluster_ID>

      With the cluster settings in the configuration file <DRILL-HOME>/conf/drill-distrib.conf, the JDBC URI becomes:

      jdbc:drill:zk=maprdemo:5181/drill/demo_mapr_com-drillbits

   ii. Connect directly to a specific Drillbit. This is generally not recommended, because it hardcodes a single node in the client:

      jdbc:drill:drillbit=<host>:<port>

5. Delimiter: the default identifier quote character in Drill is the backtick. Therefore, in the Read & Write tab of the data source configuration, set the backtick (`) as the Delimiter Identifier.

   [Figure: Drill Delimiter configuration]

6. Ping query: enable Test connections and configure SELECT 1 under Connection Pool Configuration > Ping query.

   [Figure: Drill Ping Query configuration]

7. Source Configuration: Drill does not support prepared-statement dynamic parameters. Therefore, set Allow Literal as Parameters to false in the Source Configuration tab of the data source (this behavior is demonstrated in the sketch after this list).

   [Figure: Drill Source Configuration]

   Otherwise, you will receive the following errors when adding a WHERE clause to filter the data returned by a base view:

   ● MapR driver:
     - [*]SQLE [MapR][JDBC](10940) Invalid parameter index: 1
     - [MapR][DrillJDBCDriver](500980) Encountered error while creating prepared statement. Details: PLAN ERROR: Cannot convert RexNode to equivalent Drill expression. RexNode Class: org.apache.calcite.rex.RexDynamicParam, RexNode Digest: ?0

   ● Apache driver:
     - Received exception with message 'Prepared-statement dynamic parameters are not supported.'
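Before creating base views, it can help to verify the driver, URI and query behavior outside Denodo with a small standalone program. A minimal sketch, assuming the Apache Drill driver and the placeholder sandbox URI from above; hive.`orders` and its columns are hypothetical names used only for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillDataSourceCheck {

    public static void main(String[] args) throws Exception {
        // The driver class configured in the wizard (use
        // com.mapr.drill.jdbc41.Driver instead for the MapR driver).
        Class.forName("org.apache.drill.jdbc.Driver");

        // ZooKeeper-based URI, as recommended above; host, port and cluster ID
        // are the placeholders from the MapR sandbox example.
        String url = "jdbc:drill:zk=maprdemo:5181/drill/demo_mapr_com-drillbits";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {

            // The same query configured as the Denodo ping query.
            try (ResultSet ping = stmt.executeQuery("SELECT 1")) {
                ping.next();
                System.out.println("Ping OK: " + ping.getInt(1));
            }

            // Filters must arrive as inlined literals, which is what Denodo
            // generates once Allow Literal as Parameters is set to false.
            // Identifiers are quoted with backticks, matching the delimiter
            // configured above. hive.`orders` is a hypothetical table.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT `order_id`, `total` FROM hive.`orders` "
                    + "WHERE `total` > 100 LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + " -> " + rs.getDouble(2));
                }
            }

            // By contrast, a dynamic parameter would be rejected by Drill:
            // conn.prepareStatement("SELECT `order_id` FROM hive.`orders` WHERE `total` > ?")
            // fails with "Prepared-statement dynamic parameters are not supported."
        }
    }
}
```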
[Figure: Drill data source with MapR JDBC Driver]

Note: due to the aforementioned bug in driver MapRDrill_jdbc_v1.6.0.1001, this figure shows an example configuration that works for Denodo 7.0 but not for Denodo 8.0 (see https://issues.apache.org/jira/browse/DRILL-6349). If you plan to use the MapR driver, make sure its release includes this Drill fix. As of this writing, there is no updated MapR driver that works with Java 9 or newer.

[Figure: Drill data source with Apache Drill JDBC Driver]

2.3 CREATING BASE VIEWS

Once the data source pointing to Drill is configured, we can create views for accessing the desired entities. Denodo displays a tree with the schemas available in Drill. In this example it includes the Hive and MapR Database schemas, as well as the workspaces configured in the file system storage plugin:

[Figure: Drill schemas]

● For Hive databases you can inspect the full schema, its tables and their fields, because the metadata is available in the Hive metastore. Select the tables you want to import and click the Create selected button.

● For MapR Database you can inspect the schema only partially. Wide-column NoSQL databases can be schema-less by design: every row has its own set of column name-value pairs in a given column family, and a column value can be of any data type. Drill represents this variable name-value structure with the MAP complex type. In the figure above, day, hour and total are column families and are represented as MAP (see the query sketch below). As with Hive, select the tables you want to import and click the Create selected button.
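To make the MAP representation concrete, here is how such a column family can be addressed directly through the Drill JDBC connection, using backtick-quoted dot notation to reach a value inside the MAP. A minimal sketch under stated assumptions: the table dfs.`/tables/clicks`, the column family `day` and the column `total` inside it are all hypothetical, and the URI is the sandbox placeholder used earlier:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillMapColumnQuery {

    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper URI from the data source configuration above.
        String url = "jdbc:drill:zk=maprdemo:5181/drill/demo_mapr_com-drillbits";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // dfs.`/tables/clicks` is a hypothetical MapR-DB table exposed
             // through the file system storage plugin; `day` is a column family
             // surfaced as a MAP, and `total` a column inside it. The
             // backtick-quoted dot notation reaches into the MAP value.
             ResultSet rs = stmt.executeQuery(
                     "SELECT t.`day`.`total` AS day_total "
                     + "FROM dfs.`/tables/clicks` t LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getObject("day_total"));
            }
        }
    }
}
```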