Simba Apache Drill JDBC Driver Installation and Configuration Guide
Total Page:16
File Type:pdf, Size:1020Kb
Simba Drill JDBC Driver with SQL Connector Installation and Configuration Guide Simba Technologies Inc. Version 1.6.0 October 10, 2018 Installation and Configuration Guide Copyright © 2018 Simba Technologies Inc. All Rights Reserved. Information in this document is subject to change without notice. Companies, names and data used in examples herein are fictitious unless otherwise noted. No part of this publication, or the software it describes, may be reproduced, transmitted, transcribed, stored in a retrieval system, decompiled, disassembled, reverse-engineered, or translated into any language in any form by any means for any purpose without the express written permission of Simba Technologies Inc. Trademarks Simba, the Simba logo, SimbaEngine, and Simba Technologies are registered trademarks of Simba Technologies Inc. in Canada, United States and/or other countries. All other trademarks and/or servicemarks are the property of their respective owners. Contact Us Simba Technologies Inc. 938 West 8th Avenue Vancouver, BC Canada V5Z 1E5 Tel: +1 (604) 633-0008 Fax: +1 (604) 633-0004 www.simba.com www.simba.com 2 Installation and Configuration Guide About This Guide Purpose The Simba Drill JDBC Driver with SQL Connector Installation and Configuration Guide explains how to install and configure the Simba Drill JDBC Driver with SQL Connector on all supported platforms. The guide also provides details related to features of the driver. Audience The guide is intended for end users of the Simba Drill JDBC Driver. Knowledge Prerequisites To use the Simba Drill JDBC Driver, the following knowledge is helpful: l Familiarity with the platform on which you are using the Simba Drill JDBC Driver l Ability to use the data store to which the Simba Drill JDBC Driver is connecting l An understanding of the role of JDBC technologies in connecting to a data store l Experience creating and configuring JDBC connections l Exposure to SQL Document Conventions Italics are used when referring to book and document titles. Bold is used in procedures for graphical user interface elements that a user clicks and text that a user types. Monospace font indicates commands, source code or contents of text files. Note: A text box with a pencil icon indicates a short note appended to a paragraph. Important: A text box with an exclamation mark indicates an important comment related to the preceding paragraph. www.simba.com 3 Installation and Configuration Guide Table of Contents About the Simba Drill JDBC Driver 6 About Apache Drill 6 About the Driver 6 System Requirements 8 Simba Drill JDBC Driver Files 9 Installing and Using the Simba Drill JDBC Driver 10 Referencing the JDBC Driver Libraries 10 Registering the Driver Class 11 Building the Connection URL 12 Configuring Authentication 14 Using Your Drill User Name and Password 14 Using Kerberos 14 Using MapR-SASL 16 Configuring TLS (SSL) Verification 18 Configuring Logging 19 Features 21 Catalog and Schema Support 21 Schema Types and File Formats 21 Specifying Column Names 22 Data Types 24 Casting Binary Data 24 Security and Authentication 25 Driver Configuration Options 26 AuthMech or auth 26 CastAnyToVarchar 27 CatalogSupport 27 DelegationUID 27 enableTLS 28 disableCertificateVerification 28 disableHostVerification 28 ExcludedSchemas 29 GetMetadataWithQueries 29 www.simba.com 4 Installation and Configuration Guide IncludedSchemas 30 KrbHostFQDN or service_host 30 KrbKeytab or keytab 31 KrbPrincipal or principal 31 KrbRealm or realm 31 KrbServiceName or service_name 32 LogLevel 32 LogPath 33 NettyMaxDirectMem 33 NumNettyDirectArenas 34 PWD 34 RemoveQryCatalog 34 sasl_encrypt 35 Schema 35 StringColumnLength 35 TLSHandshakeTimeout 36 TLSProtocol 36 TLSProvider 36 trustStorePassword 37 trustStorePath 37 trustStoreType 37 UID 38 UpdateColMetadataOnSchemaChange 38 Third-Party Trademarks 39 Third-Party Licenses 40 www.simba.com 5 Installation and Configuration Guide About the Simba Drill JDBC Driver About the Simba Drill JDBC Driver About Apache Drill Apache Drill is a low latency distributed query engine capable of querying large datasets from multiple data sources using SQL. The data sources that Drill supports include file systems (local or distributed) and HBase. Drill also integrates with the Hive metastore seamlessly to complement existing Hive environments with low latency queries. Unlike traditional RDBMS or SQL-on-Hadoop solutions that require centralized schema definitions, Drill can query self-describing data as well as complex or multi-structured data that is commonly seen in big data systems. Moreover, Drill does not require a fully structured schema and can support semi-structured or nested data types such as JSON. Apache Drill processes the data in record batches and discovers the schema during the processing of each record batch. Thus, Drill has the capability to support changing schemas over the lifetime of a query. Drill reconfigures its operators and handles these situations to make sure that data is not lost. Apache Drill can support multiple Hive, HBase, and DFS data sources, including CSV, TSV, Parquet, and JSON files. Connections between data sources and Drill can be configured through Drill’s web UI. Note: For information about connecting Apache Drill to data sources, see the Apache Drill documentation: http://drill.apache.org/docs/. About the Driver The Simba Drill JDBC Driver lets organizations connect their BI tools to Drill. Drill provides an ANSI SQL query layer and also exposes the metadata information through an ANSI SQL standard metadata database called INFORMATION_SCHEMA. The Simba Drill JDBC Driver leverages INFORMATION_SCHEMA to expose Drill's metadata to BI tools as needed. The Simba Drill JDBC Driver complies with the JDBC 4.0 and 4.1 data standards. JDBC is one of the most established and widely supported APIs for connecting to and working with databases. At the heart of the technology is the JDBC driver, which connects an application to the database. For more information about JDBC, see Data www.simba.com 6 Installation and Configuration Guide About the Simba Drill JDBC Driver Access Standards on the Simba Technologies website: https://www.simba.com/resources/data-access-standards-glossary. This guide is suitable for users who want to access data residing within Drill from their desktop environment. Application developers might also find the information helpful. Refer to your application for details on connecting via JDBC. www.simba.com 7 Installation and Configuration Guide System Requirements System Requirements Each machine where you use the Simba Drill JDBC Driver must have Java Runtime Environment (JRE) 8.0 installed. www.simba.com 8 Installation and Configuration Guide Simba Drill JDBC Driver Files Simba Drill JDBC Driver Files The Simba Drill JDBC Driver is delivered in a ZIP archive named DrillJDBC41- [Version].zip, where [Version] is the version number of the driver. The archive contains the driver supporting the JDBC API version indicated in the archive name, as well as release notes and third-party license information. In addition, the required third-party libraries and dependencies are packaged and shared in the driver JAR file in the archive. www.simba.com 9 Installation and Configuration Guide Installing and Using the Simba Drill JDBC Driver Installing and Using the Simba Drill JDBC Driver To install the Simba Drill JDBC Driver on your machine, extract the files from the ZIP archive to the directory of your choice. Important: If you received a license file through email, then you must copy the file into the same directory as the driver JAR file before you can use the Simba Drill JDBC Driver. To access a Drill data store using the Simba Drill JDBC Driver, you need to configure the following: l The list of driver library files (see Referencing the JDBC Driver Libraries on page 10) l The Driver or DataSource class (see Registering the Driver Class on page 11) l The connection URL for the driver (see Building the Connection URL on page 12 ) Important: The Simba Drill JDBC Driver provides read-only access to Drill data stores. Referencing the JDBC Driver Libraries Before you use the Simba Drill JDBC Driver, the JDBC application or Java code that you are using to connect to your data must be able to access the driver JAR files. In the application or code, specify all the JAR files that you extracted from the ZIP archive. Using the Driver in a JDBC Application Most JDBC applications provide a set of configuration options for adding a list of driver library files. Use the provided options to include all the JAR files from the ZIP archive as part of the driver configuration in the application. For more information, see the documentation for your JDBC application. Using the Driver in Java Code You must include all the driver library files in the class path. This is the path that the Java Runtime Environment searches for classes and other resource files. For more information, see "Setting the Class Path" in the appropriate Java SE Documentation. www.simba.com 10 Installation and Configuration Guide Installing and Using the Simba Drill JDBC Driver For Java SE 8: l For Windows: http://docs.oracle.com/javase/8/docs/technotes/tools/windows/classpath.html l For Linux and Solaris: http://docs.oracle.com/javase/8/docs/technotes/tools/solaris/classpath.html Registering the Driver Class Before connecting to the data store, you must register the appropriate class for your application. The Simba Drill JDBC Driver provides the following fully-qualified class names (FQCNs) to support