Oracle® Big Data Connectors User's Guide
Release 4 (4.12)
E93614-03
July 2018

Copyright © 2011, 2018, Oracle and/or its affiliates. All rights reserved.

Primary Author: Frederick Kush

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government.

This software or hardware is developed for general use in a variety of information management applications.
It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.

Contents

Preface
Audience
Documentation Accessibility
Related Documents
Text Conventions
Syntax Conventions
Changes in Oracle Big Data Connectors Release 4.12

Part I Setup

1 Getting Started with Oracle Big Data Connectors
1.1 About Oracle Big Data Connectors
1.2 Big Data Concepts and Technologies
1.2.1 What is MapReduce?
1.2.2 What is Apache Hadoop?
1.3 Downloading and Installing Oracle Big Data Connectors
1.4 Certified Hadoop Platforms
1.5 Oracle SQL Connector for Hadoop Distributed File System Setup
1.5.1 Software Requirements
1.5.2 Installing and Configuring a Hadoop Client on the Oracle Database System
1.5.3 Installing Oracle SQL Connector for HDFS
1.5.4 Oracle Database Privileges for OSCH Users
1.5.5 OS-Level Requirements for OSCH Users
1.5.6 Using Oracle SQL Connector for HDFS on a Secure Hadoop Cluster
1.5.7 Using OSCH in Oracle SQL Developer
1.6 Oracle Loader for Hadoop Setup
1.6.1 Software Requirements
1.6.2 Installing Oracle Loader for Hadoop
1.6.3 Providing Support for Offline Database Mode
1.6.4 Using Oracle Loader for Hadoop on a Secure Hadoop Cluster
1.7 Oracle Shell for Hadoop Loaders Setup
1.7.1 Installing Oracle Shell for Hadoop Loaders on a Hadoop Node
1.7.2 Configuring OHSH to Enable Job Monitoring
1.8 Oracle XQuery for Hadoop Setup
1.8.1 Software Requirements
1.8.2 Installing Oracle XQuery for Hadoop
1.8.3 Troubleshooting the File Paths
1.8.4 Configuring Oozie for the Oracle XQuery for Hadoop Action
1.9 Oracle R Advanced Analytics for Hadoop Setup
1.9.1 Installing the Software on Hadoop
1.9.1.1 Software Requirements for a Third-Party Hadoop Cluster
1.9.1.2 Installing Sqoop on a Third-Party Hadoop Cluster
1.9.1.3 Installing Hive on a Third-Party Hadoop Cluster
1.9.1.4 Installing R on a Hadoop Client
1.9.1.5 Installing R on a Third-Party Hadoop Cluster
1.9.1.6 Installing the ORCH Package on a Third-Party Hadoop Cluster
1.9.2 Installing Additional R Packages
1.9.3 Providing Remote Client Access to R Users
1.9.3.1 Software Requirements for Remote Client Access
1.9.3.2 Configuring the Server as a Hadoop Client
1.9.3.3 Installing Sqoop on a Hadoop Client
1.9.3.4 Installing R on a Hadoop Client
1.9.3.5 Installing the ORCH Package on a Hadoop Client
1.9.3.6 Installing the Oracle R Enterprise Client Packages (Optional)
1.10 Oracle Data Integrator
1.11 Oracle Datasource for Apache Hadoop Setup
1.11.1 Configuring HiveServer2

Part II Oracle Database Connectors

2 Oracle SQL Connector for Hadoop Distributed File System
2.1 About Oracle SQL Connector for HDFS
2.2 Getting Started With Oracle SQL Connector for HDFS
2.3 Configuring Your System for Oracle SQL Connector for HDFS
2.4 Using Oracle SQL Connector for HDFS with Oracle Big Data Appliance and Oracle Exadata
2.5 Using the ExternalTable Command-Line Tool
2.5.1 About ExternalTable
2.5.2 ExternalTable Command-Line Tool Syntax
2.6 Creating External Tables
2.6.1 Creating External Tables with the ExternalTable Tool
2.6.2 Creating External Tables from Data Pump Format Files
2.6.2.1 Required Properties
2.6.2.2 Optional Properties
2.6.2.3 Defining Properties in XML Files for Data Pump Format Files
2.6.2.4 Example
2.6.3 Creating External Tables from Hive Tables
2.6.3.1 Hive Table Requirements
2.6.3.2 Data Type Mappings
2.6.3.3 Required Properties
2.6.3.4 Optional Properties
2.6.3.5 Defining Properties in XML Files for Hive Tables
2.6.3.6 Example
2.6.3.7 Creating External Tables from Partitioned Hive Tables
2.6.4 Creating External Tables from Delimited Text Files
2.6.4.1 Data Type Mappings
2.6.4.2 Required Properties
2.6.4.3 Optional Properties
2.6.4.4 Defining Properties in XML Files for Delimited Text Files
2.6.4.5 Example
2.6.5 Creating External Tables in SQL
2.7 Updating External Tables
2.7.1 ExternalTable Syntax for Publish
2.7.2 ExternalTable Example for Publish
2.8 Exploring External Tables and Location Files
2.8.1 ExternalTable Syntax for Describe
2.8.2 ExternalTable Example for Describe
2.9 Dropping Database Objects Created by Oracle SQL Connector for HDFS
2.9.1 ExternalTable Syntax for Drop
2.9.2 ExternalTable Example for Drop
2.10 More About External Tables Generated by the ExternalTable Tool
2.10.1 About Configurable Column Mappings
2.10.1.1 Default Column Mappings
2.10.1.2 All Column Overrides
2.10.1.3 One Column Overrides
2.10.1.4 Mapping Override Examples
2.10.2 What Are Location Files?
2.10.3 Enabling Parallel Processing
2.10.3.1 Setting Up the Degree of Parallelism
2.10.4 Location File Management
2.10.5 Location File Names
2.11 Configuring Oracle SQL Connector for HDFS
2.11.1 Creating a Configuration File
2.11.2 Oracle SQL Connector for HDFS Configuration Property Reference
2.12 Performance Tips for Querying Data in HDFS

3 Oracle Loader for Hadoop
3.1 What Is Oracle Loader for Hadoop?
3.2 Interfaces to Oracle Loader For Hadoop
3.3 Getting Started With Oracle Loader for Hadoop
3.3.1 Additional Information
3.4 Using Oracle Loader for Hadoop With the Hadoop Command Line Utility
3.4.1 About the Modes of Operation
3.4.1.1 Online Database Mode
3.4.1.2 Offline Database Mode
3.4.2 Creating the Target Table
3.4.2.1 Supported Data Types for Target Tables
3.4.2.2 Supported Partitioning Strategies for Target Tables
3.4.2.3 Compression
3.4.3 Creating a Job Configuration File
3.4.4 About the Target Table Metadata
3.4.4.1 Providing the Connection Details for Online Database Mode
3.4.4.2 Generating the Target Table Metadata for Offline Database Mode
3.4.5 About Input Formats
3.4.5.1 Delimited Text