Oracle® Fusion Middleware Using Oracle GoldenGate for Big Data

Release 19c (19.1.0.0)

F19523-12

April 2021

Copyright © 2015, 2021, Oracle and/or its affiliates.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial computer software documentation" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, reproduction, duplication, release, display, disclosure, modification, preparation of derivative works, and/or adaptation of i) Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs), ii) Oracle computer documentation and/or iii) other Oracle data, is subject to the rights and limitations specified in the license contained in the applicable contract. The terms governing the U.S. Government’s use of Oracle cloud services are defined by the applicable contract for such services. No other rights are granted to the U.S. Government.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Inside are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Epyc, and the AMD logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.

Contents

Preface Audience xx Documentation Accessibility xx Conventions xx Related Information xxi

1 Introducing Oracle GoldenGate for Big Data 1.1 What’s Supported in Oracle GoldenGate for Big Data? 1-1 1.1.1 Verifying Certification, System, and Interoperability Requirements 1-1 1.1.2 What are the Additional Support Considerations? 1-2 1.2 Setting Up Oracle GoldenGate for Big Data 1-4 1.2.1 About Oracle GoldenGate Properties Files 1-4 1.2.2 Setting Up the Java Runtime Environment 1-4 1.2.3 Configuring Java Virtual Machine Memory 1-5 1.2.4 Grouping Transactions 1-6 1.3 Configuring Oracle GoldenGate for Big Data 1-6 1.3.1 Running with Replicat 1-6 1.3.1.1 Configuring Replicat 1-7 1.3.1.2 Adding the Replicat Process 1-7 1.3.1.3 Replicat Grouping 1-7 1.3.1.4 About Replicat Checkpointing 1-8 1.3.1.5 About Initial Load Support 1-8 1.3.1.6 About the Unsupported Replicat Features 1-8 1.3.1.7 How the Mapping Functionality Works 1-8 1.3.2 Overview of Logging 1-8 1.3.2.1 About Replicat Process Logging 1-9 1.3.2.2 About Java Layer Logging 1-9 1.3.3 About Schema Evolution and Metadata Change Events 1-10 1.3.4 About Configuration Property CDATA[] Wrapping 1-10 1.3.5 Using Regular Expression Search and Replace 1-11 1.3.5.1 Using Schema Data Replace 1-11 1.3.5.2 Using Content Data Replace 1-12

1.3.6 Scaling Oracle GoldenGate for Big Data Delivery 1-13 1.3.7 Configuring Cluster High Availability 1-16 1.3.8 Using Identities in Oracle GoldenGate Credential Store 1-17 1.3.8.1 Creating a Credential Store 1-17 1.3.8.2 Adding Users to a Credential Store 1-17 1.3.8.3 Configuring Properties to Access the Credential Store 1-18

2 Using the BigQuery Handler 2.1 Detailing the Functionality 2-1 2.1.1 Data Types 2-1 2.1.2 Operation Modes 2-2 2.1.3 Operation Processing Support 2-3 2.1.4 Proxy Settings 2-4 2.2 Setting Up and Running the BigQuery Handler 2-4 2.2.1 Schema Mapping for BigQuery 2-4 2.2.2 Understanding the BigQuery Handler Configuration 2-4 2.2.3 Review a Sample Configuration 2-6 2.2.4 Configuring Handler Authentication 2-7

3 Using the Cassandra Handler 3.1 Overview 3-1 3.2 Detailing the Functionality 3-2 3.2.1 About the Cassandra Data Types 3-2 3.2.2 About Catalog, Schema, Table, and Column Name Mapping 3-3 3.2.3 About DDL Functionality 3-4 3.2.3.1 About the Keyspaces 3-4 3.2.3.2 About the Tables 3-4 3.2.3.3 Adding Column Functionality 3-5 3.2.3.4 Dropping Column Functionality 3-5 3.2.4 How Operations are Processed 3-6 3.2.5 About Compressed Updates vs. Full Image Updates 3-6 3.2.6 About Primary Key Updates 3-7 3.3 Setting Up and Running the Cassandra Handler 3-7 3.3.1 Understanding the Cassandra Handler Configuration 3-8 3.3.2 Review a Sample Configuration 3-10 3.3.3 Configuring Security 3-11 3.4 About Automated DDL Handling 3-11 3.4.1 About the Table Check and Reconciliation Process 3-11 3.4.2 Capturing New Change Data 3-12

3.5 Performance Considerations 3-12 3.6 Additional Considerations 3-13 3.7 Troubleshooting 3-13 3.7.1 Java Classpath 3-14 3.7.2 Logging 3-14 3.7.3 Write Timeout Exception 3-14 3.7.4 Logging 3-15 3.7.5 Datastax Driver Error 3-15

4 Using the Elasticsearch Handler 4.1 Overview 4-1 4.2 Detailing the Functionality 4-1 4.2.1 About the Elasticsearch Version Property 4-2 4.2.2 About the Index and Type 4-2 4.2.3 About the Document 4-2 4.2.4 About the Primary Key Update 4-2 4.2.5 About the Data Types 4-3 4.2.6 Operation Mode 4-3 4.2.7 Operation Processing Support 4-3 4.2.8 About the Connection 4-3 4.3 Setting Up and Running the Elasticsearch Handler 4-4 4.3.1 Configuring the Elasticsearch Handler 4-4 4.3.2 About the Transport Client Settings Properties File 4-11 4.4 Performance Consideration 4-11 4.5 About the Shield Plug-In Support 4-11 4.6 About DDL Handling 4-12 4.7 Troubleshooting 4-12 4.7.1 Incorrect Java Classpath 4-12 4.7.2 Elasticsearch Version Mismatch 4-12 4.7.3 Transport Client Properties File Not Found 4-13 4.7.4 Cluster Connection Problem 4-13 4.7.5 Unsupported Truncate Operation 4-13 4.7.6 Bulk Execute Errors 4-13 4.8 Logging 4-14 4.9 Known Issues in the Elasticsearch Handler 4-14

5 Using the File Writer Handler 5.1 Overview 5-1 5.1.1 Detailing the Functionality 5-2

5.1.1.1 Using File Roll Events 5-2 5.1.1.2 Automatic Directory Creation 5-3 5.1.1.3 About the Active Write Suffix 5-4 5.1.1.4 Maintenance of State 5-4 5.1.1.5 Using Templated Strings 5-4 5.1.2 Configuring the File Writer Handler 5-6 5.1.3 Review a Sample Configuration 5-14

6 Using the HDFS Event Handler 6.1 Detailing the Functionality 6-1 6.1.1 Configuring the Handler 6-1 6.1.2 Configuring the HDFS Event Handler 6-1 6.1.3 Using Templated Strings 6-3

7 Using the Optimized Row Columnar Event Handler 7.1 Overview 7-1 7.2 Detailing the Functionality 7-1 7.2.1 About the Upstream Data Format 7-1 7.2.2 About the Library Dependencies 7-1 7.2.3 Requirements 7-2 7.2.4 Using Templated Strings 7-2

8 Configuring the ORC Event Handler

9 Using the Oracle Cloud Infrastructure Event Handler 9.1 Overview 9-1 9.2 Detailing the Functionality 9-1 9.3 Configuring the Oracle Cloud Infrastructure Event Handler 9-2 9.4 Configuring Credentials for Oracle Cloud Infrastructure 9-5 9.5 Using Templated Strings 9-6 9.6 Troubleshooting 9-8

10 Using the Parquet Event Handler 10.1 Overview 10-1 10.2 Detailing the Functionality 10-1 10.2.1 Configuring the Parquet Event Handler to Write to HDFS 10-1 10.2.2 About the Upstream Data Format 10-2

10.2.3 Using Templated Strings 10-2 10.3 Configuring the Parquet Event Handler 10-3

11 Using the S3 Event Handler 11.1 Overview 11-1 11.2 Detailing Functionality 11-1 11.2.1 Configuring the Client ID and Secret 11-1 11.2.2 About the AWS S3 Buckets 11-2 11.2.3 Using Templated Strings 11-2 11.2.4 Troubleshooting 11-3 11.3 Configuring the S3 Event Handler 11-4

12 Using the Command Event Handler 12.1 Overview - Command Event Handler 12-1 12.2 Configuring the Command Event Handler 12-1 12.3 Using Command Argument Template Strings 12-2

13 Using the Redshift Event Handler 13.1 Detailed Functionality 13-1 13.2 Operation Aggregation 13-2 13.2.1 Aggregation In Memory 13-2 13.2.2 Aggregation using SQL post loading data into the staging table 13-2 13.3 Unsupported Operations and Limitations 13-2 13.4 Uncompressed UPDATE records 13-3 13.5 Error During the Data Load Process 13-3 13.6 Troubleshooting and Diagnostics 13-3 13.7 Classpath 13-4 13.8 Configuration 13-4 13.9 Redshift COPY SQL Authorization 13-7

14 Using the Autonomous Data Warehouse Event Handler 14.1 Detailed Functionality 14-1 14.2 ADW Database Credential to Access OCI ObjectStore File 14-1 14.3 ADW Database User Privileges 14-2 14.4 Unsupported Operations/ Limitations 14-2 14.5 Troubleshooting and Diagnostics 14-2 14.6 Classpath 14-5 14.7 Configuration 14-6

14.7.1 Automatic Configuration 14-6 14.7.2 File Writer Handler Configuration 14-6 14.7.3 OCI Event Handler Configuration 14-6 14.7.4 ADW Event Handler Configuration 14-7 14.7.5 End-to-End Configuration 14-8

15 Using the HBase Handler 15.1 Overview 15-1 15.2 Detailed Functionality 15-1 15.3 Setting Up and Running the HBase Handler 15-2 15.3.1 Classpath Configuration 15-3 15.3.2 HBase Handler Configuration 15-3 15.3.3 Sample Configuration 15-6 15.3.4 Performance Considerations 15-6 15.4 Security 15-7 15.5 Metadata Change Events 15-7 15.6 Additional Considerations 15-7 15.7 Troubleshooting the HBase Handler 15-8 15.7.1 Java Classpath 15-8 15.7.2 HBase Connection Properties 15-8 15.7.3 Logging of Handler Configuration 15-8 15.7.4 HBase Handler Delete-Insert Problem 15-9

16 Using the HDFS Handler 16.1 Overview 16-1 16.2 Writing into HDFS in SequenceFile Format 16-1 16.2.1 Integrating with Hive 16-2 16.2.2 Understanding the Data Format 16-2 16.3 Setting Up and Running the HDFS Handler 16-2 16.3.1 Classpath Configuration 16-3 16.3.2 HDFS Handler Configuration 16-3 16.3.3 Review a Sample Configuration 16-9 16.3.4 Performance Considerations 16-10 16.3.5 Security 16-10 16.4 Writing in HDFS in Avro Object Container File Format 16-10 16.5 Generating HDFS File Names Using Template Strings 16-11 16.6 Metadata Change Events 16-12 16.7 Partitioning 16-13 16.8 HDFS Additional Considerations 16-13

16.9 Best Practices 16-14 16.10 Troubleshooting the HDFS Handler 16-15 16.10.1 Java Classpath 16-15 16.10.2 HDFS Connection Properties 16-15 16.10.3 Handler and Formatter Configuration 16-15

17 Using the Java Database Connectivity Handler 17.1 Overview 17-1 17.2 Detailed Functionality 17-1 17.2.1 Single Operation Mode 17-2 17.2.2 Oracle Database Data Types 17-2 17.2.3 MySQL Database Data Types 17-2 17.2.4 Netezza Database Data Types 17-3 17.2.5 Redshift Database Data Types 17-3 17.3 Setting Up and Running the JDBC Handler 17-4 17.3.1 Java Classpath 17-4 17.3.2 Handler Configuration 17-4 17.3.3 Statement Caching 17-5 17.3.4 Setting Up Error Handling 17-6 17.4 Sample Configurations 17-7 17.4.1 Sample Oracle Database Target 17-7 17.4.2 Sample Oracle Database Target with JDBC Metadata Provider 17-7 17.4.3 Sample MySQL Database Target 17-8 17.4.4 Sample MySQL Database Target with JDBC Metadata Provider 17-9

18 Using the Java Message Service Handler 18.1 Overview 18-1 18.2 Setting Up and Running the JMS Handler 18-1 18.2.1 Classpath Configuration 18-2 18.2.2 Java Naming and Directory Interface Configuration 18-2 18.2.3 Handler Configuration 18-2 18.2.4 Sample Configuration Using Oracle WebLogic Server 18-7

19 Using the Kafka Handler 19.1 Overview 19-1 19.2 Detailed Functionality 19-2 19.3 Setting Up and Running the Kafka Handler 19-4 19.3.1 Classpath Configuration 19-4 19.3.2 Kafka Handler Configuration 19-5

ix 19.3.3 Java Adapter Properties File 19-7 19.3.4 Kafka Producer Configuration File 19-7 19.3.5 Using Templates to Resolve the Topic Name and Message Key 19-8 19.3.6 Kafka Configuring with Kerberos 19-10 19.3.7 Kafka SSL Support 19-14 19.4 Schema Propagation 19-14 19.5 Performance Considerations 19-15 19.6 About Security 19-16 19.7 Metadata Change Events 19-16 19.8 Snappy Considerations 19-16 19.9 Kafka Interceptor Support 19-16 19.10 Kafka Partition Selection 19-17 19.11 Troubleshooting 19-18 19.11.1 Verify the Kafka Setup 19-18 19.11.2 Classpath Issues 19-18 19.11.3 Invalid Kafka Version 19-18 19.11.4 Kafka Producer Properties File Not Found 19-18 19.11.5 Kafka Connection Problem 19-19

20 Using the Kafka Connect Handler 20.1 Overview 20-1 20.2 Detailed Functionality 20-2 20.3 Setting Up and Running the Kafka Connect Handler 20-3 20.3.1 Kafka Connect Handler Configuration 20-4 20.3.2 Using Templates to Resolve the Topic Name and Message Key 20-11 20.3.3 Configuring Security in the Kafka Connect Handler 20-13 20.4 Kafka Connect Handler Performance Considerations 20-13 20.5 Kafka Interceptor Support 20-14 20.6 Kafka Partition Selection 20-14 20.7 Troubleshooting the Kafka Connect Handler 20-15 20.7.1 Java Classpath for Kafka Connect Handler 20-15 20.7.2 Invalid Kafka Version 20-16 20.7.3 Kafka Producer Properties File Not Found 20-16 20.7.4 Kafka Connection Problem 20-16

21 Using the Kafka REST Proxy Handler 21.1 Overview 21-1 21.2 Setting Up and Starting the Kafka REST Proxy Handler Services 21-1 21.2.1 Using the Kafka REST Proxy Handler 21-2

x 21.2.2 Downloading the Dependencies 21-2 21.2.3 Classpath Configuration 21-2 21.2.4 Kafka REST Proxy Handler Configuration 21-2 21.2.5 Review a Sample Configuration 21-5 21.2.6 Security 21-5 21.2.7 Generating a Keystore or Truststore 21-6 21.2.7.1 Setting Metacolumn Output 21-7 21.2.8 Using Templates to Resolve the Topic Name and Message Key 21-11 21.2.9 Kafka REST Proxy Handler Formatter Properties 21-13 21.3 Consuming the Records 21-17 21.4 Performance Considerations 21-18 21.5 Kafka REST Proxy Handler Metacolumns Template Property 21-18

22 Using the Kinesis Streams Handler 22.1 Overview 22-1 22.2 Detailed Functionality 22-1 22.2.1 Amazon Kinesis Java SDK 22-1 22.2.2 Kinesis Streams Input Limits 22-2 22.3 Setting Up and Running the Kinesis Streams Handler 22-2 22.3.1 Set the Classpath in Kinesis Streams Handler 22-3 22.3.2 Kinesis Streams Handler Configuration 22-3 22.3.3 Using Templates to Resolve the Stream Name and Partition Name 22-9 22.3.4 Configuring the Client ID and Secret in Kinesis Handler 22-11 22.3.5 Configuring the Proxy Server for Kinesis Streams Handler 22-11 22.3.6 Configuring Security in Kinesis Streams Handler 22-12 22.4 Kinesis Handler Performance Considerations 22-12 22.4.1 Kinesis Streams Input Limitations 22-13 22.4.2 Transaction Batching 22-13 22.4.3 Deferring Flush at Transaction Commit 22-14 22.5 Troubleshooting 22-14 22.5.1 Java Classpath 22-14 22.5.2 Kinesis Handler Connectivity Issues 22-14 22.5.3 Logging 22-15

23 Using the MongoDB Handler 23.1 Overview 23-1 23.2 Detailed Functionality 23-1 23.2.1 Document Key Column 23-1 23.2.2 Primary Key Update Operation 23-2

xi 23.2.3 MongoDB Trail Data Types 23-2 23.3 Setting Up and Running the MongoDB Handler 23-2 23.3.1 Classpath Configuration 23-3 23.3.2 MongoDB Handler Configuration 23-3 23.3.3 Connecting and Authenticating 23-6 23.3.4 Using Bulk Write 23-7 23.3.5 Using Write Concern 23-7 23.3.6 Using Three-Part Table Names 23-7 23.3.7 Using Undo Handling 23-8 23.4 Reviewing Sample Configurations 23-8

24 Using the Metadata Providers 24.1 About the Metadata Providers 24-1 24.2 Avro Metadata Provider 24-2 24.2.1 Detailed Functionality 24-3 24.2.2 Runtime Prerequisites 24-4 24.2.3 Classpath Configuration 24-4 24.2.4 Avro Metadata Provider Configuration 24-4 24.2.5 Review a Sample Configuration 24-4 24.2.6 Metadata Change Events 24-5 24.2.7 Limitations 24-6 24.2.8 Troubleshooting 24-6 24.2.8.1 Invalid Schema Files Location 24-6 24.2.8.2 Invalid Schema File Name 24-6 24.2.8.3 Invalid Namespace in Schema File 24-7 24.2.8.4 Invalid Table Name in Schema File 24-7 24.3 Java Database Connectivity Metadata Provider 24-8 24.3.1 JDBC Detailed Functionality 24-8 24.3.2 Java Classpath 24-9 24.3.3 JDBC Metadata Provider Configuration 24-9 24.3.4 Review a Sample Configuration 24-9 24.4 Hive Metadata Provider 24-10 24.4.1 Detailed Functionality 24-11 24.4.2 Configuring Hive with a Remote Metastore Database 24-12 24.4.3 Classpath Configuration 24-13 24.4.4 Hive Metadata Provider Configuration Properties 24-14 24.4.5 Review a Sample Configuration 24-15 24.4.6 Security 24-17 24.4.7 Metadata Change Event 24-18 24.4.8 Limitations 24-18

24.4.9 Additional Considerations 24-18 24.4.10 Troubleshooting 24-18

25 Using the Oracle NoSQL Handler 25.1 Overview 25-1 25.2 Detailed Functionality 25-1 25.2.1 Oracle NoSQL Data Types 25-2 25.2.2 Performance Considerations 25-2 25.2.3 Operation Processing Support 25-2 25.2.4 Column Processing 25-3 25.2.5 Table Check and Reconciliation Process 25-3 25.2.6 Security 25-4 25.3 Oracle NoSQL Handler Configuration 25-5 25.4 Review a Sample Configuration 25-7 25.5 Performance Considerations 25-8 25.6 Full Image Data Requirements 25-8

26 Using the Pluggable Formatters 26.1 Using the Avro Formatter 26-1 26.1.1 Avro Row Formatter 26-1 26.1.1.1 Operation Metadata Formatting Details 26-2 26.1.1.2 Operation Data Formatting Details 26-3 26.1.1.3 Sample Avro Row Messages 26-3 26.1.1.4 Avro Schemas 26-5 26.1.1.5 Avro Row Configuration Properties 26-6 26.1.1.6 Review a Sample Configuration 26-12 26.1.1.7 Metadata Change Events 26-13 26.1.1.8 Special Considerations 26-13 26.1.2 The Avro Operation Formatter 26-15 26.1.2.1 Operation Metadata Formatting Details 26-15 26.1.2.2 Operation Data Formatting Details 26-16 26.1.2.3 Sample Avro Operation Messages 26-17 26.1.2.4 Avro Schema 26-19 26.1.2.5 Avro Operation Formatter Configuration Properties 26-21 26.1.2.6 Review a Sample Configuration 26-24 26.1.2.7 Metadata Change Events 26-24 26.1.2.8 Special Considerations 26-25 26.1.3 Avro Object Container File Formatter 26-26 26.1.3.1 Avro OCF Formatter Configuration Properties 26-26

xiii 26.1.4 Setting Metacolumn Output 26-30 26.2 Using the Delimited Text Formatter 26-33 26.2.1 Using the Delimited Text Row Formatter 26-33 26.2.1.1 Message Formatting Details 26-34 26.2.1.2 Sample Formatted Messages 26-35 26.2.1.3 Output Format Summary Log 26-35 26.2.1.4 Delimited Text Formatter Configuration Properties 26-36 26.2.1.5 Review a Sample Configuration 26-38 26.2.1.6 Metadata Change Events 26-39 26.2.1.7 Setting Metacolumn Output 26-39 26.2.1.8 Additional Considerations 26-43 26.2.2 Delimited Text Operation Formatter 26-44 26.2.2.1 Message Formatting Details 26-45 26.2.2.2 Sample Formatted Messages 26-46 26.2.2.3 Output Format Summary Log 26-46 26.2.2.4 Delimited Text Formatter Configuration Properties 26-47 26.2.2.5 Review a Sample Configuration 26-49 26.2.2.6 Metadata Change Events 26-49 26.2.2.7 Setting Metacolumn Output 26-50 26.2.2.8 Additional Considerations 26-54 26.3 Using the JSON Formatter 26-54 26.3.1 Operation Metadata Formatting Details 26-55 26.3.2 Operation Data Formatting Details 26-55 26.3.3 Row Data Formatting Details 26-56 26.3.4 Sample JSON Messages 26-56 26.3.4.1 Sample Operation Modeled JSON Messages 26-57 26.3.4.2 Sample Flattened Operation Modeled JSON Messages 26-58 26.3.4.3 Sample Row Modeled JSON Messages 26-60 26.3.4.4 Sample Primary Key Output JSON Message 26-61 26.3.5 JSON Schemas 26-61 26.3.6 JSON Formatter Configuration Properties 26-69 26.3.7 Review a Sample Configuration 26-71 26.3.8 Metadata Change Events 26-71 26.3.9 Setting Metacolumn Output 26-72 26.3.10 JSON Primary Key Updates 26-75 26.3.11 Integrating Oracle Stream Analytics 26-75 26.4 Using the Length Delimited Value Formatter 26-76 26.4.1 Formatting Message Details 26-76 26.4.2 Sample Formatted Messages 26-77 26.4.3 LDV Formatter Configuration Properties 26-77 26.4.4 Additional Considerations 26-81

xiv 26.5 Using Operation-Based versus Row-Based Formatting 26-82 26.5.1 Operation Formatters 26-83 26.5.2 Row Formatters 26-83 26.5.3 Table Row or Column Value States 26-83 26.6 Using the XML Formatter 26-84 26.6.1 Message Formatting Details 26-84 26.6.2 Sample XML Messages 26-84 26.6.2.1 Sample Insert Message 26-85 26.6.2.2 Sample Update Message 26-85 26.6.2.3 Sample Delete Message 26-86 26.6.2.4 Sample Truncate Message 26-87 26.6.3 XML Schema 26-87 26.6.4 XML Formatter Configuration Properties 26-88 26.6.5 Review a Sample Configuration 26-89 26.6.6 Metadata Change Events 26-90 26.6.7 Setting Metacolumn Output 26-90 26.6.8 Primary Key Updates 26-94

27 Using Oracle GoldenGate Capture for Cassandra 27.1 Overview 27-1 27.2 Setting Up Cassandra Change Data Capture 27-2 27.2.1 Setup SSH Connection to the Cassandra Nodes 27-2 27.2.2 Data Types 27-3 27.2.3 Cassandra Database Operations 27-4 27.3 Deduplication 27-4 27.4 Topology Changes 27-4 27.5 Data Availability in the CDC Logs 27-5 27.6 Using Extract Initial Load 27-5 27.7 Using Change Data Capture Extract 27-6 27.8 Replicating to RDMBS Targets 27-7 27.9 Partition Update or Insert of Static Columns 27-8 27.10 Partition Delete 27-8 27.11 Security and Authentication 27-9 27.11.1 Configuring SSL 27-9 27.12 Cleanup of CDC Commit Log Files 27-10 27.12.1 Cassandra CDC Commit Log Purger 27-10 27.12.1.1 How to Run the Purge Utility 27-11 27.12.1.2 Argument cassCommitLogPurgerConfFile 27-11 27.12.1.3 Argument purgeInterval 27-13 27.12.1.4 Command to Run the Program 27-13

27.13 Multiple Extract Support 27-14 27.14 CDC Configuration Reference 27-14 27.15 Troubleshooting 27-21

28 Connecting to Microsoft Azure Data Lake

29 Connecting to Microsoft Azure Data Lake Gen 2

30 Connecting to Microsoft Azure Event Hubs

31 Connecting to Oracle Streaming Service

32 Stage and Merge Data Warehouse Replication 32.1 Steps for Stage and Merge 32-1 32.1.1 Stage 32-2 32.1.2 Merge 32-2 32.1.3 Configuration of Handlers 32-3 32.1.4 File Writer Handler 32-3 32.1.5 Operation Aggregation 32-3 32.1.6 Object Store Event handler 32-3 32.1.7 JDBC Metadata Provider 32-3 32.1.8 Command Event handler Merge Script 32-3 32.1.9 Stage and Merge Sample Configuration 32-4 32.1.10 Variables in the Merge Script 32-4 32.1.11 SQL Statements in the Merge Script 32-4 32.1.12 Merge Script Functions 32-5 32.1.13 Prerequisites 32-6 32.1.14 Limitations 32-6 32.2 Snowflake Stage and Merge 32-6 32.2.1 Configuration 32-7 32.3 Snowflake on AWS 32-7 32.3.1 Data Flow 32-7 32.3.2 Merge Script Variables 32-7 32.4 Snowflake on Azure 32-8

xvi 32.4.1 Data Flow 32-8 32.4.2 Merge Script Variables 32-8 32.4.3 Prerequisites 32-9 32.5 Google BigQuery Stage and Merge 32-9 32.5.1 Data Flow 32-9 32.5.2 Configuration 32-9 32.5.3 Merge Script Variables 32-10 32.5.4 Prerequisites 32-10 32.6 Hive Stage and Merge 32-10 32.6.1 Data Flow 32-10 32.6.2 Configuration 32-10 32.6.3 Merge Script Variables 32-11 32.6.4 Prerequisites 32-11

A Google BigQuery Dependencies A.1 BigQuery 1.11.1 A-1

B Cassandra Handler Client Dependencies B.1 Cassandra Datastax Java Driver 3.1.0 B-1

C Cassandra Capture Client Dependencies

D Elasticsearch Handler Transport Client Dependencies D.1 Elasticsearch 7.1.1 with X-Pack 7.1.1 D-1 D.2 Elasticsearch 6.2.3 with X-Pack 6.2.3 D-2 D.3 Elasticsearch 5.1.2 with X-Pack 5.1.2 D-3

E Elasticsearch High Level REST Client Dependencies E.1 Elasticsearch 7.6.1 E-1

F HBase Handler Client Dependencies F.1 HBase 2.2.0 F-1 F.2 HBase 2.1.5 F-2 F.3 HBase 2.0.5 F-4 F.4 HBase 1.4.10 F-5 F.5 HBase 1.3.3 F-6

F.6 HBase 1.2.5 F-7 F.7 HBase 1.1.1 F-8 F.8 HBase 1.0.1.1 F-9

G HDFS Handler Client Dependencies G.1 Hadoop Client Dependencies G-1 G.1.1 HDFS 3.2.0 G-1 G.1.2 HDFS 3.1.1 G-3 G.1.3 HDFS 3.0.3 G-4 G.1.4 HDFS 2.9.2 G-6 G.1.5 HDFS 2.8.0 G-7 G.1.6 HDFS 2.7.1 G-8 G.1.7 HDFS 2.6.0 G-10 G.1.8 HDFS 2.5.2 G-11 G.1.9 HDFS 2.4.1 G-12 G.1.10 HDFS 2.3.0 G-13 G.1.11 HDFS 2.2.0 G-14

H Kafka Handler Client Dependencies H.1 Kafka 2.2.1 H-1 H.2 Kafka 2.1.0 H-2 H.3 Kafka 2.0.0 H-2 H.4 Kafka 1.1.1 H-2 H.5 Kafka 1.0.2 H-2 H.6 Kafka 0.11.0.0 H-3 H.7 Kafka 0.10.2.0 H-3 H.8 Kafka 0.10.1.1 H-3 H.9 Kafka 0.10.0.1 H-3 H.10 Kafka 0.9.0.1 H-3

I Kafka Connect Handler Client Dependencies I.1 Kafka 2.2.1 I-1 I.2 Kafka 2.1.1 I-2 I.3 Kafka 2.0.1 I-2 I.4 Kafka 1.1.1 I-2 I.5 Kafka 1.0.2 I-3 I.6 Kafka 0.11.0.0 I-3 I.7 Kafka 0.10.2.0 I-3 I.8 Kafka 0.10.2.0 I-4

xviii I.9 Kafka 0.10.0.0 I-4 I.10 Kafka 0.9.0.1 I-5 I.11 Confluent Dependencies I-5 I.11.1 Confluent 5.5.0 I-6 I.11.2 Confluent 5.4.0 I-6 I.11.3 Confluent 5.3.0 I-7 I.11.4 Confluent 5.2.1 I-7 I.11.5 Confluent 5.1.3 I-8 I.11.6 Confluent 5.0.3 I-8 I.11.7 Confluent 4.1.2 I-9 I.11.8 Confluent 4.0.3 I-9 I.11.9 Confluent 3.2.1 I-10 I.12 Confluent 3.2.1 I-10

J MongoDB Handler Client Dependencies J.1 MongoDB Java Driver 3.4.3 J-1

K Optimized Row Columnar Event Handler Client Dependencies K.1 ORC Client 1.5.5 K-1 K.2 ORC Client 1.4.0 K-2

L Parquet Event Handler Client Dependencies L.1 Parquet Client 1.10.1 L-1 L.2 Parquet Client 1.9.0 L-1

M Velocity Dependencies M.1 Velocity 1.7 M-1

N OCI Dependencies N.1 OCI 1.13.2 N-1 N.2 OCI: Proxy Settings Dependencies N-2

O JMS Dependencies O.1 JMS 8.0 O-1


Preface

This guide contains information about configuring and running Oracle GoldenGate for Big Data to extend the capabilities of Oracle GoldenGate instances.

• Audience
• Documentation Accessibility
• Conventions
• Related Information

Audience

This guide is intended for system administrators who are configuring and running Oracle GoldenGate for Big Data.

Documentation Accessibility

For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.

Access to Oracle Support

Oracle customers who have purchased support have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.

Conventions

The following text conventions are used in this document:

Convention    Meaning

boldface      Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.

italic        Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.

monospace     Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.


Related Information

The Oracle GoldenGate Product Documentation Libraries are found at:

https://docs.oracle.com/en/middleware/goldengate/index.html

Additional Oracle GoldenGate information, including best practices, articles, and solutions, is found at:

Oracle GoldenGate A-Team Chronicles

1 Introducing Oracle GoldenGate for Big Data

Learn about Oracle GoldenGate for Big Data concepts and features, including how to set up and configure the environment.

The Oracle GoldenGate for Big Data integrations run as pluggable functionality in the Oracle GoldenGate Java Delivery framework, also referred to as the Java Adapters framework. This functionality extends the Java Delivery functionality. Oracle recommends that you review the Java Delivery description in Oracle GoldenGate Java Delivery.

• What’s Supported in Oracle GoldenGate for Big Data?
  Oracle GoldenGate for Big Data supports specific configurations: the handlers, which are compatible with clearly defined software versions.
• Setting Up Oracle GoldenGate for Big Data
• Configuring Oracle GoldenGate for Big Data

1.1 What’s Supported in Oracle GoldenGate for Big Data?

Oracle GoldenGate for Big Data supports specific configurations: the handlers, which are compatible with clearly defined software versions.

• Verifying Certification, System, and Interoperability Requirements
  Oracle recommends that you use the certification matrix and system requirements documents together to verify that your environment meets the requirements for installation.
• What are the Additional Support Considerations?

1.1.1 Verifying Certification, System, and Interoperability Requirements

Oracle recommends that you use the certification matrix and system requirements documents together to verify that your environment meets the requirements for installation.

1. Verifying that your environment meets certification requirements: Make sure that you install your product on a supported hardware and software configuration. See the certification document for your release on the Oracle Fusion Middleware Supported System Configurations page. Oracle has tested and verified the performance of your product on all certified systems and environments. Whenever new certifications are released, they are added to the certification document right away. New certifications can be released at any time. Therefore, the certification documents are kept outside the documentation libraries and are available on Oracle Technology Network.


2. Using the system requirements document to verify certification: Oracle recommends that you use the Oracle Fusion Middleware Supported System Configurations document to verify that the certification requirements are met. For example, if the certification document indicates that your product is certified for installation on 64-Bit Oracle Linux 6.5, use this document to verify that your system meets the required minimum specifications. These include disk space, available memory, specific platform packages and patches, and other operating system-specific requirements. System requirements can change in the future. Therefore, the system requirement documents are kept outside of the documentation libraries and are available on Oracle Technology Network.

3. Verifying interoperability among multiple products: To learn how to install and run multiple Fusion Middleware products from the same release or mixed releases with each other, see Oracle Fusion Middleware Interoperability and Compatibility in Oracle Fusion Middleware Understanding Interoperability and Compatibility.

1.1.2 What are the Additional Support Considerations?

This section describes additional support considerations for the Oracle GoldenGate for Big Data Handlers.

Pluggable Formatters—Support

The handlers support the Pluggable Formatters as follows:

• The HDFS Handler supports all of the pluggable formatters.
• Pluggable formatters are not applicable to the HBase Handler. Data is streamed to HBase using the proprietary HBase client interface.
• The Kafka Handler supports all of the pluggable formatters.
• The Kafka Connect Handler does not support pluggable formatters. You can convert data to JSON or Avro using Kafka Connect data converters.
• The Kinesis Streams Handler supports all of the pluggable formatters described in the Using the Pluggable Formatters topic in the Oracle GoldenGate for Big Data User Guide.
• The Cassandra, MongoDB, and JDBC Handlers do not use a pluggable formatter.

Java Delivery Using Extract

Java Delivery using Extract is not supported. Java Delivery is supported only with the Replicat process. Replicat provides better performance, better support for checkpointing, and better control of transaction grouping.

MongoDB Handler—Support

• The handler can only replicate unique rows from the source table. If a source table has no primary key defined and has duplicate rows, replicating the duplicate rows to the MongoDB target results in a duplicate key error and the Replicat process abends.
• Missed updates and deletes are not detected, so they are ignored.
• The handler is untested with sharded collections.


• Only date and time data types with millisecond precision are supported. Values from a trail with microsecond or nanosecond precision are truncated to millisecond precision.
• The datetime with timezone data type in the trail is not supported.
• The maximum BSON document size is 16 MB. If the trail record size exceeds this limit, the handler cannot replicate the record.
• No DDL propagation.
• No truncate operation.

JDBC Handler—Support

• The JDBC handler uses the generic JDBC API, which means any target database with a JDBC driver implementation should be able to use this handler. Many different databases support the JDBC API, and Oracle cannot certify the JDBC Handler for all targets. Oracle has certified the JDBC Handler for the following RDBMS targets:
  – Oracle
  – MySQL
  – Netezza
  – Redshift
  – Greenplum

• The handler supports Replicat using the REPERROR and HANDLECOLLISIONS parameters. See Reference for Oracle GoldenGate.
• The database metadata retrieved through the Redshift JDBC driver has known constraints. See Release Notes for Oracle GoldenGate for Big Data. Redshift target table names in the Replicat parameter file must be in lower case and double quoted. For example:

MAP SourceSchema.SourceTable, target "public"."targetable";

• DDL operations are ignored by default and are logged with a WARN level.
• Coordinated Replicat is a multithreaded process that applies transactions in parallel instead of serially. Each thread handles all of the filtering, mapping, conversion, SQL construction, and error handling for its assigned workload. A coordinator thread coordinates transactions across threads to account for dependencies. It ensures that DML is applied in a synchronized manner, preventing certain DMLs from occurring on the same object at the same time due to row locking, block locking, or table locking issues based on database-specific rules. If there are database locking issues, then Coordinated Replicat performance can be extremely slow or can pause.

DDL Event Handling

Only the TRUNCATE TABLE DDL statement is supported. All other DDL statements, such as CREATE TABLE, CREATE INDEX, and DROP TABLE, are ignored. You can use the TRUNCATE statements in one of these ways:


• In a DDL statement, TRUNCATE TABLE, ALTER TABLE TRUNCATE PARTITION, and other DDL TRUNCATE statements. This uses the DDL parameter.
• Standalone TRUNCATE support, which just has TRUNCATE TABLE. This uses the GETTRUNCATES parameter.

1.2 Setting Up Oracle GoldenGate for Big Data

This topic lists the various tasks that you need to perform to set up Oracle GoldenGate for Big Data integrations with Big Data targets.

• About Oracle GoldenGate Properties Files
• Setting Up the Java Runtime Environment
• Configuring Java Virtual Machine Memory
• Grouping Transactions

1.2.1 About Oracle GoldenGate Properties Files

There are two Oracle GoldenGate properties files required to run the Oracle GoldenGate Java Delivery user exit (alternatively called the Oracle GoldenGate Java Adapter). It is the Oracle GoldenGate Java Delivery that hosts Java integrations, including the Big Data integrations. A Replicat properties file is required in order to run either process. The required naming convention for the Replicat file name is process_name.prm. The exit syntax in the Replicat properties file provides the name and location of the Java Adapter properties file. It is the Java Adapter properties file that contains the configuration properties for the Java Adapter, including the Oracle GoldenGate for Big Data integrations. The Replicat and Java Adapter properties files are required to run Oracle GoldenGate for Big Data integrations. Alternatively, the Java Adapter properties can be resolved using the default syntax, process_name.properties. If you use the default naming for the Java Adapter properties file, then the name of the Java Adapter properties file can be omitted from the Replicat properties file.

Samples of the properties files for Oracle GoldenGate for Big Data integrations can be found in the subdirectories of the following directory:

GoldenGate_install_dir/AdapterExamples/big-data

1.2.2 Setting Up the Java Runtime Environment

The Oracle GoldenGate for Big Data integrations create an instance of the Java virtual machine at runtime. Oracle GoldenGate for Big Data requires, at a minimum, that you install the Oracle Java 8 Java Runtime Environment (JRE). Oracle recommends that you set the JAVA_HOME environment variable to point to the Java 8 installation directory. Additionally, the Java Delivery process needs to load the libjvm.so and libjsig.so Java shared libraries. These libraries are installed as part of the JRE. The location of these shared libraries must be resolved, and the appropriate environment variable (LD_LIBRARY_PATH, PATH, or LIBPATH) must be set so that the libraries can be loaded at runtime.
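For example, on Linux you might export the following environment variables before starting the Oracle GoldenGate processes. This is only a sketch: the JDK installation path is hypothetical, and the directory containing libjvm.so and libjsig.so varies by platform and JVM packaging.

export JAVA_HOME=/usr/java/jdk1.8.0_202
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH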


1.2.3 Configuring Java Virtual Machine Memory

One of the difficulties of tuning Oracle GoldenGate for Big Data is deciding how much Java virtual machine (JVM) heap memory to allocate to the Replicat process hosting the Java Adapter. The JVM memory must be configured before starting the application; otherwise, the default Java heap sizing is used. Correctly sizing the JVM heap is important, because a heap that is too small can cause runtime issues:

• A Java Out of Memory exception, which causes the Extract or Replicat process to abend.
• Increased frequency of Java garbage collections, which degrades performance. Java garbage collection invocations de-allocate all unreferenced Java objects, reclaiming the heap memory for reuse.

Alternatively, too much heap memory is inefficient. The JVM reserves the maximum heap memory (-Xmx) when the JVM is launched. This reserved memory is generally not available to other applications even if the JVM is not using all of it. You can set the JVM memory with these two parameters:

• -Xmx — The maximum JVM heap size. This amount gets reserved.
• -Xms — The initial JVM heap size. Also controls the sizing of additional allocations.

The -Xmx and -Xms properties are set in the Java Adapter properties file as follows:

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

There are no rules or equations for calculating the values of the maximum and initial JVM heap sizes. Java heap usage is variable and depends upon a number of factors, many of which vary widely at runtime. The Oracle GoldenGate Java Adapter log file provides metrics on the Java heap when the status call is invoked. The information appears in the Java Adapter log file similar to the following:

INFO 2017-12-21 10:02:02,037 [pool-1-thread-1] Memory at Status : Max: 455.00 MB, Total: 58.00 MB, Free: 47.98 MB, Used: 10.02 MB

You can interpret these values as follows:

• Max – The amount of heap memory reserved (typically the -Xmx setting reduced by approximately 10% due to overhead).
• Total – The amount currently allocated (typically a multiple of the -Xms setting reduced by approximately 10% due to overhead).
• Free – The heap memory currently allocated, but free to be used to allocate Java objects.
• Used – The heap memory currently allocated to Java objects.

You can control the frequency that the status is logged using the gg.report.time=30sec configuration parameter in the Java Adapter properties file.

You should execute test runs of the process with actual data and review the heap usage logging. Then analyze your peak memory usage and allocate 25% - 30% more memory to accommodate infrequent spikes in memory use and to make the memory allocation and garbage collection processes efficient.
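For example, if repeated status logging shows a peak Used value of roughly 400 MB, adding 25% - 30% headroom suggests a maximum heap of about 512 MB. A hypothetical setting based on that peak might be:

javawriter.bootoptions=-Xmx512m -Xms64m -Djava.class.path=ggjava/ggjava.jar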


The following items can increase the heap memory required by the Replicat process:

• Operating in tx mode (for example, gg.handler.name.mode=tx)
• Setting the Replicat property GROUPTRANSOPS to a large value
• Wide tables
• CLOB or BLOB data in the source
• Very large transactions in the source data

1.2.4 Grouping Transactions

The principal way to improve performance in Oracle GoldenGate for Big Data integrations is using transaction grouping. In transaction grouping, the operations of multiple transactions are grouped together in a single larger transaction. The application of a larger grouped transaction is typically much more efficient than the application of individual smaller transactions. Transaction grouping is possible with the Replicat process discussed in Running with Replicat.

1.3 Configuring Oracle GoldenGate for Big Data

This topic describes how to configure the Oracle GoldenGate for Big Data Handlers.

• Running with Replicat
  You need to run the Java Adapter with the Oracle GoldenGate Replicat process to begin configuring Oracle GoldenGate for Big Data.
• Overview of Logging
  Logging is essential to troubleshooting Oracle GoldenGate for Big Data integrations with Big Data targets.
• About Schema Evolution and Metadata Change Events
• About Configuration Property CDATA[] Wrapping
• Using Regular Expression Search and Replace
  You can perform more powerful search and replace operations on both schema data (catalog names, schema names, table names, and column names) and column value data, which are separately configured. Regular expressions (regex) are characters that customize a search string through pattern matching.
• Scaling Oracle GoldenGate for Big Data Delivery
• Configuring Cluster High Availability
  Oracle GoldenGate for Big Data doesn't have built-in high availability functionality. You need to use a standard cluster software's high availability capability to provide the high availability functionality.
• Using Identities in Oracle GoldenGate Credential Store
  The Oracle GoldenGate credential store manages user IDs and their encrypted passwords (together known as credentials) that are used by Oracle GoldenGate processes to interact with the local database. The credential store eliminates the need to specify user names and clear-text passwords in the Oracle GoldenGate parameter files.

1.3.1 Running with Replicat

You need to run the Java Adapter with the Oracle GoldenGate Replicat process to begin configuring Oracle GoldenGate for Big Data.


This topic explains how to run the Java Adapter with the Oracle GoldenGate Replicat process.

• Configuring Replicat
• Adding the Replicat Process
• Replicat Grouping
• About Replicat Checkpointing
• About Initial Load Support
• About the Unsupported Replicat Features
• How the Mapping Functionality Works

1.3.1.1 Configuring Replicat

The following is an example of how you can configure a Replicat process properties file for use with the Java Adapter:

REPLICAT hdfs
TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.properties
--SOURCEDEFS ./dirdef/dbo.def
DDL INCLUDE ALL
GROUPTRANSOPS 1000
MAPEXCLUDE dbo.excludetable
MAP dbo.*, TARGET dbo.*;

The following is an explanation of these Replicat configuration entries:

REPLICAT hdfs - The name of the Replicat process.

TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.properties - Sets the target database as the user exit libggjava.so and sets the Java Adapter properties file to dirprm/hdfs.properties.

--SOURCEDEFS ./dirdef/dbo.def - Sets a source database definitions file. It is commented out because Oracle GoldenGate trail files provide metadata in trail.

GROUPTRANSOPS 1000 - Groups 1000 transactions from the source trail files into a single target transaction. This is the default and improves the performance of Big Data integrations.

MAPEXCLUDE dbo.excludetable - Sets the tables to exclude.

MAP dbo.*, TARGET dbo.*; - Sets the mapping of input to output tables.

1.3.1.2 Adding the Replicat Process

The command to add and start the Replicat process in ggsci is the following:

ADD REPLICAT hdfs, EXTTRAIL ./dirdat/gg
START hdfs

1.3.1.3 Replicat Grouping

The Replicat process provides the Replicat configuration property, GROUPTRANSOPS, to control transaction grouping. By default, the Replicat process implements transaction grouping of 1000 source transactions into a single target transaction. If you want to


turn off transaction grouping, then the GROUPTRANSOPS Replicat property should be set to 1.

1.3.1.4 About Replicat Checkpointing

In addition to the Replicat checkpoint file (.cpr), an additional checkpoint file, dirchk/group.cpj, is created that contains information similar to CHECKPOINTTABLE in Replicat for the database.

1.3.1.5 About Initial Load Support

Replicat can already read trail files that come from both the online capture and initial load processes that write to a set of trail files. In addition, Replicat can also be configured to support the delivery of the special run initial load process using the RMTTASK specification in the Extract parameter file. For more details about configuring the direct load, see Loading Data with an Oracle GoldenGate Direct Load.

Note:

The SOURCEDB or DBLOGIN parameter specifications vary depending on your source database.
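The following is a minimal sketch of an initial load Extract parameter file that uses the RMTTASK specification; the credential alias, host, Replicat group, and table names are hypothetical, and the database login parameters depend on your source database:

EXTRACT initext
USERIDALIAS gguser
RMTHOST targethost, MGRPORT 7809
RMTTASK REPLICAT, GROUP initrep
TABLE dbo.*;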

1.3.1.6 About the Unsupported Replicat Features

The following Replicat features are not supported in this release:

• BATCHSQL
• SQLEXEC
• Stored procedure
• Conflict resolution and detection (CDR)

1.3.1.7 How the Mapping Functionality Works

The Oracle GoldenGate Replicat process supports mapping functionality to custom target schemas. You must use the Metadata Provider functionality to define a target schema or schemas, and then use the standard Replicat mapping syntax in the Replicat configuration file to define the mapping. For more information about the Replicat mapping syntax in the Replicat configuration file, see Mapping and Manipulating Data.
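For example, assuming a target schema named bdtarget has been defined through a Metadata Provider, a sketch of the mapping in the Replicat configuration file might look like the following; the schema name is hypothetical:

MAP dbo.*, TARGET bdtarget.*;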

1.3.2 Overview of Logging

Logging is essential to troubleshooting Oracle GoldenGate for Big Data integrations with Big Data targets. This topic details how the Oracle GoldenGate for Big Data integrations log and the best practices for logging.

• About Replicat Process Logging
• About Java Layer Logging


1.3.2.1 About Replicat Process Logging

Oracle GoldenGate for Big Data integrations leverage the Java Delivery functionality described in Delivering Java Messages. In this setup, an Oracle GoldenGate Replicat process loads a user exit shared library. This shared library then loads a Java virtual machine to interface with targets providing a Java interface. So the flow of data is as follows:

Replicat Process —> User Exit —> Java Layer

It is important that all layers log correctly so that users can review the logs to troubleshoot new installations and integrations. Additionally, if you have a problem that requires contacting Oracle Support, the log files are a key piece of information to be provided to Oracle Support so that the problem can be efficiently resolved. A running Replicat process creates or appends log files in the GoldenGate_Home/dirrpt directory that adhere to the following naming convention: process_name.rpt. If a problem is encountered when deploying a new Oracle GoldenGate process, this is likely the first log file to examine for problems. The Java layer is critical for integrations with Big Data applications.

The Oracle GoldenGate for Big Data product provides flexibility for logging from the Java layer. The recommended best practice is to use Log4j logging to log from the Java layer. Enabling simple Log4j logging requires the setting of two configuration values in the Java Adapters configuration file.

gg.log=log4j
gg.log.level=INFO

These gg.log settings result in a Log4j file being created in the GoldenGate_Home/dirrpt directory that adheres to this naming convention: {GROUPNAME}.log. The supported Log4j log levels are in the following list, in order of increasing logging granularity:

• OFF
• FATAL
• ERROR
• WARN
• INFO
• DEBUG
• TRACE

Selection of a logging level includes all of the coarser logging levels as well (that is, selection of WARN means that log messages of FATAL, ERROR, and WARN are written to the log file). The Log4j logging can additionally be controlled by separate Log4j properties files. These separate Log4j properties files can be enabled by editing the bootoptions property in the Java Adapter properties file. These three example Log4j properties files are included with the installation and are included in the classpath:


log4j-default.properties
log4j-debug.properties
log4j-trace.properties

You can modify the bootoptions in any of the files as follows:

javawriter.bootoptions=-Xmx512m -Xms64m -Djava.class.path=.:ggjava/ggjava.jar -Dlog4j.configurationFile=samplelog4j.properties

You can use your own customized Log4j properties file to control logging. The customized Log4j properties file must be available in the Java classpath so that it can be located and loaded by the JVM. The contents of a sample custom Log4j properties file are the following:

# Root logger option
log4j.rootLogger=INFO, file

# Direct log messages to a log file
log4j.appender.file=org.apache.log4j.RollingFileAppender

log4j.appender.file.File=sample.log
log4j.appender.file.MaxFileSize=1GB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

There are two important requirements when you use a custom Log4j properties file. First, the path to the custom Log4j properties file must be included in the javawriter.bootoptions property. Logging initializes immediately when the JVM is initialized, while the contents of the gg.classpath property are actually appended to the classloader after the logging is initialized. Second, the classpath to correctly load a properties file must be the directory containing the properties file, without wildcards appended.

1.3.3 About Schema Evolution and Metadata Change Events

The Metadata in trail is a feature that allows seamless runtime handling of metadata change events by Oracle GoldenGate for Big Data, including schema evolution and schema propagation to Big Data target applications. NO_OBJECTDEFS is a sub-parameter of the Extract and Replicat EXTTRAIL and RMTTRAIL parameters that lets you suppress the important metadata in trail feature and revert to using a static metadata definition. The Oracle GoldenGate for Big Data Handlers and Formatters provide functionality to take action when a metadata change event is encountered. The ability to take action in the case of metadata change events depends on the metadata change events being available in the source trail file. Oracle GoldenGate supports metadata in trail and the propagation of DDL data from a source Oracle Database. If the source trail file does not have metadata in trail and DDL data (metadata change events), then it is not possible for Oracle GoldenGate for Big Data to provide any metadata change event handling.
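For example, a hypothetical Extract trail specification that suppresses metadata in trail might look like the following sketch (the trail path is illustrative only):

RMTTRAIL ./dirdat/bd, NO_OBJECTDEFS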

1.3.4 About Configuration Property CDATA[] Wrapping

The GoldenGate for Big Data Handlers and Formatters support the configuration of many parameters in the Java properties file, the value of which may be interpreted as white space. The configuration handling of the Java Adapter trims white space


from configuration values from the Java configuration file. This behavior of trimming whitespace may be desirable for some configuration values and undesirable for other configuration values. Alternatively, you can wrap white space values inside of special syntax to preserve the white space for selected configuration variables. GoldenGate for Big Data borrows the XML syntax of CDATA[] to preserve white space. Values that would be considered to be white space can be wrapped inside of CDATA[].

The following is an example attempting to set a new-line delimiter for the Delimited Text Formatter:

gg.handler.{name}.format.lineDelimiter=\n

This configuration will not be successful. The new-line character is interpreted as white space and will be trimmed from the configuration value. Therefore the gg.handler setting effectively results in the line delimiter being set to an empty string. In order to preserve the configuration of the new-line character, simply wrap the character in the CDATA[] wrapper as follows:

gg.handler.{name}.format.lineDelimiter=CDATA[\n]

Configuring the property with the CDATA[] wrapping preserves the white space and the line delimiter will then be a new-line character.

1.3.5 Using Regular Expression Search and Replace

You can perform more powerful search and replace operations on both schema data (catalog names, schema names, table names, and column names) and column value data, which are separately configured. Regular expressions (regex) are characters that customize a search string through pattern matching. You can match a string against a pattern or extract parts of the match. Oracle GoldenGate for Big Data uses the standard Oracle Java regular expressions package, java.util.regex. See Regular Expressions in The Single UNIX Specification, Version 4.

• Using Schema Data Replace
• Using Content Data Replace

You can replace schema data using the gg.schemareplaceregex and gg.schemareplacestring properties. Use gg.schemareplaceregex to set a regular expression, and then use it to search catalog names, schema names, table names, and column names for corresponding matches. Matches are then replaced with the content of the gg.schemareplacestring value. The default value of gg.schemareplacestring is an empty string or "".

For example, some system table names start with a dollar sign like $mytable. You may want to replicate these tables even though most Big Data targets do not allow dollar signs in table names. To remove the dollar sign, you could configure the following replace strings:

gg.schemareplaceregex=[$]
gg.schemareplacestring=


The resulting table name in this example is mytable. These properties also support CDATA[] wrapping to preserve whitespace in configuration values. The equivalent of the preceding example using CDATA[] wrapping is:

gg.schemareplaceregex=CDATA[[$]]
gg.schemareplacestring=CDATA[]

The schema search and replace functionality supports using multiple search regular expressions and replacement strings using the following configuration syntax:

gg.schemareplaceregex=some_regex
gg.schemareplacestring=some_value
gg.schemareplaceregex1=some_regex
gg.schemareplacestring1=some_value
gg.schemareplaceregex2=some_regex
gg.schemareplacestring2=some_value

1.3.5.2 Using Content Data Replace

You can replace content data using the gg.contentreplaceregex and gg.contentreplacestring properties to search the column values using the configured regular expression and replace matches with the replacement string. For example, this is useful to replace line feed characters in column values. If the delimited text formatter is used then line feeds occurring in the data will be incorrectly interpreted as line delimiters by analytic tools. You can configure n number of content replacement regex search values. The regex search and replacements are done in the order of configuration. Configured values must follow a given order as follows:

gg.contentreplaceregex=some_regex
gg.contentreplacestring=some_value
gg.contentreplaceregex1=some_regex
gg.contentreplacestring1=some_value
gg.contentreplaceregex2=some_regex
gg.contentreplacestring2=some_value

Configuring a subscript of 3 without a subscript of 2 would cause the subscript 3 configuration to be ignored.

Note:

Regular expression searches and replacements require computer processing and can reduce the performance of the Oracle GoldenGate for Big Data process.


To replace line feeds with a blank character you could use the following property configurations:

gg.contentreplaceregex=[\n]
gg.contentreplacestring=CDATA[ ]

This changes the column value from:

this is
me

to:

this is me

Both values support CDATA[] wrapping. The second value must be wrapped in a CDATA[] wrapper because a single blank space would otherwise be interpreted as whitespace and trimmed by the Oracle GoldenGate for Big Data configuration layer. In addition, you can configure multiple search and replace strings. For example, you may also want to trim leading and trailing white space from column values in addition to replacing line feeds. The regular expression ^\\s+|\\s+$ matches leading and trailing white space:

gg.contentreplaceregex1=^\\s+|\\s+$
gg.contentreplacestring1=CDATA[]

1.3.6 Scaling Oracle GoldenGate for Big Data Delivery

Oracle GoldenGate for Big Data supports scaling delivery either by breaking the source trail files down across multiple Replicat processes or by using Coordinated Delivery to instantiate multiple Java Adapter instances inside a single Replicat process to improve throughput. There are some cases where the throughput to Oracle GoldenGate for Big Data integration targets is not sufficient to meet your service level agreements even after you have tuned your Handler for maximum performance. When this occurs, you can configure parallel processing and delivery to your targets using one of the following methods (a configuration sketch follows this list):

• Multiple Replicat processes can be configured to read data from the same source trail files. Each of these Replicat processes is configured to process a subset of the data in the source trail files so that all of the processes collectively process the source trail files in their entirety. There is no coordination between the separate Replicat processes using this solution.
• Oracle GoldenGate Coordinated Delivery can be used to parallelize processing the data from the source trail files within a single Replicat process. This solution involves breaking the trail files down into logical subsets for which each configured subset is processed by a different delivery thread. For more information about Coordinated Delivery, see https://blogs.oracle.com/dataintegration/entry/goldengate_12c_coordinated_replicat.
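The following is a minimal sketch of these approaches. It assumes a source schema named QASOURCE and Replicat groups named rep1 and rep2; the table names, group names, and thread range are illustrative only, and each MAP statement belongs in the corresponding Replicat parameter file:

-- Two independent Replicat processes reading the same trail,
-- each mapping a different subset of the source tables.
-- rep1.prm
MAP QASOURCE.TABLE1, TARGET QASOURCE.TABLE1;
MAP QASOURCE.TABLE2, TARGET QASOURCE.TABLE2;
-- rep2.prm
MAP QASOURCE.TABLE3, TARGET QASOURCE.TABLE3;
MAP QASOURCE.TABLE4, TARGET QASOURCE.TABLE4;

-- Splitting one table's rows into sub streams with the @RANGE function.
-- rep1.prm
MAP QASOURCE.TABLE1, TARGET QASOURCE.TABLE1, FILTER (@RANGE (1, 2));
-- rep2.prm
MAP QASOURCE.TABLE1, TARGET QASOURCE.TABLE1, FILTER (@RANGE (2, 2));

-- With Coordinated Delivery, a single Replicat can spread a table across threads.
MAP QASOURCE.TABLE1, TARGET QASOURCE.TABLE1, THREADRANGE(1-3);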


With either method, you can split the data for parallel processing to improve throughput. Oracle recommends breaking the data down in one of the following two ways:

• Splitting Source Data By Source Table – Data is divided into subsections by source table. For example, Replicat process 1 might handle source tables table1 and table2, while Replicat process 2 might handle data for source tables table3 and table4. Data is split by source table; the data for an individual table is not subdivided.
• Splitting Source Table Data into Sub Streams – Data from source tables is split. For example, Replicat process 1 might handle half of the range of data from source table1, while Replicat process 2 might handle the other half of the data from source table1.

Additional limitations:

• Parallel apply is not supported.
• The BATCHSQL parameter is not supported.

Example 1-1 Scaling Support for the Oracle GoldenGate for Big Data Handlers

Cassandra
  Splitting Source Data By Source Table: Supported.
  Splitting Source Table Data into Sub Streams: Supported when:
  • Required target tables in Cassandra are pre-created.
  • Metadata change events do not occur.

Elastic Search
  Splitting Source Data By Source Table: Supported.
  Splitting Source Table Data into Sub Streams: Supported.

HBase
  Splitting Source Data By Source Table: Supported when all required HBase namespaces are pre-created in HBase.
  Splitting Source Table Data into Sub Streams: Supported when:
  • All required HBase namespaces are pre-created in HBase.
  • All required HBase target tables are pre-created in HBase. Schema evolution is not an issue because HBase tables have no schema definitions, so a source metadata change does not require any schema change in HBase.
  • The source data does not contain any truncate operations.


HDFS
  Splitting Source Data By Source Table: Supported.
  Splitting Source Table Data into Sub Streams: Supported with some restrictions:
  • You must select a naming convention for generated HDFS files where the file names do not collide. Colliding HDFS file names result in a Replicat abend. When using coordinated apply, it is suggested that you configure ${groupName} as part of the configuration for the gg.handler.name.fileNameMappingTemplate property. The ${groupName} template resolves to the Replicat name concatenated with the Replicat thread number, which provides unique naming per Replicat thread.
  • Schema propagation to HDFS and Hive integration is not currently supported.

JDBC
  Splitting Source Data By Source Table: Supported.
  Splitting Source Table Data into Sub Streams: Supported.

Kafka
  Splitting Source Data By Source Table: Supported.
  Splitting Source Table Data into Sub Streams: Supported for formats that support schema propagation, such as Avro. This is less desirable because multiple instances feed the same schema information to the target.

Kafka Connect
  Splitting Source Data By Source Table: Supported.
  Splitting Source Table Data into Sub Streams: Supported.

Kinesis Streams
  Splitting Source Data By Source Table: Supported.
  Splitting Source Table Data into Sub Streams: Supported.

MongoDB
  Splitting Source Data By Source Table: Supported.
  Splitting Source Table Data into Sub Streams: Supported.


Java File Writer
  Splitting Source Data By Source Table: Supported.
  Splitting Source Table Data into Sub Streams: Supported with the following restrictions:
  • You must select a naming convention for generated files where the file names do not collide. Colliding file names may result in a Replicat abend and/or polluted data. When using coordinated apply, it is suggested that you configure ${groupName} as part of the configuration for the gg.handler.name.fileNameMappingTemplate property. The ${groupName} template resolves to the Replicat name concatenated with the Replicat thread number, which provides unique naming per Replicat thread.
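For reference, the following is a minimal sketch of a file name template that includes ${groupName} so that coordinated Replicat threads write to distinct files. The handler name (hdfs) and the other template keywords shown are illustrative assumptions, not required values:

gg.handler.hdfs.fileNameMappingTemplate=${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt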

1.3.7 Configuring Cluster High Availability

Oracle GoldenGate for Big Data does not have built-in high availability functionality. You need to use standard cluster software's high availability capability to provide this functionality. You can configure a high availability scenario on a cluster so that if the leader instance of Oracle GoldenGate for Big Data on a machine fails, another Oracle GoldenGate for Big Data instance can be started on another machine to resume where the failed instance left off. If you manually configure your instances to share common Oracle GoldenGate for Big Data and Oracle GoldenGate files using a shared disk architecture, you can create a failover scenario. For a cluster installation, these files need to be accessible from all machines and accessible in the same location. The configuration files that must be shared are:

• replicat.prm
• Handler properties file.
• Additional properties files required by the specific adapter. This depends on the target handler in use. For example, for Kafka this would be a producer properties file.
• Additional schema files you've generated. For example, Avro schema files generated in the dirdef directory.
• File Writer Handler generated files on your local file system at a configured path. Also, the File Writer Handler state file in the dirsta directory.
• Any log4j.properties or logback.properties files in use.


Checkpoint files must be shared for the ability to resume processing:

• Your Replicat checkpoint file (*.cpr).
• Your adapter checkpoint file (*.cpj).

1.3.8 Using Identities in Oracle GoldenGate Credential Store

The Oracle GoldenGate credential store manages user IDs and their encrypted passwords (together known as credentials) that are used by Oracle GoldenGate processes to interact with the local database. The credential store eliminates the need to specify user names and clear-text passwords in the Oracle GoldenGate parameter files. An optional alias can be used in the parameter file instead of the user ID to map to a user ID and password pair in the credential store. The credential store is implemented as an auto login wallet within the Oracle Credential Store Framework (CSF). The use of an LDAP directory is not supported for the Oracle GoldenGate credential store. The auto login wallet supports automated restarts of Oracle GoldenGate processes without requiring human intervention to supply the necessary passwords. In Oracle GoldenGate for Big Data, you specify the alias and domain in the property file, not the actual user ID or password. User credentials are maintained in secure wallet storage.

• Creating a Credential Store
• Adding Users to a Credential Store
• Configuring Properties to Access the Credential Store

1.3.8.1 Creating a Credential Store

You can create a credential store for your Big Data environment. Run the GGSCI ADD CREDENTIALSTORE command to create a file called cwallet.sso in the dircrd/ subdirectory of your Oracle GoldenGate installation directory (the default).

You can change the location of the credential store (the cwallet.sso file) by specifying the desired location with the CREDENTIALSTORELOCATION parameter in the GLOBALS file.
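The following is a minimal sketch of this sequence; the shared directory path is an illustrative assumption, and the GLOBALS entry is only needed if you want to relocate the credential store:

-- GLOBALS file
CREDENTIALSTORELOCATION /shared/ogg/dircrd

-- GGSCI
ADD CREDENTIALSTORE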

For more information about credential store commands, see Reference for Oracle GoldenGate.

Note:

Only one credential store can be used for each Oracle GoldenGate instance.

1.3.8.2 Adding Users to a Credential Store

After you create a credential store for your Big Data environment, you can add users to the store. Run the GGSCI ALTER CREDENTIALSTORE ADD USER userid PASSWORD password [ALIAS alias] [DOMAIN domain] command to create each user, where:


• userid is the user name. Only one instance of a user name can exist in the credential store unless the ALIAS or DOMAIN option is used.
• password is the user's password. The password is echoed (not obfuscated) when this option is used. If this option is omitted, the command prompts for the password, which is obfuscated as it is typed (recommended because it is more secure).
• alias is an alias for the user name. The alias substitutes for the credential in parameters and commands where a login credential is required. If the ALIAS option is omitted, the alias defaults to the user name.

For example:

ALTER CREDENTIALSTORE ADD USER scott PASSWORD tiger ALIAS scsm2 DOMAIN ggadapters
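You can then confirm the entry with the GGSCI INFO CREDENTIALSTORE command; the exact output varies by release. For example:

INFO CREDENTIALSTORE DOMAIN ggadapters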

For more information about credential store commands, see Reference for Oracle GoldenGate.

1.3.8.3 Configuring Properties to Access the Credential Store

The Oracle GoldenGate Java Adapter properties file requires specific syntax to resolve user name and password entries in the Credential Store at runtime. For resolving a user name the syntax is the following:

ORACLEWALLETUSERNAME[alias domain_name]

For resolving a password the syntax required is the following:

ORACLEWALLETPASSWORD[alias domain_name]

The following example illustrates how to configure a Credential Store entry with an alias of myalias and a domain of mydomain.

Note:

With HDFS Hive JDBC, the user name and password are encrypted.

Oracle Wallet integration only works for configuration properties which contain the string username or password. For example:

gg.handler.hdfs.hiveJdbcUsername=ORACLEWALLETUSERNAME[myalias mydomain]
gg.handler.hdfs.hiveJdbcPassword=ORACLEWALLETPASSWORD[myalias mydomain]

Consider the user name and password entries as accessible values in the Credential Store. Any configuration property resolved in the Java Adapter layer (not accessed in the C user exit layer) can be resolved from the Credential Store. This gives you more flexibility in how you protect sensitive configuration entries.

2 Using the BigQuery Handler

Learn how to use the Google BigQuery Handler, which streams change data capture data from source trail files into Google BigQuery. BigQuery is a RESTful web service that enables interactive analysis of massive datasets working in conjunction with Google Storage, see https://cloud.google.com/bigquery/.

• Detailing the Functionality
• Setting Up and Running the BigQuery Handler

The Google BigQuery Handler uses the Java BigQuery client libraries to connect to BigQuery.

2.1 Detailing the Functionality

• Data Types
• Operation Modes
• Operation Processing Support
• Proxy Settings

2.1.1 Data Types

Google BigQuery provides a set of standard SQL data types, and most of these data types are supported by the BigQuery Handler. A data type conversion from the column value in the trail file to the corresponding Java type representing the BigQuery column type in the BigQuery Handler is required. The following data types are supported:

STRING BYTES INTEGER FLOAT NUMERIC BOOLEAN TIMESTAMP DATE TIME DATETIME

The BigQuery Handler does not support complex data types, such as ARRAY and STRUCT.


2.1.2 Operation Modes

You can configure the BigQuery Handler in one of these two modes:

auditLogMode = true

gg.handler.name.auditLogMode=true

When the handler is configured to run with audit log mode true, the data is pushed into Google BigQuery without a unique row identification key. As a result, Google BigQuery is not able to merge different operations on the same row. For example, a source row with an insert operation, two update operations, and then a delete operation shows up in BigQuery as four rows, one for each operation. Also, the order in which the audit log is displayed in the BigQuery data set is not deterministic. To overcome these limitations, specify optype and position in the meta columns template for the handler. This adds two columns of the same names to the schema for the table in Google BigQuery. For example:

gg.handler.bigquery.metaColumnsTemplate=${optype}, ${position}

The optype is important for determining the operation type for the row in the audit log. To view the audit log in the order of the operations processed in the trail file, specify position, which can be used in the ORDER BY clause when querying the table in Google BigQuery. For example:

SELECT * FROM [projectId:datasetId.tableId] ORDER BY position

auditLogMode = false

gg.handler.name.auditLogMode=false

When the handler is configured to run with audit log mode false, the data is pushed into Google BigQuery using a unique row identification key. Google BigQuery is able to merge different operations for the same row. However, the behavior is complex. Google BigQuery maintains a finite deduplication period in which it merges operations for a given row. Therefore, the results can be somewhat non-deterministic. The trail source needs to have a full image of the records in order to merge correctly.

Example 1: An insert operation is sent to BigQuery and, before the deduplication period expires, an update operation for the same row is sent to BigQuery. The result is a single row in BigQuery for the update operation.

Example 2: An insert operation is sent to BigQuery and, after the deduplication period expires, an update operation for the same row is sent to BigQuery. The result is that both the insert and the update operations show up in BigQuery.

This behavior has confounded many users, but it is the documented behavior when using the BigQuery SDK and a feature as opposed to a defect. The documented length of the deduplication period is at least one minute. However, Oracle testing has shown that the period is significantly longer. Therefore, unless you can guarantee that all operations for a given row occur within a very short period, it is likely there will be multiple entries for a given row in BigQuery. It is therefore just as important to configure meta columns with the optype and position so that you can determine the latest state for a given row. To read more about audit log mode, see the Google BigQuery documentation: Streaming data into BigQuery.
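Because multiple entries for a row can remain, you often need to query for the latest state per key. The following is a hedged sketch of such a query using standard BigQuery SQL; it assumes a primary key column named id and that the optype and position meta columns are configured as described above:

SELECT * EXCEPT(rn)
FROM (
  SELECT t.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY position DESC) AS rn
  FROM `projectId.datasetId.tableId` AS t
)
WHERE rn = 1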


2.1.3 Operation Processing Support

The BigQuery Handler pushes operations to Google BigQuery using the synchronous API. Insert, update, and delete operations are processed differently in BigQuery than in a traditional RDBMS. The following explains how insert, update, and delete operations are interpreted by the handler depending on the mode of operation:

auditLogMode = true

• insert – Inserts the record with optype as an insert operation in the BigQuery table.
• update – Inserts the record with optype as an update operation in the BigQuery table.
• delete – Inserts the record with optype as a delete operation in the BigQuery table.
• pkUpdate – When the pkUpdateHandling property is configured as delete-insert, the handler sends out a delete operation followed by an insert operation. Both these rows have the same position in the BigQuery table, which helps to identify it as a primary key operation and not a separate delete and insert operation.

auditLogMode = false

• insert – If the row does not already exist in Google BigQuery, then an insert operation is processed as an insert. If the row already exists in Google BigQuery, then an insert operation is processed as an update. The handler sets the deleted column to false.
• update – If the row does not exist in Google BigQuery, then an update operation is processed as an insert. If the row already exists in Google BigQuery, then an update operation is processed as an update. The handler sets the deleted column to false.
• delete – If the row does not exist in Google BigQuery, then a delete operation has no effect. If the row exists in Google BigQuery, then a delete operation is processed as a delete. The handler sets the deleted column to true.
• pkUpdate – When the pkUpdateHandling property is configured as delete-insert, the handler sets the deleted column to true for the row whose primary key is updated. It is followed by a separate insert operation with the new primary key and the deleted column set to false for this row.

Do not toggle the audit log mode, because doing so forces the BigQuery Handler to abend; Google BigQuery cannot alter the schema of an existing table. The existing table needs to be deleted before switching audit log modes.

Note:

The BigQuery Handler does not support the truncate operation. It abends when it encounters a truncate operation.
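If your source trail may contain truncate operations, one way to keep them from reaching the handler is to suppress them in the Replicat parameter file. This is a hedged suggestion rather than the documented remedy; IGNORETRUNCATES is typically the default and is shown here only for explicitness:

-- Replicat parameter file excerpt
IGNORETRUNCATES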


2.1.4 Proxy Settings

To connect to BigQuery using a proxy server, you must configure the proxy host and the proxy port in the properties file as follows:

javawriter.bootoptions= -Dhttps.proxyHost=proxy_host_name -Dhttps.proxyPort=proxy_port_number
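Note that javawriter.bootoptions is a single property, so the proxy settings are appended to any existing boot options rather than configured on their own. The following is a sketch assuming a proxy at proxy.example.com:3128 and commonly used default boot options; your existing values may differ:

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=3128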

2.2 Setting Up and Running the BigQuery Handler

The Google BigQuery Handler uses the Java BigQuery client libraries to connect to BigQuery. These client libraries are located using the following Maven coordinates:

• Group ID: com.google.cloud
• Artifact ID: google-cloud-bigquery
• Version: 1.111.1

The BigQuery client libraries do not ship with Oracle GoldenGate for Big Data. Additionally, Google appears to have removed the link to download the BigQuery client libraries. You can download the BigQuery client libraries using Maven and the Maven coordinates listed above. However, this requires proficiency with Maven. If you cannot download the Google BigQuery client libraries, then contact Oracle Support for assistance. See Google BigQuery Dependencies.

• Schema Mapping for BigQuery
• Understanding the BigQuery Handler Configuration
• Review a Sample Configuration
• Configuring Handler Authentication

2.2.1 Schema Mapping for BigQuery

The table schema name specified in the Replicat MAP statement is mapped to the BigQuery dataset name. For example:

map QASOURCE.*, target "dataset_US".*;

This MAP statement replicates tables to the BigQuery dataset "dataset_US".

2.2.2 Understanding the BigQuery Handler Configuration

The following are the configurable values for the BigQuery Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the BigQuery Handler, you must first configure the handler type by specifying gg.handler.name.type=bigquery and the other BigQuery properties as follows:


gg.handlerlist (Required; Legal Values: any string; Default: none)
  Provides a name for the BigQuery Handler. The BigQuery Handler name then becomes part of the property names listed in this table.

gg.handler.name.type=bigquery (Required; Legal Values: bigquery; Default: none)
  Selects the BigQuery Handler for streaming change data capture into Google BigQuery.

gg.handler.name.credentialsFile (Optional; Legal Values: relative or absolute path to the credentials file; Default: none)
  The credentials file downloaded from Google BigQuery for authentication. If you do not specify the path to the credentials file, you need to set it as an environment variable, see Configuring Handler Authentication.

gg.handler.name.projectId (Required; Legal Values: any string; Default: none)
  The name of the project in Google BigQuery. The handler needs the project ID to connect to the Google BigQuery store.

gg.handler.name.batchSize (Optional; Legal Values: any number; Default: 500)
  The maximum number of operations to be batched together. This is applicable for all target table batches.

gg.handler.name.batchFlushFrequency (Optional; Legal Values: any number; Default: 1000)
  The maximum amount of time in milliseconds to wait before executing the next batch of operations. This is applicable for all target table batches.

gg.handler.name.skipInvalidRows (Optional; Legal Values: true | false; Default: false)
  Sets whether to insert all valid rows of a request, even if invalid rows exist. If not set, the entire insert request fails if it contains an invalid row.

gg.handler.name.ignoreUnknownValues (Optional; Legal Values: true | false; Default: false)
  Sets whether to accept rows that contain values that do not match the schema. If not set, rows with unknown values are considered to be invalid.

gg.handler.name.connectionTimeout (Optional; Legal Values: positive integer; Default: 20000)
  The maximum amount of time, in milliseconds, to wait for the handler to establish a connection with Google BigQuery.

gg.handler.name.readTimeout (Optional; Legal Values: positive integer; Default: 30000)
  The maximum amount of time in milliseconds to wait for the handler to read data from an established connection.

gg.handler.name.metaColumnsTemplate (Optional; Legal Values: a legal string; Default: none)
  A legal string specifying the metaColumns to be included. If you set auditLogMode to true, it is important that you set the metaColumnsTemplate property to view the operation type for the row inserted in the audit log, see Setting Metacolumn Output.


gg.handler.name.auditLogMode (Optional; Legal Values: true | false; Default: false)
  Set to true, the handler writes each record to the target without any primary key. Everything is processed as an insert. Set to false, the handler tries to merge incoming records into the target table if they have the same primary key. Primary keys are needed for this property. The trail source records need to have full image updates to merge correctly.

gg.handler.name.pkUpdateHandling (Optional; Legal Values: abend | delete-insert; Default: abend)
  Sets how the handler handles update operations that change a primary key. Primary key operations can be problematic for the BigQuery Handler and require special consideration:
  • abend – indicates the process abends.
  • delete-insert – indicates the process treats the operation as a delete and an insert. The full before image is required for this property to work correctly. Without full before and after row images the insert data are incomplete. Oracle recommends this option.

gg.handler.name.adjustScale (Optional; Legal Values: true | false; Default: false)
  The BigQuery numeric data type supports a maximum scale of 9 digits. If a field is mapped into a BigQuery numeric data type, then it fails if the scale is larger than 9 digits. Set this property to true to round fields mapped to BigQuery numeric data types to a scale of 9 digits. Enabling this property results in a loss of precision for source data values with a scale larger than 9.

gg.handler.name.includeDeletedColumn (Optional; Legal Values: true | false; Default: false)
  Set to true to include a boolean column in the output called deleted. The value of this column is set to false for insert and update operations, and is set to true for delete operations.

gg.handler.name.enableAlter (Optional; Legal Values: true | false; Default: false)
  Set to true to enable altering the target BigQuery table. This allows the BigQuery Handler to add columns or metacolumns configured on the source, which are not currently in the target BigQuery table.

2.2.3 Review a Sample Configuration

The following is a sample configuration for the BigQuery Handler:

gg.handlerlist = bigquery

#The handler properties


gg.handler.bigquery.type = bigquery
gg.handler.bigquery.projectId = festive-athlete-201315
gg.handler.bigquery.credentialsFile = credentials.
gg.handler.bigquery.auditLogMode = true
gg.handler.bigquery.pkUpdateHandling = delete-insert

gg.handler.bigquery.metaColumnsTemplate =${optype}, ${position}

2.2.4 Configuring Handler Authentication

You have to configure the BigQuery Handler authentication using the credentials in the JSON file downloaded from Google BigQuery. Download the credentials file:

1. Log in to your Google account at cloud.google.com.
2. Click Console, and then go to the Dashboard where you can select your project.
3. From the navigation menu, click APIs & Services, then select Credentials.
4. From the Create Credentials menu, choose Service account key.
5. Choose the JSON key type to download the JSON credentials file for your system.

Once you have the credentials file, you can authenticate the handler in one of these two ways:

• Specify the path to the credentials file in the properties file with the gg.handler.name.credentialsFile configuration property. The path of the credentials file must contain the path with no wildcard appended. If you include the * wildcard in the path to the credentials file, the file is not recognized.

Or

• Set the GOOGLE_APPLICATION_CREDENTIALS environment variable on your system. For example:

export GOOGLE_APPLICATION_CREDENTIALS=credentials.json

Then restart the Oracle GoldenGate manager process.

3 Using the Cassandra Handler

The Cassandra Handler provides the interface to Apache Cassandra databases. This chapter describes how to use the Cassandra Handler.

• Overview
• Detailing the Functionality
• Setting Up and Running the Cassandra Handler
• About Automated DDL Handling – The Cassandra Handler performs the table check and reconciliation process the first time an operation for a source table is encountered. Additionally, a DDL event or a metadata change event causes the table definition in the Cassandra Handler to be marked as not suitable.
• Performance Considerations
• Additional Considerations
• Troubleshooting

3.1 Overview

Apache Cassandra is a NoSQL Database Management System designed to store large amounts of data. A Cassandra cluster configuration provides horizontal scaling and replication of data across multiple machines. It can provide high availability and eliminate a single point of failure by replicating data to multiple nodes within a Cassandra cluster. Apache Cassandra is open source and designed to run on low-cost commodity hardware. Cassandra relaxes the axioms of traditional relational database management systems (RDBMS) regarding atomicity, consistency, isolation, and durability. When considering implementing Cassandra, it is important to understand its differences from a traditional RDBMS and how those differences affect your specific use case. Cassandra provides eventual consistency. Under the eventual consistency model, accessing the state of data for a specific row eventually returns the latest state of the data for that row as defined by the most recent change. However, there may be a latency period between the creation and modification of the state of a row and what is returned when the state of that row is queried. The benefit of eventual consistency is that the latency period can be predicted based on your Cassandra configuration and the level of workload that your Cassandra cluster is currently under, see http://cassandra.apache.org/. The Cassandra Handler provides some control over consistency with the configuration of the gg.handler.name.consistencyLevel property in the Java Adapter properties file.
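For example, a handler named cassandra could raise the consistency level as follows; the handler name and the chosen level are illustrative, and any of the legal values listed in the configuration table later in this chapter can be used:

gg.handler.cassandra.consistencyLevel=LOCAL_QUORUM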


3.2 Detailing the Functionality

• About the Cassandra Data Types
• About Catalog, Schema, Table, and Column Name Mapping – Traditional RDBMSs separate structured data into tables. Related tables are included in higher-level collections called databases. Cassandra contains both of these concepts. Tables in an RDBMS are also tables in Cassandra, while database schemas in an RDBMS are keyspaces in Cassandra.
• About DDL Functionality
• How Operations are Processed
• About Compressed Updates vs. Full Image Updates
• About Primary Key Updates

3.2.1 About the Cassandra Data Types

Cassandra provides a number of column data types and most of these data types are supported by the Cassandra Handler.

Supported Cassandra Data Types
ASCII BIGINT BLOB BOOLEAN DATE DECIMAL DOUBLE DURATION FLOAT INET INT SMALLINT TEXT TIME TIMESTAMP TIMEUUID TINYINT UUID VARCHAR VARINT

Unsupported Cassandra Data Types
COUNTER MAP SET LIST UDT (user defined type) TUPLE CUSTOM_TYPE


Supported Database Operations
INSERT
UPDATE (captured as INSERT)
DELETE

The Cassandra commit log files do not record any before images for the UPDATE or DELETE operations. So the captured operations never have a before image section. Oracle GoldenGate features that rely on before image records, such as Conflict Detection and Resolution, are not available.

Unsupported Database Operations
TRUNCATE
DDL (CREATE, ALTER, DROP)

The data type of the column value in the source trail file must be converted to the corresponding Java type representing the Cassandra column type in the Cassandra Handler. This data conversion introduces the risk of a runtime conversion error. A poorly mapped field (such as a source varchar column containing alphanumeric data mapped to a Cassandra int) may cause a runtime error and cause the Cassandra Handler to abend. You can view the Cassandra Java type mappings at:

https://github.com/datastax/java-driver/tree/3.x/manual#cql-to-java-type-mapping

It is possible that the data may require specialized processing to get converted to the corresponding Java type for intake into Cassandra. If this is the case, you have two options:

• Try to use the general regular expression search and replace functionality to format the source column value data in a way that can be converted into the Java data type for use in Cassandra.

Or

• Implement or extend the default data type conversion logic to override it with custom logic for your use case. Contact Oracle Support for guidance.

Traditional RDBMSs separate structured data into tables. Related tables are included in higher-level collections called databases. Cassandra contains both of these concepts. Tables in an RDBMS are also tables in Cassandra, while database schemas in an RDBMS are keyspaces in Cassandra. It is important to understand how the metadata definitions in the source trail file are mapped to the corresponding keyspace and table in Cassandra. Source tables are generally either two-part names defined as schema.table, or three-part names defined as catalog.schema.table.

The following table explains how catalog, schema, and table names map into Cassandra. Unless you use special syntax, Cassandra converts all keyspace names, table names, and column names to lowercase.

Table Name in Source Trail File | Cassandra Keyspace Name | Cassandra Table Name
QASOURCE.TCUSTMER               | qasource                | tcustmer
dbo.mytable                     | dbo                     | mytable
GG.QASOURCE.TCUSTORD            | gg_qasource             | tcustord

3.2.3 About DDL Functionality

• About the Keyspaces
• About the Tables
• Adding Column Functionality
• Dropping Column Functionality

3.2.3.1 About the Keyspaces

The Cassandra Handler does not automatically create keyspaces in Cassandra. Keyspaces in Cassandra define a replication factor, the replication strategy, and topology. The Cassandra Handler does not have enough information to create the keyspaces, so you must manually create them. You can create keyspaces in Cassandra by using the CREATE KEYSPACE command from the Cassandra shell.

3.2.3.2 About the Tables

The Cassandra Handler can automatically create tables in Cassandra if you configure it to do so. The source table definition may be a poor source of information for creating tables in Cassandra. Primary keys in Cassandra are divided into:

• Partitioning keys that define how data for a table is separated into partitions in Cassandra.
• Clustering keys that define the order of items within a partition.

In the default mapping for automated table creation, the first primary key is the partition key, and any additional primary keys are mapped as clustering keys. Automated table creation by the Cassandra Handler may be fine for proof of concept, but it may result in data definitions that do not scale well. When the Cassandra Handler creates tables with poorly constructed primary keys, the performance of ingest and retrieval may decrease as the volume of data stored in Cassandra increases. Oracle recommends that you analyze the metadata of your replicated tables, then manually create corresponding tables in Cassandra that are properly partitioned and clustered for higher scalability. Primary key definitions for tables in Cassandra are immutable after they are created. Changing a Cassandra table primary key definition requires the following manual steps:

1. Create a staging table.
2. Populate the data in the staging table from the original table.


3. Drop the original table.
4. Re-create the original table with the modified primary key definitions.
5. Populate the data in the original table from the staging table.
6. Drop the staging table.

3.2.3.3 Adding Column Functionality

You can configure the Cassandra Handler to add columns that exist in the source trail file table definition but are missing in the Cassandra table definition. The Cassandra Handler can accommodate metadata change events of this kind. A reconciliation process reconciles the source table definition to the Cassandra table definition. When the Cassandra Handler is configured to add columns, any columns found in the source table definition that do not exist in the Cassandra table definition are added. The reconciliation process for a table occurs after application startup the first time an operation for the table is encountered. The reconciliation process reoccurs after a metadata change event on a source table, when the first operation for the source table is encountered after the change event. 3.2.3.4 Dropping Column Functionality

You can configure the Cassandra Handler to drop columns that do not exist in the source trail file definition but exist in the Cassandra table definition. The Cassandra Handler can accommodate metadata change events of this kind. A reconciliation process reconciles the source table definition to the Cassandra table definition. When the Cassandra Handler is configured to drop columns, any columns found in the Cassandra table definition that are not in the source table definition are dropped.

Caution:

Dropping a column permanently removes data from a Cassandra table. Carefully consider your use case before you configure this mode.

Note:

Primary key columns cannot be dropped. Attempting to do so results in an abend.

Note:

Column name changes are not well-handled because no DDL is processed. When a column name changes in the source database, the Cassandra Handler interprets it as dropping an existing column and adding a new column.


3.2.4 How Operations are Processed

The Cassandra Handler pushes operations to Cassandra using either the asynchronous or synchronous API. In asynchronous mode, operations are flushed at transaction commit (grouped transaction commit using GROUPTRANSOPS) to ensure write durability. The Cassandra Handler does not interface with Cassandra in a transactional way.

Supported Database Operations
INSERT
UPDATE (captured as INSERT)
DELETE

The Cassandra commit log files do not record any before images for the UPDATE or DELETE operations. So the captured operations never have a before image section. Oracle GoldenGate features that rely on before image records, such as Conflict Detection and Resolution, are not available.

Unsupported Database Operations
TRUNCATE
DDL (CREATE, ALTER, DROP)

Insert, update, and delete operations are processed differently in Cassandra than in a traditional RDBMS. The following explains how insert, update, and delete operations are interpreted by Cassandra:

• Inserts: If the row does not exist in Cassandra, then an insert operation is processed as an insert. If the row already exists in Cassandra, then an insert operation is processed as an update.
• Updates: If a row does not exist in Cassandra, then an update operation is processed as an insert. If the row already exists in Cassandra, then an update operation is processed as an update.
• Deletes: If the row does not exist in Cassandra, then a delete operation has no effect. If the row exists in Cassandra, then a delete operation is processed as a delete.

The state of the data in Cassandra is idempotent. You can replay the source trail files or replay sections of the trail files. The state of the Cassandra database must be the same regardless of the number of times that the trail data is written into Cassandra.

3.2.5 About Compressed Updates vs. Full Image Updates

Oracle GoldenGate allows you to control the data that is propagated to the source trail file in the event of an update. The data for an update in the source trail file is either a compressed or a full image of the update, and the column information is provided as follows:

Compressed: For the primary keys and the columns for which the value changed. Data for columns that have not changed is not provided in the trail file.


Full Image: For all columns, including primary keys, columns for which the value has changed, and columns for which the value has not changed.

The amount of information about an update is important to the Cassandra Handler. If the source trail file contains full images of the change data, then the Cassandra Handler can use prepared statements to perform row updates in Cassandra. Full images also allow the Cassandra Handler to perform primary key updates for a row in Cassandra. In Cassandra, primary keys are immutable, so an update that changes a primary key must be treated as a delete and an insert. Conversely, when compressed updates are used, prepared statements cannot be used for Cassandra row updates. Simple statements identifying the changing values and primary keys must be dynamically created and then executed. With compressed updates, primary key updates are not possible, and as a result the Cassandra Handler abends. You must set the control property gg.handler.name.compressedUpdates so that the handler expects either compressed or full image updates. The default value, true, sets the Cassandra Handler to expect compressed updates. Prepared statements are not used for updates, and primary key updates cause the handler to abend. When the value is false, prepared statements are used for updates and primary key updates can be processed. A source trail file that does not contain full image data can lead to corrupted data columns, which are considered null. As a result, the null value is pushed to Cassandra. If you are not sure whether the source trail files contain compressed or full image data, set gg.handler.name.compressedUpdates to true.
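As a minimal sketch, a handler named cassandra whose source trail is known to carry full image updates might be configured as follows; the handler name is illustrative, and enabling this without full images leads to the null-value problem described above:

gg.handler.cassandra.compressedUpdates=false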

CLOB and BLOB data types do not propagate LOB data in updates unless the LOB column value changed. Therefore, if the source tables contain LOB data, set gg.handler.name.compressedUpdates to true.

3.2.6 About Primary Key Updates

Primary key values for a row in Cassandra are immutable. An update operation that changes any primary key value for a Cassandra row must be treated as a delete and insert. The Cassandra Handler can process update operations that result in the change of a primary key in Cassandra only as a delete and insert. To successfully process this operation, the source trail file must contain the complete before and after change data images for all columns. The gg.handler.name.compressedUpdates configuration property of the Cassandra Handler must be set to false for primary key updates to be successfully processed.

3.3 Setting Up and Running the Cassandra Handler

Instructions for configuring the Cassandra Handler components and running the handler are described in the following sections. Before you run the Cassandra Handler, you must install the Datastax Driver for Cassandra and set the gg.classpath configuration property.

Get the Driver Libraries


The Datastax Java Driver for Cassandra does not ship with Oracle GoldenGate for Big Data. You can download the recommended version of the Datastax Java Driver for Cassandra 3.1 at:

https://github.com/datastax/java-driver

Set the Classpath

You must configure the gg.classpath configuration property in the Java Adapter properties file to specify the JARs for the Datastax Java Driver for Cassandra. Ensure that this JAR is first in the list.

gg.classpath=/path_to_driver/cassandra-java-driver-3.6.0/*:/path_to_driver/cassandra-java-driver-3.6.0/lib/*

• Understanding the Cassandra Handler Configuration
• Review a Sample Configuration
• Configuring Security

3.3.1 Understanding the Cassandra Handler Configuration

The following are the configurable values for the Cassandra Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the Cassandra Handler, you must first configure the handler type by specifying gg.handler.name.type=cassandra and the other Cassandra properties as follows:

Table 3-1 Cassandra Handler Configuration Properties

gg.handlerlist (Required; Legal Values: any string; Default: none)
  Provides a name for the Cassandra Handler. The Cassandra Handler name then becomes part of the property names listed in this table.

gg.handler.name.type=cassandra (Required; Legal Values: cassandra; Default: none)
  Selects the Cassandra Handler for streaming change data capture into Cassandra.

gg.handler.name.mode (Optional; Legal Values: op | tx; Default: op)
  The default is recommended. In op mode, operations are processed as received. In tx mode, operations are cached and processed at transaction commit. The tx mode is slower and creates a larger memory footprint.



gg.handler.name.contactPoints (Optional; Legal Values: a comma-separated list of host names that the Cassandra Handler will connect to; Default: localhost)
  A comma-separated list of the Cassandra host machines for the driver to establish an initial connection to the Cassandra cluster. This configuration property does not need to include all the machines enlisted in the Cassandra cluster. By connecting to a single machine, the driver can learn about other machines in the Cassandra cluster and establish connections to those machines as required.

gg.handler.name.username (Optional; Legal Values: a legal username string; Default: none)
  A user name for the connection to Cassandra. Required if Cassandra is configured to require credentials.

gg.handler.name.password (Optional; Legal Values: a legal password string; Default: none)
  A password for the connection to Cassandra. Required if Cassandra is configured to require credentials.

gg.handler.name.compressedUpdates (Optional; Legal Values: true | false; Default: true)
  Sets whether the Cassandra Handler expects full image updates from the source trail file. A value of true means that updates in the source trail file only contain column data for the primary keys and for columns that changed. The Cassandra Handler executes updates as simple statements updating only the columns that changed. A value of false means that updates in the source trail file contain column data for primary keys and all columns regardless of whether the column value has changed. The Cassandra Handler is able to use prepared statements for updates, which can provide better performance for streaming data to Cassandra.

gg.handler.name.ddlHandling (Optional; Legal Values: CREATE | ADD | DROP in any combination, delimited by commas; Default: none)
  Configures the DDL functionality that the Cassandra Handler provides. Options include CREATE, ADD, and DROP. These options can be set in any combination delimited by commas. When CREATE is enabled, the Cassandra Handler creates tables in Cassandra if a corresponding table does not exist. When ADD is enabled, the Cassandra Handler adds columns that exist in the source table definition that do not exist in the corresponding Cassandra table definition. When DROP is enabled, the handler drops columns that exist in the Cassandra table definition that do not exist in the corresponding source table definition.



gg.handler.name.cassandraMode (Optional; Legal Values: async | sync; Default: async)
  Sets the interaction between the Cassandra Handler and Cassandra. Set to async for asynchronous interaction. Operations are sent to Cassandra asynchronously and then flushed at transaction commit to ensure durability. Asynchronous mode provides better performance. Set to sync for synchronous interaction. Operations are sent to Cassandra synchronously.

gg.handler.name.consistencyLevel (Optional; Legal Values: ALL | ANY | EACH_QUORUM | LOCAL_ONE | LOCAL_QUORUM | ONE | QUORUM | THREE | TWO; Default: the Cassandra default)
  Sets the consistency level for operations with Cassandra. It configures the criteria that must be met for storage on the Cassandra cluster when an operation is executed. Lower levels of consistency may provide better performance, while higher levels of consistency are safer.
  An advanced configuration property is also available so that you can override the SSL javax.net.ssl.SSLContext and cipher suites. The fully qualified class name is provided, and the class must be included in the classpath. The class must implement the com.datastax.driver.core.SSLOptions interface in the Datastax Cassandra Java driver. This configuration property is only applicable if gg.handler.name.withSSL is set to true, see http://docs.datastax.com/en/developer/java-driver/3.3/manual/ssl/.

gg.handler.name.withSSL (Optional; Legal Values: true | false; Default: false)
  Set to true to enable secured connections to the Cassandra cluster using SSL. This requires additional Java boot options configuration, see http://docs.datastax.com/en/developer/java-driver/3.3/manual/ssl/.

gg.handler.name.port (Optional; Legal Values: integer; Default: 9042)
  Set to configure the port number that the Cassandra Handler uses to connect to Cassandra server instances. You can override the default in the Cassandra YAML files.

3.3.2 Review a Sample Configuration

The following is a sample configuration for the Cassandra Handler from the Java Adapter properties file:

gg.handlerlist=cassandra


#The handler properties
gg.handler.cassandra.type=cassandra
gg.handler.cassandra.mode=op
gg.handler.cassandra.contactPoints=localhost
gg.handler.cassandra.ddlHandling=CREATE,ADD,DROP
gg.handler.cassandra.compressedUpdates=true
gg.handler.cassandra.cassandraMode=async
gg.handler.cassandra.consistencyLevel=ONE

3.3.3 Configuring Security

The Cassandra Handler connection to the Cassandra Cluster can be secured using user name and password credentials. These are set using the following configuration properties:

gg.handler.name.username gg.handler.name.password

Optionally, the connection to the Cassandra cluster can be secured using SSL. To enable SSL security set the following parameter:

gg.handler.name.withSSL=true

Additionally, the Java bootoptions must be configured to include the location and password of the keystore and the location and password of the truststore. For example:

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm -Djavax.net.ssl.trustStore=/path/to/client.truststore -Djavax.net.ssl.trustStorePassword=password123 -Djavax.net.ssl.keyStore=/path/to/client.keystore -Djavax.net.ssl.keyStorePassword=password123

3.4 About Automated DDL Handling

The Cassandra Handler performs the table check and reconciliation process the first time an operation for a source table is encountered. Additionally, a DDL event or a metadata change event causes the table definition in the Cassandra Handler to be marked as not suitable. Therefore, the next time an operation for the table is encountered, the handler repeats the table check and reconciliation process as described in this topic.

• About the Table Check and Reconciliation Process
• Capturing New Change Data

3.4.1 About the Table Check and Reconciliation Process

The Cassandra Handler first interrogates the target Cassandra database to determine whether the target Cassandra keyspace exists. If the target Cassandra keyspace does not exist, then the Cassandra Handler abends. Keyspaces must be created by the user. The log file contains an error message identifying the exact keyspace name that the Cassandra Handler is expecting. Next, the Cassandra Handler interrogates the target Cassandra database for the table definition. If the table does not exist, the Cassandra Handler either creates a table if gg.handler.name.ddlHandling includes the CREATE option or abends the process. A message is logged that shows you the table that does not exist in Cassandra. If the table exists in Cassandra, then the Cassandra Handler reconciles the table definition from the source trail file and the table definition in Cassandra. This reconciliation process searches for columns that exist in the source table definition and not in the corresponding Cassandra table definition. If it locates columns fitting these criteria and the gg.handler.name.ddlHandling property includes ADD, then the Cassandra Handler adds the columns to the target table in Cassandra. Otherwise, it ignores these columns. Next, the Cassandra Handler searches for columns that exist in the target Cassandra table but do not exist in the source table definition. If it locates columns that fit these criteria and the gg.handler.name.ddlHandling property includes DROP, then the Cassandra Handler removes these columns from the target table in Cassandra. Otherwise those columns are ignored. Finally, the prepared statements are built.

3.4.2 Capturing New Change Data

You can capture all of the new change data into your Cassandra database, including the DDL changes in the trail, for the target apply. The following are the acceptance criteria:

AC1: Support Cassandra as a bulk extract
AC2: Support Cassandra as a CDC source
AC4: All Cassandra supported data types are supported
AC5: Should be able to write into different tables based on any filter conditions, like Updates to Update tables or based on primary keys
AC7: Support Parallel processing with multiple threads
AC8: Support Filtering based on keywords
AC9: Support for Metadata provider
AC10: Support for DDL handling on sources and target
AC11: Support for target creation and updating of metadata
AC12: Support for error handling and extensive logging
AC13: Support for Conflict Detection and Resolution
AC14: Performance should be on par or better than HBase

3.5 Performance Considerations

Configuring the Cassandra Handler for async mode provides better performance than sync mode. The Replicat GROUPTRANSOPS property should be set to the default value of 1000. Setting the consistency level directly affects performance. The higher the consistency level, the more work must occur on the Cassandra cluster before the transmission of a given operation can be considered complete. Select the minimum consistency level that still satisfies the requirements of your use case. The Cassandra Handler can work in either operation (op) or transaction (tx) mode. For the best performance, operation mode is recommended:

gg.handler.name.mode=op
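Putting these recommendations together, the following is a hedged sketch of a Replicat parameter file for a Cassandra target; the Replicat name, properties file name, and MAP statement are illustrative assumptions only:

REPLICAT rcass
TARGETDB LIBFILE libggjava.so SET property=dirprm/cassandra.props
GROUPTRANSOPS 1000
MAP QASOURCE.*, TARGET QASOURCE.*;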


3.6 Additional Considerations

• The Cassandra database requires at least one primary key. The value of any primary key cannot be null. Automated table creation fails for source tables that do not have a primary key.
• When gg.handler.name.compressedUpdates=false is set, the Cassandra Handler expects full before and after images of the data for updates.

Note:

Using this property setting with a source trail file with partial image updates results in null values being updated for columns for which the data is missing. This configuration is incorrect and update operations pollute the target data with null values in columns that did not change.

• The Cassandra Handler does not process DDL from the source database, even if the source database provides DDL. Instead, it reconciles between the source table definition and the target Cassandra table definition. A DDL statement executed at the source database that changes a column name appears to the Cassandra Handler as if a column is dropped from the source table and a new column is added. This behavior depends on how the gg.handler.name.ddlHandling property is configured.

gg.handler.name.ddlHandling Configuration | Behavior

Not configured for ADD or DROP: The old column name and data are maintained in Cassandra. The new column is not created in Cassandra, so no data is replicated for the new column name from the DDL change forward.

Configured for ADD only: The old column name and data are maintained in Cassandra. The new column is created in Cassandra, and data is replicated for the new column name from the DDL change forward. There is a mismatch in which column the data is located in before and after the DDL change.

Configured for DROP only: The old column name and data are dropped in Cassandra. The new column is not created in Cassandra, so no data is replicated for the new column name.

Configured for ADD and DROP: The old column name and data are dropped in Cassandra. The new column is created in Cassandra, and data is replicated for the new column name from the DDL change forward.

3.7 Troubleshooting

This section contains information to help you troubleshoot various issues. Review the following topics for additional help:


• Java Classpath
• Logging
• Write Timeout Exception
• Logging
• Datastax Driver Error

3.7.1 Java Classpath

When the classpath does not include the required client libraries, a ClassNotFoundException appears in the log file. To troubleshoot, set the Java Adapter logging to DEBUG, and then run the process again. At the debug level, the log contains data about the JARs that were added to the classpath from the gg.classpath configuration variable. The gg.classpath variable supports the asterisk (*) wildcard character to select all JARs in a configured directory. For example, /usr/cassandra/cassandra-java-driver-3.3.1/*:/usr/cassandra/cassandra-java-driver-3.3.1/lib/*.

For more information about setting the classpath, see Setting Up and Running the Cassandra Handler and Cassandra Handler Client Dependencies. 3.7.2 Logging

The Cassandra Handler logs the state of its configuration to the Java log file. This is helpful because you can review the configuration values for the Cassandra Handler. A sample of the logging of the state of the configuration follows:

**** Begin Cassandra Handler - Configuration Summary ****
Mode of operation is set to op.
The Cassandra cluster contact point(s) is [localhost].
The handler has been configured for GoldenGate compressed updates (partial image updates).
Sending data to Cassandra in [ASYNC] mode.
The Cassandra consistency level has been set to [ONE].
Cassandra Handler DDL handling:
The handler will create tables in Cassandra if they do not exist.
The handler will add columns to Cassandra tables for columns in the source metadata that do not exist in Cassandra.
The handler will drop columns in Cassandra tables for columns that do not exist in the source metadata.
**** End Cassandra Handler - Configuration Summary ****

When running the Cassandra Handler, you may experience a com.datastax.driver.core.exceptions.WriteTimeoutException exception that causes the Replicat process to abend. It is likely to occur under some or all of the following conditions:
• The Cassandra Handler processes large numbers of operations, putting the Cassandra cluster under a significant processing load.
• GROUPTRANSOPS is configured higher than its default value of 1000.
• The Cassandra Handler is configured in asynchronous mode.
• The Cassandra Handler is configured with a consistency level higher than ONE.

When this problem occurs, the Cassandra Handler is streaming data faster than the Cassandra cluster can process it. The write latency in the Cassandra cluster finally exceeds the write request timeout period, which in turn results in the exception. The following are potential solutions:
• Increase the write request timeout period. This is controlled with the write_request_timeout_in_ms property in Cassandra, located in the cassandra.yaml file in the cassandra_install/conf directory. The default is 2000 (2 seconds). You can increase this value to move past the error, and then restart the Cassandra node or nodes for the change to take effect.
• Decrease the GROUPTRANSOPS configuration value of the Replicat process. Typically, decreasing the GROUPTRANSOPS configuration decreases the size of transactions processed and reduces the likelihood that the Cassandra Handler can overtax the Cassandra cluster.
• Reduce the consistency level of the Cassandra Handler. This in turn reduces the amount of work the Cassandra cluster has to complete for an operation to be considered as written.
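The following sketch shows where these two settings live. The values shown are illustrative assumptions, not recommendations:

# cassandra.yaml (restart the Cassandra node or nodes after changing)
write_request_timeout_in_ms: 5000

-- Replicat parameter file: reduce transaction grouping from the 1000 default
GROUPTRANSOPS 500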

3.7.4 Logging

The java.lang.NoClassDefFoundError: io/netty/util/Timer error can occur with both the 3.3 and 3.2 versions of the downloaded Datastax Java Driver. This is because the netty-common JAR file is inadvertently missing from the Datastax driver tar file. You must manually obtain the netty-common JAR file of the same netty version, and then add it to the classpath.

3.7.5 Datastax Driver Error

If you did not add the cassandra-driver-core-3.3.1.jar file to the gg.classpath property, then this exception can occur:

com.datastax.driver.core.exceptions.UnresolvedUserTypeException: Cannot resolve user type keyspace.duration

If there are tables with a duration data type column, this exception occurs. Adding the Cassandra driver, cassandra-driver-core-3.3.1.jar, to the gg.classpath property resolves the error. See Setting Up and Running the Cassandra Handler.
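A minimal sketch of a gg.classpath that explicitly includes the driver core JAR follows; the installation path is an assumption:

# include the driver core JAR and its lib directory so the duration type can be resolved
gg.classpath=/usr/cassandra/cassandra-java-driver-3.3.1/cassandra-driver-core-3.3.1.jar:/usr/cassandra/cassandra-java-driver-3.3.1/lib/*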

4 Using the Elasticsearch Handler

The Elasticsearch Handler allows you to store, search, and analyze large volumes of data quickly and in near real time. This chapter describes how to use the Elasticsearch handler.
• Overview
• Detailing the Functionality
• Setting Up and Running the Elasticsearch Handler
• Performance Consideration
• About the Shield Plug-In Support
• About DDL Handling
• Troubleshooting
• Logging
• Known Issues in the Elasticsearch Handler

4.1 Overview

Elasticsearch is a highly scalable open-source full-text search and analytics engine. Elasticsearch allows you to store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine or technology that drives applications with complex search features. The Elasticsearch Handler uses the Elasticsearch Java client to connect to and ingest data into an Elasticsearch node, see https://www.elastic.co.

4.2 Detailing the Functionality

This topic details the Elasticsearch Handler functionality.
• About the Elasticsearch Version Property
• About the Index and Type
• About the Document
• About the Primary Key Update
• About the Data Types
• Operation Mode
• Operation Processing Support
• About the Connection


4.2.1 About the Elasticsearch Version Property

The Elasticsearch Handler supports two different clients to communicate with the Elasticsearch cluster: The Elasticsearch transport client and the Elasticsearch High Level REST client.

1. Set the gg.handler.name.version configuration value to 5.x, 6.x, or 7.x to connect to the Elasticsearch cluster using the transport client for the respective version.
2. Set the gg.handler.name.version configuration value to REST7.0 to connect to the Elasticsearch cluster using the Elasticsearch High Level REST client. The REST client supports Elasticsearch versions 7.x.
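For illustration, a minimal sketch of the two alternatives in the Java Adapter properties file; the handler name (elasticsearch) is an assumption:

# transport client against a 7.x cluster
gg.handler.elasticsearch.version=7.x

# or: Elasticsearch High Level REST client
#gg.handler.elasticsearch.version=REST7.0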

4.2.2 About the Index and Type

An Elasticsearch index is a collection of documents with similar characteristics. An index can only be created in lowercase. An Elasticsearch type is a logical group within an index. All the documents within an index or type should have the same number and type of fields.

The Elasticsearch Handler maps the source trail schema name concatenated with the source trail table name to construct the index. For three-part table names in the source trail, the index is constructed by concatenating the source catalog, schema, and table name. The Elasticsearch Handler maps the source table name to the Elasticsearch type. The type name is case-sensitive.

Table 4-1 Elasticsearch Mapping

Source Trail: schema.tablename
Elasticsearch Index: schema_tablename
Elasticsearch Type: tablename

Source Trail: catalog.schema.tablename
Elasticsearch Index: catalog_schema_tablename
Elasticsearch Type: tablename

If an index does not already exist in the Elasticsearch cluster, a new index is created when the Elasticsearch Handler receives data (an INSERT or UPDATE operation in the source trail).

4.2.3 About the Document

An Elasticsearch document is a basic unit of information that can be indexed. Within an index or type, you can store as many documents as you want. Each document has a unique identifier based on the _id field.

The Elasticsearch Handler maps the source trail primary key column value as the document identifier.

4.2.4 About the Primary Key Update

The Elasticsearch document identifier is created based on the source table's primary key column value. The document identifier cannot be modified. The Elasticsearch Handler processes a source primary key's update operation by performing a DELETE followed by an INSERT. While performing the INSERT, there is a possibility that the new document may contain fewer fields than required. For the INSERT operation to contain all the fields in the source table, enable the Extract to capture the full before image data for update operations, or use GETBEFORECOLS to write the required columns' before images.
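As a sketch, in the source Extract parameter file this might look like the following; the table name is an illustrative assumption:

-- capture before images of all columns for updates on the mapped table (sketch)
TABLE HR.EMPLOYEES, GETBEFORECOLS (ON UPDATE ALL);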

4.2.5 About the Data Types

Elasticsearch supports the following data types:
• 32-bit integer
• 64-bit integer
• Double
• Date
• String
• Binary

4.2.6 Operation Mode

The Elasticsearch Handler uses the operation mode for better performance. The gg.handler.name.mode property is not used by the handler.

4.2.7 Operation Processing Support

The Elasticsearch Handler maps the source table name to the Elasticsearch type. The type name is case-sensitive. For three-part table names in source trail, the index is constructed by concatenating source catalog, schema, and table name.

INSERT The Elasticsearch Handler creates a new index if the index does not exist, and then inserts a new document.

UPDATE If an Elasticsearch index or document exists, the document is updated. If an Elasticsearch index or document does not exist, a new index is created and the column values in the UPDATE operation are inserted as a new document.

DELETE If an Elasticsearch index or document exists, the document is deleted. If Elasticsearch index or document does not exist, a new index is created with zero fields.

The TRUNCATE operation is not supported.

4.2.8 About the Connection

A cluster is a collection of one or more nodes (servers) that holds the entire data. It provides federated indexing and search capabilities across all nodes. A node is a single server that is part of the cluster, stores the data, and participates in the cluster’s indexing and searching.


The Elasticsearch Handler property gg.handler.name.ServerAddressList can be set to point to the nodes available in the cluster.
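For example, a sketch listing two transport client contact points; the host names and the handler name are assumptions:

gg.handler.elasticsearch.ServerAddressList=esnode1:9300,esnode2:9300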

4.3 Setting Up and Running the Elasticsearch Handler

You must ensure that the Elasticsearch cluster is set up correctly and that the cluster is up and running, see https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html. Alternatively, you can use Kibana to verify the setup.

Set the Classpath

The property gg.classpath must include all the JAR files required by the Java transport client. For a listing of the required client JAR files by version, see Elasticsearch Handler Transport Client Dependencies. For a listing of the required client JAR files for the Elasticsearch High Level REST client, see Elasticsearch High Level REST Client Dependencies.

Default location of 5.X JARs:

Elasticsearch_Home/lib/*
Elasticsearch_Home/plugins/x-pack/*
Elasticsearch_Home/modules/transport-netty3/*
Elasticsearch_Home/modules/transport-netty4/*
Elasticsearch_Home/modules/reindex/*

The path can include the asterisk (*) wildcard character in order to include all of the JAR files in that directory in the associated classpath. Do not use *.jar.

The following is an example of the correctly configured classpath:

gg.classpath=Elasticsearch_Home/lib/*

• Configuring the Elasticsearch Handler
• About the Transport Client Settings Properties File

4.3.1 Configuring the Elasticsearch Handler

The following are the configurable values for the Elasticsearch handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the Elasticsearch Handler, you must first configure the handler type by specifying gg.handler.name.type=elasticsearch and the other Elasticsearch properties as follows:

Table 4-2 Elasticsearch Handler Configuration Properties

gg.handlerlist
Required. Legal values: Name (any name of your choice). Default: None.
The list of handlers to be used.

gg.handler.name.type
Required. Legal values: elasticsearch. Default: None.
Type of handler to use. For example, Elasticsearch, Kafka, Flume, or HDFS.

gg.handler.name.ServerAddressList
Optional. Legal values: Server:Port[, Server:Port]. Default: localhost:9300.
Comma-separated list of contact points of the nodes to connect to the Elasticsearch cluster.

gg.handler.name.clientSettingsFile
Required. Legal values: Transport client properties file. Default: None.
The filename in the classpath that holds the Elasticsearch transport client properties used by the Elasticsearch Handler.

gg.handler.name.version
Optional. Legal values: 5.x | 6.x | 7.x | REST7.x. Default: 7.x.
The legal values 5.x, 6.x, and 7.x indicate using the Elasticsearch transport client to communicate with the Elasticsearch cluster. REST indicates using the Elasticsearch High Level REST client to communicate with the Elasticsearch cluster.

gg.handler.name.bulkWrite
Optional. Legal values: true | false. Default: false.
When this property is true, the Elasticsearch Handler uses the bulk write API to ingest data into the Elasticsearch cluster. The batch size of bulk write can be controlled using the MAXTRANSOPS Replicat parameter.

gg.handler.name.numberAsString
Optional. Legal values: true | false. Default: false.
When this property is true, the Elasticsearch Handler receives all the number column values (Long, Integer, or Double) in the source trail as strings into the Elasticsearch cluster.

gg.handler.name.routingKeyMappingTemplate
Optional. Legal values: A string made up of constant values and templating keywords so that a value for the routing key can be resolved at runtime. Default: None.
Set a template to dynamically resolve the routing key at runtime to control the shard in Elasticsearch to which the message is sent. The default is to use the id that is used by Elasticsearch as the routing key.

gg.handler.elasticsearch.upsert
Optional. Legal values: true | false. Default: true.
When this property is true, a new document is inserted if the document does not already exist when performing an UPDATE operation.

gg.handler.elasticsearch.routingTemplate
Optional. Legal values: ${columnValue[table1=column1,table2=column2,...]}. Default: None.
N/A

gg.handler.name.authType
Optional. Legal values: none | basic | ssl. Default: None.
Controls the authentication type for the Elasticsearch REST client.
none - No authentication.
basic - Client authentication using username and password without message encryption.
ssl - Mutual authentication. The client authenticates the server using a truststore. The server authenticates the client using username and password. Messages are encrypted.

gg.handler.name.basicAuthUsername
Optional. Legal values: A valid username. Default: None.
The username for the server to authenticate the Elasticsearch REST client. Must be provided for auth types basic and ssl.

gg.handler.name.basicAuthPassword
Optional. Legal values: A valid password. Default: None.
The password for the server to authenticate the Elasticsearch REST client. Must be provided for auth types basic and ssl.

gg.handler.name.trustStore
Optional. Legal values: The path and name of the truststore file. Default: None.
The truststore for the Elasticsearch client to validate the certificate received from the Elasticsearch server. Must be provided if the auth type is set to ssl. Valid only for the Elasticsearch REST client.

gg.handler.name.trustStorePassword
Optional. Legal values: The password to access the truststore. Default: None.
The password for the truststore for the Elasticsearch REST client to validate the certificate received from the Elasticsearch server. Must be provided if the auth type is set to ssl.

gg.handler.name.maxConnectTimeout
Optional. Legal values: Positive integer. Default: The default value of the Apache HTTP Components framework.
Sets the maximum wait period for a connection to be established from the Elasticsearch REST client to the Elasticsearch server. Valid only for the Elasticsearch REST client.

gg.handler.name.maxSocketTimeout
Optional. Legal values: Positive integer. Default: The default value of the Apache HTTP Components framework.
Sets the maximum wait period in milliseconds to wait for a response from the service after issuing a request. May need to be increased when pushing large data volumes. Valid only for the Elasticsearch REST client.

gg.handler.name.proxyUsername
Optional. Legal values: The proxy server username. Default: None.
If the connectivity to Elasticsearch uses the REST client and routes through a proxy server, then this property sets the username of your proxy server. Most proxy servers do not require credentials.

gg.handler.name.proxyPassword
Optional. Legal values: The proxy server password. Default: None.
If the connectivity to Elasticsearch uses the REST client and routes through a proxy server, then this property sets the password of your proxy server. Most proxy servers do not require credentials.

gg.handler.name.proxyProtocol
Optional. Legal values: http | https. Default: http.
If the connectivity to Elasticsearch uses the REST client and routes through a proxy server, then this property sets the protocol of your proxy server.

gg.handler.name.proxyPort
Optional. Legal values: The port number of your proxy server. Default: None.
If the connectivity to Elasticsearch uses the REST client and routes through a proxy server, then this property sets the port number of your proxy server.

gg.handler.name.proxyServer
Optional. Legal values: The host name of your proxy server. Default: None.
If the connectivity to Elasticsearch uses the REST client and routes through a proxy server, then this property sets the host name of your proxy server.

Example 4-1 Sample Handler Properties file:

For 5.x Elasticsearch cluster:

gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.handler.elasticsearch.version=5.x
gg.classpath=/path/to/elastic/lib/*:/path/to/elastic/modules/transport-netty4/*:/path/to/elastic/modules/reindex/*

For 5.x Elasticsearch cluster with x-pack:

gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.handler.elasticsearch.version=5.x
gg.classpath=/path/to/elastic/lib/*:/path/to/elastic/plugins/x-pack/*:/path/to/elastic/modules/transport-netty4/*:/path/to/elastic/modules/reindex/*

Sample Replicat configuration and Java Adapter properties files can be found in the following directory:

GoldenGate_install_directory/AdapterExamples/big-data/elasticsearch

For the Elasticsearch REST handler:

gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.version=rest7.x
gg.classpath=/path/to/elasticsearch/lib/*:/path/to/elasticsearch/modules/reindex/*:/path/to/elasticsearch/modules/lang-mustache/*:/path/to/elasticsearch/modules/rank-eval/*


4.3.2 About the Transport Client Settings Properties File

The Elasticsearch Handler uses a Java Transport client to interact with the Elasticsearch cluster. The Elasticsearch cluster may have additional plug-ins like Shield or X-Pack, which may require additional configuration.

The gg.handler.name.clientSettingsFile property should point to a file that has additional client settings based on the version of the Elasticsearch cluster. The Elasticsearch Handler attempts to locate and load the client settings file using the Java classpath. The Java classpath must include the directory containing the properties file.

The client properties file for Elasticsearch (without any plug-in) is:

cluster.name=Elasticsearch_cluster_name

The Shield plug-in also supports additional capabilities like SSL and IP filtering. The properties can be set in the client.properties file, see https://www.elastic.co/guide/en/shield/current/_using_elasticsearch_java_clients_with_shield.html.

The client.properties file for Elasticsearch 5.x with the X-Pack plug-in is:

cluster.name=Elasticsearch_cluster_name
xpack.security.user=x-pack_username:x-pack-password

The X-Pack plug-in also supports additional capabilities. The properties can be set in the client.properties file, see https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.1/transport-client.html and https://www.elastic.co/guide/en/x-pack/current/java-clients.html.
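A minimal sketch of wiring the settings file into the handler follows. The directory /etc/ogg/es is an assumption; whatever directory holds client.properties must appear on the classpath so the file can be located:

# client.properties must be resolvable from the Java classpath
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.classpath=/etc/ogg/es:/path/to/elastic/lib/*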

4.4 Performance Consideration

The Elasticsearch Handler gg.handler.name.bulkWrite property is used to determine whether the source trail records should be pushed to the Elasticsearch cluster one at a time or in bulk using the bulk write API. When this property is true, the source trail operations are pushed to the Elasticsearch cluster in batches whose size can be controlled by the MAXTRANSOPS parameter in the generic Replicat parameter file. Using the bulk write API provides better performance.

Elasticsearch uses different thread pools to improve how memory consumption of threads is managed within a node. Many of these pools also have queues associated with them, which allow pending requests to be held instead of discarded. For bulk operations, the default queue size is 50 (in version 5.2) and 200 (in version 5.3). To avoid bulk API errors, you must set the Replicat MAXTRANSOPS size to match the bulk thread pool queue size at a minimum. The thread_pool.bulk.queue_size configuration property can be modified in the elasticsearch.yml file.
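As a sketch, the two settings that must be kept in agreement; the values shown are illustrative assumptions:

-- Replicat parameter file: cap the batch size used for bulk writes
MAXTRANSOPS 200

# elasticsearch.yml on each node (requires a node restart)
thread_pool.bulk.queue_size: 200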

4.5 About the Shield Plug-In Support

Elasticsearch version 5.x supports the Shield plug-in, which provides basic authentication, SSL, and IP filtering. Similar capabilities exist in the X-Pack plug-in for Elasticsearch 6.x and 7.x. The additional transport client settings can be configured in the Elasticsearch Handler using the gg.handler.name.clientSettingsFile property.

4.6 About DDL Handling

The Elasticsearch Handler does not react to any DDL records in the source trail. Any data manipulation records for a new source table result in the auto-creation of an index or type in the Elasticsearch cluster.

4.7 Troubleshooting

This section contains information to help you troubleshoot various issues.
• Incorrect Java Classpath
• Elasticsearch Version Mismatch
• Transport Client Properties File Not Found
• Cluster Connection Problem
• Unsupported Truncate Operation
• Bulk Execute Errors

4.7.1 Incorrect Java Classpath

The most common initial error is an incorrect classpath that does not include all the required client libraries, which results in a ClassNotFoundException in the log4j log file.

Also, it may be due to an error resolving the classpath if there is a typographic error in the gg.classpath variable.

The Elasticsearch transport client libraries do not ship with the Oracle GoldenGate for Big Data product. You should properly configure the gg.classpath property in the Java Adapter properties file to correctly resolve the client libraries, see Setting Up and Running the Elasticsearch Handler.

4.7.2 Elasticsearch Version Mismatch

The Elasticsearch Handler gg.handler.name.version property must be set to one of the following values: 5.x, 6.x, 7.x, or REST to match the major version number of the Elasticsearch cluster. For example, gg.handler.name.version=7.x.

The following errors may occur when there is a wrong version configuration:

Error: NoNodeAvailableException[None of the configured nodes are available:]

ERROR 2017-01-30 22:35:07,240 [main] Unable to establish connection. Check handler properties and client settings configuration.

java.lang.IllegalArgumentException: unknown setting [shield.user]

Ensure that all required plug-ins are installed and review documentation changes for any removed settings.


4.7.3 Transport Client Properties File Not Found

To resolve this exception:

ERROR 2017-01-30 22:33:10,058 [main] Unable to establish connection. Check handler properties and client settings configuration.

Verify that the gg.handler.name.clientSettingsFile configuration property is correctly setting the Elasticsearch transport client settings file name. Verify that the gg.classpath variable includes the path to the correct file name and that the path to the properties file does not contain an asterisk (*) wildcard at the end.

4.7.4 Cluster Connection Problem

This error occurs when the Elasticsearch Handler is unable to connect to the Elasticsearch cluster:

Error: NoNodeAvailableException[None of the configured nodes are available:]

Use the following steps to debug the issue:

1. Ensure that the Elasticsearch server process is running.
2. Validate the cluster.name property in the client properties configuration file.
3. Validate the authentication credentials for the X-Pack or Shield plug-in in the client properties file.
4. Validate the gg.handler.name.ServerAddressList handler property.

4.7.5 Unsupported Truncate Operation

The following error occurs when the Elasticsearch Handler finds a TRUNCATE operation in the source trail:

oracle.goldengate.util.GGException: Elasticsearch Handler does not support the operation: TRUNCATE

This exception error message is written to the handler log file before the Replicat process abends. Removing the GETTRUNCATES parameter from the Replicat parameter file resolves this error.

4.7.6 Bulk Execute Errors

""

DEBUG [main] (ElasticSearch5DOTX.java:130) - Bulk execute status: failures:[true] buildFailureMessage:[failure in bulk execution:
[0]: index [cs2cat_s1sch_n1tab], type [N1TAB], id [83], message [RemoteTransportException[[UOvac8l][127.0.0.1:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$7@43eddfb2 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5ef5f412[Running, pool size = 4, active threads = 4, queued tasks = 50, completed tasks = 84]]];]

This error may be due to Elasticsearch running out of resources to process the operation. You can limit the Replicat batch size using MAXTRANSOPS to match the value of the thread_pool.bulk.queue_size Elasticsearch configuration parameter.

Note:

Changes to the Elasticsearch parameter, thread_pool.bulk.queue_size, are effective only after the Elasticsearch node is restarted.

4.8 Logging

The following log messages appear in the handler log file on successful connection.

Connection to a 5.x Elasticsearch cluster:

INFO [main] (Elasticsearch5DOTX.java:38) - **BEGIN Elasticsearch client settings**
INFO [main] (Elasticsearch5DOTX.java:39) - {xpack.security.user=user1:user1_kibana, cluster.name=elasticsearch-user1-myhost, request.headers.X-Found-Cluster=elasticsearch-user1-myhost}
INFO [main] (Elasticsearch5DOTX.java:52) - Connecting to Server[myhost.us.example.com] Port[9300]
INFO [main] (Elasticsearch5DOTX.java:64) - Client node name: _client_
INFO [main] (Elasticsearch5DOTX.java:65) - Connected nodes: [{node-myhost}{w9N25BrOSZeGsnUsogFn1A}{bIiIultVRjm0Ze57I3KChg}{myhost}{198.51.100.1:9300}]
INFO [main] (Elasticsearch5DOTX.java:66) - Filtered nodes: []
INFO [main] (Elasticsearch5DOTX.java:68) - **END Elasticsearch client settings**

4.9 Known Issues in the Elasticsearch Handler

Elasticsearch: Trying to input very large number

Very large numbers result in inaccurate values in the Elasticsearch document. For example, 9223372036854775807, -9223372036854775808. This is an issue with the Elasticsearch server and not a limitation of the Elasticsearch Handler. The workaround for this issue is to ingest all the number values as strings using the gg.handler.name.numberAsString=true property.

Elasticsearch: Issue with index

The Elasticsearch Handler is not able to input data into the same index if there is more than one table with similar column names and different column data types. Index names are always lowercase even though the catalog, schema, and table names in the trail may be case-sensitive.

5 Using the File Writer Handler

The File Writer Handler and associated event handlers enable you to write data files to a local system. This chapter describes how to use the File Writer Handler.
• Overview
  You can use the File Writer Handler and the event handlers to transform data.

5.1 Overview

You can use the File Writer Handler and the event handlers to transform data. The File Writer Handler supports generating data files in delimited text, XML, JSON, Avro, and Avro Object Container File formats. It is intended to fulfill an extraction, load, and transform use case. Data files are staged on your local file system. Then when writing to a data file is complete, you can use a third party application to read the file to perform additional processing. The File Writer Handler also supports the event handler framework. The event handler framework allows data files generated by the File Writer Handler to be transformed into other formats, such as Optimized Row Columnar (ORC) or Parquet. Data files can be loaded into third party applications, such as HDFS or Amazon S3. The event handler framework is extensible allowing more event handlers performing different transformations or loading to different targets to be developed. Additionally, you can develop a custom event handler for your big data environment. Oracle GoldenGate for Big Data provides two handlers to write to HDFS. Oracle recommends that you use the HDFS Handler or the File Writer Handler in the following situations:

The HDFS Event Handler is designed to stream data directly to HDFS. No post write processing is occurring in HDFS. The HDFS Event Handler does not change the contents of the file, it simply uploads the existing file to HDFS. Analytical tools are accessing data written to HDFS in real time including data in files that are open and actively being written to.

The File Writer Handler is designed to stage data to the local file system and then to load completed data files to HDFS when writing for a file is complete. Analytic tools are not accessing data written to HDFS in real time. Post write processing is occurring in HDFS to transform, reformat, merge, and move the data to a final location. You want to write data files to HDFS in ORC or Parquet format.

• Detailing the Functionality
• Configuring the File Writer Handler
• Review a Sample Configuration


5.1.1 Detailing the Functionality

• Using File Roll Events
• Automatic Directory Creation
• About the Active Write Suffix
• Maintenance of State
• Using Templated Strings

5.1.1.1 Using File Roll Events

A file roll event occurs when writing to a specific data file is completed. No more data is written to that specific data file.

Finalize Action Operation

You can configure the finalize action operation to clean up a specific data file after a successful file roll action using the finalizeaction parameter with the following options:

none Leave the data file in place (removing any active write suffix, see About the Active Write Suffix).

delete Delete the data file (such as, if the data file has been converted to another format or loaded to a third party application).

move Maintain the file name (removing any active write suffix), but move the file to the directory resolved using the movePathMappingTemplate property.

rename Maintain the current directory, but rename the data file using the fileRenameMappingTemplate property.

move-rename Rename the file using the file name generated by the fileRenameMappingTemplate property and move the file to the directory resolved using the movePathMappingTemplate property.

Typically, event handlers offer a subset of these same actions. A sample configuration of a finalize action operation follows:

gg.handlerlist=filewriter
#The File Writer Handler
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.mode=op
gg.handler.filewriter.pathMappingTemplate=./dirout/evActParamS3R
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.fileNameMappingTemplate=${fullyQualifiedTableName}_${currentTimestamp}.txt
gg.handler.filewriter.fileRollInterval=7m


gg.handler.filewriter.finalizeAction=delete
gg.handler.filewriter.inactivityRollInterval=7m

File Rolling Actions

Any of the following actions trigger a file roll event:
• A metadata change event.
• The maximum configured file size is exceeded.
• The file roll interval is exceeded (the current time minus the time of first file write is greater than the file roll interval).
• The inactivity roll interval is exceeded (the current time minus the time of last file write is greater than the inactivity roll interval).
• The File Writer Handler is configured to roll on shutdown and the Replicat process is stopped.

Operation Sequence

The file roll event triggers a sequence of operations to occur. It is important that you understand the order of the operations that occur when an individual data file is rolled:

1. The active data file is switched to inactive, the data file is flushed, and the state data file is flushed.
2. The configured event handlers are called in the sequence that you specified.
3. The finalize action is executed on all the event handlers in the reverse order in which you configured them. Any finalize action that you configured is executed.
4. The finalize action is executed on the data file and the state file. If all actions are successful, the state file is removed. Any finalize action that you configured is executed.

For example, if you configured the File Writer Handler with the Parquet Event Handler and then the S3 Event Handler, the order for a roll event is:

1. The active data file is switched to inactive, the data file is flushed, and the state data file is flushed.
2. The Parquet Event Handler is called to generate a Parquet file from the source data file.
3. The S3 Event Handler is called to load the generated Parquet file to S3.
4. The finalize action is executed on the S3 Event Handler. Any finalize action that you configured is executed.
5. The finalize action is executed on the Parquet Event Handler. Any finalize action that you configured is executed.
6. The finalize action is executed for the data file in the File Writer Handler.

5.1.1.2 Automatic Directory Creation

You do not have to configure write directories before you execute the handler. The File Writer Handler checks to see if the specified write directory exists before creating a file and recursively creates directories as needed.


5.1.1.3 About the Active Write Suffix

A common use case is using a third party application to monitor the write directory to read data files. A third party application can only read a data file when writing to that file has completed. These applications need a way to determine whether writing to a data file is active or complete. The File Writer Handler allows you to configure an active write suffix using this property:

gg.handler.name.fileWriteActiveSuffix=.tmp

The value of this property is appended to the generated file name. When writing to the file is complete, the data file is renamed and the active write suffix is removed from the file name. You can set your third party application to monitor your data file names to identify when the active write suffix is removed.
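For illustration, a sketch of the resulting file names; the table name and timestamp shown are assumptions:

# while the file is being written (active write suffix .tmp appended):
QASOURCE.TCUSTMER_2021-04-01_10-15-30.123.txt.tmp
# after writing completes and the file is renamed:
QASOURCE.TCUSTMER_2021-04-01_10-15-30.123.txt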

5.1.1.4 Maintenance of State

Previously, all Oracle GoldenGate for Big Data handlers have been stateless. These stateless handlers only maintain state in the context of the Replicat process that was running them. If the Replicat process was stopped and restarted, then all the state was lost. With a Replicat restart, the handler began writing with no contextual knowledge of the previous run.

The File Writer Handler provides the ability to maintain state between invocations of the Replicat process. By default, with a restart:
• the saved state files are read,
• the state is restored,
• and appending to active data files continues where the previous run stopped.

You can change this default action to require all files to be rolled on shutdown by setting this property:

gg.handler.name.rollOnShutdown=true

5.1.1.5 Using Templated Strings

Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The File Writer Handler makes extensive use of templated strings to generate directory names and data file names. These strings give you the flexibility to select where to write data files and the names of those data files. You should exercise caution when choosing file and directory names to avoid file naming collisions that can result in an abend.

Supported Templated Strings

${fullyQualifiedTableName} - The fully qualified source table name delimited by a period (.). For example, MYCATALOG.MYSCHEMA.MYTABLE.
${catalogName} - The individual source catalog name. For example, MYCATALOG.
${schemaName} - The individual source schema name. For example, MYSCHEMA.
${tableName} - The individual source table name. For example, MYTABLE.
${groupName} - The name of the Replicat process (with the thread number appended if you're using coordinated apply).
${emptyString} - Evaluates to an empty string. For example, "".
${operationCount} - The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "1024".
${insertCount} - The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "125".
${updateCount} - The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "265".
${deleteCount} - The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "11".
${truncateCount} - The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "5".
${currentTimestamp} - The current timestamp. The default output format for the date time is yyyy-MM-dd_HH-mm-ss.SSS. For example, 2017-07-05_04-31-23.123. Alternatively, you can customize the format of the current timestamp by inserting the format inside square brackets, like ${currentTimestamp[MM-dd_HH]}. This format uses the syntax defined in the Java SimpleDateFormat class, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
${toUpperCase[]} - Converts the contents inside the square brackets to uppercase. For example, ${toUpperCase[${fullyQualifiedTableName}]}.
${toLowerCase[]} - Converts the contents inside the square brackets to lowercase. For example, ${toLowerCase[${fullyQualifiedTableName}]}.


Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.

Path Configuration Example
/usr/local/${fullyQualifiedTableName}

Data File Configuration Example
${fullyQualifiedTableName}_${currentTimestamp}_${groupName}.txt

Requirements

The directory and file names generated using the templates must be legal on the system being written to. File names must be unique to avoid a file name collision. You can avoid a collision by adding a current timestamp using the ${currentTimestamp} keyword. If you are using coordinated apply, then adding ${groupName} into the data file name is recommended.
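As a sketch of how a template might resolve at runtime; the table name, timestamp, and group name shown are assumptions:

# template
${toLowerCase[${fullyQualifiedTableName}]}_${currentTimestamp}_${groupName}.txt
# possible resolved file name
mycatalog.myschema.mytable_2021-04-01_10-15-30.123_FW001.txt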

5.1.2 Configuring the File Writer Handler

Lists the configurable values for the File Writer Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the File Writer Handler, you must first configure the handler type by specifying gg.handler.name.type=filewriter and the other File Writer properties as follows:

Table 5-1 File Writer Handler Configuration Properties

gg.handler.name.type
Required. Legal values: filewriter. Default: None.
Selects the File Writer Handler for use.

gg.handler.name.maxFileSize
Optional. Legal values: The default unit of measure is bytes. You can stipulate k, m, or g to signify kilobytes, megabytes, or gigabytes respectively. Examples of legal values include 10000, 10k, 100m, 1.1g. Default: 1g.
Sets the maximum file size of files generated by the File Writer Handler. When the file size is exceeded, a roll event is triggered.

gg.handler.name.fileRollInterval
Optional. Legal values: The default unit of measure is milliseconds. You can stipulate ms, s, m, h to signify milliseconds, seconds, minutes, or hours respectively. Examples of legal values include 10000, 10000ms, 10s, 10m, or 1.5h. Values of 0 or less indicate that file rolling on time is turned off. Default: File rolling on time is off.
The timer starts when a file is created. If the file is still open when the interval elapses, then a file roll event is triggered.

gg.handler.name.inactivityRollInterval
Optional. Legal values: The default unit of measure is milliseconds. You can stipulate ms, s, m, h to signify milliseconds, seconds, minutes, or hours respectively. Examples of legal values include 10000, 10000ms, 10s, 10m, or 1.5h. Values of 0 or less indicate that file rolling on time is turned off. Default: Inactivity rolling is turned off.
The timer starts from the latest write to a generated file. New writes to a generated file restart the counter. If the file is still open when the timer elapses, a roll event is triggered.

gg.handler.name.fileNameMappingTemplate
Required. Legal values: A string with resolvable keywords and constants used to dynamically generate File Writer Handler data file names at runtime. Default: None.
Use keywords interlaced with constants to dynamically generate unique file names at runtime. Typically, file names follow the format ${fullyQualifiedTableName}_${groupName}_${currentTimestamp}{.txt}.

gg.handler.name.pathMappingTemplate
Required. Legal values: A string with resolvable keywords and constants used to dynamically generate the directory to which a file is written. Default: None.
Use keywords interlaced with constants to dynamically generate unique path names at runtime. Typically, path names follow the format ${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt.

gg.handler.name.fileWriteActiveSuffix
Optional. Legal values: A string. Default: None.
An optional suffix that is appended to files generated by the File Writer Handler to indicate that writing to the file is active. At the finalize action, the suffix is removed.

gg.handler.name.stateFileDirectory
Required. Legal values: A directory on the local machine to store the state files of the File Writer Handler. Default: None.
Sets the directory on the local machine to store the state files of the File Writer Handler. The group name is appended to the directory to ensure that the functionality works when operating in a coordinated apply environment.

gg.handler.name.rollOnShutdown
Optional. Legal values: true | false. Default: false.
Set to true so that, on normal shutdown of the Replicat process, all open files are closed and a file roll event is triggered. If successful, the File Writer Handler has no state to carry over to a restart of the File Writer Handler.

gg.handler.name.finalizeAction
Optional. Legal values: none | delete | move | rename | move-rename. Default: none.
Indicates what the File Writer Handler should do at the finalize action.
none - Leave the data file in place (removing any active write suffix, see About the Active Write Suffix).
delete - Delete the data file (such as, if the data file has been converted to another format or loaded to a third party application).
move - Maintain the file name (removing any active write suffix), but move the file to the directory resolved using the movePathMappingTemplate property.
rename - Maintain the current directory, but rename the data file using the fileRenameMappingTemplate property.
move-rename - Rename the file using the file name generated by the fileRenameMappingTemplate property and move the file to the directory resolved using the movePathMappingTemplate property.

gg.handler.name.partitionByTable
Optional. Legal values: true | false. Default: true.
Set to true so that data from different source tables is partitioned into separate files. Set to false to interlace operation data from all source tables into a single output file. It cannot be set to false if the file format is the Avro OCF (Object Container File) format.

gg.handler.name.eventHandler
Optional. Legal values: HDFS | ORC | PARQUET | S3. Default: No event handler configured.
A unique string identifier cross-referencing an event handler. The event handler is invoked on the file roll event. Event handlers can perform file roll event actions like loading files to S3, converting to Parquet or ORC format, or loading files to HDFS.

gg.handler.name.fileRenameMappingTemplate
Required if gg.handler.name.finalizeAction is set to rename or move-rename. Legal values: A string with resolvable keywords and constants used to dynamically generate File Writer Handler data file names for file renaming in the finalize action. Default: None.
Use keywords interlaced with constants to dynamically generate unique file names at runtime. Typically, file names follow the format ${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt.

gg.handler.name.movePathMappingTemplate
Required if gg.handler.name.finalizeAction is set to rename or move-rename. Legal values: A string with resolvable keywords and constants used to dynamically generate the directory to which a file is written. Default: None.
Use keywords interlaced with constants to dynamically generate unique path names at runtime. Typically, path names follow the format /ogg/data/${groupName}/${fullyQualifiedTableName}.

gg.handler.name.format
Required. Legal values: delimitedtext | json | json_row | avro_row | avro_op | avro_row_ocf | avro_op_ocf | xml. Default: delimitedtext.
Selects the formatter for the HDFS Handler for how output data will be formatted.
delimitedtext - Delimited text.
json - JSON.
json_row - JSON output modeling row data.
xml - XML.
avro_row - Avro in row compact format.
avro_op - Avro in operation more verbose format.
avro_row_ocf - Avro in the row compact format written into HDFS in the Avro Object Container File (OCF) format.
avro_op_ocf - Avro in the more verbose format written into HDFS in the Avro OCF format.
If you want to use the Parquet or ORC Event Handlers, then the selected format must be avro_row_ocf or avro_op_ocf.

gg.handler.name.bom
Optional. Legal values: An even number of hex characters. Default: None.
Enter an even number of hex characters where every two characters correspond to a single byte in the byte order mark (BOM). For example, the string efbbbf represents the 3-byte BOM for UTF-8.

gg.handler.name.createControlFile
Optional. Legal values: true | false. Default: false.
Set to true to create a control file. A control file contains all of the completed file names, including the path, separated by a delimiter. The name of the control file is {groupName}.control. For example, if the Replicat process name is fw, then the control file name is FW.control.

gg.handler.name.controlFileDelimiter
Optional. Legal values: Any string. Default: new line (\n).
Allows you to control the delimiter separating file names in the control file. You can use CDATA[] wrapping with this property.

gg.handler.name.controlFileDirectory
Optional. Legal values: A path to a directory to hold the control file. Default: A period (.), or the Oracle GoldenGate installation directory.
Set to specify where you want to write the control file.

gg.handler.name.createOwnerFile
Optional. Legal values: true | false. Default: false.
Set to true to create an owner file. The owner file is created when the Replicat process starts and is removed when it terminates normally. The owner file allows other applications to determine if the process is running. The owner file remains in place when the Replicat process ends abnormally. The name of the owner file is {groupName}.owner. For example, if the Replicat process name is fw, then the owner file name is FW.owner. The file is created in the . directory, or the Oracle GoldenGate installation directory.

gg.handler.name.atTime
Optional. Legal values: One or more times to trigger a roll action of all open files. Default: None.
Configure one or more trigger times in the following format: HH:MM,HH:MM,HH:MM. Entries are based on a 24 hour clock. For example, an entry to configure roll actions at three discrete times of day is: gg.handler.fw.atTime=03:30,21:00,23:51

gg.handler.name.avroCodec
Optional. Legal values: null | bzip2 | deflate | snappy | xz. Default: null (no compression).
Enables the corresponding compression algorithm for generated Avro OCF files. The corresponding compression library must be added to the gg.classpath when compression is enabled.

gg.handler.name.bufferSize
Optional. Legal values: Positive integer >= 512. Default: 1024.
Sets the size of the BufferedOutputStream for each active write stream. Setting a larger value may improve performance, especially when there are a few active write streams but a large number of operations are being written to those streams. If there are a large number of active write streams, increasing the value with this property is likely undesirable and could result in an out of memory exception by exhausting the Java heap.

5.1.3 Review a Sample Configuration

This File Writer Handler configuration example uses the Parquet Event Handler to convert data files to Parquet, and then the S3 Event Handler to load the Parquet files into S3:

gg.handlerlist=filewriter

#The handler properties
gg.handler.name.type=filewriter
gg.handler.name.mode=op
gg.handler.name.pathMappingTemplate=./dirout
gg.handler.name.stateFileDirectory=./dirsta
gg.handler.name.fileNameMappingTemplate=${fullyQualifiedTableName}_${currentTimestamp}.txt
gg.handler.name.fileRollInterval=7m
gg.handler.name.finalizeAction=delete
gg.handler.name.inactivityRollInterval=7m
gg.handler.name.format=avro_row_ocf
gg.handler.name.includetokens=true
gg.handler.name.partitionByTable=true
gg.handler.name.eventHandler=parquet
gg.handler.name.rollOnShutdown=true

gg.eventhandler.parquet.type=parquet
gg.eventhandler.parquet.pathMappingTemplate=./dirparquet
gg.eventhandler.parquet.writeToHDFS=false
gg.eventhandler.parquet.finalizeAction=delete
gg.eventhandler.parquet.eventHandler=s3
gg.eventhandler.parquet.fileNameMappingTemplate=${tableName}_${currentTimestamp}.parquet

gg.handler.filewriter.eventHandler=s3
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
gg.eventhandler.s3.proxyServer=www-proxy.us.oracle.com
gg.eventhandler.s3.proxyPort=80
gg.eventhandler.s3.bucketMappingTemplate=tomsfunbucket
gg.eventhandler.s3.pathMappingTemplate=thepath


gg.eventhandler.s3.finalizeAction=none
goldengate.userexit.writers=javawriter

6 Using the HDFS Event Handler

The HDFS Event Handler is used to load files generated by the File Writer Handler into HDFS. This topic describes how to use the HDFS Event Handler. See Using the File Writer Handler.
• Detailing the Functionality

6.1 Detailing the Functionality

• Configuring the Handler
• Configuring the HDFS Event Handler
• Using Templated Strings

6.1.1 Configuring the Handler

The HDFS Event Handler can upload data files to HDFS. These additional configuration steps are required:

The HDFS Event Handler dependencies and considerations are the same as the HDFS Handler, see HDFS Additional Considerations. Ensure that gg.classpath includes the HDFS client libraries.

Ensure that the directory containing the HDFS core-site.xml file is in gg.classpath. This is so the core-site.xml file can be read at runtime and the connectivity information to HDFS can be resolved. For example:

gg.classpath=/{HDFSinstallDirectory}/etc/hadoop

If Kerberos authentication is enabled on the HDFS cluster, you have to configure the Kerberos principal and the location of the keytab file so that the password can be resolved at runtime:

gg.eventHandler.name.kerberosPrincipal=principal
gg.eventHandler.name.kerberosKeytabFile=pathToTheKeytabFile
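For example, a minimal sketch for an event handler named hdfs; the principal, keytab path, and Hadoop configuration directory are assumptions for your environment:

# the directory containing core-site.xml must be on the classpath along with the HDFS client libraries
gg.classpath=/u01/hadoop/etc/hadoop:/u01/hadoop/share/hadoop/common/*
gg.eventHandler.hdfs.kerberosPrincipal=ggadmin@EXAMPLE.COM
gg.eventHandler.hdfs.kerberosKeytabFile=/u01/ogg/dirprm/ggadmin.keytab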

6.1.2 Configuring the HDFS Event Handler

You configure the HDFS Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the HDFS Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=hdfs and the other HDFS Event Handler properties as follows:


Table 6-1 HDFS Event Handler Configuration Properties

gg.eventhandler.name.type
Required. Legal values: hdfs. Default: None.
Selects the HDFS Event Handler for use.

gg.eventhandler.name.pathMappingTemplate
Required. Legal values: A string with resolvable keywords and constants used to dynamically generate the path in HDFS to write data files. Default: None.
Use keywords interlaced with constants to dynamically generate unique path names at runtime. Path names typically follow the format /ogg/data/${groupName}/${fullyQualifiedTableName}.

gg.eventhandler.name.fileNameMappingTemplate
Optional. Legal values: A string with resolvable keywords and constants used to dynamically generate the HDFS file name at runtime. Default: None.
Use keywords interlaced with constants to dynamically generate unique file names at runtime. If not set, the upstream file name is used.

gg.eventhandler.name.finalizeAction
Optional. Legal values: none | delete. Default: none.
Indicates what the File Writer Handler should do at the finalize action.
none - Leave the data file in place (removing any active write suffix, see About the Active Write Suffix).
delete - Delete the data file (such as, if the data file has been converted to another format or loaded to a third party application).

gg.eventhandler.name.kerberosPrincipal
Optional. Legal values: The Kerberos principal name. Default: None.
Set to the Kerberos principal when HDFS Kerberos authentication is enabled.

gg.eventhandler.name.kerberosKeytabFile
Optional. Legal values: The path to the Kerberos keytab file. Default: None.
Set to the path to the Kerberos keytab file when HDFS Kerberos authentication is enabled.

gg.eventhandler.name.eventHandler
Optional. Legal values: A unique string identifier cross-referencing a child event handler. Default: No event handler configured.
A unique string identifier cross-referencing an event handler. The event handler is invoked on the file roll event. Event handlers can perform file roll event actions like loading files to S3, converting to Parquet or ORC format, or loading files to HDFS.

6.1.3 Using Templated Strings

Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The HDFS Event Handler makes extensive use of templated strings to generate the HDFS directory names, data file names, and HDFS bucket names. This gives you the flexibility to select where to write data files and the names of those data files.

Supported Templated Strings

${fullyQualifiedTableName} - The fully qualified source table name delimited by a period (.). For example, MYCATALOG.MYSCHEMA.MYTABLE.
${catalogName} - The individual source catalog name. For example, MYCATALOG.
${schemaName} - The individual source schema name. For example, MYSCHEMA.
${tableName} - The individual source table name. For example, MYTABLE.
${groupName} - The name of the Replicat process (with the thread number appended if you're using coordinated apply).
${emptyString} - Evaluates to an empty string. For example, "".
${operationCount} - The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "1024".
${insertCount} - The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "125".
${updateCount} - The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "265".
${deleteCount} - The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "11".
${truncateCount} - The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "5".
${currentTimestamp} - The current timestamp. The default output format for the date time is yyyy-MM-dd_HH-mm-ss.SSS. For example, 2017-07-05_04-31-23.123. Alternatively, you can customize the format of the current timestamp by inserting the format inside square brackets, like ${currentTimestamp[MM-dd_HH]}. This format uses the syntax defined in the Java SimpleDateFormat class, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
${toUpperCase[]} - Converts the contents inside the square brackets to uppercase. For example, ${toUpperCase[${fullyQualifiedTableName}]}.
${toLowerCase[]} - Converts the contents inside the square brackets to lowercase. For example, ${toLowerCase[${fullyQualifiedTableName}]}.

Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.

Path Configuration Example /usr/local/${fullyQualifiedTableName}

Data File Configuration Example ${fullyQualifiedTableName}_${currentTimestamp}_${groupName}.txt

7 Using the Optimized Row Columnar Event Handler

The Optimized Row Columnar (ORC) Event Handler enables you to generate data files in ORC format. This topic describes how to use the ORC Event Handler.
• Overview
• Detailing the Functionality

7.1 Overview

ORC is a row columnar format that can substantially improve data retrieval times and the performance of Big Data analytics. You can use the ORC Event Handler to write ORC files to either a local file system or directly to HDFS. For information, see https://orc.apache.org/.

7.2 Detailing the Functionality

• About the Upstream Data Format
• About the Library Dependencies
• Requirements
• Using Templated Strings

7.2.1 About the Upstream Data Format

The ORC Event Handler can only convert Avro Object Container Files (OCF) generated by the File Writer Handler. The ORC Event Handler cannot convert other formats to ORC data files. The format of the File Writer Handler must be avro_row_ocf or avro_op_ocf, see Using the File Writer Handler.

7.2.2 About the Library Dependencies

Generating ORC files requires both the Apache ORC libraries and the HDFS client libraries, see Optimized Row Columnar Event Handler Client Dependencies and HDFS Handler Client Dependencies. Oracle GoldenGate for Big Data does not include the Apache ORC libraries nor does it include the HDFS client libraries. You must configure the gg.classpath variable to include the dependent libraries.
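For example, a gg.classpath setting that includes both sets of libraries might look like the following sketch; the directory paths shown are placeholders and must be replaced with the actual locations of the Apache ORC libraries and the HDFS client libraries in your environment:

gg.classpath=/path/to/orc/lib/*:/path/to/hadoop/client/lib/*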


7.2.3 Requirements

The ORC Event Handler can write ORC files directly to HDFS. You must set the writeToHDFS property to true:

gg.eventhandler.orc.writeToHDFS=true

Ensure that the directory containing the HDFS core-site.xml file is in gg.classpath. This is so the core-site.xml file can be read at runtime and the connectivity information to HDFS can be resolved. For example:

gg.classpath=/{HDFS_install_directory}/etc/hadoop

If Kerberos authentication is enabled on the HDFS cluster, you have to configure the Kerberos principal and the location of the keytab file so that the password can be resolved at runtime:

gg.eventHandler.name.kerberosPrincipal=principal
gg.eventHandler.name.kerberosKeytabFile=path_to_the_keytab_file

7.2.4 Using Templated Strings

Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The ORC Event Handler makes extensive use of templated strings to generate the ORC directory names, data file names, and ORC bucket names. This gives you the flexibility to select where to write data files and the names of those data files.

Supported Templated Strings

${fullyQualifiedTableName}: The fully qualified source table name delimited by a period (.). For example, MYCATALOG.MYSCHEMA.MYTABLE.
${catalogName}: The individual source catalog name. For example, MYCATALOG.
${schemaName}: The individual source schema name. For example, MYSCHEMA.
${tableName}: The individual source table name. For example, MYTABLE.
${groupName}: The name of the Replicat process (with the thread number appended if you're using coordinated apply).
${emptyString}: Evaluates to an empty string. For example, "".
${operationCount}: The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "1024".
${insertCount}: The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "125".
${updateCount}: The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "265".
${deleteCount}: The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "11".
${truncateCount}: The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "5".
${currentTimestamp}: The current timestamp. The default output format for the date time is yyyy-MM-dd_HH-mm-ss.SSS. For example, 2017-07-05_04-31-23.123. Alternatively, you can customize the format of the current timestamp by inserting the format inside square brackets, such as ${currentTimestamp[MM-dd_HH]}. This format uses the syntax defined in the Java SimpleDateFormat class, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
${toUpperCase[]}: Converts the contents inside the square brackets to uppercase. For example, ${toUpperCase[${fullyQualifiedTableName}]}.
${toLowerCase[]}: Converts the contents inside the square brackets to lowercase. For example, ${toLowerCase[${fullyQualifiedTableName}]}.

Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.

Path Configuration Example /usr/local/${fullyQualifiedTableName}

Data File Configuration Example ${fullyQualifiedTableName}_${currentTimestamp}_${groupName}.txt

8 Configuring the ORC Event Handler

You configure the ORC Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file). The ORC Event Handler works only in conjunction with the File Writer Handler. To enable the selection of the ORC Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=orc and the other ORC properties as follows:

Table 8-1 ORC Event Handler Configuration Properties

gg.eventhandler.name.type: Required. Legal values: ORC. Default: None. Selects the ORC Event Handler.
gg.eventhandler.name.writeToHDFS: Optional. Legal values: true | false. Default: false. The ORC framework allows direct writing to HDFS. Set to false to write to the local file system. Set to true to write directly to HDFS.
gg.eventhandler.name.pathMappingTemplate: Required. Legal values: a string with resolvable keywords and constants used to dynamically generate the path in the ORC bucket to write the file. Default: None. Use keywords interlaced with constants to dynamically generate unique ORC path names at runtime. Typically, path names follow the format /ogg/data/${groupName}/${fullyQualifiedTableName}.
gg.eventhandler.name.fileMappingTemplate: Optional. Legal values: a string with resolvable keywords and constants used to dynamically generate the ORC file name at runtime. Default: None. Use resolvable keywords and constants to dynamically generate the ORC data file name at runtime. If not set, the upstream file name is used.
gg.eventhandler.name.compressionCodec: Optional. Legal values: LZ4 | LZO | NONE | SNAPPY. Default: NONE. Sets the compression codec of the generated ORC file.
gg.eventhandler.name.finalizeAction: Optional. Legal values: none | delete. Default: none. Set to none to leave the ORC data file in place on the finalize action. Set to delete if you want to delete the ORC data file with the finalize action.
gg.eventhandler.name.kerberosPrincipal: Optional. Legal values: the Kerberos principal name. Default: None. Sets the Kerberos principal when writing directly to HDFS and Kerberos authentication is enabled.
gg.eventhandler.name.kerberosKeytabFile: Optional. Legal values: the path to the Kerberos keytab file. Default: none. Sets the path to the Kerberos keytab file when writing directly to HDFS and Kerberos authentication is enabled.
gg.eventhandler.name.blockPadding: Optional. Legal values: true | false. Default: true. Set to true to enable block padding in generated ORC files or false to disable it.
gg.eventhandler.name.blockSize: Optional. Legal values: long. Default: the ORC default. Sets the block size of generated ORC files.
gg.eventhandler.name.bufferSize: Optional. Legal values: integer. Default: the ORC default. Sets the buffer size of generated ORC files.
gg.eventhandler.name.encodingStrategy: Optional. Legal values: COMPRESSION | SPEED. Default: the ORC default. Sets whether the ORC encoding strategy is optimized for compression or for speed.
gg.eventhandler.name.paddingTolerance: Optional. Legal values: a percentage represented as a floating point number. Default: the ORC default. Sets the percentage for padding tolerance of generated ORC files.
gg.eventhandler.name.rowIndexStride: Optional. Legal values: integer. Default: the ORC default. Sets the row index stride of generated ORC files.
gg.eventhandler.name.stripeSize: Optional. Legal values: integer. Default: the ORC default. Sets the stripe size of generated ORC files.
gg.eventhandler.name.eventHandler: Optional. Legal values: a unique string identifier cross referencing a child event handler. Default: no event handler configured. The event handler that is invoked on the file roll event. Event handlers can do file roll event actions like loading files to S3 or HDFS.
gg.eventhandler.name.bloomFilterFpp: Optional. Legal values: the false positive probability must be greater than zero and less than one; for example, .25 and .75 are both legal values, but 0 and 1 are not. Default: the Apache ORC default. Sets the false positive probability that a query of a bloom filter index indicates that the value being searched for is in the block when the value is actually not in the block. You select the tables and columns on which to set bloom filters with the following configuration syntax: gg.eventhandler.orc.bloomFilter.QASOURCE.TCUSTMER=CUST_CODE and gg.eventhandler.orc.bloomFilter.QASOURCE.TCUSTORD=CUST_CODE,ORDER_DATE. QASOURCE.TCUSTMER and QASOURCE.TCUSTORD are the fully qualified names of the source tables. The configured values are one or more columns on which to configure bloom filters. The column names are delimited by a comma.
gg.eventhandler.name.bloomFilterVersion: Optional. Legal values: ORIGINAL | UTF8. Default: ORIGINAL. Sets the version of the ORC bloom filter.
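For reference, the following is a minimal sketch of an ORC Event Handler configuration assembled from the properties in Table 8-1; the handler name (orc) and the property values shown are examples only and should be adjusted to your environment:

gg.eventhandler.orc.type=orc
gg.eventhandler.orc.writeToHDFS=false
gg.eventhandler.orc.pathMappingTemplate=/ogg/data/${groupName}/${fullyQualifiedTableName}
gg.eventhandler.orc.compressionCodec=SNAPPY
gg.eventhandler.orc.finalizeAction=none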

9 Using the Oracle Cloud Infrastructure Event Handler

The Oracle Cloud Infrastructure Event Handler is used to load files generated by the File Writer Handler into an Oracle Cloud Infrastructure Object Store. This topic describes how to use the OCI Event Handler.
• Overview
• Detailing the Functionality
• Configuring the Oracle Cloud Infrastructure Event Handler
• Configuring Credentials for Oracle Cloud Infrastructure
• Using Templated Strings
• Troubleshooting

9.1 Overview

The Oracle Cloud Infrastructure Object Storage service is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. The Object Storage service can store an unlimited amount of unstructured data of any content type, including analytic data and rich content, like images and videos, see https://cloud.oracle.com/en_US/cloud-infrastructure. You can use any format handler that the File Writer Handler supports.

9.2 Detailing the Functionality

The Oracle Cloud Infrastructure Event Handler requires the Oracle Cloud Infrastructure Java software development kit (SDK) to transfer files to Oracle Cloud Infrastructure Object Storage. Oracle GoldenGate for Big Data does not include the Oracle Cloud Infrastructure Java SDK, see https://docs.cloud.oracle.com/iaas/Content/API/Concepts/sdkconfig.htm. You must download the Oracle Cloud Infrastructure Java SDK at: https://docs.us-phoenix-1.oraclecloud.com/Content/API/SDKDocs/javasdk.htm Extract the JAR files to a permanent directory. The handler requires two directories: the JAR library directory that contains the Oracle Cloud Infrastructure SDK JAR and the third-party JAR library directory. Both directories must be in the gg.classpath.

Specify the gg.classpath environment variable to include the JAR files of the Oracle Cloud Infrastructure Java SDK.

Example

gg.classpath=/usr/var/oci/lib/*:/usr/var/oci/third-party/lib/*


For more information, see OCI Dependencies.

9.3 Configuring the Oracle Cloud Infrastructure Event Handler

You configure the Oracle Cloud Infrastructure Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file). The Oracle Cloud Infrastructure Event Handler works only in conjunction with the File Writer Handler. To enable the selection of the Oracle Cloud Infrastructure Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=oci and the other Oracle Cloud Infrastructure properties as follows:

Table 9-1 Oracle Cloud Infrastructure Event Handler Configuration Properties

gg.eventhandler.name.type: Required. Legal values: oci. Default: None. Selects the Oracle Cloud Infrastructure Event Handler.
gg.eventhandler.name.contentType: Optional. Legal values: a valid content type value, which is used to indicate the media type of the resource. Default: application/octet-stream. The content type of the object.
gg.eventhandler.name.contentEncoding: Optional. Legal values: valid values indicate which encoding is to be applied. Default: utf-8. The content encoding of the object.
gg.eventhandler.name.contentLanguage: Optional. Legal values: a valid language intended for the audience. Default: en. The content language of the object.
gg.eventhandler.name.configFilePath: Required. Legal values: the path to the event handler config file. Default: None. The configuration file name and location.
gg.eventhandler.name.profile: Required. Legal values: a valid string representing the profile name. Default: None. In the Oracle Cloud Infrastructure config file, the entries are identified by the profile name. The default profile is DEFAULT. You can have an additional profile like ADMIN_USER. Any value that isn't explicitly defined for the ADMIN_USER profile (or any other profiles that you add to the config file) is inherited from the DEFAULT profile.
gg.eventhandler.name.namespace: Required. Legal values: the Oracle Cloud Infrastructure namespace. Default: None. The namespace serves as a top-level container for all buckets and objects and allows you to control bucket naming within the user's tenancy. The Object Storage namespace is a system-generated string assigned during account creation. Your namespace string is listed in Object Storage Settings while using the Oracle Cloud Infrastructure Console.
gg.eventhandler.name.region: Required. Legal values: an Oracle Cloud Infrastructure region. Default: None. Oracle Cloud Infrastructure servers and data are hosted in a region, which is a localized geographic area. The valid Region Identifiers are listed at Oracle Cloud Infrastructure Documentation - Regions and Availability Domains.
gg.eventhandler.name.compartmentID: Required. Legal values: a valid compartment id. Default: None. A compartment is a logical container to organize Oracle Cloud Infrastructure resources. The compartmentID is listed in Bucket Details while using the Oracle Cloud Infrastructure Console.
gg.eventhandler.name.pathMappingTemplate: Required. Legal values: a string with resolvable keywords and constants used to dynamically generate the path in the Oracle Cloud Infrastructure bucket to write the file. Default: None. Use keywords interlaced with constants to dynamically generate unique Oracle Cloud Infrastructure path names at runtime.
gg.eventhandler.name.fileNameMappingTemplate: Optional. Legal values: a string with resolvable keywords and constants used to dynamically generate the Oracle Cloud Infrastructure file name at runtime. Default: None. Use resolvable keywords and constants to dynamically generate the Oracle Cloud Infrastructure data file name at runtime. If not set, the upstream file name is used.
gg.eventhandler.name.bucketMappingTemplate: Required. Legal values: a string with resolvable keywords and constants used to dynamically generate the bucket name. Default: None. Use resolvable keywords and constants to dynamically generate the Oracle Cloud Infrastructure bucket name at runtime. The event handler attempts to create the Oracle Cloud Infrastructure bucket if it does not exist.
gg.eventhandler.name.finalizeAction: Optional. Legal values: none | delete. Default: None. Set to none to leave the Oracle Cloud Infrastructure data file in place on the finalize action. Set to delete if you want to delete the Oracle Cloud Infrastructure data file with the finalize action.
gg.eventhandler.name.eventHandler: Optional. Legal values: a unique string identifier cross referencing a child event handler. Default: no event handler is configured. Sets the event handler that is invoked on the file roll event. Event handlers can do file roll event actions like loading files to S3, converting to Parquet or ORC format, loading files to HDFS, loading files to Oracle Cloud Infrastructure Storage Classic, or loading files to Oracle Cloud Infrastructure.
gg.eventhandler.name.proxyServer: Optional. Legal values: the host name of your proxy server. Default: None. Set to the host name of the proxy server if OCI connectivity requires routing through a proxy server.
gg.eventhandler.name.proxyPort: Optional. Legal values: the port number of the proxy server. Default: None. Set to the port number of the proxy server if OCI connectivity requires routing through a proxy server.
gg.eventhandler.name.proxyProtocol: Optional. Legal values: HTTP | HTTPS. Default: HTTP. Sets the proxy protocol connection to the proxy server for an additional level of security. The majority of proxy servers support HTTP. Only set this if the proxy server supports HTTPS and HTTPS is required.
gg.eventhandler.name.proxyUsername: Optional. Legal values: the username for the proxy server. Default: None. Sets the username for connectivity to the proxy server if credentials are required. Most proxy servers do not require credentials.
gg.eventhandler.name.proxyPassword: Optional. Legal values: the password for the proxy server. Default: None. Sets the password for connectivity to the proxy server if credentials are required. Most proxy servers do not require credentials.

Sample Configuration

gg.eventhandler.oci.type=oci
gg.eventhandler.oci.configFilePath=~/.oci/config
gg.eventhandler.oci.profile=DEFAULT
gg.eventhandler.oci.namespace=dwcsdemo
gg.eventhandler.oci.region=us-ashburn-1
gg.eventhandler.oci.compartmentID=ocid1.compartment.oc1..aaaaaaaajdg6iblwgqlyqpegf6kwdais2gyx3guspboa7fsi72tfihz2wrba
gg.eventhandler.oci.pathMappingTemplate=${schemaName}
gg.eventhandler.oci.bucketMappingTemplate=${schemaName}
gg.eventhandler.oci.fileNameMappingTemplate=${tableName}_${currentTimestamp}.txt
gg.eventhandler.oci.finalizeAction=NONE
goldengate.userexit.writers=javawriter

9.4 Configuring Credentials for Oracle Cloud Infrastructure

Basic configuration information like user credentials and tenancy Oracle Cloud IDs (OCIDs) of Oracle Cloud Infrastructure is required for the Java SDKs to work, see https://docs.cloud.oracle.com/iaas/Content/General/Concepts/identifiers.htm. The configuration file includes the keys user, fingerprint, key_file, tenancy, and region with their respective values. The default configuration file name and location is ~/.oci/config.

Create the config file as follows:

1. Create a directory called .oci in the Oracle GoldenGate for Big Data home directory


2. Create a text file and name it config.
3. Obtain the values for these properties:

user
a. Log in to the Oracle Cloud Infrastructure Console at https://console.us-ashburn-1.oraclecloud.com.
b. Click Username.
c. Click User Settings. The User's OCID is displayed and is the value for the key user.

tenancy The Tenancy ID is displayed at the bottom of the Console page.

region The region is displayed with the header session drop-down menu in the Console.

fingerprint To generate the fingerprint, use the How to Get the Key's Fingerprint instructions at: https://docs.cloud.oracle.com/iaas/Content/API/Concepts/apisigningkey.htm

key_file You need to share the public and private key to establish a connection with Oracle Cloud Infrastructure. To generate the keys, use the How to Generate an API Signing Key instructions at: https://docs.cloud.oracle.com/iaas/Content/API/Concepts/apisigningkey.htm

Sample Configuration File

user=ocid1.user.oc1..aaaaaaaat5nvwcna5j6aqzqedqw3rynjq
fingerprint=20:3b:97:13::4e:c5:3a:34
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..aaaaaaaaba3pv6wkcr44h25vqstifs

9.5 Using Templated Strings

Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. This event handler makes extensive use of templated strings to generate the Oracle Cloud Infrastructure directory names, data file names, and Oracle Cloud Infrastructure bucket names. These strings give you the flexibility to select where to write data files and the names of those data files. You should exercise caution when choosing file and directory names to avoid file naming collisions that can result in an abend.

Supported Templated Strings

${fullyQualifiedTableName}: The fully qualified source table name delimited by a period (.). For example, MYCATALOG.MYSCHEMA.MYTABLE.
${catalogName}: The individual source catalog name. For example, MYCATALOG.
${schemaName}: The individual source schema name. For example, MYSCHEMA.
${tableName}: The individual source table name. For example, MYTABLE.
${groupName}: The name of the Replicat process (with the thread number appended if you're using coordinated apply).
${emptyString}: Evaluates to an empty string. For example, "".
${operationCount}: The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "1024".
${insertCount}: The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "125".
${updateCount}: The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "265".
${deleteCount}: The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "11".
${truncateCount}: The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "5".
${currentTimestamp}: The current timestamp. The default output format for the date time is yyyy-MM-dd_HH-mm-ss.SSS. For example, 2017-07-05_04-31-23.123. Alternatively, you can customize the format of the current timestamp by inserting the format inside square brackets, such as ${currentTimestamp[MM-dd_HH]}. This format uses the syntax defined in the Java SimpleDateFormat class, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
${toUpperCase[]}: Converts the contents inside the square brackets to uppercase. For example, ${toUpperCase[${fullyQualifiedTableName}]}.
${toLowerCase[]}: Converts the contents inside the square brackets to lowercase. For example, ${toLowerCase[${fullyQualifiedTableName}]}.

Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.

Path Configuration Example /usr/local/${fullyQualifiedTableName}

Data File Configuration Example ${fullyQualifiedTableName}_${currentTimestamp}_${groupName}.txt

Requirements
The directory and file names generated using the templates must be legal on the system being written to. File names must be unique to avoid a file name collision. You can avoid a collision by adding a current timestamp using the ${currentTimestamp} keyword. If you are using coordinated apply, then adding ${groupName} into the data file name is recommended.

9.6 Troubleshooting

Connectivity Issues
If the OCI Event Handler is unable to connect to the OCI object storage when running on premise, it's likely your connectivity to the public internet is protected by a proxy server. Proxy servers act as a gateway between the private network of a company and the public internet. Contact your network administrator to get the URL of your proxy server. Oracle GoldenGate for Big Data connectivity to OCI can be routed through a proxy server by setting the following configuration properties:

gg.eventhandler.name.proxyServer={insert your proxy server name}
gg.eventhandler.name.proxyPort={insert your proxy server port number}

ClassNotFoundException Error
The most common initial error is an incorrect classpath that does not include all the required client libraries, which results in a ClassNotFoundException error. Specify the gg.classpath variable to include all of the required JAR files for the Oracle Cloud Infrastructure Java SDK, see Detailing the Functionality.

10 Using the Parquet Event Handler

Learn how to use the Parquet Event Handler to load files generated by the File Writer Handler into HDFS. See Using the File Writer Handler.
• Overview
• Detailing the Functionality
• Configuring the Parquet Event Handler

10.1 Overview

The Parquet Event Handler enables you to generate data files in Parquet format. Parquet files can be written to either the local file system or directly to HDFS. Parquet is a columnar data format that can substantially improve data retrieval times and improve the performance of Big Data analytics, see https://parquet.apache.org/.

10.2 Detailing the Functionality

• Configuring the Parquet Event Handler to Write to HDFS
• About the Upstream Data Format
• Using Templated Strings

10.2.1 Configuring the Parquet Event Handler to Write to HDFS

The framework supports writing directly to HDFS. The Parquet Event Handler can write Parquet files directly to HDFS. These additional configuration steps are required: The Parquet Event Handler dependencies and considerations are the same as the HDFS Handler, see HDFS Additional Considerations. Set the writeToHDFS property to true:

gg.eventhandler.parquet.writeToHDFS=true

Ensure that gg.classpath includes the HDFS client libraries.

Ensure that the directory containing the HDFS core-site.xml file is in gg.classpath. This is so the core-site.xml file can be read at runtime and the connectivity information to HDFS can be resolved. For example:

gg.classpath=/{HDFS_install_directory}/etc/hadoop


If Kerberos authentication is enabled on the HDFS cluster, you have to configure the Kerberos principal and the location of the keytab file so that the password can be resolved at runtime:

gg.eventHandler.name.kerberosPrincipal=principal
gg.eventHandler.name.kerberosKeytabFile=path_to_the_keytab_file

10.2.2 About the Upstream Data Format

The Parquet Event Handler can only convert Avro Object Container Files (OCF) generated by the File Writer Handler. The Parquet Event Handler cannot convert other formats to Parquet data files. The format of the File Writer Handler must be avro_row_ocf or avro_op_ocf, see Using the File Writer Handler.

10.2.3 Using Templated Strings

Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The Parquet Event Handler makes extensive use of templated strings to generate the HDFS directory names, data file names, and HDFS bucket names. This gives you the flexibility to select where to write data files and the names of those data files.

Supported Templated Strings

${fullyQualifiedTableName}: The fully qualified source table name delimited by a period (.). For example, MYCATALOG.MYSCHEMA.MYTABLE.
${catalogName}: The individual source catalog name. For example, MYCATALOG.
${schemaName}: The individual source schema name. For example, MYSCHEMA.
${tableName}: The individual source table name. For example, MYTABLE.
${groupName}: The name of the Replicat process (with the thread number appended if you're using coordinated apply).
${emptyString}: Evaluates to an empty string. For example, "".
${operationCount}: The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "1024".
${insertCount}: The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "125".
${updateCount}: The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "265".
${deleteCount}: The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "11".
${truncateCount}: The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "5".
${currentTimestamp}: The current timestamp. The default output format for the date time is yyyy-MM-dd_HH-mm-ss.SSS. For example, 2017-07-05_04-31-23.123. Alternatively, you can customize the format of the current timestamp by inserting the format inside square brackets, such as ${currentTimestamp[MM-dd_HH]}. This format uses the syntax defined in the Java SimpleDateFormat class, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
${toUpperCase[]}: Converts the contents inside the square brackets to uppercase. For example, ${toUpperCase[${fullyQualifiedTableName}]}.
${toLowerCase[]}: Converts the contents inside the square brackets to lowercase. For example, ${toLowerCase[${fullyQualifiedTableName}]}.

Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.

Path Configuration Example /usr/local/${fullyQualifiedTableName}

Data File Configuration Example ${fullyQualifiedTableName}_${currentTimestamp}_${groupName}.txt

10.3 Configuring the Parquet Event Handler

You configure the Parquet Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file). The Parquet Event Handler works only in conjunction with the File Writer Handler. To enable the selection of the Parquet Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=parquet and the other Parquet Event properties as follows:


Table 10-1 Parquet Event Handler Configuration Properties

gg.eventhandler.name.type: Required. Legal values: parquet. Default: None. Selects the Parquet Event Handler for use.
gg.eventhandler.name.writeToHDFS: Optional. Legal values: true | false. Default: false. Set to false to write to the local file system. Set to true to write directly to HDFS.
gg.eventhandler.name.pathMappingTemplate: Required. Legal values: a string with resolvable keywords and constants used to dynamically generate the path to write generated Parquet files. Default: None. Use keywords interlaced with constants to dynamically generate unique path names at runtime. Typically, path names follow the format /ogg/data/${groupName}/${fullyQualifiedTableName}.
gg.eventhandler.name.fileNameMappingTemplate: Optional. Legal values: a string with resolvable keywords and constants used to dynamically generate the Parquet file name at runtime. Default: None. Sets the Parquet file name. If not set, the upstream file name is used.
gg.eventhandler.name.compressionCodec: Optional. Legal values: GZIP | LZO | SNAPPY | UNCOMPRESSED. Default: UNCOMPRESSED. Sets the compression codec of the generated Parquet file.
gg.eventhandler.name.finalizeAction: Optional. Legal values: none | delete. Default: none. Indicates what the Parquet Event Handler should do at the finalize action. none: leave the data file in place. delete: delete the data file (for example, if the data file has been converted to another format or loaded to a third party application).
gg.eventhandler.name.dictionaryEncoding: Optional. Legal values: true | false. Default: the Parquet default. Set to true to enable Parquet dictionary encoding.
gg.eventhandler.name.validation: Optional. Legal values: true | false. Default: the Parquet default. Set to true to enable Parquet validation.
gg.eventhandler.name.dictionaryPageSize: Optional. Legal values: integer. Default: the Parquet default. Sets the Parquet dictionary page size.
gg.eventhandler.name.maxPaddingSize: Optional. Legal values: integer. Default: the Parquet default. Sets the Parquet padding size.
gg.eventhandler.name.pageSize: Optional. Legal values: integer. Default: the Parquet default. Sets the Parquet page size.
gg.eventhandler.name.rowGroupSize: Optional. Legal values: integer. Default: the Parquet default. Sets the Parquet row group size.
gg.eventhandler.name.kerberosPrincipal: Optional. Legal values: the Kerberos principal name. Default: None. Set to the Kerberos principal when writing directly to HDFS and Kerberos authentication is enabled.
gg.eventhandler.name.kerberosKeytabFile: Optional. Legal values: the path to the Kerberos keytab file. Set to the path to the Kerberos keytab file when writing directly to HDFS and Kerberos authentication is enabled.
gg.eventhandler.name.eventHandler: Optional. Legal values: a unique string identifier cross referencing a child event handler. Default: no event handler configured. The event handler that is invoked on the file roll event. Event handlers can do file roll event actions like loading files to S3, converting to Parquet or ORC format, or loading files to HDFS.
gg.eventhandler.name.writerVersion: Optional. Legal values: v1 | v2. Default: the Parquet library default, which is v1 up through Parquet version 1.11.0. Allows you to set the Parquet writer version.
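For reference, the following is a minimal sketch of a Parquet Event Handler configuration assembled from the properties in Table 10-1; the handler name (parquet), the file name extension, and the other property values are illustrative only and should be adjusted to your environment:

gg.eventhandler.parquet.type=parquet
gg.eventhandler.parquet.writeToHDFS=false
gg.eventhandler.parquet.pathMappingTemplate=/ogg/data/${groupName}/${fullyQualifiedTableName}
gg.eventhandler.parquet.fileNameMappingTemplate=${tableName}_${currentTimestamp}.parquet
gg.eventhandler.parquet.compressionCodec=SNAPPY
gg.eventhandler.parquet.finalizeAction=delete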

11 Using the S3 Event Handler

Learn how to use the S3 Event Handler, which provides the interface to Amazon S3 web services.
• Overview
• Detailing Functionality
• Configuring the S3 Event Handler

11.1 Overview

Amazon S3 is object storage hosted in the Amazon cloud. The purpose of the S3 Event Handler is to load data files generated by the File Writer Handler into Amazon S3, see https://aws.amazon.com/s3/. You can use any format that the File Writer Handler supports, see Using the File Writer Handler.

11.2 Detailing Functionality

The S3 Event Handler requires the Amazon Web Services (AWS) Java SDK to transfer files to S3 object storage. Oracle GoldenGate for Big Data does not include the AWS Java SDK. You have to download and install the AWS Java SDK from: https://aws.amazon.com/sdk-for-java/ Then you have to configure the gg.classpath variable to include the JAR files of the AWS Java SDK, which are divided into two directories. Both directories must be in gg.classpath, for example:

gg.classpath=/usr/var/aws-java-sdk-1.11.240/lib/*:/usr/var/aws-java-sdk-1.11.240/third-party/lib/

• Configuring the Client ID and Secret
• About the AWS S3 Buckets
• Using Templated Strings
• Troubleshooting

11.2.1 Configuring the Client ID and Secret

A client ID and secret are the required credentials for the S3 Event Handler to interact with Amazon S3. A client ID and secret are generated using the Amazon AWS website. The retrieval of these credentials and presentation to the S3 server are performed on the client side by the AWS Java SDK. The AWS Java SDK provides multiple ways that the client ID and secret can be resolved at runtime.


The client ID and secret can be set as Java properties, on one line, in the Java Adapter properties file as follows:

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar -Daws.accessKeyId=your_access_key -Daws.secretKey=your_secret_key

Alternatively, the client ID and secret can be resolved from the Amazon Elastic Compute Cloud (Amazon EC2) AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables set on the local machine.

11.2.2 About the AWS S3 Buckets

AWS divides S3 storage into separate file systems called buckets. The S3 Event Handler can write to pre-created buckets. Alternatively, if the S3 bucket does not exist, the S3 Event Handler attempts to create the specified S3 bucket. AWS requires that S3 bucket names are lowercase. Amazon S3 bucket names must be globally unique. If you attempt to create an S3 bucket that already exists in any Amazon account, it causes the S3 Event Handler to abend.

11.2.3 Using Templated Strings

Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The S3 Event Handler makes extensive use of templated strings to generate the S3 directory names, data file names, and S3 bucket names. This gives you the flexibility to select where to write data files and the names of those data files.

Supported Templated Strings

${fullyQualifiedTableName}: The fully qualified source table name delimited by a period (.). For example, MYCATALOG.MYSCHEMA.MYTABLE.
${catalogName}: The individual source catalog name. For example, MYCATALOG.
${schemaName}: The individual source schema name. For example, MYSCHEMA.
${tableName}: The individual source table name. For example, MYTABLE.
${groupName}: The name of the Replicat process (with the thread number appended if you're using coordinated apply).
${emptyString}: Evaluates to an empty string. For example, "".
${operationCount}: The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "1024".
${insertCount}: The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "125".
${updateCount}: The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "265".
${deleteCount}: The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "11".
${truncateCount}: The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, "5".
${currentTimestamp}: The current timestamp. The default output format for the date time is yyyy-MM-dd_HH-mm-ss.SSS. For example, 2017-07-05_04-31-23.123. Alternatively, you can customize the format of the current timestamp by inserting the format inside square brackets, such as ${currentTimestamp[MM-dd_HH]}. This format uses the syntax defined in the Java SimpleDateFormat class, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
${toUpperCase[]}: Converts the contents inside the square brackets to uppercase. For example, ${toUpperCase[${fullyQualifiedTableName}]}.
${toLowerCase[]}: Converts the contents inside the square brackets to lowercase. For example, ${toLowerCase[${fullyQualifiedTableName}]}.

Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.

Path Configuration Example /usr/local/${fullyQualifiedTableName}

Data File Configuration Example ${fullyQualifiedTableName}_${currentTimestamp}_${groupName}.txt

11.2.4 Troubleshooting

Connectivity Issues
If the S3 Event Handler is unable to connect to the S3 object storage when running on premise, it's likely your connectivity to the public internet is protected by a proxy server. Proxy servers act as a gateway between the private network of a company and the public internet. Contact your network administrator to get the URL of your proxy server, and then set up the proxy server configuration. Oracle GoldenGate can be used with a proxy server using the following parameters to enable the proxy server:
• gg.eventhandler.name.proxyServer=
• gg.eventhandler.name.proxyPort=80
Access to the proxy servers can be secured using credentials and the following configuration parameters:
• gg.eventhandler.name.proxyUsername=username
• gg.eventhandler.name.proxyPassword=password
Sample configuration:

gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
gg.eventhandler.s3.proxyServer=www-proxy.us.oracle.com
gg.eventhandler.s3.proxyPort=80
gg.eventhandler.s3.proxyProtocol=HTTP
gg.eventhandler.s3.bucketMappingTemplate=yourbucketname
gg.eventhandler.s3.pathMappingTemplate=thepath
gg.eventhandler.s3.finalizeAction=none
goldengate.userexit.writers=javawriter

11.3 Configuring the S3 Event Handler

You can configure the S3 Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the S3 Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=s3 and the other S3 Event properties as follows:

Table 11-1 S3 Event Handler Configuration Properties

gg.eventhandler.name.type: Required. Legal values: s3. Default: None. Selects the S3 Event Handler for use with Replicat.
gg.eventhandler.name.region: Required. Legal values: the AWS region name that is hosting your S3 instance. Default: None. Setting the legal AWS region name is required.
gg.eventhandler.name.cannedACL: Optional. Legal values: one of private, public-read, public-read-write, aws-exec-read, authenticated-read, bucket-owner-read, bucket-owner-full-control, or log-delivery-write. Default: None. Amazon S3 supports a set of predefined grants, known as canned Access Control Lists. Each canned ACL has a predefined set of grantees and permissions. For more information, see Managing access with ACLs.
gg.eventhandler.name.proxyServer: Optional. Legal values: the host name of your proxy server. Default: None. Sets the host name of your proxy server if connectivity to AWS requires the use of a proxy server.
gg.eventhandler.name.proxyPort: Optional. Legal values: the port number of the proxy server. Default: None. Sets the port number of the proxy server if connectivity to AWS requires the use of a proxy server.
gg.eventhandler.name.proxyUsername: Optional. Legal values: the username of the proxy server. Default: None. Sets the user name of the proxy server if connectivity to AWS requires the use of a proxy server and the proxy server requires credentials.
gg.eventhandler.name.proxyPassword: Optional. Legal values: the password of the proxy server. Default: None. Sets the password for the user name of the proxy server if connectivity to AWS requires the use of a proxy server and the proxy server requires credentials.
gg.eventhandler.name.bucketMappingTemplate: Required. Legal values: a string with resolvable keywords and constants used to dynamically generate the S3 bucket name. Default: None. Use resolvable keywords and constants to dynamically generate the S3 bucket name at runtime. The handler attempts to create the S3 bucket if it does not exist. AWS requires bucket names to be all lowercase. A bucket name with uppercase characters results in a runtime exception.
gg.eventhandler.name.pathMappingTemplate: Required. Legal values: a string with resolvable keywords and constants used to dynamically generate the path in the S3 bucket to write the file. Default: None. Use keywords interlaced with constants to dynamically generate unique S3 path names at runtime. Typically, path names follow the format ogg/data/${groupName}/${fullyQualifiedTableName}. In S3, the convention is not to begin the path with a slash (/) because it results in a root directory of "".
gg.eventhandler.name.fileNameMappingTemplate: Optional. Legal values: a string with resolvable keywords and constants used to dynamically generate the S3 file name at runtime. Default: None. Use resolvable keywords and constants to dynamically generate the S3 data file name at runtime. If not set, the upstream file name is used.
gg.eventhandler.name.finalizeAction: Optional. Legal values: none | delete. Default: None. Set to none to leave the S3 data file in place on the finalize action. Set to delete if you want to delete the S3 data file with the finalize action.
gg.eventhandler.name.eventHandler: Optional. Legal values: a unique string identifier cross referencing a child event handler. Default: no event handler configured. Sets the event handler that is invoked on the file roll event. Event handlers can do file roll event actions like loading files to S3, converting to Parquet or ORC format, or loading files to HDFS.
gg.eventhandler.name.url: Optional (unless connecting to Dell ECS, then required). Legal values: a legal URL to connect to cloud storage. Default: None. Not required for Amazon AWS S3. Required for Dell ECS. Sets the URL to connect to cloud storage.
gg.eventhandler.name.proxyProtocol: Optional. Legal values: HTTP | HTTPS. Default: HTTP. Sets the proxy protocol connection to the proxy server for an additional level of security. The client first performs an SSL handshake with the proxy server, and then an SSL handshake with Amazon AWS. This feature was added to the Amazon SDK in version 1.11.396, so you must use at least that version to use this property.
gg.eventhandler.name.SSEAlgorithm: Optional. Legal values: AES256 | aws:kms. Default: empty. Set only if you are enabling S3 server side encryption. Use the parameter to set the algorithm for server side encryption in S3.
gg.eventhandler.name.AWSKmsKeyId: Optional. Legal values: a legal AWS key management system key or the alias that represents that key. Default: empty. Set only if you are enabling S3 server side encryption and the S3 algorithm is aws:kms. This is either the encryption key or the encryption alias that you set in the AWS Identity and Access Management web page. Aliases are prepended with alias/.
gg.eventhandler.name.enableSTS: Optional. Legal values: true | false. Default: false. Set to true to enable the S3 Event Handler to access S3 credentials from the AWS Security Token Service. The AWS Security Token Service must be enabled if you set this property to true.
gg.eventhandler.name.STSAssumeRole: Optional. Legal values: an AWS user and role in the following format: {user arn}:role/{role name}. Default: None. Set this configuration if you want to assume a different user/role. Only valid with STS enabled.
gg.eventhandler.name.STSAssumeRoleSessionName: Optional. Legal values: any string. Default: AssumeRoleSession1. The assumed role requires a session name for session logging. However, this can be any value. Only valid if both gg.eventhandler.name.enableSTS=true and gg.eventhandler.name.STSAssumeRole are configured.
gg.eventhandler.name.STSRegion: Optional. Legal values: any legal AWS region specifier. Default: the region is obtained from the gg.eventhandler.name.region property. Used to resolve the region for the STS call. It is only valid if the gg.eventhandler.name.enableSTS property is set to true. You can set a different AWS region for resolving credentials from STS than the configured S3 region.
gg.eventhandler.name.enableBucketAdmin: Optional. Legal values: true | false. Default: true. Set to false to disable checking whether S3 buckets exist and the automatic creation of buckets if they do not exist. This feature requires S3 admin privileges on S3 buckets, which some customers do not wish to grant.
gg.eventhandler.name.accessKeyId: Optional. Legal values: a valid AWS access key. Default: None. Set this parameter to explicitly set the access key for AWS. This parameter has no effect if gg.eventhandler.name.enableSTS is set to true. If this property is not set, then the credentials resolution falls back to the AWS default credentials provider chain.
gg.eventhandler.name.secretKey: Optional. Legal values: a valid AWS secret key. Default: None. Set this parameter to explicitly set the secret key for AWS. This parameter has no effect if gg.eventhandler.name.enableSTS is set to true. If this property is not set, then credentials resolution falls back to the AWS default credentials provider chain.
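For reference, the following is a minimal sketch of an S3 Event Handler configuration that does not use a proxy server; the handler name (s3), the region, the bucket name, and the templates are placeholders to adjust for your environment (a proxy-based sample appears in the Troubleshooting topic earlier in this chapter):

gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
gg.eventhandler.s3.bucketMappingTemplate=yourbucketname
gg.eventhandler.s3.pathMappingTemplate=ogg/data/${groupName}/${fullyQualifiedTableName}
gg.eventhandler.s3.fileNameMappingTemplate=${tableName}_${currentTimestamp}.txt
gg.eventhandler.s3.finalizeAction=none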

12 Using the Command Event Handler

This chapter describes how to use the Command Event Handler. The Command Event Handler provides the interface to synchronously execute an external program or script.
• Overview - Command Event Handler
The purpose of the Command Event Handler is to load data files generated by the File Writer Handler into respective targets by executing an external program or a provided script.
• Configuring the Command Event Handler
You can configure the Command Event Handler operation using the File Writer Handler properties file.
• Using Command Argument Template Strings
Command Argument Templated Strings consist of keywords that are dynamically resolved at runtime. Command Argument Templated Strings are passed as arguments to the script in the same order as they appear in the commandArgumentTemplate property.

12.1 Overview - Command Event Handler

The purpose of the Command Event Handler is to load data files generated by the File Writer Handler into respective targets by executing an external program or a provided script.

12.2 Configuring the Command Event Handler

You can configure the Command Event Handler operation using the File Writer Handler properties file. The Command Event Handler works only in conjunction with the File Writer Handler. To enable the selection of the Command Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=command and the other Command Event properties as follows:

Table 12-1 Command Event Handler Configuration Properties

gg.eventhandler.name.type: Required. Legal values: command. Default: None. Selects the Command Event Handler for use with Replicat.
gg.eventhandler.name.command: Required. Legal values: a valid path of the external program or script to be executed. Default: None. The script or external program that should be executed by the Command Event Handler.
gg.eventhandler.name.cmdWaitMilli: Optional. Legal values: an integer value representing milliseconds. Default: wait indefinitely. The Command Event Handler waits for this period of time for the called commands in the script or external program to complete. If the Command Event Handler fails to complete the command within the configured timeout period, the process abends.
gg.eventhandler.name.multithreaded: Optional. Legal values: true | false. Default: true. If true, the configured commands in the script or external program are executed in a multithreaded way; otherwise, they are executed in a single thread.
gg.eventhandler.name.commandArgumentTemplate: Optional. Legal values: see Using Command Argument Template Strings. Default: None. The Command Event Handler uses the command argument template strings during script or external program execution as input arguments. For a list of valid argument strings, see Using Command Argument Template Strings.

Sample Configuration

gg.eventhandler.command.type=command

gg.eventhandler.command.command=

#gg.eventhandler.command.cmdWaitMilli=10000

gg.eventhandler.command.multithreaded=true

gg.eventhandler.command.commandArgumentTemplate=${tablename},${datafilename},${countoperations}

12.3 Using Command Argument Template Strings

Command Argument Templated Strings consist of keywords that are dynamically resolved at runtime. Command Argument Templated Strings are passed as arguments to the script in the same order as they appear in the commandArgumentTemplate property. For an illustration, see the example following the list of supported template strings below.

The valid tokens used as command argument template strings are as follows: UUID, TableName, DataFileName, DataFileDir, DataFileDirandName, Offset, Format, CountOperations, CountInserts, CountUpdates, CountDeletes, CountTruncates. An invalid templated string results in an abend.

Supported Template Strings

${uuid} The File Writer Handler assigns a uuid to internally track the state of generated files. The usefulness of the uuid may be limited to troubleshooting scenarios.

${tableName} The individual source table name. For example, MYTABLE.

${dataFileName} The generated data file name.

${dataFileDirandName} The source file name with complete path and filename along with the file extension.

${offset} The offset (or size in bytes) of the data file.

${format} The format of the file. For example: delimitedtext | json | json_row | xml | avro_row | avro_op | avro_row_ocf | avro_op_ocf

${countOperations} The total count of operations in the data file. It must be either renamed or used by the event handlers or it becomes zero (0) because nothing is written. For example, 1024.

${countInserts} The total count of insert operations in the data file. It must be either renamed or used by the event handlers or it becomes zero (0) because nothing is written. For example, 125.

${countUpdates} The total count of update operations in the data file. It must be either renamed or used by the event handlers or it becomes zero (0) because nothing is written. For example, 265.

${countDeletes} The total count of delete operations in the data file. It must be either renamed or used by the event handlers or it becomes zero (0) because nothing is written. For example, 11.

${countTruncates} The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, 5.
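As an illustration of the argument ordering, consider the following hypothetical settings; the script path and the choice of tokens are examples only:

gg.eventhandler.command.command=/path/to/load_data.sh
gg.eventhandler.command.commandArgumentTemplate=${tableName},${dataFileName},${countOperations}

With these settings, load_data.sh is invoked with the resolved table name as its first argument, the generated data file name as its second, and the operation count as its third, in the same order as the tokens appear in the template.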


Note:

On successful execution of the script or command, the Command Event Handler logs a message with the following statement: The command completed successfully, along with the command statement that was executed. If there is an error when the command is executed, the Command Event Handler abends the Replicat process and logs the error message.

13 Using the Redshift Event Handler

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. The purpose of the Redshift Event Handler is to apply operations into Redshift tables. See Using the File Writer Handler.
• Detailed Functionality
Ensure that you use the Redshift Event handler as a downstream Event handler connected to the output of the S3 Event handler. The S3 Event handler loads files generated by the File Writer Handler into Amazon S3.
• Operation Aggregation
• Unsupported Operations and Limitations
• Uncompressed UPDATE records
It is mandatory that the trail files used to apply to Redshift contain uncompressed UPDATE operation records, which means that the UPDATE operations contain the full image of the row being updated.
• Error During the Data Load Process
Staging operation data from AWS S3 onto temporary staging tables and updating the target table occurs inside a single transaction. In case of any error(s), the entire transaction is rolled back and the Replicat process abends.
• Troubleshooting and Diagnostics
• Classpath
Redshift apply relies on the upstream File Writer handler and the S3 Event handler.
• Configuration
• Redshift COPY SQL Authorization
The Redshift event handler uses COPY SQL to read staged files in Amazon Web Services (AWS) S3 buckets. The COPY SQL query may need authorization credentials to access files in AWS S3.

13.1 Detailed Functionality

Ensure that you use the Redshift Event handler as a downstream event handler connected to the output of the S3 Event handler. The S3 Event handler loads files generated by the File Writer Handler into Amazon S3. The Redshift Event handler uses the COPY SQL command to bulk load operation data available in S3 into temporary Redshift staging tables. The staging table data is then used to update the target table. All SQL operations are performed in batches, providing better throughput.


13.2 Operation Aggregation

• Aggregation In Memory
Before loading the operation data into S3, the operations in the trail file are aggregated. Operation aggregation is the process of aggregating (merging/compressing) multiple operations on the same row into a single output operation based on a threshold.
• Aggregation using SQL post loading data into the staging table
In this aggregation operation, the in-memory operation aggregation need not be performed. The operation data loaded into the temporary staging table is aggregated using SQL queries, such that the staging table contains just one row per key.
13.2.1 Aggregation In Memory

Before loading the operation data into S3, the operations in the trail file are aggregated. Operation aggregation is the process of aggregating (merging/compressing) multiple operations on the same row into a single output operation based on a threshold.

Table 13-1 Configuration Properties

gg.aggregate.operations
Optional
Legal Values: true | false
Default: false
Explanation: Aggregate operations based on the primary key of the operation record.

13.2.2 Aggregation using SQL post loading data into the staging table

In this aggregation operation, the in-memory operation aggregation need not be performed. The operation data loaded into the temporary staging table is aggregated using SQL queries, such that the staging table contains just one row per key.

Table 13-2 Configuration Properties

gg.eventhandler.name.aggregateStagingTableRows
Optional
Legal Values: true | false
Default: false
Explanation: Use SQL to aggregate staging table data before updating the target table.
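As a reference, the following sketch shows how the two aggregation options described in this section could be set in the Java Adapter properties file. As noted in Troubleshooting and Diagnostics, the two settings are mutually exclusive, so only one should be enabled at a time; the handler name redshift reflects the auto-configured naming.

# In-memory operation aggregation (memory intensive):
gg.aggregate.operations=true
# Alternatively, aggregation using SQL in the staging table
# (mutually exclusive with the property above):
#gg.eventhandler.redshift.aggregateStagingTableRows=true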

13.3 Unsupported Operations and Limitations

The following operations are not supported by the Redshift Handler:
• DDL changes are not supported.


• Timestamp and Timestamp with Time Zone data types: The maximum precision supported is up to microseconds; the nanoseconds portion is truncated. This is a limitation observed with the Redshift COPY SQL command.
• Redshift COPY SQL limits the maximum size of a single input row from any source to 4 MB.
13.4 Uncompressed UPDATE records

It is mandatory that the trail files used to apply to Redshift contain uncompressed UPDATE operation records, which means that the UPDATE operations contain the full image of the row being updated. If UPDATE records have missing columns, then such columns are updated in the target as null. By setting the parameter gg.abend.on.missing.columns=true, Replicat can fail fast on detecting a compressed update trail record. This is the recommended setting, as shown in the following example.
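A minimal snippet of the recommended setting in the Java Adapter properties file:

# Fail fast if a compressed (partial-image) UPDATE record is detected in the trail.
gg.abend.on.missing.columns=true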

13.5 Error During the Data Load Process

Staging operation data from AWS S3 onto temporary staging tables and updating the target table occurs inside a single transaction. In case of any error, the entire transaction is rolled back and the Replicat process abends. If there are errors with the COPY SQL, then the Redshift system table stl_load_errors is also queried and the error traces are made available in the handler log file.
13.6 Troubleshooting and Diagnostics

• Connectivity issues to Redshift:
– Validate the JDBC connection URL, user name, and password.
– Check if an http/https proxy is enabled. Generally, Redshift endpoints cannot be accessed via proxy.
• DDL and TRUNCATE operations not applied on the target table: The Redshift handler ignores DDL and truncate records in the source trail file.
• Target table existence: It is expected that the Redshift target table exists before starting the apply process. Target tables need to be designed with primary keys, sort keys, and partition distribution key columns. Approximations based on the column metadata in the trail file may not always be correct. Therefore, Redshift apply abends if the target table is missing.
• Operation aggregation in memory (gg.aggregate.operations=true) is memory intensive, whereas operation aggregation using SQL (gg.eventhandler.name.aggregateStagingTableRows=true) requires more SQL processing on the Redshift database. These configurations are mutually exclusive and only one of them should be enabled at a time. Tests within Oracle have shown that operation aggregation in memory delivers a better apply rate. This may not always be the case on all customer deployments.
• Diagnostic information on the apply process is logged into the handler log file.


– In-memory operation aggregation time (in milliseconds):

INFO 2018-10-22 02:53:57.000980 [pool-5-thread-1] - Merge statistics ********START*********************************
INFO 2018-10-22 02:53:57.000980 [pool-5-thread-1] - Number of update operations merged into an existing update operation: [232653]
INFO 2018-10-22 02:53:57.000980 [pool-5-thread-1] - Time spent aggregating operations : [22064]
INFO 2018-10-22 02:53:57.000980 [pool-5-thread-1] - Time spent flushing aggregated operations : [36382]
INFO 2018-10-22 02:53:57.000980 [pool-5-thread-1] - Merge statistics ********END***********************************

• Stage and load processing time (in milliseconds) for SQL queries:

INFO 2018-10-22 02:54:19.000338 [pool-4-thread-1] - Stage and load statistics ********START*********************************
INFO 2018-10-22 02:54:19.000338 [pool-4-thread-1] - Time spent for staging process [277093]
INFO 2018-10-22 02:54:19.000338 [pool-4-thread-1] - Time spent for load process [32650]
INFO 2018-10-22 02:54:19.000338 [pool-4-thread-1] - Stage and load statistics ********END***********************************

• Stage time (in milliseconds) also includes additional statistics if operation aggregation using SQL is enabled.
• Co-existence of the components: The location/region of the machine where the Replicat process is running, the AWS S3 bucket region, and the Redshift cluster region impact the overall throughput of the apply process. The data flow is as follows: GoldenGate => AWS S3 => AWS Redshift. For best throughput, the components need to be located as close as possible to each other.
13.7 Classpath

Redshift apply relies on the upstream File Writer handler and the S3 Event handler. Include the required jars needed to run the S3 Event handler in gg.classpath. See Using the S3 Event Handler. The Redshift Event handler uses the Redshift JDBC driver. Ensure that you include the jar file in gg.classpath as shown in the following example:

gg.classpath=aws-java-sdk-1.11.356/lib/*:aws-java-sdk-1.11.356/third-party/lib/*:./RedshiftJDBC42-no-awssdk-1.2.8.1005.jar

13.8 Configuration

Automatic Configuration
AWS Redshift data warehouse replication involves configuring multiple components, such as the File Writer handler, the S3 event handler, and the Redshift event handler.


The Automatic Configuration feature auto configures these components so that you need to perform minimal configuration. The properties modified by auto configuration are also logged in the handler log file. To enable auto configuration to replicate to the Redshift target, set the parameter: gg.target=redshift

gg.target
Required
Legal Value: redshift
Default: None
Explanation: Enables replication to the Redshift target.

When replicating to the Redshift target, customization of the S3 event handler name and the Redshift event handler name is not allowed.

File Writer Handler Configuration
The File Writer handler name is pre-set to the value redshift. The following is an example of editing a property of the File Writer handler:
gg.handler.redshift.pathMappingTemplate=./dirout

S3 Event Handler Configuration
The S3 event handler name is pre-set to the value s3. The following is an example of editing a property of the S3 event handler:
gg.eventhandler.s3.bucketMappingTemplate=bucket1

Redshift Event Handler Configuration
The Redshift event handler name is pre-set to the value redshift.

Table 13-3 Properties

gg.eventhandler.redshift.connectionURL
Required
Legal Value: Redshift JDBC connection URL
Default: None
Explanation: Sets the Redshift JDBC connection URL. Example: jdbc:redshift://aws-redshift-instance.cjoaij3df5if.us-east-2.redshift.amazonaws.com:5439/mydb

gg.eventhandler.redshift.UserName
Required
Legal Value: JDBC user name
Default: None
Explanation: Sets the Redshift database user name.

gg.eventhandler.redshift.Password
Required
Legal Value: JDBC password
Default: None
Explanation: Sets the Redshift database password.


Table 13-3 (Cont.) Properties

gg.eventhandler.redshift.awsIamRole
Optional
Legal Value: AWS role ARN in the format: arn:aws:iam::<aws_account_id>:role/<role_name>
Default: None
Explanation: AWS IAM role ARN that the Redshift cluster uses for authentication and authorization for executing COPY SQL to access objects in AWS S3 buckets.

gg.eventhandler.redshift.useAwsSecurityTokenService
Optional
Legal Values: true | false
Default: Value is set from the configuration property set in the upstream S3 Event handler, gg.eventhandler.s3.enableSTS.
Explanation: Use AWS Security Token Service for authorization. For more information, see Redshift COPY SQL Authorization.

gg.eventhandler.redshift.awsSTSEndpoint
Optional
Legal Value: A valid HTTPS URL.
Default: Value is set from the configuration property set in the upstream S3 Event handler, gg.eventhandler.s3.stsURL.
Explanation: The AWS STS endpoint string. For example: https://sts.us-east-1.amazonaws.com. For more information, see Redshift COPY SQL Authorization.

gg.eventhandler.redshift.awsSTSRegion
Optional
Legal Value: A valid AWS region.
Default: Value is set from the configuration property set in the upstream S3 Event handler, gg.eventhandler.s3.stsRegion.
Explanation: The AWS STS region. For example, us-east-1. For more information, see Redshift COPY SQL Authorization.

End-to-End Configuration
The following is an end-to-end configuration example that uses auto configuration for the File Writer handler, the S3 Event handler, and the Redshift Event handler. The sample properties are available at the following location (in an Oracle GoldenGate Classic install): /AdapterExamples/big-data/redshift-via-s3/rs.props

# Configuration to load GoldenGate trail operation records
# into Amazon Redshift by chaining
# File writer handler -> S3 Event handler -> Redshift Event handler.
# Note: Recommended to only edit the configuration marked as TODO
gg.target=redshift

#The S3 Event Handler


#TODO: Edit the AWS region
gg.eventhandler.s3.region=
#TODO: Edit the AWS S3 bucket
gg.eventhandler.s3.bucketMappingTemplate=

#The Redshift Event Handler
#TODO: Edit ConnectionUrl
gg.eventhandler.redshift.connectionURL=jdbc:redshift://aws-redshift-instance.cjoaij3df5if.us-east-2.redshift.amazonaws.com:5439/mydb
#TODO: Edit Redshift user name
gg.eventhandler.redshift.UserName=
#TODO: Edit Redshift password
gg.eventhandler.redshift.Password=
#TODO: Set the classpath to include AWS Java SDK and Redshift JDBC driver.
gg.classpath=aws-java-sdk-1.11.356/lib/*:aws-java-sdk-1.11.356/third-party/lib/*:./RedshiftJDBC42-no-awssdk-1.2.8.1005.jar
jvm.bootoptions=-Xmx8g -Xms32m

13.9 Redshift COPY SQL Authorization

The Redshift event handler uses COPY SQL to read staged files in Amazon Web Services (AWS) S3 buckets. The COPY SQL query may need authorization credentials to access files in AWS S3. Authorization can be provided by using an AWS Identity and Access Management (IAM) role that is attached to the Redshift cluster or by providing an AWS access key and a secret for the access key. As a security consideration, it is a best practice to use role-based access when possible.

AWS Key-Based Authorization
With key-based access control, you provide the access key ID and secret access key for an AWS IAM user that is authorized to access AWS S3. The access key ID and secret access key are retrieved by looking up the credentials as follows:

1. Environment variables: AWS_ACCESS_KEY/AWS_ACCESS_KEY_ID and AWS_SECRET_KEY/AWS_SECRET_ACCESS_KEY.
2. Java system properties: aws.accessKeyId and aws.secretKey (see the sketch after this list).
3. Credential profiles file at the default location (~/.aws/credentials).
4. Amazon Elastic Container Service (ECS) container credentials, loaded from Amazon ECS if the environment variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is set.
5. Instance profile credentials, retrieved from the Amazon Elastic Compute Cloud (EC2) metadata service.
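As a sketch only, the Java system properties from item 2 of the preceding list could be supplied through jvm.bootoptions; the key values shown are placeholders, not real credentials, and the memory settings are illustrative.

# Hypothetical example: pass AWS credentials as Java system properties.
# Replace the placeholder values with credentials for an IAM user authorized for S3.
jvm.bootoptions=-Xmx8g -Xms32m -Daws.accessKeyId=<your-access-key-id> -Daws.secretKey=<your-secret-access-key>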


Running Replicat on an AWS EC2 Instance
If the Replicat process is started on an AWS EC2 instance, then the access key ID and secret access key are automatically retrieved by Oracle GoldenGate for Big Data and no explicit user configuration is required.

Temporary Security Credentials Using AWS Security Token Service (STS)
If you use key-based access control, then you can further limit the access that users have to your data by retrieving temporary security credentials using the AWS Security Token Service. The auto configure feature of the Redshift event handler automatically picks up the AWS Security Token Service (STS) configuration from the S3 event handler.

Table 13-4 S3 Event Handler Configuration and Redshift Event Handler Configuration

S3 Event Handler Configuration | Redshift Event Handler Configuration
enableSTS | useAwsSTS
stsURL | awsSTSEndpoint
stsRegion | awsSTSRegion
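A minimal sketch of how STS could be enabled on the upstream S3 event handler so that the Redshift event handler inherits the settings through auto configuration; the endpoint and region values are illustrative.

# Enable AWS STS on the upstream S3 event handler; the Redshift event
# handler picks up these values through auto configuration.
gg.eventhandler.s3.enableSTS=true
gg.eventhandler.s3.stsURL=https://sts.us-east-1.amazonaws.com
gg.eventhandler.s3.stsRegion=us-east-1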

AWS IAM Role-Based Authorization
With role-based authorization, the Redshift cluster temporarily assumes an IAM role when executing COPY SQL. You need to provide the role Amazon Resource Name (ARN) as a configuration value as follows: gg.eventhandler.redshift.AwsIamRole. For example: gg.eventhandler.redshift.AwsIamRole=arn:aws:iam:::role/ . The role needs to be authorized to read the respective S3 bucket. Ensure that the trust relationship of the role contains the AWS Redshift service. Additionally, attach this role to the Redshift cluster before starting the Redshift cluster. The following is an example of an AWS IAM policy that can be used in the trust relationship of the role:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": [ "redshift.amazonaws.com" ] }, "Action": "sts:AssumeRole" } ] }

If the role-based authorization is configured (gg.eventhandler.redshift.AwsIamRole), then it is given priority over key-based authorization.

14 Using the Autonomous Data Warehouse Event Handler

Oracle Autonomous Data Warehouse (ADW) is a fully managed database tuned and optimized for data warehouse workloads with the market-leading performance of Oracle Database.
• Detailed Functionality
The ADW Event handler is used as a downstream Event handler connected to the output of the OCI Object Storage Event handler. The OCI Event handler loads files generated by the File Writer Handler into Oracle OCI Object Storage. All the SQL operations are performed in batches, providing better throughput.
• ADW Database Credential to Access OCI ObjectStore File
• ADW Database User Privileges
ADW databases come with a predefined database role named DWROLE. If the ADW 'admin' user is not being used, then the database user needs to be granted the role DWROLE.
• Unsupported Operations/Limitations
• Troubleshooting and Diagnostics
• Classpath
ADW apply relies on the upstream File Writer handler and the OCI Event handler. Include the required jars needed to run the OCI Event handler in gg.classpath.
• Configuration
14.1 Detailed Functionality

The ADW Event handler is used as a downstream Event handler connected to the output of the OCI Object Storage Event handler. The OCI Event handler loads files generated by the File Writer Handler into Oracle OCI Object storage. All the SQL operations are performed in batches providing better throughput.

14.2 ADW Database Credential to Access OCI ObjectStore File

To access the OCI ObjectStore File:

1. A PL/SQL procedure needs to be run to create a credential to access Oracle Cloud Infrastructure (OCI) Object store files.


2. An OCI authentication token needs to be generated under User settings from the OCI console. See CREATE_CREDENTIAL in Using Oracle Autonomous Data Warehouse on Shared Exadata Infrastructure. For example:

BEGIN
  DBMS_CLOUD.create_credential (
    credential_name => 'OGGBD-CREDENTIAL',
    username => 'oci-user',
    password => 'oci-auth-token');
END;
/

3. The credential name can be configured using the following property: gg.eventhandler.adw.objectStoreCredential. For example: gg.eventhandler.adw.objectStoreCredential=OGGBD-CREDENTIAL.
14.3 ADW Database User Privileges

ADW databases come with a predefined database role named DWROLE. If the ADW 'admin' user is not being used, then the database user needs to be granted the role DWROLE.

This role provides the privileges required for data warehouse operations. For example, the following command grants DWROLE to the user dbuser-1: GRANT DWROLE TO dbuser-1;

Note:

Ensure that you do not use Oracle-created database user ggadmin for ADW replication, because this user lacks the INHERIT privilege.

14.4 Unsupported Operations/Limitations

• DDL changes are not supported.
• Replication of Oracle Object data types is not supported.
• If the GoldenGate trail is generated by Oracle Integrated capture, then for UPDATE operations on a source LOB column, only the changed portion of the LOB is written to the trail file. Oracle GoldenGate for Big Data Autonomous Data Warehouse (ADW) apply doesn't support replication of partial LOB columns in the trail file.
14.5 Troubleshooting and Diagnostics

• Connectivity Issues to ADW
– Validate the JDBC connection URL, user name, and password.


– Check if an http/https proxy is enabled. See the ADW proxy configuration: Prepare for Oracle Call Interface (OCI), ODBC, and JDBC OCI Connections in Using Oracle Autonomous Data Warehouse on Shared Exadata Infrastructure.
• DDL not applied on the target table: The ADW handler ignores DDL.
• Target table existence: It is expected that the ADW target table exists before starting the apply process. Target tables need to be designed with appropriate primary keys, indexes, and partitions. Approximations based on the column metadata in the trail file may not always be correct. Therefore, Replicat abends if the target table is missing.
• Diagnostic throughput information on the apply process is logged into the handler log file. For example:

File Writer finalized 29525834 records (rate: 31714) (start time: 2020-02-10 01:25:32.000579) (end time: 2020-02-10 01:41:03.000606).

In this sample log message:
– This message provides details about the end-to-end throughput of the File Writer handler and the downstream event handlers (the OCI Event handler and the ADW event handler).
– The throughput rate takes into account the wait times incurred before rolling over files.
– The throughput rate also takes into account the time taken by the OCI event handler and the ADW event handler to process operations.
– The above example indicates that 29525834 operations were finalized at the rate of 31714 operations per second between start time [2020-02-10 01:25:32.000579] and end time [2020-02-10 01:41:03.000606].
Example:

INFO 2019-10-01 00:36:49.000490 [pool-8-thread-1] - Begin DWH Apply stage and load statistics ********START*********************************

INFO 2019-10-01 00:36:49.000490 [pool-8-thread-1] - Time spent for staging process [2074 ms]
INFO 2019-10-01 00:36:49.000490 [pool-8-thread-1] - Time spent for merge process [992550 ms]
INFO 2019-10-01 00:36:49.000490 [pool-8-thread-1] - [31195516] operations processed, rate[31,364]operations/sec.

INFO 2019-10-01 00:36:49.000490 [pool-8-thread-1] - End DWH Apply stage and load statistics ********END***********************************
INFO 2019-10-01 00:37:18.000230 [pool-6-thread-1] - Begin OCI Event handler upload statistics ********START*********************************


INFO 2019-10-01 00:37:18.000230 [pool-6-thread-1] - Time spent loading files into ObjectStore [71789 ms]
INFO 2019-10-01 00:37:18.000230 [pool-6-thread-1] - [31195516] operations processed, rate[434,545] operations/sec.
INFO 2019-10-01 00:37:18.000230 [pool-6-thread-1] - End OCI Event handler upload statistics ********END***********************************

In this example:
ADW Event handler throughput:
– In the above log message, the statistics for the ADW event handler are reported as DWH Apply stage and load statistics. ADW is classified as a Data Warehouse (DWH), hence this name.
– Here, 31195516 operations from the source trail file were applied to the ADW database at the rate of 31364 operations per second.
– ADW uses stage and merge. The time spent on staging is 2074 milliseconds and the time spent on executing the merge SQL is 992550 milliseconds.
OCI Event handler throughput:
– In the above log message, the statistics for the OCI event handler are reported as OCI Event handler upload statistics.
– Here, 31195516 operations from the source trail file were uploaded to the OCI object store at the rate of 434545 operations per second.
• Errors due to the ADW credential missing grants to read OCI object store files:
– A SQL exception indicating an authorization failure is logged in the handler log file. For example: java.sql.SQLException: ORA-20401: Authorization failed for URI - https://objectstorage.us-ashburn-1.oraclecloud.com/n/some_namespace/b/some_bucket/o/ADMIN.NLS_AllTypes/ADMIN.NLS_AllTypes_2019-12-16_11-44-01.237.avro
• Errors in file format/column data: In case the ADW Event handler is unable to read data from the external staging table due to column data errors, the Oracle GoldenGate for Big Data handler log file provides diagnostic information to debug the issue. The following details are available in the log file:
– JOB ID
– SID
– SERIAL #
– ROWS_LOADED
– START_TIME
– UPDATE_TIME
– STATUS
– TABLE_NAME
– OWNER_NAME
– FILE_URI_LIST


– LOGFILE_TABLE
– BADFILE_TABLE
The contents of the LOGFILE_TABLE and BADFILE_TABLE should indicate the specific record, the column(s) in the record that have the error, and the cause of the error. This information is also queried automatically by the ADW Event handler and logged into the OGGBD FW handler log file. Based on the root cause of the error, the customer can take action. In many cases, customers would have to modify the target table definition based on the source column data types and restart Replicat. In other cases, customers may also want to modify the mapping in the Replicat prm file. For this, Oracle recommends that they re-position Replicat to start from the beginning.
• Any other SQL errors: In case there are any errors while executing any SQL, the entire SQL statement along with the bind parameter values is logged into the OGGBD handler log file.
• Co-existence of the components: The location/region of the machine where the Replicat process is running, the OCI Object Storage bucket region, and the ADW region impact the overall throughput of the apply process. The data flow is as follows: GoldenGate => OCI Object store => ADW. For best throughput, the components need to be located as close as possible to each other.
• Debugging row count mismatch on the target table: For better throughput, the ADW event handler does not validate the row counts modified on the target table. You can enable row count matching by using the Java System property disable.row.count.validation. To enable row count validation, provide this property in jvm.bootoptions as follows: jvm.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm -Ddisable.row.count.validation=false
• Replicat ABEND due to partial LOB records in the trail file: Oracle GoldenGate for Big Data ADW apply does not support replication of partial LOBs. The trail file needs to be regenerated by Oracle Integrated capture using the TRANLOGOPTIONS FETCHPARTIALLOB option in the Extract parameter file.
• Throughput gain with uncompressed UPDATE trails: If the source trail files contain the full image (all the column values of the respective table) of the row being updated, then you can include the JVM boot option -Dcompressed.update=false in the configuration property jvm.bootoptions, as shown in the sketch after this list. For certain workloads and ADW instance shapes, this configuration may provide better throughput. You may need to test the throughput gain on your environment.
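A minimal sketch of the uncompressed-UPDATE boot option mentioned in the last bullet; the memory settings are illustrative only.

# Only set this when the source trail contains full-image UPDATE records.
jvm.bootoptions=-Xmx8g -Xms8g -Dcompressed.update=false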

14.6 Classpath

ADW apply relies on the upstream File Writer handler and the OCI Event handler. Include the required jars needed to run the OCI Event handler in gg.classpath.

The ADW Event handler uses the Oracle JDBC driver and its dependencies. The Autonomous Data Warehouse JDBC driver and other required dependencies are packaged with Oracle GoldenGate for Big Data. For example:
gg.classpath=./oci-java-sdk/lib/*:./oci-java-sdk/third-party/lib/*


14.7 Configuration

• Automatic Configuration
Autonomous Data Warehouse (ADW) replication involves configuring multiple components, such as the File Writer handler, the OCI event handler, and the ADW event handler.
• File Writer Handler Configuration
The File Writer handler name is pre-set to the value adw. The following is an example of editing a property of the File Writer handler: gg.handler.adw.pathMappingTemplate=./dirout
• OCI Event Handler Configuration
The OCI event handler name is pre-set to the value oci.
• ADW Event Handler Configuration
The ADW event handler name is pre-set to the value adw.
• End-to-End Configuration
14.7.1 Automatic Configuration

Autonomous Data Warehouse (ADW) replication involves configuring multiple components, such as the File Writer handler, the OCI event handler, and the ADW event handler. The Automatic Configuration functionality auto configures these components so that the user configuration is minimal. The properties modified by auto configuration are also logged in the handler log file. To enable auto configuration to replicate to the ADW target, set the parameter gg.target=adw.

gg.target
Required
Legal Value: adw
Default: None
Explanation: Enables replication to the ADW target.

When replicating to the ADW target, customization of the OCI event handler name and the ADW event handler name is not allowed.
14.7.2 File Writer Handler Configuration

The File Writer handler name is pre-set to the value adw. The following is an example of editing a property of the File Writer handler: gg.handler.adw.pathMappingTemplate=./dirout
14.7.3 OCI Event Handler Configuration

The OCI event handler name is pre-set to the value oci. The following is an example of editing a property of the OCI event handler: gg.eventhandler.oci.profile=DEFAULT


14.7.4 ADW Event Handler Configuration

ADW event handler name is pre-set to the value adw.

The following are the ADW event handler configurations:

gg.eventhandler.adw.connectionURL
Required
Legal Value: ADW JDBC connection URL.
Default: none
Explanation: Sets the ADW JDBC connection URL. Example: jdbc:oracle:thin:@adw20190410ns_medium?TNS_ADMIN=/home/sanav/projects/adw/wallet

gg.eventhandler.adw.UserName
Required
Legal Value: JDBC user name.
Default: none
Explanation: Sets the ADW database user name.

gg.eventhandler.adw.Password
Required
Legal Value: JDBC password.
Default: none
Explanation: Sets the ADW database password.

gg.eventhandler.adw.maxStatements
Optional
Legal Values: Integer value between 1 and 250.
Default: 250
Explanation: Use this parameter to control the number of prepared SQL statements that can be used.

gg.eventhandler.adw.maxConnections
Optional
Legal Values: Integer value.
Default: 10
Explanation: Use this parameter to control the number of concurrent JDBC database connections to the target ADW database.

gg.eventhandler.adw.dropStagingTablesOnShutdown
Optional
Legal Values: true | false
Default: false
Explanation: If set to true, the temporary staging tables created by the ADW event handler are dropped on Replicat graceful stop.

gg.eventhandler.adw.objectStoreCredential
Required
Legal Value: A database credential name.
Default: none
Explanation: ADW database credential to access OCI object store files.


14.7.5 End-to-End Configuration

The following is an end-to-end configuration example that uses auto configuration for the File Writer handler and the OCI and ADW Event handlers. This sample properties file can also be found at OGGBD_InstallDir/AdapterExamples/big-data/adw-via-oci/adw.props

# Configuration to load GoldenGate trail operation records
# into Autonomous Data Warehouse (ADW) by chaining
# File writer handler -> OCI Event handler -> ADW Event handler.
# Note: Recommended to only edit the configuration marked as TODO
gg.target=adw

##The OCI Event handler
# TODO: Edit the OCI config file path.
gg.eventhandler.oci.configFilePath=
# TODO: Edit the OCI profile name.
gg.eventhandler.oci.profile=DEFAULT
# TODO: Edit the OCI namespace.
gg.eventhandler.oci.namespace=
# TODO: Edit the OCI region.
gg.eventhandler.oci.region=
# TODO: Edit the OCI compartment identifier.
gg.eventhandler.oci.compartmentID=
gg.eventhandler.oci.pathMappingTemplate=${fullyQualifiedTableName}
# TODO: Edit the OCI bucket name.
gg.eventhandler.oci.bucketMappingTemplate=

##The ADW Event Handler
# TODO: Edit the ADW JDBC connectionURL
gg.eventhandler.adw.connectionURL=jdbc:oracle:thin:@adw20190410ns_medium?TNS_ADMIN=/path/to/adw/wallet
# TODO: Edit the ADW JDBC user
gg.eventhandler.adw.UserName=
# TODO: Edit the ADW JDBC password
gg.eventhandler.adw.Password=
# TODO: Edit the ADW Credential that can access the OCI Object Store.
gg.eventhandler.adw.objectStoreCredential=
# TODO: Set the classpath to include OCI Java SDK.
gg.classpath=./oci-java-sdk/lib/*:./oci-java-sdk/third-party/lib/*
# TODO: Edit to provide sufficient memory (at least 8GB).
jvm.bootoptions=-Xmx8g -Xms8g

15 Using the HBase Handler

The HBase Handler is used to populate HBase tables from existing Oracle GoldenGate supported sources. This chapter describes how to use the HBase Handler.
• Overview
• Detailed Functionality
• Setting Up and Running the HBase Handler
• Security
• Metadata Change Events
The HBase Handler seamlessly accommodates metadata change events including adding a column or dropping a column. The only requirement is that the source trail file contains the metadata.
• Additional Considerations
• Troubleshooting the HBase Handler
Troubleshooting of the HBase Handler begins with the contents of the Java log4j file. Follow the directions in the Java Logging Configuration to configure the runtime to correctly generate the Java log4j log file.
15.1 Overview

HBase is an open source Big Data application that emulates much of the functionality of a relational database management system (RDBMS). Hadoop is specifically designed to store large amounts of unstructured data. Conversely, data stored in databases and replicated through Oracle GoldenGate is highly structured. HBase provides a way to maintain the important structure of data while taking advantage of the horizontal scaling that is offered by the Hadoop Distributed File System (HDFS).
15.2 Detailed Functionality

The HBase Handler takes operations from the source trail file and creates corresponding tables in HBase, and then loads change capture data into those tables.

HBase Table Names
Table names created in HBase map to the corresponding table name of the operation from the source trail file. Table names are case-sensitive.

HBase Table Namespace
For two-part table names (schema name and table name), the schema name maps to the HBase table namespace. For a three-part table name like Catalog.Schema.MyTable, the created HBase namespace would be Catalog_Schema.


HBase table namespaces are case sensitive. A null schema name is supported and maps to the default HBase namespace.

HBase Row Key
HBase has a concept similar to database primary keys, called the HBase row key. The HBase row key is the unique identifier for a table row. HBase only supports a single row key per row and it cannot be empty or null. The HBase Handler maps the primary key value into the HBase row key value. If the source table has multiple primary keys, then the primary key values are concatenated, separated by a pipe delimiter (|). You can configure the HBase row key delimiter.

If there are no primary/unique keys at the source table, then Oracle GoldenGate behaves as follows:
• If KEYCOLS is specified, then it constructs the key based on the specifications defined in the KEYCOLS clause.
• If KEYCOLS is not specified, then it constructs a key based on the concatenation of all eligible columns of the table. The result is that the value of every column is concatenated to generate the HBase row key. However, this is not a good practice.
Workaround: Use the Replicat mapping statement to identify one or more primary key columns. For example:
MAP QASOURCE.TCUSTORD, TARGET QASOURCE.TCUSTORD, KEYCOLS (CUST_CODE);
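For example, the row key delimiter mentioned above can be changed from the default pipe character by setting the handler property, shown here with the sample handler name hbase and an underscore delimiter purely as an illustration.

# Use an underscore instead of the default pipe (|) between concatenated
# primary key values in the HBase row key. CDATA[] wrapping preserves the value.
gg.handler.hbase.rowkeyDelimiter=CDATA[_]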

HBase Column Family
HBase has the concept of a column family. A column family is a way to group column data. Only a single column family is supported. Every HBase column must belong to a single column family. The HBase Handler provides a single column family per table that defaults to cf. You can configure the column family name. However, after a table is created with a specific column family name, reconfiguring the column family name in the HBase Handler without first modifying or dropping the table results in an abend of the Oracle GoldenGate Replicat process.
15.3 Setting Up and Running the HBase Handler

HBase must run either collocated with the HBase Handler process or on a machine that is network-accessible from the machine hosting the HBase Handler process. The underlying HDFS single instance or clustered instance serving as the repository for HBase data must also be running. Instructions for configuring the HBase Handler components and running the handler are described in this topic.
• Classpath Configuration
• HBase Handler Configuration
• Sample Configuration
• Performance Considerations


15.3.1 Classpath Configuration

For the HBase Handler to connect to HBase and stream data, the hbase-site.xml file and the HBase client jars must be configured in the gg.classpath variable. The HBase client jars must match the version of HBase to which the HBase Handler is connecting. The HBase client jars are not shipped with the Oracle GoldenGate for Big Data product. HBase Handler Client Dependencies lists the required HBase client jars by version. The default location of the hbase-site.xml file is HBase_Home/conf.

The default location of the HBase client JARs is HBase_Home/lib/*.

If the HBase Handler is running on Windows, follow the Windows classpathing syntax. The gg.classpath must be configured exactly as described. The path to the hbase-site.xml file must contain only the path with no wildcard appended. The inclusion of the * wildcard in the path to the hbase-site.xml file will cause it to be inaccessible. Conversely, the path to the dependency jars must include the (*) wildcard character in order to include all the jar files in that directory in the associated classpath. Do not use *.jar. The following is an example of a correctly configured gg.classpath variable:
gg.classpath=/var/lib/hbase/lib/*:/var/lib/hbase/conf
15.3.2 HBase Handler Configuration

The following are the configurable values for the HBase Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the HBase Handler, you must first configure the handler type by specifying gg.handler.name.type=hbase and the other HBase properties as follows:

Table 15-1 HBase Handler Configuration Properties

gg.handlerlist
Required
Legal Values: Any string.
Default: None
Explanation: Provides a name for the HBase Handler. The HBase Handler name then becomes part of the property names listed in this table.

gg.handler.name.type
Required
Legal Values: hbase
Default: None
Explanation: Selects the HBase Handler for streaming change data capture into HBase.

gg.handler.name.hBaseColumnFamilyName
Optional
Legal Values: Any string legal for an HBase column family name.
Default: cf
Explanation: Column family is a grouping mechanism for columns in HBase. The HBase Handler only supports a single column family.


Table 15-1 (Cont.) HBase Handler Configuration Properties

gg.handler.name.HBase20Compatible
Optional
Legal Values: true | false
Default: false (HBase 1.0 compatible)
Explanation: HBase 2.x removed methods and changed object hierarchies. The result is that it broke binary compatibility with HBase 1.x. Set this property to true to correctly interface with HBase 2.x, otherwise HBase 1.x compatibility is used.

gg.handler.name.includeTokens
Optional
Legal Values: true | false
Default: false
Explanation: Using true indicates that token values are included in the output to HBase. Using false means token values are not to be included.

gg.handler.name.keyValueDelimiter
Optional
Legal Values: Any string.
Default: =
Explanation: Provides a delimiter between key values in a map. For example, key=value,key1=value1,key2=value2. Tokens are mapped values. Configuration value supports CDATA[] wrapping.

gg.handler.name.keyValuePairDelimiter
Optional
Legal Values: Any string.
Default: ,
Explanation: Provides a delimiter between key value pairs in a map. For example, key=value,key1=value1,key2=value2. Tokens are mapped values. Configuration value supports CDATA[] wrapping.

gg.handler.name.encoding
Optional
Legal Values: Any encoding name or alias supported by Java (see footnote 1). For a list of supported options, see https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html.
Default: The native system encoding of the machine hosting the Oracle GoldenGate process
Explanation: Determines the encoding of values written to HBase. HBase values are written as bytes.


Table 15-1 (Cont.) HBase Handler Configuration Properties

gg.handler.name.pkUpdateHandling
Optional
Legal Values: abend | update | delete-insert
Default: abend
Explanation: Provides configuration for how the HBase Handler should handle update operations that change a primary key. Primary key operations can be problematic for the HBase Handler and require special consideration by you.
· abend: indicates the process will end abnormally.
· update: indicates the process will treat this as a normal update.
· delete-insert: indicates the process will treat this as a delete and an insert. The full before image is required for this feature to work properly. This can be achieved by using full supplemental logging in Oracle Database. Without full before and after row images, the insert data will be incomplete.

gg.handler.name.nullValueRepresentation
Optional
Legal Values: Any string.
Default: NULL
Explanation: Allows you to configure what will be sent to HBase in the case of a NULL column value. The default is NULL. Configuration value supports CDATA[] wrapping.

gg.handler.name.authType
Optional
Legal Values: kerberos
Default: None
Explanation: Setting this property to kerberos enables Kerberos authentication.

gg.handler.name.kerberosKeytabFile
Optional (Required if authType=kerberos)
Legal Values: Relative or absolute path to a Kerberos keytab file.
Default: -
Explanation: The keytab file allows the HBase Handler to access a password to perform a kinit operation for Kerberos security.

gg.handler.name.kerberosPrincipal
Optional (Required if authType=kerberos)
Legal Values: A legal Kerberos principal name (for example, user/FQDN@MY.REALM).
Default: -
Explanation: The Kerberos principal name for Kerberos authentication.

gg.handler.name.rowkeyDelimiter
Optional
Legal Values: Any string.
Default: |
Explanation: Configures the delimiter between primary key values from the source table when generating the HBase rowkey. This property supports CDATA[] wrapping of the value to preserve whitespace if the user wishes to delimit incoming primary key values with a character or characters determined to be whitespace.


Table 15-1 (Cont.) HBase Handler Configuration Properties

gg.handler.name.setHBaseOperationTimestamp
Optional
Legal Values: true | false
Default: true
Explanation: Set to true to set the timestamp for HBase operations in the HBase Handler instead of allowing HBase to assign the timestamps on the server side. This property can be used to solve the problem of a row delete followed by an immediate reinsert of the row not showing up in HBase, see HBase Handler Delete-Insert Problem.

gg.handler.name.omitNullValues
Optional
Legal Values: true | false
Default: false
Explanation: Set to true to omit null fields from being written.

1 See Java Internationalization Support at https://docs.oracle.com/javase/8/docs/technotes/guides/intl/.
15.3.3 Sample Configuration

The following is a sample configuration for the HBase Handler from the Java Adapter properties file:

gg.handlerlist=hbase
gg.handler.hbase.type=hbase
gg.handler.hbase.mode=tx
gg.handler.hbase.hBaseColumnFamilyName=cf
gg.handler.hbase.includeTokens=true
gg.handler.hbase.keyValueDelimiter=CDATA[=]
gg.handler.hbase.keyValuePairDelimiter=CDATA[,]
gg.handler.hbase.encoding=UTF-8
gg.handler.hbase.pkUpdateHandling=abend
gg.handler.hbase.nullValueRepresentation=CDATA[NULL]
gg.handler.hbase.authType=none

15.3.4 Performance Considerations

At each transaction commit, the HBase Handler performs a flush call to flush any buffered data to the HBase region server. This must be done to maintain write durability. Flushing to the HBase region server is an expensive call, and performance can be greatly improved by using the Replicat GROUPTRANSOPS parameter to group multiple smaller transactions in the source trail file into a larger single transaction applied to HBase. You can use Replicat grouping by adding the GROUPTRANSOPS parameter to the Replicat configuration file, as shown in the example that follows. Operations from multiple transactions are grouped together into a larger transaction, and it is only at the end of the grouped transaction that the transaction is committed.
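The following is a minimal sketch of a Replicat parameter file fragment; the Replicat group name and the value 1000 are illustrative only.

-- Replicat parameter file fragment (illustrative)
REPLICAT rhbase
-- ... other Replicat parameters and MAP statements ...
GROUPTRANSOPS 1000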


15.4 Security

You can secure HBase connectivity using Kerberos authentication. Follow the associated documentation for the HBase release to secure the HBase cluster. The HBase Handler can connect to Kerberos secured clusters. The HBase hbase-site.xml must be in the handler's classpath with the hbase.security.authentication property set to kerberos and the hbase.security.authorization property set to true.

You have to include the directory containing the HDFS core-site.xml file in the classpath. Kerberos authentication is performed using the Hadoop UserGroupInformation class. This class relies on the Hadoop configuration property hadoop.security.authentication being set to kerberos to successfully perform the kinit command.

Additionally, you must set the following properties in the HBase Handler Java configuration file:

gg.handler.{name}.authType=kerberos
gg.handler.{name}.kerberosPrincipal={legal Kerberos principal name}
gg.handler.{name}.kerberosKeytabFile={path to a keytab file that contains the password for the Kerberos principal so that the Oracle GoldenGate HBase handler can programmatically perform the Kerberos kinit operations to obtain a Kerberos ticket}
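For example, with a handler named hbase, the Kerberos settings might look like the following; the principal and keytab path are placeholders, not values from this guide.

gg.handler.hbase.authType=kerberos
# Placeholder principal and keytab path; substitute your own values.
gg.handler.hbase.kerberosPrincipal=ogguser/fqdn.example.com@EXAMPLE.REALM
gg.handler.hbase.kerberosKeytabFile=/etc/security/keytabs/ogguser.keytab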

You may encounter the inability to decrypt the Kerberos password from the keytab file. This causes the Kerberos authentication to fall back to interactive mode, which cannot work because it is being invoked programmatically. The cause of this problem is that the Java Cryptography Extension (JCE) is not installed in the Java Runtime Environment (JRE). Ensure that the JCE is loaded in the JRE, see http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html.
15.5 Metadata Change Events

The HBase Handler seamlessly accommodates metadata change events including adding a column or dropping a column. The only requirement is that the source trail file contains the metadata.

15.6 Additional Considerations

Classpath issues are common during the initial setup of the HBase Handler. The typical indicators are occurrences of the ClassNotFoundException in the Java log4j log file. The HBase client jars do not ship with Oracle GoldenGate for Big Data. You must resolve the required HBase client jars. HBase Handler Client Dependencies includes a list of HBase client jars for each supported version. A typical cause is that either the hbase-site.xml file or one or more of the required client JARs are not included in the classpath. For instructions on configuring the classpath of the HBase Handler, see Classpath Configuration.


15.7 Troubleshooting the HBase Handler

Troubleshooting of the HBase Handler begins with the contents of the Java log4j file. Follow the directions in the Java Logging Configuration to configure the runtime to correctly generate the Java log4j log file.

• Java Classpath
• HBase Connection Properties
• Logging of Handler Configuration
• HBase Handler Delete-Insert Problem
15.7.1 Java Classpath

Issues with the Java classpath are common. A ClassNotFoundException in the Java log4j log file indicates a classpath problem. You can use the Java log4j log file to troubleshoot this issue. Setting the log level to DEBUG logs each of the jars referenced in the gg.classpath object to the log file. You can make sure that all of the required dependency jars are resolved by enabling DEBUG level logging, and then searching the log file for messages like the following:

2015-09-29 13:04:26 DEBUG ConfigClassPath:74 - ...adding to classpath: url="file:/ggwork/hbase/hbase-1.0.1.1/lib/hbase-server-1.0.1.1.jar"
15.7.2 HBase Connection Properties

The contents of the HDFS hbase-site.xml file (including default settings) are output to the Java log4j log file when the logging level is set to DEBUG or TRACE. This file shows the connection properties to HBase. Search for the following in the Java log4j log file.

2015-09-29 13:04:27 DEBUG HBaseWriter:449 - Begin - HBase configuration object contents for connection troubleshooting. Key: [hbase.auth.token.max.lifetime] Value: [604800000].

Commonly, the hbase-site.xml file is not included in the classpath or the path to the hbase-site.xml file is incorrect. In this case, the HBase Handler cannot establish a connection to HBase, and the Oracle GoldenGate process abends. The following error is reported in the Java log4j log.

2015-09-29 12:49:29 ERROR HBaseHandler:207 - Failed to initialize the HBase handler.
org.apache.hadoop.hbase.ZooKeeperConnectionException: Can't connect to ZooKeeper

Verify that the classpath correctly includes the hbase-site.xml file and that HBase is running.
15.7.3 Logging of Handler Configuration

The Java log4j log file contains information on the configuration state of the HBase Handler. This information is output at the INFO log level. The following is a sample output:


2015-09-29 12:45:53 INFO HBaseHandler:194 - **** Begin HBase Handler - Configuration Summary ****
Mode of operation is set to tx.
HBase data will be encoded using the native system encoding.
In the event of a primary key update, the HBase Handler will ABEND.
HBase column data will use the column family name [cf].
The HBase Handler will not include tokens in the HBase data.
The HBase Handler has been configured to use [=] as the delimiter between keys and values.
The HBase Handler has been configured to use [,] as the delimiter between key values pairs.
The HBase Handler has been configured to output [NULL] for null values.
Hbase Handler Authentication type has been configured to use [none]

15.7.4 HBase Handler Delete-Insert Problem

If you are using the HBase Handler with the gg.handler.name.setHBaseOperationTimestamp=false configuration property, then the source database may get out of sync with data in the HBase tables. This is caused by the deletion of a row followed by the immediate reinsertion of the row. HBase creates a tombstone marker for the delete that is identified by a specific timestamp. This tombstone marker marks any row records in HBase with the same row key as deleted that have a timestamp before or the same as the tombstone marker. This can occur when the deleted row is immediately reinserted. The insert operation can inadvertently have the same timestamp as the delete operation, so the delete operation causes the subsequent insert operation to incorrectly appear as deleted. To work around this issue, you need to set gg.handler.name.setHBaseOperationTimestamp=true, which does two things:

• Sets the timestamp for row operations in the HBase Handler.
• Detection of a delete-insert operation that ensures that the insert operation has a timestamp that is after the delete.
The default for gg.handler.name.setHBaseOperationTimestamp is true, which means that the HBase server supplies the timestamp for a row. This prevents the HBase delete-reinsert out-of-sync problem. Setting the row operation timestamp in the HBase Handler can have these consequences:

1. Since the timestamp is set on the client side, this could create problems if multiple applications are feeding data to the same HBase table.
2. If delete and reinsert is a common pattern in your use case, then the HBase Handler has to increment the timestamp 1 millisecond each time this scenario is encountered. Processing cannot be allowed to get too far into the future, so the HBase Handler only allows the timestamp to increment 100 milliseconds into the future before it pauses processing so that the client-side HBase operation timestamp and real time are back in sync. When delete-insert is used instead of update in the source database, this sync scenario can be quite common, and processing speeds may be affected by not allowing the HBase timestamp to go more than 100 milliseconds into the future.

16 Using the HDFS Handler

The HDFS Handler is designed to stream change capture data into the Hadoop Distributed File System (HDFS). This chapter describes how to use the HDFS Handler.
• Overview
• Writing into HDFS in SequenceFile Format
The HDFS SequenceFile is a flat file consisting of binary key and value pairs. You can enable writing data in SequenceFile format by setting the gg.handler.name.format property to sequencefile.
• Setting Up and Running the HDFS Handler
• Writing in HDFS in Avro Object Container File Format
• Generating HDFS File Names Using Template Strings
• Metadata Change Events
• Partitioning
• HDFS Additional Considerations
• Best Practices
• Troubleshooting the HDFS Handler
Troubleshooting of the HDFS Handler begins with the contents of the Java log4j file. Follow the directions in the Java Logging Configuration to configure the runtime to correctly generate the Java log4j log file.
16.1 Overview

The HDFS is the primary file system for Big Data. Hadoop is typically installed on multiple machines that work together as a Hadoop cluster. Hadoop allows you to store very large amounts of data in the cluster that is horizontally scaled across the machines in the cluster. You can then perform analytics on that data using a variety of Big Data applications.
16.2 Writing into HDFS in SequenceFile Format

The HDFS SequenceFile is a flat file consisting of binary key and value pairs. You can enable writing data in SequenceFile format by setting the gg.handler.name.format property to sequencefile.

The key part of the record is set to null, and the actual data is set in the value part. For information about Hadoop SequenceFile, see https://cwiki.apache.org/confluence/display/HADOOP2/SequenceFile.
• Integrating with Hive
• Understanding the Data Format


16.2.1 Integrating with Hive

The Oracle GoldenGate for Big Data release does not include a Hive storage handler because the HDFS Handler provides all of the necessary Hive functionality. You can create a Hive integration to create tables and update table definitions in case of DDL events. This is limited to data formatted in Avro Object Container File format. For more information, see Writing in HDFS in Avro Object Container File Format and HDFS Handler Configuration. For Hive to consume sequence files, the DDL creates Hive tables including STORED as sequencefile. The following is a sample create table script:

CREATE EXTERNAL TABLE table_name (
  col1 string,
  ......
  col2 string)
ROW FORMAT DELIMITED
STORED as sequencefile
LOCATION '/path/to/hdfs/file';

Note:

If files are intended to be consumed by Hive, then the gg.handler.name.partitionByTable property should be set to true.
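A minimal sketch of that setting in the Java Adapter properties file, using the generic name placeholder for the handler name:

# Required when Hive consumes the generated files: keep one file set per table.
gg.handler.name.partitionByTable=true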

16.2.2 Understanding the Data Format

The data is written in the value part of each record and is in delimited text format. All of the options described in the Using the Delimited Text Row Formatter section are applicable to HDFS SequenceFile when writing data to it. For example:

gg.handler.name.format=sequencefile
gg.handler.name.format.includeColumnNames=true
gg.handler.name.format.includeOpType=true
gg.handler.name.format.includeCurrentTimestamp=true
gg.handler.name.format.updateOpKey=U

16.3 Setting Up and Running the HDFS Handler

To run the HDFS Handler, a Hadoop single instance or Hadoop cluster must be installed, running, and network-accessible from the machine running the HDFS Handler. Apache Hadoop is open source and you can download it from: http://hadoop.apache.org/


Follow the Getting Started links for information on how to install a single-node cluster (for pseudo-distributed operation mode) or a clustered setup (for fully-distributed operation mode). Instructions for configuring the HDFS Handler components and running the handler are described in the following sections.
• Classpath Configuration
• HDFS Handler Configuration
• Review a Sample Configuration
• Performance Considerations
• Security
16.3.1 Classpath Configuration

For the HDFS Handler to connect to HDFS and run, the HDFS core-site.xml file and the HDFS client jars must be configured in the gg.classpath variable. The HDFS client jars must match the version of HDFS to which the HDFS Handler is connecting. For a list of the required client jar files by release, see HDFS Handler Client Dependencies. The default location of the core-site.xml file is Hadoop_Home/etc/hadoop

The default locations of the HDFS client jars are the following directories: Hadoop_Home/share/hadoop/common/lib/*

Hadoop_Home/share/hadoop/common/*

Hadoop_Home/share/hadoop/hdfs/lib/* Hadoop_Home/share/hadoop/hdfs/*

The gg.classpath must be configured exactly as shown. The path to the core-site.xml file must contain the path to the directory containing the core-site.xml file with no wildcard appended. If you include a (*) wildcard in the path to the core-site.xml file, the file is not picked up. Conversely, the path to the dependency jars must include the (*) wildcard character in order to include all the jar files in that directory in the associated classpath. Do not use *.jar.

The following is an example of a correctly configured gg.classpath variable:

gg.classpath=/ggwork/hadoop/hadoop-2.6.0/etc/hadoop:/ggwork/hadoop/hadoop-2.6.0/share/hadoop/common/lib/*:/ggwork/hadoop/hadoop-2.6.0/share/hadoop/common/*:/ggwork/hadoop/hadoop-2.6.0/share/hadoop/hdfs/*:/ggwork/hadoop/hadoop-2.6.0/share/hadoop/hdfs/lib/*

The HDFS configuration file hdfs-site.xml must also be in the classpath if Kerberos security is enabled. By default, the hdfs-site.xml file is located in the Hadoop_Home/etc/hadoop directory. If the HDFS Handler is not collocated with Hadoop, either or both files can be copied to another machine.
16.3.2 HDFS Handler Configuration

The following are the configurable values for the HDFS Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file).


To enable the selection of the HDFS Handler, you must first configure the handler type by specifying gg.handler.name.type=hdfs and the other HDFS properties as follows:

gg.handlerlist
Required
Legal Values: Any string
Default: None
Explanation: Provides a name for the HDFS Handler. The HDFS Handler name then becomes part of the property names listed in this table.

gg.handler.name.type
Required
Legal Values: hdfs
Default: None
Explanation: Selects the HDFS Handler for streaming change data capture into HDFS.

gg.handler.name.mode
Optional
Legal Values: tx | op
Default: op
Explanation: Selects operation (op) mode or transaction (tx) mode for the handler. In almost all scenarios, transaction mode results in better performance.

gg.handler.name.maxFileSize
Optional
Legal Values: The default unit of measure is bytes. You can use k, m, or g to specify kilobytes, megabytes, or gigabytes. Examples of legal values include 10000, 10k, 100m, 1.1g.
Default: 1g
Explanation: Selects the maximum file size of the created HDFS files.

gg.handler.name.pathMappingTemplate
Optional
Legal Values: Any legal templated string to resolve the target write directory in HDFS. Templates can contain a mix of constants and keywords which are dynamically resolved at runtime to generate the HDFS write directory.
Default: /ogg/${toLowerCase[${fullyQualifiedTableName}]}
Explanation: You can use keywords interlaced with constants to dynamically generate the HDFS write directory at runtime, see Generating HDFS File Names Using Template Strings.

gg.handler.name.fileRollInterval
Optional
Legal Values: The default unit of measure is milliseconds. You can stipulate ms, s, m, h to signify milliseconds, seconds, minutes, or hours respectively. Examples of legal values include 10000, 10000ms, 10s, 10m, or 1.5h. Values of 0 or less indicate that file rolling on time is turned off.
Default: File rolling on time is off.
Explanation: The timer starts when an HDFS file is created. If the file is still open when the interval elapses, then the file is closed. A new file is not immediately opened. New HDFS files are created on a just-in-time basis.


gg.handler.name.inactivityRollInterval: Optional. The default unit of measure is milliseconds. You can use ms, s, m, or h to specify milliseconds, seconds, minutes, or hours. Examples of legal values include 10000, 10000ms, 10s, 10.5m, or 1h. Values of 0 or less indicate that file inactivity rolling on time is turned off. Default: file inactivity rolling on time is off. The timer starts from the latest write to an HDFS file. New writes to an HDFS file restart the counter. If the file is still open when the counter elapses, the HDFS file is closed. A new file is not immediately opened. New HDFS files are created on a just-in-time basis.

gg.handler.name.fileNameMappingTemplate: Optional. Legal values: a string with resolvable keywords and constants used to dynamically generate HDFS file names at runtime. Default: ${fullyQualifiedTableName}_${groupName}_${currentTimeStamp}.txt. You can use keywords interlaced with constants to dynamically generate unique HDFS file names at runtime, see Generating HDFS File Names Using Template Strings. File names typically follow the format ${fullyQualifiedTableName}_${groupName}_${currentTimeStamp}{.txt}.

gg.handler.name.partitionByTable: Optional. Legal values: true | false. Default: true (data is partitioned by table). Determines whether data written into HDFS must be partitioned by table. If set to true, then data for different tables is written to different HDFS files. If set to false, then data from different tables is interlaced in the same HDFS file. Must be set to true to use the Avro Object Container File Formatter. If set to false, a configuration exception occurs at initialization.

gg.handler.name.rollOnMetadataChange: Optional. Legal values: true | false. Default: true (HDFS files are rolled on a metadata change event). Determines whether HDFS files are rolled in the case of a metadata change: true means the HDFS file is rolled, false means the HDFS file is not rolled. Must be set to true to use the Avro Object Container File Formatter. If set to false, a configuration exception occurs at initialization.


gg.handler.name.format: Optional. Legal values: delimitedtext | json | json_row | xml | avro_row | avro_op | avro_row_ocf | avro_op_ocf | sequencefile. Default: delimitedtext. Selects the formatter for the HDFS Handler that determines how output data is formatted:
· delimitedtext: Delimited text
· json: JSON
· json_row: JSON output modeling row data
· xml: XML
· avro_row: Avro in row compact format
· avro_op: Avro in the more verbose operation format
· avro_row_ocf: Avro in the row compact format written into HDFS in the Avro Object Container File (OCF) format
· avro_op_ocf: Avro in the more verbose format written into HDFS in the Avro Object Container File format
· sequencefile: Delimited text written into HDFS in sequence file format

gg.handler.name.includeTokens: Optional. Legal values: true | false. Default: false. Set to true to include the tokens field and tokens key/values in the output. Set to false to suppress tokens output.

gg.handler.name.partitioner.fully_qualified_table_name: Optional. The fully qualified table name and the column names must exist. The value equals one or more column names separated by commas. Default: none. Partitions the data into subdirectories in HDFS in the following format: par_{column name}={column value}.

gg.handler.name.authType: Optional. Legal values: none | kerberos. Default: none. Setting this property to kerberos enables Kerberos authentication.

gg.handler.name.kerberosKeytabFile: Optional (required if authType=kerberos). Legal values: a relative or absolute path to a Kerberos keytab file. Default: none. The keytab file allows the HDFS Handler to access a password to perform a kinit operation for Kerberos security.


gg.handler.name.kerberosPrincipal: Optional (required if authType=kerberos). Legal values: a legal Kerberos principal name. Default: none. The Kerberos principal name for Kerberos authentication.

gg.handler.name.schemaFilePath: Optional. Default: null. Set to a legal path in HDFS so that schemas (if available) are written in that HDFS directory. Schemas are currently only available for the Avro and JSON formatters. In the case of a metadata change event, the schema is overwritten to reflect the schema change.

gg.handler.name.compressionType: Optional. Applicable to Sequence File format only. Legal values: block | none | record. Default: none. Hadoop Sequence File compression type. Applicable only if gg.handler.name.format is set to sequencefile.

gg.handler.name.compressionCodec: Optional. Applicable to Sequence File format and to writing to HDFS in Avro OCF format only. Legal values for Sequence File: org.apache.hadoop.io.compress.DefaultCodec | org.apache.hadoop.io.compress.BZip2Codec | org.apache.hadoop.io.compress.SnappyCodec | org.apache.hadoop.io.compress.GzipCodec. Default: org.apache.hadoop.io.compress.DefaultCodec. Hadoop Sequence File compression codec. Applicable only if gg.handler.name.format is set to sequencefile.

gg.handler.name.compressionCodec: Optional. Legal values for Avro OCF: null | snappy | bzip2 | xz | deflate. Default: null. Avro OCF Formatter compression codec. This configuration controls the selection of the compression library to be used for Avro OCF files. Snappy includes native binaries in the Snappy JAR file and performs a Java-native traversal when compressing or decompressing. Use of Snappy may introduce runtime issues and platform porting issues that you may not experience when working with Java. You may need to perform additional testing to ensure that Snappy works on all of your required platforms. Snappy is an open source library, so Oracle cannot guarantee its ability to operate on all of your required platforms.


gg.handler.name.hiveJdbcUrl: Optional. Legal values: a legal URL for connecting to Hive using the Hive JDBC interface. Default: null (Hive integration disabled). Only applicable to the Avro OCF Formatter. This configuration value provides a JDBC URL for connectivity to Hive through the Hive JDBC interface. Use of this property requires that you include the Hive JDBC library in the gg.classpath. Hive JDBC connectivity can be secured through basic credentials, SSL/TLS, or Kerberos. Configuration properties are provided for the user name and password for basic credentials. See the Hive documentation for how to generate a Hive JDBC URL for SSL/TLS, and for how to generate a Hive JDBC URL for Kerberos. (If Kerberos is used for Hive JDBC security, it must be enabled for HDFS connectivity. Then the Hive JDBC connection can piggyback on the HDFS Kerberos functionality by using the same Kerberos principal.)

gg.handler.name.hiveJdbcUsername: Optional. Legal values: a legal user name if the Hive JDBC connection is secured through credentials. Default: the result of the Java call System.getProperty(user.name). Only applicable to the Avro Object Container File (OCF) Formatter. This property is only relevant if the hiveJdbcUrl property is set. It may be required in your environment when the Hive JDBC connection is secured through credentials. Hive requires that Hive DDL operations be associated with a user. If you do not set the value, it defaults to the result of the Java call System.getProperty(user.name).

gg.handler.name.hiveJdbcPassword: Optional. Legal values: a legal password if the Hive JDBC connection requires a password. Default: none. Only applicable to the Avro OCF Formatter. This property is only relevant if the hiveJdbcUrl property is set. It may be required in your environment when the Hive JDBC connection is secured through credentials. This property is required if Hive is configured to require passwords for the JDBC connection.

gg.handler.name.hiveJdbcDriver: Optional. Legal values: the fully qualified Hive JDBC driver class name. Default: org.apache.hive.jdbc.HiveDriver. Only applicable to the Avro OCF Formatter. This property is only relevant if the hiveJdbcUrl property is set. The default is the Hive Hadoop2 JDBC driver name. Typically, this property does not require configuration and is provided for use when Hive introduces a new JDBC driver class.


gg.handler.name.openNextFileAtRoll: Optional. Legal values: true | false. Default: false. Applicable only when the HDFS Handler is not writing Avro OCF or sequence file formats; intended to support extract, load, transform (ELT) situations. When set to true, this property creates a new file immediately on the occurrence of a file roll. File rolls can be triggered by any one of the following: a metadata change, the file roll interval elapsing, or the inactivity interval elapsing. This is useful when data files are being loaded into HDFS and a monitor program is watching the write directories waiting to consume the data. The monitoring programs use the appearance of a new file as a trigger so that the previous file can be consumed by the consuming application.

gg.handler.name.hsync: Optional. Legal values: true | false. Default: false. Controls whether an hsync call is used to ensure that data is transferred from the HDFS Handler to the HDFS cluster. When set to false, hflush is called on open HDFS write streams at transaction commit to ensure write durability. Setting hsync to true calls hsync instead of hflush at transaction commit. Using hsync ensures that data has moved to the HDFS cluster and that the data is written to disk. This provides a higher level of write durability, though it adversely affects performance. Also, it does not make the write data immediately available to analytic tools. For most applications, setting this property to false is appropriate.

16.3.3 Review a Sample Configuration

The following is a sample configuration for the HDFS Handler from the Java Adapter properties file:

gg.handlerlist=hdfs
gg.handler.hdfs.type=hdfs
gg.handler.hdfs.mode=tx
gg.handler.hdfs.includeTokens=false
gg.handler.hdfs.maxFileSize=1g
gg.handler.hdfs.pathMappingTemplate=/ogg/${fullyQualifiedTableName}
gg.handler.hdfs.fileRollInterval=0
gg.handler.hdfs.inactivityRollInterval=0
gg.handler.hdfs.partitionByTable=true


gg.handler.hdfs.rollOnMetadataChange=true
gg.handler.hdfs.authType=none
gg.handler.hdfs.format=delimitedtext

16.3.4 Performance Considerations

The HDFS Handler calls the HDFS flush method on the HDFS write stream to flush data to the HDFS data nodes at the end of each transaction in order to maintain write durability. This is an expensive call, and performance can be adversely affected, especially in the case of transactions of one or few operations that result in numerous HDFS flush calls. Performance of the HDFS Handler can be greatly improved by batching multiple small transactions into a single larger transaction. If you require high performance, configure batching functionality for the Replicat process. For more information, see Replicat Grouping.

The HDFS client libraries spawn threads for every HDFS file stream opened by the HDFS Handler. Therefore, the number of threads executing in the JVM grows proportionally to the number of HDFS file streams that are open. Performance of the HDFS Handler may degrade as more HDFS file streams are opened. Configuring the HDFS Handler to write to many HDFS files (due to many source replication tables or extensive use of partitioning) may result in degraded performance. If your use case requires writing to many tables, then Oracle recommends that you enable the roll on time or roll on inactivity features to close HDFS file streams. Closing an HDFS file stream causes the HDFS client threads to terminate, and the associated resources can be reclaimed by the JVM.

16.3.5 Security

The HDFS cluster can be secured using Kerberos authentication. The HDFS Handler can connect to a Kerberos-secured cluster. The HDFS core-site.xml should be in the handler's classpath with the hadoop.security.authentication property set to kerberos and the hadoop.security.authorization property set to true. Additionally, you must set the following properties in the HDFS Handler Java configuration file:

gg.handler.name.authType=kerberos
gg.handler.name.kerberosPrincipalName=legal Kerberos principal name
gg.handler.name.kerberosKeytabFile=path to a keytab file that contains the password for the Kerberos principal so that the HDFS Handler can programmatically perform the Kerberos kinit operations to obtain a Kerberos ticket
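
For illustration only, a minimal sketch of these properties with concrete placeholder values might look like the following; the handler name hdfs, the principal, and the keytab path are assumptions, so substitute the values for your environment:

gg.handler.hdfs.authType=kerberos
gg.handler.hdfs.kerberosPrincipalName=ggadmin@EXAMPLE.COM
gg.handler.hdfs.kerberosKeytabFile=/etc/security/keytabs/ggadmin.keytab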

You may encounter the inability to decrypt the Kerberos password from the keytab file. This causes the Kerberos authentication to fall back to interactive mode, which cannot work because it is being invoked programmatically. The cause of this problem is that the Java Cryptography Extension (JCE) is not installed in the Java Runtime Environment (JRE). Ensure that the JCE is loaded in the JRE, see http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html.

16.4 Writing in HDFS in Avro Object Container File Format

The HDFS Handler includes specialized functionality to write to HDFS in Avro Object Container File (OCF) format. This Avro OCF is part of the Avro specification and is detailed in the Avro documentation at:


https://avro.apache.org/docs/current/spec.html#Object+Container+Files

Avro OCF format may be a good choice because it:
• integrates with Apache Hive (Raw Avro written to HDFS is not supported by Hive.)
• provides good support for schema evolution.

Configure the following to enable writing to HDFS in Avro OCF format:

To write row data to HDFS in Avro OCF format, configure the gg.handler.name.format=avro_row_ocf property.

To write operation data to HDFS in Avro OCF format, configure the gg.handler.name.format=avro_op_ocf property.
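
As an illustrative sketch only, an Avro OCF configuration with the optional Hive integration might resemble the following; the handler name hdfs, the schema directory, and the Hive JDBC URL are placeholder assumptions, and the Hive JDBC library must also be added to gg.classpath when hiveJdbcUrl is set:

gg.handler.hdfs.format=avro_row_ocf
gg.handler.hdfs.partitionByTable=true
gg.handler.hdfs.rollOnMetadataChange=true
gg.handler.hdfs.schemaFilePath=/ogg/schemas
gg.handler.hdfs.hiveJdbcUrl=jdbc:hive2://hivehost:10000/default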

The HDFS and Avro OCF integration includes functionality to create the corresponding tables in Hive and update the schema for metadata change events. The configuration section provides information on the properties to enable integration with Hive. The Oracle GoldenGate Hive integration accesses Hive using the JDBC interface, so the Hive JDBC server must be running to enable this integration.

16.5 Generating HDFS File Names Using Template Strings

The HDFS Handler can dynamically generate HDFS file names using a template string. The template string allows you to combine keywords that are dynamically resolved at runtime with static strings to provide you with more control of generated HDFS file names. You can control the template file name using the gg.handler.name.fileNameMappingTemplate configuration property. The default value for this parameter is:

${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt

Supported keywords which are dynamically replaced at runtime include the following:

Keyword Replacement

${fullyQualifiedTableName} The fully qualified table name with period (.) delimiting the names. For example, oracle.test.table1.

${catalogName} The catalog name of the source table. For example, oracle.

${schemaName} The schema name of the source table. For example, test.

${tableName} The short table name of the source table. For example, table1.

${groupName} The Replicat process name concatenated with the thread id if using coordinated apply. For example, HDFS001.


${currentTimestamp} The default output format for the date time is yyyy-MM-dd_HH-mm-ss.SSS. For example, 2017-07-05_04-31-23.123. Alternatively, you can configure your own format mask for the date using the syntax ${currentTimestamp[yyyy-MM-dd_HH-mm-ss.SSS]}. Date time format masks follow the convention in the java.text.SimpleDateFormat Java class.

${toLowerCase[]} Converts the argument inside of the square brackets to lower case. Keywords can be nested inside of the square brackets as follows:

${toLowerCase[${fullyQualifiedTableName}]}

This is important because source table names are normalized in Oracle GoldenGate to upper case.

${toUpperCase[]} Converts the arguments inside of the square brackets to upper case. Keywords can be nested inside of the square brackets.

Following are examples of legal templates and the resolved strings:

Legal Template Replacement

${schemaName}.${tableName}__${groupName}_${currentTimestamp}.txt test.table1__HDFS001_2017-07-05_04-31-23.123.txt

${fullyQualifiedTableName}--${currentTimestamp}.avro oracle.test.table1--2017-07-05_04-31-23.123.avro

${fullyQualifiedTableName}_${currentTimestamp[yyyy-MM-ddTHH-mm-ss.SSS]}.json oracle.test.table1_2017-07-05T04-31-23.123.json

Be aware of these restrictions when generating HDFS file names using templates:
• Generated HDFS file names must be legal HDFS file names.
• Oracle strongly recommends that you use ${groupName} as part of the HDFS file naming template when using coordinated apply and breaking down source table data to different Replicat threads. The group name provides uniqueness of generated HDFS names that ${currentTimestamp} alone does not guarantee. HDFS file name collisions result in an abend of the Replicat process.

16.6 Metadata Change Events

Metadata change events are now handled in the HDFS Handler. The default behavior of the HDFS Handler is to roll the current relevant file in the event of a metadata change event. This behavior allows for the results of metadata changes to at least be separated into different files. File rolling on metadata change is configurable and can be turned off. To support metadata change events, the process capturing changes in the source database must support both DDL changes and metadata in trail. Oracle GoldenGate does not support DDL replication for all database implementations.


See the Oracle GoldenGate installation and configuration guide for the appropriate database to determine whether DDL replication is supported.

16.7 Partitioning

The HDFS Handler supports partitioning of table data by one or more column values. The configuration syntax to enable partitioning is the following:

gg.handler.name.partitioner.fully qualified table name=one or more column names separated by commas

Consider the following example:

gg.handler.hdfs.partitioner.dbo.orders=sales_region

This example can result in the following breakdown of files in HDFS:

/ogg/dbo.orders/par_sales_region=west/data files
/ogg/dbo.orders/par_sales_region=east/data files
/ogg/dbo.orders/par_sales_region=north/data files
/ogg/dbo.orders/par_sales_region=south/data files
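
Because the property accepts one or more column names, you can also partition by multiple columns. The following sketch is hypothetical (the order_year column is illustrative only) and would be expected to produce nested subdirectories of the form /ogg/dbo.orders/par_sales_region=west/par_order_year=2019/:

gg.handler.hdfs.partitioner.dbo.orders=sales_region,order_year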

You should exercise care when choosing columns for partitioning. The key is to choose columns that contain only a few (10 or fewer) possible values, and to make sure that those values are also helpful for grouping and analyzing the data. For example, a column of sales regions would be good for partitioning. A column that contains the customers' dates of birth would not be good for partitioning. Configuring partitioning on a column that has many possible values can cause problems. A poor choice can result in hundreds of HDFS file streams being opened, and performance may degrade for the reasons discussed in Performance Considerations. Additionally, poor partitioning can result in problems during data analysis. Apache Hive requires that all where clauses specify partition criteria if the Hive data is partitioned.

16.8 HDFS Additional Considerations

The Oracle HDFS Handler requires certain HDFS client libraries to be resolved in its classpath as a prerequisite for streaming data to HDFS. For a list of required client JAR files by version, see HDFS Handler Client Dependencies. The HDFS client jars do not ship with the Oracle GoldenGate for Big Data product. The HDFS Handler supports multiple versions of HDFS, and the HDFS client jars must be the same version as the HDFS version to which the HDFS Handler is connecting. The HDFS client jars are open source and are freely available to download from sites such as the Apache Hadoop site or the Maven Central repository.

In order to establish connectivity to HDFS, the HDFS core-site.xml file must be in the classpath of the HDFS Handler. If the core-site.xml file is not in the classpath, the HDFS client code defaults to a mode that attempts to write to the local file system. Writing to the local file system instead of HDFS can be advantageous for troubleshooting, building a proof of concept (POC), or as a step in the process of building an HDFS integration.

Another common issue is that data streamed to HDFS using the HDFS Handler may not be immediately available to Big Data analytic tools such as Hive. This behavior commonly occurs when the HDFS Handler is in possession of an open write stream to an HDFS file. HDFS writes in blocks of 128 MB by default.


HDFS blocks under construction are not always visible to analytic tools. Additionally, inconsistencies between file sizes when using the -ls, -cat, and -get commands in the HDFS shell may occur. This is an anomaly of HDFS streaming and is discussed in the HDFS specification. This anomaly of HDFS leads to a potential 128 MB per file blind spot in analytic data. This may not be an issue if you have a steady stream of replication data and do not require low levels of latency for analytic data from HDFS. However, this may be a problem in some use cases because closing the HDFS write stream finalizes the block writing. Data is immediately visible to analytic tools, and file sizing metrics become consistent again. Therefore, the file rolling feature in the HDFS Handler can be used to close HDFS write streams, making all data visible.

Important:

The file rolling solution may present its own problems. Extensive use of file rolling can result in many small files in HDFS. Many small files in HDFS may result in performance issues in analytic tools.
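
As a hedged illustration of this trade-off, the interval values below are arbitrary starting points and the handler name hdfs is a placeholder; they bound how long written data remains invisible without producing very small files, and should be tuned for your latency and file-size requirements:

gg.handler.hdfs.fileRollInterval=10m
gg.handler.hdfs.inactivityRollInterval=5m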

You may also notice the HDFS inconsistency problem in the following scenarios:
• The HDFS Handler process crashes.
• A forced shutdown is called on the HDFS Handler process.
• A network outage or other issue causes the HDFS Handler process to abend.

In each of these scenarios, it is possible for the HDFS Handler to end without explicitly closing the HDFS write stream and finalizing the writing block. HDFS in its internal process ultimately recognizes that the write stream has been broken, so HDFS finalizes the write block. In this scenario, you may experience a short term delay before the HDFS process finalizes the write block.

16.9 Best Practices

It is considered a Big Data best practice for the HDFS cluster to operate on dedicated servers called cluster nodes. Edge nodes are server machines that host the applications to stream data to and retrieve data from the HDFS cluster nodes. Because the HDFS cluster nodes and the edge nodes are different servers, the following benefits are seen:
• The HDFS cluster nodes do not compete for resources with the applications interfacing with the cluster.
• The requirements for the HDFS cluster nodes and edge nodes probably differ. This physical topology allows the appropriate hardware to be tailored to specific needs.

It is a best practice for the HDFS Handler to be installed and running on an edge node and streaming data to the HDFS cluster using a network connection. The HDFS Handler can run on any machine that has network visibility to the HDFS cluster. The installation of the HDFS Handler on an edge node requires that the core-site.xml file and the dependency jars be copied to the edge node so that the HDFS Handler can access them. The HDFS Handler can also run collocated on an HDFS cluster node if required.


16.10 Troubleshooting the HDFS Handler

Troubleshooting of the HDFS Handler begins with the contents of the Java log4j log file. Follow the directions in Java Logging Configuration to configure the runtime to correctly generate the Java log4j log file.

• Java Classpath
• HDFS Connection Properties
• Handler and Formatter Configuration

16.10.1 Java Classpath

Problems with the Java classpath are common. The usual indication of a Java classpath problem is a ClassNotFoundException in the Java log4j log file. The Java log4j log file can be used to troubleshoot this issue. Setting the log level to DEBUG causes each of the jars referenced in the gg.classpath object to be logged to the log file. In this way, you can ensure that all of the required dependency jars are resolved by enabling DEBUG level logging and searching the log file for messages such as the following:

2015-09-21 10:05:10 DEBUG ConfigClassPath:74 - ...adding to classpath: url="file:/ggwork/hadoop/hadoop-2.6.0/share/hadoop/common/lib/guava-11.0.2.jar

16.10.2 HDFS Connection Properties

The contents of the HDFS core-site.xml file (including default settings) are output to the Java log4j log file when the logging level is set to DEBUG or TRACE. This output shows the connection properties to HDFS. Search for the following in the Java log4j log file:

2015-09-21 10:05:11 DEBUG HDFSConfiguration:58 - Begin - HDFS configuration object contents for connection troubleshooting.

If the fs.defaultFS property points to the local file system, then the core-site.xml file is not properly set in the gg.classpath property.

Key: [fs.defaultFS] Value: [file:///].

The following shows the fs.defaultFS property properly pointed at an HDFS host and port:

Key: [fs.defaultFS] Value: [hdfs://hdfshost:9000].

16.10.3 Handler and Formatter Configuration

The Java log4j log file contains information on the configuration state of the HDFS Handler and the selected formatter. This information is output at the INFO log level. The output resembles the following:

2015-09-21 10:05:11 INFO AvroRowFormatter:156 - **** Begin Avro Row Formatter - Configuration Summary **** Operation types are always included in the Avro formatter output. The key for insert operations is [I]. The key for update operations is [U]. The key for delete operations is [D].


The key for truncate operations is [T]. Column type mapping has been configured to map source column types to an appropriate corresponding Avro type. Created Avro schemas will be output to the directory [./dirdef]. Created Avro schemas will be encoded using the [UTF-8] character set. In the event of a primary key update, the Avro Formatter will ABEND. Avro row messages will not be wrapped inside a generic Avro message. No delimiter will be inserted after each generated Avro message. **** End Avro Row Formatter - Configuration Summary ****

2015-09-21 10:05:11 INFO HDFSHandler:207 - **** Begin HDFS Handler - Configuration Summary **** Mode of operation is set to tx. Data streamed to HDFS will be partitioned by table. Tokens will be included in the output. The HDFS root directory for writing is set to [/ogg]. The maximum HDFS file size has been set to 1073741824 bytes. Rolling of HDFS files based on time is configured as off. Rolling of HDFS files based on write inactivity is configured as off. Rolling of HDFS files in the case of a metadata change event is enabled. HDFS partitioning information: The HDFS partitioning object contains no partitioning information. HDFS Handler Authentication type has been configured to use [none] **** End HDFS Handler - Configuration Summary ****

17 Using the Java Database Connectivity Handler

Learn how to use the Java Database Connectivity (JDBC) Handler, which can replicate source transactional data to a target system or database.

This chapter describes how to use the JDBC Handler.
• Overview
• Detailed Functionality
The JDBC Handler replicates source transactional data to a target system or database by using a JDBC interface.
• Setting Up and Running the JDBC Handler
Use the JDBC Metadata Provider with the JDBC Handler to obtain column mapping features, column function features, and better data type mapping.
• Sample Configurations

17.1 Overview

The Generic Java Database Connectivity (JDBC) Handler lets you replicate source transactional data to a target system or database by using a JDBC interface. You can use it with targets that support JDBC connectivity.

You can use the JDBC API to access virtually any data source, from relational databases to spreadsheets and flat files. JDBC technology also provides a common base on which the JDBC Handler was built. The JDBC Handler with the JDBC Metadata Provider also lets you use Replicat features such as column mapping and column functions. For more information about using these features, see Using the Metadata Providers.

For more information about using the JDBC API, see http://docs.oracle.com/javase/8/docs/technotes/guides/jdbc/index.html.

17.2 Detailed Functionality

The JDBC Handler replicates source transactional data to a target system or database by using a JDBC interface.
• Single Operation Mode
• Oracle Database Data Types
• MySQL Database Data Types
• Netezza Database Data Types
• Redshift Database Data Types


17.2.1 Single Operation Mode

The JDBC Handler performs SQL operations on every single trail record (row operation) when the trail record is processed by the handler. The JDBC Handler does not use the BATCHSQL feature of the JDBC API to batch operations.

17.2.2 Oracle Database Data Types

The following column data types are supported for Oracle Database targets:
NUMBER
DECIMAL
INTEGER
FLOAT
REAL
DATE
TIMESTAMP
INTERVAL YEAR TO MONTH
INTERVAL DAY TO SECOND
CHAR
VARCHAR2
NCHAR
NVARCHAR2
RAW
CLOB
NCLOB
BLOB
TIMESTAMP WITH TIMEZONE (time zone with a two-digit hour and a two-digit minute offset)
TIME WITH TIMEZONE (time zone with a two-digit hour and a two-digit minute offset)

17.2.3 MySQL Database Data Types

The following column data types are supported for MySQL Database targets:
INT
REAL
FLOAT
DOUBLE
NUMERIC
DATE
DATETIME
TIMESTAMP
TINYINT
BOOLEAN
SMALLINT
BIGINT



MEDIUMINT
DECIMAL
BIT
YEAR
ENUM
CHAR
VARCHAR

17.2.4 Netezza Database Data Types

The following column data types are supported for Netezza database targets:
byteint
smallint
integer
bigint
numeric(p,s)
numeric(p)
float(p)
Real
double
char
varchar
nchar
nvarchar
date
time
Timestamp

17.2.5 Redshift Database Data Types

The following column data types are supported for Redshift database targets:
SMALLINT
INTEGER
BIGINT
DECIMAL
REAL
DOUBLE
CHAR
VARCHAR
DATE
TIMESTAMP


17.3 Setting Up and Running the JDBC Handler

Use the JDBC Metadata Provider with the JDBC Handler to obtain column mapping features, column function features, and better data type mapping.

The following topics provide instructions for configuring the JDBC Handler components and running the handler.
• Java Classpath
• Handler Configuration
• Statement Caching
• Setting Up Error Handling

17.3.1 Java Classpath

The JDBC Java Driver location must be included in the class path of the handler using the gg.classpath property.

For example, the configuration for a MySQL database could be:

gg.classpath=/path/to/jdbc/driver/jar/mysql-connector-java-5.1.39-bin.jar

17.3.2 Handler Configuration

You configure the JDBC Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the JDBC Handler, you must first configure the handler type by specifying gg.handler.name.type=jdbc and the other JDBC properties as follows:

Table 17-1 JDBC Handler Configuration Properties

gg.handler.name.type: Required. Legal values: jdbc. Default: none. Selects the JDBC Handler for streaming change data capture.

gg.handler.name.connectionURL: Required. Legal values: a valid JDBC connection URL. Default: none. The target specific JDBC connection URL.

gg.handler.name.DriverClass: Target database dependent. Legal values: the target specific JDBC driver class name. Default: none. The target specific JDBC driver class name.



gg.handler.name.userName: Target database dependent. Legal values: a valid user name. Default: none. The user name used for the JDBC connection to the target database.

gg.handler.name.password: Target database dependent. Legal values: a valid password. Default: none. The password used for the JDBC connection to the target database.

gg.handler.name.maxActiveStatements: Optional. Legal values: unsigned integer. Default: target database dependent. If this property is not specified, the JDBC Handler queries the target database metadata for the maximum number of active prepared SQL statements. Some targets do not provide this metadata, so the default value of 256 active SQL statements is used. If this property is specified, the JDBC Handler does not query the target database for such metadata and uses the configured value. In either case, when the JDBC Handler finds that the total number of active SQL statements is about to be exceeded, the oldest SQL statement is removed from the cache to add one new SQL statement.

17.3.3 Statement Caching

To speed up DML operations, JDBC driver implementations typically allow multiple statements to be cached. This configuration avoids repreparing a statement for operations that share the same profile or template. The JDBC Handler uses statement caching to speed up the process and caches as many statements as the underlying JDBC driver supports. The cache is implemented by using an LRU cache where the key is the profile of the operation (stored internally in the memory as an instance of StatementCacheKey class), and the value is the PreparedStatement object itself.

A StatementCacheKey object contains the following information for the various DML profiles that are supported in the JDBC Handler:

DML operation type: StatementCacheKey contains a tuple of:
INSERT: (table name, operation type, ordered after-image column indices)
UPDATE: (table name, operation type, ordered after-image column indices)


DELETE: (table name, operation type)
TRUNCATE: (table name, operation type)

17.3.4 Setting Up Error Handling

The JDBC Handler supports using the REPERROR and HANDLECOLLISIONS Oracle GoldenGate parameters. See Reference for Oracle GoldenGate. You must configure the following properties in the handler properties file to define the mapping of different error codes for the target database.

gg.error.duplicateErrorCodes

A comma-separated list of error codes defined in the target database that indicate a duplicate key violation error. Most JDBC drivers return a valid error code, so REPERROR actions can be configured based on the error code. For example:

gg.error.duplicateErrorCodes=1062,1088,1092,1291,1330,1331,1332,1333

gg.error.notFoundErrorCodes

A comma-separated list of error codes that indicate missed DELETE or UPDATE operations on the target database. In some cases, JDBC driver errors occur when an UPDATE or DELETE operation does not modify any rows in the target database, so no additional handling is required by the JDBC Handler. Most JDBC drivers do not return an error when a DELETE or UPDATE affects zero rows, so the JDBC Handler automatically detects a missed UPDATE or DELETE operation and triggers an error to indicate a not-found error to the Replicat process. The Replicat process can then execute the specified REPERROR action. The default error code used by the handler is zero. When you configure this property to a non-zero value, the configured error code value is used when the handler triggers a not-found error. For example:

gg.error.notFoundErrorCodes=1222

gg.error.deadlockErrorCodes

A comma-separated list of error codes that indicate a deadlock error in the target database. For example:

gg.error.deadlockErrorCodes=1213

Setting Codes

Oracle recommends that you set a non-zero error code for the gg.error.duplicateErrorCodes, gg.error.notFoundErrorCodes, and gg.error.deadlockErrorCodes properties because Replicat does not respond to REPERROR and HANDLECOLLISIONS configuration when the error code is set to zero.


Sample Oracle Database Target Error Codes

gg.error.duplicateErrorCodes=1
gg.error.notFoundErrorCodes=0
gg.error.deadlockErrorCodes=60

Sample MySQL Database Target Error Codes

gg.error.duplicateErrorCodes=1022,1062
gg.error.notFoundErrorCodes=1329
gg.error.deadlockErrorCodes=1213,1614

17.4 Sample Configurations

The following topics contain sample configurations for the databases supported by the JDBC Handler from the Java Adapter properties file.
• Sample Oracle Database Target
• Sample Oracle Database Target with JDBC Metadata Provider
• Sample MySQL Database Target
• Sample MySQL Database Target with JDBC Metadata Provider

17.4.1 Sample Oracle Database Target

gg.handlerlist=jdbcwriter
gg.handler.jdbcwriter.type=jdbc

#Handler properties for Oracle database target
gg.handler.jdbcwriter.DriverClass=oracle.jdbc.driver.OracleDriver
gg.handler.jdbcwriter.connectionURL=jdbc:oracle:thin:@:1521:
gg.handler.jdbcwriter.userName=
gg.handler.jdbcwriter.password=
gg.classpath=/path/to/oracle/jdbc/driver/ojdbc5.jar
goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE
gg.log=log4j
gg.log.level=INFO
gg.report.time=30sec
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm

17.4.2 Sample Oracle Database Target with JDBC Metadata Provider

gg.handlerlist=jdbcwriter
gg.handler.jdbcwriter.type=jdbc


#Handler properties for Oracle database target with JDBC Metadata provider
gg.handler.jdbcwriter.DriverClass=oracle.jdbc.driver.OracleDriver
gg.handler.jdbcwriter.connectionURL=jdbc:oracle:thin:@:1521:
gg.handler.jdbcwriter.userName=
gg.handler.jdbcwriter.password=
gg.classpath=/path/to/oracle/jdbc/driver/ojdbc5.jar
#JDBC Metadata provider for Oracle target
gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:oracle:thin:@:1521:
gg.mdp.DriverClassName=oracle.jdbc.driver.OracleDriver
gg.mdp.UserName=
gg.mdp.Password=
goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE
gg.log=log4j
gg.log.level=INFO
gg.report.time=30sec
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm

17.4.3 Sample MySQL Database Target

gg.handlerlist=jdbcwriter
gg.handler.jdbcwriter.type=jdbc

#Handler properties for MySQL database target
gg.handler.jdbcwriter.DriverClass=com.mysql.jdbc.Driver
gg.handler.jdbcwriter.connectionURL=jdbc:mysql://:3306/
gg.handler.jdbcwriter.userName=
gg.handler.jdbcwriter.password=
gg.classpath=/path/to/mysql/jdbc/driver/mysql-connector-java-5.1.39-bin.jar

goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE
gg.log=log4j
gg.log.level=INFO
gg.report.time=30sec
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm


17.4.4 Sample MySQL Database Target with JDBC Metadata Provider

gg.handlerlist=jdbcwriter
gg.handler.jdbcwriter.type=jdbc

#Handler properties for MySQL database target with JDBC Metadata provider
gg.handler.jdbcwriter.DriverClass=com.mysql.jdbc.Driver
gg.handler.jdbcwriter.connectionURL=jdbc:mysql://:3306/
gg.handler.jdbcwriter.userName=
gg.handler.jdbcwriter.password=
gg.classpath=/path/to/mysql/jdbc/driver/mysql-connector-java-5.1.39-bin.jar
#JDBC Metadata provider for MySQL target
gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:mysql://:3306/
gg.mdp.DriverClassName=com.mysql.jdbc.Driver
gg.mdp.UserName=
gg.mdp.Password=

goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE
gg.log=log4j
gg.log.level=INFO
gg.report.time=30sec
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm

18 Using the Java Message Service Handler

The Java Message Service (JMS) Handler allows operations from a trail file to be formatted into messages, and then published to JMS providers like Oracle WebLogic Server, WebSphere, and ActiveMQ.

This chapter describes how to use the JMS Handler.
• Overview
• Setting Up and Running the JMS Handler

18.1 Overview

The Java Message Service is a Java API that allows applications to create, send, receive, and read messages. The JMS API defines a common set of interfaces and associated semantics that allow programs written in the Java programming language to communicate with other messaging implementations. The JMS Handler captures operations from the Oracle GoldenGate trail and sends those messages to the configured JMS provider.

The JMS Handler setup (JNDI configuration) depends on the JMS provider that you use. The following sections provide instructions for configuring the JMS Handler components and running the handler.

Runtime Prerequisites

The JMS provider should be up and running with the required ConnectionFactory, QueueConnectionFactory, and TopicConnectionFactory configured.

Security

Configure SSL according to the JMS provider used.

• Classpath Configuration
Oracle recommends that you store the JMS Handler properties file in the Oracle GoldenGate dirprm directory.
• Java Naming and Directory Interface Configuration
• Handler Configuration
• Sample Configuration Using Oracle WebLogic Server


18.2.1 Classpath Configuration

Oracle recommends that you store the JMS Handler properties file in the Oracle GoldenGate dirprm directory.

The JMS Handler requires that the JMS provider client JARs be in the classpath in order to execute. Additionally, in Java 8, the Java EE Specification classes have been moved out of the JDK to an independent project. JMS is a part of the Java EE Specification, so the Java EE Specification jar is an additional dependency. For more information about downloading the jar, see JMS Dependencies.

The location of the provider's client JARs is similar to:

gg.classpath=path_to_the_providers_client_jars
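
For example, assuming Oracle WebLogic Server as the JMS provider, the classpath might resemble the following sketch; the WebLogic thin client JAR location and the Java EE (javax.jms) API JAR name and path are illustrative assumptions that depend on your installation:

gg.classpath=/u01/oracle/wlserver/server/lib/wlthint3client.jar:/path/to/javax.jms-api-2.0.1.jar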

18.2.2 Java Naming and Directory Interface Configuration

You configure the Java Naming and Directory Interface (JNDI) properties to connect to an Initial Context to look up the connection factory and initial destination.

Table 18-1 JNDI Configuration Properties

java.naming.provider.url: Required. Legal values: a valid provider URL with port. Default: none. Specifies the URL that the handler uses to look up objects on the server. For example, t3://localhost:7001 or, if SSL is enabled, t3s://localhost:7002.

java.naming.factory.initial: Required. Legal values: an initial context factory class name. Default: none. Specifies which initial context factory to use when creating a new initial context object. For Oracle WebLogic Server, the value is weblogic.jndi.WLInitialContextFactory.

java.naming.security.principal: Required. Legal values: a valid user name. Default: none. Specifies the user name to use.

java.naming.security.credentials: Required. Legal values: a valid password. Default: none. Specifies the password for the user.

18.2.3 Handler Configuration

You configure the JMS Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the JMS Handler, you must first configure the handler type by specifying gg.handler.name.type=jms and the other JMS properties as follows:


Table 18-2 JMS Handler Configuration Properties

gg.handler.name.type: Required. Legal values: jms. Default: none. Set to jms to send transactions, operations, and metadata as formatted text messages to a JMS provider. Set to jms_map to send JMS map messages.

gg.handler.name.destination: Required. Legal values: a valid queue or topic name. Default: none. Sets the queue or topic to which the message is sent. This must be correctly configured on the JMS server. For example, queue/A, queue.Test, example.MyTopic.

gg.handler.name.destinationType: Optional. Legal values: queue | topic. Default: queue. Specifies whether the handler is sending to a queue (a single receiver) or a topic (publish/subscribe). The gg.handler.name.queueOrTopic property is an alias of this property. Setting it to queue removes a message from the queue once it has been read. Setting it to topic publishes messages that can be delivered to multiple subscribers.

gg.handler.name.connectionFactory: Required. Legal values: a valid connection factory name. Default: none. Specifies the name of the connection factory to look up using JNDI. The gg.handler.name.ConnectionFactoryJNDIName property is an alias of this property.

gg.handler.name.useJndi: Optional. Legal values: true | false. Default: true. If set to false, JNDI is not used to configure the JMS client. Instead, factories and connections are explicitly constructed.

gg.handler.name.connectionUrl: Optional. Legal values: a valid connection URL. Default: none. Specify only when you are not using JNDI, to explicitly create the connection.

gg.handler.name.connectionFactoryClass: Optional. Legal values: a valid connection factory class. Default: none. Set to access a factory only when not using JNDI. The value of this property is the Java class name to instantiate, which constructs a factory object explicitly.



gg.handler.name.physicalDestination: Optional. Legal values: the name of the queue or topic object obtained through the ConnectionFactory API instead of the JNDI provider. Default: none. The physical destination is relevant when JMS is configured to use JNDI and the ConnectionFactory is resolved through a JNDI lookup. Setting the physical destination means that the queue or topic is resolved by invoking a method on the ConnectionFactory instead of through JNDI.

gg.handler.name.user: Optional. Legal values: a valid user name. Default: none. The user name to send messages to the JMS server.

gg.handler.name.password: Optional. Legal values: a valid password. Default: none. The password to send messages to the JMS server.

gg.handler.name.sessionMode: Optional. Legal values: auto | client | dupsok. Default: auto. Sets the JMS session mode. These values equate to the standard JMS values:
Session.AUTO_ACKNOWLEDGE: The session automatically acknowledges a client's receipt of a message either when the session has successfully returned from a call to receive or when the message listener the session has called to process the message successfully returns.
Session.CLIENT_ACKNOWLEDGE: The client acknowledges a consumed message by calling the message's acknowledge method.
Session.DUPS_OK_ACKNOWLEDGE: This acknowledgment mode instructs the session to lazily acknowledge the delivery of messages.

gg.handler.name.localTX: Optional. Legal values: true | false. Default: true. Sets whether local transactions are used when sending messages. Local transactions are enabled by default, unless sending and committing single messages one at a time. Set to false to disable local transactions.



gg.handler.name.persistent: Optional. Legal values: true | false. Default: true. Sets the delivery mode to persistent or not. If you want the messages to be persistent, the JMS provider must be configured to log the message to stable storage as part of the client's send operation.

gg.handler.name.priority: Optional. Legal values: a valid integer between 0 and 9. Default: 4. The JMS server defines a 10 level priority value, with 0 as the lowest and 9 as the highest.

gg.handler.name.timeToLive: Optional. Legal values: time in milliseconds. Default: 0. Sets the length of time in milliseconds from its dispatch time that a produced message is retained by the message system. Setting it to zero specifies that the time is unlimited.

gg.handler.name.custom: Optional. Legal values: class names implementing oracle.goldengate.messaging.handler.GGMessageLifeCycleListener. Default: none. Configures a message listener allowing properties to be set on the message before it is delivered.



gg.handler.name.format: Optional. Legal values: xml | tx2ml | xml2 | minxml | csv | fixed | text | logdump | json | json_op | json_row | delimitedtext | Velocity template. Default: delimitedtext. Specifies the format used to transform operations and transactions into messages sent to the JMS server. A Velocity template value should point to the location of the template file. Samples are available under AdapterExamples/java-delivery/sample-dirprm/, for example format_op2xml.vm:

<$op.TableName sqlType='$op.sqlType' opType='$op.opType' txInd='$op.txState' ts='$op.Timestamp' numCols='$op.NumColumns' pos='$op.Position'>
#foreach( $col in $op )
#if( ! $col.isMissing())
<$col.Name colIndex='$col.Index'>
#if( $col.hasBefore())
#if( $col.isBeforeNull())
#else
#{end}## if col has 'before' value
#{end}## if col 'before' is null
#if( $col.hasValue())
#if( $col.isNull())
#{else}
#{end}## if col is null
#{end}## if col has value
#{end}## if column is not missing
#{end}## for loop over columns



gg.handler.name.includeTables: Optional. Legal values: a list of valid table names. Default: none. Specifies a list of tables the handler will include. If the schema (or owner) of the table is specified, then only that schema matches the table name; otherwise, the table name matches any schema. If the catalog and schema (or owner) of the table are specified, then only that catalog and schema match the table name; otherwise, the table name matches any catalog and schema. A comma separated list of tables can be specified, for example, to have the handler only process tables foo.customer and bar.orders, or dbo.foo.customer and dbo.bar.orders. If any table matches the include list of tables, the transaction is included. The list of table names specified is case sensitive.

gg.handler.name.excludeTables: Optional. Legal values: a list of valid table names. Default: none. Specifies a list of tables the handler will exclude. To selectively process operations on a table by table basis, the handler must be processing in operation mode. If the handler is processing in transaction mode, a single transaction may contain several operations spanning several tables; if any table matches the exclude list of tables, the transaction is excluded. The list of table names specified is case sensitive.

gg.handler.name.mode: Optional. Legal values: op | tx. Default: op. Specifies whether to output one operation per message (op) or one transaction per message (tx).

18.2.4 Sample Configuration Using Oracle WebLogic Server

java.naming.provider.url=t3://localhost:7001
java.naming.factory.initial=weblogic.jndi.WLInitialContextFactory
java.naming.security.principal=weblogic
java.naming.security.credentials=Welcome
gg.handler.myjms1.type=jms
gg.handler.myjms1.destination=myq

gg.handler.myjms1.connectionFactory=mycf
gg.handler.myjms1.format=xml

19 Using the Kafka Handler

The Kafka Handler is designed to stream change capture data from an Oracle GoldenGate trail to a Kafka topic.

This chapter describes how to use the Kafka Handler.
• Overview
• Detailed Functionality
• Setting Up and Running the Kafka Handler
• Schema Propagation
• Performance Considerations
• About Security
• Metadata Change Events
• Snappy Considerations
• Kafka Interceptor Support
The Kafka Producer client framework supports the use of Producer Interceptors. A Producer Interceptor is simply a user exit from the Kafka Producer client whereby the Interceptor object is instantiated and receives notifications of Kafka message send calls and Kafka message send acknowledgement calls.
• Kafka Partition Selection
Kafka topics comprise one or more partitions. Distribution to multiple partitions is a good way to improve Kafka ingest performance, because the Kafka client parallelizes message sending to different topic/partition combinations. Partition selection is controlled by a calculation in the Kafka client.
• Troubleshooting

19.1 Overview

The Oracle GoldenGate for Big Data Kafka Handler streams change capture data from an Oracle GoldenGate trail to a Kafka topic. Additionally, the Kafka Handler provides functionality to publish messages to a separate schema topic. Schema publication for Avro and JSON is supported.

Apache Kafka is an open source, distributed, partitioned, and replicated messaging service, see http://kafka.apache.org/. Kafka can be run as a single instance or as a cluster on multiple servers. Each Kafka server instance is called a broker. A Kafka topic is a category or feed name to which messages are published by the producers and retrieved by consumers.

The Kafka Handler implements a Kafka producer. The Kafka producer writes serialized change data capture from multiple source tables either to a single configured topic or, by separating source operations, to different Kafka topics (for example, when the topic name corresponds to the fully-qualified source table name).


19.2 Detailed Functionality

Transaction Versus Operation Mode The Kafka Handler sends instances of the Kafka ProducerRecord class to the Kafka producer API, which in turn publishes the ProducerRecord to a Kafka topic. The Kafka ProducerRecord effectively is the implementation of a Kafka message. The ProducerRecord has two components: a key and a value. Both the key and value are represented as byte arrays by the Kafka Handler. This section describes how the Kafka Handler publishes data.

Transaction Mode The following configuration sets the Kafka Handler to transaction mode: gg.handler.name.Mode=tx

In transaction mode, the serialized data is concatenated for every operation in a transaction from the source Oracle GoldenGate trail files. The contents of the concatenated operation data is the value of the Kafka ProducerRecord object. The key of the Kafka ProducerRecord object is NULL. The result is that Kafka messages comprise data from 1 to N operations, where N is the number of operations in the transaction. For grouped transactions, all the data for all the operations are concatenated into a single Kafka message. Therefore, grouped transactions may result in very large Kafka messages that contain data for a large number of operations.

Operation Mode The following configuration sets the Kafka Handler to operation mode: gg.handler.name.Mode=op

In operation mode, the serialized data for each operation is placed into an individual ProducerRecord object as the value. The ProducerRecord key is the fully qualified table name of the source operation. The ProducerRecord is immediately sent using the Kafka Producer API. This means that there is a 1 to 1 relationship between the incoming operations and the number of Kafka messages produced.

Blocking Versus Non-Blocking Mode The Kafka Handler can send messages to Kafka in either blocking mode (synchronous) or non-blocking mode (asynchronous).

Blocking Mode The following configuration property sets the Kafka Handler to blocking mode: gg.handler.name.BlockingSend=true

Messages are delivered to Kafka on a synchronous basis. The Kafka Handler does not send the next message until the current message has been written to the intended topic and an acknowledgement has been received. Blocking mode provides the best guarantee of message delivery but at the cost of reduced performance.


You must never set the Kafka Producer linger.ms variable when in blocking mode, as this causes the Kafka producer to wait for the entire timeout period before sending the message to the Kafka broker. When this happens, the Kafka Handler waits for acknowledgement that the message has been sent while at the same time the Kafka Producer buffers messages to be sent to the Kafka brokers.

Non-Blocking Mode The following configuration property sets the Kafka Handler to non-blocking mode: gg.handler.name.BlockingSend=false

Messages are delivered to Kafka asynchronously. Kafka messages are published one after the other without waiting for acknowledgements. The Kafka Producer client may buffer incoming messages in order to increase throughput.

On each transaction commit, the Kafka producer flush call is invoked to ensure that all outstanding messages are transferred to the Kafka cluster. This allows the Kafka Handler to safely checkpoint, ensuring zero data loss. Invocation of the Kafka producer flush call is not affected by the linger.ms duration.

You can control when the Kafka Producer flushes data to the Kafka broker by a number of configurable properties in the Kafka producer configuration file. In order to enable batch sending of messages by the Kafka Producer, both the batch.size and linger.ms Kafka Producer properties must be set. The batch.size controls the maximum number of bytes to buffer before a send to Kafka, while the linger.ms variable controls the maximum milliseconds to wait before sending data. Data is sent to Kafka once the batch.size is reached or when the linger.ms period expires, whichever comes first. Setting only the batch.size variable, without linger.ms, causes messages to be sent immediately to Kafka.
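
As a hedged illustration only, batching might be enabled in the Kafka Producer configuration file with settings such as the following; the file name and the values are assumptions to be tuned for your environment, not recommendations from this guide:

# dirprm/custom_kafka_producer.properties (file name and location are assumptions)
bootstrap.servers=localhost:9092
acks=1
# send once 16 KB is buffered or 100 ms has elapsed, whichever comes first
batch.size=16384
linger.ms=100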

Topic Name Selection

The topic is resolved at runtime using this configuration parameter:

gg.handler.name.topicMappingTemplate

You can configure a static string, keywords, or a combination of static strings and keywords to dynamically resolve the topic name at runtime based on the context of the current operation. For more information, see Using Templates to Resolve the Topic Name and Message Key.
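For example, the following setting (the handler name kafkahandler and the ogg_ prefix are illustrative) resolves a separate topic for each source table:

gg.handler.kafkahandler.topicMappingTemplate=ogg_${fullyQualifiedTableName}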

Kafka Broker Settings

To configure topics to be created automatically, set the auto.create.topics.enable property to true. This is the default setting.

If you set the auto.create.topics.enable property to false, then you must manually create topics before you start the Replicat process.
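For example, a topic could be created manually with the Kafka command-line tools before starting the Replicat process; the topic name, partition count, and replication factor below are placeholders, and newer Kafka versions use the --bootstrap-server option instead of --zookeeper:

bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic oggtopic --partitions 3 --replication-factor 1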

Schema Propagation

The schema data for all tables is delivered to the schema topic that is configured with the schemaTopicName property. For more information, see Schema Propagation.


19.3 Setting Up and Running the Kafka Handler

Instructions for configuring the Kafka Handler components and running the handler are described in this section. You must install and correctly configure Kafka either as a single node or a clustered instance, see http://kafka.apache.org/documentation.html. If you are using a Kafka distribution other than Apache Kafka, then consult the documentation for your Kafka distribution for installation and configuration instructions. Zookeeper, a prerequisite component for Kafka, and the Kafka broker (or brokers) must be up and running. Oracle recommends and considers it best practice that the data topic and the schema topic (if applicable) are preconfigured on the running Kafka brokers. You can create Kafka topics dynamically. However, this relies on the Kafka brokers being configured to allow dynamic topics. If the Kafka broker is not collocated with the Kafka Handler process, then the remote host port must be reachable from the machine running the Kafka Handler.
• Classpath Configuration
• Kafka Handler Configuration
• Java Adapter Properties File
• Kafka Producer Configuration File
• Using Templates to Resolve the Topic Name and Message Key
• Kafka Configuring with Kerberos
• Kafka SSL Support
Kafka supports SSL connectivity between Kafka clients and the Kafka cluster. SSL connectivity provides both authentication and encryption of messages transported between the client and the server.

19.3.1 Classpath Configuration

For the Kafka Handler to connect to Kafka and run, the Kafka Producer properties file and the Kafka client JARs must be configured in the gg.classpath configuration variable. The Kafka client JARs must match the version of Kafka that the Kafka Handler is connecting to. For a list of the required client JAR files by version, see Kafka Handler Client Dependencies. The recommended storage location for the Kafka Producer properties file is the Oracle GoldenGate dirprm directory.

The default location of the Kafka client JARs is Kafka_Home/libs/*.

The gg.classpath must be configured precisely. The path to the Kafka Producer properties file must contain the path with no wildcard appended. If the * wildcard is included in the path to the Kafka Producer properties file, the file is not picked up. Conversely, the path to the dependency JARs must include the * wildcard character in order to include all of the JAR files in that directory in the associated classpath. Do not use *.jar. The following is an example of a correctly configured classpath:


gg.classpath={kafka install dir}/libs/*

19.3.2 Kafka Handler Configuration

The following are the configurable values for the Kafka Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the Kafka Handler, you must first configure the handler type by specifying gg.handler.name.type=kafka and the other Kafka properties as follows:

Table 19-1 Configuration Properties for Kafka Handler

gg.handlerlist
Required. Legal value: a handler name (choice of any name). Default: None.
List of handlers to be used.

gg.handler.name.type
Required. Legal value: kafka. Default: None.
Type of handler to use.

gg.handler.name.KafkaProducerConfigFile
Optional. Legal value: any custom file name. Default: kafka-producer-default.properties.
File name in the classpath that holds Apache Kafka properties to configure the Apache Kafka producer.

gg.handler.name.Format
Optional. Legal value: a formatter class or short code. Default: delimitedtext.
Formatter to use to format the payload. Can be one of xml, delimitedtext, json, json_row, avro_row, avro_op.

gg.handler.name.SchemaTopicName
Required when schema delivery is required. Legal value: the name of the schema topic. Default: None.
Topic name where schema data will be delivered. If this property is not set, the schema is not propagated. Schemas are propagated only for Avro formatters.

gg.handler.name.SchemaPrClassName
Optional. Legal value: the fully qualified class name of a custom class that implements the Oracle GoldenGate for Big Data Kafka Handler's CreateProducerRecord Java interface. Default: the provided implementation class oracle.goldengate.handler.kafka.ProducerRecord.
The schema is also propagated as a ProducerRecord. The default key is the fully qualified table name. If this needs to be changed for schema records, create a custom implementation of the CreateProducerRecord interface and set this property to point to the fully qualified name of the new class.

gg.handler.name.BlockingSend
Optional. Legal values: true | false. Default: false.
If this property is set to true, then delivery to Kafka works in a completely synchronous model. The next payload is sent only after the current payload has been written to the intended topic and an acknowledgement has been received. In transaction mode, this provides exactly once semantics. If this property is set to false, then delivery to Kafka works in an asynchronous model. Payloads are sent one after the other without waiting for acknowledgements. Kafka internal queues may buffer contents to increase throughput. Checkpoints are made only when acknowledgements are received from Kafka brokers using Java callbacks.

gg.handler.name.mode
Optional. Legal values: tx | op. Default: tx.
With the Kafka Handler in operation mode, each change capture data record (Insert, Update, Delete, and so on) payload is represented as a Kafka ProducerRecord and is flushed one at a time. With the Kafka Handler in transaction mode, all operations within a source transaction are represented as a single Kafka ProducerRecord. This combined byte payload is flushed on a transaction commit event.

gg.handler.name.topicMappingTemplate
Required. Legal value: a template string value to resolve the Kafka topic name at runtime. Default: None.
See Using Templates to Resolve the Topic Name and Message Key.

gg.handler.name.keyMappingTemplate
Required. Legal value: a template string value to resolve the Kafka message key at runtime. Default: None.
See Using Templates to Resolve the Topic Name and Message Key.

gg.handler.name.logSuccessfullySentMessages
Optional. Legal values: true | false. Default: true.
Set to true, the Kafka Handler logs at the INFO level the messages that have been successfully sent to Kafka. Enabling this property has a negative impact on performance.

gg.handler.name.metaHeadersTemplate
Optional. Legal value: a comma-delimited list of metacolumn keywords. Default: None.
Allows the user to select metacolumns to inject context-based key value pairs into Kafka message headers using the metacolumn keyword syntax.

19.3.3 Java Adapter Properties File

The following is a sample configuration for the Kafka Handler from the Adapter properties file:

gg.handlerlist = kafkahandler
gg.handler.kafkahandler.Type = kafka
gg.handler.kafkahandler.KafkaProducerConfigFile = custom_kafka_producer.properties
gg.handler.kafkahandler.topicMappingTemplate=oggtopic
gg.handler.kafkahandler.keyMappingTemplate=${currentTimestamp}
gg.handler.kafkahandler.Format = avro_op
gg.handler.kafkahandler.SchemaTopicName = oggSchemaTopic
gg.handler.kafkahandler.SchemaPrClassName = com.company.kafkaProdRec.SchemaRecord
gg.handler.kafkahandler.Mode = tx
gg.handler.kafkahandler.BlockingSend = true

You can find a sample Replicat configuration and a Java Adapter properties file for a Kafka integration in the following directory:

GoldenGate_install_directory/AdapterExamples/big-data/kafka

19.3.4 Kafka Producer Configuration File

The Kafka Handler must access a Kafka producer configuration file in order to publish messages to Kafka. The file name of the Kafka producer configuration file is controlled by the following configuration in the Kafka Handler properties.

gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties

The Kafka Handler attempts to locate and load the Kafka producer configuration file by using the Java classpath. Therefore, the Java classpath must include the directory containing the Kafka Producer Configuration File. The Kafka producer configuration file contains Kafka proprietary properties. The Kafka documentation provides configuration information for the 0.8.2.0 Kafka producer interface properties. The Kafka Handler uses these properties to resolve the host and port of the Kafka brokers, and properties in the Kafka producer configuration file control the behavior of the interaction between the Kafka producer client and the Kafka brokers. A sample configuration file for the Kafka producer is as follows:

bootstrap.servers=localhost:9092
acks = 1
compression.type = gzip


reconnect.backoff.ms = 1000

value.serializer = org.apache.kafka.common.serialization.ByteArraySerializer
key.serializer = org.apache.kafka.common.serialization.ByteArraySerializer
# 100KB per partition
batch.size = 102400
linger.ms = 0
max.request.size = 1048576
send.buffer.bytes = 131072

19.3.5 Using Templates to Resolve the Topic Name and Message Key

The Kafka Handler provides functionality to resolve the topic name and the message key at runtime using a template configuration value. Templates allow you to configure static values and keywords. Keywords are used to dynamically replace the keyword with the context of the current processing. The templates use the following configuration properties:

gg.handler.name.topicMappingTemplate
gg.handler.name.keyMappingTemplate

Template Modes

Source database transactions are made up of one or more individual operations, that is, the individual inserts, updates, and deletes. The Kafka Handler can be configured to send one message per operation (insert, update, delete), or alternatively can be configured to group operations into messages at the transaction level. Many template keywords resolve data based on the context of an individual source database operation. Therefore, many of the keywords do not work when sending messages at the transaction level. For example, ${fullyQualifiedTableName} does not work when sending messages at the transaction level because it resolves to the qualified source table name for an individual operation, and a transaction can contain operations for many source tables. Resolving the fully qualified table name for messages at the transaction level is non-deterministic, so the process abends at runtime.

Template Keywords

The following list includes an indication of whether each keyword is supported for transaction-level messages.

${fullyQualifiedTableName}
Resolves to the fully qualified table name including the period (.) delimiter between the catalog, schema, and table names. For example, test.dbo.table1. Transaction message support: No.

${catalogName}
Resolves to the catalog name. Transaction message support: No.

${schemaName}
Resolves to the schema name. Transaction message support: No.

${tableName}
Resolves to the short table name. Transaction message support: No.

${opType}
Resolves to the type of the operation: INSERT, UPDATE, DELETE, or TRUNCATE. Transaction message support: No.

${primaryKeys}
Resolves to the concatenated primary key values delimited by an underscore (_) character. Transaction message support: No.

${position}
The sequence number of the source trail file followed by the offset (RBA). Transaction message support: Yes.

${opTimestamp}
The operation timestamp from the source trail file. Transaction message support: Yes.

${emptyString}
Resolves to "". Transaction message support: Yes.

${groupName}
Resolves to the name of the Replicat process. If using coordinated delivery, it resolves to the name of the Replicat process with the Replicat thread number appended. Transaction message support: Yes.

${staticMap[]}
Resolves to a static value where the key is the fully qualified table name. The keys and values are designated inside the square brace in the following format: ${staticMap[dbo.table1=value1,dbo.table2=value2]}. Transaction message support: No.

${columnValue[]}
Resolves to a column value where the key is the fully qualified table name and the value is the column name to be resolved. For example: ${columnValue[dbo.table1=col1,dbo.table2=col2]}. Transaction message support: No.

${currentTimestamp} or ${currentTimestamp[]}
Resolves to the current timestamp. You can control the format of the current timestamp using Java-based formatting as described in the SimpleDateFormat class, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html. Examples: ${currentTimestamp} or ${currentTimestamp[yyyy-MM-dd HH:mm:ss.SSS]}. Transaction message support: Yes.

${null}
Resolves to a NULL string. Transaction message support: Yes.

${custom[]}
It is possible to write a custom value resolver. If required, contact Oracle Support. Transaction message support: Implementation dependent.

${token[]}
Resolves a token value. Transaction message support: No.

Example Templates

The following describes example template configuration values and the resolved values.

Example template: ${groupName}_${fullyQualifiedTableName}
Resolved value: KAFKA001_dbo.table1

Example template: prefix_${schemaName}_${tableName}_suffix
Resolved value: prefix_dbo_table1_suffix

Example template: ${currentTimestamp[yyyy-MM-dd HH:mm:ss.SSS]}
Resolved value: 2017-05-17 11:45:34.254

19.3.6 Kafka Configuring with Kerberos

Use these steps to configure a Kafka Handler Replicat with Kerberos to enable a Cloudera instance to process an Oracle GoldenGate for Big Data trail to a Kafka topic:

1. In GGSCI, add a Kafka Replicat:
GGSCI> add replicat kafka, exttrail dirdat/gg

2. Configure a prm file with these properties:
replicat kafka
discardfile ./dirrpt/kafkax.dsc, purge
SETENV (TZ=PST8PDT)
GETTRUNCATES
GETUPDATEBEFORES
ReportCount Every 1000 Records, Rate
MAP qasource.*, target qatarget.*;

3. Configure a Replicat properties file as follows:
###KAFKA Properties file ###
gg.log=log4j
gg.log.level=info
gg.report.time=30sec

###Kafka Classpath settings ###
gg.classpath=/opt/cloudera/parcels/KAFKA-2.1.0-1.2.1.0.p0.115/lib/kafka/libs/*
jvm.bootoptions=-Xmx64m -Xms64m -Djava.class.path=./ggjava/ggjava.jar -Dlog4j.configuration=log4j.properties -Djava.security.auth.login.config=/scratch/ydama/ogg/v123211/dirprm/jaas.conf -Djava.security.krb5.conf=/etc/krb5.conf

javawriter.stats.full=TRUE
javawriter.stats.display=TRUE

### native library config ###
goldengate.userexit.nochkpt=TRUE
goldengate.userexit.timestamp=utc

### Kafka handler properties ###
gg.handlerlist = kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=kafka-producer.properties
gg.handler.kafkahandler.format=delimitedtext
gg.handler.kafkahandler.format.PkUpdateHandling=update
gg.handler.kafkahandler.mode=tx
gg.handler.kafkahandler.format.includeCurrentTimestamp=false
#gg.handler.kafkahandler.maxGroupSize=100
#gg.handler.kafkahandler.minGroupSize=50
gg.handler.kafkahandler.format.fieldDelimiter=|
gg.handler.kafkahandler.format.lineDelimiter=CDATA[\n]
gg.handler.kafkahandler.topicMappingTemplate=myoggtopic
gg.handler.kafkahandler.keyMappingTemplate=${position}

4. Configure a Kafka Producer file with these properties:
bootstrap.servers=10.245.172.52:9092
acks=1
#compression.type=snappy
reconnect.backoff.ms=1000
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
batch.size=1024
linger.ms=2000

security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
sasl.mechanism=GSSAPI

5. Configure a jaas.conf file with these properties:
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
keyTab="/scratch/ydama/ogg/v123211/dirtmp/keytabs/slc06unm/kafka.keytab"
principal="kafka/[email protected]";
};

6. Ensure that you have the latest keytab files from the Cloudera instance to connect to secured Kafka topics.
7. Start the Replicat from GGSCI and make sure that it is running with INFO ALL.
8. Review the Replicat report to see the total number of records processed. The report is similar to:

Oracle GoldenGate for Big Data, 12.3.2.1.1.005

Copyright (c) 2007, 2018. Oracle and/or its affiliates. All rights reserved

Built with Java 1.8.0_161 (class version: 52.0)

2018-08-05 22:15:28 INFO OGG-01815 Virtual Memory Facilities for: COM anon alloc: mmap(MAP_ANON) anon free: munmap file alloc: mmap(MAP_SHARED) file free: munmap target directories: /scratch/ydama/ogg/v123211/dirtmp.

Database Version:

Database Language and Character Set:

*********************************************************************** ** Run Time Messages ** ***********************************************************************

2018-08-05 22:15:28 INFO OGG-02243 Opened trail file /scratch/ydama/ogg/ v123211/dirdat/kfkCustR/gg000000 at 2018-08-05 22:15:28.258810.

2018-08-05 22:15:28 INFO OGG-03506 The source database character set, as determined from the trail file, is UTF-8.

2018-08-05 22:15:28 INFO OGG-06506 Wildcard MAP resolved (entry qasource.*): MAP "QASOURCE"."BDCUSTMER1", target qatarget."BDCUSTMER1".

2018-08-05 22:15:28 INFO OGG-02756 The definition for table QASOURCE.BDCUSTMER1 is obtained from the trail file.

2018-08-05 22:15:28 INFO OGG-06511 Using following columns in default map by name: CUST_CODE, NAME, CITY, STATE.

2018-08-05 22:15:28 INFO OGG-06510 Using the following key columns for target table qatarget.BDCUSTMER1: CUST_CODE.

2018-08-05 22:15:29 INFO OGG-06506 Wildcard MAP resolved (entry qasource.*): MAP "QASOURCE"."BDCUSTORD1", target qatarget."BDCUSTORD1".

2018-08-05 22:15:29 INFO OGG-02756 The definition for table QASOURCE.BDCUSTORD1 is obtained from the trail file.

2018-08-05 22:15:29 INFO OGG-06511 Using following columns in default map by name: CUST_CODE, ORDER_DATE, PRODUCT_CODE, ORDER_ID, PRODUCT_PRICE, PRODUCT_AMOUNT, TRANSACTION_ID.

2018-08-05 22:15:29 INFO OGG-06510 Using the following key columns for target table qatarget.BDCUSTORD1: CUST_CODE, ORDER_DATE, PRODUCT_CODE,


ORDER_ID.

2018-08-05 22:15:33 INFO OGG-01021 Command received from GGSCI: STATS.

2018-08-05 22:16:03 INFO OGG-01971 The previous message, 'INFO OGG-01021', repeated 1 times.

2018-08-05 22:43:27 INFO OGG-01021 Command received from GGSCI: STOP.

*********************************************************************** * ** Run Time Statistics ** * ***********************************************************************

Last record for the last committed transaction is the following: ______Trail name : /scratch/ydama/ogg/v123211/dirdat/kfkCustR/gg000000 Hdr-Ind : E (x45) Partition : . (x0c) UndoFlag : . (x00) BeforeAfter: A (x41) RecLength : 0 (x0000) IO Time : 2015-08-14 12:02:20.022027 IOType : 100 (x64) OrigNode : 255 (xff) TransInd : . (x03) FormatType : R (x52) SyskeyLen : 0 (x00) Incomplete : . (x00) AuditRBA : 78233 AuditPos : 23968384 Continued : N (x00) RecCount : 1 (x01)

2015-08-14 12:02:20.022027 GGSPurgedata Len 0 RBA 6473 TDR Index: 2 ______

Reading /scratch/ydama/ogg/v123211/dirdat/kfkCustR/gg000000, current RBA 6556, 20 records, m_file_seqno = 0, m_file_rba = 6556

Report at 2018-08-05 22:43:27 (activity since 2018-08-05 22:15:28)

From Table QASOURCE.BDCUSTMER1 to qatarget.BDCUSTMER1: # inserts: 5 # updates: 1 # deletes: 0 # discards: 0 From Table QASOURCE.BDCUSTORD1 to qatarget.BDCUSTORD1: # inserts: 5 # updates: 3 # deletes: 5 # truncates: 1 # discards: 0

9. Ensure that the secure Kafka topic is created: /kafka/bin/kafka-topics.sh --zookeeper slc06unm:2181 --list myoggtopic

10. Review the contents of the secure Kafka topic:
a. Create a consumer.properties file containing:
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka

b. Set this environment variable:
export KAFKA_OPTS="-Djava.security.auth.login.config=/scratch/ogg/v123211/dirprm/jaas.conf"


c. Run the consumer utility to check the records:
/kafka/bin/kafka-console-consumer.sh --bootstrap-server sys06:9092 --topic myoggtopic --new-consumer --consumer.config consumer.properties

19.3.7 Kafka SSL Support

Kafka supports SSL connectivity between Kafka clients and the Kafka cluster. SSL connectivity provides both authentication and encryption of messages transported between the client and the server. SSL can be configured for server authentication (the client authenticates the server) but is generally configured for mutual authentication (both client and server authenticate each other). In an SSL mutual authentication, each side of the connection retrieves a certificate from its keystore and passes it to the other side of the connection, which verifies the certificate against the certificate in its truststore. When you set up SSL, see the Kafka documentation for the specific Kafka version that you are running. The Kafka documentation also provides information on how to do the following:
• Set up the Kafka cluster for SSL
• Create self-signed certificates in a keystore/truststore file
• Configure the Kafka clients for SSL
Oracle recommends that you implement SSL connectivity using the Kafka producer and consumer command-line utilities before attempting to use it with Oracle GoldenGate for Big Data. SSL connectivity should be confirmed between the machine hosting Oracle GoldenGate for Big Data and the Kafka cluster. This action proves that SSL connectivity is correctly set up and working before introducing Oracle GoldenGate for Big Data. The following is an example of a Kafka producer configuration with SSL mutual authentication:

bootstrap.servers=localhost:9092
acks=1
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
security.protocol=ssl
ssl.keystore.location=/var/private/ssl/server.keystore.jks
ssl.keystore.password=test1234
ssl.key.password=test1234
ssl.truststore.location=/var/private/ssl/server.truststore.jks
ssl.truststore.password=test1234

19.4 Schema Propagation

The Kafka Handler provides the ability to publish schemas to a schema topic. Currently, the Avro Row and Operation formatters are the only formatters that are enabled for schema publishing. If the Kafka Handler schemaTopicName property is set, then the schema is published for the following events:
• The Avro schema for a specific table is published the first time an operation for that table is encountered.
• If the Kafka Handler receives a metadata change event, the schema is flushed. The regenerated Avro schema for a specific table is published the next time an operation for that table is encountered.


• If the Avro wrapping functionality is enabled, then the generic wrapper Avro schema is published the first time that any operation is encountered. The generic wrapper Avro schema functionality is enabled in the Avro formatter configuration, see Avro Row Formatter and The Avro Operation Formatter.

The Kafka ProducerRecord value is the schema, and the key is the fully qualified table name. Because Avro messages directly depend on an Avro schema, users of Avro over Kafka may encounter issues. Avro messages are not human readable because they are binary. To deserialize an Avro message, the receiver must first have the correct Avro schema, but because each table from the source database results in a separate Avro schema, this can be difficult. The receiver of a Kafka message cannot determine which Avro schema to use to deserialize individual messages when the source Oracle GoldenGate trail file includes operations from multiple tables. To solve this problem, you can wrap the specialized Avro messages in a generic Avro message wrapper. This generic Avro wrapper provides the fully qualified table name, the hashcode of the schema string, and the wrapped Avro message. The receiver can use the fully qualified table name and the hashcode of the schema string to resolve the associated schema of the wrapped message, and then use that schema to deserialize the wrapped message.

19.5 Performance Considerations

Oracle recommends that you do not use the linger.ms setting in the Kafka producer config file when gg.handler.name.BlockingSend is set to true. This causes each send to block for at least the value of linger.ms, leading to major performance issues because the Kafka Handler configuration and the Kafka Producer configuration are in conflict with each other. This configuration results in a temporary deadlock scenario, where the Kafka Handler waits to receive a send acknowledgement while the Kafka producer waits for more messages before sending. The deadlock resolves when the linger.ms period expires. This behavior repeats for every message sent.

For the best performance, Oracle recommends that you set the Kafka Handler to operate in operation mode using non-blocking (asynchronous) calls to the Kafka producer. Use the following configuration in your Java Adapter properties file:

gg.handler.name.mode = op
gg.handler.name.BlockingSend = false

Additionally, Oracle recommends that you set the batch.size and linger.ms values in the Kafka Producer properties file. These values are highly dependent upon the use case scenario. Typically, higher values result in better throughput, but latency is increased. Smaller values in these properties reduces latency but overall throughput decreases. Use of the Replicat variable GROUPTRANSOPS also improves performance. The recommended setting is 10000.
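For example, the following Replicat parameter file fragment illustrates the recommended GROUPTRANSOPS setting; the Replicat name and MAP clause are placeholders:

REPLICAT rkafka
-- group up to 10,000 source operations into a single target transaction
GROUPTRANSOPS 10000
MAP qasource.*, TARGET qatarget.*;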

If the serialized operations from the source trail file must be delivered in individual Kafka messages, then the Kafka Handler must be set to operation mode:

gg.handler.name.mode = op


19.6 About Security

Kafka version 0.9.0.0 introduced security through SSL/TLS and SASL (Kerberos). You can secure the Kafka Handler using one or both of the SSL/TLS and SASL security offerings. The Kafka producer client libraries provide an abstraction of security functionality from the integrations that use those libraries. The Kafka Handler is effectively abstracted from security functionality. Enabling security requires setting up security for the Kafka cluster, connecting machines, and then configuring the Kafka producer properties file with the required security properties. For detailed instructions about securing the Kafka cluster, see the Kafka documentation.

You may encounter the inability to decrypt the Kerberos password from the keytab file. This causes the Kerberos authentication to fall back to interactive mode, which cannot work because it is being invoked programmatically. The cause of this problem is that the Java Cryptography Extension (JCE) is not installed in the Java Runtime Environment (JRE). Ensure that the JCE is loaded in the JRE, see http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html.

19.7 Metadata Change Events

Metadata change events are now handled in the Kafka Handler. This is relevant only if you have configured a schema topic and the formatter used supports schema propagation (currently Avro row and Avro Operation formatters). The next time an operation is encountered for a table for which the schema has changed, the updated schema is published to the schema topic. To support metadata change events, the Oracle GoldenGate process capturing changes in the source database must support the Oracle GoldenGate metadata in trail feature, which was introduced in Oracle GoldenGate 12c (12.2).

19.8 Snappy Considerations

The Kafka Producer configuration file supports the use of compression. One of the configurable options is Snappy, an open source compression and decompression (codec) library that provides better performance than other codec libraries. The Snappy JAR does not run on all platforms. Snappy may work on Linux systems, but may or may not work on other UNIX and Windows implementations. If you want to use Snappy compression, test Snappy on all required systems before implementing compression using Snappy. If Snappy does not port to all required systems, then Oracle recommends using an alternate codec library.
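If testing succeeds, compression can be enabled in the Kafka Producer configuration file; the following lines are illustrative:

compression.type=snappy
# or, as an alternative codec:
# compression.type=gzip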

19.9 Kafka Interceptor Support

The Kafka Producer client framework supports the use of Producer Interceptors. A Producer Interceptor is simply a user exit from the Kafka Producer client whereby the Interceptor object is instantiated and receives notifications of Kafka message send calls and Kafka message send acknowledgement calls. The typical use case for Interceptors is monitoring. Kafka Producer Interceptors must conform to the interface org.apache.kafka.clients.producer.ProducerInterceptor. The Kafka Handler supports Producer Interceptor usage.


The requirements for using Interceptors in the Handlers are as follows:
• The Kafka Producer configuration property "interceptor.classes" must be configured with the class name of the Interceptor(s) to be invoked.
• In order to invoke the Interceptor(s), the jar files plus any dependency jars must be available to the JVM. Therefore, the jar files containing the Interceptor(s) plus any dependency jars must be added to the gg.classpath in the Handler configuration file.
For more information, see the Kafka documentation. A minimal sketch of an interceptor follows.
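The following is an illustrative monitoring interceptor; the class name com.example.monitoring.CountingInterceptor and its counters are hypothetical and are not part of Oracle GoldenGate or Kafka. It assumes byte array keys and values, which matches the ByteArraySerializer settings shown in the sample producer configurations.

package com.example.monitoring;

import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class CountingInterceptor implements ProducerInterceptor<byte[], byte[]> {
    private final AtomicLong sent = new AtomicLong();
    private final AtomicLong acked = new AtomicLong();

    @Override
    public ProducerRecord<byte[], byte[]> onSend(ProducerRecord<byte[], byte[]> record) {
        sent.incrementAndGet();   // called before the record is serialized and sent
        return record;            // return the record unchanged
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        if (exception == null) {
            acked.incrementAndGet();  // called when the broker acknowledges the record
        }
    }

    @Override
    public void close() {
        System.out.println("sent=" + sent.get() + " acked=" + acked.get());
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration required for this example
    }
}

To invoke it, the class would be registered in the Kafka Producer configuration file, for example interceptor.classes=com.example.monitoring.CountingInterceptor, and the jar containing it added to gg.classpath.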

19.10 Kafka Partition Selection

Kafka topics comprise one or more partitions. Distribution to multiple partitions is a good way to improve Kafka ingest performance, because the Kafka client parallelizes message sending to different topic/partition combinations. Partition selection is controlled by the following calculation in the Kafka client:

(Hash of the Kafka message key) modulus (the number of partitions) = selected partition number

The Kafka message key is selected by the following configuration value:

gg.handler.{your handler name}.keyMappingTemplate=
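The following is a minimal, illustrative Java sketch of that calculation; the class and method names are hypothetical, and the actual Apache Kafka client uses a murmur2 hash of the serialized key rather than the stand-in hash shown here.

import java.util.Arrays;

public class PartitionSelectionSketch {

    static int selectPartition(byte[] messageKey, int numberOfPartitions) {
        int hash = Arrays.hashCode(messageKey);          // stand-in for the client's murmur2 hash
        return (hash & 0x7fffffff) % numberOfPartitions; // force non-negative, then take the modulus
    }

    public static void main(String[] args) {
        byte[] key = "dbo.table1_value1".getBytes();     // e.g. a ${primaryKeys}-style key
        System.out.println("Selected partition: " + selectPartition(key, 6));
    }
}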

If the keyMappingTemplate parameter is set to a value which generates a static key, all messages will go to the same partition. The following is an example of a static key:

gg.handler.{your handler name}.keyMappingTemplate=StaticValue

If this parameter is set to a value which generates a key that changes infrequently, partition selection changes infrequently. In the following example the table name is used as the message key. Every operation for a specific source table will have the same key and thereby route to the same partition:

gg.handler.{your handler name}.keyMappingTemplate=${tableName}

A null Kafka message key distributes to the partitions on a round-robin basis. To do this, set the following:

gg.handler.{your handler name}.keyMappingTemplate=${null}

The recommended setting for configuration of the mapping key is the following:

gg.handler.{your handler name}.keyMappingTemplate=${primaryKeys}

This generates a Kafka message key that is the concatenated and delimited primary key values. Operations for each row should have a unique primary key (or keys), thereby generating a unique Kafka message key for each row. Another important consideration is that Kafka messages sent to different partitions are not guaranteed to be delivered to a Kafka consumer in the original order sent. This is part of the Kafka specification. Order is only maintained within a partition.


Using primary keys as the Kafka message key means that operations for the same row, which have the same primary key(s), generate the same Kafka message key, and therefore are sent to the same Kafka partition. In this way, order is maintained for operations for the same row. At the DEBUG log level, the Kafka message coordinates (topic, partition, and offset) are logged to the .log file for successfully sent messages.

19.11 Troubleshooting

• Verify the Kafka Setup
• Classpath Issues
• Invalid Kafka Version
• Kafka Producer Properties File Not Found
• Kafka Connection Problem

19.11.1 Verify the Kafka Setup

You can use the command line Kafka producer to write dummy data to a Kafka topic, and you can use a Kafka consumer to read this data from the Kafka topic. Use this method to verify the setup and read/write permissions to Kafka topics on disk. For more information, see http://kafka.apache.org/documentation.html#quickstart.
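For example, the Kafka command-line utilities can be used to publish and consume a test message; the topic name test and the host and port are placeholders, and the exact flags vary by Kafka version:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning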

19.11.2 Classpath Issues

Java classpath problems are common. Such problems may include a ClassNotFoundException problem in the log4j log file or an error resolving the classpath because of a typographic error in the gg.classpath variable. The Kafka client libraries do not ship with the Oracle GoldenGate for Big Data product. You must obtain the correct version of the Kafka client libraries and properly configure the gg.classpath property in the Java Adapter properties file to correctly resolve the Kafka client libraries as described in Classpath Configuration.

19.11.3 Invalid Kafka Version

The Kafka Handler does not support Kafka versions 0.8.2.2 or older. If you run an unsupported version of Kafka, a runtime Java exception, java.lang.NoSuchMethodError, occurs. It implies that the org.apache.kafka.clients.producer.KafkaProducer.flush() method cannot be found. If you encounter this error, migrate to Kafka version 0.9.0.0 or later.

19.11.4 Kafka Producer Properties File Not Found

This problem typically results in the following exception:

ERROR 2015-11-11 11:49:08,482 [main] Error loading the kafka producer properties

Check the gg.handler.kafkahandler.KafkaProducerConfigFile configuration variable to ensure that the Kafka Producer Configuration file name is set correctly. Check the gg.classpath variable to verify that the classpath includes the path to the Kafka Producer properties file, and that the path to the properties file does not contain a * wildcard at the end.
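A minimal sketch of a configuration that satisfies both checks, assuming the producer properties file is stored in the Oracle GoldenGate dirprm directory:

gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
gg.classpath=dirprm:{kafka_install_dir}/libs/*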


19.11.5 Kafka Connection Problem

This problem occurs when the Kafka Handler is unable to connect to Kafka. You receive the following warnings:

WARN 2015-11-11 11:25:50,784 [kafka-producer-network-thread | producer-1] WARN (Selector.java:276) - Error in I/O with localhost/127.0.0.1 java.net.ConnectException: Connection refused

The connection retry interval expires, and the Kafka Handler process abends. Ensure that the Kafka Broker is running and that the host and port provided in the Kafka Producer Properties file are correct. You can use network shell commands (such as netstat -l) on the machine hosting the Kafka broker to verify that Kafka is listening on the expected port.

20 Using the Kafka Connect Handler

The Kafka Connect Handler is an extension of the standard Kafka messaging functionality. This chapter describes how to use the Kafka Connect Handler.
• Overview
The Oracle GoldenGate Kafka Connect Handler is an extension of the standard Kafka messaging functionality. Kafka Connect is a functional layer on top of the standard Kafka Producer and Consumer interfaces. It provides standardization for messaging to make it easier to add new source and target systems into your topology.
• Detailed Functionality
• Setting Up and Running the Kafka Connect Handler
• Kafka Connect Handler Performance Considerations
• Kafka Interceptor Support
The Kafka Producer client framework supports the use of Producer Interceptors. A Producer Interceptor is simply a user exit from the Kafka Producer client whereby the Interceptor object is instantiated and receives notifications of Kafka message send calls and Kafka message send acknowledgement calls.
• Kafka Partition Selection
Kafka topics comprise one or more partitions. Distribution to multiple partitions is a good way to improve Kafka ingest performance, because the Kafka client parallelizes message sending to different topic/partition combinations. Partition selection is controlled by a calculation in the Kafka client.
• Troubleshooting the Kafka Connect Handler

20.1 Overview

The Oracle GoldenGate Kafka Connect Handler is an extension of the standard Kafka messaging functionality. Kafka Connect is a functional layer on top of the standard Kafka Producer and Consumer interfaces. It provides standardization for messaging to make it easier to add new source and target systems into your topology. Confluent is a primary adopter of Kafka Connect, and their Confluent Platform offering includes extensions over the standard Kafka Connect functionality. This includes Avro serialization and deserialization, and an Avro schema registry. Much of the Kafka Connect functionality is available in Apache Kafka. A number of open source Kafka Connect integrations are found at: https://www.confluent.io/product/connectors/ The Kafka Connect Handler is a Kafka Connect source connector. You can capture database changes from any database supported by Oracle GoldenGate and stream that change of data through the Kafka Connect layer to Kafka. You can also connect to Oracle Event Hub Cloud Services (EHCS) with this handler.


Kafka Connect uses proprietary objects to define the schemas (org.apache.kafka.connect.data.Schema) and the messages (org.apache.kafka.connect.data.Struct). The Kafka Connect Handler can be configured to manage what data is published and the structure of the published data. The Kafka Connect Handler does not support any of the pluggable formatters that are supported by the Kafka Handler. 20.2 Detailed Functionality

The Kafka Connect framework provides converters to convert in-memory Kafka Connect messages to a serialized format suitable for transmission over a network. These converters are selected using configuration in the Kafka Producer properties file.

JSON Converter

Kafka Connect and the JSON converter are available as part of the Apache Kafka download. The JSON converter converts the Kafka keys and values to JSON, which is then sent to a Kafka topic. You identify the JSON converters with the following configuration in the Kafka Producer properties file:

key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true

The format of the messages is the message schema information followed by the payload information. JSON is a self-describing format, so you should not include the schema information in each message published to Kafka. To omit the JSON schema information from the messages, set the following:

key.converter.schemas.enable=false
value.converter.schemas.enable=false
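For illustration only, the following shows the general shape of a row message produced through the JSON converter with the schema envelope enabled and then disabled; the table name and column are hypothetical:

# schemas.enable=true (schema envelope plus payload)
{"schema": {"type": "struct", "name": "QASOURCE.TCUSTMER", "optional": false, "fields": [{"field": "CUST_CODE", "type": "string", "optional": true}]}, "payload": {"CUST_CODE": "WILL"}}

# schemas.enable=false (payload only)
{"CUST_CODE": "WILL"}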

Avro Converter

Confluent provides Kafka installations, support for Kafka, and extended functionality built on top of Kafka to help realize the full potential of Kafka. Confluent provides both open source versions of Kafka (Confluent Open Source) and an enterprise edition (Confluent Enterprise), which is available for purchase. A common Kafka use case is to send Avro messages over Kafka. This can create a problem on the receiving end as there is a dependency for the Avro schema in order to deserialize an Avro message. Schema evolution can increase the problem because received messages must be matched up with the exact Avro schema used to generate the message on the producer side. Deserializing Avro messages with an incorrect Avro schema can cause runtime failure, incomplete data, or incorrect data. Confluent has solved this problem by using a schema registry and the Confluent schema converters.


The following shows the configuration of the Kafka Producer properties file.

key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter.schema.registry.url=http://localhost:8081

When messages are published to Kafka, the Avro schema is registered and stored in the schema registry. When messages are consumed from Kafka, the exact Avro schema used to create the message can be retrieved from the schema registry to deserialize the Avro message. This matching of Avro messages to their corresponding Avro schemas on the receiving side solves the problem. Following are the requirements to use the Avro Converters:
• This functionality is available in both versions of Confluent Kafka (open source or enterprise).
• The Confluent schema registry service must be running.
• Source database tables must have an associated Avro schema. Messages associated with different Avro schemas must be sent to different Kafka topics.
• The Confluent Avro converters and the schema registry client must be available in the classpath.
The schema registry keeps track of Avro schemas by topic. Messages must be sent to a topic that has the same schema or evolving versions of the same schema. Source messages have Avro schemas based on the source database table schema, so Avro schemas are unique for each source table. Publishing messages for multiple source tables to a single topic appears to the schema registry as a schema that evolves every time a message arrives from a source table that is different from the previous message.

20.3 Setting Up and Running the Kafka Connect Handler

Instructions for configuring the Kafka Connect Handler components and running the handler are described in this section.

Classpath Configuration

Two things must be configured in the gg.classpath configuration variable so that the Kafka Connect Handler can connect to Kafka and run. The required items are the Kafka Producer properties file and the Kafka client JARs. The Kafka client JARs must match the version of Kafka that the Kafka Connect Handler is connecting to. For a listing of the required client JAR files by version, see Kafka Connect Handler Client Dependencies. The recommended storage location for the Kafka Producer properties file is the Oracle GoldenGate dirprm directory.

The default location of the Kafka Connect client JARs is the Kafka_Home/libs/* directory. The gg.classpath variable must be configured precisely. Pathing to the Kafka Producer properties file should contain the path with no wildcard appended. The inclusion of the asterisk (*) wildcard in the path to the Kafka Producer properties file causes it to be discarded.


Pathing to the dependency JARs should include the * wildcard character to include all of the JAR files in that directory in the associated classpath. Do not use *.jar.

Following is an example of a correctly configured Apache Kafka classpath:

gg.classpath=dirprm:{kafka_install_dir}/libs/*

Following is an example of a correctly configured Confluent Kafka classpath:

gg.classpath={confluent_install_dir}/share/java/kafka-serde-tools/*: {confluent_install_dir}/share/java/kafka/*:{confluent_install_dir}/share/java/ confluent-common/*

• Kafka Connect Handler Configuration
• Using Templates to Resolve the Topic Name and Message Key
• Configuring Security in the Kafka Connect Handler

20.3.1 Kafka Connect Handler Configuration

Meta-column fields can be configured with the following property:

gg.handler.name.metaColumnsTemplate

To output the metacolumns as in previous versions, configure the following:

gg.handler.name.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}

To also include the primary key columns and the tokens, configure the following:

gg.handler.name.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}

For more information, see the configuration property gg.handler.name.metaColumnsTemplate.

Table 20-1 Kafka Connect Handler Configuration Properties

gg.handler.name.type
Required. Legal value: kafkaconnect. Default: None.
The configuration to select the Kafka Connect Handler.

gg.handler.name.kafkaProducerConfigFile
Required. Legal value: string. Default: None.
A path to a properties file containing the Kafka and Kafka Connect configuration properties.

gg.handler.name.topicMappingTemplate
Required. Legal value: a template string value to resolve the Kafka topic name at runtime. Default: None.
See Using Templates to Resolve the Topic Name and Message Key.

gg.handler.name.keyMappingTemplate
Required. Legal value: a template string value to resolve the Kafka message key at runtime. Default: None.
See Using Templates to Resolve the Topic Name and Message Key.

gg.handler.name.includeTokens
Optional. Legal values: true | false. Default: false.
Set to true to include a map field in output messages. The key is tokens and the value is a map where the keys and values are the token keys and values from the Oracle GoldenGate source trail file. Set to false to suppress this field.

gg.handler.name.messageFormatting
Optional. Legal values: row | op. Default: row.
Controls how output messages are modeled. Select row and the output messages are modeled as rows. Set to op and the output messages are modeled as operations messages.

gg.handler.name.insertOpKey
Optional. Legal value: any string. Default: I.
The value of the field op_type that indicates an insert operation.

gg.handler.name.updateOpKey
Optional. Legal value: any string. Default: U.
The value of the field op_type that indicates an update operation.

gg.handler.name.deleteOpKey
Optional. Legal value: any string. Default: D.
The value of the field op_type that indicates a delete operation.

gg.handler.name.truncateOpKey
Optional. Legal value: any string. Default: T.
The value of the field op_type that indicates a truncate operation.

gg.handler.name.treatAllColumnsAsStrings
Optional. Legal values: true | false. Default: false.
Set to true to treat all output fields as strings. Set to false and the handler maps the corresponding field type from the source trail file to the best corresponding Kafka Connect data type.

gg.handler.name.mapLargeNumbersAsStrings
Optional. Legal values: true | false. Default: false.
Large numbers are mapped to number fields as Doubles. It is possible to lose precision in certain scenarios. If set to true, these fields are mapped as Strings in order to preserve precision.

gg.handler.name.pkUpdateHandling
Optional. Legal values: abend | update | delete-insert. Default: abend.
Only applicable if modeling row messages (gg.handler.name.messageFormatting=row). Not applicable if modeling operations messages, because the before and after images are propagated to the message in the case of an update.

gg.handler.name.metaColumnsTemplate
Optional. Legal value: any of the metacolumn keywords. Default: None.
A comma-delimited string consisting of one or more templated values that represent the template, see Setting Metacolumn Output.

gg.handler.name.includeIsMissingFields
Optional. Legal values: true | false. Default: true.
Set to true to include, for each column, an indicator field that allows downstream applications to differentiate whether a null value is actually null in the source trail file or whether the column is missing in the source trail file.

gg.handler.name.enableDecimalLogicalType
Optional. Legal values: true | false. Default: false.
Set to true to enable decimal logical types in Kafka Connect. Decimal logical types allow numbers which will not fit in a 64 bit data type to be represented.

gg.handler.name.oracleNumberScale
Optional. Legal value: a positive integer. Default: 38.
Only applicable if gg.handler.name.enableDecimalLogicalType=true. Some source data types do not have a fixed scale associated with them. Scale must be set for Kafka Connect decimal logical types. In the case of source types which do not have a scale in the metadata, the value of this parameter is used to set the scale.

gg.handler.name.EnableTimestampLogicalType
Optional. Legal values: true | false. Default: false.
Set to true to enable the Kafka Connect timestamp logical type. The Kafka Connect timestamp logical type is an integer measurement of milliseconds since the Java epoch, which means that precision greater than milliseconds is not possible if the timestamp logical type is used. Use of this property requires that the gg.format.timestamp property be set. This property is the timestamp formatting string, which is used to determine the output of timestamps in string format. For example, gg.format.timestamp=yyyy-MM-dd HH:mm:ss.SSS. Ensure that the goldengate.userexit.timestamp property is not set in the configuration file. Setting this property prevents parsing the input timestamp into a Java object, which is required for logical timestamps.

gg.handler.name.metaHeadersTemplate
Optional. Legal value: a comma-delimited list of metacolumn keywords. Default: None.
Allows the user to select metacolumns to inject context-based key value pairs into Kafka message headers using the metacolumn keyword syntax.

gg.handler.name.schemaNamespace
Optional. Legal value: any string that does not violate the Kafka Connect Avro schema naming requirements. Default: None.
Used to control the generated Kafka Connect schema name. If it is not set, then the schema name is the same as the qualified source table name. For example, if the source table is QASOURCE.TCUSTMER, then the Kafka Connect schema name will be the same. This property allows you to control the generated schema name. For example, if this property is set to com.example.company, then the generated Kafka Connect schema name for the table QASOURCE.TCUSTMER is com.example.company.TCUSTMER.

See Using Templates to Resolve the Topic Name and Message Key for more information.


Review a Sample Configuration

gg.handlerlist=kafkaconnect
#The handler properties
gg.handler.kafkaconnect.type=kafkaconnect
gg.handler.kafkaconnect.kafkaProducerConfigFile=kafkaconnect.properties
gg.handler.kafkaconnect.mode=op
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkaconnect.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkaconnect.keyMappingTemplate=${primaryKeys}
#The formatter properties
gg.handler.kafkaconnect.messageFormatting=row
gg.handler.kafkaconnect.insertOpKey=I
gg.handler.kafkaconnect.updateOpKey=U
gg.handler.kafkaconnect.deleteOpKey=D
gg.handler.kafkaconnect.truncateOpKey=T
gg.handler.kafkaconnect.treatAllColumnsAsStrings=false
gg.handler.kafkaconnect.pkUpdateHandling=abend

20.3.2 Using Templates to Resolve the Topic Name and Message Key

The Kafka Connect Handler provides functionality to resolve the topic name and the message key at runtime using a template configuration value. Templates allow you to configure static values and keywords. Keywords are used to dynamically replace the keyword with the context of the current processing. Templates are applicable to the following configuration parameters:

gg.handler.name.topicMappingTemplate
gg.handler.name.keyMappingTemplate

Template Modes

The Kafka Connect Handler can only send operation messages. The Kafka Connect Handler cannot group operation messages into a larger transaction message.

Template Keywords

${fullyQualifiedTableName}
Resolves to the fully qualified table name including the period (.) delimiter between the catalog, schema, and table names. For example, test.dbo.table1.

${catalogName}
Resolves to the catalog name.

${schemaName}
Resolves to the schema name.

${tableName}
Resolves to the short table name.

${opType}
Resolves to the type of the operation: INSERT, UPDATE, DELETE, or TRUNCATE.

${primaryKeys}
Resolves to the concatenated primary key values delimited by an underscore (_) character.

${position}
The sequence number of the source trail file followed by the offset (RBA).

${opTimestamp}
The operation timestamp from the source trail file.

${emptyString}
Resolves to "".

${groupName}
Resolves to the name of the Replicat process. If using coordinated delivery, it resolves to the name of the Replicat process with the Replicat thread number appended.

${staticMap[]}
Resolves to a static value where the key is the fully qualified table name. The keys and values are designated inside the square brace in the following format: ${staticMap[dbo.table1=value1,dbo.table2=value2]}.

${columnValue[]}
Resolves to a column value where the key is the fully qualified table name and the value is the column name to be resolved. For example: ${columnValue[dbo.table1=col1,dbo.table2=col2]}.

${currentTimestamp} or ${currentTimestamp[]}
Resolves to the current timestamp. You can control the format of the current timestamp using Java-based formatting as described in the SimpleDateFormat class, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html. Examples: ${currentTimestamp} or ${currentTimestamp[yyyy-MM-dd HH:mm:ss.SSS]}.

${null}
Resolves to a NULL string.

${custom[]}
It is possible to write a custom value resolver. If required, contact Oracle Support.

Example Templates

The following describes example template configuration values and the resolved values.


Example template: ${groupName}_${fullyQualifiedTableName}
Resolved value: KAFKA001_dbo.table1

Example template: prefix_${schemaName}_${tableName}_suffix
Resolved value: prefix_dbo_table1_suffix

Example template: ${currentTimestamp[yyyy-MM-dd HH:mm:ss.SSS]}
Resolved value: 2017-05-17 11:45:34.254

20.3.3 Configuring Security in the Kafka Connect Handler

Kafka version 0.9.0.0 introduced security through SSL/TLS or Kerberos. The Kafka Connect Handler can be secured using SSL/TLS or Kerberos. The Kafka producer client libraries provide an abstraction of security functionality from the integrations utilizing those libraries. The Kafka Connect Handler is effectively abstracted from security functionality. Enabling security requires setting up security for the Kafka cluster, connecting machines, and then configuring the Kafka Producer properties file, which the Kafka Connect Handler uses for processing, with the required security properties.

You may encounter the inability to decrypt the Kerberos password from the keytab file. This causes the Kerberos authentication to fall back to interactive mode, which cannot work because it is being invoked programmatically. The cause of this problem is that the Java Cryptography Extension (JCE) is not installed in the Java Runtime Environment (JRE). Ensure that the JCE is loaded in the JRE, see http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html.
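For example, Kerberos (SASL) security properties similar to those shown for the Kafka Handler could be added to the Kafka Producer properties file used by the Kafka Connect Handler; these values are illustrative only:

security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
sasl.mechanism=GSSAPI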

20.4 Kafka Connect Handler Performance Considerations

There are multiple configuration settings, both in the Oracle GoldenGate for Big Data configuration and in the Kafka producer configuration, which affect performance. The Oracle GoldenGate parameter that has the greatest effect on performance is the Replicat GROUPTRANSOPS parameter. The GROUPTRANSOPS parameter allows Replicat to group multiple source transactions into a single target transaction. At transaction commit, the Kafka Connect Handler calls flush on the Kafka Producer to push the messages to Kafka for write durability, followed by a checkpoint. The flush call is an expensive call, and setting the Replicat GROUPTRANSOPS setting to a larger amount allows the Replicat to call the flush call less frequently, thereby improving performance. The default setting for GROUPTRANSOPS is 1000, and performance improvements can be obtained by increasing the value to 2500, 5000, or even 10000. Op mode (gg.handler.kafkaconnect.mode=op) can also provide better performance than tx mode (gg.handler.kafkaconnect.mode=tx).

A number of Kafka Producer properties can affect performance. The following are the parameters with significant impact:
• linger.ms
• batch.size
• acks
• buffer.memory


• compression.type
Oracle recommends that you start with the default values for these parameters and perform performance testing to obtain a baseline for performance. Review the Kafka documentation for each of these parameters to understand its role, then adjust the parameters and perform additional performance testing to ascertain the performance effect of each parameter.
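The following sketch shows these properties in the Kafka Producer properties file; the values are illustrative only and must be validated with your own performance testing:

# Illustrative tuning values only; derive actual values from performance testing
linger.ms=100
batch.size=65536
acks=1
buffer.memory=33554432
compression.type=gzip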

20.5 Kafka Interceptor Support

The Kafka Producer client framework supports the use of Producer Interceptors. A Producer Interceptor is simply a user exit from the Kafka Producer client whereby the Interceptor object is instantiated and receives notifications of Kafka message send calls and Kafka message send acknowledgement calls. The typical use case for Interceptors is monitoring. Kafka Producer Interceptors must conform to the interface org.apache.kafka.clients.producer.ProducerInterceptor. The Kafka Connect Handler supports Producer Interceptor usage. The requirements for using Interceptors in the Handler are as follows, with a configuration sketch after this list:
• The Kafka Producer configuration property "interceptor.classes" must be configured with the class name of the Interceptor(s) to be invoked.
• In order to invoke the Interceptor(s), the jar files plus any dependency jars must be available to the JVM. Therefore, the jar files containing the Interceptor(s) plus any dependency jars must be added to the gg.classpath in the Handler configuration file.
For more information, see the Kafka documentation.
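As a sketch, assuming a hypothetical interceptor class com.example.MonitoringProducerInterceptor packaged in monitoring-interceptor.jar, the configuration might look like the following:

# Kafka Producer properties file
interceptor.classes=com.example.MonitoringProducerInterceptor

# Java Adapter properties file (append the interceptor jar to the existing classpath)
gg.classpath={existing classpath}:/path/to/monitoring-interceptor.jar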

20.6 Kafka Partition Selection

Kafka topics comprise one or more partitions. Distribution to multiple partitions is a good way to improve Kafka ingest performance, because the Kafka client parallelizes message sending to different topic/partition combinations. Partition selection is controlled by the following calculation in the Kafka client:
(Hash of the Kafka message key) modulus (the number of partitions) = selected partition number
The Kafka message key is selected by the following configuration value:
gg.handler.{your handler name}.keyMappingTemplate=

If this parameter is set to a value which generates a static key, all messages will go to the same partition. The following is an example of a static key:

gg.handler.{your handler name}.keyMappingTemplate=StaticValue

If this parameter is set to a value which generates a key that changes infrequently, partition selection changes infrequently. In the following example the table name is


used as the message key. Every operation for a specific source table will have the same key and thereby route to the same partition:

gg.handler.{your handler name}.keyMappingTemplate=${tableName}

A null Kafka message key distributes to the partitions on a round-robin basis. To do this, set the following:

gg.handler.{your handler name}.keyMappingTemplate=${null}

The recommended setting for configuration of the mapping key is the following:

gg.handler.{your handler name}.keyMappingTemplate=${primaryKeys}

This generates a Kafka message key that is the concatenated and delimited primary key values. Each row should have unique primary key values, thereby generating a unique Kafka message key for each row. Another important consideration is that Kafka messages sent to different partitions are not guaranteed to be delivered to a Kafka consumer in the original order sent. This is part of the Kafka specification. Order is only maintained within a partition. Using primary keys as the Kafka message key means that operations for the same row, which have the same primary key values, generate the same Kafka message key, and therefore are sent to the same Kafka partition. In this way, the order is maintained for operations for the same row. At the DEBUG log level, the Kafka message coordinates (topic, partition, and offset) are logged to the .log file for successfully sent messages.

20.7 Troubleshooting the Kafka Connect Handler

• Java Classpath for Kafka Connect Handler
• Invalid Kafka Version
• Kafka Producer Properties File Not Found
• Kafka Connection Problem

20.7.1 Java Classpath for Kafka Connect Handler

Issues with the Java classpath are one of the most common problems. The indication of a classpath problem is a ClassNotFoundException in the Oracle GoldenGate Java log4j log file, or an error while resolving the classpath if there is a typographic error in the gg.classpath variable.

The Kafka client libraries do not ship with the Oracle GoldenGate for Big Data product. You are required to obtain the correct version of the Kafka client libraries and to properly configure the gg.classpath property in the Java Adapter properties file so that the Kafka client libraries are correctly resolved, as described in Setting Up and Running the Kafka Connect Handler.
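The following is a sketch of a gg.classpath setting; the actual directories depend on where your Kafka or Confluent client libraries are installed and are described in Setting Up and Running the Kafka Connect Handler:

gg.classpath=dirprm:{kafka_install_dir}/libs/*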


20.7.2 Invalid Kafka Version

Kafka Connect was introduced in Kafka version 0.9.0.0. The Kafka Connect Handler does not work with Kafka versions 0.8.2.2 and older. Attempting to use Kafka Connect with Kafka version 0.8.2.2 typically results in a ClassNotFoundException error at runtime.

20.7.3 Kafka Producer Properties File Not Found

Typically, the following exception message occurs:

ERROR 2015-11-11 11:49:08,482 [main] Error loading the kafka producer properties

Verify that the gg.handler.kafkahandler.KafkaProducerConfigFile configuration property for the Kafka Producer configuration file name is set correctly. Ensure that the gg.classpath variable includes the path to the Kafka Producer properties file and that the path to the properties file does not contain a * wildcard at the end.

20.7.4 Kafka Connection Problem

Typically, the following exception message appears:

WARN 2015-11-11 11:25:50,784 [kafka-producer-network-thread | producer-1]

WARN (Selector.java:276) - Error in I/O with localhost/127.0.0.1 java.net.ConnectException: Connection refused

When this occurs, the connection retry interval expires and the Kafka Connect Handler process abends. Ensure that the Kafka Brokers are running and that the host and port provided in the Kafka Producer properties file are correct. Network shell commands (such as netstat -l) can be used on the machine hosting the Kafka broker to verify that Kafka is listening on the expected port.

21 Using the Kafka REST Proxy Handler

The Kafka REST Proxy Handler streams messages to the Kafka REST Proxy distributed by Confluent.

This chapter describes how to use the Kafka REST Proxy Handler. • Overview • Setting Up and Starting the Kafka REST Proxy Handler Services • Consuming the Records • Performance Considerations • Kafka REST Proxy Handler Metacolumns Template Property 21.1 Overview

The Kafka REST Proxy Handler allows Kafka messages to be streamed using an HTTPS protocol. The use case for this functionality is to stream Kafka messages from an Oracle GoldenGate On Premises installation to the cloud, or alternatively from cloud to cloud. The Kafka REST Proxy provides a RESTful interface to a Kafka cluster. It makes it easy for you to:
• produce and consume messages,
• view the state of the cluster,
• and perform administrative actions without using the native Kafka protocol or clients.
Kafka REST Proxy is part of the Confluent Open Source and Confluent Enterprise distributions. It is not available in the Apache Kafka distribution. To access Kafka through the REST proxy, you have to install the Confluent Kafka version, see https://docs.confluent.io/current/kafka-rest/docs/index.html.

21.2 Setting Up and Starting the Kafka REST Proxy Handler Services

You have several installation formats to choose from including ZIP or tar archives, Docker, and Packages. • Using the Kafka REST Proxy Handler • Downloading the Dependencies • Classpath Configuration • Kafka REST Proxy Handler Configuration


• Review a Sample Configuration • Security • Generating a Keystore or Truststore • Using Templates to Resolve the Topic Name and Message Key • Kafka REST Proxy Handler Formatter Properties 21.2.1 Using the Kafka REST Proxy Handler

You must download and install the Confluent Open Source or Confluent Enterprise Distribution because the Kafka REST Proxy is not included in the Apache or Cloudera Kafka distributions. You have several installation formats to choose from including ZIP or TAR archives, Docker, and Packages. The Kafka REST Proxy has dependencies on ZooKeeper, Kafka, and the Schema Registry.

21.2.2 Downloading the Dependencies

You can review and download the Jersey RESTful Web Services in Java client dependency from: https://eclipse-ee4j.github.io/jersey/. You can review and download the Jersey Apache Connector dependencies from the maven repository: https://mvnrepository.com/artifact/org.glassfish.jersey.connectors/ jersey-apache-connector. 21.2.3 Classpath Configuration

The Kafka REST Proxy handler uses the Jersey project jersey-client version 2.27 and jersey-connectors-apache version 2.27 to connect to Kafka. Oracle GoldenGate for Big Data does not include the required dependencies so you must obtain them, see Downloading the Dependencies. You have to configure these dependencies using the gg.classpath property in the Java Adapter properties file. This is an example of a correctly configured classpath for the Kafka REST Proxy Handler:

gg.classpath=dirprm:{path_to_jersey_client_jars}/jaxrs-ri/lib/*:{path_to_jersey_client_jars}/jaxrs-ri/api/*:{path_to_jersey_client_jars}/jaxrs-ri/ext/*:{path_to_jersey_client_jars}/connector/*

21.2.4 Kafka REST Proxy Handler Configuration

The following are the configurable values for the Kafka REST Proxy Handler. Oracle recommends that you store the Kafka REST Proxy properties file in the Oracle GoldenGate dirprm directory.


To enable the selection of the Kafka REST Proxy Handler, you must first configure the handler type by specifying gg.handler.name.type=kafkarestproxy and the other Kafka REST Proxy Handler properties as follows:

gg.handler.name.type
Required. Legal values: kafkarestproxy. Default: none. The configuration to select the Kafka REST Proxy Handler.

gg.handler.name.topicMappingTemplate
Required. Legal values: a template string value to resolve the Kafka topic name at runtime. Default: none. See Using Templates to Resolve the Topic Name and Message Key.

gg.handler.name.keyMappingTemplate
Required. Legal values: a template string value to resolve the Kafka message key at runtime. Default: none. See Using Templates to Resolve the Topic Name and Message Key.

gg.handler.name.postDataUrl
Required. Legal values: the listener address of the REST Proxy. Default: none. Set to the URL of the Kafka REST Proxy.

gg.handler.name.format
Required. Legal values: avro | json. Default: none. Set to the REST Proxy payload data format.

gg.handler.name.payloadsize
Optional. Legal values: a value representing the payload size in megabytes. Default: 5MB. Set to the maximum size of the payload of the HTTP messages.

gg.handler.name.apiVersion
Optional. Legal values: v1 | v2. Default: v2. Sets the API version to use.

gg.handler.name.mode
Optional. Legal values: op | tx. Default: op. Sets how operations are processed. In op mode, operations are processed as they are received. In tx mode, operations are cached and processed at the transaction commit.


gg.handler.name.trustStore
Optional. Legal values: path to the truststore. Default: none. Path to the truststore file that holds certificates from trusted certificate authorities (CA). These CAs are used to verify certificates presented by the server during an SSL connection, see Generating a Keystore or Truststore.

gg.handler.name.trustStorePassword
Optional. Legal values: password of the truststore. Default: none. The truststore password.

gg.handler.name.keyStore
Optional. Legal values: path to the keystore. Default: none. Path to the keystore file that holds the private key and identity certificate, which are presented to other parties (server or client) to verify its identity, see Generating a Keystore or Truststore.

gg.handler.name.keyStorePassword
Optional. Legal values: password of the keystore. Default: none. The keystore password.

gg.handler.name.proxy
Optional. Legal values: http://host:port. Default: none. Proxy URL in the following format: http://host:port.

gg.handler.name.proxyUserName
Optional. Legal values: any string. Default: none. The proxy user name.

gg.handler.name.proxyPassword
Optional. Legal values: any string. Default: none. The proxy password.

gg.handler.name.readTimeout
Optional. Legal values: integer value. Default: none. The amount of time allowed for the server to respond.


gg.handler.name.connectionTimeout
Optional. Legal values: integer value. Default: none. The amount of time to wait to establish the connection to the host.

See Using Templates to Resolve the Topic Name and Message Key for more information.

21.2.5 Review a Sample Configuration

The following is a sample configuration for the Kafka REST Proxy Handler from the Java Adapter properties file:

gg.handlerlist=kafkarestproxy

#The handler properties
gg.handler.kafkarestproxy.type=kafkarestproxy
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkarestproxy.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkarestproxy.keyMappingTemplate=${primaryKeys}
gg.handler.kafkarestproxy.postDataUrl=http://localhost:8083
gg.handler.kafkarestproxy.apiVersion=v1
gg.handler.kafkarestproxy.format=json
gg.handler.kafkarestproxy.payloadsize=1
gg.handler.kafkarestproxy.mode=tx

#Server auth properties
#gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
#gg.handler.kafkarestproxy.trustStorePassword=test1234
#Client auth properties
#gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
#gg.handler.kafkarestproxy.keyStorePassword=test1234

#Proxy properties
#gg.handler.kafkarestproxy.proxy=http://proxyurl:80
#gg.handler.kafkarestproxy.proxyUserName=username
#gg.handler.kafkarestproxy.proxyPassword=password

#The MetaColumnTemplate formatter properties
gg.handler.kafkarestproxy.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro}

21.2.6 Security

Security is possible between the following:
• Kafka REST Proxy clients and the Kafka REST Proxy server. The Oracle GoldenGate REST Proxy Handler is a Kafka REST Proxy client.
• The Kafka REST Proxy server and Kafka Brokers.
Oracle recommends that you thoroughly review the security documentation and configuration of the Kafka REST Proxy server, see https://docs.confluent.io/current/kafka-rest/docs/index.html.


REST Proxy supports SSL for securing communication between clients and the Kafka REST Proxy Handler. To configure SSL:

1. Generate a keystore using the scripts, see Generating a Keystore or Truststore.
2. Update the Kafka REST Proxy server configuration in the kafka-rest.properties file with these properties:
listeners=https://hostname:8083
confluent.rest.auth.propagate.method=SSL

Configuration Options for HTTPS
ssl.client.auth=true
ssl.keystore.location={keystore_file_path}/server.keystore.jks
ssl.keystore.password=test1234
ssl.key.password=test1234
ssl.truststore.location={keystore_file_path}/server.truststore.jks
ssl.truststore.password=test1234
ssl.keystore.type=JKS
ssl.truststore.type=JKS
ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1

3. Restart your server.
To disable mutual authentication, update the ssl.client.auth property from true to false.
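After the truststore and keystore are generated (see the next section), the client side can be configured in the handler properties. The following sketch reuses the illustrative paths and password from the sample configuration:

gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
gg.handler.kafkarestproxy.trustStorePassword=test1234
gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
gg.handler.kafkarestproxy.keyStorePassword=test1234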

21.2.7 Generating a Keystore or Truststore

Generating a Truststore
You execute this script to generate the ca-cert, ca-key, and truststore.jks truststore files.

#!/bin/bash
PASSWORD=password
CLIENT_PASSWORD=password
VALIDITY=365

Then you generate a CA as in this example:

openssl req -new -x509 -keyout ca-key -out ca-cert -days $VALIDITY -passin pass:$PASSWORD -passout pass:$PASSWORD -subj "/C=US/ST=CA/L=San Jose/O=Company/OU=Org/CN=FQDN" -nodes

Lastly, you add the CA to the server's truststore using keytool:

keytool -keystore truststore.jks -alias CARoot -import -file ca-cert -storepass $PASSWORD -keypass $PASSWORD

Generating a Keystore
You run this script and pass the fqdn as an argument to generate the ca-cert.srl, cert-file, cert-signed, and keystore.jks keystore files.

#!/bin/bash
PASSWORD=password
VALIDITY=365


if [ $# -lt 1 ]; then
    echo "`basename $0` host fqdn|user_name|app_name"
    exit 1
fi

CNAME=$1
ALIAS=`echo $CNAME|cut -f1 -d"."`

Then you generate the keystore with keytool as in this example:

keytool -noprompt -keystore keystore.jks -alias $ALIAS -keyalg RSA -validity $VALIDITY -genkey -dname "CN=$CNAME,OU=BDP,O=Company,L=San Jose,S=CA,C=US" -storepass $PASSWORD -keypass $PASSWORD

Next, you sign all the certificates in the keystore with the CA:

keytool -keystore keystore.jks -alias $ALIAS -certreq -file cert-file -storepass $PASSWORD
openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days $VALIDITY -CAcreateserial -passin pass:$PASSWORD

Lastly, you import both the CA and the signed certificate into the keystore:

keytool -keystore keystore.jks -alias CARoot -import -file ca-cert -storepass $PASSWORD
keytool -keystore keystore.jks -alias $ALIAS -import -file cert-signed -storepass $PASSWORD

• Setting Metacolumn Output

21.2.7.1 Setting Metacolumn Output

The following are the configurable values for the Kafka REST Proxy Handler metacolumns template property that controls metacolumn output.


Table 21-1 Metacolumns Template Property

gg.handler.name.format.metaColumnsTemplate
Optional. Legal values: ${alltokens} | ${token} | ${env} | ${sys} | ${javaprop} | ${optype} | ${position} | ${timestamp} | ${catalog} | ${schema} | ${table} | ${objectname} | ${csn} | ${xid} | ${currenttimestamp} | ${opseqno} | ${timestampmicro} | ${currenttimestampmicro} | ${txind} | ${primarykeycolumns} | ${currenttimestampiso8601} | ${static} | ${seqno} | ${rba}. Default: none.
The current metacolumn information can be configured in a simple manner and removes the explicit need to use insertOpKey | updateOpKey | deleteOpKey | truncateOpKey | includeTableName | includeOpTimestamp | includeOpType | includePosition | includeCurrentTimestamp | useIso8601Format. It is a comma-delimited string consisting of one or more templated values that represent the template.

This is an example that would produce a list of metacolumns: ${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}

Explanation of the Metacolumn Keywords The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:

${keyword[fieldName].argument} The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used. The argument is required to resolve the metacolumn value.

${alltokens} All of the Oracle GoldenGate tokens.


${token} The value of a specific Oracle GoldenGate token. The token key should follow the token keyword using the period (.) operator. For example: ${token.MYTOKEN}

${sys} A system environmental variable. The variable name should follow sys using the period (.) operator. For example: ${sys.MYVAR}

${env} An Oracle GoldenGate environment variable. The variable name should follow env using the period (.) operator. For example: ${env.someVariable}

${javaprop} A Java JVM variable. The variable name should follow javaprop using the period (.) operator. For example: ${javaprop.MYVAR}

${optype} Operation type.

${position} Record position.

${timestamp} Record timestamp.

${catalog} Catalog name.

${schema} Schema name.

${table} Table name.

${objectname} The fully qualified table name.

${csn} Source Commit Sequence Number.

${xid} Source transaction ID.

${currenttimestamp} Current timestamp.

${currenttimestampiso8601} Current timestamp in ISO 8601 format.


${opseqno} Record sequence number within the transaction.

${timestampmicro} Record timestamp in microseconds after epoch.

${currenttimestampmicro} Current timestamp in microseconds after epoch.

${txind} This is the transactional indicator from the source trail file. The values of a transaction are B for the first operation, M for the middle operations, E for the last operation, or W for whole if there is only one operation. Filtering operations or the use of coordinated apply negate the usefulness of this field.

${primarykeycolumns} Use to inject a field with a list of the primary key column names.

${static} Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is abc, then the syntax would be ${static.abc} or ${static[FieldName].abc}.

${seqno} Use to inject a field with the trail file sequence into the output.

${rba} Use to inject a field with the rba of the operation into the output.

Sample Configuration: gg.handlerlist=kafkarestproxy

#The handler properties
gg.handler.kafkarestproxy.type=kafkarestproxy
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkarestproxy.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkarestproxy.keyMappingTemplate=${primaryKeys}
gg.handler.kafkarestproxy.postDataUrl=http://localhost:8083
gg.handler.kafkarestproxy.apiVersion=v1
gg.handler.kafkarestproxy.format=json
gg.handler.kafkarestproxy.payloadsize=1
gg.handler.kafkarestproxy.mode=tx

#Server auth properties
#gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
#gg.handler.kafkarestproxy.trustStorePassword=test1234
#Client auth properties
#gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
#gg.handler.kafkarestproxy.keyStorePassword=test1234


#Proxy properties
#gg.handler.kafkarestproxy.proxy=http://proxyurl:80
#gg.handler.kafkarestproxy.proxyUserName=username
#gg.handler.kafkarestproxy.proxyPassword=password

#The MetaColumnTemplate formatter properties
gg.handler.kafkarestproxy.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro}

21.2.8 Using Templates to Resolve the Topic Name and Message Key

The Kafka REST Proxy Handler provides functionality to resolve the topic name and the message key at runtime using a template configuration value. Templates allow you to configure static values and keywords. Keywords are used to dynamically replace the keyword with the context of the current processing. The templates use the following configuration properties:

gg.handler.name.topicMappingTemplate gg.handler.name.keyMappingTemplate

Template Modes The Kafka REST Proxy Handler can be configured to send one message per operation (insert, update, delete). Alternatively, it can be configured to group operations into messages at the transaction level.

Template Keywords
This table includes a column indicating whether the keyword is supported for transaction level messages.

${fullyQualifiedTableName} Resolves to the fully qualified table name including the period (.) delimiter between the catalog, schema, and table names. For example, test.dbo.table1. (Transaction message support: No)

${catalogName} Resolves to the catalog name. (Transaction message support: No)

${schemaName} Resolves to the schema name. (Transaction message support: No)

${tableName} Resolves to the short table name. (Transaction message support: No)

${opType} Resolves to the type of the operation: INSERT, UPDATE, DELETE, or TRUNCATE. (Transaction message support: No)

${primaryKeys} Resolves to the concatenated primary key values delimited by an underscore (_) character. (Transaction message support: No)


${position} The sequence number of the source trail file followed by the offset (RBA). (Transaction message support: Yes)

${opTimestamp} The operation timestamp from the source trail file. (Transaction message support: Yes)

${emptyString} Resolves to an empty string (""). (Transaction message support: Yes)

${groupName} Resolves to the name of the Replicat process. If using coordinated delivery, it resolves to the name of the Replicat process with the Replicat thread number appended. (Transaction message support: Yes)

${staticMap[]} Resolves to a static value where the key is the fully-qualified table name. The keys and values are designated inside of the square brace in the following format: ${staticMap[dbo.table1=value1,dbo.table2=value2]} (Transaction message support: No)

${columnValue[]} Resolves to a column value where the key is the fully-qualified table name and the value is the column name to be resolved. For example: ${columnValue[dbo.table1=col1,dbo.table2=col2]} (Transaction message support: No)


${currentTimestamp} or ${currentTimestamp[]} Resolves to the current timestamp. You can control the format of the current timestamp using Java-based formatting as described in the SimpleDateFormat class, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html. Examples: ${currentDate} and ${currentDate[yyyy-mm-dd hh:MM:ss.SSS]} (Transaction message support: Yes)

${null} Resolves to a NULL string. (Transaction message support: Yes)

${custom[]} It is possible to write a custom value resolver. If required, contact Oracle Support. (Transaction message support: implementation dependent)

Example Templates The following describes example template configuration values and the resolved values.

${groupName}_${fullyQualifiedTableName} resolves to KAFKA001_dbo.table1
prefix_${schemaName}_${tableName}_suffix resolves to prefix_dbo_table1_suffix
${currentDate[yyyy-mm-dd hh:MM:ss.SSS]} resolves to 2017-05-17 11:45:34.254

21.2.9 Kafka REST Proxy Handler Formatter Properties

The following are the configurable values for the Kafka REST Proxy Handler Formatter.


Table 21-2 Kafka REST Proxy Handler Formatter Properties

gg.handler.name.format.includeOpType
Optional. Legal values: true | false. Default: true. Set to true to create a field in the output messages called op_type. The value is an indicator of the type of source database operation (for example, I for insert, U for update, D for delete). Set to false to omit this field in the output.

gg.handler.name.format.includeOpTimestamp
Optional. Legal values: true | false. Default: true. Set to true to create a field in the output messages called op_ts. The value is the operation timestamp (commit timestamp) from the source trail file. Set to false to omit this field in the output.

gg.handler.name.format.includeCurrentTimestamp
Optional. Legal values: true | false. Default: true. Set to true to create a field in the output messages called current_ts. The value is the current timestamp of when the handler processes the operation. Set to false to omit this field in the output.



gg.handler.name.format.includePosition
Optional. Legal values: true | false. Default: true. Set to true to create a field in the output messages called pos. The value is the position (sequence number + offset) of the operation from the source trail file. Set to false to omit this field in the output.

gg.handler.name.format.includePrimaryKeys
Optional. Legal values: true | false. Default: true. Set to true to create a field in the output messages called primary_keys. The value is an array of the column names of the primary key columns. Set to false to omit this field in the output.

gg.handler.name.format.includeTokens
Optional. Legal values: true | false. Default: true. Set to true to include a map field in output messages. The key is tokens and the value is a map where the keys and values are the token keys and values from the Oracle GoldenGate source trail file. Set to false to suppress this field.

gg.handler.name.format.insertOpKey
Optional. Legal values: any string. Default: I. The value of the field op_type that indicates an insert operation.

gg.handler.name.format.updateOpKey
Optional. Legal values: any string. Default: U. The value of the field op_type that indicates an update operation.



gg.handler.name.format.deleteOpKey
Optional. Legal values: any string. Default: D. The value of the field op_type that indicates a delete operation.

gg.handler.name.format.truncateOpKey
Optional. Legal values: any string. Default: T. The value of the field op_type that indicates a truncate operation.

gg.handler.name.format.treatAllColumnsAsStrings
Optional. Legal values: true | false. Default: false. Set to true to treat all output fields as strings. Set to false and the handler maps the corresponding field type from the source trail file to the best corresponding Kafka data type.

gg.handler.name.format.mapLargeNumbersAsStrings
Optional. Legal values: true | false. Default: false. Set to true and these fields are mapped as strings to preserve precision. This property is specific to the Avro Formatter; it cannot be used with other formatters.

gg.handler.name.format.iso8601Format
Optional. Legal values: true | false. Default: false. Set to true to output the current date in the ISO 8601 format.



gg.handler.name.format.pkUpdateHandling
Optional. Legal values: abend | update | delete-insert. Default: abend. It is only applicable if you are modeling row messages with the gg.handler.name.format.messageFormatting=row property. It is not applicable if you are modeling operations messages, as the before and after images are propagated to the message with an update.

21.3 Consuming the Records

A simple way to consume data from Kafka topics using the Kafka REST Proxy Handler is to use curl.

Consume JSON Data

1. Create a consumer for JSON data. curl -k -X POST -H "Content-Type: application/vnd.kafka.v2+json"

https://localhost:8082/consumers/my_json_consumer

2. Subscribe to a topic.
curl -k -X POST -H "Content-Type: application/vnd.kafka.v2+json" --data '{"topics":["topicname"]}' \

https://localhost:8082/consumers/my_json_consumer/instances/ my_consumer_instance/subscription

3. Consume records.
curl -k -X GET -H "Accept: application/vnd.kafka.json.v2+json" \

https://localhost:8082/consumers/my_json_consumer/instances/ my_consumer_instance/records

Consume Avro Data

1. Create a consumer for Avro data.
curl -k -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
--data '{"name": "my_consumer_instance", "format": "avro",

21-17 Chapter 21 Performance Considerations

"auto.offset.reset": "earliest"}© \

https://localhost:8082/consumers/my_avro_consumer

2. Subscribe to a topic.
curl -k -X POST -H "Content-Type: application/vnd.kafka.v2+json" --data '{"topics":["topicname"]}' \

https://localhost:8082/consumers/my_avro_consumer/instances/ my_consumer_instance/subscription

3. Consume records. curl -X GET -H "Accept: application/vnd.kafka.avro.v2+json" \

https://localhost:8082/consumers/my_avro_consumer/instances/ my_consumer_instance/records

Note:

If you are using curl from the machine hosting the REST proxy, then unset the http_proxy environmental variable before consuming the messages. If you are using curl from the local machine to get messages from the Kafka REST Proxy, then setting the http_proxy environmental variable may be required.

21.4 Performance Considerations

There are several configuration settings, both for the Oracle GoldenGate for Big Data configuration and in the Kafka producer, that affect performance. The Oracle GoldenGate parameter that has the greatest effect on performance is the Replicat GROUPTRANSOPS parameter. It allows Replicat to group multiple source transactions into a single target transaction. At transaction commit, the Kafka REST Proxy Handler POSTs the data to the Kafka Producer.

Setting the Replicat GROUPTRANSOPS to a larger number allows the Replicat to call the POST less frequently, thereby improving performance. The default value for GROUPTRANSOPS is 1000, and performance can be improved by increasing the value to 2500, 5000, or even 10000.
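For example, a hedged sketch of the Replicat parameter change (the value is an illustrative starting point):

-- Replicat parameter file
GROUPTRANSOPS 2500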

21.5 Kafka REST Proxy Handler Metacolumns Template Property

Problems Starting Kafka REST Proxy Server
The script to start the Kafka REST Proxy server appends its CLASSPATH to the environment CLASSPATH variable. If set, the environment CLASSPATH can contain JAR files that conflict with the correct execution of the Kafka REST Proxy server and may prevent it from starting. Oracle recommends that you unset the CLASSPATH environment variable before starting your Kafka REST Proxy server. Reset the CLASSPATH to "" (an empty string) to overcome the problem.

22 Using the Kinesis Streams Handler

The Kinesis Streams Handler streams data to applications hosted on the Amazon Cloud or in your environment. This chapter describes how to use the Kinesis Streams Handler. • Overview • Detailed Functionality • Setting Up and Running the Kinesis Streams Handler • Kinesis Handler Performance Considerations • Troubleshooting 22.1 Overview

Amazon Kinesis is a messaging system that is hosted in the Amazon Cloud. Kinesis streams can be used to stream data to other Amazon Cloud applications such as Amazon S3 and Amazon Redshift. Using the Kinesis Streams Handler, you can also stream data to applications hosted on the Amazon Cloud or at your site. Amazon Kinesis streams provide functionality similar to Apache Kafka. The logical concepts map as follows:
• Kafka Topics = Kinesis Streams
• Kafka Partitions = Kinesis Shards
A Kinesis stream must have at least one shard.

22.2 Detailed Functionality

• Amazon Kinesis Java SDK • Kinesis Streams Input Limits 22.2.1 Amazon Kinesis Java SDK

The Oracle GoldenGate Kinesis Streams Handler uses the AWS Kinesis Java SDK to push data to Amazon Kinesis, see the Amazon Kinesis Streams Developer Guide at: http://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-sdk.html. The Kinesis Streams Handler was designed and tested with the latest AWS Kinesis Java SDK version 1.11.107. These are the dependencies:
• Group ID: com.amazonaws
• Artifact ID: aws-java-sdk-kinesis
• Version: 1.11.107


Oracle GoldenGate for Big Data does not ship with the AWS Kinesis Java SDK. Oracle recommends that you use the AWS Kinesis Java SDK identified in the Certification Matrix, see Verifying Certification, System, and Interoperability Requirements.

Note:

It is assumed that moving to the latest AWS Kinesis Java SDK does not introduce interface changes that could break compatibility with the Kinesis Streams Handler.

You can download the AWS Java SDK, including Kinesis from: https://aws.amazon.com/sdk-for-java/ 22.2.2 Kinesis Streams Input Limits

The upper input limit for a Kinesis stream with a single shard is 1000 messages per second up to a total data size of 1MB per second. Adding streams or shards can increase the potential throughput, such as the following:
• 1 stream with 2 shards = 2000 messages per second up to a total data size of 2MB per second
• 3 streams of 1 shard each = 3000 messages per second up to a total data size of 3MB per second
The scaling that you can achieve with the Kinesis Streams Handler depends on how you configure the handler. Kinesis stream names are resolved at runtime based on the configuration of the Kinesis Streams Handler. Shards are selected by a hash of the partition key. The partition key for a Kinesis message cannot be null or an empty string (""). A null or empty string partition key results in a Kinesis error that results in an abend of the Replicat process. Maximizing throughput requires that the Kinesis Streams Handler configuration evenly distributes messages across streams and shards. To achieve the best distribution across shards in a Kinesis stream, select a partitioning key which rapidly changes; a configuration sketch follows this paragraph. You can select ${primaryKeys} as it is unique per row in the source database. Additionally, operations for the same row are sent to the same Kinesis stream and shard. When DEBUG logging is enabled, the Kinesis stream name, sequence number, and the shard number are logged to the log file for successfully sent messages.
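For example, the following sketch (using the handler name kinesis from the sample configuration) maps each source table to its own stream and uses the concatenated primary keys as the partition key to spread messages across shards:

gg.handler.kinesis.streamMappingTemplate=${tableName}
gg.handler.kinesis.partitionMappingTemplate=${primaryKeys}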

22.3 Setting Up and Running the Kinesis Streams Handler

Instructions for configuring the Kinesis Streams Handler components and running the handler are described in the following sections. Use the following steps to set up the Kinesis Streams Handler:

1. Create an Amazon AWS account at https://aws.amazon.com/. 2. Log into Amazon AWS. 3. From the main page, select Kinesis (under the Analytics subsection).


4. Select Amazon Kinesis Streams Go to Streams to create Amazon Kinesis streams and shards within streams. 5. Create a client ID and secret to access Kinesis. The Kinesis Streams Handler requires these credentials at runtime to successfully connect to Kinesis. 6. Create the client ID and secret: a. Select your name in AWS (upper right), and then in the list select My Security Credentials. b. Select Access Keys to create and manage access keys. Note your client ID and secret upon creation. The client ID and secret can only be accessed upon creation. If lost, you have to delete the access key, and then recreate it. • Set the Classpath in Kinesis Streams Handler • Kinesis Streams Handler Configuration • Using Templates to Resolve the Stream Name and Partition Name • Configuring the Client ID and Secret in Kinesis Handler • Configuring the Proxy Server for Kinesis Streams Handler • Configuring Security in Kinesis Streams Handler 22.3.1 Set the Classpath in Kinesis Streams Handler

You must configure the gg.classpath property in the Java Adapter properties file to specify the JARs for the AWS Kinesis Java SDK as follows: gg.classpath={download_dir}/aws-java-sdk-1.11.107/lib/*:{download_dir}/ aws-java-sdk-1.11.107/third-party/lib/* 22.3.2 Kinesis Streams Handler Configuration

You configure the Kinesis Streams Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the Kinesis Streams Handler, you must first configure the handler type by specifying gg.handler.name.type=kinesis_streams and the other Kinesis Streams properties as follows:

Table 22-1 Kinesis Streams Handler Configuration Properties

gg.handler.name.type
Required. Legal values: kinesis_streams. Default: none. Selects the Kinesis Streams Handler for streaming change data capture into Kinesis.



gg.handler.name.mode
Optional. Legal values: op | tx. Default: op. Choose the operating mode.

gg.handler.name.region
Required. Legal values: the Amazon region name which is hosting your Kinesis instance. Default: none. Setting the Amazon AWS region name is required.

gg.handler.name.proxyServer
Optional. Legal values: the host name of the proxy server. Default: none. Set the host name of the proxy server if connectivity to AWS is required to go through a proxy server.

gg.handler.name.proxyPort
Optional. Legal values: the port number of the proxy server. Default: none. Set the port number of the proxy server if connectivity to AWS is required to go through a proxy server.

gg.handler.name.proxyUsername
Optional. Legal values: the username of the proxy server (if credentials are required). Default: none. Set the username of the proxy server if connectivity to AWS is required to go through a proxy server and the proxy server requires credentials.

gg.handler.name.proxyPassword
Optional. Legal values: the password of the proxy server (if credentials are required). Default: none. Set the password of the proxy server if connectivity to AWS is required to go through a proxy server and the proxy server requires credentials.



gg.handler.name.deferFlushAtTxCommit
Optional. Legal values: true | false. Default: false. When set to false, the Kinesis Streams Handler will flush data to Kinesis at transaction commit for write durability. However, it may be preferable to defer the flush beyond the transaction commit for performance purposes, see Kinesis Handler Performance Considerations.

gg.handler.name.deferFlushOpCount
Optional. Legal values: integer. Default: none. Only applicable if gg.handler.name.deferFlushAtTxCommit is set to true. This parameter marks the minimum number of operations that must be received before triggering a flush to Kinesis. Once this number of operations is received, a flush will occur on the next transaction commit and all outstanding operations will be moved from the Kinesis Streams Handler to AWS Kinesis.



gg.handler.name.formatPerOp
Optional. Legal values: true | false. Default: true. When set to true, messages are sent to Kinesis once per operation (insert, delete, update). When set to false, operation messages are concatenated for all the operations and a single message is sent at the transaction level. Kinesis has a maximum message size limitation of 1MB. If 1MB is exceeded, then the transaction level message is broken up into multiple messages.



gg.handler.name.customMessageGrouper
Optional. Legal values: oracle.goldengate.handler.kinesis.KinesisJsonTxMessageGrouper. Default: none. This configuration parameter provides the ability to group Kinesis messages using custom logic. Only one implementation is included in the distribution at this time. The oracle.goldengate.handler.kinesis.KinesisJsonTxMessageGrouper is a custom message grouper which groups JSON messages representing operations into a wrapper JSON message that encompasses the transaction. Setting this value overrides the setting of the gg.handler.formatPerOp setting. Using this feature assumes that the customer is using the JSON formatter (that is, gg.handler.name.format=json).

gg.handler.name.streamMappingTemplate
Required. Legal values: a template string value to resolve the Kinesis stream name at runtime. Default: none. See Using Templates to Resolve the Stream Name and Partition Name for more information.



gg.handler.name.partitionMappingTemplate
Required. Legal values: a template string value to resolve the Kinesis message partition key (message key) at runtime. Default: none. See Using Templates to Resolve the Stream Name and Partition Name for more information.

gg.handler.name.format
Required. Legal values: any supported pluggable formatter (delimitedtext | json | json_row | xml | avro_row | avro_op). Default: delimitedtext. Selects the operations message formatter. JSON is likely the best fit for Kinesis.

gg.handler.name.enableStreamCreation
Optional. Legal values: true | false. Default: true. By default, the Kinesis Handler automatically creates Kinesis streams if they do not already exist. Set to false to disable the automatic creation of Kinesis streams.

gg.handler.name.shardCount
Optional. Legal values: positive integer. Default: 1. A Kinesis stream contains one or more shards. Controls the number of shards on Kinesis streams that the Kinesis Handler creates. Multiple shards can help improve the ingest performance to a Kinesis stream. Use only when gg.handler.name.enableStreamCreation is set to true.



gg.handler.name.proxyProtocol
Optional. Legal values: HTTP | HTTPS. Default: HTTP. Sets the protocol of the connection to the proxy server for an additional level of security. The client first performs an SSL handshake with the proxy server, and then an SSL handshake with Amazon AWS. This feature was added to the Amazon SDK in version 1.11.396, so you must use at least that version to use this property.

22.3.3 Using Templates to Resolve the Stream Name and Partition Name

The Kinesis Streams Handler provides the functionality to resolve the stream name and the partition key at runtime using a template configuration value. Templates allow you to configure static values and keywords. Keywords are used to dynamically replace the keyword with the context of the current processing. Templates are applicable to the following configuration parameters:

gg.handler.name.streamMappingTemplate gg.handler.name.partitionMappingTemplate

Template Modes

Source database transactions are made up of one or more individual operations, which are the individual inserts, updates, and deletes. The Kinesis Handler can be configured to send one message per operation (insert, update, delete). Alternatively, it can be configured to group operations into messages at the transaction level. Many of the template keywords resolve data based on the context of an individual source database operation. Therefore, many of the keywords do not work when sending messages at the transaction level. For example, ${fullyQualifiedTableName} does not work when sending messages at the transaction level. The ${fullyQualifiedTableName} property resolves to the qualified source table name for an operation. Transactions can contain multiple operations for many source tables. Resolving the fully-qualified table name for messages at the transaction level is non-deterministic and so abends at runtime.


Template Keywords
The following table lists the currently supported keyword templates and includes a column indicating whether the keyword is supported for transaction level messages:

${fullyQualifiedTableName} Resolves to the fully qualified table name including the period (.) delimiter between the catalog, schema, and table names. For example, test.dbo.table1. (Transaction message support: No)

${catalogName} Resolves to the catalog name. (Transaction message support: No)

${schemaName} Resolves to the schema name. (Transaction message support: No)

${tableName} Resolves to the short table name. (Transaction message support: No)

${opType} Resolves to the type of the operation: INSERT, UPDATE, DELETE, or TRUNCATE. (Transaction message support: No)

${primaryKeys} Resolves to the concatenated primary key values delimited by an underscore (_) character. (Transaction message support: No)

${position} The sequence number of the source trail file followed by the offset (RBA). (Transaction message support: Yes)

${opTimestamp} The operation timestamp from the source trail file. (Transaction message support: Yes)

${emptyString} Resolves to an empty string (""). (Transaction message support: Yes)

${groupName} Resolves to the name of the Replicat process. If using coordinated delivery, it resolves to the name of the Replicat process with the Replicat thread number appended. (Transaction message support: Yes)

${staticMap[]} Resolves to a static value where the key is the fully qualified table name. The keys and values are designated inside of the square brace in the following format: ${staticMap[dbo.table1=value1,dbo.table2=value2]} (Transaction message support: No)

${columnValue[]} Resolves to a column value where the key is the fully qualified table name and the value is the column name to be resolved. For example: ${columnValue[dbo.table1=col1,dbo.table2=col2]} (Transaction message support: No)

${currentTimestamp} or ${currentTimestamp[]} Resolves to the current timestamp. You can control the format of the current timestamp using Java-based formatting as described in the SimpleDateFormat class, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html. Examples: ${currentDate} and ${currentDate[yyyy-mm-dd hh:MM:ss.SSS]} (Transaction message support: Yes)

${null} Resolves to a null string. (Transaction message support: Yes)


${custom[]} It is possible to write a custom value resolver. (Transaction message support: depends on the implementation)

${token[]} Resolves a token value. (Transaction message support: No)

${xid} Resolves the transaction ID. (Transaction message support: No)

${toUpperCase[]} Changes the resolved argument to upper case. Can contain nested keywords. (Transaction message support: Yes)

${toLowerCase[]} Changes the resolved argument to lower case. Can contain nested keywords. (Transaction message support: Yes)

Example Templates The following describes example template configuration values and the resolved values.

${groupName}_${fullyQualifiedTableName} resolves to KINESIS001_DBO.TABLE1
prefix_${schemaName}_${tableName}_suffix resolves to prefix_DBO_TABLE1_suffix
${currentDate[yyyy-mm-dd hh:MM:ss.SSS]} resolves to 2017-05-17 11:45:34.254

22.3.4 Configuring the Client ID and Secret in Kinesis Handler

A client ID and secret are required credentials for the Kinesis Streams Handler to interact with Amazon Kinesis. A client ID and secret are generated through the Amazon AWS website. The retrieval of these credentials and presentation to the Kinesis server are performed on the client side by the AWS Kinesis Java SDK. The AWS Kinesis Java SDK provides multiple ways that the client ID and secret can be resolved at runtime. The client ID and secret can be set • as Java properties, on one line, in the Java Adapter properties file as follows:

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar -Daws.accessKeyId=your_access_key -Daws.secretKey=your_secret_key

• as environmental variables using the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables.
• in the EC2 environment on the local machine.

22.3.5 Configuring the Proxy Server for Kinesis Streams Handler

Oracle GoldenGate can be used with a proxy server using the following parameters to enable the proxy server:


• gg.handler.name.proxyServer= • gg.handler.name.proxyPort=80 Access to the proxy servers can be secured using credentials and the following configuration parameters: • gg.handler.name.proxyUsername=username • gg.handler.name.proxyPassword=password Sample configurations:

gg.handlerlist=kinesis
gg.handler.kinesis.type=kinesis_streams
gg.handler.kinesis.mode=op
gg.handler.kinesis.format=json
gg.handler.kinesis.region=us-west-2
gg.handler.kinesis.partitionMappingTemplate=TestPartitionName
gg.handler.kinesis.streamMappingTemplate=TestStream
gg.handler.kinesis.deferFlushAtTxCommit=true
gg.handler.kinesis.deferFlushOpCount=1000
gg.handler.kinesis.formatPerOp=true
#gg.handler.kinesis.customMessageGrouper=oracle.goldengate.handler.kinesis.KinesisJsonTxMessageGrouper
gg.handler.kinesis.proxyServer=www-proxy.myhost.com
gg.handler.kinesis.proxyPort=80

22.3.6 Configuring Security in Kinesis Streams Handler

The AWS Kinesis Java SDK uses HTTPS to communicate with Kinesis. The Kinesis Streams Handler is authenticated by presenting the client ID and secret credentials at runtime using a trusted certificate. The Kinesis Streams Handler can also be configured to authenticate the server providing mutual authentication. You can do this by generating a certificate from the Amazon AWS website and configuring server authentication. A trust store must be generated on the machine hosting Oracle GoldenGate for Big Data. The trust store and trust store password must be configured in the Kinesis Streams Handler Java Adapter properties file. The following is an example configuration:

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar -Djavax.net.ssl.trustStore=path_to_trust_store_file -Djavax.net.ssl.trustStorePassword=trust_store_password

22.4 Kinesis Handler Performance Considerations

• Kinesis Streams Input Limitations • Transaction Batching • Deferring Flush at Transaction Commit


22.4.1 Kinesis Streams Input Limitations

The maximum write rate to a Kinesis stream with a single shard is 1000 messages per second up to a maximum of 1MB of data per second. You can scale input to Kinesis by adding additional Kinesis streams or adding shards to streams. Both adding streams and adding shards can linearly increase the Kinesis input capacity and thereby improve performance of the Oracle GoldenGate Kinesis Streams Handler. Adding streams or shards can linearly increase the potential throughput, such as the following:
• 1 stream with 2 shards = 2000 messages per second up to a total data size of 2MB per second.
• 3 streams of 1 shard each = 3000 messages per second up to a total data size of 3MB per second.
To fully take advantage of streams and shards, you must configure the Oracle GoldenGate Kinesis Streams Handler to distribute messages as evenly as possible across streams and shards. Adding additional Kinesis streams or shards does nothing to scale Kinesis input if all data is sent using a static partition key into a single Kinesis stream. Kinesis streams are resolved at runtime using the selected mapping methodology. For example, mapping the source table name as the Kinesis stream name may provide good distribution of messages across Kinesis streams if operations from the source trail file are evenly distributed across tables. Shards are selected by a hash of the partition key. Partition keys are resolved at runtime using the selected mapping methodology. Therefore, it is best to choose a mapping methodology to a partition key that rapidly changes to ensure a good distribution of messages across shards.

The Oracle GoldenGate Kinesis Streams Handler receives messages and then batches together messages by Kinesis stream before sending them via synchronous HTTPS calls to Kinesis. At transaction commit all outstanding messages are flushed to Kinesis. The flush call to Kinesis impacts performance. Therefore, deferring the flush call can dramatically improve performance. The recommended way to defer the flush call is to use the GROUPTRANSOPS configuration in the replicat configuration. The GROUPTRANSOPS groups multiple small transactions into a single larger transaction deferring the transaction commit call until the larger transaction is completed. The GROUPTRANSOPS parameter works by counting the database operations (inserts, updates, and deletes) and only commits the transaction group when the number of operations equals or exceeds the GROUPTRANSOPS configuration setting. The default GROUPTRANSOPS setting for replicat is 1000. Interim flushes to Kinesis may be required with the GROUPTRANSOPS setting set to a large amount. An individual call to send batch messages for a Kinesis stream cannot exceed 500 individual messages or 5MB. If the count of pending messages exceeds 500 messages or 5MB on a per stream basis then the Kinesis Handler is required to perform an interim flush.


22.4.3 Deferring Flush at Transaction Commit

The messages are by default flushed to Kinesis at transaction commit to ensure write durability. However, it is possible to defer the flush beyond the transaction commit. This is only advisable when messages are being grouped and sent to Kinesis at the transaction level (that is, one transaction = one Kinesis message or chunked into a small number of Kinesis messages), when the user is trying to capture the transaction as a single messaging unit. This may require setting the GROUPTRANSOPS replication parameter to 1 so as not to group multiple smaller transactions from the source trail file into a larger output transaction. This can impact performance, as only one or a few messages are sent per transaction and then the transaction commit call is invoked, which in turn triggers the flush call to Kinesis. In order to maintain good performance, the Oracle GoldenGate Kinesis Streams Handler allows the user to defer the Kinesis flush call beyond the transaction commit call. The Oracle GoldenGate Replicat process maintains its checkpoint in the .cpr file in the {GoldenGate Home}/dirchk directory. The Java Adapter also maintains a checkpoint file in this directory named .cpj. The Replicat checkpoint is moved beyond the checkpoint for which the Oracle GoldenGate Kinesis Handler can guarantee that message loss will not occur. However, in this mode of operation, the GoldenGate Kinesis Streams Handler maintains the correct checkpoint in the .cpj file. Running in this mode will not result in message loss, even with a crash, as on restart the checkpoint in the .cpj file is parsed if it is before the checkpoint in the .cpr file.
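A hedged configuration sketch for this mode, combining the Replicat parameter with the handler properties shown in the sample configuration (values are illustrative):

-- Replicat parameter file: one source transaction per target transaction
GROUPTRANSOPS 1

# Java Adapter properties file
gg.handler.kinesis.deferFlushAtTxCommit=true
gg.handler.kinesis.deferFlushOpCount=1000
gg.handler.kinesis.formatPerOp=false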

• Java Classpath
• Kinesis Handler Connectivity Issues
• Logging
22.5.1 Java Classpath

The most common initial error is an incorrect classpath that does not include all the required AWS Kinesis Java SDK client libraries, which results in a ClassNotFound exception in the log file.

You can troubleshoot by setting the Java Adapter logging to DEBUG and then rerunning the process. At the debug level, the logging includes information about which JARs were added to the classpath from the gg.classpath configuration variable.

The gg.classpath variable supports the asterisk (*) wildcard character to select all JARs in a configured directory, for example, /usr/kinesis/sdk/*. See Setting Up and Running the Kinesis Streams Handler.
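A minimal sketch of the related Java Adapter properties, using the example SDK path from this section; the paths are illustrative only.

# Set Java Adapter logging to DEBUG to see which JARs are resolved from gg.classpath.
gg.log.level=DEBUG
# Use the wildcard to pick up every AWS Kinesis SDK client JAR in the directory.
gg.classpath=/usr/kinesis/sdk/*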
22.5.2 Kinesis Handler Connectivity Issues
If the Kinesis Streams Handler is unable to connect to Kinesis when running on premise, the problem may be that connectivity to the public Internet is protected by a proxy server. Proxy servers act as a gateway between the private network of a company and the public Internet. Contact your network administrator to get the URLs of your


proxy server, and then follow the directions in Configuring the Proxy Server for Kinesis Streams Handler.
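The configuration summary in the next topic shows a proxy host and port being applied. As a sketch only, proxy settings typically look like the following in the Java Adapter properties file; the property names shown here are assumptions and must be confirmed in Configuring the Proxy Server for Kinesis Streams Handler, and the host and port are placeholders.

# Assumed property names; confirm in the proxy configuration topic. Placeholder host and port.
gg.handler.kinesis.proxyServer=www-proxy.example.com
gg.handler.kinesis.proxyPort=80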
22.5.3 Logging
The Kinesis Streams Handler logs the state of its configuration to the Java log file. This is helpful because you can review the configuration values for the handler. Following is a sample of the logging of the state of the configuration:

**** Begin Kinesis Streams Handler - Configuration Summary ****
Mode of operation is set to op.
The AWS region name is set to [us-west-2].
A proxy server has been set to [www-proxy.us.oracle.com] using port [80].
The Kinesis Streams Handler will flush to Kinesis at transaction commit.
Messages from the GoldenGate source trail file will be sent at the operation level.
One operation = One Kinesis Message
The stream mapping template of [${fullyQualifiedTableName}] resolves to [fully qualified table name].
The partition mapping template of [${primaryKeys}] resolves to [primary keys].
**** End Kinesis Streams Handler - Configuration Summary ****

23 Using the MongoDB Handler

Learn how to use the MongoDB Handler, which can replicate transactional data from Oracle GoldenGate to a target MongoDB database.
• Overview
• Detailed Functionality
• Setting Up and Running the MongoDB Handler
• Reviewing Sample Configurations
23.1 Overview

MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling, see https://www.mongodb.com/.
23.2 Detailed Functionality

The MongoDB Handler takes operations from the source trail file and creates corresponding documents in the target MongoDB database. A record in MongoDB is a Binary JSON (BSON) document, which is a data structure composed of field and value pairs. A BSON data structure is a binary representation of JSON documents. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents. A collection is a grouping of MongoDB documents and is the equivalent of an RDBMS table. In MongoDB, databases hold collections of documents. Collections do not enforce a schema. MongoDB documents within a collection can have different fields.
• Document Key Column
• Primary Key Update Operation
• MongoDB Trail Data Types
23.2.1 Document Key Column

MongoDB databases require every document (row) to have a column named _id whose value should be unique in a collection (table). This is similar to a primary key for RDBMS tables. If a document does not contain a top-level _id column during an insert, the MongoDB driver adds this column. The MongoDB Handler builds custom _id field values for every document based on the primary key column values in the trail record. This custom _id is built using all the key column values concatenated by a : (colon) separator. For example:

KeyColValue1:KeyColValue2:KeyColValue3


The MongoDB Handler enforces uniqueness based on these custom _id values. This means that every record in the trail must be unique based on the primary key column values. The existence of non-unique records for the same table results in a MongoDB Handler failure and in Replicat abending with a duplicate key error. The behavior of the _id field is:

• By default, MongoDB creates a unique index on the column during the creation of a collection.
• It is always the first column in a document.
• It may contain values of any BSON data type except an array.
23.2.2 Primary Key Update Operation

MongoDB databases do not allow the _id column to be modified. This means that a primary key update operation record in the trail needs special handling. The MongoDB Handler converts a primary key update operation into a combination of a DELETE (with the old key) and an INSERT (with the new key). To perform the INSERT, a complete before image of the update operation in the trail is recommended. You can generate trail records that populate a complete before image for update operations by enabling the Oracle GoldenGate GETUPDATEBEFORES and NOCOMPRESSUPDATES parameters, see Reference for Oracle GoldenGate.
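As an illustrative sketch, these parameters are enabled in the source Extract parameter file so that update records in the trail carry a complete before image; the Extract name, trail path, and schema are examples only.

-- Example Extract parameter (.prm) file fragment.
EXTRACT ext1
-- Write the before image of update operations to the trail.
GETUPDATEBEFORES
-- Write all columns, not just the changed ones, for update records.
NOCOMPRESSUPDATES
EXTTRAIL ./dirdat/aa
TABLE GG.*;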
23.2.3 MongoDB Trail Data Types
The MongoDB Handler supports delivery to the BSON data types as follows:
• 32-bit integer
• 64-bit integer
• Double
• Date
• String
• Binary data
23.3 Setting Up and Running the MongoDB Handler

The following topics provide instructions for configuring the MongoDB Handler components and running the handler.
• Classpath Configuration
• MongoDB Handler Configuration
• Connecting and Authenticating
• Using Bulk Write
• Using Write Concern
• Using Three-Part Table Names
• Using Undo Handling


23.3.1 Classpath Configuration

The MongoDB Java Driver is required for Oracle GoldenGate for Big Data to connect and stream data to MongoDB. The minimum required version of the MongoDB Java Driver is 3.4.3. The MongoDB Java Driver is not included in the Oracle GoldenGate for Big Data product. You must download the driver from: mongo java driver. Select mongo-java-driver and the 3.4.3 version to download the recommended driver JAR file. You must configure the gg.classpath variable to load the MongoDB Java Driver JAR at runtime. For example:
gg.classpath=/home/mongodb/mongo-java-driver-3.4.3.jar

Oracle GoldenGate for Big Data supports the MongoDB Decimal 128 data type that was added in MongoDB 3.4. Use of a MongoDB Java Driver prior to 3.4.3 results in a ClassNotFound exception. You can disable the use of the Decimal128 data type to support MongoDB server versions older than 3.4 by setting this configuration parameter:

gg.handler.name.enableDecimal128=false

23.3.2 MongoDB Handler Configuration

You configure the MongoDB Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the MongoDB Handler, you must first configure the handler type by specifying gg.handler.name.type=mongodb and the other MongoDB properties as follows:

Table 23-1 MongoDB Handler Configuration Properties

gg.handler.name.type
Required. Legal value: mongodb. Default: None.
Selects the MongoDB Handler for use with Replicat.

gg.handler.name.bulkWrite
Optional. Legal values: true | false. Default: true.
Set to true, the handler caches operations until a commit transaction event is received. When the transaction event is committed, all the cached operations are written out to the target MongoDB database, which provides improved throughput. Set to false, there is no caching within the handler and operations are immediately written to the MongoDB database.

gg.handler.name.WriteConcern
Optional. Legal value: {"w": "value", "wtimeout": "number"}. Default: None.
Sets the required write concern for all the operations performed by the MongoDB Handler. The property value is in JSON format and can only accept the keys w and wtimeout, see https://docs.mongodb.com/manual/reference/write-concern/.

gg.handler.name.username
Optional. Legal value: a legal user name string. Default: None.
Sets the authentication user name to be used. Use with the AuthenticationMechanism property.

gg.handler.name.password
Optional. Legal value: a legal password string. Default: None.
Sets the authentication password to be used. Use with the AuthenticationMechanism property.

gg.handler.name.ServerAddressList
Optional. Legal value: IP:PORT entries delimited by a comma. Default: None.
Enables the connection to a list of replica set members or a list of MongoDB databases. This property accepts a comma-separated list of hostname:port entries. For example, localhost1:27017,localhost2:27018,localhost3:27019, see http://api.mongodb.com/java/3.0/com/mongodb/MongoClient.html#MongoClient-java.util.List-java.util.List-com.mongodb.MongoClientOptions-.

gg.handler.name.AuthenticationMechanism
Optional. Legal value: a comma-separated list of authentication mechanisms. Default: None.
Sets the authentication mechanism, which is a process of verifying the identity of a client. The input is a comma-separated list of authentication options, for example, GSSAPI,MONGODB_CR,MONGODB_X509,PLAIN,SCRAM_SHA_1, see http://api.mongodb.com/java/3.0/com/mongodb/MongoCredential.html.

gg.handler.name.source
Optional. Legal value: a valid authentication source. Default: None.
Sets the source of the user name, typically the name of the database where the user is defined. Use with the AuthenticationMechanism property.

gg.handler.name.clientURI
Optional. Legal value: a valid MongoDB client URI. Default: None.
Sets the MongoDB client URI. A MongoDB client URI can also be used to set other MongoDB connection properties, such as authentication and WriteConcern. For example, mongodb://localhost:27017/, see https://mongodb.github.io/mongo-java-driver/3.7/javadoc/com/mongodb/MongoClientURI.html.

gg.handler.name.Host
Optional. Legal value: a valid MongoDB server name or IP address. Default: None.
Sets the MongoDB database hostname to connect to, based on a (single) MongoDB node, see https://mongodb.github.io/mongo-java-driver/3.6/javadoc/com/mongodb/MongoClient.html.

gg.handler.name.Port
Optional. Legal value: a valid MongoDB port. Default: None.
Sets the MongoDB database instance port number. Use with the Host property.

gg.handler.name.CheckMaxRowSizeLimit
Optional. Legal values: true | false. Default: false.
When set to true, the handler verifies that the size of the BSON document inserted or modified is within the limits defined by the MongoDB database. Calculating the size involves the use of a default codec to generate a RawBsonDocument, leading to a small degradation in the throughput of the MongoDB Handler. If the size of the document exceeds the MongoDB limit, an exception occurs and Replicat abends.

gg.handler.name.upsert
Optional. Legal values: true | false. Default: false.
Set to true, a new MongoDB document is inserted if there are no matches to the query filter when performing an UPDATE operation.

gg.handler.name.enableDecimal128
Optional. Legal values: true | false. Default: true.
MongoDB version 3.4 added support for a 128-bit decimal data type called Decimal128. This data type is needed because Oracle GoldenGate for Big Data supports both integer and decimal data types that do not fit into a 64-bit Long or Double. Setting this property to true enables mapping into the Decimal128 data type for source data types that require it. Set to false to process these source data types as 64-bit Doubles.

gg.handler.name.enableTransactions
Optional. Legal values: true | false. Default: false.
Set to true to enable transactional processing in MongoDB 4.0 and higher.

Note:

MongoDB added support for transactions in MongoDB version 4.0. Additionally, the minimum version of the MongoDB client driver is 3.8.0.

23.3.3 Connecting and Authenticating

In the handler properties file, you can configure various connection and authentication properties. When multiple connection properties are specified, the MongoDB Handler chooses the properties according to the following priority:

Priority 1:

ServerAddressList
AuthenticationMechanism
UserName
Password
Source
Write Concern

Priority 2:

ServerAddressList
AuthenticationMechanism
UserName
Password
Source


Priority 3:

clientURI

Priority 4:

Host
Port

Priority 5:

Host

If none of the connection and authentication properties are specified, the handler tries to connect to localhost on port 27017.
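For illustration, a Priority 2 style connection (a replica set address list with basic authentication) might be configured as follows in the Java Adapter properties file; the host names, credentials, authentication mechanism, and source database are placeholders.

# Placeholder hosts and credentials for a replica set connection with basic authentication.
gg.handler.mongodb.ServerAddressList=localhost1:27017,localhost2:27018,localhost3:27019
gg.handler.mongodb.AuthenticationMechanism=SCRAM_SHA_1
gg.handler.mongodb.UserName=Authentication_username
gg.handler.mongodb.Password=Authentication_password
gg.handler.mongodb.Source=Authentication_source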
23.3.4 Using Bulk Write
Bulk write is enabled by default. For better throughput, Oracle recommends that you use bulk write. You can enable or disable bulk write by using the gg.handler.name.bulkWrite=true | false handler property. The MongoDB Handler does not use the gg.handler.name.mode=op | tx property that is used by other Oracle GoldenGate for Big Data handlers. With bulk write, the MongoDB Handler uses the GROUPTRANSOPS parameter to determine the batch size. The handler converts a batch of trail records to MongoDB documents, which are then written to the database in one request.
23.3.5 Using Write Concern

Write concern describes the level of acknowledgement that is requested from MongoDB for write operations to a standalone MongoDB instance, replica sets, and sharded clusters. With sharded clusters, mongos instances pass the write concern on to the shards, see https://docs.mongodb.com/manual/reference/write-concern/. Use the following configuration:

w: value
wtimeout: number
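A hedged example of what this looks like as a handler property, following the form used in the sample configuration later in this chapter; the values 1 and 5000 are illustrative only.

# Example only: request acknowledgement from one member with a 5000 ms write-concern timeout.
gg.handler.mongodb.WriteConcern={ w: 1, wtimeout: 5000 }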

23.3.6 Using Three-Part Table Names

An Oracle GoldenGate trail may have data for sources that support three-part table names, such as Catalog.Schema.Table. MongoDB only supports two-part names, such as DBName.Collection. To support the mapping of source three-part names to MongoDB two-part names, the source catalog and schema are concatenated with an underscore delimiter to construct the MongoDB database name.

For example, Catalog1.Schema1.Table1 would become catalog1_schema1.table1.


23.3.7 Using Undo Handling

The MongoDB Handler can recover from bulk write errors using a lightweight undo engine. This engine works differently from typical RDBMS undo engines; rather, it makes a best effort to assist you in error recovery. Error recovery works well when there are primary key violations or any other bulk write error for which the MongoDB database provides information about the point of failure through BulkWriteException.

Table 23-2 lists the requirements to make the best use of this functionality.

Table 23-2 Undo Handling Requirements

Operation to Undo: Requires Full Before Image in the Trail?
INSERT: No
DELETE: Yes
UPDATE: No (before image of fields in the SET clause)

If there are errors during undo operations, it may not be possible to get the MongoDB collections to a consistent state. In this case, you must manually reconcile the data.
23.4 Reviewing Sample Configurations

Basic Configuration
The following is a sample configuration for the MongoDB Handler from the Java Adapter properties file:

gg.handlerlist=mongodb
gg.handler.mongodb.type=mongodb

#The following handler properties are optional.
#Refer to the Oracle GoldenGate for Big Data documentation
#for details about the configuration.
#gg.handler.mongodb.clientURI=mongodb://localhost:27017/
#gg.handler.mongodb.Host=MongoDBServer_address
#gg.handler.mongodb.Port=MongoDBServer_port
#gg.handler.mongodb.WriteConcern={ w: value, wtimeout: number }
#gg.handler.mongodb.AuthenticationMechanism=GSSAPI,MONGODB_CR,MONGODB_X509,PLAIN,SCRAM_SHA_1
#gg.handler.mongodb.UserName=Authentication_username
#gg.handler.mongodb.Password=Authentication_password
#gg.handler.mongodb.Source=Authentication_source
#gg.handler.mongodb.ServerAddressList=localhost1:27017,localhost2:27018,localhost3:27019,...
#gg.handler.mongodb.BulkWrite=false
#gg.handler.mongodb.CheckMaxRowSizeLimit=true

goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO
gg.report.time=30sec

#Path to MongoDB Java driver.
# maven co-ordinates
# org.mongodb
# mongo-java-driver
# 3.2.2
gg.classpath=/path/to/mongodb/java/driver/mongo-java-driver-3.2.2.jar
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm

Oracle Database Source to MongoDB Target
You can map an Oracle Database source table name in uppercase to a table in MongoDB that is in lowercase. This applies to both table names and schemas. There are two methods that you can use:

Create a Data Pump
You can create a data pump before the Replicat, which translates names to lowercase. Then you configure a MongoDB Replicat to use the output from the pump:
extract pmp
exttrail ./dirdat/le
map RAMOWER.EKKN, target "ram"."ekkn";

Convert When Replicating
You can convert table column names to lowercase when replicating to the MongoDB table by adding this parameter to your MongoDB properties file:
gg.schema.normalize=lowercase

24 Using the Metadata Providers

The Metadata Providers can replicate from a source to a target using a Replicat parameter file. This chapter describes how to use the Metadata Providers.
• About the Metadata Providers
• Avro Metadata Provider: The Avro Metadata Provider is used to retrieve the table metadata from Avro schema files. For every table mapped in Replicat using COLMAP, the metadata is retrieved from the Avro schema. The retrieved metadata is then used by Replicat for column mapping.
• Java Database Connectivity Metadata Provider: The Java Database Connectivity (JDBC) Metadata Provider is used to retrieve the table metadata from any target database that supports a JDBC connection and has a database schema. It is the preferred metadata provider for any target RDBMS database, although various other non-RDBMS targets also provide a JDBC driver.
• Hive Metadata Provider: The Hive Metadata Provider is used to retrieve the table metadata from a Hive metastore. The metadata is retrieved from Hive for every target table that is mapped in the Replicat properties file using the COLMAP parameter. The retrieved target metadata is used by Replicat for the column mapping functionality.
24.1 About the Metadata Providers

Metadata Providers work only if handlers are configured to run with a Replicat process. The Replicat process maps source tables to target tables and source columns to target columns using the syntax in the Replicat configuration file. The source metadata definitions are included in the Oracle GoldenGate trail file (or provided by source definitions files in Oracle GoldenGate releases 12.2 and later). When the replication target is a database, the Replicat process obtains the target metadata definitions from the target database. However, this is a shortcoming when pushing data to Big Data applications or during Java delivery in general. Typically, Big Data applications provide no target metadata, so Replicat mapping is not possible. The metadata providers exist to address this deficiency. You can use a metadata provider to define target metadata using either Avro or Hive, which enables Replicat mapping of source table to target table and source column to target column. The use of the metadata provider is optional and is enabled if the gg.mdp.type property is specified in the Java Adapter properties file. If the metadata included in the source Oracle GoldenGate trail file is acceptable for output, then do not use the metadata provider. Use a metadata provider in the following cases:
• You need to map source table names into target table names that do not match.


• You need to map source column names into target column names that do not match.
• You need to include certain columns from the source trail file and omit other columns.
A limitation of Replicat mapping is that the mapping defined in the Replicat configuration file is static. Oracle GoldenGate provides functionality for DDL propagation when using an Oracle database as the source. The proper handling of schema evolution can be problematic when the Metadata Provider and Replicat mapping are used. Consider your use cases for schema evolution and plan for how you want to update the Metadata Provider and the Replicat mapping syntax for required changes. For every table mapped in Replicat using COLMAP, the metadata is retrieved from a configured metadata provider, and the retrieved metadata is then used by Replicat for column mapping. Only the Hive and Avro Metadata Providers are supported; you must choose one or the other to use in your metadata provider implementation.

Scenarios - When to use a metadata provider

1. The following scenarios do not require a metadata provider to be configured:
A mapping in which the source schema named GG is mapped to the target schema named GGADP.
A mapping of schema and table name whereby the schema and table GG.TCUSTMER is mapped to the table name GGADP.TCUSTMER_NEW.
MAP GG.*, TARGET GGADP.*;
(OR)
MAP GG.TCUSTMER, TARGET GG_ADP.TCUSTMER_NEW;

2. The following scenario requires a metadata provider to be configured:
A mapping in which the source column name does not match the target column name. For example, a source column of CUST_CODE mapped to a target column of CUST_CODE_NEW.
MAP GG.TCUSTMER, TARGET GG_ADP.TCUSTMER_NEW, COLMAP(USEDEFAULTS, CUST_CODE_NEW=CUST_CODE, CITY2=CITY);
24.2 Avro Metadata Provider

The Avro Metadata Provider is used to retrieve the table metadata from Avro Schema files. For every table mapped in Replicat using COLMAP, the metadata is retrieved from the Avro Schema. Retrieved metadata is then used by Replicat for column mapping.
• Detailed Functionality
• Runtime Prerequisites
• Classpath Configuration
• Avro Metadata Provider Configuration
• Review a Sample Configuration
• Metadata Change Events
• Limitations


• Troubleshooting
24.2.1 Detailed Functionality

The Avro Metadata Provider uses Avro schema definition files to retrieve metadata. Avro schemas are defined using JSON. For each table mapped in the process_name.prm file, you must create a corresponding Avro schema definition file. For information about how to define Avro schemas, see Defining a Schema.

Avro Metadata Provider Schema Definition Syntax

{"namespace": "[$catalogname.]$schemaname",
"type": "record",
"name": "$tablename",
"fields": [
    {"name": "$col1", "type": "$datatype"},
    {"name": "$col2", "type": "$datatype", "primary_key":true},
    {"name": "$col3", "type": "$datatype", "primary_key":true},
    {"name": "$col4", "type": ["$datatype","null"]}
]
}

namespace - name of the catalog/schema being mapped
name - name of the table being mapped
fields.name - array of column names
fields.type - data type of the column
fields.primary_key - indicates that the column is part of the primary key

Representing nullable and not nullable columns:

"type":"$datatype" - indicates the column is not nullable, where "$datatype" is the actual datatype. "type": ["$datatype","null"] - indicates the column is nullable, where "$datatype" is the actual datatype

The names of schema files that are accessed by the Avro Metadata Provider must be in the following format:

[$catalogname.]$schemaname.$tablename.mdp.avsc

$catalogname - name of the catalog, if it exists
$schemaname - name of the schema
$tablename - name of the table
.mdp.avsc - constant, which should always be appended

Supported Avro Data Types
• boolean
• bytes
• double
• float
• int
• long
• string
See https://avro.apache.org/docs/1.7.5/spec.html#schema_primitive.


24.2.2 Runtime Prerequisites

Before you start the Replicat process, create Avro schema definitions for all tables mapped in Replicat's parameter file. 24.2.3 Classpath Configuration

The Avro Metadata Provider requires no additional classpath setting. 24.2.4 Avro Metadata Provider Configuration

gg.mdp.type
Required. Legal value: avro. Default: None.
Selects the Avro Metadata Provider.

gg.mdp.schemaFilesPath
Required. Legal value: a valid directory path, for example, /home/user/ggadp/avroschema/. Default: None.
The path to the Avro schema files directory.

gg.mdp.charset
Optional. Legal value: a valid character set. Default: UTF-8.
Specifies the character set of columns with a character data type. Used to convert the source data from the trail file to the correct target character set.

gg.mdp.nationalCharset
Optional. Legal value: a valid character set. Default: UTF-8.
Specifies the character set of columns with a national character data type, such as NCHAR and NVARCHAR columns in an Oracle database. Used to convert the source data from the trail file to the correct target character set.

24.2.5 Review a Sample Configuration

This is an example for configuring the Avro Metadata Provider. Consider a source that includes the following table:

TABLE GG.TCUSTMER {
    CUST_CODE VARCHAR(4) PRIMARY KEY,
    NAME VARCHAR(100),
    CITY VARCHAR(200),
    STATE VARCHAR(200)
}


This example maps the column CUST_CODE (GG.TCUSTMER) in the source to CUST_CODE2 (GG_AVRO.TCUSTMER_AVRO) on the target, and the column CITY (GG.TCUSTMER) in the source to CITY2 (GG_AVRO.TCUSTMER_AVRO) on the target. Therefore, the mapping in the process_name.prm file is:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, CUST_CODE2=CUST_CODE, CITY2=CITY);

In this example, the mapping definition is as follows:
• Source schema GG is mapped to target schema GG_AVRO.
• Source column CUST_CODE is mapped to target column CUST_CODE2.

• Source column CITY is mapped to target column CITY2.
• USEDEFAULTS specifies that the rest of the column names are the same on both source and target (the NAME and STATE columns).
This example uses the following Avro schema definition file:
File path: /home/ggadp/avromdp/GG_AVRO.TCUSTMER_AVRO.mdp.avsc

{"namespace": "GG_AVRO",
"type": "record",
"name": "TCUSTMER_AVRO",
"fields": [
    {"name": "NAME", "type": "string"},
    {"name": "CUST_CODE2", "type": "string", "primary_key":true},
    {"name": "CITY2", "type": "string"},
    {"name": "STATE", "type": ["string","null"]}
]
}

The configuration in the Java Adapter properties file includes the following:

gg.mdp.type = avro
gg.mdp.schemaFilesPath = /home/ggadp/avromdp

The following sample output uses a delimited text formatter with a semi-colon as the delimiter:

I;GG_AVRO.TCUSTMER_AVRO;2013-06-02 22:14:36.000000;NAME;BG SOFTWARE CO;CUST_CODE2;WILL;CITY2;SEATTLE;STATE;WA

Oracle GoldenGate for Big Data includes a sample Replicat configuration file, a sample Java Adapter properties file, and sample Avro schemas at the following location:
GoldenGate_install_directory/AdapterExamples/big-data/metadata_provider/avro
24.2.6 Metadata Change Events

If the DDL changes in the source database tables, you may need to modify the Avro schema definitions and the mappings in the Replicat configuration file. You may also want to stop or suspend the Replicat process in the case of a metadata change


event. You can stop the Replicat process by adding the following line to the Replicat configuration file (process_name.prm):
DDL INCLUDE ALL, EVENTACTIONS (ABORT)

Alternatively, you can suspend the Replicat process by adding the following line to the Replicat configuration file:
DDL INCLUDE ALL, EVENTACTIONS (SUSPEND)
24.2.7 Limitations

The Avro bytes data type cannot be used as a primary key. The source-to-target mapping that is defined in the Replicat configuration file is static. Oracle GoldenGate 12.2 and later versions support DDL propagation and source schema evolution for Oracle databases as the replication source. If you use DDL propagation and source schema evolution, you lose the ability to seamlessly handle changes to the source metadata.

This topic contains information about how to troubleshoot the following issues:
• Invalid Schema Files Location
• Invalid Schema File Name
• Invalid Namespace in Schema File
• Invalid Table Name in Schema File
24.2.8.1 Invalid Schema Files Location

The Avro schema files directory specified in the gg.mdp.schemaFilesPath configuration property must be a valid directory. If the path is not valid, you encounter the following exception:

oracle.goldengate.util.ConfigException: Error initializing Avro metadata provider
Specified schema location does not exist. {/path/to/schema/files/dir}
24.2.8.2 Invalid Schema File Name

For every table that is mapped in the process_name.prm file, you must create a corresponding Avro schema file in the directory that is specified in gg.mdp.schemaFilesPath.

For example, consider the following scenario: Mapping:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2=cust_code, CITY2 = CITY);

Property:

gg.mdp.schemaFilesPath=/home/usr/avro/


In this scenario, you must create a file called GG_AVRO.TCUSTMER_AVRO.mdp.avsc in the /home/usr/avro/ directory.

If you do not create the /home/usr/avro/GG_AVRO.TCUSTMER_AVRO.mdp.avsc file, you encounter the following exception:

java.io.FileNotFoundException: /home/usr/avro/GG_AVRO.TCUSTMER_AVRO.mdp.avsc
24.2.8.3 Invalid Namespace in Schema File

The target schema name specified in the Replicat mapping must be the same as the namespace in the Avro schema definition file. For example, consider the following scenario:
Mapping:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2 = cust_code, CITY2 = CITY);

Avro Schema Definition:

{ "namespace": "GG_AVRO", .. }

In this scenario, Replicat abends with the following exception:

Unable to retrieve table matadata. Table : GG_AVRO.TCUSTMER_AVRO Mapped [catalogname.]schemaname (GG_AVRO) does not match with the schema namespace {schema namespace} 24.2.8.4 Invalid Table Name in Schema File

The target table name that is specified in the Replicat mapping must be the same as the name in the Avro schema definition file. For example, consider the following scenario:
Mapping:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2 = cust_code, CITY2 = CITY);

Avro Schema Definition:

{ "namespace": "GG_AVRO", "name": "TCUSTMER_AVRO", .. }

In this scenario, if the target table name specified in the Replicat mapping does not match the Avro schema name, then Replicat abends with the following exception:

Unable to retrieve table matadata. Table : GG_AVRO.TCUSTMER_AVRO Mapped table name (TCUSTMER_AVRO) does not match with the schema table name {table name}


24.3 Java Database Connectivity Metadata Provider

The Java Database Connectivity (JDBC) Metadata Provider is used to retrieve the table metadata from any target database that supports a JDBC connection and has a database schema. It is the preferred metadata provider for any target RDBMS database, although various other non-RDBMS targets also provide a JDBC driver.
• JDBC Detailed Functionality
• Java Classpath
• JDBC Metadata Provider Configuration
• Review a Sample Configuration
24.3.1 JDBC Detailed Functionality

The JDBC Metadata Provider uses the JDBC driver that is provided with your target database. The JDBC driver retrieves the metadata for every target table that is mapped in the Replicat properties file. Replicat processes use the retrieved target metadata to map columns. You can enable this feature for the JDBC Handler by configuring the REPERROR property in your Replicat parameter file. In addition, you need to define the error codes specific to your RDBMS JDBC target in the JDBC Handler properties file as follows:

Table 24-1 JDBC REPERROR Codes

gg.error.duplicateErrorCodes
Value: comma-separated integer values of error codes that indicate duplicate errors. Required: No.

gg.error.notFoundErrorCodes
Value: comma-separated integer values of error codes that indicate Not Found errors. Required: No.

gg.error.deadlockErrorCodes
Value: comma-separated integer values of error codes that indicate deadlock errors. Required: No.

For example:

#ErrorCode
gg.error.duplicateErrorCodes=1062,1088,1092,1291,1330,1331,1332,1333
gg.error.notFoundErrorCodes=0
gg.error.deadlockErrorCodes=1213

To understand how the various JDBC types are mapped to database-specific SQL types, see https://docs.oracle.com/javase/6/docs/technotes/guides/jdbc/getstart/mapping.html#table1.


24.3.2 Java Classpath

The JDBC Java Driver location must be included in the class path of the handler using the gg.classpath property.

For example, the configuration for a MySQL database might be:

gg.classpath=/path/to/jdbc/driver/jar/mysql-connector-java-5.1.39-bin.jar
24.3.3 JDBC Metadata Provider Configuration

The following are the configurable values for the JDBC Metadata Provider. These properties are located in the Java Adapter properties file (not in the Replicat properties file).

Table 24-2 JDBC Metadata Provider Properties

gg.mdp.type
Required. Legal value: jdbc. Default: None.
Selects the JDBC Metadata Provider.

gg.mdp.ConnectionUrl
Required. Legal value: jdbc:subprotocol:subname. Default: None.
The target database JDBC URL.

gg.mdp.DriverClassName
Required. Legal value: the Java class name of the JDBC driver. Default: None.
The fully qualified Java class name of the JDBC driver.

gg.mdp.userName
Optional. Legal value: a legal username string. Default: None.
The user name for the JDBC connection. Alternatively, you can provide the user name using the ConnectionUrl property.

gg.mdp.password
Optional. Legal value: a legal password string. Default: None.
The password for the JDBC connection. Alternatively, you can provide the password using the ConnectionUrl property.

24.3.4 Review a Sample Configuration

Oracle Thin Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:oracle:thin:@myhost:1521:orcl
gg.mdp.DriverClassName=oracle.jdbc.driver.OracleDriver
gg.mdp.UserName=username
gg.mdp.Password=password


Netezza Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:netezza://hostname:port/databaseName
gg.mdp.DriverClassName=org.netezza.Driver
gg.mdp.UserName=username
gg.mdp.Password=password

Oracle OCI Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:oracle:oci:@myhost:1521:orcl
gg.mdp.DriverClassName=oracle.jdbc.driver.OracleDriver
gg.mdp.UserName=username
gg.mdp.Password=password

Oracle Teradata Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:teradata://10.111.11.111/USER=username,PASSWORD=password
gg.mdp.DriverClassName=com.teradata.jdbc.TeraDriver
gg.mdp.UserName=username
gg.mdp.Password=password

MySQL Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:mysql://localhost/databaseName?user=username&password=password
gg.mdp.DriverClassName=com.mysql.jdbc.Driver
gg.mdp.UserName=username
gg.mdp.Password=password

Redshift Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:redshift://hostname:port/databaseName
gg.mdp.DriverClassName=com.amazon.redshift.jdbc42.Driver
gg.mdp.UserName=username
gg.mdp.Password=password
24.4 Hive Metadata Provider

The Hive Metadata Provider is used to retrieve the table metadata from a Hive metastore. The metadata is retrieved from Hive for every target table that is mapped in the Replicat properties file using the COLMAP parameter. The retrieved target metadata is used by Replicat for the column mapping functionality.
• Detailed Functionality
• Configuring Hive with a Remote Metastore Database
• Classpath Configuration
• Hive Metadata Provider Configuration Properties
• Review a Sample Configuration
• Security


• Metadata Change Event
• Limitations
• Additional Considerations
• Troubleshooting
24.4.1 Detailed Functionality

The Hive Metadata Provider uses both Hive JDBC and HCatalog interfaces to retrieve metadata from the Hive metastore. For each table mapped in the process_name.prm file, a corresponding table is created in Hive. The default Hive configuration starts an embedded, local metastore Derby database. Because Derby is designed to be an embedded database, it allows only a single connection. This limitation of the Derby database means that it cannot function when working with the Hive Metadata Provider. To work around this limitation, you must configure Hive with a remote metastore database. For more information about how to configure Hive with a remote metastore database, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration. Hive does not support primary key semantics, so the metadata retrieved from the Hive metastore does not include a primary key definition. When you use the Hive Metadata Provider, use the Replicat KEYCOLS parameter to define primary keys.

KEYCOLS
The KEYCOLS parameter must be used to define primary keys in the target schema. The Oracle GoldenGate HBase Handler requires primary keys. Therefore, you must set primary keys in the target schema when you use Replicat mapping with HBase as the target. The output of the Avro formatters includes an array field to hold the primary column names. If you use Replicat mapping with the Avro formatters, consider using KEYCOLS to identify the primary key columns. For example configurations of KEYCOLS, see Review a Sample Configuration.

Supported Hive Data Types
• BIGINT
• BINARY
• BOOLEAN
• CHAR
• DATE
• DECIMAL
• DOUBLE
• FLOAT
• INT
• SMALLINT
• STRING


• TIMESTAMP
• TINYINT
• VARCHAR
See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types.
24.4.2 Configuring Hive with a Remote Metastore Database

A list of supported databases that you can use to configure a remote Hive metastore can be found at https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-SupportedBackendDatabasesforMetastore. The following example shows how a MySQL database is configured as the Hive metastore using properties in the ${HIVE_HOME}/conf/hive-site.xml Hive configuration file.

Note:

The ConnectionURL and driver class used in this example are specific to MySQL database. If you use a database other than MySQL, then change the values to fit your configuration.

<property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://MYSQL_DB_IP:MYSQL_DB_PORT/DB_NAME?createDatabaseIfNotExist=false</value>
</property>

<property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.mysql.jdbc.Driver</value>
</property>

<property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>MYSQL_CONNECTION_USERNAME</value>
</property>

<property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>MYSQL_CONNECTION_PASSWORD</value>
</property>

To see a list of parameters to configure in the hive-site.xml file for a remote metastore, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastoreDatabase.


Note:

Follow these steps to add the MySQL JDBC connector JAR in the Hive classpath:

1. Place the MySQL JDBC connector JAR in the HIVE_HOME/lib/ directory. DB_NAME should be replaced by a valid database name created in MySQL.
2. Start the Hive Server:
HIVE_HOME/bin/hiveserver2

3. Start the Hive Remote Metastore Server:
HIVE_HOME/bin/hive --service metastore

24.4.3 Classpath Configuration

For the Hive Metadata Provider to connect to Hive, you must configure the hive-site.xml file and the Hive and HDFS client JARs in the gg.classpath variable. The client JARs must match the version of Hive to which the Hive Metadata Provider is connecting. For example, if the hive-site.xml file is created in the /home/user/oggadp/dirprm directory, then the gg.classpath entry is gg.classpath=/home/user/oggadp/dirprm/

1. Create a hive-site.xml file that has the following properties:
<property>
   <name>hive.metastore.uris</name>
   <value>thrift://HIVE_SERVER_HOST_IP:9083</value>
</property>

<property>
   <name>hive.metastore.connect.retries</name>
   <value>3</value>
</property>

<property>
   <name>hive.metastore.client.connect.retry.delay</name>
   <value>10</value>
</property>

<property>
   <name>hive.metastore.client.socket.timeout</name>
   <value>50</value>
</property>

2. By default, the following directories contain the Hive and HDFS client jars:

HIVE_HOME/hcatalog/share/hcatalog/*
HIVE_HOME/lib/*


HIVE_HOME/hcatalog/share/webhcat/java-client/*
HADOOP_HOME/share/hadoop/common/*
HADOOP_HOME/share/hadoop/common/lib/*
HADOOP_HOME/share/hadoop/mapreduce/*

Configure the gg.classpath to include the hive-site.xml directory from step 1 and the client JAR directories from step 2. The path to the hive-site.xml file must be the directory path with no wildcard appended. If you include the * wildcard in the path to the hive-site.xml file, it will not be located. The path to the dependency JARs must include the * wildcard character to include all of the JAR files in each directory in the associated classpath. Do not use *.jar.
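As a sketch only, combining the hive-site.xml directory from this section with the default client JAR directories produces a gg.classpath like the following; substitute your own HIVE_HOME, HADOOP_HOME, and properties directory paths.

# Illustrative gg.classpath combining the hive-site.xml directory and the Hive/HDFS client JAR directories.
gg.classpath=/home/user/oggadp/dirprm/:HIVE_HOME/hcatalog/share/hcatalog/*:HIVE_HOME/lib/*:HIVE_HOME/hcatalog/share/webhcat/java-client/*:HADOOP_HOME/share/hadoop/common/*:HADOOP_HOME/share/hadoop/common/lib/*:HADOOP_HOME/share/hadoop/mapreduce/*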
24.4.4 Hive Metadata Provider Configuration Properties
gg.mdp.type
Required. Legal value: hive. Default: None.
Selects the Hive Metadata Provider.

gg.mdp.connectionUrl
Required. Default: None.
The JDBC connection URL of the Hive server.
Format without Kerberos authentication: jdbc:hive2://HIVE_SERVER_IP:HIVE_JDBC_PORT/HIVE_DB
Format with Kerberos authentication: jdbc:hive2://HIVE_SERVER_IP:HIVE_JDBC_PORT/HIVE_DB;principal=user/FQDN@MY.REALM

gg.mdp.driverClassName
Required. Legal value: org.apache.hive.jdbc.HiveDriver. Default: None.
The fully qualified Hive JDBC driver class name.

gg.mdp.userName
Optional. Legal value: a valid user name. Default: "".
The user name for connecting to the Hive database. The userName property is not required when Kerberos authentication is used; the Kerberos principal should be specified in the connection URL as described in the connectionUrl property's legal values.

gg.mdp.password
Optional. Legal value: a valid password. Default: "".
The password for connecting to the Hive database.

gg.mdp.charset
Optional. Legal value: a valid character set. Default: UTF-8.
The character set of columns with a character data type. Used to convert the source data from the trail file to the correct target character set.

gg.mdp.nationalCharset
Optional. Legal value: a valid character set. Default: UTF-8.
The character set of columns with a national character data type. Used to convert the source data from the trail file to the correct target character set. For example, this property may indicate the character set of columns such as NCHAR and NVARCHAR in an Oracle database.

gg.mdp.authType
Optional. Legal value: kerberos. Default: none.
Allows you to designate Kerberos authentication to Hive.

gg.mdp.kerberosKeytabFile
Optional (required if authType=kerberos). Legal value: a relative or absolute path to a Kerberos keytab file. Default: None.
The keytab file allows Hive to access a password to perform the kinit operation for Kerberos security.

gg.mdp.kerberosPrincipal
Optional (required if authType=kerberos). Legal value: a legal Kerberos principal name (user/FQDN@MY.REALM). Default: None.
The Kerberos principal name for Kerberos authentication.

24.4.5 Review a Sample Configuration

This is an example of configuring the Hive Metadata Provider. Consider a source with the following table:

TABLE GG.TCUSTMER {
    CUST_CODE VARCHAR(4) PRIMARY KEY,
    NAME VARCHAR(100),
    CITY VARCHAR(200),
    STATE VARCHAR(200)
}

The example maps the column CUST_CODE (GG.TCUSTMER) in the source to CUST_CODE2 (GG_HIVE.TCUSTMER_HIVE) on the target, and the column CITY (GG.TCUSTMER) in the source to CITY2 (GG_HIVE.TCUSTMER_HIVE) on the target.


Mapping configuration in the process_name.prm file includes the following configuration:

MAP GG.TCUSTMER, TARGET GG_HIVE.TCUSTMER_HIVE, COLMAP(USEDEFAULTS, CUST_CODE2=CUST_CODE, CITY2=CITY) KEYCOLS(CUST_CODE2);

In this example:
• The source schema GG is mapped to the target schema GG_HIVE.
• The source column CUST_CODE is mapped to the target column CUST_CODE2.
• The source column CITY is mapped to the target column CITY2.
• USEDEFAULTS specifies that the rest of the column names are the same on both source and target (the NAME and STATE columns).
• KEYCOLS is used to specify that CUST_CODE2 should be treated as the primary key. Because primary keys cannot be specified in the Hive DDL, the KEYCOLS parameter is used to specify the primary keys.

Note:

You can choose any schema name and are not restricted to the gg_hive schema name. The Hive schema can be pre-existing or newly created. To change the schema name, update the connection URL (gg.mdp.connectionUrl) in the Java Adapter properties file and the mapping configuration in the Replicat .prm file.

You can create the schema and tables for this example in Hive by using the following commands. To start the Hive CLI, use the following command:
HIVE_HOME/bin/hive

To create the GG_HIVE schema in Hive, use the following command:
hive> create schema gg_hive;
OK
Time taken: 0.02 seconds

To create the TCUSTMER_HIVE table in the GG_HIVE database, use the following command:
hive> CREATE EXTERNAL TABLE `TCUSTMER_HIVE`(
    > "CUST_CODE2" VARCHAR(4),
    > "NAME" VARCHAR(30),
    > "CITY2" VARCHAR(20),
    > "STATE" STRING);
OK
Time taken: 0.056 seconds

Configure the .properties file in a way that resembles the following:


gg.mdp.type=hive
gg.mdp.connectionUrl=jdbc:hive2://HIVE_SERVER_IP:10000/gg_hive
gg.mdp.driverClassName=org.apache.hive.jdbc.HiveDriver

The following sample output uses the delimited text formatter, with a semicolon as the delimiter:

I;GG_HIVE.TCUSTMER_HIVE;2015-10-07T04:50:47.519000;cust_code2;WILL;name;BG SOFTWARE CO;city2;SEATTLE;state;WA

A sample Replicat configuration file, Java Adapter properties file, and Hive create table SQL script are included with the installation at the following location:
GoldenGate_install_directory/AdapterExamples/big-data/metadata_provider/hive
24.4.6 Security

You can secure the Hive server using Kerberos authentication. For information about how to secure the Hive server, see the Hive documentation for the specific Hive release. The Hive Metadata Provider can connect to a Kerberos secured Hive server. Make sure that the paths to the HDFS core-site.xml file and the hive-site.xml file are in the handler's classpath. Enable the following properties in the core-site.xml file:

<property>
   <name>hadoop.security.authentication</name>
   <value>kerberos</value>
</property>

<property>
   <name>hadoop.security.authorization</name>
   <value>true</value>
</property>

Enable the following properties in the hive-site.xml file:

<property>
   <name>hive.metastore.sasl.enabled</name>
   <value>true</value>
</property>

<property>
   <name>hive.metastore.kerberos.keytab.file</name>
   <value>/path/to/keytab</value>
</property>

<property>
   <name>hive.metastore.kerberos.principal</name>
   <value>Kerberos Principal</value>
</property>

<property>
   <name>hive.server2.authentication</name>
   <value>KERBEROS</value>
</property>


<property>
   <name>hive.server2.authentication.kerberos.principal</name>
   <value>Kerberos Principal</value>
</property>

<property>
   <name>hive.server2.authentication.kerberos.keytab</name>
   <value>/path/to/keytab</value>
</property>
24.4.7 Metadata Change Event

Tables in the Hive metastore should be updated, altered, or created manually if the source database tables change. In the case of a metadata change event, you may wish to terminate or suspend the Replicat process. You can terminate the Replicat process by adding the following to the Replicat configuration file (process_name.prm):
DDL INCLUDE ALL, EVENTACTIONS (ABORT)

You can suspend the Replicat process by adding the following to the Replicat configuration file:
DDL INCLUDE ALL, EVENTACTIONS (SUSPEND)
24.4.8 Limitations

Columns with a binary data type cannot be used as primary keys. The source-to-target mapping that is defined in the Replicat configuration file is static. Oracle GoldenGate 12.2 and later versions support DDL propagation and source schema evolution for Oracle databases as replication sources. If you use DDL propagation and source schema evolution, you lose the ability to seamlessly handle changes to the source metadata.

The most common problems encountered are Java classpath issues. The Hive Metadata Provider requires certain Hive and HDFS client libraries to be resolved in its classpath. The required client JAR directories are listed in Classpath Configuration. The Hive and HDFS client JARs do not ship with Oracle GoldenGate for Big Data. The client JARs should be of the same version as the Hive version to which the Hive Metadata Provider is connecting. To establish a connection to the Hive server, the hive-site.xml file must be in the classpath.

If the mapped target table is not present in Hive, the Replicat process will terminate with a "Table metadata resolution exception".

For example, consider the following mapping:


MAP GG.TCUSTMER, TARGET GG_HIVE.TCUSTMER_HIVE, COLMAP(USEDEFAULTS, CUST_CODE2=CUST_CODE, CITY2=CITY) KEYCOLS(CUST_CODE2);

This mapping requires a table called TCUSTMER_HIVE to be created in the schema GG_HIVE in the Hive metastore. If this table is not present in Hive, then the following exception occurs:

ERROR [main) - Table Metadata Resolution Exception
Unable to retrieve table matadata. Table : GG_HIVE.TCUSTMER_HIVE
NoSuchObjectException(message:GG_HIVE.TCUSTMER_HIVE table not found)

25 Using the Oracle NoSQL Handler

The Oracle NoSQL Handler can replicate transactional data from Oracle GoldenGate to a target Oracle NoSQL Database. This chapter describes how to use the Oracle NoSQL Handler.
• Overview
• Detailed Functionality
• Oracle NoSQL Handler Configuration
• Review a Sample Configuration
• Performance Considerations
• Full Image Data Requirements
25.1 Overview

Oracle NoSQL Database is a NoSQL-type distributed key-value database. It provides a powerful and flexible transaction model that greatly simplifies the process of developing a NoSQL-based application. It scales horizontally with high availability and transparent load balancing even when dynamically adding new capacity. Oracle NoSQL Database provides a very simple data model to the application developer. Each row is identified by a unique key and also has a value, of arbitrary length, which is interpreted by the application. The application can manipulate (insert, delete, update, read) a single row in a transaction. The application can also perform an iterative, non-transactional scan of all the rows in the database, see https://www.oracle.com/database/nosql and https://docs.oracle.com/cd/NOSQL/docs.htm. The Oracle NoSQL Handler streams change data capture into Oracle NoSQL using the Table API. The Table API provides some of the functionality of an RDBMS, including tables, schemas, data types, and primary keys. Oracle NoSQL also supports a Key Value API. The Key Value API stores raw data in Oracle NoSQL based on a key. The NoSQL Handler does not support the Key Value API.

• Oracle NoSQL Data Types
• Performance Considerations
• Operation Processing Support
• Column Processing
• Table Check and Reconciliation Process
• Security


25.2.1 Oracle NoSQL Data Types

Oracle NoSQL provides a number of column data types, and most of these data types are supported by the Oracle NoSQL Handler. A data type conversion from the column value in the trail file to the corresponding Java type representing the Oracle NoSQL column type is performed in the Oracle NoSQL Handler. The Oracle NoSQL Handler does not support the Array, Map, and Record data types by default. To support them, you can implement a custom data converter that overrides the default data type conversion logic with your own custom logic for your use case. Contact Oracle Support for guidance. The following Oracle NoSQL data types are supported:
• Binary
• Boolean
• Double
• Float
• Integer
• Long
• Java String

Configuring the Oracle NoSQL Handler for batch mode provides better performance than the interactive mode. The batch processing mode provides an efficient and transactional mechanism for executing a sequence of operations associated with tables that share the same shard key portion of their primary keys. The efficiency results from the use of a single network interaction to accomplish the entire sequence of operations. All the operations specified in a batch are executed within the scope of a single transaction that effectively provides serializable isolation. 25.2.3 Operation Processing Support

The Oracle NoSQL Handler moves operations to Oracle NoSQL using the synchronous API. Insert, update, and delete operations are processed differently in Oracle NoSQL databases than in a traditional RDBMS. The following explains how insert, update, and delete operations are interpreted by the handler depending on the mode of operation:
• insert – If the row does not exist in your database, then an insert operation is processed as an insert. If the row exists, then an insert operation is processed as an update.
• update – If the row does not exist in your database, then an update operation is processed as an insert. If the row exists, then an update operation is processed as an update.
• delete – If the row does not exist in your database, then a delete operation has no effect. If the row exists, then a delete operation is processed as a delete.


The state of the data in Oracle NoSQL databases is eventually idempotent. You can replay the source trail files or replay sections of the trail files. Ultimately, the state of an Oracle NoSQL database is the same regardless of the number of times the trail data was written into Oracle NoSQL. Primary key values for a row in Oracle NoSQL databases are immutable. An update operation that changes any primary key value for an Oracle NoSQL row must be treated as a delete and insert. The Oracle NoSQL Handler can process update operations that result in the change of a primary key in an Oracle NoSQL database only as a delete and insert. To successfully process this operation, the source trail file must contain the complete before and after change data images for all columns.

Add Column Functionality
You can configure the Oracle NoSQL Handler to add columns that exist in the source trail file table definition but are missing in the Oracle NoSQL table definition. The Oracle NoSQL Handler can accommodate metadata change events that add a column. A reconciliation process occurs that reconciles the source table definition to the Oracle NoSQL table definition. When configured to add columns, any columns found in the source table definition that do not exist in the Oracle NoSQL table definition are added. The reconciliation process for a table occurs after application startup, the first time an operation for the table is encountered. The reconciliation process reoccurs after a metadata change event on a source table, when the first operation for the source table is encountered after the change event.

Drop Column Functionality
Similar to adding columns, you can configure the Oracle NoSQL Handler to drop columns. The Oracle NoSQL Handler can accommodate metadata change events that drop a column. A reconciliation process occurs that reconciles the source table definition to the Oracle NoSQL table definition. When configured to drop columns, any columns found in the Oracle NoSQL table definition that are not in the source table definition are dropped.

Caution:

Dropping a column is potentially dangerous because it is permanently removing data from an Oracle NoSQL Database. Carefully consider your use case before configuring dropping.

Primary key columns cannot be dropped. Column name changes are not handled well because there is no DDL processing. The Oracle NoSQL Handler can handle any case change for a column name. A column name change event on the source database appears to the handler like dropping an existing column and adding a new column.

First, the Oracle NoSQL Handler interrogates the target Oracle NoSQL database for the table definition. If the table does not exist, the Oracle NoSQL Handler does one of


two things. If gg.handler.name.ddlHandling includes CREATE, then a table is created in the database. Otherwise, the process abends and a message is logged that tells you which table does not exist. If the table exists in the Oracle NoSQL database, then the Oracle NoSQL Handler performs a reconciliation between the table definition from the source trail file and the table definition in the database. This reconciliation process searches for columns that exist in the source table definition but not in the corresponding database table definition. If it locates columns matching these criteria and the gg.handler.name.ddlHandling property includes ADD, then the Oracle NoSQL Handler alters the target table in the database to add the new columns. Otherwise, those columns are ignored. Next, the reconciliation process searches for columns that exist in the target Oracle NoSQL table but do not exist in the source table definition. If it locates columns matching these criteria and the gg.handler.name.ddlHandling property includes DROP, then the Oracle NoSQL Handler alters the target table in Oracle NoSQL to drop these columns. Otherwise, those columns are ignored.

The Oracle NoSQL Handler supports two authentication methods: basic and Kerberos. Both of these authentication methods use SSL as the transport mechanism to the KV Store. You must specify the relative or absolute path of the public trust file for SSL as part of the Oracle NoSQL Handler configuration in the Adapter properties file. The basic authentication mechanism tries to log in to the Oracle NoSQL database using the username and password specified as configuration parameters in the properties file. You can create a credential store for your Big Data environment in Oracle GoldenGate. After you create a credential store for your Big Data environment, you can add users to the store. To create a user, run this command in GGSCI:

ALTER CREDENTIALSTORE ADD USER userid PASSWORD password [ALIAS alias] [DOMAIN domain]

Where:
• userid is the user name. Only one instance of a user name can exist in the credential store unless the ALIAS or DOMAIN option is used.
• password is the user's password. The password is echoed (not obfuscated) when this option is used. If you don't use this option, then you are prompted for the password. The password is obfuscated as you type (recommended because it is more secure).
• alias is an alias for the user name. The alias substitutes for the credential in parameters and commands where a login credential is required. If you don't use the ALIAS option, the alias defaults to the user name.
The user created should have read and write access to the Oracle NoSQL database. For details about Oracle NoSQL user management, see https://docs.oracle.com/cd/NOSQL/html/SecurityGuide/config_auth.html.
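For example, the following GGSCI command adds a NoSQL user under an alias; the user name and alias shown here are hypothetical placeholders:

ALTER CREDENTIALSTORE ADD USER nosqluser ALIAS nosqlalias

Because no PASSWORD option is given, GGSCI prompts for the password and obfuscates it as you type. The alias can then be referenced from the handler username and password properties, as in the sample configuration later in this chapter.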

The only supported external login mechanism for Oracle NoSQL is Kerberos. The Kerberos authentication mechanism tries to log in to the Oracle NoSQL database using the Kerberos principal, realm, and keytab file. You specify these values as configuration parameters in the properties file. The handler first checks whether the security properties file is available to it for logging in to the Oracle NoSQL database as an administrator. If the user.security file is available to the handler, it logs in to the database as an administrator. If the security properties file is not available to the handler, it checks the authType, which can be basic or Kerberos. If the Oracle NoSQL store is configured to run with security disabled, then the handler allows access to the NoSQL store with authType set to none.

25.3 Oracle NoSQL Handler Configuration

You configure the Oracle NoSQL Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file). To enable the selection of the Oracle NoSQL Handler, you must first configure the handler type by specifying gg.handler.name.type=nosql and the other Oracle NoSQL properties as follows:

gg.handlerlist
Required. Legal values: any string. Default: none.
Provides a name for the Oracle NoSQL Handler. The Oracle NoSQL Handler name becomes part of the property names listed in this table.

gg.handler.name.type
Required. Legal values: nosql. Default: none.
Selects the Oracle NoSQL Handler for streaming change data capture into an Oracle NoSQL Database.

gg.handler.name.fullyQualifiedTableName
Optional. Legal values: true | false. Default: false.
The Oracle NoSQL Handler adds the schema name to the table name and stores it as a fully qualified table name in the NoSQL store.

gg.handler.name.mode
Optional. Legal values: op | tx. Default: op.
The default is recommended. In op mode, operations are processed as received. In tx mode, operations are cached and processed at transaction commit. The tx mode is slower and creates a larger memory footprint.

gg.handler.name.nosqlStore
Required. Legal values: any string. Default: none.
The name of the store. The name you specify must be identical to the name used when you installed the store.

gg.handler.name.nosqlURL
Required. Legal values: any string. Default: none.
The network name and the port information for the node currently belonging to the store.

gg.handler.name.interactiveMode
Optional. Legal values: true | false. Default: true.
The Oracle NoSQL Handler can operate in either interactive mode, where one operation is processed at a time, or batch mode, where a group of operations is processed together.

gg.handler.name.ddlHandling
Optional. Legal values: CREATE | ADD | DROP in any combination, with values delimited by a comma. Default: none.
Configures the Oracle NoSQL Handler for the DDL functionality to provide. Options include CREATE, ADD, and DROP. When CREATE is enabled, the handler creates tables in Oracle NoSQL if a corresponding table does not exist. When ADD is enabled, the handler adds columns that exist in the source table definition, but do not exist in the corresponding target Oracle NoSQL table definition. When DROP is enabled, the handler drops columns that exist in the Oracle NoSQL table definition, but do not exist in the corresponding source table definition.

gg.handler.name.retries
Optional. Legal values: any number. Default: 3.
The number of retries on any read or write exception that the Oracle NoSQL Handler encounters.

gg.handler.name.username
Optional. Legal values: a legal username string. Default: none.
A username for the connection to the Oracle NoSQL store. It is required if authType is set to basic.

gg.handler.name.password
Optional. Legal values: a legal password string. Default: none.
A password for the connection to the Oracle NoSQL store. It is required if authType is set to basic.

gg.handler.name.authType
Optional. Legal values: basic | kerberos | none. Default: none.
The authentication type to log in to the Oracle NoSQL store. If authType is set to basic, it needs a username and password to log in. If authType is set to kerberos, it needs a Kerberos principal, Kerberos realm, and a Kerberos keytab file location to log in.

gg.handler.name.securityPropertiesFile
Optional. Legal values: relative or absolute path to the security file. Default: none.
The security file enables the Oracle NoSQL Handler to have administrator access into the KV Store.

gg.handler.name.publicTrustFile
Optional. Legal values: relative or absolute path to the trust file. Default: none.
The public trust file to enable SSL transport.

gg.handler.name.kerberosKeyTabFile
Optional. Legal values: relative or absolute path to the Kerberos keytab file. Default: none.
The keytab file allows the Oracle NoSQL Handler to access a password to perform a kinit operation for Kerberos security.

gg.handler.name.kerberosPrincipal
Optional. Legal values: a legal Kerberos principal name, such as user/FQDN@MY.REALM. Default: none.
The Kerberos principal name for Kerberos authentication.

gg.handler.name.kerberosRealm
Optional. Legal values: a Kerberos realm name. Default: none.
The Kerberos realm name for Kerberos authentication.

gg.handler.name.dataConverterClass
Optional. Legal values: the fully qualified data converter class name. Default: the default DataConverter.
The custom data converter can be implemented to override the default data conversion logic to support your specific use case.

25.4 Review a Sample Configuration

The following excerpt shows a sample configuration for the Oracle NoSQL Handler as it appears in the Java adapter properties file:

gg.handlerlist=nosql

#The handler properties
gg.handler.nosql.type=nosql
gg.handler.nosql.mode=op
gg.handler.nosql.nosqlStore=kvstore
gg.handler.nosql.nosqlURL=localhost:5000
gg.handler.nosql.ddlHandling=CREATE,ADD,DROP
gg.handler.nosql.interactiveMode=true
gg.handler.nosql.retries=2
gg.handler.nosql.authType=basic
gg.handler.nosql.username=ORACLEWALLETUSERNAME[myalias mydomain]
gg.handler.nosql.password=ORACLEWALLETPASSWORD[myalias mydomain]
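If the KV Store is secured with Kerberos rather than basic authentication, the same handler can be configured along the following lines. This is a sketch only; the principal, realm, and file paths are hypothetical and must match your Kerberos setup:

gg.handler.nosql.authType=kerberos
gg.handler.nosql.kerberosPrincipal=nosqluser/host.example.com@EXAMPLE.REALM
gg.handler.nosql.kerberosRealm=EXAMPLE.REALM
gg.handler.nosql.kerberosKeyTabFile=/path/to/nosqluser.keytab
gg.handler.nosql.publicTrustFile=/path/to/client.trust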

25.5 Performance Considerations

Configuring the Oracle NoSQL Handler for batch mode provides better performance than interactive mode. The batch processing mode provides an efficient and transactional mechanism for executing a sequence of operations associated with tables that share the same shard key portion of their primary keys. The efficiency results from the use of a single network interaction to accomplish the entire sequence of operations. All the operations specified in a batch are executed within the scope of a single transaction that effectively provides serializable isolation.
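Batch mode is selected by turning off interactive mode on the handler. A minimal sketch, again assuming a handler named nosql:

gg.handler.nosql.interactiveMode=false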

25.6 Full Image Data Requirements

In Oracle NoSQL, update operations perform a complete reinsertion of the data for the entire row. This Oracle NoSQL feature improves ingest performance, but in turn levies a critical requirement: updates must include data for all columns, also known as full image updates. Partial image updates (updates that contain only the primary key information and data for the columns that changed) are not supported. Using the Oracle NoSQL Handler with partial image update information results in incomplete data in the target NoSQL table.

26 Using the Pluggable Formatters

The pluggable formatters are used to convert operations from the Oracle GoldenGate trail file into formatted messages that you can send to Big Data targets using one of the Oracle GoldenGate for Big Data Handlers. This chapter describes how to use the pluggable formatters.
• Using the Avro Formatter
Apache Avro is an open source data serialization and deserialization framework known for its flexibility, compactness of serialized data, and good serialization and deserialization performance. Apache Avro is commonly used in Big Data applications.
• Using the Delimited Text Formatter
• Using the JSON Formatter
• Using the Length Delimited Value Formatter
The Length Delimited Value (LDV) Formatter is a row-based formatter. It formats database operations from the source trail file into a length delimited value output. Each insert, update, delete, or truncate operation from the source trail is formatted into an individual length delimited message.
• Using Operation-Based versus Row-Based Formatting
The Oracle GoldenGate for Big Data formatters include operation-based and row-based formatters.
• Using the XML Formatter
The XML Formatter formats before-image and after-image data from the source trail file into an XML document representation of the operation data. The format of the XML document is effectively the same as the XML format in the previous releases of the Oracle GoldenGate Java Adapter.

26.1 Using the Avro Formatter

Apache Avro is an open source data serialization and deserialization framework known for its flexibility, compactness of serialized data, and good serialization and deserialization performance. Apache Avro is commonly used in Big Data applications.
• Avro Row Formatter
• The Avro Operation Formatter
• Avro Object Container File Formatter
• Setting Metacolumn Output

26.1.1 Avro Row Formatter

The Avro Row Formatter formats operation data from the source trail file into messages in an Avro binary array format. Each individual insert, update, delete, and truncate operation is formatted into an individual Avro message. The source trail file contains the before and after images of the operation data. The Avro Row Formatter takes the before-image and after-image data and formats it into an Avro binary representation of the operation data. The Avro Row Formatter formats operations from the source trail file into a format that represents the row data. This format is more compact than the output from the Avro Operation Formatter, for which the Avro messages model the change data operation. The Avro Row Formatter may be a good choice when streaming Avro data to HDFS. Hive supports data files in HDFS in an Avro format. This section contains the following topics:
• Operation Metadata Formatting Details
• Operation Data Formatting Details
• Sample Avro Row Messages
• Avro Schemas
Avro uses JSON to represent schemas. Avro schemas define the format of generated Avro messages and are required to serialize and deserialize Avro messages.
• Avro Row Configuration Properties
• Review a Sample Configuration
• Metadata Change Events
• Special Considerations

26.1.1.1 Operation Metadata Formatting Details

To output the metacolumns, configure the following:

gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}

To also include the primary key columns and the tokens, configure as follows:

gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}

For more information, see the configuration property gg.handler.name.format.metaColumnsTemplate.

Table 26-1 Avro Formatter Metadata

table
The fully qualified table name in the format: CATALOG_NAME.SCHEMA_NAME.TABLE_NAME.

op_type
The type of database operation from the source trail file. Default values are I for insert, U for update, D for delete, and T for truncate.

op_ts
The timestamp of the operation from the source trail file. Because this timestamp is from the source trail, it is fixed. Replaying the trail file results in the same timestamp for the same operation.

current_ts
The time when the formatter processed the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file does not result in the same timestamp for the same operation.

pos
The concatenated sequence number and RBA number from the source trail file. This trail position lets you trace the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file.

primary_keys
An array variable that holds the column names of the primary keys of the source table.

tokens
A map variable that holds the token key value pairs from the source trail file.

26.1.1.2 Operation Data Formatting Details

The operation data follows the operation metadata. This data is represented as individual fields identified by the column names. Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. Avro attributes only support two states: the column has a value or the column value is null. Missing column values are handled the same as null values. Oracle recommends that when you use the Avro Row Formatter, you configure the Oracle GoldenGate capture process to provide full image data for all columns in the source trail file. By default, the Avro Row Formatter maps the data types from the source trail file to the associated Avro data type. Because Avro provides limited support for data types, source columns map into Avro long, double, float, binary, or string data types. You can also configure data type mapping to handle all data as strings.
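If type fidelity is less important than simple downstream handling, the mapping can be switched so that every column is emitted as a string. A minimal sketch, assuming the handler is named hdfs as in the sample configuration later in this section:

gg.handler.hdfs.format.treatAllColumnsAsStrings=true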

26.1.1.3 Sample Avro Row Messages

Because Avro messages are binary, they are not human readable. The following sample messages show the JSON representation of the messages.
• Sample Insert Message
• Sample Update Message
• Sample Delete Message
• Sample Truncate Message

26.1.1.3.1 Sample Insert Message

{"table": "GG.TCUSTORD", "op_type": "I", "op_ts": "2013-06-02 22:14:36.000000", "current_ts": "2015-09-18T10:13:11.172000",
"pos": "00000000000000001444", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAA"}, "CUST_CODE": "WILL", "ORDER_DATE": "1994-09-30:15:33:00", "PRODUCT_CODE": "CAR", "ORDER_ID": "144", "PRODUCT_PRICE": 17520.0, "PRODUCT_AMOUNT": 3.0, "TRANSACTION_ID": "100"}

26.1.1.3.2 Sample Update Message

{"table": "GG.TCUSTORD", "op_type": "U", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:13:11.492000", "pos": "00000000000000002891", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqLzAAA"}, "CUST_CODE": "BILL", "ORDER_DATE": "1995-12-31:15:00:00", "PRODUCT_CODE": "CAR", "ORDER_ID": "765", "PRODUCT_PRICE": 14000.0, "PRODUCT_AMOUNT": 3.0, "TRANSACTION_ID": "100"}

26.1.1.3.3 Sample Delete Message

{"table": "GG.TCUSTORD", "op_type": "D", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:13:11.512000", "pos": "00000000000000004338", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"L": "206080450", "6": "9.0.80330", "R": "AADPkvAAEAAEqLzAAC"}, "CUST_CODE": "DAVE", "ORDER_DATE": "1993-11-03:07:51:35", "PRODUCT_CODE": "PLANE", "ORDER_ID": "600", "PRODUCT_PRICE": null, "PRODUCT_AMOUNT": null, "TRANSACTION_ID": null}

26.1.1.3.4 Sample Truncate Message

{"table": "GG.TCUSTORD", "op_type": "T", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:13:11.514000", "pos": "00000000000000004515", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAB"}, "CUST_CODE": null, "ORDER_DATE": null, "PRODUCT_CODE": null,
"ORDER_ID": null, "PRODUCT_PRICE": null, "PRODUCT_AMOUNT": null, "TRANSACTION_ID": null}

26.1.1.4 Avro Schemas

Avro uses JSON to represent schemas. Avro schemas define the format of generated Avro messages and are required to serialize and deserialize Avro messages. Schemas are generated on a just-in-time basis when the first operation for a table is encountered. Newer schemas are generated when there is a change in the metadata. The generated Avro schemas are specific to a table definition, and therefore, a separate Avro schema is generated for every table encountered for processed operations. By default, Avro schemas are written to the GoldenGate_Home/dirdef directory, although the write location is configurable. Avro schema file names adhere to the following naming convention: Fully_Qualified_Table_Name.avsc.

The following is a sample Avro schema for the Avro Row Format for the referenced examples in the previous section:

{ "type" : "record", "name" : "TCUSTORD", "namespace" : "GG", "fields" : [ { "name" : "table", "type" : "string" }, { "name" : "op_type", "type" : "string" }, { "name" : "op_ts", "type" : "string" }, { "name" : "current_ts", "type" : "string" }, { "name" : "pos", "type" : "string" }, { "name" : "primary_keys", "type" : { "type" : "array", "items" : "string" } }, { "name" : "tokens", "type" : { "type" : "map", "values" : "string" }, "default" : { } }, { "name" : "CUST_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_DATE", "type" : [ "null", "string" ],

"default" : null }, { "name" : "PRODUCT_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_ID", "type" : [ "null", "string" ], "default" : null }, { "name" : "PRODUCT_PRICE", "type" : [ "null", "double" ], "default" : null }, { "name" : "PRODUCT_AMOUNT", "type" : [ "null", "double" ], "default" : null }, { "name" : "TRANSACTION_ID", "type" : [ "null", "string" ], "default" : null } ] }

26.1.1.5 Avro Row Configuration Properties

Table 26-2 Avro Row Configuration Properties

gg.handler.name.format.insertOpKey
Optional. Legal values: any string. Default: I.
Indicator to be inserted into the output record to indicate an insert operation.

gg.handler.name.format.updateOpKey
Optional. Legal values: any string. Default: U.
Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.deleteOpKey
Optional. Legal values: any string. Default: D.
Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.truncateOpKey
Optional. Legal values: any string. Default: T.
Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.encoding
Optional. Legal values: any legal encoding name or alias supported by Java. Default: UTF-8 (the JSON default).
Controls the output encoding of the generated JSON Avro schema. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding.

gg.handler.name.format.treatAllColumnsAsStrings
Optional. Legal values: true | false. Default: false.
Controls the output typing of generated Avro messages. If set to false, then the formatter attempts to map Oracle GoldenGate types to the corresponding Avro type. If set to true, then all data is treated as strings in the generated Avro messages and schemas.

gg.handler.name.format.pkUpdateHandling
Optional. Legal values: abend | update | delete-insert. Default: abend.
Specifies how the formatter handles update operations that change a primary key. Primary key operations for the Avro Row formatter require special consideration.
• abend: the process terminates.
• update: the process handles the update as a normal update.
• delete-insert: the process handles the update as a delete and an insert. Full supplemental logging must be enabled. Without full before and after row images, the insert data will be incomplete.

gg.handler.name.format.lineDelimiter
Optional. Legal values: any string. Default: no value.
Inserts a delimiter after each Avro message. This is not a best practice, but in certain cases you may want to parse a stream of data and extract individual Avro messages from the stream. Select a unique delimiter that cannot occur in any Avro message. This property supports CDATA[] wrapping.

gg.handler.name.format.versionSchemas
Optional. Legal values: true | false. Default: false.
Avro schemas always follow the fully_qualified_table_name.avsc convention. Setting this property to true creates an additional Avro schema named fully_qualified_table_name_current_timestamp.avsc in the schema directory. Because the additional Avro schema is not destroyed or removed, it provides a history of schema evolution.

gg.handler.name.format.wrapMessageInGenericAvroMessage
Optional. Legal values: true | false. Default: false.
Wraps the Avro messages for operations from the source trail file in a generic Avro wrapper message. For more information, see Generic Wrapper Functionality.

gg.handler.name.format.schemaDirectory
Optional. Legal values: any legal, existing file system path. Default: ./dirdef.
The output location of generated Avro schemas.

gg.handler.name.schemaFilePath
Optional. Default: ./dirdef.
The directory in HDFS where schemas are output. A metadata change overwrites the schema during the next operation for the associated table. Schemas follow the same naming convention as schemas written to the local file system: catalog.schema.table.avsc.

gg.handler.name.format.iso8601Format
Optional. Legal values: true | false. Default: true.
The format of the current timestamp. The default is the ISO 8601 format. A setting of false removes the T between the date and time in the current timestamp, which outputs a space instead.

gg.handler.name.format.includeIsMissingFields
Optional. Legal values: true | false. Default: false.
Set to true to include a {column_name}_isMissing boolean field for each source field. This field allows downstream applications to differentiate whether a null value is null in the source trail file (value is false) or is missing in the source trail file (value is true).

gg.handler.name.format.enableDecimalLogicalType
Optional. Legal values: true | false. Default: false.
Enables the use of Avro decimal logical types. The decimal logical type represents numbers as a byte array and can provide support for much larger numbers than can fit in the classic 64-bit long or double data types.

gg.handler.name.format.oracleNumberScale
Optional. Legal values: any integer from 0 to 38. Default: none.
Allows you to set the scale on the Avro decimal data type. Only applicable when you set enableDecimalLogicalType=true. The Oracle NUMBER is a proprietary numeric data type of Oracle Database that supports variable precision and scale. Precision and scale are variable per instance of the Oracle NUMBER data type. Precision and scale are required parameters when generating the Avro decimal logical type. This makes mapping of Oracle NUMBER data types into Avro difficult, because there is no way to deterministically know the precision and scale of an Oracle NUMBER data type when the Avro schema is generated. The best alternative is to generate a large Avro decimal data type with a precision of 164 and a scale of 38, which should hold any legal instance of Oracle NUMBER. While this solves the problem of precision loss when converting Oracle NUMBER data types to Avro decimal data types, you may not like that Avro decimal data types, when retrieved from Avro messages downstream, have 38 digits trailing the decimal point.

gg.handler.name.format.enableTimestampLogicalType
Optional. Legal values: true | false. Default: false.
Set to true to map source date and time data types into the Avro TimestampMicros logical data type. The variable gg.format.timestamp must be configured to provide a mask for the source date and time data types to make sense of them. The Avro TimestampMicros is part of the Avro 1.8 specification.

gg.handler.name.format.mapLargeNumbersAsStrings
Optional. Legal values: true | false. Default: false.
Oracle GoldenGate supports the floating point and integer source datatypes. Some of these datatypes may not fit into the Avro primitive double or long datatypes. Set this property to true to map the fields that do not fit into the Avro primitive double or long datatypes to Avro string.
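For example, to preserve Oracle NUMBER values using the Avro decimal logical type with a fixed scale, a configuration along the following lines could be used for the handler named hdfs from the sample below; the scale value of 10 is illustrative only:

gg.handler.hdfs.format.enableDecimalLogicalType=true
gg.handler.hdfs.format.oracleNumberScale=10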

26.1.1.6 Review a Sample Configuration

The following is a sample configuration for the Avro Row Formatter in the Java Adapter properties file:

gg.handler.hdfs.format=avro_row
gg.handler.hdfs.format.insertOpKey=I
gg.handler.hdfs.format.updateOpKey=U
gg.handler.hdfs.format.deleteOpKey=D
gg.handler.hdfs.format.truncateOpKey=T
gg.handler.hdfs.format.encoding=UTF-8
gg.handler.hdfs.format.pkUpdateHandling=abend
gg.handler.hdfs.format.wrapMessageInGenericAvroMessage=false


26.1.1.7 Metadata Change Events

If the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events, the Avro Row Formatter can take action when metadata changes. Because Avro messages depend closely on their corresponding schema, metadata changes are important when you use Avro formatting. An updated Avro schema is generated as soon as a table operation occurs after a metadata change event. You must understand the impact of a metadata change event and change downstream targets to the new Avro schema. The tight dependency of Avro messages to Avro schemas may result in compatibility issues. Avro messages generated before the schema change may not be able to be deserialized with the newly generated Avro schema. Conversely, Avro messages generated after the schema change may not be able to be deserialized with the previous Avro schema. It is a best practice to use the same version of the Avro schema that was used to generate the message. For more information, consult the Apache Avro documentation. 26.1.1.8 Special Considerations

This section describes these special considerations:
• Troubleshooting
• Primary Key Updates
• Generic Wrapper Functionality

26.1.1.8.1 Troubleshooting

Because Avro is a binary format, it is not human readable, which makes issues difficult to debug. To help, the Avro Row Formatter provides a special feature: when the log4j Java logging level is set to TRACE, Avro messages are deserialized and displayed in the log file as a JSON object, letting you view the structure and contents of the created Avro messages. Do not enable TRACE in a production environment, as it has a substantial negative impact on performance. To troubleshoot content, you may want to consider switching to a formatter that produces human-readable content. The XML and JSON formatters both produce content in a human-readable format.
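One way the logging level is commonly raised is in the Java Adapter properties file. The exact logging configuration can vary by installation, so treat this single line as a sketch and verify it against your own properties file:

gg.log.level=TRACE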

26.1.1.8.2 Primary Key Updates

In Big Data integrations, primary key update operations require special consideration and planning. Primary key updates modify one or more of the primary keys of a given row in the source database. Because data is appended in Big Data applications, without special handling a primary key update operation looks more like a new insert than an update. You can use the following properties to configure the Avro Row Formatter to handle primary keys:

Table 26-3 Configurable behavior

abend
The formatter terminates. This is the default behavior.

update
With this configuration, the primary key update is treated like any other update operation. Use this configuration only if you can guarantee that the primary key is not used as selection criteria for row data from a Big Data system.

delete-insert
The primary key update is treated as a special case of a delete, using the before-image data, and an insert, using the after-image data. This configuration may more accurately model the effect of a primary key update in a Big Data application. However, if this configuration is selected, it is important to have full supplemental logging enabled at the source database. Without full supplemental logging, the delete operation will be correct, but the insert operation will not contain all of the data for all of the columns for a full representation of the row data in the Big Data application.

26.1.1.8.3 Generic Wrapper Functionality

Because Avro messages are not self describing, the receiver of the message must know the schema associated with the message before the message can be deserialized. Avro messages are binary and provide no consistent or reliable way to inspect the message contents in order to ascertain the message type. Therefore, Avro can be troublesome when messages are interlaced into a single stream of data such as Kafka. The Avro formatter provides a special feature to wrap the Avro message in a generic Avro message. You can enable this functionality by setting the following configuration property.

gg.handler.name.format.wrapMessageInGenericAvroMessage=true

The generic message is an Avro message, common to all Avro messages that are output, that wraps the Avro payload message. The schema for the generic message is named generic_wrapper.avsc and is written to the output schema directory. This message has the following three fields:
• table_name: The fully qualified source table name.
• schema_fingerprint: The fingerprint of the Avro schema of the wrapped message. The fingerprint is generated using the Avro SchemaNormalization.parsingFingerprint64(schema) call.
• payload: The wrapped Avro message.
The following is the Avro Formatter generic wrapper schema.

{ "type" : "record", "name" : "generic_wrapper", "namespace" : "oracle.goldengate", "fields" : [ { "name" : "table_name", "type" : "string" }, { "name" : "schema_fingerprint",
"type" : "long" }, { "name" : "payload", "type" : "bytes" } ] }

26.1.2 The Avro Operation Formatter

The Avro Operation Formatter formats operation data from the source trail file into messages in an Avro binary array format. Each individual insert, update, delete, and truncate operation is formatted into an individual Avro message. The source trail file contains the before and after images of the operation data. The Avro Operation Formatter formats this data into an Avro binary representation of the operation data. This format is more verbose than the output of the Avro Row Formatter for which the Avro messages model the row data.

• Operation Metadata Formatting Details
• Operation Data Formatting Details
• Sample Avro Operation Messages
• Avro Schema
• Avro Operation Formatter Configuration Properties
• Review a Sample Configuration
• Metadata Change Events
• Special Considerations

26.1.2.1 Operation Metadata Formatting Details

To output the metacolumns, configure the following:

gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}

To also include the primary key columns and the tokens, configure as follows:

gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}

For more information, see the configuration property gg.handler.name.format.metaColumnsTemplate.

Table 26-4 Avro Messages and its Metadata

table
The fully qualified table name, in the format: CATALOG_NAME.SCHEMA_NAME.TABLE_NAME.

op_type
The type of database operation from the source trail file. Default values are I for insert, U for update, D for delete, and T for truncate.

op_ts
The timestamp of the operation from the source trail file. Because this timestamp is from the source trail, it is fixed. Replaying the trail file results in the same timestamp for the same operation.

current_ts
The time when the formatter processed the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file does not result in the same timestamp for the same operation.

pos
The concatenated sequence number and RBA number from the source trail file. The trail position provides traceability of the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file.

primary_keys
An array variable that holds the column names of the primary keys of the source table.

tokens
A map variable that holds the token key value pairs from the source trail file.

26.1.2.2 Operation Data Formatting Details

The operation data is represented as individual fields identified by the column names. Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. Avro attributes only support two states: the column has a value or the column value is null. The Avro Operation Formatter contains an additional Boolean field COLUMN_NAME_isMissing for each column to indicate whether the column value is missing or not. Using the COLUMN_NAME field together with the COLUMN_NAME_isMissing field, all three states can be defined.
• State 1: The column has a value. The COLUMN_NAME field has a value and the COLUMN_NAME_isMissing field is false.
• State 2: The column value is null. The COLUMN_NAME field value is null and the COLUMN_NAME_isMissing field is false.
• State 3: The column value is missing. The COLUMN_NAME field value is null and the COLUMN_NAME_isMissing field is true.
By default, the Avro Operation Formatter maps the data types from the source trail file to the associated Avro data type. Because Avro supports few data types, this functionality usually results in the mapping of numeric fields from the source trail file to members typed as numbers. You can also configure this data type mapping to handle all data as strings.


26.1.2.3 Sample Avro Operation Messages

Because Avro messages are binary, they are not human readable. The following topics show example Avro messages in JSON format:
• Sample Insert Message
• Sample Update Message
• Sample Delete Message
• Sample Truncate Message

26.1.2.3.1 Sample Insert Message

{"table": "GG.TCUSTORD", "op_type": "I", "op_ts": "2013-06-02 22:14:36.000000", "current_ts": "2015-09-18T10:17:49.570000", "pos": "00000000000000001444", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAA"}, "before": null, "after": { "CUST_CODE": "WILL", "CUST_CODE_isMissing": false, "ORDER_DATE": "1994-09-30:15:33:00", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "CAR", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "144", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": 17520.0, "PRODUCT_PRICE_isMissing": false, "PRODUCT_AMOUNT": 3.0, "PRODUCT_AMOUNT_isMissing": false, "TRANSACTION_ID": "100", "TRANSACTION_ID_isMissing": false}}

26.1.2.3.2 Sample Update Message

{"table": "GG.TCUSTORD", "op_type": "U", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:17:49.880000", "pos": "00000000000000002891", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqLzAAA"}, "before": { "CUST_CODE": "BILL", "CUST_CODE_isMissing": false, "ORDER_DATE": "1995-12-31:15:00:00", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "CAR", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "765", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": 15000.0, "PRODUCT_PRICE_isMissing": false, "PRODUCT_AMOUNT": 3.0,


"PRODUCT_AMOUNT_isMissing": false, "TRANSACTION_ID": "100", "TRANSACTION_ID_isMissing": false}, "after": { "CUST_CODE": "BILL", "CUST_CODE_isMissing": false, "ORDER_DATE": "1995-12-31:15:00:00", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "CAR", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "765", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": 14000.0, "PRODUCT_PRICE_isMissing": false, "PRODUCT_AMOUNT": 3.0, "PRODUCT_AMOUNT_isMissing": false, "TRANSACTION_ID": "100", "TRANSACTION_ID_isMissing": false}}

26.1.2.3.3 Sample Delete Message

{"table": "GG.TCUSTORD", "op_type": "D", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:17:49.899000", "pos": "00000000000000004338", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"L": "206080450", "6": "9.0.80330", "R": "AADPkvAAEAAEqLzAAC"}, "before": { "CUST_CODE": "DAVE", "CUST_CODE_isMissing": false, "ORDER_DATE": "1993-11-03:07:51:35", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "PLANE", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "600", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": null, "PRODUCT_PRICE_isMissing": true, "PRODUCT_AMOUNT": null, "PRODUCT_AMOUNT_isMissing": true, "TRANSACTION_ID": null, "TRANSACTION_ID_isMissing": true}, "after": null}

26.1.2.3.4 Sample Truncate Message

{"table": "GG.TCUSTORD", "op_type": "T", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:17:49.900000", "pos": "00000000000000004515", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAB"}, "before": null, "after": null}


26.1.2.4 Avro Schema

Avro schemas are represented as JSON. Avro schemas define the format of generated Avro messages and are required to serialize and deserialize Avro messages. Avro schemas are generated on a just-in-time basis when the first operation for a table is encountered. Because Avro schemas are specific to a table definition, a separate Avro schema is generated for every table encountered for processed operations. By default, Avro schemas are written to the GoldenGate_Home/dirdef directory, although the write location is configurable. Avro schema file names adhere to the following naming convention: Fully_Qualified_Table_Name.avsc.

The following is a sample Avro schema for the Avro Operation Format for the samples in the preceding sections:

{ "type" : "record", "name" : "TCUSTORD", "namespace" : "GG", "fields" : [ { "name" : "table", "type" : "string" }, { "name" : "op_type", "type" : "string" }, { "name" : "op_ts", "type" : "string" }, { "name" : "current_ts", "type" : "string" }, { "name" : "pos", "type" : "string" }, { "name" : "primary_keys", "type" : { "type" : "array", "items" : "string" } }, { "name" : "tokens", "type" : { "type" : "map", "values" : "string" }, "default" : { } }, { "name" : "before", "type" : [ "null", { "type" : "record", "name" : "columns", "fields" : [ { "name" : "CUST_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "CUST_CODE_isMissing", "type" : "boolean" }, {


"name" : "ORDER_DATE", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_DATE_isMissing", "type" : "boolean" }, { "name" : "PRODUCT_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "PRODUCT_CODE_isMissing", "type" : "boolean" }, { "name" : "ORDER_ID", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_ID_isMissing", "type" : "boolean" }, { "name" : "PRODUCT_PRICE", "type" : [ "null", "double" ], "default" : null }, { "name" : "PRODUCT_PRICE_isMissing", "type" : "boolean" }, { "name" : "PRODUCT_AMOUNT", "type" : [ "null", "double" ], "default" : null }, { "name" : "PRODUCT_AMOUNT_isMissing", "type" : "boolean" }, { "name" : "TRANSACTION_ID", "type" : [ "null", "string" ], "default" : null }, { "name" : "TRANSACTION_ID_isMissing", "type" : "boolean" } ] } ], "default" : null }, { "name" : "after", "type" : [ "null", "columns" ], "default" : null } ] }


26.1.2.5 Avro Operation Formatter Configuration Properties

Table 26-5 Configuration Properties

gg.handler.name.format.insertOpKey
Optional. Legal values: any string. Default: I.
Indicator to be inserted into the output record to indicate an insert operation.

gg.handler.name.format.updateOpKey
Optional. Legal values: any string. Default: U.
Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.deleteOpKey
Optional. Legal values: any string. Default: D.
Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.truncateOpKey
Optional. Legal values: any string. Default: T.
Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.encoding
Optional. Legal values: any legal encoding name or alias supported by Java. Default: UTF-8 (the JSON default).
Controls the output encoding of the generated JSON Avro schema. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding.

gg.handler.name.format.treatAllColumnsAsStrings
Optional. Legal values: true | false. Default: false.
Controls the output typing of generated Avro messages. If set to false, then the formatter attempts to map Oracle GoldenGate types to the corresponding Avro type. If set to true, then all data is treated as strings in the generated Avro messages and schemas.

gg.handler.name.format.lineDelimiter
Optional. Legal values: any string. Default: no value.
Inserts a delimiter after each Avro message. This is not a best practice, but in certain cases you may want to parse a stream of data and extract individual Avro messages from the stream; use this property to help. Select a unique delimiter that cannot occur in any Avro message. This property supports CDATA[] wrapping.

gg.handler.name.format.schemaDirectory
Optional. Legal values: any legal, existing file system path. Default: ./dirdef.
The output location of generated Avro schemas.

gg.handler.name.format.wrapMessageInGenericAvroMessage
Optional. Legal values: true | false. Default: false.
Wraps Avro messages for operations from the source trail file in a generic Avro wrapper message. For more information, see Generic Wrapper Functionality.

gg.handler.name.format.iso8601Format
Optional. Legal values: true | false. Default: true.
The format of the current timestamp. The default is the ISO 8601 format. Set to false to remove the T between the date and time in the current timestamp, which outputs a space instead.

gg.handler.name.includeIsMissingFields
Optional. Legal values: true | false. Default: false.
Set to true to include a {column_name}_isMissing boolean field for each source field. This field allows downstream applications to differentiate whether a null value is null in the source trail file (value is false) or is missing in the source trail file (value is true).

gg.handler.name.oracleNumberScale
Optional. Legal values: any integer from 0 to 38. Default: none.
Allows you to set the scale on the Avro decimal data type. Only applicable when you set enableDecimalLogicalType=true. The Oracle NUMBER is a proprietary numeric data type of Oracle Database that supports variable precision and scale. Precision and scale are variable per instance of the Oracle NUMBER data type. Precision and scale are required parameters when generating the Avro decimal logical type. This makes mapping of Oracle NUMBER data types into Avro difficult, because there is no way to deterministically know the precision and scale of an Oracle NUMBER data type when the Avro schema is generated. The best alternative is to generate a large Avro decimal data type with a precision of 164 and a scale of 38, which should hold any legal instance of Oracle NUMBER. While this solves the problem of precision loss when converting Oracle NUMBER data types to Avro decimal data types, you may not like that Avro decimal data types, when retrieved from Avro messages downstream, have 38 digits trailing the decimal point.

gg.handler.name.enableTimestampLogicalType
Optional. Legal values: true | false. Default: false.
Set to true to map source date and time data types into the Avro TimestampMicros logical data type. The variable gg.format.timestamp must be configured to provide a mask for the source date and time data types to make sense of them. The Avro TimestampMicros is part of the Avro 1.8 specification.

gg.handler.name.enableDecimalLogicalType
Optional. Legal values: true | false. Default: false.
Enables the use of Avro decimal logical types. The decimal logical type represents numbers as a byte array and can provide support for much larger numbers than can fit in the classic 64-bit long or double data types.

gg.handler.name.format.mapLargeNumbersAsStrings
Optional. Legal values: true | false. Default: false.
Oracle GoldenGate supports the floating point and integer source datatypes. Some of these datatypes may not fit into the Avro primitive double or long datatypes. Set this property to true to map the fields that do not fit into the Avro primitive double or long datatypes to Avro string.

26.1.2.6 Review a Sample Configuration

The following is a sample configuration for the Avro Operation Formatter in the Java Adapter properties file:

gg.handler.hdfs.format=avro_op
gg.handler.hdfs.format.insertOpKey=I
gg.handler.hdfs.format.updateOpKey=U
gg.handler.hdfs.format.deleteOpKey=D
gg.handler.hdfs.format.truncateOpKey=T
gg.handler.hdfs.format.encoding=UTF-8
gg.handler.hdfs.format.wrapMessageInGenericAvroMessage=false

26.1.2.7 Metadata Change Events

If the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events, the Avro Operation Formatter can take action when metadata changes. Because Avro messages depend closely on their corresponding schema, metadata changes are important when you use Avro formatting. An updated Avro schema is generated as soon as a table operation occurs after a metadata change event. You must understand the impact of a metadata change event and change downstream targets to the new Avro schema. The tight dependency of Avro messages to Avro schemas may result in compatibility issues. Avro messages generated before the schema change may not be able to be deserialized with the newly generated Avro schema. Conversely, Avro messages generated after the schema change may not be able to be deserialized with the previous Avro schema. It is a best practice to use the same version of the Avro schema that was used to generate the message. For more information, consult the Apache Avro documentation.

26.1.2.8 Special Considerations

This section describes these special considerations:
• Troubleshooting
• Primary Key Updates
• Generic Wrapper Message

26.1.2.8.1 Troubleshooting

Because Avro is a binary format, it is not human readable. However, when the log4j Java logging level is set to TRACE, Avro messages are deserialized and displayed in the log file as a JSON object, letting you view the structure and contents of the created Avro messages. Do not enable TRACE in a production environment, as it has a substantial impact on performance.

26.1.2.8.2 Primary Key Updates

The Avro Operation Formatter creates messages with complete data of before-image and after-images for update operations. Therefore, the Avro Operation Formatter requires no special treatment for primary key updates.

26.1.2.8.3 Generic Wrapper Message

Because Avro messages are not self describing, the receiver of the message must know the schema associated with the message before the message can be deserialized. Avro messages are binary and provide no consistent or reliable way to inspect the message contents in order to ascertain the message type. Therefore, Avro can be troublesome when messages are interlaced into a single stream of data such as Kafka. The Avro formatter provides a special feature to wrap the Avro message in a generic Avro message. You can enable this functionality by setting the following configuration property:

gg.handler.name.format.wrapMessageInGenericAvroMessage=true

The generic message is an Avro message, common to all Avro messages that are output, that wraps the Avro payload message. The schema for the generic message is named generic_wrapper.avsc and is written to the output schema directory. This message has the following three fields:
• table_name: The fully qualified source table name.
• schema_fingerprint: The fingerprint of the Avro schema generating the messages. The fingerprint is generated using the parsingFingerprint64(Schema s) method on the org.apache.avro.SchemaNormalization class.
• payload: The wrapped Avro message.
The following is the Avro Formatter generic wrapper schema:

{ "type" : "record", "name" : "generic_wrapper", "namespace" : "oracle.goldengate", "fields" : [ { "name" : "table_name", "type" : "string" }, { "name" : "schema_fingerprint", "type" : "long" }, { "name" : "payload", "type" : "bytes" } ] }

26.1.3 Avro Object Container File Formatter

Oracle GoldenGate for Big Data can write to HDFS in Avro Object Container File (OCF) format. Avro OCF handles schema evolution more efficiently than other formats. The Avro OCF Formatter also supports compression and decompression to allow more efficient use of disk space. The HDFS Handler integrates with the Avro formatters to write files to HDFS in Avro OCF format. The Avro OCF format is required for Hive to read Avro data in HDFS. The Avro OCF format is detailed in the Avro specification, see http://avro.apache.org/docs/current/spec.html#Object+Container+Files. You can configure the HDFS Handler to stream data in Avro OCF format, generate table definitions in Hive, and update table definitions in Hive in the case of a metadata change event.
• Avro OCF Formatter Configuration Properties
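As a brief illustration, an HDFS Handler (assumed here to be named hdfs) is typically pointed at the OCF row output through its format property. Treat this as a sketch and confirm the exact format value against the HDFS Handler documentation for your release:

gg.handler.hdfs.format=avro_row_ocf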

26.1.3.1 Avro OCF Formatter Configuration Properties

gg.handler.name.format.insertOpKey
Optional. Legal values: any string. Default: I.
Indicator to be inserted into the output record to indicate an insert operation.

gg.handler.name.format.updateOpKey
Optional. Legal values: any string. Default: U.
Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.truncateOpKey
Optional. Legal values: any string. Default: T.
Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.deleteOpKey
Optional. Legal values: any string. Default: D.
Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.encoding
Optional. Legal values: any legal encoding name or alias supported by Java. Default: UTF-8.
Controls the output encoding of the generated JSON Avro schema. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding.

gg.handler.name.format.treatAllColumnsAsStrings
Optional. Legal values: true | false. Default: false.
Controls the output typing of generated Avro messages. When the setting is false, the formatter attempts to map Oracle GoldenGate types to the corresponding Avro type. When the setting is true, all data is treated as strings in the generated Avro messages and schemas.

gg.handler.name.format.pkUpdateHandling
Optional. Legal values: abend | update | delete-insert. Default: abend.
Controls how the formatter handles update operations that change a primary key. Primary key operations can be problematic for the Avro Row formatter and require special consideration by you.
• abend: the process terminates.
• update: the process handles this as a normal update.
• delete-insert: the process handles this operation as a delete and an insert. The full before image is required for this feature to work properly. This can be achieved by using full supplemental logging in Oracle. Without full before and after row images, the insert data will be incomplete.

gg.handler.name.format.generateSchema
Optional. Legal values: true | false. Default: true.
Schemas must be generated for Avro serialization. Set this property to false to suppress the writing of the generated schemas to the local file system.

gg.handler.name.format.schemaDirectory
Optional. Legal values: any legal, existing file system path. Default: ./dirdef.
The directory where generated Avro schemas are saved to the local file system. This property does not control where the Avro schema is written to in HDFS; that is controlled by an HDFS Handler property.

gg.handler.name.format.iso8601Format
Optional. Legal values: true | false. Default: true.
By default, the value of this property is true, and the format for the current timestamp is ISO 8601. Set to false to remove the T between the date and time in the current timestamp and output a space instead.

gg.handler.name.format.versionSchemas
Optional. Legal values: true | false. Default: false.
If set to true, an Avro schema is created in the schema directory and versioned by a time stamp. The schema uses the following format: fully_qualified_table_name_time_stamp.avsc

26.1.4 Setting Metacolumn Output

The following are the configurable values for the Avro formatter metacolumns template property that controls metacolumn output:

Table 26-6 Metacolumns Template Property

gg.handler.name.format.metaColumnsTemplate
Optional. Default: none.
Legal values: ${alltokens} | ${token} | ${env} | ${sys} | ${javaprop} | ${optype} | ${position} | ${timestamp} | ${catalog} | ${schema} | ${table} | ${objectname} | ${csn} | ${xid} | ${currenttimestamp} | ${opseqno} | ${timestampmicro} | ${currenttimestampmicro} | ${currenttimestampiso8601} | ${txind} | ${primarykeycolumns} | ${static} | ${seqno} | ${rba}
The current metacolumn information can be configured in a simple manner and removes the explicit need to use insertOpKey, updateOpKey, deleteOpKey, truncateOpKey, includeTableName, includeOpTimestamp, includeOpType, includePosition, includeCurrentTimestamp, and useIso8601Format. It is a comma-delimited string consisting of one or more templated values that represent the template.

This is an example that would produce a list of metacolumns:

${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}

Explanation of the Metacolumn Keywords The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:

26-30 Chapter 26 Using the Avro Formatter

${keyword[fieldName].argument} The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used. The argument is required to resolve the metacolumn value.
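Supplying a field name changes only the name of the generated field, not its value. The following sketch renames the operation type, position, and current timestamp fields; the handler name name is a placeholder:

gg.handler.name.format.metaColumnsTemplate=${optype[op_type]},${position[pos]},${currenttimestamp[current_ts]}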

${alltokens} All of the Oracle GoldenGate tokens.

${token} The value of a specific Oracle GoldenGate token. The token key should follow token using the period (.) operator. For example: ${token.MYTOKEN}

${sys} A system environment variable. The variable name should follow sys using the period (.) operator. For example: ${sys.MYVAR}

${env} An Oracle GoldenGate environment variable. The variable name should follow env using the period (.) operator. For example: ${env.someVariable}

${javaprop} A Java JVM variable. The variable name should follow javaprop using the period (.) operator. For example: ${javaprop.MYVAR}

${optype} Operation type.

${position} Record position.

${timestamp} Record timestamp.

${catalog} Catalog name.

${schema} Schema name.

${table} Table name.

${objectname} The fully qualified table name.

${csn} Source Commit Sequence Number.


${xid} Source transaction ID.

${currenttimestamp} Current timestamp.

${currenttimestampiso8601} Current timestamp in ISO 8601 format.

${opseqno} Record sequence number within the transaction.

${timestampmicro} Record timestamp in microseconds after epoch.

${currenttimestampmicro} Current timestamp in microseconds after epoch.

${txind} The transactional indicator from the source trail file. The values of a transaction are B for the first operation, M for the middle operations, E for the last operation, or W for whole if there is only one operation. Filtering operations or the use of coordinated apply negates the usefulness of this field.

${primarykeycolumns} Use to inject a field with a list of the primary key column names.

${static} Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is abc, then the syntax would be ${static.abc} or ${static[FieldName].abc}.

${seqno} Use to inject a field with the trail file sequence into the output.

${rba} Use to inject a field with the rba of the operation into the output.

Sample Configuration:

gg.handlerlist=kafkarestproxy

#The handler properties
gg.handler.kafkarestproxy.type=kafkarestproxy
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkarestproxy.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkarestproxy.keyMappingTemplate=${primaryKeys}
gg.handler.kafkarestproxy.postDataUrl=http://localhost:8083
gg.handler.kafkarestproxy.apiVersion=v1
gg.handler.kafkarestproxy.format=json
gg.handler.kafkarestproxy.payloadsize=1


gg.handler.kafkarestproxy.mode=tx

#Server auth properties
#gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
#gg.handler.kafkarestproxy.trustStorePassword=test1234
#Client auth properties
#gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
#gg.handler.kafkarestproxy.keyStorePassword=test1234

#Proxy properties
#gg.handler.kafkarestproxy.proxy=http://proxyurl:80
#gg.handler.kafkarestproxy.proxyUserName=username
#gg.handler.kafkarestproxy.proxyPassword=password

#The MetaColumnTemplate formatter properties
gg.handler.kafkarestproxy.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro}

26.2 Using the Delimited Text Formatter

The Delimited Text Formatter formats database operations from the source trail file into a delimited text output. Each insert, update, delete, or truncate operation from the source trail is formatted into an individual delimited message. Delimited text output includes a fixed number of fields for each table separated by a field delimiter and terminated by a line delimiter. The fields are positionally relevant. Many Big Data analytical tools, including Hive, work well with HDFS files that contain delimited text.

Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. By default, the delimited text maps these column value states into the delimited text output as follows:
• Column has a value: The column value is output.
• Column value is null: The default output value is NULL. The output for the case of a null column value is configurable.
• Column value is missing: The default output value is an empty string (""). The output for the case of a missing column value is configurable.

• Using the Delimited Text Row Formatter
The Delimited Text Row Formatter is the Delimited Text Formatter that was included in releases prior to the Oracle GoldenGate for Big Data 19.1.0.0 release. It writes the after change data for inserts and updates, and the before change data for deletes.
• Delimited Text Operation Formatter
The Delimited Text Operation Formatter is new functionality in the Oracle GoldenGate for Big Data 19.1.0.0.0 release. It outputs both before and after change data for insert, update, and delete operations.

26.2.1 Using the Delimited Text Row Formatter

The Delimited Text Row Formatter is the Delimited Text Formatter that was included in releases prior to the Oracle GoldenGate for Big Data 19.1.0.0 release. It writes the after change data for inserts and updates, and the before change data for deletes.


• Message Formatting Details
• Sample Formatted Messages
• Output Format Summary Log
• Delimited Text Formatter Configuration Properties
• Review a Sample Configuration
• Metadata Change Events
• Setting Metacolumn Output
• Additional Considerations

26.2.1.1 Message Formatting Details

The default output format uses a semicolon as the delimiter and resembles the following: First is the row metadata:

operation_type;fully_qualified_table_name;operation_timestamp;current_timestamp;trail_position;tokens;

Next is the row data:

column_1_value;column_n_value_then_line_delimiter

Optionally, the column name can be included before each column value, which changes the output format for the row data:

column_1_name;column_1_value;column_n_name;column_n_value_then_line_delimiter

Formatting details:
• Operation Type: Indicates the type of database operation from the source trail file. Default values are I for insert, U for update, D for delete, T for truncate. Output of this field is suppressible.
• Fully Qualified Table Name: The fully qualified table name is the source database table including the catalog name and the schema name. The format of the fully qualified table name is catalog_name.schema_name.table_name. The output of this field is suppressible.
• Operation Timestamp: The commit record timestamp from the source system. All operations in a transaction (unbatched transaction) will have the same operation timestamp. This timestamp is fixed, and the operation timestamp is the same if the trail file is replayed. The output of this field is suppressible.
• Current Timestamp: The timestamp of the current time when the delimited text formatter processes the current operation record. This timestamp follows the ISO 8601 format and includes microsecond precision. Replaying the trail file does not result in the same timestamp for the same operation. The output of this field is suppressible.
• Trail Position: The concatenated sequence number and RBA number from the source trail file. The trail position lets you trace the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file. The output of this field is suppressible.


• Tokens: The token key value pairs from the source trail file. The output of this field in the delimited text output is suppressed unless the includeTokens configuration property on the corresponding handler is explicitly set to true.

26.2.1.2 Sample Formatted Messages

The following sections contain sample messages from the Delimited Text Formatter. The default field delimiter has been changed to a pipe character, |, to more clearly display the message.
• Sample Insert Message
• Sample Update Message
• Sample Delete Message
• Sample Truncate Message

26.2.1.2.1 Sample Insert Message

I|GG.TCUSTORD|2013-06-02 22:14:36.000000|2015-09-18T13:23:01.612001|00000000000000001444|R=AADPkvAAEAAEqL2AAA|WILL|1994-09-30:15:33:00|CAR|144|17520.00|3|100

26.2.1.2.2 Sample Update Message

U|GG.TCUSTORD|2013-06-02 22:14:41.000000|2015-09-18T13:23:01.987000|00000000000000002891|R=AADPkvAAEAAEqLzAAA|BILL|1995-12-31:15:00:00|CAR|765|14000.00|3|100

26.2.1.2.3 Sample Delete Message

D,GG.TCUSTORD,2013-06-02 22:14:41.000000,2015-09-18T13:23:02.000000,00000000000000004338,L=206080450,6=9.0.80330,R=AADPkvAAEAAEqLzAAC,DAVE,1993-11-03:07:51:35,PLANE,600,,,

26.2.1.2.4 Sample Truncate Message

T|GG.TCUSTORD|2013-06-02 22:14:41.000000|2015-09-18T13:23:02.001000|00000000000000004515|R=AADPkvAAEAAEqL2AAB|||||||

26.2.1.3 Output Format Summary Log

If INFO level logging is enabled, the Java log4j logging includes a summary of the delimited text output format. A summary of the delimited fields is logged for each source table encountered and occurs when the first operation for that table is received by the Delimited Text Formatter. This detailed explanation of the fields of the delimited text output may be useful when you perform an initial setup. When a metadata change event occurs, the summary of the delimited fields is regenerated and logged again at the first subsequent operation for that table.


26.2.1.4 Delimited Text Formatter Configuration Properties

Table 26-7 Delimited Text Formatter Configuration Properties

gg.handler.name.format: Required. Legal value: delimitedtext. Default: none. Selects the Delimited Text Row formatter as the formatter.

gg.handler.name.format.includeColumnNames: Optional. Legal values: true | false. Default: false. Controls the output of writing the column names as a delimited field preceding the column value. When true, the output resembles: COL1_Name|COL1_Value|COL2_Name|COL2_Value. When false, the output resembles: COL1_Value|COL2_Value.

gg.handler.name.format.includeOpTimestamp: Optional. Legal values: true | false. Default: true. A false value suppresses the output of the operation timestamp from the source trail file in the output.

gg.handler.name.format.includeCurrentTimestamp: Optional. Legal values: true | false. Default: true. A false value suppresses the output of the current timestamp in the output.

gg.handler.name.format.includeOpType: Optional. Legal values: true | false. Default: true. A false value suppresses the output of the operation type in the output.

gg.handler.name.format.insertOpKey: Optional. Legal values: any string. Default: I. Indicator to be inserted into the output record to indicate an insert operation.

gg.handler.name.format.updateOpKey: Optional. Legal values: any string. Default: U. Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.deleteOpKey: Optional. Legal values: any string. Default: D. Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.truncateOpKey: Optional. Legal values: any string. Default: T. Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.encoding: Optional. Legal values: any encoding name or alias supported by Java. Default: the native system encoding of the machine hosting the Oracle GoldenGate process. Determines the encoding of the output delimited text.

gg.handler.name.format.fieldDelimiter: Optional. Legal values: any string. Default: ASCII 001 (the default Hive delimiter). The delimiter used between delimited fields. This value supports CDATA[] wrapping.

gg.handler.name.format.lineDelimiter: Optional. Legal values: any string. Default: newline (the default Hive delimiter). The delimiter used between records. This value supports CDATA[] wrapping.


Table 26-7 (Cont.) Delimited Text Formatter Configuration Properties

gg.handler.name.format.disableEscaping: Optional. Legal values: true | false. Default: false. Set to true to disable the escaping of characters which conflict with the configured delimiters. Must be set to true if gg.handler.name.format.fieldDelimiter is set to a value of multiple characters.

gg.handler.name.format.includeTableName: Optional. Legal values: true | false. Default: true. Use false to suppress the output of the table name in the output delimited data.

gg.handler.name.format.keyValueDelimiter: Optional. Legal values: any string. Default: =. Specifies a delimiter between keys and values in a map, as in Key1=value1. Tokens are mapped values. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.keyValuePairDelimiter: Optional. Legal values: any string. Default: ,. Specifies a delimiter between key value pairs in a map, as in Key1=Value1,Key2=Value2. Tokens are mapped values. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.pkUpdateHandling: Optional. Legal values: abend | update | delete-insert. Default: abend. Specifies how the formatter handles update operations that change a primary key. Primary key operations can be problematic for the text formatter and require special consideration by you.
· abend: the process abends.
· update: the process treats the operation as a normal update.
· delete-insert: the process handles the operation as a delete and an insert. Full supplemental logging must be enabled for this to work. Without full before and after row images, the insert data will be incomplete.

gg.handler.name.format.nullValueRepresentation: Optional. Legal values: any string. Default: NULL. Specifies what is included in the delimited output in the case of a NULL value. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.missingValueRepresentation: Optional. Legal values: any string. Default: "" (no value). Specifies what is included in the delimited text output in the case of a missing value. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.includePosition: Optional. Legal values: true | false. Default: true. A false value suppresses the output of the operation position from the source trail file.


Table 26-7 (Cont.) Delimited Text Formatter Configuration Properties

gg.handler.name.format.iso8601Format: Optional. Legal values: true | false. Default: true. Controls the format of the current timestamp. The default is the ISO 8601 format. When false, removes the T between the date and time in the current timestamp, which outputs a space instead.

gg.handler.name.format.includeMetaColumnNames: Optional. Legal values: true | false. Default: false. When set to true, a field is included prior to each metadata column value, which is the column name of the metadata column. You can use it to make delimited messages more self-describing.

gg.handler.name.format.wrapStringsInQuotes: Optional. Legal values: true | false. Default: false. Set to true to wrap string value output in the delimited text format in double quotes (").

gg.handler.name.format.includeGroupCols: Optional. Legal values: true | false. Default: false. If set to true, the columns are grouped into sets of all names, all before values, and all after values. For example:
U,QASOURCE.TCUSTMER,2015-11-05 18:45:39.000000,2019-04-17T05:19:30.556000,00000000000000005100,R=AAKifQAAKAAAFDHAAE,CUST_CODE,NAME,CITY,STATE,ANN,ANN'S BOATS,SEATTLE,WA,ANN,,NEW YORK,NY

26.2.1.5 Review a Sample Configuration

The following is a sample configuration for the Delimited Text formatter in the Java Adapter configuration file:

gg.handler.name.format.includeColumnNames=false
gg.handler.name.format.includeOpTimestamp=true
gg.handler.name.format.includeCurrentTimestamp=true
gg.handler.name.format.insertOpKey=I
gg.handler.name.format.updateOpKey=U
gg.handler.name.format.deleteOpKey=D
gg.handler.name.format.truncateOpKey=T
gg.handler.name.format.encoding=UTF-8
gg.handler.name.format.fieldDelimiter=CDATA[\u0001]
gg.handler.name.format.lineDelimiter=CDATA[\n]
gg.handler.name.format.includeTableName=true
gg.handler.name.format.keyValueDelimiter=CDATA[=]
gg.handler.name.format.keyValuePairDelimiter=CDATA[,]
gg.handler.name.format.pkUpdateHandling=abend
gg.handler.name.format.nullValueRepresentation=NULL
gg.handler.name.format.missingValueRepresentation=CDATA[]
gg.handler.name.format.includePosition=true
gg.handler.name.format.includeGroupCols=false
gg.handler.name.format=delimitedtext


26.2.1.6 Metadata Change Events

Oracle GoldenGate for Big Data now handles metadata change events at runtime. This assumes that the replicated database and upstream replication processes are propagating metadata change events. The Delimited Text Formatter changes the output format to accommodate the change, and the Delimited Text Formatter continues running.

Note:

A metadata change may affect downstream applications. Delimited text formats include a fixed number of fields that are positionally relevant. Deleting a column in the source table can be handled seamlessly during Oracle GoldenGate runtime, but results in a change in the total number of fields, and potentially changes the positional relevance of some fields. Adding an additional column or columns is probably the least impactful metadata change event, assuming that the new column is added to the end. Consider the impact of a metadata change event before executing the event. When metadata change events are frequent, Oracle recommends that you consider a more flexible and self-describing format, such as JSON or XML.

26.2.1.7 Setting Metacolumn Output

The following are the configurable values for the Delimited Text formatter metacolumns template property that controls metacolumn output:


Table 26-8 Metacolumns Template Property

gg.handler.name.format.metaColumnsTemplate: Optional. Legal values: ${alltokens} | ${token} | ${env} | ${sys} | ${javaprop} | ${optype} | ${position} | ${timestamp} | ${catalog} | ${schema} | ${table} | ${objectname} | ${csn} | ${xid} | ${currenttimestamp} | ${opseqno} | ${timestampmicro} | ${currenttimestampmicro} | ${txind} | ${primarykeycolumns} | ${currenttimestampiso8601} | ${static} | ${seqno} | ${rba}. Default: none. A comma-delimited string consisting of one or more templated values that represent the template. The metacolumn information can be configured in a simple manner and removes the explicit need to use the insertOpKey, updateOpKey, deleteOpKey, truncateOpKey, includeTableName, includeOpTimestamp, includeOpType, includePosition, includeCurrentTimestamp, and useIso8601Format properties.

This is an example that would produce a list of metacolumns: ${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}

Explanation of the Metacolumn Keywords The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:

${keyword[fieldName].argument} The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used. The argument is required to resolve the metacolumn value.

${alltokens} All of the Oracle GoldenGate tokens.


${token} The value of a specific Oracle GoldenGate token. The token key should follow token using the period (.) operator. For example: ${token.MYTOKEN}

${sys} A system environment variable. The variable name should follow sys using the period (.) operator. For example: ${sys.MYVAR}

${env} An Oracle GoldenGate environment variable. The variable name should follow env using the period (.) operator. For example: ${env.someVariable}

${javaprop} A Java JVM variable. The variable name should follow javaprop using the period (.) operator. For example: ${javaprop.MYVAR}

${optype} Operation type.

${position} Record position.

${timestamp} Record timestamp.

${catalog} Catalog name.

${schema} Schema name.

${table} Table name.

${objectname} The fully qualified table name.

${csn} Source Commit Sequence Number.

${xid} Source transaction ID.

${currenttimestamp} Current timestamp.

${currenttimestampiso8601} Current timestamp in ISO 8601 format.


${opseqno} Record sequence number within the transaction.

${timestampmicro} Record timestamp in microseconds after epoch.

${currenttimestampmicro} Current timestamp in microseconds after epoch.

${txind} The transactional indicator from the source trail file. The values of a transaction are B for the first operation, M for the middle operations, E for the last operation, or W for whole if there is only one operation. Filtering operations or the use of coordinated apply negates the usefulness of this field.

${primarykeycolumns} Use to inject a field with a list of the primary key column names.

${static} Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is abc, then the syntax would be ${static.abc} or ${static[FieldName].abc}.

${seqno} Use to inject a field with the trail file sequence into the output.

${rba} Use to inject a field with the rba of the operation into the output.

Sample Configuration:

gg.handlerlist=kafkarestproxy

#The handler properties
gg.handler.kafkarestproxy.type=kafkarestproxy
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkarestproxy.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkarestproxy.keyMappingTemplate=${primaryKeys}
gg.handler.kafkarestproxy.postDataUrl=http://localhost:8083
gg.handler.kafkarestproxy.apiVersion=v1
gg.handler.kafkarestproxy.format=json
gg.handler.kafkarestproxy.payloadsize=1
gg.handler.kafkarestproxy.mode=tx

#Server auth properties
#gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
#gg.handler.kafkarestproxy.trustStorePassword=test1234
#Client auth properties
#gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
#gg.handler.kafkarestproxy.keyStorePassword=test1234


#Proxy properties
#gg.handler.kafkarestproxy.proxy=http://proxyurl:80
#gg.handler.kafkarestproxy.proxyUserName=username
#gg.handler.kafkarestproxy.proxyPassword=password

#The MetaColumnTemplate formatter properties
gg.handler.kafkarestproxy.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro}

26.2.1.8 Additional Considerations

Exercise care when you choose field and line delimiters. It is important to choose delimiter values that will not occur in the content of the data. The Java Adapter configuration trims leading and trailing characters from configuration values when they are determined to be whitespace. However, you may want to choose field delimiters, line delimiters, null value representations, and missing value representations that include or are fully considered to be whitespace . In these cases, you must employ specialized syntax in the Java Adapter configuration file to preserve the whitespace. To preserve the whitespace, when your configuration values contain leading or trailing characters that are considered whitespace, wrap the configuration value in a CDATA[] wrapper. For example, a configuration value of \n should be configured as CDATA[\n].
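For example, the line delimiter and missing value representation from the sample configuration earlier in this section both require CDATA[] wrapping because their values would otherwise be trimmed as whitespace; the handler name name is a placeholder:

gg.handler.name.format.lineDelimiter=CDATA[\n]
gg.handler.name.format.missingValueRepresentation=CDATA[]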

You can use regular expressions to search column values and then replace matches with a specified value. You can use this search and replace functionality together with the Delimited Text Formatter to ensure that there are no collisions between column value contents and field and line delimiters. For more information, see Using Regular Expression Search and Replace.

Big Data applications store data differently from RDBMSs. Update and delete operations in an RDBMS result in a change to the existing data. However, in Big Data applications, data is appended instead of changed. Therefore, the current state of a given row consolidates all of the existing operations for that row in the HDFS system. This leads to some special scenarios as described in the following sections.
• Primary Key Updates
• Data Consolidation

26.2.1.8.1 Primary Key Updates

In Big Data integrations, primary key update operations require special consideration and planning. Primary key updates modify one or more of the primary keys for the given row from the source database. Because data is appended in Big Data applications, a primary key update operation looks more like an insert than an update without any special handling. You can configure how the Delimited Text formatter handles primary key updates. These are the configurable behaviors:

Table 26-9 Configurable Behavior

abend: By default, the delimited text formatter terminates in the case of a primary key update.


Table 26-9 (Cont.) Configurable Behavior

update: The primary key update is treated like any other update operation. Use this configuration alternative only if you can guarantee that the primary key is not used as selection criteria to select row data from a Big Data system.

delete-insert: The primary key update is treated as a special case of a delete, using the before-image data, and an insert, using the after-image data. This configuration may more accurately model the effect of a primary key update in a Big Data application. However, if this configuration is selected, it is important to have full supplemental logging enabled on replication at the source database. Without full supplemental logging, the delete operation will be correct, but the insert operation will not contain all of the data for all of the columns for a full representation of the row data in the Big Data application. See the configuration sketch after this table.
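The following is a minimal sketch of selecting the delete-insert behavior in the Java Adapter properties file. The handler name name is a placeholder, and full supplemental logging is assumed to be enabled at the source database:

gg.handler.name.format.pkUpdateHandling=delete-insert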

26.2.1.8.2 Data Consolidation

Big Data applications append data to the underlying storage. Analytic tools generally spawn MapReduce programs that traverse the data files and consolidate all the operations for a given row into a single output. Therefore, it is important to specify the order of operations. The Delimited Text formatter provides a number of metadata fields to do this. The operation timestamp may be sufficient to fulfill this requirement; when operations share the same operation timestamp, the trail position can provide a tie-breaking field. Alternatively, the current timestamp may provide the best indicator of the order of operations in Big Data.

26.2.2 Delimited Text Operation Formatter

The Delimited Text Operation Formatter is new functionality in the Oracle GoldenGate for Big Data 19.1.0.0.0 release. It outputs both before and after change data for insert, update, and delete operations.
• Message Formatting Details
• Sample Formatted Messages
• Output Format Summary Log
• Delimited Text Formatter Configuration Properties
• Review a Sample Configuration
• Metadata Change Events
Oracle GoldenGate for Big Data now handles metadata change events at runtime. This assumes that the replicated database and upstream replication processes are propagating metadata change events. The Delimited Text Formatter changes the output format to accommodate the change, and the Delimited Text Formatter continues running.
• Setting Metacolumn Output


• Additional Considerations
Exercise care when you choose field and line delimiters. It is important to choose delimiter values that do not occur in the content of the data.

26.2.2.1 Message Formatting Details

The default output format uses a semicolon as the delimiter and resembles the following: First is the row metadata:

operation_type;fully_qualified_table_name;operation_timestamp;current_timestamp;trail_position;tokens;

Next is the row data:

column_1_before_value;column_1_after_value;column_n_before_value_then_line_delimiter;column_n_after_value_then_line_delimiter

Optionally, the column name can be included before each column value, which changes the output format for the row data:

column_1_name;column_1_before_value;column_1_after_value;column_n_name;column_n_before_value_then_line_delimiter;column_n_after_value_then_line_delimiter

Formatting details:
• Operation Type: Indicates the type of database operation from the source trail file. Default values are I for insert, U for update, D for delete, T for truncate. Output of this field is suppressible.
• Fully Qualified Table Name: The fully qualified table name is the source database table including the catalog name and the schema name. The format of the fully qualified table name is catalog_name.schema_name.table_name. The output of this field is suppressible.
• Operation Timestamp: The commit record timestamp from the source system. All operations in a transaction (unbatched transaction) will have the same operation timestamp. This timestamp is fixed, and the operation timestamp is the same if the trail file is replayed. The output of this field is suppressible.
• Current Timestamp: The timestamp of the current time when the delimited text formatter processes the current operation record. This timestamp follows the ISO 8601 format and includes microsecond precision. Replaying the trail file does not result in the same timestamp for the same operation. The output of this field is suppressible.
• Trail Position: The concatenated sequence number and RBA number from the source trail file. The trail position lets you trace the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file. The output of this field is suppressible.
• Tokens: The token key value pairs from the source trail file. The output of this field in the delimited text output is suppressed unless the includeTokens configuration property on the corresponding handler is explicitly set to true, as shown in the example following this list.
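For example, token output can be enabled on the corresponding handler as follows; the handler name hdfs is illustrative only:

gg.handler.hdfs.includeTokens=true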


26.2.2.2 Sample Formatted Messages

The following sections contain sample messages from the Delimited Text Formatter. The default field delimiter has been changed to a pipe character, |, to more clearly display the message.
• Sample Insert Message
• Sample Update Message
• Sample Delete Message
• Sample Truncate Message

26.2.2.2.1 Sample Insert Message

I|GG.TCUSTMER|2015-11-05 18:45:36.000000|2019-04-17T04:49:00.156000|00000000000000001956|R=AAKifQAAKAAAFDHAAA,t=,L=7824137832,6=2.3.228025||WILL||BG SOFTWARE CO.||SEATTLE||WA

26.2.2.2.2 Sample Update Message

U|QASOURCE.TCUSTMER|2015-11-05 18:45:39.000000|2019-07-16T11:54:06.008002|00000000000000005100|R=AAKifQAAKAAAFDHAAE|ANN|ANN|ANN'S BOATS||SEATTLE|NEW YORK|WA|NY

26.2.2.2.3 Sample Delete Message

D|QASOURCE.TCUSTORD|2015-11-05 18:45:39.000000|2019-07-16T11:54:06.009000|00000000000000005272|L=7824137921,R=AAKifSAAKAAAMZHAAE,6=9.9.479055|DAVE||1993-11-03 07:51:35||PLANE||600||135000.00||2||200|

26.2.2.2.4 Sample Truncate Message

T|QASOURCE.TCUSTMER|2015-11-05 18:45:39.000000|2019-07-16T11:54:06.004002|00000000000000003600|R=AAKifQAAKAAAFDHAAE||||||||

26.2.2.3 Output Format Summary Log

If INFO level logging is enabled, the Java log4j logging includes a summary of the delimited text output format. A summary of the delimited fields is logged for each source table encountered and occurs when the first operation for that table is received by the Delimited Text Formatter. This detailed explanation of the fields of the delimited text output may be useful when you perform an initial setup. When a metadata change event occurs, the summary of the delimited fields is regenerated and logged again at the first subsequent operation for that table.


26.2.2.4 Delimited Text Formatter Configuration Properties

Table 26-10 Delimited Text Formatter Configuration Properties

gg.handler.name.format: Required. Legal value: delimitedtext_op. Default: none. Selects the Delimited Text Operation Formatter as the formatter.

gg.handler.name.format.includeColumnNames: Optional. Legal values: true | false. Default: false. Controls the output of writing the column names as a delimited field preceding the column value. When true, the output resembles: COL1_Name|COL1_Before_Value|COL1_After_Value|COL2_Name|COL2_Before_Value|COL2_After_Value. When false, the output resembles: COL1_Before_Value|COL1_After_Value|COL2_Before_Value|COL2_After_Value.

gg.handler.name.format.includeOpTimestamp: Optional. Legal values: true | false. Default: true. A false value suppresses the output of the operation timestamp from the source trail file in the output.

gg.handler.name.format.includeCurrentTimestamp: Optional. Legal values: true | false. Default: true. A false value suppresses the output of the current timestamp in the output.

gg.handler.name.format.includeOpType: Optional. Legal values: true | false. Default: true. A false value suppresses the output of the operation type in the output.

gg.handler.name.format.insertOpKey: Optional. Legal values: any string. Default: I. Indicator to be inserted into the output record to indicate an insert operation.

gg.handler.name.format.updateOpKey: Optional. Legal values: any string. Default: U. Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.deleteOpKey: Optional. Legal values: any string. Default: D. Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.truncateOpKey: Optional. Legal values: any string. Default: T. Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.encoding: Optional. Legal values: any encoding name or alias supported by Java. Default: the native system encoding of the machine hosting the Oracle GoldenGate process. Determines the encoding of the output delimited text.


Table 26-10 (Cont.) Delimited Text Formatter Configuration Properties

gg.handler.name.format.fieldDelimiter: Optional. Legal values: any string. Default: ASCII 001 (the default Hive delimiter). The delimiter used between delimited fields. This value supports CDATA[] wrapping.

gg.handler.name.format.lineDelimiter: Optional. Legal values: any string. Default: newline (the default Hive delimiter). The delimiter used between records. This value supports CDATA[] wrapping.

gg.handler.name.format.disableEscaping: Optional. Legal values: true | false. Default: false. Set to true to disable the escaping of characters which conflict with the configured delimiters. Must be set to true if gg.handler.name.format.fieldDelimiter is set to a value of multiple characters.

gg.handler.name.format.includeTableName: Optional. Legal values: true | false. Default: true. Use false to suppress the output of the table name in the output delimited data.

gg.handler.name.format.keyValueDelimiter: Optional. Legal values: any string. Default: =. Specifies a delimiter between keys and values in a map, as in Key1=value1. Tokens are mapped values. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.keyValuePairDelimiter: Optional. Legal values: any string. Default: ,. Specifies a delimiter between key value pairs in a map, as in Key1=Value1,Key2=Value2. Tokens are mapped values. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.nullValueRepresentation: Optional. Legal values: any string. Default: NULL. Specifies what is included in the delimited output in the case of a NULL value. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.missingValueRepresentation: Optional. Legal values: any string. Default: "" (no value). Specifies what is included in the delimited text output in the case of a missing value. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.includePosition: Optional. Legal values: true | false. Default: true. A false value suppresses the output of the operation position from the source trail file.

gg.handler.name.format.iso8601Format: Optional. Legal values: true | false. Default: true. Controls the format of the current timestamp. The default is the ISO 8601 format. When false, removes the T between the date and time in the current timestamp, which outputs a space instead.


Table 26-10 (Cont.) Delimited Text Formatter Configuration Properties

gg.handler.name.format.includeMetaColumnNames: Optional. Legal values: true | false. Default: false. When set to true, a field is included prior to each metadata column value, which is the column name of the metadata column. You can use it to make delimited messages more self-describing.

gg.handler.name.format.wrapStringsInQuotes: Optional. Legal values: true | false. Default: false. Set to true to wrap string value output in the delimited text format in double quotes (").

gg.handler.name.format.includeGroupCols: Optional. Legal values: true | false. Default: false. If set to true, the columns are grouped into sets of all names, all before values, and all after values. For example:
U,QASOURCE.TCUSTMER,2015-11-05 18:45:39.000000,2019-04-17T05:19:30.556000,00000000000000005100,R=AAKifQAAKAAAFDHAAE,CUST_CODE,NAME,CITY,STATE,ANN,ANN'S BOATS,SEATTLE,WA,ANN,,NEW YORK,NY

26.2.2.5 Review a Sample Configuration

The following is a sample configuration for the Delimited Text formatter in the Java Adapter configuration file:

gg.handler.name.format.includeColumnNames=false
gg.handler.name.format.includeOpTimestamp=true
gg.handler.name.format.includeCurrentTimestamp=true
gg.handler.name.format.insertOpKey=I
gg.handler.name.format.updateOpKey=U
gg.handler.name.format.deleteOpKey=D
gg.handler.name.format.truncateOpKey=T
gg.handler.name.format.encoding=UTF-8
gg.handler.name.format.fieldDelimiter=CDATA[\u0001]
gg.handler.name.format.lineDelimiter=CDATA[\n]
gg.handler.name.format.includeTableName=true
gg.handler.name.format.keyValueDelimiter=CDATA[=]
gg.handler.name.format.keyValuePairDelimiter=CDATA[,]
gg.handler.name.format.nullValueRepresentation=NULL
gg.handler.name.format.missingValueRepresentation=CDATA[]
gg.handler.name.format.includePosition=true
gg.handler.name.format.includeGroupCols=false
gg.handler.name.format=delimitedtext_op

26.2.2.6 Metadata Change Events

Oracle GoldenGate for Big Data now handles metadata change events at runtime. This assumes that the replicated database and upstream replication processes are propagating metadata change events. The Delimited Text Formatter changes the


output format to accommodate the change, and the Delimited Text Formatter continues running.

Note:

A metadata change may affect downstream applications. Delimited text formats include a fixed number of fields that are positionally relevant. Deleting a column in the source table can be handled seamlessly during Oracle GoldenGate runtime, but results in a change in the total number of fields, and potentially changes the positional relevance of some fields. Adding an additional column or columns is probably the least impactful metadata change event, assuming that the new column is added to the end. Consider the impact of a metadata change event before executing the event. When metadata change events are frequent, Oracle recommends that you consider a more flexible and self-describing format, such as JSON or XML.

26.2.2.7 Setting Metacolumn Output

The following are the configurable values for the Delimited Text formatter metacolumns template property that controls metacolumn output:


Table 26-11 Metacolumns Template Property

gg.handler.name.format.metaColumnsTemplate: Optional. Legal values: ${alltokens} | ${token} | ${env} | ${sys} | ${javaprop} | ${optype} | ${position} | ${timestamp} | ${catalog} | ${schema} | ${table} | ${objectname} | ${csn} | ${xid} | ${currenttimestamp} | ${opseqno} | ${timestampmicro} | ${currenttimestampmicro} | ${txind} | ${primarykeycolumns} | ${currenttimestampiso8601} | ${static} | ${seqno} | ${rba}. Default: none. A comma-delimited string consisting of one or more templated values that represent the template. The metacolumn information can be configured in a simple manner and removes the explicit need to use the insertOpKey, updateOpKey, deleteOpKey, truncateOpKey, includeTableName, includeOpTimestamp, includeOpType, includePosition, includeCurrentTimestamp, and useIso8601Format properties.

This is an example that would produce a list of metacolumns: ${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}

Explanation of the Metacolumn Keywords The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:

${keyword[fieldName].argument} The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used. The argument is required to resolve the metacolumn value.

${alltokens} All of the Oracle GoldenGate tokens.


${token} The value of a specific Oracle GoldenGate token. The token key should follow token using the period (.) operator. For example: ${token.MYTOKEN}

${sys} A system environment variable. The variable name should follow sys using the period (.) operator. For example: ${sys.MYVAR}

${env} An Oracle GoldenGate environment variable. The variable name should follow env using the period (.) operator. For example: ${env.someVariable}

${javaprop} A Java JVM variable. The variable name should follow javaprop using the period (.) operator. For example: ${javaprop.MYVAR}

${optype} Operation type.

${position} Record position.

${timestamp} Record timestamp.

${catalog} Catalog name.

${schema} Schema name.

${table} Table name.

${objectname} The fully qualified table name.

${csn} Source Commit Sequence Number.

${xid} Source transaction ID.

${currenttimestamp} Current timestamp.

${currenttimestampiso8601} Current timestamp in ISO 8601 format.


${opseqno} Record sequence number within the transaction.

${timestampmicro} Record timestamp in microseconds after epoch.

${currenttimestampmicro} Current timestamp in microseconds after epoch.

${txind} The transactional indicator from the source trail file. The values of a transaction are B for the first operation, M for the middle operations, E for the last operation, or W for whole if there is only one operation. Filtering operations or the use of coordinated apply negates the usefulness of this field.

${primarykeycolumns} Use to inject a field with a list of the primary key column names.

${static} Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is abc, then the syntax would be ${static.abc} or ${static[FieldName].abc}.

${seqno} Use to inject a field with the trail file sequence into the output.

${rba} Use to inject a field with the rba of the operation into the output.

Sample Configuration:

gg.handlerlist=kafkarestproxy

#The handler properties
gg.handler.kafkarestproxy.type=kafkarestproxy
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkarestproxy.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkarestproxy.keyMappingTemplate=${primaryKeys}
gg.handler.kafkarestproxy.postDataUrl=http://localhost:8083
gg.handler.kafkarestproxy.apiVersion=v1
gg.handler.kafkarestproxy.format=json
gg.handler.kafkarestproxy.payloadsize=1
gg.handler.kafkarestproxy.mode=tx

#Server auth properties
#gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
#gg.handler.kafkarestproxy.trustStorePassword=test1234
#Client auth properties
#gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
#gg.handler.kafkarestproxy.keyStorePassword=test1234


#Proxy properties
#gg.handler.kafkarestproxy.proxy=http://proxyurl:80
#gg.handler.kafkarestproxy.proxyUserName=username
#gg.handler.kafkarestproxy.proxyPassword=password

#The MetaColumnTemplate formatter properties
gg.handler.kafkarestproxy.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro}

26.2.2.8 Additional Considerations

Exercise care when you choose field and line delimiters. It is important to choose delimiter values that do not occur in the content of the data. The Java Adapter configuration trims leading and trailing characters from configuration values when they are determined to be whitespace. However, you may want to choose field delimiters, line delimiters, null value representations, and missing value representations that include or are fully considered to be whitespace . In these cases, you must employ specialized syntax in the Java Adapter configuration file to preserve the whitespace. To preserve the whitespace, when your configuration values contain leading or trailing characters that are considered whitespace, wrap the configuration value in a CDATA[] wrapper. For example, a configuration value of \n should be configured as CDATA[\n].

You can use regular expressions to search column values and then replace matches with a specified value. You can use this search and replace functionality together with the Delimited Text Formatter to ensure that there are no collisions between column value contents and field and line delimiters. For more information, see Using Regular Expression Search and Replace.

Big Data applications store data differently from RDBMSs. Update and delete operations in an RDBMS result in a change to the existing data. However, in Big Data applications, data is appended instead of changed. Therefore, the current state of a given row consolidates all of the existing operations for that row in the HDFS system.

26.3 Using the JSON Formatter

The JavaScript Object Notation (JSON) formatter can output operations from the source trail file in either row-based format or operation-based format. It formats operation data from the source trail file into JSON objects. Each insert, update, delete, and truncate operation is formatted into an individual JSON message.
• Operation Metadata Formatting Details
• Operation Data Formatting Details
• Row Data Formatting Details
• Sample JSON Messages
• JSON Schemas
• JSON Formatter Configuration Properties
• Review a Sample Configuration
• Metadata Change Events


• Setting Metacolumn Output
• JSON Primary Key Updates
• Integrating Oracle Stream Analytics

26.3.1 Operation Metadata Formatting Details

To output the metacolumns, configure the following:

gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}

To also include the primary key columns and the tokens, configure as follows:

gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}

For more information, see the configuration property: gg.handler.name.format.metaColumnsTemplate.

26.3.2 Operation Data Formatting Details

JSON messages begin with the operation metadata fields, which are followed by the operation data fields. This data is represented by before and after members that are objects. These objects contain members whose keys are the column names and whose values are the column values.

Operation data is modeled as follows:
• Inserts: Includes the after-image data.
• Updates: Includes both the before-image and the after-image data.
• Deletes: Includes the before-image data.

Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. The JSON Formatter maps these column value states into the created JSON objects as follows:
• The column has a value: The column value is output. In the following example, the member STATE has a value.
"after":{ "CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":"CO" }
• The column value is null: The default output value is a JSON NULL. In the following example, the member STATE is null.
"after":{ "CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":null }
• The column value is missing: The JSON contains no element for a missing column value. In the following example, the member STATE is missing.
"after":{ "CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER" }


The default setting of the JSON Formatter is to map the data types from the source trail file to the associated JSON data type. JSON supports few data types, so this functionality usually results in the mapping of numeric fields from the source trail file to members typed as numbers. This data type mapping can be configured to treat all data as strings.
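Assuming the JSON Formatter exposes the same treatAllColumnsAsStrings switch that the Avro formatters use (the property is not shown in the excerpt above, so treat this as an assumption), forcing string typing would resemble the following sketch:

gg.handler.name.format.treatAllColumnsAsStrings=true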

JSON messages begin with the operation metadata fields, which are followed by the operation data fields. For row data formatting, these are the source column names and source column values as JSON key value pairs.

Row data is modeled as follows:
• Inserts: Includes the after-image data.
• Updates: Includes the after-image data.
• Deletes: Includes the before-image data.

Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. The JSON Formatter maps these column value states into the created JSON objects as follows:
• The column has a value: The column value is output. In the following example, the member STATE has a value.
"CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":"CO" }
• The column value is null: The default output value is a JSON NULL. In the following example, the member STATE is null.
"CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":null }
• The column value is missing: The JSON contains no element for a missing column value. In the following example, the member STATE is missing.
"CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER" }

The default setting of the JSON Formatter is to map the data types from the source trail file to the associated JSON data type. JSON supports few data types, so this functionality usually results in the mapping of numeric fields from the source trail file to members typed as numbers. This data type mapping can be configured to treat all data as strings.

26.3.4 Sample JSON Messages

The following topics are sample JSON messages created by the JSON Formatter for insert, update, delete, and truncate operations.
• Sample Operation Modeled JSON Messages
• Sample Flattened Operation Modeled JSON Messages
• Sample Row Modeled JSON Messages


• Sample Primary Key Output JSON Message

26.3.4.1 Sample Operation Modeled JSON Messages

Insert

{ "table":"QASOURCE.TCUSTORD", "op_type":"I", "op_ts":"2015-11-05 18:45:36.000000", "current_ts":"2016-10-05T10:15:51.267000", "pos":"00000000000000002928", "after":{ "CUST_CODE":"WILL", "ORDER_DATE":"1994-09-30:15:33:00", "PRODUCT_CODE":"CAR", "ORDER_ID":144, "PRODUCT_PRICE":17520.00, "PRODUCT_AMOUNT":3, "TRANSACTION_ID":100 } }

Update

{ "table":"QASOURCE.TCUSTORD", "op_type":"U", "op_ts":"2015-11-05 18:45:39.000000", "current_ts":"2016-10-05T10:15:51.310002", "pos":"00000000000000004300", "before":{ "CUST_CODE":"BILL", "ORDER_DATE":"1995-12-31:15:00:00", "PRODUCT_CODE":"CAR", "ORDER_ID":765, "PRODUCT_PRICE":15000.00, "PRODUCT_AMOUNT":3, "TRANSACTION_ID":100 }, "after":{ "CUST_CODE":"BILL", "ORDER_DATE":"1995-12-31:15:00:00", "PRODUCT_CODE":"CAR", "ORDER_ID":765, "PRODUCT_PRICE":14000.00 } }

Delete

{ "table":"QASOURCE.TCUSTORD",


"op_type":"D", "op_ts":"2015-11-05 18:45:39.000000", "current_ts":"2016-10-05T10:15:51.312000", "pos":"00000000000000005272", "before":{ "CUST_CODE":"DAVE", "ORDER_DATE":"1993-11-03:07:51:35", "PRODUCT_CODE":"PLANE", "ORDER_ID":600, "PRODUCT_PRICE":135000.00, "PRODUCT_AMOUNT":2, "TRANSACTION_ID":200 } }

Truncate

{ "table":"QASOURCE.TCUSTORD", "op_type":"T", "op_ts":"2015-11-05 18:45:39.000000", "current_ts":"2016-10-05T10:15:51.312001", "pos":"00000000000000005480", }

26.3.4.2 Sample Flattened Operation Modeled JSON Messages

Insert

{ "table":"QASOURCE.TCUSTORD", "op_type":"I", "op_ts":"2015-11-05 18:45:36.000000", "current_ts":"2016-10-05T10:34:47.956000", "pos":"00000000000000002928", "after.CUST_CODE":"WILL", "after.ORDER_DATE":"1994-09-30:15:33:00", "after.PRODUCT_CODE":"CAR", "after.ORDER_ID":144, "after.PRODUCT_PRICE":17520.00, "after.PRODUCT_AMOUNT":3, "after.TRANSACTION_ID":100 }

Update

{ "table":"QASOURCE.TCUSTORD", "op_type":"U", "op_ts":"2015-11-05 18:45:39.000000", "current_ts":"2016-10-05T10:34:48.192000", "pos":"00000000000000004300",


"before.CUST_CODE":"BILL", "before.ORDER_DATE":"1995-12-31:15:00:00", "before.PRODUCT_CODE":"CAR", "before.ORDER_ID":765, "before.PRODUCT_PRICE":15000.00, "before.PRODUCT_AMOUNT":3, "before.TRANSACTION_ID":100, "after.CUST_CODE":"BILL", "after.ORDER_DATE":"1995-12-31:15:00:00", "after.PRODUCT_CODE":"CAR", "after.ORDER_ID":765, "after.PRODUCT_PRICE":14000.00 }

Delete

{ "table":"QASOURCE.TCUSTORD", "op_type":"D", "op_ts":"2015-11-05 18:45:39.000000", "current_ts":"2016-10-05T10:34:48.193000", "pos":"00000000000000005272", "before.CUST_CODE":"DAVE", "before.ORDER_DATE":"1993-11-03:07:51:35", "before.PRODUCT_CODE":"PLANE", "before.ORDER_ID":600, "before.PRODUCT_PRICE":135000.00, "before.PRODUCT_AMOUNT":2, "before.TRANSACTION_ID":200 }

Truncate

{ "table":"QASOURCE.TCUSTORD", "op_type":"D", "op_ts":"2015-11-05 18:45:39.000000", "current_ts":"2016-10-05T10:34:48.193001", "pos":"00000000000000005480", "before.CUST_CODE":"JANE", "before.ORDER_DATE":"1995-11-11:13:52:00", "before.PRODUCT_CODE":"PLANE", "before.ORDER_ID":256, "before.PRODUCT_PRICE":133300.00, "before.PRODUCT_AMOUNT":1, "before.TRANSACTION_ID":100 }


26.3.4.3 Sample Row Modeled JSON Messages

Insert

{ "table":"QASOURCE.TCUSTORD", "op_type":"I", "op_ts":"2015-11-05 18:45:36.000000", "current_ts":"2016-10-05T11:10:42.294000", "pos":"00000000000000002928", "CUST_CODE":"WILL", "ORDER_DATE":"1994-09-30:15:33:00", "PRODUCT_CODE":"CAR", "ORDER_ID":144, "PRODUCT_PRICE":17520.00, "PRODUCT_AMOUNT":3, "TRANSACTION_ID":100 }

Update

{ "table":"QASOURCE.TCUSTORD", "op_type":"U", "op_ts":"2015-11-05 18:45:39.000000", "current_ts":"2016-10-05T11:10:42.350005", "pos":"00000000000000004300", "CUST_CODE":"BILL", "ORDER_DATE":"1995-12-31:15:00:00", "PRODUCT_CODE":"CAR", "ORDER_ID":765, "PRODUCT_PRICE":14000.00 }

Delete

{ "table":"QASOURCE.TCUSTORD", "op_type":"D", "op_ts":"2015-11-05 18:45:39.000000", "current_ts":"2016-10-05T11:10:42.351002", "pos":"00000000000000005272", "CUST_CODE":"DAVE", "ORDER_DATE":"1993-11-03:07:51:35", "PRODUCT_CODE":"PLANE", "ORDER_ID":600, "PRODUCT_PRICE":135000.00, "PRODUCT_AMOUNT":2, "TRANSACTION_ID":200 }


Truncate

{ "table":"QASOURCE.TCUSTORD", "op_type":"T", "op_ts":"2015-11-05 18:45:39.000000", "current_ts":"2016-10-05T11:10:42.351003", "pos":"00000000000000005480", }

26.3.4.4 Sample Primary Key Output JSON Message

{ "table":"DDL_OGGSRC.TCUSTMER", "op_type":"I", "op_ts":"2015-10-26 03:00:06.000000", "current_ts":"2016-04-05T08:59:23.001000", "pos":"00000000000000006605", "primary_keys":[ "CUST_CODE" ], "after":{ "CUST_CODE":"WILL", "NAME":"BG SOFTWARE CO.", "CITY":"SEATTLE", "STATE":"WA" } } 26.3.5 JSON Schemas

By default, JSON schemas are generated for each source table encountered. JSON schemas are generated on a just-in-time basis when an operation for that table is first encountered. Newer schemas are generated when there is a change in the metadata. A JSON schema is not required to parse a JSON object. However, many JSON parsers can use a JSON schema to perform a validating parse of a JSON object. Alternatively, you can review the JSON schemas to understand the layout of output JSON objects. By default, the JSON schemas are created in the GoldenGate_Home/dirdef directory and are named by the following convention: FULLY_QUALIFIED_TABLE_NAME.schema.json

Generation of the JSON schemas can be suppressed.
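For example, the following is a minimal properties sketch that either suppresses schema generation or keeps it enabled with versioned schemas written to a chosen directory; the handler name hdfs mirrors the sample configuration later in this chapter and the directory path is illustrative:

#Suppress JSON schema generation
gg.handler.hdfs.format.generateSchema=false

#Or keep generation enabled and version the schemas in a chosen directory
#gg.handler.hdfs.format.generateSchema=true
#gg.handler.hdfs.format.schemaDirectory=./dirdef
#gg.handler.hdfs.format.versionSchemas=true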

The following JSON schema example is for the JSON object listed in Sample Operation Modeled JSON Messages.

{ "$schema":"http://json-schema.org/draft-04/schema#", "title":"QASOURCE.TCUSTORD", "description":"JSON schema for table QASOURCE.TCUSTORD", "definitions":{ "row":{


"type":"object", "properties":{ "CUST_CODE":{ "type":[ "string", "null" ] }, "ORDER_DATE":{ "type":[ "string", "null" ] }, "PRODUCT_CODE":{ "type":[ "string", "null" ] }, "ORDER_ID":{ "type":[ "number", "null" ] }, "PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "TRANSACTION_ID":{ "type":[ "number", "null" ] } }, "additionalProperties":false }, "tokens":{ "type":"object", "description":"Token keys and values are free form key value pairs.", "properties":{ },


"additionalProperties":true } }, "type":"object", "properties":{ "table":{ "description":"The fully qualified table name", "type":"string" }, "op_type":{ "description":"The operation type", "type":"string" }, "op_ts":{ "description":"The operation timestamp", "type":"string" }, "current_ts":{ "description":"The current processing timestamp", "type":"string" }, "pos":{ "description":"The position of the operation in the data source", "type":"string" }, "primary_keys":{ "description":"Array of the primary key column names.", "type":"array", "items":{ "type":"string" }, "minItems":0, "uniqueItems":true }, "tokens":{ "$ref":"#/definitions/tokens" }, "before":{ "$ref":"#/definitions/row" }, "after":{ "$ref":"#/definitions/row" } }, "required":[ "table", "op_type", "op_ts", "current_ts", "pos" ],


"additionalProperties":false }

The following JSON schema example is for the JSON object listed in Sample Flattened Operation Modeled JSON Messages.

{ "$schema":"http://json-schema.org/draft-04/schema#", "title":"QASOURCE.TCUSTORD", "description":"JSON schema for table QASOURCE.TCUSTORD", "definitions":{ "tokens":{ "type":"object", "description":"Token keys and values are free form key value pairs.", "properties":{ }, "additionalProperties":true } }, "type":"object", "properties":{ "table":{ "description":"The fully qualified table name", "type":"string" }, "op_type":{ "description":"The operation type", "type":"string" }, "op_ts":{ "description":"The operation timestamp", "type":"string" }, "current_ts":{ "description":"The current processing timestamp", "type":"string" }, "pos":{ "description":"The position of the operation in the data source", "type":"string" }, "primary_keys":{ "description":"Array of the primary key column names.", "type":"array", "items":{ "type":"string" }, "minItems":0, "uniqueItems":true }, "tokens":{


"$ref":"#/definitions/tokens" }, "before.CUST_CODE":{ "type":[ "string", "null" ] }, "before.ORDER_DATE":{ "type":[ "string", "null" ] }, "before.PRODUCT_CODE":{ "type":[ "string", "null" ] }, "before.ORDER_ID":{ "type":[ "number", "null" ] }, "before.PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "before.PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "before.TRANSACTION_ID":{ "type":[ "number", "null" ] }, "after.CUST_CODE":{ "type":[ "string", "null" ] }, "after.ORDER_DATE":{ "type":[ "string",


"null" ] }, "after.PRODUCT_CODE":{ "type":[ "string", "null" ] }, "after.ORDER_ID":{ "type":[ "number", "null" ] }, "after.PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "after.PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "after.TRANSACTION_ID":{ "type":[ "number", "null" ] } }, "required":[ "table", "op_type", "op_ts", "current_ts", "pos" ], "additionalProperties":false }

The following JSON schema example is for the JSON object listed in Sample Row Modeled JSON Messages.

{ "$schema":"http://json-schema.org/draft-04/schema#", "title":"QASOURCE.TCUSTORD", "description":"JSON schema for table QASOURCE.TCUSTORD", "definitions":{ "tokens":{


"type":"object", "description":"Token keys and values are free form key value pairs.", "properties":{ }, "additionalProperties":true } }, "type":"object", "properties":{ "table":{ "description":"The fully qualified table name", "type":"string" }, "op_type":{ "description":"The operation type", "type":"string" }, "op_ts":{ "description":"The operation timestamp", "type":"string" }, "current_ts":{ "description":"The current processing timestamp", "type":"string" }, "pos":{ "description":"The position of the operation in the data source", "type":"string" }, "primary_keys":{ "description":"Array of the primary key column names.", "type":"array", "items":{ "type":"string" }, "minItems":0, "uniqueItems":true }, "tokens":{ "$ref":"#/definitions/tokens" }, "CUST_CODE":{ "type":[ "string", "null" ] }, "ORDER_DATE":{ "type":[ "string", "null"


] }, "PRODUCT_CODE":{ "type":[ "string", "null" ] }, "ORDER_ID":{ "type":[ "number", "null" ] }, "PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "TRANSACTION_ID":{ "type":[ "number", "null" ] } }, "required":[ "table", "op_type", "op_ts", "current_ts", "pos" ], "additionalProperties":false }


26.3.6 JSON Formatter Configuration Properties

Table 26-12 JSON Formatter Configuration Properties

gg.handler.name.format
Optional. Legal values: json | json_row. Default: None.
Controls whether the generated JSON output messages are operation modeled or row modeled. Set to json for operation modeled or json_row for row modeled.

gg.handler.name.format.insertOpKey
Optional. Legal values: Any string. Default: I.
Indicator to be inserted into the output record to indicate an insert operation.

gg.handler.name.format.updateOpKey
Optional. Legal values: Any string. Default: U.
Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.deleteOpKey
Optional. Legal values: Any string. Default: D.
Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.truncateOpKey
Optional. Legal values: Any string. Default: T.
Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.prettyPrint
Optional. Legal values: true | false. Default: false.
Controls the output format of the JSON data. True formats the data with white space for easy reading. False generates more compact output that is harder to read.

gg.handler.name.format.jsonDelimiter
Optional. Legal values: Any string. Default: "" (no value).
Inserts a delimiter between generated JSON documents so that they can be more easily parsed in a continuous stream of data. The configuration value supports CDATA[] wrapping.

gg.handler.name.format.generateSchema
Optional. Legal values: true | false. Default: true.
Controls the generation of JSON schemas for the generated JSON documents. JSON schemas are generated on a table-by-table basis. A JSON schema is not required to parse a JSON document. However, a JSON schema helps indicate what the JSON documents look like and can be used for a validating JSON parse.

gg.handler.name.format.schemaDirectory
Optional. Legal values: Any legal, existing file system path. Default: ./dirdef.
Controls the output location of generated JSON schemas.

gg.handler.name.format.treatAllColumnsAsStrings
Optional. Legal values: true | false. Default: false.
Controls the output typing of generated JSON documents. When false, the formatter attempts to map Oracle GoldenGate types to the corresponding JSON type. When true, all data is treated as strings in the generated JSONs and JSON schemas.



gg.handler.name.format.encoding
Optional. Legal values: Any legal encoding name or alias supported by Java. Default: UTF-8 (the JSON default).
Controls the output encoding of generated JSON schemas and documents.

gg.handler.name.format.versionSchemas
Optional. Legal values: true | false. Default: false.
Controls the versioning of created schemas. Schema versioning creates a schema with a timestamp in the schema directory on the local file system every time a new schema is created. True enables schema versioning; false disables it.

gg.handler.name.format.iso8601Format
Optional. Legal values: true | false. Default: true.
Controls the format of the current timestamp. The default is the ISO 8601 format. A setting of false removes the "T" between the date and time in the current timestamp and outputs a single space (" ") instead.

gg.handler.name.format.flatten
Optional. Legal values: true | false. Default: false.
Controls sending flattened JSON formatted data to the target entity. Must be set to true for the flattenDelimiter property to work. This property is applicable only to operation formatted JSON (gg.handler.name.format=json).

gg.handler.name.format.flattenDelimiter
Optional. Legal values: Any legal character or character string for a JSON field name. Default: . (period).
Controls the delimiter for concatenated JSON element names. This property supports CDATA[] wrapping to preserve whitespace. It is only relevant when gg.handler.name.format.flatten is set to true.

gg.handler.name.format.beforeObjectName
Optional. Legal values: Any legal character or character string for a JSON field name. Default: before.
Allows you to rename the JSON element that contains the before-change column values. This property is only applicable to operation formatted JSON (gg.handler.name.format=json).

gg.handler.name.format.afterObjectName
Optional. Legal values: Any legal character or character string for a JSON field name. Default: after.
Allows you to rename the JSON element that contains the after-change column values. This property is only applicable to operation formatted JSON (gg.handler.name.format=json).



gg.handler.name.format.pkUpdateHandling
Optional. Legal values: abend | update | delete-insert. Default: abend.
Specifies how the formatter handles update operations that change a primary key. Primary key operations can be problematic for the JSON formatter and require special consideration. This property is only applicable to row formatted JSON (gg.handler.name.format=json_row).
• abend: the process terminates.
• update: the process handles the operation as a normal update.
• delete-insert: the process handles the operation as a delete and an insert. Full supplemental logging must be enabled. Without full before and after row images, the insert data will be incomplete.

gg.handler.name.format.omitNullValues
Optional. Legal values: true | false. Default: false.
Set to true to omit fields that have null values from the generated JSON output.

26.3.7 Review a Sample Configuration

The following is a sample configuration for the JSON Formatter in the Java Adapter configuration file:

gg.handler.hdfs.format=json
gg.handler.hdfs.format.insertOpKey=I
gg.handler.hdfs.format.updateOpKey=U
gg.handler.hdfs.format.deleteOpKey=D
gg.handler.hdfs.format.truncateOpKey=T
gg.handler.hdfs.format.prettyPrint=false
gg.handler.hdfs.format.jsonDelimiter=CDATA[]
gg.handler.hdfs.format.generateSchema=true
gg.handler.hdfs.format.schemaDirectory=dirdef
gg.handler.hdfs.format.treatAllColumnsAsStrings=false

26.3.8 Metadata Change Events

Metadata change events are handled at runtime. When metadata is changed in a table, the JSON schema is regenerated the next time an operation for the table is encountered. The content of created JSON messages changes to reflect the metadata change. For example, if an additional column is added, the new column is included in created JSON messages after the metadata change event.


26.3.9 Setting Metacolumn Output

The following are the configurable values for the JSON formatter metacolumns template property that controls metacolumn output:

Table 26-13 Metacolumns Template Property

gg.handler.name.format.metaColumnsTemplate
Optional. Legal values: ${alltokens} | ${token} | ${env} | ${sys} | ${javaprop} | ${optype} | ${position} | ${timestamp} | ${catalog} | ${schema} | ${table} | ${objectname} | ${csn} | ${xid} | ${currenttimestamp} | ${opseqno} | ${timestampmicro} | ${currenttimestampmicro} | ${txind} | ${primarykeycolumns} | ${currenttimestampiso8601} | ${static} | ${seqno} | ${rba}. Default: None.
The current metacolumn information can be configured in a simple manner and removes the explicit need to use: insertOpKey | updateOpKey | deleteOpKey | truncateOpKey | includeTableName | includeOpTimestamp | includeOpType | includePosition | includeCurrentTimestamp | useIso8601Format.
It is a comma-delimited string consisting of one or more templated values that represent the template.

This is an example that would produce a list of metacolumns: ${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}
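Expressed as a handler property, that template would look like the following sketch; the handler name hdfs is illustrative and mirrors the sample configuration in Review a Sample Configuration:

gg.handler.hdfs.format.metaColumnsTemplate=${optype},${token.ROWID},${sys.username},${currenttimestamp}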

Explanation of the Metacolumn Keywords The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:


${keyword[fieldName].argument} The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used. The argument is required to resolve the metacolumn value.

${alltokens} All of the Oracle GoldenGate tokens.

${token} The value of a specific Oracle GoldenGate token. The token key should follow token using the period (.) operator. For example: ${token.MYTOKEN}

${sys} A system environment variable. The variable name should follow sys using the period (.) operator. For example: ${sys.MYVAR}

${env} An Oracle GoldenGate environment variable. The variable name should follow env using the period (.) operator. For example: ${env.someVariable}

${javaprop} A Java JVM variable. The variable name should follow javaprop using the period (.) operator. For example: ${javaprop.MYVAR}

${optype} Operation type.

${position} Record position.

${timestamp} Record timestamp.

${catalog} Catalog name.

${schema} Schema name.

${table} Table name.

${objectname} The fully qualified table name.

${csn} Source Commit Sequence Number.


${xid} Source transaction ID.

${currenttimestamp} Current timestamp.

${currenttimestampiso8601} Current timestamp in ISO 8601 format.

${opseqno} Record sequence number within the transaction.

${timestampmicro} Record timestamp in microseconds after epoch.

${currenttimestampmicro} Current timestamp in microseconds after epoch.

${txind} This is the transactional indicator from the source trail file. The values for a transaction are B for the first operation, M for the middle operations, E for the last operation, or W for whole if there is only one operation. Filtering operations or the use of coordinated apply negate the usefulness of this field.

${primarykeycolumns} Use to inject a field with a list of the primary key column names.

${static} Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is abc, then the syntax would be ${static.abc} or ${static[FieldName].abc}.

${seqno} Use to inject a field with the trail file sequence into the output.

${rba} Use to inject a field with the rba of the operation into the output.

Sample Configuration:

gg.handlerlist=kafkarestproxy

#The handler properties
gg.handler.kafkarestproxy.type=kafkarestproxy
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkarestproxy.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkarestproxy.keyMappingTemplate=${primaryKeys}
gg.handler.kafkarestproxy.postDataUrl=http://localhost:8083
gg.handler.kafkarestproxy.apiVersion=v1
gg.handler.kafkarestproxy.format=json
gg.handler.kafkarestproxy.payloadsize=1


gg.handler.kafkarestproxy.mode=tx

#Server auth properties
#gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
#gg.handler.kafkarestproxy.trustStorePassword=test1234
#Client auth properties
#gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
#gg.handler.kafkarestproxy.keyStorePassword=test1234

#Proxy properties
#gg.handler.kafkarestproxy.proxy=http://proxyurl:80
#gg.handler.kafkarestproxy.proxyUserName=username
#gg.handler.kafkarestproxy.proxyPassword=password

#The MetaColumnTemplate formatter properties
gg.handler.kafkarestproxy.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro}

26.3.10 JSON Primary Key Updates

When the JSON formatter is configured to model operation data, primary key updates require no special treatment and are treated like any other update. The before and after values reflect the change in the primary key. When the JSON formatter is configured to model row data, primary key updates must be specially handled. The default behavior is to abend. However, by using the gg.handler.name.format.pkUpdateHandling configuration property, you can configure the JSON formatter in row modeled mode to treat primary key updates as either a regular update or as a delete followed by an insert. When you configure the formatter to handle primary key updates as delete and insert operations, Oracle recommends that you configure your replication stream to contain the complete before-image and after-image data for updates. Otherwise, the generated insert operation for a primary key update will be missing data for fields that did not change.
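For example, a minimal sketch of a row modeled configuration that treats primary key updates as a delete followed by an insert; the handler name hdfs is illustrative:

gg.handler.hdfs.format=json_row
gg.handler.hdfs.format.pkUpdateHandling=delete-insert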

26.3.11 Integrating Oracle Stream Analytics

You can integrate Oracle GoldenGate for Big Data with Oracle Stream Analytics (OSA) by sending operation-modeled JSON messages to the Kafka Handler. This works only when the JSON formatter is configured to output operation-modeled JSON messages. Because OSA requires flattened JSON objects, the JSON formatter can generate flattened JSONs. To use this feature, set gg.handler.name.format.flatten to true (the default is false). The following is an example of a flattened JSON message; a configuration sketch follows the example:

{ "table":"QASOURCE.TCUSTMER", "op_type":"U", "op_ts":"2015-11-05 18:45:39.000000", "current_ts":"2016-06-22T13:38:45.335001", "pos":"00000000000000005100", "before.CUST_CODE":"ANN", "before.NAME":"ANN©S BOATS", "before.CITY":"SEATTLE",


"before.STATE":"WA", "after.CUST_CODE":"ANN", "after.CITY":"NEW YORK", "after.STATE":"NY" }

26.4 Using the Length Delimited Value Formatter

The Length Delimited Value (LDV) Formatter is a row-based formatter. It formats database operations from the source trail file into a length delimited value output. Each insert, update, delete, or truncate operation from the source trail is formatted into an individual length delimited message. With the length delimited format, there are no field delimiters; the fields are variable in size based on the data. By default, the formatter maps the column value states into the length delimited value output as follows. Column values for an operation from the source trail file can have one of three states:
• Column has a value: The column value is output with the prefix indicator P.
• Column value is NULL: The default output value is N. The output for the case of a NULL column value is configurable.
• Column value is missing: The default output value is M. The output for the case of a missing column value is configurable.

• Formatting Message Details
• Sample Formatted Messages
• LDV Formatter Configuration Properties
• Additional Considerations

26.4.1 Formatting Message Details

The default output consists of the record length, followed by the metadata fields, followed by the row data.


26.4.2 Sample Formatted Messages

Insert Message:
0133P01IP161446749136000000P161529311765024000P262015-11-05 18:45:36.000000P04WILLP191994-09-30 15:33:00P03CARP03144P0817520.00P013P03100
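Read with the sample LDV configuration shown later in this section (a metaColumnsTemplate of ${optype},${timestampmicro},${currenttimestampmicro},${timestamp}, a 4-character ASCII record length, and a 2-character ASCII field length), the insert message above can be broken down as follows; the mapping of the trailing fields to the TCUSTORD columns is inferred from the other samples in this chapter:

0133                            record length (133 bytes of data follow)
P 01 I                          ${optype}
P 16 1446749136000000           ${timestampmicro}
P 16 1529311765024000           ${currenttimestampmicro}
P 26 2015-11-05 18:45:36.000000 ${timestamp}
P 04 WILL                       CUST_CODE
P 19 1994-09-30 15:33:00        ORDER_DATE
P 03 CAR                        PRODUCT_CODE
P 03 144                        ORDER_ID
P 08 17520.00                   PRODUCT_PRICE
P 01 3                          PRODUCT_AMOUNT
P 03 100                        TRANSACTION_ID

Each field consists of the P (present) indicator, a two-character field length, and the field data.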

Update Message:
0133P01UP161446749139000000P161529311765035000P262015-11-05 18:45:39.000000P04BILLP191995-12-31 15:00:00P03CARP03765P0814000.00P013P03100

Delete Message:
0136P01DP161446749139000000P161529311765038000P262015-11-05 18:45:39.000000P04DAVEP191993-11-03 07:51:35P05PLANEP03600P09135000.00P012P03200

26.4.3 LDV Formatter Configuration Properties

Table 26-14 LDV Formatter Configuration Properties

gg.handler.name.format.binaryLengthMode
Optional. Legal values: true | false. Default: false.
Controls whether the field or record length is output in binary or ASCII format. If set to true, the record or field length is represented in binary format, otherwise in ASCII.

gg.handler.name.format.recordLength
Optional. Legal values: 4 | 8. Default: true.
Set to true, the record length is represented using either a 4 or 8-byte big endian integer. Set to false, the string representation of the record length, padded to the configured length of 4 or 8, is used.

gg.handler.name.format.fieldLength
Optional. Legal values: 2 | 4. Default: true.
Set to true, the field length is represented using either a 2 or 4-byte big endian integer. Set to false, the string representation of the field length, padded to the configured length of 2 or 4, is used.

gg.handler.name.format.format
Optional. Legal values: true | false. Default: true.
Use to configure the P indicator with the metacolumns. Set to false, enables the indicator P before the metacolumns. If set to true, disables the indicator.

gg.handler.name.format.presentValue
Optional. Legal values: Any string. Default: P.
Use to configure what is included in the output when a column value is present. This value supports CDATA[] wrapping.

gg.handler.name.format.missingValue
Optional. Legal values: Any string. Default: M.
Use to configure what is included in the output when a column value is missing. This value supports CDATA[] wrapping.



gg.handler.name.format.nullValue
Optional. Legal values: Any string. Default: N.
Use to configure what is included in the output when a NULL value is present. This value supports CDATA[] wrapping.

gg.handler.name.format.metaColumnsTemplate
Optional. Legal values: ${alltokens}, ${token}, ${env}, ${sys}, ${javaprop}, ${optype}, ${position}, ${timestamp}, ${catalog}, ${schema}, ${table}, ${objectname}, ${csn}, ${xid}, ${currenttimestamp}, ${opseqno}, ${timestampmicro}, ${currenttimestampmicro}. Default: None.
Use to configure the current metacolumn information in a simple manner and removes the explicit need for insertOpKey, updateOpKey, deleteOpKey, truncateOpKey, includeTableName, includeOpTimestamp, includeOpType, includePosition, includeCurrentTimestamp, and useIso8601Format.
A comma-delimited string consisting of one or more templated values represents the template. This example produces a list of metacolumns:
${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}



gg.handler.name.format.pkUpdateHandling
Optional. Legal values: abend | update | delete-insert. Default: abend.
Specifies how the formatter handles update operations that change a primary key. Primary key operations can be problematic for the text formatter and require special consideration by you.
• abend: indicates the process will abend.
• update: indicates the process will treat this as a normal update.
• delete-insert: indicates the process handles this as a delete and an insert. Full supplemental logging must be enabled for this to work. Without full before and after row images, the insert data will be incomplete.

gg.handler.name.format.encoding
Optional. Legal values: Any encoding name or alias supported by Java. Default: The native system encoding of the machine hosting the Oracle GoldenGate process.
Use to set the output encoding for character data and columns.

This is an example that would produce a list of metacolumns: ${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}

Explanation of the Metacolumn Keywords The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:

${keyword[fieldName].argument} The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used. The argument is required to resolve the metacolumn value.

${alltokens} All of the Oracle GoldenGate tokens.


${token} The value of a specific Oracle GoldenGate token. The token key should follow token using the period (.) operator. For example: ${token.MYTOKEN}

${sys} A system environment variable. The variable name should follow sys using the period (.) operator. For example: ${sys.MYVAR}

${env} An Oracle GoldenGate environment variable. The variable name should follow env using the period (.) operator. For example: ${env.someVariable}

${javaprop} A Java JVM variable. The variable name should follow javaprop using the period (.) operator. For example: ${javaprop.MYVAR}

${optype} Operation type.

${position} Record position.

${timestamp} Record timestamp.

${catalog} Catalog name.

${schema} Schema name.

${table} Table name.

${objectname} The fully qualified table name.

${csn} Source Commit Sequence Number.

${xid} Source transaction ID.

${currenttimestamp} Current timestamp.

${currenttimestampiso8601} Current timestamp in ISO 8601 format.


${opseqno} Record sequence number within the transaction.

${timestampmicro} Record timestamp in microseconds after epoch.

${currenttimestampmicro} Current timestamp in microseconds after epoch.

${txind} This is the transactional indicator from the source trail file. The values for a transaction are B for the first operation, M for the middle operations, E for the last operation, or W for whole if there is only one operation. Filtering operations or the use of coordinated apply negate the usefulness of this field.

${primarykeycolumns} Use to inject a field with a list of the primary key column names.

${static} Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is abc, then the syntax would be ${static.abc} or ${static[FieldName].abc}.

${seqno} Use to inject a field with the trail file sequence into the output.

${rba} Use to inject a field with the rba of the operation into the output.

Review a Sample Configuration

#The LDV Handler
gg.handler.filewriter.format=binary
gg.handler.filewriter.format.binaryLengthMode=false
gg.handler.filewriter.format.recordLength=4
gg.handler.filewriter.format.fieldLength=2
gg.handler.filewriter.format.legacyFormat=false
gg.handler.filewriter.format.presentValue=CDATA[P]
gg.handler.filewriter.format.missingValue=CDATA[M]
gg.handler.filewriter.format.nullValue=CDATA[N]
gg.handler.filewriter.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro},${timestamp}
gg.handler.filewriter.format.pkUpdateHandling=abend

26.4.4 Additional Considerations

Big Data applications differ from RDBMSs in how data is stored. Update and delete operations in an RDBMS result in a change to the existing data. Data is not changed in Big Data applications, it is simply appended to existing data. The current state of a given row becomes a consolidation of all of the existing operations for that row in the HDFS system.

Primary Key Updates
Primary key update operations require special consideration and planning for Big Data integrations. Primary key updates are update operations that modify one or more of the primary keys for the given row from the source database. Since data is simply appended in Big Data applications, a primary key update operation looks more like a new insert than an update without any special handling.


The Length Delimited Value Formatter provides specialized, configurable handling for primary key updates. These are the configurable behaviors:

Table 26-15 Primary Key Update Behaviors

Abend
The default behavior is that the length delimited value formatter abends in the case of a primary key update.

Update
With this configuration, the primary key update is treated just like any other update operation. This configuration alternative should only be selected if you can guarantee that the primary key that is being changed is not being used as the selection criteria when selecting row data from a Big Data system.

Delete-Insert
Using this configuration, the primary key update is treated as a special case of a delete using the before-image data and an insert using the after-image data. This configuration may more accurately model the effect of a primary key update in a Big Data application. However, if this configuration is selected, it is important to have full supplemental logging enabled on replication at the source database. Without full supplemental logging, the delete operation will be correct, but the insert operation will not contain all of the data for all of the columns for a full representation of the row data in the Big Data application.

Consolidating Data
Big Data applications simply append data to the underlying storage. Typically, analytic tools spawn MapReduce programs that traverse the data files and consolidate all of the operations for a given row into a single output. It is important to have an indicator of the order of operations. The Length Delimited Value Formatter provides a number of metadata fields to fulfill this need. The operation timestamp may be sufficient to fulfill this requirement. However, two update operations may have the same operation timestamp, especially if they share a common transaction. The trail position can provide a tie-breaking field on the operation timestamp. Lastly, the current timestamp may provide the best indicator of the order of operations in Big Data.
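For example, assuming the filewriter handler from the sample LDV configuration shown earlier, a metacolumns template that carries the operation timestamp, the trail position, and the current timestamp as ordering indicators might look like this sketch:

gg.handler.filewriter.format.metaColumnsTemplate=${optype},${timestamp},${position},${currenttimestamp}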

26.5 Using Operation-Based versus Row-Based Formatting

The Oracle GoldenGate for Big Data formatters include operation-based and row-based formatters. The operation-based formatters represent the individual insert, update, and delete events that occur on table data in the source database. Insert operations only provide after-change data (or images), because a new row is being added to the source database. Update operations provide both before-change and after-change data that shows how existing row data is modified. Delete operations only provide before-change data to identify the row being deleted. The operation-based formatters model the operation as it exists in the source trail file. Operation-based formats include fields for the before-change and after-change images. The row-based formatters model the row data as it exists after the operation data is applied. Row-based formatters contain only a single image of the data. The following sections describe what data is displayed for both the operation-based and the row-based formatters.
• Operation Formatters


• Row Formatters
• Table Row or Column Value States

26.5.1 Operation Formatters

The formatters that support operation-based formatting are JSON, Avro Operation, and XML. The output of the operation-based formatters is as follows:
• Insert operation: Before-image data is null. After-image data is output.
• Update operation: Both before-image and after-image data is output.
• Delete operation: Before-image data is output. After-image data is null.
• Truncate operation: Both before-image and after-image data is null.

26.5.2 Row Formatters

The formatters that support row-based formatting are Delimited Text and Avro Row. Row-based formatters output the following information for the following operations:
• Insert operation: After-image data only.
• Update operation: After-image data only. Primary key updates are a special case, which is discussed in the individual sections for the specific formatters.
• Delete operation: Before-image data only.
• Truncate operation: The table name is provided, but both before-image and after-image data are null. Truncate table is a DDL operation, and it may not be supported by all database implementations. Refer to the Oracle GoldenGate documentation for your database implementation.

26.5.3 Table Row or Column Value States

In an RDBMS, table data for a specific row and column can only have one of two states: either the data has a value, or it is null. However, when data is transferred to the Oracle GoldenGate trail file by the Oracle GoldenGate capture process, the data can have three possible states: it can have a value, it can be null, or it can be missing. For an insert operation, the after-image contains data for all column values regardless of whether the data is null. However, the data included for update and delete operations may not always contain complete data for all columns. When replicating data to an RDBMS, for an update operation only the primary key values and the values of the columns that changed are required to modify the data in the target database. In addition, only the primary key values are required to delete the row from the target database. Therefore, even though values are present in the source database, the values may be missing in the source trail file. Because data in the source trail file may have three states, the pluggable formatters must also be able to represent data in all three states. Because the row and column data in the Oracle GoldenGate trail file has an important effect on a Big Data integration, it is important to understand the data that is required. Typically, you can control the data that is included for operations in the Oracle GoldenGate trail file. In an Oracle database, this data is controlled by the supplemental logging level. To understand how to control the row and column values that are included in the Oracle GoldenGate trail file, see the Oracle GoldenGate documentation for your source database implementation.


26.6 Using the XML Formatter

The XML Formatter formats before-image and after-image data from the source trail file into an XML document representation of the operation data. The format of the XML document is effectively the same as the XML format in the previous releases of the Oracle GoldenGate Java Adapter.
• Message Formatting Details
• Sample XML Messages
• XML Schema
• XML Formatter Configuration Properties
• Review a Sample Configuration
• Metadata Change Events
• Setting Metacolumn Output
• Primary Key Updates

26.6.1 Message Formatting Details

The XML formatted messages contain the following information:

Table 26-16 XML formatting details

table
The fully qualified table name.

type
The operation type.

current_ts
The current timestamp is the time when the formatter processed the current operation record. This timestamp follows the ISO 8601 format and includes microsecond precision. Replaying the trail file does not result in the same timestamp for the same operation.

pos
The position from the source trail file.

numCols
The total number of columns in the source table.

col
The col element is a repeating element that contains the before and after images of operation data.

tokens
The tokens element contains the token values from the source trail file.

26.6.2 Sample XML Messages

The following sections provide sample XML messages. • Sample Insert Message • Sample Update Message • Sample Delete Message • Sample Truncate Message


26.6.2.1 Sample Insert Message
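The following is only an illustrative sketch of an operation-modeled insert message, constructed from the elements listed in Table 26-16 (table, type, current_ts, pos, numCols, col, tokens) and the TCUSTORD sample data used elsewhere in this chapter; the wrapper element name, attribute placement, and column element layout are assumptions, and the generated XSD is the authoritative definition of the actual output.

<operation table='QASOURCE.TCUSTORD' type='I'
           current_ts='2016-10-05T11:10:42.294000'
           pos='00000000000000002928' numCols='7'>
  <col name='CUST_CODE' index='0'>
    <before missing='true'/>
    <after><![CDATA[WILL]]></after>
  </col>
  <col name='PRODUCT_CODE' index='2'>
    <before missing='true'/>
    <after><![CDATA[CAR]]></after>
  </col>
  <!-- remaining col elements and the tokens element omitted for brevity -->
</operation>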

26.6.2.2 Sample Update Message


26.6.2.3 Sample Delete Message


26.6.2.4 Sample Truncate Message

26.6.3 XML Schema

The XML Formatter does not generate a separate XML schema (XSD) for each table. A single XSD applies to all messages generated by the XML Formatter. The following XSD defines the structure of the XML documents that are generated by the XML Formatter.


26.6.4 XML Formatter Configuration Properties

Table 26-17 XML Formatter Configuration Properties

gg.handler.name.format.insertOpKey
Optional. Legal values: Any string. Default: I.
Indicator to be inserted into the output record to indicate an insert operation.



gg.handler.name.format.updateOpKey
Optional. Legal values: Any string. Default: U.
Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.deleteOpKey
Optional. Legal values: Any string. Default: D.
Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.truncateOpKey
Optional. Legal values: Any string. Default: T.
Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.encoding
Optional. Legal values: Any legal encoding name or alias supported by Java. Default: UTF-8 (the XML default).
The output encoding of generated XML documents.

gg.handler.name.format.includeProlog
Optional. Legal values: true | false. Default: false.
Determines whether an XML prolog is included in generated XML documents. An XML prolog is optional for well-formed XML. An XML prolog resembles the following: <?xml version="1.0" encoding="UTF-8"?>

gg.handler.name.format.iso8601Format
Optional. Legal values: true | false. Default: true.
Controls the format of the current timestamp in the XML message. The default adds a T between the date and time. Set to false to suppress the T between the date and time and instead include blank space.

gg.handler.name.format.missing
Optional. Legal values: true | false. Default: true.
Set to true, the XML output displays the missing column values of the before and after images.

gg.handler.name.format.missingAfter
Optional. Legal values: true | false. Default: true.
Set to true, the XML output displays the missing column values of the after image.

gg.handler.name.format.missingBefore
Optional. Legal values: true | false. Default: true.
Set to true, the XML output displays the missing column values of the before image.

26.6.5 Review a Sample Configuration

The following is a sample configuration for the XML Formatter in the Java Adapter properties file:

gg.handler.hdfs.format=xml
gg.handler.hdfs.format.insertOpKey=I


gg.handler.hdfs.format.updateOpKey=U
gg.handler.hdfs.format.deleteOpKey=D
gg.handler.hdfs.format.truncateOpKey=T
gg.handler.hdfs.format.encoding=ISO-8859-1
gg.handler.hdfs.format.includeProlog=false

26.6.6 Metadata Change Events

The XML Formatter seamlessly handles metadata change events. A metadata change event does not result in a change to the XML schema. The XML schema is designed to be generic so that the same schema represents the data of any operation from any table. If the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events, the XML Formatter can take action when metadata changes. Changes in the metadata are reflected in messages after the change. For example, when a column is added, the new column data appears in XML messages for the table.

26.6.7 Setting Metacolumn Output

The following are the configurable values for the XML metacolumns template property that controls metacolumn output:


Table 26-18 Metacolumns Template Property

gg.handler.name.format.metaColumnsTemplate
Optional. Legal values: ${alltokens} | ${token} | ${env} | ${sys} | ${javaprop} | ${optype} | ${position} | ${timestamp} | ${catalog} | ${schema} | ${table} | ${objectname} | ${csn} | ${xid} | ${currenttimestamp} | ${opseqno} | ${timestampmicro} | ${currenttimestampmicro} | ${txind} | ${primarykeycolumns} | ${currenttimestampiso8601} | ${static} | ${seqno} | ${rba}. Default: None.
The current metacolumn information can be configured in a simple manner and removes the explicit need to use: insertOpKey | updateOpKey | deleteOpKey | truncateOpKey | includeTableName | includeOpTimestamp | includeOpType | includePosition | includeCurrentTimestamp | useIso8601Format.
It is a comma-delimited string consisting of one or more templated values that represent the template.

This is an example that would produce a list of metacolumns: ${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}

Explanation of the Metacolumn Keywords The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:

${keyword[fieldName].argument} The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used. The argument is required to resolve the metacolumn value.

${alltokens} All of the Oracle GoldenGate tokens.


${token} The value of a specific Oracle GoldenGate token. The token key should follow token using the period (.) operator. For example: ${token.MYTOKEN}

${sys} A system environment variable. The variable name should follow sys using the period (.) operator. For example: ${sys.MYVAR}

${env} An Oracle GoldenGate environment variable. The variable name should follow env using the period (.) operator. For example: ${env.someVariable}

${javaprop} A Java JVM variable. The variable name should follow javaprop using the period (.) operator. For example: ${javaprop.MYVAR}

${optype} Operation type.

${position} Record position.

${timestamp} Record timestamp.

${catalog} Catalog name.

${schema} Schema name.

${table} Table name.

${objectname} The fully qualified table name.

${csn} Source Commit Sequence Number.

${xid} Source transaction ID.

${currenttimestamp} Current timestamp.

${currenttimestampiso8601} Current timestamp in ISO 8601 format.


${opseqno} Record sequence number within the transaction.

${timestampmicro} Record timestamp in microseconds after epoch.

${currenttimestampmicro} Current timestamp in microseconds after epoch.

${txind} This is the transactional indicator from the source trail file. The values for a transaction are B for the first operation, M for the middle operations, E for the last operation, or W for whole if there is only one operation. Filtering operations or the use of coordinated apply negate the usefulness of this field.

${primarykeycolumns} Use to inject a field with a list of the primary key column names.

${static} Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is abc, then the syntax would be ${static.abc} or ${static[FieldName].abc}.

${seqno} Use to inject a field with the trail file sequence into the output.

${rba} Use to inject a field with the rba of the operation into the output.

Sample Configuration:

gg.handlerlist=kafkarestproxy

#The handler properties
gg.handler.kafkarestproxy.type=kafkarestproxy
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkarestproxy.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkarestproxy.keyMappingTemplate=${primaryKeys}
gg.handler.kafkarestproxy.postDataUrl=http://localhost:8083
gg.handler.kafkarestproxy.apiVersion=v1
gg.handler.kafkarestproxy.format=json
gg.handler.kafkarestproxy.payloadsize=1
gg.handler.kafkarestproxy.mode=tx

#Server auth properties
#gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
#gg.handler.kafkarestproxy.trustStorePassword=test1234
#Client auth properties
#gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
#gg.handler.kafkarestproxy.keyStorePassword=test1234


#Proxy properties
#gg.handler.kafkarestproxy.proxy=http://proxyurl:80
#gg.handler.kafkarestproxy.proxyUserName=username
#gg.handler.kafkarestproxy.proxyPassword=password

#The MetaColumnTemplate formatter properties
gg.handler.kafkarestproxy.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro}

26.6.8 Primary Key Updates

Updates to a primary key require no special handling by the XML formatter. The XML formatter creates messages that model database operations. For update operations, this includes before and after images of column values. Primary key changes are represented in this format as a change to a column value just like a change to any other column value.

27 Using Oracle GoldenGate Capture for Cassandra

The Oracle GoldenGate capture (Extract) for Cassandra is used to get changes from Apache Cassandra databases. This chapter describes how to use the Oracle GoldenGate Capture for Cassandra.
• Overview
• Setting Up Cassandra Change Data Capture
• Deduplication
• Topology Changes
• Data Availability in the CDC Logs
• Using Extract Initial Load
• Using Change Data Capture Extract
• Replicating to RDBMS Targets
• Partition Update or Insert of Static Columns
• Partition Delete
• Security and Authentication
• Cleanup of CDC Commit Log Files: You can use the Cassandra CDC commit log purger program to purge the CDC commit log files that are not in use.
• Multiple Extract Support
• CDC Configuration Reference
• Troubleshooting

27.1 Overview

Apache Cassandra is a NoSQL Database Management System designed to store large amounts of data. A Cassandra cluster configuration provides horizontal scaling and replication of data across multiple machines. It can provide high availability and eliminate a single point of failure by replicating data to multiple nodes within a Cassandra cluster. Apache Cassandra is open source and designed to run on low-cost commodity hardware. Cassandra relaxes the axioms of a traditional relational database management system (RDBMS) regarding atomicity, consistency, isolation, and durability. When considering implementing Cassandra, it is important to understand its differences from a traditional RDBMS and how those differences affect your specific use case. Cassandra provides eventual consistency. Under the eventual consistency model, accessing the state of data for a specific row eventually returns the latest state of the data for that row as defined by the most recent change.


However, there may be a latency period between the creation or modification of the state of a row and what is returned when the state of that row is queried. The benefit of eventual consistency is that the latency period can be predicted based on your Cassandra configuration and the workload that your Cassandra cluster is currently under. For more information, see http://cassandra.apache.org/. For data type support, see About the Cassandra Data Types.

27.2 Setting Up Cassandra Change Data Capture

Prerequisites
• Apache Cassandra cluster must have at least one node up and running.
• Read and write access to CDC commit log files on every live node in the cluster is done through SFTP or NFS. For more information, see Setup SSH Connection to the Cassandra Nodes.
• Every node in the Cassandra cluster must have the cdc_enabled parameter set to true in the cassandra.yaml configuration file.
• Virtual nodes must be enabled on every Cassandra node by setting the num_tokens parameter in cassandra.yaml.
• You must download and provide the third party libraries listed in Cassandra Capture Client Dependencies.
• New tables can be created with Change Data Capture (CDC) enabled using the WITH CDC=true clause in the CREATE TABLE command. For example:
CREATE TABLE ks_demo_rep1.mytable (col1 int, col2 text, col3 text, col4 text, PRIMARY KEY (col1)) WITH cdc=true;

You can enable CDC on existing tables as follows: ALTER TABLE ks_demo_rep1.mytable WITH cdc=true;

• Setup SSH Connection to the Cassandra Nodes
Oracle GoldenGate for Big Data transfers Cassandra commit log files from all the Cassandra nodes. To allow Oracle GoldenGate to transfer commit log files using secure shell protocol (SFTP), generate a known_hosts SSH file.
• Data Types
• Cassandra Database Operations

27.2.1 Setup SSH Connection to the Cassandra Nodes

Oracle GoldenGate for Big Data transfers Cassandra commit log files from all the Cassandra nodes. To allow Oracle GoldenGate to transfer commit log files using secure shell protocol (SFTP), generate a known_hosts SSH file.

To generate a known_hosts SSH file:
1. Create a text file with all the Cassandra node addresses, one per line. For example:
cat nodes.txt
10.1.1.1


10.1.1.2
10.1.1.3

2. Generate the known_hosts file as follows:
ssh-keyscan -t rsa -f nodes.txt >> known_hosts

3. Edit the Extract parameter file to include this configuration:
TRANLOGOPTIONS SFTP KNOWNHOSTSFILE /path/to/ssh/known_hosts

27.2.2 Data Types

Supported Cassandra Data Types The following are the supported data types: • ASCII • BIGINT • BLOB • BOOLEAN • DATE • DECIMAL • DOUBLE • DURATION • FLOAT • INET • INT • SMALLINT • TEXT • TIME • TIMESTAMP • TIMEUUID • TINYINT • UUID • VARCHAR • VARINT

Unsupported Data Types The following are the unsupported data types: • COUNTER • MAP • SET • LIST


• UDT (user defined type)
• TUPLE
• CUSTOM_TYPE

27.2.3 Cassandra Database Operations

Supported Operations The following are the supported operations: • INSERT • UPDATE (Captured as INSERT) • DELETE

Unsupported Operations
The TRUNCATE operation and DDL operations (CREATE, ALTER, and DROP) are not supported. The Cassandra commit log files do not record any before images for UPDATE or DELETE operations, so the captured operations can never have a before image. Oracle GoldenGate features that rely on before image records, such as Conflict Detection and Resolution, are not available.

One of the features of a Cassandra cluster is its high availability. To support high availability, multiple redundant copies of table data are stored on different nodes in the cluster. Oracle GoldenGate for Big Data Cassandra Capture automatically filters out duplicate rows (deduplication). Deduplication is active by default. Oracle recommends using it if your data is captured and applied to targets where duplicate records are discouraged (for example, RDBMS targets).

27.4 Topology Changes

Cassandra nodes can change their status (topology change) and the cluster can still be alive. Oracle GoldenGate for Big Data Cassandra Capture can detect the node status changes and react to these changes when applicable. The Cassandra capture process can detect the following events happening in the cluster: • Node shutdown and boot. • Node decommission and commission. • New keyspace and table created. Due to topology changes, if the capture process detects that an active producer node goes down, it tries to recover any missing rows from an available replica node. During this process, there is a possibility of data duplication for some rows. This is a transient data duplication due to the topology change. For more details about reacting to changes in topology, see Troubleshooting.


27.5 Data Availability in the CDC Logs

The Cassandra CDC API can only read data from commit log files in the CDC directory. There is a latency for the data in the active commit log directory to be archived (moved) to the CDC commit log directory. The input data source for the Cassandra capture process is the CDC commit log directory. There could be delays for the data to be captured, mainly due to the commit log files not yet being visible to the capture process. On a production cluster with a lot of activity, this latency is very minimal as the data is archived from the active commit log directory to the CDC commit log directory in the order of microseconds.

27.6 Using Extract Initial Load

Cassandra Extract supports the standard initial load capability to extract source table data to Oracle GoldenGate trail files. Initial load for Cassandra can be performed to synchronize tables, either as a prerequisite step to replicating changes or as a standalone function. Direct loading from a source Cassandra table to any target table is not supported.

Configuring the Initial Load
You need to add these parameters to your GLOBALS parameter file:

OGGSOURCE CASSANDRA
CLUSTERCONTACTPOINTS nodeaddresses

For example, to write to a single trail file:

SOURCEISTABLE
SOURCEDB keyspace1, USERID user1, PASSWORD pass1
EXTFILE ./dirdat/load_data.dat, PURGE
TABLE keyspace1.table1;

Then you would run this command in GGSCI:

EXTRACT PARAMFILE ./dirprm/load.prm REPORTFILE ./dirrpt/load.rpt

If you want to write to multiple files, you could use:

EXTRACT load
SOURCEISTABLE
SOURCEDB keyspace1, USERID user1, PASSWORD pass1
EXTFILE ./dirdat/la, megabytes 2048, MAXFILES 999
TABLE keyspace1.table1;

Note:

Save the file with the name specified in the example (load.prm) into the dirprm directory.


Then you would run these commands in GGSCI:

ADD EXTRACT load, SOURCEISTABLE
START EXTRACT load

27.7 Using Change Data Capture Extract

Review the example .prm files in the Oracle GoldenGate for Big Data installation directory under $HOME/AdapterExamples/big-data/cassandracapture.

1. When adding the Cassandra Extract trail, you need to use EXTTRAIL to create a local trail file. The Cassandra Extract trail file should not be configured with the RMTTRAIL option.
ggsci> ADD EXTRACT groupname, TRANLOG
ggsci> ADD EXTTRAIL trailprefix, EXTRACT groupname
Example:
ggsci> ADD EXTRACT cass, TRANLOG
ggsci> ADD EXTTRAIL ./dirdat/z1, EXTRACT cass

2. To configure the Extract, see the example .prm files in the Oracle GoldenGate for Big Data installation directory in $HOME/AdapterExamples/big-data/cassandracapture.
3. Position the Extract.
ggsci> ADD EXTRACT groupname, TRANLOG, BEGIN NOW
ggsci> ADD EXTRACT groupname, TRANLOG, BEGIN 'yyyy-mm-dd hh:mm:ss'
ggsci> ALTER EXTRACT groupname, BEGIN 'yyyy-mm-dd hh:mm:ss'

4. Manage the transaction data logging for the tables.
ggsci> DBLOGIN SOURCEDB nodeaddress USERID userid PASSWORD password
ggsci> ADD TRANDATA keyspace.tablename
ggsci> INFO TRANDATA keyspace.tablename
ggsci> DELETE TRANDATA keyspace.tablename

Examples:
ggsci> DBLOGIN SOURCEDB 127.0.0.1
ggsci> INFO TRANDATA ks_demo_rep1.mytable
ggsci> INFO TRANDATA ks_demo_rep1.*
ggsci> INFO TRANDATA *.*
ggsci> INFO TRANDATA ks_demo_rep1."CamelCaseTab"
ggsci> ADD TRANDATA ks_demo_rep1.mytable
ggsci> DELETE TRANDATA ks_demo_rep1.mytable

5. Append the following line in the GLOBALS parameter file:
JVMBOOTOPTIONS -Dlogback.configurationFile=AdapterExamples/big-data/cassandracapture/logback.xml

6. Configure the Extract and GLOBALS parameter files:

Apache Cassandra 3.11 SDK, compatible with Apache Cassandra 3.9, 3.10, 3.11
Extract parameter file:


EXTRACT groupname
TRANLOGOPTIONS CDCREADERSDKVERSION 3.11
TRANLOGOPTIONS CDCLOGDIRTEMPLATE /path/to/data/cdc_raw
SOURCEDB nodeaddress
VAM libggbigdata_vam.so
EXTTRAIL trailprefix
TABLE *.*;

GLOBALS parameter file:
OGGSOURCE CASSANDRA
CLUSTERCONTACTPOINTS nodeaddresses
JVMCLASSPATH ggjava/ggjava.jar:/path/to/cassandra-driver-core/3.3.1/cassandra-driver-core-3.3.1.jar:dirprm:/path/to/apache-cassandra-3.11.0/lib/*:/path/to/gson/2.3/gson-2.3.jar:/path/to/jsch/0.1.54/jsch-0.1.54.jar:/path/to/commons-lang3/3.5/commons-lang3-3.5.jar

Oracle recommends that you use the latest Cassandra 3.11 JAR files (TRANLOGOPTIONS CDCREADERSDKVERSION 3.11 and JVMCLASSPATH configuration) for all supported Cassandra database versions.

Apache Cassandra 3.9 SDK
Extract parameter file:
EXTRACT groupname
TRANLOGOPTIONS CDCREADERSDKVERSION 3.9
TRANLOGOPTIONS CDCLOGDIRTEMPLATE /path/to/data/cdc_raw
SOURCEDB nodeaddress
VAM libggbigdata_vam.so
EXTTRAIL trailprefix
TABLE *.*;

GLOBALS parameter file:
OGGSOURCE CASSANDRA
CLUSTERCONTACTPOINTS nodeaddresses
JVMCLASSPATH ggjava/ggjava.jar:/path/to/cassandra-driver-core/3.3.1/cassandra-driver-core-3.3.1.jar:dirprm:/path/to/apache-cassandra-3.9/lib/*:/path/to/gson/2.3/gson-2.3.jar:/path/to/jsch/0.1.54/jsch-0.1.54.jar:/path/to/commons-lang3/3.5/commons-lang3-3.5.jar

27.8 Replicating to RDBMS Targets

You must take additional care when replicating source UPDATE operations from Cassandra trail files to RDBMS targets. Any source UPDATE operation appears as an INSERT record in the Oracle GoldenGate trail file. Replicat may abend when a source UPDATE operation is applied as an INSERT operation on the target database.

You have these options:
• OVERRIDEDUPS: If you expect that the source database contains mostly INSERT operations and very few UPDATE operations, then OVERRIDEDUPS is the recommended option. Replicat can recover from duplicate key errors while replicating the small number of source UPDATE operations. See OVERRIDEDUPS | NOOVERRIDEDUPS.
• UPDATEINSERTS and INSERTMISSINGUPDATES: Use this configuration if the source database is expected to contain mostly UPDATE operations and very few INSERT operations. With this configuration, Replicat has fewer missing row errors to recover, which leads to better throughput. See UPDATEINSERTS | NOUPDATEINSERTS and INSERTMISSINGUPDATES | NOINSERTMISSINGUPDATES.
• No additional configuration is required if the target table can accept duplicate rows or you want Replicat to abend on duplicate rows.
If you configure Replicat to use BATCHSQL, there may be duplicate row or missing row errors in batch mode. Although there is a reduction in Replicat throughput due to these errors, Replicat automatically recovers from them. If the source operations are mostly INSERTs, then BATCHSQL is a good option. A sketch of a Replicat parameter file using these options follows this list.
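The following is a minimal sketch of a Replicat parameter file for an RDBMS target. The group name rcass, the credential alias, and the MAP clause are illustrative placeholders, and only one of the two option sets would normally be enabled.

REPLICAT rcass
-- Illustrative target connection; replace with your target database credentials.
USERIDALIAS ggadmin
-- Option 1: the source workload is mostly INSERTs; recover from duplicate-key
-- errors caused by source UPDATEs arriving as INSERTs in the Cassandra trail.
OVERRIDEDUPS
-- Option 2 (alternative): the source workload is mostly UPDATEs.
-- UPDATEINSERTS
-- INSERTMISSINGUPDATES
MAP keyspace1.table1, TARGET schema1.table1;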

27.9 Partition Update or Insert of Static Columns

When the source Cassandra table has static columns, the static column values can be modified by skipping any clustering key columns that are in the table. For example:

create table ks_demo_rep1.nls_staticcol
(
  teamname text,
  manager text static,
  location text static,
  membername text,
  nationality text,
  position text,
  PRIMARY KEY ((teamname), membername)
) WITH cdc=true;

insert into ks_demo_rep1.nls_staticcol (teamname, manager, location) VALUES ('Red Bull', 'Christian Horner', 'UK');

The insert CQL is missing the clustering key membername. Such an operation is a partition insert. Similarly, you could also update a static column with just the partition keys in the WHERE clause of the CQL; that is a partition update operation. Cassandra Extract cannot write an INSERT or UPDATE operation into the trail with missing key columns. It abends on detecting a partition INSERT or UPDATE operation. To continue processing instead, see the parameter sketch below and the CDC Configuration Reference.
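As a hedged sketch (the group name cass and all paths are placeholders), an Extract parameter file that writes partition INSERT, UPDATE, and DELETE records to the trail instead of abending could use the NOABENDON... parameters documented in the CDC Configuration Reference:

EXTRACT cass
TRANLOGOPTIONS CDCREADERSDKVERSION 3.11
TRANLOGOPTIONS CDCLOGDIRTEMPLATE /path/to/data/cdc_raw
SOURCEDB nodeaddress
VAM libggbigdata_vam.so
-- Continue processing partition updates and partition deletes; a warning is logged instead.
NOABENDONUPDATERECORDWITHMISSINGKEYS
NOABENDONDELETERECORDWITHMISSINGKEYS
EXTTRAIL ./dirdat/z1
TABLE *.*;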

27.10 Partition Delete

A Cassandra table may have a primary key composed of one or more partition key columns and clustering key columns. When a DELETE operation is performed on a Cassandra table by skipping the clustering key columns from the WHERE clause, it results in a partition delete operation. For example:

create table ks_demo_rep1.table1
(
  col1 ascii,
  col2 bigint,
  col3 boolean,
  col4 int,
  PRIMARY KEY ((col1, col2), col4)
) with cdc=true;


delete from ks_demo_rep1.table1 where col1 = 'asciival' and col2 = 9876543210; /** skipped clustering key column col4 **/

Cassandra Extract cannot write a DELETE operation into the trail with missing key columns and abends on detecting a partition DELETE operation.

27.11 Security and Authentication

• Cassandra Extract can connect to a Cassandra cluster using username and password based authentication and SSL authentication.
• Connection to Kerberos enabled Cassandra clusters is not supported in this release.
• Configuring SSL

27.11.1 Configuring SSL

To enable SSL, add the SSL parameter to your GLOBALS file or Extract parameter file. Additionally, a separate configuration is required for the Java and CPP drivers; see CDC Configuration Reference.

SSL configuration for Java driver

JVMBOOTOPTIONS -Djavax.net.ssl.trustStore=/path/to/SSL/truststore.file -Djavax.net.ssl.trustStorePassword=password -Djavax.net.ssl.keyStore=/path/to/SSL/keystore.file -Djavax.net.ssl.keyStorePassword=password

The keystore and truststore certificates can be generated using these instructions:
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/secureSSLIntro.html

SSL configuration for Cassandra CPP driver
To operate with an SSL configuration, you have to add the following parameter in the Oracle GoldenGate GLOBALS file or Extract parameter file:

CPPDRIVEROPTIONS SSL PEMPUBLICKEYFILE /path/to/PEM/formatted/public/key/file/cassandra.pem
CPPDRIVEROPTIONS SSL PEERCERTVERIFICATIONFLAG 0

This configuration is required to connect to a Cassandra cluster with SSL enabled. Additionally, you need to add these settings to your cassandra.yaml file:

client_encryption_options:
    enabled: true
    # If enabled and optional is set to true, encrypted and unencrypted connections are handled.
    optional: false
    keystore: /path/to/keystore
    keystore_password: password
    require_client_auth: false

The PEM formatted certificates can be generated using these instructions:


https://docs.datastax.com/en/developer/cpp-driver/2.8/topics/security/ssl/

27.12 Cleanup of CDC Commit Log Files

You can use the Cassandra CDC commit log purger program to purge the CDC commit log files that are not in use. For more information, see How to Run the Purge Utility.
• Cassandra CDC Commit Log Purger
A purge utility for the Cassandra Handler to purge the staged CDC commit log files. Cassandra Extract moves the CDC commit log files (located at $CASSANDRA/data/cdc_raw) on each node to a staging directory for processing.

27.12.1 Cassandra CDC Commit Log Purger

A purge utility for the Cassandra Handler to purge the staged CDC commit log files. Cassandra Extract moves the CDC commit log files (located at $CASSANDRA/data/cdc_raw) on each node to a staging directory for processing.

For example, if the cdc_raw commit log directory is /path/to/cassandra/home/data/cdc_raw, the staging directory is /path/to/cassandra/home/data/cdc_raw/../cdc_raw_staged. The CDC commit log purger purges the files inside cdc_raw_staged based on the following logic. The purge program scans the Oracle GoldenGate directory (oggdir) for the JSON checkpoint files under dirchk/_casschk.json. A sample JSON file under dirchk looks similar to the following:

{ "start_timestamp": -1, "sequence_id": 34010434, "updated_datetime": "2018-04-19 23:24:57.164-0700", "nodes": [ { "address": "10.247.136.146", "offset": 0, "id": 0 } , { "address": "10.247.136.142", "file": "CommitLog-6-1524110205398.log", "offset": 33554405, "id": 1524110205398 } , { "address": "10.248.10.24", "file": "CommitLog-6-1524110205399.log", "offset": 33554406, "id": 1524110205399 } ] }

For each node address in the JSON checkpoint file, the purge program captures the CDC file name and ID. For each ID obtained from the JSON checkpoint file, the purge program looks into the staged CDC commit log directory and purges the commit log files with IDs that are less than the ID captured in the checkpoint JSON file.

Example: In the JSON file, the ID is 1524110205398. The CDC staging directory contains the files CommitLog-6-1524110205396.log, CommitLog-6-1524110205397.log, and CommitLog-6-1524110205398.log. The IDs derived from the CDC staging directory are 1524110205396, 1524110205397, and 1524110205398. The purge utility purges the files in the CDC staging directory whose IDs are less than the ID read from the JSON file, which is 1524110205398. The files associated with the IDs 1524110205396 and 1524110205397 are purged.
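The following bash sketch illustrates this ID comparison only; it is not the shipped Java purge utility, and the checkpoint ID and staging directory values are placeholders.

#!/bin/bash
# Illustration of the purge rule: remove staged commit logs whose numeric id
# is lower than the id recorded in the checkpoint for that node.
checkpointId=1524110205398
stagingDir=/path/to/cassandra/home/data/cdc_raw_staged

for f in "${stagingDir}"/CommitLog-*.log; do
  id=$(basename "${f}" .log | awk -F- '{print $NF}')   # trailing id of the file name
  if [ "${id}" -lt "${checkpointId}" ]; then
    echo "purging ${f}"
    rm -f "${f}"
  fi
done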

• How to Run the Purge Utility
• Argument cassCommitLogPurgerConfFile
• Argument purgeInterval
Setting the optional argument purgeInterval helps in configuring the process to run as a daemon.
• Command to Run the Program

27.12.1.1 How to Run the Purge Utility

• Third Party Libraries Needed to Run this Program
• Runtime Arguments

27.12.1.1.1 Third Party Libraries Needed to Run this Program

<dependency>
  <groupId>com.jcraft</groupId>
  <artifactId>jsch</artifactId>
  <version>0.1.54</version>
  <scope>provided</scope>
</dependency>

27.12.1.1.2 Runtime Arguments

To execute, the utility class CassandraCommitLogPurger requires a mandatory run-time argument cassCommitLogPurgerConfFile.

The available runtime arguments to the CassandraCommitLogPurger class are:

[required] --cassCommitLogPurgerConfFile=path to config.properties
[optional] --purgeInterval=1

27.12.1.2 Argument cassCommitLogPurgerConfFile

The required cassCommitLogPurgerConfFile argument takes a config file with the following fields.


Table 27-1 Argument cassCommitLogPurgerConfFile

fileSystemType
Default: local. Mandatory: Yes. Legal values: remote/local.
Description: In every live node in the cluster, the staged CDC commit logs can be accessed through SFTP or NFS. If the fileSystemType is remote (SFTP), then you need to supply the host with port, username, and password/privateKey (with or without passPhase) to connect and perform operations on the remote CDC staging directory.

chkDir
Default: None. Mandatory: Yes. Legal values: checkpoint directory path.
Description: Location of the Cassandra checkpoint directory where the _casschk.json file is located (for example, dirchk/_casschk.json).

cdcStagingDir
Default: None. Mandatory: Yes. Legal values: staging directory path.
Description: Location of the Cassandra staging directory where the CDC commit logs are present (for example, $CASSANDRA/data/cdc_raw_staged/CommitLog-6-1524110205396.log).

userName
Default: None. Mandatory: No. Legal values: valid SFTP auth username.
Description: SFTP user name to connect to the server.

password
Default: None. Mandatory: No. Legal values: valid SFTP auth password.
Description: SFTP password to connect to the server.

port
Default: 22. Mandatory: No. Legal values: valid SFTP auth port.
Description: SFTP port number.

privateKey
Default: None. Mandatory: No. Legal values: valid path to the privateKey file.
Description: The private key is used to perform the authentication, allowing you to log in without having to specify a password. Providing the privateKey file path allows the purger program to access the nodes without a password.

passPhase
Default: None. Mandatory: No. Legal values: valid password for the privateKey.
Description: The private key is typically password protected. If it is, then passPhase must be supplied together with privateKey so that the purger program can successfully access the nodes.

• Sample config.properties for Local File System
• Sample config.properties for Remote File System

27.12.1.2.1 Sample config.properties for Local File System

fileSystemType=local
chkDir=apache-cassandra-3.11.2/data/chkdir/
cdcStagingDir=apache-cassandra-3.11.2/data/$nodeAddress/commitlog/

27.12.1.2.2 Sample config.properties for Remote File System

fileSystemType=remote
chkDir=apache-cassandra-3.11.2/data/chkdir/
cdcStagingDir=apache-cassandra-3.11.2/data/$nodeAddress/commitlog/
username=username
password=@@@@@
port=22
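As a hedged variant (not one of the shipped samples), key-based SFTP authentication can be configured with the privateKey and passPhase parameters from Table 27-1 instead of a password; the key path and passphrase shown are placeholders:

fileSystemType=remote
chkDir=apache-cassandra-3.11.2/data/chkdir/
cdcStagingDir=apache-cassandra-3.11.2/data/$nodeAddress/commitlog/
userName=username
privateKey=/path/to/.ssh/id_rsa
passPhase=privateKeyPassphrase
port=22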

27.12.1.3 Argument purgeInterval

Setting the optional argument purgeInterval helps in configuring the process to run as a daemon. This argument is an integer value representing the interval at which the clean-up happens. For example, if purgeInterval is set to 1, then the process runs every day at the time the process was started.

27.12.1.4 Command to Run the Program

java -cp /ggjava/resources/lib/*:/thirdparty/cass/jsch-0.1.54.jar oracle.goldengate.cassandra.commitlogpurger.CassandraCommitLogPurger --cassCommitLogPurgerConfFile ogghome/cassandraPurgeUtil/commitlogpurger.properties --purgeInterval 1

Where:
• /ggjava/resources/lib/* is the directory where the purger utility is located.
• /thirdparty/cass/jsch-0.1.54.jar is the dependent JAR required to execute the purger program.


• --cassCommitLogPurgerConfFile and --purgeInterval are the two runtime arguments.
Sample script to run the commit log purger utility:

#!/bin/bash echo "fileSystemType=remote" > commitlogpurger.properties echo "chkDir=dirchk" >> commitlogpurger.properties echo "cdcStagingDir=data/cdc_raw_staged" >> commitlogpurger.properties echo "userName=username" >> commitlogpurger.properties echo "password=password" >> commitlogpurger.properties java -cp ogghome/ggjava/resources/lib/*:ogghome/thirdparty/cass/jsch-0.1.54.jar oracle.goldengate.cassandra.commitlogpurger.CassandraCommitLogPurger -- cassCommitLogPurgerConfFile commitlogpurger.properties --purgeInterval 1 27.13 Multiple Extract Support

Multiple Extract groups in a single Oracle GoldenGate for Big Data installation can be configured to connect to the same Cassandra cluster. To run multiple Extract groups:

1. One (and only one) Extract group can be configured to move the commit log files in the cdc_raw directory on the Cassandra nodes to a staging directory. The movecommitlogstostagingdir parameter is enabled by default and no additional configuration is required for this Extract group.
2. All the other Extract groups should be configured with the nomovecommitlogstostagingdir parameter in the Extract parameter (.prm) file, as shown in the sketch after this list.
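A hedged sketch of the relevant parameter lines for two Extract groups; the group names cass1 and cass2 and all paths are placeholders, and each group normally has its own .prm file:

-- Extract group cass1: moves the commit log files to the staging directory (default behavior).
EXTRACT cass1
TRANLOGOPTIONS CDCLOGDIRTEMPLATE /path/to/data/cdc_raw
SOURCEDB nodeaddress
VAM libggbigdata_vam.so
EXTTRAIL ./dirdat/e1
TABLE *.*;

-- Extract group cass2: does not move the commit log files.
EXTRACT cass2
NOMOVECOMMITLOGSTOSTAGINGDIR
TRANLOGOPTIONS CDCLOGDIRTEMPLATE /path/to/data/cdc_raw
SOURCEDB nodeaddress
VAM libggbigdata_vam.so
EXTTRAIL ./dirdat/e2
TABLE *.*;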

27.14 CDC Configuration Reference

The following properties are used with Cassandra change data capture.

DBOPTIONS ENABLECASSANDRACPPDRIVERTRACE true
Optional. Location: Extract parameter (.prm) file. Default: false.
Use only during the initial load process. When set to true, the Cassandra driver logs all the API calls to a driver.log file. This file is created in the Oracle GoldenGate for Big Data installation directory. This is useful for debugging.

DBOPTIONS FETCHBATCHSIZE number
Optional. Location: Extract parameter (.prm) file. Default: 1000 (minimum is 1, maximum is 100000).
Use only during the initial load process. Specifies the number of rows of data the driver attempts to fetch on each request submitted to the database server. The parameter value should be lower than the database configuration parameter tombstone_warn_threshold in the database configuration file, cassandra.yaml; otherwise the initial load process might fail. Oracle recommends that you set this parameter value to 5000 for optimum initial load Extract performance.

TRANLOGOPTIONS CDCLOGDIRTEMPLATE path
Required. Location: Extract parameter (.prm) file. Default: None.
The CDC commit log directory path template. The template can optionally have the $nodeAddress meta field that is resolved to the respective node address.

TRANLOGOPTIONS SFTP options
Optional. Location: Extract parameter (.prm) file. Default: None.
The secure file transfer protocol (SFTP) connection details to pull and transfer the commit log files. You can use one or more of these options:
USER user: The SFTP user name.
PASSWORD password: The SFTP password.
KNOWNHOSTSFILE file: The location of the Secure Shell (SSH) known hosts file.
LANDINGDIR dir: The SFTP landing directory for the commit log files on the local machine.
PRIVATEKEY file: The SSH private key file.
PASSPHRASE password: The SSH private key pass phrase.
PORTNUMBER portnumber: The SSH port number.

CLUSTERCONTACTPOINTS nodes
Optional. Location: GLOBALS parameter file. Default: 127.0.0.1.
A comma separated list of nodes to be used for a connection to the Cassandra cluster. You should provide at least one node address. The parameter options are: PORT.

TRANLOGOPTIONS CDCREADERSDKVERSION version
Optional. Location: Extract parameter (.prm) file. Default: 3.11.
The SDK version for the CDC reader capture API.

ABENDONMISSEDRECORD | NOABENDONMISSEDRECORD
Optional. Location: Extract parameter (.prm) file. Default: true.
When set to true and the possibility of a missing record is found, the process stops with the diagnostic information. This is generally detected when a node goes down and the CDC reader doesn't find a replica node with a matching last record from the dead node. You can set this parameter to false to continue processing; a warning message is logged about the scenario.

TRANLOGOPTIONS CLEANUPCDCCOMMITLOGS
Optional. Location: Extract parameter (.prm) file. Default: false.
Purge CDC commit log files after Extract processing. When the value is set to false, the CDC commit log files are moved to the staging directory for the commit log files.

JVMBOOTOPTIONS jvm_options
Optional. Location: GLOBALS parameter file. Default: None.
The boot options for the Java Virtual Machine. Multiple options are delimited by a space character.

JVMCLASSPATH classpath
Required. Location: GLOBALS parameter file. Default: None.
The classpath for the Java Virtual Machine. You can include an asterisk (*) wildcard to match all JAR files in any directory. Multiple paths should be delimited with a colon (:) character.

OGGSOURCE
Required. Location: GLOBALS parameter file. Default: None.
The source database for CDC capture or source database queries. The valid value is CASSANDRA.

SOURCEDB nodeaddress USERID dbuser PASSWORD dbpassword
Required. Location: Extract parameter (.prm) file. Default: None.
A single Cassandra node address that is used for a connection to the Cassandra cluster and to query the metadata for the captured tables.
USERID dbuser: Optional, no default. The user name to use when connecting to the database.
PASSWORD dbpassword: Required when USERID is used, no default. The user password to use when connecting to the database.

ABENDONUPDATERECORDWITHMISSINGKEYS | NOABENDONUPDATERECORDWITHMISSINGKEYS
Optional. Location: Extract parameter (.prm) file. Default: true.
If this value is true, anytime an UPDATE operation record with missing key columns is found, the process stops with the diagnostic information. You can set this property to false to continue processing and write this record to the trail file; a warning message is logged about the scenario. This operation is a partition update, see Partition Update or Insert of Static Columns.

ABENDONDELETERECORDWITHMISSINGKEYS | NOABENDONDELETERECORDWITHMISSINGKEYS
Optional. Location: Extract parameter (.prm) file. Default: true.
If this value is true, anytime a DELETE operation record with missing key columns is found, the process stops with the diagnostic information. You can set this property to false to continue processing and write this record to the trail file; a warning message is logged about the scenario. This operation is a partition delete, see Partition Delete.

MOVECOMMITLOGSTOSTAGINGDIR | NOMOVECOMMITLOGSTOSTAGINGDIR
Optional. Location: Extract parameter (.prm) file. Default: true.
Enabled by default; this instructs the Extract group to move the commit log files in the cdc_raw directory on the Cassandra nodes to a staging directory for the commit log files. Only one Extract group can have movecommitlogstostagingdir enabled, and all the other Extract groups disable this by specifying nomovecommitlogstostagingdir.

SSL
Optional. Location: GLOBALS or Extract parameter (.prm) file. Default: false.
Use for basic SSL support during connection. Additional JSSE configuration through Java System properties is expected when enabling this.

Note:
The following SSL properties are part of CPPDRIVEROPTIONS SSL, so this keyword must be added to any other SSL property for it to work.

CPPDRIVEROPTIONS SSL PEMPUBLICKEYFILE cassandra.pem
Optional. Location: GLOBALS or Extract parameter (.prm) file. Default: None; if the PEMPUBLICKEYFILE property is specified, then you must specify a value.
A string that indicates the absolute path, with a fully qualified name, of the PEM formatted public key file used to verify the peer's certificate. This file is a must for the SSL connection. This property is needed for a one-way handshake or basic SSL connection.

CPPDRIVEROPTIONS SSL ENABLECLIENTAUTH | DISABLECLIENTAUTH
Optional. Location: GLOBALS or Extract parameter (.prm) file. Default: false.
Enabled indicates two-way SSL encryption between client and server. It is required to authenticate both the client and the server through PEM formatted certificates. This property also needs the pemclientpublickeyfile and pemclientprivatekeyfile properties to be set. The pemclientprivatekeypasswd property must be configured if the client private key is password protected. Setting this property to false disables client authentication for the two-way handshake.

CPPDRIVEROPTIONS SSL PEMCLIENTPUBLICKEYFILE public.pem
Optional. Location: GLOBALS or Extract parameter (.prm) file. Default: None; if the PEMCLIENTPUBLICKEYFILE property is specified, then you must specify a value.
A string that indicates the absolute path, with a fully qualified name, of a PEM formatted public key file used to verify the client's certificate. This file is a must for the SSL connection. It is required if you are using CPPDRIVEROPTIONS SSL ENABLECLIENTAUTH or for a two-way handshake.

CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYFILE public.pem
Optional. Location: GLOBALS or Extract parameter (.prm) file. Default: None; if the PEMCLIENTPRIVATEKEYFILE property is specified, then you must specify a value.
A string that indicates the absolute path, with a fully qualified name, of a PEM formatted private key file used to verify the client's certificate. This file is a must for the SSL connection. It is required if you are using CPPDRIVEROPTIONS SSL ENABLECLIENTAUTH or for a two-way handshake.

CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYPASSWD privateKeyPasswd
Optional. Location: GLOBALS or Extract parameter (.prm) file. Default: None; if the PEMCLIENTPRIVATEKEYPASSWD property is specified, then you must specify a value.
Sets the password for the PEM formatted private key file used to verify the client's certificate. This is a must if the private key file is protected with a password.

CPPDRIVEROPTIONS SSL PEERCERTVERIFICATIONFLAG value
Optional. Location: GLOBALS or Extract parameter (.prm) file. Default: 0.
An integer that sets the verification required on the peer's certificate. The range is 0-4:
0 - Disable certificate identity verification.
1 - Verify the peer certificate.
2 - Verify the peer identity.
3 - Not used, so it is similar to disabling certificate identity verification.
4 - Verify the peer identity by its domain name.

CPPDRIVEROPTIONS SSL ENABLEREVERSEDNS
Optional. Location: GLOBALS or Extract parameter (.prm) file. Default: false.
Enables retrieving the host name for IP addresses using reverse IP lookup.

27.15 Troubleshooting

No data captured by the Cassandra Extract process.
• The Cassandra database has not flushed the data from the active commit log files to the CDC commit log files. The flush is dependent on the load of the Cassandra cluster.
• The Cassandra Extract captures data from the CDC commit log files only.
• Check the CDC property of the source table. The CDC property of the source table should be set to true.
• Data is not captured if the TRANLOGOPTIONS CDCREADERSDKVERSION 3.9 parameter is in use and the JVMCLASSPATH is configured to point to Cassandra 3.10 or 3.11 JAR files.

Error: OGG-01115 Function getInstance not implemented.
• The following line is missing from the GLOBALS file: OGGSOURCE CASSANDRA
• The GLOBALS file is missing from the Oracle GoldenGate directory.

Error: Unable to connect to Cassandra cluster, Exception: com.datastax.driver.core.exceptions.NoHostAvailableException
This indicates that the connection to the Cassandra cluster was unsuccessful. Check the following parameter: CLUSTERCONTACTPOINTS.

Error: Exception in thread "main" java.lang.NoClassDefFoundError: oracle/goldengate/capture/cassandra/CassandraCDCProcessManager
Check the JVMCLASSPATH parameter in the GLOBALS file.

Error: oracle.goldengate.util.Util - Unable to invoke method while constructing object. Unable to create object of class "oracle.goldengate.capture.cassandracapture311.SchemaLoader3DOT11"


Caused by: java.lang.NoSuchMethodError: org.apache.cassandra.config.DatabaseDescriptor.clientInitialization()V
There is a mismatch in the Cassandra SDK version configuration. The TRANLOGOPTIONS CDCREADERSDKVERSION 3.11 parameter is in use and the JVMCLASSPATH may have the Cassandra 3.9 JAR file path.

Error: OGG-25171 Trail file '/path/to/trail/gg' is remote. Only local trail allowed for this extract.
A Cassandra Extract should only be configured to write to local trail files. When adding trail files for Cassandra Extract, use the EXTTRAIL option. For example:

ADD EXTTRAIL ./dirdat/z1, EXTRACT cass

Errors: OGG-868 error message or OGG-4510 error message
The cause could be any of the following:
• Unknown user or invalid password
• Unknown node address
• Insufficient memory
Another cause could be that the connection to the Cassandra database is broken. The error message indicates the database error that has occurred.

Error: OGG-251712 Keyspace keyspacename does not exist in the database.
The issue could be due to these conditions:
• During the Extract initial load process, you may have deleted the KEYSPACE keyspacename from the Cassandra database.
• The KEYSPACE keyspacename does not exist in the Cassandra database.

Error: OGG-25175 Unexpected error while fetching row.
This can occur if the connection to the Cassandra database is broken during the initial load process.

Error: "Server-side warning: Read 915936 live rows and 12823104 tombstone cells for query SELECT * FROM keyspace.table (see tombstone_warn_threshold)".
This is likely to occur when the value of the initial load DBOPTIONS FETCHBATCHSIZE parameter is greater than the Cassandra database configuration parameter, tombstone_warn_threshold. Increase the value of tombstone_warn_threshold or reduce the DBOPTIONS FETCHBATCHSIZE value to get around this issue.

Duplicate records in the Cassandra Extract trail.
Internal tests on a multi-node Cassandra cluster have revealed that there is a possibility of duplicate records in the Cassandra CDC commit log files. The duplication in the Cassandra commit log files is more common when there is heavy write parallelism, write errors on nodes, and multiple retry attempts on the Cassandra nodes. In these cases, it is expected that the Cassandra trail file will have duplicate records.

JSchException or SftpException in the Extract Report File
Verify that the SFTP credentials (user, password, and privatekey) are correct. Check that the SFTP user has read and write permissions for the cdc_raw directory on each of the nodes in the Cassandra cluster.

ERROR o.g.c.c.CassandraCDCProcessManager - Exception during creation of CDC staging directory [{}]java.nio.file.AccessDeniedException
The Extract process does not have permission to create the CDC commit log staging directory. For example, if the cdc_raw commit log directory is /path/to/cassandra/home/data/cdc_raw, then the staging directory would be /path/to/cassandra/home/data/cdc_raw/../cdc_raw_staged.

Extract report file shows a lot of DEBUG log statements
On a production system, you do not need to enable debug logging. To use INFO level logging, make sure that the GLOBALS file includes this parameter:

JVMBOOTOPTIONS -Dlogback.configurationFile=AdapterExamples/big-data/cassandracapture/logback.xml

To enable SSL in Oracle GoldenGate Cassandra Extract, you have to enable SSL in the GLOBALS file or in the Extract parameter file. If the SSL keyword is missing, then Extract assumes that you want to connect without SSL. As a result, if the cassandra.yaml file has an SSL configuration entry, then the connection fails.

SSL is enabled and it is one-way handshake
You must specify the CPPDRIVEROPTIONS SSL PEMPUBLICKEYFILE /scratch/testcassandra/testssl/ssl/cassandra.pem property.

If this property is missing, then Extract generates this error:

2018-06-09 01:55:37 ERROR OGG-25180 The PEM formatted public key file used to verify the peer's certificate is missing.

If SSL is enabled, then you must set PEMPUBLICKEYFILE in your Oracle GoldenGate GLOBALS file or in the Extract parameter file.

SSL is enabled and it is two-way handshake
You must specify these properties for SSL two-way handshake:

CPPDRIVEROPTIONS SSL ENABLECLIENTAUTH
CPPDRIVEROPTIONS SSL PEMCLIENTPUBLICKEYFILE /scratch/testcassandra/testssl/ssl/datastax-cppdriver.pem
CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYFILE /scratch/testcassandra/testssl/ssl/datastax-cppdriver-private.pem
CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYPASSWD cassandra

Additionally, consider the following:
• If ENABLECLIENTAUTH is missing, then Extract assumes that it is a one-way handshake, so it ignores PEMCLIENTPUBLICKEYFILE and PEMCLIENTPRIVATEKEYFILE. The following error occurs because the cassandra.yaml file should have require_client_auth set to true:
2018-06-09 02:00:35 ERROR OGG-00868 No hosts available for the control connection.
• If ENABLECLIENTAUTH is used and PEMCLIENTPRIVATEKEYFILE is missing, then this error occurs:
2018-06-09 02:04:46 ERROR OGG-25178 The PEM formatted private key file used to verify the client's certificate is missing.
For a two-way handshake, or if ENABLECLIENTAUTH is set, it is mandatory to set PEMCLIENTPRIVATEKEYFILE in your Oracle GoldenGate GLOBALS file or in the Extract parameter file.
• If ENABLECLIENTAUTH is used and PEMCLIENTPUBLICKEYFILE is missing, then this error occurs:
2018-06-09 02:06:20 ERROR OGG-25179 The PEM formatted public key file used to verify the client's certificate is missing.
For a two-way handshake, or if ENABLECLIENTAUTH is set, it is mandatory to set PEMCLIENTPUBLICKEYFILE in your Oracle GoldenGate GLOBALS file or in the Extract parameter file.
• If a password was set while generating the client private key file, then you must add PEMCLIENTPRIVATEKEYPASSWD to avoid this error:
2018-06-09 02:09:48 ERROR OGG-25177 The SSL certificate: /scratch/jitiwari/testcassandra/testssl/ssl/datastax-cppdriver-private.pem can not be loaded. Unable to load private key.
• If any of the PEM files is missing from the specified absolute path, then this error occurs:
2018-06-09 02:12:39 ERROR OGG-25176 Can not open the SSL certificate: /scratch/jitiwari/testcassandra/testssl/ssl/cassandra.pem.

com.jcraft.jsch.JSchException: UnknownHostKey
If the Extract process abends with this issue, then it is likely that some or all of the Cassandra node addresses are missing in the SSH known-hosts file. For more information, see Setup SSH Connection to the Cassandra Nodes.

General SSL Errors
Consider these general errors:
• The SSL connection may fail if you have enabled all the required SSL parameters in the Extract or GLOBALS file but SSL is not configured in the cassandra.yaml file.
• The absolute path or the qualified name of the PEM file may not be correct. There could be an access issue on the location where the PEM file is stored.
• The password added while generating the client private key file may not be correct, or you may not have configured it in the Extract parameter or GLOBALS file.

28 Connecting to Microsoft Azure Data Lake

Microsoft Azure Data Lake supports streaming data through the Hadoop client. Therefore, data files can be sent to Azure Data Lake using either the Oracle GoldenGate for Big Data Hadoop Distributed File System (HDFS) Handler or the File Writer Handler in conjunction with the HDFS Event Handler. The preferred mechanism for ingest to Microsoft Azure Data Lake is the File Writer Handler in conjunction with the HDFS Event Handler. Use these steps to connect to Microsoft Azure Data Lake from Oracle GoldenGate for Big Data.

1. Download Hadoop 2.9.1 from http://hadoop.apache.org/releases.html.
2. Unzip the file in a temporary directory. For example, /ggwork/hadoop/hadoop-2.9.1.
3. Edit the /ggwork/hadoop/hadoop-2.9.1/etc/hadoop/hadoop-env.sh file in the directory.
4. Add entries for the JAVA_HOME and HADOOP_CLASSPATH environment variables:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_CLASSPATH=/ggwork/hadoop/hadoop-2.9.1/share/hadoop/tools/lib/*:$HADOOP_CLASSPATH

This points to Java 8 and adds the share/hadoop/tools/lib directory to the Hadoop classpath. The library path is not in the variable by default and the required Azure libraries are in this directory.
5. Edit the /ggwork/hadoop/hadoop-2.9.1/etc/hadoop/core-site.xml file and add:
<property>
  <name>fs.adl.oauth2.access.token.provider.type</name>
  <value>ClientCredential</value>
</property>
<property>
  <name>fs.adl.oauth2.refresh.url</name>
  <value>Insert the Azure https URL here to obtain the access token</value>
</property>
<property>
  <name>fs.adl.oauth2.client.id</name>
  <value>Insert the client id here</value>
</property>
<property>
  <name>fs.adl.oauth2.credential</name>
  <value>Insert the password here</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>adl://Account Name.azuredatalakestore.net</value>
</property>

6. Open your firewall to connect to both the Azure URL to get the token and the Azure Data Lake URL, or disconnect from your network or VPN. Access to Azure Data Lake does not currently support using a proxy server, per the Apache Hadoop documentation.
7. Use the Hadoop shell commands to verify connectivity to Azure Data Lake. For example, in the 2.9.1 Hadoop installation directory, execute this command to get a listing of the root HDFS directory:
./bin/hadoop fs -ls /

8. Verify connectivity to Azure Data Lake.
9. Configure either the HDFS Handler or the File Writer Handler using the HDFS Event Handler to push data to Azure Data Lake, see Using the File Writer Handler. Oracle recommends that you use the File Writer Handler with the HDFS Event Handler.
Setting the gg.classpath example:
gg.classpath=/ggwork/hadoop/hadoop-2.9.1/share/hadoop/common/:/ggwork/hadoop/hadoop-2.9.1/share/hadoop/common/lib/:/ggwork/hadoop/hadoop-2.9.1/share/hadoop/hdfs/:/ggwork/hadoop/hadoop-2.9.1/share/hadoop/hdfs/lib/:/ggwork/hadoop/hadoop-2.9.1/etc/hadoop:/ggwork/hadoop/hadoop-2.9.1/share/hadoop/tools/lib/*
See https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html.

29 Connecting to Microsoft Azure Data Lake Gen 2

Microsoft Azure Data Lake Gen 2 supports streaming data via the Hadoop client. Therefore, data files can be sent to Azure Data Lake Gen 2 using either the Oracle GoldenGate for Big Data HDFS Handler or the File Writer Handler in conjunction with the HDFS Event Handler. Hadoop 3.3.0 (or higher) is recommended for connectivity to Azure Data Lake Gen 2. Hadoop 3.3.0 contains an important fix to correctly fire Azure events on file close using the "abfss" scheme. For more information, see Hadoop Jira issue Hadoop-16182. Use the File Writer Handler in conjunction with the HDFS Event Handler. This is the preferred mechanism for ingest to Azure Data Lake Gen 2.

Prerequisites
Part 1:

1. Connectivity to Azure Data Lake Gen 2 assumes that you have correctly provisioned an Azure Data Lake Gen 2 account in the Azure portal. From the Azure portal, select Storage Accounts from the commands on the left to view, create, or delete storage accounts. In the Azure Data Lake Gen 2 provisioning process, it is recommended that the Hierarchical namespace is enabled in the Advanced tab. It is not mandatory to enable Hierarchical namespace for the Azure storage account.
2. Ensure that you have created a Web app/API App Registration to connect to the storage account. From the Azure portal, select All services from the list of commands on the left, type app into the filter command box, and select App registrations from the filtered list of services. Create an App registration of type Web app/API. Add permissions to access Azure Storage. Assign the App registration to an Azure account. Generate a key for the App Registration. The generated key string is your client secret and is only available at the time the key is created. Therefore, ensure you document the generated key string.
Part 2:

1. In the Azure Data Lake Gen 2 account, ensure that the App Registration is given access. In the Azure portal, select Storage accounts from the left panel. Select the Azure Data Lake Gen 2 account that you have created. Select the Access Control (IAM) command to bring up the Access Control (IAM) panel. Select the Role Assignments tab and add a role assignment for the created App Registration. The App Registration assigned to the storage account must be provided with read and write access into the Azure storage account.


You can use either of the following roles: the built-in Azure role Storage Blob Data Contributor, or a custom role with the required permissions.
2. Connectivity to Azure Data Lake Gen 2 can be routed through a proxy server. Three parameters need to be set in the Java boot options to enable this:

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar -DproxySet=true -Dhttps.proxyHost={insert your proxy server} -Dhttps.proxyPort={insert your proxy port}

3. Two connectivity schemes to Azure Data Lake Gen 2 are supported: abfs and abfss. The preferred method is abfss because it employs HTTPS calls, thereby providing security and payload encryption.

Connecting to Microsoft Azure Data Lake Gen 2
To connect to Microsoft Azure Data Lake Gen 2 from Oracle GoldenGate for Big Data:

1. Download Hadoop 3.3.0 from http://hadoop.apache.org/releases.html.
2. Unzip the file in a temporary directory. For example, /usr/home/hadoop/hadoop-3.3.0.
3. Edit the {hadoop install dir}/etc/hadoop/hadoop-env.sh file to point to Java 8 and add the Azure Hadoop libraries to the Hadoop classpath. These are entries in the hadoop-env.sh file:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_202
export HADOOP_OPTIONAL_TOOLS="hadoop-azure"

4. Private networks often require routing through a proxy server to access the public internet. Therefore, you may have to configure proxy server settings for the hadoop command line utility to test the connectivity to Azure. To configure proxy server settings, set the following in the hadoop-env.sh file:
export HADOOP_CLIENT_OPTS="-Dhttps.proxyHost={insert your proxy server} -Dhttps.proxyPort={insert your proxy port}"

Note:

These proxy settings only work for the hadoop command line utility. The proxy server settings for Oracle GoldenGate for Big Data connectivity to Azure are set in the javawriter.bootoptions, as described in the prerequisites.

5. Edit the {hadoop install dir}/etc/hadoop/core-site.xml file and add the following configuration:
<property>
  <name>fs.azure.account.auth.type</name>
  <value>OAuth</value>
</property>
<property>
  <name>fs.azure.account.oauth.provider.type</name>
  <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
</property>
<property>
  <name>fs.azure.account.oauth2.client.endpoint</name>
  <value>https://login.microsoftonline.com/{insert the Azure instance id here}/oauth2/token</value>
</property>
<property>
  <name>fs.azure.account.oauth2.client.id</name>
  <value>{insert your client id here}</value>
</property>
<property>
  <name>fs.azure.account.oauth2.client.secret</name>
  <value>{insert your client secret here}</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>abfss://{insert your container name here}@{insert your ADL gen2 storage account name here}.dfs.core.windows.net</value>
</property>
<property>
  <name>fs.azure.createRemoteFileSystemDuringInitialization</name>
  <value>true</value>
</property>

To obtain your Azure instance id, go to the Microsoft Azure portal. Select Azure Active Directory from the list on the left to view the Azure Active Directory panel. Select Properties in the Azure Active Directory panel to view the Azure Active Directory properties. The Azure instance id is the field marked as Directory ID.
To obtain your Azure Client Id and Client Secret, go to the Microsoft Azure portal. Select All Services from the list on the left to view the Azure Services Listing. Type App into the filter command box and select App Registrations from the listed services. Select the App Registration that you have created to access Azure Storage. The Application Id displayed for the App Registration is the Client Id. The Client Secret is the generated key string when a new key is added. This generated key string is available only once when the key is created. If you do not know the generated key string, create another key, making sure you capture the generated key string.
The ADL gen2 account name is the account name you generated when you created the Azure ADL gen2 account.
File systems are sub-partitions within an Azure Data Lake Gen 2 storage account. You can create and access new file systems on the fly, but only if the following Hadoop configuration is set:
<property>
  <name>fs.azure.createRemoteFileSystemDuringInitialization</name>
  <value>true</value>
</property>

6. Verify connectivity using Hadoop shell commands.
./bin/hadoop fs -ls /
./bin/hadoop fs -mkdir /tmp

7. Configure either the HDFS Handler or the File Writer Handler using the HDFS Event Handler to push data to Azure Data Lake, see Using the File Writer Handler.


Oracle recommends that you use the File Writer Handler with the HDFS Event Handler.
Setting the gg.classpath example:
gg.classpath=/ggwork/hadoop/hadoop-3.3.0/share/hadoop/common/*:/ggwork/hadoop/hadoop-3.3.0/share/hadoop/common/lib/*:/ggwork/hadoop/hadoop-3.3.0/share/hadoop/hdfs/*:/ggwork/hadoop/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/ggwork/hadoop/hadoop-3.3.0/etc/hadoop/:/ggwork/hadoop/hadoop-3.3.0/share/hadoop/tools/lib/*
See https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html.

30 Connecting to Microsoft Azure Event Hubs

The Kafka Handler supports connectivity to Microsoft Azure Event Hubs. To connect to Microsoft Azure Event Hubs:
1. For more information about connecting to Microsoft Azure Event Hubs, see Quickstart: Data streaming with Event Hubs using the Kafka protocol.
2. Update the Kafka Producer Configuration file as follows to connect to Microsoft Azure Event Hubs using Secure Sockets Layer (SSL)/Transport Layer Security (TLS) protocols:

bootstrap.servers=NAMESPACENAME.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";

See Kafka Producer Configuration File. Connectivity to the Azure Event Hubs cannot be routed through a proxy server. Therefore, when you run Oracle GoldenGate for Big Data on premise to push data to Azure Event Hubs, you need to open your firewall to allow connectivity.
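For reference, a minimal Kafka Handler properties sketch that points at such a producer configuration might look like the following; the handler name, the topic mapping template, the classpath, and the file name azure_eventhubs_producer.properties are illustrative assumptions rather than shipped names:

gg.handlerlist=kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.mode=op
gg.handler.kafkahandler.format=json
# Points at the producer configuration shown above (hypothetical file name).
gg.handler.kafkahandler.kafkaProducerConfigFile=azure_eventhubs_producer.properties
gg.handler.kafkahandler.topicMappingTemplate=${tableName}
gg.classpath=/path/to/kafka-clients/libs/*
jvm.bootoptions=-Xmx512m -Djava.class.path=ggjava/ggjava.jar:dirprm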

31 Connecting to Oracle Streaming Service

Oracle Streaming Service (OSS) supports putting messages to and receiving messages from OSS using the Kafka client. Therefore, Oracle GoldenGate for Big Data can be used to publish change data capture operation messages to OSS. You can use either the Kafka Handler or the Kafka Connect Handler. The Kafka Connect Handler only supports using the JSON Kafka Connect converter. The Kafka Connect Avro converter is not supported because the Avro converter requires connectivity to a schema registry.

Note:

The Oracle Streaming Service currently does not have a schema registry to which the Kafka Connect Avro converter can connect. Streams to which the Kafka Handlers or the Kafka Connect Handlers publish messages must be pre-created in Oracle Cloud Infrastructure (OCI). Using the Kafka Handler to publish messages to a stream in OSS which does not already exist results in a runtime exception.

• To create a stream in OCI, in the OCI console, select Analytics, click Streaming, and then click Create Stream. Streams are created by default in the DefaultPool.

Figure 31-1 Example Image of Stream Creation

• The Kafka Producer client requires certain Kafka producer configuration properties to connect to OSS streams. To obtain this connectivity information, click the pool name in the OSS panel. If DefaultPool is used, then click DefaultPool in the OSS panel.

Figure 31-2 Example OSS Panel showing DefaultPool

Figure 31-3 Example DefaultPool Properties

• The Kafka Producer also requires an AUTH-TOKEN (password) to connect to OSS. To obtain an AUTH-TOKEN go to the User Details page and generate an AUTH-TOKEN. AUTH-TOKENs are only viewable at creation and are not subsequently viewable. Ensure that you store the AUTH-TOKEN in a safe place.


Figure 31-4 Auth-Tokens

Once you have these configurations, you can publish messages to OSS. For example, kafka.prm file:

replicat kafka
TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka.properties
map *.*, target qatarget.*;

Example: kafka.properties file:
gg.log=log4j
gg.log.level=debug
gg.report.time=30sec
gg.handlerlist=kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.mode=op
gg.handler.kafkahandler.format=json
gg.handler.kafkahandler.kafkaProducerConfigFile=oci_kafka.properties
# The following dictates how we'll map the workload to the target OSS streams
gg.handler.kafkahandler.topicMappingTemplate=OGGBD-191002
gg.handler.kafkahandler.keyMappingTemplate=${tableName}
gg.classpath=/home/opc/dependencyDownloader/dependencies/kafka_2.2.0/*
jvm.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar:dirprm

Example Kafka Producer Properties (oci_kafka.properties)

bootstrap.servers=cell-1.streaming.us-phoenix-1.oci.oraclecloud.com:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="paasdevgg/oracleidentitycloudservice/[email protected]/ocid1.streampool.oc1.phx.amaaaaaa3p5c3vqa4hfyl7uv465pay4audmoajughhxlsgj7afc2an5u3xaq" password="YOUR-AUTH-TOKEN";

To view the messages, click Load Messages in OSS.


Figure 31-5 Viewing the Messages

32 Stage and Merge Data Warehouse Replication

Data warehouse targets typically support Massively Parallel Processing (MPP). The cost of a single Data Manipulation Language (DML) operation is comparable to the cost of execution of batch DMLs. Therefore, for better throughput the change data from the Oracle GoldenGate trails can be staged in micro batches at a temporary staging location, and the staged data records are merged into the data warehouse target table using the respective data warehouse's merge SQL statement. This section outlines an approach to replicate change data records from source databases to target data warehouses using stage and merge. The solution uses the Command Event handler to invoke custom bash-shell scripts.
• Steps for Stage and Merge
• Snowflake Stage and Merge
Snowflake is a serverless data warehouse. Snowflake can run its compute nodes on any of the following cloud providers: AWS, GCP, or Azure. Snowflake external tables can be used to read object store files.
• Snowflake on AWS
• Snowflake on Azure
• Google BigQuery Stage and Merge
BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real time.
• Hive Stage and Merge
Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools to enable easy data ETL, a mechanism to put structures on the data, and the capability for querying and analysis of large data sets stored in Hadoop files.

32.1 Steps for Stage and Merge

• Stage
In this step the change data records in the Oracle GoldenGate trail files are pushed into a staging location. The staging location is typically a cloud object store such as OCI, AWS S3, Azure Data Lake, or Google Cloud Storage.
• Merge
In this step the change data files in the object store are viewed as an external table defined in the data warehouse. The data in the external staging table is merged onto the target table.
• Configuration of Handlers
The File Writer (FW) handler needs to be configured to generate local staging files that contain change data from the GoldenGate trail files.


• File Writer Handler
The File Writer (FW) handler is typically configured to generate files partitioned by table using the configuration gg.handler.{name}.partitionByTable=true.
• Operation Aggregation
Operation aggregation is the process of aggregating (merging/compressing) multiple operations on the same row into a single output operation based on a threshold.
• Object Store Event handler
The File Writer handler needs to be chained with an object store Event handler. Oracle GoldenGate for Big Data supports uploading files to most cloud object stores such as OCI, AWS S3, and Azure Data Lake.
• JDBC Metadata Provider
If the data warehouse supports JDBC connection, then the JDBC metadata provider needs to be enabled.
• Command Event handler Merge Script
The Command Event handler is configured to invoke a bash-shell script. Oracle provides a bash-shell script that can execute the SQL statements so that the change data in the staging files is merged into the target tables.
• Stage and Merge Sample Configuration
A working configuration for the respective data warehouse is available under the directory AdapterExamples/big-data/data-warehouse-utils/<data warehouse name>/.
• Variables in the Merge Script
Typically, variables appear at the beginning of the Oracle provided script. There are lines starting with #TODO: that document the changes required for variables in the script.
• SQL Statements in the Merge Script
The SQL statements in the shell script need to be customized. There are lines starting with #TODO: that document the changes required for SQL statements.
• Merge Script Functions
• Prerequisites
• Limitations

32.1.1 Stage

In this step the change data records in the Oracle GoldenGate trail files are pushed into a staging location. The staging location is typically a cloud object store such as OCI, AWS S3, Azure Data Lake, or Google Cloud Storage. This can be achieved using the File Writer handler and one of the Oracle GoldenGate for Big Data object store Event handlers.

32.1.2 Merge

In this step the change data files in the object store are viewed as an external table defined in the data warehouse. The data in the external staging table is merged onto the target table. Merge SQL uses the external table as the staging table. The merge is a batch operation leading to better throughput.


32.1.3 Configuration of Handlers

The File Writer (FW) handler needs to be configured to generate local staging files that contain change data from the GoldenGate trail files. The FW handler needs to be chained to an object store Event handler that can upload the staging files into a staging location. The staging location is typically a cloud object store, such as AWS S3 or Azure Data Lake. The output of the object store event handler is chained with the Command Event handler that can invoke custom scripts to execute merge SQL statements on the target data warehouse.

32.1.4 File Writer Handler

File Writer (FW) handler is typically configured to generate files partitioned by table using the configuration gg.handler.{name}.partitionByTable=true.

In most cases the FW handler is configured to use the Avro Object Container Format (OCF) formatter. The output file format could change based on the specific data warehouse target.

32.1.5 Operation Aggregation

Operation aggregation is the process of aggregating (merging/compressing) multiple operations on the same row into a single output operation based on a threshold. Operation aggregation needs to be enabled for stage and merge replication using the configuration gg.aggregate.operations=true.

32.1.6 Object Store Event handler

The File Writer handler needs to be chained with an object store Event handler. Oracle GoldenGate for Big Data supports uploading files to most cloud object stores such as OCI, AWS S3, and Azure Data Lake.

32.1.7 JDBC Metadata Provider

If the data warehouse supports JDBC connections, then the JDBC metadata provider needs to be enabled.
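As a hedged illustration only, the JDBC metadata provider is typically enabled in the replicat properties file with settings along these lines; the connection URL, driver class, and credentials shown are placeholders for a Snowflake target:

# Minimal sketch of JDBC metadata provider settings (values are placeholders).
gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:snowflake://<account>.snowflakecomputing.com/?db=MYDB
gg.mdp.DriverClassName=net.snowflake.client.jdbc.SnowflakeDriver
gg.mdp.UserName=dbuser
gg.mdp.Password=dbpassword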

32.1.8 Command Event handler Merge Script

The Command Event handler is configured to invoke a bash-shell script. Oracle provides a bash-shell script that can execute the SQL statements so that the change data in the staging files is merged into the target tables. The shell script needs to be customized as per the required configuration before starting the replicat process.
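To tie the pieces together, the following is a hedged sketch of the handler chain in a replicat properties file. The handler and event handler names (filewriter, s3, command), the region, the bucket, and the script path are illustrative; the shipped sf.props sample under AdapterExamples/big-data/data-warehouse-utils is the authoritative reference.

gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.mode=op
gg.handler.filewriter.format=avro_row_ocf
gg.handler.filewriter.partitionByTable=true
# Chain the local staging files to an object store event handler.
gg.handler.filewriter.eventHandler=s3
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
gg.eventhandler.s3.bucketMappingTemplate=my-staging-bucket
# Chain the object store upload to the command event handler that runs the merge script.
gg.eventhandler.s3.eventHandler=command
gg.eventhandler.command.type=command
gg.eventhandler.command.command=./sf.sh
# Aggregate multiple operations on the same row before writing.
gg.aggregate.operations=true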


32.1.9 Stage and Merge Sample Configuration

A working configuration for the respective data warehouse is available under the directory AdapterExamples/big-data/data-warehouse-utils/<data warehouse name>/.

This directory contains the following:
• replicat parameter (.prm) file.
• replicat properties file that contains the FW handler and all the Event handler configuration.
• DDL file for the sample table used in the merge script.
• Merge script for the specific data warehouse. This script contains SQL statements tested using the sample table defined in the DDL file.

32.1.10 Variables in the Merge Script

Typically, variables appear at the beginning of the Oracle provided script. There are lines starting with #TODO: that document the changes required for variables in the script.
Example:
#TODO: Edit this. Provide the replicat group name.
repName=RBD

#TODO: Edit this. Ensure each replicat uses a unique prefix.
stagingTablePrefix=${repName}_STAGE_

#TODO: Edit the AWS S3 bucket name.
bucket=

#TODO: Edit this variable as needed.
s3Location="'s3://${bucket}/${dir}/'"

#TODO: Edit AWS credentials awsKeyId and awsSecretKey
awsKeyId=
awsSecretKey=

The variables repName and stagingTablePrefix are relevant for all the data warehouse targets.

32.1.11 SQL Statements in the Merge Script

The SQL statements in the shell script need to be customized. There are lines starting with #TODO: that document the changes required for SQL statements.

In most cases, you need to double quote (") identifiers in the SQL statements. The double quote needs to be escaped in the script using a backslash, for example: \".

Oracle provides a working example of SQL statements for a single table with a pre-defined set of columns defined in the sample DDL file. You need to add new sections for your own tables as part of the if-else code block in the script.

Example:


if [ "${tableName}" == "DBO.TCUSTORD" ] then #TODO: Edit all the column names of the staging and target tables. # The merge SQL example here is configured for the example table defined in the DDL file. # Oracle provided SQL statements

# TODO: Add similar SQL queries for each table. elif [ "${tableName}" == "DBO.ANOTHER_TABLE" ] then

#Edit SQLs for this table. fi 32.1.12 Merge Script Functions

The script is coded to include the following shell functions:
• main
• validateParams
• process
• processTruncate
• processDML
• dropExternalTable
• createExternalTable
• merge
The script has code comments for you to infer the purpose of each function.

Merge Script main function

The function main is the entry point of the script. The processing of the staged change data file begins here. This function invokes two functions: validateParams and process.

The input parameters to the script are validated in the function validateParams.

Processing resumes in the process function if validation is successful.

Merge Script process function

This function processes the operation records in the staged change data file and invokes processTruncate or processDML as needed.

Truncate operation records are handled in the function processTruncate. Insert, Update, and Delete operation records are handled in the function processDML.

Merge Script merge function

The merge function invoked by the function processDML contains the merge SQL statement that will be executed for each table. The key columns to be used in the merge SQL's ON clause need to be customized.


To handle key columns with null values, the ON clause uses data warehouse specific NVL functions. Example for a single key column "C01Key":
ON ((NVL(CAST(TARGET.\"C01Key\" AS VARCHAR(4000)),'${uuid}')=NVL(CAST(STAGE.\"C01Key\" AS VARCHAR(4000)),'${uuid}')))

The column names in the merge statement's update and insert clauses also need to be customized for every table.
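For orientation only, here is a generic sketch of such a merge. The table and column names (TCUSTORD, "C01Key", "C02") are illustrative, the optype value 'D' for deletes is an assumption, and the exact syntax and null-handling functions vary by data warehouse:

MERGE INTO TARGETSCHEMA."TCUSTORD" TARGET
USING RBD_STAGE_TCUSTORD STAGE
ON (NVL(CAST(TARGET."C01Key" AS VARCHAR(4000)),'${uuid}') =
    NVL(CAST(STAGE."C01Key" AS VARCHAR(4000)),'${uuid}'))
WHEN MATCHED AND STAGE.optype = 'D' THEN DELETE
WHEN MATCHED AND STAGE.optype <> 'D' THEN
  UPDATE SET TARGET."C02" = STAGE."C02"
WHEN NOT MATCHED AND STAGE.optype <> 'D' THEN
  INSERT ("C01Key", "C02") VALUES (STAGE."C01Key", STAGE."C02");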

Merge Script createExternalTable function

The createExternalTable function invoked by the function processDML creates an external table that is backed by the staged file in the respective object store. In this function, the DDL SQL statement for the external table should be customized for every target table to include all the target table columns. In addition to the target table columns, the external table definition also consists of three meta-columns: optype, position, and fieldmask.

The data type of the meta-columns should not be modified. The position of the meta-columns should not be modified in the DDL statement.

32.1.13 Prerequisites

• The Command handler merge scripts are available starting from Oracle GoldenGate for Big Data release 19.1.0.0.8.
• The respective data warehouse's command line programs to execute SQL queries must be installed on the machine where Oracle GoldenGate for Big Data is installed.

32.1.14 Limitations

Primary key update operations are split into a delete and insert pair. If the Oracle GoldenGate trail file doesn't contain column values for all the columns in the respective table, then the missing columns get updated to null on the target table.

32.2 Snowflake Stage and Merge

Snowflake is a serverless data warehouse. Snowflake can run its compute nodes on any of the following cloud providers: AWS, GCP, or Azure. Snowflake external tables can be used to read object store files. Snowflake can read data files from object stores in AWS S3, Azure Data Lake, or Google Cloud Storage.
• Configuration: The directory AdapterExamples/big-data/data-warehouse-utils/snowflake/ in the Oracle GoldenGate for Big Data install contains all the configuration and scripts needed for Snowflake replication using stage and merge.


32.2.1 Configuration

The directory AdapterExamples/big-data/data-warehouse-utils/snowflake/ in the Oracle GoldenGate for Big Data install contains all the configuration and scripts needed for Snowflake replication using stage and merge. The following are the files:
• sf.prm: The replicat parameter file.
• sf.props: The replicat properties file that stages data to AWS S3 and runs the Command Event handler.
• sf.sh: The bash-shell script that reads data staged in AWS S3 and merges data to the Snowflake target table.
• sf-az.props: The replicat properties file that stages data to Azure Data Lake Gen 2 and runs the Command Event handler.
• sf-az.sh: The bash-shell script that reads data staged in Azure Data Lake Gen 2 and merges data to the Snowflake target table.
• sf-ddl.sql: The DDL statement of the sample target table used in the scripts sf.sh and sf-az.sh.
Edit the properties indicated by the #TODO: comments in the properties files sf.props and sf-az.props.
The bash-shell script functions createExternalTable() and merge() contain SQL statements that need to be customized for your target tables.

32.3 Snowflake on AWS

• Data Flow
• Merge Script Variables

32.3.1 Data Flow

• The File Writer (FW) handler is configured to generate files in Avro Object Container Format (OCF).
• The Avro OCF files are uploaded to an S3 bucket using the GoldenGate S3 Event handler.
• The Command Event handler passes the S3 object store file metadata to the sf.sh script.

32.3.2 Merge Script Variables

The following variables need to be modified as required:

#TODO: Edit this. Provide the replicat group name.
repName=RBD

#TODO: Edit this. Ensure each replicat uses a unique prefix.
stagingTablePrefix=${repName}_STAGE_

#TODO: Edit the AWS S3 bucket name.


bucket=

#TODO: Edit this variable as needed.
s3Location="'s3://${bucket}/${dir}/'"

#TODO: Edit AWS credentials awsKeyId and awsSecretKey
awsKeyId=
awsSecretKey=

#TODO: Edit the Snowflake account name, database, username and password in the function executeQuery()
sfAccount=
sfRegion=
sfDatabase=
sfUser=
sfPassword=
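For illustration only (the executeQuery() function shipped in sf.sh may differ), these variables could be wired into a snowsql call roughly as follows. Combining the account and region with a dot is an assumption about the account identifier format:

# Sketch of an executeQuery()-style helper built on the snowsql client.
executeQuery() {
  local sql="$1"
  # snowsql reads the password from the SNOWSQL_PWD environment variable.
  SNOWSQL_PWD="${sfPassword}" snowsql \
    -a "${sfAccount}.${sfRegion}" \
    -u "${sfUser}" \
    -d "${sfDatabase}" \
    -o friendly=false -o exit_on_error=true \
    -q "${sql}"
}

# Example usage:
# executeQuery "SELECT CURRENT_VERSION();"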

32.4 Snowflake on Azure

• Data Flow
• Merge Script Variables
• Prerequisites

32.4.1 Data Flow

• The File Writer (FW) handler is configured to generate files in Avro Object Container Format (OCF).
• The Avro OCF files are uploaded to a container in the Azure Storage Account (Azure Data Lake Gen 2) using the HDFS Event handler.
• The Command Event handler passes the Azure Data Lake Gen 2 object store file metadata to the sf-az.sh script.

32.4.2 Merge Script Variables

The following variables need to be modified as required:

#TODO: Edit this. Provide the replicat group name.
repName=RBD

#TODO: Edit this. Ensure each replicat uses a unique prefix.
stagingTablePrefix=${repName}_STAGE_

#TODO: Edit the Azure Storage account.
azureStorageAccount=
#TODO: Edit the Azure Container.
azureContainer=

#TODO: Edit the Snowflake storage integration to access Azure Data Lake.
#TODO: Instructions to create a storage integration are documented here: https://docs.snowflake.com/en/user-guide/data-load-azure-config.html#option-1-configuring-a-snowflake-storage-integration
storageIntegration=

#TODO: Edit the Snowflake account name, database, username and password in the function executeQuery()
sfAccount=


sfRegion=
sfDatabase=
sfUser=
sfPassword=
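The following is only a rough sketch of creating such a storage integration through snowsql, modeled on the Snowflake documentation linked above; the integration name and tenant ID are placeholders, and the linked page remains the authoritative reference for the syntax:

# Sketch: create a storage integration for the Azure container used for staging.
SNOWSQL_PWD="${sfPassword}" snowsql -a "${sfAccount}" -u "${sfUser}" -d "${sfDatabase}" -q "
CREATE STORAGE INTEGRATION OGG_AZURE_INT
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '<your-azure-tenant-id>'
  STORAGE_ALLOWED_LOCATIONS = ('azure://${azureStorageAccount}.blob.core.windows.net/${azureContainer}/');"

# The script variable is then set to the integration name:
storageIntegration=OGG_AZURE_INT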

32.4.3 Prerequisites

The merge scripts require the snowsql command line program to be installed on the machine where the Oracle GoldenGate for Big Data replicat is installed.

32.5 Google BigQuery Stage and Merge

BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real time.
• Data Flow
• Configuration: The directory AdapterExamples/big-data/data-warehouse-utils/bigquery/ in the Oracle GoldenGate for Big Data install contains all the configuration and scripts needed for replication to BigQuery using Stage and Merge.
• Merge Script Variables
• Prerequisites

32.5.1 Data Flow

• The File Writer (FW) handler is configured to generate files in Avro Object Container Format (OCF).
• The Command Event handler passes the local Avro OCF file metadata to the script bq.sh.

32.5.2 Configuration

The directory AdapterExamples/big-data/data-warehouse-utils/bigquery/ in the Oracle GoldenGate for Big Data install contains all the configuration and scripts needed for replication to BigQuery using Stage and Merge. The following are the files:
• bq.prm: The replicat parameter file.
• bq.props: The replicat properties file that generates local files in Avro OCF format and runs the Command Event handler.
• bq.sh: The bash-shell script that uploads the files to Google Cloud Storage (GCS) and merges the change data in GCS onto BigQuery target tables.
• bq-ddl.sql: The DDL statement that contains the sample target table used in the script bq.sh.
The bash-shell script function mergeIntoBQ() contains SQL statements that need to be customized for your target tables.


32.5.3 Merge Script Variables

The following variables need to be modified as required:

#TODO: Edit the replicat group name.
repName=BQ

#TODO: Edit this. Ensure each replicat uses a unique prefix.
stagingTablePrefix=${repName}_STAGE_

#TODO: Edit the GCS bucket name.
bucket=sanav_bucket_us

32.5.4 Prerequisites

The merge script bq.sh requires the Google Cloud command line programs gsutil and bq to be installed on the machine where the Oracle GoldenGate for Big Data replicat is installed.
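For illustration only (the staging table, dataset, column, and file names below are placeholders, and bq.sh as shipped may organize the steps differently), the two programs are typically used along these lines:

# Upload a staged Avro OCF file to GCS (bucket variable from the script above).
gsutil cp /tmp/TCUSTORD_0001.avro "gs://${bucket}/ogg/"

# Load the staged file into a BigQuery staging table; the schema is read
# from the Avro file itself.
bq load --source_format=AVRO my_dataset.BQ_STAGE_TCUSTORD "gs://${bucket}/ogg/TCUSTORD_0001.avro"

# Merge the staging table into the target table using standard SQL.
bq query --use_legacy_sql=false "MERGE my_dataset.TCUSTORD T
USING my_dataset.BQ_STAGE_TCUSTORD S
ON T.C01Key = S.C01Key
WHEN MATCHED THEN UPDATE SET C02Name = S.C02Name
WHEN NOT MATCHED THEN INSERT (C01Key, C02Name) VALUES (S.C01Key, S.C02Name)"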

32.6 Hive Stage and Merge

Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools to enable easy data ETL, a mechanism to put structures on the data, and the capability for querying and analysis of large data sets stored in Hadoop files.
• Data Flow
• Configuration: The directory AdapterExamples/big-data/data-warehouse-utils/hive/ in the Oracle GoldenGate for Big Data install contains all the configuration and scripts needed for replication to Hive using stage and merge.
• Merge Script Variables
• Prerequisites

32.6.1 Data Flow

• The File Writer (FW) handler is configured to generate files in Avro Object Container Format (OCF).
• The HDFS Event handler is used to push the Avro OCF files into Hadoop.
• The Command Event handler passes the Hadoop file metadata to the hive.sh script.

32.6.2 Configuration

The directory AdapterExamples/big-data/data-warehouse-utils/hive/ in the Oracle GoldenGate for Big Data install contains all the configuration and scripts needed for replication to Hive using stage and merge. The following are the files:
• hive.prm: The replicat parameter file.


• hive.props: The replicat properties file that stages data to Hadoop and runs the Command Event handler.
• hive.sh: The bash-shell script that reads data staged in Hadoop and merges data to the Hive target table.
• hive-ddl.sql: The DDL statement that contains the sample target table used in the script hive.sh.
Edit the properties indicated by the #TODO: comments in the properties file hive.props.

The bash-shell script function merge() contains SQL statements that need to be customized for your target tables.

32.6.3 Merge Script Variables

Modify the variables as needed:

#TODO: Modify the location of the OGGBD dirdef directory where the Avro schema files exist.
avroSchemaDir=/opt/ogg/dirdef

#TODO: Edit the JDBC URL to connect to hive.
hiveJdbcUrl=jdbc:hive2://localhost:10000/default
#TODO: Edit the JDBC user to connect to hive.
hiveJdbcUser=APP
#TODO: Edit the JDBC password to connect to hive.
hiveJdbcPassword=mine

#TODO: Edit the replicat group name.
repName=HIVE

#TODO: Edit this. Ensure each replicat uses a unique prefix.
stagingTablePrefix=${repName}_STAGE_

32.6.4 Prerequisites

The following are the prerequisites:
• The merge script hive.sh requires the beeline command line program to be installed on the machine where the Oracle GoldenGate for Big Data replicat is installed.
• The custom script hive.sh uses the merge SQL statement. Hive Query Language (Hive QL) introduced support for merge in Hive version 2.2.
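For illustration only (the target table must be a transactional Hive table, and the table and column names here are placeholders), the script's JDBC variables can drive a beeline invocation roughly like this:

# Run a HiveQL merge through beeline using the JDBC variables from hive.sh.
beeline -u "${hiveJdbcUrl}" -n "${hiveJdbcUser}" -p "${hiveJdbcPassword}" -e "
MERGE INTO TCUSTORD AS T
USING ${stagingTablePrefix}TCUSTORD AS S
ON T.C01Key = S.C01Key
WHEN MATCHED THEN UPDATE SET C02Name = S.C02Name
WHEN NOT MATCHED THEN INSERT VALUES (S.C01Key, S.C02Name);"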

A Google BigQuery Dependencies

The Google BigQuery client libraries are required for integration with BigQuery. The Maven coordinates are as follows:
Maven groupId: com.google.cloud
Maven artifactId: google-cloud-bigquery
Version: 1.111.1
• BigQuery 1.111.1
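For example, the aggregating client artifact and its transitive dependencies listed in the next section can be downloaded into the local Maven repository with the Maven dependency plugin; this is only one way to obtain the jars:

# Resolve google-cloud-bigquery 1.111.1 and its transitive dependencies into
# the local Maven repository (~/.m2/repository by default).
mvn dependency:get -Dartifact=com.google.cloud:google-cloud-bigquery:1.111.1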

A.1 BigQuery 1.111.1

The required BigQuery client libraries for the 1.111.1 version are as follows:

api-common-1.9.0.jar auto-value-annotations-1.7.jar checker-compat-qual-2.5.5.jar commons-codec-1.11.jar commons-logging-1.2.jar error_prone_annotations-2.3.4.jar failureaccess-1.0.1.jar gax-1.56.0.jar gax-httpjson-0.73.0.jar google-api-client-1.30.9.jar google-api-services-bigquery-v2-rev20200324-1.30.9.jar google-auth-library-credentials-0.20.0.jar google-auth-library-oauth2-http-0.20.0.jar google-cloud-bigquery-1.111.1.jar google-cloud-core-1.93.4.jar google-cloud-core-http-1.93.4.jar google-http-client-1.34.2.jar google-http-client-appengine-1.34.2.jar google-http-client-jackson2-1.34.2.jar google-oauth-client-1.30.5.jar -context-1.29.0.jar gson-2.8.6.jar guava-29.0-android.jar httpclient-4.5.11.jar httpcore-4.4.13.jar j2objc-annotations-1.3.jar jackson-core-2.10.2.jar javax.annotation-api-1.3.2.jar jsr305-3.0.2.jar listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar opencensus-api-0.24.0.jar opencensus-contrib-http-util-0.24.0.jar protobuf-java-3.11.4.jar protobuf-java-util-3.11.4.jar proto-google-common-protos-1.17.0.jar


proto-google-iam-v1-0.13.0.jar threetenbp-1.4.3.jar

B Cassandra Handler Client Dependencies

What are the dependencies for the Cassandra Handler to connect to Apache Cassandra databases? The Maven central repository artifacts for Cassandra databases are:
Maven groupId: com.datastax.cassandra

Maven artifactId: cassandra-driver-core

Maven version: the Cassandra version numbers listed for each section
• Cassandra Datastax Java Driver 3.1.0

B.1 Cassandra Datastax Java Driver 3.1.0

cassandra-driver-core-3.1.0.jar cassandra-driver-extras-3.1.0.jar cassandra-driver-mapping-3.1.0.jar asm-5.0.3.jar asm-analysis-5.0.3.jar asm-commons-5.0.3.jar asm-tree-5.0.3.jar asm-util-5.0.3.jar guava-16.0.1.jar HdrHistogram-2.1.9.jar jackson-annotations-2.6.0.jar jackson-core-2.6.3.jar jackson-databind-2.6.3.jar javax.json-api-1.0.jar jffi-1.2.10.jar jffi-1.2.10-native.jar jnr-constants-0.9.0.jar jnr-ffi-2.0.7.jar jnr-posix-3.0.27.jar jnr-x86asm-1.0.2.jar joda-time-2.9.1.jar lz4-1.3.0.jar metrics-core-3.1.2.jar netty-buffer-4.0.37.Final.jar netty-codec-4.0.37.Final.jar netty-common-4.0.37.Final.jar netty-handler-4.0.37.Final.jar netty-transport-4.0.37.Final.jar slf4j-api-1.7.7.jar snappy-java-1.1.2.6.jar

C Cassandra Capture Client Dependencies

What are the dependencies for the Cassandra Capture (Extract) to connect to Apache Cassandra databases? The following third-party libraries are needed to run Cassandra Change Data Capture.

cassandra-driver-core (com.datastax.cassandra)

Download version 3.3.1 from Maven central at: https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core
Maven coordinates:

com.datastax.cassandra
cassandra-driver-core
3.3.1

cassandra-all (org.apache.cassandra)

When using the 3.9 SDK (see the TRANLOGOPTIONS CDCREADERSDKVERSION parameter), download version 3.9 from Maven central at: https://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all/3.9
Maven coordinates:

org.apache.cassandra
cassandra-all
3.9

When using the 3.10 or 3.11 SDK, download version 3.11.0 from Maven central at: https://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all/3.11.0
Maven coordinates:

org.apache.cassandra
cassandra-all
3.11.0

commons-lang3 (org.apache.commons)

Download version 3.5 from Maven central at: https://mvnrepository.com/artifact/org.apache.commons/commons-lang3


Maven coordinates

org.apache.commons
commons-lang3
3.5

gson (com.google.code.gson)

Download version 2.8.0 from Maven central at: https://mvnrepository.com/artifact/com.google.code.gson/gson/2.8.0
Maven coordinates:

com.google.code.gson
gson
2.8.0

jsch (com.jcraft)

Download version 0.1.54 from Maven central at: https://mvnrepository.com/artifact/com.jcraft/jsch/0.1.54
Maven coordinates:

com.jcraft
jsch
0.1.54

D Elasticsearch Handler Transport Client Dependencies

What are the dependencies for the Elasticsearch Handler to connect to Elasticsearch databases? The Maven central repository artifacts for Elasticsearch databases are:
Maven groupId: org.elasticsearch

Maven artifactId: elasticsearch

Maven version: 7.7.1

• Elasticsearch 7.1.1 with X-Pack 7.1.1
• Elasticsearch 6.2.3 with X-Pack 6.2.3
• Elasticsearch 5.1.2 with X-Pack 5.1.2

D.1 Elasticsearch 7.1.1 with X-Pack 7.1.1

commons-codec-1.11.jar commons-logging-1.1.3.jar compiler-0.9.3.jar elasticsearch-7.1.1.jar elasticsearch-cli-7.1.1.jar elasticsearch-core-7.1.1.jar elasticsearch-geo-7.1.1.jar elasticsearch-nio-7.1.1.jar elasticsearch-rest-client-7.1.1.jar elasticsearch-secure-sm-7.1.1.jar elasticsearch-ssl-config-7.1.1.jar elasticsearch-x-content-7.1.1.jar HdrHistogram-2.1.9.jar hppc-0.7.1.jar httpasyncclient-4.1.4.jar httpclient-4.5.7.jar httpcore-4.4.11.jar httpcore-nio-4.4.11.jar jackson-core-2.8.11.jar jackson-dataformat--2.8.11.jar jackson-dataformat-smile-2.8.11.jar jackson-dataformat-yaml-2.8.11.jar jna-4.5.1.jar joda-time-2.10.1.jar jopt-simple-5.0.2.jar lang-mustache-client-7.1.1.jar log4j-api-2.11.1.jar lucene-analyzers-common-8.0.0.jar lucene-backward-codecs-8.0.0.jar lucene-core-8.0.0.jar lucene-grouping-8.0.0.jar


lucene-highlighter-8.0.0.jar lucene-join-8.0.0.jar lucene-memory-8.0.0.jar lucene-misc-8.0.0.jar lucene-queries-8.0.0.jar lucene-queryparser-8.0.0.jar lucene-sandbox-8.0.0.jar lucene-spatial3d-8.0.0.jar lucene-spatial-8.0.0.jar lucene-spatial-extras-8.0.0.jar lucene-suggest-8.0.0.jar netty-buffer-4.1.32.Final.jar netty-codec-4.1.32.Final.jar netty-codec-http-4.1.32.Final.jar netty-common-4.1.32.Final.jar netty-handler-4.1.32.Final.jar netty-resolver-4.1.32.Final.jar netty-transport-4.1.32.Final.jar parent-join-client-7.1.1.jar percolator-client-7.1.1.jar rank-eval-client-7.1.1.jar reindex-client-7.1.1.jar snakeyaml-1.17.jar t-digest-3.2.jar transport-7.1.1.jar transport-netty4-client-7.1.1.jar transport-nio-client-7.1.1.jar unboundid-ldapsdk-4.0.8.jar x-pack-core-7.1.1.jar x-pack-transport-7.1.1.jar D.2 Elasticsearch 6.2.3 with X-Pack 6.2.3

bcpkix-jdk15on-1.58.jar bcprov-jdk15on-1.58.jar commons-codec-1.10.jar commons-logging-1.1.3.jar compiler-0.9.3.jar elasticsearch-6.2.3.jar elasticsearch-cli-6.2.3.jar elasticsearch-core-6.2.3.jar elasticsearch-rest-client-6.2.3.jar HdrHistogram-2.1.9.jar hppc-0.7.1.jar httpasyncclient-4.1.2.jar httpclient-4.5.2.jar httpcore-4.4.5.jar httpcore-nio-4.4.5.jar jackson-core-2.8.10.jar jackson-dataformat-cbor-2.8.10.jar jackson-dataformat-smile-2.8.10.jar jackson-dataformat-yaml-2.8.10.jar jna-4.5.1.jar joda-time-2.9.9.jar jopt-simple-5.0.2.jar jts-1.13.jar lang-mustache-client-6.2.3.jar log4j-api-2.9.1.jar log4j-core-2.9.1.jar lucene-analyzers-common-7.2.1.jar


lucene-backward-codecs-7.2.1.jar lucene-core-7.2.1.jar lucene-grouping-7.2.1.jar lucene-highlighter-7.2.1.jar lucene-join-7.2.1.jar lucene-memory-7.2.1.jar lucene-misc-7.2.1.jar lucene-queries-7.2.1.jar lucene-queryparser-7.2.1.jar lucene-sandbox-7.2.1.jar lucene-spatial3d-7.2.1.jar lucene-spatial-7.2.1.jar lucene-spatial-extras-7.2.1.jar lucene-suggest-7.2.1.jar netty-buffer-4.1.16.Final.jar netty-codec-4.1.16.Final.jar netty-codec-http-4.1.16.Final.jar netty-common-4.1.16.Final.jar netty-handler-4.1.16.Final.jar netty-resolver-4.1.16.Final.jar netty-transport-4.1.16.Final.jar parent-join-client-6.2.3.jar percolator-client-6.2.3.jar rank-eval-client-6.2.3.jar reindex-client-6.2.3.jar securesm-1.2.jar snakeyaml-1.17.jar spatial4j-0.6.jar t-digest-3.0.jar transport-6.2.3.jar transport-netty4-client-6.2.3.jar unboundid-ldapsdk-3.2.0.jar x-pack-api-6.2.3.jar x-pack-transport-6.2.3.jar D.3 Elasticsearch 5.1.2 with X-Pack 5.1.2

commons-codec-1.10.jar commons-logging-1.1.3.jar compiler-0.9.3.jar elasticsearch-5.1.2.jar HdrHistogram-2.1.6.jar hppc-0.7.1.jar httpasyncclient-4.1.2.jar httpclient-4.5.2.jar httpcore-4.4.5.jar httpcore-nio-4.4.5.jar jackson-core-2.8.1.jar jackson-dataformat-cbor-2.8.1.jar jackson-dataformat-smile-2.8.1.jar jackson-dataformat-yaml-2.8.1.jar jna-4.2.2.jar joda-time-2.9.5.jar jopt-simple-5.0.2.jar lang-mustache-client-5.1.2.jar lucene-analyzers-common-6.3.0.jar lucene-backward-codecs-6.3.0.jar lucene-core-6.3.0.jar lucene-grouping-6.3.0.jar lucene-highlighter-6.3.0.jar


lucene-join-6.3.0.jar lucene-memory-6.3.0.jar lucene-misc-6.3.0.jar lucene-queries-6.3.0.jar lucene-queryparser-6.3.0.jar lucene-sandbox-6.3.0.jar lucene-spatial3d-6.3.0.jar lucene-spatial-6.3.0.jar lucene-spatial-extras-6.3.0.jar lucene-suggest-6.3.0.jar netty-3.10.6.Final.jar netty-buffer-4.1.6.Final.jar netty-codec-4.1.6.Final.jar netty-codec-http-4.1.6.Final.jar netty-common-4.1.6.Final.jar netty-handler-4.1.6.Final.jar netty-resolver-4.1.6.Final.jar netty-transport-4.1.6.Final.jar percolator-client-5.1.2.jar reindex-client-5.1.2.jar rest-5.1.2.jar securesm-1.1.jar snakeyaml-1.15.jar t-digest-3.0.jar transport-5.1.2.jar transport-netty3-client-5.1.2.jar transport-netty4-client-5.1.2.jar x-pack-transport-5.1.2.jar

E Elasticsearch High Level REST Client Dependencies

The Maven coordinates for the Elasticsearch High Level REST client are:
Maven groupId: org.elasticsearch.client

Maven artifactId: elasticsearch-rest-high-level-client

Maven version: 7.6.1

Note:

Ensure that you do not mix versions in the jar file dependency stack for the Elasticsearch High Level REST Client. Mixing versions results in dependency conflicts.

• Elasticsearch 7.6.1

E.1 Elasticsearch 7.6.1

aggs-matrix-stats-client-7.6.1.jar commons-codec-1.11.jar commons-logging-1.1.3.jar compiler-0.9.6.jar elasticsearch-7.6.1.jar elasticsearch-cli-7.6.1.jar elasticsearch-core-7.6.1.jar elasticsearch-geo-7.6.1.jar elasticsearch-rest-client-7.6.1.jar elasticsearch-rest-high-level-client-7.6.1.jar elasticsearch-secure-sm-7.6.1.jar elasticsearch-x-content-7.6.1.jar HdrHistogram-2.1.9.jar hppc-0.8.1.jar httpasyncclient-4.1.4.jar httpclient-4.5.10.jar httpcore-4.4.12.jar httpcore-nio-4.4.12.jar jackson-core-2.8.11.jar jackson-dataformat-cbor-2.8.11.jar jackson-dataformat-smile-2.8.11.jar jackson-dataformat-yaml-2.8.11.jar jna-4.5.1.jar joda-time-2.10.4.jar jopt-simple-5.0.2.jar lang-mustache-client-7.6.1.jar log4j-api-2.11.1.jar lucene-analyzers-common-8.4.0.jar lucene-backward-codecs-8.4.0.jar


lucene-core-8.4.0.jar lucene-grouping-8.4.0.jar lucene-highlighter-8.4.0.jar lucene-join-8.4.0.jar lucene-memory-8.4.0.jar lucene-misc-8.4.0.jar lucene-queries-8.4.0.jar lucene-queryparser-8.4.0.jar lucene-sandbox-8.4.0.jar lucene-spatial3d-8.4.0.jar lucene-spatial-8.4.0.jar lucene-spatial-extras-8.4.0.jar lucene-suggest-8.4.0.jar mapper-extras-client-7.6.1.jar parent-join-client-7.6.1.jar rank-eval-client-7.6.1.jar snakeyaml-1.17.jar t-digest-3.2.jar

F HBase Handler Client Dependencies

What are the dependencies for the HBase Handler to connect to Apache HBase databases? The Maven central repository artifacts for HBase databases are:
• Maven groupId: org.apache.hbase
• Maven artifactId: hbase-client
• Maven version: the HBase version numbers listed for each section
The hbase-client-x.x.x.jar file is not distributed with Apache HBase, nor is it mandatory to be in the classpath. The hbase-client-x.x.x.jar file is an empty Maven project whose purpose is to aggregate all of the HBase client dependencies.
• HBase 2.2.0
• HBase 2.1.5
• HBase 2.0.5
• HBase 1.4.10
• HBase 1.3.3
• HBase 1.2.5
• HBase 1.1.1
• HBase 1.0.1.1

F.1 HBase 2.2.0

apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar audience-annotations-0.5.0.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.10.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-crypto-1.0.0.jar commons-digester-1.8.jar commons-io-2.5.jar commons-lang-2.6.jar commons-lang3-3.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar


curator-framework-2.7.1.jar curator-recipes-2.7.1.jar error_prone_annotations-2.3.3.jar findbugs-annotations-1.3.9-1.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.8.5.jar hadoop-auth-2.8.5.jar hadoop-common-2.8.5.jar hamcrest-core-1.3.jar hbase-client-2.2.0.jar hbase-common-2.2.0.jar hbase-hadoop2-compat-2.2.0.jar hbase-hadoop-compat-2.2.0.jar hbase-metrics-2.2.0.jar hbase-metrics-api-2.2.0.jar hbase-protocol-2.2.0.jar hbase-protocol-shaded-2.2.0.jar hbase-shaded-miscellaneous-2.2.1.jar hbase-shaded-netty-2.2.1.jar hbase-shaded-protobuf-2.2.1.jar htrace-core4-4.2.0-incubating.jar httpclient-4.5.2.jar httpcore-4.4.4.jar jackson-core-asl-1.9.13.jar jackson-mapper-asl-1.9.13.jar jcip-annotations-1.0-1.jar jcodings-1.0.18.jar jdk.tools-1.8.jar jetty-sslengine-6.1.26.jar joni-2.1.11.jar jsch-0.1.54.jar jsr305-3.0.0.jar junit-4.12.jar log4j-1.2.17.jar metrics-core-3.2.6.jar nimbus-jose-jwt-4.41.1.jar paranamer-2.3.jar protobuf-java-2.5.0.jar slf4j-api-1.7.25.jar slf4j-log4j12-1.6.1.jar snappy-java-1.0.4.1.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.10.jar F.2 HBase 2.1.5

apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar audience-annotations-0.5.0.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.10.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar


commons-configuration-1.6.jar commons-crypto-1.0.0.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.5.jar commons-lang-2.6.jar commons-lang3-3.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar curator-framework-2.7.1.jar curator-recipes-2.7.1.jar findbugs-annotations-1.3.9-1.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.7.7.jar hadoop-auth-2.7.7.jar hadoop-common-2.7.7.jar hamcrest-core-1.3.jar hbase-client-2.1.5.jar hbase-common-2.1.5.jar hbase-hadoop2-compat-2.1.5.jar hbase-hadoop-compat-2.1.5.jar hbase-metrics-2.1.5.jar hbase-metrics-api-2.1.5.jar hbase-protocol-2.1.5.jar hbase-protocol-shaded-2.1.5.jar hbase-shaded-miscellaneous-2.1.0.jar hbase-shaded-netty-2.1.0.jar hbase-shaded-protobuf-2.1.0.jar htrace-core-3.1.0-incubating.jar htrace-core4-4.2.0-incubating.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-annotations-2.9.0.jar jackson-core-2.9.2.jar jackson-core-asl-1.9.13.jar jackson-databind-2.9.2.jar jackson-mapper-asl-1.9.13.jar jcodings-1.0.18.jar jdk.tools-1.8.jar jetty-sslengine-6.1.26.jar joni-2.1.11.jar jsch-0.1.54.jar jsr305-3.0.0.jar junit-4.12.jar log4j-1.2.17.jar metrics-core-3.2.6.jar paranamer-2.3.jar protobuf-java-2.5.0.jar slf4j-api-1.7.25.jar slf4j-log4j12-1.6.1.jar snappy-java-1.0.4.1.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.10.jar


F.3 HBase 2.0.5

apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar audience-annotations-0.5.0.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.10.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-crypto-1.0.0.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.5.jar commons-lang-2.6.jar commons-lang3-3.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar curator-framework-2.7.1.jar curator-recipes-2.7.1.jar findbugs-annotations-1.3.9-1.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.7.7.jar hadoop-auth-2.7.7.jar hadoop-common-2.7.7.jar hamcrest-core-1.3.jar hbase-client-2.0.5.jar hbase-common-2.0.5.jar hbase-hadoop2-compat-2.0.5.jar hbase-hadoop-compat-2.0.5.jar hbase-metrics-2.0.5.jar hbase-metrics-api-2.0.5.jar hbase-protocol-2.0.5.jar hbase-protocol-shaded-2.0.5.jar hbase-shaded-miscellaneous-2.1.0.jar hbase-shaded-netty-2.1.0.jar hbase-shaded-protobuf-2.1.0.jar htrace-core4-4.2.0-incubating.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-annotations-2.9.0.jar jackson-core-2.9.2.jar jackson-core-asl-1.9.13.jar jackson-databind-2.9.2.jar jackson-mapper-asl-1.9.13.jar jcodings-1.0.18.jar jdk.tools-1.8.jar jetty-sslengine-6.1.26.jar joni-2.1.11.jar jsch-0.1.54.jar jsr305-3.0.0.jar


junit-4.12.jar log4j-1.2.17.jar metrics-core-3.2.1.jar paranamer-2.3.jar protobuf-java-2.5.0.jar slf4j-api-1.7.25.jar slf4j-log4j12-1.6.1.jar snappy-java-1.0.4.1.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.10.jar F.4 HBase 1.4.10

activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.7.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.9.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.2.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar curator-framework-2.7.1.jar curator-recipes-2.7.1.jar findbugs-annotations-1.3.9-1.jar gson-2.2.4.jar guava-12.0.1.jar hadoop-annotations-2.7.4.jar hadoop-auth-2.7.4.jar hadoop-common-2.7.4.jar hadoop-mapreduce-client-core-2.7.4.jar hadoop-yarn-api-2.7.4.jar hadoop-yarn-common-2.7.4.jar hamcrest-core-1.3.jar hbase-annotations-1.4.10.jar hbase-client-1.4.10.jar hbase-common-1.4.10.jar hbase-protocol-1.4.10.jar htrace-core-3.1.0-incubating.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.9.13.jar jackson-mapper-asl-1.9.13.jar jaxb-api-2.2.2.jar jcodings-1.0.8.jar jdk.tools-1.8.jar jetty-sslengine-6.1.26.jar


jetty-util-6.1.26.jar joni-2.1.2.jar jsch-0.1.54.jar jsr305-3.0.0.jar junit-4.12.jar log4j-1.2.17.jar metrics-core-2.2.0.jar netty-3.6.2.Final.jar netty-all-4.1.8.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar slf4j-api-1.6.1.jar slf4j-log4j12-1.6.1.jar snappy-java-1.0.5.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.10.jar F.5 HBase 1.3.3

activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.9.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-el-1.0.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.2.jar commons-math3-3.1.1.jar commons-net-3.1.jar findbugs-annotations-1.3.9-1.jar guava-12.0.1.jar hadoop-annotations-2.5.1.jar hadoop-auth-2.5.1.jar hadoop-common-2.5.1.jar hadoop-mapreduce-client-core-2.5.1.jar hadoop-yarn-api-2.5.1.jar hadoop-yarn-common-2.5.1.jar hamcrest-core-1.3.jar hbase-annotations-1.3.3.jar hbase-client-1.3.3.jar hbase-common-1.3.3.jar hbase-protocol-1.3.3.jar htrace-core-3.1.0-incubating.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.9.13.jar jackson-mapper-asl-1.9.13.jar


jaxb-api-2.2.2.jar jcodings-1.0.8.jar jdk.tools-1.6.jar jetty-util-6.1.26.jar joni-2.1.2.jar jsch-0.1.42.jar jsr305-1.3.9.jar junit-4.12.jar log4j-1.2.17.jar metrics-core-2.2.0.jar netty-3.6.2.Final.jar netty-all-4.0.50.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar slf4j-api-1.6.1.jar slf4j-log4j12-1.6.1.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar F.6 HBase 1.2.5

activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.9.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-el-1.0.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.2.jar commons-math3-3.1.1.jar commons-net-3.1.jar findbugs-annotations-1.3.9-1.jar guava-12.0.1.jar hadoop-annotations-2.5.1.jar hadoop-auth-2.5.1.jar hadoop-common-2.5.1.jar hadoop-mapreduce-client-core-2.5.1.jar hadoop-yarn-api-2.5.1.jar hadoop-yarn-common-2.5.1.jar hamcrest-core-1.3.jar hbase-annotations-1.2.5.jar hbase-client-1.2.5.jar hbase-common-1.2.5.jar hbase-protocol-1.2.5.jar htrace-core-3.1.0-incubating.jar httpclient-4.2.5.jar


httpcore-4.2.4.jar jackson-core-asl-1.9.13.jar jackson-mapper-asl-1.9.13.jar jaxb-api-2.2.2.jar jcodings-1.0.8.jar jdk.tools-1.6.jar jetty-util-6.1.26.jar joni-2.1.2.jar jsch-0.1.42.jar jsr305-1.3.9.jar junit-4.12.jar log4j-1.2.17.jar metrics-core-2.2.0.jar netty-3.6.2.Final.jar netty-all-4.0.23.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar slf4j-api-1.6.1.jar slf4j-log4j12-1.6.1.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar F.7 HBase 1.1.1

HBase 1.1.1 is effectively the same as HBase 1.1.0.1. You can substitute 1.1.0.1 in the libraries that are versioned as 1.1.1.

activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.9.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-el-1.0.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.2.jar commons-math3-3.1.1.jar commons-net-3.1.jar findbugs-annotations-1.3.9-1.jar guava-12.0.1.jar hadoop-annotations-2.5.1.jar hadoop-auth-2.5.1.jar hadoop-common-2.5.1.jar hadoop-mapreduce-client-core-2.5.1.jar hadoop-yarn-api-2.5.1.jar hadoop-yarn-common-2.5.1.jar hamcrest-core-1.3.jar


hbase-annotations-1.1.1.jar hbase-client-1.1.1.jar hbase-common-1.1.1.jar hbase-protocol-1.1.1.jar htrace-core-3.1.0-incubating.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.9.13.jar jackson-mapper-asl-1.9.13.jar jaxb-api-2.2.2.jar jcodings-1.0.8.jar jdk.tools-1.7.jar jetty-util-6.1.26.jar joni-2.1.2.jar jsch-0.1.42.jar jsr305-1.3.9.jar junit-4.11.jar log4j-1.2.17.jar netty-3.6.2.Final.jar netty-all-4.0.23.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar slf4j-api-1.6.1.jar slf4j-log4j12-1.6.1.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar F.8 HBase 1.0.1.1

activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.9.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-el-1.0.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.2.jar commons-math3-3.1.1.jar commons-net-3.1.jar findbugs-annotations-1.3.9-1.jar guava-12.0.1.jar hadoop-annotations-2.5.1.jar hadoop-auth-2.5.1.jar hadoop-common-2.5.1.jar hadoop-mapreduce-client-core-2.5.1.jar hadoop-yarn-api-2.5.1.jar


hadoop-yarn-common-2.5.1.jar hamcrest-core-1.3.jar hbase-annotations-1.0.1.1.jar hbase-client-1.0.1.1.jar hbase-common-1.0.1.1.jar hbase-protocol-1.0.1.1.jar htrace-core-3.1.0-incubating.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.8.8.jar jackson-mapper-asl-1.8.8.jar jaxb-api-2.2.2.jar jcodings-1.0.8.jar jdk.tools-1.7.jar jetty-util-6.1.26.jar joni-2.1.2.jar jsch-0.1.42.jar jsr305-1.3.9.jar junit-4.11.jar log4j-1.2.17.jar netty-3.6.2.Final.jar netty-all-4.0.23.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar slf4j-api-1.6.1.jar slf4j-log4j12-1.6.1.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar

G HDFS Handler Client Dependencies

This appendix lists the HDFS client dependencies for Apache Hadoop. The hadoop-client-x.x.x.jar is not distributed with Apache Hadoop, nor is it mandatory to be in the classpath. The hadoop-client-x.x.x.jar is an empty Maven project with the purpose of aggregating all of the Hadoop client dependencies.
Maven groupId: org.apache.hadoop

Maven artifactId: hadoop-client

Maven version: the HDFS version numbers listed for each section

• Hadoop Client Dependencies

G.1 Hadoop Client Dependencies

This section lists the Hadoop client dependencies for each HDFS version.
• HDFS 3.2.0
• HDFS 3.1.1
• HDFS 3.0.3
• HDFS 2.9.2
• HDFS 2.8.0
• HDFS 2.7.1
• HDFS 2.6.0
• HDFS 2.5.2
• HDFS 2.4.1
• HDFS 2.3.0
• HDFS 2.2.0

G.1.1 HDFS 3.2.0

accessors-smart-1.2.jar asm-5.0.4.jar avro-1.7.7.jar commons-beanutils-1.9.3.jar commons-cli-1.2.jar commons-codec-1.11.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration2-2.1.1.jar commons-io-2.5.jar commons-lang3-3.7.jar commons-logging-1.1.3.jar


commons-math3-3.1.1.jar commons-net-3.6.jar commons-text-1.4.jar curator-client-2.12.0.jar curator-framework-2.12.0.jar curator-recipes-2.12.0.jar dnsjava-2.1.7.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-3.2.0.jar hadoop-auth-3.2.0.jar hadoop-client-3.2.0.jar hadoop-common-3.2.0.jar hadoop-hdfs-client-3.2.0.jar hadoop-mapreduce-client-common-3.2.0.jar hadoop-mapreduce-client-core-3.2.0.jar hadoop-mapreduce-client-jobclient-3.2.0.jar hadoop-yarn-api-3.2.0.jar hadoop-yarn-client-3.2.0.jar hadoop-yarn-common-3.2.0.jar htrace-core4-4.1.0-incubating.jar httpclient-4.5.2.jar httpcore-4.4.4.jar jackson-annotations-2.9.5.jar jackson-core-2.9.5.jar jackson-core-asl-1.9.13.jar jackson-databind-2.9.5.jar jackson-jaxrs-base-2.9.5.jar jackson-jaxrs-json-provider-2.9.5.jar jackson-mapper-asl-1.9.13.jar jackson-module-jaxb-annotations-2.9.5.jar javax.servlet-api-3.1.0.jar jaxb-api-2.2.11.jar jcip-annotations-1.0-1.jar jersey-client-1.19.jar jersey-core-1.19.jar jersey-servlet-1.19.jar jetty-security-9.3.24.v20180605.jar jetty-servlet-9.3.24.v20180605.jar jetty-util-9.3.24.v20180605.jar jetty-webapp-9.3.24.v20180605.jar jetty-xml-9.3.24.v20180605.jar json-smart-2.3.jar jsp-api-2.1.jar jsr305-3.0.0.jar jsr311-api-1.1.1.jar kerb-admin-1.0.1.jar kerb-client-1.0.1.jar kerb-common-1.0.1.jar kerb-core-1.0.1.jar kerb-crypto-1.0.1.jar kerb-identity-1.0.1.jar kerb-server-1.0.1.jar kerb-simplekdc-1.0.1.jar kerb-util-1.0.1.jar kerby-asn1-1.0.1.jar kerby-config-1.0.1.jar kerby-pkix-1.0.1.jar kerby-util-1.0.1.jar kerby-xdr-1.0.1.jar log4j-1.2.17.jar


nimbus-jose-jwt-4.41.1.jar okhttp-2.7.5.jar okio-1.6.0.jar paranamer-2.3.jar protobuf-java-2.5.0.jar re2j-1.1.jar slf4j-api-1.7.25.jar snappy-java-1.0.5.jar stax2-api-3.1.4.jar token-provider-1.0.1.jar woodstox-core-5.0.3.jar xz-1.0.jar G.1.2 HDFS 3.1.1

accessors-smart-1.2.jar asm-5.0.4.jar avro-1.7.7.jar commons-beanutils-1.9.3.jar commons-cli-1.2.jar commons-codec-1.11.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration2-2.1.1.jar commons-io-2.5.jar commons-lang-2.6.jar commons-lang3-3.4.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.6.jar curator-client-2.12.0.jar curator-framework-2.12.0.jar curator-recipes-2.12.0.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-3.1.1.jar hadoop-auth-3.1.1.jar hadoop-client-3.1.1.jar hadoop-common-3.1.1.jar hadoop-hdfs-client-3.1.1.jar hadoop-mapreduce-client-common-3.1.1.jar hadoop-mapreduce-client-core-3.1.1.jar hadoop-mapreduce-client-jobclient-3.1.1.jar hadoop-yarn-api-3.1.1.jar hadoop-yarn-client-3.1.1.jar hadoop-yarn-common-3.1.1.jar htrace-core4-4.1.0-incubating.jar httpclient-4.5.2.jar httpcore-4.4.4.jar jackson-annotations-2.7.8.jar jackson-core-2.7.8.jar jackson-core-asl-1.9.13.jar jackson-databind-2.7.8.jar jackson-jaxrs-base-2.7.8.jar jackson-jaxrs-json-provider-2.7.8.jar jackson-mapper-asl-1.9.13.jar jackson-module-jaxb-annotations-2.7.8.jar javax.servlet-api-3.1.0.jar jaxb-api-2.2.11.jar jcip-annotations-1.0-1.jar


jersey-client-1.19.jar jersey-core-1.19.jar jersey-servlet-1.19.jar jetty-security-9.3.19.v20170502.jar jetty-servlet-9.3.19.v20170502.jar jetty-util-9.3.19.v20170502.jar jetty-webapp-9.3.19.v20170502.jar jetty-xml-9.3.19.v20170502.jar json-smart-2.3.jar jsp-api-2.1.jar jsr305-3.0.0.jar jsr311-api-1.1.1.jar kerb-admin-1.0.1.jar kerb-client-1.0.1.jar kerb-common-1.0.1.jar kerb-core-1.0.1.jar kerb-crypto-1.0.1.jar kerb-identity-1.0.1.jar kerb-server-1.0.1.jar kerb-simplekdc-1.0.1.jar kerb-util-1.0.1.jar kerby-asn1-1.0.1.jar kerby-config-1.0.1.jar kerby-pkix-1.0.1.jar kerby-util-1.0.1.jar kerby-xdr-1.0.1.jar log4j-1.2.17.jar nimbus-jose-jwt-4.41.1.jar okhttp-2.7.5.jar okio-1.6.0.jar paranamer-2.3.jar protobuf-java-2.5.0.jar re2j-1.1.jar slf4j-api-1.7.25.jar snappy-java-1.0.5.jar stax2-api-3.1.4.jar token-provider-1.0.1.jar woodstox-core-5.0.3.jar xz-1.0.jar G.1.3 HDFS 3.0.3

accessors-smart-1.2.jar asm-5.0.4.jar avro-1.7.7.jar commons-beanutils-1.9.3.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration2-2.1.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-lang3-3.4.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.6.jar curator-client-2.12.0.jar curator-framework-2.12.0.jar curator-recipes-2.12.0.jar


gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-3.0.3.jar hadoop-auth-3.0.3.jar hadoop-client-3.0.3.jar hadoop-common-3.0.3.jar hadoop-hdfs-client-3.0.3.jar hadoop-mapreduce-client-common-3.0.3.jar hadoop-mapreduce-client-core-3.0.3.jar hadoop-mapreduce-client-jobclient-3.0.3.jar hadoop-yarn-api-3.0.3.jar hadoop-yarn-client-3.0.3.jar hadoop-yarn-common-3.0.3.jar htrace-core4-4.1.0-incubating.jar httpclient-4.5.2.jar httpcore-4.4.4.jar jackson-annotations-2.7.8.jar jackson-core-2.7.8.jar jackson-core-asl-1.9.13.jar jackson-databind-2.7.8.jar jackson-jaxrs-base-2.7.8.jar jackson-jaxrs-json-provider-2.7.8.jar jackson-mapper-asl-1.9.13.jar jackson-module-jaxb-annotations-2.7.8.jar javax.servlet-api-3.1.0.jar jaxb-api-2.2.11.jar jcip-annotations-1.0-1.jar jersey-client-1.19.jar jersey-core-1.19.jar jersey-servlet-1.19.jar jetty-security-9.3.19.v20170502.jar jetty-servlet-9.3.19.v20170502.jar jetty-util-9.3.19.v20170502.jar jetty-webapp-9.3.19.v20170502.jar jetty-xml-9.3.19.v20170502.jar json-smart-2.3.jar jsp-api-2.1.jar jsr305-3.0.0.jar jsr311-api-1.1.1.jar kerb-admin-1.0.1.jar kerb-client-1.0.1.jar kerb-common-1.0.1.jar kerb-core-1.0.1.jar kerb-crypto-1.0.1.jar kerb-identity-1.0.1.jar kerb-server-1.0.1.jar kerb-simplekdc-1.0.1.jar kerb-util-1.0.1.jar kerby-asn1-1.0.1.jar kerby-config-1.0.1.jar kerby-pkix-1.0.1.jar kerby-util-1.0.1.jar kerby-xdr-1.0.1.jar log4j-1.2.17.jar nimbus-jose-jwt-4.41.1.jar okhttp-2.7.5.jar okio-1.6.0.jar paranamer-2.3.jar protobuf-java-2.5.0.jar re2j-1.1.jar slf4j-api-1.7.25.jar


snappy-java-1.0.5.jar stax2-api-3.1.4.jar token-provider-1.0.1.jar woodstox-core-5.0.3.jar xz-1.0.jar G.1.4 HDFS 2.9.2

accessors-smart-1.2.jar activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar asm-5.0.4.jar avro-1.7.7.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-io-2.4.jar commons-lang-2.6.jar commons-lang3-3.4.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar curator-framework-2.7.1.jar curator-recipes-2.7.1.jar ehcache-3.3.1.jar geronimo-jcache_1.0_spec-1.0-alpha-1.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.9.2.jar hadoop-auth-2.9.2.jar hadoop-client-2.9.2.jar hadoop-common-2.9.2.jar hadoop-hdfs-client-2.9.2.jar hadoop-mapreduce-client-app-2.9.2.jar hadoop-mapreduce-client-common-2.9.2.jar hadoop-mapreduce-client-core-2.9.2.jar hadoop-mapreduce-client-jobclient-2.9.2.jar hadoop-mapreduce-client-shuffle-2.9.2.jar hadoop-yarn-api-2.9.2.jar hadoop-yarn-client-2.9.2.jar hadoop-yarn-common-2.9.2.jar hadoop-yarn-registry-2.9.2.jar hadoop-yarn-server-common-2.9.2.jar HikariCP-java7-2.4.12.jar htrace-core4-4.1.0-incubating.jar httpclient-4.5.2.jar httpcore-4.4.4.jar jackson-core-asl-1.9.13.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar


jaxb-api-2.2.2.jar jcip-annotations-1.0-1.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-sslengine-6.1.26.jar jetty-util-6.1.26.jar json-smart-2.3.jar jsp-api-2.1.jar jsr305-3.0.0.jar leveldbjni-all-1.8.jar log4j-1.2.17.jar mssql-jdbc-6.2.1.jre7.jar netty-3.7.0.Final.jar nimbus-jose-jwt-4.41.1.jar okhttp-2.7.5.jar okio-1.6.0.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.25.jar slf4j-log4j12-1.7.25.jar snappy-java-1.0.5.jar stax2-api-3.1.4.jar stax-api-1.0-2.jar woodstox-core-5.0.3.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar

G.1.5 HDFS 2.8.0

activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar curator-framework-2.7.1.jar curator-recipes-2.7.1.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.8.0.jar hadoop-auth-2.8.0.jar hadoop-client-2.8.0.jar hadoop-common-2.8.0.jar


hadoop-hdfs-2.8.0.jar hadoop-hdfs-client-2.8.0.jar hadoop-mapreduce-client-app-2.8.0.jar hadoop-mapreduce-client-common-2.8.0.jar hadoop-mapreduce-client-core-2.8.0.jar hadoop-mapreduce-client-jobclient-2.8.0.jar hadoop-mapreduce-client-shuffle-2.8.0.jar hadoop-yarn-api-2.8.0.jar hadoop-yarn-client-2.8.0.jar hadoop-yarn-common-2.8.0.jar hadoop-yarn-server-common-2.8.0.jar htrace-core4-4.0.1-incubating.jar httpclient-4.5.2.jar httpcore-4.4.4.jar jackson-core-asl-1.9.13.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar jaxb-api-2.2.2.jar jcip-annotations-1.0.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-sslengine-6.1.26.jar jetty-util-6.1.26.jar json-smart-1.1.1.jar jsp-api-2.1.jar jsr305-3.0.0.jar leveldbjni-all-1.8.jar log4j-1.2.17.jar netty-3.7.0.Final.jar nimbus-jose-jwt-3.9.jar okhttp-2.4.0.jar okio-1.4.0.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.10.jar slf4j-log4j12-1.7.10.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar G.1.6 HDFS 2.7.1

HDFS 2.7.1 (HDFS 2.7.0 is effectively the same; simply substitute 2.7.0 in the libraries versioned as 2.7.1)

activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar


commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar curator-framework-2.7.1.jar curator-recipes-2.7.1.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.7.1.jar hadoop-auth-2.7.1.jar hadoop-client-2.7.1.jar hadoop-common-2.7.1.jar hadoop-hdfs-2.7.1.jar hadoop-mapreduce-client-app-2.7.1.jar hadoop-mapreduce-client-common-2.7.1.jar hadoop-mapreduce-client-core-2.7.1.jar hadoop-mapreduce-client-jobclient-2.7.1.jar hadoop-mapreduce-client-shuffle-2.7.1.jar hadoop-yarn-api-2.7.1.jar hadoop-yarn-client-2.7.1.jar hadoop-yarn-common-2.7.1.jar hadoop-yarn-server-common-2.7.1.jar htrace-core-3.1.0-incubating.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.9.13.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar jaxb-api-2.2.2.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-util-6.1.26.jar jsp-api-2.1.jar jsr305-3.0.0.jar leveldbjni-all-1.8.jar log4j-1.2.17.jar netty-3.7.0.Final.jar netty-all-4.0.23.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.10.jar slf4j-log4j12-1.7.10.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xercesImpl-2.9.1.jar xml-apis-1.3.04.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar


G.1.7 HDFS 2.6.0

activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.6.0.jar curator-framework-2.6.0.jar curator-recipes-2.6.0.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.6.0.jar hadoop-auth-2.6.0.jar hadoop-client-2.6.0.jar hadoop-common-2.6.0.jar hadoop-hdfs-2.6.0.jar hadoop-mapreduce-client-app-2.6.0.jar hadoop-mapreduce-client-common-2.6.0.jar hadoop-mapreduce-client-core-2.6.0.jar hadoop-mapreduce-client-jobclient-2.6.0.jar hadoop-mapreduce-client-shuffle-2.6.0.jar hadoop-yarn-api-2.6.0.jar hadoop-yarn-client-2.6.0.jar hadoop-yarn-common-2.6.0.jar hadoop-yarn-server-common-2.6.0.jar htrace-core-3.0.4.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.9.13.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar jaxb-api-2.2.2.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-util-6.1.26.jar jsr305-1.3.9.jar leveldbjni-all-1.8.jar log4j-1.2.17.jar netty-3.6.2.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar


slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xercesImpl-2.9.1.jar xml-apis-1.3.04.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar G.1.8 HDFS 2.5.2

HDFS 2.5.2 (HDFS 2.5.1 and 2.5.0 are effectively the same; simply substitute 2.5.1 or 2.5.0 in the libraries versioned as 2.5.2)

activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar guava-11.0.2.jar hadoop-annotations-2.5.2.jar hadoop-auth-2.5.2.jar hadoop-client-2.5.2.jar hadoop-common-2.5.2.jar hadoop-hdfs-2.5.2.jar hadoop-mapreduce-client-app-2.5.2.jar hadoop-mapreduce-client-common-2.5.2.jar hadoop-mapreduce-client-core-2.5.2.jar hadoop-mapreduce-client-jobclient-2.5.2.jar hadoop-mapreduce-client-shuffle-2.5.2.jar hadoop-yarn-api-2.5.2.jar hadoop-yarn-client-2.5.2.jar hadoop-yarn-common-2.5.2.jar hadoop-yarn-server-common-2.5.2.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.9.13.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar jaxb-api-2.2.2.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-util-6.1.26.jar


jsr305-1.3.9.jar leveldbjni-all-1.8.jar log4j-1.2.17.jar netty-3.6.2.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar G.1.9 HDFS 2.4.1

HDFS 2.4.1 (HDFS 2.4.0 is effectively the same; simply substitute 2.4.0 in the libraries versioned as 2.4.1)

activation-1.1.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar guava-11.0.2.jar hadoop-annotations-2.4.1.jar hadoop-auth-2.4.1.jar hadoop-client-2.4.1.jar hadoop-hdfs-2.4.1.jar hadoop-mapreduce-client-app-2.4.1.jar hadoop-mapreduce-client-common-2.4.1.jar hadoop-mapreduce-client-core-2.4.1.jar hadoop-mapreduce-client-jobclient-2.4.1.jar hadoop-mapreduce-client-shuffle-2.4.1.jar hadoop-yarn-api-2.4.1.jar hadoop-yarn-client-2.4.1.jar hadoop-yarn-common-2.4.1.jar hadoop-yarn-server-common-2.4.1.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.8.8.jar jackson-mapper-asl-1.8.8.jar jaxb-api-2.2.2.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-util-6.1.26.jar jsr305-1.3.9.jar log4j-1.2.17.jar


paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.5.jar hadoop-common-2.4.1.jar G.1.10 HDFS 2.3.0

activation-1.1.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar guava-11.0.2.jar hadoop-annotations-2.3.0.jar hadoop-auth-2.3.0.jar hadoop-client-2.3.0.jar hadoop-common-2.3.0.jar hadoop-hdfs-2.3.0.jar hadoop-mapreduce-client-app-2.3.0.jar hadoop-mapreduce-client-common-2.3.0.jar hadoop-mapreduce-client-core-2.3.0.jar hadoop-mapreduce-client-jobclient-2.3.0.jar hadoop-mapreduce-client-shuffle-2.3.0.jar hadoop-yarn-api-2.3.0.jar hadoop-yarn-client-2.3.0.jar hadoop-yarn-common-2.3.0.jar hadoop-yarn-server-common-2.3.0.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.8.8.jar jackson-mapper-asl-1.8.8.jar jaxb-api-2.2.2.jar jersey-core-1.9.jar jetty-util-6.1.26.jar jsr305-1.3.9.jar log4j-1.2.17.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar snappy-java-1.0.4.1.jar


stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.5.jar G.1.11 HDFS 2.2.0

activation-1.1.jar aopalliance-1.0.jar asm-3.1.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.1.jar commons-lang-2.5.jar commons-logging-1.1.1.jar commons-math-2.1.jar commons-net-3.1.jar gmbal-api-only-3.0.0-b023.jar grizzly-framework-2.1.2.jar grizzly-http-2.1.2.jar grizzly-http-server-2.1.2.jar grizzly-http-servlet-2.1.2.jar grizzly-rcm-2.1.2.jar guava-11.0.2.jar guice-3.0.jar hadoop-annotations-2.2.0.jar hadoop-auth-2.2.0.jar hadoop-client-2.2.0.jar hadoop-common-2.2.0.jar hadoop-hdfs-2.2.0.jar hadoop-mapreduce-client-app-2.2.0.jar hadoop-mapreduce-client-common-2.2.0.jar hadoop-mapreduce-client-core-2.2.0.jar hadoop-mapreduce-client-jobclient-2.2.0.jar hadoop-mapreduce-client-shuffle-2.2.0.jar hadoop-yarn-api-2.2.0.jar hadoop-yarn-client-2.2.0.jar hadoop-yarn-common-2.2.0.jar hadoop-yarn-server-common-2.2.0.jar jackson-core-asl-1.8.8.jar jackson-jaxrs-1.8.3.jar jackson-mapper-asl-1.8.8.jar jackson-xc-1.8.3.jar javax.inject-1.jar javax.servlet-3.1.jar javax.servlet-api-3.0.1.jar jaxb-api-2.2.2.jar jaxb-impl-2.2.3-1.jar jersey-client-1.9.jar jersey-core-1.9.jar jersey-grizzly2-1.9.jar jersey-guice-1.9.jar


jersey-json-1.9.jar jersey-server-1.9.jar jersey-test-framework-core-1.9.jar jersey-test-framework-grizzly2-1.9.jar jettison-1.1.jar jetty-util-6.1.26.jar jsr305-1.3.9.jar log4j-1.2.17.jar management-api-3.0.0-b012.jar paranamer-2.3.jar protobuf-java-2.5.0.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar snappy-java-1.0.4.1.jar stax-api-1.0.1.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.5.jar

H Kafka Handler Client Dependencies

What are the dependencies for the Kafka Handler to connect to Apache Kafka? The Maven central repository artifacts for Kafka are:
Maven groupId: org.apache.kafka

Maven artifactId: kafka-clients

Maven version: the Kafka version numbers listed for each section

• Kafka 2.2.1
• Kafka 2.1.0
• Kafka 2.0.0
• Kafka 1.1.1
• Kafka 1.0.2
• Kafka 0.11.0.0
• Kafka 0.10.2.0
• Kafka 0.10.1.1
• Kafka 0.10.0.1
• Kafka 0.9.0.1

H.1 Kafka 2.2.1

audience-annotations-0.5.0.jar connect-api-2.2.1.jar connect-json-2.2.1.jar jackson-annotations-2.9.0.jar jackson-core-2.9.8.jar jackson-databind-2.9.8.jar jackson-datatype-jdk8-2.9.8.jar javax.ws.rs-api-2.1.1.jar jopt-simple-5.0.4.jar kafka_2.12-2.2.1.jar kafka-clients-2.2.1.jar lz4-java-1.5.0.jar metrics-core-2.2.0.jar scala-library-2.12.8.jar scala-logging_2.12-3.9.0.jar scala-reflect-2.12.8.jar slf4j-api-1.7.25.jar snappy-java-1.1.7.2.jar zkclient-0.11.jar zookeeper-3.4.13.jar zstd-jni-1.3.8-1.jar


H.2 Kafka 2.1.0

kafka-clients-2.1.0.jar lz4-java-1.5.0.jar slf4j-api-1.7.25.jar snappy-java-1.1.7.2.jar zstd-jni-1.3.5-4.jar H.3 Kafka 2.0.0

kafka-clients-2.0.0.jar lz4-java-1.4.1.jar slf4j-api-1.7.25.jar snappy-java-1.1.7.1.jar H.4 Kafka 1.1.1

connect-api-1.1.1.jar connect-json-1.1.1.jar jackson-annotations-2.9.0.jar jackson-core-2.9.6.jar jackson-databind-2.9.6.jar jopt-simple-5.0.4.jar kafka_2.12-1.1.1.jar kafka-clients-1.1.1.jar lz4-java-1.4.1.jar metrics-core-2.2.0.jar scala-library-2.12.4.jar scala-logging_2.12-3.8.0.jar scala-reflect-2.12.4.jar slf4j-api-1.7.25.jar snappy-java-1.1.7.1.jar zkclient-0.10.jar zookeeper-3.4.10.jar H.5 Kafka 1.0.2

connect-api-1.0.2.jar connect-json-1.0.2.jar jackson-annotations-2.9.0.jar jackson-core-2.9.6.jar jackson-databind-2.9.6.jar jopt-simple-5.0.4.jar kafka_2.12-1.0.2.jar kafka-clients-1.0.2.jar log4j-1.2.17.jar lz4-java-1.4.jar metrics-core-2.2.0.jar scala-library-2.12.4.jar slf4j-api-1.7.25.jar slf4j-log4j12-1.7.25.jar snappy-java-1.1.4.jar zkclient-0.10.jar zookeeper-3.4.10.jar


H.6 Kafka 0.11.0.0

kafka-clients-0.11.0.0.jar lz4-1.3.0.jar slf4j-api-1.7.25.jar snappy-java-1.1.2.6.jar

H.7 Kafka 0.10.2.0

kafka-clients-0.10.2.0.jar lz4-1.3.0.jar slf4j-api-1.7.21.jar snappy-java-1.1.2.6.jar

H.8 Kafka 0.10.1.1

kafka-clients-0.10.1.1.jar lz4-1.3.0.jar slf4j-api-1.7.21.jar snappy-java-1.1.2.6.jar

H.9 Kafka 0.10.0.1

kafka-clients-0.10.0.1.jar lz4-1.3.0.jar slf4j-api-1.7.21.jar snappy-java-1.1.2.6.jar

H.10 Kafka 0.9.0.1

kafka-clients-0.9.0.1.jar lz4-1.2.0.jar slf4j-api-1.7.6.jar snappy-java-1.1.1.7.jar

I Kafka Connect Handler Client Dependencies

What are the dependencies for the Kafka Connect Handler to connect to Apache Kafka Connect? The Maven Central repository artifacts for Kafka Connect are:

Maven groupId: org.apache.kafka

Maven artifactId: kafka_2.11 & connect-json
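
For illustration, the two artifacts translate into pom.xml entries roughly like this (a sketch only; 0.11.0.0 is one of the versions listed below, and the Scala suffix in kafka_2.11 depends on the Kafka build in use):

<!-- Example versions; use the Kafka Connect version listed in the section you need -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>0.11.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>connect-json</artifactId>
    <version>0.11.0.0</version>
</dependency>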

Maven version: the Kafka Connect version numbers listed for each section

• Kafka 2.2.1
• Kafka 2.1.1
• Kafka 2.0.1
• Kafka 1.1.1
• Kafka 1.0.2
• Kafka 0.11.0.0
• Kafka 0.10.2.0
• Kafka 0.10.1.1
• Kafka 0.10.0.0
• Kafka 0.9.0.1
• Confluent Dependencies
• Confluent 3.2.1

I.1 Kafka 2.2.1

audience-annotations-0.5.0.jar connect-api-2.2.1.jar connect-json-2.2.1.jar jackson-annotations-2.9.0.jar jackson-core-2.9.8.jar jackson-databind-2.9.8.jar jackson-datatype-jdk8-2.9.8.jar javax.ws.rs-api-2.1.1.jar jopt-simple-5.0.4.jar kafka_2.12-2.2.1.jar kafka-clients-2.2.1.jar lz4-java-1.5.0.jar metrics-core-2.2.0.jar scala-library-2.12.8.jar scala-logging_2.12-3.9.0.jar scala-reflect-2.12.8.jar


slf4j-api-1.7.25.jar snappy-java-1.1.7.2.jar zkclient-0.11.jar zookeeper-3.4.13.jar zstd-jni-1.3.8-1.jar

I.2 Kafka 2.1.1

audience-annotations-0.5.0.jar connect-api-2.1.1.jar connect-json-2.1.1.jar jackson-annotations-2.9.0.jar jackson-core-2.9.8.jar jackson-databind-2.9.8.jar javax.ws.rs-api-2.1.1.jar jopt-simple-5.0.4.jar kafka_2.12-2.1.1.jar kafka-clients-2.1.1.jar lz4-java-1.5.0.jar metrics-core-2.2.0.jar scala-library-2.12.7.jar scala-logging_2.12-3.9.0.jar scala-reflect-2.12.7.jar slf4j-api-1.7.25.jar snappy-java-1.1.7.2.jar zkclient-0.11.jar zookeeper-3.4.13.jar zstd-jni-1.3.7-1.jar

I.3 Kafka 2.0.1

audience-annotations-0.5.0.jar connect-api-2.0.1.jar connect-json-2.0.1.jar jackson-annotations-2.9.0.jar jackson-core-2.9.7.jar jackson-databind-2.9.7.jar javax.ws.rs-api-2.1.jar jopt-simple-5.0.4.jar kafka_2.12-2.0.1.jar kafka-clients-2.0.1.jar lz4-java-1.4.1.jar metrics-core-2.2.0.jar scala-library-2.12.6.jar scala-logging_2.12-3.9.0.jar scala-reflect-2.12.6.jar slf4j-api-1.7.25.jar snappy-java-1.1.7.1.jar zkclient-0.10.jar zookeeper-3.4.13.jar

I.4 Kafka 1.1.1

connect-api-1.1.1.jar connect-json-1.1.1.jar jackson-annotations-2.9.0.jar jackson-core-2.9.6.jar jackson-databind-2.9.6.jar


jopt-simple-5.0.4.jar kafka_2.12-1.1.1.jar kafka-clients-1.1.1.jar lz4-java-1.4.1.jar metrics-core-2.2.0.jar scala-library-2.12.4.jar scala-logging_2.12-3.8.0.jar scala-reflect-2.12.4.jar slf4j-api-1.7.25.jar snappy-java-1.1.7.1.jar zkclient-0.10.jar zookeeper-3.4.10.jar

I.5 Kafka 1.0.2

connect-api-1.0.2.jar connect-json-1.0.2.jar jackson-annotations-2.9.0.jar jackson-core-2.9.6.jar jackson-databind-2.9.6.jar jopt-simple-5.0.4.jar kafka_2.12-1.0.2.jar kafka-clients-1.0.2.jar log4j-1.2.17.jar lz4-java-1.4.jar metrics-core-2.2.0.jar scala-library-2.12.4.jar slf4j-api-1.7.25.jar slf4j-log4j12-1.7.25.jar snappy-java-1.1.4.jar zkclient-0.10.jar zookeeper-3.4.10.jar

I.6 Kafka 0.11.0.0

connect-api-0.11.0.0.jar connect-json-0.11.0.0.jar jackson-annotations-2.8.0.jar jackson-core-2.8.5.jar jackson-databind-2.8.5.jar jopt-simple-5.0.3.jar kafka_2.11-0.11.0.0.jar kafka-clients-0.11.0.0.jar log4j-1.2.17.jar lz4-1.3.0.jar metrics-core-2.2.0.jar scala-library-2.11.11.jar scala-parser-combinators_2.11-1.0.4.jar slf4j-api-1.7.25.jar slf4j-log4j12-1.7.25.jar snappy-java-1.1.2.6.jar zkclient-0.10.jar zookeeper-3.4.10.jar

I.7 Kafka 0.10.2.0

connect-api-0.10.2.0.jar connect-json-0.10.2.0.jar


jackson-annotations-2.8.0.jar jackson-core-2.8.5.jar jackson-databind-2.8.5.jar jopt-simple-5.0.3.jar kafka_2.11-0.10.2.0.jar kafka-clients-0.10.2.0.jar log4j-1.2.17.jar lz4-1.3.0.jar metrics-core-2.2.0.jar scala-library-2.11.8.jar scala-parser-combinators_2.11-1.0.4.jar slf4j-api-1.7.21.jar slf4j-log4j12-1.7.21.jar snappy-java-1.1.2.6.jar zkclient-0.10.jar zookeeper-3.4.9.jar

I.8 Kafka 0.10.1.1

connect-api-0.10.1.1.jar connect-json-0.10.1.1.jar jackson-annotations-2.6.0.jar jackson-core-2.6.3.jar jackson-databind-2.6.3.jar jline-0.9.94.jar jopt-simple-4.9.jar kafka_2.11-0.10.1.1.jar kafka-clients-0.10.1.1.jar log4j-1.2.17.jar lz4-1.3.0.jar metrics-core-2.2.0.jar netty-3.7.0.Final.jar scala-library-2.11.8.jar scala-parser-combinators_2.11-1.0.4.jar slf4j-api-1.7.21.jar slf4j-log4j12-1.7.21.jar snappy-java-1.1.2.6.jar zkclient-0.9.jar zookeeper-3.4.8.jar

I.9 Kafka 0.10.0.0

activation-1.1.jar connect-api-0.10.0.0.jar connect-json-0.10.0.0.jar jackson-annotations-2.6.0.jar jackson-core-2.6.3.jar jackson-databind-2.6.3.jar jline-0.9.94.jar jopt-simple-4.9.jar junit-3.8.1.jar kafka_2.11-0.10.0.0.jar kafka-clients-0.10.0.0.jar log4j-1.2.15.jar lz4-1.3.0.jar mail-1.4.jar metrics-core-2.2.0.jar netty-3.7.0.Final.jar scala-library-2.11.8.jar


scala-parser-combinators_2.11-1.0.4.jar slf4j-api-1.7.21.jar slf4j-log4j12-1.7.21.jar snappy-java-1.1.2.4.jar zkclient-0.8.jar zookeeper-3.4.6.jar

I.10 Kafka 0.9.0.1

activation-1.1.jar connect-api-0.9.0.1.jar connect-json-0.9.0.1.jar jackson-annotations-2.5.0.jar jackson-core-2.5.4.jar jackson-databind-2.5.4.jar jline-0.9.94.jar jopt-simple-3.2.jar junit-3.8.1.jar kafka_2.11-0.9.0.1.jar kafka-clients-0.9.0.1.jar log4j-1.2.15.jar lz4-1.2.0.jar mail-1.4.jar metrics-core-2.2.0.jar netty-3.7.0.Final.jar scala-library-2.11.7.jar scala-parser-combinators_2.11-1.0.4.jar scala-xml_2.11-1.0.4.jar slf4j-api-1.7.6.jar slf4j-log4j12-1.7.6.jar snappy-java-1.1.1.7.jar zkclient-0.7.jar zookeeper-3.4.6.jar

I.11 Confluent Dependencies

Note:

The Confluent dependencies listed below are for the Kafka Connect Avro Converter and the associated Avro Schema Registry client. When integrated with Confluent Kafka Connect, the following dependencies are required in addition to the Kafka Connect dependencies for the corresponding Kafka version, which are listed in the previous sections.
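
For context, these dependencies are typically pulled in when the Avro converter is enabled in the Kafka Connect or producer configuration; a minimal sketch of the standard converter settings (the Schema Registry URL is an example value):

# Sketch only: standard Kafka Connect converter settings that require the
# Confluent Avro converter and Schema Registry client on the classpath.
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081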

• Confluent 5.5.0
• Confluent 5.4.0
• Confluent 5.3.0
• Confluent 5.2.1
• Confluent 5.1.3
• Confluent 5.0.3
• Confluent 4.1.2


• Confluent 4.0.3
• Confluent 3.2.1

I.11.1 Confluent 5.5.0

avro-1.9.2.jar classmate-1.3.4.jar common-config-5.5.0.jar commons-compress-1.19.jar commons-lang3-3.2.1.jar common-utils-5.5.0.jar connect-api-5.5.0-ccs.jar connect-json-5.5.0-ccs.jar guava-18.0.jar hibernate-validator-6.0.17.Final.jar jackson-annotations-2.10.2.jar jackson-core-2.10.2.jar jackson-databind-2.10.2.jar jackson-dataformat-yaml-2.4.5.jar jackson-datatype-jdk8-2.10.2.jar jackson-datatype-joda-2.4.5.jar jakarta.annotation-api-1.3.5.jar jakarta.el-3.0.2.jar jakarta.el-api-3.0.3.jar jakarta.inject-2.6.1.jar jakarta.validation-api-2.0.2.jar jakarta.ws.rs-api-2.1.6.jar javax.ws.rs-api-2.1.1.jar jboss-logging-3.3.2.Final.jar jersey-bean-validation-2.30.jar jersey-client-2.30.jar jersey-common-2.30.jar jersey-media-jaxb-2.30.jar jersey-server-2.30.jar joda-time-2.2.jar kafka-avro-serializer-5.5.0.jar kafka-clients-5.5.0-ccs.jar kafka-connect-avro-converter-5.5.0.jar kafka-connect-avro-data-5.5.0.jar kafka-schema-registry-client-5.5.0.jar kafka-schema-serializer-5.5.0.jar lz4-java-1.7.1.jar osgi-resource-locator-1.0.3.jar slf4j-api-1.7.30.jar snakeyaml-1.12.jar snappy-java-1.1.7.3.jar swagger-annotations-1.5.22.jar swagger-core-1.5.3.jar swagger-models-1.5.3.jar zstd-jni-1.4.4-7.jar

I.11.2 Confluent 5.4.0

avro-1.9.1.jar common-config-5.4.0.jar commons-compress-1.19.jar commons-lang3-3.2.1.jar common-utils-5.4.0.jar connect-api-5.4.0-ccs.jar


connect-json-5.4.0-ccs.jar guava-18.0.jar jackson-annotations-2.9.10.jar jackson-core-2.9.9.jar jackson-databind-2.9.10.1.jar jackson-dataformat-yaml-2.4.5.jar jackson-datatype-jdk8-2.9.10.jar jackson-datatype-joda-2.4.5.jar javax.ws.rs-api-2.1.1.jar joda-time-2.2.jar kafka-avro-serializer-5.4.0.jar kafka-clients-5.4.0-ccs.jar kafka-connect-avro-converter-5.4.0.jar kafka-schema-registry-client-5.4.0.jar lz4-java-1.6.0.jar slf4j-api-1.7.28.jar snakeyaml-1.12.jar snappy-java-1.1.7.3.jar swagger-annotations-1.5.22.jar swagger-core-1.5.3.jar swagger-models-1.5.3.jar zstd-jni-1.4.3-1.jar

I.11.3 Confluent 5.3.0

audience-annotations-0.5.0.jar avro-1.8.1.jar common-config-5.3.0.jar commons-compress-1.8.1.jar common-utils-5.3.0.jar connect-api-5.3.0-ccs.jar connect-json-5.3.0-ccs.jar jackson-annotations-2.9.0.jar jackson-core-2.9.9.jar jackson-core-asl-1.9.13.jar jackson-databind-2.9.9.jar jackson-datatype-jdk8-2.9.9.jar jackson-mapper-asl-1.9.13.jar javax.ws.rs-api-2.1.1.jar jline-0.9.94.jar jsr305-3.0.2.jar kafka-avro-serializer-5.3.0.jar kafka-clients-5.3.0-ccs.jar kafka-connect-avro-converter-5.3.0.jar kafka-schema-registry-client-5.3.0.jar lz4-java-1.6.0.jar netty-3.10.6.Final.jar paranamer-2.7.jar slf4j-api-1.7.26.jar snappy-java-1.1.1.3.jar spotbugs-annotations-3.1.9.jar xz-1.5.jar zkclient-0.10.jar zookeeper-3.4.14.jar zstd-jni-1.4.0-1.jar

I.11.4 Confluent 5.2.1

audience-annotations-0.5.0.jar avro-1.8.1.jar


common-config-5.2.1.jar commons-compress-1.8.1.jar common-utils-5.2.1.jar jackson-annotations-2.9.0.jar jackson-core-2.9.8.jar jackson-core-asl-1.9.13.jar jackson-databind-2.9.8.jar jackson-mapper-asl-1.9.13.jar jline-0.9.94.jar kafka-avro-serializer-5.2.1.jar kafka-clients-2.2.0-cp2.jar kafka-connect-avro-converter-5.2.1.jar kafka-schema-registry-client-5.2.1.jar lz4-java-1.5.0.jar netty-3.10.6.Final.jar paranamer-2.7.jar slf4j-api-1.7.25.jar snappy-java-1.1.7.2.jar xz-1.5.jar zkclient-0.10.jar zookeeper-3.4.13.jar zstd-jni-1.3.8-1.jar

I.11.5 Confluent 5.1.3

audience-annotations-0.5.0.jar avro-1.8.1.jar common-config-5.1.3.jar commons-compress-1.8.1.jar common-utils-5.1.3.jar jackson-annotations-2.9.0.jar jackson-core-2.9.8.jar jackson-core-asl-1.9.13.jar jackson-databind-2.9.8.jar jackson-mapper-asl-1.9.13.jar jline-0.9.94.jar kafka-avro-serializer-5.1.3.jar kafka-clients-2.1.1-cp3.jar kafka-connect-avro-converter-5.1.3.jar kafka-schema-registry-client-5.1.3.jar lz4-java-1.5.0.jar netty-3.10.6.Final.jar paranamer-2.7.jar slf4j-api-1.7.25.jar snappy-java-1.1.7.2.jar xz-1.5.jar zkclient-0.10.jar zookeeper-3.4.13.jar zstd-jni-1.3.7-1.jar

I.11.6 Confluent 5.0.3

audience-annotations-0.5.0.jar avro-1.8.1.jar common-config-5.0.3.jar commons-compress-1.8.1.jar common-utils-5.0.3.jar jackson-annotations-2.9.0.jar jackson-core-2.9.8.jar jackson-core-asl-1.9.13.jar


jackson-databind-2.9.8.jar jackson-mapper-asl-1.9.13.jar jline-0.9.94.jar kafka-avro-serializer-5.0.3.jar kafka-clients-2.0.1-cp4.jar kafka-connect-avro-converter-5.0.3.jar kafka-schema-registry-client-5.0.3.jar lz4-java-1.4.1.jar netty-3.10.6.Final.jar paranamer-2.7.jar slf4j-api-1.7.25.jar snappy-java-1.1.7.1.jar xz-1.5.jar zkclient-0.10.jar zookeeper-3.4.13.jar

I.11.7 Confluent 4.1.2

avro-1.8.1.jar common-config-4.1.2.jar commons-compress-1.8.1.jar common-utils-4.1.2.jar jackson-annotations-2.9.0.jar jackson-core-2.9.6.jar jackson-core-asl-1.9.13.jar jackson-databind-2.9.6.jar jackson-mapper-asl-1.9.13.jar jline-0.9.94.jar kafka-avro-serializer-4.1.2.jar kafka-clients-1.1.1-cp1.jar kafka-connect-avro-converter-4.1.2.jar kafka-schema-registry-client-4.1.2.jar log4j-1.2.16.jar lz4-java-1.4.1.jar netty-3.10.5.Final.jar paranamer-2.7.jar slf4j-api-1.7.25.jar slf4j-log4j12-1.6.1.jar snappy-java-1.1.7.1.jar xz-1.5.jar zkclient-0.10.jar zookeeper-3.4.10.jar

I.11.8 Confluent 4.0.3

avro-1.8.1.jar common-config-4.0.3.jar commons-compress-1.8.1.jar common-utils-4.0.3.jar jackson-annotations-2.9.0.jar jackson-core-2.9.6.jar jackson-core-asl-1.9.13.jar jackson-databind-2.9.6.jar jackson-mapper-asl-1.9.13.jar jline-0.9.94.jar kafka-avro-serializer-4.0.3.jar kafka-connect-avro-converter-4.0.3.jar kafka-schema-registry-client-4.0.3.jar log4j-1.2.16.jar


netty-3.10.5.Final.jar paranamer-2.7.jar slf4j-api-1.7.25.jar slf4j-log4j12-1.6.1.jar snappy-java-1.1.1.3.jar xz-1.5.jar zkclient-0.10.jar zookeeper-3.4.10.jar

I.11.9 Confluent 3.2.1

avro-1.7.7.jar common-config-3.2.1.jar commons-compress-1.4.1.jar common-utils-3.2.1.jar jackson-annotations-2.5.0.jar jackson-core-2.5.4.jar jackson-core-asl-1.9.13.jar jackson-databind-2.5.4.jar jackson-mapper-asl-1.9.13.jar jline-0.9.94.jar kafka-avro-serializer-3.2.1.jar kafka-connect-avro-converter-3.2.1.jar kafka-schema-registry-client-3.2.1.jar log4j-1.2.17.jar netty-3.7.0.Final.jar paranamer-2.3.jar slf4j-api-1.6.4.jar slf4j-log4j12-1.7.6.jar snappy-java-1.0.5.jar xz-1.0.jar zkclient-0.10.jar zookeeper-3.4.8.jar

I.12 Confluent 3.2.1

avro-1.7.7.jar common-config-3.1.2.jar commons-compress-1.4.1.jar common-utils-3.1.2.jar jackson-annotations-2.5.0.jar jackson-core-2.5.4.jar jackson-core-asl-1.9.13.jar jackson-databind-2.5.4.jar jackson-mapper-asl-1.9.13.jar jline-0.9.94.jar kafka-avro-serializer-3.1.2.jar kafka-schema-registry-client-3.1.2.jar log4j-1.2.17.jar netty-3.7.0.Final.jar paranamer-2.3.jar slf4j-api-1.6.4.jar slf4j-log4j12-1.7.6.jar snappy-java-1.0.5.jar xz-1.0.jar zkclient-0.9.jar zookeeper-3.4.8.jar

J MongoDB Handler Client Dependencies

What are the dependencies for the MongoDB Handler to connect to MongoDB databases? Oracle GoldenGate requires MongoDB Java Driver 3.4.3 or higher to integrate with MongoDB. You can download this driver from: http://mongodb.github.io/mongo-java-driver/

• MongoDB Java Driver 3.4.3

J.1 MongoDB Java Driver 3.4.3

You must include the path to the MongoDB Java driver in the gg.classpath property. To automatically download the Java driver from the Maven Central repository, add the following to the pom.xml file, substituting the correct information for your environment:

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>3.4.3</version>
</dependency>
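
For example, the gg.classpath entry might look like this sketch (the path is hypothetical; replace it with the actual location of the driver JAR on your system):

# Hypothetical location; point gg.classpath at your downloaded driver JAR
gg.classpath=/path/to/mongo-java-driver-3.4.3.jar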

K Optimized Row Columnar Event Handler Client Dependencies

What are the dependencies for the Optimized Row Columnar (ORC) Event Handler? The Maven Central repository artifacts for ORC are:

Maven groupId: org.apache.orc

Maven artifactId: orc-core
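
As a pom.xml fragment, these coordinates correspond roughly to the following sketch (1.4.0 is the version stated below; 1.5.5 is also covered in this appendix):

<dependency>
    <!-- Version per the coordinates below; 1.5.5 is also listed in this appendix -->
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-core</artifactId>
    <version>1.4.0</version>
</dependency>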

Maven version: 1.4.0

The Hadoop client dependencies are also required for the ORC Event Handler; see Hadoop Client Dependencies.

• ORC Client 1.5.5
• ORC Client 1.4.0

K.1 ORC Client 1.5.5

aircompressor-0.10.jar asm-3.1.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-httpclient-3.1.jar commons-io-2.1.jar commons-lang-2.6.jar commons-logging-1.1.1.jar commons-math-2.1.jar commons-net-3.1.jar guava-11.0.2.jar hadoop-annotations-2.2.0.jar hadoop-auth-2.2.0.jar hadoop-common-2.2.0.jar hadoop-hdfs-2.2.0.jar hive-storage-api-2.6.0.jar jackson-core-asl-1.8.8.jar jackson-mapper-asl-1.8.8.jar jaxb-api-2.2.11.jar jersey-core-1.9.jar jersey-server-1.9.jar jsch-0.1.42.jar log4j-1.2.17.jar orc-core-1.5.5.jar orc-shims-1.5.5.jar protobuf-java-2.5.0.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar


xmlenc-0.52.jar zookeeper-3.4.5.jar

K.2 ORC Client 1.4.0

aircompressor-0.3.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar asm-3.1.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.6.0.jar curator-framework-2.6.0.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.6.4.jar hadoop-auth-2.6.4.jar hadoop-common-2.6.4.jar hive-storage-api-2.2.1.jar htrace-core-3.0.4.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.9.13.jar jdk.tools-1.6.jar jersey-core-1.9.jar jersey-server-1.9.jar jsch-0.1.42.jar log4j-1.2.17.jar netty-3.7.0.Final.jar orc-core-1.4.0.jar protobuf-java-2.5.0.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar

L Parquet Event Handler Client Dependencies

What are the dependencies for the Parquet Event Handler? The Maven Central repository artifacts for Parquet are:

Maven groupId: org.apache.parquet

Maven artifactId: parquet-avro

Maven version: 1.9.0
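
Both Parquet artifacts (parquet-avro above and parquet-hadoop, whose coordinates follow) can be declared together; a minimal pom.xml sketch at version 1.9.0:

<!-- Sketch using version 1.9.0 from these coordinates -->
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-avro</artifactId>
    <version>1.9.0</version>
</dependency>
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.9.0</version>
</dependency>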

Maven groupId: org.apache.parquet

Maven artifactId: parquet-hadoop

Maven version: 1.9.0

The Hadoop client dependencies are also required for the Parquet Event Handler; see Hadoop Client Dependencies.

• Parquet Client 1.10.1
• Parquet Client 1.9.0

L.1 Parquet Client 1.10.1

avro-1.8.2.jar commons-codec-1.10.jar commons-compress-1.8.1.jar commons-pool-1.6.jar fastutil-7.0.13.jar jackson-core-asl-1.9.13.jar jackson-mapper-asl-1.9.13.jar paranamer-2.7.jar parquet-avro-1.10.1.jar parquet-column-1.10.1.jar parquet-common-1.10.1.jar parquet-encoding-1.10.1.jar parquet-format-2.4.0.jar parquet-hadoop-1.10.1.jar parquet-jackson-1.10.1.jar slf4j-api-1.7.2.jar snappy-java-1.1.2.6.jar xz-1.5.jar

L.2 Parquet Client 1.9.0

avro-1.8.0.jar commons-codec-1.5.jar commons-compress-1.8.1.jar


commons-pool-1.5.4.jar fastutil-6.5.7.jar jackson-core-asl-1.9.11.jar jackson-mapper-asl-1.9.11.jar paranamer-2.7.jar parquet-avro-1.9.0.jar parquet-column-1.9.0.jar parquet-common-1.9.0.jar parquet-encoding-1.9.0.jar parquet-format-2.3.1.jar parquet-hadoop-1.9.0.jar parquet-jackson-1.9.0.jar slf4j-api-1.7.7.jar snappy-java-1.1.1.6.jar xz-1.5.jar

M Velocity Dependencies

The Maven coordinates for Velocity are as follows:

Maven groupId: org.apache.velocity

Maven artifactId: velocity
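
Expressed as a pom.xml dependency (using the version given below), this is approximately:

<dependency>
    <groupId>org.apache.velocity</groupId>
    <artifactId>velocity</artifactId>
    <version>1.7</version>
</dependency>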

Version: 1.7

• Velocity 1.7

M.1 Velocity 1.7

commons-collections-3.2.1.jar commons-lang-2.4.jar velocity-1.7.jar

N OCI Dependencies

The Maven coordinates for OCI are as follows:

Maven groupId: com.oracle.oci.sdk

Maven artifactId: oci-java-sdk-full
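
A corresponding pom.xml entry, as a sketch using the version given below, would be:

<dependency>
    <groupId>com.oracle.oci.sdk</groupId>
    <artifactId>oci-java-sdk-full</artifactId>
    <version>1.13.2</version>
</dependency>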

Version: 1.13.2

• OCI 1.13.2
• OCI: Proxy Settings Dependencies

Additional dependencies are required to support proxy settings.

N.1 OCI 1.13.2

accessors-smart-1.2.jar animal-sniffer-annotations-1.17.jar aopalliance-repackaged-2.5.0-b42.jar asm-5.0.4.jar bcpkix-jdk15on-1.60.jar bcprov-jdk15on-1.60.jar checker-qual-2.5.2.jar commons-codec-1.13.jar commons-io-2.6.jar commons-lang3-3.8.1.jar error_prone_annotations-2.2.0.jar failureaccess-1.0.1.jar guava-27.1-jre.jar hk2-api-2.5.0-b42.jar hk2-locator-2.5.0-b42.jar hk2-utils-2.5.0-b42.jar j2objc-annotations-1.1.jar jackson-annotations-2.10.1.jar jackson-core-2.10.1.jar jackson-databind-2.10.1.jar jackson-datatype-jdk8-2.10.1.jar jackson-datatype-jsr310-2.10.1.jar jackson-module-jaxb-annotations-2.8.10.jar javassist-3.22.0-CR2.jar javax.annotation-api-1.2.jar javax.inject-1.jar javax.inject-2.5.0-b42.jar javax.ws.rs-api-2.1.jar jcip-annotations-1.0-1.jar jersey-client-2.27.jar jersey-common-2.27.jar jersey-entity-filtering-2.27.jar jersey-hk2-2.27.jar jersey-media-json-jackson-2.27.jar json-smart-2.3.jar jsr305-3.0.2.jar listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar


nimbus-jose-jwt-8.2.1.jar oci-java-sdk-analytics-1.13.2.jar oci-java-sdk-announcementsservice-1.13.2.jar oci-java-sdk-apigateway-1.13.2.jar oci-java-sdk-applicationmigration-1.13.2.jar oci-java-sdk-audit-1.13.2.jar oci-java-sdk-autoscaling-1.13.2.jar oci-java-sdk-budget-1.13.2.jar oci-java-sdk-common-1.13.2.jar oci-java-sdk-containerengine-1.13.2.jar oci-java-sdk-core-1.13.2.jar oci-java-sdk-database-1.13.2.jar oci-java-sdk-datacatalog-1.13.2.jar oci-java-sdk-dataflow-1.13.2.jar

N.2 OCI: Proxy Settings Dependencies

Additional dependencies are required to support proxy settings. The Maven coordinates are as follows:

Maven groupId: com.oracle.oci.sdk

Maven artifactId: oci-java-sdk-addons-apache

Version: 1.13.2
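
As a pom.xml fragment, this is roughly:

<dependency>
    <groupId>com.oracle.oci.sdk</groupId>
    <artifactId>oci-java-sdk-addons-apache</artifactId>
    <version>1.13.2</version>
</dependency>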

accessors-smart-1.2.jar animal-sniffer-annotations-1.17.jar aopalliance-repackaged-2.5.0-b42.jar asm-5.0.4.jar bcpkix-jdk15on-1.60.jar bcprov-jdk15on-1.60.jar checker-qual-2.5.2.jar commons-codec-1.13.jar commons-io-2.6.jar commons-lang3-3.8.1.jar commons-logging-1.2.jar error_prone_annotations-2.2.0.jar failureaccess-1.0.1.jar guava-27.1-jre.jar hk2-api-2.5.0-b42.jar hk2-locator-2.5.0-b42.jar hk2-utils-2.5.0-b42.jar httpclient-4.5.9.jar httpcore-4.4.11.jar j2objc-annotations-1.1.jar jackson-annotations-2.10.1.jar jackson-core-2.10.1.jar jackson-databind-2.10.1.jar jackson-datatype-jdk8-2.10.1.jar jackson-datatype-jsr310-2.10.1.jar jackson-module-jaxb-annotations-2.8.10.jar javassist-3.22.0-CR2.jar javax.annotation-api-1.2.jar javax.inject-1.jar javax.inject-2.5.0-b42.jar javax.ws.rs-api-2.1.jar jcip-annotations-1.0-1.jar jersey-apache-connector-2.27.jar jersey-client-2.27.jar


jersey-common-2.27.jar jersey-entity-filtering-2.27.jar jersey-hk2-2.27.jar jersey-media-json-jackson-2.27.jar json-smart-2.3.jar jsr305-3.0.2.jar listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar nimbus-jose-jwt-8.2.1.jar oci-java-sdk-addons-apache-1.13.2.jar oci-java-sdk-common-1.13.2.jar osgi-resource-locator-1.0.1.jar slf4j-api-1.7.26.jar validation-api-1.1.0.Final.jar

O JMS Dependencies

The Java EE specification APIs are no longer bundled with the JDK. JMS is part of this specification, so this dependency is required.

Maven groupId: javax

Maven artifactId: javaee-api
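
A corresponding pom.xml entry (a sketch, using the version given below) would be:

<dependency>
    <groupId>javax</groupId>
    <artifactId>javaee-api</artifactId>
    <version>8.0</version>
</dependency>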

Version: 8.0

You can download the JAR from the Maven Central repository.

• JMS 8.0

O.1 JMS 8.0

javaee-api-8.0.jar
