Red Hat Integration 2020-Q2 Data Virtualization Tutorial
Total Page:16
File Type:pdf, Size:1020Kb
Red Hat Integration 2020-Q2 Data Virtualization Tutorial TECHNOLOGY PREVIEW - Data Virtualization Tutorial Last Updated: 2020-07-11 Red Hat Integration 2020-Q2 Data Virtualization Tutorial TECHNOLOGY PREVIEW - Data Virtualization Tutorial Legal Notice Copyright © 2020 Red Hat, Inc. The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/ . In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version. Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries. Linux ® is the registered trademark of Linus Torvalds in the United States and other countries. Java ® is a registered trademark of Oracle and/or its affiliates. XFS ® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries. MySQL ® is a registered trademark of MySQL AB in the United States, the European Union and other countries. Node.js ® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the official Joyent Node.js open source or commercial project. The OpenStack ® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community. All other trademarks are the property of their respective owners. Abstract Combine data from multiple sources so that applications can connect to a single, virtual data model Table of Contents Table of Contents .C . H. .A . P. .T .E . R. 1.. .O . .V . E. .R .V . I. E. W. .O . .F . T. .H . E. D. .A . T. .A . .V . I.R . T. U. A. .L .I Z. .A . T. .I O. .N . T. U. .T . O. R. I. A. .L . 3. .C . H. .A . P. .T .E . R. 2. G. E. T. .T . I.N . G. .S .T . A. .R . T. .E .D . W. I.T .H . D. .A . T. A. V. .I .R .T . U. .A . L. .I Z. .A . T. I. O. .N . .4 . .C . H. .A . P. .T .E . R. 3. I. N. .T . R. .O . D. .U . C. .T . I.O . .N . .T . O. D. .A . T. .A . .V . I.R . T. .U . A. .L .I Z. .A . T. .I O. .N . 6. .C . H. .A . P. .T .E . R. 4. .S .E . T. .T . I.N . G. U. .P . .T . H. .E . .E . N. .V . I.R . O. .N . .M . E. .N . T. 7. 4.1. CLONING THE TUTORIAL RESOURCES 7 4.2. CREATING THE SOURCE DATABASE 7 .C . H. .A . P. .T .E . R. 5. A. .D . .D . I.N . G. .T .H . E. D. .A . T. .A . .V . I.R . T. .U . A. .L .I .Z .A . T. .I O. .N . O. .P . E. .R . A. .T .O . .R . .T .O . T. .H . E. .P . R. .O . .J .E . C. .T . 1.0 . 5.1. CREATING A DOCKER-REGISTRY PULL SECRET 10 5.2. INSTALLING THE DATA VIRTUALIZATION OPERATOR IN YOUR PROJECT 11 5.3. LINKING THE PULL SECRET TO THE DATA VIRTUALIZATION OPERATOR SERVICE ACCOUNT 12 .C . H. .A . P. .T .E . R. 6. .B . U. .I L. .D . I.N . G. T. .H . E. V. .I R. .T . U. .A . L. D . .A . T. A. .B . A. .S . E. 1.4 . 6.1. VIRTUAL DATABASE CUSTOM RESOURCES 14 6.1.1. Data source configuration 14 6.1.2. DDL for defining the virtual database 15 6.1.2.1. Virtual database creation 16 6.1.2.2. Translator definition 16 6.1.2.3. External data source definitions 16 6.1.2.4. Schema creation 17 6.1.2.5. Metadata import 17 6.1.2.6. Virtual schema definition 17 6.2. COMPLETED VIRTUAL DATABASE CUSTOM RESOURCE FILE 19 .C . H. .A . P. .T .E . R. 7. D. E. P. .L . O. .Y . I.N . .G . .T . H. .E . .V . I.R . T. U. .A . L. D. .A . T. .A . B. .A . S. .E . 2. .1 . .C . H. .A . P. .T .E . R. 8. .C . O. .N . N. E. C. .T . I.N . .G . .C . L. .I E. .N . T. .S . T. .O . .T . H. .E . .V . I.R . T. .U . A. .L . D. A. .T .A . B. .A . S. .E . 2. 2. 8.1. CONNECTING AN INTERNAL JDBC CLIENT 22 8.1.1. Installing SQLLine 22 8.1.2. Connecting SQLLine to the Portfolio virtual database 23 8.2. CONNECTING TO THE VIRTUAL DATABASE FROM AN EXTERNAL JDBC CLIENT 24 8.2.1. Configuring an OpenShift load balancer service to enable external JDBC clients to access the virtual database 24 8.2.2. Enabling external JDBC client access through port forwarding 25 8.2.3. Installing the SQuirreL JDBC client 26 8.2.4. Configuring SQuirreL to connect to the Portfolio virtual database 27 8.2.5. Querying the Portfolio virtual database from the SQuirreL SQL client 28 8.3. SAMPLE QUERIES 28 8.4. ACCESS THE VIRTUAL DATABASE THROUGH THE ODATA API 29 1 Red Hat Integration 2020-Q2 Data Virtualization Tutorial 2 CHAPTER 1. OVERVIEW OF THE DATA VIRTUALIZATION TUTORIAL CHAPTER 1. OVERVIEW OF THE DATA VIRTUALIZATION TUTORIAL This tutorial demonstrates how to use Red Hat Integration data virtualization to create a customer portfolio virtual database. The virtual database that you create integrates data from the following two sources: A postgreSQL accounts database Stores customer account information, and data about the stock holdings for each customer. A web-based REST service Provides live stock market price data. By integrating data from these two sources, the Portfolio database calculates the value of individual customer portfolios based on current stock prices. After we deploy the virtual database, we’ll submit queries to it to demonstrate how data from both of sources is combined. Figure 1.1. Architecture of the Portfolio virtual database in this tutorial 3 Red Hat Integration 2020-Q2 Data Virtualization Tutorial CHAPTER 2. GETTING STARTED WITH DATA VIRTUALIZATION In this tutorial we demonstrate how to create and use a "Portfolio" virtual database by completing the following tasks: 1. Setting up our environment. 2. Adding a Data Virtualization Operator to OpenShift. 3. Creating a postgreSQL accounts database with a simple sample database to serve as our data source. 4. Creating a custom resource (CR) that defines a Portfolio virtual database. The CR specifies how to integrate data from our postgreSQL database and a REST service. 5. Running the Operator to deploy the virtual database. 6. Demonstrating how JDBC and OData clients can access and query the virtual database. NOTE ODBC access is also available, but information about how to enable ODBC access is beyond the scope of this tutorial. Time 30 minutes to 1 hour Skill Beginner IMPORTANT Data virtualization is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/. Prerequisites Before you can use data virtualization to create a virtual database, you must have: Access to an OpenShift 4.4 cluster with cluster-admin privileges. This can be an enterprise deployment of OpenShift (OpenShift Container Platform, OpenShift Dedicated, or OpenShift Online), or a local installation that uses Red Hat Code Ready Containers (CRC). For more information about CRC, see the article on Red Hat Developer. A version of the OpenShift CLI (oc) binary that matches the version of the OpenShift server. Working knowledge of SQL. Working knowledge of Openshift and the OpenShift Operator model. 4 CHAPTER 2. GETTING STARTED WITH DATA VIRTUALIZATION Working knowledge of git and GitHub. 5 Red Hat Integration 2020-Q2 Data Virtualization Tutorial CHAPTER 3. INTRODUCTION TO DATA VIRTUALIZATION A virtual database enables you to aggregate data from one or more external data sources and apply a custom schema to the data. By applying a customized logical schema to the aggregated source data, you can curate the data in a way that makes it easy for your applications to consume. The virtual view exposes just the data that you want from each data source, down to the level of specific tables, columns or procedures. Query processing logic in the virtual database enables users to access and join data in different formats from across data sources. The virtual database provides an abstraction layer that shields client applications from the details of the physical data sources. Data consumers don’t have figure out how to connect to the host sources. They also don’t have to worry about how a view combines data from various sources, or how to configure translators to normalize data into a usable format. Data and access where you want it Data virtualization acts as a logical data warehouse, one that relies on metadata to make data available to client applications.