Hitachi Solution for Databases in an Enterprise Data Warehouse Offload Package for Oracle Database with MapR Distribution of Apache Hadoop

Reference Architecture Guide

By Shashikant Gaikwad, Subhash Shinde

December 2018

Feedback

Hitachi Data Systems welcomes your feedback. Please share your thoughts by sending an email message to [email protected]. To assist the routing of this message, use the paper number in the subject and the title of this white paper in the text.

Revision History

Revision: MK-SL-131-00
Changes: Initial release
Date: December 27, 2018

Table of Contents

Solution Overview 2
Business Benefits 2
High Level Infrastructure 3

Key Solution Components 4
Pentaho 6
Hitachi Advanced Server DS120 7
Hitachi Virtual Storage Platform Gx00 Models 7
Hitachi Virtual Storage Platform Fx00 Models 7
Brocade Switches 7
Cisco Nexus Data Center Switches 7
MapR Converged Data Platform 8
Red Hat Enterprise Linux 10

Solution Design 10
Server Architecture 11
Storage Architecture 13
Network Architecture 14
Data Analytics and Performance Monitoring Using Hitachi Storage Advisor 17
Oracle Enterprise Data Workflow Offload 17

Engineering Validation 29
Test Methodology 29
Test Results 30

Hitachi Solution for Databases in an Enterprise Data Warehouse Offload Package for Oracle Database with MapR Distribution of Apache Hadoop Reference Architecture Guide

Use this reference architecture guide to implement Hitachi Solution for Databases in an enterprise data warehouse offload package for Oracle Database. This Oracle converged infrastructure provides a high-performance, integrated solution for advanced analytics using the following applications:

. Hitachi Advanced Server DS120 with Intel Xeon Silver 4110 processors

. Pentaho Data Integration

. MapR distribution for Apache Hadoop

This converged infrastructure establishes best practices for environments where you can copy data in an enterprise data warehouse to an Apache Hive database on top of the Hadoop Distributed File System (HDFS). Then, you obtain your data for analysis from the Hive database instead of from the busy Oracle server.

This reference architecture guide is for you if you are in one of the following roles and need to create a big data management and advanced business analytics platform:

. Data scientist

. Database administrator

. System administrator

. Storage administrator

. Database performance analyzer

. IT professional with the responsibility of planning and deploying an EDW offload solution

To use this reference architecture guide to create your big data infrastructure, you should have familiarity with the following:

. Hitachi Advanced Server DS120

. Pentaho

. Big data and the MapR distribution for Apache Hadoop

. Apache Hive

. Oracle Real Application Cluster Database 12c Release 1

. IP networks

. Red Hat Enterprise Linux


Note — Testing of this configuration was in a lab environment. Many things affect production environments beyond prediction or duplication in a lab environment. Follow the recommended practice of conducting proof-of-concept testing for acceptable results in a non-production, isolated test environment that otherwise matches your production environment before your production implementation of this solution.

Solution Overview

Use this reference architecture to implement Hitachi Solution for Databases in an enterprise data warehouse offload package for Oracle Database with the MapR distribution of Apache Hadoop.

Business Benefits

This solution provides the following benefits:

. Improve database manageability — You can take a “divide and conquer” approach to data management by moving data onto a lower cost storage tier without disrupting access to data.

. Extreme scalability — Leveraging the extreme scalability of the Hadoop Distributed File System, you can offload data from the Oracle servers onto commodity servers running big data solutions.

. Lower total cost of ownership (TCO) — Reduce your capital expenditure by reducing the resources needed to run applications. Using the Hadoop Distributed File System and low-cost storage makes it possible to keep information that is not deemed currently critical, but that you still might want to access, off the Oracle servers.

This approach reduces the number of CPUs needed to run the Oracle database, optimizing your infrastructure. This potentially delivers hardware and software savings, including maintenance and support costs.

Reduce the costs of running your workloads by leveraging less expensive, general purpose servers running a Hadoop cluster.

. Improve availability — Reduce scheduled downtime by allowing database administrators to perform backup operations on a smaller subset of your data. With offloading, perform daily backups on hot data, and less frequent backups on warm data.

For an extremely large database, this can make the difference between having enough time to complete a backup in off-business hours or not.

This also reduces the amount of time required to recover your data.

. Analyze data without affecting the production environment — After data is processed and offloaded through Pentaho Data Integration, Pentaho dashboards can analyze your data without affecting the performance of the Oracle production environment.


High Level Infrastructure

Figure 1 shows the high-level infrastructure for this solution. The configuration of Hitachi Advanced Server DS120 provides the following characteristics:

. Fully redundant hardware

. High compute and storage density

. Flexible and scalable I/O options

. Sophisticated power and thermal design to avoid unnecessary operational expenditures

. Quick deployment and maintenance

Figure 1


To avoid any performance impact to the production database, Hitachi Vantara recommends using a configuration with a dedicated IP network for the following:

. Production Oracle database

. Pentaho server

. Hadoop servers

Uplink speed to the corporate network depends on your environment and requirements. The Cisco Nexus 93180YC-EX switches can support uplink speeds of 40 GbE.

This solution uses Hitachi Unified Compute Platform CI in a solution for an Oracle Database architecture with Hitachi Advanced Server DS220, Hitachi Virtual Storage Platform G600, and two Brocade G620 SAN switches to host Oracle Enterprise Data Warehouse. You can use your existing Oracle database environment or purchase Hitachi Unified Compute Platform CI to host Oracle RAC or a standalone solution to host Enterprise Data Warehouse.

Key Solution Components

The key solution components for this solution are listed in Table 1, “Hardware Components,” on page 4 and Table 2, “Software Components,” on page 6.

TABLE 1. HARDWARE COMPONENTS

Hitachi Virtual Storage Platform G600 (VSP G600) — firmware 83-04-47-40/00, quantity 1
. One controller
. 8 × 16 Gb/s Fibre Channel ports
. 8 × 12 Gb/s backend SAS ports
. 256 GB cache memory
. 40 × 960 GB SSDs, plus 2 spares
. 16 Gb/s × 2 ports CHB

Hitachi Advanced Server DS220 (Oracle host) — firmware 3A10.H3, quantity 1
. 2 Intel Xeon Gold 6140 CPU @ 2.30 GHz
. 768 GB (64 GB × 12) DIMM DDR4 synchronous registered (buffered) 2666 MHz
. Intel XXV710 dual-port 25 GbE NIC cards — driver i40e-2.3.6, quantity 2
. Emulex LightPulse LPe31002-M6 2-port 16 Gb/s Fibre Channel adapter — firmware 11.2.156.27, quantity 2

Hitachi Advanced Server DS120 (Apache Hadoop host) — firmware 3A10.H8, quantity 3
. 2 Intel Xeon Silver 4110 CPU @ 2.10 GHz
. 2 × 128 GB MLC SATADOM for boot
. 384 GB (32 GB × 12) DIMM DDR4 synchronous registered (buffered) 2666 MHz
. Intel XXV710 dual-port 25 GbE NIC cards — driver i40e-2.3.6, quantity 6
. 1.8 TB SAS drives — quantity 4

Hitachi Advanced Server DS120 (Pentaho host) — firmware 3A10.H8, quantity 1
. 2 Intel Xeon Silver 4110 CPU @ 2.10 GHz
. 2 × 128 GB MLC SATADOM for boot
. 128 GB (32 GB × 4) DIMM DDR4 synchronous registered (buffered) 2666 MHz
. Intel XXV710 dual-port 25 GbE NIC cards — driver i40e-2.3.6, quantity 2

Brocade G620 switches — firmware V8.0.1, quantity 2
. 48-port Fibre Channel switch
. 16 Gb/s SFPs
. Brocade hot-pluggable SFP+, LC connector

Cisco Nexus 93180YC-EX switches — firmware 7.0(3)I5(1), quantity 2
. 48 × 10/25 GbE fiber ports
. 6 × 40/100 Gb/s quad SFP (QSFP28) ports

Cisco Nexus 3048TP switch — firmware 7.0(3)I4(2), quantity 1
. 1 GbE 48-port Ethernet switch


TABLE 2. SOFTWARE COMPONENTS

Software | Version | Function

Red Hat Enterprise Linux | Version 7.4, kernel 3.10.0-514.36.5.el7.x86_64 | Operating system for the MapR distribution of the Hadoop environment
Red Hat Enterprise Linux | Version 7.4, kernel 3.10.0-693.11.6.el7.x86_64 | Operating system for the Oracle environment
Microsoft® Windows Server® | 2012 R2 Standard | Operating system for the Pentaho Data Integration environment
Oracle | 12c Release 1 (12.1.0.2.0) | Database software
Oracle Grid Infrastructure | 12c Release 1 (12.1.0.2.0) | Volume management, file system software, and Oracle Automatic Storage Management
Pentaho Data Integration | 8.1 | Extract-transform-load software
MapR distribution for Apache Hadoop | 6.0 | Hadoop distribution from MapR
Apache Hive | 1.1.0 | Hive database
Red Hat Enterprise Linux Device Mapper Multipath | 1.02.140 | Multipath software
Hitachi Storage Navigator [Note 1] | Microcode dependent | Storage management software
Hitachi Storage Advisor [Note 1] | 2.1.0 | Storage orchestration software

[Note 1] These software programs were used for this Oracle Database architecture built on Hitachi Unified Compute Platform CI. They may not be required for your implementation.

Pentaho

A unified data integration and analytics program, Pentaho addresses the barriers that block your organization's ability to get value from all your data. Simplify preparing and blending any data with a spectrum of tools to analyze, visualize, explore, report, and predict. Open, embeddable, and extensible, Pentaho ensures that each member of your team — from developers to business users — can translate data into value.

. Internet of things — Integrate machine data with other data for better outcomes.

. Big data — Accelerate value with Apache Hadoop, NoSQL, and other big data.

. Data integration — Access, manage, and blend any data from any source.

. Business analytics — Turn data into insights with embeddable analytics.

This solution uses Pentaho Data Integration to drive the extract, transform, and load (ETL) process. The end target of this process is an Apache Hive database on top of the Hadoop Distributed File System.


Hitachi Advanced Server DS120

Optimized for performance, high density, and power efficiency in a dual-processor server, Hitachi Advanced Server DS120 delivers a balance of compute and storage capacity. This rack-mounted server has the flexibility to power a wide range of solutions and applications.

The highly scalable memory supports up to 3 TB using 24 slots of 2666 MHz DDR4 RDIMM. DS120 is powered by the Intel Xeon Scalable processor family for complex and demanding workloads. There are flexible OCP and PCIe I/O expansion card options available. This server supports up to 12 storage devices, with up to 4 of them NVMe drives.

Hitachi Virtual Storage Platform Gx00 Models

Hitachi Virtual Storage Platform Gx00 models are based on industry-leading enterprise storage technology. With flash-optimized performance, these systems provide advanced capabilities previously available only in high-end storage arrays. With the Virtual Storage Platform Gx00 models, you can build a high performance, software-defined infrastructure to transform data into valuable information.

Hitachi Storage Virtualization Operating System provides storage virtualization, high availability, superior performance, and advanced data protection for all Virtual Storage Platform Gx00 models. This proven, mature software provides common features to consolidate assets, reclaim space, extend life, and reduce migration effort.

This solution uses Virtual Storage Platform G600, which supports Oracle Real Application Clusters.

Hitachi Virtual Storage Platform Fx00 Models

Hitachi Virtual Storage Platform Fx00 models deliver superior all-flash performance for business-critical applications, with continuous data availability. High-performance network attached storage with non-disruptive deduplication reduces the required storage capacity by up to 90% with the power to handle large, mixed-workload environments.

Hitachi Storage Virtualization Operating System provides storage virtualization, high availability, superior performance, and advanced data protection for all Virtual Storage Platform Fx00 models. This proven, mature software provides common features to consolidate assets, reclaim space, extend life, and reduce migration effort.

Brocade Switches

Brocade and Hitachi Vantara partner to deliver storage networking and data center solutions. These solutions reduce complexity and cost, as well as enable virtualization and increase business agility.

Optionally, this solution uses the following Brocade product:

. Brocade G620 switch, 48-port Fibre Channel

In this solution, SAN switches are optional. Direct connect is possible under certain circumstances. Check the support matrix to ensure support for your choice.

Cisco Nexus Data Center Switches

Cisco Nexus data center switches are built for scale, industry-leading automation, programmability, and real-time visibility.

This solution uses the following Cisco switches to provide Ethernet connectivity:

. Nexus 93180YC-EX, 48-port 10/25 GbE switch

. Nexus 3048TP, 48-port 1 GbE switch


MapR Converged Data Platform

The industry’s leading unified data platform, MapR Converged Data Platform, can run analytics and applications simultaneously with speed, scale, and reliability. It converges all data into a data fabric that can store, manage, process, and analyze data as it happens.

Integrating Apache Hadoop, Apache Spark, and Apache Drill with real-time database capabilities, global event streaming, and scalable enterprise storage, MapR Converged Data Platform powers a new generation of big data applications.

MapR delivers enterprise grade security, reliability, and real-time performance while dramatically lowering hardware and operational costs of your most important applications and data. Supporting a variety of open source projects, MapR is committed to using industry-standard APIs.

Figure 2 shows MapR Converged Data Platform integrating with workloads and deployment models.

Figure 2

MapR supports dozens of open source projects. It extends Hadoop by providing its own custom features. MapR-XD Cloud-Scale Data Store (MapR-XD) is the industry’s only exabyte-scale data store for building intelligent applications with MapR Converged Data Platform.

. MapR-ES — A publish and subscribe messaging system built into the MapR platform, MapR-ES is compatible with Kafka APIs.

. MapR-DB — This is a high performance NoSQL database that lets you run analytics on live data. With MapR-DB, you can handle multiple use cases and workloads on a single cluster. To provide a seamless interface with your current solutions, it can be accessed with HBase APIs and JSON APIs.

. MapR Data Fabric for Kubernetes — The MapR Data Fabric includes a natively integrated Kubernetes volume driver to provide persistent storage volumes for access to any data located on-premises, across clouds, and at the edge.

. Application deployment in containers — Stateful applications can now be deployed in containers for production use cases, machine learning pipelines, and multi-tenant use cases.

. MapR Control System — Use MapR Control System to manage, administer, and monitor the Hadoop cluster.


. Multi-Tenancy — MapR supports multi-tenancy, surpassing the capabilities of YARN.

. High performance — With faster file access and an optimized MapReduce, as a MapR customer you can deploy one-third fewer nodes than with other Hadoop distributions.

. High Availability — MapR automatically eliminates single points of failure.

. Disaster Recovery — MapR provides disaster recovery with mirror copies of data replicated to other sites and consistent point-in-time snapshots.

Figure 3 shows the components of the MapR Data Platform and other systems that it integrates with.

Figure 3

MapR provides open source ecosystem projects to handle a variety of big data management tasks. Projects include Apache Storm, Apache Spark, Apache Hive, Apache Mahout, Apache Hadoop YARN, Apache Sqoop, Apache Flume, and more.

MapR's open interface allows it to interoperate with many commercial tools, such as the following:

. Pentaho

. Oracle

Oracle Database has a multi-tenant architecture so you can consolidate many databases quickly and manage them as a cloud service. Oracle Database also includes in-memory data processing capabilities for analytical performance. Additional database innovations deliver efficiency, performance, security, and availability. Oracle Database comes in two editions: Enterprise Edition and Standard Edition 2.

Oracle Automatic Storage Management (Oracle ASM) is a volume manager and a file system for Oracle database files. This supports single-instance Oracle Database and Oracle Real Application Clusters configurations. Oracle ASM is the recommended storage management solution that provides an alternative to conventional volume managers, file systems, and raw devices.


Red Hat Enterprise Linux

Red Hat Enterprise Linux delivers military-grade security, 99.999% uptime, support for business-critical workloads, and so much more. Ultimately, the platform helps you reallocate resources from maintaining the status quo to tackling new challenges.

Device mapper multipathing (DM-Multipath) allows you to configure multiple I/O paths between server nodes and storage arrays into a single device.

These I/O paths are physical SAN connections that can include separate cables, switches, and controllers. Multipathing aggregates the I/O paths, creating a new device that consists of the aggregated paths.

Solution Design

This describes the reference architecture environment to implement Hitachi Solution for Databases in an enterprise data warehouse offload package for Oracle Database. The environment uses Hitachi Advanced Server DS120 and Hitachi Advanced Server DS220.

The infrastructure configuration includes the following:

. Pentaho server — Configure one server to run Pentaho.

. Hadoop Cluster Servers — Configure at least three servers to run an Apache Hadoop cluster with the Hive2 database. The number of Hadoop servers can be expanded, based on the size of the working set, sharding, and other factors. This solution uses local hard disk drives as RAID-1 storage for the Hadoop cluster hosts.

. IP network connection — Use IP connections through Cisco Nexus switches to connect to the Pentaho server, the Apache Hive servers, and the Oracle server.

. Oracle Database architecture built on Hitachi Unified Compute Platform CI — Use this infrastructure to host Oracle Enterprise Data Warehouse. The infrastructure includes one Hitachi Advanced Server DS220, two Brocade G620 SAN switches, and any Hitachi Virtual Storage Platform Gx00 model or Virtual Storage Platform Fx00 model. (This example uses Virtual Storage Platform G600.) Configure the Oracle database for a single instance or as an Oracle Real Application Clusters environment.

The following are the components in a minimal deployment, single rack configuration of this solution. It uses a single Hitachi Advanced Server DS120.

Every deployment has its own requirements, using different software combinations. These rack deployment examples show some basic configurations. There are many services that can be part of a deployment that are not listed in these examples.

Use MapR Installer to deploy MapR Converged Data Platform.

To design a configuration to meet your needs, contact MapR Sales or Hitachi Vantara Global Sales.

The following recommendation is for a small single rack deployment of a total of 10 nodes.

. Service Node (only) — The 2 nodes labeled Service Node run the following:
. Container location databases (CLDB)
. Web server
. Resource Manager

. ZooKeeper Node (only) — The 3 nodes labeled ZooKeeper Node run Apache ZooKeeper.


. All nodes — All nodes run the following:
. Node Manager
. NFS Gateway
. File server
. Drill

This deployment uses top-of-rack switches.

For complete configuration and deployment information, see Optimized infrastructure for big data analytics on MapR.

Server Architecture

Using the same environment that was used in testing is not a requirement to deploy this solution. However, this architecture provides the compute power for the Apache Hive database to handle complex database queries and a large volume of transaction processing in parallel.

To validate the environment, the following Hitachi Advanced Server DS120 nodes were used:

. Three nodes for the Apache Hadoop host

. One node for the Pentaho Data Integration host

To validate the environment, the following Hitachi Advanced Server DS220 node was used:

. One node for the Oracle host

Table 3, “Hitachi Advanced Server DS120 Specifications,” and Table 4, “Hitachi Advanced Server DS220 Specifications,” describe the details of the server configuration used to validate this solution.

TABLE 3. HITACHI ADVANCED SERVER DS120 SPECIFICATIONS

Server Server Name Role CPU Cores RAM

Hadoop Server 1 hadoopnode1 Hadoop Cluster Node 1 16 384 GB (32 GB × 12)

Hadoop Server 2 hadoopnode2 Hadoop Cluster Node 2 16 384 GB (32 GB × 12)

Hadoop Server 3 hadoopnode3 Hadoop Cluster Node 3 16 384 GB (32 GB × 12)

Pentaho Server edwpdi Pentaho PDI Server 16 128 GB (32 GB × 4)

TABLE 4. HITACHI ADVANCED SERVER DS220 SPECIFICATIONS

Server Server Name Role CPU Cores RAM

Oracle Oracle host Oracle Server 36 768 GB (64 GB × 12)

11 12

Use the configuration settings in Table 5 for the BIOS and Red Hat Enterprise Linux 7.4 kernel parameters on the Apache Hadoop cluster servers.

TABLE 5. APACHE HADOOP SERVER BIOS AND RED HAT ENTERPRISE LINUX 7.4 KERNEL PARAMETERS

Parameter Category Setting Value

BIOS NUMA ENABLE

DISK READ AHEAD DISABLE

Red Hat Enterprise Linux 7.4 kernel ulimit unlimited

vm.dirty_ratio 20

vm.dirty_background_ratio 10

vm.swappiness 1

transparent_hugepage never

I/O scheduler noop

Red Hat Enterprise Linux 7.4 operating system services tuned disable
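The settings in Table 5 can be spot-checked from the operating system before the MapR installation. The following is a minimal sketch, assuming Red Hat Enterprise Linux paths and a data device named sda (a placeholder); adjust the expected values and device name for your environment.

# Minimal check of the Table 5 kernel settings (assumes RHEL paths; "sda" is a placeholder device).
EXPECTED = {
    "/proc/sys/vm/dirty_ratio": "20",
    "/proc/sys/vm/dirty_background_ratio": "10",
    "/proc/sys/vm/swappiness": "1",
}

for path, expected in EXPECTED.items():
    actual = open(path).read().strip()
    status = "OK" if actual == expected else f"MISMATCH (expected {expected})"
    print(f"{path}: {actual} {status}")

# Transparent huge pages: the active value is shown in brackets, for example "always madvise [never]".
thp = open("/sys/kernel/mm/transparent_hugepage/enabled").read()
print("transparent_hugepage:", "OK" if "[never]" in thp else f"MISMATCH ({thp.strip()})")

# I/O scheduler for the data device (placeholder: sda); the active scheduler appears in brackets.
sched = open("/sys/block/sda/queue/scheduler").read()
print("io scheduler:", "OK" if "[noop]" in sched else f"MISMATCH ({sched.strip()})")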

Use the configuration settings in Table 6 for the BIOS and Red Hat Enterprise Linux 7.4 kernel parameters on the Pentaho server.

TABLE 6. PENTAHO SERVER PARAMETERS

Parameter Category Setting Value

BIOS NUMA ENABLE

DISK READ AHEAD DISABLE

Use the parameters in Table 7 for the Apache Hadoop environment.

TABLE 7. PARAMETERS FOR THE APACHE HADOOP ENVIRONMENT

Setting Property Value

Memory Tuning mapred.child.java.opts 2048 MB
Replication factor dfs.replication 3
Map CPU vcores mapreduce.map.cpu.vcores 2
Reduce CPU vcores mapreduce.reduce.cpu.vcores 2
Buffer size io.file.buffer.size 128
DFS Block size dfs.block.size 128
Speculative Execution mapreduce.map.tasks.speculative.execution TRUE
Hive vectorized execution hive.vectorized.execution.enabled TRUE
Hive Server Heap Size hive-env.sh HADOOP_OPTS 16 GB
Spark driver memory spark.driver.memory 12 GB
MapReduce reduce memory mapreduce.reduce.memory.mb 3072
MapReduce I/O sort mapreduce.task.io.sort.mb 512
MapReduce framework mapreduce.framework.name yarn
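Exactly where these properties are set depends on your MapR installation (MapR Control System, or the Hadoop and Hive site files under /opt/mapr). As an illustration only, the following sketch renders a subset of the Table 7 properties into Hadoop's XML property format; the output file name and the -Xmx form of the child JVM option are assumptions, not values taken from the validated environment.

import xml.etree.ElementTree as ET

# Subset of Table 7 expressed as Hadoop-style properties (illustrative values only).
properties = {
    "mapred.child.java.opts": "-Xmx2048m",   # 2048 MB child JVM heap, written in -Xmx form
    "dfs.replication": "3",
    "mapreduce.map.cpu.vcores": "2",
    "mapreduce.reduce.cpu.vcores": "2",
    "mapreduce.map.tasks.speculative.execution": "true",
    "mapreduce.reduce.memory.mb": "3072",
    "mapreduce.task.io.sort.mb": "512",
    "mapreduce.framework.name": "yarn",
}

configuration = ET.Element("configuration")
for name, value in properties.items():
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value

# Write to a scratch file; merge the resulting property elements into your site files by hand.
ET.ElementTree(configuration).write("mapred-site.sample.xml", encoding="utf-8", xml_declaration=True)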

Use the connection parameters in Table 8 for the Oracle environment, if appropriate for your environment.

TABLE 8. CONNECTION PARAMETERS FOR ORACLE ENVIRONMENT

Setting Property Value

Database processes processes 5000

Database sessions sessions 7000

Database transactions transactions 6000

Note — The connection information in Table 8, “Connection Parameters for Oracle Environment,” may differ for your production environment, based on your offload configuration. To execute multiple transformation steps, Pentaho Data Integration opens multiple database connections. Make sure that Oracle Database allows enough connections to offload using Pentaho.
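Because Pentaho Data Integration opens one connection per transformation step, confirm that the limits in Table 8 leave enough headroom before a large offload run. The following is a minimal sketch, assuming the cx_Oracle client package is installed and the account has access to the v$ views; the credentials and connect string are placeholders.

import cx_Oracle  # assumption: Oracle client libraries and the cx_Oracle package are installed

# Placeholder credentials and connect descriptor; replace with values for your environment.
connection = cx_Oracle.connect("system", "password", "oraclehost:1521/ORCL")
cursor = connection.cursor()

# Configured limits (Table 8 values: processes=5000, sessions=7000, transactions=6000).
cursor.execute(
    "SELECT name, value FROM v$parameter WHERE name IN ('processes', 'sessions', 'transactions')"
)
limits = dict(cursor.fetchall())

# Sessions currently in use; the difference is the headroom available to Pentaho connections.
cursor.execute("SELECT COUNT(*) FROM v$session")
in_use = cursor.fetchone()[0]

print(f"limits: {limits}")
print(f"sessions in use: {in_use}, free: {int(limits['sessions']) - in_use}")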

Storage Architecture

This describes the storage architecture for this solution.

Storage Configuration for Hadoop Cluster

This configuration uses recommended practices with Hitachi Advanced Server DS120 and MapR for the design and deployment of storage for Apache Hadoop.

Configure a total of four hard disk drives on each Hadoop node.

For the best performance and to have the size to accommodate the Oracle Enterprise Data Warehouse offloading space, adjust the size of the hard drives to meet your business requirements.

For more information on how Hitachi provides a high-performance, integrated, and converged solution for Oracle Database, see Hitachi Solution for Databases Reference Architecture for Oracle Real Application Clusters Database 12c.


Table 9 shows the storage configuration used for the Hadoop cluster in this solution.

TABLE 9. STORAGE CONFIGURATION FOR APACHE HIVE DATABASE

RAID level: RAID-1
Drive type: 1.2 TB HDD
Number of drives: 4
Total usable capacity: 4.8 TB
File system type: MapR NFS
File system block size: 4 KB
Disk readahead (BIOS setting): Disable

Note — On the node where you install the MapR distribution of Apache Hadoop services, you need 100 GB of space allocated exclusively for Hadoop services data. If you want to use the root partition as the data directory for Hadoop services, create a root partition of 100 GB. If you want to use any other disk, make sure it has more than 100 GB of available space.

File System Recommendation for MapR Distribution for Hadoop

The MapR Data Platform provides a unified data solution for structured data (tables) and unstructured data (files). MapR File System (MapR-FS) is a random read-write distributed file system that allows applications to concurrently read and write directly to disk.

The storage system architecture used by MapR-FS is written in C/C++. It prevents locking contention, eliminating performance impact from Java garbage collection.

For more information on the file system, see MapR File system overview.

Network Architecture

This architecture requires the following separate networks for the Pentaho server and the MapR distribution of Apache Hadoop servers:

. Public Network — This network must be scalable. In addition, it must meet the low-latency needs of the network traffic generated by the servers running applications in the environment.

. Management Network — This network provides BMC connections to the physical servers.

. Data Network — This network provides communication between nodes.

Hitachi Vantara recommends using pairs of 10/25 Gb/s NICs for the public network and 1 Gb/s LOM for the management network.


Observe these points when configuring the public network in your environment:

. For each server in the configuration, use at least two identical, high-bandwidth, low-latency NICs for the public network.

. Use NIC bonding to provide failover and load balancing within a server. If using two dual-port NICs, NIC bonding can be configured across the two cards. A quick bond-status check is sketched below.

. Ensure all NICs are set to full duplex mode.

For the complete network architecture, see Optimized infrastructure for big data analytics on MapR Reference Architecture Guide (PDF).
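This is a minimal sketch of such a bond-status check, assuming a Linux host and a bond named bond0 (a placeholder; use the interface name defined in your environment).

# Print bonding mode and per-slave link status (assumes Linux; bond0 is a placeholder name).
BOND = "/proc/net/bonding/bond0"

try:
    with open(BOND) as f:
        for line in f:
            line = line.strip()
            if line.startswith(("Bonding Mode:", "Slave Interface:", "MII Status:", "Speed:")):
                print(line)
except FileNotFoundError:
    print(f"{BOND} not found: bonding is not configured on this host")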

Figure 4 shows the network configuration in this solution.

Figure 4


Table 10, “Network Configuration and IP Addresses,” shows the network configuration, IP addresses, and name configuration that was used when testing the environment with Hitachi Advanced Server DS120. Your implementation of this solution can differ.

Configure pairs of ports from different physical NIC cards to avoid a single point of failure (SPoF) when installing two NICs on each server. However, if high availability is not a concern for you, this environment supports using one NIC on the MapR Distribution Hadoop servers and the Pentaho server for lower cost.

The IP configuration for the Oracle environment can be adjusted for Oracle Real Application Clusters, as applicable.

TABLE 10. NETWORK CONFIGURATION AND IP ADDRESSES

DS120 Server 1 (Hadoop)
. NIC-0 and NIC-3 — subnet 167, Bond0, IP address 172.17.167.71, public network, 10/25 Gb/s, Cisco Nexus switches 1 and 2, switch port 11
. BMC dedicated NIC — subnet 242, no bond, IP address 172.17.242.165, management network, 1 Gb/s, Cisco Nexus switch 3, switch port 11

DS120 Server 2 (Hadoop)
. NIC-0 and NIC-3 — subnet 167, Bond0, IP address 172.17.167.72, public network, 10/25 Gb/s, Cisco Nexus switches 1 and 2, switch port 12
. BMC dedicated NIC — subnet 242, no bond, IP address 172.17.242.166, management network, 1 Gb/s, Cisco Nexus switch 3, switch port 12

DS120 Server 3 (Hadoop)
. NIC-0 and NIC-3 — subnet 167, Bond0, IP address 172.17.167.73, public network, 10/25 Gb/s, Cisco Nexus switches 1 and 2, switch port 13
. BMC dedicated NIC — subnet 242, no bond, IP address 172.17.242.167, management network, 1 Gb/s, Cisco Nexus switch 3, switch port 13

DS120 Server 4 (Pentaho)
. NIC-0 and NIC-3 — subnet 167, Bond0, IP address 172.17.167.74, public network, 10/25 Gb/s, Cisco Nexus switches 1 and 2, switch port 14
. BMC dedicated NIC — subnet 242, no bond, IP address 172.17.242.74, management network, 1 Gb/s, Cisco Nexus switch 3, switch port 14

DS220 Server (Oracle)
. NIC-0 and NIC-3 — subnet 167, Bond0, IP address 172.17.167.75, public network, 10/25 Gb/s, Cisco Nexus switches 1 and 2, switch port 15
. BMC dedicated NIC — subnet 242, no bond, IP address 172.17.242.75, management network, 1 Gb/s, Cisco Nexus switch 3, switch port 15


Data Analytics and Performance Monitoring Using Hitachi Storage Advisor

Use Hitachi Storage Advisor for data analytics and performance monitoring with this solution.

By reducing storage infrastructure management complexities, Hitachi Storage Advisor simplifies management operations. This helps you to rapidly configure storage systems and IT services for new business applications.

Hitachi Storage Advisor can be used for this Oracle Database architecture, built on Hitachi Unified Compute Platform CI. However, this may not be required for your implementation.

Oracle Enterprise Data Workflow Offload

Use a Python script to generate the Oracle Enterprise Data Workflow mapping of large Oracle data sets to the MapR distribution of Apache Hadoop, and then offload the data using Pentaho Data Integration. Have this script create a transformation Kettle file with Spoon for the data offload.

This auto-generated transformation transfers row data from Oracle database tables or views in a schema to the Apache Hadoop database. Pentaho Data Integration uses this transformation directly.

Pentaho Data Integration

Pentaho Data Integration allows you to ingest, blend, cleanse, and prepare diverse data from any source. With visual tools to eliminate coding and complexity, Pentaho puts all data sources and the best quality data at the fingertips of businesses and IT users.

Using intuitive drag-and-drop data integration, coupled with data-agnostic connectivity, your use of Pentaho Data Integration can span from flat files and RDBMS to MapR Distribution Hadoop and beyond. Go beyond a standard extract-transform-load (ETL) designer to scalable and flexible management for end-to-end data flows.

In this reference architecture, the end target of the ETL process is an Apache Hive database.

To set up Pentaho to connect to a MapR Distribution Hadoop cluster, make sure that the MapR client configuration is done properly and that the correct plugin (shim) is configured. Read the MapR client configuration steps in Setting Up the Client. View the plugin configuration steps in Set Up Pentaho to Connect to a MapR Cluster.

For this solution, configure and set Active Shim to MapR 6.0. Figure 5 shows selecting the MapR distribution for the Pentaho shim in Hadoop.

Figure 5


Script for Automatic Oracle Enterprise Data Workflow Offload

You can use an Oracle Enterprise Data Workflow script that uses existing user accounts with appropriate permissions for Oracle, Hadoop Distributed File System, and Apache Hive database access. Have the script test the Oracle, Hadoop Distributed File System, and Apache Hive connections first and then proceed with further options.

The main options to generate a Pentaho Data Integration transformation are the following (a sketch of the corresponding query logic follows the list):

. Transfer all tables in selected Oracle schema to Apache Hive on Hadoop.

. Transfer Oracle tables based on partition to Apache Hive on Hadoop.

. Transfer specific Oracle table rows to Apache Hive on Hadoop based on date range.

. Transfer specific Oracle table rows to Apache Hive on Hadoop based on column key value.
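The script itself is not reproduced in this guide. As an illustration of the four options above, the following sketch shows how such a script might build the SELECT statement placed into the generated Table Input step for each offload mode; the function, table, and column names are hypothetical.

def build_offload_query(table, mode, **kw):
    """Return the Oracle SELECT used by a generated Table Input step (illustrative only)."""
    if mode == "full_schema":
        # One transformation per table; the caller iterates over every table in the schema.
        return f"SELECT * FROM {table}"
    if mode == "partition":
        # Offload a single named partition of a partitioned table.
        return f"SELECT * FROM {table} PARTITION ({kw['partition_name']})"
    if mode == "date_range":
        # Offload rows whose date column falls inside a range.
        return (
            f"SELECT * FROM {table} "
            f"WHERE {kw['date_column']} BETWEEN "
            f"TO_DATE('{kw['start_date']}', 'YYYY-MM-DD') AND "
            f"TO_DATE('{kw['end_date']}', 'YYYY-MM-DD')"
        )
    if mode == "key_value":
        # Offload rows matching a key column value.
        return f"SELECT * FROM {table} WHERE {kw['key_column']} = '{kw['key_value']}'"
    raise ValueError(f"unknown offload mode: {mode}")


# Example: rows of a hypothetical SALES table for January 2018.
print(build_offload_query("SH.SALES", "date_range",
                          date_column="TIME_ID",
                          start_date="2018-01-01", end_date="2018-01-31"))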

Example Work Flow Offloads Using Pentaho Data Integration

These are examples of enterprise data warehouse offloads performed with Pentaho Data Integration. They were created using the graphical user interface in Pentaho Data Integration.

Full Table Data Copy from Oracle to Apache Hive

It can be a challenge to convert data types between two database systems. With the graphical user interface in Pentaho Data Integration, you can do data type conversion with a few clicks. No coding is needed.

Use the graphical user interface in Pentaho Data Integration to construct a workflow to copy all the data from an Oracle table to Apache Hive.

Define data connections for the Pentaho server so that Pentaho Data Integration can access data from sources like Oracle Database. See Define Data Connections for the Pentaho Server for these procedures.

Figure 6 shows the Pentaho Data Integration work flow for a full table data copy from Oracle to Apache Hive in the user interface.

Figure 6

Your workflow can copy data directly from Oracle to Apache Hive. However, there is a performance penalty if you do that. Read about this performance penalty.

To avoid the performance penalty, create a Pentaho Data Integration work flow with the following steps:

1. Read the Oracle table (table input).

2. Copy the Oracle table to Hadoop Distributed File System (Hadoop file output).

3. Load the Hadoop Distributed File System file into the Apache Hive table (execute the SQL script).


To create a full table data copy work flow, do the following.

1. On the Pentaho Data Integration work flow, double-click Read Oracle Table (Figure 6 on page 18). The Table Input dialog box opens (Figure 7 on page 19).

2. On the Table Input page, provide the connection information and SQL query for the data, and click OK.

Figure 7

3. Set parameters for the file transfer.

(1) To transfer Oracle table input data to Hadoop Distributed File System, double-click Copy Oracle Table to HDFS File (Figure 6 on page 18). The Hadoop File Output dialog box opens to the File tab (Figure 8 on page 20).


Figure 8

(2) Click the Fields tab (Figure 9 on page 21).

(3) Change settings as needed, and then click OK.

Figure 9 shows setting the value for the HDFS output file to convert a date field (TIME_ID) to a string value during offload.


Figure 9

4. Load the Hadoop Distributed File System file into the Apache Hive database.

(1) Double-click Load HDFS File into Hive Table (Figure 6 on page 18). The Execute SQL statements dialog box opens (Figure 10 on page 22).

(2) Change settings as necessary, and then click OK. You can form a Hive SQL (HQL) query to match your requirements.
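For reference, the HQL issued by the Load HDFS File into Hive Table step typically creates the target table, if needed, and loads the delimited file written in step 3. The following is a minimal sketch of equivalent statements run outside Pentaho, assuming the PyHive package, a HiveServer2 endpoint on hadoopnode1, and hypothetical table, column, and path names.

from pyhive import hive  # assumption: PyHive is installed; the PDI step issues equivalent HQL itself

# Placeholder HiveServer2 host, port, and user.
connection = hive.Connection(host="hadoopnode1", port=10000, username="mapr")
cursor = connection.cursor()

# Create the target table; the layout mirrors the fields defined on the Hadoop File Output step.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS sales_offload (
        prod_id INT,
        cust_id INT,
        time_id STRING,
        amount_sold DECIMAL(10,2))
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ';'
    STORED AS TEXTFILE
""")

# Load the HDFS file produced by the Copy Oracle Table to HDFS File step (hypothetical path).
cursor.execute("LOAD DATA INPATH '/offload/sales_offload.txt' OVERWRITE INTO TABLE sales_offload")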


Figure 10


Figure 11 shows the execution of the work flow.

Figure 11


Figure 12 shows the copied table information in Apache Hive Database.

Figure 12

Merge and Join Two Tables Data Copy from Oracle to Apache Hive

Often you need to join two or more enterprise data warehouse tables. This example demonstrates how to use the graphical user interface in Pentaho Data Integration to join two Oracle tables and offload the data to HDFS-Hive.

This example helps you understand how to query two different data sources simultaneously while saving the operational result (a join, in this case) to Hadoop. The same approach also applies to reading data from Oracle and Hive databases together: you can perform the join with one table from the Oracle database and the other table from the Hive database.
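Conceptually, the Merge Join step in Pentaho Data Integration walks two input streams that are already sorted on the join key and emits matching row pairs. You do not implement this yourself, but the following sketch of the algorithm, with hypothetical row data, shows why both inputs must be sorted on the key.

def merge_join(left, right, key):
    """Inner-join two lists of dicts that are already sorted on `key` (conceptual sketch)."""
    joined, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            # Emit every pairing of rows that share this key value.
            k = left[i][key]
            left_block = [row for row in left[i:] if row[key] == k]
            right_block = [row for row in right[j:] if row[key] == k]
            for lrow in left_block:
                for rrow in right_block:
                    joined.append({**lrow, **rrow})
            i += len(left_block)
            j += len(right_block)
    return joined

# Hypothetical example: customers joined to sales on CUST_ID (both inputs sorted on CUST_ID).
customers = [{"CUST_ID": 1, "NAME": "A"}, {"CUST_ID": 2, "NAME": "B"}]
sales = [{"CUST_ID": 1, "AMOUNT": 10}, {"CUST_ID": 1, "AMOUNT": 20}, {"CUST_ID": 2, "AMOUNT": 5}]
print(merge_join(customers, sales, "CUST_ID"))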

Figure 13 on page 25 shows the transformation work flow and the execution results for the merged work flow in the graphical user interface of Pentaho Data Integration.


Figure 13


Figure 14 shows transformation information to load the Hadoop Distributed File System file into the Hive table.

Figure 14


Figure 15 shows simple verification of join data on the Hive database.

Figure 15

If done as part of the Pentaho work flow, sorting rows in large tables can be time-consuming. Hitachi Vantara recommends sorting all the rows in server memory instead of using a memory-plus-disk approach.

On the Sort row dialog box (Figure 16 on page 28), make the following settings:

. Use the Sort size (rows in memory) text box to control how many rows are sorted in server memory.

. Use the Free memory threshold (in %) text box to help avoid filling all available server memory. Make sure to allocate enough RAM to Pentaho Data Integration on the server when you need to do large sorting tasks.

Figure 16 shows controlling the cache size in the Sort rows dialog box from the graphical user interface in Pentaho Data Integration.


Figure 16

Sorting on the database is often faster than sorting externally, especially if there is an index on the sort field or fields. You can use this as another option to improve performance.

More Oracle tables can be joined, one at a time, with the same Pentaho Data Integration sorting and join steps. You can also use Execute SQL Script in Pentaho as another option to join multiple tables. Multiple table inputs with merge joins in one transformation in the Pentaho Community Forums shows how to do this from Pentaho Data Integration.

Pentaho Kettle Performance

By default, all steps used in the Kettle transformation for offloading the data run in parallel and consume CPU and memory resources. Because these resources are limited, plan how many transformation steps should run in parallel.

Form multiple transformations with a limited number of steps and then execute them in sequence. This ensures that all data is copied or offloaded without exhausting the resources on the server.

Decide on the number of steps in a transformation based on the available CPU and memory resources on the Kettle application host. Refer to the Pentaho server hardware information in Table 1, “Hardware Components,” on page 4.

. Environment work flow information

. Multiple Kettle transformations were created to run the copy or offload sequentially, rather than creating multiple parallel transformations through a Kettle job.


. Transformation steps information

. The number of tables in each Kettle transformation was set to 80.

. The number of columns in the source Oracle database tables was set to 22.

. Observations

. The CPU resource utilization was between 55% and 60% throughout the transformation tests. This means that there was no resource depletion within the server. Possibly the number of tables in your work flow could be increased.

. Work flow execution with optimum performance depends on the following:

. How much CPU resource is available?

. How much data is being copied or offloaded in a table row?

. How many tables were included in the transformation?

Engineering Validation

This summarizes the key observations from the test results for the Hitachi Solution for Databases in an enterprise data warehouse offload package for Oracle Database. This environment uses Hitachi Advanced Server DS120, Pentaho 8.0, and the MapR distribution of Hadoop Ecosystem 6.0.0.

When evaluating this Oracle Enterprise Data Warehouse solution, the laboratory environment used the following:

. One Hitachi Unified Compute Platform CI for the Oracle Database environment

. Four Hitachi Advanced Server DS120

. One Hitachi Virtual Storage Platform G600

. Two Brocade G620 SAN switches

Using this same test environment is not a requirement to deploy this solution.

Test Methodology

The source data was preloaded into a sample database schema in the Oracle database.

After preloading the data, a few Pentaho Data Integration work flow examples were developed for offloading data to the Apache Hive database.

Once data was loaded into the Apache Hive database, certain verifications were done to make sure the data was offloaded correctly.

The example work flows found in Oracle Enterprise Data Workflow Offload were used to validate this environment.

Testing involved following this procedure for each example work flow:

1. Verify the following for each Oracle table before running the enterprise data offload work flow:
. Number of rows
. Number of columns
. Data types

2. Run the enterprise data offload work flow.


3. Verify the following for the Apache Hive table after running the enterprise data offload work flow, to see if the numbers matched those in the Oracle table:
. Number of documents (same as number of rows)
. Number of fields (same as number of columns)
. Data types

A minimal scripted check of these counts is sketched below.
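Row-count and column-count verification can also be scripted. The following is a minimal sketch, assuming the cx_Oracle and PyHive packages are available; host names, credentials, and the table name are placeholders.

import cx_Oracle              # assumption: Oracle client and cx_Oracle installed
from pyhive import hive       # assumption: PyHive installed

TABLE = "SALES_OFFLOAD"       # placeholder table name, present in both databases

# Source row and column counts from Oracle (placeholder credentials and connect string;
# user_tab_columns assumes the connected account owns the table).
ora = cx_Oracle.connect("system", "password", "oraclehost:1521/ORCL").cursor()
ora.execute(f"SELECT COUNT(*) FROM {TABLE}")
ora_rows = ora.fetchone()[0]
ora.execute(f"SELECT COUNT(*) FROM user_tab_columns WHERE table_name = '{TABLE}'")
ora_cols = ora.fetchone()[0]

# Target counts from Hive (placeholder HiveServer2 host; DESCRIBE assumes a non-partitioned table).
hv = hive.Connection(host="hadoopnode1", port=10000, username="mapr").cursor()
hv.execute(f"SELECT COUNT(*) FROM {TABLE.lower()}")
hive_rows = hv.fetchone()[0]
hv.execute(f"DESCRIBE {TABLE.lower()}")
hive_cols = len(hv.fetchall())

print(f"rows:    Oracle={ora_rows}  Hive={hive_rows}  match={ora_rows == hive_rows}")
print(f"columns: Oracle={ora_cols}  Hive={hive_cols}  match={ora_cols == hive_cols}")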

Test Results

After running each enterprise data offload work flow example, the test results showed the same number of documents (rows), fields (columns), and data types in the Oracle database and the Apache Hive database.

These results show that you can use Pentaho Data Integration to move data from an Oracle host to Apache Hive on top of Hadoop Distributed File System to relieve the workload on your Oracle host. This provides a cost-effective solution to expanding capacity to relieve server utilization pressures.

For More Information

Hitachi Vantara Global Services offers experienced storage consultants, proven methodologies, and a comprehensive services portfolio to assist you in implementing Hitachi products and solutions in your environment. For more information, see the Services website.

Demonstrations and other resources are available for many Hitachi products. To schedule a live demonstration, contact a sales representative or partner. To view on-line informational resources, see the Resources website.

Hitachi Academy is your education destination to acquire valuable knowledge and skills on Hitachi products and solutions. Our Hitachi Certified Professional program establishes your credibility and increases your value in the IT marketplace. For more information, see the Hitachi Vantara Training and Certification website.

For more information about Hitachi products and services, contact your sales representative or partner, or visit the Hitachi Vantara website.

Hitachi Vantara

Corporate Headquarters
2845 Lafayette Street
Santa Clara, CA 96050-2639 USA
www.HitachiVantara.com | community.HitachiVantara.com

Regional Contact Information
Americas: +1 408 970 1000 or [email protected]
Europe, Middle East and Africa: +44 (0) 1753 618000 or [email protected]
Asia Pacific: +852 3189 7900 or [email protected]

© Hitachi Vantara Corporation 2018. All rights reserved. HITACHI is a trademark or registered trademark of Hitachi, Ltd. VSP is a registered trademark of Hitachi Vantara Corporation. Microsoft and Windows Server are trademarks or registered trademarks of Microsoft Corporation. All other trademarks, service marks, and company names are properties of their respective owners. Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered or to be offered by Hitachi Data Systems Corporation. MK-SL-131-00. December 2018.