Apache Spark Implementation on IBM z/OS

Authors: Lydia Parziale, Joe Bostian, Ravi Kumar, Ulrich Seelbach, Zhong Yu Ye

IBM Redbooks
International Technical Support Organization
August 2016
SG24-8325-00

Note: Before using this information and the product it supports, read the information in “Notices” on page vii.

First Edition (August 2016)

This edition applies to Version 2, Release 2 of IBM z/OS (product number 5650 ZOS), Apache Spark 1.5.2

© Copyright International Business Machines Corporation 2016. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Notices
  Trademarks
IBM Redbooks promotions
Preface
  Authors
  Now you can become a published author, too
  Comments welcome
  Stay connected to IBM Redbooks

Chapter 1. Architectural overview
  1.1 Open source analytics on z/OS
    1.1.1 Benefits of Spark on z/OS
    1.1.2 Drawbacks of implementing off-platform analytics
    1.1.3 A new chapter in analytics
  1.2 Planning your environment
  1.3 Reference architecture
    1.3.1 Spark server architecture
    1.3.2 Spark environment architecture
    1.3.3 Implementation with Jupyter Notebooks
    1.3.4 Scala IDE
  1.4 Security

Chapter 2. Components and extensions
  2.1 Apache Spark component overview
    2.1.1 Resilient Distributed Datasets and caching
    2.1.2 Components of a Spark cluster on z/OS
    2.1.3 Monitoring
    2.1.4 Spark and Hadoop
  2.2 Mainframe Data Services for IBM z/OS Platform for Apache Spark
    2.2.1 Virtual tables
    2.2.2 Virtual views
    2.2.3 SQL queries
    2.2.4 MDSS JDBC driver
    2.2.5 IBM z/OS Platform for Apache Spark Interface for CICS/TS
  2.3 Spark SQL
    2.3.1 Reading from z/OS data source into a DataFrame
    2.3.2 Writing DataFrame to a DB2 for z/OS table using saveTable method
  2.4 Streaming
  2.5 GraphX
    2.5.1 System G
  2.6 MLlib
  2.7 Spark R

Chapter 3. Installation and configuration
  3.1 Installing IBM z/OS Platform for Apache Spark
  3.2 The Mainframe Data Service for Apache Spark
    3.2.1 Installing the MDSS started task
    3.2.2 Configuring access to DB2
    3.2.3 Configuring access to IMS databases
    3.2.4 The ISPF Panels
    3.2.5 Installing and configuring Bash
    3.2.6 Check for /usr/bin/env
  3.3 Installing workstation components
    3.3.1 Installing Data Service Studio
    3.3.2 Installing the JDBC driver on the workstation
  3.4 Configuring Apache Spark for z/OS
    3.4.1 Create log and worker directories
    3.4.2 Apache Spark directory structure
    3.4.3 Create directories and local configuration
    3.4.4 Installing the Data Server JDBC driver
    3.4.5 Modifying the log4j configuration
    3.4.6 Adding the Spark binaries to your PATH
  3.5 Verifying the installation
  3.6 Starting the Spark daemons

Chapter 4. Spark application development on z/OS
  4.1 Setting up the development environment
    4.1.1 Installing Scala IDE
    4.1.2 Installing Data Server Studio plugins into Scala IDE
    4.1.3 Installing and using sbt
  4.2 Accessing VSAM data as an RDD
    4.2.1 Defining the data mapping
    4.2.2 Building and running the application
  4.3 Accessing sequential files and PDS members
  4.4 Accessing IBM DB2 data as a DataFrame
  4.5 Joining DB2 data with VSAM
  4.6 IBM IMS data to DataFrames
  ...
