Supplement for Hadoop
PUBLIC
SAP Data Services
Document Version: 4.2 Support Package 12 (14.2.12.0) – 2020-02-06

Supplement for Hadoop

© 2020 SAP SE or an SAP affiliate company. All rights reserved.

Content

1 About this supplement
2 Naming Conventions
3 Apache Hadoop
3.1 Hadoop in Data Services
3.2 Hadoop sources and targets
4 Prerequisites to Data Services configuration
5 Verify Linux setup with common commands
6 Hadoop support for the Windows platform
7 Configure Hadoop for text data processing
8 Setting up HDFS and Hive on Windows
9 Apache Impala
9.1 Connecting Impala using the Cloudera ODBC driver
9.2 Creating an Apache Impala datastore and DSN for Cloudera driver
10 Connect to HDFS
10.1 HDFS file location objects
     HDFS file location object options
10.2 HDFS file format objects
     HDFS file format options
     Configuring custom Pig script results as source
     Previewing HDFS file data
11 Connect to Hive
11.1 Hive adapter datastores
     Hive adapter installation and configuration
     Hive adapter datastore configuration options
     SSL connection support for Hive adapter
     Metadata mapping for Hive
     Apache Hive data type conversion
     Hive adapter source options
     Hive adapter target options
     Hive adapter datastore support for SQL function and transform
     Pushing the JOIN operation to Hive
     About partitions
     Previewing Hive table data
     Using Hive template tables
11.2 Hive database datastores
     Configuring ODBC driver in Windows
     Creating a DSN connection with SSL protocol in Windows
     Configuring ODBC driver with SSL protocol for Linux
     Configuring bulk loading for Hive
     Hive database datastore option descriptions
11.3 Configuring Kerberos authentication for Hive connection
12 Upload data to HDFS in the cloud
13 Google Cloud Dataproc
13.1 Configure driver and data source name (DSN)
13.2 Hive database datastore for Google Dataproc
13.3 Create a WebHDFS file location
     Configuring host for WebHDFS file location
14 SAP Cloud Platform Big Data Services
14.1 Setting SSH tunneling with port forwarding on Windows
14.2 Setting SSH tunneling with port forwarding on UNIX or Linux
14.3 ODBC driver requirements
     About the ODBC Drivers Selector for Windows
     Configuring ODBC Hive driver with DS Connection Manager
14.4 Generating API tokens for the Hive Server
14.5 Generating SWebHdfs delegation token for the HttpFS Service
14.6 Obtaining an SSL certificate file
14.7 Creating a DSN connection for Windows
14.8 Creating a DSN connection for UNIX or Linux
14.9 Creating a file location for Kerberos-secured Hadoop cluster
14.10 Creating a datastore for Kerberos-secured Hive cluster

1 About this supplement

This supplement describes how to use SAP Data Services to access your Hadoop data for Data Services processes.
Only experienced Data Services and Hadoop users should attempt to perform the processes in this supplement. The supplement discusses the Data Services objects and processes related to accessing your Hadoop account for downloading and uploading data, and the processes for configuring these objects.

Use the following Data Services documentation as companions to this supplement:

● Designer Guide
● Reference Guide
● Supplement for Adapters
● SAP Cloud Platform for Big Data Services

Access all related documentation from our User Assistance Customer Portal.

2 Naming Conventions

We refer to certain systems with shortened names, and we use specific environment variables when we refer to locations for SAP and SAP Data Services files.

Shortened names

● The terms "Data Services system" and "SAP Data Services" mean the same thing.
● The term "BI platform" refers to "SAP BusinessObjects Business Intelligence platform."
● The term "IPS" refers to "SAP BusinessObjects Information platform services."

  Note: Data Services requires BI platform components. However, IPS, a scaled-back version of BI, also provides these components.

● CMC refers to the Central Management Console provided by the BI or IPS platform.
● CMS refers to the Central Management Server provided by the BI or IPS platform.

Variables

INSTALL_DIR
  The installation directory for the SAP software.
  Default location:
  ● For Windows: C:\Program Files (x86)\SAP BusinessObjects
  ● For UNIX: $HOME/sap businessobjects

  Note: INSTALL_DIR is not an environment variable. The installation location of the SAP software may differ from what we list for INSTALL_DIR, depending on the location that your administrator set during installation.

<BIP_INSTALL_DIR>
  The root directory of the BI or IPS platform.
  Default location:
  ● For Windows: <INSTALL_DIR>\SAP BusinessObjects Enterprise XI 4.0
    Example: C:\Program Files (x86)\SAP BusinessObjects\SAP BusinessObjects Enterprise XI 4.0
  ● For UNIX: <INSTALL_DIR>/enterprise_xi40

  Note: These paths are the same for both BI and IPS.

<LINK_DIR>
  The root directory of the Data Services system.
  Default location:
  ● All platforms: <INSTALL_DIR>\Data Services
    Example: C:\Program Files (x86)\SAP BusinessObjects\Data Services

<DS_COMMON_DIR>
  The common configuration directory for the Data Services system.
  Default location:
  ● If your system is on Windows (Vista and newer): <AllUsersProfile>\SAP BusinessObjects\Data Services

    Note: The default value of the <AllUsersProfile> environment variable for Windows Vista and newer is C:\ProgramData.
    Example: C:\ProgramData\SAP BusinessObjects\Data Services

  ● If your system is on Windows (older versions such as XP): <AllUsersProfile>\Application Data\SAP BusinessObjects\Data Services

    Note: The default value of the <AllUsersProfile> environment variable for older Windows versions is C:\Documents and Settings\All Users.
    Example: C:\Documents and Settings\All Users\Application Data\SAP BusinessObjects\Data Services

  ● UNIX systems (for compatibility): <LINK_DIR>

  The installer automatically creates this system environment variable during installation.

  Note: Starting with Data Services 4.2 SP6, users can designate a different default location for <DS_COMMON_DIR> during installation. If you cannot find <DS_COMMON_DIR> in the listed default location, ask your System Administrator where your default location is for <DS_COMMON_DIR>.

<DS_USER_DIR>
  The user-specific configuration directory for the Data Services system.
  Default location:
  ● If you are on Windows (Vista and newer): <UserProfile>\AppData\Local\SAP BusinessObjects\Data Services

    Note: The default value of the <UserProfile> environment variable for Windows Vista and newer versions
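To illustrate how the variables in the table above relate on a UNIX system, here is a minimal POSIX shell sketch. The path values are hypothetical examples only, not your actual install locations; on a real system, <LINK_DIR> and <DS_COMMON_DIR> are set by the Data Services installer.

```shell
# Sketch only: hypothetical example paths, not your actual install locations.
# On a real system the Data Services installer sets LINK_DIR and DS_COMMON_DIR.
INSTALL_DIR="$HOME/sap businessobjects"    # example UNIX default from the table
LINK_DIR="$INSTALL_DIR/Data Services"      # root directory of the Data Services system

# Start clean for this demonstration, then apply the UNIX compatibility
# fallback from the table: DS_COMMON_DIR defaults to LINK_DIR.
unset DS_COMMON_DIR
: "${DS_COMMON_DIR:=$LINK_DIR}"

echo "INSTALL_DIR=$INSTALL_DIR"
echo "LINK_DIR=$LINK_DIR"
echo "DS_COMMON_DIR=$DS_COMMON_DIR"
```

If your administrator chose a different location during installation (possible since Data Services 4.2 SP6 for <DS_COMMON_DIR>), the environment variables set by the installer take precedence over any defaults like these.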