<<

How to Set Up a Hadoop Cluster Using Oracle Solaris

Hands-On Labs of the System Admin and Developer Community of OTN by Orgad Kimchi with contributions from Jeff Taylor

How to set up a Hadoop cluster using Oracle Solaris Zones, ZFS, and network virtualization technologies.

Lab Introduction

This hands-on lab presents exercises that demonstrate how to set up an Apache Hadoop cluster using Oracle Solaris 11 technologies such as Oracle Solaris Zones, ZFS, and network virtualization. Key topics include the Hadoop Distributed File System (HDFS) and the Hadoop MapReduce programming model.

We will also cover the Hadoop installation process and the cluster building blocks: NameNode, a secondary NameNode, and DataNodes. In addition, you will see how you can combine the Oracle Solaris 11 technologies for better scalability and security, and you will learn how to load data into the Hadoop cluster and run a MapReduce job.

Prerequisites

This hands-on lab is appropriate for system administrators who will be setting up or maintaining a Hadoop cluster in production or development environments. Basic Oracle Solaris system administration experience is a prerequisite. Prior knowledge of Hadoop is not required.

System Requirements

This hands-on lab is run on Oracle Solaris 11 in Oracle VM VirtualBox. The lab is self-contained. All you need is in the Oracle VM VirtualBox instance.

For those attending the lab at Oracle OpenWorld, your laptops are already preloaded with the correct Oracle VM VirtualBox image.

If you want to try this lab outside of Oracle OpenWorld, you will need an Oracle Solaris 11 system. Do the following to set up your machine:

1. If you do not have Oracle Solaris 11, download it here.
2. Download the Oracle Solaris 11.1 VirtualBox Template (file size 1.7 GB).
3. Install the template as described here. (Note: On step 4 of Exercise 2 for installing the template, set the RAM size to 4 GB in order to get good performance.)

Notes for Oracle OpenWorld Attendees

Each attendee will have his or her own laptop for the lab.

In this lab, we are going to use the "welcome1" password for all the user accounts.

Oracle Solaris 11 uses the GNOME desktop. If you have used the desktops on Linux or other operating systems, the interface should be familiar. Here are some quick basics in case the interface is new for you.

In order to open a terminal window in the GNOME desktop system, right-click the background of the desktop and select Open Terminal in the pop-up menu.

The following source code editors are provided on the lab machines: vi (type vi in a terminal window) and emacs (type emacs in a terminal window).

Summary of Lab Exercises

This hands-on lab consists of the following exercises covering various Oracle Solaris and Apache Hadoop technologies:

Download and Install Hadoop
Configure the Network Time Protocol
Create the Scripts
Create the NameNodes, DataNodes, and ResourceManager Zones
Configure the Active NameNode
Set Up SSH
Set Up the Standby NameNode and the ResourceManager
Set Up the DataNode Zones
Verify the SSH Setup
Verify Name Resolution
Format the Hadoop File System
Start the Hadoop Cluster
About Hadoop High Availability
Configure Manual Failover
About Apache ZooKeeper and Automatic Failover
Configure Automatic Failover
Conclusion

The Case for Hadoop

The Apache Hadoop software is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

To store data, Hadoop uses the Hadoop Distributed File System (HDFS), which provides high-throughput access to application data and is suitable for applications that have large data sets.

For more information about Hadoop and HDFS, see http://hadoop.apache.org/.

The Hadoop cluster building blocks are as follows:

Active NameNode: The centerpiece of HDFS, which stores file system metadata and is responsible for all client operations.

Standby NameNode: A secondary NameNode that synchronizes its state with the active NameNode in order to provide fast failover if the active NameNode goes down.

ResourceManager: The global resource scheduler, which directs the slave NodeManager daemons to perform the low-level I/O tasks.

DataNodes: Nodes that store the data in the HDFS file system and are also known as slaves; these nodes run the NodeManager process that communicates with the ResourceManager.

History Server: Provides REST APIs that allow the user to get the status of finished applications and provides information about finished jobs.

In previous Hadoop versions, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Hadoop version 2.2 provides the ability to build an HDFS cluster with high availability (HA), and this article describes the steps involved in building such a configuration.

In the example presented in this article, all the Hadoop cluster building blocks are installed using Oracle Solaris Zones, ZFS, and Unified Archive. Figure 1 shows the architecture:

Figure 1

Exercise 1: Install Hadoop

1. In Oracle VM VirtualBox, enable a bidirectional "shared clipboard" between the host and the guest in order to enable copying and pasting text from this file.

Figure 2
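If you are running the lab in your own Oracle VM VirtualBox installation and prefer the command line, the shared clipboard can also be enabled with the VBoxManage tool on the host. This is an optional sketch; the virtual machine name "Oracle Solaris 11.1" is an assumption, so substitute the name reported by VBoxManage list vms:

# Run on the host, with the VM powered off. "Oracle Solaris 11.1" is an assumed VM name.
VBoxManage modifyvm "Oracle Solaris 11.1" --clipboard bidirectional

# Confirm the clipboard mode.
VBoxManage showvminfo "Oracle Solaris 11.1" | grep -i clipboard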

In this lab, we will use the Apache Hadoop "15 October, 2013: Release 2.2.0" release.

Note: Oracle OpenWorld attendees can skip the following step (because the preloaded Oracle VM VirtualBox image already provides the Hadoop image).

Download the Hadoop binary file using a web browser: open the Firefox web browser from the desktop and download the file.

Figure 3

Open a terminal window by right-clicking any point in the background of the desktop and selecting Open Terminal in the pop-up menu.

Figure 4

Important: In the examples presented in this article, the command prompt indicates which user needs to run each command in addition to indicating the environment where the command should be run. For example, the command prompt root@global_zone:~# indicates that user root needs to run the command from the global zone.

Note: For Oracle OpenWorld attendees, the root password has been provided in the one-pager associated with this lab. For those running this lab outside of Oracle OpenWorld, enter the root password you entered when you followed the steps in the "System Requirements" section.

oracle@global_zone:~$ su -
Password:
Oracle Corporation SunOS 5.11 11.1 September 2012

Set up the virtual network interface card (VNIC) in order to enable network access to the global zone from the non-global zones:

root@global_zone:~# dladm create-vnic -l net0 vnic0
root@global_zone:~# ipadm create-ip vnic0
root@global_zone:~# ipadm create-addr -T static -a local=192.168.1.100/24 vnic0/addr

Verify the VNIC creation:

root@global_zone:~# ipadm show-addr vnic0
ADDROBJ           TYPE     STATE        ADDR
vnic0/addr        static   ok           192.168.1.100/24

In the global zone, create the /usr/local directory if it doesn't exist.

Note: The cluster configuration will share the Hadoop directory structure (/usr/local/hadoop) across the zones as a read-only file system. Every Hadoop cluster node needs to be able to write its logs to an individual directory. The directory /var/log is a best-practice directory for every Oracle Solaris Zone.

root@global_zone:~# mkdir -p /usr/local

1. Copy the Hadoop tarball to /usr/local:

root@global_zone:~# cp /export/home/oracle/hadoop-2.2.0.tar.gz /usr/local

Unpack the tarball:

root@global_zone:~# cd /usr/local
root@global_zone:~# tar -xzf /usr/local/hadoop-2.2.0.tar.gz

2. Create the hadoop group:

root@global_zone:~# groupadd -g 200 hadoop

3. Create a symlink for the Hadoop binaries:

root@global_zone:~# ln -s /usr/local/hadoop-2.2.0 /usr/local/hadoop

4. Give ownership to the hadoop group:

root@global_zone:~# chown -R root:hadoop /usr/local/hadoop-2.2.0

5. Change the permissions:

root@global_zone:~# chmod -R 755 /usr/local/hadoop-2.2.0

6. Edit the Hadoop configuration files, which are shown in Table 1:

Table 1. Hadoop Configuration Files

File Name        Description

hadoop-env.sh    Specifies environment variable settings used by Hadoop

yarn-env.sh      Specifies environment variable settings used by YARN

mapred-env.sh    Specifies environment variable settings used by MapReduce

slaves           Contains a list of machine names that run the DataNode and NodeManager pair of daemons

core-site.xml    Specifies parameters relevant to all Hadoop daemons and clients

hdfs-site.xml    Specifies parameters used by the HDFS daemons and clients

mapred-site.xml  Specifies parameters used by the MapReduce daemons and clients

yarn-site.xml    Specifies the configurations for the ResourceManager and NodeManager

7. Run the following commands to change the hadoop-env.sh script:

root@global_zone:~# export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
root@global_zone:~# cd $HADOOP_CONF_DIR

Append the following lines to the hadoop-env.sh script:

root@global_zone:~# echo "export JAVA_HOME=/usr/java" >> hadoop-env.sh
root@global_zone:~# echo "export HADOOP_LOG_DIR=/var/log/hadoop/hdfs" >> hadoop-env.sh

Append the following lines to the yarn-env.sh script:

root@global_zone:~# vi yarn-env.sh

export JAVA_HOME=/usr/java
export YARN_LOG_DIR=/var/log/hadoop/yarn
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Append the following lines to the mapred-env.sh script:

root@global_zone:~# echo "export JAVA_HOME=/usr/java" >> mapred-env.sh
root@global_zone:~# echo "export HADOOP_MAPRED_LOG_DIR=/var/log/hadoop/mapred" >> mapred-env.sh
root@global_zone:~# echo "export HADOOP_MAPRED_IDENT_STRING=mapred" >> mapred-env.sh

Edit the slaves file to replace the localhost entry with the following lines:

root@global_zone:~# vi slaves

data-node1
data-node2
data-node3

Edit the core-site.xml file so it looks like the following:

Note: fs.defaultFS is the URI that describes the NameNode address (protocol specifier, hostname, and port) for the cluster. Each DataNode instance will register with this NameNode and make its data available through it. In addition, the DataNodes send heartbeats to the NameNode to confirm that each DataNode is operating and the block replicas it hosts are available.

root@global_zone:~# vi core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://name-node1</value>
  </property>
</configuration>

Edit the hdfs-site.xml file so it looks like the following.

Notes:

dfs.datanode.data.dir      The path on the local file system in which the DataNode instance should store its data.

dfs.namenode.name.dir      The path on the local file system of the NameNode instance where the NameNode metadata is stored. It is used only by the NameNode instance to find its information.

dfs.replication            The default replication factor for each block of data in the file system. (For a production cluster, this should usually be left at its default value of 3.)

dfs.permission.supergroup  Specifies the UNIX group containing users that will be treated as superusers by HDFS. You can stick with the value of hadoop or pick your own group depending on the security policies at your site.

root@global_zone:~# vi hdfs-site.xml

<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/var/data/1/dfs/dn</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/var/data/1/dfs/nn</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permission.supergroup</name>
    <value>hadoop</value>
  </property>
</configuration>

Create and then edit the mapred-site.xml file so it looks like the following:

Notes:

mapreduce.framework.name             Sets the execution framework to Hadoop YARN.

mapreduce.jobhistory.address         Specifies the MapReduce History Server's host:port.

mapreduce.jobhistory.webapp.address  Specifies the MapReduce History Server's web UI host:port.

yarn.app.mapreduce.am.staging-dir    Specifies a staging directory, which YARN requires for temporary files created by running jobs.

root@global_zone:~# cp mapred-site.xml.template mapred-site.xml
root@global_zone:~# vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>resource-manager:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>resource-manager:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
</configuration>

Edit the yarn-site.xml file so it looks like the following:

Notes:

yarn.nodemanager.aux-services                          Specifies the shuffle service that needs to be set for MapReduce applications.

yarn.nodemanager.aux-services.mapreduce.shuffle.class  Specifies the exact name of the class for the shuffle service.

yarn.resourcemanager.hostname                          Specifies the ResourceManager's host name.

yarn.nodemanager.local-dirs                            A comma-separated list of paths on the local file system where intermediate data is written.

yarn.nodemanager.log-dirs                              Specifies a comma-separated list of paths on the local file system where the NodeManager stores container log files.

yarn.log-aggregation-enable                            Specifies the configuration to enable or disable log aggregation.

yarn.nodemanager.remote-app-log-dir                    Specifies where to aggregate logs in HDFS.

root@global_zone:~# vi yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resource-manager</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///var/data/1/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:///var/data/1/yarn/logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://var/log/hadoop-yarn/apps</value>
  </property>
</configuration>

Configure the Network Time Protocol

We should ensure that the system clock on the Hadoop zones is synchronized by using the Network Time Protocol (NTP).

Note: It is best to select an NTP server that can be a dedicated time synchronization source so that other services are not negatively affected if the node is brought down for planned maintenance.

In the following example, the global zone is configured as an NTP server.

1. Configure an NTP server:

root@global_zone:~# cp /etc/inet/ntp.server /etc/inet/ntp.conf
root@global_zone:~# chmod +w /etc/inet/ntp.conf
root@global_zone:~# touch /var/ntp/ntp.drift

2. Append the following lines to the NTP server configuration file:

root@global_zone:~# vi /etc/inet/ntp.conf

server 127.127.1.0 prefer
broadcast 224.0.1.1 ttl 4
enable auth monitor
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable
keys /etc/inet/ntp.keys
trustedkey 0
requestkey 0
controlkey 0

3. Enable the NTP server service:

root@global_zone:~# svcadm enable ntp

4. Verify that the NTP server is online by using the following command:

root@global_zone:~# svcs ntp
STATE          STIME    FMRI
online         15:27:55 svc:/network/ntp:default

Create the Scripts

In the following steps, you will create utility scripts that will be used to simplify repetitive processes. First, create a directory to hold the scripts:

root@global_zone:~# mkdir /usr/local/Scripts

The following utility scripts are provided for this lab exercise:

File Name      Description

createzone     Used for initial zone creation

buildprofile   Used to create profiles that specify details such as host names and IP addresses during the initial zone creation

verifycluster  Used to verify the Hadoop cluster setup

testssh        Used to verify that password-less SSH is enabled between zones

startcluster   Used to start all the Hadoop services across the cluster zones

stopcluster    Used to stop all the Hadoop services across the cluster zones

1. Create the createzone script using your favorite editor, as shown in Listing 1. We will use this script to set up the Oracle Solaris Zones.

root@global_zone:~# vi /usr/local/Scripts/createzone

#!/bin/ksh

# FILENAME: createzone
# Create a zone
# Usage:
#   createzone <zone-name>

if [ $# != 1 ]
then
   echo "Usage: createzone <zone-name>"
   exit 1
fi

ZONENAME=$1

zonecfg -z $ZONENAME > /dev/null 2>&1 << EOF
create
set autoboot=true
set limitpriv=default,dtrace_proc,dtrace_user,sys_time
set zonepath=/zones/$ZONENAME
add fs
set dir=/usr/local
set special=/usr/local
set type=lofs
set options=[ro,nodevices]
end
verify
exit
EOF

if [ $? == 0 ] ; then
   echo "Successfully created the $ZONENAME zone"
else
   echo "Error: unable to create the $ZONENAME zone"
   exit 1
fi

Listing 1. createzone script

2. Create the buildprofile script using your favorite editor, as shown in Listing 2. We will use this script to create the system configuration profiles for the zones.

root@global_zone:~# vi /usr/local/Scripts/buildprofile

#!/bin/ksh
#
# Copyright 2006-2011 Oracle Corporation. All rights reserved.
# Use is subject to license terms.
#
# This script serves as an example of how to instantiate several zones
# with no administrative interaction. Run the script with no arguments
# to get a usage message.

export PATH=/usr/bin:/usr/sbin

me=$(basename $0)

function fail_usage {
   print -u2 "Usage: $me <sysconfig.xml> <zone name> <ip address>"
   exit 2
}

function error {
   print -u2 "$me: ERROR: $@"
}

# Parse and check arguments
(( $# != 3 )) && fail_usage

# Be sure the sysconfig profile is readable and ends in .xml
sysconfig=$1
zone=$2
ipaddr=$3

if [[ ! -f $sysconfig || ! -r $sysconfig || $sysconfig != *.xml ]] ; then
   error "sysconfig profile missing, unreadable, or not *.xml"
   fail_usage
fi

#
# Create a temporary directory for all temp files
#
export TMPDIR=$(mktemp -d /tmp/$me.XXXXXX)
if [[ -z $TMPDIR ]]; then
   error "Could not create temporary directory"
   exit 1
fi
trap 'rm -rf $TMPDIR' EXIT

# Customize the nodename and IP address in the sysconfig profile
z_sysconfig=$TMPDIR/${zone}.xml
z_sysconfig2=$TMPDIR/${zone}2.xml

# NOTE: the literal search/replace strings were lost from the lab text; the values
# below assume the template profile was generated for name-node1 with the static
# address 192.168.1.1/24, as described elsewhere in this lab.
search="<propval type=\"astring\" name=\"nodename\" value=\"name-node1\"/>"
replace="<propval type=\"astring\" name=\"nodename\" value=\"$zone\"/>"
sed "s|$search|$replace|" $sysconfig > $z_sysconfig

search="<propval type=\"net_address_v4\" name=\"static_address\" value=\"192.168.1.1/24\"/>"
replace="<propval type=\"net_address_v4\" name=\"static_address\" value=\"$ipaddr\"/>"
sed "s|$search|$replace|" $z_sysconfig > $z_sysconfig2

cp $z_sysconfig2 ./$zone-template.xml
rm -rf $TMPDIR
exit 0

Listing 2. buildprofile script

3. Create the verifycluster script using your favorite editor, as shown in Listing 3. We will use this script to verify the Hadoop cluster setup.

root@global_zone:~# vi /usr/local/Scripts/verifycluster

#!/bin/ksh

# FILENAME: verifycluster
# Verify the hadoop cluster configuration
# Usage:
#   verifycluster

RET=1

for transaction in _; do

   for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
   do
      cmd="zlogin $i ls /usr/local > /dev/null 2>&1"
      eval $cmd || break 2
   done

   for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
   do
      cmd="zlogin $i ping name-node1 > /dev/null 2>&1"
      eval $cmd || break 2
   done

   for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
   do
      cmd="zlogin $i ping name-node2 > /dev/null 2>&1"
      eval $cmd || break 2
   done

   for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
   do
      cmd="zlogin $i ping resource-manager > /dev/null 2>&1"
      eval $cmd || break 2
   done

   for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
   do
      cmd="zlogin $i ping data-node1 > /dev/null 2>&1"
      eval $cmd || break 2
   done

   for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
   do
      cmd="zlogin $i ping data-node2 > /dev/null 2>&1"
      eval $cmd || break 2
   done

   for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
   do
      cmd="zlogin $i ping data-node3 > /dev/null 2>&1"
      eval $cmd || break 2
   done

   RET=0
done

if [ $RET == 0 ] ; then
   echo "The cluster is verified"
else
   echo "Error: unable to verify the cluster"
fi
exit $RET

Listing 3. verifycluster script

4. Create the testssh script, as shown in Listing 4. We will use this script to verify the SSH setup.

root@global_zone:~# vi /usr/local/Scripts/testssh

#!/bin/ksh

for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
   ssh $i exit
done

Listing 4. testssh script

5. Create the startcluster script, as shown in Listing 5. We will use this script to start all the services on the Hadoop cluster.

root@global_zone:~# vi /usr/local/Scripts/startcluster

#!/bin/ksh

zlogin -l hdfs name-node1 'hadoop-daemon.sh start namenode'
zlogin -l hdfs data-node1 'hadoop-daemon.sh start datanode'
zlogin -l hdfs data-node2 'hadoop-daemon.sh start datanode'
zlogin -l hdfs data-node3 'hadoop-daemon.sh start datanode'
zlogin -l yarn resource-manager 'yarn-daemon.sh start resourcemanager'
zlogin -l yarn data-node1 'yarn-daemon.sh start nodemanager'
zlogin -l yarn data-node2 'yarn-daemon.sh start nodemanager'
zlogin -l yarn data-node3 'yarn-daemon.sh start nodemanager'
zlogin -l mapred resource-manager 'mr-jobhistory-daemon.sh start historyserver'

Listing 5. startcluster script

6. Create the stopcluster script, as shown in Listing 6. We will use this script to stop all the services on the Hadoop cluster.

root@global_zone:~# vi /usr/local/Scripts/stopcluster

#!/bin/ksh

zlogin -l hdfs name-node1 'hadoop-daemon.sh stop namenode'
zlogin -l hdfs data-node1 'hadoop-daemon.sh stop datanode'
zlogin -l hdfs data-node2 'hadoop-daemon.sh stop datanode'
zlogin -l hdfs data-node3 'hadoop-daemon.sh stop datanode'
zlogin -l yarn resource-manager 'yarn-daemon.sh stop resourcemanager'
zlogin -l yarn data-node1 'yarn-daemon.sh stop nodemanager'
zlogin -l yarn data-node2 'yarn-daemon.sh stop nodemanager'
zlogin -l yarn data-node3 'yarn-daemon.sh stop nodemanager'
zlogin -l mapred resource-manager 'mr-jobhistory-daemon.sh stop historyserver'

Listing 6. stopcluster script

7. The Solaris command "wc -l" displays the number of lines in files. You can use this as a sanity check to verify that your scripts are about the right size:

root@global_zone:~# wc -l /usr/local/Scripts/*

      64 /usr/local/Scripts/buildprofile
      36 /usr/local/Scripts/createzone
      12 /usr/local/Scripts/startcluster
      10 /usr/local/Scripts/stopcluster
       9 /usr/local/Scripts/testssh
      67 /usr/local/Scripts/verifycluster
     198 total

8. Change the scripts' permissions:

root@global_zone:~# chmod +x /usr/local/Scripts/*

Create the NameNodes, DataNodes, and ResourceManager Zones

We will leverage the integration between Oracle Solaris Zones virtualization technology and the ZFS file system that is built into Oracle Solaris.

Table 2 shows a summary of the Hadoop zones we will create:

Table 2. Zone Summary

Function          Zone Name         ZFS Mount Point          IP Address

Active NameNode   name-node1        /zones/name-node1        192.168.1.1/24

Standby NameNode  name-node2        /zones/name-node2        192.168.1.2/24

ResourceManager   resource-manager  /zones/resource-manager  192.168.1.3/24

DataNode          data-node1        /zones/data-node1        192.168.1.4/24

DataNode          data-node2        /zones/data-node2        192.168.1.5/24

DataNode          data-node3        /zones/data-node3        192.168.1.6/24

1. Create the name-node1 zone using the createzone script, which will create the zone configuration file. The script takes the zone's name as its only argument (for example, createzone name-node1).

root@global_zone:~# /usr/local/Scripts/createzone name-node1 Successfully created the name-node1 zone

2. Create the name-node2 zone using the createzone script:

root@global_zone:~# /usr/local/Scripts/createzone name-node2 Successfully created the name-node2 zone

3. Create the resource-manager zone using the createzone script:

root@global_zone:~# /usr/local/Scripts/createzone resource-manager Successfully created the resource-manager zone

4. Create the three DataNode zones using the createzone script:

root@global_zone:~# /usr/local/Scripts/createzone data-node1 Successfully created the data-node1 zone

root@global_zone:~# /usr/local/Scripts/createzone data-node2 Successfully created the data-node2 zone

root@global_zone:~# /usr/local/Scripts/createzone data-node3 Successfully created the data-node3 zone

Configure the Active NameNode

Let's create a system configuration profile template for the name-node1 zone. The system configuration profile will include the host information, such as the host name, IP address, and name services.

1. Run the sysconfig command, which will start the System Configuration Tool (see Figure 5):

root@global_zone:~# sysconfig create-profile

Figure 5. System Configuration Tool

2. Press Esc-2 to start the wizard.

3. Provide the zone's host information by using the following configuration for the name-node1 zone:

a. For the host name, use name-node1.
b. Select manual network configuration.
c. Ensure the network interface net0 has an IP address of 192.168.1.1 and a netmask of 255.255.255.0, and leave the "router" field blank.
d. Ensure the name service is based on your network configuration. In this article, we will use /etc/hosts for name resolution, so we won't set up DNS for host name resolution. Select Do not configure DNS.
e. For Alternate Name Service, select None.
f. For Time Zone Regions, select Americas.
g. For Time Zone Locations, select United States.
h. For Time Zone, select Pacific Time.
i. For Locale: Language, select English.
j. For Locale: Territory, select United States (en_US.UTF-8).
k. For Keyboard, select US-English.
l. Enter your root password, but leave the optional user account blank.
m. For Support - Registration, provide your My Oracle Support credentials.
n. For Support - Network Configuration, select an internet access method for Oracle Configuration Manager and Oracle Auto Service Request.
o. Review the settings, and then press Esc-2 to apply them. The settings are not applied to the running system; instead, they are written to a file named /system/volatile/profile/sc_profile.xml.

4. Copy the profile to /root/name-node1-template.xml:

root@global_zone:~# cp /system/volatile/profile/sc_profile.xml /root/name-node1-template.xml

5. Now, install the name-node1 zone. Installing the first zone will take a couple of minutes. Later we will clone this zone in order to accelerate the creation of the other zones:

root@global_zone:~# zoneadm -z name-node1 install -c /root/name-node1-template.xml
The following ZFS file system(s) have been created:
        rpool/zones/name-node1
Progress being logged to /var/log/zones/zoneadm.20140225T111519Z.name-node1.install
       Image: Preparing at /zones/name-node1/root.
[...]

6. Boot the name-node1 zone:

root@global_zone:~# zoneadm -z name-node1 boot

7. Check the status of the zones we've created:

root@global_zone:~# zoneadm list -cv

ID NAME              STATUS      PATH                      BRAND    IP
 0 global            running     /                         solaris  shared
 1 name-node1        running     /zones/name-node1         solaris  excl
 - name-node2        configured  /zones/name-node2         solaris  excl
 - resource-manager  configured  /zones/resource-manager   solaris  excl
 - data-node1        configured  /zones/data-node1         solaris  excl
 - data-node2        configured  /zones/data-node2         solaris  excl
 - data-node3        configured  /zones/data-node3         solaris  excl

We can see the six zones that we have created.

8. zlogin is a utility that is used to enter a non-global zone from the global zone. zlogin has three modes: interactive, non-interactive, and console. For our first login to the newly created zone, we will use the console (-C) mode. When you log in to the console of the name-node1 zone, you will see the progress of the initial boot. Subsequent boots will be much faster.
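For reference, the three zlogin modes look like this when run from the global zone; this is just a quick sketch, and uptime is an arbitrary example command:

root@global_zone:~# zlogin -C name-node1      # console mode; disconnect with ~.
root@global_zone:~# zlogin name-node1         # interactive mode; opens a shell in the zone
root@global_zone:~# zlogin name-node1 uptime  # non-interactive mode; runs one command and returns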

root@global_zone:~# zlogin -C name-node1

[Connected to zone 'name-node1' console]
134/134
Hostname: name-node1
. . .
login: root
Password: ********

9. Verify that all the services are up and running:

root@name-node1:~# svcs -xv

10. If all the services are up and running without any issues, the command will return to the system prompt without any error message. To disconnect from a zone virtual console, use the tilde (~) character and a period:

root@name-node1:~# ~.
[Connection to zone 'name-node1' console closed]

11. Re-enter the zone in interactive mode.

root@global_zone:~# zlogin name-node1
[Connected to zone 'name-node1' pts/4]
Oracle Corporation SunOS 5.11 11.2 June 2014
root@name-node1:~#

12. Developing for Hadoop requires a Java programming environment. You can install Java Development Kit (JDK) 7 using the following command:

root@name-node1:~# pkg install --accept jdk-7

13. Verify the Java installation:

root@name-node1:~# java -version

java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) Server VM (build 24.55-b03, mixed mode)

14. Create the hadoop group:

root@name-node1:~# groupadd -g 200 hadoop

For the Hadoop cluster, create the four users shown in Table 3.

Table 3. Hadoop Users Summary

User:Group     Description

hdfs:hadoop    The NameNodes and DataNodes run as this user.

yarn:hadoop    The ResourceManager and NodeManager services run as this user.

mapred:hadoop  The History Server runs as this user.

bob:staff      This user will run the MapReduce jobs.

15. Add the hdfs user:

root@name-node1:~# useradd -u 200 -m -g hadoop hdfs

Set the hdfs user's password.

In this lab, we are going to use the "welcome1" password for all the accounts.

root@name-node1:~# passwd hdfs
New Password:
Re-enter new Password:
passwd: password successfully changed for hdfs

Add the yarn user:

root@name-node1:~# useradd -u 201 -m -g hadoop yarn
root@name-node1:~# passwd yarn
New Password:
Re-enter new Password:
passwd: password successfully changed for yarn

Add the mapred user:

root@name-node1:~# useradd -u 202 -m -g hadoop mapred
root@name-node1:~# passwd mapred
New Password:
Re-enter new Password:
passwd: password successfully changed for mapred

Create a directory for the YARN log files:

root@name-node1:~# mkdir -p /var/log/hadoop/yarn
root@name-node1:~# chown yarn:hadoop /var/log/hadoop/yarn

Create a directory for the HDFS log files:

root@name-node1:~# mkdir -p /var/log/hadoop/hdfs
root@name-node1:~# chown hdfs:hadoop /var/log/hadoop/hdfs

Create a directory for the mapred log files:

root@name-node1:~# mkdir -p /var/log/hadoop/mapred
root@name-node1:~# chown mapred:hadoop /var/log/hadoop/mapred

Create a directory for the HDFS metadata:

root@name-node1:~# mkdir -p /var/data/1/dfs/nn
root@name-node1:~# chmod 700 /var/data/1/dfs/nn
root@name-node1:~# chown -R hdfs:hadoop /var/data/1/dfs/nn

Create a Hadoop data directory to store the HDFS blocks:

root@name-node1:~# mkdir -p /var/data/1/dfs/dn
root@name-node1:~# chown -R hdfs:hadoop /var/data/1/dfs/dn

Configure local storage directories for use by YARN:

root@name-node1:~# mkdir -p /var/data/1/yarn/local
root@name-node1:~# mkdir -p /var/data/1/yarn/logs
root@name-node1:~# chown -R yarn:hadoop /var/data/1/yarn/local
root@name-node1:~# chown -R yarn:hadoop /var/data/1/yarn/logs

Create the runtime directories:

root@name-node1:~# mkdir -p /var/run/hadoop/yarn
root@name-node1:~# chown yarn:hadoop /var/run/hadoop/yarn
root@name-node1:~# mkdir -p /var/run/hadoop/hdfs
root@name-node1:~# chown hdfs:hadoop /var/run/hadoop/hdfs
root@name-node1:~# mkdir -p /var/run/hadoop/mapred
root@name-node1:~# chown mapred:hadoop /var/run/hadoop/mapred

Add the user bob (later this user will run the MapReduce jobs):

root@name-node1:~# useradd -m -u 1000 bob
root@name-node1:~# passwd bob
New Password:
Re-enter new Password:
passwd: password successfully changed for bob

16. Switch to user bob:

root@name-node1:~# su - bob

17. Using your favorite editor, append the following lines to .profile:

bob@name-node1:~$ vi $HOME/.profile

# Set JAVA_HOME
export JAVA_HOME=/usr/java
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

18. Log out using the exit command:

bob@name-node1:~$ exit
logout

19. Configure an NTP client, as shown in the following example:

Install the NTP package:

root@name-node1:~# pkg install ntp

Create the NTP client configuration files:

root@name-node1:~# cp /etc/inet/ntp.client /etc/inet/ntp.conf
root@name-node1:~# chmod +w /etc/inet/ntp.conf
root@name-node1:~# touch /var/ntp/ntp.drift

a. Edit the NTP client configuration file:

Note: In this setup, we are using the global zone as a time server so we add its name (for example, global-zone) to /etc/inet/ntp.conf.

root@name-node1:~# vi /etc/inet/ntp.conf

Append these lines to the bottom of the file:

server global-zone prefer
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable

20. Add the Hadoop cluster members' host names and IP addresses to /etc/hosts:

root@name-node1:~# vi /etc/hosts

::1            localhost
127.0.0.1      localhost loghost
192.168.1.1    name-node1
192.168.1.2    name-node2
192.168.1.3    resource-manager
192.168.1.4    data-node1
192.168.1.5    data-node2
192.168.1.6    data-node3
192.168.1.100  global-zone

21. Enable the NTP client service:

root@name-node1:~# svcadm enable ntp

22. Verify the NTP client status:

root@name-node1:~# svcs ntp
STATE          STIME    FMRI
online          1:04:35 svc:/network/ntp:default

Check whether the NTP client can synchronize its clock with the NTP server:

root@name-node1:~# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 global-zone     LOCAL(0)         6 u   19   64    1    0.374    0.119   0.000

You can see that global-zone is the NTP server.

Set Up SSH

Set up SSH key-based authentication for the Hadoop users on the name-node1 zone in order to enable password-less login to other zones in the Hadoop cluster:

First, switch to the user hdfs and copy the SSH public key into the ~/.ssh/authorized_keys file:

root@name-node1:~# su - hdfs
Oracle Corporation SunOS 5.11 11.1 September 2012

hdfs@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa

hdfs@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Edit $HOME/.profile and append to the end of the file the following lines:

hdfs@name-node1:~$ vi $HOME/.profile

# Set JAVA_HOME
export JAVA_HOME=/usr/java
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

1. Switch to user yarn and edit $HOME/.profile to append to the end of the file the following lines:

hdfs@name-node1:~$ su - yarn
Password:
Oracle Corporation SunOS 5.11 11.1 September 2012
yarn@name-node1:~$ vi $HOME/.profile

# Set JAVA_HOME
export JAVA_HOME=/usr/java
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

2. Copy the SSH public key into the ~/.ssh/authorized_keys file:

yarn@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa

yarn@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

3. Switch to user mapred and edit $HOME/.profile to append to the end of the file the following lines:

yarn@name-node1:~$ su - mapred
Password:
Oracle Corporation SunOS 5.11 11.1 September 2012
mapred@name-node1:~$ vi $HOME/.profile

# Set JAVA_HOME
export JAVA_HOME=/usr/java
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

4. Copy the SSH public key into the ~/.ssh/authorized_keys file:

mapred@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa

mapred@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Set Up the Standby NameNode and the ResourceManager

1. Run the following command to execute the .profile script:

mapred@name-node1:~$ source $HOME/.profile

2. Check that Hadoop runs by running the following command:

mapred@name-node1:~$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar

Note: Press Ctrl-D several times until you exit from the name-node1 zone and return to the global zone. You can verify that you are in the global zone by using the zonename command:

root@global_zone:~# zonename
global

3. Create a profile for the name-node2 zone using the name-node1 profile as a template and using the buildprofile script. In a later step, we will use this profile in order to create the name-node2 zone.

Note: For arguments, the script needs the template profile's name (/root/name-node1-template.xml, which we created in a previous step), the zone's name (name-node2), and the zone's IP address (192.168.1.2, as shown in Table 2).

Change to the /root directory and create the zone profile there:

root@global_zone:~# cd /root
root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml name-node2 192.168.1.2/24

Verify the profile's creation:

root@global_zone:~# ls -l /root/name-node2-template.xml
-rw-r--r--   1 root     root        3715 Feb 25 05:59 /root/name-node2-template.xml

4. From the global zone, run the following commands to create the name-node2 zone as a clone of the name-node1 zone.

Shut down the name-node1 zone (we can clone only halted zones):

root@global_zone:~# zoneadm -z name-node1 shutdown

Then clone the zone using the profile we created for name-node2:

root@global_zone:~# zoneadm -z name-node2 clone -c /root/name-node2-template.xml name-node1

5. Boot the name-node2 zone:

root@global_zone:~# zoneadm -z name-node2 boot

6. Log in to the name-node2 zone:

root@global_zone:~# zlogin name-node2

7. Wait two minutes and verify that all the services are up and running:

root@name-node2:~# svcs -xv

If all the services are up and running without any issues, the command will return to the system prompt without any error message.

8. Exit from the name-node2 zone by pressing Ctrl- D.

9. Create the resource-manager profile using the name-node1 profile as a template:

root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml resource-manager 192.168.1.3/24

10. Create the data-node1 profile using the name-node1 profile as a template:

root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node1 192.168.1.4/24

11. Create the data-node2 profile using the name-node1 profile as a template:

root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node2 192.168.1.5/24

12. Create the data-node3 profile using the name-node1 profile as a template:

root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node3 192.168.1.6/24

13. Verify the creation of the profiles:

root@global_zone:~# ls -l /root/*.xml
-rw-r--r--   1 root     root        3715 Feb 25 08:05 /root/data-node1-template.xml
-rw-r--r--   1 root     root        3715 Feb 25 08:05 /root/data-node2-template.xml
-rw-r--r--   1 root     root        3715 Feb 25 08:05 /root/data-node3-template.xml
-r--------   1 root     root        3715 Feb 25 03:11 /root/name-node1-template.xml
-rw-r--r--   1 root     root        3715 Feb 25 07:57 /root/name-node2-template.xml
-rw-r--r--   1 root     root        3735 Feb 25 08:04 /root/resource-manager-template.xml

14. From the global zone, run the following command to create the resource-manager zone as a clone of name-node1:

root@global_zone:~# zoneadm -z resource-manager clone -c /root/resource-manager-template.xml name-node1

15. Boot the resource-manager zone:

root@global_zone:~# zoneadm -z resource-manager boot

Set Up the DataNode Zones

In this section, we can leverage the integration between Oracle Solaris Zones virtualization technology and the ZFS file system that is built into Oracle Solaris.

1. Run the following commands to create the three DataNode zones as clones of the name-node1 zone, and then boot the new zones:

root@global_zone:~# zoneadm -z data-node1 clone -c /root/data-node1-template.xml name-node1
root@global_zone:~# zoneadm -z data-node1 boot
root@global_zone:~# zoneadm -z data-node2 clone -c /root/data-node2-template.xml name-node1
root@global_zone:~# zoneadm -z data-node2 boot
root@global_zone:~# zoneadm -z data-node3 clone -c /root/data-node3-template.xml name-node1
root@global_zone:~# zoneadm -z data-node3 boot

2. Boot the name-node1 zone:

root@global_zone:~# zoneadm -z name-node1 boot

3. Check the status of the zones we've created:

root@global_zone:~# zoneadm list -cv
ID NAME              STATUS   PATH                      BRAND    IP
 0 global            running  /                         solaris  shared
 6 name-node1        running  /zones/name-node1         solaris  excl
10 name-node2        running  /zones/name-node2         solaris  excl
11 resource-manager  running  /zones/resource-manager   solaris  excl
12 data-node1        running  /zones/data-node1         solaris  excl
13 data-node2        running  /zones/data-node2         solaris  excl
14 data-node3        running  /zones/data-node3         solaris  excl

We can see that all the zones are running now.
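Because the zones were cloned, their ZFS datasets are ZFS clones of a snapshot of the name-node1 zone's dataset and initially share its storage. If you are curious, you can optionally inspect this from the global zone; this is a sketch, and the exact dataset and snapshot names may differ slightly on your system:

# List the zone datasets under rpool/zones and show each clone's origin snapshot.
root@global_zone:~# zfs list -o name,used,origin -r rpool/zones

# List the snapshots that were created by the zoneadm clone operations.
root@global_zone:~# zfs list -t snapshot -r rpool/zones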

Verify the SSH Setup

1. Log in to the name-node1 zone:

root@global_zone:~# zlogin name-node1
[Connected to zone 'name-node1' pts/1]
Oracle Corporation SunOS 5.11 11.1 September 2012
root@name-node1:~# su - hdfs
Oracle Corporation SunOS 5.11 11.1 September 2012

2. Run the testssh script to log in to the cluster nodes using the ssh command.

Note: You will need to enter yes at the "Are you sure you want to continue connecting (yes/no)?" prompt once for each of the six zones, both here and when you repeat these steps on name-node2.

hdfs@name-node1:~$ /usr/local/Scripts/testssh
The authenticity of host 'name-node1 (192.168.1.1)' can't be established.
RSA key fingerprint is 07:b6:b7:40:0c:39:cd:60:32:c4:98:07:66:79:63:1c.
Are you sure you want to continue connecting (yes/no)? yes

3. Switch to user yarn and run the testssh script again:

root@name-node1:~# su - yarn
Password:
yarn@name-node1:~$ /usr/local/Scripts/testssh

4. Switch to user mapred and run the testssh script again:

yarn@name-node1:~$ su - mapred
Password:
mapred@name-node1:~$ /usr/local/Scripts/testssh

5. Press Ctrl-D four times to return to the global zone and repeat similar steps for name-node2:

Edit the /etc/hosts file inside name-node2 in order to add the name-node1 entry:

root@global_zone:~# zlogin name-node2 'echo "192.168.1.1 name-node1" >> /etc/hosts'

Log in to the name-node2 zone:

root@global_zone:~# zlogin name-node2
[Connected to zone 'name-node2' pts/1]
Oracle Corporation SunOS 5.11 11.1 September 2012
root@name-node2:~# su - hdfs
Oracle Corporation SunOS 5.11 11.1 September 2012

a. Run the testssh script in order to log in to the cluster nodes using the ssh command.

Note: Enter yes at the command prompt for the "Are you sure you want to continue connecting (yes/no)?" question.

hdfs@name-node2:~$ /usr/local/Scripts/testssh
The authenticity of host 'name-node1 (192.168.1.1)' can't be established.
RSA key fingerprint is 07:b6:b7:40:0c:39:cd:60:32:c4:98:07:66:79:63:1c.
Are you sure you want to continue connecting (yes/no)? yes

Switch to user yarn:

root@name-node2:~# su - yarn
Password:

Run the testssh script:

yarn@name-node2:~$ /usr/local/Scripts/testssh

Switch to user mapred:

yarn@name-node2:~$ su - mapred
Password:

Run the testssh script:

mapred@name-node2:~$ /usr/local/Scripts/testssh

Verify Name Resolution

1. From the global zone, edit the /etc/hosts files inside resource-manager and the DataNodes in order to add the name-node1 entry:

root@global_zone:~# zlogin name-node2 'echo "192.168.1.1 name-node1" >> /etc/hosts'
root@global_zone:~# zlogin resource-manager 'echo "192.168.1.1 name-node1" >> /etc/hosts'
root@global_zone:~# zlogin data-node1 'echo "192.168.1.1 name-node1" >> /etc/hosts'
root@global_zone:~# zlogin data-node2 'echo "192.168.1.1 name-node1" >> /etc/hosts'
root@global_zone:~# zlogin data-node3 'echo "192.168.1.1 name-node1" >> /etc/hosts'

2. Verify name resolution by ensuring that the /etc/hosts files for the global zone and all the Hadoop zones have the host entries shown below:

root@global_zone:~# for zone in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3; do echo "======$zone ======"; zlogin $zone cat /etc/hosts; done

======name-node1 ======
::1            localhost
127.0.0.1      localhost loghost
192.168.1.1    name-node1
192.168.1.2    name-node2
192.168.1.3    resource-manager
192.168.1.4    data-node1
192.168.1.5    data-node2
192.168.1.6    data-node3
192.168.1.100  global-zone

Note: If you are using the global zone as an NTP server, you must also add its host name and IP address to /etc/hosts.

3. Verify the cluster using the verifycluster script:

root@global_zone:~# /usr/local/Scripts/verifycluster

If the cluster setup is correct, you will get the message "The cluster is verified".

Note: If the verifycluster script fails with an error message, check that the /etc/hosts file in every zone includes all the zone names, as described in Step 1, and then rerun the verifycluster script.

Format the Hadoop File System

1. To format HDFS, run the following commands:

root@global_zone:~# zlogin -l hdfs name-node1
hdfs@name-node1:~$ hdfs namenode -format

2. Look for the following message, which indicates HDFS has been set up:

...
INFO common.Storage: Storage directory /var/data/1/dfs/nn has been successfully formatted.
...
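If you want additional confirmation that the format step wrote the NameNode metadata, you can optionally list the storage directory from inside the name-node1 zone; this is a sketch, and the exact set of files may vary slightly by Hadoop version:

hdfs@name-node1:~$ ls /var/data/1/dfs/nn/current

You should see a VERSION file and an initial fsimage checkpoint in this directory.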

Start the Hadoop Cluster

Table 4 describes the startup scripts.

Table 4. Startup Scripts

User    Command                                      Description

hdfs    hadoop-daemon.sh start namenode              Starts the HDFS daemon (NameNode process)

hdfs    hadoop-daemon.sh start datanode              Starts the DataNode process on all DataNodes

yarn    yarn-daemon.sh start resourcemanager         Starts YARN on the ResourceManager

yarn    yarn-daemon.sh start nodemanager             Starts the NodeManager process on all DataNodes

mapred  mr-jobhistory-daemon.sh start historyserver  Starts the MapReduce History Server

1. Start HDFS by running the following command:

hdfs@name-node1:~$ hadoop-daemon.sh start namenode
starting namenode, logging to /var/log/hadoop/hdfs/hadoop--namenode-name-node1.out

2. Run the jps command to verify that the NameNode process has been started:

hdfs@name-node1:~$ /usr/jdk/latest/bin/jps | grep NameNode
4223 NameNode

You should see the NameNode process ID (for example, 4223). If the process did not start, look at the log file /var/log/hadoop/hdfs/hadoop--namenode-name-node1.log to find the reason.

3. Exit from the name-node1 zone by pressing Ctrl-D.

4. Start the DataNodes on all the slaves (data-node1, data-node2, and data-node3):

Run the following commands for data-node1:

root@global_zone:~# zlogin -l hdfs data-node1
hdfs@data-node1:~$ hadoop-daemon.sh start datanode
hdfs@data-node1:~$ /usr/jdk/latest/bin/jps | grep DataNode
19762 DataNode

Exit from the data-node1 zone by pressing Ctrl-D.

Run the following commands for data-node2:

root@global_zone:~# zlogin -l hdfs data-node2
hdfs@data-node2:~$ hadoop-daemon.sh start datanode
hdfs@data-node2:~$ /usr/jdk/latest/bin/jps | grep DataNode
21525 DataNode

Exit from the data-node2 zone by pressing Ctrl-D.

Run the following commands for data-node3:

root@global_zone:~# zlogin -l hdfs data-node3
hdfs@data-node3:~$ hadoop-daemon.sh start datanode
hdfs@data-node3:~$ /usr/jdk/latest/bin/jps | grep DataNode
29699 DataNode

Exit from the data-node3 zone by pressing Ctrl-D.

5. Create a /tmp directory and set its permissions to 1777 (drwxrwxrwt). Then create the HDFS file system using the hadoop fs command:

root@global_zone:~# zlogin -l hdfs name-node1
hdfs@name-node1:~$ hadoop fs -mkdir /tmp

Note: You might get the warning message NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. This means Hadoop isn't able to use the native platform libraries that accelerate the Hadoop suite. These native libraries are optional; the port of the Hadoop 2.x native libraries to Oracle Solaris is a work in progress.

hdfs@name-node1:~$ hadoop fs -chmod -R 1777 /tmp

6. Create a history directory and set permissions and ownership:

hdfs@name-node1:~$ hadoop fs -mkdir /user
hdfs@name-node1:~$ hadoop fs -mkdir /user/history
hdfs@name-node1:~$ hadoop fs -chmod -R 1777 /user/history
hdfs@name-node1:~$ hadoop fs -chown yarn /user/history

7. Create the log directories:

hdfs@name-node1:~$ hadoop fs -mkdir /var
hdfs@name-node1:~$ hadoop fs -mkdir /var/log
hdfs@name-node1:~$ hadoop fs -mkdir /var/log/hadoop-yarn
hdfs@name-node1:~$ hadoop fs -chown yarn:mapred /var/log/hadoop-yarn

8. Create a directory for user bob and set ownership:

hdfs@name-node1:~$ hadoop fs -mkdir /user/bob
hdfs@name-node1:~$ hadoop fs -chown bob /user/bob

9. Verify the HDFS file structure:

hdfs@name-node1:~$ hadoop fs -ls -R /
drwxrwxrwt   - hdfs supergroup          0 2014-02-26 10:43 /tmp
drwxr-xr-x   - hdfs supergroup          0 2014-02-26 10:58 /user
drwxr-xr-x   - bob  supergroup          0 2014-02-26 10:58 /user/bob
drwxrwxrwt   - yarn supergroup          0 2014-02-26 10:50 /user/history
drwxr-xr-x   - hdfs supergroup          0 2014-02-26 10:53 /var
drwxr-xr-x   - hdfs supergroup          0 2014-02-26 10:53 /var/log
drwxr-xr-x   - yarn mapred              0 2014-02-26 10:53 /var/log/hadoop-yarn

10. Exit from the name-node1 zone by pressing Ctrl-D.

11. Start the YARN resource-manager service using the following commands:

root@global_zone:~# zlogin -l yarn resource-manager
yarn@resource-manager:~$ yarn-daemon.sh start resourcemanager
yarn@resource-manager:~$ /usr/jdk/latest/bin/jps | grep ResourceManager
29776 ResourceManager

12. Start the NodeManager process on all DataNodes and verify the status:

root@global_zone:~# zlogin -l yarn data-node1 yarn-daemon.sh start nodemanager
root@global_zone:~# zlogin -l yarn data-node1 /usr/jdk/latest/bin/jps | grep NodeManager
29920 NodeManager
root@global_zone:~# zlogin -l yarn data-node2 yarn-daemon.sh start nodemanager
root@global_zone:~# zlogin -l yarn data-node2 /usr/jdk/latest/bin/jps | grep NodeManager
29930 NodeManager
root@global_zone:~# zlogin -l yarn data-node3 yarn-daemon.sh start nodemanager
root@global_zone:~# zlogin -l yarn data-node3 /usr/jdk/latest/bin/jps | grep NodeManager
29982 NodeManager

13. Start the MapReduce History Server and verify its status:

root@global_zone:~# zlogin -l mapred resource-manager
mapred@resource-manager:~$ mr-jobhistory-daemon.sh start historyserver
mapred@resource-manager:~$ /usr/jdk/latest/bin/jps | grep JobHistoryServer
654 JobHistoryServer

Exit the resource-manager zone by pressing Ctrl-D.

14. Log in to name-node1:

15. root@global_zone:~# zlogin -l hdfs name-node1

16. Use the following command to show basic HDFS statistics for the cluster:

hdfs@name-node1:~$ hdfs dfsadmin -report

13/11/26 05:16:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 1077762507264 (1003.74 GB)
Present Capacity: 1075847407736 (1001.96 GB)
DFS Remaining: 1075845337088 (1001.96 GB)
DFS Used: 2070648 (1.97 MB)
DFS Used%: 0.00%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)
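If you want an additional health check at this point, you can also run hdfs fsck against the root of the file system; this is an optional sketch, and the summary output will vary with your cluster's contents:

hdfs@name-node1:~$ hdfs fsck /

A healthy cluster ends the report with the line The filesystem under path '/' is HEALTHY.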

17. Use the following command to show the cluster topology:

18. hdfs@name-node1:~$ hdfs dfsadmin -printTopology

13/11/26 05:19:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Rack: /default-rack
   10.153.111.222:50010 (data-node1)
   10.153.111.223:50010 (data-node2)
   10.153.111.224:50010 (data-node3)

Note: You might get the warning message NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. This means Hadoop isn't able to use the native platform libraries that accelerate the Hadoop suite. These native libraries are optional; the port of the Hadoop 2.x native libraries to Oracle Solaris is a work in progress.

19. Run a simple MapReduce job:

root@global_zone:~# zlogin -l bob name-node1 hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 20

where:

zlogin -l bob name-node1 specifies that the command be run as user bob on the name-node1 zone.

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi specifies the Hadoop .jar file.

10 specifies the number of maps.

20 specifies the number of samples.
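As an optional follow-on, you can load a small data file into HDFS and run the wordcount example from the same examples .jar file. This is a minimal sketch; the file name words.txt and the output directory words-out are illustrative choices, not part of the lab:

root@global_zone:~# zlogin -l bob name-node1
bob@name-node1:~$ echo "hello hadoop hello solaris" > /tmp/words.txt
bob@name-node1:~$ hadoop fs -put /tmp/words.txt /user/bob/words.txt
bob@name-node1:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/bob/words.txt /user/bob/words-out
bob@name-node1:~$ hadoop fs -cat /user/bob/words-out/part-r-00000

The output lists each word with its count, confirming that HDFS and YARN are working together end to end.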