Getting Started Development with Mahout

By Reeshu Patel


What is Apache Mahout

Apache Mahout is a library of scalable machine-learning algorithms for large data sets. Mahout's core algorithms for clustering, classification, and batch-based collaborative filtering are implemented on top of the map/reduce paradigm. Contributions are not restricted to Hadoop-based implementations, however: contributions that run on a single node or on a non-Hadoop cluster are also welcome. The core libraries are highly optimized to give good performance for non-distributed algorithms as well.

Installing Mahout

Mahout provides highly scalable algorithms over very large data sets. Although Mahout is chiefly intended for large HDFS data, it also supports running algorithms on local filesystem data, which can help you get a feel for how Mahout algorithms run.

Before you can run any Mahout algorithm you need a Mahout installation ready on your Linux machine, which can be set up as described below.

Step 1: Download mahout-distribution-0.x.tar.gz from the Apache Download Mirrors and extract the contents to a location of your choice alongside your Hadoop package; I used /usr/local, with Hadoop at /usr/local/hadoop. Make sure to change the owner of all the files to the hduser user and hadoop group, for example:

$ cd /usr/local
1. $ sudo tar xzf mahout-distribution-0.x.tar.gz


2. $ sudo mv mahout-distribution-0.x mahout
3. $ sudo chown -R hduser:hadoop mahout

This should result in a folder named /usr/local/mahout. If you want, you can now run any of the algorithms using the script bin/mahout in the extracted folder, which is also a quick way to check your installation.

Step 2: Set the path in the .bashrc file:

1. export MAHOUT_HOME=/usr/local/mahout
2. export PATH=$PATH:$MAHOUT_HOME/bin

Create a working directory for Mahout; we'll call it /app/mahout here:

1. $ sudo mkdir -p /app/mahout
2. $ sudo chown hduser:hadoop /app/mahout
3. # ...and if you want to tighten up security, chmod from 755 to 750...
4. $ sudo chmod 750 /app/mahout

Step 3: Set the Hadoop configuration path in hadoop-env.sh, adding the Mahout libraries in /usr/local/mahout/lib/* to the classpath.

Step 4: Installation of Maven.

1. $ sudo tar xzf apache-maven-2.2.1-bin.tar.gz


2. $ sudo mv apache-maven-2.2.1 maven
3. $ sudo chown -R hduser:hadoop maven

Now set the path in the .bashrc file:

1. export M2_HOME=/usr/local/maven
2. export M2=$M2_HOME/bin
3. export PATH=$PATH:$JAVA_HOME/bin:$M2

Now build Mahout with Maven. Go to the Mahout home directory and run:

/usr/local/mahout$ mvn install

You should see build output something like the following:


You have now finished the Maven build of Mahout, and you can find the .m2 repository in the root user's home directory. Next we need to set up Hadoop in Eclipse.

Hadoop Configuration with Eclipse

My inspiration for writing about Hadoop configuration and MapReduce with Eclipse is that many of us complete the setup but do not really know how it works or how to use it, which makes it hard to get started. This configuration will walk you through everything you need to run a MapReduce application in Eclipse, so that you can easily start development with Mahout.

You are probably familiar with the WordCount example; if not, don't worry, you will get to know everything step by step. This is the simplest MapReduce application you are likely to encounter. In this example we are going to find the length of each word and count how many words share the same length. After working through this example you can move on to Mahout and start running the Mahout examples.

• Here I am using the stable versions of Ubuntu 12.04, Hadoop 1.1.2, and Eclipse Juno. You can also use the latest versions of them if you want.
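Before wiring this into Hadoop, it may help to see the example's core logic in plain Java. The sketch below is my own illustration and not part of the tutorial (the class name SizeCountSketch is made up): it tokenizes the text, takes each word's length as the key, and counts how many words share each length, which is exactly what the map and reduce steps of the full job compute in parallel.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class SizeCountSketch {
    // Count how many words in the input share each word length,
    // mirroring what the MapReduce job below computes in parallel.
    public static Map<Integer, Integer> sizeCounts(String text) {
        Map<Integer, Integer> counts = new TreeMap<>();
        StringTokenizer st = new StringTokenizer(text, " ");
        while (st.hasMoreTokens()) {
            int size = st.nextToken().length();  // "map" step: word -> (length, 1)
            counts.merge(size, 1, Integer::sum); // "reduce" step: sum per length
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(sizeCounts("to be or not to be")); // prints {2=5, 3=1}
    }
}
```

Calling sizeCounts("to be or not to be") returns {2=5, 3=1}: five two-letter words and one three-letter word.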

For a working Hadoop environment you first of all need to set up a Hadoop cluster. Follow this link to set it up: http://www.attuneuniversity.com/blog/apache-hadoop-installation-with-single-node-cluster-setup.html

After setting up your Hadoop cluster, you need to start it.

Step 1: Start your single-node Hadoop cluster:

1. root@reeshupatel-desktop:~#/usr/local/hadoop$ bin/start-all.sh

Step 2: Copy the "Hadoop Eclipse Plugin" into the plugins directory of Eclipse.


1. root@reeshupatel-desktop:~# sudo cp /home/attune/Desktop/hadoop-eclipse-plugin-1.1.2.jar /opt/eclipse/plugins
2. Here my Hadoop version is 1.1.2, so that is why I am using hadoop-eclipse-plugin-1.1.2.

Step 3: Start the Eclipse IDE.

1. You can see "Map/Reduce" in the top-right corner of your IDE; select it. Now look at the bottom of your IDE, where you can see "Map/Reduce Locations".
2. Now create your "New Hadoop Location" and set the ports for MapReduce and DFS.

Step 4: Create a MapReduce project.

Step 5: After making the project, select "MapReduce driver" for the application.

1. Give your application an appropriate name and start programming.

Step 6: Set up the Hadoop location. You will have to define the location and set the fields one by one:


1. Location name: anything you like.
2. Map/Reduce master host: according to the mapred-site.xml file, e.g. hdfs://reeeshu:54311.
3. Map/Reduce master port: the same as the mapred-site.xml port, here 54311.
4. DFS master port: here 54310, as entered in the core-site.xml file, e.g. hdfs://reeeshu:54310.
5. User name: hduser, if your Hadoop is running as hduser.
6. If you are working as hduser, you will have to give Eclipse permission on the Hadoop folder: $ chown -R hduser:hadoop hadoop
7. Set the classpath to the Hadoop folder.
8. Set the Java build path from the hadoop-core jar file.
9. Here my input files contain unstructured text data.

MapReduce Example with Hadoop

If you want to check your Hadoop setup you can follow this example. Download the files below as sample input:

1. The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
2. The Notebooks of Leonardo Da Vinci
3. Ulysses by James Joyce

Copy the data from the local directory to the Hadoop distributed file system:

4. root@reeshupatel-desktop:~#/usr/local/hadoop$ bin/hadoop fs -copyFromLocal /home/attune/Desktop/attune_1.txt /home/reeshupate/attune_text_input/attune_1.txt
5. root@reeshupatel-desktop:~#/usr/local/hadoop$ bin/hadoop fs -copyFromLocal /home/attune/Desktop/attune_2.txt /home/reeshupate/attune_text_input/attune_2.txt


6. root@reeshupatel-desktop:~#/usr/local/hadoop$ bin/hadoop fs -copyFromLocal /home/attune/Desktop/attune_3.txt /home/reeshupate/attune_text_input/attune_3.txt

As I said before, this application is going to find the length of each word and count the words having the same length. I used two classes, a mapper class and a reducer class, containing two functions: a map function and a reduce function. The map function splits the text, tokenizes the words, and then emits each word's length as the key. The reduce function counts the values for each particular key (i.e. all words having the same length).

package com.attuneinfocom.size.count;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class Sizecounting {


public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, IntWritable, Text> {

    private final IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value,
            OutputCollector<IntWritable, Text> output, Reporter reporter)
            throws IOException {

        String line = value.toString();
        StringTokenizer st = new StringTokenizer(line, " ");

        while (st.hasMoreTokens()) {
            String word = st.nextToken();
            int size = word.length();
            output.collect(new IntWritable(size), new Text(String.valueOf(one)));
        }
    }
}

public static class ReduceClass extends MapReduceBase
        implements Reducer<IntWritable, Text, IntWritable, IntWritable> {

    public void reduce(IntWritable key, Iterator<Text> values,
            OutputCollector<IntWritable, IntWritable> output, Reporter reporter)
            throws IOException {


        int sum = 0;

        while (values.hasNext()) {
            values.next();
            sum++;
        }

        output.collect(key, new IntWritable(sum));
    }
}

public static void main(String[] args) {

JobClient client = new JobClient();

// Used to distinguish Map and Reduce jobs from others
JobConf conf = new JobConf(com.attuneinfocom.size.count.Sizecounting.class);

//Specify key and value class for Mapper

conf.setMapOutputKeyClass(IntWritable.class);
conf.setMapOutputValueClass(Text.class);

// Specify output types
conf.setOutputKeyClass(IntWritable.class);
conf.setOutputValueClass(IntWritable.class);


// Specify input and output DIRECTORIES (not files)
FileInputFormat.setInputPaths(conf, new Path("hdfs://localhost:54310/user/attune/attune_text_input"));
FileOutputFormat.setOutputPath(conf, new Path("hdfs://localhost:54310/user/attune/attune_text_output"));

//Specify input and output format

conf.setInputFormat(TextInputFormat.class);

conf.setOutputFormat(TextOutputFormat.class);

// Specify Mapper and Reducer class
conf.setMapperClass(MapClass.class);
conf.setReducerClass(ReduceClass.class);

client.setConf(conf);
try {
    JobClient.runJob(conf);
} catch (Exception e) {
    e.printStackTrace();
}
}

}

Step 7: Now you can run the MapReduce application.


If you check the console, you can see the Map and Reduce processing. Finally, you can see your output in the browser.


Mahout Integration with Maven and Eclipse

We are using eclipse-jee-juno-SR2-linux-gtk.tar.gz here with m2eclipse, which lets us import Maven projects with full feature support, and we check out the Mahout sources into the workspace directory. You should do a full build on the command line first and then import into Eclipse via File > Import > Maven Projects. Point it at the Apache Mahout root directory; you will then be given the opportunity to choose which sub-modules to import.

Integration Configuration of Mahout with Eclipse

Now we need to follow these steps.

Step 1: Installation of Maven.

1. $ sudo tar xzf apache-maven-2.2.1-bin.tar.gz
2. $ sudo mv apache-maven-2.2.1 maven

Now set the path in the .bashrc file:

1. export M2_HOME=/usr/local/maven
2. export M2=$M2_HOME/bin
3. export PATH=$PATH:$JAVA_HOME/bin:$M2


Now you can start Mahout with the bin/mahout command.

Step 2: Download the Mahout jar files from the links below.

1. http://apache.techartifact.com/mirror/mahout/0.7/
2. http://apache.techartifact.com/mirror/mahout/0.7/mahout-distribution-0.7-src.zip
3. Extract the archive.
4. Convert the project into an Eclipse project:
5. $ cd mahout-distribution-0.7
6. $ mvn eclipse:eclipse

Wait for some time while it builds the Eclipse project.

Step 3: Set the Eclipse classpath variable M2_REPO to the Maven 2 local repository (install Maven on your machine first if you have not already). To set the Maven repository automatically you can run:

• mvn -Declipse.workspace=<path-to-workspace> eclipse:add-maven-repo

Alternatively, you can set it via:

• mvn -Declipse.workspace=<path-to-workspace> eclipse:eclipse

You can also go to Eclipse > Window > Preferences > Java > Build Path > Classpath Variables and set M2_REPO there manually.

Step 4: Finally, import the converted Eclipse project of Mahout. Open File > Import > General > Existing Projects into Workspace from the Eclipse menu.


Mahout Setup Figure 1

Step 5: Now we will generate a Maven project for the sample code in the Eclipse workspace directory:

$ mvn archetype:create -DgroupId=com.orzota.mahout.recommender -DartifactId=recommender

The name of the project created in the workspace directory is "recommender".

Step 6: Convert the newly created Java project into an Eclipse project:

1. $ cd recommender
2. $ mvn eclipse:eclipse

Step 7: Import the project into Eclipse in the same way. Open File > Import > General > Existing Projects into Workspace from the Eclipse menu and select the "recommender" project. Right-click the "recommender" project, select Properties > Java Build Path > Projects from the pop-up menu, click "Add", and select the Mahout projects listed below.


Mahout Setup Figure 2

The folder structure looks like the one below, and we can now build apps using it.

Step 8: Right-click the "recommender" project, select Properties > Java Build Path > Projects from the pop-up menu, click "Add", and select the Mahout projects shown below.

Mahout Setup Figure 3

We have now completed the Mahout integration with Eclipse. Next, create a Java class file inside the recommender project and add the following code to the class file:

package com.orzota.mahout.recommender;

import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import java.io.*;
import java.util.*;

class RecommenderIntro {

private RecommenderIntro() { }

public static void main(String[] args) throws Exception {

DataModel model = new FileDataModel(new File("intro.csv"));

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(2, similarity, model);

Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity);

List<RecommendedItem> recommendations = recommender.recommend(1, 1);

for (RecommendedItem recommendation : recommendations) {
    System.out.println(recommendation);
}

}

}

Now you can right-click on the project and run it as a Hadoop application. If you are running a MapReduce program, you can then check your output in the browser on the Hadoop port.

Conclusion

After this configuration you can easily start development with Mahout: you are able to configure the Mahout source code, access and debug your programs directly from Eclipse, and write Mahout programs within Eclipse itself.
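As a closing aside, the PearsonCorrelationSimilarity used in the recommender scores a pair of users by the Pearson correlation of the ratings they gave to items both have rated. The sketch below is my own illustration and not Mahout's actual code (the class name PearsonSketch is made up, and Mahout's real implementation additionally handles sparse data, weighting, and edge cases):

```java
public class PearsonSketch {
    // Pearson correlation of two equally long rating vectors
    // (the ratings both users gave to their common items).
    // Returns a value in [-1, 1]; assumes both vectors have nonzero variance.
    public static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= n; meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA, db = b[i] - meanB;
            cov += da * db; varA += da * da; varB += db * db;
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        // Two users who rank items in the same order correlate perfectly.
        System.out.println(pearson(new double[]{1, 2, 3}, new double[]{2, 4, 6})); // prints 1.0
    }
}
```

Users with proportional rating patterns score 1.0, while users with opposite patterns score -1.0, which is why the NearestNUserNeighborhood above can pick the two most similar users as the neighborhood.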
