
Sunday, 5 February 2017

MapReduce program execution in Cloudera



First, boot the vmware1.6.vmo ISO image in VMware Workstation; it opens the Cloudera CentOS image.







Open the Eclipse IDE and go to Project > Build Project; you can find your project in the Project Explorer.


Create the class files for the word count: the driver code, mapper code, and reducer code, as shown below.




AFTER THAT, EXPORT YOUR PROJECT AS A JAR FILE:
PROJECT FOLDER --> EXPORT --> JAR FILE --> NAME OF THE JAR FILE





Driver code:
WordCount.java

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {

    if (args.length != 2) {
      System.out.printf(
          "Usage: %s [generic options] <input dir> <output dir>\n", getClass()
              .getSimpleName());
      ToolRunner.printGenericCommandUsage(System.out);
      return -1;
    }

    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName(this.getClass().getName());

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    conf.setMapperClass(WordMapper.class);
    conf.setReducerClass(SumReducer.class);

    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(IntWritable.class);

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new WordCount(), args);
    System.exit(exitCode);
  }
}


WordMapper.java

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;

import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordMapper extends MapReduceBase implements
    Mapper<LongWritable, Text, Text, IntWritable> {

  @Override
  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {

    String s = value.toString();
    for (String word : s.split(" ")) {
      if (word.length() > 0) {
        output.collect(new Text(word), new IntWritable(1));
      }
    }
  }
}

// Example input line: hi this is is arshia

// Mapper output: hi {1} this {1} is {1} is {1} arshia {1}
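
To see the map step from the comment above without starting a cluster, here is a minimal plain-Java sketch. It is only a stand-in for WordMapper: the class name MapStepSketch and the use of Map.Entry pairs in place of OutputCollector are assumptions made for illustration, not part of the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java stand-in for WordMapper.map(); no Hadoop classes are used.
class MapStepSketch {

  // Splits one input line on single spaces and emits (word, 1) for every
  // non-empty token, mirroring the loop in WordMapper.
  static List<Map.Entry<String, Integer>> map(String line) {
    List<Map.Entry<String, Integer>> emitted =
        new ArrayList<Map.Entry<String, Integer>>();
    for (String word : line.split(" ")) {
      if (word.length() > 0) {
        emitted.add(new SimpleEntry<String, Integer>(word, 1));
      }
    }
    return emitted;
  }

  public static void main(String[] args) {
    // Prints one "word {1}" line per token: hi, this, is, is, arshia.
    for (Map.Entry<String, Integer> pair : map("hi this is is arshia")) {
      System.out.println(pair.getKey() + " {" + pair.getValue() + "}");
    }
  }
}
```

Note that the repeated word "is" is emitted twice with a count of 1 each time; the mapper does no adding up, that is left to the reducer.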











SumReducer.java

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SumReducer extends MapReduceBase implements
    Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {

    int wordCount = 0;
    while (values.hasNext()) {
      IntWritable value = values.next();
      wordCount += value.get();
    }
    output.collect(key, new IntWritable(wordCount));
  }
}
// Reducer input: hi {1} is {1} is {1} ==> grouped by key: hi {1} is {1+1}
// Reducer output: hi {1}, is {2}
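
The summing done by the reducer can be tried in plain Java too. This is a rough sketch only: ReduceStepSketch is a hypothetical name, and an Iterator<Integer> stands in for the Iterator<IntWritable> that Hadoop passes to SumReducer.reduce().

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical plain-Java stand-in for SumReducer.reduce(); no Hadoop classes are used.
class ReduceStepSketch {

  // Adds up every value grouped under one key, mirroring the while loop
  // in SumReducer. The key is not needed for the sum itself.
  static int reduce(String key, Iterator<Integer> values) {
    int wordCount = 0;
    while (values.hasNext()) {
      wordCount += values.next();
    }
    return wordCount;
  }

  public static void main(String[] args) {
    // "hi" arrives with one value, "is" with two values.
    System.out.println("hi {" + reduce("hi", Arrays.asList(1).iterator()) + "}");
    System.out.println("is {" + reduce("is", Arrays.asList(1, 1).iterator()) + "}");
  }
}
```

Running it prints hi {1} and is {2}, which matches the reducer output in the comment above.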












After that, add the required JAR files:
Src -> Build Path -> Configure Build Path -> Add External JAR files

The required JAR files that support the driver, mapper, and reducer code are:






commons-cli-1.2.jar              /hadoop/lib
commons-codec-1.4.jar            /hadoop/lib
commons-daemon-1.0.1.jar         /hadoop/lib
commons-el-1.0.jar               /hadoop/lib
commons-httpclient-3.1.jar       /hadoop/lib
commons-logging-1.0.4.jar        /hadoop/lib
commons-logging-api-1.0.4.jar    /hadoop/lib
commons-net-1.4.1.jar            /hadoop/lib
hadoop-0.20.2-cdh3u2-core.jar    /hadoop
hadoop-core.jar                  /hadoop
jackson-core-asl-1.5.2.jar       /hadoop/lib
jackson-mapper-asl-1.5.2.jar     /hadoop/lib
log4j-1.2.15.jar                 /hadoop/lib


Add all of these JAR files to your build path; your program then executes as follows.
  






MAKE AN INPUT DIRECTORY FOR YOUR PROGRAM

[training@localhost ~]$ hadoop fs -mkdir wcs
[training@localhost ~]$ cat > input.txt
hi
i
f
dd
dd
f
[training@localhost ~]$ hadoop fs -put input.txt wcs
[training@localhost ~]$ hadoop fs -mkdir wcs/ouput
[training@localhost ~]$ hadoop jar wc.jar WordCount wcs/input.txt wcs/output
16/09/25 08:38:49 WARN snappy.LoadSnappy: Snappy native library is available
16/09/25 08:38:49 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/09/25 08:38:49 INFO snappy.LoadSnappy: Snappy native library loaded
16/09/25 08:38:49 INFO mapred.FileInputFormat: Total input paths to process : 1
16/09/25 08:38:50 INFO mapred.JobClient: Running job: job_201609250806_0001
16/09/25 08:38:51 INFO mapred.JobClient:  map 0% reduce 0%
16/09/25 08:39:00 INFO mapred.JobClient:  map 66% reduce 0%
16/09/25 08:39:04 INFO mapred.JobClient:  map 100% reduce 0%
16/09/25 08:39:13 INFO mapred.JobClient:  map 100% reduce 100%
16/09/25 08:39:14 INFO mapred.JobClient: Job complete: job_201609250806_0001
16/09/25 08:39:14 INFO mapred.JobClient: Counters: 23
16/09/25 08:39:14 INFO mapred.JobClient:   Job Counters
16/09/25 08:39:14 INFO mapred.JobClient:     Launched reduce tasks=1
16/09/25 08:39:14 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=18336
16/09/25 08:39:14 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/09/25 08:39:14 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/09/25 08:39:14 INFO mapred.JobClient:     Launched map tasks=3
16/09/25 08:39:14 INFO mapred.JobClient:     Data-local map tasks=3
16/09/25 08:39:14 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13372
16/09/25 08:39:14 INFO mapred.JobClient:   FileSystemCounters
16/09/25 08:39:14 INFO mapred.JobClient:     FILE_BYTES_READ=57
16/09/25 08:39:14 INFO mapred.JobClient:     HDFS_BYTES_READ=317
16/09/25 08:39:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=219760
16/09/25 08:39:14 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=18
16/09/25 08:39:14 INFO mapred.JobClient:   Map-Reduce Framework
16/09/25 08:39:14 INFO mapred.JobClient:     Reduce input groups=4
16/09/25 08:39:14 INFO mapred.JobClient:     Combine output records=0
16/09/25 08:39:14 INFO mapred.JobClient:     Map input records=6
16/09/25 08:39:14 INFO mapred.JobClient:     Reduce shuffle bytes=69
16/09/25 08:39:14 INFO mapred.JobClient:     Reduce output records=4
16/09/25 08:39:14 INFO mapred.JobClient:     Spilled Records=12
16/09/25 08:39:14 INFO mapred.JobClient:     Map output bytes=39
16/09/25 08:39:14 INFO mapred.JobClient:     Map input bytes=15
16/09/25 08:39:14 INFO mapred.JobClient:     Combine input records=0
16/09/25 08:39:14 INFO mapred.JobClient:     Map output records=6
16/09/25 08:39:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=291
16/09/25 08:39:14 INFO mapred.JobClient:     Reduce input records=6
[training@localhost ~]$
[training@localhost ~]$ hadoop fs -ls
Found 8 items
drwxr-xr-x   - training supergroup          0 2016-09-23 10:48 /user/training/INPUT
-rw-r--r--   1 training supergroup         49 2016-05-22 16:50 /user/training/hadoop1
drwxr-xr-x   - training supergroup          0 2016-05-22 17:02 /user/training/hadoop1_out
drwxr-xr-x   - training supergroup          0 2016-05-12 17:01 /user/training/hadoop_China
drwxr-xr-x   - training supergroup          0 2016-09-23 10:48 /user/training/hdir
drwxr-xr-x   - training supergroup          0 2016-09-22 11:28 /user/training/input
drwxr-xr-x   - training supergroup          0 2016-09-22 11:28 /user/training/outpt
drwxr-xr-x   - training supergroup          0 2016-09-25 08:38 /user/training/wcs
[training@localhost ~]$ hadoop fs -ls /user/training/wcs
Found 3 items
-rw-r--r--   1 training supergroup         15 2016-09-25 08:36 /user/training/wcs/input.txt
drwxr-xr-x   - training supergroup          0 2016-09-25 08:37 /user/training/wcs/ouput
drwxr-xr-x   - training supergroup          0 2016-09-25 08:39 /user/training/wcs/output
[training@localhost ~]$ hadoop fs -cat /user/training/wcs/input.txt
hi
i
f
dd
dd
f
[training@localhost ~]$ hadoop fs -ls /user/training/wcs/output
Found 3 items
-rw-r--r--   1 training supergroup          0 2016-09-25 08:39 /user/training/wcs/output/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-09-25 08:38 /user/training/wcs/output/_logs
-rw-r--r--   1 training supergroup         18 2016-09-25 08:39 /user/training/wcs/output/part-00000
[training@localhost ~]$ hadoop fs -cat /user/training/wcs/output/_SUCCESS
[training@localhost ~]$ hadoop fs -cat /user/training/wcs/output/part-00000
dd      2
f       2
hi      1
i       1
[training@localhost ~]$
[training@localhost ~]$
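
Some of the counters in the log above can be checked by hand against the six input lines. The sketch below is only an arithmetic check, not Hadoop code. It assumes that each output row is written as word, tab, count, newline, and that a short Text key serializes as a one-byte length prefix plus its bytes while an IntWritable takes four bytes. CounterCheckSketch and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Arithmetic check of three counters from the job log; no Hadoop classes are used.
class CounterCheckSketch {

  // The six lines of input.txt and the four rows of part-00000 from the session.
  static final String[] INPUT_LINES = {"hi", "i", "f", "dd", "dd", "f"};
  static final String[][] OUTPUT_ROWS = {{"dd", "2"}, {"f", "2"}, {"hi", "1"}, {"i", "1"}};

  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  // Every input line is stored with one trailing newline byte.
  static int inputBytes() {
    int total = 0;
    for (String line : INPUT_LINES) {
      total += utf8Length(line) + 1;
    }
    return total;
  }

  // Assumption: each output row is written as word, tab, count, newline.
  static int outputBytes() {
    int total = 0;
    for (String[] row : OUTPUT_ROWS) {
      total += utf8Length(row[0]) + 1 + utf8Length(row[1]) + 1;
    }
    return total;
  }

  // Each input line holds exactly one word, so the mapper emits one record
  // per line. Assumption: key = 1-byte length prefix + UTF-8 bytes,
  // value = 4 bytes.
  static int mapOutputBytes() {
    int total = 0;
    for (String word : INPUT_LINES) {
      total += 1 + utf8Length(word) + 4;
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println("Map input bytes    = " + inputBytes());     // log reports 15
    System.out.println("Map output bytes   = " + mapOutputBytes()); // log reports 39
    System.out.println("HDFS_BYTES_WRITTEN = " + outputBytes());    // log reports 18
  }
}
```

Under these assumptions the three sums come out to 15, 39, and 18, the same values the job log reports, and 15 and 18 are also the file sizes that hadoop fs -ls shows for input.txt and part-00000.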






Brief introduction to how Hadoop MapReduce works:


A Hadoop MapReduce program processes and reduces the data you submit to HDFS from the local file system.

First, create a file in the local file system.
Step 1:
Create a local file

$ cat > myfile.txt
this is my file
this is first file
my file
ctrl+d


This creates myfile.txt in your Linux local file system.

Step 2:

Load the file into HDFS

Create a directory in HDFS:

$ hadoop fs -mkdir mydir

Place your local file into this directory:

$ hadoop fs -put myfile.txt mydir


Step 3:

Execute the MapReduce program you built in Eclipse:


$ hadoop jar wordcount.jar WordCount mydir/myfile.txt mydir/out


Once the above command runs, the MapReduce program executes and writes its output to the out directory.



$ hadoop fs -cat /user/training/mydir/out/part-00000


file    3
first   1
is      2
my      2
this    2
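
The whole flow for myfile.txt (map, group by key, reduce) can be sketched in plain Java to work out what counts to expect. This is an illustration only: WordCountSketch is a hypothetical name, and a TreeMap is assumed here as a stand-in for the sorted grouping that Hadoop performs between the map and reduce phases.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process imitation of the word count job; no Hadoop classes are used.
class WordCountSketch {

  // Splits each line on single spaces (as WordMapper does) and keeps a
  // running total per word (as SumReducer does). The TreeMap keeps the
  // words in sorted order.
  static Map<String, Integer> countWords(String[] lines) {
    Map<String, Integer> counts = new TreeMap<String, Integer>();
    for (String line : lines) {
      for (String word : line.split(" ")) {
        if (word.length() > 0) {
          Integer previous = counts.get(word);
          counts.put(word, previous == null ? 1 : previous + 1);
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    String[] myfile = {"this is my file", "this is first file", "my file"};
    // Prints one "word<TAB>count" row per distinct word.
    for (Map.Entry<String, Integer> entry : countWords(myfile).entrySet()) {
      System.out.println(entry.getKey() + "\t" + entry.getValue());
    }
  }
}
```

For the three lines of myfile.txt this gives file 3, first 1, is 2, my 2, this 2.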



You should write the analytic MapReduce program in Java.
