First, boot the vmware1.6.vmo image in VMware Workstation; it opens the Cloudera CentOS guest OS as shown below.
Open the Eclipse IDE and build the project (Project → Build Project); you can see it in the Project Explorer.
Create the class files for WordCount: the driver code, mapper code, and reducer code, as shown below.
After that, export your project as a JAR file:
Project folder → Export → JAR file → name of the JAR file
Driver code:
WordCount.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class WordCount extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // Expect exactly two arguments: the input directory and the output directory.
    if (args.length != 2) {
      System.out.printf("Usage: %s [generic options] <input dir> <output dir>\n",
          getClass().getSimpleName());
      ToolRunner.printGenericCommandUsage(System.out);
      return -1;
    }

    // Configure the job: name, input/output paths, mapper/reducer classes,
    // and the key/value types produced by the map and reduce phases.
    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName(this.getClass().getName());

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    conf.setMapperClass(WordMapper.class);
    conf.setReducerClass(SumReducer.class);

    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(IntWritable.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    // Submit the job and wait for it to complete.
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new WordCount(), args);
    System.exit(exitCode);
  }
}
Mapper code:
WordMapper.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class WordMapper extends MapReduceBase implements
    Mapper<LongWritable, Text, Text, IntWritable> {

  @Override
  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Split each input line into words and emit (word, 1) for every word.
    String s = value.toString();
    for (String word : s.split(" ")) {
      if (word.length() > 0) {
        output.collect(new Text(word), new IntWritable(1));
      }
    }
  }
}
// Example: for the input lines
//   hi
//   this is is arshia
// the mapper emits (hi, 1), (this, 1), (is, 1), (is, 1), (arshia, 1)
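To see this map step on its own, here is a minimal plain-Java sketch (no Hadoop required; the class name MapStepSketch and the sample lines are just illustrative assumptions) that prints the (word, 1) pairs the mapper would emit:

// Standalone sketch of the map step: split each line on spaces and emit (word, 1) pairs.
public class MapStepSketch {
    public static void main(String[] args) {
        String[] lines = { "hi", "this is is arshia" };   // sample input lines from the example above
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (word.length() > 0) {
                    // This is what output.collect(new Text(word), new IntWritable(1)) would emit.
                    System.out.println("(" + word + ", 1)");
                }
            }
        }
    }
}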
Reducer code:
SumReducer.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class SumReducer extends MapReduceBase implements
    Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Sum all the 1s emitted for this word and emit (word, total count).
    int wordCount = 0;
    while (values.hasNext()) {
      IntWritable value = values.next();
      wordCount += value.get();
    }
    output.collect(key, new IntWritable(wordCount));
  }
}
// Example: after the shuffle, the reducer receives the grouped pairs
//   hi -> [1]
//   is -> [1, 1]
// and emits the sums: (hi, 1), (is, 2)
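The shuffle-and-sum step can be sketched the same way in plain Java (again only an illustration; the class name ReduceStepSketch and the sample words are assumed from the mapper example above):

import java.util.LinkedHashMap;
import java.util.Map;

// Standalone sketch of the shuffle + reduce step: group (word, 1) pairs by word and sum them.
public class ReduceStepSketch {
    public static void main(String[] args) {
        String[] mapOutputWords = { "hi", "this", "is", "is", "arshia" }; // keys emitted by the map step
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : mapOutputWords) {
            counts.merge(word, 1, Integer::sum);   // equivalent of summing the [1, 1, ...] list per key
        }
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}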
Next, add the required JAR files:
Src → Build Path → Configure Build Path → Add External JARs
The JAR files required to support the driver, mapper, and reducer are:
commons-cli-1.2.jar              /hadoop/lib
commons-codec-1.4.jar            /hadoop/lib
commons-daemon-1.0.1.jar         /hadoop/lib
commons-el-1.0.jar               /hadoop/lib
commons-httpclient-3.1.jar       /hadoop/lib
commons-logging-1.0.4.jar        /hadoop/lib
commons-logging-api-1.0.4.jar    /hadoop/lib
commons-net-1.4.1.jar            /hadoop/lib
hadoop-0.20.2-cdh3u2-core.jar    /hadoop
hadoop-core.jar                  /hadoop
jackson-core-asl-1.5.2.jar       /hadoop/lib
jackson-mapper-asl-1.5.2.jar     /hadoop/lib
log4j-1.2.15.jar                 /hadoop/lib
Add all these JAR files to your build path; your program then executes as follows.
Make an input directory for your program:
[training@localhost ~]$ hadoop fs -mkdir wcs
[training@localhost ~]$ cat > input.txt
hi
i
f
dd
dd
f
[training@localhost ~]$ hadoop fs -put input.txt wcs
[training@localhost ~]$ hadoop fs -mkdir wcs/ouput
[training@localhost ~]$ hadoop jar wc.jar WordCount wcs/input.txt wcs/output
16/09/25 08:38:49 WARN snappy.LoadSnappy: Snappy native library is available
16/09/25 08:38:49 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/09/25 08:38:49 INFO snappy.LoadSnappy: Snappy native library loaded
16/09/25 08:38:49 INFO mapred.FileInputFormat: Total input paths to process : 1
16/09/25 08:38:50 INFO mapred.JobClient: Running job: job_201609250806_0001
16/09/25 08:38:51 INFO mapred.JobClient: map 0% reduce 0%
16/09/25 08:39:00 INFO mapred.JobClient: map 66% reduce 0%
16/09/25 08:39:04 INFO mapred.JobClient: map 100% reduce 0%
16/09/25 08:39:13 INFO mapred.JobClient: map 100% reduce 100%
16/09/25 08:39:14 INFO mapred.JobClient: Job complete: job_201609250806_0001
16/09/25 08:39:14 INFO mapred.JobClient: Counters: 23
16/09/25 08:39:14 INFO mapred.JobClient: Job Counters
16/09/25 08:39:14 INFO mapred.JobClient: Launched reduce tasks=1
16/09/25 08:39:14 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18336
16/09/25 08:39:14 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/09/25 08:39:14 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/09/25 08:39:14 INFO mapred.JobClient: Launched map tasks=3
16/09/25 08:39:14 INFO mapred.JobClient: Data-local map tasks=3
16/09/25 08:39:14 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=13372
16/09/25 08:39:14 INFO mapred.JobClient: FileSystemCounters
16/09/25 08:39:14 INFO mapred.JobClient: FILE_BYTES_READ=57
16/09/25 08:39:14 INFO mapred.JobClient: HDFS_BYTES_READ=317
16/09/25 08:39:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=219760
16/09/25 08:39:14 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=18
16/09/25 08:39:14 INFO mapred.JobClient: Map-Reduce Framework
16/09/25 08:39:14 INFO mapred.JobClient: Reduce input groups=4
16/09/25 08:39:14 INFO mapred.JobClient: Combine output records=0
16/09/25 08:39:14 INFO mapred.JobClient: Map input records=6
16/09/25 08:39:14 INFO mapred.JobClient: Reduce shuffle bytes=69
16/09/25 08:39:14 INFO mapred.JobClient: Reduce output records=4
16/09/25 08:39:14 INFO mapred.JobClient: Spilled Records=12
16/09/25 08:39:14 INFO mapred.JobClient: Map output bytes=39
16/09/25 08:39:14 INFO mapred.JobClient: Map input bytes=15
16/09/25 08:39:14 INFO mapred.JobClient: Combine input records=0
16/09/25 08:39:14 INFO mapred.JobClient: Map output records=6
16/09/25 08:39:14 INFO mapred.JobClient: SPLIT_RAW_BYTES=291
16/09/25 08:39:14 INFO mapred.JobClient: Reduce input records=6
[training@localhost ~]$
[training@localhost ~]$ hadoop fs -ls
Found 8 items
drwxr-xr-x   - training supergroup          0 2016-09-23 10:48 /user/training/INPUT
-rw-r--r--   1 training supergroup         49 2016-05-22 16:50 /user/training/hadoop1
drwxr-xr-x   - training supergroup          0 2016-05-22 17:02 /user/training/hadoop1_out
drwxr-xr-x   - training supergroup          0 2016-05-12 17:01 /user/training/hadoop_China
drwxr-xr-x   - training supergroup          0 2016-09-23 10:48 /user/training/hdir
drwxr-xr-x   - training supergroup          0 2016-09-22 11:28 /user/training/input
drwxr-xr-x   - training supergroup          0 2016-09-22 11:28 /user/training/outpt
drwxr-xr-x   - training supergroup          0 2016-09-25 08:38 /user/training/wcs
[training@localhost ~]$ hadoop fs -ls /user/training/wcs
Found 3 items
-rw-r--r--   1 training supergroup         15 2016-09-25 08:36 /user/training/wcs/input.txt
drwxr-xr-x   - training supergroup          0 2016-09-25 08:37 /user/training/wcs/ouput
drwxr-xr-x   - training supergroup          0 2016-09-25 08:39 /user/training/wcs/output
[training@localhost ~]$ hadoop fs -cat /user/training/wcs/input.txt
hi
i
f
dd
dd
f
[training@localhost ~]$ hadoop fs -ls /user/training/wcs/output
Found 3 items
-rw-r--r--   1 training supergroup          0 2016-09-25 08:39 /user/training/wcs/output/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-09-25 08:38 /user/training/wcs/output/_logs
-rw-r--r--   1 training supergroup         18 2016-09-25 08:39 /user/training/wcs/output/part-00000
[training@localhost ~]$ hadoop fs -cat /user/training/wcs/output/_SUCCESS
[training@localhost ~]$ hadoop fs -cat /user/training/wcs/output/part-00000
dd 2
f 2
hi 1
i 1
[training@localhost ~]$
[training@localhost ~]$
A Hadoop MapReduce program processes the data you submit to HDFS from the local file system.
First, create a file in the local file system:
Step 1: create a local file
cat > myfile.txt
this is my file
this is first file
my file
ctrl+d
This creates myfile.txt in your local Linux file system.
Step 2: load it into HDFS
Create a directory in HDFS:
$ hadoop fs -mkdir mydir
Place your local file into this directory:
$ hadoop fs -put myfile.txt mydir
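If you prefer to do these two HDFS operations from Java instead of the shell, a minimal sketch using Hadoop's FileSystem API would look roughly like this (the class name HdfsLoadSketch and the paths are only illustrative assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch: create an HDFS directory and copy a local file into it,
// i.e. the programmatic equivalent of "hadoop fs -mkdir mydir" and "hadoop fs -put myfile.txt mydir".
public class HdfsLoadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);       // handle to the default (HDFS) file system

        Path dir = new Path("mydir");               // assumed HDFS directory, relative to the user's home directory
        fs.mkdirs(dir);                             // hadoop fs -mkdir mydir

        Path local = new Path("myfile.txt");        // assumed local file created in step 1
        fs.copyFromLocalFile(local, dir);           // hadoop fs -put myfile.txt mydir

        fs.close();
    }
}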
Step 3: execute the MapReduce program you built in Eclipse
$ hadoop jar wordcount.jar WordCount mydir/myfile.txt mydir/out
Once the above command runs, the MapReduce job executes and writes its output to the out directory:
$ hadoop fs -cat /user/training/mydir/out/part-00000
this 2
is 2
my 2
file 3
The analytic MapReduce program should be written in Java.