First, boot the vmware1.6.vmo image in VMware Workstation; it opens the Cloudera CentOS guest OS as shown below.
Open the Eclipse IDE and build the project (Project → Build Project); you can see it in the Project Explorer.
Create the class files for WordCount: the driver code, mapper code, and reducer code, as shown below.
After that, export your project as a JAR file:
Project folder → Export → JAR file → name of the JAR file
Driver code:
WordCount.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class WordCount extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // Expect exactly two arguments: the input directory and the output directory.
    if (args.length != 2) {
      System.out.printf("Usage: %s [generic options] <input dir> <output dir>\n",
          getClass().getSimpleName());
      ToolRunner.printGenericCommandUsage(System.out);
      return -1;
    }

    // Configure the job: name, input/output paths, mapper/reducer classes,
    // and the key/value types produced by the map and reduce phases.
    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName(this.getClass().getName());

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    conf.setMapperClass(WordMapper.class);
    conf.setReducerClass(SumReducer.class);

    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(IntWritable.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    // Submit the job and wait for it to complete.
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new WordCount(), args);
    System.exit(exitCode);
  }
}
Mapper code:
WordMapper.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class WordMapper extends MapReduceBase implements
    Mapper<LongWritable, Text, Text, IntWritable> {

  @Override
  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Split each input line into words and emit (word, 1) for every word.
    String s = value.toString();
    for (String word : s.split(" ")) {
      if (word.length() > 0) {
        output.collect(new Text(word), new IntWritable(1));
      }
    }
  }
}
// Example: for the input lines
//   hi
//   this is is arshia
// the mapper emits (hi, 1), (this, 1), (is, 1), (is, 1), (arshia, 1)
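To see this map step on its own, here is a minimal plain-Java sketch (no Hadoop required; the class name MapStepSketch and the sample lines are just illustrative assumptions) that prints the (word, 1) pairs the mapper would emit:

// Standalone sketch of the map step: split each line on spaces and emit (word, 1) pairs.
public class MapStepSketch {
    public static void main(String[] args) {
        String[] lines = { "hi", "this is is arshia" };   // sample input lines from the example above
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (word.length() > 0) {
                    // This is what output.collect(new Text(word), new IntWritable(1)) would emit.
                    System.out.println("(" + word + ", 1)");
                }
            }
        }
    }
}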
Reducer code:
SumReducer.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class SumReducer extends MapReduceBase implements
    Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Sum all the 1s emitted for this word and emit (word, total count).
    int wordCount = 0;
    while (values.hasNext()) {
      IntWritable value = values.next();
      wordCount += value.get();
    }
    output.collect(key, new IntWritable(wordCount));
  }
}
// Example: after the shuffle, the reducer receives the grouped pairs
//   hi -> [1]
//   is -> [1, 1]
// and emits the sums: (hi, 1), (is, 2)
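The shuffle-and-sum step can be sketched the same way in plain Java (again only an illustration; the class name ReduceStepSketch and the sample words are assumed from the mapper example above):

import java.util.LinkedHashMap;
import java.util.Map;

// Standalone sketch of the shuffle + reduce step: group (word, 1) pairs by word and sum them.
public class ReduceStepSketch {
    public static void main(String[] args) {
        String[] mapOutputWords = { "hi", "this", "is", "is", "arshia" }; // keys emitted by the map step
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : mapOutputWords) {
            counts.merge(word, 1, Integer::sum);   // equivalent of summing the [1, 1, ...] list per key
        }
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}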
Next, add the required JAR files:
Src → Build Path → Configure Build Path → Add External JARs
The JAR files required to support the driver, mapper, and reducer are:
commons-cli-1.2.jar              /hadoop/lib
commons-codec-1.4.jar            /hadoop/lib
commons-daemon-1.0.1.jar         /hadoop/lib
commons-el-1.0.jar               /hadoop/lib
commons-httpclient-3.1.jar       /hadoop/lib
commons-logging-1.0.4.jar        /hadoop/lib
commons-logging-api-1.0.4.jar    /hadoop/lib
commons-net-1.4.1.jar            /hadoop/lib
hadoop-0.20.2-cdh3u2-core.jar    /hadoop
hadoop-core.jar                  /hadoop
jackson-core-asl-1.5.2.jar       /hadoop/lib
jackson-mapper-asl-1.5.2.jar     /hadoop/lib
log4j-1.2.15.jar                 /hadoop/lib
Add all these JAR files to your build path; your program then executes as follows.
Make an input directory for your program:
[training@localhost ~]$ hadoop fs -mkdir wcs
[training@localhost ~]$ cat > input.txt
hi
i
f
dd
dd
f
[training@localhost ~]$ hadoop fs -put input.txt wcs
[training@localhost ~]$ hadoop fs -mkdir wcs/ouput
[training@localhost ~]$ hadoop jar wc.jar WordCount wcs/input.txt wcs/output
16/09/25 08:38:49 WARN snappy.LoadSnappy: Snappy native library is available
16/09/25 08:38:49 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/09/25 08:38:49 INFO snappy.LoadSnappy: Snappy native library loaded
16/09/25 08:38:49 INFO mapred.FileInputFormat: Total input paths to process : 1
16/09/25 08:38:50 INFO mapred.JobClient: Running job: job_201609250806_0001
16/09/25 08:38:51 INFO mapred.JobClient: map 0% reduce 0%
16/09/25 08:39:00 INFO mapred.JobClient: map 66% reduce 0%
16/09/25 08:39:04 INFO mapred.JobClient: map 100% reduce 0%
16/09/25 08:39:13 INFO mapred.JobClient: map 100% reduce 100%
16/09/25 08:39:14 INFO mapred.JobClient: Job complete: job_201609250806_0001
16/09/25 08:39:14 INFO mapred.JobClient: Counters: 23
16/09/25 08:39:14 INFO mapred.JobClient: Job Counters
16/09/25 08:39:14 INFO mapred.JobClient: Launched reduce tasks=1
16/09/25 08:39:14 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18336
16/09/25 08:39:14 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/09/25 08:39:14 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/09/25 08:39:14 INFO mapred.JobClient: Launched map tasks=3
16/09/25 08:39:14 INFO mapred.JobClient: Data-local map tasks=3
16/09/25 08:39:14 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=13372
16/09/25 08:39:14 INFO mapred.JobClient: FileSystemCounters
16/09/25 08:39:14 INFO mapred.JobClient: FILE_BYTES_READ=57
16/09/25 08:39:14 INFO mapred.JobClient: HDFS_BYTES_READ=317
16/09/25 08:39:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=219760
16/09/25 08:39:14 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=18
16/09/25 08:39:14 INFO mapred.JobClient: Map-Reduce Framework
16/09/25 08:39:14 INFO mapred.JobClient: Reduce input groups=4
16/09/25 08:39:14 INFO mapred.JobClient: Combine output records=0
16/09/25 08:39:14 INFO mapred.JobClient: Map input records=6
16/09/25 08:39:14 INFO mapred.JobClient: Reduce shuffle bytes=69
16/09/25 08:39:14 INFO mapred.JobClient: Reduce output records=4
16/09/25 08:39:14 INFO mapred.JobClient: Spilled Records=12
16/09/25 08:39:14 INFO mapred.JobClient: Map output bytes=39
16/09/25 08:39:14 INFO mapred.JobClient: Map input bytes=15
16/09/25 08:39:14 INFO mapred.JobClient: Combine input records=0
16/09/25 08:39:14 INFO mapred.JobClient: Map output records=6
16/09/25 08:39:14 INFO mapred.JobClient: SPLIT_RAW_BYTES=291
16/09/25 08:39:14 INFO mapred.JobClient: Reduce input records=6
[training@localhost ~]$
[training@localhost ~]$ hadoop fs -ls
Found 8 items
drwxr-xr-x   - training supergroup          0 2016-09-23 10:48 /user/training/INPUT
-rw-r--r--   1 training supergroup         49 2016-05-22 16:50 /user/training/hadoop1
drwxr-xr-x   - training supergroup          0 2016-05-22 17:02 /user/training/hadoop1_out
drwxr-xr-x   - training supergroup          0 2016-05-12 17:01 /user/training/hadoop_China
drwxr-xr-x   - training supergroup          0 2016-09-23 10:48 /user/training/hdir
drwxr-xr-x   - training supergroup          0 2016-09-22 11:28 /user/training/input
drwxr-xr-x   - training supergroup          0 2016-09-22 11:28 /user/training/outpt
drwxr-xr-x   - training supergroup          0 2016-09-25 08:38 /user/training/wcs
[training@localhost ~]$ hadoop fs -ls /user/training/wcs
Found 3 items
-rw-r--r--   1 training supergroup         15 2016-09-25 08:36 /user/training/wcs/input.txt
drwxr-xr-x   - training supergroup          0 2016-09-25 08:37 /user/training/wcs/ouput
drwxr-xr-x   - training supergroup          0 2016-09-25 08:39 /user/training/wcs/output
[training@localhost ~]$ hadoop fs -cat /user/training/wcs/input.txt
hi
i
f
dd
dd
f
[training@localhost ~]$ hadoop fs -ls /user/training/wcs/output
Found 3 items
-rw-r--r--   1 training supergroup          0 2016-09-25 08:39 /user/training/wcs/output/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-09-25 08:38 /user/training/wcs/output/_logs
-rw-r--r--   1 training supergroup         18 2016-09-25 08:39 /user/training/wcs/output/part-00000
[training@localhost ~]$ hadoop fs -cat /user/training/wcs/output/_SUCCESS
[training@localhost ~]$ hadoop fs -cat /user/training/wcs/output/part-00000
dd 2
f 2
hi 1
i 1
[training@localhost ~]$
[training@localhost ~]$
A Hadoop MapReduce program processes the data you submit to HDFS from the local file system.
First, create a file in the local file system:
Step 1: create a local file
cat > myfile.txt
this is my file
this is first file
my file
ctrl+d
This creates myfile.txt in your local Linux file system.
Step 2: load it into HDFS
Create a directory in HDFS:
$ hadoop fs -mkdir mydir
Place your local file into this directory:
$ hadoop fs -put myfile.txt mydir
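If you prefer to do these two HDFS operations from Java instead of the shell, a minimal sketch using Hadoop's FileSystem API would look roughly like this (the class name HdfsLoadSketch and the paths are only illustrative assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch: create an HDFS directory and copy a local file into it,
// i.e. the programmatic equivalent of "hadoop fs -mkdir mydir" and "hadoop fs -put myfile.txt mydir".
public class HdfsLoadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);       // handle to the default (HDFS) file system

        Path dir = new Path("mydir");               // assumed HDFS directory, relative to the user's home directory
        fs.mkdirs(dir);                             // hadoop fs -mkdir mydir

        Path local = new Path("myfile.txt");        // assumed local file created in step 1
        fs.copyFromLocalFile(local, dir);           // hadoop fs -put myfile.txt mydir

        fs.close();
    }
}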
Step 3: execute the MapReduce program you built in Eclipse
$ hadoop jar wordcount.jar WordCount mydir/myfile.txt mydir/out
Once the above command runs, the MapReduce job executes and writes its output to the out directory:
$ hadoop fs -cat /user/training/mydir/out/part-00000
this 2
is 2
my 2
file 3
The analytic MapReduce program should be written in Java.