Hadoop learning pot: practical key value pairs

Saturday, 11 February 2017

practical key value pairs

Analysis of a MapReduce program run without a mapper and without a reducer, to study what exactly happens at each stage:

The input data I loaded into HDFS:

[training@localhost ~]$ hadoop fs -cat qaz/sample.txt
aaa bb
aa
aa
bbb
cc
dd dd
dd ff
cc
dd
ff gg
gg
dd

I ran my MapReduce program with the mapper and reducer eliminated, by setting them to the identity mapper and identity reducer, and got the following output:

[training@localhost ~]$ hadoop fs -cat qaz/mapop/part-00000
0 aaa bb
7 aa
10 aa
13 bbb
17 cc
20 dd dd
26 dd ff
32 cc
35 dd
38 ff gg
44 gg
47 dd

If you look at the output above, it shows how the data stored in the Hadoop cluster, which is divided into blocks, is read record by record: for each record the key is the byte offset at which the line starts in the file, and the value is the entire line.
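The offsets can be checked by hand. Below is a minimal Python sketch, not Hadoop code, that mimics how line-oriented input turns a file into (byte offset, line) records. It assumes every line in sample.txt ends with a single newline byte, and `read_records` is a made-up helper name used only for illustration.

```python
# Sketch only: mimics (byte offset, line) records for a small text file.
# read_records is a hypothetical helper, not part of any Hadoop API.

# Assumed byte content of qaz/sample.txt (one newline after each line).
SAMPLE = b"aaa bb\naa\naa\nbbb\ncc\ndd dd\ndd ff\ncc\ndd\nff gg\ngg\ndd\n"

def read_records(data: bytes):
    """Yield (offset, line) pairs; offset is the byte position of the line start."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\n").decode("utf-8")
        offset += len(raw)  # advance past the line and its newline

records = list(read_records(SAMPLE))
for key, value in records:
    print(key, value)
```

For example, "aaa bb" plus its newline takes 7 bytes, so the second line starts at offset 7, which matches the listing above.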

[training@localhost ~]$ hadoop fs -cat qaz/redop/part-00000
aa 1
aa 1
aaa 1
bb 1
bbb 1
cc 1
cc 1
dd 1
dd 1
dd 1
dd 1
dd 1
ff 1
ff 1
gg 1
gg 1

In the above output the reducer is eliminated and only the mapper is active. The input data blocks are processed by the mapping program on the worker nodes, and after sorting and shuffling the map output is passed on as key-value pairs to the reducer program.
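To make this stage concrete, here is a small Python sketch of a word-count style map step followed by a sort on the key. It is only an illustration under my own assumptions: the real job runs inside Hadoop, and `map_line` is a hypothetical function name, not the actual mapper class from my program.

```python
# Sketch only: a word-count map step followed by a sort on the key,
# which stands in for the shuffle/sort phase. Not Hadoop code.

LINES = ["aaa bb", "aa", "aa", "bbb", "cc", "dd dd",
         "dd ff", "cc", "dd", "ff gg", "gg", "dd"]

def map_line(line: str):
    """Emit one (word, 1) pair per whitespace-separated word."""
    return [(word, 1) for word in line.split()]

# Map every line, then sort the pairs by key, as happens before
# the pairs are handed to the reducer.
pairs = sorted(pair for line in LINES for pair in map_line(line))

for word, count in pairs:
    print(word, count)
```

The printed pairs come out in key order with a count of 1 each, in the same shape as the qaz/redop listing above.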

[training@localhost ~]$ hadoop fs -cat qaz/op1/part-00000
aa 2
aaa 1
bb 1
bbb 1
cc 2
dd 5
ff 2
gg 2

Finally, the reducer program groups the values that share the same key and reduces them to the final output: one count per word.
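The grouping and summing can be sketched in a few lines of Python. This is an illustration only, assuming the pairs arrive already sorted by key; `reduce_counts` is a hypothetical name and not the reducer class used in the job.

```python
from itertools import groupby

# Sorted (word, 1) pairs, as they reach the reducer in this example.
PAIRS = [("aa", 1), ("aa", 1), ("aaa", 1), ("bb", 1), ("bbb", 1),
         ("cc", 1), ("cc", 1), ("dd", 1), ("dd", 1), ("dd", 1),
         ("dd", 1), ("dd", 1), ("ff", 1), ("ff", 1), ("gg", 1), ("gg", 1)]

def reduce_counts(sorted_pairs):
    """Group consecutive pairs with the same key and sum their values."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(sorted_pairs, key=lambda kv: kv[0])]

totals = reduce_counts(PAIRS)
for word, total in totals:
    print(word, total)
```

Because groupby only merges neighbouring items, the sketch relies on the input being sorted by key, which is what the sort and shuffle step provides.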
