Saturday 11 February 2017

Practical key-value pairs

An analysis of a MapReduce program run without a mapper and without a reducer, to study what exactly happens at each stage:

The primary input data I sent to HDFS:

[training@localhost ~]$ hadoop fs -cat qaz/sample.txt
aaa bb
dd dd
dd ff
ff gg

I ran my MapReduce program with the mapper and reducer effectively eliminated, by setting them to the identity mapper and the identity reducer, and got the following output:

[training@localhost ~]$ hadoop fs -cat qaz/mapop/part-00000  [ remove mapper and reducer]
0       aaa bb
7       aa
10      aa
13      bbb
17      cc
20      dd dd
26      dd ff
32      cc
35      dd
38      ff gg
44      gg
47      dd

If you look at the output above, the data stored in the Hadoop cluster is divided into blocks and read one record at a time: the key of each record is the byte offset at which the line starts in the file, and the value is the entire line.
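The byte-offset keys can be reproduced with a few lines of plain Python. This is only a sketch, not Hadoop code: the function name `records_with_offsets` is made up for illustration, the list of lines is taken from the values in the job output above, and it assumes each line ends with a single newline byte.

```python
# Lines taken from the values column of the identity-mapper output above.
lines = ["aaa bb", "aa", "aa", "bbb", "cc", "dd dd",
         "dd ff", "cc", "dd", "ff gg", "gg", "dd"]

def records_with_offsets(lines):
    # Hypothetical helper: yields (byte_offset, line) pairs,
    # assuming every line is terminated by exactly one newline byte.
    offset = 0
    for line in lines:
        yield offset, line
        offset += len(line.encode("utf-8")) + 1  # +1 for the newline

records = list(records_with_offsets(lines))
for offset, line in records:
    print(f"{offset}\t{line}")
```

The first line "aaa bb" is 6 bytes plus a newline, so the second record starts at offset 7, which agrees with the keys in the listing.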

[training@localhost ~]$ hadoop fs -cat qaz/redop/part-00000
aa      1
aa      1
aaa     1
bb      1
bbb     1
cc      1
cc      1
dd      1
dd      1
dd      1
dd      1
dd      1
ff      1
ff      1
gg      1
gg      1

In the output above only the reducer has been eliminated (it is the identity reducer); the mapper is in place. The mapping program runs against the input data blocks on the nodes that store them, and after shuffling and sorting, its output arrives as key-value pairs, ordered by key, ready for the reducer program.
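The map, shuffle and sort stage can be imitated in plain Python as well. The actual mapper code is not shown in this post, so `word_mapper` below is a hypothetical stand-in that assumes the mapper splits each line on whitespace and emits (word, 1); sorting the pairs stands in for the shuffle and sort that Hadoop does between map and reduce.

```python
# Lines taken from the values column of the identity-mapper output above.
lines = ["aaa bb", "aa", "aa", "bbb", "cc", "dd dd",
         "dd ff", "cc", "dd", "ff gg", "gg", "dd"]

def word_mapper(offset, line):
    # Hypothetical mapper: ignores the byte-offset key and
    # emits one (word, 1) pair per whitespace-separated word.
    for word in line.split():
        yield word, 1

map_output = []
offset = 0
for line in lines:
    map_output.extend(word_mapper(offset, line))
    offset += len(line) + 1  # next line starts after the newline

# Stand-in for shuffle and sort: order the pairs by key.
shuffled = sorted(map_output)
for word, count in shuffled:
    print(f"{word}\t{count}")
```

With an identity reducer nothing is merged, so every word occurrence stays on its own line with a count of 1, as in the listing above.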

[training@localhost ~]$ hadoop fs -cat qaz/op1/part-00000
aa      2
aaa     1
bb      1
bbb     1
cc      2
dd      5
ff      2
gg      2

Finally, the reducer program groups the values that share the same key and reduces each group to a single total, which gives the final output.
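The reduce step can be sketched the same way. This is an illustration under assumptions rather than the job's real reducer: `sum_reducer` is a made-up name, and the input list simply restates the sorted (word, 1) pairs from the previous listing.

```python
from itertools import groupby
from operator import itemgetter

# Sorted (word, 1) pairs, as listed in the mapper-only output above.
sorted_pairs = (
    [("aa", 1)] * 2 + [("aaa", 1), ("bb", 1), ("bbb", 1)]
    + [("cc", 1)] * 2 + [("dd", 1)] * 5
    + [("ff", 1)] * 2 + [("gg", 1)] * 2
)

def sum_reducer(word, counts):
    # Hypothetical reducer: adds up all counts received for one key.
    return word, sum(counts)

# groupby relies on the pairs already being sorted by key,
# which is what the shuffle and sort phase guarantees.
results = [
    sum_reducer(word, (count for _, count in group))
    for word, group in groupby(sorted_pairs, key=itemgetter(0))
]
for word, total in results:
    print(f"{word}\t{total}")
```

Grouping only works here because equal keys are adjacent after sorting; that is the reason the framework sorts the map output before handing it to the reducer.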

