Step 1: Create Input File
Open an Ubuntu terminal and create the input file with nano:
nano input.txt
Add sample content:
hello world
hello hadoop
hello world
big data hadoop
Save and exit (in nano: Ctrl+O, Enter, then Ctrl+X).
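Alternatively, the same file can be created non-interactively with a here-document (a convenience sketch; nano works just as well):

```shell
# Write the four sample lines to input.txt in one command
cat > input.txt <<'EOF'
hello world
hello hadoop
hello world
big data hadoop
EOF
```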
Step 2: Create Mapper Script
nano mapper.py
mapper.py
#!/usr/bin/env python3
import sys

# Read lines from standard input and emit "word<TAB>1" for each token.
# Hadoop Streaming treats everything before the first tab as the key.
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print(f"{word}\t1")
Make it executable:
chmod +x mapper.py
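To see what the mapper emits before wiring it into a pipeline, the same logic can be exercised in plain Python. This is an illustrative sketch: io.StringIO stands in for sys.stdin, and the pairs list exists only so the result is easy to inspect.

```python
import io

# io.StringIO simulates the lines the mapper would read from stdin
fake_stdin = io.StringIO("hello world\nhello hadoop\n")

pairs = []
for line in fake_stdin:
    for word in line.strip().split():
        # Same "word<TAB>1" format the real mapper prints
        pairs.append(f"{word}\t1")

print("\n".join(pairs))
```

Note that the mapper emits one pair per occurrence; it does no counting itself. The counting happens in the reducer, after the pairs have been sorted by key.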
Step 3: Create Reducer Script
nano reducer.py
reducer.py
#!/usr/bin/env python3
import sys

current_word = None
current_count = 0

# Input arrives sorted by word, so all pairs for a given word are adjacent
for line in sys.stdin:
    line = line.strip()
    word, count = line.split("\t", 1)
    count = int(count)
    if word == current_word:
        current_count += count
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word = word
        current_count = count

# Flush the total for the last word
if current_word is not None:
    print(f"{current_word}\t{current_count}")
Make it executable:
chmod +x reducer.py
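The reducer only works because its input is sorted by key, so equal words arrive on consecutive lines; that ordering is supplied by the sort step locally and by Hadoop's shuffle-and-sort phase on the cluster. A small sketch of the same accumulation logic on pre-sorted pairs (the sorted_pairs list is made up for illustration):

```python
# Pre-sorted (word, count) pairs, as they would arrive after shuffle/sort
sorted_pairs = [("hadoop", 1), ("hello", 1), ("hello", 1), ("world", 1)]

totals = []
current_word, current_count = None, 0
for word, count in sorted_pairs:
    if word == current_word:
        # Same key as the previous line: keep accumulating
        current_count += count
    else:
        # New key: emit the finished total, then start a fresh count
        if current_word is not None:
            totals.append((current_word, current_count))
        current_word, current_count = word, count
if current_word is not None:
    totals.append((current_word, current_count))

print(totals)
```

If the input were not sorted, the two "hello" pairs could be separated and would each be emitted as a partial count, which is why the sort (or shuffle) step is not optional.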
Step 4: Test Locally (Without Hadoop)
cat input.txt | ./mapper.py | sort | ./reducer.py
The sort between the mapper and the reducer mimics Hadoop's shuffle-and-sort phase, which groups identical keys together before they reach the reducer.
Output:
big 1
data 1
hadoop 2
hello 3
world 2
✔ Works correctly.
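As a sanity check, the same counts can be reproduced in pure Python with collections.Counter (a cross-check sketch only, not part of the Hadoop pipeline):

```python
from collections import Counter

text = """hello world
hello hadoop
hello world
big data hadoop"""

# Count every whitespace-separated token, mirroring mapper + sort + reducer
counts = Counter(text.split())
for word in sorted(counts):
    print(f"{word}\t{counts[word]}")
```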
Step 5: Run Using Hadoop (Hadoop Streaming)
Create HDFS directory
hdfs dfs -mkdir /wordcount
Upload input file
hdfs dfs -put input.txt /wordcount
Run the Hadoop Streaming job. If a previous run left /wordcount_output behind, delete it first (hdfs dfs -rm -r /wordcount_output), since Hadoop refuses to write to an existing output directory.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming*.jar \
-input /wordcount/input.txt \
-output /wordcount_output \
-mapper mapper.py \
-reducer reducer.py \
-file mapper.py \
-file reducer.py
Step 6: View Output
hdfs dfs -cat /wordcount_output/part-00000
With more than one reducer the results are split across several part files; hdfs dfs -cat /wordcount_output/part-* prints them all.