Sunday, 1 March 2026

MapReduce on Weather Data with Hadoop on Ubuntu Using Python

 

Weather Data Mining using Hadoop Streaming


 Step 1: Create Input File

Open a terminal:

nano weather_data.txt

Add sample data:

2023-10-01,25,60,0
2023-10-02,30,70,5
2023-10-03,15,80,10
2023-10-04,10,90,15
2023-10-05,35,50,0

Format:

Date,Temperature,Humidity,Precipitation
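
To make the record layout concrete, here is a minimal sketch of how one line splits into the four fields. The names `WeatherRecord` and `parse_record` are made up for this illustration and are not part of the scripts in this post.

```python
from typing import NamedTuple


class WeatherRecord(NamedTuple):
    """One line of weather_data.txt (hypothetical helper type)."""
    date: str
    temperature: float
    humidity: float
    precipitation: float


def parse_record(line: str) -> WeatherRecord:
    """Split one CSV line into its four fields.

    Raises ValueError if the field count is wrong or a value is not numeric.
    """
    date, temp, humidity, precipitation = line.strip().split(",")
    return WeatherRecord(date, float(temp), float(humidity), float(precipitation))


record = parse_record("2023-10-02,30,70,5")
print(record.date, record.temperature, record.precipitation)
```

The mapper in Step 2 does the same split inline, so a line with a missing field or non-numeric value is simply skipped there.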

 Step 2: Create Mapper Script

nano mapper.py

 mapper.py

#!/usr/bin/env python3
import sys

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue

    try:
        date, temp, humidity, precipitation = line.split(",")

        temp = float(temp)
        humidity = float(humidity)
        precipitation = float(precipitation)
    except ValueError:
        # Skip malformed lines (wrong field count or non-numeric values)
        continue

    # Weather condition logic: the first matching rule wins
    if precipitation > 0:
        message = "Rainy day"
    elif temp >= 35:
        message = "Very Hot day"
    elif temp >= 30:
        message = "Hot day"
    elif temp <= 10:
        message = "Very Cold day"
    elif temp <= 15:
        message = "Cold day"
    elif humidity > 85:
        message = "Humid day"
    else:
        message = "Pleasant day"

    # Emit key<TAB>value, the format Hadoop Streaming expects
    print(f"{date}\t{message}")

Make executable:

chmod +x mapper.py
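
The order of the rules in the mapper matters: precipitation is checked first, so a cold or humid day with any rain is still labelled "Rainy day". The sketch below pulls the same rules into a function (the name `classify` is an assumption made for this example) so the precedence can be checked without Hadoop.

```python
def classify(temp: float, humidity: float, precipitation: float) -> str:
    """Return the weather label; rules are checked top to bottom, first match wins."""
    if precipitation > 0:
        return "Rainy day"
    if temp >= 35:
        return "Very Hot day"
    if temp >= 30:
        return "Hot day"
    if temp <= 10:
        return "Very Cold day"
    if temp <= 15:
        return "Cold day"
    if humidity > 85:
        return "Humid day"
    return "Pleasant day"


# Same temperature and humidity, with and without rain
print(classify(10, 90, 15))  # rain takes priority over the cold rule
print(classify(10, 90, 0))   # no rain, so the temperature rule applies
print(classify(20, 90, 0))   # mild temperature, so the humidity rule applies
```

For the sample row `2023-10-04,10,90,15` this is why the result is "Rainy day" and not "Very Cold day".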

 Step 3: Create Reducer Script

nano reducer.py

 reducer.py

#!/usr/bin/env python3
import sys

# Identity reducer: pass each non-empty line through unchanged
for line in sys.stdin:
    line = line.strip()
    if line:
        print(line)

Make executable:

chmod +x reducer.py

👉 Note: The reducer only passes lines through, because the classification is already done in the mapper.
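
For contrast, here is a rough sketch of what a reducer looks like when it does have work to do. It assumes a hypothetical variant of the mapper that emits the condition as the key with a count of 1 (for example `Rainy day<TAB>1`); that variant is not part of this post. Because the input arrives sorted by key, equal keys are adjacent and can be summed with `itertools.groupby`.

```python
from itertools import groupby


def count_by_key(sorted_lines):
    """Yield (key, total) for runs of consecutive key<TAB>count lines.

    Assumes the lines are already sorted by key, as reducer input is.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)


# Hypothetical sorted mapper output for the five sample days
sample = [
    "Pleasant day\t1\n",
    "Rainy day\t1\n",
    "Rainy day\t1\n",
    "Rainy day\t1\n",
    "Very Hot day\t1\n",
]

for key, total in count_by_key(sample):
    print(f"{key}\t{total}")
```

In a real streaming reducer, `sample` would be replaced by `sys.stdin`.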


 Step 4: Test Locally (Without Hadoop)

cat weather_data.txt | ./mapper.py | sort | ./reducer.py

 Expected Output

2023-10-01 Pleasant day
2023-10-02 Rainy day
2023-10-03 Rainy day
2023-10-04 Rainy day
2023-10-05 Very Hot day
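
The `sort` stage in the local pipeline stands in for Hadoop's shuffle, which delivers mapper output to the reducer ordered by key (the text before the first tab). A small sketch of that step, with the function name `shuffle_sort` and the scrambled input order made up for illustration:

```python
def shuffle_sort(lines):
    """Order key<TAB>value lines by key, roughly what `sort` does in the local test."""
    return sorted(lines, key=lambda line: line.split("\t", 1)[0])


# Mapper output in an arbitrary order, as it might arrive from several map tasks
mapper_output = [
    "2023-10-03\tRainy day",
    "2023-10-01\tPleasant day",
    "2023-10-05\tVery Hot day",
    "2023-10-02\tRainy day",
    "2023-10-04\tRainy day",
]

for line in shuffle_sort(mapper_output):
    print(line)
```

The keys here are ISO dates, so plain string sorting also puts them in chronological order. The shell `sort` compares whole lines instead of only the key, but since every date appears once the result is the same.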

 Step 5: Run in Hadoop Streaming

 Create HDFS directory

hdfs dfs -mkdir /weather

 Upload file

hdfs dfs -put weather_data.txt /weather

 Run Hadoop job

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming*.jar \
-files mapper.py,reducer.py \
-input /weather/weather_data.txt \
-output /weather_output \
-mapper mapper.py \
-reducer reducer.py

 Step 6: View Output

hdfs dfs -cat /weather_output/part-00000
