Weather Data Mining using Hadoop Streaming
Step 1: Create Input File
Open a terminal and create the input file:
nano weather_data.txt
Add sample data:
2023-10-01,25,60,0
2023-10-02,30,70,5
2023-10-03,15,80,10
2023-10-04,10,90,15
2023-10-05,35,50,0
Format:
Date,Temperature,Humidity,Precipitation
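Each line holds one comma-separated record with four fields. The following is a minimal sketch that expresses this format as a typed parser. The `WeatherRecord` and `parse_record` names are hypothetical and not part of the tutorial. Blank lines, a header row, and lines with the wrong field count are rejected by returning `None`:

```python
from typing import NamedTuple, Optional


class WeatherRecord(NamedTuple):
    """One row of weather_data.txt: Date,Temperature,Humidity,Precipitation."""
    date: str
    temperature: float
    humidity: float
    precipitation: float


def parse_record(line: str) -> Optional[WeatherRecord]:
    """Parse one CSV line; return None for blank, header, or malformed lines."""
    parts = line.strip().split(",")
    if len(parts) != 4:
        return None
    date, temp, humidity, precip = parts
    try:
        return WeatherRecord(date, float(temp), float(humidity), float(precip))
    except ValueError:
        # e.g. the header "Date,Temperature,..." cannot be converted to float
        return None


print(parse_record("2023-10-02,30,70,5"))
# → WeatherRecord(date='2023-10-02', temperature=30.0, humidity=70.0, precipitation=5.0)
```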
Step 2: Create Mapper Script
nano mapper.py
mapper.py
#!/usr/bin/env python3
import sys

# Read CSV lines from standard input and emit "date<TAB>condition"
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        date, temp, humidity, precipitation = line.split(",")
        temp = float(temp)
        humidity = float(humidity)
        precipitation = float(precipitation)
    except ValueError:
        # Wrong field count or non-numeric values (e.g. a header row): skip
        continue

    # Weather condition logic: rules are checked in order, first match wins
    if precipitation > 0:
        message = "Rainy day"
    elif temp >= 35:
        message = "Very Hot day"
    elif temp >= 30:
        message = "Hot day"
    elif temp <= 10:
        message = "Very Cold day"
    elif temp <= 15:
        message = "Cold day"
    elif humidity > 85:
        message = "Humid day"
    else:
        message = "Pleasant day"

    print(f"{date}\t{message}")
Make executable:
chmod +x mapper.py
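The order of the mapper's `if`/`elif` checks matters. Rain is tested first, so a hot rainy day is reported as "Rainy day", and "Humid day" is only reached when no earlier rule matches. The following is a minimal sketch that moves the same rules into a pure function so they can be checked without Hadoop. The name `classify` is hypothetical:

```python
def classify(temp: float, humidity: float, precipitation: float) -> str:
    """Apply the mapper's rules in the same order; the first match wins."""
    if precipitation > 0:
        return "Rainy day"
    if temp >= 35:
        return "Very Hot day"
    if temp >= 30:
        return "Hot day"
    if temp <= 10:
        return "Very Cold day"
    if temp <= 15:
        return "Cold day"
    if humidity > 85:
        return "Humid day"
    return "Pleasant day"


# The five sample rows from weather_data.txt as (temp, humidity, precipitation)
samples = [(25, 60, 0), (30, 70, 5), (15, 80, 10), (10, 90, 15), (35, 50, 0)]
for t, h, p in samples:
    print(classify(t, h, p))
```

Because of this ordering, a row with temperature 12 and humidity 95 is reported as "Cold day", not "Humid day".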
Step 3: Create Reducer Script
nano reducer.py
reducer.py
#!/usr/bin/env python3
import sys

# Identity reducer: print each (already sorted) line unchanged
for line in sys.stdin:
    line = line.strip()
    if line:
        print(line)
Make executable:
chmod +x reducer.py
👉 Note: The reducer is simple because all classification happens in the mapper. It only passes the sorted lines through unchanged.
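Hadoop sorts mapper output by key before it reaches the reducer, so lines with the same key arrive next to each other. Here each date appears only once, which is why a pass-through reducer is enough. If the mapper instead emitted the condition as the key (a hypothetical variant, not what this tutorial does), a reducer could count days per condition. A minimal sketch using `itertools.groupby`, where `count_by_key` is a hypothetical name:

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def count_by_key(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (key, count) for each run of lines sharing the same key.

    The key is the text before the first tab (Hadoop Streaming's default).
    Input must already be sorted by key, as it is when it reaches a reducer.
    """
    keys = (line.rstrip("\n").split("\t", 1)[0] for line in lines if line.strip())
    for key, group in groupby(keys):
        yield key, sum(1 for _ in group)


# Hypothetical mapper output keyed by condition, already sorted
sorted_input = [
    "Pleasant day\t2023-10-01\n",
    "Rainy day\t2023-10-02\n",
    "Rainy day\t2023-10-03\n",
    "Rainy day\t2023-10-04\n",
    "Very Hot day\t2023-10-05\n",
]
for key, count in count_by_key(sorted_input):
    print(f"{key}\t{count}")
```

In a real reducer script, the loop would iterate over `count_by_key(sys.stdin)` instead of the sample list.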
Step 4: Test Locally (Without Hadoop)
cat weather_data.txt | ./mapper.py | sort | ./reducer.py
Expected Output (date and condition are separated by a tab)
2023-10-01 Pleasant day
2023-10-02 Rainy day
2023-10-03 Rainy day
2023-10-04 Rainy day
2023-10-05 Very Hot day
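The pipeline in Step 4 mimics a single-reducer Hadoop job, with `sort` standing in for Hadoop's shuffle-and-sort phase. The following is a minimal in-process sketch of the same three stages. It assumes Hadoop Streaming's default of treating the text before the first tab as the key. The `run_pipeline`, `date_rain_mapper`, and `identity_reducer` names are hypothetical:

```python
from typing import Callable, Iterable, List


def run_pipeline(
    lines: Iterable[str],
    mapper: Callable[[str], List[str]],
    reducer: Callable[[List[str]], List[str]],
) -> List[str]:
    """Simulate map -> shuffle/sort -> reduce with a single reducer."""
    mapped = [out for line in lines for out in mapper(line)]
    # Group by key: Streaming treats the text before the first tab as the key
    mapped.sort(key=lambda record: record.split("\t", 1)[0])
    return reducer(mapped)


def date_rain_mapper(line: str) -> List[str]:
    """Hypothetical toy mapper: emit 'date<TAB>precipitation'."""
    fields = line.strip().split(",")
    return [f"{fields[0]}\t{fields[3]}"] if len(fields) == 4 else []


def identity_reducer(records: List[str]) -> List[str]:
    """Like reducer.py: pass every sorted record through unchanged."""
    return records


# Deliberately out of order, to show the sort step at work
rows = ["2023-10-03,15,80,10", "2023-10-01,25,60,0", "2023-10-02,30,70,5"]
for record in run_pipeline(rows, date_rain_mapper, identity_reducer):
    print(record)
```

Python's stable sort keeps values with equal keys in input order. Hadoop only guarantees ordering by key, so a reducer should not rely on the order of values within a key.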
Step 5: Run in Hadoop Streaming
Create HDFS directory
hdfs dfs -mkdir /weather
Upload file
hdfs dfs -put weather_data.txt /weather
Run Hadoop job (the output directory must not already exist, or the job fails)
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming*.jar \
-input /weather/weather_data.txt \
-output /weather_output \
-mapper mapper.py \
-reducer reducer.py \
-file mapper.py \
-file reducer.py
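Because Hadoop refuses to start a job whose output directory already exists, re-running the command above fails until `/weather_output` is removed. The following is a minimal sketch of a re-runnable launcher using `subprocess`. The `build_streaming_cmd` and `run_job` helpers are hypothetical, and clearing old output with `hdfs dfs -rm -r -f` is an added step that is not part of the original instructions. The sketch assumes `HADOOP_HOME` is set and that `hadoop` and `hdfs` are on the PATH:

```python
import glob
import os
import subprocess
from typing import List


def build_streaming_cmd(jar: str, input_path: str, output_path: str) -> List[str]:
    """Assemble the same hadoop-streaming invocation as the shell command."""
    return [
        "hadoop", "jar", jar,
        "-input", input_path,
        "-output", output_path,
        "-mapper", "mapper.py",
        "-reducer", "reducer.py",
        "-file", "mapper.py",
        "-file", "reducer.py",
    ]


def run_job(input_path: str = "/weather/weather_data.txt",
            output_path: str = "/weather_output") -> None:
    """Delete old output (assumption), then submit the streaming job."""
    # Resolve the versioned jar name that the shell glob would match
    pattern = os.path.join(os.environ["HADOOP_HOME"],
                           "share/hadoop/tools/lib/hadoop-streaming*.jar")
    jars = sorted(glob.glob(pattern))
    if not jars:
        raise FileNotFoundError(f"no streaming jar matches {pattern}")
    # -f: do not fail if the output directory does not exist yet
    subprocess.run(["hdfs", "dfs", "-rm", "-r", "-f", output_path], check=True)
    subprocess.run(build_streaming_cmd(jars[0], input_path, output_path), check=True)
```

On a machine with Hadoop installed, calling `run_job()` from the directory containing `mapper.py` and `reducer.py` would repeat Step 5 safely.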
Step 6: View Output
hdfs dfs -cat /weather_output/part-00000
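Each output line has the form `date<TAB>condition`. The following is a minimal sketch for loading the results into Python for further filtering. It assumes the `hdfs` CLI is on the PATH, and the `load_results` and `read_hdfs_file` names are hypothetical:

```python
import subprocess
from typing import Dict, Iterable


def load_results(lines: Iterable[str]) -> Dict[str, str]:
    """Map each date to its condition from 'date<TAB>condition' lines."""
    results: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" in line:
            date, condition = line.split("\t", 1)
            results[date] = condition
    return results


def read_hdfs_file(path: str = "/weather_output/part-00000") -> str:
    """Return the file's contents via `hdfs dfs -cat` (needs the hdfs CLI)."""
    completed = subprocess.run(
        ["hdfs", "dfs", "-cat", path],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


# Example with two lines in the same format as the job output
sample = "2023-10-01\tPleasant day\n2023-10-02\tRainy day\n"
results = load_results(sample.splitlines())
print([d for d, c in results.items() if c == "Rainy day"])
```

On the cluster, `load_results(read_hdfs_file().splitlines())` would load the real job output instead of the sample string.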