Hadoop learning pot: weather mapreduce

A Hadoop MapReduce program that mines weather data and displays a message for each weather condition can be broken down into a few steps. In this example, we assume the dataset is a text file of comma-separated weather records, and we want to classify each record as "Hot," "Cold," or "Moderate" based on its temperature and emit a message accordingly.

Steps for the Hadoop MapReduce Program:

1. Define the Input Data Format

The input data should be in a format that is easy to process, such as a CSV or text file. Each row could represent a weather record with attributes such as temperature, humidity, etc.

For example, the weather data may look like this:

csv
Date, Temperature, Humidity, Wind Speed
2025-03-23, 85, 60, 10
2025-03-22, 50, 65, 5
2025-03-21, 65, 70, 15

We will focus on the temperature value, which is the second column. The header row has no numeric temperature, so the Mapper has to skip it.
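
As a rough sketch of how one such row might be read in plain Java (outside Hadoop), the snippet below splits a line on commas and pulls out the temperature. The class name `WeatherRecordParser` and the method `parseTemperature` are hypothetical names used only for illustration, and the code assumes the temperature is an integer in the second column, as in the sample above.

```java
import java.util.OptionalInt;

class WeatherRecordParser {

    // Returns the temperature from the second column, or an empty value if
    // the line has too few fields or the field is not an integer (which is
    // also the case for the header row).
    static OptionalInt parseTemperature(String line) {
        String[] fields = line.split(",");
        if (fields.length < 2) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(fields[1].trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // A data row yields a temperature; the header row does not.
        System.out.println(parseTemperature("2025-03-23, 85, 60, 10"));
        System.out.println(parseTemperature("Date, Temperature, Humidity, Wind Speed"));
    }
}
```

Returning an empty value rather than throwing keeps the caller simple: a bad row is just skipped.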

2. Write the Mapper

The Mapper will process each record, extract the temperature value, and categorize it into different weather conditions.

If the temperature is above 80, the weather condition is "Hot."
If the temperature is below 60, the weather condition is "Cold."
If the temperature is between 60 and 80 (inclusive), the weather condition is "Moderate."
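
The three rules above can be sketched as a small standalone function, which makes the boundary cases easy to check before wiring the logic into a Mapper. The class name `WeatherClassifier` and the constant names are hypothetical; the thresholds are the ones stated in the rules.

```java
class WeatherClassifier {

    // Thresholds taken from the rules above.
    static final int HOT_ABOVE = 80;
    static final int COLD_BELOW = 60;

    // Maps a temperature to "Hot", "Cold", or "Moderate".
    static String classify(int temperature) {
        if (temperature > HOT_ABOVE) {
            return "Hot";
        } else if (temperature < COLD_BELOW) {
            return "Cold";
        }
        return "Moderate"; // 60 through 80, inclusive
    }

    public static void main(String[] args) {
        // Includes the two boundary values, 80 and 60.
        int[] samples = {85, 50, 65, 80, 60};
        for (int t : samples) {
            System.out.println(t + " -> " + classify(t));
        }
    }
}
```

Note that exactly 80 and exactly 60 both fall into "Moderate", because the comparisons are strict.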

Here is sample Mapper code in Java:

java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;

// Input: (byte offset, line of text); output: (condition, message)
public class WeatherMapper extends Mapper<Object, Text, Text, Text> {

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Split the input record into individual fields
        String[] fields = value.toString().split(",");

        if (fields.length >= 2) {
            try {
                // Extract temperature (assuming it's the second column)
                int temperature = Integer.parseInt(fields[1].trim());

                String condition;
                if (temperature > 80) {
                    condition = "Hot";
                } else if (temperature < 60) {
                    condition = "Cold";
                } else {
                    condition = "Moderate";
                }

                // Emit the condition as key and the message as value
                context.write(new Text(condition), new Text("The weather is " + condition + " with temperature " + temperature + "°F."));
            } catch (NumberFormatException e) {
                // The temperature field is not a valid integer (this also
                // covers the header row), so skip the record
            }
        }
    }
}

3. Write the Reducer

The Reducer receives each weather condition as a key, together with all the messages emitted for that condition as the values. This Reducer does no aggregation: it writes every message back out under its condition, so the final output is grouped and sorted by condition.

Here’s the Reducer code in Java:

java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;

public class WeatherReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Write every message for this weather condition unchanged
        for (Text value : values) {
            context.write(key, value);
        }
    }
}
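
To get a feel for what the Reducer is handed, the grouping that happens between the map and reduce phases can be imitated in plain Java. The sketch below is only an approximation that runs without Hadoop: it uses a `TreeMap` to group messages by condition and keep the keys sorted. The names `ShuffleSketch` and `groupByCondition` are hypothetical, and the messages are shortened for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {

    // Groups (condition, message) pairs by condition with sorted keys,
    // roughly mirroring the (key, Iterable<values>) view a Reducer gets.
    static Map<String, List<String>> groupByCondition(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Stand-in for the Mapper output on the three sample records.
        List<String[]> mapOutput = List.of(
            new String[] {"Hot", "temperature 85"},
            new String[] {"Cold", "temperature 50"},
            new String[] {"Moderate", "temperature 65"}
        );
        // Pass-through "reduce": one tab-separated line per message.
        groupByCondition(mapOutput).forEach((condition, messages) -> {
            for (String message : messages) {
                System.out.println(condition + "\t" + message);
            }
        });
    }
}
```

Because the keys are sorted, "Cold" is printed before "Hot" and "Moderate", whatever order the records arrived in.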

4. Configure and Run the Job

Now, you need to configure the Job and run it. The configuration specifies the Mapper and Reducer classes, input/output paths, and output formats.

Here’s the Job configuration:

java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WeatherJob {
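    public static void main(String[] args) throws Exception {
        // Expect exactly two arguments: the input path and the output path
        if (args.length != 2) {
            System.err.println("Usage: WeatherJob <input path> <output path>");
            System.exit(2);
        }

        // Set up the configuration
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Weather Data Mining");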
        job.setJarByClass(WeatherJob.class);

        // Set the Mapper and Reducer classes
        job.setMapperClass(WeatherMapper.class);
        job.setReducerClass(WeatherReducer.class);

        // Set the output key and value types (these also apply to the
        // Mapper output, since no separate map output types are set)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Wait for the job to complete
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

5. Run the Program

After writing the Mapper, Reducer, and Job configuration, compile the program and run it on a Hadoop cluster or in local mode. You can execute the program using the following command (assuming the necessary JAR file is compiled):

bash
hadoop jar WeatherJob.jar WeatherJob /input/weather_data.csv /output

In this case, the input file weather_data.csv contains the weather data, and the output directory will contain the results. The output directory must not already exist, or the job will fail.

6. Output Example

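After the program runs on the sample data above, the output file (part-r-00000 when a single reducer is used) should contain lines like the following. Hadoop sorts the keys before the reduce phase, so the conditions appear in alphabetical order, and the key and value are separated by a tab:

text
Cold    The weather is Cold with temperature 50°F.
Hot    The weather is Hot with temperature 85°F.
Moderate    The weather is Moderate with temperature 65°F.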

This is a simple MapReduce program that reads weather data, classifies each record by temperature, and outputs one message per record, grouped by weather condition. The messages could be extended to give more detail by also using the other attributes in the input data, such as humidity or wind speed.
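
One possible shape for such an extended message is sketched below in plain Java. The class name `DetailedMessage` and the method `describe` are hypothetical, and the code assumes all four columns of the sample format hold integers. The sample data does not state units for humidity or wind speed, so none are printed.

```java
class DetailedMessage {

    // Builds a message from all four columns of a record, or returns null
    // if the line has too few fields or a non-numeric value.
    static String describe(String line) {
        String[] f = line.split(",");
        if (f.length < 4) {
            return null;
        }
        try {
            int temperature = Integer.parseInt(f[1].trim());
            int humidity = Integer.parseInt(f[2].trim());
            int windSpeed = Integer.parseInt(f[3].trim());
            // Same thresholds as the Mapper: above 80 is Hot, below 60 is Cold.
            String condition = temperature > 80 ? "Hot"
                    : temperature < 60 ? "Cold" : "Moderate";
            return "The weather is " + condition + " with temperature " + temperature
                    + ", humidity " + humidity + ", wind speed " + windSpeed + ".";
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("2025-03-23, 85, 60, 10"));
    }
}
```

In a Mapper, the returned string would take the place of the message passed to `context.write`, with null results skipped.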

Sunday, 23 March 2025
