To implement matrix multiplication using MapReduce in Hadoop for 2x2 matrices, we can break the process into the following steps:

- Input Format: We'll use a text file format where each matrix element is represented by a line containing the matrix name, row, column, and value.
- Map Function: The mapper reads each element and emits key-value pairs keyed by the cells of the result matrix that the element contributes to.
- Reduce Function: The reducer sums the products of the paired elements to produce each cell of the result matrix.
For simplicity, let's assume:

- Matrix A is represented as A = [[a11, a12], [a21, a22]]
- Matrix B is represented as B = [[b11, b12], [b21, b22]]

The matrix multiplication C = A * B will give us C = [[c11, c12], [c21, c22]], where:

- c11 = a11 * b11 + a12 * b21
- c12 = a11 * b12 + a12 * b22
- c21 = a21 * b11 + a22 * b21
- c22 = a21 * b12 + a22 * b22
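The formulas above are easy to sanity-check with a few lines of plain Python on concrete numbers (the values here are arbitrary examples, not taken from the post):

```python
# Concrete 2x2 example: C[i][j] = sum over k of A[i][k] * B[k][j]
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
print(C)  # [[19, 22], [43, 50]]
```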
Here's a simple code example using Hadoop MapReduce to implement 2x2 matrix multiplication:
Step 1: Mapper
The Mapper will read each matrix element and emit intermediate key-value pairs: the key identifies a cell of the result matrix, and the value carries the element (and its index) needed to compute that cell.
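The post's Mapper listing appears to be missing. As an illustrative sketch, here is one way to write it for a Hadoop Streaming job in Python (an assumption; the original was likely the Java API, but the logic is the same). Each element of A is emitted once per column of the result, and each element of B once per row, keyed by the result cell it contributes to:

```python
#!/usr/bin/env python3
# mapper.py -- sketch for Hadoop Streaming; input lines: "name,row,col,value"
import sys

N = 2  # fixed size for 2x2 matrices

def map_line(line):
    """Yield (key, value) pairs for one input line 'name,row,col,value'."""
    name, i, j, v = line.strip().split(",")
    i, j = int(i), int(j)
    if name == "A":
        # A[i][k] is needed by every C[i][col]: emit under each target column
        for col in range(N):
            yield f"{i},{col}", f"A,{j},{v}"
    else:
        # B[k][j] is needed by every C[row][j]: emit under each target row
        for row in range(N):
            yield f"{row},{j}", f"B,{i},{v}"

if __name__ == "__main__":
    for line in sys.stdin:
        if line.strip():
            for key, value in map_line(line):
                print(f"{key}\t{value}")
```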
Step 2: Reducer
The Reducer will collect the intermediate key-value pairs for each result cell and compute the sum of products that gives that cell's value.
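A matching reducer sketch, again assuming Hadoop Streaming with Python (Hadoop delivers the mapper's output to the reducer sorted by key, so consecutive lines with the same key form one result cell):

```python
#!/usr/bin/env python3
# reducer.py -- sketch; reads the mapper's sorted "key\tname,k,value" lines
import sys
from itertools import groupby

def reduce_group(key, values):
    """Compute one cell C[i][j] from its 'A,k,v' / 'B,k,v' records."""
    a, b = {}, {}
    for rec in values:
        name, k, v = rec.split(",")
        (a if name == "A" else b)[int(k)] = float(v)
    # Pair A[i][k] with B[k][j] on the shared index k and sum the products
    total = sum(a[k] * b[k] for k in a if k in b)
    return key, total

if __name__ == "__main__":
    rows = (line.strip().split("\t") for line in sys.stdin if line.strip())
    for key, group in groupby(rows, key=lambda kv: kv[0]):
        k, total = reduce_group(key, (v for _, v in group))
        print(f"{k}\t{total}")
```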
Step 3: Driver
Finally, the driver will configure and execute the Hadoop job. It specifies input/output paths, the Mapper, and the Reducer.
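For a Streaming job, the "driver" role is played by the `hadoop jar` invocation itself; a sketch (the jar path and HDFS input/output paths are placeholders and depend on your installation):

```shell
# Launch the streaming job; -files ships the scripts to the cluster nodes
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -files mapper.py,reducer.py \
  -input /user/hadoop/matrix_input \
  -output /user/hadoop/matrix_output \
  -mapper mapper.py \
  -reducer reducer.py
```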
Step 4: Input Format
Ensure that the input format to the MapReduce job is properly structured. For example, if you're using text files:
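One hypothetical layout is one element per line as matrix-name,row,column,value; with example values A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], the input file might look like:

```
A,0,0,1
A,0,1,2
A,1,0,3
A,1,1,4
B,0,0,5
B,0,1,6
B,1,0,7
B,1,1,8
```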
Output
For the given matrices, the output for the multiplication would be something like:
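Assuming the example values A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], the job's output (key = row,column of C, tab-separated from the value) might look like:

```
0,0	19.0
0,1	22.0
1,0	43.0
1,1	50.0
```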
Conclusion
This simple MapReduce program performs 2x2 matrix multiplication. You can extend it to larger matrices or adapt it to more complex scenarios by enhancing the Mapper and Reducer logic. Make sure to manage the input and output formats carefully when running the job on a real Hadoop cluster.