Hadoop learning pot: MATRIX MULTIPLICATION

Step 1: Define the Input Format

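For matrix multiplication, we typically represent the matrices in a sparse format with one record per element, (i, j, value), where i is the row index, j is the column index, and value is the element value. Because both matrices are read from the same input path, each record also starts with the matrix name (A or B), as in the input example later in this post.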
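A record in this format, with the leading matrix name used in the input example (for instance A, 0, 1, 2), could be parsed as sketched below. The MatrixEntry class and its parse method are hypothetical names chosen for illustration; they are not part of the Hadoop job.

```java
// Hypothetical helper for illustration only; not part of the Hadoop job.
final class MatrixEntry {
    final String matrix; // "A" or "B"
    final int row;       // i
    final int col;       // j
    final int value;     // element value

    MatrixEntry(String matrix, int row, int col, int value) {
        this.matrix = matrix;
        this.row = row;
        this.col = col;
        this.value = value;
    }

    // Parses one CSV line such as "A, 0, 1, 2" into its four fields.
    static MatrixEntry parse(String line) {
        String[] tokens = line.split(",");
        if (tokens.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + line);
        }
        return new MatrixEntry(
                tokens[0].trim(),
                Integer.parseInt(tokens[1].trim()),
                Integer.parseInt(tokens[2].trim()),
                Integer.parseInt(tokens[3].trim()));
    }

    public static void main(String[] args) {
        MatrixEntry e = parse("A, 0, 1, 2");
        System.out.println(e.matrix + "[" + e.row + "][" + e.col + "] = " + e.value);
    }
}
```

Rejecting lines that do not have exactly four fields is a design choice of this sketch; a real job might prefer to skip or count malformed records instead.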

We will assume two matrices: Matrix A (size MxN) and Matrix B (size NxP). The result will be Matrix C (size MxP).

Step 2: Mapper Class

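The mapper does not multiply anything itself. It routes every element to each cell of Matrix C that needs it. The key is a tuple (i, k), where i is a row of Matrix A and k is a column of Matrix B. An element A[i][j] is emitted once for every column k of Matrix B, and an element B[j][k] is emitted once for every row i of Matrix A. The value carries the matrix name, the shared index j, and the element value, so that the reducer can pair A[i][j] with B[j][k].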
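To make the emitted pairs concrete, here is a minimal plain-Java sketch of one common way to do this in a single MapReduce pass, without any Hadoop types. The class name MapperSketch, the method emit, and the value layout "matrix,j,value" are assumptions made for this illustration, and the dimensions m (rows of A) and p (columns of B) are assumed to be known in advance.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of a mapper emission rule; names and value layout are illustrative.
class MapperSketch {
    // Returns "key<TAB>value" strings for one matrix element.
    // m = number of rows in A, p = number of columns in B (assumed known up front).
    static List<String> emit(String matrix, int row, int col, int val, int m, int p) {
        List<String> out = new ArrayList<>();
        if (matrix.equals("A")) {
            // A[i][j] contributes to every C[i][k]; the shared index j is the column.
            for (int k = 0; k < p; k++) {
                out.add(row + "," + k + "\tA," + col + "," + val);
            }
        } else if (matrix.equals("B")) {
            // B[j][k] contributes to every C[i][k]; the shared index j is the row.
            for (int i = 0; i < m; i++) {
                out.add(i + "," + col + "\tB," + row + "," + val);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Element A[0][1] = 2 goes to keys 0,0 / 0,1 / 0,2
        emit("A", 0, 1, 2, 3, 3).forEach(System.out::println);
        // Element B[1][2] = 6 goes to keys 0,2 / 1,2 / 2,2
        emit("B", 1, 2, 6, 3, 3).forEach(System.out::println);
    }
}
```

Each element is therefore copied P times (for A) or M times (for B), which is the price of doing the whole multiplication in a single pass.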

Step 3: Reducer Class

The reducer receives all values for one (i, k) key, pairs each A[i][j] with the B[j][k] that has the same index j, multiplies each pair, and sums the products. That sum is the element C[i][k].
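The pairing step can be sketched in plain Java as follows. This is an illustration under assumptions, not the Hadoop reducer itself: ReducerSketch and reduce are hypothetical names, and each value is assumed to be a string of the form "matrix,j,value", where j is the index shared between a column of A and a row of B.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the reduce step for a single (i, k) key.
class ReducerSketch {
    // values: tagged entries "A,j,value" or "B,j,value" that share one (i, k) key.
    static int reduce(List<String> values) {
        Map<Integer, Integer> fromA = new HashMap<>(); // j -> A[i][j]
        Map<Integer, Integer> fromB = new HashMap<>(); // j -> B[j][k]
        for (String v : values) {
            String[] tokens = v.split(",");
            int j = Integer.parseInt(tokens[1].trim());
            int val = Integer.parseInt(tokens[2].trim());
            (tokens[0].trim().equals("A") ? fromA : fromB).put(j, val);
        }
        // Multiply entries with the same j; a missing partner counts as zero.
        int sum = 0;
        for (Map.Entry<Integer, Integer> a : fromA.entrySet()) {
            sum += a.getValue() * fromB.getOrDefault(a.getKey(), 0);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Values for key (0, 0): row 0 of A is 1, 2, 3 and column 0 of B is 1, 4, 7.
        List<String> values = Arrays.asList(
                "A,0,1", "A,1,2", "A,2,3",
                "B,0,1", "B,1,4", "B,2,7");
        System.out.println(reduce(values)); // prints 30 (1*1 + 2*4 + 3*7)
    }
}
```

Treating a missing partner as zero keeps the sketch usable for sparse input, where zero elements are simply left out of the file.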

Step 4: Final Output

The output will be the resulting matrix C, where each element C[i][k] is the sum of the element-wise products of row i of Matrix A and column k of Matrix B.
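Written as a formula, with i running over the M rows of Matrix A, j over the shared dimension N, and k over the P columns of Matrix B, and assuming zero-based indices as in the input example, each element of C is:

```latex
C_{ik} = \sum_{j=0}^{N-1} A_{ij} \, B_{jk}, \qquad 0 \le i < M, \quad 0 \le k < P
```

For the 3x3 input example given later in this post, the first element works out as:

```latex
C_{00} = A_{00} B_{00} + A_{01} B_{10} + A_{02} B_{20} = 1 \cdot 1 + 2 \cdot 4 + 3 \cdot 7 = 30
```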

Step-by-Step Code Example:

1. MatrixMultiplication.java (Main Driver Class)

java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MatrixMultiplication {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MatrixMultiplication <input_path> <output_path>");
            System.exit(-1);
        }

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Matrix Multiplication");
        job.setJarByClass(MatrixMultiplication.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MatrixMultiplicationMapper.class);
        job.setReducerClass(MatrixMultiplicationReducer.class);

        // The mapper emits Text values (tagged elements); the reducer emits IntWritable sums
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

2. MatrixMultiplicationMapper.java (Mapper Class)

java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MatrixMultiplicationMapper extends Mapper<Object, Text, Text, Text> {

    private static final String A_PREFIX = "A";  // For Matrix A
    private static final String B_PREFIX = "B";  // For Matrix B

    // Dimensions are hard-coded for the 3x3 example; adapt for general cases
    private static final int M = 3;  // Number of rows in Matrix A
    private static final int P = 3;  // Number of columns in Matrix B

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString().trim();
        if (line.isEmpty()) {
            return; // Skip blank lines
        }
        String[] tokens = line.split(",");

        // Each record is: matrix name, row index, column index, value
        String matrixType = tokens[0].trim();
        int row = Integer.parseInt(tokens[1].trim());
        int col = Integer.parseInt(tokens[2].trim());
        int val = Integer.parseInt(tokens[3].trim());

        if (matrixType.equals(A_PREFIX)) {
            // A[i][j] is needed by every C[i][k]; tag the value with the shared index j (the column)
            for (int k = 0; k < P; k++) {
                context.write(new Text(row + "," + k), new Text(A_PREFIX + "," + col + "," + val));
            }
        } else if (matrixType.equals(B_PREFIX)) {
            // B[j][k] is needed by every C[i][k]; tag the value with the shared index j (the row)
            for (int i = 0; i < M; i++) {
                context.write(new Text(i + "," + col), new Text(B_PREFIX + "," + row + "," + val));
            }
        }
    }
}

3. MatrixMultiplicationReducer.java (Reducer Class)

java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class MatrixMultiplicationReducer extends Reducer<Text, Text, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Group the values by the shared index j: A[i][j] in one map, B[j][k] in the other
        Map<Integer, Integer> aValues = new HashMap<>();
        Map<Integer, Integer> bValues = new HashMap<>();
        for (Text value : values) {
            String[] tokens = value.toString().split(",");
            int j = Integer.parseInt(tokens[1]);
            int val = Integer.parseInt(tokens[2]);
            (tokens[0].equals("A") ? aValues : bValues).put(j, val);
        }
        // C[i][k] = sum over j of A[i][j] * B[j][k]; a missing entry counts as zero
        int sum = 0;
        for (Map.Entry<Integer, Integer> a : aValues.entrySet()) {
            sum += a.getValue() * bValues.getOrDefault(a.getKey(), 0);
        }
        context.write(key, new IntWritable(sum));
    }
}

4. Input Format Example

You would feed both matrices to the job as CSV records of the form matrix, row, column, value, placed in the same input path:

Matrix A (3x3):

csv
A, 0, 0, 1
A, 0, 1, 2
A, 0, 2, 3
A, 1, 0, 4
A, 1, 1, 5
A, 1, 2, 6
A, 2, 0, 7
A, 2, 1, 8
A, 2, 2, 9

Matrix B (3x3):

csv
B, 0, 0, 1
B, 0, 1, 2
B, 0, 2, 3
B, 1, 0, 4
B, 1, 1, 5
B, 1, 2, 6
B, 2, 0, 7
B, 2, 1, 8
B, 2, 2, 9

5. Output Format Example

The final matrix C will be of size 3x3. With Hadoop's default text output, each output line holds a key i,k and its value, separated by a tab.
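The values to expect for the sample input can be checked without Hadoop. The sketch below multiplies the two 3x3 sample matrices directly and prints one line per cell in the key, tab, value layout. ExpectedOutput and multiply are hypothetical names used only for this check.

```java
// Plain-Java check of the expected result for the sample input (no Hadoop needed).
class ExpectedOutput {
    // Multiplies an m x n matrix by an n x p matrix.
    static int[][] multiply(int[][] a, int[][] b) {
        int m = a.length;
        int n = b.length;
        int p = b[0].length;
        int[][] c = new int[m][p];
        for (int i = 0; i < m; i++) {
            for (int k = 0; k < p; k++) {
                for (int j = 0; j < n; j++) {
                    c[i][k] += a[i][j] * b[j][k];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // In the sample input, Matrix A and Matrix B hold the same values.
        int[][] a = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        int[][] c = multiply(a, a);
        for (int i = 0; i < c.length; i++) {
            for (int k = 0; k < c[i].length; k++) {
                System.out.println(i + "," + k + "\t" + c[i][k]);
            }
        }
    }
}
```

For the sample input, the rows of C come out as 30, 36, 42 for row 0, then 66, 81, 96 for row 1, and 102, 126, 150 for row 2.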

Step 5: Compilation and Execution

Compile the Java files:

bash
javac -classpath `hadoop classpath` -d . MatrixMultiplication.java MatrixMultiplicationMapper.java MatrixMultiplicationReducer.java
jar cf matrix_multiplication.jar MatrixMultiplication*.class

Run the Hadoop job:

bash
hadoop jar matrix_multiplication.jar MatrixMultiplication input_path output_path


Saturday, 22 March 2025
