
Saturday, 22 March 2025

matrix-2

MapReduce can be used to perform matrix multiplication by breaking the problem into smaller subproblems. This approach is useful when the matrices are too large to fit in memory on a single machine and a distributed framework such as Hadoop is available.

In matrix multiplication, the element at position (i, j) in the product matrix is calculated as the dot product of the ith row of the first matrix and the jth column of the second matrix. Here's how you can apply the MapReduce framework to perform matrix multiplication in Java:
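Before moving to Hadoop, the dot-product formula can be checked in plain Java. This is a minimal sketch (the matrices and the `cell` helper are illustrative examples, not part of the MapReduce job below):

```java
public class DotProductExample {

    // C[i][j] = sum over k of A[i][k] * B[k][j]
    static int cell(int[][] a, int[][] b, int i, int j) {
        int sum = 0;
        for (int k = 0; k < b.length; k++) {
            sum += a[i][k] * b[k][j];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        // C[0][0] = 1*5 + 2*7 = 19
        System.out.println(cell(a, b, 0, 0)); // prints 19
    }
}
```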

Steps for Matrix Multiplication using MapReduce

  1. Mapper Phase:

    • Each input record describes one matrix element as (row, column, matrix type, value).

    • For an element A[i][k], emit the key (i, j) for every column j of the result matrix, with the value ("A", k, A[i][k]).

    • For an element B[k][j], emit the key (i, j) for every row i of the result matrix, with the value ("B", k, B[k][j]).

  2. Reducer Phase:

    • The reducer receives all values for a given (i, j) key, multiplies the A and B values that share the same k index, and sums the products to produce the value of cell (i, j) in the result matrix.
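The two phases can be simulated locally without Hadoop. The sketch below is an illustration under assumptions (2x2 matrices, string keys of the form "i,j", values of the form "A,k,val" / "B,k,val", mirroring the record layout used later in the post); it shows how the shuffle groups values by result cell and how the reduce step sums matching products:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalMapReduceSketch {

    static Map<String, Integer> multiply(int[][] a, int[][] b) {
        int rows = a.length, cols = b[0].length;

        // "Shuffle" stage: collect every value emitted for the same (i,j) key.
        Map<String, List<String>> grouped = new HashMap<>();

        // Map phase for A: A[i][k] contributes to every cell in row i.
        for (int i = 0; i < rows; i++)
            for (int k = 0; k < a[0].length; k++)
                for (int j = 0; j < cols; j++)
                    grouped.computeIfAbsent(i + "," + j, x -> new ArrayList<>())
                           .add("A," + k + "," + a[i][k]);

        // Map phase for B: B[k][j] contributes to every cell in column j.
        for (int k = 0; k < b.length; k++)
            for (int j = 0; j < cols; j++)
                for (int i = 0; i < rows; i++)
                    grouped.computeIfAbsent(i + "," + j, x -> new ArrayList<>())
                           .add("B," + k + "," + b[k][j]);

        // Reduce phase: multiply values that share a k index, then sum.
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            Map<Integer, Integer> aVals = new HashMap<>();
            Map<Integer, Integer> bVals = new HashMap<>();
            for (String v : e.getValue()) {
                String[] f = v.split(",");
                int k = Integer.parseInt(f[1]);
                int val = Integer.parseInt(f[2]);
                if (f[0].equals("A")) aVals.put(k, val);
                else bVals.put(k, val);
            }
            int sum = 0;
            for (int k : aVals.keySet())
                if (bVals.containsKey(k)) sum += aVals.get(k) * bVals.get(k);
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        Map<String, Integer> c = multiply(a, b);
        // C[0][0] = 1*5 + 2*7 = 19
        System.out.println(c.get("0,0")); // prints 19
    }
}
```

The same grouping-by-key behavior is what Hadoop's shuffle provides for free between the Mapper and Reducer classes below.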

Example Code:

Mapper Class

java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MatrixMultiplicationMapper extends Mapper<Object, Text, Text, Text> {

    // Result-matrix dimensions (example values for 3x3 matrices)
    private static final int MATRIX_A_ROWS = 3;    // rows of A = rows of the result
    private static final int MATRIX_B_COLUMNS = 3; // columns of B = columns of the result

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Input record: row,column,matrixType,value
        String[] fields = value.toString().split(",");
        int row = Integer.parseInt(fields[0]);
        int col = Integer.parseInt(fields[1]);
        int matrixType = Integer.parseInt(fields[2]); // 1 for matrix A, 2 for matrix B
        int val = Integer.parseInt(fields[3]);

        if (matrixType == 1) {
            // Matrix A: element A[row][col] contributes to every cell (row, j)
            // of the result, so emit it once per result column.
            for (int j = 0; j < MATRIX_B_COLUMNS; j++) {
                context.write(new Text(row + "," + j), new Text("A," + col + "," + val));
            }
        } else if (matrixType == 2) {
            // Matrix B: element B[row][col] contributes to every cell (i, col)
            // of the result, so emit it once per result row.
            for (int i = 0; i < MATRIX_A_ROWS; i++) {
                context.write(new Text(i + "," + col), new Text("B," + row + "," + val));
            }
        }
    }
}

Reducer Class

java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class MatrixMultiplicationReducer extends Reducer<Text, Text, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<int[]> aValues = new ArrayList<>();
        List<int[]> bValues = new ArrayList<>();

        // Separate the incoming values into A and B contributions.
        // Each value has the form "A,k,val" or "B,k,val".
        for (Text value : values) {
            String[] fields = value.toString().split(",");
            String matrixType = fields[0];
            int index = Integer.parseInt(fields[1]);
            int valueAtIndex = Integer.parseInt(fields[2]);

            if (matrixType.equals("A")) {
                aValues.add(new int[]{index, valueAtIndex});
            } else if (matrixType.equals("B")) {
                bValues.add(new int[]{index, valueAtIndex});
            }
        }

        // Dot product: multiply A and B values that share the same k index, then sum.
        int sum = 0;
        for (int[] a : aValues) {
            for (int[] b : bValues) {
                if (a[0] == b[0]) {
                    sum += a[1] * b[1];
                }
            }
        }

        // Write the result as (i,j) -> value in the result matrix.
        context.write(key, new IntWritable(sum));
    }
}

Driver Class

java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MatrixMultiplication {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MatrixMultiplication <input path> <output path>");
            System.exit(-1);
        }

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Matrix Multiplication");

        job.setJarByClass(MatrixMultiplication.class);
        job.setMapperClass(MatrixMultiplicationMapper.class);
        job.setReducerClass(MatrixMultiplicationReducer.class);

        // The mapper emits Text/Text while the final output is Text/IntWritable,
        // so the map output classes must be set explicitly.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Input Data Format

Each input line describes one matrix element as row,column,matrixType,value, where the matrix type is 1 for matrix A and 2 for matrix B. For example:

text
1,0,1,3   // Matrix A: row 1, column 0, value 3
1,1,1,2   // Matrix A: row 1, column 1, value 2
2,0,1,4   // Matrix A: row 2, column 0, value 4
0,1,2,5   // Matrix B: row 0, column 1, value 5
1,1,2,6   // Matrix B: row 1, column 1, value 6

Where:

  • 1,0,1,3 represents matrix A with row 1, column 0, and value 3.

  • 1,1,2,6 represents matrix B with row 1, column 1, and value 6.
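A small helper (hypothetical, not part of the job itself) makes the four-field record layout explicit:

```java
public class RecordParser {

    // Decode one input line "row,column,matrixType,value" into its four fields.
    static int[] parse(String line) {
        String[] f = line.split(",");
        return new int[]{
            Integer.parseInt(f[0].trim()), // row
            Integer.parseInt(f[1].trim()), // column
            Integer.parseInt(f[2].trim()), // matrix type: 1 = A, 2 = B
            Integer.parseInt(f[3].trim())  // element value
        };
    }

    public static void main(String[] args) {
        int[] r = parse("1,0,1,3");
        // row 1, column 0, matrix A (type 1), value 3
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]); // prints 1 0 1 3
    }
}
```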

Output Data Format

The output will be a set of key-value pairs of the form (i,j) → value, representing the cells of the result matrix. For example (values are illustrative):

0,0  15
0,1  18
1,0  25
1,1  30

Running the Job

  1. First, compile the code and package it into a .jar file.

  2. Run the job on Hadoop using the following command:

    shell
    hadoop jar MatrixMultiplication.jar MatrixMultiplication <input path> <output path>

This will read the input, perform matrix multiplication using the MapReduce paradigm, and store the output in the specified output directory.
