Prepare sample data:

Create a tab-separated file (data.tsv) in a local directory, then copy it into an HDFS directory:

row1	Family1:col1	Family1:col2	Family2:col1
row2	Family1:col1	Family1:col2	Family2:col1
row3	Family1:col1	Family1:col2	Family2:col1

hadoop fs -mkdir data
hadoop fs -put data.tsv data

Create the HBase table:

hbase shell> create 'my_table', 'Family1', 'Family2'

Run ImportTsv to load the TSV file from HDFS into the table:

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,Family1:col1,Family1:col2,Family2:col1" my_table /user/training/data/inputfile.tsv
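ImportTsv maps each tab-separated field to the column named at the same position in -Dimporttsv.columns, with HBASE_ROW_KEY marking which field becomes the row key. A minimal Python sketch of that mapping (the sample values are illustrative; this is not the tool's actual implementation):

```python
# Sketch of how ImportTsv maps TSV fields to HBase cells.
# The column spec mirrors the -Dimporttsv.columns argument above.
columns = ["HBASE_ROW_KEY", "Family1:col1", "Family1:col2", "Family2:col1"]

def tsv_line_to_cells(line):
    """Return (row_key, {family:qualifier: value}) for one TSV line."""
    fields = line.rstrip("\n").split("\t")
    row_key, cells = None, {}
    for spec, value in zip(columns, fields):
        if spec == "HBASE_ROW_KEY":
            row_key = value          # this field keys the whole row
        else:
            cells[spec] = value      # everything else becomes a cell
    return row_key, cells

row_key, cells = tsv_line_to_cells("row1\tv11\tv12\tv21")
print(row_key)                 # row1
print(cells["Family1:col2"])   # v12
```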
----------------------------------------------------------------------------------------------------------------
Let's assume we have a sales dataset with the following columns:
- transaction_id (Row Key)
- product_id (Product ID)
- product_name (Product Name)
- region (Region where the sale happened)
- sales_amount (Amount of Sale)
- sale_date (Date of Transaction)
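In HBase terms, each transaction becomes one row keyed by transaction_id, and the remaining columns become cells addressed as family:qualifier inside a column family. A small Python dict can model one such record (the values shown are the sample data used throughout this post):

```python
# One sales record modeled the way HBase stores it:
# a row key plus family:qualifier -> value cells.
record = {
    "row_key": "1",                # transaction_id is the row key
    "cells": {
        "details:product_id": "1001",
        "details:product_name": "Laptop",
        "details:region": "North",
        "details:sales_amount": "1200",
        "details:sale_date": "2025-01-15",
    },
}
print(record["cells"]["details:region"])  # North
```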
Steps for HBase Big Data Analytics Without Spark:
Step 1: Create and Load Sample Data into HBase
1.1 Create Sales Table in HBase
In the HBase shell, we first need to create a table to store the sales data.
hbase shell
create 'sales', 'details'
This creates a table called sales with a single column family, details.
1.2 Load Sample Sales Data into HBase
Now, let's load some sample sales data into this table using the put command in the HBase shell.
put 'sales', '1', 'details:product_id', '1001'
put 'sales', '1', 'details:product_name', 'Laptop'
put 'sales', '1', 'details:region', 'North'
put 'sales', '1', 'details:sales_amount', '1200'
put 'sales', '1', 'details:sale_date', '2025-01-15'
put 'sales', '2', 'details:product_id', '1002'
put 'sales', '2', 'details:product_name', 'Smartphone'
put 'sales', '2', 'details:region', 'South'
put 'sales', '2', 'details:sales_amount', '800'
put 'sales', '2', 'details:sale_date', '2025-01-16'
put 'sales', '3', 'details:product_id', '1003'
put 'sales', '3', 'details:product_name', 'Tablet'
put 'sales', '3', 'details:region', 'East'
put 'sales', '3', 'details:sales_amount', '500'
put 'sales', '3', 'details:sale_date', '2025-01-17'
put 'sales', '4', 'details:product_id', '1001'
put 'sales', '4', 'details:product_name', 'Laptop'
put 'sales', '4', 'details:region', 'West'
put 'sales', '4', 'details:sales_amount', '1100'
put 'sales', '4', 'details:sale_date', '2025-01-17'
This inserts four records, each representing a sale with product, region, and sales amount information.
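Typing each put by hand gets tedious; the same four records can be generated from plain data. A hedged Python sketch that simply prints the equivalent HBase shell put commands (it does not connect to HBase; programmatic access would need a client library):

```python
# Generate HBase shell `put` commands for the sample sales records.
records = [
    ("1", "1001", "Laptop",     "North", "1200", "2025-01-15"),
    ("2", "1002", "Smartphone", "South", "800",  "2025-01-16"),
    ("3", "1003", "Tablet",     "East",  "500",  "2025-01-17"),
    ("4", "1001", "Laptop",     "West",  "1100", "2025-01-17"),
]
qualifiers = ["product_id", "product_name", "region", "sales_amount", "sale_date"]

def put_commands(rows):
    """One `put` per cell: 4 rows x 5 qualifiers = 20 commands."""
    cmds = []
    for row_key, *values in rows:
        for qual, val in zip(qualifiers, values):
            cmds.append(f"put 'sales', '{row_key}', 'details:{qual}', '{val}'")
    return cmds

for cmd in put_commands(records):
    print(cmd)
```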
Step 2: Querying Data with HBase Shell
You can retrieve or query data directly from HBase using the HBase shell.
2.1 Retrieve a Single Record
To retrieve data for a specific row, use the get command:
get 'sales', '1'
This will return the sale data for the record with row key 1.
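get is a point lookup by row key. Its behavior can be sketched in Python against an in-memory stand-in for the table (the table dict below is illustrative, holding just two of the sample rows):

```python
# In-memory stand-in for the 'sales' table: row key -> cells.
sales = {
    "1": {"details:product_name": "Laptop",     "details:region": "North"},
    "2": {"details:product_name": "Smartphone", "details:region": "South"},
}

def get(table, row_key):
    """Mimic `get 'sales', '<row_key>'`: return that row's cells, if any."""
    return table.get(row_key, {})

print(get(sales, "1")["details:region"])  # North
```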
2.2 Scan the Entire Table
To scan all the rows in the sales table, use the scan command:
scan 'sales'
This will display all the records from the sales table.
2.3 Scan with Filters
You can also apply filters to your queries. For example, to get all records where product_name is Laptop, use:
scan 'sales', {FILTER => "SingleColumnValueFilter('details', 'product_name', =, 'binary:Laptop')"}
This scans for all sales where the product_name is Laptop.
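SingleColumnValueFilter keeps only the rows whose named column compares equal (here, binary equality) to the given value. A Python sketch of that selection logic over in-memory rows (a simplification of the real server-side filter, shown with two columns per row for brevity):

```python
# Rows as row_key -> cells, mirroring the sample sales data.
rows = {
    "1": {"details:product_name": "Laptop",     "details:region": "North"},
    "2": {"details:product_name": "Smartphone", "details:region": "South"},
    "3": {"details:product_name": "Tablet",     "details:region": "East"},
    "4": {"details:product_name": "Laptop",     "details:region": "West"},
}

def scan_with_filter(table, column, value):
    """Keep rows where `column` exists and equals `value` exactly."""
    return {k: cells for k, cells in table.items()
            if cells.get(column) == value}

matches = scan_with_filter(rows, "details:product_name", "Laptop")
print(sorted(matches))  # ['1', '4']
```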
-----------------------------------------------------------------