
Sunday, 2 March 2025

BIGLAB-HBASE

Prepare sample data:

Create the data file in a local directory, then copy it into an HDFS directory.

 

row1     Family1:col1      Family1:col2      Family2:col1

row2     Family1:col1      Family1:col2      Family2:col1

row3     Family1:col1      Family1:col2      Family2:col1

 

hadoop fs -mkdir data

hadoop fs -put data.tsv data

Create the HBase table:

hbase shell>

create 'my_table', 'Family1', 'Family2'

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns="HBASE_ROW_KEY,Family1:col1,Family1:col2,Family2:col1" \
  my_table /user/training/data/data.tsv
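As a sketch of what ImportTsv expects, the input is a tab-separated file whose first field becomes HBASE_ROW_KEY and whose remaining fields map, in order, to the columns named in -Dimporttsv.columns. The cell values below (v11, v12, v21) are placeholders, not part of the original lab:

```shell
# Build a sample tab-separated file locally; the first field is the
# row key, the rest map to Family1:col1, Family1:col2, Family2:col1.
printf 'row1\tv11\tv12\tv21\n'  > data.tsv
printf 'row2\tv11\tv12\tv21\n' >> data.tsv
printf 'row3\tv11\tv12\tv21\n' >> data.tsv

# Copy to HDFS before running ImportTsv (cluster commands, shown here
# as comments for reference only):
#   hadoop fs -put data.tsv data

# Prints the line count (3) and filename
wc -l data.tsv
```
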

 ----------------------------------------------------------------------------------------------------------------

Let's assume we have a sales dataset with the following columns:

  • transaction_id (Row Key)
  • product_id (Product ID)
  • product_name (Product Name)
  • region (Region where the sale happened)
  • sales_amount (Amount of Sale)
  • sale_date (Date of Transaction)

Steps for HBase Big Data Analytics Without Spark:


Step 1: Create and Load Sample Data into HBase

1.1 Create Sales Table in HBase

In HBase shell, we first need to create a table to store the sales data.

hbase shell

create 'sales', 'details'

This creates a table called sales with a column family details.

1.2 Load Sample Sales Data into HBase

Now, let's load some sample sales data into this table using the put command in the HBase shell.


put 'sales', '1', 'details:product_id', '1001'

put 'sales', '1', 'details:product_name', 'Laptop'

put 'sales', '1', 'details:region', 'North'

put 'sales', '1', 'details:sales_amount', '1200'

put 'sales', '1', 'details:sale_date', '2025-01-15'

 

put 'sales', '2', 'details:product_id', '1002'

put 'sales', '2', 'details:product_name', 'Smartphone'

put 'sales', '2', 'details:region', 'South'

put 'sales', '2', 'details:sales_amount', '800'

put 'sales', '2', 'details:sale_date', '2025-01-16'

 

put 'sales', '3', 'details:product_id', '1003'

put 'sales', '3', 'details:product_name', 'Tablet'

put 'sales', '3', 'details:region', 'East'

put 'sales', '3', 'details:sales_amount', '500'

put 'sales', '3', 'details:sale_date', '2025-01-17'

 

put 'sales', '4', 'details:product_id', '1001'

put 'sales', '4', 'details:product_name', 'Laptop'

put 'sales', '4', 'details:region', 'West'

put 'sales', '4', 'details:sales_amount', '1100'

put 'sales', '4', 'details:sale_date', '2025-01-17'

This inserts four records, each representing a sale with product, region, and sales amount information.
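Typing many put commands interactively is error-prone. One common pattern (a sketch; the script name sales_load.rb is an assumption, not from the lab) is to generate the put commands into a file and feed that file to the HBase shell non-interactively:

```shell
# Generate the 20 put commands from a compact record list.
# Field order: row_key product_id product_name region sales_amount sale_date
> sales_load.rb
while read -r key pid pname region amount date; do
  for col in product_id:"$pid" product_name:"$pname" region:"$region" \
             sales_amount:"$amount" sale_date:"$date"; do
    printf "put 'sales', '%s', 'details:%s', '%s'\n" \
           "$key" "${col%%:*}" "${col#*:}" >> sales_load.rb
  done
done <<'EOF'
1 1001 Laptop North 1200 2025-01-15
2 1002 Smartphone South 800 2025-01-16
3 1003 Tablet East 500 2025-01-17
4 1001 Laptop West 1100 2025-01-17
EOF

# Run it against a live cluster (shown as a comment for reference):
#   hbase shell sales_load.rb
```

This produces the same four rows as the interactive puts above, one script line per cell.
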


Step 2: Querying Data with HBase Shell

You can retrieve or query data directly from HBase using the HBase shell.

2.1 Retrieve a Single Record

To retrieve data for a specific row, use the get command:


get 'sales', '1'

This will return the sale data for the record with row key 1.

2.2 Scan the Entire Table

To scan all the rows in the sales table, use the scan command:

scan 'sales'

This will display all the records from the sales table.
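Since this lab does analytics without Spark, simple aggregations can be done by exporting scan output and post-processing it with standard Unix tools. A minimal sketch, assuming the region and sales_amount cells have been pulled into a local TSV (the file below is fabricated here to mirror the four rows inserted above):

```shell
# Assumed export format: region<TAB>sales_amount, one line per row key
printf 'North\t1200\nSouth\t800\nEast\t500\nWest\t1100\n' > region_sales.tsv

# Group-by-region totals with awk (each region has a single sale here,
# so the totals equal the individual amounts)
awk -F'\t' '{sum[$1] += $2} END {for (r in sum) print r, sum[r]}' \
    region_sales.tsv | sort
```
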

2.3 Scan with Filters

You can also apply filters to your queries. For example, to get all records where product_name is Laptop, use:

scan 'sales', {FILTER => "SingleColumnValueFilter('details', 'product_name', =, 'binary:Laptop')"}

This scans for all sales where the product_name is Laptop.
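The same filter can also be approximated client-side once the data is exported. A sketch using a local TSV dump (fabricated here to mirror the table contents), counting and totaling the Laptop sales with awk:

```shell
# Assumed dump format: row_key<TAB>product_name<TAB>sales_amount
printf '1\tLaptop\t1200\n2\tSmartphone\t800\n3\tTablet\t500\n4\tLaptop\t1100\n' \
    > sales_dump.tsv

# Select rows where product_name is Laptop, mirroring the
# SingleColumnValueFilter above but computed client-side.
# Prints: 2 Laptop sales totaling 2300
awk -F'\t' '$2 == "Laptop" {n++; total += $3}
            END {print n " Laptop sales totaling " total}' sales_dump.tsv
```
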



-----------------------------------------------------------------
