Search This Blog

Sunday, 2 March 2025

BIGLAB-HABSE-1

 

Create an HBase Table for Orders

Let’s create a table named orders to store the order data. The table will have the following columns:

  • Row Key: order_id (unique identifier for the order)
  • Column Family 1: order_info
    • customer_id: Customer who made the order
    • product: The product ordered
    • quantity: Quantity of the product ordered
    • price: Price per unit of the product
    • order_date: Date when the order was placed
    • status: Status of the order (e.g., Shipped, Pending)

Create the Table:


create 'orders', 'order_info'

This creates a table named orders with a single column family order_info.


Step 3: Insert Data into the Table

Now, let's insert some sample data into the orders table.

Sample Orders Data:

  • Order 1: order_id = 1, customer_id = C123, product = Laptop, quantity = 1, price = 1200, order_date = 2023-03-01, status = Shipped
  • Order 2: order_id = 2, customer_id = C124, product = Phone, quantity = 2, price = 600, order_date = 2023-03-02, status = Pending
  • Order 3: order_id = 3, customer_id = C125, product = Tablet, quantity = 3, price = 400, order_date = 2023-03-02, status = Shipped

Insert Data:

You can insert the above data into HBase using the following commands in the HBase shell:

put 'orders', '1', 'order_info:customer_id', 'C123'
put 'orders', '1', 'order_info:product', 'Laptop' put 'orders', '1', 'order_info:quantity', '1' put 'orders', '1', 'order_info:price', '1200' put 'orders', '1', 'order_info:order_date', '2023-03-01' put 'orders', '1', 'order_info:status', 'Shipped' put 'orders', '2', 'order_info:customer_id', 'C124' put 'orders', '2', 'order_info:product', 'Phone' put 'orders', '2', 'order_info:quantity', '2' put 'orders', '2', 'order_info:price', '600' put 'orders', '2', 'order_info:order_date', '2023-03-02' put 'orders', '2', 'order_info:status', 'Pending' put 'orders', '3', 'order_info:customer_id', 'C125' put 'orders', '3', 'order_info:product', 'Tablet' put 'orders', '3', 'order_info:quantity', '3' put 'orders', '3', 'order_info:price', '400' put 'orders', '3', 'order_info:order_date', '2023-03-02' put 'orders', '3', 'order_info:status', 'Shipped'

Step 4: Perform Query Analysis on the Data

Once the data is inserted into HBase, we can start querying and performing some analysis. Here are some example queries you might run.

1. Query a Single Order by order_id

To retrieve all the information for the order with order_id = 1, you can use the following command:


get 'orders', '1'

This will return the data for order 1:


COLUMN CELL order_info:customer_id timestamp=..., value=C123 order_info:product timestamp=..., value=Laptop order_info:quantity timestamp=..., value=1 order_info:price timestamp=..., value=1200 order_info:order_date timestamp=..., value=2023-03-01 order_info:status timestamp=..., value=Shipped

2. Query All Orders for a Specific Product

To find all orders that contain the product "Laptop", you would have to scan the table and filter by the product column. While HBase does not support direct SQL-style queries, you can scan through the rows and filter based on the product column.


scan 'orders', {FILTER => "SingleColumnValueFilter('order_info', 'product', =, 'binary:Laptop')"}

This command scans the orders table and filters rows where the product column contains the value "Laptop".

3. Get the Total Quantity of Products Ordered

To get the total quantity of products ordered, you can scan the table and sum the values of the quantity column:

scan 'orders'

Then, you would manually sum the quantity values in the output (in this case, 1 + 2 + 3 = 6).

4. Query Orders Based on order_date Range

To query for orders placed after a certain date, you will have to manually check the order_date field during the scan or use a more advanced approach like HBase Filters. For example:


scan 'orders', {FILTER => "SingleColumnValueFilter('order_info', 'order_date', >, 'binary:2023-03-01')"}

This query would return all orders placed after 2023-03-01.

5. Get Count of Orders by status (e.g., Pending, Shipped)

HBase doesn't support direct aggregation like SQL databases, but you can scan the table and count how many orders have a certain status. You would filter rows based on the status column:


scan 'orders', {FILTER => "SingleColumnValueFilter('order_info', 'status', =, 'binary:Shipped')"}

This would return all orders with the status "Shipped".



No comments:

Post a Comment

Hadoop Analytics

BIGLAB-HABSE-1

  Create an HBase Table for Orders Let’s create a table named orders to store the order data. The table will have the following columns: Ro...