Prepare sample data:

Create a tab-separated file (data.tsv) in a local directory, then copy it into an HDFS directory:

row1	Family1:col1	Family1:col2	Family2:col1
row2	Family1:col1	Family1:col2	Family2:col1
row3	Family1:col1	Family1:col2	Family2:col1

hadoop fs -mkdir data
hadoop fs -put data.tsv data

Create the HBase table:

hbase shell> create 'my_table', 'Family1', 'Family2'

Run ImportTsv to load the TSV file from HDFS into the table:

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,Family1:col1,Family1:col2,Family2:col1" my_table /user/training/data/inputfile.tsv
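ImportTsv maps each tab-separated field to the column named at the same position in -Dimporttsv.columns, with HBASE_ROW_KEY marking which field becomes the row key. A minimal Python sketch of that mapping (the sample values are illustrative; this is not the tool's actual implementation):

```python
# Sketch of how ImportTsv maps TSV fields to HBase cells.
# The column spec mirrors the -Dimporttsv.columns argument above.
columns = ["HBASE_ROW_KEY", "Family1:col1", "Family1:col2", "Family2:col1"]

def tsv_line_to_cells(line):
    """Return (row_key, {family:qualifier: value}) for one TSV line."""
    fields = line.rstrip("\n").split("\t")
    row_key, cells = None, {}
    for spec, value in zip(columns, fields):
        if spec == "HBASE_ROW_KEY":
            row_key = value          # this field keys the whole row
        else:
            cells[spec] = value      # everything else becomes a cell
    return row_key, cells

row_key, cells = tsv_line_to_cells("row1\tv11\tv12\tv21")
print(row_key)                 # row1
print(cells["Family1:col2"])   # v12
```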
----------------------------------------------------------------------------------------------------------------
Let's assume we have a sales dataset with the following columns:
- transaction_id (Row Key)
- product_id (Product ID)
- product_name (Product Name)
- region (Region where the sale happened)
- sales_amount (Amount of Sale)
- sale_date (Date of Transaction)
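In HBase terms, each transaction becomes one row keyed by transaction_id, and the remaining columns become cells addressed as family:qualifier inside a column family. A small Python dict can model one such record (the values shown are the sample data used throughout this post):

```python
# One sales record modeled the way HBase stores it:
# a row key plus family:qualifier -> value cells.
record = {
    "row_key": "1",                # transaction_id is the row key
    "cells": {
        "details:product_id": "1001",
        "details:product_name": "Laptop",
        "details:region": "North",
        "details:sales_amount": "1200",
        "details:sale_date": "2025-01-15",
    },
}
print(record["cells"]["details:region"])  # North
```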
Steps for HBase Big Data Analytics Without Spark:
Step 1: Create and Load Sample Data into HBase
1.1 Create Sales Table in HBase
In the HBase shell, we first need to create a table to store the sales data.
hbase shell
create 'sales', 'details'
This creates a table called sales with a single column family, details.
1.2 Load Sample Sales Data into HBase
Now, let's load some sample sales data into this table using the put command in the HBase shell.
put 'sales', '1', 'details:product_id', '1001'
put 'sales', '1', 'details:product_name', 'Laptop'
put 'sales', '1', 'details:region', 'North'
put 'sales', '1', 'details:sales_amount', '1200'
put 'sales', '1', 'details:sale_date', '2025-01-15'
put 'sales', '2', 'details:product_id', '1002'
put 'sales', '2', 'details:product_name', 'Smartphone'
put 'sales', '2', 'details:region', 'South'
put 'sales', '2', 'details:sales_amount', '800'
put 'sales', '2', 'details:sale_date', '2025-01-16'
put 'sales', '3', 'details:product_id', '1003'
put 'sales', '3', 'details:product_name', 'Tablet'
put 'sales', '3', 'details:region', 'East'
put 'sales', '3', 'details:sales_amount', '500'
put 'sales', '3', 'details:sale_date', '2025-01-17'
put 'sales', '4', 'details:product_id', '1001'
put 'sales', '4', 'details:product_name', 'Laptop'
put 'sales', '4', 'details:region', 'West'
put 'sales', '4', 'details:sales_amount', '1100'
put 'sales', '4', 'details:sale_date', '2025-01-17'
This inserts four records, each representing a sale with product, region, and sales amount information.
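Typing each put by hand gets tedious; the same four records can be generated from plain data. A hedged Python sketch that simply prints the equivalent HBase shell put commands (it does not connect to HBase; programmatic access would need a client library):

```python
# Generate HBase shell `put` commands for the sample sales records.
records = [
    ("1", "1001", "Laptop",     "North", "1200", "2025-01-15"),
    ("2", "1002", "Smartphone", "South", "800",  "2025-01-16"),
    ("3", "1003", "Tablet",     "East",  "500",  "2025-01-17"),
    ("4", "1001", "Laptop",     "West",  "1100", "2025-01-17"),
]
qualifiers = ["product_id", "product_name", "region", "sales_amount", "sale_date"]

def put_commands(rows):
    """One `put` per cell: 4 rows x 5 qualifiers = 20 commands."""
    cmds = []
    for row_key, *values in rows:
        for qual, val in zip(qualifiers, values):
            cmds.append(f"put 'sales', '{row_key}', 'details:{qual}', '{val}'")
    return cmds

for cmd in put_commands(records):
    print(cmd)
```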
Step 2: Querying Data with HBase Shell
You can retrieve or query data directly from HBase using the HBase shell.
2.1 Retrieve a Single Record
To retrieve data for a specific row, use the get command:
get 'sales', '1'
This will return the sale data for the record with row key 1.
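get is a point lookup by row key. Its behavior can be sketched in Python against an in-memory stand-in for the table (the table dict below is illustrative, holding just two of the sample rows):

```python
# In-memory stand-in for the 'sales' table: row key -> cells.
sales = {
    "1": {"details:product_name": "Laptop",     "details:region": "North"},
    "2": {"details:product_name": "Smartphone", "details:region": "South"},
}

def get(table, row_key):
    """Mimic `get 'sales', '<row_key>'`: return that row's cells, if any."""
    return table.get(row_key, {})

print(get(sales, "1")["details:region"])  # North
```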
2.2 Scan the Entire Table
To scan all the rows in the sales table, use the scan command:
scan 'sales'
This will display all the records from the sales table.
2.3 Scan with Filters
You can also apply filters to your queries. For example, to get all records where product_name is Laptop, use:
scan 'sales', {FILTER => "SingleColumnValueFilter('details', 'product_name', =, 'binary:Laptop')"}
This scans for all sales where the product_name is Laptop.
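SingleColumnValueFilter keeps only the rows whose named column compares equal (here, binary equality) to the given value. A Python sketch of that selection logic over in-memory rows (a simplification of the real server-side filter, shown with two columns per row for brevity):

```python
# Rows as row_key -> cells, mirroring the sample sales data.
rows = {
    "1": {"details:product_name": "Laptop",     "details:region": "North"},
    "2": {"details:product_name": "Smartphone", "details:region": "South"},
    "3": {"details:product_name": "Tablet",     "details:region": "East"},
    "4": {"details:product_name": "Laptop",     "details:region": "West"},
}

def scan_with_filter(table, column, value):
    """Keep rows where `column` exists and equals `value` exactly."""
    return {k: cells for k, cells in table.items()
            if cells.get(column) == value}

matches = scan_with_filter(rows, "details:product_name", "Laptop")
print(sorted(matches))  # ['1', '4']
```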
-----------------------------------------------------------------