STEP 1 — Create Sample Dataset (On Linux)
Create emp1.csv:
vi emp1.csv
Paste:
employee_id,name,department,salary
1,John Doe,Engineering,70000
2,Jane Smith,Marketing,60000
3,Jim Brown,Engineering,80000
4,Jake White,Sales,50000
5,Emily Davis,Marketing,75000
Save and exit.
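If you prefer not to use an editor, the same file can be written with a shell here-document. This is a minimal sketch that assumes you are in the directory where emp1.csv should live; the contents are exactly the rows listed above.

```shell
# Write the sample dataset without opening an editor.
cat > emp1.csv <<'EOF'
employee_id,name,department,salary
1,John Doe,Engineering,70000
2,Jane Smith,Marketing,60000
3,Jim Brown,Engineering,80000
4,Jake White,Sales,50000
5,Emily Davis,Marketing,75000
EOF

# Expect 6 lines: 1 header row + 5 data rows.
wc -l < emp1.csv
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the block, so the rows are written verbatim.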
Start Hue and open it in a browser:
http://localhost:8088
Log in with:
Username: training
Password: training
STEP 2 — Upload File to HDFS
Option A (Using Command Line)
hadoop fs -mkdir -p /user/hue
hadoop fs -put emp1.csv /user/hue/
hadoop fs -ls /user/hue
You should see emp1.csv listed.
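The upload can also be checked from a script instead of reading the listing by eye. The sketch below assumes the hadoop CLI is on your PATH; check_hdfs_file is a hypothetical helper name introduced only for this example.

```shell
# check_hdfs_file is a hypothetical helper, not part of Hadoop.
# `hadoop fs -test -e PATH` exits with status 0 when PATH exists in HDFS.
check_hdfs_file() {
  if hadoop fs -test -e "$1"; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Only run the check on machines where the hadoop CLI is installed.
if command -v hadoop >/dev/null 2>&1; then
  check_hdfs_file /user/hue/emp1.csv || true
else
  echo "hadoop CLI not found; skipping HDFS check"
fi
```

Because the helper returns a non-zero status when the file is missing, it can be used to stop a longer setup script before the Hive steps run.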
Option B (Using Hue UI)
- Open Hue
- Go to File Browser
- Click Upload
- Select emp1.csv
- Upload to: /user/hue/
✅ STEP 3 — Create Hive Table in Hue
Go to:
Hue → Query Editors → Hive
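CREATE TABLE emp1 (
  employee_id INT,
  name STRING,
  department STRING,
  salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
-- Skip the CSV header line so it is not loaded as a data row
TBLPROPERTIES ('skip.header.line.count'='1');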
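Then load the data:
LOAD DATA INPATH '/user/hue/emp1.csv'
INTO TABLE emp1;
Note: LOAD DATA INPATH moves the file from /user/hue/ into the table's warehouse directory, so it no longer appears in the earlier listing.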
✅ STEP 4 — Verify Data
Run:
SELECT * FROM emp1;
You should see 5 rows.
✅ STEP 5 — Analysis Queries
1️⃣ Count Employees by Department
SELECT department, COUNT(*) AS employee_count
FROM emp1
GROUP BY department;
2️⃣ Average Salary by Department
SELECT department, AVG(salary) AS average_salary
FROM emp1
GROUP BY department;
3️⃣ Salary Greater Than 60000
SELECT name, salary
FROM emp1
WHERE salary > 60000;
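To know what the three queries above should return, you can compute the same figures locally with awk on a copy of the sample file. This is only a cross-check sketch and does not touch Hive; the function names count_by_dept, avg_salary_by_dept and high_earners are hypothetical and exist only in this example.

```shell
# Work on a throwaway local copy of the sample data.
workdir=$(mktemp -d)
cat > "$workdir/emp1.csv" <<'EOF'
employee_id,name,department,salary
1,John Doe,Engineering,70000
2,Jane Smith,Marketing,60000
3,Jim Brown,Engineering,80000
4,Jake White,Sales,50000
5,Emily Davis,Marketing,75000
EOF

# Query 1: employees per department (NR > 1 skips the header row).
count_by_dept() {
  awk -F',' 'NR > 1 { n[$3]++ } END { for (d in n) print d, n[d] }' "$1" | sort
}

# Query 2: average salary per department.
avg_salary_by_dept() {
  awk -F',' 'NR > 1 { s[$3] += $4; n[$3]++ } END { for (d in n) print d, s[d] / n[d] }' "$1" | sort
}

# Query 3: employees whose salary is strictly greater than 60000.
high_earners() {
  awk -F',' 'NR > 1 && $4 > 60000 { print $2, $4 }' "$1"
}

count_by_dept "$workdir/emp1.csv"
avg_salary_by_dept "$workdir/emp1.csv"
high_earners "$workdir/emp1.csv"
```

For the sample data this gives 2 employees in Engineering, 2 in Marketing and 1 in Sales; averages of 75000, 67500 and 50000 respectively; and three high earners (John Doe, Jim Brown, Emily Davis). Hive may display the averages as floating-point values, but the numbers should match.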
✅ STEP 6 — Generate Reports
After running a query in Hue:
- Results appear below the query editor
- Click Download
- Export as:
  - CSV
  - Excel
This becomes your report.
✅ STEP 7 — Create Dashboard in Hue
- Go to Dashboards
- Click New Dashboard
- Add a widget
- Select your saved Hive query
- Choose:
  - Bar Chart → Employee count by department
  - Pie Chart → Salary distribution
Save the dashboard.
You now have a visual report.
✅ OPTIONAL — Using Impala (Faster Queries)
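In Hue:
Query Editors → Impala
Run the same queries. Impala usually returns results faster than Hive for interactive queries like these.
If emp1 is not visible in Impala, run INVALIDATE METADATA emp1; first so that Impala picks up the table created in Hive.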