PART 1: Word Count in Hadoop (Python – Hadoop Streaming)
Step 1: Create Input File
nano input.txt
Example content:
hello world
hello hadoop
big data hadoop
hello spark
Step 2: Create Mapper
nano mapper.py
mapper.py
#!/usr/bin/env python3
import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print(f"{word}\t1")
Make executable:
chmod +x mapper.py
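To see what the mapper will emit before touching Hadoop, the same logic can be run on the sample input in plain Python; the sketch below mirrors mapper.py's loop:

```python
# Mirrors mapper.py: emit "word\t1" for every word in the sample input.
sample = ["hello world", "hello hadoop", "big data hadoop", "hello spark"]

pairs = []
for line in sample:
    for word in line.strip().split():
        pairs.append(f"{word}\t1")

print(pairs[:3])  # → ['hello\t1', 'world\t1', 'hello\t1']
```

Each word becomes one tab-separated "word, 1" record; Hadoop's shuffle phase will later group these records by key before they reach the reducer.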
Step 3: Create Reducer
nano reducer.py
reducer.py
#!/usr/bin/env python3
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    line = line.strip()
    word, count = line.split("\t")
    count = int(count)
    if word == current_word:
        current_count += count
    else:
        if current_word:
            print(f"{current_word}\t{current_count}")
        current_word = word
        current_count = count

if current_word:
    print(f"{current_word}\t{current_count}")
Make executable:
chmod +x reducer.py
🧪 Test Locally
cat input.txt | ./mapper.py | sort | ./reducer.py
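The shell pipeline above can also be simulated in pure Python to check the expected result; here sorted() stands in for the sort step and a dict accumulates counts the way the reducer does:

```python
# Simulate `mapper | sort | reducer` on the sample input from Step 1.
sample = ["hello world", "hello hadoop", "big data hadoop", "hello spark"]

# Map: emit (word, 1) pairs, then sort by key, as `sort` would.
pairs = sorted((word, 1) for line in sample for word in line.split())

# Reduce: sum the counts for consecutive identical keys.
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

print(counts)  # → {'big': 1, 'data': 1, 'hadoop': 2, 'hello': 3, 'spark': 1, 'world': 1}
```

The sort step matters: reducer.py only compares each key to the previous one, so it is correct only when identical keys arrive consecutively, which is exactly what `sort` (and Hadoop's shuffle) guarantees.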
Run in Hadoop
Upload to HDFS
hdfs dfs -mkdir /wordcount
hdfs dfs -put input.txt /wordcount
Run Streaming Job
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming*.jar \
  -files mapper.py,reducer.py \
  -input /wordcount/input.txt \
  -output /wordcount_output \
  -mapper mapper.py \
  -reducer reducer.py
(The generic -files option ships both scripts to every node; on older Hadoop releases use the deprecated -file mapper.py -file reducer.py form instead.)
View Output
hdfs dfs -cat /wordcount_output/part-00000
For the sample input, the output should resemble:
big	1
data	1
hadoop	2
hello	3
spark	1
world	1
PART 2: Install Apache Spark on Ubuntu
Step 1: Download Spark
Go to the official Apache Spark website and download the latest release (pre-built for Hadoop).
Or use terminal:
wget https://downloads.apache.org/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
(Adjust version if needed.)
Step 2: Extract Spark
tar -xvzf spark-3.5.1-bin-hadoop3.tgz
sudo mv spark-3.5.1-bin-hadoop3 /opt/spark
Step 3: Set Environment Variables
Open:
nano ~/.bashrc
Add:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Apply:
source ~/.bashrc
Step 4: Verify Spark
spark-shell
If Spark starts successfully → Installation successful ✅
Exit:
:quit
PART 3: Word Count in Spark (PySpark)
Method 1: Using PySpark Script
Create file:
nano spark_wordcount.py
spark_wordcount.py
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("WordCount") \
    .getOrCreate()

sc = spark.sparkContext

# Read input file
text_file = sc.textFile("input.txt")

# Perform word count
word_counts = (text_file
    .flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b))

# Save output
word_counts.saveAsTextFile("spark_output")

spark.stop()
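For readers new to the RDD API, the three transformations can be mimicked with ordinary Python on the sample data; this is a sketch of the semantics, not Spark itself:

```python
# Pure-Python analogue of the flatMap / map / reduceByKey chain above.
lines = ["hello world", "hello hadoop", "big data hadoop", "hello spark"]

# flatMap: split each line into words, flattening everything into one sequence.
words = [w for line in lines for w in line.split()]

# map: pair each word with an initial count of 1.
pairs = [(w, 1) for w in words]

# reduceByKey: combine values that share a key using `a + b`.
counts = {}
for w, n in pairs:
    counts[w] = counts.get(w, 0) + n

print(sorted(counts.items()))
# → [('big', 1), ('data', 1), ('hadoop', 2), ('hello', 3), ('spark', 1), ('world', 1)]
```

The difference in Spark is that each step runs in parallel across partitions, and reduceByKey combines counts locally on each partition before shuffling, so far less data crosses the network than in the naive Hadoop Streaming version.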
Run Spark Word Count
spark-submit spark_wordcount.py
View Output
cat spark_output/part-*
(Spark writes one part file per partition, so there may be more than just part-00000; the glob catches them all.)