
Tuesday, 3 March 2026

AI & DS MONGODB EXPERIMENT IN CLOUD ERA

Remove any previous installation:

sudo rm -rf /usr/local/mongodb

Download, extract, and move MongoDB to /usr/local:

wget http://fastdl.mongodb.org/linux/mongodb-linux-i686-2.6.12.tgz

tar -xvzf mongodb-linux-i686-2.6.12.tgz

sudo mv mongodb-linux-i686-2.6.12 /usr/local/mongodb

Verify the binary:

file /usr/local/mongodb/bin/mongo

Start MongoDB Server

First, create the data directory (first time only):

sudo mkdir -p /data/db
sudo chmod -R 777 /data



Then start the server:

/usr/local/mongodb/bin/mongod --dbpath /data/db


You will see logs like:

waiting for connections on port 27017


Open Mongo Shell (New Terminal)

Open a new terminal window and run:

/usr/local/mongodb/bin/mongo

You will see:

>

> show dbs

> use trainingdb

db.students.insert([

 {name:"Ravi", age:22, course:"Hadoop", marks:75},

 {name:"Anita", age:21, course:"MongoDB", marks:85},

 {name:"Suresh", age:23, course:"Hadoop", marks:65},

 {name:"Priya", age:22, course:"Spark", marks:90},

 {name:"Kiran", age:24, course:"MongoDB", marks:70},

 {name:"Meena", age:21, course:"Hadoop", marks:88},

 {name:"Arun", age:23, course:"Spark", marks:60}

])


>

COUNT Function

Count total documents:

db.students.count()

Count with condition:

db.students.count({course:"Hadoop"})
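The count logic can be sketched in plain Python over the same seven sample documents. This is only a sketch of the query semantics, not a client connecting to mongod:

```python
# Sample documents mirroring the students collection above.
students = [
    {"name": "Ravi", "age": 22, "course": "Hadoop", "marks": 75},
    {"name": "Anita", "age": 21, "course": "MongoDB", "marks": 85},
    {"name": "Suresh", "age": 23, "course": "Hadoop", "marks": 65},
    {"name": "Priya", "age": 22, "course": "Spark", "marks": 90},
    {"name": "Kiran", "age": 24, "course": "MongoDB", "marks": 70},
    {"name": "Meena", "age": 21, "course": "Hadoop", "marks": 88},
    {"name": "Arun", "age": 23, "course": "Spark", "marks": 60},
]

# db.students.count() -- total documents
total = len(students)

# db.students.count({course:"Hadoop"}) -- documents matching a condition
hadoop_total = sum(1 for s in students if s["course"] == "Hadoop")

print(total)         # 7
print(hadoop_total)  # 3
```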

SORT Function

Sort by marks ascending:

db.students.find().sort({marks:1})

Sort by marks descending:

db.students.find().sort({marks:-1})

Sort by age:

db.students.find().sort({age:1})

LIMIT Function

Top 3 students:

db.students.find().limit(3)

Top 3 highest marks:

db.students.find().sort({marks:-1}).limit(3)

SKIP Function

Skip first 2 records:

db.students.find().skip(2)

Pagination example (skip 2, show next 3):

db.students.find().skip(2).limit(3)
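Sort, skip, and limit are just ordering and slicing. A minimal plain-Python sketch over the same sample data (only the fields needed here), mirroring the shell queries above:

```python
# Names and marks from the sample students collection.
students = [
    {"name": "Ravi", "marks": 75},
    {"name": "Anita", "marks": 85},
    {"name": "Suresh", "marks": 65},
    {"name": "Priya", "marks": 90},
    {"name": "Kiran", "marks": 70},
    {"name": "Meena", "marks": 88},
    {"name": "Arun", "marks": 60},
]

# db.students.find().sort({marks:-1}).limit(3) -- top 3 by marks
top3 = sorted(students, key=lambda s: s["marks"], reverse=True)[:3]

# db.students.find().skip(2).limit(3) -- skip 2 records, show next 3
page = students[2:2 + 3]

print([s["name"] for s in top3])  # ['Priya', 'Meena', 'Anita']
print([s["name"] for s in page])  # ['Suresh', 'Priya', 'Kiran']
```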

AGGREGATE Function (Important)

Examples using the aggregation framework:


🔹 Total number of students per course

db.students.aggregate([
{ $group: { _id: "$course", totalStudents: { $sum: 1 } } }
])

🔹 Average marks per course

db.students.aggregate([
{ $group: { _id: "$course", avgMarks: { $avg: "$marks" } } }
])

🔹 Maximum marks per course

db.students.aggregate([
{ $group: { _id: "$course", maxMarks: { $max: "$marks" } } }
])

🔹 Minimum marks per course

db.students.aggregate([
{ $group: { _id: "$course", minMarks: { $min: "$marks" } } }
])

🔹 Sort after aggregation

db.students.aggregate([
{ $group: { _id: "$course", avgMarks: { $avg: "$marks" } } },
{ $sort: { avgMarks: -1 } }
])
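What the $group / $avg / $sort pipeline computes can be checked by hand. A plain-Python sketch over the sample data (not the aggregation framework itself, just the equivalent arithmetic):

```python
from collections import defaultdict

# (course, marks) pairs from the sample students collection.
students = [
    ("Hadoop", 75), ("MongoDB", 85), ("Hadoop", 65), ("Spark", 90),
    ("MongoDB", 70), ("Hadoop", 88), ("Spark", 60),
]

# $group: {_id: "$course", avgMarks: {$avg: "$marks"}}
by_course = defaultdict(list)
for course, marks in students:
    by_course[course].append(marks)
avg_marks = {course: sum(m) / len(m) for course, m in by_course.items()}

# $sort: {avgMarks: -1}
ranked = sorted(avg_marks.items(), key=lambda kv: kv[1], reverse=True)

print(ranked)  # [('MongoDB', 77.5), ('Hadoop', 76.0), ('Spark', 75.0)]
```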

STOP MONGODB

If you started MongoDB like this:

/usr/local/mongodb/bin/mongod --dbpath /data/db

Then simply go to that terminal and press:

CTRL + C


AI & DS PIG EXPERIMENT

First, create these files in your terminal using the cat command (press Ctrl+D after the last line to finish):


cat > employees.csv
1,John Doe,101,50000
2,Jane Smith,102,60000
3,Jim Brown,101,55000
4,Jake White,103,70000

cat > departments.csv
101,HR
102,Engineering
103,Marketing

(Do not put spaces after the commas: PigStorage(',') does not trim fields, so a value like " 101" would fail the int cast and load as null.)



Upload the files to HDFS:

> hadoop fs -put employees.csv emp.csv

> hadoop fs -put departments.csv dept.csv






> vi pig1.pig


employees = LOAD 'emp.csv' USING PigStorage(',') AS (employee_id:int, name:chararray,

     department_id:int, salary:float);

dump employees;


> pig pig1.pig


> vi pig2.pig


departments = LOAD 'dept.csv' USING PigStorage(',') AS (department_id:int,
     department_name:chararray);

dump departments;


> pig pig2.pig



> vi pig3.pig

employees = LOAD 'emp.csv' USING PigStorage(',') AS (employee_id:int, name:chararray,
     department_id:int, salary:float);

high_salary_employees = FILTER employees BY salary > 55000;

dump high_salary_employees;


> pig pig3.pig

TRY these scripts (they assume the employees and departments relations from the steps above are already loaded in the same script):

-- Project only the name and salary of high salary employees
projected_employees = FOREACH high_salary_employees GENERATE name, salary;

-- Group employees by department_id
grouped_employees = GROUP employees BY department_id;

-- Join employees with departments to get department names
joined_data = JOIN employees BY department_id, departments BY department_id;

-- Sort employees by salary in descending order
sorted_employees = ORDER employees BY salary DESC;

-- Store the results of the sorted employees
STORE sorted_employees INTO 'sorted_employees_output' USING PigStorage(',');
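To reason about what these relational operators should return, the FILTER, JOIN, and ORDER steps can be sketched in plain Python over the same tuples (this is only a hand check of expected results, not Pig itself):

```python
# Tuples matching emp.csv and dept.csv above.
employees = [
    (1, "John Doe", 101, 50000.0),
    (2, "Jane Smith", 102, 60000.0),
    (3, "Jim Brown", 101, 55000.0),
    (4, "Jake White", 103, 70000.0),
]
departments = [(101, "HR"), (102, "Engineering"), (103, "Marketing")]

# FILTER employees BY salary > 55000
high_salary = [e for e in employees if e[3] > 55000]

# JOIN employees BY department_id, departments BY department_id
dept_names = dict(departments)
joined = [(eid, name, dept_id, sal, dept_names[dept_id])
          for eid, name, dept_id, sal in employees]

# ORDER employees BY salary DESC
sorted_emps = sorted(employees, key=lambda e: e[3], reverse=True)

print([e[1] for e in high_salary])  # ['Jane Smith', 'Jake White']
print(sorted_emps[0][1])            # Jake White
```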


AI & DS HIVE EXPERIMENT

Open a terminal:


>hive


> CREATE DATABASE IF NOT EXISTS company_db;

> USE company_db;


Open a new terminal:


> vi hivescript1.ql


use company_db;

CREATE TABLE IF NOT EXISTS employees (
   employee_id INT,
   name STRING,
   department_id INT,
   salary FLOAT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

> hive -S -f hivescript1.ql


> vi hivescript2.ql


use company_db;

CREATE TABLE IF NOT EXISTS departments (
   department_id INT,
   department_name STRING
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;


> hive -S -f hivescript2.ql


cat > emp.txt

1,John,101,50000
2,Anita,101,45000
3,Ravi,102,65000
4,Meena,103,55000
5,Kiran,104,40000


cat > department.txt

101,hr
102,mark
103,fin


Go back to the Hive shell:


hive> load data local inpath 'emp.txt' into table employees;

hive> load data local inpath 'department.txt' into table departments;


-- Add a new column to the employees table

        ALTER TABLE employees ADD COLUMNS (hire_date DATE);

 

        -- Create a view to show high salary employees

        CREATE VIEW IF NOT EXISTS high_salary_employees AS

        SELECT name, salary

        FROM employees

        WHERE salary > 60000;

 

        -- Create an index on the employees table for the department_id column

        CREATE INDEX idx_department_id

        ON TABLE employees (department_id)

  AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'

     WITH DEFERRED REBUILD;

 

-- Drop the view
DROP VIEW IF EXISTS high_salary_employees;

-- Drop the index (do this before dropping the employees table it belongs to)
DROP INDEX IF EXISTS idx_department_id ON employees;

-- Drop the employees table
DROP TABLE IF EXISTS employees;

-- Drop the departments table
DROP TABLE IF EXISTS departments;

 

     -- Drop the database

     DROP DATABASE IF EXISTS company_db CASCADE;


Monday, 2 March 2026

load sample data into hive

 

Step 1: Create Sample Delimited Text File

We’ll create a simple employee dataset.

📄 Create a text file in Linux

vi employee.txt

Add sample data (comma-delimited):

1,John,IT,50000
2,Anita,HR,45000
3,Ravi,Finance,60000
4,Meena,IT,55000
5,Kiran,Sales,40000

Save and exit.


✅ Step 2: Move File to HDFS

Create HDFS directory

hadoop fs -mkdir -p /user/hive/employee_data

Upload file to HDFS

hadoop fs -put employee.txt /user/hive/employee_data/

Verify

hadoop fs -ls /user/hive/employee_data

✅ Step 3: Start Hive

hive

✅ Step 4: Create Hive Database (Optional)

CREATE DATABASE company;
USE company;

✅ Step 5: Create Hive Table (Row Formatted Delimited)

CREATE TABLE employee (
id INT,
name STRING,
department STRING,
salary INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

✅ Step 6: Load Data from HDFS into Hive Table

LOAD DATA INPATH '/user/hive/employee_data/employee.txt'
INTO TABLE employee;

Note: LOAD DATA INPATH moves (not copies) the file from /user/hive/employee_data into the table's warehouse directory.

✅ Step 7: Verify Data

SELECT * FROM employee;
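The ROW FORMAT DELIMITED ... FIELDS TERMINATED BY ',' table simply splits each text line on commas and casts the fields to the declared types. A minimal Python sketch of that parsing, using an inline copy of employee.txt (in the lab the file lives in HDFS):

```python
import csv
import io

# Inline copy of the employee.txt created in Step 1.
employee_txt = """\
1,John,IT,50000
2,Anita,HR,45000
3,Ravi,Finance,60000
4,Meena,IT,55000
5,Kiran,Sales,40000
"""

# Split on the field terminator and cast to the table's column types.
rows = []
for rec in csv.reader(io.StringIO(employee_txt)):
    rows.append({"id": int(rec[0]), "name": rec[1],
                 "department": rec[2], "salary": int(rec[3])})

# Roughly what SELECT * FROM employee; returns.
print(len(rows))        # 5
print(rows[0]["name"])  # John
```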

hive lab manual

 

Use Hive to create, alter, and drop databases, tables, views, functions, and indexes.

 

        -- Create a new database

        CREATE DATABASE IF NOT EXISTS company_db;

 

        -- Switch to the newly created database

        USE company_db;

 

        -- Create an employees table

        CREATE TABLE IF NOT EXISTS employees (

           employee_id INT,

           name STRING,

           department_id INT,

           salary FLOAT

        ) ROW FORMAT DELIMITED

        FIELDS TERMINATED BY ','

        STORED AS TEXTFILE;

 

        -- Create a departments table

        CREATE TABLE IF NOT EXISTS departments (

           department_id INT,

           department_name STRING

        ) ROW FORMAT DELIMITED

        FIELDS TERMINATED BY ','

        STORED AS TEXTFILE;

 

        -- Add a new column to the employees table

        ALTER TABLE employees ADD COLUMNS (hire_date DATE);

 

        -- Create a view to show high salary employees

        CREATE VIEW IF NOT EXISTS high_salary_employees AS

        SELECT name, salary

        FROM employees

        WHERE salary > 60000;

 

        -- Create an index on the employees table for the department_id column

        CREATE INDEX idx_department_id

        ON TABLE employees (department_id)

  AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'

     WITH DEFERRED REBUILD;

 

     -- Drop the view
     DROP VIEW IF EXISTS high_salary_employees;

     -- Drop the index (do this before dropping the employees table it belongs to)
     DROP INDEX IF EXISTS idx_department_id ON employees;

     -- Drop the employees table
     DROP TABLE IF EXISTS employees;

     -- Drop the departments table
     DROP TABLE IF EXISTS departments;

 

     -- Drop the database

     DROP DATABASE IF EXISTS company_db CASCADE;

 

     Notes

    • Make sure to replace the paths and class names with the actual paths and class names for your UDFs.

    • The CASCADE option in the DROP DATABASE command will drop all tables and views in the database.

    • You can run these commands in a Hive shell or through a Hive client that supports HiveQL.

    • Hive indexes (CREATE INDEX / DROP INDEX) were removed in Hive 3.0, so the index commands above require Hive 2.x or earlier.

 

 

 

