Sunday, 23 March 2025

mongo bda lab

 

Step 1: Create Sample Data in MongoDB

We'll first create a sample collection named people, where each document holds a name and an age. Run the following commands to insert the sample data into MongoDB.

Sample Data:

json
[
  { "name": "Alice", "age": 30 },
  { "name": "Bob", "age": 25 },
  { "name": "Charlie", "age": 30 },
  { "name": "David", "age": 25 },
  { "name": "Eva", "age": 20 },
  { "name": "Frank", "age": 35 },
  { "name": "Grace", "age": 30 },
  { "name": "Hannah", "age": 40 },
  { "name": "Ivy", "age": 25 },
  { "name": "Jack", "age": 20 }
]

You can insert this data using the MongoDB shell or any other MongoDB client.

Using MongoDB Shell:

  1. Connect to your MongoDB instance. On current MongoDB releases the shell is mongosh; older installations use the legacy mongo command instead:

bash
mongosh

  2. Switch to the database where you want to create the collection (for example, test_db):

javascript
use test_db;

  3. Insert the sample data into the people collection:

javascript
db.people.insertMany([
  { "name": "Alice", "age": 30 },
  { "name": "Bob", "age": 25 },
  { "name": "Charlie", "age": 30 },
  { "name": "David", "age": 25 },
  { "name": "Eva", "age": 20 },
  { "name": "Frank", "age": 35 },
  { "name": "Grace", "age": 30 },
  { "name": "Hannah", "age": 40 },
  { "name": "Ivy", "age": 25 },
  { "name": "Jack", "age": 20 }
]);

Now we have 10 documents in the people collection.

Step 2: Count Sort with Limit and Skip

Now, let's implement the Count Sort operation: count how many people share each age, sort the ages by that count, and paginate the result with skip and limit.

MongoDB Aggregation Pipeline:

  1. Group by age and count occurrences.

  2. Sort the ages by their counts.

  3. Apply skip and limit for pagination.

MongoDB Aggregation Query:

javascript
db.people.aggregate([
  // Step 1: Group by 'age' and count occurrences
  {
    $group: {
      _id: "$age",        // Group by age
      count: { $sum: 1 }  // Count the occurrences of each age
    }
  },
  // Step 2: Sort by count in descending order (most frequent ages first)
  { $sort: { count: -1 } },
  // Step 3: Apply skip and limit for pagination
  { $skip: 2 },   // Skip the first 2 records
  { $limit: 3 },  // Limit to 3 records after skip
  // Step 4: Optional - Join back with original 'people' collection to get details
  {
    $lookup: {
      from: "people",        // The collection to join
      localField: "_id",     // The field from the previous stage to match
      foreignField: "age",   // The field from the original collection to match
      as: "person_details"   // The new field containing matched documents
    }
  },
  // Step 5: Project the desired fields
  {
    $project: {
      age: "$_id",        // Include the 'age'
      count: 1,           // Include the count of occurrences
      person_details: 1   // Include the matched person details
    }
  }
])

Explanation:

  1. $group: Groups by the age field and counts how many times each age appears.

  2. $sort: Sorts by the count field in descending order to show the most frequent ages first. Ages with the same count are returned in no guaranteed order.

  3. $skip: Skips the first 2 records.

  4. $limit: Limits the results to 3 records after skipping.

  5. $lookup: Joins the result back with the original people collection to get detailed information about each person who has that age.

  6. $project: Keeps the count and person_details fields and copies _id into a new age field. Because _id is not explicitly excluded, it also remains in the output.
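
The first four stages above can be checked without a running server. The following is a minimal sketch in plain JavaScript that mimics $group, $sort, $skip and $limit on the sample data in memory. It is not MongoDB code: simulatePipeline is a hypothetical helper name introduced here for illustration, and the $lookup and $project stages are left out.

```javascript
// Plain JavaScript sketch: mimics $group, $sort, $skip and $limit in memory.
// It does not talk to MongoDB; it only checks the arithmetic on the sample data.
const people = [
  { name: "Alice", age: 30 }, { name: "Bob", age: 25 },
  { name: "Charlie", age: 30 }, { name: "David", age: 25 },
  { name: "Eva", age: 20 }, { name: "Frank", age: 35 },
  { name: "Grace", age: 30 }, { name: "Hannah", age: 40 },
  { name: "Ivy", age: 25 }, { name: "Jack", age: 20 },
];

// simulatePipeline is a hypothetical helper name, not a MongoDB API.
function simulatePipeline(docs, skip, limit) {
  // $group: count documents per age.
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.age, (counts.get(doc.age) || 0) + 1);
  }
  // $sort: highest count first. Array.prototype.sort is stable, so tied
  // counts keep first-seen order here; MongoDB makes no such promise.
  const sorted = [...counts.entries()]
    .map(([age, count]) => ({ age, count }))
    .sort((a, b) => b.count - a.count);
  // $skip followed by $limit.
  return sorted.slice(skip, skip + limit);
}

const page = simulatePipeline(people, 2, 3);
console.log(page);
```

With skip 2 and limit 3 the sketch returns ages 20, 35 and 40 with counts 2, 1 and 1, because the two most frequent ages (30 and 25, three people each) are skipped.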

Step 3: Run the Query in MongoDB

In the MongoDB shell, execute the above aggregation query to see the result. The same query without the comments:

javascript
db.people.aggregate([
  { $group: { _id: "$age", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  { $skip: 2 },
  { $limit: 3 },
  { $lookup: { from: "people", localField: "_id", foreignField: "age", as: "person_details" } },
  { $project: { age: "$_id", count: 1, person_details: 1 } }
])

Expected Output:

In the sample data, ages 30 and 25 each occur 3 times, age 20 occurs twice, and ages 35 and 40 occur once each. Skipping the first 2 groups drops ages 30 and 25, so the query returns the next 3 groups (ages 20, 35 and 40) with their counts and the matching person details. Ages 35 and 40 tie on count, so their relative order is not guaranteed. Because the $project stage does not exclude _id, it appears in the output as well, and the field order may differ on your server. Here's a possible output based on the sample data:

json
[
  { "_id": 20, "count": 2, "person_details": [
      { "_id": ObjectId("..."), "name": "Eva", "age": 20 },
      { "_id": ObjectId("..."), "name": "Jack", "age": 20 }
    ], "age": 20 },
  { "_id": 35, "count": 1, "person_details": [
      { "_id": ObjectId("..."), "name": "Frank", "age": 35 }
    ], "age": 35 },
  { "_id": 40, "count": 1, "person_details": [
      { "_id": ObjectId("..."), "name": "Hannah", "age": 40 }
    ], "age": 40 }
]

Step 4: Testing with Pagination (Optional)

You can adjust the values of skip and limit to test pagination. For example:

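  • Skip 0 records and limit to 2: returns the two most frequent ages, 30 and 25 (3 people each).

  • Skip the first 3 records and limit to 2: returns the two least frequent ages, 35 and 40 (1 person each).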
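
To try different pages without retyping the whole query, the pipeline can be built by a small function. This is only a sketch: buildPagePipeline is a hypothetical helper, not part of MongoDB, and it adds _id: 1 as a secondary sort key, which the original query does not have. That extra key is an assumption made here so that ages with equal counts always come back in the same order from one page to the next.

```javascript
// buildPagePipeline is a hypothetical helper, not a MongoDB API.
// It returns the pipeline array for one page of the age counts.
function buildPagePipeline(skip, limit) {
  return [
    // Group by age and count occurrences.
    { $group: { _id: "$age", count: { $sum: 1 } } },
    // Most frequent ages first. The _id: 1 tie-breaker is an addition
    // (assumption, not in the original query) that makes ties deterministic.
    { $sort: { count: -1, _id: 1 } },
    { $skip: skip },
    { $limit: limit },
  ];
}

const firstPage = buildPagePipeline(0, 2);
const laterPage = buildPagePipeline(3, 2);
console.log(JSON.stringify(laterPage));

// In the MongoDB shell the result would be passed to aggregate, e.g.:
//   db.people.aggregate(buildPagePipeline(0, 2))
```

With the tie-breaker in place, the sample data sorts as ages 25, 30, 20, 35, 40, so the first call above selects ages 25 and 30 and the second selects ages 35 and 40.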
