
Sunday, 23 March 2025

Installing MongoDB on CentOS

 

Step-by-Step Guide

1. Add MongoDB Repository to CentOS

MongoDB is not included in the default CentOS repositories, so you'll need to add the MongoDB repository.

  1. Create a .repo file for MongoDB in /etc/yum.repos.d/:

    bash
    sudo nano /etc/yum.repos.d/mongodb-org-5.0.repo
  2. Add the following repository configuration (this guide uses MongoDB 5.0; substitute a newer release series if one is available):

    ini
    [mongodb-org-5.0]
    name=MongoDB 5.0 Repository
    baseurl=https://repo.mongodb.org/yum/centos/7/mongodb-org/5.0/x86_64/
    gpgcheck=1
    enabled=1
    gpgkey=https://www.mongodb.org/static/pgp/server-5.0.asc

    Save the file and exit (Ctrl + X, then press Y and Enter to save).
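If you prefer not to open an editor, the same repo file can be written in one command. This is a sketch assuming MongoDB 5.0 on CentOS 7; adjust the version and release in the paths if yours differ:

```shell
# Write the repo definition in one shot (requires sudo); paths assume CentOS 7 / MongoDB 5.0.
sudo tee /etc/yum.repos.d/mongodb-org-5.0.repo >/dev/null <<'EOF'
[mongodb-org-5.0]
name=MongoDB 5.0 Repository
baseurl=https://repo.mongodb.org/yum/centos/7/mongodb-org/5.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-5.0.asc
EOF
```

The quoted `'EOF'` delimiter prevents the shell from expanding anything inside the here-document, so the file lands byte-for-byte as shown.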

2. Install MongoDB

Once the repository is configured, install MongoDB:

bash
sudo yum install -y mongodb-org

This will install MongoDB 5.0 along with all necessary dependencies.

3. Start and Enable MongoDB

Once MongoDB is installed, you need to start the MongoDB service and enable it to start on boot.

bash
# Start MongoDB service
sudo systemctl start mongod

# Enable MongoDB to start on boot
sudo systemctl enable mongod

4. Check MongoDB Service Status

Ensure that MongoDB is running by checking its status:

bash
sudo systemctl status mongod

If it's running, you should see an output indicating that MongoDB is active (running).
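If you want to script the check instead of reading the systemctl output by eye, a minimal sketch (which degrades gracefully where systemd is unavailable, e.g. inside a container) might look like this; the function name is illustrative:

```shell
# Report whether the mongod service is active; fall back when systemd is absent.
mongod_status() {
  if command -v systemctl >/dev/null 2>&1 && systemctl is-active --quiet mongod; then
    echo "mongod is running"
  else
    echo "mongod is not running (or systemd is unavailable)"
  fi
}

mongod_status
```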

5. Configure MongoDB to Allow Remote Connections (Optional)

By default, MongoDB binds to the local interface (127.0.0.1). To allow remote connections (if needed), you’ll have to edit the MongoDB configuration file.

  1. Edit the configuration file (mongod.conf):

    bash
    sudo nano /etc/mongod.conf
  2. Find the bindIp setting under net and change it to allow remote connections. Modify it like this:

    yaml
    net:
      bindIp: 0.0.0.0  # Listen on all IP addresses (for remote connections)

    Alternatively, you can list specific IP addresses (comma-separated) to restrict which interfaces MongoDB listens on. Binding to 0.0.0.0 exposes MongoDB to the whole network, so enable authentication (see step 8) before doing this anywhere but a trusted network.

  3. Save and exit (Ctrl + X, then press Y and Enter).

  4. Restart the MongoDB service for the changes to take effect:

    bash
    sudo systemctl restart mongod
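Remote clients also need to reach port 27017 through the host firewall. The following is a hedged sketch for firewalld (the default on CentOS 7); the function name is illustrative, and it only makes a change when firewalld is actually running:

```shell
# Open MongoDB's default port (27017/tcp) if firewalld is active; otherwise do nothing.
open_mongo_port() {
  if command -v firewall-cmd >/dev/null 2>&1 && systemctl is-active --quiet firewalld 2>/dev/null; then
    sudo firewall-cmd --permanent --add-port=27017/tcp
    sudo firewall-cmd --reload
  else
    echo "firewalld not active; no firewall change made"
  fi
}

open_mongo_port
```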

6. Check MongoDB Logs (Optional)

If you need to troubleshoot any issues, MongoDB logs are typically located in /var/log/mongodb/mongod.log.

To check the logs:

bash
tail -f /var/log/mongodb/mongod.log

7. Verify MongoDB Installation

Once MongoDB is running, you can connect to it using the mongo shell (recent MongoDB packages also ship the newer mongosh shell, which accepts the same commands shown here):

bash
mongo

This command should bring up the MongoDB shell, and you can start executing commands like:

bash
show dbs

This will list the available databases.
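You can also run a non-interactive connectivity check, which is handy in scripts. This sketch assumes the legacy mongo shell and reports when the shell itself is missing rather than failing; the function name is illustrative:

```shell
# Ping the local mongod via the shell's --eval flag; note if the shell is not installed.
mongo_ping() {
  if command -v mongo >/dev/null 2>&1; then
    mongo --quiet --eval 'db.adminCommand({ ping: 1 }).ok'
  else
    echo "mongo shell not found"
  fi
}

mongo_ping
```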

8. Security Configuration (Optional)

For production systems, MongoDB should be configured with proper security settings (authentication, user roles, etc.).

  1. To enable authentication, add the following to your mongod.conf:

    yaml
    security:
      authorization: enabled
  2. Restart MongoDB after modifying the configuration:

    bash
    sudo systemctl restart mongod
  3. Create an admin user for MongoDB:

    bash
    mongo

    Then run:

    javascript
    use admin
    db.createUser({
      user: "admin",
      pwd: "password",  // Change this to a secure password
      roles: [ { role: "root", db: "admin" } ]
    })
  4. After this, you’ll need to authenticate when connecting to MongoDB:

    bash
    mongo -u admin -p --authenticationDatabase admin

9. Integrating MongoDB with Cloudera (Optional)

If you're looking to integrate MongoDB into a Cloudera Hadoop ecosystem, you typically use tools such as MongoDB Connector for Hadoop to allow seamless data transfer between MongoDB and Hadoop. This step is more complex and would require additional setup.

You can download the MongoDB Connector for Hadoop from the MongoDB website, configure it with your Cloudera cluster, and use it to transfer data between MongoDB and Hadoop.

Conclusion

By following these steps, you should have MongoDB successfully installed and running on your CentOS system, whether it's part of a Cloudera cluster or standalone. If you're planning to integrate MongoDB into a larger Hadoop ecosystem, consider using the MongoDB Connector for Hadoop to facilitate the integration.


MongoDB Connector for Hadoop

The MongoDB Connector for Hadoop allows integration between MongoDB and Hadoop ecosystems. This connector facilitates the ability to read and write data between MongoDB and Hadoop (including HDFS, Hive, and other Hadoop tools). It's commonly used to transfer data into or out of MongoDB, perform analytics, and work with big data in Hadoop environments.

Steps to Set Up MongoDB Connector for Hadoop

Here are the general steps to install and configure the MongoDB Connector for Hadoop:

1. Prerequisites

Before proceeding, ensure you have the following:

  • MongoDB installed and running (either standalone or part of a cluster).

  • Hadoop cluster set up, including HDFS and any other tools you wish to integrate (e.g., Hive, Spark).

  • Java installed (as the connector requires Java to run).

2. Download MongoDB Connector for Hadoop

The MongoDB Connector for Hadoop is distributed through the mongodb/mongo-hadoop project on GitHub; releases are published as .tar.gz/.zip archives, and the JARs are also available on Maven Central. Note that the project is no longer actively developed, so check the release notes of the version you pick for the Hadoop releases it supports (for Spark workloads, MongoDB now recommends the separate MongoDB Spark Connector instead).

3. Install MongoDB Connector for Hadoop

Once you have the connector downloaded, you need to install it. Follow these steps:

  1. Extract the archive: Navigate to the directory where you downloaded the file and unzip or untar it. For example:

    bash
    tar -xzvf mongodb-hadoop-connector-<version>.tar.gz

    Or, if you downloaded a .zip file:

    bash
    unzip mongodb-hadoop-connector-<version>.zip
  2. Move the Connector Files to Your Hadoop Directory: After extracting the connector, you will have several .jar files. Move the relevant .jar files into your Hadoop's lib directory or include them in your classpath.

    For example:

    bash
    cp mongodb-hadoop-connector-<version>/lib/* /usr/lib/hadoop/lib/

    Alternatively, if you're using Cloudera, you can copy the JAR files to the following directory:

    bash
    cp mongodb-hadoop-connector-<version>/lib/* /opt/cloudera/parcels/CDH/lib/hadoop/

    Ensure the JAR files are added to the classpath of your Hadoop cluster.
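To confirm the copy worked, you can check whether any connector JARs are visible on Hadoop's effective classpath. A small guarded sketch (it assumes the `hadoop` CLI is on PATH; the function name is illustrative):

```shell
# List any MongoDB connector JARs on the Hadoop classpath, if the hadoop CLI exists.
mongo_jars_on_classpath() {
  if command -v hadoop >/dev/null 2>&1; then
    hadoop classpath | tr ':' '\n' | grep -i 'mongo' || echo "no mongo JARs found on classpath"
  else
    echo "hadoop CLI not found"
  fi
}

mongo_jars_on_classpath
```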

4. Configure MongoDB Connector for Hadoop

MongoDB Connector for Hadoop requires configuration, primarily through configuration files or command-line options.

Here are the main configuration files and properties you'll need to set:

  • MongoDB Input Format (for reading data from MongoDB into Hadoop)

  • MongoDB Output Format (for writing data from Hadoop to MongoDB)

Example Configuration for MongoDB Input (Reading Data)

The MongoDB Input Format is used when reading data from MongoDB to Hadoop, for example, in a MapReduce job or when using Spark.

xml
<configuration>
  <property>
    <name>mapreduce.job.inputformat.class</name>
    <value>com.mongodb.hadoop.MongoInputFormat</value>
  </property>
  <property>
    <name>mongo.input.uri</name>
    <value>mongodb://localhost:27017/mydatabase.mycollection</value>
  </property>
  <!-- Optional: specify a query filter -->
  <property>
    <name>mongo.input.query</name>
    <value>{}</value> <!-- Example: select all records -->
  </property>
</configuration>

Example Configuration for MongoDB Output (Writing Data)

The MongoDB Output Format is used when writing data from Hadoop into MongoDB.

xml
<configuration>
  <property>
    <name>mapreduce.job.outputformat.class</name>
    <value>com.mongodb.hadoop.MongoOutputFormat</value>
  </property>
  <property>
    <name>mongo.output.uri</name>
    <value>mongodb://localhost:27017/mydatabase.mycollection</value>
  </property>
</configuration>

5. Configure MongoDB URI

In the mongo.input.uri and mongo.output.uri, replace localhost with the appropriate hostname or IP address of your MongoDB server. If you're using MongoDB replication or a cluster, you would use a connection string that includes multiple nodes, such as:

bash
mongodb://node1:27017,node2:27017,node3:27017/mydatabase.mycollection
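Since these URIs recur throughout the configuration, a tiny helper keeps them consistent across jobs. This is an illustrative function (the name and arguments are my own, not part of the connector):

```shell
# Build a MongoDB URI from a host list, database name, and collection name.
build_mongo_uri() {
  local hosts="$1" db="$2" coll="$3"
  printf 'mongodb://%s/%s.%s\n' "$hosts" "$db" "$coll"
}

build_mongo_uri "node1:27017,node2:27017,node3:27017" mydatabase mycollection
# prints: mongodb://node1:27017,node2:27017,node3:27017/mydatabase.mycollection
```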

6. Running MongoDB Connector with Hadoop

You can now run a Hadoop job using the MongoDB Connector. Here's an example of how to run a MapReduce job with the MongoDB Input and Output Format.

Example: Running a MapReduce Job

bash
hadoop jar mongodb-hadoop-connector-<version>/lib/mongo-hadoop.jar \
  -Dmapreduce.job.inputformat.class=com.mongodb.hadoop.MongoInputFormat \
  -Dmongo.input.uri=mongodb://localhost:27017/mydatabase.mycollection \
  -Dmapreduce.job.outputformat.class=com.mongodb.hadoop.MongoOutputFormat \
  -Dmongo.output.uri=mongodb://localhost:27017/mydatabase.outputcollection \
  <your_mapreduce_class> /input /output

In this command:

  • <your_mapreduce_class> should be replaced with the actual class containing your MapReduce logic.

  • /input and /output are the directories in HDFS for input and output data.

7. Use MongoDB with Hive or Spark (Optional)

MongoDB Connector for Hadoop also supports integration with Hive and Spark. Below is an example of how you might use it with Hive.

Hive Integration Example

  1. Create the External Table in Hive to read from MongoDB:

sql
CREATE EXTERNAL TABLE my_table (
  id INT,
  name STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
TBLPROPERTIES (
  'mongo.uri' = 'mongodb://localhost:27017/mydatabase.mycollection'
);
  2. Query the Data in Hive:

sql
SELECT * FROM my_table;

Spark Integration Example

You can read and write MongoDB data from Spark. Note that the "mongo" DataFrame source shown below comes from the separate MongoDB Spark Connector rather than the Hadoop connector; with that package on the classpath:

scala
val mongoUri = "mongodb://localhost:27017/mydatabase.mycollection"
val df = spark.read.format("mongo").option("uri", mongoUri).load()
df.show()

8. Security and Authentication (Optional)

If your MongoDB instance has authentication enabled, you'll need to pass authentication credentials in the connection URI.

bash
mongodb://username:password@localhost:27017/mydatabase.mycollection

Make sure the user has the appropriate permissions to read from or write to the MongoDB database.
