Tuesday, 18 March 2025

HADOOP MANUAL INSTALLATION IN UBUNTU

 

Prerequisites

  • Ubuntu 20.04 or later.
  • JDK 8 or newer (Hadoop requires Java).
  • Sudo privileges.
  • An SSH server with passwordless SSH to localhost (the Hadoop start scripts log in over SSH).

Step 1: Install Java Development Kit (JDK)

Hadoop requires Java, so you'll need to install it first.

  1. Update your system package index:

    bash
    sudo apt update
  2. Install OpenJDK 8 (or another supported version):

    bash
    sudo apt install openjdk-8-jdk
  3. Verify the installation:

    bash
    java -version
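
Rather than guessing the JDK path used for JAVA_HOME later in this guide, you can derive it from the installed java binary (a quick sketch; assumes java is on the PATH):

```shell
# Resolve the java binary through its symlinks, then strip /bin/java
# to get the JDK root (e.g. /usr/lib/jvm/java-8-openjdk-amd64).
JAVA_PATH=$(readlink -f "$(which java)")
echo "${JAVA_PATH%/bin/java}"
```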

Step 2: Install Hadoop (Supported version: 3.3.x)

Hadoop is the first tool you need to install.

  1. Download Hadoop: Go to the Hadoop releases page and get the latest stable 3.3.x release. The commands below use Hadoop 3.3.1 as an example; substitute the version you download.

    bash
    wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
  2. Extract the downloaded tar.gz file:

    bash
    tar -xvzf hadoop-3.3.1.tar.gz
  3. Move Hadoop to the /usr/local/ directory:

    bash
    sudo mv hadoop-3.3.1 /usr/local/hadoop
  4. Set Hadoop Environment Variables: Open your .bashrc file to add environment variables for Hadoop:

    bash
    nano ~/.bashrc
  5. Add the following lines at the end of the file:

    bash
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME

    Save and close the file. Then, apply the changes:

    bash
    source ~/.bashrc
  6. Configure Hadoop: Hadoop needs to be configured with some basic settings.

    Edit the following files in /usr/local/hadoop/etc/hadoop/:

    • hadoop-env.sh: Set Java home.

      bash
      nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

      Add the line:

      bash
      export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    • core-site.xml: Configure the core site to set the default filesystem.

      bash
      nano /usr/local/hadoop/etc/hadoop/core-site.xml

      Add the following content:

      xml
      <configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:9000</value>
        </property>
      </configuration>
    • hdfs-site.xml: Set the replication factor and directories for Hadoop.

      bash
      nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

      Add the following content:

      xml
      <configuration>
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>/usr/local/hadoop/hdfs/name</value>
        </property>
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>/usr/local/hadoop/hdfs/data</value>
        </property>
      </configuration>
    • mapred-site.xml: Tell MapReduce to run on YARN. (In Hadoop 3.x this file ships directly in the configuration directory; there is no .template file to copy.)

      bash
      nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

      Add:

      xml
      <configuration>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
      </configuration>
    • yarn-site.xml: Set the YARN Resource Manager addresses and enable the MapReduce shuffle service (required for MapReduce jobs to run on YARN).

      bash
      nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

      Add:

      xml
      <configuration>
        <property>
          <name>yarn.resourcemanager.address</name>
          <value>localhost:8032</value>
        </property>
        <property>
          <name>yarn.resourcemanager.scheduler.address</name>
          <value>localhost:8030</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
      </configuration>
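
Before formatting HDFS, make sure the name and data directories referenced in hdfs-site.xml exist and that your user owns the Hadoop tree (it is root-owned after the sudo mv earlier); a sketch using this guide's paths:

```shell
# Create the HDFS metadata and block directories used in hdfs-site.xml.
sudo mkdir -p /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data
# Hand the tree to the current user so the daemons can write logs and data.
sudo chown -R "$USER":"$USER" /usr/local/hadoop
```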

Step 3: Format the HDFS Filesystem

Before starting Hadoop, format the HDFS filesystem:

bash
hdfs namenode -format

Step 4: Start Hadoop Services

Start HDFS and YARN services:

bash
start-dfs.sh
start-yarn.sh
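
A quick way to confirm the daemons started is jps, which lists running JVMs; a healthy pseudo-distributed node shows NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

```shell
# Filter the JVM process list down to the Hadoop daemons.
jps | grep -E 'NameNode|DataNode|SecondaryNameNode|ResourceManager|NodeManager'
```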

Step 5: Install Hive (Supported Version: 3.1.2)

  1. Download Hive: Go to the Hive releases page and download the latest stable version (e.g., 3.1.2).

    bash
    wget https://downloads.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
  2. Extract and move Hive to the appropriate directory:

    bash
    tar -xvzf apache-hive-3.1.2-bin.tar.gz
    sudo mv apache-hive-3.1.2-bin /usr/local/hive
  3. Set Hive Environment Variables: Open your .bashrc file to set the Hive environment variables:

    bash
    nano ~/.bashrc

    Add the following:

    bash
    export HIVE_HOME=/usr/local/hive
    export PATH=$PATH:$HIVE_HOME/bin
    export HIVE_CONF_DIR=$HIVE_HOME/conf

    Apply the changes:

    bash
    source ~/.bashrc
  4. Configure Hive: Configure Hive by editing the hive-site.xml file.

    bash
    nano /usr/local/hive/conf/hive-site.xml

    Add the following configuration:

    xml
    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:derby:;databaseName=/usr/local/hive/metastore_db;create=true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>org.apache.derby.jdbc.EmbeddedDriver</value>
      </property>
    </configuration>
  5. Initialize the metastore and start Hive: Hive 3.x requires the Derby metastore schema to be initialized once before first use; then you can start the Hive shell:

    bash
    schematool -dbType derby -initSchema
    hive
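
Once the Hive shell works, a small non-interactive smoke test is possible via hive -f (the table and file names here are illustrative, not part of the official setup):

```shell
# Write a tiny HiveQL script, then run it in batch mode.
cat > /tmp/hive_smoke.hql <<'EOF'
CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING);
SHOW TABLES;
DROP TABLE IF EXISTS smoke_test;
EOF
hive -f /tmp/hive_smoke.hql
```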

Step 6: Install Apache Pig (Supported Version: 0.17.0)

  1. Download Apache Pig: Go to the Pig releases page and download the latest stable version (e.g., 0.17.0).

    bash
    wget https://downloads.apache.org/pig/pig-0.17.0/apache-pig-0.17.0.tar.gz
  2. Extract and move Pig:

    bash
    tar -xvzf apache-pig-0.17.0.tar.gz
    sudo mv apache-pig-0.17.0 /usr/local/pig
  3. Set Pig Environment Variables: Open your .bashrc file to set the Pig environment variables:

    bash
    nano ~/.bashrc

    Add:

    bash
    export PIG_HOME=/usr/local/pig
    export PATH=$PATH:$PIG_HOME/bin

    Apply the changes:

    bash
    source ~/.bashrc
  4. Start Apache Pig: You can now start Apache Pig using the following command:

    bash
    pig
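
Running pig with no arguments opens the interactive Grunt shell against HDFS. For a quick sanity check that avoids the cluster entirely, Pig can also execute a script in local mode (the file names below are illustrative):

```shell
# Prepare a small tab-separated input file and a Pig Latin script.
printf 'alice\t20\nbob\t30\n' > /tmp/people.tsv
cat > /tmp/people.pig <<'EOF'
people = LOAD '/tmp/people.tsv' AS (name:chararray, age:int);
adults = FILTER people BY age >= 21;
DUMP adults;
EOF
# -x local runs against the local filesystem instead of HDFS.
pig -x local /tmp/people.pig
```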

Conclusion

You’ve successfully installed Hadoop, Hive, and Pig on Ubuntu. To summarize:

  • Install Java (JDK 8 or newer).
  • Install Hadoop (3.3.x).
  • Configure Hadoop's environment variables and HDFS/YARN settings.
  • Install Hive (3.1.2) and configure the hive-site.xml file.
  • Install Apache Pig (0.17.0) and configure the environment.
