Prerequisites
- Ubuntu 20.04 or later.
- JDK 8 or newer installed (since Hadoop requires Java).
- Sudo privileges.
Step 1: Install Java Development Kit (JDK)
Hadoop requires Java, so you'll need to install it first.
- Update your system package index.
- Install OpenJDK 8 (or another supported version).
- Verify the installation.
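The three steps above might look like the following on a stock Ubuntu system. The package name openjdk-8-jdk is the one published in the Ubuntu repositories; substitute another package if you chose a different supported JDK.

```shell
# Refresh the package index so apt sees the latest package versions
sudo apt update

# Install OpenJDK 8 from the Ubuntu repositories
sudo apt install -y openjdk-8-jdk

# Print the installed Java version to confirm the JDK is on the PATH
java -version
```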
Step 2: Install Hadoop (Supported version: 3.3.x)
Hive and Pig both run on top of Hadoop, so install Hadoop first.
- Download Hadoop: Go to the Hadoop releases page and get the latest stable version (e.g., 3.3.x). At the time of writing, Hadoop 3.3.1 is stable.
- Extract the downloaded tar.gz file.
- Move Hadoop to the /usr/local/ directory:
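A minimal sketch of the download, extract, and move steps, assuming version 3.3.1 and the Apache archive as the download source. Both are assumptions; check the releases page for the current version and a closer mirror.

```shell
# Assumed version; change it to match the release you picked
HADOOP_VERSION=3.3.1

# Download the binary tarball (archive.apache.org is an assumed source)
wget "https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"

# Unpack the archive into the current directory
tar -xzf "hadoop-${HADOOP_VERSION}.tar.gz"

# Move the extracted tree to /usr/local/hadoop, the path used in the rest of this guide
sudo mv "hadoop-${HADOOP_VERSION}" /usr/local/hadoop
```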
- Set Hadoop Environment Variables: Open your ~/.bashrc file in a text editor and add the following lines at the end of the file:
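The lines below are one plausible set of variables, assuming Hadoop lives in /usr/local/hadoop as in the previous step. HADOOP_CONF_DIR is optional here and is included only so that other tools can locate the configuration directory.

```shell
# Root of the Hadoop installation (assumes the /usr/local/hadoop location used above)
export HADOOP_HOME=/usr/local/hadoop

# Directory holding core-site.xml, hdfs-site.xml, and the other config files
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"

# Put the hadoop/hdfs commands (bin) and the start/stop scripts (sbin) on the PATH
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```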
Save and close the file. Then, apply the changes:
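Reloading the file in the current shell is usually enough; the second command is only an optional sanity check.

```shell
# Reload .bashrc in the current shell so the new variables take effect
source ~/.bashrc

# Optional check: prints the path of the hadoop launcher if PATH is set correctly
command -v hadoop
```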
- Configure Hadoop: Hadoop needs a few basic settings before it can run. Edit the following files in /usr/local/hadoop/etc/hadoop/:
  - hadoop-env.sh: Set Java home. Add the line:
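The path below is where the openjdk-8-jdk package typically installs on 64-bit (amd64) Ubuntu; treat it as an assumption and confirm the location on your machine, for example with update-alternatives --list java.

```shell
# Tell Hadoop which JDK to use (assumed path for OpenJDK 8 on amd64 Ubuntu)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```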
  - core-site.xml: Configure the core site to set the default filesystem. Add the following content:
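A typical single-node setting is sketched below. The host and port (localhost:9000) are assumptions chosen for a machine that runs everything locally; the property goes inside the file's existing configuration element.

```xml
<configuration>
  <!-- URI of the default filesystem; localhost:9000 is an assumed single-node value -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```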
  - hdfs-site.xml: Set the replication factor and the storage directories for HDFS. Add the following content:
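For a single machine, a replication factor of 1 is the usual choice because there is only one DataNode. The two directory paths are hypothetical; any location the Hadoop user can write to should work.

```xml
<configuration>
  <!-- One copy of each block is enough on a single-node setup -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Where the NameNode keeps filesystem metadata (hypothetical path) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/data/namenode</value>
  </property>
  <!-- Where the DataNode stores block data (hypothetical path) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/data/datanode</value>
  </property>
</configuration>
```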
  - mapred-site.xml: Set the MapReduce execution framework. Hadoop 3.x has no JobTracker; MapReduce jobs are scheduled by YARN. Add:
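The single property below tells MapReduce to submit jobs to YARN. It is a minimal sketch; depending on your setup, additional classpath settings may be needed before jobs actually run.

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN rather than in local mode -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```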
  - yarn-site.xml: Set the YARN Resource Manager URI. Add:
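One plausible single-node configuration is shown below. The hostname localhost is an assumption; the shuffle auxiliary service is what lets MapReduce jobs move data between map and reduce tasks under YARN.

```xml
<configuration>
  <!-- Host running the ResourceManager (assumed to be this machine) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <!-- Enable the shuffle service that MapReduce jobs rely on -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```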
Step 3: Format the HDFS Filesystem
Before starting Hadoop, format the HDFS filesystem:
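Formatting is a one-time step, assuming the hdfs command is on your PATH from Step 2. Running it again on an existing installation wipes the NameNode metadata.

```shell
# Initialise the NameNode metadata directory (run once, before the first start)
hdfs namenode -format
```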
Step 4: Start Hadoop Services
Start HDFS and YARN services:
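The start scripts ship in Hadoop's sbin directory, which Step 2 put on the PATH. They connect to each node over SSH, so this sketch assumes passwordless SSH to localhost already works.

```shell
# Start the HDFS daemons (NameNode, DataNode, SecondaryNameNode)
start-dfs.sh

# Start the YARN daemons (ResourceManager, NodeManager)
start-yarn.sh

# List running Java processes to confirm the daemons are up
jps
```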
Step 5: Install Hive (Supported Version: 3.1.2)
- Download Hive: Go to the Hive releases page and download the latest stable version (e.g., 3.1.2).
- Extract and move Hive to the appropriate directory:
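A sketch of both steps, assuming version 3.1.2, the Apache archive as the download source, and /usr/local/hive as the install location (chosen here to mirror the Hadoop layout).

```shell
# Assumed version; change it to match the release you picked
HIVE_VERSION=3.1.2

# Download the binary tarball (archive.apache.org is an assumed source)
wget "https://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/apache-hive-${HIVE_VERSION}-bin.tar.gz"

# Unpack the archive into the current directory
tar -xzf "apache-hive-${HIVE_VERSION}-bin.tar.gz"

# Move the extracted tree to /usr/local/hive (assumed install location)
sudo mv "apache-hive-${HIVE_VERSION}-bin" /usr/local/hive
```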
- Set Hive Environment Variables: Open your ~/.bashrc file and add the following lines:
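The variables might look like this, assuming Hive was moved to /usr/local/hive; adjust the path if you installed it elsewhere.

```shell
# Root of the Hive installation (assumed location)
export HIVE_HOME=/usr/local/hive

# Put the hive and schematool commands on the PATH
export PATH="$PATH:$HIVE_HOME/bin"
```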
Apply the changes by reloading the file in the current shell (source ~/.bashrc).
- Configure Hive: Configure Hive by editing the hive-site.xml file. Add the following configuration:
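The original configuration is not shown, so the fragment below is only a minimal sketch that assumes an embedded Derby metastore, which suits a single-user test setup. The file normally lives in Hive's conf directory and may need to be created; the Derby database path is hypothetical.

```xml
<configuration>
  <!-- Embedded Derby metastore; the database path is a hypothetical location -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/usr/local/hive/metastore_db;create=true</value>
  </property>
  <!-- HDFS directory where Hive stores managed table data -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```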
- Start Hive: You can now start Hive using the following command:
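On a fresh installation Hive generally needs its HDFS directories and metastore schema before the shell will start. The sketch below assumes a Derby metastore and that HDFS is running; skip the preparation commands if you have already done them.

```shell
# One-time preparation: create the HDFS directories Hive expects and make them group-writable
hdfs dfs -mkdir -p /tmp /user/hive/warehouse
hdfs dfs -chmod g+w /tmp /user/hive/warehouse

# One-time preparation: create the metastore schema (assumes an embedded Derby metastore)
schematool -dbType derby -initSchema

# Launch the Hive command-line shell
hive
```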
Step 6: Install Apache Pig (Supported Version: 0.17.0)
- Download Apache Pig: Go to the Pig releases page and download the latest stable version (e.g., 0.17.0).
- Extract and move Pig:
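A sketch of both steps, assuming version 0.17.0, the Apache archive as the download source, and /usr/local/pig as the install location.

```shell
# Assumed version; change it to match the release you picked
PIG_VERSION=0.17.0

# Download the tarball (archive.apache.org is an assumed source)
wget "https://archive.apache.org/dist/pig/pig-${PIG_VERSION}/pig-${PIG_VERSION}.tar.gz"

# Unpack the archive into the current directory
tar -xzf "pig-${PIG_VERSION}.tar.gz"

# Move the extracted tree to /usr/local/pig (assumed install location)
sudo mv "pig-${PIG_VERSION}" /usr/local/pig
```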
- Set Pig Environment Variables: Open your ~/.bashrc file and add the following lines:
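The variables might look like this, assuming Pig was moved to /usr/local/pig and Hadoop's configuration lives in /usr/local/hadoop/etc/hadoop. PIG_CLASSPATH points Pig at the Hadoop configuration so that it can find the cluster.

```shell
# Root of the Pig installation (assumed location)
export PIG_HOME=/usr/local/pig

# Put the pig command on the PATH
export PATH="$PATH:$PIG_HOME/bin"

# Let Pig find the Hadoop configuration files (path taken from Step 2)
export PIG_CLASSPATH=/usr/local/hadoop/etc/hadoop
```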
Apply the changes by reloading the file in the current shell (source ~/.bashrc).
- Start Apache Pig: You can now start Apache Pig using the following command:
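Running pig with no arguments opens the Grunt shell in MapReduce mode, which expects the Hadoop services from Step 4 to be running. The commented alternative is useful for a quick test without the cluster.

```shell
# Start the Grunt shell in MapReduce mode (the default; uses the running Hadoop cluster)
pig

# Alternative: run against the local filesystem without Hadoop
# pig -x local
```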
Conclusion
You’ve successfully installed Hadoop, Hive, and Pig on Ubuntu. To summarize:
- Install Java (JDK 8 or newer).
- Install Hadoop (3.3.x).
- Configure Hadoop's environment variables and HDFS/YARN settings.
- Install Hive (3.1.2) and configure the hive-site.xml file.
- Install Apache Pig (0.17.0) and configure the environment.