Hadoop Setup Tutorial - Installation & Configuration

Prerequisites:
You must have Ubuntu installed and running.
You must have Java installed.
Step 1) Add a Hadoop system user using the below commands
sudo addgroup hadoop_
sudo adduser --ingroup hadoop_ hduser_
Enter your password, name, and other details.
NOTE:
There is a possibility of the below-mentioned error during this setup and installation process.
"hduser is not in the sudoers file. This incident will be reported."
This error can be resolved as follows:
Log in as the root user.
Execute the command
sudo adduser hduser_ sudo
Re-login as hduser_
Step 2) Configure SSH
In order to manage nodes in a cluster, Hadoop requires SSH access.
First, switch user by entering the following command
su - hduser_
This command will create a new key.
ssh-keygen -t rsa -P ""
Enable SSH access to the local machine using this key.
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Now test the SSH setup by connecting to localhost as the hduser_ user.
ssh localhost
Note:
If you see the below error in response to 'ssh localhost', there is a possibility that SSH is not available on this system.
To resolve this:
Purge SSH using
sudo apt-get purge openssh-server
It is good practice to purge before the start of installation.
Install SSH using the command
sudo apt-get install openssh-server
Step 3) Download Hadoop
Select a stable release.
Select the tar.gz file (not the file with src).
Once the download is complete, navigate to the directory containing the tar file.
Enter
sudo tar xzf hadoop-2.2.0.tar.gz
Now, rename hadoop-2.2.0 as hadoop
sudo mv hadoop-2.2.0 hadoop
Change ownership of the directory:
sudo chown -R hduser_:hadoop_ hadoop
Step 4) Modify the ~/.bashrc file
Add the following lines to the end of ~/.bashrc:
#Set HADOOP_HOME
export HADOOP_HOME=<Installation Directory of Hadoop>
#Set JAVA_HOME
export JAVA_HOME=<Installation Directory of Java>
# Add bin/ directory of Hadoop to PATH
export PATH=$PATH:$HADOOP_HOME/bin

Now, source this environment configuration using the below command
. ~/.bashrc
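As a concrete sketch of the exports above, with hypothetical install locations (substitute the actual directories on your machine):

```shell
# Hypothetical install paths -- substitute your own directories
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# Add Hadoop's bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

# Confirm the variables resolved as expected
echo "HADOOP_HOME=$HADOOP_HOME"
echo "$PATH" | grep -q "$HADOOP_HOME/bin" && echo "PATH updated"
```

After sourcing ~/.bashrc, the two echo lines are a quick sanity check that the shell picked up the new values.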
Step 5) Configurations related to HDFS
Set JAVA_HOME inside the file $HADOOP_HOME/etc/hadoop/hadoop-env.sh by replacing the line
export JAVA_HOME=${JAVA_HOME}
with your Java installation directory, e.g.
export JAVA_HOME=<Installation Directory of Java>
There are two parameters in $HADOOP_HOME/etc/hadoop/core-site.xml which need to be set:
1. 'hadoop.tmp.dir' - Used to specify a directory which will be used by Hadoop to store its data files.
2. 'fs.defaultFS' - This specifies the default file system.
To set these parameters, open core-site.xml
sudo gedit $HADOOP_HOME/etc/hadoop/core-site.xml
Copy the below lines in between the tags <configuration></configuration>:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>Parent directory for other temporary directories.</description>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.</description>
</property>
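If you prefer to script the edit instead of using gedit, a sketch like the following writes the same two properties. It writes a local file for illustration; on a real setup the target is $HADOOP_HOME/etc/hadoop/core-site.xml.

```shell
# Sketch: write core-site.xml non-interactively.
# CONF points at a local file here; the real target is
# $HADOOP_HOME/etc/hadoop/core-site.xml
CONF=core-site.xml
cat > "$CONF" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF
grep -q 'fs.defaultFS' "$CONF" && echo "core-site.xml written"
```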
Navigate to the directory $HADOOP_HOME/etc/hadoop
Now, create the directory mentioned in core-site.xml
sudo mkdir -p <Path of Directory used in above setting>
Grant permissions to the directory
sudo chown -R hduser_:hadoop_ <Path of Directory created in above step>
sudo chmod 750 <Path of Directory created in above step>
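A runnable sketch of the mkdir/chmod pair, using a stand-in path under /tmp (the real /app/hadoop/tmp needs root, and the chown needs the hduser_ account to exist):

```shell
# Stand-in for /app/hadoop/tmp so the sketch runs without root
DIR=/tmp/app/hadoop/tmp
mkdir -p "$DIR"
chmod 750 "$DIR"
# On the real setup you would additionally run:
#   sudo chown -R hduser_:hadoop_ /app/hadoop/tmp
stat -c '%a' "$DIR"   # prints 750
```

Mode 750 gives the owner full access, the hadoop_ group read/execute, and others nothing, which is why ownership is handed to hduser_:hadoop_ first.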
Step 6) MapReduce Configuration
Before you begin with these configurations, let's set the HADOOP_HOME path
sudo gedit /etc/profile.d/hadoop.sh
and enter
export HADOOP_HOME=/home/guru99/Downloads/Hadoop
Next, enter
sudo chmod +x /etc/profile.d/hadoop.sh
Exit the terminal and restart it.
Type echo $HADOOP_HOME to verify the path.
Now copy the template file
sudo cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
Open the mapred-site.xml file
sudo gedit $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the below lines of setting in between the tags <configuration> and </configuration>:
<property>
  <name>mapreduce.jobtracker.address</name>
  <value>localhost:54311</value>
  <description>MapReduce job tracker runs at this host and port.</description>
</property>
Open $HADOOP_HOME/etc/hadoop/hdfs-site.xml as below
sudo gedit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the below lines of setting between the tags <configuration> and </configuration>:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.</description>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hduser_/hdfs</value>
</property>
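The same scripted approach sketched for core-site.xml works here; again this writes a local file for illustration, while the real target is $HADOOP_HOME/etc/hadoop/hdfs-site.xml.

```shell
# Sketch: write hdfs-site.xml non-interactively; the real target is
# $HADOOP_HOME/etc/hadoop/hdfs-site.xml
CONF=hdfs-site.xml
cat > "$CONF" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hduser_/hdfs</value>
  </property>
</configuration>
EOF
grep -q 'dfs.replication' "$CONF" && echo "hdfs-site.xml written"
```

A replication factor of 1 suits a single-node cluster: there is only one DataNode, so each block can be stored exactly once.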
Create the directory specified in the above setting:
sudo mkdir -p /home/hduser_/hdfs
Grant permissions to the directory:
sudo chown -R hduser_:hadoop_ /home/hduser_/hdfs
sudo chmod 750 /home/hduser_/hdfs
Step 7) Before we start Hadoop for the first time, format HDFS using the below command
$HADOOP_HOME/bin/hdfs namenode -format
Step 8) Start the Hadoop single-node cluster using the below commands
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
Using the 'jps' tool/command, verify whether all the Hadoop-related processes are running or not.
If Hadoop has started successfully, the output of jps should show NameNode, NodeManager, ResourceManager, SecondaryNameNode, and DataNode.
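A small sketch of that check: it scans jps-style output for the five expected daemons. The sample output (process IDs included) is hard-coded for illustration; on a live cluster replace it with the real output of jps.

```shell
# Expected daemons on a single-node cluster
required="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# Sample jps output for illustration; on a real machine use: jps_out=$(jps)
jps_out="2101 NameNode
2315 DataNode
2540 SecondaryNameNode
2733 ResourceManager
2944 NodeManager
3102 Jps"

# Fail loudly if any daemon is absent from the output
for d in $required; do
  echo "$jps_out" | grep -qw "$d" || { echo "missing: $d"; exit 1; }
done
echo "all daemons running"
```

If any daemon is missing, check the log files under $HADOOP_HOME/logs for the corresponding process before retrying.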
Step 9) Stop Hadoop using the below commands
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh
 
