
Hadoop 2.7 Installing on Ubuntu 14.04 (Pseudo-distributed mode)

Installing Java

Please read this article

Adding a dedicated Hadoop system user

For Hadoop, the account should have the same username on all of the nodes. This account is only for managing your Hadoop cluster; once the
cluster daemons are up and running, you’ll be able to run your actual MapReduce jobs from other accounts.

bluething@ubuntu:~$ sudo addgroup hadoop
bluething@ubuntu:~$ sudo adduser --ingroup hadoop hadoop-user

Installing SSH

ssh has two main components:

ssh : The command we use to connect to remote machines – the client.
sshd : The daemon that is running on the server and allows clients to connect to the server.

The ssh client is usually pre-installed on Ubuntu, but in order to run the sshd daemon we need to install the openssh-server package first.

Verify SSH installation

bluething@ubuntu:~$ which ssh
bluething@ubuntu:~$ which sshd
bluething@ubuntu:~$ which ssh-keygen

Install openssh-server

bluething@ubuntu:~$ sudo apt-get install openssh-server
bluething@ubuntu:~$ which sshd

Generate SSH key pair

bluething@ubuntu:~$ su - hadoop-user
hadoop-user@ubuntu:~$ ssh-keygen -t rsa -P ""
This command will create an RSA key pair with an empty password. Generally, using an empty password is not recommended, but in this case it is needed to unlock the key without your interaction (you don’t want to enter the passphrase every time Hadoop interacts with its nodes).
If you want to see the public key:

hadoop-user@ubuntu:~$ more /home/hadoop-user/.ssh/id_rsa.pub

Enable SSH access to your local machine with this newly created key.

hadoop-user@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
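One caveat worth checking: sshd will ignore authorized_keys if the file or the .ssh directory is group- or world-writable. A minimal sketch to tighten and verify the permissions (the original post does not show this step):

```shell
# sshd refuses keys stored with overly open permissions,
# so tighten ~/.ssh and authorized_keys after appending the key.
mkdir -p "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys"
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
stat -c '%a' "$HOME/.ssh" "$HOME/.ssh/authorized_keys"   # prints 700 then 600
```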

Test the SSH setup by connecting to your local machine as the hadoop-user user. This step is also needed to save your local machine’s host key fingerprint to the hadoop-user user’s known_hosts file.

hadoop-user@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 5b:5d:57:1a:08:34:51:9d:b2:26:3b:19:b3:84:eb:a0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

hadoop-user@ubuntu:~$ exit
Connection to localhost closed.


Installing Hadoop

Get Hadoop from https://hadoop.apache.org/releases.html

bluething@ubuntu:~$ sudo mkdir -p /usr/local/hadoop
bluething@ubuntu:~$ sudo tar xvzf hadoop-2.7.1.tar.gz -C /usr/local/hadoop
bluething@ubuntu:/usr/local$ sudo chown -R hadoop-user:hadoop hadoop

Add the following lines to the end of the $HOME/.bashrc file of user hadoop-user.

export JAVA_HOME=/usr/local/java/jdk1.8.0_60
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.1
# Add Hadoop bin/ and sbin/ directories to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Hadoop Configuration

I create a separate configuration folder for each of the modes and place the appropriate version of the XML files in the corresponding folder. By default I use pseudo-distributed mode.

hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1/etc$ mkdir conf.cluster
hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1/etc$ cp -a hadoop/. conf.cluster/
hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1/etc$ mkdir conf.pseudo
hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1/etc$ cp -a hadoop/. conf.pseudo/
hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1/etc$ mkdir conf.standalone
hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1/etc$ cp -a hadoop/. conf.standalone/
hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1/etc$ rm -r hadoop/
hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1/etc$ ln -s conf.pseudo/ hadoop
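The symlink trick above generalizes: switching Hadoop to another mode is just re-pointing etc/hadoop at a different conf.* folder. A small sketch in a scratch directory (the folder names mirror the ones above, but nothing here touches the real installation):

```shell
# Re-pointing a symlink switches configuration sets without copying files.
dir=$(mktemp -d)
cd "$dir"
mkdir conf.pseudo conf.standalone
ln -s conf.pseudo hadoop       # start in pseudo-distributed mode
ln -sfn conf.standalone hadoop # -n replaces the link itself, not its target
readlink hadoop                # prints: conf.standalone
```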

According to https://issues.apache.org/jira/browse/HADOOP-3437, I disable IPv6 only for Hadoop by adding the following line to conf.pseudo/hadoop-env.sh:

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

Configure JAVA_HOME in conf.pseudo/hadoop-env.sh as well:

export JAVA_HOME=/usr/local/java/jdk1.8.0_60

core-site.xml file contains configuration properties that Hadoop uses when starting up.

<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation.</description>
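That description belongs to the default-filesystem property. A minimal core-site.xml for pseudo-distributed mode could look like the sketch below; fs.defaultFS with hdfs://localhost:9000 is the common Hadoop 2.x choice, and the exact port is an assumption rather than something stated in the post:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation.</description>
  </property>
</configuration>
```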

By default, the /usr/local/hadoop/hadoop-2.7.1/etc/conf.pseudo folder contains mapred-site.xml.template, which needs to be renamed (or copied) to mapred-site.xml. The mapred-site.xml file is used to specify which framework is being used for MapReduce.

<description>The host and port that the MapReduce job tracker runs
at.</description>
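For reference, a minimal mapred-site.xml assuming YARN as the MapReduce framework (the usual choice on Hadoop 2.x; the description above comes from the older job-tracker property in the template):

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```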

The hdfs-site.xml file is used to specify the directories which will be used as the namenode and the datanode on that host. Before editing this file, we need to create the two directories which will hold the namenode and datanode data for this Hadoop installation.

bluething@ubuntu:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode 
bluething@ubuntu:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
bluething@ubuntu:~$ sudo chown -R hadoop-user:hadoop /usr/local/hadoop_store

Edit hdfs-site.xml

<description>The actual number of replications can be specified when the
file is created.</description>
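Putting the pieces together, a minimal hdfs-site.xml for this layout might be the following sketch; a dfs.replication of 1 suits a single-node setup, and the directory paths match the ones created above:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The actual number of replications can be specified when the
    file is created.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>
```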

Format the New Hadoop Filesystem

The hadoop namenode -format command should be executed once, before we start using Hadoop (in Hadoop 2.x the hdfs namenode -format form is preferred). If this command is executed again after Hadoop has been used, it’ll destroy all the data on the Hadoop file system.

hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1/bin$ hadoop namenode -format

Starting Hadoop

We can use start-all.sh, or start-dfs.sh and start-yarn.sh separately (start-all.sh is deprecated in Hadoop 2.x, as the output below shows).

hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1$ sbin/start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-user-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-user-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-user-secondarynamenode-ubuntu.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/hadoop-2.7.1/logs/yarn-hadoop-user-resourcemanager-ubuntu.out
localhost: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.7.1/logs/yarn-hadoop-user-nodemanager-ubuntu.out

Check with the netstat command to see whether the port configured in core-site.xml is listening.

hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1$ netstat -plten | grep java

Or open the web UIs: by default, the NameNode UI is at http://localhost:50070 and the ResourceManager UI at http://localhost:8088.


Stopping Hadoop

hadoop-user@ubuntu:/usr/local/hadoop/hadoop-2.7.1$ sbin/stop-all.sh 
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: no datanode to stop
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop


Categories: Hadoop