Thursday, May 12, 2016

Steps For Installing & Configuring Hadoop in Standalone Mode

You might want to create a dedicated user for running Apache Hadoop, but it is not a prerequisite. In this demonstration we will use the default user to run Hadoop.
Environment
  • Ubuntu 10.10
  • JDK 6 or above
  • Hadoop 1.1.2 (or any stable release)
Follow these steps to install and configure Hadoop on a single node:
Step-1. Install Java
In this tutorial we will use Java 1.6, so its installation is described in detail.
Use one of the commands below to install Java:

$ sudo apt-get install openjdk-6-jdk

or

$ sudo apt-get install sun-java6-jdk
This will install the full JDK under the /usr/lib/jvm/java-6-sun directory (the OpenJDK package installs under /usr/lib/jvm/java-6-openjdk).
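
If more than one JDK ends up installed, you can check and switch the system default with update-alternatives (a quick sketch; the list of alternatives depends on what is installed on your machine):

$ sudo update-alternatives --config java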
Step-2. Verify Java installation
You can verify the Java installation using the following command:

$ java -version
On executing this command, you should see output similar to the following:
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
Step-3. SSH configuration
  • Install SSH using the command:
    sudo apt-get install ssh
  • Generate an SSH key:
    ssh-keygen -t rsa -P "" (press Enter when asked for a file name; this generates a passwordless SSH key)
  • Now copy the public key (id_rsa.pub) of the current machine to authorized_keys. The command below appends the generated public key to the .ssh/authorized_keys file.
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys


  • Verify the SSH configuration using the command:
ssh localhost
Typing yes at the prompt will add localhost to the known hosts.
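
To confirm that the passwordless login works non-interactively (the Hadoop start scripts depend on it), you can run a quick check such as this sketch:

# Should print "ok" without asking for a password
$ ssh -o BatchMode=yes localhost echo ok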
Step-4. Download Hadoop
Download the latest stable release of Apache Hadoop from http://hadoop.apache.org/releases.html.
Unpack the release: tar -zxvf hadoop-1.1.2.tar.gz
Move the extracted folder to an appropriate location; HADOOP_HOME will point to this directory.
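
As a hedged end-to-end sketch (the mirror URL and the target directory /home/user/hadoop are only examples; use the mirror link from the releases page and any location you prefer):

$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
$ tar -zxvf hadoop-1.1.2.tar.gz
$ mv hadoop-1.1.2 /home/user/hadoop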
Step-5. Verify Hadoop
Check if the following directories exist under HADOOP_HOME: bin, conf, lib.
Use the following command to create an environment variable that points to the Hadoop installation directory (HADOOP_HOME)
export HADOOP_HOME=/home/user/hadoop
Now place the Hadoop binary directory on your command-line path by executing the command
export PATH=$PATH:$HADOOP_HOME/bin
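
These export commands last only for the current shell session. To make them persistent, you can append them to ~/.bashrc (a minimal sketch; /home/user/hadoop is assumed to be where you extracted Hadoop):

$ echo 'export HADOOP_HOME=/home/user/hadoop' >> ~/.bashrc
$ echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bashrc
$ source ~/.bashrc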
Use this command to verify your Hadoop installation:
hadoop version
The output should be similar to the following:
Hadoop 1.1.2
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
Step-6. Configure JAVA_HOME
Hadoop needs to know where Java is installed, so we set the JAVA_HOME environment variable to point to our Java installation directory.
JAVA_HOME can be configured in the ~/.bash_profile or ~/.bashrc file. Alternatively, you can let Hadoop know about it by setting JAVA_HOME in the conf/hadoop-env.sh file (a sketch is shown at the end of this step).
Use the command below to set JAVA_HOME on Ubuntu:

export JAVA_HOME=/usr/lib/jvm/java-6-sun
JAVA_HOME can be verified with the command:
echo $JAVA_HOME
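
To set it in Hadoop's own configuration instead, edit conf/hadoop-env.sh and uncomment or add the JAVA_HOME line (a minimal sketch; adjust the path to your JDK):

# In $HADOOP_HOME/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-sun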
Step-7. Create Data Directory for Hadoop
An advantage of Hadoop is that it needs only a handful of directories to work correctly. Let us create a directory named hdfs with three sub-directories: name, data and tmp.
Since the Hadoop user needs read-write access to these directories, change their permissions to 755 (or 777) for that user, as shown below.
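
A minimal sketch of creating this layout (the base path /home/girish/hdfs matches the values used in the XML files in Step-8; substitute your own home directory):

$ mkdir -p /home/girish/hdfs/name /home/girish/hdfs/data /home/girish/hdfs/tmp
$ chmod -R 755 /home/girish/hdfs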
Step-8. Configure Hadoop XML files
Next, we will configure the Hadoop XML files. The Hadoop configuration files live in the HADOOP_HOME/conf directory.
conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/girish/hdfs/tmp</value>
  </property>
</configuration>
conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/girish/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/girish/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
conf/masters
Not required in a single-node cluster.
conf/slaves
Not required in a single-node cluster.
Step-9. Format the Hadoop NameNode
Execute the command below from the Hadoop home directory:
$ ~/hadoop/bin/hadoop namenode -format

[Figure: overview of the Hadoop Distributed File System (HDFS) architecture]

Step-10. Start Hadoop daemons
$ ~/hadoop/bin/start-all.sh
Step-11. Verify the daemons are running
$ jps  (if jps is not in path, try  /usr/java/latest/bin/jps)
The output will look similar to this:
9316 SecondaryNameNode
9203 DataNode
9521 TaskTracker
9403 JobTracker
9089 NameNode
Now we have all the daemons running.
Note: If your master server fails to start due to the dfs safe mode issue, execute this on the Hadoop command line:
hadoop dfsadmin -safemode leave
Also make sure to format the namenode again if you make changes to your configuration.
Step-12. Verify the NameNode and JobTracker web UIs
Open a browser window and type the following URLs:
namenode UI:   http://machine_host_name:50070
job tracker UI:   http://machine_host_name:50030
Substitute 'machine_host_name' with the hostname or public IP of your node, e.g. http://localhost:50070
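
You can also check from the command line that the web UIs are up (a quick sketch; assumes curl is installed):

# A healthy NameNode web UI returns an HTTP 200 response
$ curl -I http://localhost:50070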
Now you have successfully installed and configured Hadoop on a single node.

BASIC HADOOP ADMIN COMMANDS

(Source: Getting Started with Hadoop):
The ~/hadoop/bin directory contains some scripts used to launch Hadoop DFS and Hadoop Map/Reduce daemons. These are:
  • start-all.sh – Starts all Hadoop daemons, the namenode, datanodes, the jobtracker and tasktrackers.
  • stop-all.sh – Stops all Hadoop daemons.
  • start-mapred.sh – Starts the Hadoop Map/Reduce daemons, the jobtracker and tasktrackers.
  • stop-mapred.sh – Stops the Hadoop Map/Reduce daemons.
  • start-dfs.sh – Starts the Hadoop DFS daemons, the namenode and datanodes.
  • stop-dfs.sh – Stops the Hadoop DFS daemons.
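
For example, to restart only the Map/Reduce daemons after changing conf/mapred-site.xml, you could run:

$ ~/hadoop/bin/stop-mapred.sh
$ ~/hadoop/bin/start-mapred.sh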

