Thursday, May 12, 2016

Steps For Installing & Configuring Hadoop in Standalone Mode

You might want to create a dedicated user for running Apache Hadoop, but it is not a prerequisite. In this demonstration we will use the default user to run Hadoop.
Environment
  • Ubuntu 10.10
  • JDK 6 or above
  • Hadoop 1.1.2 (or any stable release)
Follow these steps to install and configure Hadoop on a single node:
Step-1. Install Java
In this tutorial we will use Java 1.6, so its installation is described in detail.
Use one of the commands below to install Java:

$ sudo apt-get install openjdk-6-jdk

or

$ sudo apt-get install sun-java6-jdk
This will install the full JDK under the /usr/lib/jvm/java-6-sun directory (the OpenJDK package installs under /usr/lib/jvm/java-6-openjdk).
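
If more than one JDK ends up installed, you can check and switch the system default with update-alternatives (a quick sketch; the list of alternatives depends on what is installed on your machine):

$ sudo update-alternatives --config java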
Step-2. Verify Java installation
You can verify the Java installation using the following command:

$ java -version
On executing this command, you should see output similar to the following:
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
Step-3. SSH configuration
  • Install SSH using the command:
    sudo apt-get install ssh
  • Generate an SSH key:
    ssh-keygen -t rsa -P "" (press Enter when asked for a file name; this generates a passwordless SSH key)
  • Now copy the public key (id_rsa.pub) of the current machine to authorized_keys. The command below appends the generated public key to the .ssh/authorized_keys file.
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys


  • Verify the SSH configuration using the command:
ssh localhost
Typing yes at the prompt will add localhost to the known hosts.
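
To confirm that the passwordless login works non-interactively (the Hadoop start scripts depend on it), you can run a quick check such as this sketch:

# Should print "ok" without asking for a password
$ ssh -o BatchMode=yes localhost echo ok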
Step-4. Download Hadoop
Download the latest stable release of Apache Hadoop from http://hadoop.apache.org/releases.html.
Unpack the release: tar -zxvf hadoop-1.1.2.tar.gz
Move the extracted folder to an appropriate location; HADOOP_HOME will point to this directory.
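
As a hedged end-to-end sketch (the mirror URL and the target directory /home/user/hadoop are only examples; use the mirror link from the releases page and any location you prefer):

$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
$ tar -zxvf hadoop-1.1.2.tar.gz
$ mv hadoop-1.1.2 /home/user/hadoop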
Step-5. Verify Hadoop
Check if the following directories exist under HADOOP_HOME: bin, conf, lib.
Use the following command to create an environment variable that points to the Hadoop installation directory (HADOOP_HOME)
export HADOOP_HOME=/home/user/hadoop
Now place the Hadoop binary directory on your command-line path by executing the command
export PATH=$PATH:$HADOOP_HOME/bin
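
These export commands last only for the current shell session. To make them persistent, you can append them to ~/.bashrc (a minimal sketch; /home/user/hadoop is assumed to be where you extracted Hadoop):

$ echo 'export HADOOP_HOME=/home/user/hadoop' >> ~/.bashrc
$ echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bashrc
$ source ~/.bashrc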
Use this command to verify your Hadoop installation:
hadoop version
The output should be similar to the following:
Hadoop 1.1.2
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
Step-6. Configure JAVA_HOME
Hadoop needs to know where Java is installed, so we set the JAVA_HOME environment variable to point to our Java installation directory.
JAVA_HOME can be configured in the ~/.bash_profile or ~/.bashrc file. Alternatively, you can let Hadoop know about it by setting JAVA_HOME in the conf/hadoop-env.sh file (a sketch is shown at the end of this step).
Use the command below to set JAVA_HOME on Ubuntu:

export JAVA_HOME=/usr/lib/jvm/java-6-sun
JAVA_HOME can be verified with the command:
echo $JAVA_HOME
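
To set it in Hadoop's own configuration instead, edit conf/hadoop-env.sh and uncomment or add the JAVA_HOME line (a minimal sketch; adjust the path to your JDK):

# In $HADOOP_HOME/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-sun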
Step-7. Create Data Directory for Hadoop
An advantage of Hadoop is that it needs only a handful of directories to work correctly. Let us create a directory named hdfs with three sub-directories: name, data and tmp.
Since the Hadoop user needs read-write access to these directories, change their permissions to 755 (or 777) for that user, as shown below.
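
A minimal sketch of creating this layout (the base path /home/girish/hdfs matches the values used in the XML files in Step-8; substitute your own home directory):

$ mkdir -p /home/girish/hdfs/name /home/girish/hdfs/data /home/girish/hdfs/tmp
$ chmod -R 755 /home/girish/hdfs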
Step-8. Configure Hadoop XML files
Next, we will configure the Hadoop XML files. The Hadoop configuration files live in the HADOOP_HOME/conf directory.
conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/girish/hdfs/tmp</value>
  </property>
</configuration>
conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/girish/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/girish/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
conf/masters
Not required in a single-node cluster.
conf/slaves
Not required in a single-node cluster.
Step-9. Format the Hadoop NameNode
Execute the command below from the Hadoop home directory:
$ ~/hadoop/bin/hadoop namenode -format

[Figure: overview of the Hadoop Distributed File System (HDFS) architecture]

Step-10. Start Hadoop daemons
$ ~/hadoop/bin/start-all.sh
Step-11. Verify the daemons are running
$ jps  (if jps is not in path, try  /usr/java/latest/bin/jps)
The output will look similar to this:
9316 SecondaryNameNode
9203 DataNode
9521 TaskTracker
9403 JobTracker
9089 NameNode
Now we have all the daemons running.
Note: If your master server fails to start due to the dfs safe mode issue, execute this on the Hadoop command line:
hadoop dfsadmin -safemode leave
Also make sure to format the namenode again if you make changes to your configuration.
Step-12. Verify the NameNode and JobTracker web UIs
Open a browser window and type the following URLs:
namenode UI:   http://machine_host_name:50070
job tracker UI:   http://machine_host_name:50030
Substitute 'machine_host_name' with the hostname or public IP of your node, e.g. http://localhost:50070
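
You can also check from the command line that the web UIs are up (a quick sketch; assumes curl is installed):

# A healthy NameNode web UI returns an HTTP 200 response
$ curl -I http://localhost:50070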
Now you have successfully installed and configured Hadoop on a single node.

BASIC HADOOP ADMIN COMMANDS

(Source: Getting Started with Hadoop):
The ~/hadoop/bin directory contains some scripts used to launch Hadoop DFS and Hadoop Map/Reduce daemons. These are:
  • start-all.sh – Starts all Hadoop daemons, the namenode, datanodes, the jobtracker and tasktrackers.
  • stop-all.sh – Stops all Hadoop daemons.
  • start-mapred.sh – Starts the Hadoop Map/Reduce daemons, the jobtracker and tasktrackers.
  • stop-mapred.sh – Stops the Hadoop Map/Reduce daemons.
  • start-dfs.sh – Starts the Hadoop DFS daemons, the namenode and datanodes.
  • stop-dfs.sh – Stops the Hadoop DFS daemons.
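
For example, to restart only the Map/Reduce daemons after changing conf/mapred-site.xml, you could run:

$ ~/hadoop/bin/stop-mapred.sh
$ ~/hadoop/bin/start-mapred.sh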

