Hadoop install - June 2013
Just the command lines to get Hadoop 2 installed on Ubuntu. These are all cribbed from the source notes below, and I am preserving them here for my own benefit so I can quickly repeat what I did. Note that many of these instructions are also in the main Hadoop docs from Apache.
Source material
Use Michael Noll's guide for version 1 & SSH: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://hadoop.apache.org/docs/r1.1.2/single_node_setup.html
Or these for Hadoop 2:
http://jugnu-life.blogspot.com/2012/05/hadoop-20-install-tutorial-023x.html
http://hadoop.apache.org/docs/r2.0.5-alpha/
Create the hadoop user and ssh
sudo apt-get install openssh-server openssh-client
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
su - hduser
If you cannot ssh to localhost without a passphrase, execute the following commands:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Testing your SSH:
ssh localhost
# say yes, then exit
Get hadoop all set up
As the hduser, after downloading the tar
tar -xvf hadoop-2.0.5-alpha.tar.gz
ln -s hadoop-2.0.5-alpha hadoop
# edit .bashrc and add:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21/
export HADOOP_PREFIX="/home/hduser/hadoop"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
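After logging back in, a quick loop like this can confirm the new shell actually picked the exports up (this check is my own addition, not from the tutorial; the variable names match the .bashrc lines above):

```shell
# Sanity check (my own addition): verify the variables exported in
# .bashrc are visible in the new login shell. printenv exits nonzero
# when a variable is not in the environment.
for v in JAVA_HOME HADOOP_PREFIX HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME; do
  if printenv "$v" >/dev/null; then
    echo "ok: $v"          # exported and visible
  else
    echo "unset: $v"       # re-check .bashrc and log in again
  fi
done
```

If any line says "unset", bash has not sourced the new .bashrc yet.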
Stolen entirely from JJ's tutorial at http://jugnu-life.blogspot.com/2012/05/hadoop-20-install-tutorial-023x.html, but with paths changed for my Ubuntu. Please click through to his blog.
Log in again so bash has the paths above. In Hadoop 2.x, etc/hadoop under the install directory (here ~/hadoop/etc/hadoop) is the default conf directory. We need to modify or create the following property files in that directory.
cd ~
mkdir -p /home/hduser/workspace/hadoop_space/hadoop23/dfs/name
mkdir -p /home/hduser/workspace/hadoop_space/hadoop23/dfs/data
mkdir -p /home/hduser/workspace/hadoop_space/hadoop23/mapred/system
mkdir -p /home/hduser/workspace/hadoop_space/hadoop23/mapred/local
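A quick way to double-check that all four storage directories were created (my own sketch; HADOOP_SPACE is a made-up override variable so the base path can be changed, it means nothing to Hadoop itself):

```shell
# Check the four storage directories created above exist. HADOOP_SPACE
# is a hypothetical variable (not a Hadoop setting); the default is the
# same base path used in the mkdir commands.
base=${HADOOP_SPACE:-/home/hduser/workspace/hadoop_space/hadoop23}
for d in dfs/name dfs/data mapred/system mapred/local; do
  if [ -d "$base/$d" ]; then
    echo "ok: $base/$d"
  else
    echo "missing: $base/$d"
  fi
done
```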
Edit core-site.xml with the following contents:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
    <description>The name of the default file system. Either the literal string "local" or a host:port for NDFS.</description>
    <final>true</final>
  </property>
</configuration>
Edit hdfs-site.xml with the following contents:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hduser/workspace/hadoop_space/hadoop23/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hduser/workspace/hadoop_space/hadoop23/dfs/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
The paths
file:/home/hduser/workspace/hadoop_space/hadoop23/dfs/name
file:/home/hduser/workspace/hadoop_space/hadoop23/dfs/data
are folders on your machine that provide space to store the data and name edit files. Each path should be specified as a URI.
Create a file mapred-site.xml inside ~/hadoop/etc/hadoop with the following contents:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>file:/home/hduser/workspace/hadoop_space/hadoop23/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>file:/home/hduser/workspace/hadoop_space/hadoop23/mapred/local</value>
    <final>true</final>
  </property>
</configuration>
The paths
file:/home/hduser/workspace/hadoop_space/hadoop23/mapred/system
file:/home/hduser/workspace/hadoop_space/hadoop23/mapred/local
are folders on your machine that provide space to store data. Each path should be specified as a URI.
Edit yarn-site.xml with the following contents:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Edit ~/hadoop/etc/hadoop/hadoop-env.sh to set JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21/
Format the namenode
hdfs namenode -format
Say Yes and let it complete the format
Time to start the daemons
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
You can also start both of them together with:
start-dfs.sh
Start Yarn Daemons
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
You can also start all YARN daemons together with:
start-yarn.sh
Time to check if Daemons have started
Enter the command
jps
2539 NameNode
2744 NodeManager
3075 Jps
3030 DataNode
2691 ResourceManager
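When the jps output is long it is easy to miss one daemon; this little helper (my own sketch, check_daemons is not a Hadoop command) flags any of the four expected daemons that are absent:

```shell
# Hypothetical helper (my own addition, not part of Hadoop): given the
# jps output as its argument, report any expected daemon that did not
# start. A missing daemon usually means a misconfiguration; check the
# logs under $HADOOP_PREFIX/logs.
check_daemons() {
  for d in NameNode DataNode ResourceManager NodeManager; do
    echo "$1" | grep -q "$d" || echo "MISSING: $d"
  done
}

check_daemons "$(jps 2>/dev/null || true)"
```

No output from the loop means all four daemons showed up in jps.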
Time to launch UI
Open http://localhost:8088 to see the ResourceManager page.
Done :)
Happy Hadooping :)