List:       hadoop-commits
Subject:    [Lucene-hadoop Wiki] Update of "GettingStartedWithHadoop" by SameerParanjpye
From:       Apache Wiki <wikidiffs () apache ! org>
Date:       2006-09-20 7:22:36
Message-ID: 20060920072236.9884.9715 () ajax ! apache ! org

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by SameerParanjpye:
http://wiki.apache.org/lucene-hadoop/GettingStartedWithHadoop

------------------------------------------------------------------------------
    * {{{mapred.local.dir}}}
  
  === Formatting the Namenode ===
- 
  The first step to starting up your Hadoop installation is formatting the filesystem. You need to do this the first time you set up a Hadoop installation. '''Do not''' format a running filesystem; doing so will erase all your data. To format the filesystem, run the command: [[BR]] {{{% $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format}}}
  === Starting a Single node cluster ===
@@ -59, +58 @@

  === Stopping a Single node cluster ===
  Run the command [[BR]] {{{% $HADOOP_INSTALL/hadoop/bin/stop-all.sh}}} [[BR]] to stop all the daemons running on your machine.
+ === Separating Configuration from Installation ===
+ In the example described above, the configuration files used by the Hadoop cluster all lie in the Hadoop installation. This can become cumbersome when upgrading to a new release, since all custom configuration has to be re-created in the new installation. It is possible to separate the config from the install. To do so, select a directory to house the Hadoop configuration (say, {{{/foo/bar/hadoop-config}}}). Copy the {{{hadoop-site.xml}}}, {{{slaves}}} and {{{hadoop-env.sh}}} files to this directory. You can either set the {{{HADOOP_CONF_DIR}}} environment variable to refer to this directory or pass it directly to the Hadoop scripts with the {{{--config}}} option.
+ In this case, the cluster start and stop commands specified in the above two sub-sections become [[BR]] {{{% $HADOOP_INSTALL/hadoop/bin/start-all.sh --config /foo/bar/hadoop-config}}} and [[BR]] {{{% $HADOOP_INSTALL/hadoop/bin/stop-all.sh --config /foo/bar/hadoop-config}}}. [[BR]] Only the absolute path to the config directory should be passed to the scripts.
+ 
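The configuration-separation steps above can be sketched as a short shell session. This is an illustrative sketch, not part of the wiki page: {{{mktemp}}} scratch directories stand in for a real {{{$HADOOP_INSTALL}}} and for the {{{/foo/bar/hadoop-config}}} directory named in the text, so the commands can be tried safely without touching a live installation.

```shell
# Scratch directories stand in for the real installation and config paths.
HADOOP_INSTALL=$(mktemp -d)              # substitute your real install root
CONF_DIR=$(mktemp -d)                    # e.g. /foo/bar/hadoop-config

# Simulate the three files shipped in the default conf directory.
mkdir -p "$HADOOP_INSTALL/hadoop/conf"
touch "$HADOOP_INSTALL/hadoop/conf/hadoop-site.xml" \
      "$HADOOP_INSTALL/hadoop/conf/slaves" \
      "$HADOOP_INSTALL/hadoop/conf/hadoop-env.sh"

# Copy the configuration files out of the installation tree.
cp "$HADOOP_INSTALL/hadoop/conf/hadoop-site.xml" \
   "$HADOOP_INSTALL/hadoop/conf/slaves" \
   "$HADOOP_INSTALL/hadoop/conf/hadoop-env.sh" "$CONF_DIR/"

# Either export the variable so all Hadoop scripts pick it up ...
export HADOOP_CONF_DIR="$CONF_DIR"
# ... or pass the absolute path explicitly on each invocation:
#   $HADOOP_INSTALL/hadoop/bin/start-all.sh --config "$CONF_DIR"
```

On a later upgrade, only {{{$HADOOP_INSTALL}}} changes; the config directory is untouched.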
- === Starting up a real cluster ===
+ === Starting up a larger cluster ===
-  * After formatting the namenode run bin/start-dfs.sh on the Namenode. This will bring up the dfs with Namenode running on the machine you ran the command on and Datanodes on the machines listed in the slaves file mentioned above.
+ 
+  * Ensure that the Hadoop package is accessible from the same path on all nodes that are to be included in the cluster. If you have separated configuration from the install, ensure that the config directory is also accessible in the same way.
+  * Populate the {{{slaves}}} file with the nodes to be included in the cluster, one node per line.
+  * Follow the steps in the ''Basic Configuration'' section above.
+  * Format the Namenode.
+  * Run the command {{{% $HADOOP_INSTALL/hadoop/bin/start-dfs.sh}}} on the node you want the Namenode to run on. This will bring up HDFS with the Namenode running on the machine you ran the command on and Datanodes on the machines listed in the slaves file mentioned above.
-  * Run bin/start-mapred.sh on the machine you plan to run the Jobtracker on. This will bring up the map reduce cluster with Jobtracker running on the machine you ran the command on and Tasktrackers running on machines listed in the slaves file.
+  * Run the command {{{% $HADOOP_INSTALL/hadoop/bin/start-mapred.sh}}} on the machine you plan to run the Jobtracker on. This will bring up the Map/Reduce cluster with the Jobtracker running on the machine you ran the command on and Tasktrackers running on the machines listed in the slaves file.
+  * The above two commands can also be executed with a {{{--config}}} option.
-  * In case you have not set the HADOOP_CONF_DIR variable, you can use bin/start-mapred.sh (bin/start-dfs.sh) --config configure_directory.
-  * Try executing bin/hadoop dfs -lsr / to see if it is working.
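For reference, a {{{slaves}}} file for a small cluster might look like the fragment below; the hostnames are hypothetical and should be replaced with your own Datanode/Tasktracker machines.

```
datanode01.example.com
datanode02.example.com
datanode03.example.com
```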
  
  === Stopping the cluster ===
-  * You can stop the cluster by running bin/stop-mapred.sh and then bin/stop-dfs.sh on your Jobtracker and Namenode respectively. You can specify the configure directory by using the --config option.
+  * The cluster can be stopped by running {{{% $HADOOP_INSTALL/hadoop/bin/stop-mapred.sh}}} and then {{{% $HADOOP_INSTALL/hadoop/bin/stop-dfs.sh}}} on your Jobtracker and Namenode respectively. These commands also accept the {{{--config}}} option.

