
List:       hadoop-user
Subject:    Re: How to Start Hadoop Cluster from source code in Eclipse
From:       KrzyCube <yuxh312@gmail.com>
Date:       2007-06-22 7:17:21
Message-ID: 11247363.post@talk.nabble.com


Finally, I started it successfully, with the NameNode and one DataNode both
running on localhost.

My configuration steps were:

1. Extracted the code from the tar.gz; I got version hadoop-0.12.3.
2. In Eclipse, created a new project from the ant file "build.xml" under the
source code folder.
3. Tried to compile. (You may have to configure the Java compiler version in
the project properties or Eclipse preferences. I just enabled Java 6.0 on my
Ubuntu 7.04.)
4. If that went well, find NameNode.java, configure it as a Java Application,
and try to run it.
5. If there are log4j exceptions like "cannot find log appender", it is
probably a "conf" problem. I fixed this by adding the "Hadoop/conf" folder as
a source folder. In Eclipse this is easy:
find the conf folder in the source explorer tree view, then right-click
-> Build Path -> "Use as Source Folder".

6. Rebuild and try to run again. Now there may be an exception like "NameNode
has not been formatted".
7. Add the "-format" argument to the run configuration once; it will format
the namenode. Then drop this argument.
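
If you prefer doing the format from code instead of the run configuration,
the same thing looks roughly like this (a sketch against the 0.12.x classes):

    // Sketch: formatting programmatically, equivalent to passing
    // "-format" as a program argument in the Eclipse run configuration.
    import org.apache.hadoop.dfs.NameNode;

    public class FormatNameNode {
        public static void main(String[] args) throws Exception {
            NameNode.main(new String[] { "-format" }); // run once, then drop
        }
    }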
8. Then I made some other configuration changes: exported HADOOP_HOME in
hadoop-env.sh and pointed it at the source code path, which works fine.
Configured hadoop-site.xml just as the Hadoop wiki says: host, ports, and
paths such as dfs.name.dir. Here I just gave it the path that the format step
generated, which looks something like
"*/workspace/Hadoop/filesystem/name".

9. Rebuild and retry. Then you will hit the "webapps not found in classpath"
problem, which I referred to in my last post. Just copying webapps to the
Hadoop/bin folder won't do; that just causes another strange exception.

10. After tracing some code, I found that while creating the HTTP server,
Hadoop looks for webapps on the classpath. Yes, it exists at /src/webapps,
but that does not work. I copied "Hadoop/src/webapps" to
"Hadoop/src/java/webapps", refreshed the tree view in Eclipse, found the
webapps folder under java/, then
right-click -> Build Path -> Include.
Now the webapps folder gets copied to the path we set as the build output
folder, Hadoop/bin or Hadoop/build; I chose the first as default.
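
You can verify the copy landed where the server will find it (the "bin"
output path below is just my example from the previous step):

    // Sketch: checks that webapps/ was copied into the build output
    // folder, which is what puts it on the runtime classpath.
    import java.io.File;

    public class CheckWebapps {
        public static void main(String[] args) {
            File f = new File("bin/webapps");
            System.out.println(f.getAbsolutePath() + " exists: " + f.exists());
        }
    }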

11. Try again. The NameNode started, cheers.
12. Configure DataNode.java as a Java Application and run it. It started,
cheers again.
13. Then I toggled some breakpoints in the source files and wrote some code
that calls the FSShell from another computer. Wonderful, the breakpoints
were hit on the server side.
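
The client side was roughly like the following (a sketch using the generic
FileSystem API; the namenode address is an assumed example):

    // Sketch: a remote client call that drives the server-side code,
    // so breakpoints set in NameNode/DataNode get hit.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoteCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "namenode-host:9000"); // assumed
            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.exists(new Path("/")));
        }
    }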
-------------------------------------------------------------------------------------------------------------------------


Then after that, I still have some problems:

1. If I want to start the JobTracker etc., do I just do the same as for the
DataNode?
2. Then how can I start a cluster with several datanodes? Do I need some
scripts?

------------ And thanks for your replies, guys; those really helped me a
lot. Thanks.

KrzyCube





KrzyCube wrote:
> 
> I took the steps below:
> 
> 1. Created a new project from the existing ant file "build.xml".
> 2. Tried to compile the project; it went well.
> 3. Found NameNode.java and configured it as a Java Application to run.
> 4. It told me the NameNode was not formatted, so I formatted it with the
> -format argument.
> 5. Then, exceptions like "webapps" not found in classpath.
> 6. So I tried to configure the src/webapps folder via Build Path -> Use as
> Source Folder.
> 7. Built the project again, but I can't find webapps in the build output
> path.
> 8. So I just copied "webapps" to the bin/ path, as my build output path is
> Hadoop/bin.
> 9. Then exceptions like these:
> ----------------------------------------------------------------------------------------------------------------------
>  07/06/22 12:42:22 INFO dfs.StateChange: STATE* Network topology has 0
> racks and 0 datanodes
> 07/06/22 12:42:22 INFO dfs.StateChange: STATE* UnderReplicatedBlocks has 0
> blocks
> 07/06/22 12:42:22 INFO util.Credential: Checking Resource aliases
> 07/06/22 12:42:22 INFO http.HttpServer: Version Jetty/5.1.4
> 07/06/22 12:42:22 INFO util.Container: Started
> HttpContext[/static,/static]
> 07/06/22 12:42:23 INFO util.Container: Started
> org.mortbay.jetty.servlet.WebApplicationHandler@1ec6696
> 07/06/22 12:42:23 INFO http.SocketListener: Started SocketListener on
> 0.0.0.0:50070
> 07/06/22 12:42:23 ERROR dfs.NameNode: java.io.IOException: Problem
> starting http server
> 	at
> org.apache.hadoop.mapred.StatusHttpServer.start(StatusHttpServer.java:211)
> 	at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:274)
> 	at org.apache.hadoop.dfs.NameNode.init(NameNode.java:178)
> 	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:195)
> 	at org.apache.hadoop.dfs.NameNode.main(NameNode.java:728)
> Caused by:
> org.mortbay.util.MultiException[java.lang.ClassNotFoundException:
> org.apache.hadoop.dfs.dfshealth_jsp, java.lang.ClassNotFoundException:
> org.apache.hadoop.dfs.nn_005fbrowsedfscontent_jsp]
> 	at org.mortbay.http.HttpServer.doStart(HttpServer.java:731)
> 	at org.mortbay.util.Container.start(Container.java:72)
> 	at
> org.apache.hadoop.mapred.StatusHttpServer.start(StatusHttpServer.java:188)
> 	... 4 more
> -------------------------------------------------------------------------------------------------------------------------
> I tried configuring here and there, again and again, but this exception is
> still there.
> What might the problem be?
> 
> 
> Thanks a lot
> KrzyCube
> 
> 
> Konstantin Shvachko wrote:
> > 
> > I run an entire one-node cluster in eclipse by just executing main()
> > (run or debug menus) for each component.
> > You need to configure eclipse correctly in order to do that. Can you
> > compile the whole thing under eclipse?
> > NameNode example:
> > = Open NameNode.java in the editor.
> > = Run / Run
> > = New Java Application -> will create an entry under "Java Application" 
> > named NameNode
> > = Select NameNode, go to tab Arguments and enter the following arguments 
> > under "VM Arguments":
> > -Dhadoop.log.dir=./logs  
> > -Xmx500m
> > -ea
> > The first one is required and can point to your log directory; the
> > other two are optional.
> > = go to the "Classpath" tab, add "hadoop/build" path under "User entries"
> > by
> > Advanced / New Folder / select "hadoop/build"
> > That should be it, if the default classpath is configured correctly, and
> > if I am not forgetting anything.
> > Let me know if that helped; I'll send you screenshots of my
> > configuration if not.
> > 
> > --Konstantin
> > 
> > 
> > Mahajan, Neeraj wrote:
> > 
> > > There are two separate issues you are asking about here:
> > > 1. How to modify/add to hadoop code and execute the changes -
> > > Eclipse is just an IDE; it doesn't matter whether you use eclipse or
> > > some other editor.
> > > I have been using eclipse. What I do is modify the code using eclipse
> > > and then run "ant jar" in the root folder of hadoop (you could also
> > > configure this to work directly from eclipse). This regenerates the
> > > jars and puts them in the build/ folder. Now you can either copy these
> > > jars into the hadoop root folder (removing "dev" from their names) so
> > > that they replace the original jars, or modify the scripts in bin/ to
> > > point to the newly generated jars.
> > > 
> > > 2. How to debug using an IDE -
> > > This page gives a high-level intro to debugging hadoop:
> > > http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms
> > > In my view, there are two ways you can debug hadoop programs: run
> > > hadoop in local mode and debug in-process in the IDE, or run hadoop in
> > > distributed mode and remote debug using the IDE.
> > > 
> > > The first way is easy. At the end of the bin/hadoop script there is an
> > > exec command; put an echo command there instead and run your program.
> > > You can see what parameters the script passes while starting hadoop.
> > > Use these same parameters in the IDE and you can debug hadoop. Remember
> > > to change the conf files so that hadoop runs in local mode. To be more
> > > specific, you will have to set the program arguments and VM arguments,
> > > and add an entry in the classpath pointing to the conf folder.
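> > > 
> > > For example, local mode amounts to these two settings (a sketch via
> > > the Configuration API; the equivalent XML goes in hadoop-site.xml):
> > > 
> > >     // Sketch: "local" makes both the file system and the job
> > >     // tracker run in-process, so one IDE debugger sees everything.
> > >     import org.apache.hadoop.conf.Configuration;
> > > 
> > >     public class LocalMode {
> > >         public static void main(String[] args) {
> > >             Configuration conf = new Configuration();
> > >             conf.set("fs.default.name", "local");
> > >             conf.set("mapred.job.tracker", "local");
> > >         }
> > >     }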
> > > 
> > > The second method is complicated. You will have to modify the scripts
> > > and put in some extra params like "-Xdebug
> > > -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=<port>" for
> > > the java command. Specify the <port> of your choice in it. On a server
> > > where you are running both the namenode and jobtracker there will be a
> > > conflict, as the same port would be specified, so you will have to do
> > > some intelligent scripting to take care of this. Once the java
> > > processes start, you can attach the eclipse debugger to that machine's
> > > <port> and set breakpoints. Up to this point you can debug everything
> > > before the map reduce tasks. Map reduce tasks run in separate
> > > processes; for debugging them, you will have to figure it out yourself.
> > > 
> > > The best way is to debug using the first approach (as the above link
> > > says). I think by that approach you can fix any map-reduce related
> > > problems and for other purely distributed kind of problems you can
> > > follow the second approach.
> > > 
> > > ~ Neeraj
> > > 
> > > -----Original Message-----
> > > From: KrzyCube [mailto:yuxh312@gmail.com] 
> > > Sent: Thursday, June 21, 2007 2:08 AM
> > > To: hadoop-user@lucene.apache.org
> > > Subject: How to Start Hadoop Cluster from source code in Eclipse
> > > 
> > > 
> > > Hi,all:
> > > 
> > > I am using Eclipse to view the Hadoop source code, and I want to
> > > trace it to see how it works. I wrote a bit of code to call the
> > > FSClient, and when I call into the RPC object, I cannot step any
> > > deeper.
> > > 
> > > So I just want to start the cluster from the source code, which I am
> > > holding in Eclipse now.
> > > I browsed the start-*.sh scripts and found that they start several
> > > daemons, such as the namenode, datanode, and secondarynamenode. I just
> > > don't know how to figure it out.
> > > 
> > > Or is there any way to attach to a running process, just as with gdb
> > > when we are debugging C code?
> > > 
> > > Has anybody ever used Eclipse to debug this source code? Please give
> > > me some tips.
> > > 
> > > 
> > > 
> > > Thanks .
> > > 
> > > 
> > > KrzyCube
> > > --
> > > View this message in context:
> > > http://www.nabble.com/How-to-Start-Hadoop-Cluster-from-source-code-in-Eclipse-tf3957457.html#a11229322
> > > Sent from the Hadoop Users mailing list archive at Nabble.com.
> > > 
> > > 
> > > 
> > 
> > 
> > 
> 
> 

-- 
View this message in context:
http://www.nabble.com/How-to-Start-Hadoop-Cluster-from-source-code-in-Eclipse-tf3957457.html#a11247363
Sent from the Hadoop Users mailing list archive at Nabble.com.


