'[Nutch-cvs] [Nutch Wiki] Update of "NutchDistributedFileSystem" by PiotrKosiorowski'


List:       nutch-cvs
Subject:    [Nutch-cvs] [Nutch Wiki] Update of "NutchDistributedFileSystem" by PiotrKosiorowski
From:       Apache Wiki <wikidiffs () apache ! org>
Date:       2005-04-25 11:29:55
Message-ID: 20050425112955.18683.4058 () ajax ! apache ! org

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change \
notification.

The following page has been changed by PiotrKosiorowski:
http://wiki.apache.org/nutch/NutchDistributedFileSystem

The comment on the change is:
Package structure and example updated

------------------------------------------------------------------------------
  
  How much is sufficiently? That should be user-controlled, but right now it's \
hard-coded. The NDFS tries to make sure each Block exists on two Datanodes at any one \
time, though it will still operate if that's impossible. The numbers are low because \
a lot of Nutch users use just a few machines, where higher replication rates are \
impossible.  
- However, NDFS has been designed with large installations in mind. I'd strongly \
recommend using the system with a replication rate of 3 copies, 2 minimum. (Until we \
get a proper config interface in place, adjust the DESIRED_REPLICATION and \
MIN_REPLICATION constants in fs/FSNamesystem.java)
+ However, NDFS has been designed with large installations in mind. I'd strongly \
recommend using the system with a replication rate of 3 copies, 2 minimum. The desired \
replication can be set in the Nutch config file using the "ndfs.replication" property; \
the MIN_REPLICATION constant is located in ndfs/FSNamesystem.java (and is set to 1 by \
default).
  
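The replication target described above can be illustrated with a small sketch. This is not the code in FSNamesystem.java; the class and method names below are hypothetical, and the rule shown (aim for the desired count, but never for more copies than there are live DataNodes) is only an assumption drawn from the statement that NDFS keeps operating when the desired replication is impossible.

```java
// Hypothetical illustration only; not taken from the NDFS sources.
final class ReplicationSketch {

    /**
     * Returns how many additional copies of a block could be scheduled.
     *
     * @param currentCopies number of DataNodes that currently hold the block
     * @param desiredCopies target replication (e.g. the "ndfs.replication" value)
     * @param liveDatanodes number of DataNodes currently available
     */
    static int copiesToSchedule(int currentCopies, int desiredCopies, int liveDatanodes) {
        // A block cannot be stored on more nodes than exist, so cap the target.
        int achievable = Math.min(desiredCopies, liveDatanodes);
        // Never return a negative number when the block is already over-replicated.
        return Math.max(0, achievable - currentCopies);
    }

    public static void main(String[] args) {
        // One copy present, three desired, five nodes alive.
        System.out.println(copiesToSchedule(1, 3, 5));
        // One copy present, three desired, but only one node alive.
        System.out.println(copiesToSchedule(1, 3, 1));
    }
}
```

On a one-machine installation the second call yields zero extra copies, which matches the observation that small installations cannot reach higher replication rates.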
  = System Details =
  
- More details on NDFS operation are coming soon. For now, take a look at the \
following files, all in src/net/nutch/fs/*.java:
+ More details on NDFS operation are coming soon. For now, take a look at the \
following files, all in src/org/apache/nutch/ndfs/*.java:
  
  NDFS.java has two inner classes, each with a main(). One is for the NameNode, and \
one is for the DataNode. This file has all the network-handling code. Much of the \
work is handed to other classes.  
@@ -100, +100 @@

  
  = System Integration =
  
- We are working to integrate the NDFS with Nutch filesystem stubs that we placed in \
the WebDB earlier (to support the distributed WebDB). It should be easy to add a \
switch to all Nutch tools to change quickly between the local filesystem and an NDFS \
installation.
+ The majority of Nutch tools use the Nutch``File``System abstraction to access \
files. There are currently two implementations available: Local``File``System and \
NDFSFile``System. Unless it is specified in the command-line arguments of a tool that \
uses the Nutch``File``System abstraction, the filesystem implementation is taken from \
the config file property named "fs.default.name". Possible values of this property are \
"local" for Local``File``System and "host:port" for NDFSFile``System. In the second \
case the host and port values give the Name``Node location.
+ 
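The two forms of the "fs.default.name" value can be told apart with very little code. The following is a minimal sketch, not the code Nutch itself uses; the class and method names are hypothetical, and returning null to mean "local filesystem" is just a convention chosen for this example.

```java
import java.net.InetSocketAddress;

// Hypothetical illustration only; not taken from the Nutch sources.
final class FsNameSketch {

    /**
     * Interprets an "fs.default.name" value.
     * Returns null for the literal "local", otherwise the NameNode address
     * parsed from "host:port" (the host name is not resolved here).
     */
    static InetSocketAddress parseFsName(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("fs.default.name is not set");
        }
        String v = value.trim();
        if (v.equals("local")) {
            return null; // local filesystem, no NameNode involved
        }
        int colon = v.lastIndexOf(':');
        if (colon <= 0 || colon == v.length() - 1) {
            throw new IllegalArgumentException("expected \"local\" or host:port, got: " + v);
        }
        String host = v.substring(0, colon);
        // NumberFormatException is a subclass of IllegalArgumentException.
        int port = Integer.parseInt(v.substring(colon + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress nameNode = parseFsName("A:9000");
        System.out.println(nameNode.getHostString() + " port " + nameNode.getPort());
        System.out.println(parseFsName("local") == null ? "local filesystem" : "NDFS");
    }
}
```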
+ = Configuration properties =
+ 
+ NDFS-related properties (descriptions taken from the config file):
+ 
+ "fs.default.name" - The name of the default file system.  Either the literal string \
"local" or a host:port for NDFS.
+ 
+ "ndfs.name.dir" - Determines where on the local filesystem the NDFS name node \
should store the name table.
+ 
+ "ndfs.data.dir" - Determines where on the local filesystem an NDFS data node \
should store its blocks.
+ 
+ "ndfs.replication" - How many copies we try to have at all times (not present in \
config file).
  
  = Quick Demo =
  
- On machine A, run: $ java net.nutch.fs.NDFS$NameNode 9000 namedir
+ On machines A, B and C, set the following in the Nutch config file:
  
- On machine B, run: $ java net.nutch.fs.NDFS$DataNode datadir1 machineB 8000 \
machineA:9000
+ fs.default.name = A:9000
  
- On machine C, run: $ java net.nutch.fs.NDFS$DataNode datadir2 machineC 8000 \
machineA:9000
+ ndfs.name.dir=/tmp/nutch/ndfs/name
  
- You now have an NDFS installation with one NameNode and two DataNodes. (Note, of \
course, you don't have to run these on different machines. It's enough to use \
different directories and avoid port conflicts.)
+ ndfs.data.dir=/tmp/nutch/ndfs/data
  
- Anywhere, run the client: $ java net.nutch.fs.TestClient machineA:9000 CREATE \
foo.txt $ java net.nutch.fs.TestClient machineA:9000 GET foo.txt $ java \
net.nutch.fs.TestClient machineA:9000 RENAME foo.txt bar.txt $ java \
net.nutch.fs.TestClient machineA:9000 GET bar.txt $ java net.nutch.fs.TestClient \
machineA:9000 DELETE bar.txt
  
- You have just created a large file, retrieved it, renamed it, retrieved it again, \
and deleted it.
+ 
+ On machine A, run: $ nutch namenode 
+ 
+ On machine B, run: $ nutch datanode
+ 
+ On machine C, run: $ nutch datanode 
+ 
+ You now have an NDFS installation with one NameNode and two DataNodes. (Note, of \
course, you don't have to run these on different machines. It's enough to use \
different directories and avoid port conflicts.) DataNodes use port 7000 or greater \
(starting from 7000, they probe for a free port to listen on).
+ 
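The port probing mentioned above can be sketched as follows. This is an illustration of the general technique only, not the DataNode code; the class name, method name and the limit on the number of attempts are all assumptions made for the example.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical illustration only; not taken from the NDFS sources.
final class PortProbeSketch {

    /**
     * Tries startPort, startPort + 1, ... and returns a server socket bound
     * to the first port that is free. Gives up after maxTries attempts.
     */
    static ServerSocket bindFirstFree(int startPort, int maxTries) throws IOException {
        IOException last = null;
        for (int i = 0; i < maxTries; i++) {
            int port = startPort + i;
            if (port > 65535) {
                break; // ran past the valid port range
            }
            try {
                return new ServerSocket(port); // succeeds only if the port is free
            } catch (IOException e) {
                last = e; // port in use (or not bindable); try the next one
            }
        }
        throw new IOException("no free port found starting at " + startPort, last);
    }

    public static void main(String[] args) throws IOException {
        // Two "DataNodes" on one machine end up on different ports.
        try (ServerSocket first = bindFirstFree(7000, 100);
             ServerSocket second = bindFirstFree(7000, 100)) {
            System.out.println("first listens on " + first.getLocalPort());
            System.out.println("second listens on " + second.getLocalPort());
        }
    }
}
```

Probing like this is what lets several DataNodes share one machine without any per-node port setting, at the cost of the port not being known in advance.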
+ Anywhere, run the client (with fs.default.name = A:9000 in the Nutch config file): 
+ 
+ $ nutch org.apache.nutch.fs.Test``Client 
+ 
+ It displays the NDFS operations that can be performed with this test tool.
+ 
+ To test basic NDFS operation, we can execute:
+ 
+ $ nutch org.apache.nutch.fs.Test``Client -mkdir /test
+ 
+ $ nutch org.apache.nutch.fs.Test``Client -ls /
+ 
+ $ nutch org.apache.nutch.fs.Test``Client -put local_file /test/testfile
+ 
+ $ nutch org.apache.nutch.fs.Test``Client -ls /test
+ 
+ $ nutch org.apache.nutch.fs.Test``Client -cp /test/testfile /test/backup
+ 
+ $ nutch org.apache.nutch.fs.Test``Client -rm /test/testfile
+ 
+ $ nutch org.apache.nutch.fs.Test``Client -mv /test/backup /test/testfile
+ 
+ $ nutch org.apache.nutch.fs.Test``Client -get /test/testfile local_copy
+ 
+ 
+ You have just created a directory, listed the root directory, copied a file from the \
local filesystem into the new directory, listed it, copied the file within NDFS, \
removed the original, renamed the backup to the original name and retrieved a copy \
from NDFS to the local filesystem.
+ 
+ There are also additional commands that allow you to inspect the state of NDFS:
+ 
+ $ nutch org.apache.nutch.fs.Test``Client -report
+ 
+ $ nutch org.apache.nutch.fs.Test``Client -du /
  
  You might try interesting things like the following:
   1. Start a NameNode? and one DataNode?
@@ -126, +177 @@

  
  The system should have replicated the relevant blocks, making the data still \
available in step 6.  
- If you want to read/write directly, use the API exposed in net.nutch.fs.NDFSClient
+ If you want to read/write directly, use the API exposed in \
org.apache.nutch.ndfs.NDFSClient  
  = Conclusion =
  

