'[Gluster-users] (Fixed) Re: Can Hadoop run on gluster in 1 JT, N TT setup or only works for 1 JT+TT'


List:       gluster-users
Subject:    [Gluster-users] (Fixed) Re:  Can Hadoop run on gluster in 1 JT, N TT setup or only works for 1 JT+TT
From:       fermin () tid ! es (Fermín Galán Márquez)
Date:       2012-01-31 18:23:00
Message-ID: 4F283184.4050708 () tid ! es

Dear Venky,

On 31/01/2012 6:45, Venky Shankar wrote:
[snip]

 *   Are you collocating gluster cluster peers with the TT nodes (I mean, each one of
the 8 TT nodes is also a gluster peer in the cluster), or is the gluster cluster
running on separate nodes?

Yes, you are right. Each TaskTracker node is a gluster peer in the cluster.


 *   If the answer to the question above is that they are collocated, which
fs.glusterfs.server are you using in each TT?

For the TaskTracker, fs.glusterfs.server would be _any_ one of the gluster peers
(i.e. any one of the 8 machines, considering you have a 1JT + 8TT setup). For
simplicity, stick to one hostname/IP for this, since that makes deployment
easier (no need to edit core-site.xml on every machine).

I'm asking because I have a configuration like this in mind:

TT1-> fs.glusterfs.server @ core-site.xml in TT1= IP_TT1
TT2-> fs.glusterfs.server @ core-site.xml in TT2= IP_TT2
...
TTn-> fs.glusterfs.server @ core-site.xml in TTn= IP_TTn

    This will definitely work for you, but as I said, stick to one hostname/IP. So for
each of TT1, TT2 .. TTn, use IP_TT1.
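
As a rough illustration of the "one hostname everywhere" advice, the sketch below renders the same fs.glusterfs.server property for every node, so a single core-site.xml fragment can be copied unchanged across the cluster. Only the property name comes from this thread; the hostnames (jt, tt1, ...) and the helper render_property are hypothetical.

```python
from xml.etree import ElementTree as ET


def render_property(name: str, value: str) -> str:
    """Return one Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Hypothetical hostnames: one JT plus TT nodes, all of them gluster peers.
nodes = ["jt", "tt1", "tt2", "tt3"]

# Any one gluster peer will do; the point is that the value is the same everywhere.
shared_server = "tt1"

# One fragment per node; every entry is identical, so no per-node editing is needed.
fragments = {
    node: render_property("fs.glusterfs.server", shared_server) for node in nodes
}

for node, fragment in fragments.items():
    print(node, fragment)
```

The per-node variant (each TTn pointing at IP_TTn) would simply pass a different value for each node, at the cost of maintaining a different file on every machine.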


So each TT mounts "itself", which I suppose achieves data locality similar to that
achieved with HDFS (assuming the gluster driver is clever enough to use the
local disk when the data is located on the same node). Does this configuration
make sense?

Exactly! Each TT node (and the JT too) does a GlusterFS FUSE mount to get a _view_
of the entire namespace of the FS. The JobTracker schedules jobs to TaskTracker nodes.
When a job runs on a TT node, all I/O is done through the GlusterFS mount. Data
locality is a bit of a catch here. Since all I/O calls go through the mount, each
call has to take the route of client translator(s) -> server translator(s) before it
hits the posix layer (even if the client and the server are on the same node, the TT
in this case).

To optimize this we introduced a configurable option, "quick.slave.io". This is
essentially a "short circuit" for the case I just mentioned above. When the job wants
to read from a particular offset in the file, the GlusterFS Hadoop plugin checks
whether the (offset, length) in question is present in the backend file system. If
yes, it satisfies the read directly from the backend FS instead of going through
the FUSE mount, thereby saving context switches, translator overhead, etc.
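
The short-circuit idea can be sketched roughly as follows. This is not the plugin's actual code, only a minimal Python illustration under a stated assumption: the range counts as "present in the backend" when the backend file exists on this node and is large enough to contain (offset, length). The function name read_range and both path arguments are hypothetical.

```python
import os


def read_range(mount_path, backend_path, offset, length):
    """Read `length` bytes at `offset`, preferring the local backend copy.

    Assumption (not taken from the plugin): the range is treated as locally
    present if the backend file exists and covers offset + length.
    Returns the bytes read and which path served them.
    """
    try:
        local_ok = os.path.getsize(backend_path) >= offset + length
    except OSError:
        # No copy of this file in the local backend file system.
        local_ok = False

    # Short circuit: read the backend file directly when possible,
    # otherwise fall back to the regular FUSE mount path.
    path = backend_path if local_ok else mount_path
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    return data, ("backend" if local_ok else "mount")
```

In this sketch a read that the local backend cannot fully satisfy goes through the mount exactly as it would with the option turned off, so the fallback path is always the normal one.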

One more note: this option is not well tested, so we default it to "Off" in
core-site.xml. If you do try it out, please let us know if you hit any bugs (and
please file them too!).

Thank you very much for your clarifications! I will follow your recommendation of
sticking to just one IP/hostname in all the core-site.xml files across the cluster and
of using the quick.slave.io option (I will report any bugs I find).

However, after reading your mail, I wonder whether the Hadoop plugin for gluster
implements some location-based job scheduling similar to that of Hadoop on HDFS. I
mean, in Hadoop on HDFS the JT coordinates with the NN (which knows where every file
block is located within the cluster), so each map task is scheduled to the TT closest
to the input it has to process (ideally, collocated). In Hadoop on gluster I
understand that there is no NN equivalent, but is there any means by which the JT can
know which nodes in the cluster have the actual data in their respective backend
filesystems, so that the JT tries to schedule each map task to a TT on one of these
nodes? If not, how does the JT select the TT for each map task (round-robin,
randomly, etc.)?

My question is probably very basic, but I haven't found a clear and direct answer in
the documentation, sorry...

Thanks!

Best regards,

------
Fermín
