'RE: Hadoop - is it good for me and performance question'


List:       hadoop-user
Subject:    RE: Hadoop - is it good for me and performance question
From:       "Haijun Cao" <haijun () kindsight ! net>
Date:       2008-06-30 19:42:02
Message-ID: C001E847C1FD4248A7D6537643690E21019F9003 () mse16be2 ! mse16 ! exchange ! ms

http://www.mail-archive.com/core-user@hadoop.apache.org/msg02906.html


-----Original Message-----
From: yair gotdanker [mailto:yairgot@gmail.com] 
Sent: Sunday, June 29, 2008 4:46 AM
To: core-user@hadoop.apache.org
Subject: Hadoop - is it good for me and performance question

Hello all,



I am a newbie to Hadoop. The technology seems very interesting, but I am
not sure it suits my needs. I would really appreciate your feedback.



The problem:

I have multiple log servers, each receiving 10-100 MB/minute. The received
data is processed to produce aggregated data.
Processing the data should take a few minutes at most (10 min).
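
To put rough numbers on the load described above, here is a minimal sizing
sketch. It assumes the rates are in megabytes per minute; the function names
are hypothetical, and the number of log servers is not stated in the post, so
it is left as a parameter.

```python
# Rough sizing sketch. All names are hypothetical; the rates are assumed to be
# in megabytes per minute, and the server count is not given in the post.

def window_volume_mb(rate_mb_per_min: float, window_min: float, servers: int = 1) -> float:
    """Data accumulated across `servers` log servers during one window."""
    return rate_mb_per_min * window_min * servers


def required_throughput_mb_s(rate_mb_per_min: float, servers: int = 1) -> float:
    """Sustained throughput (MB/s) needed just to keep up with the input."""
    return rate_mb_per_min * servers / 60.0


if __name__ == "__main__":
    # Low and high ends of the stated per-server rate, over a 10-minute window.
    for rate in (10, 100):
        volume = window_volume_mb(rate, 10)
        throughput = required_throughput_mb_s(rate)
        print(f"{rate} MB/min -> {volume} MB per window, {throughput:.2f} MB/s")
```

Passing a `servers` value scales both figures linearly, which gives a first
estimate of how much data each 10-minute job would have to aggregate.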

In addition, I ran a performance benchmark on the wordcount example
provided by the quickstart tutorial on my PC (pseudo-distributed, using
the quickstart configuration file), and it took about 40 seconds!
I must be missing something or doing something wrong here, since
40 seconds is way too long!
The map/reduce functions should be very fast since there is almost no
processing done, so I guess most of the time is spent in the Hadoop
framework.
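
One way to check that guess is to time the word-count logic by itself,
outside any framework. The sketch below is plain Python, not the Hadoop API or
the tutorial's code; the function names and the sample input are made up for
illustration. It only measures the map and reduce computation, so it says
nothing about job setup, task scheduling, or disk and network I/O.

```python
import time
from collections import Counter
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_count(lines):
    """Run map over every line, then reduce all emitted pairs in-process."""
    return reduce_counts(chain.from_iterable(map_words(line) for line in lines))


if __name__ == "__main__":
    # Hypothetical sample input; replace with the file used in the benchmark.
    lines = ["the quick brown fox jumps over the lazy dog"] * 100_000
    start = time.perf_counter()
    result = word_count(lines)
    elapsed = time.perf_counter() - start
    print(f"{len(result)} distinct words counted in {elapsed:.3f} s")
```

If this finishes in a small fraction of the 40 seconds on the same input, that
is consistent with the time going to per-job overhead rather than to the
map/reduce functions, although it does not prove where the time is spent.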

I would appreciate any help in understanding this and how I can increase
the performance.
BTW:
Does anyone know of a good behind-the-scenes tutorial that explains more
about how the jobtracker/tasktracker communicate and so on?
