List: hadoop-user
Subject: RE: Hadoop - is it good for me and performance question
From: "Haijun Cao" <haijun () kindsight ! net>
Date: 2008-06-30 19:42:02
Message-ID: C001E847C1FD4248A7D6537643690E21019F9003 () mse16be2 ! mse16 ! exchange ! ms
http://www.mail-archive.com/core-user@hadoop.apache.org/msg02906.html
-----Original Message-----
From: yair gotdanker [mailto:yairgot@gmail.com]
Sent: Sunday, June 29, 2008 4:46 AM
To: core-user@hadoop.apache.org
Subject: Hadoop - is it good for me and performance question
Hello all,
I am a newbie to Hadoop. The technology seems very interesting, but I am
not sure it suits my needs. I would really appreciate your feedback.
The problem:
I have multiple log servers, each receiving 10-100 MB/minute. The received
data is processed to produce aggregated data.
Processing should take a few minutes at most (10 min).
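To put the requirement in numbers, here is a rough sizing sketch. It
assumes the per-server rate is in megabytes per minute, and the server
count is a made-up example value, not a figure from this post.

```python
# Rough sizing sketch. Only the rate range and the 10-minute window come
# from the post; the server count below is a hypothetical example value.
MB_PER_MIN_LOW, MB_PER_MIN_HIGH = 10, 100   # per-server ingest rate
WINDOW_MIN = 10                             # processing deadline in minutes


def window_volume_mb(rate_mb_per_min, servers, window_min=WINDOW_MIN):
    """Data arriving across all servers during one processing window."""
    return rate_mb_per_min * servers * window_min


servers = 5  # hypothetical number of log servers
low = window_volume_mb(MB_PER_MIN_LOW, servers)
high = window_volume_mb(MB_PER_MIN_HIGH, servers)
print(f"{low} MB to {high} MB arrive per {WINDOW_MIN}-minute window")
```

Whatever does the aggregation has to get through that much data per
window, on average, or it falls behind.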
In addition, I ran a performance benchmark on the wordcount example
provided by the quickstart tutorial on my PC (pseudo-distributed, using
the quickstart configuration files), and it took about 40 seconds!
I must be missing something or doing something wrong here, since
40 seconds is way too long!
The map/reduce functions should be very fast since there is almost no
processing done, so I guess most of the time is spent in the Hadoop
framework.
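One way to check that guess is to time the word-count logic on its own,
outside Hadoop. The following is a minimal sketch in plain Python, not
the Hadoop example itself: the function names and the synthetic input
are assumptions made for illustration.

```python
import time
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map step over all lines, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce_counts(pairs)


# Synthetic input standing in for the tutorial's text files.
lines = ["the quick brown fox", "the lazy dog", "the end"] * 10000

start = time.perf_counter()
counts = word_count(lines)
elapsed = time.perf_counter() - start
print(f"{len(lines)} lines counted in {elapsed:.3f} s")
```

If the bare logic finishes quickly on an input of similar size, that
would suggest the rest of the 40 seconds goes to job setup and
scheduling rather than to the map and reduce functions.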
I would appreciate any help in understanding this and in improving
the performance.
btw:
Does anyone know of a good behind-the-scenes tutorial that explains more
about how the jobtracker and tasktracker communicate, and so on?