
List:       hadoop-user
Subject:    Re: Architecture question.
From:       Bill de hOra <bill () dehora ! net>
Date:       2008-12-31 17:27:32
Message-ID: 495BAB84.3060507 () dehora ! net

Steve Loughran wrote:
> aakash shah wrote:
>> We can assume that each record has only one key->value mapping. The 
>> value will be updated every minute. Currently we have 1 million of 
>> these (key->value) pairs, but I have to make sure that we can scale 
>> up to 10 million of them.
>>
>>   Every 10 minutes I will be updating all of these values using their 
>> keys. This is the reason I cannot go with a database as a solution. 
> 
> I wouldn't be so quick to dismiss a database. All your big telcos run 
> their mobile phone systems on databases, where the big issue is having 
> enough RAM for the whole database to stay in memory; some dedicated 
> databases (e.g. TimesTen) are designed to have bounded latency on 
> lookup, so you can predict how long operations will take.
> 
> That said, if you are only doing atomic updates of a single record, 
> there's less need for the advanced features. Assuming >1 machine, some 
> kind of distributed hash table may work.
> 
> 
>>    I was thinking about going with a memcache pool. In the meantime I 
>> heard about Hadoop and wanted to get advice from this mailing list 
>> regarding a memcache pool vs Hadoop for this specific problem.
> 
> It's not an area Hadoop deals with at all.
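
On the memcache pool / distributed hash table idea: the core of it is 
just routing each key to one node of a pool by hashing, so both the 
lookups and the 10-minute update pass get spread across machines. A 
minimal sketch of that routing, with made-up node addresses (a real 
memcached client library does this mapping for you):

import java.util.Arrays;
import java.util.List;

public class KeyPartitioner {

    // Hypothetical pool members; in practice these would be memcached
    // (or memcachedb) instances.
    private static final List<String> NODES = Arrays.asList(
            "cache-1:11211", "cache-2:11211", "cache-3:11211");

    // Map a key to the node responsible for it. Masking the sign bit
    // keeps the bucket index non-negative.
    static String nodeFor(String key) {
        int bucket = (key.hashCode() & 0x7fffffff) % NODES.size();
        return NODES.get(bucket);
    }

    public static void main(String[] args) {
        for (String key : Arrays.asList("record:1", "record:2", "record:3")) {
            System.out.println(key + " -> " + nodeFor(key));
        }
    }
}

One caveat with plain modulo hashing: adding or removing a node remaps 
most of the keys. Consistent hashing, which the usual memcached clients 
support, limits that to roughly 1/N of the keyspace.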

The record size sounds too small for HDFS, unless the records are in 
turn grouped into files sized to suit the HDFS block size. For records 
of that size, I would also consider a) rewriting the whole set each 
cycle instead of doing in-place updates (a sketch follows below), and 
b) testing for physical (disk) bottlenecks.
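
To make (a) concrete, here is a rough sketch of what I mean, assuming 
the pairs fit in memory on the writer and using a SequenceFile; the 
class name, paths and the map holding the pairs are all illustrative. 
Each cycle writes a fresh, immutable snapshot and readers switch to the 
newest one, rather than editing records in place:

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SnapshotWriter {

    // Write one immutable snapshot of all the pairs as a single
    // SequenceFile; one large file per cycle suits HDFS far better
    // than millions of tiny records updated in place.
    public static void writeSnapshot(Map<String, Long> records, String dir)
            throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(dir, "snapshot-" + System.currentTimeMillis());

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, LongWritable.class);
        try {
            Text key = new Text();
            LongWritable value = new LongWritable();
            for (Map.Entry<String, Long> e : records.entrySet()) {
                key.set(e.getKey());
                value.set(e.getValue());
                writer.append(key, value);
            }
        } finally {
            writer.close();
        }
    }
}

At a few tens of bytes per pair, even 10 million pairs comes to only a 
few hundred MB per snapshot, so the bulk rewrite is cheap next to 
10 million random in-place updates.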

Also, there's memcachedb as an alternative for persistent hashing:

http://memcachedb.org/benchmark.html
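
memcachedb speaks the memcached protocol, so any memcached client works 
against it. A minimal sketch with the spymemcached Java client; the 
host, port and key are illustrative (21201 is memcachedb's usual 
default port):

import java.net.InetSocketAddress;

import net.spy.memcached.MemcachedClient;

public class MemcachedbExample {

    public static void main(String[] args) throws Exception {
        // Connect to a single memcachedb node (illustrative address).
        MemcachedClient client =
                new MemcachedClient(new InetSocketAddress("localhost", 21201));

        // Overwrite the value for one key; an expiry of 0 means "never expire".
        client.set("record:42", 0, "1234567").get();

        // Read it back.
        Object value = client.get("record:42");
        System.out.println("record:42 = " + value);

        client.shutdown();
    }
}

Because it is the same protocol, the same client code pointed at a pool 
of addresses covers the plain memcached option too; the difference is 
that memcachedb persists to Berkeley DB underneath.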

Bill


