List:       hadoop-user
Subject:    Re: Queue support from HDFS
From:       Bharath Mundlapudi <bharathwork () yahoo ! com>
Date:       2011-06-26 22:02:10
Message-ID: 1309125730.99704.YahooMailNeo () web110703 ! mail ! gq1 ! yahoo ! com


One solution I am thinking of: say you have Kafka or some JMS implementation
that your job client subscribes to. The client submits jobs dynamically based
on the size of the queued input. You may need to run a final job to combine
all of their outputs.
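
A minimal sketch of that idea in Python, under stated assumptions: the client
buffers messages from the queue and launches one job each time the buffered
input reaches a size threshold. The names BatchingJobClient, run_client,
submit_job and max_batch_bytes are hypothetical and not part of Kafka, JMS or
Hadoop. submit_job stands in for whatever actually launches the MR job, and
the final combining job is left out.

```python
from typing import Callable, Iterable, List


class BatchingJobClient:
    """Buffers queue messages and submits one job per size-bounded batch."""

    def __init__(self, submit_job: Callable[[List[bytes]], str],
                 max_batch_bytes: int) -> None:
        # submit_job is a hypothetical launcher: it takes a batch of
        # messages and returns an identifier for the job it started.
        self.submit_job = submit_job
        self.max_batch_bytes = max_batch_bytes
        self.buffer: List[bytes] = []
        self.buffered_bytes = 0
        self.job_ids: List[str] = []

    def on_message(self, message: bytes) -> None:
        """Handle one message delivered by the queue subscription."""
        self.buffer.append(message)
        self.buffered_bytes += len(message)
        # Submit a job as soon as enough input has accumulated.
        if self.buffered_bytes >= self.max_batch_bytes:
            self.flush()

    def flush(self) -> None:
        """Submit the buffered messages as one job, if there are any."""
        if not self.buffer:
            return
        self.job_ids.append(self.submit_job(self.buffer))
        # Start a fresh buffer so the submitted batch is not mutated.
        self.buffer = []
        self.buffered_bytes = 0


def run_client(messages: Iterable[bytes],
               submit_job: Callable[[List[bytes]], str],
               max_batch_bytes: int) -> List[str]:
    """Feed messages through the client and return the submitted job ids."""
    client = BatchingJobClient(submit_job, max_batch_bytes)
    for message in messages:
        client.on_message(message)
    client.flush()  # submit whatever is left when the stream ends
    return client.job_ids
```

The returned job ids are what a final combining job would need in order to
locate the per-batch outputs. A larger threshold means fewer, bigger jobs, at
the cost of more delay before the data is processed.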

-Bharath


________________________________
From: Saumitra <saumitra.official@gmail.com>
To: common-user@hadoop.apache.org
Sent: Saturday, June 25, 2011 1:05 PM
Subject: Re: Queue support from HDFS

Thanks for the reply, Jakob.

As far as I understand, Kafka's Hadoop consumer is an MR job whose mappers
read from a shared Kafka queue and dump the data to HDFS, but the mappers
are not created dynamically as queue elements start bursting in.

Is there a way to have new mappers created when a job's input queue
grows or when the input HDFS source gets updated?


On Saturday 25 June 2011 01:01 AM, Jakob Homan wrote:
> Not directly, but you may wish to take a look at the Kafka project
> (http://sna-projects.com/kafka/), which we use as a queue and then
> bring the data periodically into HDFS via an MR job.  See this
> presentation: http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation
> -Jakob
> 
> 
> 
> On Fri, Jun 24, 2011 at 10:12 AM, Saumitra Shahapure
> <saumitra.official@gmail.com>  wrote:
> > Hi,
> > 
> > Does HDFS support a queue-like structure, where a stream of data is
> > processed as it is generated?
> > Specifically, I will have a stream of data coming in, and a
> > data-independent operation needs to be applied to it (so only a Map
> > function; the reducer is identity).
> > I wish to distribute the data among nodes using HDFS and start processing
> > it as it arrives, preferably in a single MR job.
> > 
> > I agree that it can be done by starting a new MR job for each batch of data,
> > but is frequently starting many MR jobs for small data chunks a good idea?
> > (Consider that a new batch arrives every few seconds and processing one
> > batch takes a few minutes.)
> > 
> > Thanks,
> > --
> > Saumitra S. Shahapure
> > 


-- 
Saumitra Shahapure


