List:       hadoop-user
Subject:    Re: is that possible to make MapFile "mutable" ?
From:       "Open Study" <open.study () gmail ! com>
Date:       2007-06-27 3:33:43
Message-ID: fb1e04d60706262033y312041e1o61a35d939459a16b () mail ! gmail ! com


Hi Devaraj, thanks for the reply and suggestion. I had a similar but less
sophisticated checkpoint mechanism.

Another question: why is MapFile (in fact, SequenceFile) made immutable in
the first place? I believe there must be a sound motivation, but so far I
don't know what it is.


On 6/27/07, Devaraj Das <ddas@yahoo-inc.com> wrote:
>
> No, you cannot append to a file on the dfs, and your app should be able to
> treat multiple files as one single logical file (as you point out). But in
> your case, it seems you could design your app to do some buffering: for
> example, keep a buffer for each of the n different files, and flush the
> buffers to different files on the dfs only when the amount of buffered
> data reaches a certain limit.
>
> I am not sure whether fault handling is a concern for you, but there is a
> danger of losing the buffered messages if your app goes down. One way to
> handle this, assuming you have the ability to reprocess messages, is to
> checkpoint the state of the message processor in the dfs. The state could
> include the last message ID you flushed; the next time your app starts
> up, it reads the checkpoint file from the dfs, gets the ID, and processes
> messages starting from (ID + 1).
>
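The buffering and checkpoint idea quoted above could be sketched roughly as follows. This is only an illustration, not code from the thread: it uses a local directory as a stand-in for the dfs, the names BufferedSplitter, resumeFrom, and the "checkpoint" and "<target>-upto-<id>" file names are hypothetical, and it assumes message IDs arrive in increasing order. Each flush writes new files and never appends.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: buffer messages per target, flush every buffer to a
// NEW file once a limit is reached, then record the last flushed message ID.
class BufferedSplitter {
    private final Path dir;           // local stand-in for a dfs directory
    private final int flushThreshold; // buffered messages that trigger a flush
    private final Map<String, List<String>> buffers = new HashMap<>();
    private int buffered = 0;
    private long lastSeenId = -1;

    BufferedSplitter(Path dir, int flushThreshold) {
        this.dir = dir;
        this.flushThreshold = flushThreshold;
    }

    // ID to resume from: (last flushed ID + 1), or 0 if nothing was flushed.
    static long resumeFrom(Path dir) throws IOException {
        Path checkpoint = dir.resolve("checkpoint");
        if (!Files.exists(checkpoint)) {
            return 0;
        }
        byte[] raw = Files.readAllBytes(checkpoint);
        return Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim()) + 1;
    }

    // Assumes IDs are passed in increasing order.
    void add(long id, String target, String message) throws IOException {
        buffers.computeIfAbsent(target, k -> new ArrayList<>())
               .add(id + "\t" + message);
        lastSeenId = id;
        buffered++;
        if (buffered >= flushThreshold) {
            flush();
        }
    }

    void flush() throws IOException {
        if (buffered == 0) {
            return;
        }
        // One new file per target; the ID in the name keeps names unique
        // across restarts, so earlier files are never overwritten.
        for (Map.Entry<String, List<String>> e : buffers.entrySet()) {
            Path part = dir.resolve(e.getKey() + "-upto-" + lastSeenId);
            Files.write(part, e.getValue(), StandardCharsets.UTF_8);
        }
        // Write the checkpoint only after all data files are on disk.
        Files.write(dir.resolve("checkpoint"),
                    Long.toString(lastSeenId).getBytes(StandardCharsets.UTF_8));
        buffers.clear();
        buffered = 0;
    }
}
```

On startup the app would call resumeFrom(dir) and reprocess messages from that ID onward. Anything still in the buffer at the time of a crash is lost from memory but gets replayed, which is why the ability to reprocess messages matters.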
> -----Original Message-----
> From: Open Study [mailto:open.study@gmail.com]
> Sent: Tuesday, June 26, 2007 8:42 PM
> To: hadoop-user@lucene.apache.org
> Subject: is that possible to make MapFile "mutable" ?
>
> Hi all,
>
> MapFile doesn't support an append mode of creation, so an existing
> mapfile is overwritten whenever a new one with the same name is created.
>
> Is there any way I can append to a MapFile or the like without erasing
> the old content? Or does that not make sense at all?
>
> In my scenario I need to split a mass of messages (tens of millions)
> according to certain rules and put them into different mapfiles, which
> are supposed to get updated when new messages come in. Since I didn't
> find a way to make a mapfile appendable, I have to create new mapfiles,
> so one mapfile can contain as little as one message in the worst case,
> and I will have to merge them later with their proper siblings.
>
> Regards
>
>
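The "merge them with their proper siblings" step from the original question can be sketched as a k-way merge of already-sorted runs, since a MapFile keeps its entries sorted by key. This is only an illustration under assumptions: SiblingMerger and its merge method are hypothetical names, in-memory lists stand in for the small mapfiles, and entries with equal keys are all kept rather than deduplicated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: merge several key-sorted runs into one sorted run.
class SiblingMerger {
    // Tracks the current head entry of one sorted input run.
    private static final class Cursor {
        final Iterator<Map.Entry<String, String>> it;
        Map.Entry<String, String> head;

        Cursor(Iterator<Map.Entry<String, String>> it) {
            this.it = it;
            this.head = it.next(); // caller guarantees the run is non-empty
        }

        boolean advance() {
            if (!it.hasNext()) {
                return false;
            }
            head = it.next();
            return true;
        }
    }

    // Each input run must already be sorted by key, as a MapFile would be.
    static List<Map.Entry<String, String>> merge(
            List<List<Map.Entry<String, String>>> runs) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                Comparator.comparing((Cursor c) -> c.head.getKey()));
        for (List<Map.Entry<String, String>> run : runs) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run.iterator()));
            }
        }
        List<Map.Entry<String, String>> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll(); // run whose head has the smallest key
            out.add(c.head);
            if (c.advance()) {
                heap.add(c);        // re-insert with its new head
            }
        }
        return out;
    }
}
```

The merged output stays in key order, so it could be written out sequentially as a single larger file in place of the many small ones.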

