List:       hadoop-user
Subject:    Re: How to use MapFile?
From:       Doug Cutting <cutting () apache ! org>
Date:       2006-11-13 18:59:26
Message-ID: 4558C08E.2090308 () apache ! org

A good way to update a very large MapFile-based dataset is to:

1. Add new entries to SequenceFiles in a dataset.add directory.
2. Run a MapReduce job specifying input directories of both dataset and 
dataset.add.  If you need to update existing entries, specify a reduce 
function that merges existing entries with new entries.  Specify 
MapFileOutputFormat.  Specify dataset.new as the output directory.
3. Rename dataset.new to dataset (after moving the old dataset aside).
4. Use MapFileOutputFormat.getReaders() and 
MapFileOutputFormat.getEntry() to randomly access entries in the dataset 
with a single read (the indexes are read into memory).  Or, for batch 
operations, use MapReduce directly on the dataset (as an input 
directory) to generate derivative datasets.
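A toy model of the workflow above, written in plain Python, may help show why the update is a merge-and-rewrite rather than an append. It is a sketch under stated assumptions, not Hadoop code. In-memory lists of (key, value) strings stand in for SequenceFiles and MapFile parts. A crc32-based `partition` stands in for the job's hash partitioner. The names `merge_update`, `build_index`, and `get_entry` are hypothetical. Each output part is kept sorted by key. A lookup first picks the part the key was partitioned to. It then binary-searches a sparse in-memory index of every Nth key and scans a short run of entries. This is roughly what the getReaders()/getEntry() route does against real MapFile parts.

```python
import bisect
import zlib
from typing import Callable, Dict, List, Optional, Tuple

Entry = Tuple[str, str]              # (key, value); real MapFiles hold Writable pairs
Index = Tuple[List[str], List[int]]  # sampled keys and their positions in a part


def partition(key: str, num_parts: int) -> int:
    # Hypothetical stand-in for a hash partitioner; crc32 keeps it deterministic.
    return zlib.crc32(key.encode("utf-8")) % num_parts


def merge_update(base: List[Entry], additions: List[Entry],
                 merge: Callable[[str, str], str],
                 num_parts: int) -> List[List[Entry]]:
    """Steps 1-3: read the base and new entries, merge values sharing a key,
    and emit each partition sorted by key (one output part per reducer)."""
    merged: Dict[str, str] = {}
    for key, value in base + additions:
        # Base entries are seen first here, so merge(existing, new) is applied.
        merged[key] = merge(merged[key], value) if key in merged else value
    parts: List[List[Entry]] = [[] for _ in range(num_parts)]
    for key in sorted(merged):  # iterating in key order keeps every part sorted
        parts[partition(key, num_parts)].append((key, merged[key]))
    return parts


def build_index(part: List[Entry], interval: int) -> Index:
    """Sparse index held in memory: every interval-th key and its position."""
    offsets = list(range(0, len(part), interval))
    return [part[i][0] for i in offsets], offsets


def get_entry(parts: List[List[Entry]], indexes: List[Index],
              key: str) -> Optional[str]:
    """Step 4: choose the part by partition, binary-search its index,
    then scan at most one index interval of that part."""
    p = partition(key, len(parts))
    part, (keys, offsets) = parts[p], indexes[p]
    slot = bisect.bisect_right(keys, key) - 1
    if slot < 0:
        return None  # key sorts before the part's first key, or part is empty
    start = offsets[slot]
    end = offsets[slot + 1] if slot + 1 < len(offsets) else len(part)
    for k, v in part[start:end]:
        if k == key:
            return v
        if k > key:
            break  # passed the key's sorted position without finding it
    return None


if __name__ == "__main__":
    base = [("apple", "1"), ("banana", "2"), ("cherry", "3")]
    new = [("banana", "5"), ("date", "4")]
    parts = merge_update(base, new, lambda old, new_value: old + "," + new_value, 2)
    indexes = [build_index(part, 2) for part in parts]
    print(get_entry(parts, indexes, "banana"))  # 2,5
```

The `merge` argument plays the role of step 2's reduce function. Here it joins the old and new values, but it could just as well keep only the newest. Unlike this model, a real reduce call is not guaranteed to receive the existing value before the new one. An actual merge function therefore needs some other way to tell them apart.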

Nutch, for example, updates its crawl DB this way.

Doug

张茂森 wrote:
> Hi all,
> 
> Now I want to do some operations like 'update' or 'insert', which can
> be described like this:
> 
> 1. I have a base dataset
> 
> 2. Every day I will get more data from other places, and then I want to
> update or insert this new data into my base dataset. 
> 
> 3. After reading the API docs, I think MapFile is a good way to solve this
> problem. As far as I know, I only need to append my new data at the end of
> the base dataset and update the MapFile's index file. Do I understand correctly?
> 
> 4.  If I am right, I want to know how to do these operations using MapFile. 
> 
> First, I could only find MapFileOutputFormat and couldn't find a
> MapFileInputFormat, so how do I read a MapFile?
> 
> Second, how do I update the index and append the data? Do you have any
> experience or samples?
> 
> Any suggestions would be appreciated.
> 
> Thank you!
> 
> 
