
List:       hadoop-dev
Subject:    Re: JobConf.setOutputKeyComparatorClass
From:       "Owen O'Malley" <owen () yahoo-inc ! com>
Date:       2006-06-29 5:31:28
Message-ID: d67097aa82f9c7814513557df7a2c83a () yahoo-inc ! com


On Jun 28, 2006, at 9:06 PM, Arun C Murthy wrote:

> All,
>
>   <background>
>       I have a *map* which does some processing and then a *reduce* 
> which sorts the results.
>       TextInputFormat & TextOutputFormat are the input/output formats 
> respectively.
>
>       However the *sort* I want to perform is as follows:
>       I want to sort output by 'comparing' 'columns' of 'key's in the 
> Comparator and not the entire 'key'.
>
>       E.g. spec: column1, column0 is the sort-spec.
>       aaa ccc ggg
>       bbb aaa hhh
>
>       should result in:
>       bbb aaa hhh
>       aaa ccc ggg
>   </background>
>
>   I can't seem to find an 'elegant' way to do this via the MR 
> framework i.e. I can't seem to be able to set a *policy* (i.e. set the 
> sort-spec) for the WritableComparable via the framework. Is there 
> something I'm missing? In essence I probably need a *configure* 
> callback for the WritableComparable interface too? Is there a better 
> way? Or is this outside the scope of the framework?

There is a way to do it, but it isn't surprising that you missed it. 
When JobConf creates an instance of a class, it checks whether the 
object is Configurable and, if so, passes it the Configuration. So, if 
you write a ConfigurableComparator that extends WritableComparator and 
implements Configurable, its setConf method will be called with the 
job's JobConf. If you do something like:

JobConf conf = new JobConf();
conf.set("my.sort.order", "1,0,2");
conf.setOutputKeyComparatorClass(ConfigurableComparator.class);

you should get the information where it needs to go.
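
For the column-by-column comparison itself, here is a minimal sketch 
in plain Java of what such a comparator might delegate to. ColumnOrder 
and its parse method are hypothetical names, not part of Hadoop, and 
the sketch assumes keys are whitespace-separated text with 0-based 
column indices, as in the example above. The Hadoop wiring is left out 
so the block stands on its own.

```java
import java.util.Comparator;

// Hypothetical helper (not a Hadoop class): orders whitespace-separated
// records by the columns listed in a sort spec such as "1,0,2".
final class ColumnOrder implements Comparator<String> {
    private final int[] columns;

    private ColumnOrder(int[] columns) {
        this.columns = columns;
    }

    // Parses a comma-separated list of 0-based column indices,
    // e.g. the value stored under "my.sort.order".
    static ColumnOrder parse(String spec) {
        String[] parts = spec.trim().split("\\s*,\\s*");
        int[] cols = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            cols[i] = Integer.parseInt(parts[i]);
        }
        return new ColumnOrder(cols);
    }

    @Override
    public int compare(String a, String b) {
        String[] fa = a.trim().split("\\s+");
        String[] fb = b.trim().split("\\s+");
        for (int col : columns) {
            // Assumption: a missing column compares as the empty string.
            String x = col < fa.length ? fa[col] : "";
            String y = col < fb.length ? fb[col] : "";
            int c = x.compareTo(y);
            if (c != 0) {
                return c;
            }
        }
        // All listed columns are equal.
        return 0;
    }
}
```

In a ConfigurableComparator, setConf could build this with something 
like ColumnOrder.parse(conf.get("my.sort.order")), and compare could 
hand the two keys' text to it. With the spec "1,0", the row 
"bbb aaa hhh" sorts before "aaa ccc ggg", matching the example in the 
question.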

-- Owen

