
List:       hadoop-dev
Subject:    Re: where distributed cache start working
From:       Hemanth Yamijala <yhemanth () gmail ! com>
Date:       2010-08-27 14:16:46
Message-ID: AANLkTim=30tomG7JKi078CDMfVSavV-hFeHFtOqxS2Dh () mail ! gmail ! com

Hi,
> Thanks Arun. Changing the mtime is a good idea. However, given a file (path
> A/B/C/D/file) distributed to all the nodes, if I just change the mtime of the
> file to an earlier timestamp, it will not be replaced next time. Should I also
> change the mtime of all the directories along the path (A, B, C and D)? Whose
> timestamp is used by DistributedCache?

It is the timestamp of the file on DFS. So if you modify the file's
timestamp on DFS, it should be re-distributed to all the nodes.
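
To illustrate the idea, here is a rough, self-contained sketch. It is a toy
model and not Hadoop's DistributedCache code: the class MtimeCacheSketch and
its methods are hypothetical, a local temp file stands in for the file on DFS,
and it assumes that any difference in the file's own mtime marks the cached
copy as stale (the exact comparison Hadoop performs is not stated in this
thread). On HDFS itself, the mtime can be set with
FileSystem.setTimes(path, mtime, atime).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: NOT Hadoop's DistributedCache implementation. */
public class MtimeCacheSketch {
    // Source mtime (millis) recorded when each file was last localized.
    private final Map<Path, Long> localizedMtime = new HashMap<>();
    private int copies = 0;

    /** Returns true if the file was (re)localized on this call. */
    public boolean localize(Path source) throws IOException {
        long mtime = Files.getLastModifiedTime(source).toMillis();
        Long seen = localizedMtime.get(source);
        // Assumption: only the file's own mtime is consulted, and any
        // difference from the recorded value marks the local copy stale.
        if (seen != null && seen.longValue() == mtime) {
            return false; // cached copy is considered current
        }
        localizedMtime.put(source, mtime);
        copies++; // stands in for the actual download to the node
        return true;
    }

    public int copies() {
        return copies;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("A");
        Path file = Files.createTempFile(dir, "file", ".txt");
        MtimeCacheSketch cache = new MtimeCacheSketch();

        System.out.println(cache.localize(file)); // first job: copied
        System.out.println(cache.localize(file)); // unchanged mtime: reused

        // Touching only the parent directory does not affect the file's mtime.
        Files.setLastModifiedTime(dir, FileTime.fromMillis(1_000_000L));
        System.out.println(cache.localize(file)); // still reused

        // Changing the file's own mtime triggers a fresh copy.
        Files.setLastModifiedTime(file, FileTime.fromMillis(2_000_000L));
        System.out.println(cache.localize(file)); // copied again
    }
}
```

In this model the directories A, B, C and D never enter the check, which
matches the point above that only the file's timestamp matters.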

Thanks
Hemanth
>
> Thanks.
> -Gang
>
>
>
>
> ----- Original Message ----
> From: Arun C Murthy <acm@yahoo-inc.com>
> To: mapreduce-user@hadoop.apache.org
> Sent: 2010/8/22 (Sun) 9:38:02 PM
> Subject: Re: where distributed cache start working
>
> Moving to mapreduce-user@, bcc common-dev@. Please use the project specific
> lists.
>
> DistributedCache.purgeCache isn't a public api. You shouldn't be calling it
> from the task.
>
> A simple way of doing what you want is to change the mtime of the cache files
> on HDFS.
>
> Arun
>
> On Aug 22, 2010, at 9:48 AM, Gang Luo wrote:
>
>> Thanks Jeff.
>>
>> However, are you sure TaskRunner.run() is also used in the new API? I used
>> btrace to trace the function calls but didn't find this function being called
>> anywhere.
>>
>> One more question about the distributed cache. After I call
>> DistributedCache.purgeCache, I think the local cached files should be deleted
>> or invalidated. However, when I run the same job multiple times with the purge
>> operation at the end, I find the local files are never deleted and their
>> modification time is that of the first job run. How can I ask my job to
>> re-distribute the cache anyway?
>>
>> Thanks,
>> -Gang
>>
>>
>>
>>
>> ----- Original Message ----
>> From: Jeff Zhang <zjffdu@gmail.com>
>> To: common-dev@hadoop.apache.org
>> Sent: 2010/8/20 (Fri) 11:22:49 AM
>> Subject: Re: where distributed cache start working
>>
>> Hi Gang,
>>
>> In the TaskRunner's run() method, Hadoop downloads the cache files
>> that you set on the client side to local disk; the forked child JVM
>> can then use these cache files locally.
>>
>>
>>
>> On Fri, Aug 20, 2010 at 8:08 AM, Gang Luo <lgpublic@yahoo.com.cn> wrote:
>>> Hi all,
>>> I went through the code, but couldn't find the place where the distributed
>>> cache starts working. I want to know, between DistributedCache.addCacheFile
>>> at the master node and DistributedCache.getLocalCacheFiles at the client
>>> side, when and where the files get distributed.
>>>
>>>
>>> Thanks,
>>> -Gang
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --Best Regards
>>
>> Jeff Zhang
>>
>>
>>
>>
>
>
>
>
>
