List:       hadoop-user
Subject:    Re: Create a block - file map
From:       Amith sha <amithsha92 () gmail ! com>
Date:       2020-01-01 15:56:19
Message-ID: CAKkPGvDxedxGfPc1TY8oONSu64v2QViPPiO5M+ZhDyF2eu3vzA () mail ! gmail ! com

Enable DEBUG logging for org.apache.hadoop.hdfs.server.blockmanagement on the
NameNode.
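
A minimal sketch of what that could look like, assuming the NameNode uses the log4j 1.x properties file that ships with Hadoop 2.7 (its location depends on the distribution, and the change only takes effect after a NameNode restart):

```properties
# Assumption: this goes in the log4j.properties file read by the NameNode.
# Raises the log level for the whole block management package to DEBUG.
log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement=DEBUG
```

It may also be possible to change the level at runtime, without a restart, with `hadoop daemonlog -setlevel <namenode-host:http-port> org.apache.hadoop.hdfs.server.blockmanagement DEBUG`; the host and port are placeholders. DEBUG output on a busy NameNode can be very verbose, so it is probably worth reverting the level once the data has been collected.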

Thanks & Regards
Amithsha


On Wed, Jan 1, 2020 at 4:55 AM Arpit Agarwal <aagarwal@cloudera.com.invalid>
wrote:

> That is the only way to do it using the client API.
>
> Just curious why you need the mapping.
>
>
> On Tue, Dec 31, 2019, 00:41 Davide Vergari <vergari.davide@gmail.com>
> wrote:
>
>> Hi all,
>> I need to create a block map for all files in a specific directory (and
>> its subdirectories) in HDFS.
>>
>> I'm using the fs.listFiles API, then looping over the
>> RemoteIterator[LocatedFileStatus] it returns, and for each
>> LocatedFileStatus I call the getFileBlockLocations API to get all the block
>> IDs of that file. This takes a long time because I have millions of files in
>> the HDFS directory.
>> I also tried to use Spark to parallelize the execution, but the HDFS API
>> objects are not serializable.
>>
>> Is there a better way? I know there is the "hdfs oiv" command, but I can't
>> directly access the NameNode directory. Also, the fsimage file could be
>> outdated, and I can't put the cluster into safe mode to run the
>> saveNamespace command.
>>
>> I'm using Scala 2.11 with Hadoop 2.7.1 (HDP 2.6.3)
>>
>> Thank you
>>
>
