List: hadoop-user
Subject: Re: Create a block - file map
From: Amith sha <amithsha92@gmail.com>
Date: 2020-01-01 15:56:19
Message-ID: CAKkPGvDxedxGfPc1TY8oONSu64v2QViPPiO5M+ZhDyF2eu3vzA@mail.gmail.com
Enable DEBUG logging for org.apache.hadoop.hdfs.server.blockmanagement on the
NameNode.
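
For example, something like the following in the NameNode's log4j.properties
(a sketch; the exact file location and whether a restart is needed vary by
distribution):

```
# NameNode log4j.properties: DEBUG for the block-management package
log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement=DEBUG
```

The same level can usually be changed at runtime with
`hadoop daemonlog -setlevel <namenode-host>:50070
org.apache.hadoop.hdfs.server.blockmanagement DEBUG`
(50070 being the default NameNode HTTP port in Hadoop 2.x; substitute your
own host and port).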
Thanks & Regards
Amithsha
On Wed, Jan 1, 2020 at 4:55 AM Arpit Agarwal <aagarwal@cloudera.com.invalid>
wrote:
> That is the only way to do it using the client API.
>
> Just curious why you need the mapping.
>
>
> On Tue, Dec 31, 2019, 00:41 Davide Vergari <vergari.davide@gmail.com>
> wrote:
>
>> Hi all,
>> I need to create a block map for all files in a specific directory (and
>> subdir) in HDFS.
>>
>> I'm using the fs.listFiles API, then looping over the
>> RemoteIterator[LocatedFileStatus] it returns; for each LocatedFileStatus I
>> call the getFileBlockLocations API to get all the block IDs of that file.
>> It takes a long time because I have millions of files in the HDFS
>> directory.
>> I also tried using Spark to parallelize the work, but the HDFS API
>> objects are not serializable.
>>
>> Is there a better way? I know there is the "hdfs oiv" command, but I
>> can't access the NameNode directory directly; also, the fsimage file
>> could be outdated, and I can't force safemode to run the saveNamespace
>> command.
>>
>> I'm using Scala 2.11 with Hadoop 2.7.1 (HDP 2.6.3)
>>
>> Thank you
>>
>
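
The Spark approach described above can still work: Hadoop's FileSystem and
FileStatus objects are not serializable, but the file paths are plain
strings. A common pattern (a sketch only, not tested against a cluster; the
directory and output paths are placeholders) is to ship path strings to the
executors and open a fresh FileSystem inside each partition:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession
import scala.collection.mutable.ArrayBuffer

object BlockMap {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("block-map").getOrCreate()
    val sc = spark.sparkContext

    // Driver side: collect only the (serializable) path strings.
    val fs = FileSystem.get(new Configuration())
    val it = fs.listFiles(new Path("/data"), /* recursive = */ true)
    val paths = ArrayBuffer[String]()
    while (it.hasNext) paths += it.next().getPath.toString

    // Executor side: re-create the FileSystem per partition, since
    // FileSystem, Configuration, and FileStatus are not serializable.
    val blockMap = sc.parallelize(paths, numSlices = 64).mapPartitions { part =>
      val pfs = FileSystem.get(new Configuration())
      part.flatMap { p =>
        val status = pfs.getFileStatus(new Path(p))
        pfs.getFileBlockLocations(status, 0, status.getLen)
          .map(loc => (p, loc.getOffset, loc.getHosts.mkString(",")))
      }
    }
    blockMap.saveAsTextFile("/tmp/block-map")
    spark.stop()
  }
}
```

If the driver-side listing itself is the bottleneck, note that the
LocatedFileStatus objects returned by listFiles already carry block
locations via getBlockLocations(), so the separate getFileBlockLocations
call per file may be avoidable entirely.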