
List:       hadoop-user
Subject:    Re: HDFS du Utility Inconsistencies?
From:       David M <mcginnisda () outlook ! com>
Date:       2019-11-08 18:03:56
Message-ID: CY4PR1201MB01185B5B546BF0AAFA776696C37B0 () CY4PR1201MB0118 ! namprd12 ! prod ! outlook ! com

We use snapshots in the cluster, but I've not seen any snapshot folders
underneath the folder in question. I'd need to verify with the application
team if snapshots for this folder are available anywhere.


________________________________
From: Arpit Agarwal <aagarwal@cloudera.com>
Sent: Friday, November 8, 2019 11:41:31 AM
To: David M <mcginnisda@outlook.com>
Cc: user@hadoop.apache.org <user@hadoop.apache.org>
Subject: Re: HDFS du Utility Inconsistencies?

Got any snapshots?

On Fri, Nov 8, 2019, 09:38 David M <mcginnisda@outlook.com> wrote:

All,



I'm working on a cluster that is running Hadoop 2.7.3. I have one folder
in particular where the command hdfs dfs -du is giving me strange results.
If I query the folder and ask for a summary, it tells me 10 GB. If I don't
ask for a summary, all of the folders underneath don't even add up to 1 GB,
much less 10 GB.



I've verified this is true over time and is true using the hdfs user or
any other user. We are on an HDP cluster, so we are using Ranger for HDFS
security, and Kerberos for authentication. We see similar results in -count,
where the size and counts are both different. We have not seen this behavior
in any other folders.



See below for a sample output we are seeing. I've replaced the full path
with a fake path to protect the data we have on the cluster. Does anyone
know anything that would cause this behavior? Thanks!



$ hdfs dfs -du -h /randomFolder
119.9 M  /randomFolder/bug
1.0 M    /randomFolder/commitment
86.8 K   /randomFolder/customfield
31.3 M   /randomFolder/epic
10.3 M   /randomFolder/feature
4.0 M    /randomFolder/insprintbug
372.9 K  /randomFolder/project
15.1 K   /randomFolder/projectstatus
330.9 M  /randomFolder/story
256.3 M  /randomFolder/subtask
74.7 K   /randomFolder/subtemplate
89.6 M   /randomFolder/task
7.4 M    /randomFolder/techdebt
117.7 K  /randomFolder/template
617.9 K  /randomFolder/tempomember
8.2 K    /randomFolder/tempoteam
1.4 M    /randomFolder/tempoworklog



$ hdfs dfs -du -h -s /randomFolder
10.6 G  /randomFolder
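
To put a number on the gap, the per-folder sizes can be added up and compared
with the summary figure. The following is only a sketch of that arithmetic: it
copies the two outputs shown in this message and assumes the -h suffixes are
binary units (1 K = 1024 bytes). The names UNITS, LISTING, to_bytes and
total_bytes are made up for this illustration and are not part of any Hadoop
tooling.

```python
# Assumption: the K/M/G suffixes printed by -h are binary multiples.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Per-folder listing copied from the non-summary output in this message.
LISTING = """\
119.9 M  /randomFolder/bug
1.0 M    /randomFolder/commitment
86.8 K   /randomFolder/customfield
31.3 M   /randomFolder/epic
10.3 M   /randomFolder/feature
4.0 M    /randomFolder/insprintbug
372.9 K  /randomFolder/project
15.1 K   /randomFolder/projectstatus
330.9 M  /randomFolder/story
256.3 M  /randomFolder/subtask
74.7 K   /randomFolder/subtemplate
89.6 M   /randomFolder/task
7.4 M    /randomFolder/techdebt
117.7 K  /randomFolder/template
617.9 K  /randomFolder/tempomember
8.2 K    /randomFolder/tempoteam
1.4 M    /randomFolder/tempoworklog
"""


def to_bytes(size, unit):
    """Convert a human-readable size and unit suffix to bytes."""
    return float(size) * UNITS[unit]


def total_bytes(listing):
    """Sum the sizes of every non-empty 'size unit path' line."""
    total = 0.0
    for line in listing.splitlines():
        if not line.strip():
            continue
        size, unit, _path = line.split(None, 2)
        total += to_bytes(size, unit)
    return total


children = total_bytes(LISTING)
summary = to_bytes("10.6", "G")  # figure reported by -du -h -s

print(f"sum of subfolders: {children / UNITS['M']:.1f} M")
print(f"summary (-s):      {summary / UNITS['G']:.1f} G")
print(f"summary / sum:     {summary / children:.1f}x")
```

With these figures the subfolders add up to roughly 853 M, so the summary is
more than twelve times larger than the sum of what is listed underneath it.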



David McGinnis





