
List:       hadoop-user
Subject:    Re: PendingDeletionBlocks immediately after Namenode failover
From:       Ravi Prakash <raviprak () apache ! org>
Date:       2017-11-13 17:26:30
Message-ID: CAMs9kVj+VSX6znXqnA0dAAp+cK0Y5BJRuFUX2DZWoF_b1GjqMQ () mail ! gmail ! com

Hi Michael!

Thank you for the report. I'm sorry I don't have any advice beyond the
generic: please try a newer version of Hadoop (say, Hadoop 2.8.2). You seem
to already know that the BlockManager is the place to look.

If you find that this is a legitimate issue affecting Apache Hadoop that
still hasn't been fixed in trunk ( https://github.com/apache/hadoop ),
could you please create a new JIRA for it here:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=116&projectKey=HDFS
?
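
In the meantime, one way to narrow things down is to poll the NameNode's
/jmx servlet around a failover and watch PendingDeletionBlocks alongside
NumStaleStorages. A rough sketch below - the bean/metric names are what I'd
expect on a 2.6-era NameNode (PendingDeletionBlocks on the FSNamesystem
bean, NumStaleStorages on FSNamesystemState), and the host/port is a
placeholder, so please verify both against your cluster:

```python
import json
from urllib.request import urlopen

def extract_metrics(jmx_text):
    """Pull the block-related gauges out of a NameNode /jmx JSON dump.

    We scan every bean rather than filtering by bean name, since the
    metrics of interest are spread across FSNamesystem and
    FSNamesystemState.
    """
    wanted = ("PendingDeletionBlocks", "PostponedMisreplicatedBlocks",
              "NumStaleStorages")
    metrics = {}
    for bean in json.loads(jmx_text)["beans"]:
        for key in wanted:
            if key in bean:
                metrics[key] = bean[key]
    return metrics

def fetch_metrics(namenode="http://namenode.example.com:50070"):
    # namenode.example.com is a placeholder -- point this at your
    # active NameNode's HTTP address.
    return extract_metrics(urlopen(namenode + "/jmx").read().decode("utf-8"))
```

Polling that in a loop (say, once a second) across a failover should show
whether the PendingDeletionBlocks jump follows the block reports arriving
or precedes them.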

Thanks
Ravi

On Wed, Nov 8, 2017 at 7:50 PM, Michael Parkin <mparkin@siftscience.com>
wrote:

> Hello,
>
> We're seeing some unusual behavior in our two HDFS 2.6.0
> (CDH 5.11.1) clusters and were wondering if you could help. When we fail
> over our Namenodes we observe a large number of PendingDeletionBlocks -
> i.e., the metric is zero before failover and several thousand after.
>
> This seems different from PostponedMisreplicatedBlocks [1] (which is
> expected before all the datanodes have sent their block reports to the
> new active namenode and NumStaleStorages reaches zero) - we see that
> metric drop to zero once all the block reports have been received. What
> we're seeing is that PendingDeletionBlocks increases immediately after
> failover, while NumStaleStorages is still roughly equal to the number of
> datanodes in the cluster.
>
> The extra space used is a problem, as we have to increase our cluster
> size to accommodate these blocks until the Namenodes are failed over.
> We've checked the debug logs, the metasave report, and other JMX metrics,
> and everything appears fine before we fail over - apart from the amount
> of DFS used growing and then decreasing.
>
> We can't find anything obviously wrong with the HDFS configuration, HA
> setup, etc. Any help on where to look/debug next would be appreciated.
>
> Thanks,
>
> Michael.
>
> [1] https://github.com/cloudera/hadoop-common/blob/cdh5-2.6.0_5.11.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3047
>
> --
>
>



