List:       hadoop-user
Subject:    Re: LinuxContainerExecutor mkdir failures causing NodeManagers to become unhealthy
From:       Jonathan Bender <jonbender () stripe ! com ! INVALID>
Date:       2018-09-18 1:18:41
Message-ID: CA+WSQsSHLVVK9tR5chTiyLu3MMPhQGz97=xN+HZQGx5OfmVcmQ () mail ! gmail ! com

Thanks for the responses all!

@Shane - that's great, we planned to move to 3.1.x soon anyway, all the
more reason to do that.

@Eric - I opened a JIRA here with my findings:
https://issues.apache.org/jira/browse/YARN-8786

On Mon, Sep 17, 2018 at 12:23 PM, Shane Kumpf <shane.kumpf.apache@gmail.com>
wrote:

> Hey Jon,
>
> YARN-8751 takes care of the issue that marks the NM unhealthy under these
> conditions. If you can open a JIRA with details on the swallowed error,
> that would be appreciated. As noted, 3.1.1 has a number of fixes to the
> YARN containerization features, so it would be great if you can see if the
> issue still occurs with that release.
>
> Thanks,
> -Shane
>
> On Mon, Sep 17, 2018 at 1:05 PM Jeff Hubbs <jhubbslist@att.net> wrote:
>
>> I would also just suggest moving up to 3.1.1 and trying again. Barring
>> that, maybe you can take the error message at its word. My experience with
>> running Hadoop 3.x jobs is a little limited, but I know that jobs can paint
>> a lot of data into /tmp/hadoop-yarn, and if your nodes can't absorb a lot
>> of expansion in that directory, things will error out, albeit softly.
>> Noting the way the terasort example behaves in that regard, I set up my
>> worker nodes so that /tmp/hadoop-yarn is a mount point for its own disk
>> volume, whose size I can preset and on which I can optionally enable
>> transparent compression via btrfs. Most of the time I would expect I could
>> give that volume some token small size, but in trying to make a 1/5-scale
>> (i.e., 200GB) terasort run, 128GiB with compression enabled across five
>> workers wasn't enough. I could manage 1/10th-scale, but at 1/5 it would
>> fill up one node's /tmp/hadoop-yarn, then the next, then the next, and so
>> on. That makes me think terasort tries to write the whole thing out to the
>> local (non-HDFS) file system before making an output file in HDFS.
>>
>> On 9/17/18 1:55 PM, Eric Badger wrote:
>>
>> Hi Jonathan,
>>
>> Have you opened up a YARN JIRA with your findings? If not, that would be
>> the next step in debugging the issue and coding up a fix. This certainly
>> sounds like a bug and something that we should get to the bottom of.
>>
>> As far as NodeManagers becoming unhealthy, a config could be added to
>> prevent this. But if you're only seeing 1 failure out of millions of
>> tasks, this seems like it would mask more problems than it fixes. 1
>> container failing is bad, but a node going bad and failing every container
>> that runs on it forever until it is shut down is much, much worse. However,
>> if you think that you have a use case that could benefit from the config
>> being optional, that is something we could also look into. That would be a
>> separate YARN JIRA as well.
>>
>> Thanks,
>>
>> Eric
>>
>> On Mon, Sep 17, 2018 at 12:37 PM, Jonathan Bender <
>> jonbender@stripe.com.invalid> wrote:
>>
>>> Hello,
>>>
>>> We recently started using CGroups with the LinuxContainerExecutor, running
>>> Apache Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a
>>> YARN container will fail with a message like the following:
>>> WARN privileged.PrivilegedOperationExecutor: Shell execution returned
>>> exit code: 35. Privileged Execution Operation Stderr:
>>> Could not create container dirsCould not create local files and
>>> directories
>>>
>>> Looking at the container executor source, it's traceable to errors here:
>>> https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604
>>>
>>> And ultimately to
>>> https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672
>>>
>>> The root failure seems to be in the underlying mkdir call, but that exit
>>> code / errno is swallowed, so we don't have more details. We tend to see
>>> this when many containers for the same application start at the same time
>>> on a host, and suspect it may be related to a race condition around the
>>> directories those containers share.
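>>>
>>> For illustration, here is a rough sketch of the kind of handling I have in
>>> mind (mkdir_shared is a hypothetical helper, not the actual
>>> container-executor code): tolerate a concurrent creator of a shared
>>> directory and preserve errno so the real cause gets logged.
>>>
>>> #include <errno.h>
>>> #include <stdio.h>
>>> #include <string.h>
>>> #include <sys/stat.h>
>>> #include <sys/types.h>
>>>
>>> /* Hypothetical helper, not the actual container-executor code:
>>>  * create one directory level, treating "already exists" as success
>>>  * and keeping errno around for a useful error message. */
>>> static int mkdir_shared(const char *path, mode_t perm) {
>>>   if (mkdir(path, perm) == 0) {
>>>     return 0;
>>>   }
>>>   int saved = errno;  /* save the real cause before any further calls */
>>>   if (saved == EEXIST) {
>>>     struct stat sb;
>>>     /* Another container for the same application may have just created
>>>      * this shared directory; an existing directory is fine. */
>>>     if (stat(path, &sb) == 0 && S_ISDIR(sb.st_mode)) {
>>>       return 0;
>>>     }
>>>   }
>>>   fprintf(stderr, "Could not create %s: %s (errno=%d)\n",
>>>           path, strerror(saved), saved);
>>>   return -1;
>>> }
>>>
>>> Even just logging strerror(errno) at the failure point would make these
>>> one-off failures much easier to diagnose.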
>>>
>>> Has anyone seen similar failures when using the LinuxContainerExecutor?
>>>
>>> This issue is compounded because LinuxContainerExecutor renders the node
>>> unhealthy in these scenarios:
>>> https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L566
>>>
>>> Under some circumstances this seems appropriate, but since this is a
>>> transient failure (none of these machines were at capacity for disks,
>>> inodes, etc.) we shouldn't take down the NodeManager. The behavior to add
>>> this blacklisting came as part of
>>> https://issues.apache.org/jira/browse/YARN-6302, which seems perfectly
>>> valid, but perhaps we should make this configurable so certain users can
>>> opt out?
>>>
>>> Cheers,
>>> Jon
>>>
>>
>>
>>
