List:       hadoop-user
Subject:    YARN and memory management with CapacityScheduler and with cgroups
From:       "Marco B." <marcusbu () gmail ! com>
Date:       2017-02-10 12:45:47
Message-ID: CAMQCvxpgAfYumdrBuhc7L8yPzFh9vrRW9r9FyvFzTXRLSuhXow () mail ! gmail ! com

Hello everyone,

I am trying to figure out how to share a Hadoop cluster among users; it is
primarily going to be used with Spark on YARN. I have seen that YARN 2.7
provides a scheduler called CapacityScheduler, which should help with
multi-tenancy. What is not fully clear to me is how resource management is
handled by the NodeManager. I have read a lot of documents, as well as the
book Hadoop: The Definitive Guide (4th edition), and it is still not clear
to me how I can achieve a sort of "soft" (or even hard, whenever possible)
isolation between containers.
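
For context, the kind of queue layout I have in mind is roughly the
following (a minimal capacity-scheduler.xml sketch; the queue names "abc"
and "xyz" and the percentages are just placeholders I made up):

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>abc,xyz</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.abc.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.xyz.capacity</name>
    <value>40</value>
  </property>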

To quote the book (p. 106):

> In normal operation, the Capacity Scheduler does not preempt containers
> by forcibly killing them, so if a queue is under capacity due to lack of
> demand, and then demand increases, the queue will only return to capacity
> as resources are released from other queues as containers complete. It is
> possible to mitigate this by configuring queues with a maximum capacity so
> that they don't eat into other queues' capacities too much. This is at the
> cost of queue elasticity, of course, so a reasonable trade-off should be
> found by trial and error.
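
If I understand the book correctly, capping elasticity would mean something
like this (again just a sketch, using my placeholder queue and a value
picked arbitrarily):

  <property>
    <name>yarn.scheduler.capacity.root.abc.maximum-capacity</name>
    <value>80</value>
  </property>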

If I have an arbitrary number of queues, what happens if I set only the
following values (and not the maximum-capacity property that controls
elasticity)?
yarn.scheduler.capacity.abc.maximum-allocation-mb = 2048
yarn.nodemanager.resource.memory-mb = 1048
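
To be explicit about where these would live, I mean roughly the following
(keeping the exact property names and values from above; capacity-scheduler.xml
first, then yarn-site.xml):

  <!-- capacity-scheduler.xml -->
  <property>
    <name>yarn.scheduler.capacity.abc.maximum-allocation-mb</name>
    <value>2048</value>
  </property>

  <!-- yarn-site.xml -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1048</value>
  </property>

(I am not sure whether the per-queue form of maximum-allocation-mb is
honored in 2.7, or whether only the cluster-wide
yarn.scheduler.capacity.maximum-allocation-mb applies, so please correct me
if that path is wrong.)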

Considering that Spark may ask for an arbitrary amount of RAM per executor
(e.g., 768 MB), and that each task may take additional memory at runtime
(besides the overhead, perhaps memory spikes?), can it happen that one
container takes much more memory than specified in the settings above? Would
such a container prevent resources from being allocated to other containers
in other queues (or in the same queue)? As far as I know, YARN will
eventually kill a container that uses more RAM than it was allocated for too
long. Can I set a grace period, something like "after 15 seconds of
over-use, kill it"? And finally, if I set a hard limit like the one above,
is YARN still going to provide elasticity?
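
For what it's worth, the enforcement knobs I have found so far are the
physical/virtual memory checks in yarn-site.xml, e.g. (the values are the
defaults as far as I can tell, so treat them as an assumption on my part):

  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <!-- how often the NodeManager samples container memory usage -->
    <name>yarn.nodemanager.container-monitor.interval-ms</name>
    <value>3000</value>
  </property>

But I haven't found anything like a grace period ("kill only after N seconds
of over-use"), only the monitoring interval above, hence my question.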

I was even considering using cgroups to enforce such hard limits, but I have
found out that memory enforcement via cgroups won't be included until
version 2.9 (https://issues.apache.org/jira/browse/YARN-1856), although from
my understanding that JIRA issue is primarily focused on cgroups-based
monitoring rather than enforcement; I may be wrong about this. As far as I
know, in v. 2.7.1 cgroups only enforce vcore limits, and we would like the
same for memory, so that users don't use more than they are allowed.
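
In case it helps, this is the kind of cgroups setup I was experimenting with
for CPU in yarn-site.xml (a sketch based on my reading of the 2.7 docs; the
hierarchy path and the group value are assumptions specific to my
environment):

  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
    <value>/hadoop-yarn</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
    <value>false</value>
  </property>
  <property>
    <!-- hard-cap container CPU usage to its vcore share -->
    <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>hadoop</value>
  </property>

As far as I can tell this only caps CPU, which is why I am asking whether
the equivalent for memory is coming with YARN-1856 or whether there is
already something I can use in 2.7.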

Could you please help me understand how it works?

Thanks in advance.

Kind regards,
Marco
