List: mesos-user
Subject: Re: Mesos slave GC clarification
From: Vinod Kone <vinodkone () gmail ! com>
Date: 2013-12-27 19:01:49
Message-ID: CAAkWvAyZ7NfLKhCy+L16RwcShgqNmk6T2Ddgfm=ycFvhm0RaHQ () mail ! gmail ! com
I'm still not sure what exactly the issue is here, but we have had a couple of
GC-related fixes included in 0.15.0-rc5. Are you willing to try that out?
On Thu, Dec 26, 2013 at 10:56 AM, Thomas Petr <tpetr@hubspot.com> wrote:
> Hi,
>
> We're running Mesos 0.14.0-rc4 on CentOS from the mesosphere repository.
> Last week we had an issue where the mesos-slave process died due to running
> out of disk space. [1]
>
> The mesos-slave usage docs mention the "[GC] delay may be shorter
> depending on the available disk usage." Does anyone have any insight into
> how the GC logic works? Is there a configurable threshold percentage or
> amount that will force it to clean up more often?
>
> If the mesos-slave process is going to die due to lack of disk space,
> would it make sense for it to attempt one last GC run before giving up?
>
> Thanks,
> Tom
>
>
> [1]
> Could not create logging file: No space left on device
> COULD NOT CREATE A LOGGINGFILE 20131221-120618.20562!F1221 12:06:18.978813
> 20567 paths.hpp:333] CHECK_SOME(mkdir): Failed to create executor directory
> '/usr/share/hubspot/mesos/slaves/201311111611-3792629514-5050-11268-18/frameworks/Singularity11/executors/singularity-ContactsHadoopDynamicListSegJobs-contacts-wal-dynamic-list-seg-refresher-1387627577839-1-littleslash-us_east_1e/runs/457a8df0-baa7-4d22-a5ac-ba5935ea6032'No
> space left on device
> *** Check failure stack trace: ***
> I1221 12:06:19.008946 20564 cgroups_isolator.cpp:1275] Successfully
> destroyed cgroup
> mesos/framework_Singularity11_executor_singularity-ContactsTasks-parallel-machines:6988:list-intersection-count:1387565552709-1387627447707-1-littleslash-us_east_1e_tag_fc028903-d303-468d-902a-dade8c22e206
> @ 0x7f2c806bcb5d google::LogMessage::Fail()
> @ 0x7f2c806c0b77 google::LogMessage::SendToLog()
> @ 0x7f2c806be9f9 google::LogMessage::Flush()
> @ 0x7f2c806becfd google::LogMessageFatal::~LogMessageFatal()
> @ 0x40f6cf _CheckSome::~_CheckSome()
> @ 0x7f2c804492e3
> mesos::internal::slave::paths::createExecutorDirectory()
> @ 0x7f2c80418a6d
> mesos::internal::slave::Framework::launchExecutor()
> @ 0x7f2c80419dd3 mesos::internal::slave::Slave::_runTask()
> @ 0x7f2c8042d5d1 std::tr1::_Function_handler<>::_M_invoke()
> @ 0x7f2c805d3ae8 process::ProcessManager::resume()
> @ 0x7f2c805d3e8c process::schedule()
> @ 0x7f2c7fe41851 start_thread
> @ 0x7f2c7e78794d clone
>
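
For illustration only: the "[GC] delay may be shorter depending on the available disk usage" wording quoted above can be read as the effective delay shrinking as the disk fills up. Below is a minimal sketch of that idea, assuming a linear scale-down and a reserved headroom of 10%. The function name, the headroom value, and the formula itself are assumptions made for this example; they are not taken from this thread and should be checked against the Mesos source for the version in use.

```python
def max_allowed_age(gc_delay_secs: float, disk_usage: float,
                    headroom: float = 0.1) -> float:
    """Return how old (in seconds) a sandbox may get before it is collected.

    Hypothetical model: the configured delay is scaled down linearly as
    disk usage rises, and drops to zero once usage eats into the headroom.
    disk_usage and headroom are fractions in [0, 1].
    """
    if not 0.0 <= disk_usage <= 1.0:
        raise ValueError("disk_usage must be a fraction between 0 and 1")
    if not 0.0 <= headroom <= 1.0:
        raise ValueError("headroom must be a fraction between 0 and 1")
    # Fraction of the disk still free after reserving the headroom.
    scale = max(0.0, 1.0 - headroom - disk_usage)
    return gc_delay_secs * scale


if __name__ == "__main__":
    one_week = 7 * 24 * 3600  # assumed configured delay, in seconds
    for usage in (0.0, 0.5, 0.9, 1.0):
        hours = max_allowed_age(one_week, usage) / 3600
        print(f"disk usage {usage:.0%}: directories older than {hours:.1f} h are eligible")
```

Under this model a nearly full disk makes every scheduled directory eligible for removal immediately, but nothing is freed if the space is held by directories that have not yet been scheduled for GC, which may be relevant to the crash in [1].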