List:       fedora-devel-list
Subject:    Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM
From:       Chris Murphy <lists () colorremedies ! com>
Date:       2020-01-07 21:10:38
Message-ID: CAJCQCtTYc-1XqXqSqRmQOmAnaPi--GbReuuC8EJn16R0tQwKLA () mail ! gmail ! com

On Tue, Jan 7, 2020 at 1:48 PM Mark Otaris <mark@net-c.com> wrote:
> 
> I intended to demonstrate that cgroups can be used to cause the kernel OOM
> killer to react appropriately and fast enough, implying that replacing the
> OOM killer is not necessary and that replacing it by a userspace OOM killer
> that does not account for cgroups can be undesirable. The exact same controls
> set with my example commands, and others, can be set with scopes as well,
> so this should be applicable.
> 
> > https://lore.kernel.org/linux-fsdevel/20200104090955.GF23195@dread.disaster.area/T/#m8b25fd42501d780d8053fc7aa9f4e3a28a19c49f
> > 
> 
> Okay, interesting. But that's a statement from just one person, and it has to
> be interpreted in the context of what it is confirming; that is, that the OOM
> killer is "mainly concerned about kernel survival in low memory situations",
> which is weaker than your claim that "their concern with kernel oom-killer is
> strictly with keeping the kernel functioning". I don't know if the OOM killer's
> main purpose is to keep the kernel alive (Michal Hocko appears to think so,
> maybe others disagree), but it is in any case not an abuse of the OOM killer to
> also use it to keep userspace responsive,

The OOM killer doesn't keep user space responsive per se; in your
example that's done by cgroups restricting resources. And that's neat,
and necessary to keep making forward progress on. But we don't have
that for unprivileged processes right now, unless the user knows the
secret decoder ring command to run every time they start something in
Terminal, and also has some idea of what resources the task needs to
succeed rather than just get clobbered anyway.
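
To make the "secret decoder ring" concrete, here is a minimal sketch of
the kind of command a user would have to assemble by hand. It assumes
cgroup v2 with the memory controller delegated to the user session; the
helper name build_scope_command and the default limits are hypothetical
and only for illustration. MemoryMax= and MemorySwapMax= are the
systemd resource-control properties involved.

```python
import shlex


def build_scope_command(task, memory_max="2G", swap_max="0"):
    """Return an argv list that runs `task` in a transient user scope.

    The limits are guesses the user has to supply; nothing here
    estimates what the task actually needs.
    """
    if not task:
        raise ValueError("task must contain at least one argument")
    return [
        "systemd-run", "--user", "--scope",
        "-p", f"MemoryMax={memory_max}",      # hard memory cap for the scope
        "-p", f"MemorySwapMax={swap_max}",    # cap on swap use for the scope
        "--", *task,
    ]


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(shlex.join(build_scope_command(["make", "-j8"])))
```

The user still has to pick the limit: too low and the task gets
clobbered by the cgroup limit, too high and it protects nothing.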

That's maybe the elephant in the room with earlyoom (or one of them):
yes, we've recovered sooner, and the user can hopefully save their data
and reboot. But did their task succeed? No. It got clobbered.


> and there is no reason to think that
> kernel folks are not interested in helping achieve this goal.

I did mean with a kernel-only solution. I've been tracking this issue
for 6-7 months, including the ongoing congestion and kswapd
discussions, so I know they do care broadly about providing some
mechanisms by which user space can behave better. But all of that
requires varying degrees of opt-in, and quite a lot of it involves
considerable work to even understand it, let alone implement it.

> The only
> advantage I see to earlyoom so far is that it sends SIGTERM before taking
> further steps that will kill processes.

Yes, and it happens sooner. Probably not soon enough for many users.
There may be some risk of overpromising and underdelivering: we make
it the default, and then in the vast majority of cases it doesn't
matter, because users are long since conditioned to force power off
within a minute or less of the GUI stuttering or freezing up on them.
It is very workload and system specific.
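
As a rough illustration of the SIGTERM-first behavior discussed above,
the escalation can be sketched as a two-threshold check. The function
name choose_signal and the percentages are assumptions for
illustration, not necessarily what earlyoom ships with.

```python
import signal


def choose_signal(mem_avail_pct, swap_free_pct, term_pct=10.0, kill_pct=5.0):
    """Pick a signal from remaining memory and swap, or None to do nothing.

    Both memory and swap must be below a threshold before acting; the
    thresholds are illustrative assumptions.
    """
    if mem_avail_pct <= kill_pct and swap_free_pct <= kill_pct:
        return signal.SIGKILL   # last resort, the process cannot clean up
    if mem_avail_pct <= term_pct and swap_free_pct <= term_pct:
        return signal.SIGTERM   # polite request, the process may save state
    return None
```

Either way the selected task does not finish; the first stage only
gives it a chance to exit cleanly.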

-- 
Chris Murphy
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


