[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-kernel
Subject:    Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
From:       David Rientjes <rientjes () google ! com>
Date:       2013-09-30 22:08:25
Message-ID: alpine.DEB.2.02.1309301457590.28109 () chino ! kir ! corp ! google ! com
[Download RAW message or body]

On Fri, 27 Sep 2013, Sergey Dyasly wrote:

> What you are saying contradicts current OOMk code the way I read it. Comment in
> oom_kill_process() says:
> 
> "If the task is already exiting ... set TIF_MEMDIE so it can die quickly"
> 
> I just want to know the right solution.
> 

That's a comment, not code.  The point of the PF_EXITING special handling 
in oom_kill_process() is to avoid telling sysadmins that a process has 
been killed to free memory when it has already called exit() and to avoid 
sacrificing one of its children for the exiting process.

It may or may not need access to memory reserves to actually exit after 
PF_EXITING depending on whether it needs to allocate memory for 
coredumping or anything else.  So instead of waiting for it to recall the 
oom killer, TIF_MEMDIE is set anyway.  The point is that PF_EXITING 
processes can already get TIF_MEMDIE immediately when their memory 
allocation fails so there's no reason not to set it now as an 
optimization.

But we definitely want to avoid printing anything to the kernel log when 
the process has already called exit() and issuing the SIGKILL at that 
point would be pointless.

> You are mistaken, oom_kill_process() is only called from out_of_memory()
> and mem_cgroup_out_of_memory().
> 

out_of_memory() calls oom_kill_process() in two places, plus the call from 
mem_cgroup_out_of_memory(), making three calls in the tree.  Not that this 
matters in the slightest, though.

> > Read the comment about why we don't emit anything to the kernel log in 
> > this case; the process is already exiting, there's no need to kill it or 
> > make anyone believe that it was killed.
> 
> Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
> and in this case oom_kill_process() won't be even called. That's why it's
> redundant.
> 

You apparently have no idea how long select_bad_process() runs on a large 
system with thousands of processes.  Keep in mind that SGI requested the 
addition of the oom_kill_allocating_task sysctl specifically because of 
how long select_bad_process() runs.  The PF_EXITING check in 
oom_kill_process() is simply an optimization to return early and with 
access to memory reserves so it can exit as quickly as possible and 
without the kernel stating it's killing something that has already called 
exit().
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic