
List:       openjdk-serviceability-dev
Subject:    Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v3]
From:       Severin Gehwolf <sgehwolf () openjdk ! java ! net>
Date:       2022-05-23 19:13:58
Message-ID: 3kqALUtVpso2x-3Pv41cTPRvjpXI7_88yFPfj-xmRNg=.143ff56d-ad5a-4ffb-8649-03f9447a9248 () github ! com

On Mon, 23 May 2022 09:24:19 GMT, Severin Gehwolf <sgehwolf@openjdk.org> wrote:

> > Also, I think the current PR could produce the wrong answer, if systemd is indeed
> > running inside the container, and we have:
> > 
> > "/user.slice/user-1000.slice/session-50.scope",    // root_path
> > "/user.slice/user-1000.slice/session-3.scope",     // cgroup_path
> > 
> > 
> > The PR gives /sys/fs/cgroup/memory/user.slice/user-1000.slice/, which specifies
> > the overall memory limit for user-1000. However, the correct answer may be
> > /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-3.scope, which may have
> > a smaller memory limit, and the JVM may end up allocating a larger heap than
> > allowed.
> 
> Yes, if we can decide which one the right file is. This is largely undocumented
> territory. The correct fix is a) find the correct path to the namespace hierarchy
> the process is a part of. b) starting at the leaf node, walk up the hierarchy and
> find the **lowest** limits. Doing this would be very expensive!
>
> Aside: Current container detection in the JVM/JDK is notoriously imprecise. It's
> largely based on common setups (containers like docker). The heuristics assume that
> memory limits are reported inside the container at the leaf node. If, however,
> that's not the case, the detected limits will be wrong (it will detect it as
> unlimited, even though it's - for example - memory constrained at the parent). This
> can for example be reproduced on a cgroups v2 system with a systemd slice using
> memory limits. We've worked-around this in OpenJDK for cgroups v1 by
> https://bugs.openjdk.java.net/browse/JDK-8217338
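
To make step b) above concrete, here is a minimal sketch in Python of walking from the leaf cgroup up to the controller mount point and keeping the smallest limit seen. It is an illustration only, not what the JDK does. The function names (`read_limit`, `lowest_limit`) are hypothetical, and the sketch assumes the cgroup v1 file name `memory.limit_in_bytes`. The mount point is a parameter so the code can be tried against a scratch directory.

```python
import os

# Assumption: cgroup v1 memory controller file name. A cgroup v2 system
# would use "memory.max" instead.
LIMIT_FILE = "memory.limit_in_bytes"


def read_limit(directory, limit_file=LIMIT_FILE):
    """Return the limit stored in `directory` as an int, or None if unreadable."""
    try:
        with open(os.path.join(directory, limit_file)) as f:
            text = f.read().strip()
    except OSError:
        return None
    if text == "max":  # cgroup v2 spelling of "no limit"
        return None
    try:
        return int(text)
    except ValueError:
        return None


def lowest_limit(mount_point, cgroup_path, limit_file=LIMIT_FILE):
    """Walk from the leaf cgroup up to the mount point, return the smallest limit.

    cgroup v1 reports "unlimited" as a very large number, so taking the
    minimum handles that case without special treatment.
    """
    mount_point = os.path.normpath(mount_point)
    current = os.path.normpath(os.path.join(mount_point, cgroup_path.lstrip("/")))
    lowest = None
    while True:
        limit = read_limit(current, limit_file)
        if limit is not None and (lowest is None or limit < lowest):
            lowest = limit
        if current == mount_point:
            break
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root, stop
            break
        current = parent
    return lowest
```

On a real system this might be called as `lowest_limit("/sys/fs/cgroup/memory", "/user.slice/user-1000.slice/session-3.scope")`. Each level costs one file read, which is the expense mentioned above.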

> Maybe we should do this instead?
> 
> * Read /proc/self/cgroup
> 
> * Find the `10:memory:<path>` line
> 
> * If `/sys/fs/cgroup/memory/<path>/tasks` contains my PID, this is the path
> 
> * Otherwise, scan all `tasks` files under `/sys/fs/cgroup/memory/`. Exactly one of
> them contains my PID.

Something like that seems most promising, but it would have to be `cgroup.procs`,
not `tasks`, since `tasks` lists task IDs (i.e. Linux thread IDs), not process IDs.
We could keep the two common cases, i.e. the host and docker cases in the test, as
short circuits.
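
A rough sketch in Python of the lookup discussed above, using `cgroup.procs` rather than `tasks`. It illustrates the idea and is not a proposal for the actual patch. All function names are hypothetical. The memory line is matched by controller name instead of a fixed hierarchy number, on the assumption that the number differs between systems. The mount point and the contents of `/proc/self/cgroup` are passed in so the code can be tried outside a real cgroup filesystem.

```python
import os


def memory_path_from_proc(proc_cgroup_text):
    """Return <path> from the 'N:memory:<path>' line, or None if there is none."""
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        # The controller field may list several controllers, comma separated.
        if len(parts) == 3 and "memory" in parts[1].split(","):
            return parts[2]
    return None


def procs_contains(directory, pid):
    """True if `directory`/cgroup.procs exists and lists `pid`."""
    try:
        with open(os.path.join(directory, "cgroup.procs")) as f:
            return str(pid) in f.read().split()
    except OSError:
        return False


def find_memory_cgroup_dir(mount_point, proc_cgroup_text, pid):
    """Return the directory whose cgroup.procs lists `pid`, or None."""
    # Short circuit: the path from /proc/self/cgroup is valid as-is.
    path = memory_path_from_proc(proc_cgroup_text)
    if path is not None:
        candidate = os.path.normpath(os.path.join(mount_point, path.lstrip("/")))
        if procs_contains(candidate, pid):
            return candidate
    # Fallback: scan every cgroup.procs file under the mount point.
    for dirpath, _dirnames, _filenames in os.walk(mount_point):
        if procs_contains(dirpath, pid):
            return os.path.normpath(dirpath)
    return None
```

On a real system the call might look like `find_memory_cgroup_dir("/sys/fs/cgroup/memory", open("/proc/self/cgroup").read(), os.getpid())`. The fallback scan visits every directory in the hierarchy, so it is the expensive path and only runs when the short circuit fails.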

-------------

PR: https://git.openjdk.java.net/jdk/pull/8629

