List:       openjdk-serviceability-dev
Subject:    Re: [External] : Re: Extend Native Memory Tracking over the JDK ? (was: Proposal: track zlib native 
From:       Stefan Johansson <stefan.johansson () oracle ! com>
Date:       2022-12-01 10:52:50
Message-ID: 4a2dbc13-dc65-e8ba-a40c-ff6b1cef931d () oracle ! com

Hi Thomas and Carter,

I opened up a PR for this to allow more specific comments on the 
implementation:
https://github.com/openjdk/jdk/pull/11449

If this discussion leads to us not wanting to proceed with the change I 
will withdraw the PR.

Some more comments below.

On 2022-12-01 08:26, Thomas Stüfe wrote:
> Hi Carter, Stefan,
> 
> thank you, I think it is good to have this discussion, it is important.
> 
> Side note, the discussion steered away from my original question - 
> whether to instrument the JDK with NMT. I still would love to discuss 
> that, too.
> 

Sorry for that :)

> About opening NMT up for user consumption, that is of course possible. 
> But I think the bigger question is which data we want to open for user 
> consumption, and at what granularity. And what contracts do we enter 
> when we do this.
> 

To me this is not so much opening it up as making it much simpler 
to get the already available data (JFR instead of jcmd). I get your 
point that when we make it easier it will likely get more visibility, and 
that could generate expectations. To me the contract on these events 
should not be much stricter than, for example, the contract we have on the 
format of GC logs. So we should not be locked down by this.
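
For comparison, here is a minimal sketch of how the jcmd route can already be 
driven from inside the process today, assuming the platform MBean server and 
the HotSpot DiagnosticCommand MBean are available. The class and method names 
(NmtViaJcmd, nmtSummary) are made up for illustration. The result is just the 
jcmd text report, so it still has to be parsed by hand:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class NmtViaJcmd {
    /**
     * In-process equivalent of "jcmd <pid> VM.native_memory summary".
     * Returns the plain-text report, or a short notice if the JVM was
     * started without -XX:NativeMemoryTracking.
     */
    public static String nmtSummary() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.sun.management:type=DiagnosticCommand");
        // Diagnostic commands take a single String[] of arguments.
        Object result = server.invoke(
                name,
                "vmNativeMemory",
                new Object[] { new String[] { "summary" } },
                new String[] { String[].class.getName() });
        return (String) result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(nmtSummary());
    }
}
```

Run with -XX:NativeMemoryTracking=summary to get an actual report.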

> NMT was originally a hotspot-dev-centric tool. It has a lot of 
> idiosyncrasies. Interpreting the results needs detailed knowledge about 
> hotspot memory management. Some examples:
> 
> - its reports are not consistent across JDK versions, not even across 
> different patch levels of the same JDK. So you cannot compare results, 
> say, between JDK11 and 17.
> - before a certain version X (I believe JDK 11), the full thread stacks 
> were accounted for instead of just the in-use portion of the thread 
> stacks. I remember reading blogs about how thread stack consumption went 
> down when all that changed was NMT reporting.
> - The memory sizes it shows may not have much to do with real RSS. It 
> systematically underreports some things, since it omits libc overhead 
> and retention, usage by system- and JNI libraries. But it also 
> overreports things since it mostly (not always) accounts in terms of 
> "committed" memory, which usually means mmap()ed or malloc()ed memory. 
> But that is just committed, not physical memory, it does not translate 
> to RSS usage directly. That memory may never be touched. OTOH NMT probes 
> thread stacks with mincore(), so for that section, "committed" really 
> means "physical".
> 

I agree that NMT is a low-level tool and that it's not perfect. But in 
some cases I think it's the best way to see the memory consumption of 
the JVM. Especially since you can zoom in on certain areas.

> I am fine with opening up NMT via JFR. But does this mean we have to be 
> more consistent? Do we have to care about downward compatibility of NMT 
> reports? Are we then still free to redesign the tag system (see my 
> original mail) or will this tie us down with the current NMT tag system 
> forever? As a negative example, JFR exposes metaspace allocator details 
> (chunk statistics) which have been broken ever since JDK 16 when the 
> underlying implementation changed.
> 

I think a tag-based system for NMT would be awesome and it would be 
really sad if exposing the NMT information through JFR stopped us 
from doing this. Hopefully the only thing we need to do when improving 
NMT is to file CSRs. One possible way to avoid constraints even more would 
be to tag those events as "experimental" at first. This would signal 
that users should not rely on them.

> Therefore I am curious about what end users use NMT really for.
> 
> @Carter: can you give us examples of which NMT sections had been 
> particularly useful to you? Maybe we can define a subset to expose 
> instead of exposing all tags. E.g. I can see thread stack usage being 
> very useful, but things like ObjectMonitor footprint not so much.
> 

I agree that not too many users would care about the ObjectMonitor 
footprint, but unless we get constrained by what we report I would like 
to report everything. If there are constraints, a subset might be a good 
middle road.

Thanks,
Stefan

> Cheers, Thomas
> 
> 
> 
> 
> On Wed, Nov 30, 2022 at 9:45 PM Carter Kozak <ckozak@ckozak.net> wrote:
> 
> This looks fantastic, thank you so much! I can confirm that the proposed
> design would solve my use-case.
> 
> I'd enjoy discussing the NMT event contract somewhere more specific
> to the implementation, but I don't want to muddle this thread with
> implementation details.
> 
> Carter Kozak
> 
> On Wed, Nov 30, 2022, at 03:37, Stefan Johansson wrote:
> > Hi Carter,
> > 
> > Your mail made me pick up an old item from my wishlist: to have native
> > memory tracking information available in JFR recordings. When we, in GC,
> > do improvements to decrease the native memory overhead of our
> > algorithms, NMT is a very good tool to track the progress. We have
> > scripts that sound very similar to what you describe and more than once
> > I've been thinking about adding this information into JFR. But it has
> > not been a priority and the greater value has been unclear.
> > 
> > Hearing that others might also benefit from such a change, I had a
> > discussion with the JFR team on how best to proceed with this. I have
> > created a branch for this and will probably create a PR for it shortly,
> > but I thought I would drop it here first:
> > https://github.com/kstefanj/jdk/tree/8157023-jfr-events-for-nmt
> >  
> > The change adds two new JFR events: one for the total usage and one for
> > the usage of each memory type. These are sent only if Native Memory
> > Tracking is turned on, and they are enabled in the default JFR profile
> > with an interval of 1s. This might change during reviewing but it was a
> > good starting point.
> > 
> > With this you will be able to use JFR streaming to access the events
> > from within your running process. I hope this will help your use cases
> > and please let us know if you have any comments or suggestions.
> > 
> > Thanks,
> > Stefan
> 
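
A rough sketch of the in-process JFR streaming consumption described above, 
assuming JDK 14 or later for RecordingStream. The event name 
"jdk.NativeMemoryUsage" is an assumption, since the thread does not name the 
new events; check the PR for the actual names and fields. The class and 
method names (NmtEventStream, watch) are made up for illustration. Each event 
is printed whole rather than by field, to avoid guessing at the field layout:

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import jdk.jfr.consumer.RecordingStream;

public class NmtEventStream {
    // ASSUMPTION: event name is not given in this thread; see the PR.
    static final String EVENT = "jdk.NativeMemoryUsage";

    /** Streams the named event for the given time and returns how many were seen. */
    public static int watch(String eventName, Duration howLong) throws InterruptedException {
        AtomicInteger seen = new AtomicInteger();
        try (RecordingStream rs = new RecordingStream()) {
            // 1s matches the interval mentioned for the default profile.
            rs.enable(eventName).withPeriod(Duration.ofSeconds(1));
            rs.onEvent(eventName, event -> {
                seen.incrementAndGet();
                System.out.println(event); // prints all fields of the event
            });
            rs.startAsync();               // stream in a background thread
            Thread.sleep(howLong.toMillis());
        }                                  // close() stops the stream
        return seen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int n = watch(EVENT, Duration.ofSeconds(5));
        System.out.println("events seen: " + n);
    }
}
```

The events are only sent when NMT is on, so the JVM needs to be started with 
-XX:NativeMemoryTracking=summary (or detail). On a JDK without the events, or 
with NMT off, the sketch simply reports zero events.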

