[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: 8226575: OperatingSystemMXBean should be made container aware
From:       Bob Vandette <bob.vandette () oracle ! com>
Date:       2019-12-11 17:21:11
Message-ID: EB00EA71-C340-418F-9D9C-9A3EA77EEDBA () oracle ! com
[Download RAW message or body]

Yes, I defer to Mandy on the best way to express the various Java exceptions.
I'm ok with the changes.

Thanks for getting this done for JDK14!

Bob.

> On Dec 11, 2019, at 12:12 PM, Daniil Titov <daniil.x.titov@oracle.com> wrote:
> 
> Hi Serguei,
> 
> Thank you for your comments. I will correct this nits before pushing the changes.
> 
> Hi Bob and David,
> 
> > [Mandy Chung]
> > > I reviewed Metrics and Subsystem in this version.
> > > I don't need to see a new webrev.
> 
> As I understood Mandy finished reviewing this fix. Just wanted to confirm with you \
> if you are okey with that version of the fix (webrev.06) ? 
> Mach5 testing: tier1-tier6 and open/test/hotspot/jtreg/containers/docker tests \
> passed.  
> Thank you,
> Daniil
> 
> 
> 
> On 12/9/19, 6:02 PM, "serguei.spitsyn@oracle.com" <serguei.spitsyn@oracle.com> \
> wrote: 
> Hi Daniil,
> 
> It is not a full review, just some minor comments.
> In fact, I do not see real problems yet.
> 
> http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java.frames.html
>  
> 55     public long getTotalSwapSpaceSize() {
> 56         if (containerMetrics != null) {
> 57             long limit = containerMetrics.getMemoryAndSwapLimit();
> 58             // The memory limit metrics is not available if JVM 
> runs on Linux host ( not in a docker container)
> 59             // or if a docker container was started without 
> specifying a memory limit ( without '--memory='
> 60             // Docker option). In latter case there is no limit on 
> how much memory the container can use and
> 61             // it can use as much memory as the host's OS allows.
> 62             long memLimit = containerMetrics.getMemoryLimit();
> 63             if (limit >= 0 && memLimit >= 0) {
> 64                 return limit - memLimit;
> 65             }
> 66         }
> 67         return getTotalSwapSpaceSize0();
> 68     }
> 
> Unneeded space after brackets '('.
> Do we need to check if the (limit - memLimit) value is negative?
> The same question is for getFreeSwapSpaceSize():
> memSwapLimit - memLimit - (memSwapUsage - memUsage)
> 
> and getFreeMemorySize():
> 101 return limit - usage;
> 
> 81                         // If this happens just retry the loop for 
> a few iterations
> 
> Dot is missed at the end of comment.
> 
> 
> http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java.html
>  
> 34 System.out.println(String.format("Runtime.availableProcessors: 
> %d", Runtime.getRuntime().availableProcessors()));
> 35 
> System.out.println(String.format("OperatingSystemMXBean.getAvailableProcessors: 
> %d", osBean.getAvailableProcessors()));
> 36 
> System.out.println(String.format("OperatingSystemMXBean.getTotalMemorySize: 
> %d", osBean.getTotalMemorySize()));
> 37 
> System.out.println(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: \
>  %d", osBean.getTotalPhysicalMemorySize()));
> 38 
> System.out.println(String.format("OperatingSystemMXBean.getFreeMemorySize: 
> %d", osBean.getFreeMemorySize()));
> 39 
> System.out.println(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: 
> %d", osBean.getFreePhysicalMemorySize()));
> 40 
> System.out.println(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: 
> %d", osBean.getTotalSwapSpaceSize()));
> 41 
> System.out.println(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: 
> %d", osBean.getFreeSwapSpaceSize()));
> 42 
> System.out.println(String.format("OperatingSystemMXBean.getCpuLoad: %f", 
> osBean.getCpuLoad()));
> 43 
> System.out.println(String.format("OperatingSystemMXBean.getSystemCpuLoad: 
> %f", osBean.getSystemCpuLoad()));
> 
> 
> To make the above lines a little bit shorter I'd suggest to define a 
> log() method like this:
> private static void log(String msg) ( System.out.println(msg(; }
> 
> 34         log(String.format("Runtime.availableProcessors: %d", 
> Runtime.getRuntime().availableProcessors()));
> 35 log(String.format("OperatingSystemMXBean.getAvailableProcessors: 
> %d", osBean.getAvailableProcessors()));
> 36 log(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", 
> osBean.getTotalMemorySize()));
> 37 
> log(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: 
> %d", osBean.getTotalPhysicalMemorySize()));
> 38 log(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", 
> osBean.getFreeMemorySize()));
> 39 
> log(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", 
> osBean.getFreePhysicalMemorySize()));
> 40 log(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: 
> %d", osBean.getTotalSwapSpaceSize()));
> 41 log(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: 
> %d", osBean.getFreeSwapSpaceSize()));
> 42         log(String.format("OperatingSystemMXBean.getCpuLoad: %f", 
> osBean.getCpuLoad()));
> 43 log(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", 
> osBean.getSystemCpuLoad()));
> 
> 
> Thanks,
> Serguei
> 
> 
> 
> On 12/6/19 17:41, Daniil Titov wrote:
> > Hi David, Mandy, and Bob,
> > 
> > Thank you for reviewing this fix.
> > 
> > Please review a new version of the fix [1] that includes the following changes \
> > comparing to the previous version of the webrev ( webrev.04) 1. The changes in \
> > Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were \
> > discarded. 2.  The implementation of methods getFreeMemorySize, \
> > getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize was also \
> > reverted to webrev.03 version that return host's values if the metrics are \
> > unavailable or cannot be properly read. I would like to mention that  currently \
> > the native implementation of these methods de-facto may return -1 at some \
> > circumstances, but I agree that the changes proposed in the previous version of \
> > the webrev increase such probability. I filed the follow-up issue [4] as Mandy \
> > suggested. 3.  The legacy methods were renamed as David suggested.
> > 
> > 
> > > src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c
> > > !     static int initialized=1;
> > > 
> > > Am I reading this right that the code currently fails to actually do the
> > > initialization because of this ???
> > Yes, currently the code fails to do the initialization but it was unnoticed since \
> > method get_cpuload_internal(...) was never called for a specific CPU, the first \
> > parameter "which" was always -1.
> > 
> > > test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java
> > > 
> > > System.out.println(String.format(...)
> > > 
> > > Why not simply
> > > 
> > > System.out.printf(..)
> > As I tried explain it earlier it would make the tests unstable.
> > System.out.printf(...) just delegates the call to System.out.format(...) that \
> > doesn't emit the string atomically. Instead it parses the format string into a \
> > list of FormatString objects and then iterates over the list. As a result, the \
> > other traces occasionally got printed between these iterations  and break the \
> > pattern the test is expected to find in the output.
> > 
> > For example, here is the sample of a such output that has the trace message \
> > printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" and \
> > "1030762496". 
> > <skipped>
> > [0.304s][trace][os,container] Memory Usage is: 42983424
> > OperatingSystemMXBean.getFreeMemorySize: 1030758400
> > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is \
> > /sys/fs/cgroup/memory/memory.usage_in_bytes [0.305s][trace][os,container] Memory \
> > Usage is: 42979328 [0.306s][trace][os,container] Path to /memory.usage_in_bytes \
> >                 is /sys/fs/cgroup/memory/memory.usage_in_bytes
> > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] \
> > Memory Usage is: 42975232 1030762496
> > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176
> > 
> > <skipped>
> > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: \
> > [1-9][0-9]+' missing from stdout/stderr 
> > 	at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306)
> > 	at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151)
> >   at TestMemoryAwareness.main(TestMemoryAwareness.java:73)
> > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native \
> > Method)  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >   at java.base/java.lang.reflect.Method.invoke(Method.java:564)
> > 	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
> >   at java.base/java.lang.Thread.run(Thread.java:832)
> > 
> > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests \
> > passed. Tier4-tier6 tests are still running. 
> > [1] Webrev:  http://cr.openjdk.java.net/~dtitov/8226575/webrev.05
> > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575
> > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428
> > [4] https://bugs.openjdk.java.net/browse/JDK-8235522
> > 
> > Thank you,
> > Daniil
> > 
> > On 12/6/19, 1:38 PM, "Mandy Chung" <mandy.chung@oracle.com> wrote:
> > 
> > 
> > 
> > On 12/6/19 5:59 AM, Bob Vandette wrote:
> > > > On Dec 6, 2019, at 2:49 AM, David Holmes<David.Holmes@oracle.com>  wrote:
> > > > 
> > > > 
> > > > src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java
> > > >  
> > > > The changes to allow for a return of -1 are somewhat more extensive than we \
> > > > have previously discussed. These methods previously were (per the spec) \
> > > > guaranteed to return some (assumably) meaningful value but now they are \
> > > > effectively allowed to fail by returning -1. No existing code is expecting to \
> > > > have to handle a return of -1 so I see this as a significant compatibility \
> > > > issue.
> > 
> > I thought that the error case we are referring to is limit == 0 which
> > indicates something unexpected goes wrong.  So the compatibility concern
> > should be low.  This is very specific to Metrics implementation for
> > cgroup v1 and let me know if I'm wrong.
> > 
> > > > Surely there must always be some information available from the operating \
> > > > environment? I see from the impl file: 
> > > > // the host data, value 0 indicates that something went wrong while the \
> > > > metric was read and // in this case we return "information unavailable" code \
> > > > -1. 
> > > > I don't agree with this. If the container metrics are messed up somehow we \
> > > > should either fallback to the host value or else abort with some kind of \
> > > > exception. Returning -1 is not an option here IMO.
> > > I agree with David on the compatibility concern.  I originally thought that -1 \
> > > was already a specified return for all of these methods. Since the 0 return \
> > > failure from the Metrics API should only occur if one of the cgroup subsystems \
> > > is not enabled while others are, I'd suggest we keep Daniil's original logic to \
> > > fall back to the host value since a disabled subsystem is equivalent to no \
> > > limits. 
> > 
> > It's important to consider carefully if the monitoring API indicates an
> > error vs unavailable and an application should continue to run when the
> > monitoring system fails to get the metrics.
> > 
> > There are several choices to report "something goes wrong" scenarios
> > (should unlikely happen???):
> > 1. fall back to a random positive value  (e.g. host value)
> > 2. return a negative value
> > 3. throw an exception
> > 
> > #3 is not an option as the application is not expecting this.  For #2,
> > the application can filter bad values if desirable.
> > 
> > I'm okay if you want to file a JBS issue to follow up and thoroughly
> > look at the cases that the metrics are unavailable and the cases when
> > fails to obtain.
> > 
> > > > ---
> > > > 
> > > > test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java
> > > > 
> > > > System.out.println(String.format(...)
> > > > 
> > > > Why not simply
> > > > 
> > > > System.out.printf(..)
> > > > 
> > > > ?
> > 
> > or simply (as I commented [1])
> > System.out.format
> > 
> > Mandy
> > [1]
> > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html
> >  
> > 
> > 
> > 
> 
> 
> 
> 


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic