[prev in list] [next in list] [prev in thread] [next in thread]
List: openjdk-serviceability-dev
Subject: Re: [9] RFR(S): 8080999: MemoryPoolMXBean.getUsageThresholdCount() returns incorrect value
From: Tobias Hartmann <tobias.hartmann () oracle ! com>
Date: 2015-09-10 4:51:56
Message-ID: 55F10C6C.1010709 () oracle ! com
[Download RAW message or body]
[CC'ing runtime for wider audience]
On 03.09.2015 11:33, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch.
>
> https://bugs.openjdk.java.net/browse/JDK-8080999
> http://cr.openjdk.java.net/~thartmann/8080999/webrev.00/
>
> Problem:
> The compiler test [1] sets a maximum usage threshold for a code heap memory pool, \
> allocates enough memory to hit the threshold and checks if the threshold counter \
> was incremented (see 'CodeCacheUtils.hitUsageThreshold' [2]). On success, the \
> allocated memory is freed and we start over again. The counter should only be \
> incremented the first time the threshold is crossed. Subsequent threshold hits are \
> only counted if the usage fell below the threshold value. The test times out \
> waiting for the threshold counter to be incremented.
> The problem is that there is a race between SensorInfo::set_gauge_sensor_level() \
> which checks if the sensor should be triggered and SensorInfo::clear() which resets \
> the sensor once the usage falls below the threshold value. The race occurs if \
> SensorInfo::process_pending_requests() invokes SensorInfo::clear() because there is \
> a pending request to clear the sensor. In the meantime \
> SensorInfo::set_gauge_sensor_level() may be called because the sensor is triggered \
> again. We now have to remove the pending clear request because the sensor should be \
> activated (see [3]). However, since SensorInfo::clear() does not check the value of \
> '_pending_clear_count' again, it still clears the sensor. The sensor may then be \
> triggered again although the threshold value was only crossed once. As a result, \
> the output of getUsageThresholdCount() is higher than expected.
> Here is the detailed trace of events (the info in brackets is the output of \
> SensorInfo::print):
> ...
>
> [on count = 1 pending_triggers = 0 pending_clears = 0]
>
> WB.freeCodeBlob()
> LowMemoryDetector::detect_low_memory()
> SensorInfo::set_gauge_sensor_level()
> is_below_low ->
> _pending_clear_count++;
>
> [on count = 1 pending_triggers = 0 pending_clears = 1]
>
> WB.allocateCodeBlob()
> LowMemoryDetector::detect_low_memory()
> SensorInfo::set_gauge_sensor_level()
> is_over_high ->
> _pending_trigger_count++;
> _pending_clear_count = 0; <-- remove pending clear request because the sensor is \
> triggered again
> [on count = 1 pending_triggers = 1 pending_clears = 0]
>
> LowMemoryDetector::process_sensor_changes()
> SensorInfo::process_pending_requests() <-- still sees the old _pending_clear_count \
> == 1 because there is a race SensorInfo::clear()
>
> [off count = 2 pending_triggers = 0 pending_clears = 0]
>
> WB.fullGC()
> LowMemoryDetector::detect_low_memory()
> SensorInfo::set_gauge_sensor_level()
> is_over_high -> _pending_trigger_count++;
>
> [off count = 2 pending_triggers = 1 pending_clears = 0]
>
> At this point the threshold count should be 2 because the threshold was only hit \
> once. However, we have a pending trigger request. Now there are tow possibilities: \
> 1) We check the counter before the pending trigger is processed: We continue \
> because as expected the count is 2 but we fail later because the overall count does \
> not match the expected value (RuntimeException: Unexpected threshold usage count \
> (assert failed: 11 == 10)). 2) We check after the pending trigger is processed: The \
> counter is 3 and we time out waiting for it to be 2.
> Solution:
> I added asserts to the code to make sure that such race conditions are detected. I \
> changed the implementation of SensorInfo::clear() to acquire the Service_lock and \
> bail out if _pending_clear_count was reset in the meantime (i.e., if we lost a race \
> to SensorInfo::set_gauge_sensor_level()). I also added a missing
> 335 _sensor_count += count;
>
> to SensorInfo::clear() because it may trigger counter increments as well.
>
> Testing:
> - 15k runs of failing testcase
> - JPRT
>
> Thanks,
> Tobias
>
>
> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/UsageThresholdExceededTest.java
> [2] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/CodeCacheUtils.java#l50
> [3] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/src/share/vm/services/lowMemoryDetector.cpp#l222
>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic