[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: [9] RFR(S): 8080999: MemoryPoolMXBean.getUsageThresholdCount() returns incorrect value
From:       Tobias Hartmann <tobias.hartmann () oracle ! com>
Date:       2015-09-10 4:51:56
Message-ID: 55F10C6C.1010709 () oracle ! com
[Download RAW message or body]


[CC'ing runtime for wider audience]


On 03.09.2015 11:33, Tobias Hartmann wrote:
> Hi,
> 
> please review the following patch.
> 
> https://bugs.openjdk.java.net/browse/JDK-8080999
> http://cr.openjdk.java.net/~thartmann/8080999/webrev.00/
> 
> Problem:
> The compiler test [1] sets a maximum usage threshold for a code heap memory pool, \
> allocates enough memory to hit the threshold and checks if the threshold counter \
> was incremented (see 'CodeCacheUtils.hitUsageThreshold' [2]). On success, the \
> allocated memory is freed and we start over again. The counter should only be \
> incremented the first time the threshold is crossed. Subsequent threshold hits are \
> only counted if the usage fell below the threshold value. The test times out \
> waiting for the threshold counter to be incremented. 
> The problem is that there is a race between SensorInfo::set_gauge_sensor_level() \
> which checks if the sensor should be triggered and SensorInfo::clear() which resets \
> the sensor once the usage falls below the threshold value. The race occurs if \
> SensorInfo::process_pending_requests() invokes SensorInfo::clear() because there is \
> a pending request to clear the sensor. In the meantime \
> SensorInfo::set_gauge_sensor_level() may be called because the sensor is triggered \
> again. We now have to remove the pending clear request because the sensor should be \
> activated (see [3]). However, since SensorInfo::clear() does not check the value of \
> '_pending_clear_count' again, it still clears the sensor. The sensor may then be \
> triggered again although the threshold value was only crossed once. As a result, \
> the output of getUsageThresholdCount() is higher than expected. 
> Here is the detailed trace of events (the info in brackets is the output of \
> SensorInfo::print): 
> ...
> 
> [on count = 1 pending_triggers = 0 pending_clears = 0]
> 
> WB.freeCodeBlob()
> LowMemoryDetector::detect_low_memory()
> SensorInfo::set_gauge_sensor_level()
> is_below_low ->
> _pending_clear_count++;
> 
> [on count = 1 pending_triggers = 0 pending_clears = 1]
> 
> WB.allocateCodeBlob()
> LowMemoryDetector::detect_low_memory()
> SensorInfo::set_gauge_sensor_level()
> is_over_high -> 
> _pending_trigger_count++;
> _pending_clear_count = 0;  <-- remove pending clear request because the sensor is \
> triggered again 
> [on count = 1 pending_triggers = 1 pending_clears = 0]
> 
> LowMemoryDetector::process_sensor_changes()
> SensorInfo::process_pending_requests()  <-- still sees the old _pending_clear_count \
> == 1 because there is a race SensorInfo::clear()
> 
> [off count = 2 pending_triggers = 0 pending_clears = 0]
> 
> WB.fullGC()
> LowMemoryDetector::detect_low_memory()
> SensorInfo::set_gauge_sensor_level()
> is_over_high -> _pending_trigger_count++;
> 
> [off count = 2 pending_triggers = 1 pending_clears = 0]
> 
> At this point the threshold count should be 2 because the threshold was only hit \
> once. However, we have a pending trigger request. Now there are tow possibilities: \
> 1) We check the counter before the pending trigger is processed: We continue \
> because as expected the count is 2 but we fail later because the overall count does \
> not match the expected value (RuntimeException: Unexpected threshold usage count \
> (assert failed: 11 == 10)). 2) We check after the pending trigger is processed: The \
> counter is 3 and we time out waiting for it to be 2. 
> Solution:
> I added asserts to the code to make sure that such race conditions are detected. I \
> changed the implementation of SensorInfo::clear() to acquire the Service_lock and \
> bail out if _pending_clear_count was reset in the meantime (i.e., if we lost a race \
> to SensorInfo::set_gauge_sensor_level()). I also added a missing  
> 335     _sensor_count += count;
> 
> to SensorInfo::clear() because it may trigger counter increments as well.
> 
> Testing:
> - 15k runs of failing testcase
> - JPRT
> 
> Thanks,
> Tobias
> 
> 
> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/UsageThresholdExceededTest.java
>  [2] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/CodeCacheUtils.java#l50
>  [3] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/src/share/vm/services/lowMemoryDetector.cpp#l222
>  


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic