'Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-hotspot-gc-dev
Subject:    Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS
From:       Jon Masamitsu <jon.masamitsu () oracle ! com>
Date:       2013-02-26 18:28:19
Message-ID: 512CFEC3.9040203 () oracle ! com
[Download RAW message or body]

The variations in the GC times I was seeing were due to
differing numbers of concurrent mode failures.  I reran
the tests and didn't see the same variations.

I've run the gc_test_suite with

default CMSWaitDuration
CMSWaitDuration=0
CMSWaitDuration=-1

and saw no problems. If there are no
suggestions for other testing, I think this
is ready to push.

Jon

On 02/12/13 09:15, Jon Masamitsu wrote:
> Michal,
>
> I'm running some benchmarks (specjbb2000 and specjbb2005) and
> I'm seeing some differences in the number of concurrent cms
> cycles.  I'm comparing
>
> baseline vs -XX:CMSWaitDuration=-1
>
> and
>
> default value for CMSWaitDuration vs  -XX:CMSWaitDuration=-1
>
> I suspect it's just variations in GC but will need to dig into
> the GC logs more to know.
>
> In you testing did you notice any statistically significant differences
> in the number of CMS concurrent cycles that you were not expecting?
>
> Jon
>
>
> On 01/30/13 03:35, Michal Frajt wrote:
>> Hi John,
>>
>> We have verified the new webrev. The changes have been applied 
>> correctly.
>>
>> Regards,
>> Michal
>>
>>
>> Od: "John Cuthbertson" john.cuthbertson@oracle.com
>> Komu: "Michal Frajt" michal@frajt.eu
>> Kopie: hotspot-gc-dev@openjdk.java.net
>> Datum: Fri, 25 Jan 2013 13:13:27 -0800
>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for 
>> non-incremental mode of CMS
>>
>>> Hi Michal,
>>>
>>> The patch is applied. The new webrev can be found at
>>> http://cr.openjdk.java.net/~johnc/7189971/webrev.2/
>>>
>>> Thanks,
>>>
>>> JohnC
>>>
>>> On 1/17/2013 2:58 AM, Michal Frajt wrote:
>>>> Hi John,
>>>>
>>>> Please apply the attached patch to the webrev. You are right, the 
>>>> setting of the CMS token has been somehow moved back above the 
>>>> method return. Additionally I have fixed the printf of the unsigned 
>>>> loop counter (correct is %u).
>>>>
>>>> Regards,
>>>> Michal
>>>>
>>>> Od: hotspot-gc-dev-bounces@openjdk.java.net
>>>> Komu: hotspot-gc-dev@openjdk.java.net
>>>> Kopie:
>>>> Datum: Thu, 10 Jan 2013 12:20:28 -0800
>>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for 
>>>> non-incremental mode of CMS
>>>>
>>>>
>>>>> Hi Michal,
>>>>>
>>>>> On 1/10/2013 11:46 AM, John Cuthbertson wrote:
>>>>>> Hi Michal,
>>>>>>
>>>>>> Many apologies for the delay in generating a new webrev for this
>>>>>> change but here is the new one:
>>>>>> http://cr.openjdk.java.net/~johnc/7189971/webrev.1/
>>>>>>
>>>>>> Can you verify the webrev to make sure that changes have been 
>>>>>> applied
>>>>>> correctly? Looking at the new webrev it seems that the setting of 
>>>>>> the
>>>>>> CMS has been moved back above the return out of the loop. Was this
>>>>>> intentional?
>>>>> The above should be "... setting of the CMS token has been ...".
>>>>>
>>>>> JohnC
>>>>>
>>>>>> I've done a couple of sanity tests with GCOld with CMSWaitDuration=0
>>>>>> and CMSWaitDuration=1500 with CMS.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> JohnC
>>>>>>
>>>>>> On 12/12/2012 4:35 AM, Michal Frajt wrote:
>>>>>>> All,
>>>>>>> Find the attached patch. It implements proposed recommendations and
>>>>>>> requested changes. Please mind that the CMSWaitDuration set to -1
>>>>>>> (never wait) requires new parameter CMSCheckInterval (develop only,
>>>>>>> 1000 milliseconds default - constant). The parameter defines the
>>>>>>> next CMS cycle start check interval in the case there are no
>>>>>>> desynchronization (notifications) events on the CGC_lock.
>>>>>>>
>>>>>>> Tested with the Solaris/amd64 build
>>>>>>> CMS
>>>>>>> + CMSWaitDuration>0 OK
>>>>>>> + CMSWaitDuration=0 OK
>>>>>>> + CMSWaitDuration<0 OK
>>>>>>> iCMS
>>>>>>> + CMSWaitDuration>0 OK
>>>>>>> + CMSWaitDuration=0 OK
>>>>>>> + CMSWaitDuration<0 OK
>>>>>>> Regards,
>>>>>>> Michal
>>>>>>> Od: hotspot-gc-dev-bounces@openjdk.java.net
>>>>>>> Komu: hotspot-gc-dev@openjdk.java.net
>>>>>>> Kopie:
>>>>>>> Datum: Fri, 7 Dec 2012 18:48:48 +0100
>>>>>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for
>>>>>>> non-incremental mode of CMS
>>>>>>>
>>>>>>>> Hi John/Jon/Ramki,
>>>>>>>>
>>>>>>>> All proposed recommendations and requested changes have been
>>>>>>>> implemented. We are going to test it on Monday. You will get 
>>>>>>>> the new
>>>>>>>> tested patch soon.
>>>>>>>>
>>>>>>>> The attached code here just got compiled, no test executed yet, it
>>>>>>>> might contain a bug, but you can quickly review it and send your
>>>>>>>> comments.
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>> Michal
>>>>>>>>
>>>>>>>>
>>>>>>>> // Wait until the next synchronous GC, a concurrent full gc 
>>>>>>>> request,
>>>>>>>> // or a timeout, whichever is earlier.
>>>>>>>> void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
>>>>>>>> t_millis) {
>>>>>>>> // Wait time in millis or 0 value representing infinite wait for
>>>>>>>> a scavenge
>>>>>>>> assert(t_millis>= 0, "Wait time for scavenge should be 0 or
>>>>>>>> positive");
>>>>>>>>
>>>>>>>> GenCollectedHeap* gch = GenCollectedHeap::heap();
>>>>>>>> double start_time_secs = os::elapsedTime();
>>>>>>>> double end_time_secs = start_time_secs + (t_millis / ((double)
>>>>>>>> MILLIUNITS));
>>>>>>>>
>>>>>>>> // Total collections count before waiting loop
>>>>>>>> unsigned int before_count;
>>>>>>>> {
>>>>>>>> MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>>>>>>> before_count = gch->total_collections();
>>>>>>>> }
>>>>>>>>
>>>>>>>> unsigned int loop_count = 0;
>>>>>>>>
>>>>>>>> while(!_should_terminate) {
>>>>>>>> double now_time = os::elapsedTime();
>>>>>>>> long wait_time_millis;
>>>>>>>>
>>>>>>>> if(t_millis != 0) {
>>>>>>>> // New wait limit
>>>>>>>> wait_time_millis = (long) ((end_time_secs - now_time) *
>>>>>>>> MILLIUNITS);
>>>>>>>> if(wait_time_millis<= 0) {
>>>>>>>> // Wait time is over
>>>>>>>> break;
>>>>>>>> }
>>>>>>>> } else {
>>>>>>>> // No wait limit, wait if necessary forever
>>>>>>>> wait_time_millis = 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> // Wait until the next event or the remaining timeout
>>>>>>>> {
>>>>>>>> MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
>>>>>>>>
>>>>>>>> set_CMS_flag(CMS_cms_wants_token); // to provoke notifies
>>>>>>>> if (_should_terminate || _collector->_full_gc_requested) {
>>>>>>>> return;
>>>>>>>> }
>>>>>>>> assert(t_millis == 0 || wait_time_millis>  0, "Sanity");
>>>>>>>> CGC_lock->wait(Mutex::_no_safepoint_check_flag,
>>>>>>>> wait_time_millis);
>>>>>>>> clear_CMS_flag(CMS_cms_wants_token);
>>>>>>>> assert(!CMS_flag_is_set(CMS_cms_has_token |
>>>>>>>> CMS_cms_wants_token),
>>>>>>>> "Should not be set");
>>>>>>>> }
>>>>>>>>
>>>>>>>> // Extra wait time check before entering the heap lock to get
>>>>>>>> the collection count
>>>>>>>> if(t_millis != 0&&  os::elapsedTime()>= end_time_secs) {
>>>>>>>> // Wait time is over
>>>>>>>> break;
>>>>>>>> }
>>>>>>>>
>>>>>>>> // Total collections count after the event
>>>>>>>> unsigned int after_count;
>>>>>>>> {
>>>>>>>> MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>>>>>>> after_count = gch->total_collections();
>>>>>>>> }
>>>>>>>>
>>>>>>>> if(before_count != after_count) {
>>>>>>>> // There was a collection - success
>>>>>>>> break;
>>>>>>>> }
>>>>>>>>
>>>>>>>> // Too many loops warning
>>>>>>>> if(++loop_count == 0) {
>>>>>>>> warning("wait_on_cms_lock_for_scavenge() has looped %d
>>>>>>>> times", loop_count - 1);
>>>>>>>> }
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
>>>>>>>> while (!_should_terminate) {
>>>>>>>> if (CMSIncrementalMode) {
>>>>>>>> icms_wait();
>>>>>>>> if(CMSWaitDuration>= 0) {
>>>>>>>> // Wait until the next synchronous GC, a concurrent full gc
>>>>>>>> // request or a timeout, whichever is earlier.
>>>>>>>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>>>>>>> }
>>>>>>>> return;
>>>>>>>> } else {
>>>>>>>> if(CMSWaitDuration>= 0) {
>>>>>>>> // Wait until the next synchronous GC, a concurrent full gc
>>>>>>>> // request or a timeout, whichever is earlier.
>>>>>>>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>>>>>>> } else {
>>>>>>>> // Wait until any cms_lock event not to call
>>>>>>>> shouldConcurrentCollect permanently
>>>>>>>> wait_on_cms_lock(0);
>>>>>>>> }
>>>>>>>> }
>>>>>>>> // Check if we should start a CMS collection cycle
>>>>>>>> if (_collector->shouldConcurrentCollect()) {
>>>>>>>> return;
>>>>>>>> }
>>>>>>>> // .. collection criterion not yet met, let's go back
>>>>>>>> // and wait some more
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> Od: hotspot-gc-dev-bounces@openjdk.java.net
>>>>>>>> Komu: "Jon Masamitsu" jon.masamitsu@oracle.com,"John Cuthbertson"
>>>>>>>> john.cuthbertson@oracle.com
>>>>>>>> Kopie: hotspot-gc-dev@openjdk.java.net
>>>>>>>> Datum: Thu, 6 Dec 2012 23:43:29 -0800
>>>>>>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for
>>>>>>>> non-incremental mode of CMS
>>>>>>>>
>>>>>>>>> Hi John --
>>>>>>>>>
>>>>>>>>> wrt the changes posted, i see the intent of the code and agree 
>>>>>>>>> with
>>>>>>>>> it. I have a few minor suggestions on the
>>>>>>>>> details of how it's implemented. My comments are inline below,
>>>>>>>>> interleaved with the code:
>>>>>>>>>
>>>>>>>>> 317 // Wait until the next synchronous GC, a concurrent full gc
>>>>>>>>> request,
>>>>>>>>> 318 // or a timeout, whichever is earlier.
>>>>>>>>> 319 void
>>>>>>>>> ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
>>>>>>>>> t_millis) {
>>>>>>>>> 320 // Wait for any cms_lock event when timeout not specified
>>>>>>>>> (0 millis)
>>>>>>>>> 321 if (t_millis == 0) {
>>>>>>>>> 322 wait_on_cms_lock(t_millis);
>>>>>>>>> 323 return;
>>>>>>>>> 324 }
>>>>>>>>>
>>>>>>>>> I'd completely avoid the special case above because it would 
>>>>>>>>> miss the
>>>>>>>>> part about waiting for a
>>>>>>>>> scavenge, instead dealing with that case in the code in the 
>>>>>>>>> loop below
>>>>>>>>> directly. The idea
>>>>>>>>> of the "0" value is not to ask that we return immediately, but 
>>>>>>>>> that we
>>>>>>>>> wait, if necessary
>>>>>>>>> forever, for a scavenge. The "0" really represents the value 
>>>>>>>>> infinity
>>>>>>>>> in that sense. This would
>>>>>>>>> be in keeping with our use of wait() with a "0" value for 
>>>>>>>>> timeout at
>>>>>>>>> other places in the JVM as
>>>>>>>>> well, so it's consistent.
>>>>>>>>>
>>>>>>>>> 325
>>>>>>>>> 326 GenCollectedHeap* gch = GenCollectedHeap::heap();
>>>>>>>>> 327 double start_time = os::elapsedTime();
>>>>>>>>> 328 double end_time = start_time + (t_millis / 1000.0);
>>>>>>>>>
>>>>>>>>> Note how, the end_time == start_time for the special case of 
>>>>>>>>> t_millis
>>>>>>>>> == 0, so we need to treat that
>>>>>>>>> case specially below.
>>>>>>>>>
>>>>>>>>> 329
>>>>>>>>> 330 // Total collections count before waiting loop
>>>>>>>>> 331 unsigned int before_count;
>>>>>>>>> 332 {
>>>>>>>>> 333 MutexLockerEx hl(Heap_lock,
>>>>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>>>>> 334 before_count = gch->total_collections();
>>>>>>>>> 335 }
>>>>>>>>>
>>>>>>>>> Good.
>>>>>>>>>
>>>>>>>>> 336
>>>>>>>>> 337 while (true) {
>>>>>>>>> 338 double now_time = os::elapsedTime();
>>>>>>>>> 339 long wait_time_millis = (long)((end_time - now_time) *
>>>>>>>>> 1000.0);
>>>>>>>>> 340
>>>>>>>>> 341 if (wait_time_millis<= 0) {
>>>>>>>>> 342 // Wait time is over
>>>>>>>>> 343 break;
>>>>>>>>> 344 }
>>>>>>>>>
>>>>>>>>> Modify to:
>>>>>>>>> if (t_millis != 0) {
>>>>>>>>> if (wait_time_millis<= 0) {
>>>>>>>>> // Wait time is over
>>>>>>>>> break;
>>>>>>>>> }
>>>>>>>>> } else {
>>>>>>>>> wait_time_millis = 0; // for use in wait() below
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> 345
>>>>>>>>> 346 // Wait until the next event or the remaining timeout
>>>>>>>>> 347 {
>>>>>>>>> 348 MutexLockerEx x(CGC_lock,
>>>>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>>>>> 349 if (_should_terminate || _collector->_full_gc_requested) {
>>>>>>>>> 350 return;
>>>>>>>>> 351 }
>>>>>>>>> 352 set_CMS_flag(CMS_cms_wants_token); // to provoke
>>>>>>>>> notifies
>>>>>>>>>
>>>>>>>>> insert: assert(t_millis == 0 || wait_time_millis>  0, "Sanity");
>>>>>>>>>
>>>>>>>>> 353 CGC_lock->wait(Mutex::_no_safepoint_check_flag,
>>>>>>>>> wait_time_millis);
>>>>>>>>> 354 clear_CMS_flag(CMS_cms_wants_token);
>>>>>>>>> 355 assert(!CMS_flag_is_set(CMS_cms_has_token |
>>>>>>>>> CMS_cms_wants_token),
>>>>>>>>> 356 "Should not be set");
>>>>>>>>> 357 }
>>>>>>>>> 358
>>>>>>>>> 359 // Extra wait time check before entering the heap lock to
>>>>>>>>> get
>>>>>>>>> the collection count
>>>>>>>>> 360 if (os::elapsedTime()>= end_time) {
>>>>>>>>> 361 // Wait time is over
>>>>>>>>> 362 break;
>>>>>>>>> 363 }
>>>>>>>>>
>>>>>>>>> Modify above wait time check to make an exception for t_miliis 
>>>>>>>>> == 0:
>>>>>>>>> // Extra wait time check before checking collection count
>>>>>>>>> if (t_millis != 0&&  os::elapsedTime()>= end_time) {
>>>>>>>>> // wait time exceeded
>>>>>>>>> break;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> 364
>>>>>>>>> 365 // Total collections count after the event
>>>>>>>>> 366 unsigned int after_count;
>>>>>>>>> 367 {
>>>>>>>>> 368 MutexLockerEx hl(Heap_lock,
>>>>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>>>>> 369 after_count = gch->total_collections();
>>>>>>>>> 370 }
>>>>>>>>> 371
>>>>>>>>> 372 if (before_count != after_count) {
>>>>>>>>> 373 // There was a collection - success
>>>>>>>>> 374 break;
>>>>>>>>> 375 }
>>>>>>>>> 376 }
>>>>>>>>> 377 }
>>>>>>>>>
>>>>>>>>> While it is true that we do not have a case where the method 
>>>>>>>>> is called
>>>>>>>>> with a time of "0", I think we
>>>>>>>>> want that value to be treated correctly as "infinity". For the 
>>>>>>>>> case
>>>>>>>>> where we do not want a wait at all,
>>>>>>>>> we should use a small positive value, like "1 ms" to signal that
>>>>>>>>> intent, i.e. -XX:CMSWaitDuration=1,
>>>>>>>>> reserving CMSWaitDuration=0 to signal infinity. (We could also 
>>>>>>>>> do that
>>>>>>>>> by reserving negative values to
>>>>>>>>> signal infinity, but that would make the code in the loop a bit
>>>>>>>>> more fiddly.)
>>>>>>>>>
>>>>>>>>> As mentioned in my previous email, I'd like to see this tested 
>>>>>>>>> with
>>>>>>>>> CMSWaitDuration set to 0, positive and
>>>>>>>>> negative values (if necessary, we can reject negative value 
>>>>>>>>> settings),
>>>>>>>>> and with ExplicitGCInvokesConcurrent.
>>>>>>>>>
>>>>>>>>> Rest looks OK to me, although I am not sure how this behaves with
>>>>>>>>> iCMS, as I have forgotten that part of the
>>>>>>>>> code.
>>>>>>>>>
>>>>>>>>> Finally, in current code (before these changes) there are two 
>>>>>>>>> callers
>>>>>>>>> of the former wait_for_cms_lock() method,
>>>>>>>>> one here in sleepBeforeNextCycle() and one from the 
>>>>>>>>> precleaning loop.
>>>>>>>>> I think the right thing has been done
>>>>>>>>> in terms of leaving the latter alone.
>>>>>>>>>
>>>>>>>>> It would be good if this were checked with 
>>>>>>>>> CMSInitiatingOccupancy set
>>>>>>>>> to 0 (or a small value), CMSWaitDuration set to 0,
>>>>>>>>> -+PromotionFailureALot and checking that (1) it does not 
>>>>>>>>> deadlock (2)
>>>>>>>>> CMS cycles start very soon after the end of
>>>>>>>>> a scavenge (and not at random times as Michal has observed 
>>>>>>>>> earlier,
>>>>>>>>> although i am guessing that is difficult to test).
>>>>>>>>> It would be good to repeat the above test with iCMS as well.
>>>>>>>>>
>>>>>>>>> thanks!
>>>>>>>>> -- ramki
>>>>>>>>>
>>>>>>>>> On Thu, Dec 6, 2012 at 1:39 PM, Srinivas Ramakrishna wrote:
>>>>>>>>>> Thanks Jon for the pointer:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Dec 6, 2012 at 1:06 PM, Jon Masamitsu wrote:
>>>>>>>>>>> On 12/05/12 14:47, Srinivas Ramakrishna wrote:
>>>>>>>>>>>> The high level idea looks correct. I'll look at the details 
>>>>>>>>>>>> in a
>>>>>>>>>>>> bit (seriously this time; sorry it dropped off my plate last
>>>>>>>>>>>> time I promised).
>>>>>>>>>>>> Does anyone have a pointer to the related discussion thread on
>>>>>>>>>>>> this aias from earlier in the year, by chance, so one could
>>>>>>>>>>>> refresh one's
>>>>>>>>>>>> memory of that discussion?
>>>>>>>>>>> subj: CMSWaitDuration unstable behavior
>>>>>>>>>>>
>>>>>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/thread.html 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> also:
>>>>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/004880.html 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On to it later this afternoon, and TTYL w/review.
>>>>>>>>>> - ramki
[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic