
List:       openjdk-hotspot-dev
Subject:    Re: RFR: JDK-8058209: Race in G1 card scanning could allow scanning of memory covered by PLABs
From:       Mikael Gerdin <mikael.gerdin@oracle.com>
Date:       2014-11-14 15:32:47
Message-ID: 5466209F.1050109@oracle.com

Hi Bill,

On 2014-11-14 15:48, bill pittore wrote:
> I don't want to hold up your current push but have a question inline.

Thanks. I'm in a bit of a hurry to get this in, in time to be able to 
backport it to 8u40.

>
> On 11/14/2014 8:20 AM, Mikael Gerdin wrote:
>> All,
>>
>> I've just realized (with the help of Stefan K.) that my webrev.1 is in
>> fact incorrect. It actually re-introduces the race by using a local
>> variable for the time stamp assert.
>>
>> With that in mind I want to push the contents of webrev.0 (with the
>> indentation fix) instead since that has gone through extensive testing.
>>
>> I plan to push this today and I'm already working on a bunch of
>> cleanups to this code.
>>
>> /Mikael
>>
>> On 2014-11-11 16:18, Mikael Gerdin wrote:
>>> Hi all,
>>>
>>> I've sent this to hotspot-dev instead of just hotspot-gc-dev in the hope
>>> of getting some extra feedback from our resident concurrency experts.
>>>
>>> Please review this subtle change to the order in which we read fields in
>>> G1OffsetTableContigSpace::saved_mark_word; the original is included here
>>> for reference:
>>>
>>> HeapWord* G1OffsetTableContigSpace::saved_mark_word() const {
>>>   G1CollectedHeap* g1h = G1CollectedHeap::heap();
>>>   assert(_gc_time_stamp <= g1h->get_gc_time_stamp(), "invariant");
>>>   if (_gc_time_stamp < g1h->get_gc_time_stamp())
>>>     return top();
>>>   else
>>>     return Space::saved_mark_word();
>>> }
>>>
>>>
>>> When getting a new GC alloc region, several stores are performed whose
>>> ordering needs to be enforced, and several synchronization points
>>> occur.
>>> [write path]
>>> ST(_saved_mark_word)
>>> #StoreStore
>>> ST(_gc_time_stamp)
>>> ST(_top) // satisfying alloc request
> If at this point in time the read thread, running on some other core (on
> a system with weak memory ordering), reads _top and _gc_time_stamp, I'm
> not convinced that the values will always be what you think. The store
> of _top could float above the store of _gc_time_stamp. I don't know the
> GC code in question, so I don't know if that's good or bad, but I do
> think it's possible. If you need strict ordering of the two writes, a
> StoreStore between them will guarantee that. Bertrand or David H.,
> please correct me if I'm wrong.

I think you are correct about this. I chose to only fix this for the 
case I can verify on x86 to make a minimal fix for the crash in 8u40.

As I mentioned earlier in the thread, I plan to rework this code a bit to 
reduce the number of races and to introduce proper usage of barriers 
(and/or use store_release/load_acquire, as I think they can reduce the 
overhead slightly).
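
For illustration, here is a rough sketch of the direction I mean, using
C++11 atomics instead of HotSpot's OrderAccess (the struct, field types
and function names below are illustrative only, not the actual G1 code):

#include <atomic>

struct Region {
  std::atomic<unsigned int> _gc_time_stamp;
  std::atomic<char*>        _top;
  std::atomic<char*>        _saved_mark_word;
};

// Writer side: the release store of the time stamp orders the earlier
// _saved_mark_word store before it, replacing a raw #StoreStore.
void set_mark_and_stamp(Region* r, char* mark, unsigned int ts) {
  r->_saved_mark_word.store(mark, std::memory_order_relaxed);
  r->_gc_time_stamp.store(ts, std::memory_order_release);
}

// Reader side: the acquire load of _top keeps the later time stamp
// load from floating above it, replacing an explicit loadload().
char* saved_mark_word(Region* r, unsigned int heap_ts) {
  char* local_top = r->_top.load(std::memory_order_acquire);
  if (r->_gc_time_stamp.load(std::memory_order_acquire) < heap_ts) {
    return local_top;
  } else {
    return r->_saved_mark_word.load(std::memory_order_relaxed);
  }
}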

/Mikael

>
> thanks,
> bill
>>> #StoreStore
>>> ST(_alloc_region) // publishing to other gc workers
>>> #MonitorEnter
>>> ST(_top) // potential further allocations
>>> #MonitorExit
>>> #MonitorEnter
>>> ST(_top) // potential further allocations
>>> #MonitorExit
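>>>
>>> Schematically, that ordering corresponds to something like the
>>> following minimal C++11 sketch (illustrative only, not HotSpot code;
>>> relaxed atomic stores plus release fences stand in for the raw stores
>>> and the #StoreStore barriers, and there is deliberately no barrier
>>> between the time stamp and top stores, mirroring the sequence above):
>>>
>>> #include <atomic>
>>>
>>> std::atomic<char*>        _saved_mark_word;
>>> std::atomic<unsigned int> _gc_time_stamp;
>>> std::atomic<char*>        _top;
>>> std::atomic<void*>        _alloc_region;
>>>
>>> void publish_gc_alloc_region(char* mark, unsigned int ts,
>>>                              char* new_top, void* region) {
>>>   _saved_mark_word.store(mark, std::memory_order_relaxed);
>>>   std::atomic_thread_fence(std::memory_order_release); // #StoreStore
>>>   _gc_time_stamp.store(ts, std::memory_order_relaxed);
>>>   _top.store(new_top, std::memory_order_relaxed);      // satisfy alloc
>>>   std::atomic_thread_fence(std::memory_order_release); // #StoreStore
>>>   _alloc_region.store(region, std::memory_order_relaxed); // publish
>>> }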
>>>
>>> When we inspect a region during remembered set scanning we need to
>>> ensure that we never read memory which has been allocated by a GC
>>> worker thread for the purpose of copying objects into it.
>>> The way this works is that a time stamp field is supposed to signal to a
>>> scanning thread that it should look at addresses below _top if the time
>>> stamp is old or addresses below _saved_mark_word if the time stamp is
>>> current.
>>>
>>> The current code does (as seen above)
>>> [read path]
>>> LD(_gc_time_stamp)
>>> LD(_top)
>>> or (depending on time stamp)
>>> LD(_saved_mark_word)
>>>
>>> Because these values are written to without full mutual exclusion, we
>>> need to be very careful about the order in which we read them, and
>>> this is where I argue that the current code is incorrect.
>>> In order to observe a consistent view of the ordered stores in the
>>> [write path] above we need to load the values in the reverse order they
>>> were written, with proper #LoadLoad ordering enforced.
>>>
>>> The problem we've observed here is that after we've read a time stamp
>>> that is below the heap time stamp, the top pointer can be updated by a
>>> GC worker allocating objects into this region. To make sure that the
>>> top value we see is in fact valid, we must read it before we read the
>>> time stamp, which determines which value we should return from the
>>> saved_mark_word function.
>>>
>>> My suggested fix is to load _top first, with #LoadLoad ordering
>>> enforced:
>>> HeapWord* G1OffsetTableContigSpace::saved_mark_word() const {
>>>   G1CollectedHeap* g1h = G1CollectedHeap::heap();
>>>   assert(_gc_time_stamp <= g1h->get_gc_time_stamp(), "invariant");
>>>   HeapWord* local_top = top();
>>>   OrderAccess::loadload();
>>>   if (_gc_time_stamp < g1h->get_gc_time_stamp()) {
>>>     return local_top;
>>>   } else {
>>>     return Space::saved_mark_word();
>>>   }
>>> }
>>>
>>> I've successfully reproduced the crash with the original code by adding
>>> some random sleep calls between the load of the time stamp and the load
>>> of top, so I'm fairly certain that this resolves the issue. I've also
>>> verified that the fix I'm proposing resolves the bug for the team
>>> which encountered the issue, even if I can't reproduce that crash
>>> locally.
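>>>
>>> Schematically, the instrumented version looked something like the
>>> following (the local time stamp variable and the sleep comment are
>>> just illustration, not the exact instrumentation):
>>>
>>> HeapWord* G1OffsetTableContigSpace::saved_mark_word() const {
>>>   G1CollectedHeap* g1h = G1CollectedHeap::heap();
>>>   unsigned int ts = _gc_time_stamp; // buggy order: time stamp first
>>>   // random sleep injected here: it widens the window in which a GC
>>>   // worker can allocate into the region and advance _top over a PLAB
>>>   if (ts < g1h->get_gc_time_stamp()) {
>>>     return top(); // may now read a _top covering fresh PLAB memory
>>>   } else {
>>>     return Space::saved_mark_word();
>>>   }
>>> }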
>>>
>>> I also plan to attempt to design around some of the races in this code
>>> to reduce its complexity, but for the sake of backporting the fix to
>>> 8u40 I'd like to start with just adding the minimal fix.
>>>
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8058209
>>> Webrev: http://cr.openjdk.java.net/~mgerdin/8058209/webrev.0/
>>> Testing: JPRT, local kitchensink (4 hours), gc test suite
>>>
>>> Thanks
>>> /Mikael
>