'RE: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-hotspot-runtime-dev
Subject:    RE: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
From:       "Doerr, Martin" <martin.doerr () sap ! com>
Date:       2016-04-27 14:46:18
Message-ID: 03ca7f623bee439885e9bdeaca5ab80b () DEWDFE13DE14 ! global ! corp ! sap
[Download RAW message or body]

Hi Hiroshi,

> I think, to use the enum and semantics of C++11, callers of cmpxchg need to call
> memory-barrier after cmpxchg when all of updates in the other processes must be
> available for the following instructions of the cmpxchg. Correct?

I think the problem is that our current C++ code doesn't use seq_cst semantics for \
volatile loads. The sync at the end of the cmpxchg is only needed if a volatile load \
is following which is not preceded by a sync instruction. As long as this is the \
case, I think we should keep the sync at the end of cmpxchg.

> Do you mean that a new cmpxchg with relaxed semantics will be added and used in
> the GC change? Or, after the discussion of the new cmpxchg interface, will be
> the discussion of the GC change started?

The second.

Best regards,
Martin

From: Hiroshi H Horii [mailto:HORII@jp.ibm.com]
Sent: Mittwoch, 27. April 2016 05:34
To: Doerr, Martin <martin.doerr@sap.com>
Cc: David Holmes <david.holmes@oracle.com>; Lindenmaier, Goetz \
<goetz.lindenmaier@sap.com>; hotspot-gc-dev@openjdk.java.net; \
hotspot-runtime-dev@openjdk.java.net; ppc-aix-port-dev@openjdk.java.net; Tim Ellison \
                <Tim_Ellison@uk.ibm.com>; Volker Simonis <volker.simonis@gmail.com>
Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Hi Martin,

> I think we shouldn't better use an own enum (e.g. like AccessKind in
> library_call.cpp).
> Otherwise we'll get trouble when we switch to C++11.  Would you agree?

I agree.

I think, to use the enum and semantics of C++11, callers of cmpxchg need to call
memory-barrier after cmpxchg when all of updates in the other processes must be
available for the following instructions of the cmpxchg. Correct?

> Would it be better to split this bug into 2 and discuss the cmpxchg
> interface change on the runtime list and the GC change on the gc list?

Do you mean that a new cmpxchg with relaxed semantics will be added and used in
the GC change? Or, after the discussion of the new cmpxchg interface, will be
the discussion of the GC change started?

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


"Doerr, Martin" <martin.doerr@sap.com<mailto:martin.doerr@sap.com>> wrote on \
04/25/2016 19:25:15:

> From: "Doerr, Martin" <martin.doerr@sap.com<mailto:martin.doerr@sap.com>>
> To: Hiroshi H Horii/Japan/IBM@IBMJP, David Holmes \
>                 <david.holmes@oracle.com<mailto:david.holmes@oracle.com>>
> Cc: "hotspot-gc-dev@openjdk.java.net<mailto:hotspot-gc-dev@openjdk.java.net>" \
> <hotspot-gc- dev@openjdk.java.net<mailto:dev@openjdk.java.net>>, \
> "hotspot-runtime-dev@openjdk.java.net<mailto:hotspot-runtime-dev@openjdk.java.net>" \
> <hotspot-runtime-dev@openjdk.java.net<mailto:hotspot-runtime-dev@openjdk.java.net>>, \
> "ppc-aix-port- dev@openjdk.java.net<mailto:dev@openjdk.java.net>" \
> <ppc-aix-port-dev@openjdk.java.net<mailto:ppc-aix-port-dev@openjdk.java.net>>, Tim \
> Ellison <Tim_Ellison@uk.ibm.com<mailto:Tim_Ellison@uk.ibm.com>>, Volker Simonis \
> <volker.simonis@gmail.com<mailto:volker.simonis@gmail.com>>, "Lindenmaier, Goetz" \
>                 <goetz.lindenmaier@sap.com<mailto:goetz.lindenmaier@sap.com>>
> Date: 04/25/2016 19:26
> Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
> 
> Hi David and Hiroshi,
> 
> thank you very much for this interesting question and analysis.
> 
> I think we shouldn't better use an own enum (e.g. like AccessKind in
> library_call.cpp).
> Otherwise we'll get trouble when we switch to C++11.  Would you agree?
> 
> Would it be better to split this bug into 2 and discuss the cmpxchg
> interface change on the runtime list and the GC change on the gc list?
> 
> Best regards,
> Martin
> 
> From: Hiroshi H Horii [mailto:HORII@jp.ibm.com]
> Sent: Montag, 25. April 2016 09:10
> To: David Holmes <david.holmes@oracle.com<mailto:david.holmes@oracle.com>>
> Cc: hotspot-gc-dev@openjdk.java.net<mailto:hotspot-gc-dev@openjdk.java.net>; \
> hotspot-runtime- dev@openjdk.java.net<mailto:dev@openjdk.java.net>; \
> ppc-aix-port-dev@openjdk.java.net<mailto:ppc-aix-port-dev@openjdk.java.net>; Tim \
> Ellison <Tim_Ellison@uk.ibm.com<mailto:Tim_Ellison@uk.ibm.com>>; Volker Simonis \
> <volker.simonis@gmail.com<mailto:volker.simonis@gmail.com>>; Doerr, Martin \
> <martin.doerr@sap.com<mailto:martin.doerr@sap.com>>; Lindenmaier, Goetz \
>                 <goetz.lindenmaier@sap.com<mailto:goetz.lindenmaier@sap.com>>
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
> 
> Hi David,
> 
> Thank you for your comments and questions.
> 
> > 1. Are the current cmpxchg semantics exactly the same as
> > memory_order_seq_cst?
> 
> This is very good question..
> 
> I guess, cmpxchg needs a more conservative constraint for memory ordering
> than C++11, to add sync after a compare-and-exchange operation.
> 
> Could someone give comments or thoughts?
> 
> memory_order_seq_cst is defined as
> "Any operation with this memory order is both an acquire operation and
> a release operation, plus a single total order exists in which
> all threads
> observe all modifications (see below) in the same order."
> (http://en.cppreference.com/w/cpp/atomic/memory_order)
> 
> In my environment, g++ and xlc generate following assemblies on ppc64le.
> (interestingly, they generates the same assemblies for any memory_order)
> 
> g++ (4.9.2)
> 100008a4:   ac 04 00 7c     sync
> 100008a8:   28 50 20 7d     lwarx   r9,0,r10
> 100008ac:   00 18 09 7c     cmpw    r9,r3
> 100008b0:   0c 00 c2 40     bne-    100008bc
> 100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
> 100008b8:   f0 ff c2 40     bne-    100008a8
> 100008bc:   2c 01 00 4c     isync
> 
> xlc (13.1.3)
> 10000888:   ac 04 00 7c     sync
> 1000088c:   28 28 c0 7c     lwarx   r6,0,r5
> 10000890:   40 00 26 7c     cmpld   r6,r0
> 10000894:   0c 00 82 40     bne     100008a0
> 10000898:   2d 29 80 7c     stwcx.  r4,0,r5
> 1000089c:   f0 ff e2 40     bne+    1000088c
> 100008a0:   2c 01 00 4c     isync
> 
> On the other hand, the current OpenJDK generates following assemblies.
> 
> 508:   ac 04 00 7c     sync
> 50c:   00 00 5c e9     ld      r10,0(r28)
> 510:   00 50 3b 7c     cmpd    r27,r10
> 514:   1c 00 c2 40     bne-    530
> 518:   a8 40 5c 7d     ldarx   r10,r28,r8
> 51c:   00 50 3b 7c     cmpd    r27,r10
> 520:   10 00 c2 40     bne-    530
> 524:   ad 41 3c 7d     stdcx.  r9,r28,r8
> 528:   f0 ff c2 40     bne-    518
> 52c:   ac 04 00 7c     sync
> 530:   00 50 bb 7f     ...
> 
> Though we can ignore 50c-514 (because they are a duplicated guard condition),
> the last sync instruction (52c) makes cmpxchg more strict than
> memory_order_seq_cst.
> 
> In some cases, the last sync is necessary when this thread must be
> able to read
> all of the changes in the other threads while executing from 508 to 530
> (that processes compare-and-exchange).
> 
> > 2. Has there been a discussion already, establishing that the modified
> > GC code can indeed use memory_order_relaxed? Otherwise who is
> > postulating that and based on what evidence?
> 
> Volker and his colleagues have investigated the current GC codes
> according to this.
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> April/019079.html
> However, I believe, we need comments of other GC experts to change
> the shared codes.
> 
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
> 
> 
> David Holmes <david.holmes@oracle.com<mailto:david.holmes@oracle.com>> wrote on \
> 04/22/2016 21:57:07: 
> > From: David Holmes <david.holmes@oracle.com<mailto:david.holmes@oracle.com>>
> > To: Hiroshi H Horii/Japan/IBM@IBMJP, hotspot-runtime-
> > dev@openjdk.java.net<mailto:dev@openjdk.java.net>, \
> >                 hotspot-gc-dev@openjdk.java.net<mailto:hotspot-gc-dev@openjdk.java.net>
> >                 
> > Cc: Tim Ellison <Tim_Ellison@uk.ibm.com<mailto:Tim_Ellison@uk.ibm.com>>, \
> >                 ppc-aix-port-dev@openjdk.java.net<mailto:ppc-aix-port-dev@openjdk.java.net>
> >                 
> > Date: 04/22/2016 21:58
> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> > copy_to_survivor for ppc64
> > 
> > Hi Hiroshi,
> > 
> > Two initial questions:
> > 
> > 1. Are the current cmpxchg semantics exactly the same as
> > memory_order_seq_cst?
> > 
> > 2. Has there been a discussion already, establishing that the modified
> > GC code can indeed use memory_order_relaxed? Otherwise who is
> > postulating that and based on what evidence?
> > 
> > Missing memory barriers have caused very difficult to track down bugs in
> > the past - very rare race conditions. So any relaxation here has to be
> > done with extreme confidence.
> > 
> > Thanks,
> > David
> > 
> > On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
> > > Dear all:
> > > 
> > > Can I please request reviews for the following change?
> > > 
> > > Code change:
> > > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
> > > (I initially created and Martin enhanced so much)
> > > 
> > > This change follows the discussion started from this mail.
> > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> > April/018960.html
> > > 
> > > Description:
> > > This change provides relaxed compare-and-exchange by introducing
> > > similar semantics of C++ atomic memory operators, enum memory_order.
> > > As described in atomic_linux_ppc.inline.hpp, the current implementation of
> > > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
> > > general purposes because twice calls of sync before and after cmpxchg will
> > > provide strict consistency. However, they sometimes cause overheads
> > > because
> > > sync instructions are very expensive in the current POWER chip design.
> > > In addition, for the other platforms, such as aarch64, this strict
> > > semantics
> > > may cause some overheads (according to the Andrew's mail).
> > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> > April/019073.html
> > > 
> > > With this change, callers can explicitly specify constraints of memory
> > > ordering
> > > for cmpxchg with an additional parameter, memory_order order.
> > > 
> > > typedef enum memory_order {
> > > memory_order_relaxed,
> > > memory_order_consume,
> > > memory_order_acquire,
> > > memory_order_release,
> > > memory_order_acq_rel,
> > > memory_order_seq_cst
> > > } memory_order;
> > > 
> > > Because the default value of the parameter is memory_order_seq_cst,
> > > existing codes can use the same semantics of cmpxchg without any
> > > modification. The relaxed cmpxchg is implemented only on ppc
> > > in this changeset. Therefore, the behavior on the other platforms will
> > > not be changed with this changeset.
> > > 
> > > In addition, with the new parameter of cmpxchg, this change improves
> > > performance of copy_to_survivor in the parallel GC.
> > > copy_to_survivor changes forward pointers by using cmpxchg. This
> > > operation doesn't require any sync instructions.  A pointer is changed
> > > at most once in a GC and when cmpxchg fails, the latest pointer is
> > > available for the caller. cas_set_mark and cas_forward_to are extended
> > > with an additional memory_order parameter as cmpxchg and copy_to_survivor
> > > uses memory_order_relaxed to modify the forward pointers.
> > > 
> > > Summary of source code changes:
> > > 
> > > * src/share/vm/runtime/atomic.hpp
> > > - Defines enum memory_order and adds a parameter to cmpxchg.
> > > 
> > > * src/share/vm/runtime/atomic.cpp
> > > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
> > > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> > > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> > > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
> > > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
> > > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
> > > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
> > > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
> > > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
> > > - Added a parameter for each cmpxchg function to follow
> > > the change of atomic.hpp. Their implementations are not changed.
> > > 
> > > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
> > > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
> > > - Added a parameter for each cmpxchg function to follow
> > > the change of atomic.hpp. In addition, implementations
> > > are changed corresponding to the specified memory_order.
> > > 
> > > * src/share/vm/oops/oop.hpp
> > > * src/share/vm/oops/oop.inline.hpp
> > > - Add a memory_order parameter to use relaxed cmpxchg in
> > > cas_set_mark and cas_forward_to.
> > > 
> > > * src/share/vm/gc/parallel/psPromotionManager.cpp
> > > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> > > 
> > > Martin tested this changeset  on linuxx86_64, linuxppc64le and
> > > darwinintel64.
> > > Though more time is needed to test on the other platform, we would like to
> > > ask
> > > reviews and start discussion on this changeset.
> > > I also tested this changeset with SPECjbb2013 and confirmed that gc pause
> > > time
> > > is reduced.
> > > 
> > > Regards,
> > > Hiroshi
> > > -----------------------
> > > Hiroshi Horii, Ph.D.
> > > IBM Research - Tokyo
> > > 
> > > 
> > 


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic