--===============3611885960121178894== Content-Type: multipart/alternative; boundary=bcaec548a05fb771b1051b57a4a9 --bcaec548a05fb771b1051b57a4a9 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Mon, Jul 20, 2015 at 3:50 PM, John McCall wrote: > On Jul 17, 2015, at 4:56 PM, Richard Smith wrote: > On Fri, Jul 17, 2015 at 3:23 PM, John McCall wrote: > >> On Jul 17, 2015, at 2:49 PM, Richard Smith wrote= : >> On Fri, Jul 17, 2015 at 2:05 PM, Philip Reames > > wrote: >> >>> >>> >>> On 07/16/2015 02:38 PM, Richard Smith wrote: >>> >>> On Thu, Jul 16, 2015 at 2:03 PM, John McCall >>> wrote: >>> >>>> On Jul 16, 2015, at 11:46 AM, Richard Smith >>>> wrote: >>>> On Thu, Jul 16, 2015 at 11:29 AM, John McCall >>>> wrote: >>>> >>>>> > On Jul 15, 2015, at 10:11 PM, Hal Finkel wrote: >>>>> > >>>>> > Hi everyone, >>>>> > >>>>> > C++11 added features that allow for certain parts of the class >>>>> hierarchy to be closed, specifically the 'final' keyword and the sema= ntics >>>>> of anonymous namespaces, and I think we take advantage of these to en= hance >>>>> our ability to perform devirtualization. For example, given this situ= ation: >>>>> > >>>>> > struct Base { >>>>> > virtual void foo() =3D 0; >>>>> > }; >>>>> > >>>>> > void external(); >>>>> > struct Final final : Base { >>>>> > void foo() { >>>>> > external(); >>>>> > } >>>>> > }; >>>>> > >>>>> > void dispatch(Base *B) { >>>>> > B->foo(); >>>>> > } >>>>> > >>>>> > void opportunity(Final *F) { >>>>> > dispatch(F); >>>>> > } >>>>> > >>>>> > When we optimize this code, we do the expected thing and inline >>>>> 'dispatch' into 'opportunity' but we don't devirtualize the call to f= oo(). >>>>> The fact that we know what the vtable of F is at that callsite is not >>>>> exploited. To a lesser extent, we can do similar things for final vir= tual >>>>> methods, and derived classes in anonymous namespaces (because Clang c= ould >>>>> determine whether or not a class (or method) there is effectively fin= al). >>>>> > >>>>> > One possibility might be to @llvm.assume to say something about wha= t >>>>> the vtable ptr of F might be/contain should it be needed later when w= e emit >>>>> the initial IR for 'opportunity' (and then teach the optimizer to use= that >>>>> information), but I'm not at all sure that's the best solution. Thoug= hts? >>>>> >>>>> The problem with any sort of @llvm.assume-encoded information about >>>>> memory contents is that C++ does actually allow you to replace object= s in >>>>> memory, up to and including stuff like: >>>>> >>>>> { >>>>> MyClass c; >>>>> >>>>> // Reuse the storage temporarily. UB to access the object through >>>>> =E2=80=98c=E2=80=99 now. >>>>> c.~MyClass(); >>>>> auto c2 =3D new (&c) MyOtherClass(); >>>>> >>>>> // The storage has to contain a =E2=80=98MyClass=E2=80=99 when it g= oes out of scope. >>>>> c2->~MyOtherClass(); >>>>> new (&c) MyClass(); >>>>> } >>>>> >>>>> The standard frontend devirtualization optimizations are permitted >>>>> under a couple of different language rules, specifically that: >>>>> 1. If you access an object through an l-value of a type, it has to >>>>> dynamically be an object of that type (potentially a subobject). >>>>> 2. Object replacement as above only =E2=80=9Cforwards=E2=80=9D existi= ng formal >>>>> references under specific conditions, e.g. the dynamic type has to be= the >>>>> same, =E2=80=98const=E2=80=99 members have to have the same value, et= c. Using an >>>>> unforwarded reference (like the name of the local variable =E2=80=98c= =E2=80=99 above) >>>>> doesn=E2=80=99t formally refer to a valid object and thus has undefin= ed behavior. >>>>> >>>>> You can apply those rules much more broadly than the frontend does, o= f >>>>> course; but those are the language tools you get. >>>> >>>> >>>> Right. Our current plan for modelling this is: >>>> >>>> 1) Change the meaning of the existing !invariant.load metadata (or >>>> add another parallel metadata kind) so that it allows load-load forwar= ding >>>> (even if the memory is not known to be unmodified between the loads) i= f: >>>> >>>> >>>> invariant.load currently allows the load to be reordered pretty >>>> aggressively, so I think you need a new metadata. >>>> >>> >>> Our thoughts were: >>> 1) The existing !invariant.load is redundant because it's exactly >>> equivalent to a call to @llvm.invariant.start and a load. >>> 2) The new semantics are a more strict form of the old semantics, so no >>> special action is required to upgrade old IR. >>> ... so changing the meaning of the existing metadata seemed preferable >>> to adding a new, similar-but-not-quite-identical, form of the metadata.= But >>> either way seems fine. >>> >>> I'm going to argue pretty strongly in favour of the new form of >>> metadata. We've spent a lot of time getting !invariant.load working we= ll >>> for use cases like the "length" field in a Java array and I'd really ha= te >>> to give that up. >>> >>> (One way of framing this is that the current !invariant.load gives a >>> guarantee that there can't be a @llvm.invariant.end call anywhere in th= e >>> program and that any @llvm.invariant.start occurs outside the visible s= cope >>> of the compilation unit (Module, LTO, what have you) and must have exec= uted >>> before any code contained in said module which can describe the memory >>> location can execute. FYI, that last bit of strange wording is to allo= w >>> initialization inside a malloc like function which returns a noalias >>> pointer.) >>> >> >> I had overlooked that !invariant.load also applies for loads /before/ th= e >> invariant load. I agree that this is different both from what we're >> proposing and from what you can achieve with @llvm.invariant.start. I wo= uld >> expect that you can use our metadata for the length in a Java array -- i= t >> seems like it'd be straightforward for you to arrange that all loads of = the >> array field have the metadata (and that you use the same operand on all = of >> them) -- but there's no real motivation behind reusing the existing >> metadata besides simplicity and cleanliness. >> >> I'm definitely open to working together on a revised version of a more >>> general invariant mechanism. In particular, we don't have a good way o= f >>> modelling Java's "final" fields* in the IR today since the initializati= on >>> logic may be visible to the compiler. Coming up with something which >>> supports both use cases would be really useful. >>> >> >> This seems like something that our proposed mechanism may be able to >> support; we intend to use it for const and reference data members in C++= , >> though the semantics of those are not quite the same. >> >> >> ObjC (and Swift, and probably a number of other languages) has a >> optimization opportunity where there=E2=80=99s a global variable that=E2= =80=99s known to be >> constant after its initialization. (For the initiated, I=E2=80=99m talk= ing here >> primarily about ivar offset variables.) However, that initialization is >> run lazily, and it=E2=80=99s only at specific points within the program = that we can >> guarantee that it=E2=80=99s already been performed. (Namely, before iva= r accesses >> or after message sends to the class (but not to instances, because of >> nil).) Those points usually guarantee the initialization of more than o= ne >> variable, and contrariwise, there are often several such points that wou= ld >> each individually suffice to establish the guarantee for a particular lo= ad, >> allowing it to be hoisted/reordered/combined at will. >> >> So e.g. >> >> if (cond) { >> // Here there=E2=80=99s an operation that proves to us that A, B, an= d C are >> initialized. >> } else { >> // Here there=E2=80=99s an operation that proves it for just A and B= . >> } >> >> for (;;) { >> // Here we load A. This should be hoist able out of this loop, >> independently of whatever else happens in this loop. >> } >> >> This is actually the situation where ObjC currently uses !invariant.load= , >> except that we can only safely use it in specific functions (ObjC method >> implementations) that guarantee initialization before entry and which ca= n >> never be inlined. >> >> Now, I think something like invariant.start would help with this, except >> that I=E2=80=99m concerned that we=E2=80=99d have to eagerly emit what m= ight be dozens of >> invariant.starts at every point that established the guarantee, which wo= uld >> be pretty wasteful even for optimized builds. If we=E2=80=99re designin= g new >> metadata anyway, or generalizing existing metadata, can we try to make t= his >> more scalable, so that e.g. I can use a single intrinsic with a list of = the >> invariants it establishes, ideally in a way that=E2=80=99s sharable betw= een calls? >> > > It seems we have three different use cases: > > 1) This invariant applies to this load and all future loads of this > pointer (ObjC / Swift constants, Java final members) > 2) This invariant applies to this load and all past and future loads of > this pointer (Java array length) > > > Hmm. I=E2=80=99m not really seeing what you=E2=80=99re saying about past= and future. The > difference is that the invariant holds, but only after a certain point; > reordering the load earlier across side-effects etc. is fine (great, even= ), > it just can=E2=80=99t cross a particular line. > > I assume the only reason that bounding the invariant isn=E2=80=99t import= ant for > Philip's array-length is that the initialization is done opaquely, so tha= t > the optimizer can=E2=80=99t see the pointer until it=E2=80=99s been prope= rly initialized. > That is, the bound is enforced by SSA. > > I think the real difference here is whether SSA value identity can give u= s > sufficient information or not. > > For C++ vtables and const/reference members, it does, because the > semantics are dependent on =E2=80=9Cblessed=E2=80=9D object references (t= he result of the > new-expression, the name of the local variable, etc.) which mostly have > undefined behavior if re-written. Maybe you need to make some effort to > not have GEPs down to base classes break the optimization, but that=E2=80= =99s > probably easy. > > For the Java cases, it still does, because the object becomes immutable > past a certain point (the return from the new operation), which also > narrows most/all of the subsequent accesses to a single value. So you ca= n > bless the object at that point; sure, you theoretically give up some > optimization opportunities if =E2=80=98this=E2=80=99 was stored aside dur= ing construction, > but it=E2=80=99s not a big deal. > > For my ObjC/Swift cases, it=E2=80=99s not good enough, because the addres= ses that > become immutable are global. Each load is going to re-derive the address > from the global, so no amount of SSA value propagation is going to help; > thus the barrier has to be more like a memory barrier than just an SSA > barrier. (The biggest distinguishing feature from actual atomic memory > barriers is that dominating barriers trivialize later barriers, regardles= s > of what happens in between.) > > (Of course, Swift has situations more like the C++ and Java opportunities= , > too.) > > So maybe these are really different problems and there=E2=80=99s no usefu= l shared > generalization. > I'm increasingly thinking that's the case; we now intend to propose adding new metadata rather than extending / generalizing !invariant.load. --bcaec548a05fb771b1051b57a4a9 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
On M= on, Jul 20, 2015 at 3:50 PM, John McCall <rjmccall@apple.com> wrote:
On Jul 17, = 2015, at 4:56 PM, Richard Smith <richard@metafoo.co.uk> wrote:
On Fri, Jul = 17, 2015 at 3:23 PM, John McCall <rjmccall@apple.com> wrote= :
On Jul 17, 2015, at 2:49 PM, Ric= hard Smith <r= ichard@metafoo.co.uk> wrote:
On Fri, Jul 17, 2015 at 2:05 PM= , Philip Reames <listmail@philipreames.com> wrote:
=20 =20 =20


On 07/16/2015 02:38 PM, Richard Smith wrote:
On Thu, Jul 16, 2015 at 2:03 PM, John McCall <rjmccall@apple.com> wrote:
On Jul 16, 2015, at 11:46 AM, Richard Smith <richard@metafoo.co.uk> wrote:
On Thu, Jul 16, 2015 at 11:29 AM, John McCall <rjmccall@a= pple.com> wrote:
> O= n Jul 15, 2015, at 10:11 PM, Hal Finkel <hfinkel@anl.gov> wrote:
>
> Hi everyone,
>
> C++11 added features that allow for certain parts of the class hierarchy to be closed, specifically the 'final' keyword and the semantics of anonymous namespaces, and I think we take advantage of these to enhance our ability to perform devirtualization. For example, given this situation:
>
> struct Base {
>=C2=A0 virtual void foo() =3D 0; > };
>
> void external();
> struct Final final : Base {
>=C2=A0 void foo() {
>=C2=A0 =C2=A0 external();
>=C2=A0 }
> };
>
> void dispatch(Base *B) {
>=C2=A0 B->foo();
> }
>
> void opportunity(Final *F) {
>=C2=A0 dispatch(F);
> }
>
> When we optimize this code, we do the expected thing and inline 'dispatch' into 'opportunit= y' but we don't devirtualize the call to foo(). The fact that we know what the vtable of F is at that callsite is not exploited. To a lesser extent, we can do similar things for final virtual methods, and derived classes in anonymous namespaces (because Clang could determine whether or not a class (or method) there is effectively final).
>
> One possibility might be to @llvm.assume to say something about what the vtable ptr of F might be/contain should it be needed later when we emit the initial IR for 'opportunity' (and then teach t= he optimizer to use that information), but I'm not at all sure that's = the best solution. Thoughts?

The problem with any sort of @llvm.assume-encoded information about memory contents is that C++ does actually allow you to replace objects in memory, up to and including stuff like:

{
=C2=A0 MyClass c;

=C2=A0 // Reuse the storage temporarily.= =C2=A0 UB to access the object through =E2=80=98= c=E2=80=99 now.
=C2=A0 c.~MyClass();
=C2=A0 auto c2 =3D new (&c) MyOtherClass();

=C2=A0 // The storage has to contain a =E2=80=98MyClass=E2=80=99 when it goes ou= t of scope.
=C2=A0 c2->~MyOtherClass();
=C2=A0 new (&c) MyClass();
}

The standard frontend devirtualization optimizations are permitted under a couple of different language rules, specifically that:
1. If you access an object through an l-value of a type, it has to dynamically be an object of that type (potentially a subobject).
2. Object replacement as above only =E2=80=9Cforwards=E2=80=9D existing forma= l references under specific conditions, e.g. the dynamic type has to be the same, =E2=80=98const=E2=80=99 members have to h= ave the same value, etc.=C2=A0 Using an unforwarded reference (like the name of the local variable =E2=80=98c=E2=80=99 above) doesn= =E2=80=99t formally refer to a valid object and thus has undefined behavior.

You can apply those rules much more broadly than the frontend does, of course; but those are the language tools you get.

Right. Our current plan for modelling this is:

1) Change the meaning of the existing !invariant.load metadata (or add another parallel metadata kind) so that it allows load-load forwarding (even if the memory is not known to be unmodified between the loads) if:

invariant.load currently allows the load to be reordered pretty aggressively, so I think you need a new metadata.

Our thoughts were:
1) The existing !invariant.load is redundant because it's exactly equivalent to a call to @llvm.invariant.star= t and a load.
2) The new semantics are a more strict form of the old semantics, so no special action is required to upgrade old IR.
... so changing the meaning of the existing metadata seemed preferable to adding a new, similar-but-not-quite-identical, form of the metadata. But either way seems fine.
I'm going to argue pretty strongly in favour of the new form of metadata.=C2=A0 We've spent a lot of time getting !invariant.load w= orking well for use cases like the "length" field in a Java array an= d I'd really hate to give that up.

(One way of framing this is that the current !invariant.load gives a guarantee that there can't be a @llvm.invariant.end call anywhere i= n the program and that any @llvm.invariant.start occurs outside the visible scope of the compilation unit (Module, LTO, what have you) and must have executed before any code contained in said module which can describe the memory location can execute.=C2=A0 FYI, that las= t bit of strange wording is to allow initialization inside a malloc like function which returns a noalias pointer.)
<= div>
I had overlooked that !invariant.load also applies for l= oads /before/ the invariant load. I agree that this is different both from = what we're proposing and from what you can achieve with @llvm.invariant= .start. I would expect that you can use our metadata for the length in a Ja= va array -- it seems like it'd be straightforward for you to arrange th= at all loads of the array field have the metadata (and that you use the sam= e operand on all of them) -- but there's no real motivation behind reus= ing the existing metadata besides simplicity and cleanliness.
I'm definitely open to working together on a revised version of a more general invariant mechanism.=C2=A0 In particular, we don't hav= e a good way of modelling Java's "final" fields* in the IR to= day since the initialization logic may be visible to the compiler.=C2=A0 Coming u= p with something which supports both use cases would be really useful.

This seems like somet= hing that our proposed mechanism may be able to support; we intend to use i= t for const and reference data members in C++, though the semantics of thos= e are not quite the same.
ObjC (and Swift, and probably a number of other languag= es) has a optimization opportunity where there=E2=80=99s a global variable = that=E2=80=99s known to be constant after its initialization. =C2=A0(For th= e initiated, I=E2=80=99m talking here primarily about ivar offset variables= .) =C2=A0However, that initialization is run lazily, and it=E2=80=99s only = at specific points within the program that we can guarantee that it=E2=80= =99s already been performed. =C2=A0(Namely, before ivar accesses or after m= essage sends to the class (but not to instances, because of nil).) =C2=A0Th= ose points usually guarantee the initialization of more than one variable, = and contrariwise, there are often several such points that would each indiv= idually suffice to establish the guarantee for a particular load, allowing = it to be hoisted/reordered/combined at will.

So e.= g.

=C2=A0 if (cond) {
=C2=A0 =C2=A0 // H= ere there=E2=80=99s an operation that proves to us that A, B, and C are ini= tialized.
=C2=A0 } else {
=C2=A0 =C2=A0 // Here there= =E2=80=99s an operation that proves it for just A and B.
=C2=A0 }=

=C2=A0 for (;;) {
=C2=A0 =C2=A0 // Here= we load A.=C2=A0 This should be hoist able out of this loop, independently= of whatever else happens in this loop.
=C2=A0 }

This is actually the situation where ObjC currently uses !invarian= t.load, except that we can only safely use it in specific functions (ObjC m= ethod implementations) that guarantee initialization before entry and which= can never be inlined.

Now, I think something like= invariant.start would help with this, except that I=E2=80=99m concerned th= at we=E2=80=99d have to eagerly emit what might be dozens of invariant.star= ts at every point that established the guarantee, which would be pretty was= teful even for optimized builds.=C2=A0 If we=E2=80=99re designing new metad= ata anyway, or generalizing existing metadata, can we try to make this more= scalable, so that e.g. I can use a single intrinsic with a list of the inv= ariants it establishes, ideally in a way that=E2=80=99s sharable between ca= lls?

It seems we have three dif= ferent use cases:

1) This invariant applies to thi= s load and all future loads of this pointer (ObjC / Swift constants, Java f= inal members)
2) This invariant applies to this load and all past= and future loads of this pointer (Java array length)

Hmm.=C2=A0 I=E2=80=99m not = really seeing what you=E2=80=99re saying about past and future.=C2=A0 The d= ifference is that the invariant holds, but only after a certain point; reor= dering the load earlier across side-effects etc. is fine (great, even), it = just can=E2=80=99t cross a particular line.

I assu= me the only reason that bounding the invariant isn=E2=80=99t important for = Philip's array-length is that the initialization is done opaquely, so t= hat the optimizer can=E2=80=99t see the pointer until it=E2=80=99s been pro= perly initialized.=C2=A0 That is, the bound is enforced by SSA.
<= br>
I think the real difference here is whether SSA value identit= y can give us sufficient information or not.

For C= ++ vtables and const/reference members, it does, because the semantics are = dependent on =E2=80=9Cblessed=E2=80=9D object references (the result of the= new-expression, the name of the local variable, etc.) which mostly have un= defined behavior if re-written.=C2=A0 Maybe you need to make some effort to= not have GEPs down to base classes break the optimization, but that=E2=80= =99s probably easy.

For the Java cases, it still d= oes, because the object becomes immutable past a certain point (the return = from the new operation), which also narrows most/all of the subsequent acce= sses to a single value.=C2=A0 So you can bless the object at that point; su= re, you theoretically give up some optimization opportunities if =E2=80=98t= his=E2=80=99 was stored aside during construction, but it=E2=80=99s not a b= ig deal.

For my ObjC/Swift cases, it=E2=80=99s not= good enough, because the addresses that become immutable are global.=C2=A0= Each load is going to re-derive the address from the global, so no amount = of SSA value propagation is going to help; thus the barrier has to be more = like a memory barrier than just an SSA barrier. =C2=A0(The biggest distingu= ishing feature from actual atomic memory barriers is that dominating barrie= rs trivialize later barriers, regardless of what happens in between.)
=

(Of course, Swift has situations more like the C++ and = Java opportunities, too.)

So maybe these are reall= y different problems and there=E2=80=99s no useful shared generalization.

I'm increasingly thinking th= at's the case; we now intend to propose adding new metadata rather than= extending / generalizing !invariant.load.
--bcaec548a05fb771b1051b57a4a9-- --===============3611885960121178894== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ cfe-dev mailing list cfe-dev@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev --===============3611885960121178894==--