From cfe-dev Mon Aug 03 22:30:40 2015 From: Richard Smith Date: Mon, 03 Aug 2015 22:30:40 +0000 To: cfe-dev Subject: Re: [cfe-dev] [PROPOSAL] Reintroduce guards for Intel intrinsic headers Message-Id: X-MARC-Message: https://marc.info/?l=cfe-dev&m=143864126915966 MIME-Version: 1 Content-Type: multipart/mixed; boundary="--===============3998022752456802752==" --===============3998022752456802752== Content-Type: multipart/alternative; boundary=20cf30334c17307a5b051c6fb8ce --20cf30334c17307a5b051c6fb8ce Content-Type: text/plain; charset=UTF-8 On Mon, Aug 3, 2015 at 1:37 PM, Hans Wennborg wrote: > On Sun, Aug 2, 2015 at 5:09 PM, Chandler Carruth > wrote: > > Would cherrypicking the diagnostics to the 3.7 branch be better or worse? > > (I'm of two minds, curious what others think...) > > The alternative of reverting would have the downside of missing out on > some of the target attribute functionality in 3.7. I haven't been > following closely enough to determine how great a loss that would be, > but as far as I understand, this is still a work in progress, right? > > The alternative of cherry-picking diagnostics has the problem that I > don't think any diagnostics have landed yet :-) > > My inclination is to revert back to safety here. The revert would be > pretty hairy though, as there have been a number of changes to these > files since r239883 landed. I'm still trying to figure this out. Here's my view: sometimes features don't make the completeness bar in time for a release. That's normal and not something to be apologetic / troubled about; we'll get the feature in the next release, and we shouldn't try to rush it in. This feature is unusual in that the transitional period has temporarily left us with (arguably) a regression, and that happened to be the state when the 3.7 branch was cut, but our normal philosophies should apply: we have time-based releases, not feature-based ones, and our first response to regressions is to revert to green. So I think reverting r239883 is the way to go. If the diagnostic work is done in time, we can consider unreverting and patching it across. > On Sun, Aug 2, 2015 at 12:00 PM Justin Bogner > wrote: > >> > >> Eric Christopher writes: > >> > On Thu, Jul 30, 2015 at 10:12 AM Reid Kleckner > wrote: > >> > > >> > I'm opposed to this. Going forward, I would really like target > >> > intrinsics > >> > to be available regardless of the current feature set, so users > >> > don't need > >> > hacks like these. > >> > > >> > Agreed. > >> > > >> > > >> > I see two ways to do this with different tradeoffs: > >> > 1. Diagnose missing target attributes when calling the intel > >> > intrinsics. I > >> > was surprised to find that we don't already do this. > >> > > >> > Sorry. This is on my list of things to do. > >> > >> +hans > >> > >> I agree with the direction of moving to use target attributes instead of > >> relying on flaky ifdefs, but without any errors or warnings here this is > >> a pretty serious diagnostic regression. > >> > >> I think we should revert this on the 3.7 branch. It can stay as is on > >> trunk assuming the diagnostics are coming soon. > >> > >> Right now we end up in spaces where we get crashes in the backend > >> instead of a sensible error in far too many situations. Notably: > >> > >> https://llvm.org/bugs/show_bug.cgi?id=24125 > >> https://llvm.org/bugs/show_bug.cgi?id=24087 > >> https://llvm.org/bugs/show_bug.cgi?id=24335 > >> > >> Additionally, I'm told this causes issues with configure scripts > >> misdetecting available features, as well as strange compatibility issues > >> like the one that led to this thread. > >> > >> This feature is woefully incomplete. We need the warnings/errors for it > >> to be acceptable quality. > >> > >> > > >> > 2. We could support some automatic transfer of the target > attribute > >> > to the > >> > caller when calling these intrinsics, but I worry that this is too > >> > confusing. > >> > > >> > We could, but it's probably better to leave it as is. > >> > > >> > -eric > >> > > >> > > >> > Implicitly setting a target attribute may block inlining that the > >> > user > >> > expected to happen, for example. Alternatively, there may be a > >> > dynamic > >> > cpuid check in the same function between SSE2 and AVX variants of > >> > the same > >> > algorithm, and now the SSE2 loop will unexpectedly use AVX > >> > instructions. > >> > > >> > So we should probably settle with telling the user to add -msseNN > or > >> > __atribute__((target(("sseNN")))). > >> > > >> > On Thu, Jul 30, 2015 at 8:53 AM, Vedant Kumar > wrote: > >> > > >> > I've run into some code which no longer compiles because of > two > >> > recent > >> > changes: > >> > > >> > 41885d3 Update the intel intrinsic headers to use the target > >> > attribute support. > >> > 695aff1 Use a define for per-file function attributes for > the > >> > Intel > >> > intrinsic headers. > >> > > >> > Specifically, one project defines its own SSE4.1 emulation > >> > routines > >> > when the real intrinsics aren't available. This is a problem > >> > because > >> > they've reused the names of the intrinsics. E.g; > >> > > >> > > #ifndef __SSE4_1__ > >> > > #define _mm_extract_epi8(a_, ndx) ({ ... }) > >> > > static inline __m128i _mm_blendv_epi8(__m128i a, __m128i b, > >> > __m128i > >> > mask) { ... } > >> > > ... > >> > > #endif > >> > > >> > SSE4.1 intrinsics now leak into the project when it's being > >> > compiled > >> > for targets without SSE4.1 support. Compilation fails with > >> > "error: > >> > redefinition ...". > >> > > >> > When these changes were initially being discussed, I think our > >> > stance > >> > was that we shouldn't support code like this [1]. However, we > >> > should > >> > reconsider for the sake of avoiding breakage. AFAICT, we would > >> > need to > >> > revert just two types of changes: > >> > > >> > In lib/Headers/__wmmintrin_aes.h: > >> > > >> > > -#if defined (__SSE4_2__) || defined (__SSE4_1__) > >> > > #include > >> > > -#endif > >> > > >> > In lib/Headers/smmintrin.h: > >> > > >> > > -#ifndef __SSE4_1__ > >> > > -#error "SSE4.1 instruction set not enabled" > >> > > -#else > >> > > >> > I don't see any downsides to reintroducing these guards. If > >> > everyone's > >> > OK with this, I can mail a patch in. The alternative is to > have > >> > clients rewrite their emulation layers like this: > >> > > >> > > #ifdef __SSE4_1__ > >> > > #define compat_mm_extract_epi8 _mm_extract_epi8 > >> > > static inline __m128i combat_mm_blendv_epi8(__m128i a, > __m128i > >> > b, > >> > __m128i mask) __attribute__((__target__(("sse4.1")))) { > >> > > return _mm_blendv_epi8(a, b, mask); > >> > > } > >> > > ... > >> > > #else /* OK, no native SSE 4.1. Define our own. */ > >> > > #define compat_mm_extract_epi8(a_, ndx) ({ ... }) > >> > > static inline __m128i compat_mm_blendv_epi8(__m128i a, > __m128i > >> > b, > >> > __m128i mask) { ... } > >> > > ... > >> > > #endif > >> > > >> > ... and then replace all calls to intrinsics with calls to the > >> > new > >> > compatibility routines. This seems like a lot of tedious work, > >> > and I'd > >> > love to help people avoid it :). > >> > > >> > Let me know what you think! > >> > > >> > vedant > >> > > >> > [1] http://lists.cs.uiuc.edu/pipermail/cfe-commits/ > >> > Week-of-Mon-20150615/131192.html > _______________________________________________ > cfe-dev mailing list > cfe-dev@cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev > --20cf30334c17307a5b051c6fb8ce Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: Quoted-printable
On M= on, Aug 3, 2015 at 1:37 PM, Hans Wennborg <hans@chromium.org> wrote:
On Sun, Aug 2, = 2015 at 5:09 PM, Chandler Carruth <chandlerc@google.com> wrote:
> Would cherrypicking the diagnostics to the 3.7 branch be better or wor= se?
> (I'm of two minds, curious what others think...)

The alternative of reverting would have the downside of missing out = on
some of the target attribute functionality in 3.7. I haven't been
following closely enough to determine how great a loss that would be,
but as far as I understand, this is still a work in progress, right?

The alternative of cherry-picking diagnostics has the problem that I
don't think any diagnostics have landed yet :-)

My inclination is to revert back to safety here. The revert would be
pretty hairy though, as there have been a number of changes to these
files since r239883 landed. I'm still trying to figure this out.

Here's my view: sometimes features don't = make the completeness bar in time for a release. That's normal and not = something to be apologetic / troubled about; we'll get the feature in t= he next release, and we shouldn't try to rush it in. This feature is un= usual in that the transitional period has temporarily left us with (arguabl= y) a regression, and that happened to be the state when the 3.7 branch was = cut, but our normal philosophies should apply: we have time-based releases,= not feature-based ones, and our first response to regressions is to revert= to green. So I think reverting r239883 is the way to go. If the diagnostic= work is done in time, we can consider unreverting and patching it across.<= /div>

> On Sun, Aug 2, 2015 at 12:00 PM Justin Bogner <mail@justinbogner.com> wrote:
>>
>> Eric Christopher <echrist= o@gmail.com> writes:
>> > On Thu, Jul 30, 2015 at 10:12 AM Reid Kleckner <rnk@google.com> wrote:
>> >
>> >=C2=A0 =C2=A0 =C2=A0I'm opposed to this. Going forward, I = would really like target
>> > intrinsics
>> >=C2=A0 =C2=A0 =C2=A0to be available regardless of the current = feature set, so users
>> > don't need
>> >=C2=A0 =C2=A0 =C2=A0hacks like these.
>> >
>> > Agreed.
>> >
>> >
>> >=C2=A0 =C2=A0 =C2=A0I see two ways to do this with different t= radeoffs:
>> >=C2=A0 =C2=A0 =C2=A01. Diagnose missing target attributes when= calling the intel
>> > intrinsics. I
>> >=C2=A0 =C2=A0 =C2=A0was surprised to find that we don't al= ready do this.
>> >
>> > Sorry. This is on my list of things to do.
>>
>> +hans
>>
>> I agree with the direction of moving to use target attributes inst= ead of
>> relying on flaky ifdefs, but without any errors or warnings here t= his is
>> a pretty serious diagnostic regression.
>>
>> I think we should revert this on the 3.7 branch. It can stay as is= on
>> trunk assuming the diagnostics are coming soon.
>>
>> Right now we end up in spaces where we get crashes in the backend<= br> >> instead of a sensible error in far too many situations. Notably: >>
>>=C2=A0 =C2=A0https://llvm.org/bug= s/show_bug.cgi?id=3D24125
>>=C2=A0 =C2=A0https://llvm.org/bug= s/show_bug.cgi?id=3D24087
>>=C2=A0 =C2=A0https://llvm.org/bug= s/show_bug.cgi?id=3D24335
>>
>> Additionally, I'm told this causes issues with configure scrip= ts
>> misdetecting available features, as well as strange compatibility = issues
>> like the one that led to this thread.
>>
>> This feature is woefully incomplete. We need the warnings/errors f= or it
>> to be acceptable quality.
>>
>> >
>> >=C2=A0 =C2=A0 =C2=A02. We could support some automatic transfe= r of the target attribute
>> > to the
>> >=C2=A0 =C2=A0 =C2=A0caller when calling these intrinsics, but = I worry that this is too
>> >=C2=A0 =C2=A0 =C2=A0confusing.
>> >
>> > We could, but it's probably better to leave it as is.
>> >
>> > -eric
>> >
>> >
>> >=C2=A0 =C2=A0 =C2=A0Implicitly setting a target attribute may = block inlining that the
>> > user
>> >=C2=A0 =C2=A0 =C2=A0expected to happen, for example. Alternati= vely, there may be a
>> > dynamic
>> >=C2=A0 =C2=A0 =C2=A0cpuid check in the same function between S= SE2 and AVX variants of
>> > the same
>> >=C2=A0 =C2=A0 =C2=A0algorithm, and now the SSE2 loop will unex= pectedly use AVX
>> > instructions.
>> >
>> >=C2=A0 =C2=A0 =C2=A0So we should probably settle with telling = the user to add -msseNN or
>> >=C2=A0 =C2=A0 =C2=A0__atribute__((target(("sseNN")))= ).
>> >
>> >=C2=A0 =C2=A0 =C2=A0On Thu, Jul 30, 2015 at 8:53 AM, Vedant Ku= mar <vsk@apple.com> wrote:
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I've run into some code = which no longer compiles because of two
>> > recent
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0changes:
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A041885d3 Update the in= tel intrinsic headers to use the target
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0attribute support.
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0695aff1 Use a define = for per-file function attributes for the
>> > Intel
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0intrinsic headers.
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Specifically, one project de= fines its own SSE4.1 emulation
>> > routines
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0when the real intrinsics are= n't available. This is a problem
>> > because
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0they've reused the names= of the intrinsics. E.g;
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #ifndef __SSE4_1__
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #define _mm_extract_epi= 8(a_, ndx) ({ ... })
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> static inline __m128i _= mm_blendv_epi8(__m128i a, __m128i b,
>> > __m128i
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mask) { ... }
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> ...
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #endif
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0SSE4.1 intrinsics now leak i= nto the project when it's being
>> > compiled
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0for targets without SSE4.1 s= upport. Compilation fails with
>> > "error:
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0redefinition ...".
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0When these changes were init= ially being discussed, I think our
>> > stance
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0was that we shouldn't su= pport code like this [1]. However, we
>> > should
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0reconsider for the sake of a= voiding breakage. AFAICT, we would
>> > need to
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0revert just two types of cha= nges:
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0In lib/Headers/__wmmintrin_a= es.h:
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#if defined (__SSE4_2_= _) || defined (__SSE4_1__)
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0>=C2=A0 #include <smmi= ntrin.h>
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#endif
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0In lib/Headers/smmintrin.h:<= br> >> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#ifndef __SSE4_1__
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#error "SSE4.1 in= struction set not enabled"
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#else
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I don't see any downside= s to reintroducing these guards. If
>> > everyone's
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0OK with this, I can mail a p= atch in. The alternative is to have
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0clients rewrite their emulat= ion layers like this:
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #ifdef __SSE4_1__
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #define compat_mm_extra= ct_epi8 _mm_extract_epi8
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> static inline __m128i c= ombat_mm_blendv_epi8(__m128i a, __m128i
>> > b,
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0__m128i mask) __attribute__(= (__target__(("sse4.1")))) {
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0>=C2=A0 =C2=A0return _mm_= blendv_epi8(a, b, mask);
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> }
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> ...
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #else /* OK, no native = SSE 4.1. Define our own. */
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #define compat_mm_extra= ct_epi8(a_, ndx) ({ ... })
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> static inline __m128i c= ompat_mm_blendv_epi8(__m128i a, __m128i
>> > b,
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0__m128i mask) { ... }
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> ...
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #endif
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0... and then replace all cal= ls to intrinsics with calls to the
>> > new
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0compatibility routines. This= seems like a lot of tedious work,
>> > and I'd
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0love to help people avoid it= :).
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Let me know what you think!<= br> >> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0vedant
>> >
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0[1] ht= tp://lists.cs.uiuc.edu/pipermail/cfe-commits/
>> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Week-of-Mon-20150615/131192.= html
_______________________________________________
cfe-dev mailing list
cfe-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

--20cf30334c17307a5b051c6fb8ce-- --===============3998022752456802752== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ cfe-dev mailing list cfe-dev@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev --===============3998022752456802752==--