From cfe-dev Mon Aug 03 00:09:37 2015 From: Chandler Carruth Date: Mon, 03 Aug 2015 00:09:37 +0000 To: cfe-dev Subject: Re: [cfe-dev] [PROPOSAL] Reintroduce guards for Intel intrinsic headers Message-Id: X-MARC-Message: https://marc.info/?l=cfe-dev&m=143856080528513 MIME-Version: 1 Content-Type: multipart/mixed; boundary="--===============8542080555258006993==" --===============8542080555258006993== Content-Type: multipart/alternative; boundary=047d7b5d5d32db002e051c5cfc70 --047d7b5d5d32db002e051c5cfc70 Content-Type: text/plain; charset=UTF-8 Would cherrypicking the diagnostics to the 3.7 branch be better or worse? (I'm of two minds, curious what others think...) On Sun, Aug 2, 2015 at 12:00 PM Justin Bogner wrote: > Eric Christopher writes: > > On Thu, Jul 30, 2015 at 10:12 AM Reid Kleckner wrote: > > > > I'm opposed to this. Going forward, I would really like target > intrinsics > > to be available regardless of the current feature set, so users > don't need > > hacks like these. > > > > Agreed. > > > > > > I see two ways to do this with different tradeoffs: > > 1. Diagnose missing target attributes when calling the intel > intrinsics. I > > was surprised to find that we don't already do this. > > > > Sorry. This is on my list of things to do. > > +hans > > I agree with the direction of moving to use target attributes instead of > relying on flaky ifdefs, but without any errors or warnings here this is > a pretty serious diagnostic regression. > > I think we should revert this on the 3.7 branch. It can stay as is on > trunk assuming the diagnostics are coming soon. > > Right now we end up in spaces where we get crashes in the backend > instead of a sensible error in far too many situations. Notably: > > https://llvm.org/bugs/show_bug.cgi?id=24125 > https://llvm.org/bugs/show_bug.cgi?id=24087 > https://llvm.org/bugs/show_bug.cgi?id=24335 > > Additionally, I'm told this causes issues with configure scripts > misdetecting available features, as well as strange compatibility issues > like the one that led to this thread. > > This feature is woefully incomplete. We need the warnings/errors for it > to be acceptable quality. > > > > > 2. We could support some automatic transfer of the target attribute > to the > > caller when calling these intrinsics, but I worry that this is too > > confusing. > > > > We could, but it's probably better to leave it as is. > > > > -eric > > > > > > Implicitly setting a target attribute may block inlining that the > user > > expected to happen, for example. Alternatively, there may be a > dynamic > > cpuid check in the same function between SSE2 and AVX variants of > the same > > algorithm, and now the SSE2 loop will unexpectedly use AVX > instructions. > > > > So we should probably settle with telling the user to add -msseNN or > > __atribute__((target(("sseNN")))). > > > > On Thu, Jul 30, 2015 at 8:53 AM, Vedant Kumar wrote: > > > > I've run into some code which no longer compiles because of two > recent > > changes: > > > > 41885d3 Update the intel intrinsic headers to use the target > > attribute support. > > 695aff1 Use a define for per-file function attributes for the > Intel > > intrinsic headers. > > > > Specifically, one project defines its own SSE4.1 emulation > routines > > when the real intrinsics aren't available. This is a problem > because > > they've reused the names of the intrinsics. E.g; > > > > > #ifndef __SSE4_1__ > > > #define _mm_extract_epi8(a_, ndx) ({ ... }) > > > static inline __m128i _mm_blendv_epi8(__m128i a, __m128i b, > __m128i > > mask) { ... } > > > ... > > > #endif > > > > SSE4.1 intrinsics now leak into the project when it's being > compiled > > for targets without SSE4.1 support. Compilation fails with > "error: > > redefinition ...". > > > > When these changes were initially being discussed, I think our > stance > > was that we shouldn't support code like this [1]. However, we > should > > reconsider for the sake of avoiding breakage. AFAICT, we would > need to > > revert just two types of changes: > > > > In lib/Headers/__wmmintrin_aes.h: > > > > > -#if defined (__SSE4_2__) || defined (__SSE4_1__) > > > #include > > > -#endif > > > > In lib/Headers/smmintrin.h: > > > > > -#ifndef __SSE4_1__ > > > -#error "SSE4.1 instruction set not enabled" > > > -#else > > > > I don't see any downsides to reintroducing these guards. If > everyone's > > OK with this, I can mail a patch in. The alternative is to have > > clients rewrite their emulation layers like this: > > > > > #ifdef __SSE4_1__ > > > #define compat_mm_extract_epi8 _mm_extract_epi8 > > > static inline __m128i combat_mm_blendv_epi8(__m128i a, __m128i > b, > > __m128i mask) __attribute__((__target__(("sse4.1")))) { > > > return _mm_blendv_epi8(a, b, mask); > > > } > > > ... > > > #else /* OK, no native SSE 4.1. Define our own. */ > > > #define compat_mm_extract_epi8(a_, ndx) ({ ... }) > > > static inline __m128i compat_mm_blendv_epi8(__m128i a, __m128i > b, > > __m128i mask) { ... } > > > ... > > > #endif > > > > ... and then replace all calls to intrinsics with calls to the > new > > compatibility routines. This seems like a lot of tedious work, > and I'd > > love to help people avoid it :). > > > > Let me know what you think! > > > > vedant > > > > [1] http://lists.cs.uiuc.edu/pipermail/cfe-commits/ > > Week-of-Mon-20150615/131192.html > > _______________________________________________ > > cfe-dev mailing list > > cfe-dev@cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev > > > > _______________________________________________ > > cfe-dev mailing list > > cfe-dev@cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev > > _______________________________________________ > cfe-dev mailing list > cfe-dev@cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev > --047d7b5d5d32db002e051c5cfc70 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: Quoted-printable
Would cherrypicking the diagnostics to the 3.7 branch be b= etter or worse? (I'm of two minds, curious what others think...)
<= br>
On Sun, Aug 2, 2015 at 12:00= PM Justin Bogner <mail@justinb= ogner.com> wrote:
Eric Chr= istopher <echris= to@gmail.com> writes:
> On Thu, Jul 30, 2015 at 10:12 AM Reid Kleckner <rnk@google.com> wrote:
>
>=C2=A0 =C2=A0 =C2=A0I'm opposed to this. Going forward, I would rea= lly like target intrinsics
>=C2=A0 =C2=A0 =C2=A0to be available regardless of the current feature s= et, so users don't need
>=C2=A0 =C2=A0 =C2=A0hacks like these.
>
> Agreed.
> =C2=A0
>
>=C2=A0 =C2=A0 =C2=A0I see two ways to do this with different tradeoffs:=
>=C2=A0 =C2=A0 =C2=A01. Diagnose missing target attributes when calling = the intel intrinsics. I
>=C2=A0 =C2=A0 =C2=A0was surprised to find that we don't already do = this.
>
> Sorry. This is on my list of things to do.

+hans

I agree with the direction of moving to use target attributes instead of
relying on flaky ifdefs, but without any errors or warnings here this is
a pretty serious diagnostic regression.

I think we should revert this on the 3.7 branch. It can stay as is on
trunk assuming the diagnostics are coming soon.

Right now we end up in spaces where we get crashes in the backend
instead of a sensible error in far too many situations. Notably:

=C2=A0 https://llvm.org/bugs/show_bug.cg= i?id=3D24125
=C2=A0 https://llvm.org/bugs/show_bug.cg= i?id=3D24087
=C2=A0 https://llvm.org/bugs/show_bug.cg= i?id=3D24335

Additionally, I'm told this causes issues with configure scripts
misdetecting available features, as well as strange compatibility issues
like the one that led to this thread.

This feature is woefully incomplete. We need the warnings/errors for it
to be acceptable quality.

>
>=C2=A0 =C2=A0 =C2=A02. We could support some automatic transfer of the = target attribute to the
>=C2=A0 =C2=A0 =C2=A0caller when calling these intrinsics, but I worry t= hat this is too
>=C2=A0 =C2=A0 =C2=A0confusing.
>
> We could, but it's probably better to leave it as is.
>
> -eric
> =C2=A0
>
>=C2=A0 =C2=A0 =C2=A0Implicitly setting a target attribute may block inl= ining that the user
>=C2=A0 =C2=A0 =C2=A0expected to happen, for example. Alternatively, the= re may be a dynamic
>=C2=A0 =C2=A0 =C2=A0cpuid check in the same function between SSE2 and A= VX variants of the same
>=C2=A0 =C2=A0 =C2=A0algorithm, and now the SSE2 loop will unexpectedly = use AVX instructions.
>
>=C2=A0 =C2=A0 =C2=A0So we should probably settle with telling the user = to add -msseNN or
>=C2=A0 =C2=A0 =C2=A0__atribute__((target(("sseNN")))).
>
>=C2=A0 =C2=A0 =C2=A0On Thu, Jul 30, 2015 at 8:53 AM, Vedant Kumar <<= a href=3D"mailto:vsk@apple.com" target=3D"_blank">vsk@apple.com> wro= te:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I've run into some code which no = longer compiles because of two recent
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0changes:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 41885d3 Update the intel intri= nsic headers to use the target
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0attribute support.
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 695aff1 Use a define for per-f= ile function attributes for the Intel
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0intrinsic headers.
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Specifically, one project defines its= own SSE4.1 emulation routines
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0when the real intrinsics aren't a= vailable. This is a problem because
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0they've reused the names of the i= ntrinsics. E.g;
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #ifndef __SSE4_1__
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #define _mm_extract_epi8(a_, ndx= ) ({ ... })
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> static inline __m128i _mm_blendv= _epi8(__m128i a, __m128i b, __m128i
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mask) { ... }
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> ...
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #endif
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0SSE4.1 intrinsics now leak into the p= roject when it's being compiled
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0for targets without SSE4.1 support. C= ompilation fails with "error:
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0redefinition ...".
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0When these changes were initially bei= ng discussed, I think our stance
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0was that we shouldn't support cod= e like this [1]. However, we should
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0reconsider for the sake of avoiding b= reakage. AFAICT, we would need to
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0revert just two types of changes:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0In lib/Headers/__wmmintrin_aes.h:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#if defined (__SSE4_2__) || def= ined (__SSE4_1__)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0>=C2=A0 #include <smmintrin.h&g= t;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#endif
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0In lib/Headers/smmintrin.h:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#ifndef __SSE4_1__
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#error "SSE4.1 instruction= set not enabled"
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#else
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I don't see any downsides to rein= troducing these guards. If everyone's
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0OK with this, I can mail a patch in. = The alternative is to have
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0clients rewrite their emulation layer= s like this:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #ifdef __SSE4_1__
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #define compat_mm_extract_epi8 _= mm_extract_epi8
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> static inline __m128i combat_mm_= blendv_epi8(__m128i a, __m128i b,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0__m128i mask) __attribute__((__target= __(("sse4.1")))) {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0>=C2=A0 =C2=A0return _mm_blendv_ep= i8(a, b, mask);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> }
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> ...
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #else /* OK, no native SSE 4.1. = Define our own. */
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #define compat_mm_extract_epi8(a= _, ndx) ({ ... })
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> static inline __m128i compat_mm_= blendv_epi8(__m128i a, __m128i b,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0__m128i mask) { ... }
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> ...
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #endif
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0... and then replace all calls to int= rinsics with calls to the new
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0compatibility routines. This seems li= ke a lot of tedious work, and I'd
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0love to help people avoid it :).
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Let me know what you think!
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0vedant
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0[1] http://list= s.cs.uiuc.edu/pipermail/cfe-commits/
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Week-of-Mon-20150615/131192.html
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0_____________________________________= __________
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0cfe-dev mailing list
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0cfe-dev@cs.uiuc.edu
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0http://lists.= cs.uiuc.edu/mailman/listinfo/cfe-dev
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev@cs.ui= uc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-= dev

_______________________________________________
cfe-dev mailing list
cfe-dev@cs.uiuc.ed= u
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
--047d7b5d5d32db002e051c5cfc70-- --===============8542080555258006993== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ cfe-dev mailing list cfe-dev@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev --===============8542080555258006993==--