On Sun, Aug 2, 2015 at 12:00= PM Justin Bogner <mail@justinb= ogner.com> wrote:

Eric Chr= istopher <echris= to@gmail.com> writes:
> On Thu, Jul 30, 2015 at 10:12 AM Reid Kleckner <rnk@google.com> wrote:
>
>=C2=A0 =C2=A0 =C2=A0I'm opposed to this. Going forward, I would rea= lly like target intrinsics
>=C2=A0 =C2=A0 =C2=A0to be available regardless of the current feature s= et, so users don't need
>=C2=A0 =C2=A0 =C2=A0hacks like these.
>
> Agreed.
> =C2=A0
>
>=C2=A0 =C2=A0 =C2=A0I see two ways to do this with different tradeoffs:=
>=C2=A0 =C2=A0 =C2=A01. Diagnose missing target attributes when calling = the intel intrinsics. I
>=C2=A0 =C2=A0 =C2=A0was surprised to find that we don't already do = this.
>
> Sorry. This is on my list of things to do.

+hans

I agree with the direction of moving to use target attributes instead of
relying on flaky ifdefs, but without any errors or warnings here this is
a pretty serious diagnostic regression.

I think we should revert this on the 3.7 branch. It can stay as is on
trunk assuming the diagnostics are coming soon.

Right now we end up in spaces where we get crashes in the backend
instead of a sensible error in far too many situations. Notably:

=C2=A0 https://llvm.org/bugs/show_bug.cg= i?id=3D24125
=C2=A0 https://llvm.org/bugs/show_bug.cg= i?id=3D24087
=C2=A0 https://llvm.org/bugs/show_bug.cg= i?id=3D24335

Additionally, I'm told this causes issues with configure scripts
misdetecting available features, as well as strange compatibility issues
like the one that led to this thread.

This feature is woefully incomplete. We need the warnings/errors for it
to be acceptable quality.

>
>=C2=A0 =C2=A0 =C2=A02. We could support some automatic transfer of the = target attribute to the
>=C2=A0 =C2=A0 =C2=A0caller when calling these intrinsics, but I worry t= hat this is too
>=C2=A0 =C2=A0 =C2=A0confusing.
>
> We could, but it's probably better to leave it as is.
>
> -eric
> =C2=A0
>
>=C2=A0 =C2=A0 =C2=A0Implicitly setting a target attribute may block inl= ining that the user
>=C2=A0 =C2=A0 =C2=A0expected to happen, for example. Alternatively, the= re may be a dynamic
>=C2=A0 =C2=A0 =C2=A0cpuid check in the same function between SSE2 and A= VX variants of the same
>=C2=A0 =C2=A0 =C2=A0algorithm, and now the SSE2 loop will unexpectedly = use AVX instructions.
>
>=C2=A0 =C2=A0 =C2=A0So we should probably settle with telling the user = to add -msseNN or
>=C2=A0 =C2=A0 =C2=A0__atribute__((target(("sseNN")))).
>
>=C2=A0 =C2=A0 =C2=A0On Thu, Jul 30, 2015 at 8:53 AM, Vedant Kumar <<= a href=3D"mailto:vsk@apple.com" target=3D"_blank">vsk@apple.com> wro= te:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I've run into some code which no = longer compiles because of two recent
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0changes:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 41885d3 Update the intel intri= nsic headers to use the target
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0attribute support.
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 695aff1 Use a define for per-f= ile function attributes for the Intel
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0intrinsic headers.
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Specifically, one project defines its= own SSE4.1 emulation routines
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0when the real intrinsics aren't a= vailable. This is a problem because
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0they've reused the names of the i= ntrinsics. E.g;
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #ifndef __SSE4_1__
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #define _mm_extract_epi8(a_, ndx= ) ({ ... })
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> static inline __m128i _mm_blendv= _epi8(__m128i a, __m128i b, __m128i
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mask) { ... }
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> ...
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #endif
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0SSE4.1 intrinsics now leak into the p= roject when it's being compiled
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0for targets without SSE4.1 support. C= ompilation fails with "error:
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0redefinition ...".
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0When these changes were initially bei= ng discussed, I think our stance
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0was that we shouldn't support cod= e like this [1]. However, we should
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0reconsider for the sake of avoiding b= reakage. AFAICT, we would need to
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0revert just two types of changes:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0In lib/Headers/__wmmintrin_aes.h:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#if defined (__SSE4_2__) || def= ined (__SSE4_1__)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0>=C2=A0 #include <smmintrin.h&g= t;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#endif
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0In lib/Headers/smmintrin.h:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#ifndef __SSE4_1__
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#error "SSE4.1 instruction= set not enabled"
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> -#else
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I don't see any downsides to rein= troducing these guards. If everyone's
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0OK with this, I can mail a patch in. = The alternative is to have
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0clients rewrite their emulation layer= s like this:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #ifdef __SSE4_1__
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #define compat_mm_extract_epi8 _= mm_extract_epi8
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> static inline __m128i combat_mm_= blendv_epi8(__m128i a, __m128i b,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0__m128i mask) __attribute__((__target= __(("sse4.1")))) {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0>=C2=A0 =C2=A0return _mm_blendv_ep= i8(a, b, mask);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> }
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> ...
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #else /* OK, no native SSE 4.1. = Define our own. */
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #define compat_mm_extract_epi8(a= _, ndx) ({ ... })
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> static inline __m128i compat_mm_= blendv_epi8(__m128i a, __m128i b,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0__m128i mask) { ... }
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> ...
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> #endif
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0... and then replace all calls to int= rinsics with calls to the new
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0compatibility routines. This seems li= ke a lot of tedious work, and I'd
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0love to help people avoid it :).
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Let me know what you think!
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0vedant
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0[1] http://list= s.cs.uiuc.edu/pipermail/cfe-commits/
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Week-of-Mon-20150615/131192.html
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0_____________________________________= __________
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0cfe-dev mailing list
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0cfe-dev@cs.uiuc.edu
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0http://lists.= cs.uiuc.edu/mailman/listinfo/cfe-dev
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev@cs.ui= uc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-= dev

_______________________________________________
cfe-dev mailing list
cfe-dev@cs.uiuc.ed= u
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev