'Re: [cfe-dev] [PROPOSAL] Reintroduce guards for Intel intrinsic headers'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       cfe-dev
Subject:    Re: [cfe-dev] [PROPOSAL] Reintroduce guards for Intel intrinsic headers
From:       Richard Smith <richard () metafoo ! co ! uk>
Date:       2015-08-03 22:30:40
Message-ID: CAOfiQqnhR2LxGKqkcKYiJ7407u-vLJ-nM859_hmKhrkEj7gJUg () mail ! gmail ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]

On Mon, Aug 3, 2015 at 1:37 PM, Hans Wennborg <hans@chromium.org> wrote:

> On Sun, Aug 2, 2015 at 5:09 PM, Chandler Carruth <chandlerc@google.com>
> wrote:
> > Would cherrypicking the diagnostics to the 3.7 branch be better or worse?
> > (I'm of two minds, curious what others think...)
>
> The alternative of reverting would have the downside of missing out on
> some of the target attribute functionality in 3.7. I haven't been
> following closely enough to determine how great a loss that would be,
> but as far as I understand, this is still a work in progress, right?
>
> The alternative of cherry-picking diagnostics has the problem that I
> don't think any diagnostics have landed yet :-)
>
> My inclination is to revert back to safety here. The revert would be
> pretty hairy though, as there have been a number of changes to these
> files since r239883 landed. I'm still trying to figure this out.

Here's my view: sometimes features don't make the completeness bar in time
for a release. That's normal and not something to be apologetic / troubled
about; we'll get the feature in the next release, and we shouldn't try to
rush it in. This feature is unusual in that the transitional period has
temporarily left us with (arguably) a regression, and that happened to be
the state when the 3.7 branch was cut, but our normal philosophies should
apply: we have time-based releases, not feature-based ones, and our first
response to regressions is to revert to green. So I think reverting r239883
is the way to go. If the diagnostic work is done in time, we can consider
unreverting and patching it across.

> On Sun, Aug 2, 2015 at 12:00 PM Justin Bogner <mail@justinbogner.com>
> wrote:
> >>
> >> Eric Christopher <echristo@gmail.com> writes:
> >> > On Thu, Jul 30, 2015 at 10:12 AM Reid Kleckner <rnk@google.com>
> wrote:
> >> >
> >> >     I'm opposed to this. Going forward, I would really like target
> >> > intrinsics
> >> >     to be available regardless of the current feature set, so users
> >> > don't need
> >> >     hacks like these.
> >> >
> >> > Agreed.
> >> >
> >> >
> >> >     I see two ways to do this with different tradeoffs:
> >> >     1. Diagnose missing target attributes when calling the intel
> >> > intrinsics. I
> >> >     was surprised to find that we don't already do this.
> >> >
> >> > Sorry. This is on my list of things to do.
> >>
> >> +hans
> >>
> >> I agree with the direction of moving to use target attributes instead of
> >> relying on flaky ifdefs, but without any errors or warnings here this is
> >> a pretty serious diagnostic regression.
> >>
> >> I think we should revert this on the 3.7 branch. It can stay as is on
> >> trunk assuming the diagnostics are coming soon.
> >>
> >> Right now we end up in spaces where we get crashes in the backend
> >> instead of a sensible error in far too many situations. Notably:
> >>
> >>   https://llvm.org/bugs/show_bug.cgi?id=24125
> >>   https://llvm.org/bugs/show_bug.cgi?id=24087
> >>   https://llvm.org/bugs/show_bug.cgi?id=24335
> >>
> >> Additionally, I'm told this causes issues with configure scripts
> >> misdetecting available features, as well as strange compatibility issues
> >> like the one that led to this thread.
> >>
> >> This feature is woefully incomplete. We need the warnings/errors for it
> >> to be acceptable quality.
> >>
> >> >
> >> >     2. We could support some automatic transfer of the target
> attribute
> >> > to the
> >> >     caller when calling these intrinsics, but I worry that this is too
> >> >     confusing.
> >> >
> >> > We could, but it's probably better to leave it as is.
> >> >
> >> > -eric
> >> >
> >> >
> >> >     Implicitly setting a target attribute may block inlining that the
> >> > user
> >> >     expected to happen, for example. Alternatively, there may be a
> >> > dynamic
> >> >     cpuid check in the same function between SSE2 and AVX variants of
> >> > the same
> >> >     algorithm, and now the SSE2 loop will unexpectedly use AVX
> >> > instructions.
> >> >
> >> >     So we should probably settle with telling the user to add -msseNN
> or
> >> >     __atribute__((target(("sseNN")))).
> >> >
> >> >     On Thu, Jul 30, 2015 at 8:53 AM, Vedant Kumar <vsk@apple.com>
> wrote:
> >> >
> >> >         I've run into some code which no longer compiles because of
> two
> >> > recent
> >> >         changes:
> >> >
> >> >           41885d3 Update the intel intrinsic headers to use the target
> >> >         attribute support.
> >> >           695aff1 Use a define for per-file function attributes for
> the
> >> > Intel
> >> >         intrinsic headers.
> >> >
> >> >         Specifically, one project defines its own SSE4.1 emulation
> >> > routines
> >> >         when the real intrinsics aren't available. This is a problem
> >> > because
> >> >         they've reused the names of the intrinsics. E.g;
> >> >
> >> >         > #ifndef __SSE4_1__
> >> >         > #define _mm_extract_epi8(a_, ndx) ({ ... })
> >> >         > static inline __m128i _mm_blendv_epi8(__m128i a, __m128i b,
> >> > __m128i
> >> >         mask) { ... }
> >> >         > ...
> >> >         > #endif
> >> >
> >> >         SSE4.1 intrinsics now leak into the project when it's being
> >> > compiled
> >> >         for targets without SSE4.1 support. Compilation fails with
> >> > "error:
> >> >         redefinition ...".
> >> >
> >> >         When these changes were initially being discussed, I think our
> >> > stance
> >> >         was that we shouldn't support code like this [1]. However, we
> >> > should
> >> >         reconsider for the sake of avoiding breakage. AFAICT, we would
> >> > need to
> >> >         revert just two types of changes:
> >> >
> >> >         In lib/Headers/__wmmintrin_aes.h:
> >> >
> >> >         > -#if defined (__SSE4_2__) || defined (__SSE4_1__)
> >> >         >  #include <smmintrin.h>
> >> >         > -#endif
> >> >
> >> >         In lib/Headers/smmintrin.h:
> >> >
> >> >         > -#ifndef __SSE4_1__
> >> >         > -#error "SSE4.1 instruction set not enabled"
> >> >         > -#else
> >> >
> >> >         I don't see any downsides to reintroducing these guards. If
> >> > everyone's
> >> >         OK with this, I can mail a patch in. The alternative is to
> have
> >> >         clients rewrite their emulation layers like this:
> >> >
> >> >         > #ifdef __SSE4_1__
> >> >         > #define compat_mm_extract_epi8 _mm_extract_epi8
> >> >         > static inline __m128i combat_mm_blendv_epi8(__m128i a,
> __m128i
> >> > b,
> >> >         __m128i mask) __attribute__((__target__(("sse4.1")))) {
> >> >         >   return _mm_blendv_epi8(a, b, mask);
> >> >         > }
> >> >         > ...
> >> >         > #else /* OK, no native SSE 4.1. Define our own. */
> >> >         > #define compat_mm_extract_epi8(a_, ndx) ({ ... })
> >> >         > static inline __m128i compat_mm_blendv_epi8(__m128i a,
> __m128i
> >> > b,
> >> >         __m128i mask) { ... }
> >> >         > ...
> >> >         > #endif
> >> >
> >> >         ... and then replace all calls to intrinsics with calls to the
> >> > new
> >> >         compatibility routines. This seems like a lot of tedious work,
> >> > and I'd
> >> >         love to help people avoid it :).
> >> >
> >> >         Let me know what you think!
> >> >
> >> >         vedant
> >> >
> >> >         [1] http://lists.cs.uiuc.edu/pipermail/cfe-commits/
> >> >         Week-of-Mon-20150615/131192.html
> _______________________________________________
> cfe-dev mailing list
> cfe-dev@cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>

[Attachment #5 (text/html)]

<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Aug 3, 2015 \
at 1:37 PM, Hans Wennborg <span dir="ltr">&lt;<a href="mailto:hans@chromium.org" \
target="_blank">hans@chromium.org</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex"><span class="">On Sun, Aug 2, 2015 at 5:09 PM, Chandler \
Carruth &lt;<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>&gt; \
wrote:<br> &gt; Would cherrypicking the diagnostics to the 3.7 branch be better or \
worse?<br> &gt; (I&#39;m of two minds, curious what others think...)<br>
<br>
</span>The alternative of reverting would have the downside of missing out on<br>
some of the target attribute functionality in 3.7. I haven&#39;t been<br>
following closely enough to determine how great a loss that would be,<br>
but as far as I understand, this is still a work in progress, right?<br>
<br>
The alternative of cherry-picking diagnostics has the problem that I<br>
don&#39;t think any diagnostics have landed yet :-)<br>
<br>
My inclination is to revert back to safety here. The revert would be<br>
pretty hairy though, as there have been a number of changes to these<br>
files since r239883 landed. I&#39;m still trying to figure this \
out.</blockquote><div><br></div><div>Here&#39;s my view: sometimes features don&#39;t \
make the completeness bar in time for a release. That&#39;s normal and not something \
to be apologetic / troubled about; we&#39;ll get the feature in the next release, and \
we shouldn&#39;t try to rush it in. This feature is unusual in that the transitional \
period has temporarily left us with (arguably) a regression, and that happened to be \
the state when the 3.7 branch was cut, but our normal philosophies should apply: we \
have time-based releases, not feature-based ones, and our first response to \
regressions is to revert to green. So I think reverting r239883 is the way to go. If \
the diagnostic work is done in time, we can consider unreverting and patching it \
across.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 \
.8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div \
class="h5"> &gt; On Sun, Aug 2, 2015 at 12:00 PM Justin Bogner &lt;<a \
href="mailto:mail@justinbogner.com">mail@justinbogner.com</a>&gt; wrote:<br> \
&gt;&gt;<br> &gt;&gt; Eric Christopher &lt;<a \
href="mailto:echristo@gmail.com">echristo@gmail.com</a>&gt; writes:<br> &gt;&gt; &gt; \
On Thu, Jul 30, 2015 at 10:12 AM Reid Kleckner &lt;<a \
href="mailto:rnk@google.com">rnk@google.com</a>&gt; wrote:<br> &gt;&gt; &gt;<br>
&gt;&gt; &gt;        I&#39;m opposed to this. Going forward, I would really like \
target<br> &gt;&gt; &gt; intrinsics<br>
&gt;&gt; &gt;        to be available regardless of the current feature set, so \
users<br> &gt;&gt; &gt; don&#39;t need<br>
&gt;&gt; &gt;        hacks like these.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; Agreed.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;        I see two ways to do this with different tradeoffs:<br>
&gt;&gt; &gt;        1. Diagnose missing target attributes when calling the intel<br>
&gt;&gt; &gt; intrinsics. I<br>
&gt;&gt; &gt;        was surprised to find that we don&#39;t already do this.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; Sorry. This is on my list of things to do.<br>
&gt;&gt;<br>
&gt;&gt; +hans<br>
&gt;&gt;<br>
&gt;&gt; I agree with the direction of moving to use target attributes instead of<br>
&gt;&gt; relying on flaky ifdefs, but without any errors or warnings here this is<br>
&gt;&gt; a pretty serious diagnostic regression.<br>
&gt;&gt;<br>
&gt;&gt; I think we should revert this on the 3.7 branch. It can stay as is on<br>
&gt;&gt; trunk assuming the diagnostics are coming soon.<br>
&gt;&gt;<br>
&gt;&gt; Right now we end up in spaces where we get crashes in the backend<br>
&gt;&gt; instead of a sensible error in far too many situations. Notably:<br>
&gt;&gt;<br>
&gt;&gt;     <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__llvm.org_bu \
gs_show-5Fbug.cgi-3Fid-3D24125&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=CnzuN65ENJ1H9py9XLi \
RvC_UQz6u3oG6GUNn7_wosSM&m=RdspM8_Pm6OLUKwmzkMy0csEp2E-tr3j9snM7PfaBB4&s=A_CJhKtp6KaEjhT-H5ZQ2qKLU3I9_uMr9WoK9RmLO3E&e=" \
rel="noreferrer" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=24125</a><br> \
&gt;&gt;     <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__llvm.org_bu \
gs_show-5Fbug.cgi-3Fid-3D24087&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=CnzuN65ENJ1H9py9XLi \
RvC_UQz6u3oG6GUNn7_wosSM&m=RdspM8_Pm6OLUKwmzkMy0csEp2E-tr3j9snM7PfaBB4&s=9wzhomdN0ZqgXEtMzJg7TDE-uExIMYx96MlJjjn73Zw&e=" \
rel="noreferrer" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=24087</a><br> \
&gt;&gt;     <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__llvm.org_bu \
gs_show-5Fbug.cgi-3Fid-3D24335&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=CnzuN65ENJ1H9py9XLi \
RvC_UQz6u3oG6GUNn7_wosSM&m=RdspM8_Pm6OLUKwmzkMy0csEp2E-tr3j9snM7PfaBB4&s=HeT2HWnCGK6E93QKy0rpnG_nXBH0pSZp7bbuUy1Q9yI&e=" \
rel="noreferrer" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=24335</a><br> \
&gt;&gt;<br> &gt;&gt; Additionally, I&#39;m told this causes issues with configure \
scripts<br> &gt;&gt; misdetecting available features, as well as strange \
compatibility issues<br> &gt;&gt; like the one that led to this thread.<br>
&gt;&gt;<br>
&gt;&gt; This feature is woefully incomplete. We need the warnings/errors for it<br>
&gt;&gt; to be acceptable quality.<br>
&gt;&gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;        2. We could support some automatic transfer of the target \
attribute<br> &gt;&gt; &gt; to the<br>
&gt;&gt; &gt;        caller when calling these intrinsics, but I worry that this is \
too<br> &gt;&gt; &gt;        confusing.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; We could, but it&#39;s probably better to leave it as is.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; -eric<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;        Implicitly setting a target attribute may block inlining that \
the<br> &gt;&gt; &gt; user<br>
&gt;&gt; &gt;        expected to happen, for example. Alternatively, there may be \
a<br> &gt;&gt; &gt; dynamic<br>
&gt;&gt; &gt;        cpuid check in the same function between SSE2 and AVX variants \
of<br> &gt;&gt; &gt; the same<br>
&gt;&gt; &gt;        algorithm, and now the SSE2 loop will unexpectedly use AVX<br>
&gt;&gt; &gt; instructions.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;        So we should probably settle with telling the user to add \
-msseNN or<br> &gt;&gt; &gt;        __atribute__((target((&quot;sseNN&quot;)))).<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;        On Thu, Jul 30, 2015 at 8:53 AM, Vedant Kumar &lt;<a \
href="mailto:vsk@apple.com">vsk@apple.com</a>&gt; wrote:<br> &gt;&gt; &gt;<br>
&gt;&gt; &gt;              I&#39;ve run into some code which no longer compiles \
because of two<br> &gt;&gt; &gt; recent<br>
&gt;&gt; &gt;              changes:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;                 41885d3 Update the intel intrinsic headers to use the \
target<br> &gt;&gt; &gt;              attribute support.<br>
&gt;&gt; &gt;                 695aff1 Use a define for per-file function attributes \
for the<br> &gt;&gt; &gt; Intel<br>
&gt;&gt; &gt;              intrinsic headers.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              Specifically, one project defines its own SSE4.1 \
emulation<br> &gt;&gt; &gt; routines<br>
&gt;&gt; &gt;              when the real intrinsics aren&#39;t available. This is a \
problem<br> &gt;&gt; &gt; because<br>
&gt;&gt; &gt;              they&#39;ve reused the names of the intrinsics. E.g;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              &gt; #ifndef __SSE4_1__<br>
&gt;&gt; &gt;              &gt; #define _mm_extract_epi8(a_, ndx) ({ ... })<br>
&gt;&gt; &gt;              &gt; static inline __m128i _mm_blendv_epi8(__m128i a, \
__m128i b,<br> &gt;&gt; &gt; __m128i<br>
&gt;&gt; &gt;              mask) { ... }<br>
&gt;&gt; &gt;              &gt; ...<br>
&gt;&gt; &gt;              &gt; #endif<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              SSE4.1 intrinsics now leak into the project when it&#39;s \
being<br> &gt;&gt; &gt; compiled<br>
&gt;&gt; &gt;              for targets without SSE4.1 support. Compilation fails \
with<br> &gt;&gt; &gt; &quot;error:<br>
&gt;&gt; &gt;              redefinition ...&quot;.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              When these changes were initially being discussed, I think \
our<br> &gt;&gt; &gt; stance<br>
&gt;&gt; &gt;              was that we shouldn&#39;t support code like this [1]. \
However, we<br> &gt;&gt; &gt; should<br>
&gt;&gt; &gt;              reconsider for the sake of avoiding breakage. AFAICT, we \
would<br> &gt;&gt; &gt; need to<br>
&gt;&gt; &gt;              revert just two types of changes:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              In lib/Headers/__wmmintrin_aes.h:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              &gt; -#if defined (__SSE4_2__) || defined (__SSE4_1__)<br>
&gt;&gt; &gt;              &gt;   #include &lt;smmintrin.h&gt;<br>
&gt;&gt; &gt;              &gt; -#endif<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              In lib/Headers/smmintrin.h:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              &gt; -#ifndef __SSE4_1__<br>
&gt;&gt; &gt;              &gt; -#error &quot;SSE4.1 instruction set not \
enabled&quot;<br> &gt;&gt; &gt;              &gt; -#else<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              I don&#39;t see any downsides to reintroducing these \
guards. If<br> &gt;&gt; &gt; everyone&#39;s<br>
&gt;&gt; &gt;              OK with this, I can mail a patch in. The alternative is to \
have<br> &gt;&gt; &gt;              clients rewrite their emulation layers like \
this:<br> &gt;&gt; &gt;<br>
&gt;&gt; &gt;              &gt; #ifdef __SSE4_1__<br>
&gt;&gt; &gt;              &gt; #define compat_mm_extract_epi8 _mm_extract_epi8<br>
&gt;&gt; &gt;              &gt; static inline __m128i combat_mm_blendv_epi8(__m128i \
a, __m128i<br> &gt;&gt; &gt; b,<br>
&gt;&gt; &gt;              __m128i mask) \
__attribute__((__target__((&quot;sse4.1&quot;)))) {<br> &gt;&gt; &gt;              \
&gt;     return _mm_blendv_epi8(a, b, mask);<br> &gt;&gt; &gt;              &gt; \
}<br> &gt;&gt; &gt;              &gt; ...<br>
&gt;&gt; &gt;              &gt; #else /* OK, no native SSE 4.1. Define our own. \
*/<br> &gt;&gt; &gt;              &gt; #define compat_mm_extract_epi8(a_, ndx) ({ ... \
})<br> &gt;&gt; &gt;              &gt; static inline __m128i \
compat_mm_blendv_epi8(__m128i a, __m128i<br> &gt;&gt; &gt; b,<br>
&gt;&gt; &gt;              __m128i mask) { ... }<br>
&gt;&gt; &gt;              &gt; ...<br>
&gt;&gt; &gt;              &gt; #endif<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              ... and then replace all calls to intrinsics with calls to \
the<br> &gt;&gt; &gt; new<br>
&gt;&gt; &gt;              compatibility routines. This seems like a lot of tedious \
work,<br> &gt;&gt; &gt; and I&#39;d<br>
&gt;&gt; &gt;              love to help people avoid it :).<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              Let me know what you think!<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              vedant<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;              [1] <a \
href="http://lists.cs.uiuc.edu/pipermail/cfe-commits/" rel="noreferrer" \
target="_blank">http://lists.cs.uiuc.edu/pipermail/cfe-commits/</a><br> &gt;&gt; &gt; \
Week-of-Mon-20150615/131192.html<br> \
_______________________________________________<br> cfe-dev mailing list<br>
<a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" rel="noreferrer" \
target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br> \
</div></div></blockquote></div><br></div></div>

_______________________________________________
cfe-dev mailing list
cfe-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

[prev in list] [next in list] [prev in thread] [next in thread]