'Re: Validity of SUBREG+AND-imm transformations'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       gcc
Subject:    Re: Validity of SUBREG+AND-imm transformations
From:       Kyrill Tkachov <kyrylo.tkachov () foss ! arm ! com>
Date:       2016-02-29 10:51:24
Message-ID: 56D422AC.2000703 () foss ! arm ! com
[Download RAW message or body]

Hi Jeff,

On 26/02/16 21:24, Jeff Law wrote:
> On 02/26/2016 06:40 AM, Kyrill Tkachov wrote:
> > Hi all,
> > 
> > I'm looking at a case where some RTL passes create an RTL expression of
> > the form:
> > (subreg:QI (and:SI (reg:SI x1)
> > (const_int 31)) 0)
> > 
> > which I'd like to simplify to:
> > (and:QI (subreg:QI (reg:SI x1) 0)
> > (const_int 31))
> I can think of cases where the first is better and other cases where the second is \
> better -- a lot depends on context.  I don't have a good sense for which is better \
> in general. 
> Note that as-written these don't trigger the subtle issues in what happens with \
> upper bits.  That's more for extensions. 
> (subreg:SI (whatever:QI))
> 
> vs
> 
> {zero,sign}_extend:SI (whatever:QI))
> 
> vs
> 
> (and:SI (subreg:SI (whatever:QI) (const_int 0x255)))
> 
> 
> The first leave the bits beyond QI as "undefined" and sometimes (but I doubt all \
> that often in practice) the compiler will use the undefined nature of those bits to \
> enable optimizations. 
> 
> The second & 3rd variants crisply define the upper bits.
> 

Thanks for the explanation.

> 
> > 
> > It's easy enough to express in RTL but I'm trying to convince myself on
> > its validity.
> > I know there are some subtle points in this area. combine_simplify_rtx
> > in combine.c
> > has a comment:
> > /* Note that we cannot do any narrowing for non-constants since
> > we might have been counting on using the fact that some bits were
> > zero.  We now do this in the SET.  */
> That comment makes no sense.  Unfortunately it goes back to a change from Kenner in \
> 1994 -- which predates having patch discussions here and consistently adding tests \
> to the testsuite. 
> The code used to do this:
> 
> 
> -      if (GET_MODE_CLASS (mode) == MODE_INT
> -         && GET_MODE_CLASS (GET_MODE (SUBREG_REG (x))) == MODE_INT
> -         && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (SUBREG_REG (x)))
> -         && subreg_lowpart_p (x))
> -       return force_to_mode (SUBREG_REG (x), mode, GET_MODE_MASK (mode),
> -                             NULL_RTX, 0);
> 
> Which appears to check that we've got a narrowing subreg expression, and if we do \
> try to force the SUBREG_REG into the right mode using force_to_mode. 
> But if we had a narrowing SUBREG_REG, then I can't see how anything would have been \
> dependign on the upper bits being zero. 
> 
> > 
> > and if I try to implement this transformation in simplify_subreg from
> > simplify-rtx.c
> > I get some cases where combine goes into an infinite recursion in
> > simplify_comparison
> > because it tries to do:
> > 
> > /* If this is (and:M1 (subreg:M1 X:M2 0) (const_int C1)) where C1
> > fits in both M1 and M2 and the SUBREG is either paradoxical
> > or represents the low part, permute the SUBREG and the AND
> > and try again.  */
> Right.  I think you just end up ping-ponging between the two equivalent \
> representations.  Which may indeed argue that the existing representation is \
> preferred and we should look deeper into why the existing representation isn't \
> being  handled as well as it should be.
> 

After a bit more experimentation I found it's easy enough to avoid looping on this \
spot. Just check if the gen_lowpart part of that transformation returned something \
different from the original expression...

> > 
> > Performing this transformation would help a lot with recognition of some
> > patterns that
> > I'm working on, so would it be acceptable to teach combine or
> > simplify-rtx to do this?
> How does it help recognition?   What kinds of patterns are you trying to recognize?
> 

On aarch64 the normal integer-side variable shift/rotate instructions truncate their \
shift amount. However, we can't enable SHIFT_COUNT_TRUNCATED by default on aarch64 \
because some of the alternatives in our shift patterns use the vector shift \
instructions, which don't truncate the shift amount. So I'm trying to add a combine \
pattern to eliminate redundant truncations where possible when using the normal shift \
instructions.

So I'm trying to create a define_insn to match something like:
   [(set (match_operand:SI 0 "register_operand" "=r")
     (ashift:SI
       (match_operand:SI 1 "register_operand" "r")
       (and:QI
         (match_operand:QI 2 "register_operand" "r")
         (match_operand:QI 3 "const_int_operand" "n"))))]


where operands[3] is 31 for SImode. The 'and' expression has to be in QImode because \
our shift expanders expand the shift amount to QImode.
However, when combine tries to combine the and-mask operation with the shift, for \
example from the code: unsigned f1(unsigned x, int y) { return x << (y & 31); }

the expression it tries to match is:
(set (reg/i:SI 0 x0)
     (ashift:SI (reg:SI 0 x0 [ x ])
         (subreg:QI (and:SI (reg:SI 1 x1 [ y ])
                 (const_int 31 [0x1f])) 0)))

That is, the subreg is not propagated inside. I think the define_insn pattern for \
that would look somewhat unnatural, so I'm trying to get the subreg to be propagated \
to the reg inside.

Thanks,
Kyrill


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic