[prev in list] [next in list] [prev in thread] [next in thread] 

List:       gcc
Subject:    Re: __attribute__((early_branch))
From:       Jeff Law <law () redhat ! com>
Date:       2019-04-30 19:53:05
Message-ID: 9c490839-079f-d87b-ace6-85f75a0a3258 () redhat ! com
[Download RAW message or body]

On 4/30/19 12:34 PM, cmdLP #CODE wrote:
> Hello GCC-team,
> 
> I use GCC for all my C and C++ programs. I know how to use GCC, but I am
> not a contributor to GCC (yet). I often discover some problems C and C++
> code have in general. There is often the choice between fast or readable
> code. Some code I have seen performs well but looks ugly (goto, etc.);
> other code is simple, but has some performance problems. What if we could
> make the simple code perform well?
> 
> There is a common problem with conditional branches inside loops. This can
> decrease the performance of a program. To fix this issue, the conditional
> branch should be moved outside of the loop. Sometimes this optimization is
> done by the compiler, but guessing on optimizations done by the compiler is
> really bad. Often it is not easy to transform the source code to have the
> conditional branching outside the loop. Instead I propose a new attribute,
> which forces the compiler to do a conditional branch (based on the
> annotated parameter) at the beginning of a function. It branches to the
> corresponding code of the function compiled with the value being constant.
> 
> Here is a code example, which contains this issue.
> 
>     enum reduce_op
>     {
>         REDUCE_ADD,
>         REDUCE_MULT,
>         REDUCE_MIN,
>         REDUCE_MAX
>     };
> 
>     /* performance critical function */
>     void reduce_data(enum reduce_op reduce,
>                      unsigned const* data,
>                      unsigned data_size)
>     {
>         unsigned i, result, item;
> 
>         result = reduce == REDUCE_MULT ?  1u
>                : reduce == REDUCE_MIN  ? ~0u // ~0u is UINT_MAX
>                :                          0u;
> 
>         for(i = 0; i < data_size; ++i)
>         {
>             item = data[i];
> 
>             switch(reduce)
>             {
>                 case REDUCE_ADD:
>                     result += item;
>                 break;
> 
>                 case REDUCE_MULT:
>                     result *= item;
>                 break;
> 
>                 case REDUCE_MIN:
>                     if(item < result) result = item;
>                     // RIP: result <?= item;
>                 break;
> 
>                 case REDUCE_MAX:
>                     if(item > result) result = item;
>                     // RIP: result >?= item;
>                 break;
>             }
>         }
> 
>         return result;
>     }
> 
> The value of  reduce  does not change inside the function. For this
> example, the optimization is trivial. But consider more complex examples.
> The function should be optimized to:
[ .... ]
This is loop unswitching.  It's a standard GCC optimization.  If it's
not working as well as it should, we're far better off determining why
and fixing the automatic transformation rather than relying on
attributes to drive the transformation.


> What if the variable changes?
> 
> When the variable changes there should be a jump to the corresponding
> parallel code of the compiled code with new value.
> 
> Unoptimized
> 
>     /* removes comments starting with # and ending in a newline */
>     void remove_comment(char* dst,
>                         char const* src)
>     {
>         // initialization nessecary
>         int state __attribute__((early_branch(0, 1))) = 0;
> 
>         char c;
> 
>         while(*src)
>         {
>             c = *src++;
> 
>             switch(state)
>             {
>                 case 0:
>                     if(c == '#')
>                         state = 1;
>                     else
>                         *dst++ = c;
> 
>                     break;
> 
>                 case 1:
>                     if(c == '\n')
>                     {
>                         *dst++ = '\n';
>                         state = 0;
>                     }
> 
>                     break;
>             }
>         }
>         *dst = '\0';
>     }
> 
> changed to
> 
>     void remove_comment(char* dst,
>                         char const* src)
>     {
>         char c;
> 
>         switch(0)
>         {
>             case 0:
>                 while(*src)
>                 {
>                     c = *src++;
>                     if(c == '#')
>                         goto branch1;
>                     else
>                         *dst++ = c;
>                     branch0:
>                 }
>                 break;
> 
>             case 1:
>                 while(*src)
>                 {
>                     if(*src++ == '\n')
>                     {
>                         *dst++ = '\n';
>                         goto branch0;
>                     }
>                     branch1:
>                 }
>                 break;
>         }
>         *dst = '\0';
>     }
> 
> You see that the state is not tested in each iteration of the loop(s). For
> complex cases (like parsers), this can improve performance. This attribute
> is needed to avoid such a goto usage. I have seen such code, which is
> unmaintainable but performs better than the first solution.
This is backwards jump threading which was primarily designed to help
state machines by avoiding evaluation of the switch at each iteration of
the loop and instead jumping directly to the desired target.

Again, if it's not working for a particular case we're far better off
figuring out why than relying on attributes to drive the transformation.

jeff

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic