[prev in list] [next in list] [prev in thread] [next in thread]
List: linux-kernel
Subject: Re: [patch V4] lib: GCD: Use binary GCD algorithm instead of Euclidean
From: "George Spelvin" <linux () horizon ! com>
Date: 2016-05-07 22:30:18
Message-ID: 20160507223018.17111.qmail () ns ! horizon ! com
[Download RAW message or body]
Sam Ravnborg wrote:
> sparc64 have an efficient ffs implementation.
> We use run-time patching to use the proper version
> depending on the actual sparc cpu.
>
> As this is determinded at config time, then let the
> sparc cpu that has the efficient ffs benefit from this.
>
> In other words - select CPU_NO_EFFICIENT_FFS only for SPARC32.
I'm not sure this is the right thing.
It's always a function call, and there's boot-time code patching to use
either an unrolled binary search or a POPC instructon on processors that
have that instruction.
The NO_EFFICIENT_FFS isn't much slower than the __ffs version, so the
call/return alone might eat the difference, and if the CPU doesn't have
POPC support it's definitely a lose.
Quite simply, gcd isn't important enough to be worth the same boot-time
code patching, and if we have to use one on both types of CPU the the
NO_EFFICIENT_FFS path is the safer alternative in case of uncertainty.
Would you be willing to try benchmarking it? The baseline code, plus two
versions of the __ffs code using the two different __ffs implementations
(forced out of line by compiling from assembler source or using
inine asm and __attribute((noinline))).
By the way, the SPARC64 implementation could be improved.
It's currently 5 instructions:
__ffs:
neg %o0, %g1
xnor %o0, %g1, %o1
popc %o1, %o0
retl
sub %o0, 1, %o0
That could be improved to 4:
sub %o0, 1, %g1
andn %g1, %o0, %g1
retl
popc %g1, %o0
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic