'Hashing via multiply on 68000, H8 and Microblaze'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-m68k
Subject:    Hashing via multiply on 68000, H8 and Microblaze
From:       "George Spelvin" <linux () horizon ! com>
Date:       2016-05-14 4:14:29
Message-ID: 20160514041429.6631.qmail () ns ! horizon ! com
[Download RAW message or body]

My current project is reworking <linux/hash.h>.  The most urgent fix
went in to 4.6-rc7 as commit 689de1d6ca ("Minimal fix-up of bad hashing
behavior of hash_64()"), but there's more.

This is done by multiplication on most platforms, but there are some
where that's very inefficient.  This message is:

1. A brain dump on the subject,
2. Establishing contact with the relevant architecture maintainers
   in preparation for future patches, and
3. A solicitation for ideas.


The Linux kernel makes heavy use of the inline functions hash_32(x, bits)
and hash_64(x, bits) to compute hash table indexes from an integer, pointer,
or word-sized string hash.

hash_32() is just { return x * CONSTANT >> (32 - bits); } for some
suitably chosen 32-bit CONSTANT, and equivalently (with a larger constant)
for hash_64.  The multiply basically causes each bit of the input to
affect all more-significant bits of the output, and then those bits
are extracted for the final index.

But the current constant was chosen to be easy to compute with a
shift-and-add sequence on platforms without a good hardware multiplier,
which causes bad bit mixing, and I've been looking into the implications
of changing that.

On Sun, 1 May 2016 at 09:5, Linus Torvalds wrote:
> On Sun, May 1, 2016 at 2:43 AM, George Spelvin <linux@horizon.com> wrote:
>> * If you feel ambitious, add a 32-bit CONFIG_ARCH_HAS_SLOW_MULTIPLIER
>>   exception path.
> 
> Let's make that a separate worry, and just fix hash_64() first.
> 
> In particular, that means "let's not touch COLDEN_RATIO_32 yet". I
> suspect that when we *do* change that value, we do want the
> non-multiplying version you had.

Upon research, I've found that all 64-bit processors have decent
64x64-bit multiplier, so there's no benefit to choosing a "simpler"
constant.

As do most 32-bit processors that Linux supports.  But not all.

The current exceptions are:

* MC68000/010/328 (arch/m68k, no-mmu)
* H8 (arch/h8300)
* Microblaze (Xilinx soft-core) may be configured without multiplier

The thing about these is, they *also* don't have barrel shifters.
I have an efficient shift-and-add pattern for computing x * 0x61C88647:
	a = x + (x << 19);
	b = a + (x << 9);
	c = b + (x << 23);
	reutrn (b << 11) + (c << 6) + (a << 3) - c;
... but it involves large shifts, and without efficient support for them,
this isn't a great alternative!

The 68000 has multi-bit shifts, but they're also multi-cycle.

The H8/300 has only 1-bit shift instructions, and more recent models have
been extended with 2-bit shifts, but either way, large shifts are a PITA.

Both of these have moderately efficient 16x16->32-bit multiplies.  (On some
H8 models, the multiply is actually a practical way to do left shifts!)

Both also provide efficient access to the 16-bit halves of 32-bit values,
which can be used by large shits and rotates.

Microblaze is complicated.  Both barrel shifter and multiplier are
configurable options.  If missing, so are the corresponding instructions.
(And arch/microblaze/Kconfig.platform supports omitting both.)


Given that it's not necessary to compute the same hash function on all
platforms (as long as it's still a good hash), my current thoughts are
that "generic" fallback code is impractical and I should instead:
- Add a CONFIG_ARCH_HASH option, which works like CONFIG_ARCH_RANDOM,
  indicating that the gneric code should #include <asm/archhash.h>
- Within that, platforms may define replacement functions and #define
  HAVE_ARCH_FOO variable to suppress the standard definitions.
  (This is all conditional, as other m68k platforms with MULU.L can
  use the generic functions.)
- On 68000 and H8, I'll define
  hash32(x) = swap((x & 0xffff) * CONST1 + (x >> 16) * CONST2)
  This isn't quite as good mixing as a 32-bit multiply, but only needs
  2 (not 3) 16x16->32-bit multiplies and the swap put the most-mixed bits
  in the msbits.
- If a Microblaze has a barrel shifter (but no multiplier), I'll use the
  shift-and-add code.
- If a Microblaze does not have a barrel shifter, I don't know what
  the heck I'll do.  Any ideas?
- There are some other hash operations that depend on large shifts that
  will also take some thinking.
--
To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic