List:       botan-devel
Subject:    [Botan-devel] Botan 1.5.4
From:       lloyd () randombit ! net (Jack Lloyd)
Date:       2006-01-30 4:10:57
Message-ID: 20060130041057.GD18609 () randombit ! net


An interesting release:

- Luca Piccarreta sent me a fair amount of x86 and x86-64 asm code, which seems
  to improve RSA performance by a very noticeable amount. He also suggested a
  nice optimization for the Montgomery reduction code (which I think was
  actually in 1.5.3, but I forgot to mention it).

- Matt Johnston made some suggestions that led to me rewriting the memory
  allocator in a much better/faster way. I'm sure the effect will not be nearly
  as noticeable in typical applications, but on my machine the test suite runs
  in about half the time it took with the old allocator, so it seems to be a
  win.

- I found some nasty bugs, in particular: sign-handling bugs all through the
  modulo and division code, an off-by-one in the new Karatsuba code, a very
  broken BigInt::sub, a same_mem that wouldn't compile (it was never used, so
  nobody noticed), and some dead code in bigint_modop that had been sitting
  there for a long time without doing anyone any good.
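
For anyone wondering what "sign handling" in division involves: a signed
divide has to pick a rounding convention and then keep quotient and remainder
consistent with it, so that a == q*b + r always holds. Below is a minimal
sketch, assuming C-style truncated division (quotient rounds toward zero,
remainder takes the dividend's sign). It is a generic illustration that works
on magnitudes the way a sign-magnitude bignum would; it is not Botan's BigInt
code, and whether Botan uses exactly this convention is an assumption.

```python
def trunc_divmod(a: int, b: int):
    """Signed division truncating toward zero, as C does for machine ints.

    Divides magnitudes only (like a sign-magnitude bignum), then
    reattaches the signs. Returns (quotient, remainder).
    """
    if b == 0:
        raise ZeroDivisionError("division by zero")
    q_mag, r_mag = divmod(abs(a), abs(b))          # both operands non-negative
    q = q_mag if (a < 0) == (b < 0) else -q_mag    # negative iff signs differ
    r = r_mag if a >= 0 else -r_mag                # remainder follows dividend
    return q, r


def check(a: int, b: int) -> None:
    """Assert the invariants a correct signed divide must satisfy."""
    q, r = trunc_divmod(a, b)
    assert a == q * b + r                  # the defining identity
    assert abs(r) < abs(b)                 # remainder smaller than divisor
    assert r == 0 or (r < 0) == (a < 0)    # chosen sign convention


for a in (7, -7, 0, 12345678901234567890, -12345678901234567890):
    for b in (3, -3, 1, -1, 2**64 + 1):
        check(a, b)

print(trunc_divmod(-7, 2))  # (-3, -1); Python's own divmod(-7, 2) is (-4, 1)
```

Mixing conventions (say, a floor-style modulo next to a truncating quotient)
quietly breaks a == q*b + r for negative inputs only, which is one plausible
way bugs like these survive tests that mostly use positive numbers.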

This release is the fastest Botan yet. On Opteron it seems to be beating
OpenSSL on RSA keys > 2048 bits (admittedly, this is OpenSSL 0.9.7something,
0.9.8 would likely crush these times, but it is a start).
  [http://botan.randombit.net/bmarks.html]

Known bug: `./check --bench-algo RW` either runs very slowly or hits an infinite
loop, not sure which. I discovered this right after release. It looks like it's
been broken for a while (OK in 1.4.11, broken in 1.5.1, haven't traced it
further yet). I suspect it's a bug in the division code that causes jacobi() to
screw up. The test suite portion for RW passes fine, so this is probably
something obscure. Repeatable on x86 and x86-64, haven't tried elsewhere. It may
take me a couple of days to get around to it. I don't even have a simple
testcase yet.
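
To show why a division bug would land in jacobi(), here is the textbook
Jacobi symbol algorithm: every quadratic-reciprocity step reduces with a
modulo, so a faulty mod/divide feeds straight into the result. This is a
minimal sketch of the standard algorithm, not Botan's implementation. It is
cross-checked against Euler's criterion, which gives the same value when the
modulus is an odd prime.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0, standard iterative algorithm."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n                      # first reduction: depends on correct modulo
    result = 1
    while a != 0:
        # Remove factors of 2, using (2/n) = -1 iff n = 3 or 5 (mod 8).
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # Quadratic reciprocity: swap; flip sign if both are 3 (mod 4).
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n                  # every step relies on the modulo routine
    return result if n == 1 else 0   # n ends as gcd(a, n); 0 if not coprime


# Cross-check against Euler's criterion for a few small odd primes.
for p in (3, 5, 7, 11, 13, 101):
    for a in range(-20, 40):
        e = pow(a % p, (p - 1) // 2, p)     # e is 0, 1, or p - 1
        expected = -1 if e == p - 1 else e
        assert jacobi(a, p) == expected

print(jacobi(1001, 9907))  # -1
```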

Likely bug(s): The memory allocator is, once again, largely new code.
Thankfully, it's now fast for both allocation and deallocation (<2% of runtime
in most profile runs, compared to 20% for deallocation alone in some
benchmarks on older releases) and space efficient, so it shouldn't need any
further redesigns, but it will no doubt have some interesting surprises for a
while.

Project idea: The amd64 code is largely a simple translation of the x86 code. I
suspect that by taking advantage of the additional registers, as well as
knowledge of the amd64 ABI, one can schedule register allocations noticeably
better than is currently being done. (Perhaps the x86 code could also be
scheduled better to account for the long multiply latencies of modern x86
machines; I haven't checked this out carefully yet.)
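
For context on the scheduling idea: bignum asm typically speeds up
word-level multiply-accumulate loops over arrays of limbs. The minimal sketch
below assumes 64-bit limbs, and the helper name mul_add_word is hypothetical
(it is not Botan's routine). It shows where the serial dependency lives: each
limb's product is independent of earlier iterations, but the additions are
chained through the carry. That leaves room to issue the next multiply before
the previous carry has resolved, which is the sort of thing better register
use and scheduling could exploit.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit limbs, as on amd64


def mul_add_word(z: list, x: list, y: int) -> int:
    """z += x * y over little-endian 64-bit limbs; returns the final carry.

    Hypothetical helper illustrating a typical bignum inner loop.
    """
    assert len(z) == len(x) and 0 <= y <= MASK
    carry = 0
    for i in range(len(x)):
        prod = x[i] * y              # independent of earlier iterations
        t = prod + z[i] + carry      # serial chain through `carry`
        z[i] = t & MASK              # low 64 bits stay in place
        carry = t >> 64              # high bits; always fits in one limb
    return carry


def to_limbs(v: int, n: int) -> list:
    return [(v >> (64 * i)) & MASK for i in range(n)]


def from_limbs(limbs: list) -> int:
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


n = 8
x_val, z_val, y = 3**200, 7**150, MASK
z, x = to_limbs(z_val, n), to_limbs(x_val, n)
carry = mul_add_word(z, x, y)
assert from_limbs(z) + (carry << (64 * n)) == z_val + x_val * y
```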

Todo: Luca also sent me Visual C++ asm for x86, but I haven't integrated it
yet. Mail me if you want to try it out, or wait until I get the time to do the
merge.

As always, compile, test, report bugs or problems. Tested on Linux/[x86, amd64,
ppc64]/gcc and Solaris/SPARC/Forte. I have heard reports of Windows breakage
WRT the last couple of releases, which I think should be at least mostly fixed
now (but it's still untested there, so let me know how it goes).

-Jack

