List:       nettle-bugs
Subject:    ECC status update for March
From:       nisse () lysator ! liu ! se (Niels Möller)
Date:       2013-04-04 8:33:59
Message-ID: nn8v4yelu0.fsf () stalhein ! lysator ! liu ! se

Nettle project funded by Internetfonden

Status update for March 2013

* Summary

ARM optimizations of cryptographic primitives.

* Activities

Most of the time in March has been spent on ARM assembly
implementations of various cryptographic primitives. The portability of
the new ECC code has been improved, with contributions from Martin
Storsjö. In the process, the non-ARM code and the testsuite have also
seen some improvements.

The GNU GMP library is used by Nettle for general arithmetic on large
integers. For example, each multiplication of two ECC coordinates is
done as a call to a general multiplication routine in GMP, followed by
a call to an optimized curve-specific function for reducing the
product modulo a fixed prime p. During the month, some work has been
done to make more low-level operations in GMP available for users of
the library. The functions mpn_cnd_add_n and mpn_cnd_sub_n, for
side-channel silent conditional addition and subtraction, are now
available and documented, as well as some functions for easy
conversion between GMP's higher-level "mpz" interface and the
lower-level "mpn" interface.
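
To illustrate what a conditional subtraction without data-dependent
branches can look like, here is a minimal C sketch. It is not GMP's
mpn_cnd_sub_n: the name sketch_cnd_sub_n, the fixed 32-bit limb type
and the masking approach are assumptions made for this example, and
whether a given compiler keeps the generated code branch-free would
have to be checked separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only. If cnd is nonzero,
   computes rp = ap - bp over n 32-bit limbs (least significant limb
   first); otherwise copies ap to rp. Returns the borrow out (0 or 1).
   The same sequence of operations runs in both cases; only the value
   of the mask differs. */
uint32_t
sketch_cnd_sub_n (uint32_t cnd, uint32_t *rp, const uint32_t *ap,
                  const uint32_t *bp, size_t n)
{
  /* All ones if cnd is nonzero, all zeros otherwise. */
  uint32_t mask = -(uint32_t) (cnd != 0);
  uint32_t borrow = 0;
  size_t i;

  assert (n > 0);
  for (i = 0; i < n; i++)
    {
      uint32_t a = ap[i];
      uint32_t b = bp[i] & mask;      /* bp[i] or zero */
      uint32_t d = a - b;
      uint32_t borrow1 = a < b;       /* borrow from a - b */
      uint32_t r = d - borrow;
      uint32_t borrow2 = d < borrow;  /* borrow from the incoming borrow */
      rp[i] = r;
      borrow = borrow1 | borrow2;     /* at most one of them is set */
    }
  return borrow;
}
```

A typical use of such a primitive is a modular addition: add the two
operands, then subtract the modulus with cnd set from the carry and
comparison result, so that the memory access pattern and instruction
sequence do not depend on the secret values.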

On the GMP side, Torbjörn Granlund has been improving the ARM code.
Relinking Nettle code with the development version of GMP also gives
nice performance improvements.

During March, 103 working hours have been spent on the project.

* Results

The "normal" ARM instruction set offers 16 general purpose registers,
of 32 bits each. Some ARM processors, including the Cortex-A9, also
offer an extension for single-instruction-multiple-data (SIMD). The
ARM "Neon" instruction set is in some ways similar to the SSE2
instructions available on current x86_64 processors, but well designed
and much easier to work with. It offers 16 additional registers of 128
bits each, with both integer operations on up to 64-bit quantities,
and floating point operations (which are not used in Nettle).

Using Neon instructions gives a dramatic speedup for the SHA512 and
SHA3 hash functions, which make heavy use of 64-bit operations. The
ARM assembly code for the other algorithms doesn't make any use of Neon,
but these algorithms have nevertheless been sped up compared to the C
implementation, due to better register allocation, and tricks to use
aligned reads for possibly unaligned input data.
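
The idea of using aligned reads for unaligned input can be sketched in
C as follows. This is not the Nettle assembly code, only an
illustration under stated assumptions: the function name
sketch_load_unaligned is made up, the byte stream is assumed to be
stored little-endian in an array of aligned 32-bit words, and the
caller is assumed to guarantee that the following word exists whenever
the offset is not a multiple of four.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only. Returns the 32-bit
   little-endian word that starts at byte position `offset` of a
   stream stored in aligned 32-bit words, where byte i of the stream
   occupies bits 8*(i%4) and up of words[i/4]. Only whole aligned
   words are loaded; the result is assembled with shifts. */
uint32_t
sketch_load_unaligned (const uint32_t *words, size_t offset)
{
  size_t i = offset / 4;
  unsigned shift = 8 * (unsigned) (offset % 4);
  uint32_t lo = words[i];

  /* Already aligned: one load is enough. This case is handled
     separately because shifting a 32-bit value by 32 is undefined
     in C. */
  if (shift == 0)
    return lo;

  /* Low bytes come from the top of the first word, high bytes from
     the bottom of the next word. */
  return (lo >> shift) | (words[i + 1] << (32 - shift));
}
```

In a loop over consecutive words, the second load of one iteration can
be reused as the first load of the next, so each aligned word is read
only once.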

There is certainly room for additional optimizations to the assembly
code, in particular improving instruction scheduling.

* Benchmarks

For these cryptographic primitives, the numbers are in units of
MByte/s, benchmarked on a 1GHz ARM Cortex-A9.

Algorithm             Before    After  Speedup

  memxor (aligned)       988     1906      93%
  memxor (unaligned)     511      638      25%

  aes-128               17.3     21.9      27%
  aes-192               14.5     16.7      15%
  aes-256               12.5     16.1      29%

  salsa20               39.9     58.1      46%

  sha1                  55.7     60.7       9%
  sha256                24.6     31.7      29%
  sha512                 7.8     30.4     290%

  sha3-224              5.77     27.5     377%
  sha3-256              5.45     26.0     377%
  sha3-384              4.18     20.0     378%
  sha3-512              2.90     13.9     379%

Public key operations:

           name size   sign/ms         verify/ms
            rsa 1024    0.5014 ( +90%)    9.1323 (+101%)
            rsa 2048    0.0835 (+113%)    2.6942 (+116%)
            dsa 1024    0.9857 (+110%)    0.5026 (+111%)
          ecdsa  192    1.4312 ( +16%)    0.5530 ( +28%)
          ecdsa  224    1.0072 ( +18%)    0.4037 ( +31%)
          ecdsa  256    0.7846 ( +25%)    0.3094 ( +38%)
          ecdsa  384    0.3308 ( +31%)    0.1307 ( +49%)
          ecdsa  521    0.1823 ( +38%)    0.0719 ( +60%)
ecdsa (openssl)  224    0.1842            0.1545
ecdsa (openssl)  384    0.0695            0.0590
ecdsa (openssl)  521    0.0261            0.0216

The improvements here are due to GMP work.


* Remaining tasks

The most important remaining task is documentation and release work.
The plan and progress for the release are maintained at
http://www.lysator.liu.se/~nisse/nettle/plan.html.

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

