List:       gmp-devel
Subject:    Improved mpn code for Core 2
From:       jason.worth.martin () gmail ! com (Jason Martin)
Date:       2006-12-02 1:51:28
Message-ID: 98170fa20612011751o633f0c35l6e01c7d421dd43ff () mail ! gmail ! com

Hi All,

I've managed to improve the addmul_1 (and friends) mpn routines for
Core 2 processors.

My addmul_1 executes at 4.6 cycles/limb with a 4-way unroll of the
main loop and 4.3 cycles/limb with a 16-way unroll.  I believe that
this is close to optimal for the Core 2 architecture.  submul_1
behaves identically to addmul_1, and mul_1 executes at 4 cycles/limb.
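
For readers less familiar with the routine, here is a minimal reference
sketch in C of what addmul_1 computes, with a plain loop and a 4-way
unrolled variant. It is not the assembly from the patch and says nothing
about how that code reaches the cycle counts above. The names limb_t,
ref_addmul_1, ref_addmul_1_unroll4 and step are hypothetical, limbs are
assumed to be 64 bits, and unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical stand-in for GMP's mp_limb_t */

/* One limb step: *r += u * v + carry, returning the new carry limb.
   The 128-bit sum cannot overflow: (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1. */
static inline limb_t step(limb_t *r, limb_t u, limb_t v, limb_t carry)
{
    unsigned __int128 t = (unsigned __int128)u * v + *r + carry;
    *r = (limb_t)t;            /* low 64 bits go back to the result */
    return (limb_t)(t >> 64);  /* high 64 bits carry into the next limb */
}

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n > 0);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++)
        carry = step(&rp[i], up[i], v, carry);
    return carry;
}

/* Same computation with the main loop unrolled 4 ways. The carry chain
   is still sequential; unrolling only reduces loop overhead per limb. */
limb_t ref_addmul_1_unroll4(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n > 0);
    limb_t carry = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        carry = step(&rp[i],     up[i],     v, carry);
        carry = step(&rp[i + 1], up[i + 1], v, carry);
        carry = step(&rp[i + 2], up[i + 2], v, carry);
        carry = step(&rp[i + 3], up[i + 3], v, carry);
    }
    for (; i < n; i++)         /* leftover limbs when n is not a multiple of 4 */
        carry = step(&rp[i], up[i], v, carry);
    return carry;
}
```

submul_1 has the same shape with a subtraction and a borrow in place of
the addition and carry, and mul_1 simply omits the read of rp[i].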

This, together with some earlier changes to add_n and sub_n, gives a
GMPbench score of 8260 on my 2.66GHz Mac Pro, so it appears to make
quite a difference for the Core 2 architecture.

If you're interested, the code is available on my homepage:

    http://www.math.jmu.edu/~martin

For those who asked:  I've included an install routine that detects
the CPU and will only install the patches if a Core 2 CPU is found.
Hopefully this will allow you to add the patches into whatever
automatic build scripts you are using.
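
As a rough illustration of what such a check might look like (this is
not the install routine from the patch), the sketch below decodes the
family and model fields from the EAX value returned by CPUID leaf 1 and
tests for family 6, model 15, which is the signature of the first
Core 2 generation. The names cpu_sig_t, decode_cpuid_signature and
looks_like_core2 are hypothetical, and a real check would also confirm
the "GenuineIntel" vendor string.

```c
#include <stdint.h>

/* Hypothetical container for the decoded display family and model. */
typedef struct {
    unsigned family;
    unsigned model;
} cpu_sig_t;

/* Decode the EAX value returned by CPUID leaf 1 (the CPU signature). */
cpu_sig_t decode_cpuid_signature(uint32_t eax)
{
    unsigned base_model  = (eax >> 4)  & 0xF;
    unsigned base_family = (eax >> 8)  & 0xF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;
    cpu_sig_t s;

    /* The extended family is only added when the base family is 15. */
    s.family = (base_family == 0xF) ? base_family + ext_family
                                    : base_family;
    /* The extended model applies when the base family is 6 or 15. */
    s.model = (base_family == 0x6 || base_family == 0xF)
                  ? (ext_model << 4) | base_model
                  : base_model;
    return s;
}

/* Assumption: only family 6, model 15 is treated as Core 2 here. */
int looks_like_core2(uint32_t eax)
{
    cpu_sig_t s = decode_cpuid_signature(eax);
    return s.family == 6 && s.model == 15;
}
```

On x86 with GCC or Clang, the EAX value could be obtained with
__get_cpuid(1, &eax, &ebx, &ecx, &edx) from <cpuid.h>; a build script
would then apply the patches only when looks_like_core2(eax) is nonzero.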

--jason

-----------------------------------------------------------
Jason Worth Martin
Asst. Prof. of Mathematics
James Madison University
http://www.math.jmu.edu/~martin

"Ever my heart rises as we draw near the mountains.
There is good rock here." -- Gimli, son of Gloin
