
List:       zlib-devel
Subject:    [Zlib-devel] To asm or not to asm, that is the question
From:       nijtmans () users ! sourceforge ! net (Jan Nijtmans)
Date:       2017-01-18 15:25:58
Message-ID: CAO1jNwsTP9fJuyz0zfi8Tv6+rGDDbzxx4BVMmZmHzi1edyE19A () mail ! gmail ! com

2017-01-16 19:41 GMT+01:00 Mark Adler:
> There have been many reports of bugs in both the deflate and inffast assembler code
> in contrib. As a result I have, for now, added warnings to deflate.c and inffast.c
> (using #pragma message) for when it is compiled with assembler code, saying you are
> using the assembler code at your own risk.

Which bugs? These?
1)
    https://github.com/madler/zlib/issues/41

    It's not clear from this bug-report, but reading the comments it
seems to be a Visual Studio build.

2)
    https://github.com/madler/zlib/issues/200

    This one claims the presence of a bug in masmx86/inffas32.asm,
again in a Visual Studio build.


I built my DLL with inflate86/inffas86.c and contrib/asm686/match.S
(the same files the official zlib 1.2.7 and 1.2.8 were built with),
using gcc (mingw-w64) rather than Visual Studio. Are there any
bug reports referring to those?

The README accompanying match.S claims ~8% speedup
on Pentium 4 and AMD64 chips (32-bit). That sounds valuable
to me, especially for legacy environments.

What benchmark can I use to verify this?
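
One rough way to measure it, as a sketch only: time deflate at the
default level (6) and at level 9, the two settings the README below
quotes numbers for, once per zlib build. The helper names make_sample
and time_deflate and the synthetic input are assumptions made for
illustration; real files would be more representative. The script
measures whichever zlib the Python interpreter is linked against
(printed at startup), so comparing an asm build with a plain C build
means running it against each build in turn, or porting the same loop
to C around compress2().

```python
import time
import zlib


def make_sample(size=1 << 20):
    """Build deterministic, compressible sample data (synthetic, an assumption)."""
    words = [b"deflate", b"inflate", b"match", b"window", b"hash", b"chain"]
    out = bytearray()
    state = 12345
    while len(out) < size:
        # Simple linear congruential generator, so every run sees the same input.
        state = (state * 1103515245 + 12345) & 0x7FFFFFFF
        out += words[state % len(words)] + b" "
    return bytes(out[:size])


def time_deflate(data, level, repeats=5):
    """Return (best wall-clock seconds, compressed size) for one level."""
    best = float("inf")
    size = 0
    for _ in range(repeats):
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        best = min(best, time.perf_counter() - start)
        size = len(packed)
    return best, size


if __name__ == "__main__":
    sample = make_sample()
    # Shows which zlib build is actually being measured.
    print("zlib runtime version:", zlib.ZLIB_RUNTIME_VERSION)
    for level in (6, 9):
        seconds, size = time_deflate(sample, level)
        print(f"level {level}: {seconds * 1000:.1f} ms, {size} bytes")
```

Taking the best of several repeats reduces noise from other processes;
the compressed size is printed so that two builds can be checked for
producing comparable output before their timings are compared.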

Thanks!
       Jan Nijtmans


====================================================
This is a patched version of zlib, modified to use
Pentium-Pro-optimized assembly code in the deflation algorithm. The
files changed/added by this patch are:

README.686
match.S

The speedup that this patch provides varies, depending on whether the
compiler used to build the original version of zlib falls afoul of the
PPro's speed traps. My own tests show a speedup of around 10-20% at
the default compression level, and 20-30% using -9, against a version
compiled using gcc 2.7.2.3. Your mileage may vary.

Note that this code has been tailored for the PPro/PII in particular,
and will not perform particularly well on a Pentium.

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox at muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory
then do:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o


Update:

I've been ignoring these assembly routines for years, believing that
gcc's generated code had caught up with it sometime around gcc 2.95
and the major rearchitecting of the Pentium 4. However, I recently
learned that, despite what I believed, this code still has some life
in it. On the Pentium 4 and AMD64 chips, it continues to run about 8%
faster than the code produced by gcc 4.1.

In acknowledgement of its continuing usefulness, I've altered the
license to match that of the rest of zlib. Share and Enjoy!

Brian Raiter
breadbox at muppetlabs.com
April, 2007

