List:       musl
Subject:    Re: [musl] Optimized C memset [v2]
From:       Rich Felker <dalias@aerifal.cx>
Date:       2013-08-28 1:24:33
Message-ID: 20130828012433.GZ20515@brightrain.aerifal.cx

On Wed, Aug 28, 2013 at 12:05:43PM +1200, Andre Renaud wrote:
> Hi Rich,
> 
> On 28 August 2013 04:22, Rich Felker <dalias@aerifal.cx> wrote:
> > Here's version 2 (filename version 6, in honor of glibc ;) of the
> > memset code. I fixed a bug in the logic for coverage of the tail (the
> > part past what's covered by the loop) for some values of n and
> > alignments, and cleaned up the __GNUC__ usage a bit to use less
> > #ifdeffery. The remaining test at the top for the __GNUC__ version is
> > ugly, I admit, and should possibly just be removed and replaced by a
> > configure check to add -D__may_alias__= to the CFLAGS if the compiler
> > defines __GNUC__ but does not recognize __attribute__((__may_alias__))
> > -- opinions on this?
> 
> Can you explain the algorithm a bit? I can't entirely follow the use
> of negation/masking. It looks like at the end you're doing a loop of
> 64-bit aligned writes, but I don't see how it can work if the tail
> ends at something that isn't 64-bit aligned. Is this assuming that
> unaligned writes will work ok?
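
On the __may_alias__ point quoted above, the typedefs at issue are
roughly of this form (a sketch, not a paste from the tree; the type
names are illustrative). The -D__may_alias__= idea works because the
macro then expands to nothing, leaving an empty __attribute__(()) that
GNU-style compilers simply ignore:

#include <stdint.h>

#ifdef __GNUC__
/* Word types that may alias anything, so word-at-a-time stores into
 * the destination buffer don't break strict aliasing rules. */
typedef uint32_t __attribute__((__may_alias__)) u32;
typedef uint64_t __attribute__((__may_alias__)) u64;
#endif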

See the version I committed a couple hours ago. It has comments added.
The basic thing you're missing is that the code before the loop fills
from both the beginning and the end, not just the beginning. This
allows for a really effective O(log n) branch strategy to fill n
bytes: essentially, knowing n>=k allows you to fill up to 2*k bytes:
0,1,...,k-1 and n-1,n-2,n-3,...,n-k. If n<2*k, some of these will
overlap, but it doesn't matter.
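
To make that concrete, here is a stripped-down sketch of just the
head/tail fill. The function name and exact offsets are illustrative,
not a paste of the committed file, which also aligns the pointer and
uses word-sized stores for the middle:

#include <stddef.h>

void *sketch_memset(void *dest, int c, size_t n)
{
        unsigned char *s = dest;
        size_t i;

        if (!n) return dest;

        /* n>=1: bytes 0 and n-1; full coverage for n<=2 */
        s[0] = s[n-1] = c;
        if (n <= 2) return dest;

        /* n>=3: bytes 1,2 and n-2,n-3; full coverage for n<=6 */
        s[1] = s[2] = c;
        s[n-2] = s[n-3] = c;
        if (n <= 6) return dest;

        /* n>=7: bytes 3..6 and n-7..n-4; full coverage for n<=14 */
        s[3] = s[4] = s[5] = s[6] = c;
        s[n-4] = s[n-5] = s[n-6] = s[n-7] = c;
        if (n <= 14) return dest;

        /* Fill the middle; the real code does this with aligned
         * 32/64-bit stores instead of a byte loop. */
        for (i = 7; i < n-7; i++) s[i] = c;
        return dest;
}

Each branch roughly doubles the guaranteed coverage (2, 6, 14 bytes)
before falling through, which is where the O(log n) number of branches
comes from.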

Rich
