List: coreutils
Subject: Re: [coreutils] added ability in sort to skip n number of lines for each file
From: Pádraig Brady <P () draigBrady ! com>
Date: 2010-11-23 16:21:07
Message-ID: 4CEBE9F3.9060306 () draigBrady ! com
On 23/11/10 15:57, Jim Hester wrote:
> Below is an updated, proper patch. It is quite a bit larger than my
> first, but should address all of the concerns from Assaf and Pádraig.
>
> My main motivation here is not just to make this common operation less
> annoying; it is mostly increased performance. I made a test
> dataset of 10 files with 3 header lines each and 500,000 lines to sort,
> then ran sort using head and tail as Pádraig suggests, and then again
> using my implemented header skip, on an 8-core machine. Larger files
> seem to show a similar speedup. I believe this speedup comes
> from the fact that the multithreaded sort is trying to read from the
> buffer faster than tail can write to it.
>
> $ time { (head -q -n 3 test[0-9] | head -n 3; tail -q -n+4 test[0-9] | ./sort -n ) > out2; }
>
> real 0m51.660s
> user 2m0.324s
> sys 0m4.115s
>
> $ time ./sort -n -l 3 test[0-9] > out
>
> real 0m31.834s
> user 2m17.775s
> sys 0m3.981s
>
> $ diff out out2

The user time for the head/tail pipeline is lower than for `sort -l`,
while its real time is higher, which suggests that
the first invocation was mostly just waiting on disk?
Could you please repeat the test using precached data?

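A minimal sketch of such a precached rerun, under stated assumptions: the files test0 .. test9 are generated here as a small hypothetical stand-in for the real data set, and the workaround pipeline uses the system `sort` rather than the patched `./sort`. Reading every file once beforehand is a common way to get the data into the page cache, though it does not guarantee the data stays cached if memory is tight. The proposed `-l` option only exists in the patch under review, so it is mentioned in a comment rather than run.

```shell
#!/bin/sh
# Hypothetical small data set standing in for the real test files:
# ten files (test0 .. test9), each with 3 header lines followed by
# 1000 random integers (far fewer than the 500,000 lines in the test).
for i in 0 1 2 3 4 5 6 7 8 9; do
  {
    printf 'header1\nheader2\nheader3\n'
    awk -v seed="$i" 'BEGIN { srand(seed); for (n = 0; n < 1000; n++) print int(rand() * 1000000) }'
  } > "test$i"
done

# Read every file once, discarding the output, so that the timed runs
# are served from the page cache rather than waiting on the disk.
cat test[0-9] > /dev/null

# The head/tail workaround from the quoted test: keep one copy of the
# header, then sort the remaining lines of all files together.
# Prefix with bash's `time` keyword to measure it; the patched
# `./sort -n -l 3 test[0-9] > out` would be timed the same way.
{ head -q -n 3 test[0-9] | head -n 3; tail -q -n +4 test[0-9] | sort -n; } > out2

# Quick look at the result: the header, the line count, and a sortedness check.
head -n 3 out2
tail -n +4 out2 | wc -l
tail -n +4 out2 | sort -n -c && echo "body is sorted"
```

Running both variants a couple of times in a row and comparing the later runs is another simple way to make sure neither is measuring cold-cache disk reads.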
Currently the threads in `sort` are passed data that is read
sequentially from the input files (otherwise `sort`
would have to start worrying about device IDs,
/sys/block/<blockdev>/queue/rotational, etc.,
so as not to thrash disk heads). That kind of
logic is probably always best kept outside of `sort`.

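For a sense of what that device-aware logic would involve, a minimal sketch (Linux only, assuming sysfs is mounted at /sys) that reports the rotational flag for each block device. The function name `list_rotational` is hypothetical. It deliberately stops short of mapping an input file to its underlying device, which needs extra handling for partitions, LVM, RAID and network filesystems.

```shell
#!/bin/sh
# Print each block device with the rotational flag the kernel exposes in
# /sys/block/<blockdev>/queue/rotational (1 = rotating disk, 0 = not).
list_rotational() {
  for q in /sys/block/*/queue/rotational; do
    # If nothing matched (e.g. inside a container), the glob stays literal
    # and the file is not readable, so skip it.
    [ -r "$q" ] || continue
    dev=${q#/sys/block/}   # strip the leading /sys/block/
    dev=${dev%%/*}         # keep only the device name
    printf '%s\t%s\n' "$dev" "$(cat "$q")"
  done
}

list_rotational
```

A wrapper outside `sort` could use this, for instance, to decide whether files on a rotating disk should be fed through a single sequential `cat` rather than read concurrently.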
cheers,
Pádraig.