
List:       privoxy-developers
Subject:    Re: [privoxy-devel] speeding up privoxy
From:       Fabian Keil <fk@fabiankeil.de>
Date:       2011-01-03 18:50:25
Message-ID: 20110103195025.26698219@r500.local

Lee <ler762@gmail.com> wrote:

> On 1/2/11, Fabian Keil <fk@fabiankeil.de> wrote:
> > Lee <ler762@gmail.com> wrote:
> >
> >> On 1/1/11, Fabian Keil <fk@fabiankeil.de> wrote:
> >> > Lee <ler762@gmail.com> wrote:
> >> >
> >> >> On 12/31/10, Fabian Keil <fk@fabiankeil.de> wrote:
> >> >> > Lee <ler762@gmail.com> wrote:
> >
> >> > It would probably be worth investigating why using curl without Privoxy
> >> > can
> >> > be about twice as fast. I don't think the difference has to be that big.
> >>
> >> I'd guess about a 2X difference if you're doing everything on the same
> >> machine:
> >>   Without privoxy, curl gets the data & sends the data.
> >>   With privoxy, curl gets the data & sends it to privoxy which gets
> >> the data & sends the data.
> >
> > You're right. I suspected the test was still somewhat network-bound,
> > when apparently it's not. However we still may be able to reduce the
> > difference by reducing Privoxy's CPU use (when only passing the data
> > through), by using multiple buffers in chat() to make it more zero-copy
> > friendly.
> 
> How do you do multiple buffers?
> I added a
> #define TCP_BUFFER_SIZE 23361
> in project.h and changed chat in jcc.c to
>    char buf[TCP_BUFFER_SIZE];
> since it seemed like chat was the only place that would really benefit
> from a larger buffer size.  But if it's possible to use multiple
> buffers in chat, maybe bigger isn't better?

Quoting FreeBSD's zero_copy(9):

| For sending data, there are no special requirements or capabilities that
| the sending NIC must have.  The data written to the socket, though, must
| be at least a page in size and page aligned in order to be mapped into
| the kernel.  If it does not meet the page size and alignment constraints,
| it will be copied into the kernel, as is normally the case with socket
| I/O.
|
| The user should be careful not to overwrite buffers that have been written
| to the socket before the data has been freed by the kernel, and the
| copy-on-write mapping cleared.  If a buffer is overwritten before it has
| been given up by the kernel, the data will be copied, and no savings in
| CPU utilization and memory bandwidth utilization will be realized.

Currently we use a single buffer in chat(), so it's very likely
that it gets overwritten before the kernel has given it up.
Given that the buffer is on the stack, the alignment may be
another issue.

The solution is probably to simply allocate multiple buffers
in chat() with the proper alignment and not to reuse them
until we assume (or know) that the kernel is done with them.
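
A rough, untested sketch of what I mean (the buffer count, the
pages-per-buffer value and the posix_memalign()/sysconf() calls are
just assumptions on my part, not anything Privoxy currently uses):

  #include <stdlib.h>  /* posix_memalign() */
  #include <unistd.h>  /* sysconf()        */

  #define ZERO_COPY_BUFFERS 4 /* Arbitrary; see the SO_SNDBUF note below. */

  /*
   * A small ring of page-aligned buffers that are handed out in
   * turn, so a buffer isn't overwritten until the kernel has
   * (hopefully) released its copy-on-write mapping.
   */
  struct buffer_ring
  {
     char  *buffer[ZERO_COPY_BUFFERS];
     size_t size;
     int    next;
  };

  static int buffer_ring_init(struct buffer_ring *ring, size_t pages_per_buffer)
  {
     const long page_size = sysconf(_SC_PAGESIZE);
     int i;

     ring->size = (size_t)page_size * pages_per_buffer;
     ring->next = 0;

     for (i = 0; i < ZERO_COPY_BUFFERS; i++)
     {
        /* Page-aligned so the kernel can map the buffer instead of copying it. */
        if (posix_memalign((void **)&ring->buffer[i], (size_t)page_size, ring->size))
        {
           return -1;
        }
     }

     return 0;
  }

  /* Returns the buffer to use for the next read/write pair. */
  static char *buffer_ring_next(struct buffer_ring *ring)
  {
     char *buf = ring->buffer[ring->next];

     ring->next = (ring->next + 1) % ZERO_COPY_BUFFERS;

     return buf;
  }

chat() would then fetch the next buffer from the ring before each
read instead of reusing the single stack buffer.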

Quoting FreeBSD's zero_copy(9) again:

| The socket(2) API does not really give the user any indication of when
| his data has actually been sent over the wire, or when the data has been
| freed from kernel buffers.  For protocols like TCP, the data will be kept
| around in the kernel until it has been acknowledged by the other side; it
| must be kept until the acknowledgement is received in case retransmission
| is required.
|
| From an application standpoint, the best way to guarantee that the data
| has been sent out over the wire and freed by the kernel (for TCP-based
| sockets) is to set a socket buffer size (see the SO_SNDBUF socket option
| in the setsockopt(2) manual page) appropriate for the application and
| network environment and then make sure you have sent out twice as much
| data as the socket buffer size before reusing a buffer.  For TCP, the
| send and receive socket buffer sizes generally directly correspond to the
| TCP window size.

Of course it's not clear that it's worth it anyway, but implementing
and testing it shouldn't be too much work either.
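
Sizing the ring from the socket's actual send buffer could then look
roughly like this (again untested, and whether we want to call
getsockopt() there at all is an open question):

  #include <sys/types.h>
  #include <sys/socket.h>

  /*
   * Following the zero_copy(9) advice: don't reuse a buffer until at
   * least twice the send buffer size has been written since it was
   * last handed out. Returns the number of ring buffers needed for
   * buffers of buffer_size bytes, falling back to 2 on error.
   */
  static int buffers_needed(int sfd, size_t buffer_size)
  {
     int sndbuf = 0;
     socklen_t optlen = sizeof(sndbuf);

     if (getsockopt(sfd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &optlen) != 0
        || sndbuf <= 0)
     {
        return 2;
     }

     /* Enough buffers to cover 2 * SO_SNDBUF bytes, rounded up. */
     return (int)((2 * (size_t)sndbuf + buffer_size - 1) / buffer_size);
  }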

Currently, enabling kern.ipc.zero_copy.send slows the transfer through
Privoxy down by about 25%, while curl without Privoxy seems to perform
about the same. Again, this wasn't a proper benchmark and I only tested
on localhost; the results could be completely different with a real
network card involved.

Anyway, a reasonable first goal would probably be not to perform
worse with kern.ipc.zero_copy.send enabled.

> >> Another one of my suspicions is that using a buffer size that's a
> >> multiple of 1460 bytes in chat would be a little bit faster.  For me,
> >> at home, TCP/IP has a max packet size of 1500 bytes; subtract 40 bytes
> >> for IP and TCP headers and you've got 1460 bytes.  I added a log msg
> >> in read & write_socket and almost all the network reads were multiples
> >> of 1460 bytes, so having a buffer that's a multiple of 1460 seems like
> >> the Thing To Do.
> >
> > I'm not sure why it should matter as long as the buffer
> > is large enough not to cause fragmentation. While unused
> > buffer space could be considered a waste, it's probably
> > not significant enough to affect performance.
> 
> I was trying to reduce the number of calls to read_socket.
> If the server's sending data in chunks of 1460 bytes & the buffer size
> is a multiple of 1460 bytes there's no left-over bit to read at the
> end.  ^shrug^ dunno how much an 'extra' call to read_socket costs, but
> it was easy enough to change the chat buffer size to be a multiple of
> 1460..

It's still not obvious to me why a buffer size of, for example,
3*1460 bytes would cause fewer read_socket() calls than a buffer
size of 3*1460 bytes + X bytes.
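
If we want numbers instead of guesses, counting the reads for both
buffer sizes should settle it. Something like this (plain read(2),
not Privoxy code, and only meant for measuring):

  #include <sys/types.h>
  #include <unistd.h>  /* read() */

  /*
   * Drains a socket with the given buffer size and counts the read()
   * calls, so e.g. 3*1460 can be compared against 3*1460 + X directly.
   * Returns the number of reads, or -1 on error.
   */
  static long count_reads(int fd, char *buf, size_t buffer_size)
  {
     long reads = 0;
     ssize_t ret;

     while ((ret = read(fd, buf, buffer_size)) > 0)
     {
        reads++;
     }

     return (ret == 0) ? reads : -1;
  }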

> > Of course we'll never know until somebody benchmarks it.
> 
> I'm more interested in how much
>   buffer_and_filter_content && !socket_is_still_alive(csp->cfd)
> impacts downloading html.  I'd set buffer-limit to 8192 in config.txt
> a while back so I'm kind of curious how much buffer_and_filter_content
> && !socket_is_still_alive costs me vs. 0 && !socket_is_still_alive

For the average website the buffer-limit change to 8192 KB
probably doesn't matter. I agree that benchmarking with buffered
content would be useful, too, though.

Maybe it makes sense to only check csp->cfd every X bytes.
We could additionally skip the check unless there are Y bytes
left to receive from the server, as we don't gain anything
by the check if we detect the dead client socket after the
whole content is already on the wire.

Depending on what values for X and Y (or ways to calculate them)
we come up with, we may end up skipping the check most of the time.
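
Roughly like this (untested; X, Y, the counters and the helper name
are all made up for illustration, only socket_is_still_alive() and
csp->cfd already exist):

  #define CLIENT_CHECK_INTERVAL  (64 * 1024) /* X: recheck at most every 64 KB. */
  #define CLIENT_CHECK_MIN_LEFT  (32 * 1024) /* Y: skip the check near the end. */

  /*
   * Decides whether chat()'s receive loop should bother calling
   * socket_is_still_alive(csp->cfd) right now. bytes_left is how much
   * content we still expect from the server (0 if unknown),
   * bytes_since_check is what we received since the last check.
   */
  static int client_check_is_due(unsigned long bytes_since_check,
                                 unsigned long bytes_left)
  {
     if (bytes_since_check < CLIENT_CHECK_INTERVAL)
     {
        return 0;
     }

     if (bytes_left != 0 && bytes_left < CLIENT_CHECK_MIN_LEFT)
     {
        /* Nearly done anyway, detecting a dead client now gains nothing. */
        return 0;
     }

     return 1;
  }

chat() would then only call socket_is_still_alive(csp->cfd) when this
returns 1 and reset the byte counter afterwards.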

Fabian

["signature.asc" (application/pgp-signature)]

------------------------------------------------------------------------------
Learn how Oracle Real Application Clusters (RAC) One Node allows customers
to consolidate database storage, standardize their database environment, and, 
should the need arise, upgrade to a full multi-node Oracle RAC database 
without downtime or disruption
http://p.sf.net/sfu/oracle-sfdevnl

_______________________________________________
Ijbswa-developers mailing list
Ijbswa-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ijbswa-developers


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic