
List:       microblaze-uclinux
Subject:    Re: [microblaze-uclinux] TCP transmit performance
From:       "Falk Brettschneider" <falk.brettschneider@gmx.de>
Date:       2008-05-13 21:18:58
Message-ID: 20080513211858.135320@gmx.net

Hi,
hmm... I also optimised my user application and put some more kernel code into BRAM, so I cannot cleanly separate how much gain each change contributes.

What I can compare directly is the time spent in xenet_FifoSend(), which dropped from 600us to about 250us: no extra skb allocation, no memcpy, a new assembler checksum routine, and just one call of skb_free() instead of two.
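
In outline the slimmed-down path looks like this. This is only a minimal sketch of the idea, not the real driver code; fifo_write() and TX_FIFO are hypothetical stand-ins for the actual EMAC FIFO access code, and the checksum is assumed to have been filled in already by the assembler routine:

    #include <linux/skbuff.h>

    extern void fifo_write(void *fifo, const void *buf, unsigned int len); /* hypothetical */
    extern void *TX_FIFO;                                                  /* hypothetical */

    static int fifo_send_sketch(struct sk_buff *skb)
    {
        /* Push the skb payload straight into the hardware FIFO:
         * no second skb allocation, no intermediate memcpy. */
        fifo_write(TX_FIFO, skb->data, skb->len);

        /* A single free instead of two, since no copy was made. */
        dev_kfree_skb(skb);
        return 0;
    }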

A memcpy (done by the latest assembler version) of one packet (1460 bytes, src 32-bit-aligned, dst 16-bit-aligned) now takes 150us here; that works out to roughly 9.7 MB/s of effective copy bandwidth. Memmove() should take about as long, and that one has been cut down in FifoRecvHandler().

My user app relies on the Nagle algorithm to reduce the number of ACK responses and calls the socket function send() three times, with 220KB, 80KB and 10KB one after another. Sending that takes 320ms here, roughly 970 KB/s. Before, the same amount of data took 670ms, but partly because of many more calls of send() with much smaller portions, which added thread-switching time on top.
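
The batching pattern is nothing more than this; a small sketch where sock, data and the chunk sizes are illustrative (Nagle is on by default, i.e. TCP_NODELAY is not set):

    #include <sys/types.h>
    #include <sys/socket.h>

    /* Loop until the whole buffer has been handed to the kernel;
     * send() may accept less than requested on each call. */
    static void send_all(int sock, const char *buf, size_t len)
    {
        while (len > 0) {
            ssize_t n = send(sock, buf, len, 0);
            if (n < 0)
                return;            /* error handling elided */
            buf += n;
            len -= (size_t)n;
        }
    }

    /* Three large portions instead of many small ones: */
    /* send_all(sock, data,            220 * 1024); */
    /* send_all(sock, data + 225280,    80 * 1024); */
    /* send_all(sock, data + 307200,    10 * 1024); */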

Each call of send() is internally processed on the fly in tcp_sendmsg(), which works in a loop. Each loop cycle allocates a packet management struct (skb) plus a 1460-byte-userdata-plus-header buffer, copies the data into it (150us), queues the packet and tries to send it. About every second attempt leads to an actual transmission. The average time of such a loop cycle is 1100us, of which xenet_FifoSend() is 250us. I've seen that my PC answers every second packet with an ACK, which is additionally processed in the RX soft-IRQ handler. This handler also cleans up sent packets and intermittently skb_free's about 30 of them, which then stalls sending for about 5ms, since freeing takes about 150us per packet (30 x 150us = 4.5ms). This freeing seems to target the packets that were cloned for possible retransmissions.
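
Schematically, the cycle described above looks like this. This is simplified pseudocode of what I observed, not the actual kernel source; queue_and_push() is a stand-in for the queueing/transmit logic:

    /* Pseudocode of the tcp_sendmsg() copy loop as seen from here. */
    while (left > 0) {
        size_t chunk = min(left, 1460);                  /* one MSS-sized portion */
        skb = alloc_skb(chunk + header_room, GFP_KERNEL);
        copy_from_user(skb_put(skb, chunk), up, chunk);  /* the ~150us memcpy */
        queue_and_push(sk, skb);   /* queued; only ~every 2nd push transmits */
        up   += chunk;
        left -= chunk;
    }                              /* whole cycle averages ~1100us on this board */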

You can see that most of the time the EMAC is idle and Linux is doing packet management and dynamic memory alloc/free. I know my EMAC FPGA module has a FIFO of 8KB, and I wish I knew how to set a larger packet size (i.e. a larger MTU). That way the number of packets would be smaller and the management overhead per portion of user data would shrink proportionally. So looking at the EMAC workload I could still imagine doubling the TCP/IP send throughput.
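
For what it's worth, the generic way to raise the MTU from userspace is "ifconfig eth0 mtu 8000" or the standard SIOCSIFMTU ioctl sketched below; whether the EMAC driver and the FPGA core actually accept a jumbo MTU is exactly the open question here, and "eth0"/8000 are only illustrative values:

    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <unistd.h>

    /* Ask the driver for a larger MTU; fails if unsupported. */
    static int set_mtu(const char *ifname, int mtu)
    {
        struct ifreq ifr;
        int rc;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            return -1;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_mtu = mtu;
        rc = ioctl(fd, SIOCSIFMTU, &ifr);
        close(fd);
        return rc;
    }

    /* e.g. set_mtu("eth0", 8000); */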

CU, F@lk

___________________________
microblaze-uclinux mailing list
microblaze-uclinux@itee.uq.edu.au
Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/


