
List:       beowulf
Subject:    TCP 2.2.12 patch improves short message performance
From:       Josip Loncaric josip () icase ! edu
Date:       1999-10-28 15:49:52

Hello,

Clusters using MPI require high TCP performance.  I get significant
improvements in TCP performance when streaming short messages by using a
mixed acknowledgment strategy that sends immediate ACKs with probability
1/8 and delayed ACKs with probability 7/8.  Within our cluster ONLY, we
also use faster timeouts than standard TCP.  Both modifications apply
only to sockets with the TCP_NODELAY option set, such as MPI traffic
within the cluster.

My patch is available at:

 http://www.icase.edu/~josip/tcp-patch-for-2.2.12

with more detail about the behavior of this kernel modification provided
at

 http://www.icase.edu/coral/LinuxTCP2.html

Before you install this patch, I highly recommend obtaining the latest
network card driver. I use a slightly modified version of Donald
Becker's tulip.c:v0.91m driver.  I also find that high network loads
present a problem for SMP machines.  Our uniprocessor boxes run more
reliably and also have lower TCP latency than our dual CPU machines (63
microseconds vs. 100 microseconds).  This appears to be an SMP kernel
issue.  On our dual CPU nodes we found it necessary to use network cards
capable of interrupt mitigation.  Tulip.c:v0.91m does this for
21143-based cards by setting register CSR11=0x45240000, but we get more
responsive network performance by using CSR11=0x8b240000.

To use my TCP fix, do the following:

(1) patch your copy of Linux kernel 2.2.12
(2) build and install the patched kernel; its version will be 2.2.12-tcpfix
(3) add the following at the end of your /etc/rc.d/rc.local:

    if [ -f /proc/sys/net/ipv4/tcp_delack_strategy ]; then
       echo 3 >/proc/sys/net/ipv4/tcp_delack_strategy
    fi
    if [ -f /proc/sys/net/ipv4/tcp_faster_timeouts ]; then
       echo 1 >/proc/sys/net/ipv4/tcp_faster_timeouts
    fi

(4) reboot with the patched kernel
(5) run applications that use the TCP_NODELAY socket option
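
After step (4), the settings can be checked with something like the
sketch below.  The function name check_tcpfix and its optional directory
argument are hypothetical helpers and not part of the patch; only the
two file names and the values 3 and 1 come from the rc.local fragment
above.

```shell
# check_tcpfix [DIR]: report whether the two sysctl files from step (3)
# exist under DIR and hold the values written by rc.local (3 and 1).
# DIR defaults to /proc/sys/net/ipv4; the argument exists only so the
# function can be tried against a scratch directory (an assumption made
# for this sketch, not something the patch provides).
check_tcpfix() {
    dir=${1:-/proc/sys/net/ipv4}
    status=0
    for pair in tcp_delack_strategy:3 tcp_faster_timeouts:1; do
        name=${pair%%:*}    # file name before the colon
        want=${pair##*:}    # expected value after the colon
        if [ ! -f "$dir/$name" ]; then
            echo "$name: missing (patched kernel not running?)"
            status=1
        elif [ "$(cat "$dir/$name")" != "$want" ]; then
            echo "$name: $(cat "$dir/$name") (expected $want)"
            status=1
        else
            echo "$name: $want (ok)"
        fi
    done
    return $status
}
```

Running check_tcpfix with no argument inspects the live /proc entries.
Per step (2), uname -r should also report 2.2.12-tcpfix when the patched
kernel is the one that booted.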

Brief summary of what this patch does:

For cluster applications and MPI use, rapid exchange of short messages
is a priority.  Unfortunately, TCP in the plain Linux 2.2.12 kernel
still exhibits repeated deadlocks when streaming short messages.  These
deadlocks are resolved by a timer at 10 millisecond intervals, which is
about 1000 times slower than it should be.  In this regime, network
performance collapses by several orders of magnitude.  The mixed ACK
strategy speeds up recovery from the repeated-deadlock regime without
the computational cost of sending every ACK immediately.
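
To give a feel for the 1/8 versus 7/8 split, here is a rough user-space
tally of the two ACK decisions.  It is only an illustration: the
function name simulate_acks and the use of awk's random number generator
are assumptions made for this sketch, and it does not reproduce how the
patched kernel actually makes the choice.

```shell
# simulate_acks N [SEED]: for N received segments, pick "immediate" with
# probability 1/8 and "delayed" otherwise, then print the two totals.
# Illustration only; the real decision is made inside the kernel patch.
simulate_acks() {
    awk -v n="$1" -v seed="${2:-1}" 'BEGIN {
        srand(seed)
        immediate = 0
        delayed = 0
        for (i = 0; i < n; i++) {
            # int(rand() * 8) is uniform over 0..7, so 0 occurs 1/8 of the time
            if (int(rand() * 8) == 0)
                immediate++
            else
                delayed++
        }
        printf "immediate=%d delayed=%d\n", immediate, delayed
    }'
}
```

For example, simulate_acks 8000 should report roughly 1000 immediate
ACKs, so most segments still take the cheaper delayed path while an
immediate ACK turns up about once in every eight segments.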

Also, TCP was developed for regular Ethernet and Internet use.  Some of
the timeout values in standard TCP are far too long for clusters using
switched Fast Ethernet.  Within such clusters ONLY, it makes sense to
allow TCP to use faster timeouts.

Sincerely,
Josip

-- 
Dr. Josip Loncaric, Senior Staff Scientist        mailto:josip@icase.edu
ICASE, Mail Stop 132C                       http://www.icase.edu/~josip/
NASA Langley Research Center             mailto:j.loncaric@larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134
