
List:       linux-smp
Subject:    IRQ affinity for network IRQs on x86-64 (and IA64) SMP platforms
From:       John Lumby <johnlumby () hotmail ! com>
Date:       2008-09-18 2:53:14
Message-ID: BAY137-W42C60EAE538F55F45CD5D4A34F0 () phx ! gbl


I am interested in investigating how to distribute the softirq work from a network NIC
across multiple processor cores on an x86-64 machine - this particular one has two
dual-core AMD Opteron 275 processors and two Broadcom gigabit NICs.  More generally,
wherever the number of cores is a multiple of the number of NICs, I would like to be able
to distribute the IRQs of each NIC over that multiple of cores.
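
Just to make the mask arithmetic concrete, the interleaved assignment I have in mind looks
roughly like this Python sketch (the core count and NIC ordering here are simply those of
this particular box, and the output format is arbitrary):

    # Rough sketch only: interleave the 4 cores across the 2 NICs so that
    # each NIC ends up with a 2-core affinity mask (binary 0101 and 1010).
    num_cores = 4
    nics = ["eth3", "eth5"]        # ordering is just for illustration
    for i, nic in enumerate(nics):
        mask = 0
        for cpu in range(i, num_cores, len(nics)):
            mask |= 1 << cpu
        print("%s: smp_affinity mask = %02x" % (nic, mask))
    # prints  eth3: smp_affinity mask = 05  and  eth5: smp_affinity mask = 0a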

The background is that I am running a network-intensive bidirectional workload between two
of these machines, using a single bonded IP interface on each machine, interconnected by a
switch.  Each bond consists of the two gigabit interfaces running full-duplex, with
multiple sessions each establishing connections between these two IP endpoints.  I am
seeing that:
    .  total network throughput through each bond (aggregated over send and receive) is
       around 2660 Megabits/sec, which is rather less than the network is capable of
       (CPU power permitting, it is capable of somewhere nearer 3950 Megabits/sec)
    .  overall CPU utilization is only around 85%, so there is some to spare ...
    .  ... but /proc/stat shows that the CPU utilization is very uneven over the 4 cores,
       with all the softirq processing confined to two cores (see the small /proc/stat
       sketch just below)
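
(As an aside, the per-core softirq time I refer to can be pulled out of /proc/stat with
something like the following rough Python snippet; it assumes the standard field layout
where softirq is the 8th field of each cpuN line:)

    # Print the accumulated softirq jiffies per CPU from /proc/stat.
    # Field layout assumed: cpuN user nice system idle iowait irq softirq ...
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("cpu") and not line.startswith("cpu "):
                fields = line.split()
                print("%s: softirq jiffies = %s" % (fields[0], fields[7]))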

I believe that for this workload, the network throughput would increase to around
3000 Megabits/sec if the softirq load could be spread evenly over all 4 cores.

I switched off the irqbalance daemon and then tried altering the
/proc/irq/<irq>/smp_affinity files myself manually for the two IRQs (one for each NIC),
specifying two cores for each one, e.g. 05 for irq 225 and 0a for irq 201.  At the time,
the machine was running a 2.6.16 kernel.  The result was no distribution at all.  That is,
for each NIC, as reported in /proc/interrupts, all interrupts were being directed to a
single core - which was the "first" (in the little-endian sense) of the bits in my
smp_affinity mask.  The second bit was ignored.
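
(What I did by hand with echo amounts to roughly the following Python sketch; the IRQ
numbers and masks are just the ones from this experiment, and smp_affinity expects a hex
CPU mask without a leading 0x:)

    # Write a hex CPU mask into each NIC IRQ's smp_affinity file (needs root).
    # IRQ numbers and masks here are just the ones I happened to use.
    masks = {225: "05", 201: "0a"}
    for irq, mask in masks.items():
        with open("/proc/irq/%d/smp_affinity" % irq, "w") as f:
            f.write(mask)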


I then came across the file /Documentation/ia64/IRQ-redir.txt in the kernel source tree,
which documents this behaviour for ia64 (but I don't see anything saying this is also the
case on x86-64).  It says:

   "Because of the usage of SAPIC mode and physical destination mode the IRQ target \
is one particular CPU and cannot be a mask of several CPUs. Only the first non-zero \
bit is taken into account."

OK - so that is exactly what I saw (on 2.6.16).

Here is a clip of /proc/interrupts showing my two NICs after a run on 2.6.16
            CPU0        CPU1        CPU2        CPU3
217:     2828591      551570    14406281     2734679   IO-APIC-level  eth5
225:    18986626           0     2643626          14   IO-APIC-level  eth3

(Note - I know the ratios are not all:0 - I had been experimenting with different
masks - and I don't see any way of resetting the counters.)
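
(Since the counters cannot be reset, what works well enough is a before/after diff of
/proc/interrupts around a run; a rough Python sketch of that is below, assuming the usual
layout of one CPU header row followed by "irq: count count ... type device" rows:)

    import time

    # Take a snapshot of per-CPU interrupt counts from /proc/interrupts.
    def snapshot():
        counts = {}
        with open("/proc/interrupts") as f:
            f.readline()                    # header row: CPU0 CPU1 ...
            for line in f:
                parts = line.split()
                if not parts or not parts[0].endswith(":"):
                    continue
                vals = []
                for tok in parts[1:]:
                    if not tok.isdigit():
                        break
                    vals.append(int(tok))
                counts[parts[0].rstrip(":")] = vals
        return counts

    before = snapshot()
    time.sleep(60)                          # run the workload during this window
    after = snapshot()
    for irq, new in after.items():
        old = before.get(irq, [0] * len(new))
        print("%s: %s" % (irq, [n - o for n, o in zip(new, old)]))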


I then upgraded the kernel to 2.6.26.5 and tried again, and now I see something different.
With the same masks (05, 0a) I see that, for each NIC, IRQs are now distributed over the
two cores I specify in the mask - but not evenly.  The ratio is around 7:1.  This is
better than all:0 and raises the throughput from 2660 Mbits/sec to over 2810 Mbits/sec
with no other changes.

Here is a clip of /proc/interrupts showing my two NICs after a run on 2.6.26
            CPU0        CPU1        CPU2        CPU3
 24:         144     1145810           0      612858   IO-APIC-fasteoi  eth5
 25:       83517           7      575415      849336   IO-APIC-fasteoi  eth3

Again, the ratios are from several runs with different masks, but the counts for CPUs 0
and 2 for IRQ 25 are representative (575415 / 83517 is roughly 6.9, i.e. about 7:1).

A couple of obvious changes from 2.6.16 -
    .  the IRQ numbers are smaller
    .  the IRQ handling method has changed from IO-APIC-level to IO-APIC-fasteoi

I see better CPU utilization over the 4 cores in /proc/stat; in particular, the softirq
work is spread in that 7:1 ratio.  So it seems that smp_affinity does partially work for a
network device and several cores.  I am happier, but left with a number of questions that
I hope someone can answer:

1)  As far as I can tell, SAPIC (aka IOSAPIC) is specific to Itanium, but in the
literature I see something which appears to be similar, called x2APIC, on other Intel
64-bit architectures.  Does x2APIC have the same behaviour as regards IRQ balancing and
smp affinity?  And does the AMD Opteron 275 also use x2APIC or an AMD equivalent?

2)  Is it expected that something changed in this area between 2.6.16 and 2.6.26, and if
so, what?  (Maybe related to the changes in the output of /proc/interrupts I noted above?)

3)  Is it now possible, on this current kernel and with my hardware (or any gigabit NIC),
to distribute softirq work approximately 50:50 over two cores?  If so, how?

I can supply more information about the runs and configuration etc. if needed.

John



