
List:       dpdk-users
Subject:    Mellanox Connectx-6 Dx dual port performance
From:       Дмитрий Степанов <stepanov.dmit () gmail ! co
Date:       2022-03-22 9:03:56
Message-ID: CA+-SuJ01pcMz_Y6H1=Z-Q9PGM6i8fpkfbqYT2JeB8PEHWoktBQ () mail ! gmail ! com

Hi!

I'm testing overall dual-port performance on a ConnectX-6 Dx EN adapter card
(100GbE; dual-port QSFP56; PCIe 4.0/3.0 x16) with DPDK 21.11 on Ubuntu 20.04.
I have two dual-port NICs installed in the same server (but on different NUMA
nodes), which I use as a generator and a receiver respectively.
First, I started a custom packet generator on port 0 and got 148 Mpps TX
(64-byte TCP packets with zero payload length), which is the 100 Gbps
line-rate maximum. Then I launched the same generator with the same
parameters simultaneously on port 1. Performance on both ports dropped to
105-106 Mpps per port (210-212 Mpps in total). With 512-byte TCP packets,
running the generators on both ports gives me 23 Mpps per port (46 Mpps in
total, which for that packet size is the line-rate maximum).
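
For reference, those figures line up with the theoretical line rate once the
20 bytes of Ethernet preamble + inter-frame gap per frame are counted; a
quick check:

python3 -c "print(100e9 / ((64 + 20) * 8) / 1e6)"   # ~148.8 Mpps for 64-byte frames
python3 -c "print(100e9 / ((512 + 20) * 8) / 1e6)"  # ~23.5 Mpps for 512-byte frames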

The Mellanox performance report
http://fast.dpdk.org/doc/perf/DPDK_21_08_Mellanox_NIC_performance_report.pdf
doesn't contain measurements for the TX path, only for RX.
Test #11 (Mellanox ConnectX-6 Dx 100GbE PCIe Gen4 Throughput at Zero Packet
Loss, 2x 100GbE), which covers the RX path, shows nearly the same results I
got for the TX path (214 Mpps for 64-byte packets, 47 Mpps for 512-byte
packets). The question is: should my TX-path results coincide with the
published RX-path results? Why can't I get 148 x 2 Mpps for small packets
when using both ports? What is the bottleneck here: PCIe, RAM, or the NIC
itself?
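
One variable worth pinning down is whether the slot actually trained at PCIe
Gen4 x16, since a downtrained link would cap the aggregate rate. The
negotiated speed/width can be read with lspci:

# expect "Speed 16GT/s" and "Width x16" in LnkSta for a Gen4 x16 link
sudo lspci -s c1:00.0 -vvv | grep -E 'LnkCap:|LnkSta:'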

To test the RX path I used the testpmd and l3fwd utilities (the latter
slightly modified to print RX stats).

./dpdk-testpmd -l 64-127 -n 4 \
  -a 0000:c1:00.0,mprq_en=1,mprq_log_stride_num=9 \
  -a 0000:c1:00.1,mprq_en=1,mprq_log_stride_num=9 \
  -- --stats-period 1 --nb-cores=16 --rxq=16 --txq=16 --rxd=4096 --txd=4096 \
  --burst=64 --mbcache=512

./build/examples/dpdk-l3fwd -l 96-111 -n 4 --socket-mem=0,4096 \
  -a 0000:c1:00.0,mprq_en=1,rxqs_min_mprq=1,mprq_log_stride_num=9,txq_inline_mpw=128,rxq_pkt_pad_en=1 \
  -a 0000:c1:00.1,mprq_en=1,rxqs_min_mprq=1,mprq_log_stride_num=9,txq_inline_mpw=128,rxq_pkt_pad_en=1 \
  -- -p 0x3 -P \
  --config='(0,0,111),(0,1,110),(0,2,109),(0,3,108),(0,4,107),(0,5,106),(0,6,105),(0,7,104),(1,0,103),(1,1,102),(1,2,101),(1,3,100),(1,4,99),(1,5,98),(1,6,97),(1,7,96)' \
  --eth-dest=0,00:15:77:1f:eb:fb --eth-dest=1,00:15:77:1f:eb:fb
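
In case it helps to reproduce the TX side without my custom generator, an
equivalent testpmd run in txonly forwarding mode should look roughly like
the following (txonly emits UDP rather than TCP packets, and the core list
here is only an example):

./dpdk-testpmd -l 64-80 -n 4 -a 0000:c1:00.0 -a 0000:c1:00.1 -- \
  --forward-mode=txonly --txpkts=64 --stats-period 1 \
  --nb-cores=16 --txq=16 --rxq=16 --txd=4096 --burst=64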

Then I fed 105 Mpps of 64-byte TCP packets from the other dual-port NIC into
each port (210 Mpps in total); as described above, I can't get more than
210 Mpps in total out of the generator. With both testpmd and l3fwd I was
not able to get more than 75-85 Mpps per port (150-170 Mpps in total) on the
RX path. This contradicts the results in the Mellanox performance report
(214 Mpps for both ports, i.e. 112 Mpps per port on the RX path). Running
only a single generator gives me 148 Mpps on both the TX and the RX side.
But after starting the generator on the second port, TX performance dropped
to 105 Mpps per port (210 Mpps in total) and RX performance dropped to
75-85 Mpps per port (150-170 Mpps in total for both ports). Could these poor
RX results simply be because the generator is not delivering full line rate,
or should I at least be seeing the 210 Mpps that the generator does provide
across both ports? I have applied all of the system-tuning suggestions from
the Mellanox performance report.
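
To narrow down where the missing packets go: the mlx5 PMD keeps the kernel
netdev alive even while DPDK owns the port, so the NIC's hardware counters
should still be readable with ethtool (<iface> below stands for whichever
netdev corresponds to 0000:c1:00.0):

# rx_packets_phy = packets seen on the wire,
# rx_discards_phy / rx_out_of_buffer = packets dropped by the NIC
ethtool -S <iface> | grep -E 'rx_packets_phy|rx_discards_phy|rx_out_of_buffer'
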
I would be grateful for any advice.

Thanks in advance!




