
List:       xdp-newbies
Subject:    Re: ICE driver bug using XDP_TX with multi FCQs
From:       Robin Cowley <Robin.Cowley () thehutgroup ! com>
Date:       2022-11-29 14:10:48
Message-ID: LO4P265MB37589CCE8C7014D275E2E21F87129 () LO4P265MB3758 ! GBRP265 ! PROD ! OUTLOOK ! COM

Hi Maciej,

I'm happy to test the fix you supply and report the findings back.

Thanks,
Robin


From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Sent: 29 November 2022 13:56
To: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Robin Cowley <Robin.Cowley@thehutgroup.com>; xdp-newbies@vger.kernel.org <xdp-newbies@vger.kernel.org>
Subject: Re: ICE driver bug using XDP_TX with multi FCQs 
  

On Fri, Nov 25, 2022 at 09:16:17AM +0100, Magnus Karlsson wrote:
> On Thu, Nov 24, 2022 at 4:40 PM Robin Cowley
> <Robin.Cowley@thehutgroup.com> wrote:
> > 
> > Hi there,
> > 
> > We're looking at developing some software that uses XSK in zero copy mode,
> > where we either redirect packets to userspace using AF_XDP, or transmit
> > packets straight from the XDP kernel program using XDP_TX.
> > Our program is the same one as described here:
> > https://lore.kernel.org/xdp-newbies/6205E10C-292E-4995-9D10-409649354226@outlook.com/
> >  
> > Recently we've been testing some functionality that transmits packets
> > directly from the data plane / XDP code using XDP_TX. This functionality
> > works on a Mellanox MT27710 ConnectX-4 Lx NIC using the mlx5_core driver.
> > However, using an Intel NIC with the ice driver, we have some problems. This
> > was tested on the 5.15 kernel and on the newer 6.1 kernel, and both show the
> > same behaviour.
> > Everything below was seen using the Intel NIC with these configs:
> > 
> > # ethtool -i ice0
> > driver: ice
> > version: 6.1.0-0.rc5.el8.elrepo.x86_64
> > firmware-version: 2.50 0x800077a8 1.2960.0
> > expansion-rom-version:
> > bus-info: 0000:03:00.0
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: yes
> > 
> > # lspci -s 03:00.0
> > 03:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
> > 
> > # ethtool -g ice0
> > Ring parameters for ice0:
> > Pre-set maximums:
> > RX:         8160
> > RX Mini:    n/a
> > RX Jumbo:   n/a
> > TX:         8160
> > Current hardware settings:
> > RX:         4096
> > RX Mini:    n/a
> > RX Jumbo:   n/a
> > TX:         4096
> > 
> > # ethtool -l ice0
> > Channel parameters for ice0:
> > Pre-set maximums:
> > RX:         16
> > TX:         16
> > Other:       1
> > Combined:   16
> > Current hardware settings:
> > RX:         0
> > TX:         0
> > Other:       1
> > Combined:   4
> > 
> > 
> > When redirecting traffic from the data plane into user-space via XSK,
> > everything works as expected.
> > When transmitting packets from the data plane directly out the NIC via
> > XDP_TX, we can see our kernel logs being hit via the systemd-journal process.
> > It seems that a kernel warning is generated for every packet sent through
> > XDP_TX.
> > 
> > An example warning and call trace is:
> > 
> > Incorrect XDP memory type (1785255936) usage
> > WARNING: CPU: 7 PID: 0 at net/core/xdp.c:403 __xdp_return+0x33/0x1f0
> > 
> > ...
> > 
> > Call Trace:
> > <IRQ>
> > ice_xmit_zc+0x251/0x310 [ice]
> > ice_napi_poll+0x54/0x640 [ice]
> > __napi_poll+0x2b/0x190
> > net_rx_action+0x2b2/0x310
> > __do_softirq+0xbe/0x2b6
> > irq_exit_rcu+0xad/0xd0
> > common_interrupt+0x82/0xa0
> > </IRQ>
> > 
> > 
> > The memory type value seen above changes with each error, suggesting that the
> > value is uninitialized or the pointer is corrupted.
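
For context on the warning quoted above: as far as I recall, the message is
emitted from the default branch of a switch over the XDP memory type in
net/core/xdp.c, which is only reached when the type is not one of the known
enum values. The sketch below is a userspace C model of that range check, not
kernel code. The enum mirrors `enum xdp_mem_type` from include/net/xdp.h as I
remember it for kernels around 6.1, so treat its names and values as an
assumption; `mem_type_is_valid` is a hypothetical helper introduced only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors enum xdp_mem_type in include/net/xdp.h (~6.1). */
enum mem_type {
    MEM_TYPE_PAGE_SHARED = 0,
    MEM_TYPE_PAGE_ORDER0,
    MEM_TYPE_PAGE_POOL,
    MEM_TYPE_XSK_BUFF_POOL,
    MEM_TYPE_MAX,
};

/* Only four memory types exist in this model. */
static_assert(MEM_TYPE_MAX == 4, "model assumes four XDP memory types");

/*
 * Hypothetical helper: true when `type` is one of the known memory
 * types. Anything else would fall into the default (warning) branch.
 */
static bool mem_type_is_valid(uint32_t type)
{
    return type < MEM_TYPE_MAX;
}
```

A reported value such as 1785255936 is far outside this small range, which is
consistent with the observation that the field is being read from memory that
does not hold a valid memory-type descriptor. It does not by itself identify
the root cause.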

The fix is a one-liner, I believe. Let's do it this way: I will send the patch
to this mailing list and ask you to test it. If it fixes things on your side,
we will send it upstream with your Reported-by tag.

Sounds good?

> > 
> > We have been able to recreate the issue using a program based on the xdpsock
> > sample programs from the kernel tree, to confirm it's not specific to our
> > software.
> > 
> > We have been testing a simple BPF program that swaps the MAC addresses around
> > and transmits the packet back out of the same NIC. This very basic BPF program
> > is on the test_zero_copy_tx branch here:
> > https://github.com/OpenSource-THG/xdpsock-sample/tree/test_zero_copy_tx
> > The issue only occurs when testing with multiple FCQs; it seems to work fine
> > with a single FCQ. The issue also happens in both copy mode and zero copy mode.
> > The command used was:
> > 
> > ./xdpsock_multi --extra-stats --l2fwd --zero-copy --interface ice0 --channels=2 --busy-poll
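
The MAC swap described above can be illustrated with a minimal userspace
sketch. This is not the program from the linked repository and it is not BPF
code; it only shows the header manipulation on a raw Ethernet frame buffer.
`swap_macs` and `enum verdict` are hypothetical names; the verdict values are
chosen to match XDP_DROP (1) and XDP_TX (3) from the kernel's
`enum xdp_action`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6   /* bytes in a MAC address */
#define ETH_HLEN 14  /* dst MAC + src MAC + EtherType */

/* Hypothetical stand-ins for XDP_DROP and XDP_TX. */
enum verdict { VERDICT_DROP = 1, VERDICT_TX = 3 };

/*
 * Swap the destination and source MAC addresses in place.
 * Frames too short to hold an Ethernet header are dropped,
 * mirroring the bounds check a real XDP program must do.
 */
static enum verdict swap_macs(uint8_t *frame, size_t len)
{
    uint8_t tmp[ETH_ALEN];

    assert(frame != NULL);
    if (len < ETH_HLEN)
        return VERDICT_DROP;

    memcpy(tmp, frame, ETH_ALEN);               /* save dst MAC   */
    memcpy(frame, frame + ETH_ALEN, ETH_ALEN);  /* dst <- src     */
    memcpy(frame + ETH_ALEN, tmp, ETH_ALEN);    /* src <- old dst */
    return VERDICT_TX;
}
```

In an actual XDP program the same swap would operate on the packet between
`ctx->data` and `ctx->data_end`, and returning XDP_TX sends the modified frame
back out of the receiving interface.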
> > 
> > It is my belief that this is a supported scenario, but I'm seeking some
> > guidance to validate my thoughts, and ultimately whether this is a legitimate
> > bug.
> 
> Thank you so much for the detailed bug report Robin. We will try to
> reproduce it on our end, root cause it and get back to you.
> 
> > I hope this gives enough background and information for a reproducible issue.
> > Any feedback is welcome and we look forward to hearing a response. :)
> > 
> > Robin Cowley
> > Software Engineer
> > The Hut Group<http://www.thehutgroup.com/>

