List:       linux-pci
Subject:    Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
From:       Bjorn Helgaas <bhelgaas@google.com>
Date:       2013-08-27 23:01:30
Message-ID: CAErSpo6pLsDTOPRgV0Qb3W9OSGZXkNOeBk+i47RyoFjK-xPEJw@mail.gmail.com

On Fri, Aug 23, 2013 at 3:41 PM, Skidmore, Donald C
<donald.c.skidmore@intel.com> wrote:
> > -----Original Message-----
> > From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> > Sent: Friday, August 23, 2013 1:43 PM
> > To: Skidmore, Donald C
> > Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org;
> > linux-kernel@vger.kernel.org; Don Dutile
> > Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00
> > to PF Nacked" messages
> > 
> > On Fri, Aug 23, 2013 at 2:37 PM, Skidmore, Donald C
> > <donald.c.skidmore@intel.com> wrote:
> > > > -----Original Message-----
> > > > From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> > > > Sent: Friday, August 23, 2013 11:53 AM
> > > > To: Skidmore, Donald C
> > > > Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org;
> > > > linux-kernel@vger.kernel.org; Don Dutile
> > > > Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of
> > > > type 00 to PF Nacked" messages
> > > > 
> > > > On Fri, Aug 23, 2013 at 06:25:06PM +0000, Skidmore, Donald C wrote:
> > > > > > -----Original Message-----
> > > > > > From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> > > > > > Sent: Friday, August 23, 2013 9:53 AM
> > > > > > To: Skidmore, Donald C
> > > > > > Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org;
> > > > > > linux-kernel@vger.kernel.org; Don Dutile
> > > > > > Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last
> > > > > > Request of type 00 to PF Nacked" messages
> > > > > > 
> > > > > > On Tue, Aug 20, 2013 at 5:37 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > > > > > > On Tue, Aug 20, 2013 at 5:08 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > > > > > > > On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > > > > > > 
> > > > > > > > > I played with this a little more and found this:
> > > > > > > > > 
> > > > > > > > > 1) Magma card in z420, connected to chassis containing X540:
> > > > > > > > > fails (original report)
> > > > > > > > > 2) X540 in z420, Magma card in z420, connected to empty chassis:
> > > > > > > > > fails
> > > > > > > > > 3) X540 in z420, Magma card in z420 but no cable to chassis:
> > > > > > > > > works
> > > > > > > 
> > > > > > > For what it's worth, I tried config 3 again with v3.11-rc6, and
> > > > > > > it failed the same way.  I haven't bothered with config 2.
> > > > > > > It's not 100% reproducible, but at least it doesn't seem
> > > > > > > related to the expansion chassis.
> > > > > > > 
> > > > > > > I attached the logs from config 3 to
> > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=60776
> > > > > > 
> > > > > > Is there anything I can do to help debug this?  Add
> > > > > > instrumentation, etc.?  It seems like I'm doing the simplest
> > > > > > possible thing -- just writing to the sysfs sriov_numvfs file to
> > > > > > enable VFs.
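> > > > > >
> > > > > > For reference, the userspace side is only that one sysfs write.
> > > > > > A minimal C equivalent of what I run (the 0000:08:00.0 PF
> > > > > > address and the VF count are examples from my setup, nothing
> > > > > > special):
> > > > > >
> > > > > >     /* Enable two VFs through the PF's sriov_numvfs attribute.
> > > > > >      * A nonzero write fails with EBUSY if VFs are already
> > > > > >      * enabled, so write 0 first to disable them in that case. */
> > > > > >     #include <stdio.h>
> > > > > >
> > > > > >     int main(void)
> > > > > >     {
> > > > > >         FILE *f = fopen("/sys/bus/pci/devices/0000:08:00.0/sriov_numvfs", "w");
> > > > > >
> > > > > >         if (!f)
> > > > > >             return 1;
> > > > > >         fprintf(f, "2\n");
> > > > > >         return fclose(f) ? 1 : 0;
> > > > > >     }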
> > > > > > 
> > > > > > I almost think it must be related to my config somehow if nobody
> > > > > > else is seeing this, but at the same time, my config also seems
> > > > > > the simplest possible, so I don't know what I could be doing that's
> > > > > > unusual.
> > > > > > 
> > > > > > Bjorn
> > > > > 
> > > > > Hey Bjorn,
> > > > > 
> > > > > I may be a little confused, so bear with me.
> > > > > 
> > > > > Option 1 = (your normal setup), Magma card plugged into chassis,
> > > > > X540 in chassis.
> > > > > Option 2 = Magma card plugged into chassis, X540 in z420 system.
> > > > > Option 3 = Magma card UNplugged from chassis, X540 in z420 system.
> > > > > 
> > > > > Options 1 & 2 - always fail
> > > > > Option 3 - sometimes fails (unsure at what rate failure occurs)
> > > > > 
> > > > > Please correct me if I messed any of that up. :)
> > > > 
> > > > Generally correct.  I've seen failures in all three configs, so I'm
> > > > only concerned with the simplest for now (config 3, no expansion chassis).
> > > > 
> > > > > Another question I have relates to the lspci output you supplied in
> > > > > the bugzilla.  I'm not seeing the VF devices (i.e. 08:10.0); did you
> > > > > run lspci before you created the VFs?  If so, could we see one while
> > > > > the failure was occurring?
> > > > 
> > > > That's correct, I collected the lspci output before reproducing the
> > > > problem.  I can't easily collect lspci afterwards because the machine
> > > > isn't responsive after the problem starts.
> > > > 
> > > > > Also, could you download the latest ixgbevf from SourceForge?
> > > > > 
> > > > > https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
> > > > > 
> > > > > If we add debugging messages it will be easier to patch this
> > > > > driver, and it contains our latest validated code base.
> > > > 
> > > > I can do that if it turns out to be necessary.  But John Haller gave
> > > > me a good clue off-list:
> > > > 
> > > > John wrote:
> > > > > I assume you want the VFs to be instantiated in a VM.  To do this,
> > > > > you need to blacklist the ixgbevf driver in the host (or not
> > > > > compile it into the host), or it will try to associate the driver
> > > > > in the host, rather than in the VM where you want it.  Then, the VM
> > > > > needs the ixgbevf driver, which will hopefully do a better job of
> > > > > talking to the mailbox in the host.  There is some work to assign
> > > > > the VF(s) to the VM, but I don't remember that offhand.
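> > > > >
> > > > > One common way is a modprobe blacklist so the host never autoloads
> > > > > the VF driver.  Something like this (the file name is just a
> > > > > convention, and note a blacklist only stops alias-based
> > > > > autoloading, not an explicit modprobe):
> > > > >
> > > > >     # /etc/modprobe.d/ixgbevf-blacklist.conf
> > > > >     # Keep the host from autoloading the VF driver so the VFs
> > > > >     # stay free to be assigned to a guest.
> > > > >     blacklist ixgbevf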
> > > > 
> > > > I don't have any VMs (I started this whole thing because I was
> > > > looking at a PCI hotplug issue related to SR-IOV, so I don't really care
> > > > about VMs).
> > > > 
> > > > So the ixgbevf driver on the *host* is claiming the new VFs, and it
> > > > sounds like maybe it can't handle that?
> > > > 
> > > > Bjorn
> > > 
> > > Not to speak for John, but I believe he was saying if you want to use
> > > your VFs in a VM you need to make sure you don't run the ixgbevf driver
> > > on the host, as it will "claim" the VFs.  If you are NOT running any
> > > VMs then it is perfectly fine to have both ixgbe and ixgbevf loaded.
> > 
> > OK.  It certainly *seemed* surprising to have the ixgbevf driver blow up,
> > even if it was an error on my part to load it in the host.  Just let me know if
> > there's any more testing I can do.
> > 
> > Bjorn
> 
> Something is leading to the mbx messages being messed up, as evidenced by
> the "Last Request of type 03 to PF Nacked" messages.  Have you tried
> resetting the ixgbevf port (ethtool -r <your port>)?  Is it even possible
> to do this, given that you mentioned the machine isn't very responsive in
> the failure state?
>
> It might be worthwhile to add logging into the ixgbevf and ixgbe drivers
> around the mbx messages, with the hope being that it would help show what
> is going on between the two.  There have been some changes in that area of
> the ixgbevf code as of late, so working off the latest SourceForge driver
> would be the easiest for me to send you a patch on.  Sadly we haven't been
> able to recreate the failure here, which makes it rather hard to debug.
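>
> To give an idea of the sort of instrumentation I mean, here is a rough,
> untested sketch -- the helper name and the exact hook points in mbx.c
> are made up for illustration:
>
>     #include <linux/printk.h>
>     #include <linux/types.h>
>
>     /* Dump a mailbox message word by word.  Call this right before the
>      * VF sends a message and right after it reads a reply, so both
>      * sides of the conversation end up in the log. */
>     static void dbg_dump_mbx(const char *dir, const u32 *msg, u16 size)
>     {
>             u16 i;
>
>             for (i = 0; i < size; i++)
>                     pr_info("ixgbevf mbx %s word %u: 0x%08x\n",
>                             dir, i, msg[i]);
>     }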

I haven't been able to reproduce the problem with the 2.10.3 ixgbevf
driver from http://sourceforge.net/projects/e1000/files/ixgbevf%20stable/

I did notice what looks like a printk format problem and what appears
to be a bare MAC address with no label:

[  316.699504] ixgbevf: eth%d: ixgbevf_init_interrupt_scheme: Multiqueue Disabled: Rx Queue count = 1, Tx Queue count = 1
[  316.710897] ixgbevf: eth3: ixgbevf_probe: Intel(R) X540 Virtual Function
[  316.717608] 08:88:ff:ff:0d:ec
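If it helps, my guess (not verified against the driver source) is that
the "Multiqueue Disabled" line is printed before register_netdev() has
assigned a real interface name, so netdev->name still holds the literal
"eth%d" template, and that the bare MAC line could carry a label via the
%pM printk extension.  Roughly:

    /* Illustrative only; the exact call site in ixgbevf_probe() is a
     * guess.  %pM prints a 6-byte MAC address in colon-separated form,
     * and dev_info() identifies the PCI device even before the netdev
     * has a name. */
    dev_info(&pdev->dev, "MAC address: %pM\n", netdev->dev_addr);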

Sorry for wasting so much time on something that appears to be already fixed.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

