List:       linux-pci
Subject:    Re: PCIe endpoint crosstalk
From:       Bjorn Helgaas <bhelgaas () google ! com>
Date:       2013-08-28 12:43:56
Message-ID: CAErSpo4HQpi8HfWP=w=QYZTyKdDU7qxBxBhhGvEsvY9qLJs=xg () mail ! gmail ! com

On Wed, Aug 28, 2013 at 2:09 AM, Ludwig Petrosyan
<ludwig.petrosyan@desy.de> wrote:
> On 08/27/2013 06:27 PM, Bjorn Helgaas wrote:
>> On Tue, Aug 27, 2013 at 09:53:35AM +0200, Ludwig Petrosyan wrote:
>>> So: I use a microTCA system with a PCIe bus. There are two AMC cards
>>> (PCIe endpoints); let's call them card A and card B. There are also
>>> two device drivers, one for A and one for B. Card B has a bug: after
>>> a PCIe memory write operation (MWr), the card sends back a Completion
>>> packet without data (Cpl). (I know this is wrong, but the card was
>>> designed this way and has to be changed.)
>>> User process Ua reads data from card A in a loop and everything is OK,
>>> but when I start a second user process Ub, which writes data to card B
>>> (the buggy card) in a loop, Ua gets wrong data. After fixing card B the
>>> problem went away, but maybe this should also be checked at the PCIe
>>> driver level.
>> PCIe transactions (MWr, MRd, Cpl, etc.) are not directly visible
>> to the OS or the driver.
>>
>> The only thing I can think of that we could do is add a quirk to
>> blacklist the broken version of card B.  You can look at existing
>> quirks in drivers/pci/quirks.c.  Most of them work around issues
>> that aren't quite as severe as this one, but we could probably
>> figure out a way to make the device completely unusable.
>>
>> Or do you have something else in mind?
>>
>> Bjorn
> We have fixed the bug in card B and now it is OK, but the question
> remains open: what happens if we get some other PCIe endpoint card with
> the same bug? Read operations from other PCIe devices could be broken.
> I think this problem should be solved at the OS level (but I am not sure).
>
> I will try to explain what I think is going on:
>
> User process Ub sends a Memory Write request to card B. This is a Posted
> request, so right after sending it Ub forgets about it. The TLP of this
> packet contains the Requester ID of the RootComplex. At the same time
> (the RootComplex is free now), user process Ua sends a Non-Posted
> Memory Read request to card A and waits for the Completion packet. But
> card B (the buggy card; it should not send a Completion for a Posted
> Memory Write request) also sends the RootComplex a Completion packet
> without data, and somehow Ua gets this as the result of its Memory Read
> request. It seems the Completer ID (or Tag field) in the Completion
> packet is not checked, so a completion from one PCIe endpoint is
> returned as the completion of a read request to another PCIe endpoint.
>
> I want to say this is only an assumption; I just want to be sure that a
> buggy PCIe device won't affect the operation of other devices.
> But it could be that this problem has to be solved on the PCIe switch or
> RootComplex side, not on the OS side...

Yes.  I can't conceive of a way for the OS to deal with this problem.
The only thing I can think of is to disable card B altogether.

Bjorn
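
The matching problem Ludwig describes can be sketched as a toy model in
plain C. This is only an illustration under stated assumptions, not how
any real Root Complex, switch, or the Linux PCI core is implemented. It
assumes that a requester tracks its outstanding Non-Posted requests by
Tag, that Posted writes are never tracked, and that a Completion matching
no outstanding request is discarded and counted as unexpected. All names
(struct requester, issue_nonposted, accept_completion, the
check_completer_id switch) are hypothetical and exist only for this
example. The optional Completer ID comparison is an extra check added
here to show the difference; it is not claimed to be what hardware does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_TAGS 32u /* assumption: 5-bit Tag field, no extended tags */

/* Hypothetical record of one outstanding Non-Posted request. */
struct pending_req {
    bool     in_use;
    uint16_t completer_id; /* ID of the device the request was sent to */
};

/* Hypothetical requester-side tracking table, indexed by Tag. */
struct requester {
    uint16_t id;                 /* this requester's own Requester ID */
    bool     check_completer_id; /* extra check, an assumption of this sketch */
    unsigned unexpected_cpl;     /* number of discarded completions */
    struct pending_req pending[MAX_TAGS];
};

/* The Completion header fields that matter for matching. */
struct cpl_tlp {
    uint16_t requester_id;
    uint16_t completer_id;
    uint8_t  tag;
};

void requester_init(struct requester *r, uint16_t id, bool check_completer_id)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
    r->id = id;
    r->check_completer_id = check_completer_id;
}

/*
 * Record a Non-Posted request (e.g. MRd) and return its Tag, or -1 if
 * all Tags are in use. Posted requests (MWr) are never recorded, so a
 * Completion for one can never legitimately match.
 */
int issue_nonposted(struct requester *r, uint16_t completer_id)
{
    for (unsigned tag = 0; tag < MAX_TAGS; tag++) {
        if (!r->pending[tag].in_use) {
            r->pending[tag].in_use = true;
            r->pending[tag].completer_id = completer_id;
            return (int)tag;
        }
    }
    return -1;
}

/*
 * Return true if the Completion matches an outstanding request and is
 * consumed; return false if it is unexpected and discarded.
 */
bool accept_completion(struct requester *r, const struct cpl_tlp *c)
{
    bool match = c->requester_id == r->id &&
                 c->tag < MAX_TAGS &&
                 r->pending[c->tag].in_use;

    if (match && r->check_completer_id)
        match = r->pending[c->tag].completer_id == c->completer_id;

    if (!match) {
        r->unexpected_cpl++;
        return false;
    }
    r->pending[c->tag].in_use = false; /* request is now complete */
    return true;
}
```

In this model, a stray Completion from card B that happens to carry the
Requester ID and Tag of Ua's outstanding read to card A is accepted when
only those two fields are compared, which reproduces the crosstalk. With
check_completer_id set, the same packet is discarded and the read stays
pending until card A answers. Either way the check lives in the
requester, which is consistent with the point above that the OS never
sees these packets.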
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html