
List:       linux-rdma
Subject:    Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
From:       Doug Ledford <dledford () redhat ! com>
Date:       2015-02-25 17:13:58
Message-ID: 1424884438.4847.91.camel () redhat ! com

On Wed, 2015-02-25 at 00:29 +0000, Weiny, Ira wrote:
> > > > > 
> > > > > Do you have a suggestion for alternatives?
> > > > 
> > > > The desire to leverage the IB MAD infrastructure for OPA is
> > > > understood but the current approach represents OPA as a device
> > > > capability which does not seem appropriate because OPA is clearly a
> > > > different type of RDMA technology than IB.
> > > > 
> > > 
> > > While it is a different type of technology, standard verbs[*] remains 100%
> > compatible.  Unlike other verbs technologies user space software does not need
> > any knowledge that the underlying device is not IB.  For example, PR (and SA)
> > queries, CM, rdmacm, and verbs calls themselves are all 100% IB compatible.
> > 
> > Even if OPA is 100% standard verbs compatible which it does not appear to be,
> > that does not make OPA an extra capability of an IBA device.
> 
> I don't want to make it an extra capability of an IBA device.  I want to make it an
> extra capability of a "verbs" device.

And this, friends, is why it's bad to make both a link layer and a user
space API with the exact same name ;-).  Anyway, I get your point Ira
and it makes sense to me.  However, I also get Hal's point.  Our track
record on this particular issue is a bit wonky though.

First we had InfiniBand.

Then came iWARP, and we used the transport type to differentiate it from
an actual InfiniBand device, but left the underlying link layer listed
as InfiniBand.  Then came RoCE, and we listed its transport type as
InfiniBand, but changed the link layer to Ethernet.  Which left us in
the oxymoronic position that even though iWARP was over Ethernet, the
tools said it was over InfiniBand, while RoCE was the only thing that
listed Ethernet as the link layer.  We later fixed that up with some
hacks in tools to keep users from being confused and filing bugs.

Maybe this represents an opportunity to straighten some of this mess
out.  If I remember correctly, this is the matrix of technologies today:

Technology	LinkLayer	Transport

InfiniBand	InfiniBand	InfiniBand Verbs
iWARP		InfiniBand	iWARP Verbs (subset of IBV, with
				specific connection establishment
				requirements that don't exist with IBV)
RoCE		Ethernet	InfiniBand Verbs (but with different
				addressing because of the different
				link layer)
OPA		?		InfiniBand Verbs

It makes me wonder if we shouldn't make this matrix more accurate:

Technology	LinkLayer	Transport

InfiniBand	InfiniBand	InfiniBand Verbs
iWARP		Ethernet	iWARP Verbs
RoCE		Ethernet	RoCE-v1 or RoCE-v2
OPA		?		OPA Verbs

With this sort of setup, the core ib_mad/ib_umad code would simply check
the verbs type to see what support it can enable.  For IBV it would be
the existing support, for OPAV it would be the additional jumbo support.
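
To make the idea concrete, here is a minimal sketch of that check.  It is
not code from the patch set: every identifier below is hypothetical, the
2048-byte jumbo size is an assumption about what OPA would want, and the
handling of RoCE and iWARP is my own guess rather than something proposed
above.  Only the 256-byte IB MAD size is standard.

```c
#include <stddef.h>

/*
 * Hypothetical transport types mirroring the proposed matrix.  These are
 * illustrative names only, not identifiers from the kernel tree.
 */
enum sketch_transport {
	SKETCH_TRANSPORT_IB_VERBS,
	SKETCH_TRANSPORT_IWARP_VERBS,
	SKETCH_TRANSPORT_ROCE_V1,
	SKETCH_TRANSPORT_ROCE_V2,
	SKETCH_TRANSPORT_OPA_VERBS,
};

#define SKETCH_IB_MAD_SIZE	256	/* standard IB MAD size */
#define SKETCH_OPA_MAD_SIZE	2048	/* assumed jumbo MAD size for OPA */

/*
 * Return the largest MAD size the MAD layer would enable for a port with
 * the given transport, or 0 if the transport gets no MAD support at all.
 */
static size_t sketch_max_mad_size(enum sketch_transport transport)
{
	switch (transport) {
	case SKETCH_TRANSPORT_OPA_VERBS:
		/* Superset of IBV: existing support plus jumbo MADs. */
		return SKETCH_OPA_MAD_SIZE;
	case SKETCH_TRANSPORT_IB_VERBS:
	case SKETCH_TRANSPORT_ROCE_V1:
	case SKETCH_TRANSPORT_ROCE_V2:
		/* Existing support only (RoCE grouping is an assumption). */
		return SKETCH_IB_MAD_SIZE;
	case SKETCH_TRANSPORT_IWARP_VERBS:
		/* Assumed: no MAD support for iWARP. */
		return 0;
	}
	return 0;
}
```

The point of keying off the transport type is that the decision lives in
one switch statement in the MAD code, and nothing in the core verbs path
has to look at a capability flag.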

I'm not sure how much we might expect a change like this to break
existing software though, so maybe straightening this mess out is a
non-starter.

> > While it is a primary goal of the RDMA stack to have a common verbs API for
> > various RDMA interconnects, each one is properly represented to allow its
> > unique characteristics to be exposed.
> 
> The difference here is that we have maintained IB Verbs compatibility where other
> RDMA technologies did not.  We have tested many Verbs applications (both kernel and
> user space) and they function _without_ _modification_.
> Despite this compatibility we are still having this discussion.
> 
> I can think of no other way to signal the MAD capability to the MAD stack which
> will preserve the verbs compatibility in the same way.

See above.  Define a new transport type, OPAVerbs, that is a superset of
IBV and enable jumbo support when OPAV is the transport on the link.

> > 
> > > Therefore, to address your initial question regarding tradeoffs I believe this
> > method is the least invasive to the code as well as removing any potential
> > performance penalties to core verbs.
> > > 
> > > Ira
> > > 
> > > [*] We don't support some of the extensions particularly those which have
> > been most recently introduced.  And we would like to make our own extensions
> > in the form of higher MTU availability, but the patch is not yet ready to be
> > submitted upstream.
> > 
> > There appear to be a number of things that are not exposed by the current
> > patch set which will be needed in subsequent patches. It would be better to see
> > the complete picture so it can be reviewed as a whole.
> 
> Is there something in particular you would like to see?  There are no other patches
> required in the core modules for verbs applications to function.  The MTU patch
> only improves verbs performance.
> Ira
> 
> > 
> > -- Hal
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body
> > of a message to majordomo@vger.kernel.org More majordomo info at
> > http://vger.kernel.org/majordomo-info.html


-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD

