List:       linux-api
Subject:    Re: [PATCH v2 net-next 0/3] ipv4: Hash-based multipath routing
From:       Thomas Graf <tgraf () suug ! ch>
Date:       2015-08-31 9:02:11
Message-ID: 20150831090211.GA12707 () pox ! localdomain

On 08/30/15 at 03:29pm, Tom Herbert wrote:
> On Sun, Aug 30, 2015 at 2:28 PM, Peter Nørlund <pch@ordbogen.com> wrote:
> > It would definitely be simpler, and it would be nice to just fetch the
> > hash directly from the NIC - and for link aggregation it would probably
> > be fine. But with L4, we always need to consider fragmented packets,
> > which might cause some packets of a flow to be routed differently - and
> > with ECMP, the ramifications of suddenly choosing another path for a
> > flow are worse than for link aggregation. The latency through the
> > different paths may differ enough to cause out-of-order packets and bad
> > TCP performance as a consequence. Both Cisco and Juniper routers
> > default to L3 for ECMP - exactly for that reason, I believe. RFC 2991
> > also points out that ports probably shouldn't be used as part of the
> > flow key with ECMP.
> >
> That's more reason why we need vendors to use IPv6 flow label instead
> of ports to do ECMP :-). In any case, if we're fragmenting TCP packets
> then we're already in a bad place performance-wise-- we really don't
> need to optimize for that case. Albeit, it would be nice if fragments
> of a packet followed the same path, but that would require devices to not do
> L4 hash over ports when MF is set-- I don't know if anyone does that
> (I have been meaning to add that to stack).

+1 for solving this at hash level. Being able to rely on the L4 HW
hash for multipath routing is very desirable. A simple (MF bit set ||
FO > 0) check, with a fallback to the flow dissector to generate an L3
hash in case the HW provided an L4 hash, should be sufficient to address
the fragmentation concern.
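
A rough userspace sketch of the check described above, not kernel code.
Everything named here is hypothetical and only for illustration: struct
pkt_info, l3_hash() and multipath_hash() are made up, frag_off is assumed
to be already in host byte order, and the mixing function merely stands in
for whatever the flow dissector would compute over the addresses.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MF      0x2000u  /* "more fragments" flag in the IPv4 frag field */
#define IP_OFFMASK 0x1fffu  /* fragment offset (FO) bits */

/* Hypothetical per-packet summary; not a kernel structure. */
struct pkt_info {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t frag_off;      /* assumed host byte order */
    uint32_t hw_hash;       /* hash reported by the NIC */
    bool     hw_hash_is_l4; /* true if the NIC hashed over ports */
};

/* MF set || FO > 0: the packet is a fragment. */
static bool is_fragment(uint16_t frag_off)
{
    return (frag_off & (IP_MF | IP_OFFMASK)) != 0;
}

/* Stand-in for a software L3 hash over addresses only. */
static uint32_t l3_hash(uint32_t saddr, uint32_t daddr)
{
    uint32_t h = saddr ^ (daddr * 0x9e3779b1u);

    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    return h;
}

/* Use the HW hash unless it is an L4 hash on a fragment. */
static uint32_t multipath_hash(const struct pkt_info *p)
{
    if (p->hw_hash_is_l4 && is_fragment(p->frag_off))
        return l3_hash(p->saddr, p->daddr);
    return p->hw_hash;
}
```

With this shape, all fragments of a packet that carry an L4 HW hash end up
with the same address-only hash, while unfragmented traffic keeps using the
HW hash untouched.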

Since performance is gone anyway, I'm not sure it's worth offloading
this behaviour to the HW.