
List:       quagga-dev
Subject:    [quagga-dev 8201]  Re: request for testing
From:       "Chris Hall" <chris.hall.list () highwayman ! com>
Date:       2010-08-28 15:57:14
Message-ID: 017401cb46c9$af1faa90$0d5effb0$ () hall ! list () highwayman ! com

Mike Tancsa wrote (on 27-Aug-2010 at 20:47):
> At 05:37 AM 8/27/2010, Denis Ovsienko wrote:
> > Chris Hall wrote: 
> > > What troubles me is that I am unsure how to distinguish the case of a
> > > dangerous, broken peer from the case of a naive (or stupid) one.

> >This task is best solved not by software, but by a live person, the
> >network administrator, which can contact remote side to find out, if
> >they execute their side of the peering contract properly or not. Let
> >computers do bit-crunching as defined by the technical spec.

>          With respect to this particular bug/security issue, is it
> something that is transitive across networks,

The problem in Quagga was triggered by an invalid AS4 Path, which was not
properly handled.  An AS4 Path is, of course, transitive, and should arrive
only from an AS2-speaking peer -- and may have traversed more than one AS2
speaker since leaving the most recent AS4 speaker.
   
>                                               or just an issue with
> an immediate peer. ie. the bug is triggered by something local such
> as today's test
> 
> http://mailman.nanog.org/pipermail/nanog/2010-August/024837.html

Yesterday's particular stupidity had to do with a perfectly legal, but
unknown, transitive attribute, which many routers correctly passed on.  Some
routers mangled the attribute, which caused routers further downstream to
drop the session [1].
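
To make "correctly passed on" concrete, here is a minimal sketch of the
RFC 4271 rule for an attribute whose type code the receiver does not
recognise.  Only the flag bit values come from the RFC; the function name
and the action strings are invented for illustration and are not taken
from bgpd.

```python
from typing import Tuple

# Attribute flag bits as defined in RFC 4271, section 4.3.
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20


def handle_unrecognized(flags: int) -> Tuple[str, int]:
    """Decide what to do with an attribute of unknown type.

    Returns (action, flags_to_send).  The action names are hypothetical.
    """
    if not flags & FLAG_OPTIONAL:
        # An unknown well-known attribute is a protocol error.
        return ("error", flags)
    if flags & FLAG_TRANSITIVE:
        # Unknown optional transitive: pass it on unchanged, with the
        # Partial bit set to show that a speaker did not understand it.
        return ("forward", flags | FLAG_PARTIAL)
    # Unknown optional non-transitive: quietly ignore, do not pass on.
    return ("ignore", flags)
```

The point is that the "forward" case changes only the flags octet; the
attribute value itself is meant to go out exactly as it came in.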

> I agree, calling the peer and dealing with it that way is the best bet.

Yes, the root cause has to be dealt with.  But the question remains what
bgpd should do in the meantime:

  a. assume the peer is essentially sound, so discard the invalid
     routes, but pass on anything not noticeably invalid.

     This is the naive or stupid peer case.

     This is assuming that bogus routes are rare, and do not
     indicate that all routes from the peer are suspect.

     This may require bgpd to assume that some level of message
     framing is valid, even if the contents are invalid, or
     even if some lower level framing is invalid.

     To some extent this masks the problem, since some routes
     may get through.

  b. drop the session and shut down the peer until the problem
     is fixed.

     This is the dangerous, broken peer case.

     This is playing safe, and taking the view that one invalid
     route may be symptomatic of something nasty -- so the best
     thing is to let the humans sort it out.

     This also avoids having to assume that any level of framing
     is valid.

  c. drop the session and restart it, but after the 3rd drop,
     shut down the peer until the problem is fixed.

     This is similar to (b), but allows for transient problems.

Any of these is better than the current response to invalid stuff, which is
to bounce the session up and down indefinitely!
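
As a rough sketch only, the three options could be expressed as a
per-peer policy along the following lines.  All the names here (Policy,
Action, Peer, MAX_DROPS) are hypothetical and do not correspond to
anything in bgpd, and the sketch assumes the drop counter is never reset,
which the options above do not specify.

```python
from enum import Enum


class Policy(Enum):
    DISCARD_ROUTES = "a"         # option (a): naive or stupid peer
    SHUTDOWN = "b"               # option (b): dangerous, broken peer
    RESTART_THEN_SHUTDOWN = "c"  # option (c): allow for transient problems


class Action(Enum):
    DISCARD_UPDATE = 1   # drop the invalid routes, keep the session
    RESTART_SESSION = 2  # drop the session and let it come back
    SHUTDOWN_PEER = 3    # drop the session and wait for a human


MAX_DROPS = 3  # option (c): shut down after the 3rd drop


class Peer:
    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.drops = 0  # assumption: never reset in this sketch

    def on_invalid_update(self) -> Action:
        """Choose the response to one invalid UPDATE from this peer."""
        if self.policy is Policy.DISCARD_ROUTES:
            return Action.DISCARD_UPDATE
        if self.policy is Policy.SHUTDOWN:
            return Action.SHUTDOWN_PEER
        # Option (c): count session drops and give up on the 3rd one.
        self.drops += 1
        if self.drops >= MAX_DROPS:
            return Action.SHUTDOWN_PEER
        return Action.RESTART_SESSION
```

Under option (c) a peer that keeps sending the same bad UPDATE is
restarted twice and then shut down, rather than bouncing forever.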

Chris

[1] http://www.cisco.com/warp/public/707/cisco-sa-20100827-bgp.pdf
