[prev in list] [next in list] [prev in thread] [next in thread] 

List:       bird-users
Subject:    Re: LSA ID collision issue is seen with OSPF stub-networks configured and BIRD is acting as ABR
From:       naveen chowdary Yerramneni <naveenchowdary.581 () gmail ! com>
Date:       2016-11-30 15:07:32
Message-ID: CAMujGJ6CCB3eABOhbZ3nktXayOt_0Whsb8ym3_Az3U+WiT8eTg () mail ! gmail ! com
[Download RAW message or body]

Regd #6, all stubnetworks are explicitly removed from FIB in rt_sync(), not
due to any external event. Pasted the code sniplet below for reference.

I configured stub-networks in the OSPF config using "stubnet" option.



static void
rt_sync(struct ospf_proto *p)
{
  struct top_hash_entry *en;
  struct fib_iterator fit;
  struct fib *fib = &p->rtf;
  ort *nf;
  struct ospf_area *oa;

  /* This is used for forced reload of routes */
  int reload = (p->calcrt == 2);

  OSPF_TRACE(D_EVENTS, "Starting routing table synchronisation");

  DBG("Now syncing my rt table with nest's\n");
  FIB_ITERATE_INIT(&fit, fib);
again1:
  FIB_ITERATE_START(fib, &fit, nftmp)
  {
    nf = (ort *) nftmp;

    /* Sanity check of next-hop addresses, failure should not happen */
    if (nf->n.type)
    {
      struct mpnh *nh;
      for (nh = nf->n.nhs; nh; nh = nh->next)
        if (ipa_nonzero(nh->gw))
        {
          neighbor *ng = neigh_find2(&p->p, &nh->gw, nh->iface, 0);
          if (!ng || (ng->scope == SCOPE_HOST))
            { reset_ri(nf); break; }
        }
    }

    /* Remove configured stubnets */
    if (!nf->n.nhs)
      reset_ri(nf);

    .
.
.
.
.

    /* Remove unused rt entry, some special entries are persistent */
    if (!nf->n.type && !nf->external_rte && !nf->area_net)
    {
      FIB_ITERATE_PUT(&fit, nftmp);
      fib_delete(fib, nftmp);
      goto again1;
    }
  }
  FIB_ITERATE_END(nftmp);
  .
  .
  .
}


On Wed, Nov 30, 2016 at 6:01 PM, Ondrej Zajicek <santiago@crfreenet.org>
wrote:

> On Wed, Nov 30, 2016 at 10:35:45AM +0530, naveen chowdary Yerramneni wrote:
> > Hi,
> >
> > *Issue Description*: LSA ID collision issue is seen with OSPF
> > *stub-networks* configured and BIRD is acting as ABR
>
> Hi, thanks for the bugreport and analysis. My comments and questions are
> below.
>
> > *Code Flow*:
> > 1.       Stubnets are advertised to area-0 by generating router LSA (
> > LSA_T_RT).
> > o   ospf_disp() -> ospf_update_topology() ->
> ospf_originate_rt_lsa()->ospf_
> > originate_lsa()
> > 2.       Now, stubnets are added to top graph table (p->gr) with LSA type
> > LSA_T_RT
> > 3.       When creating routing table, these stubnets are added to FIB (
> > p->rtf).
> > o   ospf_disp() -> ospf_rt_spf() -> ospf_rt_spfa() -> spfa_process_rt()
> ->
> > add_network() -> ri_install_net()
> > 4.       When BIRD is acting as ABR then, walk through FIB(p->rtf)  and
> > send summary LSA (LSA_T_SUM_NET) with LSA mode as LSA_M_RTCALC. Also, nf
> > pointer is set (stores fib node address) in top_hash_entry.
> > o    ospf_disp() -> ospf_rt_abr2() -> check_sum_net_lsa() ->
> > ospf_originate_sum_net_lsa() ->ospf_originate_lsa()
> > 5.       Now, stubnets are added to top graph table (p->gr) with LSA type
> > LSA_T_SUM_NET
>
> Until this it looks OK.
>
> > 6.       Stubnets are removed from FIB(p->rtf)
> > o   ospf_disp() -> rt_sync() -> fib_delete()
>
> This seems strange. If these networs are part of the topology, they
> should be in FIB(p->rtf). You mean that appropriate record in FIB is
> removed automatically as a conseqyence of this code seqence, or is
> there any external change (like stubnet removal) that causes it to
> be removed?
>
> > 7.       Now, stubnets entries are still present in top graph table
> (Note:
> > fib_delte() doesn't free the node, it just moves the fib node to free
> pool,
> > fib_node pointer is still valid)
>
> That looks like a dangerous bug. Although fib_delete just moves the
> fib node back to slab pool, the pointer is considered freed and invalid
> (the slab pool is just optimization).
>
>
> > 8.       With any change in network, ospf_rt_spf()is called. In
> > ospf_rt_reset(), LSA mode is updated fromLSA_M_RTCALC to LSA_M_STALE.
> > 9.       Again, steps 3-6 are repeated. In step-3, fib node pointer is
> > changed and in step-4, fib node pointer comparison fails in
> > ospf_originate_lsa()which is leading to LSA id collision.
> >
> > *Issue is resolved with below code change. *Please review the change and
> > provide your comments. Also, please let me know if any other information
> is
> > required.
>
>
> Well, it seems to me that the real cause of the bug is in step 6.
>
> BTW, your stubnets are explicitly configured ones or based on prefixes
> of stub interfaces?
>
> --
> Elen sila lumenn' omentielvo
>
> Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
> OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
> "To err is human -- to blame it on a computer is even more so."
>

[Attachment #3 (text/html)]

<div dir="ltr">Regd #6, all stubnetworks are explicitly removed from FIB in \
rt_sync(), not due to any external event. Pasted the code sniplet below for \
reference.<div><br></div><div>I configured stub-networks in the OSPF config using \
&quot;stubnet&quot; option.</div><div><br></div><div><br><div><br></div><div><div>static \
void</div><div>rt_sync(struct ospf_proto *p)</div><div>{</div><div>   struct \
top_hash_entry *en;</div><div>   struct fib_iterator fit;</div><div>   struct fib \
*fib = &amp;p-&gt;rtf;</div><div>   ort *nf;</div><div>   struct ospf_area \
*oa;</div><div><br></div><div>   /* This is used for forced reload of routes \
*/</div><div>   int reload = (p-&gt;calcrt == 2);</div><div><br></div><div>   \
OSPF_TRACE(D_EVENTS, &quot;Starting routing table \
synchronisation&quot;);</div><div><br></div><div>   DBG(&quot;Now syncing my rt table \
with nest&#39;s\n&quot;);</div><div>   FIB_ITERATE_INIT(&amp;fit, \
fib);</div><div>again1:</div><div>   FIB_ITERATE_START(fib, &amp;fit, \
nftmp)</div><div>   {</div><div>      nf = (ort *) nftmp;</div><div><br></div><div>   \
/* Sanity check of next-hop addresses, failure should not happen */</div><div>      \
if (nf-&gt;n.type)</div><div>      {</div><div>         struct mpnh *nh;</div><div>   \
for (nh = nf-&gt;n.nhs; nh; nh = nh-&gt;next)</div><div>            if \
(ipa_nonzero(nh-&gt;gw))</div><div>            {</div><div>               neighbor \
*ng = neigh_find2(&amp;p-&gt;p, &amp;nh-&gt;gw, nh-&gt;iface, 0);</div><div>          \
if (!ng || (ng-&gt;scope == SCOPE_HOST))</div><div>                  { reset_ri(nf); \
break; }</div><div>            }</div><div>      }</div><div><br></div><div>      /* \
Remove configured stubnets */</div><div>      if (!nf-&gt;n.nhs)</div><div>         \
reset_ri(nf);</div><div><br></div><div>      .</div><div><span \
class="gmail-Apple-tab-span" style="white-space:pre">	</span>.</div><div><span \
class="gmail-Apple-tab-span" style="white-space:pre">	</span>.</div><div><span \
class="gmail-Apple-tab-span" style="white-space:pre">	</span>.</div><div><span \
class="gmail-Apple-tab-span" \
style="white-space:pre">	</span>.</div><div><br></div><div>      /* Remove unused rt \
entry, some special entries are persistent */</div><div>      if (!nf-&gt;n.type \
&amp;&amp; !nf-&gt;external_rte &amp;&amp; !nf-&gt;area_net)</div><div>      \
{</div><div>         FIB_ITERATE_PUT(&amp;fit, nftmp);</div><div>         \
fib_delete(fib, nftmp);</div><div>         goto again1;</div><div>      }</div><div>  \
}</div><div>   FIB_ITERATE_END(nftmp);</div><div>   .</div><div>   .</div><div>   \
.</div><div>}</div><div>                               </div></div></div></div><div \
class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 30, 2016 at 6:01 PM, \
Ondrej Zajicek <span dir="ltr">&lt;<a href="mailto:santiago@crfreenet.org" \
target="_blank">santiago@crfreenet.org</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex">On Wed, Nov 30, 2016 at 10:35:45AM +0530, naveen chowdary \
Yerramneni wrote:<br> &gt; Hi,<br>
&gt;<br>
&gt; *Issue Description*: LSA ID collision issue is seen with OSPF<br>
&gt; *stub-networks* configured and BIRD is acting as ABR<br>
<br>
Hi, thanks for the bugreport and analysis. My comments and questions are<br>
below.<br>
<br>
&gt; *Code Flow*:<br>
<span class="">&gt; 1.           Stubnets are advertised to area-0 by generating \
router LSA (<br> &gt; LSA_T_RT).<br>
&gt; o     ospf_disp() -&gt; ospf_update_topology() -&gt; \
ospf_originate_rt_lsa()-&gt;ospf_<br> &gt; originate_lsa()<br>
&gt; 2.           Now, stubnets are added to top graph table (p-&gt;gr) with LSA \
type<br> &gt; LSA_T_RT<br>
&gt; 3.           When creating routing table, these stubnets are added to FIB (<br>
&gt; p-&gt;rtf).<br>
&gt; o     ospf_disp() -&gt; ospf_rt_spf() -&gt; ospf_rt_spfa() -&gt; \
spfa_process_rt() -&gt;<br> &gt; add_network() -&gt; ri_install_net()<br>
&gt; 4.           When BIRD is acting as ABR then, walk through FIB(p-&gt;rtf)   \
and<br> &gt; send summary LSA (LSA_T_SUM_NET) with LSA mode as LSA_M_RTCALC. Also, \
nf<br> &gt; pointer is set (stores fib node address) in top_hash_entry.<br>
&gt; o      ospf_disp() -&gt; ospf_rt_abr2() -&gt; check_sum_net_lsa() -&gt;<br>
&gt; ospf_originate_sum_net_lsa() -&gt;ospf_originate_lsa()<br>
&gt; 5.           Now, stubnets are added to top graph table (p-&gt;gr) with LSA \
type<br> &gt; LSA_T_SUM_NET<br>
<br>
</span>Until this it looks OK.<br>
<span class=""><br>
&gt; 6.           Stubnets are removed from FIB(p-&gt;rtf)<br>
&gt; o     ospf_disp() -&gt; rt_sync() -&gt; fib_delete()<br>
<br>
</span>This seems strange. If these networs are part of the topology, they<br>
should be in FIB(p-&gt;rtf). You mean that appropriate record in FIB is<br>
removed automatically as a conseqyence of this code seqence, or is<br>
there any external change (like stubnet removal) that causes it to<br>
be removed?<br>
<span class=""><br>
&gt; 7.           Now, stubnets entries are still present in top graph table \
(Note:<br> &gt; fib_delte() doesn&#39;t free the node, it just moves the fib node to \
free pool,<br> &gt; fib_node pointer is still valid)<br>
<br>
</span>That looks like a dangerous bug. Although fib_delete just moves the<br>
fib node back to slab pool, the pointer is considered freed and invalid<br>
(the slab pool is just optimization).<br>
<span class=""><br>
<br>
&gt; 8.           With any change in network, ospf_rt_spf()is called. In<br>
&gt; ospf_rt_reset(), LSA mode is updated fromLSA_M_RTCALC to LSA_M_STALE.<br>
&gt; 9.           Again, steps 3-6 are repeated. In step-3, fib node pointer is<br>
&gt; changed and in step-4, fib node pointer comparison fails in<br>
&gt; ospf_originate_lsa()which is leading to LSA id collision.<br>
&gt;<br>
</span>&gt; *Issue is resolved with below code change. *Please review the change \
and<br> <span class="">&gt; provide your comments. Also, please let me know if any \
other information is<br> &gt; required.<br>
<br>
<br>
</span>Well, it seems to me that the real cause of the bug is in step 6.<br>
<br>
BTW, your stubnets are explicitly configured ones or based on prefixes<br>
of stub interfaces?<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Elen sila lumenn&#39; omentielvo<br>
<br>
Ondrej &#39;Santiago&#39; Zajicek (email: <a \
href="mailto:santiago@crfreenet.org">santiago@crfreenet.org</a>)<br> OpenPGP \
encrypted e-mails preferred (KeyID 0x11DEADC3, <a href="http://wwwkeys.pgp.net" \
rel="noreferrer" target="_blank">wwwkeys.pgp.net</a>)<br> &quot;To err is human -- to \
blame it on a computer is even more so.&quot;<br> \
</font></span></blockquote></div><br></div>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic