[prev in list] [next in list] [prev in thread] [next in thread]
List: quagga-dev
Subject: [quagga-dev 8178] Re: [PATCH] bgpd: fix bgp_node locking issues
From: Balaji G <balajig81 () gmail ! com>
Date: 2010-08-25 14:42:45
Message-ID: AANLkTimg6gpyO5cgqZS+PYmpzsD_B2KOpKxOTik6Yf=D () mail ! gmail ! com
[Download RAW message or body]
[Attachment #2 (multipart/alternative)]
> Where is your branch located (sorry if this is a daft quesion) but seeing
I need a release with this patch applied ....
http://github.com/balajig/quagga-next
<http://github.com/balajig/quagga-next>Some how it makes me feel whether my
efforts are going void :(
Cheers,
- Balaji
<http://github.com/balajig/quagga-next>
On Wed, Aug 25, 2010 at 7:50 PM, Richard J Palmer <richard@merula.net>wrote:
> Ah fun...
>
> Where is your branch located (sorry if this is a daft quesion) but seeing
> I need a release with this patch applied ....
>
>
> On 25/08/2010 15:20, Balaji G wrote:
>
> Hi Richard
>
> I am maintaining a separate branch in which lot of patches are getting
> merged and once Paul is back he could pull it from this location and hence i
> have applied your patch to my branch which i maintain. I am surprised that a
> new version has been released without merging these fixes so it makes me
> think whether what i maintain is useless ?
>
> Cheers,
> - Balaji
>
> On Wed, Aug 25, 2010 at 2:24 PM, Richard J Palmer <richard@merula.net>wrote:
>
>> Can you confirm if this is in 0.99.17 ?
>>
>> I don't see this patch listed on the changelog
>>
>> Thanks
>>
>> Richard
>>
>>
>> On 28/07/2010 04:45, Balaji G wrote:
>>
>> Hi Chris
>>
>> Applied. Thanks
>>
>> Thanks,
>> Cheers,
>> - Balaji
>>
>> On Tue, Jul 27, 2010 at 9:58 PM, Chris Caputo <ccaputo@alt.net> wrote:
>>
>>> Balaji, please consider this for quagga-next.
>>>
>>> Thanks,
>>> Chris
>>>
>>> ---------- Forwarded message ----------
>>> Date: Sun, 25 Apr 2010 20:58:27 +0000 (UTC)
>>> From: Chris Caputo <ccaputo@alt.net>
>>> To: quagga-dev@lists.quagga.net, Denis Ovsienko <infrastation@yandex.ru>
>>> Cc: Richard Palmer <richard@merula.net>
>>> Subject: [quagga-dev 7960] Re: [PATCH] bgpd: fix bgp_node locking issues
>>>
>>> I believe the below patch is ready for inclusion.
>>>
>>> Richard Palmer has been running it for almost 2 months now and it fixed
>>> the assert problem he was experiencing.
>>>
>>> Thanks,
>>> Chris
>>>
>>> ---------- Forwarded message ----------
>>> Date: Sat, 27 Feb 2010 04:45:29 +0000 (UTC)
>>> From: Chris Caputo <ccaputo@alt.net>
>>> To: Richard Palmer <richard@merula.net>
>>> Cc: quagga-dev@lists.quagga.net
>>> Subject: [quagga-dev 7809] [PATCH] bgpd: fix bgp_node locking issues
>>>
>>> On Wed, 16 Dec 2009, Richard Palmer wrote:
>>> > i've just had another crash in BGPd same as all previous:
>>> >
>>> > 2009/12/15 11:01:59 BGP: 195.66.224.250 [Error] bgp_read_packet error:
>>> Connection reset by peer
>>> > 2009/12/15 11:04:14 BGP: Assertion `node->lock > 0' failed in file
>>> bgp_table.c, line 252, function bgp_unlock_node
>>> > 2009/12/15 11:04:14 BGP: Backtrace for 12 stack frames:
>>> > 2009/12/15 11:04:14 BGP: [bt 0]
>>> /usr/local/quagga/lib/libzebra.so.0(zlog_backtrace+0x1f) [0x1571c5]
>>> > 2009/12/15 11:04:14 BGP: [bt 1]
>>> /usr/local/quagga/lib/libzebra.so.0(_zlog_assert_failed+0x99) [0x157322]
>>> > 2009/12/15 11:04:14 BGP: [bt 2] ./bgpd [0xc53dc7]
>>> > 2009/12/15 11:04:14 BGP: [bt 3] ./bgpd [0xc525d4]
>>> > 2009/12/15 11:04:14 BGP: [bt 4] ./bgpd [0xc41e85]
>>> > 2009/12/15 11:04:14 BGP: [bt 5] ./bgpd [0xc42937]
>>> > 2009/12/15 11:04:14 BGP: [bt 6] ./bgpd [0xc490a3]
>>> > 2009/12/15 11:04:14 BGP: [bt 7] ./bgpd(bgp_read+0xa72) [0xc4aa1d]
>>> > 2009/12/15 11:04:14 BGP: [bt 8]
>>> /usr/local/quagga/lib/libzebra.so.0(thread_call+0x67) [0x14b799]
>>> > 2009/12/15 11:04:14 BGP: [bt 9] ./bgpd(main+0x418) [0xc230e1]
>>> > 2009/12/15 11:04:14 BGP: [bt 10] /lib/libc.so.6(__libc_start_main+0xe6)
>>> [0x18b5d6]
>>> > 2009/12/15 11:04:14 BGP: [bt 11] ./bgpd [0xc22bb1]
>>> >
>>> > We have had several of these :(
>>> >
>>> > Any thoughts on how we can fix this ?
>>>
>>> Richard,
>>>
>>> I've worked up a fix the crash you have been experiencing. Please apply
>>> this patch to your 0.99.15 install and report any problems or success.
>>> Feel free to keep using the diagnostic code I sent, so you can check your
>>> BGP log for the overflow indication we were watching for previously. Or
>>> feel free to apply this to a virgin 0.99.15 codebase.
>>>
>>> All,
>>>
>>> I've been working with Richard on his crashing issue. The problem turned
>>> out to be connected table locks being locked but not unlocked, such that
>>> eventually a lock would exceed 2^31 and become negative, thus triggering
>>> an assert later on.
>>>
>>> Patch below...
>>>
>>> Thanks,
>>> Chris
>>>
>>> ---
>>> [PATCH] bgpd: fix bgp_node locking issues
>>>
>>> This patch fixes bugs with respect to bgp_node locking. Also included
>>> are
>>> improvements to alloc/free & cleanup code.
>>>
>>> Thanks to Richard Palmer for assistance in narrowing down the locking
>>> problem on his system, which was regularly experiencing a lock count
>>> exceeding 2^31 and becoming negative.
>>>
>>> bgpd/bgp_main.c:
>>> - Improve bgp_exit() cleanup code to enable developers to better detect
>>> leaks.
>>>
>>> bgpd/bgp_nexthop.c:
>>> - Correct use of bgp_node_match() and bgp_node_lookup() by calling
>>> bgp_unlock_node() as appropriate.
>>> - Track bgp_connected_ref allocations.
>>> - Fix cleanup code by releasing nexthop cache in bgp_scan_finish().
>>>
>>> bgpd/bgp_route.c:
>>> - Correct use of bgp_node_match() by calling bgp_unlock_node() as
>>> appropriate.
>>>
>>> lib/memtypes.c:
>>> - Add MTYPE_BGP_CONN.
>>>
>>> Signed-off-by: Chris Caputo <ccaputo@alt.net>
>>> ---
>>> bgpd/bgp_main.c | 10 ++++++-
>>> bgpd/bgp_nexthop.c | 22 ++++++++++++---
>>> bgpd/bgp_route.c | 77
>>> +++++++++++++++++++++++++++++++---------------------
>>> lib/memtypes.c | 1 +
>>> 4 files changed, 74 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/bgpd/bgp_main.c b/bgpd/bgp_main.c
>>> index 9d14683..1a460c6 100644
>>> --- a/bgpd/bgp_main.c
>>> +++ b/bgpd/bgp_main.c
>>> @@ -243,7 +243,15 @@ bgp_exit (int status)
>>> if (retain_mode)
>>> if_add_hook (IF_DELETE_HOOK, NULL);
>>> for (ALL_LIST_ELEMENTS (iflist, node, nnode, ifp))
>>> - if_delete (ifp);
>>> + {
>>> + struct listnode *c_node, *c_nnode;
>>> + struct connected *c;
>>> +
>>> + for (ALL_LIST_ELEMENTS (ifp->connected, c_node, c_nnode, c))
>>> + bgp_connected_delete (c);
>>> +
>>> + if_delete (ifp);
>>> + }
>>> list_free (iflist);
>>>
>>> /* reverse bgp_attr_init */
>>> diff --git a/bgpd/bgp_nexthop.c b/bgpd/bgp_nexthop.c
>>> index 0cde665..719cb96 100644
>>> --- a/bgpd/bgp_nexthop.c
>>> +++ b/bgpd/bgp_nexthop.c
>>> @@ -276,6 +276,8 @@ bgp_nexthop_lookup_ipv6 (struct peer *peer, struct
>>> bgp_info *ri, int *changed,
>>>
>>> if (bnc->metric != oldbnc->metric)
>>> bnc->metricchanged = 1;
>>> +
>>> + bgp_unlock_node (oldrn);
>>> }
>>> }
>>> }
>>> @@ -365,6 +367,8 @@ bgp_nexthop_lookup (afi_t afi, struct peer *peer,
>>> struct bgp_info *ri,
>>>
>>> if (bnc->metric != oldbnc->metric)
>>> bnc->metricchanged = 1;
>>> +
>>> + bgp_unlock_node (oldrn);
>>> }
>>> }
>>> }
>>> @@ -571,7 +575,7 @@ bgp_connected_add (struct connected *ifc)
>>> }
>>> else
>>> {
>>> - bc = XCALLOC (0, sizeof (struct bgp_connected_ref));
>>> + bc = XCALLOC (MTYPE_BGP_CONN, sizeof (struct
>>> bgp_connected_ref));
>>> bc->refcnt = 1;
>>> rn->info = bc;
>>> }
>>> @@ -596,7 +600,7 @@ bgp_connected_add (struct connected *ifc)
>>> }
>>> else
>>> {
>>> - bc = XCALLOC (0, sizeof (struct bgp_connected_ref));
>>> + bc = XCALLOC (MTYPE_BGP_CONN, sizeof (struct
>>> bgp_connected_ref));
>>> bc->refcnt = 1;
>>> rn->info = bc;
>>> }
>>> @@ -636,7 +640,7 @@ bgp_connected_delete (struct connected *ifc)
>>> bc->refcnt--;
>>> if (bc->refcnt == 0)
>>> {
>>> - XFREE (0, bc);
>>> + XFREE (MTYPE_BGP_CONN, bc);
>>> rn->info = NULL;
>>> }
>>> bgp_unlock_node (rn);
>>> @@ -662,7 +666,7 @@ bgp_connected_delete (struct connected *ifc)
>>> bc->refcnt--;
>>> if (bc->refcnt == 0)
>>> {
>>> - XFREE (0, bc);
>>> + XFREE (MTYPE_BGP_CONN, bc);
>>> rn->info = NULL;
>>> }
>>> bgp_unlock_node (rn);
>>> @@ -1136,11 +1140,15 @@ bgp_multiaccess_check_v4 (struct in_addr nexthop,
>>> char *peer)
>>> rn1 = bgp_node_match (bgp_connected_table[AFI_IP], &p1);
>>> if (! rn1)
>>> return 0;
>>> + bgp_unlock_node (rn1);
>>>
>>> rn2 = bgp_node_match (bgp_connected_table[AFI_IP], &p2);
>>> if (! rn2)
>>> return 0;
>>> + bgp_unlock_node (rn2);
>>>
>>> + /* This is safe, even with above unlocks, since we are just
>>> + comparing pointers to the objects, not the objects themselves. */
>>> if (rn1 == rn2)
>>> return 1;
>>>
>>> @@ -1316,6 +1324,9 @@ bgp_scan_init (void)
>>> void
>>> bgp_scan_finish (void)
>>> {
>>> + /* Only the current one needs to be reset. */
>>> + bgp_nexthop_cache_reset (bgp_nexthop_cache_table[AFI_IP]);
>>> +
>>> bgp_table_unlock (cache1_table[AFI_IP]);
>>> cache1_table[AFI_IP] = NULL;
>>>
>>> @@ -1326,6 +1337,9 @@ bgp_scan_finish (void)
>>> bgp_connected_table[AFI_IP] = NULL;
>>>
>>> #ifdef HAVE_IPV6
>>> + /* Only the current one needs to be reset. */
>>> + bgp_nexthop_cache_reset (bgp_nexthop_cache_table[AFI_IP6]);
>>> +
>>> bgp_table_unlock (cache1_table[AFI_IP6]);
>>> cache1_table[AFI_IP6] = NULL;
>>>
>>> diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
>>> index a92ca4e..617f150 100644
>>> --- a/bgpd/bgp_route.c
>>> +++ b/bgpd/bgp_route.c
>>> @@ -6554,7 +6554,10 @@ bgp_show_route_in_table (struct vty *vty, struct
>>> bgp *bgp,
>>> if ((rm = bgp_node_match (table, &match)) != NULL)
>>> {
>>> if (prefix_check && rm->p.prefixlen != match.prefixlen)
>>> - continue;
>>> + {
>>> + bgp_unlock_node (rm);
>>> + continue;
>>> + }
>>>
>>> for (ri = rm->info; ri; ri = ri->next)
>>> {
>>> @@ -6568,6 +6571,8 @@ bgp_show_route_in_table (struct vty *vty, struct
>>> bgp *bgp,
>>> display++;
>>> route_vty_out_detail (vty, bgp, &rm->p, ri, AFI_IP,
>>> SAFI_MPLS_VPN);
>>> }
>>> +
>>> + bgp_unlock_node (rm);
>>> }
>>> }
>>> }
>>> @@ -6591,6 +6596,8 @@ bgp_show_route_in_table (struct vty *vty, struct
>>> bgp *bgp,
>>> route_vty_out_detail (vty, bgp, &rn->p, ri, afi, safi);
>>> }
>>> }
>>> +
>>> + bgp_unlock_node (rn);
>>> }
>>> }
>>>
>>> @@ -11348,41 +11355,49 @@ bgp_clear_damp_route (struct vty *vty, const
>>> char *view_name,
>>>
>>> if ((table = rn->info) != NULL)
>>> if ((rm = bgp_node_match (table, &match)) != NULL)
>>> - if (! prefix_check || rm->p.prefixlen == match.prefixlen)
>>> - {
>>> - ri = rm->info;
>>> - while (ri)
>>> - {
>>> - if (ri->extra && ri->extra->damp_info)
>>> - {
>>> - ri_temp = ri->next;
>>> - bgp_damp_info_free (ri->extra->damp_info, 1);
>>> - ri = ri_temp;
>>> - }
>>> - else
>>> - ri = ri->next;
>>> - }
>>> - }
>>> + {
>>> + if (! prefix_check || rm->p.prefixlen ==
>>> match.prefixlen)
>>> + {
>>> + ri = rm->info;
>>> + while (ri)
>>> + {
>>> + if (ri->extra && ri->extra->damp_info)
>>> + {
>>> + ri_temp = ri->next;
>>> + bgp_damp_info_free (ri->extra->damp_info,
>>> 1);
>>> + ri = ri_temp;
>>> + }
>>> + else
>>> + ri = ri->next;
>>> + }
>>> + }
>>> +
>>> + bgp_unlock_node (rm);
>>> + }
>>> }
>>> }
>>> else
>>> {
>>> if ((rn = bgp_node_match (bgp->rib[afi][safi], &match)) != NULL)
>>> - if (! prefix_check || rn->p.prefixlen == match.prefixlen)
>>> - {
>>> - ri = rn->info;
>>> - while (ri)
>>> - {
>>> - if (ri->extra && ri->extra->damp_info)
>>> - {
>>> - ri_temp = ri->next;
>>> - bgp_damp_info_free (ri->extra->damp_info, 1);
>>> - ri = ri_temp;
>>> - }
>>> - else
>>> - ri = ri->next;
>>> - }
>>> - }
>>> + {
>>> + if (! prefix_check || rn->p.prefixlen == match.prefixlen)
>>> + {
>>> + ri = rn->info;
>>> + while (ri)
>>> + {
>>> + if (ri->extra && ri->extra->damp_info)
>>> + {
>>> + ri_temp = ri->next;
>>> + bgp_damp_info_free (ri->extra->damp_info, 1);
>>> + ri = ri_temp;
>>> + }
>>> + else
>>> + ri = ri->next;
>>> + }
>>> + }
>>> +
>>> + bgp_unlock_node (rn);
>>> + }
>>> }
>>>
>>> return CMD_SUCCESS;
>>> diff --git a/lib/memtypes.c b/lib/memtypes.c
>>> index 05d9322..5902067 100644
>>> --- a/lib/memtypes.c
>>> +++ b/lib/memtypes.c
>>> @@ -108,6 +108,7 @@ struct memory_list memory_list_bgp[] =
>>> { MTYPE_BGP_NODE, "BGP node" },
>>> { MTYPE_BGP_ROUTE, "BGP route" },
>>> { MTYPE_BGP_ROUTE_EXTRA, "BGP ancillary route info" },
>>> + { MTYPE_BGP_CONN, "BGP connected" },
>>> { MTYPE_BGP_STATIC, "BGP static" },
>>> { MTYPE_BGP_ADVERTISE_ATTR, "BGP adv attr" },
>>> { MTYPE_BGP_ADVERTISE, "BGP adv" },
>>>
>>
>>
>> _______________________________________________
>> Quagga-dev mailing listQuagga-dev@lists.quagga.nethttp://lists.quagga.net/mailman/listinfo/quagga-dev
>>
>>
>>
>> _______________________________________________
>> Quagga-dev mailing list
>> Quagga-dev@lists.quagga.net
>> http://lists.quagga.net/mailman/listinfo/quagga-dev
>>
>>
>
>
[Attachment #5 (text/html)]
<meta http-equiv="content-type" content="text/html; charset=utf-8"><div>> Where is \
your branch located (sorry if this is a daft quesion) but seeing I need a release \
with this patch applied ....</div><div><br></div> <a \
href="http://github.com/balajig/quagga-next">http://github.com/balajig/quagga-next</a><div>
<br></div><div><br></div><div><a \
href="http://github.com/balajig/quagga-next"></a>Some how it makes me feel whether my \
efforts are going void :(</div><div><br></div><div>Cheers,</div><div> - \
Balaji<br><div><br></div><div> <a \
href="http://github.com/balajig/quagga-next"></a><br><br><div class="gmail_quote">On \
Wed, Aug 25, 2010 at 7:50 PM, Richard J Palmer <span dir="ltr"><<a \
href="mailto:richard@merula.net">richard@merula.net</a>></span> wrote:<br> \
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex;">
<div bgcolor="#ffffff" text="#000000">
Ah fun...<br>
<br>
Where is your branch located (sorry if this is a daft quesion) but
seeing I need a release with this patch applied ....<div><div></div><div \
class="h5"><br> <br>
On 25/08/2010 15:20, Balaji G wrote:
<blockquote type="cite">Hi Richard
<div><br>
</div>
<div>I am maintaining a separate branch in which lot of patches
are getting merged and once Paul is back he could pull it from
this location and hence i have applied your patch to my branch
which i maintain. I am surprised that a new version has been
released without merging these fixes so it makes me think
whether what i maintain is useless ?</div>
<div><br>
</div>
<div>Cheers,</div>
<div> - Balaji<br>
<br>
<div class="gmail_quote">On Wed, Aug 25, 2010 at 2:24 PM,
Richard J Palmer <span dir="ltr"><<a href="mailto:richard@merula.net" \
target="_blank">richard@merula.net</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt \
0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex"> <div \
bgcolor="#ffffff" text="#000000"> Can you confirm if this is in 0.99.17 ?<br>
<br>
I don't see this patch listed on the changelog<br>
<br>
Thanks<br>
<br>
Richard
<div>
<div><br>
<br>
On 28/07/2010 04:45, Balaji G wrote: </div>
</div>
<blockquote type="cite">
<div>
<div>Hi Chris <br>
<br>
Applied. Thanks <br>
<br>
Thanks,<br>
Cheers,<br>
- Balaji<br>
<br>
<div class="gmail_quote">On Tue, Jul 27, 2010 at
9:58 PM, Chris Caputo <span dir="ltr"><<a \
href="mailto:ccaputo@alt.net" target="_blank">ccaputo@alt.net</a>></span> \
wrote:<br>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt \
0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">Balaji, please \
consider this for quagga-next.<br> <br>
Thanks,<br>
Chris<br>
<br>
---------- Forwarded message ----------<br>
Date: Sun, 25 Apr 2010 20:58:27 +0000 (UTC)<br>
From: Chris Caputo <<a href="mailto:ccaputo@alt.net" \
target="_blank">ccaputo@alt.net</a>><br>
To: <a href="mailto:quagga-dev@lists.quagga.net" \
target="_blank">quagga-dev@lists.quagga.net</a>,
Denis Ovsienko <<a href="mailto:infrastation@yandex.ru" \
target="_blank">infrastation@yandex.ru</a>><br>
Cc: Richard Palmer <<a href="mailto:richard@merula.net" \
target="_blank">richard@merula.net</a>><br> Subject: [quagga-dev 7960] Re: \
[PATCH] bgpd: fix bgp_node locking issues<br>
<br>
I believe the below patch is ready for
inclusion.<br>
<br>
Richard Palmer has been running it for almost 2
months now and it fixed<br>
the assert problem he was experiencing.<br>
<br>
Thanks,<br>
Chris<br>
<br>
---------- Forwarded message ----------<br>
Date: Sat, 27 Feb 2010 04:45:29 +0000 (UTC)<br>
From: Chris Caputo <<a href="mailto:ccaputo@alt.net" \
target="_blank">ccaputo@alt.net</a>><br>
To: Richard Palmer <<a href="mailto:richard@merula.net" \
target="_blank">richard@merula.net</a>><br>
Cc: <a href="mailto:quagga-dev@lists.quagga.net" \
target="_blank">quagga-dev@lists.quagga.net</a><br> Subject: [quagga-dev 7809] \
[PATCH] bgpd: fix bgp_node locking issues<br>
<br>
On Wed, 16 Dec 2009, Richard Palmer wrote:<br>
> i've just had another crash in BGPd same as
all previous:<br>
><br>
> 2009/12/15 11:01:59 BGP: 195.66.224.250
[Error] bgp_read_packet error: Connection reset
by peer<br>
> 2009/12/15 11:04:14 BGP: Assertion
`node->lock > 0' failed in file
bgp_table.c, line 252, function bgp_unlock_node<br>
> 2009/12/15 11:04:14 BGP: Backtrace for 12
stack frames:<br>
> 2009/12/15 11:04:14 BGP: [bt 0]
/usr/local/quagga/lib/libzebra.so.0(zlog_backtrace+0x1f)
[0x1571c5]<br>
> 2009/12/15 11:04:14 BGP: [bt 1]
/usr/local/quagga/lib/libzebra.so.0(_zlog_assert_failed+0x99)
[0x157322]<br>
> 2009/12/15 11:04:14 BGP: [bt 2] ./bgpd
[0xc53dc7]<br>
> 2009/12/15 11:04:14 BGP: [bt 3] ./bgpd
[0xc525d4]<br>
> 2009/12/15 11:04:14 BGP: [bt 4] ./bgpd
[0xc41e85]<br>
> 2009/12/15 11:04:14 BGP: [bt 5] ./bgpd
[0xc42937]<br>
> 2009/12/15 11:04:14 BGP: [bt 6] ./bgpd
[0xc490a3]<br>
> 2009/12/15 11:04:14 BGP: [bt 7]
./bgpd(bgp_read+0xa72) [0xc4aa1d]<br>
> 2009/12/15 11:04:14 BGP: [bt 8]
/usr/local/quagga/lib/libzebra.so.0(thread_call+0x67)
[0x14b799]<br>
> 2009/12/15 11:04:14 BGP: [bt 9]
./bgpd(main+0x418) [0xc230e1]<br>
> 2009/12/15 11:04:14 BGP: [bt 10]
/lib/libc.so.6(__libc_start_main+0xe6)
[0x18b5d6]<br>
> 2009/12/15 11:04:14 BGP: [bt 11] ./bgpd
[0xc22bb1]<br>
><br>
> We have had several of these :(<br>
><br>
> Any thoughts on how we can fix this ?<br>
<br>
Richard,<br>
<br>
I've worked up a fix the crash you have been
experiencing. Please apply<br>
this patch to your 0.99.15 install and report
any problems or success.<br>
Feel free to keep using the diagnostic code I
sent, so you can check your<br>
BGP log for the overflow indication we were
watching for previously. Or<br>
feel free to apply this to a virgin 0.99.15
codebase.<br>
<br>
All,<br>
<br>
I've been working with Richard on his crashing
issue. The problem turned<br>
out to be connected table locks being locked but
not unlocked, such that<br>
eventually a lock would exceed 2^31 and become
negative, thus triggering<br>
an assert later on.<br>
<br>
Patch below...<br>
<br>
Thanks,<br>
Chris<br>
<br>
---<br>
[PATCH] bgpd: fix bgp_node locking issues<br>
<br>
This patch fixes bugs with respect to bgp_node
locking. Also included are<br>
improvements to alloc/free & cleanup code.<br>
<br>
Thanks to Richard Palmer for assistance in
narrowing down the locking<br>
problem on his system, which was regularly
experiencing a lock count<br>
exceeding 2^31 and becoming negative.<br>
<br>
bgpd/bgp_main.c:<br>
- Improve bgp_exit() cleanup code to enable
developers to better detect<br>
leaks.<br>
<br>
bgpd/bgp_nexthop.c:<br>
- Correct use of bgp_node_match() and
bgp_node_lookup() by calling<br>
bgp_unlock_node() as appropriate.<br>
- Track bgp_connected_ref allocations.<br>
- Fix cleanup code by releasing nexthop cache
in bgp_scan_finish().<br>
<br>
bgpd/bgp_route.c:<br>
- Correct use of bgp_node_match() by calling
bgp_unlock_node() as<br>
appropriate.<br>
<br>
lib/memtypes.c:<br>
- Add MTYPE_BGP_CONN.<br>
<br>
Signed-off-by: Chris Caputo <<a \
href="mailto:ccaputo@alt.net" \
target="_blank">ccaputo@alt.net</a>><br>
---<br>
bgpd/bgp_main.c | 10 ++++++-<br>
bgpd/bgp_nexthop.c | 22 ++++++++++++---<br>
bgpd/bgp_route.c | 77
+++++++++++++++++++++++++++++++---------------------<br>
lib/memtypes.c | 1 +<br>
4 files changed, 74 insertions(+), 36
deletions(-)<br>
<br>
diff --git a/bgpd/bgp_main.c b/bgpd/bgp_main.c<br>
index 9d14683..1a460c6 100644<br>
--- a/bgpd/bgp_main.c<br>
+++ b/bgpd/bgp_main.c<br>
@@ -243,7 +243,15 @@ bgp_exit (int status)<br>
if (retain_mode)<br>
if_add_hook (IF_DELETE_HOOK, NULL);<br>
for (ALL_LIST_ELEMENTS (iflist, node, nnode,
ifp))<br>
- if_delete (ifp);<br>
+ {<br>
+ struct listnode *c_node, *c_nnode;<br>
+ struct connected *c;<br>
+<br>
+ for (ALL_LIST_ELEMENTS
(ifp->connected, c_node, c_nnode, c))<br>
+ bgp_connected_delete (c);<br>
+<br>
+ if_delete (ifp);<br>
+ }<br>
list_free (iflist);<br>
<br>
/* reverse bgp_attr_init */<br>
diff --git a/bgpd/bgp_nexthop.c
b/bgpd/bgp_nexthop.c<br>
index 0cde665..719cb96 100644<br>
--- a/bgpd/bgp_nexthop.c<br>
+++ b/bgpd/bgp_nexthop.c<br>
@@ -276,6 +276,8 @@ bgp_nexthop_lookup_ipv6
(struct peer *peer, struct bgp_info *ri, int
*changed,<br>
<br>
if (bnc->metric !=
oldbnc->metric)<br>
bnc->metricchanged = 1;<br>
+<br>
+ bgp_unlock_node (oldrn);<br>
}<br>
}<br>
}<br>
@@ -365,6 +367,8 @@ bgp_nexthop_lookup (afi_t
afi, struct peer *peer, struct bgp_info *ri,<br>
<br>
if (bnc->metric !=
oldbnc->metric)<br>
bnc->metricchanged = 1;<br>
+<br>
+ bgp_unlock_node (oldrn);<br>
}<br>
}<br>
}<br>
@@ -571,7 +575,7 @@ bgp_connected_add (struct
connected *ifc)<br>
}<br>
else<br>
{<br>
- bc = XCALLOC (0, sizeof (struct
bgp_connected_ref));<br>
+ bc = XCALLOC (MTYPE_BGP_CONN, sizeof
(struct bgp_connected_ref));<br>
bc->refcnt = 1;<br>
rn->info = bc;<br>
}<br>
@@ -596,7 +600,7 @@ bgp_connected_add (struct
connected *ifc)<br>
}<br>
else<br>
{<br>
- bc = XCALLOC (0, sizeof (struct
bgp_connected_ref));<br>
+ bc = XCALLOC (MTYPE_BGP_CONN, sizeof
(struct bgp_connected_ref));<br>
bc->refcnt = 1;<br>
rn->info = bc;<br>
}<br>
@@ -636,7 +640,7 @@ bgp_connected_delete (struct
connected *ifc)<br>
bc->refcnt--;<br>
if (bc->refcnt == 0)<br>
{<br>
- XFREE (0, bc);<br>
+ XFREE (MTYPE_BGP_CONN, bc);<br>
rn->info = NULL;<br>
}<br>
bgp_unlock_node (rn);<br>
@@ -662,7 +666,7 @@ bgp_connected_delete (struct
connected *ifc)<br>
bc->refcnt--;<br>
if (bc->refcnt == 0)<br>
{<br>
- XFREE (0, bc);<br>
+ XFREE (MTYPE_BGP_CONN, bc);<br>
rn->info = NULL;<br>
}<br>
bgp_unlock_node (rn);<br>
@@ -1136,11 +1140,15 @@ bgp_multiaccess_check_v4
(struct in_addr nexthop, char *peer)<br>
rn1 = bgp_node_match
(bgp_connected_table[AFI_IP], &p1);<br>
if (! rn1)<br>
return 0;<br>
+ bgp_unlock_node (rn1);<br>
<br>
rn2 = bgp_node_match
(bgp_connected_table[AFI_IP], &p2);<br>
if (! rn2)<br>
return 0;<br>
+ bgp_unlock_node (rn2);<br>
<br>
+ /* This is safe, even with above unlocks,
since we are just<br>
+ comparing pointers to the objects, not the
objects themselves. */<br>
if (rn1 == rn2)<br>
return 1;<br>
<br>
@@ -1316,6 +1324,9 @@ bgp_scan_init (void)<br>
void<br>
bgp_scan_finish (void)<br>
{<br>
+ /* Only the current one needs to be reset. */<br>
+ bgp_nexthop_cache_reset
(bgp_nexthop_cache_table[AFI_IP]);<br>
+<br>
bgp_table_unlock (cache1_table[AFI_IP]);<br>
cache1_table[AFI_IP] = NULL;<br>
<br>
@@ -1326,6 +1337,9 @@ bgp_scan_finish (void)<br>
bgp_connected_table[AFI_IP] = NULL;<br>
<br>
#ifdef HAVE_IPV6<br>
+ /* Only the current one needs to be reset. */<br>
+ bgp_nexthop_cache_reset
(bgp_nexthop_cache_table[AFI_IP6]);<br>
+<br>
bgp_table_unlock (cache1_table[AFI_IP6]);<br>
cache1_table[AFI_IP6] = NULL;<br>
<br>
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c<br>
index a92ca4e..617f150 100644<br>
--- a/bgpd/bgp_route.c<br>
+++ b/bgpd/bgp_route.c<br>
@@ -6554,7 +6554,10 @@ bgp_show_route_in_table
(struct vty *vty, struct bgp *bgp,<br>
if ((rm = bgp_node_match (table,
&match)) != NULL)<br>
{<br>
if (prefix_check &&
rm->p.prefixlen != match.prefixlen)<br>
- continue;<br>
+ {<br>
+ bgp_unlock_node (rm);<br>
+ continue;<br>
+ }<br>
<br>
for (ri = rm->info; ri; ri
= ri->next)<br>
{<br>
@@ -6568,6 +6571,8 @@ bgp_show_route_in_table
(struct vty *vty, struct bgp *bgp,<br>
display++;<br>
route_vty_out_detail (vty,
bgp, &rm->p, ri, AFI_IP, SAFI_MPLS_VPN);<br>
}<br>
+<br>
+ bgp_unlock_node (rm);<br>
}<br>
}<br>
}<br>
@@ -6591,6 +6596,8 @@ bgp_show_route_in_table
(struct vty *vty, struct bgp *bgp,<br>
route_vty_out_detail (vty,
bgp, &rn->p, ri, afi, safi);<br>
}<br>
}<br>
+<br>
+ bgp_unlock_node (rn);<br>
}<br>
}<br>
<br>
@@ -11348,41 +11355,49 @@ bgp_clear_damp_route
(struct vty *vty, const char *view_name,<br>
<br>
if ((table = rn->info) != NULL)<br>
if ((rm = bgp_node_match (table,
&match)) != NULL)<br>
- if (! prefix_check ||
rm->p.prefixlen == match.prefixlen)<br>
- {<br>
- ri = rm->info;<br>
- while (ri)<br>
- {<br>
- if (ri->extra
&& ri->extra->damp_info)<br>
- {<br>
- ri_temp = \
ri->next;<br>
- bgp_damp_info_free
(ri->extra->damp_info, 1);<br>
- ri = ri_temp;<br>
- }<br>
- else<br>
- ri = ri->next;<br>
- }<br>
- }<br>
+ {<br>
+ if (! prefix_check ||
rm->p.prefixlen == match.prefixlen)<br>
+ {<br>
+ ri = rm->info;<br>
+ while (ri)<br>
+ {<br>
+ if (ri->extra
&& ri->extra->damp_info)<br>
+ {<br>
+ ri_temp =
ri->next;<br>
+ bgp_damp_info_free
(ri->extra->damp_info, 1);<br>
+ ri = ri_temp;<br>
+ }<br>
+ else<br>
+ ri = ri->next;<br>
+ }<br>
+ }<br>
+<br>
+ bgp_unlock_node (rm);<br>
+ }<br>
}<br>
}<br>
else<br>
{<br>
if ((rn = bgp_node_match
(bgp->rib[afi][safi], &match)) != NULL)<br>
- if (! prefix_check || rn->p.prefixlen
== match.prefixlen)<br>
- {<br>
- ri = rn->info;<br>
- while (ri)<br>
- {<br>
- if (ri->extra &&
ri->extra->damp_info)<br>
- {<br>
- ri_temp = ri->next;<br>
- bgp_damp_info_free
(ri->extra->damp_info, 1);<br>
- ri = ri_temp;<br>
- }<br>
- else<br>
- ri = ri->next;<br>
- }<br>
- }<br>
+ {<br>
+ if (! prefix_check ||
rn->p.prefixlen == match.prefixlen)<br>
+ {<br>
+ ri = rn->info;<br>
+ while (ri)<br>
+ {<br>
+ if (ri->extra &&
ri->extra->damp_info)<br>
+ {<br>
+ ri_temp = ri->next;<br>
+ bgp_damp_info_free
(ri->extra->damp_info, 1);<br>
+ ri = ri_temp;<br>
+ }<br>
+ else<br>
+ ri = ri->next;<br>
+ }<br>
+ }<br>
+<br>
+ bgp_unlock_node (rn);<br>
+ }<br>
}<br>
<br>
return CMD_SUCCESS;<br>
diff --git a/lib/memtypes.c b/lib/memtypes.c<br>
index 05d9322..5902067 100644<br>
--- a/lib/memtypes.c<br>
+++ b/lib/memtypes.c<br>
@@ -108,6 +108,7 @@ struct memory_list
memory_list_bgp[] =<br>
{ MTYPE_BGP_NODE, "BGP node" \
},<br>
{ MTYPE_BGP_ROUTE, "BGP route" \
},<br>
{ MTYPE_BGP_ROUTE_EXTRA, "BGP ancillary
route info" },<br>
+ { MTYPE_BGP_CONN, "BGP \
connected" },<br>
{ MTYPE_BGP_STATIC, "BGP static" \
},<br>
{ MTYPE_BGP_ADVERTISE_ATTR, "BGP adv attr"
},<br>
{ MTYPE_BGP_ADVERTISE, "BGP adv" \
},<br>
</blockquote>
</div>
<br>
</div>
</div>
<pre><fieldset></fieldset>
_______________________________________________
Quagga-dev mailing list
<a href="mailto:Quagga-dev@lists.quagga.net" \
target="_blank">Quagga-dev@lists.quagga.net</a> <a \
href="http://lists.quagga.net/mailman/listinfo/quagga-dev" \
target="_blank">http://lists.quagga.net/mailman/listinfo/quagga-dev</a> </pre>
</blockquote>
<br>
</div>
<br>
_______________________________________________<br>
Quagga-dev mailing list<br>
<a href="mailto:Quagga-dev@lists.quagga.net" \
target="_blank">Quagga-dev@lists.quagga.net</a><br>
<a href="http://lists.quagga.net/mailman/listinfo/quagga-dev" \
target="_blank">http://lists.quagga.net/mailman/listinfo/quagga-dev</a><br> <br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br></div></div>
_______________________________________________
Quagga-dev mailing list
Quagga-dev@lists.quagga.net
http://lists.quagga.net/mailman/listinfo/quagga-dev
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic