
List:       ms-ospf
Subject:    Re: [Lsr] Multiple failures in Dynamic Flooding
From:       tony.li () tony ! li
Date:       2019-03-11 17:41:08
Message-ID: 10A1CA48-0D09-44FF-95ED-8D52FB867B8B () tony ! li

Hi Huaimo,



> In summary for multiple failures, two issues below in
> draft-li-lsr-dynamic-flooding are discussed:
> 1) how to determine the current flooding topology is split; and
> 2) how to repair/connect the flooding topology split.
> For the first issue, the discussions are still going on.
> For the second issue, repairing/connecting the flooding topology split through
> Hello protocol extensions does not work.  When a "backup path"/connection of
> multiple hops is needed to connect/repair the flooding topology split, Hello
> cannot go beyond one hop, and thus cannot repair the flooding topology split in
> this case.


You do not try to repair things remotely; they are always repaired locally.  If
there are multiple failures in the flooding topology (FT) and it is partitioned,
then it follows that there are multiple remaining connected components of the
flooding topology.  Nodes that are adjacent to the failures will update their LSPs
and flood them throughout their connected component.  Each component will see at
least two link failures if there is a partition of the FT, and each node in the
component can detect that the FT has partitioned.  Each node is then capable of
enabling temporary flooding on one or more links that will traverse the partition,
thereby restoring a functioning FT.  The Area Leader then recomputes and
redistributes the revised FT.
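
As a rough illustration of the local repair logic described above, here is a
minimal sketch.  It is not taken from the draft: the function names, the data
model (node IDs as strings, links as node pairs), and the rule of picking every
local up link that leaves the component are assumptions made for this example.

```python
from collections import defaultdict, deque


def connected_components(nodes, links):
    """Return the connected components of an undirected graph as sets."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in comp:
                    comp.add(nbr)
                    queue.append(nbr)
        seen |= comp
        components.append(comp)
    return components


def repair_candidates(me, nodes, ft_links, up_links):
    """Local links of `me` that cross an FT partition (hypothetical helper).

    ft_links: links in the last distributed flooding topology.
    up_links: links currently reported up (LSDB plus local state).
    Returns [] when the surviving FT is still connected.
    """
    up = {frozenset(link) for link in up_links}
    live_ft = [link for link in ft_links if frozenset(link) in up]
    comps = connected_components(nodes, live_ft)
    if len(comps) == 1:
        return []  # FT not partitioned, nothing to repair
    mine = next(c for c in comps if me in c)
    candidates = []
    for a, b in up_links:
        if a == me and b not in mine:
            candidates.append((a, b))
        elif b == me and a not in mine:
            candidates.append((b, a))
    return sorted(candidates)


# Ring FT A-B-C-D-A plus a non-FT link A-C; links A-B and C-D have failed.
nodes = ["A", "B", "C", "D"]
ft = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
up = [("B", "C"), ("D", "A"), ("A", "C")]
print(repair_candidates("A", nodes, ft, up))  # [('A', 'C')]
print(repair_candidates("B", nodes, ft, up))  # []
```

In this example only A and C have a local link that traverses the partition, so
only they would enable temporary flooding; B and D do nothing.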

To put it yet another way, repair is fully distributed.  You should like that.  :-)


> > We are not requiring it, but a system could also do a more extensive
> > computation and compare the links between itself and the neighbor by tracing
> > the path in the FT and then confirming that each link is up in the LSDB.
>
> It normally takes a long time such as more than ten minutes to age out and
> remove an LSP/LSA for the neighbor from the LSDB even though the neighbor is
> disconnected physically. How can you decide quickly in tens of milliseconds that
> the flooding topology is disconnected?


You do not wait for LSP/LSA removal.  You look for link changes in the LSPs that
you do get, or local link changes.
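
To make that concrete, a minimal sketch of spotting a lost FT link by comparing
the neighbor list in a newly received LSP against the stored copy.  The function
names and the representation of an LSP as a plain list of neighbor IDs are
assumptions for illustration only, not anything defined by the draft.

```python
def link_changes(old_neighbors, new_neighbors):
    """Return (lost, gained) neighbors between two versions of one node's LSP."""
    old, new = set(old_neighbors), set(new_neighbors)
    return sorted(old - new), sorted(new - old)


def ft_links_lost(origin, old_neighbors, new_neighbors, ft_links):
    """FT links that the updated LSP from `origin` no longer advertises."""
    lost, _gained = link_changes(old_neighbors, new_neighbors)
    ft = {frozenset(link) for link in ft_links}
    return [(origin, nbr) for nbr in lost if frozenset((origin, nbr)) in ft]


# B previously advertised neighbors A and C; its updated LSP lists only C.
print(ft_links_lost("B", ["A", "C"], ["C"], [("A", "B"), ("B", "C")]))
# [('B', 'A')]
```

The point is that the updated LSP from the adjacent node arrives at flooding
speed, so no LSP has to age out before the change is visible.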


> > As we have discussed, this is not a solution. In fact, this is more dangerous
> > than anything else that has been proposed and seems highly likely to trigger
> > a cascade failure. You are enabling full flooding for many nodes.  In dense
> > topologies, even a radius of 3 is very high.  For example, in an LS topology,
> > a radius of 3 is sufficient to enable full flooding throughout the entire
> > topology. If that were stable, we would not need Dynamic Flooding at all.
>
> This full flooding is enabled only for a very short time.


All it takes is enabling it at sufficient density to create a cascade failure.
Milliseconds are sufficient for a collapse.


> How do you get that this is more dangerous than anything else and seems highly
> likely to trigger a cascade failure? Can you give some explanations in detail?


Again, we do not have absolute metrics on what triggers a cascade failure today.
We have several data points from several different implementations at different
points in time.  We know that in the early '90s, a full mesh of 20 neighbors
running L1L2 was sufficient to trigger one.  Obviously things have changed
somewhat, but even more modern implementations have had problems.  This is why the
MSDC went to BGP.

As a result, we need to be very conservative about what flooding we temporarily
enable.  We do not want to walk anywhere near the cliff, as the cascade failure is
fatal to the network.

Tony





_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr

