
List:       drbd-dev
Subject:    Re: [Drbd-dev] [DRBD-user] A question about drbd hang in conn_disconnect
From:       David Pfarrer <dpfarrer () gnubio ! com>
Date:       2015-04-16 13:55:39
Message-ID: CAFGHznKdrwop8tywQuEKEFTR0yD2Dsw5LpSmJw3yvfinMUaCHA () mail ! gmail ! com


unsubscribe

-------------------------------------------------------------
David Pfarrer| Sr IT Support Tech | Bio-Rad Laboratories, Inc.
1 Kendall Square, Suite B14201
Cambridge, MA 02139
E-Mail: dpfarrer@gnubio.com
*TEL*: *617-500-1838*
-------------------------------------------------------------

On Thu, Apr 16, 2015 at 9:46 AM, Lars Ellenberg <lars.ellenberg@linbit.com>
wrote:

> On Mon, Apr 13, 2015 at 05:45:25PM -0600, Fang Sun wrote:
> > Hi drbd developers,
> >
> > After some research and tests, I believe I have found the cause of this
> > problem and a possible fix in drbd.
> > Would you please check if my theory is correct?
> >
> >
> > I will use 8.4.6 as the code base for the explanation.
> > When conn_disconnect hangs, it is stuck at drbd_receiver.c:5178:
> > static int drbd_disconnected(struct drbd_peer_device *peer_device)
> > {
> >     ........
> >     wait_event(device->misc_wait, !test_bit(BITMAP_IO, &device->flags));
> > }
> >
> > The wait never finishes because the device has the BITMAP_IO flag set.
> >
> >
> > BITMAP_IO is set and never cleared for the following reason:
> > the disk state changes when the network is disconnected, and
> > after_state_ch() is called.
> >
> > At drbd_state.c line 1949, after_state_ch() calls drbd_queue_bitmap_io().
> >
> > I think the real cause is in drbd_queue_bitmap_io(), drbd_main.c line 3641:
> > void drbd_queue_bitmap_io(struct drbd_device *device,
> >                           int (*io_fn)(struct drbd_device *),
> >                           void (*done)(struct drbd_device *, int),
> >                           char *why, enum bm_flag flags)
> > {
> >     .........
> >     set_bit(BITMAP_IO, &device->flags);
> >     if (atomic_read(&device->ap_bio_cnt) == 0) {
> >         if (!test_and_set_bit(BITMAP_IO_QUEUED, &device->flags))
> >             drbd_queue_work(&first_peer_device(device)->connection->sender_work,
> >                             &device->bm_io_work.w);
> >     }
> >     ........
> > }
> >
> > The only code that clears BITMAP_IO is in device->bm_io_work.w
> > (w_bitmap_io). But when atomic_read(&device->ap_bio_cnt) != 0, BITMAP_IO
> > is set while bm_io_work.w is not queued, so it is never called.
> > Then drbd_disconnected() is blocked.
> >
> > Should we move set_bit(BITMAP_IO, &device->flags) to just before
> > drbd_queue_work()?
>
> No.  That would be the wrong fix,
> and cause potential inconsistencies later.
>
> It may need to be fixed, but in a different way.
>
> Let me (reproduce locally ... and) think about that for a bit.
>
> Thanks,
>
> --
> : Lars Ellenberg
> : http://www.LINBIT.com | Your Way to High Availability
> : DRBD, Linux-HA  and  Pacemaker support and consulting
>
> DRBD ® and LINBIT ® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed
> _______________________________________________
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
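
For readers following the quoted analysis, here is a small user-space model
of the flag logic as the quoted mail describes it. It is only a sketch, not
the DRBD source: struct model_device and every model_* helper are
hypothetical names made up for illustration, plain integers stand in for the
kernel's atomic bit operations, and the model covers only the code path
quoted above. Whether another path in DRBD queues the work later is outside
this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for BITMAP_IO / BITMAP_IO_QUEUED. */
enum { MODEL_BITMAP_IO = 1, MODEL_BITMAP_IO_QUEUED = 2 };

/* Hypothetical stand-in for the few struct drbd_device fields involved. */
struct model_device {
    unsigned int flags;  /* models device->flags */
    int ap_bio_cnt;      /* models atomic_read(&device->ap_bio_cnt) */
    bool work_queued;    /* true once bm_io_work would sit on sender_work */
};

/* Mirrors the quoted excerpt of drbd_queue_bitmap_io(). */
static void model_queue_bitmap_io(struct model_device *d)
{
    d->flags |= MODEL_BITMAP_IO;
    if (d->ap_bio_cnt == 0) {
        if (!(d->flags & MODEL_BITMAP_IO_QUEUED)) {
            d->flags |= MODEL_BITMAP_IO_QUEUED;
            d->work_queued = true;
        }
    }
}

/* Models w_bitmap_io(): per the mail, the only place that clears BITMAP_IO. */
static void model_run_worker(struct model_device *d)
{
    if (!d->work_queued)
        return;  /* nothing was queued, so nothing runs */
    d->work_queued = false;
    d->flags &= ~(unsigned int)(MODEL_BITMAP_IO | MODEL_BITMAP_IO_QUEUED);
}

/* Models the condition that the wait_event() in drbd_disconnected() needs. */
static bool model_disconnect_would_block(const struct model_device *d)
{
    return (d->flags & MODEL_BITMAP_IO) != 0;
}

/* Walks through both cases; returns 0 if the model behaves as described. */
static int model_demo(void)
{
    struct model_device idle = { 0, 0, false };  /* no application I/O pending */
    struct model_device busy = { 0, 1, false };  /* one application bio pending */

    model_queue_bitmap_io(&idle);
    model_run_worker(&idle);
    assert(!model_disconnect_would_block(&idle));

    model_queue_bitmap_io(&busy);
    model_run_worker(&busy);
    assert(model_disconnect_would_block(&busy));
    return 0;
}
```

Calling model_demo() from a test program exercises both cases: with
ap_bio_cnt == 0 the worker runs and clears the flag, while with
ap_bio_cnt != 0 the flag stays set and the modeled wait would not return.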




_______________________________________________
drbd-dev mailing list
drbd-dev@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-dev

