'Re: XFS handling of synchronous buffers in case of EIO error'


List:       linux-xfs
Subject:    Re: XFS handling of synchronous buffers in case of EIO error
From:       Ajeet Yadav <ajeet.yadav.77 () gmail ! com>
Date:       2010-12-31 6:59:12
Message-ID: AANLkTi=OK4uGx5476ro8W47icu685gvQea43rNHozPKS () mail ! gmail ! com

Dear Dave,

Our kernel is 2.6.30.9, but XFS is backported from 2.6.34.
I have also seen similar behaviour in another post, about an ls process
hanging on 2.6.35.9:

http://oss.sgi.com/pipermail/xfs/2010-December/048691.html

I have only ever seen the hang when b_relse != NULL and b_hold > 2.

I have made the workaround below. It solved the problem in our case, because
we know we get an EIO error when the USB device is removed.

But I think we need to review xfs_buf_error_relse() and xfs_buf_relse()
with the XBF_LOCK flow path in mind.

@@ -1047,9 +1047,19 @@ xfs_buf_iodone_callbacks(
                        /* We actually overwrite the existing b-relse
                           function at times, but we're gonna be shutting down
                           anyway. */
-                       XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
-                       XFS_BUF_DONE(bp);
-                       XFS_BUF_FINISH_IOWAIT(bp);
+                       if (XFS_BUF_GETERROR(bp) == EIO){
+                               ASSERT(XFS_BUF_TARGET(bp) == mp->m_ddev_targp);
+                               XFS_BUF_SUPER_STALE(bp);
+                               trace_xfs_buf_item_iodone(bp, _RET_IP_);
+                               xfs_buf_do_callbacks(bp, lip);
+                               XFS_BUF_SET_FSPRIVATE(bp, NULL);
+                               XFS_BUF_CLR_IODONE_FUNC(bp);
+                               xfs_biodone(bp);
+                       } else {
+                               XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
+                               XFS_BUF_DONE(bp);
+                               XFS_BUF_FINISH_IOWAIT(bp);
+                       }
                }
                return;
        }



On Dec 31, 2010 at 4:43 AM, Dave Chinner <david@fromorbit.com> wrote:

> On Thu, Dec 30, 2010 at 05:58:36PM +0530, Ajeet Yadav wrote:
> > Kernel: 2.6.30.9
> >
> > I am troubleshooting a hang in XFS during umount.
> > Test scenario: copy a large number of files using the script below, and
> > remove the USB device after 3-5 seconds
>
> FWIW, in future can you please report what kernel you are testing on?
>
> >
> > index=0
> > while [ "$?" == 0 ]
> > do
> >         index=$((index+1))
> >         sync
> >         cp $1/1KB.txt $2/"$index".test
> > done
> >
> > In a rare scenario during USB unplug, the umount process hangs at
> > xfs_buf_lock.
> > The log below shows the hung process
> >
> > We have put printk into the buffer handling functions
> > xfs_buf_iodone_callbacks(), xfs_buf_error_relse(), xfs_buf_relse() and
> > xfs_buf_rele()
> >
> > We always observed that the hang only comes when bp->b_relse =
> > xfs_buf_error_relse(), i.e. when xfs_buf_iodone_callbacks() executes
> > XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
> > XFS_BUF_DONE(bp);
> > XFS_BUF_FINISH_IOWAIT(bp);
> >
> > but it is never called by xfs_buf_relse() because b_hold = 3.
> >
> > Also we have seen that this problem always comes when bp->relse != NULL &&
> > bp->hold > 1.
>
> This appears to be the same problem as reported here:
>
> http://oss.sgi.com/archives/xfs/2010-12/msg00380.html
>
>
> > I do not know whether the prints below will help you, but I have added
> > printk calls for superblock buffer tracing
> > S-functionname ( Start of function)
> > E-functionname (End of function)
>
> If you have a recent enough kernel, you can get all this information
> from the tracing built into XFS.
>
> As it is, the cause of the problem is that setting bp->b_relse
> changes the behaviour of xfs_buf_relse() - if bp->b_relse is set, it
> doesn't unlock the buffer. This is normally just fine, because
> xfs_buf_rele() has a special case to handle buffers with
> bp->b_relse(), which adds a hold count and calls the release function
> when the hold count drops to zero. The b_relse function is supposed
> to unlock the buffer by calling xfs_buf_relse() again.
>
> Unfortunately, the superblock buffer is special - the hold count on
> it never drops to zero until very late in the unmount process because
> it is managed by the filesystem.  Hence the bp->b_relse function is
> never called, and hence the buffer is never unlocked in this case.
> Hence future attempts to access it hang.
>
> I'll need to think about this one for a bit...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
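
To make the mechanism described above easier to follow, here is a minimal
user-space sketch of it. It is only a toy model under my own assumptions, not
XFS code: the toy_ names and still_locked_after_release() are made up for
illustration, and only the roles of b_hold, b_relse, xfs_buf_relse(),
xfs_buf_rele() and xfs_buf_error_relse() are taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a buffer; every name here is hypothetical. */
struct toy_buf {
	int  hold;                        /* models b_hold (reference count) */
	bool locked;                      /* models the buffer lock */
	void (*relse)(struct toy_buf *);  /* models b_relse (release callback) */
};

static void toy_buf_rele(struct toy_buf *bp);

/* Models xfs_buf_relse(): unlock only if no release callback is set. */
static void toy_buf_relse(struct toy_buf *bp)
{
	if (bp->relse == NULL)
		bp->locked = false;
	toy_buf_rele(bp);
}

/* Models xfs_buf_rele(): the callback runs only when hold reaches zero. */
static void toy_buf_rele(struct toy_buf *bp)
{
	assert(bp->hold > 0);
	if (--bp->hold > 0)
		return;                   /* other holders remain: nothing more happens */
	if (bp->relse != NULL) {
		void (*fn)(struct toy_buf *) = bp->relse;

		bp->hold++;               /* extra hold for the callback */
		fn(bp);
	}
}

/* Models xfs_buf_error_relse(): clear the callback, then release again. */
static void toy_error_relse(struct toy_buf *bp)
{
	bp->relse = NULL;
	toy_buf_relse(bp);                /* this second call drops the lock */
}

/* Release a locked buffer once; report whether it is still locked. */
static bool still_locked_after_release(int holds, bool error_path)
{
	struct toy_buf bp = {
		.hold   = holds,
		.locked = true,
		.relse  = error_path ? toy_error_relse : NULL,
	};

	toy_buf_relse(&bp);
	return bp.locked;
}
```

In this model a buffer with a single hold is unlocked once the callback runs,
but with three holds and the error callback installed, as in the superblock
buffer case, the callback is never reached and the lock stays held.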





