List: ssic-linux-devel
Subject: [SSI-devel] Re: [SSI-users] hangs on write()
From: Roger Tsang <roger.tsang () gmail ! com>
Date: 2005-10-02 3:41:40
Message-ID: 498263350510012041o3e576f89m8a05b2b0e511503e () mail ! gmail ! com
Hi,
The sync_inodes() backport is missing a CFS test. I've also added some
locking for cfs_setattr()/getattr(). You might want to wait until I post my
new patch.
Roger
On 9/27/05, Roger Tsang <roger.tsang@gmail.com> wrote:
>
> I found a bug in my patch. You'd have to add spin_unlock() in
> cfs_commit_inode() in cluster/ssi/cfs/write.c at approx. line 1325:
>
> spin_lock(&cfs_wreq_lock);
> // res = cfs_scan_commit(inode, &head, idx_start, npages);
> res = cfs_scan_commit(inode, &head, 0, 0);
> spin_unlock(&cfs_wreq_lock);
> if (res) {
>
> Roger
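The fix above restores the usual rule for spinlocks: a lock taken with spin_lock() must be released on every path before the function continues. The sketch below shows that pattern in userspace. It assumes C11 atomics as a stand-in for the kernel spinlock. `toy_spinlock_t`, `scan_commit()` and `pending_requests` are hypothetical. They model only the shape of the corrected `cfs_commit_inode()` path, not its real logic.

```c
#include <stdatomic.h>

/* Hypothetical userspace spinlock standing in for the kernel's spinlock_t. */
typedef struct {
    atomic_flag locked;
} toy_spinlock_t;

static void toy_spin_lock(toy_spinlock_t *l)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Stand-in for cfs_wreq_lock and the request list it protects. */
toy_spinlock_t cfs_wreq_lock = { ATOMIC_FLAG_INIT };
static int pending_requests = 3;

/* Hypothetical scan: take all pending requests and return how many. */
static int scan_commit(void)
{
    int n = pending_requests;
    pending_requests = 0;
    return n;
}

/* Shape of the corrected path: lock, scan, unlock, then act on res.
 * If the unlock is removed, a second call spins forever in toy_spin_lock(). */
int commit_inode(void)
{
    int res;

    toy_spin_lock(&cfs_wreq_lock);
    res = scan_commit();
    toy_spin_unlock(&cfs_wreq_lock);    /* the unlock the fix adds */

    if (res) {
        /* ... commit the collected requests outside the lock ... */
    }
    return res;
}
```

Calling commit_inode() twice in a row returns 3 and then 0. With the unlock line deleted, the second call never returns.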
>
>
> On 9/22/05, Roger Tsang <roger.tsang@gmail.com> wrote:
> >
> > The patch is kinda messy as it's a straight diff of my latest work, but
> > it will do. Patch it against the kernel. I can't check in any of these until
> > CVS is fixed...
> >
> > Roger
> >
> >
> > On 9/22/05, Roger Tsang <roger.tsang@gmail.com> wrote:
> > >
> > > I dunno about sync on the same machine; I don't remember now. It's not a
> > > crash, so it's hard to tell. I'll do a few more tests later after work
> > > hours. I'll put out a CFS patch for you to try.
> > >
> > > Roger
> > >
> > >
> > > On 9/22/05, Andy Phillips <Andrew.Phillips@betfair.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > If you type "sync" on the same machine in another window, does it
> > > > recover?
> > > >
> > > > Any ideas as to the underlying cause?
> > > >
> > > > Andy
> > > >
> > > > On Sat, 2005-09-17 at 14:38 -0400, Roger Tsang wrote:
> > > > > Alright, I can reproduce this by doing the large file copy. It gets
> > > > > stuck here...
> > > > >
> > > > > Stack traceback for pid 136777
> > > > > 0xc57f3a80 136777 136733 0 0 D 0xc57f3c40 mc
> > > > > EBP EIP Function (args)
> > > > > 0xd8e01cc0 0xc03b4103 schedule+0x2b3
> > > > > 0xd8e01cc8 0xc03b462e io_schedule+0xe (0xc15001b0)
> > > > > 0xd8e01cd4 0xc0136745 sync_page+0x35 (0xc1251ea8, 0x0, 0xc0136710, 0xc57f3a80, 0xd8e01d24)
> > > > > 0xd8e01cf4 0xc03b49a9 __wait_on_bit_lock+0x49 (0x2, 0xc1251ea8, 0xc1251ea8, 0x0, 0x0)
> > > > > 0xd8e01d50 0xc0136f5a __lock_page+0x8a (0xd666c8a0, 0x1d838, 0xda0e96a0, 0x1d838, 0x2)
> > > > > 0xd8e01de8 0xc013764b do_generic_mapping_read+0x3db (0xd666c8a0, 0xda0e96e8, 0xda0e96a0, 0xd8e01f14, 0xd8e01e1c)
> > > > > 0xd8e01e38 0xc0137b24 __generic_file_aio_read+0x194 (0xd8e01ed8, 0xd8e01e50, 0x1, 0xd8e01f14, 0x8135998)
> > > > > 0xd8e01e64 0xc0137be2 generic_file_aio_read+0x52 (0xd8e01ed8, 0x8135998, 0x2000, 0x1d838000, 0x0)
> > > > > 0xd8e01ea0 0xc0268f40 __cfs_file_read+0xc0 (0xd8e01ed8, 0x0, 0x8135998, 0x2000, 0xd8e01ed0)
> > > > > 0xd8e01ebc 0xc0268ffe cfs_file_aio_read+0x2e (0xd8e01ed8, 0x8135998, 0x2000, 0x1d838000, 0x0)
> > > > > 0xd8e01f64 0xc015561b do_sync_read+0xab (0xda0e96a0, 0x8135998, 0x2000, 0xd8e01fa8, 0x0)
> > > > > 0xd8e01f90 0xc0155758 vfs_read+0xe8 (0xda0e96a0, 0x8135998, 0x2000, 0xd8e01fa8, 0x1d838000)
> > > > > 0xd8e01fbc 0xc0155a1b sys_read+0x4b
> > > > > 0xc0103c55 sysenter_past_esp+0x52
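In the task line of the traceback above, the `D` is the task state, uninterruptible sleep. That fits a reader blocked in `__lock_page()` through `io_schedule()`. On a running Linux system the same letter is the third field of `/proc/<pid>/stat`. The sketch below checks it from userspace. It assumes the standard `/proc` layout, and `stat_line_state()` and `task_state()` are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Extract the state letter from one line of /proc/<pid>/stat.
 * The format is "pid (comm) state ...". comm may itself contain spaces
 * or ')', so search for the LAST ')' rather than the first.
 * Returns the state character, or '?' if the line is malformed. */
char stat_line_state(const char *line)
{
    const char *close = strrchr(line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '?';
    return close[2];
}

/* Read /proc/<pid>/stat and return the task state ('R', 'S', 'D', ...),
 * or '?' if the file cannot be read. 'D' is uninterruptible sleep. */
char task_state(int pid)
{
    char path[64], buf[512];
    FILE *f;

    snprintf(path, sizeof path, "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '?';
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return '?';
    }
    fclose(f);
    return stat_line_state(buf);
}
```

For instance, task_state() called with the pid of the stuck `mc` copy, on the node running it, would be expected to return `'D'` while the copy is blocked.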
> > > > >
> > > > > On 9/17/05, Roger Tsang <roger.tsang@gmail.com> wrote:
> > > > > Okay, I ran into this hang just a moment ago while copying a
> > > > > very large file from node 2 to the init node. It hangs at the
> > > > > very end of the file. Then, if I do "sync" as you suggested,
> > > > > the copy completes. I guess next time I see this I'll do a
> > > > > backtrace on the copy process. My guess is it's waiting in CFS
> > > > > wait_for_congestion().
> > > > >
> > > > > Have you tried a different IO scheduler? Try deadline if you
> > > > > were using cfq.
> > > > >
> > > > > Roger
> > > > >
> > > > >
> > > > >
> > > > > On 8/25/05, John Byrne <john.l.byrne@hp.com> wrote:
> > > > > Andy Phillips wrote:
> > > > > > Following on:
> > > > > >
> > > > > > It appears that if I remount the file system with the
> > > > > > "sync" option then this problem goes away, but performance
> > > > > > is bad. Shutting down the other node in the cluster does
> > > > > > not seem to affect this at all.
> > > > > >
> > > > > > Would SSI or CFS cause issues with async I/O? Would
> > > > > > that follow a different path from a normal kernel?
> > > > > >
> > > > > > Andy
> > > > > >
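Mounting with `sync` makes every write on that file system synchronous. To try the same effect on a single file, without slowing the whole mount, the file can be opened with `O_SYNC`. The sketch below assumes a POSIX system, and `write_file_sync()` is a hypothetical helper. Whether CFS treats `O_SYNC` the same way as the mount option is an assumption to verify.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write len bytes from buf to path. The file is opened with O_SYNC, so each
 * write() returns only after the data has been pushed to the device,
 * roughly what "mount -o sync" does for every file on the mount.
 * Returns 0 on success, -1 on any error. */
int write_file_sync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    while (len > 0) {               /* loop to handle short writes */
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return close(fd);
}
```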
> > > > >
> > > > > There can certainly be bugs and I do note that your hanging is
> > > > > rather large. Maybe that is the cause of the problem. Maybe you
> > > > > could make a simple test case with 256k writes and see if that
> > > > > hangs.
> > > > >
> > > > > John
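The sketch below implements the test case John suggests. It writes a file in 256 KiB chunks with ordinary buffered `write()` calls and calls `fsync()` once at the end. It assumes a POSIX system, and `write_chunks()` and any file path passed to it are hypothetical. `count` should be large enough to approach the size of the copies that hung.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (256 * 1024)   /* 256 KiB per write() */

/* Write `count` chunks of CHUNK bytes to `path` with buffered write()
 * calls, then fsync() once at the end. Returns the number of bytes
 * written, or -1 on error. */
long write_chunks(const char *path, int count)
{
    char *buf = malloc(CHUNK);
    long total = 0;
    int fd;

    if (buf == NULL)
        return -1;
    memset(buf, 'x', CHUNK);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    for (int i = 0; i < count; i++) {
        size_t off = 0;
        while (off < CHUNK) {          /* handle short writes */
            ssize_t n = write(fd, buf + off, CHUNK - off);
            if (n < 0)
                goto fail;
            off += (size_t)n;
        }
        total += CHUNK;
    }
    if (fsync(fd) < 0)             /* flush dirty pages for this file */
        goto fail;

    close(fd);
    free(buf);
    return total;

fail:
    close(fd);
    free(buf);
    return -1;
}
```

For example, a count of 4096 writes 1 GiB. If the hang is related to large writes, as John suspects, a run on the CFS mount might reproduce the stall near the end of the file.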
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Ssic-linux-users mailing list
> > > > > Ssic-linux-users@lists.sourceforge.net
> > > > > https://lists.sourceforge.net/lists/listinfo/ssic-linux-users
> > > > >
> > > > --
> > > > Andy Phillips, FRAS
> > > > Systems Architect, Performance and Test Manager.
> > > > Infrastructure.
> > > >
> > > > Direct Line: 0208 834 8436
> > > >
> > > > Waterfront | Hammersmith Embankment | Chancellors Road London | W6 9HP
> > > >
> > >
> > >
> >
> >
>
_______________________________________________
ssic-linux-devel mailing list
ssic-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel