List:       ssic-linux-devel
Subject:    [SSI-devel] SSI-1.9 node hang while balancing write cache
From:       Roger Tsang <roger.tsang () gmail ! com>
Date:       2005-10-15 12:16:05
Message-ID: 498263350510150516o4db3f761xeb6ee077a4d83813 () mail ! gmail ! com

>
> * CFS hang while remote copying very large files. (believed to be fixed)
>

This one just hit again. I think this has to do with CFS writeback
somewhere. A lot of processes are stuck waiting for IO, all with the same
backtrace. The condition clears after running `sync` on the affected node.
This is similar to what Andy Philips originally reported about getting stuck
while doing large writes.

These processes were stuck in the infinite for loop in balance_dirty_pages().
That means the exit test, if (nr_reclaimable + wbs.nr_writeback <=
dirty_thresh), is never true, so the loop never breaks. This could happen
when nr_reclaimable is 0 and wbs.nr_writeback > dirty_thresh. Doing a `sync`
fixes the problem because it flushes everything that is under writeback. Any
more ideas?
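
To make the suspected condition concrete, here is a rough user-space model of
the throttle loop. It is only a sketch based on my reading of the 2.6-era
loop, not the actual SSI/CFS code: struct wb_counters, may_exit(),
simulate_throttle() and the completed_per_wait parameter are hypothetical
names made up for illustration, and the congestion wait is reduced to "some
number of writeback pages complete".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the counters the loop compares. */
struct wb_counters {
    long nr_reclaimable; /* dirty pages this writer could still submit */
    long nr_writeback;   /* pages submitted, waiting for IO to complete */
    long dirty_thresh;   /* limit the writer is throttled against */
};

static long min_long(long a, long b) { return a < b ? a : b; }

/* The exit test quoted above. */
static bool may_exit(const struct wb_counters *c)
{
    return c->nr_reclaimable + c->nr_writeback <= c->dirty_thresh;
}

/*
 * Model of the throttle loop. Returns the number of congestion waits
 * performed before the writer gets out, or -1 if it is still stuck after
 * max_waits waits. completed_per_wait is an assumption standing in for
 * however many writeback pages finish during one blk_congestion_wait().
 */
static int simulate_throttle(struct wb_counters c, long write_chunk,
                             long completed_per_wait, int max_waits)
{
    long pages_written = 0;
    int waits = 0;

    assert(write_chunk > 0 && completed_per_wait >= 0 && max_waits > 0);

    while (waits < max_waits) {
        if (may_exit(&c))
            return waits;

        if (c.nr_reclaimable > 0) {
            /* Submitting pages moves them from reclaimable to writeback,
             * so the sum in the exit test does not change here. */
            long n = min_long(c.nr_reclaimable, write_chunk);
            c.nr_reclaimable -= n;
            c.nr_writeback += n;
            pages_written += n;
            if (pages_written >= write_chunk)
                return waits; /* the writer did its share and leaves */
        }

        /* Stand-in for the congestion wait: some writeback completes. */
        c.nr_writeback -= min_long(c.nr_writeback, completed_per_wait);
        waits++;
    }
    return may_exit(&c) ? waits : -1;
}
```

In this model, starting from nr_reclaimable = 0 and nr_writeback = 200
against dirty_thresh = 100, a writer never gets out if completed_per_wait is
0, but gets out after two waits if 50 pages complete per wait. So under these
assumptions the loop depends entirely on writeback completing once there is
nothing left to submit.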

-Roger


0xd8c6c550 160716 160712 0 0 D 0xd8c6c710 httpd
EBP EIP Function (args)
0xd802bbb8 0xc03b6b23 schedule+0x2b3 (0xd802bbcc, 0x268b84b3, 0xc011fb15,
0xc0617c40, 0xcc9bfbcc)
0xd802bbf4 0xc03b7126 schedule_timeout+0x76
0xd802bbfc 0xc03b7071 io_schedule_timeout+0x11 (0x64, 0x0, 0xd8c6c550,
0xc01305f0, 0xd802bc34)
0xd802bc54 0xc02ec824 blk_congestion_wait+0x74 (0x1, 0x64, 0xd802bc74,
0xc04afb24, 0x1)
0xd802bcc4 0xc013f593 balance_dirty_pages+0x93 (0xd8508660)
0xd802bcd0 0xc013f685 balance_dirty_pages_ratelimited+0x45 (0xd8508660,
0xc1499030, 0xcd3, 0x81, 0xd54)
0xd802bd80 0xc013bc1e generic_file_buffered_write+0x2ee (0xd802bed8,
0xd802be50, 0x1, 0xddbd54, 0x0)
0xd802bdf8 0xc013c28d __generic_file_aio_write_nolock+0x27d (0xd802bed8,
0xd802be50, 0x1, 0xd802bf14, 0xd8508548)
0xd802be2c 0xc013c4d8 generic_file_aio_write_nolock+0x48 (0xd802bed8,
0xd802be50, 0x1, 0xd802bf14, 0x0)
0xd802be64 0xc013c715 generic_file_aio_write+0x75 (0xd802bed8, 0xbfffe7e0,
0x81, 0xddbcd3, 0x0)
0xd802bea0 0xc026c28a __cfs_file_write+0xda (0xd802bed8, 0x0, 0xbfffe7e0,
0x81, 0xd802bed0)
0xd802bebc 0xc026c33e cfs_file_aio_write+0x2e (0xd802bed8, 0xbfffe7e0, 0x81,
0xddbcd3, 0x0)
0xd802bf64 0xc015854b do_sync_write+0xab (0xdcf4e920, 0xbfffe7e0, 0x81,
0xd802bfa8, 0x558004)
0xd802bf90 0xc0158688 vfs_write+0xe8 (0xdcf4e920, 0xbfffe7e0, 0x81,
0xd802bfa8, 0xddbcd3)
0xd802bfbc 0xc015879b sys_write+0x4b
0xc0103c65 sysenter_past_esp+0x52

0xc8237550 133910 133897 0 0 D 0xc8237710 ndbd
EBP EIP Function (args)
0xc81d3bb8 0xc03b6b23 schedule+0x2b3 (0xc81d3bcc, 0x268aeeb5, 0x1,
0xc8de7bcc, 0xcaac3bcc)
0xc81d3bf4 0xc03b7126 schedule_timeout+0x76
0xc81d3bfc 0xc03b7071 io_schedule_timeout+0x11 (0x64, 0x0, 0xc8237550,
0xc01305f0, 0xc81d3c34)
0xc81d3c54 0xc02ec824 blk_congestion_wait+0x74 (0x1, 0x64, 0xc81d3c74,
0xdf8f7684, 0x2e)
0xc81d3cc4 0xc013f593 balance_dirty_pages+0x93 (0xdebe61e0)
0xc81d3cd0 0xc013f685 balance_dirty_pages_ratelimited+0x45 (0xdebe61e0,
0xc13540d0, 0x0, 0x1000, 0x1000)
0xc81d3d80 0xc013bc1e generic_file_buffered_write+0x2ee (0xc81d3ed8,
0xc81d3e50, 0x1, 0x6c6000, 0x0)
0xc81d3df8 0xc013c28d __generic_file_aio_write_nolock+0x27d (0xc81d3ed8,
0xc81d3e50, 0x1, 0xc81d3f14, 0xdebe60c8)
0xc81d3e2c 0xc013c4d8 generic_file_aio_write_nolock+0x48 (0xc81d3ed8,
0xc81d3e50, 0x1, 0xc81d3f14, 0x0)
0xc81d3e64 0xc013c715 generic_file_aio_write+0x75 (0xc81d3ed8, 0xa768a008,
0x8000, 0x6c0000, 0x0)
0xc81d3ea0 0xc026c28a __cfs_file_write+0xda (0xc81d3ed8, 0x0, 0xa768a008,
0x8000, 0xc81d3ed0)
0xc81d3ebc 0xc026c33e cfs_file_aio_write+0x2e (0xc81d3ed8, 0xa768a008,
0x8000, 0x6c0000, 0x0)
0xc81d3f64 0xc015854b do_sync_write+0xab (0x0, 0xc8237550, 0x0, 0x0,
0x6c0000)
0xc0363700 ip_rcv_finish (0xd446cce0, 0xa768a008, 0x8000, 0xc81d3fa8, 0x8)
0xc0158688 vfs_write+0xe8 (0xd446cce0, 0xa768a008, 0x8000, 0xc81d3fa8,
0x6c0000)
0xc81d3fbc 0xc015879b sys_write+0x4b
0xc0103c65 sysenter_past_esp+0x52

0xdfeaeaa0 133079 1 0 0 D 0xdfeaec60 ntpd
EBP EIP Function (args)
0xdcdd9bb8 0xc03b6b23 schedule+0x2b3 (0xdcdd9bcc, 0x26868821, 0xdcdd9c28,
0xdc4bbbcc, 0xdd541b5c)
0xdcdd9bf4 0xc03b7126 schedule_timeout+0x76
0xdcdd9bfc 0xc03b7071 io_schedule_timeout+0x11 (0x64, 0x0, 0xdfeaeaa0,
0xc01305f0, 0xdcdd9c34)
0xdcdd9c54 0xc02ec824 blk_congestion_wait+0x74 (0x1, 0x64, 0xdcdd9c74,
0xc04afb24, 0x5)
0xdcdd9cc4 0xc013f593 balance_dirty_pages+0x93 (0xc6dd2660)
0xdcdd9cd0 0xc013f685 balance_dirty_pages_ratelimited+0x45 (0xc6dd2660,
0xc10c6430, 0x0, 0x1, 0x1)
0xdcdd9d80 0xc013bc1e generic_file_buffered_write+0x2ee (0xdcdd9ed8,
0xdcdd9e50, 0x1, 0x1, 0x0)
0xdcdd9df8 0xc013c28d __generic_file_aio_write_nolock+0x27d (0xdcdd9ed8,
0xdcdd9e50, 0x1, 0xdcdd9f14, 0xc6dd2548)
0xdcdd9e2c 0xc013c4d8 generic_file_aio_write_nolock+0x48 (0xdcdd9ed8,
0xdcdd9e50, 0x1, 0xdcdd9f14, 0x0)
0xdcdd9e64 0xc013c715 generic_file_aio_write+0x75 (0xdcdd9ed8, 0x800fef0f,
0x1, 0x0, 0x0)
0xdcdd9ea0 0xc026c28a __cfs_file_write+0xda (0xdcdd9ed8, 0x0, 0x800fef0f,
0x1, 0xdcdd9ed0)
0xdcdd9ebc 0xc026c33e cfs_file_aio_write+0x2e (0xdcdd9ed8, 0x800fef0f, 0x1,
0x0, 0x0)
0xdcdd9f64 0xc015854b do_sync_write+0xab (0xdf517920, 0x800fef0f, 0x1,
0xdcdd9fa8, 0x26)
0xdcdd9f90 0xc0158688 vfs_write+0xe8 (0xdf517920, 0x800fef0f, 0x1,
0xdcdd9fa8, 0x0)
0xdcdd9fbc 0xc015879b sys_write+0x4b
0xc0103c65 sysenter_past_esp+0x52
