List:       ssic-linux-devel
Subject:    [SSI-devel] [ ssic-linux-Bugs-1811510 ] deadlock on loop mounted fs
From:       "SourceForge.net" <noreply@sourceforge.net>
Date:       2008-06-19 6:32:10
Message-ID: E1K9Dgo-0001lf-0h@b55xhf1.ch3.sourceforge.com

Bugs item #1811510, was opened at 2007-10-11 08:22
Message generated for change (Comment added) made by rogertsang
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1811510&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Filesystem
Group: v1.9.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Roger Tsang (rogertsang)
Summary: deadlock on loop mounted fs

Initial Comment:
1. Make a sparse file (an equivalent dd one-liner is sketched after these steps)

   perl -e 'open BIGFILE, ">BIGFILE"; seek BIGFILE, 1024 * 1024 * 1024, 0; print BIGFILE "big"'

2. make a filesystem on it

   losetup /dev/loop/0 BIGFILE
   mkfs -t ext3 /dev/loop/0

3. mount it

   mount -t ext3 /dev/loop/0 /mnt

4. write a lot of files to it

   cd /mnt
   dump 0f - / | restore rf -

Eventually the node where we are writing to the loopback-mounted fs gets
deadlocked.  It's still up as far as the cluster is concerned, but any
attempt to start a process on it blocks.
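
Two side notes that may help when reproducing this (a sketch, not from the
original report; it assumes GNU dd/ls/ps/awk on the node and that sysrq is
enabled):

   # equivalent way to create the sparse 1 GiB file from step 1
   dd if=/dev/zero of=BIGFILE bs=1024 seek=$((1024*1024)) count=1
   ls -ls BIGFILE   # first column (allocated blocks) stays tiny if the file is sparse

   # once the node wedges, list uninterruptible (D-state) tasks from a live console
   ps axo pid,stat,wchan:25,comm | awk '$2 ~ /^D/'

   # without kdb, SysRq-T dumps backtraces for all tasks to the kernel log
   echo t > /proc/sysrq-trigger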



----------------------------------------------------------------------

> Comment By: Roger Tsang (rogertsang)
Date: 2008-06-19 02:32

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Try the attached patch.

More work would need to be done to pass a flag to kernel space so that CFS
can use a different congestion bit in the CFS-on-loopback case.  However,
the proposed solution only works if you are not going to CFS-mount another
loopback on top of a CFS mount that is itself on a loopback on CFS.  So the
simple fix is this patch: loopback becomes a standard mount.
File Added: util-linux.1811510.patch
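
(Not from the original comment: applying a tracker attachment like this is
the usual patch(1) procedure; the -p strip level below is an assumption,
since the patch itself isn't shown here.)

   # from the top of the util-linux source tree
   patch -p1 --dry-run < util-linux.1811510.patch   # verify it applies cleanly
   patch -p1 < util-linux.1811510.patch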

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-03-16 20:35

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Should be fixed in 2.0.0pre3...

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-10-20 21:58

Message:
Logged In: YES 
user_id=1246761
Originator: NO

It looks like CFS ran out of memory.  Try the latest checkin of
kernel/cluster/ssi/cfs code that re-enables commit for soft mounts.
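
(Not from the original comment: pulling that checkin anonymously would
follow the usual SourceForge CVS pattern of the time; the module name below
is an assumption.)

   cvs -d:pserver:anonymous@ssic-linux.cvs.sourceforge.net:/cvsroot/ssic-linux login
   cvs -z3 -d:pserver:anonymous@ssic-linux.cvs.sourceforge.net:/cvsroot/ssic-linux co -P ssic-linux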

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-10-20 14:42

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Does 2.6.10-ssi run into this bug?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2007-10-16 10:33

Message:
Logged In: NO 

Still looks the same as the old bug... This time the stacking goes through
generic_file_writev():

cfs_async (has i_sem)
  loop0
    pdflush
      kjournald
        cfs_async (waiting for i_sem)

The write taken under i_sem kicks off writeback through the loop device and
the journal, which ends up back in cfs_async blocked on the same i_sem, so
the chain never completes.

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2007-10-12 07:36

Message:
Logged In: YES 
user_id=166336
Originator: YES

Here's some debugging.  I've got to the point where the "restore" process
on node 1 seems hung.  On node 2 I try an "onnode 1 pwd".  It hangs.

On node 1:

Entering kdb (current=0xc0502bc0, pid 0) on processor 0 due to Keyboard
Entry
[0]kdb> ps
1 idle process (state I) and 50 sleeping system daemon (state M) processes
suppressed
Task Addr       Pid   Parent [*] cpu State Thread     Command

0xcf82a5d0        5        2  0    0   R  0xcf82a7b0  events/0
0xcf68b990      117       11  0    0   D  0xcf68bb70  pdflush
0xcf68a310      121        2  0    0   D  0xcf68a4f0  cfs_async
0xcf6b99b0      122        2  0    0   D  0xcf6b9b90  cfs_async
0xcf6b9410      123        2  0    0   D  0xcf6b95f0  cfs_async
0xcf6b8e70      124        2  0    0   D  0xcf6b9050  cfs_async
0xcf6b88d0      125        2  0    0   D  0xcf6b8ab0  cfs_async
0xcf6b8330      126        2  0    0   D  0xcf6b8510  cfs_async
0xcf6c99d0      127        2  0    0   D  0xcf6c9bb0  cfs_async
0xcf6c9430      128        2  0    0   D  0xcf6c9610  cfs_async
0xce92b730        1        0  0    0   D  0xce92b910  init
[...]
0xce90b150    67763        2  0    0   D  0xce90b330  loop0
0xce90d170    67820        2  0    0   D  0xce90d350  kjournald
0xce8f96d0    67822    67636  0    0   S  0xce8f98b0  dump
0xce8f8b90    67823    67636  0    0   D  0xce8f8d70  restore
0xcf13f970    67824    67822  0    0   S  0xcf13fb50  dump
0xcf13f3d0    67825    67824  0    0   S  0xcf13f5b0  dump
0xcf7861f0    67826    67824  0    0   S  0xcf7863d0  dump
0xcf786790    67827    67824  0    0   S  0xcf786970  dump
0xcf47d9b0   132773        2  0    0   D  0xcf47db90  onnode
[0]kdb> btp 132773
Stack traceback for pid 132773
0xcf47d9b0   132773        2  0    0   D  0xcf47db90  onnode
EBP        EIP        Function (args)
0xce879ba8 0xc046c2e6 schedule+0x3a6 (0xce879c10)
0xce879bb4 0xc046d348 io_schedule+0x28 (0xc1271c70)
0xce879bc0 0xc014aed5 sync_page+0x45 (0xc10c37f8, 0x0, 0xc014ae90,
0xcf47d9b0, 0xce879c10)
0xce879be0 0xc046d6fe __wait_on_bit_lock+0x5e (0x2, 0xc10c37f8,
0xc10c37f8, 0x0, 0x0)
0xce879c3c 0xc014b744 __lock_page+0x84 (0xc049efb5, 0xa7, 0xce7c31a0, 0x0,
0x1)
0xce879cc4 0xc014beeb do_generic_mapping_read+0x3db (0xce88ca00,
0xce7c31f0, 0xce7c31a0, 0xce879e00, 0xce879d00)
0xce879d1c 0xc014c3ed __generic_file_aio_read+0x1ed (0xce879dc4,
0xce879d34, 0x1, 0xce879e00, 0xcf06d600)
0xce879d48 0xc014c473 generic_file_aio_read+0x53 (0xce879dc4, 0xcf06d600,
0x80, 0x0, 0x0)
0xce879d84 0xc028375a __cfs_file_read+0xaa (0xce879dc4, 0x0, 0xcf06d600,
0x80, 0xce879da0)
0xce879da8 0xc0283828 cfs_file_aio_read+0x38 (0xce879dc4, 0xcf06d600,
0x80, 0x0, 0x0)
0xce879e50 0xc016c3b3 do_sync_read+0xa3 (0xce7c31a0, 0xcf06d600, 0x80,
0xce879e8c, 0xce879000)
0xce879e74 0xc016c490 vfs_read+0xb0 (0xce7c31a0, 0xcf06d600, 0x80,
0xce879e8c, 0x0)
0xce879e9c 0xc017895a kernel_read+0x4a (0xce7c31a0, 0x0, 0xcf06d600, 0x80,
0xcf06d600)
0xce879ec0 0xc017946a prepare_binprm+0xca (0xcf06d600, 0x7fff, 0xc13b4080,
0x0, 0x0)
0xce879eec 0xc0179a16 ssi_do_execve+0x1a6 (0xcf012920, 0xce6f8800,
0xcf6aa400, 0xce879fa0, 0x0)
0xce879f78 0xc0245c3a rexecve_server+0xea (0xcf50e000, 0xcf47d9b0,
0xcf012920, 0xce6f8800, 0xcf6aa400)
0xce879fec 0xc02454f5 rexecve_server_setup+0x55
           0xc01023a5 kernel_thread_helper+0x5
[0]kdb> btp 67823
Stack traceback for pid 67823
0xce8f8b90    67823    67636  0    0   D  0xce8f8d70  restore
EBP        EIP        Function (args)
0xca2fcea0 0xc046c2e6 schedule+0x3a6 (0x0, 0xce8f8b90, 0xc013f0a0,
0xca2fced4, 0xca2fced4)
0xca2fcef4 0xc029f3ba cfs_wait_on_request+0x7a (0xc9a8c200, 0xca2fcf14,
0x0, 0x1, 0x0)
0xca2fcf24 0xc0285a9e cfs_wait_on_requests+0x8e (0xccb63be4, 0x0, 0x0,
0x0, 0xce7c3600)
0xca2fcf48 0xc0286f66 cfs_sync_inode+0x76 (0xccb63be4, 0x0, 0x0, 0x2,
0x0)
0xca2fcf80 0xc0283653 cfs_file_flush+0x93 (0xce7c3600, 0x81a4, 0xccdef200,
0x5, 0xccdef204)
0xca2fcf9c 0xc016bb3c filp_close+0x6c (0xce7c3600, 0xccdef200, 0xce7c3600,
0x5, 0x0)
0xca2fcfbc 0xc016bbce sys_close+0x6e
           0xc0105a3b syscall_call+0x7
[0]kdb>
[0]kdb> btp 67763
Stack traceback for pid 67763
0xce90b150    67763        2  0    0   D  0xce90b330  loop0
EBP        EIP        Function (args)
0xca488db8 0xc046c2e6 schedule+0x3a6 (0xca488e20)
0xca488dc4 0xc046d348 io_schedule+0x28 (0xc12711e0)
0xca488dd0 0xc014aed5 sync_page+0x45 (0xc11d6be0, 0x0, 0xc014ae90,
0xce90b150, 0xca488e20)
0xca488df0 0xc046d6fe __wait_on_bit_lock+0x5e (0x2, 0xc11d6be0,
0xc11d6be0, 0x0, 0x0)
0xca488e4c 0xc014b744 __lock_page+0x84 (0xc049efb5, 0xa7, 0xcd6ca600,
0x38002, 0x1)
0xca488ed4 0xc014beeb do_generic_mapping_read+0x3db (0xcb632f40,
0xcd6ca650, 0xcd6ca600, 0xca488f58, 0xca488ef4)
0xca488f04 0xc014c61b generic_file_sendfile+0x5b (0xcd6ca600, 0xca488f58,
0x1000, 0xd08f15d0, 0xca488f60)
0xca488f3c 0xc02838bd cfs_file_sendfile+0x8d (0xcd6ca600, 0xca488f58,
0x1000, 0xd08f15d0, 0xca488f60)
0xca488f74 0xd08f16fc [loop]do_lo_receive+0x5c (0xc9353000, 0xc4279630,
0x1000, 0x38002000, 0x0)
0xca488fa4 0xd08f176e [loop]lo_receive+0x5e (0xc9353000, 0xc1ed33e0,
0x1000, 0x38002000, 0x0)
0xca488fc8 0xd08f17eb [loop]do_bio_filebacked+0x4b (0xc9353000,
0xc1ed33e0, 0x0, 0xc9353138, 0xd08f1a60)
0xca488fec 0xd08f1b3b [loop]loop_thread+0xdb
           0xc01023a5 kernel_thread_helper+0x5




----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-10-11 22:21

Message:
Logged In: YES 
user_id=1246761
Originator: NO

Sounds like [ 686748 ] Filesystem stacking deadlock.

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2007-10-11 08:22

Message:
Logged In: YES 
user_id=166336
Originator: YES

This is with the 2.6.11 kernel.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1811510&group_id=32541

_______________________________________________
ssic-linux-devel mailing list
ssic-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel

