
List:       ssic-linux-devel
Subject:    [SSI-devel] [ ssic-linux-Bugs-1842982 ] vproc_hold_movement oops
From:       "SourceForge.net" <noreply () sourceforge ! net>
Date:       2008-01-02 1:16:44
Message-ID: E1J9sDs-0003Qr-Bd () sc8-sf-web24 ! sourceforge ! net

Bugs item #1842982, was opened at 2007-12-02 18:32
Message generated for change (Settings changed) made by rogertsang
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1842982&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Process Management
Group: v1.9.1
> Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Roger Tsang (rogertsang)
Assigned to: Nobody/Anonymous (nobody)
Summary: vproc_hold_movement oops

Initial Comment:
<4>procfs: impossible type (25)
<7>Assertion failed! vp != ((void *)0), cluster/ssi/vproc/dvp_vpops.c, \
vpop_report_state, line=1369
<1>Unable to handle kernel NULL pointer dereference at virtual address 0000000c
<1> printing eip:
<4>c029226c
<1>*pde = 00000000
<1>Oops: 0000 [#1]
<4>SMP
<4>Modules linked in: loop nfsd tun ipt_REJECT ipt_state ipt_multiport iptable_filter \
ipt_MASQUERADE iptable_nat ip_conntrack ip_tables softdog nls_iso8859_1 nls_cp437 \
vfat fat usb_storage binfmt_misc uhci_hcd ehci_hcd usbcore floppy drbd via_rhine \
sk98lin r8169 forcedeth dm_mod
<4>CPU:    1
<4>EIP:    0060:[<c029226c>]    Not tainted VLI
<4>EFLAGS: 00010296   (2.6.11-ssi5.31)
<4>EIP is at vproc_hold_movement+0xc/0x1f0
<4>eax: 00000000   ebx: 00000000   ecx: c04c3a10   edx: 00000000
<4>esi: 00000001   edi: c4c6d400   ebp: f7d43b6c   esp: f7d43b0c
<4>ds: 007b   es: 007b   ss: 0068
<4>Process child_reaper (pid: 2, threadinfo=f7d42000 task=f7d41630)
<4>Stack: 00000000 f7d43b30 c0136f7c f536df2c 00000001 00000000 00000000 00000000
<4>       c04c3a10 f7d43b58 c011b6c1 f536df2c 00000001 00000000 00000000 c04c3a10
<4>       c04c3a0c 00000001 00000286 f7d43b84 c011b738 00000000 00000001 c4c6d400
<4>Call Trace:
<4> [<c0104eff>] show_stack+0x7f/0xa0
<4> [<c01050a6>] show_registers+0x166/0x230
<4> [<c0105446>] die+0xf6/0x1c0
<4> [<c011850d>] do_page_fault+0x45d/0x652
<4> [<c0104b5f>] error_code+0x2b/0x30
<4> [<c0296f82>] pvpop_report_state+0x32/0x690
<4> [<c029e191>] vpop_report_state+0x1b1/0x3c0
<4> [<c01214fa>] release_task+0x17a/0x1c0
<4> [<c01227b6>] wait_task_zombie+0xe6/0x240
<4> [<c0122ddb>] pproc_reap+0x29b/0x380
<4> [<c0293704>] pvpop_reap+0x204/0x500
<4> [<c0292e7d>] dpvproc_nocldwait_async_handler+0x13d/0x2f0
<4> [<c02771a5>] async_cleanup_task_structs+0x55/0x90
<4> [<c02b5005>] initproc_postroot_init+0x145/0x230
<4> [<c027d872>] ssisys_cluster_initproc+0x12/0x20
<4> [<c027bd7b>] do_ssisys+0x9b/0x1f0
<4> [<c027bf1e>] sys_ssisys+0x4e/0x70
<4> [<c0103fc5>] sysenter_past_esp+0x52/0x75
<4>Code: 89 42 08 c9 c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 c9 c3 8d 74 26 00 8d \
bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 54 8b 45 08 <8b> 58 0c 8d 93 c4 00 00 00 89 \
d0 89 55 b0 e8 f1 a6 1b 00 8b 4d <4>
[1]kdb> bt
Stack traceback for pid 2
0xf7d41630        2        0  1    1   R  0xf7d41800 *child_reaper
EBP        EIP        Function (args)
0xf7d43b6c 0xc029226c vproc_hold_movement+0xc (0x0, 0x0, 0xc047ad88, 0x292, 0xf7d43ba4)
0xf7d43c00 0xc0296f82 pvpop_report_state+0x32 (0x0, 0xc4c6d400, 0xf7d43c54, 0x0, 0x1)
0xf7d43c48 0xc029e191 vpop_report_state+0x1b1 (0xc4c6d400, 0x11, 0x0, 0x1, 0x0)
0xf7d43c84 0xc01214fa release_task+0x17a (0xe51239f0, 0x0, 0xf7d43cb8, 0x0, 0x0)
0xf7d43cc8 0xc01227b6 wait_task_zombie+0xe6 (0xe51239f0, 0x0, 0x0, 0xf7d43e4c, 0xf7d43e50)
0xf7d43d18 0xc0122ddb pproc_reap+0x29b (0xe51239f0, 0x0, 0xf7d43e4c, 0xf7d43e50, 0x313d3)
0xf7d43e28 0xc0293704 pvpop_reap+0x204 (0xcfc34000, 0xffffffff, 0x20, 0x313d3, 0xf7d43e4c)
0xf7d43efc 0xc0292e7d dpvproc_nocldwait_async_handler+0x13d (0xc6a4c218, 0xf7d42000, 0xf7d42000, 0xf7d41630, 0x8)
0xf7d43f18 0xc02771a5 async_cleanup_task_structs+0x55 (0xf7d41630, 0x0, 0x40000001, 0x0, 0xc02b4eb0)
0xf7d43f58 0xc02b5005 initproc_postroot_init+0x145
0xf7d43f60 0xc027d872 ssisys_cluster_initproc+0x12

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-26 23:08

Message:
Logged In: YES 
user_id=1246761
Originator: YES

The original oops was produced on 2.0.0pre1, but the affected code dates
back to 1.9.1.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-10 21:58

Message:
Logged In: YES 
user_id=1246761
Originator: YES

I cannot reproduce this bug, and I'm not using an IB interconnect.  Post
your oops if you can reproduce it with the new VPROC_HASH_LIST code.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-10 21:49

Message:
Logged In: YES 
user_id=1246761
Originator: YES

The latest check-in, marked #ifdef VPROC_HASH_LIST, includes an SMP bug fix
for possible vproc hash corruption due to duplicate vproc release.
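
To illustrate why a duplicate release can corrupt a hash chain, here is a
minimal user-space sketch. It is not the VPROC_HASH_LIST code; every name in
it (entry_sketch, chain_head, hash_insert_sketch, hash_release_sketch) is
hypothetical. It assumes a single doubly linked chain and uses an atomic
"hashed" flag so that only the first release unlinks the entry. A real kernel
implementation would additionally hold a lock around the chain update.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hash-chain entry; not the real vproc layout. */
struct entry_sketch {
    struct entry_sketch *next;
    struct entry_sketch **pprev;  /* address of the pointer that points at us */
    atomic_bool hashed;           /* true while the entry is on the chain */
    int pid;
};

static struct entry_sketch *chain_head;

/* Push an entry onto the front of the chain and mark it as hashed. */
static void hash_insert_sketch(struct entry_sketch *e)
{
    e->next = chain_head;
    e->pprev = &chain_head;
    if (chain_head)
        chain_head->pprev = &e->next;
    chain_head = e;
    atomic_store(&e->hashed, true);
}

/*
 * Unlink an entry at most once. atomic_exchange() returns the previous
 * value, so a second (duplicate) release sees false and leaves the chain
 * alone instead of writing through stale next/pprev pointers.
 * Returns true only for the call that actually removed the entry.
 */
static bool hash_release_sketch(struct entry_sketch *e)
{
    if (!atomic_exchange(&e->hashed, false))
        return false;             /* duplicate release: nothing to do */
    *e->pprev = e->next;
    if (e->next)
        e->next->pprev = e->pprev;
    e->next = NULL;
    e->pprev = NULL;
    return true;
}
```

Without the flag check, the second call would dereference pprev after the
entry had already been unlinked, which is the kind of corruption that could
later make a lookup miss an entry that should be present.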

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-06 02:14

Message:
Logged In: YES 
user_id=1246761
Originator: YES

Not using the new ATOMIC_VPROC_REFCNT code.

Also reported to occur during InfiniBand IPC bring-up (with 1.9.3), which
means the bug can be reproduced?

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-02 18:41

Message:
Logged In: YES 
user_id=1246761
Originator: YES

Is the assert at vproc_origin_inform_nodedown_done() related to the oops on
node 3?

ng over master from node 3.
<4>Node 3 has gone down!!!
<7>Assertion failed! surrogate_origin_node == this_node,
cluster/ssi/vproc/nd_origin.c, vproc_origin_inform_nodedown_done, line=289
<4>passed the first scan in ipcname_pull_data
<4>num_objects[MSG] = 0
<4>num_objects[SEM] = 2
<4>num_objects[SHM] = 9
<4>ipcnameserver ready completed

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-02 18:34

Message:
Logged In: YES 
user_id=1246761
Originator: YES

Oops on dual-core AMD Opteron in vproc_hold_movement() due to a NULL vp
returned by tnc_locate_vproc_pid().  The pid is at its origin node and
should be in the vproc hash, but tnc_locate_vproc_pid() thinks it is not.
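
The failure mode described above can be sketched in user-space C. This is
only an illustration under stated assumptions, not the SSI source: the
structure and the functions locate_vproc_sketch(), hold_movement_sketch()
and report_state_sketch() are hypothetical stand-ins for
tnc_locate_vproc_pid(), vproc_hold_movement() and pvpop_report_state(). The
reported fault address 0000000c with eax = 0 is consistent with reading a
field at offset 0xc through a NULL vp; the sketch shows a caller-side guard
that turns that dereference into an error return.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a vproc; the real layout is not shown here. */
struct vproc_sketch {
    int pid;
    int refcnt;
    int move_holds;
};

#define TABLE_SIZE 4
static struct vproc_sketch *table[TABLE_SIZE];

/* Plays the role of the lookup: returns NULL when the pid is not hashed. */
static struct vproc_sketch *locate_vproc_sketch(int pid)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (table[i] && table[i]->pid == pid)
            return table[i];
    return NULL;
}

/* Guarded hold: refuses a NULL vp instead of dereferencing it. */
static int hold_movement_sketch(struct vproc_sketch *vp)
{
    if (vp == NULL)
        return -ESRCH;            /* lookup missed; do not touch vp->... */
    vp->move_holds++;
    return 0;
}

/* Caller that passes the lookup result straight to the hold routine. */
static int report_state_sketch(int pid)
{
    return hold_movement_sketch(locate_vproc_sketch(pid));
}
```

A guard like this only hides the symptom. If the pid really should be in the
hash, the underlying question is why the lookup misses it.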

----------------------------------------------------------------------

_______________________________________________
ssic-linux-devel mailing list
ssic-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel

