
List:       ocfs2-users
Subject:    Re: [Ocfs2-users] loss of connection
From:       Sunil Mushran <sunil.mushran () oracle ! com>
Date:       2010-12-15 20:00:19
Message-ID: 4D091E53.9050001 () oracle ! com

So the o2net disconnect can be explained by the CPU soft lockup.
But the soft lockup itself is a bit odd. The stack shows the CPU in
spin_unlock. Typically one would expect it to be stuck in spin_lock, and
the hunt would be for the process holding that spinlock. But then this
is KVM. If pvops is enabled, the lockup could be KVM related. Maybe; I
am guessing here. Check the Ubuntu bug database to see whether they have
another report of a similar issue. That may tell us more.
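
One way to check the ordering of the two events is to pull them out of the log and compare timestamps. The following is only a rough sketch in Python, not an OCFS2 or kernel tool: the function names `scan` and `lockup_precedes_disconnect` are made up for illustration, and it assumes each kernel message sits on a single line (the quoted lines below are wrapped by the mail client). It also assumes the "stuck for Ns" figure can be subtracted from the report time to estimate when the CPU stopped making progress, which is an approximation.

```python
import re

# Patterns for the two kinds of kernel messages quoted in this thread.
LOCKUP_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! "
    r"\[([^:\]]+):(\d+)\]"
)
O2NET_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] o2net: no longer connected to node (\S+)"
)


def scan(lines):
    """Return a list of dicts describing soft lockups and o2net disconnects."""
    events = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            reported = float(m.group(1))
            stuck = int(m.group(3))
            events.append({
                "kind": "soft_lockup",
                "time": reported,
                # Assumption: the lockup began about `stuck` seconds earlier.
                "onset": reported - stuck,
                "cpu": int(m.group(2)),
                "comm": m.group(4),
                "pid": int(m.group(5)),
            })
            continue
        m = O2NET_RE.search(line)
        if m:
            events.append({
                "kind": "o2net_disconnect",
                "time": float(m.group(1)),
                "node": m.group(2),
            })
    return events


def lockup_precedes_disconnect(events):
    """True if some lockup's estimated onset is earlier than some disconnect."""
    onsets = [e["onset"] for e in events if e["kind"] == "soft_lockup"]
    drops = [e["time"] for e in events if e["kind"] == "o2net_disconnect"]
    return any(o < d for o in onsets for d in drops)


if __name__ == "__main__":
    sample = [
        "Dec 14 02:35:16 hp1 kernel: [1492482.232822] o2net: no longer "
        "connected to node hp2 (num 1) at 192.168.1.2:7777",
        "Dec 14 02:35:18 hp1 kernel: [1492483.960150] BUG: soft lockup - "
        "CPU#1 stuck for 61s! [kvm:32398]",
    ]
    found = scan(sample)
    for event in found:
        print(event)
    print("lockup precedes disconnect:", lockup_precedes_disconnect(found))
```

With the two lines from this report, the lockup is logged about 1.7 seconds after the disconnect, but its estimated onset is about a minute earlier, which fits the reading that the stuck CPU caused the disconnect.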

On 12/14/2010 11:17 PM, Andreas Rittershofer wrote:
> On 15.12.2010 at 08:04, Sunil Mushran wrote:
> 
> > On 12/14/2010 10:59 PM, Andreas Rittershofer wrote:
> > > My log suddenly says:
> > > 
> > > Dec 14 02:35:16 hp1 kernel: [1492482.232822] o2net: no longer connected to node \
> > >                 hp2 (num 1) at 192.168.1.2:7777
> > > Dec 14 02:35:18 hp1 kernel: [1492483.960150] BUG: soft lockup - CPU#1 stuck for \
> > > 61s! [kvm:32398] 
> > > I have no idea what is happening here or why, but the result is a lot of
> > > problems with virtual machines.
> > > 
> > > Best regards
> > > 
> > > Andreas Rittershofer
> > > 
> > There should be a stack trace in /var/log/messages in connection with
> > the soft lockup. Also, versions are good to know.
> 
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] Pid: 32398, comm: kvm Not tainted \
>                 2.6.32-26-server #47-Ubuntu ProLiant DL580 G5
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RIP: 0010:[<ffffffff8155a719>]  \
>                 [<ffffffff8155a719>] _spin_unlock_irqrestore+0x19/0x30
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RSP: 0018:ffff8807cb61ba10  EFLAGS: \
>                 00000282
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RAX: 0000000000000282 RBX: \
>                 ffff8807cb61ba18 RCX: ffff880ce47e09f0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RDX: 0000000000ae3c4c RSI: \
>                 0000000000000282 RDI: 0000000000000282
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RBP: ffffffff81012cae R08: \
>                 ffff880ce47e09e0 R09: 11ef23612a7a8443
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] R10: 0000000000000001 R11: \
>                 0000000000000000 R12: 0000000000000286
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] R13: 0000000000000004 R14: \
>                 000000001200c2fc R15: 0000000000000000
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] FS:  00007f317085a710(0000) \
>                 GS:ffff880028220000(0000) knlGS:0000000000000000
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] CS:  0010 DS: 002b ES: 002b CR0: \
>                 000000008005003b
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] CR2: 00007f014de7b298 CR3: \
>                 0000000cf4379000 CR4: 00000000000026e0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] DR0: 0000000000000000 DR1: \
>                 0000000000000000 DR2: 0000000000000000
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] DR3: 0000000000000000 DR6: \
>                 00000000ffff0ff0 DR7: 0000000000000400
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] Call Trace:
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa045c8b0>] ? \
>                 ocfs2_should_refresh_lock_res+0x130/0x200 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa045ca4a>] ? \
>                 ocfs2_inode_lock_update+0xca/0x4d0 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0460df8>] ? \
>                 ocfs2_inode_lock_full_nested+0x2e8/0x660 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0461449>] ? \
>                 ocfs2_inode_lock_with_page+0x39/0x90 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0457f0e>] ? \
>                 __ocfs2_cluster_unlock+0x12e/0x2f0 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0461449>] ? \
>                 ocfs2_inode_lock_with_page+0x39/0x90 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0446cfd>] ? \
>                 ocfs2_readpage+0x5d/0x310 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff810f46b0>] ? \
>                 T.811+0x100/0x400
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff810f4a66>] ? \
>                 generic_file_aio_read+0xb6/0x1d0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0466930>] ? \
>                 ocfs2_file_aio_read+0x100/0x420 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff81096772>] ? \
>                 futex_wait+0x222/0x350
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff81143afa>] ? \
>                 do_sync_read+0xfa/0x140
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff81084250>] ? \
>                 autoremove_wake_function+0x0/0x40
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff8155a6ce>] ? \
>                 _spin_lock+0xe/0x20
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff81095862>] ? \
>                 futex_wake+0x112/0x130
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff81252246>] ? \
>                 security_file_permission+0x16/0x20
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff811443e5>] ? \
>                 vfs_read+0xb5/0x1a0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff811446f2>] ? \
>                 sys_pread64+0x82/0xa0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff810121b2>] ? \
>                 system_call_fastpath+0x16/0x1b
> Dec 14 02:35:18 hp1 kernel: [1492483.960787] Modules linked in: ocfs2 quota_tree \
> ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs \
> xt_multiport ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 \
> xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge \
> stp kvm_intel kvm fbcon tileblit font bitblit softcursor vga16fb vgastate radeon \
> ttm drm_kms_helper bnx2 drm psmouse lp ipmi_si ses parport i2c_algo_bit serio_raw \
> usbhid shpchp ipmi_msghandler hid hpilo enclosure qla2xxx scsi_transport_fc \
> ohci1394 ieee1394 scsi_tgt e1000e cciss 
> 
> I had the same problem yesterday morning and this morning; I just ran
> apt-get update / upgrade, hoping to avoid it tomorrow morning.
> 
> Best regards
> 
> Andreas Rittershofer
> 



