

List:       redhat-linux-cluster
Subject:    Re: [Linux-cluster] Bug inquiry (#831330)
From:       Antonio Castellano <dev () sdd ! jp>
Date:       2012-11-14 7:09:38
Message-ID: 1352876980602019000002ec () sv0 ! inside ! kobe ! sdd ! jp

Hi, Steven. 
Thank you for the reply.

I'm sending the portion of the syslog where the problem appears, in case it
helps. The kernel version is 2.6.18-308.11.1.el5PAE.

Nov 12 15:50:16 blahblah6 kernel: GFS2: fsid=blahblah:data023.2: fatal: invalid metadata block
Nov 12 15:50:16 blahblah6 kernel: GFS2: fsid=blahblah:data023.2:   bh = 151918444 (magic number)
Nov 12 15:50:16 blahblah6 kernel: GFS2: fsid=blahblah:data023.2:   function = get_leaf, file = fs/gfs2/dir.c, line = 763
Nov 12 15:50:16 blahblah6 kernel: GFS2: fsid=blahblah:data023.2: about to withdraw this file system
Nov 12 15:50:16 blahblah6 kernel: GFS2: fsid=blahblah:data023.2: telling LM to withdraw
Nov 12 15:50:17 blahblah6 kernel: GFS2: fsid=blahblah:data023.2: withdrawn
Nov 12 15:50:17 blahblah6 kernel:  [<f95f76f6>] gfs2_lm_withdraw+0x8d/0xb0 [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<f960a98e>] gfs2_meta_check_ii+0x28/0x33 [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<f95ed682>] get_leaf+0x5e/0x9d [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<f95edccb>] get_first_leaf+0x24/0x2a [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<f95edd52>] gfs2_dirent_search+0x81/0x180 [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<f95ee07f>] gfs2_dirent_find+0x0/0x4c [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<f95f4344>] run_queue+0xbd/0x18a [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<f95ef448>] gfs2_dir_search+0x1d/0x7f [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<c04833e2>] permission+0xa2/0xb5
Nov 12 15:50:17 blahblah6 kernel:  [<f95f5aa0>] gfs2_lookupi+0x116/0x14f [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<f95f5a5a>] gfs2_lookupi+0xd0/0x14f [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<f9602136>] gfs2_lookup+0x1b/0x8e [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<f95f3b6c>] gfs2_glock_put+0xcf/0xe7 [gfs2]
Nov 12 15:50:17 blahblah6 kernel:  [<c048c807>] d_alloc+0x151/0x17f
Nov 12 15:50:17 blahblah6 kernel:  [<c04831c8>] do_lookup+0x102/0x1b6
Nov 12 15:50:17 blahblah6 kernel:  [<c0484b45>] __link_path_walk+0x318/0xd1d
Nov 12 15:50:17 blahblah6 kernel:  [<c0485584>] link_path_walk+0x3a/0x99
Nov 12 15:50:17 blahblah6 kernel:  [<c0485961>] do_path_lookup+0x231/0x297
Nov 12 15:50:17 blahblah6 kernel:  [<c04860bb>] __user_walk_fd+0x29/0x3a
Nov 12 15:50:17 blahblah6 kernel:  [<c047f4b9>] vfs_stat_fd+0x15/0x3c
Nov 12 15:50:17 blahblah6 kernel:  [<c047f525>] sys_stat64+0xf/0x23
Nov 12 15:50:17 blahblah6 kernel:  [<c06258e8>] do_page_fault+0x356/0x653
Nov 12 15:50:17 blahblah6 kernel:  [<c047795f>] __fput+0x15c/0x184
Nov 12 15:50:17 blahblah6 kernel:  [<c0625592>] do_page_fault+0x0/0x653
Nov 12 15:50:17 blahblah6 kernel:  [<c0404ee1>] sysenter_past_esp+0x56/0x79

We have 5 servers accessing a shared filesystem that consists of 24 virtual
disks on top of multiple HDDs, using GFS2. Once this problem occurs on a
virtual disk, we can no longer write to it (the remaining virtual disks keep
working without any problem). Running fsck seems to fix the virtual disk
temporarily, but after a while it breaks again. Is there any way to fix this
problem, or at least reduce how often it happens (it occurs almost every day
on our system), without having to install an older kernel version?
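
In case it helps to put numbers on "almost every day", the withdraw events
could be tallied per day, host, and filesystem with a short script. This is
only a minimal sketch: it assumes the syslog line format shown in the excerpt
above, and the names WITHDRAWN and count_withdrawals are made up for
illustration, not part of any GFS2 tooling.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt above:
#   Nov 12 15:50:17 blahblah6 kernel: GFS2: fsid=blahblah:data023.2: withdrawn
# Only the final "withdrawn" line is matched, not "about to withdraw" or
# "telling LM to withdraw", so each withdraw event is counted once.
WITHDRAWN = re.compile(
    r"^(?P<date>\w{3}\s+\d+) \d\d:\d\d:\d\d (?P<host>\S+) kernel: "
    r"GFS2: fsid=(?P<fsid>\S+): withdrawn\s*$"
)


def count_withdrawals(lines):
    """Count GFS2 withdraw events keyed by (date, host, fsid)."""
    counts = Counter()
    for line in lines:
        match = WITHDRAWN.match(line)
        if match:
            key = (match.group("date"), match.group("host"), match.group("fsid"))
            counts[key] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to count_withdrawals
returns a Counter, which shows whether the same virtual disks withdraw
repeatedly or the failures are spread evenly across all 24.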

Best regards,

> Hi,
> 
> On Mon, 2012-11-12 at 15:24 +0900, Antonio Castellano wrote:
> > Hi,
> > 
> > I'd like to know about the status of the bug number 831330 and its schedule. Our
> > system is complaining about it and I don't have enough permissions to access its
> > bugzilla related page. It is urgent.
> > This is the link related to the text reported in our log:
> > https://access.redhat.com/knowledge/ja/node/141203
> > 
> > And this is the bugzilla link:
> > https://bugzilla.redhat.com/show_bug.cgi?id=831330
> > 
> > Is there anybody out there that can help me? The help will be greatly
> > appreciated.
> > Thank you very much!
> > 
> Assuming that you are a Red Hat customer, please open a ticket. The bug
> mostly contains customer's private data, so that I don't think opening
> this one up would help much as there would be little that we could
> share.
> 
> This is though, our highest priority bug at the moment (when I say our,
> I mean the GFS2 team). There is a simple workaround (just use a slightly
> older kernel) which is one reason why we've had trouble in tracing this,
> because people are (understandably) using that rather than running the
> kernel we've built to debug this issue.
> 
> We've been unable to reproduce this internally, despite trying many
> different workloads. If you are in a position to help us debug the
> issue, then any assistance is very gratefully received,
> 
> Steve.
> 
> 
> 

--
Antonio Castellano [DEV@SDD.jp]
   Seventh Dimension Design, Inc.
   http://www.SDD.jp
   VOICE: +81-78-252-8855, FAX: +81-78-252-8856

