List: lustre-discuss
Subject: [Lustre-discuss] Fw: Re: lvbo_init failed after e2fsck
From: wanglu () ihep ! ac ! cn (WANG Lu)
Date: 2011-08-03 13:33:26
Message-ID: 16459057.57511312378406211.JavaMail.javamailuser () localhost
Hi Andreas,
Do you think I need to run a full lfsck?
I got an error when generating the mdsdb on our test file system (with an OST
pool). The errors are the same as those discussed in this thread:
http://comments.gmane.org/gmane.comp.file-systems.lustre.user/10111
Do you have any suggestions?
Lu Wang
Computing Center
IHEP
-----Original Message-----
From: "WANG Lu" <wanglu at ihep.ac.cn>
Sent: Tuesday, August 2, 2011
To: "WANG Lu" <wanglu at ihep.ac.cn>
Cc: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [Lustre-discuss] lvbo_init failed after e2fsck
Some updated information:
1. After running "ll_recover_lost_found_objs", we still get the "lvbo_init failed"
error.
2. There are no files under "lost+found" on some of the affected OSTs. Here is the
output of debugfs:
# debugfs -c /dev/sdb1
debugfs 1.41.10.sun2 (24-Feb-2010)
/dev/sdb1: catastrophic mode - not reading inode or group bitmaps
debugfs: ls
2 (12) . 2 (12) .. 11 (20) lost+found 103784449 (16) CONFIGS
12 (20) last_rcvd 13 (20) health_check 222863361 (3996) O
debugfs: cd lost+found
debugfs: ls
11 (12) . 2 (4084) .. 0 (4096) 0 (4096) 0 (4096)
3. We are currently running Lustre 1.8.5.
Thank you in advance for your help!
Lu Wang
CC-IHEP
> -----Original Message-----
> From: "WANG Lu" <wanglu at ihep.ac.cn>
> Sent: Monday, August 1, 2011
> To: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
> Cc:
> Subject: [Lustre-discuss] lvbo_init failed after e2fsck
>
> Dear all,
> After the annual e2fsck of all our OSTs, two OSTs became read-only with the
> following error:
> Jul 25 08:37:34 com04 kernel: LDISKFS-fs error (device sdb1): ldiskfs_dx_find_entry: bad entry in directory #222863370: inode out of bounds - offset=3280896, inode=656179638, rec_len=4096, name_len=0
> Jul 25 08:37:34 com04 kernel: Aborting journal on device sdb1-8.
> Jul 25 08:37:34 com04 kernel: LDISKFS-fs (sdb1): Remounting filesystem read-only
> tune2fs showed both OSTs in the state "clean with errors". After unmounting and
> running e2fsck again, the two OSTs could be mounted normally (and the state
> changed to "clean").
> However, we then began to see hundreds of "lvbo_init failed" errors on several
> OSTs, not limited to the two that had gone read-only.
> Three of our OSTs have logged hundreds of "lvbo_init failed" errors since the
> annual e2fsck:
> Aug 1 17:48:26 com04 kernel: LustreError: 5493:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 1 previous similar message
> Aug 1 17:59:02 com04 kernel: LustreError: 5632:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-publicfs-OST001d_UUID: lvbo_init failed for resource 2997406: rc -2
> Aug 1 17:59:02 com04 kernel: LustreError: 5632:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 1 previous similar message
> Aug 1 18:10:51 com04 kernel: LustreError: 5602:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-publicfs-OST001d_UUID: lvbo_init failed for resource 3240254: rc -2
> Aug 1 18:10:51 com04 kernel: LustreError: 5602:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 2 previous similar messages
> Aug 1 18:21:49 com04 kernel: LustreError: 5642:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-publicfs-OST001f_UUID: lvbo_init failed for resource 3204200: rc -2
> Aug 1 18:21:49 com04 kernel: LustreError: 5642:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 6 previous similar messages
> Aug 1 18:53:18 com04 kernel: LustreError: 5324:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-publicfs-OST001f_UUID: lvbo_init failed for resource 12856264: rc -2
> Aug 1 18:53:18 com04 kernel: LustreError: 5324:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 1 previous similar message
> According to previous discussions, it seems that the related objects have been
> deleted or moved to lost+found. I am not sure about the following:
> 1. Can the command "ll_recover_lost_found_objs" get back all the lost objects?
> 2. If not, how can I get a list of damaged files?
> 3. As users continue writing new data to the OSTs, will the number of damaged
> objects increase?
> Do you have any suggestions? Thank you very much!
>
>
> Lu Wang
> Computing Center
> IHEP, China
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
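
As a possible first step toward the list of damaged objects asked about above, the
"lvbo_init failed" messages can be collected per OST from the syslog. The following
is only a minimal sketch: it assumes each message sits on a single line in the log
(as it would in /var/log/messages, not wrapped as in this archive) and that the
"resource" number is the OST object ID. The function name collect_failures and the
SAMPLE list are made up for illustration. The rc value -2 is decoded with Python's
errno module, which gives ENOENT (no such file or directory). Mapping object IDs
back to file names is not covered here.

```python
import errno
import re
from collections import defaultdict

# Matches the "lvbo_init failed" kernel messages quoted in this thread.
# Assumption: the whole message is on one line in the real log file.
LVBO_RE = re.compile(
    r"filter-(?P<ost>\S+?)_UUID:\s+lvbo_init failed for resource "
    r"(?P<objid>\d+): rc (?P<rc>-?\d+)"
)


def collect_failures(lines):
    """Return {ost_name: sorted list of (resource_id, errno_name)}."""
    found = defaultdict(set)
    for line in lines:
        match = LVBO_RE.search(line)
        if match is None:
            continue  # e.g. "Skipped N previous similar messages"
        rc = int(match.group("rc"))
        # Kernel return codes are negative errno values; -2 -> ENOENT.
        name = errno.errorcode.get(-rc, str(rc))
        found[match.group("ost")].add((int(match.group("objid")), name))
    return {ost: sorted(objs) for ost, objs in found.items()}


# Sample lines copied from the log excerpt in this thread (unwrapped).
PREFIX = "com04 kernel: LustreError: "
SAMPLE = [
    "Aug 1 17:59:02 " + PREFIX + "5632:0:(ldlm_resource.c:862:ldlm_resource_add()) "
    "filter-publicfs-OST001d_UUID: lvbo_init failed for resource 2997406: rc -2",
    "Aug 1 17:59:02 " + PREFIX + "5632:0:(ldlm_resource.c:862:ldlm_resource_add()) "
    "Skipped 1 previous similar message",
    "Aug 1 18:10:51 " + PREFIX + "5602:0:(ldlm_resource.c:862:ldlm_resource_add()) "
    "filter-publicfs-OST001d_UUID: lvbo_init failed for resource 3240254: rc -2",
    "Aug 1 18:21:49 " + PREFIX + "5642:0:(ldlm_resource.c:862:ldlm_resource_add()) "
    "filter-publicfs-OST001f_UUID: lvbo_init failed for resource 3204200: rc -2",
    "Aug 1 18:53:18 " + PREFIX + "5324:0:(ldlm_resource.c:862:ldlm_resource_add()) "
    "filter-publicfs-OST001f_UUID: lvbo_init failed for resource 12856264: rc -2",
]

if __name__ == "__main__":
    for ost, objs in sorted(collect_failures(SAMPLE).items()):
        for objid, err in objs:
            print(ost, objid, err)
```

Note that the "Skipped N previous similar messages" lines mean the kernel
rate-limited its output, so a list built this way is a lower bound on the affected
objects, not a complete inventory.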