
List:       lustre-discuss
Subject:    [Lustre-discuss] Fw: Re:   lvbo_init  failed  after e2fsck
From:       wanglu () ihep ! ac ! cn (WANG Lu)
Date:       2011-08-03 13:33:26
Message-ID: 16459057.57511312378406211.JavaMail.javamailuser () localhost

Hi Andreas,
   Do you think I need to run a full lfsck?
   I got an error while generating the mdsdb on our test file system (which uses
OST pools). The errors are the same as those discussed in this thread:
http://comments.gmane.org/gmane.comp.file-systems.lustre.user/10111
   Do you have any suggestions?

Lu Wang
Computing Center
IHEP


-----Original Message-----
From: "WANG Lu" <wanglu at ihep.ac.cn>
Sent: 2011-08-02
To: "WANG Lu" <wanglu at ihep.ac.cn>
Cc: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re:  [Lustre-discuss] lvbo_init  failed  after e2fsck

Some additional information:

1. After running "ll_recover_lost_found_objs", we still get the "lvbo_init failed"
error.
2. There are no files under "lost+found" on some of the affected OSTs. Here is the
output of debugfs:

# debugfs -c /dev/sdb1
debugfs 1.41.10.sun2 (24-Feb-2010)
/dev/sdb1: catastrophic mode - not reading inode or group bitmaps
debugfs:  ls
 2  (12) .    2  (12) ..    11  (20) lost+found    103784449  (16) CONFIGS   
 12  (20) last_rcvd    13  (20) health_check    222863361  (3996) O   
debugfs:  cd lost+found     
debugfs:  ls
 11  (12) .    2  (4084) ..    0  (4096)     0  (4096)     0  (4096)    

3. We are currently running Lustre 1.8.5. 

Thank you in advance for your help!

Lu Wang
CC-IHEP






> -----Original Message-----
> From: "WANG Lu" <wanglu at ihep.ac.cn>
> Sent: 2011-08-01
> To: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
> Cc: 
> Subject: [Lustre-discuss] lvbo_init  failed  after e2fsck
> 
> Dear all,
> After an annual e2fsck of all OSTs, two of our OSTs became read-only with the
> following error:
> Jul 25 08:37:34 com04 kernel: LDISKFS-fs error (device sdb1): ldiskfs_dx_find_entry: bad entry in directory #222863370: inode out of bounds - offset=3280896, inode=656179638, rec_len=4096, name_len=0
> Jul 25 08:37:34 com04 kernel: Aborting journal on device sdb1-8.
> Jul 25 08:37:34 com04 kernel: LDISKFS-fs (sdb1): Remounting filesystem read-only
> tune2fs showed the OSTs in the state "clean with errors". After unmounting and
> running e2fsck again, the two OSTs could be mounted normally (and the state
> changed to "clean").
> However, we then began to see hundreds of "lvbo_init failed" errors on several
> OSTs, not limited to the two OSTs that had gone read-only.
> Three of our OSTs have hit hundreds of lvbo_init failures since the annual e2fsck
> check.
> Aug  1 17:48:26 com04 kernel: LustreError: 5493:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 1 previous similar message
> Aug  1 17:59:02 com04 kernel: LustreError: 5632:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-publicfs-OST001d_UUID: lvbo_init failed for resource 2997406: rc -2
> Aug  1 17:59:02 com04 kernel: LustreError: 5632:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 1 previous similar message
> Aug  1 18:10:51 com04 kernel: LustreError: 5602:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-publicfs-OST001d_UUID: lvbo_init failed for resource 3240254: rc -2
> Aug  1 18:10:51 com04 kernel: LustreError: 5602:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 2 previous similar messages
> Aug  1 18:21:49 com04 kernel: LustreError: 5642:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-publicfs-OST001f_UUID: lvbo_init failed for resource 3204200: rc -2
> Aug  1 18:21:49 com04 kernel: LustreError: 5642:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 6 previous similar messages
> Aug  1 18:53:18 com04 kernel: LustreError: 5324:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-publicfs-OST001f_UUID: lvbo_init failed for resource 12856264: rc -2
> 5324:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 1 previous similar message
>
> According to previous discussions, it seems that the affected objects have been
> deleted or moved to lost+found. I am not sure about the following:
> 1. Can the command "ll_recover_lost_found_objs" recover all of the lost objects?
> 2. If not, how can I get a list of the damaged files?
> 3. As users continue writing new data to the OSTs, will the number of damaged
>    objects increase?
> Do you have any suggestions? Thank you very much!
> 
> 
> Lu Wang
> Computing Center
> IHEP,China
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
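
As a starting point for question 2 above, the object IDs named in the "lvbo_init
failed" messages can be collected per OST directly from the kernel log. The
following is only a minimal sketch in Python: the function name, the SAMPLE
lines, and the assumption that each message sits on one unwrapped line (as in
/var/log/messages) are illustrative and not part of any Lustre tool. rc -2 is
-ENOENT, i.e. the object was not found on the OST. Because the kernel
suppresses repeats ("Skipped N previous similar messages"), the result is
likely an incomplete list, and mapping object IDs back to file names still
needs a separate step.

```python
import re
from collections import defaultdict

# Matches e.g. "filter-publicfs-OST001d_UUID: lvbo_init failed for resource 2997406: rc -2"
PATTERN = re.compile(
    r"filter-(?P<ost>\S+?)_UUID:\s+lvbo_init failed for resource "
    r"(?P<res>\d+): rc (?P<rc>-?\d+)"
)

def collect_failed_objects(lines):
    """Return {ost_name: sorted list of object IDs} for rc -2 failures."""
    failed = defaultdict(set)
    for line in lines:
        m = PATTERN.search(line)
        # Only keep -ENOENT (rc -2); other return codes mean something else.
        if m and int(m.group("rc")) == -2:
            failed[m.group("ost")].add(int(m.group("res")))
    return {ost: sorted(ids) for ost, ids in failed.items()}

# Hypothetical input: a few lines copied from the log excerpt in this thread.
SAMPLE = [
    "Aug  1 17:59:02 com04 kernel: LustreError: 5632:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-publicfs-OST001d_UUID: lvbo_init failed for resource 2997406: rc -2",
    "Aug  1 17:59:02 com04 kernel: LustreError: 5632:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 1 previous similar message",
    "Aug  1 18:10:51 com04 kernel: LustreError: 5602:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-publicfs-OST001d_UUID: lvbo_init failed for resource 3240254: rc -2",
    "Aug  1 18:21:49 com04 kernel: LustreError: 5642:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-publicfs-OST001f_UUID: lvbo_init failed for resource 3204200: rc -2",
]

if __name__ == "__main__":
    for ost, ids in collect_failed_objects(SAMPLE).items():
        print(ost, ids)
```

In practice the SAMPLE list would be replaced by the lines of the OSS syslog
file; sets are used so that an object reported several times is listed once.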

