
List:       lustre-discuss
Subject:    [Lustre-discuss] OSS/MDS server crashes after recovery.
From:       Roy.Dragseth () cc ! uit ! no (Roy Dragseth)
Date:       2006-12-29 2:55:20
Message-ID: 200612291055.14328.Roy.Dragseth () cc ! uit ! no

We had a disk failure on one of our RAID arrays and had to run fsck on all
OST/MDS devices. The fsck process did a lot of repairs, but it eventually
succeeded in cleaning all devices. When I start Lustre again, everything
seems fine until I try to connect the clients; then one of the two combined
MDS/OSS servers crashes as soon as the recovery grace period is over. If I
start with --abort_recovery, the server hangs almost immediately.

It dumps a lot of debug log files in the /tmp directory, but I cannot make
any sense of them. The syslog fills up with messages like these:

Dec 29 10:52:36 lustre-11-1 kernel: LustreError: 6943:0:(client.c:554:ptlrpc_check_reply()) previously skipped 1 similar messages
Dec 29 10:52:36 lustre-11-1 kernel: Lustre: OSC_lustre-11-0.local_ost8_home-mds: Connection restored to service ost8 using nid 0@lo.
Dec 29 10:52:36 lustre-11-1 kernel: Lustre: previously skipped 1 similar messages
Dec 29 10:52:36 lustre-11-1 kernel: LustreError: 7113:0:(lov_obd.c:837:lov_clear_orphans()) error in orphan recovery on OST idx 0/4: rc = -16
Dec 29 10:52:36 lustre-11-1 kernel: LustreError: 7113:0:(lov_obd.c:837:lov_clear_orphans()) previously skipped 1 similar messages
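
When a syslog like this fills up, it can help to tally the LustreError lines by source location and decode the return codes. Lustre follows the Linux kernel convention of returning negative errno values, so "rc = -16" is -EBUSY ("Device or resource busy"). The following is only a minimal sketch. It assumes each syslog record sits on a single line, because mail clients may wrap them. The helper name `summarize` and the regular expressions are hypothetical and are not part of any Lustre tool.

```python
import errno
import re
from collections import Counter

# Hypothetical pattern (assumption): Lustre tags messages with their source
# location, e.g. "(lov_obd.c:837:lov_clear_orphans())" -> file, line, function.
LOC_RE = re.compile(r"\((\w+\.c):(\d+):(\w+)\(\)\)")
# Negative return codes such as "rc = -16".
RC_RE = re.compile(r"rc = (-\d+)")
# Rate-limiting notices such as "previously skipped 1 similar messages".
SKIP_RE = re.compile(r"previously skipped (\d+) similar")


def summarize(lines):
    """Count LustreError messages per source location and name their rc codes.

    Assumes one syslog record per line. Returns (counts, codes): counts maps
    "file:line:function" to a message total (including suppressed repeats),
    codes maps the same key to a symbolic errno name such as "EBUSY".
    """
    counts = Counter()
    codes = {}
    for line in lines:
        if "LustreError:" not in line:
            continue  # ignore informational "Lustre:" lines
        loc = LOC_RE.search(line)
        if loc is None:
            continue
        key = ":".join(loc.groups())
        skipped = SKIP_RE.search(line)
        # A "previously skipped N" notice is counted as N suppressed messages.
        counts[key] += int(skipped.group(1)) if skipped else 1
        rc = RC_RE.search(line)
        if rc:
            num = -int(rc.group(1))  # "-16" -> 16
            codes[key] = errno.errorcode.get(num, f"errno {num}")
    return counts, codes


if __name__ == "__main__":
    sample = [
        "Dec 29 10:52:36 lustre-11-1 kernel: LustreError: 7113:0:"
        "(lov_obd.c:837:lov_clear_orphans()) error in orphan recovery "
        "on OST idx 0/4: rc = -16",
        "Dec 29 10:52:36 lustre-11-1 kernel: LustreError: 7113:0:"
        "(lov_obd.c:837:lov_clear_orphans()) previously skipped 1 similar messages",
    ]
    counts, codes = summarize(sample)
    for key, total in counts.most_common():
        print(key, total, codes.get(key, ""))
```

On the excerpt in this post, the sketch groups the two lov_clear_orphans() lines together and reports EBUSY for them. It does not interpret the binary debug dumps in /tmp.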

This crash makes it impossible to move on to running lfsck to fix things.

System info:
RHEL4 with kernel 2.6.9-34.EL_lustre1.4.6.4smp
Lustre 1.4.6.4

Lustre setup:
Two combined MDS/OSS servers with dual FC connections to two SATA RAID
arrays, serving a home area and a scratch area.

Any help is greatly appreciated; my last resort is to reformat and restore
everything from backup.


Regards,
r.
