List:       zfs-discuss
Subject:    Re: [zfs-discuss] bootadm hang WAS tuning zfs_arc_min
From:       Jim Klimov <jimklimov () cos ! ru>
Date:       2011-10-19 13:13:08
Message-ID: 4E9ECCE4.2040302 () cos ! ru

2011-10-12 11:56, Frank Van Damme wrote:
>
> The root of the problem seems to be that that process never completes.
>
> 9 /lib/svc/bin/svc.startd
> 332 /sbin/sh /lib/svc/method/boot-archive-update
> 347 /sbin/bootadm update-archive
>
> Can't kill it and run from the cmdline either, it simply ignores 
> SIGKILL. (Which shouldn't even be possible).
>

I guess it is possible when a process is blocked in a kernel call, 
waiting for it to complete: the signal is not acted upon until the call 
returns.
It has happened to me a number of times, usually when a ZFS pool was too 
busy working or repairing itself to do anything else, and this by itself 
often led to the system crashing (see e.g. my adventures this spring, 
reported on the forums). I hit a number of problems that generally 
ended with the whole zfs subsystem "running away to a happy place".

As an indication of this you can try running something as simple as 
"zpool list" in the background (otherwise your shell locks up too) and 
see if it ever completes:

# zpool list &
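
A slightly more scripted form of the same check, as a minimal sketch: 
it starts a command in the background and polls once per second to see 
whether it has exited. The helper name probe_cmd and its arguments are 
my own invention, not part of any Solaris tool, and the script assumes 
a POSIX-style shell (ksh93 or bash; the legacy Bourne /sbin/sh lacks 
the $((...)) arithmetic used here).

```shell
#!/bin/sh
# probe_cmd SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Start COMMAND in the background and poll once per second to see whether
# it has exited. The command is never killed: a process blocked in a
# kernel call would not react to the signal anyway.
probe_cmd() {
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        sleep 1
        waited=$((waited + 1))
        # kill -0 sends no signal; it only tests whether the PID still exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "completed"
            return 0
        fi
    done
    echo "still running"
    return 1
}
```

For example, "probe_cmd 30 zpool list" would report "still running" if 
zpool has not returned after 30 seconds, which hints that the zfs 
subsystem is stuck. The 30-second limit is an arbitrary choice.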

Earlier there were bugs related to inaccessible snapshots (marked for 
deletion, but not actually deletable until you mount and unmount the 
parent dataset) - these mostly fired in zfs-auto-snap auto-deletions, 
but also happened to influence bootadm.

I am not sure in what way bootadm relies on zfs/zpool, but empirically 
it does.
You might work around the problem by:
* exporting "data" zfs pools before updating the boot archive (bootadm 
update-archive); if you're rebooting the system anyway, stop the zones 
and services manually, and give this a try.
* booting from other media, such as a Failsafe Boot (SXCE, Sol10) or 
LiveCD (Indiana), and importing your rootpool to "/a", then running
# bootadm update-archive -R /a
* booting into single-user mode, making the root RW if needed, and 
updating the archive.
** You're likely to go this way anyway if your boot is interrupted due 
to an outdated boot archive (SMF failure - requires a repair shell 
interaction). When the archive is updated, you need to clear the service 
(svcadm clear boot-archive) and exit the repair shell in order to 
continue booting the OS.
* brute force - updating the boot archive (/platform/i86pc/boot_archive 
and /platform/i86pc/amd64/boot_archive) manually as an FS image, with 
the files listed in /boot/solaris/filelist.ramdisk. Usually a failure on 
boot is related to some config files in /etc having been updated...
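
The first two workarounds above can be sketched as a small script. This 
is an illustration under assumptions, not a tested procedure: the 
function names, the DRY_RUN switch and all pool/dataset names are 
hypothetical, and the optional "zfs mount" step assumes the root 
dataset is not mounted automatically on import. Depending on the 
system, the import may also need to be forced. By default the script 
only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints each command instead of
# running it; set DRY_RUN=0 to really execute.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Workaround 1: export the "data" pools, then rebuild the archive in place.
# Arguments: names of the non-root pools to export.
update_after_export() {
    for pool in "$@"; do
        run zpool export "$pool" || return 1
    done
    run bootadm update-archive
}

# Workaround 2: booted from Failsafe/LiveCD, import the root pool under an
# alternate root and rebuild the archive there.
# $1 = root pool, $2 = alternate root, $3 = optional root dataset to mount
update_from_altroot() {
    rpool=$1 altroot=$2 be=$3
    run zpool import -R "$altroot" "$rpool" || return 1
    if [ -n "$be" ]; then
        # Assumption: the root dataset needs an explicit mount.
        run zfs mount "$be" || return 1
    fi
    run bootadm update-archive -R "$altroot" || return 1
    run zpool export "$rpool"
}
```

For instance, "update_from_altroot rpool /a" prints the import, 
update-archive and export commands in order; review them, then rerun 
with DRY_RUN=0 if they match your layout.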

//Jim
