List:       opensolaris-lvm-discuss
Subject:    [lvm-discuss] Resynchronization anomaly
From:       Sanjay Nadkarni <Sanjay.Nadkarni@sun.com>
Date:       2005-08-18 21:54:09
Message-ID: 43056669.7010605@sun.com

The fact that it went into maintenance is a bit disturbing.  However, 
what bothers me more is that you have Last Erred on one submirror and 
Okay on the other.  This is not good.  Were there any messages on the 
console?  Is this reproducible?
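
If you still have the system in this state, the md driver should have 
logged something to /var/adm/messages.  Something along these lines 
should turn it up (the exact message text is a guess on my part):

    $ grep -i md /var/adm/messages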

The correct states should have been SM0 in Maintenance and SM1 in Last 
Erred.  Under that condition, when the resync is done, the contents of 
the last erred submirror are read and copied to the submirror in 
maintenance.  The assumption here is that the submirror that has gone 
into last erred has the latest set of changes.  However, you should 
know that the integrity of the entire mirror is *not* guaranteed once a 
mirror has gone to the last erred state, since this means the mirror 
saw an error on the only remaining side, i.e. all bets are off.  If the 
last write that failed was a metadata update, the logging functionality 
of UFS can typically handle the error.  But by no means should this be 
taken as a guarantee.
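
For what it's worth, once the failed side has been repaired, the usual 
recovery is to re-enable components with metareplace -e, handling any 
Maintenance components first and Last Erred ones last.  A sketch, using 
the device names from your metastat output (the caveat above still 
applies, so verify the filesystem afterwards):

    # re-enable the errored slice in place; this starts a resync
    metareplace -e d0 c0t0d0s0
    # after the resync completes, do a read-only filesystem check
    fsck -n /dev/md/rdsk/d0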

You mention that removing md_resync_bufsz fixed the problem.  Were the 
other settings left in place?  If so, can you check whether the same 
problem occurs if the only setting is md_mirror:md_resync_bufsz = 2048, 
i.e. with the set maxphys and set md:md_maxphys lines removed?
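
In other words, for that test /etc/system would contain just this one 
md line (plus a reboot to pick up the change):

    * only the resync buffer tunable; maxphys overrides removed
    set md_mirror:md_resync_bufsz = 2048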

-Sanjay

Matty wrote:

>
> Sorry to keep pinging the list, but I came across another odd issue as 
> part of my testing. If a device is in the "Needs maintenance" state, 
> how can it be used to synchronize another submirror?
>
> d0: Mirror
>     Submirror 0: d10
>       State: Needs maintenance
>     Submirror 1: d20
>       State: Resyncing
>     Resync in progress: 0 % done
>     Pass: 1
>     Read option: roundrobin (default)
>     Write option: parallel (default)
>     Size: 36962352 blocks (17 GB)
>
> d10: Submirror of d0
>     State: Needs maintenance
>     Size: 36962352 blocks (17 GB)
>     Stripe 0:
>         Device     Start Block  Dbase        State Reloc Hot Spare
>         c0t0d0s0          0     No      Last Erred   Yes
>
>
> d20: Submirror of d0
>     State: Resyncing
>     Size: 36962352 blocks (17 GB)
>     Stripe 0:
>         Device     Start Block  Dbase        State Reloc Hot Spare
>         c0t1d0s0          0     No            Okay   Yes
>
> $ iostat -zxnM 5
>                   extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0  100.4    0.0    6.3  0.0  0.4    0.0    4.1   0  41 c0t1d0
>   100.4    0.0    6.3    0.0  0.0  0.6    0.0    5.8   0  58 c0t0d0
>   100.4  100.4    6.3    6.3  0.0  1.0    0.0    5.0   0 100 d0
>   100.4    0.0    6.3    0.0  0.0  0.6    0.0    5.8   0  58 d10
>     0.0  100.4    0.0    6.3  0.0  0.4    0.0    4.1   0  41 d20
>                     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0  100.4    0.0    6.3  0.0  0.4    0.0    4.1   0  41 c0t1d0
>   100.6    0.0    6.3    0.0  0.0  0.6    0.0    5.8   0  58 c0t0d0
>   100.6  100.4    6.3    6.3  0.0  1.0    0.0    5.0   0 100 d0
>   100.6    0.0    6.3    0.0  0.0  0.6    0.0    5.8   0  58 d10
>     0.0  100.4    0.0    6.3  0.0  0.4    0.0    4.1   0  41 d20
>
> This clearly shows that md is reading from SM0 and writing to SM1. Is 
> this normal? If a device is in trouble (e.g., Needs maintenance), I 
> would think that it shouldn't be synchronizing data to other 
> submirrors. Am I completely off base here?
>
> Thanks,
> - Ryan
> _______________________________________________
> lvm-discuss mailing list
> lvm-discuss@opensolaris.org

