[prev in list] [next in list] [prev in thread] [next in thread] 

List:       lvm-devel
Subject:    [lvm-devel] preliminary [PATCH] lvconvert --repair
From:       Petr Rockai <prockai () redhat ! com>
Date:       2008-05-04 23:14:38
Message-ID: 87tzhd64rl.fsf () eriador ! mornfall ! net
[Download RAW message or body]

Hi,

the attached patch is a preliminary version of code that should add mirror
repair functionality to lvconvert. Currently, only rudimentary support for such
actions is available in the form of vgreduce --removemissing, that will
downconvert any partial mirrors as a side-effect. This is what dmeventd does
when it detects a drive failure, but such a heavyweight action is generally a
bad idea (tm).

This patch should pave a way for a less intrusive mirror repair utility, that
is fully able to operate under "partial VG" conditions (the previous patch
makes this possible at the library level). Further down the TODO is to change
dmeventd's mirror DSO to take advantage of this functionality, which in turn
makes that more useful and less dangerous.

The reasoning behind having this functionality in lvconvert is that most of
what the repair needs to do is in some way analogic to what lvconvert already
does -- although the patch is a little complex at first sight, a lot of that is
shuffling blocks of code around to make the repair more comfortable.

I would be grateful, if someone more familiar with lvconvert code could check
(and possibly explain) that FIXME I have added in front of that
list_size(&lv->segments) != 1 check. Moreover, an examination of my use of the
lp structure in the --repair branch could be handy. Although that code seems to
work as expected for me, and it should according to my understanding of the
code, I am still not 100% sure that it's right.

Thanks.


["lvconvert-repair-preview.diff" (text/x-diff)]

Sun May  4 23:54:35 CEST 2008  me@mornfall.net
  * Add a restart-loop encoded as a goto to lvconvert.
  
  Yes, it could have been done as a do-while loop just as well, but:
  a) lvm seems to be goto-happy
  b) do-while would indent a bunch of stuff out of my screen
Sun May  4 23:52:54 CEST 2008  me@mornfall.net
  * Add a comment to _lv_postorder.
Sun May  4 23:52:31 CEST 2008  me@mornfall.net
  * Rename all mark_missing bits to mark_partial.
Thu Apr 24 22:14:09 CEST 2008  me@mornfall.net
  * Exclude MISSING_PVs from create_pv_list in allocatable_only case.
Thu Apr 24 22:13:25 CEST 2008  me@mornfall.net
  * Even better (more resilient) approach to the activation issue.
Thu Apr 24 21:25:38 CEST 2008  me@mornfall.net
  * A correct fix for the lvconvert --repair vs activation issue.
Thu Apr 24 19:24:56 CEST 2008  me@mornfall.net
  * Try to fix the lvconvert --repair vs. activation issue.
Sun Apr 20 10:17:05 CEST 2008  me@mornfall.net
  * Check for MISSING_PV in vgreduce --removemissing.
Sun Apr 20 10:01:09 CEST 2008  me@mornfall.net
  * Generalize _lv_mark_missing to _lv_postorder with callback.
Mon Mar 31 18:36:56 CEST 2008  me@mornfall.net
  * Factor the 2 downconversion cases to one.
Mon Mar 31 18:36:23 CEST 2008  me@mornfall.net
  * Move the segment count check and adjust comment (lvconvert).
Sat Mar 29 19:36:44 CET 2008  me@mornfall.net
  * More code shuffling (this however introduces some problems).
Sat Mar 29 19:22:11 CET 2008  me@mornfall.net
  * Shuffle lvconvert code a little to make remove-and-add in one go easier.
Sun Mar 23 19:53:26 CET 2008  me@mornfall.net
  * Update comment for _lv_mark_missing.
Sun Mar 23 19:53:12 CET 2008  me@mornfall.net
  * Improve error message when partial metadata cannot be written.
Thu Mar 13 17:14:44 CET 2008  me@mornfall.net
  * Let vgcfgrestore overwrite partial metadata.
  
  However, this probably deserves --partial, to ensure that this is what the user
  meant? On the other hand, vgcfgrestore is potentially destructive command
  already and this doesn't seem to make it worse. It is definitely useful to edit
  metadata by hand.....
Thu Mar 13 17:12:41 CET 2008  me@mornfall.net
  * The log_lv, origin and cow links are in segments, not in LVs.
Thu Mar 13 17:12:17 CET 2008  me@mornfall.net
  * Don't forget to reset cmd->handles_partial when done.
Thu Mar 13 17:11:44 CET 2008  me@mornfall.net
  * Actually mark VG with MISSING_PV in it as PARTIAL_VG.
Thu Mar 13 17:08:00 CET 2008  me@mornfall.net
  * Don't export PARTIAL_VG, we recompute it every time. Enough to persist MISSING_PV.
Thu Mar 13 17:07:21 CET 2008  me@mornfall.net
  * Don't forget to mark COMPLETE_LV as non-exported flag.
Thu Mar 13 15:16:34 CET 2008  me@mornfall.net
  * Simplify and optimize _lv_mark_missing.
Fri Feb 15 09:52:45 CET 2008  me@mornfall.net
  * Resolve conflicts.
Fri Feb 15 09:51:11 CET 2008  me@mornfall.net
  * First go at lvconvert --repair implementation.
Fri Feb 15 09:50:53 CET 2008  me@mornfall.net
  * Add repair_ARG to lvconvert's argument list.
Thu Feb 14 16:20:46 CET 2008  me@mornfall.net
  * Add --repair switch to lvconvert for fixups in partial VGs.
Fri Feb 15 10:07:46 CET 2008  me@mornfall.net
  * Clean up commented out code.
Fri Feb 15 09:50:19 CET 2008  me@mornfall.net
  * Make lvremove work on partial VGs.
Fri Feb 15 09:49:42 CET 2008  me@mornfall.net
  * Fix vgreduce --removemissing for consistent partial VGs.
Fri Feb 15 09:49:02 CET 2008  me@mornfall.net
  * Fix _lv_mark_missing in metadata.c.
Fri Feb 15 09:47:54 CET 2008  me@mornfall.net
  * Add PARTIAL_LV to flags.c, as NULL (ie. not stored in metadata).
Thu Feb 14 17:37:06 CET 2008  me@mornfall.net
  * Make the PARTIAL_LV marking transitive.
Thu Feb 14 16:21:19 CET 2008  me@mornfall.net
  * Mark partial LVs with PARTIAL_LV status flag.
Thu Feb 14 16:20:07 CET 2008  me@mornfall.net
  * Add possibility for commands to specify they handle partial VGs.
diff -rN -u -p old-hotspare-prime/tools/args.h new-hotspare-prime/tools/args.h
--- old-hotspare-prime/tools/args.h	2008-05-04 23:59:35.043995087 +0200
+++ new-hotspare-prime/tools/args.h	2008-05-04 23:59:35.055995830 +0200
@@ -49,6 +49,7 @@ arg(nosync_ARG, '\0', "nosync", NULL, 0)
 arg(resync_ARG, '\0', "resync", NULL, 0)
 arg(corelog_ARG, '\0', "corelog", NULL, 0)
 arg(mirrorlog_ARG, '\0', "mirrorlog", string_arg, 0)
+arg(repair_ARG, '\0', "repair", NULL, 0)
 arg(monitor_ARG, '\0', "monitor", yes_no_arg, 0)
 arg(config_ARG, '\0', "config", string_arg, 0)
 arg(trustcache_ARG, '\0', "trustcache", NULL, 0)
diff -rN -u -p old-hotspare-prime/tools/commands.h new-hotspare-prime/tools/commands.h
--- old-hotspare-prime/tools/commands.h	2008-05-04 23:59:35.043995087 +0200
+++ new-hotspare-prime/tools/commands.h	2008-05-04 23:59:35.059995728 +0200
@@ -94,6 +94,7 @@ xx(lvconvert,
    0,
    "lvconvert "
    "[-m|--mirrors Mirrors [{--mirrorlog {disk|core}|--corelog}]]\n"
+   "\t[--repair]\n"
    "\t[-R|--regionsize MirrorLogRegionSize]\n"
    "\t[--alloc AllocationPolicy]\n"
    "\t[-b|--background]\n"
@@ -115,7 +116,8 @@ xx(lvconvert,
    "\tOriginalLogicalVolume[Path] SnapshotLogicalVolume[Path]\n",
 
    alloc_ARG, background_ARG, chunksize_ARG, corelog_ARG, interval_ARG,
-   mirrorlog_ARG, mirrors_ARG, regionsize_ARG, snapshot_ARG, test_ARG, zero_ARG)
+   mirrorlog_ARG, mirrors_ARG, regionsize_ARG, repair_ARG, snapshot_ARG,
+   test_ARG, zero_ARG)
 
 xx(lvcreate,
    "Create a logical volume",
diff -rN -u -p old-hotspare-prime/tools/lvconvert.c new-hotspare-prime/tools/lvconvert.c
--- old-hotspare-prime/tools/lvconvert.c	2008-05-04 23:59:35.043995087 +0200
+++ new-hotspare-prime/tools/lvconvert.c	2008-05-04 23:59:35.067995874 +0200
@@ -364,6 +364,61 @@ static int _insert_lvconvert_layer(struc
 	return 1;
 }
 
+static int _area_missing(struct lv_segment *lvseg, int s)
+{
+	if (seg_type(lvseg, s) == AREA_LV) {
+		if (seg_lv(lvseg, s)->status & PARTIAL_LV)
+			return 1;
+	} else if (seg_type(lvseg, s) == AREA_PV) {
+		if (seg_pv(lvseg, s)->status & MISSING_PV)
+			return 1;
+	}
+	return 0;
+}
+
+/* FIXME we want to handle mirror stacks here... */
+static int _count_failed_mirrors(struct logical_volume *lv)
+{
+	struct lv_segment *lvseg;
+	int ret = 0;
+	int s;
+	list_iterate_items(lvseg, &lv->segments) {
+		if (!seg_is_mirrored(lvseg))
+			return -1;
+		for(s = 0; s < lvseg->area_count; ++s) {
+			if (_area_missing(lvseg, s))
+				++ ret;
+		}
+	}
+	return ret;
+}
+
+static struct list *_failed_pv_list(struct cmd_context *cmd,
+				    struct volume_group *vg)
+{
+	struct list *r;
+	struct pv_list *pvl, *new_pvl;
+
+	if (!(r = dm_pool_alloc(cmd->mem, sizeof(*r)))) {
+		log_error("Allocation of list failed");
+		return_0;
+	}
+
+	list_init(r);
+	list_iterate_items(pvl, &vg->pvs) {
+		if (!(pvl->pv->status & MISSING_PV))
+			continue;
+
+		if (!(new_pvl = dm_pool_alloc(cmd->mem, sizeof(*new_pvl)))) {
+			log_err("Unable to allocate physical volume list.");
+			return 0;
+		}
+		new_pvl->pv = pvl->pv;
+		list_add(r, &new_pvl->list);
+	}
+	return r;
+}
+
 /* walk down the stacked mirror LV to the original mirror LV */
 static struct logical_volume *_original_lv(struct logical_volume *lv)
 {
@@ -383,17 +438,26 @@ static int lvconvert_mirrors(struct cmd_
 	const char *mirrorlog;
 	unsigned corelog = 0;
 	struct logical_volume *original_lv;
+	struct logical_volume *log_lv;
+	int failed_mirrors = 0;
+	int repairing = 0;
 
 	seg = first_seg(lv);
 	existing_mirrors = lv_mirror_count(lv);
 
 	/* If called with no argument, try collapsing the resync layers */
 	if (!arg_count(cmd, mirrors_ARG) && !arg_count(cmd, mirrorlog_ARG) &&
-	    !arg_count(cmd, corelog_ARG) && !arg_count(cmd, regionsize_ARG)) {
+	    !arg_count(cmd, corelog_ARG) && !arg_count(cmd, regionsize_ARG) &&
+	    !arg_count(cmd, repair_ARG)) {
 		lp->need_polling = 1;
 		return 1;
 	}
 
+	if (arg_count(cmd, mirrors_ARG) && arg_count(cmd, repair_ARG)) {
+		log_error("You can only use one of -m, --repair.");
+		return 0;
+	}
+
 	/*
 	 * Adjust required number of mirrors
 	 *
@@ -411,38 +475,55 @@ static int lvconvert_mirrors(struct cmd_
 	else
 		lp->mirrors += 1;
 
-	/*
-	 * Did the user try to subtract more legs than available?
-	 */
-	if (lp->mirrors < 1) {
-		log_error("Logical volume %s only has %" PRIu32 " mirrors.",
-			  lv->name, existing_mirrors);
-		return 0;
-	}
-
-	/*
-	 * Adjust log type
-	 */
-	if (arg_count(cmd, corelog_ARG))
-		corelog = 1;
-
-	mirrorlog = arg_str_value(cmd, mirrorlog_ARG,
-				  corelog ? "core" : DEFAULT_MIRRORLOG);
-	if (!strcmp("disk", mirrorlog)) {
-		if (corelog) {
-			log_error("--mirrorlog disk and --corelog "
-				  "are incompatible");
+	if (arg_count(cmd,repair_ARG)) {
+		cmd->handles_partial = 1;
+		lp->need_polling = 0;
+		if (!(lv->status & PARTIAL_LV)) {
+			log_error("The mirror is consistent, nothing to repair.");
+			return_0;
+		}
+		failed_mirrors = _count_failed_mirrors(lv);
+		lp->mirrors -= failed_mirrors;
+		log_error("Mirror counts: %d/%d failed.",
+			  failed_mirrors, existing_mirrors);
+		lp->pvh = _failed_pv_list(cmd, lv->vg);
+		log_lv=first_seg(lv)->log_lv;
+		if (!log_lv || log_lv->status & PARTIAL_LV)
+			corelog = 1;
+	} else {
+		/*
+		 * Did the user try to subtract more legs than available?
+		 */
+		if (lp->mirrors < 1) {
+			log_error("Logical volume %s only has %" PRIu32 " mirrors.",
+				  lv->name, existing_mirrors);
+			return 0;
+		}
+		
+		/*
+		 * Adjust log type
+		 */
+		if (arg_count(cmd, corelog_ARG))
+			corelog = 1;
+		
+		mirrorlog = arg_str_value(cmd, mirrorlog_ARG,
+					  corelog ? "core" : DEFAULT_MIRRORLOG);
+		if (!strcmp("disk", mirrorlog)) {
+			if (corelog) {
+				log_error("--mirrorlog disk and --corelog "
+					  "are incompatible");
+				return 0;
+			}
+			corelog = 0;
+		} else if (!strcmp("core", mirrorlog))
+			corelog = 1;
+		else {
+			log_error("Unknown mirrorlog type: %s", mirrorlog);
 			return 0;
 		}
-		corelog = 0;
-	} else if (!strcmp("core", mirrorlog))
-		corelog = 1;
-	else {
-		log_error("Unknown mirrorlog type: %s", mirrorlog);
-		return 0;
-	}
 
-	log_verbose("Setting logging type to %s", mirrorlog);
+		log_verbose("Setting logging type to %s", mirrorlog);
+	}
 
 	/*
 	 * Region size must not change on existing mirrors
@@ -455,6 +536,18 @@ static int lvconvert_mirrors(struct cmd_
 	}
 
 	/*
+	 * FIXME This check used to precede mirror->mirror conversion
+	 * but didn't affect mirror->linear or linear->mirror. I do
+	 * not understand what is its intention, in fact.
+	 */
+	if (list_size(&lv->segments) != 1) {
+		log_error("Logical volume %s has multiple "
+			  "mirror segments.", lv->name);
+		return 0;
+	}
+	
+ restart:
+	/*
 	 * Converting from mirror to linear
 	 */
 	if ((lp->mirrors == 1)) {
@@ -463,11 +556,17 @@ static int lvconvert_mirrors(struct cmd_
 				  lv->name);
 			return 1;
 		}
+	}
 
-		if (!lv_remove_mirrors(cmd, lv, existing_mirrors - 1, 1,
+	/*
+	 * Downconversion.
+	 */
+	if (lp->mirrors < existing_mirrors) {
+		/* Reduce number of mirrors */
+		if (!lv_remove_mirrors(cmd, lv, existing_mirrors - lp->mirrors,
+				       (corelog || lp->mirrors == 1) ? 1U : 0U,
 				       lp->pv_count ? lp->pvh : NULL, 0))
 			return_0;
-		goto commit_changes;
 	}
 
 	/*
@@ -494,43 +593,7 @@ static int lvconvert_mirrors(struct cmd_
 			return_0;
 		if (lp->wait_completion)
 			lp->need_polling = 1;
-		goto commit_changes;
-	}
-
-	/*
-	 * Converting from mirror to mirror with different leg count,
-	 * or different log type.
-	 */
-	if (list_size(&lv->segments) != 1) {
-		log_error("Logical volume %s has multiple "
-			  "mirror segments.", lv->name);
-		return 0;
-	}
-
-	if (lp->mirrors == existing_mirrors) {
-		original_lv = _original_lv(lv);
-		if (!first_seg(original_lv)->log_lv && !corelog) {
-			if (!add_mirror_log(cmd, original_lv, 1,
-					    adjusted_mirror_region_size(
-							lv->vg->extent_size,
-							lv->le_count,
-							lp->region_size),
-					    lp->pvh, lp->alloc))
-				return_0;
-		} else if (first_seg(original_lv)->log_lv && corelog) {
-			if (!remove_mirror_log(cmd, original_lv,
-					       lp->pv_count ? lp->pvh : NULL))
-				return_0;
-		} else {
-			/* No change */
-			log_error("Logical volume %s already has %"
-				  PRIu32 " mirror(s).", lv->name,
-				  lp->mirrors - 1);
-			if (lv->status & CONVERTING)
-				lp->need_polling = 1;
-			return 1;
-		}
-	} else if (lp->mirrors > existing_mirrors) {
+	} else if (lp->mirrors > existing_mirrors || failed_mirrors) {
 		if (lv->status & MIRROR_NOTSYNCED) {
 			log_error("Not adding mirror to mirrored LV "
 				  "without initial resync");
@@ -572,14 +635,36 @@ static int lvconvert_mirrors(struct cmd_
 			return_0;
 		lv->status |= CONVERTING;
 		lp->need_polling = 1;
-	} else {
-		/* Reduce number of mirrors */
-		if (!lv_remove_mirrors(cmd, lv, existing_mirrors - lp->mirrors,
-				       corelog ? 1U : 0U,
-				       lp->pv_count ? lp->pvh : NULL, 0))
-			return_0;
 	}
 
+	if (lp->mirrors == existing_mirrors) {
+		original_lv = _original_lv(lv);
+		if (!first_seg(original_lv)->log_lv && !corelog) {
+			if (!add_mirror_log(cmd, original_lv, 1,
+					    adjusted_mirror_region_size(
+							lv->vg->extent_size,
+							lv->le_count,
+							lp->region_size),
+					    lp->pvh, lp->alloc))
+				return_0;
+		} else if (first_seg(original_lv)->log_lv && corelog) {
+			if (!remove_mirror_log(cmd, original_lv,
+					       lp->pv_count ? lp->pvh : NULL))
+				return_0;
+		} else {
+			/* No change */
+			log_error("Logical volume %s already has %"
+				  PRIu32 " mirror(s).", lv->name,
+				  lp->mirrors - 1);
+			if (lv->status & CONVERTING)
+				lp->need_polling = 1;
+			return 1;
+		}
+	}
+
+	if (repairing)
+		goto restart;
+
 commit_changes:
 	log_very_verbose("Updating logical volume \"%s\" on disk(s)", lv->name);
 
diff -rN -u -p old-hotspare-prime/tools/lvmcmdline.c new-hotspare-prime/tools/lvmcmdline.c
--- old-hotspare-prime/tools/lvmcmdline.c	2008-05-04 23:59:35.035995291 +0200
+++ new-hotspare-prime/tools/lvmcmdline.c	2008-05-04 23:59:35.063995976 +0200
@@ -826,6 +826,7 @@ static void _apply_settings(struct cmd_c
 
 	cmd->fmt = arg_ptr_value(cmd, metadatatype_ARG,
 				 cmd->current_settings.fmt);
+	cmd->handles_partial = 0;
 }
 
 static char *_copy_command_line(struct cmd_context *cmd, int argc, char **argv)
diff -rN -u -p old-hotspare-prime/tools/lvremove.c new-hotspare-prime/tools/lvremove.c
--- old-hotspare-prime/tools/lvremove.c	2008-05-04 23:59:35.047995684 +0200
+++ new-hotspare-prime/tools/lvremove.c	2008-05-04 23:59:35.059995728 +0200
@@ -31,6 +31,8 @@ int lvremove(struct cmd_context *cmd, in
 		return EINVALID_CMD_LINE;
 	}
 
+	cmd->handles_partial = 1;
+
 	return process_each_lv(cmd, argc, argv, LCK_VG_WRITE, NULL,
 			       &lvremove_single);
 }
diff -rN -u -p old-hotspare-prime/tools/toollib.c new-hotspare-prime/tools/toollib.c
--- old-hotspare-prime/tools/toollib.c	2008-05-04 23:59:35.019995837 +0200
+++ new-hotspare-prime/tools/toollib.c	2008-05-04 23:59:35.059995728 +0200
@@ -1054,6 +1054,11 @@ static int _create_pv_entry(struct dm_po
 		return 1;
 	}
 
+	if (allocatable_only && (pvl->pv->status & MISSING_PV)) {
+		log_error("Physical volume %s is missing", pvname);
+		return 1;
+	}
+
 	if (allocatable_only &&
 	    (pvl->pv->pe_count == pvl->pv->pe_alloc_count)) {
 		log_err("No free extents on physical volume \"%s\"", pvname);
diff -rN -u -p old-hotspare-prime/tools/vgcfgrestore.c new-hotspare-prime/tools/vgcfgrestore.c
--- old-hotspare-prime/tools/vgcfgrestore.c	2008-05-04 23:59:35.035995291 +0200
+++ new-hotspare-prime/tools/vgcfgrestore.c	2008-05-04 23:59:35.063995976 +0200
@@ -18,6 +18,7 @@
 int vgcfgrestore(struct cmd_context *cmd, int argc, char **argv)
 {
 	char *vg_name = NULL;
+	cmd->handles_partial = 1;
 
 	if (argc == 1) {
 		vg_name = skip_dev_dir(cmd, argv[0], NULL);
diff -rN -u -p old-hotspare-prime/tools/vgreduce.c new-hotspare-prime/tools/vgreduce.c
--- old-hotspare-prime/tools/vgreduce.c	2008-05-04 23:59:35.047995684 +0200
+++ new-hotspare-prime/tools/vgreduce.c	2008-05-04 23:59:35.059995728 +0200
@@ -186,7 +186,7 @@ static int _make_vg_consistent(struct cm
 	/* Remove missing PVs */
 	list_iterate_safe(pvh, pvht, &vg->pvs) {
 		pvl = list_item(pvh, struct pv_list);
-		if (pvl->pv->dev)
+		if (pvl->pv->dev && !(pvl->pv->status & MISSING_PV))
 			continue;
 		if (!_remove_pv(vg, pvl))
 			return_0;
@@ -505,7 +505,7 @@ int vgreduce(struct cmd_context *cmd, in
 	}
 
 	if (arg_count(cmd, removemissing_ARG)) {
-		if (vg && consistent) {
+		if (vg && consistent && !(vg->status & PARTIAL_VG)) {
 			log_error("Volume group \"%s\" is already consistent",
 				  vg_name);
 			unlock_vg(cmd, vg_name);


Yours,
    Petr.

-- 
Peter Rockai | me()mornfall!net | prockai()redhat!com
 http://blog.mornfall.net | http://web.mornfall.net

"In My Egotistical Opinion, most people's C programs should be
 indented six feet downward and covered with dirt."
     -- Blair P. Houghton on the subject of C program indentation


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic