Date:      Wed, 9 Feb 2011 12:48:13 +0000 (UTC)
From:      Alexander Motin <mav@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-projects@freebsd.org
Subject:   svn commit: r218481 - projects/graid/head/sys/geom/raid
Message-ID:  <201102091248.p19CmDtE084639@svn.freebsd.org>

Author: mav
Date: Wed Feb  9 12:48:12 2011
New Revision: 218481
URL: http://svn.freebsd.org/changeset/base/218481

Log:
  Do not abort the rebuild on read errors; just log them and continue. For a
  2-disk array we have no more redundancy to recover from anyway. And if this
  rebuild is really implementing a resync, then skipping the damaged block is
  actually the right behavior, as the second copy is most likely valid and
  can be used for reading. Aborting the rebuild at that point would make that
  copy inaccessible.
  
  Another reason to do this is that the present code tries to rebuild/resync
  everything it possibly can. An aborted rebuild will be restarted and will
  likely end with the same result, causing an infinite loop.

Modified:
  projects/graid/head/sys/geom/raid/tr_raid1.c

Modified: projects/graid/head/sys/geom/raid/tr_raid1.c
==============================================================================
--- projects/graid/head/sys/geom/raid/tr_raid1.c	Wed Feb  9 12:03:22 2011	(r218480)
+++ projects/graid/head/sys/geom/raid/tr_raid1.c	Wed Feb  9 12:48:12 2011	(r218481)
@@ -671,18 +671,29 @@ g_raid_tr_iodone_raid1(struct g_raid_tr_
 		 */
 		if (trs->trso_type == TR_RAID1_REBUILD) {
 			if (bp->bio_cmd == BIO_READ) {
+
+				/* Immediately abort rebuild, if requested. */
+				if (trs->trso_flags & TR_RAID1_F_ABORT) {
+					trs->trso_flags &= ~TR_RAID1_F_DOING_SOME;
+					g_raid_tr_raid1_rebuild_abort(tr);
+					return;
+				}
+
+				/* On read error, skip and cross fingers. */
+				if (bp->bio_error != 0) {
+					G_RAID_LOGREQ(0, bp,
+					    "Read error during rebuild (%d), "
+					    "possible data loss!",
+					    bp->bio_error);
+					goto rebuild_round_done;
+				}
+
 				/*
 				 * The read operation finished, queue the
 				 * write and get out.
 				 */
 				G_RAID_LOGREQ(4, bp, "rebuild read done. %d",
 				    bp->bio_error);
-				if (bp->bio_error != 0 ||
-				    trs->trso_flags & TR_RAID1_F_ABORT) {
-					trs->trso_flags &= ~TR_RAID1_F_DOING_SOME;
-					g_raid_tr_raid1_rebuild_abort(tr);
-					return;
-				}
 				bp->bio_cmd = BIO_WRITE;
 				bp->bio_cflags = G_RAID_BIO_FLAG_SYNC;
 				bp->bio_offset = bp->bio_offset;
@@ -712,6 +723,8 @@ g_raid_tr_iodone_raid1(struct g_raid_tr_
 					return;
 				}
 /* XXX A lot of the following is needed when we kick of the work -- refactor */
+rebuild_round_done:
+				nsd = trs->trso_failed_sd;
 				trs->trso_flags &= ~TR_RAID1_F_LOCKED;
 				g_raid_unlock_range(sd->sd_volume,
 				    bp->bio_offset, bp->bio_length);
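
The infinite-loop argument in the log can be illustrated with a small userland simulation. This is only a sketch under stated assumptions, not code from tr_raid1.c: the names simulate_rebuild, read_error_policy and rebuild_result are hypothetical, a "bad" block is assumed to fail on every read attempt, and max_restarts exists only to bound the simulation (the log suggests the real code would keep restarting).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policies, for illustration only (not names from tr_raid1.c). */
enum read_error_policy { ABORT_ON_READ_ERROR, SKIP_ON_READ_ERROR };

struct rebuild_result {
	bool   completed;  /* did a pass reach the end of the volume? */
	size_t restarts;   /* aborted passes restarted from the beginning */
	size_t skipped;    /* blocks left uncopied because the read failed */
};

/*
 * Simulate a rebuild over nblocks blocks.  bad[i] is true when reading
 * block i from the source disk fails on every attempt (a persistent
 * media error).  max_restarts only bounds the simulation.
 */
static struct rebuild_result
simulate_rebuild(const bool *bad, size_t nblocks,
    enum read_error_policy policy, size_t max_restarts)
{
	struct rebuild_result res = { false, 0, 0 };

	assert(bad != NULL && nblocks > 0);
	for (;;) {
		size_t i;

		res.skipped = 0;
		for (i = 0; i < nblocks; i++) {
			if (!bad[i])
				continue;	/* read ok: block gets copied */
			if (policy == ABORT_ON_READ_ERROR)
				break;		/* old behavior: abort this pass */
			res.skipped++;		/* new behavior: log and skip */
		}
		if (i == nblocks) {
			res.completed = true;
			return (res);
		}
		if (res.restarts == max_restarts)
			return (res);		/* bound hit; never converged */
		res.restarts++;
	}
}
```

With a single persistently bad block, the abort policy stops at the same block on every pass and never completes, while the skip policy finishes in one pass and reports the block as skipped (possible data loss for that block only).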


