Date:      Fri, 05 Jan 2001 17:44:02 +0100
From:      Tor.Egge@fast.no
To:        blk@skynet.be
Cc:        grog@lemis.com, andy.depetter@ops.skynet.be, freebsd-stable@FreeBSD.ORG
Subject:   Re: Problems with corrupted vinum devices...
Message-ID:  <200101051644.RAA17887@midten.fast.no>
In-Reply-To: Your message of "Fri, 5 Jan 2001 17:04:47 +0100"
References:  <v0422080ab67b97176227@[172.17.1.121]>

> 	However, at some point within the next couple of hours (we're not 
> quite sure when), the machine crashed, and vinum failed to come up on 
> the reboot.  We had to comment out the /etc/fstab entries that 
> referred to the vinum volumes, and since then have been trying to 
> debug the problems as to why vinum can't find or start any devices at 
> all.

I suggest increasing INITIAL_DRIVES in vinumvar.h to avoid the drive
array resize and its associated race conditions.  When I tried to
configure vinum to use 14 disks yesterday, the machine immediately
crashed with a trap 12 in response to 'vinum create'.  I bumped
INITIAL_DRIVES to 16 to avoid the drive array resize that caused the
problem, and bumped INITIAL_SUBDISKS_IN_PLEX too, just to be safe.

To avoid similar races with RAID-5 under high load or with
softupdates, I also had to bump INITIAL_LOCKS, since a resize of the
range lock array was fatal.
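
The general hazard behind these workarounds can be sketched in
user-space C.  This is only an illustration under my own assumptions,
not vinum code: the names struct table, table_init, table_grow and
table_at are hypothetical.  Growing the array moves it to a new
address, so any pointer into the old array that was taken before a
sleep is stale afterwards.  Re-deriving the element address from an
index, as table_at does, stays valid across a grow; making the initial
size large enough that no grow happens is the workaround used here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable table such as a drive or lock array. */
struct table {
    int *slot;                              /* growable array */
    size_t alloc;                           /* number of slots allocated */
};

/* Allocate `initial` zeroed slots; returns 0 on success, -1 on failure. */
int table_init(struct table *t, size_t initial)
{
    t->slot = calloc(initial, sizeof *t->slot);
    t->alloc = t->slot ? initial : 0;
    return t->slot ? 0 : -1;
}

/*
 * Grow by allocating a new array and copying the old contents.
 * The base address changes, so pointers into the old array are
 * invalid once this returns.  Returns 0 on success, -1 on failure.
 */
int table_grow(struct table *t, size_t newalloc)
{
    int *bigger;

    assert(newalloc > t->alloc);
    bigger = calloc(newalloc, sizeof *bigger);
    if (bigger == NULL)
        return -1;
    memcpy(bigger, t->slot, t->alloc * sizeof *bigger);
    free(t->slot);
    t->slot = bigger;
    t->alloc = newalloc;
    return 0;
}

/* Re-derive the element address from the current base on every access. */
int *table_at(struct table *t, size_t index)
{
    return index < t->alloc ? &t->slot[index] : NULL;
}
```

A caller that remembers the index 3 and calls table_at(&t, 3) again
after table_grow gets a pointer into the new array; a caller that kept
the old pointer does not.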

For 4.2-STABLE, I'm currently using the following patch to implement
the workarounds:


diff -ru /tmp/vinum/vinumlock.c ./vinumlock.c
--- /tmp/vinum/vinumlock.c	Mon May 22 18:21:37 2000
+++ ./vinumlock.c	Thu Jan  4 20:07:06 2001
@@ -249,7 +249,7 @@
 #endif
 		plex->lockwaits++;			    /* waited one more time */
 		tsleep((void *) lock->stripe, PRIBIO, "vrlock", 2 * hz);
-		lock = plex->lock;			    /* start again */
+		lock = plex->lock - 1;			    /* start again */
 		foundlocks = 0;
 		pos = NULL;
 	    }
@@ -288,6 +288,12 @@
 unlockrange(int plexno, struct rangelock *lock)
 {
     daddr_t lockaddr;
+    struct plex *plex;
+
+    plex = &PLEX[plexno];
+    if (lock < &plex->lock[0] || lock >= &plex->lock[plex->alloclocks])
+	panic("vinum: rangelock %p on plex %d invalid, not between %p and %p",
+	lock, plexno, &plex->lock[0], &plex->lock[plex->alloclocks]);
 
 #ifdef VINUMDEBUG
     if (debug & DEBUG_LASTREQS)
diff -ru /tmp/vinum/vinumvar.h ./vinumvar.h
--- /tmp/vinum/vinumvar.h	Mon May 22 18:21:37 2000
+++ ./vinumvar.h	Thu Jan  4 19:12:01 2001
@@ -158,15 +158,15 @@
  * probably too small.
  */
 
-    INITIAL_DRIVES = 4,
+    INITIAL_DRIVES = 16,
     INITIAL_VOLUMES = 4,
     INITIAL_PLEXES = 8,
     INITIAL_SUBDISKS = 16,
-    INITIAL_SUBDISKS_IN_PLEX = 4,			    /* number of subdisks to allocate to a plex */
+    INITIAL_SUBDISKS_IN_PLEX = 16,			    /* number of subdisks to allocate to a plex */
     INITIAL_SUBDISKS_IN_DRIVE = 4,			    /* number of subdisks to allocate to a drive */
     INITIAL_DRIVE_FREELIST = 16,			    /* number of entries in drive freelist */
     PLEX_REGION_TABLE_SIZE = 8,				    /* number of entries in plex region tables */
-    INITIAL_LOCKS = 256,				    /* number of locks to allocate to a plex */
+    INITIAL_LOCKS = 4096,				    /* number of locks to allocate to a plex */
     MAX_REVIVE_BLOCKSIZE = MAXPHYS,			    /* maximum revive block size */
     DEFAULT_REVIVE_BLOCKSIZE = 65536,			    /* default revive block size */
     VINUMHOSTNAMELEN = 32,				    /* host name field in label */
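

The sanity check added to unlockrange() above can be tried outside the
kernel with a sketch like the following.  It is an assumption-laden
illustration, not the vinum code: struct rangelock_sketch and
lock_in_table are hypothetical names, and the alignment test is an
extra that the patch does not perform.  It reports whether a lock
pointer still lies inside the table that is currently allocated.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a range lock entry. */
struct rangelock_sketch {
    long stripe;
};

/*
 * Return 1 if `lock` points at one of the `alloc` entries starting at
 * `base`, 0 otherwise.  Addresses are compared as integers so that a
 * pointer into an unrelated object can be tested as well.
 */
int lock_in_table(const struct rangelock_sketch *base, size_t alloc,
    const struct rangelock_sketch *lock)
{
    uintptr_t lo = (uintptr_t) base;
    uintptr_t hi = (uintptr_t) (base + alloc);      /* one past the last entry */
    uintptr_t p = (uintptr_t) lock;

    if (p < lo || p >= hi)
        return 0;                                   /* outside the current table */
    return (p - lo) % sizeof *base == 0;            /* must sit on an entry boundary */
}
```

In the patch the failing case ends in panic(); in a user-space test the
return value can simply be checked.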


- Tor Egge





