From owner-freebsd-questions@FreeBSD.ORG Tue Feb 17 04:01:41 2004
Date: Tue, 17 Feb 2004 23:01:36 +1100
From: Tony Frank
To: "Greg 'groggy' Lehey"
Message-ID: <20040217120136.GB20535@marvin.home.local>
References: <20040216110444.GA83416@marvin.home.local> <20040216232130.GR33797@wantadilla.lemis.com> <20040217003926.GA19004@marvin.home.local> <20040217072306.GH33797@wantadilla.lemis.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="LZvS9be/3tNcYl/X"
Content-Disposition: inline
In-Reply-To: <20040217072306.GH33797@wantadilla.lemis.com>
User-Agent: Mutt/1.4.2i
cc: Tony Frank
cc: freebsd-questions@FreeBSD.org
Subject: Re: vinum raid5 subdisks keep changing length?

--LZvS9be/3tNcYl/X
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Feb 17, 2004 at 05:53:06PM +1030, Greg 'groggy' Lehey wrote:
> On Tuesday, 17 February 2004 at 11:39:26 +1100, Tony Frank wrote:
> > On Tue, Feb 17, 2004 at 09:51:30AM +1030, Greg 'groggy' Lehey wrote:
> >> On Monday, 16 February 2004 at 22:04:44 +1100, Tony Frank wrote:

[... snip ...]

> OK, I tried almost exactly the same thing.  My disks are fractionally
> smaller than yours, so I took a different integral number of stripes.
> I didn't get any messages, and the config now looks like:
>
>   volume data
>   plex name data.p0 org raid5 984s vol data
>   sd name data.p0.s0 drive drive1 plex data.p0 len 8374824s driveoffset 265s plexoffset 0s
>   sd name data.p0.s1 drive drive2 plex data.p0 len 8374824s driveoffset 265s plexoffset 984s
>   sd name data.p0.s2 drive drive3 plex data.p0 len 8374824s driveoffset 265s plexoffset 1968s
>   sd name data.p0.s3 drive drive4 plex data.p0 len 8374824s driveoffset 265s plexoffset 2952s
>   sd name data.p0.s4 drive drive5 plex data.p0 len 8374824s driveoffset 265s plexoffset 3936s
>
> I've stopped and started vinum a couple of times, and all works well.
> I then removed the objects and tried again with subdisks 4 sectors
> longer.  Vinum gives the message:
>
>   vinum: removing 16 blocks of partial stripe at the end of data.p0
>
> printconfig is then identical with the previous version.

With no obvious known problems, and this being a test system, I have wiped
the disks and started over.

My near-exact steps:

Rebooted from 4.9-RELEASE CD1
Went to 'fixit' mode with live filesystem CD2
"Wiped" the disks:

dd if=/dev/zero of=/dev/da0 bs=512 count=32
fdisk -BI /dev/da0
dd if=/dev/zero of=/dev/da0s1 bs=512 count=32
disklabel -w -B da0s1 auto

First thing I notice is that the geometry for my SCSI disks differs by
1 cylinder between da0 and da0s1.
I configured all the vinum partitions as da0s1h etc.

Anyway, then I went on to a 'standard' installation.

I selected ad0 (space), q - no changes (already had a partition from above steps)
Selected "BootMgr"
Repeated for each disk (ad0, ad2, da0-da3)
Deleted all the prelisted partitions & filesystems and performed an 'auto'
based on ad2 (leaves ad0, da0-da3 unused)
Selected "Minimal" installation type
Sourced from CD
Post install configured fxp0 for DHCP,
Accepted NFS client, selected no for all other options
Set timezone to Australia Victoria
Added an extra user "tony" with additional group wheel.
No ports, no packages, no other options.
Rebooted

System booted from ad2 with 4.9-RELEASE GENERIC kernel

Ran disklabel -e da0s1 and added an 'h' partition of:
# h: * * vinum
Repeated for ad0s1, da0s1-da3s1.

ad0 is size 16498692, da0-da3 is size 8803557
The SCSI disks being smaller, I use their size for stripe calculations.

For stripe of 984s:
(8803557 - 256) / 984 = 8946 (rounded down to a whole number of stripes)
8946 * 984 = 8802864
8802864 + 256 = 8803120, which is less than the total drive size, so it should fit.

I build a vinum config file test-config, just raid5 volume without vinum root:

### start test-config
drive vinumdrive0 device /dev/ad0s1h
drive vinumdrive1 device /dev/da0s1h
drive vinumdrive2 device /dev/da1s1h
drive vinumdrive3 device /dev/da2s1h
drive vinumdrive4 device /dev/da3s1h

volume data
  plex org raid5 984s
    sd drive vinumdrive0 len 8802864s driveoffset 265s
    sd drive vinumdrive1 len 8802864s driveoffset 265s
    sd drive vinumdrive2 len 8802864s driveoffset 265s
    sd drive vinumdrive3 len 8802864s driveoffset 265s
    sd drive vinumdrive4 len 8802864s driveoffset 265s

### end test-config

Create the configuration:

raider# vinum create test-config
5 drives:
D vinumdrive0   State: up   Device /dev/ad0s1h   Avail: 3757/8056 MB (46%)
D vinumdrive1   State: up   Device /dev/da0s1h   Avail: 0/4298 MB (0%)
D vinumdrive2   State: up   Device /dev/da1s1h   Avail: 0/4298 MB (0%)
D vinumdrive3   State: up   Device /dev/da2s1h   Avail: 0/4298 MB (0%)
D vinumdrive4   State: up   Device /dev/da3s1h   Avail: 0/4298 MB (0%)

1 volumes:
V data          State: down   Plexes: 1   Size: 16 GB

1 plexes:
P data.p0    R5 State: init   Subdisks: 5   Size: 16 GB

5 subdisks:
S data.p0.s0    State: empty   PO:    0  B   Size: 4298 MB
S data.p0.s1    State: empty   PO:  492 kB   Size: 4298 MB
S data.p0.s2    State: empty   PO:  984 kB   Size: 4298 MB
S data.p0.s3    State: empty   PO: 1476 kB   Size: 4298 MB
S data.p0.s4    State: empty   PO: 1968 kB   Size: 4298 MB

Initialise the plex:

raider# vinum init data.p0
raider# vinum[215]: initializing subdisk /dev/vinum/sd/data.p0.s1
vinum[216]: initializing subdisk /dev/vinum/sd/data.p0.s2
vinum[217]: initializing subdisk /dev/vinum/sd/data.p0.s3
vinum[218]: initializing subdisk /dev/vinum/sd/data.p0.s4
vinum[214]: initializing subdisk /dev/vinum/sd/data.p0.s0

While waiting for my SCSI drives to write 4G worth of zeros I had a bit of a
review of /usr/src/sbin/vinum/commands.c
Some of what initsd does looks a little strange to me.
SSize is checked and initsize is set up early on.
Later SSize is checked again, and finally SSize is used instead of initsize.
A patch showing how I suspect it should work is attached.

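For reference, the subdisk length arithmetic used for test-config earlier in this message can be sketched in Python. This is only an illustration: the function name subdisk_length is made up here and is not part of vinum. The calculation in the text subtracts 256 sectors while the config uses a driveoffset of 265s; the sketch tries both, and for this disk size either value gives the same whole number of stripes.

```python
def subdisk_length(partition_sectors: int, drive_offset: int, stripe_sectors: int) -> int:
    """Largest subdisk length, in sectors, that is a whole number of stripes.

    Hypothetical helper for illustration only; it mirrors the hand
    calculation in the message, not any vinum code.
    """
    usable = partition_sectors - drive_offset
    if usable < stripe_sectors:
        raise ValueError("partition too small for even one stripe")
    whole_stripes = usable // stripe_sectors  # round down, never up
    return whole_stripes * stripe_sectors


DA_SECTORS = 8803557   # size of da0-da3 as reported in the message
STRIPE = 984           # stripe size in sectors from test-config

for offset in (256, 265):
    length = subdisk_length(DA_SECTORS, offset, STRIPE)
    fits = offset + length <= DA_SECTORS
    print(f"offset {offset}s: len {length}s, fits on drive: {fits}")
```

Rounding down matters here: rounding up would produce a subdisk that ends in a partial stripe or overruns the partition, which appears to be the kind of situation the "removing 16 blocks of partial stripe" message quoted above reports.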
While I was poking, my disks initialised:

subdisk /dev/vinum/sd/data.p0.s0 initialized
subdisk /dev/vinum/sd/data.p0.s4 initialized
subdisk /dev/vinum/sd/data.p0.s3 initialized
subdisk /dev/vinum/sd/data.p0.s2 initialized
subdisk /dev/vinum/sd/data.p0.s1 initialized

raider# vinum list
5 drives:
D vinumdrive0   State: up   Device /dev/ad0s1h   Avail: 3757/8056 MB (46%)
D vinumdrive1   State: up   Device /dev/da0s1h   Avail: 0/4298 MB (0%)
D vinumdrive2   State: up   Device /dev/da1s1h   Avail: 0/4298 MB (0%)
D vinumdrive3   State: up   Device /dev/da2s1h   Avail: 0/4298 MB (0%)
D vinumdrive4   State: up   Device /dev/da3s1h   Avail: 0/4298 MB (0%)

1 volumes:
V data          State: up   Plexes: 1   Size: 16 GB

1 plexes:
P data.p0    R5 State: up   Subdisks: 5   Size: 16 GB

5 subdisks:
S data.p0.s0    State: up   PO:    0  B   Size: 4298 MB
S data.p0.s1    State: up   PO:  492 kB   Size: 4298 MB
S data.p0.s2    State: up   PO:  984 kB   Size: 4298 MB
S data.p0.s3    State: up   PO: 1476 kB   Size: 4298 MB
S data.p0.s4    State: up   PO: 1968 kB   Size: 4298 MB

raider# newfs -v /dev/vinum/data
Warning: Block size and bytes per inode restrict cylinders per group to 22.
Warning: 1856 sector(s) in last cylinder unallocated
/dev/vinum/data: 35211456 sectors in 8597 cylinders of 1 tracks, 4096 sectors
        17193.1MB in 391 cyl groups (22 c/g, 44.00MB/g, 10944 i/g)
super-block backups (for fsck -b #) at:
 32, 90144, 180256, 270368, 360480, 450592, 540704, 630816,
 720928, 811040, 901152, 991264, 1081376, 1171488, 1261600, 1351712,
 1441824, 1531936, 1622048, 1712160, 1802272, 1892384, 1982496, 2072608,
 2162720, 2252832, 2342944, 2433056, 2523168, 2613280, 2703392, 2793504,
 2883616, 2973728, 3063840, 3153952, 3244064, 3334176, 3424288, 3514400,
 3604512, 3694624, 3784736, 3874848, 3964960, 4055072, 4145184, 4235296,
 4325408, 4415520, 4505632, 4595744, 4685856, 4775968, 4866080, 4956192,
 5046304, 5136416, 5226528, 5316640, 5406752, 5496864, 5586976, 5677088,
 5767200, 5857312, 5947424, 6037536, 6127648, 6217760, 6307872, 6397984,
 6488096, 6578208, 6668320, 6758432, 6848544, 6938656, 7028768, 7118880,
 7208992, 7299104, 7389216, 7479328, 7569440, 7659552, 7749664,
[... snip ...]
 33792032, 33882144, 33972256, 34062368, 34152480, 34242592, 34332704, 34422816,
 34512928, 34603040, 34693152, 34783264, 34873376, 34963488, 35053600, 35143712

raider# tunefs -n enable /dev/vinum/data
tunefs: soft updates set

raider# vinum printconfig built-config
raider# diff test-config built-config
0a1
> # Vinum configuration of raider.home.local, saved at Tue Feb 17 22:18:40 2004
6d6
<
8,14c8,13
<   plex org raid5 984s
<     sd drive vinumdrive0 len 8802864s driveoffset 265s
<     sd drive vinumdrive1 len 8802864s driveoffset 265s
<     sd drive vinumdrive2 len 8802864s driveoffset 265s
<     sd drive vinumdrive3 len 8802864s driveoffset 265s
<     sd drive vinumdrive4 len 8802864s driveoffset 265s
<
---
> plex name data.p0 org raid5 984s vol data
> sd name data.p0.s0 drive vinumdrive0 plex data.p0 len 8802864s driveoffset 265s plexoffset 0s
> sd name data.p0.s1 drive vinumdrive1 plex data.p0 len 8802864s driveoffset 265s plexoffset 984s
> sd name data.p0.s2 drive vinumdrive2 plex data.p0 len 8802864s driveoffset 265s plexoffset 1968s
> sd name data.p0.s3 drive vinumdrive3 plex data.p0 len 8802864s driveoffset 265s plexoffset 2952s
> sd name data.p0.s4 drive vinumdrive4 plex data.p0 len 8802864s driveoffset 265s plexoffset 3936s
raider#

All looks fine up to this point.
In fact /var/log/messages has all the vinum messages also and vinum_history
shows what I have done.

I now added data volume to fstab:

raider# cat /etc/fstab
# See the fstab(5) manual page for important information on automatic mounts
# of network filesystems before modifying this file.
#
# Device                Mountpoint      FStype  Options         Dump    Pass#
/dev/ad2s1b             none            swap    sw              0       0
/dev/ad2s1a             /               ufs     rw              1       1
/dev/ad2s1f             /tmp            ufs     rw              2       2
/dev/ad2s1g             /usr            ufs     rw              2       2
/dev/ad2s1e             /var            ufs     rw              2       2
/dev/vinum/data         /data           ufs     rw              2       2
/dev/acd0c              /cdrom          cd9660  ro,noauto       0       0
proc                    /proc           procfs  rw              0       0

I also added vinum_load="YES" to /boot/loader.conf:

raider# cat /boot/loader.conf
# -- sysinstall generated deltas -- #
userconfig_script_load="YES"
vinum_load="YES"
raider#

Did a few more tests - vinum stop / vinum start etc - still no problems.

And reboot:

raider# shutdown -r now

/kernel and /modules/vinum.ko are loaded early on and the system boots.
It immediately ends up in single user mode, as I left out some important
bits from loader.conf like:

vinum.drives="/dev/ad0s1 /dev/da0s1 /dev/da1s1 /dev/da2s1 /dev/da3s1"

Fixed that and rebooted again.
This time everything starts normally.
No errors reported by vinum; however, the vinum startup messages no longer
appear in /var/log/messages, although they do still display on the console.

vinum list shows everything as expected wrt devices & avail.
vinum printconfig output matches (except for the time) that taken before
the reboot.

Took vinum out of /boot/loader.conf and included start_vinum="YES" in
/etc/rc.conf.

Rebooted, all appears ok still.
messages & dmesg show no record of vinum, but kldstat -v shows it, plus I can
see vinum messages on the console. The only one of 'interest' is:

vinum: /dev is mounted read-only, not rebuilding /dev/vinum

Tried rebooting several more times, including some vinum stop/start
sequences etc.

When I entered vinum stop, the vinum messages were logged to
/var/log/messages. Likewise when I entered vinum start the messages were
also logged.
It seems that only the messages during boot are not captured.
This is the same whether I have vinum loaded by loader.conf or through
rc.conf.

Despite this logging oddity everything else appears to work just fine using
4.9-RELEASE.

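As a cross-check on the listings above, the numbers reported by vinum and newfs can be tied back to test-config with a little arithmetic. The sketch below is Python with made-up helper names. It assumes the usual RAID-5 property that one subdisk's worth of space goes to parity, and that plex offsets are simply the subdisk index times the stripe size, which is what the printconfig output shows.

```python
SECTOR_BYTES = 512  # sector size assumed throughout the message


def raid5_data_sectors(subdisk_sectors: int, subdisks: int) -> int:
    """Usable sectors of a RAID-5 plex: one subdisk's worth holds parity."""
    return subdisk_sectors * (subdisks - 1)


def plex_offsets(stripe_sectors: int, subdisks: int) -> list[int]:
    """Plex offset of each subdisk in sectors, as printconfig lists them."""
    return [index * stripe_sectors for index in range(subdisks)]


volume_sectors = raid5_data_sectors(8802864, 5)
print("volume size in sectors:", volume_sectors)
print("volume size in MB:", round(volume_sectors * SECTOR_BYTES / 2**20, 1))

for offset in plex_offsets(984, 5):
    # vinum list shows the same offsets converted to kB
    print(f"plexoffset {offset}s = {offset * SECTOR_BYTES // 1024} kB")
```

The sector count agrees with the 35211456 sectors newfs reports for /dev/vinum/data, and the offsets agree with both the plexoffset values in printconfig and the PO column of vinum list, so the configuration read back is consistent with the one that was created.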
I am currently building world based on RELENG_4 cvsup from this evening.
Will try it all again with the new kernel & world probably tomorrow.

Main thing that is different is that I no longer have a vinum root environment.
I might try to rebuild that while I wait for world to build.
Vinum root needs a bit more planning from the start due to disk offsets and
the like...

> > Please advise if I can help - I have plenty of free time this week
> > and am willing to get my hands dirty.
> Hmm.  Unfortunately, I don't have much time for the rest of the week.
> Does this mean you're not coming to the AUUG security symposium in
> Canberra?

No, perhaps I should keep myself more abreast of 'local' events.
AUUG hmm.. will have to ask about it at the next vicfug gathering.

> There's obviously something going on here which isn't immediately
> obvious.  Take a look at
> http://www.vinumvm.org/vinum/how-to-debug.html and send me the
> information I ask for there, and maybe we can track it down.

IMHO I did supply basically everything in the original email.
No core or panics to debug, the rest of the config & setup was listed.

Will further report tomorrow on outcome of further testing.
Scenarios:
data raid5 with vinum root on 4.9-RELEASE
plain data raid5 with RELENG_4
data raid5 with vinum root on RELENG_4

Thanks for your time, patch attached.

Tony

--LZvS9be/3tNcYl/X
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=command_patch

--- /usr/src/sbin/vinum/commands.c Tue Jun 24 23:31:55 2003
+++ commands.c Tue Feb 17 22:05:05 2004
@@ -37,7 +37,7 @@
  * advised of the possibility of such damage.
  *
  * $Id: commands.c,v 1.14 2000/11/14 20:01:23 grog Exp grog $
  * $FreeBSD: src/sbin/vinum/commands.c,v 1.31.2.6 2003/06/06 05:13:29 grog Exp $
  */
 
 #include
@@ -465,9 +465,6 @@
     message->verify = vflag;    /* verify what we write? */
     message->force = 1;    /* insist */
     ioctl(superdev, VINUM_SETSTATE, message);
-    if ((SSize > 0)    /* specified a size for init */
-    &&(SSize < 512))
-        SSize <<= DEV_BSHIFT;
     if (reply.error) {
         fprintf(stderr,
             "Can't initialize %s: %s (%d)\n",
@@ -483,7 +480,7 @@
         message->type = sd_object;    /* and type of object */
         message->state = object_up;
         message->verify = vflag;    /* verify what we write? */
-        message->blocksize = SSize;
+        message->blocksize = initsize;
         ioctl(superdev, VINUM_SETSTATE, message);
     }
     while (reply.error == EAGAIN);    /* until we're done */

--LZvS9be/3tNcYl/X--