Date: Tue, 21 Aug 2007 14:15:08 +0200
From: Johan Ström <johan@stromnet.se>
To: freebsd-geom@freebsd.org, freebsd-stable@freebsd.org
Subject: Crashed gmirror, single disk marked SYNC and wont boot...
Message-ID: <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se>
Hi,

FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7: Tue Feb 13 18:24:34 CET 2007 johan@elfi.stromnet.se:/usr/obj/usr/src/sys/ROUTER.POLLING i386
(ROUTER.POLLING is GENERIC + options DEVICE_POLLING and ALTQ, IPSEC, also pfsync and carp)

This weekend I had a disk fail on me in a machine running gmirror gm0 with 2 providers (ad0 and ad6). The whole box froze with no screen output, and on hard reboot I got some LBA errors etc. from ad0. After a few reboots it got up and running though (I wasn't at the screen and had to do it by phone, so I couldn't really debug very well).

As soon as the box got up, I removed ad0 from the gmirror, so ad6 was the only provider.

Today I got a new disk to replace ad0. Now remember, ad6 was the only disk in the mirror. I took the box down fine and replaced the disk. ad0 was now gone and instead I had ad4 (ad4 and ad6 are SATA, ad0 was IDE). I changed it so I booted off the old SATA disk.

Okay, there came the first problem: the boot loader gave me the usual options, F1 FreeBSD, F5 Disk 2 (or whatever it said). If I pressed F1 I got the same prompt again; F5 did nothing at all. Funny! The system refused to load the loader (or whatever the 1-9 menu thingy is called), the kernel, or anything.

So I finally plugged the old ad0 disk into the machine to at least get it booted, thinking it would come up on the gmirror. Nope (I had taken the new ad4 out here):

ad0: 38166MB <WDC WD400BB-00CAA1 17.07W17> at ata0-master UDMA100
ad6: 152627MB <SAMSUNG HD160JJ ZM100-41> at ata3-master SATA150
GEOM_MIRROR: Device gm0 created (id=4029378995).
GEOM_MIRROR: Device gm0: provider ad6 detected.
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
GEOM_MIRROR: Force device gm0 start due to timeout.
Trying to mount root from ufs:/dev/mirror/gm0s1a

Manual root filesystem specification:
  <fstype>:<device>  Mount <device> using filesystem <fstype>
                       eg. ufs:da0s1a
  ?                  List valid disk boot devices
  <empty line>       Abort manual input

mountroot>

Okay... so why wouldn't it load my mirror from ad6 now? I had just done a clean shutdown without problems. It didn't even recognize any slices on ad6s1 (although ad6s1 itself was found).

I entered ad0s1 as root and booted from there. Of course I ended up in the emergency shell, since fstab looked for the gmirror devices, which didn't exist.

After some more digging into gmirror, I did a gmirror dump ad6:

Metadata on /dev/ad6:
magic: GEOM::MIRROR
version: 3
name: gm0
mid: 4029378995
did: 449032193
all: 3
genid: 0
syncid: 5
priority: 0
slice: 4096
balance: round-robin
mediasize: 20416757248
sectorsize: 512
syncoffset: 0
mflags: NONE
dflags: SYNCHRONIZING
hcprovider:
provsize: 160041885696
MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f

Some googling indicated that SYNCHRONIZING means that it's not "complete" and won't mount. Is that correct? Why would it be in that state then, when I just shut it down fine? And where the f*ck did my slices go?

I did a sysctl kern.geom.mirror.debug=2 and tried to gmirror activate the mirror:

GEOM_MIRROR[1]: Creating device gm0 (id=4029378995).
GEOM_MIRROR[0]: Device gm0 created (id=4029378995).
GEOM_MIRROR[1]: root_mount_hold 0xc3539510
GEOM_MIRROR[1]: Adding disk ad6 to gm0.
GEOM_MIRROR[2]: Adding disk ad6.
GEOM_MIRROR[2]: Disk ad6 connected.
GEOM_MIRROR[1]: Disk ad6 state changed from NONE to NEW (device gm0).
GEOM_MIRROR[0]: Device gm0: provider ad6 detected.
GEOM_MIRROR[2]: Tasting ad6s1.
GEOM_MIRROR[0]: Force device gm0 start due to timeout.
GEOM_MIRROR[1]: root_mount_rel[2169] 0xc3539510
GEOM_MIRROR[2]: No I/O requests for gm0, it can be destroyed.
GEOM_MIRROR[2]: Metadata on ad6 updated.
GEOM_MIRROR[2]: Access ad6 r-1w-1e-1 = 0
GEOM_MIRROR[0]: Device gm0 destroyed.
GEOM_MIRROR[1]: Thread exiting.
GEOM_MIRROR[1]: Consumer ad6 destroyed.

So... what is going on here? Anyone with some clues? I'm currently running on the ad0 disk, with no RAID at all. Let's hope it doesn't die on me (I haven't had any signs of that since Sunday, when it froze and gave boot errors, so I'm hoping).

The data loss from using ad0 instead of ad6 is probably minimal; it's a router, so it's more or less only logging that seems to have been lost. For now I just want to get clear about wth happened here and how to prevent it, and how to get back up on a gmirror with ad6 and ad4 (to be plugged in) so I can throw ad0 out.

Thanks
--
Johan Ström
Stromnet
johan@stromnet.se
http://www.stromnet.se/
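
The fields in the gmirror dump quoted in this message can be checked mechanically, which may help when comparing several disks. What follows is only a minimal sketch in Python, not part of gmirror: it assumes the dump has been saved as plain "key: value" text exactly as pasted above, and the names DUMP and parse_dump are made up for illustration.

```python
# Text copied from the `gmirror dump ad6` output quoted in this message.
DUMP = """\
Metadata on /dev/ad6:
magic: GEOM::MIRROR
version: 3
name: gm0
mid: 4029378995
did: 449032193
all: 3
genid: 0
syncid: 5
priority: 0
slice: 4096
balance: round-robin
mediasize: 20416757248
sectorsize: 512
syncoffset: 0
mflags: NONE
dflags: SYNCHRONIZING
hcprovider:
provsize: 160041885696
MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f
"""


def parse_dump(text):
    """Split 'key: value' lines into a dict (hypothetical helper).

    The line layout is an assumption taken from the paste above,
    not from any documented gmirror output format.
    """
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so 'GEOM::MIRROR' stays intact.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key.startswith("Metadata on"):
            continue  # skip blank lines and the header line
        fields[key] = value.strip()
    return fields


fields = parse_dump(DUMP)

# Is the disk still flagged as synchronizing?
is_syncing = "SYNCHRONIZING" in fields.get("dflags", "")

# Convert the byte counts to MB (2**20 bytes) to compare with boot messages.
mediasize_mb = int(fields["mediasize"]) // 2**20
provsize_mb = int(fields["provsize"]) // 2**20

print("mirror:", fields["name"], "syncid:", fields["syncid"])
print("synchronizing:", is_syncing)
print("mediasize (MB):", mediasize_mb)
print("provsize (MB):", provsize_mb)
```

With the values above, provsize converts to the same 152627MB that the boot messages report for ad6, while mediasize is the size recorded for the mirror itself.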