From: Miroslav Lachman <000.fbsd@quip.cz>
To: rick-freebsd@kiwi-computer.com
Cc: freebsd-geom@freebsd.org
Date: Thu, 03 Aug 2006 00:27:59 +0200
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)
Message-ID: <44D126EF.9070503@quip.cz>
In-Reply-To: <20060802210709.GA15310@megan.kiwi-computer.com>

Rick C. Petty wrote:
> On Wed, Aug 02, 2006 at 10:37:49PM +0200, Miroslav Lachman wrote:
>
>>>Did you have SMART enabled in the BIOS?
>>
>>Yes, (as I remember - I have only remote access now) and have
>
>
> Then I doubt the disk itself had any errors..
> Likely a bad cable or controller, which I've typically seen manifested
> under heavier disk activity.

[...]

> Yup, disks disappear when they stop responding to "bus reset" commands.
> This seems to happen on various controllers after an unpredictable number
> of READ_DMA or WRITE_DMA timeout errors. Theoretically, you could reinit
> the channel and see if the disk pops back up.

Reinit did not help; only a reboot did.

> One thing to note: I
> recommend putting the disks on separate channels so if a reinit fails, you
> don't lose both disks. I hate it when manufacturers put two SATA ports on
> the same ATA channel.. Cheap for them, problematic for you.

I don't know much about hardware, but the SATA controller is set to IDE mode in the BIOS, and the disks are on ATA channel 2 as ad4 (master) and ad5 (slave). If the BIOS setting is changed to AHCI, dmesg shows two more ATA channels: ad4 is on ata2-master and the second disk becomes ad8 on ata4-master (without changing any cables or connections). Since I see the same disk-disappearing problem in both AHCI and IDE mode, I have decided to use IDE mode, which seems a little faster to me during gmirror synchronization.

Is there a big difference between the AHCI and IDE modes of the SATA controller? According to dmesg, the controller is an Intel ICH7 *SATA300*, but the disks are SATA150. Can this cause any trouble?

>>>>Can anybody tell me, where is the problem / how can I found what is wrong?
>>>
>>>
>>>What's the output of "gmirror status" ?? I suspect on a reboot, gmirror
>>>will try to synchronize ad4 to ad5 (since ad5 was the first to drop). Once
>>>that is complete, gmirror won't be DEGRADED anymore.
>>
>># gmirror status
>>      Name    Status  Components
>>mirror/gm0  DEGRADED  ad4
>
>
> Hmm, and is ad5 detected? (rhetorical question, because I see that it was)
>
>
>>Gmirror is not synchronized after reboot:
>>
>>Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad5
>>detected.
>>Aug 1 09:14:50 track kernel: GEOM_MIRROR: Component ad5 (device gm0)
>>broken, skipping.
>
>
> Looks like the disk was marked with bad metadata.
>
>
>>So disk is OK, but gmirror refused to use it?
>
>
> Yes. I would first suggest trying "gmirror deactivate -v gm0 ad5" then
> trying to reactivate it. Maybe that will flush out the wrong metadata.
> If that doesn't work, try booting in verbose mode and attaching the dmesg
> (in particular, when the mirror is being attached).
> Last resort (although not a horrible option), you can try removing ad5 from
> the mirror and relabelling (gmirror label, not bsdlabel) it. If the remove
> fails, use a combination of forget and clear.

gmirror forget followed by gmirror insert helped:

root@track ~/# gmirror deactivate -v gm0 ad5
No such provider: ad5.
root@track ~/# gmirror forget -v gm0
Done.
root@track ~/# gmirror insert -v gm0 ad5
Done.
root@track ~/# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad4
                      ad5 (0%)

>>If disks are OK, what is wrong? What caused READ / WRITE timeouts?
>>Broken SATA controler? FreeBSD ATA driver?
>
>
> Try replacing the cables, trying a different SATA controller. I've seen
> these timeouts *a lot* and usually my gmirror/gvinum partitions all
> survive (after reboot at least). There are a lot of threads on this and
> other mailing lists describing the timeout problems.

Yes, I have read many posts about similar problems. I have the same kind of problem on 4 machines, so I don't think it is a cable problem. Maybe the controller is bad across the whole ASUS RS120 series, or something like that. (All 4 of these 4 identical machines have similar problems with the disk subsystem.)

Thank you.

Miroslav Lachman
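For reference, the recovery path used in this thread (try deactivate/activate first; when deactivate fails with "No such provider", forget the stale component and insert the disk again) can be sketched as a small sh function. This is a hypothetical helper, not part of gmirror(8): the name recover_component and the GMIRROR override variable are assumptions made for illustration, and it does not cover the remove + label or clear variants Rick mentions as a last resort.

```shell
#!/bin/sh
# Hypothetical helper (not part of gmirror(8)) sketching the recovery steps
# discussed in this thread for a component reported as "broken, skipping".
# GMIRROR defaults to the real gmirror binary; it can be overridden, e.g.
# GMIRROR="echo gmirror" to print commands instead of running them (a dry
# run then follows the deactivate/activate path, since echo always succeeds).
: "${GMIRROR:=gmirror}"

# recover_component MIRROR DISK
# Returns the exit status of the last gmirror command it ran.
recover_component() {
    mirror=$1
    disk=$2

    # Step 1 (Rick's first suggestion): if the disk can be deactivated,
    # activate it again; this may flush out the wrong metadata.
    if $GMIRROR deactivate -v "$mirror" "$disk"; then
        $GMIRROR activate -v "$mirror" "$disk"
        return
    fi

    # Step 2 (what worked here, after "No such provider: ad5."):
    # forget components that are not connected, then insert the disk so
    # it is synchronized again from the remaining component.
    $GMIRROR forget -v "$mirror" &&
        $GMIRROR insert -v "$mirror" "$disk"
}
```

For the mirror in this thread the call would be `recover_component gm0 ad5`, after which `gmirror status` shows the resynchronization progress of ad5 as in the transcript above. Keeping each step as a plain gmirror call makes it easy to stop and inspect the mirror between steps instead of trusting the script blindly.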