Date: Tue, 26 Jan 2010 08:46:19 -0800
From: Jeremy Chadwick
To: freebsd-stable@freebsd.org
Subject: Re: ZFS "zpool replace" problems
Message-ID: <20100126164619.GA50461@icarus.home.lan>
In-Reply-To: <20100126160320.6ed67b92.gerrit@pmp.uni-hannover.de>
References: <20100126143021.GA47535@icarus.home.lan> <20100126160320.6ed67b92.gerrit@pmp.uni-hannover.de>
List-Id: Production branch of FreeBSD source code

On Tue, Jan 26, 2010 at 04:03:20PM +0100, Gerrit Kühn wrote:
> On Tue, 26 Jan 2010 06:30:21 -0800 Jeremy Chadwick
> wrote about Re: ZFS "zpool replace" problems:
>
> JC> 2) How did you attach ad18?
> JC> Did you tell the system about it using atacontrol? If so, what
> JC> commands did you use?
>
> Yes. The drives did not appear automatically (verified with atacontrol
> list). Then I first tried reinit ata9, but that did not work out, so I did
> a detach/attach for ata9, then the drive was there (with list and also
> the device node appeared).

The procedure -- at least on Intel controllers in AHCI mode -- is:

- zpool offline
- atacontrol detach ataX (where X = channel associated with disk)
- Physically remove bad disk
- Physically insert new disk
- Wait 15 seconds for stuff to settle
- atacontrol attach ataX (where X = previous channel detached)
- zpool replace
- zpool online

"reinit" shouldn't be needed at all -- in fact, I've seen reinit cause
some craziness (even on Intel controllers), including a system deadlock,
but this was back during the RELENG_6 and RELENG_7 days. Great
improvements have been made to ata(4) since then.

If you need me to validate the above procedure (it's been a while since
I've had to hot-swap a disk), I can do so. I do have a 4-disk Supermicro
SuperServer 5015B-MTB (ICH9-based) sitting on my workbench which I can
test with.

> Meanwhile I took out the ad18 drive again and tried to use a different
> drive. But that was listed as "UNAVAIL" with corrupted data by zfs.
> Probably it already branded the disk for resilvering and is looking for
> exactly this one now. I also put in the disk which caused the problem
> above again. The resilvering process started again, but very soon the
> drive got detached again resulting in the same situation I described above.

It honestly sounds like hot-swapping is causing some chaos on your
system. Are all of the controllers involved configured for AHCI?

If not, physical removal/insertion should be done only when the system
power is off.

If so, mav@ or others may be able to help figure out what's going on in
the underlying ata(4) layer.
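
A minimal dry-run sketch of the hot-swap steps listed above, in case it
helps to see them in order. It only prints the commands; it does not run
them. The function name hotswap_plan and its POOL, DISK and CHANNEL
arguments are hypothetical, and the arguments passed to the zpool
commands are an assumption, since the list above does not spell them out.

```shell
#!/bin/sh
# Dry-run sketch: print the hot-swap commands instead of executing them.
# POOL, DISK and CHANNEL are hypothetical placeholders; the arguments to
# the zpool commands are assumed, not taken from the procedure itself.
hotswap_plan() {
    if [ "$#" -ne 3 ]; then
        echo "usage: hotswap_plan POOL DISK CHANNEL" >&2
        return 1
    fi
    pool=$1
    disk=$2
    channel=$3

    # Take the failing disk out of service before touching the hardware.
    printf '%s\n' "zpool offline $pool $disk"
    # Detach the ATA channel the disk sits on.
    printf '%s\n' "atacontrol detach $channel"
    # Manual step: nothing to execute here.
    printf '%s\n' "# physically remove the bad disk, then insert the new disk"
    # Give the hardware time to settle before reattaching.
    printf '%s\n' "sleep 15"
    printf '%s\n' "atacontrol attach $channel"
    # Start the resilver onto the new disk, then bring it online.
    printf '%s\n' "zpool replace $pool $disk"
    printf '%s\n' "zpool online $pool $disk"
}
```

For example, "hotswap_plan tank ad18 ata9" prints the commands for disk
ad18 on channel ata9 ("tank" is a made-up pool name), so they can be
reviewed and then run by hand one at a time.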

--
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |