Message-ID: <4B8E1B3D.306@FreeBSD.org>
Date: Wed, 03 Mar 2010 10:18:05 +0200
From: Alexander Motin
To: Harald Schmalzbauer
Cc: freebsd-stable@FreeBSD.org
Subject: Re: ahcich timeouts, only with ahci, not with ataahci
References: <1266934981.00222684.1266922202@10.7.7.3> <4B83EFD4.8050403@FreeBSD.org> <4B8E1489.2070306@omnilan.de>
In-Reply-To: <4B8E1489.2070306@omnilan.de>
List-Id: Production branch of FreeBSD source code

Harald Schmalzbauer wrote:
> Alexander Motin wrote on 23.02.2010 16:10 (localtime):
>> Harald Schmalzbauer wrote:
>>> I'm frequently getting my machine locked with ahcichX timeouts:
>>> ahcich2: Timeout on slot 0
>>> ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 serr
>>> 00000000
>>> ahcich2: Timeout on slot 8
>>> ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr
>>> 00000000
>>> ahcich2: Timeout on slot 8
>>> ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 serr
>>> 00000000
>>> ...
>>
>> Given that is (interrupt status) is zero and `rs == cs | ss` (running
>> command bitmasks in driver and hardware), the controller doesn't report
>> command completion. Looking at TFD status 0xc0 with the BUSY bit set, I
>> would suppose that either the disk is stuck in command processing for
>> some reason, or the controller missed the command completion status.
>>
>> Have you noticed a 30 second (default ATA timeout) pause before the
>> timeout message is printed? I just want to be sure that the driver
>> waited long enough before giving up.
>>
>>> This happens when backup over GbE overloads ZFS/HDD capabilities.
>>> I reduced vfs.zfs.txg.timeout to 1 to prevent the machine from locking
>>> up almost immediately, but it still happens.
>>> When I don't use ahci but ataahci (the old driver, if I understand
>>> things correctly) I also see the ZFS burst write congestion, but this
>>> doesn't lead to controller timeouts that block the machine.
>>>
>>> Sometimes the machine recovers from the disk lock, but most often I have
>>> to reboot.
>>
>> How does it look when it doesn't? Can you send me full log messages?
>
> Hello, this morning I had a stall, but the machine recovered after about
> one minute. Here's what I got from the kernel:
> ahcich2: Timeout on slot 29
> ahcich2: is 00000000 cs 00000003 ss e0000003 rs e0000003 tfd c0 serr
> 00000000
> em1: watchdog timeout -- resetting
> em1: watchdog timeout -- resetting
> ahcich2: Timeout on slot 10
> ahcich2: is 00000000 cs 00006000 ss 00007c00 rs 00007c00 tfd c0 serr
> 00000000
> ahcich2: Timeout on slot 18
> ahcich2: is 00000000 cs 00040000 ss 00000000 rs 00040000 tfd c0 serr
> 00000000
> ahcich2: Timeout on slot 2
> ahcich2: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd c0 serr
> 00000000
> ahcich2: Timeout on slot 2
> ahcich2: is 00000000 cs 00000000 ss 0000000c rs 0000000c tfd 40 serr
> 00000000
>
> Does this tell you something useful?

It doesn't. Judging by the logged register contents, commands are indeed
still running and no interrupts are requested.

It is interesting to see the em1 watchdog timeouts there. Could they be
related somehow?

-- 
Alexander Motin
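
For anyone who wants to run the same checks over their own logs, here is a
rough sketch of the reasoning above. It is not part of the driver: the
analyze() helper and the regular expression are hypothetical, and it assumes
the dump has been rejoined onto one line (the mail client wrapped it above),
that cs and ss are the hardware command bitmasks, and that rs is the driver's
own bitmask. The only hard fact used is that bit 7 (0x80) of the ATA status
byte in tfd is BSY.

```python
import re

# Bit 7 of the ATA status byte (low byte of tfd) is BSY.
ATA_STATUS_BSY = 0x80

# Hypothetical pattern for the register dump line as it appears in this
# thread, after rejoining the line that the mail client wrapped.
DUMP_RE = re.compile(
    r"is (?P<is>[0-9a-f]+) cs (?P<cs>[0-9a-f]+) ss (?P<ss>[0-9a-f]+) "
    r"rs (?P<rs>[0-9a-f]+) tfd (?P<tfd>[0-9a-f]+) serr (?P<serr>[0-9a-f]+)"
)


def analyze(line):
    """Apply the checks discussed in the thread to one register dump line."""
    m = DUMP_RE.search(line)
    if m is None:
        raise ValueError("not a register dump line")
    reg = {name: int(value, 16) for name, value in m.groupdict().items()}
    return {
        # is == 0: the controller has not requested any interrupt.
        "no_interrupt_pending": reg["is"] == 0,
        # rs == cs | ss: driver and hardware agree on what is still running.
        "running_matches_hw": reg["rs"] == (reg["cs"] | reg["ss"]),
        # BSY set: the device still claims to be processing a command.
        "busy": bool(reg["tfd"] & ATA_STATUS_BSY),
        # Slot numbers the driver still considers outstanding.
        "slots_running": [i for i in range(32) if (reg["rs"] >> i) & 1],
    }


if __name__ == "__main__":
    print(analyze("ahcich2: is 00000000 cs 00006000 ss 00007c00 "
                  "rs 00007c00 tfd c0 serr 00000000"))
```

On the dumps quoted above, all but the last one come out as "no interrupt,
bitmasks consistent, device busy"; the last one (tfd 40) has BSY clear.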