From owner-freebsd-current@FreeBSD.ORG Mon Sep 14 14:10:33 2009
Date: Mon, 14 Sep 2009 10:09:41 -0400
From: Alexandre Sunny
To: Kris Kennaway
Message-ID: <20090914100941.0adc00aa@Nokia-N810-43-7>
In-Reply-To: <4AAD5DD2.4030104@FreeBSD.org>
References: <4AAD4E51.5060908@FreeBSD.org> <4AAD5365.5000902@FreeBSD.org> <4AAD5DD2.4030104@FreeBSD.org>
Cc: Alexander Motin, FreeBSD Current
Subject: Re: ata timeouts under load

On Sun, 13 Sep 2009 22:02:10 +0100
Kris Kennaway wrote:

> Alexander Motin wrote:
> > Kris Kennaway wrote:
> >> I am getting timeouts on 8.0b4/HEAD when I do a lot of ZFS I/O to
> >> a pool on ad4:
> >>
> >> atapci0: port
> >> 0xc800-0xc807,0xc400-0xc403,0xc000-0xc007,0xb800-0xb803,0xb400-0xb40f,0xb000-0xb0ff
> >> irq 20 at device 15.0 on pci0
> >> ata2: on atapci0
> >> ata3: on atapci0
> >> ata0: on atapci1
> >> ata1: on atapci1
> >>
> >> ad4: 476940MB at ata2-master SATA150
> >> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
> >> completing request directly
> >> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
> >> completing request directly
> >> ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout -
> >> completing request directly
> >> ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout -
> >> completing request directly
> >> ad4: WARNING - SET_MULTI taskqueue timeout - completing request
> >> directly
> >> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left)
> >> LBA=344052040
> >> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
> >> completing request directly
> >> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
> >> completing request directly
> >>
> >> It becomes stuck in a loop displaying the above and is unable to
> >> complete further I/O operations. I wonder if it is just batching
> >> up a lot of I/O and then timing out because it is busy, and then
> >> not recovering from this state?
> >>
> >> Any ideas what could be wrong?
> >
> > There are two different kinds of timeouts we can see:
> > - first one, "ad4: WARNING - ..." is just a queue waiting timeout.
> > It is not the reason, but a consequence of the problem. And I have
> > doubts that it is reasonable to do it.
> > - second one, "TIMEOUT - WRITE_DMA48 ..." is a real command
> > execution timeout. I don't know whether this is the result of some
> > improper error recovery, or your drive indeed lost required servo
> > information near LBA=344052040 and is taking too long to find it.
> > You can try to read that sector and nearby ones with dd.
> >
>
> It's always that sequence (with setfeatures timing out first, then
> the dma later)...and the block number varies widely, also whether
> it's read/write. The disk itself & the data it contains appear to
> be OK as far as I have been able to determine so far.

Does smartctl -A /dev/ad4 report "Seek Error Rate" and/or "ECC Error
Rate", and, if so, do those values change while errors are being
reported? "Replaced Sector Count" or something similar might give some
insight too.

-- 
Alexandre Kovalenko.
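
P.S. In case it helps, here is a minimal sketch of how the before/after
comparison of smartctl -A output could be automated. It assumes the
usual ten-column attribute table that smartctl -A prints, with the raw
value in the last column. The function names (snapshot,
parse_attributes, changed_attributes) are made up for illustration, and
the attribute names a given drive reports (for example Seek_Error_Rate
or Reallocated_Sector_Ct) vary by vendor, so treat those as assumptions
too.

```python
import subprocess


def snapshot(device):
    """Return the text printed by `smartctl -A <device>`.

    Hypothetical helper: assumes smartctl is installed and the caller
    has permission to query the device. The exit status is ignored
    because smartctl uses it as a bitmask of disk conditions.
    """
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def parse_attributes(text):
    """Map attribute name -> raw value string from smartctl -A output.

    Only lines whose first field is a numeric attribute ID and which
    have all ten columns are treated as table rows; everything else
    (banner, column header, blank lines) is skipped.
    """
    attributes = {}
    for line in text.splitlines():
        fields = line.split(maxsplit=9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        raw_value = fields[9].strip()
        attributes[name] = raw_value
    return attributes


def changed_attributes(before, after):
    """Return {name: (old_raw, new_raw)} for attributes whose raw value differs."""
    changes = {}
    for name, new_raw in after.items():
        old_raw = before.get(name)
        if old_raw is not None and old_raw != new_raw:
            changes[name] = (old_raw, new_raw)
    return changes
```

The idea would be to call parse_attributes(snapshot("/dev/ad4")) once
while the disk is idle and again while the timeouts are being logged,
then pass both results to changed_attributes to see which raw values
moved.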