Subject: AHCI timeout when using ZFS + AIO + NCQ
To: fs@freebsd.org
From: "Vladislav Prodan"
Date: Thu, 24 Jan 2013 14:19:38 +0200
Cc: current@freebsd.org

I have the following server:

FreeBSD 9.1-PRERELEASE #0: Wed Jul 25 01:40:56 EEST 2012

Jan 24 12:53:01 vesuvius kernel: atapci0: port 0xc040-0xc047,0xc030-0xc033,0xc020-0xc027,0xc010-0xc013,0xc000-0xc00f mem 0xfe210000-0xfe2101ff irq 51 at device 0.0 on pci3
...
Jan 24 12:53:01 vesuvius kernel: ahci0: port 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem 0xfe307000-0xfe3073ff irq 19 at device 17.0 on pci0
Jan 24 12:53:01 vesuvius kernel: ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
...
Jan 24 12:53:01 vesuvius kernel: ada2 at ahcich2 bus 0 scbus4 target 0 lun 0
Jan 24 12:53:01 vesuvius kernel: ada2: ATA-8 SATA 3.x device
Jan 24 12:53:01 vesuvius kernel: ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Jan 24 12:53:01 vesuvius kernel: ada2: Command Queueing enabled
Jan 24 12:53:01 vesuvius kernel: ada2: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
Jan 24 12:53:01 vesuvius kernel: ada2: Previously was known as ad12
...

I use 4 HDDs in RAID10 via ZFS. At very irregular intervals, HDDs drop off the bus. As a result, the server stops.

Jan 24 06:48:06 vesuvius kernel: ahcich2: Timeout on slot 6 port 0
Jan 24 06:48:06 vesuvius kernel: ahcich2: is 00000000 cs 00000000 ss 000000c0 rs 000000c0 tfd 40 serr 00000000 cmd 0000e817
Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 4c 4e 1e 40 68 00 00 01 00 00
Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): Retrying command
Jan 24 06:51:11 vesuvius kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 24 06:51:11 vesuvius kernel: ahcich2: Timeout on slot 8 port 0
Jan 24 06:51:11 vesuvius kernel: ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 00 serr 00000000 cmd 0000e817
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY.
ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
Jan 24 06:51:11 vesuvius kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
Jan 24 06:51:11 vesuvius kernel: ahcich2: Timeout on slot 8 port 0
Jan 24 06:51:11 vesuvius kernel: ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 00 serr 00000000 cmd 0000e817
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
Jan 24 06:51:11 vesuvius kernel: swap_pager: I/O error - pagein failed; blkno 4227133,size 8192, error 6
Jan 24 06:51:11 vesuvius kernel: (ada2:(pass2:vm_fault: pager read error, pid 1943 (named)
Jan 24 06:51:11 vesuvius kernel: ahcich2:0:ahcich2:0:0:0:0): lost device
Jan 24 06:51:11 vesuvius kernel: 0): passdevgonecb: devfs entry is gone
Jan 24 06:51:11 vesuvius kernel: pid 1943 (named), uid 53: exited on signal 11
...

Only a hard restart with the Power button helps.
Judging by the SMART data, the HDDs have no problems.
The SATA data cable has been replaced.

I found a similar problem:
http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
PR: amd64/165547: NVIDIA MCP67 AHCI SATA controller timeout

-- 
Vladislav V. Prodan
System & Network Administrator
http://support.od.ua
+380 67 4584408, +380 99 4060508
VVP88-RIPE