From owner-freebsd-stable@FreeBSD.ORG Wed Feb 8 22:38:21 2012 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 266A21065674 for ; Wed, 8 Feb 2012 22:38:21 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net [76.96.30.96]) by mx1.freebsd.org (Postfix) with ESMTP id 0B7048FC16 for ; Wed, 8 Feb 2012 22:38:20 +0000 (UTC) Received: from omta01.emeryville.ca.mail.comcast.net ([76.96.30.11]) by qmta09.emeryville.ca.mail.comcast.net with comcast id XaYS1i0090EPchoA9aeLV7; Wed, 08 Feb 2012 22:38:20 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta01.emeryville.ca.mail.comcast.net with comcast id XaeK1i00b1t3BNj8MaeKuQ; Wed, 08 Feb 2012 22:38:20 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 3F578102C1E; Wed, 8 Feb 2012 14:38:19 -0800 (PST) Date: Wed, 8 Feb 2012 14:38:19 -0800 From: Jeremy Chadwick To: Alexander Motin Message-ID: <20120208223819.GA27488@icarus.home.lan> References: <4F32E289.4080806@sentex.net> <4F32F5B0.2060203@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F32F5B0.2060203@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org Subject: Re: siisch1: Error while READ LOG EXT X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Feb 2012 22:38:21 -0000 On Thu, Feb 09, 2012 at 12:22:40AM +0200, Alexander Motin wrote: > On 08.02.2012 23:27, Jeremy Chadwick wrote: > >On Wed, Feb 08, 2012 at 04:00:57PM -0500, Mike Tancsa wrote: > >>I have a 4 port eSata PCIe card with 3 external port multipliers attached on an AMD64 box (8G of RAM), RELENG8 from Feb1st. > >> > >>siis0@pci0:5:0:0: class=0x010400 card=0x71241095 chip=0x31241095 rev=0x02 hdr=0x00 > >> vendor = 'Silicon Image Inc (Was: CMD Technology Inc)' > >> device = 'PCI-X to Serial ATA Controller (SiI 3124)' > >> class = mass storage > >> subclass = RAID > >> bar [10] = type Memory, range 64, base 0xb4408000, size 128, enabled > >> bar [18] = type Memory, range 64, base 0xb4400000, size 32768, enabled > >> bar [20] = type I/O Port, range 32, base 0x3000, size 16, enabled > >> cap 01[64] = powerspec 2 supports D0 D1 D2 D3 current D0 > >> cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions > >> cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message > >> > >>siis0: port 0x3000-0x300f mem 0xb4408000-0xb440807f,0xb4400000-0xb4407fff irq 19 at device 0.0 on pci5 > >>siis0: [ITHREAD] > >>siisch0: at channel 0 on siis0 > >>siisch0: [ITHREAD] > >>siisch1: at channel 1 on siis0 > >>siisch1: [ITHREAD] > >>siisch2: at channel 2 on siis0 > >>siisch2: [ITHREAD] > >>siisch3: at channel 3 on siis0 > >>siisch3: [ITHREAD] > >> > >># camcontrol devlist > >> at scbus0 target 0 lun 0 (pass0,ada0) > >> at scbus0 target 1 lun 0 (pass1,ada1) > >> at scbus0 target 2 lun 0 (pass2,ada2) > >> at scbus0 target 3 lun 0 (pass3,ada3) > >> at scbus0 target 15 lun 0 (pass4,pmp1) > >> at scbus1 target 0 lun 0 (pass5,ada4) > >> at scbus1 target 1 lun 0 (pass6,ada5) > >> at scbus1 target 2 lun 0 (pass7,ada6) > >> at scbus1 target 3 lun 0 (pass8,ada7) > >> at scbus1 target 4 lun 0 (pass9,ada8) > >> at scbus1 target 15 lun 0 (pass10,pmp0) > >> at scbus4 target 0 lun 0 (pass11,da0) > >> at scbus4 target 0 lun 1 (pass12,da1) > >> at scbus4 target 16 lun 0 (pass13) > >> at scbus5 target 0 lun 0 (pass14,da2) > >> at scbus6 target 0 lun 0 (pass15,ada9) > >> at scbus7 target 0 lun 0 (pass16,ada10) > >> at scbus8 target 0 lun 0 (pass17,ada11) > >> at scbus11 target 0 lun 0 (pass18,ada12) > >> > >> > >>Ever since I added a new PM, I have been seeing a new error (READ LOG EXT) along with a the odd slot timeout error. > >> > >> > >>Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 47000000 > >>Feb 7 23:49:32 backup3 kernel: siisch1: Timeout on slot 26 > >>Feb 7 23:49:32 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > >>Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 43000000 > >>Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 30 > >>Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > >>Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 03000000 > >>Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 25 > >>Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > >>Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 01000000 > >>Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 24 > >>Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > > > >This indicates the controller on channel 1 (siisch1) is "stalled" > >waiting for underlying communication with the device attached to it. > > > >>Feb 7 23:57:59 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 00:13:36 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 00:21:53 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 00:22:16 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 00:39:13 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 01:24:25 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 01:33:52 backup3 last message repeated 2 times > >>Feb 8 01:43:45 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 01:50:31 backup3 last message repeated 2 times > >>Feb 8 01:55:20 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 02:26:26 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 02:27:24 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 03:16:28 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 03:36:20 backup3 kernel: siisch1: Error while READ LOG EXT > >>Feb 8 04:04:05 backup3 kernel: siisch1: Error while READ LOG EXT > > > >This indicates the underlying device was handed a READ LOG EXT ATA > >command (command 0x2f) and the device did not respond promptly > >(resulting in the timeout messages you see). > > There are hours between timeouts and READ LOG EXT errors. they are > not directly related, but may have the same reason. > > >>smartctl doesnt show any issues on the drives other than one that has some historical errors from a while ago. What are these errors and do I need to worry about them ? The "READ LOG EXT" ones are new. > >> > >>{snipping SMART stats} > > > >You're focused heavily on the READ LOG EXT command. READ LOG EXT is > >intended for accessing the GP Log section of a drive. EXT stands for > >"Extended". "GP Log" means "General Purpose Log", and is where all > >sorts of logging information regarding drive performance is stored. > >It's usually stored within a reserved section of the platters, or in the > >HPA area. It's not within a "standard" user-accessible LBA/sector > >region. This is a completely separate log from that of SMART logs. > > READ LOG EXT commands here used to fetch status of some failed NCQ > commands. It is normal (the only) way to get detailed error status > in that case. Error of the READ LOG EXT commands may mean that it is > not regular media error, but may be problem with communication, > firmware or something else. As usual -- thanks for your fantastic insights, Alexander. I wasn't aware that to get NCQ error conditions you had to read from the GPLog region, I was under the impression this was somehow returned via extended SATA status codes (since I with NCQ the internal ordering CDBs changes). Interesting. NCQ is still something (on an ATA and OS level, for lack of better terminology) I am not fully familiar with, though functionally I understand what it provides/gains. Where in the ata-cam code is this done? I'd like to look at it for educational reasons. Firmware problems are a possibility in this case, but I'm reluctant to believe problems with communication between controller and drive. His CRC error count on the drive itself is zero. So the only communication failure I can think of would be between the controller and the port multiplier (e.g. before anything actually reaches the drive), or some kind of firmware-level problem with the port multiplier device (I believe they do have some sort of ASIC on them? I've never used one). Mike, can you please provide output from the following commands? * smartctl -x /dev/ada9 * smartctl -l sataphy /dev/ada9 * smartctl -l devstat /dev/ada9 * smartctl -l gplog,0x10 /dev/ada9 Note that some of these may fail with some error equivalent to "GPLog 0xXX not supported", and that's perfectly okay. Thank you. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |