From: Jeremy Chadwick
To: Mike Tancsa
Cc: freebsd-stable@freebsd.org
Date: Sun, 18 Jul 2010 19:58:39 -0700
Subject: Re: deadlock or bad disk ? RELENG_8
Message-ID: <20100719025839.GA91809@icarus.home.lan>
In-Reply-To: <20100719023419.GA91006@icarus.home.lan>

On Sun, Jul 18, 2010 at 07:34:19PM -0700, Jeremy Chadwick wrote:
> Now I'm confused -- this indicates twa(4) is involved, not arcmsr(4).
>
> Can you please provide a verbose explanation of the configuration of the
> disks and controllers in this machine, including device and disk names
> and what they're associated with, plus if they're RAIDed in any way?
>
> Thanks.

I re-worked this out myself based on the OP's dmesg.  It's confusing
because there are literally 6 different storage controllers on a single
machine:

* arcmsr0  <--> irq 18 <--> Areca SATA Host Adapter RAID Controller
  siis0    <--> irq 17 <--> SiI3132 SATA controller
* twa0     <--> irq 18 <--> 3ware 9000 series Storage Controller
  ahci0    <--> irq 16 <--> JMicron JMB361 AHCI SATA controller
  atapci0  <--> irq 17 <--> JMicron JMB361 ATA controller
* ahci1    <--> irq 19 <--> Intel ICH10 AHCI SATA controller

Controllers marked with an asterisk (*) are in use/involved.  Others
don't appear to have anything connected to them.

Channels and which of the above controllers they're connected to.  Again,
an asterisk marks the ones in use:

  ahcich0  <--> ahci0
  ahcich1  <--> ahci0
  ata2     <--> atapci0
* ahcich2  <--> ahci1
* ahcich3  <--> ahci1
* ahcich4  <--> ahci1
* ahcich5  <--> ahci1
  ahcich6  <--> ahci1
  ahcich7  <--> ahci1

The dmesg output also shows this.  I have no idea what it means:

(probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step

Now we get into the disks.  The kernel interleaved the output from the
different drivers, so I had to work this out myself.

da0   <--> arcmsr0  <--> Areca usrvar   (RAID volume)
da1   <--> arcmsr0  <--> Areca backup1  (RAID volume)
da2   <--> twa0     <--> No idea, but looks like a RAID volume
ada0  <--> ahcich2  <--> ST31000340AS   (disk)
ada1  <--> ahcich3  <--> ST31000340AS   (disk)
ada2  <--> ahcich4  <--> ST31000333AS   (disk)
ada3  <--> ahcich5  <--> ST31000528AS   (disk)

So one thing of interest is that the Areca and 3ware controllers are
sharing an IRQ.  If you do extensive bidirectional I/O between disks on
the arcmsr0 and twa0 controllers at the same time (e.g. read from arcmsr0
which writes to twa0, and read from twa0 which writes to arcmsr0), do you
see this problem?
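
For what it's worth, the shared-IRQ observation above can be checked mechanically.  This is only a rough sketch in Python: the controller-to-IRQ mapping is copied by hand from the table earlier in this mail, and the names CONTROLLER_IRQ, IN_USE and shared_irqs are made up for this illustration, not anything FreeBSD provides.

```python
from collections import defaultdict

# Controller -> IRQ, copied by hand from the table in this mail
# (which was itself worked out from the OP's dmesg).
CONTROLLER_IRQ = {
    "arcmsr0": 18,
    "siis0": 17,
    "twa0": 18,
    "ahci0": 16,
    "atapci0": 17,
    "ahci1": 19,
}

# Controllers that actually have volumes or disks attached
# (the ones marked with an asterisk).
IN_USE = {"arcmsr0", "twa0", "ahci1"}


def shared_irqs(controller_irq, in_use):
    """Return {irq: [controllers]} for IRQs shared by two or more in-use controllers."""
    by_irq = defaultdict(list)
    for name, irq in controller_irq.items():
        if name in in_use:
            by_irq[irq].append(name)
    return {irq: sorted(names) for irq, names in by_irq.items() if len(names) > 1}


if __name__ == "__main__":
    for irq, names in sorted(shared_irqs(CONTROLLER_IRQ, IN_USE).items()):
        print(f"irq {irq}: {', '.join(names)}")
```

Restricted to the controllers in use, this reports only irq 18 (arcmsr0 and twa0).  Passing every controller name as in_use would also list irq 17 (atapci0 and siis0), but nothing appears to be connected to those.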

vmstat -i output would help here, except that it's going to show the rate
as a total (for both controllers).  I don't know of a way to get more
granular output.

pciconf -lvc output might also help (to see if the controllers are using
MSI or not); only interested in the arcmsr0, twa0, and ahci1 entries.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |