From owner-freebsd-scsi@FreeBSD.ORG  Mon Apr 18 11:07:08 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 418691065674
	for <freebsd-scsi@FreeBSD.org>; Mon, 18 Apr 2011 11:07:08 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 2650D8FC24
	for <freebsd-scsi@FreeBSD.org>; Mon, 18 Apr 2011 11:07:08 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p3IB78FL019606
	for <freebsd-scsi@FreeBSD.org>; Mon, 18 Apr 2011 11:07:08 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p3IB77Be019604
	for freebsd-scsi@FreeBSD.org; Mon, 18 Apr 2011 11:07:07 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 18 Apr 2011 11:07:07 GMT
Message-Id: <201104181107.p3IB77Be019604@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-scsi@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Apr 2011 11:07:08 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/154432  scsi       [xpt] run_interrupt_driven_hooks: still waiting after 
o kern/153361  scsi       [ciss] Smart Array 5300 boot/detect drive problem
o kern/152250  scsi       [ciss] [patch] Kernel panic when hw.ciss.expose_hidden
o kern/151564  scsi       [ciss] ciss(4) should increase  CISS_MAX_LOGICAL to 10
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
s kern/149927  scsi       [cam] hard drive not stopped before removing power dur
o kern/148083  scsi       [aac] Strange device reporting
o kern/147704  scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/146287  scsi       [ciss] ciss(4) cannot see more than one SmartArray con
o kern/145768  scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648  scsi       [aac] Strange values of speed and bus width in dmesg
o kern/144301  scsi       [ciss] [hang] HP proliant server locks when using ciss
o kern/142351  scsi       [mpt] LSILogic driver performance problems
o kern/141934  scsi       [cam] [patch] add support for SEAGATE DAT Scopion 130
o kern/134488  scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132250  scsi       [ciss] ciss driver does not support more then 15 drive
o kern/132206  scsi       [mpt] system panics on boot when mirroring and 2nd dri
o kern/130621  scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602  scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452  scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245  scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927  scsi       [isp] isp(4) target driver crashes kernel when set up 
o kern/127717  scsi       [ata] [patch] [request] - support write cache toggling
o kern/124667  scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi
o kern/123674  scsi       [ahc] ahc driver dumping
o kern/123520  scsi       [ahd] unable to boot from net while using ahd
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s 
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598   scsi       wire down of scsi devices conflicts with config
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o bin/57088    scsi       [cam] [patch] for a possible fd leak in libcam.c
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than 
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce

45 problems total.


From owner-freebsd-scsi@FreeBSD.ORG  Mon Apr 18 16:56:30 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4ADC3106566C;
	Mon, 18 Apr 2011 16:56:30 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 1EEF38FC16;
	Mon, 18 Apr 2011 16:56:30 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id C66AE46B06;
	Mon, 18 Apr 2011 12:56:29 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 3B6308A02B;
	Mon, 18 Apr 2011 12:56:29 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Andre Albsmeier <Andre.Albsmeier@siemens.com>
Date: Mon, 18 Apr 2011 09:18:25 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; )
References: <201102041444.p14EixJP087709@svn.freebsd.org>
	<201104151235.05114.jhb@freebsd.org>
	<20110418113657.GA6080@curry.mchp.siemens.de>
In-Reply-To: <20110418113657.GA6080@curry.mchp.siemens.de>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201104180918.26054.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Mon, 18 Apr 2011 12:56:29 -0400 (EDT)
Cc: "svn-src-stable-7@freebsd.org" <svn-src-stable-7@freebsd.org>,
	scsi@freebsd.org
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Apr 2011 16:56:30 -0000

On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote:
> On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > > Author: jhb
> > > > Date: Fri Feb  4 14:44:59 2011
> > > > New Revision: 218277
> > > > URL: http://svn.freebsd.org/changeset/base/218277
> > > > 
> > > > Log:
> > > >   MFC 217075:
> > > >   Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > > >   interrupt config hooks to execute.
> > > >   
> > > >   To preserve the KBI, I did not renumber priorities but simply removed
> > > >   PCONFIG.
> > > > 
> > > > Modified:
> > > >   stable/7/sys/kern/subr_autoconf.c
> > > >   stable/7/sys/sys/priority.h
> > > > Directory Properties:
> > > >   stable/7/sys/   (props changed)
> > > >   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
> > > >   stable/7/sys/contrib/dev/acpica/   (props changed)
> > > >   stable/7/sys/contrib/pf/   (props changed)
> > > > 
> > > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > > ==============================================================================
> > > > --- stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:42 2011	(r218276)
> > > > +++ stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:59 2011	(r218277)
> > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > > >  	warned = 0;
> > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > -		    PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > +		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > >  		    EWOULDBLOCK) {
> > > >  			mtx_unlock(&intr_config_hook_lock);
> > > >  			warned++;
> > > 
> > > 
> > > This broke several of my machines in a somewhat strange way:
> > > 
> > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > > I noticed that some (4) of them didn't start. All 4 didn't find
> > > their boot device anymore. What they all got in common is:
> > > 
> > > - an Adaptec 2940 Ultra SCSI adapter
> > > - two SCSI harddisks (da0 and da1) of various brands
> > > - one SCSI CDROM drive (cd0)
> > > 
> > > To be exact, none of the three devices (da0, da1, cd0) were
> > > detected at all. Other machines with a similar configuration
> > > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > > any problems. So I simply removed the CDROM drives on the 4
> > > machines in question and they all booted again.
> > > 
> > > Today I decided to dig into this and after reverting(*) the
> > > above change, they worked with the CDROM again. I have cross-
> > > checked it 3 times. No idea what's happening here...
> > > 
> > > 	-Andre
> > > 
> > > (*) To be honest, I use this patch so I had to modify only one file:
> > > 
> > > --- sys/kern/subr_autoconf.c.ORI	2011-02-05 13:14:11.000000000 +0100
> > > +++ sys/kern/subr_autoconf.c	2011-04-15 14:34:31.000000000 +0200
> > > @@ -108,7 +108,7 @@
> > >  	warned = 0;
> > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > -		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > +		    PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > >  		    EWOULDBLOCK) {
> > >  			mtx_unlock(&intr_config_hook_lock);
> > >  			warned++;
> > 
> > Do you get any warnings about CAM timeouts, etc. when these probe?  A verbose 
> > dmesg might be nice to look at if possible.
> 
> OK, I have set up a machine for testing. In my other mail
> I was wrong saying that the pass devices appear when using
> the problematic kernel...
> 
> Here are the dmesgs:
> 
> - dmesg_bad is the original kernel as of Friday
> - dmesg_ok is the patched kernel (see above) as of Friday
> - dmesg.diff is the diff between both
> 
> If you want me to try something just tell me...

Hmmm, what if you make SCSI_DELAY larger?  Also, can you let it fail the
mount and drop into ddb and then get 'ps' output?

I think the CAM boot probe is broken a bit.  xpt_rescan_done() always calls
xpt_release_boot(), but we don't hold the boot for each bus added while
buses_config_done is 0, so it seems CAM only waits for at least one bus to
rescan before it lets the boot continue?  This seems wrong (i.e. one would
think it would let all the busses added before this point scan before
continuing).

However, in your dmesg, it starts to print out an announcement for a pass
device before it starts mounting root, so it seems that xpt is finishing too
early somehow.

-- 
John Baldwin

From owner-freebsd-scsi@FreeBSD.ORG  Mon Apr 18 17:33:35 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4B2FA1065674
	for <scsi@freebsd.org>; Mon, 18 Apr 2011 17:33:35 +0000 (UTC)
	(envelope-from Andre.Albsmeier@siemens.com)
Received: from goliath.siemens.de (goliath.siemens.de [192.35.17.28])
	by mx1.freebsd.org (Postfix) with ESMTP id C37148FC13
	for <scsi@freebsd.org>; Mon, 18 Apr 2011 17:33:34 +0000 (UTC)
Received: from mail3.siemens.de (localhost [127.0.0.1])
	by goliath.siemens.de (8.13.6/8.13.6) with ESMTP id p3IHKWqG001226;
	Mon, 18 Apr 2011 19:20:32 +0200
Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.40.130])
	by mail3.siemens.de (8.13.6/8.13.6) with ESMTP id p3IHKW5N000886;
	Mon, 18 Apr 2011 19:20:32 +0200
Received: (from localhost)
	by curry.mchp.siemens.de (8.14.4/8.14.4) id p3IHKWUc017794;
Date: Mon, 18 Apr 2011 19:20:32 +0200
From: Andre Albsmeier <Andre.Albsmeier@siemens.com>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20110418172032.GA8849@curry.mchp.siemens.de>
References: <201102041444.p14EixJP087709@svn.freebsd.org>
	<201104151235.05114.jhb@freebsd.org>
	<20110418113657.GA6080@curry.mchp.siemens.de>
	<201104180918.26054.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201104180918.26054.jhb@freebsd.org>
X-Echelon: <censored>
X-Advice: Drop that crappy M$-Outlook, I'm tired of your viruses!
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: "svn-src-stable-7@freebsd.org" <svn-src-stable-7@freebsd.org>,
	"scsi@freebsd.org" <scsi@freebsd.org>
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Apr 2011 17:33:35 -0000

On Mon, 18-Apr-2011 at 15:18:25 +0200, John Baldwin wrote:
> On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote:
> > On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> > > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > > > Author: jhb
> > > > > Date: Fri Feb  4 14:44:59 2011
> > > > > New Revision: 218277
> > > > > URL: http://svn.freebsd.org/changeset/base/218277
> > > > > 
> > > > > Log:
> > > > >   MFC 217075:
> > > > >   Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > > > >   interrupt config hooks to execute.
> > > > >   
> > > > >   To preserve the KBI, I did not renumber priorities but simply removed
> > > > >   PCONFIG.
> > > > > 
> > > > > Modified:
> > > > >   stable/7/sys/kern/subr_autoconf.c
> > > > >   stable/7/sys/sys/priority.h
> > > > > Directory Properties:
> > > > >   stable/7/sys/   (props changed)
> > > > >   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
> > > > >   stable/7/sys/contrib/dev/acpica/   (props changed)
> > > > >   stable/7/sys/contrib/pf/   (props changed)
> > > > > 
> > > > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > > > ==============================================================================
> > > > > --- stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:42 2011	(r218276)
> > > > > +++ stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:59 2011	(r218277)
> > > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > > > >  	warned = 0;
> > > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > -		    PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > +		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > >  		    EWOULDBLOCK) {
> > > > >  			mtx_unlock(&intr_config_hook_lock);
> > > > >  			warned++;
> > > > 
> > > > 
> > > > This broke several of my machines in a somewhat strange way:
> > > > 
> > > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > > > I noticed that some (4) of them didn't start. All 4 didn't find
> > > > their boot device anymore. What they all got in common is:
> > > > 
> > > > - an Adaptec 2940 Ultra SCSI adapter
> > > > - two SCSI harddisks (da0 and da1) of various brands
> > > > - one SCSI CDROM drive (cd0)
> > > > 
> > > > To be exact, none of the three devices (da0, da1, cd0) were
> > > > detected at all. Other machines with a similar configuration
> > > > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > > > any problems. So I simply removed the CDROM drives on the 4
> > > > machines in question and they all booted again.
> > > > 
> > > > Today I decided to dig into this and after reverting(*) the
> > > > above change, they worked with the CDROM again. I have cross-
> > > > checked it 3 times. No idea what's happening here...
> > > > 
> > > > 	-Andre
> > > > 
> > > > (*) To be honest, I use this patch so I had to modify only one file:
> > > > 
> > > > --- sys/kern/subr_autoconf.c.ORI	2011-02-05 13:14:11.000000000 +0100
> > > > +++ sys/kern/subr_autoconf.c	2011-04-15 14:34:31.000000000 +0200
> > > > @@ -108,7 +108,7 @@
> > > >  	warned = 0;
> > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > -		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > +		    PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > >  		    EWOULDBLOCK) {
> > > >  			mtx_unlock(&intr_config_hook_lock);
> > > >  			warned++;
> > > 
> > > Do you get any warnings about CAM timeouts, etc. when these probe?  A verbose 
> > > dmesg might be nice to look at if possible.
> > 
> > OK, I have set up a machine for testing. In my other mail
> > I was wrong saying that the pass devices appear when using
> > the problematic kernel...
> > 
> > Here are the dmesgs:
> > 
> > - dmesg_bad is the original kernel as of Friday
> > - dmesg_ok is the patched kernel (see above) as of Friday
> > - dmesg.diff is the diff between both
> > 
> > If you want me to try something just tell me...
> 
> Hmmm, what if you make SCSI_DELAY larger?  Also, can you let it fail the

I tried this already. Normally, I use 500 (which works on all machines)
and retried with 5000. No change...

> mount and drop into ddb and then get 'ps' output?

I will try it tomorrow.

> 
> I think the CAM boot probe is broken a bit.  xpt_rescan_done() always calls
> xpt_release_boot(), but we don't hold the boot for each bus added while
> buses_config_done is 0, so it seems CAM only waits for at least one bus to
> rescan before it lets the boot continue?  This seems wrong (i.e. one would
> think it would let all the busses added before this point scan before
> continuing).

Hmm, I got only one SCSI bus in that machine. Of my 17 machines, 15 are
SCSI-based. Only 4 had this problem. One of them has got two busses,
the others only one.

> 
> However, in your dmesg, it starts to print out an announcement for a pass
> device before it starts mounting root, so it seems that xpt is finishing too
> early somehow.

Yes, I saw this as well. And in the "good" dmesg there are these
two lines which look a bit screwed up:

GEOM: nda0 at ahc0 bus 0 target 0 lun 0
...
ew disk da1

Don't know if this indicates some problem...

As I said, when I first ran into this problem last week, I _THINK_ I saw
the pass devices appear at least on one broken box. But I won't swear
on this. Another thing I remember is that there was at least one problematic
box which booted successfully on the second try.

Thanks,

	-Andre

> 
> -- 
> John Baldwin

-- 
Linux: Sozialismus, der nicht funktioniert

From owner-freebsd-scsi@FreeBSD.ORG  Tue Apr 19 12:56:52 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1E8E91065674;
	Tue, 19 Apr 2011 12:56:50 +0000 (UTC)
	(envelope-from Andre.Albsmeier@siemens.com)
Received: from goliath.siemens.de (goliath.siemens.de [192.35.17.28])
	by mx1.freebsd.org (Postfix) with ESMTP id 362928FC22;
	Tue, 19 Apr 2011 12:56:49 +0000 (UTC)
Received: from mail2.siemens.de (localhost [127.0.0.1])
	by goliath.siemens.de (8.13.6/8.13.6) with ESMTP id p3JCumXC030826;
	Tue, 19 Apr 2011 14:56:48 +0200
Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.40.130])
	by mail2.siemens.de (8.13.6/8.13.6) with ESMTP id p3JCumIP006686;
	Tue, 19 Apr 2011 14:56:48 +0200
Received: (from localhost)
	by curry.mchp.siemens.de (8.14.4/8.14.4) id p3JCumrp020303;
Date: Tue, 19 Apr 2011 14:56:48 +0200
From: Andre Albsmeier <Andre.Albsmeier@siemens.com>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20110419125648.GA17780@curry.mchp.siemens.de>
References: <201102041444.p14EixJP087709@svn.freebsd.org>
	<201104151235.05114.jhb@freebsd.org>
	<20110418113657.GA6080@curry.mchp.siemens.de>
	<201104180918.26054.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201104180918.26054.jhb@freebsd.org>
X-Echelon: <censored>
X-Advice: Drop that crappy M$-Outlook, I'm tired of your viruses!
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: "svn-src-stable-7@freebsd.org" <svn-src-stable-7@freebsd.org>,
	"scsi@freebsd.org" <scsi@freebsd.org>
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Apr 2011 12:56:52 -0000

On Mon, 18-Apr-2011 at 15:18:25 +0200, John Baldwin wrote:
> On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote:
> > On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> > > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > > > Author: jhb
> > > > > Date: Fri Feb  4 14:44:59 2011
> > > > > New Revision: 218277
> > > > > URL: http://svn.freebsd.org/changeset/base/218277
> > > > > 
> > > > > Log:
> > > > >   MFC 217075:
> > > > >   Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > > > >   interrupt config hooks to execute.
> > > > >   
> > > > >   To preserve the KBI, I did not renumber priorities but simply removed
> > > > >   PCONFIG.
> > > > > 
> > > > > Modified:
> > > > >   stable/7/sys/kern/subr_autoconf.c
> > > > >   stable/7/sys/sys/priority.h
> > > > > Directory Properties:
> > > > >   stable/7/sys/   (props changed)
> > > > >   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
> > > > >   stable/7/sys/contrib/dev/acpica/   (props changed)
> > > > >   stable/7/sys/contrib/pf/   (props changed)
> > > > > 
> > > > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > > > ==============================================================================
> > > > > --- stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:42 2011	(r218276)
> > > > > +++ stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:59 2011	(r218277)
> > > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > > > >  	warned = 0;
> > > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > -		    PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > +		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > >  		    EWOULDBLOCK) {
> > > > >  			mtx_unlock(&intr_config_hook_lock);
> > > > >  			warned++;
> > > > 
> > > > 
> > > > This broke several of my machines in a somewhat strange way:
> > > > 
> > > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > > > I noticed that some (4) of them didn't start. All 4 didn't find
> > > > their boot device anymore. What they all got in common is:
> > > > 
> > > > - an Adaptec 2940 Ultra SCSI adapter
> > > > - two SCSI harddisks (da0 and da1) of various brands
> > > > - one SCSI CDROM drive (cd0)
> > > > 
> > > > To be exact, none of the three devices (da0, da1, cd0) were
> > > > detected at all. Other machines with a similar configuration
> > > > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > > > any problems. So I simply removed the CDROM drives on the 4
> > > > machines in question and they all booted again.
> > > > 
> > > > Today I decided to dig into this and after reverting(*) the
> > > > above change, they worked with the CDROM again. I have cross-
> > > > checked it 3 times. No idea what's happening here...
> > > > 
> > > > 	-Andre
> > > > 
> > > > (*) To be honest, I use this patch so I had to modify only one file:
> > > > 
> > > > --- sys/kern/subr_autoconf.c.ORI	2011-02-05 13:14:11.000000000 +0100
> > > > +++ sys/kern/subr_autoconf.c	2011-04-15 14:34:31.000000000 +0200
> > > > @@ -108,7 +108,7 @@
> > > >  	warned = 0;
> > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > -		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > +		    PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > >  		    EWOULDBLOCK) {
> > > >  			mtx_unlock(&intr_config_hook_lock);
> > > >  			warned++;
> > > 
> > > Do you get any warnings about CAM timeouts, etc. when these probe?  A verbose 
> > > dmesg might be nice to look at if possible.
> > 
> > OK, I have set up a machine for testing. In my other mail
> > I was wrong saying that the pass devices appear when using
> > the problematic kernel...
> > 
> > Here are the dmesgs:
> > 
> > - dmesg_bad is the original kernel as of Friday
> > - dmesg_ok is the patched kernel (see above) as of Friday
> > - dmesg.diff is the diff between both
> > 
> > If you want me to try something just tell me...
> 
> Hmmm, what if you make SCSI_DELAY larger?  Also, can you let it fail the
> mount and drop into ddb and then get 'ps' output?

As soon as I include the debugger in the kernel, the problem
is gone. I have double-checked it twice now: with the debugger
the drives are detected; without it they are mostly (but not
always) not detected.

I currently have it running in an endless reboot loop, hoping
that it fails eventually...

	-Andre

> 
> I think the CAM boot probe is broken a bit.  xpt_rescan_done() always calls
> xpt_release_boot(), but we don't hold the boot for each bus added while
> buses_config_done is 0, so it seems CAM only waits for at least one bus to
> rescan before it lets the boot continue?  This seems wrong (i.e. one would
> think it would let all the busses added before this point scan before
> continuing).
> 
> However, in your dmesg, it starts to print out an announcement for a pass
> device before it starts mounting root, so it seems that xpt is finishing too
> early somehow.
> 
> -- 
> John Baldwin

-- 
UNIX is an operating system, OS/2 is half an operating system,
Windows is a shell, and DOS is a bootsector virus.

From owner-freebsd-scsi@FreeBSD.ORG  Tue Apr 19 13:22:27 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CEA97106566B;
	Tue, 19 Apr 2011 13:22:27 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id A3AA38FC16;
	Tue, 19 Apr 2011 13:22:27 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 3D9F446B51;
	Tue, 19 Apr 2011 09:22:27 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id B24A28A02A;
	Tue, 19 Apr 2011 09:22:26 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Andre Albsmeier <Andre.Albsmeier@siemens.com>
Date: Tue, 19 Apr 2011 09:20:25 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; )
References: <201102041444.p14EixJP087709@svn.freebsd.org>
	<201104180918.26054.jhb@freebsd.org>
	<20110419125648.GA17780@curry.mchp.siemens.de>
In-Reply-To: <20110419125648.GA17780@curry.mchp.siemens.de>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201104190920.25924.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Tue, 19 Apr 2011 09:22:26 -0400 (EDT)
Cc: "svn-src-stable-7@freebsd.org" <svn-src-stable-7@freebsd.org>,
	"scsi@freebsd.org" <scsi@freebsd.org>
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Apr 2011 13:22:27 -0000

On Tuesday, April 19, 2011 8:56:48 am Andre Albsmeier wrote:
> On Mon, 18-Apr-2011 at 15:18:25 +0200, John Baldwin wrote:
> > On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote:
> > > On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> > > > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > > > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > > > > Author: jhb
> > > > > > Date: Fri Feb  4 14:44:59 2011
> > > > > > New Revision: 218277
> > > > > > URL: http://svn.freebsd.org/changeset/base/218277
> > > > > > 
> > > > > > Log:
> > > > > >   MFC 217075:
> > > > > >   Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > > > > >   interrupt config hooks to execute.
> > > > > >   
> > > > > >   To preserve the KBI, I did not renumber priorities but simply removed
> > > > > >   PCONFIG.
> > > > > > 
> > > > > > Modified:
> > > > > >   stable/7/sys/kern/subr_autoconf.c
> > > > > >   stable/7/sys/sys/priority.h
> > > > > > Directory Properties:
> > > > > >   stable/7/sys/   (props changed)
> > > > > >   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
> > > > > >   stable/7/sys/contrib/dev/acpica/   (props changed)
> > > > > >   stable/7/sys/contrib/pf/   (props changed)
> > > > > > 
> > > > > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > > > > ==============================================================================
> > > > > > --- stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:42 2011	(r218276)
> > > > > > +++ stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:59 2011	(r218277)
> > > > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > > > > >  	warned = 0;
> > > > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > > -		    PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > > +		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > >  		    EWOULDBLOCK) {
> > > > > >  			mtx_unlock(&intr_config_hook_lock);
> > > > > >  			warned++;
> > > > > 
> > > > > 
> > > > > This broke several of my machines in a somewhat strange way:
> > > > > 
> > > > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > > > > I noticed that some (4) of them didn't start. All 4 didn't find
> > > > > their boot device anymore. What they all got in common is:
> > > > > 
> > > > > - an Adaptec 2940 Ultra SCSI adapter
> > > > > - two SCSI harddisks (da0 and da1) of various brands
> > > > > - one SCSI CDROM drive (cd0)
> > > > > 
> > > > > To be exact, none of the three devices (da0, da1, cd0) were
> > > > > detected at all. Other machines with a similar configuration
> > > > > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > > > > any problems. So I simply removed the CDROM drives on the 4
> > > > > machines in question and they all booted again.
> > > > > 
> > > > > Today I decided to dig into this and after reverting(*) the
> > > > > above change, they worked with the CDROM again. I have cross-
> > > > > checked it 3 times. No idea what's happening here...
> > > > > 
> > > > > 	-Andre
> > > > > 
> > > > > (*) To be honest, I use this patch so I had to modify only one file:
> > > > > 
> > > > > --- sys/kern/subr_autoconf.c.ORI	2011-02-05 13:14:11.000000000 +0100
> > > > > +++ sys/kern/subr_autoconf.c	2011-04-15 14:34:31.000000000 +0200
> > > > > @@ -108,7 +108,7 @@
> > > > >  	warned = 0;
> > > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > -		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > +		    PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > >  		    EWOULDBLOCK) {
> > > > >  			mtx_unlock(&intr_config_hook_lock);
> > > > >  			warned++;
> > > > 
> > > > Do you get any warnings about CAM timeouts, etc. when these probe?  A verbose 
> > > > dmesg might be nice to look at if possible.
> > > 
> > > OK, I have set up a machine for testing. In my other mail
> > > I was wrong saying that the pass devices appear when using
> > > the problematic kernel...
> > > 
> > > Here are the dmesgs:
> > > 
> > > - dmesg_bad is the original kernel as of Friday
> > > - dmesg_ok is the patched kernel (see above) as of Friday
> > > - dmesg.diff is the diff between both
> > > 
> > > If you want me to try something just tell me...
> > 
> > Hmmm, what if you make SCSI_DELAY larger?  Also, can you let it fail the
> > mount and drop into ddb and then get 'ps' output?
> 
> As soon as I include the debugger into the kernel the problem
> is gone. I have double-checked it two times now: With debugger
> the drives are detected, without debugger mostly (but not always)
> not.
> 
> I currently have it running in an endless rebooting loop hoping,
> that it fails eventually...

Hummm.  This seems like it is a timing related race. :(
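
For anyone following the quoted diff: below is a small userland model of how
the priority argument to msleep() is interpreted, to show what changing
PCONFIG to 0 does. This is only a sketch, not kernel code. The function
priority_after_sleep() and the name PCONFIG_OLD are made up for illustration,
and PRI_MIN_KERN = 64 is an assumption about stable/7's sys/priority.h.

```c
/* Flag bits and priority mask as used by msleep(9); values as in sys/param.h. */
#define PRIMASK      0x0ff
#define PCATCH       0x100

/* Assumed value for stable/7; check sys/priority.h before relying on it. */
#define PRI_MIN_KERN 64
/* PCONFIG before it was retired (the value used in Andre's local patch). */
#define PCONFIG_OLD  (PRI_MIN_KERN + 32)

/*
 * Hypothetical model: return the priority a thread ends up with after
 * sleeping with the given msleep() priority argument. A priority of 0
 * (after masking off the flag bits) leaves the current priority alone.
 */
static int
priority_after_sleep(int current_pri, int msleep_arg)
{
	int pri = msleep_arg & PRIMASK;

	return (pri != 0) ? pri : current_pri;
}
```

With r218277 the argument is 0, so the sleeping thread keeps whatever priority
it already had; with the old PCONFIG (or PRI_MIN_KERN + 32) it would be set to
96 under the assumption above.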

-- 
John Baldwin

From owner-freebsd-scsi@FreeBSD.ORG  Tue Apr 19 13:26:51 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C9F5A106564A;
	Tue, 19 Apr 2011 13:26:51 +0000 (UTC)
	(envelope-from Andre.Albsmeier@siemens.com)
Received: from goliath.siemens.de (goliath.siemens.de [192.35.17.28])
	by mx1.freebsd.org (Postfix) with ESMTP id 4CBCD8FC16;
	Tue, 19 Apr 2011 13:26:51 +0000 (UTC)
Received: from mail1.siemens.de (localhost [127.0.0.1])
	by goliath.siemens.de (8.13.6/8.13.6) with ESMTP id p3JDQoMM003051;
	Tue, 19 Apr 2011 15:26:50 +0200
Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.40.130])
	by mail1.siemens.de (8.13.6/8.13.6) with ESMTP id p3JDQoR7016930;
	Tue, 19 Apr 2011 15:26:50 +0200
Received: (from localhost)
	by curry.mchp.siemens.de (8.14.4/8.14.4) id p3JDQorH020367;
Date: Tue, 19 Apr 2011 15:26:50 +0200
From: Andre Albsmeier <Andre.Albsmeier@siemens.com>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20110419132650.GA17934@curry.mchp.siemens.de>
References: <201102041444.p14EixJP087709@svn.freebsd.org>
	<201104180918.26054.jhb@freebsd.org>
	<20110419125648.GA17780@curry.mchp.siemens.de>
	<201104190920.25924.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201104190920.25924.jhb@freebsd.org>
X-Echelon: <censored>
X-Advice: Drop that crappy M$-Outlook, I'm tired of your viruses!
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: "svn-src-stable-7@freebsd.org" <svn-src-stable-7@freebsd.org>,
	"scsi@freebsd.org" <scsi@freebsd.org>
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Apr 2011 13:26:51 -0000

On Tue, 19-Apr-2011 at 15:20:25 +0200, John Baldwin wrote:
> On Tuesday, April 19, 2011 8:56:48 am Andre Albsmeier wrote:
> > On Mon, 18-Apr-2011 at 15:18:25 +0200, John Baldwin wrote:
> > > On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote:
> > > > On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> > > > > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > > > > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > > > > > Author: jhb
> > > > > > > Date: Fri Feb  4 14:44:59 2011
> > > > > > > New Revision: 218277
> > > > > > > URL: http://svn.freebsd.org/changeset/base/218277
> > > > > > > 
> > > > > > > Log:
> > > > > > >   MFC 217075:
> > > > > > >   Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > > > > > >   interrupt config hooks to execute.
> > > > > > >   
> > > > > > >   To preserve the KBI, I did not renumber priorities but simply removed
> > > > > > >   PCONFIG.
> > > > > > > 
> > > > > > > Modified:
> > > > > > >   stable/7/sys/kern/subr_autoconf.c
> > > > > > >   stable/7/sys/sys/priority.h
> > > > > > > Directory Properties:
> > > > > > >   stable/7/sys/   (props changed)
> > > > > > >   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
> > > > > > >   stable/7/sys/contrib/dev/acpica/   (props changed)
> > > > > > >   stable/7/sys/contrib/pf/   (props changed)
> > > > > > > 
> > > > > > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > > > > > ==============================================================================
> > > > > > > --- stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:42 2011	(r218276)
> > > > > > > +++ stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:59 2011	(r218277)
> > > > > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > > > > > >  	warned = 0;
> > > > > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > > > -		    PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > > > +		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > > >  		    EWOULDBLOCK) {
> > > > > > >  			mtx_unlock(&intr_config_hook_lock);
> > > > > > >  			warned++;
> > > > > > 
> > > > > > 
> > > > > > This broke several of my machines in a somewhat strange way:
> > > > > > 
> > > > > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > > > > > I noticed that some (4) of them didn't start. All 4 didn't find
> > > > > > their boot device anymore. What they all got in common is:
> > > > > > 
> > > > > > - an Adaptec 2940 Ultra SCSI adapter
> > > > > > - two SCSI harddisks (da0 and da1) of various brands
> > > > > > - one SCSI CDROM drive (cd0)
> > > > > > 
> > > > > > To be exact, none of the three devices (da0, da1, cd0) were
> > > > > > detected at all. Other machines with a similar configuration
> > > > > > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > > > > > any problems. So I simply removed the CDROM drives on the 4
> > > > > > machines in question and they all booted again.
> > > > > > 
> > > > > > Today I decided to dig into this and after reverting(*) the
> > > > > > above change, they worked with the CDROM again. I have cross-
> > > > > > checked it 3 times. No idea what's happening here...
> > > > > > 
> > > > > > 	-Andre
> > > > > > 
> > > > > > (*) To be honest, I use this patch so I had to modify only one file:
> > > > > > 
> > > > > > --- sys/kern/subr_autoconf.c.ORI	2011-02-05 13:14:11.000000000 +0100
> > > > > > +++ sys/kern/subr_autoconf.c	2011-04-15 14:34:31.000000000 +0200
> > > > > > @@ -108,7 +108,7 @@
> > > > > >  	warned = 0;
> > > > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > > -		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > > +		    PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > >  		    EWOULDBLOCK) {
> > > > > >  			mtx_unlock(&intr_config_hook_lock);
> > > > > >  			warned++;
> > > > > 
> > > > > Do you get any warnings about CAM timeouts, etc. when these probe?  A verbose 
> > > > > dmesg might be nice to look at if possible.
> > > > 
> > > > OK, I have set up a machine for testing. In my other mail
> > > > I was wrong saying that the pass devices appear when using
> > > > the problematic kernel...
> > > > 
> > > > Here are the dmesgs:
> > > > 
> > > > - dmesg_bad is the original kernel as of Friday
> > > > - dmesg_ok is the patched kernel (see above) as of Friday
> > > > - dmesg.diff is the diff between both
> > > > 
> > > > If you want me to try something just tell me...
> > > 
> > > Hmmm, what if you make SCSI_DELAY larger?  Also, can you let it fail the
> > > mount and drop into ddb and then get 'ps' output?
> > 
> > As soon as I include the debugger into the kernel the problem
> > is gone. I have double-checked it two times now: With debugger
> > the drives are detected, without debugger mostly (but not always)
> > not.
> > 
> > I currently have it running in an endless rebooting loop hoping,
> > that it fails eventually...
> 
> Hummm.  This seems like it is a timing related race. :(

Yes, especially since it does not fail reliably -- even when
using a kernel without debugger...

	-Andre

> 
> -- 
> John Baldwin

-- 
C:\>WIN
The computer obeys and wins.
You lose and Bill collects.

From owner-freebsd-scsi@FreeBSD.ORG  Wed Apr 20 05:50:54 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4BBAF106564A
	for <scsi@freebsd.org>; Wed, 20 Apr 2011 05:50:54 +0000 (UTC)
	(envelope-from Andre.Albsmeier@siemens.com)
Received: from thoth.sbs.de (thoth.sbs.de [192.35.17.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 6326C8FC14
	for <scsi@freebsd.org>; Wed, 20 Apr 2011 05:50:53 +0000 (UTC)
Received: from mail2.siemens.de (localhost [127.0.0.1])
	by thoth.sbs.de (8.13.6/8.13.6) with ESMTP id p3K5WRQh015819;
	Wed, 20 Apr 2011 07:32:27 +0200
Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.40.130])
	by mail2.siemens.de (8.13.6/8.13.6) with ESMTP id p3K5WQqJ024901;
	Wed, 20 Apr 2011 07:32:26 +0200
Received: (from localhost)
	by curry.mchp.siemens.de (8.14.4/8.14.4) id p3K5WQf5023278;
Date: Wed, 20 Apr 2011 07:32:26 +0200
From: Andre Albsmeier <Andre.Albsmeier@siemens.com>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20110420053226.GA22854@curry.mchp.siemens.de>
References: <201102041444.p14EixJP087709@svn.freebsd.org>
	<201104180918.26054.jhb@freebsd.org>
	<20110419125648.GA17780@curry.mchp.siemens.de>
	<201104190920.25924.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201104190920.25924.jhb@freebsd.org>
X-Echelon: <censored>
X-Advice: Drop that crappy M$-Outlook, I'm tired of your viruses!
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: "svn-src-stable-7@freebsd.org" <svn-src-stable-7@freebsd.org>,
	"scsi@freebsd.org" <scsi@freebsd.org>
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Apr 2011 05:50:54 -0000

On Tue, 19-Apr-2011 at 15:20:25 +0200, John Baldwin wrote:
> On Tuesday, April 19, 2011 8:56:48 am Andre Albsmeier wrote:
> > On Mon, 18-Apr-2011 at 15:18:25 +0200, John Baldwin wrote:
> > > On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote:
> > > > On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> > > > > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > > > > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > > > > > Author: jhb
> > > > > > > Date: Fri Feb  4 14:44:59 2011
> > > > > > > New Revision: 218277
> > > > > > > URL: http://svn.freebsd.org/changeset/base/218277
> > > > > > > 
> > > > > > > Log:
> > > > > > >   MFC 217075:
> > > > > > >   Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > > > > > >   interrupt config hooks to execute.
> > > > > > >   
> > > > > > >   To preserve the KBI, I did not renumber priorities but simply removed
> > > > > > >   PCONFIG.
> > > > > > > 
> > > > > > > Modified:
> > > > > > >   stable/7/sys/kern/subr_autoconf.c
> > > > > > >   stable/7/sys/sys/priority.h
> > > > > > > Directory Properties:
> > > > > > >   stable/7/sys/   (props changed)
> > > > > > >   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
> > > > > > >   stable/7/sys/contrib/dev/acpica/   (props changed)
> > > > > > >   stable/7/sys/contrib/pf/   (props changed)
> > > > > > > 
> > > > > > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > > > > > ==============================================================================
> > > > > > > --- stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:42 2011	(r218276)
> > > > > > > +++ stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:59 2011	(r218277)
> > > > > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > > > > > >  	warned = 0;
> > > > > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > > > -		    PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > > > +		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > > >  		    EWOULDBLOCK) {
> > > > > > >  			mtx_unlock(&intr_config_hook_lock);
> > > > > > >  			warned++;
> > > > > > 
> > > > > > 
> > > > > > This broke several of my machines in a somewhat strange way:
> > > > > > 
> > > > > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > > > > > I noticed that some (4) of them didn't start. All 4 didn't find
> > > > > > their boot device anymore. What they all got in common is:
> > > > > > 
> > > > > > - an Adaptec 2940 Ultra SCSI adapter
> > > > > > - two SCSI harddisks (da0 and da1) of various brands
> > > > > > - one SCSI CDROM drive (cd0)
> > > > > > 
> > > > > > To be exact, none of the three devices (da0, da1, cd0) were
> > > > > > detected at all. Other machines with a similar configuration
> > > > > > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > > > > > any problems. So I simply removed the CDROM drives on the 4
> > > > > > machines in question and they all booted again.
> > > > > > 
> > > > > > Today I decided to dig into this and after reverting(*) the
> > > > > > above change, they worked with the CDROM again. I have cross-
> > > > > > checked it 3 times. No idea what's happening here...
> > > > > > 
> > > > > > 	-Andre
> > > > > > 
> > > > > > (*) To be honest, I use this patch so I had to modify only one file:
> > > > > > 
> > > > > > --- sys/kern/subr_autoconf.c.ORI	2011-02-05 13:14:11.000000000 +0100
> > > > > > +++ sys/kern/subr_autoconf.c	2011-04-15 14:34:31.000000000 +0200
> > > > > > @@ -108,7 +108,7 @@
> > > > > >  	warned = 0;
> > > > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > > -		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > > +		    PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > >  		    EWOULDBLOCK) {
> > > > > >  			mtx_unlock(&intr_config_hook_lock);
> > > > > >  			warned++;
> > > > > 
> > > > > Do you get any warnings about CAM timeouts, etc. when these probe?  A verbose 
> > > > > dmesg might be nice to look at if possible.
> > > > 
> > > > OK, I have set up a machine for testing. In my other mail
> > > > I was wrong saying that the pass devices appear when using
> > > > the problematic kernel...
> > > > 
> > > > Here are the dmesgs:
> > > > 
> > > > - dmesg_bad is the original kernel as of Friday
> > > > - dmesg_ok is the patched kernel (see above) as of Friday
> > > > - dmesg.diff is the diff between both
> > > > 
> > > > If you want me to try something just tell me...
> > > 
> > > Hmmm, what if you make SCSI_DELAY larger?  Also, can you let it fail the
> > > mount and drop into ddb and then get 'ps' output?
> > 
> > As soon as I include the debugger into the kernel the problem
> > is gone. I have double-checked it two times now: With debugger
> > the drives are detected, without debugger mostly (but not always)
> > not.
> > 
> > I currently have it running in an endless rebooting loop hoping,
> > that it fails eventually...
> 
> Hummm.  This seems like it is a timing related race. :(

Success! Sometime during the night it finally panicked, even with the
debugger in the kernel. Here is the output of 'ps' and some other
commands I remembered (no idea whether any of them make sense in this
context :-)). The machine is still in this state with the serial console
attached, so just tell me what to type ;-).


KDB: enter: manual escape to debugger
[thread pid 1 tid 100001 ]
Stopped at      kdb_enter_why+0x3b:     xorl    %eax,%eax
db> ps
  pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
   35     0     0     0  RL                          [softdepflush]
   34     0     0     0  RL                          [syncer]
   33     0     0     0  RL                          [vnlru]
   32     0     0     0  RL                          [bufdaemon]
   31     0     0     0  RL                          [pagezero]
   30     0     0     0  RL                          [idlepoll]
   29     0     0     0  RL                          [vmdaemon]
   28     0     0     0  RL                          [pagedaemon]
   27     0     0     0  WL                          [irq1: atkbd0]
   26     0     0     0  WL                          [swi0: uart uart]
   25     0     0     0  SL      -        0xc182a63c [fdc0]
   24     0     0     0  SL      idle     0xc1829600 [aic_recovery0]
   23     0     0     0  WL                          [irq11: ahc0]
   22     0     0     0  SL      idle     0xc1829600 [aic_recovery0]
   21     0     0     0  WL                          [irq10: fxp0]
   20     0     0     0  WL                          [irq9: acpi0 intsmb0]
   19     0     0     0  SL      -        0xc181b800 [kqueue taskq]
   18     0     0     0  WL                          [swi6: task queue]
   17     0     0     0  WL                          [swi6: Giant taskq]
    9     0     0     0  RL                          [thread taskq]
   16     0     0     0  WL                          [swi5: Fast task queue]
   15     0     0     0  WL                          [swi2: cambio]
    8     0     0     0  SL      ccb_scan 0xc0766714 [xpt_thrd]
    7     0     0     0  SL      -        0xc181bd80 [acpi_task_2]
    6     0     0     0  SL      -        0xc181bd80 [acpi_task_1]
    5     0     0     0  SL      -        0xc181bd80 [acpi_task_0]
   14     0     0     0  SL      -        0xc077be54 [yarrow]
    4     0     0     0  SL      -        0xc077942c [g_down]
    3     0     0     0  SL      -        0xc0779428 [g_up]
    2     0     0     0  SL      -        0xc0779420 [g_event]
   13     0     0     0  WL                          [swi3: vm]
   12     0     0     0  LL     *Giant    0xc1821dc0 [swi4: clock]
   11     0     0     0  WL                          [swi1: net]
   10     0     0     0  RL                          [idle]
    1     0     0     0  RL      CPU 0               [swapper]
    0     0     0     0  SLs     sched    0xc07794c0 [swapper]
db> show threads
  100035 (0xc1a474c0)  fork_trampoline() at fork_trampoline
  100034 (0xc19a8000)  fork_trampoline() at fork_trampoline
  100033 (0xc19a8260)  fork_trampoline() at fork_trampoline
  100032 (0xc19a84c0)  fork_trampoline() at fork_trampoline
  100031 (0xc19a8720)  fork_trampoline() at fork_trampoline
  100030 (0xc19a8980)  fork_trampoline() at fork_trampoline
  100029 (0xc19a8be0)  fork_trampoline() at fork_trampoline
  100028 (0xc19a9000)  fork_trampoline() at fork_trampoline
  100027 (0xc19a9260)  fork_trampoline() at fork_trampoline
  100026 (0xc19a94c0)  fork_trampoline() at fork_trampoline
  100025 (0xc19a9720)  sched_switch(c19a9720,0,1,7a3b3f3c,11,...) at sched_switch+0xa0
  100024 (0xc1855720)  sched_switch(c1855720,0,1,2419c5c9,11,...) at sched_switch+0xa0
  100023 (0xc1855980)  sched_switch(c1855980,0,1,b2704d4a,11,...) at sched_switch+0xa0
  100022 (0xc1855be0)  sched_switch(c1855be0,0,1,2419b12b,11,...) at sched_switch+0xa0
  100021 (0xc18ad000)  fork_trampoline() at fork_trampoline
  100020 (0xc18ad260)  fork_trampoline() at fork_trampoline
  100019 (0xc18ad4c0)  sched_switch(c18ad4c0,0,1,241f3621,11,...) at sched_switch+0xa0
  100018 (0xc18ad720)  fork_trampoline() at fork_trampoline
  100017 (0xc18ad980)  fork_trampoline() at fork_trampoline
  100016 (0xc18adbe0)  sched_switch(c18adbe0,0,4,b27f7ade,11,...) at sched_switch+0xa0
  100015 (0xc183e260)  fork_trampoline() at fork_trampoline
  100014 (0xc183e4c0)  sched_switch(c183e4c0,0,1,b270879a,11,...) at sched_switch+0xa0
  100013 (0xc183e720)  sched_switch(c183e720,0,1,24199731,11,...) at sched_switch+0xa0
  100012 (0xc183e980)  sched_switch(c183e980,0,1,241f0bf7,11,...) at sched_switch+0xa0
  100011 (0xc183ebe0)  sched_switch(c183ebe0,0,1,241ef695,11,...) at sched_switch+0xa0
  100010 (0xc1855000)  sched_switch(c1855000,0,1,241ee335,11,...) at sched_switch+0xa0
  100009 (0xc1855260)  sched_switch(c1855260,0,1,b01ec712,11,...) at sched_switch+0xa0
  100008 (0xc18554c0)  sched_switch(c18554c0,0,1,241960e3,11,...) at sched_switch+0xa0
  100007 (0xc183d000)  sched_switch(c183d000,0,1,241943af,11,...) at sched_switch+0xa0
  100006 (0xc183d260)  sched_switch(c183d260,0,1,b01ecf55,11,...) at sched_switch+0xa0
  100005 (0xc183d4c0)  fork_trampoline() at fork_trampoline
  100004 (0xc183d720)  sched_switch(c183d720,0,1,b357cafa,11,...) at sched_switch+0xa0
  100003 (0xc183d980)  fork_trampoline() at fork_trampoline
  100002 (0xc183dbe0)  sched_switch(c183dbe0,0,6,b2703e6d,11,...) at sched_switch+0xa0
  100001 (0xc183e000)  kdb_enter_why(c06feb01,c0708b8f,ffffffff,c17e7b6c,c06bf6f1,...) at kdb_enter_why+0x3b
  100000 (0xc07797a0)  sched_switch(c07797a0,0,1,b28c7b89,11,...) at sched_switch+0xa0
db> show thread
Thread 100001 at 0xc183e000:
 proc (pid 1): 0xc183c000
 flags: 0x10005  pflags: 0
 state: RUNNING (CPU 0)
 priority: 52
db> show geom
class: FD (0xc0758720)
  geom: fd0 (0xc19e9880), rank=1
    provider: fd0 (0xc19e9800), access=r0w0e0
      consumer: 0xc19e4280 (fd0), access=r0w0e0

class: DEV (0xc073bca0)
  geom: fd0 (0xc19e9680), rank=2
    consumer: 0xc19e4280 (fd0), access=r0w0e0

class: PART (0xc073c4a0)

class: VFS (0xc073c3a0)

class: MBR (0xc073c320)

class: MBREXT (0xc073c2c0)

class: BSD (0xc073bbc0)

class: MD (0xc0737400)

class: SWAP (0xc0754e20)

class: DISK (0xc073bda0)

db> trace
Tracing pid 1 tid 100001 td 0xc183e000
kdb_enter_why(c06feb01,c0708b8f,ffffffff,c17e7b6c,c06bf6f1,...) at kdb_enter_why+0x3b
scgetc(c07866e0,1,c07790c4,c07825e0,c17e7bd4,...) at scgetc+0x47d
sc_cngetc(c0739ea0,0,c17e7ba0,c059568a,c17e7bc0,...) at sc_cngetc+0xe1
cncheckc(c17e7bc0,c05ddd55,0,0,c17e7c53,...) at cncheckc+0x58
cngetc(0,0,c17e7c53,c17e7bd4,c1a411b0,...) at cngetc+0x1a
gets(c17e7bd4,80,1,0,0,...) at gets+0x25
vfs_mountroot_ask(c1a411b0,c0709785,c0715a97,1,c05c3910,...) at vfs_mountroot_ask+0x72
vfs_mountroot(0,0,0,0,0,...) at vfs_mountroot+0x39c
start_init(0,c17e7d38,0,0,0,...) at start_init+0x3c
fork_exit(c0504f70,0,c17e7d38) at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc17e7d70, ebp = 0 ---
db> show proc 8
Process 8 (xpt_thrd) at 0xc183c2dc:
 state: NORMAL
 uid: 0  gids: 0
 parent: pid 0 at 0xc07794c0
 ABI: null
 threads: 1
100013                   D       ccb_scan 0xc0766714 [xpt_thrd]
db> show proc 23
Process 23 (irq11: ahc0) at 0xc1996894:
 state: NORMAL
 uid: 0  gids: 0
 parent: pid 0 at 0xc07794c0
 ABI: null
 threads: 1
100023                   I                           [irq11: ahc0]


Thanks,

	-Andre

From owner-freebsd-scsi@FreeBSD.ORG  Sat Apr 23 23:47:17 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1AEDF1065670
	for <freebsd-scsi@freebsd.org>; Sat, 23 Apr 2011 23:47:17 +0000 (UTC)
	(envelope-from freebsdml@nmacleod.com)
Received: from sam.nabble.com (sam.nabble.com [216.139.236.26])
	by mx1.freebsd.org (Postfix) with ESMTP id E5DFF8FC08
	for <freebsd-scsi@freebsd.org>; Sat, 23 Apr 2011 23:47:16 +0000 (UTC)
Received: from [192.168.236.26] (helo=sam.nabble.com)
	by sam.nabble.com with esmtp (Exim 4.69)
	(envelope-from <freebsdml@nmacleod.com>) id 1QDmGE-0000jr-80
	for freebsd-scsi@freebsd.org; Sat, 23 Apr 2011 16:29:10 -0700
Date: Sat, 23 Apr 2011 16:29:10 -0700 (PDT)
From: milhousevh <freebsdml@nmacleod.com>
To: freebsd-scsi@freebsd.org
Message-ID: <1303601350238-4335508.post@n5.nabble.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Subject: mps0 driver - idle/standby/sleep (ie. spin down) supported (SATA
 drives)?
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Apr 2011 23:47:17 -0000

Hi, I've searched but am unable to find any discussion (good or bad) relating
to idle, standby and sleep support in the mps0 driver. Presumably this means
it's supported and working, or nobody is using it if it's not!

I have an LSI 9211-8i flashed with the latest IT firmware. It's working
great in FreeBSD 8.2 and 9.0 (using the zfsguru distribution) with 8 Samsung
drives on an AMD64-based HP Proliant Microserver N36L.

I would like to put the drives into standby (spin down) after they have been
idle for 30 minutes, however executing "camcontrol idle da0 -t 30" (where
da0 is a Samsung F4 2TB disk) results in the following response in FreeBSD
8.2 and 9.0:

[root@zfsguru /]# camcontrol idle da0 -t 30
(pass0:mps0:0:0:0): IDLE. ACB: e3 00 00 00 00 40 00 00 00 00 06 00
(pass0:mps0:0:0:0): CAM status: Function Not Available

standby and sleep are similar:

[root@zfsguru /]# camcontrol standby da0 -t 30
(pass0:mps0:0:0:0): STANDBY. ACB: e2 00 00 00 00 40 00 00 00 00 06 00
(pass0:mps0:0:0:0): CAM status: Function Not Available
[root@zfsguru /]# camcontrol sleep da0 -t 30
(pass0:mps0:0:0:0): SLEEP. ACB: e6 00 00 00 00 40 00 00 00 00 00 00
(pass0:mps0:0:0:0): CAM status: Function Not Available
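
A side note on the ACB bytes above: the second-to-last byte (06 for idle and
standby) looks like the ATA standby-timer count, which would mean "-t 30" was
taken as 30 seconds rather than 30 minutes. Below is a minimal C sketch of
that encoding, assuming the standard ATA standby-timer table (5-second units
up to a count of 240, 30-minute units for counts 241 to 251). The function
name ata_standby_count() is made up for illustration; this is not camcontrol's
actual code.

```c
/*
 * Hypothetical helper: map a timeout in seconds to the ATA standby-timer
 * count byte, rounding up to the next value the encoding can express.
 */
static unsigned char
ata_standby_count(int seconds)
{
	if (seconds <= 0)
		return 0;				/* 0 disables the timer */
	if (seconds <= 240 * 5)
		return (seconds + 4) / 5;		/* 1..240: units of 5 seconds */
	if (seconds <= 252 * 5)
		return 252;				/* special value: 21 minutes */
	if (seconds <= 11 * 30 * 60)
		return (seconds - 1) / (30 * 60) + 241;	/* 241..251: units of 30 minutes */
	return 253;					/* longest period the table offers */
}
```

Under this encoding 30 seconds gives a count of 6, matching the ACB above, and
1800 seconds gives 241, i.e. 30 minutes. It does not explain the "Function Not
Available" status, which is also returned for sleep, where the count is 00.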

I have the following entries in dmesg relating to mps0:

[root@zfsguru /]# dmesg|grep mps0
mps0: <LSI SAS2008> port 0xd000-0xd0ff mem 0xfe8fc000-0xfe8fffff,0xfe880000-0xfe8bffff irq 18 at device 0.0 on pci2
mps0: Firmware: 09.00.00.00
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
da0 at mps0 bus 0 scbus0 target 0 lun 0
da1 at mps0 bus 0 scbus0 target 1 lun 0
da2 at mps0 bus 0 scbus0 target 2 lun 0
da3 at mps0 bus 0 scbus0 target 3 lun 0
da4 at mps0 bus 0 scbus0 target 4 lun 0
da5 at mps0 bus 0 scbus0 target 5 lun 0
da6 at mps0 bus 0 scbus0 target 6 lun 0
da7 at mps0 bus 0 scbus0 target 7 lun 0

Is it possible to configure idle/standby/sleep with the mps0 driver? I
would like to spin down my disks after a configurable period to save
power and reduce noise. Or am I attempting to configure the standby/idle
disk timeouts in the wrong way?

I'd rather not depend on APM for the Samsung F4 disks as the latest APM
profile used by the F4 disks is extremely aggressive, spinning down the
disks only a few seconds after each access.

Using ataidle -I 15 -S 30 works absolutely fine on the F4 disk in FreeBSD
7.2 (spinning the disks down after 30 minutes of inactivity), but that is
with the F4 disks connected to a "standard" motherboard SATA controller (e.g.
AMD SB700). ataidle doesn't work at all with the LSI 9211-8i and the mps0
driver.

Any guidance on this problem will be much appreciated!

Many thanks
Neil
