From owner-freebsd-scsi@FreeBSD.ORG Mon Apr 18 11:07:08 2011
Return-Path:
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 418691065674
    for ; Mon, 18 Apr 2011 11:07:08 +0000 (UTC)
    (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
    by mx1.freebsd.org (Postfix) with ESMTP id 2650D8FC24
    for ; Mon, 18 Apr 2011 11:07:08 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
    by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p3IB78FL019606
    for ; Mon, 18 Apr 2011 11:07:08 GMT
    (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p3IB77Be019604
    for freebsd-scsi@FreeBSD.org; Mon, 18 Apr 2011 11:07:07 GMT
    (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 18 Apr 2011 11:07:07 GMT
Message-Id: <201104181107.p3IB77Be019604@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-scsi@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 18 Apr 2011 11:07:08 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/154432  scsi       [xpt] run_interrupt_driven_hooks: still waiting after
o kern/153361  scsi       [ciss] Smart Array 5300 boot/detect drive problem
o kern/152250  scsi       [ciss] [patch] Kernel panic when hw.ciss.expose_hidden
o kern/151564  scsi       [ciss] ciss(4) should increase CISS_MAX_LOGICAL to 10
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
s kern/149927  scsi       [cam] hard drive not stopped before removing power dur
o kern/148083  scsi       [aac] Strange device reporting
o kern/147704  scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/146287  scsi       [ciss] ciss(4) cannot see more than one SmartArray con
o kern/145768  scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648  scsi       [aac] Strange values of speed and bus width in dmesg
o kern/144301  scsi       [ciss] [hang] HP proliant server locks when using ciss
o kern/142351  scsi       [mpt] LSILogic driver performance problems
o kern/141934  scsi       [cam] [patch] add support for SEAGATE DAT Scopion 130
o kern/134488  scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132250  scsi       [ciss] ciss driver does not support more then 15 drive
o kern/132206  scsi       [mpt] system panics on boot when mirroring and 2nd dri
o kern/130621  scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602  scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452  scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245  scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927  scsi       [isp] isp(4) target driver crashes kernel when set up
o kern/127717  scsi       [ata] [patch] [request] - support write cache toggling
o kern/124667  scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi
o kern/123674  scsi       [ahc] ahc driver dumping
o kern/123520  scsi       [ahd] unable to boot from net while using ahd
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598   scsi       wire down of scsi devices conflicts with config
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o bin/57088    scsi       [cam] [patch] for a possible fd leak in libcam.c
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce

45 problems total.

From owner-freebsd-scsi@FreeBSD.ORG Mon Apr 18 16:56:30 2011
Return-Path:
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 4ADC3106566C;
    Mon, 18 Apr 2011 16:56:30 +0000 (UTC)
    (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
    by mx1.freebsd.org (Postfix) with ESMTP id 1EEF38FC16;
    Mon, 18 Apr 2011 16:56:30 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69])
    by cyrus.watson.org (Postfix) with ESMTPSA id C66AE46B06;
    Mon, 18 Apr 2011 12:56:29 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
    by bigwig.baldwin.cx (Postfix) with ESMTPSA id 3B6308A02B;
    Mon, 18 Apr 2011 12:56:29 -0400 (EDT)
From: John Baldwin
To: Andre Albsmeier
Date: Mon, 18 Apr 2011 09:18:25 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; )
References: <201102041444.p14EixJP087709@svn.freebsd.org>
    <201104151235.05114.jhb@freebsd.org>
    <20110418113657.GA6080@curry.mchp.siemens.de>
In-Reply-To: <20110418113657.GA6080@curry.mchp.siemens.de>
MIME-Version: 1.0
Content-Type: Text/Plain; charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201104180918.26054.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
    (bigwig.baldwin.cx); Mon, 18 Apr 2011 12:56:29 -0400 (EDT)
Cc: "svn-src-stable-7@freebsd.org" , scsi@freebsd.org
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 18 Apr 2011 16:56:30 -0000

On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote:
> On Fri, 15-Apr-2011 at 18:35:05 +0200, John
Baldwin wrote:
> > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > > Author: jhb
> > > > Date: Fri Feb 4 14:44:59 2011
> > > > New Revision: 218277
> > > > URL: http://svn.freebsd.org/changeset/base/218277
> > > >
> > > > Log:
> > > > MFC 217075:
> > > > Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > > > interrupt config hooks to execute.
> > > >
> > > > To preserve the KBI, I did not renumber priorities but simply removed
> > > > PCONFIG.
> > > >
> > > > Modified:
> > > > stable/7/sys/kern/subr_autoconf.c
> > > > stable/7/sys/sys/priority.h
> > > > Directory Properties:
> > > > stable/7/sys/ (props changed)
> > > > stable/7/sys/cddl/contrib/opensolaris/ (props changed)
> > > > stable/7/sys/contrib/dev/acpica/ (props changed)
> > > > stable/7/sys/contrib/pf/ (props changed)
> > > >
> > > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > > ==============================================================================
> > > > --- stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:42 2011 (r218276)
> > > > +++ stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:59 2011 (r218277)
> > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > > > warned = 0;
> > > > while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > - PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > + 0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > EWOULDBLOCK) {
> > > > mtx_unlock(&intr_config_hook_lock);
> > > > warned++;
> > >
> > >
> > > This broke several of my machines in a somewhat strange way:
> > >
> > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > > I noticed that some (4) of them didn't start. All 4 didn't find
> > > their boot device anymore. What they all got in common is:
> > >
> > > - an Adaptec 2940 Ultra SCSI adapter
> > > - two SCSI harddisks (da0 and da1) of various brands
> > > - one SCSI CDROM drive (cd0)
> > >
> > > To be exact, none of the three devices (da0, da1, cd0) were
> > > detected at all. Other machines with a similar configuration
> > > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > > any problems. So I simply removed the CDROM drives on the 4
> > > machines in question and they all booted again.
> > >
> > > Today I decided to dig into this and after reverting(*) the
> > > above change, they worked with the CDROM again. I have cross-
> > > checked it 3 times. No idea what's happening here...
> > >
> > > -Andre
> > >
> > > (*) To be honest, I use this patch so I had to modify only one file:
> > >
> > > --- sys/kern/subr_autoconf.c.ORI 2011-02-05 13:14:11.000000000 +0100
> > > +++ sys/kern/subr_autoconf.c 2011-04-15 14:34:31.000000000 +0200
> > > @@ -108,7 +108,7 @@
> > > warned = 0;
> > > while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > - 0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > + PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > EWOULDBLOCK) {
> > > mtx_unlock(&intr_config_hook_lock);
> > > warned++;
> >
> > Do you get any warnings about CAM timeouts, etc. when these probe? A verbose
> > dmesg might be nice to look at if possible.
>
> OK, I have set up a machine for testing. In my other mail
> I was wrong saying that the pass devices appear when using
> the problematic kernel...
>
> Here are the dmesgs:
>
> - dmesg_bad is the original kernel as of Friday
> - dmesg_ok is the patched kernel (see above) as of Friday
> - dmesg.diff is the diff between both
>
> If you want me to try something just tell me...

Hmmm, what if you make SCSI_DELAY larger? Also, can you let it fail the
mount and drop into ddb and then get 'ps' output?

I think the CAM boot probe is broken a bit. xpt_rescan_done() always calls
xpt_release_boot(), but we don't hold the boot for each bus added while
buses_config_done is 0, so it seems CAM only waits for at least one bus to
rescan before it lets the boot continue? This seems wrong (i.e. one would
think it would let all the busses added before this point scan before
continuing).

However, in your dmesg, it starts to print out an announcement for a pass
device before it starts mounting root, so it seems that xpt is finishing too
early somehow.

--
John Baldwin

From owner-freebsd-scsi@FreeBSD.ORG Mon Apr 18 17:33:35 2011
Return-Path:
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 4B2FA1065674
    for ; Mon, 18 Apr 2011 17:33:35 +0000 (UTC)
    (envelope-from Andre.Albsmeier@siemens.com)
Received: from goliath.siemens.de (goliath.siemens.de [192.35.17.28])
    by mx1.freebsd.org (Postfix) with ESMTP id C37148FC13
    for ; Mon, 18 Apr 2011 17:33:34 +0000 (UTC)
Received: from mail3.siemens.de (localhost [127.0.0.1])
    by goliath.siemens.de (8.13.6/8.13.6) with ESMTP id p3IHKWqG001226;
    Mon, 18 Apr 2011 19:20:32 +0200
Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.40.130])
    by mail3.siemens.de (8.13.6/8.13.6) with ESMTP id p3IHKW5N000886;
    Mon, 18 Apr 2011 19:20:32 +0200
Received: (from localhost)
    by curry.mchp.siemens.de (8.14.4/8.14.4) id p3IHKWUc017794;
Date: Mon, 18 Apr 2011 19:20:32 +0200
From: Andre Albsmeier
To: John Baldwin
Message-ID: <20110418172032.GA8849@curry.mchp.siemens.de>
References: <201102041444.p14EixJP087709@svn.freebsd.org>
    <201104151235.05114.jhb@freebsd.org>
    <20110418113657.GA6080@curry.mchp.siemens.de>
    <201104180918.26054.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201104180918.26054.jhb@freebsd.org>
X-Echelon:
X-Advice: Drop that crappy M$-Outlook, I'm tired of your viruses!
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: "svn-src-stable-7@freebsd.org" , "scsi@freebsd.org"
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 18 Apr 2011 17:33:35 -0000

On Mon, 18-Apr-2011 at 15:18:25 +0200, John Baldwin wrote:
> On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote:
> > On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> > > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > > > Author: jhb
> > > > > Date: Fri Feb 4 14:44:59 2011
> > > > > New Revision: 218277
> > > > > URL: http://svn.freebsd.org/changeset/base/218277
> > > > >
> > > > > Log:
> > > > > MFC 217075:
> > > > > Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > > > > interrupt config hooks to execute.
> > > > >
> > > > > To preserve the KBI, I did not renumber priorities but simply removed
> > > > > PCONFIG.
> > > > >
> > > > > Modified:
> > > > > stable/7/sys/kern/subr_autoconf.c
> > > > > stable/7/sys/sys/priority.h
> > > > > Directory Properties:
> > > > > stable/7/sys/ (props changed)
> > > > > stable/7/sys/cddl/contrib/opensolaris/ (props changed)
> > > > > stable/7/sys/contrib/dev/acpica/ (props changed)
> > > > > stable/7/sys/contrib/pf/ (props changed)
> > > > >
> > > > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > > > ==============================================================================
> > > > > --- stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:42 2011 (r218276)
> > > > > +++ stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:59 2011 (r218277)
> > > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > > > > warned = 0;
> > > > > while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > > if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > - PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > + 0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > EWOULDBLOCK) {
> > > > > mtx_unlock(&intr_config_hook_lock);
> > > > > warned++;
> > > >
> > > >
> > > > This broke several of my machines in a somewhat strange way:
> > > >
> > > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > > > I noticed that some (4) of them didn't start. All 4 didn't find
> > > > their boot device anymore. What they all got in common is:
> > > >
> > > > - an Adaptec 2940 Ultra SCSI adapter
> > > > - two SCSI harddisks (da0 and da1) of various brands
> > > > - one SCSI CDROM drive (cd0)
> > > >
> > > > To be exact, none of the three devices (da0, da1, cd0) were
> > > > detected at all. Other machines with a similar configuration
> > > > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > > > any problems. So I simply removed the CDROM drives on the 4
> > > > machines in question and they all booted again.
> > > >
> > > > Today I decided to dig into this and after reverting(*) the
> > > > above change, they worked with the CDROM again. I have cross-
> > > > checked it 3 times. No idea what's happening here...
> > > >
> > > > -Andre
> > > >
> > > > (*) To be honest, I use this patch so I had to modify only one file:
> > > >
> > > > --- sys/kern/subr_autoconf.c.ORI 2011-02-05 13:14:11.000000000 +0100
> > > > +++ sys/kern/subr_autoconf.c 2011-04-15 14:34:31.000000000 +0200
> > > > @@ -108,7 +108,7 @@
> > > > warned = 0;
> > > > while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > - 0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > + PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > EWOULDBLOCK) {
> > > > mtx_unlock(&intr_config_hook_lock);
> > > > warned++;
> > >
> > > Do you get any warnings about CAM timeouts, etc. when these probe? A verbose
> > > dmesg might be nice to look at if possible.
> >
> > OK, I have set up a machine for testing. In my other mail
> > I was wrong saying that the pass devices appear when using
> > the problematic kernel...
> >
> > Here are the dmesgs:
> >
> > - dmesg_bad is the original kernel as of Friday
> > - dmesg_ok is the patched kernel (see above) as of Friday
> > - dmesg.diff is the diff between both
> >
> > If you want me to try something just tell me...
>
> Hmmm, what if you make SCSI_DELAY larger? Also, can you let it fail the

I tried this already. Normally, I use 500 (which works on all
machines) and retried with 5000. No change...

> mount and drop into ddb and then get 'ps' output?

I will try it tomorrow.

>
> I think the CAM boot probe is broken a bit. xpt_rescan_done() always calls
> xpt_release_boot(), but we don't hold the boot for each bus added while
> buses_config_done is 0, so it seems CAM only waits for at least one bus to
> rescan before it lets the boot continue? This seems wrong (i.e. one would
> think it would let all the busses added before this point scan before
> continuing).

Hmm, I got only one SCSI bus in that machine. Of my 17 machines,
15 are SCSI-based. Only 4 had this problem. One of them has got
two buses, the others only one.

>
> However, in your dmesg, it starts to print out an announcement for a pass
> device before it starts mounting root, so it seems that xpt is finishing too
> early somehow.

Yes, I saw this as well. And in the "good" dmesg there are these
two lines which look a bit screwed up:

GEOM: nda0 at ahc0 bus 0 target 0 lun 0
...
ew disk da1

Don't know if this indicates some problem...

As I said, when I first ran into this problem last week, I _THINK_
I saw the pass devices appear at least on one broken box. But I
won't swear to this. Another thing I remember is that there was at
least one problematic box which booted successfully on the second
try.

Thanks,

-Andre

>
> --
> John Baldwin

--
Linux: socialism that does not work

From owner-freebsd-scsi@FreeBSD.ORG Tue Apr 19 12:56:52 2011
Return-Path:
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 1E8E91065674;
    Tue, 19 Apr 2011 12:56:50 +0000 (UTC)
    (envelope-from Andre.Albsmeier@siemens.com)
Received: from goliath.siemens.de (goliath.siemens.de [192.35.17.28])
    by mx1.freebsd.org (Postfix) with ESMTP id 362928FC22;
    Tue, 19 Apr 2011 12:56:49 +0000 (UTC)
Received: from mail2.siemens.de (localhost [127.0.0.1])
    by goliath.siemens.de (8.13.6/8.13.6) with ESMTP id p3JCumXC030826;
    Tue, 19 Apr 2011 14:56:48 +0200
Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.40.130])
    by mail2.siemens.de (8.13.6/8.13.6) with ESMTP id p3JCumIP006686;
    Tue, 19 Apr 2011 14:56:48 +0200
Received: (from localhost)
    by curry.mchp.siemens.de (8.14.4/8.14.4) id p3JCumrp020303;
Date: Tue, 19 Apr 2011 14:56:48 +0200
From: Andre Albsmeier
To: John Baldwin
Message-ID: <20110419125648.GA17780@curry.mchp.siemens.de>
References: <201102041444.p14EixJP087709@svn.freebsd.org>
    <201104151235.05114.jhb@freebsd.org>
    <20110418113657.GA6080@curry.mchp.siemens.de>
    <201104180918.26054.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201104180918.26054.jhb@freebsd.org>
X-Echelon:
X-Advice: Drop that crappy M$-Outlook, I'm tired of your viruses!
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: "svn-src-stable-7@freebsd.org" , "scsi@freebsd.org"
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 19 Apr 2011 12:56:52 -0000

On Mon, 18-Apr-2011 at 15:18:25 +0200, John Baldwin wrote:
> On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote:
> > On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> > > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > > > Author: jhb
> > > > > Date: Fri Feb 4 14:44:59 2011
> > > > > New Revision: 218277
> > > > > URL: http://svn.freebsd.org/changeset/base/218277
> > > > >
> > > > > Log:
> > > > > MFC 217075:
> > > > > Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > > > > interrupt config hooks to execute.
> > > > >
> > > > > To preserve the KBI, I did not renumber priorities but simply removed
> > > > > PCONFIG.
> > > > >
> > > > > Modified:
> > > > > stable/7/sys/kern/subr_autoconf.c
> > > > > stable/7/sys/sys/priority.h
> > > > > Directory Properties:
> > > > > stable/7/sys/ (props changed)
> > > > > stable/7/sys/cddl/contrib/opensolaris/ (props changed)
> > > > > stable/7/sys/contrib/dev/acpica/ (props changed)
> > > > > stable/7/sys/contrib/pf/ (props changed)
> > > > >
> > > > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > > > ==============================================================================
> > > > > --- stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:42 2011 (r218276)
> > > > > +++ stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:59 2011 (r218277)
> > > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > > > > warned = 0;
> > > > > while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > > if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > - PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > + 0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > EWOULDBLOCK) {
> > > > > mtx_unlock(&intr_config_hook_lock);
> > > > > warned++;
> > > >
> > > >
> > > > This broke several of my machines in a somewhat strange way:
> > > >
> > > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > > > I noticed that some (4) of them didn't start. All 4 didn't find
> > > > their boot device anymore. What they all got in common is:
> > > >
> > > > - an Adaptec 2940 Ultra SCSI adapter
> > > > - two SCSI harddisks (da0 and da1) of various brands
> > > > - one SCSI CDROM drive (cd0)
> > > >
> > > > To be exact, none of the three devices (da0, da1, cd0) were
> > > > detected at all. Other machines with a similar configuration
> > > > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > > > any problems. So I simply removed the CDROM drives on the 4
> > > > machines in question and they all booted again.
> > > >
> > > > Today I decided to dig into this and after reverting(*) the
> > > > above change, they worked with the CDROM again. I have cross-
> > > > checked it 3 times. No idea what's happening here...
> > > >
> > > > -Andre
> > > >
> > > > (*) To be honest, I use this patch so I had to modify only one file:
> > > >
> > > > --- sys/kern/subr_autoconf.c.ORI 2011-02-05 13:14:11.000000000 +0100
> > > > +++ sys/kern/subr_autoconf.c 2011-04-15 14:34:31.000000000 +0200
> > > > @@ -108,7 +108,7 @@
> > > > warned = 0;
> > > > while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > - 0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > + PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > EWOULDBLOCK) {
> > > > mtx_unlock(&intr_config_hook_lock);
> > > > warned++;
> > >
> > > Do you get any warnings about CAM timeouts, etc. when these probe? A verbose
> > > dmesg might be nice to look at if possible.
> >
> > OK, I have set up a machine for testing. In my other mail
> > I was wrong saying that the pass devices appear when using
> > the problematic kernel...
> >
> > Here are the dmesgs:
> >
> > - dmesg_bad is the original kernel as of Friday
> > - dmesg_ok is the patched kernel (see above) as of Friday
> > - dmesg.diff is the diff between both
> >
> > If you want me to try something just tell me...
>
> Hmmm, what if you make SCSI_DELAY larger? Also, can you let it fail the
> mount and drop into ddb and then get 'ps' output?

As soon as I include the debugger into the kernel the problem
is gone. I have double-checked it two times now: With debugger
the drives are detected, without debugger mostly (but not always)
not.

I currently have it running in an endless rebooting loop hoping,
that it fails eventually...

-Andre

>
> I think the CAM boot probe is broken a bit. xpt_rescan_done() always calls
> xpt_release_boot(), but we don't hold the boot for each bus added while
> buses_config_done is 0, so it seems CAM only waits for at least one bus to
> rescan before it lets the boot continue? This seems wrong (i.e. one would
> think it would let all the busses added before this point scan before
> continuing).
>
> However, in your dmesg, it starts to print out an announcement for a pass
> device before it starts mounting root, so it seems that xpt is finishing too
> early somehow.
>
> --
> John Baldwin

--
UNIX is an operating system, OS/2 is half an operating system, Windows is
a shell, and DOS is a bootsector virus.

From owner-freebsd-scsi@FreeBSD.ORG Tue Apr 19 13:22:27 2011
Return-Path:
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id CEA97106566B;
    Tue, 19 Apr 2011 13:22:27 +0000 (UTC)
    (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
    by mx1.freebsd.org (Postfix) with ESMTP id A3AA38FC16;
    Tue, 19 Apr 2011 13:22:27 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69])
    by cyrus.watson.org (Postfix) with ESMTPSA id 3D9F446B51;
    Tue, 19 Apr 2011 09:22:27 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
    by bigwig.baldwin.cx (Postfix) with ESMTPSA id B24A28A02A;
    Tue, 19 Apr 2011 09:22:26 -0400 (EDT)
From: John Baldwin
To: Andre Albsmeier
Date: Tue, 19 Apr 2011 09:20:25 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; )
References: <201102041444.p14EixJP087709@svn.freebsd.org>
    <201104180918.26054.jhb@freebsd.org>
    <20110419125648.GA17780@curry.mchp.siemens.de>
In-Reply-To: <20110419125648.GA17780@curry.mchp.siemens.de>
MIME-Version: 1.0
Content-Type: Text/Plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201104190920.25924.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH,
not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 19 Apr 2011 09:22:26 -0400 (EDT) Cc: "svn-src-stable-7@freebsd.org" , "scsi@freebsd.org" Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Apr 2011 13:22:27 -0000 On Tuesday, April 19, 2011 8:56:48 am Andre Albsmeier wrote: > On Mon, 18-Apr-2011 at 15:18:25 +0200, John Baldwin wrote: > > On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote: > > > On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote: > > > > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote: > > > > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote: > > > > > > Author: jhb > > > > > > Date: Fri Feb 4 14:44:59 2011 > > > > > > New Revision: 218277 > > > > > > URL: http://svn.freebsd.org/changeset/base/218277 > > > > > > > > > > > > Log: > > > > > > MFC 217075: > > > > > > Retire PCONFIG and leave the priority of thread0 alone when waiting for > > > > > > interrupt config hooks to execute. > > > > > > > > > > > > To preserve the KBI, I did not renumber priorities but simply removed > > > > > > PCONFIG. 
> > > > > > > > > > > > Modified: > > > > > > stable/7/sys/kern/subr_autoconf.c > > > > > > stable/7/sys/sys/priority.h > > > > > > Directory Properties: > > > > > > stable/7/sys/ (props changed) > > > > > > stable/7/sys/cddl/contrib/opensolaris/ (props changed) > > > > > > stable/7/sys/contrib/dev/acpica/ (props changed) > > > > > > stable/7/sys/contrib/pf/ (props changed) > > > > > > > > > > > > Modified: stable/7/sys/kern/subr_autoconf.c > > > > > > > > > > ============================================================================== > > > > > > --- stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:42 2011 > > > > (r218276) > > > > > > +++ stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:59 2011 > > > > (r218277) > > > > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy) > > > > > > warned = 0; > > > > > > while (!TAILQ_EMPTY(&intr_config_hook_list)) { > > > > > > if (msleep(&intr_config_hook_list, &intr_config_hook_lock, > > > > > > - PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) == > > > > > > + 0, "conifhk", WARNING_INTERVAL_SECS * hz) == > > > > > > EWOULDBLOCK) { > > > > > > mtx_unlock(&intr_config_hook_lock); > > > > > > warned++; > > > > > > > > > > > > > > > This broke several of my machines in a somewhat strange way: > > > > > > > > > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12) > > > > > I noticed that some (4) of them didn't start. All 4 didn't find > > > > > their boot device anymore. What they all got in common is: > > > > > > > > > > - an Adaptec 2940 Ultra SCSI adapter > > > > > - two SCSI harddisks (da0 and da1) of various brands > > > > > - one SCSI CDROM drive (cd0) > > > > > > > > > > To be exact, none of the three devices (da0, da1, cd0) were > > > > > detected at all. Other machines with a similar configuration > > > > > (2940 and da0/da1) but _without_ the CDROM drive didn't have > > > > > any problems. 
So I simply removed the CDROM drives on the 4 > > > > > machines in question and they all booted again. > > > > > > > > > > Today I decided to dig into this and after reverting(*) the > > > > > above change, they worked with the CDROM again. I have cross- > > > > > checked it 3 times. No idea what's happening here... > > > > > > > > > > -Andre > > > > > > > > > > (*) To be honest, I use this patch so I had to modify only one file: > > > > > > > > > > --- sys/kern/subr_autoconf.c.ORI 2011-02-05 13:14:11.000000000 +0100 > > > > > +++ sys/kern/subr_autoconf.c 2011-04-15 14:34:31.000000000 +0200 > > > > > @@ -108,7 +108,7 @@ > > > > > warned = 0; > > > > > while (!TAILQ_EMPTY(&intr_config_hook_list)) { > > > > > if (msleep(&intr_config_hook_list, &intr_config_hook_lock, > > > > > - 0, "conifhk", WARNING_INTERVAL_SECS * hz) == > > > > > + PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) == > > > > > EWOULDBLOCK) { > > > > > mtx_unlock(&intr_config_hook_lock); > > > > > warned++; > > > > > > > > Do you get any warnings about CAM timeouts, etc. when these probe? A verbose > > > > dmesg might be nice to look at if possible. > > > > > > OK, I have set up a machine for testing. In my other mail > > > I was wrong saying that the pass devices appear when using > > > the problematic kernel... > > > > > > Here are the dmesgs: > > > > > > - dmesg_bad is the original kernel as of Friday > > > - dmesg_ok is the patched kernel (see above) as of Friday > > > - dmesg.diff is the diff between both > > > > > > If you want me to try something just tell me... > > > > Hmmm, what if you make SCSI_DELAY larger? Also, can you let it fail the > > mount and drop into ddb and then get 'ps' output? > > As soon as I include the debugger into the kernel the problem > is gone. I have double-checked it two times now: With debugger > the drives are detected, without debugger mostly (but not always) > not. 
>
> I currently have it running in an endless rebooting loop hoping, that it fails eventually...

Hummm. This seems like it is a timing related race. :(

--
John Baldwin

From owner-freebsd-scsi@FreeBSD.ORG Tue Apr 19 13:26:51 2011 Return-Path: Delivered-To: scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C9F5A106564A; Tue, 19 Apr 2011 13:26:51 +0000 (UTC) (envelope-from Andre.Albsmeier@siemens.com) Received: from goliath.siemens.de (goliath.siemens.de [192.35.17.28]) by mx1.freebsd.org (Postfix) with ESMTP id 4CBCD8FC16; Tue, 19 Apr 2011 13:26:51 +0000 (UTC) Received: from mail1.siemens.de (localhost [127.0.0.1]) by goliath.siemens.de (8.13.6/8.13.6) with ESMTP id p3JDQoMM003051; Tue, 19 Apr 2011 15:26:50 +0200 Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.40.130]) by mail1.siemens.de (8.13.6/8.13.6) with ESMTP id p3JDQoR7016930; Tue, 19 Apr 2011 15:26:50 +0200 Received: (from localhost) by curry.mchp.siemens.de (8.14.4/8.14.4) id p3JDQorH020367; Date: Tue, 19 Apr 2011 15:26:50 +0200 From: Andre Albsmeier To: John Baldwin Message-ID: <20110419132650.GA17934@curry.mchp.siemens.de> References: <201102041444.p14EixJP087709@svn.freebsd.org> <201104180918.26054.jhb@freebsd.org> <20110419125648.GA17780@curry.mchp.siemens.de> <201104190920.25924.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201104190920.25924.jhb@freebsd.org> X-Echelon: X-Advice: Drop that crappy M$-Outlook, I'm tired of your viruses!
User-Agent: Mutt/1.5.20 (2009-06-14) Cc: "svn-src-stable-7@freebsd.org" , "scsi@freebsd.org" Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Apr 2011 13:26:51 -0000 On Tue, 19-Apr-2011 at 15:20:25 +0200, John Baldwin wrote: > On Tuesday, April 19, 2011 8:56:48 am Andre Albsmeier wrote: > > On Mon, 18-Apr-2011 at 15:18:25 +0200, John Baldwin wrote: > > > On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote: > > > > On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote: > > > > > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote: > > > > > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote: > > > > > > > Author: jhb > > > > > > > Date: Fri Feb 4 14:44:59 2011 > > > > > > > New Revision: 218277 > > > > > > > URL: http://svn.freebsd.org/changeset/base/218277 > > > > > > > > > > > > > > Log: > > > > > > > MFC 217075: > > > > > > > Retire PCONFIG and leave the priority of thread0 alone when waiting for > > > > > > > interrupt config hooks to execute. > > > > > > > > > > > > > > To preserve the KBI, I did not renumber priorities but simply removed > > > > > > > PCONFIG. 
> > > > > > > > > > > > > > Modified: > > > > > > > stable/7/sys/kern/subr_autoconf.c > > > > > > > stable/7/sys/sys/priority.h > > > > > > > Directory Properties: > > > > > > > stable/7/sys/ (props changed) > > > > > > > stable/7/sys/cddl/contrib/opensolaris/ (props changed) > > > > > > > stable/7/sys/contrib/dev/acpica/ (props changed) > > > > > > > stable/7/sys/contrib/pf/ (props changed) > > > > > > > > > > > > > > Modified: stable/7/sys/kern/subr_autoconf.c > > > > > > > > > > > > ============================================================================== > > > > > > > --- stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:42 2011 > > > > > (r218276) > > > > > > > +++ stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:59 2011 > > > > > (r218277) > > > > > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy) > > > > > > > warned = 0; > > > > > > > while (!TAILQ_EMPTY(&intr_config_hook_list)) { > > > > > > > if (msleep(&intr_config_hook_list, &intr_config_hook_lock, > > > > > > > - PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) == > > > > > > > + 0, "conifhk", WARNING_INTERVAL_SECS * hz) == > > > > > > > EWOULDBLOCK) { > > > > > > > mtx_unlock(&intr_config_hook_lock); > > > > > > > warned++; > > > > > > > > > > > > > > > > > > This broke several of my machines in a somewhat strange way: > > > > > > > > > > > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12) > > > > > > I noticed that some (4) of them didn't start. All 4 didn't find > > > > > > their boot device anymore. What they all got in common is: > > > > > > > > > > > > - an Adaptec 2940 Ultra SCSI adapter > > > > > > - two SCSI harddisks (da0 and da1) of various brands > > > > > > - one SCSI CDROM drive (cd0) > > > > > > > > > > > > To be exact, none of the three devices (da0, da1, cd0) were > > > > > > detected at all. Other machines with a similar configuration > > > > > > (2940 and da0/da1) but _without_ the CDROM drive didn't have > > > > > > any problems. 
So I simply removed the CDROM drives on the 4 > > > > > > machines in question and they all booted again. > > > > > > > > > > > > Today I decided to dig into this and after reverting(*) the > > > > > > above change, they worked with the CDROM again. I have cross- > > > > > > checked it 3 times. No idea what's happening here... > > > > > > > > > > > > -Andre > > > > > > > > > > > > (*) To be honest, I use this patch so I had to modify only one file: > > > > > > > > > > > > --- sys/kern/subr_autoconf.c.ORI 2011-02-05 13:14:11.000000000 +0100 > > > > > > +++ sys/kern/subr_autoconf.c 2011-04-15 14:34:31.000000000 +0200 > > > > > > @@ -108,7 +108,7 @@ > > > > > > warned = 0; > > > > > > while (!TAILQ_EMPTY(&intr_config_hook_list)) { > > > > > > if (msleep(&intr_config_hook_list, &intr_config_hook_lock, > > > > > > - 0, "conifhk", WARNING_INTERVAL_SECS * hz) == > > > > > > + PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) == > > > > > > EWOULDBLOCK) { > > > > > > mtx_unlock(&intr_config_hook_lock); > > > > > > warned++; > > > > > > > > > > Do you get any warnings about CAM timeouts, etc. when these probe? A verbose > > > > > dmesg might be nice to look at if possible. > > > > > > > > OK, I have set up a machine for testing. In my other mail > > > > I was wrong saying that the pass devices appear when using > > > > the problematic kernel... > > > > > > > > Here are the dmesgs: > > > > > > > > - dmesg_bad is the original kernel as of Friday > > > > - dmesg_ok is the patched kernel (see above) as of Friday > > > > - dmesg.diff is the diff between both > > > > > > > > If you want me to try something just tell me... > > > > > > Hmmm, what if you make SCSI_DELAY larger? Also, can you let it fail the > > > mount and drop into ddb and then get 'ps' output? > > > > As soon as I include the debugger into the kernel the problem > > is gone. I have double-checked it two times now: With debugger > > the drives are detected, without debugger mostly (but not always) > > not. 
> >
> > I currently have it running in an endless rebooting loop hoping,
> > that it fails eventually...
>
> Hummm. This seems like it is a timing related race. :(

Yes, especially since it does not fail reliably -- even when using
a kernel without debugger...

-Andre

>
> --
> John Baldwin

--
C:\>WIN
The computer obeys and wins.
You lose and Bill collects.

From owner-freebsd-scsi@FreeBSD.ORG Wed Apr 20 05:50:54 2011 Return-Path: Delivered-To: scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4BBAF106564A for ; Wed, 20 Apr 2011 05:50:54 +0000 (UTC) (envelope-from Andre.Albsmeier@siemens.com) Received: from thoth.sbs.de (thoth.sbs.de [192.35.17.2]) by mx1.freebsd.org (Postfix) with ESMTP id 6326C8FC14 for ; Wed, 20 Apr 2011 05:50:53 +0000 (UTC) Received: from mail2.siemens.de (localhost [127.0.0.1]) by thoth.sbs.de (8.13.6/8.13.6) with ESMTP id p3K5WRQh015819; Wed, 20 Apr 2011 07:32:27 +0200 Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.40.130]) by mail2.siemens.de (8.13.6/8.13.6) with ESMTP id p3K5WQqJ024901; Wed, 20 Apr 2011 07:32:26 +0200 Received: (from localhost) by curry.mchp.siemens.de (8.14.4/8.14.4) id p3K5WQf5023278; Date: Wed, 20 Apr 2011 07:32:26 +0200 From: Andre Albsmeier To: John Baldwin Message-ID: <20110420053226.GA22854@curry.mchp.siemens.de> References: <201102041444.p14EixJP087709@svn.freebsd.org> <201104180918.26054.jhb@freebsd.org> <20110419125648.GA17780@curry.mchp.siemens.de> <201104190920.25924.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201104190920.25924.jhb@freebsd.org> X-Echelon: X-Advice: Drop that crappy M$-Outlook, I'm tired of your viruses!
User-Agent: Mutt/1.5.20 (2009-06-14) Cc: "svn-src-stable-7@freebsd.org" , "scsi@freebsd.org" Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Apr 2011 05:50:54 -0000 On Tue, 19-Apr-2011 at 15:20:25 +0200, John Baldwin wrote: > On Tuesday, April 19, 2011 8:56:48 am Andre Albsmeier wrote:
> >
> > I currently have it running in an endless rebooting loop hoping,
> > that it fails eventually...
>
> Hummm. This seems like it is a timing related race. :(

Success! Sometimes at night it finally panic'ed even with the
debugger in the kernel.

Here is the output of 'ps' and some other commands I remembered
(no idea if any of these make sense in this context :-)).
It is still in this state with the serial console attached
so just tell me what to type ;-).

KDB: enter: manual escape to debugger
[thread pid 1 tid 100001 ]
Stopped at kdb_enter_why+0x3b: xorl %eax,%eax
db> ps
  pid  ppid  pgrp   uid  state wmesg    wchan      cmd
   35     0     0     0  RL                        [softdepflush]
   34     0     0     0  RL                        [syncer]
   33     0     0     0  RL                        [vnlru]
   32     0     0     0  RL                        [bufdaemon]
   31     0     0     0  RL                        [pagezero]
   30     0     0     0  RL                        [idlepoll]
   29     0     0     0  RL                        [vmdaemon]
   28     0     0     0  RL                        [pagedaemon]
   27     0     0     0  WL                        [irq1: atkbd0]
   26     0     0     0  WL                        [swi0: uart uart]
   25     0     0     0  SL    -        0xc182a63c [fdc0]
   24     0     0     0  SL    idle     0xc1829600 [aic_recovery0]
   23     0     0     0  WL                        [irq11: ahc0]
   22     0     0     0  SL    idle     0xc1829600 [aic_recovery0]
   21     0     0     0  WL                        [irq10: fxp0]
   20     0     0     0  WL                        [irq9: acpi0 intsmb0]
   19     0     0     0  SL    -        0xc181b800 [kqueue taskq]
   18     0     0     0  WL                        [swi6: task queue]
   17     0     0     0  WL                        [swi6: Giant taskq]
    9     0     0     0  RL                        [thread taskq]
   16     0     0     0  WL                        [swi5: Fast task queue]
   15     0     0     0  WL                        [swi2: cambio]
    8     0     0     0  SL    ccb_scan 0xc0766714 [xpt_thrd]
    7     0     0     0  SL    -        0xc181bd80 [acpi_task_2]
    6     0     0     0  SL    -        0xc181bd80 [acpi_task_1]
    5     0     0     0  SL    -        0xc181bd80 [acpi_task_0]
   14     0     0     0  SL    -        0xc077be54 [yarrow]
    4     0     0     0  SL    -        0xc077942c [g_down]
    3     0     0     0  SL    -        0xc0779428 [g_up]
    2     0     0     0  SL    -        0xc0779420 [g_event]
   13     0     0     0  WL                        [swi3: vm]
   12     0     0     0  LL    *Giant   0xc1821dc0 [swi4: clock]
   11     0     0     0  WL                        [swi1: net]
   10     0     0     0  RL                        [idle]
    1     0     0     0  RL    CPU 0               [swapper]
    0     0     0     0  SLs   sched    0xc07794c0 [swapper]
db> show threads
  100035 (0xc1a474c0) fork_trampoline() at fork_trampoline
  100034 (0xc19a8000) fork_trampoline() at fork_trampoline
  100033 (0xc19a8260) fork_trampoline() at fork_trampoline
  100032 (0xc19a84c0) fork_trampoline() at fork_trampoline
  100031 (0xc19a8720) fork_trampoline() at fork_trampoline
  100030 (0xc19a8980) fork_trampoline() at fork_trampoline
  100029 (0xc19a8be0) fork_trampoline() at fork_trampoline
  100028 (0xc19a9000) fork_trampoline() at fork_trampoline
  100027 (0xc19a9260) fork_trampoline() at fork_trampoline
  100026 (0xc19a94c0) fork_trampoline() at fork_trampoline
  100025 (0xc19a9720) sched_switch(c19a9720,0,1,7a3b3f3c,11,...) at sched_switch+0xa0
  100024 (0xc1855720) sched_switch(c1855720,0,1,2419c5c9,11,...) at sched_switch+0xa0
  100023 (0xc1855980) sched_switch(c1855980,0,1,b2704d4a,11,...) at sched_switch+0xa0
  100022 (0xc1855be0) sched_switch(c1855be0,0,1,2419b12b,11,...) at sched_switch+0xa0
  100021 (0xc18ad000) fork_trampoline() at fork_trampoline
  100020 (0xc18ad260) fork_trampoline() at fork_trampoline
  100019 (0xc18ad4c0) sched_switch(c18ad4c0,0,1,241f3621,11,...) at sched_switch+0xa0
  100018 (0xc18ad720) fork_trampoline() at fork_trampoline
  100017 (0xc18ad980) fork_trampoline() at fork_trampoline
  100016 (0xc18adbe0) sched_switch(c18adbe0,0,4,b27f7ade,11,...) at sched_switch+0xa0
  100015 (0xc183e260) fork_trampoline() at fork_trampoline
  100014 (0xc183e4c0) sched_switch(c183e4c0,0,1,b270879a,11,...) at sched_switch+0xa0
  100013 (0xc183e720) sched_switch(c183e720,0,1,24199731,11,...) at sched_switch+0xa0
  100012 (0xc183e980) sched_switch(c183e980,0,1,241f0bf7,11,...) at sched_switch+0xa0
  100011 (0xc183ebe0) sched_switch(c183ebe0,0,1,241ef695,11,...) at sched_switch+0xa0
  100010 (0xc1855000) sched_switch(c1855000,0,1,241ee335,11,...) at sched_switch+0xa0
  100009 (0xc1855260) sched_switch(c1855260,0,1,b01ec712,11,...) at sched_switch+0xa0
  100008 (0xc18554c0) sched_switch(c18554c0,0,1,241960e3,11,...) at sched_switch+0xa0
  100007 (0xc183d000) sched_switch(c183d000,0,1,241943af,11,...) at sched_switch+0xa0
  100006 (0xc183d260) sched_switch(c183d260,0,1,b01ecf55,11,...) at sched_switch+0xa0
  100005 (0xc183d4c0) fork_trampoline() at fork_trampoline
  100004 (0xc183d720) sched_switch(c183d720,0,1,b357cafa,11,...) at sched_switch+0xa0
  100003 (0xc183d980) fork_trampoline() at fork_trampoline
  100002 (0xc183dbe0) sched_switch(c183dbe0,0,6,b2703e6d,11,...) at sched_switch+0xa0
  100001 (0xc183e000) kdb_enter_why(c06feb01,c0708b8f,ffffffff,c17e7b6c,c06bf6f1,...) at kdb_enter_why+0x3b
  100000 (0xc07797a0) sched_switch(c07797a0,0,1,b28c7b89,11,...) at sched_switch+0xa0
db> show thread
Thread 100001 at 0xc183e000:
 proc (pid 1): 0xc183c000
 flags: 0x10005 pflags: 0
 state: RUNNING (CPU 0)
 priority: 52
db> show geom
class: FD (0xc0758720)
 geom: fd0 (0xc19e9880), rank=1
  provider: fd0 (0xc19e9800), access=r0w0e0
   consumer: 0xc19e4280 (fd0), access=r0w0e0
class: DEV (0xc073bca0)
 geom: fd0 (0xc19e9680), rank=2
  consumer: 0xc19e4280 (fd0), access=r0w0e0
class: PART (0xc073c4a0)
class: VFS (0xc073c3a0)
class: MBR (0xc073c320)
class: MBREXT (0xc073c2c0)
class: BSD (0xc073bbc0)
class: MD (0xc0737400)
class: SWAP (0xc0754e20)
class: DISK (0xc073bda0)
db> trace
Tracing pid 1 tid 100001 td 0xc183e000
kdb_enter_why(c06feb01,c0708b8f,ffffffff,c17e7b6c,c06bf6f1,...) at kdb_enter_why+0x3b
scgetc(c07866e0,1,c07790c4,c07825e0,c17e7bd4,...) at scgetc+0x47d
sc_cngetc(c0739ea0,0,c17e7ba0,c059568a,c17e7bc0,...) at sc_cngetc+0xe1
cncheckc(c17e7bc0,c05ddd55,0,0,c17e7c53,...) at cncheckc+0x58
cngetc(0,0,c17e7c53,c17e7bd4,c1a411b0,...) at cngetc+0x1a
gets(c17e7bd4,80,1,0,0,...) at gets+0x25
vfs_mountroot_ask(c1a411b0,c0709785,c0715a97,1,c05c3910,...) at vfs_mountroot_ask+0x72
vfs_mountroot(0,0,0,0,0,...) at vfs_mountroot+0x39c
start_init(0,c17e7d38,0,0,0,...) at start_init+0x3c
fork_exit(c0504f70,0,c17e7d38) at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc17e7d70, ebp = 0 ---
db> show proc 8
Process 8 (xpt_thrd) at 0xc183c2dc:
 state: NORMAL
 uid: 0 gids: 0
 parent: pid 0 at 0xc07794c0
 ABI: null
 threads: 1
100013 D ccb_scan 0xc0766714 [xpt_thrd]
db> show proc 23
Process 23 (irq11: ahc0) at 0xc1996894:
 state: NORMAL
 uid: 0 gids: 0
 parent: pid 0 at 0xc07794c0
 ABI: null
 threads: 1
100023 I [irq11: ahc0]

Thanks,

-Andre

From owner-freebsd-scsi@FreeBSD.ORG Sat Apr 23 23:47:17 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1AEDF1065670 for ; Sat, 23 Apr 2011 23:47:17 +0000 (UTC) (envelope-from freebsdml@nmacleod.com) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) by mx1.freebsd.org (Postfix) with ESMTP id E5DFF8FC08 for ; Sat, 23 Apr 2011 23:47:16 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.69) (envelope-from ) id 1QDmGE-0000jr-80 for freebsd-scsi@freebsd.org; Sat, 23 Apr 2011 16:29:10 -0700 Date: Sat, 23 Apr 2011 16:29:10 -0700 (PDT) From: milhousevh To: freebsd-scsi@freebsd.org Message-ID: <1303601350238-4335508.post@n5.nabble.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Subject: mps0 driver - idle/standby/sleep (ie. spin down) supported (SATA drives)? X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Apr 2011 23:47:17 -0000

Hi,

I've searched but am unable to find any discussion (good or bad) relating to
idle, standby and sleep support in the mps0 driver. Presumably this means it's
supported and working, or nobody is using it if it's not!

I have an LSI 9211-8i flashed with the latest IT firmware; it's working great
in FreeBSD 8.2 and 9.0 (using the zfsguru distribution) with 8 Samsung drives
on an AMD64-based HP Proliant Microserver N36L.

I would like to put the drives into standby (spin down) after they have been
idle for 30 minutes. However, executing "camcontrol idle da0 -t 30" (where da0
is a Samsung F4 2TB disk) results in the following response in FreeBSD 8.2 and
9.0:

[root@zfsguru /]# camcontrol idle da0 -t 30
(pass0:mps0:0:0:0): IDLE. ACB: e3 00 00 00 00 40 00 00 00 00 06 00
(pass0:mps0:0:0:0): CAM status: Function Not Available

standby and sleep are similar:

[root@zfsguru /]# camcontrol standby da0 -t 30
(pass0:mps0:0:0:0): STANDBY. ACB: e2 00 00 00 00 40 00 00 00 00 06 00
(pass0:mps0:0:0:0): CAM status: Function Not Available

[root@zfsguru /]# camcontrol sleep da0 -t 30
(pass0:mps0:0:0:0): SLEEP. ACB: e6 00 00 00 00 40 00 00 00 00 00 00
(pass0:mps0:0:0:0): CAM status: Function Not Available

I have the following entries in dmesg relating to mps0:

[root@zfsguru /]# dmesg|grep mps0
mps0: port 0xd000-0xd0ff mem 0xfe8fc000-0xfe8fffff,0xfe880000-0xfe8bffff irq 18 at device 0.0 on pci2
mps0: Firmware: 09.00.00.00
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
da0 at mps0 bus 0 scbus0 target 0 lun 0
da1 at mps0 bus 0 scbus0 target 1 lun 0
da2 at mps0 bus 0 scbus0 target 2 lun 0
da3 at mps0 bus 0 scbus0 target 3 lun 0
da4 at mps0 bus 0 scbus0 target 4 lun 0
da5 at mps0 bus 0 scbus0 target 5 lun 0
da6 at mps0 bus 0 scbus0 target 6 lun 0
da7 at mps0 bus 0 scbus0 target 7 lun 0

Is it possible to configure idle/standby/sleep with the mps0 driver? I would
like to spin down my disks after a configurable period to save power/noise.
Or am I attempting to configure the standby/idle disk timeouts in the wrong
way?
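
For anyone reading the ACB dumps above, here is a minimal sketch that splits one of them into named fields. The field order (command, features, LBA bytes, device, the "exp" bytes, then sector count) is an assumption on my part and is not stated in this thread; the names ACB_FIELDS, ATA_OPCODES and decode_acb are made up for illustration. The three opcodes are the standard ATA ones and agree with the IDLE/STANDBY/SLEEP labels camcontrol printed.

```python
# Assumed byte order of the 12-byte ACB dump (hypothetical, check against
# the CAM sources before relying on it).
ACB_FIELDS = (
    "command", "features",
    "lba_low", "lba_mid", "lba_high", "device",
    "lba_low_exp", "lba_mid_exp", "lba_high_exp",
    "features_exp", "sector_count", "sector_count_exp",
)

# Standard ATA power-management opcodes seen in the dumps.
ATA_OPCODES = {0xE2: "STANDBY", 0xE3: "IDLE", 0xE6: "SLEEP"}


def decode_acb(dump):
    """Turn a hex dump such as 'e3 00 ... 06 00' into a dict of fields."""
    raw = bytes.fromhex(dump.replace(" ", ""))
    if len(raw) != len(ACB_FIELDS):
        raise ValueError("expected %d bytes, got %d" % (len(ACB_FIELDS), len(raw)))
    fields = dict(zip(ACB_FIELDS, raw))
    fields["name"] = ATA_OPCODES.get(fields["command"], "unknown")
    return fields


if __name__ == "__main__":
    idle = decode_acb("e3 00 00 00 00 40 00 00 00 00 06 00")
    print(idle["name"], hex(idle["device"]), idle["sector_count"])
```

If the assumed layout is right, the only byte that differs between the idle/standby dumps and the sleep dump, apart from the opcode, is the sector count (06 versus 00), which is where the timer value would be carried.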

I'd rather not depend on APM for the Samsung F4 disks, as the latest APM
profile used by the F4 disks is extremely aggressive, spinning down the disks
only a few seconds after each access.

Using ataidle -I 15 -S 30 works absolutely fine on the F4 disk in FreeBSD 7.2
(spinning the disks down after 30 minutes of inactivity), but this is when the
F4 disks are connected to a "standard" motherboard SATA controller (e.g. AMD
SB700), and obviously ataidle isn't working at all for the LSI 9211-8i with
the mps0 driver.

Any guidance on this problem will be much appreciated!

Many thanks
Neil

--
View this message in context: http://freebsd.1045724.n5.nabble.com/mps0-driver-idle-standby-sleep-ie-spin-down-supported-SATA-drives-tp4335508p4335508.html
Sent from the freebsd-scsi mailing list archive at Nabble.com.
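
A side note on the timer value, offered as a sketch rather than a diagnosis: the ATA standby timer is a single byte, where 1 to 240 count in units of 5 seconds and 241 to 251 count in units of 30 minutes. If the 06 in the idle/standby ACB dumps is that byte (an assumption), it would correspond to 30 seconds, not 30 minutes, which suggests "-t 30" may be interpreted in seconds; camcontrol(8) is the place to confirm that. The function name standby_timer_seconds is hypothetical.

```python
def standby_timer_seconds(value):
    """Decode an ATA standby timer byte into seconds.

    Returns None where the byte does not map to a fixed period
    (0 = timer disabled, 253 = vendor specific, 254 = reserved).
    """
    if not 0 <= value <= 255:
        raise ValueError("standby timer is a single byte")
    if value == 0:
        return None                       # timer disabled
    if value <= 240:
        return value * 5                  # 5 s .. 20 min, 5 s steps
    if value <= 251:
        return (value - 240) * 30 * 60    # 30 min .. 5.5 h, 30 min steps
    if value == 252:
        return 21 * 60                    # 21 minutes
    if value == 255:
        return 21 * 60 + 15               # 21 minutes 15 seconds
    return None                           # 253 vendor specific, 254 reserved


if __name__ == "__main__":
    print(standby_timer_seconds(6))       # byte seen in the ACB dumps
    print(standby_timer_seconds(241))     # byte that encodes 30 minutes
```

None of this explains the "Function Not Available" status, which is returned before the timer value matters; it only bears on what value to pass once the command is accepted.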