Date:      Thu, 12 Oct 2000 16:34:24 +0200
From:      sthaug@nethelp.no
To:        gibbs@scsiguy.com
Cc:        freebsd-scsi@FreeBSD.ORG
Subject:   Re: Stressed SCSI subsystem locks up the system 
Message-ID:  <54202.971361264@verdi.nethelp.no>
In-Reply-To: Your message of "Wed, 11 Oct 2000 05:27:31 +0000"
References:  <200010110527.e9B5RV603276@aslan.scsiguy.com>

> As always, I am interested in knowing the details of this problem and
> would like to resolve it.  The easiest way to do this is to switch
> over to using 4.1-stable built from source so I can work directly with
> the site to debug the problem.

We have a similar problem, though it may not be the same one. We have a
mail server with the following SCSI configuration:

ahc0: <Adaptec aic7890/91 Ultra2 SCSI adapter> port 0xe800-0xe8ff mem 0xfebff000-0xfebfffff irq 10 at device 14.0 on pci0
aic7890/91: Wide Channel A, SCSI Id=7, 32/255 SCBs
sa0 at ahc0 bus 0 target 2 lun 0
sa0: <ARCHIVE Python 04106-XXX 7270> Removable Sequential Access SCSI-2 device 
sa0: 7.812MB/s transfers (7.812MHz, offset 15)
da0 at ahc0 bus 0 target 0 lun 0
da0: <SEAGATE ST39173LW 6246> Fixed Direct Access SCSI-2 device 
da0: 80.000MB/s transfers (40.000MHz, offset 15, 16bit)
da0: 8683MB (17783240 512 byte sectors: 255H 63S/T 1106C)
da1 at ahc0 bus 0 target 6 lun 0
da1: <IBM DDRS-39130D DC1B> Fixed Direct Access SCSI-2 device 
da1: 80.000MB/s transfers (40.000MHz, offset 15, 16bit)
da1: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)

This server has been extremely stable with 4.1-STABLE and earlier. With
4.1.1-STABLE we have had two crashes with "page fault while in kernel
mode", after which the system hangs while trying to sync the disks (but
still responds to ping!). The instruction pointer printed is 0xc0135167
(the same in both cases), which is inside ahc_action():

c0134ca8 T ahc_done
c0134f78 t ahc_action
c01358bc t ahc_get_tran_settings
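
For reference, the lookup behind "inside ahc_action()" can be sketched as
follows: take the last symbol in the address-sorted nm output whose address
does not exceed the faulting instruction pointer. This is only a minimal
illustration; struct sym and sym_lookup() are hypothetical names, and the
table holds just the three entries quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of an address-sorted symbol table, as printed by nm(1). */
struct sym {
	uint32_t    addr;
	const char *name;
};

/* The three entries quoted above, in ascending address order. */
static const struct sym syms[] = {
	{ 0xc0134ca8u, "ahc_done" },
	{ 0xc0134f78u, "ahc_action" },
	{ 0xc01358bcu, "ahc_get_tran_settings" },
};

/*
 * Return the last symbol whose address is <= ip, or NULL if ip lies
 * below the first entry.  The table must be sorted by address.
 */
static const struct sym *
sym_lookup(uint32_t ip)
{
	const struct sym *best = NULL;
	size_t i;

	for (i = 0; i < sizeof(syms) / sizeof(syms[0]); i++) {
		if (syms[i].addr > ip)
			break;
		best = &syms[i];
	}
	return best;
}
```

Calling sym_lookup(0xc0135167) returns the ahc_action entry; subtracting its
address from the instruction pointer gives an offset of 0x1ef (495 decimal)
into the function.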

Specifically, line 441 in ahc_action, from

$FreeBSD: src/sys/dev/aic7xxx/aic7xxx_freebsd.c,v 1.3.2.1 2000/09/23 00:24:03 gibbs Exp $

436                     if ((scb = ahc_get_scb(ahc)) == NULL) {
437             
438                             ahc_lock(ahc, &s);
439                             ahc->flags |= AHC_RESOURCE_SHORTAGE;
440                             ahc_unlock(ahc, &s);
441                             xpt_freeze_simq(sim, /*count*/1);
442                             ahc_set_transaction_status(scb, CAM_REQUEUE_REQ);
443                             xpt_done(ccb);
444                             return;

Line 441 of "../../dev/aic7xxx/aic7xxx_freebsd.c" starts at address 0xc0135159 <ahc_action+481> and ends at 0xc0135175 <ahc_action+509>.
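
One thing that stands out in the excerpt: this branch is only entered when
ahc_get_scb() returned NULL, yet scb is then passed to
ahc_set_transaction_status() on line 442, directly after the line gdb
reports. Whether that helper dereferences its argument is not shown here,
so the following is only a sketch under that assumption, not the driver's
code. All types, constants and function names below are hypothetical mocks;
the point is merely that a status can be recorded on the request itself
when no SCB exists.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_REQUEUE_REQ 0x13u	/* placeholder status value for this sketch */

/* Hypothetical stand-ins for the request and the controller block. */
struct mock_ccb { uint32_t status; };
struct mock_scb { struct mock_ccb *io_ctx; };

/*
 * Mock of a setter that reaches the request through the SCB (assumed
 * behaviour).  It has to dereference scb, so a NULL scb is invalid.
 * The mock returns -1 in that case instead of faulting.
 */
static int
set_status_via_scb(struct mock_scb *scb, uint32_t status)
{
	if (scb == NULL)
		return -1;
	scb->io_ctx->status = status;
	return 0;
}

/* Alternative: record the status directly on the request. */
static void
set_status_on_ccb(struct mock_ccb *ccb, uint32_t status)
{
	ccb->status = status;
}

/*
 * Models the resource-shortage branch: no SCB was available, so scb is
 * NULL.  Returns 1 if the direct setter had to be used, 0 otherwise.
 */
static int
shortage_path(struct mock_ccb *ccb)
{
	struct mock_scb *scb = NULL;

	if (set_status_via_scb(scb, MOCK_REQUEUE_REQ) != 0) {
		set_status_on_ccb(ccb, MOCK_REQUEUE_REQ);
		return 1;
	}
	return 0;
}
```

In this mock, shortage_path() always takes the direct route, because the
SCB-based setter has nothing to work with on this path.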

At the moment I'm tempted to simply revert to the 4.1-STABLE code on this
host. It looks like the differences between 4.1-STABLE and 4.1.1-STABLE are
rather large - aic7xxx_freebsd.c doesn't exist in 4.1-STABLE, ahc_action is
in aic7xxx.c instead:

$FreeBSD: src/sys/dev/aic7xxx/aic7xxx.c,v 1.41.2.1 2000/03/18 23:00:11 gibbs Exp $

Any suggestions before I revert to 4.1-STABLE?

Steinar Haug, Nethelp consulting, sthaug@nethelp.no

