Date:      07 Dec 2002 15:09:18 -0500
From:      Dan Pelleg <daniel+bsd@pelleg.org>
To:        Mike Hoskins <mike@adept.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: RELEASE crash - SCSI related?
Message-ID:  <u2s3cp9h9qp.fsf@gs166.sp.cs.cmu.edu>
In-Reply-To: <20021206135205.O98942-100000@fubar.adept.org>
References:  <20021206135205.O98942-100000@fubar.adept.org>

Mike Hoskins <mike@adept.org> writes:

> On Fri, 6 Dec 2002, Dan Pelleg wrote:
> > This NFS server would crash every now and then (once in a few weeks,
> > seems to be correlated with heavy disk activity). Auto fsck will usually
> > fail and occasionally a few gigs of data will be lost. I'm beginning to
> > suspect the disk array
> 
> What sort of disks, array, etc. are you using?
> 

ahc1: <Adaptec aic7899 Ultra160 SCSI adapter> port 0xd800-0xd8ff mem 0xfeaff000-0xfeafffff irq 10 at device 5.1 on pci0
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
...
da2 at ahc1 bus 0 target 0 lun 0
da2: <IFT IFT-7200 0132> Fixed Direct Access SCSI-4 device 
da2: 40.000MB/s transfers (20.000MHz, offset 31, 16bit), Tagged Queueing Enabled
da2: 667743MB (1367537920 512 byte sectors: 255H 63S/T 19589C)

It's a SCSI-to-ATA controller configured as RAID-5. (In this dmesg it's
slowed down to 40MB/s; it usually runs at 160.)
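
As a quick sanity check on the da2 lines above, the reported capacity and transfer rate can be recomputed from the raw figures. This is only a sketch: it assumes "MB" in the capacity line means 2**20 bytes and that the transfer rate is the sync frequency times the bus width in bytes. Both assumptions happen to reproduce the printed numbers, and the function names are made up for illustration.

```python
# Values copied from the da2 lines in the dmesg output above.
SECTORS = 1367537920      # "1367537920 512 byte sectors"
SECTOR_BYTES = 512
SYNC_MHZ = 20.0           # "20.000MHz"
BUS_WIDTH_BITS = 16       # "16bit"


def capacity_mb(sectors, sector_bytes):
    """Capacity in MB, assuming 1 MB = 2**20 bytes, truncated to an integer."""
    return sectors * sector_bytes // 2**20


def transfer_mb_per_s(freq_mhz, width_bits):
    """Assumed relation: rate = sync frequency (MHz) * bus width in bytes."""
    return freq_mhz * (width_bits // 8)


if __name__ == "__main__":
    print(capacity_mb(SECTORS, SECTOR_BYTES))            # compare with 667743MB
    print(transfer_mb_per_s(SYNC_MHZ, BUS_WIDTH_BITS))   # compare with 40.000MB/s
```

Under the same assumed formula, a 16-bit bus would need an 80MHz sync rate to reach 160MB/s, so the 20MHz negotiated here is a quarter of the adapter's Ultra160 rating.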

I have softupdates on (also quotas, if that matters).

> > #0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:487
> > #1  0xc01c1c97 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:316
> > #2  0xc01c20bc in poweroff_wait (junk=0xc032b0c0, howto=-964112384)
> >     at /usr/src/sys/kern/kern_shutdown.c:595
> > #3  0xc0172b0c in ahc_search_qinfifo (ahc=0xc688d000, target=0, channel=65 'A', lun=0,
> >     tag=210, role=ROLE_INITIATOR, status=0, action=SEARCH_COUNT)
> >     at /usr/src/sys/dev/aic7xxx/aic7xxx.c:5378
> > #4  0xc0178c04 in ahc_timeout (arg=0xc68a45a8)
> >     at /usr/src/sys/dev/aic7xxx/aic7xxx_osm.c:1608
> > #5  0xc01c7ba5 in softclock () at /usr/src/sys/kern/kern_timeout.c:131
> > #6  0xc02fa700 in splz_swi ()
> 
> 
> This has been behaving.  Do you have a similarly configured server where
> you could try building a -STABLE snapshot?  That obviously doesn't negate
> the need to resolve this issue, but may get you up and running until a
> solution is found.
> 

Oh, I'm up and I'm running. It's just that every once in a while I'm not
"running" anymore, and if I'm unlucky, before I'm "up" again there are
a good few hours of fsck, a filled-up lost+found, and data loss.

I don't have a spare to test -STABLE against. I'm not even sure I can
reproduce the crash. As I said, I'm suspecting the array or the cabling at
this point. But while I'm talking to vendors to address both of these
non-FreeBSD issues I would like to know if there's anything at the kernel
level I could be doing. For example, am I more likely to come up cleanly if
I turn softupdates off?

-- 

  Dan Pelleg
