From owner-freebsd-current Tue Oct 13 11:56:56 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id LAA00768
        for freebsd-current-outgoing; Tue, 13 Oct 1998 11:56:56 -0700 (PDT)
        (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id LAA00762
        for ; Tue, 13 Oct 1998 11:56:52 -0700 (PDT)
        (envelope-from gibbs@narnia.plutotech.com)
Received: (from gibbs@localhost)
        by narnia.plutotech.com (8.9.1/8.7.3) id MAA10377;
        Tue, 13 Oct 1998 12:49:35 -0600 (MDT)
Date: Tue, 13 Oct 1998 12:49:35 -0600 (MDT)
From: "Justin T. Gibbs"
Message-Id: <199810131849.MAA10377@narnia.plutotech.com>
To: Neil Blakey-Milner
cc: current@FreeBSD.ORG
Subject: Re: Some SCSI(?) problems whilst running SMP
Newsgroups: pluto.freebsd.current
In-Reply-To: <19981013134739.A26388@rucus.ru.ac.za>
User-Agent: tin/pre-1.4-971204 (UNIX) (FreeBSD/3.0-BETA (i386))
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> Ok, the problem is this:
>
> When we enable SMP support, within any time from an hour to 6 days, we will
> die with SCSI errors - of late "SCB timeout handled by another timeout" I
> think is the proferred explanation. The "death" seems to occur quickly after
> extensive access to the disks, but it also just dies arbitrarily, usually
> after the machine has been up for a few days. It doesn't seem to be specific
> to any drive failing either. (we've swapped drives around, etc)

Crap. I was hoping this problem had been resolved since the people who
usually complain about it (Hi Mark M.!) have been silent in recent
months. My guess is that the callout free list has become corrupted
somehow, but I don't know enough about our SMP implementation to know
where to start looking for re-entrancy problems.
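
As a rough illustration of what a corrupted free list looks like in
practice, the sketch below walks a singly linked free list and reports
a pointer that strays outside the backing pool or a list that loops
back on itself. It is a user-space sketch only, not the code in
kern_timeout.c: struct entry, enum list_state, in_pool() and
check_free_list() are all hypothetical names, and the assumption that
every entry lives in one contiguous pool is made purely for the
example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a timeout entry; NOT the kernel's struct callout. */
struct entry {
    struct entry *next;   /* next free entry, or NULL at the end of the list */
};

enum list_state { LIST_OK, LIST_BAD_POINTER, LIST_CYCLE };

/* Return nonzero if e points somewhere inside pool[0..n-1]. */
static int
in_pool(const struct entry *e, const struct entry *pool, size_t n)
{
    uintptr_t p = (uintptr_t)e;
    uintptr_t lo = (uintptr_t)pool;
    uintptr_t hi = (uintptr_t)(pool + n);

    return p >= lo && p < hi;
}

/*
 * Walk a free list whose entries are all assumed to live in pool[0..n-1].
 * A pointer outside the pool is reported as LIST_BAD_POINTER.  A walk that
 * visits more than n entries can only happen if the list loops back on
 * itself, so it is reported as LIST_CYCLE.
 */
static enum list_state
check_free_list(const struct entry *head, const struct entry *pool, size_t n)
{
    size_t steps = 0;
    const struct entry *e;

    for (e = head; e != NULL; e = e->next) {
        if (!in_pool(e, pool, n))
            return LIST_BAD_POINTER;
        if (++steps > n)
            return LIST_CYCLE;
    }
    return LIST_OK;
}
```

Two CPUs manipulating the list head at the same time without mutual
exclusion is one way such a list could end up with a loop or a stray
pointer, which is why the bounded walk checks for both.
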
All reports I've seen of this problem have been under SMP only, and I
don't have any SMP equipment here to try to reproduce it. Can you turn
the printf in sys/dev/aic7xxx/aic7xxx.c:ahc_timeout() into a panic,
drop into GDB, and examine the data structures in kern_timeout.c?

--
Justin