Date:      Thu, 04 Jun 1998 12:12:30 -0400 (EDT)
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        Bob Willcox <bob@luke.pmr.com>
Cc:        Michael Hancock <michaelh@cet.co.jp>, "freebsd-current@freebsd.org" <freebsd-current@FreeBSD.ORG>, tcobb <tcobb@staff.circle.net>, Karl Pielorz <kpielorz@tdx.co.uk>, Mike Smith <mike@smith.net.au>, Greg Lehey <grog@lemis.com>
Subject:   Re: DPT driver fails and panics with Degraded Array
Message-ID:  <XFMail.980604121230.shimon@simon-shapiro.org>
In-Reply-To: <19980603073200.A16652@pmr.com>

On 03-Jun-98 Bob Willcox wrote:
 
...

>> Why would a driver call biodone on a buffer that doesn't belong to it?
> 
> Probably not relevant, but in the DPT device driver that I wrote for AIX
> I had to put some pretty ugly validity checks in the interrupt code to
> prevent my driver from trying to do an iodone (AIX's version of biodone)
> on already completed (or purged, I don't remember for sure...it's been
> over a year now) commands.  Seems that the DPT firmware would (on
> occasion) interrupt with a status packet that pointed to a ccb that my
> driver had already completed.  As I recall this would only happen under
> heavy load and it was pretty intermittent.  As far as I know, it was
> never actually fixed.

The FreeBSD driver actually does exactly that.  I encountered exactly that
situation in earlier firmware revisions (7H1 or so).  I put more defenses
in the driver than necessary.  Later revisions of the firmware (7L0 or so)
took care of the problem, but the defensive code stayed, under #ifdefs.

Many of these problems are actually (arguably?) induced by timing problems
on the PCI bus.  Certain PCI-PCI bridges (or even motherboard ``main''
chipsets) will deliver interrupts, I/O bus transactions and memory
transactions out of order when hammered very rapidly, under heavy load, or
both.  We proved it clearly with certain ``industrial'' computers, and
certain motherboards, by making the symptoms go away (or drastically
change) as you move the DPT, video cards, Ethernet cards, etc. from slot to
slot.

If one is really paranoid, one can enable DPT_VERIFY_HINTR to get this code
back.  Even more severe cases of paranoia can be satisfied by enabling
DPT_HANDLE_TIMEOUTS.  For those who are as sick as I am, you can define
DPT_INTR_DELAY as some small integer.

What these do is, in the order listed:

DPT_VERIFY_HINTR:  Mark and stamp each CCB so as to guarantee that it is
not handled twice.

DPT_HANDLE_TIMEOUTS:  Turn on an elaborate mechanism that will track
transactions (CCBs) that seem to linger on beyond their useful life.

DPT_INTR_DELAY:  Cause the interrupt service routine to spin a little
bit, giving the hardware a chance to settle before dpt_intr gets all
excited about it.
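
To make the three ideas concrete, here is a rough userland sketch in C.  It
is not the driver's code: the struct ccb layout, the state names, and the
functions ccb_submit, ccb_complete, ccb_count_overdue and intr_settle are
all made up for illustration, and time is just an integer tick count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CCB states and layout; NOT the real driver structures. */
enum ccb_state { CCB_FREE, CCB_ACTIVE, CCB_DONE };

struct ccb {
    enum ccb_state state;
    uint32_t       stamp;      /* generation, bumped on every submit */
    uint64_t       submitted;  /* tick at which the CCB was queued */
};

/* Queue a CCB: mark it active and hand back a fresh stamp. */
static uint32_t
ccb_submit(struct ccb *c, uint64_t now)
{
    assert(c->state != CCB_ACTIVE);  /* never resubmit a live CCB */
    c->state = CCB_ACTIVE;
    c->submitted = now;
    return ++c->stamp;
}

/*
 * The DPT_VERIFY_HINTR idea: accept a status packet only if the CCB is
 * still active and carries the stamp the packet claims.  Returns 1 if
 * the CCB was completed, 0 if the packet was stale or a duplicate.
 */
static int
ccb_complete(struct ccb *c, uint32_t stamp)
{
    if (c->state != CCB_ACTIVE || c->stamp != stamp)
        return 0;
    c->state = CCB_DONE;  /* a driver would call biodone() about here */
    return 1;
}

/*
 * The DPT_HANDLE_TIMEOUTS idea, much simplified: count active CCBs that
 * have been outstanding for more than `limit` ticks.
 */
static int
ccb_count_overdue(const struct ccb *tab, int n, uint64_t now, uint64_t limit)
{
    int i, overdue = 0;

    for (i = 0; i < n; i++)
        if (tab[i].state == CCB_ACTIVE && now - tab[i].submitted > limit)
            overdue++;
    return overdue;
}

/*
 * The DPT_INTR_DELAY idea: burn a few cycles before looking at the
 * hardware.  A kernel would use a calibrated delay, not a bare loop.
 */
static void
intr_settle(unsigned int spins)
{
    volatile unsigned int i;

    for (i = 0; i < spins; i++)
        ;
}
```

The point of the stamp is that a second status packet for the same CCB, or
a late one for a CCB that has since been reused, fails the check and is
dropped instead of reaching biodone a second time.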

Simon




---


Sincerely Yours, 

Simon Shapiro                                           Shimon@Simon-Shapiro.ORG
                                                        770.265.7340

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message