Date:      Mon, 01 Jun 1998 21:21:13 -0400 (EDT)
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        Mike Smith <mike@smith.net.au>
Cc:        "freebsd-current@freebsd.org" <freebsd-current@FreeBSD.ORG>, "freebsd-scsi@freebsd.org" <freebsd-scsi@FreeBSD.ORG>, tcobb <tcobb@staff.circle.net>
Subject:   Re: DPT Redux
Message-ID:  <XFMail.980601212113.shimon@simon-shapiro.org>
In-Reply-To: <199805310309.UAA09016@antipodes.cdrom.com>


On 31-May-98 Mike Smith wrote:
 ...

> Thanks for the extra info.  Are you able to simulate the failure by eg. 
> disconnecting one of the 'active' drives?  If you can't do this on a 
> regular basis, I believe we are able to arrange temporary access to a 
> similar but idle system where this can be simulated.  Simon may also be 
> able to offer some suggestions inre. possible poor interaction between 
> the dpt driver and some firmware revisions.

I have tested and simulated this problem.  Again, the DPT driver in FreeBSD
does not know a disk from an onion.  It simply passes SCSI SCBs formatted
by the abstraction layer to the controller, and passes results back.
From the controller model I can guess the range of firmware revisions in
question.  I have run tests on most of them, and, under normal conditions,
what is described simply does not happen.

I did find a window with these conditions:

*  During boot (and only during boot), while the scsi abstraction layer
   still runs in polled mode (interrupts off).

*  The DPT controller has enough bandwidth to accept commands one at a time.

*  The DPT controller then delays responding to commands 1,000 times longer
   than the SCSI abstraction layer (sd.c, in this case) specified.  In 3.0
   I reduced this to only 50 times longer.

*  When command completion is probed, the DPT will NOT report an error, but
   a successful condition, or no condition at all.

Under these conditions, the DPT driver could return a ``successful''
completion code.  In this case, the abstraction layer will post the device
with whatever capacity value was there before calling the DPT driver.
It is possible, under these conditions, that nonsense will be assumed.
The panic may be triggered by the SCSI abstraction layer trying to
interpret some of its trash as valid data.
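
The window described above can be sketched in a few lines of C.  This is
only an illustrative model, not the dpt driver, sd.c, or the submitted
patch: every name in it (fake_controller, poll_for_completion,
read_capacity_buggy, read_capacity_checked, probe_capacity) is made up for
the example, and it assumes the "1,000 times" and "50 times" figures act
as a multiplier on the timeout requested by the upper layer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes, not the real SCSI layer values. */
enum cmd_status { CMD_OK, CMD_TIMEOUT };

/* Hypothetical controller model: answers after a fixed number of polls. */
struct fake_controller {
    unsigned polls_until_done;  /* polls needed before a reply is ready */
    uint32_t real_blocks;       /* capacity it reports once it replies  */
};

/*
 * Polled (interrupts off) wait.  The budget is the timeout the upper
 * layer asked for, scaled by the driver's multiplier (assumption).
 */
static bool
poll_for_completion(const struct fake_controller *c,
                    unsigned requested, unsigned multiplier)
{
    unsigned budget = requested * multiplier;

    for (unsigned polls = 1; polls <= budget; polls++) {
        if (polls >= c->polls_until_done)
            return true;
    }
    return false;
}

/* The window: nothing came back, yet the caller is told "success". */
static enum cmd_status
read_capacity_buggy(const struct fake_controller *c, unsigned requested,
                    unsigned multiplier, uint32_t *blocks)
{
    if (poll_for_completion(c, requested, multiplier))
        *blocks = c->real_blocks;
    return CMD_OK;
}

/* Window closed: a timed-out poll is reported as an error. */
static enum cmd_status
read_capacity_checked(const struct fake_controller *c, unsigned requested,
                      unsigned multiplier, uint32_t *blocks)
{
    if (!poll_for_completion(c, requested, multiplier))
        return CMD_TIMEOUT;
    *blocks = c->real_blocks;
    return CMD_OK;
}

typedef enum cmd_status (*read_capacity_fn)(const struct fake_controller *,
                                            unsigned, unsigned, uint32_t *);

/*
 * Upper-layer side: the buffer holds whatever was there before the call,
 * so a false "success" leaves the stale value in place.
 */
static uint32_t
probe_capacity(read_capacity_fn rc, const struct fake_controller *c,
               unsigned requested, unsigned multiplier, uint32_t stale)
{
    uint32_t blocks = stale;

    if (rc(c, requested, multiplier, &blocks) != CMD_OK)
        return 0;   /* treat as "no usable device" */
    return blocks;
}
```

With a controller that needs 40,000 polls, a requested timeout of 100 and
a multiplier of 50, probe_capacity() hands back the stale value when given
read_capacity_buggy and 0 when given read_capacity_checked.  With a
multiplier of 1,000 both variants return the real capacity.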

Since the DPT driver does not supply, in its callback, any pointers, the
memory reference failure is most likely not directly induced by the DPT
driver.

A patch to close this window was submitted for review and will be checked
in as soon as the FreeBSD committer accepts the code as valid and
acceptable.

Summary:  There is a bit of ``pointing elsewhere'' here as, after thorough
review, I do not see the memory failure in the driver.  Neither do I see
any other defect.

As a historical curiosity, I have seen this failure mode in certain interim
DPT firmware versions.  The failure was in the firmware, and was induced by
a large array rebuild.  It was not restricted to the rebuild itself, but
trashed the array permanently.

I doubt that version of the firmware was supplied to the complainer in this
case.  Since I have not received any of the direct info I asked for, this
is but a wild guess.

Simon


---


Sincerely Yours, 

Simon Shapiro                                           Shimon@Simon-Shapiro.ORG
                                                        770.265.7340



