From: Simon Shapiro
To: Mike Smith
Cc: "freebsd-current@freebsd.org", "freebsd-scsi@freebsd.org", tcobb
Subject: Re: DPT Redux
Date: Mon, 01 Jun 1998 21:21:13 -0400 (EDT)
In-Reply-To: <199805310309.UAA09016@antipodes.cdrom.com>
Reply-To: shimon@simon-shapiro.org
Organization: The Simon Shapiro Foundation

On 31-May-98 Mike Smith wrote:

...

> Thanks for the extra info. Are you able to simulate the failure by eg.
> disconnecting one of the 'active' drives? If you can't do this on a
> regular basis, I believe we are able to arrange temporary access to a
> similar but idle system where this can be simulate. Simon may also be
> able to offer some suggestions inre. possible poor interaction between
> the dpt driver and some firmware revisions.

I have tested and simulated this problem. Again, the DPT driver in FreeBSD does not know a disk from an onion. It simply passes SCSI SCBs formatted by the abstraction layer to the controller, and passes results back.

From the controller model I can guess the range of firmware revisions in question. I have run tests on most of them, and, under normal conditions, what is described simply does not happen.

I did find a window with these conditions:

* During boot (and only during boot), while the SCSI abstraction layer still runs in polled mode (interrupts off).
* The DPT controller has enough bandwidth to accept commands one at a time.
* The DPT controller then delays responding to commands 1,000 times longer than the SCSI abstraction layer (sd.c, in this case) specified. In 3.0 I reduced this to only 50 times longer.
* When command completion is probed, the DPT will NOT report an error, but a successful condition, or no condition at all.

Under these conditions, the DPT driver could return a ``successful'' completion code. In this case, the abstraction layer will post the device with whatever capacity value was there before calling the DPT driver. It is possible, under these conditions, that nonsense will be assumed. The panic may be triggered by the SCSI abstraction layer trying to interpret some of its trash as valid data. Since the DPT driver does not supply any pointers in its callback, the memory reference failure is most likely not directly induced by the DPT driver.

A patch to close this window was submitted for review and will be checked in as soon as the FreeBSD committer accepts the code as valid and acceptable.

Summary: There is a bit of ``pointing elsewhere'' here as, after thorough review, I do not see the memory failure in the driver. Neither do I see any other defect.

As a historical curiosity, I have seen this failure mode in a certain interim DPT firmware version. The failure was in the firmware, and was induced by a large array re-build. It was not restricted to while-building, but caused the array to trash permanently. I doubt that version of the firmware was supplied to the complainer in this case. Since I have not received any direct info, as I asked for, this is but a wild guess.

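The window described above can be illustrated with a small, self-contained C sketch. It is not the DPT driver or the patch mentioned in this message; every name in it (`fake_ctlr`, `ctlr_probe`, `poll_for_completion`, `read_capacity_polled`, the `POLL_*` codes) is hypothetical. It only shows the general idea, under the assumption that a polled wait has three possible outcomes: the loop returns an explicit timeout instead of falling through to success, and the caller starts from a known capacity value and publishes it only on a real completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes; not the real FreeBSD or DPT definitions. */
enum poll_status { POLL_OK = 0, POLL_ERROR = 1, POLL_TIMEOUT = 2 };

/* Hypothetical stand-in for a controller that answers after some polls. */
struct fake_ctlr {
    unsigned polls_until_done; /* probes that return "not yet" first */
    bool     reports_error;    /* whether the completion carries an error */
    uint32_t capacity_blocks;  /* value a capacity query would return */
};

/* One probe of the controller: true once the command has completed. */
static bool ctlr_probe(struct fake_ctlr *c)
{
    if (c->polls_until_done == 0)
        return true;
    c->polls_until_done--;
    return false;
}

/* Polled wait that keeps "no answer yet" distinct from success. */
static enum poll_status
poll_for_completion(struct fake_ctlr *c, unsigned max_polls,
                    uint32_t *capacity)
{
    assert(max_polls > 0);
    for (unsigned i = 0; i < max_polls; i++) {
        if (!ctlr_probe(c))
            continue;               /* controller has not answered yet */
        if (c->reports_error)
            return POLL_ERROR;
        *capacity = c->capacity_blocks;
        return POLL_OK;
    }
    return POLL_TIMEOUT;            /* never fall through to a success code */
}

/* Caller side: *blocks is written only after a real completion. */
static bool
read_capacity_polled(struct fake_ctlr *c, unsigned max_polls,
                     uint32_t *blocks)
{
    uint32_t cap = 0;               /* known starting value, not stale data */

    if (poll_for_completion(c, max_polls, &cap) != POLL_OK)
        return false;
    *blocks = cap;
    return true;
}
```

With a controller that needs more probes than the caller allows, `read_capacity_polled` returns false and leaves the caller's value untouched, so nothing stale is posted as a device capacity.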
Simon

---

Sincerely Yours,

Simon Shapiro
Shimon@Simon-Shapiro.ORG
770.265.7340