From owner-freebsd-scsi  Sun Jul  6 16:37:12 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id QAA11446
          for freebsd-scsi-outgoing; Sun, 6 Jul 1997 16:37:12 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id QAA11418
          for <freebsd-SCSI@freebsd.org>; Sun, 6 Jul 1997 16:36:40 -0700 (PDT)
Received: (qmail 3287 invoked by uid 1000); 6 Jul 1997 23:36:19 -0000
Message-ID: <XFMail.970706163618.Shimon@i-Connect.Net>
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Date: Sun, 06 Jul 1997 16:36:18 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro <Shimon@i-Connect.Net>
To: freebsd-SCSI@freebsd.org
Subject: New Release - DPT RAID Controllers
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi Y'all SCSIers,

ftp.i-connect.net and/or sendero-ppp.i-connect.net now have version 1.1.6
of the FreeBSD driver for the DPT PCI SCSI RAID controllers.  These are in
the /pub/crash and /crash directories.  I will also upload a copy of the
patch to freefall.  This is against RELENG_2_2 as of today.

New in this release:

* Several new config options.  See sys/i386/conf/LINT for details.
  See more below on HANDLE_TIMEOUTS.

* Got rid of an annoying bug that caused biodone panics.

* SCSI software interrupts are now tested under heavy load (512 processes)
  and seem to be very healthy.

Patch 1.1.6 includes all these changes.

On SCSI Software Interrupts:

I was asked by a few of you what we have done here, so here it goes:

We simply mirrored the net ISR code, and put it one notch below the network
priority.  To use it, do the following:

#include <scsi/softisr.h>

Pick an interrupt.  I normally use SCSISR_DPT (bit 0).  There are 32 bits
in the mask.  For every interrupt, use a separate interrupt bit.  For some
strange reasons, the netisr code does not permit more than one interrupt
per source file.  As Justin Gibbs pointed out to me, you really do not need
more than that.

Next, write a routine that will execute every time that particular interrupt
happens.  Say, you call it foo_isr:

static void
foo_isr(void);

Is a good declaration, and the function should be written to match.
For this example, let us assume you want it to be associated with bit 7
of the SCSI software interrupts mask.
Remember:  when the function is called, it will be at a very high priority
(it appears to be higher than splbio()).  We really do not know why yet, but
it is under investigation.  In any case, minimize your critical section.  See
dpt_scsi.c for details.

Early in your code, put the following:

SCSI_SET(SCSISR_7, foo_isr);

then, at any point in your code, where you want foo_isr to execute, 
ASYNCHRONOUSLY with your code, call:

schedscsisr(SCSISR_7);

Once the kernel goes back to splzero, any request thus scheduled will be
called, at high priority!
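
To make the register/schedule/run sequence concrete, here is a small
user-space sketch of the idea.  It is NOT the code in scsi/softisr.h; the
names sim_set(), sim_sched() and sim_run_pending() are made up for this
illustration and only stand in for SCSI_SET(), schedscsisr() and the point
where the kernel drops back to base priority.  The only facts taken from
the text above are the 32-bit mask and the one-handler-per-bit rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSOFTISR 32                     /* the mask has 32 bits */

typedef void (*softisr_fn)(void);

static softisr_fn handlers[NSOFTISR];   /* one handler per mask bit */
static uint32_t pending;                /* bit n set => handler n requested */

/* Hypothetical stand-in for SCSI_SET(): bind a handler to one bit. */
static void sim_set(unsigned bit, softisr_fn fn)
{
    assert(bit < NSOFTISR);
    handlers[bit] = fn;
}

/* Hypothetical stand-in for schedscsisr(): only mark the bit as pending.
 * The handler is never called from here. */
static void sim_sched(unsigned bit)
{
    assert(bit < NSOFTISR);
    pending |= (uint32_t)1 << bit;
}

/* Hypothetical stand-in for the moment the kernel returns to base
 * priority: run each handler whose bit is pending, once.
 * Returns the number of handlers that ran. */
static int sim_run_pending(void)
{
    int ran = 0;
    uint32_t todo = pending;            /* snapshot, then clear */

    pending = 0;
    for (unsigned bit = 0; bit < NSOFTISR; bit++) {
        if ((todo & ((uint32_t)1 << bit)) && handlers[bit] != NULL) {
            handlers[bit]();
            ran++;
        }
    }
    return ran;
}

/* Example handler that just counts how often it was invoked. */
static int foo_calls;

static void foo_isr(void)
{
    foo_calls++;
}
```

Note that scheduling the same bit twice before the handlers run still
results in a single call in this sketch; the bit is a flag, not a counter.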

What is it good for?  Just like in the networking code, it allows you to
(essentially) start another thread of execution in the kernel.
For example, the normal SCSI HBA driver receives a request for I/O, tinkers
with it a bit (S/G, etc.) and then sends it to the hardware.  This last
action involves I/O bus operations and a moderate amount of polling.
Instead, the DPT driver (almost) always puts the request in a queue and
immediately tells the SCSI system ``queued successfully''.  It then
schedules a software interrupt.  The interrupt routine runs whenever it runs
and processes the queue.  This allows I/O requests to never block on (or be
paced by) hardware.  Under moderate I/O loads, it is a waste of time.
Under heavy loads, it really makes a difference. What difference?
With 512 processes concurrently reading and writing raw devices, the load
average goes down from 280 to 0.03 (it went down to 20 with NET software
interrupts).  Yes, the system is still heavily loaded; Disk I/O can take as
long as 13 seconds to complete.  But, networking code, user code, etc. is 
still unhampered.  Actually, even asynch I/O (buffered) improves
dramatically.  The maximum wait goes down to 85us waiting for the controller
and 30us past the interrupt service.  On that test load, the best interrupt
latency is 3us and the worst 37us.  This is within 10us of an idle system.
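
The queue-then-defer pattern described above can be sketched roughly as
follows.  This is a user-space illustration, not the code in dpt_scsi.c:
struct request, drv_queue_request(), drv_softisr(), hw_submit() and the
queue size SWQ_SIZE are all hypothetical names and values chosen for the
example.  Returning an error when the queue is full is also my assumption
here; the text only says the driver "(almost) always" queues.

```c
#include <assert.h>
#include <stddef.h>

struct request {
    int id;                         /* hypothetical request payload */
};

#define SWQ_SIZE 128                /* hypothetical software queue size */

static struct request *swq[SWQ_SIZE];
static unsigned swq_head;           /* next entry to drain */
static unsigned swq_tail;           /* next free slot */
static int softisr_pending;         /* stands in for the scheduled bit */
static int submitted;               /* requests handed to "hardware" */

/* Fast path: queue the request, ask for a software interrupt and return
 * at once.  No hardware access and no polling happens here.
 * Returns 0 for "queued successfully", -1 if the queue is full. */
static int drv_queue_request(struct request *r)
{
    assert(r != NULL);
    if (swq_tail - swq_head == SWQ_SIZE)
        return -1;
    swq[swq_tail % SWQ_SIZE] = r;
    swq_tail++;
    softisr_pending = 1;
    return 0;
}

/* Stand-in for the slow part: bus operations and polling would go here. */
static void hw_submit(struct request *r)
{
    (void)r;
    submitted++;
}

/* Software interrupt routine: runs later and drains the whole queue. */
static void drv_softisr(void)
{
    softisr_pending = 0;
    while (swq_head != swq_tail) {
        hw_submit(swq[swq_head % SWQ_SIZE]);
        swq_head++;
    }
}
```

The point of the split is that the caller of drv_queue_request() never
waits on the hardware; all of that cost is paid inside drv_softisr().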

BTW, these numbers are with a queue of 64 commands on the DPT hardware.
Future releases of the firmware will increase that to 256, 1024, and 8192.

On DPT_HANDLE_TIMEOUTS:

Normally, the DPT driver has no timeout mechanism in it, nor does it need
one;  the firmware on the controller does all the I/O management, re-tries,
ECC, and other good stuff.

With this option, commands will time out after a while.  The timeout
mechanism works as follows:

Once booted, every ten seconds, dpt_handle_timeouts() will be called.
This function scans all submitted commands (sent to the DPT and not done 
yet).  If a SCSI command is older than what the SCSI upper layer wants it
to be (times the current number of requests on the controller), it is
tagged.  Tagged commands are given that much time again, to get done.
If not, they are destroyed, and the upper layer is notified of the failure.
This manifests itself (in functions that examine read/write syscall
results :-) as an I/O error to the program.  Nothing more.
If a command is completed during this grace period, it will be handled
as if nothing happened (except for a console message).  If the command
completes after destruction, the results are tossed away.  We simulated,
carefully, all these conditions and it all appears to work.
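
The tag-then-destroy logic above can be sketched like this.  It is a
user-space illustration only, not dpt_handle_timeouts() itself: struct cmd,
scan_timeouts() and complete_cmd() are hypothetical names, and reading
"that much time again" as "one more full limit, counted from the moment of
tagging" is my interpretation.  The caller is assumed to run the scan
periodically (every ten seconds in the driver).

```c
#include <assert.h>
#include <stddef.h>

enum cmd_state { CMD_ACTIVE, CMD_TAGGED, CMD_DESTROYED, CMD_DONE };

struct cmd {
    long submitted_at;      /* seconds; when it was sent to the controller */
    long timeout;           /* seconds; what the upper layer asked for */
    long tagged_at;         /* seconds; valid only in CMD_TAGGED */
    enum cmd_state state;
};

/* One pass of the periodic scan.  Returns how many commands were
 * destroyed in this pass (each would be reported as an I/O error). */
static int scan_timeouts(struct cmd *cmds, size_t n, long now)
{
    size_t outstanding = 0;
    int destroyed = 0;

    /* Count commands still on the controller; the limit scales with it. */
    for (size_t i = 0; i < n; i++)
        if (cmds[i].state == CMD_ACTIVE || cmds[i].state == CMD_TAGGED)
            outstanding++;

    for (size_t i = 0; i < n; i++) {
        struct cmd *c = &cmds[i];
        long limit = c->timeout * (long)outstanding;

        if (c->state == CMD_ACTIVE && now - c->submitted_at > limit) {
            c->state = CMD_TAGGED;          /* start the grace period */
            c->tagged_at = now;
        } else if (c->state == CMD_TAGGED && now - c->tagged_at > limit) {
            c->state = CMD_DESTROYED;       /* grace period used up */
            destroyed++;
        }
    }
    return destroyed;
}

/* Completion path.  Returns 1 if the result is delivered, 0 if it is
 * tossed because the command was already destroyed. */
static int complete_cmd(struct cmd *c)
{
    assert(c != NULL);
    if (c->state == CMD_DESTROYED)
        return 0;
    c->state = CMD_DONE;    /* a tagged command still completes normally */
    return 1;
}
```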

Why bother?  Well, try to put a DPT behind certain PCI bridges.
What happens then is that, on occasion, an interrupt will reach the DPT
interrupt service routine before the DMA transfer of the data has
stabilized across the bridge (the DPT always does a DMA of a status struct
followed by an interrupt).  The driver reacts to this nonsense by promptly
tossing the whole completion report (we have NO way of telling what the
corrupt mailbox-struct should have been).  While we so smartly tossed away
the corrupt message, the DPT has no way of sending it again (4us behind it
will be another DMA and another interrupt), and the upper layer is still
waiting for an event that will never happen.  The timeout hack allows the
application to be told about the failure and releases all the associated
resources, preventing a hang.

This is it for now.  Your feedback is very welcome.

Simon