Date:      Wed, 21 May 1997 17:05:21 +1000
From:      Bruce Evans <bde@zeta.org.au>
To:        brett@lariat.org, gurney_j@resnet.uoregon.edu
Cc:        HARDWARE@freebsd.org, rberndt@nething.com, WELCHDW@wofford.edu
Subject:   Re: isa bus and boca multiport boards
Message-ID:  <199705210705.RAA10446@godzilla.zeta.org.au>

>I was under the impression that this is what was already done, but now that
>I look at it, I see that you're right! The code loops on a variable called
>"unit", incrementing it from 0 to the precompiled constant NSIO. This can
>waste a great deal of time, especially since the interrupts are
>edge-triggered and the list is scanned at least twice per interrupt.

Many minor improvements are possible, but significant improvements are
difficult to achieve without a hardware register giving a bitmap of the
active interrupts.  I believe BocaBoards have register(s) for this.
Someone who has a BocaBoard should implement checking the register.
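
A minimal sketch of what checking such a register could buy, assuming
the board exposes a 16-bit status word with one bit per port that has
an interrupt pending.  The bit layout, NBOARDPORTS and pending_ports()
are hypothetical and not taken from any BocaBoard documentation; in a
driver the status word would come from an i/o read of the board's
register.

```c
#include <stdint.h>

#define NBOARDPORTS 16	/* hypothetical: ports on one multiport board */

/*
 * Decode a (hypothetical) interrupt status bitmap into the list of
 * port indices that need service, lowest port first.  Returns the
 * number of entries written to out[].  The loop stops as soon as no
 * set bits remain, so idle high-numbered ports cost nothing.
 */
static int
pending_ports(uint16_t status, int out[NBOARDPORTS])
{
	int port, n = 0;

	for (port = 0; status != 0; port++, status >>= 1) {
		if (status & 1)
			out[n++] = port;
	}
	return (n);
}
```

With this, one register read replaces one i/o per port just to find
out which ports are idle.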

All ports are checked for fairness.  When you have 16 active ports
on one board, it is too easy for ports on other boards to be starved.
Ports with smaller fifos should be given lower unit numbers so that they
get served first.  The siointr1() layer attempts to return early for
the COM_MULTIPORT case.  It doesn't do this very well - in the worst
case it does more than 3 * fifo_size i/o's per call to handle a full
receiver fifo and an empty transmitter fifo.  For fairness, it should
handle at most one or two events per call, but this would be inefficient.
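
To illustrate the trade-off, here is a sketch of a per-call event
budget.  It is not the siointr1() code; struct port, its counters and
service_port_bounded() are made-up stand-ins for the real fifo state.

```c
/* Hypothetical stand-in for a port's outstanding work. */
struct port {
	int rx_pending;		/* characters waiting in the receiver fifo */
	int tx_pending;		/* characters we could push to the transmitter */
};

/*
 * Handle at most `budget` events on one port, receive side first,
 * then return so that other ports get a turn.  Returns the number of
 * events actually handled.
 */
static int
service_port_bounded(struct port *p, int budget)
{
	int handled = 0;

	while (handled < budget && p->rx_pending > 0) {
		p->rx_pending--;	/* stands in for reading one character */
		handled++;
	}
	while (handled < budget && p->tx_pending > 0) {
		p->tx_pending--;	/* stands in for writing one character */
		handled++;
	}
	return (handled);
}
```

A small budget bounds how long one busy port can delay the others,
but draining a full fifo then takes many more calls, which is the
inefficiency mentioned above.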

>1) The code looks at a flag called "gone" on each and every port (present
>or not) during each and every interrupt service. Since the presence or
>absence of a port is determined at boot time, it'd be MUCH more efficient
>if the code worked down a linear list of only the ports that were present
>and on the relevant IRQ.

This would be slightly more efficient.  Not much more, because 16
com_addr(unit) != NULL && com_addr(unit)->gone != 0 checks can be done in
less time than it takes to do _one_ i/o for a present port on a modern
ix86 system.  Using linear lists instead of linked lists is good here
since it avoids cache misses.
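
A sketch of the compact linear list, for comparison.  The struct here
is a cut-down stand-in with only the two fields the example needs
(the irq field in particular is an assumption for illustration), and
com_table[] stands in for com_addr(unit).

```c
#include <stddef.h>

#define NSIO 16			/* compiled-in maximum number of units */

/* Simplified, hypothetical version of the per-port structure. */
struct com_s {
	int gone;		/* nonzero if the hardware is absent */
	int irq;		/* IRQ the port is attached to */
};

static struct com_s *com_table[NSIO];	/* stand-in for com_addr(unit) */

/*
 * Build, once, a contiguous array of the unit numbers that are
 * present and attached to `irq`.  Returns the number of entries.
 * The interrupt handler can then walk active[0..n-1] directly.
 */
static int
build_active_list(int irq, int active[NSIO])
{
	int unit, n = 0;

	for (unit = 0; unit < NSIO; unit++) {
		struct com_s *com = com_table[unit];

		if (com != NULL && !com->gone && com->irq == irq)
			active[n++] = unit;
	}
	return (n);
}
```

The array is contiguous, so walking it touches few cache lines; the
saving is only the cheap pointer and flag tests, not any i/o.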

>The edge-catching algorithm is also less efficient than it might be. To
>make sure you haven't missed an edge, you must scan the UARTs and get ALL
>THE WAY AROUND THE LIST ONCE without finding any more ports to service. You
>can then return from the ISR. The two best ways to do this are (a) set a

I never got around to implementing this.
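
For reference, the rule quoted above (stop only after one full trip
around the list finds nothing to do) can be sketched as below.  The
quoted text is cut off before it names its two methods, so this
consecutive-idle counter is just one possible way to do it and not
necessarily either of them.  The pending[] array simulates the
hardware; all names are hypothetical.

```c
#define NPORTS 4		/* hypothetical number of ports on the IRQ */

/* Simulated hardware: events waiting on each port. */
static int pending[NPORTS];

static int
port_needs_service(int i)
{
	return (pending[i] > 0);
}

static void
service_port(int i)
{
	pending[i]--;		/* stands in for handling one event */
}

/*
 * Poll the ports round-robin until NPORTS consecutive ports have been
 * found idle, i.e. one complete trip around the list with no work.
 * Returns the number of polls made.
 */
static int
scan_until_quiet(void)
{
	int i = 0, idle = 0, polls = 0;

	while (idle < NPORTS) {
		polls++;
		if (port_needs_service(i)) {
			service_port(i);
			idle = 0;	/* restart the quiet count */
		} else
			idle++;
		i = (i + 1) % NPORTS;
	}
	return (polls);
}
```

The scan ends exactly NPORTS polls after the last port that needed
service, instead of always finishing a whole extra pass from unit 0.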

>fact, it may scan as many as NSIO-1 extra ports on each interrupt. On a
>system with many serial ports, this is a LOT of extra time.

For 16 ports, it takes about half as long as to do the actual i/o for one
16550 port.

>Also, rather than dereferencing the pointer "com" again and again, the ISR
>could selectively enregister parts of the record that contain the comm
>port's statistics. 

It already does as much as possible.  ix86's don't have enough registers
to do much better, and in any case the compiler should do it.  This is
not very important for modern ix86's, since all accesses except the first
are cache hits so they take only one cycle.

>I also see a subroutine call that could be optimized out.

Subroutine calls are cheap, and inlining tends to give worse register
allocation.

>Finally, some things are variables that needn't be, such as I/O port
>numbers. Memory accesses are expensive and increments and decrements are
>cheap (or free due to pipelining if instructions are ordered properly). So,

Memory accesses are cheap (unless there is a cache miss, and all the
pre-computed register numbers in the com struct are contiguous, so
more than one cache miss would be unlucky).  They take the same time as
increments and decrements of registers on modern ix86's and tend to give
better register allocation.  Of course, you can do better in assembler
by doing perfect register allocation and pipelining, but the slow i/o
limits potential gains to a few percent (relative) and < 1% per port
(absolute).

>I don't know what the policy on ASM code is in FreeBSD, but this seems like
>an opportunity to do some VERY serious optimization where it's much needed!

I try to avoid it, and enjoy making serial drivers written in C several
times more efficient than previous and competing versions written in
assembler :-).

Bruce


