Date: Wed, 21 May 1997 17:05:21 +1000
From: Bruce Evans <bde@zeta.org.au>
To: brett@lariat.org, gurney_j@resnet.uoregon.edu
Cc: HARDWARE@freebsd.org, rberndt@nething.com, WELCHDW@wofford.edu
Subject: Re: isa bus and boca multiport boards
Message-ID: <199705210705.RAA10446@godzilla.zeta.org.au>
>I was under the impression that this is what was already done, but now that
>I look at it, I see that you're right! The code loops on a variable called
>"unit", incrementing it from 0 to the precompiled constant NSIO. This can
>waste a great deal of time, especially since the interrupts are
>edge-triggered and the list is scanned at least twice per interrupt.

Many minor improvements are possible, but significant improvements are difficult to achieve without a hardware register giving a bitmap of the active interrupts. I believe BocaBoards have register(s) for this. Someone who has a BocaBoard should implement checking the register.

All ports are checked, for fairness. When you have 16 active ports on one board, it is too easy for ports on other boards to be starved. Ports with smaller fifos should be given lower unit numbers so that they get served first.

The siointr1() layer attempts to return early for the COM_MULTIPORT case. It doesn't do this very well: in the worst case it does more than 3 * fifo_size i/o's per call to handle a full receiver fifo and an empty transmitter fifo. For fairness, it should handle at most one or two events per call, but this would be inefficient.

>1) The code looks at a flag called "gone" on each and every port (present
>or not) during each and every interrupt service. Since the presence or
>absence of a port is determined at boot time, it'd be MUCH more efficient
>if the code worked down a linear list of only the ports that were present
>and on the relevant IRQ.

This would be slightly more efficient. Not much more, because 16 com_addr(unit) != NULL && com_addr(unit)->gone != 0 checks can be done in less time than it takes to do _one_ i/o for a present port on a modern ix86 system. Using linear lists instead of linked lists is good here since it avoids cache misses.

>The edge-catching algorithm is also less efficient than it might be.
>To make sure you haven't missed an edge, you must scan the UARTs and get ALL
>THE WAY AROUND THE LIST ONCE without finding any more ports to service. You
>can then return from the ISR. The two best ways to do this are (a) set a

I never got around to implementing this.

>fact, it may scan as many as NSIO-1 extra ports on each interrupt. On a
>system with many serial ports, this is a LOT of extra time.

For 16 ports, it takes about half as long as doing the actual i/o for one 16550 port.

>Also, rather than dereferencing the pointer "com" again and again, the ISR
>could selectively enregister parts of the record that contain the comm
>port's statistics.

It already does as much as possible. ix86's don't have enough registers to do much better, and in any case the compiler should do it. This is not very important for modern ix86's, since all accesses except the first are cache hits, so they take only one cycle.

>I also see a subroutine call that could be optimized out.

Subroutine calls are cheap, and inlining tends to give worse register allocation.

>Finally, some things are variables that needn't be, such as I/O port
>numbers. Memory accesses are expensive and increments and decrements are
>cheap (or free due to pipelining if instructions are ordered properly). So,

Memory accesses are cheap (unless there is a cache miss; all the pre-computed register numbers in the com struct are contiguous, so more than one cache miss would be unlucky). They take the same time as increments and decrements of registers on modern ix86's and tend to give better register allocation. Of course, you can do better in assembler by doing perfect register allocation and pipelining, but the slow i/o limits potential gains to a few percent (relative) and < 1% per port (absolute).

>I don't know what the policy on ASM code is in FreeBSD, but this seems like
>an opportunity to do some VERY serious optimization where it's much needed!
I try to avoid it, and enjoy making serial drivers written in C several times more efficient than previous and competing versions written in assembler :-).

Bruce
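Bruce's suggestion of checking a hardware interrupt-status register could look roughly like the following C sketch. It is not sio driver code, and it rests on an assumption the thread does not settle: a 16-bit register (which Bruce only believes BocaBoards provide) in which bit i is set while port i has an interrupt pending. The real address, width and bit polarity would have to come from the board's documentation. The register read and the per-port work are simulated, and every name here (port_pending, read_intr_status(), service_port(), board_intr()) is hypothetical. Idle ports cost no i/o, lower unit numbers are served first within each read, and re-reading the register until it shows nothing pending plays the role of going all the way around the list once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: number of ports on one multiport board. */
#define BOARD_NPORTS 16

/* Hypothetical, much-reduced per-port state: events waiting to be handled. */
static int port_pending[BOARD_NPORTS];

/* Number of status-register reads; each would be one i/o in a driver. */
static long status_reads;

/*
 * Hypothetical stand-in for reading the board's interrupt-status register
 * (in a driver, something like inb(status_port)).  ASSUMPTION: bit i is set
 * while port i has an interrupt pending.  Here the bitmap is simply derived
 * from the simulated per-port state.
 */
static uint16_t read_intr_status(void)
{
    uint16_t bits = 0;
    int unit;

    status_reads++;
    for (unit = 0; unit < BOARD_NPORTS; unit++)
        if (port_pending[unit] > 0)
            bits |= (uint16_t)(1u << unit);
    return bits;
}

/* Placeholder for the per-port work; handles one event per call. */
static void service_port(int unit)
{
    assert(unit >= 0 && unit < BOARD_NPORTS);
    assert(port_pending[unit] > 0);
    port_pending[unit]--;
}

/*
 * Board interrupt handler: only ports whose bit is set are touched, lowest
 * unit first.  The register is re-read until it reports nothing pending,
 * so an edge cannot be lost between reads.  Returns events handled.
 */
static int board_intr(void)
{
    uint16_t pending;
    int unit, handled = 0;

    while ((pending = read_intr_status()) != 0) {
        for (unit = 0; unit < BOARD_NPORTS; unit++) {
            if (pending & (1u << unit)) {
                service_port(unit);
                handled++;
            }
        }
    }
    return handled;
}
```

For example, with ports 0, 3 and 5 holding 2, 1 and 1 pending events, board_intr() handles 4 events using 3 status-register reads and never touches an idle port.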
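The two changes proposed in the quoted text, walking a compact array of only the ports found at boot on one IRQ and returning only after a full trip around that array finds nothing to service, can be sketched in C as below. This is an illustration under stated assumptions, not how sio is written: each UART is simulated by an event counter, each visit handles at most one event (the fairness rule Bruce describes, which he notes would be inefficient in practice), and the names sim_uart, irq_ports, irq_attach(), service_one() and irq_service() are hypothetical.

```c
#include <assert.h>

/* Hypothetical limit on ports sharing one IRQ. */
#define MAX_PORTS_PER_IRQ 16

/*
 * Hypothetical simulated UART: "pending" counts events (received bytes,
 * transmitter-empty, ...) that are still waiting to be handled.
 */
struct sim_uart {
    int pending;
};

/* Compact array of only the ports found at boot time on one IRQ. */
struct irq_ports {
    struct sim_uart *port[MAX_PORTS_PER_IRQ];
    int n;
};

/* Number of port checks made; each would cost an i/o in a real driver. */
static long port_checks;

/* Called once per port found at probe time; absent ports are never added. */
static void irq_attach(struct irq_ports *ip, struct sim_uart *u)
{
    assert(ip->n < MAX_PORTS_PER_IRQ);
    ip->port[ip->n++] = u;
}

/* Check one port and handle at most one event; returns 1 if it had work. */
static int service_one(struct sim_uart *u)
{
    port_checks++;
    if (u->pending == 0)
        return 0;
    u->pending--;
    return 1;
}

/*
 * ISR body for an edge-triggered IRQ: keep cycling through the array and
 * return only after n consecutive ports (one full trip around the list
 * since the last port that had work) were found idle.  Returning earlier
 * could leave a port active with no new edge to trigger another interrupt.
 * Returns the number of events handled.
 */
static int irq_service(struct irq_ports *ip)
{
    int i = 0, idle_run = 0, handled = 0;

    while (idle_run < ip->n) {
        if (service_one(ip->port[i])) {
            handled++;
            idle_run = 0;
        } else {
            idle_run++;
        }
        i = (i + 1) % ip->n;
    }
    return handled;
}
```

For three ports with 2, 0 and 1 pending events, irq_service() handles all 3 events and stops after 7 port checks; with nothing pending it makes exactly 3 checks. Counting consecutive idle ports, instead of restarting a full pass from unit 0 whenever any port had work, limits the extra scanning to one trip around the list after the last busy port.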