Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 15 Oct 2008 19:41:39 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Scott Long <scottl@samsco.org>
Cc:        svn-src-head@FreeBSD.org, Marius Strobl <marius@FreeBSD.org>, src-committers@FreeBSD.org, svn-src-all@FreeBSD.org
Subject:   Re: svn commit: r183896 - head/sys/dev/bge
Message-ID:  <20081015184833.N43215@delplex.bde.org>
In-Reply-To: <48F5053D.7070705@samsco.org>
References:  <200810142028.m9EKShoL015514@svn.freebsd.org> <48F5053D.7070705@samsco.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 14 Oct 2008, Scott Long wrote:

> Marius Strobl wrote:
>> Author: marius
>> Date: Tue Oct 14 20:28:42 2008
>> New Revision: 183896
>> URL: http://svn.freebsd.org/changeset/base/183896
>> 
>> Log:
>>   Use bus_{read,write}_4(9) instead of bus_space_{read,write}_4(9)
>>   in order to get rid of the bus space handle and tag in the softc.
>
> Has anyone looked at the generated code from this interface switch and
> compared it what was getting generated previously?  Way back when,

I just looked at the source code.  This seems to be only a small
pessimization since it involves just one extra indirection and
related loss of optimization possibilities: (sc->sc_bt, sc->sc_bh)
becomes r = sc->sc_br; (r->r_bustag, r->r_bushandle).  In theory,
the compiler could optimize by caching the tag and handle in registers
in either case so that only 1 extra indirection is needed per function,
but I've never seen that being done enough to make much difference.

However, some drivers (e.g., ata) were too stupid to cache sc_bt and
sc_bh, and converting these to use bus_nonspace*() was a relatively large
optimization: r = sc->sc_br; (rman_get_bustag(r), rman_get_bushandle(r))
became r = sc->sc_br; (r->r_bustag, r->r_bushandle).  Since rman_get_bustag()
and rman_get_bushandle() have never been inline, calling both of them
on every bus space access gave enormous code space and instruction
count bloat and corresponding (relatively tiny) runtime bloat.  The
instructions normally run so much faster than the i/o that you can do
hundreds or thousands of them per i/o before noticing the runtime
bloat, and the instruction count bloat here is only about a factor of 20.

> including <machine/bus_memio.h> made bus_space_read|write_4() compile
> into a direct memory access on machines that supported it.  The dubious
> removal of bus_memio.h and bus_pio.h took away that benefit, and I'm
> afraid that it's only getting worse now.  Bus writes to card memory are
> still very important to high-performance devices and shouldn't be
> pessimized in the name of simpler-looking C code.

I hate the bloat too, but it rarely matters (see above).  I mainly noticed
it using ddb.  Where nice drivers are optimized to use a single instruction
per i/o (or perhaps 2 with minimal bus space bloat), the ones that use
rman_get*() on every i/o take 20+, and with a primitive debugger like ddb
it is painful to even skip over all the function calls.

Which devices have fast enough i/o for the extra indirection to matter?
bge tries hard to avoid all PCI access in time-critical code.  I think
we reduced the PCI accesses to 1 PCI write per interrupt.  Most device
accesses for bge use host memory which is DMAed to/from by the hardware,
so bus space doesn't apply.  This type of access seems to be the best
or only way for a PCI device to go fast enough (though it is still too
slow on at least i386 since the DMA always causes cache misses).

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20081015184833.N43215>