Date:      Mon, 03 Sep 2012 10:19:14 -0600
From:      Ian Lepore <freebsd@damnhippie.dyndns.org>
To:        freebsd-arch@freebsd.org
Subject:   Some busdma stats
Message-ID:  <1346689154.1140.601.camel@revolution.hippie.lan>

I decided that a good way to learn more about the busdma subsystem would
be to actually work with the code rather than just reading it. 

Regardless of whether we eventually fix every driver to eliminate
transfers that aren't aligned to cache line boundaries, or somehow
change the busdma code to automatically bounce unaligned requests, we
need efficient allocation of small buffers aligned and sized to cache
lines.  I wrote some code to use uma(9) to manage pools of aligned
buffers based on size, and set up a pool of uncacheable/coherent buffers
and a pool of "regular memory" buffers.

One thing that jumps right out at you is that pretty much every call to
bus_dmamem_alloc() these days uses the BUS_DMA_COHERENT flag, at least
for the devices on my dreamplug unit:

root@dpcur:/root # vmstat -z | egrep "ITEM|dma"
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
dma maps:                48,      0,    1838,      34,    2022,   0,   0
dma buffer 32:           32,      0,       0,       0,       0,   0,   0
dma buffer 64:           64,      0,       0,       0,       0,   0,   0
dma buffer 128:         128,      0,       0,       0,       0,   0,   0
dma buffer 256:         256,      0,       2,      28,       2,   0,   0
dma buffer 512:         512,      0,       0,       0,       0,   0,   0
dma buffer 1024:       1024,      0,       0,       0,       0,   0,   0
dma buffer 2048:       2048,      0,       0,       0,       0,   0,   0
dma buffer 4096:       4096,      0,       0,       0,       0,   0,   0
dma coherent 32:         32,      0,    1024,     106,    1024,   0,   0
dma coherent 64:         64,      0,       0,       0,       0,   0,   0
dma coherent 128:       128,      0,     129,      51,     129,   0,   0
dma coherent 256:       256,      0,     297,      33,     333,   0,   0
dma coherent 512:       512,      0,       8,      16,      16,   0,   0
dma coherent 1024:     1024,      0,      16,       4,      32,   0,   0
dma coherent 2048:     2048,      0,      12,       0,      24,   0,   0
dma coherent 4096:     4096,      0,      13,       1,      13,   0,   0

These stats represent every call to bus_dmamem_alloc() except for the
SATA devices, which do a single allocation of just over 17K -- also
using BUS_DMA_COHERENT -- which is large enough that it gets handled by
kmem_alloc_contig() rather than by the pool allocator.  The 'dma maps'
number is the number of maps created by bus_dmamap_create() -- that is,
it doesn't include maps automatically created by bus_dmamem_alloc().

Efficient allocation of small buffers (not wasting a page per allocation
due to either alignment or cacheability issues) should help pave the way
for fixing existing drivers to allocate individual buffers for each DMA
transfer rather than allocating a huge area and attempting to sub-divide
it internally (which is bad, since a driver doesn't have all the info
needed to sub-divide it safely and avoid partial cache line flushes).

The new allocator code is architecture-agnostic and could be used by any
busdma implementation that wants to manage pools of small aligned
buffers.  It's not clear to me where the .c and .h file for such a thing
should live within src/sys (right now for testing I've got them in
arm/arm and arm/include).

-- Ian
