From: Ian Lepore
To: freebsd-arch@freebsd.org
Date: Mon, 03 Sep 2012 10:19:14 -0600
Subject: Some busdma stats

I decided that a good way to learn more about the busdma subsystem would
be to actually work with the code rather than just reading it.
Regardless of whether we eventually fix every driver to eliminate
transfers that aren't aligned to cache line boundaries, or somehow
change the busdma code to automatically bounce unaligned requests, we
need efficient allocation of small buffers aligned and sized to cache
lines.
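To make "aligned and sized to cache lines" concrete, here is a minimal userland sketch, not the kernel code. It assumes a 32-byte cache line (the real value is CPU-specific), and the names CACHE_LINE_SIZE_ASSUMED, cacheline_roundup() and cacheline_safe() are hypothetical helpers invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration only; the real line size depends on the CPU. */
#define CACHE_LINE_SIZE_ASSUMED 32u

/* Round a length up to the next multiple of the assumed cache line size. */
static inline size_t
cacheline_roundup(size_t len)
{
	const size_t mask = CACHE_LINE_SIZE_ASSUMED - 1;

	return ((len + mask) & ~mask);
}

/*
 * True if the range [addr, addr + len) begins and ends on cache line
 * boundaries, so cache maintenance on it cannot touch a line shared
 * with unrelated data.
 */
static inline bool
cacheline_safe(uintptr_t addr, size_t len)
{
	const uintptr_t mask = CACHE_LINE_SIZE_ASSUMED - 1;

	return (len != 0 && (addr & mask) == 0 && (len & mask) == 0);
}

/* Small usage example (hypothetical addresses). */
static inline void
cacheline_examples(void)
{
	assert(cacheline_roundup(33) == 64);
	assert(cacheline_safe(0x1000, 64));	/* aligned start, whole lines */
	assert(!cacheline_safe(0x1004, 64));	/* start is mid-line */
	assert(!cacheline_safe(0x1000, 48));	/* end is mid-line */
}
```

A buffer that fails the second check shares a cache line with its neighbours at one end or the other, which is the case that forces a partial cache line flush.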

I wrote some code to use uma(9) to manage pools of aligned buffers based
on size, and set up a pool of uncacheable/coherent buffers and a pool of
"regular memory" buffers. One thing that jumps right out at you is that
pretty much every call to bus_dmamem_alloc() these days uses the
BUS_DMA_COHERENT flag, at least for the devices on my dreamplug unit:

root@dpcur:/root # vmstat -z | egrep "ITEM|dma"
ITEM                  SIZE  LIMIT   USED   FREE    REQ   FAIL  SLEEP
dma maps:               48,     0,  1838,    34,  2022,     0,     0
dma buffer 32:          32,     0,     0,     0,     0,     0,     0
dma buffer 64:          64,     0,     0,     0,     0,     0,     0
dma buffer 128:        128,     0,     0,     0,     0,     0,     0
dma buffer 256:        256,     0,     2,    28,     2,     0,     0
dma buffer 512:        512,     0,     0,     0,     0,     0,     0
dma buffer 1024:      1024,     0,     0,     0,     0,     0,     0
dma buffer 2048:      2048,     0,     0,     0,     0,     0,     0
dma buffer 4096:      4096,     0,     0,     0,     0,     0,     0
dma coherent 32:        32,     0,  1024,   106,  1024,     0,     0
dma coherent 64:        64,     0,     0,     0,     0,     0,     0
dma coherent 128:      128,     0,   129,    51,   129,     0,     0
dma coherent 256:      256,     0,   297,    33,   333,     0,     0
dma coherent 512:      512,     0,     8,    16,    16,     0,     0
dma coherent 1024:    1024,     0,    16,     4,    32,     0,     0
dma coherent 2048:    2048,     0,    12,     0,    24,     0,     0
dma coherent 4096:    4096,     0,    13,     1,    13,     0,     0

These stats represent every call to bus_dmamem_alloc() except for the
SATA devices, which do a single allocation of just over 17K -- also
using BUS_DMA_COHERENT -- which is large enough that it gets handled by
kmem_alloc_contig() rather than by the pool allocator. The 'dma maps'
number is the number of maps created by bus_dmamap_create() -- that is,
it doesn't include maps automatically created by bus_dmamem_alloc().

Efficient allocation of small buffers (not wasting a page per allocation
due to either alignment or cacheability issues) should help pave the way
for fixing existing drivers to allocate individual buffers for each DMA
transfer rather than allocating a huge area and attempting to sub-divide
it internally (which is bad, since a driver doesn't have all the info
needed to sub-divide it safely and avoid partial cache line flushes).
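
The pool layout visible in the vmstat output (power-of-two item sizes from 32 to 4096 bytes, with anything larger going to the contiguous allocator) can be sketched as a size-class lookup. This is a userland illustration under those assumptions only; it is not the uma(9)-based allocator itself, and pool_index_for_size(), pool_item_size() and the POOL_* constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Pool sizes taken from the vmstat output: 32, 64, ..., 4096 bytes. */
#define POOL_MIN_SIZE	32u
#define POOL_MAX_SIZE	4096u

/*
 * Return the index (0..7) of the smallest pool whose items can hold
 * 'size' bytes, or -1 if the request is zero or too large for any pool
 * and would have to go to a contiguous-memory allocator instead.
 */
static inline int
pool_index_for_size(size_t size)
{
	size_t pool_size = POOL_MIN_SIZE;
	int idx = 0;

	if (size == 0 || size > POOL_MAX_SIZE)
		return (-1);
	while (pool_size < size) {
		pool_size <<= 1;
		idx++;
	}
	return (idx);
}

/* Item size, in bytes, of the pool at the given index. */
static inline size_t
pool_item_size(int idx)
{
	return ((size_t)POOL_MIN_SIZE << idx);
}

/* Small usage example. */
static inline void
pool_examples(void)
{
	/* A 200-byte request lands in the 256-byte pool. */
	assert(pool_item_size(pool_index_for_size(200)) == 256);
	/* A request just over 17K is too big for the pools. */
	assert(pool_index_for_size(17 * 1024 + 1) == -1);
}
```

With this layout a small request costs at most the next power of two rather than a whole page, which is the saving the paragraph above describes.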

The new allocator code is architecture-agnostic and could be used by any
busdma implementation that wants to manage pools of small aligned
buffers. It's not clear to me where the .c and .h files for such a thing
should live within src/sys (right now for testing I've got them in
arm/arm and arm/include).

-- Ian