Date:      Mon, 27 Aug 2012 16:07:30 -0600
From:      Ian Lepore <freebsd@damnhippie.dyndns.org>
To:        Warner Losh <imp@bsdimp.com>
Cc:        Tim Kientzle <tim@kientzle.com>, Adrian Chadd <adrian@freebsd.org>, Hans Petter Selasky <hans.petter.selasky@bitfrost.no>, freebsd-arm@freebsd.org, freebsd-mips@freebsd.org, freebsd-arch@freebsd.org
Subject:   Re: Partial cacheline flush problems on ARM and MIPS
Message-ID:  <1346105250.1140.314.camel@revolution.hippie.lan>
In-Reply-To: <9642068B-3C66-42BD-8515-14F734B3FF89@bsdimp.com>
References:  <6D83AF9D-577B-4C83-84B7-C4E3B32695FC@bsdimp.com> <zarafa.503b0e81.5c36.1a2f71091ebf9bd2@eric2.bitfrost> <A749E691-BF25-4B72-B929-56ABEB10F3E9@bsdimp.com> <DA9750F9-7B8A-49AF-8ECA-AC7D565CF3F5@kientzle.com> <CAJ-VmomMCvR7nr73sLLmwsotU=TT399asBcJvinHCqspo6BN1w@mail.gmail.com> <9642068B-3C66-42BD-8515-14F734B3FF89@bsdimp.com>

On Mon, 2012-08-27 at 09:40 -0600, Warner Losh wrote:
> On Aug 27, 2012, at 9:20 AM, Adrian Chadd wrote:
> > That does remind me, I think the ath(4) driver does the same (since it
> > allocates its own descriptor block and then treats it as an array of
> > descriptors for the hardware to access) - I should ensure that
> > sizeof(ath_desc) is aligned on the relevant architecture. It gets
> > slightly scary - AR93xx TX descriptors are "L1 cache == 128 byte
> > aligned" which is an enormous waste of memory compared to a 16 or 32
> > byte aligned platform. Alas..
> 
> The problem is with cache line sharing, not necessarily with alignment.  If you are only ever using one of them at a time, or if you have perfect hygiene, you can cope with this situation without undue waste.  The perfect hygiene might be hard sometimes.
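
To put rough numbers on the padding cost mentioned above, here is a minimal sketch of the arithmetic. It assumes a hypothetical 16-byte descriptor; struct hw_desc and every helper name below are made up for illustration and do not come from ath(4) or any real driver. Each descriptor's stride is rounded up to the cache line size so that no two descriptors share a line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte hardware descriptor; the layout is illustrative only. */
struct hw_desc {
	uint32_t buf_addr;	/* bus address of the data buffer */
	uint32_t buf_len;	/* length of the buffer in bytes */
	uint32_t status;	/* written by the hardware on completion */
	uint32_t next;		/* bus address of the next descriptor */
};

/* Round n up to a multiple of align, which must be a power of two. */
static size_t
round_up_pow2(size_t n, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((n + align - 1) & ~(align - 1));
}

/* Per-descriptor stride such that no two descriptors share a cache line. */
static size_t
desc_stride(size_t line_size)
{
	return (round_up_pow2(sizeof(struct hw_desc), line_size));
}

/* Total bytes needed for a ring of ndesc padded descriptors. */
static size_t
ring_bytes(size_t ndesc, size_t line_size)
{
	return (ndesc * desc_stride(line_size));
}

/* Address of descriptor i inside a ring that starts at base. */
static struct hw_desc *
desc_at(void *base, size_t i, size_t line_size)
{
	return ((struct hw_desc *)((uint8_t *)base +
	    i * desc_stride(line_size)));
}
```

Under these assumptions a 64-entry ring takes 64 * 32 = 2048 bytes with 32-byte lines and 64 * 128 = 8192 bytes with 128-byte lines, against 64 * 16 = 1024 bytes unpadded. Padding only removes the sharing; it does not by itself say who owns a given line at a given time.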

This brings up an interesting tangential issue for this busdma
discussion.  For some controller hardware you allocate a block of memory
which is treated as an array of "descriptors" or some other shared
control information, you set a register in the hardware to point to that
block of memory, and then there is some degree of concurrent access of
that memory by hardware and CPU.

The interesting part is that some such hardware cannot operate in phases
as anticipated by our busdma model.  That is, there are no clear
demarcation points between "the CPU has exclusive access to the memory"
and "the hardware has exclusive access to the memory."  Usually for
these schemes to work correctly, the memory has to be mapped as
uncached, unbuffered, strongly ordered, or whatever combo of those makes
sense for a given platform.

We have arm drivers that use bus_dmamem_alloc() with the
BUS_DMA_COHERENT flag to obtain such memory, even though that wasn't the
intended meaning for that flag.  If the armv4 busdma implementation were
changed to stop honoring the COHERENT flag (it's supposed to be an
optional feature) those drivers would stop working.  So we need to track
down such mistakes and fix them, but the question is:  fix them how?

I think it may make sense to let busdma handle it, because you may get
some advantage from the allocation being made based upon the constraints
encoded in the inherited chain of tags for the driver.  

On the other hand, drivers doing this sort of thing are usually pretty
close to the silicon and have a good idea for themselves what the
hardware constraints are.  We could just say that drivers with such
needs should call kmem_alloc_contig() or kmem_alloc_attr() for
themselves.

If we say it's a thing that busdma should handle, then I think we need:

      * A flag that is universal across all platforms that means
        unambiguously that you need memory that is mapped however
        device-register memory is mapped on that platform (uncached,
        unbuffered, strongly ordered; I'm tempted to say "whatever
        pmap_mapdev() does" but I'm not sure that's rigorously correct).
      * If the request cannot be honored for some reason it has to
        return failure, not quietly give you regular cached memory
        instead (which is what BUS_DMA_COHERENT does).
      * The busdma sequence of sync operations does not apply to memory
        allocated with this flag, and indeed you must not call the sync
        functions on such memory.
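
The three requirements above can be modeled in a few lines of userland C. This is only a sketch of the proposed semantics, not busdma code: every name (MODEL_DMA_COHERENT, MODEL_DMA_UNCACHED, model_dmamem_alloc, model_dmamap_sync) is hypothetical, and the error values ENOMEM and EINVAL are assumptions chosen for the model. Having the sync call return an error is also just one way to represent "you must not call sync on this memory".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model; not the real busdma constants. */
#define	MODEL_DMA_COHERENT	0x1	/* hint: may fall back to cached */
#define	MODEL_DMA_UNCACHED	0x2	/* requirement: fail if unmet */

enum model_attr { MODEL_CACHED, MODEL_DEVICE };

struct model_dmamem {
	enum model_attr attr;	/* how the memory ended up mapped */
	bool sync_allowed;	/* whether sync operations may be used */
};

/*
 * Model allocator.  platform_can_uncache stands in for whatever makes a
 * device-style mapping possible or impossible on a given platform.
 */
static int
model_dmamem_alloc(struct model_dmamem *m, int flags,
    bool platform_can_uncache)
{
	assert(m != NULL);
	if (flags & MODEL_DMA_UNCACHED) {
		/* Strict flag: report failure, never downgrade quietly. */
		if (!platform_can_uncache)
			return (ENOMEM);
		m->attr = MODEL_DEVICE;
		m->sync_allowed = false;
		return (0);
	}
	/* Optional hint: honored when possible, otherwise cached memory. */
	if ((flags & MODEL_DMA_COHERENT) && platform_can_uncache)
		m->attr = MODEL_DEVICE;
	else
		m->attr = MODEL_CACHED;
	m->sync_allowed = true;
	return (0);
}

/* Model sync: rejected for memory obtained with the strict flag. */
static int
model_dmamap_sync(const struct model_dmamem *m)
{
	assert(m != NULL);
	return (m->sync_allowed ? 0 : EINVAL);
}
```

In this model a COHERENT request on a platform that cannot provide an uncached mapping still succeeds and hands back cached memory, while the same request with the strict flag fails, so the driver finds out at allocation time rather than through corrupted descriptors later.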

The x86 busdma code recently grew a BUS_DMA_NOCACHE flag; perhaps that's
the name that should be supported across all platforms?

-- Ian




