Date: Wed, 11 Jan 2012 11:38:42 -0700
From: Scott Long <scottl@samsco.org>
To: Ian Lepore <freebsd@damnhippie.dyndns.org>
Cc: FreeBSD current <freebsd-current@freebsd.org>, Luigi Rizzo <rizzo@iet.unipi.it>
Subject: Re: memory barriers in bus_dmamap_sync() ?
Message-ID: <3E27CFAB-DCB3-49E4-9C2A-DD8449B15D64@samsco.org>
In-Reply-To: <1326301842.2419.80.camel@revolution.hippie.lan>
References: <20120110213719.GA92799@onelab2.iet.unipi.it> <CAJ-VmomdQ5ZWBf_h1xJhppO8WsinvK7RJiDSgDrYKpo%2BJ8eGYQ@mail.gmail.com> <20120110224100.GB93082@onelab2.iet.unipi.it> <201201111005.28610.jhb@freebsd.org> <20120111162944.GB2266@onelab2.iet.unipi.it> <4E8FCE8E-DDCB-4B38-9BFD-2A67BF03D50F@samsco.org> <1326301842.2419.80.camel@revolution.hippie.lan>

On Jan 11, 2012, at 10:10 AM, Ian Lepore wrote:

> On Wed, 2012-01-11 at 09:59 -0700, Scott Long wrote:
>>
>> Where barriers _are_ needed is in interrupt handlers, and I can
>> discuss that if you're interested.
>>
>> Scott
>>
>
> I'd be interested in hearing about that (and in general I'm loving the
> details coming out in your explanations -- thanks!).
>
> -- Ian
>

Well, I unfortunately wasn't as clear as I should have been. Interrupt handlers need bus barriers, not CPU cache/instruction barriers. This is because the interrupt signal can arrive at the CPU before data and control words have finished being DMA'd up from the controller. Also, many controllers require an acknowledgement write to be performed before leaving the interrupt handler, so the driver needs to do a bus barrier to ensure that the write flushes. But these are two different topics, so let me start with the interrupt handler.

Legacy interrupts in PCI are carried on discrete pins and are level triggered. When the device wants to signal an interrupt, it asserts the pin. That assertion is seen at the IOAPIC on the host bridge and converted to an interrupt message, which is then sent immediately to the CPU's lAPIC. This all happens very, very quickly. Meanwhile, the interrupt condition could have been predicated on the device DMA'ing bytes up to host memory, and those DMA writes could have gotten stalled and buffered on the way up the PCI topology. The end result is often that the driver interrupt handler runs before those writes have hit host memory. To fix this, drivers do a read of a card register as the first step in the interrupt handler, even if the read is just a dummy and the result is thrown away. Thanks to PCI ordering, the read will ensure that any pending writes from the card have flushed all the way up, and everything will be coherent by the time the read completes.

MSI and MSI-X interrupts on modern PCI and PCIe fix this.
These interrupts are sent as byte messages that are DMA'd to the host bridge. Since they are in-band data, they are subject to the same ordering rules as all other data on the bus, and thus ordering for them is implicit. When the MSI message reaches the host bridge, it's converted into an lAPIC message just like before. However, the driver doesn't need to do a flushing read because it knows that the MSI message was the last write on the bus, therefore everything prior to it has arrived and everything is coherent. Since reads are expensive in PCI, this saves a considerable amount of time in the driver. Unfortunately, it adds non-deterministic latency to the interrupt since the MSI message is in-band and has no way to force priority flushing on a busy bus. So while MSI/MSI-X save some time in the interrupt handler, they actually make the overall latency situation potentially worse (thanks Intel!).

The acknowledgement write issue is a little more straightforward. If the card requires an acknowledgement write from the driver to know that the interrupt has been serviced (so that it'll then know to de-assert the interrupt line), that write has to be flushed to the hardware before the interrupt handler completes. Otherwise, the write could get stalled, the interrupt remain asserted, and the interrupt erroneously re-trigger on the host CPU. I've seen cases where this devolves into the card getting out of sync with the driver to the point that interrupts get missed. Also, this gets a little weird sometimes with buggy MSI hacks in both device and PCI bridge hardware.

Scott
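
To make the two flushes for legacy (pin) interrupts concrete, here is a rough sketch in C. It is a toy simulation, not driver code: struct sim_dev and every sim_* function are hypothetical names invented for this illustration, and the "bus" is just two one-entry buffers that a register read drains. In a real FreeBSD driver the reads would be something like bus_space_read_4() on a device register; which register is safe to read is device-specific and is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one device behind a bridge that buffers writes. */
struct sim_dev {
	uint32_t host_status;	/* status word in host memory (DMA target) */
	uint32_t dma_pending;	/* device-to-host write still buffered */
	bool	 dma_in_flight;
	bool	 ack_in_flight;	/* host-to-device ack write still buffered */
	bool	 intr_asserted;	/* level-triggered interrupt pin */
};

/* Device side: start a DMA write of the status word, then assert the pin.
 * The pin is out-of-band, so the CPU can see it before the data lands. */
static void
sim_dev_raise(struct sim_dev *d, uint32_t status)
{
	d->dma_pending = status;
	d->dma_in_flight = true;
	d->intr_asserted = true;
}

/* Register read: modeled as pushing buffered writes in both directions
 * to completion before the read itself completes. */
static uint32_t
sim_reg_read(struct sim_dev *d)
{
	if (d->dma_in_flight) {
		d->host_status = d->dma_pending;
		d->dma_in_flight = false;
	}
	if (d->ack_in_flight) {
		d->intr_asserted = false;	/* device sees the ack */
		d->ack_in_flight = false;
	}
	return (d->intr_asserted ? 1u : 0u);
}

/* Acknowledgement write: posted, so it may sit in a buffer. */
static void
sim_reg_write_ack(struct sim_dev *d)
{
	d->ack_in_flight = true;
}

/* Interrupt handler. With flush == false it shows both failure modes:
 * stale data on entry and a still-asserted line on exit. */
static uint32_t
sim_intr_handler(struct sim_dev *d, bool flush)
{
	uint32_t status;

	assert(d != NULL);
	if (flush)
		(void)sim_reg_read(d);	/* dummy read: flush device DMA */
	status = d->host_status;
	sim_reg_write_ack(d);
	if (flush)
		(void)sim_reg_read(d);	/* read back: flush the ack write */
	return (status);
}
```

Calling sim_dev_raise() followed by sim_intr_handler(d, false) returns the old status word and leaves intr_asserted set, which is the re-trigger case described above; with flush set to true the handler sees the new status and the line is de-asserted before it returns. The MSI case, where the first read is unnecessary, is not modeled.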