Date:      Fri, 5 Sep 2014 21:54:03 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Ian Lepore <ian@FreeBSD.org>
Cc:        "freebsd-arm@freebsd.org" <freebsd-arm@FreeBSD.org>, ticso@cicely.de
Subject:   Re: Cubieboard: Spurious interrupt detected
Message-ID:  <20140906045403.GU82175@funkthat.com>
In-Reply-To: <1409967197.1150.339.camel@revolution.hippie.lan>
References:  <2279481.3MX4OEDuCl@quad> <20140905215702.GL3196@cicely7.cicely.de> <1409958716.1150.321.camel@revolution.hippie.lan> <CAJ-Vmo=EJVFqNnMo_dzevGvFWLSR6LVfYbYmOot1bLZbCvVMTQ@mail.gmail.com> <20140906011526.GT82175@funkthat.com> <1409967197.1150.339.camel@revolution.hippie.lan>

Ian Lepore wrote this message on Fri, Sep 05, 2014 at 19:33 -0600:
> On Fri, 2014-09-05 at 18:15 -0700, John-Mark Gurney wrote:
> > Adrian Chadd wrote this message on Fri, Sep 05, 2014 at 17:44 -0700:
> > > On 5 September 2014 16:11, Ian Lepore <ian@freebsd.org> wrote:
> > > > On Fri, 2014-09-05 at 23:57 +0200, Bernd Walter wrote:
> > > >> On Sat, Sep 06, 2014 at 01:43:23AM +0400, Maxim V FIlimonov wrote:
> > > >> > And another problem: every now and then the kernel says something like that:
> > > >> > Sep  5 19:22:37  kernel: Spurious interrupt detected
> > > >> > Sep  5 19:22:37  kernel: Spurious interrupt detected
> > > >> > Sep  5 19:23:46  last message repeated 10 times
> > > >> >
> > > >> > I've heard that FreeBSD happens to do that on ARM devices. What could be the
> > > >> > problem here?
> > > >>
> > > >> Means something generates interrupts, which are not handled by a driver.
> > > >> Could be the cause for your load problem too.
> > > >>
> > > >
> > > > No, that would be stray interrupts.  Spurious interrupts happen when an
> > > > interrupt is asserted, but by the time the processor asks the interrupt
> > > > controller for the current active interrupt, it is no longer active.
> > > >
> > > > One way it can happen is when an interrupt handler writes to a device to
> > > > clear a pending interrupt and that write takes a long time to complete
> > > > because the device is on a slow bus, and the interrupt controller is on
> > > > a faster bus.  The EOI to the controller outraces the device write that
> > > > would clear the pending interrupt condition, so the processor is
> > > > re-interrupted, but by the time it asks for the next active interrupt the
> > > > device write has finally completed and the interrupt is no longer
> > > > pending.
> > > >
> > > > That sequence used to happen a lot, and it was "fixed" by adding an
> > > > l2cache sync (basically a "drain write buffer") just before an EOI.  You
> > > > sometimes still see an occasional spurious interrupt, but it shouldn't
> > > > be happening multiple times per second as seen in the logging above.
> > > 
> > > Hm, interesting. I remember your discussion about it on IRC. The
> > > atheros code ends up working around this in the driver by doing a read
> > > from the ISR after writing out bits to clear things, so the clear is
> > > flushed out.
> > > 
> > > I wonder if we should be asking all device drivers to be doing their
> > > own ISR flushing before returning from their interrupt handlers.
> > 
> > This is required on PCI (that you do a read to clear the posted/pending
> > write)...    So, IMO, yes, all device drivers should do the proper
> > clearing of their writes to the ISR...
> > 
> 
> But a driver can't assume that a read is sufficient on all architectures
> it may run on.  bus_space_barrier() is the right way.  Also, it's not

Except that I don't think even on PCI a bus_space_barrier is sufficient...

I was just looking at i386's implementation of bus_space_barrier and
it just does a stack access...  This won't be sufficient to clear any
PCI bridges that may have the write still pending...
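
A minimal sketch of the concern, written as a toy userland model rather
than FreeBSD code. Every toy_* name is hypothetical. It assumes a bridge
that holds a single posted write, and it assumes a CPU-local barrier
never reaches the bridge, while a read from the device pushes the posted
write ahead of it:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only; none of these names exist in FreeBSD. */
struct toy_bridge {
	uint32_t dev_isr;      /* device interrupt status; nonzero = IRQ asserted */
	uint32_t posted_val;   /* write-1-to-clear value still held in the bridge */
	int      posted_valid; /* nonzero while that write has not reached the device */
};

/* Driver acks by writing the ISR; the bridge posts it, the device has not seen it. */
static void
toy_write_isr(struct toy_bridge *b, uint32_t val)
{
	assert(!b->posted_valid);	/* the model holds only one posted write */
	b->posted_val = val;
	b->posted_valid = 1;
}

/* CPU-local barrier: orders the CPU's own accesses, leaves the bridge untouched. */
static void
toy_cpu_barrier(struct toy_bridge *b)
{
	(void)b;
}

/* A read from the device pushes the posted write ahead of it before returning. */
static uint32_t
toy_read_isr(struct toy_bridge *b)
{
	if (b->posted_valid) {
		b->dev_isr &= ~b->posted_val;
		b->posted_valid = 0;
	}
	return (b->dev_isr);
}

/* What the interrupt controller sees on the device's IRQ line. */
static int
toy_irq_asserted(const struct toy_bridge *b)
{
	return (b->dev_isr != 0);
}
```

Starting from dev_isr = 0x1, calling toy_write_isr(&b, 0x1) and then
toy_cpu_barrier(&b) leaves toy_irq_asserted(&b) true, which is the window
in which an EOI could outrace the clear. Only toy_read_isr(&b) drains the
write and drops the line.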

There's also the issue that if __GNUCLIKE_ASM is not defined, the code
will compile w/o ANY barrier, not even a compiler_membar...  We should
probably add a #else #error please add your compiler's equivalent...
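
Roughly the shape such a guard could take, as a standalone sketch and not
a patch against the real header. Assumptions: __GNUC__ stands in for
__GNUCLIKE_ASM so this builds outside the kernel tree, and
TOY_COMPILER_MEMBAR and toy_store_then_reload are made-up names:

```c
#include <assert.h>	/* only for the usage check described below */

/*
 * Assumption: __GNUC__ is used here in place of FreeBSD's __GNUCLIKE_ASM.
 * TOY_COMPILER_MEMBAR is a hypothetical name, not an existing macro.
 */
#if defined(__GNUC__)
/* Empty asm with a "memory" clobber: the compiler may not reorder across it. */
#define	TOY_COMPILER_MEMBAR()	__asm__ __volatile__("" : : : "memory")
#else
/* Fail the build instead of silently compiling with no barrier at all. */
#error "please add your compiler's equivalent of a compiler memory barrier"
#endif

/* Store, barrier, reload: the reload cannot be moved above the barrier. */
static int
toy_store_then_reload(int *p)
{
	*p = 1;
	TOY_COMPILER_MEMBAR();
	return (*p);
}
```

With int x = 0, assert(toy_store_then_reload(&x) == 1) holds. On a
compiler the #if does not recognize, the build stops at the #error
instead of producing code with no barrier.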

> just that a barrier is needed before exiting an isr... if the isr uses
> locking to synchronize with hardware access by the non-isr part of the
> driver, then the bus space barriers are needed in conjunction with the
> locking, so that, for example, the isr's usage of the hardware is truly
> complete before a lock is released.  
> 
> Scattered amongst 10 of the roughly 240 drivers in sys/dev there are 42
> calls to bus_space_barrier().  Getting all the drivers fixed will be a
> big job.  That's why I was thinking along the lines of an
> architecture-wide workaround with potentially a way to mark a driver as
> not needing the workaround once we get the fixing underway.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


