Date: Sun, 10 Apr 2005 18:05:44 -0700 (PDT)
From: Doug White
To: Matthew Dillon
cc: freebsd-current@freebsd.org
Subject: Re: Potential source of interrupt aliasing

On Sun, 10 Apr 2005, Matthew Dillon wrote:

> A couple of things don't click here.
>
> First, unless this 'boot interrupt' IRQ is pointing to an APIC vector
> that is initialized to point at the softclock there is no way the
> softclock ithread could be involved. I'm not saying that it isn't
> running away, just that the boot interrupt business is probably not
> the cause. This boot interrupt thingy kinda sounds like a red herring.
softclock is the poor innocent bystander; any ithread would do, as long as
it's something that prevents other ithreads from being scheduled. That's
still an experiment in progress, though, so don't get too hung up on it.

> Secondly, you HAVE to mask the APIC vector in the interrupt service
> routine if the service routine is going to schedule an ithread. There's
> no choice... it HAS to be done because the ISR isn't capable of clearing
> the originating interrupt from the device... the interrupt thread has to
> do that.

Or have the ISR acknowledge the interrupt in the hardware, via a routine
provided by the driver, before scheduling the ithread.

> *BUT* it *IS* possible that the wrong APIC vector is being masked (and
> not because of an interrupt alias, but because the actual hard interrupt
> is misrouted).

I don't think this is the case. Somehow the vector would have to get
corrupted during this function call, which is line 609 in
src/sys/i386/i386/local_apic.c:

    isrc = intr_lookup_source(apic_idt_to_irq(frame.if_vec));

which reduces to an array lookup with an offset index. apic_idt_to_irq(),
with the asserts and range checks removed, is:

    return (vector - APIC_IO_INTS);

and intr_lookup_source() is:

    return (interrupt_sources[vector]);

I would expect much wider aliasing or stray interrupt problems if this
were occurring.

> I've seen this occur numerous times. What happens is
> that a device generates a misrouted interrupt which causes the
> interrupt handler for an UNRELATED device to run. It runs to completion
> but since the device it thought interrupted was not the device that
> actually interrupted, the interrupt on the actual originating device
> never gets cleared so the moment the ithread completes and unmasks that
> APIC vector, the APIC issues another interrupt. The result is that the
> ithread is constantly running.
>
> Misrouted interrupts are a serious problem. They seem to be caused by
> the BIOS or ACPI getting confused about how bridges are wired...
> when multiple devices route an interrupt through the same pin on a bridge
> and one is routed, the BIOS or ACPI gets seriously confused about
> the second device and may believe that the second device can be routed
> to a different IRQ when, in fact, it can't. You wind up with one of
> the two devices on the wrong IRQ. This problem is exacerbated when
> the BIOS routes some of the devices for use by the BIOS (such as for
> PXE booting), or to handle a USB keyboard, or something of that sort.

I'm convinced these "misrouted interrupts" originate in the boot interrupt
functionality. You don't route interrupts in APIC mode; it's a flat space.
All of the I/O APIC inputs stack together as if they belonged to one
gigantic I/O APIC to which every PCI device's INTx lines were attached.
This is the Global System Interrupt model described in the ACPI
specification.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org