From: Gerrit Nagelhout
To: 'Matthew Dillon'
cc: Julian Elischer
cc: John Baldwin
cc: current@FreeBSD.org
Date: Tue, 22 Jun 2004 21:01:53 -0400
Subject: RE: STI, HLT in acpi_cpu_idle_c1

> :I am working with Don Bowman to try and debug this problem (the lockup).
> :I have an emulator attached, and managed to get it into the locked up
> :state.  Three of the cpus are in idle (acpi_cpu_c1) (and have interrupts
> :enabled, EFLAGS=0x246), and the other one (cpu 3) is in smp_tlb_shootdown
> :waiting for one more processor to respond.  The APIC register for CPU 3
> :(icr_lo) indicates that the IPI (0xf3) has been sent (ie it's idle).
> :The isr registers for CPU 1 indicate that vector 0xf3 is pending, but it
> :is not being handled.  I am still trying to figure out why this is, but
> :does anyone have any suggestions on what else I can look at?
> :Thanks,
> :
> :Gerrit
>
> If the interrupt is pending on the idle cpu's APICs but no interrupt is
> being delivered, and the idle cpus are in HLT with interrupts enabled,
> then something is masking the pending interrupt.  Check the following:
>
> In the local APIC for each idle cpu:
>
>   * Check the TPR (task priority register) versus the priority set for
>     the IPI interrupt.  The top 4 bits are the main priority field.
>     Interrupts with priorities <= the main priority bits will be masked
>     (so 11111111 masks all interrupts).  The TPR priority should be lower
>     than the priority set for the IPI in question.
>
>   * Check the PPR (processor priority register).  This register tells you
>     the priority of the highest pending interrupt that can be
>     dispensed to the processor.  It will be set to the same contents as
>     the TPR if no servable interrupt is pending.  The PPR is a quick way
>     to tell what priority of interrupt the APIC is trying to deliver to the
>     cpu.  My guess is that it will be 0 (meaning that the APIC is not trying
>     to deliver anything to the cpu).
>
>   * Check the ISR bits, the TMR bits, and the IRR bits.  These control
>     interrupt delivery:
>
>     (from /usr/src/sys/i386/include/apicreg.h in the DFly source tree): ....
>
> If the interrupt is masked it should be set in the IRR but not set in
> the ISR.  If it is set in the ISR and interrupts are enabled on the cpu,
> then I have no idea what the hell is going on (because the cpu should then
> service the interrupt)... unless the cpu is totally borked.
>
> Other possibilities:  The IPI interrupt vector on the receiving CPUs is
> not set up properly (I'm not sure how you can access that data, it's
> programmed via the ICR so presumably it can be read back out via the
> ICR somehow.  Not sure).
>
> In any case, pull out /usr/src/sys/i386/include/apicreg.h from the
> DragonFly source base...
> I had to go through all this crap a year ago
> and decided to document the APIC registers to the hilt (based on the Intel
> documentation).  There is a lot of very useful information in that
> header file.
>
> -Matt
> Matthew Dillon
>

Thanks for the detailed info on this.  It looks like CPU1 is trying to
service the interrupt because PPR = 0xf0, and TPR = 0x00.  It is also the
only CPU that has a bit set in ISR.  In this case, CPU 3 was initiating the
IPI (although I don't know why its icr_lo is 0xc00f6, because I was
expecting it to be 0xc00f3, as it was in previous lockups).  I still have
no idea why CPU1 is not handling this interrupt though.  I am still getting
used to this emulator, but I think the values I am reading are believable:

P3>dumpAllLocalApic
CPU 0
ID: 0x6000000  TPR: 0x0  PPR: 0x0   icr_lo:0xf3
ISR0: 0x0  ISR1: 0x0  ISR2: 0x0  ISR3: 0x0  ISR4: 0x0  ISR5: 0x0  ISR6: 0x0  ISR7: 0x0
CPU 1
ID: 0x7000000  TPR: 0x0  PPR: 0xf0  icr_lo:0xf3
ISR0: 0x0  ISR1: 0x0  ISR2: 0x0  ISR3: 0x0  ISR4: 0x0  ISR5: 0x0  ISR6: 0x0  ISR7: 0x80000
CPU 2
ID: 0x0        TPR: 0x0  PPR: 0x0   icr_lo:0xfb
ISR0: 0x0  ISR1: 0x0  ISR2: 0x0  ISR3: 0x0  ISR4: 0x0  ISR5: 0x0  ISR6: 0x0  ISR7: 0x0
CPU 3
ID: 0x1000000  TPR: 0x0  PPR: 0x0   icr_lo:0xc00f6
ISR0: 0x0  ISR1: 0x0  ISR2: 0x0  ISR3: 0x0  ISR4: 0x0  ISR5: 0x0  ISR6: 0x0  ISR7: 0x0
P3>

Gerrit
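
For anyone wanting to cross-check the dump above by hand, here is a rough sketch of how the raw values could be decoded. It assumes the emulator's ISR0..ISR7 are the eight 32-bit in-service words in ascending vector order (bit b of word n is vector 32*n + b), and that icr_lo follows the usual Intel local APIC layout (vector in bits 0-7, delivery mode in bits 8-10, delivery status in bit 12, destination shorthand in bits 18-19). The function names are made up for illustration and do not come from any FreeBSD or DragonFly source.

```python
def isr_vectors(isr_words):
    """Return the vectors whose bit is set in the ISR words.

    isr_words: eight 32-bit values, ISR0 first (assumed ordering);
    bit b of word n corresponds to vector 32*n + b.
    """
    return [32 * n + b
            for n, word in enumerate(isr_words)
            for b in range(32)
            if word & (1 << b)]


def decode_icr_lo(value):
    """Split the low ICR dword into the fields relevant to an IPI."""
    shorthand = ("none", "self", "all including self", "all excluding self")
    return {
        "vector": value & 0xFF,                       # bits 0-7
        "delivery_mode": (value >> 8) & 0x7,          # bits 8-10, 0 = fixed
        "delivery_pending": bool(value & (1 << 12)),  # bit 12, False = idle
        "dest_shorthand": shorthand[(value >> 18) & 0x3],  # bits 18-19
    }


# Values copied from the dump: CPU 1's ISR words and CPU 3's icr_lo.
cpu1_isr = [0, 0, 0, 0, 0, 0, 0, 0x80000]
print([hex(v) for v in isr_vectors(cpu1_isr)])  # ['0xf3']
print(decode_icr_lo(0xC00F6))
```

Under those assumptions, ISR7 = 0x80000 on CPU 1 is bit 19 of the last word, i.e. vector 0xf3, whose priority class 0xf matches PPR = 0xf0.  The 0xc00f6 on CPU 3 decodes as vector 0xf6 sent with the "all excluding self" shorthand and delivery status idle, so the last thing written to that ICR was a different vector than the 0xf3 seen in earlier lockups.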