From owner-freebsd-current@FreeBSD.ORG Tue Aug 23 22:23:50 2005
Return-Path:
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id CF22916A420
	for ; Tue, 23 Aug 2005 22:23:50 +0000 (GMT)
	(envelope-from markir@paradise.net.nz)
Received: from linda-2.paradise.net.nz (bm-2a.paradise.net.nz [202.0.58.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id F290C43D45
	for ; Tue, 23 Aug 2005 22:23:49 +0000 (GMT)
	(envelope-from markir@paradise.net.nz)
Received: from smtp-1.paradise.net.nz (smtp-1a.paradise.net.nz [202.0.32.194])
	by linda-2.paradise.net.nz (Paradise.net.nz)
	with ESMTP id <0ILP00IQ54VO4H@linda-2.paradise.net.nz>
	for freebsd-current@freebsd.org; Wed, 24 Aug 2005 10:23:48 +1200 (NZST)
Received: from [192.168.1.11] (218-101-14-82.paradise.net.nz [218.101.14.82])
	by smtp-1.paradise.net.nz (Postfix) with ESMTP id 8CFF582ABD
	for ; Wed, 24 Aug 2005 10:23:47 +1200 (NZST)
Date: Wed, 24 Aug 2005 10:23:45 +1200
From: Mark Kirkwood
To: freebsd-current@freebsd.org
Message-id: <430BA1F1.1010907@paradise.net.nz>
MIME-version: 1.0
Content-type: multipart/mixed; boundary=------------080405080507030004020606
X-Accept-Language: en-us, en
User-Agent: Mozilla Thunderbird 1.0.6 (X11/20050726)
Subject: 6.0 BETA2 reboot hangs on SMP system - progress of a sort
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 23 Aug 2005 22:23:50 -0000

This is a multi-part message in MIME format.
--------------080405080507030004020606
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

A brief description of the system (dmesg attached):

Tyan S2510 dual 1 GHz
RELENG_6 from Aug 14
GENERIC

Reboot hangs after displaying "cpu_reset: Stopping other CPUs". This
appears to be a hard lockup, as I cannot break to the debugger.

In an effort to see where the problem was, I amended

src/sys/i386/i386/vm_machdep.c
src/sys/kern/subr_smp.c

adding some printf statements and removing the #ifdef DIAGNOSTIC in
subr_smp.c, so that the timeout check always runs and prints (diffs
attached).

To my surprise, after a kernel rebuild + shutdown and restart, I find
that 'shutdown -r now' and 'reboot' *now work*! Hmmm, nice but weird.
This sort of voodoo suggests something like memory being clobbered
somewhere...

Hopefully this will help shed some light on this issue - and I am happy
to try out any suggestions to diagnose it.

As an aside, the same thing happens with 5.4-RELEASE - but *only* after
an SMP kernel has been built, so this appears to be similar to PRs
i386/36943 and i386/34092.

Mark

--------------080405080507030004020606
Content-Type: text/plain; name="dmesg"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="dmesg"

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 6.0-BETA2 #1: Wed Aug 24 08:24:52 NZST 2005
    postgres@ikker.markir.net:/usr/obj/usr/src/sys/GENERIC
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Pentium III (996.85-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x686 Stepping = 6
  Features=0x387fbff
real memory = 2147483648 (2048 MB)
avail memory = 2096541696 (1999 MB)
MPTable:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0: Assuming intbase of 0
ioapic1: Assuming intbase of 16
ioapic0 irqs 0-15 on motherboard
ioapic1 irqs 16-31 on motherboard
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
cpu0 on motherboard
cpu1 on motherboard
pcib0: pcibus 0 on motherboard
pci0: on pcib0
pci0: at device 1.0 (no driver attached)
fxp0: port 0xd400-0xd43f mem 0xfeafe000-0xfeafefff,0xfe900000-0xfe9fffff irq 20 at device 4.0 on pci0
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:e0:81:02:4c:6a
fxp1: port 0xd000-0xd03f mem 0xfeafd000-0xfeafdfff,0xfe700000-0xfe7fffff irq 21 at device 5.0 on pci0
miibus1: on fxp1
inphy1: on miibus1
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:e0:81:02:4c:6b
isab0: port 0x580-0x58f at device 15.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 15.1 on pci0
ata0: on atapci0
ata1: on atapci0
ohci0: mem 0xfeafc000-0xfeafcfff irq 10 at device 15.2 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: on ohci0
usb0: USB revision 1.0
uhub0: (0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
pcib1: pcibus 1 on motherboard
pci1: on pcib1
atapci1: port 0xefe0-0xefe7,0xefac-0xefaf,0xefa0-0xefa7,0xefa8-0xefab,0xef90-0xef9f mem 0xfebf0000-0xfebfffff irq 27 at device 2.0 on pci1
ata2: on atapci1
ata3: on atapci1
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xd17ff,0xd1800-0xd27ff,0xd2800-0xd37ff on isa0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0: parallel port not found.
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
unknown: can't assign resources (memory)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
Timecounters tick every 1.000 msec
acd0: CDRW at ata0-master PIO4
ad4: 39205MB at ata2-master UDMA133
ad5: 39205MB at ata2-slave UDMA133
ad6: 39205MB at ata3-master UDMA133
ad7: 39205MB at ata3-slave UDMA133
ar0: 156822MB status: READY
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master
ar0: disk2 READY using ad5 at ata2-slave
ar0: disk3 READY using ad7 at ata3-slave
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ar0s2a
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...2 0 0 done
All buffers synced.
unmount of /dev failed (BUSY)
Uptime: 1m53s
Rebooting...
cpu_reset: Entering SMP section
cpu_reset: Stopping other CPUs
stop_cpus: entering
stop_cpus: send ipi
stop_cpus: entering spin for int
cpu_reset: Stopped other CPUs ok
cpu_reset: leaving SMP section
cpu_reset_real: entering
cpu_reset_real: case CPU_GEODE1100
cpu_reset_real: case CPU_PC98 + BROKEN_KEYBOARD_RESET
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 6.0-BETA2 #1: Wed Aug 24 08:24:52 NZST 2005
    postgres@ikker.markir.net:/usr/obj/usr/src/sys/GENERIC
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Pentium III (996.84-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x686 Stepping = 6
  Features=0x387fbff
real memory = 2147483648 (2048 MB)
avail memory = 2096541696 (1999 MB)
MPTable:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0: Assuming intbase of 0
ioapic1: Assuming intbase of 16
ioapic0 irqs 0-15 on motherboard
ioapic1 irqs 16-31 on motherboard
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
cpu0 on motherboard
cpu1 on motherboard
pcib0: pcibus 0 on motherboard
pci0: on pcib0
pci0: at device 1.0 (no driver attached)
fxp0: port 0xd400-0xd43f mem 0xfeafe000-0xfeafefff,0xfe900000-0xfe9fffff irq 20 at device 4.0 on pci0
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:e0:81:02:4c:6a
fxp1: port 0xd000-0xd03f mem 0xfeafd000-0xfeafdfff,0xfe700000-0xfe7fffff irq 21 at device 5.0 on pci0
miibus1: on fxp1
inphy1: on miibus1
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:e0:81:02:4c:6b
isab0: port 0x580-0x58f at device 15.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 15.1 on pci0
ata0: on atapci0
ata1: on atapci0
ohci0: mem 0xfeafc000-0xfeafcfff irq 10 at device 15.2 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: on ohci0
usb0: USB revision 1.0
uhub0: (0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
pcib1: pcibus 1 on motherboard
pci1: on pcib1
atapci1: port 0xefe0-0xefe7,0xefac-0xefaf,0xefa0-0xefa7,0xefa8-0xefab,0xef90-0xef9f mem 0xfebf0000-0xfebfffff irq 27 at device 2.0 on pci1
ata2: on atapci1
ata3: on atapci1
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xd17ff,0xd1800-0xd27ff,0xd2800-0xd37ff on isa0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0: parallel port not found.
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
unknown: can't assign resources (memory)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
Timecounters tick every 1.000 msec
acd0: CDRW at ata0-master PIO4
ad4: 39205MB at ata2-master UDMA133
ad5: 39205MB at ata2-slave UDMA133
ad6: 39205MB at ata3-master UDMA133
ad7: 39205MB at ata3-slave UDMA133
ar0: 156822MB status: READY
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master
ar0: disk2 READY using ad5 at ata2-slave
ar0: disk3 READY using ad7 at ata3-slave
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ar0s2a
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...1 0 done
All buffers synced.
unmount of /dev failed (BUSY)
Uptime: 6m28s
Rebooting...
cpu_reset: Entering SMP section
cpu_reset: Stopping other CPUs
stop_cpus: entering
stop_cpus: send ipi
stop_cpus: entering spin for int
cpu_reset: Stopped other CPUs ok
cpu_reset: leaving SMP section
cpu_reset_real: entering
cpu_reset_real: case CPU_GEODE1100
cpu_reset_real: case CPU_PC98 + BROKEN_KEYBOARD_RESET
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 6.0-BETA2 #1: Wed Aug 24 08:24:52 NZST 2005
    postgres@ikker.markir.net:/usr/obj/usr/src/sys/GENERIC
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Pentium III (996.85-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x686 Stepping = 6
  Features=0x387fbff
real memory = 2147483648 (2048 MB)
avail memory = 2096541696 (1999 MB)
MPTable:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0: Assuming intbase of 0
ioapic1: Assuming intbase of 16
ioapic0 irqs 0-15 on motherboard
ioapic1 irqs 16-31 on motherboard
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
cpu0 on motherboard
cpu1 on motherboard
pcib0: pcibus 0 on motherboard
pci0: on pcib0
pci0: at device 1.0 (no driver attached)
fxp0: port 0xd400-0xd43f mem 0xfeafe000-0xfeafefff,0xfe900000-0xfe9fffff irq 20 at device 4.0 on pci0
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:e0:81:02:4c:6a
fxp1: port 0xd000-0xd03f mem 0xfeafd000-0xfeafdfff,0xfe700000-0xfe7fffff irq 21 at device 5.0 on pci0
miibus1: on fxp1
inphy1: on miibus1
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:e0:81:02:4c:6b
isab0: port 0x580-0x58f at device 15.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 15.1 on pci0
ata0: on atapci0
ata1: on atapci0
ohci0: mem 0xfeafc000-0xfeafcfff irq 10 at device 15.2 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: on ohci0
usb0: USB revision 1.0
uhub0: (0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
pcib1: pcibus 1 on motherboard
pci1: on pcib1
atapci1: port 0xefe0-0xefe7,0xefac-0xefaf,0xefa0-0xefa7,0xefa8-0xefab,0xef90-0xef9f mem 0xfebf0000-0xfebfffff irq 27 at device 2.0 on pci1
ata2: on atapci1
ata3: on atapci1
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xd17ff,0xd1800-0xd27ff,0xd2800-0xd37ff on isa0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0: parallel port not found.
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
unknown: can't assign resources (memory)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
Timecounters tick every 1.000 msec
acd0: CDRW at ata0-master PIO4
ad4: 39205MB at ata2-master UDMA133
ad5: 39205MB at ata2-slave UDMA133
ad6: 39205MB at ata3-master UDMA133
ad7: 39205MB at ata3-slave UDMA133
ar0: 156822MB status: READY
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master
ar0: disk2 READY using ad5 at ata2-slave
ar0: disk3 READY using ad7 at ata3-slave
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ar0s2a

--------------080405080507030004020606
Content-Type: text/plain; name="subr_smp.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="subr_smp.c.diff"

*** subr_smp.c.orig	Wed Aug 24 08:03:13 2005
--- subr_smp.c	Wed Aug 24 08:05:57 2005
***************
*** 233,256 ****
  {
  	int i;
  	if (!smp_started)
  		return 0;
  	CTR1(KTR_SMP, "stop_cpus(%x)", map);
  	/* send the stop IPI to all CPUs in map */
  	ipi_selected(map, IPI_STOP);
  	i = 0;
  	while ((atomic_load_acq_int(&stopped_cpus) & map) != map) {
  		/* spin */
  		i++;
- #ifdef DIAGNOSTIC
  		if (i == 100000) {
! 			printf("timeout stopping cpus\n");
  			break;
  		}
- #endif
  	}
  	return 1;
--- 233,257 ----
  {
  	int i;
+ 	printf("stop_cpus: entering\n");
  	if (!smp_started)
  		return 0;
  	CTR1(KTR_SMP, "stop_cpus(%x)", map);
+ 	printf("stop_cpus: send ipi\n");
  	/* send the stop IPI to all CPUs in map */
  	ipi_selected(map, IPI_STOP);
+ 	printf("stop_cpus: entering spin for int\n");
  	i = 0;
  	while ((atomic_load_acq_int(&stopped_cpus) & map) != map) {
  		/* spin */
  		i++;
  		if (i == 100000) {
! 			printf("stop_cpus: timeout stopping cpus\n");
  			break;
  		}
  	}
  	return 1;

--------------080405080507030004020606
Content-Type: text/plain; name="vm_machdep.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="vm_machdep.c.diff"

*** vm_machdep.c.orig	Wed Aug 24 07:55:03 2005
--- vm_machdep.c	Wed Aug 24 08:02:14 2005
***************
*** 541,554 ****
--- 541,557 ----
  #ifdef SMP
  	u_int cnt, map;
+ 	printf("cpu_reset: Entering SMP section\n");
  	if (smp_active) {
  		map = PCPU_GET(other_cpus) & ~stopped_cpus;
  		if (map != 0) {
  			printf("cpu_reset: Stopping other CPUs\n");
  			stop_cpus(map);
+ 			printf("cpu_reset: Stopped other CPUs ok\n");
  		}
  		if (PCPU_GET(cpuid) != 0) {
+ 			printf("cpu_reset: case PCPU_GET \n");
  			cpu_reset_proxyid = PCPU_GET(cpuid);
  			cpustop_restartfunc = cpu_reset_proxy;
  			cpu_reset_proxy_active = 0;
***************
*** 570,575 ****
--- 573,579 ----
  		DELAY(1000000);
  	}
  #endif
+ 	printf("cpu_reset: leaving SMP section\n");
  	cpu_reset_real();
  	/* NOTREACHED */
  }
***************
*** 578,588 ****
--- 582,595 ----
  cpu_reset_real()
  {
+ 	printf("cpu_reset_real: entering\n");
  #ifdef CPU_ELAN
+ 	printf("cpu_reset_real: case CPU_ELAN \n");
  	if (elan_mmcr != NULL)
  		elan_mmcr->RESCFG = 1;
  #endif
+ 	printf("cpu_reset_real: case CPU_GEODE1100 \n");
  	if (cpu == CPU_GEODE1100) {
  		/* Attempt Geode's own reset */
  		outl(0xcf8, 0x80009044ul);
***************
*** 590,595 ****
--- 597,603 ----
  	}
  #ifdef PC98
+ 	printf("cpu_reset_real: case CPU_PC98 \n");
  	/*
  	 * Attempt to do a CPU reset via CPU reset port.
  	 */
***************
*** 601,606 ****
--- 609,615 ----
  	outb(0xf0, 0x00);	/* Reset. */
  #else
  #if !defined(BROKEN_KEYBOARD_RESET)
+ 	printf("cpu_reset_real: case CPU_PC98 + BROKEN_KEYBOARD_RESET \n");
  	/*
  	 * Attempt to do a CPU reset via the keyboard controller,
  	 * do not turn off GateA20, as any machine that fails
***************
*** 613,618 ****
--- 622,628 ----
  #endif
  #endif /* PC98 */
+ 	printf("cpu_reset_real: case unmapping entire address space \n");
  	/* Force a shutdown by unmapping entire address space. */
  	bzero((caddr_t)PTD, NBPTD);

--------------080405080507030004020606--
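
A note on the subr_smp.c diff: besides the added printf calls, removing
the #ifdef DIAGNOSTIC means the 100000-iteration limit on the spin loop
is always compiled in. Below is a minimal user-space sketch of that kind
of bounded spin-wait, not the kernel code itself. It assumes C11 atomics
in place of atomic_load_acq_int; the names stopped_mask, ack_stop and
wait_for_stop are hypothetical; and unlike stop_cpus (which returns 1
whether or not the limit was hit) the sketch reports a timeout to the
caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's stopped_cpus bitmask. */
static atomic_uint stopped_mask;

/* Hypothetical helper: "CPU" number cpu acknowledges the stop request
 * by setting its bit in the shared mask. */
static void ack_stop(unsigned cpu)
{
    atomic_fetch_or(&stopped_mask, 1u << cpu);
}

/*
 * Poll until every CPU named in map has acknowledged, giving up after
 * limit polls. Returns true if all bits in map were seen set, false if
 * the limit was reached first.
 */
static bool wait_for_stop(unsigned map, unsigned limit)
{
    assert(limit > 0);
    for (unsigned i = 0; i < limit; i++) {
        unsigned seen = atomic_load_explicit(&stopped_mask,
                                             memory_order_acquire);
        if ((seen & map) == map)
            return true;   /* every requested CPU has stopped */
    }
    return false;          /* bounded: never spins forever */
}
```

With no acknowledgement recorded, wait_for_stop(0x2, 100000) gives up
and returns false instead of spinning indefinitely; once ack_stop(1) has
run, the same call returns true.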