From owner-freebsd-stable Thu May 23 2:28:53 2002
Delivered-To: freebsd-stable@freebsd.org
Received: from alogis.com (firewall.solit-ag.de [212.184.102.1]) by hub.freebsd.org (Postfix) with ESMTP id 9030237B40A; Thu, 23 May 2002 02:28:01 -0700 (PDT)
Received: from alogis.com (kipp@clausthal.int1.b.intern [10.1.1.30]) by alogis.com (8.11.1/8.9.3) with ESMTP id g4N9Rxl87795; Thu, 23 May 2002 11:28:00 +0200 (CEST) (envelope-from holger.kipp@alogis.com)
Message-ID: <3CECB23B.5DD164A1@alogis.com>
Date: Thu, 23 May 2002 11:11:23 +0200
From: Holger Kipp
X-Mailer: Mozilla 4.7 [en] (X11; U; Linux 2.2.13 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: freebsd-smp@FreeBSD.ORG
Cc: Pete French, frank@exit.com, maildrop@qwest.net, stable@FreeBSD.ORG
Subject: Re: 4.6-RC system hangs (fxp0, smp, sym) (UPDATE)
References: <3CEA128A.AB7FAC93@alogis.com> <15594.22920.415872.835007@moe.cs.duke.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

I did some more testing, and finally found something interesting:
I had IO-APIC disabled. All errors went away after I enabled IO-APIC
in the BIOS.

Now for the guesswork: The only real difference I can see here is the
assignment of IRQs to controllers (SCSI and NIC). See below.

I only have one SMP machine with said problems to test this. If you
encounter similar problems, could you please check if your configuration
fits this pattern? If it doesn't, I'm out of ideas...

dmesg and mptable for the working setup below.

Regards,
Holger

--- 8< ---------------------------------------------------------------------

==> I noticed that with IO-APIC, sym0, sym1 and fxp0 have different IRQs
    assigned (all errors gone), whilst with the old setup, they all have
    the same IRQ.

          errors              no errors
sym0      IRQ 11 at 13.0      IRQ  2 at 13.0
sym1      IRQ 11 at 13.1      IRQ 11 at 13.1
fxp0      IRQ 11 at 15.0      IRQ 16 at 15.0

==> Could this explain the problems I (and others) see?

- I have all disks on sym0, but maybe copying from sym0 to sym1 could
  give errors if they share the same IRQ (if timing is critical)?
- I have only one NIC, but with several NICs on the same IRQ, could this
  lead to the problems seen by others?
- Maybe the new NIC problems are actually the result of optimized drivers
  (changed timing)?
- Is the difference in IRQ handling between NCR and SYM sufficient to
  explain the difference in behaviour I see between those two?

dmesg.sym.apic.txt
--- 8< ---------------------------------------------------------------------
Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.6-RC #0: Wed May 22 19:56:42 CEST 2002
    root@idefix.intern.hkipp.de:/usr/obj/usr/src/sys/IDEFIXSYM
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium II/Pentium II Xeon/Celeron (349.13-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x652  Stepping = 2
  Features=0x183fbff
real memory  = 134217728 (131072K bytes)
avail memory = 127340544 (124356K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id: 0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc033b000.
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 8 entries at 0xc00fdf50
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
IOAPIC #0 intpin 21 -> irq 2
IOAPIC #0 intpin 22 -> irq 11
IOAPIC #0 intpin 20 -> irq 16
pci0: on pcib0
sym0: <875> port 0xf800-0xf8ff mem 0xfedff000-0xfedfffff,0xfedfec00-0xfedfecff irq 2 at device 13.0 on pci0
sym0: No NVRAM, ID 7, Fast-20, SE, parity checking
sym1: <875> port 0xf400-0xf4ff mem 0xfedfd000-0xfedfdfff,0xfedfe800-0xfedfe8ff irq 11 at device 13.1 on pci0
sym1: No NVRAM, ID 7, Fast-20, SE, parity checking
fxp0: port 0xfce0-0xfcff mem 0xfeb00000-0xfebfffff,0xfedfa000-0xfedfafff irq 16 at device 15.0 on pci0
fxp0: Ethernet address 00:a0:c9:ec:70:26
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: at device 18.0 on pci0
isa0: on isab0
atapci0: port 0xfcb0-0xfcbf at device 18.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: at 18.2 irq 11
Timecounter "PIIX" frequency 3579545 Hz
chip1: port 0x2180-0x218f at device 18.3 on pci0
pci0: at 20.0
isa0: too many dependant configs (8)
orm0: