From owner-freebsd-stable@FreeBSD.ORG Fri Apr 21 21:59:59 2006
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id F016616A400
	for ; Fri, 21 Apr 2006 21:59:58 +0000 (UTC)
	(envelope-from rand@meridian-enviro.com)
Received: from newman.meridian-enviro.com (newman.meridian-enviro.com [207.109.235.166])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6348343D45
	for ; Fri, 21 Apr 2006 21:59:58 +0000 (GMT)
	(envelope-from rand@meridian-enviro.com)
Received: from delta.meridian-enviro.com (delta.meridian-enviro.com [10.10.10.43])
	by newman.meridian-enviro.com (8.13.1/8.13.1) with ESMTP id k3LLxZX0053359;
	Fri, 21 Apr 2006 16:59:35 -0500 (CDT)
	(envelope-from rand@meridian-enviro.com)
Date: Fri, 21 Apr 2006 16:59:33 -0500
Message-ID: <874q0mpp2y.wl%rand@meridian-enviro.com>
From: "Douglas K. Rand"
To: freebsd-stable@freebsd.org
User-Agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8
	(=?ISO-8859-4?Q?Shij=F2?=) APEL/10.6 Emacs/21.3 (i386--freebsd)
	MULE/5.0 (SAKAKI)
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
X-Virus-Scanned: ClamAV 0.88/1412/Fri Apr 21 15:33:41 2006 on newman.meridian-enviro.com
X-Virus-Status: Clean
Subject: iir + Tyan S2460 + SMP problems
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Fri, 21 Apr 2006 21:59:59 -0000

We're having problems with FreeBSD 5.4, 6.0, and 6.1 and an ICP Vortex
GDT8546RZ 4-port SATA RAID card in a Tyan S2460 system with dual AMD
Athlon MP 1600+ CPUs.
We do not have any problems with this configuration under FreeBSD
4.11, and the same ICP cards run without problems under FreeBSD 5.4
and 6.1 in Tyan-based Opteron systems (S2882 and S4882). We can
reproduce the problem on two different S2460-based systems, and have
tried two separate ICP GDT8546RZ cards, so we don't believe it is a
hardware problem. (Our success with FreeBSD 4.11 also provides some
evidence that our hardware is OK.)

The problem is that the system seems to stop doing any disk I/O
through the ICP card. Processes that don't need to page in work fine.
(You can hit return in a shell, get another login: prompt on other
consoles, and the like.) The system continues to respond to pings, but
anything that attempts disk I/O simply stops. Sometimes the kernel
emits messages like this:

    swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096

The test we are using to produce this "hang" is a fairly trivial
expansion of a tarball fed via nc from another system. On the source
system we run:

    tar cf - radar | nc -w 3 10.10.10.229 12345

And on the system being tested we run:

    nc -l 12345 | tar xvf -

One iteration of this test is the extraction of a 1.2 GB directory of
2,274 files.

The problem only exists with SMP kernels. With SMP, the test almost
always failed in the first iteration or two, and the longest run
before failure was 5 iterations. Without SMP the test ran without
problems for 570 iterations over 18 hours.

We've tried a number of different tests. These tests are with a stock
6.1-RC1 kernel from the RC CDs. Unless otherwise specified, all tests
are on a UFS2 filesystem with softupdates enabled and an SMP-enabled
GENERIC kernel.

* !SMP: ran 570 iterations in 18 hours without a problem; test
  terminated by hand.
* Large (190 GB) UFS2 filesystem with softupdates enabled and SMP
  kernel: failed during the first iteration.
* Medium (12 GB) UFS2 filesystem with softupdates enabled and SMP
  kernel: failed during the first iteration.
* !softupdates: failed during the first iteration.
* !ACPI: failed during the first iteration.
* UFS1: failed during the first iteration.
* UFS1 + !ADAPTIVE_GIANT: failed during the first iteration.
* !ADAPTIVE_GIANT: failed during the first iteration.
* Cleared motherboard CMOS: failed at the end of the second iteration.
* FULL_PREEMPTION: failed during the first iteration.
* !PREEMPTION: failed during the first iteration.
* WITNESS + WITNESS_KDB: failed during the second iteration with no
  witness-related kernel messages and without entering the kernel
  debugger.
* WITNESS + INVARIANTS: failed during the fifth iteration, again
  without kernel messages.
* Motherboard BIOS "Use PCI Interrupt Entries in MP Table" set to OFF:
  failed during the first iteration.
* Motherboard BIOS "Multiprocessor Specification" set from 1.4 to 1.1:
  failed during the first iteration.
* MUTEX_WAKE_ALL: failed during the first iteration.

I have a serial console and a kernel debugger enabled, so if anybody
has suggestions for probes to do once the system is hung, let us know.

Any advice is welcome. Well, except for "dump the Tyan S2460
motherboards" maybe. Oh, and we're at current BIOS and firmware revs
for both the ICP card and the motherboard.
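
For anyone trying to reproduce the hang, here is a minimal sketch of how the receiving side of the tar/nc test could be looped and logged per iteration. It is an illustration only: the port matches the nc command quoted above, but DEST, receive_once, and run_iterations are hypothetical names, and tar is run without -v here to keep the log short.

```shell
#!/bin/sh
# Sketch of a receive-side loop for the tar-over-nc test.
# PORT matches the command in the post; DEST and both function
# names are assumptions made for this example.
PORT=12345
DEST=${DEST:-/mnt/test}

# One iteration: remove the previous copy, wait for the sender,
# and unpack the stream into DEST.
receive_once() {
    rm -rf "$DEST/radar"
    nc -l "$PORT" | (cd "$DEST" && tar xf -)
}

# run_iterations MAX CMD...
# Runs CMD up to MAX times and stops at the first failure.
# The start of each iteration is logged to stderr, so after a hang
# the last log line shows which iteration was in progress.
# The number of completed iterations is printed on stdout.
run_iterations() {
    max=$1
    shift
    done_count=0
    while [ "$done_count" -lt "$max" ]; do
        echo "iteration $((done_count + 1)) started: $(date)" >&2
        "$@" || break
        done_count=$((done_count + 1))
    done
    echo "$done_count"
}
```

With this loaded, something like `run_iterations 1000 receive_once` on the test system, paired with the sender re-running its `tar cf - radar | nc -w 3 ...` command for each pass, would leave the iteration number of the hang as the last line on the serial console.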