Date: Mon, 12 Jan 2009 21:35:58 +0100
From: Tomas Randa <freebsd@max.af.czu.cz>
To: Garance A Drosihn <drosih@rpi.edu>
Cc: freebsd-stable@FreeBSD.org, Robert Watson <rwatson@FreeBSD.org>
Subject: Re: Big problems with 7.1 locking up :-(
Message-ID: <496BA9AE.10801@max.af.czu.cz>
In-Reply-To: <p06240808c5911644dd11@[128.113.24.47]>
References: <E1LL6dg-0007CN-DI@dilbert.ticketswitch.com> <042FE04A-2F8D-47DD-8454-7BBA3791D7A8@inoc.net> <p06240802c58db5953598@[128.113.24.47]> <alpine.BSF.2.00.0901121453200.16794@fledge.watson.org> <p06240808c5911644dd11@[128.113.24.47]>

Hello,

I have similar problems. The last "good" kernel I have is from the stable branch, dated October 8. After the next upgrade, I saw big performance problems. I tried ULE, 4BSD, etc., but nothing helps except downgrading the system. Now I am trying 7.1-p1 and the problems are back. MySQL spends a lot of time waiting with the status "waiting for opening table" or "waiting for close tables".

I have 32-bit FreeBSD with PAE, 1x Xeon 5420, a Supermicro motherboard, and an Areca SATA controller. Could the problem be in the "da" device, for example?

Thanks,
Tomas Randa

Garance A Drosihn wrote:
> At 2:55 PM +0000 1/12/09, Robert Watson wrote:
>> On Fri, 9 Jan 2009, Garance A Drosihn wrote:
>>
>>> At 2:39 PM -0500 1/9/09, Robert Blayzor wrote:
>>>> On Jan 8, 2009, at 8:58 PM, Pete French wrote:
>>>>> I have a number of HP 1U servers, all of which were running 7.0
>>>>> perfectly happily. I have been testing 7.1 in it's various
>>>>> incarnations for the last couple of months on our test server and
>>>>> it has performed perfectly.
>>>>
>>>> I noticed a problem with 7.0 on a couple of Dell servers. [...]
>>>> We've since then compiled the kernel under the BSD scheduler to
>>>> rule that out, and so far so good.
>>>>
>>>> Since ULE is now default in 7.1 and not in 7.0, perhaps you can try
>>>> that?
>>>
>>> FWIW, the other guy I know who is having this problem had already
>>> switched to using ULE under 7.0-release, and did not have any
>>> problems with it. So *his* problem was probably not related to
>>> SCHED_ULE, unless something has recently changed there.
>>>
>>> Turns out he hasn't reverted back to 7.0-release just yet, so he's
>>> going to try SCHED_4BSD and see if that helps his situation.
>>
>> Scheduler changes always come with some risk of exposing bugs that
>> have existed in the code for a long time but never really manifested
>> themselves. ULE is well shaken-out, having been under development for
>> at least five years, but it is possible that some problems will
>> become visible as a result of the switch. I would encourage people
>> to stick with ULE, but if you're having a stability problem then
>> experimenting with scheduler as a variable that could be triggering
>> the problem may well be useful to help track down the bug.
>
> Just to followup on this: My friend did switch back to a 7.1 kernel with
> SCHED_4BSD, and he still ran into problems. The error messages weren't
> the same, but errors did happen in the same high disk-I/O situations as
> the lockup happened with SCHED_ULE. At this point he's fallen back to
> the 7.0-kernel that he had been running (which also has SCHED_ULE), and
> all the problems have gone away. So at the moment he's running with a
> 7.0-ish kernel and the 7.1-release userland, without the hanging
> problems.
> So the problem is something in the kernel, but it is *NOT* the scheduler
> (at least, not in his case).
>
> He is not eager to do a whole lot of experiments to track down the
> problem, since this is happening on busy production machines and he
> can't afford to have a lot of downtime on them (especially now that the
> semester at RPI has started up). The systems have some large (2 TB)
> filesystems on them, and the lockups occur in high disk-I/O situations.
> He's seeing the problem on one system which is a dual CPU quad-core
> xeon, and another which is a 64 bit P4 with hyperthreading. The one
> thing in common between the two setups is that the boot drives + a
> 3ware controller (with its array of RAID disks) is moved from one
> machine to the other one:
>
>    "its a 3ware 9500 12 port model, the boot drive is connected to
>    an ICH6 in IDE mode, and yes, I've run it in single, single with
>    hyper threading, and 8 way mode. All 64 bit."
>
> We still have no idea where the problem really is. For all we know,
> someone spilled a Pepsi on it when he wasn't looking...
>