From owner-freebsd-stable@FreeBSD.ORG Tue Dec 23 12:41:43 2008
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A54461065674
	for ; Tue, 23 Dec 2008 12:41:43 +0000 (UTC)
	(envelope-from bsdlist@cogeco.ca)
Received: from fep7.cogeco.net (smtp2.cogeco.ca [216.221.81.29])
	by mx1.freebsd.org (Postfix) with ESMTP id 7F84E8FC17
	for ; Tue, 23 Dec 2008 12:41:43 +0000 (UTC)
	(envelope-from bsdlist@cogeco.ca)
Received: from [192.168.1.126] (d150-251-98.home.cgocable.net [24.150.251.98])
	by fep7.cogeco.net (Postfix) with ESMTP id 75C19169D
	for ; Mon, 22 Dec 2008 09:59:03 -0500 (EST)
Message-ID: <494FABC2.1020804@cogeco.ca>
Date: Mon, 22 Dec 2008 10:01:22 -0500
From: Paul MacKenzie
User-Agent: Thunderbird 3.0a1pre (Windows/2008022014)
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
References: <4949673B.2070701@elehost.com> <494AED9E.9090900@cogeco.ca>
In-Reply-To: <494AED9E.9090900@cogeco.ca>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: 7.1-PRERELEASE: arcmsr write performance problem
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 23 Dec 2008 12:41:43 -0000

> I actually find that running Wusage 8.0 a few times even with nice-19
> may be implicated in getting the system to spiral downwards. I hesitate
> to mention this as it seems to be working fine on another 7.X server. I
> believe that Wusage is tied to 6.X libraries and I wonder if somehow
> this may initiate the problem. I also have another sio/com based program
> running every few minutes which is also connected to the 6.X library
> (scom thermal application for temperature monitoring) and turning both
> of these off seems to help.
> I am going to try a 24 hour period without
> either of these two running after a fresh reboot and we will see if this
> is indeed one source to my abominable problem.
>
> Once the system spirals down into its locking then the io performance
> never seems to recover unless I reboot it or somehow find the process
> that is locked and kill it.
>

So after the testing period with neither scom nor Wusage running, I
still have the problem, albeit less often. Stress on the system seems to
bring it on, as I noticed in my other tests. It is easiest to see with
httpd, where a stop needs to be run to try to free up the system.

Dec 19 07:32:43 /usr/local/sbin/apachectl -k stop
Dec 19 07:33:35 kernel: pid 31376 (httpd), uid 80: exited on signal 11
Dec 19 07:33:35 kernel: pid 31194 (httpd), uid 80: exited on signal 11
Dec 19 07:33:36 kernel: pid 31469 (httpd), uid 80: exited on signal 11
Dec 19 07:33:36 kernel: pid 31754 (httpd), uid 80: exited on signal 11
Dec 19 07:33:37 kernel: pid 31168 (httpd), uid 80: exited on signal 11
Dec 19 07:33:37 kernel: pid 31753 (httpd), uid 80: exited on signal 11
Dec 19 07:33:38 kernel: pid 31763 (httpd), uid 80: exited on signal 11
Dec 19 07:33:38 kernel: pid 31748 (httpd), uid 80: exited on signal 11

It's relatively easy to get Apache restarted, but when the blockage is
in something else on the system it is hard to locate, and I have to
reboot to release it. When it is Apache that has "locked", I have to
kill it, as a graceful will not work. The majority of the processes are
in the UFS state when it happens.

If there were a quick way to determine which process has "locked" the
system, that would be very helpful as a temporary workaround.

I am going to try replacing the motherboard next. This is one of the
last hardware replacements left to isolate this as a hardware problem.
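
As a rough starting point for the "which one has locked" question, one
option is to snapshot ps and list the processes sitting in disk wait on
the ufs wait channel. The sketch below is only that: the function names
find_blocked and ps_snapshot are made up for illustration, and it
assumes ps is asked for exactly the columns pid,state,wchan,command.
It shows which processes are waiting, not which one holds the lock, so
it narrows the candidates rather than naming the culprit.

```python
import subprocess


def find_blocked(ps_text, wchan="ufs"):
    """Return (pid, state, command) for processes in disk wait on `wchan`.

    `ps_text` is expected to be the output of
    `ps -axo pid,state,wchan,command` (header row plus one row per process).
    """
    blocked = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)        # command may contain spaces
        if len(parts) < 4:
            continue                       # ignore short or blank rows
        pid, state, chan, command = parts
        # State "D" is uninterruptible (disk) wait; extra flags may follow.
        if state.startswith("D") and chan == wchan:
            blocked.append((int(pid), state, command))
    return blocked


def ps_snapshot():
    """Run ps once and return its output as text (hypothetical helper)."""
    result = subprocess.run(
        ["ps", "-axo", "pid,state,wchan,command"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_blocked(ps_snapshot()) every few seconds and comparing the
lists would show which PIDs stay stuck on ufs from one snapshot to the
next; those are the ones worth looking at first.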