From owner-freebsd-stable@FreeBSD.ORG  Tue Dec 23 12:41:43 2008
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A54461065674
	for <freebsd-stable@freebsd.org>; Tue, 23 Dec 2008 12:41:43 +0000 (UTC)
	(envelope-from bsdlist@cogeco.ca)
Received: from fep7.cogeco.net (smtp2.cogeco.ca [216.221.81.29])
	by mx1.freebsd.org (Postfix) with ESMTP id 7F84E8FC17
	for <freebsd-stable@freebsd.org>; Tue, 23 Dec 2008 12:41:43 +0000 (UTC)
	(envelope-from bsdlist@cogeco.ca)
Received: from [192.168.1.126] (d150-251-98.home.cgocable.net [24.150.251.98])
	by fep7.cogeco.net (Postfix) with ESMTP id 75C19169D
	for <freebsd-stable@freebsd.org>; Mon, 22 Dec 2008 09:59:03 -0500 (EST)
Message-ID: <494FABC2.1020804@cogeco.ca>
Date: Mon, 22 Dec 2008 10:01:22 -0500
From: Paul MacKenzie <bsdlist@cogeco.ca>
User-Agent: Thunderbird 3.0a1pre (Windows/2008022014)
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
References: <4949673B.2070701@elehost.com> <gid6mg$pi$1@ger.gmane.org>
	<494AED9E.9090900@cogeco.ca>
In-Reply-To: <494AED9E.9090900@cogeco.ca>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: 7.1-PRERELEASE: arcmsr write performance problem
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 23 Dec 2008 12:41:43 -0000


> I actually find that running Wusage 8.0 a few times even with nice-19
> may be implicated in getting the system to spiral downwards. I hesitate
> to mention this as it seems to be working fine on another 7.X server.  I
> believe that Wusage is tied to 6.X libraries and I wonder if somehow
> this may initiate the problem. I also have another sio/com based program
> running every few minutes which is also connected to the 6.X library
> (scom thermal application for temperature monitoring) and turning both
> of these off seems to help.  I am going to try a 24 hour period without
> either of these two running after a fresh reboot and we will see if this
> is indeed one source to my abominable problem.
>
> Once the system spirals down into its locking then the io performance
> never seems to recover unless I reboot it or somehow find the process
> that is locked and kill it.
>

So after the testing period with neither scom nor Wusage running, I
still have the problem, albeit less often. Stress on the system seems
to bring it on, as noticed in my other tests. It is easiest to see on
httpd, where a stop needs to be run to try to free up the system.

Dec 19 07:32:43 /usr/local/sbin/apachectl -k stop
Dec 19 07:33:35 kernel: pid 31376 (httpd), uid 80: exited on signal 11
Dec 19 07:33:35 kernel: pid 31194 (httpd), uid 80: exited on signal 11
Dec 19 07:33:36 kernel: pid 31469 (httpd), uid 80: exited on signal 11
Dec 19 07:33:36 kernel: pid 31754 (httpd), uid 80: exited on signal 11
Dec 19 07:33:37 kernel: pid 31168 (httpd), uid 80: exited on signal 11
Dec 19 07:33:37 kernel: pid 31753 (httpd), uid 80: exited on signal 11
Dec 19 07:33:38 kernel: pid 31763 (httpd), uid 80: exited on signal 11
Dec 19 07:33:38 kernel: pid 31748 (httpd), uid 80: exited on signal 11

It's relatively easy to get this restarted, but when it is something else
in the system it is hard to locate and I have to reboot to release the
blockage. When it is apache that has "locked", I have to kill it, as a
graceful restart will not work. The majority of the processes are in UFS
state when it happens. If there were a quick way to determine which one
has "locked" the system, that would be very helpful as a temporary
workaround.

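As a rough starting point for that kind of check, here is a minimal
sketch that filters a process listing by wait channel. It assumes the
input is the text produced by something like
"ps -axo pid,state,mwchan,command" (the column choice and the "ufs"
channel name are assumptions, and find_waiting is a made-up helper
name). It only lists the processes waiting on that channel; it does not
identify which process actually holds the lock.

```python
def find_waiting(ps_output, wchan="ufs"):
    """Return (pid, state, command) for processes waiting on `wchan`.

    Assumes four whitespace-separated columns: PID, state, wait channel,
    and the full command line, preceded by one header line.
    """
    waiting = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split(None, 3)  # keep the command line in one piece
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # ignore malformed or wrapped rows
        pid, state, chan, command = fields
        if chan == wchan:
            waiting.append((int(pid), state, command))
    return waiting
```

Running this from cron every minute and logging the result would at
least show which processes entered the wait first, which may narrow
down where to look.
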
I am going to try replacing the motherboard next. This is one of the
last hardware replacements left to determine whether this is a hardware
problem.