From owner-freebsd-stable@FreeBSD.ORG  Sat Oct 27 22:45:25 2007
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 84A1616A421;
	Sat, 27 Oct 2007 22:45:25 +0000 (UTC)
	(envelope-from kris@FreeBSD.org)
Received: from weak.local (pointyhat.freebsd.org [IPv6:2001:4f8:fff6::2b])
	by mx1.freebsd.org (Postfix) with ESMTP id 8BEC813C4A7;
	Sat, 27 Oct 2007 22:45:24 +0000 (UTC)
	(envelope-from kris@FreeBSD.org)
Message-ID: <4723BF87.20302@FreeBSD.org>
Date: Sun, 28 Oct 2007 00:45:27 +0200
From: Kris Kennaway <kris@FreeBSD.org>
User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728)
MIME-Version: 1.0
To: Alexey Popov <lol@chistydom.ru>
References: <47137D36.1020305@chistydom.ru> <47140906.2020107@FreeBSD.org>
	<47146FB4.6040306@chistydom.ru> <47147E49.9020301@FreeBSD.org>
	<47149E6E.9000500@chistydom.ru> <4715035D.2090802@FreeBSD.org>
	<4715C297.1020905@chistydom.ru> <4715C5D7.7060806@FreeBSD.org>
	<471EE4D9.5080307@chistydom.ru>
In-Reply-To: <471EE4D9.5080307@chistydom.ru>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: amrd disk performance drop after running under high load
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 27 Oct 2007 22:45:25 -0000

Alexey Popov wrote:
> Hi
> 
> Kris Kennaway wrote:
>>>>> So I can conclude that FreeBSD has a long-standing bug in VM that 
>>>>> could be triggered when serving a large amount of static data (much 
>>>>> bigger than memory size) at high rates. Possibly this only applies 
>>>>> to large files like mp3 or video. 
>>>> It is possible, we have further work to do to conclude this though.
>>> I forgot to mention I have pmc and kgmon profiling for good and bad 
>>> times. But I don't have enough knowledge to interpret it correctly and 
>>> I'm not sure if it can help.
>> pmc would be useful.
> pmc profiling attached.

Sorry for the delay, I was travelling last weekend and it took a few 
days to catch up.

OK, the pmc traces do seem to show that it's not a lock contention 
issue.  That being the case, I don't think the fact that some web 
servers perform better than others is directly related.  In my tests, 
multithreaded web servers don't seem to perform well anyway.

There is also no evidence of a VM problem.  What your vmstat and pmc 
traces show is that your system really isn't doing much work at all, 
relatively speaking.

There is also still no evidence of a disk problem.  In fact your disk 
seems to be almost idle in both cases you provided, only doing between 1 
and 10 operations per second, which is trivial.

In the "good" case you are getting a much higher interrupt rate, but 
with the data you provided I can't tell where it is coming from.  You 
need to run "vmstat -i" at regular intervals (e.g. every 10 seconds for 
a minute) during both the "good" and "bad" times, since a single run 
only reports cumulative counters and an average rate over the uptime of 
the system.
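Since each "vmstat -i" run only reports cumulative totals, the rate for a sampling window comes from differencing two snapshots taken a known number of seconds apart. Below is a minimal sketch in Python, assuming the usual layout in which every data line ends with the "total" and "rate" columns; the function names parse_vmstat_i and interval_rates are hypothetical and not part of any FreeBSD tool.

```python
from typing import Dict


def parse_vmstat_i(text: str) -> Dict[str, int]:
    """Map each interrupt source name to its cumulative total from `vmstat -i` text."""
    totals: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "interrupt total rate" header:
        # data lines end with two integer columns (total, rate).
        if len(fields) < 3 or not fields[-1].isdigit() or not fields[-2].isdigit():
            continue
        name = " ".join(fields[:-2])  # e.g. "irq64: em0"
        if name.lower() == "total":   # skip the summary line
            continue
        totals[name] = int(fields[-2])
    return totals


def interval_rates(before: str, after: str, seconds: float) -> Dict[str, float]:
    """Interrupts per second for each source between two snapshots."""
    t0 = parse_vmstat_i(before)
    t1 = parse_vmstat_i(after)
    # A source missing from the first snapshot is treated as starting at 0.
    return {name: (t1[name] - t0.get(name, 0)) / seconds for name in t1}
```

On the affected machine the snapshots could be captured with something like "vmstat -i > before.txt; sleep 10; vmstat -i > after.txt" and passed in as strings. Repeating this across the "good" and "bad" periods shows which source accounts for the extra interrupts.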

What there is evidence of is an interrupt aliasing problem between em 
and USB:

irq16: uhci0                  1464547796       1870
irq64: em0                    1463513610       1869

This is a problem on some Intel systems.  Basically, each em0 interrupt 
also causes a bogus interrupt on the uhci0 device.  This will be causing 
some overhead and might be contributing to the UMA problems.  I am not 
sure whether it is the main issue, although it could be.  It is most 
serious when both irqs run under Giant, because they will both fight for 
it every time one of them interrupts.  That is not the case here, but 
there could be other bad scenarios too.  You could try disabling USB 
support in your kernel, since you don't seem to be using it.
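The aliasing shows up as two unrelated devices whose cumulative counts track each other almost exactly (1464547796 vs 1463513610 above, about 0.07% apart). Below is a rough heuristic sketch in Python that flags such pairs from a name-to-total mapping. The function name suspected_aliases and the 1% tolerance are arbitrary assumptions for illustration. A match only suggests aliasing. It does not prove it.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def suspected_aliases(
    totals: Dict[str, int], tolerance: float = 0.01
) -> List[Tuple[str, str, float]]:
    """Return (source_a, source_b, relative_difference) for every pair of
    interrupt sources whose cumulative counts agree within `tolerance`.

    The 1% default is an arbitrary heuristic threshold, not a FreeBSD value.
    """
    suspects: List[Tuple[str, str, float]] = []
    # Sort by name so the output order is deterministic.
    for (a, count_a), (b, count_b) in combinations(sorted(totals.items()), 2):
        if count_a == 0 or count_b == 0:
            continue  # idle sources trivially "match" and are not interesting
        rel_diff = abs(count_a - count_b) / max(count_a, count_b)
        if rel_diff <= tolerance:
            suspects.append((a, b, rel_diff))
    return suspects


if __name__ == "__main__":
    # Counters quoted from the vmstat -i output in this thread,
    # plus a made-up low-rate source for contrast.
    sample = {
        "irq16: uhci0": 1464547796,
        "irq64: em0": 1463513610,
        "irq1: atkbd0": 12,
    }
    for a, b, diff in suspected_aliases(sample):
        print(f"{a} <-> {b}: counts differ by {diff:.4%}")
```

A very low-rate source such as a keyboard will not match a busy NIC, so in practice only high-volume pairs like em0/uhci0 get flagged.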

Kris