From: Kris Kennaway
Date: Sun, 28 Oct 2007 00:45:27 +0200
To: Alexey Popov
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: amrd disk performance drop after running under high load

Alexey Popov wrote:
> Hi
>
> Kris Kennaway wrote:
>>>>> So I can conclude that FreeBSD has a long standing bug in VM that
>>>>> could be triggered when serving large amount of static data (much
>>>>> bigger than memory size) on high rates. Possibly this only applies
>>>>> to large files like mp3 or video.
>>>> It is possible, we have further work to do to conclude this though.
>>> I forgot to mention I have pmc and kgmon profiling for good and bad
>>> times. But I have not enough knowledge to interpret it right and not
>>> sure if it can help.
>> pmc would be useful.
> pmc profiling attached.

Sorry for the delay, I was travelling last weekend and it took a few
days to catch up.

OK, the pmc traces do seem to show that it's not a lock contention
issue. That being the case, I don't think the fact that different
servers perform better is directly related. In my tests multithreaded
web servers don't seem to perform well anyway.

There is also no evidence of a VM problem. What your vmstat and pmc
traces show is that your system really isn't doing much work at all,
relatively speaking.

There is also still no evidence of a disk problem. In fact your disk
seems to be almost idle in both cases you provided, only doing between
1 and 10 operations per second, which is trivial.

In the "good" case you are getting a much higher interrupt rate, but
with the data you provided I can't tell where it comes from. You need
to run vmstat -i at regular intervals (e.g. every 10 seconds for a
minute) during the "good" and "bad" times, since it only provides
counters and an average rate over the uptime of the system.

What there is evidence of is an interrupt aliasing problem between em
and USB:

    irq16: uhci0                  1464547796       1870
    irq64: em0                    1463513610       1869

This is a problem on some Intel systems. Basically each em0 interrupt
is also causing a bogus interrupt to the uhci0 device. This will be
causing some overhead and might be contributing to the UMA problems. I
am not sure if it is the main issue, although it could be. It is most
serious when both irqs run under Giant, because they will both fight
for it every time one of them interrupts. That is not the case here,
but there could be other bad scenarios too. You could try disabling
USB support in your kernel since you don't seem to be using it.

Kris
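
The interval sampling suggested above (vmstat -i every 10 seconds for a
minute) could be scripted roughly as follows. This is a minimal sketch,
not something posted in the thread: the names parse_vmstat_i,
interval_rates and sample are hypothetical, and it assumes each vmstat -i
row has the form "name, cumulative total, average rate" as in the two
rows quoted in the message. It turns the cumulative counters into
per-interval rates, which is the number needed to compare "good" and
"bad" periods.

```python
import re
import subprocess
import time

# Assumed row layout: "<name>  <cumulative total>  <average rate>",
# e.g. "irq64: em0   1463513610   1869". The header row has no digits
# in those positions and is therefore skipped.
ROW = re.compile(r"^(?P<name>\S.*?)\s+(?P<total>\d+)\s+(?P<rate>\d+)\s*$")


def parse_vmstat_i(text):
    """Return {interrupt name: cumulative count} from vmstat -i output."""
    counts = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group("name")] = int(match.group("total"))
    return counts


def interval_rates(before, after, seconds):
    """Interrupts per second for each source between two snapshots."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before
    }


def snapshot():
    """Run vmstat -i once and parse its counters."""
    result = subprocess.run(
        ["vmstat", "-i"], capture_output=True, text=True, check=True
    )
    return parse_vmstat_i(result.stdout)


def sample(interval=10, samples=6):
    """Print per-interval interrupt rates, e.g. 6 samples 10 s apart."""
    previous = snapshot()
    for _ in range(samples):
        time.sleep(interval)
        current = snapshot()
        rates = interval_rates(previous, current, interval)
        for name, rate in sorted(rates.items()):
            print(f"{name:24s} {rate:10.1f}/s")
        print()
        previous = current
```

Calling sample() once during a "good" period and once during a "bad"
period gives two sets of rates that can be compared directly, instead of
the uptime-wide averages that vmstat -i reports by itself.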