Message-ID: <4715C5D7.7060806@FreeBSD.org>
Date: Wed, 17 Oct 2007 10:20:39 +0200
From: Kris Kennaway
To: Alexey Popov
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
In-Reply-To: <4715C297.1020905@chistydom.ru>
Subject: Re: amrd disk performance drop after running under high load

Alexey Popov wrote:

>>> This is very unlikely, because I have 5 another video storage servers
>>> of the same hardware and software configurations and they feel good.
>> Clearly something is different about them, though. If you can
>> characterize exactly what that is then it will help.
> I can't see any difference but a date of installation. Really I compared
> all parameters and got nothing interesting.
>
>>> At first glance one can say that problem is in Dell's x850 series or
>>> amr(4), but we run this hardware on many other projects and they work
>>> well. Also Linux on them works.
>>
>> OK but there is no evidence in what you posted so far that amr is
>> involved in any way. There is convincing evidence that it is the mbuf
>> issue.
> Why are you sure this is the mbuf issue?

Because that is the only problem shown in the data you posted.

> For example, if there is a real
> problem with amr or VM causing disk slowdown, then when it occurs the
> network subsystem will have another load pattern. Instead of just quick
> sending large amounts of data, the system will have to accept large
> amount of simultaneous connections waiting for data. Can this cause high
> mbuf contention?

I'd expect to see evidence of the main problem.

>>> And few hours ago I received feed back from Andrzej Tobola, he has
>>> the same problem on FreeBSD 7 with Promise ATA software mirror:
>> Well, he didn't provide any evidence yet that it is the same problem,
>> so let's not become confused by feelings :)
> I think he is telling about 100% disk busy while processing ~5
> transfers/sec.

"% busy" as reported by gstat doesn't mean what you think it does. What
is the I/O response time? That's the meaningful statistic for
evaluating I/O load. Also you didn't post about this.

>>> So I can conclude that FreeBSD has a long standing bug in VM that
>>> could be triggered when serving large amount of static data (much
>>> bigger than memory size) on high rates. Possibly this only applies to
>>> large files like mp3 or video.
>> It is possible, we have further work to do to conclude this though.
> I forgot to mention I have pmc and kgmon profiling for good and bad
> times. But I have not enough knowledge to interpret it right and not
> sure if it can help.

pmc would be useful.

> Also now I run nginx instead of lighttpd on one of the problematic
> servers. It seems to work much better - sometimes there are peaks in
> disk load, but disk does not become very slow and network output does
> not change. The difference of nginx is that it runs in multiple
> processes, while lighttpd by default has only one process. Now I
> configured lighttpd on other server to run in multiple workers. I'll see
> if it helps.
>
> What else can I try?

Still waiting on the vmstat -z output.

Kris
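
Editorial illustration of the "% busy" point in the message above. This is a simplified model and not part of the original post. It assumes that "% busy" means the fraction of the sampling window during which at least one request was outstanding, which is how devstat(3) describes its busy percentage. The two workloads are hypothetical. Both come out 100% busy, yet their mean response times differ by a factor of ten, which may be why response time is the more useful number to post.

```python
def busy_fraction(requests, window_ms):
    """Fraction of the window with at least one request outstanding.

    requests: list of (start_ms, end_ms) tuples; overlapping intervals
    are merged so that concurrent requests are not counted twice.
    """
    busy = 0
    cur_start = cur_end = None
    for start, end in sorted(requests):
        if cur_end is None or start > cur_end:
            # Gap before this request: close the previous busy period.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps (or touches) the current busy period: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        busy += cur_end - cur_start
    return busy / window_ms


def mean_response_ms(requests):
    """Average time from issue to completion, in milliseconds."""
    if not requests:
        return 0.0
    return sum(end - start for start, end in requests) / len(requests)


WINDOW_MS = 1000  # one-second sampling window (assumed)

# Hypothetical workload A: 5 back-to-back requests of 200 ms each.
slow_serial = [(i * 200, (i + 1) * 200) for i in range(5)]

# Hypothetical workload B: 99 requests of 20 ms each, issued every 10 ms,
# so two are usually in flight at once.
fast_overlapping = [(i * 10, i * 10 + 20) for i in range(99)]

for name, reqs in (("A", slow_serial), ("B", fast_overlapping)):
    print(name, len(reqs), "transfers,",
          "busy =", busy_fraction(reqs, WINDOW_MS),
          "mean response ms =", mean_response_ms(reqs))
```

Under these assumptions, workload A matches the "100% busy at ~5 transfers/sec" observation only if each transfer really takes about 200 ms; the busy figure alone cannot tell A from B.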