From: Beat Gätzi <beat@chruetertee.ch>
Date: Sun, 02 Jan 2011 14:53:25 +0100
To: Kostik Belousov
Cc: Alexander Best, current@freebsd.org
Subject: Re: Suddenly slow lstat syscalls on CURRENT from Juli

On 01.01.2011 18:26, Kostik Belousov wrote:
> On Sat, Jan 01, 2011 at 05:59:10PM +0100, Beat Gätzi wrote:
>> On 01.01.2011 17:46, Kostik Belousov wrote:
>>> On Sat, Jan 01, 2011 at 05:42:58PM +0100, Beat Gätzi wrote:
>>>> On 01.01.2011 17:12, Kostik Belousov wrote:
>>>>> On Sat, Jan 01, 2011 at 05:00:56PM +0100, Beat Gätzi wrote:
>>>>>> On 01.01.2011 16:45, Kostik Belousov wrote:
>>>>>>> Check the output of sysctl kern.maxvnodes and vfs.numvnodes. I
>>>>>>> suspect they are quite close or equal. If so, consider increasing
>>>>>>> maxvnodes. Another workaround, if you have a huge nested directory
>>>>>>> hierarchy, is to set vfs.vlru_allow_cache_src to 1.
>>>>>>
>>>>>> Thanks for the hint. kern.maxvnodes and vfs.numvnodes were equal:
>>>>>> # sysctl kern.maxvnodes vfs.numvnodes
>>>>>> kern.maxvnodes: 100000
>>>>>> vfs.numvnodes: 100765
>>>>>>
>>>>>> I've increased kern.maxvnodes and the problem was gone until
>>>>>> vfs.numvnodes reached the value of kern.maxvnodes again:
>>>>>> # sysctl kern.maxvnodes vfs.numvnodes
>>>>>> kern.maxvnodes: 150000
>>>>>> vfs.numvnodes: 150109
>>>>> The processes should be stuck in the "vlruwk" state; that can be
>>>>> checked with ps or '^T' on the terminal.
>>>>
>>>> Yes, there are various processes in the "vlruwk" state.
>>>>
>>>>>> As the directory structure is quite huge on this server, I've set
>>>>>> vfs.vlru_allow_cache_src to one now.
>>>>> Did it help?
>>>>
>>>> No, it doesn't look like setting vfs.vlru_allow_cache_src helped. The
>>>> problem only went away when I increased kern.maxvnodes, until
>>>> vfs.numvnodes reached that level again. I've stopped all running
>>>> daemons, but numvnodes doesn't decrease.
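(For anyone else hitting this: the stall is easy to confirm, since the
affected processes all sleep on the "vlruwk" wait channel. A minimal
check, assuming nothing beyond the base system:
# ps -axo pid,state,wchan,comm | grep vlruwk
# sysctl kern.maxvnodes vfs.numvnodes
The first command lists the processes waiting on vnode reclamation; the
second shows how close the vnode count is to the limit.)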
>>> Stopping the daemons would not decrease the count of cached vnodes.
>>> What you can do is call unmount on the filesystems. Presumably the
>>> filesystems are busy and the unmount will fail, but it will force the
>>> vnodes that are not in use by any process to be freed.
>>
>> That freed around 1500 vnodes. At the moment vfs.numvnodes doesn't
>> increase rapidly and the server is usable. I will keep an eye on it to
>> see if I run into the same problem again.
> That is too small a number of freed vnodes for a typical system, and it
> feels like a real vnode leak. It would be helpful if you tried to
> identify the load that causes the situation to occur.

From the data I collect on the server it doesn't look like a constant
leak:
2010-12-21/18/00: vfs.numvnodes: 84089
2010-12-22/18/00: vfs.numvnodes: 76599
2010-12-23/18/00: vfs.numvnodes: 22854
2010-12-24/18/00: vfs.numvnodes: 17940
2010-12-25/18/00: vfs.numvnodes: 84999
2010-12-26/18/00: vfs.numvnodes: 84601
2010-12-27/18/00: vfs.numvnodes: 32724
2010-12-28/18/00: vfs.numvnodes: 141104

I checked for the exact point in time when vfs.numvnodes hit the limit:
2010-12-28/01/30: vfs.numvnodes: 71566
2010-12-28/01/35: vfs.numvnodes: 82448
2010-12-28/01/40: vfs.numvnodes: 89826
2010-12-28/01/45: vfs.numvnodes: 79625
2010-12-28/01/50: vfs.numvnodes: 24678
2010-12-28/01/55: vfs.numvnodes: 100045
2010-12-28/02/00: vfs.numvnodes: 100351
2010-12-28/02/05: vfs.numvnodes: 100685
2010-12-28/02/10: vfs.numvnodes: 100989

Every night at 1:02 a script runs that checks out the SVN trunk of
VirtualBox and tries to build it in the tinderbox. From the tinderbox
logs I see that the build of virtualbox-ose-devel-4.0.1.r35314 in the
FreeBSD 7 jail finished at 2010-12-28 01:47:05. At 01:48:55 the build
for FreeBSD 8 started, but it looks like something went wrong there, as
that build never finished. Unfortunately I'm not able to reproduce which
part of that build caused the problem. There were no changes in the VBox
SVN between the 27th and the 28th, and the identical tinderbox run on
the 27th was successful.

> You are on UFS, right?

Yes, UFS with soft updates everywhere except /:
# mount
/dev/da0s1a on / (ufs, local)
devfs on /dev (devfs, local, multilabel)
/dev/da0s1d on /tmp (ufs, local, soft-updates)
/dev/da0s1f on /usr (ufs, local, soft-updates)
/dev/da0s1e on /var (ufs, local, noexec, nosuid, soft-updates)

Beat
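P.S. In case anyone wants to collect the same numbers: a cron job along
these lines produces samples in the format shown above. This is a
minimal sketch; the script name and log path are placeholders, not
necessarily what runs on this box.

#!/bin/sh
# numvnodes.sh - append a timestamped vfs.numvnodes sample to a log.
# Intended to run from root's crontab every five minutes, e.g.:
#   */5 * * * * /usr/local/sbin/numvnodes.sh
LOG=/var/log/numvnodes.log
echo "$(date +%Y-%m-%d/%H/%M): $(sysctl vfs.numvnodes)" >> "$LOG"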