From owner-freebsd-current@FreeBSD.ORG Thu Dec 15 23:41:37 2011
Return-Path:
Delivered-To: current@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F22361065676
	for ; Thu, 15 Dec 2011 23:41:36 +0000 (UTC)
	(envelope-from truckman@FreeBSD.org)
Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242])
	by mx1.freebsd.org (Postfix) with ESMTP id BE3378FC1F
	for ; Thu, 15 Dec 2011 23:41:36 +0000 (UTC)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
	by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id pBFNUqDe063464;
	Thu, 15 Dec 2011 15:30:56 -0800 (PST)
	(envelope-from truckman@FreeBSD.org)
Message-Id: <201112152330.pBFNUqDe063464@gw.catspoiler.org>
Date: Thu, 15 Dec 2011 15:30:52 -0800 (PST)
From: Don Lewis
To: phk@phk.freebsd.dk
In-Reply-To: <1732.1323872049@critter.freebsd.dk>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Cc: seanbru@yahoo-inc.com, current@FreeBSD.org
Subject: Re: dogfooding over in clusteradm land
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Thu, 15 Dec 2011 23:41:37 -0000

On 14 Dec, Poul-Henning Kamp wrote:
> In message <1323868832.5283.9.camel@hitfishpass-lx.corp.yahoo.com>, Sean Bruno
> writes:
>
>> We're seeing what looks like a syncer/ufs resource starvation on 9.0 on
>> the cvs2svn ports conversion box.  I'm not sure what resource is tapped
>> out.
>
> Search the mail archive for "lemming-syncer".

That should only produce a slowdown every 30 seconds, not a deadlock.  I'd
be more suspicious of a memory allocation deadlock.  One can occur when the
system runs short of free memory because a large number of dirty buffers
have accumulated, but the kernel needs to allocate memory in order to flush
those buffers to disk.
This could be more likely to happen if you are using a software RAID layer,
but I suspect that the recent change to the default UFS block size from 16K
to 32K is the culprit.  In another thread bde pointed out that the BKVASIZE
definition in sys/param.h hadn't been updated to match the new default UFS
block size:

/*
 * BKVASIZE - Nominal buffer space per buffer, in bytes.  BKVASIZE is the
 *	minimum KVM memory reservation the kernel is willing to make.
 *	Filesystems can of course request smaller chunks.  Actual
 *	backing memory uses a chunk size of a page (PAGE_SIZE).
 *
 *	If you make BKVASIZE too small you risk seriously fragmenting
 *	the buffer KVM map which may slow things down a bit.  If you
 *	make it too big the kernel will not be able to optimally use
 *	the KVM memory reserved for the buffer cache and will wind
 *	up with too-few buffers.
 *
 *	The default is 16384, roughly 2x the block size used by a
 *	normal UFS filesystem.
 */
#define MAXBSIZE	65536	/* must be power of 2 */
#define BKVASIZE	16384	/* must be power of 2 */

The problem is that BKVASIZE is used in a number of the tuning calculations
in vfs_bio.c:

	/*
	 * The nominal buffer size (and minimum KVA allocation) is BKVASIZE.
	 * For the first 64MB of ram nominally allocate sufficient buffers to
	 * cover 1/4 of our ram.  Beyond the first 64MB allocate additional
	 * buffers to cover 1/10 of our ram over 64MB.  When auto-sizing
	 * the buffer cache we limit the eventual kva reservation to
	 * maxbcache bytes.
	 *
	 * factor represents the 1/4 x ram conversion.
	 */
	if (nbuf == 0) {
		int factor = 4 * BKVASIZE / 1024;

		nbuf = 50;
		if (physmem_est > 4096)
			nbuf += min((physmem_est - 4096) / factor,
			    65536 / factor);
		if (physmem_est > 65536)
			nbuf += (physmem_est - 65536) * 2 / (factor * 5);

		if (maxbcache && nbuf > maxbcache / BKVASIZE)
			nbuf = maxbcache / BKVASIZE;
		tuned_nbuf = 1;
	} else
		tuned_nbuf = 0;

	/* XXX Avoid unsigned long overflows later on with maxbufspace.
	 */
	maxbuf = (LONG_MAX / 3) / BKVASIZE;

	/*
	 * maxbufspace is the absolute maximum amount of buffer space we are
	 * allowed to reserve in KVM and in real terms.  The absolute maximum
	 * is nominally used by buf_daemon.  hibufspace is the nominal maximum
	 * used by most other processes.  The differential is required to
	 * ensure that buf_daemon is able to run when other processes might
	 * be blocked waiting for buffer space.
	 *
	 * maxbufspace is based on BKVASIZE.  Allocating buffers larger then
	 * this may result in KVM fragmentation which is not handled optimally
	 * by the system.
	 */
	maxbufspace = (long)nbuf * BKVASIZE;
	hibufspace = lmax(3 * maxbufspace / 4, maxbufspace - MAXBSIZE * 10);
	lobufspace = hibufspace - MAXBSIZE;

If you are using the new 32K default filesystem block size, then you may be
consuming twice as much memory for buffers as the tuning calculations think
you are.  Increasing maxvnodes is probably the wrong way to go, since it
will increase memory pressure.  As a quick and dirty test, try cutting
kern.nbuf in half.  The correct fix is probably to rebuild the kernel with
BKVASIZE doubled.
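To make the mismatch concrete, here is a small standalone C sketch (not the
kernel code itself; the 4 GB machine size is illustrative, min() is expanded
inline, and the maxbcache clamp is omitted) that redoes the nbuf auto-tuning
arithmetic for a BKVASIZE of 16K versus one doubled to match the 32K block
size:

```c
#include <stdio.h>

/*
 * Redo the vfs_bio.c nbuf auto-tuning arithmetic for a given BKVASIZE.
 * physmem_est is in kilobytes, as in the kernel code.
 */
static long
tune_nbuf(long physmem_est, long bkvasize)
{
	long factor = 4 * bkvasize / 1024;
	long nbuf = 50;

	if (physmem_est > 4096)
		nbuf += (physmem_est - 4096) / factor < 65536 / factor ?
		    (physmem_est - 4096) / factor : 65536 / factor;
	if (physmem_est > 65536)
		nbuf += (physmem_est - 65536) * 2 / (factor * 5);
	return (nbuf);
}

int
main(void)
{
	long physmem_est = 4L * 1024 * 1024;	/* hypothetical 4 GB, in KB */
	long nbuf16 = tune_nbuf(physmem_est, 16384);
	long nbuf32 = tune_nbuf(physmem_est, 32768);

	printf("BKVASIZE 16K: nbuf = %ld, reserves %ld bytes of KVA\n",
	    nbuf16, nbuf16 * 16384);
	/* With 32K filesystem buffers, the same nbuf maps twice as much. */
	printf("  ... but %ld bytes if the buffers are actually 32K\n",
	    nbuf16 * 32768);
	printf("BKVASIZE 32K: nbuf = %ld, reserves %ld bytes of KVA\n",
	    nbuf32, nbuf32 * 32768);
	return (0);
}
```

Doubling BKVASIZE roughly halves nbuf, so the total KVA reservation stays
about the same, but the accounting then agrees with what 32K buffers really
consume.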
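For what it's worth, the quick kern.nbuf test shouldn't need a kernel
rebuild, since nbuf can be overridden as a loader tunable.  A sketch of the
procedure (the value shown is purely illustrative; read the auto-tuned value
on the affected box and halve that):

```shell
# Read the current auto-tuned value:
sysctl kern.nbuf

# Then set roughly half of it in /boot/loader.conf and reboot
# (illustrative value; substitute half of what sysctl reported):
kern.nbuf="13000"
```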