Date: Sun, 14 Aug 2005 09:06:37 GMT
From: Ade Lovett <ade@FreeBSD.org>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: kern/84903: Incorrect initialization of nswbuf
Message-ID: <200508140906.j7E96bPI018881@freefall.freebsd.org>
Resent-Message-ID: <200508140910.j7E9ACXi018926@freefall.freebsd.org>

>Number: 84903
>Category: kern
>Synopsis: Incorrect initialization of nswbuf
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sun Aug 14 09:10:11 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator: Ade Lovett
>Release: All FreeBSD > 5.0
>Organization: Supernews
>Environment:
Any FreeBSD system (RELENG_5, RELENG_6, and HEAD) after revision 1.132
of sys/vm/vnode_pager.c (4 years, 1 month ago)
>Description:
Whilst attempting to nail down some serious performance issues (compared
with 4.x) in preparation for a 6.x rollout here, we've come across
something of a fundamental bug.

In this particular environment (a Usenet transit server, so very high
network and disk I/O) we observed that processes were spending a
considerable amount of time in state 'wswbuf', traced back to getpbuf()
in vm/vm_pager.c.

To cut a long story short, the order in which nswbuf is being
initialized is completely, totally, and utterly wrong -- this was
introduced by revision 1.132 of vm/vnode_pager.c just over 4 years ago.

In vnode_pager.c we find:

    static void
    vnode_pager_init(void)
    {
            vnode_pbuf_freecnt = nswbuf / 2 + 1;
    }

Unfortunately, nswbuf hasn't been assigned to yet, just happens to be
zero (in all cases), and thus the kernel believes that there is only
ever *one* swap buffer available.

kern_vfs_bio_buffer_alloc() in kern/vfs_bio.c which actually does the
calculation and assignment, is called rather further on in the process,
by which time the damage has been done.

The net result is that *any* calls involving getpbuf() will be
unconditionally serialized, completely destroying any kind of
concurrency (and performance).

Given the memory footprint of our machines, we've hacked in a simple:

    nswbuf = 0x100;

into vnode_pager_init(), since the calculation ends up giving us the
maximum number anyway.

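To illustrate the ordering problem, here is a minimal userland sketch
(not kernel code). It reuses the arithmetic from the vnode_pager_init()
quoted above. size_swap_buffers() and simulate_boot() are hypothetical
names made up for this example, and the 0x100 value is simply the number
from our local hack, standing in for whatever kern_vfs_bio_buffer_alloc()
would really compute.

```c
/* Stand-ins for the kernel globals named in the report; both start at 0. */
static int nswbuf;
static int vnode_pbuf_freecnt;

/* Same arithmetic as the vnode_pager_init() quoted in the report. */
static void
vnode_pager_init(void)
{
	vnode_pbuf_freecnt = nswbuf / 2 + 1;
}

/*
 * Hypothetical stand-in for the sizing step that the report attributes
 * to kern_vfs_bio_buffer_alloc(). 0x100 is an assumed value taken from
 * the local hack, not the real calculation.
 */
static void
size_swap_buffers(void)
{
	nswbuf = 0x100;
}

/*
 * Run the two init steps in either order and return the resulting
 * free count. size_first == 0 mimics the broken boot order described
 * in the report; size_first != 0 mimics a corrected order.
 */
static int
simulate_boot(int size_first)
{
	nswbuf = 0;
	vnode_pbuf_freecnt = 0;

	if (size_first) {
		size_swap_buffers();
		vnode_pager_init();
	} else {
		vnode_pager_init();
		size_swap_buffers();
	}
	return (vnode_pbuf_freecnt);
}
```

With the broken order, simulate_boot(0) yields a free count of 1, which
is the "only ever one swap buffer" situation. With the sizing done
first, simulate_boot(1) yields 0x100 / 2 + 1 = 129.
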
There are a number of possible 'correct' fixes in terms of re-ordering
the startup sequence.

With the aforementioned hack, we're now seeing considerably better
machine operation, certainly as good as similar 4.10-STABLE boxes.

As per $SUBJECT, this affects all of RELENG_5, RELENG_6, and HEAD, and
should, IMO, be considered an absolutely required fix for 6.0-RELEASE.
>How-To-Repeat:
N/A
>Fix:
We have implemented a local hack as above, given that the memory
footprint of the machines would result in the maximal value of nswbuf
being assigned in any case. This is not a real fix however.

A solution has been offered by Alexander Kabaev <kabaev@gmail.com> as
follows, which appears to do the right thing, at least on RELENG_6/i386,
which is the only type of machine I have easy access to for testing
purposes.

In my opinion, it would be a fatal error to release 6.0 in any shape or
form without addressing this issue.

Index: vm_init.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/vm_init.c,v
retrieving revision 1.46
diff -u -r1.46 vm_init.c
--- vm_init.c	25 Apr 2005 19:22:05 -0000	1.46
+++ vm_init.c	9 Aug 2005 01:59:12 -0000
@@ -124,7 +124,7 @@
 	vm_map_startup();
 	kmem_init(virtual_avail, virtual_end);
 	pmap_init();
-	vm_pager_init();
+	/* vm_pager_init(); */
 }
 
 void
Index: vm_pager.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/vm_pager.c,v
retrieving revision 1.105
diff -u -r1.105 vm_pager.c
--- vm_pager.c	18 May 2005 20:45:33 -0000	1.105
+++ vm_pager.c	9 Aug 2005 01:59:55 -0000
@@ -202,6 +202,8 @@
 	struct buf *bp;
 	int i;
 
+	vm_pager_init();
+
 	mtx_init(&pbuf_mtx, "pbuf mutex", NULL, MTX_DEF);
 	bp = swbuf;
 	/*
>Release-Note:
>Audit-Trail:
>Unformatted: