From owner-freebsd-hackers Tue Dec 14 20:45:21 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id AD0271540F
	for ; Tue, 14 Dec 1999 20:45:18 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id UAA25923;
	Tue, 14 Dec 1999 20:45:13 -0800 (PST)
	(envelope-from dillon)
Date: Tue, 14 Dec 1999 20:45:13 -0800 (PST)
From: Matthew Dillon
Message-Id: <199912150445.UAA25923@apollo.backplane.com>
To: Ed Hall
Cc: freebsd-hackers@FreeBSD.ORG, edhall@screech.weirdnoise.com
Subject: Re: VM Scan Rate: Speed Kills on 3.3
References: <199912150159.RAA15697@screech.weirdnoise.com>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:Under certain circumstances the VM scan rate can spike into the millions/sec
:(as reported by vmstat) followed quickly by a system lockup (an endless
:loop in vm_pageout()), suggesting that the page queue has been tied in a
:loop.  This effect was observed in a program ported from Solaris that
:updated a large file by mmap()'ing small parts of it.  Although using
:read()/write() eliminates the problem (and with a sizable increase in
:performance as well), there may be other triggers for this bug.
:
:(A side comment: although using mmap() for file updates in FreeBSD
:applications seems to perform quite poorly when compared to read()/write(),
:this is not the case on some other systems, such as Solaris.  Also,
:there may be cases where the shared memory semantics of mmap() are
:important to an application such that conversion to read()/write() is
:not possible.)
:
:I've attached a small test program that provokes the same behavior as
:...

    This is a known problem which has not been fixed in 3.x.  The problem
    has been mostly fixed in 4.x.

    The problem is that in a low-memory situation the page daemon winds up
    being the only system process left that is capable of cleaning
    (flushing) dirty pages to disk.  However, the page daemon cannot flush
    pages associated with files whose vnodes are locked.  So the lockup
    occurs when some other process (even another system process) holds the
    vnode locked and the system runs out of memory.  The page daemon scans
    through tonnes of pages but can't flush any of them due to the locked
    vnode.

    There is no simple solution for 3.x.  It may be possible to place a
    workaround in vm_fault to block early on a low-memory condition, before
    memory becomes critical, but it would be a pretty nasty hack.

    The hack below (for 3.x) is not something I would commit to the tree
    because it is too drastic, but try it and see if it solves your
    problem.

					-Matt

Index: vm_fault.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/vm_fault.c,v
retrieving revision 1.93.2.4
diff -u -r1.93.2.4 vm_fault.c
--- vm_fault.c	1999/08/29 16:33:30	1.93.2.4
+++ vm_fault.c	1999/12/15 04:43:57
@@ -191,6 +191,10 @@
 RetryFault:;
 	fs.map = map;
 
+	while ((fault_type & VM_PROT_WRITE) && (cnt.v_free_count + cnt.v_cache_count) < cnt.v_free_min) {
+		VM_WAIT;
+	}
+
 	/*
 	 * Find the backing store object and offset into it to begin the
 	 * search.
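
For anyone who wants to see what the loop added by the patch actually tests, here is a minimal user-space sketch in C. It is not kernel code and will not reproduce the lockup. The struct, the SKETCH_* constants and both functions are hypothetical names made up for illustration; only the three counter field names and the comparison are taken from the diff. The progress made during each wait is simulated by adding a fixed number of free pages, which is an assumption standing in for whatever the page daemon manages to free while the faulting process sleeps in VM_WAIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's protection bits. */
#define SKETCH_PROT_READ  0x1
#define SKETCH_PROT_WRITE 0x2

/* Hypothetical mirror of the three cnt.* fields the patch reads. */
struct sketch_vmmeter {
	unsigned v_free_count;   /* pages on the free queue */
	unsigned v_cache_count;  /* clean pages that can be reused at once */
	unsigned v_free_min;     /* low-water mark */
};

/*
 * Same condition as the while loop in the patch: only write faults are
 * held back, and only while free + cache pages are below the minimum.
 */
static bool
sketch_should_wait(int fault_type, const struct sketch_vmmeter *cnt)
{
	assert(cnt != NULL);
	return (fault_type & SKETCH_PROT_WRITE) != 0 &&
	    (cnt->v_free_count + cnt->v_cache_count) < cnt->v_free_min;
}

/*
 * Models the loop itself.  Each iteration stands in for one VM_WAIT;
 * freed_per_wait (assumed > 0) simulates pages freed meanwhile.
 * Returns how many times the fault would have slept.
 */
static unsigned
sketch_wait_for_memory(int fault_type, struct sketch_vmmeter *cnt,
    unsigned freed_per_wait)
{
	unsigned waits = 0;

	assert(freed_per_wait > 0);
	while (sketch_should_wait(fault_type, cnt)) {
		cnt->v_free_count += freed_per_wait;
		waits++;
	}
	return waits;
}
```

The point the sketch makes is that read faults are never delayed, and that a write fault proceeds as soon as free plus cache pages reach v_free_min. Processes that dirty pages are slowed down before memory becomes critical, which is what makes the workaround drastic.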