Date: Tue, 12 Dec 2000 18:09:00 -0800 (PST)
From: Matt Dillon
Message-Id: <200012130209.eBD290M79194@earth.backplane.com>
To: freebsd-stable@FreeBSD.ORG
Cc: Kachun Lee
Subject: Re: Extreme high load with 12/7 4-releng

Well, I've found a bunch of problems. I'm not sure exactly which one
Kachun is hitting, but I think it's likely he's hitting one of them.

* I removed --page_shortage in the 'pageout daemon flushing dirty page'
  case. I shouldn't have. This can cause the pageout daemon to free up
  way too many clean pages.

* The pageout daemon skips pages for vnodes it can't lock. BIG mistake.
  This results in completely non-optimal paging operation. It turns out
  that if the pageout daemon is woken up from a vm_fault, which is quite
  common, it is highly likely that the vm_fault will be holding a vnode
  lock and be in the middle of an I/O when the pageout daemon runs,
  causing the pageout daemon to ignore the vnode the vm_fault is sitting
  on. If you have a lot of processes doing I/O, a lot of vnodes get
  ignored. The lock-skipping code was originally in to prevent the
  pageout daemon from deadlocking in a low-memory situation, and to
  prevent it from locking up on dead NFS nodes.
  However, with the low-memory deadlock fixes I recently committed, I
  think we may be able to safely lock the vnode in the pageout daemon
  now.

* The pageout daemon reorders pages it had to 'skip'. The main culprit
  is when it decides it can't lock the vnode. The reordering for this
  case only occurs for dirty pages, which results in fragmentation of
  the queue ordering. Additionally it gives dirty pages 'triple
  priority'... they get moved to the end of the inactive queue, and they
  also get moved to the end of the inactive queue when they are
  successfully cleaned. This causes originally dirty pages to stick
  around much too long.

I'm not certain why Kachun isn't having a problem with 4.1.1, because
most of these problems are at least a year old. But I can see how
recent low-memory handling changes might have exacerbated the existing
problems. The --page_shortage issue in particular really hoses the
inactive scan when maxlaunder is not sufficient to clean the dirty
pages.

To give you an idea of the difference in performance: running a program
on my test box to iterate through a huge (3x main memory) file via mmap,
alternately touching 8K and accessing 8K, resulted in long system stalls
and a piddly pageout rate of maybe 2 MB/sec to disk. When I made the
pageout daemon block on the vnode lock rather than skip the vnode, fixed
--page_shortage, got rid of the inactive queue reordering, and got rid
of the artificial maxlaunder limitation, the paging rate went up to
24 MB/sec and the system no longer stalled. No stalling whatsoever. The
system ran like clockwork despite having no free memory.

I am going to make patch sets available for -current and -stable later
tonight for testing. The changes are straightforward but serious, so it
could be up to two weeks before they get into -stable. I intend to
commit them to -current later this week, maybe Thursday.

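
To make the --page_shortage point concrete, here is a toy user-space
model of an inactive-queue scan. It is not the kernel code and not the
patch: only the names page_shortage and maxlaunder come from this
message, and everything else (scan_inactive, struct scan_result, the
count_laundered switch) is hypothetical. It only illustrates why a scan
that does not count a laundered dirty page against page_shortage ends up
freeing more clean pages than it needs to.

```c
/* Toy model only; hypothetical names, not FreeBSD kernel code. */

enum page_state { PAGE_CLEAN, PAGE_DIRTY };

struct scan_result {
    int clean_freed;   /* clean pages freed by the scan */
    int laundered;     /* dirty pages for which a flush was started */
};

/*
 * Walk a simulated inactive queue until page_shortage is satisfied.
 *
 * count_laundered != 0: starting a flush on a dirty page also does
 *                       --page_shortage.
 * count_laundered == 0: the flush is started but page_shortage is left
 *                       alone, so the scan keeps going and frees extra
 *                       clean pages.
 */
struct scan_result
scan_inactive(const enum page_state *queue, int npages,
              int page_shortage, int maxlaunder, int count_laundered)
{
    struct scan_result r = { 0, 0 };

    for (int i = 0; i < npages && page_shortage > 0; i++) {
        if (queue[i] == PAGE_CLEAN) {
            r.clean_freed++;
            --page_shortage;
        } else if (maxlaunder > 0) {
            r.laundered++;
            --maxlaunder;
            if (count_laundered)
                --page_shortage;
        }
        /* Dirty page with maxlaunder used up: skipped in this model. */
    }
    return r;
}
```

With a 32-page queue that alternates dirty and clean pages,
page_shortage = 8 and maxlaunder = 32, this model frees 4 clean pages
when laundered pages are counted and 8 when they are not.
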
Some serious review is needed to ensure that the vnode locking change in
the pagedaemon does not screw it up when you have things like dead NFS
nodes floating around.

The patches actually remove a whole lot of code... the result is smaller
than the original :-). That's always nice!

I found another serious issue related to the update daemon's
synchronization when combined with pages dirtied via mmap() (when mmap
is used normally, without MAP_NOSYNC). We really need the incremental
syncing feature, but unless someone else wants to do it, it may be a
while before I can get to it.

                                        -Matt
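
For anyone who wants to try something like the mmap test described in
this message, a minimal sketch follows. The original test program was
not posted, so the function name walk_file_mmap, the CHUNK constant, and
the reading of "touching" as a write and "accessing" as a read are all
assumptions. It maps a file MAP_SHARED (without MAP_NOSYNC) and walks it
in 8K steps, dirtying one chunk and only reading the next.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK 8192  /* 8K step, as in the description (assumed exact) */

/*
 * Map the file at path and walk it in CHUNK-sized steps, alternately
 * dirtying one chunk and only reading the next. One byte per page is
 * enough to fault the page in. Returns the number of chunks visited,
 * or -1 on error. If checksum is non-NULL it receives the sum of the
 * bytes that were read.
 */
long
walk_file_mmap(const char *path, unsigned long *checksum)
{
    struct stat st;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return -1;

    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0)
        pagesz = 4096;  /* fallback guess if sysconf fails */

    volatile unsigned char *p = base;  /* volatile: keep every access */
    unsigned long sum = 0;
    long chunks = 0;

    for (size_t off = 0; off < len; off += CHUNK, chunks++) {
        size_t end = (len - off < CHUNK) ? len : off + CHUNK;

        for (size_t i = off; i < end; i += (size_t)pagesz) {
            if (chunks % 2 == 0)
                p[i] = p[i];   /* write back same byte: dirties the page */
            else
                sum += p[i];   /* read only */
        }
    }

    munmap(base, len);
    if (checksum != NULL)
        *checksum = sum;
    return chunks;
}
```

To put pressure on the pageout daemon the file needs to be several
times the size of main memory, as in the test above; on a small file
this only shows the access pattern.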