From owner-freebsd-bugs Mon Jul 3 7:20: 9 2000
Delivered-To: freebsd-bugs@freebsd.org
Received: from freefall.freebsd.org (freefall.FreeBSD.ORG [204.216.27.21])
    by hub.freebsd.org (Postfix) with ESMTP id A505D37B8FB
    for ; Mon, 3 Jul 2000 07:20:01 -0700 (PDT)
    (envelope-from gnats@FreeBSD.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.9.3/8.9.2) id HAA91741;
    Mon, 3 Jul 2000 07:20:01 -0700 (PDT)
    (envelope-from gnats@FreeBSD.org)
Received: from plab.ku.dk (plab.ku.dk [130.225.105.65])
    by hub.freebsd.org (Postfix) with ESMTP id 30DB837B822
    for ; Mon, 3 Jul 2000 07:16:33 -0700 (PDT)
    (envelope-from tobez@plab.ku.dk)
Received: from lion.plab.ku.dk (lion.plab.ku.dk [130.225.105.49])
    by plab.ku.dk (8.9.3/8.9.3) with ESMTP id QAA44028
    for ; Mon, 3 Jul 2000 16:17:58 +0200 (CEST)
    (envelope-from tobez@plab.ku.dk)
Received: (from tobez@localhost)
    by lion.plab.ku.dk (8.9.3/8.9.3) id QAA93991;
    Mon, 3 Jul 2000 16:16:34 +0200 (CEST)
    (envelope-from tobez)
Message-Id: <200007031416.QAA93991@lion.plab.ku.dk>
Date: Mon, 3 Jul 2000 16:16:34 +0200 (CEST)
From: tobez@tobez.org
Reply-To: tobez@tobez.org
To: FreeBSD-gnats-submit@freebsd.org
X-Send-Pr-Version: 3.2
Subject: kern/19672: contigmalloc1() oddity for large alignments (race condition)
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>Number: 19672
>Category: kern
>Synopsis: contigmalloc1() oddity for large alignments (race condition)
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Mon Jul 03 07:20:01 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator: Anton Berezin
>Release: FreeBSD 5.0-CURRENT i386
>Organization: tobez.org
>Environment:
Most versions of FreeBSD, as far as I can tell.
File: src/sys/vm/vm_page.c
Function: contigmalloc1()
>Description:
If an object is requested with a large alignment, say, 1<<24, so that
contigmalloc1() is not even able to find a single PQ_FREE or PQ_CACHE page
with said alignment, it proceeds to free inactive pages, one by one, and
then immediately active pages as well, also one by one.

The problem is that after freeing a page (in most cases the routine pages
it out; I inserted some sysctl counters to debug this), it starts again by
rescanning the same queue (either PQ_INACTIVE or PQ_ACTIVE) from its head.
To me, it looks bad enough even for inactive pages, but for the active
queue it's a disaster, unless the box is idle.

In a nutshell, the following sequence gets executed when contigmalloc1()
tries to free the page:

    vm_pageout_flush(page)
    which calls vm_pager_put_pages(page)
    which calls swap_pager_putpages(page)
    which sleeps (swwrt).

When the box is not idle, other processes run while the caller is blocked
in the swwrt state, and their execution adds more pages to the inactive
queue (possibly) or to the active queue (practically always). Then
contigmalloc1() starts scanning a queue again!
>How-To-Repeat:
Run a program that issues the METEORSETGEO ioctl to the bktr driver with a
relatively large number of frames (in my tests I used 14 frames ==
14*768*576*4/4096 == 6049 pages). The bktr driver did not have sufficient
space preallocated.

For some reason, the bktr driver in its get_bktr_mem() function
(dev/bktr/bktr_os.c) first tries to do vm_page_alloc_contig() with an
alignment of 1<<24, and then, if this fails, proceeds with PAGE_SIZE.

[As a side note, I have no idea what the reason is for using such a large
alignment in the bktr driver. Apparently, this piece of code was copied as
is from the meteor driver.]

On a practically idle box the allocation fails after 4 to 8 seconds.
The number of jumps from the vm_pageout_flush() call point in the inactive
scan code to the PQ_INACTIVE rescan is about 110.
The number of jumps from the vm_pageout_flush() call point in the active
scan code to the PQ_INACTIVE rescan is about 4400.

On a busy box (nice -20 perl -e 'for(;;){}') this takes forever, or at
least I was not patient enough to wait for completion. The number of jumps
increases at a steady rate, and most of them come from the `active' piece.

In top(1), I observed things like this (please pay attention to the Ks and
Ms here):

Mem: 348K Active, 180K Inact, 21M Wired, 38M Cache, 9899K Buf, 64M Free
Swap: 525M Total, 21M Used, 504M Free, 3% Inuse, 1552K Out
>Fix:
The first obvious thing to do is to remove the 1<<24 alignment allocation
from the bktr (and meteor) code. This helps in my particular case.

However, I think that the internal workings of contigmalloc1() are
seriously broken for large alignments. My understanding is that the page
freeing code is somewhat of a last resort for the routine, and it probably
should not be used in this case. The assumption contigmalloc1() makes is
that if the very first loop was not able to find even the starting page,
then there is a severe memory shortage or something similar. Not
necessarily so.

To me, the code simply `does not look right'. And I have no idea what the
proper fix might look like.

Cheers,
Anton.
>Release-Note:
>Audit-Trail:
>Unformatted: