From: Matt Dillon
Date: Tue, 5 Dec 2000 23:13:08 -0800 (PST)
Message-Id: <200012060713.eB67D8I91529@earth.backplane.com>
To: News History File User
Cc: hackers@freebsd.org, usenet@tdk.net
Subject: Re: vm_pageout_scan badness

:To recap, the difference here is that by cheating, I was able to mlock
:one of the two files (the behaviour I was hoping to be able to achieve
:through first MAP_NOSYNC alone, then in combination with MADV_WILLNEED
:to keep all the pages in memory so much as possible) and achieve a much
:improved level of performance -- I'm able to catch up on backlogs from
:a full feed that had built up during the time I wasn't cheating -- by
:using memory for the history database files rather than for general
:filesystem caching.  I even have spare capacity!  Woo.
:
:The mlock man page refers to some system limit on wired pages; I get no
:error when mlock()'ing the hash file, and I'm reasonably sure I tweaked
:the INN source to treat both files identically (and on the other machines
:I have running, the timestamps of both files remains pretty much unchanged).
:I'm not sure why I'm not seeing the desired results here with both files
:(maybe some call hidden somewhere I haven't located yet), but I hope you
:can see the improvements so far.  I even let abusive readers pound on
:me.  Well, for a while 'til I got tired of 'em.

    I think you are on to something here.  It's got to be mlock().  Run
    'limit' from csh/tcsh and you will see a 'memorylocked' resource.
    Whatever this resource is set to when innd is run -- presumably however
    it is initialized for the 'news' user (see /etc/login.conf) -- is going
    to affect mlock() operation.

    mlock() will wire pages.  I think you can safely call it on your two
    smaller history files (history.hash, history.index).  I can definitely
    see how this could result in better performance.

:I still don't know for certain if the disk updates I am seeing are
:slow because they aren't sorted well, or if they're random pages and
:not a sequential set, given that I hope I've ruled out fragmentation
:of the database files.  I still maintain that in the case of a true
:MADV_RANDOM madvise'd file, any attempts to clean out `unused' pages
:are ill-advised, or if they're needed, anything other than freeing of
:sequential pages results in excess disk activity that gains nothing,
:if it's the case that this is not how it's done, due to the nature
:of random access.

    History files are notorious for random I/O... the problem is due
    to the hash table being, well, a hash table.  The hash table
    lookups are bad enough but this will also result in random-like
    lookups on the main history file.
    You get a little better locality of reference on the main history
    file (meaning the system can do a better job caching it optimally),
    but the hash tables are a lost cause so mlock()ing them could be a
    very good thing.

:Yeah, hacking the vm source to allow me to mlock() isn't kosher, but
:I wanted to test a theory.  Doing so probably requires a few more
:tweaks in the INN source to handle expiry, so it seems, so I'd rather
:the vm subsystem do this for me automagically with the right invocation
:of the suitable mmap/madvise operations, if this is reasonable.

    At the moment madvise() MADV_WILLNEED does nothing more than activate
    the pages in question and force them into the process's mmap.
    You have to call it every so often to keep the pages 'fresh'... calling
    it once isn't going to do anything.

    Once you call madvise() MADV_WILLNEED, the system has to go through
    a number of steps before the pages will be thrown away:

	- it has to remove them from the process pmap
	- it has to deactivate them
	- it has to cache them
	- then it can free them

    You may be able to achieve an effect very similar to mlock(), but
    runnable by the 'news' user without hacking the kernel, by
    writing a quick little C program to mmap() the two smaller history
    files and then madvise() the map using MADV_WILLNEED in a loop
    with a sleep(15).  Keeping in mind that expire may recreate those
    files, the program should unmap, close(), and re-open()/mmap/madvise the
    descriptors every so often (like once a minute).  You shouldn't have
    to access the underlying pages, but doing so would have a similar
    effect.  If you do, use a volatile pointer so GCC doesn't optimize
    the access out of the loop.  e.g.

	for (ptr = mapBase; ptr < mapEnd; ptr += pageSize) {
	    volatile char c = *ptr;
	}

    or

	for (ptr = mapBase; ptr < mapEnd; ptr += pageSize) {
	    dummyroutine(*ptr);
	}

    And my earlier suggestion above would look something like:

	for (;;) {
	    /* open descriptor */
	    /* mmap it */
	    for (i = 0; i < 15; ++i) {
		madvise(mapBase, mapSize, MADV_WILLNEED);
		sleep(15);
	    }
	    munmap(mapBase, mapSize);
	    /* close descriptor */
	}

						-Matt
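
    The loop sketched above could be filled in roughly as follows.  This
    is only a minimal sketch under stated assumptions, not a tested
    replacement for mlock(): the helper names keep_warm() and
    touch_pages() are hypothetical, the path passed in is whatever file
    you want kept resident, and the explicit page touch is optional (it
    is included only to show the volatile read).

```c
/* Sketch only: keep_warm() and touch_pages() are hypothetical helpers. */
#define _DEFAULT_SOURCE		/* exposes madvise() on glibc; harmless elsewhere */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Read one byte from each page through a volatile pointer so the
 * compiler cannot optimize the loads away.  Returns pages touched.
 */
static size_t
touch_pages(const void *base, size_t len, size_t page_size)
{
	const volatile char *p = base;
	size_t off, n = 0;

	assert(page_size > 0);
	for (off = 0; off < len; off += page_size) {
		char c = p[off];
		(void)c;
		n++;
	}
	return n;
}

/*
 * One open/mmap cycle: call madvise(MADV_WILLNEED) `rounds` times,
 * sleeping `interval` seconds between calls, then unmap and close so
 * that a file recreated by expire is picked up on the next cycle.
 * Returns pages touched in the last round, or -1 on error.
 */
static long
keep_warm(const char *path, int rounds, unsigned interval)
{
	struct stat st;
	void *base;
	size_t len, page_size = (size_t)sysconf(_SC_PAGESIZE);
	long pages = 0;
	int fd, i;

	if ((fd = open(path, O_RDONLY)) < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}
	len = (size_t)st.st_size;
	base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (base == MAP_FAILED) {
		close(fd);
		return -1;
	}
	for (i = 0; i < rounds; i++) {
		if (madvise(base, len, MADV_WILLNEED) < 0) {
			pages = -1;
			break;
		}
		pages = (long)touch_pages(base, len, page_size);
		if (i + 1 < rounds)
			sleep(interval);
	}
	munmap(base, len);
	close(fd);
	return pages;
}
```

    A small daemon run as the 'news' user would then call something
    like keep_warm(path, 4, 15) in an endless loop for each of the two
    smaller history files, which reopens each file about once a minute.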