From owner-freebsd-hackers Wed Dec 6 5:55:34 2000
From owner-freebsd-hackers@FreeBSD.ORG Wed Dec 6 05:55:29 2000
Return-Path:
Delivered-To: freebsd-hackers@freebsd.org
Received: from crotchety.newsbastards.org (netcop.newsbastards.org [193.162.153.124])
        by hub.freebsd.org (Postfix) with ESMTP id 14CEA37B400
        for ; Wed, 6 Dec 2000 05:55:25 -0800 (PST)
Received: (from news@localhost)
        by crotchety.newsbastards.org (8.11.1/8.11.1) id eB6DtGS34255;
        Wed, 6 Dec 2000 14:55:16 +0100 (CET)
        (envelope-from newslooser@free-pr0n.netscum.dk)
Date: Wed, 6 Dec 2000 14:55:16 +0100 (CET)
Message-Id: <200012061355.eB6DtGS34255@crotchety.newsbastards.org>
X-Authentication-Warning: crotchety.newsbastards.org: news set sender to newslooser@free-pr0n.netscum.dk using -f
Reply-To: freebsd-user@netscum.dk
To: Matt Dillon
In-Reply-To: <200012060713.eB67D8I91529@earth.backplane.com>
From: News History File User
Cc: hackers@freebsd.org, usenet@tdk.net
Subject: Re: vm_pageout_scan badness
References: <200012011918.eB1JIol53670@earth.backplane.com>
        <200012020525.eB25PPQ92768@newsmangler.inet.tele.dk>
        <200012021904.eB2J4An63970@earth.backplane.com>
        <200012030700.eB370XJ22476@newsmangler.inet.tele.dk>
        <200012040053.eB40rnm69425@earth.backplane.com>
        <200012050545.eB55jL453889@crotchety.newsbastards.org>
        <200012060519.eB65JS910042@crotchety.newsbastards.org>
        <200012060713.eB67D8I91529@earth.backplane.com>
Sender: newslooser@free-pr0n.netscum.dk
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> :The mlock man page refers to some system limit on wired pages; I get no
> :error when mlock()'ing the hash file, and I'm reasonably sure I tweaked
> :the INN source to treat both files identically (and on the other machines
> :I have running, the timestamps of both files remain pretty much unchanged).
> :I'm not sure why I'm not seeing the desired results here with both files
>
> I think you are on to something here. It's got to be mlock().
> Run 'limit' from csh/tcsh and you will see a 'memorylocked' resource.
> Whatever this resource is as of when innd is run -- presumably however
> it is initialized for the 'news' user (see /etc/login.conf) is going

Yep, `unlimited'... same as the bash `ulimit -a'. OH NO. I HAVE IT SET TO `infinity' IN LOGIN DOT CONF, no wonder it is all b0rken-like.

The weird thing is that mlock() does return success, and the amount of wired memory matches the size of the two files. I've seen nothing obvious in the source code to explain the difference, but I'll keep plugging away at it.

> History files are notorious for random I/O... the problem is due
> to the hash table being, well, a hash table. The hash table
> lookups are bad enough but this will also result in random-like
> lookups on the main history file. You get a little better
> locality of reference on the main history file (meaning the system

Ah, but... the recent history format (based on MD5 hashes), introduced as dbz v6 while you were busy with Diablo and its own history mechanism, differs from the one you remember. (AIEEEE, speaking of your 64-bit CRC history mechanism, whatever happened to the links that used to get you there from the backplane homepage?) With it, you don't do random-like lookups in the text file at all to verify that a message ID is present; everything happens in the data in the two hash tables, at least for transit. I'm not sure whether reader requests do require a hit on the main file; it would be worth pointing a Diablo frontend at such a box to see how it does there, even though the overview performance for traditional readership is, uh, suboptimal. I think they do, but that's a trivial seek to one specific, known offset. I'm sure this applies to other databases somehow, for those who aren't doing news and are bored stiff by this.

> At the moment madvise() MADV_WILLNEED does nothing more than activate
> the pages in question and force them into the process's mmap.
> You have to call it every so often to keep the pages 'fresh'... calling
> it once isn't going to do anything.

Well, it definitely does do a Good Thing when I call it once, as you can see from the initial timer numbers, which approach the long-running values I'm used to. (Before this, I tried to simulate those values by doing lookups on a small fraction of the history entries in the hope of activating a majority of the needed pages; that wasn't perfect, but it was a decent hack.) The timestamps in the debugging output below show that while it slows down startup somewhat (about 24 seconds in all, all of it between the NOSYNC and mlock steps), the data is read in quickly and the tradeoff is a definite win:

Dec 6 07:32:14 crotchety innd: dbz openhashtable /news/db/history.index
Dec 6 07:32:14 crotchety innd: dbz madvise WILLNEED ok
Dec 6 07:32:14 crotchety innd: dbz madvise RANDOM ok
Dec 6 07:32:14 crotchety innd: dbz madvise NOSYNC ok
Dec 6 07:32:27 crotchety innd: dbz mlock ok
Dec 6 07:32:27 crotchety innd: dbz openhashtable /news/db/history.hash
Dec 6 07:32:27 crotchety innd: dbz madvise WILLNEED ok
Dec 6 07:32:27 crotchety innd: dbz madvise RANDOM ok
Dec 6 07:32:27 crotchety innd: dbz madvise NOSYNC ok
Dec 6 07:32:38 crotchety innd: dbz mlock ok

This happens quickly when the data is still in the cache (the whole sequence below completes within one second), which leads me to believe that something else is affecting the .hash file. I added the madvise() MADV_NOSYNC call just in case the no-sync behavior somehow wasn't taking effect in the mmap():

Dec 6 09:29:34 crotchety innd: dbz openhashtable /news/db/history.index
Dec 6 09:29:34 crotchety innd: dbz madvise WILLNEED ok
Dec 6 09:29:34 crotchety innd: dbz madvise RANDOM ok
Dec 6 09:29:34 crotchety innd: dbz madvise NOSYNC ok
Dec 6 09:29:34 crotchety innd: dbz mlock ok
Dec 6 09:29:34 crotchety innd: dbz openhashtable /news/db/history.hash
Dec 6 09:29:34 crotchety innd: dbz madvise WILLNEED ok
Dec 6 09:29:34 crotchety innd: dbz madvise RANDOM ok
Dec 6 09:29:34 crotchety innd: dbz madvise NOSYNC ok
Dec 6 09:29:34 crotchety innd: dbz mlock ok

> You may be able to
> achieve an effect very similar to mlock(), but
> runnable by the 'news' user without hacking the kernel, by

Yeah, it sounds like a hack, but I did figure out what was going on earlier with my mlock() hack: INN and the reader daemon now use a shared library, so the nnrpd processes were also trying to mlock() the files. Hmmm. I can either compile INN statically (which is what I chose to do) or further butcher the source to keep nnrpd from making the mlock() call, since it makes the same mmap() and madvise() calls through the lib/dbz.c routines. Or I can lobby for basic mlock() functionality in userland, or try a hack like the one you outline:

> writing a quick little C program to mmap() the two smaller history
> files and then madvise() the map using MADV_WILLNEED in a loop
> with a sleep(15). Keeping in mind that expire may recreate those
> files, the program should unmap, close(), and re-open()/mmap/madvise the
> descriptors every so often (like once a minute). You shouldn't have

Yeah, and because the reader processes hold those inodes open for as long as each reader hangs around, even after expire has recreated the files, I can see that I'll have to be a bit careful about planning reserve disk space.

Thanks for the input. I can see that such a machine would be busy trying to manage the data for:

* more than 10GB per hour of incoming articles
* some amount of newly created or updated overview data
* readers pulling down the entire overview data for a few groups, repeatedly, due to poorly designed reader clients
* readers pulling down several gigs per hour of article data, concentrated on pr0n and similar newsgroups.

And on top of all that, I want to keep a couple hundred megs of history hash table data in memory for quick access. No wonder the transit machines had no difficulty doing this most of the time, while I had to cheat to make it happen on a box that also does the additional filesystem work to support readers...
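A minimal C sketch of the keeper program described above follows. It is an illustration under stated assumptions, not code from INN: the keeper_* functions, struct keeper_map, and the KEEPER_MAX cap of eight files are hypothetical names, and the MADV_RANDOM call is borrowed from the dbz startup log rather than from the suggestion itself. Each file is mapped read-only, MADV_WILLNEED is re-issued every `interval` seconds, and every `reopen` seconds the maps are torn down and rebuilt so that files recreated by expire get picked up. How closely this approximates mlock() depends on the VM system: per the description quoted above, FreeBSD's MADV_WILLNEED activates the pages, while other systems may treat it as little more than a read-ahead hint.

```c
#define _DEFAULT_SOURCE /* glibc: expose madvise() and MADV_*; no effect on FreeBSD */

#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define KEEPER_MAX 8 /* hypothetical cap on the number of files kept warm */

/* One mapped history file; addr == NULL means "not currently mapped". */
struct keeper_map {
    void  *addr;
    size_t len;
};

/* Map `path` read-only. The descriptor is closed right away: the mapping
 * keeps its own reference to the file until munmap(). Returns 0 or -1. */
int keeper_open(struct keeper_map *km, const char *path)
{
    struct stat st;
    void *p;
    int fd;

    km->addr = NULL;
    km->len = 0;
    if ((fd = open(path, O_RDONLY)) < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    /* Hash-table lookups are random, so advise against read-ahead. */
    (void)madvise(p, (size_t)st.st_size, MADV_RANDOM);
    km->addr = p;
    km->len = (size_t)st.st_size;
    return 0;
}

/* Unmap (if mapped) and reset to the "not mapped" state. */
void keeper_close(struct keeper_map *km)
{
    if (km->addr != NULL)
        (void)munmap(km->addr, km->len);
    km->addr = NULL;
    km->len = 0;
}

/* Re-issue MADV_WILLNEED so the pages look recently used again. */
int keeper_refresh(const struct keeper_map *km)
{
    return madvise(km->addr, km->len, MADV_WILLNEED);
}

/* Refresh every `interval` seconds; remap every `reopen` seconds so that
 * files recreated by expire are picked up. Never returns. */
void keeper_loop(const char *const paths[], size_t n,
                 unsigned interval, unsigned reopen)
{
    struct keeper_map maps[KEEPER_MAX] = {{NULL, 0}};
    time_t opened = 0;
    size_t i;

    if (n > KEEPER_MAX)
        n = KEEPER_MAX;
    for (;;) {
        time_t now = time(NULL);

        if (now - opened >= (time_t)reopen) {
            for (i = 0; i < n; i++) {
                keeper_close(&maps[i]);
                if (keeper_open(&maps[i], paths[i]) < 0)
                    fprintf(stderr, "keeper: cannot map %s\n", paths[i]);
            }
            opened = now;
        }
        for (i = 0; i < n; i++)
            if (maps[i].addr != NULL)
                (void)keeper_refresh(&maps[i]);
        (void)sleep(interval);
    }
}
```

A main() could call keeper_loop() with the two paths from the logs, /news/db/history.index and /news/db/history.hash, using an interval of 15 and a reopen of 60 to match the sleep(15) and once-a-minute remap suggested above. Closing the descriptor right after mmap() is a choice made for this sketch: the mapping keeps the file referenced on its own, so a remap cycle is just munmap() followed by open()/mmap()/madvise().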
Er, quick correction: make that more than 11GB per hour of incoming articles. I just looked at today's stats. Urk.

Be glad you got out of news when you did... Thanks for all the input!

barry bouwsma, bandwidth hog

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message