Date:      Wed, 6 Dec 2000 14:55:16 +0100 (CET)
From:      News History File User <newsuser@free-pr0n.netscum.dk>
To:        Matt Dillon <dillon@earth.backplane.com>
Cc:        hackers@freebsd.org, usenet@tdk.net
Subject:   Re: vm_pageout_scan badness
Message-ID:  <200012061355.eB6DtGS34255@crotchety.newsbastards.org>
In-Reply-To: <200012060713.eB67D8I91529@earth.backplane.com>
References:  <200012011918.eB1JIol53670@earth.backplane.com> <200012020525.eB25PPQ92768@newsmangler.inet.tele.dk> <200012021904.eB2J4An63970@earth.backplane.com> <200012030700.eB370XJ22476@newsmangler.inet.tele.dk> <200012040053.eB40rnm69425@earth.backplane.com> <200012050545.eB55jL453889@crotchety.newsbastards.org> <200012060519.eB65JS910042@crotchety.newsbastards.org> <200012060713.eB67D8I91529@earth.backplane.com>

> :The mlock man page refers to some system limit on wired pages; I get no
> :error when mlock()'ing the hash file, and I'm reasonably sure I tweaked
> :the INN source to treat both files identically (and on the other machines
> :I have running, the timestamps of both files remains pretty much unchanged).
> :I'm not sure why I'm not seeing the desired results here with both files
> 
>     I think you are on to something here.  It's got to be mlock().  Run
>     'limit' from csh/tcsh and you will see a 'memorylocked' resource.
>     Whatever this resource is as of when innd is run -- presumably however
>     it is initialized for the 'news' user (see /etc/login.conf) is going

Yep, `unlimited'...  same as the bash `ulimit -a'.  OH NO.  I HAVE IT
SET TO `infinity' IN LOGIN DOT CONF, no wonder it is all b0rken-like.

The weird thing is that mlock() does return success, and the amount of
wired memory matches the size of the two files; I've seen nothing obvious
in the source code as to why the two files behave differently, but I'll
keep plugging away at it.
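
In case anyone wants to play along at home, the sanity check I'm using
amounts to this -- just a sketch, not anything from the INN source --
dumping RLIMIT_MEMLOCK (the same resource csh calls `memorylocked' and
login.conf sets) right before trying to wire the mapping:

#include <sys/types.h>
#include <sys/resource.h>
#include <sys/mman.h>
#include <stdio.h>

/* Report the memorylocked limit, then try to wire an already-mmap()ed
 * region.  'base' and 'len' would come from the real open path. */
static int
try_mlock(void *base, size_t len)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0)
        fprintf(stderr, "memorylocked: cur=%lld max=%lld, want %lu\n",
            (long long)rl.rlim_cur, (long long)rl.rlim_max,
            (unsigned long)len);
    if (mlock(base, len) != 0) {
        /* ENOMEM: over the limit; EPERM: not the superuser */
        perror("mlock");
        return -1;
    }
    return 0;
}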


>     History files are notorious for random I/O... the problem is due
>     to the hash table being, well, a hash table.  The hash table 
>     lookups are bad enough but this will also result in random-like
>     lookups on the main history file.  You get a little better
>     locality of reference on the main history file (meaning the system

Ah, but this is where the recent history format (based on MD5 hashes),
introduced as dbz v6 back when you were busy with Diablo, differs from
the mechanism you remember -- AIEEEE, speaking of your 64-bit CRC history
mechanism, whatever happened to the links that would get you there from
the backplane homepage?  In this case you don't do the random-like
lookups to verify message-ID presence in the text file at all; everything
happens in the two hash tables, at least for transit.  I'm not sure
whether reader requests require a hit on the main file -- I think they
do, but that's a trivial seek to one specific known offset -- so it'd be
worth pointing a Diablo frontend at such a box to see how it fares, even
when the overview performance for traditional readership is, uh,
suboptimal.

I'm sure this is applicable to other databases somehow, for those who
aren't doing news and are bored stiff by this.
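
(For the bored-stiff crowd, in rough C -- the real dbz innards are
glossed over and every name and layout here is invented for
illustration -- the transit-path check amounts to hashing the ID,
probing the wired tables, and at most computing one known text-file
offset:)

#include <sys/types.h>
#include <stdint.h>

#define TABLE_SIZE  (1 << 24)   /* hypothetical number of slots */

/* Stand-in for the real MD5 fold -- the genuine article runs the whole
 * message-ID through MD5 and folds the digest down to a table key. */
static uint32_t
fold_msgid(const char *msgid)
{
    uint32_t h = 5381;
    while (*msgid)
        h = h * 33 + (unsigned char)*msgid++;
    return h ? h : 1;           /* never 0: 0 marks an empty slot */
}

static const uint32_t *hashtab;  /* mmap()ed history.hash (setup not shown) */
static const uint32_t *idxtab;   /* mmap()ed history.index, same slots */

/* Transit-path duplicate check: touches only the two wired tables. */
static int
history_seen(const char *msgid, off_t *textoff)
{
    uint32_t key = fold_msgid(msgid);
    size_t slot = key % TABLE_SIZE;

    while (hashtab[slot] != 0) {
        if (hashtab[slot] == key) {
            /* A reader wanting the full history line gets exactly one
             * known offset in the text file: a single trivial seek. */
            *textoff = (off_t)idxtab[slot];
            return 1;
        }
        slot = (slot + 1) % TABLE_SIZE;   /* linear probe */
    }
    return 0;
}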


>     At the moment madvise() MADV_WILLNEED does nothing more than activate
>     the pages in question and force them into the process's mmap.
>     You have to call it every so often to keep the pages 'fresh'... calling
>     it once isn't going to do anything.  

Well, it definitely does do a Good Thing when I call it just once: the
initial timer numbers already approach the long-running values I'm used
to.  (I had been trying to simulate that by doing lookups on a small
fraction of the history entries, hoping to activate a majority of the
needed pages -- not perfect, but a decent hack.)  You can see from the
timestamps of the debugging output below that while it slows startup
somewhat, the work of reading in the data happens quickly and is a
definite positive tradeoff:

Dec  6 07:32:14 crotchety innd: dbz openhashtable /news/db/history.index
Dec  6 07:32:14 crotchety innd: dbz madvise WILLNEED ok
Dec  6 07:32:14 crotchety innd: dbz madvise RANDOM ok
Dec  6 07:32:14 crotchety innd: dbz madvise NOSYNC ok
Dec  6 07:32:27 crotchety innd: dbz mlock ok
Dec  6 07:32:27 crotchety innd: dbz openhashtable /news/db/history.hash
Dec  6 07:32:27 crotchety innd: dbz madvise WILLNEED ok
Dec  6 07:32:27 crotchety innd: dbz madvise RANDOM ok
Dec  6 07:32:27 crotchety innd: dbz madvise NOSYNC ok
Dec  6 07:32:38 crotchety innd: dbz mlock ok

The same sequence completes almost instantly when the data is still in
cache, which leads me to believe it's something else affecting the .hash
file (I added the madvise() MADV_NOSYNC call just in case it somehow
wasn't being applied in the mmap() for some reason):

Dec  6 09:29:34 crotchety innd: dbz openhashtable /news/db/history.index
Dec  6 09:29:34 crotchety innd: dbz madvise WILLNEED ok
Dec  6 09:29:34 crotchety innd: dbz madvise RANDOM ok
Dec  6 09:29:34 crotchety innd: dbz madvise NOSYNC ok
Dec  6 09:29:34 crotchety innd: dbz mlock ok
Dec  6 09:29:34 crotchety innd: dbz openhashtable /news/db/history.hash
Dec  6 09:29:34 crotchety innd: dbz madvise WILLNEED ok
Dec  6 09:29:34 crotchety innd: dbz madvise RANDOM ok
Dec  6 09:29:34 crotchety innd: dbz madvise NOSYNC ok
Dec  6 09:29:34 crotchety innd: dbz mlock ok
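
For reference, the sequence behind those log lines is essentially the
following -- a sketch of what my hacked openhashtable path does, modulo
the real dbz bookkeeping and error reporting; note MADV_NOSYNC is
FreeBSD-specific, and mlock() wants root (or a hacked kernel):

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <syslog.h>
#include <unistd.h>

/* Map a hash file, pre-fault it, hint the access pattern, and wire it. */
static void *
open_and_wire(const char *path, size_t *lenp)
{
    struct stat sb;
    void *p;
    int fd;

    syslog(LOG_NOTICE, "dbz openhashtable %s", path);
    if ((fd = open(path, O_RDWR)) < 0)
        return NULL;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)sb.st_size, PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping survives the close */
    if (p == MAP_FAILED)
        return NULL;

    /* Fault everything in now instead of on the first lookups... */
    if (madvise(p, (size_t)sb.st_size, MADV_WILLNEED) == 0)
        syslog(LOG_NOTICE, "dbz madvise WILLNEED ok");
    /* ...admit the access pattern really is random... */
    if (madvise(p, (size_t)sb.st_size, MADV_RANDOM) == 0)
        syslog(LOG_NOTICE, "dbz madvise RANDOM ok");
    /* ...and (FreeBSD-specific) don't sync dirty pages gratuitously. */
    if (madvise(p, (size_t)sb.st_size, MADV_NOSYNC) == 0)
        syslog(LOG_NOTICE, "dbz madvise NOSYNC ok");
    /* Wire the pages so pageout leaves them alone. */
    if (mlock(p, (size_t)sb.st_size) == 0)
        syslog(LOG_NOTICE, "dbz mlock ok");

    *lenp = (size_t)sb.st_size;
    return p;
}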


>     You may be able to achieve an effect very similar to mlock(), but
>     runnable by the 'news' user without hacking the kernel, by 

Yeah, that sounds like a hack, but I did figure out what was going wrong
earlier with my mlock() hack: INN and the reader daemon now share a
dynamically linked library, so the nnrpd processes were all trying to
mlock() the files too.  Hmmm.  Either I can compile INN statically
(which is what I chose to do), or I can butcher the source further and
keep nnrpd from making the mlock() call -- it runs through the same
mmap() and madvise() calls in the lib/dbz.c routines.  Or I can lobby
for the basic functionality of mlock() in userland, or try such a hack
as you outline:

>     writing a quick little C program to mmap() the two smaller history
>     files and then madvise() the map using MADV_WILLNEED in a loop
>     with a sleep(15).  Keeping in mind that expire may recreate those
>     files, the program should unmap, close(), and re-open()/mmap/madvise the 
>     descriptors every so often (like once a minute).  You shouldn't have

Yeah, and because the reader processes hold the inodes open for as long
as a reader hangs around, I can see having to be a bit careful about
planning reserve space.
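
Something like this is what I take the suggestion to be -- a stand-alone
sketch only (the paths are the ones from my logs above; the timing and
the remap-on-expire behavior follow your outline):

/* Page-freshener along the lines suggested above: mmap the two small
 * history files, madvise(MADV_WILLNEED) every 15 seconds, and remap
 * once a minute in case expire has recreated the files underneath. */
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

static const char *files[] = { "/news/db/history.hash",
                               "/news/db/history.index" };

int
main(void)
{
    void  *map[2] = { NULL, NULL };
    size_t len[2] = { 0, 0 };
    int i, tick;

    for (;;) {
        /* (Re)open and (re)map both files. */
        for (i = 0; i < 2; i++) {
            struct stat sb;
            int fd = open(files[i], O_RDONLY);

            if (fd < 0)
                continue;
            if (fstat(fd, &sb) == 0) {
                map[i] = mmap(NULL, (size_t)sb.st_size, PROT_READ,
                              MAP_SHARED, fd, 0);
                if (map[i] == MAP_FAILED)
                    map[i] = NULL;
                else
                    len[i] = (size_t)sb.st_size;
            }
            close(fd);
        }

        /* Keep the pages fresh for a minute, then start over. */
        for (tick = 0; tick < 4; tick++) {
            for (i = 0; i < 2; i++)
                if (map[i] != NULL)
                    madvise(map[i], len[i], MADV_WILLNEED);
            sleep(15);
        }

        for (i = 0; i < 2; i++)
            if (map[i] != NULL) {
                munmap(map[i], len[i]);
                map[i] = NULL;
            }
    }
    /* NOTREACHED */
    return 0;
}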

Thanks for the input.  I can see that such a machine would be busy
trying to manage the data for
* more than 10GB per hour of incoming articles
* some amount of newly-created or updated overview data
* readers concentrating on pulling down the entire overview data,
  repeatedly, for a few groups, due to poorly-designed reader clients
* readers pulling down several gigs per hour, concentrated on pr0n
  and similar newsgroups, in article data.
And on top of all that, I want to keep a couple hundred megs of history
hash table data in memory for quick access.  No wonder the transit
machines had no difficulty doing this most of the time, while I had to
cheat to get the same behavior once the box was also doing the extra
filesystem work to support readers...

Er, quick correction -- make that more than 11GB per hour of incoming
articles.  I just looked at today's stats.  Urk.  Be glad you got out
of news when you did...

thanks for all the input!
barry bouwsma, bandwidth hog


