Date:      Fri, 10 Nov 1995 12:32:42 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        dyson@freefall.freebsd.org (John Dyson)
Cc:        rminnich@Sarnoff.COM, lm@slovax.engr.sgi.com, hackers@FreeBSD.ORG, waa@aurora.cis.upenn.edu, deraadt@theos.com, chuck@maria.wustl.edu
Subject:   Re: larry: you might want to add this to lmbench (but i'm not sure)
Message-ID:  <199511101932.MAA04151@phaeton.artisoft.com>
In-Reply-To: <199511101848.KAA06850@freefall.freebsd.org> from "John Dyson" at Nov 10, 95 10:48:38 am

Some issues brought to light by the otherwise inane failure case
optimization requirements of this benchmark...

> You have found one of FreeBSD's VM's dark secrets!!!  Not only does
> FreeBSD wait for the invalid page fault to occur -- it also creates and
> destroys the page table page that would have covered that address!!!
> That was a design decision to get rid of useless page table pages as
> quickly as possible.  Tell me, what is the best thing to do in this
> case?
> 
> My opinion is to make the common case quick -- depend on page faults
> to handle the exceptional condition.  The reason that page-table-pages
> are freed quickly is that it makes more memory available (and I have
> some benchmarks that do evil things on the original Mach VM system when
> you don't free the page table pages :-)).  A bit of restructuring could
> eliminate the creation/deletion of the page table page though. (note that
> page table pages are demand-zeroed -- lots of bzero time!!!)
> 
> Does this slow things down running real applications?  If it does, I'll
> fix it.  This DOES NOT reflect the actual pagefault time however (about
> 60-90usecs on a 486/66), because of the continual bzeroing of the
> page-table-page.  Let me know (anyone) what you think !!!

The issue of the extraneous create/delete is an interesting one; it
should probably be recoded as you suggested on general principles,
NOT in response to this "benchmark".

The issue of bzero has to do with table usage.  It may in fact be a
general hit.  That it would show up as a component of the time in
this benchmark is not sufficient to be a saving grace: the benchmark
is *still* bogus.

I'd like to have more formal documentation of the VM system
before launching into a full blown discussion of the issues involved
(such a document would help immensely in porting efforts as well).  But
lacking that, I will say that a bitmap of the initialized page table
entries might be sufficient to allow you to demand-zero them on a per
entry basis instead of bzeroing the whole thing at once.
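
As a rough sketch of what I mean by the bitmap (an illustration in C,
not FreeBSD code; the names lazy_ptpage, lazy_pte_read and
lazy_pte_write are hypothetical, and the 1024-entry size assumes an
i386-style 4K page of 4-byte entries), only the 128-byte bitmap is
zeroed at allocation and each entry is zeroed on first touch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NPTE          1024u  /* assumed: entries per page table page */
#define BITS_PER_WORD 32u

/* Hypothetical layout: one "initialized" bit per page table entry. */
struct lazy_ptpage {
    uint32_t init_map[NPTE / BITS_PER_WORD]; /* 128 bytes of bitmap */
    uint32_t pte[NPTE];                      /* NOT zeroed at allocation */
};

/* Allocate a page table page, zeroing only the bitmap. */
static struct lazy_ptpage *lazy_ptpage_alloc(void)
{
    struct lazy_ptpage *pt = malloc(sizeof *pt);
    if (pt == NULL)
        return NULL;
    memset(pt->init_map, 0, sizeof pt->init_map);
    return pt;
}

/* Zero entry idx if it has never been touched, then mark it initialized. */
static void lazy_pte_touch(struct lazy_ptpage *pt, unsigned idx)
{
    uint32_t bit = 1u << (idx % BITS_PER_WORD);
    uint32_t *word = &pt->init_map[idx / BITS_PER_WORD];

    if ((*word & bit) == 0) {
        pt->pte[idx] = 0;   /* demand-zero just this one entry */
        *word |= bit;
    }
}

/* Every software reader has to go through the bitmap first. */
static uint32_t lazy_pte_read(struct lazy_ptpage *pt, unsigned idx)
{
    assert(idx < NPTE);
    lazy_pte_touch(pt, idx);
    return pt->pte[idx];
}

/* A write makes the entry valid without zeroing it first. */
static void lazy_pte_write(struct lazy_ptpage *pt, unsigned idx, uint32_t val)
{
    assert(idx < NPTE);
    pt->init_map[idx / BITS_PER_WORD] |= 1u << (idx % BITS_PER_WORD);
    pt->pte[idx] = val;
}
```

Note that this only works if every consumer of the table goes through
lazy_pte_read(); anything that walks the raw entries directly (the
hardware, for instance) never looks at the bitmap and would see stale
contents.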

The question to be answered here is when is the hit taken, and is it
in a critical path, and is the table zero assumption made implicitly
by the access mechanism (hash, whatever).  In other words, does
anyone other than the intended user assume that the thing is initially
zeroed, and if so, what is the cost of hitting the bitmap first.

If the per-entry zero is in the critical path as well, then unless you
can preallocate, it'd probably be a worse hit to set up multiple
zeroings of small ranges than to zero all of it at once at the
time the thing is allocated.  I guess this would depend on how
full your average page table ends up being relative to the setup
costs times the number of entries in the table.
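
To put that tradeoff in concrete terms, here is a back-of-the-envelope
cost model in C.  It is an assumption-laden sketch, not a measurement:
the struct, the function names, and any numbers fed to it are
hypothetical, and the costs are in whatever unit you like (cycles,
usecs).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost parameters; real values would have to be measured. */
struct zero_costs {
    double bulk_zero;        /* zero the whole table once at allocation */
    double per_entry_setup;  /* fixed overhead per lazy zero (bitmap test/set) */
    double per_entry_zero;   /* zeroing a single entry */
};

/* Total cost of the lazy scheme when `used` entries end up being touched. */
static double lazy_total_cost(const struct zero_costs *c, unsigned used)
{
    return (double)used * (c->per_entry_setup + c->per_entry_zero);
}

/* Largest number of touched entries for which lazy zeroing costs no more
 * than one bulk zero of the whole table. */
static unsigned break_even_entries(const struct zero_costs *c)
{
    double per_entry = c->per_entry_setup + c->per_entry_zero;

    assert(per_entry > 0.0);
    return (unsigned)(c->bulk_zero / per_entry);
}

/* True if per-entry zeroing is strictly cheaper at this fill level. */
static bool lazy_is_cheaper(const struct zero_costs *c, unsigned used)
{
    return lazy_total_cost(c, used) < c->bulk_zero;
}
```

With made-up costs of 1024 for the bulk zero and 3 + 1 per lazily
zeroed entry, the break-even point is 256 entries: a table that ends
up a quarter full or more is better off being zeroed all at once.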

It may be better to consider the tables in terms of zones, then
pre-reserve entries outside the critical path.  Or even aggregate
them so that the process can be mapped into the kernel space when
active (this presents its own problems) and do *all* of the "checking"
by way of faults (would require a 486 or better in all cases, so you'd
have to retrofit the 386 anyway).

I think this is probably overboard, since Linux uses non-aggregated
mapping of the process into the kernel address space.  If that's the
target to beat, aggregation will probably be more expensive than it's
worth.  I would strongly advise *against* adopting the Linux process
and kernel mapping model just to score big on what is after all a
bogus measurement of system capability.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.



