Date:      Tue, 25 Mar 2003 09:25:53 +0100
From:      "Poul-Henning Kamp" <phk@phk.freebsd.dk>
To:        David Schultz <das@FreeBSD.ORG>
Cc:        Garance A Drosihn <drosih@rpi.edu>, Dan Nelson <dnelson@allantgroup.com>, Wes Peters <wes@softweyr.com>, freebsd-arch@FreeBSD.ORG
Subject:   Re: Patch to protect process from pageout killing 
Message-ID:  <14382.1048580753@critter.freebsd.dk>
In-Reply-To: Your message of "Mon, 24 Mar 2003 23:53:42 PST." <20030325075342.GA5450@HAL9000.homeunix.com> 


>But I'm trying to impress on people that SIGDANGER is
>orthogonal to what Wes is trying to do, before the whole thing
>gets bogged down in discussions again and nothing ever happens.
>Here's an example of what I mean in verbose pseudocode with
>fudged constants:

If we are going to do this, we should do it right.

Doing it right means that we should also be sharing enough information
with userland, so that userland can adapt.

Take a simple example:  It makes sense for a program like fsck to
use all the RAM it can get hold of as cache, but it does not make
sense for the cache to be paged out.

As I see it, there is a need for several mechanisms:

1. A mechanism to export to userland enough information about the
   current RAM availability, so that phkmalloc and application-specific
   code can make intelligent choices before things go bad.

2. A mechanism to alert userland to the fact that things _have_ gone
   bad.

3. A mechanism to influence the "Who do we kill ?" decision once
   things have gone from bad to worse.

To tackle them from behind:

Wes has a proposal for #3 which is a per-process flag which says
"I'm sacred".  I think that is a sound principle since that is
usually exactly what people want:  Do Not Kill This Process.

Certain processes already enjoy special protection, most notably
pid 1; this would just be a way to make the same protection
available to other processes.  I'm not happy about using the
resource-limit code for booleans, and I don't think the flag
should be inherited, but otherwise I'm for the idea.
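To make #3 concrete, here is a rough sketch of how a "sacred" flag could enter the "who do we kill?" choice. Everything here is hypothetical: `struct fake_proc`, the `sacred` field, `pick_victim()` and `fake_fork()` are invented for illustration, and "kill the largest resident process" is an assumed heuristic, not a description of the real FreeBSD pageout code. `fake_fork()` shows the flag deliberately not being inherited, as suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified process record; NOT FreeBSD's struct proc. */
struct fake_proc {
    int    pid;
    size_t resident_pages;  /* pages this process currently holds in RAM */
    int    sacred;          /* hypothetical per-process "do not kill" flag */
};

/*
 * Choose a process to kill once swap is exhausted: the one with the
 * most resident pages, skipping pid 1 and anything flagged sacred.
 * Returns an index into procs, or -1 if every candidate is protected.
 */
int pick_victim(const struct fake_proc *procs, size_t n)
{
    int    victim = -1;
    size_t biggest = 0;

    assert(procs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (procs[i].pid == 1 || procs[i].sacred)
            continue;                   /* protected: never a candidate */
        if (victim == -1 || procs[i].resident_pages > biggest) {
            victim  = (int)i;
            biggest = procs[i].resident_pages;
        }
    }
    return victim;
}

/* Sketch of fork(): the child copies the parent but does NOT inherit "sacred". */
void fake_fork(const struct fake_proc *parent, struct fake_proc *child,
               int child_pid)
{
    *child = *parent;
    child->pid = child_pid;
    child->sacred = 0;
}
```

With processes {pid 1, pid 42 (sacred, largest), pid 77, pid 99}, `pick_victim()` passes over init and the sacred pid 42 and returns the index of pid 77.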

We have the SIGDANGER proposal for #2, but I think we need to have
two severities:  "Out of RAM" and "Out of VM".  A program like
fsck would start to recycle cached sectors once we're out of RAM.
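A minimal sketch of a program reacting to the two proposed severities. Neither signal exists, so `SIGUSR1` and `SIGUSR2` stand in for hypothetical "Out of RAM" and "Out of VM" signals; `install_danger_handlers()`, `cache_may_grow()` and the level names are likewise invented here. The handler only records the worst severity seen, and the fsck-like cache checks it before growing.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction() under strict compilers */
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical severities; ordered so a larger value is worse. */
enum mem_level { MEM_OK = 0, MEM_OUT_OF_RAM = 1, MEM_OUT_OF_VM = 2 };

#define SIG_OUT_OF_RAM SIGUSR1   /* stand-in for a hypothetical "Out of RAM" signal */
#define SIG_OUT_OF_VM  SIGUSR2   /* stand-in for a hypothetical "Out of VM" signal */

static volatile sig_atomic_t current_level = MEM_OK;

/* Async-signal-safe: only records the worst severity seen so far. */
static void on_danger(int sig)
{
    sig_atomic_t lvl = (sig == SIG_OUT_OF_VM) ? MEM_OUT_OF_VM : MEM_OUT_OF_RAM;
    if (lvl > current_level)
        current_level = lvl;
}

/* Install the handler for both stand-in signals; returns 0 on success. */
int install_danger_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_danger;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIG_OUT_OF_RAM, &sa, NULL) == -1)
        return -1;
    return sigaction(SIG_OUT_OF_VM, &sa, NULL);
}

/* fsck-style policy: grow the cache only while nothing has gone bad. */
int cache_may_grow(void)
{
    return current_level == MEM_OK;
}
```

Once `current_level` is non-zero, the main loop would recycle cached sectors instead of allocating new ones.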

But I have not seen anybody come up with a good proposal for
#1, and that is where the main benefit would be derived:  It would
allow processes to be good citizens and adjust to the present
situation.

Traditionally, userland code is totally oblivious to the overall
system circumstances; the most notable exception is sendmail, which
has for ages monitored the load average and backed off accordingly.
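For comparison, the sendmail-style back-off amounts to reading the load average and comparing it to thresholds, loosely in the spirit of sendmail's QueueLA and RefuseLA options. This is a sketch, not sendmail's code: `classify_load()`, `current_load()` and the action names are hypothetical, while `getloadavg(3)` is a real libc call on FreeBSD and glibc.

```c
#define _DEFAULT_SOURCE   /* exposes getloadavg() in glibc; harmless on BSD */
#include <assert.h>
#include <stdlib.h>

enum la_action { LA_ACCEPT, LA_QUEUE_ONLY, LA_REFUSE };

/* Decide how to behave given a load average and two thresholds. */
enum la_action classify_load(double la, double queue_la, double refuse_la)
{
    assert(queue_la <= refuse_la);
    if (la >= refuse_la)
        return LA_REFUSE;       /* too busy: turn work away */
    if (la >= queue_la)
        return LA_QUEUE_ONLY;   /* busy: accept but defer processing */
    return LA_ACCEPT;
}

/* Current 1-minute load average, or -1.0 if it cannot be read. */
double current_load(void)
{
    double la[1];
    return getloadavg(la, 1) == 1 ? la[0] : -1.0;
}
```

A daemon would call `classify_load(current_load(), 8.0, 12.0)` before each unit of work; the thresholds shown are arbitrary.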

I think all daemons, and even some non-daemon programs, can benefit
from being aware of more of the system's situation:

	phkmalloc would automatically shed the cache and go into
	"hinting" mode if there were any paging activity.

	Daemons like named can shed caches.

	Long running daemons could even go through a garbage collect
	to reduce their memory footprint (using realloc() to reduce
	fragmentation).

	Bgfsck can shed all cache and take a nap.

	Sort can use smaller buckets.
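Several of the reactions listed above boil down to releasing part of an application cache on demand. A minimal sketch, assuming a simple linked-list cache with the newest entries at the head; `struct cache`, `cache_add()` and `cache_shed()` are hypothetical, and what triggers the shed (a signal, a shared page, a timer) is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical application cache entry; newest entries sit at the head. */
struct cache_entry {
    struct cache_entry *next;
    size_t              size;   /* size of the cached payload (not shown) */
};

struct cache {
    struct cache_entry *head;
    size_t              count;
};

/* Add an entry at the head; returns 0 on success, -1 if malloc fails. */
int cache_add(struct cache *c, size_t size)
{
    struct cache_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->size = size;
    e->next = c->head;
    c->head = e;
    c->count++;
    return 0;
}

/* Keep only the `keep` newest entries, free the rest; returns number freed. */
size_t cache_shed(struct cache *c, size_t keep)
{
    struct cache_entry **link = &c->head;
    size_t freed = 0;

    assert(c != NULL);
    /* Step past the entries we keep. */
    for (size_t i = 0; i < keep && *link != NULL; i++)
        link = &(*link)->next;

    /* Unlink and free everything older than that. */
    while (*link != NULL) {
        struct cache_entry *victim = *link;
        *link = victim->next;
        free(victim);
        freed++;
    }
    c->count -= freed;
    return freed;
}
```

`cache_shed(&c, 0)` is the "shed all cache" case mentioned for bgfsck.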

The signals in #2 could be used as a cheap substitute for this, but
we would need to add complementary "All Clear" signals to get
processes out of "contingency mode".
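The need for "All Clear" is easiest to see as a small state machine: without an event that leads back to normal, a process that has entered contingency mode has no way to leave it. The sketch below is hypothetical throughout (event, mode and function names are invented), and the choice that "Out of RAM" does not downgrade an ongoing shed is an assumption.

```c
#include <assert.h>

/* Hypothetical notifications, including the proposed "All Clear". */
enum mem_event { EV_OUT_OF_RAM, EV_OUT_OF_VM, EV_ALL_CLEAR };
enum mem_mode  { MODE_NORMAL, MODE_RECYCLE, MODE_SHED };

/*
 * Out of RAM -> stop growing caches and recycle (unless already shedding).
 * Out of VM  -> shed caches.
 * All Clear  -> the only way back to normal behaviour.
 */
enum mem_mode next_mode(enum mem_mode cur, enum mem_event ev)
{
    switch (ev) {
    case EV_OUT_OF_RAM:
        return cur == MODE_SHED ? MODE_SHED : MODE_RECYCLE;
    case EV_OUT_OF_VM:
        return MODE_SHED;
    case EV_ALL_CLEAR:
        return MODE_NORMAL;
    }
    assert(0 && "unknown memory event");
    return cur;
}
```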

I have often wondered about making a single page of "kernel info"
which would be mapped read-only into all processes (my main agenda
is really evil timekeeping), but it would also be the perfect place
for information like:

	"N free pages in system"
	"N pages of swap used"
	"N pages paged out during the last 1/15/60 seconds"
	"N pages paged in during the last 1/15/60 seconds"
	...
With cheap access to that information, processes could tailor
their behaviour much more easily.
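To illustrate what "cheap access" could look like from userland, here is a sketch of a possible page layout and two consumers of it. No such page exists: `struct kinfo_page`, its fields, the window indices and both functions are hypothetical, with fields mirroring the list above. How the kernel would map the page is not shown; the code just assumes a pointer to it. The "any paging in the last second" test and the "a quarter of free RAM" bucket rule are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of a read-only "kernel info" page. */
struct kinfo_page {
    uint64_t free_pages;        /* N free pages in system */
    uint64_t swap_pages_used;   /* N pages of swap used */
    uint64_t pageouts[3];       /* pages paged out during last 1/15/60 s */
    uint64_t pageins[3];        /* pages paged in during last 1/15/60 s */
};

enum { WIN_1S, WIN_15S, WIN_60S };   /* indices into the rate arrays */

/* phkmalloc-style rule: shed the cache on any paging in the last second. */
int should_shed_cache(const volatile struct kinfo_page *kp)
{
    return kp->pageouts[WIN_1S] != 0 || kp->pageins[WIN_1S] != 0;
}

/* sort-style rule: bucket of about a quarter of free RAM, clamped to limits. */
uint64_t sort_bucket_pages(const volatile struct kinfo_page *kp,
                           uint64_t min_pages, uint64_t max_pages)
{
    uint64_t want = kp->free_pages / 4;

    if (want < min_pages)
        want = min_pages;
    if (want > max_pages)
        want = max_pages;
    return want;
}
```

Because the page is only read, each check costs a few memory loads rather than a system call, which is what would make polling it in a hot path affordable.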


-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
