From owner-freebsd-arch  Tue Nov  7 17:08:37 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133])
	by hub.freebsd.org (Postfix) with ESMTP id 4DEBB37B479
	for ; Tue, 7 Nov 2000 17:08:33 -0800 (PST)
Received: (from daemon@localhost)
	by smtp03.primenet.com (8.9.3/8.9.3) id SAA17607;
	Tue, 7 Nov 2000 18:06:35 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
	via SMTP by smtp03.primenet.com, id smtpdAAAm_aySD;
	Tue Nov  7 18:01:23 2000
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id SAA01102;
	Tue, 7 Nov 2000 18:02:56 -0700 (MST)
From: Terry Lambert
Message-Id: <200011080102.SAA01102@usr08.primenet.com>
Subject: Re: softdep panic due to blocked malloc (with traceback)
To: julian@elischer.org (Julian Elischer)
Date: Wed, 8 Nov 2000 01:02:56 +0000 (GMT)
Cc: rjesup@wgate.com (Randell Jesup), gibbs@scsiguy.com (Justin T. Gibbs),
	dillon@earth.backplane.com (Matt Dillon),
	phk@critter.freebsd.dk (Poul-Henning Kamp),
	bde@zeta.org.au (Bruce Evans),
	mckusick@mckusick.com (Kirk McKusick), arch@FreeBSD.ORG
In-Reply-To: <3A087257.DBA40791@elischer.org> from "Julian Elischer"
	at Nov 07, 2000 01:21:27 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > I think both Matt's changes and what Poul-Henning suggests can be
> > useful.  (Actually, it sounds like Matt's are required, and
> > Poul-Henning's might be nice if and when someone does them.)
>
> I think that they are talking at cross purposes..
>
> Matt is right that nothing that magically comes up with a few
> hundred KB of RAM can be guaranteed to stop a deadlock, because
> after the few hundred KB have been used up, if the big memory
> hog keeps eating memory, you are right back where you started,
> and you no longer have a few hundred KB up your sleeve.
Dijkstra's Banker's Algorithm solves this by stalling the big memory
hog, if it can't have pages stolen from it (i.e. have its working set
size reduced).  No CPU cycles granted to the hog equals no new
allocations by the hog.

I think that the problem being examined is the one that Poul and Matt
and Alfred discussed a little while ago, where you have a hog whose
pages are marked as anonymous, yet are dirty.  This is really an evil
thing to do, since the pages are not permitted to be swapped, nor are
they permitted to be cleaned.  I think the right thing to do would be
to treat the flag as a hint, not a granted right, and start swapping
the pages anyway.

A more ideal solution would not let you get into trouble in the first
place, by refusing to dirty buffers faster than they can be written --
in other words, if the system is overloaded, it slows down: degrading
gracefully.

> On the other hand, PHK is correct in that it would be a useful
> facility to have and that it might buy some breathing space.
> To be useful, however, I think it would need to be combined with
> some other measures to ensure that we don't get straight back
> into debt.  For example, triggering that queue might change the
> strategies in the kernel so that the biggest memory users are
> forced to start losing pages (e.g. it's swapped out), or some
> similar work..

This will, without a doubt, be required, in order to ensure that an
artificial scarcity isn't created.  Consider the case where you have
per-CPU resource pools, or memory tied up in other places, which could
reasonably be recovered.  Windows NT/2000 and Windows 95/98 both have
the capability for the VM system to demand that resources be returned
to it under low memory conditions.

This may not help unless there is a reserve which is kept separately
for I/O buffers vs. other buffers, so the unified VM and buffer cache
works against easily resolving this (this may be why Sun punted in
Solaris 2.8, and de-unified their cache again).
As to another comment in this thread: SIGDANGER was intended to make
processes free up resources, not the system, so I don't think the AIX
approach will work here.

Arguably, if one is going to do what Yahoo has been doing in order to
work around other problems, the memory that gets allocated to this
purpose should probably be (1) wired down, and (2) size-restricted to,
say, 75% of available memory, or some other hard limit.  Effectively,
memory used in such a way as to cause this problem should probably not
be overcommitted.

There's also the historical problem with page reclamation after the
disassociation of an in-core inode from the vnode off which clean
buffers are hung.  This is, indeed, wasted memory, and those buffers
should probably be released sooner, when the inode/vnode relationship
is severed, rather than later, when the reclaimer gets around to it.

I'd actually be really interested in the statistics with regard to
"reclaimable clean vs. dirty memory" for a system once it has hit the
"extreme low memory conditions" leading to deadlock.  I don't think
you could say that "low memory conditions always result from X, and
never from Y"; different usage patterns would mean different root
causes -- and thus different "optimal recovery strategies".

Anyway, real statistics gathered at or near the failure point for
these systems would be truly useful; if all the memory is fragmented,
then kicking subsystems to make them release what they can won't help
(I personally doubt this will be the case).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message