From owner-freebsd-hackers  Fri Jan 22 16:15:53 1999
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199901230015.RAA13495@usr09.primenet.com>
Subject: Re: Error in vm_fault change
To: dyson@iquest.net
Date: Sat, 23 Jan 1999 00:15:13 +0000 (GMT)
Cc: tlambert@primenet.com, dillon@apollo.backplane.com, hackers@FreeBSD.ORG
In-Reply-To: <199901222353.SAA36870@y.dyson.net> from "John S. Dyson" at Jan 22, 99 06:53:18 pm

> Actually, the RSS code has been in the kernel for about 3yrs now, and
> is well understood.  If it was being written from scratch, I would be
> more likely to agree with you.  The kernel RSS limiting code works mostly
> for private data in the process.

Nevertheless, an RSS based implementation is still going to fail to
address the major issue for a public server platform.  Matt's
objections are grounded in the idea that one or two users can stage
a DOS attack against other users of the system, and, in general, if
you allow semi-literate people to run code from the net, this *will*
happen, even without evil intent.


> > What I suggest is that vnodes with more than a certain number of
> > pages associated with them be forced to steal pages from their
> > own usage, instead of obtaining them from the system page pool.
>
> Vnodes aren't the only structure that contains data -- maybe you
> mean vm objects also.  In fact, vnode or "shared" data isn't usually
> the problem with memory usage.  However a vnode quota is probably
> a good idea also.

I think that the general case of the quota is on the vm_object_t, but the
enforcement of the quota is in the page case.

One of the obvious reasons for wanting a VM object alias is to allow
direct read-only or COW mapping of pages already in core as part
of an MFS object backed by core pages and/or swap.

I don't argue the (de)merits (IMO) of trying to solve what I think is
the wrong problem.  But I will say that the enforcement of the quota
on all objects is highly problematic, and that there is already a very
nice chokepoint presenting itself for our (ab)use, and that is the vnode
pager.

I think it would be very hard to implement general VM object quota
enforcement, and I think that it's the wrong thing to do in any case
(except maybe as a means of global policy enforcement).

I also think that it moves away from the idea of implementing some
form of per process working set quota, and it disallows special cases
for "well behaved code that nevertheless needs a lot of pages, just
as if it were really badly behaved code".


> > In general, when we talk about badly behaved processes, we are
> > talking about processes with large working sets that are directly
> > mapped to vnode backing objects.
>
> Not necessarily, think the new versions of GNU C++ :-).

I'd argue that creating a lot of dirty data pages that are being used
is not a bad behaviour; maybe it's bad compiler architecture, but
that's another issue.

The specific bad behaviour that I'd like to enforce against is file I/O based,
and has to do with intentional thrashing by a process, either because
it's out of control (maybe the idiot OS doesn't propagate SIGHUP to
groups, like all other OS's or something), or because it's badly
designed in such a way that it thrashes a file.

Basically, this would mean that if the page at the end of the LRU
that's going to be forced out is dirty, so be it.  It will increase
swapping for that process, but reduce swapping overall by 2 times
as much, in that it won't force pages that another process is about
to use out of core.
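
As a rough illustration of the policy described above, here is a small
user-space simulation.  It is a sketch only, not the UnixWare or
FreeBSD code: the names (struct pool, pool_touch, run_workload) and the
constants POOL_PAGES and QUOTA are made up for the example, and dirty
page write-back is not modelled.  Object 0 is a well-behaved process
cycling over 3 pages; object 1 scans a large file and never reuses a
page.  With the quota off, replacement is plain global LRU; with it on,
an object at its quota steals its own least recently used page.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_PAGES 8   /* hypothetical size of the system page pool */
#define QUOTA      4   /* hypothetical per-object resident page limit */
#define MAX_OBJS   2

struct page {
    int owner;                 /* owning object, or -1 if the page is free */
    long offset;               /* page offset within the owning object */
    unsigned long last_use;    /* logical time of the last reference */
};

struct pool {
    struct page pages[POOL_PAGES];
    unsigned long clock;
    int resident[MAX_OBJS];          /* resident page count per object */
    unsigned long faults[MAX_OBJS];  /* fault count per object */
    int use_quota;                   /* 0: global LRU, 1: per-object quota */
};

void pool_init(struct pool *p, int use_quota)
{
    int i;

    p->clock = 0;
    p->use_quota = use_quota;
    for (i = 0; i < POOL_PAGES; i++) {
        p->pages[i].owner = -1;
        p->pages[i].offset = 0;
        p->pages[i].last_use = 0;
    }
    for (i = 0; i < MAX_OBJS; i++) {
        p->resident[i] = 0;
        p->faults[i] = 0;
    }
}

/*
 * owner >= 0: return that object's own LRU page.
 * owner <  0: return a free page if there is one, else the global LRU page.
 */
static struct page *pick_victim(struct pool *p, int owner)
{
    struct page *victim = NULL;
    int i;

    for (i = 0; i < POOL_PAGES; i++) {
        struct page *pg = &p->pages[i];

        if (owner >= 0 && pg->owner != owner)
            continue;
        if (pg->owner == -1)
            return pg;
        if (victim == NULL || pg->last_use < victim->last_use)
            victim = pg;
    }
    return victim;
}

/* Reference one page of an object, faulting it in if it is not resident. */
void pool_touch(struct pool *p, int obj, long offset)
{
    struct page *pg;
    int i;

    p->clock++;
    for (i = 0; i < POOL_PAGES; i++) {
        pg = &p->pages[i];
        if (pg->owner == obj && pg->offset == offset) {
            pg->last_use = p->clock;   /* hit */
            return;
        }
    }
    p->faults[obj]++;
    if (p->use_quota && p->resident[obj] >= QUOTA)
        pg = pick_victim(p, obj);      /* over quota: steal from self */
    else
        pg = pick_victim(p, -1);       /* take from the system pool */
    assert(pg != NULL);
    if (pg->owner >= 0)
        p->resident[pg->owner]--;
    pg->owner = obj;
    pg->offset = offset;
    pg->last_use = p->clock;
    p->resident[obj]++;
}

/* Returns the number of faults taken by the well-behaved object 0. */
unsigned long run_workload(int use_quota)
{
    struct pool p;
    int round, k;

    pool_init(&p, use_quota);
    for (round = 0; round < 50; round++) {
        for (k = 0; k < 3; k++)
            pool_touch(&p, 0, k);                  /* small working set */
        for (k = 0; k < 10; k++)
            pool_touch(&p, 1, round * 10L + k);    /* scan, no reuse */
    }
    return p.faults[0];
}
```

In this toy workload, run_workload(0) has object 0 refaulting its whole
working set every round, while run_workload(1) leaves it with only its
three initial faults.  The scanning object faults on every reference
either way; the quota just makes it pay with its own pages.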

Think of paging out as positive caching and avoiding it as negative
caching.  Negative caching is 2 times better than positive caching,
for most applications where the hash space doesn't bloat out of
control.

In the vm_object_t case, the hash space is (relatively) fixed, so it's
a major win.


> > This solution was tried, and worked very well, in a UnixWare 2.0
> > kernel
>
> No UnixWare kernel VM ever worked very well, did it?

Actually, Steve Baumel, the architect of the SVR4.2 (UnixWare 2.x)
SMP aware VM system did one hell of a job.  He addressed most of the
real issues, including the ability to autogrow thread stacks, very
early on in the game.

It's very much a shame that other groups within USL failed to utilize
his code.  And that their participation on standards committees was
such that we ended up with standards where the stack is passed to
the thread at creation time, instead of the creation interface being
responsible for the stack creation.

Basically, don't blame Steve's design for the bad implementation of
various parts of SVR4.2.

It's a matter of source tree organization that prevented the working
set quota going into the SVR4.2 source tree.  Just as the FreeBSD source
tree is rather badly organized for separating architectural pieces, the
USL source tree was nearly impossible to get cross-subsystem changes
integrated into.  Each compilable development source tree was actually
a combination of three separate source repositories (one for the kernel
pieces from third parties, one for the system independent code, and one
for the architecture specific code), and control of the repositories was
decentralized, as was control of the interfaces.

Organizationally, I'd say SVR4.2 was damn lucky to even have a
mechanism like user defined scheduling classes that could be abused to
address the problem at all.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
