From owner-freebsd-hackers  Fri Jan 22 17:13:52 1999
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id RAA15547
          for freebsd-hackers-outgoing; Fri, 22 Jan 1999 17:13:52 -0800 (PST)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from iquest3.iquest.net (iquest3.iquest.net [209.43.20.203])
          by hub.freebsd.org (8.8.8/8.8.8) with SMTP id RAA15537
          for <hackers@FreeBSD.ORG>; Fri, 22 Jan 1999 17:13:49 -0800 (PST)
          (envelope-from toor@y.dyson.net)
Received: (qmail 13648 invoked from network); 23 Jan 1999 01:10:46 -0000
Received: from dyson.iquest.net (HELO y.dyson.net) (198.70.144.127)
  by iquest3.iquest.net with SMTP; 23 Jan 1999 01:10:46 -0000
Received: (from root@localhost)
	by y.dyson.net (8.9.1/8.9.1) id UAA37690;
	Fri, 22 Jan 1999 20:10:45 -0500 (EST)
Message-Id: <199901230110.UAA37690@y.dyson.net>
Subject: Re: Error in vm_fault change
In-Reply-To: <199901230015.RAA13495@usr09.primenet.com> from Terry Lambert at "Jan 23, 99 00:15:13 am"
To: tlambert@primenet.com (Terry Lambert)
Date: Fri, 22 Jan 1999 20:10:45 -0500 (EST)
Cc: dyson@iquest.net, tlambert@primenet.com, dillon@apollo.backplane.com,
        hackers@FreeBSD.ORG
From: "John S. Dyson" <dyson@iquest.net>
Reply-To: dyson@iquest.net
X-Mailer: ELM [version 2.4ME+ PL38 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Terry Lambert said:
> > Actually, the RSS code has been in the kernel for about 3yrs now, and
> > is well understood.  If it was being written from scratch, I would be
> > more likely to agree with you.  The kernel RSS limiting code works mostly
> > for private data in the process.
> 
> Nevertheless, an RSS based implementation is still going to fail to
> address the major issue for a public server platform.  Matt's
> objections are grounded in the idea that one or two users can stage
> a DOS attack against other users of the system, and, in general, if
> you allow semi-literate people to run code from the net, this *will*
> happen, even without evil intent.
>
The RSS limiting and management that I have proposed is as effective as
possible (and, in fact, could be extended to support shared objects).

> 
> I also think that it moves away from the idea of implementing some
> form of per process working set quota, and it disallows special cases
> for "well behaved code that nevertheless needs a lot of pages, just
> as if it were really badly behaved code".
>
The RSS limiting does (fully) support that, if slightly extended.  (Note that
the FreeBSD RSS limiting also moves pages to lower priority queues as needed.)
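
As a rough illustration of "limit the RSS and move the excess pages to a
lower priority queue", here is a minimal userland sketch in C.  It is a toy
model, not FreeBSD kernel code: the names toy_page, toy_proc, toy_rss and
toy_enforce_rss are hypothetical, and the two-queue layout and the "demote the
first resident pages found" policy are assumptions made only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; these are NOT the real FreeBSD VM structures. */
enum page_queue { Q_ACTIVE, Q_INACTIVE };

struct toy_page {
    enum page_queue queue;  /* paging queue the page currently sits on */
    int resident;           /* 1 if mapped into the process, else 0 */
};

struct toy_proc {
    struct toy_page *pages; /* pages backing the process's private data */
    size_t npages;
    size_t rss_limit;       /* maximum number of resident pages allowed */
};

/* Count the pages that are currently resident in the process. */
static size_t toy_rss(const struct toy_proc *p)
{
    size_t n = 0;
    for (size_t i = 0; i < p->npages; i++)
        if (p->pages[i].resident)
            n++;
    return n;
}

/*
 * Bring the process back under its limit.  Each excess page is unmapped
 * from the process and demoted to the lower priority (inactive) queue
 * rather than being freed outright.  Returns the number of pages demoted.
 */
static size_t toy_enforce_rss(struct toy_proc *p)
{
    size_t rss = toy_rss(p);
    size_t demoted = 0;

    for (size_t i = 0; i < p->npages && rss > p->rss_limit; i++) {
        if (p->pages[i].resident) {
            p->pages[i].resident = 0;
            p->pages[i].queue = Q_INACTIVE;
            rss--;
            demoted++;
        }
    }
    assert(rss <= p->rss_limit);
    return demoted;
}
```

In this model a demoted page is still in memory and could be cheaply
reactivated on the next fault; it is simply first in line for reuse when
memory gets tight.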

> >
> > Not necessarily, think the new versions of GNU C++ :-).
> 
> I'd argue that creating a lot of dirty data pages that are being used
> is not a bad behaviour; maybe it's bad compiler architecture, but
> that's another issue.
>
And that has to be dealt with.  Just because certain tools are hogs doesn't
mean that FreeBSD should explain the problem away as "the other program
is the bad guy."

> 
> The specific bad behaviour that I'd like to enforce is file I/O based,
> and has to do with intentional thrashing by a process, either because
> it's out of control (maybe the idiot OS doesn't propagate SIGHUP to
> groups, like all other OS's or something), or because it's badly
> designed in such a way that it thrashes a file.
>
Normal file I/O already works well.  However, per-vnode limiting (for mmap)
is indeed very easy (almost trivial) to implement.  That is, of course,
true if the limit is greater than the buffer cache size; it becomes only
slightly less trivial if the limit is less than the buffer cache size (due to
the required buffer cache management).  If you want, you can almost freely
yank pages from a process (as the pageout daemon does), and the system will
properly handle that case.  The FreeBSD VM code is very robust when abused,
which is one reason it is critical to understand that the VM code can fix
mistakes for you.  Making changes to the code without complete understanding
might provide some of the desired result, but might also eventually expose a
problem.  (The FreeBSD code does try to close loops and compensate for
unforeseen conditions.)
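
To make the per-vnode idea concrete, here is a small userland sketch in C
of a cap on the number of resident pages per file object.  It is only a toy
under stated assumptions: toy_vnode, toy_vnode_init, toy_vnode_is_resident and
toy_vnode_fault are hypothetical names, the fixed-size ring is an arbitrary
simplification, and the oldest-first (FIFO) eviction is my assumption rather
than anything the FreeBSD code is known to do.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RESIDENT 64   /* ring capacity; arbitrary for the sketch */

/* Hypothetical stand-in for a vnode-backed object; not a real structure. */
struct toy_vnode {
    long resident[TOY_MAX_RESIDENT]; /* resident page indexes, oldest at head */
    size_t head;                     /* position of the oldest entry */
    size_t count;                    /* number of resident pages */
    size_t limit;                    /* per-vnode cap on resident pages */
};

static void toy_vnode_init(struct toy_vnode *vn, size_t limit)
{
    assert(limit >= 1 && limit <= TOY_MAX_RESIDENT);
    vn->head = 0;
    vn->count = 0;
    vn->limit = limit;
}

/* Return 1 if the page at pindex is resident, 0 otherwise. */
static int toy_vnode_is_resident(const struct toy_vnode *vn, long pindex)
{
    for (size_t i = 0; i < vn->count; i++)
        if (vn->resident[(vn->head + i) % TOY_MAX_RESIDENT] == pindex)
            return 1;
    return 0;
}

/*
 * Fault a page in.  If the vnode is already at its cap, the oldest
 * resident page is yanked first.  Returns the evicted page index, or -1
 * if nothing had to be evicted.
 */
static long toy_vnode_fault(struct toy_vnode *vn, long pindex)
{
    long evicted = -1;

    if (toy_vnode_is_resident(vn, pindex))
        return -1;                      /* already resident, nothing to do */
    if (vn->count == vn->limit) {
        evicted = vn->resident[vn->head];
        vn->head = (vn->head + 1) % TOY_MAX_RESIDENT;
        vn->count--;
    }
    vn->resident[(vn->head + vn->count) % TOY_MAX_RESIDENT] = pindex;
    vn->count++;
    return evicted;
}
```

The point of the sketch is only that the cap is enforced at fault time; a
page that was yanked is faulted back in later like any other missing page.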

IMO, the code should be relentless in the pursuit of handling difficult loads.

> 
> > > This solution was tried, and worked very well, in a UnixWare 2.0
> > > kernel
> >
> > No UnixWare kernel VM ever worked very well, did it?
> 
> Actually, Steve Baumel, the architect of the SVR4.2 (UnixWare 2.x)
> SMP aware VM system did one hell of a job.  He addressed most of the
> real issues, including the ability to autogrow thread stacks, very
> early on in the game.
>
Autogrowing thread stacks is very easy in FreeBSD, but it is an issue
of specification and requirement.  Moving thread stacks would be more
problematic.

> 
> It's a matter of source tree organization that prevented the working
> set quota going into the SVR4.2 source tree.  Just as the FreeBSD source
> tree is rather badly organized for separating architectural pieces, the
> USL source tree was nearly impossible to get cross-subsystem changes
> integrated into.  Each compilable development source tree was actually
> a combination of three separate source repositories (one for the kernel
> pieces from third parties, one for the system independent code, and one
> for the architecture specific code), and control of the repositories was
> decentralized, as was control of the interfaces.
> 
Well, a first pass at RSS limiting on FreeBSD will take 200 lines of code
(max), and an optimized second pass might be another 100-200 lines (it is
likely that the code will actually be less than half that size).  To me, it
is more an issue of not understanding what is there (due to a lack of patience
and not asking the authors for specific answers).  The FreeBSD code is
extremely flexible, but is mostly broken with respect to SMP (IMO) or
non-standard UNIX requirements (like true realtime goals).

From what I have seen, single solutions aren't often the best answer.  Fully
handling the issue of memory contention will definitely include per process
RSS limiting, but other forms of limiting might be easier and make more sense
in addition to the per-process limiting.  The per-vnode limiting will definitely
help the per process limiting.  In fact, rather than per-vnode limiting, perhaps
shared object limiting in general should be implemented, with proper consideration
for fairness.  (Fairness needs to be carefully defined, since it starts implying
"policy.")  Policy decisions often trade off performance in situations that aren't
considered.

So much for all of this rambling -- IMO, for normal U**X timesharing purposes,
the most important issue is to limit the effect of an ill-behaved process
on the rest of the system.  In that case, the supplied RSS limiting is
sufficient.  The hard part is deciding what to do with the pages that are
"limited."  My supplied RSS limiting not only limits "RSS", but also
moves pages to lower priority queues.
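
For completeness, here is how a process (or a wrapper that launches an
ill-behaved one) might ask for a resident set cap from userland.  This is a
sketch that assumes the limit is expressed through the standard
getrlimit()/setrlimit() interface with RLIMIT_RSS; limit_rss_soft is a
hypothetical helper name, and whether and how the kernel enforces the limit
depends on the system.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Lower the soft RLIMIT_RSS of the calling process to `bytes`, clamped to
 * the current hard limit.  Returns 0 on success and -1 on error (errno is
 * set by getrlimit() or setrlimit()).
 */
static int limit_rss_soft(rlim_t bytes)
{
    struct rlimit rl;

    assert(bytes > 0);                  /* a zero cap is not useful here */
    if (getrlimit(RLIMIT_RSS, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;            /* the soft limit may not exceed hard */
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_RSS, &rl);
}
```

Only the soft limit is changed, so the process can raise it again up to the
hard limit; the limit is inherited by children across fork() and exec.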

-- 
John                  | Never try to teach a pig to sing,
dyson@iquest.net      | it makes one look stupid
jdyson@nc.com         | and it irritates the pig.
