From owner-freebsd-arch  Sun Dec  3 11:55:53 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 11:55:51 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
	by hub.freebsd.org (Postfix) with ESMTP id 7B9D037B400
	for <arch@FreeBSD.ORG>; Sun,  3 Dec 2000 11:55:51 -0800 (PST)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.9.3/8.9.3) id MAA26115;
	Sun, 3 Dec 2000 12:43:44 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp02.primenet.com, id smtpdAAA7NaG.Y; Sun Dec  3 12:43:37 2000
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id MAA29745;
	Sun, 3 Dec 2000 12:48:05 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200012031948.MAA29745@usr05.primenet.com>
Subject: Re: Modifying FILE to add lock
To: arch@FreeBSD.ORG
Date: Sun, 3 Dec 2000 19:48:05 +0000 (GMT)
Cc: marcel@cup.hp.com
In-Reply-To: <200012011811.eB1IBqY01763@vashon.polstra.com> from "John Polstra" at Dec 01, 2000 10:11:52 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: tlambert@usr05.primenet.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> The #1 biggest hassle with the Modula-3 stuff is that it has
> Modula-3 versions of all of the system structures, and they have to
> match exactly for things to work.  Some day I swear I'm going to
> work out a way to generate the M3 versions automatically from the
> header files in /usr/include ...

It's reasonable to think about a description language from
which header files for C/C++, Modula, Ada, Perl, and other
languages could be generated by post-processing.

Perl already has a kludge for generating Perl constructs from
C/C++ constructs, so if you wanted to kludge it instead, that
would be a reasonable starting point...
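
As a rough illustration of the description-language idea, here is a minimal sketch in C: one table describes a structure, and two emitters render it as a C declaration and as a Modula-3 RECORD. The names `field_desc`, `emit_c_struct`, and `emit_m3_record`, and the `long` to `INTEGER` mapping, are assumptions made for this example only; a real generator would need a much richer type model.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One field of a described structure (hypothetical description format). */
struct field_desc {
	const char *name;     /* field name, shared by all targets */
	const char *c_type;   /* type spelling for the C header */
	const char *m3_type;  /* type spelling for Modula-3 (illustrative) */
};

/* Example description; the type mapping is an assumption, not a real ABI. */
static const struct field_desc timeval_fields[] = {
	{ "tv_sec",  "long", "INTEGER" },
	{ "tv_usec", "long", "INTEGER" },
};

/* Append one formatted line (two %s arguments) to a NUL-terminated buffer. */
static void
emit(char *out, size_t cap, const char *fmt, const char *a, const char *b)
{
	size_t used = strlen(out);

	if (used < cap)
		snprintf(out + used, cap - used, fmt, a, b);
}

/* Render the description as a C struct declaration. */
void
emit_c_struct(char *out, size_t cap, const char *name,
    const struct field_desc *f, size_t n)
{
	size_t i;

	assert(cap > 0);
	out[0] = '\0';
	emit(out, cap, "struct %s {%s\n", name, "");
	for (i = 0; i < n; i++)
		emit(out, cap, "\t%s %s;\n", f[i].c_type, f[i].name);
	emit(out, cap, "};%s%s\n", "", "");
}

/* Render the same description as a Modula-3 RECORD type. */
void
emit_m3_record(char *out, size_t cap, const char *name,
    const struct field_desc *f, size_t n)
{
	size_t i;

	assert(cap > 0);
	out[0] = '\0';
	emit(out, cap, "TYPE struct_%s = RECORD%s\n", name, "");
	for (i = 0; i < n; i++)
		emit(out, cap, "  %s: %s;\n", f[i].name, f[i].m3_type);
	emit(out, cap, "END;%s%s\n", "", "");
}
```

Because both outputs come from the same table, a field added or reordered in the description shows up in every generated header at once, which is the property the hand-maintained copies lack.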


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Sun Dec  3 12: 6:25 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 12:06:23 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP
	id C2BAA37B400; Sun,  3 Dec 2000 12:06:22 -0800 (PST)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id NAA07008;
	Sun, 3 Dec 2000 13:02:59 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp05.primenet.com, id smtpdAAA5oaWQn; Sun Dec  3 13:02:58 2000
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id NAA00585;
	Sun, 3 Dec 2000 13:06:02 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200012032006.NAA00585@usr05.primenet.com>
Subject: Re: zero copy code review
To: dg@root.com
Date: Sun, 3 Dec 2000 20:06:01 +0000 (GMT)
Cc: gallatin@cs.duke.edu (Andrew Gallatin),
	bmilekic@technokratis.com (Bosko Milekic),
	ken@kdm.org (Kenneth D. Merry), arch@FreeBSD.ORG, alfred@FreeBSD.ORG
In-Reply-To: <200012012326.PAA14154@implode.root.com> from "David Greenman" at Dec 01, 2000 03:26:19 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: tlambert@usr05.primenet.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > > 	In your code, you do deal with the possibility of the MGETHDR
> > >   returning NULL (you check for it) and you set ENOBUFS in that case and
> > >   jump to the "errorpath" label. But, before using MGETHDR, you allocate an
> > >   sf_buf (in sf) and it just so happens that the code beyond "errorpath"
> > >   does not take care of freeing the sf_buf you allocated before even
> > >   trying to allocate the mbuf.
> >
> >I see your point.  This was copied, (bug for bug ;-), from sendfile itself.
> >Look at line 1700 or so of kern/uipc_syscalls.c.  This bug should
> >probably be fixed there too.
> 
>    Oops. The original assumption (and code that I wrote) was that M_WAIT
> _cannot_ return a NULL pointer. This was changed in FreeBSD recently, and
> as you mentioned, the code added in rev 1.65 that now checks for it in
> sendfile doesn't do complete cleanup in this case. It definitely should
> be fixed so that the sf_buf is freed as well.
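
The leak described in the quoted text, and the shape of the fix, can be sketched in user space. This is a simulation under stated assumptions, not the code in kern/uipc_syscalls.c: `fake_sf_buf`, `fake_mbuf`, `fake_mgethdr`, and `send_one_page` are hypothetical stand-ins, and the success path releases both objects immediately where the kernel would hand the sf_buf to the mbuf.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel objects under discussion. */
struct fake_sf_buf { int unused; };
struct fake_mbuf   { int unused; };

/* Number of simulated sf_bufs allocated but not yet released. */
int sf_bufs_outstanding;

static struct fake_sf_buf *
fake_sf_buf_alloc(void)
{
	struct fake_sf_buf *sf = malloc(sizeof(*sf));

	if (sf != NULL)
		sf_bufs_outstanding++;
	return (sf);
}

static void
fake_sf_buf_free(struct fake_sf_buf *sf)
{
	sf_bufs_outstanding--;
	free(sf);
}

/* Simulated header-mbuf allocation; returns NULL when a shortage is forced. */
static struct fake_mbuf *
fake_mgethdr(int simulate_shortage)
{
	if (simulate_shortage)
		return (NULL);
	return (malloc(sizeof(struct fake_mbuf)));
}

/* Returns 0 on success or ENOBUFS on failure; releases the sf_buf either way. */
int
send_one_page(int simulate_shortage)
{
	struct fake_sf_buf *sf;
	struct fake_mbuf *m;
	int error = 0;

	sf = fake_sf_buf_alloc();
	if (sf == NULL)
		return (ENOBUFS);

	m = fake_mgethdr(simulate_shortage);
	if (m == NULL) {
		error = ENOBUFS;
		goto errorpath;   /* the sf_buf is still held at this point */
	}

	/* ... the page would be queued on the socket here ... */
	free(m);

errorpath:
	/* The cleanup the discussion says is missing: release the sf_buf. */
	fake_sf_buf_free(sf);
	return (error);
}
```

The point of the sketch is only the ordering: any resource acquired before the allocation that can fail has to be released on the path taken when it does fail.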


There's a real easy fix for this:


	struct mbuf *
	m_get_not_broken(int flag, int type)
	{
		struct mbuf *m;

		/* Retry until an mbuf is obtained, but only for M_WAIT callers. */
		do {
			m = m_get(flag, type);
		} while (flag == M_WAIT && m == NULL);

		return (m);
	}

I think the idea that the M_WAIT flag should be broken so that
it can be safely used in interrupt mode is dumb.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-arch  Sun Dec  3 12:12:42 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 12:12:41 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from feral.com (feral.com [192.67.166.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 039FB37B400; Sun,  3 Dec 2000 12:12:41 -0800 (PST)
Received: from zeppo.feral.com (IDENT:mjacob@zeppo [192.67.166.71])
	by feral.com (8.9.3/8.9.3) with ESMTP id MAA13403;
	Sun, 3 Dec 2000 12:12:35 -0800
Date: Sun, 3 Dec 2000 12:12:30 -0800 (PST)
From: Matthew Jacob <mjacob@feral.com>
Reply-To: mjacob@feral.com
To: Terry Lambert <tlambert@primenet.com>
Cc: dg@root.com, Andrew Gallatin <gallatin@cs.duke.edu>,
	Bosko Milekic <bmilekic@technokratis.com>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG
Subject: Re: zero copy code review
In-Reply-To: <200012032006.NAA00585@usr05.primenet.com>
Message-ID: <Pine.LNX.4.21.0012031212210.12502-100000@zeppo.feral.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


> I think the idea that the M_WAIT flag should be broken so that
> it can be safely used in interrupt mode is dumb.

d'accord.




From owner-freebsd-arch  Sun Dec  3 13: 2:26 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 13:02:24 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id D527537B400
	for <arch@FreeBSD.ORG>; Sun,  3 Dec 2000 13:02:23 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id PAA28978;
	Sun, 3 Dec 2000 15:52:02 -0500 (EST)
Date: Sun, 3 Dec 2000 15:52:02 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Terry Lambert <tlambert@primenet.com>
Cc: arch@FreeBSD.ORG, marcel@cup.hp.com
Subject: Re: Modifying FILE to add lock
In-Reply-To: <200012031948.MAA29745@usr05.primenet.com>
Message-ID: <Pine.SUN.3.91.1001203153153.26800A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, 3 Dec 2000, Terry Lambert wrote:
> > The #1 biggest hassle with the Modula-3 stuff is that it has
> > Modula-3 versions of all of the system structures, and they have to
> > match exactly for things to work.  Some day I swear I'm going to
> > work out a way to generate the M3 versions automatically from the
> > header files in /usr/include ...
> 
> It's reasonable to think about a description language from
> which header files for C/C++, Modula, Ada, Perl, and other
> languages could be generated by post-processing.
> 
> Perl already has a kludge for generating Perl constructs from
> C/C++ constructs, so if you wanted to kludge it instead, that
> would be a reasonable starting point...

Having done the Ada port, I can say that the only system structures
that cause problems are those that can't be/aren't created by
system calls/library routines.  Those are the _only_ things that
_should_ cause problems; if there are others, then the implementation
(of the affected language/application) is flawed.

The signal set changes caused a big impact because they (signal sets)
aren't created by library routines, and they are parameters in some
very common routines/syscalls as well as being part of struct sigaction,
jmp_buf, and ucontext_t (which are also interfaced to by multi-threaded 
languages).  I'd also imagine that struct timezone or timeval changes
would have a similar impact.

But back to FILE and DIR changes, I seriously doubt that any of our
language ports would be affected by these being changed.

-- 
Dan Eischen




From owner-freebsd-arch  Sun Dec  3 13:19:52 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 13:19:50 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.226])
	by hub.freebsd.org (Postfix) with ESMTP id 4983837B400
	for <arch@FreeBSD.ORG>; Sun,  3 Dec 2000 13:19:50 -0800 (PST)
Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30])
	by palrel3.hp.com (Postfix) with ESMTP
	id 91CDE44F; Sun,  3 Dec 2000 13:19:49 -0800 (PST)
Received: from cup.hp.com (p1000180.nsr.hp.com [15.109.0.180])
	by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id NAA29898;
	Sun, 3 Dec 2000 13:19:49 -0800 (PST)
Sender: marcel@cup.hp.com
Message-ID: <3A2AB8F4.DE04AD9D@cup.hp.com>
Date: Sun, 03 Dec 2000 13:19:48 -0800
From: Marcel Moolenaar <marcel@cup.hp.com>
Organization: Hewlett-Packard
X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Daniel Eischen <eischen@vigrid.com>
Cc: arch@FreeBSD.ORG
Subject: Re: Modifying FILE to add lock
References: <Pine.SUN.3.91.1001203153153.26800A-100000@pcnet1.pcnet.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Daniel Eischen wrote:
> 
> But back to FILE and DIR changes, I seriously doubt that any of our
> language ports would be affected by these being changed.

To conclude:

o  Appending the new field has the least impact,
o  Any impact is expected to be marginal or trivially
   fixed.

Go for it!

-- 
Marcel Moolenaar
  mail: marcel@cup.hp.com / marcel@FreeBSD.org
  tel:  (408) 447-4222




From owner-freebsd-arch  Sun Dec  3 13:25:46 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 13:25:44 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from field.videotron.net (field.videotron.net [205.151.222.108])
	by hub.freebsd.org (Postfix) with ESMTP id B1CF437B400
	for <arch@FreeBSD.ORG>; Sun,  3 Dec 2000 13:25:43 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213])
 by field.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8)
 with ESMTP id <0G5000NB5GUP0H@field.videotron.net> for arch@FreeBSD.ORG; Sun,  3 Dec 2000 16:25:38 -0500 (EST)
Date: Sun, 03 Dec 2000 16:26:26 -0500 (EST)
From: Bosko Milekic <bmilekic@technokratis.com>
Subject: Re: zero copy code review
In-reply-to: <200012032006.NAA00585@usr05.primenet.com>
To: Terry Lambert <tlambert@primenet.com>
Cc: dg@root.com, Andrew Gallatin <gallatin@cs.duke.edu>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG
Message-id: <Pine.BSF.4.21.0012031610070.98081-100000@jehovah.technokratis.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On Sun, 3 Dec 2000, Terry Lambert wrote:

[...]
> There's a real easy fix for this:
> 
> 
> 	struct mbuf *
> 	m_get_not_broken(int flag, int type)
> 	{
> 		struct mbuf *m;
> 
> 		/* Retry until an mbuf is obtained, but only for M_WAIT callers. */
> 		do {
> 			m = m_get(flag, type);
> 		} while (flag == M_WAIT && m == NULL);
> 
> 		return (m);
> 	}
> 
> I think the idea that the M_WAIT flag should be broken so that
> it can be safely used in interrupt mode is dumb.

	I'm not sure I understand what you're putting forward with the above
  comment, specifically what you're referring to when you say "broken."
  	Are you trying to say that "M_WAIT is broken because it doesn't wait
  forever?" If that's what you're trying to say, the explanation is simple.
  If you "wait indefinitely," or spin as you're doing above, then what
  you're doing is pretty much useless. Let me explain why and how you can
  test my hypothesis without changing a single line of code.

  	First of all, the amount of time spent waiting with M_WAIT is
  completely tunable with the kern.ipc.mbuf_wait sysctl. If you want to
  wait indefinitely, just set it to 0.
  	Second of all, the default value is 32. The reason for that is that
  it is typically sufficient if you're going to get anything in the first
  place. Basically, if you're short on mbufs and you're hoping one will be
  freed then, in the general case (I've established this through various
  testing), on a relatively generic machine, with moderately heavy network
  load, you're going to get one back within the 32 ticks. If it isn't
  sufficient, you can tune from 32 to 64 to whatever it is you feel is
  appropriate. The only case where you won't be getting back what you need
  in the default time, usually, is when the main mbuf consumer is a process
  which is, in effect, sucking up all resources (allocating them for
  itself) -- think local DoS. In that case, even after you wait 32 ticks,
  64 ticks, or infinity ticks, you're likely to not get anything and even
  if you happen to get ONE mbuf, then it's even worse 'cause all that's
  happened is that the offending process has swallowed yet another mbuf and
  prevented the other (essential) system components from allocating.
  	So, in the latter case, if you have a non-offending process calling
  sendfile(2) and trying to allocate an mbuf, it can wait all day if you
  want it to, and it will never get anything until the offending process is
  killed. So, better to have the process return from the kernel and deal
  with the temporary failure. The same goes for the offending process that
  will keep exhausting mbufs in a tight loop; think of what would happen
  once the offending process hits the hard limit and exhausts mbufs. It
  will just be stuck waiting/looping indefinitely in the kernel and will
  not be killable because it will not be able to receive any signals posted
  to it until it returns from the kernel.

	Basically, what I'm telling you is: M_WAIT behavior is not broken in
  FreeBSD, it is entirely tunable and it is better, in the general case, to
  NOT have M_WAIT mean 'wait indefinitely.'
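
The bounded wait described above can be modelled in a few lines of user-space C. This is a sketch under assumptions, not the kernel allocator: `sim_pool` and `sim_alloc_wait` are hypothetical names, "ticks" are loop iterations rather than clock ticks, and the rule that a limit of 0 means "no limit" is taken from the description in this message.

```c
#include <stdbool.h>

/* A toy buffer pool (hypothetical; stands in for the mbuf free list). */
struct sim_pool {
	int free_count;    /* buffers available right now */
	int release_tick;  /* tick at which another consumer frees one; <0 = never */
};

/*
 * Try to take one buffer, waiting at most max_wait simulated ticks.
 * max_wait == 0 means "no limit", as described for the sysctl above;
 * in that case the call does not return until a buffer appears.
 * *waited receives the number of ticks spent waiting.
 */
bool
sim_alloc_wait(struct sim_pool *p, unsigned int max_wait, unsigned int *waited)
{
	unsigned int tick;

	for (tick = 0; ; tick++) {
		/* Another consumer may release a buffer on this tick. */
		if (p->release_tick >= 0 && tick == (unsigned int)p->release_tick)
			p->free_count++;

		if (p->free_count > 0) {
			p->free_count--;
			*waited = tick;
			return (true);
		}

		/* Give up once the tunable limit has been reached. */
		if (max_wait != 0 && tick >= max_wait) {
			*waited = tick;
			return (false);
		}
	}
}
```

With a limit of 32, a buffer released on tick 10 is picked up after 10 ticks, while a pool that is never refilled makes the caller fail after 32 ticks instead of sleeping forever.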

> 
> 					Terry Lambert
> 					terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.

  Regards,
  Bosko Milekic
  bmilekic@technokratis.com




From owner-freebsd-arch  Sun Dec  3 14:18:39 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 14:18:37 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133])
	by hub.freebsd.org (Postfix) with ESMTP id E3CCA37B400
	for <arch@FreeBSD.ORG>; Sun,  3 Dec 2000 14:18:36 -0800 (PST)
Received: (from daemon@localhost)
	by smtp03.primenet.com (8.9.3/8.9.3) id PAA27026;
	Sun, 3 Dec 2000 15:16:23 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp03.primenet.com, id smtpdAAAx6ayW0; Sun Dec  3 15:16:16 2000
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id PAA03289;
	Sun, 3 Dec 2000 15:18:26 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200012032218.PAA03289@usr05.primenet.com>
Subject: Re: zero copy code review
To: bmilekic@technokratis.com (Bosko Milekic)
Date: Sun, 3 Dec 2000 22:18:26 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert), dg@root.com,
	gallatin@cs.duke.edu (Andrew Gallatin),
	ken@kdm.org (Kenneth D. Merry), arch@FreeBSD.ORG
In-Reply-To: <Pine.BSF.4.21.0012031610070.98081-100000@jehovah.technokratis.com> from "Bosko Milekic" at Dec 03, 2000 04:26:26 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: tlambert@usr05.primenet.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > I think the idea that the M_WAIT flag should be broken so that
> > it can be safely used in interrupt mode is dumb.
> 
> 	I'm not sure I understand what you're putting forward with the above
>   comment, specifically what you're referring to when you say "broken."
>   	Are you trying to say that "M_WAIT is broken because it doesn't wait
>   forever?" If that's what you're trying to say, the explanation is simple.
>   If you "wait indefinitely," or spin as you're doing above, then what
>   you're doing is pretty much useless. Let me explain why and how you can
>   test my hypothesis without changing a single line of code.

[ ... local DOS ... ]

I really don't buy a probability defense.  If a probability defense
were acceptable, then not checking for a NULL return, and eating
the panic that results is also acceptable.


The problem with this theory is that "have the [non-offending]
process return from the kernel and deal with the temporary failure"
presumes that there is a correct way to work around the failure in
user space.

I would maintain that the failure would be persistent, since this
does nothing to silence the DOS attack, and there is nothing that
a user space program can do, except to retry, and get all the way
down the code path to the same place that it was before.

It seems to me that this is just a case of how big you want to
make your retry loop, not one of whether or not there will be a
retry loop.


As an example of a user space DOS that can result in this, if you
have a FreeBSD machine which has an interface that is the default
route to the network, and a second interface that is the local
network, and the interface which is the default route is "down"
(as in a PPP interface with the modem turned off), you can start
a "ping" of an external machine (e.g. 16.1.0.2) which will
eventually consume all of the mbufs with ICMP echo datagrams which
can't be transmitted.  At this point, machines on the local network
cannot log into the gateway machine over the network to correct the
problem.

I would argue that this level of congestion should be proactively
prohibited from occurring in the first place; the most likely way
to do this correctly is to start "dropping" the oldest datagrams,
NOT returning "NULL" to allocations made on behalf of telnetd or
sshd from the local interface.
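
A minimal sketch of the "drop the oldest datagrams" policy, assuming a fixed-size ring of queued datagrams. The names `dgram_queue`, `dq_enqueue`, and `dq_dequeue` are hypothetical, and plain integers stand in for mbuf chains; no existing kernel interface is implied.

```c
#include <stddef.h>

#define DQ_CAP 4	/* illustrative queue limit */

/* Bounded FIFO of datagram ids; the ids stand in for mbuf chains. */
struct dgram_queue {
	int    slots[DQ_CAP];
	size_t head;     /* index of the oldest queued entry */
	size_t count;    /* entries currently queued */
	size_t dropped;  /* entries discarded to make room */
};

/* Enqueue never fails: when the queue is full, the oldest entry is dropped. */
void
dq_enqueue(struct dgram_queue *q, int id)
{
	if (q->count == DQ_CAP) {
		q->head = (q->head + 1) % DQ_CAP;  /* discard the oldest */
		q->count--;
		q->dropped++;
	}
	q->slots[(q->head + q->count) % DQ_CAP] = id;
	q->count++;
}

/* Returns 1 and stores the oldest id in *id, or 0 if the queue is empty. */
int
dq_dequeue(struct dgram_queue *q, int *id)
{
	if (q->count == 0)
		return (0);
	*id = q->slots[q->head];
	q->head = (q->head + 1) % DQ_CAP;
	q->count--;
	return (1);
}
```

Under this policy a stalled interface costs at most DQ_CAP buffers, so new allocations for other traffic are not starved by datagrams that cannot be transmitted.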

In other words, if this is a fear-response for a local DOS,
then there are better ways of achieving the same result,
without still locking up networking.

---

> 	Basically, what I'm telling you is: M_WAIT behavior is not broken in
>   FreeBSD, it is entirely tunable and it is better, in the general case, to
>   NOT have M_WAIT mean 'wait indefinitely.'

As a general bone of contention, if the thing _doesn't_ wait, it
shouldn't be called M_WAIT, it should be called M_TRY_HARDER or
something that indicates that the default behaviour has been
altered, but in fact the routine will not be waiting around until
it is successful, like all of the other _WAIT flags imply.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-arch  Sun Dec  3 15:24:53 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 15:24:51 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from falla.videotron.net (falla.videotron.net [205.151.222.106])
	by hub.freebsd.org (Postfix) with ESMTP id 9ECD937B400
	for <arch@FreeBSD.ORG>; Sun,  3 Dec 2000 15:24:50 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213])
 by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8)
 with ESMTP id <0G5000D5SMDBVT@falla.videotron.net> for arch@FreeBSD.ORG; Sun,  3 Dec 2000 18:24:48 -0500 (EST)
Date: Sun, 03 Dec 2000 18:25:36 -0500 (EST)
From: Bosko Milekic <bmilekic@technokratis.com>
Subject: Re: zero copy code review
In-reply-to: <200012032218.PAA03289@usr05.primenet.com>
To: Terry Lambert <tlambert@primenet.com>
Cc: arch@FreeBSD.ORG
Message-id: <Pine.BSF.4.21.0012031808540.98531-100000@jehovah.technokratis.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On Sun, 3 Dec 2000, Terry Lambert wrote:

> > > I think the idea that the M_WAIT flag should be broken so that
> > > it can be safely used in interrupt mode is dumb.
> > 
> > 	I'm not sure I understand what you're putting forward with the above
> >   comment, specifically what you're referring to when you say "broken."
> >   	Are you trying to say that "M_WAIT is broken because it doesn't wait
> >   forever?" If that's what you're trying to say, the explanation is simple.
> >   If you "wait indefinitely," or spin as you're doing above, then what
> >   you're doing is pretty much useless. Let me explain why and how you can
> >   test my hypothesis without changing a single line of code.
> 
> [ ... local DOS ... ]
> 
> I really don't buy a probability defense.  If a probability defense
> were acceptable, then not checking for a NULL return, and eating
> the panic that results is also acceptable.

	It's not a "probability defense." It's not a "defense." It's just a
  "don't act the worst way possible when we have an attack." And you
  haven't said at all why waiting indefinitely is better than not,
  especially in the problematic situation I brought up.

> The problem with this theory is that "have the [non-offending]
> process return from the kernel and deal with the temporary failure"
> presumes that there is a correct way to work around the failure in
> user space.

	No, it doesn't. But it's better for the process to sleep in user
  space than to be INDEFINITELY stuck in the kernel. And, in the case of an
  attack, it _will_ be indefinitely stuck.

> I would maintain that the failure would be persistent, since this
> does nothing to silence the DOS attack, and there is nothing that
> a user space program can do, except to retry, and get all the way
> down the code path to the same place that it was before.

	Right. It's not a preventive measure. But, it's much better to have
  it act in this manner than wait indefinitely "in the case of."

> It seems to me that this is just a case of how big you want to
> make your retry loop, not one of whether or not there will be a
> retry loop.

	The retry loop is _useless_. You drop the mutex and lose priority in
  the wait queue when you return from m_get(). Calling again makes your
  chances of getting an mbuf in a shortage even less probable. If you want
  that behavior, just tweak your kern.ipc.mbuf_wait.

[...]
> I would argue that this level of congestion should be proactively
> prohibited from occurring in the first place; the most likely way
> to do this correctly is to start "dropping" the oldest datagrams,
> NOT returning "NULL" to allocations made on behalf of telnetd or
> sshd from the local interface.

	This is really a great block of theory. I only wish that people with
  such a passion to argue the methods would work in actually implementing
  them.

> In other words, if this is a fear-response for a local DOS,
> then there are better ways of achieving the same result,
> without still locking up networking.

	It's not. It never was. It never will be. It's just better than
  waiting indefinitely. It still provides you with the ability to wait
  indefinitely, though, if you are incapable of understanding why it's
  better not to.

> > 	Basically, what I'm telling you is: M_WAIT behavior is not broken in
> >   FreeBSD, it is entirely tunable and it is better, in the general case, to
> >   NOT have M_WAIT mean 'wait indefinitely.'
> 
> As a general bone of contention, if the thing _doesn't_ wait, it
> shouldn't be called M_WAIT, it should be called M_TRY_HARDER or
> something that indicates that the default behaviour has been
> altered, but in fact the routine will not be waiting around until
> it is successful, like all of the other _WAIT flags imply.

	It _does_ wait, and I disagree. By that logic, why not rename all the
  _WAITs with _WAIT_INDEF? If you're curious about what M_WAIT does, you
  can either read the code (hey, it is free!) or read the mbuf(9) man page
  (now available in -CURRENT).

> 					Terry Lambert
> 					terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.

  Regards,
  Bosko Milekic
  bmilekic@technokratis.com




From owner-freebsd-arch  Sun Dec  3 19: 9: 3 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 19:09:01 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mobile.wemm.org (adsl-64-163-195-99.dsl.snfc21.pacbell.net [64.163.195.99])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9A06A37B400; Sun,  3 Dec 2000 19:09:00 -0800 (PST)
Received: from netplex.com.au (localhost [127.0.0.1])
	by mobile.wemm.org (8.11.1/8.11.1) with ESMTP id eB438tD52326;
	Sun, 3 Dec 2000 19:08:55 -0800 (PST)
	(envelope-from peter@netplex.com.au)
Message-Id: <200012040308.eB438tD52326@mobile.wemm.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG, dillon@FreeBSD.ORG
Subject: Re: zero copy code review 
In-Reply-To: <20001129231653.A1503@panzer.kdm.org> 
Date: Sun, 03 Dec 2000 19:08:55 -0800
From: Peter Wemm <peter@netplex.com.au>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

"Kenneth D. Merry" wrote:
> [ -net and -current BCCed for wider coverage, this is probably best
> handled on -arch ]
> 
> I would like to request reviews of the zero copy sockets and NFS code I've
> been posting about for months:
> 
> http://people.FreeBSD.org/~ken/zero_copy

Hmm.. I see one danger item:

"
5.Configuration and performance tuning. 

       There are a number of options that need to be turned on for various things to work: 

       options         ZERO_COPY_SOCKETS        # Turn on zero copy send code
       options         ENABLE_VFS_IOOPT         # Turn on zero copy receive
       options         NMBCLUSTERS=(512+512*32) # lots of mbuf clusters
       options         TI_JUMBO_HDRSPLIT        # Turn on Tigon header splitting

[..]
              Turn on vfs.ioopt to enable zero copy receive: 
               sysctl -w vfs.ioopt=1
"

I know Matt Dillon was intending to remove the ENABLE_VFS_IOOPT code
and vfs.ioopt because it is presently fundamentally broken and causes
devastating userland semantics impact.

For example, as it exists in the tree *right now*, if one does this:
  buf = malloc(PAGE_SIZE);	/* malloc does page alignment here */
  read(fd, buf, PAGE_SIZE);
.. it would be eligible for ioopt treatment (page lending).

Normally, you would have a *private* copy of the page of data.  If somebody
modifies the backing file, your private copy does not change.

However, turning on ioopt causes it to be mmapped in with MAP_PRIVATE.. But
this does **NOT** give the same semantics.  Sure, if you modify the buffer
yourself, you get a Copy-on-write fault and your own private page to mess with.

But if somebody else modifies the file before you dirty the page then
your supposedly static private copy silently changes out from underneath you
because you have been loaned a mapping from the vm/buffer cache.  The
infrastructure to track "loaned out" pages in the vm page cache isn't present.
The pages must be read-only to the kernel and DMA engines and a fault must be
taken giving the kernel a chance to fully donate the original page to the
mapping processes and generate its own writable version.
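
The semantic difference can be observed from user space with an ordinary read() next to an explicit MAP_PRIVATE mapping. This sketch does not exercise the ioopt code; `private_copy_demo` and the /tmp path are hypothetical, and whether the mapping shows the later write is left unspecified by POSIX, so the function reports it without relying on it.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for mkstemp(), pread(), pwrite() */
#endif
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Fill a one-page file with 'A', take a read() copy and a MAP_PRIVATE
 * mapping of it, then overwrite the first byte of the file with 'B'.
 * *read_copy_stable is 1 if the read() buffer still holds 'A' (expected).
 * *mapping_changed is 1 if the untouched private mapping now shows 'B'.
 * Returns 0 on success, -1 if the setup failed.
 */
int
private_copy_demo(int *read_copy_stable, int *mapping_changed)
{
	char path[] = "/tmp/ioopt_demoXXXXXX";	/* hypothetical scratch file */
	long pagesz = sysconf(_SC_PAGESIZE);
	char *buf = NULL, *map = MAP_FAILED;
	size_t len;
	int fd, rc = -1;

	if (pagesz <= 0 || (fd = mkstemp(path)) < 0)
		return (-1);
	unlink(path);			/* the file goes away when fd is closed */
	len = (size_t)pagesz;

	if ((buf = malloc(len)) == NULL)
		goto out;
	memset(buf, 'A', len);
	if (write(fd, buf, len) != (ssize_t)len)
		goto out;

	/* An ordinary read(): the caller owns a real copy of the data. */
	memset(buf, 0, len);
	if (pread(fd, buf, len, 0) != (ssize_t)len)
		goto out;

	/* A private mapping that is never written, so never copied. */
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* "Somebody else" modifies the backing file. */
	if (pwrite(fd, "B", 1, 0) != 1)
		goto out;

	*read_copy_stable = (buf[0] == 'A');
	*mapping_changed = (map[0] == 'B');
	rc = 0;
out:
	if (map != MAP_FAILED)
		munmap(map, len);
	free(buf);
	close(fd);
	return (rc);
}
```

The read() copy cannot change after the call returns, which is the guarantee a transparent page-lending optimisation would have to preserve; a private mapping that has not yet been dirtied carries no such guarantee.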

I have not read the patch extensively, but I am not sure that it is handled
completely.  There are a few patches to vm_fault(), but I am not sure if
these are to handle the problem I described above or something else.  In
particular, if it is intended to handle the problem, then it seems to depend
on being able to make pages unwritable by the kernel.  This isn't possible
on i386 cpus (only 486 and later).  I did not see any busmaster DMA checking
either, but I could have missed it..  What about drivers that DMA to pages
mapped into KVM without checking writability (and hence COW)?  

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5




From owner-freebsd-arch  Sun Dec  3 19:49:51 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 19:49:48 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 188BC37B400; Sun,  3 Dec 2000 19:49:47 -0800 (PST)
Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30])
	by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id WAA22467;
	Sun, 3 Dec 2000 22:49:45 -0500 (EST)
Received: (from gallatin@localhost)
	by grasshopper.cs.duke.edu (8.11.1/8.9.1) id eB43njb13857;
	Sun, 3 Dec 2000 22:49:45 -0500 (EST)
	(envelope-from gallatin@cs.duke.edu)
From: Andrew Gallatin <gallatin@cs.duke.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Sun,  3 Dec 2000 22:49:44 -0500 (EST)
To: Peter Wemm <peter@netplex.com.au>
Cc: "Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	dillon@FreeBSD.ORG
Subject: Re: zero copy code review 
In-Reply-To: <200012040308.eB438tD52326@mobile.wemm.org>
References: <20001129231653.A1503@panzer.kdm.org>
	<200012040308.eB438tD52326@mobile.wemm.org>
X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs  Lucid
Message-ID: <14891.4047.626648.658103@grasshopper.cs.duke.edu>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Peter Wemm writes:
<...>
 > Hmm.. I see one danger item:

<..>

 > [..]
 >               Turn on vfs.ioopt to enable zero copy receive: 
 >                sysctl -w vfs.ioopt=1
 > "

This was a convenient sysctl to tie the zero-copy receive code to in
early prototyping.  It has nothing to do with the filesystem aspects
of vfs_ioopt, which makes it confusing to the reader.  This should
probably be ripped out or changed to depend only on the zero-copy
sockets sysctl.

Rather than loaning out pages, the zero-copy receive code does full
page-flipping, mapping the kernel's page into the receiving process
and freeing the page the user process was receiving into.  This is
possible because, unlike pages in the buffer cache, there is no need
to keep around data received on a socket.

<... objections to vfs_ioopt deleted...>

 > I have not read the patch extensively, but I am not sure that it is handled
 > completely.  There are a few patches to vm_fault(), but I am not sure if
 > these are to handle the problem I described above or something else.  In
 > particular, if it is intended to handle the problem, then it seems to depend
 > on being able to make pages unwritable by the kernel.  This isn't possible
 > on i386 cpus (only 486 and later).  I did not see any busmaster DMA checking

The patches are to support making pages sent via zero-copy sockets COW
for the user process which sent them (until they are acknowledged and
freed).  We do not make anything COW for the kernel.

 > either, but I could have missed it..  What about drivers that DMA to pages
 > mapped into KVM without checking writability (and hence COW)?  

This is a good point.  But I cannot think of any circumstance where a
driver would be DMA'ing directly to a user owned page (with the
exception of a vm fault, but this is impossible because the pages are
resident prior to setting up the send and are wired for the duration
of the send).

Thanks for the input.  I'm glad to see you and Matt looking at this!

Drew


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Sun Dec  3 20: 5:26 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 20:05:24 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135])
	by hub.freebsd.org (Postfix) with ESMTP id B513637B400
	for <arch@FreeBSD.ORG>; Sun,  3 Dec 2000 20:05:24 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id eB444rd69989;
	Sun, 3 Dec 2000 20:04:53 -0800 (PST)
	(envelope-from dillon)
Date: Sun, 3 Dec 2000 20:04:53 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200012040404.eB444rd69989@earth.backplane.com>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: Peter Wemm <peter@netplex.com.au>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG
Subject: Re: zero copy code review 
References: <20001129231653.A1503@panzer.kdm.org>
	<200012040308.eB438tD52326@mobile.wemm.org> <14891.4047.626648.658103@grasshopper.cs.duke.edu>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:Peter Wemm writes:
:<...>
: > Hmm.. I see one danger item:
:
:<..>
:
: > [..]
: >               Turn on vfs.ioopt to enable zero copy receive: 
: >                sysctl -w vfs.ioopt=1
: > "
:
:This was a convenient sysctl to tie the zero-copy receive code to in
:early prototyping.  It has nothing to do with the filesystem aspects
:of vfs_ioopt, which makes it confusing to the reader.  This should
:probably be ripped out or changed to depend only on the zero-copy
:sockets sysctl.
:
:Rather than loaning out pages, the zero-copy receive code does full

    Oh my.  Could you please change your use of the sysctl to one of your own?
    I did in fact mean to remove vfs.ioopt because the FS code is fundamentally
    broken... and quite likely to cause a system crash if used heavily.
    The vfs.ioopt code is still using 3.x semantics (maybe even 2.x!).

    I haven't been following the zero-copy work, so if you could give me
    a heads-up once you have moved your zero-copy code to its own sysctl,
    I will then go ahead and remove the original broken vfs.ioopt and its
    associated code.

						-Matt




From owner-freebsd-arch  Sun Dec  3 20: 7:18 2000
From owner-freebsd-arch@FreeBSD.ORG  Sun Dec  3 20:07:16 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135])
	by hub.freebsd.org (Postfix) with ESMTP id 6F5A537B400
	for <arch@FreeBSD.ORG>; Sun,  3 Dec 2000 20:07:16 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id eB446jk70007;
	Sun, 3 Dec 2000 20:06:45 -0800 (PST)
	(envelope-from dillon)
Date: Sun, 3 Dec 2000 20:06:45 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200012040406.eB446jk70007@earth.backplane.com>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: Peter Wemm <peter@netplex.com.au>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG
Subject: Re: zero copy code review 
References: <20001129231653.A1503@panzer.kdm.org>
	<200012040308.eB438tD52326@mobile.wemm.org> <14891.4047.626648.658103@grasshopper.cs.duke.edu>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


:freed).  We do not make anything COW for the kernel.
:
: > either, but I could have missed it..  What about drivers that DMA to pages
: > mapped into KVM without checking writability (and hence COW)?  
:
:This is a good point.  But I cannot think of any circumstance where a
:driver would be DMA'ing directly to a user owned page (with the
:exception of a vm fault, but this is impossible because the pages are
:resident prior to setting up the send and are wired for the duration
:of the send).
:
:Thanks for the input.  I'm glad to see you and Matt looking at this!
:
:Drew

    Careful.  If you read() from a raw device most disk drivers WILL dma
    directly to a user-owned page.

						-Matt




From owner-freebsd-arch  Mon Dec  4  9:43:24 2000
From owner-freebsd-arch@FreeBSD.ORG  Mon Dec  4 09:43:22 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1])
	by hub.freebsd.org (Postfix) with ESMTP id 2C8B637B400
	for <arch@FreeBSD.ORG>; Mon,  4 Dec 2000 09:43:22 -0800 (PST)
Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30])
	by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id MAA02705;
	Mon, 4 Dec 2000 12:43:12 -0500 (EST)
Received: (from gallatin@localhost)
	by grasshopper.cs.duke.edu (8.11.1/8.9.1) id eB4HhCS15410;
	Mon, 4 Dec 2000 12:43:12 -0500 (EST)
	(envelope-from gallatin@cs.duke.edu)
From: Andrew Gallatin <gallatin@cs.duke.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Mon,  4 Dec 2000 12:43:12 -0500 (EST)
To: Matt Dillon <dillon@earth.backplane.com>
Cc: Peter Wemm <peter@netplex.com.au>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG
Subject: Re: zero copy code review 
In-Reply-To: <200012040406.eB446jk70007@earth.backplane.com>
References: <20001129231653.A1503@panzer.kdm.org>
	<200012040308.eB438tD52326@mobile.wemm.org>
	<14891.4047.626648.658103@grasshopper.cs.duke.edu>
	<200012040406.eB446jk70007@earth.backplane.com>
X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs  Lucid
Message-ID: <14891.55103.81970.494533@grasshopper.cs.duke.edu>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Matt Dillon writes:
 > 
 > :freed).  We do not make anything COW for the kernel.
 > :
 > : > either, but I could have missed it..  What about drivers that DMA to pages
 > : > mapped into KVM without checking writability (and hence COW)?  
 > :
 > :This is a good point.  But I cannot think of any circumstance where a
 > :driver would be DMA'ing directly to a user owned page (with the
 > :exception of a vm fault, but this is impossible because the pages are
 > :resident prior to setting up the send and are wired for the duration
 > :of the send).
 > :
 > :Thanks for the input.  I'm glad to see you and Matt looking at this!
 > :
 > :Drew
 > 
 >     Careful.  If you read() from a raw device most disk drivers WILL dma
 >     directly to a user-owned page.
 > 
 > 						-Matt
 > 

That's a good point that I hadn't thought about.  All the more reason
to make the send-side code a socket option so the process has to take
careful aim before blowing off its foot.

Drew




From owner-freebsd-arch  Mon Dec  4 15:54:14 2000
From owner-freebsd-arch@FreeBSD.ORG  Mon Dec  4 15:54:09 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133])
	by hub.freebsd.org (Postfix) with ESMTP id 82AEF37B400
	for <arch@FreeBSD.ORG>; Mon,  4 Dec 2000 15:54:09 -0800 (PST)
Received: (from daemon@localhost)
	by smtp03.primenet.com (8.9.3/8.9.3) id QAA02581;
	Mon, 4 Dec 2000 16:51:52 -0700 (MST)
Received: from usr02.primenet.com(206.165.6.202)
 via SMTP by smtp03.primenet.com, id smtpdAAAy5aale; Mon Dec  4 16:50:50 2000
Received: (from tlambert@localhost)
	by usr02.primenet.com (8.8.5/8.8.5) id QAA12392;
	Mon, 4 Dec 2000 16:52:50 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200012042352.QAA12392@usr02.primenet.com>
Subject: Re: zero copy code review
To: bmilekic@technokratis.com (Bosko Milekic)
Date: Mon, 4 Dec 2000 23:52:50 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert), arch@FreeBSD.ORG
In-Reply-To: <Pine.BSF.4.21.0012031808540.98531-100000@jehovah.technokratis.com> from "Bosko Milekic" at Dec 03, 2000 06:25:36 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: tlambert@usr02.primenet.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > [ ... local DOS ... ]
> > 
> > I really don't buy a probability defense.  If a probability defense
> > were acceptable, then not checking for a NULL return, and eating
> > the panic that results is also acceptable.
> 
> 	It's not a "probability defense." It's not a "defense." It's just a
>   "don't act the worst way possible when we have an attack." And you
>   haven't said at all why waiting indefinitely is better than not,
>   especially in the problematic situation I brought up.

The situation you quote is one where the allocation fails,
instead of WAITing until it can complete successfully, and
this results in the kernel function failing and state being
undone back to the point where the user space call that was
the originator of the request fails back to user space.  Then
the user space code has to handle the failure.

I maintain that the most reasonable and logical thing for the
user space program to do, on seeing this failure (ENOBUFS?),
is to retry the operation.

So it calls down again, and fails again, and you have
substituted a busy loop which crosses protection domains
twice, for a kernel sleep.  This is the best case.

The worst case is that the local DOS obtains yet more
resources when the state is backed out,  and the busy
loop path in the kernel becomes shorter, due to an
earlier failure for lack of resources.

In neither case does failing the allocation instead of
sleeping do _anything at all_ to address the root cause
of the problem, nor does the failure result in the problem
going away or being lessened.

So I really don't see what is being accomplished by failing
the allocation, rather than sleeping, except to use up
_extra_ resources, during a time of resource starvation, to
enforce the mbuf_wait interval.


> > The problem with this theory is that "have the [non-offending]
> > process return from the kernel and deal with the temporary failure"
> > presumes that there is a correct way to work around the failure in
> > user space.
> 
> 	No, it doesn't. But it's better for the process to sleep in user
>   space than to be INDEFINITELY stuck in the kernel. And, in the case of an
>   attack, it _will_ be indefinitely stuck.

Why the heck would the process sleep in user space?!?  It has
work to do, it knows the call to make to do the work, and it
will make the call repeatedly, until it's context switched,
or until the call succeeds.  This is just like a write loop
on a large buffer, subtracting out the write() return value
and advancing the buffer pointer, until everything has been
written.  You might argue that a "correctly" written user
space program would use a select loop, but I'm betting that
the descriptor will show as writeable, even if there aren't
any mbufs available to accept the write; there's no way to
make the write select accurate, without pre-reserving memory
to accept the write.
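
The write loop described above can be sketched roughly as follows.
This is a minimal illustration, not code from the thread: the helper
name write_all is hypothetical, and retrying immediately on ENOBUFS is
shown only to make the argued busy-loop behaviour concrete, not as a
recommendation.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all (hypothetical helper): keep calling write() until all
 * len bytes have been accepted.  Returns 0 on success, or -1 on a
 * non-retryable error with errno left set by write().
 */
static int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			/*
			 * Assumption for illustration: treat EINTR and
			 * ENOBUFS as transient and call straight back
			 * down, with no sleep in between.
			 */
			if (errno == EINTR || errno == ENOBUFS)
				continue;
			return (-1);
		}
		p += n;			/* advance the buffer pointer */
		len -= (size_t)n;	/* subtract out what was written */
	}
	return (0);
}
```

Every failed call in this loop crosses into the kernel and back
without sleeping, which is the cost being argued about here.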

Personally, I would prefer, under DOS conditions, that my
program be stuck in kernel space, so that it at least has a
small chance of getting work done slowly during a DOS, than
stuck in user space.  You can be sure that the DOS process
is not going to be nearly as polite in hanging around in user
space until kernel resources are freed up.


> > I would maintain that the failure would be persistent, since this
> > does nothing to silence the DOS attack, and there is nothing that
> > a user space program can do, except to retry, and get all the way
> > down the code path to the same place that it was before.
> 
> 	Right. It's not a preventive measure. But, it's much better to have
>   it act in this manner than wait indefinitely "in the case of."

I strongly disagree.  That's "``in the case of'' being able to
get work done, despite the DOS".  Hung in user space is the
same as hung in the kernel: your process is not doing useful
work.

Making it easier for the DOS to get yet more resources during
a period of resource starvation, and preventing other programs
from competing with the DOS for resources freed by timeout or
other mechanism, which takes them back from the DOS, seems like
a big mistake to me.  I would much rather have a system that I
can normally talk to in a few seconds be capable of being talked
to over a period of 10 minutes, than one I can't talk to at all;
wouldn't you?


> > It seems to me that this is just a case of how big you want to
> > make your retry loop, not one of whether or not there will be a
> > retry loop.
> 
> 	The retry loop is _useless_. You drop the mutex and lose priority in
>   the wait queue when you return from m_get(). Calling again makes your
>   chances of getting an mbuf in a shortage even less probable. If you want
>   that behavior, just tweak your kern.ipc.mbuf_wait.

This is actually the opposite of the effect you would want.  A
well behaved process denied a scarce resource should be first in
line for that resource.  Saying "I can't give you one because
there's this pig of a process, but I'll tell you what I'll do:
why don't you just piss off until the next millennium?" is no way
to encourage well behaved processes... 8-).


> [...]
> > I would argue that this level of congestion should be proactively
> > prohibited from occurring in the first place; the most likely way
> > to do this correctly is to start "dropping" the oldest datagrams,
> > NOT returning "NULL" to allocations made on behalf of telnetd or
> > sshd from the local interface.
> 
> 	This is really a great block of theory. I only wish that people with
>   such a passion to argue the methods would work in actually implementing
>   them.

The code which implements "source quench" could be abused to
provide this functionality at the queue bottom, where things
are packing up in the ICMP echo datagram case (as one example).


> 	It's not. It never was. It never will be. It's just better than
>   waiting indefinitely. It still provides you with the ability to wait
>   indefinitely, though, if you are incapable of understanding why it's
>   better not to.

Explain it to me: why is it better to not wait?  When I see the
error return from the low memory condition, am I supposed to shut
myself down, disabling apache, for example?  Is _everyone_
supposed to do the same thing, until there is nothing but the DOS
process running on the system?

What does me failing buy _me_?

How is this different than me waiting on _any_ contended resource,
instead of timing out, like an advisory lock on a file?


> > As a general bone of contention, if the thing _doesn't_ wait, it
> > shouldn't be called M_WAIT, it should be called M_TRY_HARDER or
> > something that indicates that the default behaviour has been
> > altered, but in fact the routine will not be waiting around until
> > it is successful, like all of the other _WAIT flags imply.
> 
> 	It _does_ wait, and I disagree. By that logic, why not rename all the
>   _WAITs with _WAIT_INDEF? If you're curious about what M_WAIT does, you
>   can either read the code (hey, it is free!) or read the mbuf(9) man page
>   (now available in -CURRENT).

It waits until it doesn't, you mean.  8-p.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-arch  Mon Dec  4 16:20:21 2000
From owner-freebsd-arch@FreeBSD.ORG  Mon Dec  4 16:20:18 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 8DBB837B400
	for <arch@FreeBSD.ORG>; Mon,  4 Dec 2000 16:20:18 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eB50KFR23246;
	Mon, 4 Dec 2000 16:20:15 -0800 (PST)
Date: Mon, 4 Dec 2000 16:20:15 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Terry Lambert <tlambert@primenet.com>
Cc: Bosko Milekic <bmilekic@technokratis.com>, arch@FreeBSD.ORG
Subject: Re: zero copy code review
Message-ID: <20001204162015.A8051@fw.wintelcom.net>
References: <Pine.BSF.4.21.0012031808540.98531-100000@jehovah.technokratis.com> <200012042352.QAA12392@usr02.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200012042352.QAA12392@usr02.primenet.com>; from tlambert@primenet.com on Mon, Dec 04, 2000 at 11:52:50PM +0000
Sender: bright@fw.wintelcom.net
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Terry Lambert <tlambert@primenet.com> [001204 15:54] wrote:
> > > [ ... local DOS ... ]
> > > 
> > > I really don't buy a probability defense.  If a probability defense
> > > were acceptable, then not checking for a NULL return, and eating
> > > the panic that results is also acceptable.
> > 
> > 	It's not a "probability defense." It's not a "defense." It's just a
> >   "don't act the worst way possible when we have an attack." And you
> >   haven't said at all why waiting indefinitely is better than not,
> >   especially in the problematic situation I brought up.
> 
> The situation you quote is one where the allocation fails,
> instead of WAITing until it can complete successfully, and
> this results in the kernel function failing and state being
> undone back to the point where the user space call that was
> the originator of the request fails back to user space.  Then
> the user space code has to handle the failure.
> 
> I maintain that the most reasonable and logical thing for the
> user space program to do, on seeing this failure (ENOBUFS?),
> is to retry the operation.
> 
> So it calls down again, and fails again, and you have
> substituted a busy loop which crosses protection domains
> twice, for a kernel sleep.  This is the best case.
> 
> The worst case is that the local DOS obtains yet more
> resources when the state is backed out,  and the busy
> loop path in the kernel becomes shorter, due to an
> earlier failure for lack of resources.
> 
> In neither case does failing the allocation instead of
> sleeping do _anything at all_ to address the root cause
> of the problem, nor does the failure result in the problem
> going away or being lessened.
> 
> So I really don't see what is being accomplished by failing
> the allocation, rather than sleeping, except to use up
> _extra_ resources, during a time of resource starvation, to
> enforce the mbuf_wait interval.

[snip]

Well behaved applications (read: written by me) deal with errors
like ENOBUFS properly, what they do is close the socket and
commence throttling connections.

I would not want my process to be stuck in the kernel waiting
for buffer space that could take quite a long time to get ahold of.

However, I can understand someone wanting a naive process not
to get such errors because they may misbehave and do stupid
things like busy loop or just abort entirely.

Perhaps adding a per-process or per-socket or per-something flag
to ask for indefinite blocking (or turn it off) would be a good
idea.  Honestly, hard-wiring it one way or the other isn't very
good, since the right choice depends on your application.  I can
live with the current situation, so I'll leave 'fixing' this to
someone who wants the indefinite blocking.

Oh, and don't forget, you can't block me indefinitely if I'm
writing to a non-blocking socket.  In fact if M_WAIT is set
I shouldn't be blocking at all on a non-blocking socket.
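
A rough sketch of the handling described above, assuming a descriptor
that has been put into non-blocking mode with fcntl().  The names
set_nonblocking, try_send, and the send_result values are hypothetical;
what "throttling" actually means is left to the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome codes for one send attempt. */
enum send_result {
	SEND_OK,		/* some or all bytes were accepted */
	SEND_RETRY_LATER,	/* would block; wait for writability */
	SEND_THROTTLE,		/* ENOBUFS: back off, shed connections */
	SEND_FAILED		/* any other error */
};

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1)
		return (-1);
	return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0);
}

/* Attempt one write and classify the result instead of looping. */
static enum send_result
try_send(int fd, const void *buf, size_t len, size_t *sent)
{
	ssize_t n = write(fd, buf, len);

	*sent = 0;
	if (n >= 0) {
		*sent = (size_t)n;
		return (SEND_OK);
	}
	if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
		return (SEND_RETRY_LATER);
	if (errno == ENOBUFS)
		return (SEND_THROTTLE);
	return (SEND_FAILED);
}
```

A caller that sees SEND_THROTTLE would close the socket or stop
accepting new connections rather than calling straight back down.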

thanks,
-Alfred




From owner-freebsd-arch  Mon Dec  4 16:27:16 2000
From owner-freebsd-arch@FreeBSD.ORG  Mon Dec  4 16:27:15 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id D18E137B400
	for <arch@FreeBSD.ORG>; Mon,  4 Dec 2000 16:27:14 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eB50QxS23578;
	Mon, 4 Dec 2000 16:26:59 -0800 (PST)
Date: Mon, 4 Dec 2000 16:26:59 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Terry Lambert <tlambert@primenet.com>
Cc: Bosko Milekic <bmilekic@technokratis.com>, arch@FreeBSD.ORG
Subject: Re: zero copy code review
Message-ID: <20001204162659.B8051@fw.wintelcom.net>
References: <Pine.BSF.4.21.0012031808540.98531-100000@jehovah.technokratis.com> <200012042352.QAA12392@usr02.primenet.com> <20001204162015.A8051@fw.wintelcom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20001204162015.A8051@fw.wintelcom.net>; from bright@wintelcom.net on Mon, Dec 04, 2000 at 04:20:15PM -0800
Sender: bright@fw.wintelcom.net
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Alfred Perlstein <bright@wintelcom.net> [001204 16:20] wrote:
> 
> Well behaved applications (read: written by me) deal with errors
> like ENOBUFS properly, what they do is close the socket and
> commence throttling connections.
> 
> I would not want my process to be stuck in the kernel waiting
> for buffer space that could take quite a long time to get ahold of.
> 
> However, I can understand someone wanting a naive process not
> to get such errors because they may misbehave and do stupid
> things like busy loop or just abort entirely.
> 
> Perhaps adding a per-process or per-socket or per-something flag
> to ask for indefinite blocking (or turn it off) would be a good
> idea, honestly having it one way or the other isn't very good
> depending on your application.  I can live with the current
> situation so I'll leave 'fixing' this to someone who wants
> the indefinite blocking.
> 
> Oh, and don't forget, you can't block me indefinitely if I'm
> writing to a non-blocking socket.  In fact if M_WAIT is set
> I shouldn't be blocking at all on a non-blocking socket.

One more thing: ENOBUFS is indicative of a misconfiguration and
shouldn't happen in day-to-day operations; if it does happen, then
the user needs to reconfigure for more buffer space.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."




From owner-freebsd-arch  Mon Dec  4 19:21: 8 2000
From owner-freebsd-arch@FreeBSD.ORG  Mon Dec  4 19:21:06 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from dt051n37.san.rr.com (dt051n37.san.rr.com [204.210.32.55])
	by hub.freebsd.org (Postfix) with ESMTP id E162637B400
	for <arch@FreeBSD.ORG>; Mon,  4 Dec 2000 19:21:05 -0800 (PST)
Received: from slave (Studded@slave [10.0.0.1])
	by dt051n37.san.rr.com (8.9.3/8.9.3) with ESMTP id TAA68548;
	Mon, 4 Dec 2000 19:20:50 -0800 (PST)
	(envelope-from DougB@gorean.org)
Date: Mon, 4 Dec 2000 19:20:49 -0800 (PST)
From: Doug Barton <DougB@gorean.org>
X-Sender: doug@dt051n37.san.rr.com
To: Peter Jeremy <peter.jeremy@alcatel.com.au>
Cc: "Michael C . Wu" <keichii@peorth.iteration.net>, arch@FreeBSD.ORG
Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486...
In-Reply-To: <20001201152137.K1474@gsmx07.alcatel.com.au>
Message-ID: <Pine.BSF.4.21.0012041919420.68514-100000@dt051n37.san.rr.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, 1 Dec 2000, Peter Jeremy wrote:

> On 2000-Nov-30 21:47:45 -0600, "Michael C . Wu" <keichii@iteration.net> wrote:
> >On Fri, Dec 01, 2000 at 10:29:15AM +1100, Peter Jeremy scribbled:
> >| On 2000-Nov-14 15:08:06 +0100, Poul-Henning Kamp <phk@FreeBSD.ORG> wrote:
> >| >Has anybody run a 486 or 386 under current recently ?
> >|
> >| X on a PRE_SMPNG 486 is painful - mouse movements no longer make
> >| the X pointer move in real time.  I haven't noticed the seeding
> >| issue (probably just luck).
> >
> >PRE_SMPNG does not have the /dev/random seeding issue.
> >
> >You actually expected X to run well on a 486? :-)
> 
> It used to run reasonably well (ignoring hogs like Netscape) before
> Yarrow was added. 

	Have you tried updating to the latest -Current? All aspects of the
entropy harvesting have changed significantly since PRE_SMPNG.

Doug
-- 
    So what I want to know is, where does the RED brick road go?

	Do YOU Yahoo!?




From owner-freebsd-arch  Mon Dec  4 20:45:19 2000
From owner-freebsd-arch@FreeBSD.ORG  Mon Dec  4 20:45:17 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169])
	by hub.freebsd.org (Postfix) with ESMTP id 69B3037B400
	for <arch@FreeBSD.ORG>; Mon,  4 Dec 2000 20:45:13 -0800 (PST)
Received: (from ken@localhost)
	by panzer.kdm.org (8.9.3/8.9.1) id VAA42725;
	Mon, 4 Dec 2000 21:44:38 -0700 (MST)
	(envelope-from ken)
Date: Mon, 4 Dec 2000 21:44:38 -0700
From: "Kenneth D. Merry" <ken@kdm.org>
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: arch@FreeBSD.ORG
Subject: Re: zero copy code review
Message-ID: <20001204214438.A42689@panzer.kdm.org>
References: <20001201002235.D10772@panzer.kdm.org> <Pine.BSF.4.21.0012021237540.91517-100000@jehovah.technokratis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <Pine.BSF.4.21.0012021237540.91517-100000@jehovah.technokratis.com>; from bmilekic@technokratis.com on Sat, Dec 02, 2000 at 01:00:22PM -0500
Sender: ken@panzer.kdm.org
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

[ catching up with mail from this weekend ]

On Sat, Dec 02, 2000 at 13:00:22 -0500, Bosko Milekic wrote:
> 
> On Fri, 1 Dec 2000, Kenneth D. Merry wrote:
> 
> > It does have spls in the right places, in this case splimp() and splvm().
> > Would you just convert those to the proper mutexes, or are we going to go
> > with per-data-structure mutexes (i.e. a little finer granularity), or...?
> > (I don't know much about the mutex strategy we're using...)
> 
> 	For now, you won't be able to do anything with the splvm() stuff, as
>   the VM code has not yet been ripped out from under Giant (and likely
>   won't be for a while).
>   	A few notes Re: spl()s and mutexes in uipc_jumbo.c, in particular
>   (since that's where I would begin putting in mutexes):
> 
>   - Your jumbo_kmap singly linked list should probably not be manipulated
>     under splvm() [in fact, I think it's wrong]. The list should be
>     protected by a lock.

Okay.

>   - jumbo_freem should just be called jumbo_free, if the naming convention
>     is being adopted from the mbuf system (which it looks like it is). The
>     reason is that for mbufs, m_free() frees a single mbuf while m_freem()
>     frees an entire chain of them.

Okay.

>   - jumbo_pg_free should be ripped out from under splimp(); leave the
>     explicit splvm() in there, but protect the list manipulations with the
>     lock.

Okay.
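
As a user-space analogy for the list locking suggested above, here is a
minimal sketch of a singly linked free list guarded by its own lock.
It uses a POSIX mutex rather than the kernel mutex API, and the
jumbo_entry and jumbo_list names are hypothetical stand-ins, not the
actual structures in uipc_jumbo.c.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list node; addr stands in for a mapped buffer. */
struct jumbo_entry {
	struct jumbo_entry *next;
	void *addr;
};

/* The list head and the lock that protects it live together. */
struct jumbo_list {
	pthread_mutex_t lock;
	struct jumbo_entry *head;
};

static void
jumbo_list_init(struct jumbo_list *l)
{
	pthread_mutex_init(&l->lock, NULL);
	l->head = NULL;
}

/* Push an entry; every list manipulation happens under the lock. */
static void
jumbo_list_put(struct jumbo_list *l, struct jumbo_entry *e)
{
	pthread_mutex_lock(&l->lock);
	e->next = l->head;
	l->head = e;
	pthread_mutex_unlock(&l->lock);
}

/* Pop the most recently pushed entry, or NULL if the list is empty. */
static struct jumbo_entry *
jumbo_list_get(struct jumbo_list *l)
{
	struct jumbo_entry *e;

	pthread_mutex_lock(&l->lock);
	e = l->head;
	if (e != NULL) {
		l->head = e->next;
		e->next = NULL;
	}
	pthread_mutex_unlock(&l->lock);
	return (e);
}
```

The point of the sketch is only that the list is protected by a lock
of its own instead of relying on an interrupt priority level.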

> 	If most of the things pointed out earlier are fixed, and as long as
>   the code is not flawed (which I really doubt it would be anyway), I have
>   no objections to it going in soon and then attacking the above issue a
>   little later (If nobody gets to it within the next two weeks, I'll be
>   glad to do it myself once those 2 weeks are past).

Sounds good.  There have been other problems pointed out that we'll need to
fix as well before the code can go in.

Ken
-- 
Kenneth Merry
ken@kdm.org




From owner-freebsd-arch  Tue Dec  5  5:10:25 2000
From owner-freebsd-arch@FreeBSD.ORG  Tue Dec  5 05:10:24 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from bells.cs.ucl.ac.uk (bells.cs.ucl.ac.uk [128.16.5.31])
	by hub.freebsd.org (Postfix) with SMTP
	id 4236437B400; Tue,  5 Dec 2000 05:09:08 -0800 (PST)
Received: from sonic.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP 
          id <g.22127-0@bells.cs.ucl.ac.uk>; Tue, 5 Dec 2000 13:08:51 +0000
From: Orion Hodson <O.Hodson@cs.ucl.ac.uk>
To: freebsd-arch@freebsd.org
Cc: cg@freebsd.org
Subject: soundcard.h
Date: Tue, 05 Dec 2000 13:08:50 +0000
Message-ID: <3737.976021730@cs.ucl.ac.uk>
Sender: O.Hodson@cs.ucl.ac.uk
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


As someone who works with sound quite a bit, it's hard not to notice
that soundcard.h covers several different and sometimes overlapping
interfaces.  I had a go at re-arranging it with an aim to clarifying
what works, what's deprecated in newpcm, clarifying comments about
"what does this do", and putting related items together.  Cameron
suggested it would be a better idea to break out the functionalities
into separate include files, i.e. snd_oss.h, snd_pcm.h, snd_mixer.h,
snd_sequencer.h, etc and have these included from soundcard.h.

Is there any strength of feeling for or against doing this?  It's
a completely aesthetic and very minor undertaking, but I don't mind
doing it if people think it'd be reasonable.

- Orion




From owner-freebsd-arch  Tue Dec  5 13:23:34 2000
From owner-freebsd-arch@FreeBSD.ORG  Tue Dec  5 13:23:31 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.wgate.com (mail.wgate.com [38.219.83.4])
	by hub.freebsd.org (Postfix) with ESMTP id A518237B400
	for <arch@FreeBSD.ORG>; Tue,  5 Dec 2000 13:23:30 -0800 (PST)
Received: from jesup.eng.tvol.net ([10.32.2.26]) by mail.wgate.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
	id Y2Q95H02; Tue, 5 Dec 2000 16:23:34 -0500
Reply-To: Randell Jesup <rjesup@wgate.com>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: Terry Lambert <tlambert@primenet.com>,
	Bosko Milekic <bmilekic@technokratis.com>, arch@FreeBSD.ORG
Subject: Re: zero copy code review
References: <Pine.BSF.4.21.0012031808540.98531-100000@jehovah.technokratis.com>
	<200012042352.QAA12392@usr02.primenet.com>
	<20001204162015.A8051@fw.wintelcom.net>
	<20001204162659.B8051@fw.wintelcom.net>
From: Randell Jesup <rjesup@wgate.com>
Date: 05 Dec 2000 16:30:26 -0500
In-Reply-To: Alfred Perlstein's message of "Mon, 4 Dec 2000 16:26:59 -0800"
Message-ID: <ybuelzmpnd9.fsf@jesup.eng.tvol.net.jesup.eng.tvol.net>
User-Agent: Gnus/5.0807 (Gnus v5.8.7) Emacs/20.7
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Alfred Perlstein <bright@wintelcom.net> writes:
>> Well behaved applications (read: written by me) deal with errors
>> like ENOBUFS properly, what they do is close the socket and
>> commence throttling connections.

        Most user-level applications do not.  Certainly most applications
that call write() don't.

        In fact, a grep of /usr/src shows that of things outside of sys,
only a handful test for ENOBUFS: telnet, sendmail, ipfilter, ntp, natd,
and ping seem to include a test (I didn't check what they do with it
though).
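
        As a rough illustration of what testing for ENOBUFS around write()
could involve, here is a minimal sketch.  The name write_all_retry, the
10 ms backoff and the retry limit are assumptions made for the example;
none of them come from the programs listed above.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/*
 * Write all of buf to fd.  Short writes advance the pointer, EINTR is
 * retried at once, and ENOBUFS is retried after a short sleep, at most
 * max_retries times in a row (a hypothetical policy).  Returns 0 on
 * success, or -1 with errno left as set by the failing write().
 */
int
write_all_retry(int fd, const void *buf, size_t len, int max_retries)
{
	const char *p = buf;
	struct timespec backoff = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	int retries = 0;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n >= 0) {
			/* Progress: consume what was written, reset the count. */
			p += n;
			len -= (size_t)n;
			retries = 0;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno == ENOBUFS && retries++ < max_retries) {
			/* Sleep in user space instead of spinning on the call. */
			nanosleep(&backoff, NULL);
			continue;
		}
		return (-1);
	}
	return (0);
}
```

        Even this small amount of care (a bounded retry with a sleep rather
than a busy loop) is more than most callers of write() bother with.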

>> I would not want my process to be stuck in the kernel waiting
>> for bufferspace that could take quite a long time to get ahold of.

        In many cases you do.  How different is that from waiting on some
other resource that may take a long time, or getting a response to a
write or read across a network?  In fact, I'd assert in most cases waiting
is the appropriate action unless a call is non-blocking.

        Given the very small number of programs that _do_ handle ENOBUFS,
I'd assert that the default action should be to wait, unless the
application has said it wants to hear about them.

>> However I can understand someone wanting a naive process not
>> to get such errors because they may misbehave and do stupid
>> things like busy loop or just abort entirely.

        or fail a complex transaction, etc.  Like 99% of user code out
there when faced with ENOBUFS.

>> Perhaps adding a per-process or per-socket or per-something flag
>> to ask for indefinite blocking (or turn it off) would be a good
>> idea, honestly having it one way or the other isn't very good
>> depending on your application.  I can live with the current
>> situation so I'll leave 'fixing' this to someone who wants
>> the indefinite blocking.

        per-socket makes sense; or keyed off non-blocking mode.  The
default should be wait.

>> Oh, and don't forget, you can't block me indefinitely if I'm
>> writing to a non-blocking socket.  In fact if M_WAIT is set
>> I shouldn't be blocking at all on a non-blocking socket.

        Agreed; even more reason to tie it to non-blocking mode.
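
        A minimal user-space sketch of keying the decision off non-blocking
mode follows.  The helper name shortage_should_wait is hypothetical; the
real decision would be made inside the kernel, and this only shows the
rule being proposed: blocking descriptors wait, O_NONBLOCK ones do not.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Hypothetical policy helper: report whether a buffer shortage on fd
 * should be waited out (1) or reported to the caller at once (0).
 * Returns -1 if the descriptor flags cannot be read.
 */
int
shortage_should_wait(int fd)
{
	int flags;

	assert(fd >= 0);
	flags = fcntl(fd, F_GETFL);
	if (flags == -1)
		return (-1);
	/* Blocking descriptors wait; non-blocking ones never do. */
	return ((flags & O_NONBLOCK) == 0);
}
```

        The appeal of this rule is that it needs no new flag: the application
has already stated whether it is willing to be put to sleep.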

>One more thing, ENOBUFS is indicative of a misconfiguration and
>shouldn't happen in day to day operations, if it does happen then
>the user needs to reconfigure for more buffer space.

        Or it's indicative of a DoS attack (possibly unintentional), or
a load problem, possibly temporary.  I dislike arbitrary tuning parameters.
Generally they're either ignored (mostly), or set wildly high in the hope
that the annoyance someone hit once will go away.  Or just set randomly.  Most
of the people doing the setting don't have a good grasp of why it should be
set to a specific value.  Kind of like putting a spark-advance knob on
the steering wheel (which they once did, believe it or not).

-- 
Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94)
rjesup@wgate.com



From owner-freebsd-arch  Tue Dec  5 13:33:48 2000
From owner-freebsd-arch@FreeBSD.ORG  Tue Dec  5 13:33:41 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from field.videotron.net (field.videotron.net [205.151.222.108])
	by hub.freebsd.org (Postfix) with ESMTP id 38E5737B404
	for <arch@FreeBSD.ORG>; Tue,  5 Dec 2000 13:33:40 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213])
 by field.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8)
 with ESMTP id <0G5400CFJ6K1ZP@field.videotron.net> for arch@FreeBSD.ORG; Tue,  5 Dec 2000 16:33:38 -0500 (EST)
Date: Tue, 05 Dec 2000 16:34:30 -0500 (EST)
From: Bosko Milekic <bmilekic@technokratis.com>
Subject: Re: zero copy code review
In-reply-to: <200012042352.QAA12392@usr02.primenet.com>
To: Terry Lambert <tlambert@primenet.com>
Cc: arch@FreeBSD.ORG
Message-id: <Pine.BSF.4.21.0012051623180.9538-100000@jehovah.technokratis.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


	I don't understand what you're complaining about already. Just set
  kern.ipc.mbuf_wait to 0 and you'll have the behavior you're looking for.
  As for the kernel, people must always keep checking whether their mbuf
  pointer is NULL following any type of allocation and deal with it
  appropriately (return ENOBUFS or drop the packet) until a real global
  preventive measure is put into place (think vm_state a-la PHK, or
  something similar).
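
	The caller-side pattern can be sketched in user space as follows.
  This is only a model under stated assumptions: struct buf, struct pool,
  buf_get(), buf_free() and queue_payload() are hypothetical stand-ins,
  not the mbuf(9) interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NBUFS	2
#define BUFLEN	64

/* Stand-ins for struct mbuf and the mbuf pool; not the kernel types. */
struct buf {
	int	in_use;
	size_t	len;
	char	data[BUFLEN];
};

struct pool {
	struct buf bufs[NBUFS];
};

/* Return a free buffer, or NULL when the pool is exhausted. */
struct buf *
buf_get(struct pool *p)
{
	for (size_t i = 0; i < NBUFS; i++) {
		if (!p->bufs[i].in_use) {
			p->bufs[i].in_use = 1;
			return (&p->bufs[i]);
		}
	}
	return (NULL);
}

/* Hand a buffer back to the pool. */
void
buf_free(struct buf *b)
{
	b->in_use = 0;
}

/*
 * Queue one payload.  The NULL check is the point of the example: the
 * allocation can fail, and the caller turns that into ENOBUFS instead
 * of dereferencing a null pointer.
 */
int
queue_payload(struct pool *p, const char *payload, size_t len,
    struct buf **out)
{
	struct buf *b;

	assert(p != NULL && payload != NULL && out != NULL);
	if (len > BUFLEN)
		return (EMSGSIZE);
	b = buf_get(p);
	if (b == NULL)
		return (ENOBUFS);
	memcpy(b->data, payload, len);
	b->len = len;
	*out = b;
	return (0);
}
```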
  	Changing the behavior of M_WAIT to not return NULL ever is out of the
  question. I don't need to explain myself again. If you want to test this
  theory, try lowering NMBCLUSTERS (so it's easy to exhaust mb_map),
  heavily load your network (from the outside) and tune your
  kern.ipc.mbuf_wait accordingly, so that `netstat -m' shows "requests
  for memory delayed" > "requests for memory denied." This should give you
  a roughly optimal wait time for heavy network load. This is your "normal
  wait time." Now load your system with some local DoS (allocate very large
  socket buffers in a tight loop, for example) and watch your system
  effectively deadlock, and then see how glad you are that your process
  isn't hanging in the kernel indefinitely and how ^C eventually does its
  job and kills the process. Then watch your system recover. Now set
  mbuf_wait to 0 and run the same test. Have fun running fsck after the
  cold boot.
	Just about the only thing that may be considered is changing the
  name of M_WAIT to something more appropriate, if it means so much to the
  majority of people (honestly, I would find even doing this a waste of
  time, but if lots of folks think it's worth educating kernel developers
  by changing the name of a flag, then we might as well). 

On Mon, 4 Dec 2000, Terry Lambert wrote:

> > > [ ... local DOS ... ]
> > > 
> > > I really don't buy a probability defense.  If a probability defense
> > > were acceptable, then not checking for a NULL return, and eating
> > > the panic that results is also acceptable.
> > 
> > 	It's not a "probability defense." It's not a "defense." It's just a
> >   "don't act the worst way possible when we have an attack." And you
> >   haven't said at all why waiting indefinitely is better than not,
> >   especially in the problematic situation I brought up.
> 
> The situation you quote is one where the allocation fails,
> instead of WAITing until it can complete successfully, and
> this results in the kernel function failing and state being
> undone back to the point where the user space call that was
> the originator of the request fails back to user space.  Then
> the user space code has to handle the failure.
> 
> I maintain that the most reasonable and logical thing for the
> user space program to do, on seeing this failure (ENOBUF?),
> is to retry the operation.
> 
> So it calls down again, and fails again, and you have
> substituted a busy loop which crosses protection domains
> twice, for a kernel sleep.  This is the best case.
> 
> The worst case is that the local DOS obtains yet more
> resources when the state is backed out,  and the busy
> loop path in the kernel becomes shorter, due to an
> earlier failure for lack of resources.
> 
> In neither case does failing the allocation instead of
> sleeping do _anything at all_ to address the root cause
> of the problem, nor does the failure result in the problem
> going away or being lessened.
> 
> So I really don't see what is being accomplished by failing
> the allocation, rather than sleeping, except to use up
> _extra_ resources, during a time of resource starvation, to
> enforce the mbuf_wait interval.
> 
> 
> > > The problem with this theory is that "have the [non-offending]
> > > process return from the kernel and deal with the temporary failure"
> > > presumes that there is a correct way to work around the failure in
> > > user space.
> > 
> > 	No, it doesn't. But it's better for the process to sleep in user
> >   space than to be INDEFINITELY stuck in the kernel. And, in the case of an
> >   attack, it _will_ be indefinitely stuck.
> 
> Why the heck would the process sleep in user space?!?  It has
> work to do, it knows the call to make to do the work, and it
> will make the call repeatedly, until it's context switched,
> or until the call succeeds.  This is just like a write loop
> on a large buffer, subtracting out the write() return value
> and advancing the buffer pointer, until everything has been
> written.  You might argue that a "correctly" written user
> space program would use a select loop, but I'm betting that
> the descriptor will show as writeable, even if there aren't
> any mbufs available to accept the write; there's no way to
> make the write select accurate, without pre-reserving memory
> to accept the write.
> 
> Personally, I would prefer, under DOS conditions, that my
> program be stuck in kernel space, so that it at least has a
> small chance of getting work done slowly during a DOS, than
> stuck in user space.  You can be sure that the DOS process
> is not going to be nearly as polite in hanging around in user
> space until kernel resources are freed up.
> 
> 
> > > I would maintain that the failure would be persistent, since this
> > > does nothing to silence the DOS attack, and there is nothing that
> > > a user space program can do, except to retry, and get all the way
> > > down the code path to the same place that it was before.
> > 
> > 	Right. It's not a preventive measure. But, it's much better to have
> >   it act in this manner than wait indefinitely "in the case of."
> 
> I strongly disagree.  That's "``in the case of'' being able to
> get work done, despite the DOS".  Hung in user space is the
> same as hung in the kernel: your process is not doing useful
> work.
> 
> Making it easier for the DOS to get yet more resources during
> a period of resource starvation, and preventing other programs
> from competing with the DOS for resources freed by timeout or
> other mechanism, which takes them back from the DOS, seems like
> a big mistake to me.  I would much rather have a system that I
> can normally talk to in a few seconds be capable of being talked
> to over a period of 10 minutes, than one I can't talk to at all;
> wouldn't you?
> 
> 
> > > It seems to me that this is just a case of how big you want to
> > > make your retry loop, not one of whether or not there will be a
> > > retry loop.
> > 
> > 	The retry loop is _useless_. You drop the mutex and lose priority in
> >   the wait queue when you return from m_get(). Calling again makes your
> >   chances of getting an mbuf in a shortage even less probable. If you want
> >   that behavior, just tweak your kern.ipc.mbuf_wait.
> 
> This is actually the opposite of the effect you would want.  A
> well behaved process denied a scarce resource should be first in
> line for that resource.  Saying "I can't give you one because
> there's this pig of a process, but I'll tell you what I'll do:
> why don't you just piss off until the next millennium?" is no way
> to encourage well behaved processes... 8-).
> 
> 
> > [...]
> > > I would argue that this level of congestion should be proactively
> > > prohibited from occurring in the first place; the most likely way
> > > to do this correctly is to start "dropping" the oldest datagrams,
> > > NOT returning "NULL" to allocations made on behalf of telnetd or
> > > sshd from the local interface.
> > 
> > 	This is really a great block of theory. I only wish that people with
> >   such a passion to argue the methods would work in actually implementing
> >   them.
> 
> The code which implements "source quench" could be abused to
> provide this functionality at the queue bottom, where things
> are packing up in the ICMP echo datagram case (as one example).
> 
> 
> > 	It's not. It never was. It never will be. It's just better than
> >   waiting indefinitely. It still provides you with the ability to wait
> >   indefinitely, though, if you are incapable of understanding why it's
> >   better not to.
> 
> Explain it to me: why is it better to not wait?  When I see the
> error return from the low memory condition, am I supposed to shut
> myself down, disabling apache, for example?  Is _everyone_
> supposed to do the same thing, until there is nothing but the DOS
> process running on the system?
> 
> What does me failing buy _me_?
> 
> How is this different than me waiting on _any_ contended resource,
> instead of timing out, like an advisory lock on a file?
> 
> 
> > > As a general bone of contention, if the thing _doesn't_ wait, it
> > > shouldn't be called M_WAIT, it should be called M_TRY_HARDER or
> > > something that indicates that the default behaviour has been
> > > altered, but in fact the routine will not be waiting around until
> > > it is successful, like all of the other _WAIT flags imply.
> > 
> > 	It _does_ wait, and I disagree. By that logic, why not rename all the
> >   _WAITs with _WAIT_INDEF? If you're curious about what M_WAIT does, you
> >   can either read the code (hey, it is free!) or read the mbuf(9) man page
> >   (now available in -CURRENT).
> 
> It waits until it doesn't, you mean.  8-p.
> 
> 
> 					Terry Lambert
> 					terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.

  Regards,
  Bosko Milekic
  bmilekic@technokratis.com



From owner-freebsd-arch  Tue Dec  5 19:11:51 2000
From owner-freebsd-arch@FreeBSD.ORG  Tue Dec  5 19:11:50 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id 78A1E37B400
	for <freebsd-arch@FreeBSD.org>; Tue,  5 Dec 2000 19:11:49 -0800 (PST)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.11.1/8.11.1) with SMTP id eB63Bmf96881
	for <freebsd-arch@FreeBSD.org>; Tue, 5 Dec 2000 22:11:48 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Date: Tue, 5 Dec 2000 22:11:48 -0500 (EST)
From: Robert Watson <rwatson@FreeBSD.org>
X-Sender: robert@fledge.watson.org
To: freebsd-arch@FreeBSD.org
Subject: Threads in the base system
Message-ID: <Pine.NEB.3.96L.1001205220943.95872B-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: robert@fledge.watson.org
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Recently, pppctl was made thread-enabled, meaning that it relies on
libc_r.  This means that NOLIBC_R can no longer be used with buildworld.
Given that making pppctl depend on !NOLIBC_R may not be all that helpful,
it looks like we may need to lose NOLIBC_R.  Presumably over time, threads
in default system applications will only become more popular.  Any
thoughts (especially in light of upcoming KSE changes, which will make
threading integral to the system architecture)?

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services



From owner-freebsd-arch  Tue Dec  5 19:14:43 2000
From owner-freebsd-arch@FreeBSD.ORG  Tue Dec  5 19:14:40 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP id E981437B698
	for <arch@FreeBSD.ORG>; Tue,  5 Dec 2000 19:14:32 -0800 (PST)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id UAA07508;
	Tue, 5 Dec 2000 20:11:08 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp05.primenet.com, id smtpdAAArtaGNo; Tue Dec  5 20:11:02 2000
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id UAA24462;
	Tue, 5 Dec 2000 20:14:18 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200012060314.UAA24462@usr05.primenet.com>
Subject: Re: zero copy code review
To: bmilekic@technokratis.com (Bosko Milekic)
Date: Wed, 6 Dec 2000 03:14:18 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert), arch@FreeBSD.ORG
In-Reply-To: <Pine.BSF.4.21.0012051623180.9538-100000@jehovah.technokratis.com> from "Bosko Milekic" at Dec 05, 2000 04:34:30 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: tlambert@usr05.primenet.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> 	I don't understand what you're complaining about already. Just set
>   kern.ipc.mbuf_wait to 0 and you'll have the behavior you're looking for.

I am looking for semantics, not behaviour.  The difference is the
cost that I end up paying.

>   As for the kernel, people must always keep checking whether their mbuf
>   pointer is NULL following any type of allocation and deal with it
>   appropriately (return ENOBUFS or drop the packet) until a real global
>   preventive measure is put into place (think vm_state a-la PHK, or
>   something similar).

This otherwise unnecessary checking is what I'm complaining
about.  I don't like it in my code path, it slows things down
unnecessarily.


>   	Changing the behavior of M_WAIT to not return NULL ever is out of the
>   question.

You mean "changing it back", of course...

>   I don't need to explain myself again. If you want to test this
>   theory, try lowering NMBCLUSTERS (so it's easy to exhaust mb_map),
>   heavily load your network (from the outside) and tune your
>   kern.ipc.mbuf_wait accordingly, so that `netstat -m' shows "requests
>   for memory delayed" > "requests for memory denied." This should give you
>   a roughly optimal wait time for heavy network load. This is your "normal
>   wait time."

I see you attempting to tune a pool entry rate in order to deal
with a pool retention time for something that I don't think
should be in a hysteresis loop in the first place.

>   Now load your system with some local DoS (allocate very large
>   socket buffers in a tight loop, for example) and watch your system
>   effectively deadlock, and then see how glad you are that your process
>   isn't hanging in the kernel indefinitely and how ^C eventually does its
>   job and kills the process. Then watch your system recover.

I guess you are talking about interrupting the DOS program, and
not some victim program here, right? 

I think that this is a really artificial test case.  I think a
real test case would be to start the DOS program (nb: I don't
let shell users on my servers anyway), and then start a different
(victim) program, and watch what happens to the different program.

I can ^C the victim program, in your scenario, but my system
will fail to recover, and will remain unusable.  My system
really needs to set working set limitations on how many resources
a single process is permitted to monopolize under low resource
conditions.  This would let my victim program continue to run,
if sluggishly, and prevent a single DOS from doing more than
slowing down my system.
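
A toy model of that kind of limit might look like the following.  It is
a sketch only: the quota_pool structure, the qp_* functions and every
constant are made up for illustration, and a real mechanism would live
in the kernel allocator and account per process.

```c
#include <assert.h>
#include <stddef.h>

#define NBUFS		16	/* total buffers in the pool */
#define NOWNERS		4	/* owners (processes) being tracked */
#define LOW_WATER	4	/* pool counts as "low" at or below this */
#define LOW_SHARE	2	/* per-owner cap while the pool is low */

struct quota_pool {
	int	free;		/* buffers not currently handed out */
	int	held[NOWNERS];	/* buffers held by each owner */
};

void
qp_init(struct quota_pool *qp)
{
	qp->free = NBUFS;
	for (size_t i = 0; i < NOWNERS; i++)
		qp->held[i] = 0;
}

/*
 * Take one buffer for owner.  Returns 0 on success, or -1 if the pool
 * is empty or if the pool is low and this owner already holds its
 * low-resource share.
 */
int
qp_alloc(struct quota_pool *qp, int owner)
{
	assert(owner >= 0 && owner < NOWNERS);
	if (qp->free == 0)
		return (-1);
	if (qp->free <= LOW_WATER && qp->held[owner] >= LOW_SHARE)
		return (-1);
	qp->free--;
	qp->held[owner]++;
	return (0);
}

/* Give one buffer held by owner back to the pool. */
void
qp_release(struct quota_pool *qp, int owner)
{
	assert(owner >= 0 && owner < NOWNERS && qp->held[owner] > 0);
	qp->held[owner]--;
	qp->free++;
}
```

With these numbers a single hog is cut off after 12 buffers, and a
second process can still obtain 2 of the remaining 4, so it keeps
running, if sluggishly.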

I think that you are maybe rendering the program interruptible
the wrong way, and using the failure path on the allocation to
back out the stack state leading up to the allocation attempt,
as a convenience?

> 	Just about the only thing that may be considered is changing the
>   name of M_WAIT to something more appropriate, if it means so much to the
>   majority of people (honestly, I would find even doing this a waste of
>   time, but if lots of folks think it's worth educating kernel developers
>   by changing the name of a flag, then we might as well). 

If the semantics don't revert to their pre-timeout behaviour,
I think it really would be best to have a meaningful name for it.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.



From owner-freebsd-arch  Tue Dec  5 19:20:15 2000
From owner-freebsd-arch@FreeBSD.ORG  Tue Dec  5 19:20:14 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id AC0D837B401; Tue,  5 Dec 2000 19:20:13 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id WAA00210;
	Tue, 5 Dec 2000 22:19:52 -0500 (EST)
Date: Tue, 5 Dec 2000 22:19:52 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Robert Watson <rwatson@FreeBSD.ORG>
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: Threads in the base system
In-Reply-To: <Pine.NEB.3.96L.1001205220943.95872B-100000@fledge.watson.org>
Message-ID: <Pine.SUN.3.91.1001205221719.29725A@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Tue, 5 Dec 2000, Robert Watson wrote:
> Recently, pppctl was made thread-enabled, meaning that it relies on
> libc_r.  This means that NOLIBC_R can no longer be used with buildworld.
> Given that making pppctl depend on !NOLIBC_R may not be all that helpful,
> it looks like we may need to lose NOLIBC_R.  Presumably over time, threads
> in default system applications will only become more popular.  Any
> thoughts (especially in light of upcoming KSE changes, which will make
> threading integral to the system architecture)?

OK, lose NOLIBC_R -- not that I'm biased or anything ;-)

-- 
Dan Eischen



From owner-freebsd-arch  Wed Dec  6  5:22:29 2000
From owner-freebsd-arch@FreeBSD.ORG  Wed Dec  6 05:22:27 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from www.stocke.com (unknown [202.101.165.60])
	by hub.freebsd.org (Postfix) with ESMTP id 6110837B400
	for <freebsd-arch@FreeBSD.org>; Wed,  6 Dec 2000 05:22:19 -0800 (PST)
Received: from xyf ([61.164.185.75])
	by www.stocke.com (8.9.3/8.9.3) with SMTP id VAA07952
	for <freebsd-arch@FreeBSD.org>; Wed, 6 Dec 2000 21:25:09 +0800
Message-ID: <000f01c05f87$7406cbc0$5ac809c0@xyf>
From: "xuyifeng" <xyf@stocke.com>
To: <freebsd-arch@FreeBSD.org>
References: <Pine.NEB.3.96L.1001205220943.95872B-100000@fledge.watson.org>
Subject: Re: Threads in the base system
Date: Wed, 6 Dec 2000 21:21:24 +0800
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2615.200
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Does this mean that we will have only libc_r.so and libc_r.a in a
FreeBSD 5.0 system?  Can we remove libc.so and libc.a, and let the
system default to multi-threaded?

I know that if I mix msvcrt.dll (DLL version) and libcmt.lib (static
library) in an M$ Visual C++ program, memory will be corrupted
(sometimes I cannot avoid the problem because of using a third-party
library).  Is the same true on FreeBSD if I mix libc and libc_r in the
same program?

Regards,
XuYifeng

----- Original Message ----- 
From: Robert Watson <rwatson@FreeBSD.org>
To: <freebsd-arch@FreeBSD.org>
Sent: Wednesday, December 06, 2000 11:11 AM
Subject: Threads in the base system


> 
> Recently, pppctl was made thread-enabled, meaning that it relies on
> libc_r.  This means that NOLIBC_R can no longer be used with buildworld.
> Given that making pppctl depend on !NOLIBC_R may not be all that helpful,
> it looks like we may need to lose NOLIBC_R.  Presumably over time, threads
> in default system applications will only become more popular.  Any
> thoughts (especially in light of upcoming KSE changes, which will make
> threading integral to the system architecture)?
> 
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
> robert@fledge.watson.org      NAI Labs, Safeport Network Services



From owner-freebsd-arch  Wed Dec  6 13:37:19 2000
From owner-freebsd-arch@FreeBSD.ORG  Wed Dec  6 13:37:17 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from Awfulhak.org (tun.AwfulHak.org [194.242.139.173])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4A1C737B401; Wed,  6 Dec 2000 13:37:15 -0800 (PST)
Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12])
	by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6LZLm16231;
	Wed, 6 Dec 2000 21:35:21 GMT
	(envelope-from brian@hak.lan.Awfulhak.org)
Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1])
	by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6Lc6t07375;
	Wed, 6 Dec 2000 21:38:06 GMT
	(envelope-from brian@hak.lan.Awfulhak.org)
Message-Id: <200012062138.eB6Lc6t07375@hak.lan.Awfulhak.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: Robert Watson <rwatson@FreeBSD.org>
Cc: freebsd-arch@FreeBSD.org, brian@Awfulhak.org
Subject: Re: Threads in the base system 
In-Reply-To: Message from Robert Watson <rwatson@FreeBSD.org> 
   of "Tue, 05 Dec 2000 22:11:48 EST." <Pine.NEB.3.96L.1001205220943.95872B-100000@fledge.watson.org> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 06 Dec 2000 21:38:06 +0000
From: Brian Somers <brian@Awfulhak.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Good spot.  I believe NOLIBC_R should/must go.

> Recently, pppctl was made thread-enabled, meaning that it relies on
> libc_r.  This means that NOLIBC_R can no longer be used with buildworld.
> Given that making pppctl depend on !NOLIBC_R may not be all that helpful,
> it looks like we may need to lose NOLIBC_R.  Presumably over time, threads
> in default system applications will only become more popular.  Any
> thoughts (especially in light of upcoming KSE changes, which will make
> threading integral to the system architecture)?
> 
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
> robert@fledge.watson.org      NAI Labs, Safeport Network Services

-- 
Brian <brian@Awfulhak.org>                        <brian@[uk.]FreeBSD.org>
      <http://www.Awfulhak.org>                   <brian@[uk.]OpenBSD.org>
Don't _EVER_ lose your sense of humour !



From owner-freebsd-arch  Wed Dec  6 13:37:30 2000
From owner-freebsd-arch@FreeBSD.ORG  Wed Dec  6 13:37:26 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from Awfulhak.org (tun.AwfulHak.org [194.242.139.173])
	by hub.freebsd.org (Postfix) with ESMTP id 8A20737B400
	for <freebsd-arch@FreeBSD.org>; Wed,  6 Dec 2000 13:37:23 -0800 (PST)
Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12])
	by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6LYFm16217;
	Wed, 6 Dec 2000 21:34:15 GMT
	(envelope-from brian@hak.lan.Awfulhak.org)
Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1])
	by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6Lb0t07362;
	Wed, 6 Dec 2000 21:37:00 GMT
	(envelope-from brian@hak.lan.Awfulhak.org)
Message-Id: <200012062137.eB6Lb0t07362@hak.lan.Awfulhak.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: "xuyifeng" <xyf@stocke.com>
Cc: freebsd-arch@FreeBSD.org, brian@Awfulhak.org
Subject: Re: Threads in the base system 
In-Reply-To: Message from "xuyifeng" <xyf@stocke.com> 
   of "Wed, 06 Dec 2000 21:21:24 +0800." <000f01c05f87$7406cbc0$5ac809c0@xyf> 
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
Date: Wed, 06 Dec 2000 21:37:00 +0000
From: Brian Somers <brian@Awfulhak.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I don't think it's possible to mix libc and libc_r in the same =

program (except as intended - with the libc_r stubs superseding the =

libc ones).

There are no ``alternative'' header files with different defines that =

might be used in one object and not the other, and the program is =

either linked against libc_r, or isn't (and will fail if it's got =

thread references).

I won't comment on Microsoft's shared library implementation.

> does this mean that we will have only libc_r.so and libc_r.a in FreeBSD 5.0 system?
> can we remove libc.so and libc.a?  let the system to default multi-threaded enable?
> I know if I mix msvcrt.dll(DLL version) and libcmt.lib (static library) in M$ visual C++
> program,  memory will be corrupted,  (sometimes I can not avoid the problem because
> of using third party library),  is it true on FreeBSD if I mix using libc and libc_r
> in same program?
>
> Regards,
> XuYifeng
>
> ----- Original Message -----
> From: Robert Watson <rwatson@FreeBSD.org>
> To: <freebsd-arch@FreeBSD.org>
> Sent: Wednesday, December 06, 2000 11:11 AM
> Subject: Threads in the base system
>
>
> >
> > Recently, pppctl was made thread-enabled, meaning that it relies on
> > libc_r.  This makes the NOLIBC_R cannot be used with buildworld anymore.
> > Given that making pppctl depend on !NOLIBC_R may not be all that helpful,
> > it looks like we may need to lose NOLIBC_R.  Presumably over time, threads
> > in default system applications will only become more popular.  Any
> > thoughts (especially in light of upcoming KSE changes, which will make
> > threading integral to the system architecture)?
> >
> > Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
> > robert@fledge.watson.org      NAI Labs, Safeport Network Services

-- 
Brian <brian@Awfulhak.org>                        <brian@[uk.]FreeBSD.org>
      <http://www.Awfulhak.org>                   <brian@[uk.]OpenBSD.org>
Don't _EVER_ lose your sense of humour !


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Dec  6 13:51: 1 2000
From owner-freebsd-arch@FreeBSD.ORG  Wed Dec  6 13:50:58 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id 78C1337B400; Wed,  6 Dec 2000 13:50:57 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id QAA14771;
	Wed, 6 Dec 2000 16:50:30 -0500 (EST)
Date: Wed, 6 Dec 2000 16:50:29 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Brian Somers <brian@Awfulhak.org>
Cc: Robert Watson <rwatson@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG,
	brian@Awfulhak.org
Subject: Re: Threads in the base system 
In-Reply-To: <200012062138.eB6Lc6t07375@hak.lan.Awfulhak.org>
Message-ID: <Pine.SUN.3.91.1001206164314.13505A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 6 Dec 2000, Brian Somers wrote:
> Good spot.  I believe NOLIBC_R should/must go.

I was just [re]thinking about this.  When we get libpthread (work
has just started on this), then libc_r will eventually go away.
It's not clear yet whether libpthread will exist as a separate
entity or whether it will evolve from libc_r.  It's possible
that NOLIBC_R might actually become the default.

> > Recently, pppctl was made thread-enabled, meaning that it relies on
> > libc_r.  This makes the NOLIBC_R cannot be used with buildworld anymore.
> > Given that making pppctl depend on !NOLIBC_R may not be all that helpful,
> > it looks like we may need to lose NOLIBC_R.  Presumably over time, threads
> > in default system applications will only become more popular.  Any
> > thoughts (especially in light of upcoming KSE changes, which will make
> > threading integral to the system architecture)?
> > 
> > Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
> > robert@fledge.watson.org      NAI Labs, Safeport Network Services

-- 
Dan Eischen


From owner-freebsd-arch  Wed Dec  6 14: 3: 2 2000
From owner-freebsd-arch@FreeBSD.ORG  Wed Dec  6 14:02:59 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from Awfulhak.org (tun.AwfulHak.org [194.242.139.173])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4870037B400; Wed,  6 Dec 2000 14:02:52 -0800 (PST)
Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12])
	by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6LvHm16334;
	Wed, 6 Dec 2000 21:57:17 GMT
	(envelope-from brian@hak.lan.Awfulhak.org)
Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1])
	by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6M01t07697;
	Wed, 6 Dec 2000 22:00:01 GMT
	(envelope-from brian@hak.lan.Awfulhak.org)
Message-Id: <200012062200.eB6M01t07697@hak.lan.Awfulhak.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: Daniel Eischen <eischen@vigrid.com>
Cc: Brian Somers <brian@Awfulhak.org>,
	Robert Watson <rwatson@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG,
	brian@Awfulhak.org
Subject: Re: Threads in the base system 
In-Reply-To: Message from Daniel Eischen <eischen@vigrid.com> 
   of "Wed, 06 Dec 2000 16:50:29 EST." <Pine.SUN.3.91.1001206164314.13505A-100000@pcnet1.pcnet.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 06 Dec 2000 22:00:01 +0000
From: Brian Somers <brian@Awfulhak.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> On Wed, 6 Dec 2000, Brian Somers wrote:
> > Good spot.  I believe NOLIBC_R should/must go.
> 
> I was just [re]thinking about this.  When we get libpthread (work
> has just started on this), then libc_r will eventually go away.
> It's not clear yet whether libpthread will exist as a separate
> entity or whether it will evolve from libc_r.  It's possible
> that NOLIBC_R might actually become the default.

We should really be advocating using threads in the base system 
rather than discouraging it (well, of course that's my view :-).
I suspect however that to most people, libc_r is just some extra 
buildworld overhead....

I've already cast my vote, and can't see any strong argument not to 
remove NOLIBC_R (especially now that it breaks world :-)

> > > Recently, pppctl was made thread-enabled, meaning that it relies on
> > > libc_r.  This makes the NOLIBC_R cannot be used with buildworld anymore.
> > > Given that making pppctl depend on !NOLIBC_R may not be all that helpful,
> > > it looks like we may need to lose NOLIBC_R.  Presumably over time, threads
> > > in default system applications will only become more popular.  Any
> > > thoughts (especially in light of upcoming KSE changes, which will make
> > > threading integral to the system architecture)?
> > > 
> > > Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
> > > robert@fledge.watson.org      NAI Labs, Safeport Network Services
> 
> -- 
> Dan Eischen

-- 
Brian <brian@Awfulhak.org>                        <brian@[uk.]FreeBSD.org>
      <http://www.Awfulhak.org>                   <brian@[uk.]OpenBSD.org>
Don't _EVER_ lose your sense of humour !


From owner-freebsd-arch  Wed Dec  6 14:36:51 2000
From owner-freebsd-arch@FreeBSD.ORG  Wed Dec  6 14:36:49 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.242])
	by hub.freebsd.org (Postfix) with ESMTP
	id AE95737B401; Wed,  6 Dec 2000 14:36:49 -0800 (PST)
Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30])
	by palrel1.hp.com (Postfix) with ESMTP
	id 678A589D; Wed,  6 Dec 2000 14:36:28 -0800 (PST)
Received: from cup.hp.com (p1000180.nsr.hp.com [15.109.0.180])
	by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id OAA07831;
	Wed, 6 Dec 2000 14:36:28 -0800 (PST)
Sender: marcel@cup.hp.com
Message-ID: <3A2EBF6B.90BA100B@cup.hp.com>
Date: Wed, 06 Dec 2000 14:36:27 -0800
From: Marcel Moolenaar <marcel@cup.hp.com>
Organization: Hewlett-Packard
X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Brian Somers <brian@Awfulhak.org>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Robert Watson <rwatson@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
Subject: Re: Threads in the base system
References: <200012062200.eB6M01t07697@hak.lan.Awfulhak.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Brian Somers wrote:
> 
> I've already cast my vote, and can't see any strong argument not to
> remove NOLIBC_R (especially now that it breaks world :-)

I think Daniel just gave a good reason: we may need it in the future.

Isn't it better at this time to keep the NOLIBC_R, but to promote it to
an internal tweak?

-- 
Marcel Moolenaar
  mail: marcel@cup.hp.com / marcel@FreeBSD.org
  tel:  (408) 447-4222


From owner-freebsd-arch  Wed Dec  6 14:48:43 2000
From owner-freebsd-arch@FreeBSD.ORG  Wed Dec  6 14:48:40 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP id 130DE37B400
	for <freebsd-arch@FreeBSD.ORG>; Wed,  6 Dec 2000 14:48:40 -0800 (PST)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id PAA10597;
	Wed, 6 Dec 2000 15:45:15 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp05.primenet.com, id smtpdAAAXDaqPu; Wed Dec  6 15:45:07 2000
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id PAA25916;
	Wed, 6 Dec 2000 15:48:27 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200012062248.PAA25916@usr08.primenet.com>
Subject: Re: Threads in the base system
To: brian@Awfulhak.org (Brian Somers)
Date: Wed, 6 Dec 2000 22:48:27 +0000 (GMT)
Cc: xyf@stocke.com (xuyifeng), freebsd-arch@FreeBSD.ORG,
	brian@Awfulhak.org
In-Reply-To: <200012062137.eB6Lb0t07362@hak.lan.Awfulhak.org> from "Brian Somers" at Dec 06, 2000 09:37:00 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: tlambert@usr08.primenet.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> I don't think it's possible to mix libc and libc_r in the same 
> program (except as intended - with the libc_r stubs superseding the 
> libc ones).
> 
> There are no ``alternative'' header files with different defines that 
> might be used in one object and not the other, and the program is 
> either linked against libc_r, or isn't (and will fail if it's got 
> thread references).

From my reading, there is still code in header files that is
compiled differently depending on _THREAD.  This means that calling
libraries compiled with threading disabled from code that
was compiled with threading enabled _may_ result in undefined
behaviour (I haven't tracked down every instance, and I think
an audit would be needed to know for sure).
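
To make the concern above concrete, here is a minimal sketch of what
conditionally compiled header content could do.  Everything in it is
hypothetical: the two structs stand for one declaration as seen by an
object built without the thread macro and by one built with it, and
none of the names come from FreeBSD's actual headers.

```c
#include <stddef.h>

/* The declaration as an object compiled WITHOUT the thread macro sees it. */
struct ex_stream_plain {
	int	flags;
	int	fd;
};

/* The same declaration as an object compiled WITH the thread macro sees it
 * (hypothetical extra member guarded by the macro). */
struct ex_stream_threaded {
	int	flags;
	long	lock;		/* only present in the threaded variant */
	int	fd;
};

/*
 * Returns 1 if the two views disagree about where 'fd' lives.  If they do,
 * passing such a struct between the two kinds of objects is undefined
 * behaviour: each side reads 'fd' from a different offset.
 */
int
ex_layout_mismatch(void)
{
	return (offsetof(struct ex_stream_plain, fd) !=
	    offsetof(struct ex_stream_threaded, fd));
}
```

Whether any real header actually differs in this way is exactly what
the audit would have to establish; the sketch only shows why it would
matter.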

In general, it's possible to set up an "apartment" or "rental"
model threading interface to wrap such libraries to make sure
things work.  Work has to be queued for a worker thread, and
the worker thread does the work and queues the response.  Only
the worker thread can be allowed into the library.  This is
basically how you have to use the thread-unsafe LDAP libraries
on Windows (or any system that has thread local storage that
is not mapped into the global process address space -- what a
design mistake).  This assumes that with or without _THREAD,
the code doesn't change, though...
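
A rough sketch of the worker-thread arrangement described above, using
POSIX threads.  It is an illustration under stated assumptions, not
code from any real library: unsafe_lookup() is a made-up stand-in for
a thread-unsafe library call, the one-slot mailbox is the simplest
possible "queue", and all ap_* names are invented here.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a call into a thread-unsafe library. */
static int
unsafe_lookup(int key)
{
	static int last;	/* unprotected static state */

	last = key * 2;
	return (last);
}

enum ap_state { AP_IDLE, AP_PENDING, AP_DONE, AP_SHUTDOWN };

/* One-slot mailbox shared by callers and the worker. */
static struct {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	enum ap_state	state;
	int		arg;
	int		result;
} box = {
	.mtx = PTHREAD_MUTEX_INITIALIZER,
	.cv = PTHREAD_COND_INITIALIZER,
	.state = AP_IDLE
};

static pthread_t ap_tid;

/* The worker is the only thread that ever enters the library. */
static void *
ap_worker(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&box.mtx);
	for (;;) {
		while (box.state != AP_PENDING && box.state != AP_SHUTDOWN)
			pthread_cond_wait(&box.cv, &box.mtx);
		if (box.state == AP_SHUTDOWN)
			break;
		box.result = unsafe_lookup(box.arg);
		box.state = AP_DONE;	/* queue the response */
		pthread_cond_broadcast(&box.cv);
	}
	pthread_mutex_unlock(&box.mtx);
	return (NULL);
}

int
ap_start(void)
{
	return (pthread_create(&ap_tid, NULL, ap_worker, NULL));
}

/* Callable from any thread: queue the work, then wait for the reply. */
int
ap_call(int arg)
{
	int r;

	pthread_mutex_lock(&box.mtx);
	while (box.state != AP_IDLE)	/* wait for the slot to be free */
		pthread_cond_wait(&box.cv, &box.mtx);
	box.arg = arg;
	box.state = AP_PENDING;
	pthread_cond_broadcast(&box.cv);
	while (box.state != AP_DONE)
		pthread_cond_wait(&box.cv, &box.mtx);
	r = box.result;
	box.state = AP_IDLE;		/* release the slot */
	pthread_cond_broadcast(&box.cv);
	pthread_mutex_unlock(&box.mtx);
	return (r);
}

void
ap_stop(void)
{
	pthread_mutex_lock(&box.mtx);
	while (box.state != AP_IDLE)
		pthread_cond_wait(&box.cv, &box.mtx);
	box.state = AP_SHUTDOWN;
	pthread_cond_broadcast(&box.cv);
	pthread_mutex_unlock(&box.mtx);
	pthread_join(ap_tid, NULL);
}
```

Callers would use ap_start() once, then ap_call() from any thread, and
ap_stop() at exit.  A real wrapper would want a proper queue rather
than a single slot, but the rule is the same: only the worker touches
the library.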

I guess the real question is, if you were to rename libc, so
that things couldn't link against it, modify the libc_r to
include a linkage against the renamed library so it pulls in
things it doesn't define from libc instead, and then make
symlinks from libc to point to libc_r instead, would things
still work, or are there some things that would break?

As far as eating the threading overhead in unthreaded
programs, the decision to eat the overhead has already been
taken; it happened when EGCS became the default compiler,
since EGCS doesn't support dynamic registration of threads
support code (e.g. per thread exception stacks in C++ via
libgcc).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


From owner-freebsd-arch  Wed Dec  6 14:53:44 2000
From owner-freebsd-arch@FreeBSD.ORG  Wed Dec  6 14:53:42 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from gw.nectar.com (gw.nectar.com [208.42.49.153])
	by hub.freebsd.org (Postfix) with ESMTP
	id D00BA37B698; Wed,  6 Dec 2000 14:53:35 -0800 (PST)
Received: by gw.nectar.com (Postfix, from userid 1001)
	id 9FF94193E1; Wed,  6 Dec 2000 16:53:34 -0600 (CST)
Date: Wed, 6 Dec 2000 16:53:34 -0600
From: "Jacques A. Vidrine" <n@nectar.com>
To: Daniel Eischen <eischen@vigrid.com>
Cc: Brian Somers <brian@Awfulhak.org>,
	Robert Watson <rwatson@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
Subject: Re: Threads in the base system
Message-ID: <20001206165334.D64011@spawn.nectar.com>
Mail-Followup-To: "Jacques A. Vidrine" <n@nectar.com>,
	Daniel Eischen <eischen@vigrid.com>,
	Brian Somers <brian@Awfulhak.org>,
	Robert Watson <rwatson@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
References: <200012062138.eB6Lc6t07375@hak.lan.Awfulhak.org> <Pine.SUN.3.91.1001206164314.13505A-100000@pcnet1.pcnet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.SUN.3.91.1001206164314.13505A-100000@pcnet1.pcnet.com>; from eischen@vigrid.com on Wed, Dec 06, 2000 at 04:50:29PM -0500
X-Url: http://www.nectar.com/
Sender: nectar@nectar.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, Dec 06, 2000 at 04:50:29PM -0500, Daniel Eischen wrote:
> I was just [re]thinking about this.  When we get libpthread (work
> has just started on this), then libc_r will eventually go away.
> It's not clear yet whether libpthread will exist as a separate
> entity or whether it will evolve from libc_r.  

For the ignorant (me), what is/will be the difference between libc_r and
libpthread? 

Cheers,
-- 
Jacques Vidrine / n@nectar.com / jvidrine@verio.net / nectar@FreeBSD.org


From owner-freebsd-arch  Wed Dec  6 15:22:31 2000
From owner-freebsd-arch@FreeBSD.ORG  Wed Dec  6 15:22:29 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from Awfulhak.org (tun.AwfulHak.org [194.242.139.173])
	by hub.freebsd.org (Postfix) with ESMTP
	id 26A9D37B400; Wed,  6 Dec 2000 15:22:27 -0800 (PST)
Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12])
	by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6NJ3m16686;
	Wed, 6 Dec 2000 23:19:03 GMT
	(envelope-from brian@hak.lan.Awfulhak.org)
Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1])
	by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6NLlt08622;
	Wed, 6 Dec 2000 23:21:47 GMT
	(envelope-from brian@hak.lan.Awfulhak.org)
Message-Id: <200012062321.eB6NLlt08622@hak.lan.Awfulhak.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: "Jacques A. Vidrine" <n@nectar.com>,
	Daniel Eischen <eischen@vigrid.com>,
	Brian Somers <brian@Awfulhak.org>,
	Robert Watson <rwatson@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
Subject: Re: Threads in the base system 
In-Reply-To: Message from "Jacques A. Vidrine" <n@nectar.com> 
   of "Wed, 06 Dec 2000 16:53:34 CST." <20001206165334.D64011@spawn.nectar.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 06 Dec 2000 23:21:47 +0000
From: Brian Somers <brian@Awfulhak.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> On Wed, Dec 06, 2000 at 04:50:29PM -0500, Daniel Eischen wrote:
> > I was just [re]thinking about this.  When we get libpthread (work
> > has just started on this), then libc_r will eventually go away.
> > It's not clear yet whether libpthread will exist as a separate
> > entity or whether it will evolve from libc_r.  
> 
> For the ignorant (me), what is/will be the difference between libc_r and
> libpthread? 

And me !

Besides, can't we put libpthread in libc_r's place when it goes away ?

> Cheers,
> -- 
> Jacques Vidrine / n@nectar.com / jvidrine@verio.net / nectar@FreeBSD.org

-- 
Brian <brian@Awfulhak.org>                        <brian@[uk.]FreeBSD.org>
      <http://www.Awfulhak.org>                   <brian@[uk.]OpenBSD.org>
Don't _EVER_ lose your sense of humour !


From owner-freebsd-arch  Wed Dec  6 16:44:49 2000
From owner-freebsd-arch@FreeBSD.ORG  Wed Dec  6 16:44:47 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242])
	by hub.freebsd.org (Postfix) with ESMTP id 2507337B401
	for <arch@FreeBSD.ORG>; Wed,  6 Dec 2000 16:44:46 -0800 (PST)
Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2])
	by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id BAA04665;
	Thu, 7 Dec 2000 01:44:39 +0100 (CET)
	(envelope-from mbendiks@eunet.no)
Received: from localhost (mbendiks@localhost)
	by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id BAA30753;
	Thu, 7 Dec 2000 01:44:39 +0100 (CET)
	(envelope-from mbendiks@eunet.no)
X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs
Date: Thu, 7 Dec 2000 01:44:39 +0100 (CET)
From: Marius Bendiksen <mbendiks@eunet.no>
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: Terry Lambert <tlambert@primenet.com>, arch@FreeBSD.ORG
Subject: Re: zero copy code review
In-Reply-To: <Pine.BSF.4.21.0012051623180.9538-100000@jehovah.technokratis.com>
Message-ID: <Pine.BSF.4.05.10012070143120.30687-100000@login-1.eunet.no>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> 	Just about the only thing that may be considered is changing the
>   name of M_WAIT to something more appropriate, if it means so much to the
>   majority of people (honestly, I would find even doing this a waste of
>   time, but if lots of folks think it's worth educating kernel developers
>   by changing the name of a flag, then we might as well). 

This isn't much of an issue for me; however, I'd vote for changing the name.
Not so much for "educating kernel developers", but rather for the sake of us
being consistent and labelling things correctly.

Marius


From owner-freebsd-arch  Wed Dec  6 19: 3:55 2000
From owner-freebsd-arch@FreeBSD.ORG  Wed Dec  6 19:03:53 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id 5951337B400; Wed,  6 Dec 2000 19:03:49 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id WAA29975;
	Wed, 6 Dec 2000 22:03:25 -0500 (EST)
Date: Wed, 6 Dec 2000 22:03:25 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Brian Somers <brian@Awfulhak.org>
Cc: "Jacques A. Vidrine" <n@nectar.com>,
	Brian Somers <brian@Awfulhak.org>,
	Robert Watson <rwatson@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
Subject: Re: Threads in the base system 
In-Reply-To: <200012062321.eB6NLlt08622@hak.lan.Awfulhak.org>
Message-ID: <Pine.SUN.3.91.1001206213644.25911A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 6 Dec 2000, Brian Somers wrote:
> > On Wed, Dec 06, 2000 at 04:50:29PM -0500, Daniel Eischen wrote:
> > > I was just [re]thinking about this.  When we get libpthread (work
> > > has just started on this), then libc_r will eventually go away.
> > > It's not clear yet whether libpthread will exist as a separate
> > > entity or whether it will evolve from libc_r.  
> > 
> > For the ignorant (me), what is/will be the difference between libc_r and
> > libpthread? 
> 
> And me !

OK, libc_r is libc + threads; an application can't be linked to both
libc_r and libc.  libpthread is just the thread routines (at least
those that aren't included in libc) and _is_ linked with libc.  When
you have libpthread, the gcc option "-pthread" goes away (which we use
to link to libc_r and prevent linking to libc), and you link with 
"-lpthread".  In theory, libc_r could be an archive of libc and
libpthread.

We may want to keep libc_r around for a while for compatibility
reasons (without moving it to compat).  But at some point, libc_r
will cease to be built the way it is currently being built (to
include libc).  All the _THREAD_SAFE checks will be removed from
libc.  Instead, libc will contain stub routines for the needed
lock operations.  These will be weak symbols that will be overloaded
with (non-weak symbol) routines of the same name in libpthread.
When libpthread isn't linked in, then the null stub routines
will be invoked.  If libpthread is linked in, then the real lock
routines will be called.
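
As a rough illustration of the stub scheme (the libc side only), here
is a sketch using the GCC/Clang weak attribute.  The example_* names
are hypothetical and are not the symbols libc would actually use; the
call counter exists only so the effect can be observed.

```c
#include <stddef.h>

static int stub_calls;	/* demonstration only: counts stub invocations */

/*
 * Weak definitions.  The linker uses them only when no non-weak symbol
 * of the same name is supplied by another object or library (in the
 * scheme above, that other library would be libpthread).
 */
__attribute__((weak)) void
example_lock(void *lock)
{
	(void)lock;
	stub_calls++;	/* a real null stub would do nothing at all */
}

__attribute__((weak)) void
example_unlock(void *lock)
{
	(void)lock;
}

/*
 * A libc-style routine.  It brackets its work with the lock calls
 * unconditionally, so it needs no _THREAD_SAFE checks.
 */
int
example_putc(int c, void *stream_lock)
{
	example_lock(stream_lock);
	/* ... update shared stream state here ... */
	example_unlock(stream_lock);
	return (c);
}

/* Reports how many times the null stub ran. */
int
example_stub_calls(void)
{
	return (stub_calls);
}
```

Built on its own, example_putc() ends up in the null stubs.  Linking in
another object that defines non-weak example_lock() and
example_unlock() would replace them without recompiling this file.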

> Besides, can't we put libpthread in libc_r's place when it goes away ?

Yes, but it can't be used (linked to) the same way nor named the
same.

I guess my point is that applications in our base system that
require threads will need !NOLIBC_R || !NOLIBPTHREAD.  And NOLIBC_R
will eventually become the default some time after libpthread gets
integrated.

It's a little confusing, but am I making sense?

-- 
Dan Eischen


From owner-freebsd-arch  Thu Dec  7  1:21:13 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 01:21:10 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247])
	by hub.freebsd.org (Postfix) with ESMTP id 3337337B400
	for <arch@freebsd.org>; Thu,  7 Dec 2000 01:21:10 -0800 (PST)
Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1])
	by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with ESMTP id eB79L5C60826;
	Thu, 7 Dec 2000 18:21:06 +0900 (JST)
Date: Thu, 07 Dec 2000 18:21:04 +0900
Message-ID: <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp>
From: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
To: arch@freebsd.org
Subject: Even 1GB KVA is not enough, but we have no more space
Cc: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd)
Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo
MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu")
Content-Type: text/plain; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

As you may know, we now have a KVA space of 1GB. Some parts of our
kernel, however, believe that they can scale up the size of memory to
allocate in the KVA proportionally to the amount of physical
memory. The result is again a shortage of KVA space, but we cannot
extend our KVA any further. (I understand that 1GB is the upper limit
of KVA on i386, am I right?)

The following is a mail I sent to Matt Dillon a few hours ago.

On Thu, 07 Dec 2000 14:56:32 +0900,
  Seigo Tanimura <tanimura> said:

Seigo> I recently bought a Dell PowerEdge 6400/700 with RAM of 3GB in my
Seigo> lab. The box runs -current quite well, except that it panics upon
Seigo> swapping out data pages.

Seigo> Here is how the PowerEdge dies. swap_zone in vm/swap_pager.c is not
Seigo> initialized because zinit() attempts to allocate for swblock entries
Seigo> an entry of about 250MB, which does not fit in any free entries in
Seigo> kernel_map. The pagedaemon eventually calls zalloc(swap_pages) in
Seigo> swp_pager_meta_build() to build swap metadata, leading to dereference
Seigo> of a NULL pointer. Another box of mine at home with 256MB RAM also
Seigo> runs -current, but the swap pager works fine.

Seigo> Attached is a patch to adjust the number of swap metadata entries so
Seigo> that the metadata fits in the KVA. The number of entries is
Seigo> divided by 2 until zinit() succeeds. If the initial value of n in
Seigo> swap_pager_swap_init() (which is cnt.v_page_count * 2) is too big or
Seigo> zinit() does not succeed at all (hopefully not likely), you will see a
Seigo> note or warning. zlist is cleaned up if zinitna() fails to avoid
Seigo> vmstat -z messing up.
(patch moved to the bottom of this mail)

At first I looked only at the size of swap metadata, but that was
shortsighted. After fixing allocation of swap metadata, my kernel died
in ffs_vget(), when kernel_map held only one free page. I then
estimated how large swap metadata grows with respect to the amount of
physical memory. We assume that the amount of swap metadata is
proportional to the amount of physical memory, and that swap metadata
takes 8% of physical memory (according to my measurement). The results
are shown below.


Physical Memory		swap metadata
256M			20.5M
512M			41.0M
1G			81.9M
2G			163.8M
3G			245.8M
4G			327.7M


So, on my PowerEdge, the kernel first attempts to allocate about 1/4
of the KVA for swap metadata. Although the size of swap metadata
reduces to around 64MB with my patch, the size of the remaining free
entry in kernel_map is only about 120MB.

The solution I have is that we do not count physical memory beyond
the size of our KVA, or 1GB, when estimating the
size of KVA space to allocate in kernel_map. Hence the kernel
allocates the same amount of memory for swap metadata or whatever, on
a machine with 1GB, 2GB, 3GB and 4GB RAM. This solution might degrade
the performance of our kernel, but you would have no option other
than switching to alpha or ia64 in order to expand the size of KVA.
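
The proposal can be sketched in a few lines.  This is only an
illustration under the assumptions stated in this mail (1GB of KVA,
swap metadata at about 8% of physical memory); the function and macro
names are invented and are not taken from the patch.

```c
#include <stdint.h>

#define EX_KVA_SIZE	(UINT64_C(1) << 30)	/* 1GB KVA, as stated above */
#define EX_META_PCT	8	/* metadata is about 8% of RAM (measured) */

/* Current behaviour: metadata scales with all of physical memory. */
uint64_t
ex_swap_meta_uncapped(uint64_t physmem)
{
	return (physmem * EX_META_PCT / 100);
}

/* Proposed behaviour: physical memory beyond the KVA size is not counted. */
uint64_t
ex_swap_meta_capped(uint64_t physmem)
{
	if (physmem > EX_KVA_SIZE)
		physmem = EX_KVA_SIZE;
	return (physmem * EX_META_PCT / 100);
}
```

With the cap, a 3GB machine gets the same estimate as a 1GB machine
(about 82MB) instead of the roughly 246MB in the table above.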

Thanks, and any comments, flames or whatever are welcome.

-- 
Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp> <tanimura@FreeBSD.org>


From owner-freebsd-arch  Thu Dec  7  1:29:38 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 01:29:34 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247])
	by hub.freebsd.org (Postfix) with ESMTP id A659937B400
	for <arch@freebsd.org>; Thu,  7 Dec 2000 01:29:33 -0800 (PST)
Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1])
	by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with ESMTP id eB79TTC61332;
	Thu, 7 Dec 2000 18:29:30 +0900 (JST)
Date: Thu, 07 Dec 2000 18:29:29 +0900
Message-ID: <vmr93k8tqe.wl@rina.r.dl.itc.u-tokyo.ac.jp>
From: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
To: tanimura@r.dl.itc.u-tokyo.ac.jp
Cc: arch@freebsd.org
Subject: Re: Even 1GB KVA is not enough, but we have no more space
In-Reply-To: In your message of "Thu, 07 Dec 2000 18:21:04 +0900"
	<vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp>
References: <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp>
Cc: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd)
Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo
MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu")
Content-Type: text/plain; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Aaugh, I forgot to attach my patch...

As the previous mail of mine is somewhat long, I placed the following
patches on the web, and added another one:


URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff

This is the one I sent to Matt.


URI: http://people.FreeBSD.org/~tanimura/patches/vmstat.diff

This allows vmstat(8) to show the number of pages each zone holds. The
result looks like this:

tanimura@stella% vmstat -z

ZONE            used    total   pages   mem-use 
PIPE            4       102     -1         0/15K
SWAPMETA        0       0       15078      0/0K
tcpcb           24      35      4624      12/18K
unpcb           12      128     -1         0/8K
ripcb           1       42      1632       0/7K
tcpcb           0       0       4624       0/0K
udpcb           36      84      1632       6/15K
socket          74      126     1632      13/23K
KNOTE           0       128     -1         0/8K
NFSNODE         137     192     -1        42/60K
NFSMOUNT        5       14      -1         2/7K
VNODE           14310   14400   -1      3577/3600K
NAMEI           0       16      -1         0/16K
VMSPACE         78      162     -1        17/35K
PROC            104     148     -1        55/78K
DP fakepg       0       0       -1         0/0K
PV ENTRY        151501  786421  16604   4142/21503K
MAP ENTRY       1166    1658    -1        54/77K
KMAP ENTRY      267     383     2262      12/17K
MAP             7       47      -1         0/4K
VM OBJECT       1259    1432    -1       118/134K
--------------------------------------------------
TOTAL                                   8058/25633K

-- 
Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp> <tanimura@FreeBSD.org>


From owner-freebsd-arch  Thu Dec  7  1:36:18 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 01:36:17 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 3371337B400
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 01:36:17 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eB79aBw29191;
	Thu, 7 Dec 2000 01:36:11 -0800 (PST)
Date: Thu, 7 Dec 2000 01:36:11 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
Cc: arch@FreeBSD.ORG
Subject: Re: Even 1GB KVA is not enough, but we have no more space
Message-ID: <20001207013611.O16205@fw.wintelcom.net>
References: <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp> <vmr93k8tqe.wl@rina.r.dl.itc.u-tokyo.ac.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <vmr93k8tqe.wl@rina.r.dl.itc.u-tokyo.ac.jp>; from tanimura@r.dl.itc.u-tokyo.ac.jp on Thu, Dec 07, 2000 at 06:29:29PM +0900
Sender: bright@fw.wintelcom.net
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp> [001207 01:29] wrote:
> Aaugh, I forgot to attach my patch...
> 
> As the previous mail of mine is somewhat long, I placed the following
> patches on the web, and added another one:
> 
> 
> URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff
> 
> This is the one I sent to Matt.
> 

possible problem:

in the loop you use to allocate, you never test if 'n' hits zero,
now if there's a swap problem you won't print anything, just wedge
hard.
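
A sketch of the kind of loop meant here, with the zero test added.  It
is not the code from vm.diff: mock_zone_init() is a made-up stand-in
for zinit() that fails whenever the request does not fit in the free
KVA passed in, and all names are hypothetical.

```c
#include <stddef.h>

static int zone_token;	/* any non-NULL address will do for the mock */

/* Hypothetical stand-in for zinit(): fails if the request does not fit. */
static void *
mock_zone_init(size_t nentries, size_t entry_size, size_t kva_free)
{
	if (nentries == 0 || nentries > kva_free / entry_size)
		return (NULL);
	return (&zone_token);
}

/*
 * Halve n until the zone can be created.  The 'n > 0' test is the point:
 * without it, a loop that only checks for success keeps retrying with
 * n == 0 forever once nothing fits.
 *
 * Returns the entry count actually used, or 0 on failure, in which case
 * *zonep is NULL and the caller can print a warning and refuse to swap.
 */
size_t
ex_size_swap_zone(size_t n, size_t entry_size, size_t kva_free, void **zonep)
{
	void *zone = NULL;

	while (n > 0) {
		zone = mock_zone_init(n, entry_size, kva_free);
		if (zone != NULL)
			break;
		n /= 2;		/* halve and retry */
	}
	*zonep = zone;
	return (n);
}
```

A zero return gives the caller one obvious place to print the warning
instead of hanging silently.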

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


From owner-freebsd-arch  Thu Dec  7  1:48: 3 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 01:48:01 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247])
	by hub.freebsd.org (Postfix) with ESMTP id 47B4037B400
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 01:48:00 -0800 (PST)
Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1])
	by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with ESMTP id eB79lvC62921;
	Thu, 7 Dec 2000 18:47:57 +0900 (JST)
Date: Thu, 07 Dec 2000 18:47:57 +0900
Message-ID: <vmpuj48svm.wl@rina.r.dl.itc.u-tokyo.ac.jp>
From: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
To: bright@wintelcom.net
Cc: tanimura@r.dl.itc.u-tokyo.ac.jp, arch@FreeBSD.ORG
Subject: Re: Even 1GB KVA is not enough, but we have no more space
In-Reply-To: In your message of "Thu, 7 Dec 2000 01:36:11 -0800"
	<20001207013611.O16205@fw.wintelcom.net>
References: <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<vmr93k8tqe.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<20001207013611.O16205@fw.wintelcom.net>
User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd)
Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo
MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu")
Content-Type: text/plain; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 7 Dec 2000 01:36:11 -0800,
  Alfred Perlstein <bright@wintelcom.net> said:

>> URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff

Alfred> in the loop you use to allocate, you never test if 'n' hits zero,
Alfred> now if there's a swap problem you won't print anything, just wedge
Alfred> hard.

It would also be good to reject swapon(2) if swap_zone is NULL.

-- 
Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp> <tanimura@FreeBSD.org>




From owner-freebsd-arch  Thu Dec  7  1:57: 0 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 01:56:58 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 9F94337B402
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 01:56:58 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eB79upg29796;
	Thu, 7 Dec 2000 01:56:51 -0800 (PST)
Date: Thu, 7 Dec 2000 01:56:51 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
Cc: arch@FreeBSD.ORG
Subject: Re: Even 1GB KVA is not enough, but we have no more space
Message-ID: <20001207015651.P16205@fw.wintelcom.net>
References: <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp> <vmr93k8tqe.wl@rina.r.dl.itc.u-tokyo.ac.jp> <20001207013611.O16205@fw.wintelcom.net> <vmpuj48svm.wl@rina.r.dl.itc.u-tokyo.ac.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <vmpuj48svm.wl@rina.r.dl.itc.u-tokyo.ac.jp>; from tanimura@r.dl.itc.u-tokyo.ac.jp on Thu, Dec 07, 2000 at 06:47:57PM +0900
Sender: bright@fw.wintelcom.net
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp> [001207 01:48] wrote:
> On Thu, 7 Dec 2000 01:36:11 -0800,
>   Alfred Perlstein <bright@wintelcom.net> said:
> 
> >> URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff
> 
> Alfred> in the loop you use to allocate, you never test if 'n' hits zero,
> Alfred> now if there's a swap problem you won't print anything, just wedge
> Alfred> hard.
> 
> It should also be good to reject swapon(2) if swap_zone is NULL.

Agreed.  Since you've been poring over this code, I'm wondering
what happens when the swapper can't allocate as much as it wants?

Does it just reduce the amount of swapping the machine can do? or
is there a performance hit? or both?

> 
> -- 
> Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp> <tanimura@FreeBSD.org>

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."




From owner-freebsd-arch  Thu Dec  7  2:22:14 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 02:22:13 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247])
	by hub.freebsd.org (Postfix) with ESMTP id AF10137B400
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 02:22:12 -0800 (PST)
Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1])
	by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with ESMTP id eB7AM8C66385;
	Thu, 7 Dec 2000 19:22:09 +0900 (JST)
Date: Thu, 07 Dec 2000 19:22:08 +0900
Message-ID: <vmofyo8ran.wl@rina.r.dl.itc.u-tokyo.ac.jp>
From: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
To: bright@wintelcom.net
Cc: tanimura@r.dl.itc.u-tokyo.ac.jp, arch@FreeBSD.ORG
Subject: Re: Even 1GB KVA is not enough, but we have no more space
In-Reply-To: In your message of "Thu, 7 Dec 2000 01:56:51 -0800"
	<20001207015651.P16205@fw.wintelcom.net>
References: <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<vmr93k8tqe.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<20001207013611.O16205@fw.wintelcom.net>
	<vmpuj48svm.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<20001207015651.P16205@fw.wintelcom.net>
User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd)
Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo
MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu")
Content-Type: text/plain; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 7 Dec 2000 01:56:51 -0800,
  Alfred Perlstein <bright@wintelcom.net> said:

>> >> URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff
>> 
Alfred> in the loop you use to allocate, you never test if 'n' hits zero,
Alfred> now if there's a swap problem you won't print anything, just wedge
Alfred> hard.
>> 
>> It should also be good to reject swapon(2) if swap_zone is NULL.

Alfred> Agreed.  Since you've been poring over this code, I'm wondering
Alfred> what happens when the swapper can't allocate as much as it wants?

Alfred> Does it just reduce the amount of swapping the machine can do? or
Alfred> is there a performance hit? or both?

Having fewer swap metadata entries primarily causes metadata
allocations to fail, which limits the maximum size of VM objects
that can be in use at a time. A second effect is that the pagedaemon
has to wait for a free metadata entry, which slows down swap-out.

-- 
Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp> <tanimura@FreeBSD.org>




From owner-freebsd-arch  Thu Dec  7  9:18:42 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 09:18:40 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mailtoaster1.pipeline.ch (mailtoaster1.pipeline.ch [62.48.0.70])
	by hub.freebsd.org (Postfix) with SMTP id 4A10737B400
	for <arch@freebsd.org>; Thu,  7 Dec 2000 09:18:38 -0800 (PST)
Received: (qmail 70321 invoked from network); 7 Dec 2000 17:16:28 -0000
Received: from unknown (HELO telehouse.ch) ([195.134.128.53]) (envelope-sender <oppermann@telehouse.ch>)
          by mailtoaster1.pipeline.ch (qmail-ldap-1.03) with SMTP
          for <tanimura@r.dl.itc.u-tokyo.ac.jp>; 7 Dec 2000 17:16:28 -0000
Message-ID: <3A2FC647.6EC4FFA7@telehouse.ch>
Date: Thu, 07 Dec 2000 18:17:59 +0100
From: Andre Oppermann <oppermann@telehouse.ch>
X-Mailer: Mozilla 4.74 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
Cc: arch@freebsd.org
Subject: Re: Even 1GB KVA is not enough, but we have no more space
References: <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp> <vmr93k8tqe.wl@rina.r.dl.itc.u-tokyo.ac.jp>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Seigo Tanimura wrote:
> 
> Aaugh, I forgot to attach my patch...
> 
> As the previous mail of mine is somewhat long, I placed the following
> patches on the web, and added another one:
> 
> URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff
> 
> This is the one I sent to Matt.
> 
> URI: http://people.FreeBSD.org/~tanimura/patches/vmstat.diff
> 
> This allows vmstat(8) to show the amount of pages each zone holds. The
> result looks like this:
> 
> tanimura@stella% vmstat -z
> 
> ZONE            used    total   pages   mem-use
> PIPE            4       102     -1         0/15K
> SWAPMETA        0       0       15078      0/0K
> tcpcb           24      35      4624      12/18K
> unpcb           12      128     -1         0/8K
> ripcb           1       42      1632       0/7K
> tcpcb           0       0       4624       0/0K
> udpcb           36      84      1632       6/15K
> socket          74      126     1632      13/23K
> KNOTE           0       128     -1         0/8K
> NFSNODE         137     192     -1        42/60K
> NFSMOUNT        5       14      -1         2/7K
> VNODE           14310   14400   -1      3577/3600K
> NAMEI           0       16      -1         0/16K
> VMSPACE         78      162     -1        17/35K
> PROC            104     148     -1        55/78K
> DP fakepg       0       0       -1         0/0K
> PV ENTRY        151501  786421  16604   4142/21503K
> MAP ENTRY       1166    1658    -1        54/77K
> KMAP ENTRY      267     383     2262      12/17K
> MAP             7       47      -1         0/4K
> VM OBJECT       1259    1432    -1       118/134K
> --------------------------------------------------
> TOTAL                                   8058/25633K

Wow, that looks good! Far easier than the other stuff.

-- 
Andre




From owner-freebsd-arch  Thu Dec  7 10:52: 1 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 10:51:58 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP id ADED637B400
	for <arch@freebsd.org>; Thu,  7 Dec 2000 10:51:57 -0800 (PST)
Received: from luanda-33.budapest.interware.hu ([195.70.51.33] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 144694-0004Iy-00; Thu, 07 Dec 2000 19:51:55 +0100
Sender: julian@FreeBSD.ORG
Message-ID: <3A2F93C6.7967D1DA@elischer.org>
Date: Thu, 07 Dec 2000 05:42:30 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
Cc: arch@freebsd.org
Subject: Re: Even 1GB KVA is not enough, but we have no more space
References: <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Seigo Tanimura wrote:

[interesting stuff deleted]

> The solution I have is that we do not count the size of physical
> memory larger than the size of our KVA, or 1GB, upon estimating the
> size of KVA space to allocate in kernel_map. Hence the kernel
> allocates the same amount of memory for swap metadata or whatever, on
> a machine with 1GB, 2GB, 3GB and 4GB RAM. This solution might degrade
> the performance of our kernel, but you would have no other options
> than to switch to alpha or ia64 in order to expand the size of KVA.
> 
> Thanks, and any comments, flames or whatever are welcome.
> 


THEORETICALLY it should be possible to put the kernel into a different
KV space from the processes and give it 4GB.
Practically, we'd have to do a lot to do this, and it may affect
throughput (page tables loading in and out).

It may however be worth looking at,
especially with the possibility of altering the system to allow
the > 4GB physical RAM that P6 and higher support.

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v




From owner-freebsd-arch  Thu Dec  7 11:38:18 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 11:38:16 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from beastie.mckusick.com (tserver.conference.usenix.org [209.179.127.3])
	by hub.freebsd.org (Postfix) with ESMTP id 7CC4037B400
	for <arch@freebsd.org>; Thu,  7 Dec 2000 11:38:15 -0800 (PST)
Received: from beastie.mckusick.com (localhost [127.0.0.1])
	by beastie.mckusick.com (8.9.3/8.9.3) with ESMTP id LAA03622
	for <arch@freebsd.org>; Thu, 7 Dec 2000 11:38:11 -0800 (PST)
	(envelope-from mckusick@beastie.mckusick.com)
Message-Id: <200012071938.LAA03622@beastie.mckusick.com>
To: arch@freebsd.org
Subject: Getting Kernel Process Information
Date: Thu, 07 Dec 2000 11:38:11 -0800
From: Kirk McKusick <mckusick@mckusick.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

For the third time in a week, I got the following message when I
tried to run ps on my 5.X system:

	proc size mismatch (39776 total, 1136 chunks)

This message arises when the size of the proc structure changes.
With the current SMP development, the proc structure changes at
a very high rate of speed. The current kinfo_proc interface used
between the kernel and user processes is built from two pieces:

	struct kinfo_proc {
		struct proc kp_proc;
		struct eproc kp_eproc;
	};

Kinfo_proc contains a copy of the kernel's proc structure
followed by an `extended' proc structure which has lots
of bits and pieces that have moved out of the proc structure
or are otherwise needed. Any change to the kernel's version
of the proc structure changes the size of the kinfo_proc
structure and hence causes a mismatch when attempts are made
to copy it out.

I propose to change the kinfo_proc structure. The new
kinfo_proc structure will contain only the stylized `extended'
proc structure which will be augmented with the twenty
fields that are actually referenced from the proc structure
by user processes. By taking this approach, changes to the
proc structure will not affect the format or size of the
kinfo_proc structure returned to user processes. The new
`extended' proc structure will have plenty of spare fields
added to its end so that when new fields are added to the
proc structure that user-level processes need/want to know
about, they can be added without changing the size of the
exported kinfo_proc structure and thus will not require
recompilation of the dozen or so programs that use the
exported interface. Note that even if 200 spare bytes are
added to the kinfo_proc structure, it will still be smaller
than the current one.
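
As a rough illustration of the idea, here is a minimal sketch in C of
an exported record that is decoupled from the kernel's private
structure. Every name in it (proc_sketch, kinfo_proc_sketch, the ki_*
fields, fill_kinfo_sketch) is hypothetical, and the field list and the
amount of spare space are assumptions rather than the proposed layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's private proc structure; free to change. */
struct proc_sketch {
	int	p_pid;
	int	p_priority;
	char	p_comm[16];		/* NUL-terminated command name */
	void	*p_private_lock;	/* kernel-only, never exported */
};

/* Exported record: only what userland reads, plus spare room. */
struct kinfo_proc_sketch {
	int	ki_structsize;		/* lets userland detect a mismatch */
	int	ki_pid;
	int	ki_priority;
	char	ki_comm[17];
	long	ki_spare[25];		/* reserved for future fields */
};

/* Copy the exported fields; spare space stays zeroed. */
static void
fill_kinfo_sketch(const struct proc_sketch *p, struct kinfo_proc_sketch *ki)
{
	assert(p != NULL && ki != NULL);
	memset(ki, 0, sizeof(*ki));
	ki->ki_structsize = (int)sizeof(*ki);
	ki->ki_pid = p->p_pid;
	ki->ki_priority = p->p_priority;
	strncpy(ki->ki_comm, p->p_comm, sizeof(ki->ki_comm) - 1);
}
```

Adding a member to proc_sketch leaves sizeof(struct kinfo_proc_sketch)
unchanged, and a new exported field can be carved out of ki_spare
without changing the record size.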

Note that I am proposing to make this change only in the
5.X tree. I am not proposing that it be back ported to the
4.X tree.

I am not interested in starting a long discussion on all
the possible alternatives for exporting kernel information
to user processes. I recognize that there are better ways
to handle these issues. I am just trying to make an
incremental change that is small in scope and hopefully
will make an annoying problem significantly less common.
With this caveat, comments are solicited.

	Kirk McKusick




From owner-freebsd-arch  Thu Dec  7 11:42:40 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 11:42:38 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mailout04.sul.t-online.com (mailout04.sul.t-online.com [194.25.134.18])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3B62937B401; Thu,  7 Dec 2000 11:42:37 -0800 (PST)
Received: from fwd00.sul.t-online.com 
	by mailout04.sul.t-online.com with smtp 
	id 1446w3-00059o-04; Thu, 07 Dec 2000 20:42:31 +0100
Received: from neutron.cichlids.com (520050424122-0001@[62.225.193.245]) by fmrl00.sul.t-online.com
	with esmtp id 1446vq-2FIe5AC; Thu, 7 Dec 2000 20:42:18 +0100
Received: from cichlids.cichlids.com (cichlids.cichlids.com [192.168.0.10])
	by neutron.cichlids.com (Postfix) with ESMTP
	id 336E9AB0C; Thu,  7 Dec 2000 20:42:18 +0100 (CET)
Received: by cichlids.cichlids.com (Postfix, from userid 1001)
	id 1382314A86; Thu,  7 Dec 2000 20:42:16 +0100 (CET)
Date: Thu, 7 Dec 2000 20:42:15 +0100
To: Orion Hodson <O.Hodson@cs.ucl.ac.uk>
Cc: freebsd-arch@FreeBSD.ORG, cg@FreeBSD.ORG
Subject: Re: soundcard.h
Message-ID: <20001207204215.A5787@cichlids.cichlids.com>
References: <3737.976021730@cs.ucl.ac.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3737.976021730@cs.ucl.ac.uk>; from O.Hodson@cs.ucl.ac.uk on Tue, Dec 05, 2000 at 01:08:50PM +0000
X-PGP-Fingerprint: 44 28 CA 4C 46 5B D3 A8  A8 E3 BA F3 4E 60 7D 7F
X-PGP-at: finger alex@big.endian.de
X-Verwirrung: Dieser Header dient der allgemeinen Verwirrung.
From: alex@big.endian.de (Alexander Langer)
X-Sender: 520050424122-0001@t-dialin.net
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Thus spake Orion Hodson (O.Hodson@cs.ucl.ac.uk):

> into separate include files, i.e. snd_oss.h, snd_pcm.h, snd_mixer.h,
> snd_sequencer.h, etc and have these included from soundcard.h.
> Is there any strength of feeling for or against doing this?  It's
> completely aesthetic and very minor undertaking, but I don't mind
> doing if people think it'd be reasonable.

If this really helps someone (i.e. you or Cameron), I don't know why
it shouldn't be done.

Alex
-- 
cat: /home/alex/.sig: No such file or directory




From owner-freebsd-arch  Thu Dec  7 11:52:49 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 11:52:46 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135])
	by hub.freebsd.org (Postfix) with ESMTP id 0625137B401
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 11:52:46 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id eB7JqXm11711;
	Thu, 7 Dec 2000 11:52:33 -0800 (PST)
	(envelope-from dillon)
Date: Thu, 7 Dec 2000 11:52:33 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200012071952.eB7JqXm11711@earth.backplane.com>
To: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
Cc: bright@wintelcom.net, tanimura@r.dl.itc.u-tokyo.ac.jp,
	arch@FreeBSD.ORG
Subject: Re: Even 1GB KVA is not enough, but we have no more space
References: <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<vmr93k8tqe.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<20001207013611.O16205@fw.wintelcom.net>
	<vmpuj48svm.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<20001207015651.P16205@fw.wintelcom.net> <vmofyo8ran.wl@rina.r.dl.itc.u-tokyo.ac.jp>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:On Thu, 7 Dec 2000 01:56:51 -0800,
:  Alfred Perlstein <bright@wintelcom.net> said:
:
:>> >> URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff
:>> 
:Alfred> in the loop you use to allocate, you never test if 'n' hits zero,
:Alfred> now if there's a swap problem you won't print anything, just wedge
:Alfred> hard.
:>> 
:>> It should also be good to reject swapon(2) if swap_zone is NULL.
:
:Alfred> Agreed.  Since you've been poring over this code, I'm wondering
:Alfred> what happens when the swapper can't allocate as much as it wants?
:
:Alfred> Does it just reduce the amount of swapping the machine can do? or
:Alfred> is there a performance hit? or both?
:
:Reduction of swap metadata entries primarily results in failure to
:allocate a metadata entry, limiting the maximum size of vm objects
:that can be used at a time. Another effect is for the pagedaemon to
:wait for a free metadata entry, slowing down the speed of swap out.
:
:-- 
:Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp> <tanimura@FreeBSD.org>

    Running out of swapmeta may not be an option.  A system deadlock could
    result.  The real problem here is that swapmeta is being reserved
    based on some multiple of main memory rather than based on the actual
    amount of swap allocated. 

    Another possibility would be to reserve swap in larger chunks... that
    is, in SWAP_META_PAGES (16-page) chunks rather than page-sized chunks.
    The struct swblock structure would then turn into a single daddr_t (base
    swap address) and a bitmap (one int), reducing its size from 80 bytes
    to 24 bytes.  The only problem with this is that the VM object collapse
    code needs to merge swap areas on a page-by-page basis, so it isn't
    entirely trivial.
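
    A minimal sketch of the base-plus-bitmap idea, in C.  The type and
    function names (daddr_sketch_t, swblock_sketch, swb_*) are
    hypothetical, and the hash linkage and object fields a real swblock
    carries are left out, so the size of this struct says nothing about
    the 24-byte figure above.

```c
#include <assert.h>
#include <stdint.h>

#define SWAP_META_PAGES_SKETCH	16

typedef int64_t daddr_sketch_t;		/* stand-in for daddr_t */

struct swblock_sketch {
	daddr_sketch_t	swb_base;	/* first swap block of the chunk */
	unsigned int	swb_valid;	/* bit i set: page i is on swap */
};

/* Mark page idx of the chunk as swapped out. */
static void
swb_set(struct swblock_sketch *sb, int idx)
{
	assert(idx >= 0 && idx < SWAP_META_PAGES_SKETCH);
	sb->swb_valid |= 1u << idx;
}

/* Mark page idx of the chunk as no longer on swap. */
static void
swb_clear(struct swblock_sketch *sb, int idx)
{
	assert(idx >= 0 && idx < SWAP_META_PAGES_SKETCH);
	sb->swb_valid &= ~(1u << idx);
}

/* Return the swap address of page idx, or -1 if it is not on swap. */
static daddr_sketch_t
swb_lookup(const struct swblock_sketch *sb, int idx)
{
	assert(idx >= 0 && idx < SWAP_META_PAGES_SKETCH);
	if ((sb->swb_valid & (1u << idx)) == 0)
		return (-1);
	return (sb->swb_base + idx);
}
```

    The per-page bitmap is what would still let the collapse code move
    individual pages, at the cost of reserving swap 16 pages at a time.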

    Another possibility would be to have some way to swap the swblock
    structures themselves, relegating the SWAPMETA zone to a cache.
    Also not trivial.

    In any case, your stopgap patch seems reasonable in concept until we
    can come up with a better solution.

						-Matt




From owner-freebsd-arch  Thu Dec  7 11:56:19 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 11:56:17 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 7CF1E37B400
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 11:56:17 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eB7JuGg15731;
	Thu, 7 Dec 2000 11:56:16 -0800 (PST)
Date: Thu, 7 Dec 2000 11:56:16 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Kirk McKusick <mckusick@mckusick.com>
Cc: arch@FreeBSD.ORG
Subject: Re: Getting Kernel Process Information
Message-ID: <20001207115616.V16205@fw.wintelcom.net>
References: <200012071938.LAA03622@beastie.mckusick.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200012071938.LAA03622@beastie.mckusick.com>; from mckusick@mckusick.com on Thu, Dec 07, 2000 at 11:38:11AM -0800
Sender: bright@fw.wintelcom.net
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Kirk McKusick <mckusick@mckusick.com> [001207 11:38] wrote:
> For the third time in a week, I got the following message when I
> tried to run ps on my 5.X system:
> 
> 	proc size mismatch (39776 total, 1136 chunks)
> 
> This message arises when the size of the proc structure changes.
> With the current SMP development, the proc structure changes at
> a very high rate of speed. The current kinfo_proc interface used
> between the kernel and user processes is built from two pieces:
> 
> 	struct kinfo_proc {
> 		struct proc kp_proc;
> 		struct eproc kp_eproc;
> 	}
> 
> Kinfo_proc contains a copy of the kernel's proc structure
> followed by an `extended' proc structure which has lots
> of bits and pieces that have moved out of the proc structure
> or are otherwise needed. Any change to the kernel's version
> of the proc structure changes the size of the kinfo_proc
> structure and hence causes a mismatch when attempts are made
> to copy it out.
> 
> I propose to change the kinfo_proc structure. The new
> kinfo_proc structure will contain only the stylized `extended'
> proc structure which will be augmented with the twenty
> fields that are actually referenced from the proc structure
> by user processes. By taking this approach, changes to the
> proc structure will not affect the format or size of the
> kinfo_proc structure returned to user processes. The new
> `extended' proc structure will have plenty of spare fields
> added to its end so that when new fields are added to the
> proc structure that user-level processes need/want to know
> about, they can be added without changing the size of the
> exported kinfo_proc structure and thus will not require
> recompilation of the dozen or so programs that use the
> exported interface. Note that even if 200 spare bytes are
> added to the kinfo_proc structure, it will still be smaller
> than the current one.

I completely agree that this should be done.  My suggestion is to
completely rip out any kernel structs being passed through
this interface; the reason is that we will need mutexes in
a lot of them and we don't want to export those to userland.

I was looking at this the other week when trying to clean up the
struct ucred issues and thought it was a good idea, but a bit more
work than I had in mind at the time.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]




From owner-freebsd-arch  Thu Dec  7 12:26:13 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 12:26:11 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222])
	by hub.freebsd.org (Postfix) with ESMTP id 339C237B400
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 12:26:11 -0800 (PST)
Received: from foo.osd.bsdi.com (root@foo.osd.bsdi.com [204.216.28.137])
	by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eB7KPC726457;
	Thu, 7 Dec 2000 12:25:12 -0800 (PST)
	(envelope-from jhb@foo.osd.bsdi.com)
Received: (from jhb@localhost)
	by foo.osd.bsdi.com (8.11.1/8.11.0) id eB7KPFn65390;
	Thu, 7 Dec 2000 12:25:15 -0800 (PST)
	(envelope-from jhb)
Message-ID: <XFMail.001207122515.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <20001207115616.V16205@fw.wintelcom.net>
Date: Thu, 07 Dec 2000 12:25:15 -0800 (PST)
Organization: BSD, Inc.
From: John Baldwin <jhb@FreeBSD.ORG>
To: Alfred Perlstein <bright@wintelcom.net>
Subject: Re: Getting Kernel Process Information
Cc: arch@FreeBSD.ORG, Kirk McKusick <mckusick@mckusick.com>
Sender: jhb@foo.osd.bsdi.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On 07-Dec-00 Alfred Perlstein wrote:
> * Kirk McKusick <mckusick@mckusick.com> [001207 11:38] wrote:
>> For the third time in a week, I got the following message when I
>> tried to run ps on my 5.X system:
>> 
>>      proc size mismatch (39776 total, 1136 chunks)
>> 
>> This message arises when the size of the proc structure changes.
>> With the current SMP development, the proc structure changes at
>> a very high rate of speed. The current kinfo_proc interface used
>> between the kernel and user processes is built from two pieces:
>> 
>>      struct kinfo_proc {
>>              struct proc kp_proc;
>>              struct eproc kp_eproc;
>>      }

[ snip ]

> I completely agree that should be done.  My suggestion is to
> completely rip out any kernel structs being passed through
> this interface, the reason is that we will need mutexes in
> a lot of them and we don't want to export that to userland.

He is; he's just bulking up the eproc that gets created in fill_eproc()
so that proc doesn't need to be exported at all.  It sounds like an
excellent and noteworthy goal, esp. since the KSE work is going to
make this even more bizarre and confusing. :)

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




From owner-freebsd-arch  Thu Dec  7 13:37:25 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 13:37:22 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (fw2.aub.dk [195.24.1.195])
	by hub.freebsd.org (Postfix) with ESMTP id EB1D537B400
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 13:37:21 -0800 (PST)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id eB7LbBL92763;
	Thu, 7 Dec 2000 22:37:12 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: Kirk McKusick <mckusick@mckusick.com>
Cc: arch@FreeBSD.ORG
Subject: Re: Getting Kernel Process Information 
In-Reply-To: Your message of "Thu, 07 Dec 2000 11:38:11 PST."
             <200012071938.LAA03622@beastie.mckusick.com> 
Date: Thu, 07 Dec 2000 22:37:11 +0100
Message-ID: <92761.976225031@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <200012071938.LAA03622@beastie.mckusick.com>, Kirk McKusick writes:

>I propose to change the kinfo_proc structure. The new
>kinfo_proc structure will contain only the stylized `extended'
>proc structure which will be augmented with the twenty
>fields that are actually referenced from the proc structure
>by user processes.

Yes!

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.




From owner-freebsd-arch  Thu Dec  7 13:47: 8 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 13:47:06 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from cs.utep.edu (mail.cs.utep.edu [129.108.5.3])
	by hub.freebsd.org (Postfix) with ESMTP id 1401337B400
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 13:47:04 -0800 (PST)
Received: from gecko (gecko [129.108.5.51])
	by cs.utep.edu (8.10.1/8.10.1) with ESMTP id eB7LkWn25281;
	Thu, 7 Dec 2000 14:46:32 -0700 (MST)
Date: Thu, 7 Dec 2000 14:46:32 -0700 (MST)
From: <janb@cs.utep.edu>
X-Sender:  <janb@gecko>
To: Kirk McKusick <mckusick@mckusick.com>
Cc: <arch@FreeBSD.ORG>
Subject: Re: Getting Kernel Process Information
In-Reply-To: <200012071938.LAA03622@beastie.mckusick.com>
Message-ID: <Pine.GSO.4.30.0012071445580.12857-100000@gecko>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> to user processes. I recognize that there are better ways
> to handle these issues. I am just trying to make an

What are some of the better ways of handling the issue?


JAn




From owner-freebsd-arch  Thu Dec  7 13:54:29 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 13:54:26 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from Awfulhak.org (awfulhak.demon.co.uk [194.222.196.252])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9339A37B401; Thu,  7 Dec 2000 13:54:24 -0800 (PST)
Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12])
	by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB7LqZx24603;
	Thu, 7 Dec 2000 21:52:35 GMT
	(envelope-from brian@lan.awfulhak.org)
Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1])
	by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB7Lt7G51311;
	Thu, 7 Dec 2000 21:55:07 GMT
	(envelope-from brian@hak.lan.Awfulhak.org)
Message-Id: <200012072155.eB7Lt7G51311@hak.lan.Awfulhak.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: Daniel Eischen <eischen@vigrid.com>
Cc: Brian Somers <brian@Awfulhak.org>,
	"Jacques A. Vidrine" <n@nectar.com>,
	Robert Watson <rwatson@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG,
	brian@Awfulhak.org
Subject: Re: Threads in the base system 
In-Reply-To: Message from Daniel Eischen <eischen@vigrid.com> 
   of "Wed, 06 Dec 2000 22:03:25 EST." <Pine.SUN.3.91.1001206213644.25911A-100000@pcnet1.pcnet.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 07 Dec 2000 21:55:07 +0000
From: Brian Somers <brian@Awfulhak.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> On Wed, 6 Dec 2000, Brian Somers wrote:
> > > On Wed, Dec 06, 2000 at 04:50:29PM -0500, Daniel Eischen wrote:
> > > > I was just [re]thinking about this.  When we get libpthread (work
> > > > has just started on this), then libc_r will eventually go away.
> > > > It's not clear yet whether libpthread will exist as a separate
> > > > entity or whether it will evolve from libc_r.  
> > > 
> > > For the ignorant (me), what is/will be the difference between libc_r and
> > > libpthread? 
> > 
> > And me !
> 
> OK, libc_r is libc + threads; an application can't be linked to both
> libc_r and libc.  libpthread is just the thread routines (at least
> those that aren't included in libc) and _is_ linked with libc.  When
> you have libpthread, the gcc option "-pthread" goes away (which we use
> to link to libc_r and prevent linking to libc), and you link with 
> "-lpthread".  In theory, libc_r could be an archive of libc and
> libpthread.
> 
> We may want to keep libc_r around for a while for compatibility
> reasons (without moving it to compat).  But at some point, libc_r
> will cease to be built the way it is currently being built (to
> include libc).  All the _THREAD_SAFE checks will be removed from
> libc.  Instead, libc will contain stub routines for the needed
> lock operations.  These will be weak symbols that will be overloaded
> with (non-weak symbol) routines of the same name in libpthread.
> When libpthread isn't linked in, then the null stub routines
> will be invoked.  If libpthread is linked in, then the real lock
> routines will be called.
> 
> > Besides, can't we put libpthread in libc_r's place when it goes away ?
> 
> Yes, but it can't be used (linked to) the same way nor named the
> same.
> 
> I guess my point is that applications in our base system that
> require threads will need !NOLIBC_R || !NOLIBPTHREAD.  And NOLIBC_R
> will eventually become the default some time after libpthread gets
> integrated.
> 
> It's a little confusing, but am I making sense?

Yes.

I'd tend to just say that we do it in these stages:

1.  Remove NOLIBC_R
2.  Eventually introduce libpthread
3.  Change all Makefiles that say -pthread to say -lpthread
4.  Blow away libc_r

With whatever gap is required between each step.  We *could* replace 
item 1 with ``don't build -pthread programs if NOLIBC_R'' and change 
that to NOLIBPTHREAD when item 3 is done, but I'd say it's better to 
encourage threads and not give these options.
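
The weak-symbol stubs Dan describes in the quoted text can be sketched
roughly as below.  This is only an illustration, assuming a GCC or Clang
style toolchain that accepts __attribute__((weak)); the names
sketch_lock_acquire, sketch_lock_release and sketch_locked_increment are
made up for the example and are not the real libc or libpthread symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the do-nothing stubs ran (for demonstration only). */
int sketch_stub_calls;

/*
 * Hypothetical lock hooks as libc might carry them: weak definitions
 * that do no locking.  A non-weak definition of the same name elsewhere
 * in the link (the role libpthread plays in the thread above) would be
 * chosen by the linker instead of these.
 */
__attribute__((weak)) int
sketch_lock_acquire(void *lock)
{
	(void)lock;
	sketch_stub_calls++;
	return (0);
}

__attribute__((weak)) int
sketch_lock_release(void *lock)
{
	(void)lock;
	sketch_stub_calls++;
	return (0);
}

/* A libc-style routine that brackets its work with the lock hooks. */
int
sketch_locked_increment(int *counter)
{
	assert(counter != NULL);
	sketch_lock_acquire(counter);
	(*counter)++;
	sketch_lock_release(counter);
	return (*counter);
}
```

With nothing else linked in, each call to sketch_locked_increment() goes
through the two stubs; the caller's code is the same whether or not a
real lock implementation is present.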

> -- 
> Dan Eischen

-- 
Brian <brian@Awfulhak.org>                        <brian@[uk.]FreeBSD.org>
      <http://www.Awfulhak.org>                   <brian@[uk.]OpenBSD.org>
Don't _EVER_ lose your sense of humour !




From owner-freebsd-arch  Thu Dec  7 13:55: 5 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 13:55:03 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from beastie.mckusick.com (tserver.conference.usenix.org [209.179.127.3])
	by hub.freebsd.org (Postfix) with ESMTP id 9A75F37B400
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 13:55:02 -0800 (PST)
Received: from beastie.mckusick.com (localhost [127.0.0.1])
	by beastie.mckusick.com (8.9.3/8.9.3) with ESMTP id NAA04017;
	Thu, 7 Dec 2000 13:54:43 -0800 (PST)
	(envelope-from mckusick@beastie.mckusick.com)
Message-Id: <200012072154.NAA04017@beastie.mckusick.com>
To: janb@cs.utep.edu
Subject: Re: Getting Kernel Process Information 
Cc: arch@FreeBSD.ORG
In-Reply-To: Your message of "Thu, 07 Dec 2000 14:46:32 MST."
             <Pine.GSO.4.30.0012071445580.12857-100000@gecko> 
Date: Thu, 07 Dec 2000 13:54:43 -0800
From: Kirk McKusick <mckusick@mckusick.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

	From: janb@cs.utep.edu
	Date: Thu, 7 Dec 2000 14:46:32 -0700 (MST)
	To: Kirk McKusick <mckusick@mckusick.com>
	cc: <arch@FreeBSD.ORG>
	Subject: Re: Getting Kernel Process Information
	In-Reply-To: <200012071938.LAA03622@beastie.mckusick.com>

	> to user processes. I recognize that there are better ways
	> to handle these issues. I am just trying to make an

	What are some of the better ways of handling the issue?

	JAn

See Terry Lambert's commentary on the exporting of the ucred
structure that appeared about a week ago on this list for a
good overview of the alternatives.

	Kirk McKusick




From owner-freebsd-arch  Thu Dec  7 13:59:42 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 13:59:41 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from hub.lovett.com (hub.lovett.com [216.60.121.161])
	by hub.freebsd.org (Postfix) with ESMTP id 7D49F37B400
	for <freebsd-arch@freebsd.org>; Thu,  7 Dec 2000 13:59:40 -0800 (PST)
Received: from ade by hub.lovett.com with local (Exim 3.16 #1)
	id 14494U-000EIF-00; Thu, 07 Dec 2000 15:59:22 -0600
Date: Thu, 7 Dec 2000 15:59:22 -0600
From: Ade Lovett <ade@FreeBSD.org>
To: Brian Somers <brian@Awfulhak.org>
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: Threads in the base system
Message-ID: <20001207155922.I46011@FreeBSD.org>
References: <eischen@vigrid.com> <200012072155.eB7Lt7G51311@hak.lan.Awfulhak.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200012072155.eB7Lt7G51311@hak.lan.Awfulhak.org>; from brian@Awfulhak.org on Thu, Dec 07, 2000 at 09:55:07PM +0000
Sender: ade@lovett.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, Dec 07, 2000 at 09:55:07PM +0000, Brian Somers wrote:
> 1.  Remove NOLIBC_R
> 2.  Eventually introduce libpthread
> 3.  Change all Makefiles that say -pthread to say -lpthread
> 4.  Blow away libc_r

With an OSVERSION bump at stage 3, so that the whole slew of
ports that currently mangle -lpthread to -pthread can DTRT
between 4.x and 5.x

Please? :)
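
The decision a port would make from such a bump can be sketched as
follows.  This is a hedged illustration only: the function name and the
version numbers used with it are hypothetical, and the actual cut-over
value would be whatever the bump at stage 3 turns out to be.

```c
#include <assert.h>

/*
 * Pick the thread link flag from an OSVERSION-style integer.
 * libpthread_osversion is the (hypothetical) value assigned when the
 * base system switches from -pthread to -lpthread.
 */
const char *
sketch_thread_link_flag(int osversion, int libpthread_osversion)
{
	assert(libpthread_osversion > 0);
	if (osversion >= libpthread_osversion)
		return ("-lpthread");
	return ("-pthread");
}
```

A port could then keep a single rule instead of mangling the flag
unconditionally.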

-aDe

-- 
Ade Lovett, Austin, TX.			ade@FreeBSD.org
FreeBSD: The Power to Serve		http://www.FreeBSD.org/




From owner-freebsd-arch  Thu Dec  7 14: 1:45 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 14:01:43 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id 41F8E37B400; Thu,  7 Dec 2000 14:01:43 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id RAA08109;
	Thu, 7 Dec 2000 17:01:19 -0500 (EST)
Date: Thu, 7 Dec 2000 17:01:18 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Brian Somers <brian@Awfulhak.org>
Cc: Brian Somers <brian@Awfulhak.org>,
	"Jacques A. Vidrine" <n@nectar.com>,
	Robert Watson <rwatson@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG,
	brian@Awfulhak.org
Subject: Re: Threads in the base system 
In-Reply-To: <200012072155.eB7Lt7G51311@hak.lan.Awfulhak.org>
Message-ID: <Pine.SUN.3.91.1001207165909.7490A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 7 Dec 2000, Brian Somers wrote:
> I'd tend to just say that we do it in these stages:
> 
> 1.  Remove NOLIBC_R
> 2.  Eventually introduce libpthread
> 3.  Change all Makefiles that say -pthread to say -lpthread
> 4.  Blow away libc_r
> 
> With whatever gap is required between each step.  We *could* replace 
> item 1 with ``don't build -pthread programs if NOLIBC_R'' and change 
> that to NOLIBPTHREAD when item 3 is done, but I'd say it's better to 
> encourage threads and not give these options.

OK, that's fine with me.

-- 
Dan Eischen




From owner-freebsd-arch  Thu Dec  7 15:23:46 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 15:23:44 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from Awfulhak.org (awfulhak.demon.co.uk [194.222.196.252])
	by hub.freebsd.org (Postfix) with ESMTP
	id A42A937B400; Thu,  7 Dec 2000 15:23:43 -0800 (PST)
Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12])
	by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB7NM0x25292;
	Thu, 7 Dec 2000 23:22:00 GMT
	(envelope-from brian@lan.awfulhak.org)
Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1])
	by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB7NOVG52190;
	Thu, 7 Dec 2000 23:24:31 GMT
	(envelope-from brian@hak.lan.Awfulhak.org)
Message-Id: <200012072324.eB7NOVG52190@hak.lan.Awfulhak.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: Ade Lovett <ade@FreeBSD.org>
Cc: Brian Somers <brian@Awfulhak.org>, freebsd-arch@FreeBSD.org,
	brian@Awfulhak.org
Subject: Re: Threads in the base system 
In-Reply-To: Message from Ade Lovett <ade@FreeBSD.org> 
   of "Thu, 07 Dec 2000 15:59:22 CST." <20001207155922.I46011@FreeBSD.org> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 07 Dec 2000 23:24:31 +0000
From: Brian Somers <brian@Awfulhak.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> On Thu, Dec 07, 2000 at 09:55:07PM +0000, Brian Somers wrote:
> > 1.  Remove NOLIBC_R
> > 2.  Eventually introduce libpthread
> > 3.  Change all Makefiles that say -pthread to say -lpthread
> > 4.  Blow away libc_r
> 
> With an OSVERSION bump at stage 3, so that the whole slew of
> ports that currently mangle -lpthread to -pthread can DTRT
> between 4.x and 5.x
> 
> Please? :)

I agree, maybe with another version bump at stage 2 and 4 too 
(version bumps are cheap).  It would also be nice if the ports 
were smart enough to probe for -lpthread's existence and DTRT on that 
basis.

> -aDe
> 
> -- 
> Ade Lovett, Austin, TX.			ade@FreeBSD.org
> FreeBSD: The Power to Serve		http://www.FreeBSD.org/

-- 
Brian <brian@Awfulhak.org>                        <brian@[uk.]FreeBSD.org>
      <http://www.Awfulhak.org>                   <brian@[uk.]OpenBSD.org>
Don't _EVER_ lose your sense of humour !




From owner-freebsd-arch  Thu Dec  7 19:35:52 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 19:35:50 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134])
	by hub.freebsd.org (Postfix) with ESMTP id 9E8BD37B400
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 19:35:42 -0800 (PST)
Received: (from daemon@localhost)
	by smtp04.primenet.com (8.9.3/8.9.3) id UAA15393;
	Thu, 7 Dec 2000 20:31:27 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp04.primenet.com, id smtpdAAAB0aW.D; Thu Dec  7 20:31:23 2000
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id UAA02960;
	Thu, 7 Dec 2000 20:35:26 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200012080335.UAA02960@usr08.primenet.com>
Subject: Re: Getting Kernel Process Information
To: bright@wintelcom.net (Alfred Perlstein)
Date: Fri, 8 Dec 2000 03:35:26 +0000 (GMT)
Cc: mckusick@mckusick.com (Kirk McKusick), arch@FreeBSD.ORG
In-Reply-To: <20001207115616.V16205@fw.wintelcom.net> from "Alfred Perlstein" at Dec 07, 2000 11:56:16 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: tlambert@usr08.primenet.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

[ ... reorg of proc structure to keep it from being a PITA ... ]

> I completely agree that should be done.  My suggestion is to
> completely rip out any kernel structs being passed through
> this interface, the reason is that we will need mutexes in
> a lot of them and we don't want to export that to userland.

Let me remind you that copying data out of /dev/kmem into user
space from structures like this is inherently MP-unsafe.

Without holding the mutex, you can not guarantee that the
structure contents will not change out from under the user
space process while it is in the middle of copying them out.

Ignoring the obvious things, like divide-by-zero errors, this
is mostly a problem for programs trying to do list traversal,
as opposed to particular data objects (unless they contain
pointers themselves).

Right now, the BGL protects us from this.

Please do not build a solution which will not work on MP
systems, once the BGL is removed.
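
The list-traversal hazard above, and the usual remedy, can be sketched
in userland terms as follows.  This is an analogy only: a pthread mutex
stands in for a kernel mutex, and the sketch_node and sketch_list types
are hypothetical.  The point is that the walker holds the lock for the
whole traversal and hands back plain values, so the caller never follows
a pointer that may be changing underneath it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct sketch_node {
	int			 pid;
	struct sketch_node	*next;
};

struct sketch_list {
	pthread_mutex_t		 lock;	/* protects head and every next link */
	struct sketch_node	*head;
};

/*
 * Copy up to max pids out of the list while holding its lock.
 * Returns the number of entries written to out.
 */
size_t
sketch_snapshot_pids(struct sketch_list *l, int *out, size_t max)
{
	struct sketch_node *p;
	size_t n = 0;

	assert(l != NULL && out != NULL);
	pthread_mutex_lock(&l->lock);
	for (p = l->head; p != NULL && n < max; p = p->next)
		out[n++] = p->pid;
	pthread_mutex_unlock(&l->lock);
	return (n);
}
```

A reader that walked the same list without taking the lock could follow
a next pointer just as another CPU unlinks that node.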


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-arch  Thu Dec  7 19:39:39 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 19:39:36 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 9D07637B402
	for <arch@FreeBSD.ORG>; Thu,  7 Dec 2000 19:39:34 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eB83dWV29351;
	Thu, 7 Dec 2000 19:39:32 -0800 (PST)
Date: Thu, 7 Dec 2000 19:39:32 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Terry Lambert <tlambert@primenet.com>
Cc: Kirk McKusick <mckusick@mckusick.com>, arch@FreeBSD.ORG
Subject: Re: Getting Kernel Process Information
Message-ID: <20001207193932.F16205@fw.wintelcom.net>
References: <20001207115616.V16205@fw.wintelcom.net> <200012080335.UAA02960@usr08.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200012080335.UAA02960@usr08.primenet.com>; from tlambert@primenet.com on Fri, Dec 08, 2000 at 03:35:26AM +0000
Sender: bright@fw.wintelcom.net
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Terry Lambert <tlambert@primenet.com> [001207 19:35] wrote:
> [ ... reorg of proc structure to keep it from being a PITA ... ]
> 
> > I completely agree that should be done.  My suggestion is to
> > completely rip out any kernel structs being passed through
> > this interface, the reason is that we will need mutexes in
> > a lot of them and we don't want to export that to userland.
> 
> Let me remind you that copying data out of /dev/kmem into user
> space from structures like this is inherently MP-unsafe.
> 
> Without holding the mutex, you can not guarantee that the
> structure contents will not change out from under the user
> space process while it is in the middle of copying them out.
> 
> Ignoring the obvious things, like divide-by-zero errors, this
> is mostly a problem for programs trying to do list traversal,
> as opposed to particular data objects (unless they contain
> pointers themselves).
> 
> Right now, the BGL protects us from this.
> 
> Please do not build a solution which will not work on MP
> systems, once the BGL is removed.

I agree with you; however, Kirk's idea doesn't make this impossible.
We can later have a sysctl that (for this case) looks up and locks
the proc, then copies it out in eproc (or whatever it's called)
format with proper locking.

One step at a time. :)

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."




From owner-freebsd-arch  Thu Dec  7 20:57:20 2000
From owner-freebsd-arch@FreeBSD.ORG  Thu Dec  7 20:57:19 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247])
	by hub.freebsd.org (Postfix) with ESMTP id CCDFD37B401
	for <arch@freebsd.org>; Thu,  7 Dec 2000 20:57:18 -0800 (PST)
Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1])
	by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with ESMTP id eB84utC84096;
	Fri, 8 Dec 2000 13:56:57 +0900 (JST)
Date: Fri, 08 Dec 2000 13:56:55 +0900
Message-ID: <vmlmtr8q94.wl@rina.r.dl.itc.u-tokyo.ac.jp>
From: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
To: oppermann@telehouse.ch
Cc: tanimura@r.dl.itc.u-tokyo.ac.jp, arch@freebsd.org
Subject: Re: Even 1GB KVA is not enough, but we have no more space
In-Reply-To: In your message of "Thu, 07 Dec 2000 18:17:59 +0100"
	<3A2FC647.6EC4FFA7@telehouse.ch>
References: <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<vmr93k8tqe.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<3A2FC647.6EC4FFA7@telehouse.ch>
User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd)
Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo
MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu")
Content-Type: text/plain; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 07 Dec 2000 18:17:59 +0100,
  Andre Oppermann <oppermann@telehouse.ch> said:

>> tanimura@stella% vmstat -z
>> 
>> ZONE            used    total   pages   mem-use
>> PIPE            4       102     -1         0/15K
>> SWAPMETA        0       0       15078      0/0K
>> tcpcb           24      35      4624      12/18K
(snip)
>> --------------------------------------------------
>> TOTAL                                   8058/25633K

Andre> Wow, that looks good! Far easier than the other stuff.

Remark: -1 in pages means that you cannot allocate items from this
zone.

-- 
Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp> <tanimura@FreeBSD.org>




From owner-freebsd-arch  Fri Dec  8  4:13:40 2000
From owner-freebsd-arch@FreeBSD.ORG  Fri Dec  8 04:13:37 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP id 20F5537B400
	for <arch@freebsd.org>; Fri,  8 Dec 2000 04:13:37 -0800 (PST)
Received: from kairo-51.budapest.interware.hu ([195.70.50.115] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 144MP7-0006vQ-00; Fri, 08 Dec 2000 13:13:33 +0100
Sender: julian@FreeBSD.ORG
Message-ID: <3A3000F6.52E9B1D9@elischer.org>
Date: Thu, 07 Dec 2000 13:28:22 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Kirk McKusick <mckusick@mckusick.com>
Cc: arch@freebsd.org
Subject: Re: Getting Kernel Process Information
References: <200012071938.LAA03622@beastie.mckusick.com>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Kirk McKusick wrote:
> 
> For the third time in a week, I got the following message when I
> tried to run ps on my 5.X system:
> 
>         proc size mismatch (39776 total, 1136 chunks)
> 
> This message arises when the size of the proc structure changes.
> With the current SMP development, the proc structure changes at
> a very high rate of speed. The current kinfo_proc interface used
> between the kernel and user processes is built from two pieces:
> 
>         struct kinfo_proc {
>                 struct proc kp_proc;
>                 struct eproc kp_eproc;
>         }
> 
> Kinfo_proc contains a copy of the kernel's proc structure
> followed by an `extended' proc structure which has lots
> of bits and pieces that have moved out of the proc structure
> or are otherwise needed. Any change to the kernel's version
> of the proc structure changes the size of the kinfo_proc
> structure and hence causes a mismatch when attempts are made
> to copy it out.
> 
> I propose to change the kinfo_proc structure. The new
> kinfo_proc structure will contain only the stylized `extended'
> proc structure which will be augmented with the twenty
> fields that are actually referenced from the proc structure
> by user processes. By taking this approach, changes to the
> proc structure will not affect the format or size of the
> kinfo_proc structure returned to user processes. The new
> `extended' proc structure will have plenty of spare fields
> added to its end so that when new fields are added to the
> proc structure that user-level processes need/want to know
> about, they can be added without changing the size of the
> exported kinfo_proc structure and thus will not require
> recompilation of the dozen or so programs that use the
> exported interface. Note that even if 200 spare bytes are
> added to the kinfo_proc structure, it will still be smaller
> than the current one.

A good idea.
I would like to add that if we get our way and split struct proc
into:
1/ struct proc
2/ schedulable entity
3/ sleepable entity
4/ (possibly a linking structure for the above)
then all this would have to change anyhow. It seems possible
that your change might insulate us from the pain of that
happening.

When is the information copied from the proc structure into the
kinfo_proc structure?

In the case of the threaded split world we are considering,
some of the numbers would be totals from the process's
schedulable entities.


> 
> Note that I am proposing to make this change only in the
> 5.X tree. I am not proposing that it be back ported to the
> 4.X tree.
> 
> I am not interested in starting a long discussion on all
> the possible alternatives for exporting kernel information
> to user processes. I recognize that there are better ways
> to handle these issues. I am just trying to make an
> incremental change that is small in scope and hopefully
> will make an annoying problem significantly less common.
> With this caveat, comments are solicited.

go for it
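
For illustration, an exported structure along the lines Kirk proposes
might look something like the sketch below.  All names here
(sketch_kinfo, sk_structsize, SKETCH_NSPARE and so on) are hypothetical
and the field list is far shorter than the twenty or so fields he
mentions; it only shows the idea of a layout that is independent of
struct proc, carries spare room at the end, and lets a consumer check
the size it was handed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	SKETCH_NSPARE	16	/* hypothetical amount of reserved room */

struct sketch_kinfo {
	int	sk_structsize;		/* size as seen by the producer */
	int	sk_pid;
	int	sk_ppid;
	char	sk_comm[20];
	long	sk_spare[SKETCH_NSPARE]; /* room for future fields */
};

/* Producer side: fill a zeroed record and stamp it with its size. */
void
sketch_kinfo_init(struct sketch_kinfo *ki, int pid, int ppid,
    const char *comm)
{
	assert(ki != NULL && comm != NULL);
	memset(ki, 0, sizeof(*ki));
	ki->sk_structsize = (int)sizeof(*ki);
	ki->sk_pid = pid;
	ki->sk_ppid = ppid;
	/* The memset above guarantees NUL termination. */
	strncpy(ki->sk_comm, comm, sizeof(ki->sk_comm) - 1);
}

/* Consumer side: nonzero if the record matches our idea of the layout. */
int
sketch_kinfo_compatible(const struct sketch_kinfo *ki)
{
	return (ki->sk_structsize == (int)sizeof(*ki));
}
```

A new field would be carved out of sk_spare, so sizeof(struct
sketch_kinfo) and the offsets of the existing fields stay put and old
binaries keep working.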


> 
>         Kirk McKusick
> 

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v




From owner-freebsd-arch  Fri Dec  8  6:22: 7 2000
From owner-freebsd-arch@FreeBSD.ORG  Fri Dec  8 06:22:04 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from point.osg.gov.bc.ca (point.osg.gov.bc.ca [142.32.102.44])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0D90837B401; Fri,  8 Dec 2000 06:22:04 -0800 (PST)
Received: (from daemon@localhost)
	by point.osg.gov.bc.ca (8.8.7/8.8.8) id GAA24826;
	Fri, 8 Dec 2000 06:21:46 -0800
Received: from passer.osg.gov.bc.ca(142.32.110.29)
 via SMTP by point.osg.gov.bc.ca, id smtpda24823; Fri Dec  8 06:21:39 2000
Received: (from uucp@localhost)
	by passer.osg.gov.bc.ca (8.11.1/8.9.1) id eB8ELN666899;
	Fri, 8 Dec 2000 06:21:23 -0800 (PST)
Received: from cwsys9.cwsent.com(10.2.2.1), claiming to be "cwsys.cwsent.com"
 via SMTP by passer9.cwsent.com, id smtpdI66893; Fri Dec  8 06:20:46 2000
Received: (from uucp@localhost)
	by cwsys.cwsent.com (8.11.1/8.9.1) id eB8EKfN82161;
	Fri, 8 Dec 2000 06:20:41 -0800 (PST)
Message-Id: <200012081420.eB8EKfN82161@cwsys.cwsent.com>
Received: from localhost.cwsent.com(127.0.0.1), claiming to be "cwsys"
 via SMTP by localhost.cwsent.com, id smtpdq82155; Fri Dec  8 06:20:20 2000
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
Reply-To: Cy Schubert - ITSD Open Systems Group <Cy.Schubert@uumail.gov.bc.ca>
From: Cy Schubert - ITSD Open Systems Group <Cy.Schubert@uumail.gov.bc.ca>
X-OS: FreeBSD 4.2-RELEASE
X-Sender: cy
To: "Michael C . Wu" <keichii@peorth.iteration.net>
Cc: Peter Jeremy <peter.jeremy@alcatel.com.au>,
	Poul-Henning Kamp <phk@FreeBSD.ORG>, arch@FreeBSD.ORG
Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486... 
In-reply-to: Your message of "Thu, 30 Nov 2000 21:47:45 CST."
             <20001130214745.E28757@peorth.iteration.net> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 08 Dec 2000 06:20:20 -0800
Sender: cy@uumail.gov.bc.ca
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <20001130214745.E28757@peorth.iteration.net>, "Michael C . Wu" writes:
> On Fri, Dec 01, 2000 at 10:29:15AM +1100, Peter Jeremy scribbled:
> | On 2000-Nov-14 15:08:06 +0100, Poul-Henning Kamp <phk@FreeBSD.ORG> wrote:
> | >Has anybody run a 486 or 386 under current recently ?
> |
> | X on a PRE_SMPNG 486 is painful - mouse movements no longer make
> | the X pointer move in real time.  I haven't noticed the seeding
> | issue (probably just luck).
> 
> PRE_SMPNG does not have the /dev/random seeding issue.
> 
> You actually expected X to run well on a 486? :-)
> 
> | >What is the consensus ?
> |
> | I think 386/486 remains a significant market and would not like to
> | see support dropped.  I'd go so far as to suggest that if -current
> | does drop support for the 386/486, the then-stable version will need
> | to be actively maintained indefinitely to provide continued support.
> 
> I do not really think the latest XFree86 versions were designed
> with running 386/486 in mind. 386/486 is still a market, but
> not many people try to build an embedded system with a full X
> and tools.

Interesting.  At home I use a 486DX33 as an X terminal.  As long as I 
run all of my X clients, including the window manager, on my server, a 
P120, performance is quite acceptable.


Regards,                       Phone:  (250)387-8437
Cy Schubert                      Fax:  (250)387-5766
Team Leader, Sun/DEC Team   Internet:  Cy.Schubert@osg.gov.bc.ca
Open Systems Group, ITSD, ISTA
Province of BC




From owner-freebsd-arch  Fri Dec  8  7: 9:36 2000
From owner-freebsd-arch@FreeBSD.ORG  Fri Dec  8 07:09:34 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from peach.ocn.ne.jp (peach.ocn.ne.jp [210.145.254.87])
	by hub.freebsd.org (Postfix) with ESMTP id B4A3B37B402
	for <arch@FreeBSD.ORG>; Fri,  8 Dec 2000 07:09:33 -0800 (PST)
Received: from newsguy.com (p60-dn01kiryunisiki.gunma.ocn.ne.jp [211.0.245.61])
	by peach.ocn.ne.jp (8.9.1a/OCN/) with ESMTP id AAA26083;
	Sat, 9 Dec 2000 00:09:26 +0900 (JST)
Message-ID: <3A30E21F.846E3863@newsguy.com>
Date: Fri, 08 Dec 2000 22:29:03 +0900
From: "Daniel C. Sobral" <dcs@newsguy.com>
X-Mailer: Mozilla 4.7 [en] (Win98; I)
X-Accept-Language: en,pt-BR
MIME-Version: 1.0
To: janb@cs.utep.edu
Cc: Kirk McKusick <mckusick@mckusick.com>, arch@FreeBSD.ORG
Subject: Re: Getting Kernel Process Information
References: <Pine.GSO.4.30.0012071445580.12857-100000@gecko>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

janb@cs.utep.edu wrote:
> 
> > to user processes. I recognize that there are better ways
> > to handle these issues. I am just trying to make an
> 
> What are some of the better ways of handling the issue?

Userland kobj.

-- 
Daniel C. Sobral			(8-DCS)
dcs@newsguy.com
dcs@freebsd.org
capo@the.great.underground.bsdconpiracy.org

		"The bronze landed last, which canceled that method of
		impartial choice."




From owner-freebsd-arch  Fri Dec  8  9:38:52 2000
From owner-freebsd-arch@FreeBSD.ORG  Fri Dec  8 09:38:50 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from beastie.mckusick.com (tserver.conference.usenix.org [209.179.127.3])
	by hub.freebsd.org (Postfix) with ESMTP id E98BA37B400
	for <arch@FreeBSD.ORG>; Fri,  8 Dec 2000 09:38:49 -0800 (PST)
Received: from beastie.mckusick.com (localhost [127.0.0.1])
	by beastie.mckusick.com (8.9.3/8.9.3) with ESMTP id JAA04933;
	Fri, 8 Dec 2000 09:38:47 -0800 (PST)
	(envelope-from mckusick@beastie.mckusick.com)
Message-Id: <200012081738.JAA04933@beastie.mckusick.com>
To: Alfred Perlstein <bright@wintelcom.net>
Subject: Re: Getting Kernel Process Information 
Cc: Terry Lambert <tlambert@primenet.com>, arch@FreeBSD.ORG
In-Reply-To: Your message of "Thu, 07 Dec 2000 19:39:32 PST."
             <20001207193932.F16205@fw.wintelcom.net> 
Date: Fri, 08 Dec 2000 09:38:47 -0800
From: Kirk McKusick <mckusick@mckusick.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

	Date: Thu, 7 Dec 2000 19:39:32 -0800
	From: Alfred Perlstein <bright@wintelcom.net>
	To: Terry Lambert <tlambert@primenet.com>
	Cc: Kirk McKusick <mckusick@mckusick.com>, arch@FreeBSD.ORG
	Subject: Re: Getting Kernel Process Information

	* Terry Lambert <tlambert@primenet.com> [001207 19:35] wrote:
	> [ ... reorg of proc structure to keep it from being a PITA ... ]
	> 
	> > I completely agree that should be done.  My suggestion is to
	> > completely rip out any kernel structs being passed through
	> > this interface, the reason is that we will need mutexes in
	> > a lot of them and we don't want to export that to userland.
	> 
	> Let me remind you that copying data out of /dev/kmem into user
	> space from structures like this is inherently MP-unsafe.
	> 
	> Without holding the mutex, you can not guarantee that the
	> structure contents will not change out from under the user
	> space process while it is in the middle of copying them out.
	> 
	> Ignoring the obvious things, like divide-by-zero errors, this
	> is mostly a problem for programs trying to do list traversal,
	> as opposed to particular data objects (unless they contain
	> pointers themselves).
	> 
	> Right now, the BGL protects us from this.
	> 
	> Please do not build a solution which will not work on MP
	> systems, once the BGL is removed.

	I agree with you; however, Kirk's idea doesn't make this impossible.
	We can later have a sysctl that (for this case) looks up and locks
	the proc, then copies it out in eproc (or whatever it's called)
	format with proper locking.

	One step at a time. :)

	-- 
	-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
	"I have the heart of a child; I keep it in a jar on my desk."

We already use sysctl to get the proc information out of the kernel.
The traversal of the proc entry to gather up the information is done
by the function fill_kinfo_proc in kern/kern_proc.c, so any and all
locking that needs to be done can be done there. The libkvm code uses
sysctl to get the desired proc entries when running on a live system.
It also knows how to grub through a crash dump, essentially duplicating
the work of fill_kinfo_proc, but that path is not intended to be used
on live kernels.
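
As a rough userland sketch of doing the locking in the fill routine
(this is not the code in kern/kern_proc.c; the structures, field names
and sketch_fill_export are invented for the example, and a pthread mutex
stands in for whatever lock ends up protecting the proc entry):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical internal structure; free to change shape at any time. */
struct sketch_proc {
	pthread_mutex_t	 p_lock;	/* protects the fields below */
	int		 p_pid;
	int		 p_nice;
	void		*p_private;	/* internal only, never exported */
};

/* Hypothetical exported record with a stable layout. */
struct sketch_export {
	int	e_pid;
	int	e_nice;
	int	e_spare[8];
};

/*
 * Translate the internal structure into the exported one.  The lock is
 * held only while the fields are read, so the caller gets a consistent
 * snapshot and never sees the mutex or the internal pointers.
 */
void
sketch_fill_export(struct sketch_proc *p, struct sketch_export *e)
{
	assert(p != NULL && e != NULL);
	memset(e, 0, sizeof(*e));
	pthread_mutex_lock(&p->p_lock);
	e->e_pid = p->p_pid;
	e->e_nice = p->p_nice;
	pthread_mutex_unlock(&p->p_lock);
}
```

Since the exported record is built field by field, adding a mutex or
anything else to the internal structure does not change what user
processes see.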

	Kirk




From owner-freebsd-arch  Fri Dec  8  9:53:31 2000
From owner-freebsd-arch@FreeBSD.ORG  Fri Dec  8 09:53:30 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from rover.village.org (rover.village.org [204.144.255.66])
	by hub.freebsd.org (Postfix) with ESMTP
	id E0A7537B400; Fri,  8 Dec 2000 09:53:25 -0800 (PST)
Received: from harmony.village.org (harmony.village.org [10.0.0.6])
	by rover.village.org (8.11.0/8.11.0) with ESMTP id eB8HrKs51907;
	Fri, 8 Dec 2000 10:53:20 -0700 (MST)
	(envelope-from imp@harmony.village.org)
Received: from harmony.village.org (localhost.village.org [127.0.0.1]) by harmony.village.org (8.9.3/8.8.3) with ESMTP id KAA14232; Fri, 8 Dec 2000 10:53:20 -0700 (MST)
Message-Id: <200012081753.KAA14232@harmony.village.org>
To: Cy Schubert - ITSD Open Systems Group <Cy.Schubert@uumail.gov.bc.ca>
Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486... 
Cc: "Michael C . Wu" <keichii@peorth.iteration.net>,
	Peter Jeremy <peter.jeremy@alcatel.com.au>,
	Poul-Henning Kamp <phk@FreeBSD.ORG>, arch@FreeBSD.ORG
In-reply-to: Your message of "Fri, 08 Dec 2000 06:20:20 PST."
		<200012081420.eB8EKfN82161@cwsys.cwsent.com> 
References: <200012081420.eB8EKfN82161@cwsys.cwsent.com>  
Date: Fri, 08 Dec 2000 10:53:20 -0700
From: Warner Losh <imp@village.org>
Sender: imp@harmony.village.org
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <200012081420.eB8EKfN82161@cwsys.cwsent.com> Cy Schubert - ITSD Open Systems Group writes:
: > I do not really think the latest XFree86 versions were designed
: > with running 386/486 in mind. 386/486 is still a market, but
: > not many people try to build an embedded system with a full X
: > and tools.
: 
: Interesting.  At home I use a 486DX33 as an X terminal.  As long as I 
: run all of my X clients, including the window manager on my server, a 
: P120, performance is quite acceptable.

We run X on our embedded product on 486 class machines.  It still
works fairly well.

Warner




From owner-freebsd-arch  Fri Dec  8 11: 3:22 2000
From owner-freebsd-arch@FreeBSD.ORG  Fri Dec  8 11:03:21 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222])
	by hub.freebsd.org (Postfix) with ESMTP id 18CDC37B401
	for <arch@FreeBSD.ORG>; Fri,  8 Dec 2000 11:03:21 -0800 (PST)
Received: from foo.osd.bsdi.com (root@foo.osd.bsdi.com [204.216.28.137])
	by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eB8J2B760003;
	Fri, 8 Dec 2000 11:02:11 -0800 (PST)
	(envelope-from jhb@foo.osd.bsdi.com)
Received: (from jhb@localhost)
	by foo.osd.bsdi.com (8.11.1/8.11.0) id eB8J2DU75205;
	Fri, 8 Dec 2000 11:02:13 -0800 (PST)
	(envelope-from jhb)
Message-ID: <XFMail.001208110213.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <3A3000F6.52E9B1D9@elischer.org>
Date: Fri, 08 Dec 2000 11:02:13 -0800 (PST)
Organization: BSD, Inc.
From: John Baldwin <jhb@FreeBSD.ORG>
To: Julian Elischer <julian@elischer.org>
Subject: Re: Getting Kernel Process Information
Cc: arch@FreeBSD.ORG, Kirk McKusick <mckusick@mckusick.com>
Sender: jhb@foo.osd.bsdi.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> A good idea.
> I would like to add that if we get our way and split
> struct proc
> into :
> 1/ struct proc
> 2/ schedulable entity
> 3/ Sleepable entity
> 4/ (possibly a linking structure for the above)
> then all this would have to change anyhow. It seems possible 
> that your change might insulate us from the pain of that 
> happening.

It will do this (insulation).

> When is the information copied from the proc structure into the
> kinfo_proc structure?

fill_eproc().  You can at that time decide what you need to stuff in
each eproc structure.

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




From owner-freebsd-arch  Fri Dec  8 21:25:19 2000
From owner-freebsd-arch@FreeBSD.ORG  Fri Dec  8 21:25:17 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247])
	by hub.freebsd.org (Postfix) with ESMTP id 07CAA37B400
	for <arch@FreeBSD.ORG>; Fri,  8 Dec 2000 21:25:17 -0800 (PST)
Received: (from uucp@localhost)
	by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with UUCP id eB95PES21947;
	Sat, 9 Dec 2000 14:25:14 +0900 (JST)
Received: from silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1])
	by silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W) with ESMTP id eB95O6t42800;
	Sat, 9 Dec 2000 14:24:07 +0900 (JST)
Date: Sat, 09 Dec 2000 14:24:06 +0900
Message-ID: <86elzi88w9.wl@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
From: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
To: dillon@earth.backplane.com
Cc: tanimura@r.dl.itc.u-tokyo.ac.jp, bright@wintelcom.net,
	arch@FreeBSD.ORG
Subject: Re: Even 1GB KVA is not enough, but we have no more space
In-Reply-To: In your message of "Thu, 7 Dec 2000 11:52:33 -0800 (PST)"
	<200012071952.eB7JqXm11711@earth.backplane.com>
References: <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<vmr93k8tqe.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<20001207013611.O16205@fw.wintelcom.net>
	<vmpuj48svm.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<20001207015651.P16205@fw.wintelcom.net>
	<vmofyo8ran.wl@rina.r.dl.itc.u-tokyo.ac.jp>
	<200012071952.eB7JqXm11711@earth.backplane.com>
User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd)
Organization: Carrots
MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu")
Content-Type: text/plain; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 7 Dec 2000 11:52:33 -0800 (PST),
  Matt Dillon <dillon@earth.backplane.com> said:

Matt>     In any case, your stopgap patch seems reasonable in concept until we
Matt>     can come up with a better solution.

And the saga continues. After regulating the size of struct swblock,
ffs_vget() failed to allocate a new vnode. At the time the PowerEdge
failed, the kernel held around 197K vnodes, which amounts to about
46MB. This time I reduced the size of kmem_map, the pool of malloc(9).

Although we reserve at most 200MB + mbufs + mbuf clusters for
kmem_map, most of the space is not likely to be in use. For example,
the bottom line of vmstat -m on the PowerEdge said that only 32MB out
of 200MB was used by malloc(9).
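
The figures quoted above can be checked with a few lines of
arithmetic. This is only a sketch: it reads "197K" as 197,000 and
"MB" as 2^20 bytes, both of which are assumptions, and the helper
names are made up for the example.

```python
MB = 1024 * 1024  # assumption: "MB" in the message means 2**20 bytes


def bytes_per_object(total_bytes, count):
    """Average memory cost per object."""
    return total_bytes / count


def utilization(used_bytes, reserved_bytes):
    """Fraction of a reserved pool that is actually in use."""
    return used_bytes / reserved_bytes


# 46MB spread over roughly 197K vnodes (197K read as 197,000).
vnode_bytes = bytes_per_object(46 * MB, 197_000)

# 32MB in use out of the 200MB reserved for kmem_map.
kmem_used = utilization(32 * MB, 200 * MB)

print(f"approx. bytes per vnode: {vnode_bytes:.0f}")
print(f"kmem_map utilization:    {kmem_used:.0%}")
```

Taken at face value, the numbers give a per-vnode cost in the low
hundreds of bytes and a pool that is mostly idle, which is the basis
for the argument that follows.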

Counting the actual usage, 100MB should be enough for the malloc(9)
pool. Since malloc(9) always wires down allocated pages, you should
allocate memory by malloc(9) only if the size of memory to allocate is
constant; otherwise you would always have to consider how many wirable
pages a user has. Hence it makes no sense to simply scale up the
malloc(9) pool size, only to leave free entries in kmem_map wasted and
unreusable. Memory for the device driver framework is a good example
of malloc(9) usage because we are not likely to scale up the number of
cards on a motherboard in 5, 10 or 20 years.

Scaling up is something more than just scaling up parameters. If you
see a ceiling, you have to watch your head.

-- 
Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp> <tanimura@FreeBSD.org>




From owner-freebsd-arch  Sat Dec  9 12: 3: 2 2000
From owner-freebsd-arch@FreeBSD.ORG  Sat Dec  9 12:03:00 2000
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178])
	by hub.freebsd.org (Postfix) with ESMTP id 7295337B400
	for <arch@FreeBSD.ORG>; Sat,  9 Dec 2000 12:02:59 -0800 (PST)
Received: from implode.root.com (localhost [127.0.0.1])
	by implode.root.com (8.8.8/8.8.5) with ESMTP id LAA10626;
	Sat, 9 Dec 2000 11:54:39 -0800 (PST)
Message-Id: <200012091954.LAA10626@implode.root.com>
To: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
Cc: arch@FreeBSD.ORG
Subject: Re: Even 1GB KVA is not enough, but we have no more space 
In-reply-to: Your message of "Thu, 07 Dec 2000 18:21:04 +0900."
             <vmsno08u4f.wl@rina.r.dl.itc.u-tokyo.ac.jp> 
From: David Greenman <dg@root.com>
Reply-To: dg@root.com
Date: Sat, 09 Dec 2000 11:54:39 -0800
Sender: dg@implode.root.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>extend our KVA any further. (I understand that 1GB is the upper limit
>of KVA on i386, am I right?)

   No, there isn't any such limit, short of giving nearly all 4GB of the virtual
address space to the kernel. freesoftware.com and cdrom.com both run with 2GB of
KVA space.

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.

