From owner-freebsd-arch Sun Dec 3 11:55:53 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 11:55:51 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id 7B9D037B400 for ; Sun, 3 Dec 2000 11:55:51 -0800 (PST) Received: (from daemon@localhost) by smtp02.primenet.com (8.9.3/8.9.3) id MAA26115; Sun, 3 Dec 2000 12:43:44 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp02.primenet.com, id smtpdAAA7NaG.Y; Sun Dec 3 12:43:37 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id MAA29745; Sun, 3 Dec 2000 12:48:05 -0700 (MST) From: Terry Lambert Message-Id: <200012031948.MAA29745@usr05.primenet.com> Subject: Re: Modifying FILE to add lock To: arch@FreeBSD.ORG Date: Sun, 3 Dec 2000 19:48:05 +0000 (GMT) Cc: marcel@cup.hp.com In-Reply-To: <200012011811.eB1IBqY01763@vashon.polstra.com> from "John Polstra" at Dec 01, 2000 10:11:52 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: tlambert@usr05.primenet.com Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > The #1 biggest hassle with the Modula-3 stuff is that it has > Modula-3 versions of all of the system structures, and they have to > match exactly for things to work. Some day I swear I'm going to > work out a way to generate the M3 versions automatically from the > header files in /usr/include ... It's reasonable to think about a description language from which C/C++, Modula, Ada, Perl, and other header file types could be post-processed from. Perl already has a kludge for generating Perl constructs from C/C++ constructs, so if you wanted to kludge it instead, that would be a reasonable starting point... Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Dec 3 12: 6:25 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 12:06:23 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id C2BAA37B400; Sun, 3 Dec 2000 12:06:22 -0800 (PST) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id NAA07008; Sun, 3 Dec 2000 13:02:59 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp05.primenet.com, id smtpdAAA5oaWQn; Sun Dec 3 13:02:58 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id NAA00585; Sun, 3 Dec 2000 13:06:02 -0700 (MST) From: Terry Lambert Message-Id: <200012032006.NAA00585@usr05.primenet.com> Subject: Re: zero copy code review To: dg@root.com Date: Sun, 3 Dec 2000 20:06:01 +0000 (GMT) Cc: gallatin@cs.duke.edu (Andrew Gallatin), bmilekic@technokratis.com (Bosko Milekic), ken@kdm.org (Kenneth D. Merry), arch@FreeBSD.ORG, alfred@FreeBSD.ORG In-Reply-To: <200012012326.PAA14154@implode.root.com> from "David Greenman" at Dec 01, 2000 03:26:19 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: tlambert@usr05.primenet.com Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > > In your code, you do deal with the possibility of the MGETHDR > > > returning NULL (you check for it) and you set ENOBUFS in that case and > > > jump to the "errorpath" label. But, before using MGETHDR, you allocate an > > > sf_buf (in sf) and it just so happens that the code beyond "errorpath" > > > does not take care of freeing the sf_buf you allocated before even > > > trying to allocate the mbuf. > > > >I see your point. This was copied, (bug for bug ;-), from sendfile itself. 
> >Look at line 1700 or so of kern/uipc_syscalls.c.. This bug should
> >probably be fixed there too..
>
> Oops. The original assumption (and code that I wrote) was that M_WAIT
> _cannot_ return a NULL pointer. This was changed in FreeBSD recently, and
> as you mentioned, the code added in rev 1.65 that now checks for it in
> sendfile doesn't do complete cleanup in this case. It definitely should
> be fixed so that the sf_buf is freed as well.

There's a real easy fix for this:

/* Retry until m_get() succeeds whenever the caller asked to wait. */
struct mbuf *
m_get_not_broken(int flag, int type)
{
    struct mbuf *m;

    do {
        m = m_get(flag, type);
    } while (flag == M_WAIT && m == NULL);

    return (m);
}

I think the idea that the M_WAIT flag should be broken so that
it can be safely used in interrupt mode is dumb.


Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-arch Sun Dec 3 12:12:42 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 12:12:41 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from feral.com (feral.com [192.67.166.1]) by hub.freebsd.org (Postfix) with ESMTP id 039FB37B400; Sun, 3 Dec 2000 12:12:41 -0800 (PST) Received: from zeppo.feral.com (IDENT:mjacob@zeppo [192.67.166.71]) by feral.com (8.9.3/8.9.3) with ESMTP id MAA13403; Sun, 3 Dec 2000 12:12:35 -0800 Date: Sun, 3 Dec 2000 12:12:30 -0800 (PST) From: Matthew Jacob Reply-To: mjacob@feral.com To: Terry Lambert Cc: dg@root.com, Andrew Gallatin , Bosko Milekic , "Kenneth D.
Merry" , arch@FreeBSD.ORG, alfred@FreeBSD.ORG Subject: Re: zero copy code review In-Reply-To: <200012032006.NAA00585@usr05.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > I think the idea that the M_WAIT flag should be broken so that > it can be safely used in interrupt mode is dumb. d'accord. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Dec 3 13: 2:26 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 13:02:24 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id D527537B400 for ; Sun, 3 Dec 2000 13:02:23 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id PAA28978; Sun, 3 Dec 2000 15:52:02 -0500 (EST) Date: Sun, 3 Dec 2000 15:52:02 -0500 (EST) From: Daniel Eischen To: Terry Lambert Cc: arch@FreeBSD.ORG, marcel@cup.hp.com Subject: Re: Modifying FILE to add lock In-Reply-To: <200012031948.MAA29745@usr05.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sun, 3 Dec 2000, Terry Lambert wrote: > > The #1 biggest hassle with the Modula-3 stuff is that it has > > Modula-3 versions of all of the system structures, and they have to > > match exactly for things to work. Some day I swear I'm going to > > work out a way to generate the M3 versions automatically from the > > header files in /usr/include ... > > It's reasonable to think about a description language from > which C/C++, Modula, Ada, Perl, and other header file types > could be post-processed from. 
>
> Perl already has a kludge for generating Perl constructs from
> C/C++ constructs, so if you wanted to kludge it instead, that
> would be a reasonable starting point...

Having done the Ada port, I can say that the only system structures
that cause problems are those that can't be/aren't created by system
calls/library routines. Those are the _only_ things that _should_
cause problems; if there are others, then the implementation (of the
affected language/application) is flawed.

The signal set changes caused a big impact because they (signal sets)
aren't created by library routines, and they are parameters in some
very common routines/syscalls as well as being part of struct
sigaction, jmp_buf, and ucontext_t (which are also interfaced to by
multi-threaded languages). I'd also imagine that struct timezone or
timeval changes would have a similar impact.

But back to FILE and DIR changes, I seriously doubt that any of our
language ports would be affected by these being changed.

--
Dan Eischen

From owner-freebsd-arch Sun Dec 3 13:19:52 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 13:19:50 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.226]) by hub.freebsd.org (Postfix) with ESMTP id 4983837B400 for ; Sun, 3 Dec 2000 13:19:50 -0800 (PST) Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30]) by palrel3.hp.com (Postfix) with ESMTP id 91CDE44F; Sun, 3 Dec 2000 13:19:49 -0800 (PST) Received: from cup.hp.com (p1000180.nsr.hp.com [15.109.0.180]) by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id NAA29898; Sun, 3 Dec 2000 13:19:49 -0800 (PST) Sender: marcel@cup.hp.com Message-ID: <3A2AB8F4.DE04AD9D@cup.hp.com> Date: Sun, 03 Dec 2000 13:19:48 -0800 From: Marcel Moolenaar Organization: Hewlett-Packard X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en MIME-Version: 1.0 To: Daniel Eischen Cc: arch@FreeBSD.ORG Subject: Re: Modifying FILE to add lock References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Daniel Eischen wrote: > > But back to FILE and DIR changes, I seriously doubt that any of our > language ports would be affected by these being changed. To conclude: o Appending the new field has the least impact, o Any impact is expected to be marginal or trivially fixed. Go for it! -- Marcel Moolenaar mail: marcel@cup.hp.com / marcel@FreeBSD.org tel: (408) 447-4222 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Dec 3 13:25:46 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 13:25:44 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from field.videotron.net (field.videotron.net [205.151.222.108]) by hub.freebsd.org (Postfix) with ESMTP id B1CF437B400 for ; Sun, 3 Dec 2000 13:25:43 -0800 (PST) Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213]) by field.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G5000NB5GUP0H@field.videotron.net> for arch@FreeBSD.ORG; Sun, 3 Dec 2000 16:25:38 -0500 (EST) Date: Sun, 03 Dec 2000 16:26:26 -0500 (EST) From: Bosko Milekic Subject: Re: zero copy code review In-reply-to: <200012032006.NAA00585@usr05.primenet.com> To: Terry Lambert Cc: dg@root.com, Andrew Gallatin , "Kenneth D. Merry" , arch@FreeBSD.ORG Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sun, 3 Dec 2000, Terry Lambert wrote: [...] 
> There's a real easy fix for this:
>
>
> struct mbuf *
> m_get_not_broken(int flag, int type)
> {
>     struct mbuf *m;
>
>     do {
>         m = m_get(flag, type);
>     } while (flag == M_WAIT && m == NULL);
>
>     return (m);
> }
>
> I think the idea that the M_WAIT flag should be broken so that
> it can be safely used in interrupt mode is dumb.

I'm not sure I understand what you're putting forward with the above
comment, specifically what you're referring to when you say "broken."
Are you trying to say that "M_WAIT is broken because it doesn't wait
forever?" If that's what you're trying to say, the explanation is simple.
If you "wait indefinitely," or spin as you're doing above, then what
you're doing is pretty much useless. Let me explain why and how you can
test my hypothesis without changing a single line of code.

First of all, the amount of time spent waiting with M_WAIT is
completely tunable with the kern.ipc.mbuf_wait sysctl. If you want to
wait indefinitely, just set it to 0.

Second of all, the default value is 32 ticks. The reason for that is
that it is typically sufficient if you're going to get anything in the
first place. Basically, if you're short on mbufs and you're hoping one
will be freed then, in the general case (I've established this through
various testing), on a relatively generic machine, with moderately
heavy network load, you're going to get one back within the 32 ticks.
If it isn't sufficient, you can tune from 32 to 64 to whatever it is
you feel is appropriate.

The only case where you won't be getting back what you need in the
default time, usually, is when the main mbuf consumer is a process
which is, in effect, sucking up all resources (allocating them for
itself) -- think local DoS.
In that case, even after you wait 32 ticks, 64 ticks, or infinity
ticks, you're likely to not get anything and even if you happen to get
ONE mbuf, then it's even worse 'cause all that's happened is that the
offending process has swallowed yet another mbuf and prevented the
other (essential) system components from allocating.

So, in the latter case, if you have a non-offending process calling
sendfile(2) and trying to allocate an mbuf, it can wait all day if you
want it to, and it will never get anything until the offending process
is killed. So, better to have the process return from the kernel and
deal with the temporary failure.

The same goes for the offending process that will keep exhausting
mbufs in a tight loop; think of what would happen once the offending
process hits the hard limit and exhausts mbufs. It will just be stuck
waiting/looping indefinitely in the kernel and will not be killable
because it will not be able to receive any signals posted to it until
it returns from the kernel.

Basically, what I'm telling you is: M_WAIT behavior is not broken in
FreeBSD, it is entirely tunable and it is better, in the general case, to
NOT have M_WAIT mean 'wait indefinitely.'

>
> Terry Lambert
> terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.
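
The bounded wait argued for above can be sketched in user space,
assuming a toy fixed-size pool. Everything here (struct pool,
pool_get_wait(), acquire_pair(), the tick callbacks) is a hypothetical
name made up for illustration, not a FreeBSD interface, and the real
mbuf allocator sleeps rather than calling a callback. The sketch also
shows a caller releasing a resource it already holds when a later
allocation fails, which is the kind of cleanup the sendfile error path
was said to be missing for its sf_buf. Unlike kern.ipc.mbuf_wait, a
budget of 0 here means "try once", not "wait forever".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size buffer pool; not a FreeBSD API. */
#define POOL_SIZE 4

struct pool {
    int in_use[POOL_SIZE];  /* 1 if the slot is allocated */
    int buf[POOL_SIZE];     /* stand-ins for real buffers */
};

/* Single attempt: return a free buffer, or NULL if none is available. */
int *pool_try_get(struct pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return &p->buf[i];
        }
    }
    return NULL;
}

/* Return a buffer to the pool. */
void pool_put(struct pool *p, int *b)
{
    ptrdiff_t i = b - p->buf;

    assert(i >= 0 && i < POOL_SIZE);
    p->in_use[i] = 0;
}

/*
 * Bounded wait: retry for at most max_ticks "ticks". on_tick() stands
 * in for sleeping one tick, during which other consumers may free
 * buffers. Returns NULL once the budget is used up.
 */
int *pool_get_wait(struct pool *p, int max_ticks,
                   void (*on_tick)(struct pool *))
{
    for (int t = 0; ; t++) {
        int *b = pool_try_get(p);

        if (b != NULL || t >= max_ticks)
            return b;
        on_tick(p);
    }
}

/*
 * A caller that needs two buffers. If the second allocation fails,
 * the first is released before reporting the error.
 */
int acquire_pair(struct pool *p, int max_ticks,
                 void (*on_tick)(struct pool *), int **a, int **b)
{
    *b = NULL;
    *a = pool_get_wait(p, max_ticks, on_tick);
    if (*a == NULL)
        return -1;
    *b = pool_get_wait(p, max_ticks, on_tick);
    if (*b == NULL) {
        pool_put(p, *a);    /* do not leak the first buffer */
        *a = NULL;
        return -1;
    }
    return 0;
}

/* Tick callbacks for experiments: nobody frees anything ... */
void tick_idle(struct pool *p)
{
    (void)p;
}

/* ... or some other consumer gives slot 0 back. */
void tick_release_slot0(struct pool *p)
{
    p->in_use[0] = 0;
}
```

With three of the four slots taken and tick_idle, acquire_pair()
gives up after its budget and leaves the pool as it found it; with
tick_release_slot0 it succeeds on the second attempt.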
Regards, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Dec 3 14:18:39 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 14:18:37 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id E3CCA37B400 for ; Sun, 3 Dec 2000 14:18:36 -0800 (PST) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id PAA27026; Sun, 3 Dec 2000 15:16:23 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp03.primenet.com, id smtpdAAAx6ayW0; Sun Dec 3 15:16:16 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id PAA03289; Sun, 3 Dec 2000 15:18:26 -0700 (MST) From: Terry Lambert Message-Id: <200012032218.PAA03289@usr05.primenet.com> Subject: Re: zero copy code review To: bmilekic@technokratis.com (Bosko Milekic) Date: Sun, 3 Dec 2000 22:18:26 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), dg@root.com, gallatin@cs.duke.edu (Andrew Gallatin), ken@kdm.org (Kenneth D. Merry), arch@FreeBSD.ORG In-Reply-To: from "Bosko Milekic" at Dec 03, 2000 04:26:26 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: tlambert@usr05.primenet.com Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > I think the idea that the M_WAIT flag should be broken so that > > it can be safely used in interrupt mode is dumb. > > I'm not sure I understand what you're putting forward with the above > comment, specifically what you're referring to when you say "broken." > Are you trying to say that "M_WAIT is broken because it doesn't wait > forever?" If that's what you're trying to say, the explanation is simple. 
> If you "wait indefinitely," or spin as you're doing above, then what
> you're doing is pretty much useless. Let me explain why and how you can
> test my hypothesis without changing a single line of code.

[ ... local DOS ... ]

I really don't buy a probability defense. If a probability defense
were acceptable, then not checking for a NULL return, and eating
the panic that results is also acceptable.

The problem with this theory is that "have the [non-offending]
process return from the kernel and deal with the temporary failure"
presumes that there is a correct way to work around the failure in
user space.

I would maintain that the failure would be persistent, since this
does nothing to silence the DOS attack, and there is nothing that
a user space program can do, except to retry, and get all the way
down the code path to the same place that it was before.

It seems to me that this is just a case of how big you want to
make your retry loop, not one of whether or not there will be a
retry loop.

As an example of a user space DOS that can result in this, if you
have a FreeBSD machine which has an interface that is the default
route to the network, and a second interface that is the local
network, and the interface which is the default route is "down"
(as in a PPP interface with the modem turned off), you can start
a "ping" of an external machine (e.g. 16.1.0.2) which will
eventually consume all of the mbufs with ICMP echo datagrams
which can't be transmitted. At this point, machines on the local
network cannot log into the gateway machine over the network to
correct the problem.

I would argue that this level of congestion should be proactively
prohibited from occurring in the first place; the most likely way
to do this correctly is to start "dropping" the oldest datagrams,
NOT returning "NULL" to allocations made on behalf of telnetd or
sshd from the local interface.
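
The "drop the oldest datagrams" policy suggested above can be
sketched as a fixed-capacity ring buffer that evicts its oldest entry
when a new one arrives and the queue is full, so that enqueueing never
fails. This is only an illustration under assumptions: struct
dgram_queue, dq_push() and dq_pop() are hypothetical names, the
"datagrams" are plain integers, and nothing here reflects how the
FreeBSD network stack actually queues packets.

```c
#include <stddef.h>

#define QCAP 4  /* hypothetical queue capacity */

struct dgram_queue {
    int    slot[QCAP];  /* datagram ids, stand-ins for real packets */
    size_t head;        /* index of the oldest queued entry */
    size_t len;         /* number of queued entries */
    size_t dropped;     /* how many entries were discarded so far */
};

/*
 * Enqueue id. If the queue is full, discard the oldest entry to make
 * room instead of refusing the new one. Returns 1 if an entry was
 * dropped, 0 otherwise.
 */
int dq_push(struct dgram_queue *q, int id)
{
    int dropped = 0;

    if (q->len == QCAP) {
        q->head = (q->head + 1) % QCAP;  /* forget the oldest */
        q->len--;
        q->dropped++;
        dropped = 1;
    }
    q->slot[(q->head + q->len) % QCAP] = id;
    q->len++;
    return dropped;
}

/* Dequeue the oldest entry into *id. Returns 0 if the queue is empty. */
int dq_pop(struct dgram_queue *q, int *id)
{
    if (q->len == 0)
        return 0;
    *id = q->slot[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    return 1;
}
```

The trade-off is the one argued in the text: stale traffic is lost,
but new allocations (for example on behalf of a login session) are not
turned away.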
In other words, if this is a fear-response for a local DOS, then there are better ways of achieving the same result, without still locking up networking. --- > Basically, what I'm telling you is: M_WAIT behavior is not broken in > FreeBSD, it is entirely tunable and it is better, in the general case, to > NOT have M_WAIT mean 'wait indefinetely.' As a general bone of contention, if the thing _doesn't_ wait, it shouldn't be called M_WAIT, it should be called M_TRY_HARDER or something that indicates that the default behaviour has been altered, but in fact the routine will not be waiting around until it is successful, like all of the other _WAIT flags imply. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Dec 3 15:24:53 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 15:24:51 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.freebsd.org (Postfix) with ESMTP id 9ECD937B400 for ; Sun, 3 Dec 2000 15:24:50 -0800 (PST) Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G5000D5SMDBVT@falla.videotron.net> for arch@FreeBSD.ORG; Sun, 3 Dec 2000 18:24:48 -0500 (EST) Date: Sun, 03 Dec 2000 18:25:36 -0500 (EST) From: Bosko Milekic Subject: Re: zero copy code review In-reply-to: <200012032218.PAA03289@usr05.primenet.com> To: Terry Lambert Cc: arch@FreeBSD.ORG Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sun, 3 Dec 2000, Terry Lambert wrote: > > > I think the idea that the M_WAIT flag should be broken so that > > > it can be safely used in interrupt mode is 
dumb.
> >
> > I'm not sure I understand what you're putting forward with the above
> > comment, specifically what you're referring to when you say "broken."
> > Are you trying to say that "M_WAIT is broken because it doesn't wait
> > forever?" If that's what you're trying to say, the explanation is simple.
> > If you "wait indefinitely," or spin as you're doing above, then what
> > you're doing is pretty much useless. Let me explain why and how you can
> > test my hypothesis without changing a single line of code.
>
> [ ... local DOS ... ]
>
> I really don't buy a probability defense. If a probability defense
> were acceptable, then not checking for a NULL return, and eating
> the panic that results is also acceptable.

It's not a "probability defense." It's not a "defense." It's just a
"don't act the worst way possible when we have an attack." And you
haven't said at all why waiting indefinitely is better than not,
especially in the problematic situation I brought up.

> The problem with this theory is that "have the [non-offending]
> process return from the kernel and deal with the temporary failure"
> presumes that there is a correct way to work around the failure in
> user space.

No, it doesn't. But it's better for the process to sleep in user
space than to be INDEFINITELY stuck in the kernel. And, in the case of
an attack, it _will_ be indefinitely stuck.

> I would maintain that the failure would be persistent, since this
> does nothing to silence the DOS attack, and there is nothing that
> a user space program can do, except to retry, and get all the way
> down the code path to the same place that it was before.

Right. It's not a preventive measure. But, it's much better to have
it act in this manner than wait indefinitely "in the case of."

> It seems to me that this is just a case of how big you want to
> make your retry loop, not one of whether or not there will be a
> retry loop.

The retry loop is _useless_.
You drop the mutex and lose priority in the wait queue when you
return from m_get(). Calling it again makes your chances of getting an
mbuf in a shortage even smaller. If you want that behavior, just tweak
your kern.ipc.mbuf_wait.

[...]

> I would argue that this level of congestion should be proactively
> prohibited from occurring in the first place; the most likely way
> to do this correctly is to start "dropping" the oldest datagrams,
> NOT returning "NULL" to allocations made on behalf of telnetd or
> sshd from the local interface.

This is really a great block of theory. I only wish that people
with such a passion to argue the methods would work on actually
implementing them.

> In other words, if this is a fear-response for a local DOS,
> then there are better ways of achieving the same result,
> without still locking up networking.

It's not. It never was. It never will be. It's just better than
waiting indefinitely. It still provides you with the ability to wait
indefinitely, though, if you are incapable of understanding why it's
better not to.

> > Basically, what I'm telling you is: M_WAIT behavior is not broken in
> > FreeBSD, it is entirely tunable and it is better, in the general case, to
> > NOT have M_WAIT mean 'wait indefinitely.'
>
> As a general bone of contention, if the thing _doesn't_ wait, it
> shouldn't be called M_WAIT, it should be called M_TRY_HARDER or
> something that indicates that the default behaviour has been
> altered, but in fact the routine will not be waiting around until
> it is successful, like all of the other _WAIT flags imply.

It _does_ wait, and I disagree. By that logic, why not rename all
the _WAITs to _WAIT_INDEF? If you're curious about what M_WAIT does,
you can either read the code (hey, it is free!) or read the mbuf(9)
man page (now available in -CURRENT).

> Terry Lambert
> terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.
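
The "return from the kernel and let the process sleep in user space"
position taken above might look roughly like this from the caller's
side. It is a sketch under assumptions: op_fn, retry_on_enobufs() and
flaky_op() are hypothetical names, the operation is taken to report a
buffer shortage as -1 with errno set to ENOBUFS, and the doubling
backoff is an arbitrary choice. It does not settle the objection that
a persistent shortage just moves the retry loop elsewhere; it only
shows that the loop is bounded and that the process remains
signalable while it sleeps.

```c
#define _POSIX_C_SOURCE 199309L
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical operation: 0 on success, -1 with errno set on failure. */
typedef int (*op_fn)(void *ctx);

/*
 * Call op() up to max_tries times, sleeping in user space between
 * attempts when it fails with ENOBUFS. Any other error is returned
 * immediately. The delay doubles after each failed attempt.
 */
int retry_on_enobufs(op_fn op, void *ctx, int max_tries, long delay_ms)
{
    for (int i = 0; i < max_tries; i++) {
        if (op(ctx) == 0)
            return 0;
        if (errno != ENOBUFS)
            return -1;          /* not a buffer shortage: give up */
        if (i + 1 < max_tries) {
            struct timespec ts;

            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000L;
            nanosleep(&ts, NULL);   /* signals are still delivered here */
            delay_ms *= 2;
        }
    }
    errno = ENOBUFS;
    return -1;
}

/* Test double: fails with ENOBUFS until the counter in ctx reaches 0. */
int flaky_op(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        errno = ENOBUFS;
        return -1;
    }
    return 0;
}
```

A real caller would pass a wrapper around its own send routine in
place of flaky_op().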
Regards, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Dec 3 19: 9: 3 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 19:09:01 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mobile.wemm.org (adsl-64-163-195-99.dsl.snfc21.pacbell.net [64.163.195.99]) by hub.freebsd.org (Postfix) with ESMTP id 9A06A37B400; Sun, 3 Dec 2000 19:09:00 -0800 (PST) Received: from netplex.com.au (localhost [127.0.0.1]) by mobile.wemm.org (8.11.1/8.11.1) with ESMTP id eB438tD52326; Sun, 3 Dec 2000 19:08:55 -0800 (PST) (envelope-from peter@netplex.com.au) Message-Id: <200012040308.eB438tD52326@mobile.wemm.org> X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 To: "Kenneth D. Merry" Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG, dillon@FreeBSD.ORG Subject: Re: zero copy code review In-Reply-To: <20001129231653.A1503@panzer.kdm.org> Date: Sun, 03 Dec 2000 19:08:55 -0800 From: Peter Wemm Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG "Kenneth D. Merry" wrote: > [ -net and -current BCCed for wider coverage, this is probably best > handled on -arch ] > > I would like to request reviews of the zero copy sockets and NFS code I've > been posting about for months: > > http://people.FreeBSD.org/~ken/zero_copy Hmm.. I see one danger item: " 5.Configuration and performance tuning. There are a number of options that need to be turned on for various things to work: options ZERO_COPY_SOCKETS # Turn on zero copy send code options ENABLE_VFS_IOOPT # Turn on zero copy receive options NMBCLUSTERS=(512+512*32) # lots of mbuf clusters options TI_JUMBO_HDRSPLIT # Turn on Tigon header splitting [..] 
Turn on vfs.ioopt to enable zero copy receive:

    sysctl -w vfs.ioopt=1
"

I know Matt Dillon was intending to remove the ENABLE_VFS_IOOPT code and
vfs.ioopt because it is presently fundamentally broken and has a
devastating impact on userland semantics.

For example, as it exists in the tree *right now*, if one does this:

    buf = malloc(PAGE_SIZE);    /* malloc does page alignment here */
    read(fd, buf, PAGE_SIZE);

.. it would be eligible for ioopt treatment (page lending). Normally, you
would have a *private* copy of the page of data. If somebody modifies the
backing file, your private copy does not change. However, turning on ioopt
causes it to be mmapped in with MAP_PRIVATE.. But this does **NOT** give
the same semantics. Sure, if you modify the buffer yourself, you get a
Copy-on-write fault and your own private page to mess with. But if somebody
else modifies the file before you dirty the page then your supposedly static
private copy silently changes out from underneath you because you have been
loaned a mapping from the vm/buffer cache.

The infrastructure to track "loaned out" pages in the vm page cache isn't
present. The pages must be read-only to the kernel and DMA engines and a
fault must be taken giving the kernel a chance to fully donate the original
page to the mapping processes and generate its own writable version.

I have not read the patch extensively, but I am not sure that it is handled
completely. There are a few patches to vm_fault(), but I am not sure if
these are to handle the problem I described above or something else. In
particular, if it is intended to handle the problem, then it seems to depend
on being able to make pages unwritable by the kernel. This isn't possible
on i386 cpus (only on 486 and later). I did not see any busmaster DMA checking
either, but I could have missed it.. What about drivers that DMA to pages
mapped into KVM without checking writability (and hence COW)?
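
The semantics at stake above can be checked from user space: a buffer
filled by read() must keep its contents even if the file is rewritten
afterwards. The sketch below is an illustration only, assuming a POSIX
system with a working tmpfile(); read_copy_is_private() is a
hypothetical helper name, and the page-sized buffer merely mirrors the
example in the text. It does not exercise vfs.ioopt or any kernel
page-lending path by itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Fill a temporary file with 'A', read one page of it into a private
 * buffer, then overwrite the file with 'B'.
 * Returns 1 if the buffer still holds 'A' (the expected read()
 * semantics), 0 if it changed underneath us, -1 on setup or I/O error.
 */
int read_copy_is_private(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    if (pg <= 0)
        return -1;

    size_t n = (size_t)pg;
    char *buf = malloc(n);      /* destination of the read */
    char *pat = malloc(n);      /* pattern written to the file */
    FILE *fp = tmpfile();
    int result = -1;

    if (buf != NULL && pat != NULL && fp != NULL) {
        int fd = fileno(fp);

        memset(pat, 'A', n);
        if (pwrite(fd, pat, n, 0) == (ssize_t)n &&
            pread(fd, buf, n, 0) == (ssize_t)n) {
            /* Somebody modifies the backing file after the read. */
            memset(pat, 'B', n);
            if (pwrite(fd, pat, n, 0) == (ssize_t)n)
                result = (buf[0] == 'A' && buf[n - 1] == 'A');
        }
    }

    if (fp != NULL)
        fclose(fp);
    free(pat);
    free(buf);
    return result;
}
```

A return value of 0 would indicate exactly the failure mode described
in the text: the supposedly private copy following later writes to the
file.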
Cheers, -Peter -- Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Dec 3 19:49:51 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 19:49:48 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by hub.freebsd.org (Postfix) with ESMTP id 188BC37B400; Sun, 3 Dec 2000 19:49:47 -0800 (PST) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id WAA22467; Sun, 3 Dec 2000 22:49:45 -0500 (EST) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.11.1/8.9.1) id eB43njb13857; Sun, 3 Dec 2000 22:49:45 -0500 (EST) (envelope-from gallatin@cs.duke.edu) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Date: Sun, 3 Dec 2000 22:49:44 -0500 (EST) To: Peter Wemm Cc: "Kenneth D. Merry" , arch@FreeBSD.ORG, dillon@FreeBSD.ORG Subject: Re: zero copy code review In-Reply-To: <200012040308.eB438tD52326@mobile.wemm.org> References: <20001129231653.A1503@panzer.kdm.org> <200012040308.eB438tD52326@mobile.wemm.org> X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs Lucid Message-ID: <14891.4047.626648.658103@grasshopper.cs.duke.edu> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Peter Wemm writes: <...> > Hmm.. I see one danger item: <..> > [..] > Turn on vfs.ioopt to enable zero copy receive: > sysctl -w vfs.ioopt=1 > " This was a convenient sysctl to tie the zero-copy receive code to in early prototyping. It has nothing to do with the filesystem aspects of vfs_ioopt, which makes it confusing to the reader. This should probably be ripped out or changed to depend only on the zero-copy sockets sysctl. 
Rather than loaning out pages, the zero-copy receive code does full
page-flipping, mapping the kernel's page into the receiving process and
freeing the page the user process was receiving into. This is possible
because, unlike pages in the buffer cache, there is no need to keep
around data received on a socket.

<... objections to vfs_ioopt deleted...>

 > I have not read the patch extensively, but I am not sure that it is handled
 > completely. There are a few patches to vm_fault(), but I am not sure if
 > these are to handle the problem I described above or something else. In
 > particular, if it is intended to handle the problem, then it seems to depend
 > on being able to make pages unwritable by the kernel. This isn't possible
 > on i386 cpus (only 486 and later). I did not see any busmaster DMA checking

The patches are to support making pages sent via zero-copy sockets COW
for the user process which sent them (until they are acknowledged and
freed). We do not make anything COW for the kernel.

 > either, but I could have missed it.. What about drivers that DMA to pages
 > mapped into KVM without checking writability (and hence COW)?

This is a good point. But I cannot think of any circumstance where a
driver would be DMA'ing directly to a user owned page (with the
exception of a vm fault, but this is impossible because the pages are
resident prior to setting up the send and are wired for the duration
of the send).

Thanks for the input. I'm glad to see you and Matt looking at this!
Drew To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Dec 3 20: 5:26 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 20:05:24 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135]) by hub.freebsd.org (Postfix) with ESMTP id B513637B400 for ; Sun, 3 Dec 2000 20:05:24 -0800 (PST) Received: (from dillon@localhost) by earth.backplane.com (8.11.1/8.9.3) id eB444rd69989; Sun, 3 Dec 2000 20:04:53 -0800 (PST) (envelope-from dillon) Date: Sun, 3 Dec 2000 20:04:53 -0800 (PST) From: Matt Dillon Message-Id: <200012040404.eB444rd69989@earth.backplane.com> To: Andrew Gallatin Cc: Peter Wemm , "Kenneth D. Merry" , arch@FreeBSD.ORG Subject: Re: zero copy code review References: <20001129231653.A1503@panzer.kdm.org> <200012040308.eB438tD52326@mobile.wemm.org> <14891.4047.626648.658103@grasshopper.cs.duke.edu> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :Peter Wemm writes: :<...> : > Hmm.. I see one danger item: : :<..> : : > [..] : > Turn on vfs.ioopt to enable zero copy receive: : > sysctl -w vfs.ioopt=1 : > " : :This was a convenient sysctl to tie the zero-copy receive code to in :early prototyping. It has nothing to do with the filesystem aspects :of vfs_ioopt, which makes it confusing to the reader. This should :probably be ripped out or changed to depend only on the zero-copy :sockets sysctl. : :Rather than loaning out pages, the zero-copy receive code does full Oh my. Could you please change your use of the sysctl to one of your own? I did in fact mean to remove vfs.ioopt because the FS code is fundamentally broken... and quite likely to cause a system crash if used heavily. The vfs.ioopt code is still using 3.x semantics (maybe even 2.x!). 
I haven't been following the zero-copy work, so if you could give me a heads-up when you have moved your zero-copy stuff to your own sysctl, I will then go ahead and remove the original broken vfs.ioopt and its associated code. -Matt From owner-freebsd-arch Sun Dec 3 20: 7:18 2000 From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 20:07:16 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135]) by hub.freebsd.org (Postfix) with ESMTP id 6F5A537B400 for ; Sun, 3 Dec 2000 20:07:16 -0800 (PST) Received: (from dillon@localhost) by earth.backplane.com (8.11.1/8.9.3) id eB446jk70007; Sun, 3 Dec 2000 20:06:45 -0800 (PST) (envelope-from dillon) Date: Sun, 3 Dec 2000 20:06:45 -0800 (PST) From: Matt Dillon Message-Id: <200012040406.eB446jk70007@earth.backplane.com> To: Andrew Gallatin Cc: Peter Wemm , "Kenneth D. Merry" , arch@FreeBSD.ORG Subject: Re: zero copy code review References: <20001129231653.A1503@panzer.kdm.org> <200012040308.eB438tD52326@mobile.wemm.org> <14891.4047.626648.658103@grasshopper.cs.duke.edu> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :freed). We do not make anything COW for the kernel. : : > either, but I could have missed it.. What about drivers that DMA to pages : > mapped into KVM without checking writability (and hence COW)? : :This is a good point. But I cannot think of any circumstance where a :driver would be DMA'ing directly to a user owned page (with the :exception of a vm fault, but this is impossible because the pages are :resident prior to setting up the send and are wired for the duration :of the send). : :Thanks for the input. I'm glad to see you and Matt looking at this! : :Drew Careful. If you read() from a raw device, most disk drivers WILL DMA directly to a user-owned page. 
-Matt From owner-freebsd-arch Mon Dec 4 9:43:24 2000 From owner-freebsd-arch@FreeBSD.ORG Mon Dec 4 09:43:22 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by hub.freebsd.org (Postfix) with ESMTP id 2C8B637B400 for ; Mon, 4 Dec 2000 09:43:22 -0800 (PST) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id MAA02705; Mon, 4 Dec 2000 12:43:12 -0500 (EST) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.11.1/8.9.1) id eB4HhCS15410; Mon, 4 Dec 2000 12:43:12 -0500 (EST) (envelope-from gallatin@cs.duke.edu) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Date: Mon, 4 Dec 2000 12:43:12 -0500 (EST) To: Matt Dillon Cc: Peter Wemm , "Kenneth D. 
Merry" , arch@FreeBSD.ORG Subject: Re: zero copy code review In-Reply-To: <200012040406.eB446jk70007@earth.backplane.com> References: <20001129231653.A1503@panzer.kdm.org> <200012040308.eB438tD52326@mobile.wemm.org> <14891.4047.626648.658103@grasshopper.cs.duke.edu> <200012040406.eB446jk70007@earth.backplane.com> X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs Lucid Message-ID: <14891.55103.81970.494533@grasshopper.cs.duke.edu> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Matt Dillon writes: > > :freed). We do not make anything COW for the kernel. > : > : > either, but I could have missed it.. What about drivers that DMA to pages > : > mapped into KVM without checking writability (and hence COW)? > : > :This is a good point. But I cannot think of any circumstance where a > :driver would be DMA'ing directly to a user owned page (with the > :exception of a vm fault, but this is impossible because the pages are > :resident prior to setting up the send and are wired for the duration > :of the send). > : > :Thanks for the input. I'm glad to see you and Matt looking at this! > : > :Drew > > Careful. If you read() from a raw device most disk drivers WILL dma > directly to a user-owned page. > > -Matt > That's a good point that I hadn't thought about. All the more reason to make the send-side code a socket option so the process has to take careful aim before blowing off its foot. 
Drew From owner-freebsd-arch Mon Dec 4 15:54:14 2000 From owner-freebsd-arch@FreeBSD.ORG Mon Dec 4 15:54:09 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id 82AEF37B400 for ; Mon, 4 Dec 2000 15:54:09 -0800 (PST) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id QAA02581; Mon, 4 Dec 2000 16:51:52 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp03.primenet.com, id smtpdAAAy5aale; Mon Dec 4 16:50:50 2000 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id QAA12392; Mon, 4 Dec 2000 16:52:50 -0700 (MST) From: Terry Lambert Message-Id: <200012042352.QAA12392@usr02.primenet.com> Subject: Re: zero copy code review To: bmilekic@technokratis.com (Bosko Milekic) Date: Mon, 4 Dec 2000 23:52:50 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), arch@FreeBSD.ORG In-Reply-To: from "Bosko Milekic" at Dec 03, 2000 06:25:36 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: tlambert@usr02.primenet.com Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > [ ... local DOS ... ] > > > > I really don't buy a probability defense. If a probability defense > > were acceptable, then not checking for a NULL return, and eating > > the panic that results is also acceptable. > > It's not a "probability defense." It's not a "defense." It's just a > "don't act the worst way possible when we have an attack." And you > haven't said at all why waiting indefinitely is better than not, > especially in the problematic situation I brought up. 
The situation you quote is one where the allocation fails, instead of WAITing until it can complete successfully, and this results in the kernel function failing and state being undone back to the point where the user space call that was the originator of the request fails back to user space. Then the user space code has to handle the failure. I maintain that the most reasonable and logical thing for the user space program to do, on seeing this failure (ENOBUF?), is to retry the operation. So it calls down again, and fails again, and you have substituted a busy loop which crosses protection domains twice, for a kernel sleep. This is the best case. The worst case is that the local DOS obtains yet more resources when the state is backed out, and the busy loop path in the kernel becomes shorter, due to an earlier failure for lack of resources. In neither case does failing the allocation instead of sleeping do _anything at all_ to address the root cause of the problem, nor does the failure result in the problem going away or being lessened. So I really don't see what is being accomplished by failing the allocation, rather than sleeping, except to use up _extra_ resources, during a time of resource starvation, to enforce the mbuf_wait interval. > > The problem with this theory is that "have the [non-offending] > > process return from the kernel and deal with the temporary failure" > > presumes that there is a correct way to work around the failure in > > user space. > > No, it doesn't. But it's better for the process to sleep in user > space than to be INDEFINITELY stuck in the kernel. And, in the case of an > attack, it _will_ be indefinitely stuck. Why the heck would the process sleep in user space?!? It has work to do, it knows the call to make to do the work, and it will make the call repeatedly, until it's context switched, or until the call succeeds. 
This is just like a write loop on a large buffer, subtracting out the write() return value and advancing the buffer pointer, until everything has been written. You might argue that a "correctly" written user space program would use a select loop, but I'm betting that the descriptor will show as writeable, even if there aren't any mbufs available to accept the write; there's no way to make the write select accurate, without pre-reserving memory to accept the write. Personally, I would prefer, under DOS conditions, that my program be stuck in kernel space, so that it at least has a small chance of getting work done slowly during a DOS, than stuck in user space. You can be sure that the DOS process is not going to be nearly as polite in hanging around in user space until kernel resources are freed up. > > I would maintain that the failure would be persistent, since this > > does nothing to silence the DOS attack, and there is nothing that > > a user space program can do, except to retry, and get all the way > > down the code path to the same place that it was before. > > Right. It's not a preventive measure. But, it's much better to have > it act in this manner than wait indefinitely "in the case of." I strongly disagree. That's "``in the case of'' being able to get work done, despite the DOS". Hung in user space is the same as hung in the kernel: your process is not doing useful work. Making it easier for the DOS to get yet more resources during a period of resource starvation, and preventing other programs from competing with the DOS for resources freed by timeout or other mechanism, which takes them back from the DOS, seems like a big mistake to me. I would much rather have a system that I can normally talk to in a few seconds be capable of being talked to over a period of 10 minutes, than one I can't talk to at all; wouldn't you? 
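The write loop described above can be sketched roughly as follows. This is an illustration only, not code from the FreeBSD tree: the helper name write_all, the max_retries parameter, and the millisecond backoff policy are all assumptions made for the example. It shows the "subtract the return value and advance the pointer" loop, with ENOBUFS treated as a temporary failure that is retried after a short sleep rather than in a tight busy loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and policy are assumptions): write all of buf
 * to fd. Partial writes advance the pointer; EINTR is retried at once;
 * ENOBUFS is retried after a growing sleep. Returns 0 on success, or -1
 * with errno set on any other error or after max_retries consecutive
 * ENOBUFS failures.
 */
static int
write_all(int fd, const void *buf, size_t len, int max_retries)
{
	const char *p = buf;
	int retries = 0;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n >= 0) {
			p += n;			/* skip what was accepted */
			len -= (size_t)n;
			retries = 0;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno != ENOBUFS || retries >= max_retries)
			return (-1);
		/* Assumed policy: sleep 1 ms, 2 ms, ... capped at 100 ms. */
		retries++;
		long ms = retries < 100 ? retries : 100;
		struct timespec ts = { 0, ms * 1000000L };
		nanosleep(&ts, NULL);
	}
	return (0);
}
```

A caller would use it as write_all(fd, data, data_len, 10). Whether sleeping in user space like this is any better than sleeping in the kernel is exactly the point under dispute in this thread; the sketch only shows what the user space side of the retry looks like.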
> > It seems to me that this is just a case of how big you want to > > make your retry loop, not one of whether or not there will be a > > retry loop. > > The retry loop is _useless_. You drop the mutex and lose priority in > the wait queue when you return from m_get(). Calling again makes your > chances of getting an mbuf in a shortage even less probable. If you want > that behavior, just tweak your kern.ipc.mbuf_wait. This is actually the opposite of the effect you would want. A well behaved process denied a scarce resource should be first in line for that resource. Saying "I can't give you one because there's this pig of a process, but I'll tell you what I'll do: why don't you just piss off until the next millennium?" is no way to encourage well behaved processes... 8-). > [...] > > I would argue that this level of congestion should be proactively > > prohibited from occurring in the first place; the most likely way > > to do this correctly is to start "dropping" the oldest datagrams, > > NOT returning "NULL" to allocations made on behalf of telnetd or > > sshd from the local interface. > > This is really a great block of theory. I only wish that people with > such a passion to argue the methods would work in actually implementing > them. The code which implements "source quench" could be abused to provide this functionality at the queue bottom, where things are packing up in the ICMP echo datagram case (as one example). > It's not. It never was. It never will be. It's just better than > waiting indefinitely. It still provides you with the ability to wait > indefinitely, though, if you are incapable of understanding why it's > better not to. Explain it to me: why is it better to not wait? When I see the error return from the low memory condition, am I supposed to shut myself down, disabling apache, for example? Is _everyone_ supposed to do the same thing, until there is nothing but the DOS process running on the system? What does me failing buy _me_? 
How is this different than me waiting on _any_ contended resource, instead of timing out, like an advisory lock on a file? > > As a general bone of contention, if the thing _doesn't_ wait, it > > shouldn't be called M_WAIT, it should be called M_TRY_HARDER or > > something that indicates that the default behaviour has been > > altered, but in fact the routine will not be waiting around until > > it is successful, like all of the other _WAIT flags imply. > > It _does_ wait, and I disagree. By that logic, why not rename all the > _WAITs with _WAIT_INDEF? If you're curious about what M_WAIT does, you > can either read the code (hey, it is free!) or read the mbuf(9) man page > (now available in -CURRENT). It waits until it doesn't, you mean. 8-p. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-arch Mon Dec 4 16:20:21 2000 From owner-freebsd-arch@FreeBSD.ORG Mon Dec 4 16:20:18 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 8DBB837B400 for ; Mon, 4 Dec 2000 16:20:18 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eB50KFR23246; Mon, 4 Dec 2000 16:20:15 -0800 (PST) Date: Mon, 4 Dec 2000 16:20:15 -0800 From: Alfred Perlstein To: Terry Lambert Cc: Bosko Milekic , arch@FreeBSD.ORG Subject: Re: zero copy code review Message-ID: <20001204162015.A8051@fw.wintelcom.net> References: <200012042352.QAA12392@usr02.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200012042352.QAA12392@usr02.primenet.com>; from tlambert@primenet.com on Mon, Dec 04, 2000 at 11:52:50PM +0000 Sender: bright@fw.wintelcom.net Sender: 
owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Terry Lambert [001204 15:54] wrote: > > > [ ... local DOS ... ] > > > > > > I really don't buy a probability defense. If a probability defense > > > were acceptable, then not checking for a NULL return, and eating > > > the panic that results is also acceptable. > > > > It's not a "probability defense." It's not a "defense." It's just a > > "don't act the worst way possible when we have an attack." And you > > haven't said at all why waiting indefinitely is better than not, > > especially in the problematic situation I brought up. > > The situation you quote is one where the allocation fails, > instead of WAITing until it can complete successfully, and > this results in the kernel function failing and state being > undone back to the point where the user space call that was > the originator of the request fails back to user space. Then > the user space code has to handle the failure. > > I maintain that the most reasonable and logical thing for the > user space program to do, on seeing this failure (ENOBUF?), > is to retry the operation. > > So it calls down again, and fails again, and you have > substituted a busy loop which crosses protection domains > twice, for a kernel sleep. This is the best case. > > The worst case is that the local DOS obtains yet more > resources when the state is backed out, and the busy > loop path in the kernel becomes shorter, due to an > earlier failure for lack of resources. > > In neither case does failing the allocation instead of > sleeping do _anything at all_ to address the root cause > of the problem, nor does the failure result in the problem > going away or being lessened. > > So I really don't see what is being accomplished by failing > the allocation, rather than sleeping, except to use up > _extra_ resources, during a time of resource starvation, to > enforce the mbuf_wait interval. 
[snip] Well behaved applications (read: written by me) deal with errors like ENOBUFS properly: what they do is close the socket and commence throttling connections. I would not want my process to be stuck in the kernel waiting for buffer space that could take quite a long time to get ahold of. However, I can understand someone wanting a naive process not to get such errors, because it may misbehave and do stupid things like busy loop or just abort entirely. Perhaps adding a per-process or per-socket or per-something flag to ask for indefinite blocking (or turn it off) would be a good idea; honestly, having it one way or the other isn't very good, depending on your application. I can live with the current situation so I'll leave 'fixing' this to someone who wants the indefinite blocking. Oh, and don't forget, you can't block me indefinitely if I'm writing to a non-blocking socket. In fact if M_WAIT is set I shouldn't be blocking at all on a non-blocking socket. thanks, -Alfred From owner-freebsd-arch Mon Dec 4 16:27:16 2000 From owner-freebsd-arch@FreeBSD.ORG Mon Dec 4 16:27:15 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id D18E137B400 for ; Mon, 4 Dec 2000 16:27:14 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eB50QxS23578; Mon, 4 Dec 2000 16:26:59 -0800 (PST) Date: Mon, 4 Dec 2000 16:26:59 -0800 From: Alfred Perlstein To: Terry Lambert Cc: Bosko Milekic , arch@FreeBSD.ORG Subject: Re: zero copy code review Message-ID: <20001204162659.B8051@fw.wintelcom.net> References: <200012042352.QAA12392@usr02.primenet.com> <20001204162015.A8051@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: 
<20001204162015.A8051@fw.wintelcom.net>; from bright@wintelcom.net on Mon, Dec 04, 2000 at 04:20:15PM -0800 Sender: bright@fw.wintelcom.net Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Alfred Perlstein [001204 16:20] wrote: > > Well behaved applications (read: written by me) deal with errors > like ENOBUFS properly: what they do is close the socket and > commence throttling connections. > > I would not want my process to be stuck in the kernel waiting > for buffer space that could take quite a long time to get ahold of. > > However, I can understand someone wanting a naive process not > to get such errors, because it may misbehave and do stupid > things like busy loop or just abort entirely. > > Perhaps adding a per-process or per-socket or per-something flag > to ask for indefinite blocking (or turn it off) would be a good > idea; honestly, having it one way or the other isn't very good, > depending on your application. I can live with the current > situation so I'll leave 'fixing' this to someone who wants > the indefinite blocking. > > Oh, and don't forget, you can't block me indefinitely if I'm > writing to a non-blocking socket. In fact if M_WAIT is set > I shouldn't be blocking at all on a non-blocking socket. One more thing: ENOBUFS is indicative of a misconfiguration and shouldn't happen in day to day operations. If it does happen, then the user needs to reconfigure for more buffer space. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." 
From owner-freebsd-arch Mon Dec 4 19:21: 8 2000 From owner-freebsd-arch@FreeBSD.ORG Mon Dec 4 19:21:06 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from dt051n37.san.rr.com (dt051n37.san.rr.com [204.210.32.55]) by hub.freebsd.org (Postfix) with ESMTP id E162637B400 for ; Mon, 4 Dec 2000 19:21:05 -0800 (PST) Received: from slave (Studded@slave [10.0.0.1]) by dt051n37.san.rr.com (8.9.3/8.9.3) with ESMTP id TAA68548; Mon, 4 Dec 2000 19:20:50 -0800 (PST) (envelope-from DougB@gorean.org) Date: Mon, 4 Dec 2000 19:20:49 -0800 (PST) From: Doug Barton X-Sender: doug@dt051n37.san.rr.com To: Peter Jeremy Cc: "Michael C . Wu" , arch@FreeBSD.ORG Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486... In-Reply-To: <20001201152137.K1474@gsmx07.alcatel.com.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Fri, 1 Dec 2000, Peter Jeremy wrote: > On 2000-Nov-30 21:47:45 -0600, "Michael C . Wu" wrote: > >On Fri, Dec 01, 2000 at 10:29:15AM +1100, Peter Jeremy scribbled: > >| On 2000-Nov-14 15:08:06 +0100, Poul-Henning Kamp wrote: > >| >Has anybody run a 486 or 386 under current recently ? > >| > >| X on a PRE_SMPNG 486 is painful - mouse movements no longer make > >| the X pointer move in real time. I haven't noticed the seeding > >| issue (probably just luck). > > > >PRE_SMPNG does not have the /dev/random seeding issue. > > > >You actually expected X to run well on a 486? :-) > > It used to run reasonably well (ignoring hogs like Netscape) before > Yarrow was added. Have you tried updating to the latest -Current? All aspects of the entropy harvesting have changed significantly since PRE_SMPNG. Doug -- So what I want to know is, where does the RED brick road go? Do YOU Yahoo!? 
From owner-freebsd-arch Mon Dec 4 20:45:19 2000 From owner-freebsd-arch@FreeBSD.ORG Mon Dec 4 20:45:17 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169]) by hub.freebsd.org (Postfix) with ESMTP id 69B3037B400 for ; Mon, 4 Dec 2000 20:45:13 -0800 (PST) Received: (from ken@localhost) by panzer.kdm.org (8.9.3/8.9.1) id VAA42725; Mon, 4 Dec 2000 21:44:38 -0700 (MST) (envelope-from ken) Date: Mon, 4 Dec 2000 21:44:38 -0700 From: "Kenneth D. Merry" To: Bosko Milekic Cc: arch@FreeBSD.ORG Subject: Re: zero copy code review Message-ID: <20001204214438.A42689@panzer.kdm.org> References: <20001201002235.D10772@panzer.kdm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: ; from bmilekic@technokratis.com on Sat, Dec 02, 2000 at 01:00:22PM -0500 Sender: ken@panzer.kdm.org Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG [ catching up with mail from this weekend ] On Sat, Dec 02, 2000 at 13:00:22 -0500, Bosko Milekic wrote: > > On Fri, 1 Dec 2000, Kenneth D. Merry wrote: > > > It does have spls in the right places, in this case splimp() and splvm(). > > Would you just convert those to the proper mutexes, or are we going to go > > with per-data-structure mutexes (i.e. a little finer granularity), or...? > > (I don't know much about the mutex strategy we're using...) > > For now, you won't be able to do anything with the splvm() stuff, as > the VM code has not yet been ripped out from under Giant (and likely > won't be for a while). > A few notes Re: spl()s and mutexes in uipc_jumbo.c, in particular > (since that's where I would begin putting in mutexes): > > - Your jumbo_kmap singly linked list should probably not be manipulated > under splvm() [in fact, I think it's wrong]. 
The list should be > protected by a lock. Okay. > - jumbo_freem should just be called jumbo_free, if the naming convention > is being adopted from the mbuf system (which it looks like it is). The > reason is that for mbufs, m_free() frees a single mbuf while m_freem() > frees an entire chain of them. Okay. > - jumbo_pg_free should be ripped out from under splimp(); leave the > explicit splvm() in there, but protect the list manipulations with the > lock. Okay. > If most of the things pointed out earlier are fixed, and as long as > the code is not flawed (which I really doubt it would be anyway), I have > no objections to it going in soon and then attacking the above issue a > little later (If nobody gets to it within the next two weeks, I'll be > glad to do it myself once those 2 weeks are past). Sounds good. There have been other problems pointed out that we'll need to fix as well before the code can go in. Ken -- Kenneth Merry ken@kdm.org From owner-freebsd-arch Tue Dec 5 5:10:25 2000 From owner-freebsd-arch@FreeBSD.ORG Tue Dec 5 05:10:24 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from bells.cs.ucl.ac.uk (bells.cs.ucl.ac.uk [128.16.5.31]) by hub.freebsd.org (Postfix) with SMTP id 4236437B400; Tue, 5 Dec 2000 05:09:08 -0800 (PST) Received: from sonic.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP id ; Tue, 5 Dec 2000 13:08:51 +0000 From: Orion Hodson To: freebsd-arch@freebsd.org Cc: cg@freebsd.org Subject: soundcard.h Date: Tue, 05 Dec 2000 13:08:50 +0000 Message-ID: <3737.976021730@cs.ucl.ac.uk> Sender: O.Hodson@cs.ucl.ac.uk Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG As someone who works with sound quite a bit, it's hard not to notice that soundcard.h covers several different and sometimes overlapping interfaces. 
I had a go at re-arranging it with the aim of clarifying what works, what's deprecated in newpcm, clarifying comments about "what does this do", and putting related items together. Cameron suggested it would be a better idea to break out the functionalities into separate include files, i.e. snd_oss.h, snd_pcm.h, snd_mixer.h, snd_sequencer.h, etc., and have these included from soundcard.h. Is there any strength of feeling for or against doing this? It's a completely aesthetic and very minor undertaking, but I don't mind doing it if people think it'd be reasonable. - Orion From owner-freebsd-arch Tue Dec 5 13:23:34 2000 From owner-freebsd-arch@FreeBSD.ORG Tue Dec 5 13:23:31 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mail.wgate.com (mail.wgate.com [38.219.83.4]) by hub.freebsd.org (Postfix) with ESMTP id A518237B400 for ; Tue, 5 Dec 2000 13:23:30 -0800 (PST) Received: from jesup.eng.tvol.net ([10.32.2.26]) by mail.wgate.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id Y2Q95H02; Tue, 5 Dec 2000 16:23:34 -0500 Reply-To: Randell Jesup To: Alfred Perlstein Cc: Terry Lambert , Bosko Milekic , arch@FreeBSD.ORG Subject: Re: zero copy code review References: <200012042352.QAA12392@usr02.primenet.com> <20001204162015.A8051@fw.wintelcom.net> <20001204162659.B8051@fw.wintelcom.net> From: Randell Jesup Date: 05 Dec 2000 16:30:26 -0500 In-Reply-To: Alfred Perlstein's message of "Mon, 4 Dec 2000 16:26:59 -0800" Message-ID: User-Agent: Gnus/5.0807 (Gnus v5.8.7) Emacs/20.7 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Alfred Perlstein writes: >> Well behaved applications (read: written by me) deal with errors >> like ENOBUFS properly, what they do is close the socket and >> commence throttling connections. 
Most user-level applications do not. Certainly most applications that call write() don't. In fact, a grep of /usr/src shows that of things outside of sys, only a handful test for ENOBUFS: telnet, sendmail, ipfilter, ntp, natd, and ping seem to include a test (I didn't check what they do with it though). >> I would not want my process to be stuck in the kernel waiting >> for buffer space that could take quite a long time to get ahold of. In many cases you do. How different is that than waiting on some other resource that may take a long time, or getting a response to a write or read across a network? In fact, I'd assert in most cases waiting is the appropriate action unless a call is non-blocking. Given the very small number of programs that _do_ handle ENOBUFS, I'd assert that the default action should be to wait, unless the application has said it wants to hear about them. >> However I can understand someone wanting a naive process not >> to get such errors because they may misbehave and do stupid >> things like busy loop or just abort entirely. or fail a complex transaction, etc. Like 99% of user code out there when faced with ENOBUFS. >> Perhaps adding a per-process or per-socket or per-something flag >> to ask for indefinite blocking (or turn it off) would be a good >> idea, honestly having it one way or the other isn't very good >> depending on your application. I can live with the current >> situation so I'll leave 'fixing' this to someone who wants >> the indefinite blocking. per-socket makes sense; or keyed off non-blocking mode. The default should be wait. >> Oh, and don't forget, you can't block me indefinitely if I'm >> writing to a non-blocking socket. In fact if M_WAIT is set >> I shouldn't be blocking at all on a non-blocking socket. Agreed; even more reason to tie it to non-blocking mode. 
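As a rough illustration of keying the behaviour off non-blocking mode, here is what the application side might look like. This is a sketch under stated assumptions, not an existing API: set_nonblocking, try_send, and the send_status enum are hypothetical names invented for the example, and it shows only how a caller could treat ENOBUFS like EAGAIN (back off and retry later) on a socket it has explicitly made non-blocking. It says nothing about how the kernel's M_WAIT path would have to change.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical result codes for a single send attempt. */
enum send_status { SEND_OK, SEND_BACKOFF, SEND_FAILED };

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1)
		return (-1);
	return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0);
}

/*
 * Hypothetical helper: one send() attempt on a non-blocking socket.
 * EAGAIN/EWOULDBLOCK and ENOBUFS are both reported as SEND_BACKOFF, so
 * the caller can throttle instead of treating them as fatal. On SEND_OK,
 * *sent holds the number of bytes accepted (possibly fewer than len).
 */
static enum send_status
try_send(int fd, const void *buf, size_t len, size_t *sent)
{
	ssize_t n;

	assert(sent != NULL);
	*sent = 0;
	n = send(fd, buf, len, 0);
	if (n >= 0) {
		*sent = (size_t)n;
		return (SEND_OK);
	}
	if (errno == EAGAIN || errno == EWOULDBLOCK || errno == ENOBUFS)
		return (SEND_BACKOFF);
	return (SEND_FAILED);
}
```

A process that never calls set_nonblocking() would, under the default argued for above, simply sleep in the kernel; one that does call it has opted in to seeing SEND_BACKOFF and is expected to cope with it.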
>One more thing, ENOBUFS is indicative of a misconfiguration and >shouldn't happen in day to day operations, if it does happen then >the user needs to reconfigure for more buffer space. Or it's indicative of a DoS attack (possibly unintentional), or a load problem, possibly temporary. I dislike arbitrary tuning parameters. Generally they're either ignored (mostly), or set wildly high in the hope of the annoyance someone hit once going away. Or just set randomly. Most of the people doing the setting don't have a good grasp of why it should be set to a specific value. Kind of like putting a spark-advance knob on the steering wheel (which they once did, believe it or not). -- Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94) rjesup@wgate.com From owner-freebsd-arch Tue Dec 5 13:33:48 2000 From owner-freebsd-arch@FreeBSD.ORG Tue Dec 5 13:33:41 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from field.videotron.net (field.videotron.net [205.151.222.108]) by hub.freebsd.org (Postfix) with ESMTP id 38E5737B404 for ; Tue, 5 Dec 2000 13:33:40 -0800 (PST) Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213]) by field.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G5400CFJ6K1ZP@field.videotron.net> for arch@FreeBSD.ORG; Tue, 5 Dec 2000 16:33:38 -0500 (EST) Date: Tue, 05 Dec 2000 16:34:30 -0500 (EST) From: Bosko Milekic Subject: Re: zero copy code review In-reply-to: <200012042352.QAA12392@usr02.primenet.com> To: Terry Lambert Cc: arch@FreeBSD.ORG Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG I don't understand what you're complaining about already. Just set kern.ipc.mbuf_wait to 0 and you'll have the behavior you're looking for. 
As for the kernel, people must always keep checking whether their mbuf pointer is NULL following any type of allocation and deal with it appropriately (return ENOBUFS or drop the packet) until a real global preventive measure is put into place (think vm_state a-la PHK, or something similar). Changing the behavior of M_WAIT to not return NULL ever is out of the question. I don't need to explain myself again. If you want to test this theory, try lowering NMBCLUSTERS (so it's easy to exhaust mb_map), heavily load your network (from the outside) and tune your kern.ipc.mbuf_wait accordingly, so that `netstat -m' shows a "requests for memory delayed" > "requests for memory denied." This should give you a roughly optimal wait time for heavy network load. This is your "normal wait time." Now load your system with some local DoS (allocate very large socket buffers in a tight loop, for example) and watch your system effectively deadlock, and then see how glad you are that your process isn't hanging in the kernel indefinitely and how ^C eventually does its job and kills the process. Then watch your system recover. Now set mbuf_wait to 0 and run the same test. Have fun running fsck after the cold boot. Just about the only thing that may be considered is changing the name of M_WAIT to something more appropriate, if it means so much to the majority of people (honestly, I would find even doing this a waste of time, but if lots of folks think it's worth educating kernel developers by changing the name of a flag, then we might as well). On Mon, 4 Dec 2000, Terry Lambert wrote: > > > [ ... local DOS ... ] > > > > > > I really don't buy a probability defense. If a probability defense > > > were acceptable, then not checking for a NULL return, and eating > > > the panic that results is also acceptable. > > > > It's not a "probability defense." It's not a "defense." It's just a > > "don't act the worst way possible when we have an attack." 
And you > > haven't said at all why waiting indefinitely is better than not, > > especially in the problematic situation I brought up. > > The situation you quote is one where the allocation fails, > instead of WAITing until it can complete successfully, and > this results in the kernel function failing and state being > undone back to the point where the user space call that was > the originator of the request fails back to user space. Then > the user space code has to handle the failure. > > I maintain that the most reasonable and logical thing for the > user space program to do, on seeing this failure (ENOBUFS?), > is to retry the operation. > > So it calls down again, and fails again, and you have > substituted a busy loop which crosses protection domains > twice, for a kernel sleep. This is the best case. > > The worst case is that the local DOS obtains yet more > resources when the state is backed out, and the busy > loop path in the kernel becomes shorter, due to an > earlier failure for lack of resources. > > In neither case does failing the allocation instead of > sleeping do _anything at all_ to address the root cause > of the problem, nor does the failure result in the problem > going away or being lessened. > > So I really don't see what is being accomplished by failing > the allocation, rather than sleeping, except to use up > _extra_ resources, during a time of resource starvation, to > enforce the mbuf_wait interval. > > > > > The problem with this theory is that "have the [non-offending] > > > process return from the kernel and deal with the temporary failure" > > > presumes that there is a correct way to work around the failure in > > > user space. > > > > No, it doesn't. But it's better for the process to sleep in user > > space than to be INDEFINITELY stuck in the kernel. And, in the case of an > > attack, it _will_ be indefinitely stuck. > > Why the heck would the process sleep in user space?!?
It has > work to do, it knows the call to make to do the work, and it > will make the call repeatedly, until it's context switched, > or until the call succeeds. This is just like a write loop > on a large buffer, subtracting out the write() return value > and advancing the buffer pointer, until everything has been > written. You might argue that a "correctly" written user > space program would use a select loop, but I'm betting that > the descriptor will show as writeable, even if there aren't > any mbufs available to accept the write; there's no way to > make the write select accurate, without pre-reserving memory > to accept the write. > > Personally, I would prefer, under DOS conditions, that my > program be stuck in kernel space, so that it at least has a > small chance of getting work done slowly during a DOS, than > stuck in user space. You can be sure that the DOS process > is not going to be nearly as polite in hanging around in user > space until kernel resources are freed up. > > > > > I would maintain that the failure would be persistent, since this > > > does nothing to silence the DOS attack, and there is nothing that > > > a user space program can do, except to retry, and get all the way > > > down the code path to the same place that it was before. > > > > Right. It's not a preventive measure. But, it's much better to have > > it act in this manner than wait indefinitely "in the case of." > > I strongly disagree. That's "``in the case of'' being able to > get work done, despite the DOS". Hung in user space is the > same as hung in the kernel: your process is not doing useful > work. > > Making it easier for the DOS to get yet more resources during > a period of resource starvation, and preventing other programs > from competing with the DOS for resources freed by timeout or > other mechanism, which takes them back from the DOS, seems like > a big mistake to me.
I would much rather have a system that I > can normally talk to in a few seconds be capable of being talked > to over a period of 10 minutes, than one I can't talk to at all; > wouldn't you? > > > > > It seems to me that this is just a case of how big you want to > > > make your retry loop, not one of whether or not there will be a > > > retry loop. > > > > The retry loop is _useless_. You drop the mutex and lose priority in > > the wait queue when you return from m_get(). Calling again makes your > > chances of getting an mbuf in a shortage even less probable. If you want > > that behavior, just tweak your kern.ipc.mbuf_wait. > > This is actually the opposite of the effect you would want. A > well behaved process denied a scarce resource should be first in > line for that resource. Saying "I can't give you one because > there's this pig of a process, but I'll tell you what I'll do: > why don't you just piss off until the next millennium?" is no way > to encourage well behaved processes... 8-). > > > > [...] > > > I would argue that this level of congestion should be proactively > > > prohibited from occurring in the first place; the most likely way > > > to do this correctly is to start "dropping" the oldest datagrams, > > > NOT returning "NULL" to allocations made on behalf of telnetd or > > > sshd from the local interface. > > > > This is really a great block of theory. I only wish that people with > > such a passion to argue the methods would work in actually implementing > > them. > > The code which implements "source quench" could be abused to > provide this functionality at the queue bottom, where things > are packing up in the ICMP echo datagram case (as one example). > > > > It's not. It never was. It never will be. It's just better than > > waiting indefinitely. It still provides you with the ability to wait > > indefinitely, though, if you are incapable of understanding why it's > > better not to. > > Explain it to me: why is it better to not wait?
When I see the > error return from the low memory condition, am I supposed to shut > myself down, disabling apache, for example? Is _everyone_ > supposed to do the same thing, until there is nothing but the DOS > process running on the system? > > What does me failing buy _me_? > > How is this different than me waiting on _any_ contended resource, > instead of timing out, like an advisory lock on a file? > > > > > As a general bone of contention, if the thing _doesn't_ wait, it > > > shouldn't be called M_WAIT, it should be called M_TRY_HARDER or > > > something that indicates that the default behaviour has been > > > altered, but in fact the routine will not be waiting around until > > > it is successful, like all of the other _WAIT flags imply. > > > > It _does_ wait, and I disagree. By that logic, why not rename all the > > _WAITs with _WAIT_INDEF? If you're curious about what M_WAIT does, you > > can either read the code (hey, it is free!) or read the mbuf(9) man page > > (now available in -CURRENT). > > It waits until it doesn't, you mean. 8-p. > > > Terry Lambert > terry@lambert.org > --- > Any opinions in this posting are my own and not those of my present > or previous employers. 
Regards, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Dec 5 19:11:51 2000 From owner-freebsd-arch@FreeBSD.ORG Tue Dec 5 19:11:50 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id 78A1E37B400 for ; Tue, 5 Dec 2000 19:11:49 -0800 (PST) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.11.1/8.11.1) with SMTP id eB63Bmf96881 for ; Tue, 5 Dec 2000 22:11:48 -0500 (EST) (envelope-from robert@fledge.watson.org) Date: Tue, 5 Dec 2000 22:11:48 -0500 (EST) From: Robert Watson X-Sender: robert@fledge.watson.org To: freebsd-arch@FreeBSD.org Subject: Threads in the base system Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: robert@fledge.watson.org Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Recently, pppctl was made thread-enabled, meaning that it relies on libc_r. This makes the NOLIBC_R cannot be used with buildworld anymore. Given that making pppctl depend on !NOLIBC_R may not be all that helpful, it looks like we may need to lose NOLIBC_R. Presumably over time, threads in default system applications will only become more popular. Any thoughts (especially in light of upcoming KSE changes, which will make threading integral to the system architecture)? 
Robert N M Watson FreeBSD Core Team, TrustedBSD Project robert@fledge.watson.org NAI Labs, Safeport Network Services To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Dec 5 19:14:43 2000 From owner-freebsd-arch@FreeBSD.ORG Tue Dec 5 19:14:40 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id E981437B698 for ; Tue, 5 Dec 2000 19:14:32 -0800 (PST) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id UAA07508; Tue, 5 Dec 2000 20:11:08 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp05.primenet.com, id smtpdAAArtaGNo; Tue Dec 5 20:11:02 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id UAA24462; Tue, 5 Dec 2000 20:14:18 -0700 (MST) From: Terry Lambert Message-Id: <200012060314.UAA24462@usr05.primenet.com> Subject: Re: zero copy code review To: bmilekic@technokratis.com (Bosko Milekic) Date: Wed, 6 Dec 2000 03:14:18 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), arch@FreeBSD.ORG In-Reply-To: from "Bosko Milekic" at Dec 05, 2000 04:34:30 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: tlambert@usr05.primenet.com Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > I don't understand what you're complaining about already. Just set > kern.ipc.mbuf_wait to 0 and you'll have the behavior you're looking for. I am looking for semantics, not behaviour. The difference is the cost that I end up paying.
> As for the kernel, people must always keep checking whether their mbuf > pointer is NULL following any type of allocation and deal with it > appropriately (return ENOBUFS or drop the packet) until a real global > preventive measure is put into place (think vm_state a-la PHK, or > something similar). This otherwise unnecessary checking is what I'm complaining about. I don't like it in my code path; it slows things down unnecessarily. > Changing the behavior of M_WAIT to not return NULL ever is out of the > question. You mean "changing it back", of course... > I don't need to explain myself again. If you want to test this > theory, try lowering NMBCLUSTERS (so it's easy to exhaust mb_map), > heavily load your network (from the outside) and tune your > kern.ipc.mbuf_wait accordingly, so that `netstat -m' shows a "requests > for memory delayed" > "requests for memory denied." This should give you > an about optimal wait time for heavy network load. This is your "normal > wait time." I see you attempting to tune a pool entry rate in order to deal with a pool retention time for something that I don't think should be in a hysteretical loop in the first place. > Now load your system with some local DoS (allocate very large > socket buffers in a tight loop, for example) and watch your system > effectively deadlock, and then see how much you're glad that your process > isn't hanging in the kernel indefinitely and how ^C eventually does its > job and kills the process. Then watch your system recover. I guess you are talking about interrupting the DOS program, and not some victim program here, right? I think that this is a really artificial test case. I think a real test case would be to start the DOS program (nb: I don't let shell users on my servers anyway), and then start a different (victim) program, and watch what happens to the different program. I can ^C the victim program, in your scenario, but my system will fail to recover, and will remain unusable.
My system really needs to set working set limitations on how many resources a single process is permitted to monopolize under low resource conditions. This would let my victim program continue to run, if sluggishly, and prevent a single DOS from doing more than slowing down my system. I think that you are maybe rendering the program interruptible the wrong way, and using the failure path on the allocation to back out the stack state leading up to the allocation attempt, as a convenience? > Just about the only thing that may be considered is changing the > name of M_WAIT to something more appropriate, if it means so much to the > majority of people (honestly, I would find even doing this a waste of > time, but if lots of folks think it's worth educating kernel developers > by changing the name of a flag, then we might as well). If the semantics don't revert to their pre-timeout behaviour, I think it really would be best to have a meaningful name for it. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
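The user-space retry loop argued over in this thread (write, subtract the return value, advance the buffer pointer, and call again when the kernel reports a buffer shortage) could look roughly like the sketch below. It is an illustration only: the helper name write_all_retry, the retry cap, and the 10 ms back-off are assumptions made for the example, not something proposed by either poster.

```c
#include <errno.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole buffer, retrying when the
 * kernel reports a transient shortage (ENOBUFS or EAGAIN) or makes
 * no progress. Returns 0 once everything is written, -1 otherwise.
 */
int
write_all_retry(int fd, const char *buf, size_t len, int max_retries)
{
	int retries = 0;

	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n > 0) {
			/* Partial or full write: advance and keep going. */
			buf += n;
			len -= (size_t)n;
			retries = 0;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;	/* interrupted by a signal, just retry */
		if (n == 0 || errno == ENOBUFS || errno == EAGAIN) {
			/* Assumed policy: bounded retries, 10 ms apart. */
			struct timespec ts = { 0, 10 * 1000 * 1000 };

			if (++retries > max_retries)
				return (-1);
			nanosleep(&ts, NULL);
			continue;
		}
		return (-1);	/* any other error is treated as fatal */
	}
	return (0);
}
```

Every failed pass through this loop enters and leaves the kernel once, which is the cost being weighed in the thread against a single sleep inside the allocator.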
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Dec 5 19:20:15 2000 From owner-freebsd-arch@FreeBSD.ORG Tue Dec 5 19:20:14 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id AC0D837B401; Tue, 5 Dec 2000 19:20:13 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id WAA00210; Tue, 5 Dec 2000 22:19:52 -0500 (EST) Date: Tue, 5 Dec 2000 22:19:52 -0500 (EST) From: Daniel Eischen To: Robert Watson Cc: freebsd-arch@FreeBSD.ORG Subject: Re: Threads in the base system In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Tue, 5 Dec 2000, Robert Watson wrote: > Recently, pppctl was made thread-enabled, meaning that it relies on > libc_r. This makes the NOLIBC_R cannot be used with buildworld anymore. > Given that making pppctl depend on !NOLIBC_R may not be all that helpful, > it looks like we may need to lose NOLIBC_R. Presumably over time, threads > in default system applications will only become more popular. Any > thoughts (especially in light of upcoming KSE changes, which will make > threading integral to the system architecture)? 
OK, lose NOLIBC_R -- not that I'm biased or anything ;-) -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Dec 6 5:22:29 2000 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 6 05:22:27 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from www.stocke.com (unknown [202.101.165.60]) by hub.freebsd.org (Postfix) with ESMTP id 6110837B400 for ; Wed, 6 Dec 2000 05:22:19 -0800 (PST) Received: from xyf ([61.164.185.75]) by www.stocke.com (8.9.3/8.9.3) with SMTP id VAA07952 for ; Wed, 6 Dec 2000 21:25:09 +0800 Message-ID: <000f01c05f87$7406cbc0$5ac809c0@xyf> From: "xuyifeng" To: References: Subject: Re: Threads in the base system Date: Wed, 6 Dec 2000 21:21:24 +0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: base64 X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2615.200 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG does this mean that we will have only libc_r.so and libc_r.a in FreeBSD 5.0 system? can we remove libc.so and libc.a? let the system to default mulit-threaded enable? I know if I mix msvcrt.dll(DLL version) and libcmt.lib (static library) in M$ visual C++ program, memory will be corrupted, (sometimes I can not avoid the problem because of using third party libbrary), is it true on FreeBSD if I mix using libc and libc_r in same program? Regards, XuYifeng ----- Original Message ----- From: Robert Watson <rwatson@FreeBSD.org> To: <freebsd-arch@FreeBSD.org> Sent: Wednesday, December 06, 2000 11:11 AM Subject: Threads in the base system > > Recently, pppctl was made thread-enabled, meaning that it relies on > libc_r. This makes the NOLIBC_R cannot be used with buildworld anymore. > Given that making pppctl depend on !NOLIBC_R may not be all that helpful, > it looks like we may need to lose NOLIBC_R. Presumably over time, threads > in default system applications will only become more popular. Any > thoughts (especially in light of upcoming KSE changes, which will make > threading integral to the system architecture)? > > Robert N M Watson FreeBSD Core Team, TrustedBSD Project > robert@fledge.watson.org NAI Labs, Safeport Network Services > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Dec 6 13:37:19 2000 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 6 13:37:17 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from Awfulhak.org (tun.AwfulHak.org [194.242.139.173]) by hub.freebsd.org (Postfix) with ESMTP id 4A1C737B401; Wed, 6 Dec 2000 13:37:15 -0800 (PST) Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12]) by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6LZLm16231; Wed, 6 Dec 2000 21:35:21 GMT (envelope-from brian@hak.lan.Awfulhak.org) Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1]) by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6Lc6t07375; Wed, 6 Dec 2000 21:38:06 GMT (envelope-from brian@hak.lan.Awfulhak.org) Message-Id:
<200012062138.eB6Lc6t07375@hak.lan.Awfulhak.org> X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 To: Robert Watson Cc: freebsd-arch@FreeBSD.org, brian@Awfulhak.org Subject: Re: Threads in the base system In-Reply-To: Message from Robert Watson of "Tue, 05 Dec 2000 22:11:48 EST." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 06 Dec 2000 21:38:06 +0000 From: Brian Somers Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Good spot. I believe NOLIBC_R should/must go. > Recently, pppctl was made thread-enabled, meaning that it relies on > libc_r. This makes the NOLIBC_R cannot be used with buildworld anymore. > Given that making pppctl depend on !NOLIBC_R may not be all that helpful, > it looks like we may need to lose NOLIBC_R. Presumably over time, threads > in default system applications will only become more popular. Any > thoughts (especially in light of upcoming KSE changes, which will make > threading integral to the system architecture)? > > Robert N M Watson FreeBSD Core Team, TrustedBSD Project > robert@fledge.watson.org NAI Labs, Safeport Network Services -- Brian Don't _EVER_ lose your sense of humour ! 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Dec 6 13:37:30 2000 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 6 13:37:26 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from Awfulhak.org (tun.AwfulHak.org [194.242.139.173]) by hub.freebsd.org (Postfix) with ESMTP id 8A20737B400 for ; Wed, 6 Dec 2000 13:37:23 -0800 (PST) Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12]) by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6LYFm16217; Wed, 6 Dec 2000 21:34:15 GMT (envelope-from brian@hak.lan.Awfulhak.org) Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1]) by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6Lb0t07362; Wed, 6 Dec 2000 21:37:00 GMT (envelope-from brian@hak.lan.Awfulhak.org) Message-Id: <200012062137.eB6Lb0t07362@hak.lan.Awfulhak.org> X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 To: "xuyifeng" Cc: freebsd-arch@FreeBSD.org, brian@Awfulhak.org Subject: Re: Threads in the base system In-Reply-To: Message from "xuyifeng" of "Wed, 06 Dec 2000 21:21:24 +0800." <000f01c05f87$7406cbc0$5ac809c0@xyf> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Date: Wed, 06 Dec 2000 21:37:00 +0000 From: Brian Somers Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG I don't think it's possible to mix libc and libc_r in the same program (except as intended - with the libc_r stubs superseding the libc ones). There are no ``alternative'' header files with different defines that might be used in one object and not the other, and the program is either linked against libc_r, or isn't (and will fail if it's got thread references). I won't comment on Microsoft's shared library implementation. > does this mean that we will have only libc_r.so and libc_r.a in FreeBSD 5.0 system? > can we remove libc.so and libc.a?
let the system to default mulit-threaded enable? > I know if I mix msvcrt.dll(DLL version) and libcmt.lib (static library) in M$ visual C++ > program, memory will be corrupted, (sometimes I can not avoid the problem because > of using third party libbrary), is it true on FreeBSD if I mix using libc and libc_r > in same program? > > Regards, > XuYifeng > > ----- Original Message ----- > From: Robert Watson <rwatson@FreeBSD.org> > To: <freebsd-arch@FreeBSD.org> > Sent: Wednesday, December 06, 2000 11:11 AM > Subject: Threads in the base system > > > > > > Recently, pppctl was made thread-enabled, meaning that it relies on > > libc_r. This makes the NOLIBC_R cannot be used with buildworld anymore. > > Given that making pppctl depend on !NOLIBC_R may not be all that helpful, > > it looks like we may need to lose NOLIBC_R. Presumably over time, threads > > in default system applications will only become more popular. Any > > thoughts (especially in light of upcoming KSE changes, which will make > > threading integral to the system architecture)? > > > > Robert N M Watson FreeBSD Core Team, TrustedBSD Project > > robert@fledge.watson.org NAI Labs, Safeport Network Services -- Brian Don't _EVER_ lose your sense of humour !
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Dec 6 13:51: 1 2000 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 6 13:50:58 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id 78C1337B400; Wed, 6 Dec 2000 13:50:57 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id QAA14771; Wed, 6 Dec 2000 16:50:30 -0500 (EST) Date: Wed, 6 Dec 2000 16:50:29 -0500 (EST) From: Daniel Eischen To: Brian Somers Cc: Robert Watson , freebsd-arch@FreeBSD.ORG, brian@Awfulhak.org Subject: Re: Threads in the base system In-Reply-To: <200012062138.eB6Lc6t07375@hak.lan.Awfulhak.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 6 Dec 2000, Brian Somers wrote: > Good spot. I believe NOLIBC_R should/must go. I was just [re]thinking about this. When we get libpthread (work has just started on this), then libc_r will eventually go away. It's not clear yet whether libpthread will exist as a separate entity or whether it will evolve from libc_r. It's possible that NOLIBC_R might actually become the default. > > Recently, pppctl was made thread-enabled, meaning that it relies on > > libc_r. This makes the NOLIBC_R cannot be used with buildworld anymore. > > Given that making pppctl depend on !NOLIBC_R may not be all that helpful, > > it looks like we may need to lose NOLIBC_R. Presumably over time, threads > > in default system applications will only become more popular. Any > > thoughts (especially in light of upcoming KSE changes, which will make > > threading integral to the system architecture)? 
> > > > Robert N M Watson FreeBSD Core Team, TrustedBSD Project > > robert@fledge.watson.org NAI Labs, Safeport Network Services -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Dec 6 14: 3: 2 2000 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 6 14:02:59 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from Awfulhak.org (tun.AwfulHak.org [194.242.139.173]) by hub.freebsd.org (Postfix) with ESMTP id 4870037B400; Wed, 6 Dec 2000 14:02:52 -0800 (PST) Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12]) by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6LvHm16334; Wed, 6 Dec 2000 21:57:17 GMT (envelope-from brian@hak.lan.Awfulhak.org) Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1]) by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6M01t07697; Wed, 6 Dec 2000 22:00:01 GMT (envelope-from brian@hak.lan.Awfulhak.org) Message-Id: <200012062200.eB6M01t07697@hak.lan.Awfulhak.org> X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 To: Daniel Eischen Cc: Brian Somers , Robert Watson , freebsd-arch@FreeBSD.ORG, brian@Awfulhak.org Subject: Re: Threads in the base system In-Reply-To: Message from Daniel Eischen of "Wed, 06 Dec 2000 16:50:29 EST." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 06 Dec 2000 22:00:01 +0000 From: Brian Somers Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > On Wed, 6 Dec 2000, Brian Somers wrote: > > Good spot. I believe NOLIBC_R should/must go. > > I was just [re]thinking about this. When we get libpthread (work > has just started on this), then libc_r will eventually go away. > It's not clear yet whether libpthread will exist as a separate > entity or whether it will evolve from libc_r. It's possible > that NOLIBC_R might actually become the default. 
We should really be advocating using threads in the base system rather than discouraging it (well, of course that's my view :-). I suspect however that to most people, libc_r is just some extra buildworld overhead.... I've already cast my vote, and can't see any strong argument not to remove NOLIBC_R (especially now that it breaks world :-) > > > Recently, pppctl was made thread-enabled, meaning that it relies on > > > libc_r. This makes the NOLIBC_R cannot be used with buildworld anymore. > > > Given that making pppctl depend on !NOLIBC_R may not be all that helpful, > > > it looks like we may need to lose NOLIBC_R. Presumably over time, threads > > > in default system applications will only become more popular. Any > > > thoughts (especially in light of upcoming KSE changes, which will make > > > threading integral to the system architecture)? > > > > > > Robert N M Watson FreeBSD Core Team, TrustedBSD Project > > > robert@fledge.watson.org NAI Labs, Safeport Network Services > > -- > Dan Eischen -- Brian Don't _EVER_ lose your sense of humour ! 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Dec 6 14:36:51 2000 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 6 14:36:49 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.242]) by hub.freebsd.org (Postfix) with ESMTP id AE95737B401; Wed, 6 Dec 2000 14:36:49 -0800 (PST) Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30]) by palrel1.hp.com (Postfix) with ESMTP id 678A589D; Wed, 6 Dec 2000 14:36:28 -0800 (PST) Received: from cup.hp.com (p1000180.nsr.hp.com [15.109.0.180]) by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id OAA07831; Wed, 6 Dec 2000 14:36:28 -0800 (PST) Sender: marcel@cup.hp.com Message-ID: <3A2EBF6B.90BA100B@cup.hp.com> Date: Wed, 06 Dec 2000 14:36:27 -0800 From: Marcel Moolenaar Organization: Hewlett-Packard X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386) X-Accept-Language: en MIME-Version: 1.0 To: Brian Somers Cc: Daniel Eischen , Robert Watson , freebsd-arch@FreeBSD.ORG Subject: Re: Threads in the base system References: <200012062200.eB6M01t07697@hak.lan.Awfulhak.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Brian Somers wrote: > > I've already cast my vote, and can't see any strong argument not to > remove NOLIBC_R (especially now that it breaks world :-) I think Daniel just gave a good reason: we may need it in the future. Isn't it better at this time to keep the NOLIBC_R, but to promote it to an internal tweak? 
-- Marcel Moolenaar mail: marcel@cup.hp.com / marcel@FreeBSD.org tel: (408) 447-4222 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Dec 6 14:48:43 2000 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 6 14:48:40 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 130DE37B400 for ; Wed, 6 Dec 2000 14:48:40 -0800 (PST) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id PAA10597; Wed, 6 Dec 2000 15:45:15 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp05.primenet.com, id smtpdAAAXDaqPu; Wed Dec 6 15:45:07 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id PAA25916; Wed, 6 Dec 2000 15:48:27 -0700 (MST) From: Terry Lambert Message-Id: <200012062248.PAA25916@usr08.primenet.com> Subject: Re: Threads in the base system To: brian@Awfulhak.org (Brian Somers) Date: Wed, 6 Dec 2000 22:48:27 +0000 (GMT) Cc: xyf@stocke.com (xuyifeng), freebsd-arch@FreeBSD.ORG, brian@Awfulhak.org In-Reply-To: <200012062137.eB6Lb0t07362@hak.lan.Awfulhak.org> from "Brian Somers" at Dec 06, 2000 09:37:00 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: tlambert@usr08.primenet.com Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > I don't think it's possible to mix libc and libc_r in the same > program (except as intended - with the libc_r stubs superseding the > libc ones). > > There are no ``alternative'' header files with different defines that > might be used in one object and not the other, and the program is > either linked against libc_r, or isn't (and will fail if it's got > thread references). From my reading, there is still code in header files that is compiled variant based on _THREAD. 
This means that calling libraries compiled with threading disabled from code that was compiled with threading enabled _may_ result in undefined behaviour (I haven't tracked down every instance, and I think an audit would be needed to know for sure). In general, it's possible to set up an "apartment" or "rental" model threading interface to wrap such libraries to make sure things work. Work has to be queued for a worker thread, and the worker thread does the work and queues the response. Only the worker thread can be allowed into the library. This is basically how you have to use the thread-unsafe LDAP libraries on Windows (or any system that has thread local storage that is not mapped into the global process address space -- what a design mistake). This assumes that with or without _THREAD, the code doesn't change, though... I guess the real question is, if you were to rename libc, so that things couldn't link against it, modify the libc_r to include a linkage against the renamed library so it pulls in things it doesn't define from libc instead, and then make symlinks from libc to point to libc_r instead, would things still work, or are there some things that would break? As far as eating the threading overhead in unthreaded programs, the decision to eat the overhead has already been taken; it happened when EGCS became the default compiler, since EGCS doesn't support dynamic registration of thread support code (e.g. per-thread exception stacks in C++ via libgcc). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
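The "apartment" arrangement described in the message above (work is queued for a single worker thread, and only that thread is ever allowed into the thread-unsafe library) might be sketched with POSIX threads as follows. Everything named here is hypothetical: unsafe_lib_add() merely stands in for a thread-unsafe library call, and the apartment_* functions and queue layout are assumptions made for illustration, not code from libc_r or from any LDAP library.

```c
#include <pthread.h>
#include <stddef.h>

/* Stand-in for a thread-unsafe library: it keeps unprotected static state. */
static int unsafe_counter;

static int
unsafe_lib_add(int x)
{
	unsafe_counter += x;
	return (unsafe_counter);
}

/* One queued call; it lives on the caller's stack until marked done. */
struct request {
	int arg;
	int result;
	int done;
	struct request *next;
};

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t q_done = PTHREAD_COND_INITIALIZER;
static struct request *q_head, *q_tail;
static int q_shutdown;
static pthread_t worker;

/* The only thread that ever calls into the unsafe library. */
static void *
apartment_worker(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&q_lock);
	for (;;) {
		while (q_head == NULL && !q_shutdown)
			pthread_cond_wait(&q_nonempty, &q_lock);
		if (q_head == NULL)
			break;		/* shutting down, queue drained */
		struct request *r = q_head;
		q_head = r->next;
		if (q_head == NULL)
			q_tail = NULL;
		pthread_mutex_unlock(&q_lock);
		r->result = unsafe_lib_add(r->arg);
		pthread_mutex_lock(&q_lock);
		r->done = 1;		/* queue the response */
		pthread_cond_broadcast(&q_done);
	}
	pthread_mutex_unlock(&q_lock);
	return (NULL);
}

int
apartment_start(void)
{
	return (pthread_create(&worker, NULL, apartment_worker, NULL));
}

/* Callable from any thread: enqueue the work, then wait for the answer. */
int
apartment_call(int arg)
{
	struct request r = { arg, 0, 0, NULL };

	pthread_mutex_lock(&q_lock);
	if (q_tail != NULL)
		q_tail->next = &r;
	else
		q_head = &r;
	q_tail = &r;
	pthread_cond_signal(&q_nonempty);
	while (!r.done)
		pthread_cond_wait(&q_done, &q_lock);
	pthread_mutex_unlock(&q_lock);
	return (r.result);
}

void
apartment_stop(void)
{
	pthread_mutex_lock(&q_lock);
	q_shutdown = 1;
	pthread_cond_signal(&q_nonempty);
	pthread_mutex_unlock(&q_lock);
	pthread_join(worker, NULL);
}
```

The price of this wrapper is that all calls into the library are serialized through one thread, which is the point: the library never sees more than one caller, whatever the rest of the program does.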
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Dec 6 14:53:44 2000 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 6 14:53:42 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from gw.nectar.com (gw.nectar.com [208.42.49.153]) by hub.freebsd.org (Postfix) with ESMTP id D00BA37B698; Wed, 6 Dec 2000 14:53:35 -0800 (PST) Received: by gw.nectar.com (Postfix, from userid 1001) id 9FF94193E1; Wed, 6 Dec 2000 16:53:34 -0600 (CST) Date: Wed, 6 Dec 2000 16:53:34 -0600 From: "Jacques A. Vidrine" To: Daniel Eischen Cc: Brian Somers , Robert Watson , freebsd-arch@FreeBSD.ORG Subject: Re: Threads in the base system Message-ID: <20001206165334.D64011@spawn.nectar.com> Mail-Followup-To: "Jacques A. Vidrine" , Daniel Eischen , Brian Somers , Robert Watson , freebsd-arch@FreeBSD.ORG References: <200012062138.eB6Lc6t07375@hak.lan.Awfulhak.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from eischen@vigrid.com on Wed, Dec 06, 2000 at 04:50:29PM -0500 X-Url: http://www.nectar.com/ Sender: nectar@nectar.com Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, Dec 06, 2000 at 04:50:29PM -0500, Daniel Eischen wrote: > I was just [re]thinking about this. When we get libpthread (work > has just started on this), then libc_r will eventually go away. > It's not clear yet whether libpthread will exist as a separate > entity or whether it will evolve from libc_r. For the ignorant (me), what is/will be the difference between libc_r and libpthread? 
Cheers, -- Jacques Vidrine / n@nectar.com / jvidrine@verio.net / nectar@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Dec 6 15:22:31 2000 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 6 15:22:29 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from Awfulhak.org (tun.AwfulHak.org [194.242.139.173]) by hub.freebsd.org (Postfix) with ESMTP id 26A9D37B400; Wed, 6 Dec 2000 15:22:27 -0800 (PST) Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12]) by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6NJ3m16686; Wed, 6 Dec 2000 23:19:03 GMT (envelope-from brian@hak.lan.Awfulhak.org) Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1]) by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB6NLlt08622; Wed, 6 Dec 2000 23:21:47 GMT (envelope-from brian@hak.lan.Awfulhak.org) Message-Id: <200012062321.eB6NLlt08622@hak.lan.Awfulhak.org> X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 To: "Jacques A. Vidrine" , Daniel Eischen , Brian Somers , Robert Watson , freebsd-arch@FreeBSD.ORG Subject: Re: Threads in the base system In-Reply-To: Message from "Jacques A. Vidrine" of "Wed, 06 Dec 2000 16:53:34 CST." <20001206165334.D64011@spawn.nectar.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 06 Dec 2000 23:21:47 +0000 From: Brian Somers Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > On Wed, Dec 06, 2000 at 04:50:29PM -0500, Daniel Eischen wrote: > > I was just [re]thinking about this. When we get libpthread (work > > has just started on this), then libc_r will eventually go away. > > It's not clear yet whether libpthread will exist as a separate > > entity or whether it will evolve from libc_r. > > For the ignorant (me), what is/will be the difference between libc_r and > libpthread? And me ! Besides, can't we put libpthread in libc_r's place when it goes away ? 
> Cheers, > -- > Jacques Vidrine / n@nectar.com / jvidrine@verio.net / nectar@FreeBSD.org -- Brian Don't _EVER_ lose your sense of humour ! To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Dec 6 16:44:49 2000 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 6 16:44:47 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id 2507337B401 for ; Wed, 6 Dec 2000 16:44:46 -0800 (PST) Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id BAA04665; Thu, 7 Dec 2000 01:44:39 +0100 (CET) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id BAA30753; Thu, 7 Dec 2000 01:44:39 +0100 (CET) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Thu, 7 Dec 2000 01:44:39 +0100 (CET) From: Marius Bendiksen To: Bosko Milekic Cc: Terry Lambert , arch@FreeBSD.ORG Subject: Re: zero copy code review In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > Just about the only thing that may be considered is changing the > name of M_WAIT to something more appropriate, if it means so much to the > majority of people (honestly, I would find even doing this a waste of > time, but if lots of folks think it's worth educating kernel developers > by changing the name of a flag, then we might as well). This isn't much of an issue for me; however, I'd vote for changing the name. Not so much for "educating kernel developers", but rather for the sake of us being consistent and labelling things correctly. 
Marius To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Dec 6 19: 3:55 2000 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 6 19:03:53 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id 5951337B400; Wed, 6 Dec 2000 19:03:49 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id WAA29975; Wed, 6 Dec 2000 22:03:25 -0500 (EST) Date: Wed, 6 Dec 2000 22:03:25 -0500 (EST) From: Daniel Eischen To: Brian Somers Cc: "Jacques A. Vidrine" , Brian Somers , Robert Watson , freebsd-arch@FreeBSD.ORG Subject: Re: Threads in the base system In-Reply-To: <200012062321.eB6NLlt08622@hak.lan.Awfulhak.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 6 Dec 2000, Brian Somers wrote: > > On Wed, Dec 06, 2000 at 04:50:29PM -0500, Daniel Eischen wrote: > > > I was just [re]thinking about this. When we get libpthread (work > > > has just started on this), then libc_r will eventually go away. > > > It's not clear yet whether libpthread will exist as a separate > > > entity or whether it will evolve from libc_r. > > > > For the ignorant (me), what is/will be the difference between libc_r and > > libpthread? > > And me ! OK, libc_r is libc + threads; an application can't be linked to both libc_r and libc. libpthread is just the thread routines (at least those that aren't included in libc) and _is_ linked with libc. When you have libpthread, the gcc option "-pthread" goes away (which we use to link to libc_r and prevent linking to libc), and you link with "-lpthread". In theory, libc_r could be an archive of libc and libpthread. We may want to keep libc_r around for a while for compatibility reasons (without moving it to compat). 
But at some point, libc_r will cease to be built the way it is currently being built (to include libc). All the _THREAD_SAFE checks will be removed from libc. Instead, libc will contain stub routines for the needed lock operations. These will be weak symbols that will be overloaded with (non-weak symbol) routines of the same name in libpthread. When libpthread isn't linked in, then the null stub routines will be invoked. If libpthread is linked in, then the real lock routines will be called. > Besides, can't we put libpthread in libc_r's place when it goes away ? Yes, but it can't be used (linked to) the same way nor named the same. I guess my point is that applications in our base system that require threads will need !NOLIBC_R || !NOLIBPTHREAD. And NOLIBC_R will eventually become the default some time after libpthread gets integrated. It's a little confusing, but am I making sense? -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 1:21:13 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 01:21:10 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247]) by hub.freebsd.org (Postfix) with ESMTP id 3337337B400 for ; Thu, 7 Dec 2000 01:21:10 -0800 (PST) Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1]) by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with ESMTP id eB79L5C60826; Thu, 7 Dec 2000 18:21:06 +0900 (JST) Date: Thu, 07 Dec 2000 18:21:04 +0900 Message-ID: From: Seigo Tanimura To: arch@freebsd.org Subject: Even 1GB KVA is not enough, but we have no more space Cc: Seigo Tanimura User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd) Organization: Digital Library Research Division, Information Techinology Centre, The University of 
Tokyo MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu") Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG As you may know, we now have a KVA space of 1GB. Some parts of our kernel, however, assume that they can scale the amount of memory they allocate in the KVA in proportion to the amount of physical memory. The result is again a shortage of KVA space, but we cannot extend our KVA any further. (I understand that 1GB is the upper limit of KVA on i386, am I right?) The following is a mail I sent to Matt Dillon a few hours ago. On Thu, 07 Dec 2000 14:56:32 +0900, Seigo Tanimura said: Seigo> I recently bought a Dell PowerEdge 6400/700 with RAM of 3GB in my Seigo> lab. The box runs -current quite well, except that it panics upon Seigo> swapping out data pages. Seigo> Here is how the PowerEdge dies. swap_zone in vm/swap_pager.c is not Seigo> initialized because zinit() attempts to allocate for swblock entries Seigo> an entry of about 250MB, which does not fit in any free entries in Seigo> kernel_map. The pagedaemon eventually calls zalloc(swap_pages) in Seigo> swp_pager_meta_build() to build swap metadata, leading to dereference Seigo> of a NULL pointer. Another box of mine at home with 256MB RAM also Seigo> runs -current, but the swap pager works fine. Seigo> Attached is a patch to adjust the number of swap metadata entries so Seigo> that the metadata fits in the KVA. The number of entries is Seigo> halved until zinit() succeeds. If the initial value of n in Seigo> swap_pager_swap_init() (which is cnt.v_page_count * 2) is too big or Seigo> zinit() does not succeed at all (hopefully not likely), you will see a Seigo> note or warning. zlist is cleaned up if zinitna() fails to avoid Seigo> vmstat -z messing up. (patch moved to the bottom of this mail) At first I looked only at the size of swap metadata, but that was shortsighted. 
After fixing the allocation of swap metadata, my kernel died in ffs_vget(), when kernel_map held only one free page.

I then estimated how large swap metadata grows with respect to the amount of physical memory. We assume that the amount of swap metadata is proportional to the amount of physical memory, and that swap metadata takes 8% of physical memory (according to my measurement). The results are shown below.

Physical Memory    swap metadata
256M                 20.5M
512M                 41.0M
1G                   81.9M
2G                  163.8M
3G                  245.8M
4G                  327.7M

So, on my PowerEdge, the kernel first attempts to allocate about 1/4 of the KVA for swap metadata. Although the size of swap metadata shrinks to around 64MB with my patch, the remaining free entry in kernel_map is only about 120MB.

The solution I have is that we do not count physical memory beyond the size of our KVA, or 1GB, when estimating how much KVA space to allocate in kernel_map. Hence the kernel allocates the same amount of memory for swap metadata or whatever on machines with 1GB, 2GB, 3GB and 4GB of RAM. This solution might degrade the performance of our kernel, but you would have no option other than switching to alpha or ia64 in order to expand the size of KVA.

Thanks, and any comments, flames or whatever are welcome. 
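The 8% estimate and the proposed 1GB cap can be checked with a few lines of arithmetic. The following is a minimal sketch, not the actual patch: the function names and the KVA_SIZE and SWAPMETA_PERCENT constants are assumptions made for illustration, using the figures quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

#define MB               (1024ULL * 1024ULL)
#define KVA_SIZE         (1024ULL * MB)  /* 1GB of KVA, as stated in the message */
#define SWAPMETA_PERCENT 8               /* measured ratio quoted in the message */

/* Uncapped estimate: swap metadata grows linearly with physical memory. */
uint64_t swapmeta_estimate(uint64_t physmem_bytes)
{
    /* Guard against overflow in the multiplication below. */
    assert(physmem_bytes <= UINT64_MAX / SWAPMETA_PERCENT);
    return physmem_bytes * SWAPMETA_PERCENT / 100;
}

/* Proposed estimate: physical memory beyond the size of the KVA is not
 * counted, so 1GB, 2GB, 3GB and 4GB machines all get the same amount. */
uint64_t swapmeta_estimate_capped(uint64_t physmem_bytes)
{
    uint64_t counted = physmem_bytes < KVA_SIZE ? physmem_bytes : KVA_SIZE;
    return swapmeta_estimate(counted);
}
```

With these definitions a 3GB machine goes from roughly 245MB of swap metadata (about a quarter of the KVA) to roughly 82MB, the same figure as a 1GB machine, while machines at or below 1GB are unaffected.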
-- Seigo Tanimura To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 1:29:38 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 01:29:34 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247]) by hub.freebsd.org (Postfix) with ESMTP id A659937B400 for ; Thu, 7 Dec 2000 01:29:33 -0800 (PST) Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1]) by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with ESMTP id eB79TTC61332; Thu, 7 Dec 2000 18:29:30 +0900 (JST) Date: Thu, 07 Dec 2000 18:29:29 +0900 Message-ID: From: Seigo Tanimura To: tanimura@r.dl.itc.u-tokyo.ac.jp Cc: arch@freebsd.org Subject: Re: Even 1GB KVA is not enough, but we have no more space In-Reply-To: In your message of "Thu, 07 Dec 2000 18:21:04 +0900" References: Cc: Seigo Tanimura User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd) Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu") Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Aaugh, I forgot to attach my patch... As the previous mail of mine is somewhat long, I placed the following patches on the web, and added another one: URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff This is the one I sent to Matt. URI: http://people.FreeBSD.org/~tanimura/patches/vmstat.diff This allows vmstat(8) to show the amount of pages each zone holds. 
The result looks like this:

tanimura@stella% vmstat -z

ZONE            used    total   pages   mem-use
PIPE               4      102      -1   0/15K
SWAPMETA           0        0   15078   0/0K
tcpcb             24       35    4624   12/18K
unpcb             12      128      -1   0/8K
ripcb              1       42    1632   0/7K
tcpcb              0        0    4624   0/0K
udpcb             36       84    1632   6/15K
socket            74      126    1632   13/23K
KNOTE              0      128      -1   0/8K
NFSNODE          137      192      -1   42/60K
NFSMOUNT           5       14      -1   2/7K
VNODE          14310    14400      -1   3577/3600K
NAMEI              0       16      -1   0/16K
VMSPACE           78      162      -1   17/35K
PROC             104      148      -1   55/78K
DP fakepg          0        0      -1   0/0K
PV ENTRY      151501   786421   16604   4142/21503K
MAP ENTRY       1166     1658      -1   54/77K
KMAP ENTRY       267      383    2262   12/17K
MAP                7       47      -1   0/4K
VM OBJECT       1259     1432      -1   118/134K
--------------------------------------------------
TOTAL                                   8058/25633K

-- Seigo Tanimura To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 1:36:18 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 01:36:17 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 3371337B400 for ; Thu, 7 Dec 2000 01:36:17 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eB79aBw29191; Thu, 7 Dec 2000 01:36:11 -0800 (PST) Date: Thu, 7 Dec 2000 01:36:11 -0800 From: Alfred Perlstein To: Seigo Tanimura Cc: arch@FreeBSD.ORG Subject: Re: Even 1GB KVA is not enough, but we have no more space Message-ID: <20001207013611.O16205@fw.wintelcom.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from tanimura@r.dl.itc.u-tokyo.ac.jp on Thu, Dec 07, 2000 at 06:29:29PM +0900 Sender: bright@fw.wintelcom.net Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Seigo Tanimura [001207 01:29] wrote: > Aaugh, I forgot to attach my patch... 
> > As the previous mail of mine is somewhat long, I placed the following > patches on the web, and added another one: > > > URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff > > This is the one I sent to Matt. > possible problem: in the loop you use to allocate, you never test if 'n' hits zero, now if there's a swap problem you won't print anything, just wedge hard. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 1:48: 3 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 01:48:01 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247]) by hub.freebsd.org (Postfix) with ESMTP id 47B4037B400 for ; Thu, 7 Dec 2000 01:48:00 -0800 (PST) Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1]) by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with ESMTP id eB79lvC62921; Thu, 7 Dec 2000 18:47:57 +0900 (JST) Date: Thu, 07 Dec 2000 18:47:57 +0900 Message-ID: From: Seigo Tanimura To: bright@wintelcom.net Cc: tanimura@r.dl.itc.u-tokyo.ac.jp, arch@FreeBSD.ORG Subject: Re: Even 1GB KVA is not enough, but we have no more space In-Reply-To: In your message of "Thu, 7 Dec 2000 01:36:11 -0800" <20001207013611.O16205@fw.wintelcom.net> References: <20001207013611.O16205@fw.wintelcom.net> User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd) Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu") Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, 7 Dec 2000 01:36:11 
-0800, Alfred Perlstein said: >> URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff Alfred> in the loop you use to allocate, you never test if 'n' hits zero, Alfred> now if there's a swap problem you won't print anything, just wedge Alfred> hard. It should also be good to reject swapon(2) if swap_zone is NULL. -- Seigo Tanimura To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 1:57: 0 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 01:56:58 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 9F94337B402 for ; Thu, 7 Dec 2000 01:56:58 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eB79upg29796; Thu, 7 Dec 2000 01:56:51 -0800 (PST) Date: Thu, 7 Dec 2000 01:56:51 -0800 From: Alfred Perlstein To: Seigo Tanimura Cc: arch@FreeBSD.ORG Subject: Re: Even 1GB KVA is not enough, but we have no more space Message-ID: <20001207015651.P16205@fw.wintelcom.net> References: <20001207013611.O16205@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from tanimura@r.dl.itc.u-tokyo.ac.jp on Thu, Dec 07, 2000 at 06:47:57PM +0900 Sender: bright@fw.wintelcom.net Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Seigo Tanimura [001207 01:48] wrote: > On Thu, 7 Dec 2000 01:36:11 -0800, > Alfred Perlstein said: > > >> URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff > > Alfred> in the loop you use to allocate, you never test if 'n' hits zero, > Alfred> now if there's a swap problem you won't print anything, just wedge > Alfred> hard. > > It should also be good to reject swapon(2) if swap_zone is NULL. Agreed. 
Since you've been poring through this code, I'm wondering what happens when the swapper can't allocate as much as it wants? Does it just reduce the amount of swapping the machine can do? or is there a performance hit? or both? > > -- > Seigo Tanimura -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 2:22:14 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 02:22:13 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247]) by hub.freebsd.org (Postfix) with ESMTP id AF10137B400 for ; Thu, 7 Dec 2000 02:22:12 -0800 (PST) Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1]) by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with ESMTP id eB7AM8C66385; Thu, 7 Dec 2000 19:22:09 +0900 (JST) Date: Thu, 07 Dec 2000 19:22:08 +0900 Message-ID: From: Seigo Tanimura To: bright@wintelcom.net Cc: tanimura@r.dl.itc.u-tokyo.ac.jp, arch@FreeBSD.ORG Subject: Re: Even 1GB KVA is not enough, but we have no more space In-Reply-To: In your message of "Thu, 7 Dec 2000 01:56:51 -0800" <20001207015651.P16205@fw.wintelcom.net> References: <20001207013611.O16205@fw.wintelcom.net> <20001207015651.P16205@fw.wintelcom.net> User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd) Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu") Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, 7 Dec 2000 01:56:51 -0800, Alfred Perlstein said: >> >> URI: 
http://people.FreeBSD.org/~tanimura/patches/vm.diff >> Alfred> in the loop you use to allocate, you never test if 'n' hits zero, Alfred> now if there's a swap problem you won't print anything, just wedge Alfred> hard. >> >> It should also be good to reject swapon(2) if swap_zone is NULL. Alfred> Agreed. Since you've been poring through this code, I'm wondering Alfred> what happens when the swapper can't allocate as much as it wants? Alfred> Does it just reduce the amount of swapping the machine can do? or Alfred> is there a performance hit? or both? Reducing the number of swap metadata entries primarily results in failures to allocate a metadata entry, which limits the maximum size of the vm objects that can be in use at a time. Another effect is that the pagedaemon has to wait for a free metadata entry, which slows down swap-out. -- Seigo Tanimura To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 9:18:42 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 09:18:40 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mailtoaster1.pipeline.ch (mailtoaster1.pipeline.ch [62.48.0.70]) by hub.freebsd.org (Postfix) with SMTP id 4A10737B400 for ; Thu, 7 Dec 2000 09:18:38 -0800 (PST) Received: (qmail 70321 invoked from network); 7 Dec 2000 17:16:28 -0000 Received: from unknown (HELO telehouse.ch) ([195.134.128.53]) (envelope-sender ) by mailtoaster1.pipeline.ch (qmail-ldap-1.03) with SMTP for ; 7 Dec 2000 17:16:28 -0000 Message-ID: <3A2FC647.6EC4FFA7@telehouse.ch> Date: Thu, 07 Dec 2000 18:17:59 +0100 From: Andre Oppermann X-Mailer: Mozilla 4.74 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: Seigo Tanimura Cc: arch@freebsd.org Subject: Re: Even 1GB KVA is not enough, but we have no more space References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Seigo 
Tanimura wrote:
>
> Aaugh, I forgot to attach my patch...
>
> As the previous mail of mine is somewhat long, I placed the following
> patches on the web, and added another one:
>
> URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff
>
> This is the one I sent to Matt.
>
> URI: http://people.FreeBSD.org/~tanimura/patches/vmstat.diff
>
> This allows vmstat(8) to show the amount of pages each zone holds. The
> result looks like this:
>
> tanimura@stella% vmstat -z
>
> ZONE            used    total   pages   mem-use
> PIPE               4      102      -1   0/15K
> SWAPMETA           0        0   15078   0/0K
> tcpcb             24       35    4624   12/18K
> unpcb             12      128      -1   0/8K
> ripcb              1       42    1632   0/7K
> tcpcb              0        0    4624   0/0K
> udpcb             36       84    1632   6/15K
> socket            74      126    1632   13/23K
> KNOTE              0      128      -1   0/8K
> NFSNODE          137      192      -1   42/60K
> NFSMOUNT           5       14      -1   2/7K
> VNODE          14310    14400      -1   3577/3600K
> NAMEI              0       16      -1   0/16K
> VMSPACE           78      162      -1   17/35K
> PROC             104      148      -1   55/78K
> DP fakepg          0        0      -1   0/0K
> PV ENTRY      151501   786421   16604   4142/21503K
> MAP ENTRY       1166     1658      -1   54/77K
> KMAP ENTRY       267      383    2262   12/17K
> MAP                7       47      -1   0/4K
> VM OBJECT       1259     1432      -1   118/134K
> --------------------------------------------------
> TOTAL                                   8058/25633K

Wow, that looks good! Far easier than the other stuff. 
-- Andre To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 10:52: 1 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 10:51:58 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mail.interware.hu (mail.interware.hu [195.70.32.130]) by hub.freebsd.org (Postfix) with ESMTP id ADED637B400 for ; Thu, 7 Dec 2000 10:51:57 -0800 (PST) Received: from luanda-33.budapest.interware.hu ([195.70.51.33] helo=elischer.org) by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian)) id 144694-0004Iy-00; Thu, 07 Dec 2000 19:51:55 +0100 Sender: julian@FreeBSD.ORG Message-ID: <3A2F93C6.7967D1DA@elischer.org> Date: Thu, 07 Dec 2000 05:42:30 -0800 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Seigo Tanimura Cc: arch@freebsd.org Subject: Re: Even 1GB KVA is not enough, but we have no more space References: Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Seigo Tanimura wrote: [interesting stuff deleted] > The solution I have is that we do not count the size of physical > memory larger than the size of our KVA, or 1GB, upon estimating the > size of KVA space to allocate in kernel_map. Hence the kernel > allocates the same amount of memory for swap metadata or whatever, on > a machine with 1GB, 2GB, 3GB and 4GB RAM. This solution might degrade > the performance of our kernel, but you would have no other options > than to switch to alpha or ia64 in order to expand the size of KVA. > > Thanks, and any comments, flames or whatever are welcome. > THEORETICALLY it should be possible to put the kernel into a different KV space from the processes and give it 4GB. Practically, we'd have to do a lot of work to get there, and it may affect throughput (page tables loading in and out). 
It may, however, be worth looking at, especially given the possibility of altering the system to allow the > 4GB of physical RAM that P6 and higher support. -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000 ---> X_.---._/ presently in: Budapest v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 11:38:18 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 11:38:16 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from beastie.mckusick.com (tserver.conference.usenix.org [209.179.127.3]) by hub.freebsd.org (Postfix) with ESMTP id 7CC4037B400 for ; Thu, 7 Dec 2000 11:38:15 -0800 (PST) Received: from beastie.mckusick.com (localhost [127.0.0.1]) by beastie.mckusick.com (8.9.3/8.9.3) with ESMTP id LAA03622 for ; Thu, 7 Dec 2000 11:38:11 -0800 (PST) (envelope-from mckusick@beastie.mckusick.com) Message-Id: <200012071938.LAA03622@beastie.mckusick.com> To: arch@freebsd.org Subject: Getting Kernel Process Information Date: Thu, 07 Dec 2000 11:38:11 -0800 From: Kirk McKusick Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

For the third time in a week, I got the following message when I tried to run ps on my 5.X system:

	proc size mismatch (39776 total, 1136 chunks)

This message arises when the size of the proc structure changes. With the current SMP development, the proc structure changes at a very high rate. The current kinfo_proc interface used between the kernel and user processes is built from two pieces:

	struct kinfo_proc {
		struct proc	kp_proc;
		struct eproc	kp_eproc;
	};

Kinfo_proc contains a copy of the kernel's proc structure followed by an `extended' proc structure which has lots of bits and pieces that have moved out of the proc structure or are otherwise needed. 
Any change to the kernel's version of the proc structure changes the size of the kinfo_proc structure and hence causes a mismatch when attempts are made to copy it out. I propose to change the kinfo_proc structure. The new kinfo_proc structure will contain only the stylized `extended' proc structure which will be augmented with the twenty fields that are actually referenced from the proc structure by user processes. By taking this approach, changes to the proc structure will not affect the format or size of the kinfo_proc structure returned to user processes. The new `extended' proc structure will have plenty of spare fields added to its end so that when new fields are added to the proc structure that user-level processes need/want to know about, they can be added without changing the size of the exported kinfo_proc structure and thus will not require recompilation of the dozen or so programs that use the exported interface. Note that even if 200 spare bytes are added to the kinfo_proc structure, it will still be smaller than the current one. Note that I am proposing to make this change only in the 5.X tree. I am not proposing that it be back ported to the 4.X tree. I am not interested in starting a long discussion on all the possible alternatives for exporting kernel information to user processes. I recognize that there are better ways to handle these issues. I am just trying to make an incremental change that is small in scope and hopefully will make an annoying problem significantly less common. With this caveat, comments are solicited. 
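To make the proposal above concrete, here is a rough sketch of an exported structure that is decoupled from the kernel's private proc structure. Everything in it is an assumption made for illustration: the field names, the ki_structsize member, the number of spare slots and the fill_kinfo() helper are not taken from the message or from any committed code.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's private proc structure. It can grow or be
 * reordered freely without affecting what is exported to user processes. */
struct kproc {
    int  p_pid;
    int  p_some_new_smp_field;   /* hypothetical churn from SMP work */
    long p_estcpu;
    char p_comm[17];             /* assumed to be NUL-terminated */
};

/* Stable exported layout: only the fields user programs actually read,
 * plus reserved space so that later additions do not change the size. */
struct kinfo_proc_sketch {
    uint32_t ki_structsize;      /* lets a consumer detect a mismatch */
    int32_t  ki_pid;
    int64_t  ki_estcpu;
    char     ki_comm[20];
    int64_t  ki_spare[16];       /* spare fields, always zero for now */
};

/* Copy the exported subset field by field instead of copying struct proc. */
void fill_kinfo(const struct kproc *p, struct kinfo_proc_sketch *ki)
{
    memset(ki, 0, sizeof(*ki));
    ki->ki_structsize = (uint32_t)sizeof(*ki);
    ki->ki_pid = p->p_pid;
    ki->ki_estcpu = p->p_estcpu;
    strncpy(ki->ki_comm, p->p_comm, sizeof(ki->ki_comm) - 1);
}
```

The point of the sketch is that adding p_some_new_smp_field to the kernel structure changes nothing a program like ps would see, and that a future exported field can be carved out of ki_spare without changing sizeof(struct kinfo_proc_sketch).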
Kirk McKusick To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 11:42:40 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 11:42:38 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mailout04.sul.t-online.com (mailout04.sul.t-online.com [194.25.134.18]) by hub.freebsd.org (Postfix) with ESMTP id 3B62937B401; Thu, 7 Dec 2000 11:42:37 -0800 (PST) Received: from fwd00.sul.t-online.com by mailout04.sul.t-online.com with smtp id 1446w3-00059o-04; Thu, 07 Dec 2000 20:42:31 +0100 Received: from neutron.cichlids.com (520050424122-0001@[62.225.193.245]) by fmrl00.sul.t-online.com with esmtp id 1446vq-2FIe5AC; Thu, 7 Dec 2000 20:42:18 +0100 Received: from cichlids.cichlids.com (cichlids.cichlids.com [192.168.0.10]) by neutron.cichlids.com (Postfix) with ESMTP id 336E9AB0C; Thu, 7 Dec 2000 20:42:18 +0100 (CET) Received: by cichlids.cichlids.com (Postfix, from userid 1001) id 1382314A86; Thu, 7 Dec 2000 20:42:16 +0100 (CET) Date: Thu, 7 Dec 2000 20:42:15 +0100 To: Orion Hodson Cc: freebsd-arch@FreeBSD.ORG, cg@FreeBSD.ORG Subject: Re: soundcard.h Message-ID: <20001207204215.A5787@cichlids.cichlids.com> References: <3737.976021730@cs.ucl.ac.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3737.976021730@cs.ucl.ac.uk>; from O.Hodson@cs.ucl.ac.uk on Tue, Dec 05, 2000 at 01:08:50PM +0000 X-PGP-Fingerprint: 44 28 CA 4C 46 5B D3 A8 A8 E3 BA F3 4E 60 7D 7F X-PGP-at: finger alex@big.endian.de X-Verwirrung: Dieser Header dient der allgemeinen Verwirrung. From: alex@big.endian.de (Alexander Langer) X-Sender: 520050424122-0001@t-dialin.net Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Thus spake Orion Hodson (O.Hodson@cs.ucl.ac.uk): > into separate include files, i.e. 
snd_oss.h, snd_pcm.h, snd_mixer.h, > snd_sequencer.h, etc and have these included from soundcard.h. > Is there any strength of feeling for or against doing this? It's > completely aesthetic and very minor undertaking, but I don't mind > doing if people think it'd be reasonable. If this really helps someone (i.e. you or Cameron), I don't know why it shouldn't be done. Alex -- cat: /home/alex/.sig: No such file or directory To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 11:52:49 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 11:52:46 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135]) by hub.freebsd.org (Postfix) with ESMTP id 0625137B401 for ; Thu, 7 Dec 2000 11:52:46 -0800 (PST) Received: (from dillon@localhost) by earth.backplane.com (8.11.1/8.9.3) id eB7JqXm11711; Thu, 7 Dec 2000 11:52:33 -0800 (PST) (envelope-from dillon) Date: Thu, 7 Dec 2000 11:52:33 -0800 (PST) From: Matt Dillon Message-Id: <200012071952.eB7JqXm11711@earth.backplane.com> To: Seigo Tanimura Cc: bright@wintelcom.net, tanimura@r.dl.itc.u-tokyo.ac.jp, arch@FreeBSD.ORG Subject: Re: Even 1GB KVA is not enough, but we have no more space References: <20001207013611.O16205@fw.wintelcom.net> <20001207015651.P16205@fw.wintelcom.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :On Thu, 7 Dec 2000 01:56:51 -0800, : Alfred Perlstein said: : :>> >> URI: http://people.FreeBSD.org/~tanimura/patches/vm.diff :>> :Alfred> in the loop you use to allocate, you never test if 'n' hits zero, :Alfred> now if there's a swap problem you won't print anything, just wedge :Alfred> hard. :>> :>> It should also be good to reject swapon(2) if swap_zone is NULL. : :Alfred> Agreed. 
Since you've been poring through this code, I'm wondering :Alfred> what happens when the swapper can't allocate as much as it wants? : :Alfred> Does it just reduce the amount of swapping the machine can do? or :Alfred> is there a performance hit? or both? : :Reduction of swap metadata entries primarily results in failure to :allocate a metadata entry, limiting the maximum size of vm objects :that can be used at a time. Another effect is for the pagedaemon to :wait for a free metadata entry, slowing down the speed of swap out. : :-- :Seigo Tanimura Running out of swapmeta may not be an option. A system deadlock could result. The real problem here is that swapmeta is being reserved based on some multiple of main memory rather than based on the actual amount of swap allocated. Another possibility would be to reserve swap in larger chunks... that is, in SWAP_META_PAGES (16-page) chunks rather than page-sized chunks. The struct swblock structure would then turn into a single daddr_t (base swap address) and a bitmap (one int), reducing its size from 80 bytes to 24 bytes. The only problem with this is that the VM object collapse code needs to merge swap areas on a page-by-page basis, so it isn't entirely trivial. Another possibility would be to have some way to swap the swblock structures themselves, relegating the SWAPMETA zone to a cache. Also not trivial. In any case, your stopgap patch seems reasonable in concept until we can come up with a better solution. 
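The chunked layout mentioned above (one base swap address plus a one-int bitmap per 16-page chunk) could look roughly like the following. This is a sketch under assumptions, not the real `struct swblock`: `swaddr_t` stands in for `daddr_t`, and `swblock_compact` and the `swb_*` helpers are made-up names. It leaves out hash linkage and the page-by-page merge needed by the VM object collapse code.

```c
#include <assert.h>
#include <stdint.h>

#define SWAP_META_PAGES 16          /* pages covered by one chunk */

typedef int64_t swaddr_t;           /* stand-in for daddr_t */
#define SWADDR_NONE ((swaddr_t)-1)  /* "no swap assigned" marker */

/* Hypothetical compact metadata: swap for page i of the chunk lives
 * at base + i, and bit i of 'valid' says whether it is in use. */
struct swblock_compact {
    swaddr_t     base;
    unsigned int valid;
};

void swb_set(struct swblock_compact *b, int idx)
{
    assert(idx >= 0 && idx < SWAP_META_PAGES);
    b->valid |= 1u << idx;
}

void swb_clear(struct swblock_compact *b, int idx)
{
    assert(idx >= 0 && idx < SWAP_META_PAGES);
    b->valid &= ~(1u << idx);
}

/* Return the swap address backing page idx, or SWADDR_NONE. */
swaddr_t swb_lookup(const struct swblock_compact *b, int idx)
{
    assert(idx >= 0 && idx < SWAP_META_PAGES);
    if ((b->valid & (1u << idx)) == 0)
        return SWADDR_NONE;
    return b->base + idx;
}
```

The trade-off is that swap must be reserved a whole chunk at a time, which is what makes the page-by-page collapse case awkward.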
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 11:56:19 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 11:56:17 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 7CF1E37B400 for ; Thu, 7 Dec 2000 11:56:17 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eB7JuGg15731; Thu, 7 Dec 2000 11:56:16 -0800 (PST) Date: Thu, 7 Dec 2000 11:56:16 -0800 From: Alfred Perlstein To: Kirk McKusick Cc: arch@FreeBSD.ORG Subject: Re: Getting Kernel Process Information Message-ID: <20001207115616.V16205@fw.wintelcom.net> References: <200012071938.LAA03622@beastie.mckusick.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200012071938.LAA03622@beastie.mckusick.com>; from mckusick@mckusick.com on Thu, Dec 07, 2000 at 11:38:11AM -0800 Sender: bright@fw.wintelcom.net Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Kirk McKusick [001207 11:38] wrote: > For the third time in a week, I got the following message when I > tried to run ps on my 5.X system: > > proc size mismatch (39776 total, 1136 chunks) > > This message arises when the size of the proc structure changes. > With the current SMP development, the proc structure changes at > a very high rate of speed. The current kinfo_proc interface used > between the kernel and user processes is built from two pieces: > > struct kinfo_proc { > struct proc kp_proc; > struct eproc kp_eproc; > } > > Kinfo_proc contains a copy of the kernel's proc structure > followed by an `extended' proc structure which has lots > of bits and pieces that have moved out of the proc structure > or are otherwise needed. 
Any change to the kernel's version > of the proc structure changes the size of the kinfo_proc > structure and hence causes a mismatch when attempts are made > to copy it out. > > I propose to change the kinfo_proc structure. The new > kinfo_proc structure will contain only the stylized `extended' > proc structure which will be augmented with the twenty > fields that are actually referenced from the proc structure > by user processes. By taking this approach, changes to the > proc structure will not affect the format or size of the > kinfo_proc structure returned to user processes. The new > `extended' proc structure will have plenty of spare fields > added to its end so that when new fields are added to the > proc structure that user-level processes need/want to know > about, they can be added without changing the size of the > exported kinfo_proc structure and thus will not require > recompilation of the dozen or so programs that use the > exported interface. Note that even if 200 spare bytes are > added to the kinfo_proc structure, it will still be smaller > than the current one. I completely agree that this should be done. My suggestion is to completely rip out any kernel structs being passed through this interface; the reason is that we will need mutexes in a lot of them and we don't want to export that to userland. I was looking at this the other week when trying to clean up the struct ucred issues and thought it was a good idea, but a bit more work than I had in mind at the time. 
-- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 12:26:13 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 12:26:11 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222]) by hub.freebsd.org (Postfix) with ESMTP id 339C237B400 for ; Thu, 7 Dec 2000 12:26:11 -0800 (PST) Received: from foo.osd.bsdi.com (root@foo.osd.bsdi.com [204.216.28.137]) by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eB7KPC726457; Thu, 7 Dec 2000 12:25:12 -0800 (PST) (envelope-from jhb@foo.osd.bsdi.com) Received: (from jhb@localhost) by foo.osd.bsdi.com (8.11.1/8.11.0) id eB7KPFn65390; Thu, 7 Dec 2000 12:25:15 -0800 (PST) (envelope-from jhb) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <20001207115616.V16205@fw.wintelcom.net> Date: Thu, 07 Dec 2000 12:25:15 -0800 (PST) Organization: BSD, Inc. From: John Baldwin To: Alfred Perlstein Subject: Re: Getting Kernel Process Information Cc: arch@FreeBSD.ORG, Kirk McKusick Sender: jhb@foo.osd.bsdi.com Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 07-Dec-00 Alfred Perlstein wrote: > * Kirk McKusick [001207 11:38] wrote: >> For the third time in a week, I got the following message when I >> tried to run ps on my 5.X system: >> >> proc size mismatch (39776 total, 1136 chunks) >> >> This message arises when the size of the proc structure changes. >> With the current SMP development, the proc structure changes at >> a very high rate of speed. 
The current kinfo_proc interface used >> between the kernel and user processes is built from two pieces: >> >> struct kinfo_proc { >> struct proc kp_proc; >> struct eproc kp_eproc; >> } [ snip ] > I completely agree that should be done. My suggestion is to > completely rip out any kernel structs being passed through > this interface, the reason is that we will need mutexes in > a lot of them and we don't want to export that to userland. He is, he's just bulking up the eproc that gets created in fill_eproc() so that proc doesn't need to be exported at all. It sounds like an excellent and noteworthy goal, esp. since the KSE work is going to make this even more bizarre and confusing. :) -- John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 13:37:25 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 13:37:22 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (fw2.aub.dk [195.24.1.195]) by hub.freebsd.org (Postfix) with ESMTP id EB1D537B400 for ; Thu, 7 Dec 2000 13:37:21 -0800 (PST) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id eB7LbBL92763; Thu, 7 Dec 2000 22:37:12 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Kirk McKusick Cc: arch@FreeBSD.ORG Subject: Re: Getting Kernel Process Information In-Reply-To: Your message of "Thu, 07 Dec 2000 11:38:11 PST." <200012071938.LAA03622@beastie.mckusick.com> Date: Thu, 07 Dec 2000 22:37:11 +0100 Message-ID: <92761.976225031@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message <200012071938.LAA03622@beastie.mckusick.com>, Kirk McKusick writes: >I propose to change the kinfo_proc structure. 
The new >kinfo_proc structure will contain only the stylized `extended' >proc structure which will be augmented with the twenty >fields that are actually referenced from the proc structure >by user processes. Yes! -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 13:47: 8 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 13:47:06 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from cs.utep.edu (mail.cs.utep.edu [129.108.5.3]) by hub.freebsd.org (Postfix) with ESMTP id 1401337B400 for ; Thu, 7 Dec 2000 13:47:04 -0800 (PST) Received: from gecko (gecko [129.108.5.51]) by cs.utep.edu (8.10.1/8.10.1) with ESMTP id eB7LkWn25281; Thu, 7 Dec 2000 14:46:32 -0700 (MST) Date: Thu, 7 Dec 2000 14:46:32 -0700 (MST) From: X-Sender: To: Kirk McKusick Cc: Subject: Re: Getting Kernel Process Information In-Reply-To: <200012071938.LAA03622@beastie.mckusick.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > to user processes. I recognize that there are better ways > to handle these issues. I am just trying to make an What are some of the better ways of handling the issue? 
JAn To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 13:54:29 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 13:54:26 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from Awfulhak.org (awfulhak.demon.co.uk [194.222.196.252]) by hub.freebsd.org (Postfix) with ESMTP id 9339A37B401; Thu, 7 Dec 2000 13:54:24 -0800 (PST) Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12]) by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB7LqZx24603; Thu, 7 Dec 2000 21:52:35 GMT (envelope-from brian@lan.awfulhak.org) Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1]) by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB7Lt7G51311; Thu, 7 Dec 2000 21:55:07 GMT (envelope-from brian@hak.lan.Awfulhak.org) Message-Id: <200012072155.eB7Lt7G51311@hak.lan.Awfulhak.org> X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 To: Daniel Eischen Cc: Brian Somers , "Jacques A. Vidrine" , Robert Watson , freebsd-arch@FreeBSD.ORG, brian@Awfulhak.org Subject: Re: Threads in the base system In-Reply-To: Message from Daniel Eischen of "Wed, 06 Dec 2000 22:03:25 EST." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 07 Dec 2000 21:55:07 +0000 From: Brian Somers Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > On Wed, 6 Dec 2000, Brian Somers wrote: > > > On Wed, Dec 06, 2000 at 04:50:29PM -0500, Daniel Eischen wrote: > > > > I was just [re]thinking about this. When we get libpthread (work > > > > has just started on this), then libc_r will eventually go away. > > > > It's not clear yet whether libpthread will exist as a separate > > > > entity or whether it will evolve from libc_r. > > > > > > For the ignorant (me), what is/will be the difference between libc_r and > > > libpthread? > > > > And me ! > > OK, libc_r is libc + threads; an application can't be linked to both > libc_r and libc. 
libpthread is just the thread routines (at least > those that aren't included in libc) and _is_ linked with libc. When > you have libpthread, the gcc option "-pthread" goes away (which we use > to link to libc_r and prevent linking to libc), and you link with > "-lpthread". In theory, libc_r could be an archive of libc and > libpthread. > > We may want to keep libc_r around for a while for compatibility > reasons (without moving it to compat). But at some point, libc_r > will cease to be built the way it is currently being built (to > include libc). All the _THREAD_SAFE checks will be removed from > libc. Instead, libc will contain stub routines for the needed > lock operations. These will be weak symbols that will be overloaded > with (non-weak symbol) routines of the same name in libpthread. > When libpthread isn't linked in, then the null stub routines > will be invoked. If libpthread is linked in, then the real lock > routines will be called. > > > Besides, can't we put libpthread in libc_r's place when it goes away ? > > Yes, but it can't be used (linked to) the same way nor named the > same. > > I guess my point is that applications in our base system that > require threads will need !NOLIBC_R || !NOLIBPTHREAD. And NOLIBC_R > will eventually become the default some time after libpthread gets > integrated. > > It's a little confusing, but am I making sense? Yes. I'd tend to just say that we do it in these stages: 1. Remove NOLIBC_R 2. Eventually introduce libpthread 3. Change all Makefiles that say -pthread to say -lpthread 4. Blow away libc_r With whatever gap is required between each step. We *could* replace item 1 with ``don't build -pthread programs if NOLIBC_R'' and change that to NOLIBPTHREAD when item 3 is done, but I'd say it's better to encourage threads and not give these options. > -- > Dan Eischen -- Brian Don't _EVER_ lose your sense of humour ! 
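The weak-symbol stub arrangement Dan describes above can be sketched as follows. This is only an illustration: `thr_lock_acquire()`, `thr_lock_release()` and `bump_counter()` are hypothetical names, not actual libc or libpthread entry points, and `__attribute__((weak))` is a GCC/Clang extension rather than standard C. The libc side supplies do-nothing weak definitions; a thread library would supply ordinary (strong) definitions of the same names, and the linker would pick those instead.

```c
#include <assert.h>
#include <stddef.h>

/* "libc" side: weak null stubs, used when no thread library is linked.
 * A thread library would define non-weak functions with these names. */
__attribute__((weak)) int thr_lock_acquire(void *lock)
{
    (void)lock;                 /* single-threaded: nothing to do */
    return 0;
}

__attribute__((weak)) int thr_lock_release(void *lock)
{
    (void)lock;
    return 0;
}

static int counter;
static int counter_lock;        /* placeholder lock object */

/* A library routine that is written once and is safe either way:
 * with the stubs the lock calls are no-ops, with a thread library
 * linked in they would resolve to the real lock routines. */
int bump_counter(void)
{
    int v;

    assert(thr_lock_acquire(&counter_lock) == 0);
    v = ++counter;
    thr_lock_release(&counter_lock);
    return v;
}
```

Built on its own, this file uses the null stubs; no `_THREAD_SAFE`-style conditional compilation is needed in the caller.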
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 13:55: 5 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 13:55:03 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from beastie.mckusick.com (tserver.conference.usenix.org [209.179.127.3]) by hub.freebsd.org (Postfix) with ESMTP id 9A75F37B400 for ; Thu, 7 Dec 2000 13:55:02 -0800 (PST) Received: from beastie.mckusick.com (localhost [127.0.0.1]) by beastie.mckusick.com (8.9.3/8.9.3) with ESMTP id NAA04017; Thu, 7 Dec 2000 13:54:43 -0800 (PST) (envelope-from mckusick@beastie.mckusick.com) Message-Id: <200012072154.NAA04017@beastie.mckusick.com> To: janb@cs.utep.edu Subject: Re: Getting Kernel Process Information Cc: arch@FreeBSD.ORG In-Reply-To: Your message of "Thu, 07 Dec 2000 14:46:32 MST." Date: Thu, 07 Dec 2000 13:54:43 -0800 From: Kirk McKusick Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG From: janb@cs.utep.edu Date: Thu, 7 Dec 2000 14:46:32 -0700 (MST) To: Kirk McKusick cc: Subject: Re: Getting Kernel Process Information In-Reply-To: <200012071938.LAA03622@beastie.mckusick.com> > to user processes. I recognize that there are better ways > to handle these issues. I am just trying to make an What are some of the better ways of handling the issue? JAn See Terry Lambert's commentary on the exporting of the ucred structure that appeared about a week ago on this list for a good overview of the alternatives. 
Kirk McKusick To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 13:59:42 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 13:59:41 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from hub.lovett.com (hub.lovett.com [216.60.121.161]) by hub.freebsd.org (Postfix) with ESMTP id 7D49F37B400 for ; Thu, 7 Dec 2000 13:59:40 -0800 (PST) Received: from ade by hub.lovett.com with local (Exim 3.16 #1) id 14494U-000EIF-00; Thu, 07 Dec 2000 15:59:22 -0600 Date: Thu, 7 Dec 2000 15:59:22 -0600 From: Ade Lovett To: Brian Somers Cc: freebsd-arch@FreeBSD.ORG Subject: Re: Threads in the base system Message-ID: <20001207155922.I46011@FreeBSD.org> References: <200012072155.eB7Lt7G51311@hak.lan.Awfulhak.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200012072155.eB7Lt7G51311@hak.lan.Awfulhak.org>; from brian@Awfulhak.org on Thu, Dec 07, 2000 at 09:55:07PM +0000 Sender: ade@lovett.com Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, Dec 07, 2000 at 09:55:07PM +0000, Brian Somers wrote: > 1. Remove NOLIBC_R > 2. Eventually introduce libpthread > 3. Change all Makefiles that say -pthread to say -lpthread > 4. Blow away libc_r With an OSVERSION bump at stage 3, so that the whole slew of ports that currently mangle -lpthread to -pthread can DTRT between 4.x and 5.x Please? :) -aDe -- Ade Lovett, Austin, TX. 
ade@FreeBSD.org FreeBSD: The Power to Serve http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 14: 1:45 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 14:01:43 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id 41F8E37B400; Thu, 7 Dec 2000 14:01:43 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id RAA08109; Thu, 7 Dec 2000 17:01:19 -0500 (EST) Date: Thu, 7 Dec 2000 17:01:18 -0500 (EST) From: Daniel Eischen To: Brian Somers Cc: Brian Somers , "Jacques A. Vidrine" , Robert Watson , freebsd-arch@FreeBSD.ORG, brian@Awfulhak.org Subject: Re: Threads in the base system In-Reply-To: <200012072155.eB7Lt7G51311@hak.lan.Awfulhak.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, 7 Dec 2000, Brian Somers wrote: > I'd tend to just say that we do it in these stages: > > 1. Remove NOLIBC_R > 2. Eventually introduce libpthread > 3. Change all Makefiles that say -pthread to say -lpthread > 4. Blow away libc_r > > With whatever gap is required between each step. We *could* replace > item 1 with ``don't build -pthread programs if NOLIBC_R'' and change > that to NOLIBPTHREAD when item 3 is done, but I'd say it's better to > encourage threads and not give these options. OK, that's fine with me. 
-- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 15:23:46 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 15:23:44 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from Awfulhak.org (awfulhak.demon.co.uk [194.222.196.252]) by hub.freebsd.org (Postfix) with ESMTP id A42A937B400; Thu, 7 Dec 2000 15:23:43 -0800 (PST) Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12]) by Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB7NM0x25292; Thu, 7 Dec 2000 23:22:00 GMT (envelope-from brian@lan.awfulhak.org) Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1]) by hak.lan.Awfulhak.org (8.11.1/8.11.1) with ESMTP id eB7NOVG52190; Thu, 7 Dec 2000 23:24:31 GMT (envelope-from brian@hak.lan.Awfulhak.org) Message-Id: <200012072324.eB7NOVG52190@hak.lan.Awfulhak.org> X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 To: Ade Lovett Cc: Brian Somers , freebsd-arch@FreeBSD.org, brian@Awfulhak.org Subject: Re: Threads in the base system In-Reply-To: Message from Ade Lovett of "Thu, 07 Dec 2000 15:59:22 CST." <20001207155922.I46011@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 07 Dec 2000 23:24:31 +0000 From: Brian Somers Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > On Thu, Dec 07, 2000 at 09:55:07PM +0000, Brian Somers wrote: > > 1. Remove NOLIBC_R > > 2. Eventually introduce libpthread > > 3. Change all Makefiles that say -pthread to say -lpthread > > 4. Blow away libc_r > > With an OSVERSION bump at stage 3, so that the whole slew of > ports that currently mangle -lpthread to -pthread can DTRT > between 4.x and 5.x > > Please? :) I agree, maybe with another version bump at stage 2 and 4 too (version bumps are cheap). It would also be nice if the ports were smart enough to probe for -lpthread's existence and DTRT on that basis. 
> -aDe > > -- > Ade Lovett, Austin, TX. ade@FreeBSD.org > FreeBSD: The Power to Serve http://www.FreeBSD.org/ -- Brian Don't _EVER_ lose your sense of humour ! To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 19:35:52 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 19:35:50 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 9E8BD37B400 for ; Thu, 7 Dec 2000 19:35:42 -0800 (PST) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id UAA15393; Thu, 7 Dec 2000 20:31:27 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp04.primenet.com, id smtpdAAAB0aW.D; Thu Dec 7 20:31:23 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id UAA02960; Thu, 7 Dec 2000 20:35:26 -0700 (MST) From: Terry Lambert Message-Id: <200012080335.UAA02960@usr08.primenet.com> Subject: Re: Getting Kernel Process Information To: bright@wintelcom.net (Alfred Perlstein) Date: Fri, 8 Dec 2000 03:35:26 +0000 (GMT) Cc: mckusick@mckusick.com (Kirk McKusick), arch@FreeBSD.ORG In-Reply-To: <20001207115616.V16205@fw.wintelcom.net> from "Alfred Perlstein" at Dec 07, 2000 11:56:16 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: tlambert@usr08.primenet.com Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG [ ... reorg of proc structure to keep it from being a PITA ... ] > I completely agree that should be done. My suggestion is to > completely rip out and kernel structs being passed through > this interface, the reason is that we will need mutexes in > a lot of them and we don't want to export that to userland. 
Let me remind you that copying data out of /dev/kmem into user space from structures like this is inherently MP-unsafe. Without holding the mutex, you cannot guarantee that the structure contents will not change out from under the user space process while it is in the middle of copying them out. Ignoring the obvious things, like divide-by-zero errors, this is mostly a problem for programs trying to do list traversal, as opposed to particular data objects (unless they contain pointers themselves). Right now, the BGL protects us from this. Please do not build a solution which will not work on MP systems, once the BGL is removed. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 19:39:39 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 19:39:36 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 9D07637B402 for ; Thu, 7 Dec 2000 19:39:34 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eB83dWV29351; Thu, 7 Dec 2000 19:39:32 -0800 (PST) Date: Thu, 7 Dec 2000 19:39:32 -0800 From: Alfred Perlstein To: Terry Lambert Cc: Kirk McKusick , arch@FreeBSD.ORG Subject: Re: Getting Kernel Process Information Message-ID: <20001207193932.F16205@fw.wintelcom.net> References: <20001207115616.V16205@fw.wintelcom.net> <200012080335.UAA02960@usr08.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200012080335.UAA02960@usr08.primenet.com>; from tlambert@primenet.com on Fri, Dec 08, 2000 at 03:35:26AM +0000 Sender: bright@fw.wintelcom.net Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Terry Lambert 
[001207 19:35] wrote: > [ ... reorg of proc structure to keep it from being a PITA ... ] > > > I completely agree that should be done. My suggestion is to > > completely rip out any kernel structs being passed through > > this interface, the reason is that we will need mutexes in > > a lot of them and we don't want to export that to userland. > > Let me remind you that copying data out of /dev/kmem into user > space from structures like this is inherently MP-unsafe. > > Without holding the mutex, you cannot guarantee that the > structure contents will not change out from under the user > space process while it is in the middle of copying them out. > > Ignoring the obvious things, like divide-by-zero errors, this > is mostly a problem for programs trying to do list traversal, > as opposed to particular data objects (unless they contain > pointers themselves). > > Right now, the BGL protects us from this. > > Please do not build a solution which will not work on MP > systems, once the BGL is removed. I agree with you; however, Kirk's idea doesn't make this impossible. We can later have a sysctl that (for this case) looks up and locks the proc, then copies it out in eproc (or whatever it's called) format with proper locking. One step at a time. :) -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." 
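The "look up, lock, copy out, unlock" sequence suggested above might be sketched like this. It is a user-space model under stated assumptions, not kernel code: a C11 `atomic_flag` spin lock stands in for a kernel mutex, the process list is a fixed static table (so the lookup itself needs no list lock here, which would not hold in a real kernel), and `kproc`, `xproc` and `export_proc()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical private per-process record with its own lock. */
struct kproc {
    atomic_flag p_lock;         /* stand-in for a kernel mutex */
    int         p_pid;
    int         p_stat;
    long        p_runtime;
};

/* Hypothetical exported snapshot: plain data, no lock, no pointers. */
struct xproc {
    int  xp_pid;
    int  xp_stat;
    long xp_runtime;
};

static struct kproc proc_table[] = {
    { ATOMIC_FLAG_INIT, 1, 2, 100 },
    { ATOMIC_FLAG_INIT, 42, 3, 2500 },
};

static void proc_lock(struct kproc *p)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&p->p_lock,
        memory_order_acquire))
        ;
}

static void proc_unlock(struct kproc *p)
{
    atomic_flag_clear_explicit(&p->p_lock, memory_order_release);
}

/* Copy a consistent snapshot of one process into *out.
 * Returns 0 on success, -1 if no such pid exists. */
int export_proc(int pid, struct xproc *out)
{
    size_t i;

    for (i = 0; i < sizeof(proc_table) / sizeof(proc_table[0]); i++) {
        struct kproc *p = &proc_table[i];

        if (p->p_pid != pid)
            continue;
        proc_lock(p);           /* fields cannot change while held */
        out->xp_pid = p->p_pid;
        out->xp_stat = p->p_stat;
        out->xp_runtime = p->p_runtime;
        proc_unlock(p);
        return 0;
    }
    return -1;
}
```

Because the caller only ever sees the copied `xproc`, the lock never has to be exported to userland.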
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Dec 7 20:57:20 2000 From owner-freebsd-arch@FreeBSD.ORG Thu Dec 7 20:57:19 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247]) by hub.freebsd.org (Postfix) with ESMTP id CCDFD37B401 for ; Thu, 7 Dec 2000 20:57:18 -0800 (PST) Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1]) by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with ESMTP id eB84utC84096; Fri, 8 Dec 2000 13:56:57 +0900 (JST) Date: Fri, 08 Dec 2000 13:56:55 +0900 Message-ID: From: Seigo Tanimura To: oppermann@telehouse.ch Cc: tanimura@r.dl.itc.u-tokyo.ac.jp, arch@freebsd.org Subject: Re: Even 1GB KVA is not enough, but we have no more space In-Reply-To: In your message of "Thu, 07 Dec 2000 18:17:59 +0100" <3A2FC647.6EC4FFA7@telehouse.ch> References: <3A2FC647.6EC4FFA7@telehouse.ch> User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd) Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu") Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, 07 Dec 2000 18:17:59 +0100, Andre Oppermann said: >> tanimura@stella% vmstat -z >> >> ZONE used total pages mem-use >> PIPE 4 102 -1 0/15K >> SWAPMETA 0 0 15078 0/0K >> tcpcb 24 35 4624 12/18K (snip) >> -------------------------------------------------- >> TOTAL 8058/25633K Andre> Wow, that looks good! For easier than the other stuff. Remark: -1 in pages means that you cannot allocate items from this zone. 
-- Seigo Tanimura To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Dec 8 4:13:40 2000 From owner-freebsd-arch@FreeBSD.ORG Fri Dec 8 04:13:37 2000 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mail.interware.hu (mail.interware.hu [195.70.32.130]) by hub.freebsd.org (Postfix) with ESMTP id 20F5537B400 for ; Fri, 8 Dec 2000 04:13:37 -0800 (PST) Received: from kairo-51.budapest.interware.hu ([195.70.50.115] helo=elischer.org) by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian)) id 144MP7-0006vQ-00; Fri, 08 Dec 2000 13:13:33 +0100 Sender: julian@FreeBSD.ORG Message-ID: <3A3000F6.52E9B1D9@elischer.org> Date: Thu, 07 Dec 2000 13:28:22 -0800 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Kirk McKusick Cc: arch@freebsd.org Subject: Re: Getting Kernel Process Information References: <200012071938.LAA03622@beastie.mckusick.com> Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Kirk McKusick wrote: > > For the third time in a week, I got the following message when I > tried to run ps on my 5.X system: > > proc size mismatch (39776 total, 1136 chunks) > > This message arises when the size of the proc structure changes. > With the current SMP development, the proc structure changes at > a very high rate of speed. The current kinfo_proc interface used > between the kernel and user processes is built from two pieces: > > struct kinfo_proc { > struct proc kp_proc; > struct eproc kp_eproc; > } > > Kinfo_proc contains a copy of the kernel's proc structure > followed by an `extended' proc structure which has lots > of bits and pieces that have moved out of the proc structure > or are otherwise needed. 
Any change to the kernel's version > of the proc structure changes the size of the kinfo_proc > structure and hence causes a mismatch when attempts are made > to copy it out. > > I propose to change the kinfo_proc structure. The new > kinfo_proc structure will contain only the stylized `extended' > proc structure which will be augmented with the twenty > fields that are actually referenced from the proc structure > by user processes. By taking this approach, changes to the > proc structure will not affect the format or size of the > kinfo_proc structure returned to user processes. The new > `extended' proc structure will have plenty of spare fields > added to its end so that when new fields are added to the > proc structure that user-level processes need/want to know > about, they can be added without changing the size of the > exported kinfo_proc structure and thus will not require > recompilation of the dozen or so programs that use the > exported interface. Note that even if 200 spare bytes are > added to the kinfo_proc structure, it will still be smaller > than the current one. A good idea. I would like to add that if we get our way and split struct proc into : 1/ struct proc 2/ schedulable entity 3/ Sleepable entity 4/ (possibly a linking structure for the above) then all this would have to change anyhow. It seems possible that your change might insulate us from the pain of that happening. When is the information copied from the proc structure into the kinfo_proc structure? In the case of the threaded split world we are considering some of the numbers would be totals from the subprocesses schedulable entities. > > Note that I am proposing to make this change only in the > 5.X tree. I am not proposing that it be back ported to the > 4.X tree. > > I am not interested in starting a long discussion on all > the possible alternatives for exporting kernel information > to user processes. I recognize that there are better ways > to handle these issues. 
> I am just trying to make an
> incremental change that is small in scope and hopefully
> will make an annoying problem significantly less common.
> With this caveat, comments are solicited.

go for it

>
> Kirk McKusick

--
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in: Budapest
            v

From owner-freebsd-arch Fri Dec 8 6:22: 7 2000
From owner-freebsd-arch@FreeBSD.ORG Fri Dec 8 06:22:04 2000
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from point.osg.gov.bc.ca (point.osg.gov.bc.ca [142.32.102.44]) by hub.freebsd.org (Postfix) with ESMTP id 0D90837B401; Fri, 8 Dec 2000 06:22:04 -0800 (PST)
Received: (from daemon@localhost) by point.osg.gov.bc.ca (8.8.7/8.8.8) id GAA24826; Fri, 8 Dec 2000 06:21:46 -0800
Received: from passer.osg.gov.bc.ca(142.32.110.29) via SMTP by point.osg.gov.bc.ca, id smtpda24823; Fri Dec 8 06:21:39 2000
Received: (from uucp@localhost) by passer.osg.gov.bc.ca (8.11.1/8.9.1) id eB8ELN666899; Fri, 8 Dec 2000 06:21:23 -0800 (PST)
Received: from cwsys9.cwsent.com(10.2.2.1), claiming to be "cwsys.cwsent.com" via SMTP by passer9.cwsent.com, id smtpdI66893; Fri Dec 8 06:20:46 2000
Received: (from uucp@localhost) by cwsys.cwsent.com (8.11.1/8.9.1) id eB8EKfN82161; Fri, 8 Dec 2000 06:20:41 -0800 (PST)
Message-Id: <200012081420.eB8EKfN82161@cwsys.cwsent.com>
Received: from localhost.cwsent.com(127.0.0.1), claiming to be "cwsys" via SMTP by localhost.cwsent.com, id smtpdq82155; Fri Dec 8 06:20:20 2000
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
Reply-To: Cy Schubert - ITSD Open Systems Group
From: Cy Schubert - ITSD Open Systems Group
X-OS: FreeBSD 4.2-RELEASE
X-Sender: cy
To: "Michael C .
Wu"
Cc: Peter Jeremy , Poul-Henning Kamp , arch@FreeBSD.ORG
Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486...
In-reply-to: Your message of "Thu, 30 Nov 2000 21:47:45 CST." <20001130214745.E28757@peorth.iteration.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 08 Dec 2000 06:20:20 -0800
Sender: cy@uumail.gov.bc.ca
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <20001130214745.E28757@peorth.iteration.net>, "Michael C . Wu" writes:
> On Fri, Dec 01, 2000 at 10:29:15AM +1100, Peter Jeremy scribbled:
> | On 2000-Nov-14 15:08:06 +0100, Poul-Henning Kamp wrote:
> | >Has anybody run a 486 or 386 under current recently ?
> |
> | X on a PRE_SMPNG 486 is painful - mouse movements no longer make
> | the X pointer move in real time. I haven't noticed the seeding
> | issue (probably just luck).
>
> PRE_SMPNG does not have the /dev/random seeding issue.
>
> You actually expected X to run well on a 486? :-)
>
> | >What is the consensus ?
> |
> | I think 386/486 remains a significant market and would not like to
> | see support dropped. I'd go so far as to suggest that if -current
> | does drop support for the 386/486, the then-stable version will need
> | to be actively maintained indefinitely to provide continued support.
>
> I do not really think the latest XFree86 versions were designed
> with running 386/486 in mind. 386/486 is still a market, but
> not many people try to build an embedded system with a full X
> and tools.

Interesting. At home I use a 486DX33 as an X terminal. As long as I
run all of my X clients, including the window manager, on my server, a
P120, performance is quite acceptable.

Regards,                       Phone:  (250)387-8437
Cy Schubert                      Fax:  (250)387-5766
Team Leader, Sun/DEC Team   Internet:  Cy.Schubert@osg.gov.bc.ca
Open Systems Group, ITSD, ISTA
Province of BC

From owner-freebsd-arch Fri Dec 8 7: 9:36 2000
From owner-freebsd-arch@FreeBSD.ORG Fri Dec 8 07:09:34 2000
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from peach.ocn.ne.jp (peach.ocn.ne.jp [210.145.254.87]) by hub.freebsd.org (Postfix) with ESMTP id B4A3B37B402 for ; Fri, 8 Dec 2000 07:09:33 -0800 (PST)
Received: from newsguy.com (p60-dn01kiryunisiki.gunma.ocn.ne.jp [211.0.245.61]) by peach.ocn.ne.jp (8.9.1a/OCN/) with ESMTP id AAA26083; Sat, 9 Dec 2000 00:09:26 +0900 (JST)
Message-ID: <3A30E21F.846E3863@newsguy.com>
Date: Fri, 08 Dec 2000 22:29:03 +0900
From: "Daniel C. Sobral"
X-Mailer: Mozilla 4.7 [en] (Win98; I)
X-Accept-Language: en,pt-BR
MIME-Version: 1.0
To: janb@cs.utep.edu
Cc: Kirk McKusick , arch@FreeBSD.ORG
Subject: Re: Getting Kernel Process Information
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

janb@cs.utep.edu wrote:
>
> > to user processes. I recognize that there are better ways
> > to handle these issues. I am just trying to make an
>
> What are some of the better ways of handling the issue?

Userland kobj.

--
Daniel C. Sobral (8-DCS)
dcs@newsguy.com
dcs@freebsd.org
capo@the.great.underground.bsdconpiracy.org

"The bronze landed last, which canceled that method of impartial choice."

From owner-freebsd-arch Fri Dec 8 9:38:52 2000
From owner-freebsd-arch@FreeBSD.ORG Fri Dec 8 09:38:50 2000
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from beastie.mckusick.com (tserver.conference.usenix.org [209.179.127.3]) by hub.freebsd.org (Postfix) with ESMTP id E98BA37B400 for ; Fri, 8 Dec 2000 09:38:49 -0800 (PST)
Received: from beastie.mckusick.com (localhost [127.0.0.1]) by beastie.mckusick.com (8.9.3/8.9.3) with ESMTP id JAA04933; Fri, 8 Dec 2000 09:38:47 -0800 (PST) (envelope-from mckusick@beastie.mckusick.com)
Message-Id: <200012081738.JAA04933@beastie.mckusick.com>
To: Alfred Perlstein
Subject: Re: Getting Kernel Process Information
Cc: Terry Lambert , arch@FreeBSD.ORG
In-Reply-To: Your message of "Thu, 07 Dec 2000 19:39:32 PST." <20001207193932.F16205@fw.wintelcom.net>
Date: Fri, 08 Dec 2000 09:38:47 -0800
From: Kirk McKusick
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Date: Thu, 7 Dec 2000 19:39:32 -0800
From: Alfred Perlstein
To: Terry Lambert
Cc: Kirk McKusick , arch@FreeBSD.ORG
Subject: Re: Getting Kernel Process Information

* Terry Lambert [001207 19:35] wrote:
> [ ... reorg of proc structure to keep it from being a PITA ... ]
>
> > I completely agree that should be done. My suggestion is to
> > completely rip out any kernel structs being passed through
> > this interface, the reason is that we will need mutexes in
> > a lot of them and we don't want to export that to userland.
>
> Let me remind you that copying data out of /dev/kmem into user
> space from structures like this is inherently MP-unsafe.
>
> Without holding the mutex, you can not guarantee that the
> structure contents will not change out from under the user
> space process while it is in the middle of copying them out.
>
> Ignoring the obvious things, like divide-by-zero errors, this
> is mostly a problem for programs trying to do list traversal,
> as opposed to particular data objects (unless they contain
> pointers themselves).
>
> Right now, the BGL protects us from this.
>
> Please do not build a solution which will not work on MP
> systems, once the BGL is removed.

I agree with you, however Kirk's idea doesn't make this impossible,
we can later have a sysctl that (for this case) looks up and locks
the proc then copies it out in eproc (or whatever it's called)
format with proper locking.

One step at a time. :)

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

We already use sysctl to get the proc information out of the
kernel. The traversal of the proc entry to gather up the
information is done in kern/kern_proc.c function fill_kinfo_proc.
So, any and all locking that needs to be done can be done
there. The libkvm code uses sysctl to get the desired proc
entries when running on a live system. It also knows how to
grub through a crash dump to essentially duplicate the
fill_kinfo_proc, but that is not intended to be used on
live kernels.
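
The fixed-size export record discussed in this thread can be sketched roughly as follows. This is only an illustration in portable C, not the kernel code: struct proc_internal, struct kinfo_export, fill_export() and every field name are hypothetical, and the point where a real implementation would take and drop the proc lock is marked only by comments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-kernel structure; its layout is free to change. */
struct proc_internal {
	int	p_pid;
	int	p_ppid;
	char	p_comm[17];
	long	p_lock_placeholder;	/* stands in for a mutex; never exported */
};

/* Hypothetical export record: fixed size, spare room at the end. */
struct kinfo_export {
	int32_t	ki_structsize;	/* lets a consumer detect a size mismatch */
	int32_t	ki_pid;
	int32_t	ki_ppid;
	char	ki_comm[20];
	int32_t	ki_spare[50];	/* 200 spare bytes for future fields */
};

/*
 * Copy only the fields user programs need into the export record.
 * Assumption: a real implementation would hold the proc lock here.
 */
static void
fill_export(const struct proc_internal *p, struct kinfo_export *ki)
{
	assert(p != NULL && ki != NULL);
	memset(ki, 0, sizeof(*ki));
	ki->ki_structsize = (int32_t)sizeof(*ki);
	/* (lock p) */
	ki->ki_pid = p->p_pid;
	ki->ki_ppid = p->p_ppid;
	/* Copy at most 16 bytes; the memset above keeps it NUL-terminated. */
	strncpy(ki->ki_comm, p->p_comm, sizeof(p->p_comm) - 1);
	/* (unlock p) */
}
```

With this shape, adding a member to struct proc_internal leaves sizeof(struct kinfo_export) unchanged, and a newly exported field can be carved out of ki_spare without changing the record size.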

Kirk

From owner-freebsd-arch Fri Dec 8 9:53:31 2000
From owner-freebsd-arch@FreeBSD.ORG Fri Dec 8 09:53:30 2000
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from rover.village.org (rover.village.org [204.144.255.66]) by hub.freebsd.org (Postfix) with ESMTP id E0A7537B400; Fri, 8 Dec 2000 09:53:25 -0800 (PST)
Received: from harmony.village.org (harmony.village.org [10.0.0.6]) by rover.village.org (8.11.0/8.11.0) with ESMTP id eB8HrKs51907; Fri, 8 Dec 2000 10:53:20 -0700 (MST) (envelope-from imp@harmony.village.org)
Received: from harmony.village.org (localhost.village.org [127.0.0.1]) by harmony.village.org (8.9.3/8.8.3) with ESMTP id KAA14232; Fri, 8 Dec 2000 10:53:20 -0700 (MST)
Message-Id: <200012081753.KAA14232@harmony.village.org>
To: Cy Schubert - ITSD Open Systems Group
Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486...
Cc: "Michael C . Wu" , Peter Jeremy , Poul-Henning Kamp , arch@FreeBSD.ORG
In-reply-to: Your message of "Fri, 08 Dec 2000 06:20:20 PST." <200012081420.eB8EKfN82161@cwsys.cwsent.com>
References: <200012081420.eB8EKfN82161@cwsys.cwsent.com>
Date: Fri, 08 Dec 2000 10:53:20 -0700
From: Warner Losh
Sender: imp@harmony.village.org
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <200012081420.eB8EKfN82161@cwsys.cwsent.com> Cy Schubert - ITSD Open Systems Group writes:
: > I do not really think the latest XFree86 versions were designed
: > with running 386/486 in mind. 386/486 is still a market, but
: > not many people try to build an embedded system with a full X
: > and tools.
:
: Interesting. At home I use a 486DX33 as an X terminal. As long as I
: run all of my X clients, including the window manager, on my server, a
: P120, performance is quite acceptable.

We run X on our embedded product on 486 class machines. It still
works fairly well.

Warner

From owner-freebsd-arch Fri Dec 8 11: 3:22 2000
From owner-freebsd-arch@FreeBSD.ORG Fri Dec 8 11:03:21 2000
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222]) by hub.freebsd.org (Postfix) with ESMTP id 18CDC37B401 for ; Fri, 8 Dec 2000 11:03:21 -0800 (PST)
Received: from foo.osd.bsdi.com (root@foo.osd.bsdi.com [204.216.28.137]) by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eB8J2B760003; Fri, 8 Dec 2000 11:02:11 -0800 (PST) (envelope-from jhb@foo.osd.bsdi.com)
Received: (from jhb@localhost) by foo.osd.bsdi.com (8.11.1/8.11.0) id eB8J2DU75205; Fri, 8 Dec 2000 11:02:13 -0800 (PST) (envelope-from jhb)
Message-ID:
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <3A3000F6.52E9B1D9@elischer.org>
Date: Fri, 08 Dec 2000 11:02:13 -0800 (PST)
Organization: BSD, Inc.
From: John Baldwin
To: Julian Elischer
Subject: Re: Getting Kernel Process Information
Cc: arch@FreeBSD.ORG, Kirk McKusick
Sender: jhb@foo.osd.bsdi.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> A good idea.
> I would like to add that if we get our way and split
> struct proc
> into:
> 1/ struct proc
> 2/ schedulable entity
> 3/ Sleepable entity
> 4/ (possibly a linking structure for the above)
> then all this would have to change anyhow. It seems possible
> that your change might insulate us from the pain of that
> happening.

It will do this (insulation).

> When is the information copied from the proc structure into the
> kinfo_proc structure?

fill_eproc(). You can at that time decide what you need to stuff in each
eproc structure.

--
John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"
  - http://www.FreeBSD.org/

From owner-freebsd-arch Fri Dec 8 21:25:19 2000
From owner-freebsd-arch@FreeBSD.ORG Fri Dec 8 21:25:17 2000
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247]) by hub.freebsd.org (Postfix) with ESMTP id 07CAA37B400 for ; Fri, 8 Dec 2000 21:25:17 -0800 (PST)
Received: (from uucp@localhost) by rina.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W-rina.r-0.1-11.01.2000) with UUCP id eB95PES21947; Sat, 9 Dec 2000 14:25:14 +0900 (JST)
Received: from silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1]) by silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4Wpre/3.7W) with ESMTP id eB95O6t42800; Sat, 9 Dec 2000 14:24:07 +0900 (JST)
Date: Sat, 09 Dec 2000 14:24:06 +0900
Message-ID: <86elzi88w9.wl@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
From: Seigo Tanimura
To: dillon@earth.backplane.com
Cc: tanimura@r.dl.itc.u-tokyo.ac.jp, bright@wintelcom.net, arch@FreeBSD.ORG
Subject: Re: Even 1GB KVA is not enough, but we have no more space
In-Reply-To: In your message of "Thu, 7 Dec 2000 11:52:33 -0800 (PST)" <200012071952.eB7JqXm11711@earth.backplane.com>
References: <20001207013611.O16205@fw.wintelcom.net> <20001207015651.P16205@fw.wintelcom.net> <200012071952.eB7JqXm11711@earth.backplane.com>
User-Agent: Wanderlust/1.1.1 (Purple Rain) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 12) (Channel Islands) (i386--freebsd)
Organization: Carrots
MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu")
Content-Type: text/plain; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 7 Dec 2000 11:52:33 -0800 (PST),
  Matt Dillon said:

Matt> In any case, your stopgap patch seems reasonable in concept until we
Matt> can come up with a better solution.

And the saga continues.

After regulating the size of struct swblock, ffs_vget() failed to
allocate a new vnode. At the time the PowerEdge failed, the kernel
held around 197K vnodes, which occupy about 46MB.

This time I reduced the size of kmem_map, the pool of malloc(9).
Although we reserve at most 200MB + mbufs + mbuf clusters for kmem_map,
most of the space is not likely to be in use. For example, the bottom
line of vmstat -m on the PowerEdge said that only 32MB out of 200MB
was used by malloc(9). Counting the actual usage, 100MB should be
enough for the malloc(9) pool.

Since malloc(9) always wires down allocated pages, you should allocate
memory by malloc(9) only if the size of memory to allocate is constant;
otherwise you would always have to consider how many wirable pages a
user has. Hence it makes no sense to simply scale up the malloc(9) pool
size only to leave free entries in kmem_map wasted and unreusable.
Memory for the device driver framework is a good example of malloc(9)
usage because we are not likely to scale up the number of cards on a
motherboard in 5, 10 or 20 years.

Scaling up is something more than just scaling up parameters. If you
see a ceiling, you have to watch your head.
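
The figures above can be checked with a little arithmetic. The sketch below assumes that 197K means 197,000 vnodes and that MB means 2^20 bytes; neither is stated in the message, and the helper names are made up for illustration. Under those assumptions the vnodes average roughly 245 bytes each, and 32MB out of 200MB is 16% of the malloc(9) pool.

```c
/*
 * Average bytes per object, given a total in MB (assumed to be 2^20
 * bytes) and an object count.
 */
static double
bytes_per_object(double total_mb, double count)
{
	return (total_mb * 1024.0 * 1024.0) / count;
}

/* Percentage of a pool that is in use. */
static double
percent_used(double used_mb, double pool_mb)
{
	return 100.0 * used_mb / pool_mb;
}
```

For example, bytes_per_object(46.0, 197000.0) gives the per-vnode footprint and percent_used(32.0, 200.0) gives the pool utilization quoted from vmstat -m.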

-- Seigo Tanimura

From owner-freebsd-arch Sat Dec 9 12: 3: 2 2000
From owner-freebsd-arch@FreeBSD.ORG Sat Dec 9 12:03:00 2000
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178]) by hub.freebsd.org (Postfix) with ESMTP id 7295337B400 for ; Sat, 9 Dec 2000 12:02:59 -0800 (PST)
Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.8/8.8.5) with ESMTP id LAA10626; Sat, 9 Dec 2000 11:54:39 -0800 (PST)
Message-Id: <200012091954.LAA10626@implode.root.com>
To: Seigo Tanimura
Cc: arch@FreeBSD.ORG
Subject: Re: Even 1GB KVA is not enough, but we have no more space
In-reply-to: Your message of "Thu, 07 Dec 2000 18:21:04 +0900."
From: David Greenman
Reply-To: dg@root.com
Date: Sat, 09 Dec 2000 11:54:39 -0800
Sender: dg@implode.root.com
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>extend our KVA any further. (I understand that 1GB is the upper limit
>of KVA on i386, am I right?)

No, there isn't any limit short of giving nearly all 4GB of the virtual
address space to the kernel. freesoftware.com and cdrom.com both run
with 2GB of KVA space.

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.