From owner-freebsd-arch@FreeBSD.ORG Tue Apr 13 03:00:50 2004
To: freebsd-arch@freebsd.org
From: Teemu.Parkkinen@patria.fi
Date: Tue, 13 Apr 2004 12:44:35 +0300
MIME-Version: 1.0
Content-type: text/plain; charset=us-ascii
Subject: Digital-tv card drivers and API discussion
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture

Hi all,

I am about to write a digital-TV driver for my DVB-C card. Because FreeBSD
does not yet have any DVB devices and I don't have any prior driver
development experience, I have a couple of questions for you.

1) Should we use the Linux DVB API as a reference, or should we consider
some changes to it? The API seems to be constantly changing and improving.
Version 3 is available here:
http://www.linuxtv.org/download/dvb/linux-dvb-api-1.0.0.pdf
but they are currently working on version 4. In my opinion, the API should
be minimal, but complete, so there is no need to constantly add new
features to it.

2) As the Linux kernel is GPL-licensed, I cannot just port the Linux driver
to FreeBSD, right?
In other words, we have to write the driver from scratch. In this case we
don't have to stick with the Linux DVB API, and therefore I suggest that we
think the API through before deciding how we implement it (do we follow the
Linux API or not).

3) Do you have any pointers to good books or other documentation on how to
write device drivers for UNIX (BSD)? I have already read those from the
FreeBSD documentation, but a decent book would be handy.

- Teemu Parkkinen

From owner-freebsd-arch@FreeBSD.ORG Tue Apr 13 03:43:43 2004
Date: Tue, 13 Apr 2004 10:41:28 +0000
From: Eivind Eklund
To: Teemu.Parkkinen@patria.fi
Message-ID: <20040413104128.GA2625@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
cc: freebsd-arch@FreeBSD.org
Subject: Re: Digital-tv card drivers and API discussion
X-List-Received-Date: Tue,
 13 Apr 2004 10:43:43 -0000

On Tue, Apr 13, 2004 at 12:44:35PM +0300, Teemu.Parkkinen@patria.fi wrote:
> Hi all,
>
> I am about to write a digital tv-driver for my dvb-c -card.
[...]
> 1) Should we use Linux-DVB API as a reference, or should we consider some
> changes to it? The API seems to be constantly changing and improving.
> Version 3 is available here:
> http://www.linuxtv.org/download/dvb/linux-dvb-api-1.0.0.pdf
> but they are currently working on version 4. In my opinion, the API should
> be minimal, but complete, so there is no need to constantly add new
> features to it.

If the API in your evaluation is reasonably OK, we should stick with it
(or a subset of it). Being directly compatible gives a lot of benefits.

You may also want to look at the Brooktree (bktr) and Meteor (meteor)
APIs already in FreeBSD. Ideally, all of these similar devices would be
available through a single unified API, at least for the parts that are
common.

> 2) As linux kernel is GPL-licensed, I cannot just port the linux driver to
> FreeBSD, right?

That's more or less correct. We CAN port over drivers, but we really
really prefer native (non-GPLed) drivers, as that lets us ship them
compiled in, and allows the kind of re-use of the code that we want to
allow.

> In other words, we have to write the driver from scratch. In this case
> we don't have to stick with the Linux DVB-API and therefore I suggest
> that we think the api through before deciding how we implement
> it (do we follow linux api or not).

What issues do you see with the Linux API that would make us want to
change it?

> 3) Do you have any pointers to good books or other documentation on how to
> write device drivers for UNIX (BSD)? I already have read those from
> FreeBSD documentation, but a decent book would be handy.

The Design and Implementation of 4.4BSD contains some information, but
it is getting somewhat dated - you need to check all information in it.

Eivind.

From owner-freebsd-arch@FreeBSD.ORG Tue Apr 13 04:40:21 2004
To: Eivind Eklund
From: Teemu.Parkkinen@patria.fi
Date: Tue, 13 Apr 2004 14:40:21 +0300
MIME-Version: 1.0
Content-type: text/plain; charset=us-ascii
cc: freebsd-arch@FreeBSD.org
Subject: Re: Digital-tv card drivers and API discussion

> > 1) Should we use Linux-DVB API as a reference, or should we consider some
> > changes to it? The API seems to be constantly changing and improving.
> > Version 3 is available here:
> > http://www.linuxtv.org/download/dvb/linux-dvb-api-1.0.0.pdf
> > but they are currently working on version 4. In my opinion, the API should
> > be minimal, but complete, so there is no need to constantly add new
> > features to it.
> >
> If the API in your evaluation is reasonably OK, we should stick with it
> (or a subset of it). Being directly compatible gives a lot of benefits.

I agree.

> > In other words, we have to write the driver from scratch.
In this case
> > we don't have to stick with the Linux DVB-API and therefore I suggest
> > that we think the api through before deciding how we implement
> > it (do we follow linux api or not).
>
> What issues do you see with the Linux API that would make us want to
> change it?

I would prefer not to change it. I think it's quite OK, yet it still seems
to be evolving a bit. I don't think, however, that this is a big issue,
because DVB has been included in the 2.6 Linux kernels and therefore I
doubt that they want to change it much. I will take a look at the most
current Linux API and implement the FreeBSD driver based on that. If you
have ideas on this that I should take into account, please let me know.

-Teemu

From owner-freebsd-arch@FreeBSD.ORG Mon Apr 12 16:56:17 2004
MIME-Version: 1.0
Date: Mon, 12 Apr 2004 18:56:16 -0500
Message-ID: <7C93F21AD56849408985C3478EE83BA601650BCA@UABEXMB2.ad.uab.edu>
From: "Nicholas M Christian"
Content-Type: text/plain; charset="us-ascii"
Subject: Platform architecture

What platform version do I need to download for Pentium 3 and Mac G3
systems?

Thank you!

From owner-freebsd-arch@FreeBSD.ORG Mon Apr 12 20:41:02 2004
Mime-Version: 1.0
Date: Mon, 12 Apr 2004 23:40:59 -0400
To: freebsd-ports@freebsd.org
From: Garance A Drosihn
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Subject: Second "RFC" on pkg-data idea for ports

[this is BCC'ed to -hackers and -arch just so everyone has a chance
to see it, but I expect the bulk of the discussion should take
place on the freebsd-ports mailing list]

Back in January I sent out a long-ish email asking for feedback on
some ideas I had for the ports-collection. I received a fair number
of comments, and have finally re-organized my ideas into a few web
pages. Hopefully these will make more sense.
Initially I had written the ideas up as a bunch of Wiki pages, but
the machine holding that Wiki died, taking with it all the pages I
had written...

I have not proof-read the web pages, so there are probably some
spelling mistakes and odd sentences on them. Apologies for that,
but I wanted to get *something* sent out this week. This project
has been blocked due to a lack of time on my part, and I want to
get it moving again...

The basic idea is to collapse many of the separate files for a port
into a single pkg-data file. The web pages explain why I think this
might be worth doing. Please check them out at:

http://people.freebsd.org/~gad/PkgData/

Some of the work for this has been done, mainly just to see how well
it might work out. The project is still probably more work than
Darren and I can finish, so we might limit ourselves to a subset of
the idea. For instance, we might start out by just collapsing the
distinfo, pkg-plist, and "files/patches-*" files into a pkg-data
file, and leave the other files for some later project.

What I'd like is some idea of whether this project is worth pursuing.
If not, then Darren and I will concentrate on some other, less
disruptive project. If people like the general idea of this project,
then we'll see how much of it we can do. If we have some of the
details wrong, then let us know what we need to change or where we
need to look for more information.

I know that I am not a full-fledged expert in every facet of the
ports collection, and I am not looking to ram some ideas down
everyone's throat. I just think that some change like this one could
be useful for the ports collection, and I'm trying to come up with
something that everyone sees as useful. If this project does not
seem like it would be worth the effort, then that will be perfectly
okay too.

Please let me know what you think. Also, please read the web pages
before responding.
It took me a fair amount of time to write the web pages
that are there (as lame as they are...), and I'd rather not have to
retype all of that into a long series of disjointed messages...

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu

From owner-freebsd-arch@FreeBSD.ORG Tue Apr 13 08:22:47 2004
Date: Tue, 13 Apr 2004 08:22:33 -0700
From: Brooks Davis
To: Nicholas M Christian
Message-ID: <20040413152233.GA20550@Odin.AC.HMC.Edu>
References: <7C93F21AD56849408985C3478EE83BA601650BCA@UABEXMB2.ad.uab.edu>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="KsGdsel6WgEHnImy"
Content-Disposition: inline
In-Reply-To: <7C93F21AD56849408985C3478EE83BA601650BCA@UABEXMB2.ad.uab.edu>
cc: freebsd-arch@freebsd.org
Subject: Re: Platform architecture

--KsGdsel6WgEHnImy
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Apr 12, 2004 at 06:56:16PM -0500, Nicholas M Christian wrote:
> What platform version do I need to download for a Pentium 3 and a MAC G3
> systems!

For Pentium 3, you want i386. Some day PPC may support your Mac, but
it's too early to run that port unless you plan to work on it.

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4

--KsGdsel6WgEHnImy
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFAfAW4XY6L6fI4GtQRAqH9AKCoYyb7RGZDnS8Czx5Yp7Jfp1j47wCfZk2u
ZSpswWRXIx37lXt2NGB05VA=
=4b/m
-----END PGP SIGNATURE-----

--KsGdsel6WgEHnImy--

From owner-freebsd-arch@FreeBSD.ORG Tue Apr 13 13:50:42 2004
Date: Tue, 13 Apr 2004 23:03:45 +0000
From: Nicolas Souchu
To: Teemu.Parkkinen@patria.fi
Message-ID: <20040413230345.A11245@armor.freesurf.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: ; from Teemu.Parkkinen@patria.fi on Tue, Apr 13, 2004 at 12:44:35PM +0300
cc: freebsd-arch@freebsd.org
Subject: Re:
 Digital-tv card drivers and API discussion

On Tue, Apr 13, 2004 at 12:44:35PM +0300, Teemu.Parkkinen@patria.fi wrote:
> Hi all,
>
> I am about to write a digital tv-driver for my dvb-c -card. Because FreeBSD
> does not yet have any dvb-devices and I don't have any prior driver
> development experience, I have a couple of questions for you.

Your project is interesting, and I'd be happy to know of a web page to
read regularly where you'd post your progress.

Nicholas

-- 
Nicholas Souchu - nsouch@free.fr - nsouch@FreeBSD.org
http://www.freebsd.org/~nsouch/kgi4BSD

From owner-freebsd-arch@FreeBSD.ORG Wed Apr 14 01:30:19 2004
Message-ID: <407CF5B8.2060909@sitetronics.com>
Date: Wed, 14 Apr 2004 10:26:32 +0200
From: "Devon H.
 O'Dell"
MIME-Version: 1.0
To: freebsd-arch@freebsd.org
Content-Type: multipart/mixed;
	boundary="------------060800030601020705080005"
Subject: [patch] lockf(3) user-exploitable kernel panic

This is a multi-part message in MIME format.
--------------060800030601020705080005
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Hello arch@,

cperciva@ noted to me a few days ago that he'd received an email from a
person who had found a user-exploitable panic in FreeBSD involving
POSIX-type advisory-mode locks. The theory of the exploit is that one
creates a significantly large file (more than 3 times the amount of
kernel memory) and locks non-contiguous chunks of the file. I've seen and
tested a proof-of-concept for this exploit; it certainly panics the
system with a non-privileged account.

Per cperciva@'s suggestion, I'm asking if arch@ would like to give me
some input / feedback on this patch.

The enclosed patches (available on
http://freebsd0.sitetronics.com/~dodell/patches/lockfix.tar.gz as well)
fix this problem by creating a new rlimit that limits the maximum number
of POSIX-type advisory-mode locks a user can hold at one time. There are
a couple of immediate problems I see with the patch myself:

1) I have to pass a struct proc * to change_ruid. If a user changes
his/her uid, the number of advisory-mode locks needs to be transferred to
the new uid, and the only way I could see to do that was to count the
number of advisory-mode locks held by a process (I didn't need to track
this across a fork() since POSIX locks are not inherited between
processes).
This means I have to move the definition out of /sys/sys/ucred.h
and into /sys/kern/kern_prot.c. It also means that OSF/1 compatibility
becomes broken on Alpha, since the setuid() in osf1_misc.c calls the
change_[r|e|sv]uid functions that have been implemented in
/sys/kern/kern_prot.c. Solutions to this include:

	a) creating a SUSv3-compatible setuid for use with the OSF/1, SVR4
	   and Linux compat ABIs (since using the BSD setuid for these
	   also isn't totally correct).
	b) requiring sys/proc.h wherever sys/ucred.h is included (which is
	   very ugly)
	c) moving the check out of change_ruid() (I don't think this is
	   correct since the chgproccnt() function is called there as
	   well)
	d) rewriting the OSF/1 compatibility code to use its own
	   change_ruid_osf1() function (bloated)

2) Terms I've used for variables are a little bit ambiguous. struct flock
(BSD-style) locks, which will lock a whole file, are also
``advisory-mode'' locks. This being the first time I've worked in depth
with the kernel and SUSv3 specifications, I didn't know that there was a
difference. My sysctl variable is also rather ugly. What names would you
guys suggest I use instead of ``maxadvlocks''? I was thinking something
along the lines of running a s/advlock/posixlock/ on the code. This is a
simple fix, but an important one nonetheless.

3) Does this work justify my going through the modified files and doing
style(9) changes on them? I'm willing to do this; mux@ has encouraged it;
style(9) suggests that I do it if my code comprises 50% or more of the
new files (which it doesn't). Again, if this is useful, I'll certainly do
this.

4) I had to change lf_split() to return a value, since a lock is
potentially allocated inside it, and I couldn't previously return from
that function if the split created a lock overflowing the user's limit.
Is this a problem?

5) With regards to SMPng-fu, mine is probably sub-par compared to yours;
are my locks / assertions correct? Am I missing a uidinfo lock in
chgadvlockcnt()?

6) I'm not sure that 8192 locks is enough for normal user operations, but
I haven't been able to test this on a desktop or server. Someone who has
been able to reported that 8192 was not enough to run Enlightenment (or
anything that talked to gconfd), but I think this was before I realized
that F_POSIX locks were different.

7) As previously mentioned, a modification in /sys/sys/proc.h adds an
int p_numadvlocks; to struct proc. Is this acceptable? I've not done
anything special to lock it and I've marked it with the status (*). Is
this something I need to fix?

8) Are any of the modifications I've made too intrusive to the
[proc|advlock|rlimit|sysctl] subsystem(s)?

9) What (extra) suggestions would you have for my patches for relevant
manpages?

10) Have I missed any userland utilities that don't use libutil to
check/set classes/limits (perhaps there are some in ports that I can
patch as well)?

This patch is against April 13th -CURRENT, but backporting it is very
simple since the main affected subsystem doesn't change much
architecturally / structurally. However, this also brings to light that
this problem may also affect the other BSDs (Dragonfly, Net, Open, Ekko).
I cannot verify this as I do not have much experience with these other
BSDs and do not know if they impose any limits on the amount of kernel
memory a user can have, or any other limits which would prevent this
exploit from ``working''. Should they be affected, what do I need to do
to alert them of this?

Sorry for the somewhat ``needy'' email; again, this is the first time
I've developed a kernel-level patch and I'm not extremely familiar with
the architectural requirements for developing one.

Thanks for your time!

Kind regards,

Devon H.
O'Dell

--------------060800030601020705080005
Content-Type: text/plain; name="lockfix-etc.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="lockfix-etc.patch"

diff -ur etc/login.conf etc_lockfix/login.conf
--- etc/login.conf	Tue Jun 25 21:04:37 2002
+++ etc_lockfix/login.conf	Tue Apr 13 12:48:10 2004
@@ -33,6 +33,7 @@
 	:coredumpsize=unlimited:\
 	:openfiles=unlimited:\
 	:maxproc=unlimited:\
+	:advlocks=unlimited:\
 	:sbsize=unlimited:\
 	:vmemoryuse=unlimited:\
 	:priority=0:\

--------------060800030601020705080005
Content-Type: text/plain; name="lockfix-bin.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="lockfix-bin.patch"

diff -ur bin/sh/miscbltin.c bin_lockfix/sh/miscbltin.c
--- bin/sh/miscbltin.c	Wed Apr 14 00:13:05 2004
+++ bin_lockfix/sh/miscbltin.c	Wed Apr 14 00:12:27 2004
@@ -342,6 +342,9 @@
 #ifdef RLIMIT_SBSIZE
 	{ "sbsize", "bytes", RLIMIT_SBSIZE, 1, 'b' },
 #endif
+#ifdef RLIMIT_ADVLOCK
+	{ "advlocks", (char *)0, RLIMIT_ADVLOCK, 1, 'k' },
+#endif
 	{ (char *) 0, (char *)0, 0, 0, '\0' }
 };
 
@@ -358,7 +361,7 @@
 	struct rlimit limit;
 
 	what = 'f';
-	while ((optc = nextopt("HSatfdsmcnuvlb")) != '\0')
+	while ((optc = nextopt("HSatfdsmcnuvlbk")) != '\0')
 		switch (optc) {
 		case 'H':
 			how = HARD;

--------------060800030601020705080005
Content-Type: text/plain; name="lockfix-lib.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="lockfix-lib.patch"

diff -ur lib/libc/sys/getrlimit.2 lib_lockfix/libc/sys/getrlimit.2
--- lib/libc/sys/getrlimit.2	Tue Apr 13 23:53:52 2004
+++ lib_lockfix/libc/sys/getrlimit.2	Tue Apr 13 23:58:24 2004
@@ -98,6 +98,9 @@
 The maximum size (in bytes) of socket buffer usage for this user.
 This limits the amount of network memory, and hence the amount of
 mbufs, that this user may hold at any time.
+.It Li RLIMIT_ADVLOCK
+The maximum number of POSIX-type (lockf(3) style) advisory-mode
+locks available to this user.
 .El
 .Pp
 A resource limit is specified as a soft limit and a hard limit. When a
diff -ur lib/libutil/login.conf.5 lib_lockfix/libutil/login.conf.5
--- lib/libutil/login.conf.5	Wed Apr 14 00:03:21 2004
+++ lib_lockfix/libutil/login.conf.5	Wed Apr 14 00:00:28 2004
@@ -167,6 +167,7 @@
 .It "sbsize	size	Maximum permitted socketbuffer size.
 .It "vmemoryuse	size	Maximum permitted total VM usage per process.
 .It "stacksize	size	Maximum stack size limit.
+.It "advlocks	number	Maximum number of POSIX-type advisory-mode locks.
 .El
 .Pp
 These resource limit entries actually specify both the maximum
diff -ur lib/libutil/login_class.c lib_lockfix/libutil/login_class.c
--- lib/libutil/login_class.c	Tue Apr 13 12:49:17 2004
+++ lib_lockfix/libutil/login_class.c	Tue Apr 13 12:43:38 2004
@@ -59,6 +59,7 @@
     { "coredumpsize", login_getcapsize, RLIMIT_CORE },
     { "sbsize", login_getcapsize, RLIMIT_SBSIZE },
     { "vmemoryuse", login_getcapsize, RLIMIT_VMEM },
+    { "advlocks", login_getcapnum, RLIMIT_ADVLOCK },
     { NULL, 0, 0 }
 };
 

--------------060800030601020705080005
Content-Type: text/plain; name="lockfix-sys.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="lockfix-sys.patch"

diff -ur sys/kern/kern_lockf.c sys_lockfix/kern/kern_lockf.c
--- sys/kern/kern_lockf.c	Tue Apr 13 12:43:16 2004
+++ sys_lockfix/kern/kern_lockf.c	Tue Apr 13 23:34:55 2004
@@ -50,6 +50,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * This variable controls the maximum number of processes that will
@@ -80,7 +81,7 @@
 	 lf_getblock(struct lockf *);
 static int	 lf_getlock(struct lockf *, struct flock *);
 static int	 lf_setlock(struct lockf *);
-static void	 lf_split(struct lockf *, struct lockf *);
+static int	 lf_split(struct lockf *, struct lockf *);
 static void	 lf_wakelock(struct lockf *);
 
 /*
@@ -100,6 +101,7 @@
 {
 	register struct flock *fl = ap->a_fl;
 	register struct lockf *lock;
+	struct proc *pp = (struct proc *)0;
 	off_t start, end, oadd;
 	int error;
 
@@ -156,6 +158,19 @@
 	/*
 	 * Create the lockf structure
 	 */
+	if (ap->a_flags & F_POSIX) {
+		pp = (struct proc *)ap->a_id;
+		if (ap->a_op == F_SETLK) {
+			if (!chgadvlockcnt(pp, 1, lim_max(pp, RLIMIT_ADVLOCK)))
+				return (ENOLCK);
+		} else {
+			/*
+			 * We are allowed this lock because we will free it
+			 * no matter the outcome of the operation.
+			 */
+			chgadvlockcnt(pp, 1, 0);
+		}
+	}
 	MALLOC(lock, struct lockf *, sizeof *lock, M_LOCKF, M_WAITOK);
 	lock->lf_start = start;
 	lock->lf_end = end;
@@ -181,15 +196,21 @@
 
 	case F_UNLCK:
 		error = lf_clearlock(lock);
+		if (lock->lf_flags & F_POSIX)
+			chgadvlockcnt(pp, -1, 0);
 		FREE(lock, M_LOCKF);
 		return (error);
 
 	case F_GETLK:
 		error = lf_getlock(lock, fl);
+		if (lock->lf_flags & F_POSIX)
+			chgadvlockcnt(pp, -1, 0);
 		FREE(lock, M_LOCKF);
 		return (error);
 
 	default:
+		if (lock->lf_flags & F_POSIX)
+			chgadvlockcnt(pp, -1, 0);
 		free(lock, M_LOCKF);
 		return (EINVAL);
 	}
@@ -204,11 +225,14 @@
 	register struct lockf *lock;
 {
 	register struct lockf *block;
+	struct proc *pp = (struct proc *)0;
 	struct lockf **head = lock->lf_head;
 	struct lockf **prev, *overlap, *ltmp;
 	static char lockstr[] = "lockf";
 	int ovcase, priority, needtolink, error;
-
+
+	if (lock->lf_flags & F_POSIX)
+		pp = (struct proc *)lock->lf_id;
 #ifdef LOCKF_DEBUG
 	if (lockf_debug & 1)
 		lf_print("lf_setlock", lock);
@@ -229,6 +253,8 @@
 		 * Free the structure and return if nonblocking.
 		 */
 		if ((lock->lf_flags & F_WAIT) == 0) {
+			if (lock->lf_flags & F_POSIX)
+				chgadvlockcnt(pp, -1, 0);
 			FREE(lock, M_LOCKF);
 			return (EAGAIN);
 		}
@@ -265,6 +291,10 @@
 					wproc = (struct proc *)waitblock->lf_id;
 					if (wproc == (struct proc *)lock->lf_id) {
 						mtx_unlock_spin(&sched_lock);
+						if (lock->lf_flags &
+						    F_POSIX)
+							chgadvlockcnt(pp,
+							    -1, 0);
 						free(lock, M_LOCKF);
 						return (EDEADLK);
 					}
@@ -309,6 +339,8 @@
 			lock->lf_next = NOLOCKF;
 		}
 		if (error) {
+			if (lock->lf_flags & F_POSIX)
+				chgadvlockcnt(pp, -1, 0);
 			free(lock, M_LOCKF);
 			return (error);
 		}
@@ -324,6 +356,7 @@
 	prev = head;
 	block = *head;
 	needtolink = 1;
+
 	for (;;) {
 		ovcase = lf_findoverlap(block, lock, SELF, &prev, &overlap);
 		if (ovcase)
@@ -350,10 +383,13 @@
 			 * If downgrading lock, others may be
 			 * able to acquire it.
 			 */
+			/* No new locks are created; no need to check rlim */
 			if (lock->lf_type == F_RDLCK &&
 			    overlap->lf_type == F_WRLCK)
 				lf_wakelock(overlap);
 			overlap->lf_type = lock->lf_type;
+			if (lock->lf_flags & F_POSIX)
+				chgadvlockcnt(pp, -1, 0);
 			FREE(lock, M_LOCKF);
 			lock = overlap; /* for debug output below */
 			break;
@@ -363,6 +399,8 @@
 			 * Check for common starting point and different types.
 			 */
 			if (overlap->lf_type == lock->lf_type) {
+				if (lock->lf_flags & F_POSIX)
+					chgadvlockcnt(pp, -1, 0);
 				free(lock, M_LOCKF);
 				lock = overlap; /* for debug output below */
 				break;
@@ -371,8 +409,9 @@
 				*prev = lock;
 				lock->lf_next = overlap;
 				overlap->lf_start = lock->lf_end + 1;
-			} else
-				lf_split(overlap, lock);
+			} else
+				if (lf_split(overlap, lock) == ENOLCK)
+					return (ENOLCK);
 			lf_wakelock(overlap);
 			break;
 
@@ -381,6 +420,7 @@
 			 * If downgrading lock, others may be able to
 			 * acquire it, otherwise take the list.
 			 */
+			/* No new locks */
 			if (lock->lf_type == F_RDLCK &&
 			    overlap->lf_type == F_WRLCK) {
 				lf_wakelock(overlap);
@@ -404,6 +444,8 @@
 				needtolink = 0;
 			} else
 				*prev = overlap->lf_next;
+			if (overlap->lf_flags & F_POSIX)
+				chgadvlockcnt(pp, -1, 0);
 			free(overlap, M_LOCKF);
 			continue;
 
@@ -453,9 +495,10 @@
 	register struct lockf *unlock;
 {
 	struct lockf **head = unlock->lf_head;
+	struct proc *pp = (struct proc *)0;
 	register struct lockf *lf = *head;
 	struct lockf *overlap, **prev;
-	int ovcase;
+	int ovcase;
 
 	if (lf == NOLOCKF)
 		return (0);
@@ -466,6 +509,8 @@
 		lf_print("lf_clearlock", unlock);
 #endif /* LOCKF_DEBUG */
 	prev = head;
+	if (unlock->lf_flags & F_POSIX)
+		pp = (struct proc *)unlock->lf_id;
 	while ((ovcase = lf_findoverlap(lf, unlock, SELF, &prev, &overlap))) {
 		/*
 		 * Wakeup the list of locks to be retried.
@@ -476,6 +521,8 @@
 
 		case 1: /* overlap == lock */
 			*prev = overlap->lf_next;
+			if (overlap->lf_flags & F_POSIX)
+				chgadvlockcnt(pp, -1, 0);
 			FREE(overlap, M_LOCKF);
 			break;
 
@@ -484,13 +531,16 @@
 				overlap->lf_start = unlock->lf_end + 1;
 				break;
 			}
-			lf_split(overlap, unlock);
+			if (lf_split(overlap, unlock) == ENOLCK)
+				return (ENOLCK);
 			overlap->lf_next = unlock->lf_next;
 			break;
 
 		case 3: /* lock contains overlap */
 			*prev = overlap->lf_next;
 			lf = overlap->lf_next;
+			if (overlap->lf_flags & F_POSIX)
+				chgadvlockcnt(pp, -1, 0);
 			free(overlap, M_LOCKF);
 			continue;
 
@@ -691,13 +741,16 @@
  * Split a lock and a contained region into
  * two or three locks as necessary.
*/ -static void +static int lf_split(lock1, lock2) register struct lockf *lock1; register struct lockf *lock2; { register struct lockf *splitlock; + struct proc *pp = (struct proc *)0; + if (lock1->lf_flags & F_POSIX) + pp = (struct proc *)lock1->lf_id; #ifdef LOCKF_DEBUG if (lockf_debug & 2) { lf_print("lf_split", lock1); @@ -710,14 +763,19 @@ if (lock1->lf_start == lock2->lf_start) { lock1->lf_start = lock2->lf_end + 1; lock2->lf_next = lock1; - return; + return (1); } if (lock1->lf_end == lock2->lf_end) { lock1->lf_end = lock2->lf_start - 1; lock2->lf_next = lock1->lf_next; lock1->lf_next = lock2; - return; + return (1); } + + if (lock1->lf_flags & F_POSIX) + if (!chgadvlockcnt(pp, 1, lim_max(pp, RLIMIT_ADVLOCK))) + return (ENOLCK); + /* * Make a new lock consisting of the last part of * the encompassing lock @@ -733,6 +791,7 @@ splitlock->lf_next = lock1->lf_next; lock2->lf_next = splitlock; lock1->lf_next = lock2; + return (1); } /* diff -ur sys/kern/kern_mib.c sys_lockfix/kern/kern_mib.c --- sys/kern/kern_mib.c Tue Apr 13 12:43:16 2004 +++ sys_lockfix/kern/kern_mib.c Tue Apr 13 12:54:43 2004 @@ -45,6 +45,7 @@ #include #include #include +#include #include #include #include @@ -111,6 +112,9 @@ SYSCTL_INT(_kern, KERN_MAXPROCPERUID, maxprocperuid, CTLFLAG_RW, &maxprocperuid, 0, "Maximum processes allowed per userid"); + +SYSCTL_INT(_kern, KERN_MAXADVLOCKSPERUID, maxadvlocksperuid, CTLFLAG_RW, + &maxadvlocksperuid, 0, "Maximum number of advisory-mode locks per userid"); SYSCTL_INT(_kern, OID_AUTO, maxusers, CTLFLAG_RDTUN, &maxusers, 0, "Hint for kernel tuning"); diff -ur sys/kern/kern_prot.c sys_lockfix/kern/kern_prot.c --- sys/kern/kern_prot.c Tue Apr 13 12:43:16 2004 +++ sys_lockfix/kern/kern_prot.c Tue Apr 13 23:50:43 2004 @@ -63,6 +63,9 @@ #include #include +void change_ruid(struct ucred *newcred, struct uidinfo *ruip, + struct proc *pp); + static MALLOC_DEFINE(M_CRED, "cred", "credentials"); SYSCTL_DECL(_security); @@ -550,7 +553,7 @@ * Set the real uid and 
transfer proc count to new user. */ if (uid != oldcred->cr_ruid) { - change_ruid(newcred, uip); + change_ruid(newcred, uip, p); setsugid(p); } /* @@ -865,7 +868,7 @@ setsugid(p); } if (ruid != (uid_t)-1 && oldcred->cr_ruid != ruid) { - change_ruid(newcred, ruip); + change_ruid(newcred, ruip, p); setsugid(p); } if ((ruid != (uid_t)-1 || newcred->cr_uid != newcred->cr_ruid) && @@ -990,7 +993,7 @@ setsugid(p); } if (ruid != (uid_t)-1 && oldcred->cr_ruid != ruid) { - change_ruid(newcred, ruip); + change_ruid(newcred, ruip, p); setsugid(p); } if (suid != (uid_t)-1 && oldcred->cr_svuid != suid) { @@ -1979,15 +1982,22 @@ * duration of the call. */ void -change_ruid(struct ucred *newcred, struct uidinfo *ruip) +change_ruid(struct ucred *newcred, struct uidinfo *ruip, struct proc *pp) { + /* + * We don't want the number of advisory-mode locks to change + * while we are performing this operation. + */ + PROC_LOCK_ASSERT(pp, MA_OWNED); (void)chgproccnt(newcred->cr_ruidinfo, -1, 0); + (void)chgadvlockcnt(pp, -(pp->p_numadvlocks), 0); newcred->cr_ruid = ruip->ui_uid; uihold(ruip); uifree(newcred->cr_ruidinfo); newcred->cr_ruidinfo = ruip; (void)chgproccnt(newcred->cr_ruidinfo, 1, 0); + (void)chgadvlockcnt(pp, pp->p_numadvlocks, 0); } /*- diff -ur sys/kern/kern_resource.c sys_lockfix/kern/kern_resource.c --- sys/kern/kern_resource.c Tue Apr 13 12:43:16 2004 +++ sys_lockfix/kern/kern_resource.c Tue Apr 13 12:54:43 2004 @@ -43,6 +43,7 @@ #include #include #include +#include #include #include #include @@ -605,6 +606,12 @@ if (limp->rlim_max < 1) limp->rlim_max = 1; break; + case RLIMIT_ADVLOCK: + if (limp->rlim_cur > maxadvlocksperuid) + limp->rlim_cur = maxadvlocksperuid; + if (limp->rlim_max > maxadvlocksperuid) + limp->rlim_max = maxadvlocksperuid; + break; } *alimp = *limp; p->p_limit = newlim; @@ -1080,6 +1087,70 @@ if (uip->ui_proccnt < 0) printf("negative proccnt for uid = %d\n", uip->ui_uid); UIDINFO_UNLOCK(uip); + return (1); +} + +/* + * Change the count of the number of 
advisory-mode locks in + * use by a user at any given time. + */ +int +chgadvlockcnt(pp, diff, max) + register struct proc *pp; + int diff; + int max; +{ + struct uidinfo *uip; + PROC_LOCK(pp); + uip = pp->p_ucred->cr_uidinfo; + + /* Root is not affected by the lock limit, however, + * it is entirely possible that root will setuid() + * to another user, for whom the locks will need to be + * transferred. This brings up an interesting situation: + * root may hold more locks than the user may have. In + * this situation, we simply fail to upgrade the lock + * count at the setuid() call. However, to be able to + * do this, we still need to keep track of the amount of + * locks a process holds, even if root is the owner of the + * process. + */ + if (pp->p_ucred->cr_uid == 0) { + uip->ui_advlocks += diff; + pp->p_numadvlocks += diff; + printf("Root has now acquired %d locks.\n", uip->ui_advlocks); + printf("Process %d has %d locks.\n", pp->p_pid, pp->p_numadvlocks); + PROC_UNLOCK(pp); + return (1); + } + + /* + * Zero represents no limit on the number of locks, + * as opposed to no locks. 
+ */ + if (max == 0) { + pp->p_numadvlocks += diff; + uip->ui_advlocks += diff; + printf("User %d has now acquired %d locks but that doesn't matter because max is 0.\n", pp->p_ucred->cr_uid, uip->ui_advlocks); + printf("Process %d has %d locks.\n", pp->p_pid, pp->p_numadvlocks); + PROC_UNLOCK(pp); + return (1); + } + + /* Don't allow them to exceed max */ + if (diff > 0 && uip->ui_advlocks + diff > max) { + printf("User %d has now acquired %d locks but they have exceeded max: %d.\n", pp->p_ucred->cr_uid, uip->ui_advlocks, max); + PROC_UNLOCK(pp); + return (0); + } + + printf("Hell, let's give user %d and process %d %d locks!\n", pp->p_ucred->cr_uid, pp->p_pid, diff); + uip->ui_advlocks += diff; + pp->p_numadvlocks += diff; + KASSERT(uip->ui_advlocks >= 0, ("negative number of advisory-mode locks, user-count")); + KASSERT(pp->p_numadvlocks >= 0, ("negative number of advisory-mode locks, process-count")); + + PROC_UNLOCK(pp); return (1); } diff -ur sys/kern/subr_param.c sys_lockfix/kern/subr_param.c --- sys/kern/subr_param.c Tue Apr 13 12:43:16 2004 +++ sys_lockfix/kern/subr_param.c Tue Apr 13 12:54:43 2004 @@ -64,6 +64,9 @@ #ifndef MAXFILES #define MAXFILES (maxproc * 2) #endif +#ifndef MAXADVLOCKSPERUID +#define MAXADVLOCKSPERUID 8192 +#endif int hz; int tick; @@ -72,6 +75,7 @@ int maxprocperuid; /* max # of procs per user */ int maxfiles; /* sys.
wide open files limit */ int maxfilesperproc; /* per-proc open files limit */ +int maxadvlocksperuid; /* max # of advisory-mode locks per uid */ int ncallout; /* maximum # of timer events */ int nbuf; int nswbuf; @@ -111,6 +115,9 @@ maxbcache = VM_BCACHE_SIZE_MAX; #endif TUNABLE_INT_FETCH("kern.maxbcache", &maxbcache); + + maxadvlocksperuid = MAXADVLOCKSPERUID; + TUNABLE_INT_FETCH("kern.maxadvlocksperuid", &maxadvlocksperuid); maxtsiz = MAXTSIZ; TUNABLE_QUAD_FETCH("kern.maxtsiz", &maxtsiz); diff -ur sys/sys/fcntl.h sys_lockfix/sys/fcntl.h --- sys/sys/fcntl.h Tue Apr 13 12:43:16 2004 +++ sys_lockfix/sys/fcntl.h Tue Apr 13 12:54:43 2004 @@ -226,4 +226,8 @@ __END_DECLS #endif +#ifdef _KERNEL +extern int maxadvlocksperuid; +#endif + #endif /* !_SYS_FCNTL_H_ */ diff -ur sys/sys/proc.h sys_lockfix/sys/proc.h --- sys/sys/proc.h Tue Apr 13 12:43:16 2004 +++ sys_lockfix/sys/proc.h Tue Apr 13 12:54:43 2004 @@ -608,6 +608,7 @@ void *p_emuldata; /* (c) Emulator state data. */ struct label *p_label; /* (*) Proc (not subject) MAC label. */ struct p_sched *p_sched; /* (*) Scheduler-specific data. 
*/ + int p_numadvlocks; /* (*) Number of advisory-mode locks */ }; #define p_session p_pgrp->pg_session diff -ur sys/sys/resource.h sys_lockfix/sys/resource.h --- sys/sys/resource.h Tue Apr 13 12:43:16 2004 +++ sys_lockfix/sys/resource.h Tue Apr 13 12:54:43 2004 @@ -85,8 +85,9 @@ #define RLIMIT_NOFILE 8 /* number of open files */ #define RLIMIT_SBSIZE 9 /* maximum size of all socket buffers */ #define RLIMIT_VMEM 10 /* virtual process size (inclusive of mmap) */ +#define RLIMIT_ADVLOCK 11 /* maximum number of advisory-mode locks per user */ -#define RLIM_NLIMITS 11 /* number of resource limits */ +#define RLIM_NLIMITS 12 /* number of resource limits */ #define RLIM_INFINITY ((rlim_t)(((u_quad_t)1 << 63) - 1)) @@ -108,6 +109,7 @@ "nofile", "sbsize", "vmem", + "advlock", }; #endif diff -ur sys/sys/resourcevar.h sys_lockfix/sys/resourcevar.h --- sys/sys/resourcevar.h Tue Apr 13 12:43:16 2004 +++ sys_lockfix/sys/resourcevar.h Tue Apr 13 12:54:43 2004 @@ -90,6 +90,7 @@ long ui_proccnt; /* number of processes */ uid_t ui_uid; /* uid */ u_int ui_ref; /* reference count */ + int ui_advlocks; /* number of advisory-mode locks */ struct mtx *ui_mtxp; /* protect all counts/limits */ }; @@ -104,6 +105,7 @@ void calcru(struct proc *p, struct timeval *up, struct timeval *sp, struct timeval *ip); int chgproccnt(struct uidinfo *uip, int diff, int max); +int chgadvlockcnt(register struct proc *pp, int diff, int max); int chgsbsize(struct uidinfo *uip, u_int *hiwat, u_int to, rlim_t max); int fuswintr(void *base); diff -ur sys/sys/sysctl.h sys_lockfix/sys/sysctl.h --- sys/sys/sysctl.h Tue Apr 13 12:43:16 2004 +++ sys_lockfix/sys/sysctl.h Tue Apr 13 12:54:43 2004 @@ -359,6 +359,7 @@ #define KERN_LOGSIGEXIT 34 /* int: do we log sigexit procs? 
*/ #define KERN_IOV_MAX 35 /* int: value of UIO_MAXIOV */ #define KERN_MAXID 36 /* number of valid kern ids */ +#define KERN_MAXADVLOCKSPERUID 37 /* int: number of max advisory-mode locks */ #define CTL_KERN_NAMES { \ { 0, 0 }, \ @@ -390,6 +391,7 @@ { "bootfile", CTLTYPE_STRING }, \ { "maxfilesperproc", CTLTYPE_INT }, \ { "maxprocperuid", CTLTYPE_INT }, \ + { "maxadvlocksperuid", CTLTYPE_INT }, \ { "ipc", CTLTYPE_NODE }, \ { "dummy", CTLTYPE_INT }, \ { "ps_strings", CTLTYPE_INT }, \ diff -ur sys/sys/ucred.h sys_lockfix/sys/ucred.h --- sys/sys/ucred.h Tue Apr 13 12:43:16 2004 +++ sys_lockfix/sys/ucred.h Tue Apr 13 12:54:43 2004 @@ -82,7 +82,10 @@ void change_egid(struct ucred *newcred, gid_t egid); void change_euid(struct ucred *newcred, struct uidinfo *euip); void change_rgid(struct ucred *newcred, gid_t rgid); -void change_ruid(struct ucred *newcred, struct uidinfo *ruip); +/* + * Removed change_ruid; placed definition in kern/kern_prot.c due to + * struct proc dependency. + */ void change_svgid(struct ucred *newcred, gid_t svgid); void change_svuid(struct ucred *newcred, uid_t svuid); void crcopy(struct ucred *dest, struct ucred *src); --------------060800030601020705080005 Content-Type: text/plain; name="lockfix-usr.bin.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="lockfix-usr.bin.patch" diff -ur usr.bin/limits/limits.1 usr.bin_lockfix/limits/limits.1 --- usr.bin/limits/limits.1 Thu Dec 12 09:26:00 2002 +++ usr.bin_lockfix/limits/limits.1 Wed Apr 14 00:06:10 2004 @@ -357,6 +357,7 @@ When run in command mode and execution of the command succeeds, the exit status will be whatever the executed program returns. 
.Sh SEE ALSO +.Xr builtin 1 , .Xr csh 1 , .Xr env 1 , .Xr limit 1 , --------------060800030601020705080005-- From owner-freebsd-arch@FreeBSD.ORG Wed Apr 14 04:28:02 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 82D5116A4D1 for ; Wed, 14 Apr 2004 04:28:02 -0700 (PDT) Received: from mailhost.stack.nl (vaak.stack.nl [131.155.140.140]) by mx1.FreeBSD.org (Postfix) with ESMTP id A1D8943D5C for ; Wed, 14 Apr 2004 04:28:01 -0700 (PDT) (envelope-from jilles@stack.nl) Received: from turtle.stack.nl (turtle.stack.nl [2001:610:1108:5010::132]) by mailhost.stack.nl (Postfix) with ESMTP id 407D2040#A425D1F00E; Wed, 14 Apr 2004 13:28:00 +0200 (CEST) Received: by turtle.stack.nl (Postfix, from userid 1677) id 8AD051CCC6; Wed, 14 Apr 2004 13:28:00 +0200 (CEST) Date: Wed, 14 Apr 2004 13:28:00 +0200 From: Jilles Tjoelker To: "Devon H. O'Dell" Message-ID: <20040414112800.GA69649@stack.nl> References: <407CF5B8.2060909@sitetronics.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <407CF5B8.2060909@sitetronics.com> X-Operating-System: FreeBSD 5.2.1-RELEASE-p4 i386 User-Agent: Mutt/1.5.6i cc: freebsd-arch@freebsd.org Subject: Re: [patch] lockf(3) user-exploitable kernel panic X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Apr 2004 11:28:02 -0000 On Wed, Apr 14, 2004 at 10:26:32AM +0200, Devon H. O'Dell wrote: > 1) I have to pass a struct proc * to change_ruid. 
If a user changes > his/her uid, the number of advisory-mode locks needs to be transferred > to the new uid and the only way I figured to do that would to be to > count the number of advisory-mode locks held by a process (I didn't need > to track this across a fork() since POSIX locks are not inherited > between processes). This means I have to move the definition out of > /sys/sys/ucred.h and into /sys/kern/kern_prot.c. It also means that > OSF/1 compatibility becomes broken on Alpha, since the setuid() in > osf1_misc.c calls the change_[r|e|sv]uid functions that have been > implemented in /sys/kern/kern_prot.c. Solutions to this include: > a) creating a SuSv3-compatible setuid for use with the OSF/1, SVR4 > and Linux compat ABIs (since using the BSD setuid for these also isn't > totally correct). > b) require sys/proc.h wherever sys/ucred.h is included (which is > very ugly) > c) move the check out of change_ruid() (I don't think this is > correct since the chgproccnt() function is called there as well) > d) re-write the OSF/1 compatibility code to use its own > change_ruid_osf1() function (bloated) e) add a line 'struct proc;' to sys/ucred.h > 3) Does this work justify my going through the modified files and doing > style(9) changes on them? I'm willing to do this; mux@ has encouraged > it; style(9) suggests that I do it if my code comprises 50% or more of > the new files (which it doesn't). Again, if this is useful, I'll > certainly do this. Some of the files have a mixture of K&R-style and ANSI function definitions. > 8) Are any of the modifications I've made too intrusive to the > [proc|advlock|rlimit|sysctl] subsystem(s)? Rather a lot of functions and programs (setusercontext(3) in libutil, limits(1), rlimit-related builtins in all shells) have knowledge of all the rlimits built into them. This is already a bit of a problem, for example bash doesn't support the socket buffer size rlimit. Also note that those programs often use single letters for the rlimits. 
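
To make the point about built-in rlimit knowledge concrete, here is a minimal sketch of the kind of fixed table a ulimit-style builtin tends to carry. The struct, table and function names are hypothetical and not taken from any shell's source; the option letters follow common sh ulimit usage. Because the table is fixed at compile time, a new resource only becomes visible once each such program is edited and rebuilt.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical table entry: one option letter per known resource. */
struct limit_entry {
	char		 letter;	/* single-letter option */
	int		 resource;	/* RLIMIT_* constant */
	const char	*name;		/* human-readable name */
};

/* Illustrative subset; a real shell lists every limit it knows about. */
static const struct limit_entry limit_table[] = {
	{ 't', RLIMIT_CPU,	"cpu" },
	{ 'f', RLIMIT_FSIZE,	"fsize" },
	{ 'd', RLIMIT_DATA,	"data" },
	{ 's', RLIMIT_STACK,	"stack" },
	{ 'c', RLIMIT_CORE,	"core" },
	{ 'n', RLIMIT_NOFILE,	"nofile" },
};

/* Return the entry for an option letter, or NULL if it is unknown. */
const struct limit_entry *
find_limit(char letter)
{
	size_t i;

	for (i = 0; i < sizeof(limit_table) / sizeof(limit_table[0]); i++)
		if (limit_table[i].letter == letter)
			return (&limit_table[i]);
	return (NULL);
}
```

A caller would pass the `resource` field of the returned entry to getrlimit(2) or setrlimit(2); any letter missing from the table is simply rejected, which is why a limit added in the kernel stays out of reach until the table is extended.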
> 9) What (extra) suggestions would you have for my patches for relevant > manpages? > 10) Have I missed any userland utilities that don't use libutil to > check/set classes/limits (perhaps there are some in ports that I can > patch as well)? limits(1), all shells. > This patch is against April 13th -CURRENT but backporting it is very > simple since the main affected subsystem doesn't change much > architecturally / structurally. However, this also brings into light > that this problem may also affect the other BSDs (Dragonfly, Net, Open, > Ekko). I cannot verify this as I do not have much experience with these > other BSDs and do not know if they impose any limits on the amount of > kernel memory a user can have or any other limits which would disallow > this to exploit to ``work''. Should they be affected, what do I need to > do to alert them of this? Limiting the number of locked regions is not uncommon, e.g. Solaris does it (the manpage seems to indicate a per-system limitation only, though). Interesting part from Linux getrlimit(2) manpage: RLIMIT_LOCKS A limit on the combined number of flock() locks and fcntl() leases that this process may establish (Linux 2.4 and later). Per-user instead of per-process limits are harder to implement but more effective. > diff -ur lib/libc/sys/getrlimit.2 lib_lockfix/libc/sys/getrlimit.2 > --- lib/libc/sys/getrlimit.2 Tue Apr 13 23:53:52 2004 > +++ lib_lockfix/libc/sys/getrlimit.2 Tue Apr 13 23:58:24 2004 > @@ -98,6 +98,9 @@ > The maximum size (in bytes) of socket buffer usage for this user. > This limits the amount of network memory, and hence the amount of > mbufs, that this user may hold at any time. > +.It Li RLIMIT_ADVLOCK > +The maximum number of POSIX-type (lockf(3) style) advisory-mode > +locks avilable to this user. > .El > .Pp > A resource limit is specified as a soft limit and a hard limit. When a Refer to fcntl(2) in preference to lockf(3). 
While lockf(3) locks typically are implemented using fcntl(2), SUSv3 doesn't say anything about interaction between the two. Also, lockf(3) is marked XSI, but fcntl(2) locking is not. The sysctl(3) and sysctl(8) manpages haven't been updated, but I'm not sure whether that's useful. -- Jilles Tjoelker From owner-freebsd-arch@FreeBSD.ORG Wed Apr 14 04:54:47 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3AC4B16A4CE for ; Wed, 14 Apr 2004 04:54:47 -0700 (PDT) Received: from smtp1.euronet.nl (smtp1.euronet.nl [194.134.35.133]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0673243D5D for ; Wed, 14 Apr 2004 04:54:47 -0700 (PDT) (envelope-from dodell@sitetronics.com) Received: from sitetronics.com (zp-c-13e65.mxs.adsl.euronet.nl [81.69.92.101]) by smtp1.euronet.nl (Postfix) with ESMTP id E1D8567199; Wed, 14 Apr 2004 13:54:45 +0200 (MEST) Message-ID: <407D25A7.8090502@sitetronics.com> Date: Wed, 14 Apr 2004 13:51:03 +0200 From: "Devon H. O'Dell" User-Agent: Mozilla Thunderbird 0.5 (Windows/20040207) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jilles Tjoelker References: <407CF5B8.2060909@sitetronics.com> <20040414112800.GA69649@stack.nl> In-Reply-To: <20040414112800.GA69649@stack.nl> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-arch@freebsd.org Subject: Re: [patch] lockf(3) user-exploitable kernel panic X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Apr 2004 11:54:47 -0000 Jilles Tjoelker wrote: > e) add a line 'struct proc;' to sys/ucred.h Thanks for this suggestion; I wasn't aware that this was reasonably possible from an architectural standpoint. 
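
A minimal sketch of why option (e) works in C. The types and the function below are userland stand-ins with hypothetical names, not the real sys/ucred.h or kern_prot.c contents. A bare `struct proc;` declares an incomplete type, which is enough for a header to prototype functions that take a pointer to it; only the file that dereferences the pointer needs the full definition.

```c
#include <assert.h>
#include <stddef.h>

/* ---- what a header in the style of sys/ucred.h would need ---- */
struct proc;			/* incomplete type: no sys/proc.h required */

struct ucred_sketch {		/* hypothetical stand-in for struct ucred */
	int	cr_ruid;
};

/* A pointer to an incomplete type is fine in a prototype. */
int	change_ruid_sketch(struct ucred_sketch *newcred, int ruid,
	    struct proc *pp);

/* ---- what the implementing .c file would see ---- */
struct proc {			/* hypothetical stand-in for struct proc */
	int	p_numadvlocks;
};

/*
 * Set the real uid and report how many locks the process holds, i.e.
 * the count that would have to move to the new uid.
 */
int
change_ruid_sketch(struct ucred_sketch *newcred, int ruid, struct proc *pp)
{
	assert(newcred != NULL && pp != NULL);
	newcred->cr_ruid = ruid;
	return (pp->p_numadvlocks);	/* this line needs the complete type */
}
```

Files that include only the header part can call the function and pass a `struct proc *` through without ever seeing the structure's layout.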
>>3) Does this work justify my going through the modified files and doing
>>style(9) changes on them? I'm willing to do this; mux@ has encouraged
>>it; style(9) suggests that I do it if my code comprises 50% or more of
>>the new files (which it doesn't). Again, if this is useful, I'll
>>certainly do this.
>
>
> Some of the files have a mixture of K&R-style and ANSI function
> definitions.

I'll look into implementing style(9) changes then. I know my patch fails a style(9) check in some contexts, so I'll do a general cleanup as well.

>>8) Are any of the modifications I've made too intrusive to the
>>[proc|advlock|rlimit|sysctl] subsystem(s)?
>
>
> Rather a lot of functions and programs (setusercontext(3) in libutil,
> limits(1), rlimit-related builtins in all shells) have knowledge of all
> the rlimits built into them. This is already a bit of a problem, for
> example bash doesn't support the socket buffer size rlimit. Also note
> that those programs often use single letters for the rlimits.

My fix for sh's builtin(1) ulimit feature does implement a single-letter option for limiting locks. In my reading of the sh ulimit code, I saw that these limits are really not dynamic at all... one is thus unable to create a generic interface for use in all shells, although it appears that libutil tries to provide this interface. It's a pity that it's not used in the contexts it should be.

>>9) What (extra) suggestions would you have for my patches for relevant
>>manpages?
>
>
>>10) Have I missed any userland utilities that don't use libutil to
>>check/set classes/limits (perhaps there are some in ports that I can
>>patch as well)?
>
>
> limits(1), all shells.

sh has been fixed. I was under the impression that csh used libutil for this (libutil has been fixed). I'll take a deeper look into shells in base and in ports and figure out what changes I need to make there. While I'm at it, I don't think it'd be a bad idea to go ahead and build RLIMIT_SBSIZE support into bash and bash2.
I'm not entirely sure what information I need to list in all the manpages, but I'll get that sorted out. >>This patch is against April 13th -CURRENT but backporting it is very >>simple since the main affected subsystem doesn't change much >>architecturally / structurally. However, this also brings into light >>that this problem may also affect the other BSDs (Dragonfly, Net, Open, >>Ekko). I cannot verify this as I do not have much experience with these >>other BSDs and do not know if they impose any limits on the amount of >>kernel memory a user can have or any other limits which would disallow >>this to exploit to ``work''. Should they be affected, what do I need to >>do to alert them of this? > > > Limiting the number of locked regions is not uncommon, e.g. Solaris does > it (the manpage seems to indicate a per-system limitation only, though). > > Interesting part from Linux getrlimit(2) manpage: > RLIMIT_LOCKS > A limit on the combined number of flock() locks and fcntl() > leases that this process may establish (Linux 2.4 and later). > > Per-user instead of per-process limits are harder to implement but > more effective. Ok. I was not aware that Linux had this fix / feature already. I'll take a look into the CVS repos of the other BSDs and see if it's something I can suggest a patch for in those worlds. The reason I asked was because I don't have access to many boxes of different architectures or operating systems. Indeed, my patch implements per-user limits, but keeps track per-process for the purpose of removing locks on a setuid() call. In the beginning, I thought that it would be necessary for process termination, but since fdfree() is called in kern_exit.c and the locks are released sequentially across the whole file (the whole file is unlocked and the for (;;) code in kern_lockf.c will unlock this), I see that it is unnecessary for this purpose. 
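
The bookkeeping described here (a per-user limit plus a per-process count kept only so that locks can follow a uid change) can be sketched in userland as follows. This is an illustration under stated assumptions, not the patch itself: the struct and function names are hypothetical, there is no locking, and `max == 0` is taken to mean "no limit".

```c
#include <assert.h>
#include <stddef.h>

struct user_sketch {			/* stand-in for struct uidinfo */
	int	advlocks;		/* locks held by this user */
};

struct proc_sketch {			/* stand-in for struct proc */
	struct user_sketch *owner;	/* user currently charged */
	int	numadvlocks;		/* locks held by this process */
};

/*
 * Adjust the per-user and per-process counts by diff.
 * Returns 1 on success, 0 if the per-user limit would be exceeded.
 */
int
chg_lock_count(struct proc_sketch *p, int diff, int max)
{
	struct user_sketch *u = p->owner;

	if (diff > 0 && max != 0 && u->advlocks + diff > max)
		return (0);
	u->advlocks += diff;
	p->numadvlocks += diff;
	assert(u->advlocks >= 0);	/* counts must never go negative */
	assert(p->numadvlocks >= 0);
	return (1);
}

/* Charge a process's locks to a new user, as a uid change would need. */
void
transfer_locks(struct proc_sketch *p, struct user_sketch *newowner)
{
	int held = p->numadvlocks;	/* save first: the decrement zeroes it */

	(void)chg_lock_count(p, -held, 0);
	p->owner = newowner;
	(void)chg_lock_count(p, held, 0);
}
```

The per-process count is used only inside `transfer_locks()`; the limit check itself looks solely at the per-user total.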
Are there ideas on how I could implement the lock transfer between users without intruding on the process structure, or is this approach reasonable?

>>[snip]
> Refer to fcntl(2) in preference to lockf(3). While lockf(3) locks
> typically are implemented using fcntl(2), SUSv3 doesn't say anything
> about interaction between the two. Also, lockf(3) is marked XSI, but
> fcntl(2) locking is not.

Point taken.

> The sysctl(3) and sysctl(8) manpages haven't been updated, but I'm not
> sure whether that's useful.

Right. I'll need to list my new sysctl. Thanks for the reminder.

Thanks for the feedback, Jilles. I really appreciate the architectural help and explanations you've given to me both here on-list and on Freenode. I'll let you guys know when I've an updated patch incorporating these changes.

Kind regards,

Devon H. O'Dell

From owner-freebsd-arch@FreeBSD.ORG Wed Apr 14 10:25:38 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 88E2316A4CE for ; Wed, 14 Apr 2004 10:25:38 -0700 (PDT) Received: from carver.gumbysoft.com (carver.gumbysoft.com [66.220.23.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6B01743D55 for ; Wed, 14 Apr 2004 10:25:38 -0700 (PDT) (envelope-from dwhite@gumbysoft.com) Received: by carver.gumbysoft.com (Postfix, from userid 1000) id 4A3AE72DF0; Wed, 14 Apr 2004 10:25:38 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by carver.gumbysoft.com (Postfix) with ESMTP id 47DCD72DBF; Wed, 14 Apr 2004 10:25:38 -0700 (PDT) Date: Wed, 14 Apr 2004 10:25:38 -0700 (PDT) From: Doug White To: Teemu.Parkkinen@patria.fi In-Reply-To: Message-ID: <20040414102422.H84038@carver.gumbysoft.com> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-arch@freebsd.org Subject: Re: Digital-tv card drivers and API discussion X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id:
Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Apr 2004 17:25:38 -0000

On Tue, 13 Apr 2004 Teemu.Parkkinen@patria.fi wrote:

> Hi all,
>
> I am about to write a digital tv-driver for my dvb-c -card. Because
> FreeBSD does not yet have any dvb-devices and I don't have any prior
> driver development experience, I have a couple of questions for you.
>
> 1) Should we use Linux-DVB API as a reference, or should we consider
> some changes to it? The API seems to be constantly changing and
> improving. Version 3 is available here:
> http://www.linuxtv.org/download/dvb/linux-dvb-api-1.0.0.pdf but they are
> currently working on version 4. In my opinion, the API should be
> minimal, but complete, so there is no need to constantly add new
> features to it.

Please check in with the freebsd-multimedia mailing list before you start. They were working on a new video API and this sort of thing would slot into it nicely. The bktr developers hang out there and can assist you with issues specific to video hardware.
-- Doug White | FreeBSD: The Power to Serve dwhite@gumbysoft.com | www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 00:12:00 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5391716A4CE for ; Thu, 15 Apr 2004 00:12:00 -0700 (PDT) Received: from hutcs.cs.hut.fi (hutcs.cs.hut.fi [130.233.192.7]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0502643D31 for ; Thu, 15 Apr 2004 00:12:00 -0700 (PDT) (envelope-from kirma@cs.hut.fi) Received: from kirma (helo=localhost) by hutcs.cs.hut.fi with local-esmtp (Exim 4.30) id 1BE12V-0000BX-2y for freebsd-arch@freebsd.org; Thu, 15 Apr 2004 10:11:59 +0300 Date: Thu, 15 Apr 2004 10:11:59 +0300 (EEST) From: Jari Kirma To: freebsd-arch@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: Digital-tv card drivers and API discussion X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 07:12:00 -0000 > I am about to write a digital tv-driver for my dvb-c -card. Because > FreeBSD does not yet have any dvb-devices and I don't have any prior > driver development experience, I have a couple of questions for you. > > 1) Should we use Linux-DVB API as a reference, or should we consider > some changes to it? The API seems to be constantly changing and > improving. Version 3 is available here: > http://www.linuxtv.org/download/dvb/linux-dvb-api-1.0.0.pdf > but they are currently working on version 4. In my opinion, the API > should be minimal, but complete, so there is no need to constantly add > new features to it. > > 2) As linux kernel is GPL-licensed, I cannot just port the linux driver > to FreeBSD, right? In other words, we have to write the driver from > scratch. 
In this case we don't have to stick with the Linux DVB-API and
> therefore I suggest that we give think the api through before
> deciding how we implement it (do we follow linux api or not).
>
> 3) Do you have any pointers to good books or other documentation on how
> to write device drivers for UNIX (BSD)? I already have read those from
> FreeBSD documentation, but a decent book would be handy.

I received my TechnoTrend USB DVB-C box a couple of weeks ago and pondered the same issues. I got the same suggestions in freebsd-multimedia (like, compatibility with Linux is nice). It was also suggested that I take a look at the early work at VideoBSD. I haven't made any decisions on the interface yet, but I have had some thoughts.

The biggest problem with the Linux DVB API is that its interface is at the device/ioctl level. I consider that a poison for reasonable OS independence. There's also libdvb, which supposedly abstracts away some OS-specific parts, but based on a very quick glance, it might not cover the whole set of operations required to operate DVB sensibly.

In my case, I've been able to prototype my "driver" completely in userspace as it is a USB device. I use a ugen bulk pipe for device control, and a ugen isochronous pipe provides stream transport. I have had some problems because ugen isn't really tried and tested on (relatively) high-bandwidth isochronous transfers, but most of those have been solved. I'm able to watch DVB programs converted from MPEG TS substreams to an MPEG PS stream and piped to mplayer, completely in userland.

Of course, this isn't really doable with devices on the PCI bus, but with USB or Firewire it should be pretty easy. It can also nicely keep the parts that may be "poisoned" by the GPL outside the kernel.

Even with this design, it would be possible to design "fake" devices emulating the Linux DVB interface and actually redirect the operations to a userland daemon.
Extra copying of the MPEG TS stream can be avoided (I'm planning to write a mmapped, properly synchronised, zero-copy usb isochronous device driver), and testing is reasonably easy. Biggest problem with this approach is obviously that you have to choose either long latency or extra context switches... Please note that I have not given thought on MPEG decoding, because my device exports only MPEG stream. MPEG decoders should be handled in some intelligent, modular way, because those two parts may well be operated separately although they may be bundled on the same card. -kirma From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 02:48:43 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BB71D16A4CE for ; Thu, 15 Apr 2004 02:48:43 -0700 (PDT) Received: from relay2.mail2web.com (relay2.mail2web.com [168.144.1.82]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4029F43D54 for ; Thu, 15 Apr 2004 02:48:43 -0700 (PDT) (envelope-from dodell@sitetronics.com) Received: from M2W064.mail2web.com ([168.144.251.173]) by relay2.mail2web.com with Microsoft SMTPSVC(5.0.2195.6713); Thu, 15 Apr 2004 05:48:42 -0400 Message-ID: <48270-22004441594828368@M2W064.mail2web.com> X-Priority: 3 X-Originating-IP: 81.69.92.101 X-URL: http://mail2web.com/ From: "dodell@sitetronics.com" To: freebsd-arch@freebsd.org, jilles@stack.nl Date: Thu, 15 Apr 2004 05:48:28 -0400 MIME-Version: 1.0 Content-type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-OriginalArrivalTime: 15 Apr 2004 09:48:42.0311 (UTC) FILETIME=[D60EAD70:01C422CE] Subject: Re: [patch] lockf(3) user-exploitable kernel panic X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: dodell@offmyserver.com List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 
09:48:43 -0000

Again, thanks for the comments. I fixed tcsh to work with the new rlimit, fixed limits(1) and have attempted to fix as many manpages as possible.

The patch is at the same place, http://freebsd0.sitetronics.com/~dodell/patches/lockfix.tar.gz

My apologies for the large original post to the list. Comments would be very appreciated!

Kind regards,

Devon H. O'Dell

--------------------------------------------------------------------
mail2web - Check your email from the web at
http://mail2web.com/ .

From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 03:02:06 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DA2F716A4CE for ; Thu, 15 Apr 2004 03:02:06 -0700 (PDT) Received: from smtp.tele.fi (smtp.tele.fi [192.89.123.25]) by mx1.FreeBSD.org (Postfix) with ESMTP id D533043D5E for ; Thu, 15 Apr 2004 03:02:05 -0700 (PDT) (envelope-from Teemu.Parkkinen@patria.fi) Received: from hubns01.patria.fi (unknown [193.209.168.1]) by smtp.tele.fi (Postfix) with ESMTP id F018816C79 for ; Thu, 15 Apr 2004 13:01:57 +0300 (EEST) To: freebsd-arch@freebsd.org X-Mailer: Lotus Notes Release 5.0.8 June 18, 2001 Message-ID: From: Teemu.Parkkinen@patria.fi Date: Thu, 15 Apr 2004 12:36:38 +0300 X-MIMETrack: Serialize by Router on HUBNS01/HUB/PATRIA(Release 5.0.11 |July 24, 2002) at 15.04.2004 13:02:51 MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Subject: Re: Digital-tv card drivers and API discussion X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 10:02:07 -0000

Implementation in userspace would be quite interesting, as it does not need any modifications to the kernel. I have no idea if this is feasible with PCI-devices though.
Linux compatibility through libraries could probably be achieved, but it needs some extra work compared to direct support of the Linux API. If you have ideas on this, I'm interested to hear them.

As most digital-tv receivers are PCI cards, I think it's important to have some kind of elegant mechanism for using them that is:

a) sufficient for most (all) PCI, USB, Firewire etc. digital tv-cards, for watching, recording and using other dvb-services
b) easy to add new drivers to
c) compatible with Linux (as that is desired)

The cards vary greatly in their options: some have decoders, some do not, some use satellite antenna motors, ci-modules, etc. To be sufficient for all dvb-cards, the API is very important. Therefore I prefer the Linux API, as it seems to be rather complete. Designing a completely new API takes some time, and I don't know if it would be any better than the Linux API, because they have been working on theirs for some time.

I don't have enough experience to decide if it is OK for FreeBSD, as Jari said that it may not be OS-independent enough. I would appreciate some feedback on this matter, especially on which is the best way to go when considering long-term dvb-support in FreeBSD.

-Teemu

From: Jari Kirma
Sent by: owner-freebsd-arch@freebsd.org
Date: 04/15/04 10:11 AM
To: freebsd-arch@freebsd.org
cc:
Subject: Digital-tv card drivers and API discussion

> I am about to write a digital tv-driver for my dvb-c -card. Because
> FreeBSD does not yet have any dvb-devices and I don't have any prior
> driver development experience, I have a couple of questions for you.
>
> 1) Should we use Linux-DVB API as a reference, or should we consider
> some changes to it? The API seems to be constantly changing and
> improving. Version 3 is available here:
> http://www.linuxtv.org/download/dvb/linux-dvb-api-1.0.0.pdf
> but they are currently working on version 4. In my opinion, the API
> should be minimal, but complete, so there is no need to constantly add
> new features to it.
> > 2) As linux kernel is GPL-licensed, I cannot just port the linux driver > to FreeBSD, right? In other words, we have to write the driver from > scratch. In this case we don't have to stick with the Linux DVB-API and > therefore I suggest that we give think the api through before > deciding how we implement it (do we follow linux api or not). > > 3) Do you have any pointers to good books or other documentation on how > to write device drivers for UNIX (BSD)? I already have read those from > FreeBSD documentation, but a decent book would be handy. I received my TechnoTrend USB DVB-C box couple weeks ago and pondered the same issues. I got same suggestions in freebsd-multimedia (like, compability with Linux is nice). I was also suggested to take a look at early work at VideoBSD . I haven't made any decisions on the interface yet, but I have had some thoughts. The biggest problem with Linux DVB API is that its interface is at device/ioctl level. I consider that a poison for reasonable OS independence. There's also libdvb which supposedly abstracts away some OS-specific parts, but based on a very quick glance, it might not cover whole set of operations required to operate DVB sensibly. In my case, I've been able to prototype my "driver" completely in userspace as it is a USB device. I use ugen device bulk pipe for device control, ugen isochronous pipe provides stream transport. I have had some problems because ugen isn't really tried and tested on (relatively) high-bandwith isochronous transfers, but most of those have been solved. I'm able to watch DVB programs converted from MPEG TS substreams to MPEG PS stream and piped to mplayer, completely in userland. Of course, this isn't really doable with devices in PCI bus, but in USB or Firewire, it should be pretty easy. It can also nicely separate the parts that may be "poisoned" by GPL outside the kernel. 
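The userland handling of MPEG TS substreams described above comes down to selecting packets by PID. A minimal C sketch, assuming only the standard transport stream packet layout (188-byte packets, sync byte 0x47, 13-bit PID in bytes 1 and 2); the function names are hypothetical and are not taken from any driver or from the Linux DVB API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TS_PACKET_SIZE	188	/* fixed MPEG transport stream packet size */
#define TS_SYNC_BYTE	0x47	/* first byte of every TS packet */

/* Return the 13-bit PID of one TS packet, or -1 if the sync byte is missing. */
int
ts_packet_pid(const uint8_t *pkt)
{
	assert(pkt != NULL);
	if (pkt[0] != TS_SYNC_BYTE)
		return (-1);
	return (((pkt[1] & 0x1f) << 8) | pkt[2]);
}

/*
 * Copy every packet carrying `pid` from `in` to `out` (hypothetical helper).
 * `in` holds `npackets` packets; `out` must have room for as many.
 * Returns the number of packets copied.
 */
size_t
ts_filter_pid(const uint8_t *in, size_t npackets, int pid, uint8_t *out)
{
	size_t i, copied = 0;

	assert(in != NULL && out != NULL);
	for (i = 0; i < npackets; i++) {
		const uint8_t *pkt = in + i * TS_PACKET_SIZE;

		if (ts_packet_pid(pkt) == pid) {
			memcpy(out + copied * TS_PACKET_SIZE, pkt,
			    TS_PACKET_SIZE);
			copied++;
		}
	}
	return (copied);
}
```

A filter like this can run on whatever buffer the stream transport delivers, which is why it does not have to live in the kernel.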
Even with this design, it would be possible to design "fake" devices emulating Linux DVB interface and actually redirect the operations to userland daemon. Extra copying of the MPEG TS stream can be avoided (I'm planning to write a mmapped, properly synchronised, zero-copy usb isochronous device driver), and testing is reasonably easy. Biggest problem with this approach is obviously that you have to choose either long latency or extra context switches... Please note that I have not given thought on MPEG decoding, because my device exports only MPEG stream. MPEG decoders should be handled in some intelligent, modular way, because those two parts may well be operated separately although they may be bundled on the same card. -kirma _______________________________________________ freebsd-arch@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-arch To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 03:32:03 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BB03016A4CE for ; Thu, 15 Apr 2004 03:32:03 -0700 (PDT) Received: from hutcs.cs.hut.fi (hutcs.cs.hut.fi [130.233.192.7]) by mx1.FreeBSD.org (Postfix) with ESMTP id 01F8943D67 for ; Thu, 15 Apr 2004 03:32:03 -0700 (PDT) (envelope-from kirma@cs.hut.fi) Received: from kirma (helo=localhost) by hutcs.cs.hut.fi with local-esmtp (Exim 4.30) id 1BE4A6-0001fG-2g for freebsd-arch@freebsd.org; Thu, 15 Apr 2004 13:32:02 +0300 Date: Thu, 15 Apr 2004 13:32:02 +0300 (EEST) From: Jari Kirma To: freebsd-arch@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: Digital-tv card drivers and API discussion X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 10:32:04 -0000

> Implementation in userspace would be quite interesting, as it does not
> need any modifications to the kernel. I have no idea if this is feasible
> with PCI-devices though.

I have very strong doubts about whether this can be done. It would be possible to implement things such as MPEG TS PID and section filtering in userland while the low-level parts of the driver reside in the kernel, but it is questionable whether it's worth it, especially if one doesn't want to do some more complex conversions, such as MPEG TS -> PS replexing, inside the framework.

> I don't have enough experience to decide if it is ok for FreeBSD, as
> Jari said that it may not be OS-independent enough. I would appreciate
> some feedback on this matter, especially which is the best way to go
> when considering the long-term dvb-support in FreeBSD.

I agree that support for the Linux API is essential to make ports of DVB-related software easy. The Linux API, in many respects, is probably good enough in what it provides. The big thing that I don't like about it is that it exposes a raw system call interface, which practically requires implementing at least a dummy device driver in the kernel. If they had cared to even wrap most of those operations inside library stub functions and specify reasonable semantics for them, in addition to avoiding references to specific device files in /dev, it would be so much nicer to write a compatible implementation any way one wants (entirely in kernel, partly in kernel, or entirely in userland). Unfortunately they already adopted it and have applications that support it.
;) -kirma From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 04:08:37 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8FB7116A4CE for ; Thu, 15 Apr 2004 04:08:37 -0700 (PDT) Received: from srv01.sparkit.no (srv01.sparkit.no [193.69.116.226]) by mx1.FreeBSD.org (Postfix) with ESMTP id C632843D4C for ; Thu, 15 Apr 2004 04:08:36 -0700 (PDT) (envelope-from eivind@FreeBSD.org) Received: from ws ([193.69.114.88]) by srv01.sparkit.no (8.12.10/8.12.10) with ESMTP id i3FB8XcZ006299; Thu, 15 Apr 2004 13:08:33 +0200 (CEST) (envelope-from eivind@FreeBSD.org) Received: from ws (localhost [127.0.0.1]) by ws (8.12.9/8.12.10) with ESMTP id i3FB7OY8001682; Thu, 15 Apr 2004 11:07:24 GMT (envelope-from eivind@ws) Received: (from eivind@localhost) by ws (8.12.9/8.12.10/Submit) id i3FB7OeG001680; Thu, 15 Apr 2004 11:07:24 GMT (envelope-from eivind) Date: Thu, 15 Apr 2004 11:06:22 +0000 From: Eivind Eklund To: "Devon H. O'Dell" Message-ID: <20040415110622.GA1370@FreeBSD.org> References: <407CF5B8.2060909@sitetronics.com> <20040414112800.GA69649@stack.nl> <407D25A7.8090502@sitetronics.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <407D25A7.8090502@sitetronics.com> User-Agent: Mutt/1.5.4i cc: Jilles Tjoelker cc: freebsd-arch@FreeBSD.org Subject: Re: [patch] lockf(3) user-exploitable kernel panic X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 11:08:37 -0000 On Wed, Apr 14, 2004 at 01:51:03PM +0200, Devon H. O'Dell wrote: > Jilles Tjoelker wrote: > >e) add a line 'struct proc;' to sys/ucred.h > > Thanks for this suggestion; I wasn't aware that this was reasonably > possible from an architectural standpoint. 
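As a sketch of what the quoted suggestion amounts to: a forward declaration such as 'struct proc;' only announces the type name, which is enough for a header to use pointers to it without including the header that defines the structure. The structure and function below are hypothetical illustrations and are not the contents of sys/ucred.h:

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: names the type without defining its layout. */
struct proc;

/*
 * Hypothetical structure for illustration only. Holding a pointer to an
 * incomplete type is legal because the pointer size is known even though
 * the layout of struct proc is not.
 */
struct cred_example {
	struct proc	*owner;
	int		 nlocks;
};

/* Prototypes may also mention the incomplete type through a pointer. */
void	cred_example_init(struct cred_example *c, struct proc *p);

void
cred_example_init(struct cred_example *c, struct proc *p)
{
	assert(c != NULL);
	c->owner = p;	/* stored, never dereferenced here */
	c->nlocks = 0;
}
```

Code that needs to dereference the pointer still has to include the header with the full definition; the forward declaration only removes that requirement from the header itself.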
Most of the sys/* files are really owned by the implementation, and it is usually OK to introduce forward declarations into them. We try to avoid namespace poisoning (introducing unknown variables) for the official files, but that also happens sometimes. Also, many of the files (including sys/ucred.h) has an #ifdef _KERNEL section. This section is totally owned by the implementation, and it is (almost) always OK to add forward declarations. The list of official sys/ includes can be fetched at http://www.opengroup.org/onlinepubs/007904975/basedefs/sys/ > >>3) Does this work justify my going through the modified files and doing > >>style(9) changes on them? I'm willing to do this; mux@ has encouraged > >>it; style(9) suggests that I do it if my code comprises 50% or more of > >>the new files (which it doesn't). Again, if this is useful, I'll > >>certainly do this. > > > >Some of the files have a mixture of K&R-style and ANSI function > >definitions. > > I'll look into implementing style(9) changes then. I know my patch fails > a style(9) check in some contexts, so I'll go a general cleanup as well. Please do that separately from the main patch. We try quite hard to not mix stylistic and functional changes in a single patch, to make it easy to use the version history (and easy for people to review the patches). > sh has been fixed. I was under the impression that csh used libutil for > this (libutil has been fixed). I'll take a deeper look into shells in > base and in ports and figure out what changes I need to make there. > While I'm at it, I don't think it'd be a bad idea to go ahead and build > in the RLIMIT_SBSIZE to bash and bash2. If it is easy, it might be worthwhile to patch the shells to use libutil and submit those patches back to the maintainers. > >Limiting the number of locked regions is not uncommon, e.g. Solaris does > >it (the manpage seems to indicate a per-system limitation only, though). 
> > > >Interesting part from Linux getrlimit(2) manpage: > > RLIMIT_LOCKS > > A limit on the combined number of flock() locks and > > fcntl() > > leases that this process may establish (Linux 2.4 and later). > > > >Per-user instead of per-process limits are harder to implement but > >more effective. > > Ok. I was not aware that Linux had this fix / feature already. I'll take > a look into the CVS repos of the other BSDs and see if it's something I > can suggest a patch for in those worlds. It'd be nice to be compatible with Linux here, as it means just a define is needed for making apps work with it on FreeBSD (it may even automatically happen due to configure scripts.) Eivind. From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 05:43:14 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4FFAD16A4CE; Thu, 15 Apr 2004 05:43:14 -0700 (PDT) Received: from relay3.mail2web.com (relay3.mail2web.com [168.144.1.83]) by mx1.FreeBSD.org (Postfix) with ESMTP id EC41A43D55; Thu, 15 Apr 2004 05:43:13 -0700 (PDT) (envelope-from dodell@sitetronics.com) Received: from M2W057.mail2web.com ([168.144.251.165]) by relay3.mail2web.com with Microsoft SMTPSVC(5.0.2195.6713); Thu, 15 Apr 2004 08:43:12 -0400 Message-ID: <99610-220044415124312827@M2W057.mail2web.com> X-Priority: 3 X-Originating-IP: 81.69.92.101 X-URL: http://mail2web.com/ From: "dodell@sitetronics.com" To: eivind@freebsd.org, dodell@sitetronics.com, jilles@stack.nl, freebsd-arch@freebsd.org Date: Thu, 15 Apr 2004 08:43:12 -0400 MIME-Version: 1.0 Content-type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-OriginalArrivalTime: 15 Apr 2004 12:43:12.0732 (UTC) FILETIME=[36EA21C0:01C422E7] Subject: Re: [patch] lockf(3) user-exploitable kernel panic X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: dodell@sitetronics.com List-Id: Discussion related to 
FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 12:43:14 -0000

Original Message:
-----------------
From: Eivind Eklund eivind@FreeBSD.org
Date: Thu, 15 Apr 2004 11:06:22 +0000
To: dodell@sitetronics.com, jilles@stack.nl, freebsd-arch@FreeBSD.org
Subject: Re: [patch] lockf(3) user-exploitable kernel panic

On Wed, Apr 14, 2004 at 01:51:03PM +0200, Devon H. O'Dell wrote:
>> Jilles Tjoelker wrote:
>>>e) add a line 'struct proc;' to sys/ucred.h
>>
>> Thanks for this suggestion; I wasn't aware that this was reasonably
>> possible from an architectural standpoint.
>
>Most of the sys/* files are really owned by the implementation, and it
>is usually OK to introduce forward declarations into them. We try to
>avoid namespace poisoning (introducing unknown variables) for the
>official files, but that also happens sometimes.
>
>Also, many of the files (including sys/ucred.h) has an #ifdef _KERNEL
>section. This section is totally owned by the implementation, and it is
>(almost) always OK to add forward declarations.
>
>The list of official sys/ includes can be fetched at
>http://www.opengroup.org/onlinepubs/007904975/basedefs/sys/

Thanks for the clarification here, Eivind. When implementing the declaration in my current patch, I noticed that sys/ucred.h had done this as well with struct thread.

>> [snip]
>> I'll look into implementing style(9) changes then. I know my patch fails
>> a style(9) check in some contexts, so I'll go a general cleanup as well.
>
>Please do that separately from the main patch. We try quite hard to not
>mix stylistic and functional changes in a single patch, to make it easy
>to use the version history (and easy for people to review the patches).

I was aware of this policy. I'll make aesthetic changes to my current patch code and stylize the other code in a separate patch.

>> sh has been fixed.
I was under the impression that csh used libutil for
>> this (libutil has been fixed). I'll take a deeper look into shells in
>> base and in ports and figure out what changes I need to make there.
>> While I'm at it, I don't think it'd be a bad idea to go ahead and build
>> in the RLIMIT_SBSIZE to bash and bash2.
>
>If it is easy, it might be worthwhile to patch the shells to use
>libutil and submit those patches back to the maintainers.

There are a huge number of shells to do this with. This subsystem looks like somewhat of a kludge to me in this respect; the functionality is plainly provided in libutil, while every shell (sh and tcsh included) has its own implementation. limits(1) even has statically compiled information about the limits for every shell it is aware of (including sh, csh, tcsh, bash/bash2 and a good few others). I'll take a look at these later.

>> Ok. I was not aware that Linux had this fix / feature already. I'll take
>> a look into the CVS repos of the other BSDs and see if it's something I
>> can suggest a patch for in those worlds.
>
>It'd be nice to be compatible with Linux here, as it means just a define
>is needed for making apps work with it on FreeBSD (it may even
>automatically happen due to configure scripts.)

My patch implements a per-user setting for this, since I thought that a malicious user might be able to spawn a large number of processes, each eating up n locks. However, considering that POSIX locks are not inherited between processes and the kern.maxprocperuid sysctl, if it's desirable to be compatible with Linux in this case, I can certainly back out the per-user check and make it per-process. This would get rid of a good bit of code, including the setuid()-necessary changes.

What would be a sane value for a per-process setting? Should I calculate this value based on available system resources (as kern.maxfiles is set)?
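The compatibility point above can be pictured from the application side. The following C sketch assumes only getrlimit(2) and the Linux name RLIMIT_LOCKS; whether a FreeBSD limit would carry the same name and semantics is exactly what is being discussed, and lock_limit() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Store the soft limit on locks in *out and return 1 if the platform
 * defines RLIMIT_LOCKS and getrlimit() succeeds; return 0 otherwise.
 * Portable code guarded like this needs no change once a system starts
 * providing the same define.
 */
int
lock_limit(rlim_t *out)
{
	assert(out != NULL);
#ifdef RLIMIT_LOCKS
	struct rlimit rl;

	if (getrlimit(RLIMIT_LOCKS, &rl) == 0) {
		*out = rl.rlim_cur;
		return (1);
	}
#endif
	return (0);
}
```

An application written this way simply reports "no limit known" on systems without the define, which is the behaviour configure-style feature checks rely on.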
>Eivind.

Thanks for the input!

Kind regards,

Devon H. O'Dell

--------------------------------------------------------------------
mail2web - Check your email from the web at
http://mail2web.com/ .

From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 07:29:37 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id ED9B016A4CE for ; Thu, 15 Apr 2004 07:29:37 -0700 (PDT) Received: from smtp.des.no (flood.des.no [217.116.83.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id B51E643D1F for ; Thu, 15 Apr 2004 07:29:36 -0700 (PDT) (envelope-from des@des.no) Received: by smtp.des.no (Pony Express, from userid 666) id 8711F530C; Thu, 15 Apr 2004 16:29:33 +0200 (CEST) Received: from dwp.des.no (des.no [80.203.228.37]) by smtp.des.no (Pony Express) with ESMTP id 816265309 for ; Thu, 15 Apr 2004 16:29:13 +0200 (CEST) Received: by dwp.des.no (Postfix, from userid 2602) id 6C84933C6C; Thu, 15 Apr 2004 16:29:13 +0200 (CEST) To: arch@freebsd.org From: des@des.no (=?iso-8859-1?q?Dag-Erling_Sm=F8rgrav?=) Date: Thu, 15 Apr 2004 16:29:13 +0200 Message-ID: User-Agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on flood.des.no X-Spam-Level: X-Spam-Status: No, hits=0.0 required=5.0 tests=AWL autolearn=no version=2.63 Subject: installing multiple kernels X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 14:29:38 -0000

--=-=-=
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Currently, Makefile.inc1 will only install a single kernel.
If KERNCONF specifies multiple kernel configs, they are all built, but only the first one is installed. This makes sense since otherwise the last one installed would simply clobber all the other ones.

The attached patch changes that. It modifies kern.pre.mk to install each kernel in /boot/ instead of /boot/kernel. It also modifies Makefile.inc1 to build and install all kernel configs listed in KERNCONF. It also adds a script, sys/conf/regkernel.sh, which keeps a list of installed kernels in /boot/kernels, making sure that the last one installed is always listed last.

The only missing element is to make the loader read /boot/kernels and have $kernel default to the last kernel listed there (i.e. the most recently installed) instead of "kernel". It would also be nice to offer a kernel selection menu for the CLI-impaired. Unfortunately, I'm afraid my forth skills aren't quite up to the task. Any takers?

DES
-- 
Dag-Erling Smørgrav - des@des.no

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=kernels.diff

Index: Makefile.inc1
===================================================================
RCS file: /home/ncvs/src/Makefile.inc1,v
retrieving revision 1.423
diff -u -r1.423 Makefile.inc1
--- Makefile.inc1	14 Apr 2004 16:06:17 -0000	1.423
+++ Makefile.inc1	15 Apr 2004 14:08:56 -0000
@@ -500,7 +500,6 @@
 .else
 KERNCONF?= GENERIC
 .endif
-INSTKERNNAME?= kernel
 
 KERNSRCDIR?= ${.CURDIR}/sys
 KRNLCONFDIR= ${KERNSRCDIR}/${TARGET}/conf
@@ -508,13 +507,9 @@
 KERNCONFDIR?= ${KRNLCONFDIR}
 
 BUILDKERNELS=
-INSTALLKERNEL=
 .for _kernel in ${KERNCONF}
 .if exists(${KERNCONFDIR}/${_kernel})
 BUILDKERNELS+= ${_kernel}
-.if empty(INSTALLKERNEL)
-INSTALLKERNEL= ${_kernel}
-.endif
 .endif
 .endfor
 
@@ -557,7 +552,7 @@
 	@echo ">>> stage 2.1: cleaning up the object tree"
 	@echo "--------------------------------------------------------------"
 	cd ${KRNLOBJDIR}/${_kernel}; \
-	    ${KMAKEENV} ${MAKE} KERNEL=${INSTKERNNAME} ${CLEANDIR}
+	    ${KMAKEENV} ${MAKE} ${CLEANDIR}
 .endif
 	@echo
 	@echo "--------------------------------------------------------------"
@@ -586,14 +581,14 @@
 	@echo ">>> stage 3.1: making dependencies"
 	@echo "--------------------------------------------------------------"
 	cd ${KRNLOBJDIR}/${_kernel}; \
-	    ${KMAKEENV} ${MAKE} KERNEL=${INSTKERNNAME} depend -DNO_MODULES_OBJ
+	    ${KMAKEENV} ${MAKE} depend -DNO_MODULES_OBJ
 .endif
 	@echo
 	@echo "--------------------------------------------------------------"
 	@echo ">>> stage 3.2: building everything"
 	@echo "--------------------------------------------------------------"
 	cd ${KRNLOBJDIR}/${_kernel}; \
-	    ${KMAKEENV} ${MAKE} KERNEL=${INSTKERNNAME} all -DNO_MODULES_OBJ
+	    ${KMAKEENV} ${MAKE} all -DNO_MODULES_OBJ
 	@echo "--------------------------------------------------------------"
 	@echo ">>> Kernel build for ${_kernel} completed on `LC_ALL=C date`"
 	@echo "--------------------------------------------------------------"
@@ -602,13 +597,13 @@
 #
 # installkernel, etc.
 #
-# Install the kernel defined by INSTALLKERNEL
+# Install the kernels
 #
 installkernel installkernel.debug \
     reinstallkernel reinstallkernel.debug: ${SPECIAL_INSTALLCHECKS}
-.if empty(INSTALLKERNEL)
-	@echo "ERROR: No kernel \"${KERNCONF}\" to install."
-	false
+.if empty(BUILDKERNELS)
+	@echo "ERROR: Missing kernel configuration file(s) (${KERNCONF}).";
+	@false
 .endif
 	@echo "--------------------------------------------------------------"
 	@echo ">>> Making hierarchy"
@@ -619,9 +614,11 @@
 	@echo "--------------------------------------------------------------"
 	@echo ">>> Installing kernel"
 	@echo "--------------------------------------------------------------"
-	cd ${KRNLOBJDIR}/${INSTALLKERNEL}; \
+.for _kernel in ${BUILDKERNELS}
+	cd ${KRNLOBJDIR}/${_kernel}; \
 	    ${CROSSENV} PATH=${TMPPATH} \
-	    ${MAKE} KERNEL=${INSTKERNNAME} ${.TARGET:S/kernel//}
+	    ${MAKE} ${.TARGET:S/kernel//}
+.endfor
 
 #
 # update
Index: sys/conf/kern.post.mk
===================================================================
RCS file: /home/ncvs/src/sys/conf/kern.post.mk,v
retrieving revision 1.65
diff -u -r1.65 kern.post.mk
--- sys/conf/kern.post.mk	22 Mar 2004 15:45:17 -0000	1.65
+++ sys/conf/kern.post.mk	15 Apr 2004 14:09:40 -0000
@@ -206,6 +206,7 @@
 .else
 	${INSTALL} -p -m 555 -o root -g wheel ${KERNEL_KO} ${DESTDIR}${KODIR}
 .endif
+	sh $S/conf/regkernel.sh ${DESTDIR}${KODIR}
 
 kernel-reinstall:
 	@-chflags -R noschg ${DESTDIR}${KODIR}
@@ -214,6 +215,7 @@
 .else
 	${INSTALL} -p -m 555 -o root -g wheel ${KERNEL_KO} ${DESTDIR}${KODIR}
 .endif
+	sh $S/conf/regkernel.sh ${DESTDIR}${KODIR}
 
 config.o env.o hints.o majors.o vers.o vnode_if.o:
 	${NORMAL_C}
Index: sys/conf/kern.pre.mk
===================================================================
RCS file: /home/ncvs/src/sys/conf/kern.pre.mk,v
retrieving revision 1.50
diff -u -r1.50 kern.pre.mk
--- sys/conf/kern.pre.mk	29 Mar 2004 01:15:39 -0000	1.50
+++ sys/conf/kern.pre.mk	29 Mar 2004 14:24:18 -0000
@@ -6,7 +6,8 @@
 # Can be overridden by makeoptions or /etc/make.conf
 KERNEL_KO?= kernel
 KERNEL?= kernel
-KODIR?= /boot/${KERNEL}
+KODIR?= /boot/${KERN_IDENT}
+BOOTKODIR?= /boot/${KERNEL}
 
 M= ${MACHINE_ARCH}
 
Index: sys/conf/regkernel.sh
===================================================================
RCS file:
sys/conf/regkernel.sh
diff -N sys/conf/regkernel.sh
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ sys/conf/regkernel.sh	15 Apr 2004 14:26:52 -0000
@@ -0,0 +1,41 @@
+#!/bin/sh
+#
+# $FreeBSD$
+#
+
+set -e
+
+error() {
+	echo "$@" 1>&2
+	exit 1
+}
+
+if [ $# -ne 1 ] ; then
+	error "usage: $(basename $0) kernel-directory" 1>&2
+fi
+
+kodir="$1"
+kernel=$(basename "${kodir}")
+bootdir=$(dirname "${kodir}")
+kernlist="${bootdir}/kernels"
+
+if [ ! -d "${kodir}" ] ; then
+	error "${kodir} is not a directory"
+fi
+
+if [ ! -f "${kodir}/kernel" ] ; then
+	error "${kodir} does not seem to contain a kernel"
+fi
+
+if [ -f "${kernlist}" ] ; then
+	mv "${kernlist}" "${kernlist}.old"
+else
+	echo '# These are your installed kernels' >"${kernlist}.old"
+	basename $(dirname $(sysctl -n kern.bootfile)) >>"${kernlist}.old"
+fi
+
+fgrep -xv -e "${kernel}" -e "${kernel}.old" "${kernlist}.old" >"${kernlist}"
+if [ -d "${kodir}.old" -a -f "${kodir}.old/kernel" ] ; then
+	echo "${kernel}.old" >>"${kernlist}"
+fi
+echo "${kernel}" >>"${kernlist}"

--=-=-=--

From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 07:39:52 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3E09A16A4CE for ; Thu, 15 Apr 2004 07:39:52 -0700 (PDT) Received: from arginine.spc.org (arginine.spc.org [195.206.69.236]) by mx1.FreeBSD.org (Postfix) with ESMTP id E72F943D48 for ; Thu, 15 Apr 2004 07:39:51 -0700 (PDT) (envelope-from bms@spc.org) Received: from localhost (localhost [127.0.0.1]) by arginine.spc.org (Postfix) with ESMTP id 2365E652FE; Thu, 15 Apr 2004 15:39:51 +0100 (BST) Received: from arginine.spc.org ([127.0.0.1]) by localhost (arginine.spc.org [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 47205-02-4; Thu, 15 Apr 2004 15:39:50 +0100 (BST) Received: from empiric.dek.spc.org (82-147-17-88.dsl.uk.rapidplay.com [82.147.17.88]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client
certificate requested) by arginine.spc.org (Postfix) with ESMTP id 69405652EC; Thu, 15 Apr 2004 15:39:50 +0100 (BST) Received: by empiric.dek.spc.org (Postfix, from userid 1001) id 5BE9260EE; Thu, 15 Apr 2004 15:39:49 +0100 (BST) Date: Thu, 15 Apr 2004 15:39:49 +0100 From: Bruce M Simpson To: Dag-Erling =?iso-8859-1?Q?Sm=F8rgrav?= Message-ID: <20040415143949.GD53839@empiric.dek.spc.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: cc: arch@freebsd.org Subject: Re: installing multiple kernels X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 14:39:52 -0000

On Thu, Apr 15, 2004 at 04:29:13PM +0200, Dag-Erling Smørgrav wrote:
> The only missing element is to make the loader read /boot/kernels and
> have $kernel default to the last kernel listed there (i.e. the most
> recently installed) instead of "kernel". It would also be nice to
> offer a kernel selection menu for the CLI-impaired. Unfortunately,
> I'm afraid my forth skills aren't quite up to the task. Any takers?

You probably want to look at Gordon Tetlow's nextboot(8) stuff.
BMS

From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 07:51:07 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0D34016A4CE for ; Thu, 15 Apr 2004 07:51:07 -0700 (PDT) Received: from mail.pcnet.com (mail.pcnet.com [204.213.232.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id AC6B043D4C for ; Thu, 15 Apr 2004 07:51:06 -0700 (PDT) (envelope-from eischen@vigrid.com) Received: from mail.pcnet.com (mail.pcnet.com [204.213.232.4]) by mail.pcnet.com (8.12.10/8.12.1) with ESMTP id i3FEp1tf025380; Thu, 15 Apr 2004 10:51:01 -0400 (EDT) Date: Thu, 15 Apr 2004 10:51:01 -0400 (EDT) From: Daniel Eischen X-Sender: eischen@pcnet5.pcnet.com To: =?iso-8859-1?q?Dag-Erling_Sm=F8rgrav?= In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN Content-Transfer-Encoding: QUOTED-PRINTABLE cc: arch@freebsd.org Subject: Re: installing multiple kernels X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 14:51:07 -0000

On Thu, 15 Apr 2004, [iso-8859-1] Dag-Erling Smørgrav wrote:

> Currently, Makefile.inc1 will only install a single kernel. If
> KERNCONF specifies multiple kernel configs, they are all built, but
> only the first one is installed. This makes sense since otherwise the
> last one installed would simply clobber all the other ones.
>
> The attached patch changes that. It modifies kern.pre.mk to install
> each kernel in /boot/ instead of /boot/kernel. It also

I think it would be neat to install multiple KERNCONF kernels to /boot/kernel/ and only install one set of modules.

> modifies Makefile.inc1 to build and install all kernel configs listed
> in KERNCONF.
It also adds a script, sys/conf/regkernel.sh, which
> keeps a list of installed kernels in /boot/kernels, making sure that
> the last one installed is always listed last.
>
> The only missing element is to make the loader read /boot/kernels and
> have $kernel default to the last kernel listed there (i.e. the most
> recently installed) instead of "kernel". It would also be nice to
> offer a kernel selection menu for the CLI-impaired. Unfortunately,
> I'm afraid my forth skills aren't quite up to the task. Any takers?
>
> DES
> --
> Dag-Erling Smørgrav - des@des.no

-- 
Dan Eischen

From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 07:53:31 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 763A916A4CE; Thu, 15 Apr 2004 07:53:31 -0700 (PDT) Received: from localhost (green@localhost [127.0.0.1]) by green.homeunix.org (8.12.11/8.12.11) with ESMTP id i3FErUVY005892; Thu, 15 Apr 2004 10:53:30 -0400 (EDT) (envelope-from green@green.homeunix.org) Message-Id: <200404151453.i3FErUVY005892@green.homeunix.org> X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4 To: dodell@sitetronics.com In-Reply-To: Message from "dodell@sitetronics.com" <99610-220044415124312827@M2W057.mail2web.com> From: "Brian F. Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 15 Apr 2004 10:53:30 -0400 Sender: green@green.homeunix.org cc: jilles@stack.nl cc: freebsd-arch@freebsd.org Subject: Re: [patch] lockf(3) user-exploitable kernel panic X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 14:53:31 -0000

"dodell@sitetronics.com" wrote:
> >> sh has been fixed. I was under the impression that csh used libutil for
> >> this (libutil has been fixed).
I'll take a deeper look into shells in > >> base and in ports and figure out what changes I need to make there. > >> While I'm at it, I don't think it'd be a bad idea to go ahead and build > >> in the RLIMIT_SBSIZE to bash and bash2. > > > >If it is easy, it might be worthwhile to patch the shells to use > >libutil and submit those patches back to the maintainers. > > There are a huge number of shells to do this with. This subsystem > looks like somewhat of a kludge to me in this respect; the > functionality is plainly provided in libutil, while every shell (sh > and tcsh included) have their own implementations. limits(1) > even has statically compiled information about the limits for > every shell it is aware of (including sh, csh, tcsh, bash/bash2 > and a good few others). I'll take a look at these later. Thanks for doing this work, Devon! The most important part is for /etc/login.conf to allow you to configure the maximum limits -- all the shell stuff is really secondary. -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org \ The Power to Serve! \ Opinions expressed are my own. 
\,,,,,,,,,,,,,,,,,,,,,,\ From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 07:55:48 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A3E8516A4CE for ; Thu, 15 Apr 2004 07:55:48 -0700 (PDT) Received: from smtp.des.no (flood.des.no [217.116.83.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5A0B743D60 for ; Thu, 15 Apr 2004 07:55:48 -0700 (PDT) (envelope-from des@des.no) Received: by smtp.des.no (Pony Express, from userid 666) id 8A25D530D; Thu, 15 Apr 2004 16:55:47 +0200 (CEST) Received: from dwp.des.no (des.no [80.203.228.37]) by smtp.des.no (Pony Express) with ESMTP id B9A73530C; Thu, 15 Apr 2004 16:55:26 +0200 (CEST) Received: by dwp.des.no (Postfix, from userid 2602) id A340533C6C; Thu, 15 Apr 2004 16:55:26 +0200 (CEST) To: Daniel Eischen References: From: des@des.no (=?iso-8859-1?q?Dag-Erling_Sm=F8rgrav?=) Date: Thu, 15 Apr 2004 16:55:26 +0200 In-Reply-To: (Daniel Eischen's message of "Thu, 15 Apr 2004 10:51:01 -0400 (EDT)") Message-ID: User-Agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on flood.des.no X-Spam-Level: X-Spam-Status: No, hits=0.0 required=5.0 tests=AWL autolearn=no version=2.63 cc: arch@freebsd.org Subject: Re: installing multiple kernels X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 14:55:48 -0000 Daniel Eischen writes: > I think it would be neat to install multiple KERNCONF kernels > to /boot/kernel/ and only install one set of > modules. 
that is a completely different issue - and it probably won't work since modules are now built (at least to some extent) using the "parent" kernel's option headers. DES --=20 Dag-Erling Sm=F8rgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 11:33:04 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 23DA716A4CE for ; Thu, 15 Apr 2004 11:33:04 -0700 (PDT) Received: from rwcrmhc11.comcast.net (rwcrmhc11.comcast.net [204.127.198.35]) by mx1.FreeBSD.org (Postfix) with ESMTP id EC5F743D4C for ; Thu, 15 Apr 2004 11:33:03 -0700 (PDT) (envelope-from jeh@freebsd.org) Received: from thehousleys.net ([24.34.30.131]) by comcast.net (rwcrmhc11) with ESMTP id <2004041518325701300fhej2e>; Thu, 15 Apr 2004 18:33:03 +0000 Received: from localhost (localhost [127.0.0.1]) by thehousleys.net (8.12.9p2/8.12.9) with ESMTP id i3FIWpoN011954; Thu, 15 Apr 2004 14:32:51 -0400 (EDT) (envelope-from jeh@FreeBSD.org) Received: from thehousleys.net ([127.0.0.1]) by localhost (cat.int.thehousleys.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 11650-06; Thu, 15 Apr 2004 14:32:48 -0400 (EDT) Received: from FreeBSD.org (baby.int.thehousleys.net [192.168.0.100]) (authenticated bits=0) by thehousleys.net (8.12.9p2/8.12.9) with ESMTP id i3FIWK27011946; Thu, 15 Apr 2004 14:32:20 -0400 (EDT) (envelope-from jeh@FreeBSD.org) Message-ID: <407ED533.5020102@FreeBSD.org> Date: Thu, 15 Apr 2004 14:32:19 -0400 From: "James E. 
Housley" Organization: FreeBSD User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6) Gecko/20040208 X-Accept-Language: en-us, en MIME-Version: 1.0 To: =?ISO-8859-1?Q?Dag-Erling_Sm=F8rgrav?= References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: by amavisd-new at thehousleys.net cc: arch@FreeBSD.org Subject: Re: installing multiple kernels X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 18:33:04 -0000 Dag-Erling Smørgrav wrote: > Currently, Makefile.inc1 will only install a single kernel. If > KERNCONF specifies multiple kernel configs, they are all built, but > only the first one is installed. This makes sense since otherwise the > last one installed would simply clobber all the other ones. > > The attached patch changes that. It modifies kern.pre.mk to install > each kernel in /boot/ instead of /boot/kernel. It also > modifies Makefile.inc1 to build and install all kernel configs listed > in KERNCONF. It also adds a script, sys/conf/regkernel.sh, which > keeps a list of installed kernels in /boot/kernels, making sure that > the last one installed is always listed last. > > The only missing element is to make the loader read /boot/kernels and > have $kernel default to the last kernel listed there (i.e. the most > recently installed) instead of "kernel". It would also be nice to > offer a kernel selection menu for the CLI-impaired. Unfortunately, > I'm afraid my forth skills aren't quite up to the task. Any takers? > But isn't changing to use the last kernel instead of the first kernel a violation of POLA? Other then that this sounds great. Jim -- /"\ ASCII Ribbon Campaign . \ / - NO HTML/RTF in e-mail . X - NO Word docs in e-mail . 
/ \ ----------------------------------------------------------------- jeh@FreeBSD.org http://www.FreeBSD.org The Power to Serve jim@TheHousleys.Net http://www.TheHousleys.net --------------------------------------------------------------------- Life begins at 4.0 From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 11:51:39 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 91BA116A4CE for ; Thu, 15 Apr 2004 11:51:39 -0700 (PDT) Received: from rwcrmhc12.comcast.net (rwcrmhc12.comcast.net [216.148.227.85]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5E37743D49 for ; Thu, 15 Apr 2004 11:51:39 -0700 (PDT) (envelope-from julian@elischer.org) Received: from interjet.elischer.org ([24.7.73.28]) by comcast.net (rwcrmhc12) with ESMTP id <20040415185138014001ejque>; Thu, 15 Apr 2004 18:51:38 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id LAA50328; Thu, 15 Apr 2004 11:51:37 -0700 (PDT) Date: Thu, 15 Apr 2004 11:51:35 -0700 (PDT) From: Julian Elischer To: =?iso-8859-1?q?Dag-Erling_Sm=F8rgrav?= In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN Content-Transfer-Encoding: QUOTED-PRINTABLE cc: arch@freebsd.org Subject: Re: installing multiple kernels X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 18:51:39 -0000 On Thu, 15 Apr 2004, [iso-8859-1] Dag-Erling Sm=F8rgrav wrote: > Daniel Eischen writes: > > I think it would be neat to install multiple KERNCONF kernels > > to /boot/kernel/ and only install one set of > > modules. 
>
> that is a completely different issue - and it probably won't work
> since modules are now built (at least to some extent) using the
> "parent" kernel's option headers.

They are?

when did that breakage occur?

>
> DES
> --
> Dag-Erling Smørgrav - des@des.no
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>

From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 12:08:00 2004
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9E66D16A4CE; Thu, 15 Apr 2004 12:08:00 -0700 (PDT)
Received: from smtp.des.no (flood.des.no [217.116.83.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id 802A943D2F; Thu, 15 Apr 2004 12:07:58 -0700 (PDT) (envelope-from des@des.no)
Received: by smtp.des.no (Pony Express, from userid 666) id 42B91530C; Thu, 15 Apr 2004 21:07:57 +0200 (CEST)
Received: from dwp.des.no (des.no [80.203.228.37]) by smtp.des.no (Pony Express) with ESMTP id 81CFF5309; Thu, 15 Apr 2004 21:07:41 +0200 (CEST)
Received: by dwp.des.no (Postfix, from userid 2602) id 2A55633C6C; Thu, 15 Apr 2004 21:07:41 +0200 (CEST)
To: "James E. Housley"
References: <407ED533.5020102@FreeBSD.org>
From: des@des.no (=?iso-8859-1?q?Dag-Erling_Sm=F8rgrav?=)
Date: Thu, 15 Apr 2004 21:07:41 +0200
In-Reply-To: <407ED533.5020102@FreeBSD.org> (James E.
Housley's message of "Thu, 15 Apr 2004 14:32:19 -0400") Message-ID: User-Agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on flood.des.no X-Spam-Level: X-Spam-Status: No, hits=0.0 required=5.0 tests=AWL autolearn=no version=2.63 cc: arch@FreeBSD.org Subject: Re: installing multiple kernels X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 19:08:00 -0000 "James E. Housley" writes: > But isn't changing to use the last kernel instead of the first kernel > a violation of POLA? Other then that this sounds great. Yes, but it's hard to work around. We have to default to the most recent kernel, but Makefile.inc1 currently assumes that the first kernel listed is the one to use, so we somehow have to make sure it is installed last - or have Makefile.inc1 run regkernel.sh to force the issue. 
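
The "most recently installed kernel is listed last" rule discussed above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the patch's regkernel.sh is a shell script whose contents are not shown in this thread, and `struct kernel_list`, `register_kernel()`, `default_kernel()` and the size limits are hypothetical names invented for the sketch. The list lives in memory here rather than in /boot/kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KERNELS 16          /* hypothetical limits, for the sketch only */
#define NAME_LEN    64

/* In-memory stand-in for the /boot/kernels list (assumed layout). */
struct kernel_list {
    char   names[MAX_KERNELS][NAME_LEN];
    size_t count;
};

/*
 * Register a kernel name: drop any earlier entry with the same name,
 * then append it, so the most recently registered kernel is always last.
 * Returns 0 on success, -1 if the name is too long or the list is full.
 */
static int
register_kernel(struct kernel_list *kl, const char *name)
{
    char   copy[NAME_LEN];
    size_t i, j;

    if (strlen(name) >= NAME_LEN)
        return (-1);
    strcpy(copy, name);         /* name may point into kl->names */

    for (i = 0; i < kl->count; i++) {
        if (strcmp(kl->names[i], copy) == 0) {
            /* Close the gap left by the old entry. */
            for (j = i; j + 1 < kl->count; j++)
                memcpy(kl->names[j], kl->names[j + 1], NAME_LEN);
            kl->count--;
            break;
        }
    }
    if (kl->count == MAX_KERNELS)
        return (-1);
    strcpy(kl->names[kl->count++], copy);
    assert(kl->count <= MAX_KERNELS);
    return (0);
}

/* The loader default: last entry, or "kernel" when nothing is registered. */
static const char *
default_kernel(const struct kernel_list *kl)
{
    return (kl->count > 0 ? kl->names[kl->count - 1] : "kernel");
}
```

With this shape, "forcing the issue" amounts to calling `register_kernel()` once more for the first KERNCONF entry after every kernel has been installed, which moves it to the end of the list without duplicating it.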
DES --=20 Dag-Erling Sm=F8rgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 12:11:44 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 69F3D16A4CF for ; Thu, 15 Apr 2004 12:11:44 -0700 (PDT) Received: from smtp.des.no (flood.des.no [217.116.83.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id 14CFE43D48 for ; Thu, 15 Apr 2004 12:11:44 -0700 (PDT) (envelope-from des@des.no) Received: by smtp.des.no (Pony Express, from userid 666) id 02811530C; Thu, 15 Apr 2004 21:11:42 +0200 (CEST) Received: from dwp.des.no (des.no [80.203.228.37]) by smtp.des.no (Pony Express) with ESMTP id A63FE5309; Thu, 15 Apr 2004 21:11:28 +0200 (CEST) Received: by dwp.des.no (Postfix, from userid 2602) id 6030E33C6C; Thu, 15 Apr 2004 21:11:28 +0200 (CEST) To: Julian Elischer References: From: des@des.no (=?iso-8859-1?q?Dag-Erling_Sm=F8rgrav?=) Date: Thu, 15 Apr 2004 21:11:28 +0200 In-Reply-To: (Julian Elischer's message of "Thu, 15 Apr 2004 11:51:35 -0700 (PDT)") Message-ID: User-Agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on flood.des.no X-Spam-Level: X-Spam-Status: No, hits=0.0 required=5.0 tests=AWL autolearn=no version=2.63 cc: arch@freebsd.org Subject: Re: installing multiple kernels X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 19:11:44 -0000 Julian Elischer writes: > On Thu, 15 Apr 2004, [iso-8859-1] Dag-Erling Sm=F8rgrav wrote: > > that is a completely different issue - and it probably won't work > > since modules are now built (at least to some extent) using the > > "parent" kernel's 
option headers. > They are? > > when did that breakage occur? sys/conf/kmod.mk revision 1.145, and possibly others. More recently, this feature caused the twa module breakage - when built with a kernel, it picked up the kernel's opt_ipx.h, but when built standalone (or with world) it didn't have an opt_ipx.h to use. Vinod only tested the common case, so MODULES_WITH_WORLD broke. DES --=20 Dag-Erling Sm=F8rgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 13:36:32 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id A2C7F16A4CE; Thu, 15 Apr 2004 13:36:31 -0700 (PDT) Received: from localhost (green@localhost [127.0.0.1]) by green.homeunix.org (8.12.11/8.12.11) with ESMTP id i3FKaUd5018165; Thu, 15 Apr 2004 16:36:31 -0400 (EDT) (envelope-from green@green.homeunix.org) Message-Id: <200404152036.i3FKaUd5018165@green.homeunix.org> X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4 To: Robert Watson , Seigo Tanimura , Pawel Jakub Dawidek , freebsd-arch@freebsd.org In-Reply-To: Message from John-Mark Gurney of "Fri, 09 Apr 2004 11:01:06 PDT." <20040409180106.GM567@funkthat.com> From: "Brian F. Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 15 Apr 2004 16:36:30 -0400 Sender: green@green.homeunix.org Subject: Re: mtx_lock_recurse/mtx_unlock_recurse functions (proof-of-concept). X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 20:36:32 -0000 John-Mark Gurney wrote: > How does this sound to people? I have some code starting to implement > this, but I haven't gotten very far with it yet... 
You know, now I think Seigo's right on the money that due to the nature of
the recursion of kqueue's current implementation, it's impossible to get
right with this train of thought.  So, let's redesign:

    * The kqueue object the user controls needs a mutex.
    * The lists (selinfo, mostly) that knotes are on need locking.
    * The filterops that kqueue calls out MUST be called with some
      kind of locking on the lists that the kqueue is on, and the
      user MUST be able to grab any kind of lock from inside a
      filterop.
    * At some point the object being observed must call back into
      kqueue to add itself.  We'll end up getting deadlocks if the
      locks kqueue holds are not the ones required to add the
      object to klists and held when we do the f_attach().
    * Anything that calls KNOTE() or KNOTE_ACTIVATE() directly
      will end up recursing back on itself if we don't convert
      it to putting the new event on a work queue.  How
      filt_procattach() calls KNOTE_ACTIVATE() or filt_proc()
      calls kqueue_register() is a very good example.
    * The functions that need to be exported are:
        * KNOTE()/KNOTE_ACTIVATE() <- put on a workqueue
        * kqueue_register() <- put on a workqueue
        * knote klist/(kn_selnext) linking and unlinking
        * knote klist/(kn_selnext) is disappearing
      The last two are the only ones that are not called recursively
      and should have easy locking semantics.

Nothing is currently designed to work with anything even remotely not
looking like spl(), so we have to either flatten it out (using workqueues)
or change semantics so that when KNOTE() is called it acts like the closure
that we pretend it is.  Of course, the easy way to do this is with a worker
queue/condvar/mutex/thread.  What other ways do we have available to turn
KNOTE() into a closure, bearing in mind that the entire point of the
mechanism is that there is no memory allocation at the time of event
generation -- only when events are defined (by the user or recursively by
other events).
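
A minimal user-space sketch of the "flatten it out with a work queue" idea, assuming nothing about the real kern_event.c: `struct note`, `struct workq`, `wq_post()` and `wq_drain()` are hypothetical names, not kernel APIs, and locking is deliberately left out (in a kernel the queue would sit behind a mutex and be drained by a worker thread woken through a condvar). It shows two properties from the list above: posting an event allocates nothing, because the queue link lives inside the note itself, and a handler that raises a further event gets it deferred instead of recursing.

```c
#include <assert.h>
#include <stddef.h>

struct workq;

/* Hypothetical stand-in for a knote; not the kernel's struct knote. */
struct note {
    struct note *next;          /* intrusive link: no allocation on post */
    int          queued;        /* nonzero while on the work queue */
    int          fired;         /* number of times the handler has run */
    struct note *child;         /* optional note this one activates */
    void       (*handler)(struct workq *, struct note *);
};

struct workq {
    struct note  *head;
    struct note **tailp;        /* points at the last next pointer */
    int           depth;        /* handler nesting right now */
    int           max_depth;    /* deepest nesting ever observed */
};

static void
wq_init(struct workq *wq)
{
    wq->head = NULL;
    wq->tailp = &wq->head;
    wq->depth = 0;
    wq->max_depth = 0;
}

/* Queue a note for later processing; a note already queued is left alone. */
static void
wq_post(struct workq *wq, struct note *n)
{
    if (n->queued)
        return;
    n->queued = 1;
    n->next = NULL;
    *wq->tailp = n;
    wq->tailp = &n->next;
}

/* Run queued handlers one at a time; returns how many handlers ran. */
static int
wq_drain(struct workq *wq)
{
    int ran = 0;

    while (wq->head != NULL) {
        struct note *n = wq->head;

        wq->head = n->next;
        if (wq->head == NULL)
            wq->tailp = &wq->head;
        n->queued = 0;
        if (++wq->depth > wq->max_depth)
            wq->max_depth = wq->depth;
        assert(wq->depth == 1); /* handlers never nest */
        n->handler(wq, n);      /* may call wq_post(), never wq_drain() */
        wq->depth--;
        ran++;
    }
    return (ran);
}

/* Handler that only counts its activation. */
static void
note_count(struct workq *wq, struct note *n)
{
    (void)wq;
    n->fired++;
}

/* Handler that activates a child note, loosely like the proc-filter case. */
static void
note_post_child(struct workq *wq, struct note *n)
{
    n->fired++;
    if (n->child != NULL)
        wq_post(wq, n->child);
}
```

Posting a parent note whose handler is `note_post_child` and then calling `wq_drain()` runs the parent and the child in sequence, with `max_depth` staying at 1. The cost of this design is latency: an event is seen only when the queue is drained, which is the concern for short-lived one-shot events.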
--
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
      Opinions expressed are my own.                   \,,,,,,,,,,,,,,,,,,,,,,\

From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 13:52:14 2004
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 96DCC16A4CE; Thu, 15 Apr 2004 13:52:14 -0700 (PDT)
Received: from huva.hittite.isp.9tel.net (huva.hittite.isp.9tel.net [62.62.156.28]) by mx1.FreeBSD.org (Postfix) with ESMTP id BB5C243D1F; Thu, 15 Apr 2004 13:52:13 -0700 (PDT) (envelope-from cyrille.lefevre@laposte.net)
Received: from pc2k (233-60-118-80.kaptech.net [80.118.60.233]) by huva.hittite.isp.9tel.net (Postfix) with SMTP id A5D6A9BE37; Thu, 15 Apr 2004 22:53:13 +0200 (CEST)
Message-ID: <00f401c4232b$8700d0c0$7890a8c0@dyndns.org>
From: "Cyrille Lefevre"
To:
Date: Thu, 15 Apr 2004 22:52:12 +0200
Organization: ACME
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1409
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409
cc: "current @FreeBSD.org"
Subject: bin/41071: make NO to NO_ transition patch
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 15 Apr 2004 20:52:14 -0000

Is anyone interested in validating and committing this PR?

It provides a patch set to change all NOFOO variables to NO_FOO.
See the thread "Standardized make options (or no doesn't always mean no)",
restarted on 24 July 2002 in -arch, for details and comments.

It depended on PR #41070, related to make enhancements and bug fixes,
which was partly committed some days ago (.warning keyword).
I have been using this patch for 2 years now without any problems, except
when upgrading perl from ports, which should be fixed (no more NOPERL in
/etc/make.conf).

impacted files: around 285
    Makefiles
    make.conf(5)
    share/examples/etc/make.conf
    share/mk/*.mk (well, only those w/ NOXXX vars :)

new file:
    share/mk/bsd.var.mk

sample outputs:

$ make -v /usr/share/mk/bsd.var.mk NOPERL=
"/usr/share/mk/bsd.var.mk", line 146: warning: NO_PERL should be defined in
place of NOPERL -- using NO_PERL with the value of NOPERL and unsetting NOPERL.

$ make -v /usr/share/mk/bsd.var.mk NOPERL= NO_PERL=
"/usr/share/mk/bsd.var.mk", line 142: warning: both NO_PERL and NOPERL are
defined with the same value -- using NO_PERL and unsetting NOPERL.

$ make -v /usr/share/mk/bsd.var.mk NOPERL=yes NO_PERL=no
"/usr/share/mk/bsd.var.mk", line 139: warning: both NO_PERL and NOPERL are
defined with a different value -- using NO_PERL and unsetting NOPERL.

$ make -v /usr/share/mk/bsd.var.mk NONO_PERL=
"/usr/share/mk/bsd.var.mk", line 132: warning: NONO_PERL is defined --
unsetting NOPERL, NO_PERL and NONO_PERL.

PS: at the moment, NONOPERL isn't handled, but it could easily be added.

Cyrille Lefevre.
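
The four warnings above amount to a small decision table. The sketch below restates that table in C purely as an illustration; the patch itself implements it in make syntax in share/mk/bsd.var.mk, which is not shown here. `reconcile_no()`, `enum no_outcome` and the argument convention (NULL means undefined, "" means defined but empty, as in `NOPERL=`) are hypothetical. Checking NONO_FOO first is an assumption, suggested only by the wording of the last warning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Possible results of reconciling NOFOO, NO_FOO and NONO_FOO. */
enum no_outcome {
    NO_UNDEFINED,       /* neither spelling is defined */
    NO_NEW_ONLY,        /* only NO_FOO is defined: nothing to warn about */
    NO_FROM_OLD,        /* only NOFOO is defined: NO_FOO takes its value */
    NO_BOTH_SAME,       /* both defined with the same value */
    NO_BOTH_DIFFER,     /* both defined with different values: NO_FOO wins */
    NO_CANCELLED        /* NONO_FOO is defined: everything is unset */
};

/*
 * old_val is the value of NOFOO and new_val the value of NO_FOO; NULL
 * means undefined.  On return *effective is the value NO_FOO ends up
 * with, or NULL if it ends up undefined.
 */
static enum no_outcome
reconcile_no(const char *old_val, const char *new_val, int nono_defined,
    const char **effective)
{
    assert(effective != NULL);

    if (nono_defined) {                 /* assumed to take precedence */
        *effective = NULL;
        return (NO_CANCELLED);
    }
    if (old_val != NULL && new_val != NULL) {
        *effective = new_val;           /* the new spelling always wins */
        return (strcmp(old_val, new_val) == 0 ?
            NO_BOTH_SAME : NO_BOTH_DIFFER);
    }
    if (old_val != NULL) {
        *effective = old_val;           /* carry the old value over */
        return (NO_FROM_OLD);
    }
    *effective = new_val;               /* may be NULL */
    return (new_val != NULL ? NO_NEW_ONLY : NO_UNDEFINED);
}
```

For example, `reconcile_no("yes", "no", 0, &v)` corresponds to the third sample: the values differ, so `v` ends up as "no".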
-- home: mailto:cyrille.lefevre@laposte.net From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 15:39:23 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0481916A4D0 for ; Thu, 15 Apr 2004 15:39:23 -0700 (PDT) Received: from mail4.speakeasy.net (mail4.speakeasy.net [216.254.0.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id A001D43D62 for ; Thu, 15 Apr 2004 15:39:22 -0700 (PDT) (envelope-from jmg@hydrogen.funkthat.com) Received: (qmail 26844 invoked from network); 15 Apr 2004 22:39:21 -0000 Received: from dsl017-045-168.spk4.dsl.speakeasy.net (HELO hydrogen.funkthat.com) ([69.17.45.168]) (envelope-sender ) by mail4.speakeasy.net (qmail-ldap-1.03) with SMTP for ; 15 Apr 2004 22:39:21 -0000 Received: from hydrogen.funkthat.com (qpszcd@localhost.funkthat.com [127.0.0.1])i3FMdKOE031416; Thu, 15 Apr 2004 15:39:21 -0700 (PDT) (envelope-from jmg@hydrogen.funkthat.com) Received: (from jmg@localhost) by hydrogen.funkthat.com (8.12.10/8.12.10/Submit) id i3FMdKVt031415; Thu, 15 Apr 2004 15:39:20 -0700 (PDT) Date: Thu, 15 Apr 2004 15:39:20 -0700 From: John-Mark Gurney To: "Brian F. Feldman" Message-ID: <20040415223920.GT567@funkthat.com> Mail-Followup-To: "Brian F. 
Feldman" , Robert Watson , Seigo Tanimura , Pawel Jakub Dawidek , freebsd-arch@freebsd.org References: <20040409180106.GM567@funkthat.com> <200404152036.i3FKaUd5018165@green.homeunix.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200404152036.i3FKaUd5018165@green.homeunix.org> User-Agent: Mutt/1.4.1i X-Operating-System: FreeBSD 4.2-RELEASE i386 X-PGP-Fingerprint: B7 EC EF F8 AE ED A7 31 96 7A 22 B3 D8 56 36 F4 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html cc: Pawel Jakub Dawidek cc: Seigo Tanimura cc: Robert Watson cc: freebsd-arch@freebsd.org Subject: locking down kqueue (was some other completely unrelated topic) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: John-Mark Gurney List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2004 22:39:23 -0000 Brian F. Feldman wrote this message on Thu, Apr 15, 2004 at 16:36 -0400: > John-Mark Gurney wrote: > > How does this sound to people? I have some code starting to implement > > this, but I haven't gotten very far with it yet... > > You know, now I think Seigo's right on the money that due to the nature of > the recursion of kqueue's current implementation, it's impossible to get > right with this train of thought. So, let's redesign: > * The kqueue object the user controls needs a mutex. > * The lists (selinfo, mostly) that knotes are on need locking. > * The filterops that kqueue calls out MUST be called with some > kind of locking on the lists that the kqueue is on, and the > user MUST be able to grab any kind of lock from inside a > filterop. You need to be more specific on this.. which filterops should be allowed to grab any locks? 
In my opinion, filterops should only be allowed to grab a limited
number of locks, and these are the object lock and the kqueue (list) locks
as necessary...

>     * At some point the object being observed must call back into
>       kqueue to add itself.  We'll end up getting deadlocks if the
>       locks kqueue holds are not the ones required to add the
>       object to klists and held when we do the f_attach().

Yes, we need to prevent lock order inversion..

>     * Anything that calls KNOTE() or KNOTE_ACTIVATE() directly
>       will end up recursing back on itself if we don't convert
>       it to putting the new event on a work queue.  How
>       filt_procattach() calls KNOTE_ACTIVATE() or filt_proc()
>       calls kqueue_register() is a very good example.

Hmmm.  I'm going to have to mull on the filt_proc problem a bit..

>     * The functions that need to be exported are:
>         * KNOTE()/KNOTE_ACTIVATE() <- put on a workqueue
>         * kqueue_register() <- put on a workqueue
>         * knote klist/(kn_selnext) linking and unlinking
>         * knote klist/(kn_selnext) is disappearing
>       The last two are the only ones that are not called recursively
>       and should have easy locking semantics.

Personally, I'd prefer to invert the logic, and have linking/unlinking
and disappearing done via a work queue, and KNOTE/kqueue_register done
in line if possible...  It's also difficult because we need to optimize
for both cases: long-existing events, and ONE_SHOT events where we
are constantly adding/removing events...

The reason I say this is because I have a vision for a webserver that uses
multiple processors but a single kqueue to handle events, and if we use
ONE_SHOT, then we are guaranteed that we are notified of each event once...
We need to make sure that the work queue will not be a significant problem...
> Nothing is currently designed to work with anything even remotely not > looking like spl(), so we have to either flatten it out (using workqueues) > or change semantics so that when KNOTE() is called it acts like the closure > that we pretend it is. Of course, the easy way to do this is with a worker > queue/condvar/mutex/thread. What other ways do we have available to turn > KNOTE() into a closure, bearing in mind that the entire point of the > mechanism is that there is no memory allocation at the time of event > generation -- only when events are defined (by the user or recursively by > other events). The proc case should be treated special as it is an [ab]use of the kevent system with following children... I'm looking at it more.. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 17:38:59 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 91F8216A4CF; Thu, 15 Apr 2004 17:38:59 -0700 (PDT) Received: from localhost (green@localhost [127.0.0.1]) by green.homeunix.org (8.12.11/8.12.11) with ESMTP id i3G0cwni020043; Thu, 15 Apr 2004 20:38:59 -0400 (EDT) (envelope-from green@green.homeunix.org) Message-Id: <200404160038.i3G0cwni020043@green.homeunix.org> X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4 To: Robert Watson , Seigo Tanimura , Pawel Jakub Dawidek , freebsd-arch@freebsd.org In-Reply-To: Message from John-Mark Gurney of "Thu, 15 Apr 2004 15:39:20 PDT." <20040415223920.GT567@funkthat.com> From: "Brian F. 
Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 15 Apr 2004 20:38:58 -0400 Sender: green@green.homeunix.org Subject: Re: locking down kqueue (was some other completely unrelated topic) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Apr 2004 00:39:00 -0000 John-Mark Gurney wrote: > Brian F. Feldman wrote this message on Thu, Apr 15, 2004 at 16:36 -0400: > > Nothing is currently designed to work with anything even remotely not > > looking like spl(), so we have to either flatten it out (using workqueues) > > or change semantics so that when KNOTE() is called it acts like the closure > > that we pretend it is. Of course, the easy way to do this is with a worker > > queue/condvar/mutex/thread. What other ways do we have available to turn > > KNOTE() into a closure, bearing in mind that the entire point of the > > mechanism is that there is no memory allocation at the time of event > > generation -- only when events are defined (by the user or recursively by > > other events). > > The proc case should be treated special as it is an [ab]use of the kevent > system with following children... I'm looking at it more.. It's stupid, hopeless, and an abomination. I'm removing it. The only way it can possibly work to report failure to the "parent" knote at the same time is if KNOTE() and kqueue_register() have the exact same locks -- not gonna happen if there are ANY non-global kqueue locks at all! 
I'm going to replace it with what it SHOULD have been: implemented at the
end of kevent(), after kqueue_scan(), as a shortcut that doesn't leave the
kernel but also doesn't need the horrible, evil, SINGLE special case
that forces me to try something like this to accommodate it:

    if (kn->kn_fop->f_event(kn, hint) ||
        kn->kn_kq->kq_state & KQ_UPCALL) {
            int enqueued = 0, upcall;

            upcall = kn->kn_kq->kq_state & KQ_UPCALL;
            KNOTE_ACTIVATE(kn, enqueued);
            if (upcall || enqueued) {
                    struct kevent upkev;
                    long uperrnote;

                    if (upcall) {
                            upkev = kn->kn_kq->kq_upcall;
                            uperrnote = kn->kn_kq->kq_uperror;
                            kn->kn_kq->kq_state &= ~KQ_UPCALL;
                    }
                    mtx_unlock(&knote_mtx);
                    mtx_unlock(&kn->kn_kq->kq_mtx);
                    mtx_unlock(&klist_mtx);
                    if (upcall) {
                            if (kqueue_register(kn->kn_kq, &upkev,
                                curthread) != 0) {
                            }
                    }

--
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
      Opinions expressed are my own.                   \,,,,,,,,,,,,,,,,,,,,,,\

From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 19:08:38 2004
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E5F5716A4CE; Thu, 15 Apr 2004 19:08:38 -0700 (PDT)
Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0B4E743D53; Thu, 15 Apr 2004 19:08:38 -0700 (PDT) (envelope-from bde@zeta.org.au)
Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.0.86])i3G28b4u010913; Fri, 16 Apr 2004 12:08:37 +1000
Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246]) i3G28YI2026812; Fri, 16 Apr 2004 12:08:36 +1000
Date: Fri, 16 Apr 2004 12:08:33 +1000 (EST)
From: Bruce Evans
X-X-Sender: bde@gamplex.bde.org
To: Cyrille Lefevre
In-Reply-To: <00f401c4232b$8700d0c0$7890a8c0@dyndns.org>
Message-ID: <20040416115549.P11609@gamplex.bde.org>
References: <00f401c4232b$8700d0c0$7890a8c0@dyndns.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc:
arch@FreeBSD.org cc: "current @FreeBSD.org" Subject: Re: bin/41071: make NO to NO_ transition patch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Apr 2004 02:08:39 -0000 On Thu, 15 Apr 2004, Cyrille Lefevre wrote: > anyone interrested to validate and commit this PR ? I hope not. > it provide a patch set to change all NOFOO variables to NO_FOO. This goes in a direction that I disagree with, and it even changes all of the old mostly-internal variables like NOMAN. About half of the 285+ files touched by it are to change the correct spelling of NOMAN in scattered Makefiles. Bruce From owner-freebsd-arch@FreeBSD.ORG Thu Apr 15 20:40:31 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 43F7A16A4CE; Thu, 15 Apr 2004 20:40:30 -0700 (PDT) Received: from localhost (green@localhost [127.0.0.1]) by green.homeunix.org (8.12.11/8.12.11) with ESMTP id i3G3eTKi004092; Thu, 15 Apr 2004 23:40:29 -0400 (EDT) (envelope-from green@green.homeunix.org) Message-Id: <200404160340.i3G3eTKi004092@green.homeunix.org> X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4 To: John-Mark Gurney , Robert Watson , Seigo Tanimura , Pawel Jakub Dawidek , freebsd-arch@freebsd.org In-Reply-To: Message from John-Mark Gurney of "Thu, 15 Apr 2004 15:39:20 PDT." <20040415223920.GT567@funkthat.com> From: "Brian F. 
Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 15 Apr 2004 23:40:28 -0400 Sender: green@green.homeunix.org Subject: Re: locking down kqueue (was some other completely unrelated topic) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Apr 2004 03:40:31 -0000 BTW, I'll enclose the current (broken) implementation. It crashes during boot which I really can't track down due to not having a serial console but other than that, the only KNOWN issues with the design should be the fact that many places KNOTE() is called, it is still called with locks held. Index: cam/scsi/scsi_target.c =================================================================== RCS file: /usr/ncvs/src/sys/cam/scsi/scsi_target.c,v retrieving revision 1.60 diff -u -r1.60 scsi_target.c --- cam/scsi/scsi_target.c 21 Feb 2004 21:10:39 -0000 1.60 +++ cam/scsi/scsi_target.c 15 Apr 2004 18:32:10 -0000 @@ -337,7 +337,7 @@ kn->kn_hook = (caddr_t)softc; kn->kn_fop = &targread_filtops; TARG_LOCK(softc); - SLIST_INSERT_HEAD(&softc->read_select.si_note, kn, kn_selnext); + klist_add(&softc->read_select.si_note, kn); TARG_UNLOCK(softc); return (0); } @@ -349,7 +349,7 @@ softc = (struct targ_softc *)kn->kn_hook; TARG_LOCK(softc); - SLIST_REMOVE(&softc->read_select.si_note, kn, knote, kn_selnext); + klist_remove(&softc->read_select.si_note, kn); TARG_UNLOCK(softc); } Index: fs/fifofs/fifo_vnops.c =================================================================== RCS file: /usr/ncvs/src/sys/fs/fifofs/fifo_vnops.c,v retrieving revision 1.92 diff -u -r1.92 fifo_vnops.c --- fs/fifofs/fifo_vnops.c 31 Mar 2004 01:41:29 -0000 1.92 +++ fs/fifofs/fifo_vnops.c 15 Apr 2004 18:32:30 -0000 @@ -407,6 +407,8 @@ return (0); } +/* XXX None of the kqueue functions do their own klist/socket locking. 
*/ + /* ARGSUSED */ static int fifo_kqfilter(ap) @@ -436,7 +438,7 @@ ap->a_kn->kn_hook = (caddr_t)so; - SLIST_INSERT_HEAD(&sb->sb_sel.si_note, ap->a_kn, kn_selnext); + klist_add(&sb->sb_sel.si_note, ap->a_kn); sb->sb_flags |= SB_KNOTE; return (0); @@ -447,7 +449,7 @@ { struct socket *so = (struct socket *)kn->kn_hook; - SLIST_REMOVE(&so->so_rcv.sb_sel.si_note, kn, knote, kn_selnext); + klist_remove(&so->so_rcv.sb_sel.si_note, kn); if (SLIST_EMPTY(&so->so_rcv.sb_sel.si_note)) so->so_rcv.sb_flags &= ~SB_KNOTE; } @@ -471,7 +473,7 @@ { struct socket *so = (struct socket *)kn->kn_hook; - SLIST_REMOVE(&so->so_snd.sb_sel.si_note, kn, knote, kn_selnext); + klist_remove(&so->so_snd.sb_sel.si_note, kn); if (SLIST_EMPTY(&so->so_snd.sb_sel.si_note)) so->so_snd.sb_flags &= ~SB_KNOTE; } Index: gnu/ext2fs/ext2_vnops.c =================================================================== RCS file: /usr/ncvs/src/sys/gnu/ext2fs/ext2_vnops.c,v retrieving revision 1.82 diff -u -r1.82 ext2_vnops.c --- gnu/ext2fs/ext2_vnops.c 11 Mar 2004 16:33:10 -0000 1.82 +++ gnu/ext2fs/ext2_vnops.c 15 Apr 2004 18:31:10 -0000 @@ -1899,7 +1899,7 @@ if (vp->v_pollinfo == NULL) v_addpollinfo(vp); mtx_lock(&vp->v_pollinfo->vpi_lock); - SLIST_INSERT_HEAD(&vp->v_pollinfo->vpi_selinfo.si_note, kn, kn_selnext); + klist_add(&vp->v_pollinfo->vpi_selinfo.si_note, kn); mtx_unlock(&vp->v_pollinfo->vpi_lock); return (0); @@ -1912,8 +1912,7 @@ KASSERT(vp->v_pollinfo != NULL, ("Mising v_pollinfo")); mtx_lock(&vp->v_pollinfo->vpi_lock); - SLIST_REMOVE(&vp->v_pollinfo->vpi_selinfo.si_note, - kn, knote, kn_selnext); + klist_remove(&vp->v_pollinfo->vpi_selinfo.si_note, kn); mtx_unlock(&vp->v_pollinfo->vpi_lock); } Index: kern/kern_event.c =================================================================== RCS file: /usr/ncvs/src/sys/kern/kern_event.c,v retrieving revision 1.67 diff -u -r1.67 kern_event.c --- kern/kern_event.c 20 Feb 2004 04:00:48 -0000 1.67 +++ kern/kern_event.c 16 Apr 2004 03:06:01 -0000 @@ -58,7 +58,6 @@ 
static int kqueue_scan(struct file *fp, int maxevents, struct kevent *ulistp, const struct timespec *timeout, struct thread *td); -static void kqueue_wakeup(struct kqueue *kq); static fo_rdwr_t kqueue_read; static fo_rdwr_t kqueue_write; @@ -78,10 +77,12 @@ .fo_close = kqueue_close, }; -static void knote_attach(struct knote *kn, struct filedesc *fdp); -static void knote_drop(struct knote *kn, struct thread *td); -static void knote_enqueue(struct knote *kn); -static void knote_dequeue(struct knote *kn); +static void knote_attach(struct kqueue *kq, struct knote *kn, + struct filedesc *fdp); +static void knote_drop(struct kqueue *kq, struct knote *kn, + struct thread *td); +static void knote_enqueue(struct kqueue *kq, struct knote *kn); +static void knote_dequeue(struct kqueue *kq, struct knote *kn); static void knote_init(void); static struct knote *knote_alloc(void); static void knote_free(struct knote *kn); @@ -107,20 +108,47 @@ { 0, filt_timerattach, filt_timerdetach, filt_timer }; static uma_zone_t knote_zone; +static struct mtx klist_mtx; +static struct mtx knote_mtx; static int kq_ncallouts = 0; static int kq_calloutmax = (4 * 1024); SYSCTL_INT(_kern, OID_AUTO, kq_calloutmax, CTLFLAG_RW, &kq_calloutmax, 0, "Maximum number of callouts allocated for kqueue"); -#define KNOTE_ACTIVATE(kn) do { \ +#define KNOTE_ACTIVATE(kn, enqueued) do { \ kn->kn_status |= KN_ACTIVE; \ - if ((kn->kn_status & (KN_QUEUED | KN_DISABLED)) == 0) \ - knote_enqueue(kn); \ + if ((kn->kn_status & (KN_QUEUED | KN_DISABLED)) == 0) { \ + knote_enqueue(kn->kn_kq, kn); \ + enqueued = 1; \ + } \ } while(0) #define KN_HASHSIZE 64 /* XXX should be tunable */ #define KN_HASH(val, mask) (((val) ^ (val >> 8)) & (mask)) +void +klist_add(struct klist *list, struct knote *note) +{ + mtx_assert(&klist_mtx, MA_OWNED); + SLIST_INSERT_HEAD(list, note, kn_selnext); +} + +void +klist_remove(struct klist *list, struct knote *note) +{ + mtx_assert(&klist_mtx, MA_OWNED); + SLIST_REMOVE(list, note, knote, 
kn_selnext); +} + +void +klist_disappearing(struct klist *list) +{ + mtx_lock(&klist_mtx); + while (SLIST_FIRST(list)) + SLIST_REMOVE_HEAD(list, kn_selnext); + mtx_unlock(&klist_mtx); +} + static int filt_nullattach(struct knote *kn) { @@ -164,7 +192,7 @@ return (1); kn->kn_fop = &kqread_filtops; - SLIST_INSERT_HEAD(&kq->kq_sel.si_note, kn, kn_selnext); + klist_add(&kq->kq_sel.si_note, kn); return (0); } @@ -173,7 +201,7 @@ { struct kqueue *kq = kn->kn_fp->f_data; - SLIST_REMOVE(&kq->kq_sel.si_note, kn, knote, kn_selnext); + klist_remove(&kq->kq_sel.si_note, kn); } /*ARGSUSED*/ @@ -205,6 +233,8 @@ PROC_UNLOCK(p); return (error); } + PHOLD(p); + PROC_UNLOCK(p); kn->kn_ptr.p_proc = p; kn->kn_flags |= EV_CLEAR; /* automatically set */ @@ -219,17 +249,18 @@ } if (immediate == 0) - SLIST_INSERT_HEAD(&p->p_klist, kn, kn_selnext); + klist_add(&p->p_klist, kn); + + PRELE(p); /* * Immediately activate any exit notes if the target process is a * zombie. This is necessary to handle the case where the target * process, e.g. a child, dies before the kevent is registered. + * The side-effect of filt_proc() here is KNOTE_ACTIVATE(). */ - if (immediate && filt_proc(kn, NOTE_EXIT)) - KNOTE_ACTIVATE(kn); - - PROC_UNLOCK(p); + if (immediate) + (void)filt_proc(kn, NOTE_EXIT); return (0); } @@ -246,13 +277,16 @@ filt_procdetach(struct knote *kn) { struct proc *p = kn->kn_ptr.p_proc; + struct knote *ckn; + while ((ckn = SLIST_FIRST(&kn_forklist(kn))) != NULL) { + SLIST_REMOVE_HEAD(&kn_forklist(kn), kn_link); + knote_free(ckn); + } if (kn->kn_status & KN_DETACHED) return; - PROC_LOCK(p); - SLIST_REMOVE(&p->p_klist, kn, knote, kn_selnext); - PROC_UNLOCK(p); + klist_remove(&p->p_klist, kn); } static int @@ -285,24 +319,28 @@ * so attach a new knote to it, and immediately report an * event with the parent's pid. 
*/ - if ((event == NOTE_FORK) && (kn->kn_sfflags & NOTE_TRACK)) { - struct kevent kev; - int error; + if (event == NOTE_FORK && kn->kn_sfflags & NOTE_TRACK) { + struct knote *ckn; - /* - * register knote with new process. - */ - kev.ident = hint & NOTE_PDATAMASK; /* pid */ - kev.filter = kn->kn_filter; - kev.flags = kn->kn_flags | EV_ADD | EV_ENABLE | EV_FLAG1; - kev.fflags = kn->kn_sfflags; - kev.data = kn->kn_id; /* parent */ - kev.udata = kn->kn_kevent.udata; /* preserve udata */ - error = kqueue_register(kn->kn_kq, &kev, NULL); - if (error) + /* Lazy-attach this new knote. */ + ckn = knote_alloc(); + if (ckn == NULL) { kn->kn_fflags |= NOTE_TRACKERR; + goto out; + } + ckn->kn_kq = kn->kn_kq; + ckn->kn_fop = kn->kn_fop; + ckn->kn_sfflags = kn->kn_sfflags; + ckn->kn_sdata = kn->kn_id; /* parent */ + ckn->kn_id = hint & NOTE_PDATAMASK; /* pid */ + ckn->kn_filter = kn->kn_filter; + ckn->kn_flags = kn->kn_flags | EV_ADD | EV_ENABLE | EV_FLAG1; + ckn->kn_kevent.udata = /* preserve udata */ + kn->kn_kevent.udata; + SLIST_INSERT_HEAD(&kn_forklist(kn), ckn, kn_link); } +out: return (kn->kn_fflags != 0); } @@ -312,10 +350,13 @@ struct knote *kn = knx; struct callout *calloutp; struct timeval tv; - int tticks; + int tticks, enqueued = 0; + mtx_lock(&klist_mtx); + mtx_lock(&kn->kn_kq->kq_mtx); + mtx_lock(&knote_mtx); kn->kn_data++; - KNOTE_ACTIVATE(kn); + KNOTE_ACTIVATE(kn, enqueued); if ((kn->kn_flags & EV_ONESHOT) == 0) { tv.tv_sec = kn->kn_sdata / 1000; @@ -324,6 +365,11 @@ calloutp = (struct callout *)kn->kn_hook; callout_reset(calloutp, tticks, filt_timerexpire, kn); } + mtx_unlock(&knote_mtx); + mtx_unlock(&kn->kn_kq->kq_mtx); + mtx_unlock(&klist_mtx); + if (enqueued) + KNOTE(&kn->kn_kq->kq_sel.si_note, 0); } /* @@ -338,7 +384,7 @@ if (kq_ncallouts >= kq_calloutmax) return (ENOMEM); - kq_ncallouts++; + kq_ncallouts++; /* protected by klist_mtx */ tv.tv_sec = kn->kn_sdata / 1000; tv.tv_usec = (kn->kn_sdata % 1000) * 1000; @@ -362,7 +408,7 @@ calloutp = (struct callout 
*)kn->kn_hook; callout_stop(calloutp); FREE(calloutp, M_KQUEUE); - kq_ncallouts--; + kq_ncallouts--; /* protected by klist_mtx */ } static int @@ -383,7 +429,6 @@ struct file *fp; int fd, error; - mtx_lock(&Giant); fdp = td->td_proc->p_fd; error = falloc(td, &fp, &fd); if (error) @@ -391,6 +436,8 @@ /* An extra reference on `nfp' has been held for us by falloc(). */ kq = malloc(sizeof(struct kqueue), M_KQUEUE, M_WAITOK | M_ZERO); TAILQ_INIT(&kq->kq_head); + mtx_init(&kq->kq_mtx, "kqueue mutex", NULL, MTX_DEF); + kq->kq_fdp = fdp; FILE_LOCK(fp); fp->f_flag = FREAD | FWRITE; fp->f_type = DTYPE_KQUEUE; @@ -403,9 +450,7 @@ if (fdp->fd_knlistsize < 0) fdp->fd_knlistsize = 0; /* this process has a kq */ FILEDESC_UNLOCK(fdp); - kq->kq_fdp = fdp; done2: - mtx_unlock(&Giant); return (error); } @@ -425,7 +470,7 @@ int kevent(struct thread *td, struct kevent_args *uap) { - struct kevent *kevp; + struct kevent kqkev[KQ_NEVENTS], *kevp; struct kqueue *kq; struct file *fp; struct timespec ts; @@ -440,22 +485,21 @@ if (uap->timeout != NULL) { error = copyin(uap->timeout, &ts, sizeof(ts)); if (error) - goto done_nogiant; + goto done; uap->timeout = &ts; } - mtx_lock(&Giant); kq = fp->f_data; nerrors = 0; while (uap->nchanges > 0) { n = uap->nchanges > KQ_NEVENTS ? 
KQ_NEVENTS : uap->nchanges; - error = copyin(uap->changelist, kq->kq_kev, + error = copyin(uap->changelist, kqkev, n * sizeof(struct kevent)); if (error) goto done; for (i = 0; i < n; i++) { - kevp = &kq->kq_kev[i]; + kevp = &kqkev[i]; kevp->flags &= ~EV_SYSFLAGS; error = kqueue_register(kq, kevp, td); if (error) { @@ -484,8 +528,6 @@ error = kqueue_scan(fp, uap->nevents, uap->eventlist, uap->timeout, td); done: - mtx_unlock(&Giant); -done_nogiant: if (fp != NULL) fdrop(fp, td); return (error); @@ -528,7 +570,7 @@ struct filterops *fops; struct file *fp = NULL; struct knote *kn = NULL; - int s, error = 0; + int error, enqueued; if (kev->filter < 0) { if (kev->filter + EVFILT_SYSCOUNT < 0) @@ -544,6 +586,12 @@ return (EINVAL); } + enqueued = 0; +top: + error = 0; + mtx_lock(&klist_mtx); + mtx_lock(&kq->kq_mtx); + mtx_lock(&knote_mtx); FILEDESC_LOCK(fdp); if (fops->f_isfd) { /* validate descriptor */ @@ -575,7 +623,18 @@ } FILEDESC_UNLOCK(fdp); - if (kn == NULL && ((kev->flags & EV_ADD) == 0)) { + /* + * We came from below: EV_ADD enqueued a knote immediately. 
+ */ + if (enqueued) { + enqueued = 0; + if (kn == NULL) { + error = ESRCH; + goto done; + } + goto onceagain; + } + if (kn == NULL && (kev->flags & EV_ADD) == 0) { error = ENOENT; goto done; } @@ -607,9 +666,9 @@ kev->data = 0; kn->kn_kevent = *kev; - knote_attach(kn, fdp); + knote_attach(kq, kn, fdp); if ((error = fops->f_attach(kn)) != 0) { - knote_drop(kn, td); + knote_drop(kq, kn, td); goto done; } } else { @@ -623,36 +682,46 @@ kn->kn_kevent.udata = kev->udata; } - s = splhigh(); - if (kn->kn_fop->f_event(kn, 0)) - KNOTE_ACTIVATE(kn); - splx(s); + if (kn->kn_fop->f_event(kn, 0)) { + KNOTE_ACTIVATE(kn, enqueued); + if (enqueued) { + if (fp != NULL) + fdrop(fp, td); + mtx_unlock(&knote_mtx); + mtx_unlock(&kq->kq_mtx); + mtx_unlock(&klist_mtx); + KNOTE(&kq->kq_sel.si_note, 0); + goto top; + } + } } else if (kev->flags & EV_DELETE) { kn->kn_fop->f_detach(kn); - knote_drop(kn, td); + knote_drop(kq, kn, td); goto done; } +onceagain: if ((kev->flags & EV_DISABLE) && - ((kn->kn_status & KN_DISABLED) == 0)) { - s = splhigh(); + ((kn->kn_status & KN_DISABLED) == 0)) kn->kn_status |= KN_DISABLED; - splx(s); - } - if ((kev->flags & EV_ENABLE) && (kn->kn_status & KN_DISABLED)) { - s = splhigh(); kn->kn_status &= ~KN_DISABLED; if ((kn->kn_status & KN_ACTIVE) && - ((kn->kn_status & KN_QUEUED) == 0)) - knote_enqueue(kn); - splx(s); + ((kn->kn_status & KN_QUEUED) == 0)) { + knote_enqueue(kn->kn_kq, kn); + enqueued = 1; + } } done: if (fp != NULL) fdrop(fp, td); + mtx_unlock(&knote_mtx); + mtx_unlock(&kq->kq_mtx); + mtx_unlock(&klist_mtx); + if (enqueued) + KNOTE(&kq->kq_sel.si_note, 0); return (error); } @@ -661,10 +730,11 @@ const struct timespec *tsp, struct thread *td) { struct kqueue *kq; - struct kevent *kevp; + struct kevent kqkev[KQ_NEVENTS], *kevp; struct timeval atv, rtv, ttv; - struct knote *kn, marker; - int s, count, timeout, nkev = 0, error = 0; + struct knote *kn, *ckn; + u_int gen; + int count, timeout, nkev = 0, error = 0; FILE_LOCK_ASSERT(fp, MA_NOTOWNED); @@ 
-705,16 +775,17 @@ } start: - kevp = kq->kq_kev; - s = splhigh(); + kevp = kqkev; + mtx_lock(&kq->kq_mtx); if (kq->kq_count == 0) { if (timeout < 0) { error = EWOULDBLOCK; + mtx_unlock(&kq->kq_mtx); } else { kq->kq_state |= KQ_SLEEP; - error = tsleep(kq, PSOCK | PCATCH, "kqread", timeout); + error = msleep(kq, &kq->kq_mtx, PSOCK | PCATCH | PDROP, + "kqread", timeout); } - splx(s); if (error == 0) goto retry; /* don't restart after signals... */ @@ -725,63 +796,129 @@ goto done; } - TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe); - while (count) { - kn = TAILQ_FIRST(&kq->kq_head); - TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); - if (kn == &marker) { - splx(s); + for (kn = TAILQ_FIRST(&kq->kq_head); count != 0; + kn = TAILQ_NEXT(kn, kn_tqe)) { + if (kn == NULL) { + mtx_unlock(&kq->kq_mtx); if (count == maxevents) goto retry; goto done; } if (kn->kn_status & KN_DISABLED) { kn->kn_status &= ~KN_QUEUED; - kq->kq_count--; continue; } - if ((kn->kn_flags & EV_ONESHOT) == 0 && - kn->kn_fop->f_event(kn, 0) == 0) { - kn->kn_status &= ~(KN_QUEUED | KN_ACTIVE); - kq->kq_count--; - continue; + if ((kn->kn_flags & EV_ONESHOT) == 0) { + mtx_lock(&knote_mtx); + if (kn->kn_fop->f_event(kn, 0) == 0) { + kn->kn_status &= ~(KN_QUEUED | KN_ACTIVE); + mtx_unlock(&knote_mtx); + continue; + } + mtx_unlock(&knote_mtx); } - *kevp = kn->kn_kevent; - kevp++; - nkev++; if (kn->kn_flags & EV_ONESHOT) { + gen = kq->kq_dqgen; + mtx_unlock(&kq->kq_mtx); + mtx_lock(&klist_mtx); + mtx_lock(&kq->kq_mtx); + if (gen != kq->kq_dqgen || + !(kn->kn_status & KN_QUEUED)) { + mtx_unlock(&klist_mtx); + goto retry; + } + *kevp = kn->kn_kevent; + kevp++; + nkev++; + count--; + mtx_lock(&knote_mtx); kn->kn_status &= ~KN_QUEUED; - kq->kq_count--; - splx(s); kn->kn_fop->f_detach(kn); - knote_drop(kn, td); - s = splhigh(); - } else if (kn->kn_flags & EV_CLEAR) { - kn->kn_data = 0; - kn->kn_fflags = 0; - kn->kn_status &= ~(KN_QUEUED | KN_ACTIVE); - kq->kq_count--; + knote_drop(kq, kn, td); + mtx_unlock(&knote_mtx); + 
mtx_unlock(&klist_mtx); } else { - TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); + *kevp = kn->kn_kevent; + kevp++; + nkev++; + count--; + if (kn->kn_flags & EV_CLEAR) { + mtx_lock(&knote_mtx); + kn->kn_data = 0; + kn->kn_fflags = 0; + kn->kn_status &= ~(KN_QUEUED | KN_ACTIVE); + mtx_unlock(&knote_mtx); + } } - count--; if (nkev == KQ_NEVENTS) { - splx(s); - error = copyout(&kq->kq_kev, ulistp, + error = copyout(kqkev, ulistp, sizeof(struct kevent) * nkev); ulistp += nkev; nkev = 0; - kevp = kq->kq_kev; - s = splhigh(); + kevp = kqkev; if (error) break; } } - TAILQ_REMOVE(&kq->kq_head, &marker, kn_tqe); - splx(s); + mtx_unlock(&kq->kq_mtx); done: + mtx_lock(&klist_mtx); + mtx_lock(&kq->kq_mtx); + mtx_lock(&knote_mtx); + for (kn = TAILQ_FIRST(&kq->kq_head); kn; kn = TAILQ_NEXT(kn, kn_tqe)) { + if (kn->kn_filter != EVFILT_PROC) + continue; + /* + * This is a pretty crappy implementation of + * kqueue_register() :-( + */ + while ((ckn = SLIST_FIRST(&kn_forklist(kn))) != NULL) { + struct knote *lkn; + + SLIST_REMOVE_HEAD(&kn_forklist(kn), kn_link); + lkn = NULL; + FILEDESC_LOCK(kq->kq_fdp); + if (kq->kq_fdp->fd_knhashmask != 0) { + struct klist *list; + + list = &kq->kq_fdp->fd_knhash[ + KN_HASH((u_long)ckn->kn_id, + kq->kq_fdp->fd_knhashmask)]; + SLIST_FOREACH(lkn, list, kn_link) + if (ckn->kn_id == lkn->kn_id && + kq == lkn->kn_kq && + EVFILT_PROC == lkn->kn_filter) + break; + } + FILEDESC_UNLOCK(kq->kq_fdp); + if (lkn != NULL) { + lkn->kn_sfflags = ckn->kn_sfflags; + lkn->kn_sdata = ckn->kn_sdata; + lkn->kn_kevent.udata = ckn->kn_kevent.udata; + knote_free(ckn); + } else { + knote_attach(kq, ckn, kq->kq_fdp); + if ((error = ckn->kn_fop->f_attach(ckn)) != 0) { + knote_drop(kq, ckn, td); + for (kevp = kqkev; kevp < &kqkev[nkev]; + kevp++) { + if (kevp->ident == kn->kn_id && + kevp->filter == + EVFILT_PROC) { + kn->kn_fflags |= + NOTE_TRACKERR; + break; + } + } + } + } + } + } + mtx_unlock(&knote_mtx); + mtx_unlock(&kq->kq_mtx); + mtx_unlock(&klist_mtx); if (nkev != 0) 
- error = copyout(&kq->kq_kev, ulistp, + error = copyout(kqkev, ulistp, sizeof(struct kevent) * nkev); td->td_retval[0] = maxevents - count; return (error); @@ -822,9 +959,9 @@ { struct kqueue *kq; int revents = 0; - int s = splnet(); kq = fp->f_data; + mtx_lock(&kq->kq_mtx); if (events & (POLLIN | POLLRDNORM)) { if (kq->kq_count) { revents |= events & (POLLIN | POLLRDNORM); @@ -833,7 +970,7 @@ kq->kq_state |= KQ_SEL; } } - splx(s); + mtx_unlock(&kq->kq_mtx); return (revents); } @@ -846,7 +983,9 @@ kq = fp->f_data; bzero((void *)st, sizeof(*st)); + mtx_lock(&kq->kq_mtx); st->st_size = kq->kq_count; + mtx_unlock(&kq->kq_mtx); st->st_blksize = sizeof(struct kevent); st->st_mode = S_IFIFO; return (0); @@ -858,75 +997,64 @@ { struct kqueue *kq = fp->f_data; struct filedesc *fdp = kq->kq_fdp; - struct knote **knp, *kn, *kn0; + struct knote *kn; int i; +restart: + mtx_lock(&klist_mtx); + mtx_lock(&kq->kq_mtx); FILEDESC_LOCK(fdp); for (i = 0; i < fdp->fd_knlistsize; i++) { - knp = &SLIST_FIRST(&fdp->fd_knlist[i]); - kn = *knp; - while (kn != NULL) { - kn0 = SLIST_NEXT(kn, kn_link); + for (kn = SLIST_FIRST(&fdp->fd_knlist[i]); kn;) { if (kq == kn->kn_kq) { + mtx_lock(&knote_mtx); kn->kn_fop->f_detach(kn); - *knp = kn0; + mtx_unlock(&knote_mtx); FILE_LOCK(kn->kn_fp); FILEDESC_UNLOCK(fdp); fdrop_locked(kn->kn_fp, td); knote_free(kn); - FILEDESC_LOCK(fdp); + mtx_unlock(&kq->kq_mtx); + mtx_unlock(&klist_mtx); + goto restart; } else { - knp = &SLIST_NEXT(kn, kn_link); + kn = SLIST_NEXT(kn, kn_link); } - kn = kn0; } } if (fdp->fd_knhashmask != 0) { for (i = 0; i < fdp->fd_knhashmask + 1; i++) { - knp = &SLIST_FIRST(&fdp->fd_knhash[i]); - kn = *knp; - while (kn != NULL) { - kn0 = SLIST_NEXT(kn, kn_link); + for (kn = SLIST_FIRST(&fdp->fd_knhash[i]); kn;) { if (kq == kn->kn_kq) { + mtx_lock(&knote_mtx); kn->kn_fop->f_detach(kn); - *knp = kn0; + mtx_unlock(&knote_mtx); /* XXX non-fd release of kn->kn_ptr */ FILEDESC_UNLOCK(fdp); knote_free(kn); - FILEDESC_LOCK(fdp); + 
mtx_unlock(&kq->kq_mtx); + mtx_unlock(&klist_mtx); + goto restart; } else { - knp = &SLIST_NEXT(kn, kn_link); + kn = SLIST_NEXT(kn, kn_link); } - kn = kn0; } } } FILEDESC_UNLOCK(fdp); + mtx_unlock(&klist_mtx); if (kq->kq_state & KQ_SEL) { kq->kq_state &= ~KQ_SEL; selwakeuppri(&kq->kq_sel, PSOCK); } + mtx_unlock(&kq->kq_mtx); + mtx_destroy(&kq->kq_mtx); free(kq, M_KQUEUE); fp->f_data = NULL; return (0); } -static void -kqueue_wakeup(struct kqueue *kq) -{ - - if (kq->kq_state & KQ_SLEEP) { - kq->kq_state &= ~KQ_SLEEP; - wakeup(kq); - } - if (kq->kq_state & KQ_SEL) { - kq->kq_state &= ~KQ_SEL; - selwakeuppri(&kq->kq_sel, PSOCK); - } - KNOTE(&kq->kq_sel.si_note, 0); -} - /* * walk down a list of knotes, activating them if their event has triggered. */ @@ -935,9 +1063,26 @@ { struct knote *kn; - SLIST_FOREACH(kn, list, kn_selnext) - if (kn->kn_fop->f_event(kn, hint)) - KNOTE_ACTIVATE(kn); +top: + mtx_lock(&klist_mtx); + SLIST_FOREACH(kn, list, kn_selnext) { + mtx_lock(&kn->kn_kq->kq_mtx); + mtx_lock(&knote_mtx); + if (kn->kn_fop->f_event(kn, hint)) { + int enqueued = 0; + + KNOTE_ACTIVATE(kn, enqueued); + if (enqueued) { + mtx_unlock(&knote_mtx); + mtx_unlock(&kn->kn_kq->kq_mtx); + mtx_unlock(&klist_mtx); + goto top; + } + } + mtx_unlock(&knote_mtx); + mtx_unlock(&kn->kn_kq->kq_mtx); + } + mtx_unlock(&klist_mtx); } /* @@ -947,11 +1092,19 @@ knote_remove(struct thread *td, struct klist *list) { struct knote *kn; + struct mtx *kqmtx; + mtx_lock(&klist_mtx); while ((kn = SLIST_FIRST(list)) != NULL) { + kqmtx = &kn->kn_kq->kq_mtx; + mtx_lock(kqmtx); + mtx_lock(&knote_mtx); kn->kn_fop->f_detach(kn); - knote_drop(kn, td); + knote_drop(kn->kn_kq, kn, td); + mtx_unlock(&knote_mtx); + mtx_unlock(kqmtx); } + mtx_unlock(&klist_mtx); } /* @@ -970,7 +1123,7 @@ } static void -knote_attach(struct knote *kn, struct filedesc *fdp) +knote_attach(struct kqueue *kq, struct knote *kn, struct filedesc *fdp) { struct klist *list, *tmp_knhash; u_long tmp_knhashmask; @@ -1026,12 +1179,8 @@ 
kn->kn_status = 0; } -/* - * should be called at spl == 0, since we don't want to hold spl - * while calling fdrop and free. - */ static void -knote_drop(struct knote *kn, struct thread *td) +knote_drop(struct kqueue *kq, struct knote *kn, struct thread *td) { struct filedesc *fdp = td->td_proc->p_fd; struct klist *list; @@ -1047,7 +1196,7 @@ SLIST_REMOVE(list, kn, knote, kn_link); if (kn->kn_status & KN_QUEUED) - knote_dequeue(kn); + knote_dequeue(kn->kn_kq, kn); if (kn->kn_fop->f_isfd) fdrop_locked(kn->kn_fp, td); knote_free(kn); @@ -1055,32 +1204,32 @@ static void -knote_enqueue(struct knote *kn) +knote_enqueue(struct kqueue *kq, struct knote *kn) { - struct kqueue *kq = kn->kn_kq; - int s = splhigh(); - KASSERT((kn->kn_status & KN_QUEUED) == 0, ("knote already queued")); TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); kn->kn_status |= KN_QUEUED; kq->kq_count++; - splx(s); - kqueue_wakeup(kq); + if (kq->kq_state & KQ_SLEEP) { + kq->kq_state &= ~KQ_SLEEP; + wakeup(kq); + } + if (kq->kq_state & KQ_SEL) { + kq->kq_state &= ~KQ_SEL; + selwakeuppri(&kq->kq_sel, PSOCK); + } } static void -knote_dequeue(struct knote *kn) +knote_dequeue(struct kqueue *kq, struct knote *kn) { - struct kqueue *kq = kn->kn_kq; - int s = splhigh(); - KASSERT(kn->kn_status & KN_QUEUED, ("knote not queued")); + kq->kq_dqgen++; TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); kn->kn_status &= ~KN_QUEUED; kq->kq_count--; - splx(s); } static void @@ -1088,6 +1237,8 @@ { knote_zone = uma_zcreate("KNOTE", sizeof(struct knote), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0); + mtx_init(&klist_mtx, "kqueue note lists", NULL, MTX_DEF); + mtx_init(&knote_mtx, "kqueue notes", NULL, MTX_DEF); } SYSINIT(knote, SI_SUB_PSEUDO, SI_ORDER_ANY, knote_init, NULL) Index: kern/kern_exec.c =================================================================== RCS file: /usr/ncvs/src/sys/kern/kern_exec.c,v retrieving revision 1.241 diff -u -r1.241 kern_exec.c --- kern/kern_exec.c 1 Apr 2004 00:10:44 -0000 1.241 +++ kern/kern_exec.c 16 
Apr 2004 03:07:48 -0000 @@ -622,7 +622,6 @@ * Notify others that we exec'd, and clear the P_INEXEC flag * as we're now a bona fide freshly-execed process. */ - KNOTE(&p->p_klist, NOTE_EXEC); p->p_flag &= ~P_INEXEC; /* @@ -646,6 +645,7 @@ newargs = NULL; } PROC_UNLOCK(p); + KNOTE(&p->p_klist, NOTE_EXEC); /* Set values passed into the program in registers. */ if (p->p_sysent->sv_setregs) Index: kern/kern_exit.c =================================================================== RCS file: /usr/ncvs/src/sys/kern/kern_exit.c,v retrieving revision 1.229 diff -u -r1.229 kern_exit.c --- kern/kern_exit.c 5 Apr 2004 21:03:34 -0000 1.229 +++ kern/kern_exit.c 15 Apr 2004 18:22:52 -0000 @@ -434,6 +434,14 @@ * Save exit status and final rusage info, adding in child rusage * info and self times. */ + KNOTE(&p->p_klist, NOTE_EXIT); + /* + * Start to notify interested parties of our demise. + * Just delete all entries in the p_klist. At this point we won't + * report any more events, and there are nasty race conditions that + * can beat us if we don't. + */ + klist_disappearing(&p->p_klist); mtx_lock(&Giant); PROC_LOCK(p); p->p_xstat = rv; @@ -442,19 +450,7 @@ calcru(p, &p->p_ru->ru_utime, &p->p_ru->ru_stime, NULL); mtx_unlock_spin(&sched_lock); ruadd(p->p_ru, &p->p_stats->p_cru); - - /* - * Notify interested parties of our demise. - */ - KNOTE(&p->p_klist, NOTE_EXIT); mtx_unlock(&Giant); - /* - * Just delete all entries in the p_klist. At this point we won't - * report any more events, and there are nasty race conditions that - * can beat us if we don't. - */ - while (SLIST_FIRST(&p->p_klist)) - SLIST_REMOVE_HEAD(&p->p_klist, kn_selnext); /* * Notify parent that we're gone. 
If parent has the PS_NOCLDWAIT Index: kern/kern_fork.c =================================================================== RCS file: /usr/ncvs/src/sys/kern/kern_fork.c,v retrieving revision 1.226 diff -u -r1.226 kern_fork.c --- kern/kern_fork.c 5 Apr 2004 21:03:34 -0000 1.226 +++ kern/kern_fork.c 16 Apr 2004 03:08:38 -0000 @@ -715,15 +715,12 @@ /* * Now can be swapped. */ - PROC_LOCK(p1); - _PRELE(p1); - + PRELE(p1); /* * Tell any interested parties about the new process. */ KNOTE(&p1->p_klist, NOTE_FORK | p2->p_pid); - PROC_UNLOCK(p1); /* * Preserve synchronization semantics of vfork. If waiting for Index: kern/kern_sig.c =================================================================== RCS file: /usr/ncvs/src/sys/kern/kern_sig.c,v retrieving revision 1.275 diff -u -r1.275 kern_sig.c --- kern/kern_sig.c 5 Apr 2004 21:03:35 -0000 1.275 +++ kern/kern_sig.c 15 Apr 2004 18:22:52 -0000 @@ -2682,9 +2682,7 @@ kn->kn_ptr.p_proc = p; kn->kn_flags |= EV_CLEAR; /* automatically set */ - PROC_LOCK(p); - SLIST_INSERT_HEAD(&p->p_klist, kn, kn_selnext); - PROC_UNLOCK(p); + klist_add(&p->p_klist, kn); return (0); } @@ -2694,9 +2692,7 @@ { struct proc *p = kn->kn_ptr.p_proc; - PROC_LOCK(p); - SLIST_REMOVE(&p->p_klist, kn, knote, kn_selnext); - PROC_UNLOCK(p); + klist_remove(&p->p_klist, kn); } /* Index: kern/sys_pipe.c =================================================================== RCS file: /usr/ncvs/src/sys/kern/sys_pipe.c,v retrieving revision 1.171 diff -u -r1.171 sys_pipe.c --- kern/sys_pipe.c 27 Mar 2004 19:50:22 -0000 1.171 +++ kern/sys_pipe.c 15 Apr 2004 18:22:52 -0000 @@ -1502,7 +1502,7 @@ return (1); } - SLIST_INSERT_HEAD(&cpipe->pipe_sel.si_note, kn, kn_selnext); + klist_add(&cpipe->pipe_sel.si_note, kn); PIPE_UNLOCK(cpipe); return (0); } @@ -1520,7 +1520,7 @@ } cpipe = cpipe->pipe_peer; } - SLIST_REMOVE(&cpipe->pipe_sel.si_note, kn, knote, kn_selnext); + klist_remove(&cpipe->pipe_sel.si_note, kn); PIPE_UNLOCK(cpipe); } Index: kern/tty.c 
=================================================================== RCS file: /usr/ncvs/src/sys/kern/tty.c,v retrieving revision 1.209 diff -u -r1.209 tty.c --- kern/tty.c 21 Feb 2004 20:41:11 -0000 1.209 +++ kern/tty.c 15 Apr 2004 18:22:52 -0000 @@ -1203,7 +1203,7 @@ kn->kn_hook = (caddr_t)dev; s = spltty(); - SLIST_INSERT_HEAD(klist, kn, kn_selnext); + klist_add(klist, kn); splx(s); return (0); @@ -1215,7 +1215,7 @@ struct tty *tp = ((dev_t)kn->kn_hook)->si_tty; int s = spltty(); - SLIST_REMOVE(&tp->t_rsel.si_note, kn, knote, kn_selnext); + klist_remove(&tp->t_rsel.si_note, kn); splx(s); } @@ -1224,11 +1224,14 @@ { struct tty *tp = ((dev_t)kn->kn_hook)->si_tty; + mtx_lock(&Giant); kn->kn_data = ttnread(tp); if (ISSET(tp->t_state, TS_ZOMBIE)) { kn->kn_flags |= EV_EOF; + mtx_unlock(&Giant); return (1); } + mtx_unlock(&Giant); return (kn->kn_data > 0); } @@ -1238,7 +1241,7 @@ struct tty *tp = ((dev_t)kn->kn_hook)->si_tty; int s = spltty(); - SLIST_REMOVE(&tp->t_wsel.si_note, kn, knote, kn_selnext); + klist_remove(&tp->t_wsel.si_note, kn); splx(s); } @@ -1246,12 +1249,15 @@ filt_ttywrite(struct knote *kn, long hint) { struct tty *tp = ((dev_t)kn->kn_hook)->si_tty; + int setting; + mtx_lock(&Giant); kn->kn_data = tp->t_outq.c_cc; - if (ISSET(tp->t_state, TS_ZOMBIE)) - return (1); - return (kn->kn_data <= tp->t_olowat && + setting = ISSET(tp->t_state, TS_ZOMBIE) || + (kn->kn_data <= tp->t_olowat && ISSET(tp->t_state, TS_CONNECTED)); + mtx_unlock(&Giant); + return (setting); } /* Index: kern/uipc_socket.c =================================================================== RCS file: /usr/ncvs/src/sys/kern/uipc_socket.c,v retrieving revision 1.169 diff -u -r1.169 uipc_socket.c --- kern/uipc_socket.c 5 Apr 2004 21:03:36 -0000 1.169 +++ kern/uipc_socket.c 15 Apr 2004 18:22:52 -0000 @@ -1819,7 +1819,7 @@ } s = splnet(); - SLIST_INSERT_HEAD(&sb->sb_sel.si_note, kn, kn_selnext); + klist_add(&sb->sb_sel.si_note, kn); sb->sb_flags |= SB_KNOTE; splx(s); return (0); @@ -1831,7 
+1831,7 @@ struct socket *so = kn->kn_fp->f_data; int s = splnet(); - SLIST_REMOVE(&so->so_rcv.sb_sel.si_note, kn, knote, kn_selnext); + klist_remove(&so->so_rcv.sb_sel.si_note, kn); if (SLIST_EMPTY(&so->so_rcv.sb_sel.si_note)) so->so_rcv.sb_flags &= ~SB_KNOTE; splx(s); @@ -1864,7 +1864,7 @@ struct socket *so = kn->kn_fp->f_data; int s = splnet(); - SLIST_REMOVE(&so->so_snd.sb_sel.si_note, kn, knote, kn_selnext); + klist_remove(&so->so_snd.sb_sel.si_note, kn); if (SLIST_EMPTY(&so->so_snd.sb_sel.si_note)) so->so_snd.sb_flags &= ~SB_KNOTE; splx(s); Index: kern/vfs_aio.c =================================================================== RCS file: /usr/ncvs/src/sys/kern/vfs_aio.c,v retrieving revision 1.169 diff -u -r1.169 vfs_aio.c --- kern/vfs_aio.c 14 Mar 2004 02:06:27 -0000 1.169 +++ kern/vfs_aio.c 16 Apr 2004 02:53:31 -0000 @@ -49,6 +49,7 @@ #include #include #include +#include #include #include @@ -2268,14 +2269,13 @@ /* * The aiocbe pointer must be validated before using it, so - * registration is restricted to the kernel; the user cannot - * set EV_FLAG1. + * registration is restricted to the kernel. 
*/ if ((kn->kn_flags & EV_FLAG1) == 0) return (EPERM); kn->kn_flags &= ~EV_FLAG1; - SLIST_INSERT_HEAD(&aiocbe->klist, kn, kn_selnext); + klist_add(&aiocbe->klist, kn); return (0); } @@ -2286,7 +2286,7 @@ { struct aiocblist *aiocbe = (struct aiocblist *)kn->kn_sdata; - SLIST_REMOVE(&aiocbe->klist, kn, knote, kn_selnext); + klist_remove(&aiocbe->klist, kn); } /* kqueue filter function */ Index: kern/vfs_subr.c =================================================================== RCS file: /usr/ncvs/src/sys/kern/vfs_subr.c,v retrieving revision 1.489 diff -u -r1.489 vfs_subr.c --- kern/vfs_subr.c 5 Apr 2004 21:03:37 -0000 1.489 +++ kern/vfs_subr.c 16 Apr 2004 03:11:52 -0000 @@ -3225,8 +3225,8 @@ struct vnode *vp; { - mtx_lock(&vp->v_pollinfo->vpi_lock); VN_KNOTE(vp, NOTE_REVOKE); + mtx_lock(&vp->v_pollinfo->vpi_lock); if (vp->v_pollinfo->vpi_events) { vp->v_pollinfo->vpi_events = 0; selwakeuppri(&vp->v_pollinfo->vpi_selinfo, PRIBIO); Index: net/bpf.c =================================================================== RCS file: /usr/ncvs/src/sys/net/bpf.c,v retrieving revision 1.124 diff -u -r1.124 bpf.c --- net/bpf.c 29 Feb 2004 15:32:33 -0000 1.124 +++ net/bpf.c 15 Apr 2004 18:37:16 -0000 @@ -529,7 +529,6 @@ pgsigio(&d->bd_sigio, d->bd_sig, 0); selwakeuppri(&d->bd_sel, PRINET); - KNOTE(&d->bd_sel.si_note, 0); } static void @@ -537,14 +536,19 @@ void *arg; { struct bpf_d *d = (struct bpf_d *)arg; + int donote = 0; BPFD_LOCK(d); if (d->bd_state == BPF_WAITING) { d->bd_state = BPF_TIMED_OUT; - if (d->bd_slen != 0) + if (d->bd_slen != 0) { bpf_wakeup(d); + donote = 1; + } } BPFD_UNLOCK(d); + if (donote) + KNOTE(&d->bd_sel.si_note, 0); } static int @@ -1093,7 +1097,7 @@ kn->kn_fop = &bpfread_filtops; kn->kn_hook = d; BPFD_LOCK(d); - SLIST_INSERT_HEAD(&d->bd_sel.si_note, kn, kn_selnext); + klist_add(&d->bd_sel.si_note, kn); BPFD_UNLOCK(d); return (0); @@ -1106,7 +1110,7 @@ struct bpf_d *d = (struct bpf_d *)kn->kn_hook; BPFD_LOCK(d); - SLIST_REMOVE(&d->bd_sel.si_note, kn, 
knote, kn_selnext); + klist_remove(&d->bd_sel.si_note, kn); BPFD_UNLOCK(d); } Index: net/if.c =================================================================== RCS file: /usr/ncvs/src/sys/net/if.c,v retrieving revision 1.185 diff -u -r1.185 if.c --- net/if.c 13 Mar 2004 02:35:03 -0000 1.185 +++ net/if.c 16 Apr 2004 03:13:23 -0000 @@ -211,8 +211,8 @@ kn->kn_hook = (caddr_t)klist; - /* XXX locking? */ - SLIST_INSERT_HEAD(klist, kn, kn_selnext); + /* XXX klist locked */ + klist_add(klist, kn); return (0); } @@ -224,7 +224,8 @@ if (kn->kn_status & KN_DETACHED) return; - SLIST_REMOVE(klist, kn, knote, kn_selnext); + /* XXX klist locked */ + klist_remove(klist, kn); } static int @@ -606,13 +607,13 @@ #ifdef MAC mac_destroy_ifnet(ifp); #endif /* MAC */ - KNOTE(&ifp->if_klist, NOTE_EXIT); IFNET_WLOCK(); TAILQ_REMOVE(&ifnet, ifp, if_link); IFNET_WUNLOCK(); mtx_destroy(&ifp->if_snd.ifq_mtx); IF_AFDATA_DESTROY(ifp); splx(s); + KNOTE(&ifp->if_klist, NOTE_EXIT); } /* Index: sys/event.h =================================================================== RCS file: /usr/ncvs/src/sys/sys/event.h,v retrieving revision 1.22 diff -u -r1.22 event.h --- sys/event.h 2 Feb 2003 19:39:51 -0000 1.22 +++ sys/event.h 16 Apr 2004 03:01:12 -0000 @@ -127,7 +127,10 @@ MALLOC_DECLARE(M_KQUEUE); #endif -#define KNOTE(list, hint) if ((list) != NULL) knote(list, hint) +#define KNOTE(list, hint) do { \ + if ((list) != NULL) \ + knote(list, hint); \ +} while (0) /* * Flag indicating hint is a signal. 
Used by EVFILT_SIGNAL, and also @@ -168,6 +171,7 @@ #define kn_fflags kn_kevent.fflags #define kn_data kn_kevent.data #define kn_fp kn_ptr.p_fp +#define kn_forklist(kn) (*(struct klist *)&(kn)->kn_hook) }; struct thread; @@ -180,6 +184,9 @@ struct kevent *kev, struct thread *p); extern int kqueue_add_filteropts(int filt, struct filterops *filtops); extern int kqueue_del_filteropts(int filt); +extern void klist_add(struct klist *list, struct knote *note); +extern void klist_remove(struct klist *list, struct knote *note); +extern void klist_disappearing(struct klist *list); #else /* !_KERNEL */ Index: sys/eventvar.h =================================================================== RCS file: /usr/ncvs/src/sys/sys/eventvar.h,v retrieving revision 1.4 diff -u -r1.4 eventvar.h --- sys/eventvar.h 18 Jul 2000 19:31:48 -0000 1.4 +++ sys/eventvar.h 16 Apr 2004 00:40:04 -0000 @@ -34,13 +34,14 @@ struct kqueue { TAILQ_HEAD(kqlist, knote) kq_head; /* list of pending event */ + u_int kq_dqgen; /* generation of dequeues */ int kq_count; /* number of pending events */ + struct mtx kq_mtx; struct selinfo kq_sel; struct filedesc *kq_fdp; int kq_state; #define KQ_SEL 0x01 #define KQ_SLEEP 0x02 - struct kevent kq_kev[KQ_NEVENTS]; }; #endif /* !_SYS_EVENTVAR_H_ */ Index: ufs/ufs/ufs_vnops.c =================================================================== RCS file: /usr/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v retrieving revision 1.238 diff -u -r1.238 ufs_vnops.c --- ufs/ufs/ufs_vnops.c 11 Mar 2004 18:50:33 -0000 1.238 +++ ufs/ufs/ufs_vnops.c 15 Apr 2004 18:38:47 -0000 @@ -2625,7 +2625,7 @@ if (vp->v_pollinfo == NULL) v_addpollinfo(vp); mtx_lock(&vp->v_pollinfo->vpi_lock); - SLIST_INSERT_HEAD(&vp->v_pollinfo->vpi_selinfo.si_note, kn, kn_selnext); + klist_add(&vp->v_pollinfo->vpi_selinfo.si_note, kn); mtx_unlock(&vp->v_pollinfo->vpi_lock); return (0); @@ -2638,8 +2638,7 @@ KASSERT(vp->v_pollinfo != NULL, ("Mising v_pollinfo")); mtx_lock(&vp->v_pollinfo->vpi_lock); - 
SLIST_REMOVE(&vp->v_pollinfo->vpi_selinfo.si_note, - kn, knote, kn_selnext); + klist_remove(&vp->v_pollinfo->vpi_selinfo.si_note, kn); mtx_unlock(&vp->v_pollinfo->vpi_lock); } -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org \ The Power to Serve! \ Opinions expressed are my own. \,,,,,,,,,,,,,,,,,,,,,,\ From owner-freebsd-arch@FreeBSD.ORG Fri Apr 16 08:53:54 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CA43816A4CE; Fri, 16 Apr 2004 08:53:54 -0700 (PDT) Received: from huva.hittite.isp.9tel.net (huva.hittite.isp.9tel.net [62.62.156.28]) by mx1.FreeBSD.org (Postfix) with ESMTP id 359EB43D53; Fri, 16 Apr 2004 08:53:54 -0700 (PDT) (envelope-from clefevre-lists@9online.fr) Received: from pc2k (131-122-118-80.kaptech.net [80.118.122.131]) by huva.hittite.isp.9tel.net (Postfix) with SMTP id F2BEB9BB76; Fri, 16 Apr 2004 17:54:53 +0200 (CEST) Message-ID: <022901c423cb$03f493b0$7890a8c0@dyndns.org> From: "Cyrille Lefevre" To: "Bruce Evans" References: <00f401c4232b$8700d0c0$7890a8c0@dyndns.org> <20040416115549.P11609@gamplex.bde.org> Date: Fri, 16 Apr 2004 17:48:20 +0200 Organization: ACME MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1409 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409 cc: arch@FreeBSD.org cc: "current @FreeBSD.org" Subject: Re: bin/41071: make NO to NO_ transition patch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Apr 2004 15:53:55 -0000 "Bruce Evans" wrote: > On Thu, 15 Apr 2004, Cyrille Lefevre wrote: > > > anyone interrested to validate and commit this PR ? > > I hope not. 
I hope yes since this was not the past agreement.

http://tinyurl.com/3fraz
(http://docs.freebsd.org/cgi/getmsg.cgi?fetch=177270+0+archive/2002/freebsd-arch/20020609.freebsd-arch)

In the past versions of this conversation, the general agreement is that going forward we should probably standardize on underscores to seperate words. So, NO_FOO rather than NOFOO. However, no_volunteer has come forward to do the work you've described, so if you're volunteering....

I'd volunteer w/ "Ruslan Ermilov" and even do the job.

> > it provide a patch set to change all NOFOO variables to NO_FOO.
>
> This goes in a direction that I disagree with, and it even changes all
> of the old mostly-internal variables like NOMAN. About half of the
> 285+ files touched by it are to change the correct spelling of NOMAN
> in scattered Makefiles.

the purpose of the original thread was to convert "all" variables to one form or another, not just some of them for some or other reason.
in one word... be consistant, it's help.
it's not my fault if only some "no conforming" variables regarding all others are used in so many makefiles.

http://tinyurl.com/3xaq2
(http://docs.freebsd.org/cgi/getmsg.cgi?fetch=159784+0+archive/2002/freebsd-arch/20020728.freebsd-arch)

PS : this patch is a transition patch, it doesn't break anything.

Cyrille Lefevre.
-- home: mailto:cyrille.lefevre@laposte.net From owner-freebsd-arch@FreeBSD.ORG Fri Apr 16 12:25:02 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5012E16A4CE; Fri, 16 Apr 2004 12:25:02 -0700 (PDT) Received: from tigra.ip.net.ua (tigra.ip.net.ua [82.193.96.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id D169D43D2D; Fri, 16 Apr 2004 12:24:55 -0700 (PDT) (envelope-from ru@ip.net.ua) Received: from heffalump.ip.net.ua (heffalump.ip.net.ua [82.193.96.213]) by tigra.ip.net.ua (8.12.11/8.12.11) with ESMTP id i3GJSamH043610 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 16 Apr 2004 22:28:37 +0300 (EEST) (envelope-from ru@ip.net.ua) Received: (from ru@localhost) by heffalump.ip.net.ua (8.12.11/8.12.11) id i3GJOV5f001799; Fri, 16 Apr 2004 22:24:31 +0300 (EEST) (envelope-from ru) Date: Fri, 16 Apr 2004 22:24:31 +0300 From: Ruslan Ermilov To: Cyrille Lefevre Message-ID: <20040416192431.GC1584@ip.net.ua> References: <00f401c4232b$8700d0c0$7890a8c0@dyndns.org> <20040416115549.P11609@gamplex.bde.org> <022901c423cb$03f493b0$7890a8c0@dyndns.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="c3bfwLpm8qysLVxt" Content-Disposition: inline In-Reply-To: <022901c423cb$03f493b0$7890a8c0@dyndns.org> User-Agent: Mutt/1.5.6i X-Virus-Scanned: by amavisd-new X-Spam-Checker-Version: SpamAssassin 2.55 (1.174.2.19-2003-05-19-exp) cc: arch@FreeBSD.org cc: current@FreeBSD.org Subject: Re: bin/41071: make NO to NO_ transition patch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Apr 2004 19:25:02 -0000 --c3bfwLpm8qysLVxt Content-Type: text/plain; charset=us-ascii Content-Disposition: inline 
Content-Transfer-Encoding: quoted-printable

On Fri, Apr 16, 2004 at 05:48:20PM +0200, Cyrille Lefevre wrote:
> "Bruce Evans" wrote:
> > On Thu, 15 Apr 2004, Cyrille Lefevre wrote:
> >
> > > anyone interrested to validate and commit this PR ?
> >
> > I hope not.
>
> I hope yes since this was not the past agreement.
>
> http://tinyurl.com/3fraz
> (http://docs.freebsd.org/cgi/getmsg.cgi?fetch=177270+0+archive/2002/freebsd-arch
> /20020609.freebsd-arch)
>
>
>
> In the past versions of this conversation, the general agreement is that
> going forward we should probably standardize on underscores to seperate
> words. So, NO_FOO rather than NOFOO. However, no_volunteer has come
> forward to do the work you've described, so if you're volunteering....
>
>
> I'd volunteer w/ "Ruslan Ermilov" and even do the job.
>

Well, it's been a while. ;-) I'm fine with doing it before we branch RELENG_5, and burning bridges the day we branch RELENG_6. Let's wait for some more feedback on arch@ though. If that doesn't work, I will talk to re@. It's also safe since we stay backwards compatible. Nice stuff!

Cheers,
--
Ruslan Ermilov ru@FreeBSD.org FreeBSD committer
--c3bfwLpm8qysLVxt Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (FreeBSD) iD8DBQFAgDLvUkv4P6juNwoRAiKUAJ44ncneN2f3lpTFoFn+o/OSn79XjQCePxQQ 1giMbpp+dk1zah0bfQyN5Fc= =abgB -----END PGP SIGNATURE----- --c3bfwLpm8qysLVxt--
From owner-freebsd-arch@FreeBSD.ORG Fri Apr 16 17:20:05 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 67C1B16A4CE; Fri, 16 Apr 2004 17:20:05 -0700 (PDT) Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9E93E43D1D; Fri, 16 Apr 2004 17:20:04 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au [61.8.0.87])i3H0K05v019762; Sat, 17 Apr 2004 10:20:00 +1000 Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246]) i3H0JwHW017020; Sat, 17 Apr 2004 10:19:59 +1000 Date: Sat, 17 Apr 2004 10:19:58 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Cyrille Lefevre In-Reply-To: <022901c423cb$03f493b0$7890a8c0@dyndns.org> Message-ID: <20040417101206.N16280@gamplex.bde.org> References: <00f401c4232b$8700d0c0$7890a8c0@dyndns.org> <022901c423cb$03f493b0$7890a8c0@dyndns.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org cc: "current @FreeBSD.org" Subject: Re: bin/41071: make NO to NO_ transition patch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 00:20:05 -0000

On Fri, 16 Apr 2004, Cyrille Lefevre wrote:
> "Bruce Evans" wrote:
> > On Thu, 15 Apr 2004, Cyrille Lefevre wrote:
> >
> > > anyone interrested to validate and commit this PR ?
> > > > I hope not. > > I hope yes since this was not the past agreement. I certainly didn't agree to it, and never would. > > In the past versions of this conversation, the general agreement is that > going forward we should probably standardize on underscores to seperate > words. So, NO_FOO rather than NOFOO. However, no_volunteer has come > forward to do the work you've described, so if you're volunteering.... > It was bad to encourage people to waste time on this. > the purpose of the original thread was to convert "all" variables to one > form or another, not just some of them for some or other reason. > in one word... be consistant, it's help. > it's not my fault if only some "no conforming" variables regarding all > others are used in so many makefiles. For a more modest task, try fixing the English spelling of "nothing" to "no thing" and "consistent" to "consistant". Bruce From owner-freebsd-arch@FreeBSD.ORG Fri Apr 16 19:12:43 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 2ED3416A4CE for ; Fri, 16 Apr 2004 19:12:43 -0700 (PDT) Received: from localhost (green@localhost [127.0.0.1]) by green.homeunix.org (8.12.11/8.12.11) with ESMTP id i3H2Cg8n031749 for ; Fri, 16 Apr 2004 22:12:42 -0400 (EDT) (envelope-from green@green.homeunix.org) Message-Id: <200404170212.i3H2Cg8n031749@green.homeunix.org> X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4 To: arch@FreeBSD.org From: Brian Fundakowski Feldman Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 16 Apr 2004 22:12:42 -0400 Sender: green@green.homeunix.org Subject: kqueue giant-locking (&kq_Giant, locking) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 02:12:43 
-0000

I believe I have come up with a good solution to the kqueue woes in 5.X, and I'd like to get some feedback on work that so far is letting me (on uniprocessor, at least) run make -j8 buildworld, with USE_KQUEUE in make(1), with no ill effect :) The locking thus far is one global kqueue lock, and I firmly believe we should use MUTEX_PROFILING to determine if we should lock it down any further at this point.

There are several major differences so far (of course, fixing that stack-paged-out-kernel-crash-bug is one of them) and several major things still to be fixed.

1. The recursion has been removed from kqueue. This means kqueues cannot be
   added to other kqueues for EVFILT_READ -- yes, that ability has been
   around since r1.1 of kern_event.c, but it is utterly pointless and, if
   you take a look at my previous patch, severely complicates many things.
   Of course, I'm sure someone will notice and complain, but there isn't
   any documentation that suggests you should kevent() another kqueue().
2. Because of this, KNOTE() can't end up calling another KNOTE() unless the
   consumer does something stupid (call KNOTE() from filter::event()).
3. Kqueue does the locking for you when it comes to the non-object lists.
   All of the filter::attach() and filter::detach() routines need to lock
   their object lists, but they don't touch kqueue or knote other than
   setting their own knote's fields. Both of those routines are called
   without any locks held on kqueue's part.
4. The filter::event() routines are called with internal kqueue locking
   held. You can lock anything else you need to, but you may not sleep;
   it is essentially like an interrupt handler. You must not call into
   KNOTE() with locks held, but you should reference your object. I've
   fixed what appears to be the most egregious offender, sys_pipe.c.
5. If the interrupt-like model of KNOTE() does not work for you, you may
   call KNOTE() with any locks you like except the ones it uses internally
   (mainly filedesc and file), but the only information you can give your
   filter::event() is the hint argument.

Examples of #4 are bpf and pipe; they do not need to pass any information in the filter::event() hint, and as every handler that works on the object instead of on hints needs to do, they verify for certain whether or not the KNOTE() should have actually fired and ignore false positives.

The biggest example of #5 is process events. There are many different process-type locks that may be held when KNOTE() is called, but the implementation of filter::event() is mostly correct in locking nothing. In kern_fork.c, KNOTE() is called outside of the proc lock (p1->p_klist not locked as it should be) because it has to be special-cased somehow. This is the most disgusting thing EVAR. (NB: See http://green.homeunix.org/~green/kqueue-locking.1.patch for that.)

Current patch at: http://green.homeunix.org/~green/kqueue-giant-locking.0.patch

-- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org \ The Power to Serve! \ Opinions expressed are my own. \,,,,,,,,,,,,,,,,,,,,,,\
From owner-freebsd-arch@FreeBSD.ORG Fri Apr 16 19:36:34 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5A8CE16A4CE; Fri, 16 Apr 2004 19:36:34 -0700 (PDT) Received: from amsfep12-int.chello.nl (amsfep12-int.chello.nl [213.46.243.18]) by mx1.FreeBSD.org (Postfix) with ESMTP id 28DCC43D2F; Fri, 16 Apr 2004 19:36:33 -0700 (PDT) (envelope-from dodell@sitetronics.com) Received: from sitetronics.com ([213.46.199.67]) by amsfep12-int.chello.nl ESMTP <20040417023631.KGDM18840.amsfep12-int.chello.nl@sitetronics.com>; Sat, 17 Apr 2004 04:36:31 +0200 Message-ID: <4080982D.1070800@sitetronics.com> Date: Sat, 17 Apr 2004 04:36:29 +0200 From: "Devon H.
O'Dell" User-Agent: Mozilla Thunderbird 0.5 (X11/20040319) X-Accept-Language: en-us, en MIME-Version: 1.0 To: "Brian F. Feldman" References: <200404151453.i3FErUVY005892@green.homeunix.org> In-Reply-To: <200404151453.i3FErUVY005892@green.homeunix.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: jilles@stack.nl cc: freebsd-arch@freebsd.org Subject: Re: [patch] lockf(3) user-exploitable kernel panic X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 02:36:34 -0000 Brian F. Feldman wrote: > "dodell@sitetronics.com" wrote: > >>>>sh has been fixed. I was under the impression that csh used libutil for >>>>this (libutil has been fixed). I'll take a deeper look into shells in >>>>base and in ports and figure out what changes I need to make there. >>>>While I'm at it, I don't think it'd be a bad idea to go ahead and build >>>>in the RLIMIT_SBSIZE to bash and bash2. >>> >>>If it is easy, it might be worthwhile to patch the shells to use >>>libutil and submit those patches back to the maintainers. >> >>There are a huge number of shells to do this with. This subsystem >>looks like somewhat of a kludge to me in this respect; the >>functionality is plainly provided in libutil, while every shell (sh >>and tcsh included) have their own implementations. limits(1) >>even has statically compiled information about the limits for >>every shell it is aware of (including sh, csh, tcsh, bash/bash2 >>and a good few others). I'll take a look at these later. > > > Thanks for doing this work, Devon! The most important part is for > /etc/login.conf to allow you to configure the maximum limits -- all the > shell stuff is really secondary. > Hrm, it seems that my last email went to /dev/null, so I'll write it again. 
:) I'm glad to have done this work, and I hope I can help out in the future with squashing more bugs :)

I don't know who's taken a look at the patch, but it's available at http://freebsd0.sitetronics.com/~dodell/patches/lockfix.tar.gz. login.conf limits are already taken care of; so are libutil, limit(1), tcsh and sh.

Regarding Linux compatibility: it seems to me that Linux limits the number of flock-style locks as well. This seems unnecessary, as that number is effectively limited by the maximum open files rlimit (since these types of locks are one-per-file). Still, if we wish to be compatible, the patch can be modified to affect locks of all types, though not easily. BSD-style locks (flock(2)) don't contain process information in the lf_id field, unlike POSIX locks, which means that keeping track of them per-process can get difficult. Since they're limited by maxfilesperproc and maxprocesses anyway, it seems a bit of overkill to introduce a way to track these locks on a per-process basis. As long as an administrator keeps these limits to sane values, there is no reason that flock(2)-style locks should pose a problem.

OTOH, the lockf(3) (POSIX-style) locks can easily be limited per process; this would simply remove the per-user checks and counts in my code (and fix the fact that change_ruid() needs a struct proc *). Extra sanity checks for fork(2) calls are unnecessary as POSIX locks aren't inherited.

Again, any and all feedback would be appreciated. What do I need to do to get this all squared away and ready to commit? (I'll generate patches for all non-EOLed systems from the final patch.) :) This has been a fun experience and I hope to be able to contribute to the project again soon :)

Kind regards, Devon H.
O'Dell From owner-freebsd-arch@FreeBSD.ORG Fri Apr 16 19:53:55 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 294BB16A4CE; Fri, 16 Apr 2004 19:53:55 -0700 (PDT) Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [128.30.28.20]) by mx1.FreeBSD.org (Postfix) with ESMTP id D544143D1F; Fri, 16 Apr 2004 19:53:54 -0700 (PDT) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: from khavrinen.lcs.mit.edu (localhost [IPv6:::1]) by khavrinen.lcs.mit.edu (8.12.9/8.12.9) with ESMTP id i3H2rmXX021501 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK CN=khavrinen.lcs.mit.edu issuer=SSL+20Client+20CA); Fri, 16 Apr 2004 22:53:48 -0400 (EDT) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.12.9/8.12.9/Submit) id i3H2rmAV021500; Fri, 16 Apr 2004 22:53:48 -0400 (EDT) (envelope-from wollman) Date: Fri, 16 Apr 2004 22:53:48 -0400 (EDT) From: Garrett Wollman Message-Id: <200404170253.i3H2rmAV021500@khavrinen.lcs.mit.edu> To: green@freebsd.org In-Reply-To: <200404170212.i3H2Cg8n031749@green.homeunix.org> Organization: MIT Laboratory for Computer Science X-Spam-Score: -6.6 () IN_REP_TO,QUOTED_EMAIL_TEXT X-Scanned-By: MIMEDefang 2.37 cc: arch@freebsd.org Subject: Re: kqueue giant-locking (&kq_Giant, locking) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 02:53:55 -0000 In article <200404170212.i3H2Cg8n031749@green.homeunix.org> you write: >1. The recursion has been removed from kqueue. 
This means kqueues cannot be > added to other kqueues for EVFILT_READ -- yes, that ability has been > around since r1.1 of kern_event.c, Actually, I'm fairly certain that Jonathan considered this to be a fairly important property of kqueue and his papers do mention it. It was done that way specifically to allow a kqueue to be included in some larger application's event polling loop without needing to know how it was implemented. -GAWollman From owner-freebsd-arch@FreeBSD.ORG Fri Apr 16 20:30:48 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 2A3A916A4CE; Fri, 16 Apr 2004 20:30:48 -0700 (PDT) Received: from localhost (green@localhost [127.0.0.1]) by green.homeunix.org (8.12.11/8.12.11) with ESMTP id i3H3Ul0t032543; Fri, 16 Apr 2004 23:30:47 -0400 (EDT) (envelope-from green@green.homeunix.org) Message-Id: <200404170330.i3H3Ul0t032543@green.homeunix.org> X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4 To: Garrett Wollman In-Reply-To: Message from Garrett Wollman <200404170253.i3H2rmAV021500@khavrinen.lcs.mit.edu> From: "Brian F. Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 16 Apr 2004 23:30:47 -0400 Sender: green@green.homeunix.org cc: arch@freebsd.org Subject: Re: kqueue giant-locking (&kq_Giant, locking) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 03:30:48 -0000 Garrett Wollman wrote: > In article <200404170212.i3H2Cg8n031749@green.homeunix.org> you write: > >1. The recursion has been removed from kqueue. 
This means kqueues cannot be > > added to other kqueues for EVFILT_READ -- yes, that ability has been > > around since r1.1 of kern_event.c, > > Actually, I'm fairly certain that Jonathan considered this to be a > fairly important property of kqueue and his papers do mention it. > It was done that way specifically to allow a kqueue to be included in > some larger application's event polling loop without needing to know > how it was implemented. I can't imagine a well-designed applications has kqueues of kqueues. I didn't remove the file descriptor polling interface, I removed the file descriptor kqueue interface. -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org \ The Power to Serve! \ Opinions expressed are my own. \,,,,,,,,,,,,,,,,,,,,,,\ From owner-freebsd-arch@FreeBSD.ORG Fri Apr 16 21:47:08 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 79A5616A4CE; Fri, 16 Apr 2004 21:47:08 -0700 (PDT) Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [128.30.28.20]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2FBD643D1F; Fri, 16 Apr 2004 21:47:08 -0700 (PDT) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: from khavrinen.lcs.mit.edu (localhost [IPv6:::1]) by khavrinen.lcs.mit.edu (8.12.9/8.12.9) with ESMTP id i3H4l6XX021994 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK CN=khavrinen.lcs.mit.edu issuer=SSL+20Client+20CA); Sat, 17 Apr 2004 00:47:07 -0400 (EDT) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.12.9/8.12.9/Submit) id i3H4l6Hn021993; Sat, 17 Apr 2004 00:47:06 -0400 (EDT) (envelope-from wollman) Date: Sat, 17 Apr 2004 00:47:06 -0400 (EDT) From: Garrett Wollman Message-Id: <200404170447.i3H4l6Hn021993@khavrinen.lcs.mit.edu> To: green@freebsd.org X-Newsgroups: mit.lcs.mail.freebsd-arch In-Reply-To: 
<200404170330.i3H3Ul0t032543@green.homeunix.org> References: Organization: MIT Laboratory for Computer Science X-Spam-Score: -9.9 () IN_REP_TO,REFERENCES X-Scanned-By: MIMEDefang 2.37 cc: arch@freebsd.org Subject: Re: kqueue giant-locking (&kq_Giant, locking) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 04:47:08 -0000 In article <200404170330.i3H3Ul0t032543@green.homeunix.org> you write: >I can't imagine a well-designed applications has kqueues of kqueues. I can in about five seconds' worth of thought. Suppose you have library X. It accomplishes some task asynchronously (it doesn't matter what or how), and provides a descriptor that the calling application must poll for completion. Now use that library into an application that has its own event loop. This is one of the specific motivating examples behind doing kqueue rather than simply extending poll() or select(). Please go and read the papers before you continue down this path. -GAWollman From owner-freebsd-arch@FreeBSD.ORG Fri Apr 16 22:13:25 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 587A216A4CE; Fri, 16 Apr 2004 22:13:22 -0700 (PDT) Received: from localhost (green@localhost [127.0.0.1]) by green.homeunix.org (8.12.11/8.12.11) with ESMTP id i3H5DDgq033705; Sat, 17 Apr 2004 01:13:15 -0400 (EDT) (envelope-from green@green.homeunix.org) Message-Id: <200404170513.i3H5DDgq033705@green.homeunix.org> X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4 To: Garrett Wollman In-Reply-To: Message from Garrett Wollman <200404170447.i3H4l6Hn021993@khavrinen.lcs.mit.edu> From: "Brian F. 
Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sat, 17 Apr 2004 01:13:13 -0400 Sender: green@green.homeunix.org cc: arch@freebsd.org Subject: Re: kqueue giant-locking (&kq_Giant, locking) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 05:13:25 -0000 Garrett Wollman wrote: > In article <200404170330.i3H3Ul0t032543@green.homeunix.org> you write: > > >I can't imagine a well-designed applications has kqueues of kqueues. > > I can in about five seconds' worth of thought. > > Suppose you have library X. It accomplishes some task asynchronously > (it doesn't matter what or how), and provides a descriptor that the > calling application must poll for completion. Now use that library > into an application that has its own event loop. > > This is one of the specific motivating examples behind doing kqueue > rather than simply extending poll() or select(). Please go and read > the papers before you continue down this path. Contrived. Let's see one. There won't be any -- they will be using threads, not kqueues, because threads work on more than one system. In case you didn't notice, kqueues have been horribly broken for years now, and if you go back and look at all the places I've pointed out so far you'll see how those behaviors are broken. -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org \ The Power to Serve! \ Opinions expressed are my own. 
\,,,,,,,,,,,,,,,,,,,,,,\ From owner-freebsd-arch@FreeBSD.ORG Sat Apr 17 06:16:23 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9040116A4D0; Sat, 17 Apr 2004 06:16:23 -0700 (PDT) Received: from shaft.techsupport.co.uk (shaft.techsupport.co.uk [212.250.77.214]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4255643D2D; Sat, 17 Apr 2004 06:16:23 -0700 (PDT) (envelope-from setantae@submonkey.net) Received: from cpc2-cdif3-6-0-cust204.cdif.cable.ntl.com ([81.103.67.204] helo=shrike.submonkey.net ident=mailnull) by shaft.techsupport.co.uk with esmtp (TLSv1:DHE-RSA-AES256-SHA:256) (Exim 4.31; FreeBSD) id 1BEpgD-0003uW-I9; Sat, 17 Apr 2004 14:16:21 +0100 Received: from setantae by shrike.submonkey.net with local (Exim 4.31; FreeBSD) id 1BEpg9-000MTD-Aq; Sat, 17 Apr 2004 14:16:17 +0100 Date: Sat, 17 Apr 2004 14:16:17 +0100 From: Ceri Davies To: Bruce Evans Message-ID: <20040417131617.GD465@submonkey.net> Mail-Followup-To: Ceri Davies , Bruce Evans , Cyrille Lefevre , arch@freebsd.org, "current @FreeBSD.org" References: <00f401c4232b$8700d0c0$7890a8c0@dyndns.org> <022901c423cb$03f493b0$7890a8c0@dyndns.org> <20040417101206.N16280@gamplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="8jEihaNHb65WmIJG" Content-Disposition: inline In-Reply-To: <20040417101206.N16280@gamplex.bde.org> X-PGP: finger ceri@FreeBSD.org User-Agent: Mutt/1.5.4i Sender: Ceri Davies cc: "current @FreeBSD.org" cc: arch@freebsd.org Subject: Re: bin/41071: make NO to NO_ transition patch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 13:16:23 -0000 --8jEihaNHb65WmIJG Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sat, Apr 17, 2004 at 10:19:58AM +1000, Bruce Evans wrote:
> On Fri, 16 Apr 2004, Cyrille Lefevre wrote:
>
> > the purpose of the original thread was to convert "all" variables to one
> > form or another, not just some of them for some or other reason.
> > in one word... be consistant, it's help.
> > it's not my fault if only some "no conforming" variables regarding all
> > others are used in so many makefiles.
>
> For a more modest task, try fixing the English spelling of "nothing"
> to "no thing" and "consistent" to "consistant".

There's noway that could be considered the same thing.

(Disclaimer: I really don't care which way this falls out).

Ceri
--
--8jEihaNHb65WmIJG Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (FreeBSD) iD8DBQFAgS4hocfcwTS3JF8RAgEEAKCnW9gfKVz/r904MuZmLJ8ZJHwhmACfTqTn xO4Q9WLRYdaD5mm2MEzJdP0= =zczZ -----END PGP SIGNATURE----- --8jEihaNHb65WmIJG--
From owner-freebsd-arch@FreeBSD.ORG Sat Apr 17 11:02:08 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7199116A4CE; Sat, 17 Apr 2004 11:02:08 -0700 (PDT) Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [128.30.28.20]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2900743D1D; Sat, 17 Apr 2004 11:02:08 -0700 (PDT) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: from khavrinen.lcs.mit.edu (localhost [IPv6:::1]) by khavrinen.lcs.mit.edu (8.12.9/8.12.9) with ESMTP id i3HI27XX026261 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK CN=khavrinen.lcs.mit.edu issuer=SSL+20Client+20CA); Sat, 17 Apr 2004 14:02:07 -0400 (EDT) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.12.9/8.12.9/Submit) id i3HI26T4026258; Sat, 17 Apr 2004 14:02:06 -0400 (EDT)
(envelope-from wollman) Date: Sat, 17 Apr 2004 14:02:06 -0400 (EDT) From: Garrett Wollman Message-Id: <200404171802.i3HI26T4026258@khavrinen.lcs.mit.edu> To: "Brian F. Feldman" In-Reply-To: <200404170513.i3H5DDgq033705@green.homeunix.org> References: <200404170447.i3H4l6Hn021993@khavrinen.lcs.mit.edu> <200404170513.i3H5DDgq033705@green.homeunix.org> X-Spam-Score: -19.8 () IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,REPLY_WITH_QUOTES X-Scanned-By: MIMEDefang 2.37 cc: arch@freebsd.org Subject: Re: kqueue giant-locking (&kq_Giant, locking) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 18:02:08 -0000 < said: > Contrived. Let's see one. There won't be any -- they will be using > threads, not kqueues, because threads work on more than one system. Except, of course, that the thread library may use kqueue internally. > In case > you didn't notice, kqueues have been horribly broken for years now For values of ``horribly broken'' apparently equal to ``not understood by green''. 
-GAWollman From owner-freebsd-arch@FreeBSD.ORG Sat Apr 17 12:00:42 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 66FC316A4CE; Sat, 17 Apr 2004 12:00:42 -0700 (PDT) Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 56A6D43D45; Sat, 17 Apr 2004 12:00:42 -0700 (PDT) (envelope-from peter@evilpete.dyndns.org) Received: from fw.wemm.org (canning.wemm.org [192.203.228.65]) by canning.wemm.org (Postfix) with ESMTP id ECE7F2A8D5; Sat, 17 Apr 2004 12:00:41 -0700 (PDT) (envelope-from peter@overcee.wemm.org) Received: from overcee.wemm.org (overcee.wemm.org [10.0.0.3]) by fw.wemm.org (Postfix) with ESMTP id CBE2EE259; Sat, 17 Apr 2004 12:00:43 -0700 (PDT) (envelope-from peter@overcee.wemm.org) Received: from overcee.wemm.org (localhost [127.0.0.1]) by overcee.wemm.org (8.12.11/8.12.11) with ESMTP id i3HIxorI036217; Sat, 17 Apr 2004 11:59:50 -0700 (PDT) (envelope-from peter@overcee.wemm.org) Received: from localhost (localhost [[UNIX: localhost]]) by overcee.wemm.org (8.12.11/8.12.11/Submit) id i3HIxogW036216; Sat, 17 Apr 2004 11:59:50 -0700 (PDT) (envelope-from peter) From: Peter Wemm To: freebsd-arch@freebsd.org Date: Sat, 17 Apr 2004 11:59:50 -0700 User-Agent: KMail/1.6.1 References: <200404170513.i3H5DDgq033705@green.homeunix.org> In-Reply-To: <200404170513.i3H5DDgq033705@green.homeunix.org> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200404171159.50311.peter@wemm.org> cc: arch@freebsd.org cc: Garrett Wollman Subject: Re: kqueue giant-locking (&kq_Giant, locking) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 
19:00:42 -0000 On Friday 16 April 2004 10:13 pm, Brian F. Feldman wrote: > Garrett Wollman wrote: > > In article <200404170330.i3H3Ul0t032543@green.homeunix.org> you write: > > >I can't imagine a well-designed applications has kqueues of > > > kqueues. > > > > I can in about five seconds' worth of thought. > > > > Suppose you have library X. It accomplishes some task > > asynchronously (it doesn't matter what or how), and provides a > > descriptor that the calling application must poll for completion. > > Now use that library into an application that has its own event > > loop. > > > > This is one of the specific motivating examples behind doing kqueue > > rather than simply extending poll() or select(). Please go and > > read the papers before you continue down this path. > > Contrived. Let's see one. There won't be any -- they will be using > threads, not kqueues, because threads work on more than one system. Actually no. We do this sort of nesting at work. And we don't use threads. It's not a contrived example.
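The nesting pattern in the quoted "library X" example (a library that multiplexes its own descriptors and hands the application one descriptor to poll) can be sketched roughly as follows. This is only an illustration in Python, not code from anyone in this thread: LibraryX, its start() and poll() methods, run_once(), and the pipe standing in for the asynchronous task are all hypothetical names. It also assumes a platform where selectors.DefaultSelector is backed by kqueue or epoll, so that the selector itself has a descriptor that another selector can watch.

```python
import os
import selectors


class LibraryX:
    """Hypothetical library that does its work asynchronously.

    Internally it multiplexes its own descriptors with a selector
    (kqueue on the BSDs, epoll on Linux) and exposes only that
    selector's descriptor to the caller.
    """

    def __init__(self):
        self._inner = selectors.DefaultSelector()
        # A pipe stands in for whatever the library really waits on.
        self._rfd, self._wfd = os.pipe()
        self._inner.register(self._rfd, selectors.EVENT_READ)

    def fileno(self):
        # The one descriptor the application is asked to poll.
        return self._inner.fileno()

    def start(self):
        # Pretend the asynchronous task has just completed.
        os.write(self._wfd, b"x")

    def poll(self):
        # Drain whatever the inner selector reports as ready.
        return [os.read(key.fd, 1) for key, _ in self._inner.select(timeout=0)]

    def close(self):
        self._inner.close()
        os.close(self._rfd)
        os.close(self._wfd)


def run_once(lib, outer, timeout=1.0):
    """One turn of the application's own event loop."""
    results = []
    for key, _ in outer.select(timeout=timeout):
        if key.fileobj is lib:
            results.extend(lib.poll())
    return results


if __name__ == "__main__":
    lib = LibraryX()
    outer = selectors.DefaultSelector()        # the application's event loop
    outer.register(lib, selectors.EVENT_READ)  # selector nested in a selector
    lib.start()
    print(run_once(lib, outer))
    outer.close()
    lib.close()
```

The application never learns what LibraryX waits on; it only sees that the library's descriptor became readable. On a kqueue-backed selector that registration is a kqueue watched by another kqueue for EVFILT_READ, which is the case under discussion.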
-- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 From owner-freebsd-arch@FreeBSD.ORG Sat Apr 17 12:06:02 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 8289A16A4CE; Sat, 17 Apr 2004 12:06:01 -0700 (PDT) Received: from localhost (green@localhost [127.0.0.1]) by green.homeunix.org (8.12.11/8.12.11) with ESMTP id i3HJ5xiY041132; Sat, 17 Apr 2004 15:06:00 -0400 (EDT) (envelope-from green@green.homeunix.org) Message-Id: <200404171906.i3HJ5xiY041132@green.homeunix.org> X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4 To: Garrett Wollman In-Reply-To: Message from Garrett Wollman <200404171802.i3HI26T4026258@khavrinen.lcs.mit.edu> From: "Brian F. Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sat, 17 Apr 2004 15:05:59 -0400 Sender: green@green.homeunix.org cc: arch@freebsd.org Subject: Re: kqueue giant-locking (&kq_Giant, locking) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 19:06:02 -0000 Garrett Wollman wrote: > < said: > > > Contrived. Let's see one. There won't be any -- they will be using > > threads, not kqueues, because threads work on more than one system. > > Except, of course, that the thread library may use kqueue internally. Then we don't do that. > > In case > > you didn't notice, kqueues have been horribly broken for years now > > For values of ``horribly broken'' apparently equal to ``not understood > by green''. For values of ``horribly broken'' apparently equal to ``does not respect any locking constraints,'' ``does not have semantics which support the idea of a non-spl system,'' and ones like that, yeah. 
-- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org \ The Power to Serve! \ Opinions expressed are my own. \,,,,,,,,,,,,,,,,,,,,,,\ From owner-freebsd-arch@FreeBSD.ORG Sat Apr 17 15:14:44 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id ACBA716A4CE for ; Sat, 17 Apr 2004 15:14:44 -0700 (PDT) Received: from mail4.speakeasy.net (mail4.speakeasy.net [216.254.0.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7812543D1D for ; Sat, 17 Apr 2004 15:14:44 -0700 (PDT) (envelope-from jmg@hydrogen.funkthat.com) Received: (qmail 3172 invoked from network); 17 Apr 2004 22:14:44 -0000 Received: from dsl017-045-168.spk4.dsl.speakeasy.net (HELO hydrogen.funkthat.com) ([69.17.45.168]) (envelope-sender ) by mail4.speakeasy.net (qmail-ldap-1.03) with SMTP for ; 17 Apr 2004 22:14:44 -0000 Received: from hydrogen.funkthat.com (qnunef@localhost.funkthat.com [127.0.0.1])i3HMEgOE074631; Sat, 17 Apr 2004 15:14:43 -0700 (PDT) (envelope-from jmg@hydrogen.funkthat.com) Received: (from jmg@localhost) by hydrogen.funkthat.com (8.12.10/8.12.10/Submit) id i3HMEg7e074630; Sat, 17 Apr 2004 15:14:42 -0700 (PDT) Date: Sat, 17 Apr 2004 15:14:42 -0700 From: John-Mark Gurney To: Brian Fundakowski Feldman Message-ID: <20040417221442.GW567@funkthat.com> Mail-Followup-To: Brian Fundakowski Feldman , arch@freebsd.org References: <200404170212.i3H2Cg8n031749@green.homeunix.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200404170212.i3H2Cg8n031749@green.homeunix.org> User-Agent: Mutt/1.4.1i X-Operating-System: FreeBSD 4.2-RELEASE i386 X-PGP-Fingerprint: B7 EC EF F8 AE ED A7 31 96 7A 22 B3 D8 56 36 F4 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html cc: arch@freebsd.org Subject: Re: kqueue giant-locking (&kq_Giant, locking) 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: John-Mark Gurney List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 22:14:44 -0000 Brian Fundakowski Feldman wrote this message on Fri, Apr 16, 2004 at 22:12 -0400: > I believe I have come up with a good solution to the kqueue woes in 5.X, and > I'd like to get some feedback on work that so far is letting me (on > uniprocessor, at least) run make -j8 buildworld, with USE_KQUEUE in make(1), > with no ill effect :) The locking thus far is one global kqueue lock, and I > firmly believe we should use MUTEX_PROFILING to determine if we should lock > it down any further at this point. Ok, are you going to put together a 96 way SMP box with 90 different webservers running to make sure this will scale that far?? Sure, a global lock might work for a 2- or 4- way box, but are you prepared to do the work necessary to make sure this is not a problem?? I thought the point of 5.x was to get things under their own locks instead of moving to an spl based system (which is pretty much what you've reimplemented)... > 1. The recursion has been removed from kqueue. This means kqueues cannot be > added to other kqueues for EVFILT_READ -- yes, that ability has been > around since r1.1 of kern_event.c, but it is utterly pointless and if you > take a look at my previous patch, severely complicates many things. Of > course, I'm sure someone will notice and complain, but there isn't any > documentation that suggests you should kevent() another kqueue(). This is a bug as other people point out... Are you going to make it so you can't select/poll on a kqueue too? -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." 
From owner-freebsd-arch@FreeBSD.ORG Sat Apr 17 15:48:46 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 57FE316A4CE; Sat, 17 Apr 2004 15:48:46 -0700 (PDT) Received: from localhost (green@localhost [127.0.0.1]) by green.homeunix.org (8.12.11/8.12.11) with ESMTP id i3HMmiVI042942; Sat, 17 Apr 2004 18:48:45 -0400 (EDT) (envelope-from green@green.homeunix.org) Message-Id: <200404172248.i3HMmiVI042942@green.homeunix.org> X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4 To: Brian Fundakowski Feldman , arch@freebsd.org In-Reply-To: Message from John-Mark Gurney of "Sat, 17 Apr 2004 15:14:42 PDT." <20040417221442.GW567@funkthat.com> From: "Brian F. Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sat, 17 Apr 2004 18:48:43 -0400 Sender: green@green.homeunix.org Subject: Re: kqueue giant-locking (&kq_Giant, locking) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Apr 2004 22:48:47 -0000 John-Mark Gurney wrote: > Brian Fundakowski Feldman wrote this message on Fri, Apr 16, 2004 at 22:12 -0400: > > I believe I have come up with a good solution to the kqueue woes in 5.X, and > > I'd like to get some feedback on work that so far is letting me (on > > uniprocessor, at least) run make -j8 buildworld, with USE_KQUEUE in make(1), > > with no ill effect :) The locking thus far is one global kqueue lock, and I > > firmly believe we should use MUTEX_PROFILING to determine if we should lock > > it down any further at this point. > > Ok, are you going to put together a 96 way SMP box with 90 different > webservers running to make sure this will scale that far?? 
Sure, a > global lock might work for a 2- or 4- way box, but are you prepared to > do the work necessary to make sure this is not a problem?? > > I thought the point of 5.x was to get things under their own locks > instead of moving to an spl based system (which is pretty much what > you've reimplemented)... NO ONE is going to put together a 96 way SMP box and try/want to run FreeBSD. FreeBSD is not going to scale there and kqueue will not be the only reason it won't, if that WOULD be a reason at all. That kind of thinking is how we get far too complicated locking schemes that hurt performance instead of improving it. How many instructions are run to put a knote on its queues? The 2- and 4- and maybe one day 8- (but it doesn't work now!) boxes will probably never have contention with a global lock with kqueue. What makes you think a subsystem lock is "bad" for kqueue? The scheduler has one, select()/poll() have one, semaphore, time, dev_t.... > > 1. The recursion has been removed from kqueue. This means kqueues cannot be > > added to other kqueues for EVFILT_READ -- yes, that ability has been > > around since r1.1 of kern_event.c, but it is utterly pointless and if you > > take a look at my previous patch, severely complicates many things. Of > > course, I'm sure someone will notice and complain, but there isn't any > > documentation that suggests you should kevent() another kqueue(). > > This is a bug as other people point out... Are you going to make it so > you can't select/poll on a kqueue too? Yes, and I'm going to gratuitously change around the kevent structure just to rename all the elements. .... IFF you in the future want to have more locking, like per-kqueue locks, you are going to be bitten IN THE ASS by this because the kqueue is the parent of a knote, not the other way around, and you can't lock both ways. It just don't work. 
You need to stop complaining about what has to be removed first to make kqueue not be totally broken unless you're coming up, now, with how to fix that. This is how the kernel gets so damn big and broken and so many things get reinvented all the time. People add features that are not well-thought-out and implementations nigh on impossible to modernify. You just can't do this shit. If you want to have anything other than a giant lock on kqueues, you would not be saying "WE CANNOT REMOVE THAT!" without saying how things have to be completely redesigned to take that into account. -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org \ The Power to Serve! \ Opinions expressed are my own. \,,,,,,,,,,,,,,,,,,,,,,\