From owner-freebsd-arch@FreeBSD.ORG Sun Feb 17 18:36:25 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BBBCE16A417 for ; Sun, 17 Feb 2008 18:36:25 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id 7F11013C455 for ; Sun, 17 Feb 2008 18:36:25 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from localhost (localhost [127.0.0.1]) by harmony.bsdimp.com (8.14.2/8.14.1) with ESMTP id m1HISQKm000137; Sun, 17 Feb 2008 11:28:27 -0700 (MST) (envelope-from imp@bsdimp.com) Date: Sun, 17 Feb 2008 11:33:40 -0700 (MST) Message-Id: <20080217.113340.390436320.imp@bsdimp.com> To: julian@elischer.org From: "M. Warner Losh" In-Reply-To: <47B3EB4E.40508@elischer.org> References: <86ve4s9357.fsf@ds4.des.no> <20080213184607.GK1340@hoeg.nl> <47B3EB4E.40508@elischer.org> X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Cc: des@des.no, ed@fxq.nl, freebsd-arch@FreeBSD.org Subject: Re: Proposal for redesigning the TTY layer X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 17 Feb 2008 18:36:25 -0000 In message: <47B3EB4E.40508@elischer.org> Julian Elischer writes: : Ed Schouten wrote: : > Hello Dag-Erling, : > : > * Dag-Erling Smørgrav wrote: : >> Ed Schouten writes: : >>> The last couple of days I've been working on a document which describes : >>> the changes I'm going to perform. I have just finished this document, so : >>> I'm sending it to this list, so you can give your opinion on this : >>> matter. : >> wiki.freebsd.org please :) : > : > Great idea.
I'll take some time to write an article when I'm at the : > office tomorrow. : > : >>> As stated in the conclusion of this document, I am willing to continue : >>> development after I graduate. Unfortunately I don't possess all hardware : >>> supported by the TTY layer, which means I could use some help after I : >>> finished my internship to bring back support for the remaining hardware. : >> Perhaps you could provide a short list of the types of hardware you need : >> help with? : > : > The only drivers I can take a look at, are: : > : > - syscons : > - sio and uart : > - pty : > - u(pl)com : : gxemul 'test' console : : http://gavare.se/gxemul/gxemul-stable/doc/experiments.html#hello : : :-) : (the simplest console in the universe) I have code that works with it in the current system. Warner
From owner-freebsd-arch@FreeBSD.ORG Sun Feb 17 19:10:43 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B6E4316A41B for ; Sun, 17 Feb 2008 19:10:43 +0000 (UTC) (envelope-from rink@tragedy.rink.nu) Received: from mx1.rink.nu (alastor.rink.nu [213.34.49.5]) by mx1.freebsd.org (Postfix) with ESMTP id 73A9513C45E for ; Sun, 17 Feb 2008 19:10:43 +0000 (UTC) (envelope-from rink@tragedy.rink.nu) Received: from localhost (alastor.rink.nu [213.34.49.5]) by mx1.rink.nu (Postfix) with ESMTP id CC4CABFEBDC; Sun, 17 Feb 2008 18:51:38 +0000 (UTC) X-Virus-Scanned: amavisd-new at rink.nu Received: from mx1.rink.nu ([213.34.49.5]) by localhost (alastor.rink.nu [213.34.49.5]) (amavisd-new, port 10024) with ESMTP id 1-7eQN+Ldys6; Sun, 17 Feb 2008 18:51:31 +0000 (UTC) Received: from tragedy.rink.nu (tragedy.rink.nu [213.34.49.3]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.rink.nu (Postfix) with ESMTP id 588F3BFEB7A; Sun, 17 Feb 2008 18:51:31 +0000 (UTC) Received: from tragedy.rink.nu
(tragedy.rink.nu [213.34.49.3]) by tragedy.rink.nu (8.13.8/8.13.8) with ESMTP id m1HIpVRT001411; Sun, 17 Feb 2008 19:51:31 +0100 (CET) (envelope-from rink@tragedy.rink.nu) Received: (from rink@localhost) by tragedy.rink.nu (8.13.8/8.13.8/Submit) id m1HIpUU3001410; Sun, 17 Feb 2008 19:51:30 +0100 (CET) (envelope-from rink) Date: Sun, 17 Feb 2008 19:51:30 +0100 From: Rink Springer To: Poul-Henning Kamp Message-ID: <20080217185130.GA98720@rink.nu> References: <20080216222230.GA47480@lpthe.jussieu.fr> <13491.1203201868@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <13491.1203201868@critter.freebsd.dk> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: arch@freebsd.org, Michel Talon Subject: Re: Fifolog - a circular file for embedded systems X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 17 Feb 2008 19:10:43 -0000 On Sat, Feb 16, 2008 at 10:44:28PM +0000, Poul-Henning Kamp wrote: > But yes, for a number of reasons, I personally lean towards the base > system, as that would allow us to offer it as an option for syslog > also on regular systems. I think having this in base would be great - a port may work well too, but since it's pretty small and quite useful to have without having to install ports, I'd vote for the base system as well. -- Rink P.W. Springer - http://rink.nu "Anyway boys, this is America. Just because you get more votes doesn't mean you win." 
- Fox Mulder From owner-freebsd-arch@FreeBSD.ORG Sun Feb 17 21:24:49 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1F06616A420 for ; Sun, 17 Feb 2008 21:24:49 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from mail.cksoft.de (mail.cksoft.de [62.111.66.27]) by mx1.freebsd.org (Postfix) with ESMTP id BBFF513C4E5 for ; Sun, 17 Feb 2008 21:24:48 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from localhost (amavis.str.cksoft.de [192.168.74.71]) by mail.cksoft.de (Postfix) with ESMTP id 4A47A41C7AC; Sun, 17 Feb 2008 22:24:47 +0100 (CET) X-Virus-Scanned: amavisd-new at cksoft.de Received: from mail.cksoft.de ([62.111.66.27]) by localhost (amavis.str.cksoft.de [192.168.74.71]) (amavisd-new, port 10024) with ESMTP id 5V7m8Uu0v6BZ; Sun, 17 Feb 2008 22:24:46 +0100 (CET) Received: by mail.cksoft.de (Postfix, from userid 66) id EF16A41C7AB; Sun, 17 Feb 2008 22:24:46 +0100 (CET) Received: from maildrop.int.zabbadoz.net (maildrop.int.zabbadoz.net [10.111.66.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.int.zabbadoz.net (Postfix) with ESMTP id 3986744487F; Sun, 17 Feb 2008 21:24:26 +0000 (UTC) Date: Sun, 17 Feb 2008 21:24:26 +0000 (UTC) From: "Bjoern A. 
Zeeb" X-X-Sender: bz@maildrop.int.zabbadoz.net To: Robert Watson In-Reply-To: <20080106124517.G105@fledge.watson.org> Message-ID: <20080217210205.A49429@maildrop.int.zabbadoz.net> References: <20080106124517.G105@fledge.watson.org> X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org, kmacy@FreeBSD.org, net@FreeBSD.org Subject: Re: Network device driver KPI/ABI and TOE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 17 Feb 2008 21:24:49 -0000 On Sun, 6 Jan 2008, Robert Watson wrote: Hi, [cutting a long mail short and randomly replying;-)] I came across dev/cxgb/ulp/tom/cxgb_tcp_subr.c vs. netinet/tcp_subr.c and I am a bit worried about the way things are done atm. For the functions copied over, the only changes are ones like:

-	tp = cxgb_tcp_drop(tp, ECONNABORTED);
+	tp = tcp_drop(tp, ECONNABORTED);

-	notify = cxgb_tcp_drop_syn_sent;
+	notify = tcp_drop_syn_sent;

-	tcp_gen_listen_close(tp);
+	tcp_offload_listen_close(tp);

-	(void) tcp_gen_reset(tp);
+	(void) tcp_output_reset(tp);

and SYSCTL stuff.

This is a "problem" for the following reasons:
- code duplication
- if one changes netinet/tcp_subr.c one has to change foo4_tcp_subr.c as well
- if more drivers are going to implement things that way it'll be even more code duplication.
- developers will have to check lots of different places they might not expect in the first place.
- those things might interfere with our locking as well.

I assume (without looking) the other files in the tom directory expose similar behavior. So this is a more general problem: we need to seriously think about abstracting our tcp_subr.c (and other) functions to avoid this duplication, or at least integrate things better in other ways.
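One way to picture the abstraction asked for here is a single generic routine that consults an optional table of offload hooks, instead of each TOE driver carrying its own copy of tcp_subr.c. The sketch below is a userland model only, under that assumption: struct conn, struct offload_ops, generic_conn_drop(), generic_listen_close() and the example_toe_* names are hypothetical and are not the netinet or cxgb KPI.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Userland model only.  All names are hypothetical; they are not the
 * netinet/ or dev/cxgb/ KPI.
 */
struct conn;

/* Hooks an offload-capable driver may supply; any of them may be NULL. */
struct offload_ops {
	void	(*listen_close)(struct conn *);
	int	(*output_reset)(struct conn *);
};

struct conn {
	const struct offload_ops *ops;	/* NULL: handled by the host stack */
	int	closed;
	int	error;
	int	host_resets;		/* resets sent by the software path */
	int	offload_resets;		/* resets sent by the offload hook */
};

/* Stand-in for the host stack's own reset path. */
static int
host_output_reset(struct conn *c)
{
	c->host_resets++;
	return (0);
}

/* One shared drop routine instead of a per-driver copy of it. */
void
generic_conn_drop(struct conn *c, int error)
{
	if (c->ops != NULL && c->ops->output_reset != NULL)
		(void)c->ops->output_reset(c);
	else
		(void)host_output_reset(c);
	c->error = error;
	c->closed = 1;
}

/* Shared listen-close path; the hook only adds driver-specific work. */
void
generic_listen_close(struct conn *c)
{
	if (c->ops != NULL && c->ops->listen_close != NULL)
		c->ops->listen_close(c);
	c->closed = 1;
}

/* Example hook a hypothetical TOE driver might register. */
int
example_toe_output_reset(struct conn *c)
{
	c->offload_resets++;
	return (0);
}

const struct offload_ops example_toe_ops = {
	.listen_close = NULL,
	.output_reset = example_toe_output_reset,
};
```

With this shape a fix to the shared routine is made once, and a driver only supplies the few hooks where its behaviour really differs, much like the tcp_gen_reset()/tcp_output_reset() pair in the diff above.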
This is mostly asking networking people to think about this so we can iteratively improve things. cxgb has done a good first step in that direction, now is the time to further hone things. /bz -- Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT Software is harder than hardware so better get it right the first time. From owner-freebsd-arch@FreeBSD.ORG Sun Feb 17 21:24:59 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5FDFB16A46B for ; Sun, 17 Feb 2008 21:24:59 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211]) by mx1.freebsd.org (Postfix) with ESMTP id 1CCE513C4CC for ; Sun, 17 Feb 2008 21:24:58 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: by palm.hoeg.nl (Postfix, from userid 1000) id 1A6BB1CCA3; Sun, 17 Feb 2008 22:24:57 +0100 (CET) Date: Sun, 17 Feb 2008 22:24:57 +0100 From: Ed Schouten To: freebsd-arch@freebsd.org Message-ID: <20080217212457.GX1340@hoeg.nl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="va9XEZk9/dJ5GUjX" Content-Disposition: inline User-Agent: Mutt/1.5.17 (2007-11-01) Subject: Device minor number uniqueness X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 17 Feb 2008 21:24:59 -0000 --va9XEZk9/dJ5GUjX Content-Type: multipart/mixed; boundary="vM12nk/63StVgfqY" Content-Disposition: inline --vM12nk/63StVgfqY Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hello, This week I was looking at the ttycreate() function in tty.c. As you can see, it generates unique numbers and passes them to make_dev() to make sure each TTY device node has its own number. 
The manual page has a rather brief description of the `minor' argument of make_dev(): The cdev returned by make_dev() and make_dev_alias() has two fields, si_drv1 and si_drv2, that are available to store state. Both fields are of type void *. These are designed to replace the minor argument to make_dev(). I discovered you get a panic when you call make_dev() multiple times with the same minor number, because newdev() always walks down the device list to return an existing device which shares the same minor number. After digging into some more source code, it turns out a lot of drivers use minor number to store device numbers and such, but there is no real reason why we should enforce drivers to use unique minor numbers. Because we cannot change this behaviour at once, I'm proposing the following patch, which adds a new flag called D_MULTIMINOR. This flag allows you to create multiple devices that share the same minor number. This way there is no need for creating your own unrhdr to hold some kind of free list.
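The unrhdr mentioned here belongs to the kernel's unit number allocator, unr(9) (new_unrhdr(), alloc_unr(), free_unr()), which is presumably also what Poul-Henning Kamp means by "the unit number allocation functions" later in the thread. What follows is a simplified userland model of the idea, not the kernel implementation: it hands out the lowest free number in a fixed range and lets numbers be returned for reuse. The unit_pool names are hypothetical, a plain bool array replaces the kernel's compact representation, and there is no locking.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Simplified userland model of a unit-number allocator in the spirit of
 * unr(9).  Hypothetical names; no locking; one bool per unit instead of
 * the kernel's compact representation.
 */
struct unit_pool {
	int	low;		/* first valid unit */
	size_t	count;		/* number of units in [low, high] */
	bool	*used;		/* used[i] tracks unit low + i */
};

/* Create a pool for units low..high inclusive; requires 0 <= low <= high. */
struct unit_pool *
unit_pool_create(int low, int high)
{
	struct unit_pool *p;

	if (low < 0 || low > high)
		return (NULL);
	p = malloc(sizeof(*p));
	if (p == NULL)
		return (NULL);
	p->low = low;
	p->count = (size_t)(high - low) + 1;
	p->used = calloc(p->count, sizeof(bool));
	if (p->used == NULL) {
		free(p);
		return (NULL);
	}
	return (p);
}

/* Return the lowest free unit, or -1 when the range is exhausted. */
int
unit_alloc(struct unit_pool *p)
{
	for (size_t i = 0; i < p->count; i++) {
		if (!p->used[i]) {
			p->used[i] = true;
			return (p->low + (int)i);
		}
	}
	return (-1);
}

/* Give a unit back so a later unit_alloc() can hand it out again. */
void
unit_free(struct unit_pool *p, int unit)
{
	if (unit >= p->low && (size_t)(unit - p->low) < p->count)
		p->used[unit - p->low] = false;
}

void
unit_pool_destroy(struct unit_pool *p)
{
	if (p != NULL) {
		free(p->used);
		free(p);
	}
}
```

In a driver, a number taken from such a pool would be passed to make_dev() and handed back after destroy_dev(); that bookkeeping is what the proposed flag is meant to make unnecessary.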
-- Ed Schouten WWW: http://g-rave.nl/ --vM12nk/63StVgfqY Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="make_dev.diff" Content-Transfer-Encoding: quoted-printable
==== //depot/user/ed/mpsafetty/sys/kern/kern_conf.c#3 - /home/ed/p4/mpsafetty/sys/kern/kern_conf.c ====
@@ -458,10 +458,12 @@
 
 	mtx_assert(&devmtx, MA_OWNED);
 	udev = y;
-	LIST_FOREACH(si2, &csw->d_devs, si_list) {
-		if (si2->si_drv0 == udev) {
-			dev_free_devlocked(si);
-			return (si2);
+	if (!(csw->d_flags & D_MULTIMINOR)) {
+		LIST_FOREACH(si2, &csw->d_devs, si_list) {
+			if (si2->si_drv0 == udev) {
+				dev_free_devlocked(si);
+				return (si2);
+			}
 		}
 	}
 	si->si_drv0 = udev;
==== //depot/user/ed/mpsafetty/sys/sys/conf.h#3 - /home/ed/p4/mpsafetty/sys/sys/conf.h ====
@@ -171,6 +171,7 @@
 #define	D_MMAP_ANON	0x00100000	/* special treatment in vm_mmap.c */
 #define	D_PSEUDO	0x00200000	/* make_dev() can return NULL */
 #define	D_NEEDGIANT	0x00400000	/* driver want Giant */
+#define	D_MULTIMINOR	0x00800000	/* don't track minor numbers */
 
 /*
  * Version numbers.
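To see what the patch changes, the userland model below mirrors the lookup in the patched newdev(): without the flag, a request for an already-used minor number gets the existing node back (Ed reports that this ends in a panic from make_dev()), while with D_MULTIMINOR a fresh node is created each time. This is a sketch under simplifying assumptions: model_cdev and model_cdevsw are hypothetical stand-ins for struct cdev and struct cdevsw, the flag value is copied from the patch, the real function allocates first and frees on a match, and devmtx locking is omitted.

```c
#include <stdlib.h>

/* Value taken from the proposed sys/conf.h change. */
#define	MODEL_D_MULTIMINOR	0x00800000

/* Hypothetical stand-in for struct cdev. */
struct model_cdev {
	int	minor;			/* plays the role of si_drv0 */
	void	*drv1;			/* per-device driver state, like si_drv1 */
	struct model_cdev *next;
};

/* Hypothetical stand-in for struct cdevsw and its d_devs list. */
struct model_cdevsw {
	unsigned int	flags;
	struct model_cdev *devs;
};

/*
 * Look up or create the node for `minor'.  Without MODEL_D_MULTIMINOR an
 * existing node with the same minor is returned instead of a new one.
 */
struct model_cdev *
model_newdev(struct model_cdevsw *csw, int minor)
{
	struct model_cdev *si;

	if (!(csw->flags & MODEL_D_MULTIMINOR)) {
		for (si = csw->devs; si != NULL; si = si->next)
			if (si->minor == minor)
				return (si);
	}
	si = calloc(1, sizeof(*si));
	if (si == NULL)
		return (NULL);
	si->minor = minor;
	si->next = csw->devs;
	csw->devs = si;
	return (si);
}
```

With the flag set, per-device state is expected to live in the node itself (si_drv1/si_drv2 in the real struct cdev), as the make_dev() manual page text quoted earlier recommends.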
--vM12nk/63StVgfqY-- --va9XEZk9/dJ5GUjX Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAke4pigACgkQ52SDGA2eCwXR6wCff1IHRNanZ/QP0utNXpAVkFPF UEMAnR2SNtGWau6WpcLyX30zI98F26F3 =OzuT -----END PGP SIGNATURE----- --va9XEZk9/dJ5GUjX-- From owner-freebsd-arch@FreeBSD.ORG Sun Feb 17 21:33:43 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ED39C16A417 for ; Sun, 17 Feb 2008 21:33:43 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id AEB5613C43E for ; Sun, 17 Feb 2008 21:33:43 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 65C7E17105; Sun, 17 Feb 2008 21:33:42 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m1HLXfr7024780; Sun, 17 Feb 2008 21:33:41 GMT (envelope-from phk@critter.freebsd.dk) To: Ed Schouten From: "Poul-Henning Kamp" In-Reply-To: Your message of "Sun, 17 Feb 2008 22:24:57 +0100." 
<20080217212457.GX1340@hoeg.nl> Date: Sun, 17 Feb 2008 21:33:41 +0000 Message-ID: <24779.1203284021@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: freebsd-arch@freebsd.org Subject: Re: Device minor number uniqueness X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 17 Feb 2008 21:33:44 -0000 In message <20080217212457.GX1340@hoeg.nl>, Ed Schouten writes: >After digging into some more source code, it turns out a lot of drivers >use minor number to store device numbers and such, but there is no real >reason why we should enforce drivers to use unique minor numbers. the major & minor together combine to the userland concept of a "dev_t" which conforms to POSIX. While I don't think POSIX demands that dev_t has to be unique per device, the amount of software that assumes them to be is not to be sneezed at. If you just need a minor number to fill out the field, use the unit number allocation functions. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
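Poul-Henning Kamp's concern is userland code that treats st_rdev as a device identity. A typical idiom of that kind is sketched below: two paths are taken to name the same device when both are character devices with equal st_rdev, so two distinct device nodes that reported the same st_rdev would be conflated. The function name same_char_device() is hypothetical; only POSIX stat(2) and S_ISCHR() are used.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Userland idiom that assumes st_rdev identifies a device.  Returns 1 if
 * both paths are character devices with equal st_rdev, 0 if not, and -1
 * if either path cannot be stat'ed.
 */
int
same_char_device(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return (-1);
	if (!S_ISCHR(sa.st_mode) || !S_ISCHR(sb.st_mode))
		return (0);
	return (sa.st_rdev == sb.st_rdev);
}
```

Whether such code keeps working under D_MULTIMINOR depends on how devfs fills in st_rdev, which is the point Ed takes up in his reply; the sketch itself does not test that.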
From owner-freebsd-arch@FreeBSD.ORG Sun Feb 17 21:44:35 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0D92D16A41A for ; Sun, 17 Feb 2008 21:44:35 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211]) by mx1.freebsd.org (Postfix) with ESMTP id B564313C46E for ; Sun, 17 Feb 2008 21:44:34 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: by palm.hoeg.nl (Postfix, from userid 1000) id C19C91CCA3; Sun, 17 Feb 2008 22:44:33 +0100 (CET) Date: Sun, 17 Feb 2008 22:44:33 +0100 From: Ed Schouten To: Poul-Henning Kamp Message-ID: <20080217214433.GY1340@hoeg.nl> References: <20080217212457.GX1340@hoeg.nl> <24779.1203284021@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="03sphU6jKm9HdgU1" Content-Disposition: inline In-Reply-To: <24779.1203284021@critter.freebsd.dk> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: freebsd-arch@freebsd.org Subject: Re: Device minor number uniqueness X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 17 Feb 2008 21:44:35 -0000 --03sphU6jKm9HdgU1 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable * Poul-Henning Kamp wrote: > In message <20080217212457.GX1340@hoeg.nl>, Ed Schouten writes: > > >After digging into some more source code, it turns out a lot of drivers > >use minor number to store device numbers and such, but there is no real > >reason why we should enforce drivers to use unique minor numbers. > > the major & minor together combine to the userland concept of a > "dev_t" which conforms to POSIX.
> > While I don't think POSIX demands that dev_t has to be unique per > device, the amount of software that assumes them to be is not to > be sneezed at. > > If you just need a minor number to fill out the field, use the > unit number allocation functions. It seems this minor number is completely unrelated to the numbers that are displayed through stat(2)'s st_rdev field. I just created about 400 device nodes without specifying a minor number and all nodes had a unique number. According to sysctl_devname() and devfs_getattr(), st_rdev is just based on the inode number of the device. -- Ed Schouten WWW: http://g-rave.nl/ --03sphU6jKm9HdgU1 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAke4qsEACgkQ52SDGA2eCwUgRgCfbdqW+LoyolgW1YbMzqvzVv/c PYkAoIBqkTfY8DYpZG2fk2LnM2ldB/6V =f2m+ -----END PGP SIGNATURE----- --03sphU6jKm9HdgU1--
From owner-freebsd-arch@FreeBSD.ORG Mon Feb 18 05:55:05 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 05F6616A418 for ; Mon, 18 Feb 2008 05:55:05 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from qb-out-0506.google.com (qb-out-0506.google.com [72.14.204.230]) by mx1.freebsd.org (Postfix) with ESMTP id AB9F113C467 for ; Mon, 18 Feb 2008 05:55:04 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by qb-out-0506.google.com with SMTP id o24so6732997qba.1 for ; Sun, 17 Feb 2008 21:55:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=balJ4I/NBIYSCRBIqWuW326c0HEI/3X216XFonSNKuY=;
b=pLZbF3Ch1KtGnPdkThunNplfa93Wbn+YgEh4yhjCVanE27v26Chf7T9fMysLcXg4GK60Drl4wcS+RjgT5dDakGOw52ff0Z8RWfjIQYN0LHFChQlw/cHQkHCOpHMma/eqdGLEGxO3fc2QXBoRj11GMXlHTSMZVlGPlDKDZzWwdt4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=ZcjDfd9LZ6+nhbcb3kQIc8pKSjkseytepQ/10/54qBTFS7XBggHSHG58hPJ2ayoJ4aClGhmfbBlWY8m5NqSK5EkuY4AfCPhDK0xU3gMFdqT0123jaX9YuktBKB4Blr4uvjymiLYC8ucjzdoLjvN3iWHrE10k+4f+LhtguRkI5uI= Received: by 10.114.168.1 with SMTP id q1mr6372831wae.96.1203313291351; Sun, 17 Feb 2008 21:41:31 -0800 (PST) Received: by 10.115.22.10 with HTTP; Sun, 17 Feb 2008 21:41:31 -0800 (PST) Message-ID: Date: Sun, 17 Feb 2008 21:41:31 -0800 From: "Kip Macy" To: "Bjoern A. Zeeb" In-Reply-To: <20080217210205.A49429@maildrop.int.zabbadoz.net> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20080106124517.G105@fledge.watson.org> <20080217210205.A49429@maildrop.int.zabbadoz.net> Cc: arch@freebsd.org, Robert Watson , kmacy@freebsd.org, net@freebsd.org Subject: Re: Network device driver KPI/ABI and TOE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2008 05:55:05 -0000 You might want to check out sys/modules/cxgb/tom/Makefile. -Kip On Feb 17, 2008 1:24 PM, Bjoern A. Zeeb wrote: > On Sun, 6 Jan 2008, Robert Watson wrote: > > Hi, > > [cutting a long mail short and randomly replying;-)] > > I came across dev/cxgb/ulp/tom/cxgb_tcp_subr.c vs. netinet/tcp_subr.c > and I am a bit worried with the way things are done atm. 
For those > functions copied over there are only changes like: > > - tp = cxgb_tcp_drop(tp, ECONNABORTED); > + tp = tcp_drop(tp, ECONNABORTED); > > - notify = cxgb_tcp_drop_syn_sent; > + notify = tcp_drop_syn_sent; > > - tcp_gen_listen_close(tp); > + tcp_offload_listen_close(tp); > > - (void) tcp_gen_reset(tp); > + (void) tcp_output_reset(tp); > > and SYSCTL stuff. > > > This is a "problem" for following reasons: > - code duplication > - if one changes netinet/tcp_subr.c one has to change foo4_tcp_subr.c > as well > - if more drivers are going to implement things that way it'll be > even more code duplication. > - developers will have to check lots of different places they might > not expect in first place. > - those things might interfere with our locking as well. > > I assume (without looking) the other files in the tom directory expose > similar behavior. So this is a more general problem: > > we need to seriously think about abstracting our tcp_subr.c (and > other) functions to avoid this duplication or at least integrate > things better by other ways. > > This is mostly asking networking people to think about this so we can > iteratively improve things. cxgb has done a good first step in that > direction, now is the time to further hone things. > > > /bz > > -- > Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT > Software is harder than hardware so better get it right the first time. 
> > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Mon Feb 18 08:35:08 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B8B416A417; Mon, 18 Feb 2008 08:35:08 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from mail.cksoft.de (mail.cksoft.de [62.111.66.27]) by mx1.freebsd.org (Postfix) with ESMTP id 2F38A13C469; Mon, 18 Feb 2008 08:35:08 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from localhost (amavis.str.cksoft.de [192.168.74.71]) by mail.cksoft.de (Postfix) with ESMTP id 06FA741C751; Mon, 18 Feb 2008 09:35:06 +0100 (CET) X-Virus-Scanned: amavisd-new at cksoft.de Received: from mail.cksoft.de ([62.111.66.27]) by localhost (amavis.str.cksoft.de [192.168.74.71]) (amavisd-new, port 10024) with ESMTP id o5HFXmf6-emP; Mon, 18 Feb 2008 09:35:05 +0100 (CET) Received: by mail.cksoft.de (Postfix, from userid 66) id 96C4A41C750; Mon, 18 Feb 2008 09:35:05 +0100 (CET) Received: from maildrop.int.zabbadoz.net (maildrop.int.zabbadoz.net [10.111.66.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.int.zabbadoz.net (Postfix) with ESMTP id 66D7E44487F; Mon, 18 Feb 2008 08:31:58 +0000 (UTC) Date: Mon, 18 Feb 2008 08:31:58 +0000 (UTC) From: "Bjoern A. 
Zeeb" X-X-Sender: bz@maildrop.int.zabbadoz.net To: Kip Macy In-Reply-To: Message-ID: <20080218082839.T49429@maildrop.int.zabbadoz.net> References: <20080106124517.G105@fledge.watson.org> <20080217210205.A49429@maildrop.int.zabbadoz.net> X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Robert Watson , kmacy@freebsd.org, net@freebsd.org Subject: Re: Network device driver KPI/ABI and TOE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2008 08:35:08 -0000 On Sun, 17 Feb 2008, Kip Macy wrote: Hi, > You might want to check out sys/modules/cxgb/tom/Makefile. ha, so that file is not compiled at all. Thanks for pointing this out. Is there a reason to keep it in cvs then? I guess there is but it's not obvious to me;-) So basically, what does that mean from the ?PI perspective? Why is it no longer needed? Or why was it used in the first place? Do we expect people to need similar duplicates depending on what their 'hardware' supports? /bz -- Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT Software is harder than hardware so better get it right the first time.
From owner-freebsd-arch@FreeBSD.ORG Mon Feb 18 15:39:21 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7704216A418 for ; Mon, 18 Feb 2008 15:39:21 +0000 (UTC) (envelope-from csaba@beastie.creo.hu) Received: from beastie.creo.hu (www.creo.hu [217.113.62.14]) by mx1.freebsd.org (Postfix) with ESMTP id E490D13C455 for ; Mon, 18 Feb 2008 15:39:20 +0000 (UTC) (envelope-from csaba@beastie.creo.hu) Received: from beastie.creo.hu (localhost [127.0.0.1]) by beastie.creo.hu (8.14.1/8.14.1) with ESMTP id m1IFbI9L037023 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 18 Feb 2008 16:37:18 +0100 (CET) (envelope-from csaba@beastie.creo.hu) Received: (from csaba@localhost) by beastie.creo.hu (8.14.1/8.14.1/Submit) id m1IFbH36037021; Mon, 18 Feb 2008 16:37:17 +0100 (CET) (envelope-from csaba) Date: Mon, 18 Feb 2008 16:37:17 +0100 From: Csaba Henk To: Peter Jeremy Message-ID: <20080218153717.GH49155@beastie.creo.hu> References: <3bbf2fe10802061700p253e68b8s704deb3e5e4ad086@mail.gmail.com> <70e8236f0802070321n9097d3fy1b39f637b3c2a06@mail.gmail.com> <867ihdc34c.fsf@ds4.des.no> <20080212190207.GB49155@beastie.creo.hu> <86d4r2540f.fsf@ds4.des.no> <20080213165923.GD49155@beastie.creo.hu> <86zlu493ep.fsf@ds4.des.no> <20080214101511.GE49155@beastie.creo.hu> <20080214182740.GZ64299@server.vk2pj.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080214182740.GZ64299@server.vk2pj.dyndns.org> User-Agent: Mutt/1.5.16 (2007-06-09) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (beastie.creo.hu [127.0.0.1]); Mon, 18 Feb 2008 16:37:18 +0100 (CET) Cc: freebsd-arch@freebsd.org Subject: Re: [RFC] Remove NTFS kernel support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2008 15:39:21 -0000 On Fri, Feb 15, 2008 at 05:27:40AM +1100, Peter Jeremy wrote: > On Thu, Feb 14, 2008 at 11:15:11AM +0100, Csaba Henk wrote: > >yes, why so? FreeBSD has embraced recently a big chunk of CDDL'd code > >without making much fuss about licensing, and for practical purposes, > > If you're talking about either dtrace or ZFS: > 1) The features are highly desirable and no more suitably licenced > alternative is available now or likely to become available in the > near future. It's subjective how desirable something is... one might make the same claim about FUSE. OTOH, by "absolutely necessary" I thought of something in the category of gcc/sshd/sendmail... Until ZFS becomes the recommended filesystem for fresh FreeBSD installations, I wouldn't put it into that category. > 2) It is not part of the GENERIC system and will remain optional due to > the license. It smells like apples and oranges to me... GENERIC is the name of the default configuration for the _kernel_, isn't it? Wrt. FUSE, there was no mention of adding code to the kernel under a license other than BSD. The LGPL'd/GPL'd bits we discuss all belong to userspace. > 3) In the case of dtrace, licensing issues have delayed its implementation > by at least a year. Well again, in the case of FUSE, the userspace parts were not reimplemented, they just needed some porting. The kernel module had no technical problems due to licensing issues: it was written from scratch under a BSD license (which in turn was first of all a purely technical constraint due to the differences between the BSD and the Linux VFS), except for the header fuse_kernel.h, which was relicensed under a GPL/BSD dual license courtesy of Miklos Szeredi.
Regards, Csaba From owner-freebsd-arch@FreeBSD.ORG Mon Feb 18 16:40:13 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A5F5316A419 for ; Mon, 18 Feb 2008 16:40:13 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 6394A13C448 for ; Mon, 18 Feb 2008 16:40:13 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id 1B3BF2094; Mon, 18 Feb 2008 17:40:07 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id E3B322093; Mon, 18 Feb 2008 17:40:06 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id B43698449D; Mon, 18 Feb 2008 17:40:06 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Csaba Henk References: <3bbf2fe10802061700p253e68b8s704deb3e5e4ad086@mail.gmail.com> <70e8236f0802070321n9097d3fy1b39f637b3c2a06@mail.gmail.com> <867ihdc34c.fsf@ds4.des.no> <20080212190207.GB49155@beastie.creo.hu> <86d4r2540f.fsf@ds4.des.no> <20080213165923.GD49155@beastie.creo.hu> <86zlu493ep.fsf@ds4.des.no> <20080214101511.GE49155@beastie.creo.hu> <20080214182740.GZ64299@server.vk2pj.dyndns.org> <20080218153717.GH49155@beastie.creo.hu> Date: Mon, 18 Feb 2008 17:40:06 +0100 In-Reply-To: <20080218153717.GH49155@beastie.creo.hu> (Csaba Henk's message of "Mon\, 18 Feb 2008 16\:37\:17 +0100") Message-ID: <86d4qu43rt.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-arch@freebsd.org Subject: Re: [RFC] Remove NTFS kernel support X-BeenThere: freebsd-arch@freebsd.org 
X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2008 16:40:13 -0000 Csaba Henk writes: > Peter Jeremy writes: > > Csaba Henk writes: > > > yes, why so? FreeBSD has embraced recently a big chunk of CDDL'd > > > code without making much fuss about licensing [...] > > If you're talking about either dtrace or ZFS: > > [...] > > 2) It is not part of the GENERIC system and will remain optional due to > > the license. > It smells like apples and oranges to me... GENERIC is the name of the > default configuration for the _kernel_, isnt'it? Correct. It was *you* who brought up the issue of CDDL code in the kernel. Peter pointed out, very correctly, that the code in question is not included in the default kernel. Before you go any further with this, please take a moment to reflect on the difference between ~120,000 lines of highly complex kernel code and ~20,000 lines of fairly simple userland code which we know has already been (at least partly) reimplemented under a friendlier license. 
DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Mon Feb 18 22:19:21 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E810816A475 for ; Mon, 18 Feb 2008 22:19:21 +0000 (UTC) (envelope-from csaba@beastie.creo.hu) Received: from beastie.creo.hu (www.creo.hu [217.113.62.14]) by mx1.freebsd.org (Postfix) with ESMTP id 7CDA913C455 for ; Mon, 18 Feb 2008 22:19:20 +0000 (UTC) (envelope-from csaba@beastie.creo.hu) Received: from beastie.creo.hu (localhost [127.0.0.1]) by beastie.creo.hu (8.14.1/8.14.1) with ESMTP id m1IMHH1E058582 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 18 Feb 2008 23:17:17 +0100 (CET) (envelope-from csaba@beastie.creo.hu) Received: (from csaba@localhost) by beastie.creo.hu (8.14.1/8.14.1/Submit) id m1IMHHYN058581; Mon, 18 Feb 2008 23:17:17 +0100 (CET) (envelope-from csaba) Date: Mon, 18 Feb 2008 23:17:17 +0100 From: Csaba Henk To: Dag-Erling Smorgrav Message-ID: <20080218221716.GI49155@beastie.creo.hu> References: <867ihdc34c.fsf@ds4.des.no> <20080212190207.GB49155@beastie.creo.hu> <86d4r2540f.fsf@ds4.des.no> <20080213165923.GD49155@beastie.creo.hu> <86zlu493ep.fsf@ds4.des.no> <20080214101511.GE49155@beastie.creo.hu> <20080214182740.GZ64299@server.vk2pj.dyndns.org> <20080218153717.GH49155@beastie.creo.hu> <86d4qu43rt.fsf@ds4.des.no> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <86d4qu43rt.fsf@ds4.des.no> User-Agent: Mutt/1.5.16 (2007-06-09) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (beastie.creo.hu [127.0.0.1]); Mon, 18 Feb 2008 23:17:18 +0100 (CET) Cc: freebsd-arch@freebsd.org Subject: Re: [RFC] Remove NTFS kernel support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2008 22:19:22 -0000 On Mon, Feb 18, 2008 at 05:40:06PM +0100, Dag-Erling Smørgrav wrote: > Csaba Henk writes: > > Peter Jeremy writes: > > > Csaba Henk writes: > > > > yes, why so? FreeBSD has embraced recently a big chunk of CDDL'd > > > > code without making much fuss about licensing [...] > > > If you're talking about either dtrace or ZFS: > > > [...] > > > 2) It is not part of the GENERIC system and will remain optional due to > > > the license. > > It smells like apples and oranges to me... GENERIC is the name of the > > default configuration for the _kernel_, isn't it? > > Correct. It was *you* who brought up the issue of CDDL code in the > kernel. Peter pointed out, very correctly, that the code in question is > not included in the default kernel. Umm, I don't mean to nit-pick, but I thought of the CDDL code in userspace... Sorry if I was ambiguous. Csaba
From owner-freebsd-arch@FreeBSD.ORG Tue Feb 19 17:04:19 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9B93816A419 for ; Tue, 19 Feb 2008 17:04:19 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 54E8413C469 for ; Tue, 19 Feb 2008 17:04:19 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id A94CB2088; Tue, 19 Feb 2008 18:04:12 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id 99A33207E; Tue, 19 Feb 2008 18:04:12 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id 81F5584462; Tue, 19 Feb 2008 18:04:12 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Csaba Henk References: <867ihdc34c.fsf@ds4.des.no> <20080212190207.GB49155@beastie.creo.hu> <86d4r2540f.fsf@ds4.des.no> <20080213165923.GD49155@beastie.creo.hu> <86zlu493ep.fsf@ds4.des.no> <20080214101511.GE49155@beastie.creo.hu> <20080214182740.GZ64299@server.vk2pj.dyndns.org> <20080218153717.GH49155@beastie.creo.hu> <86d4qu43rt.fsf@ds4.des.no> <20080218221716.GI49155@beastie.creo.hu> Date: Tue, 19 Feb 2008 18:04:12 +0100 In-Reply-To: <20080218221716.GI49155@beastie.creo.hu> (Csaba Henk's message of "Mon\, 18 Feb 2008 23\:17\:17 +0100") Message-ID: <86skzoc1yr.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-arch@freebsd.org Subject: Re: [RFC] Remove 
NTFS kernel support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2008 17:04:19 -0000 Csaba Henk writes: > Umm, I don't mean to nit-pick, but I thought of the CDDL code in > userspace... Sorry if I was ambiguous. Ah, OK. As far as I know, the only CDDL code in userspace is in the command-line interfaces to the CDDL code in the kernel. DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Tue Feb 19 17:43:53 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B712F16A468 for ; Tue, 19 Feb 2008 17:43:53 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 78A0813C455 for ; Tue, 19 Feb 2008 17:43:53 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id E98EC208C for ; Tue, 19 Feb 2008 18:43:46 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id DC1202084 for ; Tue, 19 Feb 2008 18:43:46 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id C151B84463; Tue, 19 Feb 2008 18:43:46 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: arch@freebsd.org Date: Tue, 19 Feb 2008 18:43:46 +0100 Message-ID: <86odacc04t.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: Subject: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org 
X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2008 17:43:53 -0000 Four years ago, I created the dev.* sysctl tree for device drivers. Every time a device is registered, a sysctl context is automatically created, and a node is created under dev (e.g. dev.cpu.0), with some standardized nodes under it (%driver, %parent, %desc etc.) plus any node the driver - or even another driver - wants to add. However, not everything in Unix is a device. Specifically, network interfaces aren't. Some network interfaces are also devices, so they have a sysctl node in dev.*: % sysctl dev.msk dev.msk.0.%desc: Marvell Technology Group Ltd. Yukon EC Ultra Id 0xb4 Rev 0x02 dev.msk.0.%driver: msk dev.msk.0.%parent: mskc0 Others don't: bridge, faith, lo, pflog, vlan etc. What I propose is to add a similar sysctl tree for interfaces. It would look a little different. For instance, some interfaces (bridge, vlan) have parents or children, but most don't. Just as it is for devices, creation and destruction of the interface's sysctl node and context would be hidden inside if_{attach,detach}() and completely transparent to the driver, and there will be an API that drivers can use if they want to add their own nodes. Since interfaces don't all have parents, the API will include a function to specify one for those that do. This is *not* intended to replace ifconfig; it is intended for information which isn't available through ifconfig and which it wouldn't be natural to place there. For instance, every wlan interface already has a sysctl tree under net.wlan. Drivers that already have sysctl nodes will require less code to create them, and no code at all to destroy them, since if_detach() will take care of that (all nodes in the interface's context are automatically destroyed when the context is destroyed). 
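A minimal userland model of the lifetime rule described above, assuming a plain linked list in place of real sysctl(9) contexts: every node an interface gets, whether created by the attach path or added by the driver, is recorded in that interface's context, so the detach path can free them all with no driver code. The `if.<name>` prefix follows the top-level `if` tree the proposal leans toward; `struct if_ctx`, `if_ctx_add()`, `model_if_attach()`, `model_if_setparent()` and `model_if_detach()` are hypothetical names, not the kernel API (a kernel version would presumably build on `sysctl_ctx_init()`/`sysctl_ctx_free()`).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node in an interface's subtree, e.g. "if.vlan0.%parent". */
struct if_node {
	char oid[64];
	char value[32];
	struct if_node *next;
};

/* Hypothetical per-interface context: it remembers every node created
 * through it so that all of them can be destroyed with one call. */
struct if_ctx {
	struct if_node *head;
};

/* Stand-in for struct ifnet with only the fields this sketch needs. */
struct ifnet_model {
	char name[16];
	struct if_ctx ctx;
};

/* Driver-facing call: add "if.<ifname>.<leaf>" to the interface's context. */
static struct if_node *
if_ctx_add(struct ifnet_model *ifp, const char *leaf, const char *value)
{
	struct if_node *n;

	assert(ifp != NULL && leaf != NULL && value != NULL);
	n = calloc(1, sizeof(*n));
	if (n == NULL)
		return (NULL);
	snprintf(n->oid, sizeof(n->oid), "if.%s.%s", ifp->name, leaf);
	snprintf(n->value, sizeof(n->value), "%s", value);
	n->next = ifp->ctx.head;
	ifp->ctx.head = n;
	return (n);
}

/* Look a node up by its full name; NULL if it does not exist. */
static const struct if_node *
if_ctx_lookup(const struct ifnet_model *ifp, const char *oid)
{
	for (const struct if_node *n = ifp->ctx.head; n != NULL; n = n->next)
		if (strcmp(n->oid, oid) == 0)
			return (n);
	return (NULL);
}

/* Model of if_attach(): set up the context and the standard node. */
static int
model_if_attach(struct ifnet_model *ifp, const char *name, const char *driver)
{
	snprintf(ifp->name, sizeof(ifp->name), "%s", name);
	ifp->ctx.head = NULL;
	return (if_ctx_add(ifp, "%driver", driver) != NULL ? 0 : -1);
}

/* Optional call for interfaces that have a parent (bridge, vlan). */
static int
model_if_setparent(struct ifnet_model *ifp, const char *parent)
{
	return (if_ctx_add(ifp, "%parent", parent) != NULL ? 0 : -1);
}

/* Model of if_detach(): destroying the context frees every node in it,
 * including those the driver added; returns how many were freed. */
static int
model_if_detach(struct ifnet_model *ifp)
{
	int freed = 0;

	while (ifp->ctx.head != NULL) {
		struct if_node *n = ifp->ctx.head;

		ifp->ctx.head = n->next;
		free(n);
		freed++;
	}
	return (freed);
}
```

In this model a vlan interface gets `%driver` at attach, `%parent` through the optional call, and any driver-specific node through `if_ctx_add()`; a single `model_if_detach()` then frees all three without the driver keeping track of them.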
I'm unsure whether this should go under net.if, or just if. I think I prefer the latter. I'm open to objections and suggestions... DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Tue Feb 19 18:45:05 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 56D3316A418 for ; Tue, 19 Feb 2008 18:45:05 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outT.internet-mail-service.net (outT.internet-mail-service.net [216.240.47.243]) by mx1.freebsd.org (Postfix) with ESMTP id 39E3913C4CE for ; Tue, 19 Feb 2008 18:45:05 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Tue, 19 Feb 2008 10:45:04 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 40E011272A2; Tue, 19 Feb 2008 10:45:04 -0800 (PST) Message-ID: <47BB23B7.9050007@elischer.org> Date: Tue, 19 Feb 2008 10:45:11 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: =?UTF-8?B?RGFnLUVybGluZyBTbcO4cmdyYXY=?= References: <86odacc04t.fsf@ds4.des.no> In-Reply-To: <86odacc04t.fsf@ds4.des.no> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: arch@freebsd.org Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2008 18:45:05 -0000 Dag-Erling Smørgrav wrote: > Four years ago, I created the dev.* sysctl tree for device drivers. > Every time a device is registered, a sysctl context is automatically > created, and a node is created under dev (e.g. 
dev.cpu.0), with some > standardized nodes under it (%driver, %parent, %desc etc.) plus any node > the driver - or even another driver - wants to add. > > However, not everything in Unix is a device. Specifically, network > interfaces aren't. > > Some network interfaces are also devices, so they have a sysctl node in > dev.*: > > % sysctl dev.msk > dev.msk.0.%desc: Marvell Technology Group Ltd. Yukon EC Ultra Id 0xb4 Rev 0x02 > dev.msk.0.%driver: msk > dev.msk.0.%parent: mskc0 > > Others don't: bridge, faith, lo, pflog, vlan etc. > > What I propose is to add a similar sysctl tree for interfaces. It would > look a little different. For instance, some interfaces (bridge, vlan) > have parents or children, but most don't. > > Just as it is for devices, creation and destruction of the interface's > sysctl node and context would be hidden inside if_{attach,detach}() and > completely transparent to the driver, and there will be an API that > drivers can use if they want to add their own nodes. > > Since interfaces don't all have parents, the API will include a function > to specify one for those that do. > > This is *not* intended to replace ifconfig; it is intended for > information which isn't available through ifconfig and which it wouldn't be > natural to place there. For instance, every wlan interface already has > a sysctl tree under net.wlan. > > Drivers that already have sysctl nodes will require less code to create > them, and no code at all to destroy them, since if_detach() will take > care of that (all nodes in the interface's context are automatically > destroyed when the context is destroyed). > > I'm unsure whether this should go under net.if, or just if. I think I > prefer the latter. > > I'm open to objections and suggestions... the usual things apply: a) If you do the work most people would go along. :-) b) being able to compile it without the bloat might be a good idea for embedded systems. 
> > DES From owner-freebsd-arch@FreeBSD.ORG Tue Feb 19 23:44:31 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3A7DD16A420 for ; Tue, 19 Feb 2008 23:44:31 +0000 (UTC) (envelope-from jmg@hydrogen.funkthat.com) Received: from hydrogen.funkthat.com (gate.funkthat.com [69.17.45.168]) by mx1.freebsd.org (Postfix) with ESMTP id BC93C13C43E for ; Tue, 19 Feb 2008 23:44:30 +0000 (UTC) (envelope-from jmg@hydrogen.funkthat.com) Received: from hydrogen.funkthat.com (yj3h0xtt0vmrot7u@localhost.funkthat.com [127.0.0.1]) by hydrogen.funkthat.com (8.13.6/8.13.3) with ESMTP id m1JNWIjP075119; Tue, 19 Feb 2008 15:32:18 -0800 (PST) (envelope-from jmg@hydrogen.funkthat.com) Received: (from jmg@localhost) by hydrogen.funkthat.com (8.13.6/8.13.3/Submit) id m1JNWIQa075118; Tue, 19 Feb 2008 15:32:18 -0800 (PST) (envelope-from jmg) Date: Tue, 19 Feb 2008 15:32:17 -0800 From: John-Mark Gurney To: Dag-Erling =?iso-8859-1?Q?Sm=F8rgrav?= Message-ID: <20080219233217.GS27248@funkthat.com> Mail-Followup-To: Dag-Erling =?iso-8859-1?Q?Sm=F8rgrav?= , arch@freebsd.org References: <86odacc04t.fsf@ds4.des.no> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <86odacc04t.fsf@ds4.des.no> User-Agent: Mutt/1.4.2.1i X-Operating-System: FreeBSD 5.4-RELEASE-p6 i386 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (hydrogen.funkthat.com [127.0.0.1]); Tue, 19 Feb 2008 15:32:19 -0800 (PST) Cc: arch@freebsd.org Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: John-Mark Gurney List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Tue, 19 Feb 2008 23:44:31 -0000 Dag-Erling Smørgrav wrote this message on Tue, Feb 19, 2008 at 18:43 +0100: > Four years ago, I created the dev.* sysctl tree for device drivers. > Every time a device is registered, a sysctl context is automatically > created, and a node is created under dev (e.g. dev.cpu.0), with some > standardized nodes under it (%driver, %parent, %desc etc.) plus any node > the driver - or even another driver - wants to add. > > However, not everything in Unix is a device. Specifically, network > interfaces aren't. [...] > I'm open to objections and suggestions... My concern is that slowly adding them for each interface type could create some conflicts in both naming and location... Are the interface sysctl nodes going to be the same/mirrored for hardware devices? Does dev.msk.0 get duplicated in the interface area? Or does it have to decide to put ethernet interface related items in the if sysctl node, and other hardware related (hi/low water marks for DMA) in the separate tree? How does someone know where to look if they are in different locations for the same device? We should probably create a newbus tree node off the nexus for pseudo devices that are not backed by hardware, and put all of these style devices under them... This will help enforce non-conflicting names, and limit the number of locations where sysctl can be located for devices... This would mean that ifnet would/should grow a device_t and can either get stored w/ one provided in the hardware case, or one get automatically created if one isn't provided... This would enable all pseudo devices to have a single location, and you not have to search to remember, oh, there's net.if, dev., tty.if, disk., or some other set of random pseudo devices... I'm all for making it easier for devices to export configuration information, I just want to ensure that it's easy to find and locate, since documentation usually comes last... 
(I still need to write a man page for my bktrau device driver. :) ) -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 09:54:50 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7C8F316A403; Wed, 20 Feb 2008 09:54:50 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 513D513C45E; Wed, 20 Feb 2008 09:54:50 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m1K9sHHk022858; Wed, 20 Feb 2008 04:54:19 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Tue, 19 Feb 2008 23:55:22 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Daniel Eischen In-Reply-To: <20080112194521.I957@desktop> Message-ID: <20080219234101.D920@desktop> References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Robert Watson , Andrew Gallatin Subject: Re: Linux compatible setaffinity. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 09:54:50 -0000 On Sat, 12 Jan 2008, Jeff Roberson wrote: > On Sat, 12 Jan 2008, Daniel Eischen wrote: > >> On Sat, 12 Jan 2008, Jeff Roberson wrote: >> >>> Now, there is one problem with the linux api that I want to discuss before >>> I commit it. The current patch always works on curthread. However, the >>> api allows for setting the binding of a pid. I believe, although I'm not >>> certain, that pids and tids in linux are in the same number space. It's >>> not clear to me whether you can set an affinity for an entire process and >>> have it affect an individual thread or whether you set it on a thread by >>> thread basis. When supplying a non-curproc pid do you bind all threads in >>> the target process? >>> >>> Are our tids and pids in the same number space? And are they available to >>> application programmers? I haven't followed that very carefully. >> >> I believe marcel made tids and pids disjoint so that any pid is >> never equal to any tid. But regardless, I don't think we want >> to rely on that. I would prefer the Solaris approach of specifying >> what we want (pid, tid, jail id, etc) as an argument in the API >> so there is no confusion. > > Yes, I would prefer that as well I believe. So I'll add an extra parameter > and in the linux code we'll use whatever their default is. Of course the > initial implementation will still only support curthread but I plan on > finishing the rest before 8.0 is done. So what does everyone think of something like this:
int cpuaffinity(int cmd, long which, int masksize, unsigned *mask);

#define AFFINITY_GET 0x1
#define AFFINITY_SET 0x2
#define AFFINITY_PID 0x4
#define AFFINITY_TID 0x8

I'm not married to any of these names. 
If you know of something that would be more regular please comment. Behavior according to flags would be as such: Get or set affinity and fetch from or store into mask. Error if mask is not large enough. Fill with zeros if it's too large. If pid is specified on set, all threads in the pid are set to the requested affinity. On get it doesn't make much sense but I guess I'll make it the union of all threads affinities. If tid is specified the mask applies only to the requested tid. The mask is always inherited from the creating thread and propagates on fork(). I have these semantics implemented and appearing to work in ULE. I can implement them in 4BSD but it will be very inefficient in some edge cases since each cpu doesn't have its own run queue. Binding and pinning are still both supported via the same kernel interfaces as they were. They are considered to override user specified affinity. This means the kernel can temporarily bind a thread to a cpu that it does not have affinity for. I may add an assert to verify that we never leave the kernel with binding still set so userspace sees only the cpus it requests. The thread's affinity is stored in a cpumask variable in the thread structure. If someone wanted to implement restricting a jail to a particular cpu they could add an affinity cmd that would walk all processes belonging to a jail and restrict their masks appropriately. You'd also want to check a jail mask on each call to affinity(). Linux sched_setaffinity() should be a subset of this functionality and thus easily supported. Comments appreciated. This will go in late next week. 
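A minimal userland sketch of the "get" semantics described above, under stated assumptions: `masksize` is taken to count `unsigned` words (the proposal does not give its unit), `ERANGE` and `ESRCH` are assumed error codes, and `struct thread_model`, `struct proc_model`, `KMASK_WORDS` and `model_affinity_get()` are hypothetical names, not kernel code. Only the `AFFINITY_PID`/`AFFINITY_TID` values come from the proposal. It shows the too-small check, the zero-fill of a too-large buffer, and the PID-as-union rule.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical size of the kernel's per-thread mask, in unsigned words. */
#define KMASK_WORDS 2

/* Scope flags, values copied from the proposal above. */
#define AFFINITY_PID 0x4
#define AFFINITY_TID 0x8

/* Userland stand-ins for a thread and its process (hypothetical). */
struct thread_model {
	long tid;
	unsigned mask[KMASK_WORDS];
};

struct proc_model {
	long pid;
	int nthreads;
	struct thread_model *threads;
};

/*
 * Copy a kernel-sized mask into the caller's buffer of masksize words:
 * a buffer that is too small is an error, extra words are zero-filled.
 */
static int
mask_copyout(const unsigned *kmask, int masksize, unsigned *umask)
{
	if (masksize < KMASK_WORDS)
		return (ERANGE);
	memcpy(umask, kmask, KMASK_WORDS * sizeof(unsigned));
	for (int i = KMASK_WORDS; i < masksize; i++)
		umask[i] = 0;
	return (0);
}

/*
 * "Get" half of the proposed call: a TID scope returns that thread's mask,
 * a PID scope returns the union of the masks of all the process's threads.
 */
static int
model_affinity_get(const struct proc_model *p, int scope, long which,
    int masksize, unsigned *umask)
{
	unsigned acc[KMASK_WORDS] = { 0 };
	int found = 0;

	assert(p != NULL && umask != NULL);
	if (scope == AFFINITY_PID) {
		if (p->pid != which)
			return (ESRCH);
	} else if (scope != AFFINITY_TID) {
		return (EINVAL);
	}
	for (int t = 0; t < p->nthreads; t++) {
		const struct thread_model *td = &p->threads[t];

		if (scope == AFFINITY_TID && td->tid != which)
			continue;
		for (int w = 0; w < KMASK_WORDS; w++)
			acc[w] |= td->mask[w];
		found = 1;
	}
	if (!found)
		return (ESRCH);
	return (mask_copyout(acc, masksize, umask));
}
```

With a union for PID-scope gets, the caller sees every CPU that at least one of the process's threads may run on, which is the choice the proposal makes for the case where a single per-process answer "doesn't make much sense".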
Thanks, Jeff > > Jeff > >> >> -- >> DE >> > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 10:30:15 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 92E4316A401; Wed, 20 Feb 2008 10:30:15 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 550E413C52A; Wed, 20 Feb 2008 10:30:15 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id AD5B146BE3; Wed, 20 Feb 2008 05:30:14 -0500 (EST) Date: Wed, 20 Feb 2008 10:30:14 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Jeff Roberson In-Reply-To: <20080219234101.D920@desktop> Message-ID: <20080220101348.D44565@fledge.watson.org> References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@freebsd.org, Andrew Gallatin Subject: Re: Linux compatible setaffinity. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 10:30:15 -0000 On Tue, 19 Feb 2008, Jeff Roberson wrote: >> Yes, I would prefer that as well I believe. So I'll add an extra parameter >> and in the linux code we'll use whatever their default is. 
Of course the >> initial implementation will still only support curthread but I plan on >> finishing the rest before 8.0 is done. > > So what does everyone think of something like this: > > int cpuaffinity(int cmd, long which, int masksize, unsigned *mask); > > #define AFFINITY_GET 0x1 > #define AFFINITY_SET 0x2 > #define AFFINITY_PID 0x4 > #define AFFINITY_TID 0x8 > > I'm not married to any of these names. If you know of something that would > be more regular please comment. > > Behavior according to flags would be as such: > > Get or set affinity and fetch from or store into mask. Error if mask is not > large enough. Fill with zeros if it's too large. > > If pid is specified on set all threads in the pid are set to the requested > affinity. On get it doesn't make much sense but I guess I'll make it the > union of all threads affinities. > > If tid is specified the mask applies only to the requested tid. > > The mask is always inherited from the creating thread and propagates on > fork(). > > I have these semantics implemented and appearing to work in ULE. I can > implement them in 4BSD but it will be very inefficient in some edge cases > since each cpu doesn't have its own run queue. > > Binding and pinning are still both supported via the same kernel interfaces > as they were. They are considered to override user specified affinity. > This means the kernel can temporarily bind a thread to a cpu that it does > not have affinity for. I may add an assert to verify that we never leave > the kernel with binding still set so userspace sees only the cpus it > requests. > > The thread's affinity is stored in a cpumask variable in the thread > structure. If someone wanted to implement restricting a jail to a > particular cpu they could add an affinity cmd that would walk all processes > belonging to a jail and restrict their masks appropriately. You'd also want > to check a jail mask on each call to affinity(). 
> > Linux sched_setaffinity() should be a subset of this functionality and thus > easily supported. > > Comments appreciated. This will go in late next week. A few thoughts: - It would be good to have an interface to request what CPUs are available to use, not just what CPUs are in use. - It would be useful to have a way to have an availability mask for what CPUs the thread/process is allowed to use. The former is simply useful for applications -- in using your previous patch, one immediate question you want to ask as an application programmer is "tell me what CPUs are available so I can figure out how to distribute work, how many threads to start, where to bind them, etc". The latter is useful for system administrators, who may want to say things like "Start apache with the following mask of CPUs, and let Apache determine its policy with respect to that bound as though the other CPUs don't exist". It could also be used to create a jail bound. So perhaps this means a slightly more complex API, but not much more complex. How about:
int cpuaffinity_get(scope, id, length, mask)
int cpuaffinity_getmax(scope, id, length, mask)
int cpuaffinity_set(scope, id, length, mask)
int cpuaffinity_setmax(scope, id, length, mask)
Scope would be something on the order of process (representing individual processes or process groups, potentially), id would be the id in that scope namespace, length and mask would be as you propose. You could imagine adding a further field to indicate whether it's the current affinity or the maximum affinity, but I'm not sure the details matter all that much. Here might be some application logic, though:
cpumask_t max;
int cpu, i;

(void)cpuaffinity_getmax(CMASK_PROC, getpid(), sizeof(max), &max);
for (i = 0; i < CMASK_CPUCOUNT(&max); i++) {
	cpu = CMASK_CPUINDEX(&max, i);
	/* Start a thread, bind it to 'cpu'. */
	/* Or, migrate CPUs sequentially looking at data.
*/ } In the balance between all-doing system calls and multiple system calls, this also makes me a bit happier, and it's not an entirely aesthetic concern. Differentiating get and set methods is fairly useful for tracking down problems when debugging, or if doing things like masking process system calls for security reasons. There are two things I like from the other systems that I don't believe this captures well: (1) The Solaris notion of CPU sets, so that policy can be expressed in terms of a global CPU set namespace administered by the system administrator. I.e., create a CPU set "Apache", then use a tool to modify the set at runtime. (2) The Darwin notion of defining CPU use policy rather than masks -- i.e., "I don't care what CPU it is, but run these threads on the same CPU", or "the same core", etc. I'm happy for us to move ahead with the lower level interface you've defined without addressing these concerns, but I think we should be keeping them in mind as well. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 10:52:26 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3E3B116A401; Wed, 20 Feb 2008 10:52:26 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 10BE413C455; Wed, 20 Feb 2008 10:52:25 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m1KAqKKp028343; Wed, 20 Feb 2008 05:52:23 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Wed, 20 Feb 2008 00:53:26 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Robert Watson In-Reply-To:
<20080220101348.D44565@fledge.watson.org> Message-ID: <20080220005030.Y920@desktop> References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@FreeBSD.org, Andrew Gallatin Subject: Re: Linux compatible setaffinity. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 10:52:26 -0000 On Wed, 20 Feb 2008, Robert Watson wrote: > > On Tue, 19 Feb 2008, Jeff Roberson wrote: > >>> Yes, I would prefer that as well I believe. So I'll add an extra >>> parameter and in the linux code we'll use whatever their default is. Of >>> course the initial implementation will still only support curthread but I >>> plan on finishing the rest before 8.0 is done. >> >> So what does everyone think of something like this: >> >> int cpuaffinity(int cmd, long which, int masksize, unsigned *mask); >> >> #define AFFINITY_GET 0x1 >> #define AFFINITY_SET 0x2 >> #define AFFINITY_PID 0x4 >> #define AFFINITY_TID 0x8 >> >> I'm not married to any of these names. If you know of something that would >> be more regular please comment. >> >> Behavior according to flags would be as such: >> >> Get or set affinity and fetch from or store into mask. Error if mask is >> not large enough. Fill with zeros if it's too large. >> >> If pid is specified on set all threads in the pid are set to the requested >> affinity. On get it doesn't make much sense but I guess I'll make it the >> union of all threads affinities. >> >> If tid is specified the mask applies only to the requested tid. 
>> >> The mask is always inherited from the creating thread and propagates on >> fork(). >> >> I have these semantics implemented and appearing to work in ULE. I can >> implement them in 4BSD but it will be very inefficient in some edge cases >> since each cpu doesn't have its own run queue. >> >> Binding and pinning are still both supported via the same kernel interfaces >> as they were. They are considered to override user specified affinity. >> This means the kernel can temporarily bind a thread to a cpu that it does >> not have affinity for. I may add an assert to verify that we never leave >> the kernel with binding still set so userspace sees only the cpus it >> requests. >> >> The thread's affinity is stored in a cpumask variable in the thread >> structure. If someone wanted to implement restricting a jail to a >> particular cpu they could add an affinity cmd that would walk all processes >> belonging to a jail and restrict their masks appropriately. You'd also want >> to check a jail mask on each call to affinity(). >> >> Linux sched_setaffinity() should be a subset of this functionality and thus >> easily supported. >> >> Comments appreciated. This will go in late next week. > > A few thoughts: > > - It would be good to have an interface to request what CPUs are available to > use, not just what CPUs are in use. > > - It would be useful to have a way to have an availability mask for what CPUs > the thread/process is allowed to use. > > The former is simply useful for applications -- in using your previous patch, > one immediate question you want to ask as an application programmer is "tell > me what CPUs are available so I can figure out how to distribute work, how > many threads to start, where to bind them, etc". The latter is useful for > system administrators, who may want to say things like "Start apache with the > following mask of CPUs, and let Apache determine its policy with respect to > that bound as though the other CPUs don't exist".
It could also be used to > create a jail bound. > > So perhaps this means a slightly more complex API, but not much more complex. > How about: > > int cpuaffinity_get(scope, id, length, mask) > int cpuaffinity_getmax(scope, id, length, mask) > int cpuaffinity_set(scope, id, length, mask) > int cpuaffinity_setmax(scope, id, length, mask) > > Scope would be something on the order of process (representing individual > processes or process groups, potentially), id would be the id in that scope > namespace, length and mask would be as you propose. You could imagine adding > a further field to indicate whether it's the current affinity or the maximum > affinity, but I'm not sure the details matter all that much. Here might be > some application logic, though: Well I'm not sure about the max. How about just a cpuaffinity_get with a scope that specifies what cpus are available to you? If the set is restricted by a jail or some other mechanism it would be returned in avail. Otherwise all cpus would be returned. The thread probably wouldn't directly manipulate its max; rather, it would be set by changing the jail or cpu group it belonged to. Jeff > > cpumask_t max; > int cpu, i; > > (void)cpuaffinity_getmax(CMASK_PROC, getpid(), sizeof(max), &max); > for (i = 0; i < CMASK_CPUCOUNT(&max); i++) { > cpu = CMASK_CPUINDEX(&max, i); > /* Start a thread, bind it to 'cpu'. */ > /* Or, migrate CPUs sequentially looking at data. */ > } > > In the balance between all-doing system calls and multiple system calls, this > also makes me a bit happier, and it's not an entirely aesthetic concern. > Differentiating get and set methods is fairly useful for tracking down > problems when debugging, or if doing things like masking process system calls > for security reasons.
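Robert's example loop relies on two helpers, `CMASK_CPUCOUNT` and `CMASK_CPUINDEX`, that exist only in the proposal. The following minimal sketch shows what they might compute, assuming a fixed-size array-of-words mask in which bit n is CPU n. The type `cpumask_sketch_t` and the functions `cmask_cpucount()` and `cmask_cpuindex()` are hypothetical stand-ins, not an existing FreeBSD API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical multi-word CPU mask: bit n of the array is CPU n. */
#define CMASK_NWORDS	4
#define CMASK_WORDBITS	(sizeof(unsigned) * CHAR_BIT)

typedef struct {
	unsigned bits[CMASK_NWORDS];
} cpumask_sketch_t;

/* Number of CPUs set in the mask (stand-in for CMASK_CPUCOUNT). */
static int
cmask_cpucount(const cpumask_sketch_t *m)
{
	int n = 0;

	assert(m != NULL);
	for (size_t w = 0; w < CMASK_NWORDS; w++)
		for (unsigned v = m->bits[w]; v != 0; v &= v - 1)
			n++;		/* each pass clears the lowest set bit */
	return (n);
}

/* CPU number of the i-th set bit, or -1 (stand-in for CMASK_CPUINDEX). */
static int
cmask_cpuindex(const cpumask_sketch_t *m, int i)
{
	assert(m != NULL);
	for (size_t cpu = 0; cpu < CMASK_NWORDS * CMASK_WORDBITS; cpu++) {
		if (m->bits[cpu / CMASK_WORDBITS] & (1u << (cpu % CMASK_WORDBITS))) {
			if (i-- == 0)
				return ((int)cpu);
		}
	}
	return (-1);
}
```

With these helpers the loop reads `for (i = 0; i < cmask_cpucount(&max); i++) { cpu = cmask_cpuindex(&max, i); ... }`. The thread-binding step is left out because no binding call had been settled at this point in the discussion.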
> > There are two things I like from the other systems that I don't believe this > captures well: > > (1) The Solaris notion of CPU sets, so that policy can be expressed in terms > of a global CPU set namespace administered by the system administrator. > I.e., create a CPU set "Apache", then use a tool to modify the set at > runtime. > > (2) The Darwin notion of defining CPU use policy rather than masks -- i.e., > "I don't care what CPU it is, but run these threads on the same CPU", or > "the same core", etc. > > I'm happy for us to move ahead with the lower level interface you've defined > without addressing these concerns, but I think we should be keeping them in > mind as well. > > Robert N M Watson > Computer Laboratory > University of Cambridge > From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 11:10:02 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F0E8E16A408; Wed, 20 Feb 2008 11:10:02 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 7FCAE13C4F0; Wed, 20 Feb 2008 11:10:02 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id B8D8546BC2; Wed, 20 Feb 2008 06:10:01 -0500 (EST) Date: Wed, 20 Feb 2008 11:10:01 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Jeff Roberson In-Reply-To: <20080220005030.Y920@desktop> Message-ID: <20080220105333.G44565@fledge.watson.org> References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN;
charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@FreeBSD.org, Andrew Gallatin Subject: Re: Linux compatible setaffinity. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 11:10:03 -0000 On Wed, 20 Feb 2008, Jeff Roberson wrote: >> So perhaps this means a slightly more complex API, but not much more >> complex. How about: >> >> int cpuaffinity_get(scope, id, length, mask) >> int cpuaffinity_getmax(scope, id, length, mask) >> int cpuaffinity_set(scope, id, length, mask) >> int cpuaffinity_setmax(scope, id, length, mask) >> >> Scope would be something on the order of process (representing individual >> processes or process groups, potentially), id would be the id in that scope >> namespace, length and mask would be as you propose. You could imagine >> adding a further field to indicate whether it's the current affinity or the >> maximum affinity, but I'm not sure the details matter all that much. Here >> might be some application logic, though: > > Well I'm not sure about the max. How about just a cpuaffinity_get with a > scope that specifies what cpus are available to you? If the set is > restricted by a jail or some other mechanism it would be returned in avail. > Otherwise all cpus would be returned. The thread probably wouldn't directly > manipulate its max; rather, it would be set by changing the jail or cpu group > it belonged to. I think the actual details don't matter too much as long as we can express what we need to, so I'm OK with a special scope for that. You do raise an interesting point, though, on the nature of scope: presumably we'd like to be able to query a few different "maximums" in the interest of having a maximally debuggable system -- be it the hardware limit, the administrative limit, etc.
So perhaps the scopes are something more like:

#define CPUAFF_SCOPE_HARDWARE 1 /* Hardware limits */
#define CPUAFF_SCOPE_SYSTEM 2 /* System usage mask */
#define CPUAFF_SCOPE_PROCESS 3 /* Processes and process groups */
#define CPUAFF_SCOPE_THREAD 4 /* Threads in process */

CPUAFF_SCOPE_HARDWARE would be what the kernel has probed at the physical layer. This would be get-only, and available simply to provide a consistent interface when dealing with CPU masks. CPUAFF_SCOPE_SYSTEM would be the set of CPUs exposed by the kernel for use by the application. This would be get-only without privilege, and presumably not exceed CPUAFF_SCOPE_HARDWARE. This would be the practical upper bound on what a process could use, and would likely be how we implement Jail scoping of CPUs. This might or might not be how one tries to discourage userspace use of a particular CPU. CPUAFF_SCOPE_PROCESS would be the process affinity for the process as a whole. It would be get-set without privilege, but limited to CPUAFF_SCOPE_SYSTEM. I'm not sure we want that limitation to be something overridden with privilege, but I guess we can think about that. CPUAFF_SCOPE_THREAD would be the thread affinity for an individual thread. It would be get-set without privilege. Do we limit it to CPUAFF_SCOPE_SYSTEM or CPUAFF_SCOPE_PROCESS? Since this involves some access control now, a pondering on access control: When a process sets the affinity of another thread or process, is it limited to the system scope of the current process, or of the target process? This may be a practical question if we're talking about how to deal with a process outside of a jail setting the affinity of a process inside a jail. It's easy to implement the right thing in the kernel, but does that then imply that the system scope query should take a process ID so that a user tool can figure out what valid choices are?
:-) I think it's useful to play out a few of these scenarios a bit and see what works; we won't be locked into any particular model until it hits a -STABLE branch, but getting the underlying primitive right will make it a lot easier to implement some of the more mature services we have in mind, making them more likely to happen. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 11:18:39 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2C0DB16A401 for ; Wed, 20 Feb 2008 11:18:39 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id CFC5713C4F5 for ; Wed, 20 Feb 2008 11:18:38 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 61FE646B06; Wed, 20 Feb 2008 06:18:38 -0500 (EST) Date: Wed, 20 Feb 2008 11:18:38 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: John-Mark Gurney In-Reply-To: <20080219233217.GS27248@funkthat.com> Message-ID: <20080220111157.H44565@fledge.watson.org> References: <86odacc04t.fsf@ds4.des.no> <20080219233217.GS27248@funkthat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Dag-Erling Smørgrav , arch@freebsd.org Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 11:18:39 -0000 On Tue, 19 Feb 2008, John-Mark Gurney wrote: > Dag-Erling Smørgrav wrote this message on Tue, Feb 19, 2008 at 18:43 +0100: >> Four years ago, I created the dev.* sysctl tree for
device drivers. Every >> time a device is registered, a sysctl context is automatically created, and >> a node is created under dev (e.g. dev.cpu.0), with some standardized nodes >> under it (%driver, %parent, %desc etc.) plus any node the driver - or even >> another driver - wants to add. >> >> However, not everything in Unix is a device. Specifically, network >> interfaces aren't. > > [...] > >> I'm open to objections and suggestions... > > My concern is that slowly adding them for each interface type could create > some conflicts in both naming and location... We also support interface renaming... Does newbus mind if you rename the devices in its tree? > Are the interface sysctl nodes going to be the same/mirrored for hardware > devices? Does dev.msk.0 get duplicated in the interface area? Or does it > have to decide to put ethernet interface related items in the if sysctl > node, and other hardware related (hi/low water marks for DMA) in the > separate tree? How does someone know where to look if they are in different > locations for the same device? > > We should probably create a newbus tree node off the nexus for pseudo > devices that are not backed by hardware, and put all of these style devices > under them... This will help enforce non-conflicting names, and limit the > number of locations where sysctl can be located for devices... This would > mean that ifnet would/should grow a device_t and can either get stored w/ > one provided in the hardware case, or one get automatically created if one > isn't provided... This would enable all pseudo devices to have a single > location, and you not have to search to remember, oh, there's net.if, dev., > tty.if, disk., or some other set of random pseudo devices... > > I'm all for making it easier for devices to export configuration > information, I just want to ensure that it's easy to find and locate, since > documentation usually comes last... (I still need to write a man page for > my bktrau device driver.
:) ) I'm not sure how I feel about creating newbus device trees for all network interfaces. I like the idea of a unified bus topology but wonder about the constraints -- among other things, we have no Giant requirement for network stack interface allocation. Perhaps the problem is that I feel uncomfortable with the assumption that creating a 1:1 mapping between hardware device nodes and logical interface nodes is the right thing to do. And, about interface renaming: right now, the newbus nodes for the physical device have a constant name; we just change the administrative name of the interface used in the network stack. I don't think we want the hardware-related nodes to be renamed, but under what situations is a MIB entry going to be associated with the stack name, and under what situations the hardware name? Perhaps we should have an entirely separate if.* subtree in order to keep the two notions distinct. Another thought: historically, things like link layer administration, etc., have used the stack name for an interface and stack management tools -- that is, ioctl and the interface identifier. While I'm not a big fan of ioctl, this has been a relatively consistent approach for dealing with administering everything but global protocol settings (which sometimes go via sysctl). I'm not sure I'd like to see that change on the basis that, while possibly not entirely better, at least it is consistent.
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 12:35:15 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8921016A400 for ; Wed, 20 Feb 2008 12:35:15 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 44B5213C4DB for ; Wed, 20 Feb 2008 12:35:15 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id 13AAC208E; Wed, 20 Feb 2008 13:35:12 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id 8E2F6208C; Wed, 20 Feb 2008 13:35:11 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id 7B91984488; Wed, 20 Feb 2008 13:35:11 +0100 (CET) From: Dag-Erling Smørgrav To: Julian Elischer References: <86odacc04t.fsf@ds4.des.no> <47BB23B7.9050007@elischer.org> Date: Wed, 20 Feb 2008 13:35:11 +0100 In-Reply-To: <47BB23B7.9050007@elischer.org> (Julian Elischer's message of "Tue\, 19 Feb 2008 10\:45\:11 -0800") Message-ID: <86wsozyfeo.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 12:35:15 -0000 Julian Elischer writes: > a) If you do the work most people would go along.
:-) But of course :) > b) being able to compile it without the bloat might be a good idea for > embedded systems. !bloat; it would actually reduce the amount of code, due to increased centralization. DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 12:42:49 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CBD0E16A406 for ; Wed, 20 Feb 2008 12:42:49 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 891DE13C45A for ; Wed, 20 Feb 2008 12:42:49 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id 50D512084 for ; Wed, 20 Feb 2008 13:42:38 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id 3775F207F for ; Wed, 20 Feb 2008 13:42:38 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id 0A9A284488; Wed, 20 Feb 2008 13:42:38 +0100 (CET) From: Dag-Erling Smørgrav To: arch@freebsd.org References: <86odacc04t.fsf@ds4.des.no> <20080219233217.GS27248@funkthat.com> Date: Wed, 20 Feb 2008 13:42:37 +0100 In-Reply-To: <20080219233217.GS27248@funkthat.com> (John-Mark Gurney's message of "Tue\, 19 Feb 2008 15\:32\:17 -0800") Message-ID: <86skznyf2a.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post:
List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 12:42:49 -0000 John-Mark Gurney writes: > My concern is that slowly adding them for each interface type could > create some conflicts in both naming and location... Each interface type? This would be done in if_attach() / if_detach(); everything is taken care of centrally for all interfaces. > Are the interface sysctl nodes going to be the same/mirrored for hardware > devices? Does dev.msk.0 get duplicated in the interface area? No, there's if.* for interface stuff and dev.* for hardware stuff. Some nodes might move from one tree to the other, but I suspect that most won't. Like I said, some interfaces already do this "manually" under net.*. > or does it > have to decide to put ethernet interface related items in the if sysctl > node, and other hardware related (hi/low water marks for DMA) in the > separate tree? How does someone know where to look if they are in > different locations for the same device? Not all interfaces are devices. That is the whole point... > We should probably create a newbus tree node off the nexus for pseudo > devices that are not backed by hardware, and put all of these style > devices under them... Uh, no. Devices that aren't backed by hardware are still devices: they still have a device_t, and they still have dev.* nodes. nexus is not backed by hardware, for instance; it's just a convenient top-level device that serves as parent for all other devices. Basically, there is a dev.* node for every device_t in the system. I want to have an if.* node for every struct ifnet. > This will help enforce non-conflicting names, I don't see why you're so hung up on conflicting names. It's a non-issue. Every interface has a unique name.
DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 13:06:57 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A602816A404; Wed, 20 Feb 2008 13:06:57 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.netplex.net (mail.netplex.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id 43DA013C469; Wed, 20 Feb 2008 13:06:56 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.netplex.net (8.14.2/8.14.2/NETPLEX) with ESMTP id m1KD6gIA021276; Wed, 20 Feb 2008 08:06:43 -0500 (EST) X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.netplex.net) X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-4.0 (mail.netplex.net [204.213.176.10]); Wed, 20 Feb 2008 08:06:43 -0500 (EST) Date: Wed, 20 Feb 2008 08:06:43 -0500 (EST) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Jeff Roberson In-Reply-To: <20080219234101.D920@desktop> Message-ID: References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Robert Watson , Andrew Gallatin Subject: Re: Linux compatible setaffinity.
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 13:06:57 -0000 On Tue, 19 Feb 2008, Jeff Roberson wrote: > On Sat, 12 Jan 2008, Jeff Roberson wrote: > >> On Sat, 12 Jan 2008, Daniel Eischen wrote: >> >>> On Sat, 12 Jan 2008, Jeff Roberson wrote: >>> >>>> Now, there is one problem with the Linux API that I want to discuss >>>> before I commit it. The current patch always works on curthread. >>>> However, the API allows for setting the binding of a pid. I believe, >>>> although I'm not certain, that pids and tids in Linux are in the same >>>> number space. It's not clear to me whether you can set an affinity for >>>> an entire process and have it affect an individual thread or whether you >>>> set it on a thread by thread basis. When supplying a non-curproc pid do >>>> you bind all threads in the target process? >>>> >>>> Are our tids and pids in the same number space? And are they available >>>> to application programmers? I haven't followed that very carefully. >>> >>> I believe Marcel made tids and pids disjoint so that any pid is >>> never equal to any tid. But regardless, I don't think we want >>> to rely on that. I would prefer the Solaris approach of specifying >>> what we want (pid, tid, jail id, etc) as an argument in the API >>> so there is no confusion. >> >> Yes, I would prefer that as well I believe. So I'll add an extra parameter >> and in the Linux code we'll use whatever their default is. Of course the >> initial implementation will still only support curthread but I plan on >> finishing the rest before 8.0 is done.
> > So what does everyone think of something like this: > > int cpuaffinity(int cmd, long which, int masksize, unsigned *mask); > > #define AFFINITY_GET 0x1 > #define AFFINITY_SET 0x2 > #define AFFINITY_PID 0x4 > #define AFFINITY_TID 0x8 > > I'm not married to any of these names. If you know of something that would > be more regular please comment. I take it 'cmd' is either AFFINITY_GET or AFFINITY_SET, and which is AFFINITY_PID or AFFINITY_TID. Is there a reason why, for 2 different arguments to cpuaffinity(), the flags are disjoint? It almost seems like you wanted: int cpuaffinity(int flags, int masksize, unsigned *mask) I prefer the API you specified, keeping 'cmd' and 'which' as separate arguments. Is masksize in bytes or in units of unsigned? Do we need helper functions/macros for the mask? Like sigemptyset, sigaddset, etc? -- DE From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 13:58:19 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C82B716A405 for ; Wed, 20 Feb 2008 13:58:19 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id A569F13C467 for ; Wed, 20 Feb 2008 13:58:18 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m1KB0GCI002989 for ; Wed, 20 Feb 2008 22:00:16 +1100 Received: from server.vk2pj.dyndns.org (c220-239-20-82.belrs4.nsw.optusnet.com.au [220.239.20.82]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m1KAxwOs005659 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 20 Feb 2008 21:59:59 +1100 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org 
(8.14.2/8.14.1) with ESMTP id m1KAxw9v052497; Wed, 20 Feb 2008 21:59:58 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.2/8.14.2/Submit) id m1KAxwZY052496; Wed, 20 Feb 2008 21:59:58 +1100 (EST) (envelope-from peter) Date: Wed, 20 Feb 2008 21:59:58 +1100 From: Peter Jeremy To: Jeff Roberson Message-ID: <20080220105958.GO51095@server.vk2pj.dyndns.org> References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="yEPQxsgoJgBvi8ip" Content-Disposition: inline In-Reply-To: <20080219234101.D920@desktop> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.17 (2007-11-01) Cc: arch@freebsd.org Subject: Re: Linux compatible setaffinity. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 13:58:19 -0000 --yEPQxsgoJgBvi8ip Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Feb 19, 2008 at 11:55:22PM -1000, Jeff Roberson wrote: >So what does everyone think of something like this: > >int cpuaffinity(int cmd, long which, int masksize, unsigned *mask); > >#define AFFINITY_GET 0x1 >#define AFFINITY_SET 0x2 >#define AFFINITY_PID 0x4 >#define AFFINITY_TID 0x8 > >I'm not married to any of these names. If you know of something that would >be more regular please comment.
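Daniel Eischen's message above points out that the four proposed AFFINITY_* values occupy disjoint bits. That means one `cmd` word can carry both an operation (GET or SET) and a target kind (PID or TID), while contradictory combinations are still easy to reject up front. A minimal sketch of such a check, assuming the flag values quoted here; `affinity_decode()` is a hypothetical helper, and the discussion itself leaned toward keeping the operation and the target as separate arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define AFFINITY_GET	0x1
#define AFFINITY_SET	0x2
#define AFFINITY_PID	0x4
#define AFFINITY_TID	0x8

/*
 * Split a combined cmd word into its operation and target bits.
 * Exactly one of GET/SET and exactly one of PID/TID must be present,
 * and no other bit may be set.  Returns 0 on success or EINVAL.
 */
static int
affinity_decode(int cmd, int *op, int *target)
{
	int o = cmd & (AFFINITY_GET | AFFINITY_SET);
	int t = cmd & (AFFINITY_PID | AFFINITY_TID);

	assert(op != NULL && target != NULL);
	if (cmd & ~(AFFINITY_GET | AFFINITY_SET | AFFINITY_PID | AFFINITY_TID))
		return (EINVAL);	/* unknown bits */
	if (o != AFFINITY_GET && o != AFFINITY_SET)
		return (EINVAL);	/* no operation, or both */
	if (t != AFFINITY_PID && t != AFFINITY_TID)
		return (EINVAL);	/* no target kind, or both */
	*op = o;
	*target = t;
	return (0);
}
```

Because each group is tested for "exactly one bit", a caller cannot accidentally ask for GET and SET at once, which is the practical benefit of keeping the values disjoint even if they end up in separate arguments.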
It's probably not immediately relevant (because I don't have suitable code and doubt you do either) but how would this extend to: - a process wanting to set thread affinity to the h/w threads associated with a single core (which includes the issue of identifying which logical CPUs are linked with which physical cores) - a process wanting to take advantage of a system's NUMA topology to optimise thread affinities. - creating sets of logical CPUs and assigning sets of processes/threads to them. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. --yEPQxsgoJgBvi8ip Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHvAgu/opHv/APuIcRAm+fAJ4iTJR8HEKsroSBiyo+4hi1dVU6hACcDZWS l1+Pyc2BmKaa/znC/Bs//Zg= =coKN -----END PGP SIGNATURE----- --yEPQxsgoJgBvi8ip-- From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 20:00:42 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4B09B16A401 for ; Wed, 20 Feb 2008 20:00:42 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outE.internet-mail-service.net (outE.internet-mail-service.net [216.240.47.228]) by mx1.freebsd.org (Postfix) with ESMTP id 2CC8913C44B for ; Wed, 20 Feb 2008 20:00:41 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Wed, 20 Feb 2008 12:00:41 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id C97161272C8; Wed, 20 Feb 2008 12:00:40 -0800 (PST) Message-ID: <47BC86F1.8040108@elischer.org> Date: Wed, 20 Feb 2008 12:00:49 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To:
Dag-Erling Smørgrav References: <86odacc04t.fsf@ds4.des.no> <47BB23B7.9050007@elischer.org> <86wsozyfeo.fsf@ds4.des.no> In-Reply-To: <86wsozyfeo.fsf@ds4.des.no> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: arch@freebsd.org Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 20:00:42 -0000 Dag-Erling Smørgrav wrote: > Julian Elischer writes: >> a) If you do the work most people would go along. :-) > > But of course :) > >> b) being able to compile it without the bloat might be a good idea for >> embedded systems. > > !bloat; it would actually reduce the amount of code, due to increased > centralization. If that is the case then I see no real objection. > > DES From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 20:47:51 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 12CDB16A402; Wed, 20 Feb 2008 20:47:51 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id C082E13C478; Wed, 20 Feb 2008 20:47:50 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id E72C42085; Wed, 20 Feb 2008 21:47:42 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id CBDEA207E; Wed, 20 Feb 2008 21:47:42 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id A03BE8448A; Wed, 20 Feb 2008 21:47:42 +0100 (CET) From: 
Dag-Erling Smørgrav To: Robert Watson References: <86odacc04t.fsf@ds4.des.no> <20080219233217.GS27248@funkthat.com> <20080220111157.H44565@fledge.watson.org> Date: Wed, 20 Feb 2008 21:47:42 +0100 In-Reply-To: <20080220111157.H44565@fledge.watson.org> (Robert Watson's message of "Wed\, 20 Feb 2008 11\:18\:38 +0000 \(GMT\)") Message-ID: <86ablvuzgx.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, John-Mark Gurney Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 20:47:51 -0000 Robert Watson writes: > We also support interface renaming... Does newbus mind if you rename > the devices in its tree? I'm not sure whether you're replying to my proposal or to Julian's interpretation / extrapolation of it... but I have no intention of hooking interfaces into newbus. I just want a sysctl tree for struct ifnet like we have a sysctl tree for device_t, to access interface parameters which are not easily accessible through ifconfig.
DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 22:32:27 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 495FE16A401; Wed, 20 Feb 2008 22:32:27 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id A42EE13C4DB; Wed, 20 Feb 2008 22:32:26 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m1KMWLAV023049; Wed, 20 Feb 2008 17:32:24 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Wed, 20 Feb 2008 12:33:29 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Daniel Eischen In-Reply-To: Message-ID: <20080220123209.V920@desktop> References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Robert Watson , Andrew Gallatin Subject: Re: Linux compatible setaffinity.
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 22:32:27 -0000 On Wed, 20 Feb 2008, Daniel Eischen wrote: > On Tue, 19 Feb 2008, Jeff Roberson wrote: > >> On Sat, 12 Jan 2008, Jeff Roberson wrote: >> >>> On Sat, 12 Jan 2008, Daniel Eischen wrote: >>> >>>> On Sat, 12 Jan 2008, Jeff Roberson wrote: >>>> >>>>> Now, there is one problem with the linux api that I want to discuss >>>>> before I commit it. The current patch always works on curthread. >>>>> However, the api allows for setting the binding of a pid. I believe, >>>>> although I'm not certain, that pids and tids in linux are in the same >>>>> number space. It's not clear to me whether you can set an affinity for >>>>> an entire process and have it affect an individual thread or whether you >>>>> set it on a thread by thread basis. When supplying a non-curproc pid do >>>>> you bind all threads in the target process? >>>>> >>>>> Are our tids and pids in the same number space? And are they available >>>>> to application programmers? I haven't followed that very carefully. >>>> >>>> I believe marcel made tids and pids disjoint so that any pid is >>>> never equal to any tid. But regardless, I don't think we want >>>> to rely on that. I would prefer the Solaris approach of specifying >>>> what we want (pid, tid, jail id, etc) as an argument in the API >>>> so there is no confusion. >>> >>> Yes, I would prefer that as well I believe. So I'll add an extra >>> parameter and in the linux code we'll use whatever their default is. Of >>> course the initial implementation will still only support curthread but I >>> plan on finishing the rest before 8.0 is done.
> >> So what does everyone think of something like this: >> >> int cpuaffinity(int cmd, long which, int masksize, unsigned *mask); >> >> #define AFFINITY_GET 0x1 >> #define AFFINITY_SET 0x2 >> #define AFFINITY_PID 0x4 >> #define AFFINITY_TID 0x8 >> >> I'm not married to any of these names. If you know of something that would >> be more regular please comment. > > I take it 'cmd' is either AFFINITY_GET or AFFINITY_SET, and which > is AFFINITY_PID or AFFINITY_TID. Is there a reason why, for 2 different > arguments to cpuaffinity(), the flags are disjoint? It almost seems > like you wanted: > > int cpuaffinity(int flags, int masksize, unsigned *mask) > > I prefer the API you specified, keeping 'cmd' and 'which' as > separate arguments. > Yes, I'll either do that or have separate get/set syscalls. > Is masksize in bytes or in units of unsigned? Do we need helper > functions/macros for the mask? Like sigemptyset, sigaddset, etc? I have macros copied from FD_SET, as CPU_SET, CPU_ISSET, etc. This will go in sys/sched.h.
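Jeff's answer to Daniel is that the helpers are modeled on FD_SET. What follows is a minimal sketch of what such FD_SET-style helpers could look like, assuming a fixed-size set stored as an array of unsigned long words. The CPUMASK_* macros, cpumask_t, cpumask_count() and the 128-CPU capacity are hypothetical placeholders, not the macros from Jeff's patch. Unsigned words are used so that setting the top bit of a word is well defined. Under this layout, the masksize argument Daniel asks about would simply be sizeof(cpumask_t) in bytes.

```c
#include <limits.h> /* CHAR_BIT */
#include <stddef.h> /* size_t */
#include <string.h> /* memset, memcpy */

/* Hypothetical capacity; a real header would choose its own limit. */
#define CPUMASK_SETSIZE 128

/* Bits per word and number of words, in the style of fd_set's NFDBITS. */
#define CPUMASK_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define CPUMASK_WORDS ((CPUMASK_SETSIZE + CPUMASK_BITS - 1) / CPUMASK_BITS)

typedef struct {
	unsigned long bits[CPUMASK_WORDS];
} cpumask_t;

/* Locate the word, and the bit within that word, for a CPU number. */
#define CPUMASK_WORD(cpu) ((size_t)(cpu) / CPUMASK_BITS)
#define CPUMASK_BIT(cpu)  (1UL << ((size_t)(cpu) % CPUMASK_BITS))

#define CPUMASK_ZERO(m)        memset((m), 0, sizeof(*(m)))
#define CPUMASK_SET(cpu, m)    ((m)->bits[CPUMASK_WORD(cpu)] |= CPUMASK_BIT(cpu))
#define CPUMASK_CLR(cpu, m)    ((m)->bits[CPUMASK_WORD(cpu)] &= ~CPUMASK_BIT(cpu))
#define CPUMASK_ISSET(cpu, m)  (((m)->bits[CPUMASK_WORD(cpu)] & CPUMASK_BIT(cpu)) != 0)
#define CPUMASK_COPY(src, dst) memcpy((dst), (src), sizeof(*(dst)))

/* Count the CPUs present in a mask (one inner iteration per set bit). */
static int cpumask_count(const cpumask_t *m)
{
	int n = 0;

	for (size_t w = 0; w < CPUMASK_WORDS; w++)
		for (unsigned long v = m->bits[w]; v != 0; v &= v - 1)
			n++; /* v &= v - 1 clears the lowest set bit */
	return n;
}
```

For example, CPUMASK_ZERO(&m) followed by CPUMASK_SET(3, &m) builds a mask that contains only CPU 3. The macros mirror the FD_ZERO/FD_SET/FD_ISSET calling convention, so code written against select() should feel familiar.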
> > -- > DE > From owner-freebsd-arch@FreeBSD.ORG Wed Feb 20 23:46:20 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 04CFF16A401 for ; Wed, 20 Feb 2008 23:46:20 +0000 (UTC) (envelope-from jmg@hydrogen.funkthat.com) Received: from hydrogen.funkthat.com (gate.funkthat.com [69.17.45.168]) by mx1.freebsd.org (Postfix) with ESMTP id C6E7013C459 for ; Wed, 20 Feb 2008 23:46:19 +0000 (UTC) (envelope-from jmg@hydrogen.funkthat.com) Received: from hydrogen.funkthat.com (tvx8ldvyypfucyz0@localhost.funkthat.com [127.0.0.1]) by hydrogen.funkthat.com (8.13.6/8.13.3) with ESMTP id m1KNkBe9098259; Wed, 20 Feb 2008 15:46:11 -0800 (PST) (envelope-from jmg@hydrogen.funkthat.com) Received: (from jmg@localhost) by hydrogen.funkthat.com (8.13.6/8.13.3/Submit) id m1KNkA5J098258; Wed, 20 Feb 2008 15:46:10 -0800 (PST) (envelope-from jmg) Date: Wed, 20 Feb 2008 15:46:09 -0800 From: John-Mark Gurney To: Dag-Erling Smørgrav Message-ID: <20080220234609.GB96595@funkthat.com> Mail-Followup-To: Dag-Erling Smørgrav , arch@freebsd.org References: <86odacc04t.fsf@ds4.des.no> <20080219233217.GS27248@funkthat.com> <86skznyf2a.fsf@ds4.des.no> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <86skznyf2a.fsf@ds4.des.no> User-Agent: Mutt/1.4.2.1i X-Operating-System: FreeBSD 5.4-RELEASE-p6 i386 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (hydrogen.funkthat.com [127.0.0.1]); Wed, 20 Feb 2008 15:46:11 -0800 (PST) Cc: arch@freebsd.org Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: John-Mark Gurney List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2008 23:46:20 -0000 Dag-Erling Smørgrav wrote this message on Wed, Feb 20, 2008 at 13:42 +0100: > John-Mark Gurney writes: > > My concern is that slowly adding them for each interface type could > > create some conflicts in both naming and location... > > each interface type? this would be done in if_attach() / if_detach(), > everything is taken care of centrally for all interfaces. I'm talking about any type of pseudo device. Virtual web cams from the network, tty, disks, etc. Any kernel interface that has a logical view to the user that is not backed by hardware.... > > Are the interface sysctl nodes going to be the same/mirrored for hardware > > devices? Does dev.msk.0 get duplicated in the interface area? > > No, there's if.* for interface stuff and dev.* for hardware stuff. Some > nodes might move from one tree to the other, but I suspect that most > won't. Like I said, some interfaces already do this "manually" under > net.*. And now new hardware devices could start using one or the other causing confusion in the future.. we should not allow that... I know this isn't Python, but Python has some good guiding principles and one of them is: There should be one-- and preferably only one --obvious way to do it. this adds two obvious ways to handle network interface backed by hardware sysctl data.. > > or does it > > have to decide to put ethernet interface related items in the if sysctl > > node, and other hardware related (hi/low water marks for DMA) in the > > separate tree? How does someone know where to look if they are in > > different locations for the same device? > > Not all interfaces are devices. That is the whole point... I'm confused about this, since below you say they are. > > We should probably create a newbus tree node off the nexus for pseudo > > devices that are not backed by hardware, and put all of these style > > devices under them...
> > Uh, no. > > Devices that aren't backed by hardware are still devices, they still > have a device_t, and they still have dev.* nodes; nexus is not backed by Oh, they do? I don't see a dev.lo0 tree, or are we now talking about other devices? > hardware, for instance, it's just a convenient top-level device that > serves as parent for all other devices. But I don't see the device for lo0 under nexus... You just said: "Devices that aren't backed by hardware are still devices" and: "parent for all other devices" other devices than nexus? or other devices that are real? > Basically, there is a dev.* node for every device_t in the system. exactly, and I want a device_t for all devices (including pseudo interfaces, etc.) in the system... > I want to have an if.* node for every struct ifnet. If all devices have a device_t and a dev.* node, then this isn't necessary, my point... > > This will help enforce non-conflicting names, > > I don't see why you're so hung up on conflicting names. It's a non-issue. Every interface has a unique name. I'm not hung up on conflicting names, I was pointing out a side benefit of this proposal. Where else have I talked about conflicting names before? I'm simply saying that if all network interfaces, pseudo or not, had a device_t, then they would have a dev.* entry, and we would not need to make ifnet have its own sysctl tree. Robert does point out that we don't require locks to add an interface, but there are things called taskqueues that we could make use of to add/create device_t's w/ Giant w/o having to add it to the if_attach function. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 00:26:37 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1023A16A406 for ; Thu, 21 Feb 2008 00:26:37 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id C44D313C459 for ; Thu, 21 Feb 2008 00:26:36 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id 40C472085 for ; Thu, 21 Feb 2008 01:26:33 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id 328B32049 for ; Thu, 21 Feb 2008 01:26:33 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id 1CC4A84488; Thu, 21 Feb 2008 01:26:33 +0100 (CET) From: Dag-Erling Smørgrav To: arch@freebsd.org References: <86odacc04t.fsf@ds4.des.no> <20080219233217.GS27248@funkthat.com> <86skznyf2a.fsf@ds4.des.no> <20080220234609.GB96595@funkthat.com> Date: Thu, 21 Feb 2008 01:26:32 +0100 In-Reply-To: <20080220234609.GB96595@funkthat.com> (John-Mark Gurney's message of "Wed\, 20 Feb 2008 15\:46\:09 -0800") Message-ID: <864pc3w3wn.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 00:26:37 -0000 John-Mark Gurney writes: > Dag-Erling Smørgrav writes: > > John-Mark Gurney writes: > > > My concern is
that slowly adding them for each interface type could > > > create some conflicts in both naming and location... > > each interface type? this would be done in if_attach() / if_detach(), > > everything is taken care of centrally for all interfaces. > I'm talking about any type of pseudo device. Virtual web cams from > the network, tty, disks, etc. Any kernel interface that has a logical > view to the user that is not backed by hardware.... Every disk has a device_t; ttys have some sort of driver attachment since they have device nodes; I have no idea what you mean about a "virtual web cam". > And now new hardware devices could start using one or the other causing > confusion in the future.. we should not allow that... I know this isn't > Python, but Python has some good guiding principles and one of them is: > There should be one-- and preferably only one --obvious way to do it. This is why Python is unusable for any real work. > this adds two obvious ways to handle network interface backed by > hardware sysctl data.. No, it simply allows the driver writer to make a mistake which can be caught during code review. > > Not all interfaces are devices. That is the whole point... > I'm confused about this, since below you say they are. No, they're not. Some are (if_fxp), some aren't (if_loop). That is the entire point. If they were, we wouldn't be having this conversation. > > Devices that aren't backed by hardware are still devices, they still > > have a device_t, and they still have dev.* nodes; nexus is not backed by > Oh, they do? I don't see a dev.lo0 tree, or are we now talking about > other devices? lo0 is not a device. It is a network interface. > exactly, and I want a device_t for all devices (including pseudo > interfaces, etc.) in the system... That is your cross to bear, not mine...
DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 00:44:12 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DEA7516A408; Thu, 21 Feb 2008 00:44:12 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 9CCCD13C4CC; Thu, 21 Feb 2008 00:44:12 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id 6A7022087; Thu, 21 Feb 2008 01:44:09 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id 5BEBB2049; Thu, 21 Feb 2008 01:44:09 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id 3C23C8448A; Thu, 21 Feb 2008 01:44:09 +0100 (CET) From: Dag-Erling Smørgrav To: Robert Watson References: <86odacc04t.fsf@ds4.des.no> <20080219233217.GS27248@funkthat.com> <20080220111157.H44565@fledge.watson.org> <86ablvuzgx.fsf@ds4.des.no> Date: Thu, 21 Feb 2008 01:44:09 +0100 In-Reply-To: <86ablvuzgx.fsf@ds4.des.no> ("Dag-Erling Smørgrav"'s message of "Wed\, 20 Feb 2008 21\:47\:42 +0100") Message-ID: <86zltvuoiu.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, John-Mark Gurney Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 00:44:13 -0000 Dag-Erling
Smørgrav writes: > Robert Watson writes: > > We also support interface renaming... Does newbus mind if you rename > > the devices in its tree? > I'm not sure whether you're replying to my proposal or to Julian's s/Julian/John-Mark/ DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 03:27:15 2008 Return-Path: Delivered-To: arch@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D492B16A402; Thu, 21 Feb 2008 03:27:15 +0000 (UTC) (envelope-from davidxu@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C1DAC13C455; Thu, 21 Feb 2008 03:27:15 +0000 (UTC) (envelope-from davidxu@FreeBSD.org) Received: from apple.my.domain (root@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m1L3RAZK095464; Thu, 21 Feb 2008 03:27:12 GMT (envelope-from davidxu@freebsd.org) Message-ID: <47BCEFDB.5040207@freebsd.org> Date: Thu, 21 Feb 2008 11:28:27 +0800 From: David Xu User-Agent: Thunderbird 2.0.0.9 (X11/20071211) MIME-Version: 1.0 To: Robert Watson References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> In-Reply-To: <20080220105333.G44565@fledge.watson.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Daniel Eischen , Andrew Gallatin , arch@FreeBSD.org Subject: Re: Linux compatible setaffinity.
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 03:27:15 -0000 Robert Watson wrote: > > On Wed, 20 Feb 2008, Jeff Roberson wrote: > >>> So perhaps this means a slightly more complex API, but not much more >>> complex. How about: >>> >>> int cpuaffinity_get(scope, id, length, mask) >>> int cpuaffinity_getmax(scope, id, length, mask) >>> int cpuaffinity_set(scope, id, length, mask) >>> int cpuaffinity_setmax(scope, id, length, mask) >>> Are these features only for jail or something else which doesn't care about CPU L2 cache sharing between cores? Since programs still cannot figure out L2 sharing information, there is no way to optimize their threads' CPU arrangement. Regards, David Xu From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 03:57:31 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 55A2316A400; Thu, 21 Feb 2008 03:57:31 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 1A73B13C442; Thu, 21 Feb 2008 03:57:31 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m1L3v19M091101; Wed, 20 Feb 2008 22:57:03 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Wed, 20 Feb 2008 17:58:11 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: David Xu In-Reply-To: <47BCEFDB.5040207@freebsd.org> Message-ID: <20080220175532.Q920@desktop> References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> 
<20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@FreeBSD.org, Robert Watson , Andrew Gallatin Subject: Re: Linux compatible setaffinity. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 03:57:31 -0000 On Thu, 21 Feb 2008, David Xu wrote: > Robert Watson wrote: >> >> On Wed, 20 Feb 2008, Jeff Roberson wrote: >> >>>> So perhaps this means a slightly more complex API, but not much more >>>> complex. How about: >>>> >>>> int cpuaffinity_get(scope, id, length, mask) >>>> int cpuaffinity_getmax(scope, id, length, mask) >>>> int cpuaffinity_set(scope, id, length, mask) >>>> int cpuaffinity_setmax(scope, id, length, mask) >>>> > > Are these features only for jail or something else which doesn't care about CPU > L2 cache sharing between cores? Since programs still cannot figure out > L2 sharing information, there is no way to optimize their threads' CPU arrangement. These are all for binding to specific cpu sets. Potentially with some support for creating something like Solaris psets. This is when you have some static information about what processors to use. I think a cache-aware solution will be the next step. I have a patch which allows the kernel scheduler to understand the cache hierarchies in the system. This information could be exported to userland if that would be useful.
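Jeff suggests that the scheduler's knowledge of cache hierarchies could be exported to userland, and David's question depends on exactly that. Suppose it were exported as a per-CPU cache identifier. That format is invented here, and no such interface is described in this thread. A program could then derive the set of CPUs sharing an L2 cache with a given CPU and narrow its affinity mask to that set. The 8-CPU table with two L2 caches and the helpers l2_siblings() and restrict_to_l2() are hypothetical, and a single unsigned long stands in for a CPU mask.

```c
/* Hypothetical topology for an 8-CPU machine: CPUs 0-3 share one L2
 * cache and CPUs 4-7 share another. The table format is invented. */
#define NCPU 8
static const int l2_id[NCPU] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Bitmask of every CPU sharing an L2 cache with `cpu`, including `cpu`
 * itself. Assumes NCPU does not exceed the bits in an unsigned long. */
static unsigned long l2_siblings(int cpu)
{
	unsigned long mask = 0;

	for (int i = 0; i < NCPU; i++)
		if (l2_id[i] == l2_id[cpu])
			mask |= 1UL << i;
	return mask;
}

/* Narrow an available-CPU mask to the L2 domain of `cpu`, so that
 * cooperating threads can be kept on CPUs that share that cache. */
static unsigned long restrict_to_l2(unsigned long avail, int cpu)
{
	return avail & l2_siblings(cpu);
}
```

An empty result means the thread has no permitted CPU in that cache domain, and the program would have to fall back to its wider mask. With the table above, restrict_to_l2(0x0F, 5) is one such case.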
> > Regards, > David Xu > From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 07:24:17 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 10FCF16A400 for ; Thu, 21 Feb 2008 07:24:17 +0000 (UTC) (envelope-from jmg@hydrogen.funkthat.com) Received: from hydrogen.funkthat.com (gate.funkthat.com [69.17.45.168]) by mx1.freebsd.org (Postfix) with ESMTP id E7C7213C45E for ; Thu, 21 Feb 2008 07:24:16 +0000 (UTC) (envelope-from jmg@hydrogen.funkthat.com) Received: from hydrogen.funkthat.com (cafx6hacy2vybf0x@localhost.funkthat.com [127.0.0.1]) by hydrogen.funkthat.com (8.13.6/8.13.3) with ESMTP id m1L7OBEV004912; Wed, 20 Feb 2008 23:24:11 -0800 (PST) (envelope-from jmg@hydrogen.funkthat.com) Received: (from jmg@localhost) by hydrogen.funkthat.com (8.13.6/8.13.3/Submit) id m1L7OA54004911; Wed, 20 Feb 2008 23:24:10 -0800 (PST) (envelope-from jmg) Date: Wed, 20 Feb 2008 23:24:10 -0800 From: John-Mark Gurney To: Dag-Erling Smørgrav Message-ID: <20080221072410.GC96595@funkthat.com> Mail-Followup-To: Dag-Erling Smørgrav , arch@freebsd.org References: <86odacc04t.fsf@ds4.des.no> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <86odacc04t.fsf@ds4.des.no> User-Agent: Mutt/1.4.2.1i X-Operating-System: FreeBSD 5.4-RELEASE-p6 i386 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (hydrogen.funkthat.com [127.0.0.1]); Wed, 20 Feb 2008 23:24:11 -0800 (PST) Cc: arch@freebsd.org Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: John-Mark Gurney List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 07:24:17 -0000 Dag-Erling Smørgrav wrote this message on Tue, Feb 19, 2008 at 18:43 +0100: > I'm open to objections and suggestions... I object. I have stated my case. If you don't want to listen to it, that's fine. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 07:45:19 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BEABF16A405; Thu, 21 Feb 2008 07:45:19 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 975B013C442; Thu, 21 Feb 2008 07:45:19 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m1L7iYiF013941; Thu, 21 Feb 2008 02:44:35 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Wed, 20 Feb 2008 21:45:44 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: David Xu , Daniel Eischen , Robert Watson , Andrew Gallatin In-Reply-To: <20080220175532.Q920@desktop> Message-ID: <20080220213253.A920@desktop> References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: getaffinity/setaffinity and cpu sets.
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 07:45:20 -0000 I have the following api working: /* * Parameters for the level argument to getaffinity. */ #define CPU_LEVEL_SYS 1 /* All system cpus. */ #define CPU_LEVEL_AVAIL 2 /* Available cpus for which. */ #define CPU_LEVEL_WHICH 3 /* Actual mask for which. */ /* * Parameters for the which argument to {get,set}affinity. */ #define CPU_WHICH_TID 1 /* Specifies a thread id. */ #define CPU_WHICH_PID 2 /* Specifies a process id. */ #define CPU_WHICH_SET 3 /* Specifies a set id. */ Along with CPU_CLR, CPU_COPY, CPU_ISSET, CPU_SET, CPU_ZERO for manipulating the sets. int getaffinity(int level, int which, int id, int cpusetsize, long *mask); int setaffinity(int which, int id, int cpusetsize, long *mask); The get call has a notion of 'level' which allows us to fetch different masks. The system set is all processors in the system. The available set is the set of cpus available to the tid/pid in the 'which' argument. An application would fetch the avail set and then potentially reduce it. The setaffinity call doesn't have a level because the avail/sys sets are immutable. You can only set things which can be specified by the which argument. I also have a 'cpuset' command which can run a new program with a given cpu set, view and modify sets of arbitrary pids. This is all working and I can supply patches if anyone is interested. I have to implement 4BSD support before I can commit. I have a proposal for Solaris-style processor sets which I think is simple and sufficient for most cases. It involves the following new syscalls: int cpuset(void); int setcpuset(pid_t pid, int setid); int getcpuset(pid_t pid);
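The level semantics Jeff describes can be modeled with plain bitmasks: the system set contains the available set, which contains the thread's own mask, and only the thread's mask can be changed. The sketch below is a userland simulation, not the proposed syscalls. struct thread_model, model_getaffinity() and model_setaffinity() are hypothetical, and one unsigned long stands in for a CPU set. Returning EINVAL for an empty mask, or for one that reaches outside the available set, is an assumption about how errors would be reported.

```c
#include <errno.h> /* EINVAL */

/* Level values copied from the proposal quoted above. */
#define CPU_LEVEL_SYS   1 /* All system cpus. */
#define CPU_LEVEL_AVAIL 2 /* Available cpus for which. */
#define CPU_LEVEL_WHICH 3 /* Actual mask for which. */

/* Hypothetical per-thread state; one unsigned long is one CPU set. */
struct thread_model {
	unsigned long sys;   /* every CPU in the system; immutable */
	unsigned long avail; /* CPUs the thread's set allows; immutable here */
	unsigned long which; /* the thread's own affinity mask */
};

/* Return the mask for the requested level, or 0 for an unknown level. */
static unsigned long model_getaffinity(const struct thread_model *t, int level)
{
	switch (level) {
	case CPU_LEVEL_SYS:
		return t->sys;
	case CPU_LEVEL_AVAIL:
		return t->avail;
	case CPU_LEVEL_WHICH:
		return t->which;
	default:
		return 0;
	}
}

/* No level argument: only the per-thread mask can change, and it must
 * be a non-empty subset of the available set. */
static int model_setaffinity(struct thread_model *t, unsigned long mask)
{
	if (mask == 0 || (mask & ~t->avail) != 0)
		return EINVAL;
	t->which = mask;
	return 0;
}
```

A caller following Jeff's description would fetch CPU_LEVEL_AVAIL, clear the bits it does not want, and pass the result to model_setaffinity(). An attempt to add a CPU outside the available set is rejected and leaves the current mask untouched.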
You can modify or inspect its affinity with get/setaffinity above and the CPU_WHICH_SET argument. The cpuset exists as long as there are members of the set. Sort of like a process group or session. The {get,set}cpuset calls can inspect or modify the state. This set would not be modifiable by user processes or by processes in a jail. It would create the restriction that differs between 'avail' and 'sys' above. Processes would be able to bind directly to any processor within the set. Changing the set would apply to all processes in the set. The cpuset would be per-process while the mask is per-thread. Set membership is inherited on fork(). In Solaris, sets can be named and have a more complete management api. I'm not really interested in implementing all of that but I believe what I have outlined here would be a subset of this and no code/syscalls would be wasted. Comments? Objections? I'm fairly pleased with this arrangement now. Thanks, Jeff From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 09:27:43 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3883D16A406; Thu, 21 Feb 2008 09:27:43 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 8AE6B13C4D3; Thu, 21 Feb 2008 09:27:42 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id B91FC46B93; Thu, 21 Feb 2008 04:27:41 -0500 (EST) Date: Thu, 21 Feb 2008 09:27:41 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Jeff Roberson In-Reply-To: <20080220213253.A920@desktop> Message-ID: <20080221092011.J52922@fledge.watson.org> References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> 
<20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@freebsd.org, David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 09:27:43 -0000 On Wed, 20 Feb 2008, Jeff Roberson wrote: > I also have a 'cpuset' command which can run a new program with a given cpu > set, view and modify sets of arbitrary pids. This is all working and I can > supply patches if anyone is interested. I have to implement 4BSD support > before I can commit. > > I have a proposal for solaris style processor sets which I think is simple > and sufficient for most cases. It involves the following new syscalls: > > int cpuset(void); int setcpuset(pid_t pid, int setid); int getcpuset(pid_t > pid); > > The notion would be that you can create a new numbered cpuset with cpuset(). > You can modify or inspect its affinity with get/setaffinity above and the > CPU_WHICH_SET argument. The cpuset exists as long as there are members of > the set. Sort of like a process group or session. The {get,set}cpuset > calls can inspect or modify the state. > > This set would not be modifiable by user processes or by processes in a > jail. It would create the restriction that differs between 'avail' and 'sys' > above. Processors would be able to directly bind to any processor within the > set. Changing the set would apply to all processes in the set. The cpuset > would be per-process while the mask is per-thread. 
Sets involvement is > inherited on fork(). > > In solaris sets can be named and have a more complete management api. I'm > not really interested in implementing all of that but I believe what I have > outlined here would be subset of this and no code/syscalls would be wasted. > > Comments? Objections? I'm fairly pleased with this arrangement now.

Just to put a few notes from our conversation on IRC in e-mail:

- I think I'd prefer int cpuset(cpuset_t *set), int getcpuset(pid_t, cpuset_t *) so that we don't mix up IDs and return values. More recent interfaces tend to do this, I believe, and it means that the prototype, even if not the ABI, remains the same if the set identifier changes in the future.

- You don't mention what happens if a process's cpu set changes to preclude a CPU the process has a thread with affinity for. Online, you suggested SIGKILL, and I thought maybe a new SIGCPUGONE with a default SIGKILL action might be a friendlier model. We should see what Solaris and others do here though. I like the idea that the affinity is a guarantee in userspace because it means that you can rely on it; I'm OK with the idea that your thread always runs on the CPUs you have affinity for unless in the SIGCPUGONE handler :-).

- It would be nice to be able to use CPU sets in jail as well, suggesting a hierarchical model with some sort of tagging so you know what CPU sets were created in a jail such that you know whether they can be changed in a jail. While I recognize this makes things a lot more tricky, I think we should basically be planning more carefully with respect to virtualization when we add new interfaces, since it's a widely used feature, and the current set of "stragglers" unsupported in Jail is growing rather than shrinking.

- There's still no way to specify an affinity policy rather than explicit affinity, but if our CPU set model is sufficiently general, that might be a vehicle to do that. I.e., cpuset_setpolicy() rather than setting a mask.
- In the interests of boring API changes, recent APIs tend to prefix the method name with the object name. Have you thought about cpuset_create(), cpuset_foo(), etc.? That reduces the chances of interfering with application namespaces. I think, anyway. :-). I need to ponder the proposal a little more, ideally over a hot beverage this morning, and will follow up if I have further thoughts. Thanks for working on this, BTW -- affinity is long overdue for FreeBSD. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 10:07:09 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0B6D116A402 for ; Thu, 21 Feb 2008 10:07:09 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id DC15613C469 for ; Thu, 21 Feb 2008 10:07:08 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 611F346BAB; Thu, 21 Feb 2008 05:07:08 -0500 (EST) Date: Thu, 21 Feb 2008 10:07:08 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= In-Reply-To: <86ablvuzgx.fsf@ds4.des.no> Message-ID: <20080221100156.V52922@fledge.watson.org> References: <86odacc04t.fsf@ds4.des.no> <20080219233217.GS27248@funkthat.com> <20080220111157.H44565@fledge.watson.org> <86ablvuzgx.fsf@ds4.des.no> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="621616949-1809286258-1203588428=:52922" Cc: arch@freebsd.org, John-Mark Gurney Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21
Feb 2008 10:07:09 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --621616949-1809286258-1203588428=:52922 Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE On Wed, 20 Feb 2008, Dag-Erling Smørgrav wrote: > Robert Watson writes: > >> We also support interface renaming... Does newbus mind if you rename the >> devices in its tree? > > I'm not sure whether you're replying to my proposal or to Julian's > interpretation / extrapolation of it... but I have no intention of hooking > interfaces into newbus. I just want a sysctl tree for struct ifnet like we > have a sysctl tree for device_t, to access interface parameters which are > not easily accessible through ifconfig. Hmm. When I look at net/if.c, I don't see renaming support, so perhaps this was just a proposal I was thinking of and not actual code. In either case, I think the question stands: in a world where interface renaming is supported, is your plan to also rename the if.X sysctl tree created for the interface? Does sysctl have a facility to do this? I assume that somehow the details of your plan involve automatically creating a root node for the interface in if_attach and then exposing the node to the driver, possibly via a new pointer in struct ifnet?
I'm certainly fine with such a notion, but think we should establish, for devices with a number of sysctl trees (e.g., dev.em vs if.em, dev.da0 vs disk.da0, etc.), a general philosophy for placing nodes in one or the other somewhat deterministically. Robert N M Watson Computer Laboratory University of Cambridge --621616949-1809286258-1203588428=:52922-- From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 12:12:29 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B108816A401 for ; Thu, 21 Feb 2008 12:12:29 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 72D8313C45A for ; Thu, 21 Feb 2008 12:12:29 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id 31F3D2091 for ; Thu, 21 Feb 2008 13:12:24 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id B44472088 for ; Thu, 21 Feb 2008 13:12:23 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id 992B0844CC; Thu, 21 Feb 2008 13:12:23 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: arch@freebsd.org References: <86odacc04t.fsf@ds4.des.no> <20080221072410.GC96595@funkthat.com> Date: Thu, 21 Feb 2008 13:12:23 +0100 In-Reply-To: <20080221072410.GC96595@funkthat.com> (John-Mark Gurney's message of "Wed\, 20 Feb 2008 23\:24\:10 -0800") Message-ID: <86ablua4pk.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 12:12:29 -0000 John-Mark Gurney writes: > I object. I have stated my case. If you don't want to listen to it, > that's fine. Your case was very unclear. In fact, I would say you objected to something that you made up in your head which was completely different from what I actually proposed. Perhaps you should re-read my original email. DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 12:21:19 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 74CCC16A400; Thu, 21 Feb 2008 12:21:19 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 3610D13C459; Thu, 21 Feb 2008 12:21:19 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id 8D5042093; Thu, 21 Feb 2008 13:21:15 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id 67F082087; Thu, 21 Feb 2008 13:21:15 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id 48DF9844CD; Thu, 21 Feb 2008 13:21:15 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Robert Watson References: <86odacc04t.fsf@ds4.des.no> <20080219233217.GS27248@funkthat.com> <20080220111157.H44565@fledge.watson.org> <86ablvuzgx.fsf@ds4.des.no> <20080221100156.V52922@fledge.watson.org> Date: Thu, 21 Feb 2008 13:21:15 +0100 In-Reply-To: <20080221100156.V52922@fledge.watson.org> (Robert Watson's message of "Thu\, 21 Feb 2008 10\:07\:08 +0000
\(GMT\)") Message-ID: <8663wia4as.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, John-Mark Gurney Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 12:21:19 -0000 Robert Watson writes: > Hmm. When I look at net/if.c, I don't see renaming support, so > perhaps this was just a proposal I was thinking of and not actual > code. no, it's there, check ifconfig(8) > In either case, I think the question stands: in a world where > interface renaming is supported, is your plan to also rename the if.X > sysctl tree created for the interface? Does sysctl have a facility to > do this? I will have to investigate. I think it can be arranged in some way. > I assume that somehow the details of your plan involve automatically > creating a root node for the interface in if_attach and then exposing > the node to the driver, possibly via a new pointer in struct ifnet? Via two functions (ifnet_get_sysctl_{context,root}() or something like it) which internally use extra members in struct ifnet, yes (or wherever it makes sense to place them without breaking the ABI) > I'm certainly fine with such a notion, but think we should establish, > for devices with a number of sysctl trees (i.e., dev.em vs if.em, > dev.da0 vs disk.da0, etc), a general philosophy for placing nodes in > one or the other somewhat deterministically. Hardware-related things go in dev, network-related things go in if. 
If you want to tune the number of DMA queues or whatever, do it in dev; if you want to allow traffic through a specific interface to bypass pfil, do it in if (this may be something we want to do centrally for all interfaces).

What happened when dev was introduced was that certain settings which were previously system-wide for all instances of the same driver (e.g. all fxp devices) became per-device instead. I expect the same will happen with if; for instance,

net.link.bridge.ipfw
net.link.bridge.log_stp
net.link.bridge.pfil_local_phys
net.link.bridge.pfil_member
net.link.bridge.pfil_bridge
net.link.bridge.ipfw_arp
net.link.bridge.pfil_onlyip

would go into

if.bridge.0.ipfw
if.bridge.1.ipfw
etc.

so we will actually gain functionality while in all likelihood reducing the amount of code (and code duplication).

DES
-- 
Dag-Erling Smørgrav - des@des.no
From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 13:38:59 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3F6A16A404; Thu, 21 Feb 2008 13:38:59 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 7328113C46E; Thu, 21 Feb 2008 13:38:59 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id EF2112091; Thu, 21 Feb 2008 14:38:52 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id DFF95208C; Thu, 21 Feb 2008 14:38:52 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id C49BF844CD; Thu, 21 Feb 2008 14:38:52 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Robert Watson References: <86odacc04t.fsf@ds4.des.no> <20080219233217.GS27248@funkthat.com>
<20080220111157.H44565@fledge.watson.org> <86ablvuzgx.fsf@ds4.des.no> <20080221100156.V52922@fledge.watson.org> <8663wia4as.fsf@ds4.des.no> Date: Thu, 21 Feb 2008 14:38:52 +0100 In-Reply-To: <8663wia4as.fsf@ds4.des.no> ("Dag-Erling =?utf-8?Q?Sm=C3=B8rg?= =?utf-8?Q?rav=22's?= message of "Thu\, 21 Feb 2008 13\:21\:15 +0100") Message-ID: <86k5ky8m4z.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, John-Mark Gurney Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 13:38:59 -0000 Dag-Erling Smørgrav writes: > Robert Watson writes: > > Hmm. When I look at net/if.c, I don't see renaming support, so > > perhaps this was just a proposal I was thinking of and not actual > > code.
> no, it's there, check ifconfig(8)

specifically,

des@ds4 ~% sudo kldload if_bridge
des@ds4 ~% sudo ifconfig bridge0 plumb
des@ds4 ~% ifconfig -l
msk0 lo0 pflog0 bridge0
des@ds4 ~% sudo ifconfig bridge0 name foo
des@ds4 ~% ifconfig -l
msk0 lo0 pflog0 foo

DES
-- 
Dag-Erling Smørgrav - des@des.no
From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 15:24:15 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3CA0B16A402; Thu, 21 Feb 2008 15:24:15 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net [IPv6:2001:4830:1200:a1::2]) by mx1.freebsd.org (Postfix) with ESMTP id 9E30913C468; Thu, 21 Feb 2008 15:24:14 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.14.1/8.13.8) with ESMTP id m1LFO9Y1012231; Thu, 21 Feb 2008 09:24:09 -0600 (CST) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.14.1/8.13.8/Submit) id m1LFO9EH012230; Thu, 21 Feb 2008 09:24:09 -0600 (CST) (envelope-from brooks) Date: Thu, 21 Feb 2008 09:24:08 -0600 From: Brooks Davis To: Robert Watson Message-ID: <20080221152408.GA12023@lor.one-eyed-alien.net> References: <86odacc04t.fsf@ds4.des.no> <20080219233217.GS27248@funkthat.com> <20080220111157.H44565@fledge.watson.org> <86ablvuzgx.fsf@ds4.des.no> <20080221100156.V52922@fledge.watson.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="YiEDa0DAkWCtVeE4" Content-Disposition: inline In-Reply-To: <20080221100156.V52922@fledge.watson.org> User-Agent: Mutt/1.5.16 (2007-06-09) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (lor.one-eyed-alien.net [127.0.0.1]); Thu, 21 Feb 2008 09:24:10 -0600 (CST) Cc: Dag-Erling Smørgrav , John-Mark
Gurney , arch@freebsd.org Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 15:24:15 -0000 --YiEDa0DAkWCtVeE4 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Feb 21, 2008 at 10:07:08AM +0000, Robert Watson wrote: > On Wed, 20 Feb 2008, Dag-Erling Smørgrav wrote: > >> Robert Watson writes: >> >>> We also support interface renaming... Does newbus mind if you rename the >>> devices in its tree? >> >> I'm not sure whether you're replying to my proposal or to Julian's >> interpretation / extrapolation of it... but I have no intention of >> hooking interfaces into newbus. I just want a sysctl tree for struct >> ifnet like we have a sysctl tree for device_t, to access interface >> parameters which are not easily accessible through ifconfig. > > Hmm. When I look at net/if.c, I don't see renaming support, so perhaps > this was just a proposal I was thinking of and not actual code. In either > case, I think the question stands: in a world where interface renaming is > supported, is your plan to also rename the if.X sysctl tree created for the > interface? Does sysctl have a facility to do this? I think that one way or another it should be possible to reach the sysctl by if_index, since that is the only stable way to access an interface throughout its life. I might actually suggest making that the only way and having if.1.name be available.
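A small userland model of the if_index-keyed layout might look like the following. Everything in it is hypothetical (the iface table, the iface_* helpers and the if.<index>.<leaf> path format); it is not the sysctl(9) API and is not code from the thread. It only illustrates the property described above: renaming bridge0 to foo, as in the ifconfig transcript earlier in the thread, changes the value reported by the name leaf but not the path under which the interface's nodes live. A real tool could translate a user-supplied name into an index with if_nametoindex(3).

```c
#include <stdio.h>
#include <string.h>

#define MAX_IFACES 8   /* hypothetical table size */
#define NAME_LEN   16  /* hypothetical, in the spirit of IFNAMSIZ */

/* Slot i models the interface whose if_index is i; index 0 is never used. */
struct iface {
    int  in_use;
    char name[NAME_LEN];
};

static struct iface iftab[MAX_IFACES];

static int valid_index(int index)
{
    return index > 0 && index < MAX_IFACES;
}

/* Attach an interface at a given index: 0 on success, -1 on error. */
int iface_attach(int index, const char *name)
{
    if (!valid_index(index) || iftab[index].in_use)
        return -1;
    iftab[index].in_use = 1;
    snprintf(iftab[index].name, sizeof(iftab[index].name), "%s", name);
    return 0;
}

/* Renaming changes only the name; the index, and every path built from it, stays put. */
int iface_rename(int index, const char *newname)
{
    if (!valid_index(index) || !iftab[index].in_use)
        return -1;
    snprintf(iftab[index].name, sizeof(iftab[index].name), "%s", newname);
    return 0;
}

/* Current name for an index (what an if.N.name node would report), or NULL. */
const char *iface_name(int index)
{
    return (valid_index(index) && iftab[index].in_use) ? iftab[index].name : NULL;
}

/* Name -> index, or 0 if no attached interface has that name. */
int iface_lookup(const char *name)
{
    for (int i = 1; i < MAX_IFACES; i++)
        if (iftab[i].in_use && strcmp(iftab[i].name, name) == 0)
            return i;
    return 0;
}

/* Build an index-keyed path such as "if.4.name": 0 on success, -1 if it does not fit. */
int iface_oid(int index, const char *leaf, char *buf, size_t len)
{
    int n = snprintf(buf, len, "if.%d.%s", index, leaf);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The point of keying on the index is that a name-keyed tree would have to be moved or re-registered on every rename, while here only one string changes.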
-- Brooks --YiEDa0DAkWCtVeE4 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (FreeBSD) iD8DBQFHvZeYXY6L6fI4GtQRAq+SAKDWoU2VUwWkotuw0L46r8MGqzarVQCeJb+6 eOUZsjvip/UzKDo0m+7EEQI= =T+v0 -----END PGP SIGNATURE----- --YiEDa0DAkWCtVeE4-- From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 16:34:21 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6E88116A400; Thu, 21 Feb 2008 16:34:21 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.netplex.net (mail.netplex.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id 05E7413C458; Thu, 21 Feb 2008 16:34:20 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.netplex.net (8.14.2/8.14.2/NETPLEX) with ESMTP id m1LGY95t000213; Thu, 21 Feb 2008 11:34:09 -0500 (EST) X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.netplex.net) X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-4.0 (mail.netplex.net [204.213.176.10]); Thu, 21 Feb 2008 11:34:09 -0500 (EST) Date: Thu, 21 Feb 2008 11:34:09 -0500 (EST) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Jeff Roberson In-Reply-To: <20080220213253.A920@desktop> Message-ID: References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Robert Watson , David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity 
and cpu sets. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 16:34:21 -0000 On Wed, 20 Feb 2008, Jeff Roberson wrote: > I have the following api working: > > /* > * Parameters for the level argument to getaffinity. > */ > #define CPU_LEVEL_SYS 1 /* All system cpus. */ > #define CPU_LEVEL_AVAIL 2 /* Available cpus for which. */ > #define CPU_LEVEL_WHICH 3 /* Actual mask for which. */ > > /* > * Parameters for the which argument to {get,set}affinity. > */ > #define CPU_WHICH_TID 1 /* Specifies a thread id. */ > #define CPU_WHICH_PID 2 /* Specifies a process id. */ > #define CPU_WHICH_SET 3 /* Specifies a set id. */ > > > Along with a CPU_CLR, CPU_COPY, CPU_ISSET, CPU_SET, CPU_ZERO for manipulating > the sets. > > int getaffinity(int level, int which, int id, int cpusetsize, long *mask); > int setaffinity(int which, int id, int cpusetsize, long *mask); > > The get call has a notion of 'level' which allows us to fetch different > masks. The system set is all processors in the system. The available set is > the set of cpus available to the tid/pid in the 'which' argument. An > application would fetch the avail set and then potentially reduce it. > > The setaffinity call doesn't have a level because the avail/sys sets are > immutable. You can only set things which can be specified by the which > argument. Everything looks pretty good to me, but if you add the 'level' to setaffinity(), you might be able to say "run on any ONE of the CPUs in the cpuset - I don't care which one". 
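A minimal userland sketch of the mask manipulation, and of the "any ONE of the CPUs" idea, follows. It is not taken from Jeff's patch. The cpumask_t layout, the MAXCPU limit of 64 and the mask_* helper names (stand-ins for the CPU_ZERO, CPU_SET, CPU_CLR, CPU_ISSET and CPU_COPY macros mentioned above) are all hypothetical. The proposal's long *mask is modelled with unsigned long words so that the bit shifts stay well defined. mask_pick_one() shows only one possible policy: a cyclic scan starting from a caller-supplied hint, such as the CPU the thread last ran on.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical limit; a real implementation would take this from the kernel. */
#define MAXCPU     64
#define WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS ((MAXCPU + WORD_BITS - 1) / WORD_BITS)

/* Assumed layout: a fixed-size array of words, one bit per CPU. */
typedef struct {
    unsigned long bits[MASK_WORDS];
} cpumask_t;

/* Stand-in for CPU_ZERO: empty the set. */
void mask_zero(cpumask_t *m)
{
    memset(m, 0, sizeof(*m));
}

/* Stand-in for CPU_SET: add one CPU (0 <= cpu < MAXCPU). */
void mask_set(int cpu, cpumask_t *m)
{
    m->bits[cpu / WORD_BITS] |= 1UL << (cpu % WORD_BITS);
}

/* Stand-in for CPU_CLR: remove one CPU. */
void mask_clr(int cpu, cpumask_t *m)
{
    m->bits[cpu / WORD_BITS] &= ~(1UL << (cpu % WORD_BITS));
}

/* Stand-in for CPU_ISSET: 1 if the CPU is in the set, 0 otherwise. */
int mask_isset(int cpu, const cpumask_t *m)
{
    return (int)((m->bits[cpu / WORD_BITS] >> (cpu % WORD_BITS)) & 1UL);
}

/* Stand-in for CPU_COPY. */
void mask_copy(const cpumask_t *src, cpumask_t *dst)
{
    *dst = *src;
}

/*
 * "Run on any ONE of the CPUs in the set": scan cyclically from a hint
 * (0 <= hint < MAXCPU) and return the first member found, or -1 if the
 * set is empty.
 */
int mask_pick_one(const cpumask_t *m, int hint)
{
    for (int i = 0; i < MAXCPU; i++) {
        int cpu = (hint + i) % MAXCPU;
        if (mask_isset(cpu, m))
            return cpu;
    }
    return -1;
}
```

In this model an application would copy the avail mask, clear the CPUs it does not want, and hand the result to setaffinity(). A 'level' argument on setaffinity(), as suggested above, could let the kernel apply a policy like mask_pick_one() itself rather than honouring the whole mask.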
-- DE From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 19:32:22 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B9CA16A40A for ; Thu, 21 Feb 2008 19:32:22 +0000 (UTC) (envelope-from jmg@hydrogen.funkthat.com) Received: from hydrogen.funkthat.com (gate.funkthat.com [69.17.45.168]) by mx1.freebsd.org (Postfix) with ESMTP id 4F79013C4E8 for ; Thu, 21 Feb 2008 19:32:22 +0000 (UTC) (envelope-from jmg@hydrogen.funkthat.com) Received: from hydrogen.funkthat.com (u7jpxubhwyd2j795@localhost.funkthat.com [127.0.0.1]) by hydrogen.funkthat.com (8.13.6/8.13.3) with ESMTP id m1LJWIGC017208; Thu, 21 Feb 2008 11:32:18 -0800 (PST) (envelope-from jmg@hydrogen.funkthat.com) Received: (from jmg@localhost) by hydrogen.funkthat.com (8.13.6/8.13.3/Submit) id m1LJWHJY017205; Thu, 21 Feb 2008 11:32:17 -0800 (PST) (envelope-from jmg) Date: Thu, 21 Feb 2008 11:32:17 -0800 From: John-Mark Gurney To: Dag-Erling =?iso-8859-1?Q?Sm=F8rgrav?= Message-ID: <20080221193217.GF96595@funkthat.com> Mail-Followup-To: Dag-Erling =?iso-8859-1?Q?Sm=F8rgrav?= , arch@freebsd.org References: <86odacc04t.fsf@ds4.des.no> <20080221072410.GC96595@funkthat.com> <86ablua4pk.fsf@ds4.des.no> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <86ablua4pk.fsf@ds4.des.no> User-Agent: Mutt/1.4.2.1i X-Operating-System: FreeBSD 5.4-RELEASE-p6 i386 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (hydrogen.funkthat.com [127.0.0.1]); Thu, 21 Feb 2008 11:32:18 -0800 (PST) Cc: arch@freebsd.org Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: John-Mark Gurney List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 19:32:22 -0000 Dag-Erling Smørgrav wrote this message on Thu, Feb 21, 2008 at 13:12 +0100: > John-Mark Gurney writes: > > I object. I have stated my case. If you don't want to listen to it, > > that's fine. > > Your case was very unclear. In fact, I would say you objected to > something that you made up in your head which was completely different > from what I actually proposed. Perhaps you should re-read my original > email. My case is perfectly clear. We already have dev.* for this, and you want to add a second, confusing, place to put similar/same information... Yes, this is specific for network interfaces, but what makes a network interface special that its configuration can't live in dev.*? You stated that you were fine w/ some items being in dev.* and others in net.if.* for the same device, which is why I objected. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 21:04:57 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8438A16A40F for ; Thu, 21 Feb 2008 21:04:57 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 3D13313C458 for ; Thu, 21 Feb 2008 21:04:57 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id BB8EE207F for ; Thu, 21 Feb 2008 22:04:53 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.3/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id 44A75207E for ; Thu, 21 Feb 2008 22:04:53 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id 2CB47844CC; Thu, 21 Feb 2008 22:04:53 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: arch@freebsd.org References: <86odacc04t.fsf@ds4.des.no> <20080221072410.GC96595@funkthat.com> <86ablua4pk.fsf@ds4.des.no> <20080221193217.GF96595@funkthat.com> Date: Thu, 21 Feb 2008 22:04:53 +0100 In-Reply-To: <20080221193217.GF96595@funkthat.com> (John-Mark Gurney's message of "Thu\, 21 Feb 2008 11\:32\:17 -0800") Message-ID: <868x1eja16.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 21:04:57 -0000 John-Mark Gurney writes: > My case is perfectly clear. 
We already have dev.* for this, and you > want to add a second, confusing, place to put similar/same information... > Yes, this is specific for network interfaces, but what makes a network > interface special that it's configuration can't live in dev.*? You > stated that you were fine w/ some items being in dev.* and others in > net.if.* for the same device, which is why I objected. If you can't tell the difference between a struct ifnet and a device_t, I'm afraid we're going to have to agree to disagree. DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 21:08:14 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9093916A403; Thu, 21 Feb 2008 21:08:14 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (ZIM.MIT.EDU [18.95.3.101]) by mx1.freebsd.org (Postfix) with ESMTP id 450B813C448; Thu, 21 Feb 2008 21:08:14 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.2/8.14.2) with ESMTP id m1LL84vo003331; Thu, 21 Feb 2008 16:08:04 -0500 (EST) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.2/8.14.2/Submit) id m1LL84Ho003330; Thu, 21 Feb 2008 16:08:04 -0500 (EST) (envelope-from das@FreeBSD.ORG) Date: Thu, 21 Feb 2008 16:08:04 -0500 From: David Schultz To: Jeff Roberson Message-ID: <20080221210804.GA3240@zim.MIT.EDU> Mail-Followup-To: Jeff Roberson , David Xu , Daniel Eischen , Robert Watson , Andrew Gallatin , arch@FreeBSD.ORG References: <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline In-Reply-To: <20080220213253.A920@desktop> Cc: Daniel Eischen , arch@FreeBSD.ORG, Robert Watson , David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 21:08:14 -0000 I have no specific comments, but I wanted to point out that the Solaris kernel team put a lot of thought into coming up with a flexible processor binding API for Solaris 10 that meshes well with jails (a.k.a. zones in Solaris). It might be worthwhile to investigate what good ideas they might have had, and to decide if compatibility is worthwhile: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/pset.h There are manpages for this stuff somewhere. pset_create and pset_destroy obviously create and delete processor set definitions. pset_bind binds a particular process / thread / session / jail to a processor set, similar to your CPU_WHICH_* flags, I think, but with more options. 
From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 21:57:14 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 827F416A405; Thu, 21 Feb 2008 21:57:14 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 3854613C4DD; Thu, 21 Feb 2008 21:57:14 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m1LLv9gj004220; Thu, 21 Feb 2008 16:57:11 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Thu, 21 Feb 2008 11:58:23 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: David Schultz In-Reply-To: <20080221210804.GA3240@zim.MIT.EDU> Message-ID: <20080221114911.I920@desktop> References: <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221210804.GA3240@zim.MIT.EDU> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@FreeBSD.ORG, Robert Watson , David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 21:57:14 -0000 On Thu, 21 Feb 2008, David Schultz wrote: > I have no specific comments, but I wanted to point out that the > Solaris kernel team put a lot of thought into coming up with a > flexible processor binding API for Solaris 10 that meshes well > with jails (a.k.a. zones in Solaris). It might be worthwhile to > investigate what good ideas they might have had, and to decide if > compatibility is worthwhile: Well, interestingly enough, I came up with an almost identical design but with fewer features. If someone wanted to add the missing features I'm fine with that. I don't know if we want to, or are free to, copy the API exactly. > > http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/pset.h > > There are manpages for this stuff somewhere. pset_create and > pset_destroy obviously create and delete processor set definitions. > pset_bind binds a particular process / thread / session / jail > to a processor set, similar to your CPU_WHICH_* flags, I think, > but with more options. > I didn't see the header but I read some web articles about it. The only real discrepancy is whether you explicitly destroy them or they disappear when the last processor leaves the set. I could be convinced to do either, but so far I'm leaning towards automatic destruction. Hopefully someone who is involved with jails will work with me to integrate the two. I intend only to implement the infrastructure required to create and modify sets. Someone who is jail-savvy will need to apply the set and the appropriate security restrictions.
Thanks, Jeff From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 22:06:46 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A74F816A400 for ; Thu, 21 Feb 2008 22:06:46 +0000 (UTC) (envelope-from sam@errno.com) Received: from ebb.errno.com (ebb.errno.com [69.12.149.25]) by mx1.freebsd.org (Postfix) with ESMTP id 14BC413C4E1 for ; Thu, 21 Feb 2008 22:06:46 +0000 (UTC) (envelope-from sam@errno.com) Received: from Macintosh-2.local ([10.0.0.196]) (authenticated bits=0) by ebb.errno.com (8.13.6/8.12.6) with ESMTP id m1LLuebe067842 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 21 Feb 2008 13:56:40 -0800 (PST) (envelope-from sam@errno.com) Message-ID: <47BDF398.3060108@errno.com> Date: Thu, 21 Feb 2008 13:56:40 -0800 From: Sam Leffler Organization: Errno Consulting User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: =?ISO-8859-1?Q?Dag-Erling_Sm=F8rgrav?= References: <86odacc04t.fsf@ds4.des.no> <20080221072410.GC96595@funkthat.com> <86ablua4pk.fsf@ds4.des.no> <20080221193217.GF96595@funkthat.com> <868x1eja16.fsf@ds4.des.no> In-Reply-To: <868x1eja16.fsf@ds4.des.no> X-Enigmail-Version: 0.95.6 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-DCC-sonic.net-Metrics: ebb.errno.com; whitelist Cc: arch@freebsd.org Subject: Re: dev.* analogue for interfaces X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 22:06:46 -0000 Dag-Erling Smørgrav wrote: > John-Mark Gurney writes: >> My case is perfectly clear. We already have dev.* for this, and you >> want to add a second, confusing, place to put similar/same information... 
>> Yes, this is specific for network interfaces, but what makes a network >> interface special that it's configuration can't live in dev.*? You >> stated that you were fine w/ some items being in dev.* and others in >> net.if.* for the same device, which is why I objected. > > If you can't tell the difference between a struct ifnet and a device_t, > I'm afraid we're going to have to agree to disagree. > > DES I think you need to experiment with this before you push a proposal. In net80211 I've had parallel net.wlan.X trees that are companions to the dev.* trees, and it's worked out OK, but mostly because there is a clear layering/distinction between the two. I believe the original motivation for this was for software-only devices that don't otherwise have a dev.* entry. I recently handled something like this for the cryptosoft driver by arbitrarily attaching it to nexus, and it worked out very well. I personally would just attach these other devices under net. as that's existing practice, but I'm open to your suggestion.
Sam From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 22:54:01 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D578B16A402; Thu, 21 Feb 2008 22:54:01 +0000 (UTC) (envelope-from ceri@submonkey.net) Received: from shrike.submonkey.net (cpc3-cdif2-0-0-cust64.cdif.cable.ntl.com [81.106.128.65]) by mx1.freebsd.org (Postfix) with ESMTP id 80BE413C45A; Thu, 21 Feb 2008 22:54:01 +0000 (UTC) (envelope-from ceri@submonkey.net) Received: from ceri by shrike.submonkey.net with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1JSK33-000HtM-JV; Thu, 21 Feb 2008 22:37:49 +0000 Date: Thu, 21 Feb 2008 22:37:49 +0000 From: Ceri Davies To: Robert Watson Message-ID: <20080221223749.GJ22033@submonkey.net> References: <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221092011.J52922@fledge.watson.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="k3qmt+ucFURmlhDS" Content-Disposition: inline In-Reply-To: <20080221092011.J52922@fledge.watson.org> X-PGP: finger ceri@FreeBSD.org User-Agent: Mutt/1.5.17 (2007-11-01) Sender: Ceri Davies Cc: Daniel Eischen , David Xu , Andrew Gallatin , arch@freebsd.org Subject: Re: getaffinity/setaffinity and cpu sets. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 22:54:01 -0000 On Thu, Feb 21, 2008 at 09:27:41AM +0000, Robert Watson wrote: > - You don't mention what happens if a process's cpu set changes to preclude a > CPU the process has a thread with affinity for. Online, you suggested > SIGKILL, and I thought maybe a new SIGCPUGONE with a default SIGKILL action > might be a friendlier model. We should see what Solaris and others do here > though. I like the idea that the affinity is a guarantee in userspace > because it means that you can rely on it; I'm OK with the idea that your > thread always runs on the CPUs you have affinity for unless in the > SIGCPUGONE handler :-). If a processor set disappears from under a process on Solaris, the process gets moved to the "default" set (or, in other words, they aren't in a set any more). Ceri -- That must be wonderful! I don't understand it at all.
-- Moliere From owner-freebsd-arch@FreeBSD.ORG Thu Feb 21 23:38:35 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0238B16A408; Thu, 21 Feb 2008 23:38:35 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id B427C13C4E5; Thu, 21 Feb 2008 23:38:34 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m1LNcSwl024554; Thu, 21 Feb 2008 18:38:29 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Thu, 21 Feb 2008 13:39:42 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Ceri Davies In-Reply-To: <20080221223749.GJ22033@submonkey.net> Message-ID: <20080221133804.T920@desktop> References: <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221092011.J52922@fledge.watson.org> <20080221223749.GJ22033@submonkey.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@FreeBSD.org, Robert Watson , David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets.
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2008 23:38:35 -0000 On Thu, 21 Feb 2008, Ceri Davies wrote: > On Thu, Feb 21, 2008 at 09:27:41AM +0000, Robert Watson wrote: > >> - You don't mention what happens if a process's cpu set changes to preclude a >> CPU the process has a thread with affinity for. Online, you suggested >> SIGKILL, and I thought maybe a new SIGCPUGONE with a default SIGKILL action >> might be a friendlier model. We should see what Solaris and others do here >> though. I like the idea that the affinity is a guarantee in userspace >> because it means that you can rely on it; I'm OK with the idea that your >> thread always runs on the CPUs you have affinity for unless in the >> SIGCPUGONE handler :-). > > If a processor set disappears from under a process on Solaris, the > process gets moved to the "default" set (or, in other words, they aren't > in a set any more). Yes, that's ok, but what if the process has requested a specific cpu that it's now no longer allowed to access? The sets are separate from the thread's specific requested binding. If the thread binds to a specific processor within the set and the set disappears, what should we do? What if that process was relying on the binding to access CPU-specific features such as the TSC? Allowing it to migrate could then break the code. Thanks, Jeff > > Ceri > -- > That must be wonderful! I don't understand it at all.
> -- Moliere > From owner-freebsd-arch@FreeBSD.ORG Fri Feb 22 15:53:04 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A5A416A404; Fri, 22 Feb 2008 15:53:04 +0000 (UTC) (envelope-from ceri@submonkey.net) Received: from shrike.submonkey.net (cpc3-cdif2-0-0-cust64.cdif.cable.ntl.com [81.106.128.65]) by mx1.freebsd.org (Postfix) with ESMTP id B48AF13C46E; Fri, 22 Feb 2008 15:53:03 +0000 (UTC) (envelope-from ceri@submonkey.net) Received: from ceri by shrike.submonkey.net with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1JSaCq-000PZp-Qh; Fri, 22 Feb 2008 15:53:00 +0000 Date: Fri, 22 Feb 2008 15:53:00 +0000 From: Ceri Davies To: Jeff Roberson Message-ID: <20080222155300.GA72691@submonkey.net> References: <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221092011.J52922@fledge.watson.org> <20080221223749.GJ22033@submonkey.net> <20080221133804.T920@desktop> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="9amGYk9869ThD9tj" Content-Disposition: inline In-Reply-To: <20080221133804.T920@desktop> X-PGP: finger ceri@FreeBSD.org User-Agent: Mutt/1.5.17 (2007-11-01) Sender: Ceri Davies Cc: Daniel Eischen , arch@FreeBSD.org, Robert Watson , David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Feb 2008 15:53:04 -0000 On Thu, Feb 21, 2008 at 01:39:42PM -1000, Jeff Roberson wrote: > > On Thu, 21 Feb 2008, Ceri Davies wrote: > >> On Thu, Feb 21, 2008 at 09:27:41AM +0000, Robert Watson wrote: >> >>> - You don't mention what happens if a process's cpu set changes to preclude a >>> CPU the process has a thread with affinity for. Online, you suggested >>> SIGKILL, and I thought maybe a new SIGCPUGONE with a default SIGKILL action >>> might be a friendlier model. We should see what Solaris and others do here >>> though. I like the idea that the affinity is a guarantee in userspace >>> because it means that you can rely on it; I'm OK with the idea that your >>> thread always runs on the CPUs you have affinity for unless in the >>> SIGCPUGONE handler :-). >> >> If a processor set disappears from under a process on Solaris, the >> process gets moved to the "default" set (or, in other words, they aren't >> in a set any more). > > Yes, that's ok, but what if the process has requested a specific cpu that > it's now no longer allowed to access? The sets are seperate from the > thread's specific requested binding. If the thread binds to a specific > processor within the set and the set disappears what should we do? What if > that process was relying on the binding to access cpu specific features > such as tsc? Allowing it to migrate could then break the code. OK, I was talking about processor sets; in Solaris, binding to a set (pset_bind()) and binding to a specific processor (processor_bind()) are different operations.
A processor that has LWPs bound to it specifically (with processor_bind()) may not be taken offline or marked as spare unless the operation is forced, in which case the binding is removed. Since this is an administrative choice, it's acceptable. Ceri -- That must be wonderful! I don't understand it at all. -- Moliere From owner-freebsd-arch@FreeBSD.ORG Fri Feb 22 22:33:16 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A076916A400; Fri, 22 Feb 2008 22:33:16 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 67E1E13C461; Fri, 22 Feb 2008 22:33:16 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m1MMWraK094664; Fri, 22 Feb 2008 17:32:54 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Fri, 22 Feb 2008 12:34:13 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Robert Watson In-Reply-To: <20080221092011.J52922@fledge.watson.org> Message-ID: <20080222121253.N920@desktop> References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org>
<20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221092011.J52922@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@FreeBSD.org, David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Feb 2008 22:33:16 -0000 On Thu, 21 Feb 2008, Robert Watson wrote: > On Wed, 20 Feb 2008, Jeff Roberson wrote: > >> I also have a 'cpuset' command which can run a new program with a given cpu >> set, view and modify sets of arbitrary pids. This is all working and I can >> supply patches if anyone is interested. I have to implement 4BSD support >> before I can commit. >> >> I have a proposal for solaris style processor sets which I think is simple >> and sufficient for most cases. It involves the following new syscalls: >> >> int cpuset(void); int setcpuset(pid_t pid, int setid); int getcpuset(pid_t >> pid); >> >> The notion would be that you can create a new numbered cpuset with >> cpuset(). You can modify or inspect its affinity with get/setaffinity above >> and the CPU_WHICH_SET argument. The cpuset exists as long as there are >> members of the set. Sort of like a process group or session. The >> {get,set}cpuset calls can inspect or modify the state. >> >> This set would not be modifiable by user processes or by processes in a >> jail. It would create the restriction that differs between 'avail' and >> 'sys' above. Processors would be able to directly bind to any processor >> within the set. Changing the set would apply to all processes in the set. >> The cpuset would be per-process while the mask is per-thread. Sets >> involvement is inherited on fork(). >> >> In solaris sets can be named and have a more complete management api. 
I'm >> not really interested in implementing all of that but I believe what I have >> outlined here would be subset of this and no code/syscalls would be wasted. >> >> Comments? Objections? I'm fairly pleased with this arrangement now. > > Just to put a few notes from our conversation on IRC in e-mail: > > - I think I'd prefer int cpuset(cpuset_t *set), int getcpuset(pid_t, cpuset_t > *) so that we don't mix up ID's and return values. More recent interfaces > tend to do this, I believe, and it means that the prototype, even if not > the > ABI, remains the same if the set identifier changes in the future. Ok, this is a good suggestion and I did this. This is actually my preferred method as well but most syscalls don't follow this pattern and I was trying to make it look syscallish. > > - You don't mention what happens if a process's cpu set changes to preclude a > CPU the process has a thread with affinity for. Online, you suggested > SIGKILL, and I thought maybe a new SIGCPUGONE with a default SIGKILL action > might be a friendlier model. We should see what Solaris and others do here > though. I like the idea that the affinity is a guarantee in userspace > because it means that you can rely on it; I'm OK with the idea that your > thread always runs on the CPUs you have affinity for unless in the > SIGCPUGONE handler :-). I could also reject changes to the cpuset if they leave a thread with nothing to run on. It might be confusing for the administrator and hard to tell them which thread caused the problem. However, it might be nicer than killing a thread as well. Another option would be to expel the offending thread from the set that is in violation and reparent it to the real system root along with a syslog message or similar. If the administrator addressed the problem with the set he could then reassign the grouping. This is what I would most like comments about. Should we have a force mode? Which of these behaviors sound best to you? 
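The two alternatives Jeff raises here (refuse the change, or expel the offending thread to the system root) can be illustrated with a small user-space model. Everything in it is hypothetical and is not part of the proposed API: one bit per CPU in a uint32_t, a `struct member` standing in for a thread, and EDEADLK as the refusal error are all assumptions. The `force` flag models the "force mode" question: without it the change is rejected, and with it the stranded threads are expelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, CPU n is bit (1u << n). */
typedef uint32_t cpumask;

/* Hypothetical stand-in for a thread that belongs to a cpu set. */
struct member {
    cpumask binding;   /* CPUs this thread explicitly asked for */
    bool in_set;       /* false once expelled to the system root set */
};

/* True if shrinking the set to new_mask leaves this member no CPU to run on. */
static bool stranded(const struct member *m, cpumask new_mask)
{
    return m->in_set && (m->binding & new_mask) == 0;
}

/*
 * Change a set's mask.  Without force, refuse if any member would be
 * stranded (EDEADLK is an assumed error code).  With force, expel each
 * stranded member to the root set first; a kernel would also log this.
 */
static int set_change(cpumask *set_mask, cpumask new_mask,
    struct member *members, size_t n, bool force)
{
    size_t i;

    if (!force) {
        for (i = 0; i < n; i++)
            if (stranded(&members[i], new_mask))
                return EDEADLK;
    }
    for (i = 0; i < n; i++)
        if (stranded(&members[i], new_mask))
            members[i].in_set = false;
    *set_mask = new_mask;
    return 0;
}
```

With a set covering CPUs 0-3, a thread pinned to CPU 3 and another bound to CPUs 0-1, shrinking the set to CPUs 0-1 fails unless forced; when forced, only the CPU 3 thread leaves the set.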
> > - It would be nice to be able to use CPU sets in jail as well, suggesting a > hierarchal model with some sort of tagging so you know what CPU sets were > created in a jail such that you know whether they can be changed in a jail. > While I recognize this makes things a lot more tricky, I think we should > basically be planning more carefully with respect to virtualization when we > add new interfaces, since it's a widely used feature, and the current set > of > "stragglers" unsupported in Jail is growing rather than shrinking. I have implemented a hierarchical model. Each thread has a pointer to the cpuset that it's in. If it makes a local modification via setaffinity() it gets an anonymous cpuset that is a child of the set assigned to the process. This anonymous set will also be inherited across fork/thread creation. In this model presently there are nodes marked as root. To query the 'system' cpus available we walk up from the current node until we find a root. These are the 'system' set. A thread may not break out of its system set. A process may join the root set but it may not modify a root that is a parent. Jails would create a new root. A process outside of the jail can modify the set of processors in the jail but a process within the jail/root may not. The next level down from the root is the assigned set. The root may be an assigned set or this may be a subset of the root. Processes may create sets which are parented back to their root and may include any processors within their root. The mask of the assigned set is returned as 'available' processors. This gives a 1 to 3 level hierarchy. The root, an assigned set, and an anonymous set. Any of these but the root may be omitted. There is no current way for userland to create subsets of assigned sets to permit further nesting. I'm not sure I see value in it right now and it gives the possibility of unbound tree depth. 
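A minimal sketch of the root lookup described above, assuming each set carries a parent pointer and an is_root flag (both hypothetical names): to find the 'system' set, walk up from the current node until a root is reached, then reject any requested mask that names CPUs outside that root. A jail's set is modeled simply as another node marked as root.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, CPU n is bit (1u << n). */
typedef uint32_t cpumask;

/* Hypothetical node in the set hierarchy: root, assigned, or anonymous. */
struct cset {
    struct cset *parent;   /* NULL only for the top-level system set */
    bool is_root;          /* system root, or a jail's root */
    cpumask mask;          /* CPUs this set allows */
};

/* Walk up from s until a node marked as root (the "system" set) is found. */
static struct cset *cset_root(struct cset *s)
{
    while (!s->is_root && s->parent != NULL)
        s = s->parent;
    return s;
}

/* A request may not break out of the caller's system set: every CPU it
 * names must belong to the nearest root, and it must name at least one. */
static bool cset_mask_allowed(struct cset *s, cpumask want)
{
    cpumask sys = cset_root(s)->mask;

    return want != 0 && (want & ~sys) == 0;
}
```

Because the walk stops at the first node marked as root, a jail root hides the real system root from everything below it, which matches the rule that a process inside a jail cannot widen its own root.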
Anonymous sets are immutable because they are shared; changes apply only to the thread/pid in the WHICH argument and not to others which have inherited from it. Anonymous sets have no id and may not be manipulated directly via a setid; you must refer to the process/thread. From the administrator's point of view they don't exist. When a set is modified we walk down the children recursively and apply the new mask. This is done with a global set lock under which all modifications and tree operations are performed. The td_cpuset pointer is protected by thread_lock(), and the set it points to may be read without a lock. This opens the possibility of certain kinds of races, but I believe they are all safe. Hopefully I explained that well enough for people to follow. I realize it's a lot of text, but it's fairly simple bookkeeping code. This is all implemented and I'm debugging now. > > - There's still no way to specify an affinity policy rather than explicit > affinity, but if our CPU set model is sufficiently general, that might be a > vehicle to do that. I.e., cpuset_setpolicy() rather than setting a mask. Yes, I think this is orthogonal and can be addressed separately. I'm not sure, however, how many userland programs are smart enough, or even able, to make determinations about their cache behavior. We should open another discussion once this one is done. > > - In the interests of boring API changes, recent APIs tend to prefix the > method on the object name. Have you thought about cpuset_create(), > cpuset_foo(), etc? That reduces the chances of interfering with > application > namespaces. I think, anyway. :-). Yes, I prefer that as well; as I mentioned, syscalls have tended to favor brevity. I'm fine with changing that trend. > > I need to ponder the proposal a little more, ideally over a hot beverage this > morning, and will follow up if I have further thoughts. Thanks for working > on this, BTW -- affinity is well-overdue for FreeBSD. A little more to ponder now!
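The downward walk Jeff mentions ("we walk down the children recursively and apply the new mask") might look roughly like the following. This sketch assumes that "apply" means intersecting each child's mask with its parent's new mask, which the message does not spell out. It also omits the global set lock that the message says is held during the walk. The first-child/next-sibling links are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, CPU n is bit (1u << n). */
typedef uint32_t cpumask;

/* Hypothetical tree node: first-child / next-sibling links. */
struct cnode {
    cpumask mask;
    struct cnode *child;     /* first child, or NULL */
    struct cnode *sibling;   /* next sibling, or NULL */
};

/* Restrict n and, recursively, its descendants to parent_mask. */
static void cnode_restrict(struct cnode *n, cpumask parent_mask)
{
    struct cnode *c;

    n->mask &= parent_mask;
    for (c = n->child; c != NULL; c = c->sibling)
        cnode_restrict(c, n->mask);
}

/* Set a node's mask and push the change down the tree.  The design in
 * the message performs this under a global set lock, omitted here. */
static void cnode_modify(struct cnode *n, cpumask new_mask)
{
    struct cnode *c;

    n->mask = new_mask;
    for (c = n->child; c != NULL; c = c->sibling)
        cnode_restrict(c, n->mask);
}
```

Intersection can only narrow a child, so growing a parent does not widen its descendants in this model. A child whose mask becomes empty is the stranded-thread case Jeff lists as an open question, and a real implementation would have to apply one of his proposed policies before committing the change.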
Your feedback is much appreciated. I believe the present hierarchical model satisfies the jail requirements of restricting cpus in the jail while still allowing the jail to create sets. The unanswered questions are: 1) What to do about sets that strand threads, options described above. 2) Are people ok with the transient nature of sets? 3) Does anyone want to help with man pages, administrative tools, etc? I have a prototype tool called 'cpuset' that fully exercises the api but is probably ugly. Will post details soon. > > Robert N M Watson > Computer Laboratory > University of Cambridge > From owner-freebsd-arch@FreeBSD.ORG Fri Feb 22 22:44:34 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3562216A402; Fri, 22 Feb 2008 22:44:34 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.netplex.net (mail.netplex.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id EAC9313C45A; Fri, 22 Feb 2008 22:44:33 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.netplex.net (8.14.2/8.14.2/NETPLEX) with ESMTP id m1MMiOns010827; Fri, 22 Feb 2008 17:44:23 -0500 (EST) X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.netplex.net) X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-4.0 (mail.netplex.net [204.213.176.10]); Fri, 22 Feb 2008 17:44:24 -0500 (EST) Date: Fri, 22 Feb 2008 17:44:24 -0500 (EST) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Jeff Roberson In-Reply-To: <20080222121253.N920@desktop> Message-ID: References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> 
<20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221092011.J52922@fledge.watson.org> <20080222121253.N920@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Robert Watson , David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Feb 2008 22:44:34 -0000 On Fri, 22 Feb 2008, Jeff Roberson wrote: > > On Thu, 21 Feb 2008, Robert Watson wrote: > >> On Wed, 20 Feb 2008, Jeff Roberson wrote: >> >>> I also have a 'cpuset' command which can run a new program with a given >>> cpu set, view and modify sets of arbitrary pids. This is all working and >>> I can supply patches if anyone is interested. I have to implement 4BSD >>> support before I can commit. >>> >>> I have a proposal for solaris style processor sets which I think is simple >>> and sufficient for most cases. It involves the following new syscalls: >>> >>> int cpuset(void); int setcpuset(pid_t pid, int setid); int getcpuset(pid_t >>> pid); >>> >>> The notion would be that you can create a new numbered cpuset with >>> cpuset(). You can modify or inspect its affinity with get/setaffinity >>> above and the CPU_WHICH_SET argument. The cpuset exists as long as there >>> are members of the set. Sort of like a process group or session. The >>> {get,set}cpuset calls can inspect or modify the state. >>> >>> This set would not be modifiable by user processes or by processes in a >>> jail. It would create the restriction that differs between 'avail' and >>> 'sys' above. Processors would be able to directly bind to any processor >>> within the set. Changing the set would apply to all processes in the set. 
>>> The cpuset would be per-process while the mask is per-thread. Sets >>> involvement is inherited on fork(). >>> >>> In solaris sets can be named and have a more complete management api. I'm >>> not really interested in implementing all of that but I believe what I >>> have outlined here would be subset of this and no code/syscalls would be >>> wasted. >>> >>> Comments? Objections? I'm fairly pleased with this arrangement now. >> >> Just to put a few notes from our conversation on IRC in e-mail: >> >> - I think I'd prefer int cpuset(cpuset_t *set), int getcpuset(pid_t, >> cpuset_t >> *) so that we don't mix up ID's and return values. More recent interfaces >> tend to do this, I believe, and it means that the prototype, even if not >> the >> ABI, remains the same if the set identifier changes in the future. > > Ok, this is a good suggestion and I did this. This is actually my preferred > method as well but most syscalls don't follow this pattern and I was trying > to make it look syscallish. I would probably use cpuset_create(), cpuset_get(), cpuset_set()... Don't know if you need cpuset_destroy()... 
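For comparison, here is a toy model of Robert's out-parameter style combined with the cpuset_create()/cpuset_get()/cpuset_set() naming Daniel proposes. The `_model` suffix, the cpusetid_model type, and the fixed-size process table are invented for illustration only. The return value carries only 0 or an errno value, and set IDs travel exclusively through arguments, so widening the ID type later would not change what the return value means.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical set identifier type and toy process table. */
typedef int cpusetid_model;
#define NPROC_MODEL 8

static cpusetid_model next_setid = 1;             /* 0 means "default set" */
static cpusetid_model proc_setid[NPROC_MODEL];    /* set of each toy pid */

/* Create a set.  The new ID is returned through *idp; the return
 * value is only ever 0 or an errno value, so IDs and errors never mix. */
static int cpuset_create_model(cpusetid_model *idp)
{
    if (idp == NULL)
        return EINVAL;
    *idp = next_setid++;
    return 0;
}

/* Move process pid into an existing set. */
static int cpuset_set_model(int pid, cpusetid_model id)
{
    if (pid < 0 || pid >= NPROC_MODEL)
        return ESRCH;
    if (id <= 0 || id >= next_setid)
        return EINVAL;
    proc_setid[pid] = id;
    return 0;
}

/* Report which set process pid belongs to through *idp. */
static int cpuset_get_model(int pid, cpusetid_model *idp)
{
    if (pid < 0 || pid >= NPROC_MODEL)
        return ESRCH;
    if (idp == NULL)
        return EINVAL;
    *idp = proc_setid[pid];
    return 0;
}
```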
-- DE From owner-freebsd-arch@FreeBSD.ORG Fri Feb 22 23:08:44 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3EDF316A402; Fri, 22 Feb 2008 23:08:44 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 1BCC813C45E; Fri, 22 Feb 2008 23:08:44 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m1MN8ebp002439; Fri, 22 Feb 2008 18:08:41 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Fri, 22 Feb 2008 13:10:00 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Daniel Eischen In-Reply-To: Message-ID: <20080222130701.U920@desktop> References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221092011.J52922@fledge.watson.org> <20080222121253.N920@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Robert Watson , David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Feb 2008 23:08:44 -0000 On Fri, 22 Feb 2008, Daniel Eischen wrote: > On Fri, 22 Feb 2008, Jeff Roberson wrote: > >> >> On Thu, 21 Feb 2008, Robert Watson wrote: >> >>> On Wed, 20 Feb 2008, Jeff Roberson wrote: >>> >>>> I also have a 'cpuset' command which can run a new program with a given >>>> cpu set, view and modify sets of arbitrary pids. This is all working and >>>> I can supply patches if anyone is interested. I have to implement 4BSD >>>> support before I can commit. >>>> >>>> I have a proposal for solaris style processor sets which I think is >>>> simple and sufficient for most cases. It involves the following new >>>> syscalls: >>>> >>>> int cpuset(void); int setcpuset(pid_t pid, int setid); int >>>> getcpuset(pid_t pid); >>>> >>>> The notion would be that you can create a new numbered cpuset with >>>> cpuset(). You can modify or inspect its affinity with get/setaffinity >>>> above and the CPU_WHICH_SET argument. The cpuset exists as long as there >>>> are members of the set. Sort of like a process group or session. The >>>> {get,set}cpuset calls can inspect or modify the state. >>>> >>>> This set would not be modifiable by user processes or by processes in a >>>> jail. It would create the restriction that differs between 'avail' and >>>> 'sys' above. Processors would be able to directly bind to any processor >>>> within the set. Changing the set would apply to all processes in the set. >>>> The cpuset would be per-process while the mask is per-thread. Sets >>>> involvement is inherited on fork(). >>>> >>>> In solaris sets can be named and have a more complete management api. 
>>>> I'm not really interested in implementing all of that but I believe what >>>> I have outlined here would be subset of this and no code/syscalls would >>>> be wasted. >>>> >>>> Comments? Objections? I'm fairly pleased with this arrangement now. >>> >>> Just to put a few notes from our conversation on IRC in e-mail: >>> >>> - I think I'd prefer int cpuset(cpuset_t *set), int getcpuset(pid_t, >>> cpuset_t >>> *) so that we don't mix up ID's and return values. More recent >>> interfaces >>> tend to do this, I believe, and it means that the prototype, even if not >>> the >>> ABI, remains the same if the set identifier changes in the future. >> >> Ok, this is a good suggestion and I did this. This is actually my >> preferred method as well but most syscalls don't follow this pattern and I >> was trying to make it look syscallish. > > I would probably use cpuset_create(), cpuset_get(), cpuset_set()... > Don't know if you need cpuset_destroy()... In the solaris model sets are explicitly created and destroyed. In my model they are transient and only exist as long as they have members. So I don't have a destroy. fwiw it looks like linux also does a persistent thing that you modify via a filesystem. If we later want to add some attributes which we'd like to persist it'd be as simple as adding a destroy call and adding an extra ref on create. We should decide that before 8.0 however when the api becomes more entrenched. 
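The transient lifetime described above behaves like reference counting: each member process holds a reference, a child inherits one on fork(), and the set disappears when the last member leaves. Below is a minimal user-space sketch assuming exactly that. struct cpuset_model and the cpuset_model_*() functions are hypothetical names, not the proposed syscalls, and the 64-bit mask is only a stand-in for cpuset_t. The persistent flag models the "extra ref on create" plus explicit destroy call mentioned as a possible later change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a numbered CPU set (not the real cpuset_t). */
struct cpuset_model {
	int      id;         /* numeric set id handed to userland */
	uint64_t mask;       /* one bit per CPU (assumes <= 64 CPUs) */
	int      refcount;   /* one per member, +1 if persistent */
	bool     persistent; /* still holds the extra "create" reference */
};

static int next_setid = 1;
static int live_sets;    /* number of sets currently allocated */

/* Create a set; the caller becomes its first member. */
static struct cpuset_model *
cpuset_model_create(uint64_t mask, bool persistent)
{
	struct cpuset_model *set = malloc(sizeof(*set));

	if (set == NULL)
		return NULL;
	set->id = next_setid++;
	set->mask = mask;
	set->persistent = persistent;
	set->refcount = persistent ? 2 : 1;
	live_sets++;
	return set;
}

/* A new member joins, e.g. a child inheriting the set on fork(). */
static void
cpuset_model_ref(struct cpuset_model *set)
{
	set->refcount++;
}

/*
 * A member leaves (exits or joins another set).
 * Returns true if that was the last reference and the set was freed.
 */
static bool
cpuset_model_rele(struct cpuset_model *set)
{
	assert(set->refcount > 0);
	if (--set->refcount > 0)
		return false;
	free(set);
	live_sets--;
	return true;
}

/* Persistent sets only: drop the reference taken at creation. */
static bool
cpuset_model_destroy(struct cpuset_model *set)
{
	assert(set->persistent);
	set->persistent = false;
	return cpuset_model_rele(set);
}
```

With persistent set to false, a set created by a process that then forks is freed only after both processes have released it, much like a process group. With persistent set to true, the set outlives its last member until cpuset_model_destroy() drops the creation reference, which is closer to the Solaris model.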
> > -- > DE > From owner-freebsd-arch@FreeBSD.ORG Fri Feb 22 23:13:07 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9046E16A407; Fri, 22 Feb 2008 23:13:07 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net [IPv6:2001:4830:1200:a1::2]) by mx1.freebsd.org (Postfix) with ESMTP id EB87D13C468; Fri, 22 Feb 2008 23:13:06 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.14.1/8.13.8) with ESMTP id m1MNCkOI029137; Fri, 22 Feb 2008 17:12:46 -0600 (CST) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.14.1/8.13.8/Submit) id m1MNCkBh029136; Fri, 22 Feb 2008 17:12:46 -0600 (CST) (envelope-from brooks) Date: Fri, 22 Feb 2008 17:12:46 -0600 From: Brooks Davis To: Jeff Roberson Message-ID: <20080222231245.GA28788@lor.one-eyed-alien.net> References: <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221092011.J52922@fledge.watson.org> <20080222121253.N920@desktop> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="a8Wt8u1KmwUX3Y2C" Content-Disposition: inline In-Reply-To: <20080222121253.N920@desktop> User-Agent: Mutt/1.5.16 (2007-06-09) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (lor.one-eyed-alien.net [127.0.0.1]); Fri, 22 Feb 2008 17:12:46 -0600 (CST) Cc: Daniel Eischen , arch@freebsd.org, Robert Watson , David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Feb 2008 23:13:07 -0000 --a8Wt8u1KmwUX3Y2C Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Feb 22, 2008 at 12:34:13PM -1000, Jeff Roberson wrote: > > On Thu, 21 Feb 2008, Robert Watson wrote: > >> On Wed, 20 Feb 2008, Jeff Roberson wrote: >> >>> I also have a 'cpuset' command which can run a new program with a given >>> cpu set, view and modify sets of arbitrary pids. This is all working and >>> I can supply patches if anyone is interested. I have to implement 4BSD >>> support before I can commit. >>> I have a proposal for solaris style processor sets which I think is >>> simple and sufficient for most cases. It involves the following new >>> syscalls: >>> int cpuset(void); int setcpuset(pid_t pid, int setid); int >>> getcpuset(pid_t pid); >>> The notion would be that you can create a new numbered cpuset with >>> cpuset(). You can modify or inspect its affinity with get/setaffinity >>> above and the CPU_WHICH_SET argument. The cpuset exists as long as there >>> are members of the set. Sort of like a process group or session. The >>> {get,set}cpuset calls can inspect or modify the state. >>> This set would not be modifiable by user processes or by processes in a >>> jail. It would create the restriction that differs between 'avail' and >>> 'sys' above. Processors would be able to directly bind to any processor >>> within the set. Changing the set would apply to all processes in the set. >>> The cpuset would be per-process while the mask is per-thread. Sets >>> involvement is inherited on fork(). >>> In solaris sets can be named and have a more complete management api.
>>> I'm not really interested in implementing all of that but I believe what >>> I have outlined here would be subset of this and no code/syscalls would >>> be wasted. >>> Comments? Objections? I'm fairly pleased with this arrangement now. >> >> Just to put a few notes from our conversation on IRC in e-mail: >> >> - I think I'd prefer int cpuset(cpuset_t *set), int getcpuset(pid_t, >> cpuset_t >> *) so that we don't mix up ID's and return values. More recent >> interfaces >> tend to do this, I believe, and it means that the prototype, even if not >> the >> ABI, remains the same if the set identifier changes in the future. > > Ok, this is a good suggestion and I did this. This is actually my > preferred method as well but most syscalls don't follow this pattern and I > was trying to make it look syscallish. > >> - You don't mention what happens if a process's cpu set changes to >> preclude a >> CPU the process has a thread with affinity for. Online, you suggested >> SIGKILL, and I thought maybe a new SIGCPUGONE with a default SIGKILL >> action >> might be a friendlier model. We should see what Solaris and others do >> here >> though. I like the idea that the affinity is a guarantee in userspace >> because it means that you can rely on it; I'm OK with the idea that your >> thread always runs on the CPUs you have affinity for unless in the >> SIGCPUGONE handler :-). > > I could also reject changes to the cpuset if they leave a thread with > nothing to run on. It might be confusing for the administrator and hard to > tell them which thread caused the problem. However, it might be nicer than > killing a thread as well. > > Another option would be to expel the offending thread from the set that is > in violation and reparent it to the real system root along with a syslog > message or similar.
If the administrator addressed the problem with the > set he could then reassign the grouping. > > This is what I would most like comments about. Should we have a force > mode? Which of these behaviors sound best to you? It seems to me that refusing by default and reparenting when forced sounds right. There might also be some value in adding the ability to signal all processes/threads bound to a cpu set so you can kill them if that's what you want to do. >> - It would be nice to be able to use CPU sets in jail as well, suggesting >> a >> hierarchal model with some sort of tagging so you know what CPU sets were >> created in a jail such that you know whether they can be changed in a >> jail. >> While I recognize this makes things a lot more tricky, I think we should >> basically be planning more carefully with respect to virtualization when >> we >> add new interfaces, since it's a widely used feature, and the current set >> of >> "stragglers" unsupported in Jail is growing rather than shrinking. > > I have implemented a hierarchical model. Each thread has a pointer to the > cpuset that it's in. If it makes a local modification via setaffinity() it > gets an anonymous cpuset that is a child of the set assigned to the > process. This anonymous set will also be inherited across fork/thread > creation. > > In this model presently there are nodes marked as root. To query the > 'system' cpus available we walk up from the current node until we find a > root. These are the 'system' set. A thread may not break out of its > system set. A process may join the root set but it may not modify a root > that is a parent. Jails would create a new root. A process outside of the > jail can modify the set of processors in the jail but a process within the > jail/root may not. > > The next level down from the root is the assigned set.
The root may be an > assigned set or this may be a subset of the root. Processes may create > sets which are parented back to their root and may include any processors > within their root. The mask of the assigned set is returned as 'available' > processors. > > This gives a 1 to 3 level hierarchy. The root, an assigned set, and an > anonymous set. Any of these but the root may be omitted. There is no > current way for userland to create subsets of assigned sets to permit > further nesting. I'm not sure I see value in it right now and it gives the > possibility of unbound tree depth. > > Anonymous sets are immutable as they are shared and changes only apply to > the thread/pid in the WHICH argument and not others which have inherited > from it. Anonymous sets have no id and may not be specifically manipulated > via a setid. You must refer to the process/thread. From the > administration point of view they don't exist. > > When a set is modified we walk down the children recursively and apply the > new mask. This is done with a global set lock under which all > modifications and tree operations are performed. The td_cpuset pointer is > protected under the thread_lock() and may read the set without a lock. This > gives the possibility for certain kinds of races but I believe they are all > safe. > > Hopefully I explained that well enough for people to follow. I realize > it's a lot of text but it's fairly simple book keeping code. This is all > implemented and I'm debugging now. One place I'd like to implement CPU affinity is in the Sun Grid Engine execution daemon. I think an anonymous set would not be sufficient there because the model allows new tasks to be started on a particular node at any time during a parallel job. I'd have to do some more digging in the code to be entirely certain.
I think the fewer limits we place on the hierarchy, the better off we'll be unless there are compelling complexity reasons to avoid them. >> - There's still no way to specify an affinity policy rather than explicit >> affinity, but if our CPU set model is sufficiently general, that might be >> a >> vehicle to do that. I.e., cpuset_setpolicy() rather than setting a mask. > > Yes, I think this is orthogonal and can be addressed seperately. I'm not > sure how many userland programs are smart enough or even capable of making > determinations about their cache behavior however. We should open another > discussion once this one is done. > >> >> - In the interests of boring API changes, recent APIs tend to prefix the >> method on the object name. Have you thought about cpuset_create(), >> cpuset_foo(), etc? That reduces the chances of interfering with >> application >> namespaces. I think, anyway. :-). > > Yes, I prefer that as well, as I mentioned syscalls tended to favor > brevity. I'm fine with changing that trend. > >> >> I need to ponder the proposal a little more, ideally over a hot beverage >> this morning, and will follow up if I have further thoughts. Thanks for >> working on this, BTW -- affinity is well-overdue for FreeBSD. > > A little more to ponder now! Your feedback is much appreciated. > > I believe the present hierarchical model satisfies the jail requirements of > restricting cpus in the jail while still allowing the jail to create sets. > > The unanswered questions are: > > 1) What to do about sets that strand threads, options described above. > 2) Are people ok with the transient nature of sets? > 3) Does anyone want to help with man pages, administrative tools, etc? I > have a prototype tool called 'cpuset' that fully exercises the api but is > probably ugly. Will post details soon. I could help with some of this as it furthers a funded project at work.
-- Brooks --a8Wt8u1KmwUX3Y2C Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (FreeBSD) iD8DBQFHv1btXY6L6fI4GtQRAnnGAJ9z3R/j+8/TrqOni6YsWrPyPFWA9gCgxfNK 7Dm2dW5L4wJDeLucFO3x2ME= =MJzF -----END PGP SIGNATURE----- --a8Wt8u1KmwUX3Y2C-- From owner-freebsd-arch@FreeBSD.ORG Fri Feb 22 23:51:41 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9D9FB16A400; Fri, 22 Feb 2008 23:51:41 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 6552213C447; Fri, 22 Feb 2008 23:51:41 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m1MNpX1C010800; Fri, 22 Feb 2008 18:51:38 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Fri, 22 Feb 2008 13:52:54 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Brooks Davis In-Reply-To: <20080222231245.GA28788@lor.one-eyed-alien.net> Message-ID: <20080222134923.M920@desktop> References: <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221092011.J52922@fledge.watson.org> <20080222121253.N920@desktop> <20080222231245.GA28788@lor.one-eyed-alien.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@freebsd.org, Robert Watson , David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Feb 2008 23:51:41 -0000 On Fri, 22 Feb 2008, Brooks Davis wrote: > On Fri, Feb 22, 2008 at 12:34:13PM -1000, Jeff Roberson wrote: >> >> On Thu, 21 Feb 2008, Robert Watson wrote: >> >>> On Wed, 20 Feb 2008, Jeff Roberson wrote: >>> >>>> I also have a 'cpuset' command which can run a new program with a given >>>> cpu set, view and modify sets of arbitrary pids. This is all working and >>>> I can supply patches if anyone is interested. I have to implement 4BSD >>>> support before I can commit. >>>> I have a proposal for solaris style processor sets which I think is >>>> simple and sufficient for most cases. It involves the following new >>>> syscalls: >>>> int cpuset(void); int setcpuset(pid_t pid, int setid); int >>>> getcpuset(pid_t pid); >>>> The notion would be that you can create a new numbered cpuset with >>>> cpuset(). You can modify or inspect its affinity with get/setaffinity >>>> above and the CPU_WHICH_SET argument. The cpuset exists as long as there >>>> are members of the set. Sort of like a process group or session. The >>>> {get,set}cpuset calls can inspect or modify the state. >>>> This set would not be modifiable by user processes or by processes in a >>>> jail. It would create the restriction that differs between 'avail' and >>>> 'sys' above. Processors would be able to directly bind to any processor >>>> within the set. Changing the set would apply to all processes in the set. >>>> The cpuset would be per-process while the mask is per-thread. Sets >>>> involvement is inherited on fork(). >>>> In solaris sets can be named and have a more complete management api. 
>>>> I'm not really interested in implementing all of that but I believe what >>>> I have outlined here would be subset of this and no code/syscalls would >>>> be wasted. >>>> Comments? Objections? I'm fairly pleased with this arrangement now. >>> >>> Just to put a few notes from our conversation on IRC in e-mail: >>> >>> - I think I'd prefer int cpuset(cpuset_t *set), int getcpuset(pid_t, >>> cpuset_t >>> *) so that we don't mix up ID's and return values. More recent >>> interfaces >>> tend to do this, I believe, and it means that the prototype, even if not >>> the >>> ABI, remains the same if the set identifier changes in the future. >> >> Ok, this is a good suggestion and I did this. This is actually my >> preferred method as well but most syscalls don't follow this pattern and I >> was trying to make it look syscallish. >> >>> - You don't mention what happens if a process's cpu set changes to >>> preclude a >>> CPU the process has a thread with affinity for. Online, you suggested >>> SIGKILL, and I thought maybe a new SIGCPUGONE with a default SIGKILL >>> action >>> might be a friendlier model. We should see what Solaris and others do >>> here >>> though. I like the idea that the affinity is a guarantee in userspace >>> because it means that you can rely on it; I'm OK with the idea that your >>> thread always runs on the CPUs you have affinity for unless in the >>> SIGCPUGONE handler :-). >> >> I could also reject changes to the cpuset if they leave a thread with >> nothing to run on. It might be confusing for the administrator and hard to >> tell them which thread caused the problem. However, it might be nicer than >> killing a thread as well. >> >> Another option would be to expel the offending thread from the set that is >> in violation and reparent it to the real system root along with a syslog >> message or similar. If the administrator addressed the problem with the >> set he could then reassign the grouping. 
>> >> This is what I would most like comments about. Should we have a force >> mode? Which of these behaviors sound best to you? > > It seems to me that refusing by default and reparenting when forced sound righ > to me. There migth also be some value in adding the ability to signal all > processes/threads bound to a cpu set so you can kill them if that's what you > want to do. This is where I'm leaning as well: the refuse/force approach. A cpuset_signal() would have to walk all processes to determine which processes belong to that set, however. There are no back pointers between threads and sets. Still, that's not too terrible given that it would be very infrequent. > >>> - It would be nice to be able to use CPU sets in jail as well, suggesting >>> a >>> hierarchal model with some sort of tagging so you know what CPU sets were >>> created in a jail such that you know whether they can be changed in a >>> jail. >>> While I recognize this makes things a lot more tricky, I think we should >>> basically be planning more carefully with respect to virtualization when >>> we >>> add new interfaces, since it's a widely used feature, and the current set >>> of >>> "stragglers" unsupported in Jail is growing rather than shrinking. >> >> I have implemented a hierarchical model. Each thread has a pointer to the >> cpuset that it's in. If it makes a local modification via setaffinity() it >> gets an anonymous cpuset that is a child of the set assigned to the >> process. This anonymous set will also be inherited across fork/thread >> creation. >> >> In this model presently there are nodes marked as root. To query the >> 'system' cpus available we walk up from the current node until we find a >> root. These are the 'system' set. A thread may not break out of its >> system set. A process may join the root set but it may not modify a root >> that is a parent. Jails would create a new root.
A process outside of the >> jail can modify the set of processors in the jail but a process within the >> jail/root may not. >> >> The next level down from the root is the assigned set. The root may be an >> assigned set or this may be a subset of the root. Processes may create >> sets which are parented back to their root and may include any processors >> within their root. The mask of the assigned set is returned as 'available' >> processors. >> >> This gives a 1 to 3 level hierarchy. The root, an assigned set, and an >> anonymous set. Any of these but the root may be omitted. There is no >> current way for userland to create subsets of assigned sets to permit >> further nesting. I'm not sure I see value in it right now and it gives the >> possibility of unbound tree depth. >> >> Anonymous sets are immutable as they are shared and changes only apply to >> the thread/pid in the WHICH argument and not others which have inherited >> from it. Anonymous sets have no id and may not be specifically manipulated >> via a setid. You must refer to the process/thread. From the >> administration point of view they don't exist. >> >> When a set is modified we walk down the children recursively and apply the >> new mask. This is done with a global set lock under which all >> modifications and tree operations are performed. The td_cpuset pointer is >> protected under the thread_lock() and may read the set without a lock. This >> gives the possibility for certain kinds of races but I believe they are all >> safe. >> >> Hopefully I explained that well enough for people to follow. I realize >> it's a lot of text but it's fairly simple book keeping code. This is all >> implemented and I'm debugging now. > > One place I'd like to implement CPU affinity is in the Sun Grid Engine > execution daemon. I think anonymous set would not be sufficent there > because the model allows new tasks to be started on a particular node at > any time during a parallel job. 
I'd have to do some more digging in the > code to be entierly certain. I think the less limits we place on the > hierarchy, the better off we'll be unless there are compeling complexity > reasons to avoid them. With the anonymous set you can bind any thread to any cpu that is visible to it. How would this not work? > >>> - There's still no way to specify an affinity policy rather than explicit >>> affinity, but if our CPU set model is sufficiently general, that might be >>> a >>> vehicle to do that. I.e., cpuset_setpolicy() rather than setting a mask. >> >> Yes, I think this is orthogonal and can be addressed seperately. I'm not >> sure how many userland programs are smart enough or even capable of making >> determinations about their cache behavior however. We should open another >> discussion once this one is done. >> >>> >>> - In the interests of boring API changes, recent APIs tend to prefix the >>> method on the object name. Have you thought about cpuset_create(), >>> cpuset_foo(), etc? That reduces the chances of interfering with >>> application >>> namespaces. I think, anyway. :-). >> >> Yes, I prefer that as well, as I mentioned syscalls tended to favor >> brevity. I'm fine with changing that trend. >> >>> >>> I need to ponder the proposal a little more, ideally over a hot beverage >>> this morning, and will follow up if I have further thoughts. Thanks for >>> working on this, BTW -- affinity is well-overdue for FreeBSD. >> >> A little more to ponder now! Your feedback is much appreciated. >> >> I believe the present hierarchical model satisfies the jail requirements of >> restricting cpus in the jail while still allowing the jail to create sets. >> >> The unanswered questions are: >> >> 1) What to do about sets that strand threads, options described above. >> 2) Are people ok with the transient nature of sets? >> 3) Does anyone want to help with man pages, administrative tools, etc? 
I >> have a prototype tool called 'cpuset' that fully exercises the api but is >> probably ugly. Will post details soon. > > I could help with some of this as it furthers a funded project at work. I will provide patches soon. It would be great to have a developer with a user's perspective to look at some of the details and especially the administration side of things. I think someone else has offered to help with man pages but I need to double check. Thanks, Jeff > > -- Brooks > From owner-freebsd-arch@FreeBSD.ORG Sat Feb 23 19:40:50 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4524516A407 for ; Sat, 23 Feb 2008 19:40:50 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net [IPv6:2001:4830:1200:a1::2]) by mx1.freebsd.org (Postfix) with ESMTP id 9A7FD13C467 for ; Sat, 23 Feb 2008 19:40:49 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.14.1/8.13.8) with ESMTP id m1NJem4v038915; Sat, 23 Feb 2008 13:40:48 -0600 (CST) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.14.1/8.13.8/Submit) id m1NJembj038914; Sat, 23 Feb 2008 13:40:48 -0600 (CST) (envelope-from brooks) Date: Sat, 23 Feb 2008 13:40:47 -0600 From: Brooks Davis To: Jeff Roberson Message-ID: <20080223194047.GB38485@lor.one-eyed-alien.net> References: <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221092011.J52922@fledge.watson.org> <20080222121253.N920@desktop> <20080222231245.GA28788@lor.one-eyed-alien.net> <20080222134923.M920@desktop> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1;
protocol="application/pgp-signature"; boundary="rS8CxjVDS/+yyDmU" Content-Disposition: inline In-Reply-To: <20080222134923.M920@desktop> User-Agent: Mutt/1.5.16 (2007-06-09) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (lor.one-eyed-alien.net [127.0.0.1]); Sat, 23 Feb 2008 13:40:48 -0600 (CST) Cc: Brooks Davis , Andrew Gallatin , Daniel Eischen , arch@freebsd.org, Robert Watson , David Xu Subject: Re: getaffinity/setaffinity and cpu sets. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Feb 2008 19:40:50 -0000 --rS8CxjVDS/+yyDmU Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Feb 22, 2008 at 01:52:54PM -1000, Jeff Roberson wrote: > On Fri, 22 Feb 2008, Brooks Davis wrote: > >> On Fri, Feb 22, 2008 at 12:34:13PM -1000, Jeff Roberson wrote: >>> >>> On Thu, 21 Feb 2008, Robert Watson wrote: >>> >>>> On Wed, 20 Feb 2008, Jeff Roberson wrote: >>>> - It would be nice to be able to use CPU sets in jail as well, >>>> suggesting >>>> a >>>> hierarchal model with some sort of tagging so you know what CPU sets >>>> were >>>> created in a jail such that you know whether they can be changed in a >>>> jail. >>>> While I recognize this makes things a lot more tricky, I think we >>>> should >>>> basically be planning more carefully with respect to virtualization >>>> when >>>> we >>>> add new interfaces, since it's a widely used feature, and the current >>>> set >>>> of >>>> "stragglers" unsupported in Jail is growing rather than shrinking. >>> >>> I have implemented a hierarchical model. Each thread has a pointer to >>> the >>> cpuset that it's in.
If it makes a local modification via setaffinity() >>> it >>> gets an anonymous cpuset that is a child of the set assigned to the >>> process. This anonymous set will also be inherited across fork/thread >>> creation. >>> >>> In this model presently there are nodes marked as root. To query the >>> 'system' cpus available we walk up from the current node until we find a >>> root. These are the 'system' set. A thread may not break out of its >>> system set. A process may join the root set but it may not modify a root >>> that is a parent. Jails would create a new root. A process outside of >>> the >>> jail can modify the set of processors in the jail but a process within >>> the >>> jail/root may not. >>> >>> The next level down from the root is the assigned set. The root may be >>> an >>> assigned set or this may be a subset of the root. Processes may create >>> sets which are parented back to their root and may include any processors >>> within their root. The mask of the assigned set is returned as >>> 'available' >>> processors. >>> >>> This gives a 1 to 3 level hierarchy. The root, an assigned set, and an >>> anonymous set. Any of these but the root may be omitted. There is no >>> current way for userland to create subsets of assigned sets to permit >>> further nesting. I'm not sure I see value in it right now and it gives >>> the >>> possibility of unbound tree depth. >>> >>> Anonymous sets are immutable as they are shared and changes only apply to >>> the thread/pid in the WHICH argument and not others which have inherited >>> from it. Anonymous sets have no id and may not be specifically >>> manipulated >>> via a setid. You must refer to the process/thread. From the >>> administration point of view they don't exist. >>> >>> When a set is modified we walk down the children recursively and apply >>> the >>> new mask.
This is done with a global set lock under which all >>> modifications and tree operations are performed. The td_cpuset pointer >>> is >>> protected under the thread_lock() and may read the set without a lock. >>> This >>> gives the possibility for certain kinds of races but I believe they are >>> all >>> safe. >>> >>> Hopefully I explained that well enough for people to follow. I realize >>> it's a lot of text but it's fairly simple book keeping code. This is all >>> implemented and I'm debugging now. >> >> One place I'd like to implement CPU affinity is in the Sun Grid Engine >> execution daemon. I think anonymous set would not be sufficent there >> because the model allows new tasks to be started on a particular node at >> any time during a parallel job. I'd have to do some more digging in the >> code to be entierly certain. I think the less limits we place on the >> hierarchy, the better off we'll be unless there are compeling complexity >> reasons to avoid them. > > With the anonymous set you can bind any thread to any cpu that is visible > to it. How would this not work? I'm still trying to wrap my head around the anonymous sets. Is the idea that once you are in an anonymous set, you can't expand it, or can you expand out as far as the assigned set? I'd like parallel jobs to be allocated a set of cpus that they can't change, but still be able to make their own decisions about thread affinity if they desire (for example, OpenMPI has some support for this so processes stay put and in theory benefit from positive cache effects). If that's feasible in this model, I'm happy with it. I think we should keep in mind that these SGE execution daemons might be sitting inside jails. ;-) >>>> - There's still no way to specify an affinity policy rather than >>>> explicit >>>> affinity, but if our CPU set model is sufficiently general, that might >>>> be >>>> a >>>> vehicle to do that.
I.e., cpuset_setpolicy() rather than setting a= =20 >>>> mask. >>>=20 >>> Yes, I think this is orthogonal and can be addressed seperately. I'm n= ot >>> sure how many userland programs are smart enough or even capable of=20 >>> making >>> determinations about their cache behavior however. We should open=20 >>> another >>> discussion once this one is done. >>>=20 >>>>=20 >>>> - In the interests of boring API changes, recent APIs tend to prefix t= he >>>> method on the object name. Have you thought about cpuset_create(), >>>> cpuset_foo(), etc? That reduces the chances of interfering with >>>> application >>>> namespaces. I think, anyway. :-). >>>=20 >>> Yes, I prefer that as well, as I mentioned syscalls tended to favor >>> brevity. I'm fine with changing that trend. >>>=20 >>>>=20 >>>> I need to ponder the proposal a little more, ideally over a hot bevera= ge >>>> this morning, and will follow up if I have further thoughts. Thanks f= or >>>> working on this, BTW -- affinity is well-overdue for FreeBSD. >>>=20 >>> A little more to ponder now! Your feedback is much appreciated. >>>=20 >>> I believe the present hierarchical model satisfies the jail requirement= s=20 >>> of >>> restricting cpus in the jail while still allowing the jail to create=20 >>> sets. >>>=20 >>> The unanswered questions are: >>>=20 >>> 1) What to do about sets that strand threads, options described above. >>> 2) Are people ok with the transient nature of sets? >>> 3) Does anyone want to help with man pages, administrative tools, etc?= =20 >>> I >>> have a prototype tool called 'cpuset' that fully exercises the api but = is >>> probably ugly. Will post details soon. >>=20 >> I could help with some of this as it furthers a funded project at work. >=20 > I will provide patches soon. It would be great to have a developer with = a=20 > users perspective to look at some of the details and especially the=20 > administration side of things. 
I think someone else has offered to help= =20 > with man pages but I need to double check. Cool. If you can get some basics out by late Sunday afternoon (CST) I should be able to look at it and think about it on the plane Monday. -- Brooks --rS8CxjVDS/+yyDmU Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (FreeBSD) iD8DBQFHwHa/XY6L6fI4GtQRAqmjAJ48y/n2UVTEOA723K6tYv1RtK112gCfSvYK aArGS4pjj474J94hq+iskLA= =CIob -----END PGP SIGNATURE----- --rS8CxjVDS/+yyDmU-- From owner-freebsd-arch@FreeBSD.ORG Sat Feb 23 21:20:23 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C34D416A400; Sat, 23 Feb 2008 21:20:23 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 934E613C458; Sat, 23 Feb 2008 21:20:23 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com [24.94.75.93]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id m1NLK8gC091597; Sat, 23 Feb 2008 16:20:09 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Sat, 23 Feb 2008 11:21:33 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Brooks Davis In-Reply-To: <20080223194047.GB38485@lor.one-eyed-alien.net> Message-ID: <20080223111659.K920@desktop> References: <20080220101348.D44565@fledge.watson.org> <20080220005030.Y920@desktop> <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221092011.J52922@fledge.watson.org> <20080222121253.N920@desktop> <20080222231245.GA28788@lor.one-eyed-alien.net> <20080222134923.M920@desktop> <20080223194047.GB38485@lor.one-eyed-alien.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; 
charset=US-ASCII; format=flowed Cc: Daniel Eischen , arch@freebsd.org, Robert Watson , David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Feb 2008 21:20:24 -0000 On Sat, 23 Feb 2008, Brooks Davis wrote: > On Fri, Feb 22, 2008 at 01:52:54PM -1000, Jeff Roberson wrote: >> On Fri, 22 Feb 2008, Brooks Davis wrote: >> >>> On Fri, Feb 22, 2008 at 12:34:13PM -1000, Jeff Roberson wrote: >>>> >>>> On Thu, 21 Feb 2008, Robert Watson wrote: >>>> >>>>> On Wed, 20 Feb 2008, Jeff Roberson wrote: > >>>>> - It would be nice to be able to use CPU sets in jail as well, >>>>> suggesting >>>>> a >>>>> hierarchal model with some sort of tagging so you know what CPU sets >>>>> were >>>>> created in a jail such that you know whether they can be changed in a >>>>> jail. >>>>> While I recognize this makes things a lot more tricky, I think we >>>>> should >>>>> basically be planning more carefully with respect to virtualization >>>>> when >>>>> we >>>>> add new interfaces, since it's a widely used feature, and the current >>>>> set >>>>> of >>>>> "stragglers" unsupported in Jail is growing rather than shrinking. >>>> >>>> I have implemented a hierarchical model. Each thread has a pointer to >>>> the >>>> cpuset that it's in. If it makes a local modification via setaffinity() >>>> it >>>> gets an anonymous cpuset that is a child of the set assigned to the >>>> process. This anonymous set will also be inherited across fork/thread >>>> creation. >>>> >>>> In this model presently there are nodes marked as root. To query the >>>> 'system' cpus available we walk up from the current node until we find a >>>> root. These are the 'system' set. A thread may not break out of its >>>> system set. 
A process may join the root set but it may not modify a root >>>> that is a parent. Jails would create a new root. A process outside of >>>> the >>>> jail can modify the set of processors in the jail but a process within >>>> the >>>> jail/root may not. >>>> >>>> The next level down from the root is the assigned set. The root may be >>>> an >>>> assigned set or this may be a subset of the root. Processes may create >>>> sets which are parented back to their root and may include any processors >>>> within their root. The mask of the assigned set is returned as >>>> 'available' >>>> processors. >>>> >>>> This gives a 1 to 3 level hierarchy. The root, an assigned set, and an >>>> anonymous set. Any of these but the root may be omitted. There is no >>>> current way for userland to create subsets of assigned sets to permit >>>> further nesting. I'm not sure I see value in it right now and it gives >>>> the >>>> possibility of unbounded tree depth. >>>> >>>> Anonymous sets are immutable as they are shared and changes only apply to >>>> the thread/pid in the WHICH argument and not others which have inherited >>>> from it. Anonymous sets have no id and may not be specifically >>>> manipulated >>>> via a setid. You must refer to the process/thread. From the >>>> administration point of view they don't exist. >>>> >>>> When a set is modified we walk down the children recursively and apply >>>> the >>>> new mask. This is done with a global set lock under which all >>>> modifications and tree operations are performed. The td_cpuset pointer >>>> is >>>> protected under the thread_lock() and may read the set without a lock. >>>> This >>>> gives the possibility for certain kinds of races but I believe they are >>>> all >>>> safe. >>>> >>>> Hopefully I explained that well enough for people to follow. I realize >>>> it's a lot of text but it's fairly simple bookkeeping code. This is all >>>> implemented and I'm debugging now.
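The hierarchy described above (a root, an optional assigned set, an optional anonymous set; walking up to the nearest root to find the 'system' set; recursively pushing a new mask down to the children) can be illustrated with a small userland C model. This is only a sketch under stated assumptions, not the kernel code from the forthcoming patches: the names struct cpuset_node, cs_link(), cs_root(), cs_within_root() and cs_apply_mask() are hypothetical, masks are a single 64-bit word, locking is left to the caller, and narrowing each child by intersecting its mask with the parent's new mask is one plausible reading of "apply the new mask".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the set hierarchy: one bit per CPU, up to 64. */
typedef uint64_t cs_mask;

#define CS_MAX_CHILDREN 8

struct cpuset_node {
    cs_mask mask;                  /* CPUs usable by threads in this set */
    bool is_root;                  /* e.g. the system set or a jail's root */
    struct cpuset_node *parent;
    struct cpuset_node *children[CS_MAX_CHILDREN];
    size_t nchildren;
};

/* Attach a child set (assigned or anonymous) below a parent. */
static void cs_link(struct cpuset_node *parent, struct cpuset_node *child)
{
    assert(parent->nchildren < CS_MAX_CHILDREN);
    child->parent = parent;
    parent->children[parent->nchildren++] = child;
}

/* Walk up from the current node until a root is found: the 'system' set. */
static struct cpuset_node *cs_root(struct cpuset_node *set)
{
    while (!set->is_root && set->parent != NULL)
        set = set->parent;
    return set;
}

/* A thread may not break out of its system set: the requested mask must be
 * non-empty and contained in the nearest root's mask. */
static bool cs_within_root(struct cpuset_node *set, cs_mask want)
{
    return want != 0 && (want & ~cs_root(set)->mask) == 0;
}

/* Apply a new mask, then walk the children recursively, narrowing each one.
 * Assumes the caller holds the single global set lock. A child left with an
 * empty mask would strand its threads; that policy is left open here. */
static void cs_apply_mask(struct cpuset_node *set, cs_mask mask)
{
    set->mask = mask;
    for (size_t i = 0; i < set->nchildren; i++) {
        struct cpuset_node *child = set->children[i];
        cs_apply_mask(child, child->mask & mask);
    }
}
```

For example, with a root of 0xF, an assigned set of 0x6 and an anonymous set of 0x2 below it, cs_root() on the anonymous set returns the root, a request for 0x10 is refused, and shrinking the root to 0x3 narrows the assigned set to 0x2. Keeping each set's effective mask precomputed this way is what lets a scheduler consult a single bitmap per thread.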
>>> >>> One place I'd like to implement CPU affinity is in the Sun Grid Engine >>> execution daemon. I think anonymous set would not be sufficient there >>> because the model allows new tasks to be started on a particular node at >>> any time during a parallel job. I'd have to do some more digging in the >>> code to be entirely certain. I think the fewer limits we place on the >>> hierarchy, the better off we'll be unless there are compelling complexity >>> reasons to avoid them. >> >> With the anonymous set you can bind any thread to any cpu that is visible >> to it. How would this not work? > > I'm still trying to wrap my head around the anonymous sets. Is the idea > that once you are in an anonymous set, you can't expand it, or can you > expand out as far as the assigned set? I'd like for parallel jobs to > be allocated a set of cpus that they can't change, but still be able > to make their own decisions about thread affinity if they desire (for > example OpenMPI has some support for this so processes stay put and in > theory benefit from positive cache effects). If that's feasible in > this model, I'm happy with it. I think we should keep in mind that these > SGE execution daemons might be sitting inside jails. ;-) Ah, when I said the anonymous sets were immutable, that only means that they are copy-on-write. Because you can't know who shares a copy via fork or thread creation you must make a new set each time you write. I made the anonymous sets so that the parent would have a list of all derivative children sets so that modifications to the parent would be reflected in the child. This also means that the scheduler only has to look at one bitmap to determine the available cpus for a thread. > >>>>> - There's still no way to specify an affinity policy rather than >>>>> explicit >>>>> affinity, but if our CPU set model is sufficiently general, that might >>>>> be >>>>> a >>>>> vehicle to do that. I.e., cpuset_setpolicy() rather than setting a >>>>> mask.
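Jeff's clarification that anonymous sets are copy-on-write can be sketched in the same style. In this hypothetical model (struct anon_set, struct thread, anon_new(), thread_inherit(), thread_setaffinity() and anon_release() are invented for illustration, and each anonymous set's link back to its parent set is omitted), fork or thread creation only takes another reference to the creator's set, and a setaffinity() call always installs a freshly allocated set instead of editing the shared one. Threads that inherited the old set are therefore unaffected.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint64_t cs_mask;

/* Hypothetical anonymous set, shared by reference after fork/thread creation. */
struct anon_set {
    cs_mask mask;
    unsigned refcount;          /* number of threads pointing at this set */
};

struct thread {
    struct anon_set *cpuset;    /* stands in for td_cpuset */
};

static struct anon_set *anon_new(cs_mask mask)
{
    struct anon_set *s = malloc(sizeof(*s));
    if (s == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    s->mask = mask;
    s->refcount = 1;
    return s;
}

/* Drop one reference; the last holder frees the set. */
static void anon_release(struct anon_set *s)
{
    if (--s->refcount == 0)
        free(s);
}

/* fork()/thread creation: the new thread shares the creator's set. */
static void thread_inherit(struct thread *child, const struct thread *parent)
{
    child->cpuset = parent->cpuset;
    child->cpuset->refcount++;
}

/* Copy-on-write: the set may be shared by inheritance, and a change must
 * apply only to the named thread, so every write installs a fresh set. */
static void thread_setaffinity(struct thread *td, cs_mask mask)
{
    struct anon_set *fresh = anon_new(mask);
    anon_release(td->cpuset);
    td->cpuset = fresh;
}
```

After a parent with mask 0x3 creates a child, both point at one set with two references; when the child narrows itself to 0x1, it gets its own set and the parent's set keeps 0x3 with a single reference.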
>>>> >>>> Yes, I think this is orthogonal and can be addressed separately. I'm not >>>> sure how many userland programs are smart enough or even capable of >>>> making >>>> determinations about their cache behavior however. We should open >>>> another >>>> discussion once this one is done. >>>> >>>>> >>>>> - In the interests of boring API changes, recent APIs tend to prefix the >>>>> method on the object name. Have you thought about cpuset_create(), >>>>> cpuset_foo(), etc? That reduces the chances of interfering with >>>>> application >>>>> namespaces. I think, anyway. :-). >>>> >>>> Yes, I prefer that as well, as I mentioned syscalls tended to favor >>>> brevity. I'm fine with changing that trend. >>>> >>>>> >>>>> I need to ponder the proposal a little more, ideally over a hot beverage >>>>> this morning, and will follow up if I have further thoughts. Thanks for >>>>> working on this, BTW -- affinity is well-overdue for FreeBSD. >>>> >>>> A little more to ponder now! Your feedback is much appreciated. >>>> >>>> I believe the present hierarchical model satisfies the jail requirements >>>> of >>>> restricting cpus in the jail while still allowing the jail to create >>>> sets. >>>> >>>> The unanswered questions are: >>>> >>>> 1) What to do about sets that strand threads, options described above. >>>> 2) Are people ok with the transient nature of sets? >>>> 3) Does anyone want to help with man pages, administrative tools, etc? >>>> I >>>> have a prototype tool called 'cpuset' that fully exercises the api but is >>>> probably ugly. Will post details soon. >>> >>> I could help with some of this as it furthers a funded project at work. >> >> I will provide patches soon. It would be great to have a developer with a >> user's perspective to look at some of the details and especially the >> administration side of things. I think someone else has offered to help >> with man pages but I need to double check. > > Cool.
If you can get some basics out by late Sunday afternoon (CST) I > should be able to look at it and think about it on the plane Monday. I can definitely do that. I'm just debugging now. > > -- Brooks > From owner-freebsd-arch@FreeBSD.ORG Sat Feb 23 21:35:19 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9A56916A404; Sat, 23 Feb 2008 21:35:19 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net [IPv6:2001:4830:1200:a1::2]) by mx1.freebsd.org (Postfix) with ESMTP id 1A32113C459; Sat, 23 Feb 2008 21:35:18 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.14.1/8.13.8) with ESMTP id m1NLZ7AB040388; Sat, 23 Feb 2008 15:35:07 -0600 (CST) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.14.1/8.13.8/Submit) id m1NLZ73l040387; Sat, 23 Feb 2008 15:35:07 -0600 (CST) (envelope-from brooks) Date: Sat, 23 Feb 2008 15:35:07 -0600 From: Brooks Davis To: Jeff Roberson Message-ID: <20080223213507.GD39699@lor.one-eyed-alien.net> References: <20080220105333.G44565@fledge.watson.org> <47BCEFDB.5040207@freebsd.org> <20080220175532.Q920@desktop> <20080220213253.A920@desktop> <20080221092011.J52922@fledge.watson.org> <20080222121253.N920@desktop> <20080222231245.GA28788@lor.one-eyed-alien.net> <20080222134923.M920@desktop> <20080223194047.GB38485@lor.one-eyed-alien.net> <20080223111659.K920@desktop> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="1sNVjLsmu1MXqwQ/" Content-Disposition: inline In-Reply-To: <20080223111659.K920@desktop> User-Agent: Mutt/1.5.16 (2007-06-09) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (lor.one-eyed-alien.net [127.0.0.1]); Sat, 
23 Feb 2008 15:35:07 -0600 (CST) Cc: Daniel Eischen , arch@freebsd.org, Robert Watson , David Xu , Andrew Gallatin Subject: Re: getaffinity/setaffinity and cpu sets. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Feb 2008 21:35:19 -0000 --1sNVjLsmu1MXqwQ/ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Feb 23, 2008 at 11:21:33AM -1000, Jeff Roberson wrote: > > On Sat, 23 Feb 2008, Brooks Davis wrote: > >> On Fri, Feb 22, 2008 at 01:52:54PM -1000, Jeff Roberson wrote: >>> On Fri, 22 Feb 2008, Brooks Davis wrote: >>> >>>> On Fri, Feb 22, 2008 at 12:34:13PM -1000, Jeff Roberson wrote: >>>>> >>>>> On Thu, 21 Feb 2008, Robert Watson wrote: >>>>> >>>>>> On Wed, 20 Feb 2008, Jeff Roberson wrote: >> >>>>>> - It would be nice to be able to use CPU sets in jail as well, >>>>>> suggesting >>>>>> a >>>>>> hierarchal model with some sort of tagging so you know what CPU sets >>>>>> were >>>>>> created in a jail such that you know whether they can be changed in a >>>>>> jail. >>>>>> While I recognize this makes things a lot more tricky, I think we >>>>>> should >>>>>> basically be planning more carefully with respect to virtualization >>>>>> when >>>>>> we >>>>>> add new interfaces, since it's a widely used feature, and the current >>>>>> set >>>>>> of >>>>>> "stragglers" unsupported in Jail is growing rather than shrinking. >>>>> >>>>> I have implemented a hierarchical model. Each thread has a pointer to >>>>> the >>>>> cpuset that it's in. If it makes a local modification via >>>>> setaffinity() >>>>> it >>>>> gets an anonymous cpuset that is a child of the set assigned to the >>>>> process. This anonymous set will also be inherited across fork/thread >>>>> creation.
>>>>> >>>>> In this model presently there are nodes marked as root. To query the >>>>> 'system' cpus available we walk up from the current node until we find >>>>> a >>>>> root. These are the 'system' set. A thread may not break out of its >>>>> system set. A process may join the root set but it may not modify a >>>>> root >>>>> that is a parent. Jails would create a new root. A process outside of >>>>> the >>>>> jail can modify the set of processors in the jail but a process within >>>>> the >>>>> jail/root may not. >>>>> >>>>> The next level down from the root is the assigned set. The root may be >>>>> an >>>>> assigned set or this may be a subset of the root. Processes may create >>>>> sets which are parented back to their root and may include any >>>>> processors >>>>> within their root. The mask of the assigned set is returned as >>>>> 'available' >>>>> processors. >>>>> >>>>> This gives a 1 to 3 level hierarchy. The root, an assigned set, and an >>>>> anonymous set. Any of these but the root may be omitted. There is no >>>>> current way for userland to create subsets of assigned sets to permit >>>>> further nesting. I'm not sure I see value in it right now and it gives >>>>> the >>>>> possibility of unbounded tree depth. >>>>> >>>>> Anonymous sets are immutable as they are shared and changes only apply >>>>> to >>>>> the thread/pid in the WHICH argument and not others which have >>>>> inherited >>>>> from it. Anonymous sets have no id and may not be specifically >>>>> manipulated >>>>> via a setid. You must refer to the process/thread. From the >>>>> administration point of view they don't exist. >>>>> >>>>> When a set is modified we walk down the children recursively and apply >>>>> the >>>>> new mask. This is done with a global set lock under which all >>>>> modifications and tree operations are performed. The td_cpuset pointer >>>>> is >>>>> protected under the thread_lock() and may read the set without a lock.
>>>>> This >>>>> gives the possibility for certain kinds of races but I believe they are >>>>> all >>>>> safe. >>>>> >>>>> Hopefully I explained that well enough for people to follow. I realize >>>>> it's a lot of text but it's fairly simple bookkeeping code. This is >>>>> all >>>>> implemented and I'm debugging now. >>>> >>>> One place I'd like to implement CPU affinity is in the Sun Grid Engine >>>> execution daemon. I think anonymous set would not be sufficient there >>>> because the model allows new tasks to be started on a particular node at >>>> any time during a parallel job. I'd have to do some more digging in the >>>> code to be entirely certain. I think the fewer limits we place on the >>>> hierarchy, the better off we'll be unless there are compelling complexity >>>> reasons to avoid them. >>> >>> With the anonymous set you can bind any thread to any cpu that is visible >>> to it. How would this not work? >> >> I'm still trying to wrap my head around the anonymous sets. Is the idea >> that once you are in an anonymous set, you can't expand it, or can you >> expand out as far as the assigned set? I'd like for parallel jobs to >> be allocated a set of cpus that they can't change, but still be able >> to make their own decisions about thread affinity if they desire (for >> example OpenMPI has some support for this so processes stay put and in >> theory benefit from positive cache effects). If that's feasible in >> this model, I'm happy with it. I think we should keep in mind that these >> SGE execution daemons might be sitting inside jails. ;-) > > Ah, when I said the anonymous sets were immutable, that only means that > they are copy-on-write. Because you can't know who shares a copy via fork > or thread creation you must make a new set each time you write.
> > I made the anonymous sets so that the parent would have a list of all > derivative children sets so that modifications to the parent would be > reflected in the child. This also means that the scheduler only has to > look at one bitmap to determine the available cpus for a thread. I think the anonymous sets seem like a good idea. One solution to my problem might be to make changing your current set to something that is not a subset of your parent (or maybe your current set?) a privileged operation. -- Brooks --1sNVjLsmu1MXqwQ/ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (FreeBSD) iD8DBQFHwJGKXY6L6fI4GtQRAl3iAKDXYMD6U6rx87OVqGsDfQgQk/GVfACfXlra EDNQLEYWfYoI6H5v7YsDBWM= =YC+R -----END PGP SIGNATURE----- --1sNVjLsmu1MXqwQ/--
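Brooks's suggestion could take roughly the following form as a permission check. Under this check, moving a thread to a set that is not a subset of its parent requires privilege, while narrowing within the parent stays unprivileged, which would let a parallel job tune its own thread affinity inside the CPUs it was given. This is a sketch only. The function name cs_check_request(), the use of EINVAL and EPERM, and treating "parent" as the assigned set are assumptions, not anything settled in the thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t cs_mask;

/* Hypothetical permission check for a setaffinity()-style request.
 *   root:   mask of the nearest root set (e.g. a jail's root)
 *   parent: mask of the parent (assigned) set
 *   want:   mask the caller asks for */
static int cs_check_request(cs_mask root, cs_mask parent, cs_mask want,
    bool privileged)
{
    if (want == 0 || (want & ~root) != 0)
        return EINVAL;          /* empty, or outside the system set */
    if ((want & ~parent) != 0 && !privileged)
        return EPERM;           /* widening past the parent needs privilege */
    return 0;                   /* narrowing within the parent is allowed */
}
```

With a root of 0xF and an assigned set of 0x3, an unprivileged request for CPU 2 (0x4) is refused with EPERM, the same request succeeds for a privileged caller, and 0x10 is rejected for everyone because it lies outside the root.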