From: Jeff Roberson
Date: Wed, 20 Feb 2008 21:45:44 -1000 (HST)
To: David Xu, Daniel Eischen, Robert Watson, Andrew Gallatin
Cc: arch@freebsd.org
Subject: getaffinity/setaffinity and cpu sets.
Message-ID: <20080220213253.A920@desktop>
In-Reply-To: <20080220175532.Q920@desktop>
List-Id: Discussion related to FreeBSD architecture

I have the following api working:

/*
 * Parameters for the level argument to getaffinity.
 */
#define CPU_LEVEL_SYS    1  /* All system cpus. */
#define CPU_LEVEL_AVAIL  2  /* Available cpus for which. */
#define CPU_LEVEL_WHICH  3  /* Actual mask for which. */

/*
 * Parameters for the which argument to {get,set}affinity.
 */
#define CPU_WHICH_TID    1  /* Specifies a thread id. */
#define CPU_WHICH_PID    2  /* Specifies a process id. */
#define CPU_WHICH_SET    3  /* Specifies a set id. */

Along with a CPU_CLR, CPU_COPY, CPU_ISSET, CPU_SET, CPU_ZERO for
manipulating the sets.

int getaffinity(int level, int which, int id, int cpusetsize, long *mask);
int setaffinity(int which, int id, int cpusetsize, long *mask);

The get call has a notion of 'level' which allows us to fetch different
masks. The system set is all processors in the system. The available set
is the set of cpus available to the tid/pid in the 'which' argument. An
application would fetch the avail set and then potentially reduce it.

The setaffinity call doesn't have a level because the avail/sys sets are
immutable. You can only set things which can be specified by the which
argument.

I also have a 'cpuset' command which can run a new program with a given
cpu set, and view and modify the sets of arbitrary pids.

This is all working and I can supply patches if anyone is interested. I
have to implement 4BSD support before I can commit.

I have a proposal for Solaris-style processor sets which I think is simple
and sufficient for most cases. It involves the following new syscalls:

int cpuset(void);
int setcpuset(pid_t pid, int setid);
int getcpuset(pid_t pid);

The notion would be that you can create a new numbered cpuset with
cpuset(). You can modify or inspect its affinity with get/setaffinity
above and the CPU_WHICH_SET argument. The cpuset exists as long as there
are members of the set, sort of like a process group or session. The
{get,set}cpuset calls can inspect or modify which set a process belongs to.

This set would not be modifiable by user processes or by processes in a
jail. It would create the restriction that differs between 'avail' and
'sys' above. Processes would be able to directly bind to any processor
within the set. Changing the set would apply to all processes in the set.
The cpuset would be per-process while the mask is per-thread. Set
membership is inherited on fork().

In Solaris, sets can be named and have a more complete management api. I'm
not really interested in implementing all of that, but I believe what I
have outlined here would be a subset of it and no code/syscalls would be
wasted.

Comments? Objections? I'm fairly pleased with this arrangement now.

Thanks,
Jeff
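
For readers following along, here is a rough sketch of how the set-manipulation operations named in the message (CPU_ZERO, CPU_SET, CPU_CLR, CPU_ISSET, CPU_COPY) might look over an array of longs, plus the "fetch the avail set and then reduce it" step. Everything here is an assumption made for illustration: the sketch_ names, the 128-cpu limit, the struct layout and the reduce helper do not come from the patch, and the sketch uses unsigned long rather than long to keep the bit shifts well defined.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes; the message does not specify a maximum cpu count. */
#define SKETCH_MAXCPU   128
#define SKETCH_LONGBITS (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NLONGS   ((SKETCH_MAXCPU + SKETCH_LONGBITS - 1) / SKETCH_LONGBITS)

/* Hypothetical mask type: one bit per cpu, packed into unsigned longs. */
typedef struct {
    unsigned long bits[SKETCH_NLONGS];
} sketch_cpuset;

/* Analogue of CPU_ZERO: clear every cpu in the set. */
static void sketch_cpu_zero(sketch_cpuset *s)
{
    memset(s, 0, sizeof(*s));
}

/* Analogue of CPU_SET: add one cpu; out-of-range ids are ignored. */
static void sketch_cpu_set(int cpu, sketch_cpuset *s)
{
    if (cpu < 0 || cpu >= SKETCH_MAXCPU)
        return;
    s->bits[cpu / SKETCH_LONGBITS] |= 1UL << (cpu % SKETCH_LONGBITS);
}

/* Analogue of CPU_CLR: remove one cpu; out-of-range ids are ignored. */
static void sketch_cpu_clr(int cpu, sketch_cpuset *s)
{
    if (cpu < 0 || cpu >= SKETCH_MAXCPU)
        return;
    s->bits[cpu / SKETCH_LONGBITS] &= ~(1UL << (cpu % SKETCH_LONGBITS));
}

/* Analogue of CPU_ISSET: 1 if the cpu is in the set, 0 otherwise. */
static int sketch_cpu_isset(int cpu, const sketch_cpuset *s)
{
    if (cpu < 0 || cpu >= SKETCH_MAXCPU)
        return 0;
    return (s->bits[cpu / SKETCH_LONGBITS] >> (cpu % SKETCH_LONGBITS)) & 1UL;
}

/* Analogue of CPU_COPY: copy one set into another. */
static void sketch_cpu_copy(const sketch_cpuset *from, sketch_cpuset *to)
{
    *to = *from;
}

/*
 * Hypothetical helper: keep only the first `keep` cpus of an available
 * set, the way an application might narrow the avail mask before
 * handing the result to setaffinity. Returns how many cpus were kept.
 */
static int sketch_reduce_avail(const sketch_cpuset *avail, int keep,
                               sketch_cpuset *out)
{
    int cpu, kept = 0;

    sketch_cpu_copy(avail, out);
    for (cpu = 0; cpu < SKETCH_MAXCPU; cpu++) {
        if (!sketch_cpu_isset(cpu, out))
            continue;
        if (kept < keep)
            kept++;
        else
            sketch_cpu_clr(cpu, out);
    }
    return kept;
}
```

In this sketch the caller would fill a sketch_cpuset from whatever the avail-level query returns, call sketch_reduce_avail() on it, and pass the reduced bits array on as the mask argument; the actual macro definitions in the patch may differ.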