Date:      Wed, 20 Feb 2008 00:53:26 -1000 (HST)
From:      Jeff Roberson <jroberson@chesapeake.net>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        Daniel Eischen <deischen@FreeBSD.org>, arch@FreeBSD.org, Andrew Gallatin <gallatin@cs.duke.edu>
Subject:   Re: Linux compatible setaffinity.
Message-ID:  <20080220005030.Y920@desktop>
In-Reply-To: <20080220101348.D44565@fledge.watson.org>
References:  <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <Pine.GSO.4.64.0801122240510.15683@sea.ntplx.net> <20080112194521.I957@desktop> <20080219234101.D920@desktop> <20080220101348.D44565@fledge.watson.org>


On Wed, 20 Feb 2008, Robert Watson wrote:

>
> On Tue, 19 Feb 2008, Jeff Roberson wrote:
>
>>> Yes, I would prefer that as well, I believe.  So I'll add an extra 
>>> parameter and in the linux code we'll use whatever their default is.  Of 
>>> course the initial implementation will still only support curthread but I 
>>> plan on finishing the rest before 8.0 is done.
>> 
>> So what does everyone think of something like this:
>> 
>> int cpuaffinity(int cmd, long which, int masksize, unsigned *mask);
>> 
>> #define AFFINITY_GET	0x1
>> #define	AFFINITY_SET	0x2
>> #define	AFFINITY_PID	0x4
>> #define	AFFINITY_TID	0x8
>> 
>> I'm not married to any of these names.  If you know of something that would 
>> be more regular please comment.
>> 
>> Behavior according to flags would be as such:
>> 
>> Get or set affinity and fetch from or store into mask.  Error if mask is 
>> not large enough.  Fill with zeros if it's too large.
>> 
>> If pid is specified on set, all threads in the pid are set to the 
>> requested affinity.  On get it doesn't make much sense, but I guess 
>> I'll make it the union of all the threads' affinities.
>> 
>> If tid is specified the mask applies only to the requested tid.
>> 
>> The mask is always inherited from the creating thread and propagates on 
>> fork().
>> 
>> I have these semantics implemented and appearing to work in ULE.  I can 
>> implement them in 4BSD but it will be very inefficient in some edge cases 
>> since each cpu doesn't have its own run queue.
>> 
>> Binding and pinning are still both supported via the same kernel interfaces 
>> as they were.  They are considered to override user specified affinity. 
>> This means the kernel can temporarily bind a thread to a cpu that it does 
>> not have affinity for.  I may add an assert to verify that we never leave 
>> the kernel with binding still set so userspace sees only the cpus it 
>> requests.
>> 
>> The thread's affinity is stored in a cpumask variable in the thread 
>> structure.  If someone wanted to implement restricting a jail to a 
>> particular cpu they could add an affinity cmd that would walk all processes 
>> belonging to a jail and restrict their masks appropriately. You'd also want 
>> to check a jail mask on each call to affinity().
>> 
>> Linux sched_setaffinity() should be a subset of this functionality and thus 
>> easily supported.
>> 
>> Comments appreciated.  This will go in late next week.
>
> A few thoughts:
>
> - It would be good to have an interface to request what CPUs are available to
>  use, not just what CPUs are in use.
>
> - It would be useful to have a way to have an availability mask for what CPUs
>  the thread/process is allowed to use.
>
> The former is simply useful for applications -- in using your previous patch, 
> one immediate question you want to ask as an application programmer is "tell 
> me what CPUs are available so I can figure out how to distribute work, how 
> many threads to start, where to bind them, etc".  The latter is useful for 
> system administrators, who may want to say things like "Start apache with the 
> following mask of CPUs, and let Apache determine its policy with respect to 
> that bound as though the other CPUs don't exist".  It could also be used to 
> create a jail bound.
>
> So perhaps this means a slightly more complex API, but not much more complex. 
> How about:
>
> int cpuaffinity_get(scope, id, length, mask)
> int cpuaffinity_getmax(scope, id, length, mask)
> int cpuaffinity_set(scope, id, length, mask)
> int cpuaffinity_setmax(scope, id, length, mask)
>
> Scope would be something on the order of process (representing individual 
> processes or process groups, potentially), id would be the id in that scope 
> namespace, length and mask would be as you propose.  You could imagine adding 
> a further field to indicate whether it's the current affinity or the maximum 
> affinity, but I'm not sure the details matter all that much.  Here might be 
> some application logic, though:

Well, I'm not sure about the max.  How about just a cpuaffinity_get with 
a scope that specifies which cpus are available to you?  If the set is 
restricted by a jail or some other mechanism, the restricted set would 
be returned.  Otherwise all cpus would be returned.  The thread probably 
wouldn't directly manipulate its max; rather, it would be set by 
changing the jail or cpu group it belonged to.

Jeff

>
> 	cpumask_t max;
> 	int cpu, i;
>
> 	(void)cpuaffinity_getmax(CMASK_PROC, getpid(), sizeof(max), &max);
> 	for (i = 0; i < CMASK_CPUCOUNT(&max); i++) {
> 		cpu = CMASK_CPUINDEX(&max, i);
> 		/* Start a thread, bind it to 'cpu'. */
> 		/* Or, migrate CPUs sequentially looking at data. */
> 	}
>
> In the balance between all-doing system calls and multiple system calls, this 
> also makes me a bit happier, and it's not an entirely aesthetic concern. 
> Differentiating get and set methods is fairly useful for tracking down 
> problems when debugging, or if doing things like masking process system calls 
> for security reasons.
>
> There are two things I like from the other systems that I don't believe this 
> captures well:
>
> (1) The solaris notion of CPU sets, so that policy can be expressed in terms
>    of a global CPU set namespace administered by the system administrator.
>    I.e., create a CPU set "Apache", then use a tool to modify the set at
>    runtime.
>
> (2) The Darwin notion of defining CPU use policy rather than masks -- i.e.,
>    "I don't care what CPU it is, but run these threads on the same CPU", or
>    "the same core", etc.
>
> I'm happy for us to move ahead with the lower level interface you've defined 
> without addressing these concerns, but I think we should be keeping them in 
> mind as well.
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>


