Date:      Tue, 19 Feb 2008 23:55:22 -1000 (HST)
From:      Jeff Roberson <jroberson@chesapeake.net>
To:        Daniel Eischen <deischen@freebsd.org>
Cc:        arch@freebsd.org, Robert Watson <rwatson@freebsd.org>, Andrew Gallatin <gallatin@cs.duke.edu>
Subject:   Re: Linux compatible setaffinity.
Message-ID:  <20080219234101.D920@desktop>
In-Reply-To: <20080112194521.I957@desktop>
References:  <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <Pine.GSO.4.64.0801122240510.15683@sea.ntplx.net> <20080112194521.I957@desktop>

On Sat, 12 Jan 2008, Jeff Roberson wrote:

> On Sat, 12 Jan 2008, Daniel Eischen wrote:
>
>> On Sat, 12 Jan 2008, Jeff Roberson wrote:
>> 
>>> Now, there is one problem with the linux api that I want to discuss before 
>>> I commit it.  The current patch always works on curthread.  However, the 
>>> api allows for setting the binding of a pid.  I believe, although I'm not 
>>> certain, that pids and tids in linux are in the same number space.  It's 
>>> not clear to me whether you can set an affinity for an entire process and 
>>> have it affect an individual thread or whether you set it on a thread by 
>>> thread basis.  When supplying a non-curproc pid do you bind all threads in 
>>> the target process?
>>> 
>>> Are our tids and pids in the same number space?  And are they available to 
>>> application programmers?  I haven't followed that very carefully.
>> 
>> I believe marcel made tids and pids disjoint so that any pid is
>> never equal to any tid.  But regardless, I don't think we want
>> to rely on that.  I would prefer the Solaris approach of specifying
>> what we want (pid, tid, jail id, etc) as an argument in the API
>> so there is no confusion.
>
> Yes, I would prefer that as well I believe.  So I'll add an extra parameter 
> and in the linux code we'll use whatever their default is.  Of course the 
> initial implementation will still only support curthread but I plan on 
> finishing the rest before 8.0 is done.

So what does everyone think of something like this:

int cpuaffinity(int cmd, long which, int masksize, unsigned *mask);

#define	AFFINITY_GET	0x1
#define	AFFINITY_SET	0x2
#define	AFFINITY_PID	0x4
#define	AFFINITY_TID	0x8

I'm not married to any of these names.  If you know of something that 
would be more regular please comment.

Behavior according to the flags would be as follows:

A get stores the current affinity into mask; a set reads the requested 
affinity from mask.  It is an error if mask is not large enough.  If it 
is too large, the unused part is filled with zeros.

If pid is specified on a set, all threads in that process are set to the 
requested affinity.  On a get it doesn't make much sense, but I guess I'll 
make it the union of all the threads' affinities.

If tid is specified the mask applies only to the requested tid.
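
The flag behavior described above can be modelled in userspace.  This is
a minimal sketch, not the kernel patch: the name cpuaffinity_model(), the
three-thread table, the MODEL_* constants, masksize being counted in
bytes, and the errno values are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define	AFFINITY_GET	0x1
#define	AFFINITY_SET	0x2
#define	AFFINITY_PID	0x4
#define	AFFINITY_TID	0x8

/* Hypothetical model: one process, three threads, at most 32 cpus. */
#define	MODEL_NTHREADS	3
#define	MODEL_PID	100L
#define	MODEL_TID_BASE	100000L	/* assumption: tids disjoint from pids */

static unsigned model_mask[MODEL_NTHREADS] = { 0x1, 0x2, 0x4 };

/* Userspace model of the proposed call; returns 0, or -1 with errno set. */
int
cpuaffinity_model(int cmd, long which, int masksize, unsigned *mask)
{
	int op = cmd & (AFFINITY_GET | AFFINITY_SET);
	int scope = cmd & (AFFINITY_PID | AFFINITY_TID);
	int first = 0, last = MODEL_NTHREADS - 1, i;

	/* Exactly one operation and one scope (assumption). */
	if ((op != AFFINITY_GET && op != AFFINITY_SET) ||
	    (scope != AFFINITY_PID && scope != AFFINITY_TID) ||
	    mask == NULL) {
		errno = EINVAL;
		return (-1);
	}
	/* Error if mask is not large enough (masksize in bytes: assumption). */
	if (masksize < (int)sizeof(unsigned)) {
		errno = EINVAL;
		return (-1);
	}
	if (scope == AFFINITY_TID) {
		/* A tid selects exactly one thread. */
		if (which < MODEL_TID_BASE ||
		    which >= MODEL_TID_BASE + MODEL_NTHREADS) {
			errno = ESRCH;
			return (-1);
		}
		first = last = (int)(which - MODEL_TID_BASE);
	} else if (which != MODEL_PID) {
		errno = ESRCH;
		return (-1);
	}
	assert(first >= 0 && last < MODEL_NTHREADS);

	if (op == AFFINITY_SET) {
		/* pid: every thread gets the mask; tid: just the one. */
		for (i = first; i <= last; i++)
			model_mask[i] = mask[0];
		return (0);
	}
	/* Get: zero-fill the caller's buffer, then store the union. */
	memset(mask, 0, (size_t)masksize);
	for (i = first; i <= last; i++)
		mask[0] |= model_mask[i];
	return (0);
}
```

With the initial table, a get on MODEL_PID into a two-word buffer leaves
the union of the three thread masks in the first word and zero in the
second.  Words beyond the first are ignored on a set, which is also an
assumption; the proposal does not say what happens to extra bits.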

The mask is always inherited from the creating thread and propagates on 
fork().

I have these semantics implemented and appearing to work in ULE.  I can 
implement them in 4BSD but it will be very inefficient in some edge cases 
since each cpu doesn't have its own run queue.
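
The edge case mentioned above can be pictured with a single shared run
queue: a cpu may have to walk past every thread that is not allowed to
run on it.  The sketch below only illustrates that cost; struct
rq_thread and shared_runq_choose() are hypothetical names, not the 4BSD
scheduler code, and masks are assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified runnable-thread record. */
struct rq_thread {
	unsigned affinity;		/* bit n set: may run on cpu n */
	struct rq_thread *next;		/* next entry in the shared queue */
};

/*
 * Unlink and return the first queued thread allowed on `cpu`.
 * *scanned reports how many queue entries had to be examined.
 */
struct rq_thread *
shared_runq_choose(struct rq_thread **head, int cpu, int *scanned)
{
	struct rq_thread **pp, *td;

	assert(cpu >= 0 && cpu < 32);	/* assumption: 32-bit masks */
	*scanned = 0;
	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		td = *pp;
		(*scanned)++;
		if (td->affinity & (1u << cpu)) {
			*pp = td->next;		/* unlink from the queue */
			td->next = NULL;
			return (td);
		}
	}
	return (NULL);
}
```

If every thread ahead in the queue excludes the calling cpu, the scan is
linear in the queue length.  With one queue per cpu, a thread would only
ever be queued where it is allowed to run, so no scan is needed.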

Binding and pinning are still both supported via the same kernel 
interfaces as before.  They are considered to override user-specified 
affinity.  This means the kernel can temporarily bind a thread to a cpu 
that it does not have affinity for.  I may add an assert to verify that 
we never leave the kernel with a binding still set, so userspace sees only 
the cpus it requests.
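
The override rule can be sketched as follows.  struct bind_thread,
runnable_cpus() and userret_check() are hypothetical names chosen for
this illustration and do not correspond to the existing kernel
interfaces; a kernel would also use its own assertion macro rather than
assert().

```c
#include <assert.h>

/* Hypothetical per-thread state for this illustration. */
struct bind_thread {
	unsigned affinity;	/* user-requested cpus, one bit per cpu */
	int bound_cpu;		/* cpu the kernel bound us to, or -1 */
};

/* Cpus the scheduler may pick: a kernel binding overrides affinity. */
unsigned
runnable_cpus(const struct bind_thread *td)
{
	if (td->bound_cpu >= 0)
		return (1u << td->bound_cpu);
	return (td->affinity);
}

/* The suggested check: no binding may survive the return to userspace. */
void
userret_check(const struct bind_thread *td)
{
	assert(td->bound_cpu < 0);
}
```

A thread with affinity 0x3 that the kernel binds to cpu 4 runs only on
cpu 4 until the binding is dropped, after which the 0x3 mask applies
again.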

The thread's affinity is stored in a cpumask variable in the thread 
structure.  If someone wanted to implement restricting a jail to a 
particular cpu they could add an affinity cmd that would walk all 
processes belonging to a jail and restrict their masks appropriately. 
You'd also want to check a jail mask on each call to cpuaffinity().
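
A rough sketch of that jail idea, under stated assumptions: struct
jail_proc, jail_restrict() and jail_mask_allows() are hypothetical
names, the mask is kept per process rather than per thread for brevity,
and falling back to the jail mask when the intersection is empty is an
invented policy that the proposal does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process record: which jail it is in and its cpu mask. */
struct jail_proc {
	int jail_id;
	unsigned affinity;
};

/*
 * Walk all processes and restrict those in `jail_id` to `jail_mask`.
 * Returns the number of processes whose mask changed.
 */
int
jail_restrict(struct jail_proc *procs, size_t nprocs, int jail_id,
    unsigned jail_mask)
{
	unsigned m;
	size_t i;
	int changed = 0;

	assert(jail_mask != 0);		/* a jail needs at least one cpu */
	for (i = 0; i < nprocs; i++) {
		if (procs[i].jail_id != jail_id)
			continue;
		m = procs[i].affinity & jail_mask;
		if (m == 0)
			m = jail_mask;	/* assumed fallback policy */
		if (m != procs[i].affinity) {
			procs[i].affinity = m;
			changed++;
		}
	}
	return (changed);
}

/* Check for later requests from inside the jail: 1 if allowed, else 0. */
int
jail_mask_allows(unsigned requested, unsigned jail_mask)
{
	return (requested != 0 && (requested & ~jail_mask) == 0);
}
```

The second function stands for the per-call check: a request naming any
cpu outside the jail mask would be refused.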

Linux sched_setaffinity() should be a subset of this functionality and 
thus easily supported.
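
One way the Linux calls might map onto the proposed interface is
sketched below.  Linux documents the id passed to sched_setaffinity()
as a thread id, with 0 meaning the caller, so the sketch uses
AFFINITY_TID.  Everything else is an assumption: the *_shim names are
hypothetical, cpuaffinity() here is only a stand-in that records its
arguments so the example links, the proposed call is assumed to accept
0 for the current thread, and the mask is taken as unsigned * rather
than Linux's cpu_set_t.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define	AFFINITY_GET	0x1
#define	AFFINITY_SET	0x2
#define	AFFINITY_PID	0x4
#define	AFFINITY_TID	0x8

/* Stand-in for the proposed call: it only records its arguments. */
struct affinity_call {
	int cmd;
	long which;
	int masksize;
	unsigned *mask;
} last_call;

int
cpuaffinity(int cmd, long which, int masksize, unsigned *mask)
{
	last_call.cmd = cmd;
	last_call.which = which;
	last_call.masksize = masksize;
	last_call.mask = mask;
	return (0);
}

/* Shared path: validate the arguments, then forward per-thread. */
static int
linux_affinity_common(int op, long id, size_t cpusetsize, unsigned *mask)
{
	assert(op == AFFINITY_GET || op == AFFINITY_SET);
	if (mask == NULL) {
		errno = EFAULT;
		return (-1);
	}
	if (cpusetsize > INT_MAX) {	/* masksize is an int */
		errno = EINVAL;
		return (-1);
	}
	return (cpuaffinity(op | AFFINITY_TID, id, (int)cpusetsize, mask));
}

int
linux_sched_setaffinity_shim(long id, size_t cpusetsize, unsigned *mask)
{
	return (linux_affinity_common(AFFINITY_SET, id, cpusetsize, mask));
}

int
linux_sched_getaffinity_shim(long id, size_t cpusetsize, unsigned *mask)
{
	return (linux_affinity_common(AFFINITY_GET, id, cpusetsize, mask));
}
```

The shims never use AFFINITY_PID, which matches the description of the
Linux call as a subset of the proposed functionality.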

Comments appreciated.  This will go in late next week.

Thanks,
Jeff

>
> Jeff
>
>> 
>> -- 
>> DE
>> 


