From owner-freebsd-arch@FreeBSD.ORG Mon Dec 13 00:30:13 2010
Message-id: <4D052B3C.29454AC@verizon.net>
Date: Sun, 12 Dec 2010 15:06:20 -0500
From: Sergey Babkin
To: John Baldwin
References: <201012101050.45214.jhb@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: Realtime thread priorities
X-BeenThere: freebsd-arch@freebsd.org
List-Id: Discussion related to FreeBSD architecture

John Baldwin wrote:
>
> The current layout breaks up the global thread priority space (0 - 255) into a
> couple of bands:
>
> 0 - 63 : interrupt threads
> 64 - 127 : kernel sleep priorities (PSOCK, etc.)
> 128 - 159 : real-time user threads (rtprio)
> 160 - 223 : time-sharing user threads
> 224 - 255 : idle threads (idprio and kernel idle procs)
>
> If we decide to change the behavior I see two possible fixes:
>
> 1) (easy) just move the real-time priority range above the kernel sleep
> priority range

Would not this cause a priority inversion when an RT process
enters the kernel mode?

-SB

From owner-freebsd-arch@FreeBSD.ORG Mon Dec 13 14:38:17 2010
From: John Baldwin
To: Sergey Babkin
Date: Mon, 13 Dec 2010 09:27:26 -0500
In-Reply-To: <4D052B3C.29454AC@verizon.net>
Message-Id: <201012130927.26815.jhb@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: Realtime thread priorities

On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote:
> John Baldwin wrote:
> >
> > The current layout breaks up the global thread priority space (0 - 255) into a
> > couple of bands:
> >
> > 0 - 63 : interrupt threads
> > 64 - 127 : kernel sleep priorities (PSOCK, etc.)
> > 128 - 159 : real-time user threads (rtprio)
> > 160 - 223 : time-sharing user threads
> > 224 - 255 : idle threads (idprio and kernel idle procs)
> >
> > If we decide to change the behavior I see two possible fixes:
> >
> > 1) (easy) just move the real-time priority range above the kernel sleep
> > priority range
>
> Would not this cause a priority inversion when an RT process
> enters the kernel mode?

How so? Note that timesharing threads are not "bumped" to a kernel sleep
priority when they enter the kernel either. The kernel sleep priorities are
purely a way for certain sleep channels to cause a thread to be treated as
interactive and give it a priority boost to favor interactive threads.
Threads in the kernel do not automatically have higher priority than threads
not in the kernel. Keep in mind that all stopped threads (threads not
executing) are always in the kernel when they stop.
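
To make the band layout concrete, here is a minimal userland C sketch (not kernel code). The band boundaries are taken from the table quoted above; the enum and function names are invented for illustration, and the value 88 used in the usage note is only an assumed example of a priority inside the 64 - 127 kernel sleep band.

```c
#include <assert.h>
#include <stdbool.h>

/* Bands of the global priority space, as listed in the quoted table. */
enum band {
	BAND_ITHREAD,		/*   0 -  63 */
	BAND_KERN_SLEEP,	/*  64 - 127 */
	BAND_REALTIME,		/* 128 - 159 */
	BAND_TIMESHARE,		/* 160 - 223 */
	BAND_IDLE		/* 224 - 255 */
};

/* Map a global priority to its band (hypothetical helper). */
enum band
band_of(int pri)
{
	assert(pri >= 0 && pri <= 255);
	if (pri <= 63)
		return (BAND_ITHREAD);
	if (pri <= 127)
		return (BAND_KERN_SLEEP);
	if (pri <= 159)
		return (BAND_REALTIME);
	if (pri <= 223)
		return (BAND_TIMESHARE);
	return (BAND_IDLE);
}

/* Numerically lower values are more important. */
bool
outranks(int a, int b)
{
	return (a < b);
}
```

With this model, a time-sharing thread that wakes up at an assumed sleep priority of 88 satisfies outranks(88, 128), so it beats even the most important rtprio thread, which is the behavior under discussion in this thread.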
--
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG Mon Dec 13 14:38:18 2010
From: John Baldwin
To: David Xu
Date: Mon, 13 Dec 2010 09:37:46 -0500
In-Reply-To: <4D02D90C.20503@freebsd.org>
Message-Id: <201012130937.46666.jhb@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: Realtime thread priorities

On Friday, December 10, 2010 8:51:08 pm David Xu wrote:
> John Baldwin wrote:
> > So I finally had a case today where I wanted to use rtprio but it doesn't seem
> > very useful in its current state. Specifically, I want to be able to tag
> > certain user processes as being more important than any other user processes
> > even to the point that if one of my important processes blocks on a mutex, the
> > owner of that mutex should be more important than sshd being woken up from
> > sbwait by new data (for example). This doesn't work currently with rtprio due
> > to the way the priorities are laid out (and I believe I probably argued for
> > the current layout back when it was proposed).
> >
> > The current layout breaks up the global thread priority space (0 - 255) into a
> > couple of bands:
> >
> > 0 - 63 : interrupt threads
> > 64 - 127 : kernel sleep priorities (PSOCK, etc.)
> > 128 - 159 : real-time user threads (rtprio)
> > 160 - 223 : time-sharing user threads
> > 224 - 255 : idle threads (idprio and kernel idle procs)
> >
> > The problem I am running into is that when a time-sharing thread goes to sleep
> > in the kernel (waiting on select, socket data, tty, etc.) it actually ends up
> > in the kernel priorities range (64 - 127). This means when it wakes up it
> > will trump (and preempt) a real-time user thread even though these processes
> > nominally have a priority down in the 160 - 223 range. We do drop the kernel
> > sleep priority during userret(), but we don't recheck the scheduler queues to
> > see if we should preempt the thread during userret(), so it effectively runs
> > with the kernel sleep priority for the rest of the quantum while it is in
> > userland.
> >
> > My first question is if this behavior is the desired behavior? Originally I
> > think I preferred the current layout because I thought a thread in the kernel
> > should always have priority so it can release locks, etc. However, priority
> > propagation should actually handle the case of some very important thread
> > needing a lock.
> > In my use case today where I actually want to use rtprio I
> > think I want different behavior where the rtprio thread is more important than
> > the thread waking up with PSOCK, etc.
> >
> > If we decide to change the behavior I see two possible fixes:
> >
> > 1) (easy) just move the real-time priority range above the kernel sleep
> > priority range
> >
>
> This is not always correct, a userland realtime process may not be always more
> urgent than a normal time-sharing code which is backing up a file system or
> doing some important things, for example receiving money account from a socket.

Err, no. When a user has indicated that a process is rtprio, we should
assume that it is _always_ more important than a time-sharing process. The
sole exception to this is when lending priority, and the purpose of that is
to let the rtprio thread (or ithread) run as soon as possible anyway.

> Process sleeping in kernel seems doing really important thing, for example
> removing data from a device interrupt or writing into device, while a thread
> which is realtime consuming 100% cpu time might be a deadloop thread.

Note that rtprio requires root to enable, so if the user wants to rtprio a
buggy process that is their problem. However, with our current system there
is _no_ way for me to ensure that my very important process doesn't get
preempted by a new sshd process. Or more accurately, even if my very
important process has a dedicated CPU via cpuset but it blocks on a mutex,
the priority it lends to the owner of that mutex is not sufficient to prevent
the lock holder from being preempted by sshd when a new ssh connection
arrives. The fact that ULE uses RT priority levels for interactive threads is
also problematic for me, since even with RT moved above kernel sleep levels
sshd still ends up with an effective real-time priority of 0.

> > 2) (harder) make sched_userret() check the run queue to see if it should
> > preempt when dropping the kernel sleep priority.
> > I think bde@ has suggested
> > that we should do this for correctness previously (and I've had some old,
> > unfinished patches to do this in a branch in p4 for several years).
> >
>
> This is too overhead, try it and benchmark it for real world application.

It depends on your real world application, but I do lean towards (1) as I
think (2) is too expensive.

--
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG Mon Dec 13 14:38:19 2010
From: John Baldwin
To: Peter Jeremy
Date: Mon, 13 Dec 2010 09:38:04 -0500
In-Reply-To: <20101211213918.GB21959@server.vk2pj.dyndns.org>
Message-Id: <201012130938.04746.jhb@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: Realtime thread priorities

On Saturday, December 11, 2010 4:39:18 pm Peter Jeremy wrote:
> On 2010-Dec-10 10:50:45 -0500, John Baldwin wrote:
> >The problem I am running into is that when a time-sharing thread goes to sleep
> >in the kernel (waiting on select, socket data, tty, etc.) it actually ends up
> >in the kernel priorities range (64 - 127). This means when it wakes up it
> >will trump (and preempt) a real-time user thread even though these processes
> >nominally have a priority down in the 160 - 223 range. We do drop the kernel
> >sleep priority during userret(), but we don't recheck the scheduler queues to
> >see if we should preempt the thread during userret(), so it effectively runs
> >with the kernel sleep priority for the rest of the quantum while it is in
> >userland.
>
> This may also explain the situation I'm seeing where idprio processes
> are receiving more than "idle" time (see "idprio processes slowing
> down system" in -hackers).

Yes, this likely does explain that. We could fix that for just idle priority
class threads in sched_userret() easily enough (just call mi_switch() if
td->td_pri_class == PRI_CLASS_IDLE).

> >2) (harder) make sched_userret() check the run queue to see if it should
> >preempt when dropping the kernel sleep priority.
>
> IMHO, this is the "correct" solution but that needs to be tempered by
> the additional overhead this might incur.

The other concern (and reason I lean towards 1) is that this is also a feature
that we depend on to favor interactive threads over compute-bound threads.
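
The two return-to-userland variants mentioned above can be contrasted in a small userland model. This is a sketch under stated assumptions, not the kernel's sched_userret(): the struct, the enum, and both function names are hypothetical, and a boolean "should switch" result stands in for actually calling mi_switch().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduling class and thread state. */
enum uret_class { URET_REALTIME, URET_TIMESHARE, URET_IDLE };

struct uret_thread {
	enum uret_class	pri_class;
	int		priority;	/* current priority, may be a sleep boost */
	int		user_pri;	/* priority to run at in userland */
};

/* Narrow fix: drop the boost and request a switch only for idle-class threads. */
bool
userret_idle_only(struct uret_thread *td)
{
	assert(td != NULL);
	td->priority = td->user_pri;
	return (td->pri_class == URET_IDLE);
}

/*
 * Option 2: drop the boost, then request a switch whenever the best runnable
 * thread is more important (numerically lower) than the returning thread.
 */
bool
userret_recheck(struct uret_thread *td, int best_runnable_pri)
{
	assert(td != NULL);
	td->priority = td->user_pri;
	return (best_runnable_pri < td->priority);
}
```

The second variant pays for a run-queue check on every return to userland, which is the overhead concern raised in this thread; the first only affects idle-class threads.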
--
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG Tue Dec 14 01:30:23 2010
Message-ID: <4D06C8B0.1060409@freebsd.org>
Date: Tue, 14 Dec 2010 09:30:24 +0800
From: David Xu
To: John Baldwin
In-Reply-To: <201012130927.26815.jhb@freebsd.org>
Cc: arch@freebsd.org, Sergey Babkin
Subject: Re: Realtime thread priorities

John Baldwin wrote:
> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote:
>> John Baldwin wrote:
>>> The current layout breaks up the global thread priority space (0 - 255) into a
>>> couple of bands:
>>>
>>> 0 - 63 : interrupt threads
>>> 64 - 127 : kernel sleep priorities (PSOCK, etc.)
>>> 128 - 159 : real-time user threads (rtprio)
>>> 160 - 223 : time-sharing user threads
>>> 224 - 255 : idle threads (idprio and kernel idle procs)
>>>
>>> If we decide to change the behavior I see two possible fixes:
>>>
>>> 1) (easy) just move the real-time priority range above the kernel sleep
>>> priority range
>> Would not this cause a priority inversion when an RT process
>> enters the kernel mode?
>
> How so? Note that timesharing threads are not "bumped" to a kernel sleep
> priority when they enter the kernel either. The kernel sleep priorities are
> purely a way for certain sleep channels to cause a thread to be treated as
> interactive and give it a priority boost to favor interactive threads.
> Threads in the kernel do not automatically have higher priority than threads
> not in the kernel. Keep in mind that all stopped threads (threads not
> executing) are always in the kernel when they stop.

I have a requirement that a thread running in the kernel have higher
priority than a thread running userland code. Because our kernel mutexes
are not sleepable, unlike Solaris's, I have to use semaphore-like code in
kern_umtx.c to lock a chain, which allows me to read and write the user
address space; this is what umtxq_busy() does. But it does not prevent a
userland thread from preempting the thread that locked the chain. If a
realtime thread preempts a thread that has locked the chain, it may lock up
all processes using pthreads. I think our realtime scheduling is not very
useful; it is too easy to lock up the system.
Regards,
David Xu

From owner-freebsd-arch@FreeBSD.ORG Tue Dec 14 04:02:08 2010
Message-id: <4D06BC5D.E573E3F1@verizon.net>
Date: Mon, 13 Dec 2010 19:37:49 -0500
From: Sergey Babkin
To: John Baldwin
References: <201012101050.45214.jhb@freebsd.org> <4D052B3C.29454AC@verizon.net> <201012130927.26815.jhb@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: Realtime thread priorities

John Baldwin wrote:
>
> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote:
> > John Baldwin wrote:
> > >
> > > The current layout breaks up the global thread priority space (0 - 255) into a
> > > couple of bands:
> > >
> > > 0 - 63 : interrupt threads
> > > 64 - 127 : kernel sleep priorities (PSOCK, etc.)
> > > 128 - 159 : real-time user threads (rtprio)
> > > 160 - 223 : time-sharing user threads
> > > 224 - 255 : idle threads (idprio and kernel idle procs)
> > >
> > > If we decide to change the behavior I see two possible fixes:
> > >
> > > 1) (easy) just move the real-time priority range above the kernel sleep
> > > priority range
> >
> > Would not this cause a priority inversion when an RT process
> > enters the kernel mode?
>
> How so? Note that timesharing threads are not "bumped" to a kernel sleep
> priority when they enter the kernel either. The kernel sleep priorities are
> purely a way for certain sleep channels to cause a thread to be treated as
> interactive and give it a priority boost to favor interactive threads.
> Threads in the kernel do not automatically have higher priority than threads
> not in the kernel. Keep in mind that all stopped threads (threads not
> executing) are always in the kernel when they stop.

I may be a bit behind the times here. But historically the "default"
process priority means the priority when the process was pre-empted.
If it did a system call, the priority on wake up would be as
specified in the sleep() kernel function (or its more modern
analog, like a sleeplock or condition variable). This would
let the kernel code react quickly, and then on return from
the syscall revert to the original priority, and possibly
get pre-empted by another process at that time.

If the user-mode priority is higher than the kernel-mode priority,
this would mean that once a high priority process does a system
call (say for example, poll()), it would experience a priority
inversion and sleep with a lower priority than specified.

A fix for this should be fairly straightforward. The process structure
has the RT priority in it, so all that sleep() needs is to check
it and use that priority if it's higher than the one given
as an argument.
Or optionally, for the RT processes bump the argument
by how much the process's RT priority is over the "RT baseline".
(Well, "logically over", numerically under).

This would not solve the more general classic priority inversion
issue with some low-priority process grabbing some kernel
resource and sleeping at a lower priority while then an RT process
waits for this resource to be freed. I think the original
idea of in-kernel processes having the higher priorities is in
part an attempt to answer this problem. But I agree with you that
letting the RT processes have a higher priority than the TS processes
in the kernel mode is better than nothing.

Um, a stupid question: does the signal() primitive on mutexes/condvars
(i.e. "wake up one sleeper") pick the thread with the highest
priority? I guess it should, since otherwise the classic
priority inversion can get pretty bad. But you can tell from
what I say, for how long I haven't looked at that code, so I
don't know the answer.

-SB

From owner-freebsd-arch@FreeBSD.ORG Tue Dec 14 05:33:46 2010
Message-ID: <4D0701BA.9090401@freebsd.org>
Date: Tue, 14 Dec 2010 13:33:46 +0800
From: David Xu
To: Sergey Babkin
In-Reply-To: <4D06BC5D.E573E3F1@verizon.net>
Cc: arch@freebsd.org
Subject: Re: Realtime thread priorities

Sergey Babkin wrote:
> John Baldwin wrote:
>> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote:
>>> John Baldwin wrote:
>>>> The current layout breaks up the global thread priority space (0 - 255) into a
>>>> couple of bands:
>>>>
>>>> 0 - 63 : interrupt threads
>>>> 64 - 127 : kernel sleep priorities (PSOCK, etc.)
>>>> 128 - 159 : real-time user threads (rtprio)
>>>> 160 - 223 : time-sharing user threads
>>>> 224 - 255 : idle threads (idprio and kernel idle procs)
>>>>
>>>> If we decide to change the behavior I see two possible fixes:
>>>>
>>>> 1) (easy) just move the real-time priority range above the kernel sleep
>>>> priority range
>>> Would not this cause a priority inversion when an RT process
>>> enters the kernel mode?
>> How so? Note that timesharing threads are not "bumped" to a kernel sleep
>> priority when they enter the kernel either. The kernel sleep priorities are
>> purely a way for certain sleep channels to cause a thread to be treated as
>> interactive and give it a priority boost to favor interactive threads.
>> Threads in the kernel do not automatically have higher priority than threads
>> not in the kernel. Keep in mind that all stopped threads (threads not
>> executing) are always in the kernel when they stop.
>
> I may be a bit behind the times here. But historically the "default"
> process priority means the priority when the process was pre-empted.
> If it did a system call, the priority on wake up would be as
> specified in the sleep() kernel function (or its more modern
> analog, like a sleeplock or condition variable). This would
> let the kernel code react quickly, and then on return from
> the syscall revert to the original priority, and possibly
> get pre-empted by another process at that time.

Agreed: when a thread is woken up, it means the kernel has some event
that the thread needs to process.

>
> If the user-mode priority is higher than the kernel-mode priority,
> this would mean that once a high priority process does a system
> call (say for example, poll()), it would experience a priority
> inversion and sleep with a lower priority than specified.
>
> A fix for this should be fairly straightforward. The process structure
> has the RT priority in it, so all that sleep() needs is to check
> it and use that priority if it's higher than the one given
> as an argument. Or optionally, for the RT processes bump the argument
> by how much the process's RT priority is over the "RT baseline".
> (Well, "logically over", numerically under).
>
> This would not solve the more general classic priority inversion
> issue with some low-priority process grabbing some kernel
> resource and sleeping at a lower priority while then an RT process
> waits for this resource to be freed. I think the original
> idea of in-kernel processes having the higher priorities is in
> part an attempt to answer this problem. But I agree with you that
> letting the RT processes have a higher priority than the TS processes
> in the kernel mode is better than nothing.
>

Is there a way to indicate that the current thread is in a critical
section and should not be preempted until it blocks? Once it is resumed,
it would still run at a higher priority than RT.

> Um, a stupid question: does the signal() primitive on mutexes/condvars
> (i.e. "wake up one sleeper") pick the thread with the highest
> priority?
> I guess it should, since otherwise the classic
> priority inversion can get pretty bad. But you can tell from
> what I say, for how long I haven't looked at that code, so I
> don't know the answer.
>

mutex/condvar does not have queue migration or deferred wakeup. Once a
thread is woken up and scheduled to run, it will immediately spin, but
this just wastes time because the owner may not have released the mutex
yet, and if the mutex owner is preempted, things get worse.

> -SB

From owner-freebsd-arch@FreeBSD.ORG Tue Dec 14 12:57:15 2010
From: John Baldwin
To: Sergey Babkin
Date: Tue, 14 Dec 2010 07:50:58 -0500
In-Reply-To: <4D06BC5D.E573E3F1@verizon.net>
Message-Id: <201012140750.58712.jhb@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: Realtime thread priorities

On Monday, December 13, 2010 7:37:49 pm Sergey Babkin wrote:
> John Baldwin wrote:
> >
> > On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote:
> > > John Baldwin wrote:
> > > >
> > > > The current layout breaks up the global thread priority space (0 - 255) into a
> > > > couple of bands:
> > > >
> > > > 0 - 63 : interrupt threads
> > > > 64 - 127 : kernel sleep priorities (PSOCK, etc.)
> > > > 128 - 159 : real-time user threads (rtprio)
> > > > 160 - 223 : time-sharing user threads
> > > > 224 - 255 : idle threads (idprio and kernel idle procs)
> > > >
> > > > If we decide to change the behavior I see two possible fixes:
> > > >
> > > > 1) (easy) just move the real-time priority range above the kernel sleep
> > > > priority range
> > >
> > > Would not this cause a priority inversion when an RT process
> > > enters the kernel mode?
> >
> > How so? Note that timesharing threads are not "bumped" to a kernel sleep
> > priority when they enter the kernel either. The kernel sleep priorities are
> > purely a way for certain sleep channels to cause a thread to be treated as
> > interactive and give it a priority boost to favor interactive threads.
> > Threads in the kernel do not automatically have higher priority than threads
> > not in the kernel. Keep in mind that all stopped threads (threads not
> > executing) are always in the kernel when they stop.
>
> I may be a bit behind the times here. But historically the "default"
> process priority means the priority when the process was pre-empted.
> If it did a system call, the priority on wake up would be as
> specified in the sleep() kernel function (or its more modern
> analog, like a sleeplock or condition variable). This would
> let the kernel code react quickly, and then on return from
> the syscall revert to the original priority, and possibly
> get pre-empted by another process at that time.

Except we don't do an explicit check in userret() to see if we should preempt
when we drop the priority. We effectively let the process/thread run at the
higher "sleep" priority until either 1) its quantum expires, or 2) an
interrupt causes a preemption due to some other higher priority thread being
scheduled. However, if a higher priority thread is already on the run queue
when we return to userland, we will not preempt to it. That is what the 2)
suggestion in the original e-mail was about.

> If the user-mode priority is higher than the kernel-mode priority,
> this would mean that once a high priority process does a system
> call (say for example, poll()), it would experience a priority
> inversion and sleep with a lower priority than specified.

That's what this part of the patch for 1) is about:

Index: kern/kern_synch.c
===================================================================
--- kern/kern_synch.c	(revision 215592)
+++ kern/kern_synch.c	(working copy)
@@ -214,7 +214,8 @@
 	 * Adjust this thread's priority, if necessary.
 	 */
 	pri = priority & PRIMASK;
-	if (pri != 0 && pri != td->td_priority) {
+	if (pri != 0 && pri != td->td_priority &&
+	    td->td_pri_class == PRI_TIMESHARE) {
 		thread_lock(td);
 		sched_prio(td, pri);
 		thread_unlock(td);

This avoids the priority inversion. It also avoids giving a bump to an
'idprio' thread.

Note that if any thread holds a mutex or rwlock that a higher priority thread
needs, we lend the priority to the lock holder while the mutex is held and we
will preempt to the higher priority thread when the mutex is released.
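
The effect of the extra class check in the patch above can be shown with a small userland model. This is only a sketch: the struct, the enum, and the function names are hypothetical, the PRIMASK masking and the thread locking are left out, and the priority 48 used in the usage note is a made-up value standing in for a real-time priority relocated above the kernel sleep band.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for td_pri_class and td_priority. */
enum slp_class { SLP_REALTIME, SLP_TIMESHARE, SLP_IDLE };

struct slp_thread {
	enum slp_class	pri_class;
	int		priority;
};

/* Old condition: any thread takes the sleep priority when one is given. */
void
sleep_prio_old(struct slp_thread *td, int pri)
{
	assert(td != NULL);
	if (pri != 0 && pri != td->priority)
		td->priority = pri;
}

/* Patched condition: only time-sharing threads take the sleep priority. */
void
sleep_prio_new(struct slp_thread *td, int pri)
{
	assert(td != NULL);
	if (pri != 0 && pri != td->priority &&
	    td->pri_class == SLP_TIMESHARE)
		td->priority = pri;
}
```

For a real-time thread at the assumed priority 48 sleeping with an assumed sleep priority of 88, the old condition demotes it to 88 (the inversion asked about), while the patched condition leaves it at 48. An idle-class thread at 240 likewise stays at 240 instead of being boosted to 88.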
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Dec 14 12:57:16 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D5AB1106566B; Tue, 14 Dec 2010 12:57:16 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id A7FE78FC18; Tue, 14 Dec 2010 12:57:16 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 6111B46B03; Tue, 14 Dec 2010 07:57:16 -0500 (EST) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 4AF128A01D; Tue, 14 Dec 2010 07:57:15 -0500 (EST) From: John Baldwin To: David Xu Date: Tue, 14 Dec 2010 07:56:52 -0500 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20101102; KDE/4.4.5; amd64; ; ) References: <201012101050.45214.jhb@freebsd.org> <201012130927.26815.jhb@freebsd.org> <4D06C8B0.1060409@freebsd.org> In-Reply-To: <4D06C8B0.1060409@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201012140756.52926.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 14 Dec 2010 07:57:15 -0500 (EST) X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx Cc: arch@freebsd.org, Sergey Babkin Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Dec 2010 12:57:17 -0000 
On Monday, December 13, 2010 8:30:24 pm David Xu wrote: > John Baldwin wrote: > > On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote: > >> John Baldwin wrote: > >>> The current layout breaks up the global thread priority space (0 - 255) > > into a > >>> couple of bands: > >>> > >>> 0 - 63 : interrupt threads > >>> 64 - 127 : kernel sleep priorities (PSOCK, etc.) > >>> 128 - 159 : real-time user threads (rtprio) > >>> 160 - 223 : time-sharing user threads > >>> 224 - 255 : idle threads (idprio and kernel idle procs) > >>> > >>> If we decide to change the behavior I see two possible fixes: > >>> > >>> 1) (easy) just move the real-time priority range above the kernel sleep > >>> priority range > >> Would not this cause a priority inversion when an RT process > >> enters the kernel mode? > > > > How so? Note that timesharing threads are not "bumped" to a kernel sleep > > priority when they enter the kernel either. The kernel sleep priorities are > > purely a way for certain sleep channels to cause a thread to be treated as > > interactive and give it a priority boost to favor interactive threads. > > Threads in the kernel do not automatically have higher priority than threads > > not in the kernel. Keep in mind that all stopped threads (threads not > > executing) are always in the kernel when they stop. > > I have requirement to make a thread running in kernel has more higher > priority over a thread running userland code, because our kernel > mutex is not sleepable which does not like Solaris did, I have to use > semaphore like code in kern_umtx.c to lock a chain, which allows me > to read and write user address space, this is how umtxq_busy() did, > but it does not prevent a userland thread from preempting a thread > which locked the chain, if a realtime thread preempts a thread > locked the chain, it may lock up whole processes using pthread. > I think our realtime scheduling is not very useful, it is too easy > to lock up system. 
Users are not forced to use rtprio. They choose to do so, and they have to be root to enable it (either directly or by extending root privileges via sudo or some such). Just because you don't have a use case for it doesn't mean that other people do not. Right now there is no way possible to say that a given userland process is more important than 'sshd' (or any other daemon) blocked in poll/select/kevent waiting for a packet. However, there are use cases where other long-running userland processes are in fact far more important than sshd (or similar processes such as getty, etc.). -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Dec 15 01:40:12 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC7E9106566C; Wed, 15 Dec 2010 01:40:12 +0000 (UTC) (envelope-from davidxu@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 9F0338FC0A; Wed, 15 Dec 2010 01:40:12 +0000 (UTC) Received: from xyf.my.dom (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id oBF1e9YZ002284; Wed, 15 Dec 2010 01:40:10 GMT (envelope-from davidxu@freebsd.org) Message-ID: <4D081C7C.5040407@freebsd.org> Date: Wed, 15 Dec 2010 09:40:12 +0800 From: David Xu User-Agent: Thunderbird 2.0.0.24 (X11/20100630) MIME-Version: 1.0 To: John Baldwin References: <201012101050.45214.jhb@freebsd.org> <201012130927.26815.jhb@freebsd.org> <4D06C8B0.1060409@freebsd.org> <201012140756.52926.jhb@freebsd.org> In-Reply-To: <201012140756.52926.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, Sergey Babkin Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Dec 2010 01:40:12 -0000 John Baldwin wrote: > On Monday, December 13, 2010 8:30:24 pm David Xu wrote: >> John Baldwin wrote: >>> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote: >>>> John Baldwin wrote: >>>>> The current layout breaks up the global thread priority space (0 - 255) >>> into a >>>>> couple of bands: >>>>> >>>>> 0 - 63 : interrupt threads >>>>> 64 - 127 : kernel sleep priorities (PSOCK, etc.) >>>>> 128 - 159 : real-time user threads (rtprio) >>>>> 160 - 223 : time-sharing user threads >>>>> 224 - 255 : idle threads (idprio and kernel idle procs) >>>>> >>>>> If we decide to change the behavior I see two possible fixes: >>>>> >>>>> 1) (easy) just move the real-time priority range above the kernel sleep >>>>> priority range >>>> Would not this cause a priority inversion when an RT process >>>> enters the kernel mode? >>> How so? Note that timesharing threads are not "bumped" to a kernel sleep >>> priority when they enter the kernel either. The kernel sleep priorities are >>> purely a way for certain sleep channels to cause a thread to be treated as >>> interactive and give it a priority boost to favor interactive threads. >>> Threads in the kernel do not automatically have higher priority than threads >>> not in the kernel. Keep in mind that all stopped threads (threads not >>> executing) are always in the kernel when they stop. 
>> I have requirement to make a thread running in kernel has more higher >> priority over a thread running userland code, because our kernel >> mutex is not sleepable which does not like Solaris did, I have to use >> semaphore like code in kern_umtx.c to lock a chain, which allows me >> to read and write user address space, this is how umtxq_busy() did, >> but it does not prevent a userland thread from preempting a thread >> which locked the chain, if a realtime thread preempts a thread >> locked the chain, it may lock up whole processes using pthread. >> I think our realtime scheduling is not very useful, it is too easy >> to lock up system. > > Users are not forced to use rtprio. They choose to do so, and they have to > be root to enable it (either directly or by extending root privileges via > sudo or some such). Just because you don't have a use case for it doesn't > mean that other people do not. Right now there is no way possible to say > that a given userland process is more important than 'sshd' (or any other > daemon) blocked in poll/select/kevent waiting for a packet. However, there > are use cases where other long-running userland processes are in fact far > more important than sshd (or similar processes such as getty, etc.). > You still don't answer me about how to avoid a time-sharing thread holding a critical kernel resource which preempted by a user RT thread, and later the RT thread requires the resource, but the time-sharing thread has no chance to run because another RT thread is dominating the CPU because it is doing CPU bound work, result is deadlock, even if you know you trust your RT process, there are many code which were written by you, i.e the libc and any other libraries using threading are completely not ready for RT use. How ever let a thread in kernel have higher priority over a thread running userland code will fix such a deadlock in kernel. 
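
The scenario described here can be modeled in a few lines. What follows is a toy single-CPU sketch under stated assumptions, not FreeBSD scheduler code: every name in it (`struct model_thread`, `effective_pri()`, `pick_next()`, `block_on()`) is hypothetical. It models mutex-style priority lending as described earlier in the thread; it does not model the umtx chain "busy" state, which is not a mutex.

```c
#include <assert.h>
#include <stddef.h>

/* Lower value = higher priority, as in the bands quoted in the thread. */
struct model_thread {
	const char *name;
	int base_pri;	/* priority from the thread's scheduling class */
	int lent_pri;	/* best priority lent by a blocked waiter, -1 if none */
	int runnable;	/* 0 while the thread is blocked on a lock */
};

/* A thread runs at the better of its own and any lent priority. */
int
effective_pri(const struct model_thread *td)
{
	if (td->lent_pri >= 0 && td->lent_pri < td->base_pri)
		return (td->lent_pri);
	return (td->base_pri);
}

/* Strict priority scheduling: pick the best runnable thread, or NULL. */
const struct model_thread *
pick_next(const struct model_thread *threads, size_t n)
{
	const struct model_thread *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!threads[i].runnable)
			continue;
		if (best == NULL ||
		    effective_pri(&threads[i]) < effective_pri(best))
			best = &threads[i];
	}
	return (best);
}

/* The waiter blocks on a lock held by owner and lends its priority. */
void
block_on(struct model_thread *waiter, struct model_thread *owner)
{
	int pri = effective_pri(waiter);

	waiter->runnable = 0;
	if (owner->lent_pri < 0 || pri < owner->lent_pri)
		owner->lent_pri = pri;
}
```

Take a time-sharing lock holder at 180, a real-time waiter at 130, and a CPU-bound real-time thread at 140. If the waiter simply blocks without lending, `pick_next()` keeps choosing the CPU-bound thread and the holder never runs. After `block_on(waiter, holder)` the holder runs at 130 and is chosen ahead of the CPU-bound thread. Lending does not help when the CPU-bound thread outranks the waiter itself.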
From owner-freebsd-arch@FreeBSD.ORG Wed Dec 15 01:41:07 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 448111065674; Wed, 15 Dec 2010 01:41:07 +0000 (UTC) (envelope-from davidxu@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 171DD8FC16; Wed, 15 Dec 2010 01:41:07 +0000 (UTC) Received: from xyf.my.dom (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id oBF1f36Q009454; Wed, 15 Dec 2010 01:41:04 GMT (envelope-from davidxu@freebsd.org) Message-ID: <4D081CB2.80200@freebsd.org> Date: Wed, 15 Dec 2010 09:41:06 +0800 From: David Xu User-Agent: Thunderbird 2.0.0.24 (X11/20100630) MIME-Version: 1.0 To: John Baldwin References: <201012101050.45214.jhb@freebsd.org> <201012130927.26815.jhb@freebsd.org> <4D06C8B0.1060409@freebsd.org> <201012140756.52926.jhb@freebsd.org> <4D081C7C.5040407@freebsd.org> In-Reply-To: <4D081C7C.5040407@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, Sergey Babkin Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Dec 2010 01:41:07 -0000 > You still don't answer me about how to avoid a time-sharing thread > holding a critical kernel resource which preempted by a user RT thread, > and later the RT thread requires the resource, but the time-sharing > thread has no chance to run because another RT thread is dominating > the CPU because it is doing CPU bound work, result is deadlock, even if > you know you trust your RT process, there are many code which were not > written by you, > > From owner-freebsd-arch@FreeBSD.ORG Wed 
Dec 15 14:38:47 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8F776106566B; Wed, 15 Dec 2010 14:38:47 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 51CF58FC12; Wed, 15 Dec 2010 14:38:47 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 0886346B09; Wed, 15 Dec 2010 09:38:47 -0500 (EST) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id ED79E8A01D; Wed, 15 Dec 2010 09:38:45 -0500 (EST) From: John Baldwin To: David Xu Date: Wed, 15 Dec 2010 09:38:43 -0500 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20101102; KDE/4.4.5; amd64; ; ) References: <201012101050.45214.jhb@freebsd.org> <201012140756.52926.jhb@freebsd.org> <4D081C7C.5040407@freebsd.org> In-Reply-To: <4D081C7C.5040407@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201012150938.44217.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Wed, 15 Dec 2010 09:38:46 -0500 (EST) X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx Cc: arch@freebsd.org, Sergey Babkin Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Dec 2010 14:38:47 -0000 On Tuesday, December 14, 2010 8:40:12 pm David Xu wrote: 
> John Baldwin wrote: > > On Monday, December 13, 2010 8:30:24 pm David Xu wrote: > >> John Baldwin wrote: > >>> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote: > >>>> John Baldwin wrote: > >>>>> The current layout breaks up the global thread priority space (0 - 255) > >>> into a > >>>>> couple of bands: > >>>>> > >>>>> 0 - 63 : interrupt threads > >>>>> 64 - 127 : kernel sleep priorities (PSOCK, etc.) > >>>>> 128 - 159 : real-time user threads (rtprio) > >>>>> 160 - 223 : time-sharing user threads > >>>>> 224 - 255 : idle threads (idprio and kernel idle procs) > >>>>> > >>>>> If we decide to change the behavior I see two possible fixes: > >>>>> > >>>>> 1) (easy) just move the real-time priority range above the kernel sleep > >>>>> priority range > >>>> Would not this cause a priority inversion when an RT process > >>>> enters the kernel mode? > >>> How so? Note that timesharing threads are not "bumped" to a kernel sleep > >>> priority when they enter the kernel either. The kernel sleep priorities are > >>> purely a way for certain sleep channels to cause a thread to be treated as > >>> interactive and give it a priority boost to favor interactive threads. > >>> Threads in the kernel do not automatically have higher priority than threads > >>> not in the kernel. Keep in mind that all stopped threads (threads not > >>> executing) are always in the kernel when they stop. > >> I have requirement to make a thread running in kernel has more higher > >> priority over a thread running userland code, because our kernel > >> mutex is not sleepable which does not like Solaris did, I have to use > >> semaphore like code in kern_umtx.c to lock a chain, which allows me > >> to read and write user address space, this is how umtxq_busy() did, > >> but it does not prevent a userland thread from preempting a thread > >> which locked the chain, if a realtime thread preempts a thread > >> locked the chain, it may lock up whole processes using pthread. 
> >> I think our realtime scheduling is not very useful, it is too easy > >> to lock up system. > > > > Users are not forced to use rtprio. They choose to do so, and they have to > > be root to enable it (either directly or by extending root privileges via > > sudo or some such). Just because you don't have a use case for it doesn't > > mean that other people do not. Right now there is no way possible to say > > that a given userland process is more important than 'sshd' (or any other > > daemon) blocked in poll/select/kevent waiting for a packet. However, there > > are use cases where other long-running userland processes are in fact far > > more important than sshd (or similar processes such as getty, etc.). > > > You still don't answer me about how to avoid a time-sharing thread > holding a critical kernel resource which preempted by a user RT thread, > and later the RT thread requires the resource, but the time-sharing > thread has no chance to run because another RT thread is dominating > the CPU because it is doing CPU bound work, result is deadlock, even if > you know you trust your RT process, there are many code which were > written by you, i.e the libc and any other libraries using threading > are completely not ready for RT use. > How ever let a thread in kernel have higher priority over a thread > running userland code will fix such a deadlock in kernel. Put another way, the time-sharing thread that I don't care about (sshd, or some other monitoring daemon, etc.) is stealing a resource I care about (time, in the form of CPU cycles) from my RT process that is critical to getting my work done. Beyond that a few more points: - You are ignoring "tools, not policy". You don't know what is in my binary (and I can't really tell you). Assume for a minute that I'm not completely dumb and can write userland code that is safe to run at this high of a priority level. You already trust me to write code in the kernel that runs at even higher priority now. 
:)
- You repeatedly keep missing (ignoring?) the fact that this is _optional_.
  Users have to intentionally decide to enable this, and there are users who
  do _need_ this functionality.
- You have also missed that this has always been true for idprio processes
  (and is in fact why we restrict idprio to root), so this is not "new".
- Finally, you also are missing that this can already happen _now_ for plain
  old time sharing processes if the thread holding the resource doesn't ever
  do a sleep that raises the priority.

For example, if a time-sharing thread with some typical priority >=
PRI_MIN_TIMESHARE calls write(2) on a file, it can lock the vnode lock for
that file (if it is unlocked) and hold that lock while its priority is >=
PRI_MIN_TIMESHARE. If an interrupt arrives for a network packet that wakes
up sshd for a new SSH connection, the interrupt thread will preempt the
thread holding the vnode lock, and sshd will be executed instead of the
thread holding the vnode lock when the ithread finishes. If sshd needs the
vnode lock that the original thread holds, then sshd will block until the
original thread is rescheduled due to the random fates of time and releases
the vnode lock.

In summary, the kernel sleep priorities do _not_ serve to prevent all
priority inversions; what they do accomplish is giving preferential treatment
to idle, "interactive" threads.

A bit more information on my use case btw:

My RT processes are each assigned a _dedicated_ CPU via cpuset (we remove the
CPU from the global cpuset and ensure no interrupts are routed to that CPU).
The problem I have is that if my RT process blocks on a lock (e.g. a lock on a
VM object during a page fault), then I want the RT thread to lend its RT
priority to the thread that holds the lock over on another CPU so that the lock
can be released as quickly as possible.
This use case is perfectly safe (the RT thread is not preempting other threads, instead other threads are partitioned off into a separate set of available CPUs). What I need is to ensure that the syncer or pagedaemon or whoever holds the lock I need gets a chance to run right away when it holds a lock that I need. -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Dec 15 16:56:55 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E5E62106564A for ; Wed, 15 Dec 2010 16:56:55 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.netplex.net (mail.netplex.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id 9C1E88FC0C for ; Wed, 15 Dec 2010 16:56:55 +0000 (UTC) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.netplex.net (8.14.4/8.14.4/NETPLEX) with ESMTP id oBFGk2Af022848; Wed, 15 Dec 2010 11:46:02 -0500 X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.netplex.net) X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-4.2.6 (mail.netplex.net [204.213.176.10]); Wed, 15 Dec 2010 11:46:02 -0500 (EST) Date: Wed, 15 Dec 2010 11:46:02 -0500 (EST) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: John Baldwin In-Reply-To: <201012150938.44217.jhb@freebsd.org> Message-ID: References: <201012101050.45214.jhb@freebsd.org> <201012140756.52926.jhb@freebsd.org> <4D081C7C.5040407@freebsd.org> <201012150938.44217.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Dec 2010 16:56:56 -0000 On Wed, 15 Dec 2010, John 
Baldwin wrote: > > Put another way, the time-sharing thread that I don't care about (sshd, or > some other monitoring daemon, etc.) is stealing a resource I care about > (time, in the form of CPU cycles) from my RT process that is critical to > getting my work done. > > Beyond that a few more points: > > - You are ignoring "tools, not policy". You don't know what is in my binary > (and I can't really tell you). Assume for a minute that I'm not completely > dumb and can write userland code that is safe to run at this high of a > priority level. You already trust me to write code in the kernel that runs > at even higher priority now. :) > - You repeatedly keep missing (ignoring?) the fact that this is _optional_. > Users have to intentionally decide to enable this, and there are users who > do _need_ this functionality. > - You have also missed that this has always been true for idprio processes > (and is in fact why we restrict idprio to root), so this is not "new". > - Finally, you also are missing that this can already happen _now_ for plain > old time sharing processes if the thread holding the resource doesn't ever > do a sleep that raises the priority. > > For example, if a time-sharing thread with some typical priority >= > PRI_MIN_TIMESHARE calls write(2) on a file, it can lock the vnode lock for > that file (if it is unlocked) and hold that lock while it's priority is >= > PRI_MIN_TIMESHARE. If an interrupt arrives for a network packet that wakes > up sshd for a new SSH connection, the interrupt thread will preempt the > thread holding the vnode lock, and sshd will be executed instead of the > thread holding the vnode lock when the ithread finishes. If sshd needs the > vnode lock that the original thread holds, then sshd will block until the > original thread is rescheduled due to the random fates of time and releases > the vnode lock. 
>
> In summary, the kernel sleep priorities do _not_ serve to prevent all
> priority inversions, what they do accomplish is giving preferential treatment
> to idle, "interactive" threads.
>
> A bit more information on my use case btw:
>
> My RT processes are each assigned a _dedicated_ CPU via cpuset (we remove the
> CPU from the global cpuset and ensure no interrupts are routed to that CPU).
> The problem I have is that if my RT process blocks on a lock (e.g. a lock on a
> VM object during a page fault), then I want the RT thread to lend its RT
> priority to the thread that holds the lock over on another CPU so that the lock
> can be released as quickly as possible. This use case is perfectly safe (the
> RT thread is not preempting other threads, instead other threads are partitioned
> off into a separate set of available CPUs). What I need is to ensure that the
> syncer or pagedaemon or whoever holds the lock I need gets a chance to run right
> away when it holds a lock that I need.

And speaking as a developer that writes applications that require
real-time priorities, all of the above is a good summary.

As such a developer, I don't use real-time priorities to make
applications run faster, have more throughput, get more work done,
or anything like that. It is to attempt to meet real world
deadlines. Our applications do not busy the CPU, they block
mostly, waking up for and handling events - both periodic and
aperiodic. We know our applications run real-time, so we try
to be as efficient as possible. If there is something more
CPU intensive, then perhaps we'll have another lower priority
thread/process to handle that task.

The important thing is that we need to meet or respond to a
time-critical event. We do expect that our real-time threads
will run over time sharing or other lower priority threads, and
that priority will be propagated for any contested OS locks.
In our situation, it is acceptable to starve low priority tasks, though we do design the applications to avoid that. -- DE From owner-freebsd-arch@FreeBSD.ORG Wed Dec 15 23:15:08 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D6DBF106566B; Wed, 15 Dec 2010 23:15:07 +0000 (UTC) (envelope-from bz@FreeBSD.org) Received: from mail.cksoft.de (mail.cksoft.de [IPv6:2001:4068:10::3]) by mx1.freebsd.org (Postfix) with ESMTP id 767EA8FC13; Wed, 15 Dec 2010 23:15:07 +0000 (UTC) Received: from localhost (amavis.fra.cksoft.de [192.168.74.71]) by mail.cksoft.de (Postfix) with ESMTP id 785B941C756; Thu, 16 Dec 2010 00:15:06 +0100 (CET) X-Virus-Scanned: amavisd-new at cksoft.de Received: from mail.cksoft.de ([192.168.74.103]) by localhost (amavis.fra.cksoft.de [192.168.74.71]) (amavisd-new, port 10024) with ESMTP id va1IlFPWrzne; Thu, 16 Dec 2010 00:15:05 +0100 (CET) Received: by mail.cksoft.de (Postfix, from userid 66) id B6B2041C752; Thu, 16 Dec 2010 00:15:05 +0100 (CET) Received: from maildrop.int.zabbadoz.net (maildrop.int.zabbadoz.net [10.111.66.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.int.zabbadoz.net (Postfix) with ESMTP id 41DDB4448F3; Wed, 15 Dec 2010 23:14:43 +0000 (UTC) Date: Wed, 15 Dec 2010 23:14:42 +0000 (UTC) From: "Bjoern A. 
Zeeb" X-X-Sender: bz@maildrop.int.zabbadoz.net To: Robert Watson In-Reply-To: Message-ID: <20101215230640.K6126@maildrop.int.zabbadoz.net> References: X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: Re: Future of netnatm: looking for testers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Dec 2010 23:15:08 -0000 On Wed, 28 Jul 2010, Robert Watson wrote: In reply to the original post: > Dear all: > > When the new link layer framework was introduced in 8.0, one of our ATM > stacks, netnatm, was left behind. As a result, it neither compiles nor runs > in 8.x and 9.x. This e-mail serves two purposes: > > (1) To solicit a volunteer who can work on the netnatm stack in 9.x, with > potential merge to 8.x, to get it back to functionality before 9.0 ships. > This is the preferred course of action. > > (2) To serve as notice that if we can't find a volunteer to do this, we will > remove netnatm and associated parts from the tree in 9.0 since they'll have > gone one major version neither compiling nor running. This is the fallback > plan. > > I'm in no great rush to remove netnatm, having spent quite a bit of time > making it work in our MPSAFE world order a couple of years ago. However, the > code is bitrotting and requires urgent attention if it's going to work again > easily (the stack is changing around it, and because netnatm doesn't build, > it will get only cursory and likely incorrect updates). 
I'm happy to help
> funnel changes into the tree from non-committers, as well as answer questions
> about the network stack, but I have no hardware facilities for debugging or
> testing netnatm changes myself, nor, unfortunately, the time to work on the
> code.
>
> In order to provide further motivation for potentially interested parties,
> here's the proposed six-month removal schedule:
>
> 28 July 2010 - Notice of proposed removal
> 28 October 2010 - Transmit of notice of proposed removal
> 28 January 2011 - Proposed removal date
>
> This schedule may be updated as the 9.0 release schedule becomes more clear,
> or if there are obvious signs of improvement and just a couple more months
> would get it fixed :-). And, if worst comes to worst and we can't find a
> volunteer, the code will live on in the source repository history if there's
> a desire to rejuvenate it in the future.

I would request two things:

1) the extra couple of months; this will not prevent the evitable
   removal yet only defer it.

2) If any of you is using (or wants to be able to continue to use)
   NATM or can test things: I re-enabled it with most of the code in
   HEAD, and the patch is available for 8.x as well, but I need to
   work with someone to make sure it will really work. I am willing
   to spend more time on it if you send me an email.

Best Regards,
Bjoern

------------------------------------------------------------------------
> Author: bz
> Date: Wed Dec 15 22:58:45 2010
> New Revision: 216466
> URL: http://svn.freebsd.org/changeset/base/216466
>
> Log:
> Bring back (most of) NATM to avoid further bitrot after r186119.
> Keep three lines disabled which I am unsure if they had been used at all.
> This will allow us to seek testers and possibly bring it all back.

If you have the ability to test (on 8.x or HEAD) or are using NATM,
please get in contact with me.
> Discussed with: rwatson > MFC after: 7 weeks > > Modified: > head/sys/conf/NOTES > head/sys/netinet/if_atm.c ------------------------------------------------------------------------ -- Bjoern A. Zeeb Welcome a new stage of life. Going to jail sucks -- All my daemons like it! http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/jails.html From owner-freebsd-arch@FreeBSD.ORG Thu Dec 16 04:16:53 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B26A41065693; Thu, 16 Dec 2010 04:16:53 +0000 (UTC) (envelope-from davidxu@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 9528A8FC17; Thu, 16 Dec 2010 04:16:53 +0000 (UTC) Received: from xyf.my.dom (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id oBG4GoBh046075; Thu, 16 Dec 2010 04:16:51 GMT (envelope-from davidxu@freebsd.org) Message-ID: <4D0992B5.7060005@freebsd.org> Date: Thu, 16 Dec 2010 12:16:53 +0800 From: David Xu User-Agent: Thunderbird 2.0.0.24 (X11/20100630) MIME-Version: 1.0 To: John Baldwin References: <201012101050.45214.jhb@freebsd.org> <201012140756.52926.jhb@freebsd.org> <4D081C7C.5040407@freebsd.org> <201012150938.44217.jhb@freebsd.org> In-Reply-To: <201012150938.44217.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, Sergey Babkin Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Dec 2010 04:16:53 -0000 John Baldwin wrote: > On Tuesday, December 14, 2010 8:40:12 pm David Xu wrote: >> John Baldwin wrote: >>> On Monday, December 13, 2010 8:30:24 pm David Xu wrote: >>>> 
John Baldwin wrote: >>>>> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote: >>>>>> John Baldwin wrote: >>>>>>> The current layout breaks up the global thread priority space (0 - 255) >>>>> into a >>>>>>> couple of bands: >>>>>>> >>>>>>> 0 - 63 : interrupt threads >>>>>>> 64 - 127 : kernel sleep priorities (PSOCK, etc.) >>>>>>> 128 - 159 : real-time user threads (rtprio) >>>>>>> 160 - 223 : time-sharing user threads >>>>>>> 224 - 255 : idle threads (idprio and kernel idle procs) >>>>>>> >>>>>>> If we decide to change the behavior I see two possible fixes: >>>>>>> >>>>>>> 1) (easy) just move the real-time priority range above the kernel sleep >>>>>>> priority range >>>>>> Would not this cause a priority inversion when an RT process >>>>>> enters the kernel mode? >>>>> How so? Note that timesharing threads are not "bumped" to a kernel sleep >>>>> priority when they enter the kernel either. The kernel sleep priorities are >>>>> purely a way for certain sleep channels to cause a thread to be treated as >>>>> interactive and give it a priority boost to favor interactive threads. >>>>> Threads in the kernel do not automatically have higher priority than threads >>>>> not in the kernel. Keep in mind that all stopped threads (threads not >>>>> executing) are always in the kernel when they stop. >>>> I have requirement to make a thread running in kernel has more higher >>>> priority over a thread running userland code, because our kernel >>>> mutex is not sleepable which does not like Solaris did, I have to use >>>> semaphore like code in kern_umtx.c to lock a chain, which allows me >>>> to read and write user address space, this is how umtxq_busy() did, >>>> but it does not prevent a userland thread from preempting a thread >>>> which locked the chain, if a realtime thread preempts a thread >>>> locked the chain, it may lock up whole processes using pthread. >>>> I think our realtime scheduling is not very useful, it is too easy >>>> to lock up system. 
>>> Users are not forced to use rtprio. They choose to do so, and they have to >>> be root to enable it (either directly or by extending root privileges via >>> sudo or some such). Just because you don't have a use case for it doesn't >>> mean that other people do not. Right now there is no way possible to say >>> that a given userland process is more important than 'sshd' (or any other >>> daemon) blocked in poll/select/kevent waiting for a packet. However, there >>> are use cases where other long-running userland processes are in fact far >>> more important than sshd (or similar processes such as getty, etc.). >>> >> You still don't answer me about how to avoid a time-sharing thread >> holding a critical kernel resource which preempted by a user RT thread, >> and later the RT thread requires the resource, but the time-sharing >> thread has no chance to run because another RT thread is dominating >> the CPU because it is doing CPU bound work, result is deadlock, even if >> you know you trust your RT process, there are many code which were >> written by you, i.e the libc and any other libraries using threading >> are completely not ready for RT use. >> How ever let a thread in kernel have higher priority over a thread >> running userland code will fix such a deadlock in kernel. > > Put another way, the time-sharing thread that I don't care about (sshd, or > some other monitoring daemon, etc.) is stealing a resource I care about > (time, in the form of CPU cycles) from my RT process that is critical to > getting my work done. > > Beyond that a few more points: > > - You are ignoring "tools, not policy". You don't know what is in my binary > (and I can't really tell you). Assume for a minute that I'm not completely > dumb and can write userland code that is safe to run at this high of a > priority level. You already trust me to write code in the kernel that runs > at even higher priority now. :) > - You repeatedly keep missing (ignoring?) 
the fact that this is _optional_. > Users have to intentionally decide to enable this, and there are users who > do _need_ this functionality. > - You have also missed that this has always been true for idprio processes > (and is in fact why we restrict idprio to root), so this is not "new". > - Finally, you also are missing that this can already happen _now_ for plain > old time sharing processes if the thread holding the resource doesn't ever > do a sleep that raises the priority. > > For example, if a time-sharing thread with some typical priority >= > PRI_MIN_TIMESHARE calls write(2) on a file, it can lock the vnode lock for > that file (if it is unlocked) and hold that lock while it's priority is >= > PRI_MIN_TIMESHARE. If an interrupt arrives for a network packet that wakes > up sshd for a new SSH connection, the interrupt thread will preempt the > thread holding the vnode lock, and sshd will be executed instead of the > thread holding the vnode lock when the ithread finishes. If sshd needs the > vnode lock that the original thread holds, then sshd will block until the > original thread is rescheduled due to the random fates of time and releases > the vnode lock. > > In summary, the kernel sleep priorities do _not_ serve to prevent all > priority inversions, what they do accomplish is giving preferential treatment > to idle, "interactive" threads. > > A bit more information on my use case btw: > > My RT processes are each assigned a _dedicated_ CPU via cpuset (we remove the > CPU from the global cpuset and ensure no interrupts are routed to that CPU). > The problem I have is that if my RT process blocks on a lock (e.g. a lock on a > VM object during a page fault), then I want the RT thread to lend its RT > priority to the thread that holds the lock over on another CPU so that the lock > can be released as quickly as possible. 
This use case is perfectly safe (the
> RT thread is not preempting other threads, instead other threads are partitioned
> off into a separate set of available CPUs). What I need is to ensure that the
> syncer or pagedaemon or whoever holds the lock I need gets a chance to run right
> away when it holds a lock that I need.
>

What I meant is that whenever a thread is in kernel mode, it always has
higher priority than any thread running user code. All threads in kernel
mode may share the same priority, except interrupt threads, which have a
higher one. This has to be designed carefully so that only mutexes and
spin locks are used between interrupt threads and other threads: a mutex
uses a turnstile to propagate priority, and a spin lock disables
interrupts. Otherwise there is still priority inversion in the kernel,
e.g. with rwlock and sx lock.

I really don't care whether an idprio thread is preempted at the user
boundary or not, though I think it should be. Any thread returning to
userland should check whether a higher-priority RT thread is in the run
queue and, if so, always switch context. For other cases, e.g. TS
(time-sharing) vs. TS in the run queue, keeping the current behavior may
still be a good idea for better performance.

To clarify my idea, here is sample code:

in trap.c:
    td_pflags |= TDP_KERNELMODE;

in sched_switch():
    if (td_pflags & TDP_KERNELMODE)
        sched_prio(td, PRI_KERNEL);

PRI_KERNEL will always be higher than any RT, TS, or IDLE priority,
but lower than the interrupt thread priorities.

in userret:
    td_pflags &= ~TDP_KERNELMODE;
    restore priority to its current user priority
    check rescheduling:
        TS vs TS: may ignore some flags;
        TS vs RT in run queue: switch context
        RT vs RT in run queue: compare priority and switch context
    ...

Now the kernel itself is safe for running RT-priority threads, unlike the
current code, which will deadlock.
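
The steps in the sample code above can be modelled in user space to check the
intended ordering. This is only a sketch under stated assumptions:
TDP_KERNELMODE and PRI_KERNEL are the hypothetical names from the proposal
(neither is an existing FreeBSD symbol), the value 64 for PRI_KERNEL is an
assumed placement just below the interrupt band, and struct model_thread and
the model_* functions are invented stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Band boundaries as listed in the thread; a lower value means higher priority. */
#define BAND_REALTIME_MIN  128
#define BAND_REALTIME_MAX  159
#define BAND_MAX           255

/*
 * Hypothetical names taken from the proposal; neither exists in FreeBSD.
 * The value 64 is an assumption: just below the interrupt band (0 - 63).
 */
#define TDP_KERNELMODE     0x1
#define PRI_KERNEL         64

/* Minimal stand-in for the fields of struct thread that the proposal touches. */
struct model_thread {
    int user_pri;   /* priority the thread has while running user code */
    int pri;        /* priority the scheduler currently sees */
    int pflags;     /* private flags, models td_pflags */
};

static bool
is_realtime(int pri)
{
    return (pri >= BAND_REALTIME_MIN && pri <= BAND_REALTIME_MAX);
}

/* trap.c step: note that the thread is now executing kernel code. */
void
model_trap_enter(struct model_thread *td)
{
    td->pflags |= TDP_KERNELMODE;
}

/* sched_switch() step: a thread in kernel mode is scheduled at PRI_KERNEL. */
void
model_sched_switch(struct model_thread *td)
{
    if (td->pflags & TDP_KERNELMODE)
        td->pri = PRI_KERNEL;
}

/*
 * userret step: drop the flag, restore the user priority, and report whether
 * the thread should yield to the best runnable thread (best_runq_pri).
 * Only a more urgent RT thread forces a switch here; TS vs TS is left to the
 * existing rescheduling logic, which this model does not cover.
 */
bool
model_userret(struct model_thread *td, int best_runq_pri)
{
    assert(best_runq_pri >= 0 && best_runq_pri <= BAND_MAX);
    td->pflags &= ~TDP_KERNELMODE;
    td->pri = td->user_pri;
    return (is_realtime(best_runq_pri) && best_runq_pri < td->pri);
}
```

In this model a time-sharing thread with user priority 180 runs at PRI_KERNEL
between model_trap_enter() and model_userret(), and model_userret() asks for a
context switch only when the best runnable thread is a more urgent RT thread.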
From owner-freebsd-arch@FreeBSD.ORG Thu Dec 16 04:55:23 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 04A60106566C; Thu, 16 Dec 2010 04:55:23 +0000 (UTC) (envelope-from davidxu@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id DCE588FC16; Thu, 16 Dec 2010 04:55:22 +0000 (UTC) Received: from xyf.my.dom (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id oBG4tLrx086579; Thu, 16 Dec 2010 04:55:21 GMT (envelope-from davidxu@freebsd.org) Message-ID: <4D099BBC.7050200@freebsd.org> Date: Thu, 16 Dec 2010 12:55:24 +0800 From: David Xu User-Agent: Thunderbird 2.0.0.24 (X11/20100630) MIME-Version: 1.0 To: Daniel Eischen References: <201012101050.45214.jhb@freebsd.org> <201012140756.52926.jhb@freebsd.org> <4D081C7C.5040407@freebsd.org> <201012150938.44217.jhb@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Dec 2010 04:55:23 -0000 Daniel Eischen wrote: > On Wed, 15 Dec 2010, John Baldwin wrote: >> >> Put another way, the time-sharing thread that I don't care about >> (sshd, or >> some other monitoring daemon, etc.) is stealing a resource I care about >> (time, in the form of CPU cycles) from my RT process that is critical to >> getting my work done. >> >> Beyond that a few more points: >> >> - You are ignoring "tools, not policy". You don't know what is in my >> binary >> (and I can't really tell you). 
Assume for a minute that I'm not >> completely >> dumb and can write userland code that is safe to run at this high of a >> priority level. You already trust me to write code in the kernel >> that runs >> at even higher priority now. :) >> - You repeatedly keep missing (ignoring?) the fact that this is >> _optional_. >> Users have to intentionally decide to enable this, and there are >> users who >> do _need_ this functionality. >> - You have also missed that this has always been true for idprio >> processes >> (and is in fact why we restrict idprio to root), so this is not "new". >> - Finally, you also are missing that this can already happen _now_ for >> plain >> old time sharing processes if the thread holding the resource doesn't >> ever >> do a sleep that raises the priority. >> >> For example, if a time-sharing thread with some typical priority >= >> PRI_MIN_TIMESHARE calls write(2) on a file, it can lock the vnode lock >> for >> that file (if it is unlocked) and hold that lock while it's priority >> is >= >> PRI_MIN_TIMESHARE. If an interrupt arrives for a network packet that >> wakes >> up sshd for a new SSH connection, the interrupt thread will preempt the >> thread holding the vnode lock, and sshd will be executed instead of the >> thread holding the vnode lock when the ithread finishes. If sshd >> needs the >> vnode lock that the original thread holds, then sshd will block until the >> original thread is rescheduled due to the random fates of time and >> releases >> the vnode lock. >> >> In summary, the kernel sleep priorities do _not_ serve to prevent all >> priority inversions, what they do accomplish is giving preferential >> treatment >> to idle, "interactive" threads. >> >> A bit more information on my use case btw: >> >> My RT processes are each assigned a _dedicated_ CPU via cpuset (we >> remove the >> CPU from the global cpuset and ensure no interrupts are routed to that >> CPU). 
>> The problem I have is that if my RT process blocks on a lock (e.g. a >> lock on a >> VM object during a page fault), then I want the RT thread to lend its RT >> priority to the thread that holds the lock over on another CPU so that >> the lock >> can be released as quickly as possible. This use case is perfectly >> safe (the >> RT thread is not preempting other threads, instead other threads are >> partitioned >> off into a separate set of available CPUs). What I need is to ensure >> that the >> syncer or pagedaemon or whoever holds the lock I need gets a chance to >> run right >> away when it holds a lock that I need. > > And speaking as a developer that writes applications that require > real-time priorities, all of the above is a good summary. As such > a developer, I don't use real-time priorities to make applications > run faster, have more throughput, get more work done, or anything > like that. It is to attempt to meet real world deadlines. Our > applications do not busy the CPU, they block mostly, waking up for > and handling events - both periodic and aperiodic. We know our > applications run real-time, so we try to be as efficient as possible. > If there is something more CPU intensive, then perhaps we'll have > another lower priority thread/process to handle that task. The > important thing is that we need to meet or respond to a time- > critical event. > > We do expect that our real-time threads will run over time > sharing or other lower priority threads, and that priority > will be propagated for any contested OS locks. In our situation, > it is acceptable to starve low priority tasks, though we do > design the applications to avoid that. > I am not objecting RT scheduling, I just said the kernel is not ready for RT use, it has priority inversion, as an example I even wrote code to implement priority-inherit pthread mutex for libthr, this is for RT programming. 
But the kernel has priority inversion, and because of it the system will
not meet time-critical requirements even if you configure the machine
properly. This cannot be fixed by the proposed priority range adjustment.
What I have said and done is an attempt to find a way to fix the priority
inversion problem in the kernel. I know that raising priority in msleep is
a hack. If all user threads in kernel mode had the same priority, higher
than that of threads in user mode, the priority raising by msleep could be
eliminated, and realtime scheduling for a user thread would still work
once it returned to user mode, as I said in another reply.

From owner-freebsd-arch@FreeBSD.ORG Thu Dec 16 14:41:11 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1A317106564A; Thu, 16 Dec 2010 14:41:11 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id CE0BF8FC0C; Thu, 16 Dec 2010 14:41:10 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 7F63A46B65; Thu, 16 Dec 2010 09:41:10 -0500 (EST) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 35DD58A01D; Thu, 16 Dec 2010 09:41:09 -0500 (EST) From: John Baldwin To: David Xu Date: Thu, 16 Dec 2010 09:40:57 -0500 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20101102; KDE/4.4.5; amd64; ; ) References: <201012101050.45214.jhb@freebsd.org> <201012150938.44217.jhb@freebsd.org> <4D0992B5.7060005@freebsd.org> In-Reply-To: <4D0992B5.7060005@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201012160940.58116.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Thu, 16 Dec 2010 09:41:09 -0500
(EST) X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx Cc: arch@freebsd.org, Sergey Babkin Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Dec 2010 14:41:11 -0000 On Wednesday, December 15, 2010 11:16:53 pm David Xu wrote: > John Baldwin wrote: > > On Tuesday, December 14, 2010 8:40:12 pm David Xu wrote: > >> John Baldwin wrote: > >>> On Monday, December 13, 2010 8:30:24 pm David Xu wrote: > >>>> John Baldwin wrote: > >>>>> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote: > >>>>>> John Baldwin wrote: > >>>>>>> The current layout breaks up the global thread priority space (0 - 255) > >>>>> into a > >>>>>>> couple of bands: > >>>>>>> > >>>>>>> 0 - 63 : interrupt threads > >>>>>>> 64 - 127 : kernel sleep priorities (PSOCK, etc.) > >>>>>>> 128 - 159 : real-time user threads (rtprio) > >>>>>>> 160 - 223 : time-sharing user threads > >>>>>>> 224 - 255 : idle threads (idprio and kernel idle procs) > >>>>>>> > >>>>>>> If we decide to change the behavior I see two possible fixes: > >>>>>>> > >>>>>>> 1) (easy) just move the real-time priority range above the kernel sleep > >>>>>>> priority range > >>>>>> Would not this cause a priority inversion when an RT process > >>>>>> enters the kernel mode? > >>>>> How so? Note that timesharing threads are not "bumped" to a kernel sleep > >>>>> priority when they enter the kernel either. The kernel sleep priorities are > >>>>> purely a way for certain sleep channels to cause a thread to be treated as > >>>>> interactive and give it a priority boost to favor interactive threads. 
> >>>>> Threads in the kernel do not automatically have higher priority than threads > >>>>> not in the kernel. Keep in mind that all stopped threads (threads not > >>>>> executing) are always in the kernel when they stop. > >>>> I have requirement to make a thread running in kernel has more higher > >>>> priority over a thread running userland code, because our kernel > >>>> mutex is not sleepable which does not like Solaris did, I have to use > >>>> semaphore like code in kern_umtx.c to lock a chain, which allows me > >>>> to read and write user address space, this is how umtxq_busy() did, > >>>> but it does not prevent a userland thread from preempting a thread > >>>> which locked the chain, if a realtime thread preempts a thread > >>>> locked the chain, it may lock up whole processes using pthread. > >>>> I think our realtime scheduling is not very useful, it is too easy > >>>> to lock up system. > >>> Users are not forced to use rtprio. They choose to do so, and they have to > >>> be root to enable it (either directly or by extending root privileges via > >>> sudo or some such). Just because you don't have a use case for it doesn't > >>> mean that other people do not. Right now there is no way possible to say > >>> that a given userland process is more important than 'sshd' (or any other > >>> daemon) blocked in poll/select/kevent waiting for a packet. However, there > >>> are use cases where other long-running userland processes are in fact far > >>> more important than sshd (or similar processes such as getty, etc.). 
> >>> > >> You still don't answer me about how to avoid a time-sharing thread > >> holding a critical kernel resource which preempted by a user RT thread, > >> and later the RT thread requires the resource, but the time-sharing > >> thread has no chance to run because another RT thread is dominating > >> the CPU because it is doing CPU bound work, result is deadlock, even if > >> you know you trust your RT process, there are many code which were > >> written by you, i.e the libc and any other libraries using threading > >> are completely not ready for RT use. > >> How ever let a thread in kernel have higher priority over a thread > >> running userland code will fix such a deadlock in kernel. > > > > Put another way, the time-sharing thread that I don't care about (sshd, or > > some other monitoring daemon, etc.) is stealing a resource I care about > > (time, in the form of CPU cycles) from my RT process that is critical to > > getting my work done. > > > > Beyond that a few more points: > > > > - You are ignoring "tools, not policy". You don't know what is in my binary > > (and I can't really tell you). Assume for a minute that I'm not completely > > dumb and can write userland code that is safe to run at this high of a > > priority level. You already trust me to write code in the kernel that runs > > at even higher priority now. :) > > - You repeatedly keep missing (ignoring?) the fact that this is _optional_. > > Users have to intentionally decide to enable this, and there are users who > > do _need_ this functionality. > > - You have also missed that this has always been true for idprio processes > > (and is in fact why we restrict idprio to root), so this is not "new". > > - Finally, you also are missing that this can already happen _now_ for plain > > old time sharing processes if the thread holding the resource doesn't ever > > do a sleep that raises the priority. 
> > > > For example, if a time-sharing thread with some typical priority >= > > PRI_MIN_TIMESHARE calls write(2) on a file, it can lock the vnode lock for > > that file (if it is unlocked) and hold that lock while it's priority is >= > > PRI_MIN_TIMESHARE. If an interrupt arrives for a network packet that wakes > > up sshd for a new SSH connection, the interrupt thread will preempt the > > thread holding the vnode lock, and sshd will be executed instead of the > > thread holding the vnode lock when the ithread finishes. If sshd needs the > > vnode lock that the original thread holds, then sshd will block until the > > original thread is rescheduled due to the random fates of time and releases > > the vnode lock. > > > > In summary, the kernel sleep priorities do _not_ serve to prevent all > > priority inversions, what they do accomplish is giving preferential treatment > > to idle, "interactive" threads. > > > > A bit more information on my use case btw: > > > > My RT processes are each assigned a _dedicated_ CPU via cpuset (we remove the > > CPU from the global cpuset and ensure no interrupts are routed to that CPU). > > The problem I have is that if my RT process blocks on a lock (e.g. a lock on a > > VM object during a page fault), then I want the RT thread to lend its RT > > priority to the thread that holds the lock over on another CPU so that the lock > > can be released as quickly as possible. This use case is perfectly safe (the > > RT thread is not preempting other threads, instead other threads are partitioned > > off into a separate set of available CPUs). What I need is to ensure that the > > syncer or pagedaemon or whoever holds the lock I need gets a chance to run right > > away when it holds a lock that I need. 
> >
> What I meant is that whenever thread is in kernel mode, it always has
> higher priority over thread running user code, and all threads in kernel
> mode may have same priority except those interrupt threads which
> has higher priority, but this should be carefully designed to use
> mutex and spinlock between interrupt threads and other threads,
> mutex uses turnstile to propagate priority, spin lock disables
> interrupt, otherwise there still is priority inversion in kernel, i.e
> rwlock, sx lock.

Except that this isn't really true. Really, if a thread is asleep in
select() or poll() or kevent(), what critical resource is it holding? I had
the same view originally when the current set of priorities was set up.
However, I've had to change it since I now have a real-world use case for
rtprio.

First, I think this is the easy part of the argument: Can you agree that if
a RT process is in the kernel, it should have priority over a TS process in
the kernel? Thus, if a RT process blocks in the kernel, it would need to
lend enough of a priority to the lock holder to preempt any TS process in the
kernel, yes? If so, that argues for RT processes in the kernel having a
higher priority than all the other kernel sleep priorities.

The second part is harder, and that is what happens when a RT process is in
userland. First, some food for thought. Do you realize that currently, the
syncer and pagedaemon threads run at PVM? This is intentional so that these
processes run in the "background" even though they are in the kernel.
Specifically, when sshd does wake up from a sleep at PSOCK or the like, the
kernel doesn't just let it run in the kernel, it effectively lets it keep
that PSOCK priority in userland until the next context switch due to an
interrupt or the quantum expiring. This means that when you ssh into a box,
your interactive typing ends up preempting syncer and pagedaemon. And
this is a good thing, because syncer and pagedaemon are _background_
processes.
Preempting them only for the kernel portion of sshd (as the
change to userret in both your proposal and my original #2 would do) would
not really favor interactive processes because the user relies on the
userland portion of an interactive process to run, too (userland is the part
that echoes back the characters as they are typed). So even now, with TS
threads, we have TS userland code that is _more important_ than code in the
kernel. Another example is the idlezero kernel process. This is kernel
code, but is easily far less important than pretty much all userland code.
Kernel code is _not_ always more important than userland code. It often is,
but it sometimes isn't. If you can accept that, then it is no longer strange
to consider that even the userland code in a RT process is more important
than kernel code in a TS process.

In our case we do chew up a lot of CPU in userland for our RT processes, but
we handle this case by using dedicated CPUs. Our RT processes really are the
most important processes on the box.
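
The "lend enough of a priority" argument above can be made concrete with a
small user-space model. This is a sketch, not kernel code: struct lend_thread
and the three functions are invented for illustration, 88 is only an example
of a value inside the 64 - 127 kernel sleep band, and 60 is a placeholder for
"an RT priority numerically below the sleep band" in some reshuffled layout,
not a proposed constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative thread model; a lower value means higher priority. */
struct lend_thread {
    int base_pri;   /* priority the thread was assigned */
    int pri;        /* effective priority, possibly borrowed */
};

/*
 * A waiter blocks on a lock: the owner borrows the waiter's priority
 * if the waiter is more urgent than the owner currently is.
 */
void
lend_priority(struct lend_thread *owner, const struct lend_thread *waiter)
{
    assert(owner != waiter);
    if (waiter->pri < owner->pri)
        owner->pri = waiter->pri;
}

/* The owner releases the lock and falls back to its own priority. */
void
return_priority(struct lend_thread *owner)
{
    owner->pri = owner->base_pri;
}

/* True if thread a would be chosen to run ahead of thread b. */
bool
runs_before(const struct lend_thread *a, const struct lend_thread *b)
{
    return (a->pri < b->pri);
}
```

With the current bands, a lock holder at 200 that borrows priority 128 from a
blocked RT thread still does not run before a TS thread that woke up at a
sleep priority of 88. If the RT waiter sits numerically below the sleep band
(60 in this model), the borrowed priority is enough for the holder to run
first.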
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Thu Dec 16 18:04:30 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1ACCB1065672; Thu, 16 Dec 2010 18:04:30 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from out-0.mx.aerioconnect.net (out-0-31.mx.aerioconnect.net [216.240.47.91]) by mx1.freebsd.org (Postfix) with ESMTP id EB8868FC15; Thu, 16 Dec 2010 18:04:29 +0000 (UTC) Received: from idiom.com (postfix@mx0.idiom.com [216.240.32.160]) by out-0.mx.aerioconnect.net (8.13.8/8.13.8) with ESMTP id oBGI4S8x005441; Thu, 16 Dec 2010 10:04:28 -0800 X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (h-67-100-89-137.snfccasy.static.covad.net [67.100.89.137]) by idiom.com (Postfix) with ESMTP id F40E52D6015; Thu, 16 Dec 2010 10:04:26 -0800 (PST) Message-ID: <4D0A54A8.90901@freebsd.org> Date: Thu, 16 Dec 2010 10:04:24 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6 MIME-Version: 1.0 To: John Baldwin References: <201012101050.45214.jhb@freebsd.org> <201012150938.44217.jhb@freebsd.org> <4D0992B5.7060005@freebsd.org> <201012160940.58116.jhb@freebsd.org> In-Reply-To: <201012160940.58116.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.67 on 216.240.47.51 Cc: arch@freebsd.org, Sergey Babkin , David Xu Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Dec 2010 18:04:30 -0000 On 12/16/10 6:40 AM, John Baldwin wrote: > On Wednesday, 
December 15, 2010 11:16:53 pm David Xu wrote: >> John Baldwin wrote: >>> On Tuesday, December 14, 2010 8:40:12 pm David Xu wrote: >>>> John Baldwin wrote: >>>>> On Monday, December 13, 2010 8:30:24 pm David Xu wrote: >>>>>> John Baldwin wrote: >>>>>>> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote: >>>>>>>> John Baldwin wrote: >>>>>>>>> The current layout breaks up the global thread priority space (0 - 255) >>>>>>> into a >>>>>>>>> couple of bands: >>>>>>>>> >>>>>>>>> 0 - 63 : interrupt threads >>>>>>>>> 64 - 127 : kernel sleep priorities (PSOCK, etc.) >>>>>>>>> 128 - 159 : real-time user threads (rtprio) >>>>>>>>> 160 - 223 : time-sharing user threads >>>>>>>>> 224 - 255 : idle threads (idprio and kernel idle procs) >>>>>>>>> >>>>>>>>> If we decide to change the behavior I see two possible fixes: >>>>>>>>> >>>>>>>>> 1) (easy) just move the real-time priority range above the kernel sleep >>>>>>>>> priority range >>>>>>>> Would not this cause a priority inversion when an RT process >>>>>>>> enters the kernel mode? >>>>>>> How so? Note that timesharing threads are not "bumped" to a kernel sleep >>>>>>> priority when they enter the kernel either. The kernel sleep priorities are >>>>>>> purely a way for certain sleep channels to cause a thread to be treated as >>>>>>> interactive and give it a priority boost to favor interactive threads. >>>>>>> Threads in the kernel do not automatically have higher priority than threads >>>>>>> not in the kernel. Keep in mind that all stopped threads (threads not >>>>>>> executing) are always in the kernel when they stop. 
>>>>>> I have requirement to make a thread running in kernel has more higher >>>>>> priority over a thread running userland code, because our kernel >>>>>> mutex is not sleepable which does not like Solaris did, I have to use >>>>>> semaphore like code in kern_umtx.c to lock a chain, which allows me >>>>>> to read and write user address space, this is how umtxq_busy() did, >>>>>> but it does not prevent a userland thread from preempting a thread >>>>>> which locked the chain, if a realtime thread preempts a thread >>>>>> locked the chain, it may lock up whole processes using pthread. >>>>>> I think our realtime scheduling is not very useful, it is too easy >>>>>> to lock up system. >>>>> Users are not forced to use rtprio. They choose to do so, and they have to >>>>> be root to enable it (either directly or by extending root privileges via >>>>> sudo or some such). Just because you don't have a use case for it doesn't >>>>> mean that other people do not. Right now there is no way possible to say >>>>> that a given userland process is more important than 'sshd' (or any other >>>>> daemon) blocked in poll/select/kevent waiting for a packet. However, there >>>>> are use cases where other long-running userland processes are in fact far >>>>> more important than sshd (or similar processes such as getty, etc.). >>>>> >>>> You still don't answer me about how to avoid a time-sharing thread >>>> holding a critical kernel resource which preempted by a user RT thread, >>>> and later the RT thread requires the resource, but the time-sharing >>>> thread has no chance to run because another RT thread is dominating >>>> the CPU because it is doing CPU bound work, result is deadlock, even if >>>> you know you trust your RT process, there are many code which were >>>> written by you, i.e the libc and any other libraries using threading >>>> are completely not ready for RT use. 
>>>> How ever let a thread in kernel have higher priority over a thread >>>> running userland code will fix such a deadlock in kernel. >>> Put another way, the time-sharing thread that I don't care about (sshd, or >>> some other monitoring daemon, etc.) is stealing a resource I care about >>> (time, in the form of CPU cycles) from my RT process that is critical to >>> getting my work done. >>> >>> Beyond that a few more points: >>> >>> - You are ignoring "tools, not policy". You don't know what is in my binary >>> (and I can't really tell you). Assume for a minute that I'm not completely >>> dumb and can write userland code that is safe to run at this high of a >>> priority level. You already trust me to write code in the kernel that runs >>> at even higher priority now. :) >>> - You repeatedly keep missing (ignoring?) the fact that this is _optional_. >>> Users have to intentionally decide to enable this, and there are users who >>> do _need_ this functionality. >>> - You have also missed that this has always been true for idprio processes >>> (and is in fact why we restrict idprio to root), so this is not "new". >>> - Finally, you also are missing that this can already happen _now_ for plain >>> old time sharing processes if the thread holding the resource doesn't ever >>> do a sleep that raises the priority. >>> >>> For example, if a time-sharing thread with some typical priority>= >>> PRI_MIN_TIMESHARE calls write(2) on a file, it can lock the vnode lock for >>> that file (if it is unlocked) and hold that lock while it's priority is>= >>> PRI_MIN_TIMESHARE. If an interrupt arrives for a network packet that wakes >>> up sshd for a new SSH connection, the interrupt thread will preempt the >>> thread holding the vnode lock, and sshd will be executed instead of the >>> thread holding the vnode lock when the ithread finishes. 
If sshd needs the >>> vnode lock that the original thread holds, then sshd will block until the >>> original thread is rescheduled due to the random fates of time and releases >>> the vnode lock. >>> >>> In summary, the kernel sleep priorities do _not_ serve to prevent all >>> priority inversions, what they do accomplish is giving preferential treatment >>> to idle, "interactive" threads. >>> >>> A bit more information on my use case btw: >>> >>> My RT processes are each assigned a _dedicated_ CPU via cpuset (we remove the >>> CPU from the global cpuset and ensure no interrupts are routed to that CPU). >>> The problem I have is that if my RT process blocks on a lock (e.g. a lock on a >>> VM object during a page fault), then I want the RT thread to lend its RT >>> priority to the thread that holds the lock over on another CPU so that the lock >>> can be released as quickly as possible. This use case is perfectly safe (the >>> RT thread is not preempting other threads, instead other threads are partitioned >>> off into a separate set of available CPUs). What I need is to ensure that the >>> syncer or pagedaemon or whoever holds the lock I need gets a chance to run right >>> away when it holds a lock that I need. >>> >> What I meant is that whenever thread is in kernel mode, it always has >> higher priority over thread running user code, and all threads in kernel >> mode may have same priority except those interrupt threads which >> has higher priority, but this should be carefully designed to use >> mutex and spinlock between interrupt threads and other threads, >> mutex uses turnstile to propagate priority, spin lock disables >> interrupt, otherwise there still is priority inversion in kernel, i.e >> rwlock, sx lock. > Except that this isn't really true. Really, if a thread is asleep in > select() or poll() or kevent(), what critical resource is it holding? I had > the same view originally when the current set of priorites were setup. 
> However, I've had to change it since I now have a real-world use case for > rtprio. > > First, I think this is the easy part of the argument: Can you agree that if > a RT process is in the kernel, it should have priority over a TS process in > the kernel? Thus, if a RT process blocks in the kernel, it would need to > lend enough of a priority to the lock holder to preempt any TS process in the > kernel, yes? If so, that argues for RT processes in the kernel having a > higher priority than all the other kernel sleep priorities. > > The second part is harder, and that is what happens when a RT process is in > userland. First, some food for thought. Do you realize that currently, the > syncer and pagedaemon threads run at PVM? This is intentional so that these > processes run in the "background" even though they are in the kernel. > Specifically, when sshd does wakeup from a sleep at PSOCK or the like, the > kernel doesn't just let it run in the kernel, it effectively lets it keep > that PSOCK priority in userland until the next context switch due to an > interrupt or the quantum expiring. This means that when you ssh into a box, > the your interactive typing ends up preempting syncer and pagedaemon. And > this is a good thing, because syncer and pagedaemon are _background_ > processes. Preempting them only for the kernel portion of sshd (as the > change to userret in both your proposal and my original #2 would do) would > not really favor interactive processes because the user relies on the > userland portion of an interactive process to run, too (userland is the part > that echos back the characters as they are typed). So even now, with TS > threads, we have TS userland code that is _more important_ than code in the > kernel. Another example is the idlezero kernel process. This is kernel > code, but is easily far less important than pretty much all userland code. > Kernel code is _not_ always more important than userland code. 
It often is, > but it sometimes isn't. If you can accept that, then it is no longer strange > to consider that even the userland code in a RT process is more important > than kernel code in a TS process. > > In our case we do chew up a lot of CPU in userland for our RT processes, but > we handle this case by using dedicated CPUs. Our RT processes really are the > most important processes on the box. > I have to agree with John on this one. Real-time priority for threads is a dangerous tool that we give to a system "Administrator" (i.e. someone with root). It is perfectly understood that doing the WRONG thing will negatively impact the system (maybe even make it unworkable). However, the decision to set a process to realtime mode means that the Administrator has decided that that process/thread is more important than everything else in the system. One could argue about whether this applies to interrupts, but in the modern day of even cell phones having multiple processors, it gets harder and harder to make the case that userland code should not be able to pre-empt or block kernel code. I think this philosophy has always been true. As Terry Lambert used to say at the beginning of the project: Unix's job is to deliver the bullet to wherever the user wants to put it, including the user's foot. When you are the administrator you get to have a pretty big foot. In addition, many of FreeBSD's 'Users' are in fact producers of 'product' boxes. They know EXACTLY what is running on the system, and where, and want the ability to label a process in the way that John shows. For them it is the primary purpose of the box to do task X, and doing task X comes before all other tasks, possibly even unrelated interrupts.
Julian From owner-freebsd-arch@FreeBSD.ORG Fri Dec 17 01:59:06 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ACDE01065670; Fri, 17 Dec 2010 01:59:06 +0000 (UTC) (envelope-from davidxu@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 990588FC13; Fri, 17 Dec 2010 01:59:06 +0000 (UTC) Received: from xyf.my.dom (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id oBH1x4MI074932; Fri, 17 Dec 2010 01:59:05 GMT (envelope-from davidxu@freebsd.org) Message-ID: <4D0AC3EC.1040701@freebsd.org> Date: Fri, 17 Dec 2010 09:59:08 +0800 From: David Xu User-Agent: Thunderbird 2.0.0.24 (X11/20100630) MIME-Version: 1.0 To: John Baldwin References: <201012101050.45214.jhb@freebsd.org> <201012150938.44217.jhb@freebsd.org> <4D0992B5.7060005@freebsd.org> <201012160940.58116.jhb@freebsd.org> In-Reply-To: <201012160940.58116.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, Sergey Babkin Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Dec 2010 01:59:06 -0000 John Baldwin wrote: > On Wednesday, December 15, 2010 11:16:53 pm David Xu wrote: >> John Baldwin wrote: >>> On Tuesday, December 14, 2010 8:40:12 pm David Xu wrote: >>>> John Baldwin wrote: >>>>> On Monday, December 13, 2010 8:30:24 pm David Xu wrote: >>>>>> John Baldwin wrote: >>>>>>> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote: >>>>>>>> John Baldwin wrote: >>>>>>>>> The current layout breaks up the global thread priority space (0 - 255) >>>>>>> into a >>>>>>>>> couple of bands: 
>>>>>>>>> >>>>>>>>> 0 - 63 : interrupt threads >>>>>>>>> 64 - 127 : kernel sleep priorities (PSOCK, etc.) >>>>>>>>> 128 - 159 : real-time user threads (rtprio) >>>>>>>>> 160 - 223 : time-sharing user threads >>>>>>>>> 224 - 255 : idle threads (idprio and kernel idle procs) >>>>>>>>> >>>>>>>>> If we decide to change the behavior I see two possible fixes: >>>>>>>>> >>>>>>>>> 1) (easy) just move the real-time priority range above the kernel sleep >>>>>>>>> priority range >>>>>>>> Would not this cause a priority inversion when an RT process >>>>>>>> enters the kernel mode? >>>>>>> How so? Note that timesharing threads are not "bumped" to a kernel sleep >>>>>>> priority when they enter the kernel either. The kernel sleep priorities are >>>>>>> purely a way for certain sleep channels to cause a thread to be treated as >>>>>>> interactive and give it a priority boost to favor interactive threads. >>>>>>> Threads in the kernel do not automatically have higher priority than threads >>>>>>> not in the kernel. Keep in mind that all stopped threads (threads not >>>>>>> executing) are always in the kernel when they stop. >>>>>> I have requirement to make a thread running in kernel has more higher >>>>>> priority over a thread running userland code, because our kernel >>>>>> mutex is not sleepable which does not like Solaris did, I have to use >>>>>> semaphore like code in kern_umtx.c to lock a chain, which allows me >>>>>> to read and write user address space, this is how umtxq_busy() did, >>>>>> but it does not prevent a userland thread from preempting a thread >>>>>> which locked the chain, if a realtime thread preempts a thread >>>>>> locked the chain, it may lock up whole processes using pthread. >>>>>> I think our realtime scheduling is not very useful, it is too easy >>>>>> to lock up system. >>>>> Users are not forced to use rtprio. 
They choose to do so, and they have to >>>>> be root to enable it (either directly or by extending root privileges via >>>>> sudo or some such). Just because you don't have a use case for it doesn't >>>>> mean that other people do not. Right now there is no way possible to say >>>>> that a given userland process is more important than 'sshd' (or any other >>>>> daemon) blocked in poll/select/kevent waiting for a packet. However, there >>>>> are use cases where other long-running userland processes are in fact far >>>>> more important than sshd (or similar processes such as getty, etc.). >>>>> >>>> You still don't answer me about how to avoid a time-sharing thread >>>> holding a critical kernel resource which preempted by a user RT thread, >>>> and later the RT thread requires the resource, but the time-sharing >>>> thread has no chance to run because another RT thread is dominating >>>> the CPU because it is doing CPU bound work, result is deadlock, even if >>>> you know you trust your RT process, there are many code which were >>>> written by you, i.e the libc and any other libraries using threading >>>> are completely not ready for RT use. >>>> How ever let a thread in kernel have higher priority over a thread >>>> running userland code will fix such a deadlock in kernel. >>> Put another way, the time-sharing thread that I don't care about (sshd, or >>> some other monitoring daemon, etc.) is stealing a resource I care about >>> (time, in the form of CPU cycles) from my RT process that is critical to >>> getting my work done. >>> >>> Beyond that a few more points: >>> >>> - You are ignoring "tools, not policy". You don't know what is in my binary >>> (and I can't really tell you). Assume for a minute that I'm not completely >>> dumb and can write userland code that is safe to run at this high of a >>> priority level. You already trust me to write code in the kernel that runs >>> at even higher priority now. :) >>> - You repeatedly keep missing (ignoring?) 
the fact that this is _optional_. >>> Users have to intentionally decide to enable this, and there are users who >>> do _need_ this functionality. >>> - You have also missed that this has always been true for idprio processes >>> (and is in fact why we restrict idprio to root), so this is not "new". >>> - Finally, you also are missing that this can already happen _now_ for plain >>> old time sharing processes if the thread holding the resource doesn't ever >>> do a sleep that raises the priority. >>> >>> For example, if a time-sharing thread with some typical priority >= >>> PRI_MIN_TIMESHARE calls write(2) on a file, it can lock the vnode lock for >>> that file (if it is unlocked) and hold that lock while it's priority is >= >>> PRI_MIN_TIMESHARE. If an interrupt arrives for a network packet that wakes >>> up sshd for a new SSH connection, the interrupt thread will preempt the >>> thread holding the vnode lock, and sshd will be executed instead of the >>> thread holding the vnode lock when the ithread finishes. If sshd needs the >>> vnode lock that the original thread holds, then sshd will block until the >>> original thread is rescheduled due to the random fates of time and releases >>> the vnode lock. >>> >>> In summary, the kernel sleep priorities do _not_ serve to prevent all >>> priority inversions, what they do accomplish is giving preferential treatment >>> to idle, "interactive" threads. >>> >>> A bit more information on my use case btw: >>> >>> My RT processes are each assigned a _dedicated_ CPU via cpuset (we remove the >>> CPU from the global cpuset and ensure no interrupts are routed to that CPU). >>> The problem I have is that if my RT process blocks on a lock (e.g. a lock on a >>> VM object during a page fault), then I want the RT thread to lend its RT >>> priority to the thread that holds the lock over on another CPU so that the lock >>> can be released as quickly as possible. 
This use case is perfectly safe (the >>> RT thread is not preempting other threads, instead other threads are partitioned >>> off into a separate set of available CPUs). What I need is to ensure that the >>> syncer or pagedaemon or whoever holds the lock I need gets a chance to run right >>> away when it holds a lock that I need. >>> >> What I meant is that whenever thread is in kernel mode, it always has >> higher priority over thread running user code, and all threads in kernel >> mode may have same priority except those interrupt threads which >> has higher priority, but this should be carefully designed to use >> mutex and spinlock between interrupt threads and other threads, >> mutex uses turnstile to propagate priority, spin lock disables >> interrupt, otherwise there still is priority inversion in kernel, i.e >> rwlock, sx lock. > > Except that this isn't really true. Really, if a thread is asleep in > select() or poll() or kevent(), what critical resource is it holding? I had > the same view originally when the current set of priorities were setup. > However, I've had to change it since I now have a real-world use case for > rtprio. > > First, I think this is the easy part of the argument: Can you agree that if > a RT process is in the kernel, it should have priority over a TS process in > the kernel? Thus, if a RT process blocks in the kernel, it would need to > lend enough of a priority to the lock holder to preempt any TS process in the > kernel, yes? If so, that argues for RT processes in the kernel having a > higher priority than all the other kernel sleep priorities. > Yes, RT processes should preempt any TS process, but how can you lend priority through lockmgr, sx locks, and all the other locking built on msleep() and wakeup()?
That is why I am trying to fix it: those locks suffer from priority inversion. To fix the problem, semantics like those of a POSIX priority-protect mutex are needed: when a lock is acquired, the thread raises its priority to a level high enough to prevent priority inversion. When a thread tries to acquire a lock whose priority ceiling is lower than its own priority, the attempt should abort. Does this amount to a lock order reversal? The kernel may panic to preserve correctness. The consequences of priority inversion depend on the application; they may be dangerous or trivial, but the behavior is not correct. > The second part is harder, and that is what happens when a RT process is in > userland. First, some food for thought. Do you realize that currently, the > syncer and pagedaemon threads run at PVM? This is intentional so that these > processes run in the "background" even though they are in the kernel. > Specifically, when sshd does wakeup from a sleep at PSOCK or the like, the > kernel doesn't just let it run in the kernel, it effectively lets it keep > that PSOCK priority in userland until the next context switch due to an > interrupt or the quantum expiring. This means that when you ssh into a box, > your interactive typing ends up preempting syncer and pagedaemon. And > this is a good thing, because syncer and pagedaemon are _background_ > processes. Preempting them only for the kernel portion of sshd (as the > change to userret in both your proposal and my original #2 would do) would > not really favor interactive processes because the user relies on the > userland portion of an interactive process to run, too (userland is the part > that echoes back the characters as they are typed). So even now, with TS > threads, we have TS userland code that is _more important_ than code in the > kernel. Another example is the idlezero kernel process. This is kernel > code, but is easily far less important than pretty much all userland code. > Kernel code is _not_ always more important than userland code. It often is, > but it sometimes isn't.
If you can accept that, then it is no longer strange > to consider that even the userland code in a RT process is more important > than kernel code in a TS process. > I do not think the intent was for a TS thread to keep its high priority over an important kernel thread. I guess the original idea was to eliminate extra context switches between TS threads. The TS priority algorithm may have some errors, and this behavior keeps the extra context switches away. For example, the current code still sets PPQ to 4 rather than 1, which makes priorities even fuzzier. Consider a TS priority algorithm like the one in the current kernel: assume two threads A and B both start at priority 160 and both are CPU hogs. After a small number (N) of clock ticks, thread A's priority drops to 161; thread B now has the higher priority, 160, so B should preempt A and a context switch occurs. 2 * N ticks later, thread B's priority drops to 162; thread A now has the higher priority, 161, so the context switches again and thread A runs. N is far less than the scheduler's quantum. This can be called an error, because threads are not being scheduled on a quantum basis. However, the existing algorithm is incorrect for RT scheduling. RT scheduling is strictly based on static priority, and the result of the existing algorithm is priority inversion (PPQ = 4, and no preemption at the user boundary). Because RT priorities are static, the priority inversion lasts forever, unlike with the TS algorithm, which lowers a CPU hog's priority so that the inversion is only temporary. > In our case we do chew up a lot of CPU in userland for our RT processes, but > we handle this case by using dedicated CPUs. Our RT processes really are the > most important processes on the box.
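The PPQ remark in the message above can be illustrated with a small sketch. It assumes the classic FreeBSD run-queue layout, in which a thread is placed on queue number priority / RQ_PPQ, with RQ_PPQ = 4 and 64 queues covering the 0 - 255 priority space. The helper name runq_index and the two sample priorities are made up for illustration; this is not kernel code.

```python
# Assumed constants mirroring the classic FreeBSD run-queue layout
# (RQ_NQS queues, RQ_PPQ priorities per queue); illustration only.
RQ_NQS = 64
RQ_PPQ = 4
assert RQ_NQS * RQ_PPQ == 256  # the 0-255 global priority space


def runq_index(priority: int, ppq: int = RQ_PPQ) -> int:
    """Hypothetical helper: map a global priority (0-255) to a run-queue slot."""
    if not 0 <= priority <= 255:
        raise ValueError("priority must be in 0..255")
    return priority // ppq


# Two real-time threads in the 128-159 band quoted in the thread;
# a lower number means a more important thread.
rt_high, rt_low = 128, 131

# With PPQ = 4 both priorities map to the same queue, so the queue index
# alone cannot tell the scheduler which thread is more important.
same_queue = runq_index(rt_high) == runq_index(rt_low)

# With PPQ = 1 every priority gets its own queue and the order is preserved.
distinct = runq_index(rt_high, ppq=1) < runq_index(rt_low, ppq=1)

print(same_queue, distinct)  # prints: True True
```

If threads within one queue are served first-come, first-served, then under these assumptions a priority-131 thread that was queued first would run ahead of a priority-128 thread queued later, which is the kind of inversion the message describes.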
> From owner-freebsd-arch@FreeBSD.ORG Fri Dec 17 06:20:46 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 535C81065672; Fri, 17 Dec 2010 06:20:46 +0000 (UTC) (envelope-from davidxu@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 23B2B8FC14; Fri, 17 Dec 2010 06:20:46 +0000 (UTC) Received: from xyf.my.dom (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id oBH6Kilh067082; Fri, 17 Dec 2010 06:20:44 GMT (envelope-from davidxu@freebsd.org) Message-ID: <4D0B013F.3060203@freebsd.org> Date: Fri, 17 Dec 2010 14:20:47 +0800 From: David Xu User-Agent: Thunderbird 2.0.0.24 (X11/20100630) MIME-Version: 1.0 To: Julian Elischer References: <201012101050.45214.jhb@freebsd.org> <201012150938.44217.jhb@freebsd.org> <4D0992B5.7060005@freebsd.org> <201012160940.58116.jhb@freebsd.org> <4D0A54A8.90901@freebsd.org> In-Reply-To: <4D0A54A8.90901@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, Sergey Babkin Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Dec 2010 06:20:46 -0000 Julian Elischer wrote: > On 12/16/10 6:40 AM, John Baldwin wrote: >> On Wednesday, December 15, 2010 11:16:53 pm David Xu wrote: >>> John Baldwin wrote: >>>> On Tuesday, December 14, 2010 8:40:12 pm David Xu wrote: >>>>> John Baldwin wrote: >>>>>> On Monday, December 13, 2010 8:30:24 pm David Xu wrote: >>>>>>> John Baldwin wrote: >>>>>>>> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote: >>>>>>>>> John Baldwin wrote: >>>>>>>>>> The current layout breaks up the global thread 
priority space >>>>>>>>>> (0 - 255) >>>>>>>> into a >>>>>>>>>> couple of bands: >>>>>>>>>> >>>>>>>>>> 0 - 63 : interrupt threads >>>>>>>>>> 64 - 127 : kernel sleep priorities (PSOCK, etc.) >>>>>>>>>> 128 - 159 : real-time user threads (rtprio) >>>>>>>>>> 160 - 223 : time-sharing user threads >>>>>>>>>> 224 - 255 : idle threads (idprio and kernel idle procs) >>>>>>>>>> >>>>>>>>>> If we decide to change the behavior I see two possible fixes: >>>>>>>>>> >>>>>>>>>> 1) (easy) just move the real-time priority range above the >>>>>>>>>> kernel sleep >>>>>>>>>> priority range >>>>>>>>> Would not this cause a priority inversion when an RT process >>>>>>>>> enters the kernel mode? >>>>>>>> How so? Note that timesharing threads are not "bumped" to a >>>>>>>> kernel sleep >>>>>>>> priority when they enter the kernel either. The kernel sleep >>>>>>>> priorities are >>>>>>>> purely a way for certain sleep channels to cause a thread to be >>>>>>>> treated as >>>>>>>> interactive and give it a priority boost to favor interactive >>>>>>>> threads. >>>>>>>> Threads in the kernel do not automatically have higher priority >>>>>>>> than threads >>>>>>>> not in the kernel. Keep in mind that all stopped threads >>>>>>>> (threads not >>>>>>>> executing) are always in the kernel when they stop. >>>>>>> I have requirement to make a thread running in kernel has more >>>>>>> higher >>>>>>> priority over a thread running userland code, because our kernel >>>>>>> mutex is not sleepable which does not like Solaris did, I have to >>>>>>> use >>>>>>> semaphore like code in kern_umtx.c to lock a chain, which allows me >>>>>>> to read and write user address space, this is how umtxq_busy() did, >>>>>>> but it does not prevent a userland thread from preempting a thread >>>>>>> which locked the chain, if a realtime thread preempts a thread >>>>>>> locked the chain, it may lock up whole processes using pthread. 
>>>>>>> I think our realtime scheduling is not very useful, it is too easy >>>>>>> to lock up system. >>>>>> Users are not forced to use rtprio. They choose to do so, and >>>>>> they have to >>>>>> be root to enable it (either directly or by extending root >>>>>> privileges via >>>>>> sudo or some such). Just because you don't have a use case for it >>>>>> doesn't >>>>>> mean that other people do not. Right now there is no way possible >>>>>> to say >>>>>> that a given userland process is more important than 'sshd' (or >>>>>> any other >>>>>> daemon) blocked in poll/select/kevent waiting for a packet. >>>>>> However, there >>>>>> are use cases where other long-running userland processes are in >>>>>> fact far >>>>>> more important than sshd (or similar processes such as getty, etc.). >>>>>> >>>>> You still don't answer me about how to avoid a time-sharing thread >>>>> holding a critical kernel resource which preempted by a user RT >>>>> thread, >>>>> and later the RT thread requires the resource, but the time-sharing >>>>> thread has no chance to run because another RT thread is dominating >>>>> the CPU because it is doing CPU bound work, result is deadlock, >>>>> even if >>>>> you know you trust your RT process, there are many code which were >>>>> written by you, i.e the libc and any other libraries using threading >>>>> are completely not ready for RT use. >>>>> How ever let a thread in kernel have higher priority over a thread >>>>> running userland code will fix such a deadlock in kernel. >>>> Put another way, the time-sharing thread that I don't care about >>>> (sshd, or >>>> some other monitoring daemon, etc.) is stealing a resource I care about >>>> (time, in the form of CPU cycles) from my RT process that is >>>> critical to >>>> getting my work done. >>>> >>>> Beyond that a few more points: >>>> >>>> - You are ignoring "tools, not policy". You don't know what is in >>>> my binary >>>> (and I can't really tell you). 
Assume for a minute that I'm not >>>> completely >>>> dumb and can write userland code that is safe to run at this high >>>> of a >>>> priority level. You already trust me to write code in the kernel >>>> that runs >>>> at even higher priority now. :) >>>> - You repeatedly keep missing (ignoring?) the fact that this is >>>> _optional_. >>>> Users have to intentionally decide to enable this, and there are >>>> users who >>>> do _need_ this functionality. >>>> - You have also missed that this has always been true for idprio >>>> processes >>>> (and is in fact why we restrict idprio to root), so this is not >>>> "new". >>>> - Finally, you also are missing that this can already happen _now_ >>>> for plain >>>> old time sharing processes if the thread holding the resource >>>> doesn't ever >>>> do a sleep that raises the priority. >>>> >>>> For example, if a time-sharing thread with some typical priority>= >>>> PRI_MIN_TIMESHARE calls write(2) on a file, it can lock the vnode >>>> lock for >>>> that file (if it is unlocked) and hold that lock while it's priority >>>> is>= >>>> PRI_MIN_TIMESHARE. If an interrupt arrives for a network packet >>>> that wakes >>>> up sshd for a new SSH connection, the interrupt thread will preempt the >>>> thread holding the vnode lock, and sshd will be executed instead of the >>>> thread holding the vnode lock when the ithread finishes. If sshd >>>> needs the >>>> vnode lock that the original thread holds, then sshd will block >>>> until the >>>> original thread is rescheduled due to the random fates of time and >>>> releases >>>> the vnode lock. >>>> >>>> In summary, the kernel sleep priorities do _not_ serve to prevent all >>>> priority inversions, what they do accomplish is giving preferential >>>> treatment >>>> to idle, "interactive" threads. 
>>>> >>>> A bit more information on my use case btw: >>>> >>>> My RT processes are each assigned a _dedicated_ CPU via cpuset (we >>>> remove the >>>> CPU from the global cpuset and ensure no interrupts are routed to >>>> that CPU). >>>> The problem I have is that if my RT process blocks on a lock (e.g. a >>>> lock on a >>>> VM object during a page fault), then I want the RT thread to lend >>>> its RT >>>> priority to the thread that holds the lock over on another CPU so >>>> that the lock >>>> can be released as quickly as possible. This use case is perfectly >>>> safe (the >>>> RT thread is not preempting other threads, instead other threads are >>>> partitioned >>>> off into a separate set of available CPUs). What I need is to >>>> ensure that the >>>> syncer or pagedaemon or whoever holds the lock I need gets a chance >>>> to run right >>>> away when it holds a lock that I need. >>>> >>> What I meant is that whenever thread is in kernel mode, it always has >>> higher priority over thread running user code, and all threads in kernel >>> mode may have same priority except those interrupt threads which >>> has higher priority, but this should be carefully designed to use >>> mutex and spinlock between interrupt threads and other threads, >>> mutex uses turnstile to propagate priority, spin lock disables >>> interrupt, otherwise there still is priority inversion in kernel, i.e >>> rwlock, sx lock. >> Except that this isn't really true. Really, if a thread is asleep in >> select() or poll() or kevent(), what critical resource is it holding? >> I had >> the same view originally when the current set of priorites were setup. >> However, I've had to change it since I now have a real-world use case for >> rtprio. >> >> First, I think this is the easy part of the argument: Can you agree >> that if >> a RT process is in the kernel, it should have priority over a TS >> process in >> the kernel? 
Thus, if a RT process blocks in the kernel, it would need to >> lend enough of a priority to the lock holder to preempt any TS process >> in the >> kernel, yes? If so, that argues for RT processes in the kernel having a >> higher priority than all the other kernel sleep priorities. >> >> The second part is harder, and that is what happens when a RT process >> is in >> userland. First, some food for thought. Do you realize that >> currently, the >> syncer and pagedaemon threads run at PVM? This is intentional so that >> these >> processes run in the "background" even though they are in the kernel. >> Specifically, when sshd does wakeup from a sleep at PSOCK or the like, >> the >> kernel doesn't just let it run in the kernel, it effectively lets it keep >> that PSOCK priority in userland until the next context switch due to an >> interrupt or the quantum expiring. This means that when you ssh into >> a box, >> the your interactive typing ends up preempting syncer and pagedaemon. >> And >> this is a good thing, because syncer and pagedaemon are _background_ >> processes. Preempting them only for the kernel portion of sshd (as the >> change to userret in both your proposal and my original #2 would do) >> would >> not really favor interactive processes because the user relies on the >> userland portion of an interactive process to run, too (userland is >> the part >> that echos back the characters as they are typed). So even now, with TS >> threads, we have TS userland code that is _more important_ than code >> in the >> kernel. Another example is the idlezero kernel process. This is kernel >> code, but is easily far less important than pretty much all userland >> code. >> Kernel code is _not_ always more important than userland code. It >> often is, >> but it sometimes isn't. If you can accept that, then it is no longer >> strange >> to consider that even the userland code in a RT process is more important >> than kernel code in a TS process. 
>> >> In our case we do chew up a lot of CPU in userland for our RT >> processes, but >> we handle this case by using dedicated CPUs. Our RT processes really >> are the >> most important processes on the box. >> > > I have to agree with John on this one.. > The real-time property for threads is a dangerous tool which we allow a > system "Administrator" (i.e. someone with root,) to do some things. > It is perfectly understood that doing the WRONG thing will negatively > impact the system (maybe even make it unworkable). However the decision to > set a process to realtime mode means that the Administrator has decided > that > that process/thread is more important than everything else in the system. > One could argue about whether this applies to interrupts, but in the > modern day > of even cell phones having multiple processors, it gets harder and harder > to make the case that userland code should not be able to pre-empt > or block kernel code. > > I think this philosophy has always been true.. As Terry Lambert used to > say > at the beginning of the project: Unix's job is to deliver the bullet to > wherever the > user wants to put it, including the user's foot. When you are the > administrator > you get to have a pretty big foot. > > In addition many of FreeBSD's 'Users' are in fact producers of 'product' > boxes. > They know EXACTLY what is running on the system, and where, and want the > ability > to label a process in the way that John shows. For them it is the > primary purpose > of the box to do task X and doing task X comes before all other tasks, > possibly even > non related interrupts. > > Julian > The main problem is correctness, not whether root can use it. I know it is his machine and he can do whatever he wants with it. :-) I have to repeat: the question is whether the kernel can correctly schedule RT threads. It cannot.
The fact is that many lock primitives are not RT safe: lockmgr, sx locks, rwlocks and other locks based on msleep/wakeup, which neither propagate priority nor use priority protection, suffer from priority inversion. Also, PPQ = 4 is incorrect for RT scheduling; it is another kind of priority inversion. So what can we do here? Where mutexes and spin locks cannot be used, the kernel should either raise the thread's priority to a high enough level or give all threads equal priority while in the kernel. If future changes cannot fix the above problems, those changes are nonsense. From owner-freebsd-arch@FreeBSD.ORG Fri Dec 17 12:36:25 2010 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D88C3106564A; Fri, 17 Dec 2010 12:36:25 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id B37C38FC17; Fri, 17 Dec 2010 12:36:25 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 6E8EA46B7E; Fri, 17 Dec 2010 07:36:25 -0500 (EST) Date: Fri, 17 Dec 2010 12:36:25 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: "Bjoern A. Zeeb" In-Reply-To: <20101215230640.K6126@maildrop.int.zabbadoz.net> Message-ID: References: <20101215230640.K6126@maildrop.int.zabbadoz.net> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: Re: Future of netnatm: looking for testers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Dec 2010 12:36:25 -0000 On Wed, 15 Dec 2010, Bjoern A.
Zeeb wrote:

> I would request two things:
>
> 1) the extra couple of months; this will not prevent the evitable removal
>    yet only defer it.

Sounds good to me -- my goal is not to remove NETNATM; rather, to remove
code that doesn't compile or work. I'm happy to sit on this for a while and
see if things improve; fixing the former is great, fixing the latter would be
even better :-).

(I wonder if Harti is in a situation to test any of this still?)

Robert

>
> 2) If any of you is using (or wants to be able to (continue to) use) NATM
>    or can test things, I re-enabled it with most of the code in HEAD and
>    the patch is available for 8.x as well, but I need to work with someone
>    to make sure it'll really work. I am willing to spend more time on it
>    if you send me an email.
>
> Best Regards,
> Bjoern
>
> ------------------------------------------------------------------------
>> Author: bz
>> Date: Wed Dec 15 22:58:45 2010
>> New Revision: 216466
>> URL: http://svn.freebsd.org/changeset/base/216466
>>
>> Log:
>>  Bring back (most of) NATM to avoid further bitrot after r186119.
>>  Keep three lines disabled which I am unsure if they had been used at all.
>>  This will allow us to seek testers and possibly bring it all back.
>
> If you have the ability to test (on 8.x or HEAD) or are using NATM,
> please get in contact with me.
>
>
>
>> Discussed with: rwatson
>> MFC after: 7 weeks
>>
>> Modified:
>>  head/sys/conf/NOTES
>>  head/sys/netinet/if_atm.c
> ------------------------------------------------------------------------
>
> --
> Bjoern A. Zeeb Welcome a new stage of life.
> Going to jail sucks -- All my daemons like it!
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/jails.html > From owner-freebsd-arch@FreeBSD.ORG Fri Dec 17 12:56:43 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AB39510656A3; Fri, 17 Dec 2010 12:56:43 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 6BCB28FC14; Fri, 17 Dec 2010 12:56:43 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 1CC7D46B58; Fri, 17 Dec 2010 07:56:43 -0500 (EST) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id D3C618A01D; Fri, 17 Dec 2010 07:56:41 -0500 (EST) From: John Baldwin To: David Xu Date: Fri, 17 Dec 2010 07:52:06 -0500 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20101102; KDE/4.4.5; amd64; ; ) References: <201012101050.45214.jhb@freebsd.org> <201012160940.58116.jhb@freebsd.org> <4D0AC3EC.1040701@freebsd.org> In-Reply-To: <4D0AC3EC.1040701@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201012170752.06540.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Fri, 17 Dec 2010 07:56:41 -0500 (EST) X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx Cc: arch@freebsd.org, Sergey Babkin Subject: Re: Realtime thread priorities X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: 
, X-List-Received-Date: Fri, 17 Dec 2010 12:56:43 -0000 On Thursday, December 16, 2010 8:59:08 pm David Xu wrote: > John Baldwin wrote: > > On Wednesday, December 15, 2010 11:16:53 pm David Xu wrote: > >> John Baldwin wrote: > >>> On Tuesday, December 14, 2010 8:40:12 pm David Xu wrote: > >>>> John Baldwin wrote: > >>>>> On Monday, December 13, 2010 8:30:24 pm David Xu wrote: > >>>>>> John Baldwin wrote: > >>>>>>> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote: > >>>>>>>> John Baldwin wrote: > >>>>>>>>> The current layout breaks up the global thread priority space (0 - 255) > >>>>>>> into a > >>>>>>>>> couple of bands: > >>>>>>>>> > >>>>>>>>> 0 - 63 : interrupt threads > >>>>>>>>> 64 - 127 : kernel sleep priorities (PSOCK, etc.) > >>>>>>>>> 128 - 159 : real-time user threads (rtprio) > >>>>>>>>> 160 - 223 : time-sharing user threads > >>>>>>>>> 224 - 255 : idle threads (idprio and kernel idle procs) > >>>>>>>>> > >>>>>>>>> If we decide to change the behavior I see two possible fixes: > >>>>>>>>> > >>>>>>>>> 1) (easy) just move the real-time priority range above the kernel sleep > >>>>>>>>> priority range > >>>>>>>> Would not this cause a priority inversion when an RT process > >>>>>>>> enters the kernel mode? > >>>>>>> How so? Note that timesharing threads are not "bumped" to a kernel sleep > >>>>>>> priority when they enter the kernel either. The kernel sleep priorities are > >>>>>>> purely a way for certain sleep channels to cause a thread to be treated as > >>>>>>> interactive and give it a priority boost to favor interactive threads. > >>>>>>> Threads in the kernel do not automatically have higher priority than threads > >>>>>>> not in the kernel. Keep in mind that all stopped threads (threads not > >>>>>>> executing) are always in the kernel when they stop. 
> >>>>>> I have requirement to make a thread running in kernel has more higher > >>>>>> priority over a thread running userland code, because our kernel > >>>>>> mutex is not sleepable which does not like Solaris did, I have to use > >>>>>> semaphore like code in kern_umtx.c to lock a chain, which allows me > >>>>>> to read and write user address space, this is how umtxq_busy() did, > >>>>>> but it does not prevent a userland thread from preempting a thread > >>>>>> which locked the chain, if a realtime thread preempts a thread > >>>>>> locked the chain, it may lock up whole processes using pthread. > >>>>>> I think our realtime scheduling is not very useful, it is too easy > >>>>>> to lock up system. > >>>>> Users are not forced to use rtprio. They choose to do so, and they have to > >>>>> be root to enable it (either directly or by extending root privileges via > >>>>> sudo or some such). Just because you don't have a use case for it doesn't > >>>>> mean that other people do not. Right now there is no way possible to say > >>>>> that a given userland process is more important than 'sshd' (or any other > >>>>> daemon) blocked in poll/select/kevent waiting for a packet. However, there > >>>>> are use cases where other long-running userland processes are in fact far > >>>>> more important than sshd (or similar processes such as getty, etc.). > >>>>> > >>>> You still don't answer me about how to avoid a time-sharing thread > >>>> holding a critical kernel resource which preempted by a user RT thread, > >>>> and later the RT thread requires the resource, but the time-sharing > >>>> thread has no chance to run because another RT thread is dominating > >>>> the CPU because it is doing CPU bound work, result is deadlock, even if > >>>> you know you trust your RT process, there are many code which were > >>>> written by you, i.e the libc and any other libraries using threading > >>>> are completely not ready for RT use. 
> >>>> How ever let a thread in kernel have higher priority over a thread > >>>> running userland code will fix such a deadlock in kernel. > >>> Put another way, the time-sharing thread that I don't care about (sshd, or > >>> some other monitoring daemon, etc.) is stealing a resource I care about > >>> (time, in the form of CPU cycles) from my RT process that is critical to > >>> getting my work done. > >>> > >>> Beyond that a few more points: > >>> > >>> - You are ignoring "tools, not policy". You don't know what is in my binary > >>> (and I can't really tell you). Assume for a minute that I'm not completely > >>> dumb and can write userland code that is safe to run at this high of a > >>> priority level. You already trust me to write code in the kernel that runs > >>> at even higher priority now. :) > >>> - You repeatedly keep missing (ignoring?) the fact that this is _optional_. > >>> Users have to intentionally decide to enable this, and there are users who > >>> do _need_ this functionality. > >>> - You have also missed that this has always been true for idprio processes > >>> (and is in fact why we restrict idprio to root), so this is not "new". > >>> - Finally, you also are missing that this can already happen _now_ for plain > >>> old time sharing processes if the thread holding the resource doesn't ever > >>> do a sleep that raises the priority. > >>> > >>> For example, if a time-sharing thread with some typical priority >= > >>> PRI_MIN_TIMESHARE calls write(2) on a file, it can lock the vnode lock for > >>> that file (if it is unlocked) and hold that lock while it's priority is >= > >>> PRI_MIN_TIMESHARE. If an interrupt arrives for a network packet that wakes > >>> up sshd for a new SSH connection, the interrupt thread will preempt the > >>> thread holding the vnode lock, and sshd will be executed instead of the > >>> thread holding the vnode lock when the ithread finishes. 
If sshd needs the > >>> vnode lock that the original thread holds, then sshd will block until the > >>> original thread is rescheduled due to the random fates of time and releases > >>> the vnode lock. > >>> > >>> In summary, the kernel sleep priorities do _not_ serve to prevent all > >>> priority inversions, what they do accomplish is giving preferential treatment > >>> to idle, "interactive" threads. > >>> > >>> A bit more information on my use case btw: > >>> > >>> My RT processes are each assigned a _dedicated_ CPU via cpuset (we remove the > >>> CPU from the global cpuset and ensure no interrupts are routed to that CPU). > >>> The problem I have is that if my RT process blocks on a lock (e.g. a lock on a > >>> VM object during a page fault), then I want the RT thread to lend its RT > >>> priority to the thread that holds the lock over on another CPU so that the lock > >>> can be released as quickly as possible. This use case is perfectly safe (the > >>> RT thread is not preempting other threads, instead other threads are partitioned > >>> off into a separate set of available CPUs). What I need is to ensure that the > >>> syncer or pagedaemon or whoever holds the lock I need gets a chance to run right > >>> away when it holds a lock that I need. > >>> > >> What I meant is that whenever thread is in kernel mode, it always has > >> higher priority over thread running user code, and all threads in kernel > >> mode may have same priority except those interrupt threads which > >> has higher priority, but this should be carefully designed to use > >> mutex and spinlock between interrupt threads and other threads, > >> mutex uses turnstile to propagate priority, spin lock disables > >> interrupt, otherwise there still is priority inversion in kernel, i.e > >> rwlock, sx lock. > > > > Except that this isn't really true. Really, if a thread is asleep in > > select() or poll() or kevent(), what critical resource is it holding? 
> > I had the same view originally when the current set of priorities was
> > set up. However, I've had to change it since I now have a real-world use
> > case for rtprio.
> >
> > First, I think this is the easy part of the argument: Can you agree that if
> > a RT process is in the kernel, it should have priority over a TS process in
> > the kernel? Thus, if a RT process blocks in the kernel, it would need to
> > lend enough of a priority to the lock holder to preempt any TS process in the
> > kernel, yes? If so, that argues for RT processes in the kernel having a
> > higher priority than all the other kernel sleep priorities.
> >
>
> Yes, RT processes should preempt any TS, but how can you lend priority
> for lockmgr and sx locks and all locking based on msleep() and wakeup()?
> That's why I am trying to fix it: they have priority inversion. To fix the
> problem, a semantic like the POSIX priority-protect mutex is needed: when
> a lock is locked, the thread needs to raise its priority to a level high
> enough to prevent priority inversion. When a thread tries to lock a lock
> with a lower priority ceiling, it should abort. Does this mean a lock order
> reversal? The kernel may panic for correctness.
> The consequences of priority inversion depend on the application; they may
> be dangerous or trivial, but it is not correct.

Yes, we do not do priority lending for sleep locks, and to date we never
have. This is not a new problem and moving RT priority higher is not
introducing any _new_ problems. However, it does bring _new_ functionality
that some people need. Just because you don't need it doesn't mean it isn't
important.

Don't let the perfect be the enemy of the good.

--
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG Fri Dec 17 14:06:43 2010
Delivered-To: arch@freebsd.org
Received: from alona.my.domain (localhost [127.0.0.1])
 by hub.freebsd.org (Postfix) with ESMTP id 57C29106564A;
 Fri, 17 Dec 2010 14:06:40 +0000 (UTC) (envelope-from davidxu@freebsd.org)
Message-ID: <4D0B6E54.2070802@freebsd.org>
Date: Fri, 17 Dec 2010 22:06:12 +0800
From: David Xu
User-Agent: Thunderbird 2.0.0.21 (X11/20090522)
MIME-Version: 1.0
To: John Baldwin
References: <201012101050.45214.jhb@freebsd.org>
 <201012160940.58116.jhb@freebsd.org> <4D0AC3EC.1040701@freebsd.org>
 <201012170752.06540.jhb@freebsd.org>
In-Reply-To: <201012170752.06540.jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: arch@freebsd.org, Sergey Babkin
Subject: Re: Realtime thread priorities
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Fri, 17 Dec 2010 14:06:43 -0000

John Baldwin wrote:
> Yes, we do not do priority lending for sleep locks, and to date we never
> have. This is not a new problem and moving RT priority higher is not
> introducing any _new_ problems. However, it does bring _new_ functionality
> that some people need. Just because you don't need it doesn't mean it isn't
> important.
>
> Don't let the perfect be the enemy of the good.
>
>
I guess that your real requirement is preemption at the user boundary
for static-priority threads, which the current code does not do. I doubt that
preempting in a kernel path that holds an unknown lock has any
visible benefit for your application. Yes, perfect is not the enemy but
the goal; isn't the mutex with priority propagation there for the sake of
perfection?
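
The priority-lending point argued in this exchange can be illustrated with a
small user-space model. This is only a sketch under stated assumptions, not
kernel code: the struct, the function names, the MODEL_ constants and the
sample priority values are all hypothetical. The only things taken from the
thread are the band boundaries quoted in it, together with the FreeBSD
convention that a numerically lower value is a higher priority.

```c
/* Band starts as quoted in the thread; names are hypothetical. */
enum {
    MODEL_KSLEEP_FIRST = 64,   /*  64 - 127: kernel sleep priorities   */
    MODEL_RT_FIRST     = 128,  /* 128 - 159: real-time user threads    */
    MODEL_TS_FIRST     = 160   /* 160 - 223: time-sharing user threads */
};

/* Hypothetical model of a thread: its own priority plus any lent one. */
struct model_thread {
    int base_pri;   /* priority the thread was assigned */
    int lent_pri;   /* best priority lent so far by a blocked waiter */
};

static struct model_thread
model_thread_make(int pri)
{
    struct model_thread t = { pri, pri };
    return t;
}

/* Lower value wins, so the effective priority is the minimum of the two. */
static int
effective_pri(const struct model_thread *t)
{
    return t->lent_pri < t->base_pri ? t->lent_pri : t->base_pri;
}

/* A waiter blocked on a lock that propagates priority lends it to the holder. */
static void
lend_priority(struct model_thread *holder, const struct model_thread *waiter)
{
    int w = effective_pri(waiter);

    if (w < holder->lent_pri)
        holder->lent_pri = w;
}

/* Nonzero if thread a would be chosen to run ahead of thread b. */
static int
runs_before(const struct model_thread *a, const struct model_thread *b)
{
    return effective_pri(a) < effective_pri(b);
}
```

In this model, a lock holder at time-sharing priority 180 that is lent
priority by an RT waiter at 128 ends up at 128, and so still loses to any
thread sitting at a kernel sleep priority such as 100. If the RT waiter
instead had a value numerically below the kernel sleep band (say 60, chosen
only for illustration), the lent priority would let the holder run first.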
From owner-freebsd-arch@FreeBSD.ORG Fri Dec 17 14:13:25 2010 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 130951065670; Fri, 17 Dec 2010 14:13:25 +0000 (UTC) (envelope-from Hartmut.Brandt@dlr.de) Received: from smtp4.dlr.de (smtp3.dlr.de [129.247.252.33]) by mx1.freebsd.org (Postfix) with ESMTP id 605E58FC18; Fri, 17 Dec 2010 14:13:23 +0000 (UTC) Received: from DLREXHUB01.intra.dlr.de ([172.21.152.130]) by smtp4.dlr.de with Microsoft SMTPSVC(6.0.3790.4675); Fri, 17 Dec 2010 15:01:19 +0100 Received: from beagle.kn.op.dlr.de (129.247.178.136) by smtp.dlr.de (172.21.152.151) with Microsoft SMTP Server (TLS) id 14.1.255.0; Fri, 17 Dec 2010 15:01:18 +0100 Date: Fri, 17 Dec 2010 15:01:20 +0100 From: Harti Brandt X-X-Sender: brandt_h@beagle.kn.op.dlr.de To: Robert Watson In-Reply-To: Message-ID: <20101217145756.N2417@beagle.kn.op.dlr.de> References: <20101215230640.K6126@maildrop.int.zabbadoz.net> X-OpenPGP-Key: harti@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" X-Originating-IP: [129.247.178.136] X-OriginalArrivalTime: 17 Dec 2010 14:01:19.0384 (UTC) FILETIME=[E1113980:01CB9DF2] Cc: freebsd-net@FreeBSD.org, "Bjoern A. Zeeb" , freebsd-arch@FreeBSD.org Subject: Re: Future of netnatm: looking for testers X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Harti Brandt List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Dec 2010 14:13:25 -0000 On Fri, 17 Dec 2010, Robert Watson wrote: RW> RW>On Wed, 15 Dec 2010, Bjoern A. Zeeb wrote: RW> RW>> I would request two things: RW>> RW>> 1) the extra couple of months; this will not prevent the evitable removal RW>> yet only defer it. 
RW>
RW>Sounds good to me -- my goal is not to remove NETNATM; rather, to remove
RW>code that doesn't compile or work. I'm happy to sit on this for a while and
RW>see if things improve; fixing the former is great, fixing the latter would be
RW>even better :-).
RW>
RW>(I wonder if Harti is in a situation to test any of this still?)

I have the equipment, but catastrophically no time. If there were a
developer somewhere around the corner here, I could give him a couple of
ATM cards and even a switch.

sorry,
harti

RW>>
RW>> 2) If any of you is using (or wants to be able to (continue to) use) NATM
RW>>    or can test things, I re-enabled it with most of the code in HEAD and
RW>>    the patch is available for 8.x as well, but I need to work with someone
RW>>    to make sure it'll really work. I am willing to spend more time on it
RW>>    if you send me an email.
RW>>
RW>> Best Regards,
RW>> Bjoern
RW>>
RW>> ------------------------------------------------------------------------
RW>> > Author: bz
RW>> > Date: Wed Dec 15 22:58:45 2010
RW>> > New Revision: 216466
RW>> > URL: http://svn.freebsd.org/changeset/base/216466
RW>> >
RW>> > Log:
RW>> >  Bring back (most of) NATM to avoid further bitrot after r186119.
RW>> >  Keep three lines disabled which I am unsure if they had been used at
RW>> >  all.
RW>> >  This will allow us to seek testers and possibly bring it all back.
RW>>
RW>> If you have the ability to test (on 8.x or HEAD) or are using NATM,
RW>> please get in contact with me.
RW>>
RW>>
RW>>
RW>> > Discussed with: rwatson
RW>> > MFC after: 7 weeks
RW>> >
RW>> > Modified:
RW>> >  head/sys/conf/NOTES
RW>> >  head/sys/netinet/if_atm.c
RW>> ------------------------------------------------------------------------
RW>>
RW>> --
RW>> Bjoern A. Zeeb Welcome a new stage of life.
RW>> Going to jail sucks -- All my daemons like it!
RW>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/jails.html RW>> RW>_______________________________________________ RW>freebsd-arch@freebsd.org mailing list RW>http://lists.freebsd.org/mailman/listinfo/freebsd-arch RW>To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" RW> RW> From owner-freebsd-arch@FreeBSD.ORG Fri Dec 17 14:44:29 2010 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 84ADE106564A; Fri, 17 Dec 2010 14:44:29 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 5534F8FC0C; Fri, 17 Dec 2010 14:44:29 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 9566F46B45; Fri, 17 Dec 2010 09:44:28 -0500 (EST) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id B30418A009; Fri, 17 Dec 2010 09:44:27 -0500 (EST) From: John Baldwin To: David Xu Date: Fri, 17 Dec 2010 09:44:27 -0500 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20101102; KDE/4.4.5; amd64; ; ) References: <201012101050.45214.jhb@freebsd.org> <201012170752.06540.jhb@freebsd.org> <4D0B6E54.2070802@freebsd.org> In-Reply-To: <4D0B6E54.2070802@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201012170944.27250.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Fri, 17 Dec 2010 09:44:27 -0500 (EST) X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx Cc: arch@freebsd.org, Sergey Babkin 
Subject: Re: Realtime thread priorities
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Fri, 17 Dec 2010 14:44:29 -0000

On Friday, December 17, 2010 9:06:12 am David Xu wrote:
> John Baldwin wrote:
> > Yes, we do not do priority lending for sleep locks, and to date we never
> > have. This is not a new problem and moving RT priority higher is not
> > introducing any _new_ problems. However, it does bring _new_ functionality
> > that some people need. Just because you don't need it doesn't mean it isn't
> > important.
> >
> > Don't let the perfect be the enemy of the good.
> >
> >
> I guess that your real requirement is preempting at user boundary
> for static priority thread, however current code does not. I doubt that
> preempting in kernel path which holding an unknown lock has any
> visible benefit for your application. Yes, perfect is not the enemy but
> the goal, isn't mutex with priority propagating for perfect ?

Actually, in my case what I need is for some other process that holds a lock
in the kernel that I need to run as soon as possible. I would be fine with
preempting a different thread in the kernel so the thread holding the lock I
need can run.

It is true that in my case I don't really care about the priority of my thread
since it runs on a dedicated CPU. What I really care about is the priority it
lends to other threads. I need that to be higher than just about everything
else.

In my case I could even benefit from priority_propagation() sending an IPI to
a remote CPU so the thread I just lent priority to can run, but that is
somewhat unique due to my use of cpusets. Without a restricted cpuset you
wouldn't need that since the running thread is about to block and will run the
lock holder as the next thread if needed.

-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Sat Dec 18 06:16:47 2010 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CBC7E106564A for ; Sat, 18 Dec 2010 06:16:47 +0000 (UTC) (envelope-from dougb@FreeBSD.org) Received: from mail2.fluidhosting.com (mx23.fluidhosting.com [204.14.89.6]) by mx1.freebsd.org (Postfix) with ESMTP id 5EF9C8FC08 for ; Sat, 18 Dec 2010 06:16:47 +0000 (UTC) Received: (qmail 9696 invoked by uid 399); 18 Dec 2010 05:50:06 -0000 Received: from localhost (HELO doug-optiplex.ka9q.net) (dougb@dougbarton.us@127.0.0.1) by localhost with ESMTPAM; 18 Dec 2010 05:50:06 -0000 X-Originating-IP: 127.0.0.1 X-Sender: dougb@dougbarton.us Message-ID: <4D0C4B8D.5040409@FreeBSD.org> Date: Fri, 17 Dec 2010 21:50:05 -0800 From: Doug Barton Organization: http://SupersetSolutions.com/ User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.13) Gecko/20101210 Thunderbird/3.1.7 MIME-Version: 1.0 To: freebsd-arch@FreeBSD.org X-Enigmail-Version: 1.1.2 OpenPGP: id=1A1ABC84 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@FreeBSD.org, freebsd-current@FreeBSD.org Subject: Discussion about upgrading BIND in RELENG_7 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Dec 2010 06:16:47 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 For those interested in the topic I have opened a discussion about the idea of upgrading BIND in RELENG_7 on freebsd-stable. You can find the post here: http://lists.freebsd.org/pipermail/freebsd-stable/2010-December/060640.html If you have anything to contribute please follow up on that list. Thanks, Doug - -- Nothin' ever doesn't change, but nothin' changes much. 
-- OK Go Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (FreeBSD) iQEcBAEBCAAGBQJNDEuMAAoJEFzGhvEaGryEJMgIAMDmYwcX9vLOd9nh+XnZk/7S WJ2brZWl0BSG57J369rp3wQnQvFh9xMnUdUG2gnMnQOR/JwLKeBsQdcVfxhL6RgT mzelwstEa+OzmS4+cj96jWZYwQN1jyT5jMJCrbdM7JKTTPZG5PhUJTYvy7w68qNm lhzdBbyQnw5iVKv/tsCU7m1ioSa7Aq1fRgj7O5/GkBAfcXSrF31S66LxRPtcM5OP Ebxk4ttmJxZx5HXbQkU8xMhluYGvUaVt2quUks7mqkJ83NR6wNyL5A7WiP5aRHoA UWj5bfCiACnblRRL89d2jT858okQ/eeqUBMZ8DQLzlOSKB/FNJxj7iIL+6+evX0= =22Q7 -----END PGP SIGNATURE-----