From: John Baldwin <jhb@freebsd.org>
To: David Xu
Cc: arch@freebsd.org
Subject: Re: Realtime thread priorities
Date: Mon, 13 Dec 2010 09:37:46 -0500
Message-Id: <201012130937.46666.jhb@freebsd.org>
In-Reply-To: <4D02D90C.20503@freebsd.org>
List-Id: Discussion related to FreeBSD architecture <freebsd-arch@freebsd.org>

On Friday, December 10, 2010 8:51:08 pm David Xu wrote:
> John Baldwin wrote:
> > So I finally had a case today where I wanted to use rtprio, but it
> > doesn't seem very useful in its current state. Specifically, I want to
> > be able to tag certain user processes as being more important than any
> > other user processes, even to the point that if one of my important
> > processes blocks on a mutex, the owner of that mutex should be more
> > important than sshd being woken up from sbwait by new data (for
> > example). This doesn't work currently with rtprio due to the way the
> > priorities are laid out (and I believe I probably argued for the
> > current layout back when it was proposed).
> >
> > The current layout breaks up the global thread priority space (0 - 255)
> > into a couple of bands:
> >
> >   0 - 63  : interrupt threads
> >  64 - 127 : kernel sleep priorities (PSOCK, etc.)
> > 128 - 159 : real-time user threads (rtprio)
> > 160 - 223 : time-sharing user threads
> > 224 - 255 : idle threads (idprio and kernel idle procs)
> >
> > The problem I am running into is that when a time-sharing thread goes
> > to sleep in the kernel (waiting on select, socket data, tty, etc.), it
> > actually ends up in the kernel priorities range (64 - 127). This means
> > that when it wakes up it will trump (and preempt) a real-time user
> > thread, even though these processes nominally have a priority down in
> > the 160 - 223 range. We do drop the kernel sleep priority during
> > userret(), but we don't recheck the scheduler queues to see if we
> > should preempt the thread during userret(), so it effectively runs
> > with the kernel sleep priority for the rest of the quantum while it is
> > in userland.
> >
> > My first question is whether this behavior is the desired behavior.
> > Originally I think I preferred the current layout because I thought a
> > thread in the kernel should always have priority so it can release
> > locks, etc. However, priority propagation should actually handle the
> > case of some very important thread needing a lock.
> > In my use case today where I actually want to use rtprio, I think I
> > want different behavior, where the rtprio thread is more important
> > than the thread waking up with PSOCK, etc.
> >
> > If we decide to change the behavior, I see two possible fixes:
> >
> > 1) (easy) Just move the real-time priority range above the kernel
> > sleep priority range.
>
> This is not always correct: a userland realtime process may not always
> be more urgent than normal time-sharing code which is backing up a file
> system or doing some important things, for example receiving money
> account data from a socket.

Err, no. When a user has indicated that a process is rtprio, we should
assume that it is _always_ more important than a time-sharing process.
The sole exception to this is when lending priority, and that purpose is
to let the rtprio thread (or ithread) run as soon as possible anyway.

> A process sleeping in the kernel may seem to be doing something really
> important, for example removing data from a device interrupt or writing
> to a device, while a realtime thread consuming 100% CPU time might just
> be a thread stuck in a loop.

Note that rtprio requires root to enable, so if the user wants to rtprio
a buggy process, that is their problem. However, with our current system
there is _no_ way for me to ensure that my very important process doesn't
get preempted by a new sshd process. Or, more accurately, even if my very
important process has a dedicated CPU via cpuset but it blocks on a
mutex, the priority it lends to the owner of that mutex is not sufficient
to prevent the lock holder from being preempted by sshd when a new ssh
connection arrives. The fact that ULE uses RT priority levels for
interactive threads is also problematic for me, since even with RT moved
above the kernel sleep levels, sshd still ends up with an effective
real-time priority of 0.

> > 2) (harder) Make sched_userret() check the run queue to see if it
> > should preempt when dropping the kernel sleep priority. I think bde@
> > has suggested that we should do this for correctness previously (and
> > I've had some old, unfinished patches to do this in a branch in p4 for
> > several years).
>
> This has too much overhead; try it and benchmark it with a real world
> application.

It depends on your real world application, but I do lean towards (1) as I
think (2) is too expensive.

-- 
John Baldwin