Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 15 Apr 1999 08:43:09 -0500
From:      "Richard Seaman, Jr." <dick@tar.com>
To:        hackers@freebsd.org
Cc:        bde@freebsd.org
Subject:   Posix sched-*() functions and other scheduling issues
Message-ID:  <19990415084309.B442@tar.com>

next in thread | raw e-mail | index | archive | help
Sometime around the beginning of the year, Peter Dufault did quite
a bit of work revamping the posix priority code (sched_*() functions)
as well as fixing other problems in the scheduling code.  The results
are posted at

 ftp://ftp.freebsd.org/pub/FreeBSD/development/misc/PATCHES.sched

In a recent posting to the -hackers mailing list, he indicated he
would not commit the patches any time soon in part because he 
lacked the time to support them, and in part due to objections
from a core team member that the proposed changes were too
invasive.  In a followup posting He also suggested that someone
might want to rework them into a smaller set of patches.

I have believed that these patches solve a number of problems,
and have extracted (and very slightly modified) a portion of
his patches to address the problems noted below.  These modified
patches are available at

 http://lt.tar.com/sched.diff

I hope that some people will test them, and that if they test
well, that someone will commit them.  I will submit a PR with
the patches if I receive reports that they work ok (they
do here).

(1) donice() in kern_resource.c
-------------------------------
After the nice value is adjusted, maybe_resched() is called
indirectly via resetpriority().  However, the value of p_priority
does not get updated until after the call, so that maybe_resched()
does its test based on the old p_priority value.  The result is
that if rescheduling is warranted (ie. want_resched should be
set), it doesn't happen.

Also, if the process is on a run queue, it is not put on a new
run queue, if warranted by the priority change.  As result the
process will continue to be scheduled based on its old priority
until is actually selected to run.


(2) rtprio() in kern_resource.c
-------------------------------
If the priority of a process is adjusted by rtprio(), there is no
check made to see if want_resched should be set due to the 
change (maybe_resched() is never called).  In addition, if
the process is on a runqueue, its position on the runqueue will
not reflect its new priority and it will be scheduled based on
its old priority until it is selected to run.

In addition, it would be nice to use rtprio() to adjust real and
idle time priorities in the posix sched_*() functions.  However,
the only interface to rtprio() is for the rtprio() syscall, which
does copyin/copyout of paramters.  It would be helpful to
split this function, creating a "dortprio" function, analagous
to donice(), that can be called internally in the kernel.


(3) maybe_resched() in kern_synch.c
-------------------------------
If the process being checked (chk) is not the current process,
the result of maybe_resched is correct.  But, if the process
being checked is the current process, the result is the opposite
of what if should be (ie. the current process should be 
considered for rescheduling if its priority is made less favorable).

In addition, when checking whether a non-RTP_PRIO_NORMAL process
should be rescheduled, the value of proc->p_priority is used for
the test.  This value is meaningless for such processes.  In
addition, when the the process being checked is the current
process, there is no value maintained of what the previous priority
was (for non-RTP_PRIO_NORMAL processes), so if the comparison
is corrected, there is nothing to compare against.


(4) curpriority (global variable)
--------------------------------
The value of proc->p_priority is stored in a global variable curpriority
for the current process as it exists the kernel via userret(). The value of
curpriority is used in maybe_resched() to decide if a process should be
rescheduled.  This implicitly assumes that the curpriority of the last
process to exit the kernel is the proper basis for comparison.  On a
uniprocessor system curpriority should also be the value of curproc->p_priority.
Its not clear to me what this value represents on an SMP system.

If a global variable such as this is to be maintained, it should also reflect
the rtprio values of the process, so the comparison in maybe_resched() reflects
the rtprio values for rtp processes.

However, I wonder if its better to maintain this value on a per-proc
basis?  I don't really understand what the global variable curpriority
really means in an SMP environment.  It seems to me this could be the
p_priority value of the process running on a different processor if that
process was the last to exist the kernel.  I wonder if this is the 
comparison that is intended? 

	Note: The modified patches do not address the SMP issue I have
	raised here.  They follow the existing pattern of using a 
	global variable (but modified to reflect rtp information).

(5) resetpriority() in kern_synch.c
-----------------------------------
This function adjusts the value of p_usrpri and then calls maybe_resched().
However, maybe_resched() does not use the values of p_usrpri, but
only the value of p_priority (for a RTP_PRIO_NORMAL process).  Since 
p_priority has not generally been updated at this point, maybe_resched()
may generate the wrong result.  The proper sequence would appear to be:
	resetpriority() /* which updates p_usrpri */
	if (on a run queue) {
	     remove from the existing run queue /* uses old p_priority */
	     proc->p_priority = proc->p_usrpri
	     put on new run queue /* uses new p_priority */
	} else {
	     proc->p_priority = proc->p_usrpri
	}
	maybe_resched() /* uses updated p_priority to compare with 
	                 * previously stored value of curpriority 
	                 */



(6) statclock(), schedcpu(), forwarded_statclock() in kern_synch.c
kern_clock.c and mp_machdep.c)
------------------------------------------------------------------
Variables such as proc->p_estcpu, proc->p_usrpri, and proc->p_priority
which are used exclusively by RTP_PRIO_NORMAL processes are maintained
and adjusted for non-RTP_PRIO_NORMAL processes as well.  This is not
necessarily wrong, since these values could be used again if the
process priority class is reset to RTP_PRIO_NORMAL.  There is a
policy decision though, as to what these values should be upon return
to class RTP_PRIO_NORMAL.

	Note: The modified patches do not change the current code
	on this point, whereas Peter's patches modified the code
	to only update these statistics if the process in question
	was RTP_PRIO_NORMAL.


(7) sched_setparam() sched_getparam() in ksched.c
-------------------------------------------------
sched_setparam treats SCHED_OTHER as EINVAL, while sched_getparam
does not.  sched_getparam does not set the return value for the
priority in the case of SCHED_OTHER and does not return an error
value to indicate the return value is invalid.

This is also inconsistent with sched_setscheduler which allows
setting of priority for SCHED_OTHER.

(8) sched_get_priority_max() in ksched.c
----------------------------------------
The values returned for all cases are wrong.


(9) sched_get_priority_min() in ksched.c
----------------------------------------
The value returned for SCHED_OTHER is wrong.


(10) sched_setscheduler() in ksched.c
-------------------------------------
Allows a priority to be set for SCHED_OTHER (which is equated to
RTP_PRIO_NORMAL), but the value set is rtprio.prio which is not
not used in any way by RTP_PRIO_NORMAL processes, so this is
meaningless. Nor is the value range checked.

Also, after priority reset, if the process is on a run queue, there
is no adjustment of the run queue.  This is a similar issue to
the donice and rtprio problems noted above.

Also, want_resched is always set after a priority adjustment,
whether the priority change necessitates it or not
(need_resched() is called instead of maybe_resched()).


(11) sched_*() functions generally, see ksched.c and p1003_1b.c
---------------------------------------------------------------
The permission gate implemented here is much more restrictive
than the permission gates for setpriority() and rtprio(), even
though these functions do not implement any more functionality.
In general the caller must be uid 0 to execute any of these 
functions.  

Most apps that use these functions expect to be able to execute
all but the sched_set*() functions as a normal user.  This causes
problems in porting applications that use the sched_*() 
interface.  If these functions are completely implemented in
terms of donice() and dortprio(), they could simply use the permission
gate already in place for these functions.

	Note: The modified patches leave the code for the separate
	permission gate in place, but effectively disable it. See
	the CAN_AFFECT macro in p1003_1b.c.  The permission gates
	in donice and dortprio are used instead.

Also, the sched_*() functions currently require a custom kernel.
These functions are the primary scheduling interface for Linux,
so that some linux applications running in Linux emu require them.
Also, since these are POSIX and Unix98 interfaces, a number of
applications use them, which affects ported applications.

Properly implemented, theses should add *very little* code to the
kernel since they can easily be implemented in terms of donice()
and dortprio().  Therefore it would be very desireable to include
the necessary options in the GENERIC kernel.
 
Also, these functions do not have any way of dealing with
RTP_PRIO_IDLE processes, and thus are an incomplete
interface to the FreeBSD priority scheduling system.

-- 
Richard Seaman, Jr.           email: dick@tar.com
5182 N. Maple Lane            phone: 414-367-5450
Chenequa WI 53058             fax:   414-367-5852


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19990415084309.B442>