Date: Thu, 15 Apr 1999 08:43:09 -0500 From: "Richard Seaman, Jr." <dick@tar.com> To: hackers@freebsd.org Cc: bde@freebsd.org Subject: Posix sched-*() functions and other scheduling issues Message-ID: <19990415084309.B442@tar.com>
next in thread | raw e-mail | index | archive | help
Sometime around the beginning of the year, Peter Dufault did quite a bit of work revamping the posix priority code (sched_*() functions) as well as fixing other problems in the scheduling code. The results are posted at ftp://ftp.freebsd.org/pub/FreeBSD/development/misc/PATCHES.sched In a recent posting to the -hackers mailing list, he indicated he would not commit the patches any time soon in part because he lacked the time to support them, and in part due to objections from a core team member that the proposed changes were too invasive. In a followup posting He also suggested that someone might want to rework them into a smaller set of patches. I have believed that these patches solve a number of problems, and have extracted (and very slightly modified) a portion of his patches to address the problems noted below. These modified patches are available at http://lt.tar.com/sched.diff I hope that some people will test them, and that if they test well, that someone will commit them. I will submit a PR with the patches if I receive reports that they work ok (they do here). (1) donice() in kern_resource.c ------------------------------- After the nice value is adjusted, maybe_resched() is called indirectly via resetpriority(). However, the value of p_priority does not get updated until after the call, so that maybe_resched() does its test based on the old p_priority value. The result is that if rescheduling is warranted (ie. want_resched should be set), it doesn't happen. Also, if the process is on a run queue, it is not put on a new run queue, if warranted by the priority change. As result the process will continue to be scheduled based on its old priority until is actually selected to run. (2) rtprio() in kern_resource.c ------------------------------- If the priority of a process is adjusted by rtprio(), there is no check made to see if want_resched should be set due to the change (maybe_resched() is never called). In addition, if the process is on a runqueue, its position on the runqueue will not reflect its new priority and it will be scheduled based on its old priority until it is selected to run. In addition, it would be nice to use rtprio() to adjust real and idle time priorities in the posix sched_*() functions. However, the only interface to rtprio() is for the rtprio() syscall, which does copyin/copyout of paramters. It would be helpful to split this function, creating a "dortprio" function, analagous to donice(), that can be called internally in the kernel. (3) maybe_resched() in kern_synch.c ------------------------------- If the process being checked (chk) is not the current process, the result of maybe_resched is correct. But, if the process being checked is the current process, the result is the opposite of what if should be (ie. the current process should be considered for rescheduling if its priority is made less favorable). In addition, when checking whether a non-RTP_PRIO_NORMAL process should be rescheduled, the value of proc->p_priority is used for the test. This value is meaningless for such processes. In addition, when the the process being checked is the current process, there is no value maintained of what the previous priority was (for non-RTP_PRIO_NORMAL processes), so if the comparison is corrected, there is nothing to compare against. (4) curpriority (global variable) -------------------------------- The value of proc->p_priority is stored in a global variable curpriority for the current process as it exists the kernel via userret(). The value of curpriority is used in maybe_resched() to decide if a process should be rescheduled. This implicitly assumes that the curpriority of the last process to exit the kernel is the proper basis for comparison. On a uniprocessor system curpriority should also be the value of curproc->p_priority. Its not clear to me what this value represents on an SMP system. If a global variable such as this is to be maintained, it should also reflect the rtprio values of the process, so the comparison in maybe_resched() reflects the rtprio values for rtp processes. However, I wonder if its better to maintain this value on a per-proc basis? I don't really understand what the global variable curpriority really means in an SMP environment. It seems to me this could be the p_priority value of the process running on a different processor if that process was the last to exist the kernel. I wonder if this is the comparison that is intended? Note: The modified patches do not address the SMP issue I have raised here. They follow the existing pattern of using a global variable (but modified to reflect rtp information). (5) resetpriority() in kern_synch.c ----------------------------------- This function adjusts the value of p_usrpri and then calls maybe_resched(). However, maybe_resched() does not use the values of p_usrpri, but only the value of p_priority (for a RTP_PRIO_NORMAL process). Since p_priority has not generally been updated at this point, maybe_resched() may generate the wrong result. The proper sequence would appear to be: resetpriority() /* which updates p_usrpri */ if (on a run queue) { remove from the existing run queue /* uses old p_priority */ proc->p_priority = proc->p_usrpri put on new run queue /* uses new p_priority */ } else { proc->p_priority = proc->p_usrpri } maybe_resched() /* uses updated p_priority to compare with * previously stored value of curpriority */ (6) statclock(), schedcpu(), forwarded_statclock() in kern_synch.c kern_clock.c and mp_machdep.c) ------------------------------------------------------------------ Variables such as proc->p_estcpu, proc->p_usrpri, and proc->p_priority which are used exclusively by RTP_PRIO_NORMAL processes are maintained and adjusted for non-RTP_PRIO_NORMAL processes as well. This is not necessarily wrong, since these values could be used again if the process priority class is reset to RTP_PRIO_NORMAL. There is a policy decision though, as to what these values should be upon return to class RTP_PRIO_NORMAL. Note: The modified patches do not change the current code on this point, whereas Peter's patches modified the code to only update these statistics if the process in question was RTP_PRIO_NORMAL. (7) sched_setparam() sched_getparam() in ksched.c ------------------------------------------------- sched_setparam treats SCHED_OTHER as EINVAL, while sched_getparam does not. sched_getparam does not set the return value for the priority in the case of SCHED_OTHER and does not return an error value to indicate the return value is invalid. This is also inconsistent with sched_setscheduler which allows setting of priority for SCHED_OTHER. (8) sched_get_priority_max() in ksched.c ---------------------------------------- The values returned for all cases are wrong. (9) sched_get_priority_min() in ksched.c ---------------------------------------- The value returned for SCHED_OTHER is wrong. (10) sched_setscheduler() in ksched.c ------------------------------------- Allows a priority to be set for SCHED_OTHER (which is equated to RTP_PRIO_NORMAL), but the value set is rtprio.prio which is not not used in any way by RTP_PRIO_NORMAL processes, so this is meaningless. Nor is the value range checked. Also, after priority reset, if the process is on a run queue, there is no adjustment of the run queue. This is a similar issue to the donice and rtprio problems noted above. Also, want_resched is always set after a priority adjustment, whether the priority change necessitates it or not (need_resched() is called instead of maybe_resched()). (11) sched_*() functions generally, see ksched.c and p1003_1b.c --------------------------------------------------------------- The permission gate implemented here is much more restrictive than the permission gates for setpriority() and rtprio(), even though these functions do not implement any more functionality. In general the caller must be uid 0 to execute any of these functions. Most apps that use these functions expect to be able to execute all but the sched_set*() functions as a normal user. This causes problems in porting applications that use the sched_*() interface. If these functions are completely implemented in terms of donice() and dortprio(), they could simply use the permission gate already in place for these functions. Note: The modified patches leave the code for the separate permission gate in place, but effectively disable it. See the CAN_AFFECT macro in p1003_1b.c. The permission gates in donice and dortprio are used instead. Also, the sched_*() functions currently require a custom kernel. These functions are the primary scheduling interface for Linux, so that some linux applications running in Linux emu require them. Also, since these are POSIX and Unix98 interfaces, a number of applications use them, which affects ported applications. Properly implemented, theses should add *very little* code to the kernel since they can easily be implemented in terms of donice() and dortprio(). Therefore it would be very desireable to include the necessary options in the GENERIC kernel. Also, these functions do not have any way of dealing with RTP_PRIO_IDLE processes, and thus are an incomplete interface to the FreeBSD priority scheduling system. -- Richard Seaman, Jr. email: dick@tar.com 5182 N. Maple Lane phone: 414-367-5450 Chenequa WI 53058 fax: 414-367-5852 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19990415084309.B442>