Date: Thu, 26 Mar 2015 18:43:08 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Dmitry Chagin <dchagin@freebsd.org>
Cc: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: Re: svn commit: r280674 - user/dchagin/lemul/sys/compat/linux
Message-ID: <20150326174425.N2380@besplex.bde.org>
In-Reply-To: <201503260636.t2Q6aZhH078741@svn.freebsd.org>
References: <201503260636.t2Q6aZhH078741@svn.freebsd.org>
On Thu, 26 Mar 2015, Dmitry Chagin wrote:

> Log:
>   Linux nanosleep() and clock_nanosleep() system calls always
>   writes the remaining time into the structure pointed to by rmtp
>   unless rmtp is NULL. The value of *rmtp can then be used to call
>   nanosleep() again and complete the specified pause if the previous
>   call was interrupted.
>
>   Note. clock_nanosleep() with an absolute time value does not write
>   the remaining time.

FreeBSD doesn't even have clock_nanosleep().  It also sleeps on a wrong
clock id (CLOCK_MONOTONIC instead of CLOCK_REALTIME) in nanosleep().
clock_nanosleep() exists mainly because CLOCK_REALTIME is usually the
wrong clock to sleep on, but is the one specified for nanosleep() for
historical reasons.

It is stupid for the emulator to have clock_nanosleep() before the host
system.  The emulator doesn't seem to have it either.  It seems to just
use the native nanosleep(), so it sleeps on the wrong clock id for all
except CLOCK_MONOTONIC.

> Modified: user/dchagin/lemul/sys/compat/linux/linux_time.c
> ==============================================================================
> --- user/dchagin/lemul/sys/compat/linux/linux_time.c	Thu Mar 26 06:00:42 2015	(r280673)
> +++ user/dchagin/lemul/sys/compat/linux/linux_time.c	Thu Mar 26 06:36:34 2015	(r280674)
> ...
> @@ -490,25 +488,19 @@ linux_nanosleep(struct thread *td, struc
>  		return (error);
>  	}
>  	error = kern_nanosleep(td, &rqts, rmtp);

The emulator could fix up cases where the sleep on the wrong clock is too
short, by calculating the error and sleeping again.  It doesn't do this,
but if it just used the correct clock id for calculating the remaining
time for the purpose of returning it, then it would know the time to
sleep again (when there is no error but the remaining time is > 0).
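The "sleep again" fixup described above can be sketched in userland C. This is only an illustration under stated assumptions, not the emulator's code: sleep_on_clock(), time_left(), ts_add() and ts_sub() are hypothetical names, and plain nanosleep() stands in for kern_nanosleep(). The remaining time is always measured on the clock the caller asked for, and the sleep is repeated while that remainder is > 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Return a + b, assuming both inputs are normalized. */
static struct timespec
ts_add(struct timespec a, struct timespec b)
{
	struct timespec r;

	r.tv_sec = a.tv_sec + b.tv_sec;
	r.tv_nsec = a.tv_nsec + b.tv_nsec;
	if (r.tv_nsec >= NSEC_PER_SEC) {
		r.tv_sec++;
		r.tv_nsec -= NSEC_PER_SEC;
	}
	return (r);
}

/* Return a - b; tv_sec is negative when a is earlier than b. */
static struct timespec
ts_sub(struct timespec a, struct timespec b)
{
	struct timespec r;

	r.tv_sec = a.tv_sec - b.tv_sec;
	r.tv_nsec = a.tv_nsec - b.tv_nsec;
	if (r.tv_nsec < 0) {
		r.tv_sec--;
		r.tv_nsec += NSEC_PER_SEC;
	}
	return (r);
}

/* Time left until `end` as measured on clock `id`, clamped at zero. */
static int
time_left(clockid_t id, struct timespec end, struct timespec *left)
{
	struct timespec now;

	if (clock_gettime(id, &now) != 0)
		return (-1);
	*left = ts_sub(end, now);
	if (left->tv_sec < 0) {
		left->tv_sec = 0;
		left->tv_nsec = 0;
	}
	return (0);
}

/*
 * Hypothetical helper: sleep for *req as measured on clock `id`, even
 * if nanosleep() itself counts time on some other clock.  Returns 0 on
 * success, or -1 with errno set; *rem (if not NULL) receives the time
 * remaining on clock `id`.
 */
int
sleep_on_clock(clockid_t id, const struct timespec *req, struct timespec *rem)
{
	struct timespec now, end, left;
	int saved;

	assert(req != NULL);
	if (req->tv_sec < 0 || req->tv_nsec < 0 ||
	    req->tv_nsec >= NSEC_PER_SEC) {
		errno = EINVAL;
		return (-1);
	}
	if (clock_gettime(id, &now) != 0)
		return (-1);
	end = ts_add(now, *req);
	for (;;) {
		if (time_left(id, end, &left) != 0)
			return (-1);
		if (rem != NULL)
			*rem = left;
		if (left.tv_sec == 0 && left.tv_nsec == 0)
			return (0);
		/* Too short on the wanted clock so far: sleep again. */
		if (nanosleep(&left, NULL) != 0) {
			saved = errno;
			if (rem != NULL && time_left(id, end, &left) == 0)
				*rem = left;
			errno = saved;
			return (-1);
		}
	}
}
```

This only corrects sleeps that come out too short on the wanted clock. A sleep that is already too long cannot be fixed up after the fact.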
> @@ -558,24 +550,19 @@ linux_clock_nanosleep(struct thread *td,
>  		return (error);
>  	}
>  	error = kern_nanosleep(td, &rqts, rmtp);

Linux apparently uses the correct clock id for nanosleep(), since this
function only tries to support that clock id (LINUX_CLOCK_REALTIME).
The function has already returned EINVAL for other clock ids.  But since
FreeBSD nanosleep() is broken, the only clock id that can be supported
here is actually LINUX_CLOCK_MONOTONIC.

Strictly, not even that.  FreeBSD used to use getnanouptime() in
nanosleep(), so nanosleep() only gave 1/HZ resolution.  There is a clock
id CLOCK_MONOTONIC_FAST for this, so nanosleep() strictly only used that
clock id, and if clock_nanosleep() were implemented then it should use
getnanouptime() precisely when the caller uses this clock id.
Presumably Linux doesn't have LINUX_CLOCK_MONOTONIC_FAST to map to the
FreeBSD mistake CLOCK_MONOTONIC_FAST, so the best this function can do is
map to CLOCK_MONOTONIC.

Changing to this would probably break more than it fixes.  Apparently,
Linux software does use clock_nanosleep(), but with CLOCK_REALTIME, else
the error for using it with CLOCK_MONOTONIC would be noticed.
clock_nanosleep() is not very useful without TIMER_ABSTIME since it is
only a verbose spelling of nanosleep() then.  It is useful with
TIMER_ABSTIME, but that case is not supported.  Perhaps the software uses
clock_nanosleep() because it wants to use TIMER_ABSTIME someday when that
is supported.

Now, nanosleep() uses sbintimes so it isn't clear what clock id it sleeps
on.  The clock is still monotonic and not affected by leap seconds or
suspensions.  Its resolution seems to be much lower than 1/HZ even when
HZ is 100.  "sleep 1" often sleeps for 65 milliseconds extra on freefall,
but shorter sleeps rarely sleep by more than 3 milliseconds extra.
> -	if (error != 0) {
> -		LIN_SDT_PROBE1(time, linux_clock_nanosleep, nanosleep_error,
> -		    error);
> -		LIN_SDT_PROBE1(time, linux_clock_nanosleep, return, error);
> -		return (error);
> -	}
> -
>  	if (args->rmtp != NULL) {
> +		/* XXX. Not for TIMER_ABSTIME */

TIMER_ABSTIME has already been handled (by XXXing and returning an error).

TIMER_ABSTIME is only very useful with CLOCK_REALTIME.  It allows sleeping
until a specified real time without being messed up by leap seconds and
clock steps.  For CLOCK_MONOTONIC, it is not even clear what an absolute
time is.  I think POSIX specifies CLOCK_MONOTONIC to increment as close
to physical seconds as it can, but in FreeBSD it stops incrementing
during suspend.

Bruce
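For reference, the TIMER_ABSTIME case with CLOCK_REALTIME looks roughly like this on a system that provides POSIX clock_nanosleep() (an assumption here; Linux has it, and as noted above FreeBSD did not at the time of writing). sleep_until_realtime() is a hypothetical name used only for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: sleep until the absolute CLOCK_REALTIME time
 * *deadline.  clock_nanosleep() returns the error number directly
 * instead of setting errno.  With TIMER_ABSTIME the same deadline can
 * simply be passed again after EINTR, so no remaining time is needed.
 */
int
sleep_until_realtime(const struct timespec *deadline)
{
	int error;

	assert(deadline != NULL);
	do {
		error = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
		    deadline, NULL);
	} while (error == EINTR);
	return (error);
}
```

Because the deadline is absolute, restarting after a signal does not accumulate drift the way repeated relative sleeps do, and a deadline that has already passed returns immediately.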