Date:      Tue, 22 Feb 2011 10:22:24 +0100
From:      Jerome Flesch <jerome.flesch@netasq.com>
To:        Chuck Swiger <cswiger@mac.com>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Process timing issue
Message-ID:  <4D638050.2010906@netasq.com>
In-Reply-To: <8C8FE4A5-F031-466A-9CB8-46D79EEA280D@mac.com>
References:  <4D6291A5.4050206@netasq.com> <8C8FE4A5-F031-466A-9CB8-46D79EEA280D@mac.com>

> On Feb 21, 2011, at 8:24 AM, Jerome Flesch wrote:
>> While investigating a timing issue with one of our programs, we found something weird: we wrote a small test program that just calls clock_gettime() many times and checks that the time difference between calls makes sense. It turns out that it doesn't always.
>>
>> Calling clock_gettime() twice in a row usually takes less than 1ms. But under an average load, about 1 time in 200000, more than 10ms elapse between the two calls for no apparent reason. According to our tests, when it happens, the time between the two calls can range from a few milliseconds to many seconds (our best score so far is 10 seconds :). The same goes for gettimeofday().
>
> A scheduler quantum of 10ms (or HZ=100) is a common granularity; probably some other process got the CPU and your timer process didn't run until the next or some later scheduler tick.  If you are maxing out the available CPU by running many "openssl speed" tasks, then this behavior is more-or-less expected.
>
We did most of our tests with kern.hz=1000 (the default FreeBSD value as 
far as I know) and we also tried with kern.hz=2000 and kern.hz=10000. It 
didn't change a thing.

Also, we are talking about a process not being scheduled for more than 
100ms with only 1 instance of openssl on the same CPU core. Even with a 
scheduler quantum of 10ms, I find that worrying :/

We expected both processes (the test program and openssl) to each get
half the CPU time and to be scheduled quite often (at least once every
10ms). According to the output of our test program, it works fine for
most of the calls to clock_gettime(), but from time to time (about 1
loop in 200000 on my computer), we have a latency spike (>= 100ms).

Thing is, these spikes wouldn't worry us much if they didn't last
longer than 1s, but on some occasions they do.


>> We tried setting the test program to the highest priority possible (rtprio(REALTIME, RTP_PRIO_MAX)) and it doesn't seem to change a thing.
>>
>
>> Does anyone know if there is a reason for this behavior? Is there something that can be done to improve things?
>
> FreeBSD doesn't offer hard realtime guarantees, and it values maximizing throughput for all tasks more than it does providing absolute minimum latency even for something wanting RT.  There has been some discussion in the past about making RT tasks with very high priority less pre-emptible by lower priority tasks by removing or reducing the priority lowering that occurs when a task gets allocated CPU time.
>
> What problem are you trying to solve where continuous CPU load and minimum latency realtime are both required?
>
We are not looking for hard realtime guarantees. Most of our tests were
done at normal priority. Using realtime priority on our test program
was just an attempt to see whether it improves things. From what I can
tell, it doesn't :/


> Regards,



