Date:      Thu, 20 Aug 1998 22:37:34 +0100
From:      Brian Somers <brian@Awfulhak.org>
To:        Brian Feldman <green@unixhelp.org>
Cc:        Poul-Henning Kamp <phk@critter.freebsd.dk>, Terry Lambert <tlambert@primenet.com>, bde@zeta.org.au, freebsd-current@FreeBSD.ORG, jwd@unx.sas.com
Subject:   Re: 13 months of user time? 
Message-ID:  <199808202137.WAA03270@awfulhak.org>
In-Reply-To: Your message of "Thu, 20 Aug 1998 02:08:08 EDT." <Pine.BSF.4.02.9808200203190.24018-100000@zone.syracuse.net> 

> Okay, how about we try out Mike's idea? Someone who experiences the
> SIGXCPU kill problem could try putting the following in kern/kern_synch.c
> line 638:
> if (switchtime.tv_usec < p->p_switchtime.tv_usec ||
>     switchtime.tv_sec < p->p_switchtime.tv_sec)
> 	panic("bogus microuptime twiddling");

I had an ``if I was going to SIGXCPU, output the above values'' 
diagnostic in my kernel, and in all cases, switchtime.tv_usec was 
less than p->p_switchtime.tv_usec (tv_sec was the same for each var). 
Also (just for the record), the tv_usec values were *never* >1000000.
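
Note that the test quoted above ORs the two fields, so it would also 
fire in the ordinary case where tv_sec has advanced and tv_usec has 
started again from a smaller value.  A sketch of a comparison that 
looks at seconds first (plain user-space C for illustration only; 
tv_went_backwards is a made-up name, not something in the tree):

```c
#include <sys/time.h>

/*
 * Return nonzero if 'now' is strictly earlier than 'prev'.
 * Seconds are compared first; microseconds only break a tie.
 */
static int
tv_went_backwards(const struct timeval *now, const struct timeval *prev)
{
	if (now->tv_sec != prev->tv_sec)
		return (now->tv_sec < prev->tv_sec);
	return (now->tv_usec < prev->tv_usec);
}
```

With equal tv_sec values, as in the diagnostics described here, this 
reduces to the plain tv_usec comparison.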

From what I can see, and given that the tv_sec values != 0 (which my 
diagnostics confirmed), p->p_switchtime is being copied from 
switchtime in mi_switch(), and then being compared at a later point 
(also in mi_switch()).  ``switchtime'' at this point HAS GONE 
BACKWARDS.  This means that successive calls to microuptime() are 
filling the passed variables with non-increasing values.  This fits 
with what happens at the only other call to microuptime() in 
/sys/kern: others are seeing the ``calcru: negative time...'' error, 
which should be impossible if microuptime() only ever increases 
(shouldn't it?).

*If* microuptime() is returning non-increasing values under certain 
circumstances, then that means that either the timecounter pointer is 
being mis-optimised because it's not volatile (phk has pooh-poohed 
that idea though - I'm not sure why, but he's probably right, as 
tc[1] and tc[2] are the only values that *should* be getting pointed 
at as actual time values), *OR* that the amount that tv_usec 
is adjusted by is > LONG_MAX or < 0 (I think this is impossible as 
tc_scale_micro is assigned as something divided by 1000) *OR* 
tco_delta() is returning non-increasing values...... hmm

In /sys/i386/isa/clock.c, should i8254_offset be reset after it's 
added to ``count''?  What happens when i8254_offset wraps?  Might 
this be the problem?  Would it only be a problem for machines that 
have an irregular clock heart-beat, sometimes allowing loads of calls 
to i8254_get_timecount() before clkintr() happens?
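
On the wrap question alone: if the consumer only ever uses the 
difference between two unsigned readings (an assumption here, not 
something checked against clock.c), then the accumulator wrapping is 
harmless by itself, because unsigned subtraction in C is modulo 2^32.  
A sketch, with counter_delta as a hypothetical helper:

```c
#include <stdint.h>

/*
 * Elapsed counts between two readings of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so the result is still right if
 * the counter wrapped once between 'prev' and 'now'.
 */
static uint32_t
counter_delta(uint32_t now, uint32_t prev)
{
	return (now - prev);
}
```

The flip side is that a reading which has slipped back by only 60 
counts comes out of this arithmetic as a delta of 4294967236, i.e. a 
huge forward jump rather than a small negative one.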

I reckon a diagnostic in microuptime() that compares the value 
assigned to *tv with the previous value and moans if they decrease 
may prove informative.... and maybe a similar thing in 
i8254_get_timecount() - the machine I was having problems with was 
running apm, so it used the i8254 timecounter rather than the tsc 
counter.
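
A rough user-space sketch of that kind of diagnostic, assuming each 
value is handed to a checker right after it is produced (check_reading 
and last_us are invented names; a kernel version would use the 
kernel's printf() and look different in detail):

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/* Previous reading in microseconds; -1 means nothing recorded yet. */
static int64_t last_us = -1;

/*
 * Record 'tv' as the latest reading.  If it is earlier than the
 * previous one, complain and return how far it went back (in
 * microseconds); otherwise return 0.
 */
static int64_t
check_reading(const struct timeval *tv)
{
	int64_t us, step = 0;

	us = (int64_t)tv->tv_sec * 1000000 + tv->tv_usec;
	if (last_us >= 0 && us < last_us) {
		step = last_us - us;
		fprintf(stderr, "time went backwards by %lld us\n",
		    (long long)step);
	}
	last_us = us;
	return (step);
}
```

Reporting the size of the step, not just the fact of it, may help tell 
a small jitter apart from a whole missed tick.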

> And see if we get some nice panics and cores. Is it worth a shot? I've
> never gotten a SIGXCPU out of place, so my machine wouldn't be the one to
> test this on.

Same here.  The machine I had that did this was given back to the 
shop.

> Cheers,
> Brian Feldman
> green@unixhelp.org

-- 
Brian <brian@Awfulhak.org>, <brian@FreeBSD.org>, <brian@OpenBSD.org>
      <http://www.Awfulhak.org>
Don't _EVER_ lose your sense of humour....
