Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 6 Nov 1998 16:16:47 -0500 (EST)
From:      Bill Paul <wpaul@skynet.ctr.columbia.edu>
To:        phk@critter.freebsd.dk (Poul-Henning Kamp)
Cc:        current@FreeBSD.ORG
Subject:   Re: Grrr... calcru: negative time blah blah blah
Message-ID:  <199811062116.QAA15892@skynet.ctr.columbia.edu>
In-Reply-To: <5110.910380360@critter.freebsd.dk> from "Poul-Henning Kamp" at Nov 6, 98 08:26:00 pm

next in thread | previous in thread | raw e-mail | index | archive | help
Of all the gin joints in all the towns in all the world, Poul-Henning 
Kamp had to walk into mine and say:
 
> The numbers in the tc_diag_buffer are basically the number of ticks
> on the timecounter between each use of it.
> 
> >Timecounter "i8254"  frequency 1193182 Hz
> 
> Normally, the largest number of ticks should be the frequency of
> the timecounter divided by 100, in this case 11931, (plus or minus
> epsilon).
> 
> When smaller counts are in the buffer it means that one of the *time()
> functions were called, so that is normal.

Okay. Was wondering about that.
 
> >debug.tc_diag_buffer: 11932 11932 11932 11937 [...]
> >[...] 11932 11932 49982 50184 50259 11934 11930 [...]
> 
> This is bad, really really bad.  Best case sequence is:
> 
> 	11932: hardclock()
> 	11932: hardclock()
> 	49982: {micro|nano}[run]time()
> 	50184: {micro|nano}[run]time()
> 	50259: hardclock()
> 	11934: hardclock()
> 
> At least 4 calls to hardclock() is missing here.  

Hm...

[chop]
 
> Neither of the two tests seems to have triggered any of my tests
> since "debug.tc_diag_stop" is still zero.
> 
> This is consistent with a model of "lost interrupts", which I think the
> above data support.  A "reading the i8254 wrong" theory doesn't match
> the values you have here, more +/- 256 values should have been present.
> 
> Does the problem also exist for a !SMP case ?

Okay, I built a kernel with no apm0 device and tried again. It seems
that now I don't get the calcru error messages and the X server actually
runs without exploding. No other processes die, at least not during
the time I had it running. However, the system did get sluggish again
after the X server started.

The dmesg is different this time:

FreeBSD 3.0-19981103-SNAP #0: Fri Nov  6 15:00:49 EST 1998
    root@wormhole.ee.columbia.edu:/usr/src2/sys/compile/TEST
Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 448623175 Hz
CPU: Pentium II (quarter-micron) (448.62-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x652  Stepping=2
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,<b24>>
real memory  = 536870912 (524288K bytes)
avail memory = 519811072 (507628K bytes)

The display adapter is on IRQ 11:

vga0: <VGA-compatible display device> rev 0x01 int a irq 11 on pci1.0.0

sysctl kern.timecounter shows this:
kern.timecounter.frequency: 448623175
kern.timecounter.adjustment: 0

Running sysctl debug _BEFORE_ triggering the problem by running the
X server shows this:

debug.elf_trace: 0
debug.tc_diag_buffer: 4486306 5056 5895 4486672 4485895 4486284 4486227 4486290 4486255 4486280 4486266 4486401 4486576 4485848 4486279 4486280 4486266 4486279 4486280 4486317 4486228 4486712 4485845 4486268 4486280 4486266 4486279 4486280 4486266 4486279
 4486712 4485834 4486279 4486280 4486266 4486279 4486280 4486266 4486401 4486549 4485875 4486279 4486280 4486266 4486279 4486280 4486266 4486401 4486576 4485848 4486279 4486280 4486266 4486279 4486317 4486229 4486279 4486671 4485875 4486279 4486280 448626
6 4486279 4486280 2920176 2992441 2993279 3055877 3791199 3903903 3913227 3918977 4486298 4486685 4485834 4486335 4486224 4486266 4486279 4486343 4486203 4486401 4486576 4485848 4486279 4486280 4486266 4486279 4486280 4486369 4486176 4486671 4485875 44862
79 4486280 4486266 4486279 4486280 4486266 4486306 4486658 4485861 4486279 4486294 4486272 4486259 4486280 4486266 4486401 4486576 4485848 4486279 4486280 4486266 4486279 4486280 4486266 4486401 4486552 4485872 4486279 4486280 4486266 4486279 4486280 4486
290 4486255 4486685 4485861 4486279 4486280 4486266 4486279 4486283 4486263 4486306 4486685 4485834 4486279 4486283 4486263 4486279 4486280 4486266 4486279 4486698 4485848 4486282 4486277 4486266 4486279 4486280 4486266 6057 7051 4486416 4486598 4485811 4
486279 4486348 4486198 4486279 4486280 4486266 4486401 4486576 4485848 4486282 4486277 4486266 4486279 4486280 4486266 4486279 4486671 4485875 4486279 4486289 4486257 4486279 4486280 4486266 4486401 4486600 4485824 4486279 4486280 4486266 4486279 4486280 
4486266 4486401 4486576 4485848 4486279 4486280 4486266 4486279 4486280 4486266 4486401 4486576 4485848 4486279 4486280 4486266 4486315 4486244 4486266 4486279 4486712 4485834 4486279 4486280 4486266 4486279 4486280 4486266 4486309 4486679 4485837 4486279
 4486280 4486290 4486255 4486294 4486252 4486279 4486712 2422 3425 4485857 4486256 4486280 4486266 4486279 4486280 4486266 4486401 4486587 4485837 4486279 4486318 4486228 4486279 4486303 4486243 4486279 4486671 4485875 4486282 4486277 4486266 4486303 4486
285 4486237
debug.tc_diag_index: 10825
debug.tc_diag_stop: 0
debug.tc_diag_maxforward: -692379806
debug.fdexpand: 0
debug.debugger_on_panic: 1
debug.ttydebug: 0
debug.nchash: 32767
debug.ncnegfactor: 16
debug.numneg: 12
debug.numcache: 215
debug.vfscache: 1
debug.vnsize: 164
debug.ncsize: 36
debug.numvnodes: 235
debug.wantfreevnodes: 25
debug.freevnodes: 24
debug.disablecwd: 0
debug.if_tun_debug: 0
debug.ncr_debug: 0


Running sysctl debug _AFTER_ running and stopping the X server shows
this:

debug.elf_trace: 0
debug.tc_diag_buffer: 24640675 24640697 24640635 24640686 24640672 24640673 24641091 4938 12316 24640342 24640621 24640649 24640659 24640686 19566403 19641525 19642603 19757035 20530799 20631303 20644237 20684791 20703487 24641328 24640255 24640435 246410
77 24640268 24640686 24640690 24640656 24640671 24640659 24640790 24640596 24641023 24640295 24640672 24640673 24640672 24640673 24640686 24640659 24640699 24641102 24640216 24640686 24640694 24640638 24640686 24640672 24640673 24640686 24641077 24640254 
24640686 24640659 24640686 24640673 24640659 24640686 24640775 24640988 24640268 24640659 24640672 24640784 24640572 24640772 24640600 24640756 24640997 24640240 24640673 24640686 24640659 24640672 24640686 24640659 24640686 24641105 24640227 24640686 246
40672 24640673 24640686 24640659 24640686 24640672 24641078 24640267 24640694 24640638 24640687 24640671 24640673 24640726 24640740 24640970 24640257 24640692 24640666 24640674 24640669 24640693 24640651 24640673 24641077 24640268 24640686 24640659 246406
72 24640686 24640713 24640632 24640673 24641064 24640281 6230 7513 24640980 24640468 24640607 24640638 24640669 6715434 6807259 6808209 6981571 6991770 6997735 24640744 24640614 24641091 24640292 24640638 24640669 24640719 24640773 24640539 24640673 24640
693 24641084 24640584 24640332 24640683 24640659 24640672 24640686 24640659 24640746 24641078 24640217 24640789 24640587 24640632 24640686 24640714 24640631 24640718 7359 8943 9843 24641105 24640293 24640597 24640636 24640686 24640672 24640714 24640729 24
640575 24641091 24640254 24640672 24640686 24640659 24640686 24640715 24640617 24640686 24641181 24640164 24640786 24640584 24640657 24640676 24640692 24640653 24640715 24641035 24640268 24640672 24640715 24640644 24640659 24640672 24640713 24640632 24641
078 24640281 24640684 24640661 24640672 24640673 24640686 24640659 24640794 24640996 24640251 24640662 24640673 24640659 24640686 24640708 24640637 24640686 24641064 24640281 24640672 24640659 24640686 24640673 24640659 24640686 24640722 24641055 24640240
 24640673 24640738 24640607 24640732 24640626 24640662 24640721 24641053 24640254 24640698 24640647 24640673 24640686 24640662 24640669 24640689 24641088 24640277 24640650 24640659 24640686 24640675 24640670 24640686 24640659 24641116 24640242 24640670
debug.tc_diag_index: 17027
debug.tc_diag_stop: 0
debug.tc_diag_maxforward: -692379806
debug.fdexpand: 0
debug.debugger_on_panic: 1
debug.ttydebug: 0
debug.nchash: 32767
debug.ncnegfactor: 16
debug.numneg: 20
debug.numcache: 332
debug.vfscache: 1
debug.vnsize: 164
debug.ncsize: 36
debug.numvnodes: 332
debug.wantfreevnodes: 25
debug.freevnodes: 28
debug.disablecwd: 0
debug.if_tun_debug: 0
debug.ncr_debug: 0

Everything seems to have been divided in half. The machine still runs
but the console is sluggish. Unfortunately, I don't see any way in the
BIOS setup to disable any of the interrupts, and slicing a brand new
board probably wouldn't sit well with the EE faculty, largely because
these machines (and many others) were donations from Intel (they dropped
a couple million dollars worth of stuff on Columbia and a couple other
universities).

Isn't there any way I can mask a particular interrupt so the dispatcher
just ignores it? Not a great fix I grant you, but it would help prove
the theory.

-Bill

-- 
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199811062116.QAA15892>