Date:      Tue, 28 Sep 2010 17:54:15 +1000
From:      Jurgen Weber <jurgen@ish.com.au>
To:        freebsd-stable@freebsd.org
Subject:   cpu timer issues
Message-ID:  <4CA19F27.6050903@ish.com.au>

Hello List,

We have been having issues with some firewall machines of ours using 
pfSense.

FreeBSD smash01.ish.com.au 7.2-RELEASE-p5 FreeBSD 7.2-RELEASE-p5 #0: Sun 
Dec  6 23:20:31 EST 2009 
sullrich@FreeBSD_7.2_pfSense_1.2.3_snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.7  
i386

MotherBoard: 
http://www.supermicro.com/products/motherboard/Xeon3000/3200/X7SBi-LN4.cfm

Originally the systems started out by showing a lot of packet loss, the
system time would fall behind, and the value of "#vmstat -i | grep
timer" was dropping below 2000. I was led to believe by the guys at
pfSense that this is where the value should sit. I would also receive
errors in messages that looked like "kernel: calcru: runtime went
backwards from 244314 usec to 236341".
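
To gauge how large these jumps are, the messages can be reduced to the
number of microseconds lost per event. This is only a minimal sketch:
the function name calcru_jumps is made up for illustration, and it
assumes the log lines have the same shape as the message quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and, for every calcru
# "runtime went backwards" message, print how many microseconds the
# runtime jumped back (old value minus new value).
calcru_jumps() {
    sed -n 's/.*calcru: runtime went backwards from \([0-9][0-9]*\) usec to \([0-9][0-9]*\).*/\1 \2/p' |
        awk '{ print $1 - $2 }'
}
```

Feeding it whichever log holds the kernel messages gives one number per
event, which makes it easier to see whether the jumps grow over time.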

We tried a variety of things: disabling USB, turning off Intel
SpeedStep in the BIOS, disabling ACPI, etc., all with little to no
effect. The only thing that would right it is restarting the box, but
over time it would degrade again. I talked to SuperMicro and they
said that this is a FreeBSD issue and pretty much washed their hands of it.

After a couple of months of dealing with this and just rebooting the
systems regularly, the symptoms slowly but surely disappeared, i.e. the
kernel messages went away, the system time was not falling behind and I
was experiencing no packet loss, but the "#vmstat -i | grep timer" value
would continue to decrease over time. Eventually I think, when it
finally got to 0, the machine restarted (I am only guessing here).

After this restart it worked again for a couple of hours and then it 
restarted again.

After the second restart the system did not miss a beat; it was fine
and the "#vmstat -i | grep timer" value remained near the 2000 mark.
We set up some Zabbix monitoring to watch it. As mentioned, it was fine
for about a month, until today. Today the value has dropped to 0, but
the system has not restarted, and over the last couple of hours the value
has increased to 47.
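
A minimal sketch of the kind of check such monitoring could run, for
anyone wanting to reproduce the numbers. It takes two samples of the
"total" column of the timer lines and divides the difference by the
interval, instead of relying on the "rate" column alone. The function
names are made up for illustration, and the sketch assumes the usual
"name total rate" column layout of vmstat -i.

```shell
#!/bin/sh
# Hypothetical helper: sum the "total" column (second-to-last field) of
# every timer line in `vmstat -i` output read from stdin.
timer_total() {
    awk '/timer/ { sum += $(NF-1) } END { printf "%.0f\n", sum }'
}

# Hypothetical helper: interrupts per second between two totals taken
# SECS seconds apart. Usage: timer_rate_between OLD NEW SECS
timer_rate_between() {
    echo $(( ($2 - $1) / $3 ))
}

# Example use on a live system (left commented out on purpose):
#   a=$(vmstat -i | timer_total); sleep 10
#   b=$(vmstat -i | timer_total)
#   timer_rate_between "$a" "$b" 10
```

A negative result would suggest that the counter was reset or wrapped
between the two samples, which is worth recording separately from a
genuinely low rate.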

This machine is mission critical. We have two in a failover scenario
(using pfSense's CARP features) and it seems unfortunate that we have an
issue with two brand new SuperMicro boxes that affects both machines.
While at the moment everything seems fine, I want to ensure that I have
no further issues. Does anyone have any suggestions?

Lastly, I have double-checked both of the below:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#CALCRU-NEGATIVE-RUNTIME
We disabled EIST.

http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#COMPUTER-CLOCK-SKEW

# dmesg | grep Timecounter
Timecounter "i8254" frequency 1193182 Hz quality 0
Timecounters tick every 1.000 msec
# sysctl kern.timecounter.hardware
kern.timecounter.hardware: i8254

I only have one timecounter to choose from.
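
The full list of timecounters the kernel offers, with their quality
values, is exposed by the kern.timecounter.choice sysctl. A minimal
sketch for turning that line into one "name quality" pair per row
follows; the function name list_timecounters is made up for
illustration, and the sample values in the comment are illustrative,
not output from this machine.

```shell
#!/bin/sh
# Hypothetical helper: read a kern.timecounter.choice line on stdin and
# print one "name quality" pair per line.
# Illustrative input: kern.timecounter.choice: TSC(-100) i8254(0) dummy(-1000000)
list_timecounters() {
    sed 's/^kern\.timecounter\.choice: *//' | tr ' ' '\n' |
        sed -n 's/^\(.*\)(\(-\{0,1\}[0-9][0-9]*\))$/\1 \2/p'
}

# Example use on a live system (left commented out on purpose):
#   sysctl kern.timecounter.choice | list_timecounters
```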

Thanks

Jurgen



