Date:      Tue, 28 Sep 2010 13:33:48 +0300
From:      borislav nikolov <vf1100c@gmail.com>
To:        Jurgen Weber <jurgen@ish.com.au>
Cc:        "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject:   Re: cpu timer issues
Message-ID:  <FC918FA4-770F-4F93-B179-F76BFFBFBD50@gmail.com>
In-Reply-To: <4CA19F27.6050903@ish.com.au>
References:  <4CA19F27.6050903@ish.com.au>


On 28.09.2010, at 10:54, Jurgen Weber <jurgen@ish.com.au> wrote:

> Hello List
>
> We have been having issues with some firewall machines of ours using pfSense.
>
> FreeBSD smash01.ish.com.au 7.2-RELEASE-p5 FreeBSD 7.2-RELEASE-p5 #0: Sun Dec  6 23:20:31 EST 2009 sullrich@FreeBSD_7.2_pfSense_1.2.3_snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.7  i386
>
> MotherBoard: http://www.supermicro.com/products/motherboard/Xeon3000/3200/X7SBi-LN4.cfm
>
> Originally the systems started out by showing a lot of packet loss, the system time would fall behind, and the value of "#vmstat -i | grep timer" was dropping below 2000. I was led to believe by the guys at pfSense that this is where the value should sit. I would also receive errors in messages that looked like "kernel: calcru: runtime went backwards from 244314 usec to 236341".
>
> We tried a variety of things: disabling USB, turning off Intel SpeedStep in the BIOS, disabling ACPI, etc. All had little to no effect. The only thing that would right it was restarting the box, but over time it would degrade again. I talked to SuperMicro and they said that this is a FreeBSD issue and pretty much washed their hands of it.
>
> After a couple of months of dealing with this and just rebooting the systems regularly, the symptoms slowly but surely disappeared, e.g. the kernel messages went away, the system time was not falling behind and I was experiencing no packet loss, but the "#vmstat -i | grep timer" value would continue to decrease over time. Eventually, I think, when it finally got to 0 the machine restarted (I am only guessing here).
>
> After this restart it worked again for a couple of hours and then it restarted again.
>
> After the second time the system has not missed a beat; it has been fine and the "#vmstat -i | grep timer" value remained near the 2000 mark... We set up some Zabbix monitoring to watch it. As mentioned, it was fine for about a month. Until today. Today the value has dropped to 0, but the system has not restarted, and over the last couple of hours the value has increased to 47.
>
> This machine is mission critical. We have two in a failover scenario (using pfSense's CARP features) and it seems unfortunate that we have an issue with two brand new SuperMicro boxes that affects both machines. While at the moment everything seems fine, I want to ensure that I have no further issues. Does anyone have any suggestions?
>
> Lastly, I have double checked both of the below:
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#CALCRU-NEGATIVE-RUNTIME
> We disabled EIST.
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#COMPUTER-CLOCK-SKEW
>
> # dmesg | grep Timecounter
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Timecounters tick every 1.000 msec
> # sysctl kern.timecounter.hardware
> kern.timecounter.hardware: i8254
>
> Only have one timer to choose from.
>
> Thanks
>
> Jurgen
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


Hello,
vmstat -i calculates the interrupt rate as interrupt count / uptime, and the interrupt count is a 32-bit integer.
With high values of kern.hz it will overflow in a few days (with kern.hz=4000 it will happen every 12 days or so).
If that is the case, use systat -vmstat 1 to get an accurate interrupt rate.
That is just FYI, because I was confused by this once and it scared me a bit, and I started changing counters until I noticed this.
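
To make the arithmetic concrete, here is a rough sketch in Python. It assumes the behaviour described above (an unsigned 32-bit count divided by uptime) and is not taken from the vmstat source. The function names are made up for illustration.

```python
# Sketch: how a 32-bit interrupt counter distorts a rate computed as count / uptime.
# Assumption: the counter is unsigned 32-bit and is incremented kern.hz times per second.

COUNTER_MODULUS = 2 ** 32   # a 32-bit unsigned counter wraps at this value
SECONDS_PER_DAY = 86400


def days_until_wrap(hz):
    """Days until a counter incremented hz times per second wraps around."""
    return COUNTER_MODULUS / hz / SECONDS_PER_DAY


def reported_rate(hz, uptime_seconds):
    """Average rate as it would be computed from the wrapped counter and the uptime."""
    wrapped_count = (hz * uptime_seconds) % COUNTER_MODULUS
    return wrapped_count // uptime_seconds


for hz in (1000, 2000, 4000):
    print(f"kern.hz={hz}: wraps after about {days_until_wrap(hz):.1f} days")

# Before the wrap the average matches the real rate; just after it, it is close to 0
# and then climbs slowly as the counter fills up again.
print(reported_rate(2000, 20 * SECONDS_PER_DAY))
print(reported_rate(2000, 25 * SECONDS_PER_DAY))
```

With a timer rate near 2000 the wrap would come after roughly 25 days. If the assumption holds, that would fit a value that stays near 2000 for about a month, drops to 0, and then slowly increases.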

p.s. please forgive my poor English


