Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 27 Sep 2010 22:11:54 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Don Lewis <truckman@FreeBSD.org>
Cc:        stable@FreeBSD.org
Subject:   Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
Message-ID:  <20100928051154.GA73859@icarus.home.lan>
In-Reply-To: <201009280425.o8S4PA0d058131@gw.catspoiler.org>
References:  <201009280425.o8S4PA0d058131@gw.catspoiler.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, Sep 27, 2010 at 09:25:10PM -0700, Don Lewis wrote:
> CPU time accounting is broken on one of my machines running 8-STABLE.  I
> ran a test with a simple program that just loops and consumes CPU time:
> 
> % time ./a.out
> 94.544u 0.000s 19:14.10 8.1%	62+2054k 0+0io 0pf+0w
> 
> The display in top shows the process with WCPU at 100%, but TIME
> increments very slowly.
> 
> Several hours after booting, I got a bunch of "calcru: runtime went
> backwards" messages, but they stopped right away and never appeared
> again.
> 
> Aug 23 13:40:07 scratch ntpd[1159]: ntpd 4.2.4p5-a (1)
> Aug 23 13:43:18 scratch ntpd[1160]: kernel time sync status change 2001
> Aug 23 18:05:57 scratch dbus-daemon: [system] Reloaded configuration
> Aug 23 18:06:16 scratch dbus-daemon: [system] Reloaded configuration
> Aug 23 18:12:40 scratch ntpd[1160]: time reset +18.059948 s
> [snip]
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 6836685136 usec to 5425839798 usec for pid 1526 (csh)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 4747 usec to 2403 usec for pid 1519 (csh)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 5265 usec to 2594 usec for pid 1494 (hald-addon-mouse-sy)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 7818 usec to 3734 usec for pid 1488 (console-kit-daemon)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 977 usec to 459 usec for pid 1480 (getty)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 958 usec to 450 usec for pid 1479 (getty)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 957 usec to 449 usec for pid 1478 (getty)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 952 usec to 447 usec for pid 1477 (getty)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 959 usec to 450 usec for pid 1476 (getty)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 975 usec to 458 usec for pid 1475 (getty)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1026 usec to 482 usec for pid 1474 (getty)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1333 usec to 626 usec for pid 1473 (getty)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 2469 usec to 1160 usec for pid 1440 (inetd)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 719 usec to 690 usec for pid 1402 (sshd)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 120486 usec to 56770 usec for pid 1360 (cupsd)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 6204 usec to 2914 usec for pid 1289 (dbus-daemon)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 179 usec to 84 usec for pid 1265 (moused)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 22156 usec to 10407 usec for pid 1041 (nfsd)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1292 usec to 607 usec for pid 1032 (mountd)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 8801 usec to 4134 usec for pid 664 (devd)
> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 19 usec to 9 usec for pid 9 (sctp_iterator)
> 
> 
> If I reboot and run the test again, the CPU time accounting seems to be
> working correctly.
> % time ./a.out
> 1144.226u 0.000s 19:06.62 99.7%	5+168k 0+0io 0pf+0w
> 
> 
> I'm not sure how long this problem has been present. I do remember
> seeing the calcru messages with an August 23rd kernel.  I have not seen
> the calcru messages when running -CURRENT on the same hardware.  I also
> have not seen this same problem on my other Athlon 64 box running the
> August 23rd kernel.
> 
> Before reboot:
> # sysctl kern.timecounter
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC(800) ACPI-fast(1000) i8254(0) dummy(-1000000)
> kern.timecounter.hardware: ACPI-fast
> kern.timecounter.stepwarnings: 0
> kern.timecounter.tc.i8254.mask: 4294967295
> kern.timecounter.tc.i8254.counter: 3534
> kern.timecounter.tc.i8254.frequency: 1193182
> kern.timecounter.tc.i8254.quality: 0
> kern.timecounter.tc.ACPI-fast.mask: 16777215
> kern.timecounter.tc.ACPI-fast.counter: 8685335
> kern.timecounter.tc.ACPI-fast.frequency: 3579545
> kern.timecounter.tc.ACPI-fast.quality: 1000
> kern.timecounter.tc.TSC.mask: 4294967295
> kern.timecounter.tc.TSC.counter: 2204228369
> kern.timecounter.tc.TSC.frequency: 2500018183
> kern.timecounter.tc.TSC.quality: 800
> kern.timecounter.invariant_tsc: 0
> 
> After reboot:
> % sysctl kern.timecounter
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC(800) ACPI-fast(1000) i8254(0) dummy(-1000000)
> kern.timecounter.hardware: ACPI-fast
> kern.timecounter.stepwarnings: 0
> kern.timecounter.tc.i8254.mask: 4294967295
> kern.timecounter.tc.i8254.counter: 2241
> kern.timecounter.tc.i8254.frequency: 1193182
> kern.timecounter.tc.i8254.quality: 0
> kern.timecounter.tc.ACPI-fast.mask: 16777215
> kern.timecounter.tc.ACPI-fast.counter: 4636239
> kern.timecounter.tc.ACPI-fast.frequency: 3579545
> kern.timecounter.tc.ACPI-fast.quality: 1000
> kern.timecounter.tc.TSC.mask: 4294967295
> kern.timecounter.tc.TSC.counter: 1429996208
> kern.timecounter.tc.TSC.frequency: 2500020459
> kern.timecounter.tc.TSC.quality: 800
> kern.timecounter.invariant_tsc: 0
> 
> 
> 
> Here's my kernel config file (uni-processor i386 kernel running on an
> Athlon 64 x2 CPU):
> include         GENERIC
> 
> nocpu           I486_CPU
> nocpu           I586_CPU
> 
> nooptions       SCHED_4BSD              # 4BSD scheduler
> options         SCHED_ULE
> 
> # Debugging for use in -current
> options         KDB                     # Enable kernel debugger support.
> options         DDB                     # Support DDB.
> options         GDB                     # Support remote GDB.
> 
> nooptions       SMP                     # Symmetric MultiProcessor Kernel
> nodevice        apic                    # I/O APIC
>  
> nodevice        atapicd                 # ATAPI CDROM drives
> device          atapicam                # emulate ATAPI devices as SCSI ditto via CAM
> 
> 
> /var/run/dmesg.boot:
> Copyright (c) 1992-2010 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> 	The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.1-STABLE #6: Thu Sep 23 16:03:29 PDT 2010
>     dl@scratch.catspoiler.org:/usr/obj/usr/src/sys/GENERICDDB i386
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ (2500.02-MHz 686-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x60fb1  Family = f  Model = 6b  Stepping = 1
>   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x2001<SSE3,CX16>
>   AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
>   AMD Features2=0x11f<LAHF,CMP,SVM,ExtAPIC,CR8,Prefetch>
> real memory  = 4294967296 (4096 MB)
> avail memory = 3607351296 (3440 MB)
> kbd1 at kbdmux0
> ACPI Warning: Optional field Pm2ControlBlock has zero address or length: 0x0000000000000000/0x1 (20100331/tbfadt-655)
> acpi0: <Nvidia AWRDACPI> on motherboard
> acpi0: [ITHREAD]
> acpi0: Power Button (fixed)
> acpi0: reservation of 0, a0000 (3) failed
> acpi0: reservation of 100000, dbdf0000 (3) failed
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
> cpu0: <ACPI CPU> on acpi0
> acpi_hpet0: <High Precision Event Timer> iomem 0xfeff0000-0xfeff03ff on acpi0
> device_attach: acpi_hpet0 attach returned 12
> acpi_button0: <Power Button> on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pci0: <memory, RAM> at device 0.0 (no driver attached)
> isab0: <PCI-ISA bridge> at device 1.0 on pci0
> isa0: <ISA bus> on isab0
> pci0: <serial bus, SMBus> at device 1.1 (no driver attached)
> pci0: <memory, RAM> at device 1.2 (no driver attached)
> ohci0: <OHCI (generic) USB controller> mem 0xfe02f000-0xfe02ffff irq 10 at device 2.0 on pci0
> ohci0: [ITHREAD]
> usbus0: <OHCI (generic) USB controller> on ohci0
> ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfe02e000-0xfe02e0ff irq 11 at device 2.1 on pci0
> ehci0: [ITHREAD]
> usbus1: EHCI version 1.0
> usbus1: <EHCI (generic) USB 2.0 controller> on ehci0
> ohci1: <OHCI (generic) USB controller> mem 0xfe02d000-0xfe02dfff irq 5 at device 4.0 on pci0
> ohci1: [ITHREAD]
> usbus2: <OHCI (generic) USB controller> on ohci1
> ehci1: <EHCI (generic) USB 2.0 controller> mem 0xfe02c000-0xfe02c0ff irq 10 at device 4.1 on pci0
> ehci1: [ITHREAD]
> usbus3: EHCI version 1.0
> usbus3: <EHCI (generic) USB 2.0 controller> on ehci1
> atapci0: <nVidia nForce MCP67 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 6.0 on pci0
> ata0: <ATA channel 0> on atapci0
> ata0: [ITHREAD]
> ata1: <ATA channel 1> on atapci0
> ata1: [ITHREAD]
> pci0: <multimedia, HDA> at device 7.0 (no driver attached)
> pcib1: <ACPI PCI-PCI bridge> at device 8.0 on pci0
> pci1: <ACPI PCI bus> on pcib1
> fwohci0: <Texas Instruments TSB43AB22/A> mem 0xfd0ff000-0xfd0ff7ff,0xfd0f8000-0xfd0fbfff irq 11 at device 7.0 on pci1
> fwohci0: [ITHREAD]
> fwohci0: OHCI version 1.10 (ROM=1)
> fwohci0: No. of Isochronous channels is 4.
> fwohci0: EUI64 00:50:8d:00:00:99:f0:69
> fwohci0: Phy 1394a available S400, 2 ports.
> fwohci0: Link S400, max_rec 2048 bytes.
> firewire0: <IEEE1394(FireWire) bus> on fwohci0
> dcons_crom0: <dcons configuration ROM> on firewire0
> dcons_crom0: bus_addr 0x1090000
> fwe0: <Ethernet over FireWire> on firewire0
> if_fwe0: Fake Ethernet address: 02:50:8d:99:f0:69
> fwe0: Ethernet address: 02:50:8d:99:f0:69
> fwip0: <IP over FireWire> on firewire0
> fwip0: Firewire address: 00:50:8d:00:00:99:f0:69 @ 0xfffe00000000, S400, maxrec 2048
> fwohci0: Initiate bus reset
> fwohci0: fwohci_intr_core: BUS reset
> fwohci0: fwohci_intr_core: node_id=0x00000000, SelfID Count=1, CYCLEMASTER mode
> ahc0: <Adaptec 19160B Ultra160 SCSI adapter> port 0xcc00-0xccff mem 0xfd0fe000-0xfd0fefff irq 11 at device 9.0 on pci1
> ahc0: [ITHREAD]
> aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
> atapci1: <nVidia nForce MCP67 SATA300 controller> port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xdc00-0xdc0f mem 0xfe026000-0xfe027fff irq 10 at device 9.0 on pci0
> atapci1: [ITHREAD]
> atapci1: AHCI v1.10 controller with 4 3Gbps ports, PM supported
> ata2: <ATA channel 0> on atapci1
> ata2: [ITHREAD]
> ata3: <ATA channel 1> on atapci1
> ata3: [ITHREAD]
> ata4: <ATA channel 2> on atapci1
> ata4: [ITHREAD]
> ata5: <ATA channel 3> on atapci1
> ata5: [ITHREAD]
> nfe0: <NVIDIA nForce MCP67 Networking Adapter> port 0xd800-0xd807 mem 0xfe02b000-0xfe02bfff,0xfe02a000-0xfe02a0ff,0xfe029000-0xfe02900f irq 15 at device 10.0 on pci0
> miibus0: <MII bus> on nfe0
> e1000phy0: <Marvell 88E1116 Gigabit PHY> PHY 1 on miibus0
> e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
> nfe0: Ethernet address: 00:50:8d:9f:6d:e3
> nfe0: [FILTER]
> pcib2: <ACPI PCI-PCI bridge> at device 11.0 on pci0
> pci2: <ACPI PCI bus> on pcib2
> pcib3: <ACPI PCI-PCI bridge> at device 12.0 on pci0
> pci3: <ACPI PCI bus> on pcib3
> atapci2: <SiI 3132 SATA300 controller> port 0xac00-0xac7f mem 0xfdcff000-0xfdcff07f,0xfdcf8000-0xfdcfbfff irq 5 at device 0.0 on pci3
> atapci2: [ITHREAD]
> ata6: <ATA channel 0> on atapci2
> ata6: [ITHREAD]
> ata7: <ATA channel 1> on atapci2
> ata7: [ITHREAD]
> pcib4: <ACPI PCI-PCI bridge> at device 13.0 on pci0
> pci4: <ACPI PCI bus> on pcib4
> pcib5: <ACPI PCI-PCI bridge> at device 14.0 on pci0
> pci5: <ACPI PCI bus> on pcib5
> pcib6: <ACPI PCI-PCI bridge> at device 15.0 on pci0
> pci6: <ACPI PCI bus> on pcib6
> pcib7: <ACPI PCI-PCI bridge> at device 16.0 on pci0
> pci7: <ACPI PCI bus> on pcib7
> pcib8: <ACPI PCI-PCI bridge> at device 17.0 on pci0
> pci8: <ACPI PCI bus> on pcib8
> vgapci0: <VGA-compatible display> mem 0xfb000000-0xfbffffff,0xe0000000-0xefffffff,0xfc000000-0xfcffffff irq 10 at device 18.0 on pci0
> acpi_tz0: <Thermal Zone> on acpi0
> atrtc0: <AT realtime clock> port 0x70-0x73 on acpi0
> fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: [FILTER]
> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> atkbd0: [ITHREAD]
> psm0: <PS/2 Mouse> irq 12 on atkbdc0
> psm0: [GIANT-LOCKED]
> psm0: [ITHREAD]
> psm0: model MouseMan+, device ID 0
> pmtimer0 on isa0
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> ppc0: parallel port not found.
> powernow0: <PowerNow! K8> on cpu0
> Timecounter "TSC" frequency 2500018183 Hz quality 800
> Timecounters tick every 1.000 msec
> firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0)  (me) 
> firewire0: bus manager 0 
> usbus0: 12Mbps Full Speed USB v1.0
> usbus1: 480Mbps High Speed USB v2.0
> usbus2: 12Mbps Full Speed USB v1.0
> usbus3: 480Mbps High Speed USB v2.0
> ugen0.1: <nVidia> at usbus0
> uhub0: <nVidia OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
> ugen1.1: <nVidia> at usbus1
> uhub1: <nVidia EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
> ugen2.1: <nVidia> at usbus2
> uhub2: <nVidia OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
> ugen3.1: <nVidia> at usbus3
> uhub3: <nVidia EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
> uhub0: 6 ports with 6 removable, self powered
> uhub2: 6 ports with 6 removable, self powered
> ad6: 476940MB <SAMSUNG HD501LJ CR100-13> at ata3-master UDMA100 SATA 3Gb/s
> uhub1: 6 ports with 6 removable, self powered
> uhub3: 6 ports with 6 removable, self powered
> unknown: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 sks=0x40 0x00 0x01
> cd0 at ata0 bus 0 scbus1 target 0 lun 0
> cd0: <TOSHIBA CD-ROM XM-6602B 1017> Removable CD-ROM SCSI-0 device 
> cd0: 3.300MB/s transfers
> cd0: cd present [138590 x 2048 byte records]
> da0 at ahc0 bus 0 scbus0 target 0 lun 0
> da0: <SEAGATE ST336706LW 010A> Fixed Direct Access SCSI-3 device 
> da0: 160.000MB/s transfers (80.000MHz DT, offset 63, 16bit)
> da0: Command Queueing enabled
> da0: 35003MB (71687370 512 byte sectors: 255H 63S/T 4462C)
> Trying to mount root from ufs:/dev/ad6s1a
> nfe0: link state changed to UP

Do you have something called Cool'n'Quiet enabled in your BIOS?  Solely
as a test, try disabling it.  If this solves the problem, add "device
cpufreq" to your kernel configuration, buildkernel/installkernel, and
re-enable the option in your BIOS.  It's worth a shot anyway.

Some reference threads which are old and may be centralised around Intel
CPUs but I imagine the same problem can happen on any platform which
adjusts clock frequencies.

http://lists.freebsd.org/pipermail/freebsd-questions/2006-October/133253.html
http://www.mail-archive.com/freebsd-stable@freebsd.org/msg95530.html

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100928051154.GA73859>