Date: Fri, 11 May 2007 14:37:26 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200705112137.l4BLbQ2w067776@apollo.backplane.com>
To: Peter Jeremy
References: <20070510.225643.-713548429.imp@bsdimp.com> <200705111011.l4BABTfh061274@lurza.secnetix.de> <20070511195829.GM826@turion.vk2pj.dyndns.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: clock problem
List-Id: Production branch of FreeBSD source code

:One of our customers has 6 GPS-locked NTP servers.  Only problem is
:that two of them are reporting a time that is exactly one second
:different to the other four.  You shouldn't rely solely on your
:GPS or DCF receiver - use it as the primary source but have some
:secondary sources for sanity checks.  (From experience, I can state
:that ntpd does not behave well when presented with two stratum 1
:servers that differ by 1 second).
:
:-- 
:Peter Jeremy

    Ntp will also become really unhappy when chunky time slips occur or
    if the skew rate is more than a few hundred ppm.  Ntp will also blow
    up if it loses the network link for a long period of time.  It will
    just give up and stop making corrections entirely, even after the
    link is restored.  This is particularly true when it is used over a
    dialup (I did that for over a year in 1997, so I can tell you how
    badly it works).

    A slow time slip over a day could still be chunky, which would imply
    lost interrupts.  Determining whether the problem is due to an 8254
    rollover or lost hardclock interrupts is easy... just set 'hz' to
    something really high, like 20000, and see if your time goes crazy.
    If it does, then you have your culprit.

    I don't know if those bugs are still present in FreeBSD, but I do
    remember that I had to redo all the timekeeping in DragonFly because
    lost interrupts from high 'hz' settings were causing timekeeping to
    go nuts.  That turned out to mainly be due to the same 8254 timer
    being used to generate the hardclock interrupt AND handle timekeeping,
    i.e. at high hz settings one was not getting the full 1/18 second
    benefit from the timer.  You just can't do that... it doesn't work.
    It is almost 100% guaranteed to result in a bad time base.

    It is easy to test... just set your kern.hz in the boot env, reboot,
    and see if things blow up or not.  Timekeeping should be stable
    regardless of what hz is set to (proviso: never set hz less than 100).

    Unfortunately, all the timebases in the system have their own quirks.
    Blame the hardware manufacturers.  The 8254 timer 0 is actually the
    MOST consistent of the lot, with the ACPI timer coming a close second.

    TSC
        Haha.  Good luck.  Nice wide timer, easy to read, but any power
        savings mode, including the failsafe modes that Intel has when
        a CPU overheats, will probably blow it up.  Because of that it
        is not really a good idea to use it as a timebase.  I shake my
        fist at Intel!  $#%$#%$#%

    ACPI timer
        Despite the hardware bugs this almost always works as a timebase,
        but sometimes the frequency changes when the CPU goes into power
        savings mode or EST, and sometimes the frequency is something
        other than what it is supposed to be.

    8254 timer 0
        Almost always works as a timebase, but only if not also used to
        generate high-speed interrupts (because interrupts are lost
        easily).  Set it to a full cycle (1/18 second) and you will be
        fine.  Set it to anything else and you will lose interrupts.
        The BIOS will sometimes mess with timer 0, but not as often as
        it messes with timer 2.

    8254 timer 1
        Sometimes works as a time base, but can lock older machines up.
        Can even lock up newer machines.  Why?  Because hardware
        manufacturers are idiots.

    8254 timer 2
        Often can be used as a time base, but video BIOS calls often try
        to use it too.  #@%$#%$# BIOS makers!  Still, this is better than
        losing interrupts when timer 0 is set to high speed, so DragonFly
        uses timer 2 for its timebase as a default until the ACPI timer
        becomes available, with a boot option to use timer 1 instead.

        Using timer 2 as a time base means you don't get motherboard
        speaker sound (the old beep beep BEEP!).  Do I care?  No.

    LAPIC timer
        Dunno.  Probably best to use it as a high speed clock interrupt,
        which would free 8254 timer 0 to use as a time base.

    RTC interrupt
        Basically unusable.  Stable, but doesn't have sufficient
        resolution to be helpful and takes forever to read.

                                        -Matt
                                        Matthew Dillon
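
    For reference, the arithmetic behind the "1/18 second" full cycle and
    the "few hundred ppm" limit can be sketched roughly as below.  This is
    only an illustration under stated assumptions: the 1,193,182 Hz input
    clock is the usual PC value for the 8254 and is not given in the
    message, the function names are hypothetical, and the model (each
    missed hardclock interrupt silently drops one rollover period) is a
    simplification of what a real kernel does.

```python
# Assumption: standard PC 8254 input clock (not stated in the message).
PIT_HZ = 1_193_182
# The 8254 counters are 16 bits wide, so a full cycle is 65536 counts.
PIT_MAX_COUNT = 65536


def reload_count(hz):
    """Divisor programmed into the timer to get about `hz` interrupts/sec.

    Clamped to the range a 16-bit counter can hold.
    """
    return max(1, min(PIT_MAX_COUNT, round(PIT_HZ / hz)))


def rollover_seconds(hz=None):
    """Time between counter rollovers.

    With hz=None the timer runs a full cycle; otherwise it is reloaded
    so that it fires roughly `hz` times per second.
    """
    count = PIT_MAX_COUNT if hz is None else reload_count(hz)
    return count / PIT_HZ


def lost_time_ppm(hz, lost_per_second):
    """Clock error in ppm if `lost_per_second` interrupts are missed.

    Simplified model: the same timer drives hardclock and timekeeping,
    and every missed interrupt loses exactly one rollover period.
    """
    return lost_per_second * rollover_seconds(hz) * 1e6


if __name__ == "__main__":
    print("full cycle: %.2f rollovers/sec" % (1 / rollover_seconds()))
    for hz in (100, 1000, 20000):
        print("hz=%-6d rollover every %8.1f us, 10 lost/sec = %6.1f ppm"
              % (hz, rollover_seconds(hz) * 1e6, lost_time_ppm(hz, 10)))
```

    The point of the sketch is the ratio: at a full cycle the counter
    wraps only about 18 times a second, so there is a lot of slack before
    a late interrupt turns into a missed rollover, while at hz=20000 the
    slack is only tens of microseconds.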