Date: Thu, 16 Mar 2017 18:13:34 +0100
From: "O. Hartmann" <ohartmann@walstatt.org>
To: freebsd-current@freebsd.org
Subject: Re: ntpd dies nightly on a server with jails
Message-ID: <20170316181318.605a3e4f@thor.intern.walstatt.dynvpn.de>
In-Reply-To: <201703152012.v2FKCbvg078762@slippy.cwsent.com>
References: <20170315071724.78bb0bdc@freyja.zeit4.iv.bundesimmobilien.de> <201703152012.v2FKCbvg078762@slippy.cwsent.com>

On Wed, 15 Mar 2017 13:12:37 -0700, Cy Schubert
<Cy.Schubert@komquats.com> wrote:

Thank you very much for responding.

> Hi O. Hartmann,
>
> I'll try to answer as much as I can in the noon hour I have left.
>
> In message <20170315071724.78bb0bdc@freyja.zeit4.iv.bundesimmobilien.de>,
> "O. Hartmann" writes:
> > Running a host with several jails on recent CURRENT (12.0-CURRENT #8
> > r315187: Sun Mar 12 11:22:38 CET 2017 amd64) causes me trouble on a
> > daily basis.
> >
> > The box is an older two-socket Fujitsu server equipped with two
> > four-core Intel(R) Xeon(R) CPU L5420 @ 2.50GHz.
> >
> > The box has several jails; each jail does NOT run service ntpd. Each
> > jail has its dedicated loopback, lo1 through lo5 (for the moment),
> > with a dedicated IP: 127.0.1.1 - 127.0.5.1 (if this matters, I believe
> > not).
> >
> > The host itself has two main NICs, Broadcom based. bcm0 is dedicated
> > to the host, bcm1 is shared amongst the jails: each jail has an IP
> > bound to bcm1 via which the jails communicate with the network.
> >
> > I try to capture log information via syslog, but FreeBSD's ntpd seems
> > to be very, very sparse with such information, converging to null - I
> > can't see anything suitable in the logs explaining why ntpd dies
> > almost every night, leaving the system with a wild reset of time.
> > Sometimes it is a gain of 6 hours, sometimes it is only half an hour.
> > I usually leave the box at 16:00 local time and look after it again
> > at about 7 o'clock in the morning local time.
>
> We will need to turn on debugging. Unfortunately debug code is not
> compiled into the binary. We have two options. You can either update
> src/usr.sbin/ntp/config.h to enable DEBUG or build the port (it's the
> exact same ntp) with the DEBUG option -- this is probably simpler.
> Then enable debug with -d and -D. -D increases verbosity. I just
> committed a debug option to both ntp ports to assist here.

I realised that debug code wasn't compiled in when I simply turned the
debug switch on for ntpd - the output was the same as before. So I feared
that I would have to recompile with debugging explicitly switched on ...

>
> Next question: Do you see any indication of a core dump? I'd be
> interested in looking at it if possible.

I have, intentionally, switched off core dumping. I will switch that on.
In all the logged messages I searched for "ntp", I never saw any error
pointing to a crash, but I'll take a closer look tomorrow.

>
> >
> > When the clock is floating that wildly, in all cases ntpd isn't
> > running any more. I try to restart with options -g and -G to adjust
> > the time quickly at the beginning, which works fine.
>
> This is disconcerting. If your clock is floating wildly without ntpd
> running there are other issues that might be at play here. At most the
> clock might drift a little, maybe a minute or two a day but not by a
> lot. Does the drift cause your clocks to run fast or slow?

Today, I switched off ntpd on the jail-bearing host. After an hour or so
the clock had not drifted away from my DCF77 clock - at least not at a
granularity of minutes. So I switched ntpd on again. After a while, I
checked the status via "service ntpd status", and I would bet my ass that
the result was "is running with PID XXX". The next minute I did the same:
the clock was off by almost half an hour (always behind real time, never
ahead!) and ntpd wasn't running. A coincidence? I cannot tell, I did a
"clear" on the terminal :-( But that was strange.

>
> >
> > Apart from possible misconfigurations of the jails (I'm quite new to
> > jails and their pitfalls), I was wondering what causes ntpd to die.
> > I can't determine exactly the time of its death, so it might be
> > related to diurnal/periodic processes (I use only the most vanilla
> > configurations on periodic, except for checking that ZFS scrubbing is
> > enabled).
>
> As I'm a little rushed for time, I didn't catch whether the jails
> themselves were also running ntpd... just thought I'd ask. I don't see
> how zfs scrubbing or any other periodic scripts could cause this.

The jails do not have ntpd running, since all the docs I read say that the
jail-bearing host provides the time. I checked and double-checked that
they do not have ntpd running. By mentioning ZFS and scrubbing I was
thinking more of time-adjusting periodic jobs like adjkerntz or friends -
if there are any I'm not aware of. I see that this made things more
confusing.

>
> >
> > I haven't had the chance to check whether the hardware is completely
> > all right, but from a superficial point of view there is no issue
> > with high gain of the internal clock or other hardware issues.
>
> It's probably a good idea to check. I don't think that would cause ntpd
> any gas. I've seen RTC battery messages on my gear which haven't caused
> ntpd any problem. I have two machines which complain about RTC battery
> being dead, where in fact I have replaced the batteries and the messages
> still are displayed at boot. I'm not sure if it's possible for a kernel
> to damage the RTC. In my case that doesn't cause ntpd any problems. It's
> probably good to check anyway.

The server hardware in question is quite old, from 2008/09, so it has seen
its best days long ago. I haven't checked the battery status so far, but
that is what I'll do next, or I'll proactively replace the battery cell
with a fresh one.

My fear is that one of the time servers I try to sync with is compromised
and serving wrong times. But I have no indication of that.

I have difficulty understanding the logic behind ntp.conf regarding
"restrict".
It might be possible that, due to my lack of understanding, I
misconfigured ntpd in a very stupid way, such that its time could be set
by any outside-world time server. I'll check this tomorrow when I'm back
in the office.

>
> >
> > If there are known issues with jails (the problem occurs since I use
> > those), advice is appreciated.
>
> Not that I know of.
>

I'll check the jails anyway. I was asking because I use lo1 - lo5 for 5
jails, each with a dedicated loopback IP (127.0.1.1 - 127.0.5.1). And the
jail host reports listening on all (cloned) loopback interfaces on UDP4,
port 123.

I have another machine in the very same network segment, but without
jails. I'll take the configuration and let that box run for a while (it is
more recent hardware (Haswell XEON) and runs the very same recent
CURRENT).

Kind regards,

Oliver

-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes or
for market or opinion research (§ 28 Abs. 4 BDSG).
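
Since the open question in this message is when exactly ntpd dies, a small
watcher could narrow that down. What follows is only a minimal sketch and
was not used in the thread. It assumes that "service ntpd status" exits
with 0 while ntpd is running and non-zero otherwise; the function names and
the log path in the usage note are hypothetical. Because the wall clock
itself is suspect here, each record also carries a monotonic counter, which
is not affected when the system time is stepped.

```python
#!/usr/bin/env python3
"""Record when ntpd stops running (sketch; command is an assumption)."""
import subprocess
import time
from datetime import datetime, timezone

# Assumption: this command exits 0 while ntpd runs, non-zero once it died.
STATUS_CMD = ["service", "ntpd", "status"]


def daemon_is_running(cmd=STATUS_CMD):
    """Return True if the status command exits with code 0."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError:
        # Command missing or not executable: report "not running".
        return False
    return result.returncode == 0


def format_record(running, now=None, mono=None):
    """Build one log line: UTC wall time, monotonic seconds, state."""
    if now is None:
        now = datetime.now(timezone.utc)
    if mono is None:
        mono = time.monotonic()
    state = "running" if running else "NOT running"
    stamp = now.isoformat(timespec="seconds")
    return f"{stamp} mono={mono:.0f}s ntpd {state}"


def watch(logfile, interval=60, cmd=STATUS_CMD):
    """Append a record on every state change; loops until interrupted."""
    last = None
    while True:
        running = daemon_is_running(cmd)
        if running != last:
            with open(logfile, "a") as fh:
                fh.write(format_record(running) + "\n")
            last = running
        time.sleep(interval)
```

Calling watch("/var/log/ntpd-watch.log") as root (the path is only an
example) would leave one line per state change. Comparing the monotonic
values of two records gives the elapsed time between them even if the wall
clock jumped in between.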