Date:      Sat, 18 Jul 2015 05:09:56 -0700
From:      David Wolfskill <david@catwhisker.org>
To:        current@freebsd.org
Subject:   Segmentation fault running ntpd
Message-ID:  <20150718120956.GC1155@albert.catwhisker.org>

Lousy timing (no pun intended -- it's early in the day for me),
given the recent MFC, but as I was booting my laptop to yesterday's
head:

FreeBSD g1-245.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #127  r285652M/285652:1100077: Fri Jul 17 04:30:16 PDT 2015     root@g1-245.catwhisker.org:/common/S3/obj/usr/src/sys/CANARY  amd64

to build today's head (@r285670; still in progress as I type), I
happened to note [Oh, great -- we can no longer copy/paste from
console now??!?  Fine, I'll transcribe by hand.... :-(]:

...
bound to 172.17.1.245 -- renewal in 43200 seconds.
pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
Starting Network: lo0 em0 iwn0 lagg0.
...

Trying to examine the /ntpd.core, I see:
root@g1-245:/ # gdb `which ntpd` ntpd.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libcrypto.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.7
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
[New Thread 801c07400 (LWP 100122/<unknown>)]
[New Thread 801c06400 (LWP 100120/<unknown>)]
(gdb) bt
#0  0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
#1  0x00000008ccbd4f34 in ?? ()
#2  0x0000000000000005 in ?? ()
#3  0x0000000801800448 in ?? ()
#4  0x00000008011ca888 in sbrk () from /lib/libc.so.7
#5  0x00000008018000c8 in ?? ()
#6  0x00000008018000c0 in ?? ()
#7  0x0000000000000208 in ?? ()
#8  0x0000000801c32fb0 in ?? ()
#9  0x0000000000000001 in ?? ()
#10 0x0000000801cc20c8 in ?? ()
#11 0x0000000000000030 in ?? ()
#12 0x0000000801cc20c8 in ?? ()
#13 0x00007fffffffe480 in ?? ()
#14 0x00000008011cd240 in sbrk () from /lib/libc.so.7
#15 0x0000000000000280 in ?? ()
#16 0x00000008014bbc70 in malloc_message () from /lib/libc.so.7
#17 0x00000008018000c0 in ?? ()
#18 0x0000000801800448 in ?? ()
#19 0x0000000000000032 in ?? ()
#20 0x0000000801800458 in ?? ()
#21 0x00000008014bbc68 in malloc_message () from /lib/libc.so.7
#22 0x0000000801cc2000 in ?? ()
---Type <return> to continue, or q <return> to quit---
#23 0x00000008014bba60 in malloc_message () from /lib/libc.so.7
#24 0x0000000801cc20d8 in ?? ()
#25 0x00000000000000a0 in ?? ()
#26 0x0000000000000208 in ?? ()
#27 0x00007fffffffe4d0 in ?? ()
#28 0x00000008011bdd7a in _malloc_thread_cleanup () from /lib/libc.so.7
Previous frame inner to this frame (corrupt stack?)
(gdb)

which seems... well, not especially useful, as far as I can tell.
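Since console copy/paste was unavailable and gdb paged the backtrace ("---Type <return> to continue"), one option is to have gdb write everything to a file. The following is a minimal sketch, assuming the base-system gdb 6.1.1 (which accepts -batch and -x); the function names and paths are hypothetical. It does not add symbols: frames will still show as ?? unless ntpd and libc are rebuilt with debugging information (for example, DEBUG_FLAGS=-g in their source directories). Also, frames such as #2 (0x5) and #7 (0x208) are clearly not code addresses, so the unwinding past #0 is probably guesswork.

```shell
# Hypothetical helpers: capture an ntpd backtrace non-interactively.

# Write gdb commands to file $1. "set height 0" turns off the
# "---Type <return> to continue" pager; "thread apply all bt" covers
# the extra threads gdb reported when loading the core.
write_gdb_cmds() {
    cat > "$1" <<'EOF'
set height 0
bt
info threads
thread apply all bt
EOF
}

# Run gdb in batch mode on program $1 and core $2, saving all output to $3.
capture_backtrace() {
    prog=$1 core=$2 out=$3
    cmds=$(mktemp "${TMPDIR:-/tmp}/gdbcmds.XXXXXX") || return 1
    write_gdb_cmds "$cmds"
    gdb -batch -x "$cmds" "$prog" "$core" > "$out" 2>&1
    status=$?
    rm -f "$cmds"
    return $status
}

# Example (hypothetical output path):
#   capture_backtrace "$(which ntpd)" /ntpd.core /tmp/ntpd-bt.txt
```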


This is (as mentioned above) on my laptop; as such, it is expected to
"wander" from one network to another.  Accordingly:

* Since it could be connected to a network I do not control, I use a
  packet filter (IPFW, in my case) to reduce my exposure from a
  possibly-hostile network.

* Rather than enabling ntpd in /etc/rc.conf, I use
  /etc/dhclient-exit-hooks to start ntpd after the laptop has a DHCP
  lease.  (For networks I control, I also set up the DHCP server to
  advertise what NTP server the DHCP clients should use, but the code in
  dhclient-exit-hooks merely prefers that, rather than requiring it.)

* In my world-view -- at least for networks I control -- DNS zone files
  are the Source of Truth with respect to hostname <-> IP address
  correspondence, and Dynamic DNS is Evil.  I populate my zone files
  with appropriate A & PTR records so that every assignable DHCP
  address has a PTR record, and the hostname to which it points has
  an A record that points back to that IP address.  Accordingly, I
  also use /etc/dhclient-exit-hooks so the laptop can find out what
  its hostname is, and set it accordingly.
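For readers who have not used dhclient exit hooks, the two hook duties described above (set the hostname from DNS, then start ntpd preferring a DHCP-advertised server) might look roughly like this. It is a hypothetical sketch, not the actual hook file. It assumes dhclient-script sources the file with $reason and $new_ip_address set, that $new_ntp_servers (space-separated) appears only if dhclient.conf requests ntp-servers, and that host(1) prints BIND-style "domain name pointer" and "has address" lines. FALLBACK_NTP and the generated config path are illustrative.

```shell
# Hypothetical /etc/dhclient-exit-hooks sketch (sourced, so no #! line).
FALLBACK_NTP=${FALLBACK_NTP:-0.freebsd.pool.ntp.org}  # illustrative default
HOOK_NTP_CONF=/var/run/ntpd-dhcp.conf                 # hypothetical path

# Print the name the PTR record for IP $1 points to, minus the trailing dot.
lookup_ptr() {
    host "$1" 2>/dev/null |
        awk '/domain name pointer/ { sub(/\.$/, "", $NF); print $NF; exit }'
}

# Print every A-record address for name $1, one per line.
lookup_a() {
    host -t A "$1" 2>/dev/null | awk '/has address/ { print $NF }'
}

# Print a hostname only if the PTR for $1 names a host whose A record
# points back to $1; otherwise print nothing and return non-zero.
confirmed_hostname() {
    hook_name=$(lookup_ptr "$1")
    [ -n "$hook_name" ] || return 1
    lookup_a "$hook_name" | grep -qxF "$1" || return 1
    echo "$hook_name"
}

# Prefer the DHCP-advertised NTP server list; fall back otherwise.
pick_ntp_servers() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        echo "$FALLBACK_NTP"
    fi
}

case "${reason:-}" in
BOUND|REBOOT|RENEW|REBIND)
    if hook_host=$(confirmed_hostname "$new_ip_address"); then
        hostname "$hook_host"
    fi
    # Start ntpd once; later renewals leave a running ntpd alone.
    if ! pgrep -x ntpd >/dev/null 2>&1; then
        for hook_s in $(pick_ntp_servers "${new_ntp_servers:-}"); do
            echo "server $hook_s iburst"
        done > "$HOOK_NTP_CONF"
        ntpd -g -c "$HOOK_NTP_CONF" -p /var/run/ntpd.pid
    fi
    ;;
esac
```

Keeping the DNS lookups in small functions means the forward/reverse check can be exercised without a network by substituting stub versions of lookup_ptr and lookup_a.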

Mind, I've been doing the above for well over a decade, so that doesn't
qualify as "new."

And most of the time, it Just Works (which is a significant reason I
keep doing it).

A couple of other things that are more recent, and possibly of
relevance:

* As alluded to above, I have the em0 & wlan0 (iwn(4)) NICs set up using
  Link Aggregation in "failover" mode.  In practice, I rarely use
  the em0 (wired) NIC -- I had originally done that based on a
  misperception of how I thought things were set up at work, and
  then just left the configuration alone and relied on the wireless
  NIC.  (At home, I have things set up so that the failover would
  work, but doing so would be a little awkward for reasons that
  aren't relevant here.)
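A failover lagg of the kind described can be expressed in /etc/rc.conf roughly as below. This is a hypothetical fragment modeled on the FreeBSD Handbook's wired/wireless failover example, not the configuration actually used on this laptop; the WPA and DHCP keywords in particular are assumptions.

```shell
# Hypothetical /etc/rc.conf fragment: em0 (wired) and wlan0 (on iwn0)
# aggregated as lagg0 in failover mode. Per lagg(4), the first laggport
# added is the master, so em0 carries traffic whenever it has link.
ifconfig_em0="up"
wlans_iwn0="wlan0"
ifconfig_wlan0="WPA"
cloned_interfaces="lagg0"
ifconfig_lagg0="up laggproto failover laggport em0 laggport wlan0 DHCP"
# The Handbook's version of this example also sets the wireless
# interface's MAC address to match em0's; that step is omitted here.
```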

* I have the laptop configured to run xdm(1)... after the DHCP lease is
  acquired and the hostname is set.  My ~/.xsession script is set
  up so it fires up ssh-agent, requests a passphrase, and then
  (among other things) establishes an SSH session to the "mail hub"
  at home and re-establishes a tmux session where I'm running mutt
  to handle my email.  I've noticed that in head, these connections
  sometimes fail to get initialized, and sometimes will time out,
  while sessions started a few minutes later will have no problem.
  That seems peculiar, but was sufficiently ... well, "nebulous" that
  I didn't think it warranted a whine of its own here.  But on the
  chance that it's related to ntpd giving up the ghost prematurely,
  it seemed but a reasonable exercise of "Full Disclosure" to mention
  it in this context -- even though it's also something I've been doing
  since the (late) 1990s.
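One way to test the "ntpd giving up the ghost" theory would be to record, just before the SSH session is started, whether ntpd is still running, and then compare failed sessions against that log. The following is a hypothetical sketch for ~/.xsession; the function name and log file are made up for illustration.

```shell
# Hypothetical ~/.xsession helper: append a timestamp and ntpd's state
# (running/absent) to log file $1.
log_ntpd_state() {
    logfile=$1
    if pgrep -x ntpd >/dev/null 2>&1; then
        state=running
    else
        state=absent
    fi
    printf '%s ntpd=%s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$state" >> "$logfile"
}

# Example placement in ~/.xsession (hypothetical):
#   eval "$(ssh-agent -s)"
#   ssh-add
#   log_ntpd_state "$HOME/.xsession-ntpd.log"
#   ...then start the SSH/tmux session to the mail hub as before
```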

So: Any suggestions for either diagnosing what the root cause is or
changing the configuration so that the failure no longer occurs?

Thanks!

Peace,
david
-- 
David H. Wolfskill				david@catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
