Date:      Sat, 02 Nov 2013 07:33:40 -0600
From:      Ian Lepore <ian@FreeBSD.org>
To:        Tim Kientzle <tim@kientzle.com>
Cc:        freebsd-arm@FreeBSD.org, Howard Su <howard0su@gmail.com>
Subject:   Re: sshd crash
Message-ID:  <1383399220.31172.116.camel@revolution.hippie.lan>
In-Reply-To: <EB18203F-C516-4917-9AA4-DBA6E66DAAB6@kientzle.com>
References:  <CAAvnz_rj43Ww6=mMfnp2u5TA2pWb20vWOqyAtuK08wgzy0dH6A@mail.gmail.com> <1383313834.31172.65.camel@revolution.hippie.lan> <CAHNYxxMMF_GJv10drYuQFO%2Bav%2BTdp8OBvJfFZObEZ=tgaBovSA@mail.gmail.com> <1383328423.31172.92.camel@revolution.hippie.lan> <CAHNYxxNiuKP8wfTaZuL%2BBXiLcYA9eU3LBb-659ZBYr-WBSmZeQ@mail.gmail.com> <1383343354.31172.102.camel@revolution.hippie.lan> <EB18203F-C516-4917-9AA4-DBA6E66DAAB6@kientzle.com>

On Fri, 2013-11-01 at 22:35 -0700, Tim Kientzle wrote:
> On Nov 1, 2013, at 3:02 PM, Ian Lepore <ian@freebsd.org> wrote:
> 
> > On Sat, 2013-11-02 at 02:40 +0800, Jia-Shiun Li wrote:
> >> On Sat, Nov 2, 2013 at 1:53 AM, Ian Lepore <ian@freebsd.org> wrote:
> >>> On Sat, 2013-11-02 at 01:44 +0800, Jia-Shiun Li wrote:
> >>>> may I add: putty causes this to happen. mine 0.62. But ssh from another
> >>>> FreeBSD host has no problem.
> >>>> 
> >>>> I suspect it to be some issues related to memory or malloc issues
> >>>> specific to bbb. 'tmux a -d' without existing detached sessions
> >>>> causes tmux client to core dump. But sshd and it are both fine on rpi.
> >>>> 
> >>>> -Jia-Shiun.
> >>> 
> >>> This is the first I've heard of being able to ssh to an arm platform
> >>> that doesn't have PrivSep disabled, since about July or so.  I've never
> >>> heard a report yet that anything on the client side could make a
> >>> difference.
> >>> 
> >>> It's definitely not a beaglebone thing, it happens on every arm board
> >>> I've got... dreamplug, rpi, bbw, imx53, wandboard.
> >> 
> >> 
> >> Ok let me make sure I did not mix things up. ;)
> >> 
> >> IIRC I once briefly saw a similar issue on rpi, but after another
> >> weekly update it was gone. I did not pay much attention to rpi,
> >> and thought it was bbb specific.
> >> 
> >> I did not change sshd_config, so UsePrivilegeSeparation should
> >> still be on, as is the default.
> 
> I started looking into it a couple of months ago but didn't get
> very far; Diane Bruce got a lot further than I did.
> 
> If I recall correctly, it started up when the malloc libc symbols
> were changed.  That may have altered what malloc implementation
> sshd used.
> 
> So it could be a long-standing stray write that jemalloc just
> happens to detect.
> 
> It could also be related to locking (there's some multi-threaded
> crypto code in sshd that may be involved).

There's lots of stuff with lock in the name, but I don't think there are
actually any threads involved in sshd, just forking.  ldd says sshd
doesn't link to libthr.

I'm not sure it's a mundane stray-write either.  The routine that's
asserting is checking to see if the contents of a page are all-zero
because a jemalloc internal flag is set that says it should be.  I had
the routine print the non-zero data it found, and it looks like this:

not-zero at 0 0x20c99000 = 0x20800a00
not-zero at 1 0x20c99004 = 0x00000001
not-zero at 2 0x20c99008 = 0x0000002f
not-zero at 3 0x20c9900c = 0xffffffff
not-zero at 4 0x20c99010 = 0x00007fff
not-zero at 5 0x20c99014 = 0x00000003
not-zero at 96 0x20c99180 = 0x5a5a5a5a
not-zero at 97 0x20c99184 = 0x5a5a5a5a
not-zero at 98 0x20c99188 = 0x5a5a5a5a

The 0x5a continues to the end of the page.  So jemalloc has metadata
that says it thinks the page is all-zeroes, and the page is a mix of
data and some zeroes and the 5a junk-fill byte.  It seems more like the
metadata is in error somehow.  (Maybe a stray write hit the metadata.)
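
For anyone who wants to reproduce that kind of dump, here is a minimal
sketch of a scan that produces lines in the same shape as the output
above.  It is not the jemalloc routine itself: the function name
report_nonzero_words and its parameters are made up for illustration,
and it assumes the page is 4-byte aligned and is examined as 32-bit
words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of jemalloc): walk a page as 32-bit
 * words and print every word that is not zero, as
 * "not-zero at <word index> <address> = <value>".
 * Returns the number of non-zero words found.
 * Assumes `page` is suitably aligned for uint32_t access.
 */
static size_t
report_nonzero_words(const void *page, size_t page_size, FILE *out)
{
	const uint32_t *words = page;
	size_t nwords = page_size / sizeof(uint32_t);
	size_t found = 0;

	assert(page != NULL);
	assert(out != NULL);

	for (size_t i = 0; i < nwords; i++) {
		if (words[i] != 0) {
			fprintf(out, "not-zero at %zu %p = 0x%08x\n",
			    i, (const void *)&words[i], (unsigned)words[i]);
			found++;
		}
	}
	return (found);
}
```

Called on a page that is supposed to be zeroed, a return value of 0
means the page really is all-zero; anything else lists the offending
words, which is how the 0x5a junk-fill pattern stands out from the
handful of data words at the start of the page.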

-- Ian
