Date:      Sat, 2 Nov 2013 10:39:53 -0500
From:      Diane Bruce <db@db.net>
To:        Ian Lepore <ian@FreeBSD.org>
Cc:        Tim Kientzle <tim@kientzle.com>, jasone@FreeBSD.org, freebsd-arm@FreeBSD.org, Howard Su <howard0su@gmail.com>
Subject:   Re: sshd crash
Message-ID:  <20131102153953.GA39106@night.db.net>
In-Reply-To: <1383399220.31172.116.camel@revolution.hippie.lan>
References:  <CAAvnz_rj43Ww6=mMfnp2u5TA2pWb20vWOqyAtuK08wgzy0dH6A@mail.gmail.com> <1383313834.31172.65.camel@revolution.hippie.lan> <CAHNYxxMMF_GJv10drYuQFO%2Bav%2BTdp8OBvJfFZObEZ=tgaBovSA@mail.gmail.com> <1383328423.31172.92.camel@revolution.hippie.lan> <CAHNYxxNiuKP8wfTaZuL%2BBXiLcYA9eU3LBb-659ZBYr-WBSmZeQ@mail.gmail.com> <1383343354.31172.102.camel@revolution.hippie.lan> <EB18203F-C516-4917-9AA4-DBA6E66DAAB6@kientzle.com> <1383399220.31172.116.camel@revolution.hippie.lan>

On Sat, Nov 02, 2013 at 07:33:40AM -0600, Ian Lepore wrote:
> On Fri, 2013-11-01 at 22:35 -0700, Tim Kientzle wrote:
> > On Nov 1, 2013, at 3:02 PM, Ian Lepore <ian@freebsd.org> wrote:
> > 
> > > On Sat, 2013-11-02 at 02:40 +0800, Jia-Shiun Li wrote:
> > >> On Sat, Nov 2, 2013 at 1:53 AM, Ian Lepore <ian@freebsd.org> wrote:
> > >>> On Sat, 2013-11-02 at 01:44 +0800, Jia-Shiun Li wrote:
> > >>>> may I add: putty causes this to happen. mine 0.62. But ssh from another
> > >>>> FreeBSD host has no problem.
> > >>>> 
> > >>>> I suspect it to be some issues related to memory or malloc issues
> > >>>> specific to bbb. 'tmux a -d' without existing detached sessions
> > >>>> causes tmux client to core dump. But sshd and it are both fine on rpi.
> > >>>> 
> > >>>> -Jia-Shiun.
> > >>> 
> > >>> This is the first I've heard of being able to ssh to an arm platform
> > >>> that doesn't have PrivSep disabled, since about July or so.  I've never
> > >>> heard a report yet that anything on the client side could make a
> > >>> difference.
> > >>> 
> > >>> It's definitely not a beaglebone thing, it happens on every arm board
> > >>> I've got... dreamplug, rpi, bbw, imx53, wandboard.
> > >> 
> > >> 
> > >> Ok let me make sure I did not mix things up. ;)
> > >> 
> > >> IIRC  I once saw similar issue on rpi shortly. But after another
> > >> weekly update it was gone. I did not pay too much attention on rpi,
> > >> and thought it was bbb specific.
> > >> 
> > >> I did not change sshd_config, UsePrivilegeSeparation supposed
> > >> remaining on as default is.
> > 
> > I started looking into it a couple of months ago but didn't get
> > very far; Diane Bruce got a lot further than I did.
> > 
> > If I recall correctly, it started up when the malloc libc symbols
> > were changed.  That may have altered what malloc implementation
> > sshd used.
> > 
> > So it could be a long-standing stray write that jemalloc just
> > happens to detect.
> > 
> > It could also be related to locking (there's some multi-threaded
> > crypto code in sshd that may be involved).
> 
> There's lots of stuff with lock in the name, but I don't think there are
> actually any threads involved in sshd, just forking.  ldd says sshd
> doesn't link to libthr.
> 
> I'm not sure it's a mundane stray-write either.  The routine that's
> asserting is checking to see if the contents of a page are all-zero
> because a jemalloc internal flag is set that says it should be.  I had
> the routine print the non-zero data it found, and it looks like this:
> 
> not-zero at 0 0x20c99000 = 0x20800a00
> not-zero at 1 0x20c99004 = 0x00000001
> not-zero at 2 0x20c99008 = 0x0000002f
> not-zero at 3 0x20c9900c = 0xffffffff
> not-zero at 4 0x20c99010 = 0x00007fff
> not-zero at 5 0x20c99014 = 0x00000003
> not-zero at 96 0x20c99180 = 0x5a5a5a5a
> not-zero at 97 0x20c99184 = 0x5a5a5a5a
> not-zero at 98 0x20c99188 = 0x5a5a5a5a
> 
> The 0x5a continues to the end of the page.  So jemalloc has metadata
> that says it thinks the page is all-zeroes, and the page is a mix of
> data and some zeroes and the 5a junk-fill byte.  It seems more like the
> metadata is in error somehow.  (Maybe a stray write hit the metadata.)
> 
> -- Ian
> 
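
For readers following along, the check Ian describes (walk a page that the allocator believes is zeroed and print every word that is not) can be sketched as below. This is only an illustration: the function name `report_nonzero_words` and the exact output format are assumptions modelled on the output quoted above, not the jemalloc routine itself.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan a region one 32-bit word at a time and report every word that is
 * not zero.  Returns the number of non-zero words found.  Pass out == NULL
 * to count without printing.  Hypothetical helper, not jemalloc code.
 */
static size_t
report_nonzero_words(const void *region, size_t len, FILE *out)
{
	const unsigned char *base = region;
	size_t nwords = len / sizeof(uint32_t);
	size_t found = 0;

	assert(region != NULL || len == 0);

	for (size_t i = 0; i < nwords; i++) {
		uint32_t word;

		/* memcpy avoids any alignment or aliasing assumptions. */
		memcpy(&word, base + i * sizeof(word), sizeof(word));
		if (word == 0)
			continue;
		if (out != NULL)
			fprintf(out, "not-zero at %zu %p = 0x%08" PRIx32 "\n",
			    i, (const void *)(base + i * sizeof(word)), word);
		found++;
	}
	return (found);
}
```

Calling `report_nonzero_words(page, 4096, stderr)` on a page that should be all-zero prints one line per offending word, which is enough to tell leftover data apart from a junk-fill pattern such as 0x5a.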

I did a ln -s "quarantine:16000000" /etc/malloc.conf
which also makes sshd work. That led me down the garden path of
thinking it might be a use after free, which is the conclusion jasone
also came to, and it is why I reported the possibility to secteam and des.


http://docs.freebsd.org/cgi/getmsg.cgi?fetch=199241+0+archive/2013/freebsd-arm/20130728.freebsd-arm
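
As background on why the quarantine setting points toward a use after free: a quarantine holds freed memory in a FIFO for a while instead of handing it straight back for reuse, so a stale pointer keeps hitting junk-filled memory rather than someone else's live allocation. The toy below sketches that idea only. Every name in it (`struct quarantine`, `quarantine_free`, the fixed slot count) is hypothetical and it is not how jemalloc implements the option.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define QUARANTINE_SLOTS 64	/* hypothetical fixed capacity */
#define JUNK_FREE_BYTE	 0x5a	/* jemalloc's junk byte for freed memory */

struct quarantined {
	void	*ptr;
	size_t	 size;
};

struct quarantine {
	struct quarantined slots[QUARANTINE_SLOTS];
	size_t	head;		/* index of the oldest entry */
	size_t	count;		/* entries currently held */
	size_t	bytes;		/* bytes currently held */
	size_t	max_bytes;	/* byte budget, like quarantine:N */
};

static void
quarantine_init(struct quarantine *q, size_t max_bytes)
{
	memset(q, 0, sizeof(*q));
	q->max_bytes = max_bytes;
}

/* Hand the oldest quarantined object back to the real allocator. */
static void
quarantine_evict(struct quarantine *q)
{
	struct quarantined *old = &q->slots[q->head];

	assert(q->count > 0);
	free(old->ptr);
	q->bytes -= old->size;
	q->head = (q->head + 1) % QUARANTINE_SLOTS;
	q->count--;
}

/* "Free" ptr: junk-fill it now, really free it only when it ages out. */
static void
quarantine_free(struct quarantine *q, void *ptr, size_t size)
{
	size_t tail;

	if (ptr == NULL)
		return;
	memset(ptr, JUNK_FREE_BYTE, size);
	if (size > q->max_bytes) {
		free(ptr);	/* too large to ever fit in the budget */
		return;
	}
	while (q->count == QUARANTINE_SLOTS || q->bytes + size > q->max_bytes)
		quarantine_evict(q);
	tail = (q->head + q->count) % QUARANTINE_SLOTS;
	q->slots[tail].ptr = ptr;
	q->slots[tail].size = size;
	q->count++;
	q->bytes += size;
}

/* Release everything still held. */
static void
quarantine_drain(struct quarantine *q)
{
	while (q->count > 0)
		quarantine_evict(q);
}
```

A symptom that disappears once reuse is delayed like this is consistent with a use after free, but as the efence result below shows, it does not prove one.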

Nevertheless, running efence from ports failed to come up with
any use after free.

I put together some notes for des at
http://www.freebsd.org/~db/fordes

The rev in question:

http://svnweb.freebsd.org/base?view=revision&revision=250991

That is when jemalloc was turned on for userland. An older malloc
(also by jasone) existed before it:

/usr/src/lib/libc/stdlib/malloc.c

I agree with Ian, it is not thread locking. I have a thread test
program which does not show any faults in our thread locking.

Yes, it is purely associated with the fork.
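
A fork-only check of the kind this suggests can be sketched as follows: dirty and free some heap in the parent, fork, and have the child verify that calloc() really returns zeroed memory. This is a minimal sketch under assumptions, not a known reproducer for the sshd crash; the function name and the buffer counts are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 0 if the child saw only zeroed memory from calloc(), 1 if it
 * saw a non-zero byte, and -1 on setup failure or if the child died
 * abnormally (for example on an allocator assertion).
 */
static int
calloc_zeroed_after_fork(size_t nbufs, size_t bufsize)
{
	unsigned char **bufs;
	pid_t pid;
	int status;

	assert(nbufs > 0 && bufsize > 0);

	/* Dirty some heap pages in the parent, then release them. */
	bufs = calloc(nbufs, sizeof(*bufs));
	if (bufs == NULL)
		return (-1);
	for (size_t i = 0; i < nbufs; i++) {
		volatile unsigned char *v = bufs[i] = malloc(bufsize);

		/* volatile keeps the compiler from dropping the stores. */
		for (size_t j = 0; v != NULL && j < bufsize; j++)
			v[j] = 0x5a;
	}
	for (size_t i = 0; i < nbufs; i++)
		free(bufs[i]);
	free(bufs);

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* Child: every calloc() result must be all-zero. */
		for (size_t i = 0; i < nbufs; i++) {
			unsigned char *p = calloc(1, bufsize);

			if (p == NULL)
				_exit(2);
			for (size_t j = 0; j < bufsize; j++)
				if (p[j] != 0)
					_exit(1);
			free(p);
		}
		_exit(0);
	}

	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status) <= 1 ? WEXITSTATUS(status) : -1);
}
```

On a healthy system `calloc_zeroed_after_fork(8, 4096)` should return 0. A return of 1 or -1 on an affected board would be worth comparing against the assertion output above.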

zbb@ also reported a similar problem with another platform.

===
Hello.

I'm sending you the logs. Please see below.

Best regards
Zbyszek Bodek


1.
=======
--- ExprConstant.o ---
<jemalloc>:
/home/zbb/projects/armsp/freebsd-arm-superpages/lib/libc/../../contrib/jemalloc/include/jemalloc/internal/arena.h:757:
Failed assertion: "binind < NBINS"
./StmtNodes.inc.h: In member function 'RetTy clang::StmtVisitorBase<Ptr,
ImplClass, RetTy>::Visit(typename Ptr<clang::Stmt>::type) [with Ptr =
clang::make_const_ptr, ImplClass = <unnamed>::LValueExprEvaluator, RetTy =
bool]':
./StmtNodes.inc.h:873: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
*** [ExprConstant.o] Error code 1

make[6]: stopped in /usr/src/lib/clang/libclangast
make[6]: stopped in /usr/src/lib/clang/libclangast
*** [all] Error code 2

make[5]: stopped in /usr/src/lib/clang
1 error

make[5]: stopped in /usr/src/lib/clang
*** [all] Error code 2

make[4]: stopped in /usr/src/lib
1 error

make[4]: stopped in /usr/src/lib
A failure has been detected in another branch of the parallel make

make[3]: stopped in /usr/src
*** [libraries] Error code 2

make[2]: stopped in /usr/src
1 error

make[2]: stopped in /usr/src
*** [_libraries] Error code 2

make[1]: stopped in /usr/src
1 error

make[1]: stopped in /usr/src
*** [buildworld] Error code 2

make: stopped in /usr/src
1 error


2.
=======
--- ExprConstant.o ---
<jemalloc>:
/home/zbb/projects/armsp/freebsd-arm-superpages/lib/libc/../../contrib/jemalloc/include/jemalloc/internal/arena.h:757:
Failed assertion: "binind < NBINS"
/usr/src/lib/clang/libclangast/../../../contrib/llvm/tools/clang/lib/AST/ExprConstant.cpp:
In member function 'RetTy<unnamed>::ExprEvaluatorBase<Derived,
RetTy>::VisitCallExpr(const clang::CallExpr*) [with Derived =
<unnamed>::IntExprEvaluator, RetTy = bool]':
/usr/src/lib/clang/libclangast/../../../contrib/llvm/tools/clang/lib/AST/ExprConstant.cpp:3190:
internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
*** [ExprConstant.o] Error code 1

----- End forwarded message -----

There is also an open bug report for that one, from both zbb and
Matthias Meyser; see PR 182060.

It's time to bring in jasone again, I think, and I have included him
on the cc. jemalloc has a number of places that fill memory using the
same pattern. I modified the pattern to be different at each place in
order to track what we are seeing. Where I have left it now is that I
think it might be associated with the thread cache code, because the
pattern I see comes from that branch of his code.
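
The pattern-per-site idea can be sketched like this. jemalloc's stock junk bytes are 0xa5 for newly allocated memory and 0x5a for freed memory; the third value and all the names below are hypothetical, chosen only to show how a distinct byte per fill site lets you map a byte seen in a dump back to the code path that wrote it. These are not the values or sites actually used in the modified tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fill sites; a real tree would have one per memset call. */
enum junk_site {
	JUNK_SITE_ALLOC,
	JUNK_SITE_DALLOC,
	JUNK_SITE_TCACHE,
	JUNK_SITE_COUNT
};

static const unsigned char junk_byte[JUNK_SITE_COUNT] = {
	[JUNK_SITE_ALLOC]  = 0xa5,	/* stock jemalloc: fresh allocation */
	[JUNK_SITE_DALLOC] = 0x5a,	/* stock jemalloc: freed memory */
	[JUNK_SITE_TCACHE] = 0x5b,	/* made-up value for illustration */
};

static const char *const junk_name[JUNK_SITE_COUNT] = {
	[JUNK_SITE_ALLOC]  = "alloc",
	[JUNK_SITE_DALLOC] = "dalloc",
	[JUNK_SITE_TCACHE] = "tcache",
};

/* Fill a region with the byte that identifies the given site. */
static void
junk_fill(void *p, size_t len, enum junk_site site)
{
	assert((int)site >= 0 && (int)site < JUNK_SITE_COUNT);
	memset(p, junk_byte[site], len);
}

/* Map a byte seen in a dump back to the site that writes it. */
static const char *
junk_site_name(unsigned char byte)
{
	for (int i = 0; i < JUNK_SITE_COUNT; i++)
		if (junk_byte[i] == byte)
			return (junk_name[i]);
	return ("unknown");
}
```

With distinct bytes in place, a page dump full of one value immediately names the branch that last filled it.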

I have copious notes here but will have to dig them up.

Both Ian and I were rather hoping zbb@ had fixed this one when
he fixed a stupid in the arm vm, but Ian tells me it is still there.

- Diane
-- 
- db@FreeBSD.org db@db.net http://www.db.net/~db


