From owner-freebsd-arm@FreeBSD.ORG Sat Nov 2 15:40:27 2013
Date: Sat, 2 Nov 2013 10:39:53 -0500
From: Diane Bruce
To: Ian Lepore
Cc: Tim Kientzle, jasone@FreeBSD.org, freebsd-arm@FreeBSD.org, Howard Su
Subject: Re: sshd crash
Message-ID: <20131102153953.GA39106@night.db.net>
In-Reply-To: <1383399220.31172.116.camel@revolution.hippie.lan>
List-Id: Porting FreeBSD to the StrongARM Processor

On Sat, Nov 02, 2013 at 07:33:40AM -0600, Ian Lepore wrote:
> On Fri, 2013-11-01 at 22:35 -0700, Tim Kientzle wrote:
> > On Nov 1, 2013, at 3:02 PM, Ian Lepore wrote:
> >
> > > On Sat, 2013-11-02 at 02:40 +0800, Jia-Shiun Li wrote:
> > >> On Sat, Nov 2, 2013 at 1:53 AM, Ian Lepore wrote:
> > >>> On Sat, 2013-11-02 at
> > >>> 01:44 +0800, Jia-Shiun Li wrote:
> > >>>> may I add: putty causes this to happen. mine 0.62. But ssh from another
> > >>>> FreeBSD host has no problem.
> > >>>>
> > >>>> I suspect it to be some issues related to memory or malloc issues
> > >>>> specific to bbb. 'tmux a -d' without existing detached sessions
> > >>>> causes tmux client to core dump. But sshd and it are both fine on rpi.
> > >>>>
> > >>>> -Jia-Shiun.
> > >>>
> > >>> This is the first I've heard of being able to ssh to an arm platform
> > >>> that doesn't have PrivSep disabled, since about July or so. I've never
> > >>> heard a report yet that anything on the client side could make a
> > >>> difference.
> > >>>
> > >>> It's definitely not a beaglebone thing, it happens on every arm board
> > >>> I've got... dreamplug, rpi, bbw, imx53, wandboard.
> > >>
> > >>
> > >> Ok let me make sure I did not mix things up. ;)
> > >>
> > >> IIRC I once saw similar issue on rpi shortly. But after another
> > >> weekly update it was gone. I did not pay too much attention on rpi,
> > >> and thought it was bbb specific.
> > >>
> > >> I did not change sshd_config, UsePrivilegeSeparation supposed
> > >> remaining on as default is.
> >
> > I started looking into it a couple of months ago but didn't get
> > very far; Diane Bruce got a lot further than I did.
> >
> > If I recall correctly, it started up when the malloc libc symbols
> > were changed. That may have altered what malloc implementation
> > sshd used.
> >
> > So it could be a long-standing stray write that jemalloc just
> > happens to detect.
> >
> > It could also be related to locking (there's some multi-threaded
> > crypto code in sshd that may be involved).
>
> There's lots of stuff with lock in the name, but I don't think there are
> actually any threads involved in sshd, just forking. ldd says sshd
> doesn't link to libthr.
>
> I'm not sure it's a mundane stray-write either.
> The routine that's asserting is checking to see if the contents of a
> page are all-zero because a jemalloc internal flag is set that says it
> should be. I had the routine print the non-zero data it found, and it
> looks like this:
>
> not-zero at 0 0x20c99000 = 0x20800a00
> not-zero at 1 0x20c99004 = 0x00000001
> not-zero at 2 0x20c99008 = 0x0000002f
> not-zero at 3 0x20c9900c = 0xffffffff
> not-zero at 4 0x20c99010 = 0x00007fff
> not-zero at 5 0x20c99014 = 0x00000003
> not-zero at 96 0x20c99180 = 0x5a5a5a5a
> not-zero at 97 0x20c99184 = 0x5a5a5a5a
> not-zero at 98 0x20c99188 = 0x5a5a5a5a
>
> The 0x5a continues to the end of the page. So jemalloc has metadata
> that says it thinks the page is all-zeroes, and the page is a mix of
> data and some zeroes and the 5a junk-fill byte. It seems more like the
> metadata is in error somehow. (Maybe a stray write hit the metadata.)
>
> -- Ian
>

I did a

  ln -s "quarantine:16000000" /etc/malloc.conf

which also works around the crash. This led me down the garden path of
thinking it might be a use after free. This was the conclusion jasone
also came to, which led to me reporting this possibility to secteam
and des.

http://docs.freebsd.org/cgi/getmsg.cgi?fetch=199241+0+archive/2013/freebsd-arm/20130728.freebsd-arm

Nevertheless, running efence from ports failed to come up with any use
after free. I put together some notes for des at

http://www.freebsd.org/~db/fordes

The rev in question is

http://svnweb.freebsd.org/base?view=revision&revision=250991

when jemalloc was turned on for userland. There existed an older malloc
(also by jasone) in /usr/src/lib/libc/stdlib/malloc.c

I agree with Ian, it is not thread locking. I have a thread test program
which does not show any faults in our thread locking. Yes, it is purely
associated with the fork.

zbb@ also reported a similar problem with another platform.

===
Hello.

I'm sending you the logs. Please see below.

Best regards
Zbyszek Bodek

1.
=======
--- ExprConstant.o ---
: /home/zbb/projects/armsp/freebsd-arm-superpages/lib/libc/../../contrib/jemalloc/include/jemalloc/internal/arena.h:757: Failed assertion: "binind < NBINS"
./StmtNodes.inc.h: In member function 'RetTy clang::StmtVisitorBase::Visit(typename Ptr::type) [with Ptr = clang::make_const_ptr, ImplClass = ::LValueExprEvaluator, RetTy = bool]':
./StmtNodes.inc.h:873: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See for instructions.
*** [ExprConstant.o] Error code 1
make[6]: stopped in /usr/src/lib/clang/libclangast
make[6]: stopped in /usr/src/lib/clang/libclangast
*** [all] Error code 2
make[5]: stopped in /usr/src/lib/clang
1 error
make[5]: stopped in /usr/src/lib/clang
*** [all] Error code 2
make[4]: stopped in /usr/src/lib
1 error
make[4]: stopped in /usr/src/lib
A failure has been detected in another branch of the parallel make
make[3]: stopped in /usr/src
*** [libraries] Error code 2
make[2]: stopped in /usr/src
1 error
make[2]: stopped in /usr/src
*** [_libraries] Error code 2
make[1]: stopped in /usr/src
1 error
make[1]: stopped in /usr/src
*** [buildworld] Error code 2
make: stopped in /usr/src
1 error

2.
=======
--- ExprConstant.o ---
: /home/zbb/projects/armsp/freebsd-arm-superpages/lib/libc/../../contrib/jemalloc/include/jemalloc/internal/arena.h:757: Failed assertion: "binind < NBINS"
/usr/src/lib/clang/libclangast/../../../contrib/llvm/tools/clang/lib/AST/ExprConstant.cpp: In member function 'RetTy::ExprEvaluatorBase::VisitCallExpr(const clang::CallExpr*) [with Derived = ::IntExprEvaluator, RetTy = bool]':
/usr/src/lib/clang/libclangast/../../../contrib/llvm/tools/clang/lib/AST/ExprConstant.cpp:3190: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See for instructions.
*** [ExprConstant.o] Error code 1

----- End forwarded message -----

There is also an open bug report for that one.
It is from both zbb and Matthias Meyser; see PR 182060.

It's time to bring in jasone again, I think, and I have included him on
the cc. jemalloc has a number of fill places using the same pattern. I
modified the patterns so that each place uses a different one, in order
to track what we are seeing. Where I have left it now is that I think it
might be associated with the thread cache code, because the pattern I
see comes from that branch of his code. I have copious notes here but
will have to dig them up.

Both Ian and I were rather hoping zbb@ had fixed this one when he fixed
a stupid in the arm vm, but Ian tells me it is still there.

- Diane
-- 
- db@FreeBSD.org db@db.net http://www.db.net/~db
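
For anyone who wants to see the kind of check Ian describes, here is a
minimal sketch in Python. It is not jemalloc's code: the function names,
the 4096-byte page size, and the little-endian 32-bit word layout are all
assumptions made for illustration. It rebuilds a page shaped like Ian's
dump (six data words, zeroes, then the 0x5a fill from word 96 onward) and
reports the non-zero words in the same format.

```python
import struct

PAGE_SIZE = 4096          # assumed page size, not taken from the thread
JUNK_WORD = 0x5A5A5A5A    # four copies of the 0x5a fill byte seen in Ian's dump


def find_nonzero_words(page, base_addr=0):
    """Return (index, address, value) for each non-zero 32-bit word.

    Words are read as little-endian unsigned ints (an assumption about
    the target's byte order).
    """
    if len(page) % 4 != 0:
        raise ValueError("page length must be a multiple of 4 bytes")
    hits = []
    for index, (value,) in enumerate(struct.iter_unpack("<I", page)):
        if value != 0:
            hits.append((index, base_addr + 4 * index, value))
    return hits


def split_junk(hits):
    """Separate words that are pure fill pattern from other non-zero data."""
    junk = [h for h in hits if h[2] == JUNK_WORD]
    other = [h for h in hits if h[2] != JUNK_WORD]
    return junk, other


def build_example_page():
    """Hypothetical page mimicking the dump: data, zeroes, then 0x5a fill."""
    head = (0x20800A00, 0x00000001, 0x0000002F,
            0xFFFFFFFF, 0x00007FFF, 0x00000003)
    page = bytearray(PAGE_SIZE)                      # starts all-zero
    page[0:4 * len(head)] = struct.pack("<6I", *head)
    page[96 * 4:] = b"\x5a" * (PAGE_SIZE - 96 * 4)   # fill from word 96 on
    return bytes(page)


if __name__ == "__main__":
    hits = find_nonzero_words(build_example_page(), base_addr=0x20C99000)
    junk, other = split_junk(hits)
    for index, addr, value in other + junk[:3]:
        print(f"not-zero at {index} {addr:#010x} = {value:#010x}")
    print(f"{len(junk)} junk-fill words, {len(other)} other non-zero words")
```

Splitting the fill words from the rest is the same idea as giving each
fill site its own pattern: the value found in the page then points at
the code path that wrote it.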