Date:      Sun, 3 Nov 2013 08:51:57 -0800
From:      Jason Evans <jasone@freebsd.org>
To:        Diane Bruce <db@db.net>
Cc:        Tim Kientzle <tim@kientzle.com>, freebsd-arm@FreeBSD.org, Ian Lepore <ian@FreeBSD.org>, Howard Su <howard0su@gmail.com>
Subject:   Re: sshd crash
Message-ID:  <2F2E1775-A459-4D0F-A464-F41B8A7EAB9B@freebsd.org>
In-Reply-To: <20131102153953.GA39106@night.db.net>
References:  <CAAvnz_rj43Ww6=mMfnp2u5TA2pWb20vWOqyAtuK08wgzy0dH6A@mail.gmail.com> <1383313834.31172.65.camel@revolution.hippie.lan> <CAHNYxxMMF_GJv10drYuQFO%2Bav%2BTdp8OBvJfFZObEZ=tgaBovSA@mail.gmail.com> <1383328423.31172.92.camel@revolution.hippie.lan> <CAHNYxxNiuKP8wfTaZuL%2BBXiLcYA9eU3LBb-659ZBYr-WBSmZeQ@mail.gmail.com> <1383343354.31172.102.camel@revolution.hippie.lan> <EB18203F-C516-4917-9AA4-DBA6E66DAAB6@kientzle.com> <1383399220.31172.116.camel@revolution.hippie.lan> <20131102153953.GA39106@night.db.net>

On Nov 2, 2013, at 8:39 AM, Diane Bruce <db@db.net> wrote:
> On Sat, Nov 02, 2013 at 07:33:40AM -0600, Ian Lepore wrote:
>>
>> I'm not sure it's a mundane stray-write either.  The routine that's
>> asserting is checking to see if the contents of a page are all-zero
>> because a jemalloc internal flag is set that says it should be.  I had
>> the routine print the non-zero data it found, and it looks like this:
>>
>> not-zero at 0 0x20c99000 = 0x20800a00
>> not-zero at 1 0x20c99004 = 0x00000001
>> not-zero at 2 0x20c99008 = 0x0000002f
>> not-zero at 3 0x20c9900c = 0xffffffff
>> not-zero at 4 0x20c99010 = 0x00007fff
>> not-zero at 5 0x20c99014 = 0x00000003
>> not-zero at 96 0x20c99180 = 0x5a5a5a5a
>> not-zero at 97 0x20c99184 = 0x5a5a5a5a
>> not-zero at 98 0x20c99188 = 0x5a5a5a5a
>>
>> The 0x5a continues to the end of the page.  So jemalloc has metadata
>> that says it thinks the page is all-zeroes, and the page is a mix of
>> data and some zeroes and the 5a junk-fill byte.  It seems more like the
>> metadata is in error somehow.  (Maybe a stray write hit the metadata.)

This looks to me like the sort of thing that would happen if the chunk
page map were corrupted.  This could happen due to a double free,
freeing an interior pointer of a multi-page allocation, or a variety of
more complicated errors.  The page is filled with 0x5a bytes, yet
jemalloc thinks the page should contain 0x00 bytes, and that implies
that the chunk page table claims this is the first use of the page since
it was mapped.
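
As a rough illustration of the mismatch described above, here is a toy
model in C.  It is not jemalloc's code: struct toy_page, its
claims_zeroed flag, and the toy_* helpers are hypothetical stand-ins for
the real page map entry and the asserting routine.  The only details
taken from the thread are the 0x5a junk-fill byte and the "not-zero at"
output format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_JUNK_BYTE 0x5a  /* junk-fill byte seen in the thread */

/* Hypothetical per-page record; NOT jemalloc's real page map layout. */
struct toy_page {
    bool claims_zeroed;  /* metadata: "page untouched since it was mapped" */
    uint32_t words[TOY_PAGE_SIZE / sizeof(uint32_t)];
};

/* Simulate junk-filling the page contents without updating the metadata. */
static void toy_junk_fill(struct toy_page *p)
{
    assert(p != NULL);
    memset(p->words, TOY_JUNK_BYTE, sizeof(p->words));
}

/*
 * Count the non-zero 32-bit words in the page, printing at most
 * max_print of them in the format quoted earlier in the thread.
 */
static size_t toy_report_nonzero(const struct toy_page *p, size_t max_print)
{
    size_t found = 0;

    assert(p != NULL);
    for (size_t i = 0; i < TOY_PAGE_SIZE / sizeof(uint32_t); i++) {
        if (p->words[i] == 0)
            continue;
        if (found < max_print)
            printf("not-zero at %zu %p = 0x%08x\n", i,
                (const void *)&p->words[i], (unsigned)p->words[i]);
        found++;
    }
    return found;
}

/* The page is consistent unless the metadata claims zeroes that are not there. */
static bool toy_page_consistent(const struct toy_page *p)
{
    assert(p != NULL);
    return !p->claims_zeroed || toy_report_nonzero(p, 0) == 0;
}
```

In this sketch, a zero-initialized toy_page with claims_zeroed set
passes toy_page_consistent(); calling toy_junk_fill() on it while
leaving the flag set makes the check fail, which is the same shape of
disagreement between contents and metadata that the assertion reports.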

Does this problem reproduce on amd64?  If so, I'll dig in and figure out
if jemalloc is to blame.  If not on amd64, given enough hand holding re:
hardware acquisition and configuration I can probably be convinced to
set up an ARM system.

Thanks,
Jason


