Date:      Sun, 19 Feb 2017 22:20:10 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 217239] head (e.g.:) -r313864 arm64 vs. jemalloc without MALLOC_PRODUCTION: various examples of tbin->avail being zero lead to SIGSEGV's
Message-ID:  <bug-217239-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217239

            Bug ID: 217239
           Summary: head (e.g.:) -r313864 arm64 vs. jemalloc without
                    MALLOC_PRODUCTION: various examples of tbin->avail
                    being zero lead to SIGSEGV's
           Product: Base System
           Version: CURRENT
          Hardware: arm64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: markmi@dsl-only.net

Now that the fork trampoline for arm64 no longer allows
interrupts to mess up the stack pointer, things run longer
and other issues show up. This report is for a buildworld
done without MALLOC_PRODUCTION defined. The kernel
build is production style.

[I've not tried the contrasting case of having
MALLOC_PRODUCTION defined. I'll also note that
I tried powerpc64 and had no problems without
MALLOC_PRODUCTION: this seems to be arm64 (aarch64)
specific.]

I've accumulated examples of each of the following
getting SIGSEGV in jemalloc code and producing core
files:

script
powerd
su

(Note: I'm primarily building things from the console,
so the variety of activity is fairly limited.)

From register values it appears that in each case
tbin->avail == 0 and the calculations subtract a
positive number from that.
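
For illustration, here is a minimal stand-alone sketch of that
arithmetic (my own simplification with a made-up struct, not the
real jemalloc types): with avail == 0 the store in the dalloc
paths below targets an unmapped address just below the top of
the address space, hence the SIGSEGV.

#include <stdio.h>

/* Hypothetical, stripped-down stand-in for jemalloc's tcache_bin_t:
 * only the two fields involved in the faulting statement. */
struct tbin_sketch {
	unsigned   ncached;  /* number of cached objects */
	void     **avail;    /* stack of available object pointers */
};

int
main(void)
{
	struct tbin_sketch tbin = { 0, (void **)0 };  /* avail observed as 0 */

	tbin.ncached++;
	/* Same shape as "*(tbin->avail - tbin->ncached) = ptr;": with
	 * avail == 0 and ncached == 1 this computes 0 - sizeof(void *),
	 * i.e. 0xfffffffffffffff8 on a 64-bit machine, which is unmapped,
	 * so the real store faults.  (Pointer arithmetic on a null
	 * pointer is formally undefined; this only shows the address.) */
	printf("store target: %p\n", (void *)(tbin.avail - tbin.ncached));
	return 0;
}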

All the script examples look like (e.g.):

(lldb) bt
* thread #1: tid = 100143, 0x00000000404e9f08
libc.so.7`__je_tcache_dalloc_large(tsd=0x00000000405fd010,
tcache=0x0000000040a0d000, ptr=0x0000000040a25600, size=<unavailable>,
slow_path=<unavailable>) + 228 at tcache.h:451, name = 'script', stop reason =
signal SIGSEGV
  * frame #0: 0x00000000404e9f08
libc.so.7`__je_tcache_dalloc_large(tsd=0x00000000405fd010,
tcache=0x0000000040a0d000, ptr=0x0000000040a25600, size=<unavailable>,
slow_path=<unavailable>) + 228 at tcache.h:451
    frame #1: 0x0000000040510cfc libc.so.7`__free(ptr=0x0000000040a25600) + 124
at jemalloc_jemalloc.c:2016
    frame #2: 0x000000004058c5d8 libc.so.7`cleanfile(fp=0x00000000405e4cf0,
c=<unavailable>) + 96 at fclose.c:62
    frame #3: 0x000000004058c69c libc.so.7`fclose(fp=0x00000000405e4cf0) + 60
at fclose.c:134
    frame #4: 0x000000000040255c script`done(eno=0) + 268 at script.c:375
    frame #5: 0x000000000040218c script`main [inlined] finish + 2772 at
script.c:323
    frame #6: 0x0000000000402154 script`main(argc=<unavailable>,
argv=<unavailable>) + 2716 at script.c:299
    frame #7: 0x0000000000401610 script`__start + 360
    frame #8: 0x0000000040414658 ld-elf.so.1`.rtld_start + 24 at
rtld_start.S:41
(lldb) down
frame #0: 0x00000000404e9f08
libc.so.7`__je_tcache_dalloc_large(tsd=0x00000000405fd010,
tcache=0x0000000040a0d000, ptr=0x0000000040a25600, size=<unavailable>,
slow_path=<unavailable>) + 228 at tcache.h:451
   448          }
   449          assert(tbin->ncached < tbin_info->ncached_max);
   450          tbin->ncached++;
-> 451          *(tbin->avail - tbin->ncached) = ptr;
   452
   453          tcache_event(tsd, tcache);
   454  }

They are from long-running builds with lots of output
in the generated typescript, and the crash happens during
the cleanup at the end.

All the powerd examples look like (e.g.):

(lldb) bt
* thread #1: tid = 100099, 0x00000000404eaa10
libc.so.7`__je_tcache_dalloc_small(tsd=0x00000000405fe010,
tcache=0x0000000040a0d000, ptr=0x0000000040a1e000, binind=2,
slow_path=<unavailable>) + 164 at tcache.h:421, name = 'powerd', stop reason =
signal SIGSEGV
  * frame #0: 0x00000000404eaa10
libc.so.7`__je_tcache_dalloc_small(tsd=0x00000000405fe010,
tcache=0x0000000040a0d000, ptr=0x0000000040a1e000, binind=2,
slow_path=<unavailable>) + 164 at tcache.h:421
    frame #1: 0x0000000040511cfc libc.so.7`__free(ptr=0x0000000040a1e000) + 124
at jemalloc_jemalloc.c:2016
    frame #2: 0x000000000040201c powerd`main(argc=<unavailable>,
argv=<unavailable>) + 3332 at powerd.c:786
    frame #3: 0x0000000000401270 powerd`__start + 360
    frame #4: 0x0000000040415658 ld-elf.so.1`.rtld_start + 24 at
rtld_start.S:41
(lldb) down
frame #0: 0x00000000404eaa10
libc.so.7`__je_tcache_dalloc_small(tsd=0x00000000405fe010,
tcache=0x0000000040a0d000, ptr=0x0000000040a1e000, binind=2,
slow_path=<unavailable>) + 164 at tcache.h:421
   418          }
   419          assert(tbin->ncached < tbin_info->ncached_max);
   420          tbin->ncached++;
-> 421          *(tbin->avail - tbin->ncached) = ptr;
   422
   423          tcache_event(tsd, tcache);
   424  }

So very similar to the script failures: these are during the
cleanup at the end (but dalloc large vs. small).

All the su examples look like (e.g.):

(lldb) bt
* thread #1: tid = 100156, 0x000000004054b1dc
libc.so.7`__je_arena_tcache_fill_small(tsdn=<unavailable>, arena=<unavailable>,
tbin=<unavailable>, binind=<unavailable>, prof_accumbytes=<unavailable>) + 212
at jemalloc_arena.c:2442, name = 'su', stop reason = signal SIGSEGV
  * frame #0: 0x000000004054b1dc
libc.so.7`__je_arena_tcache_fill_small(tsdn=<unavailable>, arena=<unavailable>,
tbin=<unavailable>, binind=<unavailable>, prof_accumbytes=<unavailable>) + 212
at jemalloc_arena.c:2442
    frame #1: 0x000000004052e5a0 libc.so.7`__je_tcache_alloc_small [inlined]
__je_tcache_alloc_small_hard(tsdn=<unavailable>, arena=0x0000000040800140,
tbin=0x0000000040a0d0a8, binind=4) + 20 at jemalloc_tcache.c:79
    frame #2: 0x000000004052e58c
libc.so.7`__je_tcache_alloc_small(tsd=0x0000000040647010,
arena=0x0000000040800140, tcache=0x0000000040a0d000, size=<unavailable>,
binind=4, zero=false, slow_path=true) + 332 at tcache.h:298
    frame #3: 0x0000000040555184 libc.so.7`__malloc(size=1) + 184 at
jemalloc_jemalloc.c:1645
    frame #4: 0x000000004046979c
libpam.so.6`openpam_vasprintf(str=0x0000ffffffffe520, fmt="", ap=<unavailable>)
+ 92 at openpam_vasprintf.c:53
    frame #5: 0x0000000040469714
libpam.so.6`openpam_asprintf(str=<unavailable>, fmt=<unavailable>) + 120 at
openpam_asprintf.c:52
    frame #6: 0x000000004046960c libpam.so.6`_openpam_log(level=<unavailable>,
func="", fmt="") + 224 at openpam_log.c:125
    frame #7: 0x0000000040466914
libpam.so.6`openpam_dispatch(pamh=<unavailable>, primitive=<unavailable>,
flags=<unavailable>) + 1256 at openpam_dispatch.c:182
    frame #8: 0x0000000040463b54
libpam.so.6`pam_setcred(pamh=0x0000000040a44000, flags=2) + 112 at
pam_setcred.c:66
    frame #9: 0x0000000040b77730 su`main(argc=<unavailable>,
argv=<unavailable>) + 2280 at su.c:475
    frame #10: 0x0000000040b76da0 su`__start + 360
    frame #11: 0x0000000040415658 ld-elf.so.1`.rtld_start + 24 at
rtld_start.S:41
(lldb) down
frame #0: 0x000000004054b1dc
libc.so.7`__je_arena_tcache_fill_small(tsdn=<unavailable>, arena=<unavailable>,
tbin=<unavailable>, binind=<unavailable>, prof_accumbytes=<unavailable>) + 212
at jemalloc_arena.c:2442
   2439                             true);
   2440                 }
   2441                 /* Insert such that low regions get used first. */
-> 2442                 *(tbin->avail - nfill + i) = ptr;
   2443         }
   2444         if (config_stats) {
   2445                 bin->stats.nmalloc += i;

So not as close a match, but this is also during cleanup (of the
parent process of the fork), during PAM_END() before exit.
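
For completeness, the analogous sketch for the fill path at
jemalloc_arena.c:2442 (again my own simplification; the nfill and
i values are made up): with tbin->avail == 0 the computed store
target is likewise an unmapped high address.

#include <stdio.h>

int
main(void)
{
	void **avail = 0;           /* tbin->avail observed as 0 */
	unsigned nfill = 8, i = 0;  /* hypothetical fill-loop values */

	/* Same shape as "*(tbin->avail - nfill + i) = ptr;": with
	 * avail == 0 this is 0 - 8*sizeof(void *), i.e.
	 * 0xffffffffffffffc0 on a 64-bit machine, again unmapped,
	 * so the real store gets SIGSEGV. */
	printf("store target: %p\n", (void *)(avail - nfill + i));
	return 0;
}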

See also bugzilla 217138.
