Date:      Tue, 27 Apr 2021 18:41:24 +0000
From:      bugzilla-noreply@freebsd.org
To:        ports-bugs@FreeBSD.org
Subject:   [Bug 255445] lang/python 3.8/3.9 SIGSEV core dumps in libthr TrueNAS
Message-ID:  <bug-255445-7788@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255445

            Bug ID: 255445
           Summary: lang/python 3.8/3.9 SIGSEV core dumps in libthr
                    TrueNAS
           Product: Ports & Packages
           Version: Latest
          Hardware: amd64
                OS: Any
            Status: New
          Keywords: crash
          Severity: Affects Many People
          Priority: ---
         Component: Individual Port(s)
          Assignee: python@FreeBSD.org
          Reporter: yocalebo@gmail.com
             Flags: maintainer-feedback?(python@FreeBSD.org)

Seeing many TrueNAS (previously FreeNAS) users dump core on the main
middlewared process (python) starting with our version 12.0 release.

Relevant OS information:
12.2-RELEASE-p6 FreeBSD 12.2-RELEASE-p6 f2858df162b(HEAD) TRUENAS  amd64

Python versions that experience the core dump:
Python 3.8.7
Python 3.9.4

When initially researching this, I found a regression with threading and
Python 3.8 on FreeBSD and was able to resolve that particular problem by
backporting the commits
https://github.com/python/cpython/commit/4d96b4635aeff1b8ad41d41422ce808ce0b971c8
and
https://github.com/python/cpython/commit/9ad58acbe8b90b4d0f2d2e139e38bb5aa32b7fb6.

I backported those commits because all of the core dumps that I've analyzed
crash in the same spot (or very close to it). For example, here are 2
backtraces showing a NULL pointer dereference.

Core was generated by `python3.8: middlewared'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0 cond_signal_common (cond=<optimized out>) at
/truenas-releng/freenas/_BE/os/lib/libthr/thread/thr_cond.c:457
warning: Source file is more recent than executable.
 457 mp = td->mutex_obj;
 [Current thread is 1 (LWP 100733)]
 (gdb) list
 452                _sleepq_unlock(cvp);
 453                    return (0);
 454                }
 455
 456                td = _sleepq_first(sq);
 457                mp = td->mutex_obj;
 458                cvp->__has_user_waiters = _sleepq_remove(sq, td);
 459                if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
 460                    if (curthread->nwaiter_defer >= MAX_DEFER_WAITERS) {
 461                        _thr_wake_all(curthread->defer_waiters,

(gdb) p *td
Cannot access memory at address 0x0


And another one:
Core was generated by `python3.8: middlewared'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  cond_signal_common (cond=<optimized out>) at
/truenas-releng/freenas/_BE/os/lib/libthr/thread/thr_cond.c:459
warning: Source file is more recent than executable.
459             if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
[Current thread is 1 (LWP 101105)]
(gdb) list
454             }
455
456             td = _sleepq_first(sq);
457             mp = td->mutex_obj;
458             cvp->__has_user_waiters = _sleepq_remove(sq, td);
459             if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
460                     if (curthread->nwaiter_defer >= MAX_DEFER_WAITERS) {
461                             _thr_wake_all(curthread->defer_waiters,
462                                 curthread->nwaiter_defer);
463                             curthread->nwaiter_defer = 0;
(gdb) p *mp
Cannot access memory at address 0x0

I'm trying to write a program to stress-test threading (tearing down and
recreating, etc.), but I've been unsuccessful at triggering this particular
problem. The end users that have seen this core dump sometimes go a month or
more without a problem. I'm hoping someone more knowledgeable can at least
give me a pointer or help me figure this one out. I have access to a VM that
has all the relevant core dumps available, so if someone needs remote access
to it to poke around, please let me know. You can reach me at
caleb [at] ixsystems.com
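
For illustration, here is a minimal sketch of the kind of threading stress
test described above. It is an assumption about the general shape of such a
test, not the actual program used against middlewared: the names run_round
and stress are hypothetical, and nothing here is known to reproduce the
crash. Each round creates a fresh set of threads, wakes them through a
condition variable, and tears them down again.

```python
import threading


def run_round(workers):
    """Start `workers` threads, wake them via a condition, then join them.

    Hypothetical helper for illustration only. Returns how many threads
    were woken, which should equal `workers`.
    """
    cond = threading.Condition()
    state = {"go": False, "woken": 0}

    def waiter():
        with cond:
            # wait_for re-checks the predicate, so a notify sent before this
            # thread starts waiting is not lost.
            cond.wait_for(lambda: state["go"])
            state["woken"] += 1

    threads = [threading.Thread(target=waiter) for _ in range(workers)]
    for t in threads:
        t.start()
    with cond:
        state["go"] = True
        cond.notify_all()
    for t in threads:
        t.join()
    return state["woken"]


def stress(rounds=200, workers=8):
    """Tear down and recreate the set of threads `rounds` times."""
    return sum(run_round(workers) for _ in range(rounds))


if __name__ == "__main__":
    print(stress())
```

Raising rounds and workers, or running several copies in parallel, would
increase thread creation and condition signalling traffic, though whether
that is enough to hit the code path in the backtraces is unknown.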

-- 
You are receiving this mail because:
You are the assignee for the bug.


