Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 13 Sep 2019 14:12:56 -0400
From:      Ryan Stone <rysto32@gmail.com>
To:        Mariusz Zaborski <oshogbo@freebsd.org>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: Deadlock involving truss -f, pdfork() and wait4()
Message-ID:  <CAFMmRNzhsk-1BO6iyJz7_qtgnRqC%2BfPpYSuQnxEZyno%2BO1K71Q@mail.gmail.com>
In-Reply-To: <CAGOYWV-v1-F3dWQovdj5TZMZeBFifDm020AXnPYdZSHPKLG4SA@mail.gmail.com>
References:  <CAFMmRNwRWZsQhMSY7MwS2kU-qNJ0%2BPk676mi59_Y3QsbcEmPVQ@mail.gmail.com> <CAGOYWV-v1-F3dWQovdj5TZMZeBFifDm020AXnPYdZSHPKLG4SA@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
This gets me a little further but now the wait4 call by the parent
never reaps the child and instead blocks forever:

# truss -f ./pdfork -p
 706: mmap(0x0,131072,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34361
970688 (0x800221000)
 706: issetugid()                               = 0 (0x0)
 706: open("/etc/libmap.conf",O_RDONLY|O_CLOEXEC,010440020) = 3 (0x3)
 706: fstat(3,{ mode=-rw-r--r-- ,inode=241058,size=47,blksize=32768 }) = 0 (0x0
)
 706: read(3,"# $FreeBSD$\nincludedir /usr/loc"...,47) = 47 (0x2f)
 706: close(3)                                  = 0 (0x0)
 706: open("/usr/local/etc/libmap.d",O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC,
0165) ERR#2 'No such file or directory'
 706: open("/var/run/ld-elf.so.hints",O_RDONLY|O_CLOEXEC,010002355) = 3 (0x3)
 706: read(3,"Ehnt\^A\0\0\0\M^@\0\0\0A\0\0\0\0"...,128) = 128 (0x80)
 706: fstat(3,{ mode=-r--r--r-- ,inode=72,size=193,blksize=4096 }) = 0 (0x0)
 706: pread(3,"/lib:/usr/lib:/usr/lib/compat:/u"...,65,0x80) = 65 (0x41)
 706: close(3)                                  = 0 (0x0)
 706: open("/lib/libc.so.7",O_RDONLY|O_CLOEXEC|O_VERIFY,057400) = 3 (0x3)
 706: fstat(3,{ mode=-r--r--r-- ,inode=402754,size=1915744,blksize=32768 }) = 0
(0x0)
 706: mmap(0x0,4096,PROT_READ,MAP_PRIVATE|MAP_PREFAULT_READ,3,0x0) = 3436210176
0 (0x800241000)
 706: mmap(0x0,4169728,PROT_NONE,MAP_GUARD,-1,0x0) = 34362105856 (0x800242000)
 706: mmap(0x800242000,524288,PROT_READ,MAP_PRIVATE|MAP_FIXED|MAP_NOCORE|MAP_PR
EFAULT_READ,3,0x0) = 34362105856 (0x800242000)
 706: mmap(0x8002c2000,1327104,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_FIXED|MAP_NO
CORE|MAP_PREFAULT_READ,3,0x80000) = 34362630144 (0x8002c2000)
 706: mmap(0x800406000,61440,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_PRE
FAULT_READ,3,0x1c4000) = 34363957248 (0x800406000)
 706: mmap(0x800415000,2256896,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_A
NON,-1,0x0) = 34364018688 (0x800415000)
 706: munmap(0x800241000,4096)                  = 0 (0x0)
 706: close(3)                                  = 0 (0x0)
 706: mprotect(0x80040c000,36864,PROT_READ)     = 0 (0x0)
 706: sysarch(AMD64_SET_FSBASE,0x7fffffffe110)  = 0 (0x0)
 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG
TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS
Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
 706: mprotect(0x80040c000,36864,PROT_READ|PROT_WRITE) = 0 (0x0)
 706: sigprocmask(SIG_SETMASK,{ },0x0)          = 0 (0x0)
 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG
TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS
Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
 706: mprotect(0x80040c000,36864,PROT_READ)     = 0 (0x0)
 706: sigprocmask(SIG_SETMASK,{ },0x0)          = 0 (0x0)
 706: readlink("/etc/malloc.conf",0x7fffffffd830,1024) ERR#2 'No such file or d
irectory'
 706: issetugid()                               = 0 (0x0)
 706: __sysctl(0x7fffffffd7e0,0x2,0x7fffffffd7dc,0x7fffffffd7d0,0x0,0x0) = 0 (0
x0)
 706: mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(21
),-1,0x0) = 34368126976 (0x800800000)
 706: cap_getmode({ 0 })                        = 0 (0x0)
 706: open("/dev/hpet0",O_RDONLY,00)            = 3 (0x3)
 706: mmap(0x0,4096,PROT_READ,MAP_SHARED,3,0x0) = 34362101760 (0x800241000)
 706: close(3)                                  = 0 (0x0)
 706: mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12),
-1,0x0) = 34366275584 (0x80063c000)
 706: mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(21
),-1,0x0) = 34370224128 (0x800a00000)
 706: mmap(0x0,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12),-
1,0x0) = 34366308352 (0x800644000)
 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG
TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS
Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
 706: sigprocmask(SIG_SETMASK,{ },0x0)          = 0 (0x0)
 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG
TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS
Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
 706: mprotect(0x203000,4096,PROT_READ)         = 0 (0x0)
 706: sigprocmask(SIG_SETMASK,{ },0x0)          = 0 (0x0)
 706: pdfork(0x7fffffffeba8,0x0)                = 708 (0x2c4)
 708: <new process>
 708: nanosleep({ 1.000000000 })                = 0 (0x0)
 708: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG
TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS
Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
 708: sigprocmask(SIG_SETMASK,{ },0x0)          = 0 (0x0)
 708: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG
TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS
Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
 708: sigprocmask(SIG_SETMASK,{ },0x0)          = 0 (0x0)
 708: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG
TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS
Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
 708: sigprocmask(SIG_SETMASK,{ },0x0)          = 0 (0x0)
load: 0.27  cmd: pdfork 706 [wait] 18.20r 0.00u 0.00s 0% 2072k

# ps
PID TT  STAT    TIME COMMAND
698 u0  Is   0:00.01 login [pam] (login)
700 u0  I    0:00.04 -sh (sh)
705 u0  I+   0:00.10 truss -f ./pdfork -p
706 u0  IX+  0:00.01 ./pdfork -p
708 u0  Z+   0:00.00 <defunct>
714  0  S    0:00.01 su
715  0  S    0:00.01 su (sh)
716  0  R+   0:00.00 ps
# procstat -kk 708
 PID    TID COMM                TDNAME              KSTACK
# procstat -kk 706
 PID    TID COMM                TDNAME              KSTACK
 706 100095 pdfork              -                   mi_switch+0x174
sleepq_switch+0x110 sleepq_catch_signals+0x417 slee
pq_wait_sig+0xf _sleep+0x2d0 kern_wait6+0x48f sys_wait4+0x78
amd64_syscall+0x337 fast_syscall_common+0x101
# procstat -kk 705
 PID    TID COMM                TDNAME              KSTACK
 705 100077 truss               -                   mi_switch+0x174
sleepq_switch+0x110 sleepq_catch_signals+0x417 slee
pq_wait_sig+0xf _sleep+0x2d0 kern_wait6+0x48f sys_wait6+0x9f
amd64_syscall+0x337 fast_syscall_common+0x101

On Fri, Sep 13, 2019 at 10:05 AM Mariusz Zaborski <oshogbo@freebsd.org> wrote:
>
> Hello Ryan,
>
> Can you verify is this patch fix your issue:
> https://reviews.freebsd.org/D20362
>
> Thanks,
> Mariusz
>
> On Thu, 12 Sep 2019 at 21:37, Ryan Stone <rysto32@gmail.com> wrote:
> >
> > I've hit an issue with a simple use of pdfork().  I have a process
> > that calls pdfork() and the parent immediately does a wait4() on the
> > child pid.  This works fine under normal conditions, but if the parent
> > is run under truss -f, the three processes deadlock.  If I switch out
> > pdfork() for fork(), the deadlock does not occur.
> >
> > This C file demonstrates the issue:
> >
> > https://people.freebsd.org/~rstone/pdfork.c
> >
> > If I run "truss -f ./pdfork", which uses fork(), it completes within a
> > second.  If I run "truss -f ./pdfork -p", which uses pdfork(), the
> > processes deadlock.  If I run "./pdfork -p" without truss, it
> > completes normally.
> >
> > procstat reports the following kernel stacks:
> >
> > 27572 102043 truss               -                   mi_switch+0xe2
> > sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf
> > kern_wait6+0x695 sys_wait6+0x9f amd64_syscall+0x36e
> > fast_syscall_common+0x101
> > 27573 102469 pdfork              -                   mi_switch+0xe2
> > sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf
> > kern_wait6+0x695 sys_wait4+0x78 amd64_syscall+0x36e
> > fast_syscall_common+0x101
> > 27574 102053 pdfork              -                   mi_switch+0xe2
> > thread_suspend_switch+0xd4 ptracestop+0x13b fork_return+0x14e
> > fork_exit+0x83 fork_trampoline+0xe
> >
> > As near as I can tell, truss is blocked waiting for ptrace events, the
> > parent process is blocked in wait4, and the child process is perhaps
> > waiting for its parent to exit the kernel so it can send the ptrace
> > event?
> >
> > I really don't see anything obvious in the pdfork() code path that
> > would cause this to happen when fork() doesn't have the problem.  It
> > may be that pdfork() just changes the timing enough to expose a latent
> > bug.
> >
> > I'm seeing this on a recentish current (r351363).
> > _______________________________________________
> > freebsd-current@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAFMmRNzhsk-1BO6iyJz7_qtgnRqC%2BfPpYSuQnxEZyno%2BO1K71Q>