From owner-freebsd-current@freebsd.org Fri Sep 13 18:13:08 2019 Return-Path: Delivered-To: freebsd-current@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 6D8EBF7458 for ; Fri, 13 Sep 2019 18:13:08 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com [IPv6:2607:f8b0:4864:20::843]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 46VNzS2Mzgz3xqP; Fri, 13 Sep 2019 18:13:08 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: by mail-qt1-x843.google.com with SMTP id g16so1413199qto.9; Fri, 13 Sep 2019 11:13:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=R0jxhCKH/ykKWRzx1nsZPZVQKqlCqmbFf6nq9D07Rvc=; b=rIKvbvDmFxIpKY1c3bg8XRfFW7LHSFcr+COjLUbZoUd5/tXQMmOD0q3BudZD3vz1Pj GaS0t93yplaY9AD8V7pV7nq5gNx5DuYChYwryHHpAh1it0mx8bcbYWhWHG81WLyCB0nM P/xWG2+OoyfYUXFkMg7Wv4qAd/tnzF2wZ2grnj2igTntLDOHCyyl+GgXGqe9AcupSN73 qakFqWVHvPJLHW2BcTP1nv8fzM5AViB7FcLj/yslhy0ZDOE+vz6yngrXTMT7Qch4mx68 rxMvyDpDtpAez+ONkNFxtSSkcvqCZiHL5Zgq5nTe2Er3IF/oUw2IRAEAERupGt9JOJRX kNww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=R0jxhCKH/ykKWRzx1nsZPZVQKqlCqmbFf6nq9D07Rvc=; b=FaPGw8ip4LF9csKuN55RrpgAJRIwYrqBaXw9nevvrXwl9o2DkGQ4jGzGji6i3yKb8u Q1LOm696XVRguAm0f3ILy10TYBI5a9J9WUuh+83wAAdUvMUeIoplL3/Xn6K//1yegA5i DhqX6wWD3aeoVFW68moDfug87nA1NFuQqp/GrF0KEH2KZe33qWeGUGDN63vYF07XuXfM lKLqqcdS2ns4zOJR198O2WMKWKk4mq/nULigqKdANyJp6D0i5Hx8BcY4NcvXHN3XjXR4 0JUJ+FmHYRmNgYSiH3+tz3RSR8TcB+XJPdA+Entasg4HSRiAZezWHH6STwvdd1sPdvmy k+3A== X-Gm-Message-State: APjAAAVilOikH1XBiQJGRwDJbdgN+tfBeOwm46HLlgQjd3FqLN+XYTd6 tnFfGGb64SDAMS3c5oYX/5s9xujAs5hHwD0//6/mVwUnoWk= X-Google-Smtp-Source: APXvYqzL2HOnOw5TETrgwQE87xHs5U1IuVhM76OT4tgZ1ELif7X1E7TRuFBNbjyaEhibUs2koJIqtAMGa9pwLX/roQk= X-Received: by 2002:ac8:845:: with SMTP id x5mr4477819qth.42.1568398386792; Fri, 13 Sep 2019 11:13:06 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ryan Stone Date: Fri, 13 Sep 2019 14:12:56 -0400 Message-ID: Subject: Re: Deadlock involving truss -f, pdfork() and wait4() To: Mariusz Zaborski Cc: FreeBSD Current Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 46VNzS2Mzgz3xqP X-Spamd-Bar: ----- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-6.00 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-0.999,0]; REPLY(-4.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Sep 2019 18:13:08 -0000 This gets me a little further but now the wait4 call by the parent never reaps the child and instead blocks forever: # truss -f ./pdfork -p 706: mmap(0x0,131072,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34361 970688 (0x800221000) 706: issetugid() = 0 (0x0) 706: open("/etc/libmap.conf",O_RDONLY|O_CLOEXEC,010440020) = 3 (0x3) 706: fstat(3,{ mode=-rw-r--r-- ,inode=241058,size=47,blksize=32768 }) = 0 (0x0 ) 706: read(3,"# $FreeBSD$\nincludedir /usr/loc"...,47) = 47 (0x2f) 706: close(3) = 0 (0x0) 706: open("/usr/local/etc/libmap.d",O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC, 0165) ERR#2 'No such file or directory' 706: open("/var/run/ld-elf.so.hints",O_RDONLY|O_CLOEXEC,010002355) = 3 (0x3) 706: read(3,"Ehnt\^A\0\0\0\M^@\0\0\0A\0\0\0\0"...,128) = 128 (0x80) 706: fstat(3,{ mode=-r--r--r-- ,inode=72,size=193,blksize=4096 }) = 0 (0x0) 706: pread(3,"/lib:/usr/lib:/usr/lib/compat:/u"...,65,0x80) = 65 (0x41) 706: close(3) = 0 (0x0) 706: open("/lib/libc.so.7",O_RDONLY|O_CLOEXEC|O_VERIFY,057400) = 3 (0x3) 706: fstat(3,{ mode=-r--r--r-- ,inode=402754,size=1915744,blksize=32768 }) = 0 (0x0) 706: mmap(0x0,4096,PROT_READ,MAP_PRIVATE|MAP_PREFAULT_READ,3,0x0) = 3436210176 0 (0x800241000) 706: mmap(0x0,4169728,PROT_NONE,MAP_GUARD,-1,0x0) = 34362105856 (0x800242000) 706: mmap(0x800242000,524288,PROT_READ,MAP_PRIVATE|MAP_FIXED|MAP_NOCORE|MAP_PR EFAULT_READ,3,0x0) = 34362105856 (0x800242000) 706: mmap(0x8002c2000,1327104,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_FIXED|MAP_NO CORE|MAP_PREFAULT_READ,3,0x80000) = 34362630144 (0x8002c2000) 706: mmap(0x800406000,61440,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_PRE FAULT_READ,3,0x1c4000) = 34363957248 (0x800406000) 706: mmap(0x800415000,2256896,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_A NON,-1,0x0) = 34364018688 (0x800415000) 706: munmap(0x800241000,4096) = 0 (0x0) 706: close(3) = 0 (0x0) 706: mprotect(0x80040c000,36864,PROT_READ) = 0 (0x0) 706: sysarch(AMD64_SET_FSBASE,0x7fffffffe110) = 0 (0x0) 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 706: mprotect(0x80040c000,36864,PROT_READ|PROT_WRITE) = 0 (0x0) 706: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 706: mprotect(0x80040c000,36864,PROT_READ) = 0 (0x0) 706: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 706: readlink("/etc/malloc.conf",0x7fffffffd830,1024) ERR#2 'No such file or d irectory' 706: issetugid() = 0 (0x0) 706: __sysctl(0x7fffffffd7e0,0x2,0x7fffffffd7dc,0x7fffffffd7d0,0x0,0x0) = 0 (0 x0) 706: mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(21 ),-1,0x0) = 34368126976 (0x800800000) 706: cap_getmode({ 0 }) = 0 (0x0) 706: open("/dev/hpet0",O_RDONLY,00) = 3 (0x3) 706: mmap(0x0,4096,PROT_READ,MAP_SHARED,3,0x0) = 34362101760 (0x800241000) 706: close(3) = 0 (0x0) 706: mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12), -1,0x0) = 34366275584 (0x80063c000) 706: mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(21 ),-1,0x0) = 34370224128 (0x800a00000) 706: mmap(0x0,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12),- 1,0x0) = 34366308352 (0x800644000) 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 706: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 706: mprotect(0x203000,4096,PROT_READ) = 0 (0x0) 706: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 706: pdfork(0x7fffffffeba8,0x0) = 708 (0x2c4) 708: 708: nanosleep({ 1.000000000 }) = 0 (0x0) 708: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 708: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 708: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 708: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 708: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 708: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) load: 0.27 cmd: pdfork 706 [wait] 18.20r 0.00u 0.00s 0% 2072k # ps PID TT STAT TIME COMMAND 698 u0 Is 0:00.01 login [pam] (login) 700 u0 I 0:00.04 -sh (sh) 705 u0 I+ 0:00.10 truss -f ./pdfork -p 706 u0 IX+ 0:00.01 ./pdfork -p 708 u0 Z+ 0:00.00 714 0 S 0:00.01 su 715 0 S 0:00.01 su (sh) 716 0 R+ 0:00.00 ps # procstat -kk 708 PID TID COMM TDNAME KSTACK # procstat -kk 706 PID TID COMM TDNAME KSTACK 706 100095 pdfork - mi_switch+0x174 sleepq_switch+0x110 sleepq_catch_signals+0x417 slee pq_wait_sig+0xf _sleep+0x2d0 kern_wait6+0x48f sys_wait4+0x78 amd64_syscall+0x337 fast_syscall_common+0x101 # procstat -kk 705 PID TID COMM TDNAME KSTACK 705 100077 truss - mi_switch+0x174 sleepq_switch+0x110 sleepq_catch_signals+0x417 slee pq_wait_sig+0xf _sleep+0x2d0 kern_wait6+0x48f sys_wait6+0x9f amd64_syscall+0x337 fast_syscall_common+0x101 On Fri, Sep 13, 2019 at 10:05 AM Mariusz Zaborski wrote: > > Hello Ryan, > > Can you verify is this patch fix your issue: > https://reviews.freebsd.org/D20362 > > Thanks, > Mariusz > > On Thu, 12 Sep 2019 at 21:37, Ryan Stone wrote: > > > > I've hit an issue with a simple use of pdfork(). I have a process > > that calls pdfork() and the parent immediately does a wait4() on the > > child pid. This works fine under normal conditions, but if the parent > > is run under truss -f, the three processes deadlock. If I switch out > > pdfork() for fork(), the deadlock does not occur. > > > > This C file demonstrates the issue: > > > > https://people.freebsd.org/~rstone/pdfork.c > > > > If I run "truss -f ./pdfork", which uses fork(), it completes within a > > second. If I run "truss -f ./pdfork -p", which uses pdfork(), the > > processes deadlock. If I run "./pdfork -p" without truss, it > > completes normally. > > > > procstat reports the following kernel stacks: > > > > 27572 102043 truss - mi_switch+0xe2 > > sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf > > kern_wait6+0x695 sys_wait6+0x9f amd64_syscall+0x36e > > fast_syscall_common+0x101 > > 27573 102469 pdfork - mi_switch+0xe2 > > sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf > > kern_wait6+0x695 sys_wait4+0x78 amd64_syscall+0x36e > > fast_syscall_common+0x101 > > 27574 102053 pdfork - mi_switch+0xe2 > > thread_suspend_switch+0xd4 ptracestop+0x13b fork_return+0x14e > > fork_exit+0x83 fork_trampoline+0xe > > > > As near as I can tell, truss is blocked waiting for ptrace events, the > > parent process is blocked in wait4, and the child process is perhaps > > waiting for its parent to exit the kernel so it can send the ptrace > > event? > > > > I really don't see anything obvious in the pdfork() code path that > > would cause this to happen when fork() doesn't have the problem. It > > may be that pdfork() just changes the timing enough to expose a latent > > bug. > > > > I'm seeing this on a recentish current (r351363). > > _______________________________________________ > > freebsd-current@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-current > > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"