Date:      Thu, 27 Dec 2018 22:45:18 +0000
From:      bugzilla-noreply@freebsd.org
To:        testing@freebsd.org
Subject:   [Bug 233646] Flakey test case: bin.sh.builtins.functional_test.kill1
Message-ID:  <bug-233646-32464-oabj91MzZ1@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-233646-32464@https.bugs.freebsd.org/bugzilla/>
References:  <bug-233646-32464@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233646

Jilles Tjoelker <jilles@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|New                         |Open

--- Comment #3 from Jilles Tjoelker <jilles@FreeBSD.org> ---
In the below text, wait(2) means any wait system call; sh(1) uses wait3() which
appears as wait4() in ktrace.

The test case is meant to test that a terminated, wait(2)ed for but not
wait(1)ed for job can be passed to kill(1) without error (the command will do
nothing). The part with the second background job, p2 and wait is intended to
wait for the first background job to terminate and be wait(2)ed for, without
taking excessive time or wait(1)ing for it (which would make the %1
specification invalid). If the first background job is slow to terminate, the
kill command will do something but this is harmless. If the first background
job terminates but the kernel has not returned it yet via wait(2), the kill
command will kill a zombie, which per POSIX succeeds and does nothing.

I noticed that the problem is quickly reproduced on head using a loop like
  while sh builtins/kill1.0; do :; done
using head's sh as well as stable/11's sh, while it can run for quite a while
on stable/11 using stable/11's sh as well as head's sh built against stable/11.
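
A bounded variant of that loop, which also reports how many runs passed before
the first failure, might look like the sketch below. The function name
run_until_fail and the iteration cap are illustrative assumptions, not part of
the test suite.

```shell
# Hypothetical helper: run a command up to $1 times and stop at the first
# failure, reporting how many runs succeeded before it.
run_until_fail() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        if ! "$@"; then
            echo "failed after $i successful runs"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max runs"
}
```

For example, "run_until_fail 10000 sh builtins/kill1.0" stops at the first
failing run and prints the count of passes that preceded it.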

Reproducing with ktrace -i seems hard, but reproducing with plain ktrace works.
The below ktrace extract seems to indicate that the kernel is at fault,
returning an [ESRCH] error for killing a zombie:

 19837 sh       CALL  fork
 19837 sh       RET   fork 19838/0x4d7e
 19837 sh       CALL  wait4(0xffffffff,0x7fffffffe91c,0x1<WNOHANG>,0)
 19837 sh       RET   wait4 0
 19837 sh       CALL  fork
 19837 sh       RET   fork 19839/0x4d7f
 19837 sh       CALL  sigprocmask(SIG_BLOCK,0x7fffffffe820,0x7fffffffe810)
 19837 sh       RET   sigprocmask 0
 19837 sh       CALL  sigaction(SIGCHLD,0x7fffffffe850,0x7fffffffe830)
 19837 sh       RET   sigaction 0
 19837 sh       CALL  wait4(0xffffffff,0x7fffffffe80c,0x1<WNOHANG>,0)
 19837 sh       RET   wait4 19839/0x4d7f
 19837 sh       CALL  sigaction(SIGCHLD,0x7fffffffe830,0)
 19837 sh       RET   sigaction 0
 19837 sh       CALL  sigprocmask(SIG_SETMASK,0x7fffffffe810,0)
 19837 sh       RET   sigprocmask 0
 19837 sh       CALL  kill(0x4d7e,SIGTERM)
 19837 sh       RET   kill -1 errno 3 No such process

Process ID 19838 (0x4d7e) has not been returned by a wait4() call, so it must
either be still running or a zombie. In either case, a kill() on it must succeed.

It appears that there is no test that specifically verifies that killing a
zombie process succeeds.

-- 
You are receiving this mail because:
You are the assignee for the bug.


