Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 9 Mar 2017 15:47:32 -0800
From:      Bryan Drewery <bdrewery@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        current@FreeBSD.org
Subject:   Re: r314708: panic: tdsendsignal: ksi on queue
Message-ID:  <5e56d3d6-9e92-1b50-f720-ff16c58b74dd@FreeBSD.org>
In-Reply-To: <20170309231151.GA49720@stack.nl>
References:  <d510a9da-8293-ba22-a1e6-75b3ea7ffa1d@FreeBSD.org> <20170309144646.GB16105@kib.kiev.ua> <20170309231151.GA49720@stack.nl>

next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--AhJqqOKViOhgPgqtrJdCTlNNlw6FLnGlg
Content-Type: multipart/mixed; boundary="LpE6BXWECbPK2gqQEwJLJ73iWeEc1UPF5";
 protected-headers="v1"
From: Bryan Drewery <bdrewery@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: current@FreeBSD.org
Message-ID: <5e56d3d6-9e92-1b50-f720-ff16c58b74dd@FreeBSD.org>
Subject: Re: r314708: panic: tdsendsignal: ksi on queue
References: <d510a9da-8293-ba22-a1e6-75b3ea7ffa1d@FreeBSD.org>
 <20170309144646.GB16105@kib.kiev.ua> <20170309231151.GA49720@stack.nl>
In-Reply-To: <20170309231151.GA49720@stack.nl>

--LpE6BXWECbPK2gqQEwJLJ73iWeEc1UPF5
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 3/9/2017 3:11 PM, Jilles Tjoelker wrote:
> On Thu, Mar 09, 2017 at 04:46:46PM +0200, Konstantin Belousov wrote:
>> Yes, there is a race, apparently, with the child zombie still not fini=
shing
>> sending the SIGCHLD to the parent and parent exiting.  The following s=
hould
>> fix the issue, but I do not think that reproducing the problem is easy=
=2E
>=20
>> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
>> index c524fe5df37..ba5ff84e9de 100644
>> --- a/sys/kern/kern_exit.c
>> +++ b/sys/kern/kern_exit.c
>> @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
>>  {
>>  	struct proc *p, *nq, *q, *t;
>>  	struct thread *tdt;
>> +	ksiginfo_t ksi;
>> =20
>>  	mtx_assert(&Giant, MA_NOTOWNED);
>>  	KASSERT(rval =3D=3D 0 || signo =3D=3D 0, ("exit1 rv %d sig %d", rval=
, signo));
>> @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
>>  			proc_reparent(q, q->p_reaper);
>>  			if (q->p_state =3D=3D PRS_ZOMBIE) {
>>  				PROC_LOCK(q->p_reaper);
>> -				pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
>> +				if (q->p_ksi !=3D NULL) {
>> +					ksiginfo_init(&ksi);
>> +					ksiginfo_copy(q->p_ksi, &ksi);
>> +				}
>> +				pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=3D
>> +				    NULL ? &ksi : NULL);
>>  				PROC_UNLOCK(q->p_reaper);
>>  			}
>>  		} else {

I just got something weird with this patch that wasn't happening before:

/usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j
exp-10amd64 -p commit -z test devel/ccache
[poudriere runs and completes with exit status 0]
> time: command terminated abnormally                                    =
                    =20
>        28.08 real         9.92 user        10.38 sys                   =
                    =20
>      23464  maximum resident set size                                  =
                    =20
>       4996  average shared memory size                                 =
                    =20
>         88  average unshared data size                                 =
                    =20
>        127  average unshared stack size                                =
                    =20
>     282705  page reclaims                                              =
                    =20
>       5623  page faults                                                =
                    =20
>          0  swaps                                                      =
                    =20
>       2673  block input operations                                     =
                    =20
>       4836  block output operations                                    =
                    =20
>         33  messages sent                                              =
                    =20
>          0  messages received                                          =
                    =20
>         37  signals received                                           =
                    =20
>      11226  voluntary context switches                                 =
                    =20
>        780  involuntary context switches                               =
                    =20
> zsh: alarm      /usr/bin/time -l src/bin/poudriere -e /usr/local/etc te=
stport -j exp-10amd64
exit status: 142 (SIGALRM).

I don't see time(1) using SIGALRM or proc reaper at all.

Rerunning it, and trying other simpler test cases, does not produce the
same result.  It may be some race unrelated to this patch, dunno.


>=20
> This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo=

> should always be the zombie's p_ksi; otherwise, the siginfo may be lost=

> if there are too many signals pending for the target process or in the
> system. If the siginfo is lost and the reaper normally passes si_pid to=

> waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
> will remain until the reaper terminates.
>=20
> Conceptually the siginfo is sent to one process at a time only, so the
> bug is an artifact of the implementation. Perhaps the piece of code
> added in r309886 can be moved or the ksiginfo can be removed from the
> parent's queue.
>=20
> If such a fix is not possible, it may be better to send a bare SIGCHLD
> (si_code is SI_KERNEL or 0, depending on how many signals are pending)
> in this situation and document that reapers must use WAIT_ANY or P_ALL.=

> (However, compared to the pre-r309886 situation they can still use
> SIGCHLD to get notified when to call waitpid() or similar.)
>=20


--=20
Regards,
Bryan Drewery


--LpE6BXWECbPK2gqQEwJLJ73iWeEc1UPF5--

--AhJqqOKViOhgPgqtrJdCTlNNlw6FLnGlg
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBAgAGBQJYwemUAAoJEDXXcbtuRpfPV1YH/2vbQvMKbamftIZe8j2ASUGF
rdSdZMsVIE+zWnCckNEYcw6aPqp4YrRUH0lJZlm6Z+R9/JU8iTDUqySVRFa3jbRB
o/cEqqb/Af3Frof50ASKk0H66+mkL23NVhkU6a/jHNbtf22BPDSjk099H/fOyyX5
T+eqaGjycS0jEs9wo6RyODItpTUplG68JRCwnESy1xgnuNhLQJGRBaqS3OIx7v0I
oR1I6wYBJQoYGb/tqEUyWNJ6Myb7LeLRfwVxhCjRJKYfipzdOdvZQb8LE/85Nott
a6i/sm9cilxF1rf97zONKf20US7W2Vn4ukce3op9BxjDwoY8cHhBfYjFIlc/fBY=
=kb7q
-----END PGP SIGNATURE-----

--AhJqqOKViOhgPgqtrJdCTlNNlw6FLnGlg--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5e56d3d6-9e92-1b50-f720-ff16c58b74dd>