Date:      Mon, 25 Jun 2018 02:04:32 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Alexander Motin <mav@FreeBSD.org>, Doug Rabson <dfr@rabson.org>
Subject:   Re: nfsd kernel threads won't die via SIGKILL
Message-ID:  <YTOPR0101MB095352561BFA6428ACBB19FDDD4A0@YTOPR0101MB0953.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <20180624093330.GX2430@kib.kiev.ua>
References:  <YTXPR0101MB0959B4E960B85ACAC07B819DDD740@YTXPR0101MB0959.CANPRD01.PROD.OUTLOOK.COM>, <20180624093330.GX2430@kib.kiev.ua>

Konstantin Belousov wrote:
>On Sat, Jun 23, 2018 at 09:03:02PM +0000, Rick Macklem wrote:
>> During testing of the pNFS server I have been frequently killing/restarting the nfsd.
>> Once in a while, the "slave" nfsd process doesn't terminate and a "ps axHl" shows:
>>   0 48889     1   0   20  0  5884  812 svcexit  D     -   0:00.01 nfsd: server
>>   0 48889     1   0   40  0  5884  812 rpcsvc   I     -   0:00.00 nfsd: server
>> ... more of the same
>>   0 48889     1   0   40  0  5884  812 rpcsvc   I     -   0:00.00 nfsd: server
>>   0 48889     1   0   -8  0  5884  812 rpcsvc   I     -   1:51.78 nfsd: server
>>   0 48889     1   0   -8  0  5884  812 rpcsvc   I     -   2:27.75 nfsd: server
>>
>> You can see that the top thread (the one that was created with the process) is
>> stuck in "D" on "svcexit".
>> The rest of the threads are still servicing NFS RPCs. If you still have an NFS
>> mount on the server, the mount continues to work and the CPU time for the last
>> two threads slowly climbs, due to NFS RPC activity. A SIGKILL was posted for
>> the process and these threads (created by kthread_add) are here, but
>> cv_wait_sig/cv_timedwait_sig never seems to return EINTR for these other threads.
>>
>>                        if (ismaster || (!ismaster &&
>> 1207                              grp->sg_threadcount > grp->sg_minthreads))
>> 1208                                  error = cv_timedwait_sig(&st->st_cond,
>> 1209                                      &grp->sg_lock, 5 * hz);
>> 1210                          else
>> 1211                                  error = cv_wait_sig(&st->st_cond,
>> 1212                                      &grp->sg_lock);
>>
>> The top thread (referred to in svc.c as "ismaster") did return from here with
>> EINTR and has now done an msleep() here, waiting for the other threads to
>> terminate.
>>
>>        /* Waiting for threads to stop. */
>> 1387          for (g = 0; g < pool->sp_groupcount; g++) {
>> 1388                  grp = &pool->sp_groups[g];
>> 1389                  mtx_lock(&grp->sg_lock);
>> 1390                  while (grp->sg_threadcount > 0)
>> 1391                          msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
>> 1392                  mtx_unlock(&grp->sg_lock);
>> 1393          }
>>
>> Although I can't be sure if this patch has fixed the problem because it
>> happens intermittently, I have not seen the problem since applying this patch:
>> --- rpc/svc.c.sav     2018-06-21 22:52:11.623955000 -0400
>> +++ rpc/svc.c 2018-06-22 09:01:40.271803000 -0400
>> @@ -1388,7 +1388,7 @@ svc_run(SVCPOOL *pool)
>>               grp = &pool->sp_groups[g];
>>               mtx_lock(&grp->sg_lock);
>>               while (grp->sg_threadcount > 0)
>> -                     msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
>> +                     msleep(grp, &grp->sg_lock, 0, "svcexit", 1);
>>               mtx_unlock(&grp->sg_lock);
>>       }
>>  }
>>
>> As you can see, all it does is add a timeout to the msleep().
>> I am not familiar with the signal delivery code in sleepqueue, so it probably
>> isn't correct, but my theory is along the lines of...
>>
>> Since the msleep() doesn't have PCATCH, it does not set TDF_SINTR,
>> and if that happens before the other threads return EINTR from cv_wait_sig(),
>> they no longer do so?
>> And I thought that waking up from the msleep() via timeouts would maybe allow
>> the other threads to return EINTR from cv_wait_sig()?
>>
>> Does this make sense? rick
>> ps: I'll post if I see the problem again with the patch applied.
>> pss: This is a single-core i386 system, just in case that might affect this.
>
>No, the patch does not make sense. I think it was just coincidental that
>with the patch you did not get the hang.
>
>Signals are delivered to a thread, which should take the appropriate
>actions. For the kernel process like rpc pool, the signals are never
>delivered, they are queued in a randomly selected thread's signal queue
>and sit there. The interruptible sleeps are aborted in the context
>of that thread, but nothing else happens. So if you need to make svc
>pools properly killable, all threads must check at least for EINTR and
>instruct other threads to exit as well.
I'm not sure I understand what the "randomly selected thread's signal queue"
means, but it seems strange that this usually works. (The code is at least
10 years old, originally committed by dfr@. I've added him to the cc list in
case he understands this.)
Is it that, usually, the threads will all return EINTR before the master one
gets to where the msleep() happens if the count is > 0?
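For reference, the difference is just the priority flags passed to msleep(9):
with PCATCH the sleep is interruptible and returns EINTR/ERESTART when a
signal is pending; without it, only wakeup(9) or a timeout can end the sleep.
A two-line illustration, reusing the names from the svc.c code quoted above:

        /* Interruptible: a pending signal aborts the sleep with EINTR/ERESTART. */
        error = msleep(grp, &grp->sg_lock, PCATCH, "svcexit", 0);

        /* Non-interruptible (what svc_run() does now): sleeps until wakeup(). */
        error = msleep(grp, &grp->sg_lock, 0, "svcexit", 0);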

>Your description at the start of the message of the behaviour after
>SIGKILL, where other threads continued to serve RPCs, exactly matches
>above explanation. You need to add some global 'stop' flag, if it is not
>yet present, and recheck it after each RPC handled. Any thread which
>notes EINTR or does a direct check for the pending signal, should set
>the flag and wake up every other thread in the pool.
Ok, I'll code up a patch with a global "stop" flag and test it for a while.
If it seems ok, I'll put it up in phabricator and ask you to review it.
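To make sure I have the pattern right before touching svc_run_internal(),
here's a quick userland analogue of the idea using pthreads in place of the
kernel primitives (all of the names below are made up for illustration, not
the actual svc.c fields):

/* stopflag.c: userland sketch of the "global stop flag" idea. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 4

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pool_cv = PTHREAD_COND_INITIALIZER;
static bool            pool_stop = false;  /* analogue of the global flag */
static int             threadcount = NTHREADS;

static void *
worker(void *arg)
{
        pthread_mutex_lock(&pool_lock);
        while (!pool_stop) {
                /* Analogue of cv_wait_sig(): sleep until work or stop;
                 * the flag is rechecked after every wakeup. */
                pthread_cond_wait(&pool_cv, &pool_lock);
        }
        /* Analogue of the sg_threadcount accounting: the last thread
         * out wakes whoever is waiting for the pool to drain. */
        if (--threadcount == 0)
                pthread_cond_broadcast(&pool_cv);
        pthread_mutex_unlock(&pool_lock);
        return (NULL);
}

static void
request_stop(void)
{
        /* What a thread that sees EINTR would do: set the flag under
         * the lock and wake every other sleeping thread in the pool. */
        pthread_mutex_lock(&pool_lock);
        pool_stop = true;
        pthread_cond_broadcast(&pool_cv);
        pthread_mutex_unlock(&pool_lock);
}

int
main(void)
{
        pthread_t tid[NTHREADS];
        int i;

        for (i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, worker, NULL);
        sleep(1);               /* let the workers block on the cv */
        request_stop();

        /* Analogue of svc_run()'s "while (sg_threadcount > 0) msleep()". */
        pthread_mutex_lock(&pool_lock);
        while (threadcount > 0)
                pthread_cond_wait(&pool_cv, &pool_lock);
        pthread_mutex_unlock(&pool_lock);

        for (i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
        printf("all %d workers exited\n", NTHREADS);
        return (0);
}

The kernel version would set the flag when cv_wait_sig()/cv_timedwait_sig()
returns EINTR and wake the other threads in each group, but the shape should
be the same.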

Thanks, rick


