Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 11 Apr 2019 06:56:22 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        "kib@freebsd.org" <kib@freebsd.org>,  "freebsd-current@FreeBSD.org" <freebsd-current@freebsd.org>
Subject:   Re: Do the pidhashtbl_locks added by r340742 need to be sx locks?
Message-ID:  <CAGudoHHANiZQN_2BtYDB9vr2W0rez8pFeZjCbW_tPj3028avcg@mail.gmail.com>
In-Reply-To: <QB1PR01MB35374204FA4B6764F34BFA23DD2F0@QB1PR01MB3537.CANPRD01.PROD.OUTLOOK.COM>
References:  <QB1PR01MB35371458E0E88BA357633D50DD2F0@QB1PR01MB3537.CANPRD01.PROD.OUTLOOK.COM> <CAGudoHGOyQVjrxt=555wsC_VQruZBnrA_jMjnXpLN8N9V%2B7-Xw@mail.gmail.com> <QB1PR01MB35374204FA4B6764F34BFA23DD2F0@QB1PR01MB3537.CANPRD01.PROD.OUTLOOK.COM>

next in thread | previous in thread | raw e-mail | index | archive | help
On 4/11/19, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Mateusz Guzik wrote:
>>On 4/11/19, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>> Hi,
>>>
>>> I finally got around to looking at what effect replacing pfind_locked()
>>> with
>>> pfind() has for the NFSv4 client and it is broken.
>>>
>>> The problem is that the NFS code needs to call some variant of "pfind()"
>>> while
>>> holding a mutex lock. The current _pfind() code uses the
>>> pidhashtbl_locks,
>>> which are "sx" locks.
>>>
>>> There are a few ways to fix this:
>>> 1 - Create a custom version of _pfind() for the NFS client with the
>>> sx_X()
>>> calls
>>>       removed, plus replace the locking of allproc_lock with locking of
>>> all
>>> the
>>>       pidhashtbl_locks, so that the "sx" locks are acquired before the
>>> mutex.
>>>       --> Not very efficient, but since it is only done once/sec, I can
>>> live
>>> with it.
>>> 2 - Similar to the above, but still lock the allproc_lock and use a loop
>>> of
>>>      FOREACH_PROC_IN_SYSTEM(p) instead of a hash list for the pid in the
>>>      custom pfind(). (I don't know if this would be preferable to
>>> locking
>>> all
>>>      the pidhashtbl_locks for other users of pfind()?)
>>> 3 - Convert the pidhashtbl_locks to mutexes. Then the NFS client doesn't
>>> need
>>>      to acquire any proc related locks and it just works.
>>>      I can't see anywhere that "sleeps" while holding the
>>> pidhashtbl_locks,
>>> so I
>>>      think they can be converted, although I haven't tried it yet?
>>>
>>> From my perspective, #3 seems the better solution.
>>> What do others think?
>>>
>>
>>Changing the lock type to rwlock may be doable and worthwhile on its own,
>>but I don't think it would constitute the right fix.
>>
>>Preferably there would be an easy to use mechanism which allows
>>registering per-process callbacks. Currently it can be somewhat emulated
>>with EVENTHANDLERs, but it would give calls for all exiting processes, not
>>only the ones of interest. Then there would be no need to periodically
>>scan as you would always get notified on process exit.
> Long ago when I first did the NFSv4 code for OpenBSD2.6, I had a callback
> function
> pointer in "struct proc" which the NFS code set non-null to get a callback.
> { The code still has remnants of that because it still has
> nfscl_cleanup_common(),
>    which was code shared by that callback and this approach which was used
> for
>    the Mac OS X port, where I couldn't change "struct proc". }
> I have never added anything like that for FreeBSD, but I suppose we could
> look
> at doing it that way.
> To be honest, since the current code works fine and can be difficult to test
> well,
> I hesitate to change to using a callback.
>
>>Note the current code does not ref processes it is interested in any
>>manner and just performs a timestamp check to see if it got the one it
>>expected (with pid reuse in mind).
>>
>>So I think a temporary hack which will do the trick will take the current
>>approach further: rely on struct proc being type-stable (i.e. never being
>>freed) and also store the pointer. You can always safely PROC_LOCK it, do
>>checks to see the proc is alive and has the right timestamp...
> Hmm, so you are saying that every element of the proc_zone always has a
> valid
> p_mtx field in it that can be safely PROC_LOCK()'d no matter if the element
> refers to a process at that time?
> I would also need help with the code to determine if the structure refers
> to
> a process that currently exists with the same pid and creation time.

You can just:
PROC_LOCK(p);
if (p->p_state != PRS_NORMAL) { /* it's not the one you want, bail */ }
/* the current timestamp check goes here */
PROC_UNLOCK(p);
>
> I suppose saving "p" with the lock/open owner string and then doing what
> you
> suggest is possible, but it would take some work.
>
> For now, I can just grab all the pidhashtbl_locks once/sec and fix head so
> it works.
>

-- 
Mateusz Guzik <mjguzik gmail.com>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAGudoHHANiZQN_2BtYDB9vr2W0rez8pFeZjCbW_tPj3028avcg>