Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 13 Feb 2016 16:38:15 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Eric van Gyzen <vangyzen@FreeBSD.org>
Cc:        threads@freebsd.org, arch@freebsd.org
Subject:   Re: libthr shared locks
Message-ID:  <20160213143815.GB91220@kib.kiev.ua>
In-Reply-To: <56BE69B8.9020808@FreeBSD.org>
References:  <20151223172528.GT3625@kib.kiev.ua> <56BE69B8.9020808@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Thank you for the review.

On Fri, Feb 12, 2016 at 05:24:40PM -0600, Eric van Gyzen wrote:
> * thr_mutex.c line 116
> 
> The parentheses around (m) can be removed now.
Done.

> 
> * thr_mutex.c lines 331-333
> 
>     m->m_qe.tqe_prev =
>         TAILQ_NEXT(last_priv, m_qe)->
>         m_qe.tqe_prev;                         
> 
> This seems to read the m_qe.tqe_prev field from a shared mutex.  Is that
> safe?  It seems like a race.  The following would seem more direct,
> avoiding the shared mutex:
> 
>     m->m_qe.tqe_prev = &TAILQ_NEXT(last_prev, m_qe);
This is indeed racy, relying on the parent process not unlocking the
shared mutexes. But after your note, I think that the whole list
iteration is unsafe, because of the same unlocking in parent.

So in fact I have to return to what I did in the previous version of the
patch, where I kept two queues for each type of the mutexes, one total,
and one private.  The private queue keeps the order of the total, so that
reinitialization of the total queue on the fork is correct for ceiling
ordering.

> * thr_mutex.c line 354
> 
>     *(q)->tqh_last = last_priv;
> 
> This seems to modify the tqe_next field in a shared mutex.  Is that
> safe?  Furthermore, that mutex was/is the last on the list, but we seem
> to set its tqe_next pointer to an earlier element, creating a cycle in
> the list.
This code is gone due to the previous note.

> 
> * thr_mutex.c line 451
> 
> __pthread_mutex_trylock() calls __thr_pshared_offpage() twice [for
> pshared mutexes].  You could eliminate one call by moving
> mutex_trylock_common() inline.
I see.  In fact, I really wanted to eliminate the CHECK_AND_INIT_MUTEX.
See my attempt in the updated patch.

> 
> * thr_pshared.c line 165
> 
> res = NULL seems unnecessary.
Done.

> 
> * thr_pshared.c
> 
> In __thr_pshared_offpage(), can pshared_lookup() fail in the !doalloc
> case?  pshared_hash seems to be an authority, not just an optimization. 
> I ask so that I can understand the code and more effectively review it. 
> In particular, if pshared_lookup() fails and UMTX_SHM_LOOKUP succeeds,
> is it possible to get multiple mappings for the same shm object?

Do you mean, is it possible (and if yes, is it harmful) to have several
virtual addresses in one process for the same key ?

I think there is a bug in pshared_insert() where I preferred new val
over the hashed val (mapping address). In the situation where there are
two threads trying to lock the same object, it may cause first thread to
operate on unmapped address. The scenario is:
- both threads do not find the key in hash;
- first thread performs pshared_insert() and returns the address;
- second thread performs pshared_insert() and replaces the address
  in hash, also invalidating the address still used by first thread.
I changed the pshared_insert() to keep the existing hash value, and
unmap the new val.

That said, I think it is probably not very harmful to have different
callers to operate on different mappings for the same key (of course,
the backing page must be shared).  I can only think of the problems
due to locked mutex list manipulation functions failing if the address
of element changed.  I said that this should not be very harmful since
I suspect that only list invariants checks would fail, and not actual
removal, but I did not checked it.

For now, I think that the invariant I have to ensure is that calls to
lock and unlock from the same thread for the same key get the same
offpage virtual address.

> 
> * thr_barrier.c line 110
> 
>     if (bar == NULL)
>         return (EFAULT);
> 
> POSIX does not mention EFAULT.  Should we return ENOMEM, or can we
> "extend" the standard?  (Ditto for all other _init functions.)
I think EFAULT is permitted by the 'undefined behaviour' clause, since
the primary cause for the offpage allocation failure is wrong address
passed to the __thr_pshared_offpage(). In other words, if an implementation
did not used offpage, but directly write something to the *m memory, it
would get SIGSEGV.

As noted above, malloc() failure would also lead to EFAULT (and this is
what probably caused your question), but I would consider this improbable,
while invalid address passed to the init function a more likely cause.
I do not want to complicate code to distinguish the cases.

> 
> * thr_cond.c line 106
> 
> You could use cattr instead of the ?: expression.
Done.

> 
> * thr_rwlock.c
> 
> rwlock_init() assumes that __thr_pshared_offpage() does not fail.
Done, I return EFAULT in case offpage allocation fails.  If you object
still against other EFAULTs, I would change all of them to be consistent.

Updated patch is at
https://www.kib.kiev.ua/kib/pshared/pshared.2.patch



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160213143815.GB91220>