Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 15 Jan 2015 13:14:39 +0100
From:      Hans Petter Selasky <hps@selasky.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r277199 - in head/sys: fs/devfs kern
Message-ID:  <54B7AF2F.3080802@selasky.org>
In-Reply-To: <20150115115123.GA42409@kib.kiev.ua>
References:  <201501142207.t0EM7Dfn041543@svn.freebsd.org> <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org> <20150115093841.GX42409@kib.kiev.ua> <54B79B25.3070707@selasky.org> <20150115115123.GA42409@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On 01/15/15 12:51, Konstantin Belousov wrote:
> On Thu, Jan 15, 2015 at 11:49:09AM +0100, Hans Petter Selasky wrote:
>>
>> I see no leakage in that case either!
> Because these cases come through destroy_dev().
>
>>
>> Are there more cases which I don't see?
> You are breaking existig devfs KPI by your hack.  You introduce yet another
> reference on the device, which is not supposed to be there.

Hi Konstantin,

I need a non-sleeping way to say a character device is no longer 
supposed to be used and be able to re-use the device name right away 
creating a new device. I guess you got that.

>
> If some code calls delist_dev(), it could be said that it is a contract
> of the new function that destroy_dev() must be called eventually on
> the cdev. Then, the reference could be said to be shared-owned by
> delist_dev() and destroy_dev(). But, for arbitrary devfs user this new
> reference is unacceptable and breaks interface.

delist_dev() changes no references. It can be called multiple times 
even, also inside destroy_devl(). Also I think that the 
"destroy_dev_sched_cbl()" function should call delist_dev() first so 
that we don't have a time from when the "destroy_dev_sched_cbl()" 
function is called where the device entry still exists in devfs mounts 
until the final destroy_devl() is done by a taskqueue.

>>>> In the case of direct free through #1, the reference count is ignored
>>>> and it doesn't matter if it is one or zero. Only in the case of
>>>> destruction through destroy_dev() it matters.
>>>>
>>>> Like the comment says in destroy_devl():
>>>>
>>>> /* Avoid race with dev_rel() */
>>>>
>>>> The problem is that the "cdev->si_refcount" is zero when the initial
>>>> devfs_create() is called. Then one ref is made. When we clear the
>>>> CDP_ACTIVE flag in devfs_destroy() it instructs a !parallel! running
>>>> process to destroy all the FS related structures and the reference count
>>>> goes back to zero when the "cdp" is removed from the "cdevp_list". Then
>>>> the cdev is freed too early. This happens because destroy_devl() is
>>>> dropping the dev_lock() to sleep waiting for pending references.
>>> Basically, this is very good explanation why your delist hack is wrong,
>>> for one of the reason.  Another reason is explained below.
>>> You are trying to cover it with additional reference, but this is wrong
>>> as well.
>>>
>>>>
>>>> Do you see something else?
>>>
>>> I think that what you are trying to do with the CDP_ACTIVE hack is doomed
>>> anyway, because you are allowing for devfs directory to have two entries
>>> with the same name, until the populate loop cleans up the inactive one.
>>> In the meantime, any access to the directory operates on random entry.
>>
>> The entry will not be random, because upon an open() call to a character
>> device, I believe the devfs_lookup() function will be called, which
>> always populate the devfs tree at first by calls to
>> devfs_populate_xxx(). Any delisted devices which don't have the
>> "CDP_ACTIVE" bit set, will never be seen by any open.
> Entry can be random, since after the populate loop is ran, your code in
> other thread could start and create duplicate entry. There is a window
> in the lookup where both directory vnode lock and mount point sx locks
> are dropped. So running the populate does not guarantee anything.

If there is such a race, it is already there! My patch changes nothing 
in that area:

Thread1:
Calls destroy_dev() and clears CDP_ACTIVE, after dev_unlock() it goes 
waiting for some refs for xxx milliseconds.
Thread2:
Tries to create create a new character device having the same name like 
the one in thread1. Device name duplication check is missed because 
CDP_ACTIVE is cleared. Still thread1 is waiting.
Thread3:
Tries to open character device created by thread2 while thread1 is still 
waiting for some ref held by a userspace app to go away.

This can happen already before my patches! What do you think?

>
>>
>> Regarding leftover filedescriptors which still access the old "cdev"
>> this is not a problem, and these will be closed when the si_refcount
>> goes to zero after the destroy_devl() call.
>>
>>>
>>> The checks for existent names in make_dev() are performed for the reason,
>>> and you makes the rounds to effectively ignore it.
>>>
>>
>> These checks are still correct and don't conflict with my patch from
>> what I can see. Else the existing destroy_devl() would also be broken
>> even before my patch with regard to the "random" selection of character
>> devices at open() from userspace.
>
> The checks are done to avoid duplicate names.  Your patch makes these
> checks ineffective (i.e. broken).

At what level do you mean duplicate names, I don't get this fully? At 
the directory level (DE nodes)? Or inside the list of character devices 
(LIST_XXX)?

> Let me summarize:
> - the extra reference on arbitrary cdev should be eliminated. The
>    delist_dev_locked() may add the ref and set some CDP_ flag to
>    indicate to destroy_dev() that it should do dev_rel().

It is possible to do this. I thought about this before doing my patch, 
but decided to try to avoid adding yet another cdev flag.

> - the existence of the duplicated entries should be either eliminated
>    (I am not sure it is possible with your code), or we must ensure
>    that only one name with CDP_ACTIVE flag set and given name exists.
>    Then, all lookup code must be audited to take CDP_ACTIVE into account
>    when accessing names.  I see at least devfs_find() and devfs_mknod()
>    which must be changed.  I did not performed full audit.

I will check this path out aswell.

--HPS



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?54B7AF2F.3080802>