Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 15 Jan 2015 11:49:09 +0100
From:      Hans Petter Selasky <hps@selasky.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r277199 - in head/sys: fs/devfs kern
Message-ID:  <54B79B25.3070707@selasky.org>
In-Reply-To: <20150115093841.GX42409@kib.kiev.ua>
References:  <201501142207.t0EM7Dfn041543@svn.freebsd.org> <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org> <20150115093841.GX42409@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On 01/15/15 10:38, Konstantin Belousov wrote:
> On Thu, Jan 15, 2015 at 08:41:31AM +0100, Hans Petter Selasky wrote:
>> On 01/15/15 04:31, Konstantin Belousov wrote:
>>> On Wed, Jan 14, 2015 at 10:07:13PM +0000, Hans Petter Selasky wrote:
>>>> Author: hselasky
>>>> Date: Wed Jan 14 22:07:13 2015
>>>> New Revision: 277199
>>>> URL: https://svnweb.freebsd.org/changeset/base/277199
>>>>
>>>> Log:
>>>>     Avoid race with "dev_rel()" when using the recently added
>>>>     "delist_dev()" function. Make sure the character device structure
>>>>     doesn't go away until the end of the "destroy_dev()" function due to
>>>>     concurrently running cleanup code inside "devfs_populate()".
>>>>
>>>>     MFC after:	1 week
>>>>     Reported by:	dchagin@
>>>>
>>>> Modified:
>>>>     head/sys/fs/devfs/devfs_devs.c
>>>>     head/sys/kern/kern_conf.c
>>>>
>>>> Modified: head/sys/fs/devfs/devfs_devs.c
>>>> ==============================================================================
>>>> --- head/sys/fs/devfs/devfs_devs.c	Wed Jan 14 22:05:57 2015	(r277198)
>>>> +++ head/sys/fs/devfs/devfs_devs.c	Wed Jan 14 22:07:13 2015	(r277199)
>>>> @@ -137,6 +137,12 @@ devfs_alloc(int flags)
>>>>    	vfs_timestamp(&ts);
>>>>    	cdev->si_atime = cdev->si_mtime = cdev->si_ctime = ts;
>>>>    	cdev->si_cred = NULL;
>>>> +	/*
>>>> +	 * Avoid race with dev_rel() by setting the initial
>>>> +	 * reference count to 1. This last reference is taken
>>>> +	 * by the destroy_dev() function.
>>>> +	 */
>>>> +	cdev->si_refcount = 1;
>>> This is wrong.  Not all devices are destroyed with destroy_dev().
>>> dev_rel() must be allowed to clean up allocated device.
>>>
>>> That said, I do not understand what race you are trying to solve.
>>> Freeing of the accessible cdev memory cannot happen in parallel while
>>> dev_mtx is owned.
>>>
>>> Please do not commit (to devfs) without seeking for the review first.
>>
>> Hi Konstantin,
>>
>>   From my analysis there are basically three ways for a cdev to die:
>>
>> 1) Through dev_free_devlocked()
>> 2) Through destroy_devl() which then later calls dev_free_devlocked()
>> 3) Through destroy_dev_sched() which really is a wrapper around
>> destroy_devl().
> You only look from the consumers PoV.  Devfs cdev can be dereferenced
> because e.g. clone handler decides that cdev is not valid/needed,
> and now the memory is never freed due to extra reference.
>
> Do not assume that all cdevs go through destroy_dev().

Hi,

All cdevs go through either case #2 or case #1 eventually from what I 
can see, including clone devices, which call destroy_devl() in the end 
aswell. See the "clone_destroy()" function!

I did a simple test with /dev/dspX.Y which use clone devices. I did:

vmstat -m | grep -i devfs1

1) Before plugging USB audio device:

        DEVFS1   157    79K       -      189  512

2) Plug USB audio device:

        DEVFS1   164    82K       -      196  512

3) Play something (env AUDIODEV=/dev/dsp2.4 play track01.wav)

        DEVFS1   165    83K       -      197  512

4) Stop playing (clone device still exits):

        DEVFS1   165    83K       -      197  512

5) Detach USB audio device:

        DEVFS1   157    79K       -      197  512

I see no leakage in that case!

Other case:

1) After "kldload if_tap"

       DEVFS1   158    79K       -      201  512

2) After creating TAP device (cat /dev/tap99)

        DEVFS1   159    80K       -      204  512

3) After creating TAP device (cat /dev/tap101)

        DEVFS1   160    80K       -      207  512

5) After "kldunload if_tap":

        DEVFS1   158    79K       -      207  512

6) After "kldload if_tap" again:

        DEVFS1   158    79K       -      207  512

I see no leakage in that case either!

Are there more cases which I don't see?

>
>>
>> In the case of direct free through #1, the reference count is ignored
>> and it doesn't matter if it is one or zero. Only in the case of
>> destruction through destroy_dev() it matters.
>>
>> Like the comment says in destroy_devl():
>>
>> /* Avoid race with dev_rel() */
>>
>> The problem is that the "cdev->si_refcount" is zero when the initial
>> devfs_create() is called. Then one ref is made. When we clear the
>> CDP_ACTIVE flag in devfs_destroy() it instructs a !parallel! running
>> process to destroy all the FS related structures and the reference count
>> goes back to zero when the "cdp" is removed from the "cdevp_list". Then
>> the cdev is freed too early. This happens because destroy_devl() is
>> dropping the dev_lock() to sleep waiting for pending references.
> Basically, this is very good explanation why your delist hack is wrong,
> for one of the reason.  Another reason is explained below.
> You are trying to cover it with additional reference, but this is wrong
> as well.
>
>>
>> Do you see something else?
>
> I think that what you are trying to do with the CDP_ACTIVE hack is doomed
> anyway, because you are allowing for devfs directory to have two entries
> with the same name, until the populate loop cleans up the inactive one.
> In the meantime, any access to the directory operates on random entry.

The entry will not be random, because upon an open() call to a character 
device, I believe the devfs_lookup() function will be called, which 
always populate the devfs tree at first by calls to 
devfs_populate_xxx(). Any delisted devices which don't have the 
"CDP_ACTIVE" bit set, will never be seen by any open.

Regarding leftover filedescriptors which still access the old "cdev" 
this is not a problem, and these will be closed when the si_refcount 
goes to zero after the destroy_devl() call.

>
> The checks for existent names in make_dev() are performed for the reason,
> and you makes the rounds to effectively ignore it.
>

These checks are still correct and don't conflict with my patch from 
what I can see. Else the existing destroy_devl() would also be broken 
even before my patch with regard to the "random" selection of character 
devices at open() from userspace.

--HPS




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?54B79B25.3070707>