Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 15 Jan 2015 13:51:23 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r277199 - in head/sys: fs/devfs kern
Message-ID:  <20150115115123.GA42409@kib.kiev.ua>
In-Reply-To: <54B79B25.3070707@selasky.org>
References:  <201501142207.t0EM7Dfn041543@svn.freebsd.org> <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org> <20150115093841.GX42409@kib.kiev.ua> <54B79B25.3070707@selasky.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Jan 15, 2015 at 11:49:09AM +0100, Hans Petter Selasky wrote:
> On 01/15/15 10:38, Konstantin Belousov wrote:
> > On Thu, Jan 15, 2015 at 08:41:31AM +0100, Hans Petter Selasky wrote:
> >> On 01/15/15 04:31, Konstantin Belousov wrote:
> >>> On Wed, Jan 14, 2015 at 10:07:13PM +0000, Hans Petter Selasky wrote:
> >>>> Author: hselasky
> >>>> Date: Wed Jan 14 22:07:13 2015
> >>>> New Revision: 277199
> >>>> URL: https://svnweb.freebsd.org/changeset/base/277199
> >>>>
> >>>> Log:
> >>>>     Avoid race with "dev_rel()" when using the recently added
> >>>>     "delist_dev()" function. Make sure the character device structure
> >>>>     doesn't go away until the end of the "destroy_dev()" function due to
> >>>>     concurrently running cleanup code inside "devfs_populate()".
> >>>>
> >>>>     MFC after:	1 week
> >>>>     Reported by:	dchagin@
> >>>>
> >>>> Modified:
> >>>>     head/sys/fs/devfs/devfs_devs.c
> >>>>     head/sys/kern/kern_conf.c
> >>>>
> >>>> Modified: head/sys/fs/devfs/devfs_devs.c
> >>>> ==============================================================================
> >>>> --- head/sys/fs/devfs/devfs_devs.c	Wed Jan 14 22:05:57 2015	(r277198)
> >>>> +++ head/sys/fs/devfs/devfs_devs.c	Wed Jan 14 22:07:13 2015	(r277199)
> >>>> @@ -137,6 +137,12 @@ devfs_alloc(int flags)
> >>>>    	vfs_timestamp(&ts);
> >>>>    	cdev->si_atime = cdev->si_mtime = cdev->si_ctime = ts;
> >>>>    	cdev->si_cred = NULL;
> >>>> +	/*
> >>>> +	 * Avoid race with dev_rel() by setting the initial
> >>>> +	 * reference count to 1. This last reference is taken
> >>>> +	 * by the destroy_dev() function.
> >>>> +	 */
> >>>> +	cdev->si_refcount = 1;
> >>> This is wrong.  Not all devices are destroyed with destroy_dev().
> >>> dev_rel() must be allowed to clean up allocated device.
> >>>
> >>> That said, I do not understand what race you are trying to solve.
> >>> Freeing of the accessible cdev memory cannot happen in parallel while
> >>> dev_mtx is owned.
> >>>
> >>> Please do not commit (to devfs) without seeking for the review first.
> >>
> >> Hi Konstantin,
> >>
> >>   From my analysis there are basically three ways for a cdev to die:
> >>
> >> 1) Through dev_free_devlocked()
> >> 2) Through destroy_devl() which then later calls dev_free_devlocked()
> >> 3) Through destroy_dev_sched() which really is a wrapper around
> >> destroy_devl().
> > You only look from the consumers PoV.  Devfs cdev can be dereferenced
> > because e.g. clone handler decides that cdev is not valid/needed,
> > and now the memory is never freed due to extra reference.
> >
> > Do not assume that all cdevs go through destroy_dev().
> 
> Hi,
> 
> All cdevs go through either case #2 or case #1 eventually from what I 
> can see, including clone devices, which call destroy_devl() in the end 
> aswell. See the "clone_destroy()" function!
> 
> I did a simple test with /dev/dspX.Y which use clone devices. I did:
> 
> vmstat -m | grep -i devfs1
> 
> 1) Before plugging USB audio device:
> 
>         DEVFS1   157    79K       -      189  512
> 
> 2) Plug USB audio device:
> 
>         DEVFS1   164    82K       -      196  512
> 
> 3) Play something (env AUDIODEV=/dev/dsp2.4 play track01.wav)
> 
>         DEVFS1   165    83K       -      197  512
> 
> 4) Stop playing (clone device still exits):
> 
>         DEVFS1   165    83K       -      197  512
> 
> 5) Detach USB audio device:
> 
>         DEVFS1   157    79K       -      197  512
> 
> I see no leakage in that case!
> 
> Other case:
> 
> 1) After "kldload if_tap"
> 
>        DEVFS1   158    79K       -      201  512
> 
> 2) After creating TAP device (cat /dev/tap99)
> 
>         DEVFS1   159    80K       -      204  512
> 
> 3) After creating TAP device (cat /dev/tap101)
> 
>         DEVFS1   160    80K       -      207  512
> 
> 5) After "kldunload if_tap":
> 
>         DEVFS1   158    79K       -      207  512
> 
> 6) After "kldload if_tap" again:
> 
>         DEVFS1   158    79K       -      207  512
> 
> I see no leakage in that case either!
Because these cases come through destroy_dev().

> 
> Are there more cases which I don't see?
You are breaking existig devfs KPI by your hack.  You introduce yet another
reference on the device, which is not supposed to be there.

If some code calls delist_dev(), it could be said that it is a contract
of the new function that destroy_dev() must be called eventually on
the cdev. Then, the reference could be said to be shared-owned by
delist_dev() and destroy_dev(). But, for arbitrary devfs user this new
reference is unacceptable and breaks interface.

> 
> >
> >>
> >> In the case of direct free through #1, the reference count is ignored
> >> and it doesn't matter if it is one or zero. Only in the case of
> >> destruction through destroy_dev() it matters.
> >>
> >> Like the comment says in destroy_devl():
> >>
> >> /* Avoid race with dev_rel() */
> >>
> >> The problem is that the "cdev->si_refcount" is zero when the initial
> >> devfs_create() is called. Then one ref is made. When we clear the
> >> CDP_ACTIVE flag in devfs_destroy() it instructs a !parallel! running
> >> process to destroy all the FS related structures and the reference count
> >> goes back to zero when the "cdp" is removed from the "cdevp_list". Then
> >> the cdev is freed too early. This happens because destroy_devl() is
> >> dropping the dev_lock() to sleep waiting for pending references.
> > Basically, this is very good explanation why your delist hack is wrong,
> > for one of the reason.  Another reason is explained below.
> > You are trying to cover it with additional reference, but this is wrong
> > as well.
> >
> >>
> >> Do you see something else?
> >
> > I think that what you are trying to do with the CDP_ACTIVE hack is doomed
> > anyway, because you are allowing for devfs directory to have two entries
> > with the same name, until the populate loop cleans up the inactive one.
> > In the meantime, any access to the directory operates on random entry.
> 
> The entry will not be random, because upon an open() call to a character 
> device, I believe the devfs_lookup() function will be called, which 
> always populate the devfs tree at first by calls to 
> devfs_populate_xxx(). Any delisted devices which don't have the 
> "CDP_ACTIVE" bit set, will never be seen by any open.
Entry can be random, since after the populate loop is ran, your code in
other thread could start and create duplicate entry. There is a window
in the lookup where both directory vnode lock and mount point sx locks
are dropped. So running the populate does not guarantee anything.

> 
> Regarding leftover filedescriptors which still access the old "cdev" 
> this is not a problem, and these will be closed when the si_refcount 
> goes to zero after the destroy_devl() call.
> 
> >
> > The checks for existent names in make_dev() are performed for the reason,
> > and you makes the rounds to effectively ignore it.
> >
> 
> These checks are still correct and don't conflict with my patch from 
> what I can see. Else the existing destroy_devl() would also be broken 
> even before my patch with regard to the "random" selection of character 
> devices at open() from userspace.

The checks are done to avoid duplicate names.  Your patch makes these
checks ineffective (i.e. broken).

Let me summarize:
- the extra reference on arbitrary cdev should be eliminated. The
  delist_dev_locked() may add the ref and set some CDP_ flag to
  indicate to destroy_dev() that it should do dev_rel().
- the existence of the duplicated entries should be either eliminated
  (I am not sure it is possible with your code), or we must ensure
  that only one name with CDP_ACTIVE flag set and given name exists.
  Then, all lookup code must be audited to take CDP_ACTIVE into account
  when accessing names.  I see at least devfs_find() and devfs_mknod()
  which must be changed.  I did not performed full audit.
  



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150115115123.GA42409>