Date: Thu, 15 Jan 2015 11:49:09 +0100 From: Hans Petter Selasky <hps@selasky.org> To: Konstantin Belousov <kostikbel@gmail.com> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r277199 - in head/sys: fs/devfs kern Message-ID: <54B79B25.3070707@selasky.org> In-Reply-To: <20150115093841.GX42409@kib.kiev.ua> References: <201501142207.t0EM7Dfn041543@svn.freebsd.org> <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org> <20150115093841.GX42409@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On 01/15/15 10:38, Konstantin Belousov wrote: > On Thu, Jan 15, 2015 at 08:41:31AM +0100, Hans Petter Selasky wrote: >> On 01/15/15 04:31, Konstantin Belousov wrote: >>> On Wed, Jan 14, 2015 at 10:07:13PM +0000, Hans Petter Selasky wrote: >>>> Author: hselasky >>>> Date: Wed Jan 14 22:07:13 2015 >>>> New Revision: 277199 >>>> URL: https://svnweb.freebsd.org/changeset/base/277199 >>>> >>>> Log: >>>> Avoid race with "dev_rel()" when using the recently added >>>> "delist_dev()" function. Make sure the character device structure >>>> doesn't go away until the end of the "destroy_dev()" function due to >>>> concurrently running cleanup code inside "devfs_populate()". >>>> >>>> MFC after: 1 week >>>> Reported by: dchagin@ >>>> >>>> Modified: >>>> head/sys/fs/devfs/devfs_devs.c >>>> head/sys/kern/kern_conf.c >>>> >>>> Modified: head/sys/fs/devfs/devfs_devs.c >>>> ============================================================================== >>>> --- head/sys/fs/devfs/devfs_devs.c Wed Jan 14 22:05:57 2015 (r277198) >>>> +++ head/sys/fs/devfs/devfs_devs.c Wed Jan 14 22:07:13 2015 (r277199) >>>> @@ -137,6 +137,12 @@ devfs_alloc(int flags) >>>> vfs_timestamp(&ts); >>>> cdev->si_atime = cdev->si_mtime = cdev->si_ctime = ts; >>>> cdev->si_cred = NULL; >>>> + /* >>>> + * Avoid race with dev_rel() by setting the initial >>>> + * reference count to 1. This last reference is taken >>>> + * by the destroy_dev() function. >>>> + */ >>>> + cdev->si_refcount = 1; >>> This is wrong. Not all devices are destroyed with destroy_dev(). >>> dev_rel() must be allowed to clean up allocated device. >>> >>> That said, I do not understand what race you are trying to solve. >>> Freeing of the accessible cdev memory cannot happen in parallel while >>> dev_mtx is owned. >>> >>> Please do not commit (to devfs) without seeking for the review first. >> >> Hi Konstantin, >> >> From my analysis there are basically three ways for a cdev to die: >> >> 1) Through dev_free_devlocked() >> 2) Through destroy_devl() which then later calls dev_free_devlocked() >> 3) Through destroy_dev_sched() which really is a wrapper around >> destroy_devl(). > You only look from the consumers PoV. Devfs cdev can be dereferenced > because e.g. clone handler decides that cdev is not valid/needed, > and now the memory is never freed due to extra reference. > > Do not assume that all cdevs go through destroy_dev(). Hi, All cdevs go through either case #2 or case #1 eventually from what I can see, including clone devices, which call destroy_devl() in the end aswell. See the "clone_destroy()" function! I did a simple test with /dev/dspX.Y which use clone devices. I did: vmstat -m | grep -i devfs1 1) Before plugging USB audio device: DEVFS1 157 79K - 189 512 2) Plug USB audio device: DEVFS1 164 82K - 196 512 3) Play something (env AUDIODEV=/dev/dsp2.4 play track01.wav) DEVFS1 165 83K - 197 512 4) Stop playing (clone device still exits): DEVFS1 165 83K - 197 512 5) Detach USB audio device: DEVFS1 157 79K - 197 512 I see no leakage in that case! Other case: 1) After "kldload if_tap" DEVFS1 158 79K - 201 512 2) After creating TAP device (cat /dev/tap99) DEVFS1 159 80K - 204 512 3) After creating TAP device (cat /dev/tap101) DEVFS1 160 80K - 207 512 5) After "kldunload if_tap": DEVFS1 158 79K - 207 512 6) After "kldload if_tap" again: DEVFS1 158 79K - 207 512 I see no leakage in that case either! Are there more cases which I don't see? > >> >> In the case of direct free through #1, the reference count is ignored >> and it doesn't matter if it is one or zero. Only in the case of >> destruction through destroy_dev() it matters. >> >> Like the comment says in destroy_devl(): >> >> /* Avoid race with dev_rel() */ >> >> The problem is that the "cdev->si_refcount" is zero when the initial >> devfs_create() is called. Then one ref is made. When we clear the >> CDP_ACTIVE flag in devfs_destroy() it instructs a !parallel! running >> process to destroy all the FS related structures and the reference count >> goes back to zero when the "cdp" is removed from the "cdevp_list". Then >> the cdev is freed too early. This happens because destroy_devl() is >> dropping the dev_lock() to sleep waiting for pending references. > Basically, this is very good explanation why your delist hack is wrong, > for one of the reason. Another reason is explained below. > You are trying to cover it with additional reference, but this is wrong > as well. > >> >> Do you see something else? > > I think that what you are trying to do with the CDP_ACTIVE hack is doomed > anyway, because you are allowing for devfs directory to have two entries > with the same name, until the populate loop cleans up the inactive one. > In the meantime, any access to the directory operates on random entry. The entry will not be random, because upon an open() call to a character device, I believe the devfs_lookup() function will be called, which always populate the devfs tree at first by calls to devfs_populate_xxx(). Any delisted devices which don't have the "CDP_ACTIVE" bit set, will never be seen by any open. Regarding leftover filedescriptors which still access the old "cdev" this is not a problem, and these will be closed when the si_refcount goes to zero after the destroy_devl() call. > > The checks for existent names in make_dev() are performed for the reason, > and you makes the rounds to effectively ignore it. > These checks are still correct and don't conflict with my patch from what I can see. Else the existing destroy_devl() would also be broken even before my patch with regard to the "random" selection of character devices at open() from userspace. --HPS
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?54B79B25.3070707>