From owner-svn-src-all@FreeBSD.ORG Thu Jan 15 12:13:54 2015 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DC207AF2; Thu, 15 Jan 2015 12:13:53 +0000 (UTC) Received: from mail.turbocat.net (mail.turbocat.net [IPv6:2a01:4f8:d16:4514::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 87071987; Thu, 15 Jan 2015 12:13:53 +0000 (UTC) Received: from laptop015.home.selasky.org (cm-176.74.213.204.customer.telag.net [176.74.213.204]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id 617331FE022; Thu, 15 Jan 2015 13:13:51 +0100 (CET) Message-ID: <54B7AF2F.3080802@selasky.org> Date: Thu, 15 Jan 2015 13:14:39 +0100 From: Hans Petter Selasky User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: svn commit: r277199 - in head/sys: fs/devfs kern References: <201501142207.t0EM7Dfn041543@svn.freebsd.org> <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org> <20150115093841.GX42409@kib.kiev.ua> <54B79B25.3070707@selasky.org> <20150115115123.GA42409@kib.kiev.ua> In-Reply-To: <20150115115123.GA42409@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Jan 2015 12:13:54 -0000 On 01/15/15 12:51, Konstantin Belousov wrote: > On Thu, Jan 15, 2015 at 11:49:09AM +0100, Hans Petter Selasky wrote: >> >> I see no leakage in that case either! > Because these cases come through destroy_dev(). > >> >> Are there more cases which I don't see? > You are breaking existig devfs KPI by your hack. You introduce yet another > reference on the device, which is not supposed to be there. Hi Konstantin, I need a non-sleeping way to say a character device is no longer supposed to be used and be able to re-use the device name right away creating a new device. I guess you got that. > > If some code calls delist_dev(), it could be said that it is a contract > of the new function that destroy_dev() must be called eventually on > the cdev. Then, the reference could be said to be shared-owned by > delist_dev() and destroy_dev(). But, for arbitrary devfs user this new > reference is unacceptable and breaks interface. delist_dev() changes no references. It can be called multiple times even, also inside destroy_devl(). Also I think that the "destroy_dev_sched_cbl()" function should call delist_dev() first so that we don't have a time from when the "destroy_dev_sched_cbl()" function is called where the device entry still exists in devfs mounts until the final destroy_devl() is done by a taskqueue. >>>> In the case of direct free through #1, the reference count is ignored >>>> and it doesn't matter if it is one or zero. Only in the case of >>>> destruction through destroy_dev() it matters. >>>> >>>> Like the comment says in destroy_devl(): >>>> >>>> /* Avoid race with dev_rel() */ >>>> >>>> The problem is that the "cdev->si_refcount" is zero when the initial >>>> devfs_create() is called. Then one ref is made. When we clear the >>>> CDP_ACTIVE flag in devfs_destroy() it instructs a !parallel! running >>>> process to destroy all the FS related structures and the reference count >>>> goes back to zero when the "cdp" is removed from the "cdevp_list". Then >>>> the cdev is freed too early. This happens because destroy_devl() is >>>> dropping the dev_lock() to sleep waiting for pending references. >>> Basically, this is very good explanation why your delist hack is wrong, >>> for one of the reason. Another reason is explained below. >>> You are trying to cover it with additional reference, but this is wrong >>> as well. >>> >>>> >>>> Do you see something else? >>> >>> I think that what you are trying to do with the CDP_ACTIVE hack is doomed >>> anyway, because you are allowing for devfs directory to have two entries >>> with the same name, until the populate loop cleans up the inactive one. >>> In the meantime, any access to the directory operates on random entry. >> >> The entry will not be random, because upon an open() call to a character >> device, I believe the devfs_lookup() function will be called, which >> always populate the devfs tree at first by calls to >> devfs_populate_xxx(). Any delisted devices which don't have the >> "CDP_ACTIVE" bit set, will never be seen by any open. > Entry can be random, since after the populate loop is ran, your code in > other thread could start and create duplicate entry. There is a window > in the lookup where both directory vnode lock and mount point sx locks > are dropped. So running the populate does not guarantee anything. If there is such a race, it is already there! My patch changes nothing in that area: Thread1: Calls destroy_dev() and clears CDP_ACTIVE, after dev_unlock() it goes waiting for some refs for xxx milliseconds. Thread2: Tries to create create a new character device having the same name like the one in thread1. Device name duplication check is missed because CDP_ACTIVE is cleared. Still thread1 is waiting. Thread3: Tries to open character device created by thread2 while thread1 is still waiting for some ref held by a userspace app to go away. This can happen already before my patches! What do you think? > >> >> Regarding leftover filedescriptors which still access the old "cdev" >> this is not a problem, and these will be closed when the si_refcount >> goes to zero after the destroy_devl() call. >> >>> >>> The checks for existent names in make_dev() are performed for the reason, >>> and you makes the rounds to effectively ignore it. >>> >> >> These checks are still correct and don't conflict with my patch from >> what I can see. Else the existing destroy_devl() would also be broken >> even before my patch with regard to the "random" selection of character >> devices at open() from userspace. > > The checks are done to avoid duplicate names. Your patch makes these > checks ineffective (i.e. broken). At what level do you mean duplicate names, I don't get this fully? At the directory level (DE nodes)? Or inside the list of character devices (LIST_XXX)? > Let me summarize: > - the extra reference on arbitrary cdev should be eliminated. The > delist_dev_locked() may add the ref and set some CDP_ flag to > indicate to destroy_dev() that it should do dev_rel(). It is possible to do this. I thought about this before doing my patch, but decided to try to avoid adding yet another cdev flag. > - the existence of the duplicated entries should be either eliminated > (I am not sure it is possible with your code), or we must ensure > that only one name with CDP_ACTIVE flag set and given name exists. > Then, all lookup code must be audited to take CDP_ACTIVE into account > when accessing names. I see at least devfs_find() and devfs_mknod() > which must be changed. I did not performed full audit. I will check this path out aswell. --HPS