From owner-svn-src-all@FreeBSD.ORG  Fri Jan 16 08:59:27 2015
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C0BA8DB0;
 Fri, 16 Jan 2015 08:59:27 +0000 (UTC)
Received: from mail.turbocat.net (heidi.turbocat.net [88.198.202.214])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 6172EF70;
 Fri, 16 Jan 2015 08:59:27 +0000 (UTC)
Received: from laptop015.home.selasky.org
 (cm-176.74.213.204.customer.telag.net [176.74.213.204])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.turbocat.net (Postfix) with ESMTPSA id 60BD51FE023;
 Fri, 16 Jan 2015 09:59:23 +0100 (CET)
Message-ID: <54B8D31B.9030805@selasky.org>
Date: Fri, 16 Jan 2015 10:00:11 +0100
From: Hans Petter Selasky <hps@selasky.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: svn commit: r277199 - in head/sys: fs/devfs kern
References: <201501142207.t0EM7Dfn041543@svn.freebsd.org>
 <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org>
 <20150115093841.GX42409@kib.kiev.ua> <54B79B25.3070707@selasky.org>
 <20150115115123.GA42409@kib.kiev.ua> <54B7AF2F.3080802@selasky.org>
 <20150116080332.GE42409@kib.kiev.ua>
In-Reply-To: <20150116080332.GE42409@kib.kiev.ua>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
 src-committers@freebsd.org
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
 user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all/>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 16 Jan 2015 08:59:28 -0000

Hi Konstantin,

On 01/16/15 09:03, Konstantin Belousov wrote:
> On Thu, Jan 15, 2015 at 01:14:39PM +0100, Hans Petter Selasky wrote:
>> On 01/15/15 12:51, Konstantin Belousov wrote:
>>> On Thu, Jan 15, 2015 at 11:49:09AM +0100, Hans Petter Selasky wrote:
>>>>
>>>> I see no leakage in that case either!
>>> Because these cases come through destroy_dev().
>>>
>>>>
>>>> Are there more cases which I don't see?
>>> You are breaking existig devfs KPI by your hack.  You introduce yet another
>>> reference on the device, which is not supposed to be there.
>>
>> Hi Konstantin,
>>
>> I need a non-sleeping way to say a character device is no longer
>> supposed to be used and be able to re-use the device name right away
>> creating a new device. I guess you got that.
> Yes, I got it.  The devfs design is not suitable for this, and your
> hack is the good witness of the fact.
>
> My opinion is that you should have tried to handle the issue at the driver
> level, instead of making this devfs issue.  I.e., if you already have
> cdev node with the correct name, driver should have replaced the private
> data to point to new device.

I think this way cannot be implemented in a clean way, because of 
locking order reversal. And if you try to avoid the LOR you end up with 
the race. Chess mate sort of ;-)  Let me explain:

Thread 1:
usb_sx_lock();
   cdev = Look in freelist for existing device();
else
   cdev = make_dev();
usb_sx_unlock();

Thread 2:
usb_sx_lock();
put cdev on freelist
usb_sx_unlock();

Thread 3:
usb_sx_lock();
cdev = remove first entry in freelist
usb_sx_unlock();

/*
  * XXX because USB needs to call destroy_dev() unlocked we
  * are racing with Thread 1 again
  */
destroy_dev(cdev);

>
> This would also close a window where /dev node is non-existent or operate
> erronously.

I'm not saying I plan so, but I think "cdevs" at some point need to 
understand mutexes and locks. That means like with other API's in the 
kernel we can associate a lock with the "cdev", and this lock is then 
used to ensure an atomic shutdown of the system in a non-blocking 
fashion. In my past experience multithreaded APIs should be high level 
implemented like this:

NON-BLOCKING methods:
lock(); **
xxx_start();
xxx_stop();
unlock();

BLOCKING methods:
setup(); // init
unsetup(); // drain

Any callbacks should always be called locked **

In devfs there was no non-blocking stop before I added the delist_dev() 
function.

>
> You do not understand my point.
>
> I object against imposing one additional global reference on all cdevs
> just to cope with the delist hack.  See the patch at the end of the message.

It's fine by me.
>
> WRT destroy_dev_sched_cb() calling delist_dev(), even after calling
> delist_dev(), the node still exists in the /dev. It is only removed
> after populate loop is run sometime later. dev_sched() KPI is inheritly
> racy, drivers must handle the races for other reasons.

The populate loop is all running under the dev_lock() from what I can 
see and make_dev() is also keeping the same lock when inserting and 
removing new cdevs. The populate loop should always give a consistent 
view of the character devices available, and I don't see how "cdev" 
structures without the CDP_ACTIVE flag can appear with recently created 
ones, even if the name is the same.

>
>>
>>> Entry can be random, since after the populate loop is ran, your code in
>>> other thread could start and create duplicate entry. There is a window
>>> in the lookup where both directory vnode lock and mount point sx locks
>>> are dropped. So running the populate does not guarantee anything.
>>
>> If there is such a race, it is already there! My patch changes nothing
>> in that area:
>>
>> Thread1:
>> Calls destroy_dev() and clears CDP_ACTIVE, after dev_unlock() it goes
>> waiting for some refs for xxx milliseconds.
>> Thread2:
>> Tries to create create a new character device having the same name like
>> the one in thread1. Device name duplication check is missed because
>> CDP_ACTIVE is cleared. Still thread1 is waiting.
>> Thread3:
>> Tries to open character device created by thread2 while thread1 is still
>> waiting for some ref held by a userspace app to go away.
>>
>> This can happen already before my patches! What do you think?
> Possibly.
>


>> At what level do you mean duplicate names, I don't get this fully? At
>> the directory level (DE nodes)? Or inside the list of character devices
>> (LIST_XXX)?
> It does not matter, dup at either one directory level, or dup of full
> names in the global list are equivalent (bad) things.

Like I write above I don't see where the problem is. At the cdev level, 
we are protecting the cdev's LIST with dev_lock() and only one entry 
will exist having CDP_ACTIVE bit set per unique cdev name and path. Else 
we will hit a panic in make_dev() and friends.

In the directory entry level the populate loop will also ensure a 
consistent view, and hence the cdev's LIST is consistent, the view 
presented to userspace will also be consistent.

That system functions can still call into the dangling read/write/ioctl 
functions is another story, and that is why I tell, that in order to 
simplify this teardown, we possibly should associate a client selectable 
lock with each CDEV, for teardown purposes. Like done for several years 
in the callout and USB APIs and possibly many other places.

>
>>
>>> Let me summarize:
>>> - the extra reference on arbitrary cdev should be eliminated. The
>>>     delist_dev_locked() may add the ref and set some CDP_ flag to
>>>     indicate to destroy_dev() that it should do dev_rel().
>>
>> It is possible to do this. I thought about this before doing my patch,
>> but decided to try to avoid adding yet another cdev flag.
>>
>>> - the existence of the duplicated entries should be either eliminated
>>>     (I am not sure it is possible with your code), or we must ensure
>>>     that only one name with CDP_ACTIVE flag set and given name exists.
>>>     Then, all lookup code must be audited to take CDP_ACTIVE into account
>>>     when accessing names.  I see at least devfs_find() and devfs_mknod()
>>>     which must be changed.  I did not performed full audit.
>>
>> I will check this path out aswell.
>>
>
> It seems it is simpler for me to try to clean up after the commit.
> The patch was only lightly tested.  I post the patch for discussion,
> not for you to committing it.  I will expedite the change into HEAD
> after the consensus on it is made and adequate testing is performed.

I don't see any problems about your patch, except it adds a bit more 
code to the kernel.

--HPS