From owner-svn-src-all@FreeBSD.ORG  Sun Jan 18 11:07:40 2015
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 25332E60;
 Sun, 18 Jan 2015 11:07:40 +0000 (UTC)
Received: from mail.turbocat.net (mail.turbocat.net
 [IPv6:2a01:4f8:d16:4514::2])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id BF076BDE;
 Sun, 18 Jan 2015 11:07:39 +0000 (UTC)
Received: from laptop015.home.selasky.org
 (cm-176.74.213.204.customer.telag.net [176.74.213.204])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.turbocat.net (Postfix) with ESMTPSA id 3E96B1FE023;
 Sun, 18 Jan 2015 12:07:36 +0100 (CET)
Message-ID: <54BB942A.5070200@selasky.org>
Date: Sun, 18 Jan 2015 12:08:26 +0100
From: Hans Petter Selasky <hps@selasky.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: svn commit: r277199 - in head/sys: fs/devfs kern
References: <201501142207.t0EM7Dfn041543@svn.freebsd.org>
 <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org>
 <20150115093841.GX42409@kib.kiev.ua> <54B79B25.3070707@selasky.org>
 <20150115115123.GA42409@kib.kiev.ua> <54B7AF2F.3080802@selasky.org>
 <20150116080332.GE42409@kib.kiev.ua> <54B8D31B.9030805@selasky.org>
 <20150118105410.GN42409@kib.kiev.ua>
In-Reply-To: <20150118105410.GN42409@kib.kiev.ua>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
 src-committers@freebsd.org
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
 user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all/>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 18 Jan 2015 11:07:40 -0000

Hi Konstantin,

On 01/18/15 11:54, Konstantin Belousov wrote:
> On Fri, Jan 16, 2015 at 10:00:11AM +0100, Hans Petter Selasky wrote:
>> Hi Konstantin,
>>
>> On 01/16/15 09:03, Konstantin Belousov wrote:
>>> My opinion is that you should have tried to handle the issue at the driver
>>> level, instead of making this devfs issue.  I.e., if you already have
>>> cdev node with the correct name, driver should have replaced the private
>>> data to point to new device.
>>
>> I think this way cannot be implemented in a clean way, because of
>> locking order reversal. And if you try to avoid the LOR you end up with
>> the race. Chess mate sort of ;-)  Let me explain:
>>
>> Thread 1:
>> usb_sx_lock();
>>     cdev = Look in freelist for existing device();
>> else
>>     cdev = make_dev();
>> usb_sx_unlock();
>>
>> Thread 2:
>> usb_sx_lock();
>> put cdev on freelist
>> usb_sx_unlock();
>>
>> Thread 3:
>> usb_sx_lock();
>> cdev = remove first entry in freelist
>> usb_sx_unlock();
>>
>> /*
>>    * XXX because USB needs to call destroy_dev() unlocked we
>>    * are racing with Thread 1 again
>>    */
>> destroy_dev(cdev);
> I do not understand the 'freelist' part.
>
> You have some usb (or whatever other kind) device, which is represented by
> some data structure.  This data structure references cdev.  In reverse,
> cdev points to this data structure by si_drv1 or similar driver-owned
> member of struct cdev.
>
> The structures naming usb devices have some mapping to/from usb bus addresses.
> When device is destroyed or created due to external plug event, there is
> some mechanism that ensures mapping consistence.  It could be lock, it
> could be single-threaded process which discover the bus, or something
> else, I do not know.
>
> While the structure notes that device with some address goes out and comes
> in, it can look up the corresponding cdev and just change the si_drv1 pointer
> to point to the new device.

What about events for "devd" that a new character devices is present in 
the system or being destroyed for user-space applications?

>
> I find it very strange that you need sleepless operation which can
> _both_ destroy and create cdev. At least one operation of creation or
> destruction must sleep, at least in the current devfs design. It could
> be made sleepless, by making VFS visible part even less connected to the
> cdev part, but this is not how it (can) currently work.

I think it would be good practice that all resources needed for creating 
a character device can be allocated prior to registration. That means we 
first should allocate all resources needed, but not register as a single 
function.

For example:

Once make_dev() has returned, we can get an "d_open" callback. But 
"si_drv1/2" is always set by drivers _after_ that "make_dev()" has 
returned! This causes a race we could simply avoid by splitting the 
make_dev() like I suggest. Instead of putting more logic and code inside 
the drivers to handle the race!

>>
>>>
>>> This would also close a window where /dev node is non-existent or operate
>>> erronously.
>>
>> I'm not saying I plan so, but I think "cdevs" at some point need to
>> understand mutexes and locks. That means like with other API's in the
>> kernel we can associate a lock with the "cdev", and this lock is then
>> used to ensure an atomic shutdown of the system in a non-blocking
>> fashion. In my past experience multithreaded APIs should be high level
>> implemented like this:
>>
>> NON-BLOCKING methods:
>> lock(); **
>> xxx_start();
>> xxx_stop();
>> unlock();
>>
>> BLOCKING methods:
>> setup(); // init
>> unsetup(); // drain
>>
>> Any callbacks should always be called locked **
>>
>> In devfs there was no non-blocking stop before I added the delist_dev()
>> function.
>>
>>>
>>> You do not understand my point.
>>>
>>> I object against imposing one additional global reference on all cdevs
>>> just to cope with the delist hack.  See the patch at the end of the message.
>>
>> It's fine by me.
>>>
>>> WRT destroy_dev_sched_cb() calling delist_dev(), even after calling
>>> delist_dev(), the node still exists in the /dev. It is only removed
>>> after populate loop is run sometime later. dev_sched() KPI is inheritly
>>> racy, drivers must handle the races for other reasons.
>>
>> The populate loop is all running under the dev_lock() from what I can
>> see and make_dev() is also keeping the same lock when inserting and
>> removing new cdevs. The populate loop should always give a consistent
> No, populate loop is not run under dev_mtx.

Are you sure:

> static int
> devfs_populate_loop(struct devfs_mount *dm, int cleanup)
> {
>         struct cdev_priv *cdp;
>         struct devfs_dirent *de;
>         struct devfs_dirent *dd, *dt;
>         struct cdev *pdev;
>         int de_flags, depth, j;
>         char *q, *s;
>
>         sx_assert(&dm->dm_lock, SX_XLOCKED);
>         dev_lock();
           ^^^^ what is this ?
>         TAILQ_FOREACH(cdp, &cdevp_list, cdp_list) {

> It would not be able to
> allocate memory, even in M_NOWAIT fashion.  The dev_mtx is after the
> vm and vnode locks.  Only iteration over the cdevp_list is protected by
> dev_mtx, which is dropped right after something which require an
> action, is found on the list. Populate() needs some way to avoid
> reading inconsistent data from the cdev layer, but there is not
> guarantee that we are always up to date.
>
>> view of the character devices available, and I don't see how "cdev"
>> structures without the CDP_ACTIVE flag can appear with recently created
>> ones, even if the name is the same.
> It cannot simply because it is not synchronized with the device
> creation/destruction.


> Lookup (or any other VFS-level code) is not protected by dev_mtx, it
> is only dm lock which ensures that populate loop is not run in parallel,
> modifying direntries.

See comment above.

>
>>
>> That system functions can still call into the dangling read/write/ioctl
>> functions is another story, and that is why I tell, that in order to
>> simplify this teardown, we possibly should associate a client selectable
>> lock with each CDEV, for teardown purposes. Like done for several years
>> in the callout and USB APIs and possibly many other places.
> There is d_purge method which provides the tear-down semantic for
> drivers which needs it.  Only tty uses the method AFAIK.
>
> Devfs ensures that no client code is active when cdev is removed for real.

--HPS