From owner-svn-src-all@FreeBSD.ORG Sun Jan 18 11:07:40 2015 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 25332E60; Sun, 18 Jan 2015 11:07:40 +0000 (UTC) Received: from mail.turbocat.net (mail.turbocat.net [IPv6:2a01:4f8:d16:4514::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BF076BDE; Sun, 18 Jan 2015 11:07:39 +0000 (UTC) Received: from laptop015.home.selasky.org (cm-176.74.213.204.customer.telag.net [176.74.213.204]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id 3E96B1FE023; Sun, 18 Jan 2015 12:07:36 +0100 (CET) Message-ID: <54BB942A.5070200@selasky.org> Date: Sun, 18 Jan 2015 12:08:26 +0100 From: Hans Petter Selasky User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: svn commit: r277199 - in head/sys: fs/devfs kern References: <201501142207.t0EM7Dfn041543@svn.freebsd.org> <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org> <20150115093841.GX42409@kib.kiev.ua> <54B79B25.3070707@selasky.org> <20150115115123.GA42409@kib.kiev.ua> <54B7AF2F.3080802@selasky.org> <20150116080332.GE42409@kib.kiev.ua> <54B8D31B.9030805@selasky.org> <20150118105410.GN42409@kib.kiev.ua> In-Reply-To: <20150118105410.GN42409@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 18 Jan 2015 11:07:40 -0000 Hi Konstantin, On 01/18/15 11:54, Konstantin Belousov wrote: > On Fri, Jan 16, 2015 at 10:00:11AM +0100, Hans Petter Selasky wrote: >> Hi Konstantin, >> >> On 01/16/15 09:03, Konstantin Belousov wrote: >>> My opinion is that you should have tried to handle the issue at the driver >>> level, instead of making this devfs issue. I.e., if you already have >>> cdev node with the correct name, driver should have replaced the private >>> data to point to new device. >> >> I think this way cannot be implemented in a clean way, because of >> locking order reversal. And if you try to avoid the LOR you end up with >> the race. Chess mate sort of ;-) Let me explain: >> >> Thread 1: >> usb_sx_lock(); >> cdev = Look in freelist for existing device(); >> else >> cdev = make_dev(); >> usb_sx_unlock(); >> >> Thread 2: >> usb_sx_lock(); >> put cdev on freelist >> usb_sx_unlock(); >> >> Thread 3: >> usb_sx_lock(); >> cdev = remove first entry in freelist >> usb_sx_unlock(); >> >> /* >> * XXX because USB needs to call destroy_dev() unlocked we >> * are racing with Thread 1 again >> */ >> destroy_dev(cdev); > I do not understand the 'freelist' part. > > You have some usb (or whatever other kind) device, which is represented by > some data structure. This data structure references cdev. In reverse, > cdev points to this data structure by si_drv1 or similar driver-owned > member of struct cdev. > > The structures naming usb devices have some mapping to/from usb bus addresses. > When device is destroyed or created due to external plug event, there is > some mechanism that ensures mapping consistence. It could be lock, it > could be single-threaded process which discover the bus, or something > else, I do not know. > > While the structure notes that device with some address goes out and comes > in, it can look up the corresponding cdev and just change the si_drv1 pointer > to point to the new device. What about events for "devd" that a new character devices is present in the system or being destroyed for user-space applications? > > I find it very strange that you need sleepless operation which can > _both_ destroy and create cdev. At least one operation of creation or > destruction must sleep, at least in the current devfs design. It could > be made sleepless, by making VFS visible part even less connected to the > cdev part, but this is not how it (can) currently work. I think it would be good practice that all resources needed for creating a character device can be allocated prior to registration. That means we first should allocate all resources needed, but not register as a single function. For example: Once make_dev() has returned, we can get an "d_open" callback. But "si_drv1/2" is always set by drivers _after_ that "make_dev()" has returned! This causes a race we could simply avoid by splitting the make_dev() like I suggest. Instead of putting more logic and code inside the drivers to handle the race! >> >>> >>> This would also close a window where /dev node is non-existent or operate >>> erronously. >> >> I'm not saying I plan so, but I think "cdevs" at some point need to >> understand mutexes and locks. That means like with other API's in the >> kernel we can associate a lock with the "cdev", and this lock is then >> used to ensure an atomic shutdown of the system in a non-blocking >> fashion. In my past experience multithreaded APIs should be high level >> implemented like this: >> >> NON-BLOCKING methods: >> lock(); ** >> xxx_start(); >> xxx_stop(); >> unlock(); >> >> BLOCKING methods: >> setup(); // init >> unsetup(); // drain >> >> Any callbacks should always be called locked ** >> >> In devfs there was no non-blocking stop before I added the delist_dev() >> function. >> >>> >>> You do not understand my point. >>> >>> I object against imposing one additional global reference on all cdevs >>> just to cope with the delist hack. See the patch at the end of the message. >> >> It's fine by me. >>> >>> WRT destroy_dev_sched_cb() calling delist_dev(), even after calling >>> delist_dev(), the node still exists in the /dev. It is only removed >>> after populate loop is run sometime later. dev_sched() KPI is inheritly >>> racy, drivers must handle the races for other reasons. >> >> The populate loop is all running under the dev_lock() from what I can >> see and make_dev() is also keeping the same lock when inserting and >> removing new cdevs. The populate loop should always give a consistent > No, populate loop is not run under dev_mtx. Are you sure: > static int > devfs_populate_loop(struct devfs_mount *dm, int cleanup) > { > struct cdev_priv *cdp; > struct devfs_dirent *de; > struct devfs_dirent *dd, *dt; > struct cdev *pdev; > int de_flags, depth, j; > char *q, *s; > > sx_assert(&dm->dm_lock, SX_XLOCKED); > dev_lock(); ^^^^ what is this ? > TAILQ_FOREACH(cdp, &cdevp_list, cdp_list) { > It would not be able to > allocate memory, even in M_NOWAIT fashion. The dev_mtx is after the > vm and vnode locks. Only iteration over the cdevp_list is protected by > dev_mtx, which is dropped right after something which require an > action, is found on the list. Populate() needs some way to avoid > reading inconsistent data from the cdev layer, but there is not > guarantee that we are always up to date. > >> view of the character devices available, and I don't see how "cdev" >> structures without the CDP_ACTIVE flag can appear with recently created >> ones, even if the name is the same. > It cannot simply because it is not synchronized with the device > creation/destruction. > Lookup (or any other VFS-level code) is not protected by dev_mtx, it > is only dm lock which ensures that populate loop is not run in parallel, > modifying direntries. See comment above. > >> >> That system functions can still call into the dangling read/write/ioctl >> functions is another story, and that is why I tell, that in order to >> simplify this teardown, we possibly should associate a client selectable >> lock with each CDEV, for teardown purposes. Like done for several years >> in the callout and USB APIs and possibly many other places. > There is d_purge method which provides the tear-down semantic for > drivers which needs it. Only tty uses the method AFAIK. > > Devfs ensures that no client code is active when cdev is removed for real. --HPS