From owner-svn-src-all@FreeBSD.ORG Fri Jan 16 08:59:27 2015 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C0BA8DB0; Fri, 16 Jan 2015 08:59:27 +0000 (UTC) Received: from mail.turbocat.net (heidi.turbocat.net [88.198.202.214]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6172EF70; Fri, 16 Jan 2015 08:59:27 +0000 (UTC) Received: from laptop015.home.selasky.org (cm-176.74.213.204.customer.telag.net [176.74.213.204]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id 60BD51FE023; Fri, 16 Jan 2015 09:59:23 +0100 (CET) Message-ID: <54B8D31B.9030805@selasky.org> Date: Fri, 16 Jan 2015 10:00:11 +0100 From: Hans Petter Selasky User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: svn commit: r277199 - in head/sys: fs/devfs kern References: <201501142207.t0EM7Dfn041543@svn.freebsd.org> <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org> <20150115093841.GX42409@kib.kiev.ua> <54B79B25.3070707@selasky.org> <20150115115123.GA42409@kib.kiev.ua> <54B7AF2F.3080802@selasky.org> <20150116080332.GE42409@kib.kiev.ua> In-Reply-To: <20150116080332.GE42409@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Jan 2015 08:59:28 -0000 Hi Konstantin, On 01/16/15 09:03, Konstantin Belousov wrote: > On Thu, Jan 15, 2015 at 01:14:39PM +0100, Hans Petter Selasky wrote: >> On 01/15/15 12:51, Konstantin Belousov wrote: >>> On Thu, Jan 15, 2015 at 11:49:09AM +0100, Hans Petter Selasky wrote: >>>> >>>> I see no leakage in that case either! >>> Because these cases come through destroy_dev(). >>> >>>> >>>> Are there more cases which I don't see? >>> You are breaking existig devfs KPI by your hack. You introduce yet another >>> reference on the device, which is not supposed to be there. >> >> Hi Konstantin, >> >> I need a non-sleeping way to say a character device is no longer >> supposed to be used and be able to re-use the device name right away >> creating a new device. I guess you got that. > Yes, I got it. The devfs design is not suitable for this, and your > hack is the good witness of the fact. > > My opinion is that you should have tried to handle the issue at the driver > level, instead of making this devfs issue. I.e., if you already have > cdev node with the correct name, driver should have replaced the private > data to point to new device. I think this way cannot be implemented in a clean way, because of locking order reversal. And if you try to avoid the LOR you end up with the race. Chess mate sort of ;-) Let me explain: Thread 1: usb_sx_lock(); cdev = Look in freelist for existing device(); else cdev = make_dev(); usb_sx_unlock(); Thread 2: usb_sx_lock(); put cdev on freelist usb_sx_unlock(); Thread 3: usb_sx_lock(); cdev = remove first entry in freelist usb_sx_unlock(); /* * XXX because USB needs to call destroy_dev() unlocked we * are racing with Thread 1 again */ destroy_dev(cdev); > > This would also close a window where /dev node is non-existent or operate > erronously. I'm not saying I plan so, but I think "cdevs" at some point need to understand mutexes and locks. That means like with other API's in the kernel we can associate a lock with the "cdev", and this lock is then used to ensure an atomic shutdown of the system in a non-blocking fashion. In my past experience multithreaded APIs should be high level implemented like this: NON-BLOCKING methods: lock(); ** xxx_start(); xxx_stop(); unlock(); BLOCKING methods: setup(); // init unsetup(); // drain Any callbacks should always be called locked ** In devfs there was no non-blocking stop before I added the delist_dev() function. > > You do not understand my point. > > I object against imposing one additional global reference on all cdevs > just to cope with the delist hack. See the patch at the end of the message. It's fine by me. > > WRT destroy_dev_sched_cb() calling delist_dev(), even after calling > delist_dev(), the node still exists in the /dev. It is only removed > after populate loop is run sometime later. dev_sched() KPI is inheritly > racy, drivers must handle the races for other reasons. The populate loop is all running under the dev_lock() from what I can see and make_dev() is also keeping the same lock when inserting and removing new cdevs. The populate loop should always give a consistent view of the character devices available, and I don't see how "cdev" structures without the CDP_ACTIVE flag can appear with recently created ones, even if the name is the same. > >> >>> Entry can be random, since after the populate loop is ran, your code in >>> other thread could start and create duplicate entry. There is a window >>> in the lookup where both directory vnode lock and mount point sx locks >>> are dropped. So running the populate does not guarantee anything. >> >> If there is such a race, it is already there! My patch changes nothing >> in that area: >> >> Thread1: >> Calls destroy_dev() and clears CDP_ACTIVE, after dev_unlock() it goes >> waiting for some refs for xxx milliseconds. >> Thread2: >> Tries to create create a new character device having the same name like >> the one in thread1. Device name duplication check is missed because >> CDP_ACTIVE is cleared. Still thread1 is waiting. >> Thread3: >> Tries to open character device created by thread2 while thread1 is still >> waiting for some ref held by a userspace app to go away. >> >> This can happen already before my patches! What do you think? > Possibly. > >> At what level do you mean duplicate names, I don't get this fully? At >> the directory level (DE nodes)? Or inside the list of character devices >> (LIST_XXX)? > It does not matter, dup at either one directory level, or dup of full > names in the global list are equivalent (bad) things. Like I write above I don't see where the problem is. At the cdev level, we are protecting the cdev's LIST with dev_lock() and only one entry will exist having CDP_ACTIVE bit set per unique cdev name and path. Else we will hit a panic in make_dev() and friends. In the directory entry level the populate loop will also ensure a consistent view, and hence the cdev's LIST is consistent, the view presented to userspace will also be consistent. That system functions can still call into the dangling read/write/ioctl functions is another story, and that is why I tell, that in order to simplify this teardown, we possibly should associate a client selectable lock with each CDEV, for teardown purposes. Like done for several years in the callout and USB APIs and possibly many other places. > >> >>> Let me summarize: >>> - the extra reference on arbitrary cdev should be eliminated. The >>> delist_dev_locked() may add the ref and set some CDP_ flag to >>> indicate to destroy_dev() that it should do dev_rel(). >> >> It is possible to do this. I thought about this before doing my patch, >> but decided to try to avoid adding yet another cdev flag. >> >>> - the existence of the duplicated entries should be either eliminated >>> (I am not sure it is possible with your code), or we must ensure >>> that only one name with CDP_ACTIVE flag set and given name exists. >>> Then, all lookup code must be audited to take CDP_ACTIVE into account >>> when accessing names. I see at least devfs_find() and devfs_mknod() >>> which must be changed. I did not performed full audit. >> >> I will check this path out aswell. >> > > It seems it is simpler for me to try to clean up after the commit. > The patch was only lightly tested. I post the patch for discussion, > not for you to committing it. I will expedite the change into HEAD > after the consensus on it is made and adequate testing is performed. I don't see any problems about your patch, except it adds a bit more code to the kernel. --HPS