Date: Fri, 16 Jan 2015 10:00:11 +0100 From: Hans Petter Selasky <hps@selasky.org> To: Konstantin Belousov <kostikbel@gmail.com> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r277199 - in head/sys: fs/devfs kern Message-ID: <54B8D31B.9030805@selasky.org> In-Reply-To: <20150116080332.GE42409@kib.kiev.ua> References: <201501142207.t0EM7Dfn041543@svn.freebsd.org> <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org> <20150115093841.GX42409@kib.kiev.ua> <54B79B25.3070707@selasky.org> <20150115115123.GA42409@kib.kiev.ua> <54B7AF2F.3080802@selasky.org> <20150116080332.GE42409@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
Hi Konstantin, On 01/16/15 09:03, Konstantin Belousov wrote: > On Thu, Jan 15, 2015 at 01:14:39PM +0100, Hans Petter Selasky wrote: >> On 01/15/15 12:51, Konstantin Belousov wrote: >>> On Thu, Jan 15, 2015 at 11:49:09AM +0100, Hans Petter Selasky wrote: >>>> >>>> I see no leakage in that case either! >>> Because these cases come through destroy_dev(). >>> >>>> >>>> Are there more cases which I don't see? >>> You are breaking existig devfs KPI by your hack. You introduce yet another >>> reference on the device, which is not supposed to be there. >> >> Hi Konstantin, >> >> I need a non-sleeping way to say a character device is no longer >> supposed to be used and be able to re-use the device name right away >> creating a new device. I guess you got that. > Yes, I got it. The devfs design is not suitable for this, and your > hack is the good witness of the fact. > > My opinion is that you should have tried to handle the issue at the driver > level, instead of making this devfs issue. I.e., if you already have > cdev node with the correct name, driver should have replaced the private > data to point to new device. I think this way cannot be implemented in a clean way, because of locking order reversal. And if you try to avoid the LOR you end up with the race. Chess mate sort of ;-) Let me explain: Thread 1: usb_sx_lock(); cdev = Look in freelist for existing device(); else cdev = make_dev(); usb_sx_unlock(); Thread 2: usb_sx_lock(); put cdev on freelist usb_sx_unlock(); Thread 3: usb_sx_lock(); cdev = remove first entry in freelist usb_sx_unlock(); /* * XXX because USB needs to call destroy_dev() unlocked we * are racing with Thread 1 again */ destroy_dev(cdev); > > This would also close a window where /dev node is non-existent or operate > erronously. I'm not saying I plan so, but I think "cdevs" at some point need to understand mutexes and locks. That means like with other API's in the kernel we can associate a lock with the "cdev", and this lock is then used to ensure an atomic shutdown of the system in a non-blocking fashion. In my past experience multithreaded APIs should be high level implemented like this: NON-BLOCKING methods: lock(); ** xxx_start(); xxx_stop(); unlock(); BLOCKING methods: setup(); // init unsetup(); // drain Any callbacks should always be called locked ** In devfs there was no non-blocking stop before I added the delist_dev() function. > > You do not understand my point. > > I object against imposing one additional global reference on all cdevs > just to cope with the delist hack. See the patch at the end of the message. It's fine by me. > > WRT destroy_dev_sched_cb() calling delist_dev(), even after calling > delist_dev(), the node still exists in the /dev. It is only removed > after populate loop is run sometime later. dev_sched() KPI is inheritly > racy, drivers must handle the races for other reasons. The populate loop is all running under the dev_lock() from what I can see and make_dev() is also keeping the same lock when inserting and removing new cdevs. The populate loop should always give a consistent view of the character devices available, and I don't see how "cdev" structures without the CDP_ACTIVE flag can appear with recently created ones, even if the name is the same. > >> >>> Entry can be random, since after the populate loop is ran, your code in >>> other thread could start and create duplicate entry. There is a window >>> in the lookup where both directory vnode lock and mount point sx locks >>> are dropped. So running the populate does not guarantee anything. >> >> If there is such a race, it is already there! My patch changes nothing >> in that area: >> >> Thread1: >> Calls destroy_dev() and clears CDP_ACTIVE, after dev_unlock() it goes >> waiting for some refs for xxx milliseconds. >> Thread2: >> Tries to create create a new character device having the same name like >> the one in thread1. Device name duplication check is missed because >> CDP_ACTIVE is cleared. Still thread1 is waiting. >> Thread3: >> Tries to open character device created by thread2 while thread1 is still >> waiting for some ref held by a userspace app to go away. >> >> This can happen already before my patches! What do you think? > Possibly. > >> At what level do you mean duplicate names, I don't get this fully? At >> the directory level (DE nodes)? Or inside the list of character devices >> (LIST_XXX)? > It does not matter, dup at either one directory level, or dup of full > names in the global list are equivalent (bad) things. Like I write above I don't see where the problem is. At the cdev level, we are protecting the cdev's LIST with dev_lock() and only one entry will exist having CDP_ACTIVE bit set per unique cdev name and path. Else we will hit a panic in make_dev() and friends. In the directory entry level the populate loop will also ensure a consistent view, and hence the cdev's LIST is consistent, the view presented to userspace will also be consistent. That system functions can still call into the dangling read/write/ioctl functions is another story, and that is why I tell, that in order to simplify this teardown, we possibly should associate a client selectable lock with each CDEV, for teardown purposes. Like done for several years in the callout and USB APIs and possibly many other places. > >> >>> Let me summarize: >>> - the extra reference on arbitrary cdev should be eliminated. The >>> delist_dev_locked() may add the ref and set some CDP_ flag to >>> indicate to destroy_dev() that it should do dev_rel(). >> >> It is possible to do this. I thought about this before doing my patch, >> but decided to try to avoid adding yet another cdev flag. >> >>> - the existence of the duplicated entries should be either eliminated >>> (I am not sure it is possible with your code), or we must ensure >>> that only one name with CDP_ACTIVE flag set and given name exists. >>> Then, all lookup code must be audited to take CDP_ACTIVE into account >>> when accessing names. I see at least devfs_find() and devfs_mknod() >>> which must be changed. I did not performed full audit. >> >> I will check this path out aswell. >> > > It seems it is simpler for me to try to clean up after the commit. > The patch was only lightly tested. I post the patch for discussion, > not for you to committing it. I will expedite the change into HEAD > after the consensus on it is made and adequate testing is performed. I don't see any problems about your patch, except it adds a bit more code to the kernel. --HPS
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?54B8D31B.9030805>