From owner-freebsd-arch@FreeBSD.ORG Sat Mar 15 16:55:38 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A27F91065675 for ; Sat, 15 Mar 2008 16:55:38 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail16.syd.optusnet.com.au (mail16.syd.optusnet.com.au [211.29.132.197]) by mx1.freebsd.org (Postfix) with ESMTP id 331C28FC1F for ; Sat, 15 Mar 2008 16:55:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c220-239-252-11.carlnfd3.nsw.optusnet.com.au (c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11]) by mail16.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m2FGtIjX022004 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 16 Mar 2008 03:55:21 +1100 Date: Sun, 16 Mar 2008 03:55:18 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Ed Schouten In-Reply-To: <20080315124008.GF80576@hoeg.nl> Message-ID: <20080316015903.N39516@delplex.bde.org> References: <20080315124008.GF80576@hoeg.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD Arch Subject: Re: vgone() calling VOP_CLOSE() -> blocked threads? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Mar 2008 16:55:38 -0000 On Sat, 15 Mar 2008, Ed Schouten wrote: > The last couple of days I'm seeing some strange things in my mpsafetty > branch related to terminal revocation. > > In my current TTY design, I hold a count (t_ldisccnt) of the amount of > threads that are sleeping in the line discipline. I need to store such a > count, because it's not possible to change line disciplines while some > threads are still blocked inside the discipline. This means that when > d_close() is called on a TTY, t_ldisccnt should always be 0. There > cannot be any threads stuck inside the line discipline when there aren't > any descriptors referencing it. > > Unfortunately, this isn't entirely true with the current VFS/devfs > design. When vgone() is called, a VOP_CLOSE() is performed , which means > there could be a dozen threads still stuck inside a device driver, but > the close routine is already called to clean up stuff. There are a > *real* lot of drivers that blindly clean up their stuff in the d_close() > routine, expecting that the device is completely unused. This can > easily be demonstrated by revoking a bpf device, while running tcpdump. Yes, most drivers are broken here, but the problem is rarely noticed because revoke() isn't normally applied to any devices except ttys. Even ordinary close() can cause problems when a thread is sleeping in device open, but this too is only common for ttys (for callin and callout devices). The tty driver is about the only driver that handles this problem almost correctly. It uses a generation count. All tty drivers are supposed to sleep using only ttysleep(). ttysleep() checks the generation count and returns ERESTART if the generation is new. All tty drivers should consider this error to be fatal and propagate it up to the syscall level where the syscall is restarted. This tends to happen naturally, but some places (in device close IIRC), the driver ignores the error and does more i/o (to finish cleaning up in close -- close and open can easily pass each other and clobber each others state when this happens). More I/O also tends to occur if a revoke() happens when a thread is blocked but not sleeping. Then ttysleep() isn't in sight, so the thread has no idea that the generation count changed. Giant locking limits this problem. > To be honest, I'm not completely sure how to solve this issue, though I > know it should at least do something similar to this: > > - The device driver should have a seperate routine (d_revoke) to wake > up any blocked threads, to make sure they leave the device driver > properly. Something is needed to signal blocked but non-sleeping threads. I think the wakeup for ttys now normally occurs as a side effect of flushing i/o. revoke() normally calls ttyclose() which calls ttyldclose() which normally calls ttylclose() which flushes i/o which wakes up threads waiting on the i/o. I don't see how ttylclose() can work right in the usual !FNONBLOCK case. Maybe revoke() sets FNONBLOCK. The generation count stuff doesn't help here because the flush is done before incrementing the generation count. There is an obvious race here for threads doing i/o instead of waiting for it. These muse be blocked (by Giant now for ttys, or by your MPSAFE locking). They will run when revoke() releases the lock and find the i/o flushed and maybe the generation count incrememented, but they normally won't check these states and will just blunder on doing more i/o. It would be painful to check these states every time the lock is aquired, but this seems to be necessary. Magic Giant locking makes the places where the lock is acquired hard see. > - Maybe vgonel() shouldn't call VOP_CLOSE(). It should probably move the > vnode into deadfs, with the exception of the close() routine. Maybe > it's better to add a new function to do this, vrevoke(). > > This means that when a revoke() call is performed, all blocked threads > are woken up, will leave the driver, to find out their terminal has been > revoked. Further system calls will fail, because the vnode is in deadfs, > but when the processes close the descriptor, the device driver can still > clean up everything. I think vfs already moves the vnode to deadfs. It doesn't do anything to synchronize with threads running in device drivers. The forced last-close() should complete synchronously as part of revoke(). Then other threads leave the device driver asynchronously, hopefully not much later. Then if the generation count stuff is working right, the syscall is restarted, but now file descriptors point to deadfs so the syscall normally fails. I think the async completion is OK provided it is done right (don't delay it indefinitely, and don't do more i/o on completion). It doesn't seem to be useful to make revoke() wait for the completions. I don't think it would work well to move everything except d_close to deadfs. Other problems near here: - neither vfs nor drivers currently know how many threads are in a driver. vfs uses vp->v_rdev->si_usecount, but this doesn't quite work since it doesn't count threads sleeping in open. Maybe ones excuting last close too -- this would be more of a problem. revoke() just uses vcount(), which just acquires the device locks and returns si_usecount after releasing the device lock. (I don't understand this locking -- what stops the count changing after the lock is released, or if it cannot changed then why acquire the lock?) This can result in revoke() not calling device close when it should. Drivers can obviously keep count of their activities using large code. I can't see any way for vfs to keep count short of asking drivers for their counts. - there can be any number of threads in device open and close concurrently, even without the complications for revoke(). The most problematic cases happen when last-close blocks, as is common for ttys waiting for output to drain (since no one cares about their output actually working and ensures draining it using tcdrain() -- normal losing programs finish up with something like printf(); exit(); and depend on the close() in exit(); blocking to drain the output). Then new opens are allowed, and this is useful for doing ioctls() to unblocked blocked closes. If the new open or fcntl sets non-blocking mode, then the last-close for the new open may pass the blocked last close. If the new mode is blocking, then the last-close for the new open may block too. The number of threads in last-close is thus unlimited. A thundering herd of them tends to stomp on each other when they are all unblocked at the same time. The connections of this with revoke() are: - it takes vfs's not counting of all threads in the device driver to allow the useful behaviour of opens while a close is blocked and the necessary behaviour of last-close while another last-close is executing (drivers should be aware of this possibility and merge the closes, but don't). - I think revoke() sets FNONBLOCK somewhere. Thus it tends to unblock any thread waiting in last-close for output to drain. Less problematic cases occur when opens block. ttyopen() understands this possibility and handles it almost right using its t_wopeners count. ttyopen() uses various sleeps where it should use ttysleep() or check the generation count itself; this results in it looping internally instead of restarting the syscall, which is only a small error since for open() alone, restarting the syscall would call back to the same non-dead device open except in unusual cases where there was a signal and syscalls are not restarted, or the device name went away. There is still a problem with the vfs usage counting -- in one case involving callin and callout devices whose details I forget, last-close is not called when it needs to be called to wake up all the threads sleeping in open so that they can enter a new state. Bruce