From owner-freebsd-bugs@FreeBSD.ORG Fri Mar 16 12:02:34 2007 Return-Path: X-Original-To: freebsd-bugs@freebsd.org Delivered-To: freebsd-bugs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1862516A401 for ; Fri, 16 Mar 2007 12:02:34 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailout1.pacific.net.au (mailout1-3.pacific.net.au [61.8.2.210]) by mx1.freebsd.org (Postfix) with ESMTP id AC76E13C480 for ; Fri, 16 Mar 2007 12:02:33 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.2.162]) by mailout1.pacific.net.au (Postfix) with ESMTP id 809695DFE4B; Fri, 16 Mar 2007 23:02:32 +1100 (EST) Received: from besplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailproxy1.pacific.net.au (Postfix) with ESMTP id 31D508C0C; Fri, 16 Mar 2007 23:02:31 +1100 (EST) Date: Fri, 16 Mar 2007 23:02:29 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Doug Ambrisko In-Reply-To: <200703150430.l2F4U7Kk069045@freefall.freebsd.org> Message-ID: <20070316222022.C34871@besplex.bde.org> References: <200703150430.l2F4U7Kk069045@freefall.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-bugs@freebsd.org Subject: Re: kern/109152: [rp] RocketPort panic from device_unbusy() X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Mar 2007 12:02:34 -0000 On Thu, 15 Mar 2007, Doug Ambrisko wrote: > Craig Leres writes: > | I was still able to crash Doug Ambrisko's version of the rp driver. > | I think the problem is that in some cases rp_handle_port() calls > | rpclose() which calls device_unbusy(). Later rpclose() is called > | again and we hit the panic. > | > | I looked at the last known good version of the driver I'd used, > | 1.45.2.2 from 4.10-RELEASE, and found it has code to avoid calling > | rpclose() twice. I did something similar that seems to work. > | > | Note: The appended diffs are against version 1.67.2.2 of rp.c. > ... > I wonder if this should be a reference count? Good find. It is already a (buggy) reference count, and that is what causes the panic. device_busy() increments the device reference count, and device_unbusy() decrements it. A panic occurs when the reference count is decremented below 0. device_busy()/device_unbusy() pairs cannot work in general, since even vfs doesn't know how to do the reference counting necessary for using them. Fortunately, only 6 drivers in -current use such pairs. These drivers are psm, bktr, drm, fdc, rp and agp. The panic is most likely to be triggered by rp, because it is the only tty driver among these, the vfs refcount problems are largest near revoke(2), and revoke()s are usually only done on tty devices. revoke() sometimes causes a last close on a device that is already closed or in the process of being closed (driver blocked in it close function). The result in naive drivers like rp, that do an unconditional device_busy() in their open function and an unconditional device_unbusy() in their close function, is the device refcount going below 0 on the doubled close. Conditional busy/unbusy calls are little better. vfs doesn't know the correct vfs refcount, and device drivers cannot know what vfs is doing better than vfs (they can only know better what they are doing, but the problem is partly internal). Naive unconditional calls to device_busy()/device_unbusy() in a driver's open/close functions also cannot work. It is possible for revoke() to be called while a driver's i/o/ioctl functions are active. revoke() then calls close() to cancel activity on the device. Most drivers are not aware of the possibility of close being called on active devices, so they don't do anything to force the i/o/ioctl functions to finish before the close, and it is quite likely for the close to complete first. Then calling device_unbusy() for the close doesn't panic, but is wrong when it decrements the device refcount to 0, since the device is still busy in the i/o/ioctl functions at that point. Panics might occur would an uncontrolled way if upper layers actually used the device refcount alone to device whether to unload the device, but I think the device refcount is unused except for invariants checking (not under INVARIANTS) in device_unbusy(). Since device_busy()/unbusy() are unusable, the invariants checking reduces to checking that they are never used in cases where they have an effect :-). I sent mail to Craig about different details of the bugs. The preceding paragraph (about why device_unbusy() cannot be called unconditionally in device close, since being closed doesn't imply being unbusied) is the best argument for never using these functions. Bruce