From owner-freebsd-usb@FreeBSD.ORG Sun Jan 27 18:44:36 2008 Return-Path: Delivered-To: freebsd-usb@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6FDFF16A420 for ; Sun, 27 Jan 2008 18:44:36 +0000 (UTC) (envelope-from henrik@gulbra.net) Received: from av10-1-sn2.hy.skanova.net (av10-1-sn2.hy.skanova.net [81.228.8.181]) by mx1.freebsd.org (Postfix) with ESMTP id F0A9D13C46E for ; Sun, 27 Jan 2008 18:44:35 +0000 (UTC) (envelope-from henrik@gulbra.net) Received: by av10-1-sn2.hy.skanova.net (Postfix, from userid 502) id 860A238800; Sun, 27 Jan 2008 19:44:34 +0100 (CET) Received: from smtp4-1-sn2.hy.skanova.net (smtp4-1-sn2.hy.skanova.net [81.228.8.92]) by av10-1-sn2.hy.skanova.net (Postfix) with ESMTP id 5C20D38311; Sun, 27 Jan 2008 19:44:34 +0100 (CET) Received: from [192.168.0.2] (81-235-156-87-no29.tbcn.telia.com [81.235.156.87]) by smtp4-1-sn2.hy.skanova.net (Postfix) with ESMTP id 8C89437E48; Sun, 27 Jan 2008 19:44:33 +0100 (CET) From: Henrik Gulbrandsen To: "M. Warner Losh" In-Reply-To: <20080126.204038.2076808019.imp@bsdimp.com> References: <200801260034.m0Q0YVVD012819@freefall.freebsd.org> <1201348494.2277.96.camel@Particle> <20080127032955.GI53741@server.vk2pj.dyndns.org> <20080126.204038.2076808019.imp@bsdimp.com> Content-Type: text/plain Date: Sun, 27 Jan 2008 19:44:19 +0100 Message-Id: <1201459459.2277.288.camel@Particle> Mime-Version: 1.0 X-Mailer: Evolution 2.12.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Cc: freebsd-usb@freebsd.org, oliver@freebsd.org Subject: usb/46176: Background information X-BeenThere: freebsd-usb@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD support for USB List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 27 Jan 2008 18:44:36 -0000 As promised, here is some extra background information to help people understand my latest patch attempt for usb/46176, just to make you a little less nervous when you look at it. I will post a link to this message at the usual place ( http://www.gulbra.net/freebsd-usb/ ), so you can find it when it's needed. Now, back to our regular program: On Sat, 2008-01-26 at 20:40 -0700, M. Warner Losh wrote: > And my time this month is very limited, as will next month's because > work is being insane. I appreciate the fact that you still take the time to comment, since I would normally get a long email backlog under similar circumstances. Anyway, whenever you or someone else gets the time to think about this again, it might be a good idea to have a look at the scsi_da.c patch. This email will give some extra motivation for it. My memory is already starting to fade, but according to my handwritten notes, the chain of events is as follows: The umass detach correctly triggers the following sequence: umass_cam_detach_sim(umass_softc *sc) xpt_bus_deregister(cam_sim_path(sc->umass_sim)) cam_sim_free(ss->umass_sim, TRUE) ...and the only serious bug remaining is that a later unmount will try to close the da device via daclose, which will access the SIM that has already been released. Actually, it calls cam_periph_lock(), which will try to lock a mutex located in the SIM struct. Now, as you may recall, the SCSI subsystem has actually been designed to handle disappearing devices, presumably for things like hot swapping, so the xpt_bus_deregister() call has already invoked this call chain: xpt_bus_deregister(cam_sim_path(sc->umass_sim)) xpt_async(AC_LOST_DEVICE, &bus_path, NULL) daasync(...) cam_periph_async(...) cam_periph_invalidate(...) daoninvalidate(...) disk_gone(softc->disk) [in geom_disk.c] xpt_print(periph->path, "lost device\n") The xpt_print(...) gives us some feedback to the /var/log/messages file, so we know where we are in the process. Under normal circumstances, when there is no mounted file system to interfere with things, the call stack would return to cam_periph_invalidate() and do the following: cam_periph_invalidate(...) camperiphfree(periph) dacleanup(periph) [called as periph->periph_dtor(periph) ] xpt_print(periph->path, "removing device entry\n"); disk_destroy(softc->disk) [in geom_disk.c] ...shimmering GEOM magic... At this point, the disk is gone, and the periph struct will be released. Unfortunately, in our case the file system still holds a reference to the periph, which keeps it from being freed in cam_periph_invalidate(). Instead, we end up with an existing device that holds an invalid pointer to the already released SIM struct, ultimately leading to system panic. My patch for this problem simply adds an extra explicit daclose() call, immediately after the "lost device" printing listed above, and sets the periph pointer in our disk struct to NULL. Some of you may be worried about that disk_destroy() call. At least, it suddenly made me worried, so I just double-checked it. It's true that having a destroyed disk is potentially dangerous, since the file system will normally access this during the unmount. However, it turns out that GEOM is already prepared for this case and has a special check in g_disk_access(), so the second call to daclose() will never actually be made. The alternative option would have been for GEOM to recognize the extra disk reference and keep the struct alive until daclose(). In that case, the first statement in that function would already be checking our periph and return ENXIO when it's NULL, so we would still be safe. >From a system design perspective, it would be cleaner if the daclose() call originated from the same layer as the original daopen(), but under the circumstances I don't think this is too bad. It's hard to avoid... Finally, take the above description with a grain of salt, since I had to reconstruct some parts of it that were not found in my notes. Also, when I tested the disk_destroy() stuff a few minutes ago, I realized that the msdosfs_vfsops.c patch is as vital as the scsi_da.c one, which probably means that the -o sync option is equally vital, since the msdosfs fix is intended to be restricted to that. It looks like GEOM has another ENXIO problem in this area, but I'm not planning to investigate that today... /Henrik