From owner-svn-src-head@FreeBSD.ORG Fri Dec 7 23:48:55 2012 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0C5B7953; Fri, 7 Dec 2012 23:48:55 +0000 (UTC) (envelope-from ken@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0]) by mx1.freebsd.org (Postfix) with ESMTP id DAF798FC12; Fri, 7 Dec 2012 23:48:54 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.5/8.14.5) with ESMTP id qB7Nms2x013176; Fri, 7 Dec 2012 23:48:54 GMT (envelope-from ken@svn.freebsd.org) Received: (from ken@localhost) by svn.freebsd.org (8.14.5/8.14.5/Submit) id qB7NmsBL013174; Fri, 7 Dec 2012 23:48:54 GMT (envelope-from ken@svn.freebsd.org) Message-Id: <201212072348.qB7NmsBL013174@svn.freebsd.org> From: "Kenneth D. Merry" Date: Fri, 7 Dec 2012 23:48:54 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r244001 - head/sys/cam X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2012 23:48:55 -0000 Author: ken Date: Fri Dec 7 23:48:54 2012 New Revision: 244001 URL: http://svnweb.freebsd.org/changeset/base/244001 Log: Fix a panic during CAM EDT traversal. The problem was a race condition between the EDT traversal used by things like 'camcontrol devlist', and CAM peripheral driver removal. The EDT traversal code holds the CAM topology lock, and wants to show devices that have been invalidated. It acquires a reference to the peripheral to make sure the peripheral it is examining doesn't go away. However, because the peripheral removal code in camperiphfree() drops the CAM topology lock to call the peripheral's destructor routine, we can run into a situation where the EDT traversal increments the peripheral reference count after free process is already in progress. At that point, the reference count is ignored, because it was 0 when we started the process. Fix this race by setting a flag, CAM_PERIPH_FREE, that I previously added and checked in xptperiphtraverse() and xptpdperiphtravsere(), but failed to use. If the EDT traversal code sees that flag, it will know that the peripheral free process has already started, and that it should not access that peripheral. Also, fix an inconsistency in the locking between xptpdperiphtraverse() and xptperiphtraverse(). They now both hold the CAM topology lock while calling the peripheral traversal function. cam_xpt.c: Change xptperiphtraverse() to hold the CAM topology lock across calls to the traversal function. Take out the comment in xptpdperiphtraverse() that referenced the locking inconsistency. cam_periph.c: Set the CAM_PERIPH_FREE flag when we are in the process of freeing a peripheral driver. Sponsored by: Spectra Logic Corporation MFC after: 1 week Modified: head/sys/cam/cam_periph.c head/sys/cam/cam_xpt.c Modified: head/sys/cam/cam_periph.c ============================================================================== --- head/sys/cam/cam_periph.c Fri Dec 7 23:18:30 2012 (r244000) +++ head/sys/cam/cam_periph.c Fri Dec 7 23:48:54 2012 (r244001) @@ -615,6 +615,14 @@ camperiphfree(struct cam_periph *periph) } /* + * We need to set this flag before dropping the topology lock, to + * let anyone who is traversing the list that this peripheral is + * about to be freed, and there will be no more reference count + * checks. + */ + periph->flags |= CAM_PERIPH_FREE; + + /* * The peripheral destructor semantics dictate calling with only the * SIM mutex held. Since it might sleep, it should not be called * with the topology lock held. Modified: head/sys/cam/cam_xpt.c ============================================================================== --- head/sys/cam/cam_xpt.c Fri Dec 7 23:18:30 2012 (r244000) +++ head/sys/cam/cam_xpt.c Fri Dec 7 23:48:54 2012 (r244001) @@ -2178,8 +2178,8 @@ xptperiphtraverse(struct cam_ed *device, * invalidated, but not peripherals that are scheduled to * be freed. So instead of calling cam_periph_acquire(), * which will fail if the periph has been invalidated, we - * just check for the free flag here. If it is free, we - * skip to the next periph. + * just check for the free flag here. If it is in the + * process of being freed, we skip to the next periph. */ if (periph->flags & CAM_PERIPH_FREE) { next_periph = SLIST_NEXT(periph, periph_links); @@ -2192,16 +2192,9 @@ xptperiphtraverse(struct cam_ed *device, */ periph->refcount++; - xpt_unlock_buses(); - retval = tr_func(periph, arg); /* - * We need the lock for list traversal. - */ - xpt_lock_buses(); - - /* * Grab the next peripheral before we release this one, so * our next pointer is still valid. */ @@ -2283,11 +2276,6 @@ xptpdperiphtraverse(struct periph_driver */ periph->refcount++; - /* - * XXX KDM we have the toplogy lock here, but in - * xptperiphtraverse(), we drop it before calling the - * traversal function. Which is correct? - */ retval = tr_func(periph, arg); /*