From: "Kenneth D. Merry"
To: Dmitry Morozovsky
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Date: Thu, 22 Aug 2013 08:21:07 -0600
Subject: Re: svn commit: r254615 - head/sys/dev/mps
Message-ID: <20130822142107.GA49996@nargothrond.kdm.org>
References: <201308212130.r7LLUvO5008991@svn.freebsd.org>

On Thu, Aug 22, 2013 at 16:42:41 +0400, Dmitry Morozovsky wrote:
> Ken,
>
> On Wed, 21 Aug 2013, Kenneth D. Merry wrote:
>
> > Author: ken
> > Date: Wed Aug 21 21:30:56 2013
> > New Revision: 254615
> > URL: http://svnweb.freebsd.org/changeset/base/254615
> >
> > Log:
> >   Fix mps(4) driver breakage that came in in change 253550 that
> >   manifested itself in out of chain frame conditions.
> >
> >   When the driver ran out of chain frames, the request in question
> >   would get completed early, and go through mpssas_scsiio_complete().
> >
> >   In mpssas_scsiio_complete(), the negation of the CAM status values
> >   (CAM_STATUS_MASK | CAM_SIM_QUEUED) was ORed in instead of being
> >   ANDed in.  This resulted in a bogus CAM CCB status value.  This
> >   didn't show up in the non-error case, because the status was reset
> >   to something valid (e.g. CAM_REQ_CMP) later on in the function.
> >
> >   But in the error case, such as when the driver ran out of chain
> >   frames, the CAM_REQUEUE_REQ status was ORed in to the bogus status
> >   value.  This led to the CAM transport layer repeatedly releasing
> >   the SIM queue, because it though that the CAM_RELEASE_SIMQ flag had
> >   been set.  The symptom was messages like this on the console when
> >   INVARIANTS were enabled:
> >
> >   xpt_release_simq: requested 1 > present 0
> >   xpt_release_simq: requested 1 > present 0
> >   xpt_release_simq: requested 1 > present 0
>
> what is real impact of the bug?

Your system will essentially hang, certainly as far as anything connected
to the controller in question.

> >   mps_sas.c: In mpssas_scsiio_complete(), use &= to take status
> >              bits out.  |= adds them in.
> >
> >              In the error case in mpssas_scsiio_complete(), set
> >              the status to CAM_REQUEUE_REQ, don't OR it in.
> >
> >   MFC after: 3 days
>
> This patch does not apply cleanly as r253550 had not been merged, and the first
> masking does not occur on contemporary stable/9.  Comments?

As far as I know, this is not a problem on the version of the driver in stable/9.
But then again, I have not tested the out of chain frames code since early
2011 when I last fixed it.

If you want to verify the behavior is correct in stable/9, do this:

1. enable INVARIANTS

2. In /boot/loader.conf:

   hw.mps.max_chains=32

3. Use up most of your memory.  If you're using ZFS, just do a sequential
   write to a file so that the ARC starts filling up with cached data.
   Look at the free memory in top to see how much you've used.  This will
   cause enough fragmentation to lead to more scatter/gather segments
   getting used in the driver.

4. Do something like this:

   ((i=0)); while [ $i -lt 60 ]; do dd if=/dev/da0 of=/dev/null bs=1m & ((i++)); done

5. Look for an out of chain frames message on the console.

To see how far you are towards using the chain frames, run
'sysctl dev.mps'.  You can see how many chain frames you have free, and
how many requests have failed.

This change just needs to be merged along with the other changes to avoid
having the regression in stable.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
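
For readers following the thread, the masking bug described in the commit
log can be sketched in a few lines of C.  This is only an illustration and
not the driver code: the EX_* constants and the example_* functions are
hypothetical stand-ins, and the bit values are assumptions chosen for the
example rather than values taken from the CAM headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CAM status bits; values are assumed. */
#define EX_STATUS_MASK  0x03Fu  /* low bits: the status code proper */
#define EX_RELEASE_SIMQ 0x100u  /* flag: release the SIM queue */
#define EX_SIM_QUEUED   0x200u  /* flag: request is queued in the SIM */
#define EX_REQUEUE_REQ  0x013u  /* status code: requeue the request */

/* Buggy pattern: |= with the negated mask sets every bit outside it. */
uint32_t
example_buggy_complete(uint32_t status)
{
	status |= ~(EX_STATUS_MASK | EX_SIM_QUEUED);
	status |= EX_REQUEUE_REQ;	/* error path ORs into the bogus value */
	return (status);
}

/* Fixed pattern: &= clears the masked bits, then the status is assigned. */
uint32_t
example_fixed_complete(uint32_t status)
{
	status &= ~(EX_STATUS_MASK | EX_SIM_QUEUED);
	status = EX_REQUEUE_REQ;	/* set it, don't OR it in */
	return (status);
}

/* Self-check: only the buggy variant ends up with the release flag set. */
void
example_self_check(void)
{
	uint32_t start = EX_SIM_QUEUED;	/* queued request, no status yet */

	assert((example_buggy_complete(start) & EX_RELEASE_SIMQ) != 0);
	assert((example_fixed_complete(start) & EX_RELEASE_SIMQ) == 0);
}
```

Starting from a status with only the queued flag set, the buggy variant
returns a value with the release-queue bit set, which matches the symptom
in the log of the transport layer believing the flag had been requested.
The fixed variant returns just the requeue status.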