From: John Baldwin
To: Alexey Shuvaev
Cc: current@freebsd.org
Date: Thu, 16 Apr 2009 15:58:56 -0400
Subject: Re: [PATCH] Possible fix to recent data corruption on HEAD since USB2
Message-Id: <200904161558.56919.jhb@freebsd.org>
In-Reply-To: <20090416184738.GA60409@wep4035.physik.uni-wuerzburg.de>
References: <200904161336.18557.jhb@freebsd.org> <20090416184738.GA60409@wep4035.physik.uni-wuerzburg.de>
List-Id: Discussions about the use of FreeBSD-current

On Thursday 16 April 2009 2:47:38 pm Alexey Shuvaev wrote:
> On Thu, Apr 16, 2009 at 01:36:18PM -0400, John Baldwin wrote:
> > Due to some good sleuthing by avg@, there is a patch that might fix the
> > recent reports of data corruption on current.  It would explain some of
> > the recent reports where a file that was read would have missing gaps of
> > bytes.  The problem is with the BUS_DMA_KEEP_PG_OFFSET changes to
> > bus_dma.  When a bounce page was used by USB2, the changes to bus_dma
> > would actually change the starting virtual and physical addresses of the
> > bounce page.  When the bounce page was no longer needed it was left in
> > this bogus state.  Later if another device used the same bounce page for
> > DMA it would use the wrong offset and address.  The issue there is if the
> > second device was doing a full page of I/O.  In that case the DMA from
> > the device would actually spill over into the next page which could in
> > theory be used by another DMA request.  It could also break alignment
> > assumptions (since the previous PG_OFFSET may not be aligned and the
> > bus_dma code assumes bounce pages for the !PG_OFFSET case are page
> > aligned).  The quick fix is to always restore the bounce page to the
> > normal state when a PG_OFFSET DMA request is finished.  I'd actually
> > prefer not ever touching the page's starting addresses, but those changes
> > would be more invasive I believe.
> >
> > http://www.FreeBSD.org/~jhb/patches/dma_sg.patch
> >
> Am I right that hardware prerequisite in order to observe these problems
> is amd64 + 4GB or more of RAM?

Well, i386 with PAE would do it as well.  Basically, you need USB plus one
other device that uses bounce pages, and the other device ends up with the
corruption.

> Is it possible to fabricate some (artificial) test case to stress this
> particular situation (interleaved use of bounce pages by USB and some other
> device (?HDD?))?

I haven't constructed one, though it might be possible to do so.

> Asking because as I understand the data corruption is silent
> and affected consumer (of bounce pages) should have some mechanism
> of detecting this (e.g. zfs' CRCs).
> In my case stress testing unpatched system till UFS filesystems are dead
> is no fun...

Understood.  I know some other folks are going to test this, and if there is
early success that may make the risk easier to take.

-- 
John Baldwin
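
For readers who want to see the failure mode described above in isolation,
here is a minimal user-space sketch in C.  It is not the kernel code and it is
not the contents of dma_sg.patch; the struct, the function names and the
BP_PAGE_SIZE constant are all hypothetical stand-ins chosen for illustration.
It only models the idea from the message: a PG_OFFSET request shifts the
bounce page's starting addresses, a release that does not undo the shift
leaves the page in a bogus state, and a later full-page transfer would then
spill past the end of the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch; the real value is platform-defined. */
#define BP_PAGE_SIZE 4096u
#define BP_PAGE_MASK (BP_PAGE_SIZE - 1u)

/* Hypothetical stand-in for a bounce page; not the kernel's structure. */
struct bounce_page {
	uintptr_t vaddr;	/* starting virtual address */
	uintptr_t busaddr;	/* starting bus (physical) address */
	int	  in_use;
};

/*
 * Hand the page to a DMA request.  With keep_offset set, the starting
 * addresses are shifted by the client's offset within its own page, which
 * models the PG_OFFSET behaviour described in the message.
 */
static void
bounce_acquire(struct bounce_page *bp, uintptr_t client_addr, int keep_offset)
{
	assert(!bp->in_use);
	bp->in_use = 1;
	if (keep_offset) {
		bp->vaddr |= client_addr & BP_PAGE_MASK;
		bp->busaddr |= client_addr & BP_PAGE_MASK;
	}
}

/* Models the problem: the shifted addresses are left in place. */
static void
bounce_release_buggy(struct bounce_page *bp)
{
	bp->in_use = 0;
}

/* Models the quick fix: restore the page-aligned addresses on release. */
static void
bounce_release_fixed(struct bounce_page *bp, int keep_offset)
{
	if (keep_offset) {
		bp->vaddr &= ~(uintptr_t)BP_PAGE_MASK;
		bp->busaddr &= ~(uintptr_t)BP_PAGE_MASK;
	}
	bp->in_use = 0;
}

/*
 * Number of bytes by which a full-page transfer starting at the page's
 * current bus address would run past the end of the page.
 */
static size_t
full_page_overrun(const struct bounce_page *bp)
{
	return ((size_t)(bp->busaddr & BP_PAGE_MASK));
}
```

In this model, a page that went through bounce_acquire() with keep_offset set
and then bounce_release_buggy() reports a non-zero full_page_overrun(), which
is the spill-over into the next page; the same sequence ending in
bounce_release_fixed() reports zero.  A real test would still need actual
hardware doing interleaved DMA, which this sketch does not attempt.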