Date: Thu, 16 Apr 2009 20:47:38 +0200
From: Alexey Shuvaev
To: John Baldwin
Message-ID: <20090416184738.GA60409@wep4035.physik.uni-wuerzburg.de>
In-Reply-To: <200904161336.18557.jhb@freebsd.org>
Cc: current@FreeBSD.org
Subject: Re: [PATCH] Possible fix to recent data corruption on HEAD since USB2

On Thu, Apr 16, 2009 at 01:36:18PM -0400, John Baldwin wrote:
> Due to some good sleuthing by avg@, there is a patch that might fix the recent
> reports of data corruption on current. It would explain some of the recent
> reports where a file that was read would have missing gaps of bytes. The
> problem is with the BUS_DMA_KEEP_PG_OFFSET changes to bus_dma. When a bounce
> page was used by USB2, the changes to bus_dma would actually change the
> starting virtual and physical addresses of the bounce page. When the bounce
> page was no longer needed it was left in this bogus state. Later if another
> device used the same bounce page for DMA it would use the wrong offset and
> address. The issue there is if the second device was doing a full page of
> I/O. In that case the DMA from the device would actually spill over into the
> next page which could in theory be used by another DMA request. It could
> also break alignment assumptions (since the previous PG_OFFSET may not be
> aligned and the bus_dma code assumes bounce pages for the !PG_OFFSET case are
> page aligned). The quick fix is to always restore the bounce page to the
> normal state when a PG_OFFSET DMA request is finished. I'd actually prefer
> not ever touching the page's starting addresses, but those changes would be
> more invasive I believe.
>
> http://www.FreeBSD.org/~jhb/patches/dma_sg.patch
>
Am I right that the hardware prerequisite for observing these problems is
amd64 with 4 GB or more of RAM?
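To make the failure mode in the quoted description concrete, here is a toy user-space model of it. This is only a sketch of the behaviour as described above, not the bus_dma code: the names BouncePage, acquire, release, spill and scenario, and all addresses, are made up for illustration.

```python
PAGE_SIZE = 4096
PAGE_MASK = PAGE_SIZE - 1


class BouncePage:
    """Toy bounce page: a fixed page-aligned base plus a mutable start address."""

    def __init__(self, base):
        assert base & PAGE_MASK == 0, "base must be page aligned"
        self.base = base    # never changes
        self.start = base   # address handed to the device for DMA


def acquire(page, client_addr, length, keep_pg_offset):
    """Hand the page to a DMA request and return the (start, end) range used."""
    if keep_pg_offset:
        # Assumed model of the KEEP_PG_OFFSET behaviour: the page itself is
        # modified so that it carries the client buffer's page offset.
        page.start |= client_addr & PAGE_MASK
    return page.start, page.start + length


def release(page, restore):
    """Return the page to the pool; `restore` models the quick fix."""
    if restore:
        page.start = page.base


def spill(page, dma_range):
    """Number of bytes of dma_range that fall past the end of the page."""
    return max(0, dma_range[1] - (page.base + PAGE_SIZE))


def scenario(restore):
    page = BouncePage(0x10000)
    # First user (think USB2): small transfer, page offset 0x200 preserved.
    acquire(page, 0x7F000200, 512, keep_pg_offset=True)
    release(page, restore)
    # Second user (think disk): a full page of I/O, expects an aligned page.
    disk_range = acquire(page, 0x80000000, PAGE_SIZE, keep_pg_offset=False)
    return spill(page, disk_range)


if __name__ == "__main__":
    print("bytes spilled without restore:", scenario(restore=False))
    print("bytes spilled with restore:   ", scenario(restore=True))
```

Without the restore step the second request inherits the stale offset and its full-page transfer runs past the end of the page by exactly that offset; with the restore step it stays inside the page.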
Is it possible to construct an (artificial) test case that stresses this
particular situation, i.e. interleaved use of bounce pages by USB and some
other device (an HDD, perhaps)?
I ask because, as I understand it, the data corruption is silent, so the
affected consumer of bounce pages needs some mechanism for detecting it
(e.g. ZFS checksums).
In my case, stress testing an unpatched system until the UFS filesystems are
dead is no fun...

Alexey.
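For a filesystem without its own checksums, one possible detector is a file with known, reproducible content that is written once and re-verified under load. The following is a minimal sketch under stated assumptions: the function names and the block size are made up, and nothing here forces the I/O through bounce pages. To exercise the disk rather than the buffer cache, the file would presumably need to be larger than RAM, or the filesystem remounted between writing and verifying.

```python
import hashlib
import os

BLOCK_SIZE = 64 * 1024  # arbitrary choice for this sketch


def block_data(index, size=BLOCK_SIZE):
    """Deterministic content for block `index`, derived from SHA-256."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{index}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


def write_pattern(path, blocks):
    """Write `blocks` pattern blocks to `path` and push them to stable storage."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(block_data(i))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, blocks):
    """Return the indices of blocks whose content differs from the pattern."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(BLOCK_SIZE) != block_data(i):
                bad.append(i)
    return bad


if __name__ == "__main__":
    path = "pattern.bin"  # hypothetical file on the filesystem under test
    write_pattern(path, blocks=1024)
    # ... generate USB traffic here, then:
    print("corrupted blocks:", verify_pattern(path, blocks=1024))
```

Running verify_pattern in a loop while a USB device is busy would at least turn silent corruption of reads into a list of bad block indices, without putting a real filesystem at risk beyond the test file itself.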