Date:      Fri, 17 Apr 2009 14:49:57 -0500
From:      Robert Noland <rnoland@FreeBSD.org>
To:        Damian Gerow <dgerow@afflictions.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: [PATCH] Possible fix to recent data corruption on HEAD since USB2
Message-ID:  <1239997797.24514.6.camel@balrog.2hip.net>
In-Reply-To: <20090417103634.GD1186@plebeian.afflictions.org>
References:  <200904161336.18557.jhb@freebsd.org> <20090416184738.GA60409@wep4035.physik.uni-wuerzburg.de> <200904161558.56919.jhb@freebsd.org> <49E79F49.6000606@samsco.org> <20090417103634.GD1186@plebeian.afflictions.org>


On Fri, 2009-04-17 at 06:36 -0400, Damian Gerow wrote:
> Scott Long wrote:
> : John Baldwin wrote:
> : > On Thursday 16 April 2009 2:47:38 pm Alexey Shuvaev wrote:
> : >> On Thu, Apr 16, 2009 at 01:36:18PM -0400, John Baldwin wrote:
> : >>> Due to some good sleuthing by avg@, there is a patch that might fix
> : >>> the recent reports of data corruption on current.  It would explain
> : >>> some of the recent reports where a file that was read would have
> : >>> missing gaps of bytes.  The problem is with the
> : >>> BUS_DMA_KEEP_PG_OFFSET changes to bus_dma.  When a bounce page was
> : >>> used by USB2, the changes to bus_dma would actually change the
> : >>> starting virtual and physical addresses of the bounce page.  When the
> : >>> bounce page was no longer needed it was left in this bogus state.
> : >>> Later if another device used the same bounce page for DMA it would
> : >>> use the wrong offset and address.  The issue there is if the second
> : >>> device was doing a full page of I/O.  In that case the DMA from the
> : >>> device would actually spill over into the next page which could in
> : >>> theory be used by another DMA request.  It could also break alignment
> : >>> assumptions (since the previous PG_OFFSET may not be aligned and the
> : >>> bus_dma code assumes bounce pages for the !PG_OFFSET case are page
> : >>> aligned).  The quick fix is to always restore the bounce page to the
> : >>> normal state when a PG_OFFSET DMA request is finished.  I'd actually
> : >>> prefer not ever touching the page's starting addresses, but those
> : >>> changes would be more invasive I believe.
> : >>>
> : >>> http://www.FreeBSD.org/~jhb/patches/dma_sg.patch
> : >>>
> : >> Am I right that the hardware prerequisite for observing these problems
> : >> is amd64 + 4GB or more of RAM?
> : >
> : > Well, i386 with PAE would do it as well.  Basically, you need USB + one
> : > other device that uses bounce pages, and the other device ends up with
> : > corruption.
> : >
> : >> Is it possible to fabricate some (artificial) test case to stress this
> : >> particular situation (interleaved use of bounce pages by USB and some
> : >> other device (?HDD?))?
> : >
> : > I haven't constructed one though it might be possible to do so.
> : >
> : >> Asking because, as I understand it, the data corruption is silent
> : >> and the affected consumer (of bounce pages) should have some mechanism
> : >> for detecting this (e.g. zfs' CRCs).
> : >> In my case, stress testing an unpatched system until the UFS filesystems
> : >> are dead is no fun...
> : >
> : > Understood.  I know some other folks are going to test this and if there
> : > is early success that may make the risk easier to take.
> : >
> :
> : I have pretty high confidence that John and Andriy found the problem and
> : fixed it with this patch.  It'll be good to get it tested, but I think
> : that the risk to testers will be pretty low.
>
> Having been running the patch for sixteen hours now, I can safely say that
> it fixes my issues.

I think that I agree... I crashed my amd64 box a few times last night
and haven't had massive damage, which is refreshing... I haven't been
brave enough to panic with more than a USB keyboard attached, though...

robert.
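
For anyone trying to picture the failure John describes in the quoted
text, here is a rough user-space sketch of it. It is not the bus_dma
code and not the patch itself: struct bounce_page, bp_use_with_offset(),
bp_release(), bp_spill() and simulate_spill() are hypothetical names
made up for illustration, and the addresses are arbitrary. It only
models a bounce page whose starting addresses get shifted for a
PG_OFFSET-style request and, depending on a flag, are or are not put
back when the request finishes.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (PAGE_SIZE - 1)

/* Hypothetical, simplified stand-in for a bounce page (not the kernel struct). */
struct bounce_page {
    uintptr_t vaddr;   /* current starting virtual address */
    uintptr_t busaddr; /* current starting physical/bus address */
};

/* Model a PG_OFFSET-style request: shift the page start to match the
 * in-page offset of the client's buffer. */
static void bp_use_with_offset(struct bounce_page *bp, uintptr_t client_addr)
{
    bp->vaddr |= client_addr & PAGE_MASK;
    bp->busaddr |= client_addr & PAGE_MASK;
}

/* Release the page. With restore != 0 the starting addresses are put
 * back to page alignment (the "quick fix" idea); with restore == 0 the
 * page is left in the shifted state. */
static void bp_release(struct bounce_page *bp, int restore)
{
    if (restore) {
        bp->vaddr &= ~(uintptr_t)PAGE_MASK;
        bp->busaddr &= ~(uintptr_t)PAGE_MASK;
    }
}

/* Bytes by which a full-page DMA starting at busaddr would run past
 * the end of the page. */
static unsigned bp_spill(const struct bounce_page *bp)
{
    return (unsigned)(bp->busaddr & PAGE_MASK);
}

/* One offset request followed by a full-page request from a second
 * device reusing the same bounce page. Addresses are arbitrary. */
unsigned simulate_spill(int restore)
{
    struct bounce_page bp = { 0x10000, 0x200000 };

    assert((bp.busaddr & PAGE_MASK) == 0); /* page starts out aligned */
    bp_use_with_offset(&bp, 0x7f001234);   /* first user, offset 0x234 */
    bp_release(&bp, restore);
    return bp_spill(&bp);                  /* second user, full page */
}
```

With these made-up addresses, simulate_spill(0) reports a 0x234-byte
overrun into the following page, while simulate_spill(1) reports none.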

>   - Damian
-- 
Robert Noland <rnoland@FreeBSD.org>
FreeBSD
