Date:      Wed, 13 Feb 2008 10:45:36 +0100
From:      junics-fbsdstable@atlantis.maniacs.se
To:        Joe Peterson <joe@skyrush.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Analysis of disk file block with ZFS checksum error
Message-ID:  <47B2BC40.90404@atlantis.maniacs.se>
In-Reply-To: <47B0A45C.4090909@skyrush.com>
References:  <47ACD7D4.5050905@skyrush.com> <D6B0BBFB-D6DB-4DE1-9094-8EA69710A10C@apple.com> <47ACDE82.1050100@skyrush.com> <20080208173517.rdtobnxqg4g004c4@www.wolves.k12.mo.us> <47ACF0AE.3040802@skyrush.com> <1202747953.27277.7.camel@buffy.york.ac.uk> <47B0A45C.4090909@skyrush.com>

Joe Peterson wrote:

*cut*
> I suppose the best ZFS could then do is retry the write (if its
> failure was even detected - still not sure if ZFS does a re-check of the
> disk data checksum after the disk write), not knowing until the later
> scrub that the block had corrupted a file.
>   
*cut*

Disclaimer: I have only experimented with ZFS in a VM and read much of
the documentation, but never used it "properly". Please correct me if I
am wrong.

1) If ZFS were able to verify written data directly after a write, it
would probably be an optional feature. I don't recall such an option
when I experimented, nor can I find it in the online man pages... (DOS
actually had something like this: VERIFY ON)
2) It would cause a lot of head seeking and kill performance, unless
the re-reads were queued into an elevator-style batch job that runs
when the disks are idle. (Wikipedia: Elevator_algorithm)
3) It would need to disable all disk read caching to really verify
what was written to the surface; otherwise the read-back could be
served from the drive's cache rather than the platter. That is
probably a complex problem considering all the different types of
hardware out there, and it would make keeping ZFS portable harder.
4) ZFS is designed to be run in a redundant configuration, so once it
reads the bad block on request or during a scrub, it can rewrite the
bad block from the redundant data. (See the details on self healing in
the ZFS docs.)
4.1) If your ZFS is up to date, you could probably set the copies=2
property on the filesystem and do a "poor man's RAID 1", if it is a
hardware problem that is (see the sketch after this list). _All_
metadata is already written at least twice, even in a single-disk
configuration. I think ZFS tries to keep the copies about 1/8 of the
total space apart.
4.2) Overwriting bad blocks plays nice with the disk's internal sector
relocation: the drive remaps the sector on the rewrite, so pending
sectors in smartctl -a become a thing of the past :)
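
For reference, setting it up would look something like this (a rough
sketch; "tank" and "tank/home" are made-up pool/dataset names, adjust
to your own setup):

  # Keep two copies of every data block in the dataset
  # (only affects data written after the property is set):
  zfs set copies=2 tank/home
  zfs get copies tank/home

  # Kick off a scrub so ZFS walks all blocks, verifies the checksums
  # and repairs what it can from the redundant copy:
  zpool scrub tank
  zpool status -v tank    # scrub progress and any errors found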

I actually have two bad disks that I will probably try this on once
7.0 is released. They are heat damaged, so bad sectors are popping up
semi-frequently.
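
If you want to watch the relocation happen, something like this should
show it (again just a sketch; the device and pool names are examples):

  # Pending/reallocated sector counters, before and after:
  smartctl -a /dev/ad4 | grep -E 'Current_Pending_Sector|Reallocated_Sector'

  # Force ZFS to hit the bad spots and rewrite them:
  zpool scrub tank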



