Date:      Sun, 17 Nov 2013 14:30:51 +0000
From:      Frank Leonhardt <frank2@fjl.co.uk>
To:        freebsd-questions@freebsd.org
Subject:   Re: rare, random issue with read(), mmap() failing to read entire file
Message-ID:  <5288D31B.2040105@fjl.co.uk>
In-Reply-To: <9CB46A22C0BE40029652144B2586462A@d40>
References:  <9CB46A22C0BE40029652144B2586462A@d40>

On 16/11/2013 02:56, John Refling wrote:
>   
>
> I'm having some very insidious issues with copying and verifying (identical)
> data from several hard disks.  This might be a hardware issue or something
> very deep in the disk / filesystem code.  I have verified this with several
> disks and motherboards.  It corrupts 0.0096% of my files, different files
> each time!
>
>   
>
> Background:
>
>   
>
> 1.  I have a 500 GB USB hard disk (the new 4,096 [4k] sector size) which I
> have been using to store a master archive of over 70,000 files.
>
>   
>
> 2.  To make a backup of the USB disk, I copied everything over to a 500 GB
> SATA hard disk.  [Various combinations of `cp -r', `scp -r', `tar -cf - . |
> rsh ... tar -xf -', etc.]
>
>   
>
> 3.  To verify that the copy was correct, I did sha256 sums of all files on
> both disks.
>
>   
>
> 4.  When comparing the sha256 sums on both drives, I discovered that 6 or so
> files did not compare OK from one drive to the other.
>
>   
>
> 5.  When I checked the files individually, the files compared OK, and even
> when I recomputed their individual sha256 sums, I got DIFFERENT sha256 sums
> which were correct this time!
>
>   
>
> The above lead me to investigate further, and using ONLY the USB disk, I
> recomputed the sha256 sums for all files ON THAT DISK.  A small number
> (6-12) of files ON THE SAME DISK had different sha256 sums than previously
> computed!  The disk is read-only so nothing could have changed.
>
>   
>
> To try to get to the bottom of this, I took the sha256 code and put it in my
> own file reading routine, which reads-in data from the file using read().
> On summing up the total bytes read in the read() loop, I discovered that on
> the files that failed to compare, the read() returned EOF before the actual
> EOF. According to the manual page this is impossible.  I compared the total
> number of bytes read by the read() loop to the stat() file length value, and
> they were different!  Obviously, the sha256 sum will be different since not
> all the file is read.
>
>   
>
> This happens consistently on 6 to 12 files out of 70,000+ *every* time, and
> on DIFFERENT files *every* time.  So things work 99.9904% of the time.
>
>   
>
> But something fails 0.0096% (one hundredth of one percent) of the time,
> which with a large number of files is significant!
>
>   
>
> Instead of read(), I tried mmap()ing chunks of the file.  Using mmap() to
> access the data in the file instead of read() resulted in a (different)
> sha256 sum than the read() version!  The mmap() version was correct, except
> in ONE case where BOTH versions were WRONG, when compared to a 3rd and 4th
> run!
>
>   
>
> Using `diff -rq disk1 disk2` resulted in similar issues.  There were always
> a few files that failed to compare.  Doing another `diff -rq disk1 disk2`
> resulted in a few *other* files that failed to compare, while the ones that
> didn't compare OK the first time, DID compare OK the second time.  This
> happened to 6-12 files out of 70,000+.
>
>   
>
> Whatever is affecting my use of read() in my sha256 routine seems to also
> affect system utilities such as diff!
>
>   
>
> This gets really insidious because I don't know if the original `cp -r disk1
> disk2` did these short reads on a few files while copying the files, thus
> corrupting my archive backup (on 6-12 files)!
>
>   
>
> Some of the files that fail are small (10KB) and some are huge (8GB).
>
>   
>
> HELP!
>
>   
>
> It takes 7 hours to recompute the sha256 sums of the files on the disk so
> random experiments are time consuming, but I'm willing to try things that
> are suggested.
>
>   
>
> System details:
>
>   
>
> This is observed with the following disks:
>
>   
>
> Western Digital 500GB SATA 512 byte sectors
>
> Hitachi 500GB SATA 512 byte sectors
>
> Iomega RPHD-UG3 500GB USB 4096 byte sectors
>
>   
>
> in combination with these motherboards:
>
>   
>
> P4M800Pro-M V2.0: Pentium D 2.66 GHz, 2GB memory
>
> HP/Compaq Evo: Pentium 4, 2.8 GHz, 2GB memory
>
>   
>
> OP System version:
>
> Freebsd: 9.1 RELEASE #0
>
>   
>
> no hardware errors noted in /var/log/messages during the file reading
>
>   
>
> did Spinrite on disks to freshen (re-read/write) all sectors, with no
> errors.
>
>   
>
> The file systems were built using:
>
>   
>
> dd if=/dev/zero of=/dev/xxx bs=2m
>
> newfs -m0 /dev/xxx
>
>   
>
> Looked through the mailing lists and bug reports but can't see anything
> similar.
>
>   
>
> Thanks for your help,
>
>   
>
> John Refling
>
>

First off, my commiserations. These hard-to-trace faults are no fun. The 
first time I had this problem was on an early 68000 machine, connected to 
ST506 drives using an Adaptec 4000 host adapter (in 1985), and this is 
when I realised data error checking wasn't the complete chain I'd hoped 
it was. It culminated in blowing the mains at the HQ of the company who 
made it, while I reproduced the problem for them, but that's another story.

So, a few thoughts, most of which have probably occurred to you anyway 
but it's always the obvious that gets overlooked:

1) If there was a fundamental problem with read() and UFS (or even the 
device driver interface) then it would have shown up long before now. I 
doubt it's a kernel software issue.

2) My instinct is to distrust the USB drive interface. It's taken years 
to develop this particular prejudice and I wouldn't dismiss it lightly. 
Can you remove it from the equation (i.e. disconnect the USB adapter and 
connect the drive directly using SATA)?

3) IDE drives are "smart", which means when they get data corruption 
they try really, really hard to hide it and show a perfect volume to 
Windoze. Imagine the scenario where the IDE drive finds a problem and 
"corrects" it, possibly with a warning (i.e. an obviously dodgy retry 
count or time elapsed). The USB bus adapter adds its CRC to the data and 
sends it to the host but doesn't pass on any kind of warning. This can 
mess you up even with USB out of the picture; it's just another 
opportunity to sanitise bad data.

4) Normally I'd distrust DMA controllers in combination with dodgy RAM. 
You've used two motherboards so this should be ruled out (unless you 
used the same RAM in both???).

5) A dump of the drives' diagnostics using smartctl (sysutils/smartmontools 
in ports) may help confirm some of the above.

6) Reading an entire drive can often give you heat-related problems, 
especially if you're doing it file by file (there are more head 
movements that way). Check the drive isn't getting over-cooked (use 
smartctl to read its internal temperature), and if you suspect the drive 
may be duff (I do), then handle it really carefully - i.e. stop flogging 
it and start imaging it, slowly.

7) If a sector was breaking down, reading it could "damage" it further.  
The drive may discover it's dodgy, think it's recovered the data but get 
it wrong, map it to another sector and write a corrupt version to it. The 
fact it's done this may not get back to the OS through the layers.

8) Imagine a scenario where data is read correctly from the drive to a 
buffer on the USB bus adapter; the CRC checks out. This buffer is then 
corrupted (bad RAM, poor power &c). The data will then be forwarded on 
the USB with a new CRC calculated based on the bad data and will check out 
with the host.
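
To make the scenario in 8) concrete, here is a rough sketch in C. It is 
purely illustrative: the frame layout and function names are made up, and 
CRC-32 is only a stand-in (USB data packets actually carry a CRC16). The 
point is just that a checksum recomputed after the damage protects the 
damaged data perfectly well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); a stand-in checksum. */
static uint32_t crc32_buf(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical 16-byte frame as it travels over one hop. */
struct frame {
    unsigned char data[16];
    uint32_t crc;
};

/* Sender side of a hop: checksum whatever is in the buffer right now. */
static void frame_seal(struct frame *f, const unsigned char *src)
{
    memcpy(f->data, src, sizeof f->data);
    f->crc = crc32_buf(f->data, sizeof f->data);
}

/* Receiver side of a hop: does the payload match its checksum? */
static int frame_ok(const struct frame *f)
{
    return crc32_buf(f->data, sizeof f->data) == f->crc;
}

/*
 * Two hops: drive -> bridge (checked), one bit flips in the bridge's
 * buffer, then bridge -> host with a freshly computed checksum.
 * Returns 1 if the host accepts the frame; *intact says whether the
 * data the host received still equals the original.
 */
static int relay_with_buffer_fault(const unsigned char original[16], int *intact)
{
    struct frame from_drive, to_host;
    unsigned char bridge_buf[16];

    assert(intact != NULL);
    frame_seal(&from_drive, original);
    if (!frame_ok(&from_drive))
        return 0;                     /* first hop is clean, so not taken */
    memcpy(bridge_buf, from_drive.data, sizeof bridge_buf);
    bridge_buf[5] ^= 0x04;            /* bad RAM, poor power &c */
    frame_seal(&to_host, bridge_buf); /* new CRC over the bad data */
    *intact = memcmp(to_host.data, original, sizeof to_host.data) == 0;
    return frame_ok(&to_host);
}
```

Call relay_with_buffer_fault() with any 16-byte buffer: the host-side check 
passes while *intact comes back 0. Only an end-to-end check, such as your 
sha256 sums, can catch this.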

If you were around London I'd suggest bringing it around, as I have good 
snooping kit, but I suspect you're in California.
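
Failing that, it may be worth logging exactly how each short read ends. 
Below is a minimal sketch of the kind of read() loop you describe, 
assuming a 64KB buffer and a helper name (check_read_length) I've made up; 
it isn't taken from your code. It retries on EINTR, records errno if 
read() fails, and compares the byte count with the st_size from fstat().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a file to EOF with read() and compare the
 * number of bytes that arrived with the size fstat() reports.
 * Returns 0 if they match, 1 on a short (or long) count, -1 on error.
 */
static int check_read_length(const char *path, off_t *expected, off_t *got)
{
    char buf[65536];
    struct stat st;
    off_t total = 0;
    ssize_t n;
    int fd;

    assert(expected != NULL && got != NULL);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    for (;;) {
        n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;               /* interrupted, not a real failure */
        break;                      /* n == 0 (EOF) or a real error */
    }
    if (n == -1) {
        int saved = errno;
        fprintf(stderr, "%s: read failed at offset %jd: %s\n",
            path, (intmax_t)total, strerror(saved));
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);
    *expected = st.st_size;
    *got = total;
    if (total != st.st_size) {
        fprintf(stderr, "%s: EOF after %jd bytes, fstat says %jd\n",
            path, (intmax_t)total, (intmax_t)st.st_size);
        return 1;
    }
    return 0;
}
```

Running this over the whole tree and keeping the stderr output would show 
whether the early EOFs come with an errno and whether they cluster at 
particular offsets.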

Good luck! Frank.

P.S. Spinrite was great in the days of MFM and RLL, band steppers and 
the ST506 interface - I don't see the point since IDE.









