Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 28 Jun 2005 22:58:04 +1000
From:      Stephen McKay <smckay@internode.on.net>
To:        Peter Jeremy <PeterJeremy@optushome.com.au>
Cc:        stable@freebsd.org, Stephen McKay <smckay@internode.on.net>
Subject:   Re: Data corruption in cd9660 on FreeBSD 4.11? 
Message-ID:  <200506281258.j5SCw46l010235@dungeon.home>
In-Reply-To: <20050624213433.GA50157@cirb503493.alcatel.com.au> from Peter Jeremy at "Sat, 25 Jun 2005 07:34:33 %2B1000"
References:  <200506241231.j5OCV6jp047730@dungeon.home> <20050624213433.GA50157@cirb503493.alcatel.com.au>

next in thread | previous in thread | raw e-mail | index | archive | help
I haven't finished all the suggested tests, but since I'm taking so long
to do so, I thought I should send what I have so far.

On Saturday, 25th June 2005, Peter Jeremy wrote:

>On Fri, 2005-Jun-24 22:31:06 +1000, Stephen McKay wrote:
>>I'm experiencing data corruption when reading CDs and DVDs on FreeBSD 4.11.
>...
>>So, can anyone suggest any more tests I could try?  Or is there a kind of
>>hardware fault that could cause this substitution of whole blocks read from
>>CDs without causing any other problems?
>
>You might like to post the relevant sections of a verbose boot - the
>ATA and CD probes.

I've appended it to this messages, so that the flow is not ruined.

Note that I am not currently using ATAPI-CAM for my tests.  I am using
/dev/acd0a and /dev/acd1a to mount the CDs in the DVD-ROM and DVD-R
respectively.  Also the "non-ATA66 cable" thing is true; it is a plain
ATA33 cable.

>Are you running the CD/DVD drives in PIO or UDMA modes?

I normally run both DVD drives at UDMA33.  My test runs normally fail
every 2nd or 3rd run.  I've seen it do 5 OK runs in a row once though,
so I don't yet have a very good test.

I tested with PIO4 and ran 12 consecutive test runs without error.  It was
a little slower at 150 seconds per run instead of the normal 135, possibly
because 75% to 80% of the cpu was dedicated to interrupt handling (doing
pio, I assume).

It seems that either DMA or ATA interrupts (or maybe both) are required
to cause the problem.

Also, I tried some tests with the noclusterr mount option on the CD.  The
test ran much slower (approx 232 seconds instead of 135) but I also saw
no failures (with only 6 test runs though as I was pressed for time).

The noclusterr option is interesting because it defeats read clustering
resulting in the ATA driver issuing only 2K reads instead of up to 64K
at a time.  I assume that the 64K reads would require scatter-gather DMA,
so maybe this is relevant to the problem.  Oddly, I noticed that a fixed
value of 65534 is found in atapi-all.c as a request size limit.  No, not
65536 = 2^16, but 2 bytes less.  Puzzling.

>Have you tried anything other than ISO9660 filesystems on a physical CD?

I have not tried anything but cd9660 file systems on CDs and DVDs.  I will
see if I can build a UFS file system to test with, when I get a chance.

>What happens if you just dd the CD-ROM?

When I dd the CD-ROM it seems to work correctly.  I have done this only
infrequently however, so I may just be lucky to not have had a failure.

I've now done 6 consecutive dd reads of my test CD-ROM in UDMA33 mode with
no errors.  It only takes 125 seconds, so it's a bit faster than comparing
directory trees.  Only 6 tests isn't many, so I'll do more later, this time
with other system activity.

>What happens if you use a vnode
>mount (see vnconfig(8)) of an ISO filesystem sitting in a UFS filesystem?

I'll test this when I get a chance.

>Anything unusual in your kernel config file?

Nothing too unusual.  I'm running a uni-processor kernel with HTT disabled.
I skimmed through my config and this is the only interesting thing: HZ=500
I don't think that's too dangerous.  On the other hand, it does increase
the rate of interrupts, and if there's a race somewhere, it may make it
worse.

>Have you tried building a kernel with WITNESS and/or DIAGNOSTIC?

I'm now running with INVARIANTS, INVARIANT_SUPPORT, and DIAGNOSTIC on
4.11.  No change in the failure rate and no significant slowdown either.

>Any chance of you repeating the tests with a 5.x system?  Maybe
>on a spare small partition or using a 5.4-RELEASE disk1 as a live
>filesystem.

I was experimenting with current in late April, so I installed that drive
for testing.  So far, I have not been able to reproduce the failure on
April's current though I've only had time for a quick run of 6 repetitions.

Current takes the same time (135 seconds, on average) to read and compare
the CD.  That seems good, considering all the debugging is still enabled.
I'm pretty sure that ATA MK III is in this kernel.

Sadly, it panics immediately if I run "atacontrol mode 1" so I'm just
assuming it is running in DMA mode by the speed of it.  (And I have
hw.ata.atapi_dma=1 in /boot/loader.conf).

That's where I'm up to so far in stress testing.  Right now I'm trying to
understand some unusual looking code in ata_dmasetupd_cb() in 4.11's
ata-dma.c.  The attached comment is "A maximum segment size was specified
for bus_dma_tag_create, but some busdma code does not seem to honor this,
so fix up if needed."  The "fix-up" code seems to be gone in current, so
it looks suspicious to me.  When I work out what it does, I'll report back.

Stephen.

------------------------------------------------------------------

Verbose boot of 4.11-p10 (the ata related parts, at least):

atapci0: <Intel ICH5 ATA100 controller> port 0xfc00-0xfc0f,0-0x3,0-0x7,0-0x3,0-0x7 irq 0 at device 31.1 on pci0
ata0: iobase=0x01f0 altiobase=0x03f6 bmaddr=0xfc00
ata0: mask=03 ostat0=50 ostat2=00
ata0-master: ATAPI 00 00
ata0-slave: ATAPI 00 00
ata0: mask=03 stat0=50 stat1=00
ata0-master: ATA 01 a5
ata0: devices=01
ata0: at 0x1f0 irq 14 on atapci0
ata1: iobase=0x0170 altiobase=0x0376 bmaddr=0xfc08
ata1: mask=03 ostat0=50 ostat2=50
ata1-slave: ATAPI 14 eb
ata1-master: ATAPI 14 eb
ata1: mask=03 stat0=00 stat1=00
ata1: devices=0c
ata1: at 0x170 irq 15 on atapci0
pci0: <unknown card> (vendor=0x8086, dev=0x24d3) at 31.3 irq 10
ata-: ata0 exists, using next available unit number
ata-: ata1 exists, using next available unit number
Trying Read_Port at 203
Trying Read_Port at 243
Trying Read_Port at 283
Trying Read_Port at 2c3
Trying Read_Port at 303
Trying Read_Port at 343
Trying Read_Port at 383
Trying Read_Port at 3c3
isa_probe_children: disabling PnP devices
isa_probe_children: probing non-PnP devices
orm0: <Option ROM> at iomem 0xc0000-0xcefff on isa0
pmtimer0 on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ata2 failed to probe at port 0x1f0 irq 14 on isa0
ata3 failed to probe at port 0x170 irq 15 on isa0

[Irrelevant stuff snipped]

ad0: success setting UDMA5 on Intel chip
Creating DISK ad0
ar: FreeBSD check1 failed
ad0: <Maxtor 6Y120L0/YAR41VW0> ATA-7 disk at ata0-master
ad0: 117246MB (240121728 sectors), 238216 C, 16 H, 63 S, 512 B
ad0: 16 secs/int, 1 depth queue, UDMA100
ad0: piomode=4 dmamode=2 udmamode=6 cblid=1
ata1-master: piomode=4 dmamode=2 udmamode=2 dmaflag=1
ata1-master: success setting UDMA2 on Intel chip
acd0: <Pioneer DVD-ROM ATAPIModel DVD-103S 0116/E1.16> DVD-ROM drive at ata1 as master
acd0:  512KB buffer, UDMA33
acd0: Reads: CD-R, CD-RW, CD-DA stream, DVD-ROM, DVD-R
acd0: Writes:
acd0: Audio: play, 256 volume levels
acd0: Mechanism: ejectable tray, unlocked
acd0: Medium: no/blank disc
ata1-slave: piomode=4 dmamode=2 udmamode=4 dmaflag=1
ata1-slave: DMA limited to UDMA33, non-ATA66 cable or device
ata1-slave: success setting UDMA2 on Intel chip
acd1: <LITE-ON DVDRW SOHW-1673S/JS01> DVD-R drive at ata1 as slave
acd1: read 8268KB/s (4782KB/s) write 8268KB/s (8268KB/s), 2048KB buffer, UDMA33
acd1: Reads: CD-R, CD-RW, CD-DA stream, DVD-ROM, DVD-R, packet
acd1: Writes: CD-R, CD-RW, DVD-R, test write, burnproof
acd1: Audio: play, 256 volume levels
acd1: Mechanism: ejectable tray, unlocked
acd1: Medium: no/blank disc
(probe2:ata1:0:0:0): INQUIRY. CDB: 12 1 80 0 ff 0 
(probe2:ata1:0:0:0): ILLEGAL REQUEST asc:24,0
(probe2:ata1:0:0:0): Invalid field in CDB
(probe3:ata1:0:1:0): INQUIRY. CDB: 12 1 80 0 ff 0 
(probe3:ata1:0:1:0): ILLEGAL REQUEST asc:24,0
(probe3:ata1:0:1:0): Invalid field in CDB
(probe2:ata1:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 
(probe2:ata1:0:0:0): NOT READY asc:3a,0
(probe2:ata1:0:0:0): Medium not present
(probe3:ata1:0:1:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 
(probe3:ata1:0:1:0): NOT READY asc:3a,0
(probe3:ata1:0:1:0): Medium not present
(probe2:ata1:0:0:0): INQUIRY. CDB: 12 1 80 0 ff 0 
(probe2:ata1:0:0:0): ILLEGAL REQUEST asc:24,0
(probe2:ata1:0:0:0): Invalid field in CDB
(probe3:ata1:0:1:0): INQUIRY. CDB: 12 1 80 0 ff 0 
(probe3:ata1:0:1:0): ILLEGAL REQUEST asc:24,0
(probe3:ata1:0:1:0): Invalid field in CDB
Creating DISK cd0
Creating DISK cd1
pass0 at ata1 bus 0 target 0 lun 0
pass0: <PIONEER DVD-ROM DVD-103 1.16> Removable CD-ROM SCSI-0 device 
pass0: 33.000MB/s transfers
pass1 at ata1 bus 0 target 1 lun 0
pass1: <LITE-ON DVDRW SOHW-1673S JS01> Removable CD-ROM SCSI-0 device 
pass1: 33.000MB/s transfers
Mounting root from ufs:/dev/ad0s2a
(cd0:ata1:0:0:0): READ CD RECORDED CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0 
(cd0:ata1:0:0:0): NOT READY asc:3a,0
(cd0:ata1:0:0:0): Medium not present
cd0 at ata1 bus 0 target 0 lun 0
cd0: <PIONEER DVD-ROM DVD-103 1.16> Removable CD-ROM SCSI-0 device 
cd0: 33.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present
(cd1:ata1:0:1:0): READ CD RECORDED CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0 
(cd1:ata1:0:1:0): NOT READY asc:3a,0
(cd1:ata1:0:1:0): Medium not present
cd1 at ata1 bus 0 target 1 lun 0
cd1: <LITE-ON DVDRW SOHW-1673S JS01> Removable CD-ROM SCSI-0 device 
cd1: 33.000MB/s transfers
cd1: Attempt to query device size failed: NOT READY, Medium not present
ad0s1: type 0x7, start 63, end = 12289724, size 12289662 : OK
ad0s2: type 0xa5, start 12289725, end = 22523129, size 10233405 : OK
ad0s3: type 0x83, start 22523130, end = 30716279, size 8193150 
ad0s3: C/H/S start 1023/0/1 (16434495) != start 22523130: invalid
ad0s4: type 0x5, start 30716280, end = 240107489, size 209391210 
ad0s4: C/H/S start 1023/0/1 (16434495) != start 30716280: invalid
ad0s5: type 0x82, start 30716343, end = 32820794, size 2104452 
ad0s5: C/H/S start 1023/1/1 (16434558) != start 30716343: invalid
ad0<extended>: type 0x5, start 32820795, end = 41013944, size 8193150 
ad0<extended>: C/H/S start 1023/0/1 (16434495) != start 32820795: invalid
ad0s6: type 0x83, start 32820858, end = 41013944, size 8193087 
ad0s6: C/H/S start 1023/1/1 (16434558) != start 32820858: invalid
ad0<extended>: type 0x5, start 41013945, end = 171526004, size 130512060 
ad0<extended>: C/H/S start 1023/0/1 (16434495) != start 41013945: invalid
ad0s7: type 0x7, start 41014008, end = 171526004, size 130511997 
ad0s7: C/H/S start 1023/1/1 (16434558) != start 41014008: invalid
ad0<extended>: type 0x5, start 171526005, end = 240107489, size 68581485 
ad0<extended>: C/H/S start 1023/0/1 (16434495) != start 171526005: invalid
ad0s8: type 0xa5, start 171526068, end = 240107489, size 68581422 
ad0s8: C/H/S start 1023/1/1 (16434558) != start 171526068: invalid

------------------------------------------------------------------



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200506281258.j5SCw46l010235>