From owner-freebsd-hackers  Thu May 23 08:02:14 1996
Date: Thu, 23 May 1996 17:00:06 +0200 (MET DST)
Message-Id: <199605231500.RAA05186@racer.dkrz.de>
From: "Georg-W. Koltermann" <gwk@racer.dkrz.de>
To: msmith@atrad.adelaide.edu.au
Cc: freebsd-hackers@FreeBSD.org
In-reply-to: <199605221230.WAA04592@genesis.atrad.adelaide.edu.au> (message
	from Michael Smith on Wed, 22 May 1996 22:00:21 +0930 (CST))
Subject: Re: 960501-SNAP: data corruption reading /dev/rwt0 (Wangtek)
X-Attribution: gwk
Reply-to: gwk@cray.com
Sender: owner-hackers@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

>>>>> Michael Smith <msmith@atrad.adelaide.edu.au> writes:
    ....  (stuff deleted)
    >> One week ago I got a new PC (586/100, 32MB, ASUS T2P4), and
    >> tried to install 960501-SNAP from tape.  Turned out most of the
    >> files read from tape were corrupted, so the installation would
    >> abort after unpacking one or two files of the first dist.  I
    >> switched to the shell on VT4 and tried to untar the tape
    >> manually, then cat | zcat | cpio -itv to check if I could read
    >> the pieces.  The tar finished without an error indication, but
    >> typically the zcat would abort readily, so I could only list a
    >> few files from the archive.
    >> 
    >> I untarred the same tape a couple of times, trying different
    >> block sizes on the tar command.  Eventually (using a blocking
    >> count of 16) I could read the tape without corruption, and
    >> could install the snap.
    Michael>  This sounds like the interface card isn't talking to the
    Michael> rest of the system very well.  Have you tried fiddling
    Michael> with the ISA bus timing?  Try increasing the back-to-back
    Michael> IO delay, 8- and 16-bit I/O waitstates, 8- and 16-bit DMA
    Michael> waitstates, etc., anything at all related to timing on
    Michael> the ISA bus.  Also make sure the ISA bus clock is around
    Michael> 8MHz.
    Michael> 
    >> Now with the snap loaded and running flawlessly, I want to read
    >> my backups from tape (cpio -H crc format), but again I am
    >> getting data corruption.  When I look at the files extracted,
    >> they look fine up to a certain point where they just contain
    >> binary zeroes.  I have again
    Michael>  This sounds horribly like it's something getting out of
    Michael> sync.

Not really.  You see, the data corruption does not happen randomly
(and of course I know more today than I knew yesterday).

a) Every point of corruption that I checked is exactly 512 bytes of
   data being replaced by binary zeroes.  That's exactly one tape
   block.

b) If I start reading from tape, then abort with ^C after I see the
   first messages about corrupted files, and then start the same
   extraction command again, corruption will typically happen at the
   same point as before.  I.e., if I am extracting a cpio -H crc
   archive from tape, abort after the first couple of messages about
   bad CRC, then re-enter the same cpio command, I will see the
   same messages about bad CRC for the same files in the archive.

c) There is a way to work around the problem:

   1. Extract the whole tape file once with a big block size, say 64
      kB.  Let it run to completion, don't ^C out (sigh...).

   2. Extract the same tape again, this time with a small block size
      (8 kB).

   The second pass will work without a single error!!!  Unfortunately,
   subsequent passes will again result in corrupted data.
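
Observations a) and b) can be checked mechanically.  What follows is a
minimal sketch, assuming the raw tape file has been dumped to disk
twice (two separate read passes) and that both dumps fit in memory.
The function names and the command-line usage are hypothetical and only
serve as illustration.  Note that an archive can legitimately contain
all-zero blocks, so a hit is a hint, not proof of corruption.

```python
import sys

BLOCK = 512  # tape block size, as observed in a) above


def zero_blocks(data, block=BLOCK):
    """Return offsets of block-aligned chunks made up only of zero bytes."""
    zero = bytes(block)
    return [off for off in range(0, len(data) - block + 1, block)
            if data[off:off + block] == zero]


def differing_blocks(first, second, block=BLOCK):
    """Return offsets of blocks that differ between two read passes.

    Only the common prefix of the two dumps is compared.
    """
    n = min(len(first), len(second))
    first, second = first[:n], second[:n]
    return [off for off in range(0, n, block)
            if first[off:off + block] != second[off:off + block]]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: script.py pass1.img pass2.img
    with open(sys.argv[1], "rb") as f:
        pass1 = f.read()
    with open(sys.argv[2], "rb") as f:
        pass2 = f.read()
    print("zeroed blocks in pass 1:", zero_blocks(pass1))
    print("zeroed blocks in pass 2:", zero_blocks(pass2))
    print("blocks that differ:     ", differing_blocks(pass1, pass2))
```

If the corruption is reproducible as described in b), two bad passes
should report the same zeroed offsets and no differing blocks, while a
good pass compared against a bad one should differ exactly at the
zeroed offsets.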

I also changed the BIOS setup of my motherboard to the most
conservative settings, i.e. what ASUS calls "BIOS defaults, for
troubleshooting".  That, among other things, inserts 8 wait states for
8-bit ISA I/O requests (my tape controller is an 8-bit adapter).  NO
CHANGE in the data corruption, same problem as before.

Just to be sure I tried extracting the tape with a generic
2.1.0-RELEASE kernel, booted from the install/fixit floppies, and that
also showed the same problem.

I think there is a software problem with the wt driver, maybe related
to doing DMA on a 32 MB machine (bounce buffers?).  Whether the
problem appears or not seems to depend on something done at wtopen()
time.  If something bad happens at wtopen() time, individual tape
blocks will be replaced by zeroes.  If that bad thing does not happen
during wtopen(), then all data until the next wtclose() will be read
correctly.
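
As background for the bounce-buffer guess: the ISA DMA controller can
only address the first 16 MB of physical memory, and a transfer on an
8-bit DMA channel cannot cross a 64 kB boundary.  A buffer that
violates either rule has to be copied through a low-memory bounce
buffer.  The sketch below only illustrates that rule.  It is not taken
from the wt driver, and the function name is hypothetical.

```python
ISA_DMA_LIMIT = 16 * 1024 * 1024   # ISA DMA uses 24-bit physical addresses
DMA_PAGE_8BIT = 64 * 1024          # 8-bit channels cannot cross a 64 kB page


def needs_bounce(phys_addr, length):
    """Return True if an 8-bit ISA DMA transfer cannot use this buffer directly.

    phys_addr is the physical start address of the buffer, length its
    size in bytes.  Illustration only; a real driver works on kernel
    virtual addresses and must translate them page by page.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    last = phys_addr + length - 1
    above_limit = last >= ISA_DMA_LIMIT
    crosses_page = (phys_addr // DMA_PAGE_8BIT) != (last // DMA_PAGE_8BIT)
    return above_limit or crosses_page


# One 512-byte tape block at 1 MB is fine; the same block at 16 MB is not.
print(needs_bounce(0x100000, 512))
print(needs_bounce(16 * 1024 * 1024, 512))
```

On a machine with 16 MB or less, only the 64 kB boundary rule can
trigger, which is why restricting memory would be a useful experiment.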

QUESTIONS:

Is there an easy way to restrict my machine to using just the
lower 16 MB of memory, so that bounce buffers will not be needed?

Out of curiosity, does anyone run a wt-type tape drive with an ISA bus
adapter on a machine with more than 16 MB of memory?

Georg-W. Koltermann, gwk@cray.com