From owner-freebsd-hackers Thu May 23 08:02:14 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3)
        id IAA18455 for hackers-outgoing; Thu, 23 May 1996 08:02:14 -0700 (PDT)
Received: from fire.dkrz.de (fire.dkrz.de [136.172.110.250])
        by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id IAA18436
        for ; Thu, 23 May 1996 08:02:08 -0700 (PDT)
Received: from racer.dkrz.de (racer.dkrz.de [136.172.110.55])
        by fire.dkrz.de (8.7.5/8.7.3) with ESMTP id RAA20486;
        Thu, 23 May 1996 17:01:05 +0200 (MET DST)
Received: (from gwk@localhost) by racer.dkrz.de (8.7.4/8.7.3)
        id RAA05186; Thu, 23 May 1996 17:00:06 +0200 (MET DST)
Date: Thu, 23 May 1996 17:00:06 +0200 (MET DST)
Message-Id: <199605231500.RAA05186@racer.dkrz.de>
From: "Georg-W. Koltermann"
To: msmith@atrad.adelaide.edu.au
Cc: freebsd-hackers@FreeBSD.org
In-reply-to: <199605221230.WAA04592@genesis.atrad.adelaide.edu.au>
        (message from Michael Smith on Wed, 22 May 1996 22:00:21 +0930 (CST))
Subject: Re: 960501-SNAP: data corruption reading /dev/rwt0 (Wangtek)
X-Attribution: gwk
Reply-to: gwk@cray.com
Sender: owner-hackers@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

>>>>> Michael Smith writes:

.... (stuff deleted)

>> One week ago I got a new PC (586/100, 32MB, ASUS T2P4), and
>> tried to install 960501-SNAP from tape. Turned out most of the
>> files read from tape were corrupted, so the installation would
>> abort after unpacking one or two files of the first dist. I
>> switched to the shell on VT4 and tried to untar the tape
>> manually, then cat | zcat | cpio -itv to check if I could read
>> the pieces. The tar finished without an error indication, but
>> typically the zcat would abort readily, so I could only list a
>> few files from the archive.
>>
>> I untarred the same tape a couple of times, trying different
>> block sizes on the tar command. Eventually (using a blocking
>> count of 16) I could read the tape without corruption, and
>> could install the snap.

Michael> This sounds like the interface card isn't talking to the
Michael> rest of the system very well. Have you tried fiddling
Michael> with the ISA bus timing? Try increasing the back-to-back
Michael> IO delay, 8- and 16-bit I/O waitstates, 8- and 16-bit DMA
Michael> waitstates, etc., anything at all related to timing on
Michael> the ISA bus. Also make sure the ISA bus clock is around
Michael> 8MHz.
Michael>
>> Now with the snap loaded and running flawlessly, I want to read
>> my backups from tape (cpio -H crc format), but again I am
>> getting data corruption. When I look at the files extracted,
>> they look fine up to a certain point where they just contain
>> binary zeroes. I have again

Michael> This sounds horribly like it's something getting out of
Michael> sync.

Not really. You see, the data corruption does not happen randomly
(and of course I know more today than I knew yesterday).

a) Every point of corruption that I checked is exactly 512 bytes of
   data replaced by binary zeroes. That is exactly one tape block.

b) If I start reading from tape, abort with ^C after I see the first
   messages about corrupted files, and then start the same extraction
   command again, corruption will typically happen at the same point
   as before. I.e., if I am extracting a cpio -H crc archive from
   tape, abort after the first couple of messages about bad CRC, and
   then enter the same cpio command again, I will see the same
   messages about bad CRC for the same files in the archive.

c) There is a way I can work around the problem:

   1. Extract the whole tape file once with a big block size, say
      64 kB. Let it run to completion, don't ^C out (sigh...).

   2. Extract the same tape again, this time with a small block size
      (8 kB).

   The second pass will work without a single error!!! Unfortunately,
   subsequent passes will again result in corrupted data.

I also changed the BIOS setup of my motherboard to the most
conservative settings, i.e. what ASUS calls "BIOS defaults, for
troubleshooting". That, among other things, inserts 8 wait states for
8-bit ISA I/O requests (my tape controller is an 8-bit adapter). NO
CHANGE to the data corruption, same problem as before.

Just to be sure, I tried extracting the tape with a generic
2.1.0-RELEASE kernel, booted from the install/fixit floppies, and that
showed the same problem.

I think there is a software problem with the wt driver, maybe related
to doing DMA on a 32 MB machine (bounce buffers?). Whether the
problem appears or not depends on something done at wtopen() time. If
something bad happens at wtopen() time, sporadic tape blocks will be
replaced by zeroes. If that bad thing does not happen during wtopen(),
then all data until the next wtclose() will be read correctly.

QUESTIONS:

Is there an easy way to restrict my machine to using just the lower
16 MB of memory, so that bounce buffers will not be needed?

Out of curiosity, does anyone run a wt type tape with an ISA bus
adapter on a machine with more than 16 MB memory?

Georg-W. Koltermann, gwk@cray.com
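
The check described in (a) can be automated for files extracted from
the tape. What follows is only a minimal sketch in Python, not a tool
that ships with FreeBSD; the names zero_runs, report and TAPE_BLOCK
are made up for illustration. It assumes a zeroed tape block shows up
in an extracted file as a run of at least 512 zero bytes at an
arbitrary offset, since cpio headers shift the file data relative to
the tape block boundaries.

```python
import re

TAPE_BLOCK = 512  # tape block size observed in point (a); an assumption here


def zero_runs(data: bytes, min_len: int = TAPE_BLOCK):
    """Return (offset, length) for every maximal run of zero bytes
    that is at least min_len bytes long."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


def report(paths):
    """Print the suspicious zero runs found in each of the given files."""
    for path in paths:
        with open(path, "rb") as f:
            for offset, length in zero_runs(f.read()):
                print(f"{path}: {length} zero bytes at offset {offset}")
```

Calling report() with a list of extracted file names prints one line
per hit. Legitimate files can contain long zero runs as well, so a hit
is only a hint that the file should be compared against a known good
copy.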