From owner-freebsd-questions  Sat Aug 15 14:20:40 1998
Message-Id: <199808152118.OAA19242@deal1.bogs.org>
To: questions@FreeBSD.ORG
Subject: Lessons re CDROM+NFS/FTP install errors (long)
Reply-To: gkshenaut@ucdavis.edu
Date: Sat, 15 Aug 1998 14:18:23 -0700
From: Greg Shenaut <greg@bogslab.ucdavis.edu>

I recently tried to install a system from the 2.2.7 CDROM, using
a 2.2.5 system as an NFS/FTP server.  When I initially tried to
mount the CDROM on the 2.2.5 system, it failed with "BLANK CHECK"
errors and often crashed the 2.2.5 host.  I never could read it at
all.  I then tried a 2.2.6 CDROM, and it worked perfectly, at least
at first.

So I called Walnut Creek, and they immediately sent me a
replacement CDROM.

While I was waiting for it, I spent a day installing a 2.2.6 system.
It took a whole day because as time went on, I started getting more
and more read errors on the CDROM.  Finally, though, at the end of
the day I gave up, with *almost* a complete 2.2.6 system installed.

The next morning when I woke up, I had an idea.  I live in the
central valley of California, where the temperatures have been in
the 80s in the mornings, reaching well above 100 in the afternoon.
It occurred to me that the CDROM error rate correlated with the
thermometer reading a bit too much for my comfort.  So I turned
the air conditioner on full blast, got the temperature into the
60s, and aimed a fan at the CDROM reader, then tried to install
from the previously unreadable 2.2.7 CDROM.  There was still an
occasional error, but the install's native retrying was enough to
deal with most of the problems; in a few hours I had a 2.2.7 system
up and running.

So, the first lesson, at least for some CDROM readers (mine is a
Pioneer DRM-600), is that they are temperature sensitive, and the
failure mode mimics bad media in that some CDROMs may be totally
unreadable, while others will appear to read OK.

While I was wrassling with this, I discovered another thing which
can mess up an installation if your hardware is a bit flaky:
buffering.  When you are installing over a network, there are two
main ways to do it: NFS and FTP.  I generally use NFS because you
don't have to type as much to set it up.

In NFS, the system seems to buffer up a ton of stuff so that less
hardware access is required.  This is (I believe) on top of buffering
by the normal disk system.  Under certain conditions, a read error
on the CDROM would cause the target system to report a *write*
error (which at first caused me to suspect bad blocks on the target
machine's disk drive--see below).  I still don't know why this
happened, but I eventually figured out that it just had to do with
the CDROM read errors.  The problem is that when the install retried,
the bad blocks sometimes stayed somewhere in the buffers--you
would get consistent errors at the same point, retry after retry,
but the light on the CDROM drive would never blink:  the (bad) data
was just being read out of a buffer.  The workaround was to install
something else to saturate the buffer, then go back to the missing
distribution(s) or package(s).

A similar thing seemed to happen with FTP, but since the data was
then only in the host's disk cache, it seemed to clear up more easily,
or maybe I was just luckier with FTP.

There was one final lesson I learned: in an attempt to deal with
what I thought was a bad disk on the target system, I enabled bad144
sector replacement.  This was back when I was loading the 2.2.6
system.  I didn't get enough unrecoverable errors to keep me from
completing a successful install, but when I tried to reboot the
system, one of the blocks required for booting was at the far end
of the disk--maybe it was just trying to read the bad sector table
itself--and boot totally failed.  I had a fairly complete, painfully
loaded 2.2.6 system that was going to require me to buy more
hardware to boot it.  So, if you think you *must* enable bad144 on
a boot disk, you probably should create a separate fdisk partition
to contain just the root (and swap?); if I understand correctly,
bad144 is done within each fdisk partition independently, so no
blocks above that partition will be needed for booting.  However,
for the record, it turned out that the disk was OK and I went on
to load a system on it later without bad144:  I have a feeling that
bad144 is probably never needed on modern SCSI or IDE disks.

-Greg
