From: Greg Shenaut
Reply-To: gkshenaut@ucdavis.edu
To: questions@FreeBSD.ORG
Subject: Lessons re CDROM+NFS/FTP install errors (long)
Date: Sat, 15 Aug 1998 14:18:23 -0700
Message-Id: <199808152118.OAA19242@deal1.bogs.org>

I recently tried to install a system from the 2.2.7 CDROM, using a 2.2.5 system as an NFS/FTP server. When I initially tried to mount the CDROM on the 2.2.5 system, it failed with "BLANK CHECK" errors and often crashed the 2.2.5 host. I never could read it at all. I then tried a 2.2.6 CDROM, and it worked perfectly, at least at first. So I called Walnut Creek, and they immediately sent me a replacement for the 2.2.7 CDROM.

While I was waiting for it, I spent a day installing a 2.2.6 system. It took a whole day because, as time went on, I got more and more read errors on the CDROM. At the end of the day I finally gave up, with *almost* a complete 2.2.6 system installed.

The next morning when I woke up, I had an idea.
I live in the central valley of California, where the temperatures have been in the 80's (Fahrenheit) in the mornings, reaching well above 100 in the afternoon. It occurred to me that the CDROM error rate correlated with the thermometer reading a bit too much for my comfort. So I turned the air conditioner on full blast, got the temperature into the 60's, aimed a fan at the CDROM reader, and then tried to install from the previously unreadable 2.2.7 CDROM. There was still an occasional error, but the install's native retrying was enough to deal with most of the problems; in a few hours I had a 2.2.7 system up and running.

So, the first lesson is that at least some CDROM readers (mine is a Pioneer DRM-600) are temperature sensitive, and the failure mode mimics bad media in that some CDROMs may be totally unreadable, while others will appear to read OK.

While I was wrassling with this, I discovered another thing which can mess up an installation if your hardware is a bit flaky: buffering. When you are installing over a network, there are two main ways to do it: NFS and FTP. I generally use NFS because you don't have to type as much to set it up. With NFS, the system seems to buffer up a ton of stuff so that less hardware access is required. This is (I believe) on top of the buffering done by the normal disk system.

Under certain conditions, a read error on the CDROM would cause the target system to report a *write* error (which at first caused me to suspect bad blocks on the target machine's disk drive--see below). I still don't know why this happened, but I eventually figured out that it just had to do with the CDROM read errors. The problem is that when install retried, the bad blocks sometimes stayed somewhere in the buffers--you would get consistent errors at the same point, retry after retry, but the light on the CDROM drive would never blink: the (bad) data was just being read out of a buffer.
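
The stuck-retry behavior described above can be illustrated with a toy model. This is only a sketch under loud assumptions: ToyDrive and ToyCache are hypothetical names, the cache is a simple least-recently-used one, and nothing here claims to describe how the NFS client or the disk cache is actually implemented. It only shows how a cached bad block could be returned on every retry without the drive being touched, and how reading enough other data could push it out.

```python
from collections import OrderedDict


class ToyDrive:
    """Hypothetical stand-in for the CDROM drive; counts real accesses."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # block number -> bytes
        self.accesses = 0            # one access = one blink of the drive light

    def read(self, block_no):
        self.accesses += 1
        return self.blocks[block_no]


class ToyCache:
    """Hypothetical least-recently-used cache in front of the drive."""

    def __init__(self, drive, capacity):
        self.drive = drive
        self.capacity = capacity
        self.entries = OrderedDict()

    def read(self, block_no):
        if block_no in self.entries:
            # Cache hit: the drive is not touched, even if the data is bad.
            self.entries.move_to_end(block_no)
            return self.entries[block_no]
        data = self.drive.read(block_no)
        self.entries[block_no] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry
        return data


drive = ToyDrive({n: b"good" for n in range(10)})
drive.blocks[3] = b"BAD!"          # block 3 is misread on the first attempt
cache = ToyCache(drive, capacity=4)

first = cache.read(3)              # bad data comes off the drive and is cached
drive.blocks[3] = b"good"          # the drive would now read the block correctly
retry = cache.read(3)              # but the retry is served from the cache
accesses_after_retry = drive.accesses

for n in (4, 5, 6, 7):             # read other data until block 3 is evicted
    cache.read(n)
after_flush = cache.read(3)        # this retry reaches the drive again
```

In this model the retry returns the same bad data with no new drive access, and only the read after the four unrelated blocks goes back to the drive.
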
The workaround for these stuck retries was to install something else to saturate the buffer, then go back to the missing distribution(s) or package(s). A similar thing seemed to happen with FTP, but since the data was then only in the host's disk cache, it seemed to clear up more easily, or maybe I was just luckier with FTP.

There was one final lesson I learned: in an attempt to deal with what I thought was a bad disk on the target system, I enabled bad144 sector replacement. This was back when I was loading the 2.2.6 system. I didn't get enough unrecoverable errors to keep me from completing a successful install, but when I tried to reboot the system, one of the blocks required for booting was at the far end of the disk--maybe it was just trying to read the bad sector table itself--and the boot failed totally. I had a fairly complete, painfully loaded, 2.2.6 system that was going to require me to buy more hardware to boot it.

So, if you think you *must* enable bad144 on a boot disk, you probably should create a separate fdisk partition to contain just the root (and swap?); if I understand correctly, bad144 is done within each fdisk partition independently, so no blocks above that partition will be needed for booting. However, for the record, it turned out that the disk was OK, and I went on to load a system on it later without bad144. I have a feeling that bad144 is probably never needed on modern SCSI or IDE disks.

-Greg