Date:      Wed, 2 Mar 2005 19:52:22 +0100
From:      Anthony Atkielski <atkielski.anthony@wanadoo.fr>
To:        freebsd-questions@freebsd.org
Subject:   Re: Installation instructions for Firefox somewhere?
Message-ID:  <165771504.20050302195222@wanadoo.fr>
In-Reply-To: <LOBBIFDAGNMAMLGJJCKNGEKDFAAA.tedm@toybox.placo.com>
References:  <9810408603.20050302055304@wanadoo.fr> <LOBBIFDAGNMAMLGJJCKNGEKDFAAA.tedm@toybox.placo.com>

Ted Mittelstaedt writes:

> HP didn't manufacture either of the drives nor the SCSI controller so
> why would you think that they know what they are talking about?

They rebranded the drives and took the top 10% or so of production
batches (according to someone I knew on the inside). They also charged
more for them.

> This happens when the SCSI target disk0 stops answering
> commands from the SCSI adapter.
>
>> Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Retrying Command
>> Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Queue Full
>> Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Retrying Command
>
> Same thing as above just with the second disk.

So perhaps FreeBSD is issuing commands that the disk drives don't like.

Incidentally, I've discovered that I can instantly generate similar
messages by issuing "smartctl -a /dev/da0" (or da1).
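To compare how often each drive produces these messages (for instance before and after running smartctl), the log can be tallied per device. This is only a minimal sketch: the regular expression is an assumption based on the three log lines quoted above, and the names `CAM_LINE`, `tally_cam_messages`, and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Assumed message shape, taken from the quoted log lines, e.g.
#   "... kernel: (da1:ahc0:0:2:0): Queue Full"
CAM_LINE = re.compile(r"\((?P<dev>[a-z]+\d+):[^)]*\):\s*(?P<msg>.+?)\s*$")


def tally_cam_messages(lines):
    """Count (device, message) pairs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = CAM_LINE.search(line)
        if match:
            counts[(match.group("dev"), match.group("msg"))] += 1
    return counts


# The three lines quoted earlier in this message.
SAMPLE = [
    "Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Retrying Command",
    "Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Queue Full",
    "Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Retrying Command",
]

if __name__ == "__main__":
    for (dev, msg), n in sorted(tally_cam_messages(SAMPLE).items()):
        print(f"{dev}  {msg}: {n}")
```

Feeding it the real log file (for example `tally_cam_messages(open(path))`, with `path` pointing at wherever the kernel messages are logged on the system in question) would show whether da0 and da1 are affected equally.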

> The usual cause of this is bad termination. With bad termination,
> electrical noise causes one or more targets on the bus to receive a
> command that is garbage; that target shuts down and goes out of sync
> with the other initiators and targets on the bus, and as soon as that
> happens all targets shut down.

I would have seen the same problem with Windows if that were the case.
The hardware was the same, and the long delay (20-30 seconds) produced
each time this happens would have been impossible to ignore.

> But it can also be caused by a device that isn't totally compliant
> with the standard interfering with another device on the bus (although
> this is rare). And it can also be caused by the adapter card driver
> sending a command to a target that the target doesn't understand or
> does not process properly. This can happen when, during the probe on
> boot, a target responds saying it supports something, then really
> doesn't. IDE devices are infamous for this, claiming to support UDMA,
> PIO mode 4, and such when they really don't support them properly.

Or perhaps FreeBSD doesn't understand that this particular (old) SCSI
hardware can't understand every command it issues.

> The driver for the SCSI adapter has finally given up trying to send
> commands to the adapter card your disks are tied to and has decided to
> just reset the card entirely, which resets the bus and all devices on
> it, which reestablishes sync.

That explains the long delay.

> This is a bit significant: after the bus reset, the second disk (the
> Quantum) isn't answering. But it looks like it later on started
> responding since otherwise your system would probably have panicked.

I've experienced one or two panics, but most of the time it's just a
long delay.  I've seen no evidence of data corruption, although it's
hard to be sure, of course.

> It is unlikely that a SCSI command sent to a disk by the adapter card
> is causing this. Unless either the Seagate or Quantum models that you
> have are known rogues (and I didn't find that they are), it is much
> more likely a conflict on the SCSI bus.

Why now, after eight years?

> Well first of all I already told you to run your BIOS config and set
> the adapter to limit sync negotiation on the Quantum to 10Mb and
> see if that fixed it.

I'll check that the next time I boot.  But it seems to happen on both
drives, not just the Quantum.

> Secondly, you don't know how NT set up the disks and such on your
> system.  It is quite possible that the NT driver saw the mismatch
> and simply reprogrammed the SCSI adapter card to limit both disks
> to 10Mb transfers.  Or possibly the NT driver decided not to send
> writes to both disks at the same time.  So, comparisons like "it
> worked with NT so the hardware must be good" are almost useless.

So how do I configure FreeBSD to do the same thing?  If NT can do it in
software, so can FreeBSD.

> But the most important thing, and I think why you're having so much
> trouble here, is that you are trying to approach this problem
> as though you paid $9,000 for this server, yesterday.

I don't believe in throwing computers away just because they are a few
years old.

> If your Vectra was a brand new prototype in an HP test lab, or
> even if it was 10 days old from HP and you ran into this problem,
> you might have engineers with SCSI analyzers from HP's server
> build department all over you.

If it were a problem with hardware, I would have had exactly that eight
years ago.  But I didn't, so it's not.

> But it's not - this is a server that has a production life that
> is OVER.  I know you don't like Ebay and you probably think that
> everything on it is junk, but people are selling HP servers on
> it right now that are more powerful than yours and younger than
> it for under a hundred bucks - see:

Why should I pay anything for another machine, when I have a perfectly
good one here on my desk?

All I need is software that can drive it.

> The fact of the matter is that ANY life you can get out of this
> server today is found money - it's a freebie.  HP, 8 or 10 years
> ago when they designed this server would have told you THEN that
> they weren't expecting it to be in service today.

If so, they would have been deliberately lying, as even then everyone
knew that HP machines can run for a decade or more with ease.

> Now to be honest I have a soft spot for older hardware, the
> gateway router system that this very message is passing through just
> happens to be a dual Pentium Pro with 128MB of ram and a 4GB
> SCSI disk on an Adaptec controller, here's its dmesg:

Looks familiar.

> ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0x9300-0x93ff mem

> aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs

These look familiar, too.  I still have an Adaptec SCSI controller in a
box somewhere (it wouldn't run on the cheap hardware I used for my old
server, so I removed it and replaced it with a more modest
controller--in that case, it _was_ hardware).

> da0 at ahc0 bus 0 target 0 lun 0
> da0: <MICROP 4343WS X502> Fixed Direct Access SCSI-2 device
> da0: 20.000MB/s transfers (10.000MHz, offset 8, 16bit), Tagged Queueing

Hmm.
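For reference, the 20.000MB/s figure in that dmesg line follows directly from the negotiated sync rate and bus width: 10.000MHz on a 16-bit bus moves two bytes per transfer. A small sketch of the arithmetic, with a hypothetical helper name; the 8-bit case is included only for comparison and does not appear in the quoted output.

```python
def scsi_transfer_rate_mb_s(sync_mhz, bus_width_bits):
    """Peak synchronous transfer rate in MB/s.

    One transfer per clock, bus_width_bits / 8 bytes per transfer.
    """
    return sync_mhz * bus_width_bits / 8


# Values from the quoted dmesg line: 10.000MHz, 16bit.
wide = scsi_transfer_rate_mb_s(10.0, 16)

# Same sync rate on a narrow (8-bit) bus, for comparison only.
narrow = scsi_transfer_rate_mb_s(10.0, 8)

if __name__ == "__main__":
    print(f"wide:   {wide:.3f}MB/s")
    print(f"narrow: {narrow:.3f}MB/s")
```

So a drive limited to a 10MHz sync rate tops out at 10MB/s on a narrow bus and 20MB/s on a wide one, regardless of what the controller itself could do.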

> This is a real live system I just built last week.  It runs like a champ.

So what is the difference between yours and mine?

> Said pile included cases even, it also includes a second dual Ppro
> motherboard (sans CPUs).

In some circles it is said that PPros still have considerable value.
They're not fast but they are reliable and classic, with hardware
features that Intel inexplicably abandoned thereafter.

> But, I have learned something in dealing with the older gear, and that
> is that you must be extremely flexible with it.  If I get 2 systems
> that are flakey, I will swap parts between them in an effort to get
> 1 stable system.  More commonly though I have a half dozen or more
> older systems in the pool at a time, that parts move around between,
> and that new systems in a state of disrepair come into, and old systems
> that are in a state of stability go out of to friends and others who
> need systems.

But this is a stable system.  The hardware _does_ work.  I didn't put
this together out of scrap parts.  It has run perfectly for eight years;
I think I can safely say that it's pretty well broken in by now.  So
when I switch from Windows to FreeBSD and it stops working, I know it's
not hardware.

> If you insist on going that route, your only hope is interesting the
> developer of the ahc driver in your problem.  Start by filing a PR in
> the correct manner (ie: by following the instructions in the Handbook)
> and if that does not get a response from the developer (the PR's are
> mailed to the developers of the drivers) then read the source code
> of the driver to find out who it is, look up his e-mail address on the
> website, send him an e-mail begging to take a look at your PR, and
> stop wasting our time here.

I'll consider it.  The waste of time has been mutual.

-- 
Anthony



