Date:      Wed, 2 Mar 2005 05:02:41 -0800
From:      "Ted Mittelstaedt" <tedm@toybox.placo.com>
To:        <freebsd-questions@freebsd.org>
Subject:   RE: Installation instructions for Firefox somewhere?
Message-ID:  <LOBBIFDAGNMAMLGJJCKNGEKDFAAA.tedm@toybox.placo.com>
In-Reply-To: <9810408603.20050302055304@wanadoo.fr>



> -----Original Message-----
> From: owner-freebsd-questions@freebsd.org
> [mailto:owner-freebsd-questions@freebsd.org]On Behalf Of Anthony
> Atkielski
> Sent: Tuesday, March 01, 2005 8:53 PM
> To: freebsd-questions@freebsd.org
> Subject: Re: Installation instructions for Firefox somewhere?
>
>
> Ted Mittelstaedt writes:
>
> > It appears you have a narrow-SCSI max 10MB sync disk drive and a
> > ultra -3 20MB sync disk drive on the same adapter card.
> > Such a combination is iffy at best.
>
> The configuration was the one recommended by HP.  I bought the second
> drive from HP directly.  They both have the same type of SCSI
> interface,
> approved by HP.
>

HP didn't manufacture either of the drives or the SCSI controller, so
why would you think that they know what they are talking about?  HP
does the same thing Compaq does (and they really are the same company
now): they buy off-the-shelf parts from other manufacturers and bundle
them together into systems that they sell.  Dell, Gateway and all the
rest of them do the same thing.  For a very few of their products (like
the Vectra XU 6/200 that you have) they do design the motherboards, but
that's it.  And of course they design the sheetmetal.  But for the
motherboards in most of their stuff they get OEMs to make them.

And despite all the testing, on occasion they screw up and have to
release patches that work around hardware problems.

> I'm tired of hearing why it's not FreeBSD's fault.  When you
> can tell me
> exactly what theses messages mean, instead of guessing, let me know.

Ok here goes:

Feb 26 20:09:23 contactdish kernel: (da0:ahc0:0:0:0): Retrying Command
Feb 26 20:09:23 contactdish kernel: (da0:ahc0:0:0:0): Request Requeued
Feb 26 20:09:23 contactdish kernel: (da0:ahc0:0:0:0): Retrying Command

This happens when the first SCSI target (da0) stops answering
commands from the SCSI adapter.

Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Retrying Command
Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Queue Full
Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Retrying Command

Same thing as above just with the second disk.
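
If you want to see how often each disk is logging these, the messages
can be tallied per device.  What follows is only a minimal sketch in
Python: the regular expression and the tally_cam_messages name are
assumptions made up for illustration, written to match the line format
quoted above, and are not part of FreeBSD or the ahc driver.

```python
import re
from collections import Counter

# Matches kernel messages tagged with a CAM path, for example:
#   Feb 26 20:09:23 contactdish kernel: (da0:ahc0:0:0:0): Retrying Command
# The pattern is an assumption based only on the lines quoted in this message.
CAM_LINE = re.compile(
    r"kernel: \((?P<periph>\w+):(?P<sim>\w+):"
    r"(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\): (?P<msg>.+)$"
)


def tally_cam_messages(lines):
    """Count (device, message) pairs found in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = CAM_LINE.search(line.rstrip())
        if match:
            counts[(match.group("periph"), match.group("msg"))] += 1
    return counts


# The six log lines quoted above.
SAMPLE = [
    "Feb 26 20:09:23 contactdish kernel: (da0:ahc0:0:0:0): Retrying Command",
    "Feb 26 20:09:23 contactdish kernel: (da0:ahc0:0:0:0): Request Requeued",
    "Feb 26 20:09:23 contactdish kernel: (da0:ahc0:0:0:0): Retrying Command",
    "Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Retrying Command",
    "Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Queue Full",
    "Feb 26 20:09:26 contactdish kernel: (da1:ahc0:0:2:0): Retrying Command",
]

if __name__ == "__main__":
    for (device, message), count in sorted(tally_cam_messages(SAMPLE).items()):
        print(f"{device}: {message} x{count}")
```

Fed a whole /var/log/messages instead of the sample, the same tally
would show whether one disk is consistently the first to complain,
which is a cheap thing to check before touching any hardware.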

The usual cause is bad termination.  With bad termination, electrical
noise causes one or more targets on the bus to receive a command that
is garbage; that target shuts down and goes out of sync with the other
initiators and targets on the bus, and as soon as that happens all
targets shut down.  But it can also be caused by a device that isn't
totally compliant with the standard interfering with another device on
the bus (although this is rare).  And it can also be caused by the
adapter card driver sending a command to a target that the target
doesn't understand or does not process properly.  This can happen when,
during the probe on boot, a target responds saying it supports
something and then really doesn't.  IDE devices are infamous for this,
claiming to support UDMA, PIO mode 4, and such when they really don't
support them properly.

Sometimes, if the bus is left quiet, the devices can resync and things
go on.  Usually, though, it leads to the next thing that you have
here:

Feb 25 20:09:29 contactdish kernel: ahc0: Recovery Initiated
Feb 25 20:09:29 contactdish kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins

The driver for the SCSI adapter has finally given up trying to send
commands to the adapter card your disks are tied to and has decided to
just reset the card entirely, which resets the bus and all devices on
it, which reestablishes sync.

All the rest of the data that follows is a dump of the state of the card
and the commands sent, and which queue entries are trashed, so the
operating system can pick up where it left off if the card comes back
online.

Feb 25 20:09:29 contactdish kernel: (da1:ahc0:0:2:0): SCB 0x49 - timed out
Feb 25 20:09:29 contactdish kernel: sg[0] - Addr 0x1309b000 : Length 2048
Feb 25 20:09:29 contactdish kernel: (da1:ahc0:0:2:0): Queuing a BDR SCB
Feb 25 20:09:29 contactdish kernel: ahc0: Timedout SCBs already complete. Interrupts may not be functioning.

This is a bit significant: after the bus reset, the second disk (the
Quantum) isn't answering.  But it looks like it started responding
later on, since otherwise your system would probably have panicked.

None of this, though, is any help here.  You know what the problem is;
you just don't know what is causing it.  It is unlikely that a SCSI
command sent to a disk by the adapter card is causing this unless
either the Seagate or Quantum models that you have are known rogues
(and I didn't find that they are); it is much more likely a conflict on
the SCSI bus.

> I'm not going to plug and unplug hardware all day based on your
> speculations, particularly since I know this hardware configuration
> works, and has worked for eight years.
>

Well, first of all, I already told you to run your BIOS config and set
the adapter to limit sync negotiation on the Quantum to 10MB/s and
see if that fixes it.  That would not involve you removing stuff.

Secondly, you don't know how NT set up the disks and such on your
system.  It is quite possible that the NT driver saw the mismatch
and simply reprogrammed the SCSI adapter card to limit both disks
to 10MB/s transfers.  Or possibly the NT driver decided not to send
writes to both disks at the same time.  So, comparisons like "it
worked with NT so the hardware must be good" are almost useless.
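
For reference, the figure being negotiated is just the synchronous
clock multiplied by the bus width in bytes: 10MHz on an 8-bit (narrow)
bus gives 10MB/s, and the same clock on a 16-bit (wide) bus gives
20MB/s.  Here is a small sketch in Python that checks this against the
"transfers" lines FreeBSD prints at boot.  The two sample lines are
taken from the dmesg further down in this message; the function name
and the regular expression are assumptions for illustration only, and
a line with no width shown is assumed to be an 8-bit device.

```python
import re

# Matches boot-time lines such as:
#   da0: 20.000MB/s transfers (10.000MHz, offset 8, 16bit), Tagged Queueing Enabled
#   cd1: 10.000MB/s transfers (10.000MHz, offset 8)
# The width field is optional; when it is absent we assume a narrow (8-bit) bus.
XFER_LINE = re.compile(
    r"^(?P<dev>\w+): (?P<rate>[\d.]+)MB/s transfers "
    r"\((?P<mhz>[\d.]+)MHz, offset (?P<offset>\d+)(?:, (?P<bits>\d+)bit)?\)"
)


def parse_transfer_line(line):
    """Return the reported and computed rate for one line, or None if no match."""
    match = XFER_LINE.match(line)
    if match is None:
        return None
    bus_bits = int(match.group("bits") or 8)
    sync_mhz = float(match.group("mhz"))
    return {
        "dev": match.group("dev"),
        "bus_bits": bus_bits,
        "reported_mb_s": float(match.group("rate")),
        # rate = synchronous clock (MHz) * bus width (bytes)
        "computed_mb_s": sync_mhz * bus_bits / 8,
    }


SAMPLE = [
    "da0: 20.000MB/s transfers (10.000MHz, offset 8, 16bit), Tagged Queueing Enabled",
    "cd1: 10.000MB/s transfers (10.000MHz, offset 8)",
]

if __name__ == "__main__":
    for line in SAMPLE:
        info = parse_transfer_line(line)
        print(info["dev"], info["bus_bits"], info["reported_mb_s"], info["computed_mb_s"])
```

Comparing these lines for both disks before and after changing the
adapter setting is the quickest way to confirm the limit actually took.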

But the most important thing, and I think the reason you're having so
much trouble here, is that you are trying to approach this problem
as though you paid $9,000 for this server yesterday.

If your Vectra was a brand new prototype in an HP test lab, or
even if it was 10 days old from HP and you ran into this problem,
you might have engineers with SCSI analyzers from HP's server
build department all over you.

But it's not - this is a server that has a production life that
is OVER.  I know you don't like Ebay and you probably think that
everything on it is junk, but people are selling HP servers on
it right now that are more powerful than yours and younger than
it for under a hundred bucks - see:

http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&category=0&item=5754427891&rd=1

The fact of the matter is that ANY life you can get out of this
server today is found money - it's a freebie.  HP, 8 or 10 years
ago when they designed this server, would have told you THEN that
they weren't expecting it to be in service today.

Now to be honest I have a soft spot for older hardware.  The
gateway router system that this very message is passing through just
happens to be a dual Pentium Pro with 128MB of RAM and a 4GB
SCSI disk on an Adaptec controller.  Here's its dmesg:

$ dmesg
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.11-RELEASE #0: Mon Feb 21 04:13:14 PST 2005
    root@nat-rtr.freebsd-corp-net-guide.com:/usr/src/sys/compile/NATRTR
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium Pro (199.43-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x619  Stepping = 9
  Features=0xfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV>
real memory  = 134217728 (131072K bytes)
avail memory = 126894080 (123920K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard: 2 CPUs
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc03a9000.
Pentium Pro MTRR support enabled
md0: Malloc disk
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
IOAPIC #0 intpin 16 -> irq 2
IOAPIC #0 intpin 17 -> irq 16
IOAPIC #0 intpin 18 -> irq 17
IOAPIC #0 intpin 19 -> irq 18
pci0: <PCI bus> on pcib0
isab0: <Intel 82371SB PCI to ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 ATA controller> port 0xf000-0xf00f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: <S3 Trio graphics accelerator> at 17.0 irq 2
de0: <Digital 21140A Fast Ethernet> port 0x9100-0x917f mem 0xe0800000-0xe080007f irq 16 at device 18.0 on pci0
de0: 21140A [10-100Mb/s] pass 2.2
de0: address 00:40:05:43:ce:5f
ed0: <NE2000 PCI Ethernet (RealTek 8029)> port 0x9200-0x921f irq 17 at device 19.0 on pci0
ed0: address 52:54:05:f2:ab:67, type NE2000 (16 bit)
ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0x9300-0x93ff mem 0xe0801000-0xe0801fff irq 18 at device 20.0 on pci0
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcefff on isa0
pmtimer0 on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model MouseMan+, device ID 0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: IEEE1284 device found /NIBBLE
Probing for PnP devices on ppbus0:
ppbus0: <EPSON Stylus C84> PRINTER ESCPL2,BDC,D4
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2
IP packet filtering initialized, divert enabled, rule-based forwarding enabled, default to deny, logging disabled
SMP: AP CPU #1 Launched!
afd0: 239MB <IOMEGA ZIP 250 ATAPI> [239/64/32] at ata0-master PIO3
Waiting 15 seconds for SCSI devices to settle
de0: enabling Full Duplex 100baseTX port
Mounting root from ufs:/dev/da0s1a
da0 at ahc0 bus 0 target 0 lun 0
da0: <MICROP 4343WS X502> Fixed Direct Access SCSI-2 device
da0: 20.000MB/s transfers (10.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da0: 4146MB (8491920 512 byte sectors: 255H 63S/T 528C)
cd1 at ahc0 bus 0 target 6 lun 0
cd1: <TOSHIBA CD-ROM XM-5701TA 0167> Removable CD-ROM SCSI-2 device
cd1: 10.000MB/s transfers (10.000MHz, offset 8)
cd1: Attempt to query device size failed: NOT READY, Medium not present
cd0 at ahc0 bus 0 target 3 lun 0
cd0: <SONY CD-R   CDU926S 1.1f> Removable CD-ROM SCSI-2 device
cd0: 10.000MB/s transfers (10.000MHz, offset 15)
cd0: Attempt to query device size failed: NOT READY, Medium not present

This is a real live system I just built last week.  It runs like a champ.

And it was COMPLETELY free to me.  I assembled it literally from a pile
of parts that I got from a customer who was scrapping them.  Said pile
even included cases; it also included a second dual PPro motherboard
(sans CPUs).

It replaced an AMD K6 200MHz system that also ran fine, and that I also
got for free.

And these aren't the only ex-Windows, ex-Novell and ex-other systems
I've used over the years.  Nobody could be more of a proponent of
rescuing old gear than I am.

But I have learned something in dealing with the older gear, and that
is that you must be extremely flexible with it.  If I get 2 systems
that are flaky, I will swap parts between them in an effort to get
1 stable system.  More commonly, though, I have a half dozen or more
older systems in the pool at a time.  Parts move around between them,
new systems in a state of disrepair come in, and old systems that
have become stable go out to friends and others who need systems.

And you do have to call it quits on some gear eventually.  Last year I
finally got rid of the last of the 486 stuff that I had sitting around,
and I had some really nice 486 servers, EISA SCSI and the rest.  The
Pentium stuff that's non-Pro and non-P2 is going this year, as well as
all the AT case style stuff.

> As it stands now, all I know for sure is that FreeBSD apparently cannot
> support what Windows can support, and nobody call tell me why.
>

You must understand, Anthony, that in the FreeBSD and Linux (and
sometimes in the Microsoft and Solaris) worlds, problem solving is
approached rather differently.  We are dealing with a lot of people here
who have paid nothing for the software and have quite often got the
hardware for free (your Vectra, despite your paying $9K for it once, has
been fully depreciated to $0.00 by today), and because of that they
aren't interested in paying for a formalized
analyze-it-to-death-before-doing-anything problem solving approach.
Instead, problem solving is done scientifically: you make a hypothesis
about why something's broken, then test it.  Granted, this method is
much less efficient, but it is cheaper.  And it is not cost effective
to use the expensive analysis on a system that costs nothing.

If you insist on going that route, your only hope is interesting the
developer of the ahc driver in your problem.  Start by filing a PR in
the correct manner (i.e. by following the instructions in the Handbook).
If that does not get a response from the developer (PRs are mailed to
the developers of the drivers), then read the source code of the driver
to find out who it is, look up his e-mail address on the website, send
him an e-mail begging him to take a look at your PR, and stop wasting
our time here.

Ted


