Date:      11 Aug 2002 17:44:50 -0400
From:      Jim Frost <jimf@frostbytes.com>
To:        freebsd-stable@freebsd.org
Subject:   FreeBSD 4.6 rl0 and xl0 watchdog timeout problems (and solution)
Message-ID:  <1029102290.9472.188.camel@snowball.frostbytes.com>

I'm posting this mostly so that someone who runs into the same problem
can perhaps find the message.

Short version: Try a different slot.  For some reason FreeBSD doesn't
seem to see interrupts from cards in slots that share a PCI interrupt.

Long version:

I recently bought a new PC to use as a server to replace an aging Cobalt
Qube2.  The Qube was a great little box ... or it was until the last
security update which broke something in the kernel such that the box
hangs regularly.  Sun's "support", even their paid "support", has a
couple of workarounds that reduce the frequency but they are clearly not
interested in fixing it, and the whole reason I bought the box was so
that I could manage it with point-and-click; I don't really feel like
tracking down sources and building my own kernels anymore.

Anyway, one of the reasons for using the Qube2 was that it's not Windows
and it's not Intel, so almost nobody's attack scripts will work even if
the machine has a hole I haven't patched yet in a service that the
firewall and machine configuration expose.

Not wanting to spend the money on a newer Cobalt box given the crappy
support I got with the one I have, I decided to give in and run an Intel
box again.  No way was I going to run Windows on an exposed box, and I'd
prefer not to run Red Hat (as I do on my laptop) because it's the first
target of the script kiddies.  BSD seemed like a good solution and one
which I'm fairly familiar with from days past.  Besides, my pro-BSD
buddies raved about how fast and stable it is.

So I bought some fairly generic PC from a local supplier: an MSI board
of some sort with a hunk o' RAM and disk, a 1.6GHz P4, and a DLink
DFE-530TX+ ethercard.  Nothing special these days, but not junk either,
and way more capability than I really need on my home server (hey, that
Qube2 was working just fine until Sun broke it).  The local PC company
couldn't guarantee the system would run FreeBSD but they burned it in
with WinXP so at least I knew the parts worked, and the net tells me
that all this stuff should work on BSD.  Besides, at this point the
UNIXen have pretty much got the PC hardware figured out, right?

I ordered up a FreeBSD 4.6 subscription from bsdmall and got to
installing it.  First impression: That installer sucks ass.  I mean,
sucks like the stuff we used to get from Sun in the 3.x days.  Sucks
worse than SysVR3 did.  Sucks sucks sucks.  Never mind that the X11
configuration hung and I had to give up on that and rerun the install
and skip it (Red Hat has got that /nailed/ at this point), the thing
that really pisses me off is that I just wanted it to install everything
on the disk.  What the hell, the disk space is cheap and I am not sure
what I'm going to want.  So far as I can tell there's no way to do that,
so I had to check off like a thousand packages one at a time.  That
SUCKS.  Primitive, irritating, and gawdawful easy to fix.  Wassup with
that?

Not an auspicious start, but I still managed to get the whole install
done in about half the time of any Windows product I've installed in the
last seven or eight years, so it's not /that/ big a deal.  It just looks
way lame relative to any Linux release we've seen since like 1997.

Anyway, I fired it up and got "rl0: watchdog timeout" errors.  Shit. 
I've seen those before from waaaay back when SunOS was my favorite
system, and it meant that the ethernet cable fell out.  The man page for
the rl driver says that that's probably what it is.  Problem is, the
cabling checks out: it was showing good connection lights on both ends. 
Just to be sure I pulled known-good cabling from other stuff.  Still no
go.

I thought maybe the thing was incorrectly sensing the media; I still run
10baseT because it's here and it works and I don't see why I should
spend money on a new hub.  ifconfig said it autoconfigured to
10baseT/UTP but just to be sure I forced the config.  Same problem.

Ok, I've used the various UNIXen enough to know that they're often
sensitive to card firmware versions; maybe the 530TX+ has new firmware
that screwed it up.  So I picked up a 3c905 card and threw it in.  Same
problem.

That didn't leave much.  At this point I figured it was an interrupt
problem of some sort and started looking at the PCI configuration in the
BIOS.  I remembered something about NT et al. needing something or other
disabled to work on new motherboards and figured that maybe the PC vendor
had set that up, but I didn't see anything out of the ordinary.

But while I was in there I noticed that four slots share two interrupt
configurations: Slots 1 and 3 share one, and slots 2 and 6 share
another.  Hmm.  The ethercard is in slot 3, one of the shared slots.

On a hunch I moved the ethercard to slot 4 and rebooted.  Voila, works
like a champ.
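
For anyone chasing the same symptom, one quick check from the OS side is
whether the NIC shares an IRQ with another device.  Below is a minimal
sketch in Python, assuming boot messages in the usual FreeBSD probe
format ("name0: <description> ... irq N at device ... on pci0").  The
sample text and the function names devices_by_irq and shared_irqs are
made up for illustration; none of it is output from the machine
described here.

```python
import re
from collections import defaultdict

# Matches FreeBSD-style probe lines such as:
#   rl0: <...> port 0xd000-0xd0ff irq 11 at device 9.0 on pci0
PROBE_LINE = re.compile(r"^(?P<dev>[a-z]+\d+): .*\birq (?P<irq>\d+)\b")


def devices_by_irq(dmesg_text):
    """Group device names by the IRQ reported in their probe line."""
    groups = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = PROBE_LINE.match(line.strip())
        if match:
            dev = match.group("dev")
            irq = int(match.group("irq"))
            if dev not in groups[irq]:
                groups[irq].append(dev)
    return dict(groups)


def shared_irqs(dmesg_text):
    """Return only the IRQs claimed by more than one device."""
    return {
        irq: devs
        for irq, devs in devices_by_irq(dmesg_text).items()
        if len(devs) > 1
    }


# Hypothetical sample, NOT taken from the machine in this post.
SAMPLE = """\
uhci0: <Example USB controller> port 0xd400-0xd41f irq 11 at device 7.2 on pci0
rl0: <Example 10/100 ethernet> port 0xd000-0xd0ff mem 0xe0000000-0xe00000ff irq 11 at device 9.0 on pci0
xl0: <Example 3c905 ethernet> port 0xd800-0xd87f irq 5 at device 10.0 on pci0
rl0: watchdog timeout
"""

if __name__ == "__main__":
    for irq, devs in sorted(shared_irqs(SAMPLE).items()):
        print(f"irq {irq} shared by: {', '.join(devs)}")
```

On a real box, feed it the output of dmesg instead of SAMPLE.  A shared
IRQ is not a fault in itself, since PCI allows sharing; it is only a
hint about which slot to try next.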

I'd be interested in an explanation if someone has one, and if nobody
does then I'd be willing to help track down some details to fix it so
some other poor schmuck doesn't waste a lot of time on it.

So far this has been way more effort than it should have been and I
haven't even gotten to configuring the services I need.  The only reason
I didn't just dump it in favor of RH7.3 was that my discs are at work
right now.  But, now that it's working, I'm going to proceed and hope
for the best.

Hell, if nothing else it boots a lot faster than Linux.

jim frost
jimf@frostbytes.com


