Date:      Sat, 2 Dec 95 11:59 WET
From:      uhclem%nemesis@fw.ast.com (Frank Durda IV)
To:        bugs@freebsd.org
Subject:   Mission Impossible-style crashes on 2.1.0
Message-ID:  <m0tLwDO-000C0aC@nemesis.lonestar.org>

I have several systems (6) that I recently upgraded to 2.1.0-RELEASE.
These systems all ran 2.0.5 previously, and had no crashes in at
least the previous 90 days (very nice).

However, in the seven days since upgrading to 2.1.0, the same hardware
has experienced an average of 10 unexpected reboots, or just over
one a day.  The systems with more load (news vs no news) crash more often.

Usually these crashes happen when I am not around, so all I find is login:
prompts and no clues to the cause.  /var/log/messages simply shows
that the system was running its normal stuff and then it was booting.

Finally it happened in front of me on one of the systems.  The system was
reasonably idle (tind was active and I was writing in vi, with no X), when
the console keyboard went dead.  All the keyboard lights turned off
(NUM LOCK is normally on) and disk activity stopped.  I tried changing
screens and hitting a few keys like NUM LOCK, but got no response.  I then
unplugged and reconnected the keyboard.  That didn't help either.
Then around 15 seconds after the system went dead, the screen cleared and
the system rebooted.

This has happened twice when I was at the console, and the above
actions were taken during one of those two events.  All residual
clues suggest that this is the same failure that occurs when I am away.
(All systems are on UPS power and are spread across a 30-mile area, so
this isn't a power thing.)

Since there is no visible panic and nothing in the logs, I am
looking for suggestions on how to investigate this problem.  I don't
think a triple-fault is occurring, because in that case the system reset
would occur instantly.  The odd thing is that the 15-second delay is
consistent, as if it were deliberate.  (I wonder if a panic is occurring
but not being displayed for some reason, and the 15-second "press a key
to avoid a reboot" timer is running.)

(On one system the system froze as above, but stayed there until I
 reset it manually several hours later.  Again, all keyboard lights were
 off and the logs had nothing useful.  Since this only happened once on
 one machine, I'll treat this as a different issue for now.)

On two of the systems where I can't put up with this number of crashes,
I am now running the 2.0.5 kernel.  In the two days since, those systems
have not crashed.

These systems are a mix of 486DX/SX-33/25/100 and Pentium 75/90, all with
SCSI, some with IDE+SCSI, no CD-ROMs at the moment, and between 8 and 16 Meg of
RAM.  All have WD/SMC Ether cards (usually 8013EW) and the drivers are
active.

Anyway, if anyone has seen something like this, or has suggestions on how
to get more useful information from the system when this happens, I would
appreciate it.  (I do have access to port 80 and port 300 debuggers if
that will help.)  I really don't want to have to run the 2.0.5 kernel on
2.1.0 systems.

Thanks!

Frank Durda IV <uhclem@nemesis.lonestar.org>|"The Knights who say "LETNi"
or uhclem%nemesis@fw.ast.com (Fastest Route)| demand...  A SEGMENT REGISTER!!!"
...letni!rwsys!nemesis!uhclem               |"A what?"
...decvax!fw.ast.com!nemesis!uhclem         |"LETNi! LETNi! LETNi!"  - 1983