Date:      Mon, 16 Feb 1998 12:41:10 -0700 (MST)
From:      Steve Grandi <grandi@noao.edu>
To:        freebsd-stable@FreeBSD.ORG
Subject:   I need a strategy for making my STABLE installation stable
Message-ID:  <Pine.LNX.3.96.980216115540.9052A-100000@mirfak.tuc.noao.edu>

My STABLE system hasn't been very stable: I've been averaging one system
crash a day for the past week or so.  The frequency of crashes is
increasing: over the past 3 months the average was perhaps one crash a
week.  I need some help in devising a strategy to make things stable...

The hardware:  PentiumPro-200 (Venus MotherBoard), 128 MB of RAM, Adaptec
2940 Ultra-Wide SCSI controller, two Seagate ST32155W 2GB disks, a
Micropolis 3391WS 9GB disk, Plextor SCSI CD-ROM, Intel EtherExpress Pro
10/100B Ethernet card. 

The System: FreeBSD 2.2.5-STABLE kept up-to-date via CVSUP

What's the system doing: DNS server, Sendmail server, FTP server, Net News
server.

Ever since I upgraded to 2.2.5-RELEASE in late November, I've seen far too
many system crashes.  About half the time, the crash would be followed by
a reboot.  The other half of the time the system would just hang with no
response from the console keyboard or active rlogin sessions (but
sometimes the system would still answer PINGs).  Crashes seemed to follow
heavy disk I/O and/or paging (usually soon after an INN expire with a
200MB+ history file). 

Despite nearly 15 years of experience with BSD (going back to 4.1BSD on VAX
11/750s), I am sometimes not a very bright lad and it took me a LONG
time to realize that system panics are not noted in any system logs
after a reboot.  I finally wised up and started playing some games.

I compiled DDB into the kernel and had several crashes that caused a drop
into the debugger with the result:

Fatal Trap 12 page fault while in kernel mode
...
supervisor read, page not present
...
current process = 4 (update)
...

I still haven't managed to capture a core file so I won't attempt to type
in the traceback.  I think I have dumpon configured properly through the
dumpdev variable in /etc/rc.conf; but today's perusal of the man pages
seems to indicate that savecore won't save a crash dump if /kernel isn't
the same as the kernel running at the time of the crash.  So I need to
stop tweaking things.
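
For reference, a minimal sketch of the crash-dump setup described above.
The device name /dev/sd0b and the directory /var/crash are assumptions
for illustration, not details of this system; the dump device is normally
a swap partition at least as large as physical memory (128 MB here).  The
savecore options shown in the comments are as documented in the
4.4BSD-derived savecore(8) and should be verified against the man page on
the installed release.

```shell
# /etc/rc.conf fragment (sh syntax).
# ASSUMPTION: /dev/sd0b is a swap partition at least as large as RAM;
# substitute the real device.
dumpdev="/dev/sd0b"

# ASSUMPTION: /var/crash is the dump directory and has room for a
# RAM-sized file.  crashdir is a hypothetical helper variable used only
# in this sketch; it is not an rc.conf knob.
crashdir="/var/crash"

# If /kernel has been rebuilt since the crash, manual runs along these
# lines may be worth trying (left commented out on purpose):
#   savecore -N /kernel.old "$crashdir"   # name the kernel that was running
#   savecore -f "$crashdir"               # force a save of a doubtful dump
```

Keeping a copy of the kernel that was actually running at the time of the
crash is what makes the -N variant usable.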

So what strategy should I follow to make the system stable and make the Users
happy again?  Thoughts that I have had:

1) Capture a crash dump and see where that leads me.

2) Start swapping hardware (I have some new memory -- parity memory
this time! -- on order and I do have a spare Adaptec board I can lay
my hands on).

3) Keep tweaking the kernel config file.  So far, I have increased
values for MAXDSIZ, DFLDSIZ and NMBCLUSTERS and deleted the options MFS
and AHC_ALLOW_MEMIO.  The next item on my hit list would be deleting
AHC_TAGENABLE.

Any advice out there?

Steve Grandi, National Optical Astronomy Observatories, Tucson, Arizona USA
Internet: grandi@noao.edu  Voice: +1 520 318-8228