Date:      Wed, 20 Mar 2002 02:20:39 -0500
From:      Mike Nowlin <mike@argos.org>
To:        freebsd-stable@freebsd.org
Subject:   4.5-S crashing like clockwork
Message-ID:  <20020320022039.A2315@argos.org>

I have a pair of identical 1GHz Duron systems (with the exception of the
hard drive size) - they were installed with the same 4.3-R CD around Feb 20,
and both upgraded to 4.5-S on Feb 23.  All was good until around March 7,
when I cvs'd one of them to 4.5-S as of that date, rebuilt everything, and
rebooted.  

The one with the 40G drive in it has been running just fine - no crashes.
The one that I updated on March 7 (60G drive in it) decided to adopt a time
bomb mentality - it crashes like clockwork, literally, every 24 hours.
(Actually, it's usually 23:58:00 or somewhere around there - floats around a
minute or so - I'm guessing that might have something to do with the amount
of time the fscks take.)

My first guess was the BIOS power management.  Took a drive over to the colo
and compared the two machines - identical settings on each.  Just to be
safe, I turned off as much of the power management as I could on the crashing
machine (these are the new "don't let the guy shut it off completely"
breed), and made sure that none of the BIOS timers were set anywhere around
24 hours - set them as low as possible (5 min here, 1 min there) in hopes
that it might help track the problem down...  Unfortunately, it still blows
up every 24h.  (BTW: this is 24 hours after a cold reboot, "shutdown -r
now", or a crash - doesn't matter what time of day it starts.)

Since Mar 7, I've cvs'd and rebuilt every few days, hoping that it will go
away....  No luck yet... :(

I mention the drive sizes because of the process that is causing the
crash - it's always the syncer:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xd80fad5c
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0180258
stack pointer           = 0x10:0xccbe4f34
frame pointer           = 0x10:0xccbe4f58
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 6 (syncer)
interrupt mask          = none
trap number             = 12
panic: page fault

syncing disks... panic: lockmgr: pid 0, not exclusive lock holder 663289855
unlocking
Uptime: 23h57m10s


...that "panic: lockmgr: pid 0,..." message was new tonight - never showed
up before.  I'm ignoring it for now, unless there's some reason that I
shouldn't.
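
As a rough check on the timing, the "Uptime:" line in the panic output above
can be converted to seconds and compared against a full 24 hours. This is
only a sketch: parse_uptime is a hypothetical helper, and the optional
days/hours/minutes handling is an assumption about the format beyond the one
value shown in this message.

```python
import re
from datetime import timedelta


def parse_uptime(text):
    """Parse an uptime string such as '23h57m10s' into a timedelta.

    Assumption: the d/h/m fields are optional and only seconds are always
    present; this message only shows the h/m/s form.
    """
    match = re.fullmatch(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(\d+)s", text)
    if match is None:
        raise ValueError("unrecognized uptime string: %r" % text)
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# Value copied from the "Uptime:" line in the panic output.
uptime = parse_uptime("23h57m10s")
shortfall = timedelta(hours=24) - uptime

print("uptime in seconds:   ", int(uptime.total_seconds()))
print("seconds short of 24h:", int(shortfall.total_seconds()))
```

For this crash the uptime is 86230 seconds, 170 seconds short of 24 hours,
which is in line with the "floats around a minute or so" observation but
does not by itself say where the missing time goes.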

kgdb backtrace looks like:
(kgdb) bt
#0  0xc0151dde in dumpsys ()
#1  0xc0151baf in boot ()
#2  0xc0151fd4 in poweroff_wait ()
#3  0xc014c5e8 in lockmgr ()
#4  0xc017d776 in vfs_unbusy ()
#5  0xc0180eea in sync ()
#6  0xc015194a in boot ()
#7  0xc0151fd4 in poweroff_wait ()
#8  0xc026085e in trap_fatal ()
#9  0xc0260531 in trap_pfault ()
#10 0xc026011b in trap ()
#11 0xc0180258 in sync_fsync ()
#12 0xc017e8e7 in sched_sync ()
(kgdb) 
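
The offset in the trap's "instruction pointer" line can be matched against
the backtrace to see which frame took the page fault. The sketch below does
that mechanically with the text from this message; the function names
(fault_address, parse_frames, faulting_frame) are made up for illustration
and are not part of kgdb or any FreeBSD tool.

```python
import re

# Text copied from the trap message and kgdb backtrace in this message.
TRAP_LINE = "instruction pointer     = 0x8:0xc0180258"
BACKTRACE = """\
#0  0xc0151dde in dumpsys ()
#1  0xc0151baf in boot ()
#2  0xc0151fd4 in poweroff_wait ()
#3  0xc014c5e8 in lockmgr ()
#4  0xc017d776 in vfs_unbusy ()
#5  0xc0180eea in sync ()
#6  0xc015194a in boot ()
#7  0xc0151fd4 in poweroff_wait ()
#8  0xc026085e in trap_fatal ()
#9  0xc0260531 in trap_pfault ()
#10 0xc026011b in trap ()
#11 0xc0180258 in sync_fsync ()
#12 0xc017e8e7 in sched_sync ()
"""


def fault_address(trap_line):
    """Return the offset part of a 'segment:offset' instruction pointer."""
    match = re.search(r"0x[0-9a-f]+:(0x[0-9a-f]+)", trap_line)
    if match is None:
        raise ValueError("no segment:offset pair found")
    return int(match.group(1), 16)


def parse_frames(backtrace):
    """Yield (frame_number, address, function) for each backtrace line."""
    pattern = re.compile(r"#(\d+)\s+(0x[0-9a-f]+) in (\w+) \(\)")
    for line in backtrace.splitlines():
        match = pattern.match(line)
        if match:
            yield int(match.group(1)), int(match.group(2), 16), match.group(3)


def faulting_frame(trap_line, backtrace):
    """Return (frame_number, function) whose address equals the fault IP."""
    ip = fault_address(trap_line)
    for number, address, function in parse_frames(backtrace):
        if address == ip:
            return number, function
    return None


print(faulting_frame(TRAP_LINE, BACKTRACE))
```

Here the instruction pointer 0xc0180258 is the address of frame #11, so the
fault happened inside sync_fsync(), called from sched_sync() (the syncer).
Frames #0 through #10 are the trap and panic handling that followed it.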


I just compiled the kernel with -g a little while ago, and am waiting for
00:50 EST tomorrow (when it hurls once more) to get a little more info.


Any ideas?

Thanks - Mike




