Date:      Sat, 11 Oct 2008 12:06:08 -0700
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Tuc <ml@t-b-o-h.net>
Cc:        freebsd-questions@FreeBSD.org
Subject:   Re: Worth persuing a KDB: stack backtrace: ?
Message-ID:  <20081011190608.GA70292@icarus.home.lan>
In-Reply-To: <200810111742.m9BHg9pK099087@setup.house.tucs-beachin-obx-house.com>
References:  <20081011152906.GB65652@icarus.home.lan> <200810111742.m9BHg9pK099087@setup.house.tucs-beachin-obx-house.com>

On Sat, Oct 11, 2008 at 01:42:09PM -0400, Tuc wrote:
> > 
> > On Sat, Oct 11, 2008 at 11:22:21AM -0400, Tuc wrote:
> > > 	I have a 5.5-STABLE laptop that's been having issues lately, mostly
> > > related to memory. I bought new chips, and I think I narrowed it down to one
> > > of the Dimm slots being bad. I did a memtest for 25 hours and it seemed stable.
> > 
> > memtest86+ would definitely detect a DIMM slot being bad, so running it
> > for 25 hours successfully means the DIMM and the DIMM slot are likely fine.
> >
> 	Sorry, 2 corrections. :)  
> 
> 	1) I ran memtest86+ 3.4
> 	2) If I tried to run the memory test on the slot I thought was bad, 
> it looked like whatever is underneath the memtest86+ (Linux?) would crap out
> within 6 seconds of startup. I've only tested the "B" slot after running into
> so many issues with the "A" slot (And I checked, you can run one slot only
> on this laptop... Dell Inspiron 8200)

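Not quite: memtest86 and memtest86+ are standalone programs that boot
directly, with no operating system underneath, though their boot code
was originally derived from Linux.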

> > > I started up and started downloading a backup of over 5K emails. (All have
> > > to go through mimedefang, procmail and sendmail, so the system was a bit
> > > sluggish.) I started to get things like:
> > > 
> > > Oct 10 22:06:29 himinbjorg kernel: KDB: stack backtrace:
> > > Oct 10 22:06:29 himinbjorg kernel: kdb_backtrace(c3053200,1,dbb54c04,dbb54bf0,c073ba78) at kdb_backtrace+0x29
> > > Oct 10 22:06:29 himinbjorg kernel: getdirtybuf(dbb54be0,0,1,cc1c81e8,1) at getdirtybuf+0x27
> > > Oct 10 22:06:29 himinbjorg kernel: flush_deplist(c305354c,1,dbb54c04) at flush_deplist+0x34
> > > Oct 10 22:06:29 himinbjorg kernel: flush_inodedep_deps(c216d000,715e,c089bcf8,c0808b16,ef) at flush_inodedep_deps+0x7d
> > > Oct 10 22:06:29 himinbjorg kernel: softdep_sync_metadata(dbb54ca0) at softdep_sync_metadata+0x8c
> > > Oct 10 22:06:29 himinbjorg kernel: ffs_fsync(dbb54ca0) at ffs_fsync+0x33e
> > > Oct 10 22:06:29 himinbjorg kernel: fsync(c3549600,dbb54d04,1,1,286) at fsync+0x103
> > > Oct 10 22:06:29 himinbjorg kernel: syscall(2f,2f,bfbf002f,80fef20,0) at syscall+0x227
> > > Oct 10 22:06:29 himinbjorg kernel: Xint0x80_syscall() at Xint0x80_syscall+0x1f
> > > Oct 10 22:06:29 himinbjorg kernel: --- syscall (95, FreeBSD ELF32, fsync), eip = 0x28181ca7, esp = 0xbfbf6d1c, ebp = 0xbfbf86e8 ---
> > 
> > This looks more like a filesystem problem, not a memory problem.  All
> > of the functions listed in the backtrace show UFS/FFS problems and
> > filesystem metadata issues of some kind.
> >
> 	Yup, sorry. I should have said that I realized it was a disk issue, but
> that I was thinking that maybe it wasn't really the OS's fault, and that there are
> "deeper" problems with the laptop that could be manifesting themselves here.
> When I sent all sorts of copious debug to Dell they just told me to replace
> the motherboard completely. I didn't want to spend the $$. At first all the
> problems manifested as memory, memtest86+ would lock up, "crash", etc with the
> memory I had. I bought new memory ($50 for a gig) BEFORE Dell just told me to
> replace the whole motherboard. With the new memory in, only using the "B" slot,
> it appeared more stable. I found these issues only accidentally. I tend to
> type "dmesg" when my fingers are idle and my brain is spinning thinking of
> something and that's when I saw this cruft.

Okay, so now we're talking about disk issues.  So all we've confirmed at
this point is 1) you have bad memory or a bad RAM slot, and 2) you have
a disk that has problems, a corrupted filesystem, or both.

I suppose it's possible for filesystem corruption to occur due to bad
memory, but it's much more likely that you'd experience a kernel panic
before that even had a chance of happening.

At this point I'm really not sure what to tell you, or if FreeBSD can
even help.  You've confirmed you have bad hardware, so the solution at
this point should be obvious.  :-)

> > Boot the machine in single-user mode and run "fsck -y".  I'm betting
> > you'll find errors.  If not, then it's probably a kernel bug -- see
> > below, however.
> >
> 	Probably could use it, yea. It had locked up a few times so I'm
> sure the filesystems weren't in great shape. I thought I had offline 
> fsck'd them, but now not so sure.

I'd recommend setting background_fsck="no" in /etc/rc.conf in the
future.  Backgrounded fsck does not catch all filesystem errors; this
has been discussed very thoroughly on the -stable list.
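
For reference, a minimal sketch of that setting as it would appear in
/etc/rc.conf.  The comment lines are mine and only illustrative; the
variable name and value are the ones mentioned above.

```sh
# /etc/rc.conf
# Run the boot-time filesystem check in the foreground, before the
# filesystems are mounted read-write, instead of in the background.
background_fsck="no"
```

This only affects the check done at boot.  It does not replace a manual
"fsck -y" from single-user mode on filesystems already suspected of
being damaged.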

> > I doubt you're going to get much support on this, since you're running
> > FreeBSD 5.5, which is no longer supported.  Believe me: you will get
> > continual push-back from the rest of the FreeBSD developers when asking
> > for support on 5.5.  The RELENG_6 series is on its way out as well, so
> > you should consider installing RELENG_7 (specifically 7.1-BETA at
> > this point).
> > 
> 	Well, that was 1/2 of the reason why I asked if it was even
> worth it to trace it out. 1/2 was the fact it's 5.5, the other 1/2 was
> that I've already been told to replace the motherboard. :)
> 
> 	I tried going to 6.X on this machine for a few weeks once before, 
> constantly locked up in the booting of the kernel. I haven't had a spare 
> second otherwise to consider going to 7. I didn't think anyone would
> really want to help on 5.5, but figured I'd toss it out there and see if
> anyone thought it worth while.

I don't think anyone will be very willing to help with this now that
you've confirmed the system has bad memory, possibly a bad DIMM slot,
and very likely filesystem corruption.

As I said above, I think the solution at this point is obvious.  :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



