Date:      Thu, 23 Dec 2004 10:26:21 +0000 (GMT)
From:      Robert Watson <rwatson@freebsd.org>
To:        Benjamin Lutz <benlutz@datacomm.ch>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: slow system freeze
Message-ID:  <Pine.NEB.3.96L.1041223102130.89131C-100000@fledge.watson.org>
In-Reply-To: <200412230408.48770.benlutz@datacomm.ch>

On Thu, 23 Dec 2004, Benjamin Lutz wrote:

> I'm having a problem with FreeBSD 5.3 here. The system slowly freezes.
> 
> It starts with one application that just locks up. Other applications 
> still work, but when I switch to them and do stuff in them, they usually 
> lock up after a few seconds as well. Starting new processes or logging in 
> at a physical console does not work anymore, and after about 30 secs the 
> whole system is frozen. Nothing is printed to the first physical console 
> or the logs. This has happened both under load and while the system was 
> mostly idle (just me irc'ing).
> 
> Now, I realize that this description is very vague, but maybe you can tell 
> me how to even start debugging this? There's no panic, i.e. no kernel dump 
> I could analyze.
> 
> I'm no kernel developer, but if I had to guess it sounds like a scheduler 
> problem, i.e. some table being overwritten.
> 
> I've attached my dmesg for reference.

These symptoms are fairly typical of a deadlock: a leaked lock, a literal
lock deadlock, or a resource deadlock.  Compile your kernel with the KDB
and DDB options; then, once the system starts to wedge, get to the console
(either by switching away from X or via a serial console), break into the
debugger, and run the following:

  ps
  show threads
  show lockedvnods

You might also try building with INVARIANTS and WITNESS support, and see
if the failure mode becomes an assertion failure instead of a wedge.  With
WITNESS compiled in, you can also get more extensive debugging information
using "show locks" and "show witness".
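
As a rough sketch, the kernel configuration lines for the options named
above would look something like the fragment below.  The option names are
the ones I would expect in a 5.x kernel config; the comments are my own
summaries, so check both against the NOTES file for your release before
relying on them.

```
options 	KDB			# kernel debugger framework
options 	DDB			# interactive in-kernel debugger
options 	INVARIANTS		# extra run-time sanity checks
options 	INVARIANT_SUPPORT	# support code required by INVARIANTS
options 	WITNESS			# lock order verification
```

Expect INVARIANTS and WITNESS to slow the system down noticeably, so they
are best used while chasing a problem, not left in permanently.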

Ideally, with a serial console, you can copy and paste the results of
these commands into an e-mail.  Without a serial console it's a bit more
laborious, since you have to copy things down by hand.  Either way, what
you're looking for is lots of threads blocked on similar wait channels in
the ps output.  The output looks like this:

db> ps
  pid   proc     uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
  586 c168adc8    0   585   585 0000002 [SLPQ ttyin 0xc13e1c10][SLP] cu
  585 c16ca000    0   559   585 0004002 [SLPQ ttyin 0xc13e5410][SLP] cu
  559 c16867e0    0   558   559 0004002 [SLPQ pause 0xc1686814][SLP] csh
  558 c16869d8    0     1   558 0004102 [SLPQ wait 0xc16869d8][SLP] login
  557 c1686bd0    0     1   557 0004002 [SLPQ ttyin 0xc13ee810][SLP] getty
  556 c1686dc8    0     1   556 0004002 [SLPQ ttyin 0xc13f4c10][SLP] getty
                                           ^^^^^^^^^^^^^^ this stuff

What we want to know is which entries in the "wmesg" column are most
common, particularly for processes that are known to be wedged.  If you
are copying this down by hand, we don't need the output of "show threads",
but knowing how many lines, and what sort of lines, appear in the output
of "show lockedvnods" would be useful.
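
If you do manage to capture the ps output as text (for example from a
serial console log), the tallying can be automated.  The following is only
a minimal sketch in Python: it assumes the "[SLPQ wmesg 0xaddr]" layout
shown in the sample above, and the names count_wmesg and SAMPLE are made up
for illustration.  Other releases may format the ps output differently.

```python
import re
from collections import Counter

# Matches the bracketed sleep-queue field, e.g. "[SLPQ ttyin 0xc13e1c10]".
# Group 1 is the wmesg ("ttyin"); group 2 is the wchan address.
SLPQ_RE = re.compile(r"\[SLPQ\s+(\S+)\s+(0x[0-9a-fA-F]+)\]")


def count_wmesg(ps_output):
    """Return a Counter mapping each wmesg to the number of lines showing it."""
    counts = Counter()
    for line in ps_output.splitlines():
        match = SLPQ_RE.search(line)
        if match:  # header lines and non-sleeping entries are skipped
            counts[match.group(1)] += 1
    return counts


# The sample ps output from the message above, used as test input.
SAMPLE = """\
  pid   proc     uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
  586 c168adc8    0   585   585 0000002 [SLPQ ttyin 0xc13e1c10][SLP] cu
  585 c16ca000    0   559   585 0004002 [SLPQ ttyin 0xc13e5410][SLP] cu
  559 c16867e0    0   558   559 0004002 [SLPQ pause 0xc1686814][SLP] csh
  558 c16869d8    0     1   558 0004102 [SLPQ wait 0xc16869d8][SLP] login
  557 c1686bd0    0     1   557 0004002 [SLPQ ttyin 0xc13ee810][SLP] getty
  556 c1686dc8    0     1   556 0004002 [SLPQ ttyin 0xc13f4c10][SLP] getty
"""

if __name__ == "__main__":
    # Print the most common wait messages first.
    for wmesg, n in count_wmesg(SAMPLE).most_common():
        print(f"{n:4d}  {wmesg}")
```

On a healthy system the list is dominated by ordinary waits such as ttyin;
in a wedge, the interesting entries are the ones that many otherwise
unrelated processes suddenly share.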

You can find some reasonable documentation on how to get started on kernel
debugging in the handbook.  I'm not sure it addresses live debugging via
DDB in great detail, so I guess I'll take a look and flesh it out some
over the holidays if there isn't enough information there.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research



