Date:      Wed, 17 Apr 2002 17:55:37 +0200
From:      "Karsten W. Rohrbach" <karsten@rohrbach.de>
To:        "Marc G. Fournier" <scrappy@hub.org>
Cc:        freebsd-stable@FreeBSD.ORG
Subject:   Re: STABLE kernel panicking all too often ...
Message-ID:  <20020417175537.A32675@mail.webmonster.de>
In-Reply-To: <20020417093534.O99298-100000@mail1.hub.org>; from scrappy@hub.org on Wed, Apr 17, 2002 at 09:43:26AM -0300
References:  <20020417034229.D1D82BA05@i8k.babbleon.org> <20020417093534.O99298-100000@mail1.hub.org>

Marc G. Fournier(scrappy@hub.org)@2002.04.17 09:43:26 +0000:
> > Also, I'm sure that this just shows my ignorance, but how can it be the case
> > that the load averages are 67-46 when the CPU is 70% idle?  Those two figures
> > seem to be at odds with each other based on my experience.
>
> It's relatively consistent:
>
> last pid: 13191;  load averages: 34.75, 41.81, 42.68    up 1+04:42:37  07:36:27
> 2904 processes:4 running, 2900 sleeping
> CPU states:  3.2% user,  0.0% nice, 29.3% system,  0.2% interrupt, 67.3% idle
> Mem: 2376M Active, 235M Inact, 285M Wired, 117M Cache, 199M Buf, 4348K Free
> Swap: 3072M Total, 1089M Used, 1982M Free, 35% Inuse, 60K In

just a wild guess, judging from the _massive_ amount of idle time on the
box. a possible scenario:
- you've got plenty of processes running there
- they might be i/o intensive
- your box swaps a lot of memory to/from the disk(s)
- you might have misbehaving storage devices (just an assumption)
- the vm subsystem in this scenario barfs on the number of swapped-out
  pages
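
to put numbers on that guess, here is a rough python sketch that pulls
the interesting figures out of the top(1) summary quoted above. the
function names (parse_sizes, summarize) are made up for illustration,
and the regexes only assume the layout of this one sample, not every
top(1) version.

```python
import re

# summary lines copied from the top(1) output quoted above
TOP_SUMMARY = """\
last pid: 13191;  load averages: 34.75, 41.81, 42.68    up 1+04:42:37  07:36:27
2904 processes:4 running, 2900 sleeping
CPU states:  3.2% user,  0.0% nice, 29.3% system,  0.2% interrupt, 67.3% idle
Mem: 2376M Active, 235M Inact, 285M Wired, 117M Cache, 199M Buf, 4348K Free
Swap: 3072M Total, 1089M Used, 1982M Free, 35% Inuse, 60K In
"""

# multipliers to convert a size suffix into kilobytes
UNIT_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def parse_sizes(line):
    """Return {label: size in KB} for fields such as '2376M Active'."""
    return {
        label: int(num) * UNIT_KB[unit]
        for num, unit, label in re.findall(r"(\d+)([KMG]) (\w+)", line)
    }


def summarize(text):
    """Extract a few headline figures from a top(1) summary block."""
    # index each line by the label in front of its first colon
    by_label = {line.split(":", 1)[0]: line for line in text.splitlines()}
    mem = parse_sizes(by_label["Mem"])
    swap = parse_sizes(by_label["Swap"])
    return {
        "load_1min": float(re.search(r"load averages:\s*([\d.]+)", text).group(1)),
        "running": int(re.search(r"(\d+) running", text).group(1)),
        "idle_pct": float(re.search(r"([\d.]+)% idle", text).group(1)),
        "free_kb": mem["Free"],
        "swap_used_pct": round(100.0 * swap["Used"] / swap["Total"], 1),
    }


if __name__ == "__main__":
    for key, value in summarize(TOP_SUMMARY).items():
        print(f"{key}: {value}")
```

on this sample it reports a 1-minute load of 34.75 with only 4
processes marked running, 67.3% idle cpu, about 35% of swap in use and
roughly 4 MB of free memory. that at least fits the picture of
processes waiting on something other than the cpu.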

things i'd try:
- if possible, limit the max. number of processes so they cannot consume
  all of your ram and the box does not swap excessively to disk; this
  also gives you more inactive pages used for read cache, IIRC
- put in one or two more disks and distribute the swap load over the
  spindles. this would make the box more responsive, anyway
- check your dmesg output/syslog for scsi bus resets or other symptoms
  of bad cabling or broken disk hardware
- try to spread the load over several considerably smaller boxes, if
  possible
- compile a non-SMP kernel and see what happens. you appear to have
  enough cpu time to spare to try that. there might be a driver that
  SMP stumbles over.
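
for the dmesg/syslog check, a minimal python sketch of the idea. the
pattern list is an assumption (a guess at what storage trouble tends to
look like in a log), not an authoritative list of kernel messages, and
find_suspect_lines is a hypothetical helper name.

```python
import re

# ASSUMPTION: these substrings are illustrative guesses at storage-related
# trouble; adjust them to whatever your controller driver actually logs
SUSPECT_PATTERNS = [
    r"bus reset",
    r"timed out",
    r"parity error",
    r"medium error",
    r"hard error",
]
SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if SUSPECT_RE.search(line)
    ]
```

feed it any iterable of log lines, e.g. an open file handle on a saved
copy of your dmesg output or on /var/log/messages, and look at what
comes back. an empty result only means none of the guessed patterns
matched, not that the hardware is healthy.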

those points are from an operations perspective: quick fixes that work
around the actual problem, not the kernel hacker's view of how to
"make things right".

as another wild guess i'd say there's some limit in the vm subsystem
that the kernel hits in a kind of race condition, caused by misbehaving
hardware combined with large amounts of ram and swap on an SMP platform.
that's quite a big box you've got there.

regards,
/k

-- 
> I'm not as think as you stoned I am.
KR433/KR11-RIPE -- WebMonster Community Founder -- nGENn GmbH Senior Techie
http://www.webmonster.de/ -- ftp://ftp.webmonster.de/ -- http://www.ngenn.net/
GnuPG 0x2964BF46 2001-03-15 42F9 9FFF 50D4 2F38 DBEE  DF22 3340 4F4E 2964 BF46
My mail is GnuPG signed -- Unsigned ones are bogus -- http://www.gnupg.org/
Please do not remove my address from To: and Cc: fields in mailing lists. 10x

