Date:      Sun, 7 Apr 1996 14:19:01 -0400 (EDT)
From:      Brian Tao <taob@io.org>
To:        FREEBSD-HACKERS-L <freebsd-hackers@freebsd.org>
Subject:   'ps' or procfs stuck in disk wait???
Message-ID:  <Pine.NEB.3.92.960407140423.1573a-100000@zot.io.org>

    I came across a weird one today.  I noticed the load on one of our
shell servers was consistently above 1.0 (rare for this machine with
only 50 users on it).  I tried 'ps aux | head' to get a quick listing
of the process chewing up the CPU.  No response, can't ^C or ^Z, can't
kill -9 it from another tty.

    'ps x' and 'ps u' worked fine for listing my own processes, but I
couldn't get a full list with 'ps a'.  I resorted to "top -nu 9999" to
see what was going on.  There was a runaway vi which I killed, but the
problem persisted.  I noticed about three dozen instances of cron, sh,
ps and egrep, all paged out.  They were spawned from a cron job I have
running every five minutes to check on zombie and detached processes.
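
    As an illustration only, the zombie half of a check like that could
look roughly like the sketch below.  This is not the actual cron job:
list_zombies is a hypothetical helper, it uses awk where the real job
used egrep, and the ps options in the comment are an assumption that
may need adjusting for the local ps.

```shell
# list_zombies: read "PID STAT COMMAND" lines on stdin and print the PID
# and command name of every process whose state starts with Z (zombie).
# The header line is skipped naturally because "STAT" does not match.
list_zombies() {
    awk '$2 ~ /^Z/ { print $1, $3 }'
}

# Possible use from cron (assumed ps keywords, not taken from the post):
#   ps -axo pid,stat,comm | list_zombies
```

Because the filter only reads stdin, it can be tried on saved ps
output without calling ps again on a machine where ps itself hangs.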

    I was able to kill off everything except the ps's.  Doing a "ps
auxp" on one of the pid's revealed it was sitting in disk wait.  I then
called "ps auxp" on each of the pid's from the output of 'top'.  It
hung on a pwd_mkdb process (password files here are regenerated from a
master copy every 30 minutes on the shell servers).  According to
'top', the process wasn't using any CPU and it was sleeping.  'ps'
would hang whenever I pointed it at that pid.

    I looked inside /proc/1522 (the procfs directory associated with
the pwd_mkdb process) and I was able to cat the status file.
Unfortunately, I didn't save it before it was wiped off my xterm by a
screen clear.  :(  The curious thing is that any read operation on the
"mem" file would hang.  I think this is why 'ps' hangs when trying to
retrieve process information.
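
    One way to test a suspect file such as the "mem" file without
losing the shell is to do the read in a background child and stop
waiting after a few seconds.  The sketch below is illustrative only:
probe_read is a hypothetical helper, the 5-second default is arbitrary,
and it assumes a POSIX sh with mktemp and dd available.  A child stuck
in disk wait cannot be killed, so the function makes no attempt to do
so and simply stops waiting for it.

```shell
# probe_read FILE [SECONDS]
# Try to read one byte from FILE in a background child and print:
#   ok     the read returned successfully
#   error  the read returned with a failure
#   hung   the child did not report back within SECONDS (default 5)
probe_read() {
    file=$1
    limit=${2:-5}
    result=$(mktemp) || return 1

    # The child writes its verdict to a scratch file.  Its stdout and
    # stderr are redirected away from the caller, so a stuck child
    # cannot block a pipe or command substitution that is reading us.
    (
        if dd if="$file" of=/dev/null bs=1 count=1 2>/dev/null; then
            echo ok
        else
            echo error
        fi
    ) >"$result" 2>/dev/null &

    # Poll once per second until the verdict appears or time runs out.
    waited=0
    while [ ! -s "$result" ] && [ "$waited" -lt "$limit" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    if [ -s "$result" ]; then
        cat "$result"
    else
        echo hung
    fi
    rm -f "$result"
}
```

Calling it on /proc/1522/status and then on /proc/1522/mem would show
which file blocks.  An "error" result still means the read returned,
so only "hung" points at the condition described above.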

    Any ideas why this would happen?  A bug in procfs or the VM
system?  I've never seen anything like this before.  The system will
be rebooting itself in about ten minutes, and I doubt I will be able
to recreate this problem.

    Stock 2.1.0R, 128MB physical, 384MB swap, about 8% allocated when
I discovered this condition... I'm stumped on this one.
--
Brian Tao (BT300, taob@io.org)
System and Network Administrator, Internex Online Inc.
"Though this be madness, yet there is method in't"



