Date: Sun, 7 Apr 1996 14:19:01 -0400 (EDT)
From: Brian Tao <taob@io.org>
To: FREEBSD-HACKERS-L <freebsd-hackers@freebsd.org>
Subject: 'ps' or procfs stuck in disk wait???
Message-ID: <Pine.NEB.3.92.960407140423.1573a-100000@zot.io.org>

I came across a weird one today. I noticed the load on one of our shell servers was consistently above 1.0 (rare for this machine with only 50 users on it). I tried 'ps aux | head' to get a quick listing of the process chewing up the CPU. No response: I couldn't ^C or ^Z it, and couldn't kill -9 it from another tty. 'ps x' and 'ps u' worked fine for listing my own processes, but I couldn't get a full list with 'ps a'.

I resorted to "top -nu 9999" to see what was going on. There was a runaway vi, which I killed, but the problem persisted. I noticed about three dozen instances of cron, sh, ps and egrep, all paged out. They were spawned from a cron job I have running every five minutes to check on zombie and detached processes. I was able to kill off everything except the ps's. Doing a "ps auxp" on one of those pids revealed it was sitting in disk wait.

I then ran "ps auxp" on each of the pids from the output of 'top'. It hung on a pwd_mkdb process (password files here are regenerated from a master copy every 30 minutes on the shell servers). According to 'top', the process wasn't using any CPU and it was sleeping. 'ps' would hang whenever I pointed it at that pid.

I looked inside /proc/1522 (the procfs directory associated with the pwd_mkdb process) and I was able to cat the status file. Unfortunately, I didn't save it before it was wiped off my xterm by a screen clear. :( The curious thing is that any read operation on the "mem" file would hang. I think this is why 'ps' hangs when trying to retrieve process information.

Any ideas why this would happen? A bug in procfs or the VM system? I've never seen anything like this before. The system will be rebooting itself in about ten minutes, and I doubt I will be able to recreate this problem. Stock 2.1.0R, 128MB physical, 384MB swap, about 8% allocated when I discovered this condition... I'm stumped on this one.
--
Brian Tao (BT300, taob@io.org)
System and Network Administrator, Internex Online Inc.
"Though this be madness, yet there is method in't"
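
The pid-by-pid search described in the message (pointing 'ps auxp' at each pid until one hangs) could be automated roughly as follows. This is only a sketch under assumptions: the helper names `probe` and `find_hanging_pids`, the 5-second timeout, and the `make_cmd` parameter are hypothetical and not part of the original message; only the "ps auxp <pid>" invocation is taken from it. The probe deliberately never waits on a child that has timed out, since a process stuck in disk wait may not respond even to SIGKILL.

```python
import subprocess
import sys
import time


def probe(cmd, timeout=5.0, poll_interval=0.05):
    """Run cmd; return its exit status, or None if it did not exit in time.

    The child is polled rather than waited on, so the caller never blocks
    on a process that is stuck in uninterruptible disk wait.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = proc.poll()
        if status is not None:
            return status
        time.sleep(poll_interval)
    # Best effort only: send SIGKILL but do not wait for the child to die.
    proc.kill()
    return None


def find_hanging_pids(pids, timeout=5.0, make_cmd=None):
    """Return the pids for which the per-pid command failed to finish."""
    if make_cmd is None:
        # Assumption: same invocation as in the message, "ps auxp <pid>".
        make_cmd = lambda pid: ["ps", "auxp", str(pid)]
    return [pid for pid in pids if probe(make_cmd(pid), timeout) is None]


if __name__ == "__main__":
    # Usage (hypothetical script name): python probe_ps.py 1522 1530 1544
    for pid in find_hanging_pids([int(arg) for arg in sys.argv[1:]]):
        print("ps did not return for pid", pid)
```

Any 'ps' that does hang stays behind as a stuck process, just like the ones left by the cron job, so a tool like this only narrows down the offending pid; it does not clear the condition.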