From owner-freebsd-questions  Thu Apr 12 14:11:57 2001
Date: Thu, 12 Apr 2001 14:11:39 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200104122111.f3CLBdc25565@earth.backplane.com>
To: "S. Natori" <natori@mad.scientist.com>
Cc: FreeBSD-questions@FreeBSD.ORG, FreeBSD-hackers@FreeBSD.ORG
Subject: Re: nfsd hangs in ``inode'' state
References:  <200104120423.NAA15537@mail.fureai.or.jp>

:Hello All,
:
:I am running a FreeBSD-4.2 NFS server with dozens of FreeBSD-4.2 NFS
:clients on 100BaseTX LAN.  Recently I found that when the NFS server
:receives a lot of requests in a short period (e.g., 2 clients start X
:with gnome desktop simultaneously), all nfsd server processes hang in
:inode state.
:
:  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
:    0   440     1   0   2  0   360  132 accept Is    ??    0:00.00 nfsd: master (nfsd)
:    0   441   440   0 -14  0   352  124 inode  D     ??    0:03.49 nfsd: server (nfsd)
:    0   442   440   0 -14  0   352  124 inode  D     ??    0:00.17 nfsd: server (nfsd)
:    0   443   440   0 -14  0   352  124 inode  D     ??    0:00.02 nfsd: server (nfsd)
:    0   444   440   0 -14  0   352  124 inode  D     ??    0:00.01 nfsd: server (nfsd)
:
:I cannot kill or restart them. The consoles of the clients print ``NFS
:server not responding'' and I have to restart the server. This occurs
:about once a week.
:
:I tried
:  (1) increasing the number of nfsd processes (4 -> 8, 20)
:  (2) replacing the server HDD (SCSI) with another ATA33 HDD
:  (3) changing mount_nfs options (tried removing tcp, adding soft,dumbtimer)
:but all failed to solve the problem.

    It sounds like a deadlock somewhere, probably with some other process.

    A full 'ps axlww' would be useful, and also a gdb backtrace of the
    processes in question (including the 'other' process stuck in some
    weird wait state, if you can find it).  You can run gdb against a
    live kernel and get meaningful backtraces if you have the
    kernel.debug image of the kernel available somewhere.

    gdb -k <location-of-kernel.debug-image> /dev/mem
    proc 441
    back
    proc 442
    back
    proc 443
    back
    proc 444
    back
    proc <other-processes-stuck-in-weird-states>
    back

					-Matt
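
Editorial sketch, not part of the original reply: one way to pick the
stuck processes out of a full 'ps axlww' listing is to filter on the STAT
column. The helper name stuck_procs is hypothetical, and the filter
assumes every column before STAT is a single whitespace-free field, as in
the listing quoted above.

```sh
# Hypothetical helper: reads 'ps axlww'-style output on stdin and prints
# the header line plus every row whose STAT field begins with "D"
# (uninterruptible wait, as the hung nfsd processes show above).
stuck_procs() {
    awk '
        NR == 1 {
            # Locate the STAT column by name from the header line.
            for (i = 1; i <= NF; i++) if ($i == "STAT") col = i
            print
            next
        }
        # Print only rows whose STAT field starts with D.
        col && $col ~ /^D/ { print }
    '
}

# Usage on the server (left as a comment so nothing runs here):
#   ps axlww | stuck_procs
```

The PIDs it prints are candidates for the 'proc <pid>' / 'back' steps in
the gdb session above; a process stuck in some other wait state would
still have to be found by reading the full listing.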
