Date:      Tue, 16 Sep 2008 11:30:22 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-stable@freebsd.org
Cc:        Tim Chen <gphoto6@gmail.com>
Subject:   Re: Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000
Message-ID:  <200809161130.22736.jhb@freebsd.org>
In-Reply-To: <1f51039c0809152302s2e6c1471n89588b058069f73d@mail.gmail.com>
References:  <1f51039c0809150857l50b6be8eu848e21189a4175d6@mail.gmail.com> <200809151606.23933.jhb@freebsd.org> <1f51039c0809152302s2e6c1471n89588b058069f73d@mail.gmail.com>

On Tuesday 16 September 2008 02:02:14 am Tim Chen wrote:
> On Tue, Sep 16, 2008 at 4:06 AM, John Baldwin <jhb@freebsd.org> wrote:
> 
> > On Monday 15 September 2008 11:57:02 am Tim Chen wrote:
> > > Currently I am running a mail server using a NetApp filer as backend
> > > storage. From time to time, the whole system gets stuck for 3-5
> > > minutes, but after that everything recovers normally. During the
> > > "stuck" period, 'ps auxw' shows 200-300 mail delivery agent (MDA)
> > > processes in "D" status. The df command does not respond either.
> >
> > Can you use 'ps axl' to determine the wait mesg ("wchan") of the stuck
> > threads when they hang?  If it is "lockf", then make sure you have an
> > up-to-date RELENG_6 kernel as there was a recent fix for a "lockf" hang.
> >
> 
> Thanks for your suggestion. After trying 'ps axl', it seems all the "D
> status" processes were in nfs,nfsreq,nfsreq. Can you give me some hints on
> how to keep digging into the problem?
> 
> My system is RELENG_7 from within the last week; I always make world to
> keep my system up to date.
> 
> 
> >
> > Alternatively, if things are stuck in "nfsreq", it may be useful to use
> > tcpdump to look at the NFS requests your client is making.  nfsstat can
> > also be useful as you can see which counters are increasing during a hang.
> >
> When the system was stuck, the nfsstat counters grew slowly. It seems only
> the read, write, create, and remove RPC counts were increasing.
> 
> As for tcpdump, since I am not familiar with it, I will read some
> documentation and run some tests.
> 
> Thanks very much for your kind help. I hope the problem can be solved soon.

Also, do the nfsstat thing I suggested.  During a hang, you can do something
like 'nfsstat > one ; sleep 1 ; nfsstat > two' and compare the 'one'
and 'two' files to see which counters (if any) are being bumped during the
hang.
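
Comparing the two snapshot files by eye can be tedious, so here is a minimal
sketch of one way to automate it. It is not part of the original advice. It
assumes the nfsstat output consists of section titles ending in ":", rows of
column names, and rows of plain integers directly below them; the helper
names parse_counters and changed_counters are hypothetical. Column names that
contain spaces will not line up with their values, so such rows fall back to
positional labels.

```python
import sys


def parse_counters(text):
    """Map 'section/column' labels to integer values from one snapshot.

    Assumed layout: a title line ending in ':', then header rows of
    column names, each followed by a row of plain integers.
    """
    counters = {}
    section = ""
    header = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if all(f.isdigit() for f in fields):
            # Value row: pair it with the most recent header row.
            if len(fields) == len(header):
                names = header
            else:
                # Header has names with spaces (or is missing):
                # label the values by position instead.
                joined = " ".join(header)
                names = [f"{joined}[{i}]" for i in range(len(fields))]
            for name, value in zip(names, fields):
                counters[f"{section}/{name}"] = int(value)
        elif line.strip().endswith(":"):
            section = line.strip().rstrip(":")
            header = []
        else:
            header = fields
    return counters


def changed_counters(before, after):
    """Return {label: delta} for counters whose value differs."""
    old = parse_counters(before)
    new = parse_counters(after)
    return {k: new[k] - old[k] for k in new if k in old and new[k] != old[k]}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python nfsdiff.py one two
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for label, delta in sorted(changed_counters(f1.read(), f2.read()).items()):
            print(f"{label}: {delta:+d}")
```

Run against the 'one' and 'two' files from the command above, this prints only
the counters that moved between the two snapshots, along with how much each
one changed.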

-- 
John Baldwin


