From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: Tim Chen
Date: Tue, 16 Sep 2008 11:30:22 -0400
Subject: Re: Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000
Message-Id: <200809161130.22736.jhb@freebsd.org>
In-Reply-To: <1f51039c0809152302s2e6c1471n89588b058069f73d@mail.gmail.com>

On Tuesday 16 September 2008 02:02:14 am Tim Chen wrote:
> On Tue, Sep 16, 2008 at 4:06 AM, John Baldwin wrote:
>
> > On Monday 15 September 2008 11:57:02 am Tim Chen wrote:
> > > Currently I was running a mail server using a netapp filer as backend
> > > storage.
> > > From time to time, the whole system get stuck and lasted for 3-5
> > > minutes. But after that, everything recovers normally. During the
> > > "stuck" moment, using ps auxw shows 200-300 of mail delivery
> > > agent(MDA) processes staying in "D" status.
> > > The command df certainly does not respond either.
> >
> > Can you use 'ps axl' to determine the wait mesg ("wchan") of the stuck
> > threads when they hang? If it is "lockf", then make sure you have an
> > up-to-date RELENG_6 kernel as there was a recent fix for a "lockf" hang.
> >
>
> Thanks for your suggestion. After trying to 'ps axl', it seems all the "D
> status" process were in nfs,nfsreq,nfsreq. Can you give some hint how to
> keep delving the problem?
>
> My system is RELENG_7 within one week, I always make world to keep my system
> up to date.
>
> > Alternatively, if things are stuck in "nfsreq", it may be useful to use
> > tcpdump to look at the NFS requests your client is making. nfsstat can
> > also be useful as you can see which counters are increasing during a hang.
> >
>
> When system was stuck, counters of nfsstat grows slowly. It seems only
> read, write, create, remove in RPC counts were increased.
>
> As to tcpdump, since I am not familiar with that, I will try to read some
> doc and make some tests.
>
> Thanks very much for your kindly help. Hope the problem can be solved soon.

Also, do the nfsstat thing I suggested.
During a hang, you can do something like 'nfsstat > one ; sleep 1 ; nfsstat > two' and compare the 'one' and 'two' files to see which counters (if any) are being bumped during the hang.

-- 
John Baldwin
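
To make the comparison of the 'one' and 'two' files less error-prone than reading them side by side, here is a minimal Python sketch that reports only the counters that changed between two nfsstat snapshots. It is an illustration, not part of the original advice. It assumes the snapshot text consists of section lines ending in ":" followed by a row of counter names and then a row of numbers in the same column order. The function names parse_counters and changed_counters are hypothetical. Rows whose headings contain a space (so the number of names and numbers differ) are skipped rather than guessed at.

```python
import sys


def parse_counters(text):
    """Return {"Section/Name": value} for each name row followed by a number row.

    Assumption: a line ending in ':' starts a section, and a row of names is
    immediately followed by a row holding the same number of integers.
    """
    counters = {}
    section = ""
    prev = []  # tokens of the most recent non-numeric row
    for line in text.splitlines():
        tokens = line.split()
        if line.rstrip().endswith(":"):
            section = line.strip().rstrip(":")
            prev = []
            continue
        is_values = bool(tokens) and all(t.isdigit() for t in tokens)
        # Pair names with numbers only when the columns line up one-to-one.
        if is_values and len(prev) == len(tokens):
            for name, tok in zip(prev, tokens):
                counters[f"{section}/{name}"] = int(tok)
        prev = [] if is_values else tokens
    return counters


def changed_counters(before, after):
    """Return {counter: delta} for counters whose value differs between snapshots."""
    old = parse_counters(before)
    new = parse_counters(after)
    return {k: new[k] - old[k] for k in new if k in old and new[k] != old[k]}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (script name is hypothetical): python nfsdiff.py one two
    with open(sys.argv[1]) as f_one, open(sys.argv[2]) as f_two:
        deltas = changed_counters(f_one.read(), f_two.read())
    for name, delta in sorted(deltas.items()):
        print(f"{name}: {delta:+d}")
```

Anything the script prints is a counter that moved during the one-second window. An empty result during a hang would suggest that no requests completed at all in that interval.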