From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Adam McDougall
Subject: Re: pointyhat panic
Date: Tue, 16 Jun 2009 08:12:48 -0400
Message-Id: <200906160812.49039.jhb@freebsd.org>
In-Reply-To: <20090616005810.GE1111@egr.msu.edu>
References: <1242075474.72992.118.camel@hood.oook.cz> <4A36B6D8.8000701@FreeBSD.org> <20090616005810.GE1111@egr.msu.edu>

On Monday 15 June 2009 8:58:11 pm Adam McDougall wrote:
> On Mon, Jun 15, 2009 at 10:02:16PM +0100, Kris Kennaway wrote:
>
> Pav Lucistnik wrote:
> > panic: mtx_lock() of destroyed mutex @ /usr/src/sys/rpc/clnt_vc.c:953
> > cpuid = 2
> > KDB: enter: panic
> > [thread pid 0 tid 100029 ]
> > Stopped at kdb_enter+0x3d: movq $0,0x3f5fb8(%rip)
> > db> bt
> > Tracing pid 0 tid 100029 td 0xffffff00018e1000
> > kdb_enter() at kdb_enter+0x3d
> > panic() at panic+0x17b
> > _mtx_lock_flags() at _mtx_lock_flags+0xc5
> > clnt_vc_soupcall() at clnt_vc_soupcall+0x273
> > sowakeup() at sowakeup+0xf8
> > tcp_do_segment() at tcp_do_segment+0x23c9
> > tcp_input() at tcp_input+0x9ec
> > ip_input() at ip_input+0xbc
> > ether_demux() at ether_demux+0x1ed
> > ether_input() at ether_input+0x171
> > em_rxeof() at em_rxeof+0x201
> > em_handle_rxtx() at em_handle_rxtx+0x4b
> > taskqueue_run() at taskqueue_run+0x96
> > taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
> > fork_exit() at fork_exit+0x12a
> > fork_trampoline() at fork_trampoline+0xe
> > --- trap 0, rip = 0, rsp = 0xffffffff240a6d40, rbp = 0 ---
> >
> > The box is in kdb on serial console for now. May 9 -CURRENT, I think.
> >
>
> This happened again. The trigger was this (^C of a find on a busy
> netapp volume with a lot of other concurrent nfs traffic to the same
> mountpoint):
>
> pointyhat# find . -name \*.bz2 -mmin -10
> ^Cnfs server dumpster:/vol/vol4/pointyhat: not responding
> nfs server dumpster:/vol/vol4/pointyhat: not responding
> nfs server dumpster:/vol/vol4/pointyhat: not responding
> nfs server dumpster:/vol/vol4/pointyhat: not responding
> nfs server dumpster:/vol/vol4/pointyhat: not responding
> nfs server dumpster:/vol/vol4/pointyhat: not responding
> nfs server dumpster:/vol/vol4/pointyhat: not responding
> nfs server dumpster:/vol/vol4/pointyhat: not responding
> nfs server dumpster:/vol/vol4/pointyhat: not responding
> nfs server dumpster:/vol/vol4/pointyhat: not responding
> nfs server dumpster:/vol/vol4/pointyhat: not responding
> nfs server dumpster:/vol/vol4/pointyhat: not responding
> load: 4.54 cmd: find 93357 [rpccon] 11.19u 111.62s 0% 4848k
>
> About 5-10 minutes later the machine panicked. I'll try updating to a
> newer -CURRENT.
>
> Kris
>
> This sounds like nearly exactly the same symptoms I noticed on
> a -current machine a few months ago, I was doing a du on a
> nfs mount, decided to ctrl-c it, got the not responding for a
> while and a few minutes after the system paniced. I hadn't
> had a chance to report it yet but I did find a workaround,
> it is stable if I remove "intr" from the NFS mount options.
> Hope this helps a little.

These should be fixed in the latest HEAD. It would be good to re-enable
"intr" and test it before 8.0 is released.

--
John Baldwin