From owner-freebsd-stable@FreeBSD.ORG Tue Oct 30 22:51:20 2007
Return-Path:
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
        by hub.freebsd.org (Postfix) with ESMTP id F0FDF16A417 for ;
        Tue, 30 Oct 2007 22:51:20 +0000 (UTC)
        (envelope-from chris#@1command.com)
Received: from mail.1command.com (mail.1command.com [75.160.109.226])
        by mx1.freebsd.org (Postfix) with ESMTP id 7B0E213C48E for ;
        Tue, 30 Oct 2007 22:51:20 +0000 (UTC)
        (envelope-from chris#@1command.com)
Received: from mail.1command.com (localhost.1command.com [127.0.0.1])
        by mail.1command.com (8.13.3/8.13.3) with ESMTP id l9UMp1fn043474;
        Tue, 30 Oct 2007 14:51:09 -0800 (PST)
        (envelope-from chris#@1command.com)
Received: (from www@localhost)
        by mail.1command.com (8.13.3/8.13.3/Submit) id l9UMp1eR043473;
        Tue, 30 Oct 2007 14:51:01 -0800 (PST)
        (envelope-from chris#@1command.com)
Received: from hitme.hitometer.net (hitme.hitometer.net [75.160.109.235])
        by webmail.1command.com (H.R. Communications Messaging System)
        with HTTP; Tue, 30 Oct 2007 14:51:01 -0800
Message-ID: <20071030145101.8oyf6b1wws0ksoc0@webmail.1command.com>
X-Priority: 3 (Normal)
Date: Tue, 30 Oct 2007 14:51:01 -0800
From: "Chris H."
To: Kris Kennaway
References: <20071004165755.GA1049@pp.htv.fi> <47120D83.1010703@FreeBSD.org>
        <20071015203202.GA17964@pp.htv.fi>
        <20071016004637.GA79351@cdnetworks.co.kr>
        <20071016185714.GB2186@pp.htv.fi>
        <20071016130146.pfyan4vs5cwgsoc0@webmail.1command.com>
        <20071016202251.GC4047@lava.net> <47151FF7.2080501@FreeBSD.org>
        <20071019011316.5ffmycud8g0oggsg@webmail.1command.com>
        <47278CF6.6000403@FreeBSD.org>
In-Reply-To: <47278CF6.6000403@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
User-Agent: H.R. Communications Internet Messaging System (HCIMS) 4.1
        Professional (not for redistribution) / FreeBSD-5.5
Cc: freebsd-stable@FreeBSD.org
Subject: Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 30 Oct 2007 22:51:21 -0000

Quoting Kris Kennaway:

> Chris H. wrote:
>> Quoting Kris Kennaway:
>>
>>> Clifton Royston wrote:
>>>> On Tue, Oct 16, 2007 at 01:01:46PM -0700, Chris H. wrote:
>>>>> excerpt from this list titled: NFS == lock && reboot, that I
>>>>> posted follows:
>>>>>
>>>>> ------8<---SNIP---8<-----SNIP-----8<-------
>>>>> # uname -a
>>>>> FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri
>>>>> Jan 26 16:27:14 PST 2007
>>>>>
>>>>> Greetings,
>>>>> Does anyone know when NFS and friends will be working again? I
>>>>> haven't been able
>>>>> to /safely/ use it from 4.8 on. I remember some talk on the list
>>>>> sometime ago and
>>>>> then it seemed to be resolved, as the discussion ended. So I
>>>>> thought it was
>>>>> fixed. Seems not. :(
>>>>>
>>>>> My scenario;
>>>>> mount host off root:
>>>>> mount script exec'd follows...
>>>>>
>>>>> #!/bin/sh -
>>>>> mount -t nfs host.domain.tld:/ /host
>>>>> mount -t nfs host.domain.tld:/var /host/var
>>>>>
>>>>> confirm mount...
>>>>>
>>>>> # ls /host
>>>>> .snap COPYRIGHT bin
>>>>> ...
>>>>> usr var tmp
>>>>>
>>>>> OK looks good...
>>>>>
>>>>> # cp /path/to/approx/10Mb/file /host/path/to/dest/dir/
>>>>>
>>>>> Fatal double fault
>>>>> eis 0x0blah
>>>>> eiblah blah0x
>>>>> panic double fault
>>>>> no dump device defined
>>>>> rebooting in 15sec...
>>>>>
>>>>> Hmmm... that's not good. :(
>>>>>
>>>>> ------8<---SNIP---8<-----SNIP-----8<-------
>>>>>
>>>>> My final solution was to change the lines in /etc/rc.conf
>>>>> from:
>>>>> nfs_client_enable="YES"
>>>>> nfs_reserved_port_only="YES"
>>>>> nfs_server_enable="YES"
>>>>> rpc_lockd_enable="YES"
>>>>> rpc_statd_enable="YES"
>>>>> rpcbind_enable="YES"
>>>>>
>>>>> to:
>>>>> nfs_client_enable="YES"
>>>>> nfs_reserved_port_only="YES"
>>>>> nfs_server_enable="YES"
>>>>> #rpc_lockd_enable="YES"
>>>>> #rpc_statd_enable="YES"
>>>>> rpcbind_enable="YES"
>>>>>
>>>>> Making those changes ended the "Fatal double fault && reboot in
>>>>> 15 seconds..."
>>>>
>>>> Thanks for this very timely mention! The cluster of servers I am
>>>> about to upgrade from 4.8 to 6.2 relies heavily on
>>>> NFS to an old Netapp. If I have got to disable rpc_lockd and
>>>> rpc_statd, it's good to know that now!
>>>> Can I ask, can anybody confirm that they're running 6.2 on NFS
>>>> successfully *with* lockd and statd?
>>>
>>> Er, yes, of course it does. The old message he is quoting is bogus
>>> on its own,
>> While I'll grant you that I haven't *yet* found/taken the time to create a
>> dump device and re-enable rpc_lockd && rpc_statd && cp 10Mb file to mount
>> point to produce an *instantaneous* "Fatal double fault". I don't think it's
>> fair to label my original post entirely /bogus/ - especially in light of
>> the recent post I replied to. Which seems to have some very common ground.
>> I should probably mention that since my last posting (my original thread),
>> I have some 20+ RELENG_6_2 boxen that *do* have rpc_lockd + rpc_statd
>> enabled. Yet none of them produce a "Fatal double fault". They are all
>> Tyan SMP boards with dual onboard fxp's - as opposed to the Nvidia UP
>> which has a single onboard nve. They are all inter-connected via NFS.
>> I have a 750Gb drive hanging off the /problematic/ Nvidia board, that I
>> had intended to use for NFS back-up's. But given the NFS issue I had with
>> it, it didn't seem to be the best solution. If anyone felt like throwing
>> me a "cheat sheet" for creating a dump device out of that drive and a
>> "quickie" for producing a backtrace. I'm sure I'd be better able to find
>> the required time to produce the required information. I'm sorry. It's
>> just that I'm a hundred million miles away from that right now. As I've
>> been building several large web applications, and their deadline is fast
>> approaching. FWIW I bounced all the servers today, and therefore have
>> recent /verbose/ dmesg's. Should any of the information they provide, be
>> of any help/use to anyone.
>>
>> Take care. :)
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>
> It's very unlikely NFS is relevant to the problem (which is what made
> it bogus, together with the lack of debugging) and likely that nve is
> the cause. The above URL explains in detail how to obtain the
> necessary debugging to confirm this.
>
> Kris
>
>

Thank you Kris,
I was recently able to find a small window in my workload, so I decided
to use it to provide the "non-bogus" ;) information needed. After reading:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

and:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html

a few days ago, I was unclear on only one point in setting up the required
environment. So I posted my question to the list, "dumpdev question
(probably stupid)", which Andrey V. Elsukov immediately responded to.
I'll be creating a crash dump in the next couple of days. So if it's not
already abundantly clear that this is the first time I've attempted to
produce this information - now would be the perfect time to /enlighten/
me as to anything you can think of that will ensure you get the
information you're looking for. :)

Thank you again for your reply.

--Chris

-- 
panic: kernel trap (ignored)
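
For anyone following the thread, the crash-dump setup discussed above can be sketched roughly as follows. This is only a dry-run sketch: it prints the rc.conf lines and the commands instead of running them. The device name /dev/ad4s1b and the debug kernel path are assumptions, not values from this thread; dumpon(8), savecore(8) and kgdb(1) are the stock FreeBSD tools, and the dump device is normally a swap partition at least as large as RAM.

```shell
#!/bin/sh
# Dry-run sketch: prints the steps for capturing a crash dump, runs nothing.
# DUMPDEV and KERNEL_DEBUG are hypothetical placeholders; substitute your own
# swap partition and the kernel.debug built alongside the running kernel.
DUMPDEV="${DUMPDEV:-/dev/ad4s1b}"
DUMPDIR="${DUMPDIR:-/var/crash}"
KERNEL_DEBUG="${KERNEL_DEBUG:-/usr/obj/usr/src/sys/GENERIC/kernel.debug}"

# Lines to add to /etc/rc.conf so the dump device is configured at boot.
rc_conf_lines() {
    printf 'dumpdev="%s"\n' "$DUMPDEV"
    printf 'dumpdir="%s"\n' "$DUMPDIR"
}

# Commands to run by hand, in order.
manual_steps() {
    # Arm the dump device immediately, without waiting for a reboot.
    printf 'dumpon %s\n' "$DUMPDEV"
    # After the panic and reboot, copy the dump out of swap into DUMPDIR.
    printf 'savecore %s %s\n' "$DUMPDIR" "$DUMPDEV"
    # Open the saved dump; "bt" at the kgdb prompt prints the backtrace.
    printf 'kgdb %s %s/vmcore.0\n' "$KERNEL_DEBUG" "$DUMPDIR"
}

rc_conf_lines
manual_steps
```

With dumpdev set in rc.conf, the boot scripts normally run savecore on their own, so the manual savecore step is only needed when that did not happen.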