Date:      Tue, 30 Oct 2007 14:51:01 -0800
From:      "Chris H." <chris#@1command.com>
To:        Kris Kennaway <kris@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Message-ID:  <20071030145101.8oyf6b1wws0ksoc0@webmail.1command.com>
In-Reply-To: <47278CF6.6000403@FreeBSD.org>
References:  <20071004165755.GA1049@pp.htv.fi> <47120D83.1010703@FreeBSD.org> <20071015203202.GA17964@pp.htv.fi> <20071016004637.GA79351@cdnetworks.co.kr> <20071016185714.GB2186@pp.htv.fi> <20071016130146.pfyan4vs5cwgsoc0@webmail.1command.com> <20071016202251.GC4047@lava.net> <47151FF7.2080501@FreeBSD.org> <20071019011316.5ffmycud8g0oggsg@webmail.1command.com> <47278CF6.6000403@FreeBSD.org>

Quoting Kris Kennaway <kris@FreeBSD.org>:

> Chris H. wrote:
>> Quoting Kris Kennaway <kris@freebsd.org>:
>>
>>> Clifton Royston wrote:
>>>> On Tue, Oct 16, 2007 at 01:01:46PM -0700, Chris H. wrote:
>>>>> An excerpt from the thread I posted to this list, titled
>>>>> "NFS == lock && reboot", follows:
>>>>>
>>>>> ------8<---SNIP---8<-----SNIP-----8<-------
>>>>> # uname -a
>>>>> FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri 
>>>>> Jan 26 16:27:14 PST 2007
>>>>>
>>>>> Greetings,
>>>>> Does anyone know when NFS and friends will be working again? I haven't
>>>>> been able to /safely/ use it from 4.8 on. I remember some talk on the
>>>>> list some time ago, and then it seemed to be resolved, as the discussion
>>>>> ended. So I thought it was fixed. Seems not. :(
>>>>>
>>>>> My scenario:
>>>>> mount host off root:
>>>>> mount script exec'd follows...
>>>>>
>>>>> #!/bin/sh -
>>>>> mount -t nfs host.domain.tld:/ /host
>>>>> mount -t nfs host.domain.tld:/var /host/var
>>>>>
>>>>> confirm mount...
>>>>>
>>>>> # ls /host
>>>>> .snap    COPYRIGHT    bin
>>>>> ...
>>>>> usr    var    tmp
>>>>>
>>>>> OK looks good...
>>>>>
>>>>> # cp /path/to/approx/10Mb/file /host/path/to/dest/dir/
>>>>>
>>>>> Fatal double fault
>>>>> eis 0x0blah
>>>>> eiblah blah0x
>>>>> panic double fault
>>>>> no dump device defined
>>>>> rebooting in 15sec...
>>>>>
>>>>> Hmmm... that's not good. :(
>>>>>
>>>>> ------8<---SNIP---8<-----SNIP-----8<-------
>>>>>
>>>>> My final solution was to change the lines in /etc/rc.conf
>>>>> from:
>>>>> nfs_client_enable="YES"
>>>>> nfs_reserved_port_only="YES"
>>>>> nfs_server_enable="YES"
>>>>> rpc_lockd_enable="YES"
>>>>> rpc_statd_enable="YES"
>>>>> rpcbind_enable="YES"
>>>>>
>>>>> to:
>>>>> nfs_client_enable="YES"
>>>>> nfs_reserved_port_only="YES"
>>>>> nfs_server_enable="YES"
>>>>> #rpc_lockd_enable="YES"
>>>>> #rpc_statd_enable="YES"
>>>>> rpcbind_enable="YES"
>>>>>
>>>>> Making those changes ended the "Fatal double fault && reboot in 
>>>>> 15 seconds..."
>>>>
>>>>   Thanks for this very timely mention!  The cluster of servers I am
>>>> about to upgrade from 4.8 <embarrassed cough> to 6.2 relies heavily on
>>>> NFS to an old Netapp.  If I have got to disable rpc_lockd and
>>>> rpc_statd, it's good to know that now!
>>>>    Can I ask, can anybody confirm that they're running 6.2 on NFS
>>>> successfully *with* lockd and statd?
>>>
>>> Er, yes, of course it does.  The old message he is quoting is bogus on its own,
>> While I'll grant you that I haven't *yet* found/taken the time to create a
>> dump device, re-enable rpc_lockd && rpc_statd, and cp a 10MB file to the
>> mount point to produce an *instantaneous* "Fatal double fault", I don't
>> think it's fair to label my original post entirely /bogus/, especially in
>> light of the recent post I replied to, which seems to have a lot of common
>> ground with it.
>> I should probably mention that since my last posting (my original thread),
>> I have some 20+ RELENG_6_2 boxen that *do* have rpc_lockd + rpc_statd
>> enabled, yet none of them produce a "Fatal double fault". They are all
>> Tyan SMP boards with dual onboard fxp NICs, as opposed to the Nvidia UP
>> board, which has a single onboard nve. They are all interconnected via NFS.
>> I have a 750GB drive hanging off the /problematic/ Nvidia board that I
>> had intended to use for NFS backups, but given the NFS issue I had with
>> it, that didn't seem to be the best solution. If anyone felt like throwing
>> me a "cheat sheet" for creating a dump device out of that drive and a
>> "quickie" for producing a backtrace, I'm sure I'd be better able to find
>> the time to produce the required information. I'm sorry; it's just that
>> I'm a hundred million miles away from that right now, as I've been
>> building several large web applications and their deadline is fast
>> approaching. FWIW I bounced all the servers today, and therefore have
>> recent /verbose/ dmesg output, should any of the information it provides
>> be of any help/use to anyone.
>>
>> Take care. :)
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>
> It's very unlikely NFS is relevant to the problem (which is what made 
> it bogus, together with the lack of debugging) and likely that nve is 
> the cause.  The above URL explains in detail how to obtain the 
> necessary debugging to confirm this.
>
> Kris
>
>
Thank you, Kris,
I was recently able to find a small window in my workload, so I decided to
use it to provide the "non-bogus" ;) information needed. After reading:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
and:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
a few days ago, I was unclear on only one point in setting up the required
environment, so I posted my question ("dumpdev question (probably stupid)")
to the list, and Andrey V. Elsukov responded immediately.
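For reference, a minimal sketch of the rc.conf(5) settings involved in pointing crash dumps at the 750GB drive. It assumes a swap partition has been created on that drive; the device name ad4s1b is hypothetical and must be replaced with the real one. With the full dumps FreeBSD 6.2 writes, the partition generally needs to be at least as large as physical RAM, and savecore(8) copies the dump into dumpdir at the next boot.

```shell
# /etc/rc.conf fragment (sketch). "ad4s1b" is a HYPOTHETICAL device name:
# substitute the swap partition actually created on the 750GB drive.

# Device the kernel writes a crash dump to when it panics (see dumpon(8)).
dumpdev="/dev/ad4s1b"

# Directory savecore(8) copies the dump into on the next boot.
dumpdir="/var/crash"
```

Running `dumpon -v /dev/ad4s1b` (with the real device) should enable the dump device immediately instead of waiting for a reboot.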
I'll be creating a crash dump in the next couple of days. So, if it's not
already abundantly clear that this is the first time I've attempted to
produce this information, now would be the perfect time to /enlighten/ me
about anything you can think of that will ensure you get the information
you're looking for. :)
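Along the lines of the "quickie" asked about earlier, a rough sketch of the post-panic steps described in the developers' handbook pages linked above: build a kernel whose config contains "makeoptions DEBUG=-g" so that a kernel.debug with symbols exists, let savecore(8) save the dump at reboot, then open the newest vmcore.N with kgdb(1) and type "bt" at the (kgdb) prompt. The script below only locates the newest dump and prints the kgdb command. The kernel config name MYKERNEL and the helper names latest_vmcore/kgdb_command are hypothetical.

```shell
#!/bin/sh -
# Sketch: find the newest crash dump and print the kgdb(1) command to open it.
# MYKERNEL is a HYPOTHETICAL kernel config name; its config is assumed to
# contain "makeoptions DEBUG=-g" so that kernel.debug (with symbols) exists.
KERNEL_DEBUG="${KERNEL_DEBUG:-/usr/obj/usr/src/sys/MYKERNEL/kernel.debug}"
DUMPDIR="${DUMPDIR:-/var/crash}"

# latest_vmcore DIR: print the most recently modified vmcore.N in DIR,
# or return non-zero if there is none.
latest_vmcore() {
    newest=$(ls -t "$1"/vmcore.* 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    echo "$newest"
}

# kgdb_command: print the command line that opens the newest dump.
# Once inside kgdb, "bt" prints the backtrace to post to the list.
kgdb_command() {
    core=$(latest_vmcore "$DUMPDIR") || return 1
    echo "kgdb $KERNEL_DEBUG $core"
}
```

After a panic and reboot, `kgdb_command` would print something like `kgdb /usr/obj/usr/src/sys/MYKERNEL/kernel.debug /var/crash/vmcore.0`; the output of "bt" in that session is the part worth posting.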

Thank you again for your reply.

--Chris

-- 
panic: kernel trap (ignored)