Date: Fri, 19 Oct 2007 01:13:16 -0700
From: "Chris H." <chris#@1command.com>
To: freebsd-stable@freebsd.org
Subject: Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7
Message-ID: <20071019011316.5ffmycud8g0oggsg@webmail.1command.com>
In-Reply-To: <47151FF7.2080501@FreeBSD.org>
References: <20071004165755.GA1049@pp.htv.fi>
 <47120D83.1010703@FreeBSD.org>
 <20071015203202.GA17964@pp.htv.fi>
 <20071016004637.GA79351@cdnetworks.co.kr>
 <20071016185714.GB2186@pp.htv.fi>
 <20071016130146.pfyan4vs5cwgsoc0@webmail.1command.com>
 <20071016202251.GC4047@lava.net>
 <47151FF7.2080501@FreeBSD.org>

Quoting Kris Kennaway <kris@freebsd.org>:

> Clifton Royston wrote:
>> On Tue, Oct 16, 2007 at 01:01:46PM -0700, Chris H. wrote:
>>> excerpt from this list titled: NFS == lock && reboot, that I posted
>>> follows:
>>>
>>> ------8<---SNIP---8<-----SNIP-----8<-------
>>> # uname -a
>>> FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan
>>> 26 16:27:14 PST 2007
>>>
>>> Greetings,
>>> Does anyone know when NFS and friends will be working again? I
>>> haven't been able
>>> to /safely/ use it from 4.8 on. I remember some talk on the list
>>> sometime ago and
>>> then it seemed to be resolved, as the discussion ended. So I thought it was
>>> fixed. Seems not. :(
>>>
>>> My scenario;
>>> mount host off root:
>>> mount script exec'd follows...
>>>
>>> #!/bin/sh -
>>> mount -t nfs host.domain.tld:/ /host
>>> mount -t nfs host.domain.tld:/var /host/var
>>>
>>> confirm mount...
>>>
>>> # ls /host
>>> .snap COPYRIGHT bin
>>> ...
>>> usr var tmp
>>>
>>> OK looks good...
>>>
>>> # cp /path/to/approx/10Mb/file /host/path/to/dest/dir/
>>>
>>> Fatal double fault
>>> eis 0x0blah
>>> eiblah blah0x
>>> panic double fault
>>> no dump device defined
>>> rebooting in 15sec...
>>>
>>> Hmmm... that's not good. :(
>>>
>>> ------8<---SNIP---8<-----SNIP-----8<-------
>>>
>>> My final solution was to change the lines in /etc/rc.conf
>>> from:
>>> nfs_client_enable="YES"
>>> nfs_reserved_port_only="YES"
>>> nfs_server_enable="YES"
>>> rpc_lockd_enable="YES"
>>> rpc_statd_enable="YES"
>>> rpcbind_enable="YES"
>>>
>>> to:
>>> nfs_client_enable="YES"
>>> nfs_reserved_port_only="YES"
>>> nfs_server_enable="YES"
>>> #rpc_lockd_enable="YES"
>>> #rpc_statd_enable="YES"
>>> rpcbind_enable="YES"
>>>
>>> Making those changes ended the "Fatal double fault && reboot in 15
>>> seconds..."
>>
>> Thanks for this very timely mention! The cluster of servers I am
>> about to upgrade from 4.8 <embarrassed cough> to 6.2 relies heavily on
>> NFS to an old Netapp.
>> If I have got to disable rpc_lockd and
>> rpc_statd, it's good to know that now!
>> Can I ask, can anybody confirm that they're running 6.2 on NFS
>> successfully *with* lockd and statd?
>
> Er, yes, of course it does. The old message he is quoting is bogus
> on its own,

While I'll grant you that I haven't *yet* found/taken the time to create a
dump device, re-enable rpc_lockd && rpc_statd, and cp a 10MB file to the
mount point to produce an *instantaneous* "Fatal double fault", I don't
think it's fair to label my original post entirely /bogus/, especially in
light of the recent post I replied to, which seems to share some very
common ground.

I should probably mention that since my last posting (my original thread),
I have set up some 20+ RELENG_6_2 boxen that *do* have rpc_lockd + rpc_statd
enabled, yet none of them produce a "Fatal double fault". They are all Tyan
SMP boards with dual onboard fxp's, as opposed to the Nvidia UP board, which
has a single onboard nve. They are all inter-connected via NFS.

I have a 750GB drive hanging off the /problematic/ Nvidia board that I had
intended to use for NFS backups. But given the NFS issue I had with it,
that didn't seem to be the best solution. If anyone felt like throwing me a
"cheat sheet" for creating a dump device out of that drive and a "quickie"
for producing a backtrace, I'm sure I'd be better able to find the time to
produce the required information. I'm sorry. It's just that I'm a hundred
million miles away from that right now, as I've been building several large
web applications and their deadline is fast approaching.

FWIW I bounced all the servers today, and therefore have recent /verbose/
dmesg's, should any of the information they provide be of any help/use to
anyone.

Take care. :)

--Chris

> I don't know if he ever was able to provide meaningful traces but it
> may well be nve as in the upthread discussion.
>
> Kris
>
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>

-- 
panic: kernel trap (ignored)