Date: Thu, 25 Mar 2010 10:16:53 -0400 (EDT)
From: Rick Macklem
To: Adrenalin
Cc: freebsd-current@freebsd.org, current@freebsd.org
Subject: Re: hang in rpccon from interrupting NFS operations (Re: pointyhat panic)

On Mon, 22 Mar 2010, Adrenalin wrote:

> That's strange, after recompiling the latest 8_0 that contains the patch
> (http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/rpc/clnt_vc.c.diff?r1=1.8.2.2.2.1;r2=1.8.2.2.2.2)
> after 5 days it got stuck again with the same symptoms. I've also got
> some in the nfs state:
>
> FreeBSD .. 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Mar 16 22:56:51 EET
> 2010 ..@..:/usr/obj/usr/src/sys/MYGEN amd64
>
> When attaching the debugger to an rpccon process, it is stuck here:
> #0 0x000000080124051c in stat () from /lib/libc.so.7
>
> http://img705.imageshack.us/img705/741/10032219218.png
>
> Can I do online debugging of the kernel, or how can I help you to solve
> the problem?
>
Well, sleeping in "rpccon" means that the TCP connect has failed after a
soconnect() call. If you can get into a kernel debugger, there is a
global structure with more error information in it. It is called
rpc_createerr, and it has 2 enums followed by an int. The first enum
should be 12 (RPC_SYSTEMERR), which is what gets it to
tsleep(.."rpccon"..). The second enum doesn't apply to this case, and
the int after them should be the errno of the soconnect() failure. (The
way the code is currently written, it could be either an error return
from soconnect() or a value set in so_error after soconnect() returns,
while it is in the process of connecting.)

So, if you can get to that 3rd field, the value there might help tell
why the TCP connect is failing.

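
As a rough illustration of reading those three fields, here is a minimal
Python sketch. It assumes the structure starts with three consecutive
32-bit little-endian integers (two enums, then an int, as described
above, on amd64) and that the first 12 bytes have already been copied out
of the debugger by hand. The function name decode_rpc_createerr and the
dictionary keys are hypothetical labels, not kernel names. errno names
are looked up in the table of the machine running the script, so it
would need to run on a FreeBSD host for the names to match the kernel's.

```python
import errno
import os
import struct

# Value quoted in the message above for the first enum.
RPC_SYSTEMERR = 12


def decode_rpc_createerr(raw: bytes) -> dict:
    """Decode the first 12 bytes of a hand-copied rpc_createerr dump.

    Assumption: two 4-byte enums followed by a 4-byte int,
    little-endian, with no padding between them.
    """
    if len(raw) < 12:
        raise ValueError("need at least 12 bytes (2 enums + 1 int)")
    first_enum, second_enum, errno_value = struct.unpack("<3i", raw[:12])
    return {
        "first_enum": first_enum,          # expected to be 12 in this case
        "second_enum": second_enum,        # not meaningful in this case
        "errno_value": errno_value,        # errno of the failed connect
        "is_system_error": first_enum == RPC_SYSTEMERR,
        # Name and text come from the local errno table, not the kernel's.
        "errno_name": errno.errorcode.get(errno_value, "unknown"),
        "errno_text": os.strerror(errno_value),
    }


if __name__ == "__main__":
    # Made-up sample bytes for illustration only, not data from a real hang.
    sample = struct.pack("<3i", RPC_SYSTEMERR, 0, errno.ECONNREFUSED)
    print(decode_rpc_createerr(sample))
```

If is_system_error comes back False, the layout assumption (or the
bytes copied) is probably wrong and the errno field should not be
trusted.
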
Otherwise, all I can suggest is poking around and trying to figure out
why TCP connects are failing.
- wedged network interface
- routing problem
- network infrastructure problem
...

(Btw, I was driven a little batty at UofG because the campus network
switch I was on would decide to inject TCP RSTs into new connection
attempts for some reason. I finally was able to determine this by
looking at packet traces on both client and server and seeing the RSTs
coming out of the network on the client end, but never sent on the
server end. It was some Cisco related parameter/issue that was never
resolved.)

Hopefully others with more TCP expertise can make suggestions w.r.t.
why the TCP connects are failing?

Good luck with it, rick